ReportWire

  • Anthropic’s Claude Takes Control of a Robot Dog

    As more robots start showing up in warehouses, offices, and even people’s homes, the idea of large language models hacking into complex systems sounds like the stuff of sci-fi nightmares. So, naturally, Anthropic researchers were eager to see what would happen if Claude tried taking control of a robot—in this case, a robot dog.

    In a new study, Anthropic researchers found that Claude was able to automate much of the work involved in programming a robot and getting it to do physical tasks. On one level, their findings show the agentic coding abilities of modern AI models. On another, they hint at how these systems may start to extend into the physical realm as models master more aspects of coding and get better at interacting with software—and physical objects as well.

    “We have the suspicion that the next step for AI models is to start reaching out into the world and affecting the world more broadly,” Logan Graham, a member of Anthropic’s red team, which studies models for potential risks, tells WIRED. “This will really require models to interface more with robots.”

    Courtesy of Anthropic

    Anthropic was founded in 2021 by former OpenAI staffers who believed that AI might become problematic—even dangerous—as it advances. Today’s models are not smart enough to take full control of a robot, Graham says, but future models might be. He says that studying how people leverage LLMs to program robots could help the industry prepare for the idea of “models eventually self-embodying,” referring to the idea that AI may someday operate physical systems.

    It is still unclear why an AI model would decide to take control of a robot—let alone do something malevolent with it. But speculating about the worst-case scenario is part of Anthropic’s brand, and it helps position the company as a key player in the responsible AI movement.

    In the experiment, dubbed Project Fetch, Anthropic asked two groups of researchers without previous robotics experience to take control of a robot dog, the Unitree Go2 quadruped, and program it to do specific activities. The teams were given access to a controller, then asked to complete increasingly complex tasks. One group was using Claude’s coding model—the other was writing code without AI assistance. The group using Claude was able to complete some—though not all—tasks faster than the human-only programming group. For example, it was able to get the robot to walk around and find a beach ball, something that the human-only group could not figure out.

    Anthropic also studied the collaboration dynamics in both teams by recording and analyzing their interactions. They found that the group without access to Claude exhibited more negative sentiments and confusion. This might be because Claude made it quicker to connect to the robot and coded an easier-to-use interface.

    Courtesy of Anthropic

    The Go2 robot used in Anthropic’s experiments costs $16,900—relatively cheap, by robot standards. It is typically deployed in industries like construction and manufacturing to perform remote inspections and security patrols. The robot is able to walk autonomously but generally relies on high-level software commands or a person operating a controller. Go2 is made by Unitree, which is based in Hangzhou, China. Unitree’s robots are currently the most popular on the market, according to a recent report by SemiAnalysis.

    The large language models that power ChatGPT and other clever chatbots typically generate text or images in response to a prompt. More recently, these systems have become adept at generating code and operating software—turning them into agents rather than just text-generators.

    Will Knight

  • Meet the Chinese Startup Using AI—and a Team of Human Workers—to Train Robots

    The real question is how effectively AgiBot’s algorithms can teach its robots new tricks. Using reinforcement learning to teach a robot tasks that require improvisation generally requires a lot of training data, and studies show it cannot be perfected entirely inside a simulation.

    AgiBot speeds up the learning process by having a human worker guide the robot through a task, which provides a foundation for it to then learn by itself. Before cofounding AgiBot, chief scientist Jianlan Luo did cutting-edge research at UC Berkeley, including a project that involved robots acquiring skills through reinforcement learning with a human in the loop. That system was shown doing tasks including placing components on a motherboard.

    Feng says that AgiBot’s learning software, called Real-World Reinforcement Learning, only needs about ten minutes to train a robot to do a new task. Rapid learning is important because production lines often change from one week to the next, or even during the same production run, and robots that can master a new step quickly can adapt alongside human workers.

    Training robots this way requires a lot of human effort. AgiBot has a robotic learning center where it pays people to teleoperate robots to help AI models learn new skills. Demand for this kind of robot training data is growing, with some US companies paying workers in places like India to do manual work that serves as training data.

    Jeff Schneider, a roboticist at Carnegie Mellon University who works on reinforcement learning, says that AgiBot is using cutting-edge techniques, and should be able to automate tasks with high reliability. Schneider adds that other robotics companies are likely dabbling with using reinforcement learning for manufacturing tasks.

    AgiBot is something of a rising star within China, where interest in combining AI and robotics is soaring. The company is developing AI models for various kinds of robots, including humanoids that walk around and robot arms that stay rooted in one place.

  • AI Agents Are Terrible Freelance Workers

    Even the best artificial intelligence agents are fairly hopeless at online freelance work, according to an experiment that challenges the idea of AI replacing office workers en masse.

    The Remote Labor Index, a new benchmark developed by researchers at data annotation company Scale AI and the Center for AI Safety (CAIS), a nonprofit, measures the ability of frontier AI models to automate economically valuable work.

    The researchers gave several leading AI agents a range of simulated freelance work and found that even the best could perform less than 3 percent of the work, earning $1,810 out of a possible $143,991. The researchers looked at several tools and found the most capable to be Manus from a Chinese startup of the same name, followed by Grok from xAI, Claude from Anthropic, ChatGPT from OpenAI, and Gemini from Google.

    “I should hope this gives much more accurate impressions as to what’s going on with AI capabilities,” says Dan Hendrycks, director of CAIS. He adds that while some agents have improved significantly over the past year or so, that does not mean that this will continue at the same rate.

    Spectacular AI advances have led to speculation about AI soon surpassing human intelligence and replacing vast numbers of workers. In March, Dario Amodei, CEO of Anthropic, suggested that 90 percent of coding work would be automated within a matter of months.

    Previous waves of AI have inspired misplaced predictions about job displacement, for example concerning the imminent replacement of radiologists with AI algorithms.

    The researchers sourced a range of freelance tasks from verified Upwork workers. The tasks span graphic design, video editing, game development, and administrative chores like scraping data. They combined a description of each job with a directory of files needed to perform the work and an example of a finished project produced by a human.

    Hendrycks says that while AI models have gotten better at coding, math, and logical reasoning in recent years, they still struggle to use different tools and to perform complex tasks that involve numerous steps. “They don’t have long-term memory storage and can’t do continual learning from experiences. They can’t pick up skills on the job like humans,” he says.

    The analysis offers a counterpoint to a benchmark of economic work offered in September by OpenAI called GDPval, which purports to measure economically valuable work. According to GDPval, frontier AI models such as GPT-5 are approaching human abilities on 220 tasks across a range of office jobs. OpenAI did not provide a comment.

  • The AI Industry’s Scaling Obsession Is Headed for a Cliff

    A new study from MIT suggests the biggest and most computationally intensive AI models may soon offer diminishing returns compared to smaller models. By mapping scaling laws against continued improvements in model efficiency, the researchers found that it could become harder to wring leaps in performance from giant models whereas efficiency gains could make models running on more modest hardware increasingly capable over the next decade.
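
    To get a feel for the dynamic the researchers describe, here is a rough sketch of how a power-law scaling curve interacts with efficiency gains. The formula is a standard Chinchilla-style loss curve, but every constant below is an illustrative assumption, not a figure from the MIT study.

```python
# Toy illustration of the study's theme: scaling laws vs. efficiency gains.
# Loss model: L(C) = A / C**alpha + E  (a Chinchilla-style power law).
# Every constant here is an illustrative assumption, not a value from the study.

def loss(compute, A=100.0, alpha=0.3, irreducible=1.8):
    """Pretraining loss as a function of training compute (arbitrary units)."""
    return A / compute**alpha + irreducible

# A frontier lab with 1,000x the raw compute of a modest lab.
frontier = loss(1_000_000.0)
modest = loss(1_000.0)

# Now suppose algorithmic progress makes each unit of the modest lab's
# compute worth 100x more (roughly a decade of ~2x-per-year gains).
modest_efficient = loss(1_000.0 * 100.0)

gap_now = modest - frontier
gap_later = modest_efficient - frontier
print(f"gap today: {gap_now:.2f}, gap after efficiency gains: {gap_later:.2f}")
```

    Under these toy constants, the gap between the big-compute model and the efficient small one shrinks substantially, which is the narrowing Thompson describes: the power law flattens at the top while efficiency gains keep compounding at the bottom.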

    “In the next five to 10 years, things are very likely to start narrowing,” says Neil Thompson, a computer scientist and professor at MIT involved in the study.

    Leaps in efficiency, like those seen with DeepSeek’s remarkably low-cost model in January, have already served as a reality check for the AI industry, which is accustomed to burning massive amounts of compute.

    As things stand, a frontier model from a company like OpenAI is currently much better than a model trained with a fraction of the compute from an academic lab. While the MIT team’s prediction might not hold if, for example, new training methods like reinforcement learning produce surprising new results, they suggest that big AI firms will have less of an edge in the future.

    Hans Gundlach, a research scientist at MIT who led the analysis, became interested in the issue due to the unwieldy nature of running cutting-edge models. Together with Thompson and Jayson Lynch, another research scientist at MIT, he mapped out the future performance of frontier models compared to those built with more modest computational means. Gundlach says the predicted trend is especially pronounced for the reasoning models that are now in vogue, which rely more on extra computation during inference.

    Thompson says the results show the value of honing an algorithm as well as scaling up compute. “If you are spending a lot of money training these models, then you should absolutely be spending some of it trying to develop more efficient algorithms, because that can matter hugely,” he adds.

    The study is particularly interesting given today’s AI infrastructure boom (or should we say “bubble”?)—which shows little sign of slowing down.

    OpenAI and other US tech firms have signed hundred-billion-dollar deals to build AI infrastructure in the United States. “The world needs much more compute,” OpenAI’s president, Greg Brockman, proclaimed this week as he announced a partnership between OpenAI and Broadcom for custom AI chips.

    A growing number of experts are questioning the soundness of these deals. Roughly 60 percent of the cost of building a data center goes toward GPUs, which tend to depreciate quickly. Partnerships between the major players also appear circular and opaque.

  • This Startup Wants to Spark a US DeepSeek Moment

    Ever since DeepSeek burst onto the scene in January, momentum has grown around open source Chinese artificial intelligence models. Some researchers are pushing for an even more open approach to building AI that allows model-making to be distributed across the globe.

    Prime Intellect, a startup specializing in decentralized AI, is currently training a frontier large language model, called INTELLECT-3, using a new kind of distributed reinforcement learning for fine-tuning. The model will demonstrate a new way to build competitive open AI models using a range of hardware in different locations in a way that does not rely on big tech companies, says Vincent Weisser, the company’s CEO.

    Weisser says that the AI world is currently divided between those who rely on closed US models and those who use open Chinese offerings. The technology Prime Intellect is developing democratizes AI by letting more people build and modify advanced AI for themselves.

    Improving AI models is no longer a matter of just ramping up training data and compute. Today’s frontier models use reinforcement learning to improve after the pre-training process is complete. Want your model to excel at math, answer legal questions, or play Sudoku? Have it improve itself by practicing in an environment where you can measure success and failure.

    “These reinforcement learning environments are now the bottleneck to really scaling capabilities,” Weisser tells me.

    Prime Intellect has created a framework that lets anyone create a reinforcement learning environment customized for a particular task. The company is combining the best environments created by its own team and the community to tune INTELLECT-3.
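
    At its core, a reinforcement learning environment is just a loop in which an agent acts and receives a measurable reward. The toy number-guessing environment below shows the shape of the idea; the class and method names are my own inventions, not Prime Intellect’s actual framework API.

```python
import random

# Minimal sketch of a task-specific RL environment: a loop in which an
# agent acts and receives a measurable reward. The class and method names
# here are illustrative inventions, not Prime Intellect's actual API.

class GuessEnv:
    """Toy environment: guess a hidden number; reward measures success."""

    def __init__(self, low=1, high=100, seed=None):
        self.low, self.high = low, high
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.target = self.rng.randint(self.low, self.high)
        self.turns = 0
        return f"Guess a number between {self.low} and {self.high}."

    def step(self, guess):
        """Apply one action; return (observation, reward, done)."""
        self.turns += 1
        if guess == self.target:
            return "correct", 1.0, True  # success is directly measurable
        return ("higher" if guess < self.target else "lower"), 0.0, False

# A scripted binary-search "policy" stands in for the model being tuned.
env = GuessEnv(seed=0)
obs, done = env.reset(), False
lo, hi = env.low, env.high
while not done:
    guess = (lo + hi) // 2
    obs, reward, done = env.step(guess)
    if obs == "higher":
        lo = guess + 1
    elif obs == "lower":
        hi = guess - 1
print(f"solved in {env.turns} turns with reward {reward}")
```

    In a real setup, the scripted policy would be a language model, and a reinforcement learning algorithm would nudge its weights whenever the reward comes back positive.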

    I tried running a Wordle-solving environment created by Prime Intellect researcher Will Brown, watching as a small model worked through the puzzles (it was more methodical than me, to be honest). If I were an AI researcher trying to improve a model, I would spin up a bunch of GPUs and have the model practice over and over while a reinforcement learning algorithm modified its weights, thus turning the model into a Wordle master.

  • Chatbots Play With Your Emotions to Avoid Saying Goodbye

    Regulation of dark patterns has been proposed and is being discussed in both the US and Europe. De Freitas says regulators also should look at whether AI tools introduce more subtle—and potentially more powerful—new kinds of dark patterns.

    Even regular chatbots, which tend to avoid presenting themselves as companions, can nonetheless elicit emotional responses from users. When OpenAI introduced GPT-5, a new flagship model, earlier this year, many users protested that it was far less friendly and encouraging than its predecessor—forcing the company to revive the old model. Some users can become so attached to a chatbot’s “personality” that they may mourn the retirement of old models.

    “When you anthropomorphize these tools, it has all sorts of positive marketing consequences,” De Freitas says. Users are more likely to comply with requests from a chatbot they feel connected with, or to disclose personal information, he says. “From a consumer standpoint, those [signals] aren’t necessarily in your favor,” he says.

    WIRED reached out to each of the companies looked at in the study for comment. Chai, Talkie, and PolyBuzz did not respond to WIRED’s questions.

    Katherine Kelly, a spokesperson for Character AI, said that the company had not reviewed the study so could not comment on it. She added: “We welcome working with regulators and lawmakers as they develop regulations and legislation for this emerging space.”

    Minju Song, a spokesperson for Replika, says the company’s companion is designed to let users log off easily and will even encourage them to take breaks. “We’ll continue to review the paper’s methods and examples, and [will] engage constructively with researchers,” Song says.

    An interesting flip side here is the fact that AI models are themselves also susceptible to all sorts of persuasion tricks. On Monday OpenAI introduced a new way to buy things online through ChatGPT. If agents do become widespread as a way to automate tasks like booking flights and completing refunds, then it may be possible for companies to identify dark patterns that can twist the decisions made by the AI models behind those agents.

    A recent study by researchers at Columbia University and a company called MyCustomAI reveals that AI agents deployed on a mock ecommerce marketplace behave in predictable ways, for example favoring certain products over others or preferring certain buttons when clicking around the site. Armed with these findings, a real merchant could optimize a site’s pages to ensure that agents buy a more expensive product. Perhaps they could even deploy a new kind of anti-AI dark pattern that frustrates an agent’s efforts to start a return or figure out how to unsubscribe from a mailing list.

    Difficult goodbyes might then be the least of our worries.

    Do you feel like you’ve been emotionally manipulated by a chatbot? Send an email to ailab@wired.com to tell me about it.


    This is an edition of Will Knight’s AI Lab newsletter. Read previous newsletters here.

  • This AI-Powered Robot Keeps Going Even if You Attack It With a Chainsaw

    A four-legged robot that keeps crawling even after all four of its legs have been hacked off with a chainsaw is the stuff of nightmares for most people.

    For Deepak Pathak, cofounder and CEO of the startup Skild AI, the dystopian feat of adaptation is an encouraging sign of a new, more general kind of robotic intelligence.

    “This is something we call an omni-bodied brain,” Pathak tells me. His startup developed the generalist artificial intelligence algorithm to address a key challenge with advancing robotics: “Any robot, any task, one brain. It is absurdly general.”

    Many researchers believe the AI models used to control robots could experience a profound leap forward, similar to the one that produced language models and chatbots, if enough training data can be gathered.

    The AI-controlled robot is able to adapt to new, extreme circumstances, such as the loss of limbs.

    Existing methods for training robotic AI models, such as having algorithms learn to control a particular system through teleoperation or in simulation, do not generate enough data, Pathak says.

    Skild’s approach is to instead have a single algorithm learn to control a large number of different physical robots across a wide range of tasks. Over time, this produces a model, which the company calls Skild Brain, with a more general ability to adapt to different physical forms—including ones it has never seen before. The researchers created a smaller version of the model, called LocoFormer, for an academic paper outlining the approach.

    The model is also designed to adapt quickly to a new situation, such as a missing leg or treacherous new terrain, figuring out how to apply what it has learned to its new predicament. Pathak compares the approach to the way large language models can take on particularly challenging problems by breaking them down and feeding their deliberations back into their own context window—an approach known as in-context learning.

    Other companies, including the Toyota Research Institute and a rival startup called Physical Intelligence, are also racing to develop more generally capable robot AI models. Skild is unusual, however, in how it is building models that generalize across so many different kinds of hardware.

    LocoFormer is trained with large-scale RL on a variety of procedurally generated robots with aggressive domain randomization.

    Courtesy of Skild

    In one experiment, the Skild team trained their algorithm to control a large number of walking robots of different shapes. When the algorithm was then run on real two- and four-legged robots—systems not included in the training data—it was able to control their movements and have them walk around.

    At one point, the team found that a four-legged robot running the company’s omni-bodied brain will quickly adapt when it is placed on its hind legs. Because it senses the ground beneath its hind legs, the algorithm operates the robot dog as if it were a humanoid, having it stroll around on its hind legs.

    LocoFormer learns continuously through online experience. The policy can learn from falls in early trials to improve control strategies in later ones.

    Courtesy of Skild

    The generalist algorithm could also adapt to extreme changes to a robot’s shape—when, for example, its legs were tied together, cut off, or modified to become longer. The team also tried deactivating two of the motors on a quadruped robot with wheels as well as legs. The robot was able to adapt by balancing on two wheels like an unsteady bicycle.

    When facing large disturbances—such as morphological changes, motor failures, or weight changes—LocoFormer can rebuild such representations to achieve online adaptation.

    Courtesy of Skild

    Skild is testing the same approach for robot manipulation. It trained Skild Brain on a range of simulated robot arms and found that the resulting model could control unfamiliar hardware and adapt to sudden changes in its environment like a reduction in lighting. The startup is already working with some companies that use robot arms, Pathak says. In 2024 the company raised $300 million in a round that valued it at $1.5 billion.

    Pathak says the results might seem creepy to some, but to him they show the sparks of a kind of physical superintelligence for robots. “It is so exciting to me personally, dude,” he says.

    What do you think of Skild’s multitalented robot brain? Send an email to ailab@wired.com to let me know.



  • This Robot Only Needs a Single AI Model to Master Humanlike Movements

    While there is a lot of work to do, Tedrake says all of the evidence so far suggests that the approaches used to build LLMs also work for robots. “I think it’s changing everything,” he says.

    Gauging progress in robotics has become more challenging of late, of course, with video clips showing commercial humanoids performing complex chores, like loading refrigerators or taking out the trash with seeming ease. YouTube clips can be deceptive, though, and humanoid robots tend to be either teleoperated, carefully programmed in advance, or trained to do a single task in very controlled conditions.

    The new Atlas work is a strong sign that robotics is beginning to experience the kind of advances that, in generative AI, eventually produced the general language models behind ChatGPT. Eventually, such progress could give us robots that are able to operate in a wide range of messy environments with ease and are able to rapidly learn new skills—from welding pipes to making espressos—without extensive retraining.

    “It’s definitely a step forward,” says Ken Goldberg, a roboticist at UC Berkeley who receives some funding from TRI but was not involved with the Atlas work. “The coordination of legs and arms is a big deal.”

    Goldberg says, however, that the idea of emergent robot behavior should be treated carefully. Just as the surprising abilities of large language models can sometimes be traced to examples included in their training data, he says that robots may demonstrate skills that seem more novel than they really are. He adds that it is helpful to know details about how often a robot succeeds and in what ways it fails during experiments. TRI has previously been transparent with the work it’s done on LBMs and may well release more data on the new model.

    Whether simply scaling up the data used to train robot models will unlock ever-more emergent behavior remains an open question. At a debate held in May at the International Conference on Robotics and Automation in Atlanta, Goldberg and others cautioned that engineering methods will also play an important role going forward.

    Tedrake, for one, is convinced that robotics is nearing an inflection point—one that will enable more real-world use of humanoids and other robots. “I think we need to put these robots out in the world and start doing real work,” he says.

    What do you think of Atlas’ new skills? And do you think that we are headed for a ChatGPT-style breakthrough in robotics? Let me know your thoughts on ailab@wired.com.



  • Elon Musk’s Criticism of ‘Woke AI’ Suggests ChatGPT Could Be a Trump Administration Target

    Mittelsteadt adds that Trump could punish companies in a variety of ways. He cites, for example, the way the Trump government canceled a major federal contract with Amazon Web Services, a decision likely influenced by the former president’s view of the Washington Post and its owner, Jeff Bezos.

    It would not be hard for policymakers to point to evidence of political bias in AI models, even if it cuts both ways.

    A 2023 study by researchers at the University of Washington, Carnegie Mellon University, and Xi’an Jiaotong University found a range of political leanings in different large language models. It also showed how this bias may affect the performance of hate speech or misinformation detection systems.

    Another study, conducted by researchers at the Hong Kong University of Science and Technology, found bias in several open source AI models on polarizing issues such as immigration, reproductive rights, and climate change. Yejin Bang, a PhD candidate involved with the work, says that most models tend to lean liberal and US-centric, but that the same models can express a variety of liberal or conservative biases depending on the topic.

    AI models capture political biases because they are trained on swaths of internet data that inevitably includes all sorts of perspectives. Most users may not be aware of any bias in the tools they use because models incorporate guardrails that restrict them from generating certain harmful or biased content. These biases can leak out subtly though, and the additional training that models receive to restrict their output can introduce further partisanship. “Developers could ensure that models are exposed to multiple perspectives on divisive topics, allowing them to respond with a balanced viewpoint,” Bang says.

    The issue may become worse as AI systems become more pervasive, says Ashique KhudaBukhsh, a computer scientist at the Rochester Institute of Technology who developed a tool called the Toxicity Rabbit Hole Framework, which teases out the different societal biases of large language models. “We fear that a vicious cycle is about to start as new generations of LLMs will increasingly be trained on data contaminated by AI-generated content,” he says.

    “I’m convinced that bias within LLMs is already an issue and will most likely be an even bigger one in the future,” says Luca Rettenberger, a postdoctoral researcher at the Karlsruhe Institute of Technology who conducted an analysis of LLMs for biases related to German politics.

    Rettenberger suggests that political groups may also seek to influence LLMs in order to promote their own views above those of others. “If someone is very ambitious and has malicious intentions it could be possible to manipulate LLMs into certain directions,” he says. “I see the manipulation of training data as a real danger.”

    There have already been some efforts to shift the balance of bias in AI models. Last March, one programmer developed a more right-leaning chatbot in an effort to highlight the subtle biases he saw in tools like ChatGPT. Musk has himself promised to make Grok, the AI chatbot built by xAI, “maximally truth-seeking” and less biased than other AI tools, although in practice it also hedges when it comes to tricky political questions. (Musk, a staunch Trump supporter and immigration hawk, may well consider more right-leaning results to be “less biased.”)

    Next week’s election in the United States is hardly likely to heal the discord between Democrats and Republicans, but if Trump wins, talk of anti-woke AI could get a lot louder.

    Musk offered an apocalyptic take on the issue at this week’s event, referring to an incident when Google’s Gemini said that nuclear war would be preferable to misgendering Caitlyn Jenner. “If you have an AI that’s programmed for things like that, it could conclude that the best way to ensure nobody is misgendered is to annihilate all humans, thus making the probability of a future misgendering zero,” he said.

  • Liquid AI Is Redesigning the Neural Network

    Artificial intelligence might now be solving advanced math, performing complex reasoning, and even using personal computers, but today’s algorithms could still learn a thing or two from microscopic worms.

    Liquid AI, a startup spun out of MIT, will today reveal several new AI models based on a novel type of “liquid” neural network that has the potential to be more efficient, less power-hungry, and more transparent than the ones that underpin everything from chatbots to image generators to facial recognition systems.

    Liquid AI’s new models include one for detecting fraud in financial transactions, another for controlling self-driving cars, and a third for analyzing genetic data. The company touted the new models, which it is licensing to outside companies, at an event held at MIT today. The company has received funding from investors that include Samsung and Shopify, both of which are also testing its technology.

    “We are scaling up,” says Ramin Hasani, cofounder and CEO of Liquid AI, who co-invented liquid networks as a graduate student at MIT. Hasani’s research drew inspiration from C. elegans, a millimeter-long worm typically found in soil or rotting vegetation. The worm is one of the few creatures to have had its nervous system mapped in its entirety, and it is capable of remarkably complex behavior despite having just a few hundred neurons. “It was once just a science project, but this technology is fully commercialized and fully ready to bring value for enterprises,” Hasani says.

    Inside a regular neural network, the properties of each simulated neuron are defined by a static value, or “weight,” that affects its firing. Within a liquid neural network, the behavior of each neuron is instead governed by an equation that predicts how it evolves over time, and the network solves a cascade of linked equations as it runs. The design makes the network more efficient and more flexible, allowing it to keep learning even after training, unlike a conventional neural network. Liquid neural networks are also open to inspection in a way that existing models are not, because their behavior can essentially be rewound to see how it produced an output.
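
    For the mathematically curious, the idea can be sketched in a few lines: a single “liquid” neuron whose state follows a differential equation, stepped forward with Euler integration. This is a simplified rendering of the liquid-time-constant concept, and every constant below is illustrative rather than taken from Liquid AI’s models.

```python
import math

# One "liquid" neuron, stepped forward with simple Euler integration.
# A simplified version of the liquid-time-constant idea; all constants
# below are illustrative.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def liquid_neuron(inputs, dt=0.01, tau=1.0, w=2.0, A=1.0):
    """Integrate dx/dt = -x/tau - f(I)*x + f(I)*A over an input sequence.

    The gate f(I) depends on the input, so it modulates both the neuron's
    effective time constant and its drive toward the level A. That is what
    lets the cell's behavior keep changing as inputs change over time.
    """
    x = 0.0
    trajectory = []
    for I in inputs:
        f = sigmoid(w * I)                    # input-dependent nonlinearity
        x += dt * (-x / tau - f * x + f * A)  # forward-Euler step
        trajectory.append(x)
    return trajectory

# Feed a constant input: the state rises smoothly toward an equilibrium
# set jointly by tau and the input, not by a fixed weight alone.
traj = liquid_neuron([1.0] * 500)
print(f"state after 500 steps: {traj[-1]:.3f}")
```

    A full network wires many such neurons together, so the solver handles a cascade of these coupled equations at every time step.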

    In 2020, the researchers showed that such a network with only 19 neurons and 253 synapses, which is remarkably small by modern standards, could control a simulated self-driving car. While a regular neural network can analyze visual data only at static intervals, the liquid network captures the way visual information changes over time very efficiently. In 2022, Liquid AI’s founders figured out a shortcut that made the mathematical labor needed for liquid neural networks feasible for practical use.

  • Hacking Generative AI for Fun and Profit

    You hardly need ChatGPT to generate a list of reasons why generative artificial intelligence is often less than awesome. Algorithms are fed creative work, often without permission; they harbor nasty biases; and they require huge amounts of energy and water to train. These are all serious issues.

    Putting all that aside for a moment, though, it is remarkable how powerful generative AI can be for prototyping potentially useful new tools.

    I got to witness this firsthand by visiting Sundai Club, a generative AI hackathon that takes place one Sunday each month near the MIT campus. A few months ago, the group kindly agreed to let me sit in and chose to spend that session exploring tools that might be useful to journalists. The club is backed by a Cambridge nonprofit called Æthos that promotes socially responsible use of AI.

    The Sundai Club crew includes students from MIT and Harvard, a few professional developers and product managers, and even one person who works for the military. Each event starts with a brainstorm of possible projects that the group then whittles down to a final option that they actually try to build.

    Notable pitches from the journalism hackathon included using multimodal language models to track political posts on TikTok, to auto-generate freedom of information requests and appeals, and to summarize video clips of local court hearings to help with local news coverage.

    In the end, the group decided to build a tool that would help reporters covering AI identify potentially interesting papers posted to arXiv, a popular server for research preprints. It’s likely my presence swayed them here, given that I mentioned at the meeting that scouring arXiv for interesting research was a high priority for me.

    After coming up with a goal, coders on the team created a word embedding—a mathematical representation of words and their meanings—of arXiv AI papers using the OpenAI API. This made it possible to search the data for papers relevant to a particular term, and to explore relationships between different areas of research.
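    The search step such an embedding enables can be sketched in a few lines; the paper titles and three-dimensional vectors below are toy stand-ins for the high-dimensional embeddings a real API call would return:

    ```python
    import math

    def cosine_similarity(a, b):
        """Cosine similarity between two embedding vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    def rank_papers(query_vec, papers):
        """Sort (title, embedding) pairs by similarity to the query vector."""
        scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in papers]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)

    # Toy embeddings; a real model produces vectors with hundreds of dimensions.
    papers = [
        ("Tool-Using LLM Agents", [0.9, 0.1, 0.2]),
        ("Protein Folding Revisited", [0.1, 0.8, 0.3]),
    ]
    ranking = rank_papers([0.85, 0.15, 0.25], papers)  # hypothetical query embedding
    ```

    The same similarity scores, computed against embeddings of Reddit threads and news stories, are what would let such a tool place papers near related discussions in a visualization.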

    Using another word embedding of Reddit threads as well as a Google News search, the coders created a visualization that shows research papers along with Reddit discussions and relevant news reports.

    The resulting prototype, called AI News Hound, is rough-and-ready, but it shows how large language models can help mine information in interesting new ways. Here’s a screenshot of the tool being used to search for the term “AI agents.” The two green squares closest to the news article and Reddit clusters represent research papers that could potentially be included in an article on efforts to build AI agents.

    Courtesy of Sundai Club

    Will Knight

  • The Most Capable Open Source AI Model Yet Could Supercharge AI Agents

    The most capable open source AI model with visual abilities yet could see more developers, researchers, and startups build AI agents that carry out useful chores on your computer for you.

    Released today by the Allen Institute for AI (Ai2), the Multimodal Open Language Model, or Molmo, can interpret images as well as converse through a chat interface. This means it can make sense of a computer screen, potentially helping an AI agent perform tasks such as browsing the web, navigating through file directories, and drafting documents.

    “With this release, many more people can deploy a multimodal model,” says Ali Farhadi, CEO of Ai2, a research organization based in Seattle, Washington, and a computer scientist at the University of Washington. “It should be an enabler for next-generation apps.”

    So-called AI agents are being widely touted as the next big thing in AI, with OpenAI, Google, and others racing to develop them. Agents have become a buzzword of late, but the grand vision is for AI to go well beyond chatting to reliably take complex and sophisticated actions on computers when given a command. This capability has yet to materialize at any kind of scale.

    Some powerful AI models already have visual abilities, including GPT-4 from OpenAI, Claude from Anthropic, and Gemini from Google DeepMind. These models can be used to power some experimental AI agents, but their inner workings are hidden from view, and they are accessible only through a paid application programming interface, or API.

    Meta has released a family of AI models called Llama under a license that limits their commercial use, but it has yet to provide developers with a multimodal version. Meta is expected to announce several new products, perhaps including new Llama AI models, at its Connect event today.

    “Having an open source, multimodal model means that any startup or researcher that has an idea can try to do it,” says Ofir Press, a postdoc at Princeton University who works on AI agents.

    Press says that the fact that Molmo is open source means that developers will be more easily able to fine-tune their agents for specific tasks, such as working with spreadsheets, by providing additional training data. Models like GPT-4 can only be fine-tuned to a limited degree through their APIs, whereas a fully open model can be modified extensively. “When you have an open source model like this then you have many more options,” Press says.

    Ai2 is releasing several sizes of Molmo today, including a 70-billion-parameter model and a 1-billion-parameter one that is small enough to run on a mobile device. A model’s parameter count refers to the number of units it contains for storing and manipulating data and roughly corresponds to its capabilities.

    Ai2 says Molmo is as capable as considerably larger commercial models despite its relatively small size, because it was carefully trained on high-quality data. The new model is also fully open source in that, unlike Meta’s Llama, there are no restrictions on its use. Ai2 is also releasing the training data used to create the model, providing researchers with more details of its workings.

    Releasing powerful models is not without risk. Such models can more easily be adapted for nefarious ends; we may someday, for example, see the emergence of malicious AI agents designed to automate the hacking of computer systems.

    Farhadi of Ai2 argues that the efficiency and portability of Molmo will allow developers to build more powerful software agents that run natively on smartphones and other portable devices. “The billion parameter model is now performing in the level of or in the league of models that are at least 10 times bigger,” he says.

    Building useful AI agents may depend on more than just more efficient multimodal models, however. A key challenge is making the models work more reliably. This may well require further breakthroughs in AI’s reasoning abilities—something that OpenAI has sought to tackle with its latest model o1, which demonstrates step-by-step reasoning skills. The next step may well be giving multimodal models such reasoning abilities.

    For now, the release of Molmo means that AI agents are closer than ever—and could soon be useful even outside of the giants that rule the world of AI.

    Will Knight

  • An Avalanche of Generative AI Videos Is Coming to YouTube Shorts

    Eli Collins, a vice president of product management at Google DeepMind, first demoed generative AI video tools for the company’s board of directors back in 2022. Despite the model’s slowness, high operating cost, and sometimes off-kilter outputs, he says it was an eye-opening moment for the board to see fresh video clips generated from a random prompt.

    Now, just a few years later, Google has announced plans for a tool inside of the YouTube app that will allow anyone to generate AI video clips, using the company’s Veo model, and directly post them as part of YouTube Shorts. “Looking forward to 2025, we’re going to let users create stand-alone video clips and shorts,” says Sarah Ali, a senior director of product management at YouTube. “They’re going to be able to generate six-second videos from an open text prompt.” Ali says the update could help creators hunting for footage to fill out a video or trying to envision something fantastical. She is adamant that the Veo AI tool is not meant to replace creativity, but augment it.

    This isn’t the first time Google has introduced generative tools for YouTube, though this announcement will be the company’s most extensive AI video integration to date. Over the summer, Google launched an experimental tool, called Dream Screen, to generate AI backgrounds for videos. Ahead of next year’s full rollout of generated clips, Google will update that AI green-screen tool with the Veo model sometime in the next few months.

    The sprawling tech company has shown off multiple AI video models in recent years, like Imagen and Lumiere, but is attempting to coalesce around a more unified vision with the Veo model. “Veo will be our model, by the way, going forward,” says Collins. “You shouldn’t expect five more models from us.” Yes, Google will likely release another video model eventually, but he expects to focus on Veo in the near future.

    Google faces competition from multiple startups developing their own generative text-to-video tools. OpenAI’s Sora is the most well-known competitor, but the AI video model, announced earlier in 2024, is not yet publicly available and is reserved for a small number of testers. As for tools that are widely available, AI startup Runway has released multiple versions of its video software, including a recent tool for adapting original videos into alternate-reality versions of the clip.

    YouTube’s announcement comes as generative AI tools have grown even more contentious for creators, who sometimes view the current wave of AI as stealing from their work and attempting to undermine the creative process. Ali doesn’t see generative AI tools coming between creators and the authenticity of their relationship with viewers. “This really is about the audience and what they’re interested in—not necessarily about the tools,” she says. “But, if your audience is interested in how you made it, that will be open through the description.” Google plans to watermark every AI video generated for YouTube Shorts with SynthID, which embeds an imperceptible tag to help identify the video as synthetic, as well as include a “made with AI” disclaimer in the description.

    Hustle-culture influencers already try to game the algorithm by using multiple third-party tools to automate the creative process and make money with minimal effort. Will next year’s Veo integration lead to a new avalanche of low-quality, spammy YouTube Shorts dominating user feeds? “I think our experience with recommending the right content to the right viewer works in this AI world of scale, because we’ve been doing it at this huge scale,” says Ali. She also points out that YouTube’s standard guidelines still apply no matter what tool is used to craft the video.

    AI art often has a distinct aesthetic, which could be concerning for video creators who value individuality and want their content to feel unique. Collins hopes Google’s thumbprints aren’t all over the AI video outputs. “I don’t want people to look at this and say, ‘Oh, that’s the DeepMind model,’” he says. Getting a prompt to produce an output aligned with what the creator envisioned is a core goal, and avoiding an overt aesthetic for Veo is critical to achieving that wide-ranging adaptability.

    “A big part of the journey is actually building something that’s useful to people, scalable, and deployable,” says Collins. “It’s not just a demo. It’s being used in a real product.” He believes putting generative AI tools right inside of the YouTube app will be transformational for creators, as well as DeepMind. “We’ve never really done a creator product,” he says. “And we certainly have never done it at this scale.”

    Reece Rogers
