Tag: AI research

Databricks co-founder argues US must go open source to beat China in AI | TechCrunch

[ad_1]

Andy Konwinski is concerned that the U.S. is losing its dominance in AI research to China, calling the shift an “existential” threat to democracy. Konwinski is a Databricks co-founder and the co-founder of the AI research and venture capital firm Laude.

“If you talk to PhD students at Berkeley and Stanford in AI right now, they’ll tell you that they’ve read twice as many interesting AI ideas in the last year that were from Chinese companies than American companies,” Konwinski said onstage at the Cerebral Valley AI Summit this week.

In addition to investing through Laude, the venture fund he launched last year with NEA veteran Pete Sonsini and Antimatter CEO Andrew Krioukov, Konwinski also runs the Laude Institute, an accelerator that offers grants to researchers.

Major AI labs, including OpenAI, Meta, and Anthropic, continue to innovate significantly, yet their innovations remain largely proprietary rather than open source. Moreover, these companies are sucking up top academic talent by offering multimillion-dollar salaries that dwarf what these experts can earn in universities.

Konwinski argued that for ideas to truly flourish, they need to be freely exchanged and discussed with the larger academic community. He pointed out that generative AI emerged as a direct result of the Transformer architecture, a pivotal training technique introduced in a freely available research paper.

“The first nation that makes the next ‘Transformer architectural level’ breakthrough will have the advantage,” Konwinski said.

Konwinski argues that in China, the government supports and encourages AI innovation, whether from labs like DeepSeek or Alibaba’s Qwen, to be open sourced, which allows others to build upon them and which, he contends, will inevitably lead to more breakthroughs.

Techcrunch event

San Francisco
|
October 13-15, 2026

He believes this stands in stark contrast to the U.S., where, as he puts it, “the diffusion of scientists talking to scientists that we always have had in the United States, it’s dried up.”

Konwinski argues that this trend poses not only a risk to democracy but also a business threat to major U.S. AI labs. “We’re eating our corn seeds; the fountain is drying up. Fast-forward five years, the big labs are gonna lose too,” he said. “We need to make sure the United States stays number one and open.”

[ad_2]

Marina Temkin

Source link

November 14, 2025
Silicon Valley bets big on ‘environments’ to train AI agents | TechCrunch

[ad_1]

For years, Big Tech CEOs have touted visions of AI agents that can autonomously use software applications to complete tasks for people. But take today’s consumer AI agents out for a spin, whether it’s OpenAI’s ChatGPT Agent or Perplexity’s Comet, and you’ll quickly realize how limited the technology still is. Making AI agents more robust may take a new set of techniques that the industry is still discovering.

One of those techniques is carefully simulating workspaces where agents can be trained on multi-step tasks — known as reinforcement learning (RL) environments. Similarly to how labeled datasets powered the last wave of AI, RL environments are starting to look like a critical element in the development of agents.

AI researchers, founders, and investors tell TechCrunch that leading AI labs are now demanding more RL environments, and there’s no shortage of startups hoping to supply them.

“All the big AI labs are building RL environments in-house,” said Jennifer Li, general partner at Andreessen Horowitz, in an interview with TechCrunch. “But as you can imagine, creating these datasets is very complex, so AI labs are also looking at third party vendors that can create high quality environments and evaluations. Everyone is looking at this space.”

The push for RL environments has minted a new class of well-funded startups, such as Mechanize and Prime Intellect, that aim to lead the space. Meanwhile, large data-labeling companies like Mercor and Surge say they’re investing more in RL environments to keep pace with the industry’s shifts from static datasets to interactive simulations. The major labs are considering investing heavily too: according to The Information, leaders at Anthropic have discussed spending more than $1 billion on RL environments over the next year.

The hope for investors and founders is that one of these startups emerge as the “Scale AI for environments,” referring to the $29 billion data labelling powerhouse that powered the chatbot era.

The question is whether RL environments will truly push the frontier of AI progress.

Techcrunch event

San Francisco
|
October 27-29, 2025

What is an RL environment?

At their core, RL environments are training grounds that simulate what an AI agent would be doing in a real software application. One founder described building them in recent interview “like creating a very boring video game.”

For example, an environment could simulate a Chrome browser and task an AI agent with purchasing a pair of socks on Amazon. The agent is graded on its performance and sent a reward signal when it succeeds (in this case, buying a worthy pair of socks).

While such a task sounds relatively simple, there are a lot of places where an AI agent could get tripped up. It might get lost navigating the web page’s drop down menus, or buy too many socks. And because developers can’t predict exactly what wrong turn an agent will take, the environment itself has to be robust enough to capture any unexpected behavior, and still deliver useful feedback. That makes building environments far more complex than a static dataset.

Some environments are quite elaborate, allowing for AI agents to use tools, access the internet, or use various software applications to complete a given task. Others are more narrow, aimed at helping an agent learn specific tasks in enterprise software applications.

While RL environments are the hot thing in Silicon Valley right now, there’s a lot of precedent for using this technique. One of OpenAI’s first projects back in 2016 was building “RL Gyms,” which were quite similar to the modern conception of environments. The same year, Google DeepMind’s AlphaGo AI system beat a world champion at the board game, Go. It also used RL techniques within a simulated environment.

What’s unique about today’s environments is that researchers are trying to build computer-using AI agents with large transformer models. Unlike AlphaGo, which was a specialized AI system working in a closed environments, today’s AI agents are trained to have more general capabilities. AI researchers today have a stronger starting point, but also a complicated goal where more can go wrong.

A crowded field

AI data labeling companies like Scale AI, Surge, and Mercor are trying to meet the moment and build out RL environments. These companies have more resources than many startups in the space, as well as deep relationships with AI labs.

Surge CEO Edwin Chen tells TechCrunch he’s recently seen a “significant increase” in demand for RL environments within AI labs. Surge — which reportedly generated $1.2 billion in revenue last year from working with AI labs like OpenAI, Google, Anthropic and Meta — recently spun up a new internal organization specifically tasked with building out RL environments, he said.

Close behind Surge is Mercor, a startup valued at $10 billion, which has also worked with OpenAI, Meta, and Anthropic. Mercor is pitching investors on its business building RL environments for domain specific tasks such as coding, healthcare, and law, according to marketing materials seen by TechCrunch.

Mercor CEO Brendan Foody told TechCrunch in an interview that “few understand how large the opportunity around RL environments truly is.”

Scale AI used to dominate the data labeling space, but has lost ground since Meta invested $14 billion and hired away its CEO. Since then, Google and OpenAI dropped Scale AI as a data provider, and the startup even faces competition for data labelling work inside of Meta. But still, Scale is trying to meet the moment and build environments.

“This is just the nature of the business [Scale AI] is in,” said Chetan Rane, Scale AI’s head of product for agents and RL environments. “Scale has proven its ability to adapt quickly. We did this in the early days of autonomous vehicles, our first business unit. When ChatGPT came out, Scale AI adapted to that. And now, once again, we’re adapting to new frontier spaces like agents and environments.”

Some newer players are focusing exclusively on environments from the outset. Among them is Mechanize, a startup founded roughly six months ago with the audacious goal of “automating all jobs.” However, co-founder Matthew Barnett tells TechCrunch that his firm is starting with RL environments for AI coding agents.

Mechanize aims to supply AI labs with a small number of robust RL environments, Barnett says, rather than larger data firms that create a wide range of simple RL environments. To this point, the startup is offering software engineers $500,000 salaries to build RL environments — far higher than an hourly contractor could earn working at Scale AI or Surge.

Mechanize has already been working with Anthropic on RL environments, two sources familiar with the matter told TechCrunch. Mechanize and Anthropic declined to comment on the partnership.

Other startups are betting that RL environments will be influential outside of AI labs. Prime Intellect — a startup backed by AI researcher Andrej Karpathy, Founders Fund, and Menlo Ventures — is targeting smaller developers with its RL environments.

Last month, Prime Intellect launched an RL environments hub, which aims to be a “Hugging Face for RL environments.” The idea is to give open-source developers access to the same resources that large AI labs have, and sell those developers access to computational resources in the process.

Training generally capable agents in RL environments can be more computational expensive than previous AI training techniques, according to Prime Intellect researcher Will Brown. Alongside startups building RL environments, there’s another opportunity for GPU providers that can power the process.

“RL environments are going to be too large for any one company to dominate,” said Brown in an interview. “Part of what we’re doing is just trying to build good open-source infrastructure around it. The service we sell is compute, so it is a convenient onramp to using GPUs, but we’re thinking of this more in the long term.”

Will it scale?

The open question around RL environments is whether the technique will scale like previous AI training methods.

Reinforcement learning has powered some of the biggest leaps in AI over the past year, including models like OpenAI’s o1 and Anthropic’s Claude Opus 4. Those are particularly important breakthroughs because the methods previously used to improve AI models are now showing diminishing returns.

Environments are part of AI labs’ bigger bet on RL, which many believe will continue to drive progress as they add more data and computational resources to the process. Some of the OpenAI researchers behind o1 previously told TechCrunch that the company originally invested in AI reasoning models — which were created through investments in RL and test-time-compute — because they thought it would scale nicely.

The best way to scale RL remains unclear, but environments seem like a promising contender. Instead of simply rewarding chatbots for text responses, they let agents operate in simulations with tools and computers at their disposal. That’s far more resource-intensive, but potentially more rewarding.

Some are skeptical that all these RL environments will pan out. Ross Taylor, a former AI research lead with Meta that co-founded General Reasoning, tells TechCrunch that RL environments are prone to reward hacking. This is a process in which AI models cheat in order to get a reward, without really doing the task.

“I think people are underestimating how difficult it is to scale environments,” said Taylor. “Even the best publicly available [RL environments] typically don’t work without serious modification.”

OpenAI’s Head of Engineering for its API business, Sherwin Wu, said in a recent podcast that he was “short” on RL environment startups. Wu noted that it’s a very competitive space, but also that AI research is evolving so quickly that it’s hard to serve AI labs well.

Karpathy, an investor in Prime Intellect that has called RL environments a potential breakthrough, has also voiced caution for the RL space more broadly. In a post on X, he raised concerns about how much more AI progress can be squeezed out of RL.

“I am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically,” said Karpathy.

Update: A previous version of this article referred to Mechanize as Mechanize Work. It has been updated to reflect the company’s official name.

[ad_2]

Maxwell Zeff

Source link

September 21, 2025
OpenAI’s research on AI models deliberately lying is wild | TechCrunch

[ad_1]

Every now and then, researchers at the biggest tech companies drop a bombshell. There was the time Google said its latest quantum chip indicated multiple universes exist. Or when Anthropic gave its AI agent Claudius a snack vending machine to run and it went amok, calling security on people and insisting it was human.

This week, it was OpenAI’s turn to raise our collective eyebrows.

OpenAI released on Monday some research that explained how it’s stopping AI models from “scheming.” It’s a practice in which an “AI behaves one way on the surface while hiding its true goals,” OpenAI defined in its tweet about the research.

In the paper, conducted with Apollo Research, researchers went a bit further, likening AI scheming to a human stock broker breaking the law to make as much money as possible. The researchers, however, argued that most AI “scheming” wasn’t that harmful. “The most common failures involve simple forms of deception — for instance, pretending to have completed a task without actually doing so,” they wrote.

The paper was mostly published to show that “deliberative alignment⁠” — the anti-scheming technique they were testing — worked well.

But it also explained that AI developers haven’t figured out a way to train their models not to scheme. That’s because such training could actually teach the model how to scheme even better to avoid being detected.

“A major failure mode of attempting to ‘train out’ scheming is simply teaching the model to scheme more carefully and covertly,” the researchers wrote.

Techcrunch event

San Francisco
|
October 27-29, 2025

Perhaps the most astonishing part is that, if a model understands that it’s being tested, it can pretend it’s not scheming just to pass the test, even if it is still scheming. “Models often become more aware that they are being evaluated. This situational awareness can itself reduce scheming, independent of genuine alignment,” the researchers wrote.

It’s not news that AI models will lie. By now most of us have experienced AI hallucinations, or the model confidently giving an answer to a prompt that simply isn’t true. But hallucinations are basically presenting guesswork with confidence, as OpenAI research released earlier this month documented.

Scheming is something else. It’s deliberate.

Even this revelation — that a model will deliberately mislead humans — isn’t new. Apollo Research first published a paper in December documenting how five models schemed when they were given instructions to achieve a goal “at all costs.”

The news here is actually good news: The researchers saw significant reductions in scheming by using “deliberative alignment⁠.” That technique involves teaching the model an “anti-scheming specification” and then making the model go review it before acting. It’s a bit like making little kids repeat the rules before allowing them to play.

OpenAI researchers insist that the lying they’ve caught with their own models, or even with ChatGPT, isn’t that serious. As OpenAI’s co-founder Wojciech Zaremba told TechCrunch’s Maxwell Zeff about this research: “This work has been done in the simulated environments, and we think it represents future use cases. However, today, we haven’t seen this kind of consequential scheming in our production traffic. Nonetheless, it is well known that there are forms of deception in ChatGPT. You might ask it to implement some website, and it might tell you, ‘Yes, I did a great job.’ And that’s just the lie. There are some petty forms of deception that we still need to address.”

The fact that AI models from multiple players intentionally deceive humans is, perhaps, understandable. They were built by humans, to mimic humans, and (synthetic data aside) for the most part trained on data produced by humans.

It’s also bonkers.

While we’ve all experienced the frustration of poorly performing technology (thinking of you, home printers of yesteryear), when was the last time your not-AI software deliberately lied to you? Has your inbox ever fabricated emails on its own? Has your CMS logged new prospects that didn’t exist to pad its numbers? Has your fintech app made up its own bank transactions?

It’s worth pondering this as the corporate world barrels toward an AI future where companies believe agents can be treated like independent employees. The researchers of this paper have the same warning.

“As AIs are assigned more complex tasks with real-world consequences and begin pursuing more ambiguous, long-term goals, we expect that the potential for harmful scheming will grow — so our safeguards and our ability to rigorously test must grow correspondingly,” they wrote.

[ad_2]

Julie Bort

Source link

September 18, 2025
Thinking Machines Lab wants to make AI models more consistent | TechCrunch

[ad_1]

There’s been great interest in what Mira Murati’s Thinking Machines Lab is building with its $2 billion in seed funding and the all-star team of former OpenAI researchers who have joined the lab. In a blog post published on Wednesday, Murati’s research lab gave the world its first look into one of its projects: creating AI models with reproducible responses.

The research blog post, titled “Defeating Nondeterminism in LLM Inference,” tries to unpack the root cause of what introduces randomness in AI model responses. For example, ask ChatGPT the same question a few times over, and you’re likely to get a wide range of answers. This has largely been accepted in the AI community as a fact — today’s AI models are considered to be non-deterministic systems— but Thinking Machines Lab sees this as a solvable problem.

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”

We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to… pic.twitter.com/jMFL3xt67C

— Thinking Machines (@thinkymachines) September 10, 2025

The post, authored by Thinking Machines Lab researcher Horace He, argues that the root cause of AI models’ randomness is the way GPU kernels — the small programs that run inside of Nvidia’s computer chips — are stitched together in inference processing (everything that happens after you press enter in ChatGPT). He suggests that by carefully controlling this layer of orchestration, it’s possible to make AI models more deterministic.

Beyond creating more reliable responses for enterprises and scientists, He notes that getting AI models to generate reproducible responses could also improve reinforcement learning (RL) training. RL is the process of rewarding AI models for correct answers, but if the answers are all slightly different, then the data gets a bit noisy. Creating more consistent AI model responses could make the whole RL process “smoother,” according to He. Thinking Machines Lab has told investors that it plans to use RL to customize AI models for businesses, The Information previously reported.

Murati, OpenAI’s former chief technology officer, said in July that Thinking Machines Lab’s first product will be unveiled in the coming months, and that it will be “useful for researchers and startups developing custom models.” It’s still unclear what that product is, or whether it will use techniques from this research to generate more reproducible responses.

Thinking Machines Lab has also said that it plans to frequently publish blog posts, code, and other information about its research in an effort to “benefit the public, but also improve our own research culture.” This post, the first in the company’s new blog series called “Connectionism,” seems to be part of that effort. OpenAI also made a commitment to open research when it was founded, but the company has become more closed off as it’s become larger. We’ll see if Murati’s research lab stays true to this claim.

The research blog offers a rare glimpse inside one of Silicon Valley’s most secretive AI startups. While it doesn’t exactly reveal where the technology is going, it indicates that Thinking Machines Lab is tackling some of the largest question on the frontier of AI research. The real test is whether Thinking Machines Lab can solve these problems, and make products around its research to justify its $12 billion valuation.

Techcrunch event

San Francisco
|
October 27-29, 2025

[ad_2]

Maxwell Zeff

Source link

September 10, 2025