A.I. pioneer Fei-Fei Li is lending her support to Simile’s effort to simulate human behavior at scale. John Nacion/Variety via Getty Images
Every three months, public companies brace for analyst questions during quarterly earnings calls. But what if firms could predict these queries in advance and rehearse their responses? That’s one of the capabilities touted by Simile, a new A.I. startup spun out of Stanford and backed by acclaimed researcher Fei-Fei Li and OpenAI co-founder Andrej Karpathy.
Simile emerged from stealth yesterday (Feb. 12) with $100 million in funding from a round led by Index Ventures. Alongside Li and Karpathy, the startup—which hasn’t disclosed its valuation—also counts investors including Quora co-founder Adam D’Angelo and Scott Belsky, a partner at A24 Films.
Li and Karpathy both have close ties to Simile’s founding team, which includes Stanford researchers Joon Park, Percy Liang and Michael Bernstein. Li is the co-director of Stanford’s Human-Centered A.I. Institute and advised Karpathy during his Ph.D. studies at the university. She is widely known for foundational work such as ImageNet, a large-scale image database that helped drive major breakthroughs in computer vision. Karpathy and Bernstein also contributed to that project.
Simile’s mission of using A.I. to reflect and model societal behavior taps into an underexplored research area, according to Karpathy, who previously worked at OpenAI and Tesla before launching his own education-focused A.I. startup. While large language models typically present a single, cohesive personality, Karpathy argues they are actually trained on data drawn from vast numbers of people. “Why not lean into that statistical power: Why simulate one ‘person’ when you could try to simulate a population?” he wrote in a post on X.
That idea underpins Simile’s broader goal. The Palo Alto-based startup aims to simulate the real-world effects of major decisions, from public policy to product launches, across virtual populations that mirror human behavior. The team has already tested this concept on a smaller scale through projects like Smallville, a 2023 Stanford experiment in which 25 autonomous A.I. agents interacted in a virtual environment.
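To make the population-versus-person idea concrete, here is a toy sketch in Python. It is purely illustrative and does not reflect Simile’s actual system; ask_persona is a hypothetical stand-in for a language-model call conditioned on a persona description.

```python
# Toy illustration of simulating a population rather than a single "person."
# This is NOT Simile's system; ask_persona() is a hypothetical stand-in for
# an LLM call prompted with a persona description.
from collections import Counter

def ask_persona(persona: str, question: str) -> str:
    # Deterministic dummy answer so the sketch runs end to end; a real
    # system would generate this with a model role-playing the persona.
    return "yes" if len(persona + question) % 2 == 0 else "no"

# In practice these profiles would come from real survey participants.
personas = [
    "retired teacher in Ohio, fiscally cautious",
    "28-year-old software engineer in Austin",
    "small-business owner in rural Georgia",
]

question = "Would you switch pharmacies for same-day prescription delivery?"

# Poll every simulated respondent and aggregate, like a digital focus group.
tally = Counter(ask_persona(p, question) for p in personas)
print(tally.most_common())
```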
Now, Simile is scaling the approach for business use. After spending the past seven months developing its model, the company is already working with clients on applications ranging from product development to litigation forecasting. CVS Health Corporation, for example, uses Simile to create simulated focus groups, while Gallup uses the platform to build digital polling panels. For earnings calls, Simile can predict about 80 percent of the questions that analysts ultimately ask, said Park, the startup’s CEO, during a recent appearance on TBPN.
At present, Simile’s models are based on data from hundreds of thousands of people who have signed up for its studies. Over time, the company hopes to expand that to simulations representing the world’s entire population of roughly 8 billion people.
Simile joins a growing wave of A.I. companies focused on using simulation to model real-world scenarios. Much of the existing research in this space has centered on physical systems, such as robotics and autonomous vehicles, through “world model” platforms developed by firms like Google and Nvidia.
From infrastructure battles to physical-world intelligence, A.I.’s next chapter is already taking shape. Unsplash
In November, ChatGPT turned three, with a global user base rapidly approaching one billion. At this point, A.I. is no longer an esoteric acronym that needs explaining in news stories. It has become a daily utility, woven into how we work, learn, shop and even love. The field is also far more crowded than it was just a few years ago, with competitors emerging at every layer of the stack.
Over the past year, conversation around A.I. has taken on a more complicated tone. Some argue that consumer chatbots are nearing a plateau. Others warn that startup valuations are inflating into a bubble. And, as always, there’s the persistent anxiety that A.I. may one day outgrow human control altogether.
So what comes next? Much of the industry’s energy is now focused on the infrastructure side of A.I. Big Tech companies are racing to solve the hardware bottlenecks that limit today’s systems, while startups experiment with applications far beyond chatbots. At the same time, researchers are beginning to look past language models altogether, toward models that can reason about the physical world.
Below are the key themes Observer has identified over the past year of covering this space. Many of these developments are still unfolding and are likely to shape the field well into 2026 and beyond.
A.I. chips
Even as OpenAI faces growing competition at the model level, its primary chip supplier, Nvidia, remains in a league of its own. Demand for its GPUs continues to outstrip supply, and no rival has yet meaningfully disrupted its dominance. Traditional semiconductor companies such as AMD and Intel are racing to claw back market share, while some of Nvidia’s largest customers are designing their own chips to reduce dependence on a single supplier.
World models

To borrow from philosopher Ludwig Wittgenstein, the limits of language are the limits of our world. Today’s A.I. systems have grown remarkably fluent in human language—especially English—but language captures only a narrow slice of intelligence. That limitation has prompted some researchers to argue that large language models alone can never reach human-level understanding.

That belief is fueling a push toward so-called “world models,” which aim to teach machines how the physical world works—how objects move, how space is structured, and how cause and effect unfold. Yann LeCun, Meta’s longtime chief A.I. scientist and a prominent skeptic of the language-only approach, is now leaving the company to build such a system himself. Fei-Fei Li’s startup, World Labs, unveiled its first model in November after nearly two years of development. Google DeepMind has released early versions through its Genie projects, and Nvidia is betting heavily on physical A.I. with its Cosmos models.
Language-specific A.I.
While pioneering researchers look beyond language, linguistic barriers remain one of A.I.’s most practical challenges. More than half of the internet’s content is written in English, skewing training data and limiting performance in other languages.
A.I. hardware

It’s only natural that A.I. now has a consumer hardware angle. This year brought a wave of experiments in wearable A.I.—some met with curiosity, others with discomfort.
Friend, a startup selling an A.I. pendant, sparked backlash after a New York City subway campaign framed its product as a substitute for human companionship. In December, Meta acquired Limitless, the maker of a $99 wearable that records and summarizes conversations. Earlier in the year, Amazon bought Bee, which produces a $50 bracelet designed to transcribe daily activity and generate summaries.
Meta is also developing a new line of smart glasses with EssilorLuxottica, the company behind Ray-Ban and Oakley. In July, Mark Zuckerberg went so far as to suggest that people without A.I.-enhanced glasses could eventually face a “significant cognitive disadvantage.” Meanwhile, OpenAI is quietly collaborating with former Apple design chief Jony Ive on a mysterious hardware project of its own. This all suggests the next phase of A.I. may be something we wear, not just something we type into.
World Labs, the startup founded by AI pioneer Fei-Fei Li, is launching its first commercial world model product. Marble is now available via freemium and paid tiers that let users turn text prompts, photos, videos, 3D layouts or panoramas into editable, downloadable 3D environments.
The launch of the generative world model, first released in limited beta preview two months ago, comes a little over a year after World Labs came out of stealth with $230 million in funding, and puts the startup ahead of competitors building world models. World models are AI systems that generate an internal representation of an environment, and can be used to predict future outcomes and plan actions.
Startups like Decart and Odyssey have released free demos, and Google’s Genie is still in limited research preview. Marble differs from these — and even World Labs’s own real-time model, RTFM — because it creates persistent, downloadable 3D environments rather than generating worlds on the fly as you explore. This, the company says, results in less morphing or inconsistency, and lets users export worlds as Gaussian splats, meshes or videos.
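To illustrate what a downloadable world can mean for a downstream pipeline, here is a minimal sketch using the open-source trimesh library. The file name marble_world.glb is a hypothetical export for illustration, not a documented Marble output.

```python
# Minimal sketch: inspecting a hypothetical exported world as a 3D mesh.
# Assumes an export file named "marble_world.glb"; trimesh is a real
# open-source Python library (pip install trimesh).
import trimesh

# Binary glTF (.glb) is a common container format for 3D scenes.
scene = trimesh.load("marble_world.glb")

# A scene can hold several named meshes; print basic geometry stats.
for name, geometry in scene.geometry.items():
    print(f"{name}: {len(geometry.vertices):,} vertices, {len(geometry.faces):,} faces")

# The axis-aligned bounding box gives a rough sense of spatial extent.
print("scene bounds:", scene.bounds)
```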
Marble is also the first model of its kind to offer AI-native editing tools and a hybrid 3D editor that lets users block out spatial structures before AI fills in the visual details.
Image Credits: World Labs
“This is a brand new category of model that’s generating 3D worlds, and this is something that’s going to get better over time. It’s something we’ve already improved quite a lot,” Justin Johnson, co-founder of World Labs, told TechCrunch.
Last December, World Labs showed how its early models could generate interactive 3D scenes based on a single image. While impressive, the somewhat cartoonish scenes weren’t fully explorable since movements were limited to a small area, and there were occasional rendering errors.
In my trial of the beta preview, I found Marble generated impressive worlds from image prompts alone — from game-like environments to photorealistic versions of my living room. Scenes morphed at the edges, though that’s apparently been improved in today’s launch. That said, a world I’d generated in the beta using a single prompt looked better and matched my intent more closely than what the same prompt produces now.
I haven’t yet tested the editing features, though Johnson says they make Marble practical for near-term gaming, VFX and virtual reality (VR) projects.
“One of our main themes for Marble going forward is creative control,” Johnson said. “There should always be a quick pathway to generate something, but you should be able to dive even deeper and get a lot of control over the things that you’re generating. You don’t want the machine to just take the wheel and pull all that creativity away from you.”
Marble’s input-to-output pipeline. Image Credits: World Labs
Marble’s take on creative control starts with input flexibility. The beta only accepted single images, forcing the model to invent unseen details for a 360-degree view. With the full launch, users can now upload multiple images or short clips to show a space from different angles and have the model generate fairly realistic digital twins.
Then we have Chisel, an experimental 3D editor that lets users block out coarse spatial layouts (think walls, boxes, or planes) and then add text prompts to guide the visual style. Marble generates the world, decoupling structure from style — similar to how HTML provides the structure of a website and CSS adds in color. Unlike text-based editing, Chisel lets you directly manipulate objects.
Marble’s Chisel feature decouples structure from style. Image Credits: World Labs
“I can just go in there and grab the 3D block that represents the couch and move it somewhere else,” Johnson said.
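As a thought experiment, that structure/style split might look something like the following sketch. The data types are invented for illustration; World Labs has not published Chisel’s actual format.

```python
# Hypothetical sketch of Chisel's structure-vs-style decoupling. These types
# are invented for illustration; they are not Chisel's real data format.
from dataclasses import dataclass

@dataclass
class Block:
    # A coarse placeholder object: a labeled box in space.
    label: str
    position: tuple[float, float, float]
    size: tuple[float, float, float]

# Structure: where things are (the "HTML" of the scene).
layout = [
    Block("wall", position=(0.0, 0.0, 0.0), size=(6.0, 3.0, 0.2)),
    Block("couch", position=(2.0, 0.0, 1.5), size=(2.0, 0.9, 1.0)),
]

# Style: what things look like (the "CSS"), supplied as a text prompt.
style_prompt = "a cozy mid-century living room at golden hour"

# Moving the couch edits the structure without touching the style prompt,
# the kind of direct manipulation Johnson describes above.
layout[1].position = (0.5, 0.0, 3.0)
```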
Another new feature that gives you more editing control is the ability to expand a world.
“Once you generate a world, you can expand it up to once,” Johnson said. “When you move to a piece of the world that’s starting to break apart, you can basically tell the model to expand there or generate more world in the vicinity of where you currently are, and then it can add more detail in that region.”
Users who want to create extremely large spaces can combine multiple worlds with “composer mode.” Johnson demonstrated this for me with two worlds he had already built: a room made of cheese with grape chairs, and a futuristic meeting room in space.
The path to spatial intelligence
Spaceship environment created in Marble with the text prompt overlaid. Note how the lights are realistically reflected in the hub’s walls. Image Credits: World Labs/TechCrunch
Marble is available via four subscription tiers:

- Free: four generations from text, image or panorama prompts
- Standard ($20/month): 12 generations, plus multi-image/video input and advanced editing
- Pro ($35/month): 25 generations, plus scene expansion and commercial rights
- Max ($95/month): 75 generations and all features
Johnson thinks the initial use cases for Marble will be gaming, visual effects for film, and virtual reality.
Game developers have mixed feelings about the tech. A recent Game Developers Conference survey found that a third of respondents believed generative AI has a negative impact on the games industry – 12 percentage points more than the survey found a year earlier. Intellectual property theft, energy consumption and a decrease in quality from AI-generated content were among the top concerns aired. And last year, a Wired investigation found game studios like Activision Blizzard are using AI to cut corners and combat attrition.
In gaming, Johnson sees developers using Marble to generate background environments and ambient spaces and then importing those assets into game engines like Unity or Unreal Engine to add interactive elements, logic and code.
“It’s not designed to replace the entire existing pipeline for gaming, but to just give you assets that you can drop into that pipeline,” he said.
For VFX work, Marble sidesteps the inconsistency and poor camera control that plague AI video generators, per Johnson. Its 3D assets let artists stage scenes and control camera movements with frame-perfect precision, he said.
While Johnson said World Labs isn’t focusing on VR applications right now, he noted the industry is “starved for content” and excited about the launch. Marble is already compatible with the Vision Pro and Quest 3 headsets, and every generated world can be viewed in VR today.
Marble may also have potential use cases for robotics. Johnson noted that unlike image and video generation, robotics doesn’t have the benefit of a large repository of training data. But with generators like Marble, it becomes easier to simulate training environments.
According to a recent manifesto by Fei-Fei Li, CEO and co-founder of World Labs, Marble represents the first step towards creating “a truly spatially intelligent world model.”
Li believes “the next generation of world models will enable machines to achieve spatial intelligence on an entirely new level.” If large language models can teach machines to read and write, Li hopes systems like Marble can teach them to see and build. She says the ability to understand how things exist and interact in three-dimensional spaces can eventually help machines make breakthroughs beyond gaming and robotics, and even into science and medicine.
“Our dreams of truly intelligent machines will not be complete without spatial intelligence,” Li wrote.
Soyoung Lee, co-founder and head of GTM at Twelve Labs, pictured at Web Summit Vancouver 2025. Photo by Vaughn Ridley/Web Summit via Sportsfile via Getty Images
Sure, the score of a football game is important. But sporting events can also foster cultural moments that slip under the radar—such as Travis Kelce flashing a heart sign to Taylor Swift in the stands. While such footage could be social-media gold, it’s easily missed by traditional content tagging systems. That’s where Twelve Labs comes in.
“Every sports team or sports league has decades of footage that they’ve captured in-game, around the stadium, about players,” Soyoung Lee, co-founder and head of GTM at Twelve Labs, told Observer. However, these archives are often underutilized due to inconsistent and outdated content management. “To date, most of the processes for tagging content have been manual.”
Twelve Labs, a San Francisco-based startup specializing in video-understanding A.I., wants to unlock the value of video content by offering models that can search vast archives, generate text summaries and create short-form clips from long-form footage. Its work extends far beyond sports, touching industries from entertainment and advertising to security.
“Large language models can read and write really well,” said Lee. “But we want to move on to create a world in which A.I. can also see.”
Is Twelve Labs related to ElevenLabs?
Founded in 2021, Twelve Labs isn’t to be confused with ElevenLabs, an A.I. startup that specializes in audio. “We started a year earlier,” Lee joked, adding that Twelve Labs—which named itself after the initial size of its founding team—often partners with ElevenLabs for hackathons, including one dubbed “23Labs.”
The startup’s ambitious vision has drawn interest from deep-pocketed backers. It has raised more than $100 million from investors such as Nvidia, Intel, and Firstman Studio, the studio of Squid Game creator Hwang Dong-hyuk. Its advisory bench is equally star-studded, featuring Fei-Fei Li, Jeffrey Katzenberg and Alexandr Wang.
Twelve Labs counts thousands of developers and hundreds of enterprise customers among its users. Demand is highest in entertainment and media, spanning Hollywood studios, sports leagues, social media influencers and advertising firms that rely on Twelve Labs tools to automate clip generation, assist with scene selection or enable contextual ad placements.
Government agencies also use the startup’s technology for video search and event retrieval. Beyond its work with the U.S. and other nations, Lee said that Twelve Labs has a deployment in South Korea’s Sejong City to help CCTV operators monitor thousands of camera feeds and locate specific incidents. To reduce security risks, the company has removed capabilities for facial and biometric recognition, she added.
Will video-native A.I. come for human jobs?
Many of the industries Twelve Labs serves are already debating whether A.I. threatens human jobs—a concern Lee argues is only partly warranted. “I don’t know if jobs will be lost, per se, but jobs will have to transition,” she said, comparing the shift to how tools like Photoshop reshaped creative roles.
If anything, Lee believes systems like Twelve Labs’ will democratize creative work traditionally limited to companies with big budgets. “You are now able to do things with less, which means you have more stories that can be created from independent creatives who do not have that same capital,” she said. “It actually allows for the scaling of content creation and personalizing distribution.”
Twelve Labs is not the only A.I. player eyeing video, but the company insists it serves a different need than its much larger competitors. “We’re excited that video is now starting to get more attention, but the way we’re seeing it is a lot of innovation in large language models, a lot of innovation in video generation models and image generation models like Sora—but not in video understanding,” said Lee, referencing OpenAI’s text-to-video A.I. model and app.
For now, Twelve Labs offers video search, video analysis and video-to-text capabilities. The company plans to expand into agentic platforms that can not only understand video but also build narratives from it. Such models could be useful beyond creative fields, Lee said, pointing to examples like retailers identifying peak foot-traffic hours or security clients mapping the sequence of events surrounding an accident.
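As an illustration of what such video search could look like programmatically, consider the hypothetical sketch below. The endpoint, parameters and response fields are invented for illustration and are not Twelve Labs’ actual API.

```python
# Hypothetical sketch of natural-language video search. The endpoint and
# response shape are invented for illustration; NOT Twelve Labs' real API.
import requests

API_URL = "https://api.example.com/v1/search"  # placeholder endpoint
API_KEY = "your-api-key"                       # placeholder credential

# Query an indexed archive in plain language, as with the Kelce moment above.
resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "index_id": "sports-archive",
        "query": "player flashes a heart sign to a fan in the stands",
        "top_k": 5,
    },
    timeout=30,
)
resp.raise_for_status()

# Hits that point back into long-form footage with time ranges are what
# make automated clip generation from decades of archives possible.
for hit in resp.json().get("results", []):
    print(hit["video_id"], hit["start_sec"], hit["end_sec"], hit["score"])
```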
While A.I. might help a Hollywood director assemble a movie, Lee believes it won’t ever be the director. Even if the technology can provide narrative options, humans still decide which story is most compelling, identify gaps and supply the footage. “At the end of the day, I think there’s nothing that can replace human creative intent.”