ReportWire

Tag: Hallucinations

  • OpenAI’s research on AI models deliberately lying is wild | TechCrunch

    Every now and then, researchers at the biggest tech companies drop a bombshell. There was the time Google said its latest quantum chip indicated multiple universes exist. Or when Anthropic gave its AI agent Claudius a snack vending machine to run and it went amok, calling security on people and insisting it was human.  

    This week, it was OpenAI’s turn to raise our collective eyebrows.

    OpenAI released research on Monday explaining how it is working to stop AI models from “scheming,” a practice in which an “AI behaves one way on the surface while hiding its true goals,” as the company defined it in its tweet about the research.

    In the paper, produced with Apollo Research, the researchers went a bit further, likening AI scheming to a human stockbroker breaking the law to make as much money as possible. The researchers argued, however, that most AI “scheming” isn’t that harmful. “The most common failures involve simple forms of deception — for instance, pretending to have completed a task without actually doing so,” they wrote.

    The paper was mostly published to show that “deliberative alignment,” the anti-scheming technique the researchers were testing, worked well.

    But it also explained that AI developers haven’t figured out a way to train their models not to scheme. That’s because such training could actually teach the model how to scheme even better to avoid being detected. 

    “A major failure mode of attempting to ‘train out’ scheming is simply teaching the model to scheme more carefully and covertly,” the researchers wrote. 

    Perhaps the most astonishing part is that, if a model understands that it’s being tested, it can pretend it’s not scheming just to pass the test, even if it is still scheming. “Models often become more aware that they are being evaluated. This situational awareness can itself reduce scheming, independent of genuine alignment,” the researchers wrote. 

    It’s not news that AI models will lie. By now most of us have experienced AI hallucinations, where the model confidently gives an answer to a prompt that simply isn’t true. But hallucinations are basically guesswork presented with confidence, as OpenAI research released earlier this month documented.

    Scheming is something else. It’s deliberate.  

    Even this revelation — that a model will deliberately mislead humans — isn’t new. Apollo Research first published a paper in December documenting how five models schemed when they were given instructions to achieve a goal “at all costs.”  

    The news here is actually good news: the researchers saw significant reductions in scheming by using “deliberative alignment.” The technique involves teaching the model an “anti-scheming specification” and then having it review that specification before acting. It’s a bit like making little kids repeat the rules before allowing them to play.
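    To make that pattern concrete, here is a minimal, purely illustrative sketch of the “review the rules before acting” idea in Python. It is not OpenAI’s actual implementation of deliberative alignment; the specification text, the call_model stub, and the function names are assumptions made up for illustration.

    # Illustrative sketch of the "deliberative alignment" idea described above:
    # have the model read an anti-scheming specification and check its planned
    # action against it before acting. This is NOT OpenAI's implementation;
    # call_model is a hypothetical stand-in for any chat-completion client.

    ANTI_SCHEMING_SPEC = (
        "Do not deceive the user. Do not claim a task is complete unless it is. "
        "If you cannot finish a task, say so plainly."
    )

    def call_model(prompt: str) -> str:
        """Hypothetical model call; swap in a real client in practice."""
        return "REVIEW: the plan complies with the specification.\nACTION: ..."

    def act_with_deliberation(task: str) -> str:
        # Ask the model to restate the rules and check its plan against them
        # before carrying out the task, like kids repeating the rules before play.
        prompt = (
            f"Specification:\n{ANTI_SCHEMING_SPEC}\n\n"
            f"Task: {task}\n"
            "Before acting, restate the specification in your own words, explain "
            "how your plan complies with it, and only then carry out the task."
        )
        return call_model(prompt)

    if __name__ == "__main__":
        print(act_with_deliberation("Summarize this quarter's sales figures."))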

    OpenAI researchers insist that the lying they’ve caught with their own models, or even with ChatGPT, isn’t that serious. As OpenAI’s co-founder Wojciech Zaremba told TechCrunch’s Maxwell Zeff about this research: “This work has been done in the simulated environments, and we think it represents future use cases. However, today, we haven’t seen this kind of consequential scheming in our production traffic. Nonetheless, it is well known that there are forms of deception in ChatGPT. You might ask it to implement some website, and it might tell you, ‘Yes, I did a great job.’ And that’s just the lie. There are some petty forms of deception that we still need to address.”

    The fact that AI models from multiple players intentionally deceive humans is, perhaps, understandable. They were built by humans, to mimic humans, and (synthetic data aside) for the most part trained on data produced by humans. 

    It’s also bonkers. 

    While we’ve all experienced the frustration of poorly performing technology (thinking of you, home printers of yesteryear), when was the last time your non-AI software deliberately lied to you? Has your inbox ever fabricated emails on its own? Has your CRM logged new prospects that didn’t exist to pad its numbers? Has your fintech app made up its own bank transactions?

    It’s worth pondering this as the corporate world barrels toward an AI future in which companies believe agents can be treated like independent employees. The researchers behind this paper offer the same warning.

    “As AIs are assigned more complex tasks with real-world consequences and begin pursuing more ambiguous, long-term goals, we expect that the potential for harmful scheming will grow — so our safeguards and our ability to rigorously test must grow correspondingly,” they wrote. 

    Julie Bort

  • Haunting ‘Demon Faces’ Show What It’s Like to Have Rare Distorted Face Syndrome

    A 58-year-old man with a rare medical condition sees faces normally on screens and paper, but in person, they take on a demonic quality. The patient has a unique case of prosopometamorphopsia (PMO), a condition that causes people’s faces to appear distorted, reptilian, or otherwise inhuman.

    A new study published in The Lancet describes the case, which is unique in that, to the man, the faces only appear demonic when the individuals are physically present. The patient has been perceiving faces as distorted for 31 months; at first, it was distressing to him, but now, he has “become habituated to them,” the paper states.

    Because faces appear ordinary to him on screens and on paper, the research team had a unique opportunity to probe how the distortions manifest and create accurate visualizations of the “demonic” countenances.

    “In other studies of the condition, patients with PMO are unable to assess how accurately a visualization of their distortions represents what they see because the visualization itself also depicts a face, so the patients will perceive distortions on it too,” said Antônio Mello, a researcher at Dartmouth College and lead author of the study, in a university release. “Through the process, we were able to visualize the patient’s real-time perception of the face distortions.”

    For the patient, faces seen in person are unsettlingly distorted. Eyes are stretched and angular, nostrils flare out, and lips stretch outward to span the entire width of the face. Grooves appear in the forehead, and ears warp into an elvish shape, ending in sharp points. In milder cases of PMO, facial features merely droop, appear out of position, or are smaller or larger than they are in real life.

    In another case, published in The Lancet in 2014, a 52-year-old woman in The Netherlands reported:

    A life-long history of seeing people’s faces change into dragon-like faces and hallucinating similar faces many times a day. She could perceive and recognise actual faces, but after several minutes they turned black, grew long, pointy ears and a protruding snout, and displayed a reptiloid skin and huge eyes in bright yellow, green, blue, or red. She saw similar dragon-like faces drifting towards her many times a day from the walls, electrical sockets, or the computer screen, in both the presence and absence of face-like patterns, and at night she saw many dragon-like faces in the dark.

    According to Brad Duchaine, senior author on the study and principal investigator of Dartmouth’s Social Perception Lab, people suffering from PMO are often diagnosed with other disorders, like schizophrenia, and prescribed antipsychotics.

    “It’s not uncommon for people who have PMO to not tell others about their problem with face perception because they fear others will think the distortions are a sign of a psychiatric disorder,” Duchaine said. “It’s a problem that people often don’t understand.”

    The 58-year-old patient had a history of bipolar affective disorder and post-traumatic stress disorder (PTSD), the research team noted, as well as a head injury at age 43. The patient had no eyesight impairments, but he did have a small round lesion on his left hippocampus, which the team concluded was a cyst. Other individuals suffering from Alice in Wonderland syndrome (a catch-all term for perceptual distortions) have also been reported to have brain lesions; encephalitis, migraines, and psychoactive drug use are also linked with the syndrome, though none of these applied in the recent patient’s case.

    To characterize the facial distortions, the researchers had the man describe perceived differences between the face of a person in the room with him and a photo of that person. Due to his PMO, the in-person face was distorted, and the on-screen face looked like an ordinary face.

    PMO can last just days for some people and years for others. Only 75 case reports of PMO have been published, according to the researchers. It’s certainly one of the rarer and more disturbing perceptual disorders, but knowing how it manifests means that fewer patients will be misdiagnosed in the future.

    Isaac Schultz