  1. OpenAI has trained its LLM to confess to bad behavior

    3 days ago · Large language models often lie and cheat. We can’t stop that, but we can make them own up.

  2. OpenAI is training models to 'confess' when they lie - what ...

    1 day ago · OpenAI is training models to 'confess' when they lie - what it means for future AI. A new study made a version of GPT-5 Thinking admit its own misbehavior.

  3. OpenAI prompts AI models to ‘confess’ when they cheat

    1 day ago · OpenAI’s research team has trained its GPT-5 large language model to “confess” when it doesn’t follow instructions, providing a second output after its main answer that reports when the ...

  4. 3 days ago · To demonstrate the viability of our approach, we train GPT-5-Thinking to produce confessions, and we evaluate its honesty in out-of-distribution scenarios measuring hallucination, …

  5. OpenAI has trained its LLM to admit to bad behavior

    4 days ago · To test their idea, Barak and his colleagues trained OpenAI’s GPT-5-Thinking, the company’s flagship reasoning model, to produce confessions. When they set up the model to fail, by …

  6. OpenAI AI Confessions Train Models to Admit Mistakes

    4 days ago · OpenAI explains that confessions are effective because they separate objectives entirely. While the main answer optimizes for multiple factors, the confession is trained solely on honesty. The …

  7. OpenAI is teaching AI models to 'confess' when they ...

    OpenAI has introduced a new research method called “confessions,” which trains AI models to self-report when they take shortcuts or break instructions. Here’s how it works.
