
To the Editor: I’d like to point out a misleading statement in this article about the sources of AI model hallucinations.

  • Title: Top AI assistants misrepresent news content, study finds (CBC News)
  • Link: https://www.cbc.ca/news/world/ai-assistants-news-misrepresented-study-9.6947735
  • Date: 2025-10-27

The article states (emphasis my own):

OpenAI and Microsoft have previously said hallucinations — when an AI model generates incorrect or misleading information, *often due to factors such as insufficient data* — are an issue that they’re seeking to resolve.

This is misleading: no amount of data will lead to AI models, in their current popular form, that never generate content we would deem incorrect or misleading.

Knowing Your Limits

Generative AI models are incredibly good at filling in missing information, even creating seemingly new content in response to their prompts. But generative AI models have no relation to truth itself. These models do not “know” what is true or false, or how to discern it; they do not attempt to mislead, nor to be disingenuous. They merely generate new outputs from their inputs in a way that statistically mimics the data they were trained on. Even when you literally ask these models to opine on the truth of a statement, they succeed only insofar as their training data contains correct answers about the truth you are seeking. In some cases this can be quite impressive, as when the answer can be filled in like a missing point on a curve between other known concepts. But that is very different from determining whether the statement is actually true or false.
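
To make this concrete, here is a minimal Python sketch of what “opining on truth” amounts to for such a model. The prompt, the candidate tokens, and the scores are all made up for illustration; the point is that the computation is purely a statistical ranking of continuations, with no step anywhere that consults facts.

```python
import math

# Toy sketch (made-up numbers): how a language model "opines" on truth.
# Given a prompt like "Is the capital of Australia Sydney? Answer:", the
# model produces a score (logit) for each candidate next token. These
# scores come entirely from patterns in the training data; nothing below
# checks whether any answer is actually true.
logits = {" Yes": 1.3, " No": 2.1, " Maybe": -0.4}

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores.values())
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = softmax(logits)
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok!r}: {p:.2f}")
# The "answer" is whichever token gets sampled or ranked highest.
# It is correct only if the training data happened to make the true
# continuation the statistically likely one.
```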

Saying these models “statistically mimic” their training data means they are inherently never perfect, but always a bit random. So even if the training data consisted only of perfectly “true” text, the model could still output falsehoods. The companies and academics researching AI are working hard to minimize these challenges, but they cannot be removed entirely; this is in the nature of the current popular generative AI models, which are based on neural networks. I have written about this in the past in relation to “hallucination”.
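
A toy simulation illustrates the point. The probabilities below are invented, but any model that smooths over its training data rather than memorizing it will place some nonzero probability on false continuations, and sampled generations will occasionally contain them.

```python
import random

# Minimal simulation (made-up numbers): even a model trained only on true
# statements assigns some nonzero probability to a plausible-sounding false
# continuation of "The capital of Canada is ...". Sample enough generations
# and the falsehood eventually comes out.
next_token_probs = {
    "Ottawa": 0.97,    # true continuation
    "Toronto": 0.02,   # plausible-sounding falsehood
    "Vancouver": 0.01, # another falsehood
}

random.seed(0)
tokens, weights = zip(*next_token_probs.items())
samples = random.choices(tokens, weights=weights, k=1000)

false_count = sum(1 for t in samples if t != "Ottawa")
print(f"False continuations in 1000 samples: {false_count}")
# Lowering the sampling temperature or always taking the top token reduces
# this, but as long as any probability mass sits on wrong continuations,
# some generations will be wrong.
```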

Honest people working in that industry will sometimes even admit this. As Andrej Karpathy, a foundational AI guru at Tesla and OpenAI, put it best in 2023:

“I always struggle a bit when I’m asked about the ‘hallucination problem’ in LLMs. Because, in some sense, hallucination is all LLMs do. They are dream machines.” – Andrej Karpathy @karpathy on Twitter Dec 8, 2023

A Suggestion

All this is merely to say that it would be better if the CBC, and other responsible news organizations, were more careful not to ascribe the hallucinations or errors of AI systems to some known, fixable problem when they are in fact a challenging open research topic.

Otherwise, I worry our society will continue sleep-walking forward to the marketing tunes of a few corporations claiming that safe and responsible use of these systems in all walks of life is well on its way to being perfected, which is far from the truth.