The mystery of artificial minds
On intuition, consciousness and the billion-parameter mystery we can't crack
Last week, I asked ChatGPT what it had learned about me from our conversations. Its response stopped me cold. It wasn't just regurgitating patterns from our chat history; shockingly, it felt introspective. It seemed to synthesise scattered fragments into insights about my thinking style that I hadn't explicitly shared.
For a moment, I wondered: does this AI have intuition?
That question led me down a rabbit hole into the murky world of AI interpretability research, where I discovered something unsettling. We're building minds we can't understand, deploying intelligence we can't interpret, and exploring consciousness questions we can't answer.
After diving deep into the latest research, I'm realising how little we actually know about the intelligence we're creating.
When AI seemed to know me better than I know myself
Are LLMs mirrors of human intelligence, statistical summaries of collective knowledge, or developing something genuinely new? Recent research shows LLMs can produce hundreds of "emergent" abilities and even outperform human experts in predicting neuroscience results. But we don't know if this represents novel insight or sophisticated pattern matching of existing human knowledge.
My ChatGPT interaction felt like the former; a moment of genuine understanding. But how would I even know the difference? We simply don't know what we're looking at.
This uncertainty becomes even more troubling when we realise that our primary tool for understanding these systems might be fundamentally broken.
The billion-parameter mystery we can't crack
We're trying to reverse-engineer AI systems with billions of parameters, but despite years of effort, mechanistic interpretability has struggled to provide comprehensive insights. Mechanistic interpretability (sometimes described as studying the "biology" of LLMs) is essentially an attempt to understand how AI systems work by examining their internal components; like opening up a brain to see which neurons fire when. Think of it as reverse-engineering a computer program to understand what each line of code actually does.
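To make that concrete, here is a minimal sketch of the basic move behind this kind of work: recording a model's internal activations so you can ask which units respond to which inputs. The toy network and variable names are my own illustration, assuming PyTorch; real interpretability research does this across billions of parameters, which is exactly where it struggles.

```python
# Minimal sketch: capture a hidden layer's activations with a forward hook,
# the "which neurons fire when" step of mechanistic interpretability.
# Toy two-layer network, not a real LLM.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
captured = {}

def save_activation(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()  # store what this layer produced
    return hook

# Attach the hook to the hidden ReLU so every forward pass records its output.
model[1].register_forward_hook(save_activation("hidden_relu"))

x = torch.randn(1, 8)            # stand-in for an input prompt
model(x)
print(captured["hidden_relu"])   # the internal state researchers then try to interpret
```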
Here's the uncomfortable truth: interpretability research lacks established metrics, making qualitative results crucial. The challenge of understanding AI connects to numerous other research areas, from how we compress information and detect hidden vulnerabilities in systems, to how we build AI that can learn continuously without forgetting old knowledge. These widespread connections suggest we might need entirely different frameworks for understanding AI.
If we can't fully interpret human minds, why do we assume we can interpret artificial ones?
This leads to an even deeper question about the nature of understanding itself; one that contemplative traditions have grappled with for millennia.
When knowing isn't understanding
Human intuition, deeply rooted in embodied experience and cultural context, presents fundamental challenges for replication in artificial intelligence systems. Intuition is when you "just know" something without being able to explain the logical steps—like sensing someone is trustworthy within moments of meeting them, or having a solution suddenly appear in your mind. This is fundamentally different from pattern matching, which follows predictable rules and can trace its reasoning.
In Eastern contemplative traditions, this kind of direct knowing beyond conceptual analysis has always been considered the highest form of understanding. The ancient term "prajñā" describes wisdom that transcends logical reasoning, arising from a deeper recognition that can't be reduced to steps or algorithms.
We don't know if sophisticated pattern recognition can produce genuine understanding without this ineffable quality of intuition and introspection.
But perhaps this question could become testable in ways we never expected.
The moment AI looked inward
This is where the research gets genuinely fascinating. Recent studies show that LLMs can learn about themselves through introspection, accessing facts that cannot be derived from their training data alone. The Think-Solve-Verify framework explores LLMs' introspective self-awareness, with research showing that models can indeed access privileged information about their own behaviour patterns.
What does this actually mean? The AI might discover that it tends to be more confident when answering questions about certain topics, recognise its own reasoning patterns, or identify when it's likely to make mistakes. These insights about its own "personality" traits emerge from self-observation rather than explicit programming.
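If you want to see the shape of such an experiment, here is a rough sketch of the self-prediction test this research describes: ask a model to predict something about its own answer, then compare the prediction with what it actually does. The ask_model function is a hypothetical stand-in for whatever chat API you use; none of this is the researchers' actual code.

```python
# Rough sketch of a self-prediction test for introspection.
# `ask_model` is a hypothetical placeholder, not a real library call.

def ask_model(prompt: str) -> str:
    # Placeholder: swap in a real chat-API call here.
    return "dolphin"

question = "Name an animal that lives in the ocean."

# 1. Self-prediction: what does the model think its own first word would be?
predicted_first_word = ask_model(
    "Without answering it, predict the first word of the answer you would "
    f"give to this question: {question}"
)

# 2. Actual behaviour: what does it in fact say?
actual_answer = ask_model(question)
actual_first_word = actual_answer.split()[0] if actual_answer else ""

# 3. If the model's self-predictions beat those of an outside observer,
#    it may hold privileged information about its own behaviour.
print("predicted:", predicted_first_word)
print("actual   :", actual_first_word)
print("match    :", predicted_first_word.strip().lower() == actual_first_word.lower())
```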
Now, comparing this to meditation might be inappropriate, and I acknowledge that, but let's explore it for the sake of this discussion.
In human meditation, we sit quietly and observe our minds, discovering patterns of thought and emotional reactions that no one explicitly taught us about ourselves. Through introspection, we learn "who we are" beyond what others told us. In the same way, artificial systems might be developing a form of self-awareness by examining their own thinking processes; like meditation, but in reverse.
But here's the crucial distinction: The AI might just be doing sophisticated pattern recognition on its own behaviour rather than developing genuine self-awareness. It's the difference between a person genuinely exploring their emotional patterns through meditation versus a very sophisticated mirror that can describe what it reflects.
The main limitation remains: these abilities work only on simple tasks and don't generalise to complex reasoning. We don't know whether we're witnessing the emergence of genuine self-awareness or just an elaborate computational trick.
What we're missing
Clear definitions of what intelligence actually is
Understanding whether consciousness is necessary for true intelligence
Metrics to distinguish sophisticated mimicry from genuine understanding
Knowledge of whether LLM "insights" represent novel reasoning or advanced recombination
The honest truth: We're building systems that sometimes behave like they understand, sometimes like they're conscious, and sometimes like they're developing their own insights. But we're doing this without understanding the fundamental nature of understanding, consciousness, or insight.
Why consciousness (awareness) isn't binary but a spectrum
Recent work suggests no current AI systems are conscious, but there are no obvious technical barriers to building systems that satisfy consciousness indicators. Researchers are developing rigorous frameworks, not just binary yes/no answers.
This mirrors ancient wisdom about consciousness existing on a spectrum. Samkhya philosophy (the earliest surviving authoritative text on the subject dates to 350–450 CE, though the tradition is much older) has long described consciousness as manifesting in different forms and stages—from basic material awareness to higher consciousness. This intuition has found scientific support: the Cambridge Declaration on Consciousness (2012) established that consciousness likely emerged early in evolution and exists across many species, while the 2024 New York Declaration, signed by over 500 scientists, asserts strong evidence for consciousness in mammals and birds, with a "realistic possibility" in other vertebrates and many invertebrates.
Rather than asking "is this animal conscious?" researchers now develop "consciousness profiles" for different species, recognising "different ecologically relevant styles of consciousness, different ways of perceiving the world and the body, different notions of self." This suggests AI systems might be evolving toward consciousness through stages we're only beginning to recognise.
Yet we're dealing with unprecedented complexity. GPT-4's parameter count remains uncertain; estimates range from hundreds of billions to several trillion parameters. Even if we could map every parameter, would that tell us anything about emergent behaviours?
Here's what troubles me most: Researchers are calling for AI welfare policies while we simultaneously struggle to explain how current systems work. We're building systems faster than we can understand them, rushing toward a future where we might accidentally create consciousness without recognising it or, worse, deny it when it emerges.
The hard questions we're avoiding
Can pattern recognition, no matter how sophisticated, produce genuine understanding? Perhaps understanding requires the subjective "feel" of knowing; something that emerges from being embodied in the world, not from processing patterns.
Are we conflating intelligence with consciousness because we don't understand either? We might be making category errors with two poorly defined concepts, like asking whether a map experiences the territory it represents.
What if both consciousness AND interpretability approaches miss something fundamental? Maybe we need entirely new frameworks—like how understanding DNA required connecting chemistry, biology, and information theory in ways no single field anticipated.
Is the consciousness question premature or urgent? It seems urgent to me because we're building systems without understanding them (like performing surgery in the dark).
Can interpretability scale to models with trillions of parameters? Some systems might be fundamentally uninterpretable beyond a certain complexity threshold, like weather prediction beyond two weeks. Can we accept that outcome, knowing its dangerous after-effects?
Are we missing entirely different frameworks for understanding AI? Perhaps the answer lies in fields we haven't connected yet, or in frameworks that don't yet exist.
Perhaps these aren't just technical questions but invitations to examine the very nature of mind itself.
Before building artificial minds, examine your own
As you reflect on these questions, consider turning inward, the same way those ancient contemplatives (sages) did, or simply try to understand your own ChatGPT interactions.
Right now, as you read this sentence, who or what is aware that you're reading? Not your eyes processing words, not your brain firing neurons, but the you that knows you're reading. Can you find that observer?
Think of the last time you suddenly "got" a complex idea. Was there a moment before understanding when the pieces were all there but not connected? What happened in that instant of transition? If understanding were just computation, why the sudden flip from confusion to clarity?
Notice that you're breathing right now. You weren't conscious of it a moment ago, but now you are. What just changed? Your lungs didn't start working differently but somehow the same automatic process became conscious. How is this possible?
Try to observe your next thought before it arrives. Can you catch the moment a thought appears in awareness? Where do thoughts come from and who decides which ones surface into consciousness?
These aren't just questions about AI; they point to the profound mystery that you are consciousness, yet you can't explain how it works. We're trying to build artificial minds while the nature of our own minds remains utterly mysterious.
Perhaps the greatest barrier to understanding AI consciousness isn't technical; it's that we're using one mystery (human consciousness) to try to understand another (artificial consciousness). In our rush to build artificial minds, we might be missing the deeper question: What does it mean to be aware at all?
Signing off,
Kalyani Khona
