Imagine asking a friend for directions, and they respond:
“First, I considered the sun’s position. Then I recalled the street layout from Google Maps. I also factored in traffic data from 2019, checked the moon phase, and finally… took a wild guess.”
You’d be like: “Dude… what?”
That, in a nutshell, is what Apple says some of today’s smartest AI models are doing.
What Apple’s New AI Paper Is Really Saying (But Funnier)
Apple researchers wrote a brainy paper called “The Illusion of Thinking”, where they tested a new breed of AI called Large Reasoning Models (LRMs). These are super fancy versions of tools like ChatGPT that try to show their work when answering hard questions.
But Apple basically asked:
“Is AI really thinking, or is it just very good at faking it like a kid trying to impress the teacher with long division they don’t understand?”
And spoiler alert: they’re faking it sometimes. Hard.

The 3 Main Things Apple Found (With Silly Examples)
1. More Thinking Isn’t Always Better Thinking
LRMs try to act smart by writing long “reasoning” steps. But when Apple gave them easy problems, the models actually did worse than simpler AI that just went straight to the answer.
🧀 Example: It’s like asking your friend what 2 + 2 is, and they reply:
“First, let’s examine the philosophical foundation of addition…”
Bro. It’s 4.
2. These Models Break When Things Get Too Hard
When the problems got tougher, the models started to fall apart — even when they had all the time and tools they needed.
🚪 Example: Imagine giving an AI a puzzle like:
“If Bob has 3 keys and 5 locks, but one key only opens red doors on Tuesdays…”
And it responds:
“I’ve thought deeply about this… and my answer is ‘penguin.’”
Like, WHAT?
3. They Often Just Make Stuff Up
The researchers noticed something wild: sometimes the models don’t actually follow logic — they just sound like they do. They’ll write impressive-looking steps that don’t really mean anything.
🧢 Example: It’s like a student who forgot to study but writes:
“Clearly, by applying the inverse potato theorem to this question, we derive the square root of responsibility.”
Still gets an F.

🧩 Apple’s Puzzle Playground (No, Not Candy Crush)
Apple didn’t use the usual real-world benchmark questions, which the models may have already memorized from their training data. Instead, they built their own clean puzzles, with difficulty they could dial up one step at a time, to test the models fairly: kind of like giving everyone the same Sudoku and watching who cheats by writing random numbers.
They discovered three “moods” of AI behavior:
- Easy Tasks: Basic models beat the smarty-pants ones.
- Medium Tasks: LRMs shine — they do think better here.
- Hard Tasks: Everyone panics. Brain.exe has stopped working.
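If you want a feel for what those “clean, dial-able puzzles” look like, here’s a rough Python sketch. The specific puzzle (Tower of Hanoi) and the little pass/fail grader are just our illustration of the idea, not code from the paper: the point is that difficulty can be cranked up one disk at a time, and answers can be checked mechanically, no vibes required.

```python
# A rough illustration (not Apple's actual code) of a "dial-able" puzzle:
# difficulty goes up one notch at a time, and answers are graded by the
# result, not by how impressive the reasoning sounds.

def hanoi_solution(n, source="A", target="C", spare="B"):
    """Return the minimal move list for an n-disk Tower of Hanoi puzzle."""
    if n == 0:
        return []
    return (
        hanoi_solution(n - 1, source, spare, target)    # park the smaller disks
        + [(source, target)]                            # move the biggest disk
        + hanoi_solution(n - 1, spare, target, source)  # stack the rest on top
    )

def grade(model_moves, n):
    """Pass/fail: the minimal solution is unique, so just compare move lists."""
    return model_moves == hanoi_solution(n)

if __name__ == "__main__":
    for n in range(1, 11):
        # Difficulty explodes fast: the shortest solution has 2**n - 1 moves.
        print(f"{n} disks -> {len(hanoi_solution(n))} moves required")
```

Run it and you’ll see why the “Hard Tasks” mood exists: by 10 disks the shortest correct answer is already 1,023 moves long.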
🧠 So What Does It All Mean for Us?
- Don’t be fooled by AI that sounds smart. Sometimes it’s just overthinking like a sleep-deprived university student with caffeine and no clue.
- More “steps” in an answer doesn’t mean better results. Sometimes simple is smarter.
- Even big tech (like Apple!) is skeptical of current AI hype.
💬 Final Thoughts from MbombelaTech
AI today is like a kid in class who talks a lot, uses big words, and adds fake math steps to their homework…
But when you actually check the answer — they wrote “potato.” 🥔
Apple is politely saying:
“Let’s stop confusing looking smart with being smart.”
And honestly? We couldn’t agree more.
📚 Want to Nerd Out?
If you’re into the technical stuff, you can read Apple’s full paper here:
👉 https://arxiv.org/abs/2406.03087