Apple study exposes deep cracks in LLMs’ “reasoning” capabilities

misk@sopuli.xyz · 18 days ago

Apple study exposes deep cracks in LLMs’ “reasoning” capabilities

anon_8675309@lemmy.world · 18 days ago

Did anyone believe they had the ability to reason?

Kairos@lemmy.today · 18 days ago

Aeri@lemmy.world · 18 days ago

People are stupid OK? I’ve had people who think that it can in fact do math, “better than a calculator”

Furbag@lemmy.world · 17 days ago

Like 90% of the consumers using this tech are totally fine handing over tasks that require reasoning to LLMs and not checking the answers for accuracy.

Semperverus@lemmy.world · 18 days ago

I still believe they have the ability to reason to a very limited capacity. Everyone says that they’re just very sophisticated parrots, but there is something emergent going on. These AIs need to have a world-model inside of themselves to be able to parrot things as correctly as they currently do (yes, including the hallucinations and the incorrect answers). Sure they are using tokens instead of real dictionary words, which comes with things like the strawberry problem, but just because they are not nearly as sophisticated as us doesnt mean there is no reasoning happening.

We are not special.

Excrubulent@slrpnk.net · edit-2 17 days ago

It’s an illusion. People think that because the language model puts words into sequences like we do, there must be something there. But we know for a fact that it is just word associations. It is fundamentally just predicting the most likely next word and generating it.

If it helps, we have something akin to an LLM inside our brain, and it does the same limited task. Our brains have distinct centres that do all sorts of recognition and generative tasks, including images, sounds and languge. We’ve made neural networks that do these tasks too, but the difference is that we have a unifying structure that we call “consciousness” that is able to grasp context, and is able to loopback the different centres into one another to achieve all sorts of varied results.

So we get our internal LLM to sequence words, one word after another, then we loop back those words via the language recognition centre into the context engine, so it can check if the words match the message it intended to create, it checks them against its internal model of the world. If there’s a mismatch, it might ask for different words till it sees the message it wanted to see. This can all be done very fast, and we’re barely aware of it. Or, if it’s feeling lazy today, it might just blurt out the first sentence that sprang to mind and it won’t make sense, and we might call that a brain fart.

Back in the 80s “automatic writing” took off, which was essentially people tapping into this internal LLM and just letting the words flow out without editing. It was nonesense, but it had this uncanny resemblance to human language, and people thought they were contacting ghosts, because obviously there has to be something there, right? But it’s not, it’s just that it sounds like people.

These LLMs only produce text forwards, they have no ability to create a sentence, then examine that sentence and see if it matches some internal model of the world. They have no capacity for context. That’s why any question involving A inside B trips them up, because that is fundamentally a question about context. "How many Ws in the sentence “Howard likes strawberries” is a question about context, that’s why they screw it up.

I don’t think you solve that without creating a real intelligence, because a context engine would necessarily be able to expand its own context arbitrarily. I think allowing an LLM to read its own words back and do some sort of check for fidelity might be one way to bootstrap a context engine into existence, because that check would require it to begin to build an internal model of the world. I suspect the processing power and insights required for that are beyond us for now.

galanthus@lemmy.world · 18 days ago

If the only thing you feed an AI is words, then how would it possibly understand what these words mean if it does not have access to the things the words are referring to?

If it does not know the meaning of words, then what can it do but find patterns in the ways they are used?

This is a shitpost.

We are special, I am in any case.

Semperverus@lemmy.world · edit-2 18 days ago

It is akin to the relativity problem in physics. Where is the center of the universe? What “grid” do things move through? The answer is that everything moves relative to one another, and somehow that fact causes the phenomena in our universe (and in these language models) to emerge.

Likewise, our brains do a significantly more sophisticated but not entirely different version of this. There are more “cores” in our brains that are good at differen tasks that all constantly talk back and forth between eachother, and our frontal lobe provides the advanced thinking and networking on top of that. The LLMs are more equivalent to the broca’s area, they havent built out the full frontal lobe yet (or rather, the “Multiple Demand network”)

You are right in that an AI will never know what an apple tastes like, or what a breeze on its face feels like until we give them sensory equipment to read from.

In this case though, its the equivalent of a college student having no real world experience and only the knowledge from their books, lectures, and labs. You can still work with the concepts of and reason against things you have never touched if you are given enough information about them beforehand.

galanthus@lemmy.world · edit-2 17 days ago

The two rhetorical questions in your first paragraph assume the universe is discrete and finite, and I am not sure why. But also, that has nothing to do with what we are talking about. You think that if you show the computers and brains work the same way(they don’t), or in a similar way(maybe) I will have to accept an AI can do everything a human can, but that is not true at all.

Treating an AI like a subject capable of receiving information is inaccurate, but I will still assume it is identical to a human in that regard for the sake of argument.

It would still be nothing like a college student grappling with abstract concepts. It would be like giving you university textbooks on quantum mechanics written in chinese, and making you study them(it would be even more accurate if you didn’t know any language at all). You would be able to notice patterns in the ways the words are placed relative to each other, and also use this information(theoretically) to make a combination of characters that resembles the texts you have, but you wouldn’t be able to understand what they reference. Even if you had a dictionary you wouldn’t be, because you wouldn’t be able to understand the definitions. Words don’t magically have their meanings stored inside, they are jnterpreted in our heads, but an AI can’t do that, the word means nothing to it.

trolololol@lemmy.world · 18 days ago

What’s the strawberry problem? Does it think it’s a berry? I wonder why

xthexder@l.sw0.com · edit-2 18 days ago

I think the strawberry problem is to ask it how many R’s are in strawberry. Current AI gets it wrong almost every time.

Terrasque@infosec.pub · 17 days ago

That’s because they don’t see the letters, but tokens instead. A token can be one letter, but is usually bigger. So what the llm sees might be something like

st
raw
be
r
r
y

When seeing it like that it’s more obvious why the llm’s are struggling with it

tempest@lemmy.ca · 18 days ago

Ask an LLM how many Rs there are in strawberry

rain_worl@lemmy.world · 18 days ago

not a problem limited to llms, they perfectly replicate my stupidity ;)

tempest@lemmy.ca · 17 days ago

For reference Bing chat is still confidently sure there are 2