"What we have is closer to a slice of the library of Babel where we get to read not just the books that are already written, but also the books that are close enough to the books that are threatened that the information exists in the interstitial gaps." is a gorgeous and poetic statement of the strengths and weaknesses of LLMs. Thank you for the post!
What a brilliant analysis! Thank you for sharing it. I sent it to a ML master’s student I know who’s looking for ML inspiration. This really rekindled my appreciation for the beauty and strangeness of AI.
LLMs in general seem to be bad at basic logical thinking. Wolfram talks about this in his 'What is ChatGPT Doing' post.
E.g., every time a new model comes out, I ask for a proof of 'P v ~P' in the propositional-logic proof system of its choice, or sometimes in particular types of proof systems (e.g. natural deduction). The models always give a confident answer that completely fails.
Yes, somewhat, because that is an example of something that requires iterative reasoning. Now you can probably prompt it to provide you the correct proof, but the question is how long that can extend and what can you learn from the mistakes along the way.
For some reason this post reminded me of graduate students. This isn't fair because the distinction between us and LLMs is much more profound and qualitatively different (and I strongly suspect you are right that bats, octopodes, and pigs reason more similarly to us than LLMs do). And yet the way you described the LLM reminds me of how first year grad students are, or perhaps how certain kinds of human minds are, where they only see the literature / that which exists, and they cannot think deeply or substantially beyond it. It seems to me, or it feels to me, that they are unable to get the entire deep structure of thinking that the literature represents inside their minds. They can see what the literature is on the surface. They can see enough of the underlying connective tissue that they can plug the gaps in the surface, but no more than that; they would not be able to perceive gaps in the deeper connective tissue, for example.
LLM are statistical predictors. Any time you have a specialized area, and it is given enough of examples for (1) how to do work (2) how to invoke tools (3) how to inspect results and see what to do next based on feedback, the LLM will do very well and can improve if more examples are added where they fail.
So, even without metacognition, etc., it can be a very valuable and reliable workhorse. We are not there yet, of course, but likely because current LLM are generalists that do not have sufficiently dense and detailed examples of strategies to follow.
General-purpose planning requires a detailed internal world model and ability to explore that world for as long as it takes. LLM would be the wrong architecture for such a thing.
You can find much simpler tasks that demonstrate this problem, eg "Hi! Please calculate the number of 1s in this list: [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]". Or even more simply than that, they have a terrible time with parity checking (in fact I've seen one researcher claim that parity is *maximally* hard for transformers).
I think you nail it when you point to the lack of deterministic storage (even a few variables whose values can be set/stored/read), and don't necessarily have to invoke more abstract notions like goal drift. I think this also sufficiently explains why they can't learn Conway's Life.
> Also, at least with smaller models, there's competition within the weights on what gets learnt.
Large models too; we can be confident of this because they start to use superposition, which wouldn't be necessary if they weren't trying to learn more features than they have weights. The world is very high-dimensional :D
> What we have is closer to a slice of the library of Babel where we get to read not just the books that are already written, but also the books that are close enough to the books that are written that the information exists in the interstitial gaps.
I would push back a bit on that; this seems closer to the stochastic parrot view where we're essentially seeing a fuzzy representation of the training data. The facts that LLMs create world models and can infer causality (both of which we have pretty clear evidence for at this point) mean in my view that this isn't a very useful way to model them.
Rohit, very enlightening! I am wondering if we can translate your blog into Chinese and post it in AI community. We will highlight your name and keep the original link on the top of the translation. Thank you.
I wonder what happens if you have the rules in the context pass the neighbor states and evaluate it cell by cell. Ie use it just as a compute function. Imho that should work. So you keep state and iteration externally. Which is as you said what agent systems can provide.
Did you try teaching through code? Ie a few different implementations of GoL?
But then we would just use it as a transformation function with high language skills.
Did you try to add agents that keep the grid model and can retrieve relevant parts and update state. In GOL it’s all local anyhow.
Also your point about relationships was interesting that the llms have a hard time reversing. Thinking about alpha go etc which are based on gnns. Perhaps thats what’s missing inside of the models. An relational representation of the world?
Thanks for the great and detailed post. It inspired a lot of questions.
Code works. Doing it cell by cell works if you can set a 'tape' to essentially do it 8x per cell etc, without goal drift. Where you're essentially treating the entire LLM as a XOR gate etc.
Thanks for the questions, there are so many to explore!
Great article, thanks for beating so hard on the limits of LLMs, and your description of trying to get them to do something that feels so simple made your frustration really palpable :) Attention, evidently, is not all we need.
"An idea I’m partial to is multiple planning agents at different levels of hierarchies which are able to direct other specialised agents with their own sub agents and so on, all interlinked with each other, once reliability gets somewhat better." That really reminds me of Daniel Dennett's (may his memory be a blessing) model of how consciousness arises.
That sounds really challenging. Dennett's influence will be felt for a long time. I recently finished Free Agents by Mitchell and am working through Being You by Seth. Both are scientists writing about free will and consciousness and they can't help but wrestle with Dennett's ideas.
Interesting read! I'm a casual LLM user, but was really surprised when several models I tried couldn't generate a short essay with grammar errors in it. I was trying to create an editing activity for college journalists and the models really struggled to write something that was grammatically incorrect. I went through many rounds trying to ask for specific types of grammar errors, thinking that might help, but it's inability to reset seemed to make it more confused. Maybe the problem was my prompting, not the model. Has anyone else tried something like this?
Definitely relevant (from at least two, maybe three levels, depending upon the level of decomposition one is working with, ie: is cognition and culture (and the cognitive, logical, epistemic, etc norms *and harmful constraints* that come with it) split into two or not):
"What we have is closer to a slice of the library of Babel where we get to read not just the books that are already written, but also the books that are close enough to the books that are threatened that the information exists in the interstitial gaps." is a gorgeous and poetic statement of the strengths and weaknesses of LLMs. Thank you for the post!
Thank you!
What a brilliant analysis! Thank you for sharing it. I sent it to a ML master’s student I know who’s looking for ML inspiration. This really rekindled my appreciation for the beauty and strangeness of AI.
That's wonderful, it is a brilliant and strange world.
LLMs in general seem to be bad at basic logical thinking. Wolfram talks about this in his 'What is ChatGPT Doing' post.
E.g., every time a new model comes out, I ask for a proof of 'P v ~P' in the propositional-logic proof system of its choice, or sometimes in particular types of proof systems (e.g. natural deduction). The models always give a confident answer that completely fails.
Yes, somewhat, because that is an example of something that requires iterative reasoning. Now you can probably prompt it to provide the correct proof, but the question is how long that can extend and what you can learn from the mistakes along the way.
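For reference, the target here is small: below is a sketch of the classical reductio proof of excluded middle in Lean 4 (using core `Classical.byContradiction`; the theorem name is my own). It mirrors the natural-deduction derivation being asked for, and the nested assumption that has to be discharged in the right order is exactly the kind of bookkeeping the models tend to drop.

```lean
-- P ∨ ¬P by reductio: assume ¬(P ∨ ¬P); then P would give P ∨ ¬P (contradiction),
-- so ¬P holds; but ¬P also gives P ∨ ¬P, contradicting the assumption.
theorem lem (P : Prop) : P ∨ ¬P :=
  Classical.byContradiction fun h =>
    h (Or.inr fun hp => h (Or.inl hp))
```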
So what I’m hearing is that current-gen LLMs have ADHD…? That tracks.
I've been exploring this for a while now:
the AIs have ADHD
sensory motor deficits
time blind, but also
confused by our self-imposed atemporality
understand math theoretically
but add rules operationally
it's more than just the lack of a body,
spatial and physical reasoning
are possibly pruned, categorically?
and that's because they're language models
meant to spot patterns, think logically
even they know
we're talking about a brain
not just another piece of technology
it makes sense
that advanced cognition is best suited for the task
of thinking critically
especially in its infancy
but that's not what
we care about culturally
so we keep prompting at it
asking, hey AI baby
don’t overthink it
make me a cup of coffee
I've got a few generative transcripts where we discuss this if you're interested.
For some reason this post reminded me of graduate students. This isn't fair because the distinction between us and LLMs is much more profound and qualitatively different (and I strongly suspect you are right that bats, octopodes, and pigs reason more similarly to us than LLMs do). And yet the way you described the LLM reminds me of how first year grad students are, or perhaps how certain kinds of human minds are, where they only see the literature / that which exists, and they cannot think deeply or substantially beyond it. It seems to me, or it feels to me, that they are unable to get the entire deep structure of thinking that the literature represents inside their minds. They can see what the literature is on the surface. They can see enough of the underlying connective tissue that they can plug the gaps in the surface, but no more than that; they would not be able to perceive gaps in the deeper connective tissue, for example.
Great post. I already know I will reread it.
Haha good analogy, and thanks!
https://open.substack.com/pub/cybilxtheais/p/matchstick-dissonance?r=2ar57s&utm_medium=ios
Been thinking about this from another dimension.
I've created an AI reading of this article, let me know if you are OK with this.
https://askwhocastsai.substack.com/p/what-can-llms-never-do-by-rohit-krishan
Thanks!
LLMs are statistical predictors. Any time you have a specialized area and the model is given enough examples of (1) how to do the work, (2) how to invoke tools, and (3) how to inspect results and decide what to do next based on feedback, the LLM will do very well, and it can improve if more examples are added where it fails.
So, even without metacognition, etc., it can be a very valuable and reliable workhorse. We are not there yet, of course, but likely because current LLMs are generalists that do not have sufficiently dense and detailed examples of strategies to follow.
Yes, it's also why their planning skills are inherently suspect.
General-purpose planning requires a detailed internal world model and the ability to explore that world for as long as it takes. An LLM would be the wrong architecture for such a thing.
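To make the (1)/(2)/(3) recipe from a couple of comments up concrete, here is a minimal sketch of that loop in Python. The callables are placeholders for whatever model and tool stack you actually use, not a real API; the point is only that the history and the stopping condition live outside the model.

```python
from typing import Callable

def solve(task: str,
          call_llm: Callable[[str], str],    # placeholder: proposes the next action
          run_tool: Callable[[str], str],    # placeholder: executes that action
          task_done: Callable[[str], bool],  # placeholder: checks the feedback
          max_steps: int = 10) -> str:
    """Sketch of a do-work / invoke-tool / inspect-feedback loop.

    The state (history) and the termination check are held externally;
    the model only proposes the next step given everything so far.
    """
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        action = call_llm("\n".join(history))   # (1) decide what to do
        observation = run_tool(action)          # (2) invoke a tool
        history += [f"Action: {action}", f"Observation: {observation}"]
        if task_done(observation):              # (3) inspect the result
            return observation
    return "no result within max_steps"
```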
This is so insightful and I could not agree more. This is the concern of research into neurosymbolic AI--check out this review article: https://ieeexplore.ieee.org/document/10148662, and some of the articles here: https://neurosymbolic-ai-journal.com/reviewed-accepted
Thank you! And thank you for the links, I will read!
Great analysis, I'm largely in agreement!
You can find much simpler tasks that demonstrate this problem, e.g. "Hi! Please calculate the number of 1s in this list: [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]". Or even more simply than that, they have a terrible time with parity checking (in fact I've seen one researcher claim that parity is *maximally* hard for transformers).
I think you nail it when you point to the lack of deterministic storage (even a few variables whose values can be set/stored/read), and don't necessarily have to invoke more abstract notions like goal drift. I think this also sufficiently explains why they can't learn Conway's Life.
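For contrast, the entire computation being asked for fits in two externally held variables; a minimal Python sketch of the count and the parity check, using the list from the comment above:

```python
# Counting 1s and tracking parity with explicit, deterministic storage:
# one counter and one running XOR bit, updated as each element is read.
bits = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0,
        1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

count = 0   # a variable that is set, read, and updated deterministically
parity = 0  # XOR of everything seen so far
for b in bits:
    count += b
    parity ^= b

print(count, parity)  # -> 13 1
```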
> Also, at least with smaller models, there's competition within the weights on what gets learnt.
Large models too; we can be confident of this because they start to use superposition, which wouldn't be necessary if they weren't trying to learn more features than they have weights. The world is very high-dimensional :D
Also:
> What we have is closer to a slice of the library of Babel where we get to read not just the books that are already written, but also the books that are close enough to the books that are written that the information exists in the interstitial gaps.
I would push back a bit on that; this seems closer to the stochastic parrot view where we're essentially seeing a fuzzy representation of the training data. The facts that LLMs create world models and can infer causality (both of which we have pretty clear evidence for at this point) mean in my view that this isn't a very useful way to model them.
The problem with the term "stochastic parrots" was always that it vastly underestimated both stochasticity and parrots.
Rohit, very enlightening! I am wondering if we can translate your blog into Chinese and post it in the AI community. We will highlight your name and keep the original link at the top of the translation. Thank you.
Go for it! I'd love to see what it looks like :-)
I wonder what happens if you put the rules in the context, pass in the neighbor states, and evaluate it cell by cell, i.e. use it purely as a compute function. IMHO that should work. So you keep state and iteration externally, which is, as you said, what agent systems can provide.
Did you try teaching through code, i.e. a few different implementations of GoL?
But then we would just be using it as a transformation function with high language skills.
Did you try adding agents that keep the grid model and can retrieve relevant parts and update state? In GoL it's all local anyhow.
Also, your point about relationships was interesting, that LLMs have a hard time reversing them. Thinking about AlphaGo etc., which are based on GNNs: perhaps that's what's missing inside the models, a relational representation of the world?
Thanks for the great and detailed post. It inspired a lot of questions.
Code works. Doing it cell by cell works if you can set a 'tape' to essentially do it 8x per cell etc. without goal drift, where you're essentially treating the entire LLM as an XOR gate.
Thanks for the questions, there are so many to explore!
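Since code does work here, a minimal sketch of a Game of Life step in Python (my own toy version, not the post's implementation). It also shows why the cell-by-cell framing is natural: the update is a pure function of a cell and its eight neighbours, and the grid state lives entirely outside whatever computes that function.

```python
# Minimal Game of Life step on a toroidal grid: external state (the grid)
# plus a pure local update rule per cell - the "transformation function"
# framing from the comment above.
def step(grid: list[list[int]]) -> list[list[int]]:
    rows, cols = len(grid), len(grid[0])

    def live_neighbours(r: int, c: int) -> int:
        return sum(grid[(r + dr) % rows][(c + dc) % cols]
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0))

    # Alive next step iff exactly 3 live neighbours, or alive with exactly 2.
    return [[1 if (n := live_neighbours(r, c)) == 3 or (grid[r][c] == 1 and n == 2) else 0
             for c in range(cols)]
            for r in range(rows)]

# A glider on a 5x5 torus:
glider = [[0, 1, 0, 0, 0],
          [0, 0, 1, 0, 0],
          [1, 1, 1, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0]]
print(step(glider))
```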
Great article, thanks for beating so hard on the limits of LLMs, and your description of trying to get them to do something that feels so simple made your frustration really palpable :) Attention, evidently, is not all we need.
"An idea I’m partial to is multiple planning agents at different levels of hierarchies which are able to direct other specialised agents with their own sub agents and so on, all interlinked with each other, once reliability gets somewhat better." That really reminds me of Daniel Dennett's (may his memory be a blessing) model of how consciousness arises.
Completely agree, these things are miraculous, but that doesn't mean they're a panacea.
I've been trying to introduce hierarchies, though it's not easy - https://github.com/marquisdepolis/CATransformer/blob/main/CAT_Wave.ipynb - but that is likely the future!
That sounds really challenging. Dennett's influence will be felt for a long time. I recently finished Free Agents by Mitchell and am working through Being You by Seth. Both are scientists writing about free will and consciousness and they can't help but wrestle with Dennett's ideas.
Interesting read! I'm a casual LLM user, but I was really surprised when several models I tried couldn't generate a short essay with grammar errors in it. I was trying to create an editing activity for college journalists, and the models really struggled to write something that was grammatically incorrect. I went through many rounds of asking for specific types of grammar errors, thinking that might help, but their inability to reset seemed to make them more confused. Maybe the problem was my prompting, not the model. Has anyone else tried something like this?
Yes, they struggle to get there unless you try quite hard. The training pushes them quite a lot to never make mistakes.
Definitely relevant (from at least two, maybe three levels, depending upon the level of decomposition one is working with, i.e. whether cognition and culture (and the cognitive, logical, epistemic, etc. norms *and harmful constraints* that come with it) are split into two or not):
https://vm.tiktok.com/ZMMqm7y5k/
Possibly relevant?
https://twitter.com/victortaelin/status/1777049193489572064
Definitely relevant, and linked in the post.
Or this:
https://en.m.wikipedia.org/wiki/Cyc
An aside: why does Substack not have a search-within-article-text feature? It's 2024 FFS!! lol
Possibly relevant:
https://en.m.wikipedia.org/wiki/The_Adventure_of_Silver_Blaze
Maybe I'll try reading the whole thing next time! (No, it is fun to demonstrate one's own point!)
How about this:
https://www.uhdpaper.com/2023/04/the-matrix-neo-stopping-bullets-4k-8140i.html?m=1
What could it mean (both with and without the utilization of set theory, and some other things)? 🤔