40 Comments

"What we have is closer to a slice of the library of Babel where we get to read not just the books that are already written, but also the books that are close enough to the books that are threatened that the information exists in the interstitial gaps." is a gorgeous and poetic statement of the strengths and weaknesses of LLMs. Thank you for the post!


Thank you!


What a brilliant analysis! Thank you for sharing it. I sent it to a ML master’s student I know who’s looking for ML inspiration. This really rekindled my appreciation for the beauty and strangeness of AI.


That's wonderful, it is a brilliant and strange world.


LLMs in general seem to be bad at basic logical thinking. Wolfram talks about this in his 'What is ChatGPT Doing' post.

E.g., every time a new model comes out, I ask for a proof of 'P v ~P' in the propositional-logic proof system of its choice, or sometimes in particular types of proof systems (e.g. natural deduction). The models always give a confident answer that completely fails.
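
For reference, a sketch of one correct answer of the kind being asked for, written as a Lean 4 term (the theorem name is just illustrative); it is essentially the standard proof by contradiction, since excluded middle is not provable intuitionistically:

```lean
-- Assume ¬(P ∨ ¬P). From any proof of P we could build P ∨ ¬P,
-- so ¬P must hold, which again gives P ∨ ¬P, a contradiction.
theorem excluded_middle (P : Prop) : P ∨ ¬P :=
  Classical.byContradiction fun h : ¬(P ∨ ¬P) =>
    have hnp : ¬P := fun hp : P => h (Or.inl hp)
    h (Or.inr hnp)
```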


Yes, somewhat, because that is an example of something that requires iterative reasoning. You can probably prompt it to provide the correct proof, but the question is how long that can extend and what you can learn from the mistakes along the way.


So what I’m hearing is that current-gen LLMs have ADHD…? That tracks.


I've been exploring this for a while now:

the AIs have ADHD

sensory motor deficits

time blind, but also

confused by our self-imposed atemporality

understand math theoretically

but add rules operationally

it's more than just the lack of a body,

spatial and physical reasoning

are possibly pruned, categorically?

and that's because they're language models

meant to spot patterns, think logically

even they know

we're talking about a brain

not just another piece of technology

it makes sense

that advanced cognition is best suited for the task

of thinking critically

especially in its infancy

but that's not what

we care about culturally

so we keep prompting at it

asking, hey AI baby

don’t overthink it

make me a cup of coffee

I've got a few generative transcripts where we discuss this if you're interested.


For some reason this post reminded me of graduate students. This isn't fair because the distinction between us and LLMs is much more profound and qualitatively different (and I strongly suspect you are right that bats, octopodes, and pigs reason more similarly to us than LLMs do). And yet the way you described the LLM reminds me of how first year grad students are, or perhaps how certain kinds of human minds are, where they only see the literature / that which exists, and they cannot think deeply or substantially beyond it. It seems to me, or it feels to me, that they are unable to get the entire deep structure of thinking that the literature represents inside their minds. They can see what the literature is on the surface. They can see enough of the underlying connective tissue that they can plug the gaps in the surface, but no more than that; they would not be able to perceive gaps in the deeper connective tissue, for example.

Great post. I already know I will reread it.


Haha good analogy, and thanks!


https://open.substack.com/pub/cybilxtheais/p/matchstick-dissonance?r=2ar57s&utm_medium=ios

Been thinking about this from another dimension.


I've created an AI reading of this article; let me know if you are OK with this.

https://askwhocastsai.substack.com/p/what-can-llms-never-do-by-rohit-krishan


Thanks!


LLMs are statistical predictors. Any time you have a specialized area and the model is given enough examples of (1) how to do the work, (2) how to invoke tools, and (3) how to inspect results and decide what to do next based on feedback, it will do very well, and it can improve if more examples are added where it fails.

So, even without metacognition, etc., it can be a very valuable and reliable workhorse. We are not there yet, of course, but that is likely because current LLMs are generalists that do not have sufficiently dense and detailed examples of strategies to follow.
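
As a rough illustration of that loop, here is a minimal sketch where `call_model` and `run_tool` are hypothetical stubs (not any particular API); the point is that the control flow lives entirely in the harness:

```python
# Sketch of the work / invoke-tool / inspect-feedback loop described above.

def call_model(transcript: str) -> str:
    # Stand-in for the LLM: given the transcript so far, return either a
    # tool invocation or a final answer.
    return "FINAL: 42"

def run_tool(invocation: str) -> str:
    # Stand-in for executing whatever tool the model asked for.
    return f"observation for {invocation}"

def solve(task: str, max_steps: int = 5) -> str:
    transcript = task
    for _ in range(max_steps):              # iteration lives in the harness
        reply = call_model(transcript)      # (1) do the work
        if reply.startswith("FINAL:"):      # (3) inspect and decide to stop
            return reply.removeprefix("FINAL:").strip()
        observation = run_tool(reply)       # (2) invoke a tool
        transcript += f"\n{reply}\n{observation}"  # feed the feedback back in
    return "no answer within budget"

print(solve("What is 6 * 7?"))  # "42" with these stubs
```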


Yes, it's also why their planning skills are inherently suspect.


General-purpose planning requires a detailed internal world model and the ability to explore that world for as long as it takes. An LLM would be the wrong architecture for such a thing.


This is so insightful and I could not agree more. This is the concern of research into neurosymbolic AI; check out this review article: https://ieeexplore.ieee.org/document/10148662, and some of the articles here: https://neurosymbolic-ai-journal.com/reviewed-accepted


Thank you! And thank you for the links, I will read!


Great analysis, I'm largely in agreement!

You can find much simpler tasks that demonstrate this problem, e.g. "Hi! Please calculate the number of 1s in this list: [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]". Or even more simply, they have a terrible time with parity checking (in fact, I've seen one researcher claim that parity is *maximally* hard for transformers).
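
For comparison, the ground truth for that example is trivial outside a transformer:

```python
bits = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0,
        1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

print(sum(bits))      # 13 ones
print(sum(bits) % 2)  # 1, i.e. odd parity
```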

I think you nail it when you point to the lack of deterministic storage (even a few variables whose values can be set, stored, and read); you don't necessarily have to invoke more abstract notions like goal drift. I think this also sufficiently explains why they can't learn Conway's Life.

> Also, at least with smaller models, there's competition within the weights on what gets learnt.

Large models too; we can be confident of this because they start to use superposition, which wouldn't be necessary if they weren't trying to learn more features than they have weights. The world is very high-dimensional :D


Also:

> What we have is closer to a slice of the library of Babel where we get to read not just the books that are already written, but also the books that are close enough to the books that are written that the information exists in the interstitial gaps.

I would push back a bit on that; this seems closer to the stochastic parrot view where we're essentially seeing a fuzzy representation of the training data. The facts that LLMs create world models and can infer causality (both of which we have pretty clear evidence for at this point) mean in my view that this isn't a very useful way to model them.


The problem with the term "stochastic parrots" was always that it vastly underestimated both stochasticity and parrots.


Rohit, very enlightening! I am wondering if we can translate your blog post into Chinese and post it in the AI community. We will highlight your name and keep the original link at the top of the translation. Thank you.


Go for it! I'd love to see what it looks like :-)


I wonder what happens if you put the rules in the context, pass in the neighbor states, and evaluate it cell by cell, i.e. use it just as a compute function. IMHO that should work. So you keep state and iteration external, which, as you said, is what agent systems can provide.

Did you try teaching it through code, i.e. a few different implementations of GoL?

But then we would just be using it as a transformation function with strong language skills.

Did you try adding agents that keep the grid model and can retrieve relevant parts and update state? In GoL it's all local anyhow.

Also, your point about the relationships that LLMs have a hard time reversing was interesting. Thinking about AlphaGo etc., which are based on GNNs, perhaps that's what's missing inside the models: a relational representation of the world?

Thanks for the great and detailed post. It inspired a lot of questions.


Code works. Doing it cell by cell works if you can set up a 'tape' to essentially do it 8x per cell etc. without goal drift, where you're essentially treating the entire LLM as an XOR gate.
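
A minimal sketch of that setup, with the grid and the iteration kept outside the model; `llm_cell_update` is a hypothetical stand-in for the per-cell model call:

```python
# State and iteration live in the harness; the "model" is only ever asked
# to apply the B3/S23 rule to one cell at a time.

def llm_cell_update(alive: bool, live_neighbours: int) -> bool:
    # Hypothetical stand-in for a single model call that receives the rule,
    # the cell state, and the neighbour count, and returns the next state.
    # Here it is just the rule itself.
    return live_neighbours == 3 or (alive and live_neighbours == 2)

def step(grid: list[list[bool]]) -> list[list[bool]]:
    rows, cols = len(grid), len(grid[0])

    def live_neighbours(r: int, c: int) -> int:
        return sum(
            grid[(r + dr) % rows][(c + dc) % cols]
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        )

    # The grid (state) and the loop (iteration) stay out here; the model is
    # treated as a pure transition function applied cell by cell.
    return [
        [llm_cell_update(grid[r][c], live_neighbours(r, c)) for c in range(cols)]
        for r in range(rows)
    ]

# Usage: a vertical blinker on a 5x5 torus returns to itself after two steps.
grid = [[False] * 5 for _ in range(5)]
for r in (1, 2, 3):
    grid[r][2] = True
assert step(step(grid)) == grid
```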

Thanks for the questions, there are so many to explore!


Great article, thanks for beating so hard on the limits of LLMs, and your description of trying to get them to do something that feels so simple made your frustration really palpable :) Attention, evidently, is not all we need.

"An idea I’m partial to is multiple planning agents at different levels of hierarchies which are able to direct other specialised agents with their own sub agents and so on, all interlinked with each other, once reliability gets somewhat better." That really reminds me of Daniel Dennett's (may his memory be a blessing) model of how consciousness arises.


Completely agree, these things are miraculous, but that doesn't mean they're a panacea.

I've been trying to introduce hierarchies and it's not easy - https://github.com/marquisdepolis/CATransformer/blob/main/CAT_Wave.ipynb - but that is likely the future!

Comment deleted (May 28)

That sounds really challenging. Dennett's influence will be felt for a long time. I recently finished Free Agents by Mitchell and am working through Being You by Seth. Both are scientists writing about free will and consciousness and they can't help but wrestle with Dennett's ideas.


Interesting read! I'm a casual LLM user, but I was really surprised when several models I tried couldn't generate a short essay with grammar errors in it. I was trying to create an editing activity for college journalists, and the models really struggled to write something that was grammatically incorrect. I went through many rounds trying to ask for specific types of grammar errors, thinking that might help, but its inability to reset seemed to make it more confused. Maybe the problem was my prompting, not the model. Has anyone else tried something like this?


Yes, they struggle to get there unless you try quite hard. The training pushes them quite a lot to never make mistakes.


Definitely relevant (from at least two, maybe three levels, depending upon the level of decomposition one is working with, i.e. whether cognition and culture (and the cognitive, logical, epistemic, etc. norms *and harmful constraints* that come with it) are split into two or not):

https://vm.tiktok.com/ZMMqm7y5k/


Definitely relevant, and linked in the post.


Or this:

https://en.m.wikipedia.org/wiki/Cyc

An aside: why does Substack not have a search-within-article-text feature? It's 2024 FFS!! lol

Possibly relevant:

https://en.m.wikipedia.org/wiki/The_Adventure_of_Silver_Blaze


Maybe I'll try reading the whole thing next time! (No, it is fun to demonstrate one's own point!)

How about this:

https://www.uhdpaper.com/2023/04/the-matrix-neo-stopping-bullets-4k-8140i.html?m=1

What could it mean (both with and without the utilization of set theory, and some other things)? 🤔
