30 Comments

imo the ones who think AI is stalling are those who are too unimaginative to think up novel frameworks/applications. With current LLMs we can already automate the vast majority of the annoying processes that plague our minds in corpo jobs; we just don't yet have the integrations.

In a lot of cases where people are mad that AI can't solve their problems, they are primarily mad that AI can't understand what they are saying/asking for, because they themselves haven't articulated it or broken it down well enough.

>imo the ones who think AI is stalling are those who are too unimaginative to think up novel frameworks/applications.

Novel frameworks/applications are orthogonal to whether AI is stalling. The claim is that AI reasoning capabilities are stalling, not AI applications.

That's a different claim, and not the one I see most often when people say AI is hitting a wall.

Hi Rohit – This certainly isn’t my domain, so feel free to ignore this comment if it’s dumb. Perhaps it is.

But when I think of “innovation”, I sort of think of it in two categories. The first is just applying newly discovered techniques to areas that have not yet benefited from those new methods. To me, your “More Data” and “Synthetic Data” fall into this category. It’s really just applying more money and time to techniques we’re pretty sure will work.

But the “S-Curve” thing… which I think amounts to just new inventions… that seems to be a different matter. Doesn’t that rely on someone inventing a new technique?

When I listen to the Silicon Valley experts on the All-in Podcast, for example, they all seem so utterly certain that AI will reach new heights in the years to come. I agree, if you think of those heights in terms of your “More Data” and “Synthetic Data”. But on the other hand, when it comes to new algorithmic innovations, you just never know if that will happen or not until someone actually does it.

Geez… not only do the Silicon Valley types seem certain total AGI is going to happen, but they actually put short-term timetables on it. Didn’t Marc Andreessen say it would be in two more years? How do you know when someone is going to invent something? Will quantum computers ever be able to isolate particles from nature efficiently enough to have lots of qubits? Will fusion energy ever happen? Will someone invent antigravity boots? You just don’t know until someone actually does it.

So the confidence that Silicon Valley has in AGI baffles me a bit. I’m definitely a believer in applying existing techniques to new domains, of course. But I’m not at all confident that someone will invent something that hasn’t been invented yet. And certainly not when that invention will occur.

You do make a good case, though, for there being many lines of attack... lots of ways that innovation might take place. That certainly does make me think the odds of innovation seem pretty good. I wish I could be confident that was a good thing, though, as I'm not sure what will become of the human spirit when intellectual capital is worth nothing.

Admittedly, this comment might not age well at all! :)

I’m curious what the improvement curve of LLMs looks like when compared to the scaling of investment. True, there is still lots of progress to be made, but the timeframe of the original boom also saw a multi-order-of-magnitude increase in the amount of money being spent on research.

Most businesses offering LLM technology operate at a loss. At the moment the net effect of the LLM boom on the economy is positive, but only because it boosts market sentiment, not because it yields productivity at anywhere near the scale of the dollars put in.

For the progress of LLM technology to continue at the current rate, it will eventually become necessary for it to justify itself by an increase in economic productivity. Replacing call centres and content farms won’t cut it.

Rohit, you’re showing us examples of how this equation can change. If LLMs are able to assist with cutting-edge research, that is a step toward positive feedback loops of innovation with AI at the center. It is even a step toward the dream/nightmare scenario a few people have warned about for decades: AI spearheading research into itself, improving its own capabilities, raising its own capital, buying its own infrastructure.

Up to now I’ve been bearish on two things: the ability of LLMs to meaningfully improve the human condition (and not just act as a band-aid for the escalating suffering caused by an absurd, Byzantine rules-driven order the average human being increasingly lacks the brain-power to navigate); and the ability of LLMs to destroy us all. I’m happy to see I might be wrong about the first one.

Exponential investment with linear improvement is the human condition, and eventually that provides enough money to be worth it, as it always has.
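
To make that trade-off concrete, here is a back-of-envelope sketch; the assumption that capability grows roughly with the log of spend, and every number in it, are illustrative guesses rather than figures from the post or this thread:

```python
# Toy model (an assumption for illustration): one "unit" of capability per
# 10x increase in spend over a base budget, i.e. linear gains against
# exponential investment.
import math

def capability(spend_usd: float, base: float = 1e6) -> float:
    return math.log10(spend_usd / base)

for spend in (1e6, 1e7, 1e8, 1e9, 1e10):
    print(f"${spend:>14,.0f} -> capability {capability(spend):.1f}")
# Prints 0.0 through 4.0: each additional unit of improvement costs 10x more.
```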

o3 benchmarks though…

Have Rohit's conclusions been justified by the release of o3?

> We already train using the raw data we have multiple times to learn better. (...) We can convert the data that we have into different formats in order to extract the most from it.

This sounds like saying "the low-hanging fruit has been picked, expect modest incremental returns" with extra steps...
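
For what it's worth, here is a minimal sketch of what "converting the data into different formats" can look like in practice: one raw document yielding several training views. The templates and names below are hypothetical illustrations, not any lab's actual pipeline.

```python
# Hypothetical example: re-render one document as several training "views"
# (raw text, a Q&A pair, a summarization pair) so repeated epochs see the
# same information in different forms.
from typing import Dict, Iterator, Tuple

def reformat(title: str, doc: str) -> Iterator[Dict[str, str]]:
    yield {"format": "raw", "text": doc}                                        # plain pretraining text
    yield {"format": "qa", "text": f"Q: What does '{title}' say?\nA: {doc}"}    # instruction-style view
    yield {"format": "summary", "text": f"Document:\n{doc}\nSummary: {title}"}  # compression view

corpus: Tuple[Tuple[str, str], ...] = (
    ("Scaling laws", "Loss falls predictably as compute and data grow."),
)
examples = [view for title, doc in corpus for view in reformat(title, doc)]
print(len(examples))  # 3 training examples extracted from 1 document
```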

It's true that the lowest-hanging fruit has been picked; it's also true that we can expend effort to get more fruit.

The problem with that is it usually takes a lot more effort for ever-diminishing returns though, no?

Yes, with the same techniques, until we break that and start a new S-curve.

I don't think we'll necessarily see much innovation out of OpenAI any more. You are right that since GPT-4 it's all been fairly lackluster. Meanwhile OpenAI, as a product company, has the job of trying to convince us otherwise. I'm not sure benchmarks or what Ilya says are the best guideposts, to be honest. OpenAI marketing and reality are two very different things, and smuggling the term AGI into the equation doesn't make the lack of progress any more palatable. In fact it probably does the opposite.

o1 is really good though

So I was thinking of this excellent piece when I tested Anthropic's Claude 3.5, OpenAI's 4o, and Gemini research pro on whether scholars of Reconstruction advised on the post-WWII Marshall Plan. All three swiftly wrote me beautiful, compelling essays about how, of course, the lessons learned from Reconstruction (here are the many ways they are similar, even though decades apart!) were applied to the Marshall Plan. But I pushed, and there is no evidence in the vast archive of any lessons understood, no scholarship pointed to, no thinkers named, etc. The desire to say yes! these things are similar! overrode attention to what it was I was really asking.

Rohit, I enjoyed your post—your optimism about AI's trajectory is refreshing and resonates with much of what I’ve observed.

Marc Benioff made an important point in his interview with Kara Swisher (*On with Kara Swisher*, December 9, 2024) about leveraging generative AI effectively. From my perspective, Salesforce exemplifies a practical, grounded approach to AI: starting with manageable, well-defined tasks rather than relying on super-intelligent models. They’ve integrated foundation models—developed by others—into their workflows to enhance internal operations and customer-facing products.

While I know Marc is giving a sales pitch, his approach matches my experience.

This strategy reflects a broader trend I’ve noticed: AI’s real value lies not in massive leaps in intelligence but in its ability to scale the complexity of tasks it can manage. Over the past two years, I’ve seen systems like ChatGPT and Claude handle tasks that are 100 times larger than what was possible before, even if the models themselves haven’t become dramatically “smarter.”

Salesforce’s approach is particularly noteworthy because it demonstrates how to redeploy human workers to higher-value activities while automating simpler tasks. This balance of human-AI collaboration highlights the importance of creating environments where users can effectively interact with AI, escalating issues seamlessly when the model hits its limits.

I agree with your overall sentiment: we’re not hitting a wall but instead redefining the scope of what AI can accomplish. The challenge—and the opportunity—is in applying these tools in ways that maximize their utility without overreaching their current capabilities.

OK, good discussion, but I look at things as a user of AI (which I am), not a researcher, so I'm focused on a separate set of questions. My concern is whether the current implementation of the model I'm using can handle the task I'm looking to accomplish. If yes, then do it. If no, then the new question is: can some other model do it now?

The issue, if none of the models can do a task I care about, is that things become a waiting game: I try again in 3 or 6 months, and repeat until I can do the desired task.

From all that I can see, there are too many unknowns to make accurate predictions about the speed of improvement unless you are working in the middle of one of the frontier development teams!

Really interesting piece. On the point about using o1 pro to read literature: I have seen a number of people make this claim. Yet you can't upload PDFs to it, and it can't browse the web. I assume that at some unknown future point OpenAI will provide those capabilities.

You don't need to upload a PDF; you can often just ask for clarifications or information or even exposition. If it's entirely plot-related or book-specific it might not be able to help, but for everything else it still opens up an enormous vista.

Great piece 💚 🥃

Many great points here. Regarding the saturation of benchmarks, one could draw an analogy to how human performance on standardized tests tends to correlate well with career prospects and job performance later in life, but isn't deterministic (i.e., there is a lot of contextual knowledge even a brilliant human test-taker will need to pick up before becoming a productive member of the workforce).

The ultimate benchmarks then are whatever managers (and ultimately customers) decide are important for evaluating output. This makes me even more convinced that good management will be more important than ever in the age of AI (which I wrote more about a few weeks ago): https://www.2120insights.com/i/151437867/managerial-jobs

Ethan Mollick also wrote a nice tactical piece back in October that outlines some of the management practices that leading companies will likely follow: https://www.oneusefulthing.org/p/ai-in-organizations-some-tactics

That said, if contextual knowledge is a human advantage now, will we still have a strong enough labor market later when the primary human advantage is just accountability for decision-making?

I fully agree: just as human performance cannot be captured by benchmarks, the same principle applies here. You might appreciate this: https://www.strangeloopcanon.com/p/evaluations-are-all-we-need

Very much so, thank you for sharing!

A lot to think about here. In the process of thinking about what it would take for AI to be capable of doing each job, I suspect we'll uncover more underappreciated subtleties in how humans do work.

Yes, OK, fine to all of that. But you're forgetting about the critique advanced perhaps most visibly by Gary Marcus. And if you find him annoying, well, he's not the only one offering that critique. There are others. For example, there's David Ferrucci, who headed up IBM's Watson project and is now quietly developing hybrid technology with his own company, Elemental Cognition (funded in part by Ray Dalio). He doesn't say much publicly, but that's because he's putting his time, effort, and money where his mouth is: developing classical symbolic technology in tandem with neural nets.

Scaling on machine learning, even with the added oomph of inference scaling, won't take us all the way up Mount AGI or off to planet Super-intelligence. But then it doesn't have to. Even if the Big Boyz and their Big Money are too mesmerized by machine learning in the neural mesh to pay attention to anything else, we are going to figure out how to link classical techniques with neural nets. In a way, that's what Hassabis and his team are doing.

The problem with symbolic programming is that it has to be hand-coded. That takes time, and few can do it well.

From my investigations it’s clear to me that there’s a lot of structure in those statistical neural nets and that that structure (most likely) tracks the structure of human cognition. The trick is going to be to come up with a way to programmatically track that structure, make it explicit, and then link that explicit structure back to the neural net. I haven’t got a clue about how that’s going to fall out.

Yet...
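
Purely as a toy illustration of one way people probe for that kind of structure today (and emphatically not the answer the comment is looking for), here is a sketch that clusters hidden-state activations into discrete categories; the random data stands in for real activations, and the scikit-learn choice is my own assumption:

```python
# Toy probe: cluster per-token hidden states into discrete categories,
# giving an explicit, symbolic-ish labeling that could be inspected and
# mapped back onto the network. Random data stands in for real activations.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
activations = rng.normal(size=(300, 64))  # 300 tokens, 64-dim hidden states

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(activations)
labels = kmeans.labels_               # one discrete category per token
prototypes = kmeans.cluster_centers_  # candidate "symbols" living in activation space

print(np.bincount(labels))  # how many tokens fall into each discovered category
```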

Let’s connect!

I buy the premise that there's more data out there -- plenty of it.

But I was surprised not to see a deeper discussion of (or even a tip of the hat to) different types of benchmarks, especially ones like those driving the ARC Prize and the newer ones to come...

Mostly only because I've written about that in the past. And even ARC itself isn't driving progress but measuring it, albeit in a lossy fashion.

Thanks, I'll see if I can find that article…

I'd wager that we might max out current benchmarks and still find AI wanting in critical ways... that's why alternative benchmarks like ARC (and others in the pipeline) are interesting.

I suggest the "Evaluations are all we need" essay.
