imo the ones who think ai is stalling are those who are too unimaginative to think up novel frameworks/applications. with current LLMs we can already automate the vast majority of the annoying processes that plague our minds in corpo jobs but we just don't yet have the integrations.
in a lot of cases where ppl are mad that AI can't solve their problems they are primarily mad that AI can't understand what they are saying/asking for because they themselves haven't articulated it well enough or broken it down well enough.
>imo the ones who think ai is stalling are those who are too unimaginative to think up novel frameworks/applications.
Novel frameworks/applications are orthogonal to whether AI is stalling. The claim is that AI reasoning capabilities are stalling, not AI applications.
That's a different claim, and not the one I see most often when people say AI is hitting a wall.
Have Rohit's conclusions been justified by the release of o3?
> We already train using the raw data we have multiple times to learn better. (...) We can convert the data that we have into different formats in order to extract the most from it.
This sounds like saying "the low hanging fruits have been picked, expect modest incremental returns" with extra steps...
It's true that the lowest-hanging fruit has been picked; it's also true that we can expend effort in order to get more fruit.
The problem with that is it usually takes a lot more effort for ever-diminishing returns though, no?
Yes, with the same techniques, until we break that and start a new S-curve.
I don't think we'll necessarily see much innovation out of OpenAI any more. You are right that since GPT-4 it's all been fairly lackluster. Meanwhile OpenAI, as a product company, has a job of trying to convince us mostly otherwise. I'm not sure benchmarks or what Ilya says are the best guideposts, to be honest. OpenAI marketing and reality are two very different things, and smuggling the term AGI into the equation doesn't make the lack of progress any more palatable. In fact it probably does the opposite.
o1 is really good though
So I was thinking of this excellent piece when I tested Anthropic's Claude 3.5, OpenAI's GPT-4o, and Gemini research pro on whether scholars of Reconstruction advised on the post-WWII Marshall Plan. All swiftly wrote me beautiful, compelling essays on how, of course, the lessons learned from Reconstruction (here are the many ways they are similar even though decades apart!) were applied to the Marshall Plan. But I pushed, and there is no evidence in the vast archive of any lessons understood, no scholarship pointed to, no thinkers named, etc. The desire to say yes! these things are similar! overrode attention to what I was really asking.
Rohit, I enjoyed your post—your optimism about AI's trajectory is refreshing and resonates with much of what I’ve observed.
Marc Benioff made an important point in his interview with Kara Swisher (*On with Kara Swisher*, December 9, 2024) about leveraging generative AI effectively. From my perspective, Salesforce exemplifies a practical, grounded approach to AI: starting with manageable, well-defined tasks rather than relying on super-intelligent models. They’ve integrated foundation models—developed by others—into their workflows to enhance internal operations and customer-facing products.
While I know Marc is giving a sales pitch, his approach matches my experience.
This strategy reflects a broader trend I’ve noticed: AI’s real value lies not in massive leaps in intelligence but in its ability to scale the complexity of tasks it can manage. Over the past two years, I’ve seen systems like ChatGPT and Claude handle tasks that are 100 times larger than what was possible before, even if the models themselves haven’t become dramatically “smarter.”
Salesforce’s approach is particularly noteworthy because it demonstrates how to redeploy human workers to higher-value activities while automating simpler tasks. This balance of human-AI collaboration highlights the importance of creating environments where users can effectively interact with AI, escalating issues seamlessly when the model hits its limits.
I agree with your overall sentiment: we’re not hitting a wall but instead redefining the scope of what AI can accomplish. The challenge—and the opportunity—is in applying these tools in ways that maximize their utility without overreaching their current capabilities.
OK - good discussion, but I look at things as a user of AI (which I am), not a researcher, so I am focusing on a separate set of questions. My concern is: can the current implementation of the model I am using accomplish the task I am looking to accomplish? If yes, then do it. If no, then the new question is: can some other model do that now?
The issue, if none of the models can do some task I care about, is that things then become a waiting game, and I try again in 3 or 6 months. Repeat until I can do the desired tasks.
From all that I can see, there are too many unknowns to make accurate predictions about the speed of improvement if you are not working in the middle of one of the frontier development teams!
Really interesting piece. On the point about using o1 pro to read literature--I have seen a number of people make this claim. Yet you can't upload PDFs to it, and it can't browse the web. I assume the expectation here is that at some unknown future point OpenAI will provide those capabilities.
You don't need to upload a PDF; you can often just ask for clarifications or information or even exposition. If it's entirely plot-related or book-specific it might not be able to help, but for everything else it still opens up an enormous vista.
Great piece 💚 🥃
Many great points here. Regarding the saturation of benchmarks, one could draw an analogy to how human performance on standardized tests tends to correlate well with career prospects and job performance later in life, but isn't deterministic. (i.e. there is a lot of contextual knowledge even a brilliant human test-taker will need to pick up before becoming a productive member of the workforce)
The ultimate benchmarks then are whatever managers (and ultimately customers) decide are important for evaluating output. This makes me even more convinced that good management will be more important than ever in the age of AI (which I wrote more about a few weeks ago): https://www.2120insights.com/i/151437867/managerial-jobs
Ethan Mollick also wrote a nice tactical piece back in October that outlines some of the management practices that leading companies will likely follow: https://www.oneusefulthing.org/p/ai-in-organizations-some-tactics
That said, if contextual knowledge is a human advantage now, will we still have a strong enough labor market later when the primary human advantage is just accountability for decision-making?
I fully agree. Just as human performance cannot be fully captured by benchmarks, the same principle applies here. You might appreciate this: https://www.strangeloopcanon.com/p/evaluations-are-all-we-need
Very much so, thank you for sharing!
A lot to think about here. In the process of thinking about what it would take for AI to be capable of doing each job, I suspect we'll uncover more underappreciated subtleties in how humans do work.
Yes, OK, fine to all of that. But you're forgetting about the critique advanced perhaps most visibly by Gary Marcus. And if you find him annoying, well, he's not the only one offering that critique. There are others. For example, there's David Ferrucci, who headed up IBM's Watson project and is now quietly developing hybrid technology with his own company, Elemental Cognition (funded in part by Ray Dalio). He doesn't say much publicly, but that's because he's putting his time, effort, and money where his mouth is: developing classical symbolic technology in tandem with neural nets.
Scaling on machine learning, even with the added oomph of inference scaling, won't take us all the way up Mount AGI or off to planet Super-intelligence. But then it doesn't have to. Even if the Big Boyz and their Big Money are too mesmerized by machine learning in the neural mesh to pay attention to anything else, we are going to figure out how to link classical techniques with neural nets. In a way, that's what Hassabis and his team are doing.
The problem with symbolic programming is that it has to be hand-coded. That takes time, and few can do it well.
From my investigations it’s clear to me that there’s a lot of structure in those statistical neural nets and that that structure (most likely) tracks the structure of human cognition. The trick is going to be to come up with a way to programmatically track that structure, make it explicit, and then link that explicit structure back to the neural net. I haven’t got a clue about how that’s going to fall out.
Yet...
Thanks!