Generative AI: the Anything from Anything Machine
Platonic realms, latent spaces and helping serve our collective wants
A large part of human effort goes into changing one thing into another. We read reports and summarise them. We experience events and write reports about them. We take screenplays and make movies from them. We think about something and write symphonies. We read poetry and create music. We create beautiful art, learn from it, and create even more!
These are all efforts to get to the bottom of who we are as humans. We are asked, in our jobs and in our studies, to constantly change one thing into something else.
And a machine that does exactly this is what’s being created by the generative AI revolution going on around us.
And it’s good! We can move code, requirements, images, videos, music, essays, maybe someday even books, back and forth.
We’ve built an anything-from-anything machine. It’s nowhere near perfect, nor even good enough for government work, but this is the trajectory.
It will help us paint something if we want it to, create a movie or a song, write an essay, summarise academic literature or read books, figure out the points and counterpoints for any argument, write code and documentation, and in general convert anything to anything.
Imagine if you could fund your own startup. You hire an engineer and a designer and a marketer and an operations person and y’all sit in a room trying to make sure one person’s work is seamlessly linked into another’s. The marketing pitch has to lead into the actual product being coded. The requirements from the user have to be a doc that has to become the codebase. The user feedback has to link to the design document that has to link to the product.
But as anyone who’s dealt with this menagerie knows, running interference between these folks is often a logistical nightmare, filled with endless streams of misunderstandings and petty feuds. But if all these voices were inside the same person, things could be better. That’s what this is. If you get to link your inner specialised homunculi with all these capabilities to each other with your anything-from-anything machine, this becomes a whole lot simpler.
Noah and roon had an article about how this would mean more possibilities for the human user, due to comparative advantage. They envision a “sandwich workflow”, where the human asks for something she wants, AI creates a few versions or options, and the human selects her favourite to edit and use.
I think this is feasible, but it’s worth looking at why this might be the case. Today the sandwich workflow exists because the AIs are not perfect out of the box. The question is whether the AI will understand what we’re asking well enough to create what we wanted, or instead create for us a pastiche that sits in an uncanny valley somewhere.
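To make the sandwich workflow concrete, here is a minimal sketch in Python. Everything in it is a hypothetical stand-in rather than anything from the article being discussed: `generate` plays the AI in the middle, while `choose` and `edit` play the human on either side. The toy functions below just manipulate strings so the shape of the loop is visible.

```python
from typing import Callable, List


def sandwich(
    prompt: str,
    generate: Callable[[str], List[str]],  # AI step: produce a few options
    choose: Callable[[List[str]], int],    # human step: pick a favourite by index
    edit: Callable[[str], str],            # human step: polish the pick
) -> str:
    """Human asks, AI drafts several options, human selects and edits."""
    options = generate(prompt)
    picked = options[choose(options)]
    return edit(picked)


# Hypothetical stand-ins; a real setup would call a model and ask a person.
def toy_generate(prompt: str) -> List[str]:
    return [f"{prompt} (draft {i})" for i in range(1, 4)]


def toy_choose(options: List[str]) -> int:
    return len(options) - 1  # pretend the human likes the last draft


def toy_edit(text: str) -> str:
    return text.upper()  # pretend the human's edit is to shout it


result = sandwich("a tagline for a bakery", toy_generate, toy_choose, toy_edit)
print(result)
```

The human steps are passed in as plain functions, so as the models improve the `choose` and `edit` slices can shrink toward no-ops without changing the rest of the loop.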
But in front of us lie three eras, as the anything-from-anything machine evolves.
The first era
This, if true, gives us some sense of what companies built on this might look like. Using this anything-from-anything power still requires humans to help other humans figure out what they want. The best companies are likely to be consulting firms, tech enabled and AI powered, but consulting nonetheless.
The technology will continue to get used by plenty of folks who want to either roll their own, or pay the cloud computing costs to run it plus a markup. Most software will start integrating parts of it, from Replit to GitHub to Canva to Word.
But for most enterprise applications, what’s needed is someone who can coax out a behaviour that doesn’t come out of the box. Also, because the outcomes often aren’t 100% reliable as of yet, it will need human help. It’s a way to solve problems, but the solution can’t always be handed over as a ready-made response.
Yes, AI can supercharge making characters and animations in games, but figuring out what to choose and how to combine things will need someone’s help, partly because of the data that’s needed, including IP, and partly because of the effort it takes to get an output that people want.
The foundational models quickly get open sourced, and the competition remains primarily in the fine-tuning, training and product features arena, very much like the cloud applications war in the 2010s.
As the competition is likely to be over go-to-market and product features, we’ll see some large companies emerge, though I suspect unlike the SaaS wave, these companies will have even less staying power. For one thing, their hard-won finetuned capabilities might come out of the box on the next foundational model release. And for another, the software that companies are making $100m ARR from is also the same software that others are building over their nights and weekends - the moat is awfully thin.
The second era
Maybe our edge will be in those edge cases. Or in creating truly long-form pieces, which so far seem beyond the AI’s ken.
Or maybe not; those edges too will get sanded away.
My favourite example here is speech to text. It’s a technology I’ve been playing with for the past decade and more. It was horrendous for me, as a speaker with a mixed Indian, British, American accent born of watching too many movies and reading too much Wodehouse. Over time though, things have gotten good enough that the edge cases, while annoying, only crop up when I ask Google about the eating habits of quokkas. Much better.
As the edges get sanded away, the utility increases dramatically, and the two pieces of sandwich fall away, leaving only the filling. So to speak.
This would mean that a lot more of the cognitive work that gets done can finally be automated, removing the bullshit jobs as Graeber called them, the creative recombination jobs as McKinsey calls them, or the general “making your manager happy” jobs as I call them.
When there is a fundamentally new capability that emerges, unless there’s some sort of clear ecosystem lock-in, the competitive advantage will diminish rapidly. I’ve written about species emergence based on various niches, and this is a common theme.
So what’s in store for us, considering we have an anything-from-anything machine which looks like it’ll keep improving?
For one thing, the models today are idiot savants. They’re neither capable of jumping across domains very easily, nor are they able to explore the latent space all that well on their own. However, this is not criticism. This is an opportunity. The next step is clearly to string together multiple models, so the anything-from-anything machine gets true breadth. It’s starting, albeit in muted fashion, and with plenty of manual finetuning and parameter selection left. It’s alchemy at this point, not chemistry.
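One way to picture stringing models together is as a search over converters: each model turns one kind of thing into another, and breadth comes from chaining them. The sketch below is only an illustration under loud assumptions. The `CONVERTERS` registry, the type names, and the toy string functions are all hypothetical stand-ins for real models, and the breadth-first search is just one simple way to find a chain.

```python
from collections import deque

# Hypothetical registry: (source kind, target kind) -> converter.
# Real entries would wrap actual models; these are toy string functions.
CONVERTERS = {
    ("text", "summary"): lambda x: x.split(".")[0] + ".",
    ("summary", "image_prompt"): lambda x: "an illustration of: " + x,
}


def find_chain(src, dst):
    """Breadth-first search for the shortest list of converter steps."""
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        kind, path = queue.popleft()
        if kind == dst:
            return path
        for (a, b) in CONVERTERS:
            if a == kind and b not in seen:
                seen.add(b)
                queue.append((b, path + [(a, b)]))
    return None  # no chain of converters connects src to dst


def convert(value, src, dst):
    """Run the value through each converter in the chain, in order."""
    chain = find_chain(src, dst)
    if chain is None:
        raise ValueError(f"no route from {src} to {dst}")
    for step in chain:
        value = CONVERTERS[step](value)
    return value


print(convert("Quokkas eat leaves. They are small.", "text", "image_prompt"))
```

No single converter here goes from text to an image prompt, yet the chain does. The manual part today is everything this sketch hides: choosing which models go in the registry and tuning each hop so the output of one is usable as the input of the next.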
The third era
And the next step then is training AI itself to string together multiple models better. That too feels like the natural extension of where we’re heading.
By itself that might not be sufficient, since in order to fully explore its capabilities, we’d also need to be able to widely deploy agents in multiple scenarios without running into hard physical constraints on energy use. Once power consumption is reduced drastically enough to make this possible, that’s likely to be where things explode.
That’s when we’ll have to grapple with the fact that a large percentage of what our jobs entail is surfing the nether regions of our collective unconscious, isolating and making real our desires. Business is the way we’ve found to explore this and give ourselves the things that we need.
Will we be able to automate this away? I don’t think so. No matter how you attempt it, the idea of providing value to each other by better serving each other’s needs will remain a mainstay of our culture for quite a while. Even if we are able to create autonomous if-this-then-that loops to run complex tasks of coordination and communication, they need some level of direction or intent, not to mention some course correction, which is what we normally supply.
Will it spin out of control after this era? Leading us into a valley of death, or maybe utopia? Maybe, maybe not, but that’s an essay for another day (coming soon). Dealing with the real world is neither as simple nor as interesting for any number of linked-together anything-from-anything models.
And if all of this plays out, and there isn’t much of a role for enough of us in the “sandwich tasks” mentality, either we will upgrade ourselves, evolve beyond the need to do them, or just enjoy the fruits of our inventions as Keynes would have us do. That might be the utopia we should shoot for. Until then, there’s things to do!
An example of where AI generated prompts result in awesome AI generated images:
https://twitter.com/GuyP/status/1598020781065527296?s=20&t=sODodeedRrIlUZtBcV-SqQ
Enjoyed this read. I'm always on the lookout for business models and frameworks around AI. Through the power of using multiple AI services, I'm able to efficiently run a basic solo studio. I can write a song fast, write copy to sell it, make graphics, spin up a landing page, and create a promotional video with the help of AI all in under a day.
This wasn't possible until this year. I'm sure things will continue to improve little by little in the future. I can imagine all-in-one services that can do most of the above under one label (that seems to be what the GitHub repo you linked is working toward) but I'm not sure what the business model looks like for people with that capability. We'll have to come up with new value models to adjust to it.