The impressive success of current "Foundation" or "Frontier" models, such as GPT-4 and Bard, has led some to claim that we are on our way to the promised land: artificial general intelligence (AGI). The idea is that, if we just keep growing these large language models, one will finally emerge as truly intelligent. Some have even gone so far as to say they already are AGI--that what they are doing is the general intelligence we have been seeking for decades.
This view--that AGI is here--has received an influential defense from Blaise Aguera y Arcas and Peter Norvig, who argue in NOEMA that artificial general intelligence has arrived--at least in a nascent form. They contend we have passed the era of narrow AI and have finally encountered a machine that can perform "any information task" we might imagine, though imperfectly. We are still a long way from an artificial superintelligence, to be sure; but we have created an intelligence akin to our own, capable of performing a host of different actions, including some it hasn't been trained on.
But is this accurate? Are we really closer to the goal? With Aguera y Arcas and Norvig, I agree the question shouldn't be a matter of semantics; if someone wants to say the General Problem Solver in 1957 was the first AGI, I don't really care about the term. The point is, rather, that AGI is a claim about capacities (as the authors note), and we should ask, are we really seeing a machine that can perform, at least in a limited form, any task? I think the answer is no--and it isn't clear where these Frontier models are on the path to AGI.
Carving an Ox
In the Zhuangzi, there is the story of Cook Ding, a butcher so impressive that people come to watch him carve up meat:
Cook Ding was carving an ox carcass for Lord Wenhui. With each touch of his hand, heave of his shoulder, step of his feet, thrust of his knee – whop! whish! – he wielded his knife with a whoosh, and every move was in rhythm. It was as though he were performing the Dance of the Mulberry Grove or keeping to the beat of the Constant Source music. “Ah, marvelous!” said Lord Wenhui. “Surely this is the acme of skill!”
The Zhuangzi introduces Cook Ding as a counterpoint to a certain conception of knowledge as something found in books. Ding is impressive precisely because he has achieved a kind of knowledge--know-how--that cannot be written down. It instead involves practice, patience, and keen insight: learning how to engage with the world.
In this regard, the Zhuangzi warns, language and linguistic knowledge are a kind of fool's gold--a fraudulent type of knowledge rather than the genuine article. The text notes, "When the men of old died, they took with them the things that couldn’t be handed down. So what you are reading there [in your book] must be nothing but the chaff and dregs of the men of old.” If knowledge is know-how, it can't be found in books; words are just the wrong kind of tool to carry that knowledge.
Worse, books present knowledge and its objects as detached, separate from the humans who must know and engage with them. As Cook Ding complains, “At the beginning, when I first began carving up oxen, all I could see was the whole carcass [...but] now I meet it with my spirit and don’t look with my eyes. Perception and understanding cease and spirit moves as it will.” If true knowledge is embodied and embedded and engaged with the world as it is, the knowledge of books is separate and alien. It is not clear how to get from what we read to what we should do with it. The illusion, Zhuangzi warns, is thinking we know how to engage with the world because we have read about it in a book.
Frontier Models and the Illusions of Language
What should we take away from this tale? Part of it is that Frontier models are just the wrong kind of thing for knowing much of what there is to know about the world. No metric is perfect, to be sure, but can an AI that can't drive really count as all that general? Driving has long been promised as one of the holy grails of narrow AI, and so we should expect a general AI to be able to pick up the skill. But Frontier models aren't even on the path to doing this. Shouldn't one standard for AGI be that it can pick up narrow skills on its own?
To be sure, this is part of what we expect of humans. A human is expected to do well not just on the chemistry exam but also on the practicum. Should we be impressed with models that are failing the course because they aren't even showing up to the lab? Adding a camera to the model might help, but you only get partial credit for recognizing a beaker or a Bunsen burner. Full credit requires using them.
More broadly, we should ask: what have these models learned from all the books they've consumed? We should be explicit: they haven't read a word of them, in the sense we mean. In part, this is because reading, for a human, draws on all the background knowledge that makes up our understanding of the world. It is because we already have a good grasp of the nature of things that reading about Jack and Jill falling down the hill is intuitive. And we need to know a great deal to follow the instructions in a chemistry text.
But, in larger part, it is because we don't train Frontier models to read at all. We train them to memorize. They fail at memorizing because there is too much text, presented too piecemeal, and that failure produces novel generalizations--helping the model fill in the gaps and figure out how to continue a passage. In the arithmetic case, this means the model can't memorize every single possible addition problem, so it instead comes up with the appropriate generalization--the appropriate way to fill in the answer when given the question. The same thing is happening across the board, allowing these models to acquire the knowledge needed to complete all kinds of sentences in the right way.
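To make the point concrete, here is a minimal sketch of the next-token objective these models are trained on--assuming PyTorch, a toy vocabulary, and a single embedding-plus-linear layer standing in for the real network. None of this is any lab's actual code; it is only meant to show that the training signal is "continue the text," not "do the thing the text describes."

```python
# Toy sketch of next-token-prediction training (illustrative only).
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
head = nn.Linear(d_model, vocab_size)   # stand-in for a full transformer stack

tokens = torch.randint(0, vocab_size, (1, 16))      # a toy "passage" of token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]     # shift by one: predict the next token

logits = head(embed(inputs))                        # scores over the vocabulary at each position
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()   # the only feedback the model ever gets: say the next word better
```

Whatever generalizations emerge, they emerge in service of that single objective: producing the next word.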
This is impressive, and it is a general skill, so it has an AGI feel to it. But we need to be clear: the model is learning how to say the thing. The goal is for the model to answer questions with linguistic answers: you ask it a math question, it can write out an answer. But many skills can't be learned or demonstrated in language. What we wanted was to feed the model a chemistry textbook and have it suddenly know how to do chemistry. Instead, it took in the chemistry textbook and learned to spit back out a garbled chemistry textbook. We didn't want that. We wanted it to take in language in order to know how to use it, not how to repeat it back to us.
One rebuttal might be, "well, Frontier models have lights and sounds, so they are getting closer to doing." That might be true, but the proof will be in the pudding (as they say). If we ask the model to make us an egg and it instead reads us a recipe, it fails. If, instead, it shows us a video, it still fails--even if that failure impresses us more than the recipe did. Someone might come back and say, "well, it is succeeding at the information tasks, if not the non-information tasks." But that is a narrow kind of task. It is basically saying, "it can play around on the internet, type up copy for your website, and program something simple for you." But if that is general intelligence, it is a very low bar.
A Machine that Can Do
Why might we think this low bar is general intelligence, when it isn't getting us any closer to egg-making or car-driving? These "information tasks" are only a small fraction of what humans can do. But, to be sure, they are an elite fraction. If you're a white-collar worker, most of what you do is "information tasks": writing, programming, talking to people, filling in spreadsheets, and so on. It's a lot of the stuff we learn in school by reading the "chaff and dregs of the men of old."
It is kinda nice that white-collar workers are finally seeing their jobs automated and freaking out about it. Blue-collar jobs have been automated for centuries without much complaint from the bourgeoisie. But white-collar work isn't always an exercise of general intelligence. Sometimes it is managing, which just means commanding others to do narrow tasks that require know-how. The white-collar worker can type up a memo telling the construction workers what their tasks are, but you kinda hope the construction workers know how to build a house. And Frontier models aren't building houses, though they can spit out the steps for building one.
This should be humbling (though white-collar workers rarely feel humble). Humans were building shelters before even the first real "information task" came about. Figuring out how to build a workable structure relied on an enormous amount of background knowledge about the world, knowledge which even today can't really be spelled out in language. And this kind of knowledge, in bits and pieces, is seen throughout the animal kingdom. There is plenty of knowledge that took evolution millions of years to tease out. It is complicated and nuanced and really difficult. Lots of "general intelligence" happened before the first word existed.
Which is all to say, Frontier models are doing the easy stuff--the stuff humans invented in the last (roughly) 50,000 years. Inventing it was super difficult. The first human (or, more accurately, collection of humans) to come up with a map or a drawing did something incredible, and the first written recipe would have seemed like magic. But maps and drawings and recipes are simple, easy to learn, and child's play to re-create. The inventing is complex, but not the using or re-using. That stuff can be picked up quickly and turned around rapidly.
Frontier models are users and re-users. They take the difficult stuff--the things humans put a lot of thought, background knowledge, and skill into inventing--and repeat it. It's neat, but it really is the easy stuff. In a sense, you can think of Frontier models as bad managers: they know how to say all the things and tell you what to do, but they don't know how to do the things. No carving oxen, driving cars, running experiments, building houses, or inventing novelty. And it isn't clear that Frontier models bring AI any closer to doing those things.
What we wanted from AGI is something like the blue-collar worker who learns all the tricks and then works their way up to management. They know all the embodied and embedded stuff, and now they are capable not just of doing things but also of explaining them. We wanted engineers to turn into teachers. What we didn't want is something that can explain it all but not do any of it. We got the wrong thing: the prolix guy who babbles about things he has no experience doing. And that just isn't AGI.