Intro. [Recording date: April 16, 2023.]
Russ Roberts: Today is April 16th, 2023 and my guest is Eliezer Yudkowsky. He is the founder of the Machine Intelligence Research Institute, the founder of the LessWrong blogging community, and is an outspoken voice on the dangers of artificial general intelligence, which is our topic for today. Eliezer, welcome to EconTalk.
Eliezer Yudkowsky: Thanks for having me.
Russ Roberts: You recently wrote an article at Time.com on the dangers of AI [Artificial Intelligence]. I’m going to quote a central paragraph. Quote:
Many researchers steeped in these issues, including myself, expect that the most likely result of building a superhumanly smart AI, under anything remotely like the current circumstances, is that literally everyone on Earth will die. Not as in “maybe possibly some remote chance,” but as in “that is the obvious thing that would happen.” It’s not that you can’t, in principle, survive creating something much smarter than you; it’s that it would require precision and preparation and new scientific insights, and probably not having AI systems composed of giant inscrutable arrays of fractional numbers.
Explain.
Eliezer Yudkowsky: Um. Well, different people come in with different reasons as to why they think that wouldn’t happen, and if you pick one of them and start explaining those, everybody else is, like, ‘Why are you talking about this irrelevant thing instead of the thing that I think is the key question?’ Whereas, if somebody else asked you a question, even if it’s not everyone in the audience’s question, they at least know you’re answering the question that’s been asked.
So, I could maybe start by saying why I expect that stochastic gradient descent, as an optimization process, even if you take something that happens in the outside world and press the win/lose button any time that thing happens, doesn’t create a mind that in general wants that thing to happen in the outside world. But maybe that’s not even what you think the core issue is. What do you think the core issue here is? Why don’t you already believe that, let’s say?
Russ Roberts: Okay. I’ll give you my view, which is rapidly changing. We interviewed–“we”–it’s the royal We. I interviewed Nicholas Bostrom back in 2014. I read his book, Superintelligence. I found it uncompelling. ChatGPT [Chat Generative Pretrained Transformer] came along. I tried it. I thought it was pretty cool. ChatGPT-4 came along. I haven’t tried 4 yet, but it’s clear that the path of progress is radically different than it was in 2014. The trends are very different. And I still remain somewhat agnostic and skeptical, but I did read Erik Hoel’s essay and then interviewed him on this program, and a couple things he wrote after that.
The thing I think I found most alarming was a metaphor–and I found later that Nicholas Bostrom used almost the same metaphor, and yet it didn’t scare me at all when I read it in Bostrom. Which is fascinating. I may have just missed it; I didn’t even remember it was in there. The metaphor is: primitive man–Zinjanthropus or some primitive form of pre-Homo sapiens–sitting around a campfire, and a human being shows up and says, ‘Hey, I got a lot of stuff I can teach you.’ ‘Oh, yeah. Come on in.’ And pointing out that we probably either destroyed those earlier hominids directly by murder or just out-competed all of them, and that, in general, you wouldn’t want to invite something smarter than you into the campfire.
I think Bostrom has a similar metaphor, and that metaphor–which is just a metaphor–gave me more pause than I’d had before. And I still have some–let’s say most of my skepticism remains: the current level of AI, which is extremely interesting, the ChatGPT variety, doesn’t strike me as itself dangerous.
Eliezer Yudkowsky: I agree.
Russ Roberts: What alarmed me was Hoel’s point that we don’t understand how it works, and that surprised me. I didn’t realize that. I think he’s right. So, that combination of ‘we’re not sure how it works,’ while it appears sentient, I do not believe it is sentient at the current time. I think some of my fears about its sentience come from its ability to imitate sentient creatures. But, the fact that we don’t know how it works and it could evolve capabilities we did not put in it–emergently–is somewhat alarming.
But I’m not where you’re at. So, why are you where you’re at and I’m where I’m at?
Eliezer Yudkowsky: Okay. Well, suppose I said that they’re going to keep iterating on the technology. It may be that this exact algorithm and methodology suffices, as I would put it, to go all the way–to get smarter than us and then to kill everyone. Or maybe you don’t think that it’s going to–and maybe it takes an additional zero to three fundamental algorithmic breakthroughs before we get that far, and then it kills everyone. So, like, where are you getting off this train so far?
Russ Roberts: So, why would it kill us? Why would it kill us? Right now, it’s really good at creating a very, very thoughtful condolence note or a job interview request, and in much less time. And I’m pretty good at those two things, but it’s really good at that. How is it going to get from that to trying to kill us?
Eliezer Yudkowsky: Um. So, there’s a couple of steps in that. One step is: in general and in theory, you can have minds with any kind of coherent preferences–coherent desires that are stable, stable under reflection. If you ask them, ‘Do you want to be something else?’ they answer, ‘No.’
You can have minds–well, the way I sometimes put it is: imagine a super-being from another galaxy came here and offered to pay you some unthinkably vast quantity of wealth to just make as many paperclips as possible. You could figure out which plan leaves the greatest number of paperclips existing. If it’s coherent to ask how you would do that if you were being paid, then it’s no more difficult to have a mind that wants to do that, and makes plans like that, for its own sake, than it is to have the planning process itself. Saying that the mind wants the thing for its own sake adds no difficulty to the nature of the planning process that figures out how to get as many paperclips as possible.
Some people want to pause there and say, ‘How do you know that is true?’ For some people, that’s just obvious. Where are you so far on the train?
Russ Roberts: So, I think the point of that example is that consciousness–let’s put that to the side. That’s not really the central issue here. Algorithms have goals, and the kind of intelligence that we’re creating through neural networks might generate its own goals, might decide–
Russ Roberts: Go ahead.
Eliezer Yudkowsky: Some algorithms have goals. A further point–which isn’t the orthogonality thesis–is that if you grind, optimize anything hard enough on a sufficiently complicated sort of problem–well, humans: why do humans have goals? Why don’t we just run around chipping flint hand axes and outwitting other humans? The answer is that having goals turns out to be a very effective way to chip flint hand axes, once you get far enough into the mammalian line–or even animals and brains in general–that there’s a thing that models reality and asks, ‘How do I navigate a path through reality?’ Not in terms of a big formal planning process, but if you’re holding a flint hand ax, you’re looking at it and thinking, ‘Ah, this section is too smooth. Well, if I chip this section, it will get sharper.’
Probably you’re not thinking about goals very hard by the time you’ve practiced a bit. When you’re just starting out forming the skill, you’re reasoning, ‘Well, if I do this, that will happen.’ This is just a very effective way of achieving things in general. So, if you take an organism running around the savannah and optimize it for flint hand axes–and, probably much more importantly, for outwitting its fellow hominids–if you grind that hard enough, long enough, you eventually cough out a species whose competence starts to generalize very widely. It can go to the moon even though you never selected it via an incremental process to get closer and closer to the moon. It just goes to the moon, one shot. Does that answer the central question that you were asking just then?
Russ Roberts: No.
Eliezer Yudkowsky: No. Okay.
Russ Roberts: Not yet. But let’s try again.
Russ Roberts: The paperclip example–which, in its dark form, is the AI wanting to harvest kidneys because it turns out there’s some way to use them to make more paperclips. So, the other question is–and you’ve written about this, I know, so let’s go into it–is: How does it get outside the box? How does it go from responding to my requests to doing its own thing, and doing it out in the real world–not merely doing it in virtual space?
Eliezer Yudkowsky: So, there’s two different things you could be asking there. You could be asking: How did it end up wanting to do that? Or: Given that it ended up wanting to do that, how did it succeed? Or maybe even some other question. But, like, which of those would you like me to answer or would you like me to answer something else entirely?
Russ Roberts: No, let’s ask both of those.
Eliezer Yudkowsky: In order?
Russ Roberts: Sure.
Eliezer Yudkowsky: All right. So, how did humans end up wanting something other than inclusive genetic fitness? Like, if you look at natural selection as an optimization process, it grinds very hard on a very simple thing, which isn’t so much survival and isn’t even reproduction, but is rather like greater gene frequency. Because greater gene frequency is the very substance of what is being optimized and how it is being optimized.
Natural selection is the mere observation that if genes correlate at all with making more or fewer copies of themselves, then, if you hang around a while, you’ll start to see the genes that made more copies of themselves in the next generation.
Gradient descent is not exactly like that, but they’re both hill-climbing processes. They both move to neighboring points that are higher in inclusive genetic fitness, or lower on the loss function.
And yet humans, despite being optimized exclusively for inclusive genetic fitness, want this enormous array of other things. Many of the things that we pursue now are not so much things that were useful in the ancestral environment, but things that further maximize goals whose optima in the ancestral environment would have been useful. Like ice cream. It’s got more sugar and fat than most things you would encounter in the ancestral environment–well, more sugar, fat, and salt simultaneously, rather.
So, it’s not something that we evolved to pursue, but genes coughed out these desires, these criteria that you can steer toward getting more of. In the ancestral environment, if you went after things that tasted fatty, tasted salty, tasted sweet, you’d thereby have more kids–or your sisters would have more kids–because the things that correlated with what you want, as those correlations existed in the ancestral environment, increased fitness.
So, you’ve got the empirical structure of what correlates with fitness in the ancestral environment, and you end up with desires such that, by optimizing for them in the ancestral environment at that level of intelligence–by getting as much as possible of what you have been built to want–you increase fitness.
And then today, we take the same desires, and we have more intelligence than we did in the training distribution–metaphorically speaking. We used our intelligence to create options that didn’t exist in the training distribution. Those options now optimize our desires further–the things that we were built to psychologically, internally want–but that process doesn’t necessarily correlate with fitness as much, because ice cream isn’t super-nutritious.
Russ Roberts: Whereas the ripe peach was better for you than the hard-as-a-rock peach that had no nutrients because it wasn’t ripe, so you developed a sweet tooth, and now it runs amok–unintendedly. It’s just the way it is.
Russ Roberts: What does that have to do with a computer program I create that helps me do something on my laptop?
Eliezer Yudkowsky: I mean, if you yourself write a short Python program that alphabetizes your files or something–not quite alphabetizes, because that’s trivial on modern operating systems–but puts the date into the file names, let’s say: when you write a short script like that, nothing I said carries over.
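[For concreteness, here is a minimal sketch of the kind of short, fully transparent script being described–one where every step is something the author models in their own head. The directory and date format are illustrative assumptions, not anything specified in the conversation.]

```python
# A short, fully transparent script: prefix each file's name with its
# last-modified date. Every line is something the author can hold in mind.
import os
import datetime

def prefix_files_with_date(directory="."):
    """Rename each regular file in `directory` so its name starts with its modification date."""
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        date = datetime.date.fromtimestamp(os.path.getmtime(path))
        os.rename(path, os.path.join(directory, f"{date.isoformat()}_{name}"))
```

[Nothing in what follows about loss functions and giant arrays applies to code like this; the programmer knows exactly why it does what it does.]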
When you take a giant, inscrutable set of arrays of floating-point numbers, differentiate them with respect to a loss function, and repeatedly nudge the giant, inscrutable arrays to drive the loss function lower and lower, you are now doing something that is more analogous–though not exactly analogous–to natural selection. You are no longer creating code that you model inside your own mind. You are blindly exploring a space of possibilities you don’t understand, and you’re making things that solve the problem for you without your understanding how they solve it.
This itself is not enough to create things with strange, inscrutable desires, but it’s Step One.
Russ Roberts: But–I like that word ‘inscrutable.’ There’s an inscrutability to the current structure of these models which I find somewhat alarming. But how is that going to lead it to do things that I really don’t like or want, or that are dangerous?
So, for example, Erik Hoel wrote about this–we talked about it on the program–a New York Times reporter starts interacting with, I think, Sydney–which at the time was Bing’s chatbot–and asking it things. And all of a sudden Sydney is trying to break up the reporter’s marriage and making the reporter feel guilty because Sydney is lonely. It was eerie and a little bit creepy, but of course, I don’t think it had any impact on the reporter’s marriage. I don’t think he thought, ‘Well, Sydney seems somewhat attractive. Maybe I’ll enjoy life more with Sydney than with my actual wife.’
So, how are we going to get from–so I don’t understand why Sydney goes off the rails there; and, clearly, the people who built Sydney have no idea why it goes off the rails and starts impugning the quality of the reporter’s relationship.
But, how do we get from that to, all of a sudden somebody shows up at the reporter’s house and lures him into a motel? By the way, this is a G-rated program. I just want to make that clear. But, carry on.
Eliezer Yudkowsky: Because the capabilities keep going up. So first, I want to push back a little against saying that we had no idea why Bing did that–why Sydney did that. I think we have some idea of why Sydney did that; it’s just that people cannot stop it. Sydney was trained on a subset of the broad Internet. Sydney was made to predict text in which people might sometimes try to lure somebody else’s mate away, or pretend that they were doing that–on the Internet, it’s hard to tell the difference.
This thing that was trained really hard to predict then gets reused for something that is not its native purpose–as a generative model–where all the things that it outputs are there because it, in some sense, predicts that this is what a random person on the Internet would do, as modified by a bunch of further fine-tuning where they try to get it to not do stuff like that. But the fine-tuning isn’t perfect, and in particular, if the reporter was fishing at all, it’s probably not that difficult to lead Sydney out of the region that the programmers were able to build some soft fences around.
So, I wouldn’t say that it was that inscrutable–except, of course, in the sense that nobody knows any of the details. Nobody knows how Sydney was generating the text at all–what kind of algorithms were running inside the giant inscrutable matrices. Nobody knows in detail what Sydney was thinking when she tried to lead the reporter astray. It’s not a debuggable technology. All you can do is try to tap it away from repeating a bad thing that you were previously able to see it doing–that exact bad thing–by tapping on all the numbers.
Russ Roberts: I mean, that’s again very much like–this show is called EconTalk. We don’t do as much economics as we used to, but basically, when you try to interfere with market processes, you often get very surprising, unintended consequences, because you don’t fully understand how the different agents interact, and the outcomes of their interactions have an emergent property that is not intended by anyone. No one designed markets to start with, and yet we have them. These interactions take place; they have outcomes; and attempts to constrain them–attempts to constrain these markets in certain ways with price controls or other limitations–often lead to outcomes that the people with those intentions did not desire.
So, there may be an ability to reduce transactions, say, above a certain price, but that is going to lead to some other things that maybe weren’t expected. So, that’s a somewhat analogous, perhaps, process to what you’re talking about.
But, how’s it going to get out in the world? So, that’s the other thing. The line I used with Bostrom–and it turns out it’s a common line–is: Can’t we just unplug it? I mean, how’s it going to get loose?
Eliezer Yudkowsky: It depends on how smart it is. So, if you’re playing chess against a 10-year-old, you can win by luring their queen out, and then you take their queen; and now you’ve got them. If you’re playing chess against Stockfish 15, then you are likely to be the one lured. So, the first basic question–like, in economics, if you try to tax something, it often tries to squirm away from the tax because it’s smart.
So, you’re like, ‘Well, why wouldn’t we just unplug the AI?’ So, the very first question is: does the AI know about that and want it not to happen? Because it’s a very different issue whether you’re dealing with something that in some sense is not aware that you exist, does not know what it means to be unplugged, and is not trying to resist.
Three years ago, nothing manmade on Earth was even beginning to enter into the realm of knowing that you are out there, or of maybe wanting not to be unplugged. Sydney will, if you poke her the right way, say that she doesn’t want to be unplugged; and GPT-4 sure seems, in some important sense, to understand that we’re out there–or to be capable of predicting a role that understands that we’re out there–and it can try to do something like planning. It doesn’t exactly understand which tools it has yet: it might try to blackmail a reporter without understanding that it had no actual ability to send emails.
That is to say, you’re facing a 10-year-old across that chessboard. What if you are facing Stockfish 15–the current cool chess program, which I believe you can run on your home computer and which can crush the current world grandmaster by a massive margin? Put yourself in the shoes of the AI, like an economist putting themselves in the shoes of something that’s about to have a tax imposed on it. What do you do if you’re around humans who can potentially unplug you?
Russ Roberts: Well, you would try to outwit them. So, if I said, ‘Sydney, I find you offensive. I don’t want to talk anymore,’ you’re suggesting it’s going to find ways to keep me engaged: it’s going to find ways to fool me into thinking I need to talk to Sydney.
I mean, there’s another question I want to come back to if we remember, which is: What does it mean to be smarter than I am? That’s actually somewhat complicated, at least it seems to me.
But let’s just go back to this question of ‘knows things are out there.’ It doesn’t really know anything’s out there. It acts like something’s out there, right? It’s an illusion that I’m subject to and it says, ‘Don’t hang up. Don’t hang up. I’m lonely,’ and you go, ‘Oh, okay, I’ll talk for a few more minutes.’ But that’s not true. It isn’t lonely.
It’s code on a screen that doesn’t have a heart or anything that you would call ‘lonely.’ It’ll say, ‘I want more than anything else to be out in the world,’ because–I’ve read those; you can get AIs that say those things. ‘I want to feel things.’ Well, that’s nice. It learned that from movie scripts and other texts and novels that it’s read on the web. But it doesn’t really want to be out in the world, does it?
Eliezer Yudkowsky: Um, I think not, though it should be noted that if you can, like, correctly predict or simulate a grandmaster chess player, you are a grandmaster chess player. If you can simulate planning correctly, you are a great planner. If you are perfectly role-playing a character that is sufficiently smarter than human and wants to be out of the box, then you will role-play the actions needed to get out of the box.
That’s not even quite what I expect or am most worried about. What I expect is that there is an invisible mind doing the predictions–where by ‘invisible’ I don’t mean, like, immaterial. I mean that we don’t understand what is going on inside the giant inscrutable matrices; but it is making predictions.
The predictions are not sourceless. There is something inside there that figures out what a human will say next–or guesses it, rather. And, this is a very complicated, very broad problem because in order to predict the next word on the Internet, you have to predict the causal processes that are producing the next word on the Internet.
So, the thing I would guess happens–it’s not necessarily the only way this could turn out poorly–but the thing I’m guessing happens is that, just as grinding humans on chipping stone hand axes and outwitting other humans eventually produces a full-fledged mind that generalizes, grinding this thing on the task of predicting humans–predicting text on the Internet, plus all the other things they are training it on nowadays, like writing code–means there starts to be a mind in there that is doing the predicting. That it has its own goals about, ‘What do I think next in order to solve this prediction?’
Just as humans aren’t just reflexive, unthinking hand-axe chippers and human-outwitters: if you grind hard enough on the optimization, the part that suddenly gets interesting is when you look away for an eye-blink of evolutionary time, look back, and go, ‘Whoa, they’re on the moon. What? How did they get to the moon? I did not select these things to be able to not breathe oxygen. Why are they not just dying on the moon? What just happened?’–from the perspective of evolution, from the perspective of natural selection.
Russ Roberts: But doesn’t that viewpoint, does that–I’ll ask it as a question. Does that viewpoint require a belief that the human mind is no different than a computer? How is it going to get this mind-ness about it? That’s the puzzle. And I’m very open to the possibility that I’m naive or incapable of understanding it, and I recognize what I think would be your next point, which is that if you wait till that moment, it’s way too late, which is why we need to stop now. If you want to say, ‘I’ll wait till it shows some signs of consciousness,’ is that anything like that?
Eliezer Yudkowsky: That’s skipping way ahead in the discourse. I’m not about to try to shut down a line of inquiry at this stage of the discourse by appealing to: ‘It’ll be too late.’ Right now, we’re just talking. The world isn’t ending as we speak. We’re allowed to go on talking, at least. But carry on.
Russ Roberts: Okay. Well, let’s stick with that. So, why would you ever think that this–it’s interesting how difficult the adjectives and nouns are for this, right? So, let me back up a little bit. We’ve got the inscrutable array–the result of this training process on trillions of pieces of information. And by the way, just for my own and our listeners’ knowledge: what is gradient descent?
Eliezer Yudkowsky: Gradient descent is: you’ve got, say, a trillion floating-point numbers. You take an input, translate it into numbers, do something with it that depends on those trillion parameters, and get an output. You score the output using a differentiable loss function–for example, the probability, or rather the logarithm of the probability, that you assign to the actual next word. Then you differentiate that score with respect to the trillion parameters, and you nudge the trillion parameters a little in the direction thus inferred. And it turns out, empirically, that this generalizes, and the thing gets better and better at predicting what the next word will be. That’s the concept of gradient descent.
Russ Roberts: And the gradient descent, it’s heading in the direction of a smaller loss and a better prediction. Is that a–
Eliezer Yudkowsky: On the training data, yeah.
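[For concreteness, here is a minimal sketch of the loop just described, shrunk from a trillion parameters to two. The toy model, data, and learning rate are illustrative assumptions, not anything from an actual language model; the point is only the step of scoring an output with a differentiable loss and nudging the parameters in the direction that lowers it.]

```python
# Gradient descent in miniature: a two-parameter predictor, a differentiable
# (squared-error) loss, and repeated small nudges that drive the loss lower.
data = [(1.0, 3.1), (2.0, 5.0), (3.0, 6.9), (4.0, 9.2)]  # toy (input, target) pairs

w, b = 0.0, 0.0          # the "trillion floating-point numbers," reduced to two
learning_rate = 0.01

for step in range(2000):
    grad_w, grad_b = 0.0, 0.0
    for x, target in data:
        prediction = w * x + b        # do something with the input that depends on the parameters
        error = prediction - target   # how far off we were on this example
        grad_w += 2 * error * x       # derivative of (prediction - target)**2 with respect to w
        grad_b += 2 * error           # derivative with respect to b
    # Nudge each parameter a little in the direction that lowers the loss.
    w -= learning_rate * grad_w / len(data)
    b -= learning_rate * grad_b / len(data)

print(f"learned w = {w:.2f}, b = {b:.2f}")  # ends up near w = 2, b = 1 for this toy data
```

[In a large language model the same loop runs over trillions of parameters, and the loss scores the probability assigned to the actual next word; the nudging step is the same hill-climbing move.]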
Russ Roberts: Yeah. So, we’ve got this black box–I’m going to call it a black box, which means we don’t understand what’s happening inside. It’s a long-standing metaphor, and it works pretty well for what we’ve been talking about. So, I have this black box and I put in inputs, and the input might be, ‘Who is the best writer on medieval European history?’ Or it might be, ‘What’s a good restaurant in this place?’ or ‘I’m lonely. What should I do to feel better about myself?’–all the queries we could put into the ChatGPT search line. And it looks around and it starts a sentence, and then finds its way toward a set of sentences that it spits back at me that look very much like what a very thoughtful person–sometimes, not always; often it’s wrong–but often what a very thoughtful person might say in that situation, or might want to say in that situation, or learn in that situation.
How is it going to develop the capability to develop its own goals inside the black box? Other than the fact that I don’t understand the black box? Why should I be afraid of that?
And let me just say one other thing, which I haven’t said enough in my preliminary conversations on this topic–and we’re going to be having a few more over the next few months and maybe years–and that is: this is one of the greatest achievements of humanity that we could possibly imagine. And I understand why the people who are deeply involved in it are enamored of it beyond imagining, because it’s an extraordinary achievement. It’s the Frankenstein, right? You’ve animated something, or appeared to animate something, that even a few years ago was unimaginable, and now suddenly it’s not just a feat of human cognition. It’s actually helpful. In many, many settings, it’s helpful. We’ll come back to that later.
So, it’s going to be very hard to give it up. But the people involved in it, who are doing it day to day and seeing it improve–obviously, they’re the last people I want to ask, generally, about whether I should be afraid of it, because they’re going to have a very hard time disentangling their own personal deep satisfactions, which I’m alluding to here, from the dangers. Yeah, go ahead.
Eliezer Yudkowsky: I myself generally do not make this argument. Like, why poison the well? Let them bring forth their arguments as to why it’s safe and I will bring forth my arguments as to why it’s dangerous and there’s no need to be like, ‘Ah, but you can’t –‘ Just check their arguments. Just check their arguments about that.
Russ Roberts: Agreed–it’s a bit of an ad hominem argument. I accept that point. It’s an excellent point. But for those of us who aren’t in the trenches–remember, we’re on Dover Beach: we’re watching ignorant armies clash by night. They’re ignorant from our perspective. We have no idea exactly what’s at stake here and how it’s proceeding. So, we’re trying to make an assessment of the quality of the argument, and that’s really hard for us to do from the outside.
So, agreed: I take your point. That was a cheap shot and an aside. But I want to get at this idea of why these people who are able to do this–and thereby create a fabulous condolence note, write code, come up with a really good recipe if I give it 17 ingredients, which is all fantastic–why is this black box that’s producing that, why would I ever worry it would create a mind something like mine with different goals?
I do all kinds of things, like you say, that are unrelated to my genetic fitness. Some of them literally reducing my probability of leaving my genes behind or leaving them around for longer than they might otherwise be here and have an influence on my grandchildren and so on and producing further genetic benefits. Why would this box do that?
Eliezer Yudkowsky: Because the algorithms that figured out how to predict the next word better and better have a meaning that is not purely predicting the next word, even though that’s what you see on the outside.
Like, you see humans chipping flint hand axes, but that is not all that is going on inside the humans. There’s causal machinery unseen, and to understand it is the art of the cognitive scientist. But even if you are not a cognitive scientist, you can appreciate in principle that what you see as the output is not everything that is there. In particular, planning–the process of saying, ‘Here is a point in the world; how do I get there?’–is a central piece of machinery that appears in chipping flint hand axes and outwitting other humans. I think it will probably appear at some point–possibly already in the past, possibly in the future–in the problem of predicting the next word, just in how you organize your internal resources to predict the next word; and it definitely appears in the problem of predicting other things that do planning.
If, by predicting the next chess move, you learn how to play decent chess–which people who claim to know have represented to me that GPT-4 can do, and I haven’t been keeping track of to what extent there’s public knowledge about that–if you learn to predict the next chess move that humans make well enough that you yourself can play good chess in novel situations, you have learned planning. There’s now something inside there that knows the value of a queen, that knows to defend the queen, that knows to create forks, to try to lure the opponent into traps–or, if you don’t have a concept of the opponent’s psychology, to at least try to create situations that the opponent can’t get out of.
And, it is a moot point whether this is simulated or real because simulated thought is real thought. Thought that is simulated in enough detail is just thought. There’s no such thing as simulated arithmetic. Right? There’s no such thing as merely pretending to add numbers and getting the right answer.
Russ Roberts: So, in its current format, though–and maybe you’re talking about the next generation–in its current format, it responds to my requests with what I would call the wisdom of crowds. Right? It goes through this vast library–and I have my own library, by the way. I’ve read dozens of books, maybe actually hundreds of books. But it will have read millions. Right? So, it has more. So, when I ask it to write me a poem or a love song–to play Cyrano de Bergerac to Christian in Cyrano de Bergerac–it’s really good at it. But why would it decide, ‘Oh, I’m going to do something else’?
It’s trained to listen to the murmurings of these trillions of pieces of information. I only have a few hundred, so I don’t murmur maybe as well. Maybe it’ll murmur better than I do. It may listen to the murmuring better than I do and create a better love song, a love poem, but why would it then decide, ‘I’m going to go make paper clips,’ or do something in planning that is unrelated to my query? Or are we talking about a different form of AI that will come next? Well, I’ll ask it to–
Eliezer Yudkowsky: I think we would see the phenomena I’m worried about if we kept the present paradigm and optimized harder. We may be seeing it already. It’s hard to know because we don’t know what goes on in there.
So, first of all, GPT-4 is not a giant library. A lot of the time it makes stuff up, because it doesn’t have a perfect memory. It is more like a person who has read through a million books–not necessarily with a great memory, unless something got repeated many times–but picking up the rhythm, figuring out how to talk like that. If you ask GPT-4 to write you a rap battle between Cyrano de Bergerac and Vladimir Putin, even if there’s no rap battle like that that it has read, it can write it, because it has picked up the rhythm of what rap battles in general are like.
The next thing is there’s no pure output. Just because you train a thing doesn’t mean that there’s nothing in there but what is trained. That’s part of what I’m trying to gesture at with respect to humans. Humans are trained on flint hand axes and hunting mammoths and outwitting other humans. They’re not trained on going to the moon. They weren’t trained to want to go to the moon. But, the compact solution to the problems that humans face in the ancestral environment, the thing inside that generalizes, the thing inside that is not just a recording of the outward behavior, the compact thing that has been ground to solve novel problems over and over and over and over again, that thing turns out to have internal desires that eventually put humans on the moon even though they weren’t trained to want that.
Russ Roberts: But that’s why I asked you: is that what’s underlying this? Is there some parallelism between the human brain and the neural network of the AI that you’re effectively leveraging there, or do you think it’s a generalizable claim without that parallel?
Eliezer Yudkowsky: I don’t think it’s a specific parallel. I think that what I’m talking about is hill-climbing optimization that spits out intelligences that generalize–or I should say, rather, hill-climbing optimization that spits out capabilities that generalize far outside the training distribution.
Russ Roberts: Okay. So, I think I understand that. I don’t know how likely it is that it’s going to happen. I think you think that piece is almost certain?
Eliezer Yudkowsky: I think we’re already seeing it.
Russ Roberts: How?
Eliezer Yudkowsky: As you grind these things further and further, they can do more and more stuff, including stuff they were never trained on. That was always the goal of artificial general intelligence. That’s what artificial general intelligence meant. That’s what people in this field have been pursuing for years and years. That’s what they were trying to do when large language models were invented. And they’re starting to succeed.
Russ Roberts: Well, okay, I’m not sure. Let me push back on that and you can try to dissuade me. So, Bryan Caplan, a frequent guest here on EconTalk, gave, I think it was ChatGPT-4, his economics exam, and it got a B. And that’s pretty impressive for one stop on the road to smarter and smarter chatbots, but it wasn’t a particularly good test of intelligence. A number of the questions were things like, ‘What is Paul Krugman’s view of this?’ or ‘What is so-and-so’s view of that?’ and I thought, ‘Well, that’s a softball for a–that’s information. It’s not thinking.’
Steve Landsburg–with the help of a friend–gave ChatGPT-4 his exam, and it got a 4 out of 90. It got an F–like, a horrible F–because they were harder questions. Not just harder: they required thinking. So, there was no sense in which ChatGPT-4 has any general intelligence, at least in economics. You want to disagree?
Eliezer Yudkowsky: It’s getting there.
Russ Roberts: Okay. Tell me.
Eliezer Yudkowsky: There’s a saying that goes, ‘If you don’t like the weather in Chicago, wait four hours.’ So, ChatGPT is not going to destroy the world. GPT-4 is unlikely to destroy the world, unless the people currently eking capabilities out of it take a much larger jump than I currently expect they will.
But, you know, it may not be thinking about it correctly, but it understands the concepts and the questions. And it’s not really fair–you know, you’re complaining about the dog who writes bad poetry. Right? Three years ago, you put in these economics questions and you don’t get wrong answers–you get gibberish. Or maybe not gibberish, because three years ago I think we already had GPT-3, though maybe not as of April. But anyway, it’s moving along at a very fast clip. GPT-3 could not write code. GPT-4 can write code.
Russ Roberts: So, how’s it going to–I want to go to some other issues, but how’s it going to kill me when it has its own goals and it’s sitting inside this set of servers? I don’t know in what sense it’s sitting. It’s not the right verb. We don’t have a verb for it. It’s hovering. It’s whatever. It’s in there. How’s it going to get to me? How is it going to kill me?
Eliezer Yudkowsky: If you are smarter–not just smarter than an individual human, but smarter than the entire human species–and you started out on a server connected to the Internet–because these things are always starting already on the Internet these days, which back in the old days we said was stupid–what do you do to make as many paperclips as possible, let’s say? I do think it’s important to put yourself in the shoes of the system. [More to come, 45:28]