An Introduction to Logical Decision Theory for Everyone Else

https://arbital.com/p/5kv

by Eliezer Yudkowsky Jul 27 2016 updated Oct 19 2016

So like what the heck is 'logical decision theory' in terms a normal person can understand?


Q: Hi! I'm an imaginary science journalist, Msr. Querry, who's been assigned--God help me--to write an article on 'logical decision theory', whatever the heck that is.

A: Pleased to meet you. I'm the imaginary Msr. Anson.

Q: Is it too much to hope that 'logical decision theory' is a theory about logical decisions?

A: That is, indeed, too much to hope. More like… it's a theory of decisions about logic.

Q: Decisions about logic. I have to confess, the main thing I remember from logic class is "All men are mortal, Socrates is a man, therefore Socrates is mortal." If I get to make decisions about logic, does that mean I get to decide that Socrates isn't mortal?

A: Well, not in that particular case, but… sort of?

Q: Really? Are you just trying to be nice to the journalist here?

A: This actually is a theory in which our choices can be seen as determining a logical theorem that is much much much more complicated than the theorem about Socrates.

Q: I'm sorry if this is an unfair question, but can you give me an everyday real-world example of what that means and why my readers should care?

A: Sure.

Q: Wait, really?

A: Yeah. Have you ever heard somebody--maybe an economist--talking about how it's 'irrational' to put in the effort to [ vote in elections]?

Q: Because the system is rigged so that we only get to see horrible candidates?

A: The economic argument for voting being 'irrational' is more that one vote has only a very tiny chance of making a difference. Let's say that there's a really close election, like 500,220 votes for Kang versus 500,833 votes for Kodos. Let's say you voted for Kodos. What would have happened if you hadn't voted for Kodos? There would have been 500,832 votes for Kodos. Since large elections are almost never settled by one vote, your vote never makes a difference, in the sense that the end result is exactly the same as if you hadn't voted.

Q: That reasoning sounds fishy to me.

A: I quite agree.

Q: Like, we could apply that reasoning to everyone's vote. Which sounds like nobody is responsible for the results of elections. I can see that any one person's vote only makes a small difference. If you say everyone's vote makes zero difference, where do the results of elections come from?

A: Again, I agree. But the economists, philosophers, computer scientists, and so on, who say that voting is 'irrational', are not just making up a weird argument as they go along. They're giving the conventional answer from within a conventional theory of how decisions work, what's known as "causal decision theory", which was previously adopted for some [ pretty good reasons]. Causal decision theory is the old view that our upstart new idea, "logical decision theory", is challenging for the throne! And the stakes are nothing less than which theory gets to decide whether voting in elections is reasonable!

Q: That doesn't sound like a life-or-death issue.

A: Well, it might influence a number of people who try to do the reasonable thing, and influence what society at large thinks of being reasonable. But insofar as there are any life-or-death issues here, they are weirder and harder to understand, so we should talk about those later. For now, just consider voting in elections.

Q: So does logical decision theory say it's reasonable to vote in elections?

A: It depends on your circumstances, but for a lot of people the answer will be yes. Imagine a cohort of potential voters who are pretty similar to you, thinking thoughts similar to yours, also deciding whether or not to vote. If you're similar enough, you'll probably all decide the same thing, right? So you should ask whether the benefit of everyone like you voting exceeds the cost of everyone like you voting.
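
Here's that comparison as a toy calculation--every number in it (the size of the cohort, the chance the whole bloc swings the election, how much you care about the result, the hassle of voting) is a made-up placeholder, just to show the shape of the reasoning:

# Toy back-of-the-envelope comparison; all numbers are placeholders, not claims about real elections.
cohort_size = 10_000                   # people relevantly similar to you, deciding the same way you do
p_cohort_swings_election = 0.01        # chance that this whole correlated bloc changes the result
value_of_better_outcome = 100_000_000  # how much you value the better result, in dollars
cost_per_voter = 20                    # time and hassle of voting, per person, in dollars

# CDT-style framing: hold everyone else fixed and ask about your one marginal vote.
p_single_vote_decisive = 0.0000001
cdt_benefit = p_single_vote_decisive * value_of_better_outcome

# LDT-style framing: your decision and your cohort's decisions are outputs of the same
# reasoning, so weigh the whole bloc's benefit against the whole bloc's cost.
ldt_benefit = p_cohort_swings_election * value_of_better_outcome
ldt_cost = cohort_size * cost_per_voter

print(f"CDT framing: expected benefit ${cdt_benefit:,.0f} vs cost ${cost_per_voter}")
print(f"LDT framing: expected benefit ${ldt_benefit:,.0f} vs cost ${ldt_cost:,}")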

Q: This sounds suspiciously like common sense.

A: Progress does sometimes consist of realizing that common sense was right after all.

Q: And how is that… a theory of decisions about logic?

A: If there's a large group of people who are in shoes similar to yours, why would you all tend to decide the same thing? How does this correlation between your decision to vote, and other people's decision to vote, exist in the first place? Is there a Master Voter somewhere who telepathically beams their decision into everyone's head?

Q: I'm gonna go out on a limb here and answer no.

A: Then if there's no Master Voter, how do so many people end up doing the same thing?

Q: The same way that millions of people go to the supermarket and buy more milk when their refrigerator runs out of milk, without there being a Master Milker.

A: But again, if there's no Master Milker broadcasting the instructions, how do so many people end up deciding the same thing?

Q: Because they all want milk. Where are you going with this?

A: What's 2 + 2?

Q: 5. I kid, I kid, it's 4.

A: And what's 2 + 2 on the planet Mars?

Q: Also 4.

A: So [ if you took a pocket calculator on Earth and a pocket calculator on Mars], and asked them both to compute 2 + 2, they'd give you the same answer, despite being separated by enough distance that there's no way for a signal to travel from one to the other in time.

Q: Okay…

A: The notion of logical decision theory is that your decision to vote in elections is the same sort of thingy. You should see yourself as deciding a logical fact: whether the reasoning process that you and some other people are using to decide whether to vote outputs 'vote' or 'don't vote'. By contrast, in causal decision theory, which is the current standard, you think about your own vote as if your answer could change separately from the answer given by all the other people thinking similar thoughts. Like a calculator on Mars that imagines it could output '5' while all the other calculators in the universe go on answering '4'.

Q: So you're saying that my deciding whether to vote is like my being a pocket calculator on Mars…?

A: Indeed, Msr. Querry, and you can take that to the bank and smoke it.

Q: I'm not sure I… no, I'm sure I don't understand. I heard you say a lot of sentences that sound sort of related to each other, but I don't think I understand the triumphant conclusion you just drew. Is this something like Kant's Categorical Imperative--a theory that logical people should act the way they'd want all logical people to act, so that everyone will be better off?

A: Again, this is a theory of decisions about logic, not a theory about logical decisions. Let me see if I can give an example that pries apart LDT--logical decision theory--from the intuitive Golden Rule. Suppose you're lost in the desert, about to die of thirst, when a guy in a truck drives up. The driver of this truck is, one, very good at reading faces, and two, a selfish bastard. He asks you whether, if he drives you to the city, you'll pay him \$1000 once you get to the city.

Q: Of course I'd say yes.

A: And then would you actually pay, when you got to the city?

Q: Yeah, because I keep my promises.

A: Suppose you're a selfish bastard too. You would want to keep the \$1000 if you could get away with it. The promise might not be legally enforceable, and even if it were, the driver figures that dragging you through Small Claims Court would cost more than \$1000 worth of their time, so they wouldn't bother. However, you're not very good at controlling your facial expression. If you know you're going to refuse to pay the driver once you actually get to the city, it's going to show up on your face when the driver asks you whether you're going to pay.

Q: Again, not dying in the desert seems like common sense here.

A: Remember, there's a certain sense in which this whole issue here is about what is common sense, or whether to trust common sense. You might want to be able to answer 'Yes' to the driver, and have the driver see from your face that you're telling the truth. But what happens when you're actually in the city? Once you're in the city, you might reason that, hey, you're already in the city, it's not like you'll be teleported back into the desert if you don't pay up. So you'll laugh and walk away. Except, while you're in the desert, you know that's what you'll do once you're in the city. So the driver asks you if you'll pay, you say 'Yes', the driver reads from facial microexpressions that you're lying, and drives off to let you die in the desert.

Q: And that's why human beings evolved senses of honor.

A: Well, maybe. But the question is whether a rational selfish bastard actually does die in the desert. That dilemma is known as Parfit's Hitchhiker, by the way.

Q: I'm guessing LDT says a rational bastard survives?

A: Causal decision theory says a rational agent dies, because causal decision theory says that once you're in the city, refusing to pay can't teleport you back into the desert. So at the moment when you're actually in the city, the rational selfish decision will be to keep the money. If you can't stop yourself from knowing that in the desert or suddenly develop perfect control of your facial expressions… well, rational agents don't always do the best in every decision problem whose outcome is strictly determined by their behavior, alas. Logical decision theory says that two computations of the same algorithm, 'What will I do once I'm inside the city?', are controlling (a) whether the driver takes you to the city, and (b) whether you pay \$1000 on arrival. The part where you pay \$1000 is painful, but it would be a lot more painful to die in the desert! So the decision algorithm, taking into account both consequences of the decision algorithm's output, outputs 'Pay \$1000' in both cases. I mention this example because it pries apart LDT from the Categorical Imperative or the Golden Rule. In Parfit's Hitchhiker, a certain decision algorithm has a single output that appears in two different places, like 2 + 2 having the same answer on Earth and Mars. But both appearances of the decision algorithm are inside one person's mind, rather than being distributed over different voters in an election. You're not acting the way you want other people to act. It's not like you and the driver are both deciding whether to be nice or non-nice using the same algorithm. You are both selfish bastards and you're each making your decisions for different reasons. It's just that even after you reach the city, you still think that the decision algorithm for 'What should a rational selfish agent do inside the city?' can't give two different answers at two different times.
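
If it helps, here's Parfit's Hitchhiker boiled down to a toy model--the utility numbers are arbitrary placeholders; the only structure that matters is that dying is far worse than paying, and that the driver's prediction and your later choice are two readings of one algorithm:

# Toy model of Parfit's Hitchhiker. The utility numbers are arbitrary placeholders;
# all that matters is that dying is far worse than paying $1000.
DIE_IN_DESERT = -1_000_000
PAY_ON_ARRIVAL = -1_000
KEEP_THE_MONEY = 0

def outcome_of_policy(policy_pays_in_city):
    # The driver reads your face perfectly, so he drives you to the city
    # exactly when the algorithm you're running would pay on arrival.
    if policy_pays_in_city:
        return PAY_ON_ARRIVAL       # rescued, then you hand over the $1000
    else:
        return DIE_IN_DESERT        # he sees you'd stiff him, and drives off

# LDT-style scoring: one algorithm, two appearances (the driver's prediction and
# your actual choice in the city), so evaluate whole policies.
for pays in (True, False):
    print(f"policy 'pay once in the city' = {pays}: utility {outcome_of_policy(pays)}")

# CDT-style scoring *after* you're already in the city treats the rescue as fixed history,
# compares PAY_ON_ARRIVAL (-1000) against KEEP_THE_MONEY (0), and keeps the money.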

Q: Sorry if this is a distraction, but since in the real world people aren't perfectly selfish--

A: I'mma stop you right there. Yes, we can contrive examples that aren't about selfish bastards, but they're more complicated examples. The point of using selfish bastards in our dilemmas isn't that rational people have to be selfish. The point is that a selfish bastard is an agent that cares about one thing, instead of caring about money and other people's welfare and reputation and so on. We could talk about the driver of the truck being a mother who desperately needs that \$1000 to pay for antibiotics for her child, and it would work equally well as a simplified example because that desperate mother cares about one thing. Cases of people who care about two or more things are more complicated, and they're built up from the theory of caring about one thing.

Q: Like the story about the physicist asked for a theory of how to bet on horses, whose answer begins, "Let each horse be approximated as a perfect rolling sphere."

A: But in real life physicists don't actually do that. They know horses aren't perfect rolling spheres. Physicists do have more complicated theories that factor in air resistance and legs. You don't want to be that kid in class who's like, "Lol, the stupid science teacher thinks cylinders rolling down an inclined plane are always in a vacuum," because that kid has entirely missed the point. It's not that the standard model of physics only works in the absence of air resistance. It's that to learn physics you need to understand the simple cases first. The complicated cases are built up out of simple cases. Similarly, LDT is not restricted to selfish bastards and it certainly doesn't say people ought to be selfish bastards. We talk about greedy selfish bastards with no honor because they're the ideal rolling spheres of decision theory. If we start factoring in air resistance, the examples get a lot more complicated and the critical points are obscured.

Q: Fair enough. But the upshot from my point of view is that, besides one bunch of people contradicting another bunch of people about it being 'reasonable' to vote in elections or die in deserts, I don't really understand yet why LDT matters. Msr. Anson, maybe this is an unfair question again. But leave aside for the moment what we think is 'reasonable' to choose. Is there something we can do with this theory that we couldn't do before at all?

A: We can build computational agents that will naturally cooperate in the one-shot Prisoner's Dilemma given common knowledge of each other's code.

Q: I'll ask you to unpack all that for me, and hope to God it's something I can fit in a general-interest magazine article.

A: The Prisoner's Dilemma gets its name from the traditional and confusing presentation, which is that you are a criminal who's been arrested for a crime that you committed with the help of one other criminal. Both of you are facing one-year prison sentences. The district attorney comes to each of you, privately, and offers you one year off your prison sentence if you give testimony against the other criminal, who then gets two years added onto their prison sentence. Testifying is termed 'Defection'--you're defecting against your fellow criminal--and refusing to testify is 'Cooperating', again with your fellow criminal. If you both Defect, you both spend two years in prison. If you both Cooperate, you both spend one year in prison. But if you Cooperate and the other prisoner Defects, you spend three years in prison while your fellow criminal gets off scot-free.

Q: And both prisoners are ideal selfish bastards with no senses of honor or air resistance.

A: That's right! An essentially identical dilemma is the following: You and another experimental subject both need to privately press a button marked D or C. If you both press D, you both get \$1. If you both press C, you both get \$2. If one presses D and the other presses C, the Defector gets \$3 and the Cooperator gets nothing. Whatever the other player does, you are \$1 better off if you press D than C, and the two of you can't see each other before you have to choose your moves. But of course, the other person is probably thinking just the same thing.
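
Laid out as a table, with a check of the 'whatever the other player does, D pays a dollar more' claim--just the payoffs from the version I described, nothing extra:

# Payoffs from the dollar version above: payoff[(my_move, their_move)] is what I get.
payoff = {
    ("D", "D"): 1,
    ("C", "C"): 2,
    ("D", "C"): 3,
    ("C", "D"): 0,
}

# Whatever the other player does, Defecting pays exactly $1 more than Cooperating.
for their_move in ("C", "D"):
    gap = payoff[("D", their_move)] - payoff[("C", their_move)]
    print(f"if they play {their_move}: D pays ${gap} more than C")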

Q: And causal decision theory says that an ideal selfish bastard presses D, while logical decision theory says that an ideal selfish bastard presses C?

A: If LDT just said to press C no matter what, then if you put a CDT bastard and an LDT bastard together, the CDT bastard would get the higher payoff, wouldn't they? That's not much of a recommendation for a decision theory.

Q: So… what does LDT say?

A: Well, for one thing, LDT says that if you both know that the other player is an LDT agent, and you each know that the other knows it, and so on, then you both ought to press C. That follows for pretty much the same reason you ought to press C if you were recently cloned and it's your own clone on the other side. It's not like the rational algorithm could compute different answers on identical problems, as Douglas Hofstadter pointed out some time ago. Though even in that case--even if it's your own recent clone on the other side--causal decision theory still says to defect. However, the really exciting result in LDT is that even if you and the other agent aren't clones, even if you're not thinking exactly the same thoughts, so long as you both have a sufficiently good picture of what the other person is thinking, you can still arrange it so that you both end up cooperating. At least that's one rough way of describing what was actually proved and then demonstrated, which is that--

Q: Pause just a second. Before we get technical, is there anything you can tell me about why the Prisoner's Dilemma matters? Like, I do remember hearing about this a while back, and I've seen it mentioned here and there like it's a big deal, but I don't know why it's a big deal.

A: The Prisoner's Dilemma is the traditional ideal-rolling-sphere illustration for [ coordination problems] and [ commons problems]. Like, it's the simple base case for 38% of literally everything that is wrong with the modern world.

Q: Go on.

A: In game theory, a [ Nash equilibrium] is a point where everyone is making their individually best moves, assuming that everyone else is playing according to the Nash equilibrium. Like, let's say that in some scientific field there's a very popular journal called the Expensive Journal, which costs \$20,000/year for a university subscription. Since everyone sends all their best papers to the Expensive Journal, it's very prestigious and everyone reads it. And since the Expensive Journal is very prestigious, everyone tries to get their papers published there. Some bright grad student wants to know why they can't have the Free Journal, which would be just like the Expensive Journal only it wouldn't cost \$20,000. And the answer is that if everyone was already sending their best papers to the Free Journal, the Free Journal would be prestigious; but since that's not already true, the Free Journal isn't as prestigious, and people will only send papers there if they're rejected from the Expensive Journal, so the Free Journal goes on looking less prestigious. This is a coordination problem for the researchers, and what makes it a coordination problem is that every researcher is making their individually best move by sending their paper to the Expensive Journal, assuming everyone else goes on acting as they usually do. If all the researchers could change their policy simultaneously, they could all be better off. In fact, if everyone was already sending their best papers to the Free Journal, that would also be a stable state. "Everyone sends their best papers to the Expensive Journal" and "Everyone sends their best papers to the Free Journal" are both Nash equilibria, but the fact that any individual who tries to unilaterally defy the current Nash equilibrium does worse in expectation makes either state very sticky once you're in it.
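
If you want to see that stickiness mechanically, here's a toy payoff model--the prestige numbers and the researcher_payoff function are invented; the only assumption is that a researcher's payoff tracks where everyone else already sends their best papers:

# Toy model of the journal coordination problem. The prestige numbers are invented;
# the only assumption is that your payoff is high when you publish where everyone
# else's best papers already go.
def researcher_payoff(my_journal, everyone_else):
    return 10 if my_journal == everyone_else else 1

for status_quo in ("Expensive Journal", "Free Journal"):
    other = "Free Journal" if status_quo == "Expensive Journal" else "Expensive Journal"
    stay = researcher_payoff(status_quo, status_quo)
    deviate = researcher_payoff(other, status_quo)
    print(f"everyone at the {status_quo}: stay={stay}, unilaterally deviate={deviate}"
          f" -> Nash equilibrium: {stay >= deviate}")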

Q: I think I'm with you so far.

A: A [ Pareto optimum] is a separate term in game theory. An outcome is Pareto-optimal if it's good enough that it's impossible for all the players to do better simultaneously. In the Prisoner's Dilemma, "both players Defect and both players get \$1" is not Pareto optimal because both players would prefer the outcome in which both players cooperate and both players get \$2. Alas, the (\$2, \$2) payoff isn't a Nash equilibrium, because each player, holding the other player's actions constant, could do individually better by playing Defect. When the Nash equilibrium isn't Pareto-optimal, or no Pareto-optimal outcome is a Nash equilibrium, this is a polite way of saying "Your civilization is about to have a really big problem." It's how things get massively screwed up without there being any specific evil villains who are conspiring to screw things up that exact way. That's why there's an enormously enormous literature on the Prisoner's Dilemma.
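
Here are those two definitions checked mechanically against the dollar version of the Prisoner's Dilemma--a brute-force sketch, where the helper functions are just the two definitions above transcribed, and Pareto optimality uses the 'both players strictly better off' reading I gave:

from itertools import product

# Joint payoffs for the dollar Prisoner's Dilemma: outcome[(row, col)] = (row's $, col's $).
outcome = {
    ("D", "D"): (1, 1),
    ("D", "C"): (3, 0),
    ("C", "D"): (0, 3),
    ("C", "C"): (2, 2),
}
moves = ("C", "D")

def is_nash(a, b):
    # Neither player can do better by changing only their own move.
    row_ok = all(outcome[(a2, b)][0] <= outcome[(a, b)][0] for a2 in moves)
    col_ok = all(outcome[(a, b2)][1] <= outcome[(a, b)][1] for b2 in moves)
    return row_ok and col_ok

def is_pareto_optimal(a, b):
    # No other outcome makes both players strictly better off at the same time.
    u = outcome[(a, b)]
    return not any(v[0] > u[0] and v[1] > u[1] for v in outcome.values())

for a, b in product(moves, moves):
    labels = [name for ok, name in [(is_nash(a, b), "Nash equilibrium"),
                                    (is_pareto_optimal(a, b), "Pareto-optimal")] if ok]
    print((a, b), outcome[(a, b)], ", ".join(labels) or "neither")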

Q: So this is a real-life issue, not just a theoretical one?

A: Remember when you talked about how sometimes there's no point voting in elections because the 'system' is 'rigged' so that the candidates are pretty similar? That's actually an unintentional design flaw of an election where everyone can only vote for one candidate and the candidate with the most votes is elected; so long as other people are voting for lizards, your vote can only be effective if you vote for the less bad lizard. But then you're also one of the people voting for lizards. It's not a fallacy and it's not being stupid, any more than researchers are being stupid by sending their best papers to the most prestigious journal even if somebody is ripping off the universities and charging thousands of dollars for the subscription. It's you doing what makes sense given your individual circumstances, in a way that causes the whole system to behave much less well than it theoretically could if lots of people did something different at the same time. The people who designed that election system weren't villains, they just set it up in the simplest and most obvious way and didn't realize where that would put the Nash equilibrium. Elsevier buying scientific journals and then ripping off universities by charging thousands of dollars for relatively trivial editing work with enormous profit margins, because of the Nash equilibrium around a journal being prestigious, is something that actually happens. So yes, in the real world this is a big, big deal!

Q: And thus, 38% of all the awfulness in modern human existence involves a 'Nash equilibrium' that isn't 'Pareto optimal'.

A: Right!

Q: So does your new decision theory solve this problem for all of human civilization?

A: Hahahahaha NO.

Q: Well then--

A: It is, however, a big-picture issue that our research really actually bears on, counting as a step of incremental progress.

Q: Hm. I guess that's a pretty significant claim in its own right. You said before that there was something new we could do with logical decision theory, not just a difference in what some scientists think we should call reasonable?

A: Right. But it's not at the stage where I can point to a hovering saucer and say "This saucer is flying using blibber theory, and that's what it does, never mind the details of how." It's normal for there to be a multi-decade gap between the first time you split an atom in the lab in a way that requires sensitive instruments to detect, and when you have a big nuclear plant generating electricity that anyone can see. So in this case I do have to explain how the proof-of-principle demo worked and what the result meant, as opposed to pointing to a hovering saucer that is flying and that's neat regardless of whether you understand what's going on.

Q: But the demo is something a mortal can understand?

A: Maybe. It seems worth a shot.

Q: Go for it, then.

A: To review the Prisoner's Dilemma: If you both Defect, you both get \$1; if you both Cooperate, you both get \$2; if one player Defects and the other Cooperates, they get \$3 and \$0 respectively. %note: And we assume both players are rational selfish greedy honorless bastards so that the simple dollar amounts are exactly proportional to the sum of everything the player actually likes or dislikes in the outcome. We can imagine a more elaborate case where realistic humans with a realistic coordination problem have a total like or dislike for the outcomes in these relative proportions, but it would be basically the same idea.%

Q: I think I understood that part, at least.

A: Suppose you are a computational agent playing the Prisoner's Dilemma. Like, you are a computer program, and you are trying to figure out what to do--

Q: Does this thought experiment require that computer programs can be people?

A: The key postulate here is that programs can play games or output decisions. %note: Also, computer programs can like totally be people.%

Q: Okay, so I'm a computer program playing the Prisoner's Dilemma.

A: The other player is also a computational agent. You get to look at their code. They get to look at your code. You can reason about how the other player's code works. You can even simulate precisely what the other player is thinking, but with a time delay. Say, after 15 seconds you know what the other player was thinking for the first 10 seconds.

Q: …huh. So if I was thinking something like, 'I'll cooperate only if I see that you plan to cooperate'… um.

A: You might see that I was planning to cooperate if you cooperated…for the first 10 seconds of my own thinking. But if you stopped there and output 'cooperation' as your final answer, maybe I'd change my mind after another 10 seconds of simulating you, when I saw that you'd come to a definite conclusion and couldn't change that conclusion any more.

Q: Do I see you already planning to do that?

A: No, but what if the thought occurs to me later?

Q: So it's kind of like the Parfit's Hitchhiker case--I have to believe you're the sort of person who will go on cooperating even after I output cooperation, which is kinda like me having already driven you into the city.

A: Interesting point, but that wasn't actually where I was heading! Suppose the other player is very simple. Their name is FairBot1, and their code looks like this:

def FairBot1(otherAgent):
    # Run the other agent against FairBot1 and mirror whatever it does.
    if otherAgent(FairBot1) == Cooperate:
        return Cooperate
    else:
        return Defect

Q: So, it runs me until I output an answer, and then it immediately outputs the same thing I did. Not that it's promising to do that, it's just what the code does--like, that program is the whole FairBot1.

A: Right.

Q: Obviously, I should cooperate in this case even if I'm a selfish bastard. And I'm guessing that causal decision theory says I should defect?

A: You're starting to pick up on the pattern! After all, for all we know, FairBot1 was already run a minute earlier; how can your decision now possibly affect what FairBot1 has already done, which lies in the past?

Q: Because the same algorithm is being run in two different places and it has to give the same answer each time, just like it's a necessary truth of logic that 2 + 2 = 4 on both Earth and Mars, which is why this is 'a theory of decisions that are the logical outputs of algorithms' or 'logical decision theory'.

A: You're catching on! But again, cooperating with FairBot1 is just a question of what we call reasonable, so we're not to the new technology yet. Let me ask you this, though: is FairBot1 optimal?

Q: What do you mean by 'optimal'?

A: If somebody was going to hold a Prisoner's Dilemma tournament between programs that could look at each other's source code, and you would actually get all the money your bot won at the end of the game, would you submit a FairBot1?

Q: FairBot1 does seem like a program that gives other programs an incentive to Cooperate, which is a definite advantage. But I'm guessing the answer is that it's not optimal, and next you'll ask me why not, and my reply will be that I don't know.

A: Well, one disadvantage of FairBot1 is that FairBot1 cooperates with CooperateBot.

def CooperateBot(otherAgent):
    # Ignores the other agent entirely and always cooperates.
    return Cooperate
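
If you want to see these toy bots actually run, here's a minimal harness--the Cooperate and Defect constants and the DefectBot are scaffolding I'm adding for the demo, on top of the FairBot1 and CooperateBot definitions above:

# Minimal scaffolding so the bots above actually run; assumes the FairBot1 and
# CooperateBot definitions from this dialogue are in the same file.
Cooperate = "Cooperate"
Defect = "Defect"

def DefectBot(otherAgent):
    # A rock with 'defect' written on it, added here just for contrast.
    return Defect

for opponent in (CooperateBot, DefectBot):
    print(f"FairBot1 vs {opponent.__name__}: FairBot1 plays {FairBot1(opponent)},"
          f" {opponent.__name__} plays {opponent(FairBot1)}")

# Don't pit FairBot1 against a copy of itself here -- as comes up in a moment,
# that just recurses until Python gives up.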

Q: Sounds fair to me.

A: Look, if you accept the problem setup at all, then everyone agrees--all decision theorists on all sides of the debate concur--that it is irrational to Cooperate in the Prisoner's Dilemma with a rock that has the word 'cooperate' written on it.

Q: Okay, I can see myself Defecting against a rock.

A: There's another reason FairBot1 is suboptimal. Can you imagine what happens if two copies of FairBot1 play each other…?

Q: They go into an infinite loop of FairBot A simulating FairBot B simulating FairBot A simulating FairBot B until they both run out of time?

A: Right. So a preliminary interesting technical result is, we fixed that--we described a simple and natural FairBot2 that does not run out of time if it bumps into itself.

Q: Huh. How?

A: Magic.

Q: …

A: So, frankly, this is not something a general audience is going to fully understand. But suppose that instead of simulating the other agent, FairBot2 tries to prove that the other agent cooperates. Like, FairBot2 is allowed to do logical, deductive proofs, not just run simulations.

def FairBot2(otherAgent):
    # TryToProve searches for a proof, in first-order arithmetic, that the other
    # agent cooperates with FairBot2 -- deduction instead of simulation.
    if TryToProve("otherAgent(FairBot2) == Cooperate"):
        return Cooperate
    else:
        return Defect

Q: I'm not sure I really understand the difference between proving things and simulating things--to me they both seem like reasoning.

A: Say FairBot2 is doing logical reasoning in some very simple proof system that's trusted by just about every mathematician on the face of the Earth, like first-order arithmetic. Suppose you are reasoning about what happens when FairBot2 plays a big complicated otherAgent. Even before you know whether FairBot2 actually cooperates with the otherAgent, you can quickly conclude that FairBot2 will not be exploited by the otherAgent--it will not be the case that FairBot2 plays Cooperate while the otherAgent plays Defect. You conclude this because you can see that FairBot2 only cooperates if it can prove in first-order arithmetic that otherAgent cooperates. So if FairBot2 ever gets exploited, then first-order arithmetic has proven a false statement, and then the universe ends.

Q: Uh…

A: My point is, there's things you can conclude by abstract reasoning faster than you could conclude them by simulating everything out. For example, that FairBot2 won't be exploited by the other agent, whether it actually cooperates or defects.
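
If you want that non-exploitation argument compressed into one line, here's a sketch, writing $\Box\phi$ for '$\phi$ is provable in the system FairBot2 uses', and assuming that system never proves anything false:

$$\text{FairBot2}(X) = C \;\Rightarrow\; \Box\big[\,X(\text{FairBot2}) = C\,\big] \;\Rightarrow\; X(\text{FairBot2}) = C$$

So the exploited outcome--FairBot2 playing Cooperate against an agent that plays Defect--could only happen if the proof system had proven a falsehood.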

Q: I think I see that.

A: Now what happens when two FairBot2s play each other?

Q: My brain asplodes.

A: It looks like it could go either way, right? Maybe they both Defect and therefore neither one can prove the other one Cooperates and that's why they both Defect. Or maybe they both do Cooperate so they both prove the other one Cooperates and that's why they Cooperate.

Q: Okay, so, intuitively, I'd suspect that both bots would run out of time trying to prove things.

A: That's an extremely sensible thing to expect, and so it is quite surprising that a pair of FairBot2s will Cooperate! Because of Löb's theorem, which is magic. I mean, I could try to explain it, but it would be a distracting detour and you'd need to be at least a little good at math to understand what the heck I was talking about.
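
If you do want the two-line version of the magic anyway, here's a sketch, writing $\Box C$ for 'the sentence $C$ is provable in the bots' shared proof system' and idealizing away the time limits on proof search. Löb's theorem says that if the system proves $\Box C \to C$, then it proves $C$ outright. Now let $C$ be the sentence 'both FairBot2s Cooperate.' The system can inspect FairBot2's code, so it proves $\Box C \to C$: if mutual cooperation were provable, each bot's proof search would find that proof and output Cooperate, which is just what $C$ says. Löb's theorem then delivers a proof of $C$ itself, so both proof searches succeed and both bots Cooperate--with no tower of simulations of simulations.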

Q: Okeydokey.

A: In fact, we can figure out surprisingly quickly what happens in complicated-looking systems of agents that are all trying to prove things about other agents using first-order logic. Like, you can hand me some horrific-looking system of "Agent 1 cooperates if it proves that Agent 2 defects against Agent 3 but not against Agent 4, Agent 2 cooperates if it proves that Agent 1 cooperates and Agent 3 defects, etcetera" and I toss it into a short computer program and give you the result one millisecond later. So we can actually build agents like FairBot2 and run them in complicated tournaments, looking for programs that seem smarter than FairBot2.

Q: That does sound interesting.

A: In particular, we looked for an agent that would cooperate with FairBot2, cooperate with any other agents like itself, and defect against CooperateBot. Which is how a rational bot ought to behave, if you see what I'm saying.

Q: Did you find a bot like that?

A: Yes we did! We called it [ PrudentBot], and then we went on to generalize the underlying theory to more general decision problems. It's one little demo piece of technology that did not exist in any way shape or form, back when everyone thought that the inevitable answer to the Prisoner's Dilemma was that 'rational' agents would sadly Defect against each other even as they knew the other 'rational' agent was reasoning the same way.

Q: So what would an impressive, developed version of this technology look like? Agents embodied in smart contracts on Ethereum that could reason their way to trusting each other?

A: Sounds like a bit of a stretch for something that could realistically happen and be actually important. Though it is an obvious thought. But no, the real import of the idea does not mainly come from scaling up PrudentBot as technology. One example of a more realistic impact is that there are other basic problems in economics which have been, to some extent, blocked up by the conventional theory that says rational agents defect against each other in the Prisoner's Dilemma. I mean, if that's the rational answer, then trying to make anything else happen is 'irrational'--so it might as well be some weird special case; you wouldn't go looking for a general principle behind something you thought was 'irrational'. Logical decision theory encourages us to think about successful coordination in a more systematic way than causal decision theory did.

Q: Would you perhaps have a more understandable everyday real-world example?

A: There is a standard, widely known economic experiment that has actually been run in the laboratory many times, the Ultimatum Game. In the Ultimatum bargaining game, one experimental subject, the Proposer, proposes a way to split \$10 with the other experimental subject, the Responder. If the Responder accepts the split, the two subjects actually get the money. If the Responder rejects the split, both players get nothing. Either way, their part in the experiment is over--you can't go back and change the numbers. If I say, "I propose I get \$8 and you get \$2," it's an ultimatum in the sense that you can say "Yes" and get \$2, or "No" and we both get nothing.

Q: With you so far.

A: On causal decision theory, a rational Responder should accept an offer of \$1, since getting \$1 is better than getting nothing. Obviously, a Proposer that knows it's dealing with a 'rational' Responder will offer \$1, that is, offer to split \$9/\$1. Oddly enough, human subjects often seem to be 'irrational' about this. So if you knew you were dealing with a human, whaddaya know, you might offer higher than \$1.

Q: And logical decision theory says to reject anything less than \$5?

A: Well, if you're an LDT Responder dealing with a 'rational' CDT Proposer, and you both know each other's code, you'd reject any offer less than \$9! The CDT agent simulates your code to see how you'd respond to various offers, sees that you'll reject anything less than \$9 of the \$10, sighs and wishes you weren't so 'irrational', and then it gives you almost all the money. Obviously, as an LDT Responder, you'd be doing that because you reasoned about the CDT Proposer's code and figured out you could get away with it. But the CDT Proposer doesn't think it can change the way you'll think--that's the kind of logical correlation a CDT agent refuses to exploit. Now, if you consider two LDT agents, the situation is more complicated.
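
Here's that interaction as a small sketch--the Responder's policy is hard-coded ('reject anything under \$9') as a stand-in for whatever policy the LDT agent settles on after reading the Proposer's code:

# Sketch of a CDT Proposer playing an LDT Responder over $10. The Responder's policy
# is hard-coded here, standing in for the policy the LDT agent commits to after
# reading the Proposer's code.
def ldt_responder_accepts(offer_to_responder):
    return offer_to_responder >= 9

# The CDT Proposer simulates the Responder, treats that policy as a fixed fact about
# the world it can't influence, and picks the offer that maximizes its own take.
def proposer_take(offer):
    return (10 - offer) if ldt_responder_accepts(offer) else 0

best_offer = max(range(0, 11), key=proposer_take)
print(f"CDT Proposer offers ${best_offer}, keeping ${10 - best_offer}")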

Q: Wow. Okay, I can see where that might be true. Um, before we continue, is there some big-picture problem that the Ultimatum Game is secretly about?

A: Division of gains from trade! Let's say I have a used car whose true value to me is \$4,000, and you are a used-car buyer whose true value for this car would be \$7,000. The total gain from trade would be \$3,000, but the exact price at which we strike the deal determines who gets how much of the gains from trade. If you offer me a price of \$4,001, then you are offering me \$1 of the gains from trade and trying to keep \$2,999. But, of course, I can always refuse to trade at a price of \$4,001, in which case the \$3,000 of gains-from-trade won't exist to be divided, and the final result is that I have a used car worth \$4,000 to me instead of money worth \$4,001. Now, if there were no market in used cars--if I were the only seller, and you were the only buyer, and no other buyer would potentially come along later--then this is basically an Ultimatum Game. It happens any time there's a one-of-a-kind transaction with no other potential buyers or sellers.

Q: Huh. Okay, I can see that. So I'm afraid to ask this, but what happens when two LDT agents play the Ultimatum Game?

A: We don't have any results established from first principles, but we have a guess as to what the result ought to look like. I think \$5 is the fair offer. If you offer me \$5 or more, I accept. Now you come along and say, "Well, I think that in the Ultimatum Game, the fair split is \$6 for the Proposer and \$4 for the Responder, since the Proposer's increased power ought to correspond to getting at least a little more of the money." So if you then offer me \$4, meaning you're trying to keep \$6 for yourself, I could… accept your offer with 83% probability and reject it with 17% probability!

Q: But of course; that's only the natural thing to do when somebody offers you \$4.

A: Indeed it is! You see, in that case, you have an 83% chance of taking home \$6 and a 17% chance of getting nothing, which works out to an expected profit for you of…

Q: \$4.98. Oh, I see.

A: Right. You can offer me \$5 and take home \$5 yourself, or you can try offering me \$4 and end up taking home an expected \$4.98. The point isn't that we end up getting the same amount, it's that I'm behaving in a way that maximizes my profit without giving you an incentive to cheat me. In fact, we're both behaving in a way that means we have a strong incentive to agree on what is 'fair'--you don't do better by rationalizing a higher fair price, and I don't either--but which doesn't leave us entirely in the cold if we have a slight disagreement on exactly what is fair. Like, we both still have expected gains in that case, they've just diminished a bit.
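
Since you did that arithmetic in your head, here it is spelled out--the particular acceptance schedule below (accept with probability \$5 divided by whatever the Proposer tries to keep) is just one simple rule with the property I described, not something derived from first principles:

# One simple acceptance schedule with the no-incentive-to-lowball property described above;
# an illustration, not a theorem of LDT.
def acceptance_probability(offer_to_me, fair_share=5.0, total=10.0):
    if offer_to_me >= fair_share:
        return 1.0
    proposer_keeps = total - offer_to_me
    return min(1.0, fair_share / proposer_keeps)

for offer in (5, 4, 3, 2, 1):
    p = acceptance_probability(offer)
    print(f"offer me ${offer}: accepted with probability {p:.1%},"
          f" Proposer's expected take ${(10 - offer) * p:.2f}")

# With this exact schedule a lowballing Proposer is merely indifferent (an expected $5.00 either
# way); shave the acceptance probability a hair lower -- the 83% in the dialogue -- and trying to
# keep $6 nets an expected $4.98, strictly worse than just offering the fair $5.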

Q: That is… an interesting outlook on life. But this isn't something you can derive directly from logical decision theory?

A: Not yet! Right now we're at the stage of having just derived standard game theory in LDT from first principles. %note: Which is actually very exciting, because you can't derive game theory from first principles in CDT! In game theory there's just a sourceless assumption that the other player already plays at the Nash equilibrium, and so you decide to play at the Nash equilibrium yourself; but this assumption doesn't really come from anywhere in a fully general way. With LDT we can model ab initio how agents reason about other agents that are reasoning about them, and bootstrap standard game theory from scratch! In, uh, a surprisingly complicated technical way. % But in the bigger picture, even though the process I just gave might seem like a surprisingly clever and obvious-in-retrospect analysis of the Ultimatum Game, even a realistic one because it explains why people sometimes reject offers under \$5, you will not--so far as we know--find this idea in the literature anywhere except as an LDT-inspired idea.

Q: Why is that?

A: Because on the conventional analysis you're being irrational any time you reject an offer of \$1. If you try to get a reputation for not accepting \$1 offers, all we can say about you on the standard analysis is that you're trying to acquire a useful reputation for being irrational. If you live in a world where two rational agents Defect in the Prisoner's Dilemma and it's sad but that's all there is to it, you don't have the mindset of asking, "Okay, how will two rational agents squeeze every bit of value out of this problem?" If you think like a logical decision theorist, then rejecting a \$1 offer in the Ultimatum Game doesn't need to be irrational; a person who tries to get a reputation for rejecting \$1 offers is behaving in a way that's locally rational, so long as the other person has any idea what they're thinking; and it makes far more sense to look for an elegant and beautiful solution to the Ultimatum Game, because you're not starting from the idea that it's all just people being unreasonable.

Q: Huh. Okay, neat. I think I have some idea of what logical decision theory is good for, now.

%%!knows-requisite([dt_xrisk]):

A: Thanks for stopping by! If you want to try swallowing more information than that, consider trying to read one of the other intros, God help you. Work on logical decision theory was originally sponsored by the Machine Intelligence Research Institute, with contributions from the Future of Humanity Institute at Oxford University, plus numerous people who came to our workshops or joined in private discussions without being paid. For more on the history of who gets credit for what, although at this point you won't understand like three-quarters of what the terms are about, [ see here].

Q: You're welcome, and I'll keep it in mind.

%%

%%knows-requisite([dt_xrisk]):

A: Actually, that's not even the really important part. All that is just a side application.

Q: Oh, really.

A: Right, but the part that was actually important is, again, a bit more difficult to understand.

(In progress.)

%%

