"I am pretty surprised by ho..."

https://arbital.com/p/1fr

by Paul Christiano Dec 28 2015 updated Dec 28 2015


I am pretty surprised by how confident the voters are!

Is "arbitrarily powerful" intended to include e.g. an arbitrarily dumb search given arbitrarily large amounts of computing power? Or is it intended to require arbitrarily high efficiency as well? The latter interpretation seems to make more sense (and is relevant for forecasting). Also, it's the only option if we read "can exist" as referring to physical possibility, given that there are probably limits on the resources available to any physical system. But on that reading, 99% seems clearly crazy.

It also seems weird to give arguments in favor without offering any plausible way in which the claim could be false, or offering any arguments against. The only alternative mentioned is inevitability, which is maybe taken seriously in philosophy but doesn't really seem plausible.

I guess the norm is that I can add counterarguments and alternatives to the article itself if I object? Somehow the current experience is not set up in a way that would make that feel natural.

Note that most plausible failures of orthogonality are bad news, perhaps very bad news.


Comments

Eliezer Yudkowsky

To make sure we're on the same page, Orthogonality is true if it's possible for a paperclip maximizer to exist and be, say, 95% as cognitively efficient and ~100% as technologically sophisticated as any other agent (with equivalent resources). Check?

Alexei Andreev

Paul, you can start by writing an objection as a comment, if it's a few paragraphs long. You can write a new comment for each new objection. If you want to make it detailed / add a vote, then creating a new page makes sense.

I agree that the website currently doesn't provide intuitive support for arguments; this will come in the near future. For this year we focused on explanation / presentation.

Paul Christiano

(Understandable to focus on explanation for now. Threaded replies to replies would also be great eventually.)

Eliezer: I assumed 95% efficiency was not sufficient; I was thinking about asymptotic equivalence, i.e. efficiency approaching 1 as the sophistication of the system increases. Asymptotic equivalence of technological capability seems less interesting than of cognitive capability, though they are equivalent either if we construe technology broadly to include cognitive tasks or if we measure technological capability in a way with lots of headroom.

(Nick says "more or less any level of intelligence," which I guess could be taken to exclude the very highest levels of intelligence, but based on his other writing I think he intended merely to exclude low levels. The language in this post seems to explicitly cover arbitrarily high efficiency.)

I still think that 99% confidence is way too high even if you allow 50% efficiency, though at that point I would at least go for "very likely."

Also of course you need to be able to replace "paperclip maximizer" with anything. When I imagine orthogonality failing, "human values" seem like a much more likely failure case than "paperclips."

I don't think that this disagreement about orthogonality is especially important; I mostly found the 99%s amusing and wanted to give you a hard time about it. It does suggest that in some sense I might be more pessimistic about the AI control problem itself than you are, with my optimism driven by faith in humanity / the AI community.

Eliezer Yudkowsky

Paul, I didn't say "99%" lightly, obviously. And that makes me worried that we're not talking about the same thing. Which of the following statements sound agreeable or disagreeable?

"If you can get to 95% cognitive efficiency and 100% technological efficiency, then a human value optimizer ought to not be at an intergalactic-colonization disadvantage or a take-over-the-world-in-an-intelligence-explosion disadvantage and not even very much of a slow-takeoff disadvantage."

"The failure scenario that Paul visualizes for Orthogonality is something along the lines of, 'You can't have superintelligences that optimize any external factor, only things analogous to internal reinforcement.'"

"The failure scenario that Paul visualizes for Orthogonality is something along the lines of, 'The problem of reflective stability is unsolvable in the limit and no efficient optimizer with a unitary goal can be computationally large or self-improving.'"

"Paul is worried about something else / Eliezer has completely missed Paul's point."

Paul Christiano

(This is hard without threaded conversations. Responding to the "agree/disagree" from Eliezer)

"The failure scenario that Paul visualizes for Orthogonality is something along the lines of, 'You can't have superintelligences that optimize any external factor, only things analogous to internal reinforcement.'"

"The failure scenario that Paul visualizes for Orthogonality is something along the lines of, 'The problem of reflective stability is unsolvable in the limit and no efficient optimizer with a unitary goal can be computationally large or self-improving.'"

I think there are a lot of plausible failure modes. The two failures you outline don't seem meaningfully distinct given our current understanding, and seem to roughly describe what I'm imagining. Possible examples:

"Paul is worried about something else / Eliezer has completely missed Paul's point."

I do think the more general point, of "we really don't know what's going on here," is probably more important than the particular possible counterexamples. Even if I had no plausible counterexamples in mind, I just wouldn't be especially confident.

I think the only robust argument in favor is that unbounded agents are probably orthogonal. But (1) that doesn't speak to efficiency, and (2) even that is a bit dicey, so I wouldn't go for 99% even on the weaker form of orthogonality that neglects efficiency.

"If you can get to 95% cognitive efficiency and 100% technological efficiency, then a human value optimizer ought to not be at an intergalactic-colonization disadvantage or a take-over-the-world-in-an-intelligence-explosion disadvantage and not even very much of a slow-takeoff disadvantage."

It sounds regrettable but certainly not catastrophic. Here is how I would think about this kind of thing (it's not something I've thought about much quantitatively, since it doesn't seem particularly action-relevant).

We might think that the speed of development or productivity of projects varies a lot randomly. So in the "race to take over the world" model (which I think is the best case for an inefficient project maximizing its share of the future), we'd want to think about what kind of probabilistic disadvantage a small productivity gap introduces.

As a simple toy model, you can imagine two projects; the one that does better will take over the world.

If you thought that productivity was log-normal with a standard deviation of a factor of 2 (i.e. ×/÷ 2), then a 5% productivity disadvantage corresponds to maybe a 48% chance of being more productive. Over the course of more time, the disadvantage becomes more pronounced as the randomness averages out. If productivity variation is larger or smaller, that respectively decreases or increases the impact of an efficiency loss. If there are more participants, then the impact of a productivity hit becomes significantly larger. If the good guys only have a small probability of losing, then the cost is proportionally lower. And so on.
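
As a sanity check on the 48% figure, here is a minimal sketch of this toy calculation in Python (my reconstruction; nothing here is from the original discussion). It assumes each project's productivity is log-normal with a multiplicative standard deviation of 2 and that the two projects are independent; the win_probability function and its periods parameter are just illustrative names.

```python
# A minimal sketch of the toy calculation above (a reconstruction, not code
# from the original discussion). Assumes each project's log productivity is
# normally distributed with a multiplicative standard deviation of 2
# (i.e. "x/÷ 2"), the two projects are independent, and the disadvantaged
# project is worse by a fixed factor (e.g. 5%).
import math
from statistics import NormalDist

def win_probability(productivity_ratio: float,
                    sigma_factor: float = 2.0,
                    periods: int = 1) -> float:
    """Chance that the less productive project still ends up ahead.

    productivity_ratio: e.g. 0.95 for a 5% productivity disadvantage.
    sigma_factor: multiplicative spread of each project's productivity.
    periods: independent periods over which log productivity averages out;
             more periods shrink the noise, so the disadvantage matters more.
    """
    sigma = math.log(sigma_factor) / math.sqrt(periods)  # sd of each project's mean log productivity
    diff_sigma = sigma * math.sqrt(2)                     # sd of the difference of the two means
    shift = math.log(productivity_ratio)                  # (negative) mean shift from the disadvantage
    return NormalDist().cdf(shift / diff_sigma)           # P(disadvantaged project comes out ahead)

print(win_probability(0.95))               # ~0.48, matching the 48% figure above
print(win_probability(0.95, periods=100))  # ~0.30: over more time, the gap tells
```

Under these assumptions, a 5% productivity disadvantage leaves roughly a 48% chance of still coming out ahead over one noisy period, falling toward 30% once a hundred periods average out the noise, which is the sense in which the disadvantage becomes more pronounced over time.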

Combining with my other views, maybe one is looking at a cost of tenths of a percent. You would presumably hope to avoid this by having the world coordinate even a tiny bit (I thought about this a bit here). Overall I'll stick with regrettable but far from catastrophic.

(My bigger issue in practice with efficiency losses is similar to your view that people ought to have really high confidence. I think it is easy to make sloppy arguments that one approach to AI is 10% as effective as another, when in fact it is 0.0001% as effective, and that holding yourself to asymptotic equivalence is a more productive standard unless it turns out to be unrealizable.)