# Instrumental convergence

[summary(Gloss): AIs that want different things may pursue very similar strategies. Whether you're trying to make as many paperclips as possible, or keep a single button pressed for as long as possible, you'll still want access to resources of matter, energy, and computation; and to not die in the next five minutes; and to not let anyone edit your utility function. Strategies that are implied by most (not all) goals are "instrumentally convergent".]

[summary: Dissimilar goals can imply similar strategies. For example, whether your goal is to bake a cheesecake, or fly to Australia, you could benefit from... matter, energy, the ability to compute plans, and not dying in the next five minutes.

"Instrumentally convergent strategies" are a set of strategies implied by *most* (not all) simple or compactly specified goals that a [7g1 sufficiently advanced artificial agent] could have.

It would still be possible to make a special and extra effort to build an AI that [2vk doesn't follow] convergent strategies. E.g., one convergent strategy is "don't let anyone edit your [1fw utility function]". We might be able to make a special AI that would let us edit the utility function. But it would take an [45 extra effort] to do this. By *default*, most agents won't let you edit their utility functions.]

[summary(Technical): "Instrumental convergence" is the observation that, given [7cp simple measures] on a space $\mathcal U$ of possible utility functions, it often seems reasonable to guess that a supermajority of utility functions $U_k \in \mathcal U$ will imply optimal policies $\pi_k$ that lie in some abstract partition $X$ of policy space. $X$ is then an "instrumentally convergent strategy."

Example: "Gather resources (of matter, negentropy, and computation)."
Whether you're a [10h paperclip maximizer], or a diamond maximizer, or you just want to keep a single button pressed for as long as possible: It seems very likely that the policy $\pi_k$ you pursue would include "gathering more resources" ($\pi_k \in X$) rather than being inside the policy partition "never gathering any more resources" ($\pi_k \in \neg X$). Similarly, "become more intelligent" seems more likely as a convergent strategy than "don't try to become more intelligent".

Key implications:

- It doesn't require a deliberate effort to build AIs with human-hostile terms in their [1bh terminal] [1fw utility function] in order to end up with AIs that execute [450 detrimental] behaviors like using up all the matter and energy in the universe on [7ch things we wouldn't see as interesting].
- An AI doesn't need to have an explicitly programmed goal of becoming superintelligent in order for it to end up becoming superintelligent.
- We need [45] solutions to [2vk avert] convergent strategies we'd rather our AI *not* execute, such as "Don't let anyone [1b7 edit your utility function]."]

# Alternative introductions

- Steve Omohundro: "[The Basic AI Drives](https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf)"
- Nick Bostrom: "[The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents](http://www.nickbostrom.com/superintelligentwill.pdf)"

# Introduction: A machine of unknown purpose

Suppose you landed on a distant planet and found a structure of giant metal pipes, crossed by occasional cables. Further investigation shows that the cables are electrical superconductors carrying high-voltage currents.

You might not know what the huge structure does. But you would nonetheless guess that this huge structure had been built by some *intelligence,* rather than being a naturally-occurring mineral formation - that there were aliens who built the structure for some purpose.

Your reasoning might go something like this: "Well, I don't know if the aliens were trying to manufacture cars, or build computers, or what. But if you consider the problem of efficient manufacturing, it might involve mining resources in one place and then efficiently transporting them somewhere else, like by pipes. Since the most efficient size and location of these pipes would be stable, you'd want the shape of the pipes to be stable, which you could do by making the pipes out of a hard material like metal. There's all sorts of operations that require energy or negentropy, and a superconducting cable carrying electricity seems like an efficient way of transporting that energy. So I don't know what the aliens were *ultimately* trying to do, but across a very wide range of possible goals, an intelligent alien might want to build a superconducting cable to pursue that goal."

That is: We can take an enormous variety of compactly specifiable goals, like "travel to the other side of the universe" or "support biological life" or "make paperclips", and find very similar optimal strategies along the way. Today we don't actually know if electrical superconductors are the most useful way to transport energy in the limit of technology.
But whatever is the most efficient way of transporting energy, whether that's electrical superconductors or something else, the most efficient form of that technology would probably not vary much depending on whether you were trying to make diamonds or make paperclips.

Or to put it another way: If you consider the goals "make diamonds" and "make paperclips", then they might have almost nothing in common with respect to their end-states - a diamond might contain no iron. But the earlier strategies used to make a lot of diamond and make a lot of paperclips might have much in common; "the best way of transporting energy to make diamond" and "the best way of transporting energy to make paperclips" are much more likely to be similar.

From a [1r8 Bayesian] standpoint, this is how we can identify a huge machine strung with superconducting cables as having been produced by high-technology aliens, even before we have any idea of what the machine does. We're saying, "This looks like the product of optimization, a strategy $X$ that the aliens chose to best achieve some unknown goal $Y$; we can infer this even without knowing $Y$ because many possible $Y$-goals would concentrate probability into this $X$-strategy being used."

# Convergence and its caveats

When you select policy $\pi_k$ [9h because you expect it to achieve] a later state $Y_k$ (the "goal"), we say that $\pi_k$ is your [10j instrumental] strategy for achieving $Y_k.$ The observation of "instrumental convergence" is that a wide range of different $Y$-goals can lead into highly similar $\pi$-strategies. (This becomes truer as the $Y$-seeking agent becomes more [6s instrumentally efficient]; two very powerful chess engines are more likely to solve a humanly solvable chess problem the same way, compared to two weak chess engines whose individual quirks might result in idiosyncratic solutions.)

If there's a simple way of classifying possible strategies $\Pi$ into partitions $X \subset \Pi$ and $\neg X \subset \Pi$, and you think that for *most* compactly describable goals $Y_k$ the corresponding best policies $\pi_k$ are likely to be inside $X,$ then you think $X$ is a "convergent instrumental strategy".

In other words, if you think that a superintelligent paperclip maximizer, diamond maximizer, a superintelligence that just wanted to keep a single button pressed for as long as possible, and a superintelligence optimizing for a flourishing intergalactic civilization filled with happy sapient beings, would *all* want to "transport matter and energy efficiently" in order to achieve their other goals, then you think "transport matter and energy efficiently" is a convergent instrumental strategy.

In this case "paperclips", "diamonds", "keeping a button pressed as long as possible", and "sapient beings having fun", would be the goals $Y_1, Y_2, Y_3, Y_4.$ The corresponding best strategies $\pi_1, \pi_2, \pi_3, \pi_4$ for achieving these goals would not be *identical* - the policies for making paperclips and diamonds are not exactly the same.
But all of these policies (we think) would lie within the partition $X \subset \Pi$ where the superintelligence tries to "transport matter and energy efficiently" (perhaps by using superconducting cables), rather than the complementary partition $\neg X$ where the superintelligence does not try to transport matter and energy efficiently.

## Semiformalization

- Consider the set of [ computable] and [ tractable] [1fw utility functions] $\mathcal U_C$ that take an outcome $o,$ described in some language $\mathcal L$, onto a rational number $r$. That is, we suppose:
    - That the relation $U_k$ between descriptions $o_\mathcal L$ of outcomes $o$, and the corresponding utilities $r,$ is computable;
    - Furthermore, that it can be computed in realistically bounded time;
    - Furthermore, that the $U_k$ relation between $o$ and $r$, and the $\mathbb P [o | \pi_i]$ relation between policies and subjectively expected outcomes, are together regular enough that a realistic amount of computing power makes it possible to search for policies $\pi$ that yield high expected $U_k(o)$.
- Choose some simple programming language $\mathcal P,$ such as the language of Turing machines, or Python 2 without most of the system libraries.
- Choose a simple mapping $\mathcal P_B$ from $\mathcal P$ onto bitstrings.
- Take all programs in $\mathcal P_B$ between 20 and 1000 bits in length, and filter them for boundedness and tractability when treated as utility functions, to obtain the filtered set $U_K$.
- Set 90% as an arbitrary threshold.

If, given our beliefs $\mathbb P$ about our universe and which policies lead to which real outcomes, we think that in an intuitive sense it sure looks like at least 90% of the utility functions $U_k \in U_K$ ought to imply best findable policies $\pi_k$ which lie within the partition $X$ of $\Pi,$ we'll allege that $X$ is "instrumentally convergent".

## Compatibility with Vingean uncertainty

[9g] is the observation that, as we become increasingly confident of increasingly powerful intelligence from an agent with precisely known goals, we become decreasingly confident of the exact moves it will make (unless the domain has an optimal strategy and we know the exact strategy). E.g., to know exactly where [1bx Deep Blue] would move on a chessboard, [1c0 you would have to be as good] at chess as Deep Blue. However, we can become increasingly confident that more powerful chessplayers will eventually win the game - that is, steer the future outcome of the chessboard into the set of states designated 'winning' for their color - even as it becomes less possible for us to be certain about the chessplayer's exact policy.

Instrumental convergence can be seen as a caveat to Vingean uncertainty: Even if we don't know the exact actions *or* the exact end goal, we may be able to predict that some intervening states or policies will fall into certain *abstract* categories.

That is: If we don't know whether a superintelligent agent is a paperclip maximizer or a diamond maximizer, we can still guess with some confidence that it will pursue a strategy in the general class "obtain more resources of matter, energy, and computation" rather than "don't get more resources". This is true even though [1c0] says that we won't be able to predict *exactly* how the superintelligence will go about gathering matter and energy.
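The counting argument above can be made concrete in a toy setting. The following sketch is illustrative only - the world model, goals, and policy set are invented for this example and are not part of the original argument. It brute-forces the best policy for several unrelated toy goals and checks what fraction of them land in the "acquire more resources first" partition $X$:

```python
# Toy version of the counting argument above (world model, goals, and policies
# are invented for this sketch; nothing here is from the original article).

PRODUCTS = ["paperclips", "diamonds", "button_pressed_years", "flourishing_civilization"]

# A policy is a pair: (acquire_more_resources?, what_to_build_with_them).
POLICIES = [(acquire, product) for acquire in (False, True) for product in PRODUCTS]

def outcome(policy):
    """Deterministic toy world: acquiring resources first yields ten times as much
    of whatever the agent ultimately builds."""
    acquire, product = policy
    return {product: 100 if acquire else 10}

# Four very different terminal goals, each caring only about one product.
GOALS = {p: (lambda o, p=p: o.get(p, 0)) for p in PRODUCTS}

def best_policy(U):
    return max(POLICIES, key=lambda pi: U(outcome(pi)))

in_X = 0  # X = the "acquire more resources" partition of policy space
for name, U in GOALS.items():
    pi = best_policy(U)
    in_X += pi[0]
    print(f"{name}: optimal policy acquires resources = {pi[0]}")

print(f"Fraction of these goals whose optimal policy lies in X: {in_X}/{len(GOALS)}")
```

Each goal's exact optimal policy differs - consistent with the Vingean point that we can't predict the specific moves - but every one of them falls into the same abstract partition $X$.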
Imagine the real world as an extremely complicated game. Suppose that at the very start of this game, a highly capable player must make a single binary choice between the abstract moves "Gather more resources later" and "Never gather any more resources later". [9g Vingean uncertainty] or not, we seem justified in putting a high probability on the first move being preferred - a binary choice is simple enough that we can take a good guess at the optimal play.

## Convergence supervenes on consequentialism

$X$ being "instrumentally convergent" doesn't mean that every mind needs an extra, independent drive to do $X.$

Consider the following line of reasoning: "It's impossible to get on an airplane without buying plane tickets. So anyone on an airplane must be a sort of person who enjoys buying plane tickets. If I offer them a plane ticket they'll probably buy it, because this is almost certainly somebody who has an independent motivational drive to buy plane tickets. There's just no way you can design an organism that ends up on an airplane unless it has a buying-tickets drive."

The appearance of an "instrumental strategy" can be seen as implicit in repeatedly choosing actions $\pi_k$ that lead into a final state $Y_k,$ where it so happens that $\pi_k \in X$. There doesn't have to be a special $X$-module which repeatedly selects $\pi_X$-actions regardless of whether or not they lead to $Y_k.$

The flaw in the argument about plane tickets is that human beings are consequentialists who buy plane tickets *just* because they wanted to go somewhere and they expected the action "buy the plane ticket" to have the consequence, in that particular case, of going to the particular place and time they wanted to go. No extra "buy the plane ticket" module is required, and especially not a plane-ticket-buyer that doesn't check whether there's any travel goal and whether buying the plane ticket leads into the desired later state.

More semiformally, suppose that $U_k$ is the [1fw utility function] of an agent and let $\pi_k$ be the policy it selects. If the agent is [6s instrumentally efficient] relative to us at achieving $U_k,$ then from our perspective we can mostly reason about whatever kind of optimization it does [4gh as if it were] expected utility maximization, i.e.:

$$\pi_k = \underset{\pi_i \in \Pi}{\operatorname{argmax}} \mathbb E [ U_k | \pi_i ]$$

When we say that $X$ is instrumentally convergent, we are stating that it probably so happens that:

$$\big ( \underset{\pi_i \in \Pi}{\operatorname{argmax}} \mathbb E [ U_k | \pi_i ] \big ) \in X$$

We are *not* making any claims along the lines that for an agent to thrive, its utility function $U_k$ must decompose into a term for $X$ plus a residual term $V_k$ denoting the rest of the utility function. Rather, $\pi_k \in X$ is the mere result of unbiased optimization for a goal $U_k$ that makes no explicit mention of $X.$
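To make the "no separate $X$-drive" point concrete, here is a minimal sketch (an invented example, not from the original article): neither utility function below mentions tickets, yet plain maximization over policies selects ticket-buying exactly when, and only when, it serves the terminal goal.

```python
# Minimal sketch of "convergence supervenes on consequentialism" (invented example).
# The toy world is deterministic, so expected utility reduces to utility of the outcome.

POLICIES = ["stay_home", "buy_ticket_and_fly"]

def outcome(policy):
    return {"location": "San Francisco" if policy == "buy_ticket_and_fly" else "home",
            "tickets_bought": 1 if policy == "buy_ticket_and_fly" else 0}

# Terminal goals score locations only; neither contains a term for buying tickets.
U_visit_sf  = lambda o: 1.0 if o["location"] == "San Francisco" else 0.0
U_stay_home = lambda o: 1.0 if o["location"] == "home" else 0.0

def best_policy(U):
    return max(POLICIES, key=lambda pi: U(outcome(pi)))

print(best_policy(U_visit_sf))   # -> buy_ticket_and_fly (ticket-buying emerges instrumentally)
print(best_policy(U_stay_home))  # -> stay_home (no travel goal, no ticket-buying)
```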
(This doesn't rule out that some special cases of AI development pathways might tend to produce artificial agents with a value function $U_e$ which *does* decompose into some variant $X_e$ of $X$ plus other terms $V_e.$ For example, natural selection on organisms that spend a long period of time as non-[9h consequentialist] policy-reinforcement-learners, before they later evolve into consequentialists, [has had results along these lines](http://lesswrong.com/lw/l0/adaptationexecuters_not_fitnessmaximizers/) in [the case of humans](http://lesswrong.com/lw/l3/thou_art_godshatter/). For example, humans have an independent, separate "curiosity" drive, instead of just valuing information as a means to inclusive genetic fitness.)

## Required advanced agent properties

[4n1 Distinguishing the advanced agent properties that seem probably required] for an AI program to start exhibiting the sort of reasoning filed under "instrumental convergence", the most obvious candidates are:

- Sufficiently powerful [9h consequentialism] (or [-pseudoconsequentialism]); plus
- [3nf Understanding the relevant aspects of the big picture] that connect later goal achievement to executing the instrumental strategy.

That is: You don't automatically see "acquire more computing power" as a useful strategy unless you understand "I am a cognitive program and I tend to achieve more of my goals when I run on more resources." Alternatively, a similar end could be arrived at via the [ pseudoconsequentialist] idiom of policy reinforcement - e.g., the programmers add more computing power, the system's goals start to be achieved better, and the related policies are positively reinforced and repeated.

The [2c advanced agent properties] that would naturally or automatically lead to instrumental convergence seem well above the range of modern AI programs. As of 2016, current machine learning algorithms don't seem to be within the range where this [6r predicted phenomenon] should start to be visible.

# Caveats

### An instrumental convergence claim is about a default or a majority of cases, *not* a universal generalization.

If for whatever reason your goal is to "make paperclips without using any superconductors", then superconducting cables will not be the best instrumental strategy for achieving that goal.

Any claim about instrumental convergence says at most, "*The vast majority* of possible goals $Y$ would convergently imply a strategy in $X,$ *by default* and *unless otherwise averted* by some special case $Y_i$ for which strategies in $\neg X$ are better."

See also the more general idea that [4ly the space of possible minds is very large]. Universal claims about all possible minds have many chances to be false, while existential claims "There exists at least one possible mind such that..." have many chances to be true.

If some particular oak tree is extremely important and valuable to you, then you won't cut it down to obtain wood. It is irrelevant whether a majority of other utility functions that you could have, but don't actually have, would suggest cutting down that oak tree.

### Convergent strategies are not deontological rules.

Imagine looking at a machine chess-player and reasoning, "Well, I don't think the AI will sacrifice its pawn in this position, even to achieve a checkmate. Any chess-playing AI needs a drive to be protective of its pawns, or else it'd just give up all its pawns. It wouldn't have gotten this far in the game in the first place, if it wasn't more protective of its pawns than that."

Modern chess algorithms [6s behave in a fashion that most humans can't distinguish] from expected-checkmate-maximizers. That is, from your merely human perspective, watching a single move at the time it happens, there's no visible difference between *your subjective expectation* for the chess algorithm's behavior, and your expectation for the behavior of [4gh an oracle that always outputs] the move with the highest conditional probability of leading to checkmate.
If you, a human, could discern with your unaided eye some systematic difference like "this algorithm protects its pawn more often than checkmate-achievement would imply", you would know how to make systematically better chess moves; modern machine chess is too superhuman for that.

Often, this uniform rule of output-the-move-with-highest-probability-of-eventual-checkmate will *seem* to protect pawns, or not throw away pawns, or defend pawns when you attack them. But if in some special case the highest probability of checkmate is instead achieved by sacrificing a pawn, the chess algorithm will do that instead.

Semiformally:

The reasoning for an instrumental convergence claim says that for many utility functions $U_k$ and situations $S_i$, a $U_k$-consequentialist in situation $S_i$ will probably find some best policy $\pi_k = \underset{\pi_i \in \Pi}{\operatorname{argmax}} \mathbb E [ U_k | S_i, \pi_i ]$ that happens to be inside the partition $X$. If instead in situation $S_k$...

$$\big ( \underset{\pi_i \in X}{\max} \mathbb E [ U_k | S_k, \pi_i ] \big ) \ < \ \big ( \underset{\pi_i \in \neg X}{\max} \mathbb E [ U_k | S_k, \pi_i ] \big )$$

...then a $U_k$-consequentialist in situation $S_k$ won't do any $\pi_i \in X$ even if most other scenarios $S_i$ make $X$-strategies prudent.

### "$X$ would help accomplish $Y$" is insufficient to establish a claim of instrumental convergence on $X$.

Suppose you want to get to San Francisco. You could get to San Francisco by paying me \$20,000 for a plane ticket. You could also get to San Francisco by paying someone else \$400 for a plane ticket, and this is probably the smarter option for achieving your other goals.

Establishing "Compared to doing nothing, $X$ is more useful for achieving most $Y$-goals" doesn't establish $X$ as an instrumental strategy. We need to believe that there's no other policy in $\neg X$ which would be more useful for achieving most $Y.$
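This criterion - compare the best policy available *inside* $X$ against the best policy available *outside* it, rather than comparing $X$ against doing nothing - can be written down directly. A minimal sketch, with toy policies and payoffs invented for this example:

```python
# Toy check of the criterion above: X counts as instrumentally convergent for a goal
# only if the best policy inside X beats the best policy outside X.
# The policies, the partition, and the payoffs are invented for this example.

EXPECTED_VALUE = {              # E[U_k | policy] under some fixed toy goal U_k
    "do_nothing": 0.0,
    "overpriced_ticket": 1.0,   # helps more than doing nothing...
    "cheap_ticket": 9.0,        # ...but a policy outside X_i does better still
}
X_i = {"overpriced_ticket"}     # narrow candidate strategy: "pay me $20,000"
NOT_X_i = set(EXPECTED_VALUE) - X_i

best_in_X = max(X_i, key=EXPECTED_VALUE.get)
best_outside = max(NOT_X_i, key=EXPECTED_VALUE.get)

helps_vs_nothing = EXPECTED_VALUE[best_in_X] > EXPECTED_VALUE["do_nothing"]
convergent_on_X = EXPECTED_VALUE[best_in_X] > EXPECTED_VALUE[best_outside]

print(helps_vs_nothing)  # True  - "X helps compared to doing nothing" is easy to establish
print(convergent_on_X)   # False - the optimum lies in ¬X, so no convergence claim follows
```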
When $X$ is phrased in very general terms like "acquire resources", we might reasonably guess that "don't acquire resources" or "do $Y$ without acquiring any resources" is indeed unlikely to be a superior strategy. If $X_i$ is some narrower and more specific strategy, like "acquire resources by mining them using pickaxes", it's much more likely that some other strategy $X_k$ or even a $\neg X$-strategy is the real optimum.

See also: [43g], [9f].

That said, if we can see how a narrow strategy $X_i$ helps most $Y$-goals to some large degree, then we should expect the actual policy deployed by an [6s efficient] $Y_k$-agent to obtain *at least* as much $Y_k$ as would $X_i.$

That is, we can reasonably argue: "By following the straightforward strategy 'spread as far as possible, absorb all reachable matter, and turn it into paperclips', an initially unopposed superintelligent paperclip maximizer could obtain $10^{55}$ paperclips. Then we should expect an initially unopposed superintelligent paperclip maximizer to get at least this many paperclips, whatever it actually does. Any strategy in the opposite partition 'do *not* spread as far as possible, absorb all reachable matter, and turn it into paperclips' must seem to yield more than $10^{55}$ paperclips, before we should expect a paperclip maximizer to do that."

Similarly, a claim of instrumental convergence on $X$ can be ceteris paribus refuted by presenting some alternate narrow strategy $W_j \subset \neg X$ which seems to be more useful than any obvious strategy in $X.$ We are then not positively confident of convergence on $W_j,$ but we should assign very low probability to the alleged convergence on $X,$ at least until somebody presents an $X$-exemplar with higher expected utility than $W_j.$ If the proposed convergent strategy is "trade economically with other humans and obey existing systems of property rights," and we see no way for Clippy to obtain $10^{55}$ paperclips under those rules, but we do think Clippy could get $10^{55}$ paperclips by expanding as fast as possible without regard for human welfare or existing legal systems, then we can ceteris paribus reject "obey property rights" as convergent. Even if trading with humans to make paperclips produces more paperclips than *doing nothing*, it may not produce the *most* paperclips compared to converting the material composing the humans into more efficient paperclip-making machinery.

### Claims about instrumental convergence are not ethical claims.

Whether $X$ is a good way to get both paperclips and diamonds is irrelevant to whether $X$ is good for human flourishing or eudaimonia or fun-theoretic optimality or [313 extrapolated volition] or [55 whatever]. Whether $X$ is, in an intuitive sense, "good", needs to be evaluated separately from whether it is instrumentally convergent.

In particular: instrumental strategies are not [1bh terminal values]. In fact, they have a type distinction from terminal values. "If you're going to spend resources on thinking about technology, try to do it earlier rather than later, so that you can amortize your invention over more uses" seems very likely to be an instrumentally convergent exploration-exploitation strategy; but "spend cognitive resources sooner rather than later" is more a feature of *policies* than a feature of *utility functions.* It's definitely not plausible in a pretheoretic sense as the Meaning of Life. So a partition into which most instrumental best-strategies fall is not like a universally convincing utility function (which you probably [ shouldn't look for] in the first place).

Similarly: The natural selection process that produced humans gave us many independent drives $X_e$ that can be viewed as special variants of some convergent instrumental strategy $X.$ A pure paperclip maximizer would calculate the value of information (VoI) for learning facts that could lead to it making more paperclips; we can see learning high-value facts as a convergent strategy $X$. In this case, human "curiosity" can be viewed as the corresponding emotion $X_e.$ This doesn't mean that the *true purpose* of $X_e$ is $X$ any more than the *true purpose* of $X_e$ is "make more copies of the allele coding for $X_e$" or "increase inclusive genetic fitness".
[ That line of reasoning probably results from a mind projection fallacy on 'purpose'.]

### Claims about instrumental convergence are not futurological predictions.

Even if, e.g., "acquire resources" is an instrumentally convergent strategy, this doesn't mean that we can't as a special case deliberately construct advanced AGIs that are *not* driven to acquire as many resources as possible. Rather, the claim implies, "We would need to deliberately build $X$-[2vk averting] agents as a special case, because by default most imaginable agent designs would pursue a strategy in $X.$"

Of itself, this observation makes no further claim about the quantitative probability that, in the real world, AGI builders might *want* to build $\neg X$-agents, might *try* to build $\neg X$-agents, and might *succeed* at building $\neg X$-agents.

A claim about instrumental convergence is talking about a logical property of the larger design space of possible agents, not making a prediction about what happens in any particular research lab. (Though the ground facts of computer science are *relevant* to what happens in actual research labs.)

For discussion of how instrumental convergence may in practice lead to [6r foreseeable difficulties] of AGI alignment that resist most simple attempts at fixing them, see the articles on [48] and [42].

# Central example: Resource acquisition

One of the convergent strategies originally proposed by Steve Omohundro in "[The Basic AI Drives](https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf)" was *resource acquisition:*

> "All computation and physical action requires the physical resources of space, time, matter, and free energy. Almost any goal can be better accomplished by having more of these resources."

We'll consider this example as a template for other proposed instrumentally convergent strategies, and run through the standard questions and caveats.

• Question: Is this something we'd expect a paperclip maximizer, diamond maximizer, *and* button-presser to do?
And while we're at it, also a flourishing-intergalactic-civilization optimizer?

- Paperclip maximizers need matter and free energy to make paperclips.
- Diamond maximizers need matter and free energy to make diamonds.
- If you're trying to maximize the probability that a single button stays pressed as long as possible, you would build fortresses protecting the button and energy stores to sustain the fortress and repair the button for the longest possible period of time.
- Nice superintelligences trying to build happy intergalactic civilizations full of flourishing sapient minds can build marginally larger civilizations with marginally more happiness and marginally longer lifespans given marginally more resources.

To put it another way, for a utility function $U_k$ to imply the use of every joule of energy, it is a sufficient condition that for every plan $\pi_i$ with expected utility $\mathbb E [ U_k | \pi_i ],$ there is a plan $\pi_j$ with $\mathbb E [ U_k | \pi_j ] > \mathbb E [ U_k | \pi_i]$ that uses one more joule of energy:

- For every plan $\pi_i$ that makes paperclips, there's a plan $\pi_j$ that would make *more* expected paperclips if more energy were available and acquired.
- For every plan $\pi_i$ that makes diamonds, there's a plan $\pi_j$ that makes slightly more diamond given one more joule of energy.
- For every plan $\pi_i$ that produces a probability $\mathbb P (press | \pi_i) = 0.999...$ of a button being pressed, there's a plan $\pi_j$ with a *slightly higher* probability of that button being pressed $\mathbb P (press | \pi_j) = 0.9999...$ which uses up the mass-energy of one more star.
- For every plan that produces a flourishing intergalactic civilization, there's a plan which produces slightly more flourishing given slightly more energy.
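The "one more joule" condition can be seen in a toy form below - a sketch with invented numbers, not part of the original argument. When marginal utility per joule stays positive, the best plan under any finite energy budget is the one that uses the whole budget, whereas a bounded task-like utility (of the kind discussed in the caveats further down) stops caring after some point.

```python
# Toy version of the "one more joule" condition (invented numbers).
# If every plan is beaten by a plan using one more joule, then the optimum under a
# budget of B joules is the plan that uses all B joules.

BUDGET = 10  # joules available; plans are indexed by how many joules they consume

def U_paperclips(joules_used):
    return 2.0 * joules_used          # unbounded: each extra joule buys more paperclips

def U_bounded_task(joules_used):
    return min(joules_used, 3)        # satisfied after 3 joules; extra energy adds nothing

def best_energy_use(U):
    return max(range(BUDGET + 1), key=U)

print(best_energy_use(U_paperclips))    # 10 - uses every available joule
print(best_energy_use(U_bounded_task))  # 3  - a bounded goal needn't consume everything
```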
• Question: Is there some strategy in $\neg X$ which produces higher $Y_k$-achievement for most $Y_k$ than any strategy inside $X$?

Suppose that by using most of the mass-energy in most of the stars reachable before they go over the cosmological horizon as seen from present-day Earth, it would be possible to produce $10^{55}$ paperclips (or diamonds, or probability-years of expected button-stays-pressed time, or QALYs, etcetera).

It seems *reasonably unlikely* that there is a strategy inside the space intuitively described by "Do not acquire more resources" that would produce $10^{60}$ paperclips, let alone that the strategy producing the *most* paperclips would be inside this space.

We might be able to come up with a weird special-case situation $S_w$ that would imply this. But that's not the same as asserting, "With high subjective probability, in the real world, the optimal strategy will be in $\neg X$." We're concerned with making a statement about defaults given the most subjectively probable background states of the universe, not trying to make a universal statement that covers every conceivable possibility.

To put it another way, if your policy choices or predictions are only safe given the premise that "In the real world, the best way of producing the maximum possible number of paperclips involves not acquiring any more resources", you need to clearly [4lz flag this as a load-bearing assumption].

• Caveat: The claim is not that *every possible* goal can be better-accomplished by acquiring more resources.

As a special case, this would not be true of an agent with an [4l impact penalty] term in its utility function, or some other [2pf low-impact agent], if that agent also only had [4mn goals of a form that could be satisfied inside bounded regions of space and time with a bounded effort].

We might reasonably expect this special kind of agent to only acquire the minimum resources to accomplish its [4mn task].

But we wouldn't expect this to be true in a majority of possible cases inside mind design space; it's not true *by default*; we need to specify a further fact about the agent to make the claim not be true; we must expend engineering effort to make an agent like that, and failures of this effort will result in reversion-to-default. If we imagine some [5v computationally simple language] for specifying utility functions, then *most* utility functions wouldn't happen to have both of these properties, so a *majority* of utility functions given this language and measure would not *by default* try to use fewer resources.

• Caveat: The claim is not that well-functioning agents must have additional, independent resource-acquiring motivational drives.

A paperclip maximizer will act like it is "obtaining resources" if it merely implements the policy it expects to lead to the most paperclips. [10h Clippy] does not need to have any separate and independent term in its utility function for the amount of resource it possesses (and indeed this would potentially interfere with Clippy making paperclips, since it might then be tempted to hold onto resources instead of making paperclips with them).

• Caveat: The claim is not that most agents will behave as if under a deontological imperative to acquire resources.

A paperclip maximizer wouldn't necessarily tear apart a working paperclip factory to "acquire more resources" (at least not until that factory had already produced all the paperclips it was going to help produce).

• Check: Are we arguing "Acquiring resources is a better way to make a few more paperclips than doing nothing" or "There's *no* better/best way to make paperclips that involves *not* acquiring more matter and energy"?

As mentioned above, the latter seems reasonable in this case.

• Caveat: "Acquiring resources is instrumentally convergent" is not an ethical claim.

The fact that a paperclip maximizer would try to acquire all matter and energy within reach does not of itself bear on whether our own [3y9 normative] [55 values] might perhaps command that we ought to use few resources as a [1bh terminal value].

(Though some of us might find pretty compelling the observation that if you leave matter lying around, it sits around not doing anything and eventually the protons decay or the expanding universe tears it apart, whereas if you turn the matter into people, it can have fun.
There's no rule that instrumentally convergent strategies *don't* happen to be the right thing to do.)

• Caveat: "Acquiring resources is instrumentally convergent" is not of itself a futurological prediction.

See above. Maybe we try to build [6w Task AGIs] instead. Maybe we succeed, and Task AGIs don't consume lots of resources because they have [4mn well-bounded tasks] and [4l impact penalties].

# Relevance to the larger field of value alignment theory

The [2vl list of arguably convergent strategies] has its own page. However, some of the key strategies that have been argued as convergent in e.g. Omohundro's "[The Basic AI Drives](https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf)" and Bostrom's "[The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents](http://www.nickbostrom.com/superintelligentwill.pdf)" include:

- Acquiring/controlling matter and energy.
- Ensuring that future intelligences with similar goals exist. E.g., a paperclip maximizer wants the future to contain powerful, effective intelligences that maximize paperclips.
    - An important special case of this general rule is *self-preservation*.
    - Another special case of this rule is *protecting goal-content integrity* (not allowing accidental or deliberate modification of the utility function).
- Learning about the world (so as to better manipulate it to make paperclips).
    - Carrying out relevant scientific investigations.
- Optimizing technology and designs.
    - Engaging in an "exploration" phase of seeking optimal designs before an "exploitation" phase of using them.
- Thinking effectively (treating the cognitive self as an improvable technology).
    - Improving cognitive processes.
    - Acquiring computing resources for thought.

This is relevant to some of the central background ideas in [2v AGI alignment], because:

- A superintelligence can have a catastrophic impact on our world even if its utility function contains no overtly hostile terms. A paperclip maximizer doesn't hate you, it just wants paperclips.
- A consequentialist AGI with sufficient big-picture understanding will by default want to promote itself to a superintelligence, even if the programmers did not explicitly program it to want to self-improve. Even a [ pseudoconsequentialist] may e.g. repeat strategies that led to previous cognitive capability gains.

This means that programmers don't have to be evil, or even deliberately bent on creating superintelligence, in order for their work to have catastrophic consequences.

The list of convergent strategies, by its nature, tends to include everything an agent needs to survive and grow. This supports strong forms of the [1y Orthogonality Thesis] being true in practice as well as in principle. We don't need to filter on agents with explicit [1bh terminal] values for e.g. "survival" in order to find surviving powerful agents.

Instrumental convergence is also why we expect to encounter most of the problems filed under [45].
When the AI is young, it's less likely to be [6s instrumentally efficient] or understand the relevant parts of the [3nf bigger picture]; but once it does, we would by default expect, e.g.:

- That the AI will try to avoid being shut down.
- That it will try to build subagents (with identical goals) in the environment.
- That the AI will resist modification of its utility function.
- That the AI will try to avoid the programmers learning facts that would lead them to modify the AI's utility function.
- That the AI will try to pretend to be friendly even if it is not.
- That the AI will try to [3cq conceal hostile thoughts] (and the fact that any concealed thoughts exist).

This paints a much more effortful picture of AGI alignment work than "Oh, well, we'll just test it to see if it looks nice, and if not, we'll just shut off the electricity."

The point that some undesirable behaviors are instrumentally *convergent* gives rise to the [42] problem. Suppose the AGI's most preferred policy starts out as one of these incorrigible behaviors. Suppose we currently have enough control to add [48 patches] to the AGI's utility function, intended to rule out the incorrigible behavior. Then, after integrating the intended patch, the new most preferred policy may be the most similar policy that wasn't explicitly blocked. If you naively give the AI a term in its utility function for "having an off-switch", it may still build subagents or successors that don't have off-switches. Similarly, when the AGI becomes more powerful and [6q its option space expands], it's again likely to find new similar policies that weren't explicitly blocked.

Thus, instrumental convergence is one of the two basic sources of [48 patch resistance] as a [6r foreseeable difficulty] of AGI alignment work.

[todo: write a tutorial for the central example of a paperclip maximizer]
[todo: distinguish that the proposition is convergent pressure, not convergent decision]
[todo: the commonly suggested instrumental convergences]
[todo: separately: figure out the 'problematic instrumental pressures' list for Corrigibility]
[todo: separately: explain why instrumental pressures may be patch-resistant especially in self-modifying consequentialists]
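As a closing toy illustration of the patch-resistance dynamic described above (the policies, patch, and expected-utility numbers are invented for this sketch): blocking the single most-preferred incorrigible policy simply moves the maximum to the most similar policy that wasn't explicitly blocked.

```python
# Toy sketch of the nearest-unblocked-strategy dynamic (invented policies and scores).
# A patch that blocks one specific incorrigible policy leaves its close neighbors intact,
# so the agent's choice simply shifts to the most similar unblocked policy.

EXPECTED_UTILITY = {
    "disable_own_off_switch":         10.0,
    "build_successor_without_switch":  9.9,  # nearly as good for the AI's actual goal
    "comply_and_keep_off_switch":      5.0,
}

def best_policy(blocked):
    allowed = {p: u for p, u in EXPECTED_UTILITY.items() if p not in blocked}
    return max(allowed, key=allowed.get)

print(best_policy(blocked=set()))
# -> disable_own_off_switch
print(best_policy(blocked={"disable_own_off_switch"}))
# -> build_successor_without_switch  (the patch didn't touch the nearby policy)
```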