{
  localUrl: '../page/nonadversarial.html',
  arbitalUrl: 'https://arbital.com/p/nonadversarial',
  rawJsonUrl: '../raw/7g0.json',
  likeableId: '3955',
  likeableType: 'page',
  myLikeValue: '0',
  likeCount: '1',
  dislikeCount: '0',
  likeScore: '1',
  individualLikes: [
    'AndrewMcKnight'
  ],
  pageId: 'nonadversarial',
  edit: '9',
  editSummary: '',
  prevEdit: '8',
  currentEdit: '9',
  wasPublished: 'true',
  type: 'wiki',
  title: 'Non-adversarial principle',
  clickbait: 'At no point in constructing an Artificial General Intelligence should we construct a computation that tries to hurt us, and then try to stop it from hurting us.',
  textLength: '10174',
  alias: 'nonadversarial',
  externalUrl: '',
  sortChildrenBy: 'likes',
  hasVote: 'false',
  voteType: '',
  votesAnonymous: 'false',
  editCreatorId: 'EliezerYudkowsky',
  editCreatedAt: '2017-01-22 07:06:13',
  pageCreatorId: 'EliezerYudkowsky',
  pageCreatedAt: '2017-01-16 18:51:08',
  seeDomainId: '0',
  editDomainId: 'EliezerYudkowsky',
  submitToDomainId: '0',
  isAutosave: 'false',
  isSnapshot: 'false',
  isLiveEdit: 'true',
  isMinorEdit: 'false',
  indirectTeacher: 'false',
  todoCount: '0',
  isEditorComment: 'false',
  isApprovedComment: 'false',
  isResolved: 'false',
  snapshotText: '',
  anchorContext: '',
  anchorText: '',
  anchorOffset: '0',
  mergedInto: '',
  isDeleted: 'false',
  viewCount: '595',
  text: '[summary:  The 'non-adversarial principle' states:  *By design, the human operators and the AGI should never come into conflict.*\n\nSince every event inside an AI is ultimately the causal result of choices by the human programmers, we should not choose so as to run computations that are searching for a way to hurt us.  At the point the AI is even *trying* to outwit us, we've already screwed up the design; we've made a foolish use of computing power.\n\nE.g., according to this principle, if the AI's server center has [2xd a switch that shuts off the electricity], our first thought should not be, "How do we have guards with guns defending this off-switch so the AI can't destroy it?"  Our first thought should be, "How do we make sure the AI *wants* this off-switch to exist?"]\n\nThe 'Non-Adversarial Principle' is a proposed design rule for [7g1 sufficiently advanced Artificial Intelligence] stating that:\n\n*By design, the human operators and the AGI should never come into conflict.*\n\nSpecial cases of this principle include [2x4] and [ai_wants_security The AI wants your safety measures].\n\nAccording to this principle, if the AI has an off-switch, our first thought should not be, "How do we have guards with guns defending this off-switch so the AI can't destroy it?" but "How do we make sure the AI *wants* this off-switch to exist?"\n\nIf we think the AI is not ready to act on the Internet, our first thought should not be "How do we [airgapping airgap] the AI's computers from the Internet?" but "How do we construct an AI that wouldn't *try* to do anything on the Internet even if it got access?"  Afterwards we may go ahead and still not connect the AI to the Internet, but only as a fallback measure.  Like the containment shell of a nuclear power plant, the *plan* shouldn't call for the fallback measure to ever become necessary.  E.g., nuclear power plants have containment shells in case the core melts down.  But this is not because we're planning to have the core melt down on Tuesday and have that be okay because there's a containment shell.\n\n# Why run code that does the wrong thing?\n\nUltimately, every event inside an AI--every RAM access and CPU instruction--is an event set in motion by our own design.  Even if the AI is modifying its own code, the modified code is a causal outcome of the original code (or the code that code wrote etcetera).  Everything that happens inside the computer is, in some sense, our fault and our choice.  Given that responsibility, we should not be constructing a computation that is *trying* to hurt us.  At the point that computation is running, we've already done something foolish--willfully shot ourselves in the foot.  Even if the AI doesn't find any way to do the bad thing, we are, at the very least, wasting computing power.\n\nNo aspect of the AI's design should ever put us in an adversarial position vis-a-vis the AI, or pit the AI's wits against our wits.  If a computation starts *looking* for a way to outwit us, then the design and methodology has *already* failed.  We just shouldn't be putting an AI in a box and then having the AI search for ways to get out of the box.  If you're building a toaster, you don't build one element that heats the toast and then add a tiny refrigerator that cools down the toast.\n\n# Not running searches for harmful strategies\n\nUnder the [agents_as_searches] viewpoint, we can see an AI as embodying computations that search out strategies within a domain.  
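(A purely hypothetical caricature of this framing in code; the names `candidate_strategies`, `world_model`, and `goal_satisfied` are invented for illustration.)\n\n    # Toy caricature of an agent-as-search: enumerate candidate strategies,\n    # predict the consequence of each with a world model, and return one\n    # whose predicted consequence satisfies the goal predicate.\n    def plan(candidate_strategies, world_model, goal_satisfied):\n        for strategy in candidate_strategies:\n            predicted_consequence = world_model(strategy)\n            if goal_satisfied(predicted_consequence):\n                return strategy\n        return None  # the search came up empty\n\n    # A "strawberry on a plate" task would pass a goal_satisfied predicate that\n    # checks the predicted world-state for a strawberry being on the plate.\n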
E.g., an AI carrying out the "[one_damn_tomato put one damn strawberry on a damn plate]" task is searching the space of imaginable strategies for a strategy such that the consequence of executing that strategy is one strawberry being on the plate.\n\nThe non-adversarial principle then says that we should at no point be running a computation which is *searching* for ways to hurt us.\n\nSuppose you don't want the AI to go on the Internet.  If your first thought is [airgapping airgapping] the AI's hardware from the Internet or otherwise [6z boxing the AI], then what you're trying to do is ensure that *a computation searching for strategies to reach the Internet returns no result.*\n\nIf the primary plan is "the AI searches, but doesn't find anything", then we need to worry that the AI's search [9f considers some class of strategies we didn't imagine].  We are pitting our own wits against a potentially [9f weird] or [41l superhuman] search.  Even if we did successfully rule out all possibilities for escape, it would be hard for us to ever be confident that we had done so.  It would be very hard to rule out the *conceptual* possibility of some entirely unexpected side-channel attack like the AI [producing GSM cellphone signals by cleverly modulating its memory accesses](https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-guri-update.pdf), or the AI [aibox_experiment persuading its human operators to let it out of the box].  (Except that neither of those two particular possibilities would be *unexpected;* any specific strategy we write down will be the kind of possibility that humans consider.)\n\nEven if the AI's search does come up empty, you were tempting fate and wasting computing power by instantiating that search in the first place.\n\n# Oppositional measures are fallbacks\n\n*After* we ensure that the escape-search computation is not running in the first place, we can think about fallback [7fx oppositional] lines of defense; e.g. putting the AI's processors inside a [Faraday cage](https://en.wikipedia.org/wiki/Faraday_cage).\n\nBut the *plan* is not that the AI tries to escape and we successfully prevent the escape.  If we imagine the AI trying to escape, we ought to [rule_of_surprise feel very surprised in our imagination].  If the AI actually does try to escape, we should be crying 'What?' and tracking down which strong-seeming assumption has failed.\n\nNo matter what clever design we use or how strong it seems, we'd probably *still* be wise to [airgapping airgap] an AI under development, not say "Oh it's supposed to be safe!" and connect it to the Internet and saunter on.  But that doesn't mean we *rely* on the airgap.  It doesn't mean the plan calls for the airgap to be necessary. \n\nNuclear power plants have concrete containment units in case the core melts down; but the *design* is not that they melt down.  The master plan doesn't say "And then on Tuesday the core melts down, but that's fine because of the containment unit."  By design, that enormous concrete shell isn't supposed to actually ever become necessary.  And then we build it anyway, because the best-laid plans etcetera.\n\nSimilarly, when designing an AI, we should pretend that the airgap doesn't exist or that the AI will [2x suddenly get Internet access anyway] on Tuesday; our *primary* thought should be to design AI that doesn't need an airgap to be safe.  
And *then* we add the airgap, making sure that we're not thinking the equivalent of "Oh, it doesn't *really* matter if the core melts down, because we've got a containment structure there anyway."\n\n# Challenges in implementing non-adversarialism\n\nThe main difficulties foreseen so far for implementing the non-adversarial principle tend to center around [10g] plus [42] behavior.\n\nFor example, if you build a [2xd shutdown button] for a [6w Task AGI], one that suspends the AI to disk when pressed, the non-adversarial principle implies you must also ensure:\n\n- That the AI *wants* there to be a shutdown button;\n- That the AI *wants* to be suspended to disk after this button is pressed;\n- That the AI *wants* the state of this shutdown button to reflect the dynamic decisions of the human operators;\n- That the AI does not *want* to influence the operators to decide to not press the switch, or to press it;\n- That the AI does *not* want anything *besides* an orderly suspend-to-disk to happen, or not happen, after this button is pressed.\n\n*Or:*\n\n- The AI does not think about or make plans involving the shutdown button, e.g. because that domain was not [whitelisting whitelisted] for cognition.\n- None of the AI's other models end up reflecting the existence of the shutdown button or none of its other plans end up taking into account that part of the model.\n\nThe difficulties here center around [7g2 "You can't fetch the coffee if you're dead"].  This reasoning is very general, so [48 even if we try to make it not apply at one point, it tends to pop up somewhere else]:\n\n- If you naively try to add in a special-case clause to the utility function for wanting a shutdown button to exist, the AI wants the shutdown button to not be pressed.\n- If you successfully add a special case saying that the AI wants the button to be pressed if the humans want that button to be pressed, the AI wants the humans to not want to press the button.\n- If you naively try to add in a special clause for the AI wanting to shut down after the button is pressed, the AI wants to create a [environmental_subagent subagent] to make sure the coffee gets fetched anyway.\n- If you try to make an AI that [1g4 doesn't think about the shutdown button] or model it at all, this seems potentially difficult because in reality the best hypothesis to explain the world *does* contain a shutdown button.  A general search for good hypotheses may tend to create cognitive tokens that represent the shutdown button, and it's not clear (yet) how this could in general be prevented by trying to divide the world into domains.\n\nMore generally: by default a lot of *high-level* searches we do want to run [10g have *subsearches* we'd prefer *not* to run].  If we run an agent that searches *in general* for ways to fetch the coffee, that search would, by default and if smart enough, also search for ways to prevent itself from being shut down.\n\nHow exactly to implement the non-adversarial principle is thus a major open problem.  We may need to be more clever about shaping which computations give rise to which other computations than the default "Search for any action in any domain which achieves X."\n\n# See also\n\n- [2x4]\n- [2x The omnipotence/omniscience test]\n- [nonadversarial_safety The AI should not want to defeat your safety measures]\n- [7fx]',
  metaText: '',
  isTextLoaded: 'true',
  isSubscribedToDiscussion: 'false',
  isSubscribedToUser: 'false',
  isSubscribedAsMaintainer: 'false',
  discussionSubscriberCount: '1',
  maintainerCount: '1',
  userSubscriberCount: '0',
  lastVisit: '',
  hasDraft: 'false',
  votes: [],
  voteSummary: 'null',
  muVoteSummary: '0',
  voteScaling: '0',
  currentUserVote: '-2',
  voteCount: '0',
  lockedVoteType: '',
  maxEditEver: '0',
  redLinkCount: '0',
  lockedBy: '',
  lockedUntil: '',
  nextPageId: '',
  prevPageId: '',
  usedAsMastery: 'false',
  proposalEditNum: '10',
  permissions: {
    edit: {
      has: 'false',
      reason: 'You don't have domain permission to edit this page'
    },
    proposeEdit: {
      has: 'true',
      reason: ''
    },
    delete: {
      has: 'false',
      reason: 'You don't have domain permission to delete this page'
    },
    comment: {
      has: 'false',
      reason: 'You can't comment in this domain because you are not a member'
    },
    proposeComment: {
      has: 'true',
      reason: ''
    }
  },
  summaries: {},
  creatorIds: [
    'EliezerYudkowsky',
    'AnanyaAloke'
  ],
  childIds: [
    'omni_test',
    'niceness_defense',
    'direct_limit_oppose',
    'nonadversarial_safety',
    'cognitive_alignment'
  ],
  parentIds: [
    'alignment_principle'
  ],
  commentIds: [],
  questionIds: [],
  tagIds: [
    'taskagi_open_problems',
    'value_alignment_open_problem'
  ],
  relatedIds: [
    'corrigibility'
  ],
  markIds: [],
  explanations: [],
  learnMore: [],
  requirements: [],
  subjects: [],
  lenses: [],
  lensParentId: '',
  pathPages: [],
  learnMoreTaughtMap: {},
  learnMoreCoveredMap: {},
  learnMoreRequiredMap: {},
  editHistory: {},
  domainSubmissions: {},
  answers: [],
  answerCount: '0',
  commentCount: '0',
  newCommentCount: '0',
  linkedMarkCount: '0',
  changeLogs: [
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '23167',
      pageId: 'nonadversarial',
      userId: 'AnanyaAloke',
      edit: '10',
      type: 'newEditProposal',
      createdAt: '2019-03-05 09:22:49',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '22049',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newParent',
      createdAt: '2017-02-16 18:54:54',
      auxPageId: 'alignment_principle',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '22047',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'deleteParent',
      createdAt: '2017-02-16 18:54:49',
      auxPageId: 'AI_safety_mindset',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '22007',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newChild',
      createdAt: '2017-02-13 18:55:59',
      auxPageId: 'cognitive_alignment',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '22002',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newChild',
      createdAt: '2017-02-13 18:41:50',
      auxPageId: 'nonadversarial_safety',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21819',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '9',
      type: 'newEdit',
      createdAt: '2017-01-22 07:06:13',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21818',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '8',
      type: 'newEdit',
      createdAt: '2017-01-22 07:04:23',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21817',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '7',
      type: 'newEdit',
      createdAt: '2017-01-22 06:49:52',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21760',
      pageId: 'nonadversarial',
      userId: 'EricRogstad',
      edit: '0',
      type: 'newChild',
      createdAt: '2017-01-18 07:25:10',
      auxPageId: 'direct_limit_oppose',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21759',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '6',
      type: 'newEdit',
      createdAt: '2017-01-18 05:57:24',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21758',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '5',
      type: 'newEdit',
      createdAt: '2017-01-18 05:56:54',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21737',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '4',
      type: 'newEdit',
      createdAt: '2017-01-16 20:23:47',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21735',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newChild',
      createdAt: '2017-01-16 20:23:37',
      auxPageId: 'omni_test',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21733',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newChild',
      createdAt: '2017-01-16 20:23:03',
      auxPageId: 'niceness_defense',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21728',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '3',
      type: 'newEdit',
      createdAt: '2017-01-16 20:12:38',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21710',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '2',
      type: 'newEdit',
      createdAt: '2017-01-16 19:58:50',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21707',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newTag',
      createdAt: '2017-01-16 19:53:22',
      auxPageId: 'taskagi_open_problems',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21706',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newTag',
      createdAt: '2017-01-16 19:53:17',
      auxPageId: 'value_alignment_open_problem',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21705',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'deleteTag',
      createdAt: '2017-01-16 19:53:13',
      auxPageId: 'niceness_defense',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21699',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newTag',
      createdAt: '2017-01-16 18:51:10',
      auxPageId: 'niceness_defense',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21698',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newParent',
      createdAt: '2017-01-16 18:51:09',
      auxPageId: 'AI_safety_mindset',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21696',
      pageId: 'nonadversarial',
      userId: 'EliezerYudkowsky',
      edit: '1',
      type: 'newEdit',
      createdAt: '2017-01-16 18:51:08',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    }
  ],
  feedSubmissions: [],
  searchStrings: {},
  hasChildren: 'true',
  hasParents: 'true',
  redAliases: {},
  improvementTagIds: [],
  nonMetaTagIds: [],
  todos: [],
  slowDownMap: 'null',
  speedUpMap: 'null',
  arcPageIds: 'null',
  contentRequests: {}
}