Directing, vs. limiting, vs. opposing

{
  localUrl: '../page/direct_limit_oppose.html',
  arbitalUrl: 'https://arbital.com/p/direct_limit_oppose',
  rawJsonUrl: '../raw/7fx.json',
  likeableId: '0',
  likeableType: 'page',
  myLikeValue: '0',
  likeCount: '0',
  dislikeCount: '0',
  likeScore: '0',
  individualLikes: [],
  pageId: 'direct_limit_oppose',
  edit: '7',
  editSummary: '',
  prevEdit: '6',
  currentEdit: '7',
  wasPublished: 'true',
  type: 'wiki',
  title: 'Directing, vs. limiting, vs. opposing',
  clickbait: 'Getting the AI to compute the right action in a domain; versus getting the AI to not compute at all in an unsafe domain; versus trying to prevent the AI from acting successfully.  (Prefer 1 & 2.)',
  textLength: '6591',
  alias: 'direct_limit_oppose',
  externalUrl: '',
  sortChildrenBy: 'likes',
  hasVote: 'false',
  voteType: '',
  votesAnonymous: 'false',
  editCreatorId: 'EliezerYudkowsky',
  editCreatedAt: '2017-05-23 00:39:26',
  pageCreatorId: 'EliezerYudkowsky',
  pageCreatedAt: '2017-01-16 20:08:45',
  seeDomainId: '0',
  editDomainId: 'EliezerYudkowsky',
  submitToDomainId: '0',
  isAutosave: 'false',
  isSnapshot: 'false',
  isLiveEdit: 'true',
  isMinorEdit: 'false',
  indirectTeacher: 'false',
  todoCount: '0',
  isEditorComment: 'false',
  isApprovedComment: 'false',
  isResolved: 'false',
  snapshotText: '',
  anchorContext: '',
  anchorText: '',
  anchorOffset: '0',
  mergedInto: '',
  isDeleted: 'false',
  viewCount: '382',
  text: '[summary:  With respect to the [2v theory] of constructing [7g1 sufficiently advanced AIs] in ways that yield [3d9 good outcomes]:\n\n- **Direction** is when it's okay for the AI to compute plans, because the AI will end up choosing rightly;\n- **Limitation** is shaping an insufficiently aligned AI so that it doesn't run a computation, if we expect that computation to produce bad results;\n- **Opposition** is when we try to prevent the AI from successfully doing something we don't like, *assuming* the AI would act wrongly given the power to do so.\n\nFor example:\n\n- A successfully **directed** AI, given full Internet access, will do [3d9 beneficial] things with that Internet access;\n- A **limited AI**, suddenly given an Internet feed, will not do *anything* with that Internet access, because its programmers haven't [whitelisting whitelisted] this new domain as okay to think about;\n- **Opposition** is [airgapping](https://en.wikipedia.org/wiki/Air_gap_(networking)) the AI from the Internet and then putting the AI's processors inside a [Faraday cage](https://en.wikipedia.org/wiki/Faraday_cage), in the hope that even if the AI *wants* to get to the Internet, the AI won't be able to, say, [produce GSM cellphone signals by modulating its memory accesses](https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-guri-update.pdf).]\n\n'Directing' versus 'limiting' versus 'opposing' is a proposed conceptual distinction between three ways of getting [3d9 good] outcomes and avoiding [450 bad] outcomes when running a [7g1 sufficiently advanced Artificial Intelligence]:\n\n- **Direction** means the AGI wants to do the right thing in a domain;\n- **Limitation** means the AGI does not think or act in domains where it is not aligned;\n- **Opposition** is when we try to prevent the AGI from successfully doing the wrong thing, *assuming* that it would act wrongly given the power to do so.\n\nFor example:\n\n- A successfully **directed** AI, given full Internet access, will do [3d9 beneficial] things rather than [450 detrimental] things with that access, because it wants to do good and understands sufficiently well which actions have good or bad outcomes;\n- A **limited AI**, suddenly given an Internet feed, will not do *anything* with that Internet access, because its programmers haven't [whitelisting whitelisted] this new domain as okay to think about;\n- **Opposition** is [airgapping](https://en.wikipedia.org/wiki/Air_gap_(networking)) the AI from the Internet and then putting the AI's processors inside a [Faraday cage](https://en.wikipedia.org/wiki/Faraday_cage), in the hope that even if the AI *wants* to get to the Internet, the AI won't be able to [produce GSM cellphone signals by modulating its memory accesses](https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-guri-update.pdf).\n\nA fourth category not reducible to the other three might be **stabilizing**, e.g. numerical stability of floating-point algorithms, absence of memory leaks in the code, and so on.  These are issues that a sufficiently advanced AI would fix in itself automatically, but an [1fy insufficiently advanced] AI might not.  This causes problems if early errors introduce changes that later become reflectively stable, or if we intend to run the AI in an insufficiently advanced mode.\n\n# Not running adversarial searches\n\nUnder the [agents_as_searches] view, we can see an agent as embodying computations that search out actions within a domain.  
The [7g0 non-adversarial principle] then says that we should not be running a search which is looking for ways to hurt us.\n\nWe can achieve this desideratum in one of two ways:\n\n- The AI is sufficiently nice / [5s value-aligned] that it's okay for the AI to think about this domain / run searches here; whatever the AI is searching for, we don't mind if that search succeeds.\n- The AI is *not* aligned in this domain and therefore should not be *running searches* here--we didn't finish the alignment here, and therefore, we shouldn't be running this computation.\n\nIf both of these measures fail, then as a [strictly fallback second line of defense], we could try to make a bad search return a null result (the AI doesn't find any way to do the harmful thing).\n\nFor example:  The [7g0 non-adversarial principle] implies that if you build a [2xd shutdown button] for the AI that suspends the AI to disk when pressed, you must *also* ensure:\n\n- That the AI *wants* there to be a shutdown button;\n- That the AI *wants* the state of this shutdown switch to reflect the dynamic decisions of the human operators;\n- That the AI does not *want* to influence the operators to press the switch or not press it;\n- That the AI does *not* want to cause anything besides an orderly suspend-to-disk to happen after this switch is pressed.\n\n*Or:*\n\n- That the AI doesn't *want* to think about the shutdown button at all (e.g. because it's not inside a [whitelisting whitelisted cognitive domain]).\n\nThe first set of measures would be 'direction'--if those measures work, it's okay for the AI to think here because we think those thoughts are pointed in the right direction.  
The second approach would be [7fx limitation]--we don't think the AI's computations are aligned inside this domain yet, so until then, we shouldn't be running computations there.\n\nAs a [2x4 final, strictly fallback] line of defense, you could put a big red lever in the computing center that shuts off all the electricity even if the AI fails to suspend to disk.  (But you're not relying on this working; it's just there as a maybe-it-might-work measure that might buy you a little more probability of safety even if the primary lines of defense fail.)\n\n# Relation to other non-adversarial ideas\n\nThe direction/limitation/opposition distinction can help restate other ideas from the [1cv].  For example:\n\nThe principle [2x4 niceness is the first line of defense] can be rephrased as follows:  When designing an AGI, we should imagine that all 'oppositional' measures are absent or have failed, and think only about 'direction' and 'limitation'.  Any oppositional measures are then added on top of that, just in case.\n\nSimilarly, the [2x] says that when thinking through our primary design for alignment, we should think as if the AGI just *will* get Internet access on some random Tuesday.  This implies that we should design an AGI that is limited by [whitelisting not wanting to act in newly opened domains without some programmer action], rather than relying on the AI to be *unable* to reach the Internet until we've finished aligning it.\n\n',
  metaText: '',
  isTextLoaded: 'true',
  isSubscribedToDiscussion: 'false',
  isSubscribedToUser: 'false',
  isSubscribedAsMaintainer: 'false',
  discussionSubscriberCount: '1',
  maintainerCount: '1',
  userSubscriberCount: '0',
  lastVisit: '',
  hasDraft: 'false',
  votes: [],
  voteSummary: [
    '0',
    '0',
    '0',
    '0',
    '0',
    '0',
    '0',
    '0',
    '0',
    '0'
  ],
  muVoteSummary: '0',
  voteScaling: '0',
  currentUserVote: '-2',
  voteCount: '0',
  lockedVoteType: '',
  maxEditEver: '0',
  redLinkCount: '0',
  lockedBy: '',
  lockedUntil: '',
  nextPageId: '',
  prevPageId: '',
  usedAsMastery: 'false',
  proposalEditNum: '0',
  permissions: {
    edit: {
      has: 'false',
      reason: 'You don't have domain permission to edit this page'
    },
    proposeEdit: {
      has: 'true',
      reason: ''
    },
    delete: {
      has: 'false',
      reason: 'You don't have domain permission to delete this page'
    },
    comment: {
      has: 'false',
      reason: 'You can't comment in this domain because you are not a member'
    },
    proposeComment: {
      has: 'true',
      reason: ''
    }
  },
  summaries: {},
  creatorIds: [
    'EliezerYudkowsky'
  ],
  childIds: [],
  parentIds: [
    'AI_safety_mindset',
    'nonadversarial'
  ],
  commentIds: [],
  questionIds: [],
  tagIds: [],
  relatedIds: [],
  markIds: [],
  explanations: [],
  learnMore: [],
  requirements: [],
  subjects: [],
  lenses: [],
  lensParentId: '',
  pathPages: [],
  learnMoreTaughtMap: {},
  learnMoreCoveredMap: {},
  learnMoreRequiredMap: {},
  editHistory: {},
  domainSubmissions: {},
  answers: [],
  answerCount: '0',
  commentCount: '0',
  newCommentCount: '0',
  linkedMarkCount: '0',
  changeLogs: [
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '22562',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '7',
      type: 'newEdit',
      createdAt: '2017-05-23 00:39:38',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '22344',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newAlias',
      createdAt: '2017-03-19 05:42:01',
      auxPageId: '',
      oldSettingsValue: 'align_limit_oppose',
      newSettingsValue: 'direct_limit_oppose'
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '22345',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '6',
      type: 'newEdit',
      createdAt: '2017-03-19 05:42:01',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21761',
      pageId: 'direct_limit_oppose',
      userId: 'EricRogstad',
      edit: '0',
      type: 'newParent',
      createdAt: '2017-01-18 07:25:10',
      auxPageId: 'nonadversarial',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21757',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '5',
      type: 'newEdit',
      createdAt: '2017-01-18 05:32:56',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21756',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'deleteParent',
      createdAt: '2017-01-18 05:32:50',
      auxPageId: 'niceness_defense',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21754',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newParent',
      createdAt: '2017-01-18 05:32:45',
      auxPageId: 'niceness_defense',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21743',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '4',
      type: 'newEdit',
      createdAt: '2017-01-16 20:35:41',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21739',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newParent',
      createdAt: '2017-01-16 20:24:17',
      auxPageId: 'AI_safety_mindset',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21727',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '3',
      type: 'newEdit',
      createdAt: '2017-01-16 20:11:47',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21725',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'deleteChild',
      createdAt: '2017-01-16 20:11:39',
      auxPageId: 'omni_test',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21723',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'deleteChild',
      createdAt: '2017-01-16 20:11:38',
      auxPageId: 'niceness_defense',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21722',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'deleteParent',
      createdAt: '2017-01-16 20:11:16',
      auxPageId: 'AI_safety_mindset',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21720',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '2',
      type: 'newEdit',
      createdAt: '2017-01-16 20:09:09',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21715',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newParent',
      createdAt: '2017-01-16 20:08:47',
      auxPageId: 'AI_safety_mindset',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21716',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newChild',
      createdAt: '2017-01-16 20:08:47',
      auxPageId: 'niceness_defense',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21718',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newChild',
      createdAt: '2017-01-16 20:08:47',
      auxPageId: 'omni_test',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21713',
      pageId: 'direct_limit_oppose',
      userId: 'EliezerYudkowsky',
      edit: '1',
      type: 'newEdit',
      createdAt: '2017-01-16 20:08:45',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    }
  ],
  feedSubmissions: [],
  searchStrings: {},
  hasChildren: 'false',
  hasParents: 'true',
  redAliases: {},
  improvementTagIds: [],
  nonMetaTagIds: [],
  todos: [],
  slowDownMap: 'null',
  speedUpMap: 'null',
  arcPageIds: 'null',
  contentRequests: {}
}