Interruptibility

{
  localUrl: '../page/interruptibility.html',
  arbitalUrl: 'https://arbital.com/p/interruptibility',
  rawJsonUrl: '../raw/7tc.json',
  likeableId: '0',
  likeableType: 'page',
  myLikeValue: '0',
  likeCount: '0',
  dislikeCount: '0',
  likeScore: '0',
  individualLikes: [],
  pageId: 'interruptibility',
  edit: '2',
  editSummary: '',
  prevEdit: '1',
  currentEdit: '2',
  wasPublished: 'true',
  type: 'wiki',
  title: 'Interruptibility',
  clickbait: 'A subproblem of corrigibility under the machine learning paradigm: when the agent is interrupted, it must not learn to prevent future interruptions.',
  textLength: '2792',
  alias: 'interruptibility',
  externalUrl: '',
  sortChildrenBy: 'likes',
  hasVote: 'false',
  voteType: '',
  votesAnonymous: 'false',
  editCreatorId: 'EliezerYudkowsky',
  editCreatedAt: '2017-02-13 18:09:20',
  pageCreatorId: 'EliezerYudkowsky',
  pageCreatedAt: '2017-02-13 18:07:34',
  seeDomainId: '0',
  editDomainId: 'EliezerYudkowsky',
  submitToDomainId: '0',
  isAutosave: 'false',
  isSnapshot: 'false',
  isLiveEdit: 'true',
  isMinorEdit: 'false',
  indirectTeacher: 'false',
  todoCount: '3',
  isEditorComment: 'false',
  isApprovedComment: 'false',
  isResolved: 'false',
  snapshotText: '',
  anchorContext: '',
  anchorText: '',
  anchorOffset: '0',
  mergedInto: '',
  isDeleted: 'false',
  viewCount: '90',
  text: '"Interruptibility" is a subproblem of [45 corrigibility] (creating an advanced agent that allows us, its creators, to 'correct' what *we* see as our mistakes in constructing it), as seen from within the machine learning paradigm.  In particular, "interruptibility" says, "If you do interrupt the operation of an agent, it must not learn to avoid future interruptions."\n\nThe groundbreaking paper on interruptibility, "[Safely Interruptible Agents](https://www.fhi.ox.ac.uk/wp-content/uploads/Interruptibility.pdf)", was published by [ Laurent Orseau] and [ Stuart Armstrong].  The paper says, roughly, that to prevent a model-based reinforcement-learning algorithm from learning to avoid interruption, we should, after any interruption, propagate internal weight updates as if the agent had received exactly the reward it expected before the interruption.  This approach was inspired by [ Stuart Armstrong]'s earlier idea of [-1b7].\n\nContrary to some uninformed media coverage, the above paper doesn't solve [2xd the general problem of getting an AI to not try to prevent itself from being switched off].  In particular, it doesn't cover the [2l advanced-safety] case of a [7g1 sufficiently intelligent AI] that is trying to [9h achieve particular future outcomes] and that [3nf realizes] it needs to [7g2 go on operating in order to achieve those outcomes].\n\nRather, if a non-general AI is operating by policy reinforcement - repeating policies that worked well last time, and avoiding policies that worked poorly last time, in some general sense of a network being trained - then 'interruptibility' is about making an algorithm that, *after* being interrupted, doesn't treat the interruption as a poor outcome to be avoided (nor as a good outcome to be repeated).\n\nOne way of seeing that interruptibility doesn't address the general-cognition form of the problem is that interruptibility only changes what happens after an actual interruption.  
So if a problem can arise from an AI foreseeing interruption in advance, before it has ever actually been shut off, interruptibility won't address that (under the current paradigm).\n\nSimilarly, interruptibility would not be [2rb consistent under cognitive reflection]; a sufficiently advanced AI that knew about the existence of the interruptibility code would have no reason to want that code to go on existing.  (It's hard to even phrase that idea inside the reinforcement learning framework.)\n\nMetaphorically speaking, we could see the general notion of 'interruptibility' as the modern-day shadow of [45 corrigibility] problems for non-[42g generally-intelligent], non-[9h future-preferring], non-[1c1 reflective] machine learning algorithms.\n\nFor an example of ongoing work on the [2c advanced-agent] form of [-45], see the entry on Armstrong's original proposal of [1b7].',
  metaText: '',
  isTextLoaded: 'true',
  isSubscribedToDiscussion: 'false',
  isSubscribedToUser: 'false',
  isSubscribedAsMaintainer: 'false',
  discussionSubscriberCount: '1',
  maintainerCount: '1',
  userSubscriberCount: '0',
  lastVisit: '',
  hasDraft: 'false',
  votes: [],
  voteSummary: 'null',
  muVoteSummary: '0',
  voteScaling: '0',
  currentUserVote: '-2',
  voteCount: '0',
  lockedVoteType: '',
  maxEditEver: '0',
  redLinkCount: '0',
  lockedBy: '',
  lockedUntil: '',
  nextPageId: '',
  prevPageId: '',
  usedAsMastery: 'false',
  proposalEditNum: '0',
  permissions: {
    edit: {
      has: 'false',
      reason: 'You don't have domain permission to edit this page'
    },
    proposeEdit: {
      has: 'true',
      reason: ''
    },
    delete: {
      has: 'false',
      reason: 'You don't have domain permission to delete this page'
    },
    comment: {
      has: 'false',
      reason: 'You can't comment in this domain because you are not a member'
    },
    proposeComment: {
      has: 'true',
      reason: ''
    }
  },
  summaries: {},
  creatorIds: [
    'EliezerYudkowsky'
  ],
  childIds: [],
  parentIds: [
    'corrigibility'
  ],
  commentIds: [],
  questionIds: [],
  tagIds: [
    'c_class_meta_tag'
  ],
  relatedIds: [],
  markIds: [],
  explanations: [],
  learnMore: [],
  requirements: [],
  subjects: [],
  lenses: [],
  lensParentId: '',
  pathPages: [],
  learnMoreTaughtMap: {},
  learnMoreCoveredMap: {},
  learnMoreRequiredMap: {},
  editHistory: {},
  domainSubmissions: {},
  answers: [],
  answerCount: '0',
  commentCount: '0',
  newCommentCount: '0',
  linkedMarkCount: '0',
  changeLogs: [
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21999',
      pageId: 'interruptibility',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newEditGroup',
      createdAt: '2017-02-13 18:09:20',
      auxPageId: 'EliezerYudkowsky',
      oldSettingsValue: '123',
      newSettingsValue: '2'
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '22000',
      pageId: 'interruptibility',
      userId: 'EliezerYudkowsky',
      edit: '2',
      type: 'newEdit',
      createdAt: '2017-02-13 18:09:20',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21997',
      pageId: 'interruptibility',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newParent',
      createdAt: '2017-02-13 18:07:36',
      auxPageId: 'corrigibility',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21998',
      pageId: 'interruptibility',
      userId: 'EliezerYudkowsky',
      edit: '0',
      type: 'newTag',
      createdAt: '2017-02-13 18:07:36',
      auxPageId: 'c_class_meta_tag',
      oldSettingsValue: '',
      newSettingsValue: ''
    },
    {
      likeableId: '0',
      likeableType: 'changeLog',
      myLikeValue: '0',
      likeCount: '0',
      dislikeCount: '0',
      likeScore: '0',
      individualLikes: [],
      id: '21995',
      pageId: 'interruptibility',
      userId: 'EliezerYudkowsky',
      edit: '1',
      type: 'newEdit',
      createdAt: '2017-02-13 18:07:34',
      auxPageId: '',
      oldSettingsValue: '',
      newSettingsValue: ''
    }
  ],
  feedSubmissions: [],
  searchStrings: {},
  hasChildren: 'false',
  hasParents: 'true',
  redAliases: {},
  improvementTagIds: [],
  nonMetaTagIds: [],
  todos: [],
  slowDownMap: 'null',
  speedUpMap: 'null',
  arcPageIds: 'null',
  contentRequests: {}
}