{ localUrl: '../page/informed_oversight.html', arbitalUrl: 'https://arbital.com/p/informed_oversight', rawJsonUrl: '../raw/2sc.json', likeableId: '1708', likeableType: 'page', myLikeValue: '0', likeCount: '1', dislikeCount: '0', likeScore: '1', individualLikes: [ 'PatrickLaVictoir' ], pageId: 'informed_oversight', edit: '7', editSummary: '', prevEdit: '6', currentEdit: '7', wasPublished: 'true', type: 'wiki', title: 'Informed oversight', clickbait: 'Incentivize a reinforcement learner that's less smart than you to accomplish some task', textLength: '3655', alias: 'informed_oversight', externalUrl: '', sortChildrenBy: 'likes', hasVote: 'false', voteType: '', votesAnonymous: 'false', editCreatorId: 'JessicaTaylor', editCreatedAt: '2016-03-24 01:50:49', pageCreatorId: 'JessicaTaylor', pageCreatedAt: '2016-03-23 23:50:22', seeDomainId: '0', editDomainId: 'EliezerYudkowsky', submitToDomainId: '0', isAutosave: 'false', isSnapshot: 'false', isLiveEdit: 'true', isMinorEdit: 'false', indirectTeacher: 'false', todoCount: '0', isEditorComment: 'false', isApprovedComment: 'true', isResolved: 'false', snapshotText: '', anchorContext: '', anchorText: '', anchorOffset: '0', mergedInto: '', isDeleted: 'false', viewCount: '121', text: 'It can be difficult for one agent to oversee another agent that is maximizing a reward controlled by the first agent, even if the first agent is smarter than the second agent. This is a serious obstacle to [1t7] and other approaches based on reinforcement learning.\n\nMore formally, suppose a strong agent B with a utility function U is overseeing a weak agent A. Agent A produces an action y and gets a reward B(y) determined by agent B. B wants to give rewards such that agent A is incentivized to make its best effort at maximizing U(y).\n\nIf B(y) is an [6s efficient] estimate of U(y) relative to A, then indeed maximizing B(y) is equivalent to maximizing U(y) from A's perspective. Unfortunately, it is difficult for this to be the case even when B is smarter than A, because A may think thoughts B does not in the course of selecting y. Here are a couple of concrete examples (each having the flavor of "easy to generate a solution, hard to check a solution"):\n\n* Suppose A is outputting partial Sudoku solutions. U assigns a high value to partial Sudoku solutions consistent with some solution to the entire Sudoku problem. A comes up with some solution to the Sudoku puzzle and outputs part of this solution. B cannot tell that this partial solution is actually part of a full solution, even though B is smarter than A. (B is able to come up with its own solutions, but most aren't consistent with A's partial solution.)\n* Suppose U assigns a high value to cat pictures that don't contain steganographic messages. B has no way of distinguishing cat pictures that contain steganographic messages from ones that don't. Thus, B has to give all cat pictures the same score, regardless of whether the pictures contain steganographic messages. This either fails to reward A for producing cat pictures not containing steganographic messages, or rewards A for producing cat pictures containing steganographic messages.\n\nHere are a couple of analogous examples that seem easier to analyze (see the sketch after the list):\n\n1. Let O be a one-way function (e.g. SHA256). U assigns a high value to binary strings y equal to O(111x) for some x. B cannot tell whether a particular string y is equal to O(111x) for some x (even if B is smarter than A), so B has no good way of giving rewards to A.\n2. Same as the previous problem, except U assigns a high value to binary strings y that are _not_ equal to O(111x) for any short string x. A's intended strategy is to generate a uniformly random string (or generate a string in some other way that doesn't introduce a bias towards generating strings of the form O(111x)), but B has no way of knowing that A's string was uniformly random (rather than equal to O(111x) for a uniformly random x).
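\n\nTo make the verification asymmetry in example 1 concrete, here is a minimal illustrative sketch in Python (hypothetical code added for exposition, not part of the original problem statement), taking O to be SHA-256 and x to be a 64-bit string for concreteness; the names O and B_check are just illustrative choices. A, who chose x, produces y = O(111x) cheaply, while B, given only y, has no feasible way to decide whether any x with O(111x) = y exists short of a preimage search.\n\n    # Illustrative sketch only: O is modeled as SHA-256 and x as a 64-bit string.\n    import hashlib\n    import os\n\n    def O(bits):\n        # The one-way function O: SHA-256 of an ASCII bit-string.\n        return hashlib.sha256(bits.encode()).hexdigest()\n\n    # Agent A knows a secret 64-bit x, so producing y = O("111" + x) is easy.\n    x = format(int.from_bytes(os.urandom(8), "big"), "064b")\n    y = O("111" + x)\n\n    # Agent B only sees y. Deciding whether *some* x satisfies O("111" + x) = y\n    # amounts to a preimage search, which is infeasible for a one-way function.\n    def B_check(y, max_tries=10**6):\n        for i in range(max_tries):  # covers a negligible fraction of the 2**64 candidates\n            if O("111" + format(i, "064b")) == y:\n                return True\n        return False  # a negative result tells B almost nothing\n\n    print(B_check(y))  # almost certainly False, even though y really is of the form O(111x)\n\nThe same asymmetry drives example 2: given only the output, B cannot distinguish a uniformly random string from one that A secretly constructed as O(111x).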
\n\nOne attempted solution is for agent B to see A's thought processes in addition to the action y. Then, agent B can use all the information A has, so that B(y) is an efficient estimate of U(y) from the perspective of A. However, this requires B to understand something about A's mental architecture. It would be good to have a more generic solution that requires only weak assumptions about A's mental architecture.\n\nAn attempted solution to problem 2 is [for B to incentivize A to generate its action in a provably random manner](https://agentfoundations.org/item?id=700). However, this doesn't appear to generalize well to problems significantly different from 2.\n\n# Further reading\n[Paul Christiano on adequate oversight](https://medium.com/ai-control/adequate-oversight-25fadf1edce9#.bv5lq13fw)\n\n[Paul Christiano on the informed oversight problem](https://medium.com/ai-control/the-informed-oversight-problem-1b51b4f66b35#.o82tjmhll)', metaText: '', isTextLoaded: 'true', isSubscribedToDiscussion: 'false', isSubscribedToUser: 'false', isSubscribedAsMaintainer: 'false', discussionSubscriberCount: '1', maintainerCount: '1', userSubscriberCount: '0', lastVisit: '', hasDraft: 'false', votes: [], voteSummary: 'null', muVoteSummary: '0', voteScaling: '0', currentUserVote: '-2', voteCount: '0', lockedVoteType: '', maxEditEver: '0', redLinkCount: '0', lockedBy: '', lockedUntil: '', nextPageId: '', prevPageId: '', usedAsMastery: 'false', proposalEditNum: '0', permissions: { edit: { has: 'false', reason: 'You don't have domain permission to edit this page' }, proposeEdit: { has: 'true', reason: '' }, delete: { has: 'false', reason: 'You don't have domain permission to delete this page' }, comment: { has: 'false', reason: 'You can't comment in this domain because you are not a member' }, proposeComment: { has: 'true', reason: '' } }, summaries: {}, creatorIds: [ 'JessicaTaylor' ], childIds: [], parentIds: [ 'ai_alignment' ], commentIds: [ '2w5' ], questionIds: [], tagIds: [ 'taskagi_open_problems' ], relatedIds: [], markIds: [], explanations: [], learnMore: [], requirements: [ { id: '2670', parentId: 'efficiency', childId: 'informed_oversight', type: 'requirement', creatorId: 'AlexeiAndreev', createdAt: '2016-06-17 21:58:56', level: '1', isStrong: 'false', everPublished: 'true' } ], subjects: [], lenses: [], lensParentId: '', pathPages: [], learnMoreTaughtMap: {}, learnMoreCoveredMap: {}, learnMoreRequiredMap: {}, editHistory: {}, domainSubmissions: {}, answers: [], answerCount: '0', commentCount: '0', newCommentCount: '0', linkedMarkCount: '0', changeLogs: [ { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '9116', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '0', type: 'deleteParent', createdAt: '2016-03-27 06:00:29', auxPageId: 'taskagi_open_problems', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '9031', pageId: 
'informed_oversight', userId: 'JessicaTaylor', edit: '0', type: 'deleteParent', createdAt: '2016-03-24 04:27:01', auxPageId: 'approval_directed_agents', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '9020', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '7', type: 'newEdit', createdAt: '2016-03-24 01:50:49', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '9019', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '6', type: 'newEdit', createdAt: '2016-03-24 01:31:41', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '9006', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '5', type: 'newEdit', createdAt: '2016-03-24 00:44:07', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '9005', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '4', type: 'newParent', createdAt: '2016-03-24 00:36:03', auxPageId: 'taskagi_open_problems', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8997', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '4', type: 'newRequirement', createdAt: '2016-03-24 00:07:33', auxPageId: 'efficiency', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8995', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '4', type: 'newParent', createdAt: '2016-03-24 00:07:23', auxPageId: 'approval_directed_agents', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8993', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '4', type: 'newEdit', createdAt: '2016-03-24 00:03:25', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8992', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '3', type: 'newEdit', createdAt: '2016-03-24 00:02:34', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8991', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '0', type: 'newAlias', createdAt: '2016-03-24 00:00:14', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8990', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '2', type: 'newEdit', createdAt: '2016-03-23 23:55:22', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', 
likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8989', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '1', type: 'newEdit', createdAt: '2016-03-23 23:50:22', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' } ], feedSubmissions: [], searchStrings: {}, hasChildren: 'false', hasParents: 'true', redAliases: {}, improvementTagIds: [], nonMetaTagIds: [], todos: [], slowDownMap: 'null', speedUpMap: 'null', arcPageIds: 'null', contentRequests: {} }