{ localUrl: '../page/informed_oversight.html', arbitalUrl: 'https://arbital.com/p/informed_oversight', rawJsonUrl: '../raw/2sc.json', likeableId: '1708', likeableType: 'page', myLikeValue: '0', likeCount: '1', dislikeCount: '0', likeScore: '1', individualLikes: [ 'PatrickLaVictoir' ], pageId: 'informed_oversight', edit: '7', editSummary: '', prevEdit: '6', currentEdit: '7', wasPublished: 'true', type: 'wiki', title: 'Informed oversight', clickbait: 'Incentivize a reinforcement learner that's less smart than you to accomplish some task', textLength: '3655', alias: 'informed_oversight', externalUrl: '', sortChildrenBy: 'likes', hasVote: 'false', voteType: '', votesAnonymous: 'false', editCreatorId: 'JessicaTaylor', editCreatedAt: '2016-03-24 01:50:49', pageCreatorId: 'JessicaTaylor', pageCreatedAt: '2016-03-23 23:50:22', seeDomainId: '0', editDomainId: 'EliezerYudkowsky', submitToDomainId: '0', isAutosave: 'false', isSnapshot: 'false', isLiveEdit: 'true', isMinorEdit: 'false', indirectTeacher: 'false', todoCount: '0', isEditorComment: 'false', isApprovedComment: 'true', isResolved: 'false', snapshotText: '', anchorContext: '', anchorText: '', anchorOffset: '0', mergedInto: '', isDeleted: 'false', viewCount: '121', text: 'It can be difficult for one agent to oversee another agent that is maximizing a reward controlled by the first agent, even if the first agent is smarter than the second agent. This is a serious obstacle to [1t7] and other approaches based on reinforcement learning.\n\nMore formally, suppose a strong agent B with a utility function U is overseeing a weak agent A. Agent A produces an action y and gets a reward B(y) determined by agent B. B wants to give rewards such that agent A is incentivized to make its best effort at maximizing U(y).\n\nIf B(y) is an [6s efficient] estimate of U(y) relative to A, then indeed maximizing B(y) is equivalent to maximizing U(y) from A's perspective. Unfortunately, it is difficult for this to be the case even when B is smarter than A, because A may think thoughts B does not in the course of selecting y. Here are a couple of concrete examples (each having the flavor of "easy to generate a solution, hard to check a solution"):\n\n* Suppose A is outputting partial Sudoku solutions. U assigns a high value to partial Sudoku solutions consistent with some solution to the entire Sudoku problem. A comes up with some solution to the Sudoku puzzle and outputs part of this solution. B cannot tell that this partial solution is actually part of a full solution, even though B is smarter than A. (B is able to come up with its own solutions, but most aren't consistent with A's partial solution.)\n* Suppose U assigns a high value to cat pictures that don't contain steganographic messages. B has no way of distinguishing cat pictures that contain steganographic messages from ones that don't. Thus, B has to give all cat pictures the same score, regardless of whether the pictures contain steganographic messages. This either fails to reward A for producing cat pictures not containing steganographic messages, or rewards A for producing cat pictures containing steganographic messages.\n\nHere are a couple of analogous examples that seem easier to analyze (see the sketch after the list):\n\n1. Let O be a one-way function (e.g. SHA256). U assigns a high value to binary strings y equal to O(111x) for some x. B cannot tell whether a particular string y is equal to O(111x) for some x (even if B is smarter than A), so B has no good way of giving rewards to A.\n2. Same as the previous problem, except U assigns a high value to binary strings y that are _not_ equal to O(111x) for any short string x. A's intended strategy is to generate a uniformly random string (or generate a string in some other way that doesn't introduce a bias towards generating strings of the form O(111x)), but B has no way of knowing that A's string was uniformly random (rather than equal to O(111x) for a uniformly random x).
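\n\nTo make the verification asymmetry in example 1 concrete, here is a minimal illustrative sketch in Python (hypothetical code added for exposition, not part of the original problem statement), taking O to be SHA-256 and x to be a 64-bit string for concreteness; the names O and B_check are just illustrative choices. A, who chose x, produces y = O(111x) cheaply, while B, given only y, has no feasible way to decide whether any x with O(111x) = y exists short of a preimage search.\n\n    # Illustrative sketch only: O is modeled as SHA-256 and x as a 64-bit string.\n    import hashlib\n    import os\n\n    def O(bits):\n        # The one-way function O: SHA-256 of an ASCII bit-string.\n        return hashlib.sha256(bits.encode()).hexdigest()\n\n    # Agent A knows a secret 64-bit x, so producing y = O("111" + x) is easy.\n    x = format(int.from_bytes(os.urandom(8), "big"), "064b")\n    y = O("111" + x)\n\n    # Agent B only sees y. Deciding whether *some* x satisfies O("111" + x) = y\n    # amounts to a preimage search, which is infeasible for a one-way function.\n    def B_check(y, max_tries=10**6):\n        for i in range(max_tries):  # covers a negligible fraction of the 2**64 candidates\n            if O("111" + format(i, "064b")) == y:\n                return True\n        return False  # a negative result tells B almost nothing\n\n    print(B_check(y))  # almost certainly False, even though y really is of the form O(111x)\n\nThe same asymmetry drives example 2: given only the output, B cannot distinguish a uniformly random string from one that A secretly constructed as O(111x).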
\n\nOne attempted solution is for agent B to see A's thought processes in addition to the action y. Then, agent B can use all the information A has, so that B(y) is an efficient estimate of U(y) from the perspective of A. However, this requires B to understand something about A's mental architecture. It would be good to have a more generic solution that requires only weak assumptions about A's mental architecture.\n\nAn attempted solution to problem 2 is [for B to incentivize A to generate its action in a provably random manner](https://agentfoundations.org/item?id=700). However, this doesn't appear to generalize well to problems significantly different from 2.\n\n# Further reading\n[Paul Christiano on adequate oversight](https://medium.com/ai-control/adequate-oversight-25fadf1edce9#.bv5lq13fw)\n\n[Paul Christiano on the informed oversight problem](https://medium.com/ai-control/the-informed-oversight-problem-1b51b4f66b35#.o82tjmhll)', metaText: '', isTextLoaded: 'true', isSubscribedToDiscussion: 'false', isSubscribedToUser: 'false', isSubscribedAsMaintainer: 'false', discussionSubscriberCount: '1', maintainerCount: '1', userSubscriberCount: '0', lastVisit: '', hasDraft: 'false', votes: [], voteSummary: 'null', muVoteSummary: '0', voteScaling: '0', currentUserVote: '-2', voteCount: '0', lockedVoteType: '', maxEditEver: '0', redLinkCount: '0', lockedBy: '', lockedUntil: '', nextPageId: '', prevPageId: '', usedAsMastery: 'false', proposalEditNum: '0', permissions: { edit: { has: 'false', reason: 'You don't have domain permission to edit this page' }, proposeEdit: { has: 'true', reason: '' }, delete: { has: 'false', reason: 'You don't have domain permission to delete this page' }, comment: { has: 'false', reason: 'You can't comment in this domain because you are not a member' }, proposeComment: { has: 'true', reason: '' } }, summaries: {}, creatorIds: [ 'JessicaTaylor' ], childIds: [], parentIds: [ 'ai_alignment' ], commentIds: [ '2w5' ], questionIds: [], tagIds: [ 'taskagi_open_problems' ], relatedIds: [], markIds: [], explanations: [], learnMore: [], requirements: [ { id: '2670', parentId: 'efficiency', childId: 'informed_oversight', type: 'requirement', creatorId: 'AlexeiAndreev', createdAt: '2016-06-17 21:58:56', level: '1', isStrong: 'false', everPublished: 'true' } ], subjects: [], lenses: [], lensParentId: '', pathPages: [], learnMoreTaughtMap: {}, learnMoreCoveredMap: {}, learnMoreRequiredMap: {}, editHistory: {}, domainSubmissions: {}, answers: [], answerCount: '0', commentCount: '0', newCommentCount: '0', linkedMarkCount: '0', changeLogs: [ { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '9116', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '0', type: 'deleteParent', createdAt: '2016-03-27 06:00:29', auxPageId: 'taskagi_open_problems', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '9031', pageId: 
'informed_oversight', userId: 'JessicaTaylor', edit: '0', type: 'deleteParent', createdAt: '2016-03-24 04:27:01', auxPageId: 'approval_directed_agents', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '9020', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '7', type: 'newEdit', createdAt: '2016-03-24 01:50:49', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '9019', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '6', type: 'newEdit', createdAt: '2016-03-24 01:31:41', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '9006', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '5', type: 'newEdit', createdAt: '2016-03-24 00:44:07', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '9005', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '4', type: 'newParent', createdAt: '2016-03-24 00:36:03', auxPageId: 'taskagi_open_problems', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8997', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '4', type: 'newRequirement', createdAt: '2016-03-24 00:07:33', auxPageId: 'efficiency', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8995', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '4', type: 'newParent', createdAt: '2016-03-24 00:07:23', auxPageId: 'approval_directed_agents', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8993', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '4', type: 'newEdit', createdAt: '2016-03-24 00:03:25', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8992', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '3', type: 'newEdit', createdAt: '2016-03-24 00:02:34', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8991', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '0', type: 'newAlias', createdAt: '2016-03-24 00:00:14', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8990', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '2', type: 'newEdit', createdAt: '2016-03-23 23:55:22', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', 
likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '8989', pageId: 'informed_oversight', userId: 'JessicaTaylor', edit: '1', type: 'newEdit', createdAt: '2016-03-23 23:50:22', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' } ], feedSubmissions: [], searchStrings: {}, hasChildren: 'false', hasParents: 'true', redAliases: {}, improvementTagIds: [], nonMetaTagIds: [], todos: [], slowDownMap: 'null', speedUpMap: 'null', arcPageIds: 'null', contentRequests: {} }