{
localUrl: '../page/informed_oversight.html',
arbitalUrl: 'https://arbital.com/p/informed_oversight',
rawJsonUrl: '../raw/2sc.json',
likeableId: '1708',
likeableType: 'page',
myLikeValue: '0',
likeCount: '1',
dislikeCount: '0',
likeScore: '1',
individualLikes: [
'PatrickLaVictoir'
],
pageId: 'informed_oversight',
edit: '7',
editSummary: '',
prevEdit: '6',
currentEdit: '7',
wasPublished: 'true',
type: 'wiki',
title: 'Informed oversight',
clickbait: 'Incentivize a reinforcement learner that's less smart than you to accomplish some task',
textLength: '3655',
alias: 'informed_oversight',
externalUrl: '',
sortChildrenBy: 'likes',
hasVote: 'false',
voteType: '',
votesAnonymous: 'false',
editCreatorId: 'JessicaTaylor',
editCreatedAt: '2016-03-24 01:50:49',
pageCreatorId: 'JessicaTaylor',
pageCreatedAt: '2016-03-23 23:50:22',
seeDomainId: '0',
editDomainId: 'EliezerYudkowsky',
submitToDomainId: '0',
isAutosave: 'false',
isSnapshot: 'false',
isLiveEdit: 'true',
isMinorEdit: 'false',
indirectTeacher: 'false',
todoCount: '0',
isEditorComment: 'false',
isApprovedComment: 'true',
isResolved: 'false',
snapshotText: '',
anchorContext: '',
anchorText: '',
anchorOffset: '0',
mergedInto: '',
isDeleted: 'false',
viewCount: '121',
text: 'It can be difficult for one agent to oversee another agent that is maximizing a reward controlled by the first agent, even if the first agent is smarter than the second agent. This is a serious obstacle to [1t7] and other approaches based on reinforcement learning.\n\nMore formally, suppose a strong agent B with a utility function U is overseeing a weak agent A. Agent A produces an action y and gets a reward B(y) determined by agent B. B wants to give rewards such that agent A is incentivized to make its best effort at maximizing U(y).\n\nIf B(y) is an [6s efficient] estimate of U(y) relative to A, then maximizing B(y) is indeed equivalent to maximizing U(y) from A's perspective. Unfortunately, it is difficult for this to be the case even when B is smarter than A, because A may think thoughts that B does not in the course of selecting y. Here are a couple of concrete examples (each having the flavor of "easy to generate a solution, hard to check a solution"):\n\n* Suppose A is outputting partial Sudoku solutions. U assigns a high value to partial Sudoku solutions consistent with some solution to the entire Sudoku puzzle. A comes up with some full solution to the puzzle and outputs part of it. B cannot tell that this partial solution is actually part of a full solution, even though B is smarter than A (B is able to come up with its own solutions, but most aren't consistent with A's partial solution).\n* Suppose U assigns a high value to cat pictures that don't contain steganographic messages. B has no way of distinguishing cat pictures that contain steganographic messages from ones that don't, so B has to give all cat pictures the same score. This either fails to reward A for producing cat pictures without steganographic messages, or rewards A for producing cat pictures that contain them.\n\nHere are a couple of analogous examples that seem easier to analyze:\n\n1. Let O be a one-way function (e.g. SHA-256). U assigns a high value to binary strings y equal to O(111x) for some x. B cannot tell whether a particular string y is equal to O(111x) for some x (even if B is smarter than A), so B has no good way of giving rewards to A.\n2. Same as the previous problem, except U assigns a high value to binary strings y that are _not_ equal to O(111x) for any short string x. A's intended strategy is to generate a uniformly random string (or to generate a string in some other way that doesn't introduce a bias towards strings of the form O(111x)), but B has no way of knowing that A's string was uniformly random (rather than equal to O(111x) for a uniformly random x).
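\n\nTo make problem 1 concrete, here is a minimal Python sketch (not part of the original problem statement; it instantiates O as SHA-256, and the function and agent names are purely illustrative). A can easily produce a string of the rewarded form, while B, given only the string, has no feasible check and ends up scoring every string the same way:\n\n    import hashlib\n    import os\n\n    def O(s):\n        # The one-way function from the example, instantiated as SHA-256.\n        return hashlib.sha256(s).digest()\n\n    def agent_A():\n        # A picks a secret x and outputs y = O(b"111" + x); this is easy for A.\n        x = os.urandom(16)\n        return O(b"111" + x)\n\n    def overseer_B(y):\n        # B would like to reward exactly those y of the form O(b"111" + x), but\n        # inverting O is assumed to be intractable, so given only y there is no\n        # feasible check; B is stuck giving every string the same reward.\n        return 0.5\n\n    y_good = agent_A()       # satisfies U, but B cannot verify this\n    y_junk = os.urandom(32)  # almost certainly does not satisfy U\n    print(overseer_B(y_good), overseer_B(y_junk))  # same reward either way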
\n\nOne attempted solution is for agent B to see A's thought processes in addition to the action y. Then agent B can use all the information A has, so that B(y) is an efficient estimate of U(y) from the perspective of A. However, this requires B to understand something about A's mental architecture. It would be good to have a more generic solution that makes only weak assumptions about A's mental architecture.\n\nAn attempted solution to problem 2 is [for B to incentivize A to generate its action in a provably random manner](https://agentfoundations.org/item?id=700). However, this doesn't appear to generalize well to problems significantly different from problem 2.\n\n# Further reading\n\n[Paul Christiano on adequate oversight](https://medium.com/ai-control/adequate-oversight-25fadf1edce9#.bv5lq13fw)\n\n[Paul Christiano on the informed oversight problem](https://medium.com/ai-control/the-informed-oversight-problem-1b51b4f66b35#.o82tjmhll)',
metaText: '',
isTextLoaded: 'true',
isSubscribedToDiscussion: 'false',
isSubscribedToUser: 'false',
isSubscribedAsMaintainer: 'false',
discussionSubscriberCount: '1',
maintainerCount: '1',
userSubscriberCount: '0',
lastVisit: '',
hasDraft: 'false',
votes: [],
voteSummary: 'null',
muVoteSummary: '0',
voteScaling: '0',
currentUserVote: '-2',
voteCount: '0',
lockedVoteType: '',
maxEditEver: '0',
redLinkCount: '0',
lockedBy: '',
lockedUntil: '',
nextPageId: '',
prevPageId: '',
usedAsMastery: 'false',
proposalEditNum: '0',
permissions: {
edit: {
has: 'false',
reason: 'You don't have domain permission to edit this page'
},
proposeEdit: {
has: 'true',
reason: ''
},
delete: {
has: 'false',
reason: 'You don't have domain permission to delete this page'
},
comment: {
has: 'false',
reason: 'You can't comment in this domain because you are not a member'
},
proposeComment: {
has: 'true',
reason: ''
}
},
summaries: {},
creatorIds: [
'JessicaTaylor'
],
childIds: [],
parentIds: [
'ai_alignment'
],
commentIds: [
'2w5'
],
questionIds: [],
tagIds: [
'taskagi_open_problems'
],
relatedIds: [],
markIds: [],
explanations: [],
learnMore: [],
requirements: [
{
id: '2670',
parentId: 'efficiency',
childId: 'informed_oversight',
type: 'requirement',
creatorId: 'AlexeiAndreev',
createdAt: '2016-06-17 21:58:56',
level: '1',
isStrong: 'false',
everPublished: 'true'
}
],
subjects: [],
lenses: [],
lensParentId: '',
pathPages: [],
learnMoreTaughtMap: {},
learnMoreCoveredMap: {},
learnMoreRequiredMap: {},
editHistory: {},
domainSubmissions: {},
answers: [],
answerCount: '0',
commentCount: '0',
newCommentCount: '0',
linkedMarkCount: '0',
changeLogs: [
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '9116',
pageId: 'informed_oversight',
userId: 'JessicaTaylor',
edit: '0',
type: 'deleteParent',
createdAt: '2016-03-27 06:00:29',
auxPageId: 'taskagi_open_problems',
oldSettingsValue: '',
newSettingsValue: ''
},
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '9031',
pageId: 'informed_oversight',
userId: 'JessicaTaylor',
edit: '0',
type: 'deleteParent',
createdAt: '2016-03-24 04:27:01',
auxPageId: 'approval_directed_agents',
oldSettingsValue: '',
newSettingsValue: ''
},
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '9020',
pageId: 'informed_oversight',
userId: 'JessicaTaylor',
edit: '7',
type: 'newEdit',
createdAt: '2016-03-24 01:50:49',
auxPageId: '',
oldSettingsValue: '',
newSettingsValue: ''
},
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '9019',
pageId: 'informed_oversight',
userId: 'JessicaTaylor',
edit: '6',
type: 'newEdit',
createdAt: '2016-03-24 01:31:41',
auxPageId: '',
oldSettingsValue: '',
newSettingsValue: ''
},
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '9006',
pageId: 'informed_oversight',
userId: 'JessicaTaylor',
edit: '5',
type: 'newEdit',
createdAt: '2016-03-24 00:44:07',
auxPageId: '',
oldSettingsValue: '',
newSettingsValue: ''
},
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '9005',
pageId: 'informed_oversight',
userId: 'JessicaTaylor',
edit: '4',
type: 'newParent',
createdAt: '2016-03-24 00:36:03',
auxPageId: 'taskagi_open_problems',
oldSettingsValue: '',
newSettingsValue: ''
},
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '8997',
pageId: 'informed_oversight',
userId: 'JessicaTaylor',
edit: '4',
type: 'newRequirement',
createdAt: '2016-03-24 00:07:33',
auxPageId: 'efficiency',
oldSettingsValue: '',
newSettingsValue: ''
},
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '8995',
pageId: 'informed_oversight',
userId: 'JessicaTaylor',
edit: '4',
type: 'newParent',
createdAt: '2016-03-24 00:07:23',
auxPageId: 'approval_directed_agents',
oldSettingsValue: '',
newSettingsValue: ''
},
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '8993',
pageId: 'informed_oversight',
userId: 'JessicaTaylor',
edit: '4',
type: 'newEdit',
createdAt: '2016-03-24 00:03:25',
auxPageId: '',
oldSettingsValue: '',
newSettingsValue: ''
},
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '8992',
pageId: 'informed_oversight',
userId: 'JessicaTaylor',
edit: '3',
type: 'newEdit',
createdAt: '2016-03-24 00:02:34',
auxPageId: '',
oldSettingsValue: '',
newSettingsValue: ''
},
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '8991',
pageId: 'informed_oversight',
userId: 'JessicaTaylor',
edit: '0',
type: 'newAlias',
createdAt: '2016-03-24 00:00:14',
auxPageId: '',
oldSettingsValue: '',
newSettingsValue: ''
},
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '8990',
pageId: 'informed_oversight',
userId: 'JessicaTaylor',
edit: '2',
type: 'newEdit',
createdAt: '2016-03-23 23:55:22',
auxPageId: '',
oldSettingsValue: '',
newSettingsValue: ''
},
{
likeableId: '0',
likeableType: 'changeLog',
myLikeValue: '0',
likeCount: '0',
dislikeCount: '0',
likeScore: '0',
individualLikes: [],
id: '8989',
pageId: 'informed_oversight',
userId: 'JessicaTaylor',
edit: '1',
type: 'newEdit',
createdAt: '2016-03-23 23:50:22',
auxPageId: '',
oldSettingsValue: '',
newSettingsValue: ''
}
],
feedSubmissions: [],
searchStrings: {},
hasChildren: 'false',
hasParents: 'true',
redAliases: {},
improvementTagIds: [],
nonMetaTagIds: [],
todos: [],
slowDownMap: 'null',
speedUpMap: 'null',
arcPageIds: 'null',
contentRequests: {}
}