# Research directions in AI control

What research would best advance our understanding of AI control?

I’ve been thinking about this question a lot over the last few weeks. This post lays out my best guesses.

### My current take on AI control

I want to [focus on existing AI techniques](https://arbital.com/p/1w2), minimizing speculation about future developments. As a special case, I would like to [use minimal assumptions about unsupervised learning](https://arbital.com/p/1w3), instead relying on supervised and [reinforcement learning](https://arbital.com/p/1v2?title=reinforcement-learning-and-linguistic-convention). My goal is to find [scalable](https://arbital.com/p/1v1?title=scalable-ai-control) approaches to AI control that can be applied to existing AI systems.

For now, I think that [act-based approaches](https://arbital.com/p/1w4) look significantly more promising than goal-directed approaches. (Note that both categories are [consistent with using value learning](https://arbital.com/p/1vj?title=learn-policies-or-goals).) I think that many apparent problems are distinctive to goal-directed approaches [and can be temporarily set aside](https://arbital.com/p/1w5). But a more direct motivation is that the goal-directed approach seems to require speculative future developments in AI, whereas we can [take a stab](https://arbital.com/p/1vw) at the act-based approach now (though obviously much more work is needed).

In light of those views, I find the following research directions most attractive:

### Four promising directions

- [Elaborating on apprenticeship learning](https://arbital.com/p/1vx/elaborations_apprenticeship_learning).
  Imitating human behavior seems especially promising as a scalable approach to AI control, but there are many outstanding problems.
- [Efficiently using human feedback](https://arbital.com/p/1w1).
  The limited availability of human feedback may be a serious bottleneck for realistic approaches to AI control.
- [Explaining human judgments and disagreements](https://arbital.com/p/1vy/human_arguments_ai_control).
  My preferred approach to AI control requires humans to understand AIs’ plans and beliefs. We don’t know how to solve the analogous problem for humans.
- [Designing feedback mechanisms for reinforcement learning](https://arbital.com/p/1vd?title=reward-engineering).
  A grab bag of problems, united by a need for proxies of hard-to-optimize, implicit objectives.

I will probably be doing work in one or more of these directions soon. I am also interested in talking with anyone who is considering looking into these or similar questions.

I’d love to find considerations that would change my view — whether arguments against these projects, or more promising alternatives. But these are my current best guesses, and I consider them good enough that the right next step is to work on them.

(This research was supported as part of the [_Future of Life Institute_](http://futureoflife.org/) FLI-RFP-AI1 program, grant #2015–143898.)