# Research directions in AI control

What research would best advance our understanding of AI control?

I’ve been thinking about this question a lot over the last few weeks. This post lays out my best guesses.

### My current take on AI control

I want to [focus on existing AI techniques](https://arbital.com/p/1w2), minimizing speculation about future developments. As a special case, I would like to [use minimal assumptions about unsupervised learning](https://arbital.com/p/1w3), instead relying on supervised and [reinforcement learning](https://arbital.com/p/1v2?title=reinforcement-learning-and-linguistic-convention). My goal is to find [scalable](https://arbital.com/p/1v1?title=scalable-ai-control) approaches to AI control that can be applied to existing AI systems.

For now, I think that [act-based approaches](https://arbital.com/p/1w4) look significantly more promising than goal-directed approaches. (Note that both categories are [consistent with using value learning](https://arbital.com/p/1vj?title=learn-policies-or-goals).) I think that many apparent problems are distinctive to goal-directed approaches [and can be temporarily set aside](https://arbital.com/p/1w5). But a more direct motivation is that the goal-directed approach seems to require speculative future developments in AI, whereas we can [take a stab](https://arbital.com/p/1vw) at the act-based approach now (though obviously much more work is needed).

In light of those views, I find the following research directions most attractive:

### Four promising directions

- [Elaborating on apprenticeship learning](https://arbital.com/p/1vx/elaborations_apprenticeship_learning).
  Imitating human behavior seems especially promising as a scalable approach to AI control, but there are many outstanding problems.
- [Efficiently using human feedback](https://arbital.com/p/1w1).
  The limited availability of human feedback may be a serious bottleneck for realistic approaches to AI control.
- [Explaining human judgments and disagreements](https://arbital.com/p/1vy/human_arguments_ai_control).
  My preferred approach to AI control requires humans to understand AIs’ plans and beliefs. We don’t know how to solve the analogous problem for humans.
- [Designing feedback mechanisms for reinforcement learning](https://arbital.com/p/1vd?title=reward-engineering).
  A grab bag of problems, united by a need for proxies of hard-to-optimize, implicit objectives.

I will probably be doing work in one or more of these directions soon. I am also interested in talking with anyone who is considering looking into these or similar questions.

I’d love to find considerations that would change my view — whether arguments against these projects, or more promising alternatives. But these are my current best guesses, and I consider them good enough that the right next step is to work on them.

(This research was supported as part of the [_Future of Life Institute_](http://futureoflife.org/) FLI-RFP-AI1 program, grant #2015–143898.)