{ localUrl: '../page/dont_solve_whole_problem.html', arbitalUrl: 'https://arbital.com/p/dont_solve_whole_problem', rawJsonUrl: '../raw/43w.json', likeableId: '2655', likeableType: 'page', myLikeValue: '0', likeCount: '3', dislikeCount: '0', likeScore: '3', individualLikes: [ 'AlexeiTurchin', 'EricRogstad', 'JeyanBurnsOorjitham' ], pageId: 'dont_solve_whole_problem', edit: '10', editSummary: '', prevEdit: '9', currentEdit: '10', wasPublished: 'true', type: 'wiki', title: 'Don't try to solve the entire alignment problem', clickbait: 'New to AI alignment theory? Want to work in this area? Already been working in it for years? Don't try to solve the entire alignment problem with your next good idea!', textLength: '5409', alias: 'dont_solve_whole_problem', externalUrl: '', sortChildrenBy: 'likes', hasVote: 'false', voteType: '', votesAnonymous: 'false', editCreatorId: 'EliezerYudkowsky', editCreatedAt: '2016-06-29 04:48:31', pageCreatorId: 'EliezerYudkowsky', pageCreatedAt: '2016-06-09 21:09:50', seeDomainId: '0', editDomainId: 'EliezerYudkowsky', submitToDomainId: '0', isAutosave: 'false', isSnapshot: 'false', isLiveEdit: 'true', isMinorEdit: 'false', indirectTeacher: 'false', todoCount: '0', isEditorComment: 'false', isApprovedComment: 'true', isResolved: 'false', snapshotText: '', anchorContext: '', anchorText: '', anchorOffset: '0', mergedInto: '', isDeleted: 'false', viewCount: '346', text: '[summary: Rather than trying to solve the entire problem of [2v building nice Artificial Intelligences] with a sufficiently right idea:\n\n- Focus on a single, crisply stated subproblem. 
(If your idea solves all of alignment theory, you should also be able to walk through a much smaller-scale example of how it solves one [4m open problem], right?)\n- Make one non-obvious new statement about a subproblem of alignment theory, such that if you were in fact wrong, it would be possible to figure this out now rather than in 20 years.\n - E.g., your new statement must be clear enough that its further consequences can be derived in a way that current researchers can agree on and have sustained conversations about.\n - Or, less likely, that it's possible to test right now with current algorithms, in such a way that [6q if this kind of AI would blow up later then it would also blow up right now].\n- Glance over what current workers in the field consider to be [2l standard difficulties and challenges], so that you can explain to them in their own language why your theory doesn't fail for the usual reasons.]
But some very standard advice would be:\n\n- Glance over what current discussants think of as [2l standard challenges and difficulties of the overall problem], i.e., why people think alignment might be hard, and what standard questions a new approach would face.\n- Consider focusing your attention down on a [4m single subproblem] of alignment, and trying to make progress there - not necessarily solving it completely, but contributing nonobvious knowledge about the problem that wasn't there before. (If you have a broad new approach that solves all of alignment, maybe you could walk through *exactly* how it solves one [2mx crisply identified subproblem]?)\n- Check out the flaws in previous proposals that people currently think won't work. E.g. various versions of [1b7 utility indifference].\n\nA good initial goal is not "persuade everyone in the field to agree with a new idea" but rather "come up with a contribution to an open discussion that is sufficiently crisply stated that, if it were in fact wrong, it would be possible for somebody else to shoot it down today." I.e., an idea such that if you're wrong, this can be pointed out in the form of a crisply derivable consequence of a crisply specified idea, rather than it taking 20 years to see what happens. For there to be sustained progress, propositions need to be stated modularly enough and crisply enough that there can be a conversation about them that goes beyond "does not / does too" - ideas need to be stated in forms that have sufficiently clear and derivable consequences that if there's a problem, people can see the problem and agree on it.\n\nAlternatively, [3nj poke a clearly demonstrable flaw in some solution currently being critiqued]. 
Since most proposals in alignment theory get shot down, trying to participate in the critiquing process has a great advantage over trying to invent solutions, in that you'll probably have started with the true premise "proposal X is broken or incomplete" rather than the false premise "proposal X works and solves everything".\n\nTo [43h speculate] a little about why people might try to solve all of alignment theory in one shot, one might recount Robyn Dawes's advice that:\n\n- Research shows that people come up with better solutions when they discuss the problem as thoroughly as possible before discussing any answers.\n- Dawes has observed that people seem *more* likely to violate this principle as the problem becomes more difficult.\n\n...and finally remark that building a nice machine intelligence correctly on the first try must be pretty darned difficult, since so many people solve it in the first 15 seconds.\n\nIt's possible that everyone working in this field is just missing the obvious and that there *is* some simple idea which solves all the problems. But realistically, you should be aware that everyone in this field has already heard a dozen terrible Total Solutions, and probably hasn't had anything fun happen as a result of discussing them, resulting in some amount of attentional fatigue. (Similarly: If not everyone believes you, or even if it's hard to get people to listen to your solution instead of talking with people they already know, that's not necessarily because of some [43h deep-seated psychological problem] on their part, such as being uninterested in outsiders' ideas. Even if you're not an obvious crank, people are still unlikely to take the time out to engage with you unless you signal awareness of what *they* think are the usual issues and obstacles. 
It's not so different here from other fields.)', metaText: '', isTextLoaded: 'true', isSubscribedToDiscussion: 'false', isSubscribedToUser: 'false', isSubscribedAsMaintainer: 'false', discussionSubscriberCount: '4', maintainerCount: '1', userSubscriberCount: '0', lastVisit: '', hasDraft: 'false', votes: [], voteSummary: 'null', muVoteSummary: '0', voteScaling: '0', currentUserVote: '-2', voteCount: '0', lockedVoteType: '', maxEditEver: '0', redLinkCount: '0', lockedBy: '', lockedUntil: '', nextPageId: '', prevPageId: '', usedAsMastery: 'false', proposalEditNum: '0', permissions: { edit: { has: 'false', reason: 'You don't have domain permission to edit this page' }, proposeEdit: { has: 'true', reason: '' }, delete: { has: 'false', reason: 'You don't have domain permission to delete this page' }, comment: { has: 'false', reason: 'You can't comment in this domain because you are not a member' }, proposeComment: { has: 'true', reason: '' } }, summaries: {}, creatorIds: [ 'EliezerYudkowsky' ], childIds: [], parentIds: [ 'AI_safety_mindset' ], commentIds: [ '4x8', '4xd' ], questionIds: [], tagIds: [], relatedIds: [], markIds: [], explanations: [], learnMore: [], requirements: [], subjects: [], lenses: [], lensParentId: '', pathPages: [], learnMoreTaughtMap: {}, learnMoreCoveredMap: {}, learnMoreRequiredMap: {}, editHistory: {}, domainSubmissions: {}, answers: [], answerCount: '0', commentCount: '0', newCommentCount: '0', linkedMarkCount: '0', changeLogs: [ { likeableId: '2881', likeableType: 'changeLog', myLikeValue: '0', likeCount: '1', dislikeCount: '0', likeScore: '1', individualLikes: [], id: '14798', pageId: 'dont_solve_whole_problem', userId: 'EliezerYudkowsky', edit: '10', type: 'newEdit', createdAt: '2016-06-29 04:48:31', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '14793', pageId: 'dont_solve_whole_problem', 
userId: 'EliezerYudkowsky', edit: '9', type: 'newEdit', createdAt: '2016-06-29 04:30:54', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '12224', pageId: 'dont_solve_whole_problem', userId: 'EliezerYudkowsky', edit: '8', type: 'newEdit', createdAt: '2016-06-09 21:28:52', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '12223', pageId: 'dont_solve_whole_problem', userId: 'EliezerYudkowsky', edit: '7', type: 'newEdit', createdAt: '2016-06-09 21:28:36', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '12222', pageId: 'dont_solve_whole_problem', userId: 'EliezerYudkowsky', edit: '6', type: 'newEdit', createdAt: '2016-06-09 21:27:44', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '12221', pageId: 'dont_solve_whole_problem', userId: 'EliezerYudkowsky', edit: '5', type: 'newEdit', createdAt: '2016-06-09 21:27:12', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '12220', pageId: 'dont_solve_whole_problem', userId: 'EliezerYudkowsky', edit: '4', type: 'newEdit', createdAt: '2016-06-09 21:26:47', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '12219', pageId: 
'dont_solve_whole_problem', userId: 'EliezerYudkowsky', edit: '3', type: 'newEdit', createdAt: '2016-06-09 21:22:43', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '12218', pageId: 'dont_solve_whole_problem', userId: 'EliezerYudkowsky', edit: '0', type: 'deleteTag', createdAt: '2016-06-09 21:10:11', auxPageId: 'start_meta_tag', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '12215', pageId: 'dont_solve_whole_problem', userId: 'EliezerYudkowsky', edit: '1', type: 'newEdit', createdAt: '2016-06-09 21:09:50', auxPageId: '', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '12124', pageId: 'dont_solve_whole_problem', userId: 'EliezerYudkowsky', edit: '1', type: 'newTag', createdAt: '2016-06-09 01:00:39', auxPageId: 'start_meta_tag', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '12123', pageId: 'dont_solve_whole_problem', userId: 'EliezerYudkowsky', edit: '0', type: 'deleteTag', createdAt: '2016-06-09 01:00:34', auxPageId: 'stub_meta_tag', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', dislikeCount: '0', likeScore: '0', individualLikes: [], id: '12122', pageId: 'dont_solve_whole_problem', userId: 'EliezerYudkowsky', edit: '1', type: 'newTag', createdAt: '2016-06-09 01:00:31', auxPageId: 'stub_meta_tag', oldSettingsValue: '', newSettingsValue: '' }, { likeableId: '0', likeableType: 'changeLog', myLikeValue: '0', likeCount: '0', 
dislikeCount: '0', likeScore: '0', individualLikes: [], id: '12121', pageId: 'dont_solve_whole_problem', userId: 'EliezerYudkowsky', edit: '1', type: 'newParent', createdAt: '2016-06-09 00:59:07', auxPageId: 'AI_safety_mindset', oldSettingsValue: '', newSettingsValue: '' } ], feedSubmissions: [], searchStrings: {}, hasChildren: 'false', hasParents: 'true', redAliases: {}, improvementTagIds: [], nonMetaTagIds: [], todos: [], slowDownMap: 'null', speedUpMap: 'null', arcPageIds: 'null', contentRequests: {} }