List: value-alignment subjects

https://arbital.com/p/value_alignment_subject_list

by Eliezer Yudkowsky Mar 31 2015 updated Jan 27 2017

Bullet-point list of core value alignment theory (VAT) subjects.


[summary: This page contains a thorough list of all value-alignment subjects.]

Safety paradigm for advanced agents

Foreseen difficulties

Reflectivity problems

Foreseen normal difficulties

General agent theory

Value theory

Larger research agendas

Possible future use-cases

Possible escape routes

Background

Strategy


Comments

Alexei Andreev

Ideally we shouldn't have pages like this. It means that the hierarchy feature failed. Is this just meant to be temporary? Or do you foresee this as a permanent page?

Eliezer Yudkowsky

I think one will often still need 'introductory' or 'tutorial' type pages that walk through the hierarchy as English text. This exact page, though, was something I whipped up during the recent Experimental Research Retreat as an alternative to just dumping the info, and because I thought I might start filling it in as Arbital pages.

Anna Salamon

I'm finding this page helpful. Alexei, does your theory think I shouldn't be?

Alexei Andreev

I definitely think something like this should exist and will be helpful, but I think Arbital should be able to generate something like this automatically. Until it can, we are stuck doing it manually.

Expanding all children in the Children tab on the AI alignment page achieves something similar, but not quite as cleanly.

Mike Johnson

Within the "Value Theory" section, I'd propose two subpoints:

The 'Unity of Value Thesis' is simply what we get if the Complexity of Value Thesis is wrong. And it could be wrong; we just don't know. For what this could look like, see e.g. https://qualiacomputing.com/2016/11/19/the-tyranny-of-the-intentional-object/

'Necessity of Physical Representation' refers to the notion that ultimately, a proper theory of value must compile to physics. We are made from physical stuff, and everything we interact with and value is made from the same physical stuff, so ethics is ultimately about how to move and arrange the physical stuff in our light-cone. If a theory of value does not operate at this level, it can't be a final theory of value. See e.g. Tegmark's argument here: https://arxiv.org/abs/1409.0813