Shutdown problem

[summary: The 'shutdown problem' is creating a sufficiently advanced Artificial Intelligence which will, on the press of a button, suspend itself safely to disk; such that the AI does not object to the existence of this button, nor try to prevent it from being pressed.

This is difficult because avoiding your own halt or suspension is a convergent instrumental strategy; even a goal like "bring the user coffee" implies avoiding shutdown.

This problem is sometimes decomposed into (1) the problem of finding a utility function that really actually means "Suspend yourself safely to disk", and (2) the problem of building an agent that wants to switch to optimizing a different utility function if a button is pressed, but that doesn't want to press that button or prevent its being pressed.

The 'shutdown problem' is creating a sufficiently advanced Artificial Intelligence which will, on the press of a button, suspend itself safely to disk; such that the AI does not object to the existence of this button, nor try to prevent it from being pressed.

This is difficult because avoiding your own halt or suspension is a convergent instrumental strategy; even a goal like "bring the user coffee" implies avoiding shutdown.

One possible research avenue is to:

Describe a 'shutdown utility function' whose attainable optimum is "Suspend safely to disk (and possibly safely abort plans in progress), without building a hidden environmental copy of yourself to carry out your previous goal and then assimilate all matter in the universe to make absolutely sure the original AI stays shut down".
Find a sensible way to compose the shutdown utility function with the agent's regular utility function, such that:
- (2a) Which utility function the agent optimizes depends on whether a switch was pressed.
- (2b) The AI experiences no incentive to cause the switch to be pressed or prevent the switch from being pressed.

Harder versions of this problem would add specifications such as:

The AI positively wants the switch to exist and will restore the switch's existence if it is damaged.
The AI will safely abort any plans in progress (and design plans such that they can be safely aborted).