Task (AI goal)

https://arbital.com/p/task_goal

by Eliezer Yudkowsky Jun 20 2016 updated Jan 26 2017

When building the first AGIs, it may be wiser to assign them only goals that are bounded in space and time, and can be satisfied by bounded efforts.


[summary: A "Task" is a goal within an AI that only covers a bounded amount of space and time, and can be satisfied by a limited amount of effort.

An example might be "fill this cauldron with water before 1pm"; but even there, we have to be careful. "Maximize the probability that this cauldron contains water at 1pm" would imply unlimited effort, since slightly higher probabilities could be obtained by adding more and more effort.

"Carry out some policy such that there's at least a 95% chance that the cauldron is at least 90% full of water by 1pm" would be more task-ish. A limited effort seems like definitely enough to do that, and then it can't be done any further by expending more effort.

See also Low impact, Mild optimization and Task-directed AGI.]

A "Task" is a goal or subgoal within an advanced AI, that can be satisfied as fully as possible by optimizing a bounded part of space, for a limited time, with a limited amount of effort.

E.g., "make as many paperclips as possible" is definitely not a 'task' in this sense, since it spans every paperclip anywhere in space and future time. Creating more and more paperclips, using more and more effort, would be more and more preferable up to the maximum exertable effort.

For a more subtle example of non-taskishness, consider Disney's "sorcerer's apprentice" scenario: Mickey Mouse commands a broomstick to fill a cauldron. The broomstick then adds more and more water to the cauldron until the workshop is flooded. (Mickey then tries to destroy the broomstick. But since the broomstick has no designed-in reflectively stable shutdown button, the broomstick repairs itself and begins constructing subagents that go on pouring more water into the cauldron.)

Since the Disney cartoon is a musical, we don't know if the broomstick was given a time bound on its job. Let us suppose that Mickey tells the broomstick to do its job sometime before 1pm.

Then we might imagine that the broomstick is a subjective expected utility maximizer with a utility function $~$U_{cauldron}$~$ over outcomes $~$o$~$:

$$~$U_{cauldron}(o): \begin{cases} 1 & \text{if in $o$ the cauldron is $\geq 90\%$ full of water at 1pm} \\ 0 & \text{otherwise} \end{cases}$~$$

This looks at first glance like it ought to be taskish:

- The goal concerns only the contents of a single cauldron, a bounded region of space.
- The goal concerns only what happens by 1pm, a bounded time.
- The utility function can only take on the values 0 and 1, so no outcome can score higher than "satisfactory".

The last property in particular makes $~$U_{cauldron}$~$ a "satisficing utility function", one where an outcome is either satisfactory or unsatisfactory, and it is not possible to do any better than "satisfactory".
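As a minimal illustration (not part of the original article; the function name and argument are invented for the example), the 0/1 structure of $~$U_{cauldron}$~$ can be sketched in Python:

```python
# Illustrative sketch: U_cauldron as a 0/1 "satisficing" utility function.
# An outcome is either satisfactory (1) or not (0); nothing scores higher than 1.

def u_cauldron(fill_fraction_at_1pm: float) -> int:
    """Utility of an outcome: 1 if the cauldron is at least 90% full of water
    at 1pm, 0 otherwise."""
    return 1 if fill_fraction_at_1pm >= 0.90 else 0

assert u_cauldron(0.95) == 1   # satisfactory
assert u_cauldron(0.50) == 0   # unsatisfactory
assert u_cauldron(1.00) == 1   # completely full is no *better* than 90% full
```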

But by previous assumption, the broomstick is still optimizing expected utility. Assume the broomstick reasons with reasonable generality via some universal prior. Then the subjective probability of the cauldron being full, when it looks full to the broomstick-agent, will not be exactly $~$1.$~$ Perhaps (the broomstick-agent reasons) the broomstick's cameras are malfunctioning, or its RAM has malfunctioned, producing an inaccurate memory.

Then the broomstick-agent reasons that it can further increase the probability of the cauldron being full - however slight the increase in probability - by going ahead and dumping in another bucket of water.

That is: Cromwell's Rule implies that the subjective probability of the cauldron being full never reaches exactly $~$1$~$. Then there can be an infinite series of increasingly preferred, increasingly effortful policies $~$\pi_1, \pi_2, \pi_3 \ldots$~$ with

$$~$\mathbb E [ U_{cauldron} | \pi_1] = 0.99\\ \mathbb E [ U_{cauldron} | \pi_2] = 0.999 \\ \mathbb E [ U_{cauldron} | \pi_3] = 0.999002 \\ \ldots$~$$

In that case the broomstick can always do better in expected utility (however slightly) by exerting even more effort, up to the maximum effort it can exert. Hence the flooded workshop.
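To make this concrete, here is a toy sketch in Python (the probability model and its numbers are hypothetical, not taken from the article): an expected-utility maximizer over policies indexed by effort. Because the subjective probability of success never reaches exactly 1, the argmax always lands on the most effortful policy available.

```python
from fractions import Fraction

# Toy model (hypothetical numbers): policies pi_1, pi_2, ... indexed by effort.
# p_full(effort) is the broomstick's subjective probability that the cauldron
# is >= 90% full at 1pm under policy pi_effort. Per Cromwell's Rule it climbs
# toward, but never reaches, 1; exact fractions keep that visible.

def p_full(effort: int) -> Fraction:
    return 1 - Fraction(1, 100) * Fraction(1, 2) ** (effort - 1)   # 0.99, 0.995, ...

def maximizer_choice(max_effort: int) -> int:
    """An expected-utility MAXIMIZER: since U_cauldron is 0 or 1, a policy's
    expected utility equals p_full(effort), which keeps rising with effort,
    so the argmax is always the most effortful policy available."""
    return max(range(1, max_effort + 1), key=p_full)

print(maximizer_choice(max_effort=1000))   # -> 1000: hence the flooded workshop
```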

If, on the other hand, the broomstick is an expected utility satisficer, i.e., it treats a policy $~$\pi$~$ as "acceptable" if $~$\mathbb E [ U_{cauldron} | \pi ] \geq 0.95,$~$ then this is finally a taskish process (we think). The broomstick can find some policy that's reasonably sure of filling up the cauldron, execute that policy, and then do no more.
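A matching sketch of the satisficer, under the same hypothetical probability model as above: any policy whose expected utility clears the 0.95 threshold is acceptable. This particular sketch happens to return the least effortful acceptable policy, though the argument only requires that some bounded-effort policy be acceptable; as the next paragraph notes, wasteful acceptable policies are not yet ruled out.

```python
from fractions import Fraction

# Same toy probability model as in the maximizer sketch above (hypothetical).
def p_full(effort: int) -> Fraction:
    return 1 - Fraction(1, 100) * Fraction(1, 2) ** (effort - 1)   # 0.99, 0.995, ...

def satisficer_choice(max_effort: int, threshold: Fraction = Fraction(95, 100)) -> int:
    """An expected-utility SATISFICER: a policy is 'acceptable' if
    E[U_cauldron | policy] >= threshold. This sketch returns the least
    effortful acceptable policy and then stops looking."""
    for effort in range(1, max_effort + 1):
        if p_full(effort) >= threshold:
            return effort            # acceptable; no further effort is called for
    raise ValueError("no acceptable policy within the effort budget")

print(satisficer_choice(max_effort=1000))   # -> 1: pi_1 already clears the 0.95 bar
```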

As described, this broomstick doesn't yet have any impact penalty or features for mild optimization. So the broomstick could also get $~$\geq 0.95$~$ expected utility - and hence an "acceptable" policy - by flooding the whole workshop; we haven't yet forbidden excess efforts. Similarly, the broomstick could also go on to destroy the world after 1pm - we haven't yet forbidden excess impacts.

But the underlying rule of "Execute a policy that fills the cauldron at least 90% full with at least 95% probability" does appear taskish, so far as we know. It seems possible for an otherwise well-designed agent to execute this goal to the greatest achievable degree by acting in bounded space, over a bounded time, with a limited amount of effort. There does not appear to be a sequence of successively more effortful policies that the agent would evaluate as fulfilling its decision criterion better and better.

The "taskness" of this goal, even assuming it was correctly identified, wouldn't by itself make the broomstick a fully taskish AGI. We also have to consider whether every subprocess of the AI is similarly tasky; whether there is any subprocess anywhere in the AI that tries to improve memory efficiency 'as far as possible'. But it would be a start, and make further safety features more feasible/useful.

See also Mild optimization as an open problem in AGI alignment.


Comments

Ryan Carey

I think the "task AI" term has been a bit confusing. When people first hear the term "task AI" they naturally think of non-autonomy (Jessica and I both did this). It also sounds a bit similar to Holden's "tool AI" which has similar connotations.

Whereas I'm apparently supposed to be imagining an optionally autonomous satisficing agent. I admittedly don't have any better suggestions.