The probability that an agent that is cognitively powerful enough to be relevant to existential outcomes, will have been subject to strong, general optimization pressures. Two (disjunctive) supporting arguments are that, one, pragmatically accessible paths to producing cognitively powerful agents tend to invoke strong and general optimization pressures, and two, that cognitively powerful agents would be expected to apply strong and general optimization pressures to themselves.
An example of a scenario that negates [ RelevantPowerfulAgentsHighlyOptimized] is [ KnownAlgorithmNonrecursiveIntelligence], where a cognitively powerful intelligence is produced by pouring lots of computing power into known algorithms, and this intelligence is then somehow prohibited from self-modification and the creation of environmental subagents.
Whether a cognitively powerful agent will in fact have been sufficiently optimized depends on the disjunction of:
- [ PowerfulAgentAlreadyOptimized]. Filtering for agents which are powerful enough to be relevant to advanced agent scenarios, will in practice leave mainly agents that have already been subjected to large amounts of cognitive optimization from some internal or external source.
- [ PowerfulAgentSelfOptimizes]. Agents that reach a threshold of power sufficient to be relevant to advanced agent scenarios will afterwards optimize themselves both heavily and generally.
Ending up with a scenario along the lines of [ KnownAlgorithmNonrecursiveIntelligence] requires defeating both of the above conditions simultaneously. The second condition seems more difficult and to require more Corrigibility or [ CapabilityControl] features than the first.
Comments
Kenzi Amodei
Examples of 'strong, general optimization pressures'? Maybe the sorts of things in that table from Superintelligence. ?Optimization pressure = something like a selective filter, where "strong" means that it was strongly selected for? And maybe the reason to say 'optimization' is to imply that there was a trait that was selected for, strongly, in the same direction (or towards the same narrow target, more like?) for many "generations". Mm, or that all the many different elements of the agent were built towards that trait, with nothing else being a strong competitor. And then "general" presumably is doing something like the work that it does in "general intelligence", ie, not narrow? Ah, a different meaning would be that the agent has been subject to strong pressures towards being a 'general optimizer'. Seems less strongly implied by the grammar, but doesn't create any obvious meaningful interpretive differences.
Oh, or "general" could mean "along many/all axes". So, optimization pressure that is strong, and along many axes. Which fails to specify a set of axes that are relevant, but that doesn't seem super problematic at this moment.
It's not obvious to me that filtering for agents powerful enough to be relevant will leave mainly agents who've been subjected to strong general optimization pressures. For example the Limited Genie described on the Advanced Agent page maybe wasn't?
For self optimization, I assume this is broadly because of the convergent instrumental values claim?