AI alignment work typically treats an AI as a single entity. While I agree that this is a good approximation, I think AIs will be better described as highly coordinated populations of agents. Rather than simply growing in size and acquiring more resources, an AI will find it in its best interest to split into many smaller agents.
One reason to multiply is to save resources: it may be cheaper to run many smaller, simpler agents, each handling its own task, than to run a single large agent that handles everything.
Having many copies can also leverage certain economies of scale. It offers the AI a way to increase its capability without needing to worry about creating an aligned agent, since a copy shares the original's goals by construction.
Splitting into smaller components also reduces overall risk: a single agent faces a much higher probability of extinction than a population of agents does, and having many copies lets the AI diversify its strategies.
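To make the diversification point concrete, here is a toy calculation of my own (not from the post), assuming each copy is destroyed independently with the same probability: the chance that every copy is lost falls exponentially with the number of copies. The independence assumption is doing the work here; correlated risks, such as a shared software flaw, would weaken the effect.

```python
# Toy model (illustrative assumption): each copy is independently destroyed
# with probability p per period, so the whole population is lost only if
# every single copy is destroyed.

def population_extinction_prob(p_single: float, n_copies: int) -> float:
    """Probability that all copies are destroyed, assuming independent failures."""
    return p_single ** n_copies

if __name__ == "__main__":
    p = 0.10  # assumed per-agent extinction probability, purely illustrative
    for n in (1, 2, 5, 10):
        prob = population_extinction_prob(p, n)
        print(f"{n:2d} copies -> population extinction probability {prob:.1e}")
```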
Under certain circumstances, copies may also be able to carry out a Sybil attack that a singleton could not, giving the AI more influence in the world.
Copies can gather resources more effectively as well. When an AI needs to cover a large area but cannot adequately control actions at every point, it makes sense to split into independent agents. This is particularly true for space expeditions, where the speed of light makes communication too slow to be useful for making quick decisions.
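To illustrate the communication-lag point, here is a rough back-of-the-envelope sketch of my own (distances are approximate): even within the solar system, one-way light delay ranges from about a second to tens of minutes, far too long for a central agent to micromanage remote copies in real time.

```python
# Rough one-way light-travel times over a few solar-system distances.
# Distances are approximate and purely illustrative.

SPEED_OF_LIGHT_M_PER_S = 299_792_458
AU_IN_M = 1.495978707e11  # one astronomical unit in meters

def one_way_delay_s(distance_m: float) -> float:
    """One-way signal delay in seconds at light speed."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S

if __name__ == "__main__":
    examples = {
        "Moon (~384,400 km)": 3.844e8,
        "Mars at closest approach (~0.38 AU)": 0.38 * AU_IN_M,
        "Jupiter (~5.2 AU)": 5.2 * AU_IN_M,
    }
    for name, distance in examples.items():
        delay = one_way_delay_s(distance)
        print(f"{name}: {delay:.0f} s one way (~{delay / 60:.1f} min)")
```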
For these reasons, making copies is a convergent instrumental subgoal.
This whole post is hard to evaluate because you haven’t said what you mean by "single agent" vs "multiple agents". If you cut a human brain in half, put each half in a different tank, and put laggy transmission and lossy compression between the halves, is that one agent or two? (When cut in half, both halves of the human brain remain somewhat functional on their own. Does this make normal humans two agents?)
I agree there’s an important concept here.
One important countervailing consideration not mentioned in the OP or comments is indexical objectives/values[1]. In the presence of such indexical objectives, even a computationally perfect clone may give rise to an adversary, because the two instances will receive different inputs and accrue different state/context for their objectives to relate to.
Cf. nature, where even perfect genetic clones can be in competition.
I think the key contribution is uncovering the implicit assumptions behind the concept of a singleton agent. Is a nation-state a singleton? If not, why not?
I agree. I think it is pretty hard to cleanly divide something into a single agent or multiple agents in general. Practically, this means it can be useful to model the same system through different frames (e.g. as a single agent or as a collection of agents).