[Question] Conditional on the first AGI being aligned correctly, is a good outcome even still likely?

https://www.lesswrong.com/posts/QTjrbk43HD4qevFEk/conditional-on-the-first-agi-being-aligned-correctly-is-a

Or, what stops someone else from making their own AGI 1-6 months later that doesn’t play nice and ends the world?

Comment

https://www.lesswrong.com/posts/QTjrbk43HD4qevFEk/conditional-on-the-first-agi-being-aligned-correctly-is-a?commentId=b2GdmKZkf73KXakvf

It depends on how long it takes, or how likely it is, for an AGI to bootstrap into super-intelligence, and whether that differs between aligned and non-aligned AGIs. If the orthogonality thesis holds, an aligned AGI is no more or less intrinsically capable of reaching super-intelligence than a non-aligned one. I would expect an aligned super-intelligent AGI to be far more capable than we are at detecting and preventing the development of unaligned super-intelligence. And the fact that, in this hypothetical scenario, we produced aligned AGI before non-aligned AGI would be moderately strong evidence that we were not terrible at alignment. So it seems reasonable to suppose that the chances of unaligned super-intelligence arising after a friendly super-intelligence would be markedly lowered.

That leaves me suspecting that the bulk of the remaining risk lies in the case where we develop aligned AGI that does not bootstrap into super-intelligence, or at least not fast enough to prevent an unaligned one from overtaking it. That isn't an implausible scenario. While I can't think of any reason it would be more likely than not, I certainly don't assign it negligible probability.