AI Will Not Want To Self-Improve
Published by The Lawfare Institute
In foundational accounts of AI risk, the prospect of AI self-improvement looms large. The idea is simple. For any capable, goal-seeking system, the system’s goal will be more readily achieved if the system first makes itself even more capable. Having become somewhat more capable, the system will be able to improve itself again. And so on, possibly generating a rapid explosion of AI capabilities, resulting in systems that humans cannot hope to control.
This paper argues that explosive cycles of AI self-improvement are less likely than existing accounts commonly assume. The reason is not that AI systems will never become capable enough to do cutting-edge machine learning research. Rather, there are previously unrecognized incentives cutting against AI self-improvement.
Though unappreciated, the incentives against AI self-improvement are familiar. They are the same ones that cut against humans building AI systems more capable than ourselves. Currently, there is no reliable way to ensure that such systems are “aligned” with their creators—that they share their creators’ goals. Absent a major breakthrough in the science of AI alignment, any entity—human or artificial—that makes an AI more capable than itself risks generating not a helpful agent, but a powerful competitor and an existential threat.
For an in-depth discussion of the paper, listen to this Lawfare Daily podcast episode.