Systems with the computational power of the human brain are likely to be cheap and ubiquitous within the next few decades. As technology becomes more intelligent, we need to ensure that it remains safe and beneficial. This paper describes a rational framework for analyzing intelligent systems and a strategy for developing them safely. The analysis is based on von Neumann’s model of rational economic behavior. We introduce the “Rationally-Shaped Minds” model of intelligent systems with bounded computation. We show that as computational resources increase, there is a natural progression through stimulus-response systems, learning systems, reasoning systems, and self-improving systems to fully rational systems. We show that rational systems are subject to “drives” toward self-protection, resource acquisition, replication, goal preservation, efficiency, and self-improvement. Several of these drives are anti-social and must be counteracted with analogs of human cooperativeness and compassion. We analyze the three basic strategies for controlling the behavior of intelligent systems, and we describe the “Safe-AI Scaffolding” strategy, which builds intentionally limited but safe systems for use in constructing more powerful systems.
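To make the underlying notion of rationality concrete, von Neumann’s model prescribes that an agent choose the action that maximizes its expected utility. The rendering below uses standard notation for illustration and is not the paper’s own formalism:

\[
a^{*} \;=\; \operatorname*{arg\,max}_{a \in A} \; \sum_{s \in S} P(s \mid a)\, U(s)
\]

where \(A\) is the set of available actions, \(S\) the set of possible outcomes, \(P(s \mid a)\) the agent’s probability that action \(a\) leads to outcome \(s\), and \(U\) its utility function over outcomes. The “drives” discussed in the paper arise because many instrumental actions (self-protection, resource acquisition, and so on) raise this expected value for a wide range of utility functions.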
The piece builds on the author’s earlier papers “The Nature of Self-Improving Artificial Intelligence” (2007) and “The Basic AI Drives” (2008). The latter is cited in the most recent edition of Russell and Norvig’s widely used AI textbook, Artificial Intelligence: A Modern Approach.