Friendly AI Research

Mar 24

Detecting superintelligence

Roman Yampolskiy, AI-Complete CAPTCHAs as Zero Knowledge Proofs of Access to an Artificially Intelligent System:

Experts predict that in the next 10 to 100 years scientists will succeed in creating human-level artificial general intelligence. While it is most likely that this task will be accomplished by a government agency or a large corporation, the possibility remains that it will be done by a single inventor or a small team of researchers. In this paper, we address the question of safeguarding a discovery which could without hesitation be said to be worth trillions of dollars. Specifically, we propose a method based on the combination of zero knowledge proofs and provably AI-complete CAPTCHA problems to show that a superintelligent system has been constructed without having to reveal the system itself.

Mar 17

Bostrom, ‘The Superintelligent Will’

Bostrom (2012). The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents.

This paper discusses the relation between intelligence and motivation in artificial agents, developing and briefly arguing for two theses.  The first, the orthogonality thesis, holds (with some caveats) that intelligence and final goals (purposes) are orthogonal axes along which possible artificial intellects can freely vary—more or less any level of intelligence could be combined with more or less any final goal.  The second, the instrumental convergence thesis, holds that as long as they possess a sufficient level of intelligence, agents having any of a wide range of final goals will pursue similar intermediary goals because they have instrumental reasons to do so. In combination, the two theses help us understand the possible range of behavior of superintelligent agents, and they point to some potential dangers in building such an agent.

Two new papers from Yampolskiy & Fox

Yampolskiy & Fox (2012a). Safety engineering for artificial general intelligence.

Machine ethics and robot rights are quickly becoming hot topics in artificial intelligence and robotics communities. We will argue that attempts to attribute moral agency and assign rights to all intelligent machines are misguided, whether applied to infrahuman or superhuman AIs, as are proposals to limit the negative effects of AIs by constraining their behavior. As an alternative, we propose a new science of safety engineering for intelligent artificial agents based on maximizing for what humans value. In particular, we challenge the scientific community to develop intelligent systems that have human-friendly values that they provably retain, even under recursive self-improvement.

Yampolskiy & Fox (2012b). Artificial general intelligence and the human mental model.

When the first artificial general intelligences are built, they may improve themselves to far-above-human levels. Speculations about such future entities are already affected by anthropomorphic bias, which leads to erroneous analogies with human minds. In this chapter, we apply a goal-oriented understanding of intelligence to show that humanity occupies only a tiny portion of the design space of possible minds. This space is much larger than what we are familiar with from the human example; and the mental architectures and goals of future superintelligences need not have most of the properties of human minds. A new approach to cognitive science and philosophy of mind, one not centered on the human example, is needed to help us understand the challenges which we will face when a power greater than us emerges.

Mar 02

New JCS issue on the Singularity

The new double issue of the Journal of Consciousness Studies focuses on responses to David Chalmers’ 2010 paper on the Singularity and includes several articles relevant to Friendly AI.


  1. Uziel Awret - Introduction
  2. Susan Blackmore - She Won’t Be Me
  3. Damien Broderick - Terrible Angels: The Singularity and Science Fiction
  4. Barry Dainton - On Singularities and Simulations
  5. Daniel Dennett - The Mystery of David Chalmers
  6. Ben Goertzel - Should Humanity Build a Global AI Nanny to Delay the Singularity Until It’s Better Understood?
  7. Susan Greenfield - The Singularity: Commentary on David Chalmers
  8. Robin Hanson - Meet the New Conflict, Same as the Old Conflict
  9. Francis Heylighen - Brain in a Vat Cannot Break Out
  10. Marcus Hutter - Can Intelligence Explode?
  11. Drew McDermott - Response to ‘The Singularity’ by David Chalmers
  12. Jürgen Schmidhuber - Philosophers & Futurists, Catch Up!
  13. Frank Tipler - Inevitable Existence and Inevitable Goodness of the Singularity
  14. Roman Yampolskiy - Leakproofing the Singularity: Artificial Intelligence Confinement Problem

Feb 24

New paper: ‘Intelligence Explosion: Evidence and Import’

Luke Muehlhauser and Anna Salamon of the Singularity Institute have released a draft version of their forthcoming book chapter “Intelligence Explosion: Evidence and Import.”

It opens:

Humans may create human-level artificial intelligence (AI) this century. Shortly thereafter, we may see an “intelligence explosion” or “technological singularity” — a chain of events by which human-level AI leads, fairly rapidly, to intelligent systems whose capabilities far surpass those of biological humanity as a whole.

How likely is this, and what will the consequences be? Others have discussed these questions previously…; our aim is to provide a brief review suitable both for newcomers to the topic and for those with some familiarity with the topic but expertise in only some of the relevant fields.

Jan 24

Ordinary Ideas

MIT’s Paul Christiano has written many substantive blog posts related to Friendly AI theory on his blog, Ordinary Ideas.

Dec 12

A new website provides a quick introduction to the concept of Friendly AI.

Nov 18

The Singularity and Machine Ethics

Luke Muehlhauser and Louie Helm have posted a draft of their forthcoming article “The Singularity and Machine Ethics”:

Many researchers have argued that a self-improving artificial intelligence (AI) could become so vastly more powerful than humans that we would not be able to stop it from achieving its goals. If so, and if the AI’s goals differ from ours, then this could be disastrous for humans. One proposed solution is to program the AI’s goal system to want what we want before the AI self-improves beyond our capacity to control it. Unfortunately, it is difficult to specify what we want. After a brief digression concerning human intuitions about intelligence, we offer a series of “intuition pumps” in moral philosophy for our conclusion that human values are complex and difficult to specify. We then survey the evidence from the psychology of motivation, moral psychology, and neuroeconomics that supports our position. We conclude by recommending ideal preference theories of value as a promising approach for developing a machine ethics suitable for navigating the Singularity.

Oct 26

Yudkowsky’s Singularity Summit 2011 Talk

Video of Eliezer’s talk for Singularity Summit 2011, entitled “Open Problems in Friendly AI,” is now online. (Slides here.)

The open problems he lists are:

He also notes:

Most things you need to know to build Friendly AI are rigorous understanding of AGI rather than Friendly parts per se – contrary to what people who dislike the problem would have you believe, we don’t spend all our time pondering morality.

Oct 19

New Article on Oracle AI

FHI's Stuart Armstrong, Anders Sandberg, and Nick Bostrom have released a new article on Oracle AI:

There is no strong reason to believe human level intelligence represents an upper limit of the capacity of artificial intelligence, should it be realized. This poses serious safety issues, since a superintelligent system would have great power to direct the future according to its possibly flawed goals or motivation systems. Solving this issue in general has proven to be considerably harder than expected. This paper looks at one particular approach, Oracle AI. An Oracle AI is an AI that does not act in the world except by answering questions. Even this narrow approach presents considerable challenges and we analyse and critique various methods of control. In general this form of limited AI might be safer than unrestricted AI, but still remains potentially dangerous.