Friendly AI Research
Yudkowsky’s Singularity Summit 2011 Talk

Video of Eliezer’s talk for Singularity Summit 2011, entitled “Open Problems in Friendly AI,” is now online. (Slides here.)

The open problems he lists are:

  • Describe a general decision system that can completely rewrite itself without decreasing the strength of its proof system each time (a sketch of the Löbian obstacle involved follows this list).
  • Prove blackmail-free equilibrium among timeless strategists.
  • Avoid proving contradiction inside Q’s counterfactual.
  • Better formalize hybrid of causal and mathematical inference.
  • Fair division by continuous / multiparty agents (required for EU agents to divide a benefit; one standard formalization is sketched after this list).
  • Theory of logical uncertainty in temporally bounded agents. If part of you assigns 60% probability to P and part of you assigns 60% probability to ~P, it takes a specific operation to notice the contradiction (a toy sketch follows this list). It’s okay to be outperformed by a smarter agent who noticed first; it’s not okay to assign 20% probability to everything being true after you notice.
  • Making hypercomputation conceivable – extension of Solomonoff induction to anthropic reasoning and higher-order logic – why ideal rational agents still seem to need anthropic assumptions.
  • AIXI’s reward button will kill you – challenge of extending AIXI to non-Cartesian embedding and a utility function over environments with known ontologies.
  • Shifting ontologies – general problem of expressing resolvable uncertainty in utility functions.
  • How do you construe a utility function from a psychologically realistic, detailed model of a human’s decision process? This may end up being 90% morality and 10% math, or what we really want may be formalish statements of desiderata for how to teach a young AI this at the same time as it’s learning about humans. But it’s worth throwing out there for any ethical philosophers who can understand the difference between computable and non-constructive specifications, on the off-chance that it’s an interesting enough problem that some of them will help save the world.
  • Microeconomic models of self-improving systems – it would be helpful if we could get any further information about how fast self-improving AIs go FOOM, or more powerful/formal arguments to convince anyone open to math that they do go FOOM, for all non-contrived curves of cumulative optimization pressure vs. optimization output that fit human evolution and economics to date (a toy family of such curves is sketched below).
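
On the first item, here is a minimal sketch (my gloss, not a transcription of the slides) of why proof strength seems to decrease: by Löb's theorem and Gödel's second incompleteness theorem, a consistent theory extending Peano Arithmetic cannot verify its own soundness, so if each version of the agent must verify the consistency of its successor's proof system, the naive approach forces a descending chain of theories.

```latex
% Illustration only: the Löbian obstacle behind self-rewriting without
% losing proof strength, for a consistent theory T extending Peano Arithmetic.
\begin{align*}
&\text{L\"ob's theorem:}\quad
  T \vdash \bigl(\mathrm{Prov}_T(\ulcorner\varphi\urcorner) \rightarrow \varphi\bigr)
  \;\Longrightarrow\; T \vdash \varphi \\
&\text{G\"odel II:}\quad
  T \nvdash \mathrm{Con}(T) \\
&\text{So if each version must verify its successor's consistency, strength descends:} \\
&\qquad T_0 \vdash \mathrm{Con}(T_1),\quad T_1 \vdash \mathrm{Con}(T_2),\quad \ldots
\end{align*}
```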
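
For the fair-division item, one standard formalization (an illustration, not necessarily the one intended in the talk) is the Nash bargaining solution: among feasible outcomes, pick the one that maximizes the product of each agent's utility gain over its disagreement point.

```latex
% One standard way to formalize dividing a benefit among n expected-utility
% agents (illustrative): the Nash bargaining solution over feasible outcomes x,
% where d_i is agent i's utility if no agreement is reached.
\[
  x^{*} \;=\; \operatorname*{arg\,max}_{x \in \mathcal{F},\; u_i(x) \ge d_i}
  \;\prod_{i=1}^{n} \bigl(u_i(x) - d_i\bigr)
\]
```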
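
The logical-uncertainty item's 60%/60% example can be made concrete with a toy sketch. Two sub-modules of a bounded agent report beliefs about P and ~P separately, so nothing forces the two numbers to sum to one until an explicit reconciliation step runs; proportional renormalization below is just one crude repair, chosen for illustration.

```python
# Toy illustration of the 60% / 60% example (not code from the talk).
# Two sub-modules of a bounded agent hold beliefs about P and ~P separately,
# so incoherence goes unnoticed until an explicit operation checks for it.

def reconcile(p_true: float, p_false: float) -> tuple[float, float]:
    """Notice an incoherent belief pair about P and ~P and renormalize.

    Proportional renormalization is only one crude repair; the open problem
    is what the right operation is for a temporally bounded agent.
    """
    total = p_true + p_false
    if abs(total - 1.0) < 1e-9:
        return p_true, p_false            # already coherent: nothing to fix
    # Incoherence noticed (e.g. 0.6 + 0.6 = 1.2): renormalize rather than
    # letting the excess mass silently distort downstream expected utilities.
    return p_true / total, p_false / total

# Module A assigns 60% to P; module B assigns 60% to ~P.
print(reconcile(0.6, 0.6))  # -> (0.5, 0.5)
```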
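
And for the last item, a toy family of curves (an illustration of the kind of model in question, not the talk's) shows what is at stake: if the rate of self-improvement scales as a power of current optimization power, the exponent determines whether growth is polynomial, exponential, or blows up in finite time.

```latex
% Toy model only: I(t) = cumulative optimization power, with
% self-improvement rate dI/dt = k I^{\alpha}, k > 0, I(0) = I_0.
\begin{align*}
\alpha < 1 &: \quad I(t) = \bigl(I_0^{\,1-\alpha} + (1-\alpha)\,k\,t\bigr)^{1/(1-\alpha)}
            && \text{polynomial growth} \\
\alpha = 1 &: \quad I(t) = I_0\, e^{k t}
            && \text{exponential growth} \\
\alpha > 1 &: \quad I(t) \to \infty \text{ as } t \to t^{*}
             = \frac{I_0^{\,1-\alpha}}{(\alpha-1)\,k}
            && \text{finite-time blow-up (``FOOM'')}
\end{align*}
```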

He also notes:

Most of what you need to know to build Friendly AI is a rigorous understanding of AGI rather than the Friendly parts per se – contrary to what people who dislike the problem would have you believe, we don’t spend all our time pondering morality.