Friendly AI Research

January 2013

1 post

Helm: Course recommendations for Friendliness researchers

In a new article, Louie Helm (SI) outlines the textbooks, university courses, and online courses one should study in order to work on Friendliness theory:

Course recommendations for Friendliness researchers

When I first learned about Friendly AI, I assumed it was mostly a programming problem. As it turns out, it’s actually mostly a math problem. That’s because most of the theory behind self-reference, decision theory, and general AI techniques haven’t been formalized and solved yet. Thus, when people ask me what they should study in order to work on Friendliness theory, I say “Go study math and theoretical computer science.”

But that’s not specific enough. Should aspiring Friendliness researchers study continuous or discrete math? Imperative or functional programming? Topology? Linear algebra? Ring theory?

I do, in fact, have specific recommendations for which subjects Friendliness researchers should study. And so I worked with a few of my best interns at Singularity Institute to provide recommendations below:

The article gives 22 course recommendations, with links to online courses and a few universities.

Jan 12, 2013

November 2012

1 post

Armstrong and Sotala, “How We’re Predicting AI – or Failing to”

A new paper from Stuart Armstrong (FHI) and Kaj Sotala (SI) has been published as part of the Beyond AI conference proceedings:

How We’re Predicting AI – or Failing to

This paper will look at the various predictions that have been made about AI and propose decomposition schemas for analysing them. It will propose a variety of theoretical tools for analysing, judging and improving these predictions. Focusing specifically on timeline predictions (dates given by which we should expect the creation of AI), it will show that there are strong theoretical grounds to expect predictions to be quite poor in this area. Using a database of 95 AI timeline predictions, it will show that these expectations are borne out in practice: expert predictions contradict each other considerably, and are indistinguishable from non-expert predictions and past failed predictions. Predictions that AI lies 15 to 25 years in the future are the most common, from experts and non-experts alike.

Nov 19, 2012

October 2012

1 post

Two new papers from Bill Hibbard (from AGI-12)
Two new papers by Bill Hibbard were presented at the Fifth Conference on Artificial General Intelligence (AGI-12).

Avoiding Unintended AI Behaviors

Artificial intelligence (AI) systems too complex for predefined environment models and actions will need to learn environment models and to choose actions that optimize some criteria. Several authors have described mechanisms by which such complex systems may behave in ways not intended in their designs. This paper describes ways to avoid such unintended behavior. For hypothesized powerful AI systems that may pose a threat to humans, this paper proposes a two-stage agent architecture that avoids some known types of unintended behavior. For the first stage of the architecture this paper shows that the most probable finite stochastic program to model a finite history is finitely computable, and that there is an agent that makes such a computation without any unintended instrumental actions.

Decision Support for Safe AI Design

There is considerable interest in ethical designs for artificial intelligence (AI) that do not pose risks to humans. This paper proposes using elements of Hutter’s agent-environment framework to define a decision support system for simulating, visualizing and analyzing AI designs to understand their consequences. The simulations do not have to be accurate predictions of the future; rather they show the futures that an agent design predicts will fulfill its motivations and that can be explored by AI designers to find risks to humans. In order to safely create a simulation model this paper shows that the most probable finite stochastic program to explain a finite history is finitely computable, and that there is an agent that makes such a computation without any unintended instrumental actions. It also discusses the risks of running an AI in a simulated environment.

Oct 9, 2012

July 2012

2 posts

New JCS issue on the Singularity and uploading
The new double issue of the Journal of Consciousness Studies (Volume 19, Numbers 7-8) focuses on the Singularity and mind uploading.

Contents:

  1. Igor Aleksander - Design and the Singularity: The Philosopher’s Stone of AI?
  2. Selmer Bringsjord - Belief in the Singularity is Logically Brittle
  3. Richard Brown - Zombies and Simulation
  4. Joseph Corabi, S. Schneider - Metaphysics of Uploading
  5. Ray Kurzweil - Science versus Philosophy in the Singularity
  6. Pamela McCorduck - A Response to ‘The Singularity’
  7. Chris Nunn - More Splodge than Singularity?
  8. Arkady Plotnitsky - The Singularity Wager: A Response to David Chalmers
  9. Jesse Prinz - Singularity and Inevitable Doom
  10. Murray Shanahan - Satori Before Singularity
  11. Carl Shulman, Nick Bostrom - How Hard is Artificial Intelligence? Evolutionary Arguments and Selection Effects
  12. Eric Steinhart - The Singularity Beyond Philosophy of Mind
  13. Burton Voorhees - Parsing the Singularity
  14. David Chalmers - The Singularity: A Reply to Commentators
  15. Federico Langer - Mental Imagery, Emotion, and ‘Literary Task Sets’: Clues Towards a Literary Neuroart
  16. Kieron O’Connor, F. Aardema - Living in a Bubble: Dissociation, Relational Consciousness, and Obsessive Compulsive Disorder
  17. Keith E. Turausky - A Thousand Flowers: Tucson in Bloom
Jul 22, 2012
Muehlhauser, AI risk bibliography 2012
Luke Muehlhauser has published AI risk bibliography 2012, an up-to-date chronological list of papers discussing AI risk.

For the purposes of this bibliography, AI risk is defined as the risk of AI-related events that could end human civilization.

This bibliography contains 90 entries. Generally, only sources with an extended analysis of AI risk are included, though there are some exceptions among the earliest sources. Listed sources discuss either the likelihood of AI risk or they discuss possible solutions.

Jul 22, 2012

May 2012

3 posts

Chalmers, “The Singularity: A Reply”

An article titled The Singularity: A Reply by David J. Chalmers is forthcoming in the Journal of Consciousness Studies.

I would like to thank the authors of the 26 contributions to this symposium on my article “The Singularity: A Philosophical Analysis”. I learned a great deal from reading their commentaries. Some of the commentaries engaged my article in detail, while others developed ideas about the singularity in other directions. In this reply I will concentrate mainly on those in the first group, with occasional comments on those in the second.

May 24, 2012
Goertzel and Pitt, “Nine Ways to Bias Open-Source AGI Toward Friendliness”

A new paper from Ben Goertzel and Joel Pitt has been published in the Journal of Evolution and Technology: Nine Ways to Bias Open-Source AGI Toward Friendliness

While it seems unlikely that any method of guaranteeing human-friendliness (“Friendliness”) on the part of advanced Artificial General Intelligence (AGI) systems will be possible, this doesn’t mean the only alternatives are throttling AGI development to safeguard humanity, or plunging recklessly into the complete unknown. Without denying the presence of a certain irreducible uncertainty in such matters, it is still sensible to explore ways of biasing the odds in a favorable way, such that newly created AI systems are significantly more likely than not to be Friendly. Several potential methods of effecting such biasing are explored here, with a particular but non-exclusive focus on those that are relevant to open-source AGI projects, and with illustrative examples drawn from the OpenCog open-source AGI project. Issues regarding the relative safety of open versus closed approaches to AGI are discussed and then nine techniques for biasing AGIs in favor of Friendliness are presented:

  1. Engineer the capability to acquire integrated ethical knowledge.
  2. Provide rich ethical interaction and instruction, respecting developmental stages.
  3. Develop stable, hierarchical goal systems.
  4. Ensure that the early stages of recursive self-improvement occur relatively slowly and with rich human involvement.
  5. Tightly link AGI with the Global Brain.
  6. Foster deep, consensus-building interactions between divergent viewpoints.
  7. Create a mutually supportive community of AGIs.
  8. Encourage measured co-advancement of AGI software and AGI ethics theory.
  9. Develop advanced AGI sooner not later.
In conclusion, and related to the final point, we advise the serious co-evolution of functional AGI systems and AGI-related ethical theory as soon as possible, before we have so much technical infrastructure that parties relatively unconcerned with ethics are able to rush ahead with brute force approaches to AGI development.

May 24, 2012
Two new Sotala papers

Kaj Sotala has two papers forthcoming in the International Journal of Machine Consciousness.


Advantages of artificial intelligences, uploads and digital minds:

I survey four categories of factors that might give a digital mind, such as an upload or an artificial general intelligence, an advantage over humans. Hardware advantages include greater serial speeds and greater parallel speeds. Self-improvement advantages include improvement of algorithms, design of new mental modules, and modification of motivational system. Co-operative advantages include copyability, perfect co-operation, improved communication, and transfer of skills. Human handicaps include computational limitations and faulty heuristics, human-centric biases, and socially motivated cognition. The shape of hardware growth curves, as well as the ease of modifying minds, are found to have a major impact on how quickly a digital mind may take advantage of these factors.

Coalescing minds: brain uploading-related group mind scenarios:

We present a hypothetical process of mind coalescence, where artificial connections are created between two or more brains. This might simply allow for an improved form of communication. At the other extreme, it might merge the minds into one in a process that can be thought of as a reverse split-brain operation. We propose that one way mind coalescence might happen is via an exocortex, a prosthetic extension of the biological brain which integrates with the brain as seamlessly as parts of the biological brain integrate with each other. An exocortex may also prove to be the easiest route for mind uploading, as a person’s personality gradually moves away from the aging biological brain and onto the exocortex. Memories might also be copied and shared even without minds being permanently merged. Over time, the borders of personal identity may become loose or even unnecessary.

May 6, 2012

April 2012

1 post

Christiano, "Indirect Normativity"

It’s “just” a blog post, but it’s a fairly significant one: Indirect Normativity by Paul Christiano.

Apr 25, 2012

March 2012

4 posts

Detecting superintelligence

Roman Yampolskiy, AI-Complete CAPTCHAs as Zero Knowledge Proofs of Access to an Artificially Intelligent System:

Experts predict that in the next 10 to 100 years scientists will succeed in creating human-level artificial general intelligence. While it is most likely that this task will be accomplished by a government agency or a large corporation, the possibility remains that it will be done by a single inventor or a small team of researchers. In this paper, we address the question of safeguarding a discovery which could without hesitation be said to be worth trillions of dollars. Specifically, we propose a method based on the combination of zero knowledge proofs and provably AI-complete CAPTCHA problems to show that a superintelligent system has been constructed without having to reveal the system itself.

Mar 24, 2012
Bostrom, 'The Superintelligent Will'

Bostrom (2012). The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents.

This paper discusses the relation between intelligence and motivation in artificial agents, developing and briefly arguing for two theses.  The first, the orthogonality thesis, holds (with some caveats) that intelligence and final goals (purposes) are orthogonal axes along which possible artificial intellects can freely vary—more or less any level of intelligence could be combined with more or less any final goal.  The second, the instrumental convergence thesis, holds that as long as they possess a sufficient level of intelligence, agents having any of a wide range of final goals will pursue similar intermediary goals because they have instrumental reasons to do so. In combination, the two theses help us understand the possible range of behavior of superintelligent agents, and they point to some potential dangers in building such an agent.

Mar 18, 2012
Two new papers from Yampolskiy & Fox

Yampolskiy & Fox (2012a). Safety engineering for artificial general intelligence.

Machine ethics and robot rights are quickly becoming hot topics in artificial intelligence and robotics communities. We will argue that attempts to attribute moral agency and assign rights to all intelligent machines are misguided, whether applied to infrahuman or superhuman AIs, as are proposals to limit the negative effects of AIs by constraining their behavior. As an alternative, we propose a new science of safety engineering for intelligent artificial agents based on maximizing for what humans value. In particular, we challenge the scientific community to develop intelligent systems that have human-friendly values that they provably retain, even under recursive self-improvement.

Yampolskiy & Fox (2012b). Artificial general intelligence and the human mental model.

When the first artificial general intelligences are built, they may improve themselves to far-above-human levels. Speculations about such future entities are already affected by anthropomorphic bias, which leads to erroneous analogies with human minds. In this chapter, we apply a goal-oriented understanding of intelligence to show that humanity occupies only a tiny portion of the design space of possible minds. This space is much larger than what we are familiar with from the human example; and the mental architectures and goals of future superintelligences need not have most of the properties of human minds. A new approach to cognitive science and philosophy of mind, one not centered on the human example, is needed to help us understand the challenges which we will face when a power greater than us emerges.

Mar 17, 2012
New JCS issue on the Singularity

The new double issue of the Journal of Consciousness Studies focuses on responses to David Chalmers’ 2010 paper on the Singularity, and includes several articles relevant to Friendly AI.

Contents:

  1. Uziel Awret - Introduction
  2. Susan Blackmore - She Won’t Be Me
  3. Damien Broderick - Terrible Angels: The Singularity and Science Fiction
  4. Barry Dainton - On Singularities and Simulations
  5. Daniel Dennett - The Mystery of David Chalmers
  6. Ben Goertzel - Should Humanity Build a Global AI Nanny to Delay the Singularity Until It’s Better Understood?
  7. Susan Greenfield - The Singularity: Commentary on David Chalmers
  8. Robin Hanson - Meet the New Conflict, Same as the Old Conflict
  9. Francis Heylighen - Brain in a Vat Cannot Break Out
  10. Marcus Hutter - Can Intelligence Explode?
  11. Drew McDermott - Response to ‘The Singularity’ by David Chalmers
  12. Jurgen Schmidhuber - Philosophers & Futurists, Catch Up!
  13. Frank Tipler - Inevitable Existence and Inevitable Goodness of the Singularity
  14. Roman Yampolskiy - Leakproofing the Singularity: Artificial Intelligence Confinement Problem
Mar 2, 2012

February 2012

1 post

New paper: 'Intelligence Explosion: Evidence and Import'

Luke Muehlhauser and Anna Salamon of the Singularity Institute have released a draft version of their forthcoming book chapter “Intelligence Explosion: Evidence and Import.”

It opens:

Humans may create human-level artificial intelligence (AI) this century. Shortly thereafter, we may see an “intelligence explosion” or “technological singularity” — a chain of events by which human-level AI leads, fairly rapidly, to intelligent systems whose capabilities far surpass those of biological humanity as a whole.

How likely is this, and what will the consequences be? Others have discussed these questions previously…; our aim is to provide a brief review suitable both for newcomers to the topic and for those with some familiarity with the topic but expertise in only some of the relevant fields.

Feb 24, 2012

January 2012

1 post

Ordinary Ideas

MIT’s Paul Christiano has written many substantive posts related to Friendly AI theory on his blog, Ordinary Ideas.

Jan 24, 2012

December 2011

1 post

Friendly-AI.com

A new website, Friendly-AI.com, provides a quick introduction to the concept of Friendly AI.

Dec 12, 2011

November 2011

1 post

The Singularity and Machine Ethics

Luke Muehlhauser and Louie Helm have posted a draft of their forthcoming article The Singularity and Machine Ethics:

Many researchers have argued that a self-improving artificial intelligence (AI) could become so vastly more powerful than humans that we would not be able to stop it from achieving its goals. If so, and if the AI’s goals differ from ours, then this could be disastrous for humans. One proposed solution is to program the AI’s goal system to want what we want before the AI self-improves beyond our capacity to control it. Unfortunately, it is difficult to specify what we want. After a brief digression concerning human intuitions about intelligence, we offer a series of “intuition pumps” in moral philosophy for our conclusion that human values are complex and difficult to specify. We then survey the evidence from the psychology of motivation, moral psychology, and neuroeconomics that supports our position. We conclude by recommending ideal preference theories of value as a promising approach for developing a machine ethics suitable for navigating the Singularity.

Nov 18, 2011

October 2011

3 posts

Yudkowsky's Singularity Summit 2011 Talk

Video of Eliezer’s talk for Singularity Summit 2011, entitled “Open Problems in Friendly AI,” is now online. (Slides here.)

The open problems he lists are:

  • Describe a general decision system that can completely rewrite itself without decreasing the strength of its proof system each time.
  • Prove blackmail-free equilibrium among timeless strategists.
  • Avoid proving contradiction inside Q’s counterfactual.
  • Better formalize hybrid of causal and mathematical inference.
  • Fair division by continuous / multiparty agents (required for EU agents to divide a benefit).
  • Theory of logical uncertainty in temporal bounded agents. If part of you assigns 60% probability to P and part of you assigns 60% probability to ~P it requires a specific operation to notice the contradiction. It’s okay to be outperformed  by a smarter agent who noticed first, it’s not okay to assign 20% probability to everything being true after you notice.
  • Making hypercomputation conceivable – extension of Solomonoff induction to anthropic reasoning and higher-order logic – why ideal rational agents still seem to need anthropic assumptions.
  • AIXI’s reward button will kill you – challenge of extending AIXI to non-Cartesian embedding and a utility function over environments with known ontologies.
  • Shifting ontologies – general problem of expressing resolvable uncertainty in utility functions.
  • How do you construe a utility function from a psychologically realistic detailed model of a human’s decision process?  May end up being 90% morality and 10% math, or what we really want may be formalish statements of desiderata for how to teach a young AI this at the same time as it’s learning about humans.  But worth throwing out there for any ethical philosophers who can understand the difference between computable and non-constructive specifications, on the off-chance that it’s an interesting enough problem that some of them will help save the world.
  • Microeconomic models of self-improving systems – it would be helpful if we could get any further information about how fast self-improving AIs go FOOM, or more powerful/formal arguments to convince anyone open to math that they do go FOOM, for all non-contrived curves of cumulative optimization pressure vs. optimization output that fit human evolution & economics to date.

He also notes:

Most things you need to know to build Friendly AI are rigorous understanding of AGI rather than Friendly parts per se – contrary to what people who dislike the problem would have you believe, we don’t spend all our time pondering morality.

Oct 26, 2011
New Article on Oracle AI

FHI’s Stuart Armstrong, Anders Sandberg, and Nick Bostrom have released a new article on Oracle AI:

There is no strong reason to believe human level intelligence represents an upper limit of the capacity of artificial intelligence, should it be realized. This poses serious safety issues, since a superintelligent system would have great power to direct the future according to its possibly flawed goals or motivation systems. Solving this issue in general has proven to be considerably harder than expected. This paper looks at one particular approach, Oracle AI. An Oracle AI is an AI that does not act in the world except by answering questions. Even this narrow approach presents considerable challenges and we analyse and critique various methods of control. In general this form of limited AI might be safer than unrestricted AI, but still remains potentially dangerous.  

Oct 19, 2011
New Omohundro Article

Steve Omohundro has posted an early copy of the article he has submitted to Springer’s The Singularity Hypothesis, titled “Rationally-Shaped Artificial Intelligence.” Abstract:

Systems with the computational power of the human brain are likely to be cheap and ubiquitous within the next few decades. As technology becomes more intelligent, we need to ensure that it remains safe and beneficial. This paper describes a rational framework for analyzing intelligent systems and a strategy for developing them safely. The analysis is based on von Neumann’s model of rational economic behavior. We introduce the “Rationally-Shaped Minds” model of intelligent systems with bounded computation. We show that as computational resources increase, there is a natural progression through stimulus-response systems, learning systems, reasoning systems, self-improving systems, to fully rational systems. We show that rational systems are subject to “drives” toward self-protection, resource acquisition, replication, goal preservation, efficiency, and self-improvement. Several of these drives are anti-social and need to be counteracted with analogs of human cooperativeness and compassion. We analyze the three basic strategies for controlling the behavior of intelligent systems. We describe the “Safe-AI Scaffolding” strategy which builds intentionally limited but safe systems to use in the construction of more powerful systems.

The piece builds on his earlier work, “The Nature of Self-Improving Artificial Intelligence” (2007) and “The Basic AI Drives” (2008). The latter was cited in the latest edition of Russell and Norvig’s famous AI textbook.

Oct 8, 2011