This Tumblr is no longer being updated. Please check out the blog of the Machine Intelligence Research Institute instead.
In a new article, Louie Helm (SI) outlines the textbooks, university courses, and online courses one should study in order to work on Friendliness theory:
Course recommendations for Friendliness researchers
When I first learned about Friendly AI, I assumed it was mostly a programming problem. As it turns out, it’s actually mostly a math problem. That’s because most of the theory behind self-reference, decision theory, and general AI techniques hasn’t been formalized and solved yet. Thus, when people ask me what they should study in order to work on Friendliness theory, I say “Go study math and theoretical computer science.”
But that’s not specific enough. Should aspiring Friendliness researchers study continuous or discrete math? Imperative or functional programming? Topology? Linear algebra? Ring theory?
I do, in fact, have specific recommendations for which subjects Friendliness researchers should study. And so I worked with a few of my best interns at Singularity Institute to provide recommendations below:
The article gives 22 course recommendations, with links to online courses and a few universities.
A new paper by Stuart Armstrong (FHI) and Kaj Sotala (SI) has been published as part of the Beyond AI conference proceedings:
How We’re Predicting AI – or Failing to
This paper will look at the various predictions that have been made about AI and propose decomposition schemas for analysing them. It will propose a variety of theoretical tools for analysing, judging and improving these predictions. Focusing specifically on timeline predictions (dates given by which we should expect the creation of AI), it will show that there are strong theoretical grounds to expect predictions to be quite poor in this area. Using a database of 95 AI timeline predictions, it will show that these expectations are borne out in practice: expert predictions contradict each other considerably, and are indistinguishable from non-expert predictions and past failed predictions. Predictions that AI lies 15 to 25 years in the future are the most common, from experts and non-experts alike.
Avoiding Unintended AI Behaviors
Artificial intelligence (AI) systems too complex for predefined environment models and actions will need to learn environment models and to choose actions that optimize some criteria. Several authors have described mechanisms by which such complex systems may behave in ways not intended in their designs. This paper describes ways to avoid such unintended behavior. For hypothesized powerful AI systems that may pose a threat to humans, this paper proposes a two-stage agent architecture that avoids some known types of unintended behavior. For the first stage of the architecture this paper shows that the most probable finite stochastic program to model a finite history is finitely computable, and that there is an agent that makes such a computation without any unintended instrumental actions.
Decision Support for Safe AI Design
There is considerable interest in ethical designs for artificial intelligence (AI) that do not pose risks to humans. This paper proposes using elements of Hutter’s agent-environment framework to define a decision support system for simulating, visualizing and analyzing AI designs to understand their consequences. The simulations do not have to be accurate predictions of the future; rather they show the futures that an agent design predicts will fulfill its motivations and that can be explored by AI designers to find risks to humans. In order to safely create a simulation model this paper shows that the most probable finite stochastic program to explain a finite history is finitely computable, and that there is an agent that makes such a computation without any unintended instrumental actions. It also discusses the risks of running an AI in a simulated environment.
For the purposes of this bibliography, AI risk is defined as the risk of AI-related events that could end human civilization.
This bibliography contains 90 entries. Generally, only sources with an extended analysis of AI risk are included, though there are some exceptions among the earliest sources. Listed sources discuss either the likelihood of AI risk or they discuss possible solutions.
An article titled The Singularity: A Reply by David J. Chalmers is forthcoming in the Journal of Consciousness Studies.
I would like to thank the authors of the 26 contributions to this symposium on my article “The Singularity: A Philosophical Analysis”. I learned a great deal from reading their commentaries. Some of the commentaries engaged my article in detail, while others developed ideas about the singularity in other directions. In this reply I will concentrate mainly on those in the first group, with occasional comments on those in the second.
A new paper by Ben Goertzel and Joel Pitt has been published in the Journal of Evolution and Technology: Nine Ways to Bias Open-Source AGI Toward Friendliness
While it seems unlikely that any method of guaranteeing human-friendliness (“Friendliness”) on the part of advanced Artificial General Intelligence (AGI) systems will be possible, this doesn’t mean the only alternatives are throttling AGI development to safeguard humanity, or plunging recklessly into the complete unknown. Without denying the presence of a certain irreducible uncertainty in such matters, it is still sensible to explore ways of biasing the odds in a favorable way, such that newly created AI systems are significantly more likely than not to be Friendly. Several potential methods of effecting such biasing are explored here, with a particular but non-exclusive focus on those that are relevant to open-source AGI projects, and with illustrative examples drawn from the OpenCog open-source AGI project. Issues regarding the relative safety of open versus closed approaches to AGI are discussed and then nine techniques for biasing AGIs in favor of Friendliness are presented:
- Engineer the capability to acquire integrated ethical knowledge.
- Provide rich ethical interaction and instruction, respecting developmental stages.
- Develop stable, hierarchical goal systems.
- Ensure that the early stages of recursive self-improvement occur relatively slowly and with rich human involvement.
- Tightly link AGI with the Global Brain.
- Foster deep, consensus-building interactions between divergent viewpoints.
- Create a mutually supportive community of AGIs.
- Encourage measured co-advancement of AGI software and AGI ethics theory.
- Develop advanced AGI sooner not later.
In conclusion, and related to the final point, we advise the serious co-evolution of functional AGI systems and AGI-related ethical theory as soon as possible, before we have so much technical infrastructure that parties relatively unconcerned with ethics are able to rush ahead with brute force approaches to AGI development.
Kaj Sotala has two papers forthcoming in the International Journal of Machine Consciousness.
I survey four categories of factors that might give a digital mind, such as an upload or an artificial general intelligence, an advantage over humans. Hardware advantages include greater serial speeds and greater parallel speeds. Self-improvement advantages include improvement of algorithms, design of new mental modules, and modification of motivational system. Co-operative advantages include copyability, perfect co-operation, improved communication, and transfer of skills. Human handicaps include computational limitations and faulty heuristics, human-centric biases, and socially motivated cognition. The shape of hardware growth curves, as well as the ease of modifying minds, are found to have a major impact on how quickly a digital mind may take advantage of these factors.
We present a hypothetical process of mind coalescence, where artificial connections are created between two or more brains. This might simply allow for an improved form of communication. At the other extreme, it might merge the minds into one in a process that can be thought of as a reverse split-brain operation. We propose that one way mind coalescence might happen is via an exocortex, a prosthetic extension of the biological brain which integrates with the brain as seamlessly as parts of the biological brain integrate with each other. An exocortex may also prove to be the easiest route for mind uploading, as a person’s personality gradually moves away from the aging biological brain and onto the exocortex. Memories might also be copied and shared even without minds being permanently merged. Over time, the borders of personal identity may become loose or even unnecessary.
It’s “just” a blog post, but it’s a fairly significant one: Indirect Normativity by Paul Christiano.