One of the most significant AI milestones in history was quietly ushered into being this summer. We speak of the quest for Artificial General Intelligence (AGI), probably the most sought-after goal in the entire field of computer science. With the introduction of the Impala architecture, DeepMind, the company behind AlphaGo and AlphaZero, would seem to finally have AGI firmly in its sights.
Let’s define AGI, since it’s been used by different people to mean different things. AGI is a single intelligence or algorithm that can learn multiple tasks and exhibits positive transfer when doing so, sometimes called meta-learning. During meta-learning, the acquisition of one skill enables the learner to pick up another new skill faster because it applies some of its previous “know-how” to the new task. In other words, one learns how to learn — and can generalize that to acquiring new skills, the way humans do. This has been the holy grail of AI for a long time.
As it currently exists, AI shows little ability to transfer learning towards new tasks. Typically, it must be trained anew from scratch. For instance, the same neural network that makes recommendations to you for a Netflix show cannot use that learning to suddenly start making meaningful grocery recommendations. Even these single-instance “narrow” AIs can be impressive, such as IBM’s Watson or Google’s self-driving car tech. However, none of them comes close to an artificial general intelligence, which could conceivably unlock the kind of recursive self-improvement variously referred to as the “intelligence explosion” or “singularity.”
Those who thought that day would be sometime in the far distant future would be wise to think again. To be sure, DeepMind has made inroads on this goal before, specifically with its work on Psychlab and Differentiable Neural Computers. However, Impala is its largest and most successful effort to date, showcasing a single algorithm that can learn 30 different challenging tasks requiring various aspects of learning, memory, and navigation.
But enough preamble; let’s look under the hood and see what makes Impala tick. First, Impala is based on reinforcement learning, an AI technique that has its origins in behaviorism. It parallels the way humans build up an intuition-based skill, such as learning to walk or ride a bicycle. Reinforcement learning has already been used for some amazing achievements, such as endowing an AI with emotions (see video below) and learning complex games like Go and Poker.
However, even these reinforcement learning algorithms couldn’t transfer what they’d learned about one task to acquiring a new one. To realize this achievement, DeepMind supercharged a reinforcement learning algorithm called A3C. In so-called actor-critic reinforcement learning, of which A3C is one variety, two neural networks share the work: the actor chooses actions, and the critic evaluates how good those choices turned out to be. Together, they drive the learning process. This was already the state of the art, but DeepMind added a new off-policy correction algorithm called V-trace to the mix, which made the learning more efficient and, crucially, better able to achieve positive transfer between tasks.
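For readers who want to see the idea in code, here is a minimal sketch of the kind of correction V-trace performs, following the formula in the published Impala paper rather than DeepMind’s actual implementation. The function name, argument names, and default values are illustrative assumptions. The input is a short run of experience gathered by an older copy of the actor (the “behaviour” policy); the output is a set of corrected value targets for training the critic under the current (“target”) policy.

```python
import math


def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Illustrative V-trace value targets for one trajectory (pure Python).

    behaviour_logp / target_logp: log-probabilities of the actions taken,
    under the policy that acted and the policy being trained.
    values: the critic's estimates V(x_t) for each step.
    bootstrap_value: the critic's estimate for the state after the last step.
    """
    n = len(rewards)
    values = list(values)

    # Importance ratios pi(a|x) / mu(a|x), truncated to limit variance.
    ratios = [math.exp(t - b) for t, b in zip(target_logp, behaviour_logp)]
    rhos = [min(rho_bar, r) for r in ratios]
    cs = [min(c_bar, r) for r in ratios]

    # Temporal-difference errors, weighted by the truncated ratios.
    next_values = values[1:] + [bootstrap_value]
    deltas = [rhos[t] * (rewards[t] + gamma * next_values[t] - values[t])
              for t in range(n)]

    # Backward recursion: v_t - V(x_t) = delta_t + gamma * c_t * (v_{t+1} - V(x_{t+1})).
    targets = [0.0] * n
    acc = 0.0
    for t in reversed(range(n)):
        acc = deltas[t] + gamma * cs[t] * acc
        targets[t] = values[t] + acc
    return targets


# When both policies agree, the targets reduce to ordinary discounted returns.
on_policy = vtrace_targets([0.0] * 3, [0.0] * 3, [1.0] * 3, [0.0] * 3,
                           bootstrap_value=0.0, gamma=0.5)
```

When the two policies assign the same probability to every action, the ratios are all 1 and the targets are just the usual discounted returns. When the current policy would have been less likely to take the recorded actions, the ratios drop below 1 and the targets are pulled back toward the critic’s existing estimates, which is how stale experience is kept from misleading the learner.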
To be sure, this doesn’t herald the dawn of “conscious robots” or even ones that have an imagination. While we think of such attributes as hallmarks of intelligence because they apply to humans, this is somewhat misleading. As the AI researcher Shane Legg argues in the video below, things like consciousness and imagination may be traits useful for solving particular kinds of problems, such as coordinating between large numbers of people or exchanging information.
However, a superintelligent algorithm or agent can exist without such attributes. In fact, we would likely be wise to ensure no AI ever does possess consciousness as we know it. That could lead to some awkward questions when it begins to interrogate its human creators on their fascination with Beanie Babies, Hummers, and the Kardashians.