Visual object recognition, speech recognition, machine translation - these are among the “holy grails” of artificial intelligence research. Benchmark performance in all three areas has now reached, and even surpassed, human levels. Moreover, in the space of 24 hours, a single program, AlphaZero, became by far the world’s best player in three games - chess, Go, and Shogi - to which it had no prior exposure.


These developments have provoked some alarmist reporting in the media, invariably accompanied by pictures of Terminator robots, but predictions of imminent superhuman AI are almost certainly wrong - we’re still several conceptual breakthroughs away.

On the other hand, massive investments in AI research, several hundred billion pounds over the next decade, suggest further rapid advances are not far away. Predictions that superhuman AI is totally impossible are unwise and lack any technical foundation.

So what happens if we succeed in making this superhuman artificial intelligence? The obvious answer is that creating entities more intelligent than ourselves could lead to a loss of human control. For example, here’s how British computer scientist Alan Turing saw it in a 1951 lecture broadcast on the BBC Third Programme, the forerunner of Radio 3:

  • If a machine can think, it might think more intelligently than we do, and then where should we be? Even if we could keep the machines in a subservient position, for instance by turning off the power at strategic moments, we should, as a species, feel greatly humbled.

How did we come to be investing billions in a technological discipline that, if it continues in the current direction, may lead to catastrophe? The two basic steps in setting up the field of AI went (very roughly) like this:

  • 1. We identified a reasonable notion of intelligence in humans: Humans are intelligent to the extent that our actions can be expected to achieve our objectives
  • 2. We transferred this notion directly to machines: Machines are intelligent to the extent that their actions can be expected to achieve their objectives

Because machines, unlike humans, have no objectives of their own, we gave them objectives to achieve. The same basic scheme underlies classical economics, control theory, statistics, management science, and operations research. Although this scheme is widespread and extremely powerful, we don’t want machines that are intelligent in this sense.

The basic drawback is already well known. For example, the legend of King Midas involves a mis-specified objective (“Let everything I touch turn to gold”) that leads to catastrophe. As American mathematician and philosopher Norbert Wiener put it in a 1960 paper published in Science:

  • If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively … we had better be quite sure that the purpose put into the machine is the purpose which we really desire.

As an example of a poorly defined purpose, consider the relatively simple AI algorithms designed to maximise social media clicks, which have shifted the political views of hundreds of millions of people toward extreme positions.

As Wiener also noted, poorly defined purposes weren’t a major risk as long as machines were stupid. We could always reset the machine and try again. When a machine is more capable than its human designers, this option probably won’t be available.

Looking again at the definition of machine intelligence (2, above), we have no reliable way to make sure that the machines’ objectives are the same as our objectives. So let’s try this instead:

  • 3. Machines are beneficial to the extent that their actions can be expected to achieve our objectives

This is probably what we should have aimed for all along. The difficult part, of course, is that our objectives are in us, and not in them.

It turns out, however, that it is possible to define a mathematical framework leading to machines that are provably beneficial in this sense. That is, we define a formal problem for machines to solve, and, if they solve it, they are guaranteed to be beneficial to us. In its simplest form, it goes like this:

  • The world contains a human and a machine.
  • The human has preferences about the future and acts (roughly) in accordance with them.
  • The machine’s objective is to optimise for those preferences.
  • The machine is explicitly uncertain as to what they are.

Economists reading this will recognise this kind of problem as falling under the heading of game theory, which studies how to make decisions when more than one decision-making entity is involved.

Here, the entities are the human and the machine, and they are closely coupled to each other in two ways. First, the human’s actions provide information to the machine about what the human wants. Second, the machine’s actions become more beneficial to the human as the machine’s understanding of human preferences improves.

Optimal machine strategies in this game turn out to be deferential to humans; for example, machines are motivated to ask permission, to allow themselves to be switched off, and to act cautiously when guidance is unclear. Moreover, humans in this framework are motivated to act instructively - to (try to) teach their preferences to machines.
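To see why uncertainty makes a machine willing to be switched off, here is a minimal sketch of a toy “off-switch” decision. It is an illustration under stated assumptions, not the formal framework itself: the function name, the Gaussian belief and the assumption that the human always judges correctly whether the action is good for them are all hypothetical choices made for the example.

```python
import random

def off_switch_values(samples):
    """Expected value to the human of three machine strategies.

    `samples` are draws from the machine's belief about u, the unknown
    value to the human of the action the machine is considering.
    """
    n = len(samples)
    act = sum(samples) / n        # go ahead without asking
    switch_off = 0.0              # do nothing at all
    # Ask first: the (assumed rational) human permits the action only
    # when u > 0 and otherwise switches the machine off, for a value of 0.
    defer = sum(max(u, 0.0) for u in samples) / n
    return {"act": act, "switch_off": switch_off, "defer": defer}

random.seed(0)
# Hypothetical beliefs: one machine is unsure about u, the other is certain.
uncertain = [random.gauss(0.2, 1.0) for _ in range(10_000)]
certain = [0.2] * 10_000

print("uncertain machine:", off_switch_values(uncertain))
print("certain machine:  ", off_switch_values(certain))
```

Because max(u, 0) is never smaller than u or than 0, deferring can never do worse than acting or switching off in this toy setting. The advantage disappears once the machine is certain about what the human wants, which is why the explicit uncertainty matters.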

While these initial results are for a simplified and idealised setting, my colleagues have already applied the same approach successfully to realistic problems, such as self-driving cars interacting with human drivers.

There are two primary sources of difficulty that we are working on right now: satisfying the preferences of many humans and understanding the preferences of real humans.

Many humans

Machines making decisions on behalf of multiple humans face a problem that is closely related to one of the basic problems of moral philosophy: how should a moral person (or a moral government) act? These are not identical situations, because the machine making the decision has no preferences of its own, but the problems remain very similar.

Utilitarianism, an idea widely popularised by the philosophers Jeremy Bentham and John Stuart Mill in the 18th and 19th Centuries, and refined by many thinkers since then, proposes a plausible answer: “the greatest happiness of the greatest number.”

With machines making decisions, however, we have to be very careful of loopholes. For example, the 20th Century philosopher Robert Nozick pointed out in his book Anarchy, State, and Utopia that utilitarianism can fall victim to “utility monsters”, whose happinesses and miseries, being on a far greater scale than those of ordinary humans, induce the moral decision maker to allocate a grossly unequal share of resources to the monster. Even if real utility monsters don’t exist, utilitarian allocation policies can cause people to pretend to be monsters to get a larger share.
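The utility-monster loophole can be made concrete with a small sketch. The assumptions are ours, not Nozick’s: each person i is given a hypothetical utility w_i * sqrt(x_i) from a share x_i of a fixed resource, where w_i is the intensity they report. Maximising the sum of utilities then gives shares proportional to w_i squared.

```python
import math

def utilitarian_allocation(intensities, total=1.0):
    """Split `total` to maximise the sum of w_i * sqrt(x_i).

    Equalising the marginal utilities w_i / (2 * sqrt(x_i)) across people
    gives shares proportional to w_i squared.
    """
    squares = [w * w for w in intensities]
    return [total * s / sum(squares) for s in squares]

def total_utility(intensities, shares):
    return sum(w * math.sqrt(x) for w, x in zip(intensities, shares))

# Hypothetical population: nine ordinary people and one "monster"
# who reports ten times the intensity of everyone else.
intensities = [1.0] * 9 + [10.0]
shares = utilitarian_allocation(intensities)
equal = [1.0 / len(intensities)] * len(intensities)

print(f"monster's share: {shares[-1]:.1%}")
print(f"total utility, utilitarian split: {total_utility(intensities, shares):.2f}")
print(f"total utility, equal split:       {total_utility(intensities, equal):.2f}")
```

Under these assumptions the sum of utilities really is higher when the monster takes almost everything. That is precisely the problem, and since the allocation depends only on reported intensity, it also rewards anyone who exaggerates.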

Negatively altruistic human preferences such as “sadism, envy, resentment, and malice” also cause difficulties; should machines simply ignore them? Finally, would anyone actually buy a truly altruistic, utilitarian machine? Here is how it might go when Harriet the human comes home from work, to be greeted by Robbie the altruistic robot:

Robbie: Welcome home! Long day?

Harriet: Yes, worked really hard, not even time for lunch.

Robbie: So you must be quite hungry!

Harriet: Starving! Can you make me some dinner?

Robbie: There’s something I need to tell you …

Robbie: There are humans in Somalia in more urgent need of help. I am leaving now. Please make your own dinner.

While Harriet might be quite proud of Robbie, she cannot help but wonder why she shelled out a small fortune to buy a robot whose first significant act is to disappear.

Real humans

Machines will need to “invert” actual human behaviour to learn the underlying preferences that drive it. For example, chess players make mistakes because of mental limitations; a machine observing such a mistake should not conclude that the player prefers to lose the game. Instead, the machine must have some idea of how humans actually make decisions; and on this question, we are mostly in the dark at present.
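One common stand-in for “how humans actually make decisions” is a noisily rational model, in which better moves are more likely but never certain. The sketch below uses that model purely as an assumption; the move values, the `beta` noise parameter and the two candidate goals are hypothetical, and real human decision-making is far less tidy.

```python
import math

def likelihood(move, values, beta, sign):
    """Probability of `move` for a noisily rational player.

    sign=+1 models a player who wants to win, sign=-1 one who wants to lose.
    A larger beta means fewer mistakes.
    """
    weights = {m: math.exp(sign * beta * v) for m, v in values.items()}
    return weights[move] / sum(weights.values())

def posterior_wants_to_win(observed, values, beta, prior=0.5):
    """Bayesian update over the two candidate goals after observing moves."""
    p_win, p_lose = prior, 1.0 - prior
    for move in observed:
        p_win *= likelihood(move, values, beta, +1)
        p_lose *= likelihood(move, values, beta, -1)
    return p_win / (p_win + p_lose)

# Hypothetical winning chances after each kind of move.
values = {"strong": 0.6, "ok": 0.5, "blunder": 0.1}
game = ["strong", "ok", "strong", "blunder", "strong"]

print("after the whole game:", posterior_wants_to_win(game, values, beta=5.0))
print("after the blunder alone:", posterior_wants_to_win(["blunder"], values, beta=5.0))
```

Seen in isolation, the blunder looks like evidence that the player wants to lose. Seen alongside the other moves, it is explained away as a mistake, and the machine remains confident that the player is trying to win.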

The complex nature of human cognition raises a further question: is it possible to consistently recognise the preferences of a sometimes non-rational entity? For example, Daniel Kahneman, psychologist and author of the best-selling book Thinking, Fast and Slow, argues that we have an experiencing self and a remembering self who disagree about the desirability of any given experience. Which one should the machine serve?

Finally, it is essential to consider the plasticity of human preferences, which obviously evolve over time as we grow older, gain more experience, and fall under the sway of social influences. The social-media click-through example shows how rapidly machines can modify human preferences. We have to find ways to prevent machines from satisfying human preferences by modifying those preferences to fit the status quo.

If we solve these problems, we may enjoy a new and beneficial relationship between humans and machines. Will we be out of the woods? Not quite. We still face two other major problems.

The first is misuse - how can we be sure that malevolent human actors won’t deploy powerful and dangerous forms of AI for their own ends? We have not had much success in containing malware so far, and this would be far worse.

The second major problem is overuse - how do we avoid the gradual enfeeblement of humanity as machines take over more and more of the running of our civilisation? Well-designed machines should insist that humans retain control and responsibility for their own wellbeing, but short-sighted humans may disagree.

Professor Stuart Russell will be in dialogue with Richard Sargeant at Westminster Abbey on November 20th as part of Westminster Abbey Institute's Autumn Programme Embracing Global Challenges.


