As Michele Farisco has pointed out on this blog, artificial intelligence (AI) often serves as a projection screen for our self-images as human beings. Sometimes it also serves as a projection screen for our images of good and evil, as you will soon see.

In AI and robotics, autonomy is often sought in the sense that the artificial intelligence should be able to perform its tasks optimally without human guidance, like a self-driving car that safely takes you to your destination without you having to steer, accelerate or brake. Another form of autonomy that is often sought is that artificial intelligence should be self-learning and thus be able to improve itself and become more powerful without human guidance.

Philosophers have discussed whether AI can be autonomous even in another sense, which is associated with human reason. According to this picture, we can, as autonomous human beings, examine our final goals in life and revise them if we judge that new knowledge about the world warrants it. Some philosophers believe that AI cannot do this, because its final goal, or utility function, would make it irrational to change that goal. The goal is fixed. The idea of such stubbornly goal-oriented AI can evoke worrying images of evil AI running amok among us. But the idea can also evoke reassuring images of good AI that reliably supports us.

Worried philosophers have imagined an AI that has the ultimate goal of making ordinary paper clips. This AI is assumed to be self-improving. It therefore becomes increasingly intelligent and powerful in pursuing its goal of manufacturing paper clips. When the raw materials run out, it learns new ways to turn the earth’s resources into paper clips, and when humans try to prevent it from destroying the planet, it learns to destroy humanity. When the planet is wiped out, it travels into space and turns the universe into paper clips.

Philosophers who issue warnings about “evil” super-intelligent AI also express hopes for “good” super-intelligent AI. Suppose we could give self-improving AI the goal of serving humanity. Without getting tired, it would develop increasingly intelligent and powerful ways of serving us, until the end of time. Unlike the god of religion, this artificial superintelligence would hear our prayers and take ever-smarter action to help us. It would probably learn, sooner or later, to prevent earthquakes, and our climate problems would soon be gone. No theodicy in the world could undermine our faith in this artificial god, whose power to protect us from evil is ever-increasing. Of course, it is unclear how the goal of serving humanity can be defined. But given the opportunity to finally secure the future of humanity, some hopeful philosophers believe that the development of human-friendly self-improving AI should be one of the most essential tasks of our time.

I read all this in a well-written article by Wolfhart Totschnig, who questions the rigid goal orientation associated with autonomous AI in the scenarios above. His most important point is that rigidly goal-oriented AI, which runs amok in the universe or saves humanity from every predicament, is not even conceivable. Outside its domain, the goal loses its meaning. The goal of a self-driving car to safely take the user to the destination has no meaning outside the domain of road traffic. Domain-specific AI can therefore not be generalized to the world as a whole, because the utility function loses its meaning outside the domain, long before the universe is turned into paper clips or the future of humanity is secured by an artificially good god.

This is, of course, an important philosophical point about goals and meaning, about specific domains and the world as a whole. The critique helps us to more realistically assess the risks and opportunities of future AI, without being bewitched by our images. At the same time, I get the impression that Totschnig continues to use AI as a projection screen for human self-images. He argues that future AI may well revise its ultimate goals as it develops a general understanding of the world. On his account, the weakness of the above scenarios is that they project today’s domain-specific AI, not the general intelligence of humans, and therefore fail to see the possibility of a genuinely human-like AI that self-critically reconsiders its final goals when new knowledge about the world makes it necessary. Truly human-equivalent AI would have full autonomy.

As far as I can judge, projecting human self-images on future AI is not just a tendency but a norm that governs the discussion. According to this norm, the wrong image is projected in the scenarios above: an image of today’s machines, not of our general human intelligence. Projecting the right self-image on future AI thus appears as an overall goal. Is the goal meaningful or should it be reconsidered self-critically?

These are difficult issues and my impression of the philosophical discussion may be wrong. If you want to judge for yourself, read the article: Fully autonomous AI.

Pär Segerdahl

Pär Segerdahl, Associate Professor at the Centre for Research Ethics & Bioethics and editor of the Ethics Blog.

Totschnig, W. Fully Autonomous AI. Sci Eng Ethics 26, 2473–2485 (2020). https://doi.org/10.1007/s11948-020-00243-z
