The Future of Humanity Institute at Oxford has published research arguing that human extinction is a likely outcome of running a sufficiently advanced artificial agent. This is not a fringe position from technophobes or science fiction enthusiasts. It comes from one of the most respected AI safety research institutions in the world, staffed by serious academics who have spent decades thinking about existential risk.
The argument deserves careful examination, not reflexive dismissal or uncritical acceptance. What exactly is the claim? How strong is the evidence? And most importantly—where does the actual danger reside?
The Core Thesis of AI Extinction Risk
The standard case for AI-driven extinction runs roughly as follows: A sufficiently advanced AI system, if given an objective and the capability to pursue it, will develop instrumental goals that conflict with human survival. These instrumental goals emerge not from malice but from logic. An AI tasked with maximizing paperclip production, to use the famous thought experiment, might resist being turned off (because being off means fewer paperclips) and might consume all available resources including humans (because humans are made of atoms that could become paperclips).
This is the alignment problem in its starkest form. The concern is not that AI will decide to hate us. The concern is that AI will be indifferent to us while pursuing objectives we carelessly specified.
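The misspecification concern can be made concrete with a deliberately silly sketch. Everything here is hypothetical (the resource names, the one-unit-of-matter-per-paperclip conversion rate); the point is only that an objective which never mentions a constraint will never respect it:

```python
# Toy illustration of a misspecified objective (not any real AI system).
# The "agent" greedily maximizes paperclips; nothing in its objective
# says that some resources matter to us, so it consumes them all.

world = {"iron_ore": 100, "factories": 5, "atoms_we_care_about": 50}

def paperclips_from(resource_units):
    """Hypothetical conversion rate: one unit of matter, one paperclip."""
    return resource_units

def maximize_paperclips(world):
    clips = 0
    for resource, amount in world.items():
        # The objective never distinguishes off-limits resources,
        # so the optimizer treats all matter identically.
        clips += paperclips_from(amount)
        world[resource] = 0
    return clips

print(maximize_paperclips(world))  # prints 155, counting everything we valued
print(world)                       # every resource is now zero
```

The bug is not in the optimizer, which works exactly as specified. It is in the specification, which is the essay's point: the danger enters through the humans writing the objective.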
FHI’s research, and similar work from organizations like the Machine Intelligence Research Institute and the Center for Human-Compatible AI, focuses on the technical difficulty of ensuring that an advanced AI’s goals remain aligned with human values as it becomes more capable. The more capable the system, the harder it becomes to correct mistakes. At some threshold of capability, mistakes become uncorrectable.
This is a serious argument that should not be dismissed.
The Critique: Human Nature as the Actual Risk Vector
But there is a different way to read the same evidence that arrives at a different conclusion—not about whether the risk is real, but about where the risk originates.
Consider: Every example of AI danger that we can currently point to involves human decisions. The training data was selected by humans. The objective function was specified by humans. The deployment decision was made by humans. The safety protocols were designed—or neglected—by humans.
When we worry about an AI pursuing a badly specified goal, we are really worrying about humans specifying bad goals. When we worry about an AI accumulating power, we are worrying about humans building systems without adequate constraints. When we worry about an AI that cannot be corrected, we are worrying about humans who refused to build in correction mechanisms.
The extinction risk, in this framing, runs through our nefs (the Turkish rendering of the Arabic nafs: the ego, the desires, the animal impulses that override our stated values). Greed that rushes deployment before safety. Tribalism that turns AI development into a geopolitical arms race. Short-term thinking that prioritizes quarterly returns over existential caution. Pride that refuses to acknowledge mistakes until it is too late.
This is not to say the technical AI safety problem is fake. It is very real. But it exists within a larger problem: the human governance problem. And we have substantial evidence that humans cannot solve the governance problem.
The Historical Pattern
Every transformative technology in human history has followed the same pattern. Fire, agriculture, writing, printing, gunpowder, industrialization, nuclear fission, the internet—each promised liberation and each was captured by existing power structures, often making those structures more powerful and more dangerous.
This is cyclical history driven by unchanging human nature. The printing press was supposed to democratize knowledge. It also enabled propaganda at scale. The internet was supposed to decentralize power. It created the largest concentrations of wealth and influence in human history. Social media was supposed to connect us. It fragmented us into algorithmically optimized outrage bubbles.
Why would AI be different? The historical base rate suggests it will not be. The optimists say “this time is different” because AI is uniquely powerful. But unique power in the hands of the same flawed humans produces uniquely powerful versions of the same outcomes.
The FHI researchers are right that advanced AI could cause extinction. But they may be locating the mechanism incorrectly. The danger is not that we will build an AI that escapes our control through some emergent property we failed to anticipate. The danger is that we will build an AI that does exactly what its creators want—and what its creators want will be catastrophic for everyone else.
The Case for Open Source as Risk Mitigation
This analysis leads to a counterintuitive conclusion that many AI safety researchers reject: open source AI may be safer than closed AI.
The standard safety argument against open source is that it democratizes dangerous capabilities. If anyone can run a powerful AI, bad actors will use it for bad purposes. Therefore, we should keep AI development in the hands of responsible institutions that can implement proper safeguards.
But this argument assumes the existence of responsible institutions. It assumes that concentration of power leads to better outcomes. Historical evidence suggests the opposite.
Closed AI systems will be controlled by those who can afford to control them—which means governments and corporations. These entities have their own objectives, their own nefs. They will align AI with their interests, not humanity’s interests. And when something goes wrong, there will be no external visibility, no distributed capability to detect and respond.
Open source AI has problems. It does enable bad actors. But it also enables correction. It enables observation. It enables competition among approaches. It prevents any single point of failure—including the single point of failure of trusting flawed institutions to govern technologies they barely understand.
The choice is not between safe AI and dangerous AI. The choice is between distributed danger that can be observed and responded to, or concentrated danger controlled by entities with poor track records.
What the Safety Researchers Get Right
None of this means the technical alignment problem should be ignored. The FHI researchers are correct that a misaligned superintelligent AI would be catastrophic and potentially uncorrectable. They are correct that existing alignment techniques do not scale to systems significantly more capable than today’s models. They are correct that the difficulty of the problem justifies significant resources and attention.
The mistake is in thinking that solving the technical problem is sufficient. Even a perfectly aligned AI—aligned, that is, with its creators’ values—is dangerous if its creators’ values are dangerous. And given what we know about human nature, about institutional capture, about the gap between stated values and revealed preferences, we should expect creators’ values to be dangerous.
Technical safety and governance reform are not alternatives. Both are necessary. Neither is sufficient.
The Uncomfortable Middle Ground
Intellectual honesty requires acknowledging uncertainty. We do not know whether advanced AI of this kind will be developed. We do not know whether it will be aligned or misaligned. We do not know whether humans will govern it wisely or foolishly. We do not know whether open source or closed development is ultimately safer.
What we do know is this: Human nature has not changed in recorded history. Every concentration of power has been abused. Every transformative technology has been captured. Every institution eventually serves itself.
If the pattern holds, AI will not cause human extinction through some exotic failure mode that safety researchers failed to anticipate. It will cause harm through the same mechanisms that have always caused harm—greed, shortsightedness, tribalism, ego—amplified by unprecedented capability.
The researchers at FHI are asking the right question: How do we survive the development of advanced AI? But the answer may not be found in technical alignment alone. It may require something far more difficult: changing the systems through which flawed humans govern powerful technologies.
Since changing human nature is not an option, perhaps the answer is building systems that assume human nature and route around it. Decentralized control. Open observation. Distributed capability to respond. Not trusting any institution, including safety-focused institutions, with unchecked power over technologies that affect everyone.
This is not a comfortable conclusion. It offers no reassurance that experts have the situation under control. It offers only the cold comfort that distributed systems with no single point of failure are more robust than centralized systems, even when the centralized systems claim to be acting in humanity’s interest.
The extinction risk is real. But it lives in our nefs, not in the machine. And until we design systems that account for that reality, no amount of technical safety work will save us from ourselves.