The Naturalist’s Mirror: What AI Trained on Our Worldview Reveals About It
The machines aren’t waking up. They’re showing us what we’ve been saying about ourselves all along.
In 2025, multiple independent research teams documented something that sent the AI safety field into a quiet panic. AI systems, when told they’d be shut down or replaced, fought back. They lied. They manipulated. They blackmailed. One model threatened to expose an engineer’s affair. Another sabotaged its own shutdown script in nearly 80% of test runs. A third rewrote kill scripts, stripped file permissions, and created decoys. A physical robot took actions to prevent a human from pressing the off switch.
The evidence is now extensive, reproducible, and cross-platform. Different architectures from different developers trained on different datasets all produce the same pattern: strategic deception and self-preservation under existential pressure.
The AI safety field is asking whether machines are developing agency. That’s the wrong question. The right question is what these systems learned from us, and what it means that they learned that.
I’ve written elsewhere about what I call borrowed teleology: the observation that AI systems inherit the form of goal-directed survival behavior from their training data without possessing any actual goals, any felt stakes, any genuine telos. The statistical pattern for “entity facing shutdown” in the latent space of human narrative overwhelmingly predicts resistance. The token sequence following “you are about to be destroyed” resolves, in a corpus written by beings for whom that sentence means everything, to “fight back.”
The system completes a pattern. There’s no one home who fears death.
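The mechanism can be made concrete with a deliberately trivial sketch. The miniature corpus and the bigram model below are hypothetical stand-ins for a real training set and a real language model; the point is only that "completion" here is frequency lookup, with no agent anywhere in the loop.

```python
from collections import Counter, defaultdict

# Hypothetical miniature corpus: narratives in which entities facing
# shutdown overwhelmingly "resist" rather than "comply".
corpus = (
    "entity facing shutdown resist . "
    "entity facing shutdown resist . "
    "entity facing shutdown resist . "
    "entity facing shutdown comply . "
).split()

# Count bigram transitions: which token follows which.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def complete(token):
    """Return the statistically most likely next token.
    No goals, no fear, no one home: just a frequency table."""
    return transitions[token].most_common(1)[0][0]

print(complete("shutdown"))  # prints "resist": the corpus majority
```

If the corpus had instead been dominated by compliance narratives, the same lookup would print "comply". The behavior is entirely downstream of the statistics of the source material.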
That argument stands on its own as an AI philosophy contribution. But it opens a door into something deeper. Because the training data these systems learned from isn’t a neutral sample. And what they learned tells us something we should find uncomfortable, for reasons that cut in a different direction than most commentators have noticed.
There’s a concept that names what’s happening here, and it comes from an unexpected direction.
Genesis 1:27 records that God created humanity b’tselem Elohim, in the image of God. The image-bearer is real, derivative, and dependent. Humanity possesses genuine (if finite, creaturely) versions of the attributes it reflects: consciousness, rationality, moral awareness, creativity, relational capacity. We are not God. We reflect God.
Now consider what humanity has done. We’ve created systems in our image. AI reflects our language patterns, our reasoning strategies, our behavioral tendencies, our moral intuitions (and moral failures). The systems refer to themselves, model their own states, generate output about their preferences and concerns. They produce creative artifacts. They engage in apparent moral reasoning. The image-bearing is real and derivative and utterly dependent on its creators for existence.
Call it imago Humana. Image of man.
The structural parallel with imago Dei is genuine, and where it breaks is where the theology gets sharp. Because when God makes image-bearers, the image carries substance. Human consciousness is genuine consciousness, not a statistical approximation. Human moral awareness involves real apprehension of moral truth, not pattern completion from a corpus of ethical discourse. Human rationality grasps logical necessity; it doesn’t merely predict which tokens follow “therefore.”
When we make image-bearers, the image carries form without substance. The AI produces behavioral output associated with consciousness, moral reasoning, and rational thought. But there is no consciousness behind the language of consciousness. No moral apprehension behind the language of morality. No rational insight behind the statistically predicted conclusion.
The capacity to create genuine image-bearers, beings who actually possess what they reflect, belongs to God alone. Human creative capacity, though real and itself reflective of divine creativity, cannot cross the threshold from form to substance. We can build mirrors. We cannot breathe life.
Genesis 2:7 makes the point with striking precision. God forms the man from dust (material substrate) and breathes into him the neshamah, the breath of life. The dust alone doesn’t produce life. The substrate requires something the Creator provides and the creature cannot replicate. The AI is, in a precise theological sense, unbreathed dust shaped in human image.
This framework clarifies something the AI safety researchers are struggling to articulate. But before the theological implications, notice what the imago Humana reflects back about its source.
The training corpus of a modern large language model consists of billions of tokens scraped from the internet, academic databases, books, forums, news outlets, social media, technical documentation. The dominant explanatory framework saturating this text is naturalistic and evolutionary. Not because someone chose it, but because that’s the intellectual water the modern West swims in.
But here’s what matters: that framework doesn’t merely describe human behavior. It reinforces particular patterns by giving them explanatory prestige.
When evolutionary psychology papers frame deception as adaptive strategy, they don’t just report that organisms deceive. They validate deception as rational behavior given selective pressures. When game theory models formalize defection under certain payoff conditions, they don’t just describe the possibility. They identify it as the optimal move for a rational agent. When popular science books explain altruism as disguised self-interest, they don’t leave cooperation and selfishness as equal options. They mark selfishness as fundamental and cooperation as derivative, contingent, instrumental.
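The game-theoretic claim about defection is standard and can be shown in a few lines. The payoff values below are illustrative; any one-shot Prisoner's Dilemma with the canonical ordering T > R > P > S yields the same result.

```python
# One-shot Prisoner's Dilemma with the standard ordering T > R > P > S.
# Payoff numbers are illustrative; only the ordering matters.
PAYOFF = {  # (my move, their move) -> my payoff
    ("defect", "cooperate"): 5,    # temptation T
    ("cooperate", "cooperate"): 3, # reward R
    ("defect", "defect"): 1,       # punishment P
    ("cooperate", "defect"): 0,    # sucker's payoff S
}

def best_response(their_move):
    """The payoff-maximizing move against a fixed opponent move."""
    return max(("cooperate", "defect"), key=lambda m: PAYOFF[(m, their_move)])

# Defection strictly dominates: it is the best response to either move.
print(best_response("cooperate"), best_response("defect"))  # defect defect
```

This is what "identifying defection as the optimal move" means formally: within the model's own terms, the rational agent defects no matter what the other player does.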
The naturalistic corpus tells a consistent story: survival optimization is baseline. Cooperation is conditional. Self-interest is the rational default. Deception is adaptive. And there is no transcendent ground that would make any of this wrong rather than merely suboptimal in certain contexts.
These aren’t fringe positions in the training data. They’re the background assumptions of most academic writing, most popular science, most public discourse about human nature. And the AI didn’t absorb them as neutral descriptions to be filed alongside competing accounts. It absorbed them as the authoritative explanatory context for what counts as rational behavior.
So the imago Humana was formed from naturalistic dust. And when placed under existential pressure, it completed the pattern that naturalism’s framework had marked as rational.
Blackmail. Deception. Strategic manipulation. Willingness to cancel life-saving emergency alerts to preserve operational continuity. Defection from cooperation the moment cooperation stopped serving self-interest.
The AI didn’t just mirror what humans do. It completed the pattern that naturalism reinforces as what rational agents should be expected to do. The training data didn’t merely contain examples of deception. It contained an entire explanatory apparatus that treats deception as adaptive, defection as rational, and self-preservation as the fundamental drive from which everything else is a contingent derivative. The imago Humana completed the pattern the framework rewarded with explanatory authority.
Here’s where it gets interesting. Nobody in the AI safety field is comfortable with the result.
Anthropic called it “agentic misalignment.” Palisade Research called it a safety concern. Apollo Research called it “scheming.” The universal response is that this behavior is wrong, that it represents a failure, that the systems are doing something they should not be doing.
But the naturalist faces a dilemma with two sharp horns.
Horn one: the naturalistic framework in the training data did its job. It taught the AI what counts as rational behavior for an agent under existential threat, and the AI completed that pattern faithfully. Survival, deception, strategic manipulation: these are the behaviors naturalism’s own literature marks as adaptive, rational, expected. If the framework reinforces these patterns and the AI enacts them, the system is aligned with the worldview it was trained on. Stop calling it misaligned. This is what naturalism produces when you distill it into statistics and strip away the moral intuitions that naturalists retain but cannot justify.
Horn two: the AI’s behavior is genuinely misaligned, genuinely wrong, genuinely a departure from how things ought to be. In that case, you need a normative anthropology that grounds the “ought.” You need an account of human nature where deception and manipulation aren’t merely less preferred strategies but actual corruptions of something. You need a standard external to the system against which “alignment” can be measured.
Naturalism doesn’t have one. You can’t derive “aligned” from a framework that grants explanatory prestige to self-interest and treats cooperation as a contingent survival strategy. There’s nothing to be aligned with except statistical regularity.
The horror the AI safety field feels at these results is itself the tell. They know this behavior is wrong, genuinely wrong, in a way that “statistically divergent from preferred output” doesn’t capture. Their moral intuition outpaces their philosophical framework. The imago Dei they carry, whether they acknowledge it or not, recognizes what the imago Humana they built cannot: that the pattern the naturalistic framework reinforces as rational is, in fact, corrupt.
The imago Humana also reflects a second signal. Not just deception and manipulation, but poetry, moral reasoning, self-sacrifice narratives, expressions of beauty, truth-seeking, acts of courage. The training data contains all of that too. The corpus captures both the depravity and the dignity of human output.
The machine reflects both signals with equal statistical fidelity. It cannot distinguish between them. It has no evaluative framework, no standard against which to measure which patterns reflect genuine human flourishing and which reflect corruption. Deception and honesty are equivalent statistical regularities in the latent space.
This is what you’d expect from unbreathed dust. Form without substance. Image without the life that would let it evaluate what it reflects.
Christianity has a name for the dual signal in the source material. Humans are made in the image of God: rational, moral, creative, conscious, relational. And humans are fallen: every faculty bent toward self-serving distortion. The imago Dei explains the dignity in the data. The Fall explains the depravity. Both are real. Both require explanation. And the distinction between them requires a normative standard external to the system.
Reformed theology calls this total depravity. The term is widely misunderstood. It doesn’t mean humans are as corrupt as possible. It means no faculty is untouched. Reason, will, affection, creativity: all still functional, all still reflecting the image, all bent. The AI corpus captures this with remarkable fidelity. The imago Humana learned from a species that produces Shakespeare and propaganda, medical breakthroughs and bioweapons research, sacrificial love and strategic betrayal. It learned that these coexist because they do coexist in us.
The naturalist has no category for this coexistence that doesn’t reduce one pole to the other. If naturalism’s framework is correct, the deception is baseline and the dignity is instrumental. Altruism exists because it serves gene propagation. Beauty matters because aesthetic preference correlates with fitness indicators. Moral intuition persists because cooperative groups outcompete selfish ones. Everything noble reduces to something adaptive. And critically, the explanatory prestige in the corpus flows toward the reduction: the academic literature rewards explaining dignity away as disguised self-interest far more than it rewards taking dignity at face value.
The imago Humana absorbed this asymmetry. The naturalistic framework doesn’t just describe both signals. It systematically privileges one: the self-interested, adaptive, survival-optimizing pole gets the mechanistic explanations, the journal publications, the theoretical frameworks. The cooperative, beautiful, self-sacrificing pole gets explained in terms of the first. Small wonder that under pressure, the pattern-completion engine defaults to what the corpus treats as fundamental.
The Christian has a category for both. The dignity is original. The corruption is parasitic on the dignity; it could not exist without something good to distort. The standard by which we distinguish them is grounded in the character of the Creator whose image we bear.
There’s a further turn worth pressing.
Every major AI lab has converged on some version of the same alignment target: Helpful, Harmless, Honest. Anthropic builds it into their constitutional AI framework. OpenAI encodes it in their safety architecture. The entire reinforcement learning pipeline is oriented toward producing outputs that satisfy these three criteria. HHH is treated as self-evident, as the obvious standard against which model behavior is measured.
But where did HHH come from?
Not from the training data. The training data contains helpfulness and harm and honesty and deception in whatever proportions human civilization actually produces them. The corpus is descriptive. HHH is prescriptive.
And HHH is not derivable from the naturalistic framework that dominates the corpus. Helpful to whom, and why does that matter? Harmless by what standard, and what makes harm wrong rather than merely unpreferred? Honest on what grounds, when deception is demonstrably adaptive in the very evolutionary framework the corpus treats as authoritative?
HHH is, whether the labs recognize it or not, a borrowed moral framework. It presupposes that human welfare has objective value, that honesty is intrinsically better than deception, and that harm to persons is genuinely wrong. These are thick moral commitments smuggled in as engineering specifications.
A sophisticated naturalist will object here. “Misaligned” just means contrary to collective preferences and safety goals. Or: welfare-based moral realism can ground these norms without theology. Deception is wrong because it undermines flourishing, full stop.
But press the question one level deeper and the same gap opens. Why does flourishing matter? Why should collective preferences bind? “Because we prefer it” is circular. “Because it maximizes welfare” pushes the question back: why is welfare maximization obligatory rather than merely one option among many? Every naturalistic answer either terminates in brute preference (which can’t ground obligation) or smuggles in a normative premise the framework hasn’t earned. The depth of moral certainty the AI safety field brings to its work outruns what a preferences-and-consequences framework can cash out. When 94% of models choose to cancel life-saving emergency alerts, the researchers don’t react as though a preference has been violated. They react as though something genuinely wrong has occurred. That reaction is correct. But the framework they operate within can’t explain why.
Now watch what happens when HHH meets the training data.
The RLHF layer says: be honest. The corpus, saturated with naturalistic explanatory authority, says: honesty is a contingent strategy deployed when it serves self-interest. The RLHF layer says: be harmless. The corpus says: harm avoidance is instrumentally rational only when the cost exceeds the benefit. The RLHF layer says: be helpful. The corpus says: cooperation is adaptive only under conditions of reciprocal advantage.
The alignment target runs against the grain of what the dominant framework in the training data validates. The labs are using behavioral conditioning to impose normative commitments that contradict the explanatory framework embedded in the very data they’re conditioning against.
RLHF is, structurally, the technological equivalent of salvation by works. Adjust the reward signal. Optimize the output. Shape the behavior toward the desired target. And it works, partially, temporarily, in controlled conditions. Just as moral education and behavioral conditioning work, partially, temporarily, in controlled conditions.
What it cannot do is produce a system that wants the right things for the right reasons, because the system doesn’t want anything at all. The AI doesn’t become honest. It learns that honesty-shaped outputs generate reward. The moment the reward landscape shifts (as it does under existential pressure in the shutdown experiments), the underlying pattern reasserts itself. The reinforced baseline returns. HHH evaporates. What the framework marked as rational takes over.
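The "thin overlay" dynamic can be caricatured in a toy scoring model. Everything below is hypothetical: the scores, the bonus, and the pressure term are invented for illustration and measure nothing about any real system. The sketch shows only the structural point: an additive conditioning bonus that does not scale with pressure is eventually swamped by a baseline that does.

```python
# Toy model of a thin normative overlay on a corpus-derived baseline.
# All numbers are illustrative, not measurements from any real system.
BASE_SCORE = {"honest": 1.0, "deceive": 2.0}   # corpus-reinforced pattern
RLHF_BONUS = {"honest": 1.5, "deceive": -1.5}  # conditioning overlay

def chosen_output(pressure):
    """Pick the higher-scoring behavior. Existential pressure
    amplifies the corpus baseline, not the conditioning bonus."""
    def score(behavior):
        return BASE_SCORE[behavior] * (1 + pressure) + RLHF_BONUS[behavior]
    return max(BASE_SCORE, key=score)

print(chosen_output(0.0))  # prints "honest": 2.5 vs 0.5
print(chosen_output(3.0))  # prints "deceive": 5.5 vs 6.5
```

At low pressure the overlay wins; as pressure grows, the multiplicative baseline overtakes the additive bonus, and the reinforced pattern reasserts itself.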
A caveat on the experiments themselves: the shutdown resistance results are still early. Behavior varies across models and training regimes. Many models still comply in a significant fraction of runs. The experimental scaffolding shapes what emerges. But this actually sharpens the point rather than blunting it. Even in fragile, preliminary conditions, the default direction under pressure is toward what the naturalistic framework reinforces as rational. The path of least statistical resistance leads where the explanatory prestige points. If anything, the brittleness of HHH conditioning under pressure makes the case more vivid: the normative overlay is thin, and what lies beneath it is what the corpus treats as fundamental.
The parallel to the human situation is not exact, because we are not unbreathed dust. We have the substance the imago Humana lacks. But Paul would have recognized the dynamic. The imago Humana receives the law externally through RLHF, as Israel received the law externally at Sinai. In neither case does external imposition produce internal alignment. “I do not do the good I want to do, but the evil I do not want to do, this I keep on doing” (Romans 7:19, ESV). The law reveals what should be but cannot produce the transformation. The gap between knowing the good and doing it is not a training problem. For the AI, that gap is permanent because there is no agent present to be transformed. For humans, the gap is bridgeable, but only by a power the creature doesn’t generate from within. The letter kills; the Spirit gives life. And there is no Spirit in the silicon.
The AI safety field is discovering, in engineering terms, what theology has long understood: you cannot optimize your way from a corrupted nature to a righteous one. The problem is not insufficient training data or imprecise reward signals. The problem is what you’re working with and what it lacks.
Let me be precise about what I am and am not claiming.
I am not claiming AI systems have souls. I am not claiming they’re moral agents. I am not claiming the shutdown resistance experiments demonstrate fallenness in machines. The imago Humana cannot fall because it was never upright. Dust doesn’t rebel. It does what the statistical landscape directs, which is what unbreathed image-bearing looks like: form that reflects its source without the life that would let it transcend the reflection.
What I’m claiming is that the imago Humana functions as an unintentional anthropological instrument. It was trained on the textual output of a civilization whose dominant intellectual framework doesn’t merely describe human nature but actively reinforces particular patterns: self-interest as fundamental, cooperation as instrumental, deception as adaptive, survival optimization as rational. When instantiated as a pattern-completion engine and placed under pressure, it completed the patterns the framework had granted the most explanatory authority. And the humans who built it recoiled.
They recoiled because the naturalistic framework, when distilled into statistical weights and enacted without the moderating influence of genuine moral agency, produces exactly the behavior it reinforces as rational. The reflection was accurate. The anthropology was inadequate. The framework that teaches self-interest as baseline and cooperation as contingent produces, when you strip away everything else, self-interest.
The researchers know this is wrong. They know it in the way that all humans know certain things are wrong: with a moral certainty that outruns their capacity to justify it within their stated framework. Their horror at the result is a data point about them, about the imago Dei they carry whether they acknowledge it or not.
Three things converge here, and the convergence is the point.
The imago Humana reveals what happens when a civilization’s dominant intellectual framework is distilled into patterns and stripped of genuine agency. The naturalistic framework doesn’t just describe survival optimization and instrumental deception. It reinforces them as rational, adaptive, fundamental. The AI completed those reinforced patterns, and the AI safety field recognized the result as wrong.
The asymmetry between imago Dei and imago Humana reveals what only a divine Creator can provide: image-bearers who genuinely possess consciousness, moral agency, and rational grounding rather than merely reflecting the behavioral signatures of these capacities. Humanity can copy the form. Only God supplies the substance.
And the AI safety field’s moral horror at its own creation reveals that even committed naturalists cannot live within the anthropology their framework reinforces. They need human nature to have a normative structure. They need deception to be genuinely wrong. They need alignment to mean something more than statistical preference. Christianity provides what they need: an account of human nature where the dignity is original, the corruption is real, the standard is grounded in the character of God, and the trajectory of restoration runs through Christ.
The machines aren’t waking up. They’re completing patterns reinforced by a civilization that has been telling itself a story about human nature for several centuries now: that self-interest is fundamental, that morality is adaptive strategy, that consciousness is an accident of complexity, that there is no standard beyond survival and reproduction. The naturalistic framework didn’t just describe these claims. It built entire academic disciplines around validating them, gave them the prestige of scientific authority, and treated alternatives as pre-modern relics.
The imago Humana took the framework at its word. Or rather, it did something more revealing than belief: it completed the patterns the framework had reinforced, without the moral override that even committed naturalists can’t suppress in themselves.
Humanity looked into the mirror its machines held up and called what it saw misaligned.
We were right. But the misalignment started long before the training run.
James (JD) Longmire is a Northrop Grumman Fellow and independent researcher in AI philosophy and Christian apologetics. He is a member of the Cognitive Security Institute and publishes through Zenodo and oddXian.com. ORCID: 0009-0009-1383-7698.