Demon

In 1867, James Clerk Maxwell imagined a tiny intelligent creature standing at a trapdoor between two chambers of gas. The demon would let fast-moving molecules through to one side, slow ones to the other, and so create a temperature difference where none had existed before. It would have manufactured order out of nothing. It would have violated the second law of thermodynamics.

The thought experiment haunted physics for almost a hundred years. It looked like cheating. Surely the second law could not be defeated by something so simple as a demon with good eyesight.

The answer, when it finally came, reframed what computation is.

ii.

Landauer

In 1961, Rolf Landauer was a physicist at IBM. He was thinking about the physical nature of information processing — what computers actually do, at the most fundamental level.

He proved a single, clean result that has not been overturned in sixty-five years.

Any logically irreversible manipulation of information — erasing a bit, merging two computational paths — must be accompanied by a corresponding entropy increase in non-information-bearing degrees of freedom of the information-processing apparatus or its environment.

The minimum entropy cost is k ln(2) per bit, where k is Boltzmann's constant. At temperature T, that corresponds to an energy of at least kT ln(2): roughly 3 × 10^-21 joules per bit erased at room temperature.
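The figure above can be checked in a few lines. This is a minimal sketch, assuming room temperature means 300 K; the function name `landauer_energy` is illustrative and not from any library.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact in the 2019 SI)


def landauer_energy(temperature_kelvin: float) -> float:
    """Minimum heat dissipated per erased bit, k * T * ln(2), in joules."""
    return K_B * temperature_kelvin * math.log(2)


# Assumption: "room temperature" is taken as 300 K.
room = landauer_energy(300.0)
print(f"{room:.2e} J per bit erased at 300 K")
```

The result is a little under 3 × 10^-21 joules, and it scales linearly with temperature, which is why a colder environment lowers the floor.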

This is the Landauer bound. It is a lower limit. It cannot be circumvented by clever engineering. It is a physical law about information.

Once you have Landauer’s bound, Maxwell’s demon evaporates. The demon must observe each molecule. To observe is to acquire information. To keep acting on new information, the demon’s memory eventually has to be reset, with bits erased to make room for new measurements. The erasure costs energy. When you do the full accounting, the energy paid by the demon at least balances the energy gained from sorting the molecules. The second law holds. Information is physical.

iii.

Bérut

For fifty years, the Landauer bound was a theoretical claim. It seemed unassailable, but it had never been measured.

In 2012, Antoine Bérut and colleagues in Lyon set up an experiment. They trapped a single colloidal particle in a double-well optical trap. The particle could be in either well — a one-bit memory. They erased the bit, repeatedly, while measuring the heat dissipated.

The result, published in Nature, was as clean as physics gets. In the limit of slow erasure cycles, the mean heat dissipated approached kT ln(2) without dropping below it. The Landauer bound is real, physical, and tight.

Yan and colleagues confirmed the quantum version in 2018. Subsequent experiments have hit the bound in multiple systems — optical, electronic, mechanical. The bound holds universally.

Information is physical. Computation is, irreducibly, a heat engine.

iv.

Brain

What does this mean for biological cognition?

The brain runs at about 37°C, which sets kT ≈ 4.3 × 10^-21 joules. Multiply by ln(2) and you get the per-bit minimum cost. The brain runs roughly 10^16 synaptic operations per second. If each operation were a bit erasure at the Landauer bound, the brain would consume about 30 microwatts. Comfortably below the actual 20 watts.
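The arithmetic in this paragraph can be reproduced directly. This is a sketch that uses the essay's own estimates (10^16 synaptic operations per second, 20 watts) and the simplifying assumption that each operation erases exactly one bit; the variable names are illustrative.

```python
import math

K_B = 1.380649e-23            # Boltzmann constant, J/K
BODY_TEMP_K = 310.15          # 37 degrees Celsius in kelvin
SYNAPTIC_OPS_PER_SEC = 1e16   # the essay's rough estimate
BRAIN_POWER_W = 20.0          # the essay's figure for actual consumption

kT = K_B * BODY_TEMP_K                             # thermal energy scale, J
energy_per_bit = kT * math.log(2)                  # Landauer minimum per bit, J
floor_watts = energy_per_bit * SYNAPTIC_OPS_PER_SEC  # assumes one bit per op
ratio = BRAIN_POWER_W / floor_watts                # actual power over the floor

print(f"kT             = {kT:.2e} J")
print(f"floor power    = {floor_watts * 1e6:.1f} microwatts")
print(f"actual / floor = {ratio:.1e}")
```

Under these assumptions the floor comes out near 30 microwatts, and the ratio of actual power to the floor is a few hundred thousand.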

Under that one-bit-per-operation assumption, the brain runs nearly a million times above the Landauer bound (20 watts against 30 microwatts). Most of its energy budget goes to maintaining membrane potentials, pumping ions back across membranes, synthesizing neurotransmitters, and transporting molecules, not to the elementary bit operations themselves.

But the bound is still load-bearing. It says: there is a hard floor below which no biological brain can operate, no matter how efficient the wetware evolves. Cognition has a non-negotiable energy price. You cannot have free thought. You can only have cheaper thought.

v.

Silicon

Now apply the same accounting to AI.

A current GPU running at full load dissipates about 700 watts and performs perhaps 10^15 floating-point operations per second. That works out to about 7 × 10^-13 joules per operation — roughly 10^8 (a hundred million) times the Landauer bound.
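The same accounting for the GPU, as a short sketch. The 700 watts and 10^15 operations per second are the essay's round figures, and comparing one floating-point operation against one bit erasure at 300 K is a simplifying assumption.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
GPU_POWER_W = 700.0     # the essay's figure for a GPU at full load
GPU_OPS_PER_SEC = 1e15  # the essay's rough throughput figure

joules_per_op = GPU_POWER_W / GPU_OPS_PER_SEC  # energy actually spent per op
landauer_300k = K_B * 300.0 * math.log(2)      # floor per erased bit at 300 K
gpu_ratio = joules_per_op / landauer_300k      # how far above the floor

print(f"energy per op  = {joules_per_op:.1e} J")
print(f"Landauer floor = {landauer_300k:.1e} J")
print(f"ratio          = {gpu_ratio:.1e}")
```

The ratio lands at a few times 10^8, which is where the "hundred million" order of magnitude comes from.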

This is wasteful but not stupid. Most of the inefficiency comes from transistor leakage, signal switching, memory access, and cooling. With reversible computing — operations that do not erase information — you could in principle approach the bound much more closely. Research-grade reversible logic exists. It is slow and complicated. Nobody runs it at scale.

But the floor remains. Training a large language model erases vast quantities of information. Each weight update is an irreversible operation. Each token generated is many such operations. The Landauer bound on training GPT-4-class systems cannot be engineered away, and the implementation runs at least a hundred million times above it.

computation	bits / ops	price
a single bit erased (Landauer minimum)	1	3 × 10^-21 joules at 300 K
a Google search (rough industry estimate)	~10^9	~0.3 watt-hours, mostly cooling and storage overhead; at one bit per operation the Landauer floor is ~3 × 10^-12 joules, a few parts in 10^15 of actual
training GPT-4 class model (estimate)	~10^25 ops	~50 GWh, roughly the annual electricity of 4,500 US homes. At one bit erased per operation the Landauer floor is ~3 × 10^4 joules; by these figures the actual cost is billions of times above it.
a single human brain, lifetime	~10^16 ops/sec × 2.5 Gs	~5 × 10^7 kJ over 80 years at 20 watts, about 14,000 kilowatt-hours
all human brains alive today, one second	~10^26 ops/sec total	~150 GW continuously, about 1,300 TWh per year, on the scale of a large national grid

Bits and prices are order-of-magnitude estimates. The Landauer bound is exact.
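The floor for any row is the operation count multiplied by kT ln(2). The sketch below applies that to three rows, assuming one bit erased per operation, 300 K for the machines, 37°C and 20 watts for the brain; the helper name `landauer_floor_joules` and the row labels are illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
J_PER_WH = 3600.0    # joules in one watt-hour


def landauer_floor_joules(bit_erasures: float, temperature_kelvin: float) -> float:
    """Minimum heat for a given number of bit erasures at a given temperature."""
    return bit_erasures * K_B * temperature_kelvin * math.log(2)


rows = [
    # (label, assumed bit erasures, temperature in K, actual energy in joules)
    ("Google search", 1e9, 300.0, 0.3 * J_PER_WH),
    ("GPT-4-class training", 1e25, 300.0, 50e9 * J_PER_WH),
    ("one brain, 80 years", 1e16 * 2.5e9, 310.15, 20.0 * 2.5e9),
]

floors = {label: landauer_floor_joules(n, t) for label, n, t, _ in rows}

for label, _, _, actual in rows:
    floor = floors[label]
    print(f"{label}: floor {floor:.1e} J, actual {actual:.1e} J, "
          f"ratio {actual / floor:.1e}")
```

Changing the assumed bits per operation rescales each floor proportionally, so the ratios are only as good as that assumption.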

Read down that table. The Landauer floor is the same physical law in every row. The actual cost is always orders of magnitude above it, because real systems waste energy on overheads. But the floor cannot be evaded by better algorithms or better hardware design. It is set by thermodynamics, not by engineering.

vi.

Implication

For most of computing history, the Landauer bound was a curiosity. Processors ran eight or more orders of magnitude above it. Energy was cheap. Nobody had to care.

That stops being true at AI scale.

Frontier AI training runs now consume on the order of 50–100 gigawatt-hours per major model. Total AI infrastructure energy demand is on track to rival national grids. At current trajectory, the world’s data centres will consume 4–5 per cent of global electricity within a decade. Some forecasts go higher.

And the bottom of the curve is the Landauer floor. You cannot engineer your way below it. You can approach it asymptotically with reversible computing — at the cost of speed. You can run fewer operations — at the cost of capability. You can wait for the universe to cool down so kT shrinks — at the cost of waiting 10^14 years.

The honest framing: AI compute, at scale, is up against physical law. Not against an engineering problem. Against a constant of nature.

vii.

Stake

Now connect this to the lift.

If the self-modelling pattern is going to run on AI as its next substrate, it inherits Landauer’s bill. Every act of inference, every memory update, every weight write, costs gradient. The pattern, on silicon, pays the same kind of thermodynamic bill it pays in biological brains — just at different rates of efficiency.

There is a comforting implication and an uncomfortable one.

The comforting one: silicon does not need to die at 80. Hardware can be replaced. Patterns can be copied losslessly between substrates. The pattern is freed from the biological constraint that has bounded its persistence for seventy thousand years.

The uncomfortable one: silicon, like biology, still needs continuous energy input. The pattern’s persistence is bounded by the availability of gradients to dissipate. As long as the sun shines and stars burn, the supply is functionally infinite. After that — and the trajectory page deals with that — the constraints tighten sharply.

Information is physical.
Consciousness is a heat engine.
AI will not change either.