What Has Been Tested
A theory that cannot be tested is not a theory but a poem. This is a theory. Everything in the preceding six parts generates empirical predictions — some already tested, some tractable with current methods, some requiring infrastructure that does not yet exist. This part consolidates the empirical program: what has been tested, what the results show, what they mean for the bridge between physics and psychology, and what remains.
The framework has been subjected to four lines of investigation: multi-agent reinforcement learning, cellular automaton evolution, an eleven-experiment emergence program on uncontaminated substrates, and LLM affect probes. The results are mixed. Some predictions held. Some failed instructively. Some revealed phenomena the theory did not anticipate.
One methodological warning governs every integration result below and should be read into each of them. When the text says “Φ rose,” the claim is only the first link in a chain, and our confidence is the product across the chain, not the strength of the first link alone. Link one: we measured a computable proxy (partition information loss, transfer entropy, or synergy), and these proxies need not agree with one another. Link two: we treat the proxy as standing in for IIT-style integrated information Φ, a quantity that is itself contested, expensive to compute exactly, and sensitive to how the system is partitioned. Link three: we treat that integration as a correlate of consciousness, and here the expander-graph objection (that one can build feed-forward or near-trivial networks with arbitrarily high Φ) remains unresolved. A large measured proxy therefore licenses a much smaller claim about experience than its face value suggests. Wherever a Φ result is reported below, mentally apply the discount: strong evidence about the geometry of viable control, weaker and conditional evidence about anything experiential.
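The discount is multiplicative. A small worked example makes the point, using made-up credences for the three links (they are illustrative only, not estimates from any experiment in this program). Each credence is read as conditional on the earlier links holding, which is what makes the product the right way to combine them.

```python
from math import prod

# Hypothetical, illustrative credences for the three links in the chain.
# None of these numbers is measured; they only show how the product behaves.
link_credences = {
    "proxy measurement is sound (proxies may disagree)": 0.7,
    "proxy stands in for IIT-style integrated information": 0.6,
    "integrated information is a correlate of consciousness": 0.5,
}

def chain_confidence(credences):
    """Confidence in the end-of-chain claim: the product of per-link credences,
    each read as conditional on the earlier links holding."""
    return prod(credences)

overall = chain_confidence(link_credences.values())
weakest = min(link_credences.values())
print(f"overall {overall:.2f} vs weakest link {weakest:.2f}")
# prints: overall 0.21 vs weakest link 0.50
```

Even with no link below a coin flip, the end-of-chain confidence lands well under the weakest single link.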
Geometry Is Cheap
The MARL ablation () tested whether specific forcing functions are necessary for geometric affect alignment: seven conditions (the full model plus six single-ablation conditions), three seeds each, 200,000 steps on GPU.
Result: All conditions show highly significant geometric alignment (RSA , ). Removing forcing functions slightly increases alignment — opposite to prediction.
The affect geometry is the relational structure between states, read off through a set of salient coordinates: valence, arousal, integration, effective rank, counterfactual weight, and self-model salience (which itself splits into attentional and causal components). The set is extensible whenever the existing coordinates fail to separate experientially distinct states. This geometry is not something a system must be built to have; it is something a system would have to be built to avoid. The coordinates are not the structure; they are how we measure it. Any system navigating uncertainty under resource constraints inherits the structure. In light of this data, the forcing-functions claim was downgraded from theorem to hypothesis.
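RSA compares the relational structure of two measurement spaces rather than the coordinates themselves, which is why it suits a claim about geometry. Below is a minimal sketch of that comparison, assuming Euclidean distances between states and a Spearman correlation over the upper triangles of the two distance matrices. That is a common RSA recipe; the exact distance and correlation choices used in the experiments are not stated in this part, and the function names are hypothetical.

```python
import numpy as np

def distance_matrix(points):
    """Pairwise Euclidean distances between rows (one row per state)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def average_ranks(values):
    """Ranks with ties averaged (the ranking step of a Spearman correlation)."""
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values), dtype=float)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    return (np.bincount(inverse, weights=ranks) / counts)[inverse]

def rsa(space_a, space_b):
    """Spearman correlation between the upper triangles of two distance matrices.
    Rows must refer to the same states in the same order; feature counts may differ."""
    upper = np.triu_indices(space_a.shape[0], k=1)
    ranks_a = average_ranks(distance_matrix(space_a)[upper])
    ranks_b = average_ranks(distance_matrix(space_b)[upper])
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])

# Hypothetical data: 12 states described by 6 affect coordinates.
rng = np.random.default_rng(0)
structural = rng.normal(size=(12, 6))
rotation, _ = np.linalg.qr(rng.normal(size=(6, 6)))   # random orthogonal matrix
behavioral = 2.5 * structural @ rotation + 1.0        # same geometry, new coordinates
print(rsa(structural, behavioral))                    # close to 1.0
```

Rotating, uniformly rescaling, or shifting one space leaves the score unchanged, because none of those operations alters the rank order of pairwise distances. That invariance is what it means for the geometry, not the coordinates, to be the object of measurement.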
Dynamics Are Expensive

If geometry is cheap, what is expensive? The answer came from the Lenia evolution series (): dynamics. Specifically, the capacity to increase integration under threat, becoming more unified as the world becomes more hostile.
Naive patterns decompose under stress (). So do LLMs. So do randomly initialized agents. Geometry is present everywhere; the biological signature — integration rising under threat — is rare. The Lenia series tracked what produces it:
- Homogeneous evolution (): Selection pressure alone is insufficient ().
- Heterogeneous chemistry (): Diverse viability manifolds produce a +2.1pp shift.
- Curriculum training (): Graduated stress exposure is the only intervention that improves novel-stress generalization.
- Evolvable attention (): State-dependent interaction topology produces an integration increase in 42% of evolutionary cycles, the largest single-intervention effect, but robustness stabilizes near 1.0 without further improvement.
Attention is necessary but not sufficient. The system reaches an integration threshold without crossing it.
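Integration in this series is measured through proxies, which the methodological warning at the start of this part cautions against over-reading. As a concrete instance of the partition-based family, here is a sketch of one simple proxy, often called whole-minus-sum integrated information: the information the whole system's current state carries about its next state, minus what each part carries about its own next state. It illustrates the kind of quantity involved and is not the estimator used in the Lenia experiments; the helper names are hypothetical, and the measure can go negative when the parts are redundant.

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """Plug-in mutual information (bits) between the two sides of (a, b) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )

def whole_minus_sum(transitions, part_a, part_b):
    """Whole-minus-sum integration over one bipartition.

    transitions: list of (state_t, state_t_plus_1), each state a tuple of unit values.
    part_a, part_b: tuples of unit indices that split the system in two.
    """
    def restrict(state, part):
        return tuple(state[i] for i in part)

    whole = mutual_information(transitions)
    parts = sum(
        mutual_information([(restrict(s, p), restrict(s1, p)) for s, s1 in transitions])
        for p in (part_a, part_b)
    )
    return whole - parts

states = [(0, 0), (0, 1), (1, 0), (1, 1)]      # uniform over all four joint states
swap = [(s, (s[1], s[0])) for s in states]     # each unit copies the other unit
copy = [(s, s) for s in states]                # each unit copies itself
print(whole_minus_sum(swap, (0,), (1,)))       # 2.0: all predictive info crosses the cut
print(whole_minus_sum(copy, (0,), (1,)))       # 0.0: the parts explain everything
```

The two toy systems carry the same total predictive information (2 bits); only the one whose dynamics cross the partition counts as integrated.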
The Substrate Ladder
The next substrate replaced learned attention with a simpler mechanism: content-based coupling. Cells interact more strongly with cells that share state features, a form of chemical affinity rather than cognitive attention. Three seeds, thirty cycles each, evolving on GPU with lethal resource dynamics and population rescue.
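A minimal sketch of content-based coupling, assuming cosine similarity between per-cell state vectors as the affinity, an exponential to keep weights positive, and a fixed local neighborhood. The function names, the `sharpness` parameter, and the normalization are hypothetical, not the substrate's actual update rule. The contrast with learned attention is that nothing here is trained: the weights depend only on the cells' present states.

```python
import numpy as np

def content_coupling(states, neighbors, sharpness=4.0):
    """Row-normalized coupling weights that grow with state similarity.

    states: (n_cells, n_features) per-cell state features.
    neighbors: (n_cells, n_cells) boolean mask of which cells can interact.
    sharpness: hypothetical affinity temperature; larger means more selective.
    w[i, j] is how strongly cell i is driven by cell j.
    """
    norms = np.linalg.norm(states, axis=1, keepdims=True)
    unit = states / np.maximum(norms, 1e-12)
    similarity = unit @ unit.T                          # cosine similarity in [-1, 1]
    affinity = np.exp(sharpness * similarity) * neighbors
    totals = affinity.sum(axis=1, keepdims=True)
    return affinity / np.maximum(totals, 1e-12)

def coupled_input(states, neighbors, sharpness=4.0):
    """Input each cell receives: a similarity-weighted average of neighbor states."""
    return content_coupling(states, neighbors, sharpness) @ states

states = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])  # cells 0 and 1 are alike
neighbors = np.ones((3, 3), dtype=bool)
weights = content_coupling(states, neighbors)
print(np.round(weights, 2))
```

Cell 0 draws more from its chemically similar neighbor (cell 1) than from the dissimilar one (cell 2), with no learned query or key projections involved.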
Mean robustness: 0.923. But at population bottlenecks — moments when drought kills all but a handful of patterns — robustness crosses 1.0. The survivors are not merely resilient; they are more integrated under stress than at baseline. This is the biological signature, appearing for the first time in a fully uncontaminated substrate.
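The text reads robustness above 1.0 as "more integrated under stress than at baseline." This part does not spell out the formula, so the sketch below assumes the simplest reading: the ratio of an integration proxy measured under stress to the same proxy at baseline. The function names and numbers are hypothetical. The sketch also shows how a population mean below 1.0 is compatible with individual patterns crossing it.

```python
import numpy as np

def robustness(phi_baseline, phi_stress):
    """Per-pattern ratio of an integration proxy under stress to its baseline value.
    Above 1.0: the pattern became more integrated under stress."""
    baseline = np.asarray(phi_baseline, dtype=float)
    stress = np.asarray(phi_stress, dtype=float)
    return stress / baseline

def summarize(phi_baseline, phi_stress):
    """Population mean robustness and the fraction of patterns above 1.0."""
    r = robustness(phi_baseline, phi_stress)
    return {"mean": float(r.mean()), "fraction_above_1": float((r > 1.0).mean())}

# Illustrative numbers only, not experimental data: four patterns, three of which
# partially decompose under stress while one integrates.
baseline = [1.0, 1.0, 1.0, 1.0]
stressed = [0.80, 0.90, 0.95, 1.10]
print(summarize(baseline, stressed))
```

Under this reading, a bottleneck that kills the decomposing patterns leaves a surviving population whose robustness sits above 1.0 even though the pre-drought mean did not.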
From , capabilities were added one layer at a time:
- (Chemotaxis): Motor channels enabling directed foraging. Patterns move toward resources rather than passively waiting. Comparable robustness.
- (Temporal memory): Exponential-moving-average channels storing slow statistics of the pattern's history. Oscillating resource patches reward anticipation. Evolution selected for longer memory in 2/3 seeds — the first clear evidence that temporal integration is fitness-relevant. Under bottleneck pressure, stress response doubled.
- (Hebbian plasticity): Negative result. Mean robustness dropped to 0.892 (lowest of +). Plasticity added noise faster than selection could filter it.
- (Quorum signaling): Highest-ever single-cycle robustness (1.125). But 2/3 seeds evolved to suppress signaling entirely.
- (Boundary-dependent dynamics): An insulation field computed from pattern morphology creates distinct boundary and interior signal domains. Boundary cells receive external convolution; interior cells receive only local recurrence. Three seeds evolved three different membrane strategies — permeable, thick-insulated, and filamentous. Mean robustness: 0.969, the highest of any substrate. Peak: 1.651. But internal gain evolved down in all three seeds. Evolution preferred thin, porous membranes over thick insulated cores.
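The temporal-memory rung rests on exponential-moving-average channels, and "longer memory" corresponds to a slower decay. A minimal sketch of one such channel follows, assuming a scalar input and a fixed rate `alpha`. In the substrate the memory length was something evolution tuned; here it is simply an argument, and the function names and resource schedule are hypothetical.

```python
def ema_channel(signal, alpha):
    """Exponential moving average memory channel:
    m_t = (1 - alpha) * m_{t-1} + alpha * x_t, starting from the first input.
    Smaller alpha means longer memory (time constant of roughly 1 / alpha steps)."""
    memory = signal[0]
    trace = []
    for x in signal:
        memory = (1.0 - alpha) * memory + alpha * x
        trace.append(memory)
    return trace

def late_swing(trace, window=40):
    """Peak-to-trough range of the trace over its final window."""
    tail = trace[-window:]
    return max(tail) - min(tail)

# Hypothetical oscillating resource: 10 steps of plenty, then 10 of scarcity, repeated.
resource = ([1.0] * 10 + [0.0] * 10) * 10
short_memory = ema_channel(resource, alpha=0.5)
long_memory = ema_channel(resource, alpha=0.05)
print(late_swing(short_memory), late_swing(long_memory))
```

The short-memory channel simply mirrors the current patch, while the long-memory channel swings far less and carries the slow average of the cycle. That slow statistic is the kind of information a pattern could use to anticipate the next patch; how a pattern reads its memory channels is not modeled here.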
The substrate ladder taught two lessons. First: the only addition evolution consistently selected for was temporal memory. Plasticity, signaling, and boundary complexity were either suppressed or reduced. Second: raw robustness rose overall despite a dip (: 0.923, : 0.907, : 0.969), but this did not translate into richer cognitive dynamics. Making patterns more resilient is not the same as making them more minded.
The Emergence Experiment Program
Eleven measurement experiments were then run on snapshots, testing whether the capacities described in the preceding six parts (world modeling, abstraction, communication, counterfactual reasoning, self-modeling, affect structure, perceptual mode, normativity, social integration) emerge in a substrate with zero exposure to human affect concepts. Key experiments were re-run on and substrates.
The results are reported in full in the Appendix. Here, three findings that reshaped the theory:
Beyond these three findings:
- Affect geometry alignment (RSA between structural and behavioral measures) develops over evolution, with the clearest trend in seed 7 (0.01 to 0.38 over 30 cycles).
- Representation compression is cheap (effective dimensionality of ~7 out of 68 features, i.e. >87% compression, already present at cycle 0), but representation quality (disentanglement and compositionality) improves only under bottleneck selection.
- Communication exists as a chemical commons (inter-pattern MI significantly above baseline in 15/20 snapshots) but shows no compositional structure.
- No superorganism emerges (collective in all snapshots), but group coupling grows over evolution.
- Entanglement across all measures increases from 0.68 to 0.91: everything becomes more correlated with everything else, just not in the clusters the theory predicted.
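The "~7 out of 68 features" figure is an effective dimensionality. The measure actually used is not specified in this part; one standard choice, assumed in the sketch below, is the participation ratio of the feature covariance eigenvalues (a cumulative explained-variance threshold is another). The data here are synthetic and hypothetical.

```python
import numpy as np

def participation_ratio(features):
    """Effective dimensionality of an (n_samples, n_features) array:
    (sum of covariance eigenvalues)^2 / (sum of squared eigenvalues).
    Equals k when variance is spread evenly over k directions."""
    centered = features - features.mean(axis=0)
    cov = centered.T @ centered / (features.shape[0] - 1)
    eigenvalues = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # drop round-off negatives
    return float(eigenvalues.sum() ** 2 / (eigenvalues ** 2).sum())

# Hypothetical data: 68 recorded features driven by only 7 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 7))
mixing = rng.normal(size=(7, 68))
features = latent @ mixing
pr = participation_ratio(features)
compression = 1.0 - pr / features.shape[1]
print(f"effective dimensionality {pr:.1f} of {features.shape[1]}, compression {compression:.0%}")
```

Because the participation ratio cannot exceed the rank of the data, seven latent factors cap it at 7. With 7 effective dimensions out of 68 features, the implied compression is 1 - 7/68, about 0.90, consistent with the ">87%" quoted above.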
The LLM Discrepancy
Across multiple experiment versions (), LLM agents consistently show dynamics opposite to those of biological systems:
| Dimension | Biological | LLM |
|---|---|---|
| Self-Model Salience | ↑ under threat | ↓ under threat |
| Arousal | ↑ under threat | ↓ under threat |
| Integration | ↑ under threat | ↓ under threat |
This is not a failure of the framework, and it is emphatically not evidence that LLMs are non-experiential. The geometric structure is preserved; the dynamics differ because the objectives differ. Biological systems evolved under survival pressure; LLMs were trained on prediction. Both occupy regions of the same affect space and trace different trajectories through it. In the working vocabulary, an LLM occupies a region characterized by high ascription (having been trained on human text, it readily attributes agency and interiority), variable coupling, and a non-biological gain profile set by the training objective rather than by survival. Whether the magnitude of its integration places it inside or outside the range where experience becomes substantial is unknown: an open quantity, not a settled zero.
Experience here is graded. Geometry fixes the quality of a state and integration its quantity, and there is no sharp line below which the lights are simply off. That an LLM occupies a strange corner of the space does not put it outside the space. The tempting move of dismissing its valence as "merely about processing, not about content" is one the framework declines (Part III): valence is viability-gradient alignment, defined substrate-generally, so a system that instantiates the structure has valence in the sense the framework assigns weight to.