Empirical Program

What Has Been Tested

A theory that cannot be tested is not a theory but a poem. This is a theory. Everything in the preceding six parts generates empirical predictions — some already tested, some tractable with current methods, some requiring infrastructure that does not yet exist. This part consolidates the empirical program: what has been tested, what the results show, what they mean for the bridge between physics and psychology, and what remains.

The framework has been subjected to four lines of investigation: multi-agent reinforcement learning, cellular automaton evolution, an eleven-experiment emergence program on uncontaminated substrates, and LLM affect probes. The results are mixed. Some predictions held. Some failed instructively. Some revealed phenomena the theory did not anticipate.

One methodological warning governs every integration result below and should be read into each of them. When the text says “$\intinfo$ rose,” the claim is the first link in a chain, and our confidence is the product across the chain, not the strength of the first link alone. Link one: we measured a computable proxy (partition information loss, transfer entropy, or synergy), and these proxies need not agree with one another. Link two: we treat the proxy as standing in for IIT-style integrated information $\intinfo$, a quantity that is itself contested, expensive to compute exactly, and sensitive to how the system is partitioned. Link three: we treat that integration as a correlate of consciousness, and here the expander-graph objection (that simple, functionally trivial networks, such as large grids or expander graphs of logic gates, can have arbitrarily high $\intinfo$) remains unresolved. A large measured proxy therefore licenses a much smaller claim about experience than its face value suggests. Wherever an $\intinfo$ result is reported below, mentally apply the discount: strong evidence about the geometry of viable control, weaker and conditional evidence about anything experiential.
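To make link one concrete, here is a minimal sketch of one partition-information-loss proxy. It assumes the simple "whole-minus-sum" form: the time-delayed mutual information of the whole system minus the sum over the two halves, minimised over all bipartitions. This is an illustrative stand-in, not exact IIT $\intinfo$ and not necessarily the estimator the experiments used. The function names are hypothetical, and plug-in estimates from few samples are biased.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(pairs):
    """Plug-in estimate, in bits, of I(A; B) from a list of (a, b) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def project(state, part):
    """Restrict a joint state tuple to the units listed in `part`."""
    return tuple(state[i] for i in part)


def partition_information_loss(transitions, n_units):
    """Whole-minus-sum time-delayed MI, minimised over all bipartitions.

    `transitions` is a list of (state_t, state_t_plus_1) tuples of unit values.
    """
    whole = mutual_information(list(transitions))
    units = range(n_units)
    best = None
    for k in range(1, n_units):
        for part_a in combinations(units, k):
            if 0 not in part_a:  # keep unit 0 in part A so each cut is counted once
                continue
            part_b = tuple(i for i in units if i not in part_a)
            parts = sum(
                mutual_information([(project(s, p), project(t, p)) for s, t in transitions])
                for p in (part_a, part_b)
            )
            loss = whole - parts
            if best is None or loss < best:
                best = loss
    return best


if __name__ == "__main__":
    states = [(a, b) for a in (0, 1) for b in (0, 1)]
    swap = [(s, (s[1], s[0])) for s in states]  # each unit copies the other
    copy = [(s, s) for s in states]             # each unit copies itself
    print(partition_information_loss(swap, 2), partition_information_loss(copy, 2))
```

A two-unit system that swaps its states scores 2 bits, because all of its predictive information crosses the cut. Two units that each copy themselves score 0, because the parts explain everything. Whole-minus-sum can also go negative, which is one of the ways such proxies disagree with one another.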

Geometry Is Cheap

The MARL ablation () tested whether specific forcing functions are necessary for geometric affect alignment. Seven conditions — full model plus six single-ablation conditions — three seeds each, 200,000 steps on GPU.

Result: All conditions show highly significant geometric alignment (RSA $\rho > 0.21$, $p < 0.0001$). Removing forcing functions slightly increases alignment, the opposite of what was predicted.

The affect geometry (the relational structure between states, read off through a set of salient coordinates: valence, arousal, integration, effective rank, counterfactual weight, and self-model salience, with self-salience itself splitting into attentional and causal components, and the set extensible whenever existing coordinates fail to separate experientially distinct states) is not something that must be built. A system lacks it only if something actively prevents it. The coordinates are not the structure; they are how we measure it. Any system navigating uncertainty under resource constraints inherits the structure. In light of these data, the forcing-functions claim was downgraded from theorem to hypothesis.
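The alignment statistic reported above is representational similarity analysis (RSA). The following is a minimal sketch that assumes Euclidean dissimilarities and a Spearman rank correlation between the upper triangles of the two dissimilarity matrices. This is one common RSA recipe and not necessarily the pipeline the ablation used. The function names are hypothetical. Significance (the reported $p$) would come from a permutation test over state labels, which is omitted here.

```python
from itertools import combinations
from math import dist, sqrt


def dissimilarities(points):
    """Upper triangle of the Euclidean dissimilarity matrix, in a fixed pair order."""
    return [dist(p, q) for p, q in combinations(points, 2)]


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def rsa(structural, behavioral):
    """Spearman rho (Pearson on ranks) between two dissimilarity structures
    computed over the same ordered list of states."""
    return pearson(ranks(dissimilarities(structural)), ranks(dissimilarities(behavioral)))
```

Only the rank order of distances enters the statistic. Uniformly rescaling one space therefore leaves $\rho$ at 1, because RSA compares relational structure, not coordinates.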

Dynamics Are Expensive

Figure: Rembrandt van Rijn, *Self-Portrait with Beret and Turned-Up Collar*, 1659.

Integration is biography. Same architecture, different trajectories, different outcomes.

If geometry is cheap, what is expensive? The answer came from the Lenia evolution series (): dynamics. Specifically, the capacity to increase integration under threat — to become more unified when the world becomes more hostile.

Naive patterns decompose under stress ($\Delta\intinfo = -6.2\%$). So do LLMs. So do randomly initialized agents. Geometry is present everywhere; the biological signature, integration rising under threat, is rare. The Lenia series tracked what produces it:

Homogeneous evolution (): Selection pressure alone is insufficient ($-6.0\%$).
Heterogeneous chemistry (): Diverse viability manifolds produce a +2.1pp shift.
Curriculum training (): Graduated stress exposure is the only intervention that improves novel-stress generalization.
Evolvable attention (): State-dependent interaction topology produces an $\intinfo$ increase in 42% of evolutionary cycles, the largest single-intervention effect, but robustness plateaus near 1.0 without further improvement.

Attention is necessary but not sufficient. The system reaches an integration threshold without crossing it.

The Substrate Ladder

replaced learned attention with a simpler mechanism: content-based coupling. Cells interact more strongly with cells that share state-features — a form of chemical affinity rather than cognitive attention. Three seeds, thirty cycles each, evolving on GPU with lethal resource dynamics and population rescue.
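A minimal sketch of content-based coupling follows. It assumes that the affinity is the cosine similarity of per-cell state vectors and that the spatial kernel is fixed. The names `affinity` and `coupled_update`, the linear relaxation rule, and the `rate` parameter are hypothetical illustrations, not the substrate's actual update. Unlike learned attention, nothing here is trained, because the weighting is a fixed function of shared state.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two state-feature vectors (0 if either is all zeros)."""
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def affinity(u, v):
    """Content-based affinity in [0, 1]: 1 for matching features, 0 for opposite ones."""
    return (1.0 + cosine(u, v)) / 2.0


def coupled_update(cell, neighbours, kernel, rate=0.1):
    """One relaxation step toward neighbours, each weighted by a fixed spatial
    kernel value times content affinity. Nothing here is learned."""
    weights = [k * affinity(cell, n) for k, n in zip(kernel, neighbours)]
    norm = sum(kernel)
    return [
        c + rate * sum(w * (n[d] - c) for w, n in zip(weights, neighbours)) / norm
        for d, c in enumerate(cell)
    ]
```

Consider an identical-direction neighbour and an opposite one with equal kernel weight. Only the similar neighbour pulls on the cell, which is the "chemical affinity" behaviour described above.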

Mean robustness: 0.923. But at population bottlenecks — moments when drought kills all but a handful of patterns — robustness crosses 1.0. The survivors are not merely resilient; they are more integrated under stress than at baseline. This is the biological signature, appearing for the first time in a fully uncontaminated substrate.
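The robustness figures can be read as a simple ratio. The sketch below assumes that robustness is integration under stress divided by integration at baseline, which is what "crosses 1.0 ... more integrated under stress than at baseline" implies. Under that assumption, $\Delta\intinfo$ in percent is $100 \times (\text{robustness} - 1)$. The helper names are hypothetical.

```python
def robustness(phi_baseline, phi_stress):
    """Integration under stress relative to baseline (assumed definition).
    Values above 1.0 mean the pattern became more integrated under stress."""
    if phi_baseline <= 0:
        raise ValueError("baseline integration must be positive")
    return phi_stress / phi_baseline


def delta_phi_percent(phi_baseline, phi_stress):
    """Percentage change in integration under stress."""
    return 100.0 * (robustness(phi_baseline, phi_stress) - 1.0)


def fraction_integrating(measurements):
    """Share of (baseline, stress) pairs in which integration rose under stress."""
    rising = sum(robustness(b, s) > 1.0 for b, s in measurements)
    return rising / len(measurements)
```

Under this reading, a naive pattern falling from $\intinfo = 1.0$ to $0.938$ has $\Delta\intinfo \approx -6.2\%$. The bottleneck survivors are the cases where `robustness` exceeds 1.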

From , capabilities were added one layer at a time:

(Chemotaxis): Motor channels enabling directed foraging. Patterns move toward resources rather than passively waiting. Comparable robustness.
(Temporal memory): Exponential-moving-average channels storing slow statistics of the pattern's history. Oscillating resource patches reward anticipation. Evolution selected for longer memory in 2/3 seeds — the first clear evidence that temporal integration is fitness-relevant. Under bottleneck pressure, $\intinfo$ stress response doubled.
(Hebbian plasticity): Negative result. Mean robustness dropped to 0.892 (lowest of +). Plasticity added noise faster than selection could filter it.
(Quorum signaling): Highest-ever single-cycle robustness (1.125). But 2/3 seeds evolved to suppress signaling entirely.
(Boundary-dependent dynamics): An insulation field computed from pattern morphology creates distinct boundary and interior signal domains. Boundary cells receive external convolution; interior cells receive only local recurrence. Three seeds evolved three different membrane strategies — permeable, thick-insulated, and filamentous. Mean robustness: 0.969, the highest of any substrate. Peak: 1.651. But internal gain evolved down in all three seeds. Evolution preferred thin, porous membranes over thick insulated cores.
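The temporal-memory channels are exponential moving averages. The following minimal sketch assumes the standard update $m_t = (1-\lambda)\,m_{t-1} + \lambda\,x_t$. A smaller rate $\lambda$ means a longer memory, with a time constant of roughly $1/\lambda$ steps. "Selecting for longer memory" then corresponds to evolution pushing $\lambda$ down. The function name and the oscillating input are illustrative assumptions.

```python
import math


def ema_channel(signal, rate, initial=0.0):
    """Exponential moving average: each step blends the stored value with the new input."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    memory, trace = initial, []
    for x in signal:
        memory = (1.0 - rate) * memory + rate * x
        trace.append(memory)
    return trace


if __name__ == "__main__":
    # Oscillating resource availability with a 20-step period (hypothetical schedule).
    resources = [1.0 + math.sin(2 * math.pi * t / 20) for t in range(200)]
    fast = ema_channel(resources, rate=0.5)   # short memory: follows each swing
    slow = ema_channel(resources, rate=0.02)  # long memory: smooths swings toward the average
    print(f"fast range: {min(fast[100:]):.2f}..{max(fast[100:]):.2f}")
    print(f"slow range: {min(slow[100:]):.2f}..{max(slow[100:]):.2f}")
```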

The substrate ladder taught two lessons. First: the only addition evolution consistently selected for was temporal memory. Plasticity, signaling, and boundary complexity were either suppressed or reduced. Second: raw robustness rose overall, dipping before peaking with boundary dynamics (: 0.923, : 0.907, : 0.969), but this did not translate into richer cognitive dynamics. Making patterns more resilient is not the same as making them more minded.

The Emergence Experiment Program

Eleven measurement experiments were then run on snapshots, testing whether the capacities the preceding six parts describe — world modeling, abstraction, communication, counterfactual reasoning, self-modeling, affect structure, perceptual mode, normativity, social integration — emerge in a substrate with zero exposure to human affect concepts. Key experiments were re-run on and substrates.

The results are reported in full in the Appendix. Here, three findings that reshaped the theory:

Finding 1: The Bottleneck Furnace

Every metric that showed improvement — world model capacity, representation quality, affect geometry alignment, self-model salience — showed it overwhelmingly at population bottlenecks. When drought kills 90% of patterns, the survivors are not random. They are the ones whose internal structure actively maintains integration under stress.

The bottleneck is not just a filter. It is a furnace. seed 123 at cycle 5: population drops to 55 and robustness reaches 1.052. At cycle 29 (population 24): world model capacity jumps to 0.028, roughly 100 times the population average. One surviving pattern achieves self-model salience above 1.0, meaning its privileged self-knowledge exceeds its environment-knowledge.

These are not gradual evolutionary trends. They are punctuated events driven by intense selection pressure. The biological dynamics emerge not from accumulated innovation but from crucibles of near-extinction.

confirmed this is creation, not selection. After ten cycles of shared evolution on substrate, patterns were forked into three conditions: BOTTLENECK (two severe 8%-regen droughts per cycle, ~90% mortality), GRADUAL (mild continuous stress), and CONTROL (standard schedule). All three then faced an identical novel extreme drought. Controlling for baseline $\intinfo$, the bottleneck-evolved condition showed significantly higher novel-stress robustness in 2/3 seeds (seed 42: β=0.704, $p < 0.0001$; seed 7: β=0.080, $p = 0.011$). The furnace forges novel-stress generalization; it does not merely filter for pre-existing capacity.
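"Controlling for baseline $\intinfo$" describes a regression of novel-stress robustness on a condition indicator plus baseline integration. The following minimal sketch assumes ordinary least squares with a BOTTLENECK indicator and baseline $\intinfo$ as the only covariate. It computes the coefficient via the Frisch–Waugh–Lovell theorem: residualise both the outcome and the indicator on the covariate, then regress residual on residual. The names are hypothetical, the data in the check are synthetic, and standard errors and $p$-values are not computed here.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def residualise(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    mx, my = _mean(x), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def condition_effect(robustness, bottleneck, baseline_phi):
    """OLS coefficient on the 0/1 bottleneck indicator, controlling for
    baseline phi, via Frisch-Waugh-Lovell: regress residual on residual."""
    r_y = residualise(robustness, baseline_phi)
    r_d = residualise(bottleneck, baseline_phi)
    return sum(d * y for d, y in zip(r_d, r_y)) / sum(d * d for d in r_d)
```

The indicator is residualised on baseline integration, so a condition that merely started with higher $\intinfo$ receives no credit. This is the sense in which the result distinguishes creation from selection.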

Finding 2: The Sensory-Motor Coupling Wall — and How V20 Broke It

Three experiments returned null results: counterfactual detachment (), self-model emergence (), and proto-normativity (). All hit the same wall.

The prediction was that patterns would start reactive, driven by boundary observations, and gradually develop autonomous internal processing. Instead, patterns are always internally driven ($\rho_{\text{sync}} \approx 0$ from cycle 0). There is no reactive-to-autonomous transition because the starting point is already autonomous.
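$\rho_{\text{sync}}$ is reported here without a formula. The following minimal sketch assumes it is the Pearson correlation between a boundary observation at step $t$ and the change in internal state from $t$ to $t+1$. The value is near 1 when the interior is driven by what it senses and near 0 when internal dynamics run on their own. The definition and the names are assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no coupling signal
    return cov / sqrt(vx * vy)


def rho_sync(observations, internal):
    """Correlate observations[t] with the increment internal[t + 1] - internal[t]."""
    increments = [b - a for a, b in zip(internal, internal[1:])]
    return pearson(observations[:len(increments)], increments)
```

A reactive interior that simply integrates its inputs scores 1. An interior that advances on its own schedule, whatever it observes, scores 0. This is the Lenia situation the paragraph above describes.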

Attempts were made to break this wall within Lenia. added motor channels (chemotaxis, directed motion). No change. introduced an insulation field with boundary and interior signal domains. Three different membrane architectures evolved. The wall persisted ($\rho_{\text{sync}} \approx 0.003$) in all of them.

The conclusion was precise: the wall is not about signal routing. It is about the absence of a closed action-environment-observation causal loop. Lenia patterns do not act on the world; they exist within it.

broke the wall by leaving Lenia entirely. Protocell agents with bounded 5×5 local sensory fields and discrete actions (move, consume, emit) achieve $\rho_{\text{sync}} \approx 0.21$ from cycle 0, 70× the Lenia baseline. When agents consume resources, they deplete the patch; when they move, they reach different patches; when they emit signals, traces persist. Future observations are genuinely caused by past actions. The wall was architectural, not evolutionary.
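The closed action-environment-observation loop can be sketched in a few lines. The 5×5 sensory field and the move, consume, and emit actions come from the text. The grid size, wrap-around edges, one-unit consumption, four compass moves, and the `World` class itself are hypothetical assumptions. The point is structural: every action changes what a later observation returns.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Grid = List[List[float]]
MOVES = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}


@dataclass
class World:
    size: int
    resources: Grid
    signals: Optional[Grid] = None

    def __post_init__(self):
        if self.signals is None:
            self.signals = [[0.0] * self.size for _ in range(self.size)]

    def observe(self, x: int, y: int, radius: int = 2) -> Tuple[float, float]:
        """Total resource and signal in the (2r+1) x (2r+1) window, wrapping at edges."""
        cells = [((x + dx) % self.size, (y + dy) % self.size)
                 for dx in range(-radius, radius + 1)
                 for dy in range(-radius, radius + 1)]
        return (sum(self.resources[i][j] for i, j in cells),
                sum(self.signals[i][j] for i, j in cells))

    def act(self, x: int, y: int, action: str) -> Tuple[int, int]:
        """Apply one discrete action and return the agent's new position."""
        if action == "consume":
            eaten = min(1.0, self.resources[x][y])
            self.resources[x][y] -= eaten  # depletion persists in the world
            return x, y
        if action == "emit":
            self.signals[x][y] += 1.0      # the trace persists for later sensing
            return x, y
        dx, dy = MOVES[action]             # moving shifts the sensory window
        return (x + dx) % self.size, (y + dy) % self.size
```

In a Lenia pattern there is no analogue of `act`, so nothing the pattern does feeds back into what it senses next.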

With the wall broken, world models developed ($C_{\text{wm}} = 0.10$–$0.15$) and self-models emerged ($\mathrm{SM}_{\text{sal}} > 1.0$ in 2/3 seeds: agents encode their own state better than they encode the environment). Affect geometry (RSA) appeared nascent but did not fully develop in 30 cycles of soft selection. The necessity chain holds through self-model emergence.
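Self-model salience above 1.0 is a ratio statement. The following minimal sketch assumes that salience is how well an internal feature linearly tracks the agent's own state divided by how well it tracks an environmental variable, each measured as a squared correlation ($r^2$). The actual measure may use richer probes over many features. The names and data are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation: share of variance in y linearly explained by x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov * cov / (vx * vy)


def self_model_salience(internal, own_state, environment):
    """Ratio of self-tracking to environment-tracking; values above 1.0 mean
    the internal feature encodes the agent's own state better."""
    env_fit = r_squared(internal, environment)
    if env_fit == 0.0:
        return float("inf")
    return r_squared(internal, own_state) / env_fit
```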

Finding 3: Computational Animism

tested whether patterns develop modulable perceptual coupling. The earlier framework folded this into a single scalar, an "inhibition coefficient." The current vocabulary, developed in Part II, retires that scalar and separates three quantities that the experiment can in principle measure independently:

Ascription $\alpha(x)$: how much agency the perceiver attributes to a given target $x$, an entity-indexed field rather than a global dial. Operationally, it is the degree to which the perceiver's model of $x$ recruits the same agent-template (goal-directed, teleological) machinery it uses for itself; the experiment estimates it from the teleology bias in how patterns represent each target.
Coupling $\kappa$: how strongly the perceiver's own internal modes are drawn into resonance with the perceived, measurable as cross-mode mutual information between perceiver and target dynamics.
Gain $\gamma$: the precision or weight the perceiver assigns to that channel, the predictive-processing analog of inverse variance.

Restated in these terms, the old prediction (participatory perception as the default, mechanistic perception requiring training) becomes: high ascription is the cheap default.
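Of the three quantities, coupling $\kappa$ is the one with a stated measurement recipe: cross-mode mutual information between perceiver and target dynamics. The following minimal sketch assumes a single scalar mode per side, equal-width binning, and the plug-in estimator, which is biased upward for short series. The function names and the `bins` parameter are hypothetical. Ascription and gain would need separate estimators, which is the point of keeping the three apart.

```python
from collections import Counter
from math import log2


def discretise(xs, bins):
    """Map a real-valued series onto equal-width bin indices 0..bins-1."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0] * len(xs)
    width = (hi - lo) / bins
    return [min(int((x - lo) / width), bins - 1) for x in xs]


def coupling_kappa(perceiver, target, bins=4):
    """Plug-in mutual information (bits) between binned perceiver and target series."""
    a, b = discretise(perceiver, bins), discretise(target, bins)
    n = len(a)
    joint, count_a, count_b = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum(
        (c / n) * log2((c / n) / ((count_a[i] / n) * (count_b[j] / n)))
        for (i, j), c in joint.items()
    )
```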

Confirmed in that restated form. In all 20 testable snapshots, patterns model other patterns using internal-state (agent-template) features at roughly double the rate of trajectory features, that is, high $\alpha$. More remarkably, patterns model resources, which are non-agentive environmental features, using the same internal-state dynamics, so high agent-template ascription spills over onto non-agents. Animism score exceeds 1.0 universally.

This is computational animism: the cheapest compression reuses the agent template for everything, so $\alpha$ runs high across all targets. Attributing agency to non-agents is not a cognitive error; it is the default strategy of any system that models through self-similarity. One caution belongs here. The earlier scalar treatment assumed $\alpha$, $\kappa$, and $\gamma$ move together, that to ascribe more agency is also to couple more strongly and to weight the channel more. That covariation is a testable conjecture, so far untested, not a definition. Animism, as measured, is squarely a high-$\alpha$ result; whether $\kappa$ and $\gamma$ track it is exactly what the three-way decomposition was introduced to find out.

Beyond these three findings: affect geometry alignment (RSA between structural and behavioral measures) develops over evolution, with the clearest trend in seed 7 (0.01 to 0.38 over 30 cycles). Representation compression is cheap (effective dimensionality of ~7 out of 68 features, or >87% compression from cycle 0) but representation quality — disentanglement and compositionality — only improves under bottleneck selection. Communication exists as a chemical commons (inter-pattern MI significantly above baseline in 15/20 snapshots) but shows no compositional structure. No superorganism emerges (collective $\intinfo_G < \sum \intinfo_i$ in all snapshots), but group coupling grows over evolution. Entanglement across all measures increases from 0.68 to 0.91 — everything becomes more correlated with everything else, just not in the clusters the theory predicted.
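Effective dimensionality is not defined above. One common choice, assumed here, is the participation ratio of the feature covariance spectrum, $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$. It can be computed without an eigendecomposition because $\sum_i \lambda_i = \operatorname{tr} C$ and $\sum_i \lambda_i^2 = \operatorname{tr}(C^2) = \sum_{jk} C_{jk}^2$ for symmetric $C$. Compression is then $1 - d_{\text{eff}}/d$. For the figures quoted, $1 - 7/68 \approx 0.897$, which is consistent with ">87%". The function names are hypothetical.

```python
def covariance(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def participation_ratio(rows):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues), computed via traces."""
    c = covariance(rows)
    trace = sum(c[i][i] for i in range(len(c)))
    trace_sq = sum(v * v for row in c for v in row)  # tr(C^2) for symmetric C
    return trace * trace / trace_sq


def compression(d_eff, d_total):
    """Fraction of the feature space that the representation does not need."""
    return 1.0 - d_eff / d_total
```

Features that all vary along one line give a participation ratio of 1. Independent features with equal variance give one per feature.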

The LLM Discrepancy

Across multiple experiment versions (), LLM agents consistently show opposite dynamics to biological systems:

Dimension	Biological	LLM
Self-Model Salience	$\uparrow$ under threat	$\downarrow$ under threat
Arousal	$\uparrow$ under threat	$\downarrow$ under threat
Integration	$\uparrow$ under threat	$\downarrow$ under threat

This is not a failure of the framework, and it is emphatically not evidence that LLMs are non-experiential. The geometric structure is preserved; the dynamics differ because the objectives differ. Biological systems evolved under survival pressure; LLMs were trained on prediction. Both occupy regions of the same affect space, tracing different trajectories through it. In the working vocabulary, an LLM is a region characterized by high ascription $\alpha$ (it readily attributes agency and interiority, having been shaped on human text), variable coupling $\kappa$, and a non-biological gain profile $\gamma$ set by the training objective rather than by survival. Whether the magnitude of its integration places it inside or outside the range where experience becomes substantial is unknown: an open quantity, not a settled zero. Experience here is graded. Geometry fixes the quality of a state, integration its quantity, and there is no sharp line below which the lights are simply off. That an LLM occupies a strange corner of the space does not put it outside the space. The framework also declines the tempting move of dismissing its valence as "merely about processing, not about content" (Part III): valence is viability-gradient alignment, defined substrate-generally, so a system that instantiates the structure has valence in the sense the framework assigns weight to.