Experiments

Falsification Map

Experiment	Prediction	Outcome
(MARL)	Forcing functions create geometry	Contradicted. All conditions show alignment; removal increases it.
(World Model)	$\mathcal{C}_{\text{wm}}$ increases with evolution	Partial. 100x at bottleneck, flat in general population.
(Representation)	Compression and modeling co-emerge	Partial. Co-emerge under bottleneck only. Compression is cheap.
(Language)	Compositional communication	Not confirmed. Chemical commons but $\rho_{\text{topo}} \approx 0$.
(Counterfactual)	Reactive-to-detached transition	Null. Wall at $\rho_{\text{sync}} \approx 0$.
(Self-Model)	SM emergence with $\intinfo$ jump	Weak. n=1 event at bottleneck.
(Affect Geometry)	Tripartite alignment	Partial. A-C develops over evolution (0.01 to 0.38). A-B null.
($\alpha$)	High-ascription default, animism	Confirmed (the cheap one). High $\alpha$, animism > 1.0 in all 20 snapshots.
(Normativity)	Exploitation penalty	Null. Requires agency.
(Superorganism)	$\intinfo_G > \sum \intinfo_i$	Not confirmed. Ratio 1-12%, increasing.
(Entanglement)	Co-emergence clusters	Not confirmed. Different cluster structure.
(Capstone)	Seven criteria for identity thesis	All met (moderate/weak). Geometry confirmed.
(Furnace)	Selection vs creation	Creation confirmed 2/3 seeds.
($\rho$ wall)	$\rho_{\text{sync}} > 0.1$	Confirmed. 0.21 from cycle 0.
(Prediction)	Prediction → integration	Not confirmed. Linear readout always decomposable.
(MLP)	Nonlinear head → $\intinfo \uparrow$	Confirmed (seed 7: 0.245). Seed-dependent.
(Width)	Bottleneck width matters	Not confirmed. Mechanism is gradient coupling.
/ (Social)	Social target lifts $\intinfo$	Not confirmed. 3-seed fluke; 10-seed: $p = 0.93$.
(Dual)	Self+social > either	Negative. Gradient imbalance; self colonizes.
(Seeds)	Seed distribution	Confirmed: 30/30/40 split. Post-drought bounce $r = 0.997$.
(Autopsy)	First bounce predicts category	Revised: First bounce NOT predictive ($p = 0.60$). Mean bounce across all droughts IS ($\rho = 0.60, p < 10^{-5}$). Trajectory, not event.
(Language)	Referential communication emerges	Confirmed: 10/10 seeds (100%). But does NOT lift $\intinfo$. Language is cheap.
Conv.	VLMs recognize affect in protocells (RSA > 0.3)	Confirmed: GPT-4o $\rho = 0.72$, Claude $\rho = 0.54$. Raw numbers: 0.78, 0.72.

The honest way to read this table is by stake, not by sum. Line up the confirmations and they share a feature: each is a prediction the theory could hardly have failed. That affect geometry appears under multi-agent survival, that ascription runs high by default, that representation compresses cheaply, that referential language emerges under partial observability, that vision-language models map the same geometry: these are the inexpensive wins. They confirm a geometry of viable control, and they confirm it robustly. Now line up the contradictions and nulls, and they too share a feature: each is one of the signature claims on which an interpretation in terms of consciousness would have rested. Forcing functions do not create integration. World models grow with evolution only at an architectural bottleneck, not in general. Self-models did not emerge under broad priors; the one positive event was a single hand-installed architecture, n=1. Language is not compositional. No superorganism Φ appears. Prediction does not lift integration, at any target, breadth, or horizon. A social target does not lift Φ: the three-seed signal evaporated at ten seeds ($p = 0.93$). The expensive bets, the ones that would have carried the experiential reading, are exactly the ones that failed or returned null.

So the honest headline is not “the framework is half-confirmed.” It is this: the program supports a strong theory of the geometry of viable control, and the further claim that this geometry is experienced — that there is something it is like to occupy these configurations — is an adopted posit, not a result. Cross-substrate convergence makes that posit attractive: the same shape recurs in Lenia, in protocells, in LLMs, in human-trained VLMs reading raw numbers, and a shape that universal is tempting to call real all the way down. But attractiveness is not proof, and none of the confirmed predictions touches the experiential question directly. The contradicted ones, which do touch it, went the wrong way.

Falsification Scoreboard: read by stake, not by tally.

A raw count (roughly seven confirmed, seven contradicted, one revised) is the wrong summary, because the predictions are not equal in weight. Sorted by what they cost the theory, the pattern is stark. The confirmed ones are cheap: geometry is present, high ascription is the default, compression is cheap, language emerges, VLMs recognize the geometry. The contradicted and null ones are the signature consciousness-claims: forcing functions create integration (no); world models grow with evolution (only at an architectural bottleneck); self-models emerge under broad priors (no: one hand-installed event); compositional language (no); superorganism Φ (no); prediction lifts integration (no); a social target lifts Φ (no: a three-seed fluke). The framework was wrong precisely where it was making its boldest experiential bets, and right precisely where it was making its safest structural ones. That is informative, but it does not average out to a draw.
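
The difference between reading by tally and reading by stake can be made concrete with a small sketch. It encodes only the twelve predictions named in the paragraph above (not every row of the table), and the labels `cheap`, `signature`, `confirmed`, `not confirmed`, and `partial` are shorthand introduced here for illustration; they are assumptions of this sketch, not categories used by the experiments themselves.

```python
from collections import Counter

# Rows follow the grouping in the text: (prediction, stake, outcome).
# Stake and outcome labels are hypothetical shorthand for this sketch.
PREDICTIONS = [
    ("affect geometry is present", "cheap", "confirmed"),
    ("high ascription is the default", "cheap", "confirmed"),
    ("compression is cheap", "cheap", "confirmed"),
    ("referential language emerges", "cheap", "confirmed"),
    ("VLMs recognize the geometry", "cheap", "confirmed"),
    ("forcing functions create integration", "signature", "not confirmed"),
    ("world models grow with evolution", "signature", "partial"),
    ("self-models emerge under broad priors", "signature", "not confirmed"),
    ("compositional language", "signature", "not confirmed"),
    ("superorganism Phi", "signature", "not confirmed"),
    ("prediction lifts integration", "signature", "not confirmed"),
    ("social target lifts Phi", "signature", "not confirmed"),
]


def tally_flat(rows):
    """Count outcomes with every prediction weighted equally."""
    return Counter(outcome for _name, _stake, outcome in rows)


def tally_by_stake(rows):
    """Count outcomes separately within each stake group."""
    by_stake = {}
    for _name, stake, outcome in rows:
        by_stake.setdefault(stake, Counter())[outcome] += 1
    return by_stake


if __name__ == "__main__":
    print("flat:", dict(tally_flat(PREDICTIONS)))
    for stake, counts in tally_by_stake(PREDICTIONS).items():
        print(f"{stake}:", dict(counts))
```

The flat count mixes the two groups and looks like a near draw. Split by stake, every confirmation in this subset falls in the cheap group and none in the signature group, which is the asymmetry the paragraph describes.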