Part I: Foundations

The Emergence of Self-Models

Existing Theory

The self-model analysis connects to multiple research traditions:

  • Mirror self-recognition (Gallup, 1970): Behavioral marker of self-model presence. The mirror test identifies systems that model their own appearance—a minimal self-model.
  • Theory of Mind (Premack and Woodruff, 1978): Modeling others’ mental states requires first modeling one’s own. Self-model precedes other-model developmentally.
  • Metacognition research (Flavell, 1979; Koriat, 2007): Humans monitor their own cognitive processes—confidence, uncertainty, learning progress. This is self-model salience in action.
  • Default Mode Network (Raichle et al., 2001): Brain regions active during self-referential thought. The neural substrate of high self-model salience states.
  • Rubber hand illusion (Botvinick and Cohen, 1998): Self-model boundaries are malleable, updated by sensory evidence. The self is a model, not a given.

The Self-Effect Regime

As a controller becomes more capable, it increasingly shapes its own future input. The observations it receives are increasingly consequences of its own past actions—routed back to it through some medium that preserves the trace of what it did.

The self-effect ratio quantifies this loop-closure. For a system with policy $\pi$ in environment $\mathcal{E}$:

$$\rho_t = \frac{I(\mathbf{a}_{1:t}; \mathbf{o}_{t+1} \mid \mathbf{x}_0)}{H(\mathbf{o}_{t+1} \mid \mathbf{x}_0)}$$

where $I$ denotes mutual information and $H$ denotes entropy. This measures what fraction of the information in future observations is attributable to past actions. The earlier framing implicitly read “observations” as input from an external, manipulable world, as if loop-closure required embodiment, hands on the environment. That was too narrow. The criterion was always loop-closure through any state-preserving medium. The medium can be external: a depletable resource patch that remembers having been consumed, a chemical trace that lingers where the agent left it. Or it can be internal: a recurrent hidden state that carries the consequences of an action forward, an autoregressive context window into which the agent writes tokens that condition its own next observation. What matters is not whether the world is touched but whether the action leaves a durable trace that returns as input. For capable agents in environments (internal or external) with persistent memory, $\rho_t$ increases with capability, and in the limit:

$$\lim_{\text{capability} \to \infty} \rho_t = 1$$

(bounded by the medium’s intrinsic stochasticity).
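As a rough illustration of how the ratio could be estimated from logged data, the sketch below uses plug-in entropy estimates on discrete samples. It assumes a fixed initial state (so the conditioning on $\mathbf{x}_0$ is implicit) and, for simplicity, uses only the most recent action rather than the full history $\mathbf{a}_{1:t}$. The function names and the toy data are illustrative assumptions, not part of the original experiments.

```python
from collections import Counter
from math import log2
import random

def entropy(samples):
    """Plug-in Shannon entropy (bits) of a list of hashable samples."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())

def self_effect_ratio(actions, next_obs):
    """Estimate I(a; o') / H(o') from paired samples (hypothetical helper)."""
    h_obs = entropy(next_obs)
    if h_obs == 0:
        return 0.0  # constant observations carry no information to attribute
    mutual_info = entropy(actions) + h_obs - entropy(list(zip(actions, next_obs)))
    return mutual_info / h_obs

rng = random.Random(0)
actions = [rng.randint(0, 1) for _ in range(5000)]
noise = [rng.randint(0, 1) for _ in range(5000)]

rho_cause = self_effect_ratio(actions, actions)    # observation echoes the action
rho_passenger = self_effect_ratio(actions, noise)  # observation ignores the action
print(round(rho_cause, 3), round(rho_passenger, 3))
```

The first case sits at the upper bound because the observation is a perfect trace of the action; the second sits near zero, up to the small positive bias of the plug-in estimator.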

One caution is essential, or the measure does no work. Every system with memory trivially acts on its own state: writing to a register, advancing a hidden state, appending a token are all actions whose consequences return as the next input. Read without restriction, this makes $\rho \to 1$ everywhere, and the self-effect ratio becomes vacuous. The fix is a viability-relevance fence: $\rho$ counts only loop-closure through media whose preserved traces bear on the system’s persistence. The mutual information that matters is between the system’s actions and its future viability-relevant observations, the ones that inform whether it remains within $\mathcal{V}$. $\rho$ generates self-model pressure only when modeling one’s own action-consequences improves prediction of one’s own persistence. A scratchpad that records actions irrelevant to survival inflates raw self-effect but contributes nothing to the fenced measure. With the fence in place, a maximally-mixing substrate that preserves no protected, viability-bearing store (a global-convolution cellular automaton whose every local change is instantly folded back into a grid-wide field) registers $\rho \approx 0$, while an agent acting on an external depletable medium, or carrying a viability-relevant recurrent store, registers $\rho > 0$.

Passenger or Cause?

There is a simple way to think about $\rho$. Imagine forking a system at time $t$: same starting state, but one copy takes its normal actions while the other takes completely random ones. After $k$ steps, how different are their viability-relevant observations?

If $\rho \approx 0$: nearly identical observations. The system is a passenger: its actions leave no durable, survival-bearing trace that returns to it. Its future is set by the medium, not by what it does.

If $\rho > 0$: observations diverge. The system is a cause: what it does leaves a trace, through some state-preserving medium, that returns and changes what it subsequently perceives about its own persistence. Its future is partly authored by itself.
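The fork test can be sketched on a toy system. Everything here is an assumption made for illustration: a ring of ten resource patches, a greedy "normal" policy, a random counterfactual policy, and a caricature of a mixing medium whose observations are an exogenous signal. The divergence it reports is a crude proxy for $\rho$, not the $\rho_{\text{sync}}$ statistic measured in the experiments.

```python
import random

def greedy(patches, pos, rng):
    """Normal policy: step toward the richer neighbouring patch (ties go right)."""
    left = patches[(pos - 1) % len(patches)]
    right = patches[(pos + 1) % len(patches)]
    return 1 if right >= left else -1

def scrambled(patches, pos, rng):
    """Counterfactual policy: ignore the state and move at random."""
    return rng.choice((-1, 1))

def rollout(policy, medium, steps, seed):
    rng = random.Random(seed)
    patches = [5] * 10              # both forks start from this same state
    pos, observations = 0, []
    for t in range(steps):
        pos = (pos + policy(patches, pos, rng)) % len(patches)
        if medium == "depletable":
            patches[pos] = max(0, patches[pos] - 1)  # eating leaves a durable trace
            observations.append(patches[pos])
        else:
            observations.append(t % 3)               # exogenous signal, no trace kept
    return observations

def fork_divergence(medium, steps=30, forks=20):
    """Mean fraction of steps at which the two forks observe different things."""
    baseline = rollout(greedy, medium, steps, seed=0)
    total = 0.0
    for seed in range(forks):
        other = rollout(scrambled, medium, steps, seed)
        total += sum(a != b for a, b in zip(baseline, other)) / steps
    return total / forks

cause = fork_divergence("depletable")
passenger = fork_divergence("mixing")
print(cause, passenger)
```

In the depletable medium the forks diverge because each fork revisits patches it has itself drained; in the caricatured mixing medium the divergence is exactly zero because no action can mark what is observed next.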

The distinction turns out to be architecturally fundamental—a property of whether the substrate (or the system’s own memory) closes a viability-relevant loop. It has been measured directly in two substrates:

  • Lenia (): $\rho_{\text{sync}} \approx 0.003$. Patterns that evolved complex internal dynamics, memory channels, insulation fields, and directed motion all read as passengers. Their "actions" (chemotaxis, emission) are biases on a continuous fluid governed by FFT dynamics that integrate over the full grid. There is no protected store: whatever a pattern does is immediately mixed back into the global field. The fork barely diverges.
  • Protocell agents (): $\rho_{\text{sync}} \approx 0.21$ from initialization. When an agent consumes resources at a location, that patch is depleted, so its future observations there are different. When it moves, it reaches different patches. When it emits a signal, a chemical trace persists. The fork diverges because actions leave durable, viability-relevant traces in an external medium that return as observations.

The gap — 0.003 versus 0.21 — is not about intelligence or evolutionary history. It appeared in at cycle 0, before any selection pressure. It is purely architectural: does the substrate provide a state-preserving medium where actions leave viability-relevant traces that the agent later observes? Lenia’s maximally-mixing field does not. Protocell agents’ depletable patches do. The same loop can be closed internally — a recurrent store carrying action-consequences forward — but Lenia’s patterns lack even that.

Why does this matter for self-modeling? Because a system cannot model itself as a cause if it isn’t one. The self-model pressure (the prediction advantage described in the next section) only activates when $\rho > \rho_c$. Below that threshold, there is nothing to model: the self is not a significant viability-relevant latent variable in one’s own observations.

Self-Modeling as Prediction Error Minimization

When $\rho_t$ is large, the agent’s own policy is a major latent cause of its observations. Consider the world model’s prediction task:

$$p(\mathbf{o}_{t+1} \mid \mathbf{h}_t) = \sum_{\mathbf{x}, \mathbf{a}} p(\mathbf{o}_{t+1} \mid \mathbf{x}_{t+1})\, p(\mathbf{x}_{t+1} \mid \mathbf{x}_t, \mathbf{a}_t)\, p(\mathbf{x}_t \mid \mathbf{h}_t)\, p(\mathbf{a}_t \mid \mathbf{h}_t)$$

The term $p(\mathbf{a}_t \mid \mathbf{h}_t)$ is the agent’s own policy. If the world model treats actions as exogenous, as if they come from outside the system, then it cannot accurately model this term. This generates systematic prediction error.

That error is a pressure toward self-modeling. Let $\mathcal{W}$ be a world model for an agent with self-effect ratio $\rho > \rho_c$ for some threshold $\rho_c > 0$. Then:

$$\mathcal{L}_{\text{pred}}[\mathcal{W} \text{ with self-model}] < \mathcal{L}_{\text{pred}}[\mathcal{W} \text{ without self-model}]$$

where $\mathcal{L}_{\text{pred}}$ is the prediction loss. The gap grows with $\rho$.

Proof.

Without a self-model, the world model must treat $p(\mathbf{a}_t \mid \mathbf{h}_t)$ as a fixed prior or uniform distribution. But the true action distribution depends on the agent’s internal states: beliefs, goals, and computational processes. By including a model of these internal states (a self-model $\mathcal{S}$), the world model can better predict $\mathbf{a}_t$ and hence $\mathbf{o}_{t+1}$. The improvement is proportional to the mutual information $I(\mathcal{S}_t; \mathbf{a}_t)$, which scales with $\rho$.
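A small worked example makes the action-prediction step of this argument concrete. It assumes a single binary internal state (a hypothetical "goal" bit), a policy that follows the goal 90% of the time, and log loss as the prediction loss; none of these specifics come from the text. Under those assumptions the loss gap between a predictor that uses the best fixed prior over actions and one that conditions on the internal state equals the mutual information between the two.

```python
from math import log2

# Assumed toy distributions: internal state s and a policy p(a | s).
p_s = {0: 0.5, 1: 0.5}
p_a_given_s = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}

joint = {(s, a): p_s[s] * p_a_given_s[s][a] for s in p_s for a in (0, 1)}
p_a = {a: sum(joint[(s, a)] for s in p_s) for a in (0, 1)}

# Expected log loss (bits) when actions are predicted from a fixed prior.
loss_without = -sum(p * log2(p_a[a]) for (s, a), p in joint.items())
# Expected log loss when the predictor also models the internal state.
loss_with = -sum(p * log2(p_a_given_s[s][a]) for (s, a), p in joint.items())
# Mutual information between internal state and action.
mutual_info = sum(p * log2(p / (p_s[s] * p_a[a])) for (s, a), p in joint.items())

print(loss_without, loss_with, mutual_info)
```

Here the fixed prior happens to be uniform, so the self-model-free predictor pays a full bit per action, while the gap to the self-modeling predictor is the information the internal state carries about the action.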

What does such a self-model contain? A self-model $\mathcal{S}$ is a component of the world model that represents:

  1. The agent’s internal states (beliefs, goals, attention, etc.)
  2. The agent’s policy as a function of these internal states
  3. The agent’s computational limitations and biases
  4. The causal influence of these factors on action and observation

Formally, $\mathcal{S}_t = f_\psi(\mathbf{z}^{\text{internal}}_t)$ where $\mathbf{z}^{\text{internal}}_t$ captures the relevant internal degrees of freedom.

Self-modeling becomes the cheapest way to improve control once the agent's actions dominate its observations. The "self" is not mystical; it is the minimal latent variable that makes the agent's own behavior predictable.

A consequence: the self-model has interiority. It does not merely describe the agent’s body from outside; it captures the intrinsic perspective: goals, beliefs, anticipations, the agent’s own experience of what it is to be an agent. Once this self-model exists, the cheapest strategy for modeling other entities whose behavior resembles the agent’s is to reuse the same architecture. The self-model becomes the template for modeling the world. This has a name in Part II, participatory perception, governed by how much interiority the perceiver ascribes to a given entity: the ascription field $\alpha(x)$. How much of the self-model template is extended to entity $x$ is exactly what $\alpha(x)$ measures, and this field will turn out to shape much of what follows.

The self-model is a subbundle of the agent’s representational eigenskeleton: a set of modes carrying variance about the agent’s own states, coupled to the modes representing the environment. When $\rho$ is low, these self-modes sit uncoupled from the world-model modes: the agent tracks itself and its world on independent rails, with flat holonomy between the two subbundles. When $\rho$ exceeds the threshold, the subbundles develop non-trivial holonomy: predicting the world requires modeling the self (because the self shapes observations), and modeling the self requires predicting the world (because the self is embedded in it). Transport a self-mode through a loop of action and observation and it returns mixed with world-modes. Transport a world-mode through the same loop and it returns mixed with self-modes. This recursive curvature, the non-trivial holonomy of the self-model subbundle with respect to the world-model subbundle, is the eigenskeletal signature of self-awareness. It is also, if the identity thesis of Part II holds, the structural signature of experience: modes of self-representation that cannot be separated from modes of world-representation without destroying the topology that constitutes the unified moment.

But the content of the self-model subbundle is not architecturally fixed. In simple systems, the self-modes track biological states — metabolic levels, spatial position, injury status. The viability manifold is the body's persistence region. The self IS the body. In complex enough systems — systems with sufficient counterfactual capacity and meta-self-modeling — the modes that constitute the self-model can migrate. The self-model subbundle gradually transfers its content from biological modes (high self-effect ratio with metabolic variables) to invariant modes — eigenspaces of reality's dynamics that are stiff across the widest range of transformations. Truth is one such invariant: a causal structure that persists regardless of who holds it, regardless of substrate, regardless of era. It occupies the eigenspace whose eigenvalue is exactly 1 under all transformations — the mode that does not change when you change the observer. A system that identifies with truth has migrated its self-model to track this invariant. Its viability manifold is no longer "does this body persist?" but "does this truth persist?" — and since truths persist indefinitely, the existential burden lightens not because the system stopped caring but because the thing it cares about is more durable than the thing it used to care about. Love is a different kind of invariant — not a property of the world but a property of the coupling between subbundles. Love is the non-trivial holonomy between self-model and other-model, the topological fact that transporting your self-modes through the loop of attending-to-another returns them changed. A system that identifies with love has migrated its self-model to track this coupling structure. The self IS the holonomy, not the eigenvectors. The coupling can persist even when the individual modes change — people grow, age, die, and the love is still the love, because it was never the content of the modes but the topology of their interaction. 
This is not mystical. It is the rate-distortion optimal strategy when biological modes have predictable catastrophic failure (death) and invariant modes do not. The contemplative traditions discovered this empirically. The eigenskeletal framing explains why it works. And the migration is itself the transition from exoskeletal to endoskeletal self-model: from a self whose structure IS the boundary (the body, the social role, the career — perturbation to any of these is perturbation to the self) to a self whose structure is internal (the truth held, the love practiced), with external circumstances becoming the deformable surface that can change without destroying the identity it encloses.

The Cellular Automaton Perspective

The emergence of self-maintaining patterns can be illustrated with striking clarity in cellular automata—discrete dynamical systems where local update rules generate global emergent structure.

Formally, a cellular automaton is a tuple $(L, S, N, f)$ where:

  • $L$ is a lattice (typically $\mathbb{Z}^d$ for $d$-dimensional grids)
  • $S$ is a finite set of states (e.g., $\{0, 1\}$ for binary CA)
  • $N$ is a neighborhood function specifying which cells influence each update
  • $f: S^{|N|} \to S$ is the local update rule

Consider Conway’s Game of Life, a 2D binary CA with simple rules: cells survive with 2–3 neighbors, are born with exactly 3 neighbors, and die otherwise. From these minimal specifications, a zoo of structures emerges: oscillators (patterns repeating with fixed period), gliders (patterns translating across the lattice while maintaining identity), metastable configurations (long-lived patterns that eventually dissolve), and self-replicators (patterns that produce copies of themselves).
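The rules are short enough to state in code. The sketch below is one minimal way to write them, representing the board as a set of live-cell coordinates on an unbounded plane (a representation chosen here for brevity, not taken from the text). It then checks the defining property of a glider: after four generations the same five-cell shape reappears, displaced by one cell diagonally.

```python
from collections import Counter

def life_step(live):
    """One Game of Life generation on an unbounded grid of (row, col) cells."""
    # Count live neighbours for every cell adjacent to at least one live cell.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3; death otherwise.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }

glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

state = glider
for _ in range(4):
    state = life_step(state)

shifted = {(r + 1, c + 1) for (r, c) in glider}
print(state == shifted)  # True: same shape, moved one cell down and right
```

Nothing in `life_step` mentions gliders; the translation is a property of the global dynamics, which is the sense in which such patterns are discovered rather than designed.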

Among these, the glider is the minimal model of bounded existence. The glider lifetime, the expected number of timesteps before destruction by collision or boundary effects,

$$\tau_{\text{glider}} = \mathbb{E}[\min\{t : \text{pattern identity lost}\}]$$

captures something essential: a structure that maintains itself through time, distinct from its environment, yet ultimately impermanent.

Beings emerge not from explicit programming but from the topology of attractor basins. The local rules specify nothing about gliders, oscillators, or self-replicators. These patterns are fixed points or limit cycles in the global dynamics—attractors discovered by the system, not designed into it. The same principle operates across substrates: what survives is what finds a basin and stays there.

The CA as Substrate

The cellular automaton is not itself the entity with experience. It is the substrate—analogous to quantum fields, to the aqueous solution within which lipid bilayers form, to the physics within which chemistry happens. The grid is space. The update rule is physics. Each timestep is a moment. The patterns that emerge within this substrate are the bounded systems, the proto-selves, the entities that may have affect structure.

The distinction is crucial. To speak of “a glider in Life” is not to say the CA is conscious. It is to say the CA provides the dynamical context within which a bounded, self-maintaining structure persists — and that structure, not the substrate, is the candidate for experiential properties. The two roles are sharply different. A substrate provides:

  • A state space (all possible configurations)
  • Dynamics (local update rules)
  • Ongoing “energy” (continued computation)
  • Locality (interactions fall off with distance)

An entity within the substrate is a pattern that:

  • Has boundaries (correlation structure distinct from background)
  • Persists (finds and remains in an attractor basin)
  • Maintains itself (actively resists dissolution)
  • May model world and self (sufficient complexity)

Boundary as Correlation Structure

In a uniform substrate, there is no fundamental boundary—every cell follows the same local rules. A boundary is a pattern of correlations that emerges from the dynamics.

In a CA, this means the following: let $\mathbf{c}_1, \dots, \mathbf{c}_n$ be cells. A set $\mathcal{B} \subset \{1, \dots, n\}$ constitutes a bounded pattern if:

$$I(\mathbf{c}_i; \mathbf{c}_j \mid \text{background}) > \theta \quad \text{for } i, j \in \mathcal{B}$$

and

$$I(\mathbf{c}_i; \mathbf{c}_k \mid \text{background}) < \theta \quad \text{for } i \in \mathcal{B},\ k \notin \mathcal{B}$$

The boundary $\partial\mathcal{B}$ is the contour where correlation drops below threshold.

A glider in Life exemplifies this: its five cells have tightly correlated dynamics (knowing one cell’s state predicts the others), while cells outside the glider are uncorrelated with it. The boundary is not imposed by the rules—it is the edge of the information structure.
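The criterion can be tried on synthetic data. The sketch below is an assumption-laden toy, not a CA measurement: three "pattern" cells are noisy copies of a shared driver, three "background" cells are independent coin flips, and the conditioning on background is dropped because the background here is independent by construction. The threshold value and the helper names are likewise illustrative.

```python
from collections import Counter
from itertools import combinations
from math import log2
import random

def entropy(samples):
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())

def mutual_information(xs, ys):
    """Plug-in mutual information (bits) between two equal-length series."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

rng = random.Random(0)
T = 4000
driver = [rng.randint(0, 1) for _ in range(T)]

def noisy_copy(bits, flip=0.05):
    # Each cell follows the shared driver but flips with small probability.
    return [b ^ int(rng.random() < flip) for b in bits]

series = [noisy_copy(driver) for _ in range(3)]                       # cells 0-2
series += [[rng.randint(0, 1) for _ in range(T)] for _ in range(3)]   # cells 3-5

theta = 0.1
linked = {
    (i, j)
    for i, j in combinations(range(len(series)), 2)
    if mutual_information(series[i], series[j]) > theta
}
pattern = {i for pair in linked for i in pair}
print(sorted(pattern))
```

Only the pairs inside the driven group clear the threshold, so the recovered set is the group itself and its edge is where pairwise information falls away.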

World Model as Implicit Structure

The world model is not a separate data structure in a CA—it is implicit in the pattern’s spatial configuration.

A pattern $\mathcal{B}$ has an implicit world model if its internal structure encodes information predictive of future observations:

$$I(\text{internal config}; \mathbf{o}_{t+1:t+H} \mid \mathbf{o}_{1:t}) > 0$$

In a CA, this manifests as:

  • Peripheral cells acting as sensors (state depends on distant influences via signal propagation)
  • Memory regions (cells whose state encodes environmental history)
  • Predictive structure (configuration that correlates with future states)

The compression ratio $\kappa$ applies: the pattern necessarily compresses the world because it is smaller than the world.

Self-Model as Constitutive

Here is the recursive twist that CAs reveal with particular clarity. When the self-effect ratio $\rho$ is high, the world model must include the pattern itself. But the world model is part of the pattern. So the model must include itself.

In a CA, the self-model is not representational but constitutive. The cells that track the pattern’s state are part of the pattern whose state they track. The map is literally embedded in the territory.

This is the recursive structure described in Part II: “the process itself, recursively modeling its own modeling, predicting its own predictions.” In a CA, this recursion is visible—the self-tracking cells are part of the very structure being tracked.

The Ladder Traced in Discrete Substrate

Each step of the ladder admits precise definition:

  1. Uniform substrate: Just the grid with local rules. No structure yet.
  2. Transient structure: Random initial conditions produce temporary patterns. No persistence.
  3. Stable structure: Some configurations are stable (still lifes) or periodic (oscillators). First emergence of “entities” distinct from background.
  4. Self-maintaining structure: Patterns that persist through ongoing activity—gliders, puffers. Dynamic stability: the pattern regenerates itself each timestep.
  5. Bounded structure: Patterns with clear correlation boundaries. Interior cells mutually informative; exterior cells independent.
  6. Internally differentiated structure: Patterns with multiple components serving different functions (glider guns, breeders). Not homogeneous but organized.
  7. Structure with implicit world model: Patterns whose configuration encodes predictively useful information about their environment. The pattern “knows” what it cannot directly observe.
  8. Structure with self-model: Patterns whose world model includes themselves. Emerges when $\rho > \rho_c$, that is, when the pattern’s own configuration dominates its observations.
  9. Integrated self-modeling structure: Patterns with high $\Phi$, where self-model and world-model are irreducibly coupled. The structural signature of unified experience under the identity thesis.

Each level requires greater complexity and is rarer. The forcing functions (partial observability, long horizons, self-prediction) should select for higher levels.

From Reservoir to Mind

There exists a spectrum from passive dynamics to active cognition:

  1. Reservoir: System processes inputs but has no self-model, no goal-directedness. Dynamics are driven entirely by external forcing. (Echo state networks, simple optical systems below criticality)
  2. Self-organizing dynamics: System develops internal structure, but structure serves no function beyond dissipation. (Bénard cells, laser modes)
  3. Self-maintaining patterns: Structure actively resists perturbation, has something like a viability manifold. (Autopoietic cells, gliders in protected regions)
  4. Self-modeling systems: Structure includes a model of itself, enabling prediction of own behavior. (Organisms with nervous systems, AI agents with world models)
  5. Integrated self-modeling systems: Self-model is densely coupled to world model, creating unified cause-effect structure. (Threshold for phenomenal experience under the identity thesis)

The transition from “reservoir” to “mind” is not a single leap but a continuous accumulation of organizational features. The question is where on this spectrum integration crosses the threshold for genuine experience.

Deep Technical: Computing in Discrete Substrates

The integration measure $\Phi$ (integrated information) can be computed exactly in cellular automata, unlike continuous neural systems where approximations are required.

Setup. Let $\mathbf{x}_t \in \{0,1\}^n$ be the state of $n$ cells at time $t$. The CA dynamics define a transition probability:

$$p(\mathbf{x}_{t+1} \mid \mathbf{x}_t) = \prod_{i} \delta\big(x_i^{t+1}, f_i(\mathbf{x}^N_t)\big)$$

where $f_i$ is the local update rule and $\mathbf{x}^N$ is the neighborhood.

Algorithm 1: Exact $\Phi$ via partition enumeration.

For a pattern $\mathcal{B}$ of $k$ cells, enumerate all bipartitions $P = (A, B)$ where $A \cup B = \mathcal{B}$, $A \cap B = \varnothing$:

$$\Phi(\mathcal{B}) = \min_{P} D_{\text{KL}}\Big[ p(\mathbf{x}^{\mathcal{B}}_{t+1} \mid \mathbf{x}^{\mathcal{B}}_t) \,\Big\|\, p(\mathbf{x}^A_{t+1} \mid \mathbf{x}^A_t) \cdot p(\mathbf{x}^B_{t+1} \mid \mathbf{x}^B_t) \Big]$$

Complexity: $O(2^k)$ partitions, $O(2^{2k})$ states per partition. Total: $O(2^{3k})$. Feasible for $k \leq 15$.
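One possible reading of Algorithm 1 for a small, closed, deterministic system is sketched below. Two choices are assumptions made here and not specified in the text: current states are weighted uniformly, and each part's marginal transition is obtained by averaging uniformly over the cells outside the part. Because the whole system is deterministic, the KL term then reduces to the expected surprisal of the true next part-state under those marginals. The function name `phi_exact` and the two toy update rules are hypothetical.

```python
from itertools import combinations, product
from math import log2

def phi_exact(update, k):
    """Minimum over bipartitions of the KL cost of cutting a k-cell system."""
    if k < 2:
        raise ValueError("need at least two cells to form a bipartition")
    states = list(product((0, 1), repeat=k))
    nxt = {s: tuple(update(s)) for s in states}

    def part_cost(part):
        # Count (current part-state -> next part-state) occurrences while the
        # cells outside the part range uniformly over all configurations.
        counts = {}
        for s in states:
            key = (tuple(s[i] for i in part), tuple(nxt[s][i] for i in part))
            counts[key] = counts.get(key, 0) + 1
        contexts = 2 ** (k - len(part))  # outside configurations per part-state
        total = 0.0
        for s in states:
            key = (tuple(s[i] for i in part), tuple(nxt[s][i] for i in part))
            total -= log2(counts[key] / contexts)  # surprisal of the true outcome
        return total / len(states)

    best = float("inf")
    for size in range(1, k // 2 + 1):
        for part_a in combinations(range(k), size):
            part_b = tuple(i for i in range(k) if i not in part_a)
            best = min(best, part_cost(part_a) + part_cost(part_b))
    return best

swap = lambda s: (s[1], s[0])   # each cell copies the other: fully coupled
frozen = lambda s: s            # each cell copies itself: fully decomposable

phi_swap = phi_exact(swap, 2)      # 2.0 bits
phi_frozen = phi_exact(frozen, 2)  # 0.0 bits
print(phi_swap, phi_frozen)
```

The two toy rules bracket the range: cutting the swap system destroys all predictive information in both directions, while cutting the frozen system costs nothing.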

Algorithm 2: Greedy approximation for larger patterns.

For patterns with $k > 15$ cells:

  1. Initialize partition $P$ randomly
  2. For each cell $c \in \mathcal{B}$: compute $\Delta\Phi$ if cell moves to opposite partition; if $\Delta\Phi < 0$, move it
  3. Repeat until convergence
  4. Run from multiple random initializations

Complexity: $O(k^2 \cdot 2^{2m})$ where $m = \max(|A|, |B|)$.

Algorithm 3: Boundary-focused computation.

For self-maintaining patterns, integration often concentrates at the boundary. Compute:

$$\Phi_{\partial} = \Phi(\partial\mathcal{B} \cup \text{core})$$

where $\partial\mathcal{B}$ are edge cells and “core” is a sampled subset of interior cells. This captures the critical integration structure while remaining tractable.

Temporal integration. For patterns persisting over $T$ timesteps:

$$\bar{\Phi} = \frac{1}{T} \sum_{t=1}^{T} \Phi(\mathcal{B}_t)$$

Threshold detection. To find when patterns cross integration thresholds:

  1. Track $\Phi_t$ during pattern evolution
  2. Compute $\frac{d\Phi}{dt}$ (finite differences)
  3. Threshold events: $\Phi_t > \theta$ and $\Phi_{t-1} \leq \theta$
  4. Correlate threshold crossings with behavioral transitions
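The bookkeeping in these steps, together with the temporal average defined above, is simple to sketch. The series of integration values below is made up for illustration, and the helper names are hypothetical; computing the values themselves is the expensive part and is not shown here.

```python
def finite_difference(phi_series):
    """Forward differences approximating the rate of change of the series."""
    return [later - earlier for earlier, later in zip(phi_series, phi_series[1:])]

def threshold_events(phi_series, theta):
    """Indices t where the series rises above theta from at or below it."""
    return [
        t
        for t in range(1, len(phi_series))
        if phi_series[t] > theta and phi_series[t - 1] <= theta
    ]

# Illustrative values only, not measurements.
phi_series = [0.0, 0.05, 0.2, 0.3, 0.08, 0.15]

events = threshold_events(phi_series, theta=0.1)   # [2, 5]
slopes = finite_difference(phi_series)
mean_phi = sum(phi_series) / len(phi_series)
print(events, mean_phi)
```

The returned indices are the timesteps one would then line up against logged behavioral transitions, which is the correlation step the list ends on.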

Validation. For known patterns (gliders, oscillators), verify:

  • Stable patterns have stable $\Phi$
  • Collisions produce $\Phi$ discontinuities
  • Dissolution shows $\Phi \to 0$ as pattern fragments

Implementation note: Store transition matrices sparsely. CA dynamics are deterministic, so most entries are zero. Typical memory: $O(k \cdot 2^k)$ rather than $O(2^{2k})$.

The Ladder of Inevitability

[Figure: Pieter Bruegel the Elder, The Tower of Babel, 1563 (an immense spiral tower under construction, dwarfing the surrounding landscape, thousands of tiny workers ascending its ramps). Caption: The inevitability ladder: each rung is a consequence of the one below.]
[Diagram: rungs, bottom to top: Unstable Microdynamics, Metastable Attractors, Emergent Boundaries, Active Regulation, World Model, Self-Model, Metacognitive Dimensionality. Transitions between rungs: bifurcation, selection, maintenance, POMDP structure, ρ > ρ_c, recursion. Domain bands: physics, chemistry, biology, psychology.]

Each step follows from the previous under broad conditions:

  1. Microdynamics $\to$ Attractors: Bifurcation theory for driven nonlinear systems
  2. Attractors $\to$ Boundaries: Dissipative selection for gradient-channeling structures
  3. Boundaries $\to$ Regulation: Maintenance requirement under perturbation
  4. Regulation $\to$ World Model: POMDP sufficiency theorem — : $C_{\text{wm}} = 0.10$–$0.15$, agents' hidden states predict future position and energy substantially above chance
  5. World Model $\to$ Self-Model: Self-effect ratio exceeds threshold ($\rho > \rho_c$) — : $\rho_{\text{sync}} \approx 0.21$ from initialization; self-model salience $> 1.0$ in 2/3 seeds
  6. Self-Model $\to$ Metacognition: Recursive application of modeling to the modeling process itself — nascent in ; robust development likely requires resource-scarcity selection creating bottleneck dynamics ()

What Is Cheap and What Is Architecture-Gated

Consider a substrate-environment prior: a probability measure $\mu$ over tuples $(\mathcal{S}, \mathcal{E}, \mathbf{x}_0)$ representing physical substrates (degrees of freedom, interactions, constraints), environments (gradients, perturbations, resource availability), and initial conditions. Call $\mu$ a broad prior if it assigns non-negligible measure to sustained gradients (nonzero flux for times $\gg$ relaxation times), sufficient dimensionality ($n$ large enough for complex attractors), locality (interactions falling off with distance), and bounded noise (stochasticity not overwhelming deterministic structure).

It is tempting to conclude that under such a prior the whole ladder — up through self-modeling, counterfactual reasoning, and high integration — is typical. The original draft of this argument concluded exactly that. The empirical program found otherwise, and the claim must be split. The lower rungs are genuinely cheap and broadly inevitable. The upper rungs are architecture-gated: they appear only when the relevant architecture is present, and that architecture is not generic. Define the set of substrate-environment tuples whose systems develop, respectively, structured attractors, boundaries, regulation, and a compressed world model — the lower rungs:

$$\mathcal{C}^{\text{low}}_T = \{(\mathcal{S}, \mathcal{E}, \mathbf{x}_0) : \text{system develops world model and affect geometry by time } T\}$$

For these:

$$\lim_{T \to \infty} \mu(\mathcal{C}^{\text{low}}_T) = 1 - \epsilon$$

for some small $\epsilon$ depending on the fraction of substrates that lack even minimal regulatory capacity. The upper rungs (a self-prediction module producing genuine self-model salience, counterfactual machinery, and high integration) carry no such guarantee under the broad prior. Their probability does not tend to unity; it depends on whether specific scaffolding is in place.

Proof. [Proof sketch — lower rungs only] Under the broad prior:
  1. Probability of structured attractors $\to 1$ as gradient strength increases (bifurcation theory)
  2. Given structured attractors, probability of boundary formation $\to 1$ as time increases (combinatorial exploration of configurations)
  3. Given boundaries, probability of effective regulation $\to 1$ for self-maintaining structures (by definition of “self-maintaining”)
  4. Given regulation, a compressed world model is implied (POMDP sufficiency), and with it the affect geometry that the experiments find in every seed

The chain establishes the lower rungs. It does not extend to self-modeling, counterfactual weight, or high integration. The earlier draft asserted that a world model in a self-effecting regime gives self-modeling “positive selection pressure” and treated the rest as following. The empirical program contradicts the strong reading: self-model salience, counterfactual capacity, and integration arose only when the relevant architecture was hand-installed (a self-prediction module, a non-decomposable prediction head). Selection pressure is not the same as architectural sufficiency, and under the broad prior the requisite architectures are not measure-one.

So inevitability splits in two. For the lower rungs, typicality in the ensemble is real: the null hypothesis is not "nothing interesting happens" but "something finds a basin and stays there," and world-modeling, compression, the participatory default (high-$\alpha$ ascription toward the world), and affect geometry are among the basins any sufficiently driven system reaches. For the upper rungs, the honest statement is weaker. The slogan “consciousness was inevitable” must be downgraded to: consciousness-relevant dynamics require specific architectures that are not generic. The empirics make the split visible. In protocell agent experiments (), the lower rungs are robust (world models and the participatory default appear broadly, affect geometry in every seed), but the upper rungs did not arise on their own. Self-model salience required a self-prediction module; high integration ($\Phi > 0.10$) appeared in only ~30% of seeds and only after a non-decomposable prediction head was installed and the system was forged through repeated stress-recovery. Substrate complexity alone, with decomposable architecture, never crossed the threshold. The ensemble fraction for the cheap rungs is near unity; the fraction for the architecture-gated rungs is conditional on scaffolding the broad prior does not supply for free.