Experiments

The Decomposability Wall

V22–V24 showed that prediction targets make no difference when the readout head is linear. V27 showed that a 2-layer MLP head breaks through. V28 confirmed the mechanism: gradient coupling through composition ($W_2^\top W_1^\top$), not nonlinearity or bottleneck width. This is the decomposability wall — the second architectural barrier (after the sensory-motor wall).
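A minimal sketch of the coupling, assuming PyTorch (the dimensions, weights, and function names below are illustrative, not taken from the V22/V27 code): with a linear head the gradient on the hidden state is a shared scalar error times a fixed per-unit weight, so its direction never changes across states; with a 2-layer MLP head the gradient passes through both weight matrices and the tanh derivative, so its direction depends on the whole hidden state.

```python
import torch

torch.manual_seed(0)
n, k = 4, 2                      # hidden units, MLP mid-layer width
w  = torch.randn(n)              # linear readout weights
W1 = torch.randn(k, n)           # MLP head, first layer
W2 = torch.randn(1, k)           # MLP head, second layer
y  = torch.tensor(1.0)           # prediction target

def grad_directions(loss_fn, trials=64):
    """Normalized dL/dh over random hidden states."""
    dirs = []
    for _ in range(trials):
        h = torch.randn(n, requires_grad=True)
        loss_fn(h).backward()
        dirs.append(h.grad / h.grad.norm())
    return torch.stack(dirs)

linear = grad_directions(lambda h: (w @ h - y) ** 2)
mlp = grad_directions(lambda h: ((W2 @ torch.tanh(W1 @ h)).squeeze() - y) ** 2)

# Linear head: every gradient is +/-w up to scale, so the stacked directions
# have rank 1 -- each unit's update decomposes into (shared scalar) x (own weight).
# MLP head: gradients live in the row space of W1, rank min(k, n) > 1 -- coupled.
print("linear rank:", torch.linalg.matrix_rank(linear, atol=1e-5).item())
print("mlp rank:   ", torch.linalg.matrix_rank(mlp, atol=1e-5).item())
```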

Two walls, two breaks:

  1. $\rho$ wall (V13–V18, V20): Action-observation loop. Broken by genuine agency.
  2. Decomposability wall (V22–V24, V27): 2-layer gradient coupling. Broken by a non-linear prediction head.

Both walls are architectural. Neither can be overcome by more training data, better targets, or richer environments. The path to high integration requires specific computational structures.

[Figure: gradient flow through a linear head (V22) vs. an MLP head (V27). Linear: $\partial L/\partial h_i = 2(\hat y - y)\,w_i$; each hidden unit receives an independent gradient (decomposable), $\Phi \approx 0.097$. MLP: the gradient is routed through $W_2^\top$, $\operatorname{diag}(1 - \tanh^2)$, and $W_1$; every unit is coupled (non-decomposable), $\Phi \approx 0.245$, a 2.5× increase.]
The Decomposability Wall — why composition matters. Left: the linear head sends an independent gradient to each hidden unit (decomposable, $\Phi \approx 0.08$). Right: the MLP head couples all hidden units through a shared intermediate layer (integrated, $\Phi \approx 0.25$). The key is gradient coupling through composition — not nonlinearity, not bottleneck width.
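For reference, here is a compact derivation of the two gradient expressions in the caption, written with column-vector conventions (the figure's schematic lists the same factors in backprop order). For the linear head, $\hat y = w^\top h$ and $L = (\hat y - y)^2$, so

$$\frac{\partial L}{\partial h_i} = 2(\hat y - y)\, w_i,$$

a shared scalar error times unit $i$'s private weight, with no dependence on any other unit's weight. For the MLP head, $m = \tanh(W_1 h)$ and $\hat y = W_2 m$, so

$$\nabla_h L = 2(\hat y - y)\, W_1^\top \operatorname{diag}(1 - m \odot m)\, W_2^\top,$$

and every entry of the gradient mixes all hidden units through both weight matrices: exactly the coupling the wall names.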
[Figure: Proto-self signatures across V22–V24, six-panel metric grid; caption below.]
Proto-self signatures across V22–V24. Six metrics tracked over evolution for all 9 runs (3 seeds × 3 experiments). Top-left: effective rank drops at drought boundaries but recovers — states are moderately rich (4–14 dimensions). Top-center: affect motif clustering (silhouette) is mostly negative to near-zero — no behavioral modes emerge with linear readouts. Top-right: energy decoding R² is very low (0–0.2 at best) — hidden states do NOT cleanly encode energy, despite the gradient specifically targeting energy prediction. Bottom row: resource decoding, hidden state diversity, and activity variation are all noisy, with no clear trends. These are the signatures of a proto-self failing to emerge under linear readout architectures. Compare with V27 (MLP head), where silhouette reaches 0.34 and behavioral modes appear for the first time.
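A minimal sketch of how three of these metrics can be computed from a matrix of hidden states, assuming NumPy and scikit-learn; the array names, the placeholder data, and the participation-ratio definition of effective rank are assumptions, not details of the V22–V24 pipeline:

```python
# Sketch of three proto-self metrics over hidden states H (T steps x n units).
# ASSUMED: scikit-learn is available; "effective rank" means the participation
# ratio of covariance eigenvalues; all data below is a random stand-in.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge
from sklearn.metrics import silhouette_score
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
H = rng.normal(size=(500, 16))     # hidden states (stand-in for a real run)
energy = rng.uniform(size=500)     # agent energy per step (stand-in)

# Effective rank: participation ratio, (sum of eigenvalues)^2 / sum of squares.
eig = np.linalg.eigvalsh(np.cov(H.T))
eff_rank = eig.sum() ** 2 / (eig ** 2).sum()

# Affect motif clustering: silhouette of k-means clusters over hidden states;
# negative-to-zero silhouette means no distinct behavioral modes.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(H)
sil = silhouette_score(H, labels)

# Energy decoding: cross-validated R^2 of a linear readout H -> energy;
# values near zero mean hidden states do not cleanly encode energy.
r2 = cross_val_score(Ridge(alpha=1.0), H, energy, cv=5, scoring="r2").mean()

print(f"eff. rank {eff_rank:.1f} | silhouette {sil:.2f} | energy R^2 {r2:.2f}")
```

On the random stand-in data this prints a near-zero silhouette and a near-zero (or slightly negative) R², the same qualitative signature the caption describes for the linear-readout runs.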