POMDP Formalization
The situation of a bounded system under uncertainty admits precise formalization as a Partially Observable Markov Decision Process (POMDP).
The POMDP framework connects this analysis to several established research programs:
- Active Inference (Friston et al., 2017): Organisms as inference machines that minimize expected free energy through action. The belief-state sufficiency result below is their “Bayesian brain” hypothesis formalized.
- Predictive Processing (Clark, 2013; Hohwy, 2013): The brain as a prediction engine, with perception as hypothesis-testing. The world model is their “generative model.”
- Good Regulator Theorem (Conant & Ashby, 1970): Every good regulator of a system must be a model of that system. The belief-state sufficiency result below is a POMDP-specific instantiation of their theorem.
- Embodied Cognition (Varela, Thompson & Rosch, 1991): Cognition as enacted through sensorimotor coupling. My emphasis on the boundary as the locus of modeling aligns with enactivist insights.
Formally, a POMDP is a tuple $(S, A, O, T, Z, R, \gamma)$ where (a concrete instance is sketched after this list):
- $S$: State space (true world state, including system interior)
- $A$: Action space
- $O$: Observation space
- $T$: Transition kernel, $T(s' \mid s, a) = P(s_{t+1} = s' \mid s_t = s, a_t = a)$
- $Z$: Observation kernel, $Z(o \mid s', a) = P(o_{t+1} = o \mid s_{t+1} = s', a_t = a)$
- $R$: Reward function, $R(s, a)$
- $\gamma \in [0, 1)$: Discount factor
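To make the tuple concrete, here is a minimal sketch using the classic two-state “tiger” problem (Kaelbling, Littman & Cassandra, 1998) as an illustrative instance; the problem choice and all variable names are mine, not drawn from the text above:

```python
import numpy as np

# Illustrative instance: the two-state "tiger" problem.
# States: {tiger-left, tiger-right}; actions: {listen, open-left, open-right};
# observations: {hear-left, hear-right}.
n_states, n_actions, n_obs = 2, 3, 2

# T[a, s, s']: transition kernel. Listening leaves the state unchanged;
# opening either door resets the tiger's position uniformly at random.
T = np.empty((n_actions, n_states, n_states))
T[0] = np.eye(n_states)                           # listen
T[1] = T[2] = np.full((n_states, n_states), 0.5)  # open-left, open-right

# Z[a, s', o]: observation kernel. Listening is 85% accurate;
# observations after opening a door carry no information.
Z = np.empty((n_actions, n_states, n_obs))
Z[0] = np.array([[0.85, 0.15],   # tiger-left  -> mostly hear-left
                 [0.15, 0.85]])  # tiger-right -> mostly hear-right
Z[1] = Z[2] = np.full((n_states, n_obs), 0.5)

# R[s, a]: reward function. Listening costs 1; opening the tiger's door
# costs 100; opening the safe door pays 10.
R = np.array([[-1.0, -100.0,   10.0],    # tiger-left
              [-1.0,   10.0, -100.0]])   # tiger-right

gamma = 0.95  # discount factor
```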
The agent does not observe $s_t$ directly but only $o_t$. The sufficient statistic for decision-making is the belief state, the posterior distribution over world states given the history:

$$b_t(s) = P(s_t = s \mid o_{1:t}, a_{1:t-1})$$
The belief state updates via Bayes’ rule:

$$b_{t+1}(s') \propto Z(o_{t+1} \mid s', a_t) \sum_{s \in S} T(s' \mid s, a_t)\, b_t(s),$$

where the proportionality constant normalizes $b_{t+1}$ to sum to one.
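As an illustration, this update is a few lines over the `T` and `Z` arrays from the tiger sketch above; `belief_update` is a hypothetical helper, not a function from any of the cited works:

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """One Bayes-filter step: b'(s') ∝ Z(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    predicted = T[a].T @ b                     # predict: sum_s T(s'|s,a) b(s)
    unnormalized = Z[a][:, o] * predicted      # correct: weight by P(o | s', a)
    return unnormalized / unnormalized.sum()   # normalize (assumes P(o | b, a) > 0)

b = np.array([0.5, 0.5])                  # uniform prior over the tiger's position
b = belief_update(b, a=0, o=0, T=T, Z=Z)  # listen, hear-left -> approx (0.85, 0.15)
b = belief_update(b, a=0, o=0, T=T, Z=Z)  # listen, hear-left -> approx (0.97, 0.03)
```

Note that the update needs only the current belief and the latest action-observation pair: the belief is a recursive summary of the entire history, which is exactly what the sufficiency result below requires.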
A classical result establishes that $b_t$ is a sufficient statistic for optimal decision-making: any optimal policy can be written as $a_t = \pi^*(b_t)$, mapping belief states to actions.
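Computing $\pi^*$ exactly is expensive (for finite horizons the optimal value function is piecewise-linear and convex in $b$; Smallwood & Sondik, 1973), but even a hand-written threshold rule shows what a policy over beliefs looks like; the 0.95 cutoff here is an arbitrary illustrative choice, not the optimum:

```python
def threshold_policy(b, confidence=0.95):
    """Illustrative pi(b) for the tiger problem: act only when confident."""
    if b[0] >= confidence:
        return 2   # believe the tiger is left, so open the right door
    if b[1] >= confidence:
        return 1   # believe the tiger is right, so open the left door
    return 0       # otherwise keep listening to gather information
```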
It follows that any system performing better than random under partial observability is implicitly maintaining and updating a belief state, i.e., a model of the world.
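To see the claim in miniature, the sketch below pits the threshold policy against a uniformly random one on the tiger instance; the episode count and horizon are arbitrary choices, and exact scores will vary, but the qualitative gap is the point:

```python
rng = np.random.default_rng(0)

def run_episode(policy, horizon=50):
    """Total reward of one episode under a belief-to-action policy pi(b)."""
    s = rng.integers(n_states)        # hidden tiger position
    b = np.array([0.5, 0.5])          # agent's belief, updated each step
    total = 0.0
    for _ in range(horizon):
        a = policy(b)
        total += R[s, a]
        if a != 0:                    # opening a door ends the episode
            break
        s = rng.choice(n_states, p=T[a, s])  # listening leaves s unchanged
        o = rng.choice(n_obs, p=Z[a, s])
        b = belief_update(b, a, o, T, Z)
    return total

n_episodes = 2000
belief_score = np.mean([run_episode(threshold_policy) for _ in range(n_episodes)])
random_score = np.mean([run_episode(lambda b: rng.integers(n_actions))
                        for _ in range(n_episodes)])
# Expect belief_score to be clearly positive and random_score strongly
# negative: tracking a belief state is what converts observations into
# better-than-chance behavior.
```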