PokéDreamer - Model-Based RL in Pokémon Red

Overview

What is PokéDreamer?

PokéDreamer is a research project that teaches an agent to build an internal world model of Pokémon Red - then reason, plan, and act inside that imagined world rather than the real emulator.

🧠

World Model

A neural network that learns the transition dynamics of the game: given the current pixel observation and an action, predict the next frame - entirely in a compressed latent space.

🎮

Imagination-Based Planning

Instead of repeatedly stepping the emulator to explore, the agent "dreams" future trajectories inside the world model at high speed. The RSSM prior enables pure latent rollouts.

📡

Dreamer Lineage

Architecturally inspired by Hafner et al.'s DreamerV2/V3 - using a Recurrent State-Space Model (RSSM) with discrete categorical latents and Gumbel-Softmax straight-through gradients.

🔬

Research-Grade

Built from scratch to understand every component - from the VAE bottleneck to KL-balancing in the RSSM. Not a wrapper around existing frameworks.

Evolution

The Research Journey

From symbolic state probes to pixel-level imagination - three generations of world models.

v1

Completed ✓

VAE + GRU MPC Planner

The first world model - a Variational Autoencoder compressing frames to 32-dim latents, with a GRU dynamics model predicting the next latent. An MPC planner imagines action sequences and picks the best path using linear probes for map position.

40×36 Resolution

R³² Latent Space

98.7% Map Accuracy

1.23 tiles Coord Error

Full v1 Documentation →
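The v1 planning loop described above can be illustrated with a random-shooting MPC sketch. Everything in it is a stand-in chosen for illustration, not the project's code: `toy_dynamics` replaces the GRU dynamics model, `toy_probe` replaces the linear map-position probe, and the four-action set, horizon, and goal-distance cost are assumptions.

```python
import numpy as np

N_ACTIONS = 4  # hypothetical action set: up, down, left, right
MOVES = np.array([[0, -1], [0, 1], [-1, 0], [1, 0]], dtype=float)

def toy_dynamics(z, action):
    """Stand-in for the learned dynamics model: next latent from (latent, action)."""
    return z + MOVES[action]

def toy_probe(z):
    """Stand-in for the linear probe that reads (x, y) tile coordinates from a latent."""
    return z[:2]

def plan(z0, goal_xy, horizon=6, n_candidates=256, rng=None):
    """Random-shooting MPC: imagine candidate action sequences, return the best first action."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = rng.integers(0, N_ACTIONS, size=(n_candidates, horizon))
    best_cost, best_seq = np.inf, None
    for seq in candidates:
        z = z0.copy()
        for a in seq:  # roll forward in latent space only, no emulator steps
            z = toy_dynamics(z, a)
        cost = np.linalg.norm(toy_probe(z) - goal_xy)  # distance to the goal after the rollout
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return int(best_seq[0]), float(best_cost)

first_action, cost = plan(np.zeros(2), goal_xy=np.array([2.0, 0.0]),
                          rng=np.random.default_rng(0))
```

In a receding-horizon setup only `first_action` would be executed before replanning from the next observation.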

v2

Completed ✓

Discrete RSSM World Model

The major leap - native 160×144 resolution, a 4-layer Residual CNN encoder, and a full Recurrent State-Space Model with 32×32 discrete categorical latents trained via Gumbel-Softmax. The RSSM's prior enables pure imagination rollouts without any emulator steps.

160×144 Resolution

1024-dim Discrete Latent

0.1003 Best Val Recon

4 epochs Trained

Full v2 Documentation →
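A prior-only imagination rollout of the kind described for v2 might look like the NumPy sketch below. It is a shape-level illustration under stated assumptions: the weights are random rather than trained, a single tanh layer stands in for the GRU, `N_ACTIONS = 8` is a placeholder, and sampling uses the Gumbel-max trick without the straight-through gradient that training would need.

```python
import numpy as np

H, GROUPS, CLASSES, N_ACTIONS = 512, 32, 32, 8  # N_ACTIONS is a placeholder

rng = np.random.default_rng(0)
# Random weights stand in for trained parameters.
W_h = rng.normal(0, 0.02, size=(H, H + GROUPS * CLASSES + N_ACTIONS))
W_prior = rng.normal(0, 0.02, size=(GROUPS * CLASSES, H))

def sample_one_hot(logits, rng):
    """Sample one class per group via the Gumbel-max trick; returns (GROUPS, CLASSES) one-hots."""
    gumbel = -np.log(-np.log(rng.uniform(1e-9, 1.0, size=logits.shape)))
    idx = np.argmax(logits + gumbel, axis=-1)
    return np.eye(CLASSES)[idx]

def imagine(h, s, actions, rng):
    """Roll forward using only the prior p(s_t | h_t); no observations are consumed."""
    states = []
    for a in actions:
        x = np.concatenate([h, s.ravel(), np.eye(N_ACTIONS)[a]])
        h = np.tanh(W_h @ x)  # simplified recurrent update (a GRU in the described model)
        prior_logits = (W_prior @ h).reshape(GROUPS, CLASSES)
        s = sample_one_hot(prior_logits, rng)  # stochastic state drawn from the prior
        states.append(np.concatenate([h, s.ravel()]))
    return np.stack(states)

h0 = np.zeros(H)
s0 = np.eye(CLASSES)[np.zeros(GROUPS, dtype=int)]
traj = imagine(h0, s0, actions=[0, 1, 2, 3, 0], rng=rng)
print(traj.shape)  # (5, 1536): 512 deterministic + 1024 stochastic features per step
```

Each row of `traj` is the feature vector a decoder, reward head, or policy would consume at that imagined step.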

v3

In Planning 🚀

Dual-Agent SOTA System

The vision: System 1 (a PPO/GRPO actor trained inside RSSM imagination) directed by System 2 (a multimodal LLM strategic planner). The first version where "policy trained inside imagination" becomes the explicit target - approaching DIAMOND-style pixel-fidelity world modeling.

≥2 Target Badges

<5 tiles Imagination Drift

<200MB Target Model Size

LLM+RL Dual Agent

Full v3 Roadmap →

Results

Key Experimental Findings

Quantitative results from training and evaluation across both completed versions.

📈 v1: Scheduled Sampling vs. Teacher Forcing - Rollout Drift

Measured over 14,564 validation trajectories of 29 steps each. Scheduled sampling (SS) is critical: the SS model's tile error stays roughly flat, between 3.3 and 3.7 tiles, out to 29 imagined steps, while the pure teacher-forcing (TF) ablation climbs to 10.44 tiles - a 3× degradation. The Improvement column is TF tile error divided by SS tile error.

Rollout Step	SS Latent MSE	TF Latent MSE	SS Tile Error	TF Tile Error	Improvement
Step 1	0.09268	0.12240	3.72 tiles	4.06 tiles	1.09×
Step 5	0.08751	0.23061	3.32 tiles	5.08 tiles	1.53×
Step 10	0.08842	0.40189	3.30 tiles	6.47 tiles	1.96×
Step 15	0.09238	0.59876	3.33 tiles	7.81 tiles	2.35×
Step 20	0.09830	0.78455	3.36 tiles	9.13 tiles	2.72×
Step 25	0.10606	0.92376	3.42 tiles	9.72 tiles	2.84×
Step 29	0.11197	1.04063	3.47 tiles	10.44 tiles	3.01×
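The difference between the two training regimes in the table can be sketched in a few lines. This is a minimal illustration, not the v1 training loop: `predict` is a toy one-step model, the latents are scalars, and `p_own` (the probability of feeding the model its own prediction) is a hypothetical parameter name.

```python
import random

def rollout_inputs(true_latents, predict, p_own, rng):
    """Return the inputs fed to the dynamics model at each step of a training rollout.

    With probability p_own the model receives its own previous prediction
    (scheduled sampling); otherwise it receives the ground-truth latent
    (teacher forcing). p_own = 0 recovers pure teacher forcing.
    """
    inputs, prev_pred = [], None
    for z_true in true_latents:
        use_own = prev_pred is not None and rng.random() < p_own
        z_in = prev_pred if use_own else z_true
        inputs.append(z_in)
        prev_pred = predict(z_in)  # one-step prediction of the next latent
    return inputs

true_latents = [0.0, 1.0, 2.0, 3.0]
predict = lambda z: z + 1.1  # toy model with a small per-step bias

teacher_forced = rollout_inputs(true_latents, predict, p_own=0.0, rng=random.Random(0))
free_running = rollout_inputs(true_latents, predict, p_own=1.0, rng=random.Random(0))
```

With `p_own=1.0` the toy model's bias accumulates across steps, which is the situation an imagined rollout faces at test time. A schedule that raises `p_own` during training is the usual pattern; the schedule used in v1 is not stated here.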

🌟 v2: RSSM World Model Training Progression

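Trained for 4 epochs on 20 NPZ files (~16,000 native-resolution transitions) with batch size 64 and sequence length 15. Training reconstruction loss falls every epoch. Validation reconstruction ticks up at epoch 3 and reaches its best value at epoch 4 (val recon = 0.1003), even though total validation loss rises there because the KL term jumps to 0.0648. The epoch-4 checkpoint demonstrates pixel-level world modeling at native Game Boy resolution.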

Epoch	Train Loss	Train Recon	Train KL	Val Loss	Val Recon	Val KL
1	0.1476	0.1379	0.0078	0.1266	0.1256	0.0010
2	0.1207	0.1144	0.0063	0.1172	0.1110	0.0062
3	0.1490	0.1068	0.0422	0.1228	0.1142	0.0086
4	0.1021	0.1015	0.0005	0.1651	0.1003 ★	0.0648
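The totals in the table appear to be the sum of the reconstruction and KL columns. That is an inference from the numbers, not something the training script is quoted as doing, and the short check below tests it row by row with a tolerance for four-decimal rounding.

```python
# (epoch, split, total, recon, kl) copied from the table above
rows = [
    (1, "train", 0.1476, 0.1379, 0.0078), (1, "val", 0.1266, 0.1256, 0.0010),
    (2, "train", 0.1207, 0.1144, 0.0063), (2, "val", 0.1172, 0.1110, 0.0062),
    (3, "train", 0.1490, 0.1068, 0.0422), (3, "val", 0.1228, 0.1142, 0.0086),
    (4, "train", 0.1021, 0.1015, 0.0005), (4, "val", 0.1651, 0.1003, 0.0648),
]

TOLERANCE = 2e-4  # allows for rounding of each column to four decimals

# Rows where total differs from recon + kl by more than the tolerance.
mismatches = [(epoch, split) for epoch, split, total, recon, kl in rows
              if abs(total - (recon + kl)) > TOLERANCE]

# Share of the epoch-4 validation loss that comes from the KL term.
kl_share_epoch4_val = 0.0648 / 0.1651

print(mismatches)
```

Under this reading, seven of the eight rows add up, the epoch-1 training row does not, and roughly 39% of the epoch-4 validation loss comes from KL rather than reconstruction.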

Architecture

v1 vs v2 Architecture Comparison

How the world model evolved across versions - from continuous latents and scheduled sampling to a discrete RSSM with KL balancing.

Component	v1 (VAE + GRU)	v2 (Discrete RSSM)
Resolution	40×36 pixels (PWhiddy downsampled)	160×144 pixels (native Game Boy)
Encoder	Variational Autoencoder (VAE)	4-layer Residual CNN → 512-dim embed
Latent Space	Continuous R³² (reparameterization)	Discrete 32×32 categorical (1024-dim)
Dynamics	Autoregressive GRU (scheduled sampling)	RSSM: GRU h_t (512) + Stochastic s_t
Gradient Estimator	Reparameterization trick	Gumbel-Softmax straight-through
KL Balancing	Standard ELBO	80% prior / 20% posterior balancing
Imagination	Prior-only rollout	Full posterior + prior with KL balancing
Decoders	Pixel decoder only	Pixel + Reward predictor + Continue predictor
Controller	Lookahead MPC planner (coordinate probe)	Actor-Critic trained in imagination
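The KL Balancing row can be read as the DreamerV2-style split, in which one KL term trains the prior and the other regularizes the posterior. The PyTorch sketch below assumes that reading: `alpha = 0.8` is taken to weight the prior-training term, and the function name and tensor shapes are illustrative rather than taken from the project's models.py.

```python
import torch
import torch.nn.functional as F

def balanced_kl(post_logits, prior_logits, alpha=0.8):
    """KL balancing over categorical latents shaped (batch, groups, classes)."""
    def kl(p_logits, q_logits):
        p_log = F.log_softmax(p_logits, dim=-1)
        q_log = F.log_softmax(q_logits, dim=-1)
        # Sum over classes and groups, then average over the batch.
        return (p_log.exp() * (p_log - q_log)).sum(-1).sum(-1).mean()

    prior_term = kl(post_logits.detach(), prior_logits)  # gradients reach the prior only
    post_term = kl(post_logits, prior_logits.detach())   # gradients reach the posterior only
    return alpha * prior_term + (1 - alpha) * post_term

torch.manual_seed(0)
post = torch.randn(4, 32, 32, requires_grad=True)
prior = torch.randn(4, 32, 32, requires_grad=True)
loss = balanced_kl(post, prior)
loss.backward()
```

Both terms have the same value in the forward pass. The weighting only changes how strongly each side is pulled toward the other during backpropagation.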

Get Started

Quick Start Guide

Get PokéDreamer running in minutes. You'll need a legally obtained Pokémon Red ROM.

01

Clone & Install

bash

git clone https://github.com/xoTEMPESTox/PokeDreamer.git
cd PokeDreamer
conda env create -f environment.yml
conda activate pokemon-rl

02

Place ROM & Collect Data

Copy your legally obtained ROM file, "Pokemon - Red Version (USA, Europe).gb", to the project root. Alternatively, skip collection and download the pre-collected dataset from Hugging Face.

bash

# Collect data (optional - dataset available on HF)
python scripts/collect_data.py --episodes 20 --out-dir data
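After collection, it can help to check what the NPZ files contain before training. The sketch below lists each array's shape and dtype without assuming any key names. The `data/*.npz` location follows the `--out-dir data` flag above, but the file naming is an assumption.

```python
from pathlib import Path

import numpy as np

def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for one NPZ file without assuming key names."""
    with np.load(path) as data:
        return {name: (data[name].shape, str(data[name].dtype)) for name in data.files}

# Print a one-line summary per collected file; prints nothing if the folder is empty.
for path in sorted(Path("data").glob("*.npz")):
    print(path.name, summarize_npz(path))
```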

03

Train the RSSM World Model

bash

python scripts/train_rssm.py \
    --data-dir data \
    --epochs 12 \
    --batch-size 64 \
    --out-dir checkpoints/rssm_v2

04

Generate Imagination Demo

Renders a side-by-side video comparing the real emulator vs. the RSSM's imagined frames.

bash

python scripts/generate_demo_video_v2.py \
    --checkpoint checkpoints/rssm_v2/best_world_model.pt \
    --save-state saves/intro_done.state \
    --out-video checkpoints/rssm_v2/side_by_side_demo_v2.mp4

Resources

Models, Datasets & References

🤖

Model Checkpoints

RSSM v2 best checkpoint (best_world_model.pt), v1 VAE and dynamics checkpoints - all on Hugging Face.

HuggingFace Models →

📦

Transition Dataset

20 NPZ files of native-resolution (160×144) gameplay transitions with full RAM state annotations. ~340MB total.

HuggingFace Dataset →

💻

Source Code

Full Python source - models, dataset loader, game state extractor, training scripts, and demo video generator.

GitHub Repository →

📚

Code Reference

Detailed documentation of every module: models.py, dataset.py, game_state.py, ram_addresses.py, and all scripts.

API Docs →

📖 References & Credits

DreamerV2/V3: Hafner et al. (2021, 2023) - Mastering Atari with Discrete World Models (DreamerV2); Mastering Diverse Domains through World Models (DreamerV3). Primary RSSM architecture inspiration.
PWhiddy PPO: PokemonRedExperiments - Used as the data collection policy for gathering gameplay transitions.
PyBoy: Baekalfen/PyBoy - Game Boy emulator used for environment interaction and screen capture.
DIAMOND: DIffusion As a Model Of eNvironment Dreams. Long-term architectural reference point for v3.