Scaling Latent Reasoning via Looped Language Models

Large language models are traditionally scaled by increasing parameters, data, and compute. This paper introduces a further scaling dimension: iterative computation depth through parameter reuse. The authors present Ouro, a family of Looped Language Models (LoopLMs) that achieve 2-3× parameter efficiency by implementing recurrent computation with shared weights.

The Core Innovation

LoopLMs apply the same transformer layers multiple times in sequence, creating deeper computation without additional parameters. Unlike chain-of-thought reasoning that extends output sequences, LoopLMs deepen internal processing while maintaining fixed context length.

The architecture includes:

  • Shared transformer blocks applied T times recurrently
  • Adaptive exit gates that learn when to stop iterating
  • Entropy-regularized training to prevent collapse to single depths
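The looped forward pass can be sketched in a few lines. This is an illustrative toy, not Ouro's implementation: the shared block is stubbed as a single residual matmul, and `exit_gate`, `looped_forward`, and the 0.5 threshold are hypothetical names and choices; the point is that the same weights are applied up to T times and a learned gate can stop early.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_block(h, W):
    """One shared transformer block, stubbed as a residual tanh matmul."""
    return h + np.tanh(h @ W)

def exit_gate(h, w_gate):
    """Scalar exit probability: sigmoid of a linear readout of the mean state."""
    logit = float(np.mean(h @ w_gate))
    return 1.0 / (1.0 + np.exp(-logit))

def looped_forward(h, W, w_gate, max_steps=4, threshold=0.5):
    """Apply the SAME block up to max_steps times; stop once the gate fires."""
    for t in range(1, max_steps + 1):
        h = shared_block(h, W)  # same weights W at every iteration
        if t < max_steps and exit_gate(h, w_gate) > threshold:
            return h, t         # adaptive early exit
    return h, max_steps

d = 8
h0 = rng.normal(size=(3, d))               # 3 tokens, hidden size 8
W = rng.normal(scale=0.1, size=(d, d))
w_gate = rng.normal(scale=0.1, size=d)
h_out, steps = looped_forward(h0, W, w_gate)
print(steps)                                # recurrent steps actually used
```

Depth is thus a runtime knob: the block count in memory is fixed, while effective depth varies per input.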

Key Results

The 1.4B and 2.6B Ouro models match the performance of 4B and 8B standard transformers across reasoning benchmarks:

  • MMLU-Pro: Ouro-2.6B achieves 55.73 vs 53.72 for Qwen3-8B
  • BBH: Ouro-2.6B reaches 80.46 vs 77.65 for Qwen3-8B
  • MATH500: Ouro-1.4B scores 82.40 vs 59.60 for Qwen3-4B

Performance scales predictably with recurrent depth, peaking around the trained maximum of 4 steps.

Training Methodology

The training uses a two-stage approach:

Stage I optimizes an entropy-regularized objective:

L = Σ_t p(t|x) L^(t) − β H(p(·|x))

where p(t|x) is the learned exit distribution and L^(t) is loss at step t.
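Concretely, the objective is an expected per-step loss under the exit distribution, minus an entropy bonus that keeps p(t|x) from collapsing onto a single depth. A minimal sketch (the function name, logit parameterization, and β value are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def stage1_loss(step_losses, exit_logits, beta=0.01):
    """Entropy-regularized Stage I objective (sketch):
    sum_t p(t|x) L^(t) - beta * H(p(.|x))."""
    z = np.exp(exit_logits - exit_logits.max())
    p = z / z.sum()                                  # p(t|x): softmax over steps
    expected_loss = float(np.dot(p, step_losses))    # sum_t p(t|x) L^(t)
    entropy = float(-np.sum(p * np.log(p + 1e-12)))  # H(p(.|x))
    return expected_loss - beta * entropy

losses = np.array([2.0, 1.5, 1.2, 1.1])  # L^(t) for t = 1..4
logits = np.zeros(4)                      # uniform exit distribution
print(round(stage1_loss(losses, logits), 4))  # → 1.4361
```

With uniform p, the expected loss is 1.45 and the entropy term subtracts β·ln 4 ≈ 0.0139; gradients on the logits trade loss reduction against keeping the exit distribution spread out.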

Stage II fine-tunes exit gates using performance improvement signals, teaching the model when additional computation helps.
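One simple way to read "performance improvement signals" is as binary continue/exit labels derived from whether another loop step actually lowers the loss. The formulation below is a hypothetical sketch (the `continue_targets` helper and `min_gain` margin are my own names, not the paper's):

```python
def continue_targets(step_losses, min_gain=0.0):
    """Label step t 'continue' (1) when one more loop step reduces the loss
    by more than min_gain, else 'exit' (0); the gate is fine-tuned toward
    these labels (illustrative sketch, not the paper's exact recipe)."""
    return [1 if step_losses[t + 1] < step_losses[t] - min_gain else 0
            for t in range(len(step_losses) - 1)]

# Loss keeps dropping at step 2, then plateaus and regresses:
print(continue_targets([2.0, 1.5, 1.49, 1.6], min_gain=0.05))  # → [1, 0, 0]
```

The margin matters: a step that improves loss only marginally (1.50 → 1.49 above) is labeled "exit", so the gate learns to spend extra computation only where it pays off.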

Understanding the Mechanism

Controlled experiments reveal that LoopLMs don't increase knowledge storage capacity (both looped and standard models achieve ~2 bits per parameter). Instead, they excel at knowledge manipulation: composing facts and performing multi-step reasoning.

On synthetic tasks requiring knowledge composition:

  • Mano arithmetic: LoopLMs outperform iso-parameter baselines
  • Multi-hop QA: LoopLMs learn with fewer training examples
  • MMLU analysis: Largest gains appear in reasoning-heavy categories (logic, math) rather than knowledge-heavy ones (facts, trivia)

Practical Benefits

Inference Efficiency: KV cache sharing during decoding reduces memory by 4× with minimal performance loss.
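The 4× figure follows from simple accounting: without sharing, each of the four loop iterations would store its own K/V entries; with sharing, one copy serves all iterations. A rough back-of-envelope (the function and the 24-layer/16-head/128-dim configuration are illustrative assumptions, not Ouro's actual layout):

```python
def kv_cache_bytes(layers, loops, seq_len, heads, head_dim,
                   bytes_per_elem=2, shared=True):
    """Rough KV-cache size for a looped model. Without sharing, every
    loop iteration keeps its own K/V; with sharing, one copy serves all."""
    per_layer = 2 * seq_len * heads * head_dim * bytes_per_elem  # K and V
    effective_layers = layers if shared else layers * loops
    return per_layer * effective_layers

no_share = kv_cache_bytes(24, 4, 4096, 16, 128, shared=False)
share = kv_cache_bytes(24, 4, 4096, 16, 128, shared=True)
print(no_share // share)  # → 4
```

The reduction factor equals the loop count regardless of the other dimensions, which is why a trained maximum of 4 steps yields the reported 4× memory saving.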

Safety: Model safety improves with additional recurrent steps, even when extrapolating beyond training depth.

Faithfulness: Unlike chain-of-thought, LoopLM's latent reasoning shows genuine decision revision across steps rather than post-hoc rationalization.

Implementation Details

The models use standard transformer architecture with:

  • RoPE positional embeddings
  • SwiGLU activations
  • Sandwich normalization for stability
  • 49,152-token vocabulary

Training spans 7.7T tokens across four stages, progressing from web data to high-quality reasoning datasets.

Implications

This work establishes recurrent depth as a viable third scaling axis beyond parameters and data. The approach offers particular value for deployment scenarios requiring parameter efficiency while maintaining reasoning capability.

The results suggest that architectural innovation through parameter reuse can achieve scaling benefits traditionally requiring larger models, opening new directions for efficient language model development.