AutoTTS: Automated Discovery of Test-Time Scaling Strategies for Large Language Models
Large language models often waste computation during inference because their test-time scaling strategies are designed by hand. AutoTTS replaces this manual process with automated discovery of strategies that improve accuracy while reducing inference cost.
The Manual Design Problem
Test-time scaling (TTS) improves LLM performance by allocating additional computation during inference. Current approaches require researchers to hand-craft reasoning patterns and tune heuristics by intuition, leaving most of the computation-allocation space unexplored and yielding suboptimal strategies.
Manual design forces researchers to guess which combinations of branching, pruning, and stopping will work best, and these guesses often miss better strategies that automated search can find.
AutoTTS Framework
AutoTTS replaces manual heuristic design with environment-driven automated discovery. Instead of designing individual TTS strategies, researchers design environments where optimal strategies emerge automatically.
The framework formulates width-depth TTS as controller synthesis over pre-collected reasoning trajectories. Controllers decide when to:
- Branch reasoning paths
- Continue current paths
- Probe for quality signals
- Prune weak paths
- Stop computation
This approach evaluates strategies cheaply without repeated LLM calls, making discovery tractable.
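The controller loop above can be sketched as a replay over cached trajectories. This is a minimal illustration, not the paper's actual implementation: the `Step`, `threshold_controller`, and `replay` names, the threshold parameters, and the cost accounting are all assumptions. The key property it demonstrates is that a candidate strategy is scored entirely from pre-collected data, with no live LLM calls.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Action(Enum):
    BRANCH = auto()    # fork a new reasoning path
    CONTINUE = auto()  # extend the current path
    PRUNE = auto()     # drop a weak path
    STOP = auto()      # end computation

@dataclass
class Step:
    quality: float  # cached probe signal (e.g. a verifier score) recorded offline
    tokens: int     # token cost of generating this step

def threshold_controller(step, depth, n_paths, params):
    """A simple width-depth controller: branch on promising steps,
    prune weak ones, and stop past a depth budget."""
    if depth >= params["max_depth"]:
        return Action.STOP
    if step.quality < params["prune_below"]:
        return Action.PRUNE
    if step.quality > params["branch_above"] and n_paths < params["max_width"]:
        return Action.BRANCH
    return Action.CONTINUE

def replay(controller, trajectory, params):
    """Score a controller by replaying one pre-collected trajectory.
    Returns (depth_reached, total_token_cost) with no live LLM calls."""
    n_paths, cost = 1, 0
    for depth, step in enumerate(trajectory):
        cost += step.tokens
        action = controller(step, depth, n_paths, params)
        if action is Action.STOP:
            return depth + 1, cost
        if action is Action.PRUNE:
            n_paths = max(1, n_paths - 1)
        elif action is Action.BRANCH:
            n_paths += 1
            cost += step.tokens  # assume a branch re-incurs the step's cost
    return len(trajectory), cost
```

Because `replay` only reads cached quality signals and token counts, thousands of candidate controllers can be evaluated per second.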
Key Technical Innovations
Beta Parameterization: Keeps the search space tractable by constraining each controller parameter to a bounded range.
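One way to realize such a parameterization is to draw each controller threshold from a Beta distribution, which confines every parameter to the open interval (0, 1) by construction. This is a hedged sketch: the `sample_controller_params` function and the specific Beta priors are illustrative assumptions, not the paper's exact scheme.

```python
import random

def sample_controller_params(spec, seed=0):
    """Draw each named threshold from Beta(a, b); every sampled value
    lies strictly inside (0, 1), so the search space stays bounded."""
    rng = random.Random(seed)
    return {name: rng.betavariate(a, b) for name, (a, b) in spec.items()}

# Hypothetical priors: skew the prune threshold low and the branch threshold high.
spec = {
    "prune_below":  (2.0, 5.0),
    "branch_above": (5.0, 2.0),
}
params = sample_controller_params(spec)
assert all(0.0 < v < 1.0 for v in params.values())
```

Shaping the (a, b) pairs lets a search procedure bias sampling toward sensible regions without ever producing an out-of-range parameter.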
Fine-grained Execution Traces: Provides detailed feedback that helps agents diagnose why specific TTS programs fail, improving discovery efficiency.
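A trace of this kind might record every controller decision alongside the signal that triggered it, so a failed candidate can be diagnosed after the fact. The `ExecutionTrace` class and its heuristic `diagnose` rules below are illustrative assumptions, not AutoTTS's actual trace format.

```python
from dataclasses import dataclass, field

@dataclass
class TraceEvent:
    depth: int
    action: str     # "BRANCH" | "CONTINUE" | "PRUNE" | "STOP"
    signal: float   # quality signal observed at this decision

@dataclass
class ExecutionTrace:
    events: list = field(default_factory=list)

    def log(self, depth, action, signal):
        self.events.append(TraceEvent(depth, action, signal))

    def diagnose(self):
        """Heuristic post-mortem over the decision sequence, so a search
        agent can see *why* a candidate controller failed."""
        actions = [e.action for e in self.events]
        if not actions:
            return "empty trace"
        if actions[-1] != "STOP":
            return "exhausted budget without an explicit STOP"
        if actions.count("PRUNE") > len(actions) // 2:
            return "pruned most paths; prune threshold likely too aggressive"
        return "ok"
```

Feeding such diagnoses back to the search agent turns opaque score differences into actionable failure descriptions.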
Environment Construction: Creates discovery environments with tractable control spaces and frequent, cheap feedback for TTS search.
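Inside such an environment, discovery can reduce to a loop that samples candidate controllers and scores each one against the cached data. The sketch below assumes a scalarized accuracy-minus-cost objective; the `discovery_loop` name, the objective, and the penalty weight are all illustrative choices, not the paper's method.

```python
import random

def discovery_loop(evaluate, sample_params, n_candidates=50, lam=0.01, seed=0):
    """Sample candidate controllers, score each one on pre-collected
    trajectories via `evaluate` (which returns an (accuracy, cost) pair),
    and keep the best accuracy-cost tradeoff. No live LLM calls needed."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_candidates):
        params = sample_params(rng)
        accuracy, cost = evaluate(params)
        score = accuracy - lam * cost  # simple scalarized objective
        if score > best_score:
            best, best_score = params, score
    return best, best_score
```

Because each `evaluate` call is a cheap replay rather than an LLM invocation, feedback arrives frequently enough to search hundreds of candidates per run.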
Implementation Results
AutoTTS discovered strategies that outperform manually designed baselines on mathematical reasoning benchmarks. The automated approach:
- Improves accuracy-cost tradeoffs over strong manual baselines
- Generalizes to held-out benchmarks and different model scales
- Completes discovery in 160 minutes for $39.90
The discovered strategies work across different problem types and model sizes, demonstrating robust generalization beyond training conditions.
Development Impact
AutoTTS shifts LLM optimization from manual strategy design to automated discovery. This approach:
- Reduces development time from weeks to hours
- Explores strategy spaces humans cannot efficiently search
- Produces strategies that generalize across benchmarks and models
- Costs less than manual experimentation cycles
Next Steps
Implement AutoTTS by setting up the discovery environment for your specific use case. Define your reasoning trajectory collection process, establish probe signals for quality assessment, and configure the controller synthesis parameters. The framework’s modular design allows adaptation to different reasoning tasks beyond mathematical problems.
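The setup steps above could be captured in a single configuration object. This is purely a hypothetical sketch: every key, path, and value below is an illustrative placeholder, not part of any published AutoTTS API.

```python
# Hypothetical discovery configuration; all names and values are illustrative.
config = {
    "trajectories": "traces/train.jsonl",  # pre-collected reasoning trajectories
    "probe": "verifier_score",             # cheap per-step quality signal
    "controller": {
        "max_depth": 32,                   # depth budget before forced stop
        "max_width": 8,                    # cap on concurrent reasoning paths
        "param_priors": {                  # Beta(a, b) priors per threshold
            "prune_below":  (2.0, 5.0),
            "branch_above": (5.0, 2.0),
        },
    },
}
```

Swapping the trajectory file and probe signal is the main adaptation needed to move from mathematical reasoning to another task family.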