Janus-Q: Event-Driven Trading with Hierarchical Reward Modeling
Financial markets react to discrete events such as earnings announcements, mergers, and regulatory changes, which traditional time-series models struggle to capture. Most trading systems treat news as auxiliary information rather than the primary driver of market movements. Janus-Q reverses that priority by making financial events the core decision unit.
The Event-First Problem
Current trading systems face two fundamental challenges. First, they lack datasets that connect specific events to measurable market impacts. Second, language models can interpret news semantically but struggle to translate that understanding into profitable trading decisions.
Consider how markets actually move: A company announces unexpected earnings, and the stock jumps 5%. A regulatory warning emerges, and shares plummet 8%. These aren’t gradual trends—they’re discrete shocks that require event-specific responses.
Building an Event-Centric Dataset
Janus-Q starts with a comprehensive dataset of 62,400 financial news articles, each annotated with:
- Event type: 10 categories from dividend announcements to risk warnings
- Market impact: Cumulative abnormal returns (CAR) that isolate event-driven price movements
- Semantic labels: Direction and trading strength indicators
- Company context: Industry and firm-specific information
The CAR calculation removes broad market movements and systematic factors, revealing the pure impact of each event. This creates ground truth for training models to recognize which events matter and how much.
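To make the CAR idea concrete, here is a minimal sketch using a single-factor market model: fit alpha and beta on a pre-event estimation window, then sum the gaps between actual and expected returns over the event window. The source does not state which factor model or window lengths Janus-Q uses, so the function name, the market-model choice, and the default windows below are all assumptions for illustration.

```python
import numpy as np

def cumulative_abnormal_return(stock_ret, market_ret, event_idx,
                               est_window=60, event_window=2):
    """CAR under a single-factor market model (assumed, simplified setup)."""
    stock_ret = np.asarray(stock_ret, dtype=float)
    market_ret = np.asarray(market_ret, dtype=float)
    start = event_idx - est_window
    if start < 0 or event_idx + event_window > len(stock_ret):
        raise ValueError("not enough data around the event")

    # Fit stock = alpha + beta * market on the pre-event estimation window.
    beta, alpha = np.polyfit(market_ret[start:event_idx],
                             stock_ret[start:event_idx], 1)

    # Abnormal return = actual return minus the market-model expectation.
    ev = slice(event_idx, event_idx + event_window)
    abnormal = stock_ret[ev] - (alpha + beta * market_ret[ev])
    return float(abnormal.sum())
```

Because the fit uses only pre-event data, the event itself does not contaminate the estimate of what a "normal" return would have been.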
Hierarchical-Gated Reward Modeling
The core innovation is Hierarchical-Gated Reward Modeling (HGRM), a structured reward system that aligns language model reasoning with trading reality. Instead of optimizing for simple profit, HGRM enforces a hierarchy of constraints:
Hard Gate: Direction correctness blocks all rewards when the predicted market direction is wrong, so the model cannot earn credit from the other reward components on a trade whose direction call was incorrect.
Soft Gate: Event-type consistency discounts rewards when the model misclassifies the event category. Understanding what happened matters for sustainable performance.
Trading Rewards: Cost-aware profit calculation includes transaction costs and regularizes trading frequency to prevent overtrading.
Process Rewards: Magnitude accuracy and reasoning quality ensure the model learns robust patterns rather than exploiting noise.
This hierarchy prevents the model from gaming the system while encouraging economically meaningful decision-making.
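The gating order described above can be sketched as a single reward function. This is not the paper's formula: the `Prediction` fields, the additive combination of trading and process rewards, and every numeric default (cost, discount, penalty, weights) are hypothetical placeholders chosen only to show how a hard gate, a soft gate, and the two reward groups might compose.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    direction: int          # +1 long, -1 short, 0 no trade
    event_type: str         # predicted event category
    magnitude: float        # predicted absolute CAR
    reasoning_score: float  # in [0, 1], from a hypothetical rubric scorer

def hgrm_reward(pred, true_car, true_event_type, *,
                cost=0.001, trade_penalty=0.0005,
                soft_discount=0.5, w_mag=0.5, w_reason=0.5):
    """Illustrative hierarchical-gated reward; all weights are assumptions."""
    true_dir = (true_car > 0) - (true_car < 0)

    # Hard gate: a wrong direction blocks every reward component.
    if pred.direction != true_dir:
        return 0.0

    # Trading reward: return net of transaction cost and a per-trade
    # penalty that discourages overtrading.
    traded = pred.direction != 0
    trading = pred.direction * true_car - (cost + trade_penalty) * traded

    # Process rewards: magnitude accuracy plus reasoning quality.
    rel_err = abs(pred.magnitude - abs(true_car)) / max(abs(true_car), 1e-8)
    process = w_mag * max(0.0, 1.0 - rel_err) + w_reason * pred.reasoning_score

    # Soft gate: discount (not zero) the reward on an event-type mismatch.
    gate = 1.0 if pred.event_type == true_event_type else soft_discount
    return gate * (trading + process)
```

Putting the hard gate first means no amount of well-formed reasoning can offset a wrong direction call, while the soft gate only scales down an otherwise valid reward.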
Training Process
Janus-Q uses a two-stage approach:
- Supervised Fine-tuning: Establishes basic event-to-market mappings using the annotated dataset
- Reinforcement Learning: Optimizes trading decisions using HGRM to balance multiple objectives
The reinforcement phase uses Group Relative Policy Optimization (GRPO), with HGRM supplying the reward signal. Optimizing against the hierarchical reward keeps the model from collapsing into degenerate strategies such as always trading or never trading.
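GRPO scores each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value function. The sketch below shows only that advantage-normalization step; the group size, the policy update, and any KL or clipping settings used by Janus-Q are not given in the source and are left out. The function name is hypothetical.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt."""
    r = np.asarray(rewards, dtype=float)
    # Center on the group mean and scale by the group standard deviation;
    # eps avoids division by zero when all rewards are identical.
    return (r - r.mean()) / (r.std() + eps)
```

One consequence worth noting: if every response in a group is blocked by the hard gate, all rewards are zero, every advantage is zero, and that group contributes no learning signal.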
Performance Results
Janus-Q consistently outperforms market indices and competing models:
- Direction accuracy: 17.5% improvement over the best time-series model
- Sharpe ratio: 102% improvement over the runner-up strategy
- Event classification: 18.2% better at identifying event types
During a challenging market period with broad corrections, Janus-Q maintained positive returns while most baselines suffered persistent drawdowns. The model captured sharp upswings in late December and sustained growth thereafter.
Key Insights
Event heterogeneity matters: Risk warnings and violations generate larger market reactions than routine announcements. The model learns to weight events by their historical impact magnitude.
Semantic understanding transfers: Human evaluators agreed with Janus-Q’s event interpretations 74-83% of the time, indicating robust semantic reasoning.
Hierarchical rewards work: Removing any component of HGRM degraded performance, with direction gates being most critical for preventing spurious profits.
Implementation Considerations
The framework requires careful position management. Without a cap on the position ratio, overlapping events can stack into excessive exposure. Moderate constraints (2-3x leverage) provide the best balance between capital efficiency and risk control.
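One simple way to enforce such a constraint is to scale all open positions proportionally whenever their combined gross exposure exceeds the cap. The source does not say how Janus-Q applies its limit, so proportional scaling, the helper name, and the default cap of 2.0 are assumptions for illustration.

```python
def cap_gross_exposure(requested, max_gross=2.0):
    """Scale signed position ratios so their gross sum stays within max_gross.

    `requested` maps a ticker to a signed position ratio (fraction of capital).
    """
    gross = sum(abs(w) for w in requested.values())
    if gross <= max_gross:
        return dict(requested)  # already within the cap; return a copy
    scale = max_gross / gross
    # Shrink every position by the same factor, preserving relative sizes.
    return {ticker: w * scale for ticker, w in requested.items()}
```

For example, three overlapping events requesting ratios of 1.0, 1.0, and -2.0 have a gross exposure of 4.0; under a 2.0 cap each position is halved.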
Holding periods matter significantly. Event impacts decay quickly, so shorter horizons (1-2 days) typically outperform longer ones. This aligns with the transient nature of news-driven market movements.
Next Steps
Janus-Q demonstrates that treating events as primary decision units improves both interpretability and profitability. The hierarchical reward structure provides a template for aligning language models with domain-specific objectives beyond simple accuracy metrics.
Future work could extend this approach to multi-asset portfolios, incorporate richer event structures, and adapt the framework to different markets and asset classes.
The key insight remains: financial markets are event-driven, and trading systems should be too.