How DoorDash leverages LLMs for better search retrieval
DoorDash uses large language models to improve search accuracy by segmenting complex queries and linking them to structured knowledge graphs. This hybrid approach combines the contextual understanding of LLMs with a controlled vocabulary, enforcing strict rules for critical attributes while maintaining flexibility for novel queries.
The Core Challenge
Users search with compound requirements: “vegan chicken sandwich” should return only items matching both dietary restrictions and protein preferences. Traditional embedding-based retrieval systems struggle with this specificity, often returning partially matching results like “chicken sandwiches” or “vegetarian sandwiches.” DoorDash needed a system that enforces strict rules for critical attributes while allowing flexibility for others.
System Architecture
Search engines process documents (items and stores) and queries through parallel pipelines. Documents receive metadata annotation before indexing. Queries undergo understanding steps including parsing, segmentation, and entity linking. DoorDash built knowledge graphs covering food items and retail products, creating taxonomies for cuisines, dish types, dietary preferences, brands, and product categories.
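The post doesn’t publish the taxonomy schema; as a rough illustration with hypothetical names, a concept entry might pair a canonical label with its synonyms and taxonomy branch:

```python
from dataclasses import dataclass, field

@dataclass
class TaxonomyConcept:
    """One node in a (hypothetical) knowledge graph taxonomy."""
    concept_id: str                 # stable identifier, e.g. "dietary:dairy_free"
    label: str                      # canonical display name
    category: str                   # branch: cuisine, dish_type, dietary, ...
    synonyms: list[str] = field(default_factory=list)

# Illustrative entry, not DoorDash's actual data
dairy_free = TaxonomyConcept(
    concept_id="dietary:dairy_free",
    label="dairy-free",
    category="dietary_preference",
    synonyms=["no milk", "no dairy", "lactose-free"],
)
```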
Query Segmentation with LLMs
Traditional methods like pointwise mutual information (PMI) fall short on complex queries. For “turkey sandwich with cranberry sauce,” these systems cannot determine whether “cranberry sauce” modifies “sandwich” or represents a separate item.
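For reference, PMI measures how much more often two tokens co-occur than chance would predict:

```latex
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)}
```

A high score for an adjacent token pair suggests a phrase worth merging, but the statistic is pairwise and context-free, which is exactly why it cannot resolve attachment ambiguities like the one above.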
LLMs provide contextual understanding but hallucinate without constraints. DoorDash solved this by prompting models to map segments directly to taxonomy categories.
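DoorDash’s exact prompt isn’t public; the sketch below illustrates the idea, with hypothetical category names and a fixed JSON schema constraining the model’s output:

```python
import json

# Hypothetical taxonomy categories the model must choose from
CATEGORIES = ["dish_type", "dietary_preference", "flavor", "size", "cuisine"]

PROMPT_TEMPLATE = """Segment the search query into phrases and assign each
phrase exactly one category from {categories}. Respond with JSON only, as a
list of {{"segment": ..., "category": ...}} objects.

Query: {query}"""

def build_prompt(query: str) -> str:
    return PROMPT_TEMPLATE.format(categories=CATEGORIES, query=query)

# Illustrative model output for "vegan chicken sandwich"
expected = json.loads(
    '[{"segment": "vegan", "category": "dietary_preference"},'
    ' {"segment": "chicken sandwich", "category": "dish_type"}]'
)
```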
This structured output keeps the hallucination rate below one percent while immediately categorizing segments for retrieval.
Entity Linking Through RAG
Once segmented, queries map to knowledge graph concepts. DoorDash uses retrieval-augmented generation:
- Generate embeddings for queries and taxonomy concepts
- Use approximate nearest neighbor retrieval to find 100 candidate labels per query
- Prompt the LLM to select the best match from candidates
For “no-milk,” the system retrieves candidates like “dairy-free” and “vegan,” then selects “dairy-free” based on context. This constrains LLM output to verified knowledge graph concepts.
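A condensed sketch of that loop, assuming a sentence-transformers encoder, a FAISS index over taxonomy labels, and a final LLM selection step (all stand-ins, not DoorDash’s actual stack):

```python
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

# Embed every taxonomy concept label once and build an index for
# nearest-neighbor search (flat/exact here; production systems would
# typically use a true approximate nearest neighbor index).
labels = ["dairy-free", "vegan", "gluten-free", "nut-free"]  # toy taxonomy
label_vecs = encoder.encode(labels, normalize_embeddings=True)
index = faiss.IndexFlatIP(label_vecs.shape[1])  # inner product = cosine here
index.add(label_vecs)

def candidate_concepts(segment: str, k: int = 100) -> list[str]:
    """Retrieve up to k candidate taxonomy concepts for a query segment."""
    vec = encoder.encode([segment], normalize_embeddings=True)
    _, idx = index.search(vec, min(k, len(labels)))
    return [labels[i] for i in idx[0]]

# A final LLM prompt (not shown) would pick the best candidate in context,
# e.g. choosing "dairy-free" over "vegan" for the segment "no-milk".
print(candidate_concepts("no-milk"))
```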
Retrieval Logic
The final query understanding signal enables precise retrieval control. For “small no-milk vanilla ice cream,” the system matches each segment to a knowledge graph concept: a size (“small”), a dietary restriction (“no-milk,” linked to “dairy-free”), a flavor (“vanilla”), and a dish type (“ice cream”).
Retrieval logic makes dietary restrictions MUST conditions while treating flavors as SHOULD conditions, ensuring strict enforcement where it matters and flexibility elsewhere.
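In Elasticsearch-style bool-query terms (DoorDash hasn’t disclosed its engine, so the field names and structure here are purely illustrative), that policy looks like:

```python
# MUST clauses are hard filters; SHOULD clauses only boost relevance.
# Field names are hypothetical.
query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"dietary_tags": "dairy-free"}},
                {"term": {"dish_type": "ice cream"}},
            ],
            "should": [
                {"term": {"flavor": "vanilla"}},
                {"term": {"size": "small"}},
            ],
        }
    }
}
```

Items missing the dairy-free tag are excluded outright, while a non-vanilla dairy-free ice cream can still appear, just ranked lower.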
Quality Control
Post-processing prevents hallucinations in the final output. Annotators manually audit statistically significant samples from each batch, verifying segment accuracy and entity links. This maintains high precision for critical attributes such as dietary preferences.
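As an illustration of sizing such an audit (my example, not a practice DoorDash describes), the standard sample-size formula for estimating a proportion gives the number of queries to check per batch:

```python
import math

def audit_sample_size(z: float = 1.96, p: float = 0.5, e: float = 0.05) -> int:
    """Sample size for estimating a proportion (e.g. an error rate) at
    z-score confidence, expected proportion p, and margin of error e."""
    return math.ceil(z**2 * p * (1 - p) / e**2)

print(audit_sample_size())  # 385 queries at 95% confidence, +/-5% margin
```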
Balancing Memorization and Generalization
Batch inference on fixed query sets provides high accuracy but doesn’t scale. New queries emerge constantly in DoorDash’s dynamic environment. The solution combines LLM precision with methods that generalize to unseen queries:
- Lightweight heuristics
- BM25 statistical matching
- Embedding retrieval
This hybrid approach maintains adaptability while leveraging LLM contextual understanding.
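A toy sketch of blending two of those signals, lexical BM25 and embedding similarity (the `rank_bm25` package and the 50/50 weights are my choices, not DoorDash’s):

```python
from rank_bm25 import BM25Okapi

def hybrid_scores(query: str, docs: list[str], embed_sims: list[float],
                  w_bm25: float = 0.5, w_embed: float = 0.5) -> list[float]:
    """Blend normalized BM25 scores with precomputed embedding similarities."""
    bm25 = BM25Okapi([doc.split() for doc in docs])
    lexical = bm25.get_scores(query.split())
    max_lex = max(lexical) or 1.0  # avoid division by zero
    return [
        w_bm25 * (lex / max_lex) + w_embed * sim
        for lex, sim in zip(lexical, embed_sims)
    ]
```

In practice each method covers what it is best at: exact LLM-derived signals where batch inference has seen the query, statistical and embedding fallbacks for the long tail of unseen queries.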
Integration with Ranking
The ranker orders retrieved items by relevance. After introducing new query understanding signals, DoorDash retrained rankers with more comprehensive engagement data. This alignment between retrieval precision and ranking capability drove metric improvements.
Results
The Popular Dishes carousel shows the system’s impact. For queries like “açaí bowl,” the carousel displays relevant dishes across multiple stores for quick comparison.
Improvements delivered:
- 30% increase in carousel trigger rate
- 2% improvement in whole page relevance for dish queries
- 1.6% additional relevance gain from retrained ranker
- Increased same-day conversions and marketplace value
Implementation Details
Rollouts involve version-specific considerations: as improvements ship, teams monitor how rankers adapt to the new retrieval patterns and consumer engagement signals. Full benefits materialize only once all components are aligned.
Future Applications
Better query and catalog understanding enables:
- Query rewriting and alternative search path suggestions
- Popular item recommendations for new users in specific markets
- Improved recall through more granular attribute coverage
- Consumer behavior profiling based on attribute preferences
Key Takeaway
Combining LLMs with knowledge graphs and flexible retrieval solves the precision-recall tradeoff in complex search scenarios. Constrain LLM output through controlled vocabularies and RAG techniques. Build hybrid systems that leverage multiple retrieval methods. Align all pipeline components—from understanding to ranking—to maximize impact.
Start with structured taxonomies in your domain. Use them to guide LLM segmentation and linking. Test incrementally, measuring both precision and generalization to unseen queries.