How DoorDash leverages LLMs for better search retrieval
DoorDash uses large language models to improve search accuracy by segmenting complex queries and linking them to structured knowledge graphs. This hybrid approach combines the contextual understanding of LLMs with a controlled vocabulary, enforcing strict rules for critical attributes while maintaining flexibility for novel queries.
The Core Challenge
Users search with compound requirements: “vegan chicken sandwich” should return only items matching both dietary restrictions and protein preferences. Traditional embedding-based retrieval systems struggle with this specificity, often returning partially matching results like “chicken sandwiches” or “vegetarian sandwiches.” DoorDash needed a system that enforces strict rules for critical attributes while allowing flexibility for others.
System Architecture
Search engines process documents (items and stores) and queries through parallel pipelines. Documents receive metadata annotation before indexing. Queries undergo understanding steps including parsing, segmentation, and entity linking. DoorDash built knowledge graphs covering food items and retail products, creating taxonomies for cuisines, dish types, dietary preferences, brands, and product categories.
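The post doesn’t publish the taxonomy schema; as a rough illustration with hypothetical names, a concept entry might pair a canonical label with its synonyms and taxonomy branch:

```python
from dataclasses import dataclass, field

@dataclass
class TaxonomyConcept:
    """One node in a (hypothetical) knowledge graph taxonomy."""
    concept_id: str                 # stable identifier, e.g. "dietary:dairy_free"
    label: str                      # canonical display name
    category: str                   # branch: cuisine, dish_type, dietary, ...
    synonyms: list[str] = field(default_factory=list)

# Illustrative entry, not DoorDash's actual data
dairy_free = TaxonomyConcept(
    concept_id="dietary:dairy_free",
    label="dairy-free",
    category="dietary_preference",
    synonyms=["no milk", "no dairy", "lactose-free"],
)
```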
Query Segmentation with LLMs
Traditional methods like pointwise mutual information (PMI) fall short on complex queries. For “turkey sandwich with cranberry sauce,” these systems cannot determine whether “cranberry sauce” modifies “sandwich” or represents a separate item.
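For reference, PMI measures how much more often two tokens co-occur than chance would predict:

```latex
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)}
```

A high score for an adjacent token pair suggests a phrase worth merging, but the statistic is pairwise and context-free, which is exactly why it cannot resolve attachment ambiguities like the one above.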
LLMs provide contextual understanding but hallucinate without constraints. DoorDash solved this by prompting models to map segments directly to taxonomy categories.
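DoorDash’s exact prompt isn’t public; the sketch below illustrates the idea, with hypothetical category names and a fixed JSON schema constraining the model’s output:

```python
import json

# Hypothetical taxonomy categories the model must choose from
CATEGORIES = ["dish_type", "dietary_preference", "flavor", "size", "cuisine"]

PROMPT_TEMPLATE = """Segment the search query into phrases and assign each
phrase exactly one category from {categories}. Respond with JSON only, as a
list of {{"segment": ..., "category": ...}} objects.

Query: {query}"""

def build_prompt(query: str) -> str:
    return PROMPT_TEMPLATE.format(categories=CATEGORIES, query=query)

# Illustrative model output for "vegan chicken sandwich"
expected = json.loads(
    '[{"segment": "vegan", "category": "dietary_preference"},'
    ' {"segment": "chicken sandwich", "category": "dish_type"}]'
)
```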
This structured output keeps the hallucination rate below one percent while immediately categorizing segments for retrieval.
Entity Linking Through RAG
Once segmented, queries map to knowledge graph concepts. DoorDash uses retrieval-augmented generation:
- Generate embeddings for queries and taxonomy concepts
- Use approximate nearest neighbor retrieval to find 100 candidate labels per query
- Prompt the LLM to select the best match from candidates
For “no-milk,” the system retrieves candidates like “dairy-free” and “vegan,” then selects “dairy-free” based on context. This constrains LLM output to verified knowledge graph concepts.
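A condensed sketch of that loop, assuming a sentence-transformers encoder, a FAISS index over taxonomy labels, and a final LLM selection step (all stand-ins, not DoorDash’s actual stack):

```python
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

# Embed every taxonomy concept label once and build an index for
# nearest-neighbor search (flat/exact here; production systems would
# typically use a true approximate nearest neighbor index).
labels = ["dairy-free", "vegan", "gluten-free", "nut-free"]  # toy taxonomy
label_vecs = encoder.encode(labels, normalize_embeddings=True)
index = faiss.IndexFlatIP(label_vecs.shape[1])  # inner product = cosine here
index.add(label_vecs)

def candidate_concepts(segment: str, k: int = 100) -> list[str]:
    """Retrieve up to k candidate taxonomy concepts for a query segment."""
    vec = encoder.encode([segment], normalize_embeddings=True)
    _, idx = index.search(vec, min(k, len(labels)))
    return [labels[i] for i in idx[0]]

# A final LLM prompt (not shown) would pick the best candidate in context,
# e.g. choosing "dairy-free" over "vegan" for the segment "no-milk".
print(candidate_concepts("no-milk"))
```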
Retrieval Logic
The final query understanding signal enables precise retrieval control. For “small no-milk vanilla ice cream,” the system matches each segment to a knowledge graph concept: a size (“small”), a dietary restriction (“no-milk,” linked to “dairy-free”), a flavor (“vanilla”), and a dish type (“ice cream”).
Retrieval logic makes dietary restrictions MUST conditions while treating flavors as SHOULD conditions, ensuring strict enforcement where it matters and flexibility elsewhere.
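In Elasticsearch-style bool-query terms (DoorDash hasn’t disclosed its engine, so the field names and structure here are purely illustrative), that policy looks like:

```python
# MUST clauses are hard filters; SHOULD clauses only boost relevance.
# Field names are hypothetical.
query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"dietary_tags": "dairy-free"}},
                {"term": {"dish_type": "ice cream"}},
            ],
            "should": [
                {"term": {"flavor": "vanilla"}},
                {"term": {"size": "small"}},
            ],
        }
    }
}
```

Items missing the dairy-free tag are excluded outright, while a non-vanilla dairy-free ice cream can still appear, just ranked lower.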
Quality Control
Post-processing prevents hallucinations in the final output. Annotators manually audit statistically significant samples from each batch, verifying segment accuracy and entity links. This maintains high precision for critical attributes such as dietary preferences.
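As an illustration of sizing such an audit (my example, not a practice DoorDash describes), the standard sample-size formula for estimating a proportion gives the number of queries to check per batch:

```python
import math

def audit_sample_size(z: float = 1.96, p: float = 0.5, e: float = 0.05) -> int:
    """Sample size for estimating a proportion (e.g. an error rate) at
    z-score confidence, expected proportion p, and margin of error e."""
    return math.ceil(z**2 * p * (1 - p) / e**2)

print(audit_sample_size())  # 385 queries at 95% confidence, +/-5% margin
```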
Balancing Memorization and Generalization
Batch inference on fixed query sets provides high accuracy but doesn’t scale. New queries emerge constantly in DoorDash’s dynamic environment. The solution combines LLM precision with methods that generalize to unseen queries:
- Lightweight heuristics
- BM25 statistical matching
- Embedding retrieval
This hybrid approach maintains adaptability while leveraging LLM contextual understanding.
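A toy sketch of blending two of those signals, lexical BM25 and embedding similarity (the `rank_bm25` package and the 50/50 weights are my choices, not DoorDash’s):

```python
from rank_bm25 import BM25Okapi

def hybrid_scores(query: str, docs: list[str], embed_sims: list[float],
                  w_bm25: float = 0.5, w_embed: float = 0.5) -> list[float]:
    """Blend normalized BM25 scores with precomputed embedding similarities."""
    bm25 = BM25Okapi([doc.split() for doc in docs])
    lexical = bm25.get_scores(query.split())
    max_lex = max(lexical) or 1.0  # avoid division by zero
    return [
        w_bm25 * (lex / max_lex) + w_embed * sim
        for lex, sim in zip(lexical, embed_sims)
    ]
```

In practice each method covers what it is best at: exact LLM-derived signals where batch inference has seen the query, statistical and embedding fallbacks for the long tail of unseen queries.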
Integration with Ranking
The ranker orders retrieved items by relevance. After introducing new query understanding signals, DoorDash retrained rankers with more comprehensive engagement data. This alignment between retrieval precision and ranking capability drove metric improvements.
Results
The Popular Dishes carousel shows the system’s impact. For queries like “açaí bowl,” the carousel displays relevant dishes across multiple stores for quick comparison.
Improvements delivered:
- 30% increase in carousel trigger rate
- 2% improvement in whole page relevance for dish queries
- 1.6% additional relevance gain from retrained ranker
- Increased same-day conversions and marketplace value
Implementation Details
Rollouts involve version-specific considerations: as improvements ship, teams monitor how rankers adapt to the new retrieval patterns and consumer engagement signals. Full benefits materialize only once all components are aligned.
Future Applications
Better query and catalog understanding enables:
- Query rewriting and alternative search path suggestions
- Popular item recommendations for new users in specific markets
- Improved recall through more granular attribute coverage
- Consumer behavior profiling based on attribute preferences
Key Takeaway
Combining LLMs with knowledge graphs and flexible retrieval solves the precision-recall tradeoff in complex search scenarios. Constrain LLM output through controlled vocabularies and RAG techniques. Build hybrid systems that leverage multiple retrieval methods. Align all pipeline components—from understanding to ranking—to maximize impact.
Start with structured taxonomies in your domain. Use them to guide LLM segmentation and linking. Test incrementally, measuring both precision and generalization to unseen queries.