Keyword Search is All You Need: Achieving RAG-Level Performance Without Vector Databases

Researchers at Amazon Web Services report that an agent equipped with simple keyword search tools can come close to the performance of vector-database-backed systems on document question-answering tasks. In their study, agentic keyword search attained over 90% of a traditional RAG baseline's performance while being simpler and more cost-effective.

The Problem with Traditional RAG

Retrieval-Augmented Generation (RAG) systems combine large language models with external knowledge bases to reduce hallucinations and improve factual accuracy. However, RAG presents significant challenges:

High maintenance overhead: Vector databases require frequent updates and substantial infrastructure
Integration complexity: Setting up and maintaining embeddings, chunking strategies, and retrieval pipelines
Cost burden: Especially problematic for organizations with rapidly changing knowledge bases

The Agentic Alternative

The research team developed an agent-based approach that uses basic Linux command-line tools instead of vector databases. Their system leverages:

PDF metadata analysis: Understanding document structure before searching
RipGrep-All (rga): Regex-based pattern matching across multiple file types
PDFGrep: PDF-specific search with page-range targeting
Iterative refinement: Agents modify search strategies based on results
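
As a rough illustration of the two search tools listed above, the sketch below builds the command lines an agent might issue. The helper names (rga_command, pdfgrep_command) and the example file names are hypothetical. The flags are standard options of ripgrep-all (inherited from ripgrep) and pdfgrep, but the exact invocations used in the study are an assumption here.

```python
from typing import List, Optional, Tuple


def rga_command(pattern: str, path: str, context: int = 2) -> List[str]:
    """Build a ripgrep-all call: case-insensitive regex search under `path`.

    -i ignores case, -n prints line numbers, -C adds lines of context.
    """
    return ["rga", "-i", "-n", "-C", str(context), pattern, path]


def pdfgrep_command(
    pattern: str,
    pdf: str,
    pages: Optional[Tuple[int, int]] = None,
    context: int = 2,
) -> List[str]:
    """Build a pdfgrep call, optionally restricted to a page range.

    -n prints page numbers; --page-range limits the search to those pages.
    """
    cmd = ["pdfgrep", "-i", "-n", "-C", str(context)]
    if pages is not None:
        first, last = pages
        cmd.append(f"--page-range={first}-{last}")
    return cmd + [pattern, pdf]


if __name__ == "__main__":
    # Hypothetical inputs, printed rather than executed.
    print(" ".join(rga_command(r"net (income|revenue)", "docs/")))
    print(" ".join(pdfgrep_command("validator", "whitepaper.pdf", pages=(3, 8))))
```

Returning argument lists rather than shell strings lets the caller pass them straight to subprocess.run without shell quoting concerns.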

The agent follows a simple workflow: analyze available documents, perform broad keyword searches, then use targeted searches with error handling and automatic retry mechanisms.
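
The retry-and-refine part of this workflow can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the study's implementation: search stands for any keyword search tool, and refine is a hypothetical stand-in for the language model proposing the next query after an empty result or an error.

```python
from typing import Callable, List, Optional


def search_with_refinement(
    initial_query: str,
    search: Callable[[str], List[str]],
    refine: Callable[[str, Optional[Exception]], Optional[str]],
    max_attempts: int = 4,
) -> List[str]:
    """Try a query, then refined queries, until one returns hits."""
    query: Optional[str] = initial_query
    for _ in range(max_attempts):
        if query is None:  # the refiner has no further suggestions
            break
        error: Optional[Exception] = None
        try:
            hits = search(query)
        except Exception as exc:  # e.g. an invalid regex or a tool failure
            hits, error = [], exc
        if hits:
            return hits
        # Hand the failed query (and any error) back for a new attempt.
        query = refine(query, error)
    return []


if __name__ == "__main__":
    # Toy corpus and helpers, for illustration only.
    corpus = ["Total revenue rose in 2022.", "Validators stake tokens."]

    def toy_search(query: str) -> List[str]:
        return [line for line in corpus if query.lower() in line.lower()]

    def toy_refine(query: str, error: Optional[Exception]) -> Optional[str]:
        words = query.split()
        return " ".join(words[:-1]) if len(words) > 1 else None

    print(search_with_refinement("revenue growth rate", toy_search, toy_refine))
```

Capping the number of attempts keeps a query with no good answer from looping indefinitely.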

Experimental Results

Across six diverse datasets, the agent came close to the RAG baseline on all three reported metrics:

Average Attainment Scores (vs. RAG baseline):

Faithfulness: 94.52%
Context Recall: 88.05%
Answer Correctness: 91.48%

Standout Performance:

BlockchainSolana dataset: 99.97% answer correctness
LLM Survey paper: 99.51% answer correctness
FinanceBench dataset: 6 percentage point improvement over traditional RAG

The keyword search approach performed particularly well on technical documentation and complex financial documents, where active search capabilities outperformed static chunk-based retrieval.
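
The attainment scores above express the agent's score as a percentage of the RAG baseline's score on the same metric. The sketch below assumes attainment is the simple ratio of the two scores, which the summary implies but does not spell out. The input values are made-up placeholders, not figures from the study.

```python
def attainment(agent_score: float, rag_score: float) -> float:
    """Return the agent's score as a percentage of the RAG baseline score."""
    if rag_score <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * agent_score / rag_score


if __name__ == "__main__":
    # Placeholder metric values for illustration only.
    print(f"{attainment(0.72, 0.80):.2f}%")
```

Under this reading, a value above 100% would mean the agent outscored the baseline on that metric.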

Implementation Advantages

The agentic approach offers several practical benefits:

Simplicity: No vector database setup or embedding model management required

Cost-effectiveness: Eliminates infrastructure costs for maintaining large-scale vector stores

Flexibility: Adapts to new document types without retraining or knowledge base updates

Real-time capability: Searches current documents without preprocessing delays

Limitations and Considerations

The research identified several constraints:

Large document performance: Degradation with very large files
Context window limits: Bounded by LLM token constraints
Multimedia handling: Limited to text-based content
Contextual nuance: May miss subtle semantic relationships that embeddings capture

When to Choose Keyword Search

This approach works best for:

Frequently updated knowledge bases: Where vector database maintenance becomes burdensome
Resource-constrained environments: Where infrastructure costs matter
Technical documentation: Where precise term matching is crucial
Rapid prototyping: When you need quick results without complex setup

Implementation Guide

To implement this approach:

Set up agent framework: Use LangChain with ReAct reasoning
Configure search tools: Install rga, pdfgrep, and metadata extraction scripts
Design search strategy: Start with metadata analysis, then iterative keyword searches
Add error handling: Implement retry mechanisms for failed searches
Optimize context extraction: Use surrounding text capture (-C flag) for better context
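
The error-handling step above can be sketched as a thin wrapper that an agent framework could expose as a tool. The function name run_search_tool, the message wording, and the truncation limit are assumptions for illustration. Registration with LangChain is omitted because that API varies by version. The exit-code handling follows the grep convention (0 for matches, 1 for no matches, 2 for errors), which ripgrep and pdfgrep document; ripgrep-all is assumed to pass it through.

```python
import subprocess
import sys
from typing import List


def run_search_tool(
    argv: List[str], timeout: float = 30.0, max_chars: int = 4000
) -> str:
    """Run a search command and return text an agent can read and act on."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError:
        return f"ERROR: command not found: {argv[0]}"
    except subprocess.TimeoutExpired:
        return f"ERROR: search timed out after {timeout} seconds"
    if proc.returncode == 1 and not proc.stdout:
        # grep-style "no matches": a signal to retry, not a failure.
        return "NO MATCHES: try a broader or differently spelled keyword"
    if proc.returncode != 0:
        return f"ERROR (exit {proc.returncode}): {proc.stderr.strip()[:500]}"
    # Truncate so a large result does not overflow the model's context window.
    return proc.stdout[:max_chars]


if __name__ == "__main__":
    # With rga installed, a real call might look like:
    #   run_search_tool(["rga", "-i", "-C", "2", "revenue", "docs/"])
    # Here the Python interpreter stands in for the search tool.
    print(run_search_tool([sys.executable, "-c", "print('page 3: revenue rose')"]))
```

Returning errors as plain text instead of raising lets the agent read the message and decide how to retry.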

The Bottom Line

This research challenges the assumption that vector databases are essential for high-quality document retrieval. For many applications, especially those requiring frequent updates or operating under resource constraints, agentic keyword search provides a compelling alternative that is simpler to both implement and maintain.

Attaining over 90% of RAG-level performance suggests that embedding-based semantic search may be less critical than previously thought for many document QA tasks. Consider starting with keyword search for your next RAG project; you might find it’s all you need.

Source paper: Keyword Search is All You Need: Achieving RAG-Level Performance Without Vector Databases Using Agentic Tool Use