LLMDFA: Using Large Language Models for Compilation-Free Dataflow Analysis
Traditional dataflow analysis requires successful compilation and expert customization, which limits its use on broken code and under evolving analysis needs. LLMDFA is a framework that uses Large Language Models (LLMs) to perform dataflow analysis without compilation, averaging 87.10% precision and 80.77% recall on bug detection.
The Compilation Problem
Dataflow analysis identifies how values flow through programs to detect bugs and security vulnerabilities. Traditional tools fail when code doesn’t compile, which is common during development, during code review, and when analyzing incomplete codebases. Developers need analysis that works on any code, regardless of compilation status.
How LLMDFA Works
LLMDFA decomposes dataflow analysis into smaller subtasks that LLMs can handle more reliably than the whole problem at once.
Task Decomposition Strategy
The framework breaks analysis into three core components:
- Value extraction: Identifies program variables and expressions of interest
- Path analysis: Determines which execution paths are feasible
- Dataflow summarization: Tracks how values flow between program points
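The decomposition above can be pictured as three pluggable stages. The following is a minimal sketch, not LLMDFA's actual code: the `Flow` type, the `analyze` function, the stage ordering, and the toy stand-ins (`toy_extract`, `toy_summarize`, `toy_feasible`) are all assumptions made for illustration, and the `read_input`/`run_query` names are hypothetical source and sink APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Flow:
    """A candidate dataflow fact: a value at source_line may reach sink_line."""
    source_line: int
    sink_line: int


# Each stage is a plain callable so an LLM-backed version could be swapped in.
Extractor = Callable[[str], Tuple[List[int], List[int]]]
Summarizer = Callable[[str, List[int], List[int]], List[Flow]]
FeasibilityCheck = Callable[[str, Flow], bool]


def analyze(code: str, extract: Extractor, summarize: Summarizer,
            is_feasible: FeasibilityCheck) -> List[Flow]:
    """Run the three subtasks in sequence (one plausible ordering)."""
    sources, sinks = extract(code)                # value extraction
    candidates = summarize(code, sources, sinks)  # dataflow summarization
    return [f for f in candidates if is_feasible(code, f)]  # path analysis


def toy_extract(code: str) -> Tuple[List[int], List[int]]:
    """Toy stand-in: find hypothetical source/sink calls by substring match."""
    sources, sinks = [], []
    for lineno, line in enumerate(code.splitlines(), start=1):
        if "read_input(" in line:
            sources.append(lineno)
        if "run_query(" in line:
            sinks.append(lineno)
    return sources, sinks


def toy_summarize(code: str, sources: List[int], sinks: List[int]) -> List[Flow]:
    """Toy stand-in: over-approximate by pairing each source with later sinks."""
    return [Flow(s, t) for s in sources for t in sinks if s < t]


def toy_feasible(code: str, flow: Flow) -> bool:
    """Toy stand-in: accept every path; a real check would reason about branches."""
    return True


SAMPLE = "x = read_input()\nrun_query(x)\n"
flows = analyze(SAMPLE, toy_extract, toy_summarize, toy_feasible)
print(flows)
```

Keeping the stages as separate callables mirrors the idea that each subtask can be validated, or replaced by a more trustworthy tool, independently of the others.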
Mitigating LLM Hallucinations
LLMDFA addresses LLM reliability issues through external tool integration.
The LLM synthesizes code that outsources delicate reasoning to established tools, such as a parsing library for extracting program values of interest and an automated theorem prover for checking path feasibility, rather than relying solely on the LLM's own reasoning.
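To make the outsourcing idea concrete, here is a sketch of the kind of extractor script that could be synthesized, under several assumptions: Python's standard `ast` module stands in for the parsing library, the target code is Python, and `read_input` and `run_query` are hypothetical source and sink functions. None of these names come from LLMDFA itself.

```python
import ast
from typing import List, Optional, Tuple

SOURCE_FUNCS = {"read_input"}  # hypothetical API that introduces tracked values
SINK_FUNCS = {"run_query"}     # hypothetical API that must not receive them


def called_name(call: ast.Call) -> Optional[str]:
    """Return the simple name of the called function, if it has one."""
    func = call.func
    if isinstance(func, ast.Name):       # e.g. read_input()
        return func.id
    if isinstance(func, ast.Attribute):  # e.g. db.run_query()
        return func.attr
    return None


def extract_sources_and_sinks(
    code: str,
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Collect (variable, line) pairs for sources and sinks from the syntax tree."""
    sources, sinks = [], []
    for node in ast.walk(ast.parse(code)):
        # Source: a variable assigned directly from a source call.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            if called_name(node.value) in SOURCE_FUNCS:
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        sources.append((target.id, node.lineno))
        # Sink: a variable passed as a positional argument to a sink call.
        if isinstance(node, ast.Call) and called_name(node) in SINK_FUNCS:
            for arg in node.args:
                if isinstance(arg, ast.Name):
                    sinks.append((arg.id, node.lineno))
    by_line = lambda item: item[1]
    return sorted(sources, key=by_line), sorted(sinks, key=by_line)


snippet = "user = read_input()\nif user:\n    db.run_query(user)\n"
sources, sinks = extract_sources_and_sinks(snippet)
print(sources, sinks)
```

The point of the design is that the syntax tree, not the model, decides which lines contain sources and sinks, so the result is deterministic. One caveat of this stand-in: `ast.parse` raises `SyntaxError` on syntactically invalid input, so analyzing broken code would call for an error-tolerant parser instead.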
Few-Shot Chain-of-Thought Prompting
For function-level analysis, LLMDFA uses few-shot chain-of-thought prompting to align LLMs with program semantics. The prompting strategy is to:
- Provide examples of correct dataflow analysis
- Break down reasoning into explicit steps
- Focus on small code snippets to reduce complexity
- Validate results against known patterns
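A few-shot chain-of-thought prompt along the lines listed above could be assembled as follows. This is a sketch only: the `Example` type, the `build_prompt` function, and the prompt wording are illustrative assumptions, not the prompts LLMDFA uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Example:
    """One worked demonstration: a snippet, a question, and explicit reasoning."""
    code: str
    question: str
    reasoning: List[str]
    answer: str


def build_prompt(examples: List[Example], code: str, question: str) -> str:
    """Assemble worked examples followed by the new snippet to analyze."""
    parts = ["You are analyzing dataflow within a single function."]
    for index, example in enumerate(examples, start=1):
        steps = "\n".join(
            f"{n}. {step}" for n, step in enumerate(example.reasoning, start=1)
        )
        parts.append(
            f"Example {index}\n"
            f"Code:\n{example.code}\n"
            f"Question: {example.question}\n"
            f"Reasoning:\n{steps}\n"
            f"Answer: {example.answer}"
        )
    # End on "Reasoning:" so the model writes its steps before its answer.
    parts.append(f"Now analyze\nCode:\n{code}\nQuestion: {question}\nReasoning:")
    return "\n\n".join(parts)


demo = Example(
    code="a = src()\nb = a\nsink(b)",
    question="Does the value of a at line 1 reach the argument of sink at line 3?",
    reasoning=["Line 2 copies a into b.", "Line 3 passes b to sink."],
    answer="Yes",
)
prompt = build_prompt(
    [demo],
    code="x = src()\ny = 0\nsink(y)",
    question="Does the value of x at line 1 reach the argument of sink at line 3?",
)
print(prompt)
```

Each demonstration is deliberately small, which matches the emphasis on short snippets: the model only has to follow a few assignments at a time.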
Performance Results
LLMDFA demonstrates significant improvements over existing approaches:
- Precision: 87.10% on average
- Recall: 80.77% on average
- F1 score: improvements of up to 0.35 over existing techniques
The framework successfully detects three representative bug types in synthetic programs and identifies custom vulnerabilities in real-world Android applications.
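For readers less familiar with these metrics, the sketch below shows how precision, recall, and F1 relate to one another. The counts in the usage lines are hypothetical and are not taken from the LLMDFA evaluation.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of reported bugs that are real."""
    reported = true_pos + false_pos
    return true_pos / reported if reported else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of real bugs that were reported."""
    actual = true_pos + false_neg
    return true_pos / actual if actual else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: 8 real bugs reported, 2 false alarms, 2 bugs missed.
p = precision(true_pos=8, false_pos=2)
r = recall(true_pos=8, false_neg=2)
print(p, r, f1_score(p, r))

# Harmonic mean of the two reported averages: roughly 0.84.
# This is an illustrative calculation, not a figure reported by the source.
print(round(f1_score(0.8710, 0.8077), 2))
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so an F1 gain of 0.35 implies a large gain in precision, recall, or both.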
Implementation Advantages
Compilation-Free Operation
You can analyze code immediately without setting up build environments, resolving dependencies, or fixing compilation errors. This enables analysis during:
- Code review processes
- Incomplete development phases
- Legacy codebase exploration
- Security audits of third-party code
Customizable Analysis
Unlike traditional tools that require expert configuration, LLMDFA adapts to new analysis requirements through natural language descriptions. You specify what to look for, and the system generates appropriate analysis logic.
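One way to picture a natural-language analysis request is as a small structured record that gets rendered into an instruction for the model. The `AnalysisSpec` type, its fields, and the wording below are hypothetical and do not reflect LLMDFA's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisSpec:
    """A hypothetical, plain-language description of what to look for."""
    bug_type: str
    source_description: str
    sink_description: str

    def to_instruction(self) -> str:
        """Render the spec as an instruction that a prompt could include."""
        return (
            f"Detect {self.bug_type}. "
            f"Sources: {self.source_description}. "
            f"Sinks: {self.sink_description}. "
            "Report every source value that can reach a sink."
        )


spec = AnalysisSpec(
    bug_type="divide-by-zero bugs",
    source_description="variables that may hold the value zero",
    sink_description="divisors of division and modulo operations",
)
instruction = spec.to_instruction()
print(instruction)
```

Changing the analysis then means editing three short descriptions rather than writing a new checker by hand.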
Getting Started
The LLMDFA framework is open-sourced and available for immediate use. The system works with existing codebases without modification and integrates into standard development workflows.
Start by identifying specific dataflow patterns or vulnerabilities you want to detect in your codebase. LLMDFA handles the complexity of analysis generation and execution, providing actionable results without requiring compilation or expert tool configuration.