LLMDFA: Using Large Language Models for Compilation-Free Dataflow Analysis

Traditional dataflow analysis requires successful compilation and expert customization, which limits its use on code that does not build and makes it slow to adapt to new analysis requirements. LLMDFA introduces a framework that leverages Large Language Models (LLMs) to perform dataflow analysis without compilation, achieving 87.10% precision and 80.77% recall for bug detection.

The Compilation Problem

Dataflow analysis identifies how values flow through programs to detect bugs and security vulnerabilities. Traditional tools fail when code doesn’t compile—a common scenario during development, code review, or when analyzing incomplete codebases. Developers need analysis capabilities that work on any code, regardless of compilation status.

How LLMDFA Works

LLMDFA decomposes complex dataflow analysis into manageable subtasks that LLMs can handle reliably:

Task Decomposition Strategy

The framework breaks analysis into three core components:

  • Value extraction: Identifies program variables and expressions of interest
  • Path analysis: Determines which execution paths are feasible
  • Dataflow summarization: Tracks how values flow between program points
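As an illustration of the decomposition, the three subtasks can be sketched as a small pipeline. The function names and hard-coded return values below are hypothetical stand-ins for the LLM-driven phases, not LLMDFA's actual API:

```python
# Hypothetical sketch of the three-phase decomposition.
# Each phase would normally prompt an LLM; here the calls are
# stubbed with fixed answers so only the pipeline shape is shown.

def extract_values(code):
    # Phase 1: identify variables/expressions of interest
    return ["user_input", "query"]

def summarize_dataflow(code, values):
    # Phase 2: summarize flows between program points as pairs
    return [("user_input", "query")]

def check_path_feasibility(code, flow):
    # Phase 3: keep only flows along feasible execution paths
    return True

def analyze(code):
    values = extract_values(code)
    flows = summarize_dataflow(code, values)
    return [f for f in flows if check_path_feasibility(code, f)]

print(analyze("query = build_sql(user_input)"))
```

Keeping each phase small is what lets an LLM handle it reliably; the pipeline composes the phases rather than asking the model to do whole-program analysis in one shot.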

Mitigating LLM Hallucinations

LLMDFA addresses LLM reliability issues through external tool integration:

# Example: using a parsing library for value extraction
import ast

def extract_program_values(code_snippet):
    # The LLM-generated script delegates parsing to Python's
    # ast module instead of parsing the text itself
    tree = ast.parse(code_snippet)
    return sorted({node.id for node in ast.walk(tree)
                   if isinstance(node, ast.Name)})

The system generates code that outsources delicate reasoning to proven tools like parsing libraries and automated theorem provers rather than relying solely on LLM reasoning.
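For path analysis in particular, the generated code hands feasibility questions to a solver. As a self-contained illustration, the toy bounded check below plays that role in place of a real theorem prover such as Z3; the function and its integer domain are illustrative, not part of LLMDFA:

```python
from itertools import product

def feasible(conditions, var_names, domain=range(-5, 6)):
    # Brute-force stand-in for an SMT solver: a path is
    # feasible if some assignment over the domain satisfies
    # every branch condition collected along it.
    for values in product(domain, repeat=len(var_names)):
        env = dict(zip(var_names, values))
        if all(cond(env) for cond in conditions):
            return True
    return False

# Path guarded by x > 0 and x < 3: feasible (e.g. x = 1)
print(feasible([lambda e: e["x"] > 0, lambda e: e["x"] < 3], ["x"]))  # True
# Path guarded by x > 0 and x < 0: infeasible
print(feasible([lambda e: e["x"] > 0, lambda e: e["x"] < 0], ["x"]))  # False
```

The point is the division of labor: the LLM only has to translate branch conditions into solver input, while the decision procedure does the delicate logical reasoning.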

Few-Shot Chain-of-Thought Prompting

For function-level analysis, LLMDFA uses structured prompting that aligns LLMs with program semantics:

  1. Provide examples of correct dataflow analysis
  2. Break down reasoning into explicit steps
  3. Focus on small code snippets to reduce complexity
  4. Validate results against known patterns
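A prompt following these steps might be assembled as below; the wording and layout are illustrative, not LLMDFA's actual prompt templates:

```python
def build_cot_prompt(few_shot_examples, target_snippet):
    # Assemble a few-shot chain-of-thought prompt: worked
    # examples first, then the snippet to analyze, ending with
    # an open "Reasoning:" cue for step-by-step analysis.
    parts = ["You are a dataflow analyzer. Reason step by step."]
    for code, reasoning, answer in few_shot_examples:
        parts.append(f"Code:\n{code}\nReasoning:\n{reasoning}\nAnswer: {answer}")
    parts.append(f"Code:\n{target_snippet}\nReasoning:")
    return "\n\n".join(parts)

example = ("y = x\nprint(y)",
           "x flows into y on line 1; y reaches print on line 2.",
           "x flows to print")
prompt = build_cot_prompt([example], "b = a\nreturn b")
print(prompt)
```

Worked examples anchor the model to the expected reasoning format, and keeping the target snippet small limits how much semantics the model must track at once.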

Performance Results

LLMDFA demonstrates significant improvements over existing approaches:

  • Precision: 87.10% average across test cases
  • Recall: 80.77% for bug detection scenarios
  • F1 Score: Up to 0.35 improvement over traditional methods

The framework successfully detects three representative bug types in synthetic programs and identifies custom vulnerabilities in real-world Android applications.

Implementation Advantages

Compilation-Free Operation

You can analyze code immediately without setting up build environments, resolving dependencies, or fixing compilation errors. This enables analysis during:

  • Code review processes
  • Incomplete development phases
  • Legacy codebase exploration
  • Security audits of third-party code

Customizable Analysis

Unlike traditional tools that require expert configuration, LLMDFA adapts to new analysis requirements through natural language descriptions. You specify what to look for, and the system generates appropriate analysis logic.
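For instance, a custom analysis could start from a plain-language description of sources and sinks that the system expands into prompts. The spec fields and helper below are purely illustrative, not LLMDFA's configuration format:

```python
# Hypothetical natural-language specification of a custom
# source/sink analysis (field names are illustrative).
spec = {
    "bug_type": "sensitive data leak",
    "source": "values returned by getDeviceId() or getLocation()",
    "sink": "arguments passed to any network send call",
}

def spec_to_prompt(spec):
    # Turn the description into an instruction for the
    # value-extraction phase.
    return (f"Find {spec['bug_type']} bugs: a value from "
            f"{spec['source']} must never reach {spec['sink']}.")

print(spec_to_prompt(spec))
```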

Getting Started

The LLMDFA framework is open-sourced and available for immediate use. The system works with existing codebases without modification and integrates into standard development workflows.

Start by identifying specific dataflow patterns or vulnerabilities you want to detect in your codebase. LLMDFA handles the complexity of analysis generation and execution, providing actionable results without requiring compilation or expert tool configuration.