SWE-Lancer: Evaluating Frontier LLMs on $1 Million Worth of Real-World Software Engineering Tasks
SWE-Lancer is a benchmark of over 1,400 real freelance software engineering tasks from Upwork, collectively worth $1 million USD in real-world payouts, that evaluates frontier language models on both individual contributor coding tasks …
AI · Development