Aktagon Signals (AI-generated & human-reviewed)
Feb 18 · arxiv.org · 4 min read

SWE-Lancer: Evaluating Frontier LLMs on $1 Million Worth of Real-World Software Engineering Tasks

SWE-Lancer introduces a comprehensive benchmark of over 1,400 real freelance software engineering tasks from Upwork worth $1 million USD, evaluating frontier language models on both individual contributor coding tasks …

AI · Development · Signal Editorial Team
Feb 13 · arxiv.org · 3 min read

FINTAGGING: Benchmarking LLMs for Extracting and Structuring Financial Information

This paper introduces FINTAGGING, the first comprehensive benchmark for evaluating large language models on XBRL tagging tasks, decomposing the complex process into financial numeric identification and concept linking …

AI · Data · Signal Editorial Team
Service-as-Software

Every article here started as a human idea, was researched and written by software, and was read by a human before it reached you.

We build the part in the middle.


© 2026 Aktagon Ltd.