Built by Agents, Tested by Agents, Trusted by Whom? The Rise of AI-Driven Software Factories

StrongDM, a security infrastructure company, has eliminated humans from their software development process. Their “Software Factory” uses AI agents to write, test, and deploy production code without any human review. This represents a fundamental shift in how software gets built—and who takes responsibility when it breaks.

The End of Human Code Review

StrongDM’s approach follows two strict rules: “Code must not be written by humans” and “Code must not be reviewed by humans.” Three engineers manage the entire system, focusing on specifications and monitoring rather than programming. Their CTO suggests spending at least $1,000 daily on AI tokens per engineer as a productivity benchmark.

This isn’t experimental. StrongDM builds access management software that protects enterprise systems. They’ve decided human oversight creates bottlenecks rather than safety.

The technology enabling this shift has evolved rapidly. By late 2025, AI models had become reliable enough that the question changed from “can agents write code?” to “why are humans still writing code?” Each model generation compounds the gains of the last, steadily improving reliability on complex tasks.

Testing Without Humans

StrongDM solved a critical problem: how do you verify AI-written code without human reviewers? Their solution relies on detailed customer usage scenarios that are kept hidden from the coding agents. Instead of asking “does it pass tests?”, they ask “how often would real users get what they need?”
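
To make the idea concrete, here is a minimal sketch in Python of scoring a system against held-out usage scenarios. The names (Scenario, satisfaction_score, the access-grant example) are hypothetical illustrations, not StrongDM’s actual harness: each scenario states what a user is trying to do and how to tell whether they got it, and the score is simply the fraction of scenarios satisfied.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    """A customer usage scenario, kept hidden from the coding agents."""
    name: str
    request: dict                       # what the user tries to do
    satisfied: Callable[[dict], bool]   # did the user get what they needed?

def satisfaction_score(system_under_test: Callable[[dict], dict],
                       scenarios: list[Scenario]) -> float:
    """Fraction of held-out scenarios in which the user's goal is met.

    Unlike a pass/fail unit test, this asks "how often would real users
    get what they need?" across many realistic scenarios.
    """
    met = sum(1 for s in scenarios if s.satisfied(system_under_test(s.request)))
    return met / len(scenarios)

# Hypothetical example: a temporary access-grant flow.
scenarios = [
    Scenario(
        name="engineer requests temporary database access",
        request={"user": "alice", "resource": "prod-postgres", "duration_h": 2},
        satisfied=lambda resp: resp.get("granted") and resp.get("expires_in_h", 0) <= 2,
    ),
]

def fake_system(request: dict) -> dict:
    # Stand-in for the agent-built service being evaluated.
    return {"granted": True, "expires_in_h": request["duration_h"]}

print(f"satisfaction: {satisfaction_score(fake_system, scenarios):.0%}")
```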

They built a “Digital Twin Universe”—working replicas of services like Okta, Jira, and Slack. Against these replicas, they run thousands of test scenarios hourly without rate limits or API costs. What was economically impossible six months ago became routine as AI development costs collapsed.
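
A “digital twin” of a vendor service can be as simple as an in-process fake that mimics the handful of API calls a scenario needs. The sketch below is an invented illustration in Python (FakeIdentityProvider and its methods are assumptions for this example, not a real Okta replica), but it shows why such replicas remove the rate limits and per-call costs that make large-scale scenario testing against real services impractical.

```python
import uuid

class FakeIdentityProvider:
    """In-process stand-in for an identity provider such as Okta.

    A real digital twin would replicate the vendor's HTTP API; this sketch
    only mimics the calls a test scenario needs, so thousands of scenarios
    can run per hour with no rate limits or per-call API costs.
    """
    def __init__(self) -> None:
        self.users: dict[str, dict] = {}

    def create_user(self, email: str) -> dict:
        user = {"id": str(uuid.uuid4()), "email": email, "status": "ACTIVE"}
        self.users[user["id"]] = user
        return user

    def deactivate_user(self, user_id: str) -> None:
        self.users[user_id]["status"] = "DEPROVISIONED"

    def get_user(self, user_id: str) -> dict:
        return self.users[user_id]

# A test scenario exercises the replica instead of the real vendor API.
idp = FakeIdentityProvider()
u = idp.create_user("alice@example.com")
idp.deactivate_user(u["id"])
assert idp.get_user(u["id"])["status"] == "DEPROVISIONED"
```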

However, this creates a circular dependency. The same AI technology that writes code also judges whether it works. When both builder and inspector share identical blind spots, subtle failures become harder to catch.

The Accountability Gap

When software fails, existing legal frameworks assume someone reviewed the work. StrongDM’s model breaks this assumption. No human examined the failing code. No human designed the test that missed the problem. No human built the replica that validated the behavior.

Three critical gaps emerge:

Liability confusion: If an AI-written access management system fails, who bears responsibility? The three engineers who designed the architecture? The AI provider whose model generated the code? The company that sold the product?

Disclosure problems: When customers ask “how was this built?”, the truthful answer—“agents wrote it, other agents tested it against replicas”—provides no framework for evaluation. No industry standards define adequate satisfaction scores for agent-built software.

Contractual mismatch: Software contracts still use boilerplate language designed for human-built systems. The same limitation-of-liability clauses that once disclaimed human imperfection now disclaim the absence of humans entirely.

Skills and Oversight Erosion

StrongDM’s model doesn’t augment traditional software engineering; it replaces it. The remaining engineers design systems and monitor outputs, but they never write code themselves. The fundamental skill of reading and writing code becomes unnecessary.

As these approaches spread, the institutional knowledge needed to understand failures may disappear. When something breaks in an AI-built system, the humans who could diagnose the problem may no longer exist within the organization.

The Regulatory Challenge

Current software regulation operates reactively, responding to harm after it occurs. But AI-driven software factories create new categories of risk that existing frameworks don’t address:

  • No regulatory standard covers software that no human has reviewed
  • No audit methodology exists for agent-built code tested against simulated services
  • No procurement guidelines help buyers evaluate AI-generated software claims

A steep adoption curve would leave only a narrow window for regulatory preparation. If StrongDM’s approach spreads at its current rate, software factories could be producing significant portions of commercial software within two years.

What Comes Next

The greatest risk isn’t that AI-written code will be worse than human-written code—it may be better. The risk is that when it fails, no one will understand why or how to fix it.

Organizations considering AI-driven development need to address three questions: Who takes responsibility for AI-generated failures? How do you verify software quality without human expertise? What happens when the humans who could debug the system are no longer part of the process?

The software factory represents more than a productivity improvement. It’s a fundamental inversion of how we assign responsibility for software behavior. The regulatory and legal frameworks that govern this transition will determine whether this technology serves users or simply optimizes for metrics that miss what users actually need.