Why we built ARC around the spec.
Spec-driven development is how serious teams will ship software in the GenAI era. Here is the argument, from the top.
Code without coordination is meaningless.
Specs are not paperwork you do before the real work. They are the real work.
A spec is the only place in a software project where humans, agents, and systems can agree on what is being built, how it behaves, and how it fits with everything else. Every other artifact — handlers, tests, infrastructure, documentation, client SDKs — is a downstream consequence of that agreement. When the agreement is clear, the consequences are cheap to produce and easy to change. When the agreement is vague, the consequences are where all the work ends up, and the work is almost always the wrong work.
For most of software history, teams got away with treating the spec as a planning document. You wrote it before you started coding. You referred to it when you needed to remember what you decided. You abandoned it when it drifted from reality — which it always did, because keeping two sources of truth in sync is a tax nobody wants to pay. The code became the spec by default, and the spec became a misleading artifact that new engineers learned not to trust.
That arrangement was survivable when humans wrote all the code. It is not survivable now.
In a world where any agent can generate plausible code in seconds, the bottleneck is no longer typing. The bottleneck is making sure the code that gets written is the code you actually wanted — that it integrates with the rest of the system, and that it doesn't quietly break the thing you shipped last week. An AI coding agent produces code that plausibly satisfies a prompt; a spec is what tells you whether the code satisfies the contract.
Plausible and correct are not the same thing.
LLMs may commoditize code generation. Code coordination and contract governance will be the durable layer.
The teams that ship the most valuable software in the next decade will treat the spec as the source of truth and generate all other artifacts from it.
A spec is an input to development, not an output of it.
Spec-driven development inverts the relationship between specification and implementation. In the traditional model, you build the system and the spec describes what you built — the spec is an artifact of the code, created after the fact and maintained (or not) as the code evolves. In a spec-driven model, you write the spec and the implementation is derived from it — the code is an artifact of the spec, produced in response to it and regenerated when the spec changes.
OpenAPI is the right starting point for spec-driven development on HTTP APIs because it is the only widely adopted specification format that captures the things you actually need to generate code from: request and response shapes, validation rules, authentication, error contracts, operation semantics. It has a mature tooling ecosystem, a real standards body, and enough expressiveness to describe most of what backend APIs actually do.
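As a rough illustration of that list, here is a minimal OpenAPI 3.0 fragment written as a Python dictionary. The `/orders` path, the `createOrder` operation, and every field name are hypothetical; the point is only that shapes, validation rules, authentication, and error contracts sit together in one machine-readable structure.

```python
# Hypothetical OpenAPI 3.0 fragment, expressed as a plain dict for illustration.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "Orders API (illustrative)", "version": "1.0.0"},
    "paths": {
        "/orders": {
            "post": {
                "operationId": "createOrder",      # operation semantics
                "security": [{"bearerAuth": []}],  # authentication
                "requestBody": {                   # request shape and validation rules
                    "required": True,
                    "content": {"application/json": {"schema": {
                        "type": "object",
                        "required": ["sku", "quantity"],
                        "properties": {
                            "sku": {"type": "string"},
                            "quantity": {"type": "integer", "minimum": 1},
                        },
                    }}},
                },
                "responses": {                     # success and error contracts
                    "201": {"description": "Order created"},
                    "422": {"description": "Validation error"},
                },
            }
        }
    },
    "components": {
        "securitySchemes": {"bearerAuth": {"type": "http", "scheme": "bearer"}}
    },
}


def summarize(spec):
    """List each operation with its auth requirement and documented status codes."""
    rows = []
    for path, methods in spec["paths"].items():
        for method, op in methods.items():
            rows.append({
                "operation": op["operationId"],
                "route": f"{method.upper()} {path}",
                "secured": bool(op.get("security")),
                "statuses": sorted(op["responses"]),
            })
    return rows


print(summarize(spec))
```

Everything a generator needs to know about this one endpoint can be read from the structure without opening a handler.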
It is imperfect in specific ways that matter for generation — which is a problem the tooling layer can solve — but it is the format everyone already knows, and that is worth more than any theoretically cleaner alternative. The remaining question is how far you take it. That is what the levels are for.
Where is your team today?
Three levels, each closing a specific gap the previous level left open. Each level has a distinct limitation that motivates the next one.
Level 1. You write the spec before you write the code. You use it to scaffold handlers, generate types, or guide an AI coding agent. The spec is an input to the first commit, and then it stops being consulted.
What you get: a better first commit. AI coding agents produce dramatically more accurate code when handed a validated spec instead of a prose description.
Teams that design in the spec catch API inconsistencies before they calcify into implementations.
What you don't get: an ongoing source of truth. The spec is a planning document that gets abandoned once the code exists. Within weeks of the first deploy, the spec and the code have diverged.
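A minimal sketch of what one-time scaffolding might look like, assuming the spec is already loaded as a dictionary and every operation carries an `operationId`. The `scaffold_handlers` helper and the sample operations are hypothetical, not part of any named tool.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def scaffold_handlers(spec):
    """Emit one Python handler stub per operation in an OpenAPI-like dict."""
    stubs = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method, op in sorted(item.items()):
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            name = op["operationId"]  # assumed present and a valid identifier
            stubs.append(
                f"def {name}(request):\n"
                f'    """{method.upper()} {path}"""\n'
                f"    raise NotImplementedError\n"
            )
    return "\n".join(stubs)


# Hypothetical two-operation spec.
spec = {"paths": {"/orders": {
    "get": {"operationId": "list_orders"},
    "post": {"operationId": "create_order"},
}}}

print(scaffold_handlers(spec))
```

Once the stubs are written to disk and edited by hand, nothing ties them back to the spec, which is exactly the gap this level leaves open.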
Level 2. The spec lives in version control next to the code. Contract tests run on every pull request, checking that the implementation matches the spec. Breaking changes are blocked unless the spec version is bumped explicitly. Spectral lints the spec itself.
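The breaking-change gate can be sketched as a comparison between the spec on the main branch and the spec in the pull request. This is an assumption-laden illustration: it checks only two rules (removed operations and newly required request fields), it treats "bumped explicitly" as a major version increase, and all function names are hypothetical.

```python
def required_fields(op):
    """Return the required JSON request fields of one operation, if any."""
    content = op.get("requestBody", {}).get("content", {})
    schema = content.get("application/json", {}).get("schema", {})
    return schema.get("required", [])


def breaking_changes(old, new):
    """List breaking changes between two OpenAPI-like dicts (two rules only)."""
    problems = []
    for path, item in old.get("paths", {}).items():
        for method, old_op in item.items():
            new_op = new.get("paths", {}).get(path, {}).get(method)
            if new_op is None:
                problems.append(f"removed {method.upper()} {path}")
                continue
            added = set(required_fields(new_op)) - set(required_fields(old_op))
            for field in sorted(added):
                problems.append(f"{method.upper()} {path}: new required field '{field}'")
    return problems


def major_version(version):
    return int(version.split(".")[0])


def gate(old, new):
    """Allow the change only if it is non-breaking or the major version went up."""
    problems = breaking_changes(old, new)
    bumped = major_version(new["info"]["version"]) > major_version(old["info"]["version"])
    return (not problems) or bumped, problems


def order_spec(version, required):
    """Build a tiny hypothetical spec with one POST /orders operation."""
    body = {"content": {"application/json": {"schema": {"type": "object", "required": required}}}}
    return {"info": {"version": version}, "paths": {"/orders": {"post": {"requestBody": body}}}}


old = order_spec("1.2.0", ["sku"])
minor_bump = order_spec("1.3.0", ["sku", "quantity"])
major_bump = order_spec("2.0.0", ["sku", "quantity"])

print(gate(old, minor_bump))  # blocked: breaking change without a major bump
print(gate(old, major_bump))  # allowed: the break is declared in the version
```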
What you get: a spec you can trust. Consumer teams can rely on it. New engineers can onboard by reading it.
Generated artifacts (SDKs, docs, mock servers) stay fresh because the spec is actively maintained.
What you don't get: one source of truth. The spec and the code are still two separate things that humans keep in sync through discipline.
CI catches drift after it happens. Under deadline pressure, the discipline erodes. The tooling just narrows the window in which the erosion is invisible.
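To make "catches drift after it happens" concrete, here is a sketch of a CI check that compares the operations declared in the spec with the routes the application actually registers. How the route list is obtained depends on the framework, so `implemented_routes` is simply assumed to be a list of `(METHOD, path)` pairs; `find_drift` is a hypothetical helper.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def find_drift(spec, implemented_routes):
    """Report operations that exist on only one side of the spec/code boundary."""
    declared = {
        (method.upper(), path)
        for path, item in spec.get("paths", {}).items()
        for method in item
        if method in HTTP_METHODS
    }
    implemented = set(implemented_routes)
    return {
        "missing_in_code": sorted(declared - implemented),
        "missing_in_spec": sorted(implemented - declared),
    }


# Hypothetical state after a rushed release: one endpoint never got built,
# and one was added without touching the spec.
spec = {"paths": {"/orders": {"get": {}, "post": {}}}}
routes = [("GET", "/orders"), ("DELETE", "/orders/{id}")]

print(find_drift(spec, routes))
```

The check can only report a mismatch that already exists in the branch; it cannot stop the two files from being edited independently.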
Level 3. The spec is the only file a human edits. Handlers, validators, infrastructure configuration, tests, documentation, and client SDKs are all derived from the spec by a generation pipeline. The code is a build output. The workspace is the contract.
What you get: drift becomes architecturally impossible, not just detectable. There is no path by which a handler can disagree with the spec, because the handler is the spec, materialized.
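One way to picture "the spec, materialized" is a validator that is built from the schema instead of written next to it. The sketch below is not ARC's pipeline; it is a hypothetical `build_validator` that supports only a small subset of JSON Schema (`required`, `properties`, `type`, `minimum`) to show the shape of the idea.

```python
def build_validator(schema):
    """Derive a request validator from a small subset of JSON Schema."""
    type_map = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

    def validate(payload):
        errors = []
        for name in schema.get("required", []):
            if name not in payload:
                errors.append(f"{name}: required")
        for name, rules in schema.get("properties", {}).items():
            if name not in payload:
                continue
            value = payload[name]
            expected = type_map.get(rules.get("type"))
            # bool is a subclass of int in Python, so reject it explicitly for integers.
            wrong_type = expected is not None and (
                not isinstance(value, expected)
                or (expected is int and isinstance(value, bool))
            )
            if wrong_type:
                errors.append(f"{name}: expected {rules['type']}")
            elif "minimum" in rules and isinstance(value, (int, float)) and value < rules["minimum"]:
                errors.append(f"{name}: must be >= {rules['minimum']}")
        return errors

    return validate


# Hypothetical request schema taken from a spec.
schema = {
    "type": "object",
    "required": ["sku", "quantity"],
    "properties": {
        "sku": {"type": "string"},
        "quantity": {"type": "integer", "minimum": 1},
    },
}

validate = build_validator(schema)
print(validate({"sku": "A-1", "quantity": 2}))
print(validate({"quantity": 0}))
```

Because the validator has no rules of its own, changing `minimum` in the schema changes the behavior on the next build, and there is no second copy of the rule to forget.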
The human design surface shrinks to the contract layer. Teams spend time on API design, business logic, and data modeling — not on boilerplate, validation wiring, or documentation maintenance.
An entire class of production bug disappears.
What it requires: trust in the generation pipeline. If the pipeline has a bug, the bug propagates to every generated artifact simultaneously.
Brownfield adoption is a migration, not a flip of a switch.
We think the tradeoff is worth it. We think it is obviously worth it once a team has seen it work. That is what ARC is for.
AI coding agents make drift worse, not better.
AI-generated PRs have more critical & major issues — and 75% more logic errors.
A study conducted by AI-powered code review company CodeRabbit shows that AI accelerates output, but it also amplifies certain categories of mistakes.
10.83 issues per PR for AI vs. 6.45 for human-only. High-issue outliers were significantly more common, creating unpredictable review workloads.
Logic errors here means business logic mistakes, incorrect dependencies, flawed control flow, and misconfigurations, the most expensive class of bug to find and fix.
AI coding tools are powerful accelerators, but acceleration without guardrails increases risk. The analysis shows that AI-generated code is consistently more variable, more error-prone, and more likely to introduce high-severity issues without the right protections in place.
Using AI-generated code successfully requires a mechanism that checks the code is spec-compliant, not just plausible. Without a contract to measure against, you have no verifiable criterion for correctness.
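The difference between plausible and spec-compliant can be shown with a small check that compares one observed response against the operation's documented responses. The operation, the field names, and the `check_response` helper are all hypothetical, and the check covers only status codes and required fields.

```python
def check_response(operation, status, body):
    """Compare one observed response with the operation's documented contract."""
    documented = operation.get("responses", {})
    entry = documented.get(str(status))
    if entry is None:
        return [f"status {status} is not documented (expected one of {sorted(documented)})"]
    schema = entry.get("content", {}).get("application/json", {}).get("schema", {})
    missing = [field for field in schema.get("required", []) if field not in body]
    return [f"response body is missing required field '{field}'" for field in missing]


# Hypothetical contract for creating an order.
create_order = {
    "responses": {
        "201": {
            "description": "Order created",
            "content": {"application/json": {"schema": {
                "type": "object",
                "required": ["id", "status"],
            }}},
        },
        "422": {"description": "Validation error"},
    }
}

# A handler that returns 200 with an id looks reasonable but violates the contract.
print(check_response(create_order, 200, {"id": 1}))
print(check_response(create_order, 201, {"id": 1}))
print(check_response(create_order, 201, {"id": 1, "status": "pending"}))
```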
We have a coordination problem, not a speed problem.
The rise of AI-integrated IDEs and agentic coding tools has moved spec-driven development from "a good idea most teams should adopt" to the only sustainable way to ship at AI-assisted velocity. A model can produce a plausible handler in seconds. A team deciding whether that handler matches the contract takes hours. The faster the generation, the more important the contract.
Code generation has become cheap, but code coordination has not. Without a single source of truth for the entire system, you have an unknown number of agents and engineers making diverging decisions, and those decisions turn into merge conflicts, rework, and outages instead of deployable code.
Adopting AI effectively
CodeRabbit identified seven practices that measurably reduce AI-introduced defects. These aren't process suggestions — they're structural interventions that close specific failure modes.
- 01 Give AI the context it needs
AI makes more mistakes when it lacks business rules, configuration patterns, or architectural constraints. Provide prompt snippets, repo-specific instruction capsules, and configuration schemas to reduce misconfigurations and logic drift.
ARCitect provides the context at the source — your OpenAPI spec carries the business rules, schema constraints, and architectural decisions that agents need.
ARC Code Reactor uses deterministic code generation to eliminate the possibility of invalid code output.
- 02 Use policy-as-code to enforce style
Readability and formatting are among the biggest AI-driven gaps. CI-enforced formatters, linters, and style guides eliminate entire categories of issues before they reach review.
ARC Code Reactor generates systems with standard formatting configured and Git hooks enabled by default.
- 03 Add correctness safety rails
Given the rise in logic and error-handling issues, require tests for non-trivial control flow, mandate nullability and type assertions, standardize exception-handling rules, and explicitly prompt for guardrails where needed.
ARC Code Reactor goes beyond generation-time validation — it generates comprehensive automated contract testing suites.
- 04 Strengthen security defaults
Mitigate elevated vulnerability rates by centralizing credential handling, blocking ad-hoc password usage, and running SAST and security linters automatically on every commit.
ARC Code Reactor systems follow the AWS Well-Architected Framework, with automated security posture analysis on the roadmap.
- 05 Nudge the model toward efficient patterns
Offer guidelines for batching I/O, choosing appropriate data structures, and embedding performance hints in prompts before generation begins.
ARC Code Reactor excels at this: the entire deterministic system is composed of consistent, efficient patterns. No need for nudges when you have guaranteed code output.
- 06 Adopt AI-aware PR checklists
Reviewers should explicitly ask: Are error paths covered? Are concurrency primitives correct? Are configuration values validated? Are credentials handled via the approved helper? These questions target the areas where AI is most error-prone.
We are actively developing the next level of ARC Code Reactor test suite generation, which will provide automatic comprehensive system testing.
- 07 Get help reviewing and testing AI code
Code review pipelines weren't built to handle the volume teams are now shipping with AI. Reviewer fatigue leads to more missed bugs. An AI code review tool like CodeRabbit standardizes quality across different AI tools, acts as a third-party source of truth, and reduces the cognitive load of review — letting developers concentrate on the complex parts.
ARC Code Reactor provides deterministic, validated code along with a comprehensive test suite for additional validation — but we recommend leveraging any additional tools that help the team collaborate with AI most effectively.
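Several of the practices above (policy-as-code, blocking ad-hoc password usage, security linting on every commit) come down to small automated checks. As one hedged illustration, here is a sketch of a check that flags hard-coded credentials in source text. The pattern and the `find_hardcoded_secrets` helper are hypothetical and far cruder than a real SAST tool.

```python
import re

# Illustrative pattern only: flags assignments such as password = "..." or api_key: '...'.
SECRET_PATTERN = re.compile(
    r"""(password|passwd|secret|api_key|token)\w*\s*[:=]\s*["'][^"']+["']""",
    re.IGNORECASE,
)


def find_hardcoded_secrets(source):
    """Return the 1-based line numbers that look like hard-coded credentials."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if SECRET_PATTERN.search(line):
            findings.append(number)
    return findings


# Hypothetical snippet under review.
sample = "\n".join([
    'db_password = "hunter2"',
    'password = os.environ["DB_PASSWORD"]',
    "token_ttl = 30",
    "api_key: 'abc123'",
])

print(find_hardcoded_secrets(sample))
```

A check like this would typically run in a Git hook or CI step and fail the commit when the list is non-empty.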
What Level 3 looks like in practice.
Two versions of a feature request. Same team, same project, very different outcomes.
The Level 3 project is not an impossible utopia. It is what happens when the tool is right and the discipline is architectural instead of cultural. The code is a build output, and trusting it is the same kind of trust you already extend to your TypeScript compiler.
Level 3 means you can compile systems from specs.
Ready to start?
ARCitect is the first piece of spec-as-source we're putting in your hands. It is the editor where the spec lives and from which everything downstream is derived.