Case Study: ARC Pilot Project #1

Overview

The Short Version

This case study analyzes ARC Pilot Project #1, an application we built for a design partner. ARC Pilot Project #1 is a production-ready multi-tenant SaaS platform that automates TCPA-compliant SMS payment reminders for lenders — handling everything from borrower consent and timezone-aware scheduling to exactly-once delivery guarantees.

Code Reactor v0.0

Using Code Reactor v0 in late 2025, we generated 85% of the finished system's code in a single shot on the first day of the project.

Code Reactor v1.0

We recently released Code Reactor v1, which improved code and test generation across the board. In particular, v1 raised unit and contract test coverage to 100% across all services.

Five purpose-built microservices form a cohesive pipeline from tenant onboarding through to compliant, time-aware borrower communication. ARC Code Reactor generated the overwhelming majority of that codebase (source code, test suites, and infrastructure-as-code) from a validated OpenAPI spec.

The ARC workflow used to build ARC Pilot Project #1 can accelerate your next complex software project from months to days — without sacrificing architecture quality or compliance rigor.

Case Study Metrics

Two Things We Set Out to Prove

This case study measures ARC Code Reactor against two distinct questions:

How much of a production system can be generated from a spec?
How well does the generated code hold up under automated testing?

Both matter — generation rate without coverage is untested code at scale, and coverage without generation rate is just a better test harness for code you still had to write by hand.

Metric 01

Code Generation Rate

The percentage of the final deployed codebase that was produced by ARC Code Reactor versus written by a developer. Measured per service as LOC Generated ÷ Total LOC Deployed.

Metric 02

Test Code Coverage

The percentage of the deployed codebase exercised by ARC-generated test suites. Measured across unit, contract, and functional test layers using Vitest branch coverage as the primary signal.

Code Generation Rate

During the pilot project, ARC produced more than 85% of all deployed code in a single, deterministic pass. That is more than three times the cited industry average for AI-authored production code.

Source: Laura Tacho (CTO, DX) — “Measuring Developer Productivity & AI Impact” (2026)

26.9% of production code is AI-authored — up from 22% the prior quarter

Test Code Coverage

ARC didn't just generate most of the system code. It also generated the test suites that automatically validate it. The generated suite reached 88% code coverage, which gives a high degree of confidence in the system.

Source: Kochhar, Thung, Lo & Lawall, APSEC 2014 — “An Empirical Study on the Adequacy of Testing in Open Source Projects”

A 2014 empirical study of open-source projects found average code coverage of ~42%

System Overview

Five Services, One Cohesive Platform

Each service owns a distinct domain. Together they handle the full lifecycle — from tenant provisioning to compliant borrower communication.

Service	Admin	Auth	Batch	Core	Messaging
Endpoints	3	1	1	22	1
AWS Resources	87	39	32	451	54
Lines of Code	2,846	749	661	21,245	1,226

Service Overview

Admin Service

Endpoints: 3
AWS Resources: 87
LOC: 2,846

Description

Handles provisioning of lender accounts, campaign configuration, and system-wide settings. Ensures each tenant operates in a fully isolated context within the multi-tenant architecture.

Tags: Multi-Tenant, Provisioning, AWS

Auth Service

Endpoints: 1
AWS Resources: 39
LOC: 749

Description

Integrates with Auth0 to handle authentication and authorization across all services. Enforces role-based access control and tenant-scoped request isolation via JWT validation.

Tags: Auth0, JWT, RBAC

Core Service

Endpoints: 22
AWS Resources: 451
LOC: 21,245

Description

The operational hub for lenders. Manages loans, borrowers, and campaign lifecycle via a RESTful API, with business logic covering campaign stages, due date tracking, and contact records.

Tags: REST API, Loans, Campaigns

Messaging Service

Endpoints: 1
AWS Resources: 54
LOC: 1,226

Description

Owns all SMS communication via Twilio: outbound notifications, inbound webhook handling, conversation history, and opt-out consent management.

Tags: Twilio, SMS, Webhooks, TCPA
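
To make the opt-out consent flow concrete, here is a minimal sketch of how an inbound webhook handler might record an opt-out. It is not the pilot's actual code. `From` and `Body` are the form fields Twilio sends with an inbound message; the keyword list, the in-memory `consent` map, and the function names are assumptions made for illustration.

```typescript
// Subset of the fields Twilio posts to an inbound-message webhook.
interface InboundSms {
  From: string; // sender's phone number
  Body: string; // message text
}

// Assumed keyword list, modeled on common carrier opt-out words.
const OPT_OUT_KEYWORDS = new Set(["STOP", "STOPALL", "UNSUBSCRIBE", "CANCEL", "END", "QUIT"]);

// Hypothetical consent store keyed by phone number (a real service would persist this).
const consent = new Map<string, boolean>();

// Records an opt-out when the whole message is an opt-out keyword.
function handleInbound(msg: InboundSms): "opted_out" | "recorded" {
  const keyword = msg.Body.trim().toUpperCase();
  if (OPT_OUT_KEYWORDS.has(keyword)) {
    consent.set(msg.From, false);
    return "opted_out";
  }
  return "recorded"; // ordinary reply, kept as conversation history
}

// Outbound sends are allowed only with explicit, unrevoked consent.
function canSend(phone: string): boolean {
  return consent.get(phone) === true;
}
```

Treating missing consent as "do not send" keeps the default conservative: a number that was never opted in is handled the same way as one that opted out.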

Batch Service

Endpoints: 1
AWS Resources: 32
LOC: 661

Description

The notification engine. Runs on a daily EventBridge schedule, evaluating each loan against campaign stage, due date, consent status, and notification history to trigger compliant outreach.

Tags: EventBridge, Scheduling, Compliance
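
The per-loan evaluation described for the Batch service could look roughly like the sketch below. It is an illustration under stated assumptions, not the pilot's implementation: the `Loan` shape, the `stageReminderDays` field, and the 8:00 to 21:00 borrower-local sending window are all hypothetical choices made for the example.

```typescript
interface Loan {
  dueDate: string;             // ISO date, e.g. "2025-01-18"
  timeZone: string;            // borrower's IANA zone, e.g. "America/New_York"
  hasConsent: boolean;
  stageReminderDays: number[]; // days before the due date on which the current stage sends
  notifiedOn: string[];        // borrower-local ISO dates already notified
}

// Borrower-local calendar date and hour for a given instant.
function localParts(now: Date, timeZone: string): { date: string; hour: number } {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    hour12: false,
  }).formatToParts(now);
  const get = (type: string): string => parts.find((p) => p.type === type)?.value ?? "";
  // Some engines report midnight as "24" with hour12: false, hence the modulo.
  return { date: `${get("year")}-${get("month")}-${get("day")}`, hour: Number(get("hour")) % 24 };
}

function shouldNotify(loan: Loan, now: Date): boolean {
  if (!loan.hasConsent) return false;                // consent status
  const { date, hour } = localParts(now, loan.timeZone);
  if (hour < 8 || hour >= 21) return false;          // assumed quiet-hours window
  if (loan.notifiedOn.includes(date)) return false;  // notification history
  // Both strings are date-only ISO values, so they parse as UTC midnights.
  const daysUntilDue = Math.round((Date.parse(loan.dueDate) - Date.parse(date)) / 86400000);
  return loan.stageReminderDays.includes(daysUntilDue); // campaign stage and due date
}
```

Evaluating the date and hour in the borrower's time zone rather than the server's is what makes the check time-aware: the same instant can fall inside the window for one borrower and outside it for another.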

Source Code

Metric 1: Code Generation Rate

ARC Code Reactor produces three categories of output from a validated OpenAPI spec: source code, comprehensive test suites, and infrastructure-as-code. Together, these outputs represent the complete deployable artifact — not a scaffold to iterate from, but a production-ready baseline.

The table below shows the breakdown of generated versus manually written lines of code across all five services. The LOC Delta row reflects custom business logic, environment-specific wiring, and developer-authored overrides beyond the spec-defined surface area.

Metric	Admin	Auth	Batch	Core	Messaging
LOC Generated (ARC)	2,846	749	661	21,245	1,226
LOC Delta (Manual)	291	31	3,379	4,079	77
% Generated	90.7%	96.0%	16.4%	83.9%	94.1%

Note on Batch Service: The Batch service's lower generation rate reflects its role as the custom notification engine — containing a high concentration of bespoke business logic for TCPA compliance, timezone-aware scheduling, and campaign-stage evaluation that falls outside standard CRUD patterns. Code Reactor's generation rate improves as more of this logic is expressed through the spec.
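
The per-service percentages in the table can be reproduced with a few lines of arithmetic. The sketch below assumes that Total LOC Deployed is the sum of the generated and manual rows, which is consistent with the published figures; the type and function names are illustrative only.

```typescript
// LOC figures copied from the table above.
interface ServiceLoc {
  service: string;
  generated: number; // LOC Generated (ARC)
  manual: number;    // LOC Delta (Manual)
}

const services: ServiceLoc[] = [
  { service: "Admin", generated: 2846, manual: 291 },
  { service: "Auth", generated: 749, manual: 31 },
  { service: "Batch", generated: 661, manual: 3379 },
  { service: "Core", generated: 21245, manual: 4079 },
  { service: "Messaging", generated: 1226, manual: 77 },
];

// % Generated = LOC Generated / (LOC Generated + LOC Delta), rounded to one decimal.
function generationRate(s: ServiceLoc): number {
  const total = s.generated + s.manual;
  return Math.round((s.generated / total) * 1000) / 10;
}

for (const s of services) {
  console.log(`${s.service}: ${generationRate(s).toFixed(1)}%`);
}
```

For example, Admin works out to 2,846 / (2,846 + 291) = 2,846 / 3,137, or 90.7%.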

Estimating Dev Velocity Improvements

We used ARCitect's project estimation workflow to produce agent-based effort estimates. Measured against those estimates, the pilot was completed 3.5x faster than manual development and 50% faster than AI-assisted development without ARC.

ARC Velocity: Building a production-ready system with “exemplary” test coverage within a single sprint is impressive and would allow any team to ship faster without sacrificing quality. The only thing better is being able to do it whenever the spec changes.

Code Coverage

What Does Good Coverage Actually Look Like?

Before reviewing the numbers ARC generated, it helps to have an industry baseline. Code coverage is not a perfect measure of test quality — a codebase can have high coverage and still ship bugs if tests assert the wrong things. But a low coverage number does make one thing certain.

“A low code coverage number does guarantee that large areas of the product are going completely untested by automation on every single deployment.”

That guaranteed blind spot increases the risk of pushing bad code to production on every release. Coverage gives teams an objective, actionable signal: not a perfect one, but a meaningful floor. Google's engineering team offers the most widely cited industry benchmark, with guidelines drawn from experience at scale across thousands of services:

Acceptable

60%

A functional floor. Large areas of code still go untested on every deployment.

Commendable

75%

A solid baseline. Most critical paths are exercised, edge cases still present risk.

Exemplary

90%

The standard ARC targets. High confidence that automation catches regressions before production.

Source: Google Testing Blog — “Code Coverage Best Practices” (2020). Although there is no ideal coverage number, Google offers 60% as acceptable, 75% as commendable, and 90% as exemplary across thousands of production services.

Code coverage provides significant benefits to the developer workflow. It is not a perfect measure of test quality, but it does offer a reasonable, objective, industry-standard metric with actionable data. The sections below show where ARC-generated test suites land against these benchmarks — on day one, before a developer writes a single manual test.

Code Coverage

Code Coverage Metrics Overview

Test coverage is measured across multiple dimensions — and the numbers can tell very different stories depending on which metric you look at. The table below presents a summary of key coverage types per service. What matters most is the direction: high coverage across all three categories means the generated code is being exercised broadly, from individual functions through to complete request paths.

Primer: How to read coverage metrics

Vitest reports four distinct coverage metrics for every test run. Each measures a different aspect of how thoroughly the test suite exercises the codebase. They are related but not equivalent — a service can have high line coverage and low branch coverage, which would mean conditional logic is largely untested.

Statements (reference)

The percentage of individual executable statements that were run at least once. Similar to line coverage but more granular — multiple statements on one line are counted separately.

Branch (presented)

The percentage of decision branches exercised — every if, else, switch, and ternary path. This is the strictest signal: a line can be “covered” while one of its branches remains completely untested.

Functions (reference)

The percentage of declared functions that were called at least once. A useful coarse-grained signal for identifying dead code or untested handlers, but it says nothing about whether the logic inside those functions was fully exercised.

Lines (reference)

The percentage of source lines executed during the test run. The most intuitive metric and the one closest to what most engineers picture when they hear “code coverage.” It is listed here for reference; the tables below present branch coverage.

Why branches? Branch coverage is the stricter and more meaningful signal, so it is the figure presented here. Complete coverage numbers are included in the appendix for readers who want the full picture.
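
The gap between line and branch coverage is easiest to see on a one-line function. The helper below is hypothetical and not taken from the pilot codebase; it only illustrates how a single call can execute every line while leaving a branch untested.

```typescript
// Hypothetical helper: one line of logic, two branches.
function reminderChannel(hasSmsConsent: boolean): string {
  return hasSmsConsent ? "sms" : "none";
}

// This call alone executes the only line in the function body,
// but it exercises just the `true` side of the ternary.
const consented = reminderChannel(true);

// A second call is needed before both branches count as covered.
const notConsented = reminderChannel(false);

console.log(consented, notConsented);
```

A suite containing only the first call would report full line coverage for this function and half its branches, which is why the branch figure is the more demanding of the two.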

Test Pyramid

Test Pyramid Overview

The Testing Pyramid is a strategic framework that prioritizes a large base of fast, automated checks to catch errors early, ensuring software remains reliable and cost-effective as it grows.

Functional Testing: Checks the software against specific business requirements to confirm that the system actually does what the user expects it to do.

Contract Testing: Verifies that two different systems (like a frontend and a backend) agree on how to communicate, ensuring they can exchange data without breaking.

Unit Testing: Focuses on the smallest parts of the application, like individual functions or components, to ensure each piece of logic works correctly in isolation.

Primer: ARC Testing Pyramid

Functional

Functional Testing

ARC understands your code at the structural level — going far beyond a simple Markdown context file. This understanding enables functional tests that exercise each code path created by business logic, verifying that exact business rules and constraints were implemented correctly.

Contract

Contract Testing

ARC automatically generates contract test suites that verify API contract compliance — confirming schema conformance, endpoint existence, and status code correctness against the authoritative OpenAPI spec. These tests make spec drift immediately visible in CI.

Unit

Unit Testing

ARC generates unit tests for low-level functions related to data models. Code Reactor v1 extends this to include test coverage for utility functions defined within ARCitect — reducing the surface area that requires manual test authorship.

Code Coverage Metrics Overview

Metric 2: Test Code Coverage

ARC doesn't just generate application code. It generates comprehensive test suites across the conventional testing pyramid — from low-level unit tests to API contract verification. All generated tests are passing.

Coverage Type	Admin	Auth	Batch	Core	Messaging
Unit Test	91.3%	100%	100%	95.6%	100%
Contract Test	80%	100%	78.8%	98.4%	75%
Functional Test	82.4%	66.7%	75.5%	78.4%	60%
Total	83.6%	80%	76.3%	89.8%	63.15%

Below acceptable <60%

Acceptable 60–74%

Commendable 75–89%

Exemplary 90%+
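
As a small worked example, the legend can be applied to the Total row of the table. The sketch assumes each band is inclusive at its lower bound (so exactly 75% counts as Commendable); the function name is illustrative.

```typescript
type Tier = "Below acceptable" | "Acceptable" | "Commendable" | "Exemplary";

// Thresholds follow the 60 / 75 / 90 guideline cited in this case study.
function coverageTier(percent: number): Tier {
  if (percent >= 90) return "Exemplary";
  if (percent >= 75) return "Commendable";
  if (percent >= 60) return "Acceptable";
  return "Below acceptable";
}

// "Total" row from the coverage table above.
const totals: Record<string, number> = {
  Admin: 83.6,
  Auth: 80,
  Batch: 76.3,
  Core: 89.8,
  Messaging: 63.15,
};

for (const service of Object.keys(totals)) {
  console.log(`${service}: ${totals[service]}% -> ${coverageTier(totals[service])}`);
}
```

Under these thresholds, Admin, Auth, Batch, and Core fall in the Commendable band and Messaging in the Acceptable band.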

Unit Testing

Unit Test Coverage

Coverage of generated unit tests per service, comparing v0 output to the v1 rerun. All tests passing.

Version	Admin	Auth	Batch	Core	Messaging
v0.0	91.3%	100%	100%	95.6%	100%
v1.0	100%	100%	100%	100%	100%

Contract Testing

Contract Test Coverage

ARC generates contract tests that verify API shape conformance — schema correctness, endpoint existence, and expected status codes — against the authoritative OpenAPI spec. These run in CI and make spec drift immediately visible before it reaches production.

Version	Admin	Auth	Batch	Core	Messaging
v0.0	80.0%	100%	78.8%	98.4%	75%
v1.0	100%	100%	100%	100%	100%
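
To illustrate the kind of check a contract test performs, the sketch below validates one observed response against a pared-down spec. It is not the test code ARC generates: the `Spec` shape is a simplified, hypothetical stand-in for an OpenAPI document, and the `/loans/{loanId}` path and its fields are invented for the example.

```typescript
// Hypothetical slice of a spec: path -> method -> status code -> required response fields.
type Spec = Record<string, Record<string, Record<string, { required: string[] }>>>;

interface Observed {
  path: string;
  method: string;
  status: number;
  body: Record<string, unknown>;
}

// Returns a list of contract violations; an empty list means the response conforms.
function contractViolations(spec: Spec, res: Observed): string[] {
  const operation = spec[res.path]?.[res.method];
  if (!operation) return [`${res.method.toUpperCase()} ${res.path} is not in the spec`]; // endpoint existence
  const response = operation[String(res.status)];
  if (!response) return [`status ${res.status} is not declared for ${res.path}`];        // status code
  return response.required                                                               // schema conformance
    .filter((field) => !(field in res.body))
    .map((field) => `missing required field "${field}"`);
}

const spec: Spec = {
  "/loans/{loanId}": {
    get: {
      "200": { required: ["id", "dueDate"] },
      "404": { required: ["message"] },
    },
  },
};

console.log(
  contractViolations(spec, { path: "/loans/{loanId}", method: "get", status: 200, body: { id: "l-1" } }),
);
```

Because the spec is the single source of truth, a handler that drops a required field or starts returning an undeclared status code shows up as a non-empty violation list in CI.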

Functional Testing

Functional Test Coverage

ARC understands your code at the structural level — going far beyond a simple Markdown context file. This understanding enables functional tests that exercise each code path created by business logic, verifying that exact business rules and constraints were implemented correctly.

Using this in-development functional test generation capability, we achieved comprehensive coverage across the ARC Pilot Project #1 system in one sprint of developer effort. The next Code Reactor release will automatically generate functional tests with 90%+ coverage.

Version	Admin	Auth	Batch	Core	Messaging
v0.0	83.6%	66.7%	75.8%	89.8%	63.15%

Infrastructure as Code

Every Resource, Generated

ARC Code Reactor doesn't stop at application code. For every service in the ARC Pilot Project #1 platform, ARC generated the complete AWS infrastructure definition — API Gateway configurations, Lambda functions, DynamoDB tables, SQS queues, EventBridge rules, and IAM roles — as production-ready Infrastructure-as-Code.

Across the five ARC Pilot Project #1 services, ARC generated definitions for 663 AWS resources. These are not generic templates — they are generated directly from the validated OpenAPI spec and reflect the exact shape of each service's API surface and operational requirements.

Service	Admin	Auth	Batch	Core	Messaging	Total
AWS Resources Generated	87	39	32	451	54	663