
How I use AI to write tests (without making a mess)

I ask AI to write a bunch of tests, then delete most of them. Sounds weird, but it works way better than trying to guide AI from the start.


TL;DR: Ask the agent to write broad tests, then review and keep only the ones that catch likely production issues. Result: fewer tests, clearer signal, lower maintenance.

The Problem with Traditional Test Writing

When I added tests for analytics in a feature, I needed to be sure every viewed event was wired correctly without spending a weekend on boilerplate. Tools like Copilot or Claude can generate lots of tests, but without guidance they add noise: checks of internals, repeated cases for the same branch, and fragile mocks.

What works for me is a simple two‑step approach: ask the agent to write broad coverage, then keep only the few tests that catch real problems.

Phase 1: Generate Comprehensive Tests

The Initial Prompt

Create a draft PR with tests for all analytics `viewed` events in the feature to prevent accidental breaks.

Why this works: I don’t constrain the AI here. I ask it to search the codebase and find every event, every prop combination, and every edge case. The first pass is discovery.

The agent explored the codebase and produced tests for two places where we track viewed events:

  • ViewTracker (12 tests): reusable page‑view tracker (rough sketch below)
  • TotalsPanel (15 tests): transaction fee display tracking
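
To make the test examples concrete, here is a rough, hypothetical sketch of what a reusable page‑view tracker like ViewTracker could look like. The prop names, the `track` helper, and the `manager` path check are inferred from the generated test names in the next section, not taken from the real component.

// Hypothetical sketch of a reusable page-view tracker, for illustration only
import { useEffect } from 'react';
import { track } from './analytics'; // assumed analytics client

type ViewTrackerProps = {
  pageName: string;
  appName: string;
  tenantId?: string;
};

export function ViewTracker({ pageName, appName, tenantId }: ViewTrackerProps) {
  useEffect(() => {
    // Skip internal admin routes, mirroring the "manager" check the tests cover
    if (window.location.pathname.includes('manager')) return;
    track('viewed', { pageName, appName, tenantId, url: window.location.href });
  }, [pageName, appName, tenantId]);

  return null;
}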

What the AI Generated

Representative cases for ViewTracker:

it('tracks viewed with correct payload on mount', () => { /* ... */ })
it('skips tracking when pathname includes "manager"', () => { /* ... */ })
it('handles optional tenantId correctly', () => { /* ... */ })
it('re-tracks when pageName changes', () => { /* ... */ })
// + variations for component/app/page names, and URL payload fields
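
To show what those stubs expand to, here is the first one filled in. It's a sketch assuming Jest, React Testing Library, and an analytics client mocked at the module boundary; the import paths and payload fields are placeholders, not the project's real API.

import { render } from '@testing-library/react';
import { ViewTracker } from './ViewTracker';
import { track } from './analytics';

// Mock the analytics client so the test asserts behavior, not network calls
jest.mock('./analytics', () => ({ track: jest.fn() }));

it('tracks viewed with correct payload on mount', () => {
  render(<ViewTracker pageName="checkout" appName="payments" />);

  // Assert only the fields the data pipeline depends on
  expect(track).toHaveBeenCalledTimes(1);
  expect(track).toHaveBeenCalledWith(
    'viewed',
    expect.objectContaining({ pageName: 'checkout', appName: 'payments' }),
  );
});

The expect.objectContaining is deliberate: asserting only the fields that matter is what keeps a test like this out of the “overly specific mock assertions” pile later.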

And for TotalsPanel:

it('tracks fee_displayed with correct payload', () => { /* ... */ })
it('does not track when the feature is disabled', () => { /* ... */ })
it('does not track when the fee is null', () => { /* ... */ })
it('tracks only once on initial render', () => { /* ... */ })
it('does not emit unrelated viewed events', () => { /* ... */ })
// + fee amount variants, membership permutations, etc.
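
And one of the conditional TotalsPanel cases, with the same caveat: the props and flag handling here are stand-ins for whatever the real component takes.

import { render } from '@testing-library/react';
import { TotalsPanel } from './TotalsPanel';
import { track } from './analytics';

jest.mock('./analytics', () => ({ track: jest.fn() }));

it('does not track when the fee is null', () => {
  // No fee on screen means no fee_displayed event should fire
  render(<TotalsPanel fee={null} featureEnabled />);

  expect(track).not.toHaveBeenCalledWith('fee_displayed', expect.anything());
});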

Phase 2: Ruthless Filtering

Now comes the important part: I reviewed each test and asked, “Would this catch a real production bug?” Tests that only verified implementation details or checked obvious cases got removed.

Result: From 27 tests down to 8 high-signal tests. Each one protects against a real failure mode.

What I Kept

  • Tests that verify the analytics payload structure (breaking these would cause data pipeline failures)
  • Tests for conditional tracking logic (feature flags, null checks)
  • Tests that prevent duplicate events
  • Edge cases with real user impact

What I Removed

  • Tests that checked internal state instead of behavior
  • Redundant variations of the same logic path
  • Tests for TypeScript-enforced requirements
  • Overly specific mock assertions
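
For contrast, here is the shape of test that got cut (hypothetical, but representative; the extra payload fields are invented for illustration). Unlike the payload tests I kept, which assert only the fields the pipeline depends on, this one pins the mock’s entire call array, so it fails whenever an unrelated field changes or a second legitimate event is added.

import { render } from '@testing-library/react';
import { TotalsPanel } from './TotalsPanel';
import { track } from './analytics';

jest.mock('./analytics', () => ({ track: jest.fn() }));

// Removed: couples the test to the exact payload shape and call count
it('tracks fee_displayed with the exact full payload and nothing else', () => {
  render(<TotalsPanel fee={12.5} featureEnabled />);

  expect((track as jest.Mock).mock.calls).toEqual([
    ['fee_displayed', { fee: 12.5, currency: 'USD', source: 'TotalsPanel', variant: 'default' }],
  ]);
});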

Why This Works

AI agents are excellent at exhaustive generation but poor at prioritization. By splitting the process:

  1. Phase 1 (AI): Generate comprehensive coverage without overthinking
  2. Phase 2 (Human): Apply judgment about production risks and maintenance burden

The result is a lean test suite that actually protects the codebase without becoming a maintenance liability.

“The best code is no code. The second best is code that clearly prevents real problems.”

Takeaways

  • Let AI generate broad test coverage as a starting point
  • Review ruthlessly: keep only tests that prevent real production issues
  • Fewer, better tests beat comprehensive, noisy coverage
  • This approach works best when you have clear production failure modes in mind

Questions or feedback? This is an experiment in practical AI-assisted development.