How I use AI to write tests (without making a mess)
I ask AI to write a bunch of tests, then delete most of them. Sounds weird, but it works way better than trying to guide the AI from the start.
TL;DR: Ask the agent to write broad tests, then review and keep only the ones that catch likely production issues. Result: fewer tests, clearer signal, lower maintenance.
The Problem with Traditional Test Writing
When I added analytics tests to a feature, I needed to be sure every `viewed` event was wired correctly without spending a weekend on boilerplate. Tools like Copilot or Claude can generate lots of tests, but without guidance they add noise: assertions on implementation internals, repeated cases for the same code branch, and fragile mocks.
What works for me is a simple two‑step approach: ask the agent to write broad coverage, then keep only the few tests that catch real problems.
Phase 1: Generate Comprehensive Tests
The Initial Prompt
> Create a draft PR with tests for all analytics `viewed` events in the feature to prevent accidental breaks.
Why this works: I don’t constrain the AI here. I ask it to search the codebase and find every event, every prop combination, and every edge case. The first pass is about discovery, not judgment.
The agent explored the codebase and produced tests for two places where we track viewed events:
- ViewTracker (12 tests): reusable page-view tracker (a rough sketch of its shape follows below)
- TotalsPanel (15 tests): transaction fee display tracking
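For context, here’s roughly the shape a component like ViewTracker could take. This is a hypothetical sketch inferred from the generated test names, not our real implementation; the `analytics` client, the prop names, and the "manager" exclusion are all assumptions.

```tsx
// Hypothetical sketch of ViewTracker, inferred from the generated test names.
// The analytics client, prop names, and "manager" exclusion are assumptions.
import { useEffect } from 'react';
import { analytics } from './analytics'; // assumed analytics client module

type ViewTrackerProps = {
  pageName: string;
  appName: string;
  tenantId?: string; // optional, per the "optional tenantId" test
};

export function ViewTracker({ pageName, appName, tenantId }: ViewTrackerProps) {
  useEffect(() => {
    const pathname = window.location.pathname;

    // The generated tests imply tracking is skipped on "manager" routes.
    if (pathname.includes('manager')) return;

    analytics.track('viewed', {
      pageName,
      appName,
      url: pathname,
      ...(tenantId ? { tenantId } : {}),
    });
    // Re-fires when pageName (or another dep) changes, matching the
    // "re-tracks when pageName changes" case.
  }, [pageName, appName, tenantId]);

  return null;
}
```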
What the AI Generated
Representative cases for ViewTracker:
```ts
it('tracks viewed with correct payload on mount', () => { /* ... */ })
it('skips tracking when pathname includes "manager"', () => { /* ... */ })
it('handles optional tenantId correctly', () => { /* ... */ })
it('re-tracks when pageName changes', () => { /* ... */ })
// + variations for component/app/page names, and URL payload fields
```
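To show the style, here’s what the first case might look like fully written out. It’s a hedged sketch assuming Jest, React Testing Library, and a mocked `analytics` module; the payload fields are illustrative, not the real schema.

```tsx
// Hedged sketch of one generated test, assuming Jest + React Testing Library
// and a mocked analytics module. The payload fields are illustrative.
import { render } from '@testing-library/react';
import { ViewTracker } from './ViewTracker';
import { analytics } from './analytics';

jest.mock('./analytics', () => ({
  analytics: { track: jest.fn() },
}));

it('tracks viewed with correct payload on mount', () => {
  render(<ViewTracker pageName="checkout" appName="payments" />);

  expect(analytics.track).toHaveBeenCalledTimes(1);
  expect(analytics.track).toHaveBeenCalledWith(
    'viewed',
    expect.objectContaining({ pageName: 'checkout', appName: 'payments' }),
  );
});
```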
And for TotalsPanel:
```ts
it('tracks fee_displayed with correct payload', () => { /* ... */ })
it('does not track when the feature is disabled', () => { /* ... */ })
it('does not track when the fee is null', () => { /* ... */ })
it('tracks only once on initial render', () => { /* ... */ })
it('does not emit unrelated viewed events', () => { /* ... */ })
// + fee amount variants, membership permutations, etc.
```
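The conditional cases are the ones I care about most. Here’s a sketch of the "feature disabled" and "null fee" tests, assuming a `useFeatureFlag` hook and the same mocked `analytics` module; the real flag, prop, and module names will differ.

```tsx
// Hedged sketches of the conditional-tracking tests. The useFeatureFlag hook,
// the fee prop, and the module paths are assumptions, not the real code.
import { render } from '@testing-library/react';
import { TotalsPanel } from './TotalsPanel';
import { analytics } from './analytics';
import { useFeatureFlag } from './featureFlags';

jest.mock('./analytics', () => ({ analytics: { track: jest.fn() } }));
jest.mock('./featureFlags', () => ({ useFeatureFlag: jest.fn() }));

it('does not track when the feature is disabled', () => {
  (useFeatureFlag as jest.Mock).mockReturnValue(false);

  render(<TotalsPanel fee={4.2} />);

  expect(analytics.track).not.toHaveBeenCalled();
});

it('does not track when the fee is null', () => {
  (useFeatureFlag as jest.Mock).mockReturnValue(true);

  render(<TotalsPanel fee={null} />);

  expect(analytics.track).not.toHaveBeenCalled();
});
```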
Phase 2: Ruthless Filtering
Now comes the important part: I reviewed each test and asked, “Would this catch a real production bug?” Tests that only verified implementation details or checked obvious cases got removed.
Result: From 27 tests down to 8 high-signal tests. Each one protects against a real failure mode.
What I Kept
- Tests that verify the analytics payload structure (breaking these would cause data pipeline failures)
- Tests for conditional tracking logic (feature flags, null checks)
- Tests that prevent duplicate events (an example is sketched after this list)
- Edge cases with real user impact
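The duplicate-event tests earn their keep because double-firing is exactly the kind of bug that type checks and code review tend to miss. A minimal sketch, reusing the assumed ViewTracker shape and analytics mock from the earlier examples:

```tsx
// Hedged sketch: guards against the event double-firing on re-render.
// Reuses the assumed ViewTracker shape and mocked analytics module.
import { render } from '@testing-library/react';
import { ViewTracker } from './ViewTracker';
import { analytics } from './analytics';

jest.mock('./analytics', () => ({ analytics: { track: jest.fn() } }));

it('does not re-track when props stay the same across re-renders', () => {
  const { rerender } = render(<ViewTracker pageName="checkout" appName="payments" />);

  // Re-render with identical props; the tracking effect should not fire again.
  rerender(<ViewTracker pageName="checkout" appName="payments" />);

  expect(analytics.track).toHaveBeenCalledTimes(1);
});
```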
What I Removed
- Tests that checked internal state instead of behavior (see the contrast sketch after this list)
- Redundant variations of the same logic path
- Tests for TypeScript-enforced requirements
- Overly specific mock assertions
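For contrast, here’s the kind of generated test I deleted. It pins the exact shape of the mock’s call record, so it breaks on harmless refactors without catching any new class of bug; every name here is illustrative.

```tsx
// The kind of generated test I deleted: it asserts the exact key set of the
// payload via the mock's call record, an implementation detail rather than
// behavior. Every name here is illustrative.
import { render } from '@testing-library/react';
import { ViewTracker } from './ViewTracker';
import { analytics } from './analytics';

jest.mock('./analytics', () => ({ analytics: { track: jest.fn() } }));

it('sends a payload with exactly these keys in this order', () => {
  render(<ViewTracker pageName="checkout" appName="payments" />);

  // Brittle: fails whenever a harmless field is added or keys are reordered,
  // yet it would not catch a wrong event name or a broken required value.
  const payload = (analytics.track as jest.Mock).mock.calls[0][1];
  expect(Object.keys(payload)).toEqual(['pageName', 'appName', 'url']);
});
```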
Why This Works
AI agents are excellent at exhaustive generation but poor at prioritization. By splitting the process:
- Phase 1 (AI): Generate comprehensive coverage without overthinking
- Phase 2 (Human): Apply judgment about production risks and maintenance burden
The result is a lean test suite that actually protects the codebase without becoming a maintenance liability.
> “The best code is no code. The second best is code that clearly prevents real problems.”
Takeaways
- Let AI generate broad test coverage as a starting point
- Review ruthlessly: keep only tests that prevent real production issues
- Fewer, better tests beat comprehensive, noisy coverage
- This approach works best when you have clear production failure modes in mind
Questions or feedback? This is an experiment in practical AI-assisted development.