Using Z3 to Minimize Your Test Suite Without Losing Coverage
Formal methods aren’t just for aerospace or cryptography. They can help you reason about your test suites and prove that removing certain tests doesn’t break coverage.
In my previous posts, I showed how Prolog’s logical reasoning can drive AI agents with guaranteed behavior. Today, we’re applying similar formal reasoning to a different problem: proving that your test suite optimizations are mathematically sound.
Let’s see how we can turn test selection into a constraint optimization problem.
The full code can be found on this gist.
The Core Insight
We want to find the minimal set of tests that covers all lines.
Formally:
Each line must be covered by at least one selected test.
We want to minimize the number of selected tests.
The solution must be mathematically proven correct.
This is a Set Cover Problem, which can be solved as a SAT + Optimization task using Z3.
Unlike heuristic approaches that might miss edge cases, Z3 provides a guarantee: if there’s a minimal solution, it will find it. If no solution exists that maintains coverage, it will tell you.
Step 1: Collect Coverage Data
The first step is gathering the raw data. We need to know exactly which lines each test covers.
Run your tests with coverage.py's dynamic contexts enabled. Set dynamic_context = test_function under [run] in your coverage configuration (.coveragerc, or the [tool.coverage.run] table in pyproject.toml), then run:
coverage run -m pytest
coverage json -o coverage.json
The run step writes a .coverage data file with a detailed mapping of which test functions touched which lines (the JSON export is a convenient way to inspect it). Dynamic contexts are crucial: they record coverage per test function rather than as one aggregated total.
Why this matters: Without per-test coverage data, we can’t reason about redundancy. We need to know that test_user_creation covers lines 45-52 in user.py, while test_user_validation covers lines 48-55. The overlap at lines 48-52 is what makes optimization possible.
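A quick sanity check that per-test contexts were actually recorded, using coverage.py's data API (a small sketch; expect your test function names plus the default empty context):
from coverage import Coverage

cov = Coverage(data_file=".coverage")
cov.load()
contexts = cov.get_data().measured_contexts()
print(sorted(contexts)[:10])  # e.g. 'tests.test_user.test_creation', plus the default ''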
Step 2: Build the SAT Problem
Now we extract the relationships between tests and lines:
# Load coverage data
from coverage import Coverage

cov = Coverage(data_file=".coverage")
cov.load()
data = cov.get_data()

# Build the mapping: which tests cover which lines?
line_to_tests = {}
for filename in data.measured_files():
    for line in data.lines(filename):
        # Drop the default "" context (code executed outside any test)
        contexts = [c for c in data.contexts_by_lineno(filename)[line] if c]
        if contexts:
            line_to_tests[(filename, line)] = contexts

# Every distinct test context seen in the data
all_tests = sorted({t for tests in line_to_tests.values() for t in tests})
This gives us a mapping like:
line_to_tests = {
    ("src/user.py", 10): ["tests.test_user.test_creation", "tests.test_user.test_validation"],
    ("src/auth.py", 20): ["tests.test_auth.test_login"],
    ("src/user.py", 15): ["tests.test_user.test_creation"],
}
The key insight: If multiple tests cover the same set of lines, they’re functionally identical for coverage purposes. One can safely be removed.
But it gets more subtle: Test A might cover lines {1, 2, 3}, Test B covers {2, 3, 4}, and Test C covers {1, 4}. No single test is redundant, but we might only need Tests A and C to cover everything. That’s where formal reasoning shines.
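To make that concrete, here is the three-test toy example expressed directly in Z3 (self-contained; the test names and line numbers are illustrative):
import z3

toy_coverage = {
    "test_a": {1, 2, 3},
    "test_b": {2, 3, 4},
    "test_c": {1, 4},
}
toy_lines = set().union(*toy_coverage.values())

opt = z3.Optimize()
select = {name: z3.Bool(name) for name in toy_coverage}

# Every line must be covered by at least one selected test
for line in toy_lines:
    opt.add(z3.Or([select[name] for name, lines in toy_coverage.items() if line in lines]))

# Minimize the number of selected tests
opt.minimize(z3.Sum([z3.If(v, 1, 0) for v in select.values()]))

assert opt.check() == z3.sat
model = opt.model()
print([name for name, v in select.items() if z3.is_true(model.eval(v))])
Z3 reports a two-test cover; no single test covers all four lines, so two is provably the minimum.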
Step 3: Z3 Formulation - The Mathematical Core
Here’s where formal methods prove their worth. We model the problem in Z3’s constraint language:
import z3

opt = z3.Optimize()

# Create boolean variables: one for each test
ctx_vars = {}
for test in all_tests:
    ctx_vars[test] = z3.Bool(f"ctx_{test}")

# Critical constraint: every line must be covered
for line, tests in line_to_tests.items():
    # This line is covered if ANY of its covering tests is selected
    opt.add(z3.Or([ctx_vars[t] for t in tests]))

# Objective: minimize the number of selected tests
objective = z3.Sum([z3.If(var, 1, 0) for var in ctx_vars.values()])
opt.minimize(objective)

# Solve and extract results
if opt.check() == z3.sat:
    model = opt.model()
    selected = [t for t, v in ctx_vars.items() if z3.is_true(model.eval(v))]
What’s happening here mathematically?
Variables: Each ctx_test_i is a boolean variable representing whether we run test i.
Constraints: For each line, at least one covering test must be selected: (ctx_test_a ∨ ctx_test_b ∨ ...).
Objective: Minimize Σ ctx_test_i (the total number of selected tests).
Proof: Z3 doesn't just find a solution, it proves it's optimal.
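Written as the standard set-cover integer program (with x_i standing for ctx_test_i, L for the set of covered lines, and cov(i) for the lines test i covers):

$$\min \sum_i x_i \quad \text{s.t.} \quad \sum_{i \,:\, \ell \in \mathrm{cov}(i)} x_i \ge 1 \;\;\text{for every line } \ell \in L, \qquad x_i \in \{0, 1\}.$$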
The beauty: selected is mathematically guaranteed to be minimal while preserving full coverage.
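If you want an independent check on top of the optimizer's answer, ask Z3 a second question: could any strictly smaller selection still cover every line? A small sketch reusing ctx_vars, line_to_tests, and selected from above:
checker = z3.Solver()
for line, tests in line_to_tests.items():
    checker.add(z3.Or([ctx_vars[t] for t in tests]))

# Demand a cover that uses fewer tests than the one Z3 found
checker.add(z3.Sum([z3.If(v, 1, 0) for v in ctx_vars.values()]) < len(selected))

# unsat means no smaller covering set exists: the selection really is minimal
assert checker.check() == z3.unsat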
Step 4: Validation - Trust but Verify
The theory is sound, but let’s prove it works:
# Run only the minimal test set
uv run pytest --cov=your_module --cov-report=term \
    "tests/test_mod.py::test_a" \
    "tests/test_mod.py::test_b"
The script automates this validation step. After finding the minimal set, it:
Runs the minimal tests with fresh coverage collection
Compares line coverage between original and minimal sets
Reports the results: coverage retained, tests eliminated
In practice, you’ll often see results like:
Original coverage: 1,247 lines across 87 tests
Minimal set coverage: 1,247 lines across 31 tests
✅ Coverage maintained: 100%
Test reduction: 64.4%
Why this validation matters: Even with mathematical proof, bugs can creep in through implementation details, test framework quirks, or coverage measurement edge cases. The validation step catches these.
The Complete Implementation
The full script demonstrates several sophisticated techniques:
Duplicate Detection
Before running expensive SAT solving, we identify tests with identical coverage:
from collections import defaultdict
# Invert line_to_tests: which lines does each test cover?
test_to_lines = defaultdict(set)
for line, tests in line_to_tests.items():
    for test in tests:
        test_to_lines[test].add(line)

# Find tests with identical coverage patterns
coverage_groups = defaultdict(list)
for test, lines in test_to_lines.items():
    coverage_groups[frozenset(lines)].append(test)

# Report duplicates
duplicate_groups = {sig: tests for sig, tests in coverage_groups.items() if len(tests) > 1}
This catches the obvious cases: Tests that cover exactly the same lines are trivially redundant. Why run expensive optimization when simple deduplication works?
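One way to act on those duplicate groups before solving is to keep a single representative per group and drop the rest. A sketch (which representative to keep, say the fastest test, is a policy choice):
# Keep one test from each group of coverage-identical tests
kept = {tests[0] for tests in coverage_groups.values()}

# Restrict the line-to-tests mapping to the survivors before building constraints
pruned_line_to_tests = {
    line: [t for t in tests if t in kept]
    for line, tests in line_to_tests.items()
}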
SAT-based Optimization
For the complex cases, we build the full constraint system:
solver = z3.Solver()

all_lines = set(line_to_tests)  # every (filename, line) pair seen in the data

# Map each (filename, line) pair to a boolean variable
line_vars = {line: z3.Bool(f"line_{line[0]}_{line[1]}") for line in all_lines}

# Map each test to a boolean variable
test_vars = {test: z3.Bool(f"ctx_{test}") for test in all_tests}

# Constraint: if a line must be covered, at least one covering test runs
for line, covering_tests in line_to_tests.items():
    solver.add(z3.Implies(line_vars[line],
                          z3.Or([test_vars[t] for t in covering_tests])))

# Require all lines to be covered
for line_var in line_vars.values():
    solver.add(line_var)

# Minimize selected tests
opt = z3.Optimize()
opt.add(solver.assertions())
opt.minimize(z3.Sum([z3.If(var, 1, 0) for var in test_vars.values()]))
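As in Step 3, we then solve and read the selected tests back out of the model:
if opt.check() == z3.sat:
    model = opt.model()
    minimal_tests = [t for t, v in test_vars.items() if z3.is_true(model.eval(v))]
    print(f"Selected {len(minimal_tests)} of {len(test_vars)} tests")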
Automated Validation
The script generates the pytest command and verifies coverage is maintained:
import subprocess
from coverage import Coverage

def run_minimal_tests_and_validate_coverage(selected_tests, original_coverage_lines):
    # Convert Z3 solution to pytest arguments
    pytest_args = convert_to_pytest_format(selected_tests)

    # Run tests with coverage
    cmd = ["uv", "run", "pytest", "--cov=kb_agents"] + pytest_args
    result = subprocess.run(cmd, capture_output=True, text=True)

    # Load and compare coverage
    new_cov = Coverage(data_file=".coverage")
    new_cov.load()
    new_coverage_lines = count_covered_lines(new_cov)
    return new_coverage_lines >= original_coverage_lines
Key implementation detail: Converting from test context names (like tests.module.TestClass.test_method) to pytest format (tests/module.py::TestClass::test_method) requires careful string manipulation to handle the various naming conventions.
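The script's convert_to_pytest_format helper isn't shown in the excerpt above; here is a hypothetical, simplified version to illustrate the idea (real context names vary with project layout, so treat it as a sketch):
def convert_to_pytest_format(context_names):
    """Turn 'tests.test_user.TestUser.test_creation' into
    'tests/test_user.py::TestUser::test_creation' (simplified heuristic)."""
    node_ids = []
    for ctx in context_names:
        parts = ctx.split(".")
        # Assume a CamelCase segment just before the test name is a test class
        if len(parts) >= 3 and parts[-2][:1].isupper():
            module_parts, selectors = parts[:-2], parts[-2:]
        else:
            module_parts, selectors = parts[:-1], parts[-1:]
        node_ids.append("/".join(module_parts) + ".py::" + "::".join(selectors))
    return node_ids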
Why This Matters
Proven correctness: Z3 provides mathematical guarantees. When it says a test set is minimal with respect to the recorded line coverage, that's not a heuristic, it's a proof. No missed edge cases, no "probably good enough."
Automatic minimization: No manual analysis of test dependencies or coverage overlap. The constraint solver handles arbitrarily complex coverage patterns.
Reusable framework: The same approach works for any (line, test) mapping. Coverage data from coverage.py, CI logs, code instrumentation, static analysis—feed it to Z3 and get optimal results.
More practically, test suite minimization saves money and cuts build times, especially in large projects with extensive test suites. Every minute shaved off a CI cycle is a win.
Conclusion
Formal methods are approachable today: solvers like Z3 ship with Python bindings, which puts them within reach of everyday developers. Many common problems in software can be framed as constraint satisfaction or optimization tasks.
By casting the test minimization problem as a Set Cover Problem and leveraging Z3, we achieve mathematically proven minimal test sets that maintain full coverage. You can even run this script in your CI pipeline to continuously optimize your test suite as it evolves.


