TestPilot — AI agents that test your web app end-to-end

The multi-agent system

Not one prompt — a coordinated QA team

A single LLM prompted to do everything tends to produce average results. TestPilot instead orchestrates specialized agents, each with one well-defined job, so the tests it generates are reliable and maintainable.

Agent 01

🧭

Explorer

Planner + Observer

Crawls your app in a real browser, reads the accessibility tree and screenshots, and turns a plain-English goal into a structured list of test steps.

→

Agent 02

✍️

Test-Writer

Executor

Runs each step in Playwright, relying on its auto-waiting and explicit assertions, and produces clean, resilient test code instead of brittle one-off scripts.

→

Agent 03

🩹

Self-Healer

Text + Vision

When a locator breaks, it first tries to repair it from the page's aria snapshot, then falls back to a vision model that reads the actual screenshot to find the element.

→

Agent 04

📋

Reporter

+ Memory

Writes an executive summary with pass/fail results, screenshot evidence, and a healing log, then remembers every fix so the next run is faster and more reliable.

An Orchestrator agent runs the show: it decides when to plan, execute, heal (text or vision), and report. A Memory store records healed locators across runs. Observation over assumption: every decision is grounded in screenshots and the accessibility tree.
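The Orchestrator's healing decision can be pictured as a simple escalation: reuse a remembered fix, try the cheaper text healer, and only then call the vision model. This is a minimal sketch under stated assumptions: `Locator`, `Healer`, and `healLocator` are hypothetical names, the healers are synchronous stand-ins for what would really be async model calls, and the memory is an in-process `Map` rather than SQLite.

```typescript
// Hypothetical types: a locator is reduced to an ARIA role plus accessible name.
interface Locator {
  role: string;
  name: string;
}

type HealResult =
  | { status: "memory" | "text" | "vision"; locator: Locator }
  | { status: "failed" };

// A healer returns a replacement locator, or null if it could not find one.
// Real healers would be async model calls; these are synchronous stand-ins.
type Healer = (broken: Locator) => Locator | null;

const memoryKey = (l: Locator): string => `${l.role}:${l.name}`;

function healLocator(
  broken: Locator,
  memory: Map<string, Locator>,
  textHealer: Healer,
  visionHealer: Healer,
): HealResult {
  // 1. Reuse a fix learned on an earlier run, if one exists.
  const remembered = memory.get(memoryKey(broken));
  if (remembered) return { status: "memory", locator: remembered };

  // 2. Cheaper path: repair the locator from the aria snapshot (text only).
  const fromText = textHealer(broken);
  if (fromText) {
    memory.set(memoryKey(broken), fromText);
    return { status: "text", locator: fromText };
  }

  // 3. Costlier path: ask a vision model to find the element on the screenshot.
  const fromVision = visionHealer(broken);
  if (fromVision) {
    memory.set(memoryKey(broken), fromVision);
    return { status: "vision", locator: fromVision };
  }

  // Nothing worked: report a genuine failure instead of guessing.
  return { status: "failed" };
}
```

Because every successful heal is written back to `memory`, a second call for the same broken locator returns `status: "memory"` without invoking either healer, which is the "next run is faster" behavior described above.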

Live demo

Watch a run heal itself

This self-contained animation walks through a TestPilot run: the agents execute, a locator moves, vision healing recovers it, and the run finishes green with screenshot evidence.

suite: e-commerce-checkout.spec ▶ running

vision healer · screenshot diff
🔎 “Place order” button relocated, healed via screenshot

baseline (v128): Place order
current (v129): Place order (moved)

healer.vision: locator getByRole('button', { name: 'Place order' })
→ resolved ✓ new locator confirmed & saved to memory


Why TestPilot

A professional QA team in a box

Every feature is grounded in observation, designed to survive real UI changes, and runs entirely on your own hardware.

🩹

Self-healing tests

Locators break when your UI changes. TestPilot heals them text-first from the aria snapshot and, if that fails, with a vision model that reads the screenshot, so tests survive redesigns instead of failing.
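A minimal sketch of the text-first step, assuming the aria snapshot is the YAML-like text that Playwright's `locator.ariaSnapshot()` returns (lines such as `- button "Place order"`). The parser handles only simple role/name lines, and `healFromSnapshot`, the word-overlap score, and the 0.5 threshold are illustrative choices, not TestPilot's actual matching logic. Returning `null` is the signal to fall back to the vision model.

```typescript
// Hypothetical node shape: one ARIA role and its accessible name.
interface AriaNode {
  role: string;
  name: string;
}

// Extract `role "name"` pairs from each snapshot line. Nesting, attributes
// such as [level=1], and text-only children are ignored in this sketch.
function parseAriaSnapshot(snapshot: string): AriaNode[] {
  const nodes: AriaNode[] = [];
  for (const line of snapshot.split("\n")) {
    const match = /^\s*-\s+([a-z]+)(?:\s+"([^"]*)")?/.exec(line);
    if (match) nodes.push({ role: match[1], name: match[2] ?? "" });
  }
  return nodes;
}

// Jaccard similarity of lowercase word sets: 1 = identical words, 0 = none shared.
function similarity(a: string, b: string): number {
  const wa = new Set(a.toLowerCase().split(/\W+/).filter(Boolean));
  const wb = new Set(b.toLowerCase().split(/\W+/).filter(Boolean));
  if (wa.size === 0 || wb.size === 0) return 0;
  const shared = Array.from(wa).filter((w) => wb.has(w)).length;
  return shared / (wa.size + wb.size - shared);
}

// Pick the same-role node whose name best matches the broken locator's name.
// Returning null tells the caller to escalate to the vision healer.
function healFromSnapshot(broken: AriaNode, snapshot: string, threshold = 0.5): AriaNode | null {
  let best: AriaNode | null = null;
  let bestScore = 0;
  for (const node of parseAriaSnapshot(snapshot)) {
    if (node.role !== broken.role) continue;
    const score = similarity(node.name, broken.name);
    if (score > bestScore) {
      best = node;
      bestScore = score;
    }
  }
  return bestScore >= threshold ? best : null;
}

const snapshot = `- heading "Checkout" [level=1]
- button "Place your order"
- link "Back to cart"`;
const healed = healFromSnapshot({ role: "button", name: "Place order" }, snapshot);
// healed is { role: "button", name: "Place your order" } (word overlap 2/3 >= 0.5)
```

Restricting candidates to the same role keeps the text healer conservative: a renamed button can be recovered, but a button that became a link is left for the vision step.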

🔒

Local & private via Ollama

Runs 100% on your machine with open-weight models like Qwen3 and Gemma. No API keys, no recurring cost, no code or screenshots ever leaving your network.

📸

Screenshot evidence

Every step captures a screenshot, the accessibility tree, visible text, timing, the page URL, and the page title, so every pass, fail, and heal can be explained and reviewed.
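One way to model that evidence is a per-step record that a reporter can fold into pass/fail counts and a healing log. The field names, the separate `healed` status, and `summarize` are hypothetical; TestPilot's real schema may differ.

```typescript
// Hypothetical per-step evidence record mirroring the fields listed above.
type StepStatus = "passed" | "failed" | "healed";

interface StepEvidence {
  step: string;           // plain-English step description
  status: StepStatus;
  screenshotPath: string; // where the screenshot was saved
  ariaSnapshot: string;   // accessibility tree at the time of the step
  visibleText: string;
  durationMs: number;
  url: string;
  title: string;
  healNote?: string;      // how a broken locator was repaired, if it was
}

interface RunSummary {
  passed: number;
  failed: number;
  healed: number; // healed steps are counted separately from passed ones
  totalMs: number;
  healingLog: string[];
}

// Fold step records into the counts and healing log a summary would show.
function summarize(steps: StepEvidence[]): RunSummary {
  const summary: RunSummary = { passed: 0, failed: 0, healed: 0, totalMs: 0, healingLog: [] };
  for (const s of steps) {
    summary[s.status] += 1;
    summary.totalMs += s.durationMs;
    if (s.status === "healed" && s.healNote) summary.healingLog.push(`${s.step}: ${s.healNote}`);
  }
  return summary;
}
```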

⚙️

CI-ready by design

Generates real Playwright tests you can commit and wire into GitHub Actions. Run headed or headless, and choose between a fast 14B mode and a more accurate 32B + vision mode.
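A sketch of how the two modes might map onto requests to a local Ollama server. The request fields (`model`, `messages`, `images`, `stream`, `format`) follow Ollama's `/api/chat` endpoint; the `Mode` names, `buildChatRequest`, and the model tags (`qwen3:14b`, `qwen3:32b`, `gemma3:27b`) are assumptions chosen for illustration.

```typescript
type Mode = "fast" | "accurate";

interface OllamaMessage {
  role: "system" | "user" | "assistant";
  content: string;
  images?: string[]; // base64-encoded images, only useful with a vision model
}

interface ChatRequest {
  model: string;
  messages: OllamaMessage[];
  stream: false;   // ask for one complete response instead of a stream
  format?: "json"; // ask the model for JSON output
}

// Hypothetical model choices per mode; only "accurate" has a vision model.
const MODELS: Record<Mode, { text: string; vision?: string }> = {
  fast: { text: "qwen3:14b" },
  accurate: { text: "qwen3:32b", vision: "gemma3:27b" },
};

function buildChatRequest(mode: Mode, prompt: string, screenshotBase64?: string): ChatRequest {
  const { text, vision } = MODELS[mode];
  // Route to the vision model only when a screenshot is attached and the mode has one;
  // in fast mode the screenshot is dropped and healing stays text-only.
  if (screenshotBase64 && vision) {
    return {
      model: vision,
      messages: [{ role: "user", content: prompt, images: [screenshotBase64] }],
      stream: false,
      format: "json",
    };
  }
  return { model: text, messages: [{ role: "user", content: prompt }], stream: false, format: "json" };
}

// Sending it (not executed here), assuming Ollama's default local port:
// fetch("http://localhost:11434/api/chat", { method: "POST", body: JSON.stringify(req) });
```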

🚫

No more flaky tests

Proper auto-waits, grounded assertions, and healing memory cut down the flakiness that plagues hand-written suites: fewer false alarms, more real signal.

🧠

Learns over time

A SQLite-backed memory remembers every healed locator and pattern per domain, so repeat runs are faster and more reliable and need less intervention.
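The per-domain memory can be sketched as a key/value store keyed by hostname and original locator. This stand-in keeps rows in a `Map` instead of SQLite so it runs anywhere; `HealingMemory`, its methods, and the table schema in the comment are hypothetical.

```typescript
// In-memory stand-in for a SQLite table such as (hypothetical schema):
//   healed_locators(domain TEXT, original TEXT, healed TEXT, hits INTEGER,
//                   PRIMARY KEY (domain, original))
interface MemoryEntry {
  healed: string;
  hits: number;
}

class HealingMemory {
  private rows = new Map<string, MemoryEntry>();

  // Scope entries by hostname so a fix learned on one site never leaks to another.
  private key(pageUrl: string, original: string): string {
    return `${new URL(pageUrl).hostname}|${original}`;
  }

  remember(pageUrl: string, original: string, healed: string): void {
    this.rows.set(this.key(pageUrl, original), { healed, hits: 0 });
  }

  // Returns the stored fix and counts the reuse, or undefined on a miss.
  recall(pageUrl: string, original: string): string | undefined {
    const entry = this.rows.get(this.key(pageUrl, original));
    if (!entry) return undefined;
    entry.hits += 1;
    return entry.healed;
  }

  hits(pageUrl: string, original: string): number {
    return this.rows.get(this.key(pageUrl, original))?.hits ?? 0;
  }
}

const mem = new HealingMemory();
const original = "getByRole('button', { name: 'Place order' })";
mem.remember("https://shop.example.com/checkout", original, "getByTestId('place-order')");
mem.recall("https://shop.example.com/cart", original); // "getByTestId('place-order')": same host
mem.recall("https://other.example.com/", original);    // undefined: different host
```

Keying on hostname rather than full URL lets one fix apply to every page of a site; the hit counter is the kind of signal a real store could use to tell trusted fixes from one-offs.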

Stack: Playwright (browser) · Ollama (models) · Qwen3 and Gemma (open-weight) · Bun + TypeScript · SQLite (memory) · Zod (structured output)

Who it's for

QA-grade testing without the QA team

👩‍💻

Solo developers & indie makers

Get end-to-end coverage on your side project without hand-writing brittle specs. Describe the flow in plain English and let the agents handle the rest.

🚀

Small & fast-moving teams

Ship confidently when your UI changes weekly. Self-healing keeps the suite green so your team isn't drowning in flaky-test triage.

🛡️

Privacy-sensitive teams & SMBs

Handling regulated data, or simply have no cloud budget? Everything runs locally on an open-source core, which suits privacy-sensitive teams and cost-conscious SMBs.

Give your app a QA team tonight

TestPilot is in active development. Join the waitlist for early access, install guides, and the first public build.


Free & open-source core · always local · bring your own hardware, keep your own data.