TestPilot spins up specialized AI agents that explore your app, write real Playwright tests, self-heal broken locators from the accessibility tree and screenshots, and file bugs with evidence, all running privately on your machine via Ollama.
A single LLM doing everything produces average results. TestPilot orchestrates specialized agents, each with one well-defined job, to produce professional, reliable, maintainable tests.
Crawls your app in a real browser, reads the accessibility tree and screenshots, and turns a plain-English goal into high-quality, structured test steps.
Runs each step in Playwright with proper auto-waits and clear assertions, producing clean, resilient test code instead of brittle one-off scripts.
When a locator breaks, it heals from the aria snapshot first, then falls back to a vision model that reads the actual screenshot to find the element.
Writes an executive summary with pass/fail results, screenshot evidence, and a healing log, then remembers every fix so the next run is faster and more reliable.
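The two-stage healing described above (aria snapshot first, vision model second) can be pictured with a minimal TypeScript sketch. Every name here (`ElementRef`, `VisionFinder`, `healLocator`) is hypothetical and not taken from TestPilot. The word-overlap matching is an assumed stand-in for whatever text matching the healer really uses, and the vision call is reduced to a plain synchronous function so the example stays self-contained.

```typescript
// Hypothetical shapes for illustration only; none of these names come from TestPilot.
interface ElementRef {
  role: string; // ARIA role, e.g. "button"
  name: string; // accessible name, e.g. "Place order"
}

// Stand-in for a call to a local vision model. A real call would be asynchronous.
type VisionFinder = (screenshot: Uint8Array, description: string) => ElementRef | null;

interface HealResult {
  locator: ElementRef;
  via: "aria" | "vision";
}

// Lowercased words of a label, with punctuation and empty pieces dropped.
function words(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
}

// Number of words from the old name that also appear in a candidate name.
function overlap(oldName: string, candidate: string): number {
  const candidateWords = words(candidate);
  return words(oldName).filter((w) => candidateWords.indexOf(w) !== -1).length;
}

function healLocator(
  broken: ElementRef,
  ariaSnapshot: ElementRef[],
  screenshot: Uint8Array,
  askVision: VisionFinder,
): HealResult | null {
  // Stage 1 (text-first): among nodes with the same role, keep the one whose
  // name shares the most words with the old name. Assumed heuristic.
  let best: ElementRef | null = null;
  let bestScore = 0;
  for (const node of ariaSnapshot) {
    if (node.role !== broken.role) continue;
    const score = overlap(broken.name, node.name);
    if (score > bestScore) {
      best = node;
      bestScore = score;
    }
  }
  if (best !== null) return { locator: best, via: "aria" };

  // Stage 2 (vision fallback): only reached when the snapshot had no match.
  const seen = askVision(screenshot, `${broken.role} labelled "${broken.name}"`);
  return seen !== null ? { locator: seen, via: "vision" } : null;
}
```

With a broken `{ role: "button", name: "Place order" }` and a snapshot containing a button named "Place your order", this sketch resolves in the aria stage and never calls the vision function. If no same-role node shares a word with the old name, the result depends entirely on what the vision stand-in returns.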
This is a self-contained animation of a TestPilot run: the agents execute, an element moves and breaks its locator, vision healing recovers it, and the run finishes green with evidence.
healer.vision: locator getByRole('button',{name:'Place order'})
→ resolved ✓ new locator confirmed & saved to memory
Every feature is grounded in observation, designed to survive real UI change, and runs entirely on your own hardware.
Locators break when your UI changes. TestPilot heals them — text-first from the aria snapshot, then a vision model reads the screenshot — so tests survive redesigns instead of failing.
Runs 100% on your machine with open-weight models like Qwen3 and Gemma. No API keys, no recurring cost, no code or screenshots ever leaving your network.
Every step captures a screenshot, the accessibility tree, visible text, timing, URL, and title, so each pass, fail, and heal is explainable and reviewable.
Generates real Playwright tests you can commit and wire into GitHub Actions. Headed or headless, fast 14B mode or accurate 32B + vision mode — you choose.
Proper auto-waits, grounded assertions, and healing memory kill the flakiness that plagues hand-written suites. Fewer false alarms, more real signal.
A SQLite-backed memory remembers every healed locator and pattern per domain, so repeat runs get faster, more reliable, and need less intervention.
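As a rough illustration of per-domain healing memory, the sketch below keeps healed locators in a plain in-memory object. This is an assumption made to keep the example dependency-free: the real memory is described as SQLite-backed, and the class name `HealingMemory`, its methods, and the string form of the locators are all hypothetical.

```typescript
// Hypothetical in-memory stand-in for the SQLite-backed memory described above.
// Each entry plays the role of a table row keyed by (domain, original locator).
interface HealedLocator {
  healed: string;          // locator that worked after healing
  via: "aria" | "vision";  // which healing stage found it
}

class HealingMemory {
  private rows: Record<string, HealedLocator> = {};

  // JSON-encoding the pair gives an unambiguous composite key.
  private key(domain: string, original: string): string {
    return JSON.stringify([domain, original]);
  }

  // Store (or overwrite) the fix for a broken locator on one domain.
  remember(domain: string, original: string, healed: string, via: "aria" | "vision"): void {
    this.rows[this.key(domain, original)] = { healed, via };
  }

  // Return the remembered fix, or null if this domain has never healed that locator.
  recall(domain: string, original: string): string | null {
    const k = this.key(domain, original);
    if (!Object.prototype.hasOwnProperty.call(this.rows, k)) return null;
    return this.rows[k].healed;
  }
}
```

Keying on the domain as well as the locator means a fix learned on one site is never applied to another site that happens to use the same label. A repeat run could call `recall` before attempting the original locator and skip healing whenever a fix is already known.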
Get end-to-end coverage on your side project without hand-writing brittle specs — describe the flow in English and let the agents do it.
Ship confidently when your UI changes weekly. Self-healing keeps the suite green so your team isn't drowning in flaky-test triage.
Regulated data or just no cloud budget? Everything is local and open-source — ideal for solo devs, small teams, and cost-conscious SMBs.
TestPilot is in active development. Join the waitlist for early access, install guides, and the first public build.
Free & open-source core · always local · bring your own hardware, keep your own data.