Debugging flakes¶
A flaky tuiwright test almost always falls into one of five
buckets. Triage in this order.
1. Read the screen dump¶
Every TuiTimeoutError includes the full last screen:
TuiTimeoutError: timed out after 5.0s waiting for text 'Saved'.
--- last screen ---
gode gemini/gemini-3.1-pro-preview-customtools high idle
No transcript yet. Ask gode to inspect, edit, or run something.
... (full screen) ...
--- end ---
Read it before doing anything else. The answer is usually visible:
- A modal you didn't dismiss
- A loading spinner that's still spinning
- An unexpected error notification
- The text you wanted, just in a different region
2. Classify¶
| Symptom | Most likely cause | Section |
|---|---|---|
Text is on screen, but wait_for_text still times out |
Region was stale | §A |
| Text doesn't appear in the dump either | App never rendered it | §B |
| Passes once, fails next run | Race condition | §C |
| Input has no effect | DEC mode not enabled, or rapid burst | §D |
| Snapshot diffs without code changes | Cursor / mode flap | §E |
§A — Stale region¶
The Region object holds a snapshot of one Screen. If you captured
it and then started a wait, the wait was checking frozen data:
# BAD
region = tui.region(title="Logs")
text = region.text # ← captured here
if "done" not in text:
... # ← stale forever
wait_for_text(region=…) is special — it re-derives the bounds from
the current screen on every poll. Always pass regions in that way:
§B — Text really isn't there¶
Quick standalone repro:
import asyncio
from tuiwright import TuiSession
async def main():
s = TuiSession()
await s.start(["myapp"], env={"KEY": "v"}, cols=140, rows=44)
for i in range(10):
await asyncio.sleep(1)
print(f"--- t={i+1}s ---")
print(s.screen.text)
if "<expected>" in s.screen.text:
break
await s.stop()
asyncio.run(main())
Common causes:
- Missing env var (API key, config flag) — the app silently degrades
- App crashed at startup — check
s.aliveafter each second - Text uses Unicode that doesn't appear in
screen.text— check individual cells:s.screen.cells[row][col].char - App is waiting for stdin you haven't provided
§C — Race condition¶
A test that passes sometimes is almost always missing a wait between
input and assertion. Never patch with asyncio.sleep; add the
right primitive:
# BAD — fast on your laptop, flaky in CI
await tui.press("ctrl+p")
await asyncio.sleep(0.5)
assert "Settings" in tui.screen.text
# GOOD — deterministic everywhere
await tui.press("ctrl+p")
await tui.wait_for_text("Settings")
assert "Models" in tui.screen.text
If the screen continues changing after the wait (animations, layout
debouncing), follow with wait_for_stable:
await tui.wait_for_text("Settings")
await tui.wait_for_stable(quiet_ms=150)
assert "Models" in tui.screen.text
§D — Input goes nowhere¶
Triage in a debug script:
print("mouse on:", tui._emu.is_mouse_tracking())
print("paste on:", tui._emu.is_bracketed_paste())
print("focus on:", tui._emu.is_focus_events())
print("DEC modes:", sorted(tui._emu.private_modes))
If the relevant mode is off, the app hasn't enabled it yet. Wait:
await tui.wait_for_predicate(
lambda _: tui._emu.is_mouse_tracking(),
timeout=3,
description="mouse tracking",
)
If the mode is on but input still has no effect, the app probably debounces — many widgets process one event per render tick. Add a stable wait between rapid presses:
Special characters worth knowing about:
\x13(Ctrl-S) and\x11(Ctrl-Q) are flow control in cooked mode.TuiSession.startputs the PTY in raw mode so they pass through. If you bypass the session and usePtyTransportdirectly, set raw mode yourself.\x7f(DEL, the Backspace key) isVERASEin cooked mode. Same fix.
§E — Snapshot flaps¶
Cell-grid snapshots compare cursor position, mode set, and cell attrs — not just text. Flakiness usually comes from:
- Cursor twitch. The cursor moves during a redraw. Add
wait_for_stable(quiet_ms=200)before the assertion. - A DEC mode toggling on/off. Make sure mode-setting completes before the snapshot.
- Variable text in the render. Version numbers, timestamps, working directory, model names. Either stub them via env vars or snapshot a region that excludes them.
Repeat to confirm¶
A test fails once → it's flaky. A test fails 5/20 → it's reliably flaky. A test fails 20/20 → it's broken (good news, much easier).
for i in $(seq 20); do
pytest tests/test_x.py::test_thing -q --timeout=20 || echo "FAIL $i"
done | grep -c FAIL
When you really can't reproduce¶
- Bump
--tui-timeout=30and--timeout=60— slow runners hide issues. pytest --tui-trace=onand inspect the cast file with asciinema-player.- Add
RUST_BACKTRACE=1(or your runtime's equivalent) toenv=so crashes show up clearly. - Open an issue with the cast file — they're plain JSON, small, and 100% reproduce-able locally.
Next: CI integration →