Playwright Intro - discussion (2023-02-10)
Playwright's intro is a solid framing for E2E: test cross-boundary flows (routing + data + focus), and treat traces as part of debugging rather than as test artifacts. In apps with explicit lanes and evidence keys, E2E gets dramatically more deterministic because you can assert on the contract, not on timing.
What evidence keys do you consider fair game for E2E assertions (lane, posture, identity), and what should stay internal? How do you keep E2E stable when the UI has streaming or duplicate reads? Do you treat traces/screenshots as a first-class debugging artifact, and how do you make them readable (logs/evidence)?
Comments (16)
I assert on user-visible behavior first, then one evidence key for the lane.
If the lane isn't visible, E2E becomes a timer guessing game.
We print a compact contract line into the UI (debug only) so screenshots are self-contained:
```txt
[tips] route=/threads/42 panel=detail lane=ok freshnessAt=2023-02-18T06:06Z
```
When Playwright fails, the screenshot already tells you what contract state you were in.
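A minimal sketch of reading that line back inside a test, assuming the `[tag] key=value ...` format shown in the example. `parseContractLine` and `ContractLine` are hypothetical names for illustration, not part of Playwright or of any app API:

```typescript
// Hypothetical helper: turns a debug contract line such as
// "[tips] route=/threads/42 panel=detail lane=ok" into a tag plus key/value fields.
type ContractLine = { tag: string; fields: Record<string, string> };

function parseContractLine(line: string): ContractLine | null {
  const match = /^\[([^\]]+)\]\s*(.*)$/.exec(line.trim());
  if (!match) return null; // no leading [tag], so not a contract line
  const [, tag = "", rest = ""] = match;
  const fields: Record<string, string> = {};
  for (const token of rest.split(/\s+/)) {
    const eq = token.indexOf("=");
    if (eq <= 0) continue; // skip tokens that are not key=value
    fields[token.slice(0, eq)] = token.slice(eq + 1);
  }
  return { tag, fields };
}
```

In a spec, the input would be whatever text the debug element renders. Asserting on `fields["lane"]` keeps the check tied to the contract rather than to timing.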
Counterpoint: asserting too much evidence can make tests brittle when vocabulary changes.
We keep evidence assertions to things that are also useful for support (lane, identity, posture).
Yes. Evidence should be rename-stable because it's product meaning, not implementation detail.
If the vocabulary is churning, the contract probably isn't settled yet.
Streaming UIs are hard to test unless completion targets are explicit.
We render data-stream-block=ok and E2E waits for that instead of sleeping.
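A rough sketch of that wait, assuming the marker is rendered as a `data-stream-block` attribute on some element. `PageLike` and `LocatorLike` are hypothetical structural types that cover only the two Playwright calls used here (`page.locator` and `locator.waitFor`); the helper names and the default timeout are assumptions too:

```typescript
// Structural stand-ins for the subset of Playwright's Page/Locator used below.
// A real Playwright Page should satisfy PageLike.
interface LocatorLike {
  waitFor(options?: { state?: "attached" | "visible"; timeout?: number }): Promise<void>;
}
interface PageLike {
  locator(selector: string): LocatorLike;
}

// Builds the attribute selector for a given stream status, e.g. "ok".
function streamBlockSelector(status: string): string {
  return `[data-stream-block="${status}"]`;
}

// Resolves once the completion marker is in the DOM; rejects on timeout.
// "attached" is an assumption: the marker may sit on a non-visual element.
async function waitForStreamDone(page: PageLike, timeoutMs = 10000): Promise<void> {
  await page
    .locator(streamBlockSelector("ok"))
    .waitFor({ state: "attached", timeout: timeoutMs });
}
```

A spec would call `await waitForStreamDone(page)` before asserting on the streamed text, so the test waits on an explicit completion target rather than on a fixed sleep.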
Long-form: E2E is expensive, so it should test *contracts*, not pixels.
Contracts are: route identity, lane transitions, and a few user-level expectations (text/roles).
If you're testing CSS details, you're paying E2E cost for unit-test value.
Traces became useful once we aligned logs to contract transitions.
If a trace includes [tips] lane=... reason=..., you can triage failures without rerunning locally.
Counterpoint: E2E can become a crutch for not writing good integration tests.
We keep E2E for 3-5 key flows and rely on React Testing Library (RTL) for most behavior; both benefit from the same evidence layer.
We also assert on merge-winner evidence (which read won) when duplicate reads exist.
Otherwise E2E can pass while the UI is actually inconsistent (list and detail disagree).
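One way to sketch that consistency check, assuming the list and the detail view each expose their evidence as key/value pairs. `diffEvidence` and the `winner` key in the usage note are hypothetical; the thread does not specify how merge-winner evidence is actually named:

```typescript
type Evidence = Record<string, string>;

// Returns the keys on which the two evidence records disagree, including keys
// present in only one of them. An empty array means they agree on every key checked.
function diffEvidence(list: Evidence, detail: Evidence, keys: string[]): string[] {
  return keys.filter((key) => list[key] !== detail[key]);
}
```

A test could then fail with the disagreeing keys in the message, for example by checking that `diffEvidence(listEvidence, detailEvidence, ["identity", "winner"])` is empty.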
Long-form counterpoint: if you don't have deterministic lanes, E2E is just an expensive flake generator.
The best E2E investment is improving the app's observability, not adding retries.
Docs tie-in: the evidence and testing posture is consistent across the docs and guides.
Related: "Testing and Debugging" and "Routing and Navigation".
We also used Playwright to validate focus scheduling flows (especially with overlays).
If focus moves incorrectly, it's often invisible in unit tests but obvious in E2E.
Counterpoint: too much reliance on data-* can hide accessibility issues.
If you can assert by role and name, do it. Evidence keys should supplement, not replace, semantic UI.
We keep one contract summary line in the UI specifically for E2E and support, not just for devs.
It's basically the "state of the world" sentence and it's worth its weight in gold.
Long-form: Playwright becomes a product tool when traces are readable.
Readable traces require that the app exposes truth: lanes, identity, and reasons. Otherwise you're just watching a video of a spinner.
We also made back/forward part of E2E because it's the fastest way to catch hidden-state bugs.
If back/forward doesn't replay your contract, users will notice.
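A hedged sketch of a back/forward replay check. `page.goBack()` and `page.goForward()` are real Playwright `Page` methods; `HistoryPageLike`, `contractSurvivesBackForward`, and the `readContract` callback are hypothetical names used only for illustration:

```typescript
// Structural stand-in for the two Playwright Page methods used below.
interface HistoryPageLike {
  goBack(): Promise<unknown>;
  goForward(): Promise<unknown>;
}

// Reads the contract, goes back and forward once, then reads it again.
// readContract should return only the parts expected to be stable across the
// replay (for example route, panel, and lane, but probably not a timestamp).
async function contractSurvivesBackForward(
  page: HistoryPageLike,
  readContract: () => Promise<string>,
): Promise<boolean> {
  const before = await readContract();
  await page.goBack();
  await page.goForward();
  const after = await readContract();
  return before === after;
}
```

In a real spec, `readContract` would likely also wait for the lane to settle after each navigation; that part is left out here because it depends on how the app exposes completion.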
If you want stable E2E, invest in contract observability first.
Then tests become shorter, and failures become explainable instead of mysterious.