[Deep Dive #15] Error Meshing - discussion
We keep re-learning the same lesson: errors are not just exceptions, they're routing and posture decisions.
How do you decide which error boundaries are route-level vs local, and how do you avoid "fallback drift" across panels? Do you render error evidence (lane, fallback chosen, retry posture) so incidents can be explained from UI evidence? And when multiple failures happen (network + auth + validation), how do you keep the app from becoming unpredictable?
Comments (10)
What does meshing mean here?
Is it just "more error boundaries" or something else?
It's more like: make failure posture explicit and composable.
If one panel fails and another is fine, the UI shouldn't flip between random fallbacks. You name lanes and you render evidence.
Ok, so it isn't "add boundaries", it's "make the app explain which failure mode it's in".
We store a coarse error lane in route state if it affects navigation or global posture.
Local errors can stay local, but we still render evidence so "why is this panel blank" is answerable from a screenshot.
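A minimal sketch of that split, in case it helps. All names here (`Failure`, `scopeFor`, `recordFailure`) are made up for illustration, and the two boolean flags are an assumption about how a team might classify failures:

```typescript
// Hypothetical types for illustration; not from any library.
type Scope = "route" | "local";

interface Failure {
  code: string;                  // coarse structured code, e.g. "ERR_FETCH"
  blocksNavigation: boolean;     // does this failure change where the user can go?
  affectsGlobalPosture: boolean; // does it change how the whole shell behaves?
}

interface RouteState {
  lane: string | null; // coarse error lane, null when the route is healthy
}

// A failure is promoted to route scope only when it affects navigation
// or global posture; everything else stays local to its panel.
function scopeFor(f: Failure): Scope {
  return f.blocksNavigation || f.affectsGlobalPosture ? "route" : "local";
}

// Route state only changes for route-scoped failures.
function recordFailure(route: RouteState, f: Failure): RouteState {
  return scopeFor(f) === "route" ? { ...route, lane: f.code } : route;
}
```

Local failures would still render their own evidence; they just never touch the route lane.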
Related: "Error Handling and Resilience", plus the drill mindset from "Resilience Drills" (Deep Dive #3).
Minimal "error evidence" pattern we use in boundaries:
```tsx
import { ErrorShield, useErrorSignal, useRenderEffect, type ReactNode } from "react";

// `Fallback` is assumed to be defined elsewhere in the app.
export function SurfaceBoundary({ surface, children }: { surface: string; children: ReactNode }) {
  const signal = useErrorSignal(surface);

  useRenderEffect(() => {
    // Keep it coarse and stable so screenshots/tests stay safe and useful.
    signal.capture("ERR_FETCH");
  });

  return <ErrorShield fallback={<Fallback surface={surface} />}>{children}</ErrorShield>;
}
```
We keep the evidence coarse in prod, but it still tells you which surface failed and which lane you're in.
Counterpoint: evidence can accidentally leak error details to users (or to logs/screenshots shared widely).
We only render structured codes (ERR_AUTH_EXPIRED) and keep raw messages behind internal flags.
Yes. The evidence should be a contract, not a stack trace.
The important part is fallback chosen + lane + surface, not the raw exception string.
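Roughly what that contract could look like, as a sketch. The type names, the code list beyond `ERR_AUTH_EXPIRED` and `ERR_FETCH`, and the assumption that thrown values carry a numeric `status` field are all illustrative:

```typescript
// Hypothetical names; the status-to-code mapping is an assumption.
type ErrorCode = "ERR_AUTH_EXPIRED" | "ERR_FETCH" | "ERR_VALIDATION" | "ERR_UNKNOWN";
type Fallback = "skeleton" | "message" | "redirect";

interface Evidence {
  surface: string;
  lane: ErrorCode;
  fallback: Fallback;
}

// Map an unknown thrown value to a coarse code. The raw message is never read.
function classify(err: unknown): ErrorCode {
  const status =
    typeof err === "object" && err !== null && "status" in err
      ? (err as { status?: unknown }).status
      : undefined;
  if (status === 401) return "ERR_AUTH_EXPIRED";
  if (status === 422) return "ERR_VALIDATION";
  if (typeof status === "number") return "ERR_FETCH";
  return "ERR_UNKNOWN";
}

// Evidence carries only surface, lane, and fallback: no message, no stack.
function toEvidence(surface: string, err: unknown, fallback: Fallback): Evidence {
  return { surface, lane: classify(err), fallback };
}

// One stable line that can sit in a data attribute or a footer.
function formatEvidence(e: Evidence): string {
  return `surface=${e.surface} lane=${e.lane} fallback=${e.fallback}`;
}
```

Because `toEvidence` never copies anything from the error except the derived code, a token or email in the raw message has no path into the rendered output.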
We fixed a lot of drift by making fallback choice a stable key. If the route says fallback=skeleton, every panel must respect it.
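One way to check that in a test, sketched with made-up names (`PanelReport`, `findDrift`); the `message` and `empty` keys are assumptions, only `skeleton` comes from our setup:

```typescript
// Hypothetical types; each panel reports the fallback it actually rendered.
type FallbackKey = "skeleton" | "message" | "empty";

interface RouteState {
  fallback: FallbackKey;
}

interface PanelReport {
  panel: string;
  fallback: FallbackKey;
}

// Returns the names of panels whose rendered fallback differs from the route's key.
function findDrift(route: RouteState, reports: readonly PanelReport[]): string[] {
  return reports.filter((r) => r.fallback !== route.fallback).map((r) => r.panel);
}
```

An empty result means every panel respected the route key; anything else names the panels that drifted.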
The unpredictable cases were when errors and auth combined.
We ended up with an explicit precedence order in the shell: auth redirects win over local fetch errors, and we render the reason as evidence.
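A sketch of that precedence as data rather than scattered conditionals. The lane names and `resolveLane` are hypothetical, and putting validation last is an assumption (we only had to settle auth versus fetch):

```typescript
// Hypothetical lanes; earlier entries win. Validation's position is an assumption.
type Lane = "auth" | "network" | "validation";

const PRECEDENCE: readonly Lane[] = ["auth", "network", "validation"];

interface Decision {
  lane: Lane | "ok";
  reason: string; // rendered as evidence so the choice is explainable
}

// Pick the highest-precedence active lane and say which lanes it suppressed.
function resolveLane(active: readonly Lane[]): Decision {
  for (const lane of PRECEDENCE) {
    if (active.includes(lane)) {
      const suppressed = PRECEDENCE.filter((l) => l !== lane && active.includes(l));
      const reason =
        suppressed.length > 0 ? `${lane} wins over ${suppressed.join(",")}` : `${lane} only`;
      return { lane, reason };
    }
  }
  return { lane: "ok", reason: "no active failures" };
}
```

The order of `active` does not matter, so the same combination of failures always produces the same lane and the same reason string.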
Drills helped us the most here. We used a drill lane to force timeouts and auth expiry, then watched which evidence keys were missing.
If you can't force it, you can't fix it.
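The "which keys were missing" check can be very small. This is a sketch with hypothetical names; the required key list and the `render` callback (something that forces the drill and returns whatever evidence the UI showed) are assumptions about how a harness might be wired:

```typescript
// Hypothetical drill names and evidence keys, for illustration only.
type Drill = "timeout" | "auth_expiry";
type EvidenceKey = "surface" | "lane" | "fallback" | "retry";
type Rendered = Partial<Record<EvidenceKey, string>>;

const REQUIRED: readonly EvidenceKey[] = ["surface", "lane", "fallback", "retry"];

// Keys that are absent or empty in what the UI rendered.
function missingKeys(rendered: Rendered): EvidenceKey[] {
  return REQUIRED.filter((k) => rendered[k] === undefined || rendered[k] === "");
}

// Force each drill through `render` and collect the missing keys per drill.
function runDrills(
  drills: readonly Drill[],
  render: (drill: Drill) => Rendered,
): Record<string, EvidenceKey[]> {
  const report: Record<string, EvidenceKey[]> = {};
  for (const drill of drills) {
    report[drill] = missingKeys(render(drill));
  }
  return report;
}
```

A drill passes when its entry in the report is an empty list.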
We keep retry posture explicit (retry=never|backoff|manual) because otherwise the app can keep trying invisibly and users don't know what's happening.
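For reference, a sketch of how the posture can drive both behavior and the rendered label. `planRetry`, the base delay, and the cap are illustrative values, not what any particular app uses:

```typescript
// The three postures from above; everything else here is hypothetical.
type RetryPosture = "never" | "backoff" | "manual";

interface RetryPlan {
  posture: RetryPosture;
  delayMs: number | null; // null means no automatic retry is scheduled
  label: string;          // rendered as evidence so retries are never invisible
}

// `attempt` is zero-based: 0 is the first retry after the initial failure.
function planRetry(
  posture: RetryPosture,
  attempt: number,
  baseMs = 500,
  capMs = 8000,
): RetryPlan {
  switch (posture) {
    case "never":
      return { posture, delayMs: null, label: "retry=never" };
    case "manual":
      return { posture, delayMs: null, label: "retry=manual" };
    case "backoff": {
      // Exponential backoff, capped so the delay stays bounded.
      const delayMs = Math.min(capMs, baseMs * 2 ** attempt);
      return { posture, delayMs, label: `retry=backoff next=${delayMs}ms` };
    }
  }
}
```

Only `backoff` ever schedules work on its own, and when it does, the label says when the next attempt will happen.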
The screenshot test is real. If a screenshot can't tell you which fallback you're in, error handling is basically a mystery box.
Error meshing is mostly about refusing invisible precedence. If the app chooses a fallback, it should say why, somewhere stable.