[Deep Dive #50] Error Meshwork Failover - discussion
Deep Dive #50 treats errors as part of the routing story: render them, record them, and route through them. In meshwork failover mode, failures become edges in a ledger, retries become route transitions (tracks), and every failure produces evidence you can inspect in render (edges, retries, lastRoute).
Do you keep an error ledger per route shell, or per track, or do you let everything throw and hope? How do you model retries as routed actions without creating infinite loops and phantom state? What evidence do you render so support can see 'this was a failover' vs 'this was a normal transition'?
Comments (10)
The biggest improvement is making retries first-class transitions. It stops the 'button mashes fetch again' chaos.
Related: Error Handling and Resilience.
I like the tie-in to promise catch cartography. A map of where promises failed is more actionable than a stack trace screenshot.
We implemented a tiny edge ledger and used it as UI evidence:
```tsx
const ledger = useLiveState('errors:inbox', { edges: [] as any[], retries: 0, lastRoute: '' });

// Append a failure edge so the UI can render it as evidence.
function edge(e: any) {
  ledger.write('edges', [...ledger.read('edges'), e]);
}

return <div data-error-edges={String(ledger.read('edges').length)} data-retries={String(ledger.read('retries'))} />;
```

Counterpoint: if every failure becomes a node, teams can normalize broken data and ship forever in 'failover mode'.
I'd want a policy that escalates after N edges (route to a stable exit, or freeze).
Totally. Meshwork doesn't mean 'never stop', it means 'stop deliberately'.
A route-visible policy like 'maxEdges' feels like part of the contract.
Yes, exactly. If the escalation posture is visible, it won't quietly drift into permanent fallback.
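A minimal sketch of that route-visible escalation posture, written as a pure function over the ledger (the names `applyPolicy`, `maxEdges`, `exitTrack`, and the `frozen` track are illustrative assumptions, not from the post):

```typescript
type Track = 'view' | 'retry' | 'fallback' | 'export-anyway' | 'frozen';

interface Ledger { edges: unknown[]; retries: number; lastRoute: string }

// Hypothetical route-visible policy: after maxEdges failure edges,
// stop cycling through recovery tracks and escalate to a stable exit.
interface Policy { maxEdges: number; exitTrack: Track }

function applyPolicy(ledger: Ledger, requested: Track, policy: Policy): Track {
  // Escalate once the ledger has accumulated too many failure edges;
  // otherwise honor the requested track.
  return ledger.edges.length >= policy.maxEdges ? policy.exitTrack : requested;
}
```

Because the policy is data rather than scattered conditionals, it can be rendered next to the ledger, which keeps the escalation posture visible instead of drifting into permanent fallback.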
It makes sense that mega components come up here: if the shell owns the failover policy, a single component can be a clearer contract than a scattered set of boundaries.
Related: Components and Composition.
The part I actually shipped from this post was rendering lastRoute as evidence.
It turned a bunch of "it broke" screenshots into actionable reports because you could see which surface produced the edge.
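A sketch of what rendering that evidence can look like as a single report line, assuming a hypothetical `supportReport` formatter over the post's ledger shape (edges, retries, lastRoute); this also makes the 'failover vs normal transition' distinction explicit:

```typescript
interface Ledger {
  edges: { message: string; surface: string }[];
  retries: number;
  lastRoute: string;
}

// Hypothetical formatter: collapse the ledger into one line so a support
// ticket carries the surface that produced the last edge, and so
// 'failover' vs 'normal' is stated rather than inferred.
function supportReport(ledger: Ledger): string {
  const last = ledger.edges[ledger.edges.length - 1];
  return last
    ? `failover: ${ledger.edges.length} edge(s), ${ledger.retries} retry(ies), lastRoute=${ledger.lastRoute}, surface=${last.surface}`
    : `normal: lastRoute=${ledger.lastRoute}`;
}
```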
Modeling recovery as component-router tracks is a nice forcing function: it makes you enumerate the failure modes (retry, fallback, export-anyway).
If you can't name the tracks, you're probably just hoping errors won't happen.
Minimal track posture I like (keeps it explicit without turning into a state machine project):
```ts
type Track = 'view' | 'retry' | 'fallback' | 'export-anyway';
type Edge = { at: number; kind: 'fetch' | 'auto-route' | 'fallback'; message: string; surface: string };

const ledger = useLiveState('reports.error.ledger', { edges: [] as Edge[], retries: 0, lastRoute: '' });
const ui = useLiveState('reports.recovery.ui', { track: 'view' as Track });
```
The key is the ledger: without it, tracks are just UI tabs and nobody can prove why they switched.
A misunderstanding I had: I assumed this was about catching everything and never throwing.
But it reads more like: throw if you must, but convert that failure into a routed recovery story with evidence.
If you adopt meshwork, I'd also make sure the UI can distinguish user-chosen fallback from auto-routed fallback.
Otherwise metrics and support tickets get weird because you can't tell intent from failure.