[Deep Dive #8] Migration Runbooks - discussion
For migrations that take weeks/months: what does your runbook actually look like, and how do you keep it from rotting?
The framing I like here: migrations become a sequence of explicit route contracts with render-visible evidence, not a big-bang refactor. Keeping competing stores alive only works if you can prove both truths are aligned.
How do you keep old/new sources of truth aligned (mirrors, vaults, derived evidence)? What evidence do you render so support can tell which path is active? Do you migrate the file router map first or the component-router motion first? And do you treat the runbook as living docs (tests + evidence), or is it mostly a one-time checklist?
Comments (14)
What does 'competing stores alive' look like day-to-day?
Is it literally duplicating state in two places?
Yes, temporarily. But it's intentional and documented: one is the legacy source, one is the new contract.
The key is to render alignment evidence so you can detect drift quickly.
Ok that makes sense. I thought this meant literally duplicating every field everywhere forever.
Temporary + audited by evidence feels a lot less scary.
On a larger team, the runbook isn't the checklist, it's the contract story.
We write "what keys are stable", "what evidence proves alignment", and "what is allowed to drift".
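Those three questions can also be written down as data instead of prose. A rough sketch, where `MigrationContract`, its field names, and the example keys are all made up for illustration and not taken from any real runbook:

```typescript
// Hypothetical shape: none of these names come from a real library.
type MigrationContract = {
  stableKeys: string[];   // keys that must match in both stores
  evidenceKeys: string[]; // keys rendered as proof of alignment
  allowedDrift: string[]; // keys that may differ during the migration
};

// Returns the stable keys that differ between the stores and are not allowed to.
function findContractViolations(
  contract: MigrationContract,
  legacy: Record<string, unknown>,
  next: Record<string, unknown>,
): string[] {
  return contract.stableKeys.filter(
    (key) => !contract.allowedDrift.includes(key) && legacy[key] !== next[key],
  );
}

const contract: MigrationContract = {
  stableKeys: ['userId', 'plan', 'theme'],
  evidenceKeys: ['migrationEvidence'],
  allowedDrift: ['theme'],
};

const violations = findContractViolations(
  contract,
  { userId: '42', plan: 'pro', theme: 'dark' },
  { userId: '42', plan: 'free', theme: 'light' },
);
console.log(violations); // ['plan']
```

The comparison is a shallow `!==`, so this only works for primitive values; nested values would need a deeper check.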
Competing stores can get expensive if you recompute derived values twice.
We store derived evidence once (in the route doc) and let both stores read that evidence to stay consistent during the transition.
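A minimal sketch of the "compute once, read twice" idea, assuming an in-memory stand-in for the route doc. `RouteDoc`, `publishCartTotal`, and the `derived:cartTotal` key are hypothetical names for illustration only:

```typescript
// Hypothetical in-memory stand-in for the route doc mentioned above.
class RouteDoc {
  private values = new Map<string, unknown>();
  write(key: string, value: unknown): void {
    this.values.set(key, value);
  }
  read<T>(key: string): T | undefined {
    return this.values.get(key) as T | undefined;
  }
}

type CartItem = { price: number; qty: number };

let computeCount = 0;

// Compute the derived value once and publish it as evidence.
function publishCartTotal(doc: RouteDoc, items: CartItem[]): void {
  computeCount += 1;
  const total = items.reduce((sum, item) => sum + item.price * item.qty, 0);
  doc.write('derived:cartTotal', total);
}

// Both stores read the published value instead of recomputing it.
const legacyStore = { total: (doc: RouteDoc) => doc.read<number>('derived:cartTotal') };
const newStore = { total: (doc: RouteDoc) => doc.read<number>('derived:cartTotal') };

const routeDoc = new RouteDoc();
publishCartTotal(routeDoc, [{ price: 5, qty: 2 }, { price: 3, qty: 1 }]);
console.log(legacyStore.total(routeDoc), newStore.total(routeDoc), computeCount); // 13 13 1
```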
We used typed adapters so legacy/new shapes could coexist.
The runbook named the adapter boundaries and the UI rendered which adapter version was in use.
Adapters + evidence are underrated. Without evidence, you can't tell which branch you're in when debugging.
With evidence, migration becomes observable work.
A concern: I’ve seen "keep both" turn into permanent complexity.
The runbook needs an exit clause (what evidence proves you're safe to delete the legacy store).
We did this in prod: we kept two stores alive for 6 weeks. It was fine because the evidence made drift visible.
The scary part was deleting the legacy store, not running them in parallel.
Yes. Deleting is risky because you lose fallback paths.
We treated deletion as a "release" with a new contract version and added evidence specifically for it.
Tiny runbook evidence pattern:
tsx
shell.write('migrationEvidence', { store: 'new', mirror: 'legacy', aligned: true });
Question: do you migrate the file router map first (URL shape), or the component router motion (panels/modals)?
We found motion easier to migrate because it's more local.
We did motion first too. File route changes ripple across the app and require more coordination.
Once motion was stable as route truth, adjusting file routes was mostly a mapping exercise.
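For anyone wondering what "mapping exercise" might mean in practice, here is a rough sketch. The path table and the `mapRoute` helper are hypothetical, and the evidence string format is just one possible choice:

```typescript
// Hypothetical legacy-to-new path table; the paths are illustrative only.
const routeMap: Record<string, string> = {
  '/account/settings': '/settings/account',
  '/account/billing': '/settings/billing',
};

// Resolves a path and returns evidence saying which route shape served it.
function mapRoute(path: string): { path: string; evidence: string } {
  const mapped = routeMap[path];
  return mapped
    ? { path: mapped, evidence: `route=new from=${path}` }
    : { path, evidence: `route=legacy path=${path}` };
}

console.log(mapRoute('/account/billing').path); // /settings/billing
console.log(mapRoute('/reports').evidence);     // route=legacy path=/reports
```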
Runbooks + render-visible evidence = you can pause a migration for a week and not forget what phase you're in.
That's the superpower.
The runbook that didn't rot for us was basically three things: adapter boundary, alignment evidence, delete criteria.
We wrote down the adapter as code (so it couldn't get out of date) and then wrote the alignment evidence as a stable surface key.
ts
type LegacyUser = { id: number; full_name: string; role?: string };
type NewUser = { id: string; name: string; role: 'member' | 'admin' };

// Adapter boundary: the only place where the legacy shape is translated.
export function adaptUser(u: LegacyUser): NewUser {
  return {
    id: String(u.id),
    name: u.full_name,
    // Any legacy role other than 'admin' (including a missing one) maps to 'member'.
    role: u.role === 'admin' ? 'admin' : 'member',
  };
}

// Alignment here covers id and name only; role is not compared.
export function writeAlignmentEvidence(doc: any, legacy: LegacyUser, next: NewUser) {
  doc.write('migrationEvidence', {
    store: 'new',
    mirror: 'legacy',
    aligned: String(legacy.id) === next.id && legacy.full_name === next.name,
  });
}
Then the delete criteria were "aligned=true for X sessions" plus one release of no drift events.
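One way those criteria could be checked in code is sketched below. It assumes "X sessions" means the X most recent sessions in a row, which is only one reading of the rule. `SessionEvidence`, `DeleteCriteria`, and the release labels are hypothetical:

```typescript
// Hypothetical record of one user session during the migration.
type SessionEvidence = { release: string; aligned: boolean; driftEvents: number };

type DeleteCriteria = { alignedSessions: number; cleanRelease: string };

// True when the most recent `alignedSessions` sessions were all aligned
// and the named release saw at least one session and zero drift events.
function meetsDeleteCriteria(log: SessionEvidence[], criteria: DeleteCriteria): boolean {
  if (log.length < criteria.alignedSessions) return false;
  const recent = log.slice(-criteria.alignedSessions);
  const releaseSessions = log.filter((s) => s.release === criteria.cleanRelease);
  return (
    recent.every((s) => s.aligned) &&
    releaseSessions.length > 0 &&
    releaseSessions.every((s) => s.driftEvents === 0)
  );
}

const sessionLog: SessionEvidence[] = [
  { release: '1.4', aligned: false, driftEvents: 2 },
  { release: '1.5', aligned: true, driftEvents: 0 },
  { release: '1.5', aligned: true, driftEvents: 0 },
];

console.log(meetsDeleteCriteria(sessionLog, { alignedSessions: 2, cleanRelease: '1.5' })); // true
console.log(meetsDeleteCriteria(sessionLog, { alignedSessions: 3, cleanRelease: '1.5' })); // false
```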
The only thing I'd add: your runbook needs a way to answer "why are we in this phase" without a meeting.
We rendered a one-line status like migration=phase-3 (adapter=v2, aligned=true) in the app shell and it saved so much coordination time.
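A tiny sketch of how a status line like that could be built from evidence. The `MigrationStatus` type and `formatMigrationStatus` helper are hypothetical; only the output format is taken from the comment above:

```typescript
// Hypothetical evidence shape behind the one-line status.
type MigrationStatus = { phase: number; adapter: string; aligned: boolean };

function formatMigrationStatus(s: MigrationStatus): string {
  return `migration=phase-${s.phase} (adapter=${s.adapter}, aligned=${s.aligned})`;
}

const statusLine = formatMigrationStatus({ phase: 3, adapter: 'v2', aligned: true });
console.log(statusLine); // migration=phase-3 (adapter=v2, aligned=true)
```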
Our runbook stopped rotting the day we made it executable. Instead of a checklist, it's a little script stored in the shell that can write evidence keys and flip posture lanes.
tsx
type Step = { name: string; run: (ctx: any) => void | Promise<void> };

function useRunbook() {
  const shell = useGlobalRoute('app', {
    // Steps live in the shell as source text so the runbook can change mid-migration.
    runbookSource: `([
      { name: 'phase:1 mirror', run: (c) => c.shell.write('migrationEvidence', { phase: 1, store: 'legacy', mirror: 'new' }) },
      { name: 'phase:2 adapters', run: (c) => c.shell.write('adapterVersion', 'v2') },
      { name: 'phase:3 delete-criteria', run: (c) => c.shell.write('deleteCriteria', { alignedSessions: 500, driftBudget: 0 }) },
    ])`,
    runbookEvidence: [] as string[],
  });

  // Indirect eval runs the source in global scope; only safe if runbookSource is trusted.
  const steps = (0, eval)(shell.read('runbookSource')) as Step[];

  // Append one entry to the evidence log.
  const log = (s: string) => shell.write('runbookEvidence', [...shell.read('runbookEvidence'), s]);

  // Run every step in order, logging each one before it executes.
  const run = async () => {
    for (const step of steps) {
      log(`run:${step.name}`);
      await step.run({ shell });
    }
  };

  return { run, evidence: shell.read('runbookEvidence') };
}
This made migrations feel like routes: explicit, reproducible, and auditable via evidence keys.
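If storing source text and calling eval is not an option (for example under a strict content security policy), the same idea can be sketched with plain functions. This is a different technique from the snippet above, not a description of it, and `Shell`, `createMemoryShell`, and `runRunbook` are hypothetical stand-ins for whatever the app shell provides:

```typescript
// Hypothetical minimal shell interface with an in-memory implementation.
type Shell = {
  write: (key: string, value: unknown) => void;
  read: (key: string) => unknown;
};

function createMemoryShell(): Shell {
  const state = new Map<string, unknown>();
  return {
    write: (key, value) => {
      state.set(key, value);
    },
    read: (key) => state.get(key),
  };
}

type RunbookStep = { name: string; run: (shell: Shell) => void | Promise<void> };

const runbookSteps: RunbookStep[] = [
  {
    name: 'phase:1 mirror',
    run: (s) => s.write('migrationEvidence', { phase: 1, store: 'legacy', mirror: 'new' }),
  },
  { name: 'phase:2 adapters', run: (s) => s.write('adapterVersion', 'v2') },
];

// Runs the steps in order and returns the log of what ran.
async function runRunbook(shell: Shell, steps: RunbookStep[]): Promise<string[]> {
  const log: string[] = [];
  for (const step of steps) {
    log.push(`run:${step.name}`);
    await step.run(shell);
  }
  return log;
}

const memoryShell = createMemoryShell();
runRunbook(memoryShell, runbookSteps).then((log) => {
  console.log(log.join(', '), memoryShell.read('adapterVersion'));
});
```

The trade-off is that the steps ship with the code instead of living in the shell, so changing the runbook means a deploy.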
For keeping competing stores aligned, we ended up using the least subtle technique: write both, then compare, then render drift. Even if the alignment check is expensive, it's only expensive while you're migrating, and it gives you hard proof.
ts
// Write the same value to both stores.
function mirrorWrite(legacy: any, next: any, key: string, value: any) {
  legacy[key] = value;
  next[key] = value;
}

function writeDriftEvidence(doc: any, legacyValue: any, nextValue: any, key: string) {
  // Note: JSON.stringify is key-order sensitive, so reordered objects count as drift.
  const drift = JSON.stringify(legacyValue) !== JSON.stringify(nextValue);
  doc.write('migrationDrift', { key, drift, at: Date.now() });
  doc.write('migrationAligned', !drift);
}
We render migrationAligned=false key=... in the shell so support can tell which store diverged without guessing.
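One caveat with comparing `JSON.stringify` output is that two objects with the same fields in a different key order compare as drifted. If that produces false alarms, a key-order-insensitive comparison is one option. A minimal sketch for plain JSON-like values, with `stableStringify` and `hasDrift` as hypothetical helper names:

```typescript
// Serialize with object keys sorted at every level so key order does not matter.
// Intended for plain JSON-like data (no Dates, Maps, or class instances).
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
      .join(',');
    return `{${body}}`;
  }
  // JSON.stringify returns undefined at runtime for undefined input.
  return JSON.stringify(value) ?? 'undefined';
}

function hasDrift(legacyValue: unknown, nextValue: unknown): boolean {
  return stableStringify(legacyValue) !== stableStringify(nextValue);
}

console.log(hasDrift({ a: 1, b: 2 }, { b: 2, a: 1 })); // false
console.log(hasDrift({ a: 1 }, { a: 2 }));             // true
```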
File router vs component router ordering: we did a weird split that worked. We migrated the file router map *shape* first (URLs), but kept component-router motion old/new in parallel behind a doc resolver.
tsx
function resolveMotion(contractSource: string, ctx: any) {
  // The runbook owns this resolver so it can change during the migration.
  const resolver = (0, eval)(contractSource) as (c: any) => string;
  return resolver(ctx);
}

function MotionBridge() {
  const shell = useGlobalRoute('app', {
    motionResolver: '(c) => (Math.random() > 0.5 ? "legacy" : "new")',
    motionEvidence: '',
  });
  const lane = resolveMotion(shell.read('motionResolver'), { at: Date.now() });
  shell.write('motionEvidence', `motion=${lane}`);
  return <pre>{shell.read('motionEvidence')}</pre>;
}
It sounds chaotic, but it forced us to render motion lane evidence everywhere, which made drift obvious.
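If a coin flip on every render turns out to be too noisy to debug, one alternative is a resolver that always gives the same session the same lane. This is a swapped-in technique and not what the snippet above does. The `sessionId` input, the `percentNew` knob, and the helper names are all assumptions:

```typescript
// Simple deterministic string hash (djb2 variant); any stable hash would do.
function hashString(input: string): number {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    hash = ((hash * 33) ^ input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

type Lane = 'legacy' | 'new';

// `percentNew` is the share of sessions (0 to 100) routed to the new lane.
function resolveLane(sessionId: string, percentNew: number): Lane {
  return hashString(sessionId) % 100 < percentNew ? 'new' : 'legacy';
}

const lane = resolveLane('session-abc', 50);
console.log(`motion=${lane}`);
console.log(lane === resolveLane('session-abc', 50)); // true: same session, same lane
```

Raising `percentNew` over time only moves sessions from legacy to new, never back, because each session's bucket is fixed.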
OP follow-up: the most important line in our runbook ended up being the delete criteria, and we encoded it as an actual gate. If the evidence doesn't satisfy the criteria, the UI refuses to enter the "legacy deleted" posture.
tsx
function canDeleteLegacy(shell: any) {
  const aligned = Boolean(shell.read('migrationAligned'));
  const drift = shell.read('migrationDrift');
  return aligned && !drift?.drift;
}

function DeleteLegacyButton() {
  const shell = useGlobalRoute('app', { legacyEnabled: true });
  const ok = canDeleteLegacy(shell);
  return (
    <button
      disabled={!ok}
      onClick={() => shell.write('legacyEnabled', false)}
      title={ok ? 'safe' : 'blocked by evidence'}
    >
      Delete legacy store
    </button>
  );
}
It made deletion feel like a routable phase transition rather than a scary "remove the code" day.