Large mechanical migrations are miserable. You know the kind: a naming convention was wrong, a pattern propagated across 20 files over three years, and now fixing it correctly means touching everything at once: a monster PR that’s impossible to review, impossible to revert cleanly, and guaranteed to conflict with everything else in flight.
The standard response is to either live with the debt (“it’s not broken enough”) or designate a week and a brave engineer. Neither is satisfying.
There’s a better shape for this kind of work, and AI agents make it practical in a way it wasn’t before.
The setup: enumerate, then stack
The key insight is that a large migration is really N small migrations wearing a trench coat. The first job, for a human, is to define the standard: what correct looks like, what incorrect looks like, and the mechanical rule that transforms one into the other.
That definition is the hard intellectual work, and it requires understanding the codebase deeply enough to anticipate edge cases and backwards-compatibility constraints.
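As an entirely hypothetical illustration, suppose the standard is that event names use snake_case and the legacy pattern used camelCase. The "what does incorrect look like" and "what's the mechanical rule" parts might be as small as:

```python
import re

# Hypothetical standard: event names are snake_case; the legacy
# pattern was camelCase.
OLD_PATTERN = re.compile(r"[a-z]+(?:[A-Z][a-z0-9]*)+")  # detects legacy names

def to_snake_case(name: str) -> str:
    """The mechanical rule: transform a legacy name into its correct form."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

# to_snake_case("userSignedUp") -> "user_signed_up"
# to_snake_case("already_correct") -> "already_correct" (unchanged)
```

If the rule can’t be written this crisply, that’s usually a sign the edge cases haven’t been thought through yet.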
Once the standard exists, enumerate the migration units. In a file-per-rule codebase this might be one branch per file. In a service-oriented codebase it might be one branch per service. The unit should be: the smallest thing that’s independently reviewable and independently revertible.
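In a file-per-rule codebase, the enumeration itself can be scripted. A minimal sketch, where the paths and branch-naming scheme are assumptions rather than prescriptions:

```python
from pathlib import Path

def enumerate_units(root: str, pattern: str = "rules/*.py") -> list[str]:
    """One migration unit per matching file, in a deterministic order."""
    return sorted(str(p) for p in Path(root).glob(pattern))

def stacked_branches(units: list[str], prefix: str = "migrate") -> list[tuple[str, str]]:
    """Pair each unit with a branch name; branch i is stacked on branch i-1."""
    return [(f"{prefix}/{i:03d}-{Path(u).stem}", u) for i, u in enumerate(units, start=1)]
```

Each (branch, unit) pair becomes one small PR, and the numbering is the merge order.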
Where the AI comes in
Hand the stack to an AI agent with a short loop of instructions: make the change, fix the tests, create the downstream branch for the next unit, repeat.
The agent is good at exactly the parts that are tedious for humans:
- Applying the same mechanical transformation consistently across dozens of call sites, without missing one or adding a typo
- Running the test suite, reading failure output, and identifying which assertions reference the old format
- Updating test fixtures to match new output without changing what the test is actually verifying
- Recomputing the rebase chain after an amendment — the bookkeeping of “which branches need to move because I just changed their parent”
These are the parts where human attention drifts. An engineer doing this manually will get to file 14 and start making small errors, or lose track of which downstream branches need rebasing, or get interrupted and lose context. The agent doesn’t drift.
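The rebase bookkeeping in particular is pure dependency tracking, which is why it automates so well. A sketch of the core computation, assuming each branch records its parent:

```python
def rebase_order(parents: dict[str, str], amended: str) -> list[str]:
    """After amending `amended`, list every downstream branch that must be
    rebased, parents before children."""
    children: dict[str, list[str]] = {}
    for branch, parent in parents.items():
        children.setdefault(parent, []).append(branch)
    order, stack = [], [amended]
    while stack:
        node = stack.pop()
        for child in sorted(children.get(node, [])):
            order.append(child)
            stack.append(child)
    return order
```

Amend any branch and everything downstream of it moves, in order; the agent just walks the chain.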
The human stays in the loop where it matters
The human’s job throughout is:
- Defining the standard and its edge cases upfront
- Reviewing each PR diff (which is small and focused — one file, one pattern)
- Handling genuinely ambiguous cases the agent flags
- Merging, in order, bottom-up
The review burden per PR is low precisely because the scope is constrained. A reviewer isn’t asked “is this 2000-line migration correct?” They’re asked “is this file’s migration correct?” That’s a question a reviewer can actually answer confidently.
Backwards compatibility as a first-class concern
One thing this workflow forces you to get right upfront: what happens to old data? In any migration that changes identifiers, keys, or formats, there will be stored values — in databases, in logs, in external systems — that use the old shape. The migration plan has to specify how old-shape values are handled going forward, and that handling has to be in every migrated unit, permanently.
Getting this wrong is easy when you’re moving fast. Having it as an explicit checklist item per PR — “does this unit handle old-shape values correctly?” — makes it hard to miss.
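As a hypothetical example of what “handled going forward, permanently” looks like in code: suppose identifiers migrated from a `REG-1234` shape to `reg_1234`. Every migrated unit normalizes at the read boundary:

```python
# Hypothetical: identifiers migrated from "REG-1234" to "reg_1234".
# Old-shape values still live in the database and in logs, so every
# migrated unit normalizes at the read boundary, permanently.

def normalize_id(raw: str) -> str:
    """Accept both shapes; always return the new shape."""
    if raw.startswith("REG-"):           # legacy shape, still in stored data
        return "reg_" + raw.removeprefix("REG-")
    return raw                           # already new-shape
```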
What you end up with
At the end, you have a stack of small, reviewable, revertible PRs that each represent one unit of a coherent migration. The test suite passes at every layer. The codebase has a new invariant that didn’t exist before. And a static enforcement test, checked in alongside the migration, makes it hard to accidentally reintroduce the old pattern.
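That static enforcement test can be as simple as a pattern scan over the tree. A minimal sketch, assuming the old shape is something greppable (the `REG-\d+` pattern and `src` path are placeholders):

```python
import re
from pathlib import Path

OLD_PATTERN = re.compile(r"REG-\d+")  # hypothetical legacy shape

def offenders(files: dict[str, str]) -> list[str]:
    """Pure core: which files still contain the old pattern?"""
    return sorted(path for path, text in files.items() if OLD_PATTERN.search(text))

def test_old_pattern_is_gone(root: str = "src") -> None:
    """CI-facing wrapper: scan the tree and fail if anything matches."""
    tree = {str(p): p.read_text(encoding="utf-8") for p in Path(root).rglob("*.py")}
    assert offenders(tree) == [], f"old-shape values reintroduced in {offenders(tree)}"
```

Once this runs in CI, reintroducing the old pattern fails the build rather than relying on a reviewer to notice.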
The migration that would have been a week of careful manual work, reviewed as a single impenetrable diff, becomes a stack of boring one-file PRs — each mergeable, each understandable, each safe to revert independently if something goes wrong in production.
The AI didn’t define the problem or own the outcome. It did the part that was always the bottleneck: executing the plan without getting tired.

