How to Migrate Off Managed Cloud Without Downtime

The real runbook for moving an app off managed hosting onto infrastructure you own: database migration, DNS cutover, backups, and rollback, with the failure points named.

Managed cloud is a great place to start and an expensive place to stay. At some point the bill, the lock-in, or the simple fact that you do not control your own data pushes you to move. The fear that stops most people is downtime. They picture a maintenance window, a broken cutover, and angry users.

You do not need a maintenance window. I have pulled apps off managed hosting onto boxes I own across the portfolio, and the pattern is the same every time. The trick is that the migration runs in parallel with the live system, and the switch at the end is small and reversible. Here is the runbook, with the parts that actually bite called out.

Stand up the destination first

Before you touch anything live, build the new home in full. Provision the server, install the database, deploy the app, wire the environment variables. Point a temporary hostname at it and run the app end to end. You want the new box serving real traffic to you, not to your users, while the old one still serves everyone.

This is also where you decide what you are willing to own. For me that means a self-hosted Postgres box and an app box I control end to end. The destination is not done until you can hit it, log in, and click through the product like a customer would. This is the work HostSSH exists to make boring.
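A minimal sketch of that "click through like a customer" check, so it can be rerun after every change to the new box. The temporary hostname and the list of paths are placeholders, not anything from a real deployment, and a 2xx status only shows that a page loads, not that it is correct.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for url, or 0 if the request never completed."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; keep the status code.
        return err.code
    except (URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return 0


def smoke_test(
    base_url: str,
    paths: Iterable[str],
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, int]:
    """Map each failing path to its status. An empty dict means all passed."""
    failures: Dict[str, int] = {}
    for path in paths:
        status = fetch(base_url.rstrip("/") + path)
        if not 200 <= status < 300:
            failures[path] = status
    return failures


if __name__ == "__main__":
    # Hypothetical temporary hostname and paths; substitute your own.
    print(smoke_test("https://new-box.example.test", ["/", "/login", "/healthz"]))
```

The fetch function is injectable so the check can be exercised without a network, and so you can later swap in a client that logs in before requesting pages.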

Migrate the database without freezing writes

The database is where downtime hides. A naive dump and restore means the source is changing while you copy, so the destination is stale the moment it finishes.

The clean path is replication. Set the new database up as a replica of the old one and let it catch up and stay caught up. When you are ready to cut over, stop writes for a few seconds, confirm the replica has replayed everything up to the source's final position, then promote it. If your managed provider does not expose replication, the fallback is a dump taken at a known point plus a short freeze during which you copy across only the rows written since that point. Keep that delta small and the freeze stays at seconds, not minutes.
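One way to make "at the same position" checkable, sketched under the assumption that both ends are PostgreSQL 10 or later using physical streaming replication. There, the source reports its write position with `pg_current_wal_lsn()` and the replica reports what it has replayed with `pg_last_wal_replay_lsn()`, both as text like `16/B374D848`. The helper names below are illustrative. Logical replication tracks progress differently, so this comparison does not apply to it as written.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' into a byte offset.

    An LSN is two hexadecimal numbers: the high 32 bits and the low 32 bits
    of a 64-bit position in the write-ahead log.
    """
    high, low = lsn.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(source_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay (0 when caught up)."""
    return max(0, parse_lsn(source_lsn) - parse_lsn(replica_lsn))


def replica_caught_up(source_lsn: str, replica_lsn: str) -> bool:
    """True once the replica has replayed up to the source's position.

    Only meaningful if source_lsn was read AFTER writes were frozen;
    otherwise the source keeps moving and the answer goes stale at once.
    """
    return lag_bytes(source_lsn, replica_lsn) == 0
```

During the freeze, read the source position once, then poll the replica until `replica_caught_up` returns true before promoting.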

The failure point here is silent data drift: rows written to the old database after you copied but before you cut over. Name that risk out loud and design the cutover so the window where it can happen is tiny and observed.
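To make the drift window observed rather than assumed, one option is to fingerprint each table on both sides during the freeze and compare. This is a sketch: the fingerprint here is a hypothetical `(row_count, max_id)` pair you would collect yourself with a query per table, and matching fingerprints are evidence of agreement, not proof, since in-place updates leave both values unchanged.

```python
from typing import Dict, Optional, Tuple

# Hypothetical fingerprint: (row_count, max_id) for each table.
Fingerprint = Tuple[int, int]


def find_drift(
    source: Dict[str, Fingerprint],
    destination: Dict[str, Fingerprint],
) -> Dict[str, Tuple[Optional[Fingerprint], Optional[Fingerprint]]]:
    """Return tables whose fingerprints differ, as {table: (source, dest)}.

    A table missing on one side shows up with None for that side.
    An empty result means no drift was detected by this measure.
    """
    drift = {}
    for table in sorted(set(source) | set(destination)):
        src = source.get(table)
        dst = destination.get(table)
        if src != dst:
            drift[table] = (src, dst)
    return drift
```

If the result is not empty, do not promote. Either wait for the replica to catch up or investigate before any user reaches the new database.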

Cut over DNS with a low TTL set days early

DNS is the other place people get hurt. Records cache for as long as their TTL says, so if your TTL is an hour, some users keep hitting the old box for an hour after you switch.

Drop the TTL on the relevant records to 60 seconds several days before the migration. The old, longer TTL has to expire from resolver caches before the new one takes effect, which is the part people forget. By the time you cut over, well-behaved resolvers are checking back every minute, so the switch lands fast and predictably.
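The timing can be written down as arithmetic. A resolver that cached the record just before you lowered the TTL may keep it for the full old TTL, so the short TTL is only in effect everywhere after that much time has passed. The sketch below adds a safety margin on top; the one-hour default margin is an arbitrary assumption, and resolvers that ignore TTLs are not covered by any calculation.

```python
from datetime import datetime, timedelta, timezone


def earliest_safe_cutover(
    ttl_lowered_at: datetime,
    old_ttl_seconds: int,
    margin: timedelta = timedelta(hours=1),
) -> datetime:
    """Earliest time at which caches holding the old TTL should have expired."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds) + margin


def ttl_change_has_settled(
    now: datetime,
    ttl_lowered_at: datetime,
    old_ttl_seconds: int,
    margin: timedelta = timedelta(hours=1),
) -> bool:
    """True if the low TTL has been in effect long enough to cut over."""
    return now >= earliest_safe_cutover(ttl_lowered_at, old_ttl_seconds, margin)


if __name__ == "__main__":
    # Example: TTL lowered from one day (86400 s) at an illustrative timestamp.
    lowered = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(earliest_safe_cutover(lowered, 86400))
```

Lowering the TTL days ahead, as described above, makes this check pass with plenty of room to spare.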

Keep the old box running and serving after the switch. Do not tear it down. As long as both can serve, a straggler resolver hitting the old one is harmless, not an outage.

Take the backup before, and keep rollback one step away

Back up the source the moment before you freeze writes, and store that backup somewhere neither box depends on. Object storage you control is ideal. This is your floor: whatever happens, you can rebuild from it.
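A backup is only a floor if the copy in storage is the same file you took. A small sketch for checking that, assuming you can download the stored copy again or ask your storage for its SHA-256: hash the local dump in chunks and compare digests. The function names are illustrative. A matching digest shows the copy is intact; it does not show the dump restores, which is worth testing separately.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(local_path: str, expected_sha256: str) -> bool:
    """True if the file's digest equals the expected one (case-insensitive)."""
    return sha256_of_file(local_path) == expected_sha256.strip().lower()
```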

Rollback should be a single move. Because you lowered the TTL early and kept the old box alive, undoing the cutover is just pointing DNS back. Decide the rollback trigger in advance, an error rate or a failed health check, so the call is mechanical and not an argument at 2am.
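Writing the trigger down as code is one way to keep the call mechanical. The thresholds below (a 2% error rate, three consecutive failed health checks) are placeholder assumptions; pick yours from the old system's normal behavior before the cutover, not during it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackPolicy:
    """Thresholds agreed before the cutover. The defaults are illustrative."""

    max_error_rate: float = 0.02   # fraction of requests allowed to fail
    max_failed_checks: int = 3     # consecutive health-check failures


def should_roll_back(
    errors: int,
    requests: int,
    consecutive_failed_checks: int,
    policy: RollbackPolicy = RollbackPolicy(),
) -> bool:
    """Return True when either agreed trigger has fired."""
    if consecutive_failed_checks >= policy.max_failed_checks:
        return True
    if requests == 0:
        # No traffic observed yet, so the error rate is undefined.
        return False
    return errors / requests > policy.max_error_rate
```

Feed it counts from a fixed window, for example the last five minutes, so one early spike does not decide the outcome and a sustained problem cannot be argued away.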

The shape of a clean migration

Build the destination fully. Replicate the data. Lower the TTL days ahead. Freeze for seconds, promote, switch. Watch, and roll back in one step if it goes wrong.

Done this way, a migration is not a dramatic event. It is a quiet switch your users never notice, which is exactly the point of owning your own stack.