Apr 3, 2025, 1:00 PM · 8 min read · Omar Taha Alfaqeer

The Art of the Migration: How I Moved 150K Users to a New Database Without Anyone Noticing

At Widespace, our WidePay platform was running on MySQL 5.7 — a solid database, but one that had accumulated years of technical debt: no foreign keys on critical tables, inconsistent indexing, and a character set that caused subtle bugs with Arabic names.

We needed to migrate to PostgreSQL 15. While the users slept.

Why We Couldn't Just Stop the World

WidePay processes financial transactions. That means:

  • Zero downtime tolerance: The Saudi workaround for "the system is down" is calling your manager at 2 AM
  • Data integrity is non-negotiable: Losing a single transaction record could mean regulatory penalties
  • Arabic text support: MySQL's utf8 (not utf8mb4, just utf8) was silently corrupting Arabic names and addresses
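As a quick illustration (with hypothetical sample strings, not our real data): MySQL's legacy utf8 is really utf8mb3, capped at 3 bytes per character, so any 4-byte UTF-8 code point in a field gets rejected or truncated. You can sanity-check data for this before migrating:

```python
samples = {
    "Arabic name": "محمد",  # 2 bytes per character in UTF-8
    "Emoji in a memo field": "💳",  # 4 bytes — unrepresentable in utf8mb3
}

for label, text in samples.items():
    # utf8mb3 can only store characters that encode to at most 3 bytes
    max_bytes = max(len(ch.encode("utf-8")) for ch in text)
    print(f"{label}: max bytes/char = {max_bytes}, utf8mb3-safe = {max_bytes <= 3}")
```

Running a scan like this over string columns is a cheap way to find rows that the old charset cannot faithfully hold.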

The 8-Week Migration Play

Weeks 1-2: Shadow Writes

We set up a dual-write system where every write to MySQL also wrote to PostgreSQL. The PostgreSQL writes were asynchronous and non-blocking — if they failed, we logged but didn't crash.

import asyncio

async def write_transaction(data):
    # Primary write to MySQL (synchronous, must succeed)
    mysql_result = await mysql.insert(data)

    # Shadow write to PostgreSQL (fire-and-forget, non-blocking)
    asyncio.create_task(shadow_write(data))

    return mysql_result

async def shadow_write(data):
    try:
        await pg.insert(data)
    except Exception as e:
        logger.error(f"Shadow write failed: {e}")
        # Alert on-call, but don't crash the transaction

This let us verify that PostgreSQL could handle our write patterns without any risk to production data.

Weeks 3-4: Consistency Verification

We built a nightly job that compared MySQL and PostgreSQL row counts and checksums. The initial results were humbling:

  • 3,218 rows had different data between the two databases
  • 47 rows were missing entirely from PostgreSQL
  • 12 rows had Arabic text corruption in MySQL that was "fixed" during migration

We fixed every discrepancy before moving forward.
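The core of such a comparison job can be sketched as follows. This is a simplified stand-in (the function and variable names are illustrative, and real query plumbing is omitted): each row is reduced to a checksum over a canonical ordering of its values, then the two keyed maps are diffed for missing and mismatched primary keys.

```python
import hashlib

def row_checksum(row):
    # Hash a canonical, ordered representation of the row's values
    canonical = "|".join(str(v) for v in row)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

def compare_tables(mysql_rows, pg_rows):
    """Compare {primary_key: row_tuple} maps pulled from both databases."""
    mysql_sums = {pk: row_checksum(r) for pk, r in mysql_rows.items()}
    pg_sums = {pk: row_checksum(r) for pk, r in pg_rows.items()}

    missing = sorted(mysql_sums.keys() - pg_sums.keys())
    mismatched = sorted(
        pk for pk in mysql_sums.keys() & pg_sums.keys()
        if mysql_sums[pk] != pg_sums[pk]
    )
    return missing, mismatched

# In-memory stand-ins for the two databases' query results
mysql_rows = {1: ("Omar", 100), 2: ("Sara", 250), 3: ("Ali", 75)}
pg_rows = {1: ("Omar", 100), 2: ("Sara", 999)}  # row 3 missing, row 2 differs

missing, mismatched = compare_tables(mysql_rows, pg_rows)
print(missing, mismatched)  # [3] [2]
```

Canonicalizing values before hashing matters: the same timestamp or decimal can serialize differently in MySQL and PostgreSQL, and naive string comparison produces false positives.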

Weeks 5-6: Read Migration

We switched 10% of read traffic to PostgreSQL using a feature flag. Then 25%. Then 50%. Then 100%.
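A minimal sketch of that kind of percentage flag (the function name and flag plumbing are assumptions, not our actual code): hashing the user ID into a bucket, rather than sampling randomly per request, keeps each user pinned to one backend, which makes consistency issues much easier to trace.

```python
import hashlib

def choose_read_backend(user_id: str, pg_fraction: float) -> str:
    """Route a stable fraction of users' reads to PostgreSQL."""
    # Deterministic bucket in [0, 100) derived from the user ID
    bucket = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16) % 100
    return "postgresql" if bucket < pg_fraction * 100 else "mysql"

print(choose_read_backend("user-42", 0.0))  # mysql
print(choose_read_backend("user-42", 1.0))  # postgresql
```

Ramping the fraction from 0.10 to 1.0 is then just a config change, and rolling back is the same change in reverse.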

Each increase had a monitoring dashboard showing:

  • Query latency (P50, P95, P99)
  • Error rate
  • Data consistency metrics

PostgreSQL was 40% faster on our most common queries. This wasn't surprising — our MySQL tables had no proper indexes for the query patterns we'd evolved over three years.

Week 7: The Big Switch

On a Friday night (lowest traffic in our Saudi market), we:

  1. Enabled maintenance mode for writes (reads still served from PostgreSQL)
  2. Drained all pending MySQL writes
  3. Ran a final consistency check
  4. Switched writes to PostgreSQL
  5. Monitored for 4 hours
  6. Declared victory at 3 AM and went to sleep
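Steps 1 and 4 above hinge on two flags: a write gate and a backend selector. A toy sketch (the flag store is hypothetical; in production it would live in a config service, not a dict):

```python
flags = {"writes_paused": False, "write_backend": "mysql"}

def handle_write(data):
    if flags["writes_paused"]:
        # Maintenance mode: reject writes while pending MySQL writes drain
        raise RuntimeError("Writes are temporarily paused for maintenance")
    return flags["write_backend"]  # where this write would be routed

# The cutover sequence:
flags["writes_paused"] = True        # step 1: gate writes, reads unaffected
# ... drain pending MySQL writes, run final consistency check ...
flags["write_backend"] = "postgresql"  # step 4: flip the target
flags["writes_paused"] = False       # resume writes on PostgreSQL
print(handle_write({"amount": 100}))  # postgresql
```

Because both flags flip independently, a failed consistency check in step 3 means simply un-pausing writes against MySQL and trying another night.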

Week 8: MySQL Decommission

We kept MySQL running in read-only mode for two weeks as a safety net. Nobody needed it. We turned it off on a Saturday and deleted the EC2 instance that Sunday.

The Three Things That Saved Us

  1. Shadow writes BEFORE switching — We caught 47 write-path bugs before any real data was at risk
  2. Automated consistency checks — A human would never have caught all 3,218 discrepancies
  3. Feature flags for traffic shifting — Being able to roll back to MySQL in seconds gave everyone confidence

What I'd Do Differently

I'd start the consistency verification from day one instead of week 3. Those shadow writes were writing data incorrectly for two weeks, and we had to replay and re-verify everything. Lesson learned: verify as you shadow, not after.
