Migrating Data From MongoDB to PostgreSQL with No Downtime

Migrating Data From MongoDB to PostgreSQL with No Downtime

This post was originally published on Medium
Photo by Gareth Davies on Unsplash

Why Migrate?

In the early days of Contactually, before the product was fully fleshed out and requirements were clear, MongoDB was chosen to house certain portions of our applications data.

MongoDB has many valid use-cases, but ours did not fit, and a change needed to be made. In this case, that meant migrating data from our MongoDB cluster into our relational datastore, PostgreSQL. While conceptually this migration is easy to understand, our transition would have failed if it weren’t for this 3-step deployment plan.

Important Not to Disrupt Regular Product Usage

As we discussed the primary goals for the migration, our number one priority was to make sure that users would suffer neither downtime nor degradation of the product.

Additionally, we wanted to make sure that any changes during the migration would not be missed. During a few white-boarding sessions, we came up with a strategy that allowed us to accomplish these main goals.

High Level Strategy — Duplicate Writes with a Series of Deployments

So — how did we do it? Our high level strategy was to write to both systems simultaneously, with a series of deployments to eventually start reading from the new system.

Create New PostgreSQL Schema

The first step was to compare the schema (or most recent schema) of the Mongoid class that we intended on migrating, and come up with an appropriate PostgreSQL table schema. In some cases, this wasn’t too disruptive as we were using PostgreSQL v9.6.x, so JSON columns were an option when strings, integers, datetime, or text columns wouldn’t suffice.

In many cases, this was also a chance for us to clean up the older schema, removing data or attributes that were no longer needed. Projects like these are a great chance to pay down some technical debt while you’re under the hood.

Deployment 1: Write to Both Systems, Read from Mongo

In our initial deployment, we wrote the new ActiveRecord class, a migration to create new tables, and a migration job to write to both databases. In order to make sure we didn’t miss any new data, we added callbacks to the Mongoid class that would enqueue the migration job.

One important point to make here is that the migration job needs to be both performant, as well as idempotent. This is a best practice for any background job, but is especially important for a migration. We achieved this using ActiveRecord’s import, which uses PostgreSQL’s UPSERT syntax and ON CONFLICT clause.

class CreateDataSources < ActiveRecord::Migration
def up
create_table :data_sources do |t|
t.string :remote_id
t.integer :status
t.datetime :status_set_at
t.json :raw_response
t.string :old_data_source_id

add_index :data_sources, :remote_id
add_index :data_sources, :old_data_source_id

drop_table :data_sources

class NewDataSource < ApplicationRecord
table_name :data_sources

:remote_id, :presence => true
class MigrateFullContact
include Sidekiq::Worker
sidekiq_options :queue => :one_off, :retry => 3

def perform(full_contact_ids)
# Find Mongoid records
# Find ActiveRecord objects that have already been migrated
# Build objects in memory that need to be created/updated
# Bulk insert/update records using ActiveRecord import
class DataSource
include Mongoid::Document
include Mongoid::Timestamps

after_save :enqueue_migration

def enqueue_migration
Sidekiq::Client.enqueue(MigrateDataSource, id.to_s)

Deployment 2: Read/Write from New ActiveRecord Class

After this initial deployment, we enqueued around 150,000 jobs to migrate all existing records in batches of 1,000. We knew this might take 12–36 hours based on the size of the collection, the load on each database, and the resources allocated to each database. We used this time to write the new code that would both read and write from the new ActiveRecord class.

Once the migration was complete, we waited for an off-peak time range, and deployed the new code. One important distinction here is that we left the old collection and class in place in case we caught any issues in which we’d quickly need access to the old data. Naturally you would want to avoid a disaster recovery scenario such as rolling back. We were able to avoid any disasters with a solid plan, thorough pull request reviews, dry-runs in a staging environment, and triple-checking any move we were about to make.

Having tools in place to monitor the performance and error rates of the migration code and new code is critical to determining whether or not the migration was a success. At Contactually, we use NewRelic in order to monitor application error rates.

Deployment 3: Remove Old Mongoid Class and Mongo Collection

The last deployment is arguably the most fun step of the migration. Once the dust has settled, performance looks good, no app issues, and error rates are reasonable, it’s time to clean up.

We made backups of any collections before dropping them, and then proceeded to remove all references of OldDataSource throughout the application, as well as the migration code.

In Summary — Plan Thoroughly!

Shot by Chuttersnap

In some ways, these projects can feel like moving cargo between two cargo ships with neither one slowing down. User interaction with the application must remain unaffected while migrating — which can be quite difficult if the object being migrated is read from or written to with high frequency.

Large-scale data migrations can be challenging, but with proper and thorough planning, they can be accomplished without any users or other stakeholders noticing. That’s the #1 sign of a well-executed migration.

For millions of professionals, relationships are the backbone of a viable business. Whether you’re working with clients, prospects, or potential investors, Contactually helps you build stronger relationships with the people who can make you successful.

Contactually is looking for experienced software engineers who are passionate about solving complex problems with code. We’ve taken a novel approach to building business software — focus on the end user — and it’s been working! Our users love us. Come help us build a product that makes contact management easy and rescue 10,000s of people from the jaws of clunky, outdated software.

Migrating Data From MongoDB to PostgreSQL with No Downtime was originally published in Contactually Product and Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.

Submit a Comment

Your email address will not be published. Required fields are marked *