engineering@contactually

Migrating Data From MongoDB to PostgreSQL with No Downtime

Photo by Gareth Davies on Unsplash

Why Migrate?

In the early days of Contactually, before the product was fully fleshed out and requirements were clear, MongoDB was chosen to house certain portions of our application's data. MongoDB has many valid use cases, but ours did not fit, and a change needed to be made. In this case, that meant migrating data from our MongoDB cluster into our relational datastore, PostgreSQL. While conceptually this migration is easy to understand, our transition would have failed if it weren't for this 3-step deployment plan.

Important Not to Disrupt Regular Product Usage

As we discussed the primary goals for the migration, our number one priority was to make sure that users would suffer neither downtime nor degradation of the product. Additionally, we wanted to make sure that no changes made during the migration would be missed. Over a few whiteboarding sessions, we came up with a strategy that let us accomplish both of these goals.

High Level Strategy — Duplicate Writes with a Series of Deployments

So — how did we do it? Our high-level strategy was to write to both systems simultaneously, with a series of deployments to eventually start reading from the new system.

Create New PostgreSQL Schema

The first step was to examine the schema (or most recent schema) of the Mongoid class we intended to migrate and come up with an appropriate PostgreSQL table schema. In some cases this wasn't too disruptive: we were using PostgreSQL v9.6.x, so JSON columns were an option when string, integer, datetime, or text columns wouldn't suffice. In many cases, this was also a chance for us to clean up the older...
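To make the duplicate-write step concrete, here is a minimal sketch of how it might look in a Rails codebase. The Note model, its fields, and the mirroring callback are hypothetical illustrations, not Contactually's actual code.

    # Hypothetical PostgreSQL table for data previously stored in Mongo.
    # A jsonb column absorbs loosely structured fields that don't map
    # cleanly onto typed columns.
    class CreateNotes < ActiveRecord::Migration[5.0]
      def change
        create_table :notes do |t|
          t.string :mongo_id, null: false, index: { unique: true }
          t.integer :user_id, null: false
          t.text :body
          t.jsonb :metadata, null: false, default: {}
          t.timestamps
        end
      end
    end

    # New model backed by PostgreSQL.
    class Note < ActiveRecord::Base
    end

    # Existing Mongoid model, now mirroring every write into PostgreSQL.
    # Reads continue to come from Mongo until a later deployment flips them.
    class LegacyNote
      include Mongoid::Document
      field :user_id, type: Integer
      field :body, type: String
      field :metadata, type: Hash

      after_save :mirror_to_postgres

      private

      # Keyed on the Mongo id so the copy is idempotent, and a backfill of
      # pre-existing documents can reuse the same code path.
      def mirror_to_postgres
        record = Note.find_or_initialize_by(mongo_id: id.to_s)
        record.update!(user_id: user_id, body: body, metadata: metadata || {})
      end
    end

With writes mirrored, a backfill task can walk the existing collection through the same mirroring method, and a final deployment can then switch reads over to PostgreSQL.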

Contactually’s Brand New Android Application

This post first appeared on blog.contactually.com here.

Remember when Apple came out with the original iMac in '98? Desktop computers were all the rage with their rounded edges and brightly colored backs. A laptop? Too heavy to carry around! A smartphone? Not even a word in most people's vocabulary. Now we can't imagine life without our phones in our pockets and available at all times. We've recognized how important mobile is, not only in your daily life but also to the core of your business. We're introducing a completely redesigned Android application, rebuilt from the ground up to ensure that the powerful experience you've become used to with Contactually can extend well beyond your computer — right to your pocket.

Stability and performance on par with the best

Today's fast-paced world has no time for technical setbacks. When you're counting on your apps to maintain and track every detail of your business, you can't afford to lose precious minutes to system downtime. To ensure stability and performance, we've utilized React Native, the same technology employed by the likes of Instagram and Facebook to support their tens of millions of users. Not only does React Native maintain the stability and performance you've come to expect from Contactually, but it will also allow us to grow and support our customers at a faster rate going forward.

Easily interact in ways that come naturally

The most heated debate in the mobile world is between iOS and Android, but there really isn't a clear winner. The key takeaway is that they're both unique in their own way. Contactually's Android 3.0 is built for true Android users,...

Interning with the Contactually Engineering Team

In April of 2017 I began a four-month internship at Contactually in DC. Prior to my internship, I graduated from a coding bootcamp in New York City in September 2016. I had spent the previous 3+ years working in Human Resources at a nonprofit, and after attending the World Maker Faire, which renewed the interest in technology I'd had as a child, I decided to switch careers to tech. Before entering the world of HR, I worked as a Specialist at an Apple Store. As a result, I brought my love of technology and my experiences at Apple with me to my HR position and found myself creating and integrating technological solutions into existing HR processes. When I'm not coding I enjoy tinkering with robotics, crafting dollhouse miniatures, and all things horror (movies, books, podcasts, etc.).

Getting Started

Experiences like mine, coming to engineering from a non-engineering background for one reason or another, are becoming more and more common. I was in a job that wasn't challenging enough, happened to be in the right place at the right time, and stumbled upon the existence of coding bootcamps. After many months of research and self-teaching, I jumped into this new world, and over the course of a year I changed my life. It wasn't an easy transition by any means, but it was definitely worth it. I graduated from Actualize (located in NYC) in September 2016, and before graduating I began what would be a six-month journey of searching for a place to test out my new skill set and learn from others in the industry. In...

Creating Scalable Business Intelligence with Redshift, Stitch, Fivetran, and Looker

Companies today strive to be more data-driven than ever before. There is a pervasive belief in business that data can be the differentiator between you and your competitors. If you collect the right data and leverage it effectively, you can create a competitive advantage that brings strong, sustained growth to your business. However, the keys to creating a truly data-driven company lie in your company's data infrastructure. For data to be used effectively in decision-making, it needs to be collected and stored in readily available systems. Furthermore, data in those systems must be cleaned and transformed into usable chunks that business owners can manipulate and visualize in order to gain insight. As your company continues to grow, the systems and processes that store and cleanse your data need to grow as well. Thankfully, the proliferation of cloud technologies has created an abundance of options, not only for delivering data storage and transformation capability but also for allowing companies to scale their data infrastructure as they grow.

Here at Contactually, we've built a robust data infrastructure on cloud technologies that has allowed us to improve our data quality while maintaining the flexibility to introduce data from new services and integrate it seamlessly into our existing architecture. We'll focus on the way we've designed our data infrastructure to give you an idea of how a fast-paced startup delivers data insights.

Amazon Redshift — Cloud Storage with Analytics In Mind

AWS Redshift is a data warehousing solution that combines scalable storage with high-performance querying over structured data. Amazon Web Services has become a major...
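Because Redshift speaks the PostgreSQL wire protocol, any Postgres client can run warehouse queries against it. Here is a minimal Ruby sketch using the pg gem; the cluster endpoint, database, and table names are hypothetical.

    require "pg"

    # Redshift is wire-compatible with PostgreSQL, so the standard pg
    # gem can connect; Redshift clusters listen on port 5439 by default.
    conn = PG.connect(
      host: "example-cluster.abc123.us-east-1.redshift.amazonaws.com", # hypothetical
      port: 5439,
      dbname: "analytics",
      user: "reporting",
      password: ENV.fetch("REDSHIFT_PASSWORD")
    )

    # A typical warehouse aggregate over an already-cleaned table of the
    # kind a pipeline like Stitch or Fivetran would load.
    result = conn.exec(<<~SQL)
      SELECT date_trunc('week', created_at) AS week,
             count(*) AS new_users
      FROM public.users
      GROUP BY 1
      ORDER BY 1
    SQL

    result.each { |row| puts "#{row['week']}: #{row['new_users']}" }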

Adding Read Replicas in a Production Ruby on Rails System with Zero Downtime Using Makara

Thaddaeus Lim

When query, schema, and index optimizations aren't enough to support the load of a high-throughput database system, fundamental changes to how data flows through that system are often necessary. Over the past few years, our team at Contactually has been fortunate enough to tackle numerous database performance issues, ranging from autovacuum tuning to full production database upgrades. Growth as a company doesn't occur without these types of issues, and we're happy to share our experiences with our peers.

Why Splitting Your Read/Write Queries is Necessary

Load balancing is a fundamental concept in any system design. It's important to design your data systems to handle the peaks or spikes in usage, as opposed to the average load a system experiences. By splitting queries between your primary and follower databases, you ensure that users interacting with your web application have a snappy, performant experience, while heavy background jobs are served by a follower.

How We Researched Potential Solutions

There are a few options when it comes to choosing a database adapter that supports primary/follower query balancing. After evaluating Makara and Octopus, we decided to go with Makara for the following reasons:

- Thread safety: we use Sidekiq for our background job processing, so a non-thread-safe database adapter was a deal-breaker.
- Community support and use in large production environments: we read about a few companies' experiences using Makara, namely Instacart's.

In our research, we discovered a gem called distribute_reads (made by Instacart) that sealed the deal for us. It also fed directly into our ideal incremental release strategy; a sketch of how these pieces fit together follows below.

Plan a Detailed Roll Out Strategy

Given the importance of...
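As a rough sketch of how Makara and distribute_reads combine (the job, model, and mailer names here are hypothetical, not our production code): Makara is configured as the database adapter, and distribute_reads exposes a block helper that routes the reads inside it to a follower.

    # Gemfile
    gem "makara"
    gem "distribute_reads"

    # A heavy Sidekiq job whose reads can safely be served by a replica.
    class WeeklyDigestJob
      include Sidekiq::Worker

      def perform(user_id)
        counts = distribute_reads do
          # Every read inside this block is routed to a follower by
          # Makara; anything outside it behaves exactly as before.
          Contact.where(user_id: user_id).group(:last_contacted_bucket).count
        end
        DigestMailer.weekly(user_id, counts).deliver_later # hypothetical mailer
      end
    end

Because replica routing is opt-in at the call site, jobs can be migrated one at a time, which is what makes an incremental, zero-downtime rollout possible.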

Tackling Architectural Debt: How We Replaced a Production Elasticsearch Cluster

Photo by Alice Pasqual

As the quantity and complexity of application data scale at any burgeoning startup, we all run into performance and scalability issues. Some can be addressed with slight adjustments to existing infrastructure, while others require fundamental changes to the system's architecture. At Contactually, one such instance was dealing with an unhealthy Elasticsearch cluster that suffered from availability issues, syncing issues, and fundamental inefficiencies in how data flowed through our systems. This article will focus primarily on the large architectural change we made at Contactually in June of 2016 to reinforce an unreliable Elasticsearch cluster.

Why We Use Elasticsearch at Contactually

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected. At Contactually, we use Elasticsearch to let our users quickly filter their contacts by numerous search criteria, such as name, email address, mailing address, phone number, zip code, and even custom fields the user has created. Speed is key here — many of our users have hundreds of thousands of contacts. We store billions of data points in our normalized, relational data model. I won't go into the pros and cons of normalization in this article, but they are very important to consider when designing any high-throughput system.

Denormalized document storage

Imagine trying to find a record in your database by attributes that could live in any of 10 tables. This might require an extremely expensive join, especially if the scale of the data you're working with is...
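To make the filtering use case concrete, here is a minimal sketch using the elasticsearch Ruby client; the index name and document fields are hypothetical stand-ins for a denormalized contact document.

    require "elasticsearch"

    client = Elasticsearch::Client.new(url: ENV.fetch("ELASTICSEARCH_URL"))

    # Because the contact document is denormalized, attributes that would
    # span many relational tables are matched in a single bool query.
    response = client.search(
      index: "contacts",
      body: {
        query: {
          bool: {
            filter: [
              { term: { user_id: 42 } },
              { term: { "addresses.zip" => "20005" } },
              { prefix: { email: "jane" } }
            ]
          }
        },
        size: 50
      }
    )

    response["hits"]["hits"].each { |hit| puts hit["_source"]["full_name"] }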