engineering@contactually

Adding Read Replicas in a Production Ruby on Rails System with Zero Downtime Using Makara

When query, schema, and index optimizations aren’t enough to support the load of a high-throughput database system, fundamental changes in how data flows through that system are often necessary. Over the past few years, our team at Contactually has been fortunate enough to tackle numerous database performance issues, ranging from autovacuum tuning to full production database upgrades. Growth as a company doesn’t occur without these types of issues, and we’re happy to share our experiences with our peers.

Why Splitting Your Read/Write Queries is Necessary

Load balancing is a fundamental concept in any system design. It’s important to design your data systems to handle the peaks or spikes in usage, as opposed to the average load a system experiences. By splitting queries between your primary and follower databases, you ensure that users interacting with your web application have a snappy, performant experience, while heavy background jobs are saved for a follower.

How We Researched Potential Solutions

There are a few options when it comes to choosing a database adapter that supports primary/follower query balancing. After evaluating Makara and octopus, we decided to go with Makara for the following reasons:

Thread-safety: we use Sidekiq for our background job processing, so a non-thread-safe database adapter was a deal-breaker.
Community support: Makara is used in large production environments. We read about a few companies’ experiences using Makara, namely Instacart’s. In our research, we discovered a gem called distribute_reads (made by Instacart) that sealed the deal for us. It also fed directly into our ideal incremental release strategy.

Plan a Detailed Roll-Out Strategy

Given the importance of...
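The routing idea behind a Makara-style adapter can be sketched in plain Ruby. This is a hypothetical, self-contained illustration, not Makara's actual implementation: the class and connection names are invented, and the real gem does this work inside the ActiveRecord adapter. It shows the two properties that mattered to us: writes always hit the primary, and reads are distributed across followers in a thread-safe way.

```ruby
# A minimal sketch of primary/follower query routing (hypothetical class;
# Makara implements this inside the database adapter itself).
class ReplicaRouter
  def initialize(primary:, replicas:)
    @primary  = primary
    @replicas = replicas
    @mutex    = Mutex.new   # thread-safety matters under Sidekiq's threaded workers
    @cursor   = 0
  end

  # Writes (and anything transactional) must always go to the primary.
  def connection_for(query_kind)
    return @primary if query_kind == :write

    # Round-robin reads across followers, guarded against concurrent callers.
    @mutex.synchronize do
      replica = @replicas[@cursor % @replicas.size]
      @cursor += 1
      replica
    end
  end
end

router = ReplicaRouter.new(primary: "primary-db", replicas: ["replica-1", "replica-2"])
router.connection_for(:write)  # => "primary-db"
router.connection_for(:read)   # => "replica-1"
router.connection_for(:read)   # => "replica-2"
```

The distribute_reads gem layers an explicit opt-in on top of this: only code wrapped in a block is eligible for a follower, which is what made an incremental rollout possible.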

Tackling Architectural Debt: How We Replaced a Production Elasticsearch Cluster

As the quantity and complexity of application data scales with any burgeoning startup, we all run into performance and scalability issues. Some issues can be addressed with slight adjustments to existing infrastructure, while others require fundamental changes to the system’s architecture. At Contactually, one such instance was dealing with an unhealthy Elasticsearch cluster that suffered from availability issues, syncing issues, and fundamental inefficiencies in how data flowed through our systems. This article will focus primarily on the large architectural change we made at Contactually in June of 2016 to reinforce an unreliable Elasticsearch cluster.

Why We Use Elasticsearch at Contactually

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected. At Contactually, we use Elasticsearch to allow our users to quickly filter their contacts by numerous search criteria such as name, email address, mailing address, phone number, zip code, and even custom fields the user has created. Speed is key here: many of our users have hundreds of thousands of contacts. We store billions of data points in our normalized, relational data model. I won’t go into the pros and cons of normalization in this article, but they are very important to consider when designing any high-throughput system.

Denormalized document storage

Imagine trying to find a record in your database by attributes that could be in any of 10 tables. This might require an extremely expensive join, especially if the scale of the data you’re working with is...
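The core of denormalized document storage is flattening a contact's relational records into one searchable document, so a single query can match any attribute without joins. A minimal sketch, with illustrative field names (not Contactually's actual Elasticsearch mapping):

```ruby
# Flatten a contact's relational rows (emails, phones, custom fields, each
# normally its own table) into a single Elasticsearch-style document.
# All field names here are hypothetical, for illustration only.
def denormalize_contact(contact, emails:, phones:, custom_fields:)
  {
    id:            contact[:id],
    name:          "#{contact[:first_name]} #{contact[:last_name]}",
    emails:        emails.map { |e| e[:address] },
    phone_numbers: phones.map { |p| p[:number] },
    # Custom fields become key/value pairs inside the same document, so
    # searching them needs no join against a separate custom_fields table.
    custom_fields: custom_fields.map { |f| { name: f[:name], value: f[:value] } }
  }
end

denormalize_contact(
  { id: 1, first_name: "Ada", last_name: "Lovelace" },
  emails:        [{ address: "ada@example.com" }],
  phones:        [{ number: "555-0100" }],
  custom_fields: [{ name: "Segment", value: "VIP" }]
)
```

The trade-off is classic: the relational model stays the source of truth, while the denormalized copy must be kept in sync, which is exactly where the syncing issues mentioned above come from.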

5 Aspects of Rapid Feature Validation & Iteration

When Eric Ries published The Lean Startup in 2011, he outlined foundational principles for rapidly testing business hypotheses through product experiments. Startups like Contactually know they should split-test, be data-driven, and “move fast and break things”, but actually applying these principles is another story. Rapidly validating and iterating on product features requires your team to establish tooling and methodology for doing so, which takes explicit effort and commitment. Contactually’s product development process has evolved into a fairly effective but simple pattern for running experiments that involves 5 different areas of methodology. These 5 aspects of experiment execution have allowed us to drive powerful business outcomes, like improving our onboarding’s user activation rate by 20%. Additionally, these 5 areas serve to reasonably balance several types of assessments:

Subjective and objective quality of changes
Qualitative and quantitative effects of changes
Short- and long-term effects of changes

1. A/B Testing Tools

Key to any controlled experiment is tooling that enables you to apply the new functionality to an experimental group of users while maintaining a control group to compare against. While many A/B testing tools exist for swapping out HTML or front-end components, Contactually wanted the ability to control deeper backend behavior (Ruby on Rails) based on experiment groupings: things like changing scoring algorithms, triggering alternative background jobs, or exposing (or not exposing) certain ActiveRecord relationships. This led us to write our own Rails service to control A/B experiments.

Contactually’s “Feature Flipper” Service

Our custom-rolled Feature Flipper service allows us to create and manipulate experiment “Features” via a Rails console or an internal admin panel for our non-technical employees, and is backed by Redis. Our FeatureFlipper service gives us the following...
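A Redis-backed feature flipper of this kind can be sketched in a few lines of Ruby. This is a hypothetical illustration, not Contactually's actual service: the class, key names, and API are invented, and an in-memory Hash stands in for the Redis client. The key property it demonstrates is deterministic bucketing, so a given user stays in the same experiment group across web requests and background jobs.

```ruby
require "zlib"

# Minimal sketch of a percentage-rollout feature flipper (hypothetical API).
class FeatureFlipper
  def initialize(store = {})
    # In production this would be a Redis client; a Hash works for illustration.
    @store = store
  end

  # Enable a feature for a percentage of users (0-100).
  def set_rollout(feature, percentage)
    @store["feature:#{feature}:pct"] = percentage
  end

  # Deterministic bucketing: hashing feature + user id always yields the
  # same bucket, so experiment membership is stable for each user.
  def enabled?(feature, user_id)
    pct    = @store.fetch("feature:#{feature}:pct", 0)
    bucket = Zlib.crc32("#{feature}:#{user_id}") % 100
    bucket < pct
  end
end

flipper = FeatureFlipper.new
flipper.set_rollout(:new_onboarding, 50)
flipper.enabled?(:new_onboarding, 123)  # stable true/false for this user
```

Because group membership is computed from the user id rather than stored per user, backend code (scoring algorithms, background jobs, ActiveRecord scopes) can branch on `enabled?` anywhere without coordinating state.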

Engineering Great Design Process

Contactually is a company all about people and relationships. We love our customers, and hope they love us too! For this reason and many others, strong usability and user experience design are critically important to the success of our business. I recently joined Contactually to lead the user experience (UX) team, after having held similar roles at companies such as AddThis, Oracle, and America Online. In this blog post I’ll share some strategies that our designers, product managers, and developers use to successfully collaborate and build great experiences together.

Establish Design Standards

Design resources such as style guidelines, design pattern libraries, and corresponding front-end component libraries aren’t just about consistency. These tools work together to help teams design, build, and iterate faster. At Contactually we make style guidelines and brand assets available to the larger organization using Frontify, and we are in the process of converting the Contactually web application to React, using Semantic UI and Storybook, a UI component development environment for React. React is a component-based JavaScript library for building interfaces, and it will allow us to reuse interface components across the Contactually platform. Tools such as these help us create more consistent, customer-friendly experiences more quickly, a big win for a small team.

Balance Building, Iterating, and Fixing

Product design and development roadmaps always require trade-offs. There are new features to be explored, existing features to test and iterate on, customer questions and concerns to address, and bugs to fix. As an agile design and development team, we plan sprints by distributing a percentage of “points” (which correspond to available developer time) to each of those priorities. We collaborate closely with...

Postgres at Scale: Query Performance and Autovacuuming for Large Tables

There are few large, hard-to-solve problems that keep your typical software engineer up at night, and a malfunctioning database is absolutely one of them. We’ll walk you through how our team at Contactually discovered query performance slippage on some of our largest tables (200 million to 3 billion+ records) and what we did to address it.

The Problem: Systematic Degraded Query Performance

In early 2016, we noticed in New Relic that query performance on our largest tables was starting to slip. Over the course of a few weeks, queries that were once taking 10ms were taking upwards of 6 seconds or more. The team considered numerous causes, from poorly written queries to expensive background jobs that might be causing systematic performance issues. Eventually, with the aid of New Relic, Amazon RDS logs, and hundreds of EXPLAIN queries, we came to the conclusion that this was a database-level issue, not poorly written queries in our application (a very common cause of poor database performance).

The Culprit: Default Autovacuum Settings

After days of research, we narrowed in on the cause: the default autovacuum settings for large tables. The default settings can allow weeks to elapse without triggering an autovacuum run on a large table. When autovacuuming doesn’t occur, the query planner uses outdated, incorrect statistics to decide how to most efficiently execute a query. Imagine trying to tell a friend where the milk is in a grocery store using directions from what the store looked like 5 years ago: your friend is going to waste a lot of time following your outdated directions. Critical...
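To see why the defaults fail at this scale: Postgres triggers autovacuum on a table once dead tuples exceed roughly `autovacuum_vacuum_scale_factor * row_count` (default scale factor 0.2), so a 1-billion-row table needs on the order of 200 million dead rows before a vacuum even starts. The usual fix is lowering the scale factors per table. A hedged sketch of the statement involved, wrapped in a small Ruby helper (the table name and thresholds here are illustrative, not Contactually's actual values):

```ruby
# Build the per-table autovacuum tuning statement. In a Rails app this SQL
# would typically be run via `execute` inside a migration. Postgres defaults:
# autovacuum_vacuum_scale_factor = 0.2, autovacuum_analyze_scale_factor = 0.1.
def autovacuum_tuning_sql(table, vacuum_scale:, analyze_scale:)
  <<~SQL
    ALTER TABLE #{table} SET (
      autovacuum_vacuum_scale_factor  = #{vacuum_scale},
      autovacuum_analyze_scale_factor = #{analyze_scale}
    );
  SQL
end

# Vacuum after ~1% dead tuples and re-analyze after ~0.5% changed rows,
# instead of the default 20% / 10% (example thresholds only).
puts autovacuum_tuning_sql("contacts", vacuum_scale: 0.01, analyze_scale: 0.005)
```

Lower scale factors mean vacuum and analyze run far more frequently on the big tables, keeping the planner's statistics fresh without changing the defaults for every small table in the database.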

Empowering Developers with Consistent, Safe, and Simple Deploy Scripts

Deploying software is a notoriously risky portion of the software development lifecycle. Our industry is littered with stories of ill-conceived deployments that caused huge headaches to deal with and correct. As developers who enjoy the creative output of programming, it’s easy to neglect the operational details of deployments, despite the fact that simple deployment processes are widely believed to make for happier developers. At Contactually, investing even a small amount of time in developing fairly simple deployment scripts has paid huge dividends. Let’s look at what we wanted to accomplish with our deploy scripts, how our scripts do this, and what benefits this has provided our development team.

The Goal of our Deployment Scripts

We want deployment scripts for all major components of our application and infrastructure. In our case:

A ReactJS frontend hosted on Amazon S3
A Ruby on Rails backend deployed on Heroku

We want to handle deployments of our components to all possible environments. In our case:

Staging
Production

We want minimal setup on behalf of the developer, using existing credentials and permissions to perform deployment actions.

What our Deployment Scripts Do

We have one deploy script for each of our main repos: our Rails backend and our React frontend. These two scripts have minor variations but roughly execute the same steps. Some of the steps executed by the scripts are contained in their own bash scripts or rake files.

Pre-deployment CLI checks & confirmations

First, our scripts run through a series of automatic checks and forced developer confirmations to help prevent ill-advised deployments:

Confirm which branch you are deploying to which environment
Confirm continuous integration specs are passing...
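Pre-deployment checks of this kind can be sketched as a small Ruby helper, in the spirit of the rake files mentioned above. This is a hypothetical illustration, not Contactually's actual script: the method name, branch convention, and check list are invented. Each check contributes an error message, and the deploy proceeds only when the list comes back empty.

```ruby
# Gather reasons a deployment should be blocked (hypothetical helper).
# Returns an array of error strings; an empty array means safe to proceed.
def predeploy_errors(branch:, environment:, ci_passing:)
  errors = []

  # Production deploys should come only from the mainline branch
  # (branch name is an assumption for this sketch).
  if environment == "production" && branch != "master"
    errors << "refusing to deploy branch '#{branch}' to production"
  end

  # Never deploy with red CI.
  errors << "continuous integration specs are not passing" unless ci_passing

  errors
end

errors = predeploy_errors(branch: "master", environment: "production", ci_passing: true)
abort(errors.join("\n")) unless errors.empty?  # in a real script: stop the deploy
```

Collecting every failure instead of aborting on the first one gives the developer the full picture in a single run, which keeps the confirm-and-retry loop short.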