At Vivian Health, data is everything: it underpins the core platform that connects healthcare professionals with the right job opportunities. Max Calehuff, Data Engineer at Vivian Health, sheds light on the “transformative” initiatives undertaken by the company’s data team, which handles both machine learning training for the product and data modeling for the company’s internal dashboards. To meet those needs, Max sits on a team of 10 that includes data engineers, analysts, and machine learning engineers. He and one additional data engineer handle:
- Creating and maintaining pipelines
- Cleansing data
- Researching source integrations
- Ensuring data integrity stays high
- Translating business asks into data models
Vivian’s Data Stack
To serve Vivian's vast data needs, the team set up a best-in-class data stack.
- Snowflake: the data warehouse solution at the core of their infrastructure.
- Various data ingestion solutions: Vivian combines Stitch, Airflow, Kinesis Firehose, and in-house builds to funnel data into Snowflake. Upstream sources include multiple PostgreSQL databases that serve different workloads, various business applications, and events captured by Segment.
- Looker: the business intelligence tool that empowers business teams with visualizations and reporting capabilities.
- dbt Cloud: the transformation solution that enhances Vivian’s data modeling processes.
- Census: the reverse ETL tool that pipes curated data back into business applications such as Salesforce and Amplitude.
- Metaplane: the data observability tool, with integrations to Postgres, dbt, Snowflake, and Looker.
How Vivian uses dbt to expand modeling capabilities
The Vivian team knew they needed a powerful and accessible transformation tool to align with the capabilities of Snowflake's Data Cloud and power their customer experience.
Everything from analytics tables to machine learning models is transformed and modeled in dbt. For example, dbt produces the training data for the ML models that surface the most accurate job recommendations to job seekers in-product. The data team models data from various upstream sources (their CRM, web traffic, and application events) in dbt, using DRY code to create repeatable data models at the scale needed to keep those job recommendations improving.
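As an illustrative sketch only (the model, source, and column names below are hypothetical, not Vivian’s actual code), a dbt model that combines staged sources into one reusable table might look like this:

```sql
-- models/marts/job_seeker_activity.sql (hypothetical names throughout)
-- Joins staged web-traffic and application-event sources into a single
-- reusable table that can feed dashboards and ML training data alike.

with web_sessions as (
    select * from {{ ref('stg_segment__web_sessions') }}
),

applications as (
    select * from {{ ref('stg_app__job_applications') }}
)

select
    web_sessions.user_id,
    count(distinct web_sessions.session_id) as sessions_last_30d,
    count(distinct applications.application_id) as applications_last_30d
from web_sessions
left join applications
    on applications.user_id = web_sessions.user_id
where web_sessions.session_start >= dateadd(day, -30, current_date)
group by 1
```

Because each source is staged once behind `{{ ref() }}`, every downstream model reuses the same cleaned inputs rather than re-implementing the logic, which is what keeps the code DRY.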
Vivian’s data team also relies on dbt to model data for internal use cases, including:
- Analytics: the same web and application data models feed into internal product performance dashboards, as well as customer success and marketing dashboards. These business-focused visualizations are used to increase customer retention and usage rates, as well as inform organic search strategies to attract more customers.
- In-App Decision Making: Beyond Looker dashboards, the team funnels their dbt models further downstream using Census to send modeled data feeds back into business applications themselves, making it even more convenient for internal stakeholders to draw insights in the tools they’re familiar with.
Across use cases, the team benefits from dbt Cloud’s testing so data practitioners can write and test analytics code all in one place. “The ability to load sample data into dbt allows us to verify our models work as intended before deployment,” said Max.
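For example, a singular dbt test is just a SQL file that selects the rows violating an expectation; the test passes when the query returns nothing. A hypothetical referential-integrity check might look like:

```sql
-- tests/assert_applications_have_a_known_user.sql (hypothetical names)
-- dbt executes this query during `dbt test`; the test passes only when
-- it returns zero rows, i.e. every application maps to a known user.
select applications.application_id
from {{ ref('stg_app__job_applications') }} as applications
left join {{ ref('dim_users') }} as users
    on users.user_id = applications.user_id
where users.user_id is null
```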
A Transformative Solution
Creating data models with both SQL and Python in dbt has allowed the data team's analysts to quickly self-serve the data they need without having to wait for the infrastructure-focused data engineers to assist them.
In the past, Max has been part of teams that relied on performing transformations exclusively within business intelligence (BI) tools like Tableau and Looker. He recalls that using Persistent Derived Tables in Looker to hold transformed data not only required specialized LookML knowledge to set up, but also led to more fragile code and wasted time on constant maintenance.
With dbt, the sheer number of models and supported dashboards speaks for itself. By having one tool where the data team can write analytics code in SQL or Python, they’ve tripled the number of people on the team who are able to create models.
Going forward, “we want to use the Python functionality more. It’s already given us results,” said Max.
The increase in development velocity frees up time for Max and other engineers on the team to focus on more strategic initiatives, such as improving the machine learning models and writing advanced tests.
“Everyone on the team knows how to create a dbt model, and with only 2 data engineers that traditionally were responsible for modeling, that’s pretty cool. I can make a new dbt model as fast as it takes me to write a SQL query. It’s so much faster than how we did things in my previous roles. I could not go back, ever,” emphasized Max.
How Vivian uses Metaplane to improve data quality coverage
With customer careers, and by extension healthcare patient experiences, at stake, Vivian’s data team knew that data quality needed to be prioritized. The search for a data observability tool came shortly after incorporating the other components of their data stack, prompted by a series of data incidents whose downstream impacts were difficult to identify and equally difficult to fix.
The usual solution would be to deploy unit tests on their pipeline, but there were several issues with this workflow:
- Time-consuming: Setting up tests for a single pipeline could take anywhere from an hour to an entire day.
- Test accuracy: Test thresholds needed to be updated over time as their data and expectations changed.
- Scale for new objects: Consistent product growth meant ingesting net-new sources and creating new models, which weren’t monitored.
All of this led the team to look for a more scalable approach to data quality monitoring.
Luckily, the team was already using dbt at the time and started by implementing dbt tests. These caught incidents with causes such as stale data, but the team wanted to build on that success with more test types and broader object coverage.
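To illustrate the kind of check involved (the table and column names here are hypothetical), a singular dbt test can catch stale data by failing whenever the newest record is too old:

```sql
-- tests/assert_job_postings_are_fresh.sql (hypothetical names)
-- Returns a row, and therefore fails, when the newest record is more
-- than 24 hours old, signaling the upstream feed may have gone stale.
select max(loaded_at) as most_recent_load
from {{ ref('stg_app__job_postings') }}
having max(loaded_at) < dateadd(hour, -24, current_timestamp)
```

dbt also ships a declarative source freshness check, configured in YAML, that covers the same need without custom SQL.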
Higher data quality, better patient care
With the implementation of Metaplane, Vivian Health now has a comprehensive solution to monitor incidents across their entire data pipeline. Data quality is monitored from the upstream transactional PostgreSQL database to dbt-modeled tables in Snowflake, with the ability to see how incidents impact dashboards in Looker.
“If I get a Metaplane alert, it’s always something that’s gone wrong, which is what we want. We want to ensure that we’re catching everything without over-alerting,” said Max.
The practical application of Metaplane within Vivian Health’s operations spans a wide variety of use cases, such as:
- SEO Analytics: For web analytics, Metaplane verifies the stream of incoming events, facilitating proactive conversations in which the data team can alert the web team to adjust their ingestion pipeline.
- Product development: To support the Vivian Health platform, Metaplane helps ensure that healthcare professionals receive the most relevant job postings by monitoring training data for the recommendation models, in addition to other user experience improvements, such as models that indicate when a posting might be stale or inaccurate.
After researching and evaluating other data observability solutions, Vivian Health found Metaplane to be the most user-friendly, the most cost-effective, and, most importantly, the most capable of identifying data quality issues. The implementation improved their ability to monitor data integrity across all of their critical objects, resulting in stronger data reliability and more efficient incident resolution while saving the time spent setting up and maintaining acceptable thresholds for data quality tests. Beyond the product itself, Max added:
“We’ve been using Metaplane for at least a year and a half...I’ve seen the many improvements that Metaplane’s made. It’s pretty cool to talk directly to developers, and 90% of the time, when I bring up a feature, it’s either being worked on, or they’ll just tell me: ‘Oh we fixed that already. I’ll turn it on for you now.’”
Using dbt with Metaplane
Today, Vivian Health still uses dbt Cloud and Metaplane side by side. Not only do they continue to rely on dbt tests, but they also use Metaplane’s monitors to track the outputs of their dbt models, while deploying similar monitor types on their source tables. As any good data engineer knows, layering data quality coverage is sound practice, particularly when data is this central to the success of the company.
One use case where both tools shine together is the team’s relatively new use of Snowpark. The team wanted to use Snowpark’s ability to execute more complex Python transformations to generate training data weekly. While testing Snowpark, Vivian set up a new compute warehouse, along with infrastructure efficiency monitoring to track warehouse usage. Rather than undergo the arduous process of calling Snowflake’s API and parsing the JSON into usable tables, they elected to model the usage data in dbt and placed Metaplane on top of that output table to alert whenever costs spiked, as sketched below.
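A minimal sketch of what such a usage model could look like, assuming Snowflake’s standard account_usage views (the model name is hypothetical):

```sql
-- models/ops/warehouse_daily_credits.sql (hypothetical model name)
-- Aggregates Snowflake's built-in metering history into daily credit
-- usage per warehouse, producing a table Metaplane can watch for spikes.
select
    warehouse_name,
    date_trunc('day', start_time) as usage_date,
    sum(credits_used) as credits_used
from snowflake.account_usage.warehouse_metering_history
where start_time >= dateadd(day, -90, current_timestamp)
group by 1, 2
```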
In addition to monitoring the outputs of models, Metaplane also monitors the job runtimes themselves, to alert the team to additional latency that might impact their downstream usage.
dbt has allowed both analysts and engineers to generate models faster than ever before, accelerating work that directly impacts customer satisfaction and retention. And across all data pipelines, Metaplane provides full data quality coverage, saving weeks of effort in creating and updating custom tests. “We had an OKR last quarter related to data quality that Metaplane helped us achieve,” said Max.
“dbt is a critical part of our infrastructure, and Metaplane allows us to ensure that it is running smoothly around the clock,” Max concluded.