Peak data performance: Moving from dbt Core to dbt Cloud
Sep 27, 2024
Many teams start out on dbt Core because it gives them a common framework for managing their data transformation code. At some point, however, they need a more centralized, governable solution.
That’s where dbt Cloud comes in. dbt Cloud serves as an organization-wide hub for data transformation, metrics, governance, and security. It unlocks a host of features—from data discovery to release management—that teams using dbt Core would otherwise have to build themselves.
Recently, I had the pleasure of speaking with Matt Luzzi, the Director of Analytics at human performance data company WHOOP, about how his team managed the transition from dbt Core to dbt Cloud. Read on to learn when and why WHOOP made the move, how teams managed the migration, and the additional business value the company unlocked in the cloud.
What brought WHOOP to dbt Core
WHOOP seeks to unlock human performance in a data-driven manner, developing an internal algorithm that analyzes biometric data around sleep, recovery, and strain. The company, which has been around for 12 years, bases everything it does on data.
The analytics team at WHOOP is responsible for helping the company become, and remain, data-driven. It partners with teams across the business, from marketing to product growth to engineering, to run experiments and find new ways to market and build solutions.
When WHOOP started, it didn’t have dbt or, really, any central data orchestration. Everything was SQL scripts created to run ad hoc. That meant analysts had little visibility into how data was created. It also meant the company had no success metrics (e.g., data test pass/fail rates, data incidents per table) for getting new data into production on a daily basis.
Initially, the analytics team used dbt Core as a free solution to create a single, centralized layer for orchestration and transformation. As everyone on the team was SQL-literate, this was a pretty easy transition. It gave WHOOP a single, systematic approach to data transformation that offered greater visibility and code reuse than the previous ad-hoc approach.
The motivation to move to dbt Cloud
However, WHOOP soon realized that dbt Core didn’t solve other lingering problems.
One key problem was the lack of centralized governance or a single source of truth. Two different analysts could be building more or less the same dbt model with no visibility into one another’s work.
Another missing piece was scheduling and orchestration. Since dbt Core isn’t a centralized service, the team had to rely on external tooling for orchestration. That didn’t give them real visibility into what failed or which downstream models were skipped. The team knew it needed to switch to a more shared, centralized solution.
At the same time, the team was planning a transition from Amazon Redshift to Snowflake. With this migration, they wanted to do more than just “lift and shift” the data. They wanted to ensure that the data was cleaned and well-governed. “We wanted to be really deliberate about what we brought into Snowflake,” Matt said.
Moving to dbt Cloud incrementally
That’s why the analytics team decided to shift to dbt Cloud. Instead of porting over their original dbt Core code, they started from scratch, going back to first principles. The team (around 12 people at the time) identified all the metrics they wanted to track across the company and worked backward from there.
This “backward” approach worked well for the analytics team. First, it guaranteed that everything that ended up in Snowflake was data they actually needed. Second, it meant there was a single source of truth for all data sets (as opposed to, say, one table for sales and a separate table with similar data for product growth).
WHOOP was also happy to find that moving to dbt Cloud didn’t mean scrapping the infrastructure it had built around dbt Core: CI/CD processes, dev/test environments, and so on. The team brought new workloads onto dbt Cloud but left existing workloads in the Core architecture it had built. As new analysts onboard at WHOOP, they go straight to dbt Cloud, which has unlocked a number of new use cases for the company.
Structuring cross-team work
Using dbt Cloud also allowed different data teams within WHOOP to collaborate more closely and avoid data silos.
Other teams—such as data engineering and data science—noticed the work the analytics team was doing and how easy it was to work with dbt Cloud. These teams could onboard themselves easily by creating their own dbt Cloud projects and Git repositories. That gave each team its own separate workspace and version history.
To facilitate working with core data assets, the analytics team created a WHOOP Commons dbt project. This project contains company-wide data as well as reusable code, such as macros, that are useful across data projects. Using dbt Mesh, each team can find these common assets via dbt Explorer and reference them from their own projects.
“We don’t have to create workarounds to bring models in from different projects like we would in dbt Core,” Matt said. “It’s all native within the cloud platform and all of our models are transparent. It gives us a lot of visibility into where things are and how they’re being used.”
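For readers unfamiliar with how this works, a cross-project reference in dbt Mesh looks roughly like the sketch below. The project, file, and model names here are hypothetical, not WHOOP’s actual assets: the upstream project marks a model as public, the consuming project declares a dependency on it, and downstream models can then reference the shared asset natively.

```yaml
# In the shared project (a hypothetical "whoop_commons"-style project):
# models/core/_core__models.yml
models:
  - name: dim_members
    description: Company-wide member dimension maintained by the analytics team.
    access: public  # only public models can be referenced from other projects
```

```yaml
# In a consuming project: dependencies.yml
projects:
  - name: whoop_commons
```

```sql
-- In a consuming project's model, reference the shared asset directly:
select *
from {{ ref('whoop_commons', 'dim_members') }}
```

Because dbt Cloud resolves the reference itself, cross-project lineage also shows up in dbt Explorer without extra wiring.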
Managing production releases with dbt Cloud
Another area where dbt Cloud added value, WHOOP says, was in bringing greater deliberation and quality to its production release process.
At WHOOP, software development teams don’t push changes to production on Fridays; if something breaks, that means someone’s on the hook to work through the weekend to fix it. In the same vein, the analytics team wanted a deliberate release process where it could understand the full impact of a change before pushing it live.
To accomplish this, Matt and his team created a process where they fork their dbt code off of production every Friday. Engineers accumulate changes in a QA branch.
Later in the week, an analytics engineer compiles these changes into a release and obtains code-owner review. All changes also have accompanying unit tests that run against test data and, eventually, production data. This ensures that metrics that shouldn’t change don’t change, and that code changes do what they were intended to do.
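dbt’s built-in unit tests (added in dbt 1.8 and available in dbt Cloud) support exactly this kind of check. A minimal sketch of what one might look like, using hypothetical model and column names rather than WHOOP’s actual code:

```yaml
# models/marts/_marts__unit_tests.yml
unit_tests:
  - name: test_daily_revenue_is_stable
    description: Guard against unintended changes to the revenue calculation.
    model: fct_daily_revenue  # hypothetical model under test
    given:
      # Static fixture rows stand in for the real upstream table
      - input: ref('stg_orders')
        rows:
          - {order_id: 1, order_date: 2024-09-01, amount: 10.00}
          - {order_id: 2, order_date: 2024-09-01, amount: 15.00}
    expect:
      rows:
        - {order_date: 2024-09-01, daily_revenue: 25.00}
```

Running `dbt test --select fct_daily_revenue` in the QA environment then flags any code change that shifts the metric away from the expected output.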
Communication is also a critical part of the release strategy. Matt and the team publish release notes in a Slack channel—what metrics might be changing as a result of code changes, which tables are being added or removed, etc.
“This holds us accountable while also giving visibility to the organization,” Matt said. “This is all possible because of the dbt ecosystem and the developer framework it provides.”
As a result, Matt said, the analytics team has reduced the number of accidental errors pushed to production to essentially zero. “We’re really proud that we don’t push bad code out. It helps me sleep better at night.”
Conclusion
As WHOOP continues its data journey, Matt said his team is looking at leveraging other critical features of dbt Cloud. In particular, they’re looking at how dbt Semantic Layer can better prepare them to support AI use cases.
“I foresee that being a huge player in the world of AI and natural language chatbots. If you’ve thought out the semantics of these models really thoroughly and accurately in a way that a machine can understand them, it's only a matter of time before we can have conversations with this level of data. By having all of the metadata in dbt in a way that's discoverable and consistently laid out, it sets us up for a world where that's a very real near-term possibility.”
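As a rough illustration of what “thinking out the semantics” means in dbt terms, a semantic model and metric definition might look like the sketch below (the names are illustrative, not WHOOP’s):

```yaml
# models/marts/_orders__semantic.yml
semantic_models:
  - name: orders
    model: ref('fct_orders')  # hypothetical underlying model
    defaults:
      agg_time_dimension: ordered_at
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_total
        agg: sum

metrics:
  - name: order_total
    label: Order Total
    type: simple
    type_params:
      measure: order_total
```

Once metrics are defined this way, any consumer, whether a BI tool or (potentially) an AI chatbot, queries the same governed definition rather than re-deriving the logic.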
Learn more about how dbt Cloud can improve data discoverability, enable data governance, and prepare you for an AI future—contact us today for a demo.