Today, at Coalesce 2024, we announced advanced CI: a powerful enhancement to our existing continuous integration capabilities that allows data teams to validate and compare changes before merging them into production. Advanced CI builds upon our slim CI feature by offering deeper insights into the differences between development and production builds, ensuring data integrity and accuracy without unnecessarily driving up compute costs. Advanced CI is generally available today to all dbt Cloud Enterprise customers.
Improve data quality while promoting velocity with CI
In data-driven organizations, business-critical decisions and insights rely on data products built on high-quality data. To deliver data products at scale, having a robust and automated CI/CD pipeline is not simply a nice-to-have, but a must-have. CI/CD pipelines allow teams to integrate and deploy changes more frequently, while ensuring that the code and data models are continuously tested, validated, and safely deployed into production environments. Simply put: by embracing CI/CD, data teams are able to ship trusted data products faster.
Without a proper CI/CD pipeline, even small changes to data sets or transformations could introduce undetected errors that lead to costly downstream failures in reports, dashboards, or anything that consumes data from the warehouse. For data teams, it isn’t just about faster releases; it’s about trust. Trust that every new commit will integrate seamlessly with the current state of production, and trust that changes won’t break critical data workflows and downstream tools.
Historically, dbt Cloud has contributed to this trust through slim CI, a feature designed to catch errors early in the process by running only the models that have changed in a PR. This optimizes compute costs while ensuring that models build correctly. Slim CI doesn't just check for broken code; it verifies that your changed models will actually build, giving you confidence that they will build in production without having to risk breaking production to find out.
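As a rough illustration of the idea of building only what changed, here is a minimal Python sketch. It is not how dbt Cloud implements slim CI: the hashing approach, the function names, and the example project are all assumptions made for illustration, and widening the selection to downstream models is shown as an optional extra step.

```python
import hashlib
from collections import deque


def fingerprint(sql: str) -> str:
    """Hash a model's source text so two versions can be compared cheaply."""
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


def modified_models(prod: dict, branch: dict) -> set:
    """Models that are new on the branch or whose source differs from production."""
    return {
        name
        for name, sql in branch.items()
        if name not in prod or fingerprint(prod[name]) != fingerprint(sql)
    }


def with_downstream(selected: set, children: dict) -> set:
    """Optionally widen the selection to every model that depends on a changed one."""
    seen = set(selected)
    queue = deque(selected)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical project: three models, one of them edited in the pull request.
prod = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "fct_orders": "select * from stg_orders join stg_customers using (customer_id)",
}
branch = dict(prod, stg_orders="select * from raw.orders where status != 'test'")
children = {"stg_orders": ["fct_orders"], "stg_customers": ["fct_orders"]}

changed = modified_models(prod, branch)
to_build = with_downstream(changed, children)
print(sorted(changed))   # ['stg_orders']
print(sorted(to_build))  # ['fct_orders', 'stg_orders']
```

In this toy project, stg_customers is never rebuilt, which is where the compute savings come from.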
With these CI jobs, users can accelerate development velocity with build and test automation and create a standardized and governed way to deliver code. While slim CI ensures your code integrates smoothly with production, simply knowing your code changes will build doesn’t mean that your code changes are actually correct. One small change can alter the value of an entire column, even if all models build and all user-defined tests pass. Users need a way to validate that the code they’re merging is doing what they expect it to do. This is exactly the problem that dbt Cloud’s new advanced CI feature addresses.
What is advanced CI?
Advanced CI builds on top of slim CI with a "compare changes" feature to provide a deeper level of insight into the changes introduced with each pull request. By surfacing the differences between what is being built and what is already in production, users can ensure that they only merge accurate models into production.
For each CI run, dbt Cloud compares the models built from your development branch against the latest production build, surfacing differences such as:
- Rows or columns that have been added, modified, or removed
- Changes in the values within a column
- Changes to column data types or column orders
- Changes or duplicates in any of the primary keys
- What percent of rows have been changed, added, or removed relative to the entire data model
This gives engineers the ability to pinpoint exactly how their changes will impact data models and reports before changes are merged into production. If a column’s values have shifted, or if there are unexpected nulls, users will know before that data becomes accessible to end users or downstream systems. This granular view of changes builds confidence that the data is accurate and trustworthy before it's exposed to end users.
Advanced CI temporarily caches a snapshot of these changes within dbt Cloud and runs the comparisons as part of the CI job workflow, so they can be reviewed during the CI/CD process without being rerun each time. This can be especially useful when working with complex transformations, where even small changes can have a domino effect on downstream data quality.
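To make the kinds of differences listed above concrete, here is a minimal Python sketch of a keyed comparison between a production build and a CI build of the same model. It is an illustration under stated assumptions, not dbt Cloud's implementation: the function names, the in-memory row format, and the choice of denominator for the percentage are all hypothetical. Data type and column order checks are left out for brevity.

```python
def index_by_key(rows, key):
    """Map primary key -> row, and collect keys that appear more than once."""
    indexed, duplicates = {}, set()
    for row in rows:
        if row[key] in indexed:
            duplicates.add(row[key])
        indexed[row[key]] = row
    return indexed, duplicates


def compare_tables(prod_rows, ci_rows, key):
    """Summarize how the CI build of a model differs from the production build."""
    prod, _ = index_by_key(prod_rows, key)
    ci, duplicate_keys = index_by_key(ci_rows, key)

    # Column names are taken from the first row of each table.
    prod_cols = list(prod_rows[0]) if prod_rows else []
    ci_cols = list(ci_rows[0]) if ci_rows else []
    shared_cols = [c for c in ci_cols if c in prod_cols and c != key]

    added = sorted(ci.keys() - prod.keys())
    removed = sorted(prod.keys() - ci.keys())

    # For keys present in both builds, record which shared columns changed value.
    modified, changed_values = [], {}
    for k in sorted(prod.keys() & ci.keys()):
        differing = [c for c in shared_cols if prod[k][c] != ci[k][c]]
        if differing:
            modified.append(k)
        for c in differing:
            changed_values[c] = changed_values.get(c, 0) + 1

    # Assumption: the percentage is measured against every key seen in either build.
    total = len(prod.keys() | ci.keys())
    touched = len(added) + len(removed) + len(modified)
    return {
        "columns_added": [c for c in ci_cols if c not in prod_cols],
        "columns_removed": [c for c in prod_cols if c not in ci_cols],
        "rows_added": added,
        "rows_removed": removed,
        "rows_modified": modified,
        "duplicate_keys": sorted(duplicate_keys),
        "changed_values_by_column": changed_values,
        "pct_rows_changed": round(100 * touched / total, 1) if total else 0.0,
    }


# Hypothetical builds of an orders model, keyed on order_id.
prod_rows = [
    {"order_id": 1, "status": "shipped", "amount": 20.0},
    {"order_id": 2, "status": "pending", "amount": 35.5},
    {"order_id": 3, "status": "shipped", "amount": 12.0},
    {"order_id": 4, "status": "returned", "amount": 50.0},
]
ci_rows = [
    {"order_id": 1, "status": "shipped", "amount": 20.0, "region": "EU"},
    {"order_id": 2, "status": "complete", "amount": 35.5, "region": "US"},
    {"order_id": 3, "status": "shipped", "amount": None, "region": "US"},
    {"order_id": 5, "status": "pending", "amount": 8.0, "region": "EU"},
    {"order_id": 5, "status": "pending", "amount": 8.0, "region": "EU"},
]

diff = compare_tables(prod_rows, ci_rows, key="order_id")
for name, value in diff.items():
    print(f"{name}: {value}")
```

Both builds here would pass a "does it build" check, yet the summary reports a new column, a duplicated primary key, an unexpected null in amount, and changes touching four of the five keys.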
Why advanced CI matters for data practitioners
Advanced CI gives data practitioners the confidence that every PR merged into production will not only build, but will also produce the changes intended for the business. By using advanced CI as part of your data quality workflow, you get:
Enhanced data quality
Say you have just modified a model that feeds into a downstream reporting dashboard. Even if the model builds successfully, what if a join condition introduces nulls into a critical column? Advanced CI surfaces this change in both a git comment and the run details, reducing the risk of bad data reaching production.
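The scenario above can be reproduced in a few lines of Python. This is a toy sketch with hypothetical tables and helper functions, not dbt or warehouse code: changing the join key leaves the row count untouched, so a build succeeds either way, but a null count on the joined column exposes the regression.

```python
def left_join(left, right, left_key, right_key, take):
    """Left join: keep every left row; `take` is None when nothing matches."""
    lookup = {row[right_key]: row[take] for row in right}
    return [{**row, take: lookup.get(row[left_key])} for row in left]


def null_count(rows, column):
    """Number of rows where `column` is null."""
    return sum(1 for row in rows if row[column] is None)


# Hypothetical source tables.
orders = [
    {"order_id": 1, "customer_id": 10, "billing_id": 10},
    {"order_id": 2, "customer_id": 11, "billing_id": 99},
    {"order_id": 3, "customer_id": 12, "billing_id": 12},
]
customers = [
    {"id": 10, "customer_name": "Acme"},
    {"id": 11, "customer_name": "Globex"},
    {"id": 12, "customer_name": "Initech"},
]

# Production joins on customer_id; the pull request switches to billing_id.
production = left_join(orders, customers, "customer_id", "id", "customer_name")
candidate = left_join(orders, customers, "billing_id", "id", "customer_name")

print(len(production), len(candidate))          # 3 3
print(null_count(production, "customer_name"))  # 0
print(null_count(candidate, "customer_name"))   # 1
```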
Greater developer velocity
Advanced CI eliminates the need for manual testing where teams might have previously reviewed row-level differences themselves. This enables your team to focus on building models and implementing new features, rather than spending that time debugging or resolving production issues after the fact.
Cost efficiency
Data issues caught early in the development cycle are far cheaper to resolve than firefighting in production. With advanced CI, engineers can proactively catch breaking changes, reducing the cost of reactive troubleshooting, reruns, or (gasp!) data downtime. This is particularly useful as your data ecosystem continues to grow, making it increasingly challenging to catch every potential error manually.
The future of CI in dbt Cloud
The question that guides our evolving product strategy is always: How can we give our users more confidence in their data quality? Future improvements we are currently investigating include:
- Downstream dashboard impact analysis via auto-exposures (How can I know what's at stake by better understanding where the data is used downstream?)
- AI-powered PR reviews (How can we do a better job of empowering users to improve both their code and data development process at scale?)
- Data quality monitoring (How can we detect and alert data practitioners to issues with the data, so that they are always the first ones to know?)
Getting started
Advanced CI is now generally available for all dbt Cloud Enterprise customers as an opt-in feature. To learn more, check out the docs.
Last modified on: Oct 08, 2024