
DataOps: How to get started

DevOps emerged years ago as a methodology that aimed to break down the silos that had grown between software developers and IT operations teams. The goal was to accelerate the pace of software deployments through shared ownership and automation.

DevOps is now recognized across the software industry as a success story. The data industry, however, has remained far behind its software development counterparts in embracing a DevOps mindset. Too often, the people who create data pipelines and data models and the people who consume that data lack the tools and visibility needed to collaborate quickly and at scale. The result is time-consuming, expensive rework that delays the shipment of new data products and erodes trust in data and data teams.

The good news is, it doesn’t have to be this way. In this article, we’ll look at the DataOps development pattern, how it’s patterned after DevOps, the value it offers, and some of the tools you can use to implement it.

What is DataOps?

DataOps is a framework for managing data that removes silos between data producers (the creators of data products) and data consumers (the users of data products). In DataOps, data producers work closely with data consumers in short, rapid deployment cycles to design, develop, deploy, observe, and maintain new data products that align closely with data consumers’ evolving needs and business goals.

DataOps recognizes that a successful data project requires the joint expertise of both data producers and data consumers. Data producers are experts in data technology - integrating and transforming data, storing data efficiently, securing access, etc. Data consumers, on the other hand, are experts in what they need from the data and how they can use it to execute winning business strategies.

For example, assume that a Finance team needs a new data set to drive reports tracking sales trends. In a pre-DataOps world, they might log a request with a data engineering team, which delivers a final product based on the limited information in the support ticket. The Finance team then discovers that the data is missing important inputs or parameters, or varies significantly from the metrics in a similar data pull last month, so they don't trust it or use it, and they send the request back for fixes.

This repeats over days or weeks. Meanwhile, the Finance team lacks the reporting it needs to drive key business decisions.

In a DataOps approach, the Finance team and data engineers would meet to discuss the Finance team’s needs in detail. This would include the data required, its shape, the format and correct calculation of fields, allowable values, and importantly, how they intend to use the data outputs to make strategic decisions.

The data engineering team would then develop a data product that pulls in all the correct sources, models the data appropriately, and tests and documents the new data models. As in DevOps, the team would use automated deployment tooling such as CI/CD to verify, test, and release the new data product to the Finance team.

Benefits of DataOps

There are many benefits to a DataOps approach—for both data product development and the organization as a whole.

Development value of DataOps. Like DevOps, DataOps accelerates data development timelines. It does this by focusing on business value from the beginning and aligning teams across different domains. It also gives data teams the tooling and resources they need to translate raw data into actionable insights in a way that's automated, modular, and tested. And it streamlines development by scoping work into smaller cycles that deliver value with every release.

Organizational value of DataOps. Organizationally, DataOps erodes the silos between data teams and their business stakeholders. This increases collaboration and knowledge sharing, resulting in a better final product and more strategic business decision-making. The rapid releases and automated deployments associated with DataOps reduce bottlenecks, delivering more business value in less time.

How to implement DataOps

So, how does DataOps work? We can think of it as encompassing five phases:

  • Plan
  • Build
  • Deploy
  • Monitor
  • Catalog

As with DevOps, these phases shouldn’t be considered long, drawn-out projects that take months from conception to completion. Instead, teams often work within an agile development framework, defining a short timespan of work (a “sprint”) at the end of which they deliver something of value to stakeholders. The process then repeats, with each new iteration adding functionality and fixes in response to stakeholder feedback.

Plan

In the planning phase, the data team works with stakeholders to understand what they need, the format in which they need it, how quickly they need it updated, where the data will be sourced from, etc. It’s at this stage that both teams also set various Key Performance Indicators (KPIs) and Service Level Agreements (SLAs) around things such as data quality, data freshness, query performance, etc.
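To make the agreement concrete, some teams capture these targets in a lightweight, reviewable spec that later phases can test against. The format below is purely illustrative (the field names and values are hypothetical), but it shows the kind of detail worth pinning down up front:

```yaml
# Hypothetical planning spec for the Finance sales-trends request
dataset: sales_trends
stakeholders: [finance]
refresh_schedule: "daily by 07:00 UTC"     # delivery SLA
source_freshness_sla_hours: 24             # source data no older than 24 hours
quality_checks:
  - total_revenue is never null
  - exactly one row per month per region
performance:
  dashboard_query_p95_seconds: 5           # KPI for query performance
```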

Build

In the build phase, the data team creates the data sets they’ll deliver to stakeholders. This involves building:

  • Data models
  • Data transformations
  • Tests to gauge and certify quality
  • Documentation describing the data, its business purpose, and how it’s calculated

Making tests and documentation a part of the build phase is a hallmark of DevOps and DataOps. Tests ensure that data meets business requirements as well as KPIs and SLAs around quality and performance. Documentation ensures that stakeholders know precisely what business purpose a model serves, where its data comes from, and how it’s calculated. This makes data quality an integral part of the development process rather than an afterthought.
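For instance, in a SQL-based transformation framework such as dbt, the sales-trends model from the Finance example might look something like this sketch (table and column names are hypothetical):

```sql
-- models/marts/finance/fct_sales_trends.sql (hypothetical model)
-- Aggregates order revenue by month so Finance can track sales trends.

with orders as (

    select * from {{ ref('stg_orders') }}  -- cleaned, tested staging model

)

select
    date_trunc('month', order_date) as order_month,
    count(*)                        as order_count,
    sum(order_amount)               as total_revenue
from orders
group by 1
```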

Deploy

In the deploy phase, the data teams push their data set changes out of local development and through a series of environments, testing their changes at each stage to ensure they behave as expected. The goal is to ensure that every change is rigorously tested and validated before it’s pushed to production and made available to data consumers.

For example, a team may release a change to a Staging environment, where an automated process runs the data team’s tests on sample data and collects metrics on data quality and query performance. If all tests and metrics checks pass, the process may push to Production, where the team will run the same tests on real-world data.
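As a minimal sketch of what that automation might look like, here is a generic CI workflow (GitHub Actions syntax; the warehouse adapter, target name, and secret are assumptions) that builds and tests every pull request against a staging environment:

```yaml
# .github/workflows/dbt-staging-checks.yml (hypothetical workflow)
name: dbt staging checks
on: pull_request

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Install dbt with the adapter your warehouse uses (Snowflake shown as an example)
      - run: pip install dbt-core dbt-snowflake
      # Run models and tests against a staging target assumed to be defined in profiles.yml
      - run: dbt build --target staging
        env:
          DBT_ENV_SECRET_STAGING_PASSWORD: ${{ secrets.STAGING_DB_PASSWORD }}
```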

The combination of automation and version control means that teams can both deploy and roll back changes as needed. If a team identifies issues, say, with their Version 2 release in production, they can roll the deployment back to Version 1 easily. This enables stakeholders to continue using the system while the engineering team addresses critical issues.
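With Git-based version control in place, a rollback can be as simple as reverting the release commit and letting the same automated pipeline redeploy the previous state (the commit reference below is a placeholder):

```bash
# Create a new commit that undoes the Version 2 release, then redeploy via CI/CD
git revert <sha-of-version-2-release>
git push origin main
```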

Monitor

Once the data product is deployed to production, data consumers can self-serve access to the data, using it in their reports and data products. During this time, the data team continues to observe metrics, logs, and traces as data updates flow in, responding to any data anomalies or performance issues they identify.

Catalog

All of these workflows create extensive metadata that gives stakeholders insight into how data products are built, used, optimized, and debugged. This metadata can be visualized and explored in a catalog, where the data team ensures that its work is discoverable and documented. This might entail:

  • Making data sets available for discovery in a data catalog
  • Generating data lineage charts to document a data set's sources, dependencies, owners, and relationships to other data sets
  • Publishing data models and documentation so other teams can consume their work

Cataloging helps reduce data silos and redundant data transformation work. Before embarking on a new data project, a team can search the catalog to discover whether another team has already created a high-quality data set that they can build on.
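In dbt, one way to make downstream usage visible in the catalog is to declare an exposure, which records who consumes a model and appears in the lineage graph. A sketch (names and email are hypothetical):

```yaml
# models/marts/finance/_finance__exposures.yml (hypothetical file)
version: 2

exposures:
  - name: weekly_sales_trends_dashboard
    type: dashboard
    maturity: high
    owner:
      name: Finance Analytics
      email: finance-analytics@example.com
    depends_on:
      - ref('fct_sales_trends')   # ties the dashboard to its upstream model in lineage
    description: "Weekly sales trends report used by the Finance team"
```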

DataOps with dbt

At dbt, we’ve long believed in and supported DataOps as a methodology. That’s why dbt supports DataOps through every step, making it easier to run your business on trusted data.

[Figure: the Analytics Development Lifecycle (ADLC) loop]

Plan: dbt Mesh

dbt Cloud offers several built-in features that help users plan and align their data projects:

  • dbt Mesh: dbt Mesh enables companies to implement a data mesh architecture - a decentralized, scalable approach to data management. With dbt Mesh, data teams can design their data workflow architecture to support the unique needs of their downstream stakeholders with domain-specific data, and do so in a governed, automated, simplified way. This lets them work independently of other teams without sacrificing collaboration, governance, or security (a configuration sketch follows this list).
  • SLAs and data freshness: dbt also calculates source data freshness and surfaces the results in built-in interfaces, helping users determine whether source data is meeting pre-defined SLAs.
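As a sketch of what this looks like in a project (model, source, and column names are hypothetical), a team might mark a model as public so other projects in the mesh can reference it, and declare freshness thresholds on its sources to encode the agreed SLA:

```yaml
# A dbt properties file (hypothetical names)
version: 2

models:
  - name: fct_sales_trends
    access: public              # other projects in the mesh can ref() this model
    config:
      contract:
        enforced: true          # column names and data types are guaranteed to consumers
    columns:
      - name: order_month
        data_type: date
      - name: order_count
        data_type: integer
      - name: total_revenue
        data_type: numeric

sources:
  - name: salesforce
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}    # warn when source data is over 12 hours old
      error_after: {count: 24, period: hour}   # fail the freshness check at 24 hours
    tables:
      - name: opportunities
```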

Build: Various development environments

dbt gained popularity as a tool that allowed developers to build analytics code using SQL. Given the varied skills and preferences of data collaborators within an organization, dbt supports many environments for authoring analytics code:

  • Cloud IDE: The dbt Cloud integrated development environment (IDE) is a web-hosted IDE that includes SQL syntax highlighting, auto-completion, code linting, documentation, and build/test/run controls for running and debugging work on demand.
  • CLI: Developers can author analytics code in their preferred local editor and use the dbt command line interface (CLI) to write, run, and debug model changes (see the example after this list).
  • Visual editor: Soon, less SQL-savvy analysts will be able to create or edit dbt models through a visual, drag-and-drop experience inside dbt Cloud. These models compile directly to SQL and are indistinguishable from other dbt models in your projects: they are version-controlled, can be accessed across projects in a dbt Mesh, and integrate with dbt Explorer and the Cloud IDE. As part of this visual development experience, users can also use built-in AI for custom code generation where the need arises.
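For the CLI path, a typical local development loop might look like the following (the model name is hypothetical):

```bash
dbt run --select fct_sales_trends      # build just this model
dbt test --select fct_sales_trends     # run the tests defined for it
dbt build --select +fct_sales_trends   # build and test the model plus its upstream parents
```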

Deploy: Scheduling jobs, version control and CI/CD

dbt Cloud includes an in-app job scheduler to automate how and when you execute dbt jobs. To improve development feedback loops and optimize data platform consumption, users can also “defer to production” for any job run, meaning that when they want to run and test changes to a single model, dbt will build only that changed model (and defer any upstream dependencies to what’s already in prod).
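In the dbt CLI, the same idea is expressed with the --defer and --state flags, which point dbt at the artifacts from a production run so unchanged upstream models are read from production instead of being rebuilt (the model name and artifact path are hypothetical):

```bash
# Build only the changed model; resolve its upstream refs against production
dbt run --select fct_sales_trends --defer --state path/to/prod-artifacts
```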

A key part of implementing DataOps is tracking changes to your work. Without proper change tracking and control, a rogue alteration to source code that causes an error in production can take hours or more to hunt down and fix. dbt's automated testing and version control support streamline data pipeline automation, so teams spend less time on manual processes.

dbt supports version control via Git, so every change made to a model is committed, documented, and - if needed - reversible. Using Git, each data team member develops against their own branch of the dbt project. When they’re ready to merge changes, they create a pull request (PR) that one or more other members of the team review. This ensures every change receives a second set of eyes before heading to production.

You can also configure dbt Cloud with Continuous Integration (CI) jobs to push changes from dev through prod. Once a PR is approved, it triggers a CI job, which runs and tests models in a staging (pre-production) environment before moving them to production.

Once a set of changes is verified in staging, dbt Cloud can push them to production, a process known as Continuous Deployment (CD). This combined CI/CD process automates code integration, testing, and delivery of updates to production, identifying and eliminating potentially costly errors before they’re shipped to data consumers. The result is reduced manual labor for deployments, resulting in accelerated data development cycles. By incorporating CI/CD for data, dbt helps organizations maintain data quality and consistency across deployments.

Monitor: Automated testing

With dbt, data teams can proactively define assertions, called tests, about their data models. These tests can validate the behavior of a model's logic before it's materialized in production (unit tests), or assert properties of the resulting data (values are unique, non-null, within an accepted set, and so on). If a test fails, the model won't build, saving you from unnecessary data platform spend while improving data product reliability.
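A sketch of both kinds of tests, declared in YAML alongside the model (unit tests require a recent dbt version; model, column, and fixture values are hypothetical):

```yaml
version: 2

models:
  - name: fct_sales_trends
    columns:
      - name: order_month
        description: "First day of the month in which the orders were placed"
        tests:
          - not_null
          - unique
      - name: total_revenue
        tests:
          - not_null

unit_tests:
  - name: revenue_is_summed_per_month
    model: fct_sales_trends
    given:
      - input: ref('stg_orders')
        rows:
          - {order_date: "2024-01-05", order_amount: 100}
          - {order_date: "2024-01-20", order_amount: 50}
    expect:
      rows:
        - {order_month: "2024-01-01", order_count: 2, total_revenue: 150}
```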

In addition to setting up tests to proactively catch issues, it’s easy to monitor your production dbt jobs and alert the right people when something goes wrong with Slack or email notifications, logs, run history dashboards, data health tiles, and more.

Catalog: dbt Explorer

Data developers and consumers alike benefit from an understanding of data dependencies, freshness, use cases, and other relevant context. dbt Explorer is an interactive data catalog that represents the metadata created in every dbt Cloud run in an intuitive, visual interface. Using dbt Explorer, consumers can find a data asset and view it in context, complete with its metadata, documentation, and data lineage. Data producers can use dbt Explorer to find reusable data assets, as well as to trace lineage and troubleshoot data issues caused by upstream defects.

The nice thing is that, rather than being an afterthought in the DataOps process, the assets viewable in dbt Explorer - models, metadata, documentation, data lineage, security controls, etc. - are all generated automatically during the data development process itself. Updates are pushed to dbt Explorer with every push to production. With dbt, teams can implement collaborative data workflows that reduce bottlenecks and empower faster decision-making across departments.

Conclusion

As a methodology, DataOps can eliminate barriers between data producers and data consumers, resulting in faster data development cycles and higher-quality data. Using dbt, data teams and their stakeholders can make the DataOps culture part of their daily workflows, building a data framework that combines speed and governance with distributed ownership.

See dbt Cloud in action and learn how it can support your DataOps journey—try dbt Cloud free today (no credit card required).

