Analytics engineering has come a long way in the past decade. Gone are the days when we fretted over how to build pipelines or whether we had access to enough compute to handle our workloads. Cloud data platforms and the ecosystem that's emerged around them have made data analytics accessible to organizations large and small. Despite this progress, we now face a whole new set of challenges: data quality, data literacy, and ambiguous ownership.
At dbt, we’ve spent a lot of time thinking about how to tackle these modern problems. For us, the answer lies in the Analytics Development Lifecycle (ADLC) coupled with a data control plane.
The ADLC is a vendor-agnostic process that helps teams develop mature analytics workflows using techniques similar to those found in modern software engineering processes. The data control plane is an architectural layer that sits over an end-to-end set of data activities (e.g., integration, access, governance, and protection) to manage and control the holistic behavior of people and processes in the use of distributed, diverse, and dynamic data.
With dbt Cloud as your data control plane, your data teams have a standardized and cost-efficient way to build, test, deploy, and discover analytics code using the ADLC. dbt Cloud also gives data consumers purpose-built interfaces and integrations to self-serve data that is governed and actionable.
We’ll dig into how the ADLC and the data control plane work together—and how to use dbt Cloud to drive both and improve data quality, promote data literacy, and clarify data ownership.
The most pressing data problems today
In our most recent annual State of Analytics Engineering report, we surveyed over 450 data practitioners about the state of analytics from their point of view. One of the most interesting findings came from this question: What do you find most challenging when preparing data for analysis?
If this survey had been conducted in 2014, it wouldn't have been surprising to see building data transformations and constraints on compute resources cited as the biggest challenges. Remember: that was a time before cloud data platforms or industry standards for data transformation like dbt. But today, the community is telling us that data transformation is mostly a solved problem. That's good news for all of us.
But that means it’s time to tackle the next order of problems. Our community is telling us that as data analytics takes off within their companies, new problems emerge downstream: data quality, data ownership, and stakeholder literacy are now the biggest challenges in the industry.
Here's a quick litmus test to prove the point:
- Do your stakeholders always know where to go to find the right dashboard or the right metric? Do their numbers always agree?
- Do changes to upstream data sources ever break your pipelines?
- Do you have defined SLAs for the business—and are you hitting them?
Many readers will feel a little squirmy answering questions like these about their data workflows. That's expected, and it's okay: it just proves that we have more work to do as an industry. We should be proud of the big problems we've already solved, but we're still on a journey.
Without clear, confident answers to questions like these, we can't help our business grow revenue, nor can we properly manage costs. And the bigger goal we're all in service of, becoming a trusted strategic partner to the business, remains perpetually out of reach.
What is the Analytics Development Lifecycle?
That raises the question of what the next decade should look like for maturing data analytics practices.
We believe that question has two answers: the first is to adopt the Analytics Development Lifecycle (ADLC) as a cultural and workflow paradigm, and the second is to choose technology solutions that help you do that successfully.
The ADLC is a vendor-agnostic framework for mature analytics workflows. It encourages collaboration among various stakeholders and is designed to help data producers, data consumers, and—ultimately—the business ship and use trusted data products at speed and at scale.
The ADLC defines eight distinct workflow stages: from planning analytics products, to building, testing, and deploying them, to operating them in production and ensuring that they're reliable and discoverable.
The ADLC borrows heavily from the Software Development Lifecycle (“SDLC”) popularized in software engineering in the early 2000s. The SDLC helped cross-functional teams work together with more agility, velocity, accuracy, and, ultimately, business impact.
The SDLC sought to erode the bifurcation between the software engineers who built software systems and the IT engineers who operationalized them. Similarly, the ADLC seeks to erode the bifurcation of roles and responsibilities between data builders and data consumers.
The goal is to give all roles a standardized, repeatable framework they can use to work better together. It's high time that analytics professionals adopt a similar framework and rely on vendors who will accelerate and harden data workflows across these stages.
Our CEO Tristan Handy has said, “We believe that implementing the ADLC is the best path to building a mature analytics practice within an organization of any size.” We’ve written a paper going into more detail about why—and how—the ADLC achieves this. We encourage you to read it and learn how you can bring some of these best practices into your organization.
The journey to the data control plane
It’s important to remember that the ADLC isn’t owned by any one company, nor is it an explicit vendor solution. Rather, it’s a vendor-agnostic process and workflow methodology that promotes more mature analytics practices at any scale. We contend that adopting a data control plane like dbt Cloud is the best way to embrace the ADLC.
(At the risk of an unnecessary history lesson) here’s why: a new data stack has emerged around the cloud data platforms introduced over the past decade. The core workflow surrounding those platforms is the process of bringing data in from varied sources: databases, applications, streaming data, and more.
That raw data is then transformed into clean data models. Finally, those data models are pushed to endpoints for AI, BI, or analytics so teams can understand and visualize trends and make informed business decisions with data.
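To make this concrete, here's a minimal sketch of what one step in that transformation layer can look like as a dbt SQL model. The raw_app.orders source table and its columns are hypothetical stand-ins:

```sql
-- models/staging/stg_orders.sql
-- A minimal dbt staging model: it reshapes a hypothetical raw orders
-- table into a clean, analysis-ready data model.

with source as (

    -- source() tells dbt where the raw data lives and records lineage
    select * from {{ source('raw_app', 'orders') }}

),

cleaned as (

    select
        id as order_id,
        customer_id,
        lower(status) as order_status,           -- normalize casing
        cast(ordered_at as date) as order_date   -- standardize types
    from source
    where id is not null                         -- drop malformed rows

)

select * from cleaned
```

Downstream models can then build on this one with {{ ref('stg_orders') }}, and dbt infers the dependency graph from those references.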
dbt has sat in the middle of this workflow since our inception over eight years ago. In that time, as adoption of modern data systems grew and the strategic insights they deliver became central to the business, a peripheral data ecosystem emerged to solve next-order problems and optimize this core workflow. That ecosystem emerged because people started asking questions like:
- How can we automate these workflows from data sources all the way through to the data consumer?
- How can we get visibility into data pipeline performance and health, troubleshoot issues, and optimize velocity and costs in the process?
- How can we increase data literacy across the company and improve data visibility and trust?
- And how do we build and centralize business logic to ensure that consistent, reliable metrics are powering every corner of our business?
In response, we saw entire categories spring up for orchestration, data observability, data catalogs, and semantic stores to address these market needs.
This has all been great progress for the maturity of our industry, helping automate workflows and ensure the freshest, highest-quality data is powering the business. But each of those add-on components surfaces its own subject-specific metadata in its own silo, with no centralized way to connect it or take holistic action on it.
At the end of the day, your underlying data platform and pipelines can only become optimized, reliable, and cost-effective when they have context and awareness of all these varied work streams. Today, that metadata remains fragmented across tools, teams, and platforms.
What is a data control plane?
Thinking back to the imperative of data quality, data literacy, and clear data ownership, it’s clear that operating in data silos is sabotaging these goals.
We believe that the solution to all of this is a data control plane.
A data control plane is an abstraction layer that sits across your data stack, unifying capabilities for orchestration, observability, cataloging, semantics, and more. Perhaps more importantly, a data control plane centralizes metadata across your business, giving you a universal view of what's happening in your data estate.
The data control plane provides signals to help you understand whether your data is fresh and your platform is cost-optimized. You can also use it to verify that everyone is working from a common understanding of how business metrics are defined.
We believe that a data control plane should support and promote three things:
- It should be flexible and cross-platform to empower distributed teams, helping them avoid vendor lock-in and manage data platform costs.
- It should be collaborative. That means it should make data development more accessible, streamlined, and governed for more types of users.
- Finally, it must produce trustworthy outputs. It should give users the ability to build and automate high-quality data pipelines so that the business can access, understand, and trust the data it receives.
Weirdly (or not), those three characteristics map right back to what the market is telling us are the biggest challenges to solve:
- The challenge of ambiguous data ownership begets the need to help disparate teams align on a common control plane for accelerating analytics, regardless of which underlying platforms each team relies on;
- The challenge of poor stakeholder data literacy demands more governed inroads for data collaboration; and
- The challenge of poor data quality sets the imperative that data teams and stakeholders need a streamlined way to build trustworthy data products.
Fortunately, this isn’t something you need to create from scratch or with vendor-specific tools. You can build it today with dbt Cloud.
dbt Cloud is the data control plane that centralizes your metadata and makes it actionable, so your teams can ship and use trusted data, faster. It’s natively interoperable across various cloud and data platforms, so you’re never locked in. Its platform features support data developers and their stakeholders across various stages of the analytics development lifecycle, turning data analytics into a team sport. And it provides the trust signals and observability features required to ensure all data outputs are accurate, governed, and trustworthy.
Conclusion
A data control plane is the technological complement to the people-and-process approach laid out in the ADLC. dbt Cloud is a market-leading data control plane designed to help organizations successfully adopt the ADLC.
dbt Cloud works across various cloud and data platform environments. It’s accessible to personas of varying technical backgrounds, providing them with a standardized, unified way to accelerate the Analytics Development Lifecycle.
With dbt Cloud as your data control plane, you can:
- Abstract business logic into a flexible platform: By standardizing on a platform-agnostic control plane, you can stay focused on shipping reliable data products while optimizing spend.
- Standardize on SQL: All transformations are written in SQL, a universal language that gets more data people involved in transformation workflows, and dbt builds dependencies and documentation automatically.
- Make data quality a habit: Proactively prevent data issues with built-in testing and CI (see the sketch after this list). If an issue does occur, find and fix it quickly with alerts and audit logs, roll changes back easily with version control, and use column-level lineage to identify and resolve the root cause fast.
- Help your teams ship data faster: Reduce bottlenecks and improve productivity with AI-assisted workflows and automated scheduling and orchestration of your end-to-end data pipelines.
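As a concrete illustration of that testing workflow, here's a minimal sketch of a dbt singular data test. The fct_orders model and its order_total column are hypothetical names; the pattern is what matters: the test is just a SQL query that returns the rows violating an expectation, and dbt fails the test if any rows come back:

```sql
-- tests/assert_no_negative_order_totals.sql
-- A singular data test: dbt runs this query during `dbt test` (and in CI).
-- If the query returns any rows, the test fails, so bad data can be
-- caught before it reaches dashboards and stakeholders.
-- fct_orders and order_total are hypothetical names for this sketch.

select
    order_id,
    order_total
from {{ ref('fct_orders') }}  -- ref() also records this dependency for lineage
where order_total < 0         -- any negative total is a data quality issue
```

Running dbt test in a CI job on every pull request turns this kind of check into a habit rather than an afterthought.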
To learn more and see dbt Cloud and the data control plane in action, view our recent webinar.