When it comes to cloud-based data stacks, you may have heard people use the terms “data product” and “data as a product” interchangeably. In reality, they represent different concepts: one refers to what you’re developing, while the other refers to the development paradigm itself. In this article, we’ll look at the difference between these two terms and how each contributes to high-quality, rapid data development.
What is a data product?
A data product is a data asset, or delivered unit of data, that solves a specific business problem. It can be a database table, a report, or even a machine learning model. Whatever the deliverable, a data product is the output of a data producer that is used or incorporated by a data consumer.
You may wonder: if a data product is a database table, report, etc., then aren’t we already creating data products? After all, all teams already produce these data deliverables.
Maybe, maybe not. A data product is more than just a delivered unit of data. It has other properties, context, and attributes that make it easier to find, use, and manage over time than a table, report, or model.
Data products provide three key advantages over regular data deliverables: discoverability, access control, and backward compatibility. Used effectively, data products can increase data use, drive revenue and business value, increase data security, and reduce costly data quality issues.
To achieve these advantages, a data product needs the following properties:
Discoverable. Other teams in an organization should be able to search for and find a data product easily (e.g., by using a data catalog). This keeps data from going unused and becoming “dark data”: data that sits dormant, incurring storage and compute costs while generating no business value.
Addressable. Each data product should have a unique identifier that enables other teams to find and consume it. This could be a data connection plus database/table combination for a table in a database, an S3 URI to a Parquet file in an Amazon S3 object storage bucket, or an HTTP URL to a Looker report.
Trustworthy and observable. Other teams can inspect a data product and see its data origins, when its sources were last updated, and what transformations the team applied to calculate the current data product’s fields. This visibility into data lineage gives data consumers confidence in the correctness and validity of the data product.
Self-describing. A data product contains metadata describing what business purpose it serves, what its fields mean, and how the data was calculated. It also expresses the versions of the data product that are available to ensure backward compatibility with existing consumers.
Interoperable. Data products provide some means (SQL compatibility, an API, a defined file format, etc.) by which consumers can extract the data and incorporate it into their own data products.
Secure and governed. Finally, a data product defines mechanisms for controlling access to the data it contains, auditing data access, and identifying sensitive information that requires special handling to ensure regulatory compliance.
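One way to make these properties concrete is to record them in a descriptor that travels with the deliverable. The following is a minimal Python sketch rather than a standard schema: the `DataProductDescriptor` class, its field names, the `missing_properties` helper, and the example values are all hypothetical, chosen only to mirror the six properties above.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductDescriptor:
    """Hypothetical descriptor; field names are illustrative, not a standard."""
    name: str
    address: str = ""      # addressable: table path, S3 URI, or report URL
    description: str = ""  # self-describing: the business purpose
    field_docs: dict[str, str] = field(default_factory=dict)   # self-describing: field meanings
    versions: list[str] = field(default_factory=list)          # self-describing: available versions
    upstream_sources: list[str] = field(default_factory=list)  # trustworthy: lineage
    access_formats: list[str] = field(default_factory=list)    # interoperable: e.g. "sql"
    allowed_roles: list[str] = field(default_factory=list)     # secure: who may read it
    catalog_tags: list[str] = field(default_factory=list)      # discoverable: search keywords


def missing_properties(product: DataProductDescriptor) -> list[str]:
    """Return the names of the properties the descriptor leaves empty."""
    checks = {
        "discoverable": bool(product.catalog_tags),
        "addressable": bool(product.address),
        "trustworthy": bool(product.upstream_sources),
        "self-describing": bool(
            product.description and product.field_docs and product.versions
        ),
        "interoperable": bool(product.access_formats),
        "secure": bool(product.allowed_roles),
    }
    return [name for name, ok in checks.items() if not ok]


# An ordinary table with some metadata attached, but no tags or access rules yet.
orders = DataProductDescriptor(
    name="daily_orders",
    address="analytics.sales.daily_orders",
    description="One row per order, refreshed daily.",
    field_docs={"order_id": "Unique order identifier"},
    versions=["1"],
    upstream_sources=["raw.shop.orders"],
    access_formats=["sql"],
)
print(missing_properties(orders))  # ['discoverable', 'secure']
```

A check like this could run before a release to flag deliverables that are still plain tables or reports rather than full data products.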
What is data as a product?
“Data as a product,” by contrast, is a mindset or approach that applies product-like thinking to a dataset. In other words, it ensures that a dataset has all the properties of discoverability, addressability, self-description, and so on. Furthermore, it fosters thinking about “data product releases” much like how software developers approach software releases - i.e., as discrete, shipped products with distinct versions.
Thinking of data as a product means applying several concepts from service-oriented architecture to our datasets:
Product-like management. Data producers work with stakeholders to understand their requirements. The team uses these insights and feedback to create a backlog of prioritized development items to address across multiple releases.
Interfaces and contracts. To help consumers manage access to different versions of the data product, data producers create interfaces that specify the exact structure—tables, fields, types—of a given version. They also provide contracts that consumers can use to generate code and validate data quality.
Versioning. When a data producer needs to make a breaking change to a data product - e.g., removing a field, changing a data type - it creates a new version of the contract. It continues to support the previous version for a defined time period, allowing consumers to move any code or reports they have over to the new version.
Access rules for data. Data-as-a-product thinking means that data producers consider who requires what level of access to their product and make appropriate access controls a required component of every release.
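The contract and versioning ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular tool implements contracts: the `CONTRACTS` and `SUPPORTED_UNTIL` tables, the column names, the dates, and both helper functions are hypothetical.

```python
from datetime import date

# Hypothetical contracts for two versions of one data product.
# Each maps a column name to the Python type its values must have.
CONTRACTS = {
    1: {"order_id": int, "amount": float, "coupon_code": str},
    2: {"order_id": int, "amount": float},  # breaking change: coupon_code removed
}

# Hypothetical support windows: version 1 stays available until this date.
SUPPORTED_UNTIL = {1: date(2025, 6, 30), 2: None}  # None = no end date announced


def violations(row: dict, version: int) -> list[str]:
    """List the ways a row breaks the contract for the given version."""
    contract = CONTRACTS[version]
    problems = []
    for column, expected in contract.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            problems.append(f"wrong type for {column}: expected {expected.__name__}")
    # Columns outside the contract are reported too, in a stable order.
    for column in sorted(row.keys() - contract.keys()):
        problems.append(f"unexpected column: {column}")
    return problems


def is_supported(version: int, today: date) -> bool:
    """A version is supported through its announced end date, inclusive."""
    end = SUPPORTED_UNTIL[version]
    return end is None or today <= end


row = {"order_id": 42, "amount": 19.99, "coupon_code": "SPRING"}
print(violations(row, version=1))         # []
print(violations(row, version=2))         # ['unexpected column: coupon_code']
print(is_supported(1, date(2025, 7, 1)))  # False
```

Keeping both versions in the table at once is the point of the design: consumers still on version 1 keep validating against it until the support window closes, while new consumers start on version 2.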
Organizational benefits of treating data as a product
Data as a product has a dual impact: it improves the velocity and quality of data development and the value of data for the entire organization.
Benefits to data development
Organizes data team operations. With documentation and version control embedded in each data product release, data teams have a standardized way of keeping track of their work, troubleshooting issues, and supporting streamlined dataflows at scale.
Connects development priorities with business needs. With these standards in place, data development can become proactive rather than reactive, as in product development. New versions can address needs across multiple stakeholders instead of fulfilling repeated, and sometimes contradictory, one-off requests.
Enables collective building. Data teams can build more quickly with data products because they can discover and reuse the work done by other teams. This accelerates new data product development and reduces waste and inaccuracies introduced through duplicative work.
Benefits to the organization
Enables self-service. Since data products contain all the documentation, metadata, and contracts required to consume them effectively, they remove many traditional barriers to using data. Instead of requesting a custom data pipeline through the data engineering team, data consumers can leverage the data product they need to address their specific needs, whether that’s building data-driven apps, generating new reports for business analysis, or activating data in a downstream tool.
Breaks down data silos. Without a well-defined approach to exposing data to the organization, much of its data goes unused. Data products establish a set of criteria - discoverability, addressability, security, and the rest - that streamlines finding and using data across disparate teams.
Aligns data workflows to business initiatives. By breaking down data silos and enabling data self-service, data teams help ensure that data-driven initiatives tie in with larger organizational goals.
Improves security and privacy. By defining what a data product is, organizations also ensure each product takes a consistent, centralized approach to data access and compliance. This approach combines the speed and agility of bottom-up data-driven development originating at the team level with the benefits of a top-down, consistent approach to governance.
Tools for treating data as a product
Many organizations can shift to a “data as a product” mindset using data tools they already have. However, creating a true data product framework may require a few additions to the toolbox. In particular, teams will need a way to:
- Find and reuse existing data, no matter where it lives
- Initialize new data products and create data product contracts
- Transform data from its raw form into its final, productized form
A robust data platform architecture, and tools such as a data catalog, can help with data organization and discovery. Data platforms that store organizational data in raw and transformed formats enable easier discovery and give authorized users access to the original, unfiltered data. A data catalog provides a single source of truth for discovering data no matter where in the organization it lives.
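As a rough illustration of catalog-based discovery, the sketch below searches a small in-memory list of entries by keyword. It is a toy stand-in, not the API of any real catalog product: the `CATALOG` entries, their keys, the addresses, and the `search` function are all hypothetical.

```python
# Hypothetical catalog entries; in practice a catalog tool would collect these.
CATALOG = [
    {
        "name": "daily_orders",
        "address": "analytics.sales.daily_orders",
        "description": "One row per order, refreshed daily",
        "tags": ["sales", "orders"],
    },
    {
        "name": "churn_scores",
        "address": "s3://example-bucket/churn/scores.parquet",
        "description": "Predicted churn probability per customer",
        "tags": ["customers", "ml"],
    },
]


def search(catalog: list[dict], keyword: str) -> list[str]:
    """Return addresses of entries whose name, description, or tags mention the keyword."""
    needle = keyword.lower()
    hits = []
    for entry in catalog:
        # Combine all searchable text for the entry into one lowercase string.
        haystack = " ".join(
            [entry["name"], entry["description"], *entry["tags"]]
        ).lower()
        if needle in haystack:
            hits.append(entry["address"])
    return hits


print(search(CATALOG, "orders"))    # ['analytics.sales.daily_orders']
print(search(CATALOG, "customer"))  # ['s3://example-bucket/churn/scores.parquet']
```

The search returns addresses rather than names, so a consumer can go straight from finding a product to connecting to it, whether it lives in a warehouse table or an object storage bucket.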
Data transformation tools like dbt Cloud can also help teams find data, understand its lineage, and create new data products. Using dbt Cloud, teams can:
- Create data models that import data from various sources
- Create tests to verify data quality
- Auto-generate documentation for data producers and consumers alike with detailed metadata on lineage, freshness, and more
- Define model contracts that specify a data product’s version and data guarantees
- Discover other data products created and published across the org
Conclusion
“Data product” and “data as a product” are related concepts. “Data as a product” is a mindset shift that requires re-thinking the way we design, publish, and consume the output of data-driven workstreams. It’s not an overnight shift. But when done right, the result is the creation of an ecosystem of data products that make working with data easier and more impactful for both technical and business stakeholders.
dbt Cloud gives you the tools required to shift over to treating data as a product. On average, dbt customers realized a 194% return on investment from this shift. Contact us today to learn more about how dbt Cloud can help your organization kickstart its data product journey.
Last modified on: Oct 15, 2024