How to build a data product

Sep 08, 2023

Treating data as a product is a development philosophy that handles data more like software: each dataset becomes a defined, versioned unit with clear ownership, purpose, and documentation.

Using a data product model, data teams can improve the quality, reliability, and clarity of their data operations. The model has grown in popularity as more teams find that it makes new data offerings easier to design, develop, and ship.

In this blog post, we’ll examine the development cycle for data products, understand the process's pain points, and look at how you can overcome those obstacles.

What is a data product?

A data product is a data container or unit of data that directly solves a customer or business problem. Data products have several properties that facilitate their business value:

  • Discoverable: Are easy to find, e.g., registered in a central catalog so that teams can easily browse available products. Discoverability solves the problem of data existing throughout a company in silos and going unused.
  • Addressable: Possess uniquely labeled locations so that they can be quickly retrieved.
  • Self-describing: Accompanied by metadata that describes them. This usually takes the form of documentation in a machine-parsable text format (e.g., JSON, YAML) that defines properties such as the data’s owner, when it was last updated, what purpose it serves, what its constituent fields mean, and how its data was calculated.
  • Interoperable: Offer mechanisms for connecting with other products. This can take the form of an API or a data interchange format that other data products can use to consume data from or provide data to the product.
  • Trustworthy & truthful: Provide information on data ownership, origin, testing, etc., so that users can trust the quality of the data.
  • Secure & governed: Governed according to globally defined rules and standards to ensure security and compliance. Global discoverability aids in governance by ensuring all data products are searchable from a central location.
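
To make a few of these properties concrete, here is a minimal Python sketch of a self-describing product descriptor and a toy central catalog. Everything in it (the DataProduct and Catalog classes, the field names, the example address and owner) is a hypothetical illustration, not part of dbt or any real catalog tool.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataProduct:
    """Hypothetical descriptor for one data product (not a dbt API)."""
    name: str
    address: str          # unique location, e.g. "analytics.finance.monthly_revenue"
    owner: str
    purpose: str
    last_updated: date
    fields: dict = field(default_factory=dict)  # column name -> meaning


class Catalog:
    """Toy central catalog that holds one entry per address."""

    def __init__(self):
        self._products = {}

    def register(self, product: DataProduct) -> None:
        # Self-describing: refuse products without an owner or a stated purpose.
        if not product.owner or not product.purpose:
            raise ValueError(f"{product.name}: owner and purpose are required")
        # Addressable: every product needs its own unique address.
        if product.address in self._products:
            raise ValueError(f"address already registered: {product.address}")
        self._products[product.address] = product

    def get(self, address: str) -> DataProduct:
        return self._products[address]

    def search(self, keyword: str) -> list:
        # Discoverable: match the keyword against names and purposes.
        kw = keyword.lower()
        return [p for p in self._products.values()
                if kw in p.name.lower() or kw in p.purpose.lower()]


catalog = Catalog()
catalog.register(DataProduct(
    name="monthly_revenue",
    address="analytics.finance.monthly_revenue",
    owner="finance-data@example.com",
    purpose="Monthly recognized revenue for finance reporting",
    last_updated=date(2023, 9, 8),
    fields={"month": "Calendar month", "revenue_usd": "Recognized revenue in USD"},
))

matches = catalog.search("revenue")
```

In practice a catalog tool would handle registration and search; the point of the sketch is only that ownership, purpose, and a unique address are checked at the moment a product is published rather than left to convention.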

Some examples of data products are:

  • Tables: A simple table can itself be a data product. The table will usually require other attributes to accompany it so that data product consumers can find the underlying data, examine different versions, and receive notifications when new versions are released.
  • Reports: A report is a readily-usable data product that can provide immediate business value to its users. As with a table, you can turn an ordinary report into a data product via tooling that implements features such as discoverability, addressing, versioning, and security.
  • dbt models: The core object of a dbt model is a cleaned table. A model stores the query logic and tracks its connection to other models. Tools like dbt Explorer or third-party data catalogs allow models to be tagged and cataloged, making them discoverable as products.
  • Machine learning models: The core object of a machine learning model is its training set. The ML model translates the training set into a meaningful output, e.g., a classification or an embedding. Building an API makes the model accessible to teams and users.
  • Metrics: The core object of a metric is reporting data. The metric aggregates that data into meaningful numbers. Tools like dbt’s Semantic Layer document the underlying data and its meaning while providing easy access to existing metrics.

The dbt Semantic Layer and notebook tools

Organizations can streamline the data product development cycle by pairing the dbt Semantic Layer with a notebook tool, which makes it practical to build and iterate on minimum viable data products.

The dbt Semantic Layer

The dbt Semantic Layer allows a data team to configure metrics and store them centrally, where any team can query them (subject to access permissions for privacy and security). With the Semantic Layer, organizations can ensure that all their teams are on the same page with standardized, centralized definitions and governance.

The dbt Semantic Layer lets data teams define metrics on top of models using readable YAML files. MetricFlow technology builds SQL queries on the backend that compute the metrics that you specify in YAML. This definition architecture makes it easy for dbt to catalog metrics as they are defined and track the relationships between models and metrics.
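
The idea of generating SQL from a declarative metric definition can be sketched in a few lines of Python. This is a toy illustration only: the dictionary keys, the compile_metric function, and the orders table are assumptions made up for this example, and the code does not reproduce MetricFlow's YAML schema or its query-generation logic.

```python
# Hypothetical metric definition, standing in for what a YAML file might declare.
METRIC = {
    "name": "total_revenue",
    "model": "orders",   # assumed source model/table name
    "agg": "sum",
    "expr": "amount",    # assumed column name
}

ALLOWED_AGGS = {"sum", "avg", "min", "max", "count"}


def compile_metric(metric: dict, group_by=()) -> str:
    """Turn a declarative metric definition into a SQL string (toy version)."""
    agg = metric["agg"].lower()
    if agg not in ALLOWED_AGGS:
        raise ValueError(f"unsupported aggregation: {metric['agg']}")
    # Dimensions come first, then the aggregated measure aliased to the metric name.
    select_cols = list(group_by) + [f"{agg.upper()}({metric['expr']}) AS {metric['name']}"]
    sql = f"SELECT {', '.join(select_cols)} FROM {metric['model']}"
    if group_by:
        sql += f" GROUP BY {', '.join(group_by)}"
    return sql


sql_total = compile_metric(METRIC)
sql_by_month = compile_metric(METRIC, ["order_month"])
print(sql_by_month)
# SELECT order_month, SUM(amount) AS total_revenue FROM orders GROUP BY order_month
```

Because the definition is data rather than hand-written SQL, the same entry can be cataloged, validated, and recombined with different dimensions without anyone editing a query.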

Regarding development, when a team gets feedback on a data product, the Semantic Layer lets them quickly respond since they can find, adjust, and recombine metrics without getting into the SQL weeds.

Non-data teams can also find and read the YAML without extensive technical training. This enables effective, meaningful collaboration and feedback. When everyone can understand the definitions involved, teams can self-serve the metrics they want. That avoids ad-hoc query development and prevents inconsistencies in metrics developed by different teams.

Notebook tools

Notebook tools like Deepnote and Hex further open up data product development, enabling users and developers to work together. With a notebook, users aren’t just interfacing with a black box. Instead, they can see the entire system with which they are working. The MVP data development cycle becomes:

  1. Identify a user need
  2. Build a minimal product in a notebook
  3. Share notebook with users and collaborators
  4. Collect detailed feedback and make specific adjustments to the metrics & models of the product
  5. Let users query the product metrics from the Semantic Layer (even within a notebook) to further fine-tune the results they want from the product
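
As a rough picture of the last step, the notebook-style cell below lets two consumers query the same centrally defined metric at different grains. It is a sketch under stated assumptions: an in-memory SQLite table stands in for the warehouse, a plain dictionary stands in for centrally defined metrics, and query_metric is a hypothetical helper rather than the Semantic Layer API.

```python
import sqlite3

# Hypothetical central definitions: metric name -> SQL expression over "orders".
METRICS = {
    "total_revenue": "SUM(amount)",
    "order_count": "COUNT(*)",
}


def query_metric(conn, name, group_by=None):
    """Run one centrally defined metric, optionally grouped by a dimension."""
    expr = METRICS[name]  # raises KeyError for metrics nobody has defined
    if group_by:
        sql = (f"SELECT {group_by}, {expr} FROM orders "
               f"GROUP BY {group_by} ORDER BY {group_by}")
    else:
        sql = f"SELECT {expr} FROM orders"
    return conn.execute(sql).fetchall()


# Stand-in warehouse with a few made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("emea", 100.0), ("emea", 50.0), ("amer", 200.0)],
)

finance_view = query_metric(conn, "total_revenue")                   # company-wide total
sales_view = query_metric(conn, "total_revenue", group_by="region")  # same metric by region
```

Both views come from one definition, so the regional figures add up to the company-wide total by construction; nobody had to rewrite the aggregation to get a different cut.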

Instead of a development cycle that puts users and developers at odds, the Semantic Layer and notebook-based development create a virtuous cycle that encourages data literacy and enables self-service. These tools provide a streamlined data development process that allows data teams to work like software teams, accelerating the pace of new data products and freeing up time to focus on maintenance, governance, security, and other critical data systems.

Building data products with dbt

The data product mindset is a robust approach to accelerating data development. However, implementing it takes tooling and time. The dbt Semantic Layer and notebook tools make data product development more transparent and accessible, enabling collaboration and self-service.

Want to learn more about data product development? Check out this Coalesce presentation.

Last modified on: Oct 15, 2024
