
The Analytics Development Lifecycle: Discover and Analyze

Data isn’t any good if your stakeholders can’t find it. Splunk estimates that as much as 55% of a company’s data might be dark data, i.e., data that lies dormant.

In this series, we've been looking at the different stages of the Analytics Development Lifecycle (ADLC), a framework for a mature analytics workflow modeled after similar processes in the software development world.

The final phase, Discover and Analyze, is the one that delivers business value to data consumers. It's also the phase we've found to be the most immature at many companies today.

Without a solid Discover and Analyze phase, the hard work you've put into developing high-quality data products may go to waste. This phase is also critical for providing and processing feedback, which kicks off another run of the ADLC and fosters a culture of continuous improvement in data.

We'll look at the key attributes of a successful Discover and Analyze phase, and at how to feed the input gathered in this phase back into the start of a new loop of the ADLC.

The practices in the Discover and Analyze phase

The ADLC unites data development and operations into a single development cycle that emphasizes shipping small analytics code changes with high quality. It breaks down the artificial barriers that have grown between data product development and data management by creating a single process in which all data stakeholders work together to plan, develop, test, deploy, use, and manage data changes.

A successful Discover and Analyze phase enables two key things:

  • Discovering data sets, dashboards, and metrics; and
  • Using these data artifacts to answer questions

These answered questions can then themselves serve as a basis for another round of the ADLC. This is what makes the ADLC a loop: analysts create insights in this phase that data engineers can then productize and make available to a wider audience. The goal is to encourage experimentation and exploration while also promoting maturity and productization of the overall data system.

The key practices in the Discover and Analyze phase include:

  • Discovering and operating on data
  • Leaving feedback
  • Requesting and delegating access
  • Ensuring data is accurate
  • Ignoring implementation details

Let’s look at each of these in detail.

Discovering and operating on data

Data discovery is challenging within a sprawling enterprise. The data that users need is often split across hundreds or thousands of sources. This can make it hard, or even impossible, to locate unless users know where to look.

Lack of discoverability is one of the causes of dark data. Dark data costs money, not only in lost business opportunities, but also in the compute, storage, and personnel spend required to transform and maintain it.

After publishing a data set, your stakeholders need a way to find it. A common solution is some form of data registry or repository that can catalog data from all sources across the company.

Tools such as dbt Explorer, part of dbt Cloud, provide this out of the box. Once a data model is published to production, stakeholders can search for it via a simple unified interface. They can also find any documentation that accompanies the model.
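For example, model documentation in dbt typically lives in a YAML properties file next to the model code, where tools like dbt Explorer can surface it. Here's a minimal sketch; the file path, model, and column names are hypothetical:

```yaml
# models/marts/schema.yml -- hypothetical path and names
version: 2

models:
  - name: fct_quarterly_sales
    description: >
      One row per order line, aggregated for quarterly sales reporting.
      Owned by the analytics engineering team.
    columns:
      - name: order_id
        description: Unique identifier for the order.
      - name: net_revenue
        description: Revenue after discounts and refunds, in USD.
```

Because these descriptions live in version control alongside the model itself, they stay in sync with the data set and appear automatically when stakeholders search for it.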

Leaving feedback

As mentioned above, the ADLC is a loop. That means you need mechanisms for decision-makers to provide feedback to data engineers and analysts.

These feedback mechanisms can take multiple forms. For example, you may hold regular brown bag sessions or one-on-one briefings to gather feedback directly. You might also provide internal support forums or access to a ticketing system where data stakeholders can log feature requests.

A common problem at this stage is that most such requests go through a central data engineering team, which quickly becomes a chokepoint for data issues.

Tools for data discovery and documenting data sets can help with this by enabling stakeholders to self-serve answers to specific questions. Additionally, the ADLC emphasizes that roles aren't static, but flexible. Using modern data modeling tools that leverage widely understood technology such as SQL, different people may wear the data engineer, analytics engineer, and decision-maker hats at different times. This means that a wider range of people than ever before can develop and update analytics code.

When developing feedback mechanisms, keep in mind this variability and create processes that allow all applicable stakeholders to capture, track, and act on feedback. Facets of a successful feedback mechanism include:

  • Ensuring requests are routed to the data product owner(s)
  • Setting SLAs around review and action for requests
  • Tracking SLAs to ensure stakeholder feedback is being considered and incorporated into future releases

Requesting and delegating access

To be fair, one reason companies don’t make data more generally available is that not all data should be generally available:

  • Some internal data may contain Personally Identifiable Information (PII) that should only be viewable by a subset of employees under strict protocols
  • Other data may contain Intellectual Property (IP) or other sensitive internal information
  • Many companies will also have to limit exposure of customer data to comply with local data handling laws, such as the General Data Protection Regulation (GDPR) in the European Union

Along with making data discoverable, you need a system for requesting and delegating access. By default, users should only see relevant metadata when conducting a data search. You should also establish mechanisms for requesting and either approving or rejecting access to data.

Role-Based Access Control (RBAC) supports granting access to data based on a user’s business function. Tools such as dbt Cloud support granting access to models using RBAC, simplifying permissions management by assigning permission sets to business functions rather than to individual users.
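At the model level, dbt also supports groups and access modifiers that control which models can be referenced from elsewhere in the project. The sketch below is separate from dbt Cloud's UI-managed RBAC, and the group, owner, and model names are hypothetical:

```yaml
# models/marts/finance/schema.yml -- hypothetical group and model names
version: 2

groups:
  - name: finance
    owner:
      name: Finance Data Team
      email: finance-data@example.com

models:
  - name: fct_payroll
    group: finance
    access: private   # only models in the finance group may ref() this model
  - name: fct_revenue
    group: finance
    access: public    # referenceable and discoverable across the project
```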

Ensuring data is accurate

Just because somebody can find data doesn't mean they can trust it. Data consumers need proof that a data set is both timely and accurate.

Data engineers and analysts can help provide this reassurance to stakeholders through mechanisms such as defining data quality metrics. Tools such as the dbt Semantic Layer can publish these statistics as standardized metrics so they're centrally available to everyone.
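One lightweight, code-level way to back up those quality claims is to define tests alongside the model, so every run verifies the assumptions consumers rely on. A small sketch with hypothetical model and column names:

```yaml
# models/marts/schema.yml -- hypothetical model and column names
version: 2

models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique       # no duplicate orders
          - not_null     # every row has an identifier
      - name: order_status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
```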

Additionally, tools such as dbt's automatically generated column-level data lineage can show the origins of the data in a data set. This gives data stakeholders increased confidence in the data's relevance and provenance.

Ignoring implementation details

Finally, many data products can be hard for data consumers to use without deeper knowledge of what the data is, where it lives, and its various idiosyncrasies.

A mature analytics process should avoid this by delivering data products that just work. An analyst, for example, shouldn't have to understand the structure of a dozen or more distributed tables to grab data for a quarterly sales report. All of the relevant data and data structures should be documented in a data model and easily accessible to anyone with a basic knowledge of SQL.
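For illustration, once such a model exists, the quarterly sales pull becomes a single query against one documented mart rather than a join across a dozen source tables. The schema, model, and column names below are hypothetical:

```sql
-- Hypothetical quarterly sales summary from a single documented mart
select
    date_trunc('quarter', order_date) as sales_quarter,
    region,
    count(distinct order_id)          as order_count,
    sum(net_revenue)                  as total_net_revenue
from analytics.fct_quarterly_sales
where order_date >= date '2024-01-01'
group by 1, 2
order by 1, 2
```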

How to implement the ADLC

Over time, the ADLC becomes a repeatable process that your teams can use to ship well-scoped data changes with high velocity and high quality. Getting there, however, requires more than just having good processes in place. It requires a data platform that simplifies working with data.

dbt Cloud is your control plane for data that makes supporting the ADLC easy. Its platform features support data developers and their stakeholders across various stages of the analytics development lifecycle to make data analytics a team sport. It also provides the trust signals and observability features required to ensure all data outputs are accurate, governed, and trustworthy.

Want to learn more? Contact us for a demo today.
