The Analytics Development Lifecycle: Plan
Dec 21, 2024
Are all your data assets versioned, tested, and easy to support and maintain? For most companies that work with data, the answer is “no.”
We have a plethora of tools for managing data. What’s missing is an analytics practice: not just a set of tools, but a workflow for delivering well-governed data projects that accelerates data delivery, improves data quality, and optimizes compute costs.
dbt Labs advocates an approach we call the Analytics Development Lifecycle (ADLC). In this article, we’ll look at the first stage of the ADLC, Plan, examining what exactly it entails in an analytics workflow and how doing it well helps you deliver better data, faster.
The key to planning in the ADLC
The ADLC works similarly to the DevOps process in software engineering. It’s a rapid and highly iterative process you repeat for each change you make to your data system.
In other words, planning in the ADLC isn’t a “once and done” activity. It’s also not an endless process that mires you in requirements hell, preventing progress. Rather, it’s a succinct phase you use to ensure what you’re delivering conforms to what your data stakeholders need. This prevents expensive rework down the line.
The length of the planning phase varies with the size of the change. But every planning phase, no matter its length, should deliver these key benefits:
- Gets everyone on the same page by identifying all major stakeholders (engineers, analysts, business users, decision-makers) up front and aligning everyone’s understanding of the business case and the project’s key metrics
- Identifies tooling needs early
- Estimates resources more accurately
- Ensures security by making discussions of data visibility, access rights, and compliance part of the process from the beginning, instead of tacking them on as an afterthought
Done well, ADLC planning ensures a high-quality and secure data product delivered with minimal rework.
Stages of the planning phase
There is no one-size-fits-all approach to the Plan phase. Every team needs to implement a version of the process that meets its business needs and meshes with its organizational culture.
That said, the following stages provide a good jumping-off point from which most teams can build their own process:
- Create and validate the business case
- Create your implementation plan
- Get stakeholder feedback
- Create a test plan
- Anticipate downstream impacts
- Plan for maintenance
- Determine access levels
- Implement larger changes in small pieces
Let’s look at each one of these stages in detail.
Create and validate the business case
In the past, many data changes were driven not by the business but by engineering. As a result, most new data deliverables were measured mostly in technical terms (e.g., data throughput).
It’s hard to excite business stakeholders with technical metrics. This approach also divorces data projects from the company’s larger business objectives. That risks shipping data projects that no one ever uses.
To avoid this, base every data change on a business case: what the change does, who it’s for, and the quantifiable benefit you expect to see. Changes should further tie back to business Key Performance Indicators (KPIs) or Objectives and Key Results (OKRs). For example, instead of touting improved database performance, show how the change helped the support team reduce average resolution time by 25%, improving customer satisfaction scores.
Not every minor change needs to go through this process. Your team should define a threshold of work above which a change needs a solid business case before proceeding.
Create your implementation plan
Once everyone’s agreed on the business case and the need for the work, identify how you’ll put your proposal into action. This includes where you’ll obtain your data, the inputs and outputs of the data product, and what code and architectural assets you’ll need to build it.
A key part of implementation planning is identifying what you need to build versus what you can reuse. Wherever possible, adhere to the DRY (Don’t Repeat Yourself) principle: leverage existing code, and make your own work available to others who might need it, e.g., by using packages to promote reuse.
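In a dbt project, for instance, package management is how you pull in shared code rather than copying it. Here’s a minimal sketch of a `packages.yml`; the version range is illustrative, and the internal Git package is a hypothetical example of publishing your own reusable code:

```yaml
# packages.yml: declare reusable packages instead of copying code
packages:
  # Community package of common macros (surrogate keys, pivots, etc.)
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]

  # Hypothetical internal package your team publishes for shared logic
  - git: "https://github.com/your-org/dbt-internal-utils.git"
    revision: "1.2.0"
```

Running `dbt deps` installs everything declared here, so downstream teams reuse one vetted implementation instead of maintaining private copies.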
Get stakeholder feedback
Once you have an implementation plan, run it by your stakeholders for final approval. Use whatever format works for your team (Slack, email, a recorded meeting, a ticketing system) to capture approval and address outstanding issues.
To prevent the planning phase from dragging on, set deadlines for accepting feedback. Once all feedback is in, make any changes and proceed to another round of sign-offs, if necessary.
Create a test plan
Testing is a critical part of any data change. It verifies that your data transformations produce the outputs you expect for a given set of inputs.
Good data tests confirm that your transformation code works on expected inputs and fails gracefully on edge cases or unexpected inputs (e.g., null values for required fields, malformed text strings).
Good testing also includes running tests against pre-production (historical or mock) data so you can validate your code’s behavior before making it live for users. Identify any data sets you need for testing environments as part of your planning process.
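If you work in dbt, much of a test plan like this can be captured declaratively in a schema file. Below is a minimal sketch, assuming a hypothetical `fct_orders` model; `not_null`, `unique`, and `accepted_values` are generic tests built into dbt:

```yaml
# models/schema.yml: declarative tests for expected and edge-case inputs
version: 2

models:
  - name: fct_orders            # hypothetical model
    columns:
      - name: order_id
        tests:
          - not_null            # required field: nulls should fail loudly
          - unique
      - name: status
        tests:
          - accepted_values:    # guard against unexpected values
              values: ['placed', 'shipped', 'completed', 'returned']
```

Running `dbt test` in a pre-production environment exercises these checks against historical or mock data before anything goes live.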
Anticipate downstream impacts
A common problem in the data world is breaking changes that affect data consumers you never even knew existed.
One day, your data engineering team changes the format of a text field or eliminates a column from a table. The next day, a critical forecasting report on which the sales team depends fails to refresh hours before an all-hands meeting.
If you’re making changes to existing models, perform an impact analysis of your changes before you exit the planning phase. This involves using data lineage to discover what other downstream data products, reports, and applications depend on your work.
Once you’ve identified your consumers, notify them of the intended change so you can work together on a migration plan. This may involve asking them to update their reports after release or versioning your data models to give consumers time to transition.
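If you use dbt, model versions (available since dbt 1.5) are one way to provide that transition window. A hedged sketch, assuming a hypothetical `dim_customers` model that drops a `country_name` column in v2; consumers can keep querying v1 until an agreed cutover date:

```yaml
# models/schema.yml: keep v1 available while consumers migrate to v2
models:
  - name: dim_customers          # hypothetical model
    latest_version: 2
    config:
      contract: {enforced: true} # recommended for versioned models
    columns:
      - name: customer_id
        data_type: int
      - name: country_name       # the column being removed
        data_type: varchar
    versions:
      - v: 2
        columns:
          - include: all
            exclude: [country_name]  # breaking change lands here only
      - v: 1                         # unchanged; deprecate later
```

Before deciding whether versioning is warranted, a command like `dbt ls --select dim_customers+` lists every downstream node that depends on the model.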
Plan for maintenance
In the past, most data transformations involved exporting data to a CSV file and wrangling it in a spreadsheet. Much of this work was disposable—no one cared if a spreadsheet formula broke a month or two after delivery.
A high-quality, reliable data system is different. Any change you make to data is a commitment to future users.
Before you exit planning, identify what tests, metrics, and alerts you’ll use to monitor your code’s behavior in production. Identify who will own the data transformation code going forward. If it isn’t you, work with the eventual maintainers to ensure they understand the implementation.
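Source freshness checks are one concrete monitoring hook you can commit to in dbt. A minimal sketch, assuming a hypothetical `ecommerce` source whose tables carry a `_loaded_at` timestamp:

```yaml
# models/sources.yml: alert when upstream data stops arriving
version: 2

sources:
  - name: ecommerce              # hypothetical source
    database: raw
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}   # stale: investigate
      error_after: {count: 24, period: hour}  # broken: alert the owner
    tables:
      - name: orders
```

Scheduling `dbt source freshness` in production gives the eventual maintainers an early warning before stakeholders notice a stale report.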
Determine access levels
According to IBM, the average data breach costs a company $4.45 million. Such threats can come from inside the company just as easily as from outside. You need to incorporate security into every release, even if it’s “just” an internal project.
Think early about your data and what you need to do to keep it safe. Which groups or individuals need access? Which should be restricted? (E.g., should vendors have access?) Are you handling sensitive information, such as customers’ Personally Identifiable Information (PII) or company secrets, that requires additional scrutiny and governance?
After identifying your data’s security needs, decide how you’ll enforce them. Approaches you might use include the following (a configuration sketch follows the list):
- Using project permissions and role-based access control (RBAC) to grant or deny access automatically
- Establishing a process to administer access requests for sensitive data
- Applying data classifications to your tables and columns so that you can identify and remove customer data as needed per regulations such as GDPR
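In dbt, the first and third of these can be expressed directly in model configuration. A hedged sketch, assuming warehouse roles named `analyst` and `bi_service`, and a team convention of `meta` keys for classification (the keys themselves are not built into dbt):

```yaml
# models/schema.yml: enforce access and record data classifications
models:
  - name: dim_customers                    # hypothetical model
    config:
      grants:
        select: ['analyst', 'bi_service']  # assumed warehouse roles
    meta:
      classification: confidential        # team convention, not a dbt built-in
      contains_pii: true
    columns:
      - name: email
        meta:
          pii_type: contact_info          # helps locate data for GDPR requests
```

With `grants`, dbt issues the corresponding GRANT statements each time the model is built, so access control travels with the code instead of living in ad hoc warehouse scripts.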
Implement larger changes in small pieces
Finally, if you find your change becoming too large and unwieldy, consider breaking it up into multiple releases. Don’t try to boil the ocean with large changes. Instead, leverage the iterative nature of the ADLC to break it down into smaller, well-tested components.
For example, you may plan a complex change with six different models. Instead of releasing all six simultaneously, develop one or two (along with their tests) and put them through a full develop/test/debug/release cycle. Then, move on to the next model, repeating until you’ve implemented the full business case.
Tackling large changes in multiple releases keeps you from getting bogged down in technical issues. It also lets you get faster and more frequent feedback from your stakeholders.
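Tags plus dbt’s node selection syntax make this kind of slicing straightforward. A minimal sketch, assuming two hypothetical models tagged as the first slice of the rollout:

```yaml
# models/schema.yml: tag the first slice of a six-model rollout
models:
  - name: stg_payments         # hypothetical models
    config:
      tags: ['release_1']
  - name: fct_payments
    config:
      tags: ['release_1']
```

`dbt build --select tag:release_1` runs, tests, and materializes just that slice, letting each batch complete a full develop/test/debug/release cycle on its own.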
Conclusion
A good workflow is just one part of an analytics practice—you also need powerful tools to enable it. dbt Cloud is your data control plane, delivering multiple tools that make the ADLC planning process easy to implement:
- Built-in support for defining data transformation models, tests, and shared metrics via the dbt Semantic Layer
- dbt Explorer for finding and leveraging existing data and data transformation code
- Data lineage to see all downstream dependencies, simplifying impact analysis
Learn more about how dbt Cloud can transform how you do data—ask us for a demo today.
Last modified on: Jan 13, 2025