
The Analytics Development Lifecycle: Develop

In the past, data teams and stakeholders didn’t pay much attention to the quality or reusability of analytics code. These days, more teams are realizing how properly written, reviewed, and standardized analytics code contributes to more frequent and higher-quality releases.

Patterned after the Software Development Lifecycle (SDLC), the Analytics Development Lifecycle (ADLC) provides a model for implementing changes to your data system. In our previous installments in this series, we looked at how the ADLC impacts planning. In this article, we’ll see how you can change the way you author analytics code to improve reliability and increase data product velocity.

Stages of the Develop phase

The ADLC provides a framework for a mature analytics workflow. It aims to solve issues with low data code velocity, inaccurate results, and impaired trust that continue to plague too many analytics projects.

Data stakeholders use the Analytics Development Lifecycle to develop and ship small changes to analytics code in a short timeframe, repeating the full process for every significant change to the system. When done well, the ADLC yields better collaboration, scale, velocity, correctness, and governance.

The SDLC is built around a DevOps approach that treats developing and managing software applications as part of the same unified process. Similarly, the ADLC is built around a DataOps model that coordinates the efforts of analytics code developers with the data engineers, analytics engineers, and business analysts who manage and use analytics data.

[Image: The ADLC loop]

The Develop phase is where engineers - dedicated data engineers, analytics engineers, or even business stakeholders with technical chops - turn analytics use cases into deployable data products. It’s a critical phase, as it has an outsized impact on the overall quality of the final solution.

An effective Develop phase in the ADLC consists of the following components:

  • Code first
  • Adhere to a style guide
  • Prioritize functionality over performance
  • Invest in code quality
  • Perform code reviews
  • Use standards to avoid lock-in

Let’s look at each of these in detail.

Code first

In the past, analytics data changes existed in a mix of Excel macros, SQL scripts, programmatic code, stored procedures, and visual tools. Most of these existed only on engineers’ laptops and were run by hand, with engineers performing tweaks and corrections as needed.

These processes weren’t repeatable or discoverable. They often didn’t result in high-quality or fast releases, as the knowledge needed to run them resided in someone’s head.

In the ADLC, all business logic impacting data should be captured in code. All code should be:

  • Editable by multiple people with different tools - programmatic text editors, Integrated Development Environments (IDEs), etc.
  • Checked into version control systems to enable collaboration and prevent conflicts
  • Broken down into composable units for reuse
  • Deployable via an automated process, a.k.a. CI/CD

This code first approach ensures that all analytics code changes are:

  • Discoverable: Others can find and modify the code as needed
  • Traceable: Data stakeholders can see when a change was made and who made it - and revert it if necessary
  • Reusable: Other data stakeholders can find and reuse general-purpose solutions in their projects
  • Repeatable: The same process can be used to develop and ship any analytics code changes
  • Tool agnostic: Analytics code developers can use any development tools that fit their workflow

A code first strategy takes time and effort to create, configure, and deploy across an organization. Using a platform like dbt Cloud, which is built from the ground up on a code first philosophy, provides the necessary tooling and infrastructure out of the box, shortening the time required to transition to a mature approach to developing analytics code.
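To make this concrete, here’s a rough sketch of what business logic captured as code can look like in a dbt project (the file, model, and column names are hypothetical):

```sql
-- models/finance/fct_orders.sql (hypothetical model)
-- The revenue definition lives in version-controlled code,
-- not in a spreadsheet or an ad hoc script on someone's laptop.

with orders as (

    -- ref() builds on another version-controlled model
    -- instead of a hard-coded table name
    select * from {{ ref('stg_orders') }}

),

final as (

    select
        order_id,
        customer_id,
        order_date,
        gross_amount - discount_amount as net_revenue
    from orders
    where status = 'completed'

)

select * from final
```

Because the logic lives in a file like this, anyone on the team can find it, review it, and reuse it in downstream models via ref().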

Adhere to a style guide

Putting all code in common version control repositories makes it easier to find and maintain. However, code can be hard to read and maintain if everyone’s using different coding conventions.

A style guide provides consistency in code formatting and conventions (names of variables, use of whitespace, etc.) across everyone who touches analytics code. That makes code easier to read - which makes it easier for those who didn’t write it to pick up and maintain.

You can also enforce standardization automatically via mechanisms such as linting code - e.g., running sql-lint on all SQL code. Tools like the dbt Cloud IDE can assist standardization with features such as syntax highlighting for SQL, code formatting, and linting.
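For example, a style guide might standardize on lowercase keywords, snake_case names, and one column per line. Here’s a hypothetical before-and-after (the conventions shown are just one possible choice, not a prescribed standard):

```sql
-- Before: valid SQL, but formatted however the author happened to type it
SELECT customer_id, UPPER(region) AS Region, created_at FROM {{ ref('stg_customers') }} WHERE is_active=true
```

```sql
-- After: the same query, rewritten to the style guide
select
    customer_id,
    upper(region) as region,
    created_at
from {{ ref('stg_customers') }}
where is_active = true
```

A linter wired into the workflow can flag the first form automatically, so the guide is enforced by tooling rather than by memory.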

Prioritize functionality over performance

Premature optimization, said Sir Tony Hoare, is the root of all programming evil. In the initial stages of coding, engineers should focus on implementing their business use case versus fine-tuning for optimal performance.

This doesn’t mean engineers should ignore performance entirely. The overall design of a data solution has the largest impact on how quickly it runs, so it deserves attention early.

It means, instead, not chasing small efficiencies at the project’s outset. That time is better spent iterating with stakeholders to ensure the solution fits their requirements. The full quote is more specific: “We should forget about small efficiencies, say about 97% of the time.”

Encourage engineers to focus on the requirements first. Then, at the tail end of the project, they can implement the final tweaks needed to get the most out of the system at scale.

Invest in code quality

Quality analytics code requires a process that builds quality into the entire development lifecycle. Part of that is keeping code clean and maintainable.

A few key practices here include:

  • Write DRY code. DRY, or Don’t Repeat Yourself, is the principle of factoring out common code to reusable modules. This prevents you from duplicating code unnecessarily, which can inject defects. It also enables teams to work more quickly, as they can use tried-and-tested procedures for common operations rather than develop new code from scratch.
  • Define common metrics. In analytics, creating common metrics is an indirect form of reusability where you centrally define critical business metrics, such as revenue, that others can leverage in their own solutions. Tools like the dbt Semantic Layer simplify defining and deploying centralized metrics.
  • Write documentation and in-line comments. Use a tool like dbt Cloud that supports writing documentation in code to document method parameters, field definitions, assumptions, and other important aspects of your analytics implementation.

Any authoring platform you adopt for implementing the ADLC should have a mechanism for defining and sharing reusable code. For example, dbt Cloud supports defining packages - standalone dbt projects with models, macros, and dependencies that other teams can reference from their own dbt projects.
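For example, the DRY principle often shows up in dbt as a macro: a calculation factored into one place that any model - or any project that installs the package - can call. This is a minimal sketch with hypothetical names:

```sql
-- macros/cents_to_dollars.sql (hypothetical macro)
-- Centralizes a conversion so every model applies it the same way.
{% macro cents_to_dollars(column_name, decimal_places=2) %}
    round({{ column_name }} / 100.0, {{ decimal_places }})
{% endmacro %}
```

Any model can then reuse the macro instead of re-implementing the logic:

```sql
-- models/finance/fct_payments.sql (hypothetical model)
select
    payment_id,
    {{ cents_to_dollars('amount_cents') }} as amount_usd
from {{ ref('stg_payments') }}
```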

Perform code reviews

Code reviews ensure that every proposed analytics code change is seen by a second set of eyes. They have been shown to provide multiple benefits:

  • Fewer defects in shipped code. In one study cited in the classic software engineering book Code Complete, introducing code reviews reduced errors in one-line maintenance changes from 55 percent to 2 percent. Others have seen error reductions of up to 80 percent.
  • Accountability for enforcing practices such as style guide compliance, testing, documentation, and writing DRY code.
  • Greater shared team knowledge of data product solutions and their underlying code.

Code reviews are easy to set up once you have a version control system in place. Engineers create a branch in the source control system in which they make their changes. When they’re ready to ship, they create a pull request to merge their changes into the main branch. Potential reviewers are notified that a change requiring review is pending.

Some additional best practices for code review include:

  • Size your pull requests to represent a single unit of work.
  • Set up automated testing in your version control system and require that tests pass before a pull request can be approved.
  • Use a common pull request template across all teams and engineers (we shared ours here).

A code review is a critical quality gate in a CI/CD deployment system. It ensures that a change has been fully vetted and tested before it’s made available to data stakeholders.
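As one illustration of the kind of automated check a pull request can be gated on, a dbt singular test is simply a SQL file that selects rows violating an assumption; if the query returns any rows, the test fails and CI can block the merge. The file, model, and column names below are hypothetical:

```sql
-- tests/assert_net_revenue_is_non_negative.sql (hypothetical singular test)
-- Any rows returned by this query are treated as failures.
select
    order_id,
    net_revenue
from {{ ref('fct_orders') }}
where net_revenue < 0
```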

Conclusion

Creating data that stakeholders can trust requires a process that builds in quality at every stage. By standardizing the way your team develops analytics code, you can ship smaller, higher-quality changes more quickly than you could with a manual, ad hoc analytics process.

Good code, however, isn’t the only thing you need. In the next installment of our series, we’ll look at how to use testing in the ADLC to validate quality prior to shipping your changes to stakeholders.

Last modified on: Jan 13, 2025
