
The Analytics Development Lifecycle: Develop

In the past, data teams and stakeholders didn’t pay much attention to the quality or reusability of analytics code. These days, more teams are realizing how properly written, reviewed, and standardized analytics code contributes to more frequent and higher-quality releases.

Patterned after the Software Development Lifecycle (SDLC), the Analytics Development Lifecycle (ADLC) provides a model for implementing changes to your data system. In our previous installments in this series, we looked at how the ADLC impacts planning. In this article, we’ll see how you can change the way you author analytics code to improve reliability and increase data product velocity.

Stages of the Develop phase

The ADLC provides a framework for a mature analytics workflow. It aims to solve issues with low data code velocity, inaccurate results, and impaired trust that continue to plague too many analytics projects.

Data stakeholders use the Analytics Development Lifecycle to develop and ship small changes to analytics code in a short timeframe, repeating the full process for every significant change to the system. When done well, the ADLC yields better collaboration, scale, velocity, correctness, and governance.

The SDLC is built around a DevOps approach that treats developing and managing software applications as part of the same unified process. Similarly, the ADLC is built around a DataOps model that coordinates the efforts of analytics code developers with the data engineers, analytics engineers, and business analysts who manage and use analytics data.

[Image: The ADLC loop]

The Develop phase is where engineers - dedicated data engineers, analytics engineers, or even business stakeholders with technical chops - turn analytics use cases into deployable data products. It’s a critical phase, as it has an outsized impact on the overall quality of the final solution.

An effective Develop phase in the ADLC consists of the following components:

  • Code first
  • Adhere to a style guide
  • Prioritize functionality over performance
  • Invest in code quality
  • Perform code reviews
  • Use standards to avoid lock-in

Let’s look at each of these in detail.

Code first

In the past, analytics data changes existed in a mix of Excel macros, SQL scripts, programmatic code, stored procedures, and visual tools. Most of these existed only on engineers’ laptops and were run by hand, with engineers performing tweaks and corrections as needed.

These processes weren’t repeatable or discoverable. They often didn’t result in high-quality or fast releases, as the knowledge needed to run them resided in someone’s head.

In the ADLC, all business logic impacting data should be captured in code. All code should be:

  • Editable by multiple people with different tools - programmatic text editors, Integrated Development Environments (IDEs), etc.
  • Checked into version control systems to enable collaboration and prevent conflicts
  • Broken down into composable units for reuse
  • Deployable via an automated process, a.k.a. CI/CD

This code first approach ensures that all analytics code changes are:

  • Discoverable: Others can find and modify the code as needed
  • Traceable: Data stakeholders can see when a change was made and who made it - and revert it if necessary
  • Reusable: Other data stakeholders can find and reuse general-purpose solutions in their projects
  • Repeatable: The same process can be used to develop and ship any analytics code changes
  • Tool agnostic: Analytics code developers can use any development tools that fit their workflow

A code first strategy takes time and effort to create, configure, and deploy across an organization. Using a platform like dbt Cloud, which is built from the ground up on a code first philosophy, provides the necessary tooling and infrastructure out of the box, shortening the time required to transition to a mature approach to developing analytics code.
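To make this concrete, here’s a rough sketch of what business logic captured as code can look like in a dbt project (the file, model, and column names are hypothetical):

```sql
-- models/finance/fct_orders.sql (hypothetical model)
-- The revenue definition lives in version-controlled code,
-- not in a spreadsheet or an ad hoc script on someone's laptop.

with orders as (

    -- ref() builds on another version-controlled model
    -- instead of a hard-coded table name
    select * from {{ ref('stg_orders') }}

),

final as (

    select
        order_id,
        customer_id,
        order_date,
        gross_amount - discount_amount as net_revenue
    from orders
    where status = 'completed'

)

select * from final
```

Because the logic lives in a file like this, anyone on the team can find it, review it, and reuse it in downstream models via ref().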

Adhere to a style guide

Putting all code in common version control repositories makes it easier to find and maintain. However, code can be hard to read and maintain if everyone’s using different coding conventions.

A style guide provides consistency in code formatting and conventions (names of variables, use of whitespace, etc.) across everyone who touches analytics code. That makes code easier to read - which makes it easier for those who didn’t write it to pick up and maintain.

You can also enforce standardization automatically via mechanisms such as linting code - e.g., running sql-lint on all SQL code. Tools like the dbt Cloud IDE can assist standardization with features such as syntax highlighting for SQL, code formatting, and linting.
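For example, a style guide might standardize on lowercase keywords, snake_case names, and one column per line. Here’s a hypothetical before-and-after (the conventions shown are just one possible choice, not a prescribed standard):

```sql
-- Before: valid SQL, but formatted however the author happened to type it
SELECT customer_id, UPPER(region) AS Region, created_at FROM {{ ref('stg_customers') }} WHERE is_active=true
```

```sql
-- After: the same query, rewritten to the style guide
select
    customer_id,
    upper(region) as region,
    created_at
from {{ ref('stg_customers') }}
where is_active = true
```

A linter wired into the workflow can flag the first form automatically, so the guide is enforced by tooling rather than by memory.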

Prioritize functionality over performance

Premature optimization, said Sir Tony Hoare, is the root of all programming evil. In the initial stages of coding, engineers should focus on implementing their business use case versus fine-tuning for optimal performance.

This doesn’t mean engineers should ignore performance entirely. The overall design of a data solution has the largest impact on how quickly it runs, so it deserves attention early.

It means, instead, not chasing small efficiencies at the project’s outset. That time is better spent iterating with stakeholders to ensure the solution fits their requirements. The full quote is more specific: “We should forget about small efficiencies, say about 97% of the time.”

Encourage engineers to focus on the requirements first. Then, at the tail end of the project, they can implement the final tweaks needed to get the most out of the system at scale.

Invest in code quality

Quality analytics code requires a process that builds quality into the entire development lifecycle. Part of that is keeping code clean and maintainable.

A few key practices here include:

  • Write DRY code. DRY, or Don’t Repeat Yourself, is the principle of factoring out common code to reusable modules. This prevents you from duplicating code unnecessarily, which can inject defects. It also enables teams to work more quickly, as they can use tried-and-tested procedures for common operations rather than develop new code from scratch.
  • Define common metrics. In analytics, creating common metrics is an indirect form of reusability where you centrally define critical business metrics, such as revenue, that others can leverage in their own solutions. Tools like the dbt Semantic Layer simplify defining and deploying centralized metrics.
  • Write documentation and in-line comments. Use a tool like dbt Cloud that supports writing documentation in code to document method parameters, field definitions, assumptions, and other important aspects of your analytics implementation.

Any authoring platform you adopt for implementing the ADLC should have a mechanism for defining and sharing reusable code. For example, dbt Cloud supports defining packages - standalone dbt projects with models, macros, and dependencies that other teams can reference from their own dbt projects.
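For example, the DRY principle often shows up in dbt as a macro: a calculation factored into one place that any model - or any project that installs the package - can call. This is a minimal sketch with hypothetical names:

```sql
-- macros/cents_to_dollars.sql (hypothetical macro)
-- Centralizes a conversion so every model applies it the same way.
{% macro cents_to_dollars(column_name, decimal_places=2) %}
    round({{ column_name }} / 100.0, {{ decimal_places }})
{% endmacro %}
```

Any model can then reuse the macro instead of re-implementing the logic:

```sql
-- models/finance/fct_payments.sql (hypothetical model)
select
    payment_id,
    {{ cents_to_dollars('amount_cents') }} as amount_usd
from {{ ref('stg_payments') }}
```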

Perform code reviews

Code reviews ensure that every proposed analytics code change is seen by a second set of eyes. They have been shown to provide multiple benefits:

  • Fewer defects in shipped code. In one study cited in the classic software engineering book Code Complete, introducing code reviews reduced errors in one-line maintenance changes from 55 percent to 2 percent. Others have seen error reductions of up to 80 percent.
  • Accountability for enforcing practices such as style guide compliance, testing, documentation, and writing DRY code.
  • Greater shared team knowledge of data product solutions and their underlying code.

Code reviews are easy to set up once you have a version control system in place. Engineers create a branch in the source control system in which they make their changes. When they’re ready to ship, they create a pull request to merge their changes into the main branch. Potential reviewers are notified that a change requiring review is pending.

Some additional best practices for code review include:

  • Size your pull requests to represent a single unit of work.
  • Set up automated testing in your version control system and require that tests pass before a pull request can be approved.
  • Use a common pull request template across all teams and engineers (we shared ours here).

A code review is a critical quality gate in a CI/CD deployment system. It ensures that a change has been fully vetted and tested before it’s made available to data stakeholders.
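As one illustration of the kind of automated check a pull request can be gated on, a dbt singular test is simply a SQL file that selects rows violating an assumption; if the query returns any rows, the test fails and CI can block the merge. The file, model, and column names below are hypothetical:

```sql
-- tests/assert_net_revenue_is_non_negative.sql (hypothetical singular test)
-- Any rows returned by this query are treated as failures.
select
    order_id,
    net_revenue
from {{ ref('fct_orders') }}
where net_revenue < 0
```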

Conclusion

Creating data that stakeholders can trust requires a process that builds in quality at every stage. By standardizing the way your team develops analytics code, you can ship smaller, higher-quality changes more quickly than you could with a manual, ad hoc analytics process.

Good code, however, isn’t the only thing you need. In the next installment of our series, we’ll look at how to use testing in the ADLC to validate quality prior to shipping your changes to stakeholders.

Last modified on: Jan 13, 2025
