The Analytics Development Lifecycle: Test
Dec 26, 2024
Data quality errors are your company’s worst enemy. At best, they undermine people’s trust in the data that drives the business. At worst, they can provide false information that leads to erroneous - and costly - business decisions.
The Analytics Development Lifecycle (ADLC) aims to create a mature analytics workflow that produces high-quality, frequently updated data with every iteration. A key part of delivering that quality is not just creating tests, but fostering a test-driven culture as part of the ADLC.
We’ll explore how testing fits into the ADLC, the types of tests you should be creating, and how to best manage your testing efforts for maximum positive impact.
Test in the ADLC
The ADLC is a variation of the Software Development Lifecycle (SDLC) that focuses on shipping new or revised data products. Like the SDLC, it breaks down artificial barriers between the different personas that deal with data, treating analytics development as a single, unified process.
In the ADLC, the different personas that handle data - the engineer, the analyst, and the decision-maker - work together to plan, develop, test, deploy, monitor, and use new data products. The process focuses on creating small, well-defined changes and shipping frequently.
We’ve covered how the Plan and Develop phases of the ADLC work. These phases help ensure quality by ensuring that:
- The work done accurately captures business requirements (Plan); and
- All data changes are captured in code, and that code is clean, readable, and reusable (Develop)
The Test phase creates assets that validate that your assumptions about your data and analytics code are correct before pushing a change to production. By testing your data, you can identify issues early in the development lifecycle, preventing expensive rework and downtime down the road.
A good Test phase involves:
- Writing tests for every data asset you own
- Running tests before changes are merged into production
- Continuously testing production data to detect anomalies
Types of tests in the ADLC
Let’s first look at the different types of data tests you’ll want to focus on writing:
- Unit tests
- Data tests
- Integration tests
Unit tests
Unit tests validate small functional portions of your data models and transformations to ensure correctness. They validate your logic on a small set of static inputs before running it on actual data. In data pipelines, this means validating your SQL modeling logic’s correctness.
dbt Cloud supports developing unit tests alongside your SQL models and running them on demand. You don’t need to create a test for every single transformation. However, you should always aim to create unit tests when you have:
- SQL with custom logic
- Reported defects (to verify the fix and prevent regressions)
- Edge cases
- High criticality models, such as organization data sets where a defect could have wide-scale negative impact
For example, this unit test in dbt checks that a routine for validating email addresses handles known edge cases, such as malformed addresses and invalid domain names:
unit_tests:
  - name: test_is_valid_email_address
    description: "Check my is_valid_email_address logic captures all known edge cases - emails without ., emails without @, and emails from invalid domains."
    model: dim_customers
    given:
      - input: ref('stg_customers')
        rows:
          - {email: cool@example.com, email_top_level_domain: example.com}
          - {email: cool@unknown.com, email_top_level_domain: unknown.com}
          - {email: badgmail.com, email_top_level_domain: gmail.com}
          - {email: missingdot@gmailcom, email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
    expect:
      rows:
        - {email: cool@example.com, is_valid_email_address: true}
        - {email: cool@unknown.com, is_valid_email_address: false}
        - {email: badgmail.com, is_valid_email_address: false}
        - {email: missingdot@gmailcom, is_valid_email_address: false}
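Once a unit test is defined, you can run it on its own - for example, dbt test --select test_type:unit (available in dbt 1.8 and later) runs all unit tests against their static fixtures without touching real data.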
Data tests
Data tests validate that data transformations are running correctly against the actual data. They verify that:
- The data is current
- The model is sound
- The transformed data is accurate
Data tests usually start by testing basic assumptions about unique and non-null fields (e.g., primary keys), accepted values, and relationships between data. Once you’ve nailed those basics, you can move on to more proactive tests that verify freshness and look for domain-specific problems. For example, if a customer can only have one active subscription to a service, you might verify that no records violate that constraint.
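For instance, here’s a minimal sketch of those basic checks expressed as generic tests in a model’s YAML file (the dim_subscriptions model and its columns are illustrative):

version: 2
models:
  - name: dim_subscriptions
    columns:
      - name: subscription_id
        tests:
          - unique      # every subscription appears exactly once
          - not_null    # primary key is always populated
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'paused', 'canceled']
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id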
As with unit tests, you can specify these tests using dbt Cloud and run them with the dbt test command.
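For the subscription rule above, one approach is a singular data test: a SQL file in your tests/ directory that selects any rows violating the rule, so the test fails if the query returns results (the fct_subscriptions model name is an assumption):

-- tests/assert_one_active_subscription_per_customer.sql
-- Fails if any customer has more than one active subscription
select
    customer_id,
    count(*) as active_subscriptions
from {{ ref('fct_subscriptions') }}
where status = 'active'
group by customer_id
having count(*) > 1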
Integration tests
Whereas unit tests validate one small unit of functionality, integration tests exercise the entire application or project. They ensure your solution works end to end, not merely in isolation.
In the software world, this might involve calling a REST API and ensuring that the endpoint, associated authentication procedures, underlying data stores, connected APIs, and so on all work together. In data work, you’ll most often use integration tests to validate packages: reusable units of analytics code that multiple projects leverage.
In dbt, you can keep unit, data, and integration tests separate by placing them in separate subdirectories. That enables running them at different points of the ADLC.
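For example, assuming singular data tests live in tests/data and integration checks in tests/integration, you could target each group with dbt’s path selector:

# Run only the singular data tests
dbt test --select path:tests/data

# Run only the integration tests
dbt test --select path:tests/integration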
When to run tests
Anyone who’s creating or updating analytics code - i.e., who’s wearing the engineer hat - is responsible for creating or updating the associated tests. The engineer should make sure to run unit and data tests on their local machine prior to check-in.
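In practice, that local check can be as simple as running dbt build, which builds models and runs their associated tests together in dependency order, or dbt run followed by dbt test.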
As discussed in the Develop phase, engineers should work in their own source control branches. When ready to push to production, they should open a Pull Request (PR). Another engineer should review their changes - yet another quality control measure - before approving the merge.
The PR should also automatically trigger a run of any tests associated with the change against non-production data in an isolated environment. If any tests fail, the PR should block the merge to production until the issue is resolved.
dbt Cloud supports running tests automatically against a staging schema when it detects that a PR has been opened or updated in your Git provider. You can see the run in either the dbt Cloud dashboard or directly on the PR page of your Git provider, along with any errors that resulted.
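If you also run checks in your Git provider’s own CI rather than relying solely on dbt Cloud, a minimal GitHub Actions sketch might look like this (the adapter, credentials setup, and artifact path are all assumptions):

name: dbt-ci
on: pull_request
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # The adapter is an assumption - install the one for your warehouse.
      # Warehouse credentials would come from repository secrets.
      - run: pip install dbt-core dbt-snowflake
      - run: dbt deps
      # "Slim CI": build and test only modified models and their children,
      # deferring unchanged upstream models to production artifacts.
      - run: dbt build --select state:modified+ --defer --state prod-artifacts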
Tips for managing testing
Here are a few more tips to get the most out of data testing:
Develop a culture of testing. It’s easy to throw testing by the wayside because you’re busy and you just want to get something out the door. As our CEO Tristan Handy has written, “The desire to skip writing good tests and move on to the next task is always present and must be balanced via accountability mechanisms like code reviews, linting, and test coverage metrics.”
Get everyone on board with testing as a matter of habit. Set a bar where testing is required for a change and enforce it during PR reviews so that team members hold each other accountable.
Keep the scope of work small. This is a central tenet of the ADLC, and it’s critical in testing: the larger a change, the harder it is to verify its functional correctness. Conduct training on properly scoping PRs so that every submitted change contains enough new logic to be useful, but not so much that you can’t verify its accuracy.
Determine your level of test coverage. Decide how much of your analytics code should require testing. In the software field, most teams aim for around 70-80% test coverage. You may need less depending on the complexity of your code.
Once you have a metric for test coverage, monitor it over time to ensure you’re hitting your goal. The Recommendations page in the dbt Cloud dashboard shows your overall test coverage as the percentage of your models that have tests defined.
Fix or retire “flaky tests.” A flaky test is one that fails intermittently, usually due to some network or environmental condition, or just poorly written logic. Ignoring flaky tests is dangerous because it can foster “alert fatigue,” leading people to tune out and ignore real errors. Either identify the cause of a flaky test and fix it or remove it from your test suite altogether.
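While you investigate, dbt also lets you downgrade a test’s severity so it warns rather than fails, keeping the signal visible without blocking runs (model and column names here are illustrative):

version: 2
models:
  - name: stg_events
    columns:
      - name: event_id
        tests:
          - unique:
              config:
                severity: warn  # surfaces the flake without failing the run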
Conclusion
The ADLC creates high-quality data sets by making small changes over a series of rapid iterations. Testing verifies quality by making assertions about the state of your data and analytics code.
Since its inception, dbt has fostered a test-driven culture by building support for testing directly into both dbt models and dbt Cloud. With dbt Cloud as your data control plane, your data teams have a standardized and cost-efficient way to build, test, deploy, and discover analytics code.
In our next installment of this series, we’ll look at how you can leverage dbt Cloud to implement a CI/CD-style approach to deploying analytics code safely to production.