Five tips and tricks for getting the most out of dbt Explorer
Nov 25, 2024
Spending your working hours putting out fires or fielding never-ending tickets is a short path to burnout. Luckily, dbt Explorer is designed to help you move beyond reactive workflows, offering the holistic context and breadcrumbs you need to build, fine-tune, and troubleshoot your pipelines in a more proactive way.
When you fire up dbt Cloud, dbt Explorer should be your starting point. It’s a knowledge base that offers a visual representation of all of the metadata across your dbt pipeline. You can double-click into any node to get detailed context about that data asset, its dependencies, its freshness, who it's used by, how to improve it, and more. Using dbt Explorer, you have a launchpad to easily understand how your data models are interconnected, what data products they inform, and which models or nodes may need your attention, with the necessary insights on how to improve them.
“dbt Explorer makes it easy for our consumers to understand the entire lineage from the source to reporting—and all of the data quality checks or issues along the way—without having to go ask a dev." - Robert Goodman, Lead Developer of Enterprise Data Analytics at Lennar
Since launching dbt Explorer a year ago, we’ve been steadily shipping a lot of new functionality to make it the best place to navigate, understand, and improve your dbt projects. This post will dive into five tips and tricks for getting the most out of dbt Explorer so you can build trust in your data products, ship with more confidence and velocity, and tune your projects so you can optimize costs.
Tip #1: Find the exact resource you need by mastering search filters and selector syntax
dbt Explorer has advanced search capabilities that make it easy to find the exact resource you’re looking for, saving developer time, reducing siloed workstreams, and enforcing model consistency.
Search
Let’s say I’m a data analyst at a B2C company that’s looking to understand how many of our transactions get routed through our call centers. I know we have a column in our data that tracks whether a call center was part of the transaction cycle, but I don’t know exactly what the column is called or which data source it lives in. This discovery exercise is simple with dbt Explorer. When I enter “call” in the search bar:
- I get a long list of resource names, column names, resource descriptions, warehouse relations, and code that match my search criteria. I can easily narrow this list down by using the Column name filter. Now, I’m able to see all relational nodes that contain that column in their schemas.
- I find the relevant column I’m looking for, CALL_CENTER_SK, and see that it’s tied to our public “transaction” model that is exposed. Using the lineage lenses at my disposal (read more about these in Tip #4), I can see this is a healthy model that’s often consumed by many stakeholders, and therefore I know this has the data I’m looking for.
Without dbt Explorer, this exercise becomes a lot more tedious. I’d have to search and sift through potentially hundreds of tables and their schemas to confirm which is the correct column. Then I’d have to run a number of test queries to see if the column looks correct. Finally, I’d likely have to ask someone on the data team to confirm which table is good for consumption.
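To make the idea of a column-name filter concrete, here is a minimal sketch in Python. It is not how dbt Explorer is implemented: the catalog, its resource names, and the search_by_column_name helper are all hypothetical, and only CALL_CENTER_SK comes from the example above.

```python
# Hypothetical, hand-written catalog mapping resource names to column names.
# dbt Explorer builds its search index from real project metadata instead.
CATALOG = {
    "transactions": ["TRANSACTION_ID", "CALL_CENTER_SK", "AMOUNT"],
    "call_center_dim": ["CALL_CENTER_SK", "CALL_CENTER_NAME"],
    "support_calls": ["TICKET_ID", "OPENED_AT"],
}


def search_by_column_name(term):
    """Return sorted (resource, column) pairs whose column name contains term, ignoring case."""
    needle = term.lower()
    return sorted(
        (resource, column)
        for resource, columns in CATALOG.items()
        for column in columns
        if needle in column.lower()
    )


for resource, column in search_by_column_name("call"):
    print(f"{resource}.{column}")
```

In this toy catalog, support_calls matches “call” by resource name only, so it drops out once the search is restricted to column names. That is the same narrowing effect the Column name filter gives you in the search results.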
Selector syntax
Seemingly unruly lineage graph? Explorer has you covered with logical selector syntax based on the same selection syntax used in the dbt CLI. You can filter the lineage graph in the same way you would filter resources during a dbt invocation.
Let's go through some of the basics. Just like running a single model with dbt run -s my_model, you can search for that model using its name.
You can also use graph operators to see the model's lineage. To see everything just one step downstream of a resource, use “+1” after the resource name, like transactions+1. If you want to see additional levels of dependencies, you can use +2, +3, and so on.
If you want to see one step upstream of a particular resource, simply prepend “1+” to the resource name, for example 1+transactions. You can build on this logic to show one step upstream and downstream of a resource with the syntax 1+transactions+1, and so on.
If you want to see specific resource types only, you can use the “resource_type” specification, for example resource_type:model. This goes for any resource type in the DAG (model, source, test, seed, snapshot, exposure, metric, semantic_model, macro, group).
Similarly, you can use the + operator to see additional levels of lineage for the resource type. For example, you can see everything one step down from every source in your dbt project using resource_type:source+1.
We support a whole host of selector methods and you can find auto-suggested selectors in the lineage search bar. To read more about search and selector syntax in dbt Explorer, see our docs.
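If it helps to see what the graph operators mean in terms of the DAG, here is a rough Python sketch that resolves selectors such as transactions+1 or 1+transactions+1 against a toy graph. It is an illustration, not dbt's implementation: the DAG and every resource name other than transactions are made up, and it only handles name-based selectors, not methods like resource_type.

```python
import re
from collections import deque

# Hypothetical toy DAG: each key maps to the resources directly downstream of it.
CHILDREN = {
    "raw_orders": ["stg_orders"],
    "raw_payments": ["stg_payments"],
    "stg_orders": ["transactions"],
    "stg_payments": ["transactions"],
    "transactions": ["revenue_daily"],
    "revenue_daily": ["finance_dashboard"],
    "finance_dashboard": [],
}

# Invert the edges once so upstream walks work the same way as downstream ones.
PARENTS = {node: [] for node in CHILDREN}
for parent, kids in CHILDREN.items():
    for kid in kids:
        PARENTS[kid].append(parent)

# Optional "N+" prefix, a resource name, optional "+N" suffix (N may be omitted).
SELECTOR = re.compile(r"^(?:(?P<up>\d*)\+)?(?P<name>\w+)(?:\+(?P<down>\d*))?$")


def walk(start, edges, max_depth):
    """Collect nodes reachable from start within max_depth steps (None = no limit)."""
    seen = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for neighbor in edges[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


def select(selector):
    """Resolve a name-based selector with optional graph operators to a set of nodes."""
    match = SELECTOR.match(selector)
    if match is None or match["name"] not in CHILDREN:
        raise ValueError(f"cannot resolve selector: {selector!r}")
    name = match["name"]
    selected = {name}
    if match["up"] is not None:  # a "+" appeared before the name
        selected |= walk(name, PARENTS, int(match["up"]) if match["up"] else None)
    if match["down"] is not None:  # a "+" appeared after the name
        selected |= walk(name, CHILDREN, int(match["down"]) if match["down"] else None)
    return selected


print(sorted(select("1+transactions+1")))
```

With this toy graph, 1+transactions+1 resolves to transactions plus its two staging parents and its one direct child, while transactions+2 also reaches the dashboard two steps downstream.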
“I always have dbt Explorer up on one half of my screen to be ready to answer questions about where data is coming from or how a column is defined. It's a tremendous time saver for me.” - Brian Gillet, Director of Data and Analytics at Hazel Health
Tip #2: Discover cross-project assets and view lineage from a single pane of glass
Oftentimes, organizations manage multiple projects within one dbt account. For example, there may be a project for the marketing team, another one for the central data team, and a third for the finance team. And with dbt Mesh, these projects can reference each other to promote better collaboration and governance while keeping code DRY. dbt Explorer offers a few ways to discover and understand cross-project assets and lineage.
Find all public models
Using dbt Explorer, it's easy to discover all of the public models across your account. Using this view, you can get an at-a-glance understanding of what projects use these models, who owns them, and then dive into lineage with a single click.
Navigate cross-project lineage
It’s also easy to view lineage for more than one project side-by-side in dbt Explorer. If your data pipeline includes a reference to another project, simply double-click that project’s node in your lineage graph.
When you do, we’ll automatically render a new tab with that project’s lineage (zoomed into what should be most relevant based on your flow) so you can visualize both graphs side-by-side.
This feature keeps you in your flow, giving you an intuitive way to view downstream or upstream dependencies used by other projects. This keeps your DAGs manageable while still providing a useful comparative view into your dependencies.
Tip #3: Build trust with detailed context into what dashboards—and teams—your models power
All of the data that you curate in dbt models is in service of your business initiatives, and now, native in dbt Explorer, you have the detailed context needed to connect the dots between a data model and the business value it drives. Two new interrelated features—auto-exposures and model query history—make this happen.
Auto-exposures
With auto-exposures (now available for Tableau, coming soon for other BI platforms), you can have your lineage graph automatically build out downstream Tableau dashboards. This gives data teams automatic context into how and where models are used, so they can prioritize data work to promote data quality. Soon, you will also be able to trigger downstream dashboards to refresh automatically as soon as new data is available, giving business stakeholders confidence that they’re always making decisions from the freshest data.
Model query history
You can also easily layer in a “lens” across your DAG to build context for each node on things like materialization type, freshness, and model consumption. By quickly grasping the relative popularity of your models, you have a data-driven “to do” list of where you should allocate your engineering resources, as well as built-in empathy for the consumers behind the models you build.
Having this context is critical to ensure that you keep your business-critical dashboards up and running. After all, no one wants to get a frantic call the morning of a board meeting that the dashboard is broken. With these details at your fingertips, it’s easy to do one better and actually optimize the inputs that drive that dashboard so you can continue to build trust with your stakeholders.
“Leveraging the model query history feature in dbt Cloud has transformed our approach to optimization. It empowers us to gain deep insights into our SQL execution, identify performance bottlenecks, and enhance our data models with confidence. We're excited to see how this feature evolves in the roadmap ahead.” - Gary How, Data & Analytics Architect at Kenvue
Data health tiles
A bonus feature for building that trust is the turnkey ability to embed health tiles that provide trust signals like data freshness and data quality directly in those downstream dashboards. This gives your stakeholders confidence in the data they’re about to use, and empowers them with the transparency they need to trust the data you provide.
In-app health signals
These health signals are also accounted for throughout the in-app dbt Cloud experience, giving developers and other dbt users an at-a-glance understanding of whether the model they’re about to use is fresh, error-free, tested, documented, and more.
Tip #4: Supercharge your context and debug faster with lineage lenses
Using lineage lenses, you can visualize your lineage graph from a number of different parameters—beyond the default of resource type—so you can grok critical details that help you build more resilient, efficient pipelines and debug issues faster. These lens overlays include:
- Model layer (staging, intermediate, marts)
- Materialization type (table, view, materialized view, incremental, ephemeral)
- Model execution status (success, fail, error, warn, skipped)
- Test status (pass, error, fail, warn, skipped)
- Column-level evolution to see how your columns change across the pipeline (passthrough, transformed, renamed, etc.)
- Model query history (the actual number of consumption queries against each model over the last 30 days, plus a visual indication of high, medium, or low usage; highlighted in Tip #3 above!)
With this layered context about your data estate at your fingertips, it becomes much simpler to understand pipeline issues (test status, column-evolution lens), know which resources to spend development time on (model query history, model execution status), and identify ways to simplify and streamline your pipeline (model query history, materialization type).
Column-evolution lens
You can also use lineage lenses to easily visualize how your columns evolve across your DAG. When debugging a pipeline issue, it’s a huge timesaver to be able to quickly understand whether a column has simply been reused, or if it in fact has been transformed somewhere in your pipeline.
Having this transparency of what transformations are happening at the column level allows analysts to easily confirm how a particular column evolves throughout the pipeline. This helps users understand how a particular data point was calculated or helps developers and analysts decide how they should extend the model for further analysis. Additionally, when columns in a source table are modified, renamed, or removed, column-level lineage shows exactly which downstream tables, views, or dashboards will be affected. This visibility allows teams to proactively anticipate and plan for downstream impacts of changes, preventing disruptions in analytics or reporting.
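As a rough illustration of that impact analysis, the Python sketch below walks a small set of column-level edges to list everything downstream of a changed column. The edge list, table names, and the downstream_impact helper are hypothetical; the evolution labels mirror the passthrough, transformed, and renamed categories mentioned earlier, and dbt Explorer derives the real lineage from your project rather than from a hand-written list.

```python
from collections import deque

# Hypothetical column-level edges:
# (upstream "table.column", downstream "table.column", how the column evolved).
COLUMN_EDGES = [
    ("raw_orders.call_center_sk", "stg_orders.call_center_sk", "passthrough"),
    ("stg_orders.call_center_sk", "transactions.call_center_id", "renamed"),
    ("transactions.call_center_id", "call_center_report.call_center_id", "passthrough"),
    ("stg_orders.amount_cents", "transactions.amount_usd", "transformed"),
]


def downstream_impact(column):
    """Map every column downstream of `column` to the evolution type of the edge reaching it."""
    impacted = {}
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for upstream, downstream, evolution in COLUMN_EDGES:
            if upstream == current and downstream not in impacted:
                impacted[downstream] = evolution
                queue.append(downstream)
    return impacted


for affected, evolution in downstream_impact("raw_orders.call_center_sk").items():
    print(f"{affected} ({evolution})")
```

Asking for the impact of raw_orders.call_center_sk surfaces the staging column, the renamed column in transactions, and the report that reads it, which is the kind of list you would want in hand before modifying or dropping a source column.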
Overlaying this context onto your lineage graph with lineage lenses is a powerful sidekick as you build, troubleshoot, analyze, and improve your data pipelines. Having this context at your fingertips helps you understand how data evolves from ingestion to analysis, promoting data quality without compromising velocity. Check out this demo video to learn more about lineage lenses in dbt Explorer.
Tip #5: Fine-tune and tidy up your data estate with project recommendations
Another great way dbt translates your project metadata into actionable insights is with project recommendations in dbt Explorer.
No one likes an urgent fire drill alerting you to an outage or quality issue in your data pipeline. You can get ahead of these potential issues with project recommendations that surface proactive ways to improve the test coverage, documentation, and overall project health of your dbt models. You can filter the list by severity (high, med, low), improvement category (documentation, performance, testing, etc.), and rule names.
With these insights at the ready, data teams can proactively tackle improvements to the performance and quality of their data projects. The end result is more resilient pipelines and better trust with stakeholders…done in a way that’s proactive, data-driven, and manageable for data teams.
“dbt Explorer is an indispensable ally for any data-driven organization aiming for excellence in their analytics workflows. We gained valuable insights into project data quality and adherence to dbt best practices. It not only helped us pinpoint areas for code enhancement but also significantly improved our documentation practices. We achieved substantial enhancements in data quality percentages, effectively mitigating data errors in the bronze/silver layer and ensuring a higher standard of data quality for our end consumers.” – Shravan Banda, Solutions Architect at World Bank
Get started with dbt Explorer today
dbt Explorer is generally available to all dbt Cloud customers. Given the number of new features we’ve shipped into Explorer this past year, we're curious: how many of these features are you using today? Hopefully, this post inspired you to fire up Explorer and discover how it can help improve your workflow.
It’s really easy to get started. Just navigate to the “Explore” tab in dbt Cloud. Maybe start by assessing which models are most popular, or seeing what additional tests or documentation you can build (or better yet, have dbt Copilot build them for you!) to better tune your projects. We also offer a free online course on getting started with dbt Explorer where you’ll get hands-on instruction on how to use many of these features.
Last modified on: Nov 27, 2024