Five use cases for the dbt Semantic Layer
Jul 15, 2024
We’ve written extensively about the benefits of the dbt Semantic Layer, its technical architecture, and our integration partners…but this is our first time formally documenting the common patterns for using semantic models throughout a company.
But first, a quick level set on the dbt Semantic Layer:
Data teams use the dbt Semantic Layer to build metric definitions, store them centrally in code right alongside their dbt models, and make them available across the enterprise through a consistent, tool-agnostic interface. It provides a “hub-and-spoke” architecture for your metric definitions: metrics are written and defined centrally (the “hub”) and can be queried from a number of “spokes,” such as analytics tools, APIs, and LLMs. Because every endpoint accesses the same centralized definitions, organizations can be sure they’re working off the same metric everywhere, every time.
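To make the hub-and-spoke idea concrete, here is a toy Python sketch. It is not how the dbt Semantic Layer or MetricFlow is implemented; `Metric`, `HUB`, and `query` are hypothetical names. It only shows the principle: every consumer routes through one definition, so a total and its breakdown stay consistent.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Metric:
    """A metric defined once, in the central "hub" (hypothetical model)."""
    name: str
    measure: str                          # column to aggregate
    agg: Callable[[list[float]], float]   # aggregation applied to that column


# The hub: one registry of governed metric definitions (illustrative only).
HUB: dict[str, Metric] = {
    "revenue": Metric(name="revenue", measure="amount", agg=sum),
}


def query(metric_name: str, rows: list[dict], group_by: str | None = None) -> dict:
    """Every spoke goes through this one function, so all spokes share one definition."""
    metric = HUB[metric_name]
    groups: dict[object, list[float]] = {}
    for row in rows:
        key = row[group_by] if group_by else "total"
        groups.setdefault(key, []).append(float(row[metric.measure]))
    return {key: metric.agg(values) for key, values in groups.items()}


orders = [
    {"region": "EMEA", "amount": 100.0},
    {"region": "AMER", "amount": 250.0},
    {"region": "EMEA", "amount": 50.0},
]

# Two "spokes" (say, a BI dashboard and an API client) asking different questions.
dashboard_view = query("revenue", orders)
api_view = query("revenue", orders, group_by="region")
print(dashboard_view, api_view)
```

Because both spokes call the same `query` function against the same definition, the regional figures always add up to the dashboard total.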
Metrics can also easily be exported as tables back to your warehouse, so you have centralized, consistent, DRY business logic that’s traceable across your DAG and readily available as a critical building block for numerous use cases.
A note on exports
Lean on exports to get a cohesive and comprehensive view of your metric logic. Exports allow you to materialize a saved query (a combination of metrics and dimensions) as a table or view in your data platform, so you can unify metric definitions there and query them as you would any other table or view. Using exports is a recommended best practice because they allow you to: 1) keep your code DRY, 2) lean on MetricFlow to generate SQL, reducing the time it takes to build new rollups, and 3) automatically integrate metrics into your dbt DAG so you can visualize all of your business-critical metrics, not just the ones that feed a BI tool, in one place.
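To illustrate what an export does, here is a toy Python sketch that uses the standard-library `sqlite3` module as a stand-in warehouse. `SavedQuery` and `export_sql` are hypothetical names, and the SQL is hand-rolled for illustration; in dbt you define saved queries and exports in your project and MetricFlow generates the actual SQL.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SavedQuery:
    """Hypothetical stand-in for a saved query: a set of metrics plus dimensions."""
    name: str
    source_table: str
    metrics: dict[str, str]                 # metric name -> SQL aggregate expression
    dimensions: list[str] = field(default_factory=list)


def export_sql(sq: SavedQuery) -> str:
    """Render a CREATE TABLE AS statement (a toy; MetricFlow does this for real in dbt)."""
    columns = sq.dimensions + [f"{expr} AS {name}" for name, expr in sq.metrics.items()]
    group_by = f" GROUP BY {', '.join(sq.dimensions)}" if sq.dimensions else ""
    return f"CREATE TABLE {sq.name} AS SELECT {', '.join(columns)} FROM {sq.source_table}{group_by}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EMEA", 100.0), ("AMER", 250.0), ("EMEA", 50.0)],
)

revenue_by_region = SavedQuery(
    name="revenue_by_region",
    source_table="orders",
    metrics={"revenue": "SUM(amount)"},
    dimensions=["region"],
)
conn.execute(export_sql(revenue_by_region))

# Downstream, the export is just another table.
rows = conn.execute(
    "SELECT region, revenue FROM revenue_by_region ORDER BY region"
).fetchall()
print(rows)
```

Once the table exists, any tool that can read the warehouse can query it with plain SQL, without carrying its own copy of the metric logic.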
Five use cases for the dbt Semantic Layer
The data you capture and transform is a means to an end. So how are early adopters of the dbt Semantic Layer taking advantage of consistent metrics across the business? We’ve seen five key use cases emerge: reporting & BI, embedded analytics, AI and LLM integrations, self-serve analytics, and exploratory analytics.
Let's dive in!
Reporting & BI: Deliver accurate data to your boss
Populating BI tools with relevant and accurate data is an obvious, and urgent, use case for the semantic layer. According to a Forrester survey, the average organization uses four or more BI tools (and 25% of organizations use 10 or more!). When metric logic is housed within those individual tools, a few problems crop up:
- Data delivery bottlenecks:
  - Metrics must be built manually in each BI tool, and as definitions evolve, they must be updated in every tool. This is undifferentiated heavy lifting that slows development and introduces unnecessary data quality risk.
  - With metric logic dispersed across tools, it’s cumbersome to test and validate logic changes or to diagnose issues in a scalable, proactive manner.
- Impaired data trust:
  - Metrics drive business-critical decisions, and there is no room for error. Mistakes are noticed by C-level stakeholders and impair trust in data and data teams.
  - Discrepancies in metric outputs can derail cross-functional meetings, where energy is spent debating data accuracy instead of making decisions.
Together, these issues degrade data quality, damage trust in data and data teams, and sabotage organizational efforts to embrace data-driven practices.
By using the “hub-and-spoke” architecture a semantic layer provides, data teams can store semantic models and definitions centrally, and those metrics can be delivered on demand into any downstream tool that queries them, whether through a first-class integration with your analytics tool or a custom integration powered by an export to your data platform. That means that when the finance team uses Tableau to surface last month’s ARR for a board presentation, they see the exact same number as the marketing team that uses Power BI to analyze last month’s ARR by lead source. This consistency fosters trust in data, helps teams make decisions faster, and frees up data development cycles, because teams no longer need to manually spelunk for the root cause of a discrepancy.
Teams can also explore the metadata associated with a particular metric, such as data lineage, data freshness, definitions, and joins, so they have the information they need to be 100% confident in the metrics they rely on.
“When you put everything on dbt, you ensure everyone is seeing the same number. You don’t get that message saying, ‘oh, my director got this GMV number and I’m getting this different one.’”
- Gabriel Marinho, Lead Analytics Engineer at Inventa
Embedded analytics: Power delightful in-app experiences
Data is powerful, so it shouldn’t be relegated to an internal KPI scorecard for aligning employees. It can and should be embedded into customer- and partner-facing applications, because real-time, personalized data is the bedrock of delightful, differentiated customer experiences. A recent report by ThoughtSpot and Product-Led Alliance found that companies that embed analytics with a differentiated user experience see increased user engagement and revenue.
But there are a few considerations to take into account before embracing an embedded strategy:
- Data integrity: Once metrics are going outside the walls of your company, the stakes get higher. Unlike with an internal stakeholder, you can’t sort out discrepancies with a quick Slack message or Zoom alignment call: it’s CRITICAL that the data is correct. Otherwise, you risk impairing brand equity, losing customer trust, compromising revenue growth opportunities, and in some cases, breaking regulatory commitments.
- Costs: Embedding dashboards from traditional BI tools can be prohibitively costly, especially when leaning on tools with seat-based pricing. Meanwhile, relying on internal engineering teams to stand up a solution requires tradeoffs on other product deliverables, not to mention a maintenance burden.
- Flexibility: Leaning on a BI tool for embedded analytics limits a developer’s ability to build a customized and seamlessly branded product UI—and delivering a delightful customer experience is the driving force behind an embedded strategy in the first place. Teams need the consistency and control afforded by custom APIs.
With the dbt Semantic Layer as your foundation, you can calculate complex metrics centrally—right alongside the rest of your data—version control them, and deliver them as up-to-date embedded visualizations in your app, website, or wherever your stakeholders consume data. Data teams can build custom web apps using developer-friendly APIs or SDKs and serve up relevant, personalized data to end-users…without driving up costs with legacy BI tools. And with built-in caching and performance-optimized SQL, you can always be sure those metrics are delivered lightning fast. What’s more, your teams can build confidence in the data that’s being served to your customers with the ability to visualize dependencies from source all the way to the metric.
"The dbt Semantic Layer gives our data teams a scalable way to provide accurate, governed data that can be accessed in a variety of ways—an API call, a low-code query builder in a spreadsheet, or automatically embedded in a personalized in-app experience. Centralizing our metrics in dbt gives our data teams a ton of control and flexibility to define and disseminate data, and our business users and customers are happy to have the data they need, when and where they need it.”
- Hans Nelsen, Chief Data Officer, Brightside Health
AI & LLMs: Turn your questions into governed, high quality answers
Everyone wants to embrace AI (or at least have a plan for adopting it!)…and no one wants to have bad data. In our most recent State of Analytics Engineering Report, the majority of respondents said they manage data for AI model training, either currently or within the next 12 months. Unsurprisingly, data quality is an undeniable prerequisite for successful AI adoption: a recent Tableau / Salesforce study found that 86% of analytics and IT leaders agree that AI's outputs are only as good as its data inputs.
Using the dbt Semantic Layer, you can set up your AI investments for success. By enriching your LLMs with high-context, well-governed data inputs, you can ensure high quality outputs across your AI stack:
- Reduce hallucinations: Get high-quality AI responses that are backed by real data, which keeps your models on track, and enriched with business context.
- Build once, use everywhere: Define your business logic via metrics in the semantic layer, and access it through any connected LLM.
- Lower the barrier to analytics: Democratize data-driven decision making by empowering less technical users to get answers to their own questions from an agent that understands natural language.
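One common grounding technique, sketched below in Python, is to have the model emit a structured request and check every metric and dimension name against the semantic layer’s catalog before running anything. This illustrates the general idea only, not how Ask dbt works internally; `CATALOG`, `MetricRequest`, and `validate` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical catalog read from a semantic layer: metric -> dimensions it supports.
CATALOG: dict[str, set[str]] = {
    "revenue": {"metric_time", "region"},
    "order_count": {"metric_time", "region", "lead_source"},
}


@dataclass
class MetricRequest:
    """A structured request an LLM is asked to emit instead of free-form SQL."""
    metrics: list[str]
    group_by: list[str] = field(default_factory=list)


def validate(request: MetricRequest) -> list[str]:
    """Return problems found; an empty list means every name exists in the catalog."""
    if not request.metrics:
        return ["no metrics requested"]
    problems = []
    for metric in request.metrics:
        if metric not in CATALOG:
            problems.append(f"unknown metric: {metric}")
            continue
        for dim in request.group_by:
            if dim not in CATALOG[metric]:
                problems.append(f"{metric} cannot be grouped by {dim}")
    return problems


print(validate(MetricRequest(["revenue"], ["region"])))
print(validate(MetricRequest(["revenue", "arr"], ["lead_source"])))
```

A rejected request can be sent back to the model along with its list of problems, rather than letting the model guess at SQL for a metric that was never defined.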
As an example of what you can accomplish by powering your LLM with governed semantic definitions, dbt Labs built Ask dbt, an agent that interacts with the dbt Semantic Layer using plain-language text. Unlike traditional AI chatbots, Ask dbt uses the dbt Semantic Layer to provide critical context about your dbt project, which improved accuracy 3x in our benchmark. With Ask dbt, users can ask questions in natural language and receive insights in an understandable format, which can significantly speed up business processes and decision-making. Make decisions quickly with the confidence that the metrics you’re using are always consistent, regardless of whether you got them through Ask dbt, your AI chatbot, Tableau, Google Sheets, or anywhere else.
Ask dbt is currently in public beta for customers using the dbt Snowflake Native App.
Self-serve analytics: Democratize data where people already are (hint: that’s probably a spreadsheet)
To scalably adopt data-driven practices across an organization, data teams can’t be involved in every data request. Manual human intervention isn’t scalable, nor is it efficient. Self-serve analytics makes it possible for anyone, especially people who aren’t technical, to get the data they need to make strategic decisions. That means they need to access data from the tools they already use. Most likely, that’s a spreadsheet, but it could be any interface they already gravitate toward. From there, users can tap into a no-code interface to view dashboards and reports, or slice and dice data themselves to get reliable answers quickly.
Any successful self-serve strategy depends on data that is both accessible and accurate. dbt offers built-in integrations across a variety of data consumer endpoints:
- BI & analytics tools: Business users can jump into their tool of choice, including Tableau, Hex, and Google Sheets, to build their own queries via intuitive drag-and-drop interfaces or to reuse the saved queries their colleagues have used, expediting data discovery.
- LLMs: Less technical users can ask natural language questions such as “What was our revenue in EMEA last quarter?” and get a high-context, governed, consistent answer back. The dbt Semantic Layer currently offers a native integration with Snowflake Cortex via our Ask dbt feature.
The data, metrics, and reports flowing into these endpoints are sourced centrally and are fully governed, tested, and version controlled. This gives data teams the peace of mind to step away from what would otherwise have been a ticket for a new data pull and an immediately outdated CSV file. dbt also streamlines authentication workflows with granular access controls that can be configured for individual users or groups.
By centralizing metrics in the dbt Semantic Layer, data teams can minimize ad hoc requests while making high-quality, governed data easily accessible across the organization. As a result, data velocity, collaboration, and trust improve, which further supports data ROI.
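Access that can be granted both to groups and to individual users can be modeled as a simple union of permissions. This is a generic Python sketch of that idea, not dbt’s actual permission system or configuration; every name in it is hypothetical, and users with no grants are denied by default.

```python
# Hypothetical grants: groups and individual users are granted access to metrics.
GROUP_GRANTS = {
    "finance": {"revenue", "arr"},
    "marketing": {"arr", "leads"},
}
USER_GROUPS = {"ana": {"finance"}, "ben": {"marketing"}}
USER_GRANTS = {"ben": {"revenue"}}  # an extra grant for one individual user


def allowed_metrics(user: str) -> set[str]:
    """Union of the user's direct grants and the grants of every group they belong to."""
    allowed = set(USER_GRANTS.get(user, set()))
    for group in USER_GROUPS.get(user, set()):
        allowed |= GROUP_GRANTS.get(group, set())
    return allowed


def can_query(user: str, metric: str) -> bool:
    """Deny by default: only explicitly granted metrics are queryable."""
    return metric in allowed_metrics(user)


print(can_query("ana", "revenue"), can_query("ana", "leads"))
```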
Exploratory analytics: Surface new insights that drive competitive edge
Exploratory analytics is a critical step in the data science workflow that involves using Python libraries to inspect data, discover patterns, and verify hypotheses that ultimately inform winning business strategies and deliver data ROI. For example, a data scientist may want to analyze trends for a particular metric over time and calculate summary statistics to understand historical data and better predict the future.
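A first pass at that kind of analysis might look like the Python sketch below, which uses only the standard-library `statistics` module. The monthly figures are made up for illustration; in practice they would come from a governed metric, for example an export table or an API query.

```python
import statistics

# Hypothetical monthly values of a governed metric, e.g. read from an export table.
monthly_revenue = {
    "2024-01": 100.0,
    "2024-02": 110.0,
    "2024-03": 121.0,
    "2024-04": 115.0,
}

months = list(monthly_revenue)
values = list(monthly_revenue.values())

# Summary statistics describing the historical distribution.
summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": statistics.stdev(values),  # sample standard deviation
}

# Month-over-month growth in percent: a simple view of the trend.
mom_growth_pct = {
    months[i]: (values[i] / values[i - 1] - 1) * 100
    for i in range(1, len(values))
}

print(summary)
print(mom_growth_pct)
```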
While highly strategic, exploratory analytics can be challenging: data sources are constantly expanding, you need the ability to combine data across various sources to get a complete picture of reality, and you need the flexibility to continuously iterate on your questions and quickly slice metrics across various dimensions to get to the bottom of something…all while ensuring data integrity and quality along the way. The inherent expansiveness and fluidity of this practice can introduce productivity challenges, so it’s essential that data teams have the tooling and resources to make the iterative process of exploratory analytics as efficient and outcome-oriented as possible.
We’ve all heard the phrase “garbage in, garbage out,” and the truism also applies when training ML models. If data scientists build models starting from governed metric definitions, they can be confident that the datasets they’re using accurately represent the business metric they are trying to improve. Without this assurance, all of the data modeling they conduct is moot.
Using the dbt Semantic Layer, data science teams can take advantage of centralized, governed metrics that live alongside other data models, and query and join them to support exploratory analytics workflows. All metrics and models are version controlled, lineage can be explored from source all the way through to metric, and teams can easily join semantic models with other dbt models. We offer native integrations with a number of notebook and exploratory analytics tools (Hex, Mode). Data can also be exported back to your warehouse as centralized, governed metrics tables for downstream delivery, or data teams can query it via our JDBC or GraphQL APIs in their notebook of choice.
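Joining governed metric output with another model’s data in a notebook can be as simple as the Python sketch below. All data and names are hypothetical; the metric dictionary stands in for results fetched through whichever JDBC or GraphQL client you use, and the sketch does not call either API.

```python
# Hypothetical metric output keyed by customer, e.g. pulled into a notebook.
revenue_by_customer = {"c1": 500.0, "c2": 120.0, "c3": 80.0}

# Hypothetical rows from another dbt model holding customer attributes.
customers = [
    {"customer_id": "c1", "segment": "enterprise"},
    {"customer_id": "c2", "segment": "smb"},
    {"customer_id": "c3", "segment": "smb"},
    {"customer_id": "c4", "segment": "smb"},  # no revenue yet
]

# Left join: attach the metric to every customer row (missing -> 0.0).
joined = [
    {**row, "revenue": revenue_by_customer.get(row["customer_id"], 0.0)}
    for row in customers
]

# Roll up by an attribute that lives outside the metric definition.
revenue_by_segment: dict[str, float] = {}
for row in joined:
    segment = row["segment"]
    revenue_by_segment[segment] = revenue_by_segment.get(segment, 0.0) + row["revenue"]

print(revenue_by_segment)
```

Re-summing like this is only safe for additive metrics such as a revenue total. Ratios and distinct counts cannot be rolled up by adding, which is one reason to let the semantic layer group by a dimension when that dimension is modeled there.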
Get started
To get started building your own semantic models and metrics, check out our documentation, schedule a call with one of our product experts, or reach out in Community Slack (#dbt-cloud-semantic-layer).
Last modified on: Sep 11, 2024