Overriding Schema Generation

The first dbt Live: Expert Series session is a wrap 🎉!

What is dbt Live?

The dbt Live: Expert Series is a 60-minute interactive experience where dbt Labs Solution Architects share practical advice on how to tackle the sticky problems we see come up over and over again in the field—like how to safely split environments with CI/CD, or how to implement blue/green deployments. Event agendas are shaped by audience requests, so whether you’re new to dbt, or just want to sharpen your skills, we’d love to see you there! Register to take part in the next live session.

Session Recap: Overriding Schema Generation

The first dbt Live session was led by Randy Pitcher, Solutions Architect at dbt Labs. Watch the full replay here.

Randy kicked things off by live coding a solution to a common problem he’s seen at many organizations: How to split dev and prod environments across databases.

Let’s start with what organizations are trying to achieve through this setup. Many organizations want to follow an environment promotion process like the one illustrated below, to make sure developers do not overwrite each other’s work in development and to put changes through a staging or QA check before merging them with production.

Breaks in production can produce incorrect or unavailable data, leading to missed market opportunities or bad decisions. Yet many organizations struggle to override dbt’s default schema and database generation behavior to arrive at this intended environment promotion process.

There are different ways to set this up depending on whether you’re running self-hosted dbt Core or dbt Cloud. But the approach Randy showed works consistently no matter how you deploy dbt: instead of relying only on profiles.yml files (in dbt Core) or the environment configuration page (in dbt Cloud), we can use macros to control how dbt generates database and schema names.

Using a macro to define this configuration is useful because it can be version controlled to monitor any changes to the script over time. This helps teams align with their company’s analytics governance processes.

Step-by-step: Walking through Randy’s approach

To get started, Randy opened dbt Cloud (you can use your IDE of choice if you’re running dbt Core) and created a new file named generate_schema_name.sql inside the macros/config folder.

Begin by defining the macro and the arguments dbt passes to it: custom_schema_name (the custom schema configured on the model, if any) and node (the model being processed). Naming the macro generate_schema_name overrides dbt’s default schema generation behavior, because macros defined in your project take precedence over dbt’s built-in macros.

{%- macro generate_schema_name(custom_schema_name, node) -%}

{%- endmacro -%}

Next, log the custom schema name the macro receives each time it runs, to help with troubleshooting. The node argument represents the model dbt is currently processing and exposes properties such as its database and schema, similar to {{ this.database }} and {{ this.schema }} inside a model.

{%- macro generate_schema_name(custom_schema_name, node) -%}
{{ log( node ~ '\n custom schema name: ' ~ custom_schema_name, info=True) }}
{%- endmacro -%}

Next, add a branch for each of your target environments. Target names are defined in your dbt Cloud environment settings or in your profiles.yml file in dbt Core (a profiles.yml sketch follows the code below). For his demo, Randy defined default, pull request (pr_test), and production targets.

{%- macro generate_schema_name(custom_schema_name, node) -%}
{{ log( node ~ '\n custom schema name: ' ~ custom_schema_name, info=True) }}
  {% if target.name == 'default' %}

  {% elif target.name == 'pr_test' %}

  {% elif 'production' in target.name %}

  {% endif %}
{%- endmacro -%}
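
If you’re running dbt Core, these target names (and the target.schema value the macro reads) come from profiles.yml. Here’s a minimal sketch of what that file could look like for the three targets above; the profile name, warehouse type, credentials, and schema values are illustrative assumptions, not taken from Randy’s demo:

my_dbt_project:
  target: default
  outputs:
    default:            # local development
      type: snowflake
      account: our_account
      user: randy
      password: "{{ env_var('DBT_PASSWORD') }}"
      database: analytics_dev
      warehouse: transforming
      schema: dbt_randy
    pr_test:            # CI runs against pull requests
      type: snowflake
      # connection details as above
      schema: dbt_pr
    production:         # scheduled production runs
      type: snowflake
      # connection details as above
      schema: analytics

In dbt Cloud, the equivalent settings live on the environment and credentials pages, where each environment’s target name and default schema are set.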

Inside each target.name branch, add the schema name you want dbt to generate.

{%- macro generate_schema_name(custom_schema_name, node) -%}
{{ log( node ~ '\n custom schema name: ' ~ custom_schema_name, info=True) }}
  {% if target.name == 'default' %}
    {{target.schema}}{{ '_' ~ custom_schema_name if custom_schema_name else '' }}
  {% elif target.name == 'pr_test' %}
    {{target.schema}}{{ '_' ~ custom_schema_name if custom_schema_name else '' }}
  {% elif 'production' in target.name %}
    {{ custom_schema_name if custom_schema_name else target.schema }}
  {% endif %}
{%- endmacro -%}

Once you’ve written the macro, save the file and compile your project to check for compilation errors (for example, by running dbt compile in dbt Core, or by compiling a model in the dbt Cloud IDE).

The code above appends the custom schema name (when one is configured) to the target schema if the target name is default or pr_test, and uses the custom schema name on its own in production. To keep individual developers from overwriting each other’s work in dev and pull requests, all while working within one database, Randy suggests the best practice of naming developer schemas with the pattern dbt_username_projectname.
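
To make that concrete, here’s a hypothetical example of how the pieces fit together. Suppose the development target schema is dbt_randy and a model sets a custom schema in its config block:

-- models/marts/fct_orders.sql (hypothetical model)
{{ config(schema='finance') }}

select * from {{ ref('stg_orders') }}

With the macro above, this model builds in dbt_randy_finance when the target name is default or pr_test, and in finance when the target name contains production.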

Now you’ve overridden the default generate_schema_name macro, and you can customize it further for your own processes. To hear Randy’s commentary as he walks through these steps, watch the replay of the session below.

If you followed along with the video and were wondering if a copy of Randy’s repo would become available, we’ll save you some clicks by sharing his generate_schema_name script with you:

{%- macro generate_schema_name(custom_schema_name, node) -%}
{{ log( node ~ '\n custom schema name: ' ~ custom_schema_name, info=True) }}
  {% if target.name == 'default' %}
    {{target.schema}}{{ '_' ~ custom_schema_name if custom_schema_name else '' }}
  {% elif target.name == 'pr_test' %}
    {{target.schema}}{{ '_' ~ custom_schema_name if custom_schema_name else '' }}
  {% elif 'production' in target.name %}
    {{ custom_schema_name if custom_schema_name else target.schema }}
  {% endif %}
{%- endmacro -%}
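
The walkthrough focuses on schemas, but the same technique extends to the database split mentioned at the start of the session: dbt also lets you override its generate_database_name macro. Here’s a rough sketch of what that could look like, assuming hypothetical analytics and analytics_dev databases (this is an illustration, not part of Randy’s script):

{%- macro generate_database_name(custom_database_name, node) -%}
  {% if 'production' in target.name %}
    {{ custom_database_name if custom_database_name else 'analytics' }}
  {% else %}
    {{ custom_database_name if custom_database_name else 'analytics_dev' }}
  {% endif %}
{%- endmacro -%}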

If you’re not familiar with the target variable, config, or custom schemas, we suggest checking out the dbt docs for more info on how they work and more use cases.

Participant Questions

After the live coding session, Randy answered Community member questions live and in the Slack channel.

Here are a couple of the questions:

Daisy asked how to change the GitHub repository connected to a dbt Cloud project. Great question, Daisy! To resolve this:

  • Go into dbt Cloud and, if you are an account admin, open Account Settings.
  • Select the project whose GitHub repo you want to change.
  • Click the repository and disconnect the repo. This will not delete anything in the repo, but try to do it during off hours so you don’t disrupt your developers.
  • Go to your Profile and select Integrations.
  • Click Link your GitHub profile and follow the instructions in the dbt Cloud docs to connect it to your organization’s GitHub account.
  • Go back to the Account Settings page and select the project where you removed the GitHub repo earlier.
  • Click Configure a repository, select GitHub, and search for the repo to import for your project. This becomes the repo used for your branching process, IDE development, and running jobs in dbt Cloud for this project.

James asked whether teams should keep all of their dbt models in one mono-repo or split them across multiple projects. Thank you, James; this question comes up frequently within the dbt Community in the #analytics-craft and #advice-data-modeling channels.

In Randy’s experience, most customers work with a mono-repo. The major reason is to keep all of your code in one place, which makes testing easier. When teams break a mono-repo into smaller projects, it can result in duplicated models and duplicated work.

However, Randy said there are exceptions. Some customers with thousands of models do end up splitting their projects, and he advises the following (a configuration sketch follows the list):

  • Split up projects by domains, not maturity. For example: marketing, manufacturing, finance
  • Ensure there is one set of imports within a single foundational project
  • Define the outputs of the single foundational project as sources in your domain projects
  • Import the foundational project in your sub-project (domain project)
  • Follow the same pattern as you develop other sub-projects
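
As a rough sketch of that pattern, a domain project could import the foundational project through packages.yml and declare the foundational models it builds on as sources (all project, repo, database, and table names below are hypothetical):

# packages.yml in a domain project (e.g. marketing)
packages:
  - git: "https://github.com/our-org/foundational-project.git"
    revision: 1.0.0

# models/staging/foundation_sources.yml in the same project
version: 2
sources:
  - name: foundation
    database: analytics
    schema: core
    tables:
      - name: dim_customers
      - name: fct_orders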

Join the dbt Community Slack and the #events-dbt-live-expert-series channel to see more question responses and to ask your questions for upcoming sessions.

Want to see more of these sessions? You’re in luck: we have more in store. Register for future sessions with other members of the dbt Labs Solutions Architect team.

Until next time, keep on sharing your questions and thoughts on this session in the dbt Community Slack!

