As an analytics engineer on the professional services team at dbt Labs, I've helped a dozen or so data teams refactor from a legacy SQL transformation workflow to modular data modeling in dbt.
I've noticed a clear pattern: rare is the team that starts with a completely clean slate. More often than not, teams have accumulated years of SQL code powering business-critical, often complex data transformations.
These SQL transformations may run in a stored procedure, in an Airflow SQL operator, or in some other scripting environment. Regardless of where it runs, that legacy SQL code must be migrated!
How can that migration be done with a) minimal headache and b) maximum quality?
Announcing a new (free!) on-demand course
Over the last 18 months, the dbt Labs professional services team has evolved a process to help teams migrate legacy SQL queries to dbt SQL.
As of today, you can learn and implement that same refactoring process with a new on-demand course.
If you're completely new to dbt, you may want to start with the dbt Fundamentals course before diving into this one.
All the links
Everyone learns a bit differently, so you can pick up the course materials in a few different formats:
- The video course, Refactoring SQL for Modularity, which includes a practice query + dataset to test your refactoring skills.
- A pull request in a GitHub repo, where you can follow, commit by commit, how a monolithic query (as from a stored procedure) breaks down into modular dbt code (note: these commits are covered in depth in the video course).
- A written tutorial on legacy SQL -> dbt SQL refactoring, in case you prefer reading or bookmarking the short version.
Our SQL refactoring process, in short
For a bit of background, the migration workflow that we follow internally is:
- Migrate your legacy SQL code, unchanged---make sure it runs and produces an identical output.
- Implement dbt sources (as opposed to raw table references); a sketch of this step follows the list.
- Choose a refactoring strategy.
- Implement clean CTEs (we ❤️ CTEs).
- Separate transformation logic into standardized data model layers (often the longest step); the second sketch after the list shows both the CTE pattern and the layering.
- Audit the output (often using the audit_helper dbt package); the final sketch after the list shows one way to do this.
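To make the sources step concrete, here is a minimal sketch of swapping a raw table reference for a dbt source. The jaffle_shop source and table names are hypothetical stand-ins, not the course's practice dataset:

```yaml
# models/staging/_sources.yml (hypothetical source definition)
version: 2

sources:
  - name: jaffle_shop
    database: raw
    tables:
      - name: customers
      - name: orders
```

```sql
-- before: select * from raw.jaffle_shop.customers
-- after: the model depends on a declared, documented source
select * from {{ source('jaffle_shop', 'customers') }}
```

Beyond readability, declaring sources gives you lineage in the dbt docs and a single place to handle a schema or database rename later.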
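The CTE and layering steps are about structure: each model opens with import CTEs that pull in upstream relations, does one logical unit of work, and hands off to the next layer via ref(). A minimal sketch, with hypothetical model and column names:

```sql
-- models/staging/stg_orders.sql (hypothetical staging model)
with

-- import CTE: the only place this model touches the source
orders as (
    select * from {{ source('jaffle_shop', 'orders') }}
),

-- logical CTE: rename and lightly clean, nothing more
renamed as (
    select
        id as order_id,
        user_id as customer_id,
        order_date,
        status
    from orders
)

select * from renamed
```

```sql
-- models/marts/fct_orders.sql (hypothetical downstream model)
-- business logic lives here, built on the staging layer via ref()
with

orders as (
    select * from {{ ref('stg_orders') }}
),

final as (
    select
        order_id,
        customer_id,
        order_date,
        case when status = 'completed' then 1 else 0 end as is_completed
    from orders
)

select * from final
```

The payoff is that each piece of logic lives in exactly one model, so the monolithic query's duplicated subqueries collapse into reusable building blocks.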
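For the audit step, the audit_helper package's compare_relations macro does most of the heavy lifting by comparing the refactored model's output against the legacy table row by row. A sketch, assuming the legacy output lives at old_database.old_schema.fct_orders and order_id is the primary key (both hypothetical):

```sql
-- analyses/compare_fct_orders.sql (hypothetical analysis file)
{% set legacy_relation = adapter.get_relation(
      database = "old_database",
      schema = "old_schema",
      identifier = "fct_orders"
) %}

{% set refactored_relation = ref('fct_orders') %}

{{ audit_helper.compare_relations(
    a_relation = legacy_relation,
    b_relation = refactored_relation,
    primary_key = "order_id"
) }}
```

Compile the analysis with dbt compile and run the resulting SQL in your warehouse to get a summary of how many rows match between the two relations and how many appear in only one of them.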
There is, of course, no one true way to refactor legacy SQL into dbt models!
I'd love to hear from you (on Twitter or in #learn-on-demand in dbt Community Slack) how your approach varies from the one we lay out in the course.