dbt
Blog Guide to DML

Guide to DML

Jul 01, 2024

Learn

Data Manipulation Language (DML) is a class of SQL statements that are used to query, edit, add and delete row-level data from database tables or views. The main DML statements are SELECT, INSERT, DELETE, and UPDATE.

DML is contrasted with Data Definition Language (DDL) which is a series of SQL statements that you can use to edit and manipulate the structure of databases and the objects in them.

Similar to DDL, DML can be a tad bit boring. However, DML statements are what allows analysts and analytics engineers to do their work. We hope you can use this glossary to understand when and why DML statements are used and how they may contrast with similar DDL commands.

Types of DML Statements

The primary DML statements are SELECT, INSERT, DELETE, and UPDATE. With the exception of SELECT statements, all of the others are only applicable to data within tables in a database. The primary difference between SELECT and all the other DML statements is its impact to row-level data:

  • To change the actual data that lives in tables, use INSERT, DELETE, and UPDATE statements
  • To access the data in database object, use SELECT statements

SELECT

Ah, our favorite of DML statements! This is the SQL we all know and love (most of the time). Because the SELECT statement allows you to access and manipulate data that exists in database objects, it makes it the true powerhouse in data analysis and analytics engineering.

You write SELECT statements to create queries that build data models and perform robust analysis. With SELECT statements, you can join different views and tables, qualify data by setting filters, apply functions and operators on the data, and more. SELECT statements, unlike INSERT, DELETE, and UPDATE, don’t actually change the row-level value stored in the tables/views. Instead, you write SELECT statements to express the business logic needed to perform analysis.

All SELECT statements need three elements: a SELECT clause in the beginning, the actual field selection and manipulation, and a FROM statement which is specifying which database object you’re trying to access.

Here’s an example SELECT statement:

select

	payment_method,
	sum(amount) AS amount

from {{ ref('raw_payments') }}
group by 1

In this example, your selection of the payment_method column and summation of the amount column is the meat of your query. The from {{ ref('raw_payments') }} specifies the actual table you want to do the selecting from.

INSERT

Using the INSERT DML command, you can add rows to a table that exists in your database. To be honest, data folks are rarely inserting data into tables manually with the INSERT command. Instead, data team members will most often use data that’s already been inserted by an ELT tool or other data ingestion process.

You can insert a record in jaffle_shop’s raw_customers table like this:

INSERT INTO raw_customers VALUES (101, 'Kira', 'F.');

As you can see from this example, you clearly set all the column values that exist in your raw_customers table. For INSERT statements, you can explicitly specify the values you want to insert or use a query result to set the column values.

DELETE

The DELETE command will remove rows in an existing table in your database. In practice, you will usually specify a WHERE clause with your DELETE statement to only remove specific rows from a table. But, you shouldn't really ever delete rows from tables. Instead, you should apply filters on queries themselves to remove rows from your modeling or analysis.

For the most part, if you wanted to remove all existing rows in a table, but keep the underlying table structure, you would use the TRUNCATE DDL command. If you wanted to remove all rows and drop the entire table, you could use the DROP DDL command.

You can delete the record for any Henry W. in jaffle_shop’s customers table by executing this statement:

DELETE FROM customers WHERE first_name = 'Henry' AND last_name = 'W.';

UPDATE

With the UPDATE statement, you can change the actual data in existing rows in a table. Unlike the ALTER DDL command that changes the underlying structure or naming of database objects, the UPDATE statement will alter the actual row-level data. You can qualify an UPDATE command with a WHERE statement to change the values of columns of only specific rows.

You can manually update the status column of an order in your orders table like this:

UPDATE orders SET status = 'returned' WHERE order_id = 7;

Tip - The UPDATE statement is often compared to the MERGE statement. With MERGE statements, you can insert, update, and delete records in a single command. Merges are often utilized when there is data between two tables that needs to be reconciled or updated. You'll see merges most commonly executed when a source table is updated and a downstream table needs to be updated as a result of this change. Learn more about how dbt uses merges in incremental models here.

Conclusion

DML statements allow you to query, edit, add, and remove data stored in database objects. The primary DML commands are SELECT, INSERT, DELETE, and UPDATE. Using DML statements, you can perform powerful actions on the actual data stored in your system. You'll typically see DML SELECT statements written in data models to conduct data analysis or create new tables and views. In many ways, DML is the air that us data folks breathe!

Further reading

For more resources on why people who use dbt don’t write DML, check out the following:

For database-specific DML documents, please check out the resources below:

Last modified on: Oct 02, 2024

Build trust in data
Deliver data faster
Optimize platform costs

Set your organization up for success. Read the business case guide to accelerate time to value with dbt Cloud.

Read now ›

Recent Posts