Analytics engineering is a role that emerged in recent years as data has become more integral to an organization’s day-to-day operations. This shift was spurred by the rise of cloud-based data platforms, and data engineering teams soon realized they needed a more standardized, scalable, and collaborative way to build and manage data pipelines. Analytics engineers occupy the space between traditional data engineering and business analytics teams, working with data in warehouses using tools like dbt and Snowflake.
This new role contributes significantly to the data stack. But since it’s so new, many people still aren’t clear on its exact function. In this article, we will look at what analytics engineering is, what analytics engineers do, and the tools they use.
How analytics engineering began
With the growth of cloud computing, many organizations now use fully hosted data cloud systems. The separation of storage and compute costs has opened up cloud data warehousing to organizations of all sizes, but this ubiquity introduces new struggles.
Extracting value from data requires analytics. In response, organizations developed pipelines for building dashboards and views that inform business decisions.
However, the sudden expansion meant that the architecture of systems was ad hoc. Especially as business teams became more interested in self-service data, different teams ended up with different datasets and metrics.
Additionally, the quality of the data informing those decisions became a growing concern. Many data pipelines were developed in an ad hoc fashion, without much visibility into the provenance of the data. Data quality issues also meant that data pipelines would often break, resulting in broken and outdated reports.
Analytics engineering emerged naturally as data teams looked for solutions to these issues. Rather than treating data integration, cleaning, query design, and metric definition as discrete workflows owned by different teams, analytics engineering as a practice considers these actions part of a holistic workflow, with the end goal of delivering transformed, high-quality data to downstream stakeholders in a scalable and standardized way. Instead of analysts and business teams interacting with raw data, or waiting on the data engineering team to build new pipelines and data cuts for their specific use case, the analytics engineer builds a baseline organized framework of cleaned tables, common queries, and well-defined metrics.
The role of an analytics engineer
Analytics engineers connect data systems and analytics teams. Their exact duties depend on their organization's data pipeline and use cases, but they generally include:
- Taking raw data and transforming it into clean, usable data
- Deploying and automating transformation pipelines
- Testing and documenting data pipelines
- Organizing query architectures
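As a rough illustration of the first and third duties above, here is a minimal Python sketch that turns raw rows into clean, typed rows and then tests the result. In practice this work usually happens in SQL inside the warehouse, so treat this as a stand-in rather than a recommended implementation. The rows, the column names, and the functions `clean_orders` and `check_clean_orders` are all hypothetical.

```python
# Hypothetical raw rows as they might arrive from a source system:
# everything is a string, there is a duplicate load, and one amount is unusable.
RAW_ORDERS = [
    {"order_id": "1001", "email": " Ada@Example.com ", "amount": "19.99"},
    {"order_id": "1002", "email": "grace@example.com", "amount": "5"},
    {"order_id": "1002", "email": "grace@example.com", "amount": "5"},
    {"order_id": "1003", "email": "", "amount": "n/a"},
]


def clean_orders(raw_rows):
    """Return typed, de-duplicated rows; skip rows whose amount cannot be parsed."""
    cleaned = {}
    for row in raw_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # unparseable amount: leave the row out of the clean table
        order_id = int(row["order_id"])
        email = row["email"].strip().lower() or None  # empty string becomes None
        # Keying by order_id means a repeated load overwrites instead of duplicating.
        cleaned[order_id] = {"order_id": order_id, "email": email, "amount": amount}
    return list(cleaned.values())


def check_clean_orders(rows):
    """Simple tests on the cleaned table: unique keys and non-negative amounts."""
    ids = [row["order_id"] for row in rows]
    assert len(ids) == len(set(ids)), "order_id must be unique"
    assert all(row["amount"] >= 0 for row in rows), "amount must be non-negative"


orders = clean_orders(RAW_ORDERS)
check_clean_orders(orders)
```

Keeping the tests next to the transformation means a bad upstream load fails loudly at this step instead of silently reaching a report.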
An analytics engineer aims to bridge the gap between the data engineers who architect data platforms and extract and load data into them, and the business teams who want to access that data. An effective analytics engineer focuses on preparing, documenting, and testing data while the data engineering team focuses on managing pipelines and optimizing performance.
For example, a data engineer would build out the tools required to build and maintain an effective data pipeline. This would involve not just maintaining the underlying data storage, but setting up tools to support performing data transformation, exploring data lineage, and running data pipelines.
Since data engineers build core tooling, they may not understand a specific team’s data use cases in detail. That’s where an analytics engineer comes in, creating the transformations, queries, and data sets a specific business requires to support data products and drive decision-making.
Analytics engineers also establish a framework of prepared, high-quality tables ready for use. This framework sets analytics teams up for success since they can skip data wrangling and hit the ground running on new projects. It also enables business teams to self-serve views and metrics since fundamental definitions have already been developed and computed.
Tools that an analytics engineer uses
Analytics engineers don’t always have this specific title. However, you can often identify them by the tools they use. There are a few primary tools analytics engineers use to connect data and analytics teams:
- Data warehouse platform: A storage and computation system where data is stored, structured, and manipulated. Often cloud-based, these tools include cloud storage services such as those offered by AWS and cloud computing systems like Snowflake, BigQuery, Databricks, and Redshift.
- Data transformation system: These are tools such as SQL and dbt that analytics engineers use to develop and implement queries. These systems allow complex query structures to be tested, deployed, and updated in an organized way.
- Version control system: Analytics engineers embrace software engineering best practices, so they use systems like Git to continuously integrate and version control analytics code.
How to tell if your team needs an analytics engineer
Now that you understand what an analytics engineer does, you may be wondering, “Does our team need one?”
Here are a few signs that hiring an analytics engineer would benefit your organization.
Teams want to self-serve data
More teams need the ability to spin up their own reports on demand in response to business needs. For example, the marketing team might want to know how many contacts through a contact form came from paid media.
Easy-to-use BI tools have made this simpler than ever. But those tools won’t do any good if the underlying data isn’t there. In the past, getting the right data often meant making a request to the data engineering team and waiting days - or weeks, or months - to reach the top of their queue. What’s more, managing these queries in a BI tool isn’t scalable and leaves your decision making prone to errors, inconsistencies, and bottlenecks.
With an analytics engineer, you can create a firm foundation for all current and future reporting your team requires in a fraction of the time. Your analytics engineer can focus on the data while your business personnel focus on…well, the business!
Problems with data quality
Teams that start pulling their own data often find that their analytics suffer from data quality issues. At best, this can delay critical business decisions. At worst, you may end up making decisions based on inaccurate data.
Data quality issues can also cause a data pipeline to break. For example, if a column gets renamed upstream from a dashboard, you could spend days or weeks with a broken tool until someone locates the conflict.
An analytics engineer has the tools and skills to inject data quality checks throughout the transformation process. Since they’re dedicated to your team, they understand your data and the needs of your business. That means they can work closely with you to ensure the data is always correct and in the format you need.
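To make the idea of a data quality check concrete, here is a minimal Python sketch of two checks that could run before a transformation: one that catches a renamed or dropped column, and one that counts missing values. The column names, the sample rows, and the helpers `missing_columns` and `null_count` are assumptions made for illustration. Dedicated transformation tools provide their own testing features, which this sketch does not try to reproduce.

```python
# Hypothetical set of columns that a downstream dashboard depends on.
EXPECTED_COLUMNS = {"order_id", "customer_id", "order_total"}


def missing_columns(rows, expected):
    """Return the expected column names that are absent from any row, sorted."""
    missing = set()
    for row in rows:
        missing |= expected - set(row)
    return sorted(missing)


def null_count(rows, column):
    """Count rows where the column is absent or None."""
    return sum(1 for row in rows if row.get(column) is None)


# Hypothetical upstream data in which order_total was renamed to total.
upstream_rows = [
    {"order_id": 1, "customer_id": 7, "total": 40.0},
    {"order_id": 2, "customer_id": None, "total": 12.5},
]

problems = missing_columns(upstream_rows, EXPECTED_COLUMNS)
if problems:
    print(f"Schema check failed, missing columns: {problems}")
print(f"Rows without a customer_id: {null_count(upstream_rows, 'customer_id')}")
```

Running a check like this at the start of a pipeline surfaces the renamed column immediately, rather than leaving someone to trace a broken dashboard back to its cause.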
Competing definitions
A common issue that impairs data trust is when different teams, or even people within the same team, get different numbers for the same metric. For example, the accounting and marketing teams may come to a board meeting with different figures for revenue in the current quarter. This can be due to varying definitions (one group accounts for churn and the other has not yet, or one group may not have accounted for returned items), different tools for querying the data (one group uses Power BI and the other Tableau), or other factors. These discrepancies undermine an organization’s ability to embrace data-driven decision making. With a modern approach stewarded by analytics engineers, these definitions can be built and maintained centrally, in a tool such as the dbt Semantic Layer, and downstream teams can query them from their analytics tool of choice.
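The effect of competing definitions is easy to see with a small example. The Python sketch below is not the dbt Semantic Layer or its API. It only illustrates, with hypothetical data and function names, how two ad hoc calculations of "revenue" diverge and how one shared definition removes the disagreement. Which definition is correct is a business decision. Excluding returns is an assumption made here for illustration.

```python
# Hypothetical order data shared by both teams.
ORDERS = [
    {"amount": 100.0, "returned": False},
    {"amount": 40.0, "returned": True},
    {"amount": 60.0, "returned": False},
]


def gross_revenue(orders):
    """One team's ad hoc calculation: sum every order, returned or not."""
    return sum(order["amount"] for order in orders)


def net_revenue(orders):
    """The single agreed definition (an assumption here): exclude returned orders."""
    return sum(order["amount"] for order in orders if not order["returned"])


# Two teams computing "revenue" separately arrive at different numbers.
print("Ad hoc revenue:", gross_revenue(ORDERS))

# When both teams call the same central definition, the numbers match.
accounting_view = net_revenue(ORDERS)
marketing_view = net_revenue(ORDERS)
print("Shared revenue:", accounting_view, marketing_view)
```

The point of centralizing is less about the arithmetic and more about ownership: the definition lives in one place, so a change to it reaches every team at once.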
Data engineers are overloaded
A huge reason for the emergence of analytics engineering was to address bottlenecks that naturally emerge when a very small subset of people—data engineers—have full responsibility for not only architecting data platforms, integrations, and transformations, but also servicing ad hoc requests from downstream stakeholders. Given the varied and exploding use cases for data, it’s not uncommon for data requests to take weeks, months, or even quarters to complete. Simply put, there was too much work to do, and not enough people with the skills and tooling to do it. If data-related requests are taking longer and longer for your data team to process, that’s a good sign they could use some help.
Sending all data requests through data engineering can also result in employee burnout and turnover. As we noted above, the data engineering team may be serving dozens of teams, and without a proactive way to keep up with requests, build on their work, and ensure data quality, they are liable to leave your organization in search of a more manageable work environment.
Bringing on an analytics engineer can take an enormous load off of the data engineering team’s shoulders. Your analytics engineer can focus on maintaining specific datasets, ensuring data quality, and interacting with the teams that consume analytics. That enables your data engineers to focus on the underlying data architecture, integrations, and maintenance.
Further resources for analytics engineers
Analytics engineers fill a valuable role in modern analytics workflows, connecting the data warehouse backend to the analytics and business intelligence insights that allow organizations to build competitive advantage with data. To learn more about analytics engineering, check out this introductory series.
Great analytics engineers also need great tools. Looking for a tool that makes the life of an analytics engineer easier? Book a demo of dbt Cloud to see how you can enable data transformation easily across your entire organization.
Last modified on: Oct 15, 2024