Leading innovation across many, many industries
Employing 300,000+ people across the world
For the last 180 years, Siemens has led innovation across diverse industries, from healthcare to infrastructure. In the mobility business alone, Siemens produces the software and machinery used to design and build cars, and manufactures car batteries and charging stations.
An immense, complex, on-premises data infrastructure
Having operated for so long, Siemens found itself internally dealing with the consequences of a legacy, monolithic data infrastructure. The company used an on-prem HANA system, which was difficult to use, maintain, and grow.
The Siemens Data Cloud (SDC) project
Siemens started by defining the requirements for their data infrastructure. After reviewing the existing architecture, they landed on a new direction: an open ecosystem to unify all data products—from business intelligence to machine learning—and share those data products across the 70,000 internal data consumers.
Building for scale with data mesh
Given Siemens’ scale, the company chose the data mesh framework as a compass for achieving its data vision. It would change how teams access data assets and participate in data development, drawing on Siemens’ thousands of domain experts to increase data velocity.
Previous way of working
- Internal users locate the correct owners and manually request access to needed data assets—a process that can take weeks.
- BI and AI data products are created in silos and do not share governance, sources, or metrics.
- Data products are created for one team and often used only once.
- Data products are created in silos by data engineering and analysts, producing inconsistencies in metrics, duplicated logic, and code few can understand.
- Data pipelines are centralized and loaded sequentially. This leads to slow load times and unstable pipelines where one break can affect all data products.
Data mesh way of working
- Internal users self-serve data needs with a semi-automated process; access to raw and modeled data assets is granted within minutes.
- BI and AI data products are created within the same data stack and can share data sets.
- Data products are stored in accessible locations and repurposed by multiple teams.
- Data products are created with testing, documentation, continuous integration (CI), and observability—improving data quality and decreasing duplicated work.
- Data pipelines are loaded in parallel. There is workload isolation and, therefore, improved stability.
Siemens first assessed whether it could use the existing central stack for a data mesh setup. However, the team realized a “lift and shift” approach would fall short due to:
- Hardware capacity: Siemens could not keep up with the hardware needed to run the data mesh infrastructure on-premises.
- Expensive fixed costs: Costs for HANA were federated with limited consumption-based billing—leaving little opportunity for efficiency gains.
- Not built for purpose: Unlike on-premises ETL solutions, a decentralized cloud-based stack is built for expansive organizations like Siemens, offering better capabilities for managing high-volume data.
Delivering the SDC vision with dbt Cloud and Snowflake
To achieve its data mesh goals, Siemens needed an intuitive interface for internal users to discover, access, and model data while ensuring data governance. dbt Cloud, with its accessible browser-based IDE and shallow learning curve, was picked as the UI for data modeling in SDC.
Migrating from on-prem to the cloud with Accenture and dbt Labs
The data stack for SDC needed to be an open architecture, to enable and facilitate any future migrations or modernization projects. The two tools chosen as the backbone of the SDC enabled this flexibility: “With Snowflake and dbt, you’re agnostic to cloud providers or BI tools,” explained Tobi Humpert, Product Owner of SDC.
To set up their dbt project following best practices, the Siemens team enlisted the help of Accenture and dbt Labs’ services. Data teams completed dbt Labs-led onboarding sessions and group training, and leaned on the support of a dedicated dbt Labs Resident Architect. Siemens also used its internal Learning Management System (LMS) and dbt Labs-produced content to encourage dbt training, while tracking dbt learning and development across individual company divisions.
Setting up the infrastructure for data mesh with dbt Mesh
As a customer with large scale and complexity, Siemens participated in the closed beta of dbt Mesh: a set of features that enable companies to implement and maintain a data mesh infrastructure. This empowered Siemens to bring their new way of working to life with:
- Multi-project discovery: dbt Explorer provided the data team with complete lineage across hundreds of decentralized projects.
- Data contracts: dbt model contracts offer an additional validation layer for production code to govern data quality and prevent downstream issues.
- Federated governance: dbt’s built-in governance features enable the SDC team to define which datasets should be shared with the wider Siemens data community, and which should only remain within a certain project.
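To make the data contract idea concrete, here is a minimal Python sketch of the kind of check a contract implies: the declared column names and types are compared against what a model actually produces. This is an illustration only, not dbt's implementation. dbt validates a model's schema at build time, whereas this sketch inspects rows to stay self-contained. The names `ColumnSpec`, `ORDERS_CONTRACT`, and `contract_violations` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    """One column promised by a contract (hypothetical helper type)."""
    name: str
    dtype: type


# Hypothetical contract for an illustrative "orders" model.
ORDERS_CONTRACT = (
    ColumnSpec("order_id", int),
    ColumnSpec("amount", float),
)


def contract_violations(rows, contract):
    """Return human-readable violations; an empty list means the rows conform."""
    expected = {col.name: col.dtype for col in contract}
    problems = []
    for i, row in enumerate(rows):
        missing = expected.keys() - row.keys()
        extra = row.keys() - expected.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        if extra:
            problems.append(f"row {i}: unexpected columns {sorted(extra)}")
        for name, dtype in expected.items():
            if name in row and not isinstance(row[name], dtype):
                problems.append(f"row {i}: {name} should be {dtype.__name__}")
    return problems
```

Run before a change reaches production, a check like this would reject a change that renames a column or alters its type, so downstream consumers are not surprised by it.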
Reaping the benefits of a modern cloud-based data mesh infrastructure
Democratized data access decreases time-to-value
To increase data participation across the organization, Siemens launched the Siemens Data Cloud Marketplace where all employees can browse and purchase data products, such as platform components and ML models. The Marketplace enables stakeholders to reuse assets created by other teams, reduce duplicated work, and transform data teams into profit centers.
All data products live in one Snowflake account, managed by the central IT department. Employees request access to a Snowflake project, which bundles Snowflake, dbt Cloud, and Git access, and the process is automated end-to-end. Within minutes, analysts can self-serve the data they need and build data products on top of it:
“Already in our first dbt Cloud project we were amazed by the seamless collaboration dbt Cloud offers, allowing us to effortlessly work together on the same Snowflake project. With built-in tests, simple job scheduling, and easy deployment, dbt Cloud enabled us to immediately focus on the business case rather than spending time on our data architecture setup,” said Rebecca Funk, IT Business Partner at Siemens.
The new workflow eliminated the corporate IT bottleneck that had caused delays in data deliveries. Most importantly, it decentralized data production by enabling domain experts to own a much larger portion of the data development process:
“All teams are now working in the fields of their expertise, which empowers them to experiment within their data domains and innovate,” explained Tobi. “This set-up enables teams to scale independently, with faster development cycles and decreased risk of failure.”
Centralized governance complies with security and privacy standards
dbt Cloud’s federated governance works behind the scenes to enable Siemens’ domain teams to self-serve and participate in data transformation:
“The barrier to entry is low because end users don’t need to worry about contracts, control, or security. That’s already all defined centrally in dbt,” explained Tobi. The central IT team increased data velocity without compromising on security by:
- Defining contracts and access in dbt: User access is managed with row-level security and by publishing datasets into a distribution layer within Snowflake, all implemented in dbt. Meanwhile, contracts help ensure that new code doesn’t break downstream data products.
- Using a single Snowflake account and dbt instance: All stakeholders’ data sources and transformations live in a central place where IT can set guardrails, audit, and visualize the full lineage with dbt Explorer.
Increased efficiencies and 90% cost savings
The migration to the cloud enabled Siemens to move away from a fixed-cost structure to transparent usage-based pricing. In particular, dbt’s incremental materializations led to substantial efficiency gains, as only the latest data available is now loaded, as opposed to full table loads.
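The idea behind an incremental load can be sketched in a few lines of Python: on each run, only rows newer than the latest row already in the target are appended. This is a simplified illustration under stated assumptions, not how dbt or Snowflake implement it. The function name `incremental_load` and the `loaded_at` column are hypothetical.

```python
from datetime import datetime


def incremental_load(target, source):
    """Append only source rows newer than the latest row already in target.

    Both arguments are lists of dicts carrying a "loaded_at" datetime
    (a hypothetical column name). Returns the number of rows appended.
    """
    if target:
        # The newest timestamp already loaded acts as a high-water mark.
        high_water_mark = max(row["loaded_at"] for row in target)
        new_rows = [row for row in source if row["loaded_at"] > high_water_mark]
    else:
        # First run: nothing to compare against, so load everything.
        new_rows = list(source)
    target.extend(new_rows)
    return len(new_rows)


# Usage: the second run touches only the one row that arrived since the first.
source = [
    {"id": 1, "loaded_at": datetime(2024, 1, 1)},
    {"id": 2, "loaded_at": datetime(2024, 1, 2)},
]
target = []
incremental_load(target, source)
source.append({"id": 3, "loaded_at": datetime(2024, 1, 3)})
incremental_load(target, source)
```

The work done per run scales with the volume of new data rather than the size of the full table, which is where the efficiency gain over full table loads comes from.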
“In our on-premises stack, data for our business analytics dashboard would take six hours to load every day,” explained Nuno Pinela, Data Engineer at Siemens. All 35 ERP systems that fed the dashboard had to load sequentially. “With dbt, if a model is not dependent on another, they automatically run in parallel.” Today, that same data is loaded in 25 minutes, leading to a 90% reduction in costs.
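The parallel execution Nuno describes follows from treating models as a dependency graph: any models whose dependencies are already built can run at the same time. Below is a minimal sketch using Python's standard-library `graphlib`; it is not dbt's scheduler, and the model names and the `parallel_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies):
    """Group models into batches; every model in a batch can run concurrently.

    `dependencies` maps each model name to the set of models it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch as built, unblocking dependents
    return batches


# Hypothetical models: three independent ERP staging models feed one dashboard.
MODELS = {
    "stg_erp_a": set(),
    "stg_erp_b": set(),
    "stg_erp_c": set(),
    "dashboard": {"stg_erp_a", "stg_erp_b", "stg_erp_c"},
}
```

Here `parallel_batches(MODELS)` yields two batches: the three staging models together, then the dashboard model. A production scheduler would typically start each model as soon as its own dependencies finish rather than waiting for a whole batch, but the batching view shows why independent sources no longer have to load one after another.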
A successful migration
As of February 2024, over 700 projects had been migrated to the Siemens Data Cloud and dbt Cloud in a year and a half. Siemens has over 600 developers onboarded to dbt Cloud on a single dbt instance, maintaining 5,000+ active dbt models. The company celebrated phasing out its legacy SAP HANA system with a global party spanning five physical locations.