
How university data teams migrate to the cloud

Sep 11, 2024


Not all data challenges are the same. While certain universal principles apply in every context, different institutions and industries face distinct challenges when migrating from on-premises data stacks to a cloud-based approach.

As a solutions architect at dbt Labs, I’ve dealt primarily with banking and anti-fraud use cases. I was curious: what challenges do universities face when performing cloud data migrations? How do the structures of their institutions shape their day-to-day data work?

To find out, I recently met with three data team leaders to talk about these topics: Sarah Taylor, lead data engineer at RMIT University; Darren Ware, senior data engineer at RMIT University; and Cameron Mayer, head of data analytics at the University of Canterbury.

I talked with each leader about how they structure their teams, what motivated them to move from legacy data systems into a dbt Cloud-based data stack, and the challenges they faced along the way. I also drilled into what each team looks for when hiring new data engineers in this post-on-prem world.

Data team structure in large universities

I kicked off by asking each leader how they structure their teams. Sarah related that she returned from doing post-doctoral work to find that RMIT had centralized nearly everything into the data analytics area. The team was taking its initial steps toward working with dbt and working out its conceptual data models.

As a large team, RMIT’s data analytics group has the ability to specialize. “We have specialists in data management. We have stakeholder specialists - i.e., their only job is to talk to stakeholders.” On top of that, the team breaks up its day-to-day work into squads and guilds. All of these guilds come together one day every sprint to share common knowledge across their separate data domains.

Darren echoed the importance of Guild Day, especially in a school the size of RMIT. “The risk in a large organization is that people don’t talk to each other.” These days enable experts to come together not just to share knowledge of common problems but to get second opinions on unique challenges their teams might be facing that week.

The University of Canterbury, being a smaller team, works differently. “Our entire data team is probably the size of one of RMIT’s squads,” Cameron related. So instead, they moved everyone from specialty areas onto a single team responsible for managing the migration of student data. As a result, members of the smaller, close-knit team are all able to tackle issues across a variety of data domains.

From the old stack to the new stack

Moving technology stacks is always a process fraught with trial and error. Processes that worked brilliantly at the start, for example, might fall apart as you scale up.

In moving from the legacy systems onto dbt Cloud and Snowflake, the University of Canterbury team found that the main struggle was finding all of the relevant data from disparate legacy systems. “Along the way,” said Cameron, “we had to work out which bits we actually need and which are no longer relevant.”

The major benefit of the migration, said Cameron, is speed. A cloud-enabled stack means that the data engineering team can spin up new compute on demand, when they need it. That enables them to process data pipelines faster.

Previously, deans and other business stakeholders didn’t grasp how long it took to move data. They couldn’t understand why they’d wake up at 7 am to find that their Power BI dashboards for admissions hadn’t updated.

It was even worse if a data pipeline failed, as that generally meant waiting for the job to run the next day. “That’s when you get particularly angry phone calls from your boss’s boss,” he said.

For RMIT, one of the biggest reasons for migrating is increased resilience. Sarah related how their prior data pipelines ran on Python code that could take hours to execute. If it failed, you had to run it all over again. The system became brittle, and engineers feared making changes.

By contrast, in dbt Cloud, the team now has data models checked into source control. Engineers can run incremental builds against different environments. That enables them to test changes and run experiments without fear of bringing the entire system down.
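As a rough sketch of what that workflow looks like (the model and column names here are hypothetical, not RMIT’s actual project), an incremental dbt model checked into source control only rebuilds the rows that have changed since the last run:

```sql
-- models/marts/fct_enrolments.sql (hypothetical model for illustration)
{{ config(
    materialized='incremental',
    unique_key='enrolment_id'
) }}

select
    enrolment_id,
    student_id,
    course_code,
    updated_at
from {{ ref('stg_enrolments') }}

{% if is_incremental() %}
-- On incremental runs, only process rows newer than what's already built
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

Pointing a run at a development target (for example, `dbt run --select fct_enrolments --target dev`) builds the model into a separate development schema, which is what lets engineers test changes without touching production.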

“I love having this moment with new starters where you go, come on, you can run it, you're not going to break someone else's work,” Sarah said. “And they just go, really?”

Darren said he started at RMIT in a data management role. Whenever he needed new data, engineers would ask him what specific table and fields he needed from Snowflake—information he didn’t have. The university had no easy-to-use mechanisms for data discovery.

By contrast, data models, security classifications, and role-based access control in dbt Cloud mean that if a data manager has the permissions to see a table, they can find it via dbt Explorer. This provides a new level of visibility into data that RMIT previously didn’t possess.
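That discoverability comes from documenting models in YAML properties files, which dbt Explorer then surfaces. A minimal sketch (the model name, descriptions, and metadata key below are illustrative assumptions, not RMIT’s actual configuration):

```yaml
# models/marts/_marts.yml (illustrative file)
version: 2

models:
  - name: fct_enrolments
    description: "One row per student course enrolment."
    meta:
      security_classification: internal  # custom metadata key; an assumption
    columns:
      - name: enrolment_id
        description: "Surrogate key for the enrolment."
      - name: student_id
        description: "Links to the student dimension."
```

With descriptions like these in place, a data manager can search for “enrolment” in dbt Explorer and find the table, its columns, and its lineage without asking an engineer which Snowflake table to request.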

Prioritizing work

Another challenge when converting from legacy systems to a cloud-based system is how to prioritize work. Particularly, how do the teams at RMIT and the University of Canterbury balance short-term and long-term priorities?

Darren said one of the more difficult sticking points is setting expectations around data migration. Many of RMIT’s legacy systems have, as he put it, “crappy data models.” Often, stakeholders will come in demanding that the team lift-and-shift that data into dbt as is—and get it done yesterday, please.

To help manage this, RMIT has set up a multi-tier system for transforming data. Initially, data goes into a staging layer, where the team performs light transformations. From there, it moves into data marts, where all data must conform to the institution’s conceptual data model and dimensional modeling standards.
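A staging model in this kind of tiered setup typically does little more than rename, cast, and lightly clean the raw source. A hypothetical example (the source and column names are invented for illustration):

```sql
-- models/staging/stg_students.sql (hypothetical staging model)
with source as (
    -- Raw table from the legacy student management system
    select * from {{ source('student_system', 'students') }}
)

select
    id as student_id,                       -- rename to naming standards
    upper(trim(campus_cd)) as campus_code,  -- light cleanup only
    cast(enrol_dt as date) as enrolled_at   -- standardize types
from source
```

Heavier business logic, such as conforming the data to the shared conceptual model, is deferred to the mart layer.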

This approach gives RMIT a single location that acts as a trusted one-stop shop for all university data, enabling the team to standardize on concepts such as the student management lifecycle.

RMIT also uses projects for pre-mart data that a specific group requires for reporting. Projects act almost as a “cultivated drafts” area, where owners can experiment with different modeling approaches. Over time, the team can rework a project’s data in an agile manner, with the eventual goal of folding it into the centralized data mart. Eventually, projects transition to using the data mart models, where data is guaranteed to be reusable.

Life with dbt Cloud

RMIT has found that, as its usage grows, the team is pleasantly surprised by how well dbt Cloud fits its teams and the way they work with data.

“It’s kind of grown on us,” said Darren. “Usually, you get a sales pitch and you’re like, this is amazing. But then you use it and it doesn’t do this, or it doesn’t do that, and you’re like, My God, what have we done? With dbt Cloud, it’s been the complete opposite. We’re like, oh wow - we can.”

“We know one product alone isn’t going to make things work just by itself,” said Sarah. “We’re all reasonably jaded about, okay, here’s this new thing. And I was like, hey, this new thing is actually very useful because it fits with how we work.”

Cameron echoed these thoughts, saying there was a learning curve with dbt Cloud—but not a steep one. “All of our engineers have picked it up quickly. They all legitimately rave about it.” Cameron also applauded dbt Cloud’s cost-to-performance ratio compared to competitors' tools.

All three leaders reiterated that the visibility provided by dbt Cloud has proved one of its greatest assets. Instead of “walking on eggshells” (as Sarah put it) when making a data model change, engineers can instead look at dbt Cloud, see who uses a given data model, and engage their stakeholders early about changes and their potential impact.

The cloud data skills you’re looking for

I was curious about what each team looks for when hiring data engineers now that they’ve shifted into a cloud-based data stack. Are they looking for different skills than they did before?

“I probably wouldn’t look at someone who didn’t know dbt and Snowflake,” said Cameron. Given the tight-knit nature of his team, he argued it makes sense to hire someone who can speak the same language as his other engineers.

“There’s a certain amount of on-the-job training you can do,” he said. “But on a small team, if someone’s training someone else, you’ve lost 30% of one experienced person’s workload for a month.”

Sarah and Darren related that things are different at RMIT. As a larger university and data department, they’re better able to provide training and ramp-up assistance. Sarah said she looks for base skills (“SQL is non-negotiable”) so that she can focus on teaching engineers about running jobs, managing pull requests, and the particulars of life at RMIT.

Darren said the university frequently takes on interns from RMIT’s software development courses. The internships run for a year and occasionally convert into full-time jobs.

With interns, said Sarah, it’s great to have access to a pool of junior engineers who know software engineering fundamentals and who also have experience dealing with real-world data. “They’re happy to ask questions and call things out. That can be a hard quality to instill if it’s not already there.”

Conclusion

Both teams still support legacy data systems as part of their data estates, and will continue to do so. “That’s the nature of a university,” Sarah says.

However, moving more of their workloads to the cloud has enabled both institutions to provide their stakeholders with greater and more timely visibility into data. And all of them agreed that dbt Cloud has made the shift not just easier, but enjoyable.

Last modified on: Oct 15, 2024
