Great Expectations Case Study: Avanade
This case study describes how the data team at Avanade, a professional services company providing IT consulting and services focused on the Microsoft platform, is using Great Expectations to address data drift issues in their pipelines.
October 08, 2020
Avanade is a global professional services company providing IT consulting and services focused on the Microsoft platform. The company is a joint venture between Accenture and Microsoft, with 39,000 employees working in 25 countries. The main user of Great Expectations at Avanade, the Intelligent Enterprise Team, is located within the IT department, which focuses on servicing internal stakeholders with data insights into all relevant areas of the organization.
Avanade uses data to drive and support operational decisions. For that purpose, the Intelligent Enterprise (IE) Team collects and uses data from various sources, such as sales applications, HR systems, and collaboration data from their Office 365 platform. The team works with all departments in the organization, from sales and finance to HR and marketing, and provides stakeholders with a platform for unified data insights, reports, direct data access, and Data Science expertise.
One of the main challenges facing the Intelligent Enterprise Team when integrating data from so many distinct sources and departments is that upstream data models and taxonomies change frequently, and those changes were at risk of going unnoticed in their Machine Learning pipelines.
“Within our organization, we constantly run into taxonomy changes and business units that realign. We need to be able to know that so we can retrain our models, but we’re not always informed in advance.”
Steve Nelson, Data Scientist at Avanade
One such issue arose when the team noticed by chance that one of the top features feeding into their ML model had dropped to zero because of an issue “deep down” in the data warehouse, which would have severely impacted the model. Another example of a data drift problem is outlier values that might have been introduced into the data as “dummy values” without the team noticing. The team evaluated another tool to identify feature drift, but decided that Great Expectations provided the most transparency for users to see what changed in their data, without hiding it behind opaque metrics. Another factor in that decision was the validation report output in Data Docs, which provides a convenient way to consume the validation output.
How the Avanade team uses Great Expectations
The Intelligent Enterprise Team relies on infrastructure based on a mix of Microsoft Azure cloud products and open source tooling, such as an on-prem SQL Server data warehouse, Azure Synapse, Azure Cloud Storage, Azure Data Factory, Azure ML Service, Power BI, Pandas, scikit-learn, and dbt.
To create Expectations, the team uses the scaffolding feature to automatically profile the data and create an initial version of an Expectation Suite, which is then cleaned up manually. They then validate the input data using those Expectation Suites in their Azure ML pipeline. Each step in the pipeline is followed by a validation step: first the raw data is checked, and then the result of each transformation step is validated as well. The pipelines are configured to continue on validation failure, but they output an HTML table showing which Expectations failed for which feature. The team is also planning to create a custom store for validation results, so that they can collect metrics on every validation run over time.
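The validate-after-every-step pattern described above can be illustrated with a minimal sketch. It uses plain Python only and does not call the Great Expectations API or reproduce Avanade's actual pipeline; the helper names (`expect_not_all_zero`, `expect_values_between`, `run_pipeline`, `failures_to_html`), the column names, and the sample rows are all hypothetical and chosen only to mirror the two drift problems mentioned earlier (a feature dropping to zero and dummy outlier values).

```python
from html import escape


# Hypothetical checks; these are NOT Great Expectations functions.
def expect_not_all_zero(rows, column):
    """Pass if at least one row has a non-zero value in the column."""
    return any(row[column] != 0 for row in rows)


def expect_values_between(rows, column, low, high):
    """Pass if every value in the column lies within [low, high]."""
    return all(low <= row[column] <= high for row in rows)


def validate(step, rows, checks):
    """Run each (name, column, check) triple and record the outcome."""
    return [
        {"step": step, "feature": column, "expectation": name,
         "success": check(rows, column)}
        for name, column, check in checks
    ]


def run_pipeline(rows, steps, checks_by_step):
    """Validate the raw data, then validate the output of every step.

    Failures are collected rather than raised, so the pipeline
    continues on validation failure.
    """
    results = validate("raw", rows, checks_by_step.get("raw", []))
    for step_name, transform in steps:
        rows = transform(rows)
        results.extend(validate(step_name, rows, checks_by_step.get(step_name, [])))
    return rows, results


def failures_to_html(results):
    """Render the failed checks as a small HTML table."""
    header = "<tr><th>Step</th><th>Feature</th><th>Failed expectation</th></tr>"
    body = "".join(
        "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            escape(r["step"]), escape(r["feature"]), escape(r["expectation"]))
        for r in results if not r["success"]
    )
    return "<table>" + header + body + "</table>"


# Illustrative data: deal_size has collapsed to zero and 9999 is a dummy value.
raw = [{"deal_size": 0, "headcount": 12}, {"deal_size": 0, "headcount": 9999}]


def add_ratio(rows):
    """Example transformation step that derives a new feature."""
    return [dict(r, size_per_head=r["deal_size"] / r["headcount"]) for r in rows]


checks = {
    "raw": [
        ("not_all_zero", "deal_size", expect_not_all_zero),
        ("between_0_and_5000", "headcount",
         lambda rows, col: expect_values_between(rows, col, 0, 5000)),
    ],
    "add_ratio": [("not_all_zero", "size_per_head", expect_not_all_zero)],
}

final_rows, results = run_pipeline(raw, [("add_ratio", add_ratio)], checks)
report = failures_to_html(results)
print(report)
```

Collecting results instead of raising keeps the run going while still producing a per-feature overview. Persisting the `results` list from each run would be one simple way to track validation metrics over time.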
The Intelligent Enterprise Team reports that the biggest benefit of using Great Expectations has been the ability to catch data quality issues caused by upstream data changes before stakeholders notice.
We would like to thank the team at Avanade for their support in creating this case study!
Want us to publish your Great Expectations case study or add your logo to our site? Let us know here: