
First State of Data Quality Report uncovers a divided house

Our landscape study of data teams finds gaps in collaboration and understanding

Erin Kapp
January 25, 2023
A graphic of the cover and several interior pages from GX's State of Data Quality 2022 report

All of the information in our first State of Data Quality report comes from you: the data developers, engineers, analysts, and scientists that Great Expectations is built for!

Our goal in this first edition of the report was to identify and quantify how the standard tropes around data quality are playing out in the real experiences of data practitioners. To find out, we surveyed 500 data practitioners from across the United States and supplemented those results with a survey of the GX Slack community last year.

Here are some of our findings:

  • Is data quality a problem? Definitely: Data practitioners reported seeing symptoms of poor data quality throughout the data lifecycle. “Data scientists spending too much time on data preparation” may be a well-worn trope, but more than a quarter of respondents (26%) cited it as a top symptom of poor data quality. The most-reported symptom was production or product launch delays.

  • Is everyone on the same page? No: 55% of data engineers and 50% of data scientists rated their trust in their data as “high,” but only 34% of analysts felt the same.

  • Are processes in place? Sometimes: Respondents were split roughly equally between those who indicated that active data quality processes were in place and those who had either planned but not yet implemented such processes or were not planning any.

  • Does everyone agree on what data quality means? Nope: Three-quarters of respondents indicated that they were currently performing data validation… including many who said they didn’t have active data quality processes.

  • Is the data landscape settled? No way: Respondents were planning an average of 2.5 new data initiatives per person in the next quarter alone. While Java was, in aggregate, the language of the data team, Python, SQL, Scala, and R all saw plenty of use. And a survey of data stack technologies indicated a wide range of stack compositions.

In the report, you can read our full findings, including our analysis of some of the seemingly contradictory responses. Preview: we think a lack of measurement of capabilities and lack of alignment are huge contributing factors.

Download your copy of the report here.

To stay up-to-date with all of our data quality news and updates, sign up for our newsletter.


GX's State of Data Quality Report 2022 was authored by Abe Gong (CEO) and Erin Kapp (technical marketing writer).



©2024 Great Expectations. All Rights Reserved.