Data Governance vs. Data Quality: Where Do They Overlap?
Data Governance vs. Data Quality
Is data quality the same as data governance? When you’re managing data within an organization, you’ll often come across many different data-related terms. And let’s be honest: A lot of the concepts and principles in the data space can often seem vague. It’s not always clear what they describe and how they relate to each other.
As data experts, we know how tough it can be to pinpoint definitions for critical terms. In this post, we’ll take a look at all things data governance vs. data quality, and how you can better implement these concepts.
What Is Data Governance?
Data governance is a broad term that describes the processes and systems used to manage data properly across an organization. The main goals of a data governance framework are to increase efficiency and maximize data value while reducing the risk of data quality and data security issues.
One post uses an excellent analogy to describe data governance: If data were water, data governance would ensure that the right people with the right tools are responsible for the right part of the water system.
Examples of Data Governance
Now that we’ve talked about data governance in the abstract, let’s look at some concrete examples of a data governance strategy. There isn’t a single go-to definition of all aspects of data governance, but there are best practices.
Key components of a data governance framework are:
Data usability: The extent to which data is relevant, understandable, accessible, accurate, and timely.
Data security: Protection against accidental or intentional corruption, loss, or destruction of data assets.
Data availability: A measure of the percentage of time that data is available for processing.
Data preservation: Processes to conserve and maintain the integrity and safety of data.
You can take those components and apply them across all kinds of use cases to safeguard data.
One example of data governance is how companies handle access to sensitive data, such as protected health information (PHI) and personally identifiable information (PII).
This usually requires data governance procedures that allow only authorized individuals in an organization to access the data. It also involves technology solutions such as role-based access control in a database. Both play a crucial role in helping organizations avoid data breaches, reduce the risk of privacy concerns, and avoid fines.
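To make the role-based access control idea concrete, here is a minimal sketch in plain Python. The role names, table names, and the can_read helper are all hypothetical and chosen for illustration; a real deployment would typically rely on the database's own grants or an identity provider rather than application code like this.

```python
from dataclasses import dataclass

# Hypothetical mapping from role to the tables that role may read.
# Assumption: "patient_records" stands in for a table holding PHI/PII.
ROLE_PERMISSIONS = {
    "analyst": {"orders"},
    "clinician": {"orders", "patient_records"},
}


@dataclass(frozen=True)
class User:
    name: str
    role: str


def can_read(user: User, table: str) -> bool:
    """Return True only if the user's role has been granted the table."""
    # Unknown roles get an empty permission set, so access is denied by default.
    return table in ROLE_PERMISSIONS.get(user.role, set())


alice = User("alice", "analyst")
bob = User("bob", "clinician")
```

The deny-by-default lookup is the important design choice here: a role that nobody has explicitly granted access to cannot read sensitive tables.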
This is by no means an exhaustive look at data governance. However, understanding these ideas will help you establish a foundation.
What Does Data Quality Really Mean?
At a high level, we can say that good data quality means that the data we're using is suitable for its purpose and allows us to draw correct conclusions from it. More specifically, we often think about the six dimensions of data quality:
Accuracy: Does the data accurately reflect reality?
Completeness: Is all the data that's required for the use case available?
Uniqueness: Is the data free from unwanted duplicates?
Consistency: Is the data free from conflicting information?
Timeliness: Is the data sufficiently recent for the required use case?
Validity: Does the data adhere to an expected format?
When data meets these criteria, we can say that it is high-quality and suitable for use in analytical workflows. We can also consider the workflows required to achieve high-quality data as part of our data quality strategy.
For example, assume you’re integrating data from an e-commerce system into your data platform. You would need to take these steps to ensure data quality:
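Several of these dimensions can be checked mechanically. The sketch below is one possible way to do that in plain Python for four of them (completeness, uniqueness, validity, and timeliness). The column names, the simple email pattern, and the one-day freshness threshold are assumptions made for illustration. Accuracy and consistency usually need a trusted reference source, so they are left out.

```python
import re
from datetime import datetime, timedelta, timezone

# Deliberately simple pattern, an assumption for this sketch only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_quality(rows, now, max_age=timedelta(days=1)):
    """Return one pass/fail flag per data quality dimension."""
    ids = [r["order_id"] for r in rows]
    return {
        # Completeness: every row has an email value.
        "completeness": all(r.get("email") is not None for r in rows),
        # Uniqueness: no order_id appears twice.
        "uniqueness": len(ids) == len(set(ids)),
        # Validity: every email that is present matches the expected format.
        "validity": all(
            EMAIL_PATTERN.match(r["email"]) is not None
            for r in rows
            if r.get("email")
        ),
        # Timeliness: every row was updated within max_age of "now".
        "timeliness": all(now - r["updated_at"] <= max_age for r in rows),
    }


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": 1, "email": "a@example.com",
     "updated_at": datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)},
    {"order_id": 1, "email": None,
     "updated_at": datetime(2023, 12, 25, 9, 0, tzinfo=timezone.utc)},
]
report = check_quality(rows, now)
```

With this sample, the second row has a duplicate order_id, a missing email, and a stale timestamp, so only the validity flag passes.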
1. Start data analysis and data profiling to determine any potential issues you may need to consider when creating data pipelines, such as missing or inconsistent values.
2. Make sure any known issues get taken care of in the data transformation and cleansing stage of a pipeline.
3. Confirm that the data coming in consistently meets your expectations of what it should look like by implementing data testing and alerting.
All these components come together in a data quality strategy to ensure high-quality data.
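The three steps above can be sketched as three small functions. This is an illustrative outline in plain Python, not a production pipeline: the field names (sku, price, currency) and the specific cleaning and testing rules are assumptions, and alerting is reduced to returning a list of failure messages.

```python
def profile(rows):
    """Step 1: count missing values per column to surface potential issues."""
    columns = {key for row in rows for key in row}
    return {
        col: sum(1 for r in rows if r.get(col) in (None, ""))
        for col in columns
    }


def clean(rows):
    """Step 2: handle known issues (drop rows without a price,
    normalize currency codes to trimmed upper case)."""
    cleaned = []
    for r in rows:
        if r.get("price") in (None, ""):
            continue
        currency = str(r.get("currency", "")).strip().upper()
        cleaned.append({**r, "currency": currency})
    return cleaned


def run_checks(rows):
    """Step 3: test the cleaned data and return failure messages to alert on."""
    failures = []
    if not rows:
        failures.append("no rows received")
    for r in rows:
        if r["price"] < 0:
            failures.append(f"{r['sku']}: negative price")
        if len(r["currency"]) != 3:
            failures.append(f"{r['sku']}: invalid currency code")
    return failures


raw = [
    {"sku": "A1", "price": 9.99, "currency": " usd "},
    {"sku": "B2", "price": None, "currency": "USD"},
    {"sku": "C3", "price": -5.0, "currency": "eur"},
]
missing = profile(raw)
cleaned = clean(raw)
failures = run_checks(cleaned)
```

Here profiling reveals the missing price, cleansing removes that row and fixes the currency formatting, and testing still catches the negative price that cleansing was never designed to handle.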
How Do Data Quality and Data Governance Overlap?
Data quality is both a goal and a crucial component of data governance. One could say the relationship between data governance and data quality is symbiotic. Confused? We’ve got you covered.
By ensuring high-quality data, we ensure that stakeholders can make decisions based on correct and reliable data. At the same time, implementing other components such as data monitoring, cataloging, lineage, documentation, and access control is just as crucial, because these reduce the amount of risk you expose data to.
Data cleaning can address data quality issues and testing can highlight data problems. However, these workflows don’t eliminate the reasons why data issues occur in the first place. Even if our data is clean and well tested, we need to prevent incorrect use of it through processes like documentation and access control.
And that’s where many of the other aspects of data governance come into play.
How Great Expectations Helps Ensure Data Quality
Great Expectations (GX), a data quality platform, lets you assert what good-quality data looks like for any data asset you process in your data pipelines. By stating what you expect from the data you load and transform, you can catch data issues quickly, maintain quality, and improve communication between teams.
Here’s an example of how it works:
1. You start with an assertion about your data. We call these statements “Expectations”. For example, you can assert that the number of rows in a table should be between 1 and 6.
2. GX then uses this statement to validate whether the row_count in a given table is indeed between 1 and 6. The next time a data pull brings in a larger or smaller number of rows than expected, GX alerts you that this row count Expectation didn't pass.
3. Data engineers can then investigate to identify and fix the data issue. Or, they may investigate and discover that they need to change their expectations of the data (and subsequently their Expectations).
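The idea behind the row count example can be sketched in a few lines of plain Python. GX provides this check as the expect_table_row_count_to_be_between Expectation. The code below does not call the GX library. The function name, its parameters, and the shape of the result dictionary are hypothetical and only mimic the concept.

```python
def expect_row_count_between(rows, min_value, max_value):
    """Hypothetical stand-in for a row count Expectation.

    Returns whether the observed row count falls inside the
    inclusive range [min_value, max_value], plus the count itself.
    """
    observed = len(rows)
    return {
        "success": min_value <= observed <= max_value,
        "observed_value": observed,
    }


# A data pull with 3 rows meets the expectation of 1 to 6 rows.
ok_result = expect_row_count_between([{"id": i} for i in range(3)], 1, 6)

# A later pull with 10 rows does not, which is the signal to investigate.
bad_result = expect_row_count_between([{"id": i} for i in range(10)], 1, 6)
```

Returning the observed value alongside the pass/fail flag matters in practice: it tells the engineer whether the data is broken or whether the expected range itself needs updating.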
Great Expectations ships with several dozen pre-defined Expectations so data professionals can jump right in. And, if you don't see the Expectation you need in our gallery, you can create your own Custom Expectation.
GX also automatically generates Data Docs. These docs show the Expectations you define, as well as the state of your data after each validation run, which helps you operate at the intersection of data quality and data governance.
Bottom Line: Key Differences Between Data Quality and Data Governance
Remember: It’s not really data governance vs. data quality. Data quality is both a goal and a component of data governance. Implementing data quality measures such as data testing is part of a data governance strategy. Meanwhile, adhering to data governance principles is crucial to ensuring high-quality data.