
Data catalogs & data quality: Using Great Expectations with data catalog partners

How GX partners with data catalogs to help data quality reach its full potential, with visible tests-as-metadata & more collaborative power.

Sarah Krasnik
October 20, 2022
Figure: GX Data Documentation after new Validation Results. The screenshot shows a Validation Result which has found that 15.79% of values are unexpectedly not between 1 and 6.

Data quality reaches its full potential only when the data is well understood and the analytics team can respond to failures quickly. That’s why Great Expectations is partnering with data catalogs to grow the visibility of tests-as-metadata and offer more collaborative power.

For data catalog users, Great Expectations offers a robust data quality process that gives data teams and their stakeholders confidence in the data populating the catalog.

GX stores Validation Results as static yet flexible Data Docs. Data Docs can be published directly from GX for standalone access, or they can be viewed in the data catalog itself using our range of integrations with data catalog providers.

Interactive data catalogs enhance communication

A business has data… and that data has its own data. As data is transformed and crosses team boundaries, challenges begin to arise: consensus on column definitions erodes, and the origin of the data becomes hard to trace.

Those issues snowball, their very existence perpetuating even more issues, until meaningful clarity about the data and metadata’s movement through the business is nearly impossible to achieve.

Data catalogs offer a solution to these challenges: a way to unify the metadata about how operational data moves through tools, transformations, and teams. 

But a data catalog is more than just a list of data assets. With features like column-level lineage, clear ownership, and collaboration across the entire business, a data catalog can serve as a single platform for understanding everything the business knows about its data.

With their bird’s-eye view of upstream business logic, interactive data catalogs allow organizations to assign ownership for debugging failures and make troubleshooting easier.

And, in a key capability for data quality, interactivity paves the way for ensuring that decision-makers only use data that meets predefined expectations.

Great Expectations shares insight

GX has lightweight static data documentation built in. Our goal is to make it easy to share Validation Results with other tools, such as data catalogs.

After validations run, Data Docs show which tests succeeded and which failed, and display the specific values that failed. This sheds light on the specific issues found in data in motion.
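To make the idea of "failed values" concrete, here is a minimal sketch in plain Python of a range check that reports the share of unexpected values along with a few examples. It is not the GX API: the `CheckResult` class, the `check_values_between` function, and the sample data are all hypothetical, and the field names only loosely echo the kind of information a Validation Result carries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckResult:
    """Hypothetical, simplified stand-in for a validation result (not the GX API)."""
    success: bool
    element_count: int
    unexpected_count: int
    unexpected_percent: float
    partial_unexpected_list: List[float]


def check_values_between(values, min_value, max_value, max_examples=5):
    """Check that every value lies in [min_value, max_value] and report failures."""
    unexpected = [v for v in values if not (min_value <= v <= max_value)]
    percent = 100.0 * len(unexpected) / len(values) if values else 0.0
    return CheckResult(
        success=not unexpected,
        element_count=len(values),
        unexpected_count=len(unexpected),
        unexpected_percent=percent,
        partial_unexpected_list=unexpected[:max_examples],
    )


# Made-up sample data: 19 values, 3 of which fall outside the range 1..6.
sample = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 0, 7, 9]
result = check_values_between(sample, 1, 6)
print(f"{result.unexpected_percent:.2f}% unexpected, e.g. {result.partial_unexpected_list}")
# prints: 15.79% unexpected, e.g. [0, 7, 9]
```

Keeping a short list of example values next to the percentage is what lets a reader of the documentation see not just that a test failed, but what the bad data looks like.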


GX allows data catalogs to hook into specialized workflows for creating and updating Expectations. This integration means data catalog users can easily access the vast vocabulary available in Expectations and Expectation Suites.

How to use Great Expectations and data catalogs

Data quality stays top-of-mind when the results are in a place where both analytics teams and stakeholders frequently access information.

By adding a data testing and quality layer to a data catalog tool, GX users can make sure that all of the catalog’s users know the data is reliable.

Ongoing data testing is needed because decision-makers constantly interact with data, either via dashboards and analytics or through operational tools like CRMs or marketing platforms.

A data quality failure shouldn’t leave the analytics team feeling like a deer in the headlights. And with GX and a data catalog, they won’t.

Instead, analytics teams can have a repeatable plan of action:

Identify the issue. This is where the static documentation GX provides is most helpful. The Data Docs home page shows the most recent tests and the most recent failures, complete with example values.

Identify the root cause. Data catalogs show lineage, which can be used to trace a failure upstream to its source and to track the downstream data assets that haven’t updated yet. Stakeholders can take timely action to prevent further ripple effects from the issue.


Implement a fix. After the source of the issue is identified, comments can be documented in the data catalog and tasks assigned to those responsible for fixing the issue. With swift action, pipelines can continue running, and GX Data Docs will reflect successful fixes as soon as the next Validation completes.
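The lineage step in the plan above can be sketched as a simple graph traversal. This is an illustration only, assuming lineage is available as a mapping from each asset to the assets built directly from it. The asset names, the `LINEAGE` dictionary, and both helper functions are hypothetical and do not correspond to any particular catalog's API.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets built directly from it.
LINEAGE = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["orders_daily", "customer_ltv"],
    "orders_daily": ["revenue_dashboard"],
    "customer_ltv": [],
    "revenue_dashboard": [],
}


def downstream_assets(lineage, failed_asset):
    """Return every asset reachable from failed_asset, in breadth-first order."""
    seen, order = {failed_asset}, []
    queue = deque([failed_asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def upstream_assets(lineage, failed_asset):
    """Return direct and indirect parents of failed_asset (root-cause candidates)."""
    reversed_lineage = {}
    for parent, children in lineage.items():
        for child in children:
            reversed_lineage.setdefault(child, []).append(parent)
    # Walking the reversed graph "downstream" is the same as walking upstream.
    return downstream_assets(reversed_lineage, failed_asset)


print(downstream_assets(LINEAGE, "stg_orders"))
# prints: ['orders_daily', 'customer_ltv', 'revenue_dashboard']
print(upstream_assets(LINEAGE, "orders_daily"))
# prints: ['stg_orders', 'raw_orders']
```

The upstream list gives the team places to look for the source of a failure, while the downstream list shows which assets and stakeholders may be affected until the fix lands.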

Integrations with our data catalog partners

Great Expectations integrates with a range of data catalogs, making it easy for users to:

  • Add a data quality layer to their knowledge repository.

  • Showcase their data quality journey.

  • Improve their entire organization’s understanding of data.

Great Expectations is fast becoming the de facto standard for data quality. By integrating GX with a data catalog, companies can quickly upgrade their data quality capabilities.

Here’s an overview of some of our data catalog integrations:

Atlan

This active metadata platform uses Great Expectations as a profiling engine for its users. Leveraging the GX batch metrics store and Data Assistants, Atlan can give its users out-of-the-box insights about what shape their data should take. 

The Atlan and GX teams are working closely together on even deeper integrations, due later this year.

DataHub

Brought to life by developers at LinkedIn, DataHub is an open source metadata management platform. It automatically surfaces validation outcomes from Great Expectations in its UI, with validation management through the API.

DataHub treats GX assertions as metadata events and as documentation, which means that users can automatically circuit-break orchestration pipelines when an Expectation fails.
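As a rough illustration of the circuit-breaking idea, the sketch below stops a pipeline before its downstream steps run if any check has failed. It is plain Python under stated assumptions, not the DataHub or GX API: the `DataQualityError` class, the `circuit_break` and `run_pipeline` functions, and the shape of the `outcomes` dictionaries are all hypothetical.

```python
class DataQualityError(RuntimeError):
    """Raised to stop a pipeline when a data quality check fails."""


def circuit_break(validation_outcomes):
    """Raise if any outcome failed; otherwise return normally.

    `validation_outcomes` is a hypothetical list of dicts with
    'expectation' and 'success' keys, not a DataHub or GX schema.
    """
    failed = [o["expectation"] for o in validation_outcomes if not o["success"]]
    if failed:
        raise DataQualityError(f"Halting pipeline; failed checks: {failed}")


def run_pipeline(validation_outcomes, downstream_steps):
    """Run the downstream steps only if all checks passed."""
    circuit_break(validation_outcomes)
    return [step() for step in downstream_steps]


# Made-up outcomes: one passing check and one failing check.
outcomes = [
    {"expectation": "order_id is not null", "success": True},
    {"expectation": "rating is between 1 and 6", "success": False},
]

try:
    run_pipeline(outcomes, [lambda: "publish dashboard"])
except DataQualityError as err:
    print(err)
```

Raising an exception, rather than returning a flag, means that most orchestrators will mark the task as failed and skip dependent tasks without extra wiring.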

Secoda

Data discovery app Secoda brings Google-style searching to metadata, queries, charts, and documentation management. When integrated with Great Expectations, Secoda shows all active GX validation tests, plus an overview of the results.

Secoda users who have integrated with GX can automatically take actions like unpublishing data assets that fail a quality test.


Do you want your favorite data catalog to integrate with Great Expectations? Let us know!


