Don’t panic! Prefect and Great Expectations have got your data quality covered
We’ve been cooking up another fantastic collaboration with an open source project over the past few months! Prefect is an open source workflow management system that makes it easy to take your data pipelines and add semantics like retries, logging, dynamic mapping, caching, failure notifications, and more. While it was already possible to run validation with Great Expectations in a simple Python task, you can now use a more convenient pre-built RunGreatExpectationsValidation task in Prefect to kick off your data validation! Read on for more details, or jump straight to the Prefect docs to get started.
How does Great Expectations integrate with Prefect?
Prefect provides a Task Library that includes common task implementations and integrations with other tools in the data engineering ecosystem, such as Kubernetes, GitHub, Slack, Docker, AWS, and GCP. The integration with Great Expectations adds another task to this library: the RunGreatExpectationsValidation task. The task allows you to run validation with Great Expectations in one of the following ways:
- Using batch_kwargs that define your dataset, and an Expectation Suite
- Using a list of those batch_kwargs and Expectation Suite pairs
- Using a pre-configured Checkpoint, which bundles an Expectation Suite and a set of batch_kwargs
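The three options above can be sketched as a small helper that works out which mode a set of arguments describes. This is only an illustration: `select_validation_mode` is a hypothetical function, not part of Prefect or Great Expectations, and the parameter names (`checkpoint_name`, `batch_kwargs`, `expectation_suite_name`, `assets_to_validate`) as well as the "exactly one mode" rule are assumptions about how such a task might be configured. Check the Prefect docs for the task's actual signature.

```python
from typing import Any, Dict, List, Optional


def select_validation_mode(
    checkpoint_name: Optional[str] = None,
    batch_kwargs: Optional[Dict[str, Any]] = None,
    expectation_suite_name: Optional[str] = None,
    assets_to_validate: Optional[List[Any]] = None,
) -> str:
    """Return which of the three validation modes the arguments describe.

    Hypothetical helper for illustration only; the parameter names are
    assumptions, not a confirmed API.
    """
    # batch_kwargs and an Expectation Suite only make sense as a pair.
    if (batch_kwargs is None) != (expectation_suite_name is None):
        raise ValueError(
            "batch_kwargs and expectation_suite_name must be given together"
        )

    modes = {
        "checkpoint": checkpoint_name is not None,
        "single_batch": batch_kwargs is not None,
        "batch_list": assets_to_validate is not None,
    }
    chosen = [name for name, active in modes.items() if active]

    # Assumption: the modes are mutually exclusive, so exactly one must be set.
    if len(chosen) != 1:
        raise ValueError(
            f"expected exactly one validation mode, got {chosen or 'none'}"
        )
    return chosen[0]


# One call per option in the list above (all values are made up).
print(select_validation_mode(
    batch_kwargs={"path": "data/users.csv", "datasource": "files"},
    expectation_suite_name="users.warning",
))
print(select_validation_mode(assets_to_validate=[("batch placeholder", "suite")]))
print(select_validation_mode(checkpoint_name="users_checkpoint"))
```

Keeping the modes mutually exclusive means a misconfigured task fails early with a clear message, instead of silently preferring one input over another.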
If this sounds familiar, you’re absolutely right: Prefect has had a Great Expectations task for quite a while, but it was implemented specifically to run Checkpoints. This time we teamed up with the Prefect folks to develop a new version of the task that supports several ways of validating your data. It also comes with another exciting feature: in addition to running validation and storing the validation results in your pre-configured validation store, the new task renders the results to markdown and displays them directly in the Prefect UI. See the above screenshot for an example! This markdown rendering support is general-purpose functionality of Prefect, and the Great Expectations task is the first one to take advantage of it! If you’re curious, you can read more about Prefect’s Artifacts API in the Prefect docs.
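To give a rough idea of what "rendering results to markdown" means, here is a minimal sketch that turns a simplified list of expectation results into a markdown summary. It is a stand-in, not the renderer that Great Expectations or the Prefect task actually uses: `render_results_markdown` is a hypothetical function, and the shape of the result dictionaries (`expectation_type` and `success` keys) is an assumption made for the example.

```python
from typing import Any, Dict, List


def render_results_markdown(suite_name: str, results: List[Dict[str, Any]]) -> str:
    """Build a small markdown report from simplified validation results.

    Hypothetical helper; each result is assumed to be a dict with an
    "expectation_type" string and a boolean "success" flag.
    """
    passed = sum(1 for r in results if r["success"])
    lines = [
        f"# Validation results: {suite_name}",
        "",
        f"**{passed} of {len(results)} expectations passed**",
        "",
        "| Expectation | Status |",
        "| --- | --- |",
    ]
    # One table row per expectation.
    for r in results:
        status = "passed" if r["success"] else "failed"
        lines.append(f"| `{r['expectation_type']}` | {status} |")
    return "\n".join(lines)


# Made-up results for illustration.
example_results = [
    {"expectation_type": "expect_column_values_to_not_be_null", "success": True},
    {"expectation_type": "expect_column_values_to_be_unique", "success": False},
]
print(render_results_markdown("users.warning", example_results))
```

A string like this is the kind of content a workflow tool can attach to a task run and display in its UI, which is what the new task does with the real rendered validation results.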
Where can I learn more?
To get started with Prefect, check out the Prefect docs. To learn more about Great Expectations, check out our Getting started tutorial! And if you’re already familiar with Prefect and Great Expectations and want to start using the RunGreatExpectationsValidation task, hop over to the Prefect docs!