A powerful, flexible data quality solution

Profile, test, and document with an open source platform developed by data engineers, for
data engineers.

The toolset for data confidence

Seamless operation

GX fits into your existing tech stack, and can integrate with your CI/CD pipelines to add data quality exactly where you need it. Connect to and validate your data wherever it already is, so you can focus on honing your Expectation Suites to perfectly meet your data quality needs.

Start fast

Get useful results quickly, even for large data volumes. GX’s Data Assistants provide curated Expectations for different domains, so you can accelerate your data discovery and rapidly deploy data quality checks throughout your pipelines. Auto-generated Data Docs keep your data quality documentation up to date.

Unified understanding

Expectations are GX’s workhorse abstraction: each Expectation declares an expected state of the data. The Expectation library provides a flexible, extensible vocabulary for data quality, one that is human-readable and meaningful for technical and nontechnical users alike. Bundled into Expectation Suites, Expectations are the ideal tool for characterizing exactly what you expect from your data.
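To make the idea concrete, here is a minimal sketch in plain Python of what "declaring an expected state of the data" and bundling declarations into a suite could look like. It does not use the GX library or its API: `ToyExpectation`, `expect_not_null`, `expect_between`, and `run_suite` are hypothetical names invented for illustration, and the sample rows are made up.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class ToyExpectation:
    """Hypothetical stand-in for an Expectation: a readable description plus a check."""
    description: str
    check: Callable[[List[Row]], bool]


def expect_not_null(column: str) -> ToyExpectation:
    # Declares that no row may have a missing value in `column`.
    return ToyExpectation(
        description=f"values in '{column}' are never null",
        check=lambda rows: all(row.get(column) is not None for row in rows),
    )


def expect_between(column: str, low: float, high: float) -> ToyExpectation:
    # Declares that every value in `column` is present and within [low, high].
    return ToyExpectation(
        description=f"values in '{column}' are between {low} and {high}",
        check=lambda rows: all(
            row.get(column) is not None and low <= row[column] <= high
            for row in rows
        ),
    )


def run_suite(suite: List[ToyExpectation], rows: List[Row]) -> Dict[str, bool]:
    # Evaluates every expectation and keys the outcome by its description.
    return {e.description: e.check(rows) for e in suite}


# Made-up sample data: the third row is missing its order_id.
rows = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": 310.0},
    {"order_id": None, "amount": 12.5},
]

# A "suite" here is simply a list of declarations about the same data.
suite = [expect_not_null("order_id"), expect_between("amount", 0, 500)]
results = run_suite(suite, rows)

for description, passed in results.items():
    print("PASS" if passed else "FAIL", "-", description)
```

Because each check carries its own plain-language description, the results can be read by someone who never looks at the code, which is the property the paragraph above attributes to Expectations.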

Secure and transparent

GX doesn’t ask you to exchange security for your insight. It processes your data in place, on your systems, so your security and governance procedures can maintain control at all times. And because GX’s core is and always will be open source, its complete transparency is the opposite of a black box.

Data contract support

Checkpoints are a transparent, central, and automatable mechanism for testing Expectations and evaluating your data quality. Every Checkpoint run produces human-readable Data Docs reporting the results. You can also configure Checkpoints to take Actions based on the results of the evaluation, like sending alerts and preventing low-quality data from moving further in your pipelines.
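The pattern described here (run the checks, then act on the outcome) can be sketched in a few lines of plain Python. This is an illustration only, not the GX Checkpoint API: `run_toy_checkpoint`, `record_alert`, and the sample batch are hypothetical, and a real alert would go to a messaging or paging system rather than an in-memory list.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], bool]
Action = Callable[[Dict[str, bool]], None]


def run_toy_checkpoint(
    rows: List[Row], checks: Dict[str, Check], on_failure: List[Action]
) -> bool:
    """Hypothetical checkpoint: evaluate all checks, run actions if any fail."""
    results = {name: check(rows) for name, check in checks.items()}
    success = all(results.values())
    if not success:
        for action in on_failure:
            action(results)
    return success


alerts: List[str] = []


def record_alert(results: Dict[str, bool]) -> None:
    # Stand-in for an alerting action; it only records a message locally.
    failed = [name for name, ok in results.items() if not ok]
    alerts.append("failed checks: " + ", ".join(failed))


checks: Dict[str, Check] = {
    "email is present": lambda rows: all(r.get("email") for r in rows),
    "age is non-negative": lambda rows: all(r.get("age", 0) >= 0 for r in rows),
}

# Made-up batch: the second row has an empty email.
batch = [
    {"email": "a@example.com", "age": 31},
    {"email": "", "age": 44},
]

batch_ok = run_toy_checkpoint(batch, checks, on_failure=[record_alert])

# Gate the pipeline: only pass the batch downstream if every check succeeded.
published = batch if batch_ok else []
print(batch_ok, alerts, len(published))
```

Returning a single success flag keeps the gating decision in the calling pipeline, while the actions handle side effects such as notifications.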

Readable for collaboration

Everyone stays on the same page about your data quality with GX’s inspectable, shareable, and human-readable Data Docs. You can publish Data Docs to the locations where you need them in a variety of formats, making it easy to integrate Data Docs into your existing data catalogs, dashboards, and other reporting and data governance tools.

Ready to jump in?

Getting started with GX OSS is fast and easy. This tutorial walks you through how to set up your first local deployment of GX and shows you how to validate some sample data. Plus, with pre-built workflows and automated checklists from our community-driven Expectations Gallery, you’ll see value in minutes.

Extensible and flexible

GX works with the tools you know and love

Built on the strength of our ever-growing community

Our community is an inclusive space for data practitioners who want to improve data collaboration. With more than 9,000 data practitioners worldwide who have contributed to over 300 Expectations, our Slack community is the best place to get support from us and others working to maintain data quality at their organizations.

FAQs

  • What makes Great Expectations unique?

    One of the ways in which GX is unique among data quality platforms is its use of Expectations. An Expectation is an abstraction that allows users to express nearly any assertion about the data in a human-readable form. 

    Expectations are the cornerstones of core GX features, including Checkpoints and Data Assistants: they provide a rich vocabulary for expressing data quality.

  • What do I have to install to use Great Expectations?

    GX requires a minimal setup, and many data practitioners are likely to already have the required tools installed.

    Great Expectations is built in Python, and requires a Python environment; a virtual environment is recommended as a general best practice. GX is installed using the pip package manager, and its CLI frequently uses Jupyter Notebooks to provide you with boilerplate code and assistance. We recommend using Git or another version control system, but this is not required.

    Other than these basic tools, GX requires only a supported Execution Engine such as Pandas, Spark, or SQLAlchemy. The external system containing the data that you want to validate must be accessible via a standard interface such as a Pandas/Spark DataFrame, SQL, or filesystem.

  • How does Great Expectations support data contracts?

    GX’s Checkpoints and their generated Data Docs create inspectable artifacts that describe the alignment between data users. Data Docs in particular give nontechnical data users an accessible way to understand what a Checkpoint does and, through that, the contents of the data contract.

  • What is your data privacy policy? How does Great Expectations handle GDPR/CCPA and other data privacy laws?

    With Great Expectations, your data never leaves your environments. All computations on your data are done by your database, your Spark instance, and/or the local machine that is running GX.

    In other words, the privacy and compliance protocols you have set up for your existing technology remain in force when you deploy GX on those systems.

    For more information, see the GX privacy policy.

Join us as we work toward a shared open standard for data quality

©2023 Great Expectations. All Rights Reserved.