Key Features

Expectations

Expectations are assertions for data. They are the workhorse abstraction in Great Expectations, covering all kinds of common data issues.

Expectations are declarative, flexible and extensible.

  • expect_column_values_to_not_be_null
  • expect_column_values_to_match_regex
  • expect_column_values_to_be_unique
  • expect_column_values_to_match_strftime_format
  • expect_table_row_count_to_be_between
  • expect_column_median_to_be_between
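
To make the idea of a declarative assertion concrete, here is a minimal pure-Python sketch. It does not use the Great Expectations library: only the expectation name is borrowed from the list above, while the function body, the row format (a list of dicts) and the result fields are assumptions chosen for illustration.

```python
from typing import Any, Dict, List


def expect_column_values_to_not_be_null(
    rows: List[Dict[str, Any]], column: str
) -> Dict[str, Any]:
    """Illustrative stand-in: check that no row has a missing value in `column`."""
    # Collect the positions of rows where the column is absent or None.
    unexpected = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {
        "success": not unexpected,
        "unexpected_count": len(unexpected),
        "unexpected_index_list": unexpected,
    }


# Hypothetical sample data.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
]

id_result = expect_column_values_to_not_be_null(rows, "id")
email_result = expect_column_values_to_not_be_null(rows, "email")
```

The assertion states what the data should look like and returns a structured result instead of raising, so the same check can feed reports, notifications or documentation.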

Tests are docs and docs are tests

This feature is in beta

Many data teams struggle to maintain up-to-date data documentation. Great Expectations solves this problem by rendering Expectations directly into clean, human-readable documentation.

Since docs are rendered from tests, and tests are run against new data as it arrives, your documentation is guaranteed to never go stale. Additional renderers allow Great Expectations to generate other types of "documentation", including Slack notifications, data dictionaries, customized notebooks, etc.
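
One way to picture rendering tests into docs is a lookup from expectation type to a sentence template. The sketch below is plain Python written for illustration, not the library's renderer: the `TEMPLATES` table, the `render` function and the dict layout with `expectation_type` and `kwargs` keys are all assumptions.

```python
from typing import Any, Dict, List

# Hypothetical templates: one human-readable sentence per expectation type.
TEMPLATES = {
    "expect_column_values_to_not_be_null": "Values in {column} must never be null.",
    "expect_column_values_to_be_unique": "Values in {column} must be unique.",
    "expect_table_row_count_to_be_between": (
        "The table must have between {min_value} and {max_value} rows."
    ),
}


def render(expectation: Dict[str, Any]) -> str:
    """Turn one expectation description into a documentation sentence."""
    template = TEMPLATES.get(expectation["expectation_type"])
    if template is None:
        # Fall back to the raw name when no template is registered.
        return f"{expectation['expectation_type']} (no renderer available)"
    return template.format(**expectation["kwargs"])


# Hypothetical suite of expectations for a `users` table.
suite: List[Dict[str, Any]] = [
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "user_id"}},
    {"expectation_type": "expect_table_row_count_to_be_between",
     "kwargs": {"min_value": 1, "max_value": 1000}},
    {"expectation_type": "expect_column_median_to_be_between",
     "kwargs": {"column": "age", "min_value": 18, "max_value": 65}},
]

docs = [render(expectation) for expectation in suite]
```

Because the sentences are derived from the same objects that drive validation, editing a test changes its documentation in the same step.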

Automated data profiling

This feature is experimental

Wouldn't it be great if your tests could write themselves? Run your data through one of Great Expectations' data profilers and it will automatically generate Expectations and data documentation. Profiling provides the double benefit of helping you explore data faster, and capturing knowledge for future documentation and testing.

Automated profiling doesn't replace domain expertise (you will almost certainly tune and augment your auto-generated Expectations over time), but it's a great way to jump-start the process of capturing and sharing domain knowledge across your team.
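
As a rough illustration of how a profiler can propose tests from observed data, the toy function below suggests candidate expectations for each column. It is not one of the library's profilers: the `profile` function, its two heuristics and the dict layout are assumptions, and it expects hashable column values.

```python
from typing import Any, Dict, List


def profile(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Propose candidate expectations from the observed rows (toy heuristics)."""
    suite: List[Dict[str, Any]] = []
    columns = sorted({key for row in rows for key in row})
    for column in columns:
        values = [row.get(column) for row in rows]
        non_null = [value for value in values if value is not None]
        # No missing values observed: propose a not-null expectation.
        if len(non_null) == len(values):
            suite.append({
                "expectation_type": "expect_column_values_to_not_be_null",
                "kwargs": {"column": column},
            })
        # All observed values distinct: propose a uniqueness expectation.
        if non_null and len(set(non_null)) == len(non_null):
            suite.append({
                "expectation_type": "expect_column_values_to_be_unique",
                "kwargs": {"column": column},
            })
    return suite


# Hypothetical sample batch.
rows = [
    {"id": 1, "country": "US"},
    {"id": 2, "country": "US"},
    {"id": 3, "country": None},
]

suggested = profile(rows)
suggested_types = [item["expectation_type"] for item in suggested]
```

Heuristics like these only describe the sample they saw, which is why generated expectations still need review by someone who knows the data.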

Batteries-included data validation

Expectations are a great start, but it takes more to get to production-ready data validation. Where are Expectations stored? How do they get updated? How do you securely connect to production data systems? How do you notify team members and triage when data validation fails?

Great Expectations supports all of these use cases out of the box. Instead of building these components for yourself over weeks or months, you will be able to add production-ready validation to your pipeline in a day. This “Expectations on rails” framework plays nicely with other data engineering tools, respects your existing namespaces, and is designed for extensibility.

Pluggable and extensible

Every component of the framework is designed to be extensible: Expectations, storage, profilers, renderers for documentation, actions taken after validation, etc. This design choice gives a lot of creative freedom to developers working with Great Expectations.
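
To show what an extension point for "actions taken after validation" could look like, here is a small plugin-registry sketch in plain Python. The `ActionRegistry` class, the `notify_on_failure` action and the result dict are hypothetical and do not reflect the framework's actual interfaces.

```python
from typing import Any, Callable, Dict, List

ValidationResult = Dict[str, Any]
Action = Callable[[ValidationResult], None]


class ActionRegistry:
    """Hypothetical plugin point: callables that run after each validation."""

    def __init__(self) -> None:
        self._actions: List[Action] = []

    def register(self, action: Action) -> Action:
        self._actions.append(action)
        return action  # returning the callable lets `register` act as a decorator

    def run(self, result: ValidationResult) -> None:
        for action in self._actions:
            action(result)


registry = ActionRegistry()
sent_messages: List[str] = []


@registry.register
def notify_on_failure(result: ValidationResult) -> None:
    # Stand-in for a chat notification; a real plugin might call a webhook here.
    if not result["success"]:
        sent_messages.append(f"Validation failed for {result['suite']}")


# Hypothetical validation outcomes for two suites.
registry.run({"suite": "orders", "success": True})
registry.run({"suite": "users", "success": False})
```

New behavior is added by registering another callable, so the validation code itself never has to change.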


Recent extensions include:

  • Renderers for data dictionaries
  • BigQuery and GCS integration
  • Notifications to Mattermost

We're very excited to see what other plugins the data community comes up with!

Quick Start

pip install great_expectations

great_expectations init

We recommend deploying within a virtual environment. If you’re not familiar with pip, virtual environments, notebooks, or git, you may want to check out the Supporting Resources section to set up your environment.

View our full documentation

Ready to dive in and start implementing? Head to our docs to take the next leap.

See our progress on GitHub

We keep our GitHub issues updated with what we are working on while addressing our community's issues.

Find help on Slack

Feel free to ask us a question on Slack! There are always contributors and other users there.

Join us on Discuss

A place where you can find and share advice on implementing Great Expectations and keep up with some of the more cutting-edge developments in the project.

Trusted by

These companies have joined us in the battle against pipeline debt and so should you!

Integrations

Some integrations are not yet fully tested and documented. Please reach out on Slack with questions. If you're feeling really motivated, you can help us make our integrations better by contributing!

  • Pandas: Great for in-memory machine learning pipelines!
  • Spark: Good for really big data.
  • Postgres: Leading open source database
  • BigQuery: Google's serverless, massive-scale SQL analytics platform
  • Databricks: Managed Spark analytics platform
  • MySQL: Leading open source database
  • Microsoft SQL Server: Widely used commercial relational database
  • AWS Redshift: Cloud-based data warehouse
  • AWS S3: Cloud-based blob storage
  • Snowflake: Cloud-based data warehouse
  • Apache Airflow: An open source orchestration engine
  • Other SQL relational DBs: Most RDBMSs are supported via SQLAlchemy
  • Jupyter Notebooks: The best way to build Expectations
  • Slack: Get automatic data quality notifications!

Greetings! Have any questions about using Great Expectations? Join us on Slack.