Welcome to Great Expectations
Always know what to expect from your data
Great Expectations is a shared, open standard for data quality. It helps data teams eliminate pipeline debt through data testing, documentation, and profiling.
Quick Start
We recommend deploying within a virtual environment. If you’re not familiar with pip, virtual environments, notebooks, or git, you may want to check out the Supporting Resources section to set up your environment. It's as easy as these two commands:
View Our Full Documentation
Ready to dive in and start implementing? Head to our docs to take the next leap.
See Our Progress on GitHub
We keep our GitHub issues updated with what we are working on while addressing our community's issues.
Find Help on Slack
Feel free to ask us a question on Slack! There are always contributors and other users there.
Join and Discuss
A place where you can find and share advice on implementation and keep up with some of the more cutting-edge developments in Great Expectations.
Key Features
Great Expectations helps data teams eliminate pipeline debt through data testing, documentation, and profiling.
Expectations
Expectations are assertions for data. They are the workhorse abstraction in Great Expectations, covering all kinds of common data issues.
Expectations are declarative, flexible and extensible. They provide a rich vocabulary for data quality.
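To make "assertions for data" concrete, here is a minimal sketch in plain Python. The function names follow the `expect_...` naming style, but the implementations, the `ExpectationResult` class, and the sample rows are hypothetical stand-ins written for illustration, not the Great Expectations API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

Row = Dict[str, Any]


@dataclass
class ExpectationResult:
    """Outcome of one assertion (hypothetical type for this sketch)."""
    success: bool
    unexpected_indices: List[int] = field(default_factory=list)


def expect_column_values_to_not_be_null(rows: List[Row], column: str) -> ExpectationResult:
    # Collect the positions of rows where the column is missing or None.
    bad = [i for i, row in enumerate(rows) if row.get(column) is None]
    return ExpectationResult(success=not bad, unexpected_indices=bad)


def expect_column_values_to_be_between(
    rows: List[Row], column: str, min_value: float, max_value: float
) -> ExpectationResult:
    # Null values are skipped here; only present values are range-checked.
    bad = [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None
        and not (min_value <= row[column] <= max_value)
    ]
    return ExpectationResult(success=not bad, unexpected_indices=bad)


# Made-up sample data for illustration only.
rows = [{"passenger_count": 1}, {"passenger_count": 4}, {"passenger_count": None}]

null_check = expect_column_values_to_not_be_null(rows, "passenger_count")
range_check = expect_column_values_to_be_between(rows, "passenger_count", 1, 6)

print(null_check)
print(range_check)
```

Each assertion states what the data should look like rather than how to check it, and the result reports which rows broke the rule.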
Tests are docs and docs are tests
Many data teams struggle to maintain up-to-date data documentation. Great Expectations solves this problem by rendering Expectations directly into clean, human-readable documentation.
Since docs are rendered from tests, and tests are run against new data as it arrives, your documentation is guaranteed to never go stale. Additional renderers allow Great Expectations to generate other types of "documentation", including Slack notifications, data dictionaries, customized notebooks, etc.
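The idea of rendering docs from tests can be sketched in a few lines of plain Python. This assumes Expectations are stored as simple dictionaries; the `TEMPLATES` table, the `render_docs` function, and the column names are hypothetical and only illustrate the principle, not the library's actual renderers.

```python
from typing import Any, Dict, List

# Declarative Expectations stored as data (hypothetical format for this sketch).
EXPECTATIONS: List[Dict[str, Any]] = [
    {
        "type": "expect_column_values_to_not_be_null",
        "kwargs": {"column": "user_id"},
    },
    {
        "type": "expect_column_values_to_be_between",
        "kwargs": {"column": "age", "min_value": 0, "max_value": 120},
    },
]

# One human-readable sentence template per Expectation type.
TEMPLATES: Dict[str, str] = {
    "expect_column_values_to_not_be_null":
        "Values in column '{column}' must never be null.",
    "expect_column_values_to_be_between":
        "Values in column '{column}' must be between {min_value} and {max_value}.",
}


def render_docs(expectations: List[Dict[str, Any]]) -> List[str]:
    """Turn each stored Expectation into a line of documentation."""
    return [TEMPLATES[e["type"]].format(**e["kwargs"]) for e in expectations]


docs = render_docs(EXPECTATIONS)
for line in docs:
    print(line)
```

Because the sentences are generated from the same objects that drive the tests, changing an Expectation changes the documentation in the same step.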
Automated data profiling
Wouldn't it be great if your tests could write themselves? Run your data through one of Great Expectations' data profilers and it will automatically generate Expectations and data documentation. Profiling provides the double benefit of helping you explore data faster, and capturing knowledge for future documentation and testing.
Automated profiling doesn't replace domain expertise, and you will almost certainly tune and augment your auto-generated Expectations over time, but it's a great way to jump-start the process of capturing and sharing domain knowledge across your team.
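As a rough illustration of how a profiler can propose Expectations from sample data, consider the plain-Python sketch below. The `profile_column` function and its two rules are hypothetical and far simpler than a real profiler; they only show the shape of the idea.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def profile_column(rows: List[Row], column: str) -> List[Dict[str, Any]]:
    """Propose candidate Expectations from observed values (hypothetical helper)."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    proposed: List[Dict[str, Any]] = []

    # If no nulls were observed, propose that the column is never null.
    if values and len(non_null) == len(values):
        proposed.append(
            {"type": "expect_column_values_to_not_be_null", "kwargs": {"column": column}}
        )

    # If every observed value is numeric, propose the observed range as bounds.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    if non_null and numeric:
        proposed.append(
            {
                "type": "expect_column_values_to_be_between",
                "kwargs": {
                    "column": column,
                    "min_value": min(non_null),
                    "max_value": max(non_null),
                },
            }
        )
    return proposed


# Made-up sample batch for illustration only.
sample = [{"age": 31}, {"age": 45}, {"age": 27}]
suggestions = profile_column(sample, "age")
for suggestion in suggestions:
    print(suggestion)
```

The proposed bounds simply mirror the sample, which is why generated Expectations usually need tuning by someone who knows the data.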
Batteries-included data validation
Expectations are a great start, but it takes more to get to production-ready data validation. Where are Expectations stored? How do they get updated? How do you securely connect to production data systems? How do you notify team members and triage when data validation fails?
Instead of building these components for yourself over weeks or months, you can use Great Expectations to deploy production-ready validation in your data pipelines. This “Expectations on rails” framework plays nice with other data engineering tools, respects your existing namespaces, and is designed for extensibility. PS: If you’re interested in a hosted and managed data quality stack, purpose-built for better data collaboration, please reach out to us about Great Expectations Cloud.
Pluggable and extensible
Every component of the framework is designed to be extensible: Expectations, storage, profilers, renderers for documentation, actions taken after validation, etc. This design choice gives a lot of creative freedom to developers working with Great Expectations.
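One common way to make a component such as post-validation actions pluggable is a small registry. The sketch below assumes that pattern; `register_action`, `run_actions`, the action names, and the result dictionary are all hypothetical and do not describe how Great Expectations implements its extension points.

```python
from typing import Any, Callable, Dict, List

Action = Callable[[Dict[str, Any]], str]

# Registry mapping an action name to the function that implements it.
ACTIONS: Dict[str, Action] = {}


def register_action(name: str) -> Callable[[Action], Action]:
    """Decorator that adds a function to the registry under the given name."""
    def decorator(func: Action) -> Action:
        ACTIONS[name] = func
        return func
    return decorator


@register_action("log_summary")
def log_summary(result: Dict[str, Any]) -> str:
    status = "passed" if result["success"] else "failed"
    return f"Validation of '{result['suite']}' {status}."


@register_action("notify_on_failure")
def notify_on_failure(result: Dict[str, Any]) -> str:
    if result["success"]:
        return "No notification sent."
    return f"ALERT: '{result['suite']}' has failing Expectations."


def run_actions(result: Dict[str, Any], action_names: List[str]) -> List[str]:
    """Run the configured actions, in order, against one validation result."""
    return [ACTIONS[name](result) for name in action_names]


# Made-up validation result for illustration only.
messages = run_actions(
    {"suite": "orders", "success": False},
    ["log_summary", "notify_on_failure"],
)
for message in messages:
    print(message)
```

New behavior is added by registering another function, without touching the code that runs the actions.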
Case Studies
There are many amazing companies using Great Expectations these days. Check out some of our case studies with companies that we've worked closely with to understand how they are using Great Expectations in their data stack.
Blogs
Take a dive into our library of blogs! Our blogs range from thought leadership pieces like Eugene Mandel's "The Love Languages of Open Source Maintainers" or Sam Bail's "Why data quality is key to successful ML Ops" to in-depth looks at concepts inside Great Expectations.
Webinar
Webinars are a great way to get to know our product and see it in action. Check out our playlist of webinars that range from "Great Expectations 101: Getting Started" to the more advanced "Great Expectations 301: Great Expectations in a Hosted Environment".
Great Expectations Cloud
Great Expectations Cloud is a fully managed SaaS offering. We're taking on new private alpha members, who get first access to new features and input into the roadmap.
Apply here for the waitlist, and we'll reach out if there's an initial fit!
Trusted by
These companies have joined us in the battle against pipeline debt, and so should you!
Integrations
Pandas
Great for in-memory machine learning pipelines!
Spark
Good for really big data.
Postgres
Leading open source database.
BigQuery
Google's serverless, massive-scale SQL analytics platform.
Dagster
A data orchestrator for machine learning, analytics, and ETL.
Managed Spark Analytics Platform
Managed Spark Analytics Platform.
Leading open source database
Leading open source database.
Microsoft SQL Server
Microsoft's relational database management system.
Flyte
Flyte is a workflow automation platform for complex, mission-critical data and ML processes at scale.