We recommend deploying within a virtual environment. If you’re not familiar with pip, virtual environments, notebooks, or git, you may want to check out the Supporting Resources section to set up your environment. Getting started is as easy as these two commands:
>_ pip install great_expectations
>_ great_expectations init
Great Expectations helps data teams eliminate pipeline debt through data testing, documentation, and profiling.
Tests are docs and docs are tests
Many data teams struggle to maintain up-to-date data documentation. Great Expectations solves this problem by rendering Expectations directly into clean, human-readable documentation.
Since docs are rendered from tests, and tests are run against new data as it arrives, your documentation is guaranteed to never go stale. Additional renderers allow Great Expectations to generate other types of "documentation", including Slack notifications, data dictionaries, and customized notebooks.
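To make the "tests are docs" idea concrete, here is a minimal plain-Python sketch. It does not use the Great Expectations API: the ColumnNotNull class and its validate and render methods are hypothetical names invented for illustration. It only shows how one declarative object can drive both the check and its human-readable description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnNotNull:
    """Hypothetical expectation: every row has a non-null value in `column`."""

    column: str

    def validate(self, rows):
        # Test side: True only if no row is missing the column or holds None.
        return all(row.get(self.column) is not None for row in rows)

    def render(self):
        # Docs side: the same definition rendered as a sentence.
        return f"Values in column '{self.column}' must never be null."


rows = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
expectation = ColumnNotNull("email")

print(expectation.render())
print("passed" if expectation.validate(rows) else "failed")
```

Because the sentence is generated from the same object that runs the check, changing the expectation changes the documentation with it.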
Automated data profiling
Wouldn't it be great if your tests could write themselves? Run your data through one of Great Expectations' data profilers and it will automatically generate Expectations and data documentation. Profiling provides the double benefit of helping you explore data faster, and capturing knowledge for future documentation and testing.
Automated profiling doesn't replace domain expertise (you will almost certainly tune and augment your auto-generated Expectations over time), but it's a great way to jump-start the process of capturing and sharing domain knowledge across your team.
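The profile-then-tune workflow can be sketched in plain Python. This is an illustration under stated assumptions, not the library's profiler: profile_numeric_column, check_range, and the dictionary layout are hypothetical, and the "profile" here is just the observed minimum and maximum of a sample.

```python
def profile_numeric_column(rows, column):
    """Hypothetical profiler: derive a min/max range expectation from sample rows."""
    values = [row[column] for row in rows if row.get(column) is not None]
    if not values:
        raise ValueError(f"no non-null values found in column '{column}'")
    # The observed range becomes a draft expectation that a person should review.
    return {"column": column, "min_value": min(values), "max_value": max(values)}


def check_range(expectation, rows):
    """Return the rows whose value falls outside the expected range."""
    column = expectation["column"]
    return [
        row for row in rows
        if not expectation["min_value"] <= row[column] <= expectation["max_value"]
    ]


sample = [{"age": 23}, {"age": 41}, {"age": 35}]
draft = profile_numeric_column(sample, "age")

# Domain expertise step: widen the auto-generated upper bound by hand.
tuned = {**draft, "max_value": 120}

new_batch = [{"age": 30}, {"age": 67}, {"age": -1}]
print(check_range(draft, new_batch))   # draft flags 67 and -1
print(check_range(tuned, new_batch))   # tuned flags only -1
```

The draft range is too tight because the sample was small, which is the kind of gap that manual tuning is meant to close.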
Batteries-included data validation
Expectations are a great start, but it takes more to get to production-ready data validation. Where are Expectations stored? How do they get updated? How do you securely connect to production data systems? How do you notify team members and triage when data validation fails?
Instead of building these components for yourself over weeks or months, you can use Great Expectations to deploy production-ready validation in your data pipelines. This “Expectations on rails” framework plays nicely with other data engineering tools, respects your existing namespaces, and is designed for extensibility. PS: If you’re interested in a hosted and managed data quality stack, purpose-built for better data collaboration, please [reach out](link) to us about Great Expectations Cloud.
Pluggable and extensible
Every component of the framework is designed to be extensible: Expectations, storage, profilers, renderers for documentation, actions taken after validation, etc. This design choice gives a lot of creative freedom to developers working with Great Expectations.
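As a rough illustration of what "actions taken after validation" being pluggable could look like, here is a small registry pattern in plain Python. It is not how Great Expectations implements extensibility: ACTIONS, register_action, run_actions, and the shape of the result dictionary are all assumptions made for this sketch.

```python
from typing import Callable, Dict

# Hypothetical registry mapping an action name to a callable.
ACTIONS: Dict[str, Callable[[dict], str]] = {}


def register_action(name):
    """Decorator that plugs a new post-validation action into the registry."""
    def decorator(func):
        ACTIONS[name] = func
        return func
    return decorator


@register_action("log")
def log_result(result):
    return f"validation {'succeeded' if result['success'] else 'failed'}"


@register_action("notify")
def notify_team(result):
    # A real action might post to a chat channel; here we only build the message.
    return f"notify: {result['failed_checks']} check(s) failed"


def run_actions(result, names):
    """Run the named actions in order and collect their outputs."""
    return [ACTIONS[name](result) for name in names]


print(run_actions({"success": False, "failed_checks": 2}, ["log", "notify"]))
```

Adding a new behavior means registering one more function; the code that runs validation does not need to change.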