At this month’s meetup, we:
Learned about Great Expectations’ upcoming fluent-configuration Datasources
Explored recent usability improvements to DataContext
Saw community member Pavel Makarichev present an integration between GX and OpenDataDiscovery built with Custom Actions
Learned how to integrate GX with AWS and Pandas
And more!
You can watch the complete recording below:
The GX community gathers on the third Tuesday of every month: sign up here to join the next one.
Thanks and kudos
The GX community is a key part of our success!
Special recognition this month goes out to Thiago Militino, who’s answered many community members’ questions in Slack and been an active participant on GitHub, as well as to Itai Sevitt for his insightful questions and willingness to help out other community members.
Thank you to everyone who contributed on GitHub in the past 30 days:
And big kudos to our top Slack supporters this month:
Product updates
As always, our core themes for the GX roadmap are usability, community, capability, and quality.
Fluent-config Datasources
GX’s senior product manager for open source, Suzie, shared what we’ve been focusing on for the past month: fluent-configuration Datasources!
Fluent-config Datasources—first previewed last fall—will make it simpler to get started with GX by making it possible to set up a Datasource with just a couple of lines of code.
They’re slated to soft launch in the product and in GX documentation on March 2. This soft launch will have no breaking changes: the existing way of configuring Datasources, block configuration, will remain available and supported.
We’re especially interested in hearing about your experience with fluent-config Datasources during the month after the soft launch! If you have thoughts you’d like to share, please contact @Suzie A or @Tal in the GX Slack. Your feedback will be important in prioritizing future work on fluent-config Datasources.
DataContext cleanup
Chetan, a software engineer at GX, introduced some of the changes around DataContext that the team has been working on to make the user experience more consistent.
There are two main parts to this effort: state management and CRUD standardization.
To improve state management, the engineering team has changed some previously inconsistent behaviors for saving GX objects. GX now uniformly uses stores to persist objects, which makes saving more predictable.
The CRUD work standardizes the naming conventions used for CRUD methods on different GX objects. This standardization is particularly relevant for creation-related methods, where earlier inconsistencies in the use of terms like save, add, and create could lead to confusion.
All DataContext objects now use the following nomenclature for their save-related behavior:
ADD creates a new object, with an error if the object already exists.
UPDATE updates an existing object, with an error if the object does not already exist.
ADD_OR_UPDATE saves the object to store regardless of its prior existence.
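The three behaviors above can be illustrated with a minimal sketch. This is a toy in-memory store written for this recap, not GX code: the class `ToyStore` and its method names are hypothetical and only mirror the add / update / add-or-update semantics described in the list.

```python
class ToyStore:
    """Hypothetical in-memory store; NOT part of the GX API."""

    def __init__(self):
        self._objects = {}

    def add(self, name, obj):
        # ADD: create a new object; error if it already exists.
        if name in self._objects:
            raise ValueError(f"{name!r} already exists")
        self._objects[name] = obj
        return obj

    def update(self, name, obj):
        # UPDATE: replace an existing object; error if it does not exist.
        if name not in self._objects:
            raise KeyError(f"{name!r} does not exist")
        self._objects[name] = obj
        return obj

    def add_or_update(self, name, obj):
        # ADD_OR_UPDATE: save regardless of prior existence.
        self._objects[name] = obj
        return obj

    def get(self, name):
        return self._objects[name]


store = ToyStore()
store.add("my_suite", {"version": 1})            # succeeds: name is new
store.update("my_suite", {"version": 2})         # succeeds: name exists
store.add_or_update("other_suite", {"version": 1})  # succeeds either way
```

Calling `store.add("my_suite", ...)` a second time, or `store.update(...)` on a name that was never added, raises an error in this sketch, which is the distinction the new nomenclature is meant to make explicit.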
To maximize the consistency of CRUD behavior, some objects that previously lacked certain CRUD methods now have them. For complete details, see the DataContext API documentation.
Below is a summary of the supported CRUD methods for DataContext objects after these improvements.
The methods that have been deprecated as a result of these improvements are create_expectation_suite, save_expectation_suite, save_profiler, and save_datasource.
Deprecation for these methods will follow our standard approach: deprecation warnings will point to the new methods to use, as well as indicate the version in which the deprecation happened, when we’ll stop support, and how best to move away from the deprecated methods. We’ll also update GX docs and notebooks to remove the deprecated methods.
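As a rough illustration of that deprecation pattern, here is a generic Python sketch using the standard-library `warnings` module. The function names `save_thing` and `add_or_update_thing` and the version placeholders are hypothetical; this is not GX's actual implementation or messaging.

```python
import warnings


def add_or_update_thing(name):
    """Hypothetical replacement method."""
    return f"saved {name}"


def save_thing(name):
    """Hypothetical deprecated method that forwards to its replacement."""
    warnings.warn(
        "save_thing is deprecated as of version <X> and will be removed in "
        "version <Y>; use add_or_update_thing instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return add_or_update_thing(name)


# Capture the warning so the message can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = save_thing("my_suite")

print(result)
print(caught[0].message)
```

The old entry point keeps working during the deprecation window, while the warning names the replacement and the versions involved.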
User demo: Pavel Makarichev (Provectus)
Next, Pavel Makarichev from Provectus shared his experience of becoming a GX user from the point of view of integrations, with an in-depth look at how he uses Custom Actions to integrate GX with his OpenDataDiscovery (ODD) project.
Pavel approached GX not in a data engineering role but as the person in charge of integrating ODD with external systems. As a first step, he decided to get to know how GX works under the hood. Following the GX documentation, he worked through setting up, connecting to data, creating Expectations, and then validating the data.
Once familiar with GX, he considered integration options. Intermediate storage would offer more control and separate the layers, but it would also add another layer and require a data model. Rather than pursuing that route, Pavel decided to use GX’s Custom Actions to create an integration with ODD that provides a seamless experience from the user side.
Check out the video for a fascinating in-depth look at exactly how Pavel implemented Custom Actions to do things like extract configurations separately from results and provide custom messages!
Feature demo: AWS S3 data validation and documentation (part I)
Ruben, a developer advocate at GX, introduced our new set of end-to-end integration guides for AWS with a live demo of the Pandas use case. In today’s demo, Ruben walked us through how to set up the Validation Store, Expectation Store, and Data Docs website in S3:
Join next month’s meetup for part II of the demo, where we can get going with Pandas and the fun will really start!
Join the conversation
What kind of needs do you have around anomaly detection?
Do you have a real-world use case for time series Expectations? GX CEO Abe Gong would love to hear from you and put his weekend hacking to the test.
Benji Lampel and Tino Pietrassyk shared a great story about using GX and Airflow at FactoryPal! If that sounds interesting, don’t miss Benji’s webinar about using Airflow’s GX Operator on Feb 28.
Philip Fürste sought advice about using in-memory Checkpoints.
Jeff Katz is looking for companies to work with data engineering bootcamp students.
What type of content do you want to see more of from the GX team? Let us know!
If you’re in the Bay Area and use Airflow, Viraj Parekh invites you to check out some free events at their SF office next week.
Take the survey
How well is the GX Slack meeting your needs, and what could we do better?
Please take a few minutes to fill out our survey. Your responses will directly impact where we invest in the community!
If you have any questions, don’t hesitate to DM @Josh Zheng, GX’s director of developer relations.
Additional updates
Next month, we’re meeting on Tuesday, March 21, for part II of the AWS S3 data validation with Pandas demo and more! Get the invite here.
GX now lets you specify row identifiers (primary keys) and includes updates to the SQLAlchemy row condition parser, a new API action, and more!
Catch up on our API documentation improvements and new AWS integration guides…
…and don’t miss our interview with GX contributor Thiago Militino.
We’re hiring in engineering and developer relations! Check out our open roles.
Have you done something cool with Great Expectations that you'd like to share? If you're interested in demoing or have a piece of data quality content that you'd like us to feature, DM @Josh Zheng on our Slack.