We demo’d new functionality, reviewed the product roadmap, and welcomed new team members at the Great Expectations January community meetup.
We gather as a community on the third Tuesday of every month. Sign up here to join the next one!
This month, we covered:
Community stats, thanks, and kudos
A GX product roadmap update
Introducing Suzie Antal, senior product manager at GX
How to include primary keys in Validation Results, a new feature!
You can watch the complete recording below.
Thanks and kudos
The GX community is a huge part of our success, especially the people who contribute to the platform. Thank you to everyone who contributed on GitHub in December and January.
Our Slack supporters are key to our community’s ecosystem, especially during those times when GX’s developer relations team can’t be available.
Kudos to the top Slack supporters this month: Thiago Militino, Adarsha, Aleksei Chumagin, Amauri, Han Siong, Veronica Moi, Aravind Narayanan, Philip Fürste, and Dimitrios Truchan.
Our core themes for the GX roadmap are usability, community, capability, and quality.
Here’s how those showed up in recent work on the GX platform:
Data Docs can now include failed rows! Watch a demo of this new ID/PK functionality.
We’ve improved the code quality of DataContexts, which will make maintenance and future work easier.
New API documentation uses Sphinx, so you’ll see the functionality you know and love from other Python projects coming to our API docs! Check out the progress here.
We’ve done a lot of cleanup on the documentation and tests for our core Expectations so that they’re a better example for community-contributed Expectations.
New integration guides are incoming: our integration guide for AWS S3 and Pandas is now available, with additional guides for Spark, Athena, and Redshift arriving within the next couple of weeks.
We now support Python 3.10!
As a reminder: we have updated our workflows to help us better respond to community PRs! As part of this new routine, PRs that go more than five days without any update may be closed. Do not hesitate to reopen your PR if it was closed due to inactivity. We completely understand that everyone has different availability.
We are very excited to introduce Suzie Antal as the senior product manager for GX Open Source!
Suzie has firsthand knowledge of the importance of data quality from her work in analytics. She’s looking forward to learning more about the community’s needs and pain points, so don’t hesitate to reach out to her on Slack @Suzie A.
Will Shin, a GX software engineer, showed off the platform’s new ID/PK feature, which returns the index of any rows that failed an Expectation.
You can watch that demo below, or read on for a summary.
ID/PK is a great example of a GX feature that was able to come to fruition because of the community.
Before ID/PK, Expectation Validation Results identified what was wrong with the data, but not where: there was no way to identify which particular rows failed the Expectation in most cases. The exception was Pandas users, who could use unexpected_index_list to create a list of index numbers using the default Pandas index.
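To make the "what, but not where" distinction concrete, here is a minimal sketch in plain Python. It is an illustration only, not Great Expectations code: the rows, the order_id and amount columns, and the amount_in_range check are all made up for this example. It shows the difference between knowing which values failed and knowing which row positions they came from, the latter being roughly what a default positional index gives you.

```python
# Illustration only: plain Python, not the Great Expectations implementation.
# The data, column names, and check below are hypothetical.
rows = [
    {"order_id": "A-100", "amount": 25.0},
    {"order_id": "A-101", "amount": -5.0},
    {"order_id": "A-102", "amount": 12.5},
    {"order_id": "A-103", "amount": 250.0},
]


def amount_in_range(row, low=0.0, high=100.0):
    """Stand-in for an Expectation: amount must lie between low and high."""
    return low <= row["amount"] <= high


# "What" went wrong: the failing values, with no link back to their rows.
unexpected_values = [row["amount"] for row in rows if not amount_in_range(row)]

# "Where" it went wrong: zero-based row positions, similar in spirit to
# a list built from a default positional index.
unexpected_positions = [
    position for position, row in enumerate(rows) if not amount_in_range(row)
]

print(unexpected_values)
print(unexpected_positions)
```

Positions like these are only meaningful while the data stays in the same order, which is one reason a stable key column is more useful for tracking down failed rows.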
The initial request for ID/PK came in a GitHub issue opened by KentonParton. In the following discussion, many community members contributed valuable insight as needs for this feature were fleshed out.
Community member Aidan Fennessy undertook the initial work on this feature, implementing ID/PK for Pandas, before passing the baton to the GX team to finish implementation. Thanks, Aidan and Kenton!
What ID/PK does
With ID/PK, you can specify the primary keys using unexpected_index_column_names in the result_format of a Checkpoint. The keys for rows that fail the Checkpoint’s Expectations will be included in the Unexpected Value Count table.
In addition to those keys, the output now includes, by default, a query that lets you retrieve all the rows that failed the Expectation. This output will vary slightly depending on whether you’re using Pandas, SQL, or Spark.
ID/PK also allows you to include multiple index column names. For details, watch the demo!
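As a rough sketch of what such a configuration might look like, the snippet below builds a result_format dictionary with more than one key column. Only unexpected_index_column_names comes from the summary above. The "COMPLETE" result format and the return_unexpected_index_query key reflect our reading of the GX documentation and should be checked against the version you run. The build_result_format helper and the event_id and event_date column names are hypothetical.

```python
def build_result_format(key_columns, include_query=True):
    """Hypothetical helper that assembles a result_format dictionary.

    key_columns: names of the column(s) that identify a row.
    include_query: whether to ask for a query that retrieves failed rows
        (assumed key name: return_unexpected_index_query).
    """
    key_columns = list(key_columns)
    if not key_columns:
        raise ValueError("Provide at least one key column name.")
    return {
        # Assumption: a verbose result format is used so row-level
        # details are included in the results.
        "result_format": "COMPLETE",
        # From the meetup summary: the primary key column(s) to report.
        "unexpected_index_column_names": key_columns,
        # Assumption: toggles the query that retrieves the failed rows.
        "return_unexpected_index_query": include_query,
    }


# Hypothetical column names; a composite key uses more than one column.
result_format = build_result_format(["event_id", "event_date"])
print(result_format)
```

In a real project, a dictionary like this would be supplied as the result_format of a Checkpoint, for example under its runtime configuration. The exact placement depends on your GX version, so treat this as a starting point and follow the demo for the details.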
Join the conversation
We want to know: what behavior would you like to see in Data Docs?
Deepa KP is looking for suggestions about moving records to different Snowflake tables depending on whether they pass or fail Expectations.
Fraser shared a 10-minute walkthrough of the newest features in Dagster.
Monica Miller dropped the link to register for Datanova, a free virtual data conference.
Next month, we’re meeting on February 21: get your invite here.
We’re hiring engineers and a developer advocate! Check out our open roles.
Have you done something cool with Great Expectations that you'd like to share? If you're interested in demoing or have a piece of data quality content that you'd like us to feature, DM @Kyle Eaton on our Slack.