
Community roundup: January 2023

January's community roundup features a demo of how to get the key of rows that fail Expectations, a roadmap update, and more!

Erin Kapp
January 19, 2023

We demoed new functionality, reviewed the product roadmap, and welcomed new team members at the Great Expectations January community meetup.

We gather as a community on the third Tuesday of every month. Sign up here to join the next one! 

This month, we covered:

  • Community stats, thanks, and kudos

  • A GX product roadmap update

  • Introducing Suzie Antal, senior product manager at GX

  • How to include primary keys in Validation Results, a new feature!

You can watch the complete recording below.

Thanks and kudos

The GX community is a huge part of our success, especially the people who contribute to the platform. Thank you to everyone who contributed on GitHub in December and January:

January 2023 GitHub contributors

Our Slack supporters are key to our community’s ecosystem, especially during those times when GX’s developer relations team can’t be available. 

Kudos to the top Slack supporters this month: Thiago Militino, Adarsha, Aleksei Chumagin, Amauri, Han Siong, Veronica Moi, Aravind Narayanan, Philip Fürste, and Dimitrios Truchan.

Roadmap update

Our core themes for the GX roadmap are usability, community, capability, and quality.

Here’s how those showed up in recent work on the GX platform:

  • Data Docs can now include failed rows! Watch a demo of this new ID/PK functionality.

  • We’ve improved the code quality of DataContexts, which will make maintenance and future work easier.

  • New API documentation uses Sphinx, so you’ll see the functionality you know and love from other Python projects coming to our API docs! Check out the progress here.

  • We’ve done a lot of cleanup on the documentation and tests for our core Expectations so that they’re a better example for community-contributed Expectations.

  • New integration guides are incoming: our integration guide for AWS S3 and Pandas is now available, with additional guides for Spark, Athena, and Redshift arriving within the next couple of weeks.

  • We now support Python 3.10! 

As a reminder: we have updated our workflows to help us better respond to community PRs! As part of this new routine, PRs that go more than five days without any update may be closed. Do not hesitate to reopen your PR if it was closed due to inactivity. We completely understand that everyone has different availability.

Welcome

We are very excited to introduce Suzie Antal as the senior product manager for GX Open Source!

Suzie has firsthand knowledge of the importance of data quality from her work in analytics. She’s looking forward to learning more about the community’s needs and pain points, so don’t hesitate to reach out to her on Slack @Suzie A.

Feature demo

Will Shin, a GX software engineer, showed off the platform’s new ID/PK feature, which returns the index of any rows that fail an Expectation.

You can watch that demo below, or read on for a summary.

Background

ID/PK is a great example of a GX feature that came to fruition because of the community.

Before ID/PK, Expectation Validation Results identified what was wrong with the data, but not where: there was no way to identify which particular rows failed the Expectation in most cases. The exception was Pandas users, who could use unexpected_index_list to create a list of index numbers using the default Pandas index. 
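To make the "what versus where" distinction concrete, here is a minimal plain-Python sketch. It does not use the GX API: the sample rows, the event_id column, and the helpers is_valid_amount and failing_positions are all hypothetical. It shows why a positional index, similar in spirit to the default Pandas index, only locates a failure within one in-memory table, while a primary key identifies the row wherever the data lives.

```python
# Hypothetical sample data; "event_id" stands in for a primary key column.
rows = [
    {"event_id": "a-101", "amount": 25},
    {"event_id": "a-102", "amount": -3},
    {"event_id": "a-103", "amount": 40},
    {"event_id": "a-104", "amount": None},
]


def is_valid_amount(value):
    """Illustrative check (not a GX Expectation): amount must be non-negative."""
    return value is not None and value >= 0


def failing_positions(records, column, check):
    """Return the 0-based positions of records whose column fails the check."""
    return [i for i, rec in enumerate(records) if not check(rec[column])]


# Positions only make sense relative to this particular list.
positions = failing_positions(rows, "amount", is_valid_amount)

# Primary keys still identify the rows after sorting, filtering, or reloading.
keys = [rows[i]["event_id"] for i in positions]

print(positions)
print(keys)
```

If the rows were sorted or reloaded from the source system, the positions could change while the keys would stay the same, which is the gap ID/PK is meant to close.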

The initial request for ID/PK came in a GitHub issue opened by KentonParton. In the following discussion, many community members contributed valuable insight as needs for this feature were fleshed out.

Community member Aidan Fennessy undertook the initial work on this feature, implementing ID/PK for Pandas, before passing the baton to the GX team to finish implementation. Thanks, Aidan and Kenton!

What ID/PK does

With ID/PK, you can specify the primary keys using unexpected_index_column_names in the result_format of a Checkpoint. The keys for rows that fail the Checkpoint’s Expectations will be included in the Unexpected Value Count table.

In addition to those keys, the output now by default includes a query that will allow you to retrieve all the rows that failed the Expectation. This output will vary slightly depending on whether you’re using Pandas, SQL, or Spark.

ID/PK also allows you to include multiple index column names. For details, watch the demo! 
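As a rough sketch of the configuration described above, a result_format with ID/PK enabled might look like the dictionary below. The column names event_id and region and the Checkpoint name are hypothetical, and placing the dictionary under runtime_configuration in a Checkpoint configuration is an assumption about the GX release in use, so check the documentation for your version.

```python
# Hypothetical primary key columns; replace these with columns from your data.
# Listing more than one name corresponds to multiple index column names.
primary_key_columns = ["event_id", "region"]

# result_format with ID/PK enabled. "COMPLETE" is the most detailed of the
# standard result format levels; unexpected_index_column_names comes from
# the feature described above.
result_format = {
    "result_format": "COMPLETE",
    "unexpected_index_column_names": primary_key_columns,
}

# Assumption: the Checkpoint configuration accepts result_format under
# runtime_configuration. Only the relevant fragment is shown; a real
# Checkpoint configuration needs more fields than this.
checkpoint_config_fragment = {
    "name": "my_checkpoint",  # hypothetical Checkpoint name
    "runtime_configuration": {"result_format": result_format},
}

print(checkpoint_config_fragment)
```

With a configuration along these lines, each failed row would be reported with its event_id and region values rather than only a count of unexpected values.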

Join the conversation

Additional updates

Have you done something cool with Great Expectations that you'd like to share? If you're interested in demoing or have a piece of data quality content that you'd like us to feature, DM @Kyle Eaton on our Slack.
