At this month’s meetup, we:
Met Mollie Pettit, GX’s new senior community product manager!
Got a tour of the brand-new GX Discourse forum
Learned how to do multi-Batch Checkpoints efficiently
And learned about work by this month’s contributors, the latest on the product roadmap, and more.
Watch the recording:
Sign up here to join the next one!
Thanks and kudos
Our Slack supporters are an indispensable part of keeping the GX community vibrant! Kudos to all our top Slack supporters for July:
And our GitHub contributors do great work every month. We want to especially recognize these contributors for July:
You can connect to Azure Blob Storage, Google Cloud Storage, and Amazon S3 more easily using Fluent Datasources now, thanks to Toivo Mattila's work adding recursive file discovery!
Hadas Manor made another useful update to expect_day_sum_to_be_close_to_equivalent_week_day_mean.
expect_queried_column_pair_values_to_be_both_filled_or_null makes its debut: thanks for this new column pair Expectation, Eden Omardeker!
Christian Bromann wrote new contributor workflow documentation. Docs written by contributors for contributors bring a whole other level of insight: thank you!
We are incredibly excited to welcome Mollie Pettit as our new senior community product manager!
Mollie began her professional career in geoscience before moving into data science and then data visualization engineering. She's always been drawn to fostering communities. One major example is her co-founding of the Data Visualization Society in 2019, after which she ran its global conference for two years.
Developer relations allows Mollie to combine her technical skills with community development. She officially joined GX full-time this month and is looking forward to getting to know the community!
You can reach her on the GX Slack as @Mollie Pettit and on Discourse @mollie.pettit.
We have Discourse
Slack is a great space for a lot of communication, but it falls short as a knowledge base. Most significantly, on its free plan it only keeps messages for 90 days. But even if message retention were longer, those messages wouldn't be publicly searchable outside the app, which makes it much harder for users to find answers to their questions.
To address both problems, we've revamped the GX Discourse forum to be a more welcoming and effective place for Q&A! Slack will remain a place for community members to connect in a more social setting, about basically anything that isn't a Q&A support question.
For current Slack users, we’ve established a pipeline to help ease the transition:
Discourse questions are automatically cross-posted to Slack, so you can post in Discourse and be sure that everyone will see it, even if they're sticking with Slack. Responses in the Slack thread are cross-posted back to Discourse.
In the meetup, Mollie gave a tour of the new setup:
Check out the Discourse at https://discourse.greatexpectations.io!
ICYMI: 🎉 Support for SQLAlchemy v2 and Pandas v2 is now live! 🎉
We’ve also made a number of improvements to GX Cloud. Most recently, we implemented a way to create and edit Expectations entirely in the UI and added historical charts for viewing Validation Results over time.
If you’d like to check out Cloud yourself, sign up for the Beta here!
Exploring the API: a walkthrough
Developer advocate Haebichan Jung walked us through some flowcharts that help you explore the GX API.
Lately, multi-Batch Checkpoints have been a hot topic in the GX Slack.
Several of the users we've heard from have been struggling with the same thing: getting all the Batches, not just one of them, to be processed. The common workaround is to create a separate Asset for each file, a separate Batch for each Asset, and then a separate Batch Request and Expectation Suite for each, and so on. This works but is fairly inefficient.
Our recommended solution is to use a single Asset. While this might not immediately sound more efficient, it makes a huge difference from a coding standpoint: our solution takes 5 lines of code to handle as many Batches as you have, whereas the workaround takes 3 lines per Batch.
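To make the single-Asset idea concrete, here is a minimal plain-Python sketch. It is not GX's API: the function names, file names, and regex are hypothetical. It assumes, in the spirit of a file-based Fluent Asset that partitions files with a regex, that one pattern with named groups describes every file, so each matching file becomes its own Batch and the same check runs over all of them without writing per-file Asset code.

```python
import re

# Hypothetical pattern standing in for a single file-based Asset:
# every file that matches becomes one Batch, keyed by the named groups.
BATCHING_REGEX = re.compile(r"yellow_tripdata_(?P<year>\d{4})-(?P<month>\d{2})\.csv")


def discover_batches(filenames, pattern=BATCHING_REGEX):
    """Return one batch identifier (path plus captured groups) per matching file."""
    batches = []
    for name in sorted(filenames):
        match = pattern.fullmatch(name)
        if match:  # files that don't match the pattern are ignored
            batches.append({"path": name, **match.groupdict()})
    return batches


def validate_all(batches, check):
    """Apply the same check to every batch and map each path to its result."""
    return {batch["path"]: check(batch) for batch in batches}
```

For example, discover_batches(["yellow_tripdata_2019-01.csv", "yellow_tripdata_2019-02.csv", "notes.txt"]) yields two batches, and validate_all runs one check over both. Adding another month's file requires no new code, which is the property that lets the single-Asset approach stay at a fixed number of lines while the per-file workaround grows with every Batch.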
GX developer advocate @Austin Robinson demonstrated the difference between these two solutions:
Join the conversation
Barrett asked what use cases there are for a minimum value that’s a range.
Jimmy3142 is looking for suggestions about validating whether a SQL query should merge data from the source table into the destination table.
Do you have ideas for creating a Custom Expectation that evaluates the min and max of a column while depending on a conditional in another column? Let Natalia Jiminez know!
Have you ever wished that GX could connect to CKAN? Cesar Garcia Saez is pondering how to make it happen.
Next month, we’re meeting on Tuesday, August 15! Get the invite here.
GX now has a Snowflake-specific Fluent Datasource! Also: initial support for Python 3.11.
Schema validation for streaming data is a good first step, but it's only the start of meaningful quality testing for streamed data: GX can deliver a myriad of important insights that on-arrival data validation will miss.
ICYMI: GX does not move or change your data, and we took a deep dive into why and how.
Have you done something cool with Great Expectations that you'd like to share? If you're interested in demoing or have a piece of data quality content that you'd like us to feature, DM @Mollie Pettit on our Slack.