At this month’s meetup, we:
Celebrated this month’s contributors
Found out what’s next in GX Cloud
Learned more about using GX with Databricks
We’re hiring in developer relations! If you or anyone you know is interested in joining the GX team, don’t hesitate to apply.
You can watch the complete recording below:
The GX community gets together on the third Tuesday of the month: get your invitation to join the next one!
Thanks and kudos
We’re super excited to see the community continue to grow! That’s due in no small part to the efforts of our Slack supporters.
Kudos to all the long-standing contributors and new faces who are our top supporters for May:
Special recognition this month goes out to Tobias Bruckert, Will, Hadas Manor, and Rishi!
Tobias updated the classname for the MulticolumnDatetimeDifferenceInMonths Metric (#7734).
Will fixed a bug in expect_column_values_to_be_in_type_list that was causing sparkdf_datasets checks to fail (#7684).
Hadas fixed a bug in expect_day_count_to_be_close_to_equivalent_week_day_mean and deleted an Expectation that it had made redundant (#7782).
Rishi fixed a broken link in the README.md (#7780).
We also have some exciting community-contributed PRs in progress, which we’re looking forward to sharing next month!
Product updates
GX product manager Suzie Antal shared an update on what’s coming next for GX Cloud.
We’re sharing these Cloud updates in the community because we want GX Cloud to be a great tool for current GX OSS users who need to collaborate with other teams, particularly nontechnical ones. If that’s you, we want to hear what would help you most!
Coming next in Cloud:
Improved sharing of Validation Results to make collaboration faster and easier
UI-based Expectation creation—especially for less-technical users
A new data health dashboard
If you have thoughts or ideas about any of these—especially how we could make Expectations more accessible to less-technical users, or what metrics and features you would find most helpful for measuring your data health—please contact Suzie! You can reach her @Suzie Antal on the GX Slack or in the #gx-feedback channel.
Demo: GX and Databricks: a powerful alliance
You might have seen this blog post about quickly spinning up GX OSS on Databricks and accessing BigQuery public datasets. The repo at the center of that process was created by Tanner Beam, an analytics engineer at GX, to use in his own work.
In this presentation, Tanner builds on the content of that post by providing additional context and commentary while demonstrating the repo live. He also shares an example of how he operationalizes the repo by building a dashboard for the BigQuery data.
Thanks, Tanner!
Q&A
We had some great follow-up questions:
Emanuel asked about inserting this process in the middle of a data pipeline and options for triggering notebook runs.
He also asked about using GX with files on the order of 2GB.
Amani asked about using the GreatExpectationsOperator to validate a CSV file without loading the file to a database.
If we ran out of time for your question or you were having trouble connecting to ask it, please don’t hesitate to follow up on the GX Slack!
It’s easy to use
As additional evidence of how easy it is to get started with this repo, GX’s director of developer relations, Josh Zheng, successfully worked through the process described in the blog… right after logging into Databricks for the first time ever.
Join the conversation
Fraser shared a link to the recording of Dagster’s Building Better Analytics Pipelines event.
Josh Zheng announced that GX support is moving to Discourse! Learn more and share your feedback—we want to know what you think!
Pablo Contreras asked for ideas about building dashboards using data from the Validation Result Store.
José Correia is looking for people’s experiences with open-sourcing internal company projects.
Additional updates
Next month we’re meeting on Tuesday, June 20! Get the invite here.
GX co-founder and CEO Abe Gong shared more about how GX Cloud is different from Open Source and why.
We shared some videos about using the `mostly` parameter in ColumnMap Expectations and how to choose between Map and Aggregate Expectations.
Recently, we highlighted contributor Mateusz Kopeć!
We’re hiring in developer relations—check out our open roles.
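For readers who haven’t watched the videos mentioned above yet, here is a rough plain-Python sketch of the two ideas: how a `mostly` threshold relaxes a row-level (Map-style) check, and how that differs from an Aggregate-style check on a single column statistic. This is not GX code and does not use the GX API. The function names, the sample data, and the handling of an all-null column are assumptions made purely for illustration.

```python
from typing import Callable, Optional, Sequence


def map_style_check(
    values: Sequence[Optional[float]],
    predicate: Callable[[float], bool],
    mostly: float = 1.0,
) -> bool:
    """Hypothetical row-level check: every non-null value is tested on its own,
    and the check passes when the passing fraction is at least `mostly`."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        # Assumption for this sketch: nothing to evaluate counts as a pass.
        return True
    passing = sum(1 for v in non_null if predicate(v))
    return passing / len(non_null) >= mostly


def aggregate_style_check(
    values: Sequence[Optional[float]],
    min_mean: float,
    max_mean: float,
) -> bool:
    """Hypothetical column-level check: one statistic (the mean) is computed
    over the whole column and compared against a range."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        raise ValueError("cannot compute a mean over an empty column")
    mean = sum(non_null) / len(non_null)
    return min_mean <= mean <= max_mean


# Made-up sample column: ten non-null ages, one of them clearly bad (250).
ages = [34, 29, None, 41, 250, 38, 27, 45, 31, 36, 40]


def plausible_age(age: float) -> bool:
    return 0 <= age <= 120


strict = map_style_check(ages, plausible_age, mostly=1.0)    # one bad row fails it
tolerant = map_style_check(ages, plausible_age, mostly=0.9)  # 9 of 10 rows pass
mean_ok = aggregate_style_check(ages, min_mean=20, max_mean=60)

print(strict, tolerant, mean_ok)
```

In this sketch the row-level check catches the single bad value unless `mostly` is relaxed, while the mean-based check can still pass with that value present. That trade-off is one thing to weigh when choosing between Map and Aggregate Expectations.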
Have you done something cool with GX that you’d like to share? If you’re interested in presenting at a community meetup, or if there’s a topic you’d like to hear from the GX team on, DM @Josh Zheng on the GX Slack.