We get the community together on the third Tuesday of every month—usually! This time, we met on the fourth Tuesday, October 25.
At meetups, we discuss the Great Expectations Roadmap, watch ecosystem integration demos, and explore different ways data leaders have implemented Great Expectations. Sign up here to join the next one!
Now, let's dive into the roundup.
Our new approach to data sources
Community update and celebrations
Cloud early adoption surveys
For October’s feature demo, Tal Gluck previewed our new style of data sources.
Watch the complete presentation here:
Or read on for a summary!
Data sources now
Right now, configuring a Datasource in GX involves writing a YAML file. This generic approach (YAML) to a specific job (connecting to a data source) lets us support a lot of scenarios!
But it has a few drawbacks: if your configuration falls off the rails, it can be tricky to get it back on. And if you aren’t personally working with a lot of data sources on a regular basis, that generality isn’t much help to you.
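For context, a classic YAML-style Datasource configuration looks roughly like this. This is a representative sketch of a Pandas filesystem Datasource, not an exhaustive reference; the exact fields depend on which execution engine and data connector you use:

```yaml
# A Pandas Datasource reading CSV files from a local directory (illustrative).
name: my_pandas_datasource
class_name: Datasource
execution_engine:
  class_name: PandasExecutionEngine
data_connectors:
  my_connector:
    class_name: InferredAssetFilesystemDataConnector
    base_directory: ./data
    default_regex:
      pattern: (.*)\.csv
      group_names:
        - data_asset_name
```

Every level of nesting here is a place where a typo or mis-indentation can leave you with a configuration that's hard to debug, which is part of what the new approach aims to fix.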
Goals of the new approach
Be more usable (including more Pythonic!) than the current way
Make it easier to work with multiple objects at once (including multiple Batches)
Make it easier to build and contribute new data sources
Be fully compatible with the old data sources: no breaking changes!
Some of the new features
Regex parser for automatically adding components of the filename to metadata
String sorter for assets
Batch name template
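To give a feel for the regex-parser idea, here's a sketch in plain Python. The pattern and field names are hypothetical, not the actual GX implementation: a regex with named groups pulls metadata components straight out of a filename.

```python
import re

# Hypothetical filename convention: "<dataset>_<YYYY>-<MM>-<DD>.csv".
# Each named group becomes a metadata field on the asset.
FILENAME_PATTERN = re.compile(
    r"(?P<dataset>[a-z]+)_(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\.csv"
)

def filename_metadata(filename: str) -> dict:
    """Extract metadata components from a filename via named regex groups."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"filename does not match expected pattern: {filename!r}")
    return match.groupdict()

print(filename_metadata("sales_2022-10-25.csv"))
# {'dataset': 'sales', 'year': '2022', 'month': '10', 'day': '25'}
```

The appeal over the YAML equivalent is that the pattern lives in ordinary Python, so a bad pattern fails loudly at the point of use instead of producing a silently misconfigured connector.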
How it looks
The initial data source types that will be supported in the new way are Pandas and PostgreSQL. Once we have these two, we’ll continue to build out support for more sources.
We want community feedback, so if you’d like to be involved in testing the new-style data sources, ping @Tal Gluck in our Slack.
We’ve been sending out surveys about GX Cloud to the community! If you’re interested in being an early adopter, please respond to the survey or reach out on Slack to @Alina Weinstein or @Matthew Lundgren.
Join the conversation
Whether you’re a fledgling data practitioner or a seasoned data expert, our community welcomes you! Here are some conversations happening now:
Aleksei Chumagin from Provectus wrote about using GX to build a Serverless Data Quality Gate on AWS.
Davide Romano published an article titled “How to monitor Data Lake health status at scale.”
Sonal Goyal from Zingg has shared a post about her first year working on Zingg.
Herry Karmito asked about excluding known invalid rows from Validations.
Krishna Awasthi looked for advice on connecting GX to S3 Hudi.
Read an interview with community contributor Steven Secreti, who built Expectations that compare a new dataset’s profile to an earlier snapshot of the same dataset.
We’re hiring! Check out our open roles.
Have you done something cool with Great Expectations that you'd like to share? If you want to demo or have some data quality content you'd like us to feature, DM @Kyle Eaton on Slack.