We appreciate each and every one of the many talented people who have contributed to Great Expectations’ open source project and participated in our community—10,000 members and growing!
Today, we’re profiling Mateusz Kopeć.
Intro & icebreakers
❓ What’s your current role and organization?
I am a Senior Machine Learning Engineer at ING Hubs Poland. I touch a lot of different topics in my daily work, but typically they involve dealing with large volumes of data about bank transactions.
🎞️ What’s a movie you love?
"12 Angry Men" from 1957. Great acting, great story, everything else: minimalistic.
🎲 What’s your favorite board or video game?
The "Civilization" PC game series. Civ2 was the very first game I bought, and it helped me learn English.
👟 What’s a sport you play?
Running (regardless of the weather) and MTB (if it's warm and sunny).
⚫⚪ Light mode or dark mode?
Dark for code, light for everything else.
🔎 How did you discover Great Expectations? What did you do with it first?
I was looking for a tool to help me summarize the quality of data used for reporting purposes. I wanted to provide a quantitative description of the problems in the data, and to apply the same logic when assessing the quality of different data sources. GX was a really good fit for that, although it took me some time to understand all its abstractions and get my first results.
🌱 What are some things you find rewarding about contributing to an open source project?
Just "giving back" to the community—the amount of time I save by using open source projects will never compare to the amount I spend contributing to them. But at least it's a small step in a good direction. And you don't need to prioritize based on value added—you just implement whatever you feel like implementing 😀.
🧡 Is there anything about contributing to GX specifically that you enjoy?
The maintainers were really fast to provide feedback on my pull requests and merge them into the codebase. Another nice thing was seeing my name mentioned in the Slack channel as a contributor. It is a small thing, but it's always nice to hear (or read 😉) a thank you message.
🛠️ If you’ve contributed Expectations: what do your Expectations do, and what are some reasons someone would want to use them?
I’ve mostly been adding Spark backend support to various Expectations with no common denominator, so it’s hard to summarize that in one paragraph. But I can highlight the Expectations that first drew my interest when I started to use GX: the ones related to verifying bank account numbers and bank codes.
I highly recommend using these Expectations instead of something simpler like regular expressions, because they do test more than just the format of the value.
If the data you are using comes from manual input, even if the input is validated to some extent (e.g. not empty, or containing only valid characters), sometimes people will get creative and input something which will pass the validation but is not an actual valid value. To detect such cases, you really need some complex logic behind the Expectations.
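To illustrate the difference between a format check and a real validity check, here is a minimal Python sketch for IBANs. It is not the code used by GX or by any library mentioned in this interview; the function names and the regular expression are assumptions made for this example. It implements only the standard IBAN mod-97 check and skips country-specific length and structure rules, which a full validator would also apply.

```python
import re

# Hypothetical format-only rule: country code, two check digits,
# then 11 to 30 alphanumeric characters (15 to 34 characters in total).
IBAN_FORMAT = re.compile(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}")


def normalize(value: str) -> str:
    """Drop spaces and upper-case the value before checking it."""
    return value.replace(" ", "").upper()


def has_iban_format(value: str) -> bool:
    """Format-only check: roughly what a regular expression can tell you."""
    return IBAN_FORMAT.fullmatch(normalize(value)) is not None


def passes_iban_checksum(value: str) -> bool:
    """Format check plus the IBAN mod-97 checksum."""
    iban = normalize(value)
    if not has_iban_format(iban):
        return False
    # Move the country code and check digits to the end.
    rearranged = iban[4:] + iban[:4]
    # Replace letters with numbers (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    # A valid IBAN leaves a remainder of 1 when divided by 97.
    return int(digits) % 97 == 1
```

A value such as "GB82 WEST 1234 5698 7654 33" (the widely published example IBAN with its last digit changed) still has a plausible format, so `has_iban_format` accepts it, but `passes_iban_checksum` rejects it. That kind of mistyped or invented value is what a format-only check lets through.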
🏆 What contribution to GX are you most proud of and why?
I think it's the first pull request I made. It’s the smallest PR I’ve ever done, since it only changes a single character in the codebase 😀. But seeing how easy it was to get this contribution merged into the main codebase encouraged me to create other pull requests, with more complex changes.
📣 Are there any other open source projects you contribute to that you’d like to shout out?
For all people processing data about bank transactions, I’d like to recommend the schwifty library. It allows us to verify the validity of IBANs (international bank account numbers) and BICs (bank identifier codes). Some GX Expectations use it internally, so improving schwifty will improve GX as well :)
✅ What does data quality mean to you?
Data quality is not about the data being perfect (it never is), but about acknowledging its limitations and showing the focus areas for improvement. Regularly checking data quality via automated means gives me confidence in the end-to-end results of running my programs. This is a holistic view of software engineering: not only does the program have to be tested, but the input (and output!) data has to be verified as well.
Thank you, Mateusz, for taking the time to speak with us!
If you’re thinking about joining the GX community, there’s no time like the present. Ease in by lurking in Slack or go straight to sharing your Custom Expectations: we’re happy to have you no matter how you want to engage.
You can join the GX Slack here.
Check out our guide for getting started with contributing.
It’s easy to share a Custom Expectation if you follow our step-by-step process.
To contribute a package, start with this how-to, so everything is as easy as possible.