Anonymized usage statistics
Anonymized stats from the community will help improve Great Expectations.
April 14, 2020
TL;DR: We’re adding anonymized usage statistics to Great Expectations. You can opt out at any time, but we’re hoping that you won’t: this data will be very helpful for improving the product.
A data vacuum
We can see GitHub stars:
We can see PyPI downloads:
We also get a little bit of metadata when static site assets load from our CDN.
We can see that the Great Expectations user base is growing rapidly. But we haven’t had any data on how the project is actually being used, which is starting to make it difficult to decide how to design future features and prioritize current work.
We want to build the best version of Great Expectations possible. To this end, we’ve added basic event tracking to the project, starting in the 0.10.0 release.
We do not track credentials, validation results, or arguments passed to Expectations. We consider these private, and frankly none of our business. User-created names are always hashed, to create a longitudinal record without leaking any private information. We track types of Expectations, to understand which are most useful to the community.
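To make the hashing idea concrete, here is a minimal sketch of how a user-created name could be turned into a stable, non-reversible identifier. It is an illustration only, not the project's actual implementation: the function name `anonymize_name`, the choice of SHA-256, and the use of a per-project salt are all assumptions.

```python
import hashlib


def anonymize_name(name: str, salt: str) -> str:
    """Return a stable hex digest for a user-created name.

    Hypothetical helper: the salt (for example, a per-project id) is an
    assumption, used so the same name hashes differently across projects.
    """
    # Hash the salt and the name together; only the digest would be sent.
    return hashlib.sha256((salt + name).encode("utf-8")).hexdigest()


# The same name and salt always give the same digest, which is what makes
# a longitudinal record possible without sending the name itself.
first = anonymize_name("my_private_datasource", salt="example-project-id")
second = anonymize_name("my_private_datasource", salt="example-project-id")
print(first == second)
```

Because the digest is deterministic, events about the same object can be linked over time, while the original name never leaves the user's machine.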
Usage statistics events are sent whenever a DataContext is invoked, whether from the CLI or through a method call. For transparency, all event schemas are published in the code.
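As a rough illustration of what a schema-checked event might look like, the sketch below builds a small payload and validates it against a list of allowed fields. Everything here is hypothetical: the field names, the event name `data_context.invoked`, and the helpers `build_event` and `validate_event` are assumptions made for the example, and the real schemas are the ones published in the code.

```python
from datetime import datetime, timezone

# Hypothetical schema: required field name -> expected Python type.
EVENT_SCHEMA = {
    "event": str,
    "event_time": str,
    "data_context_id": str,
    "expectation_types": list,
}


def build_event(event_name, data_context_id, expectation_types):
    """Build an event that carries Expectation types but no arguments."""
    return {
        "event": event_name,
        "event_time": datetime.now(timezone.utc).isoformat(),
        "data_context_id": data_context_id,
        # Only the distinct type names are kept, never the arguments.
        "expectation_types": sorted(set(expectation_types)),
    }


def validate_event(event, schema=EVENT_SCHEMA):
    """Accept an event only if it has exactly the fields in the schema."""
    if set(event) != set(schema):
        return False
    return all(isinstance(event[key], expected) for key, expected in schema.items())


event = build_event(
    "data_context.invoked",
    "00000000-0000-0000-0000-000000000000",
    ["expect_column_values_to_not_be_null"],
)
print(validate_event(event))
```

Rejecting any field that is not in the schema is one simple way to ensure that nothing outside the published list is ever sent.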
You can opt out of event tracking at any time by adding the following to the top of your project’s
    anonymous_usage_statistics:
      enabled: false
      data_context_id: <<uuid>>
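For readers who want to see how such a flag could be honored, here is a minimal sketch that reads the setting from an already-parsed configuration. The function name `usage_statistics_enabled` and the plain dictionary standing in for the parsed file are assumptions for this example; treating a missing setting as enabled follows from tracking being opt-out.

```python
def usage_statistics_enabled(config: dict) -> bool:
    """Return False only when the project has explicitly opted out.

    Hypothetical helper: `config` stands in for the parsed project
    configuration, represented here as a plain dictionary.
    """
    section = config.get("anonymous_usage_statistics") or {}
    # Tracking is opt-out, so a missing flag is treated as enabled.
    return bool(section.get("enabled", True))


opted_out = {
    "anonymous_usage_statistics": {
        "enabled": False,
        "data_context_id": "00000000-0000-0000-0000-000000000000",
    }
}
print(usage_statistics_enabled(opted_out))  # False
print(usage_statistics_enabled({}))         # True
```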
We’ve consciously modeled our approach and language on that of dbt. The dbt team has done an excellent job of respectfully requesting data that will help their whole community, while leaving full control in the hands of developers.
We hope that we can do the same. Please reach out in the #general channel on Slack if you have any questions.