Anonymized usage statistics

Anonymized stats from the community will help improve Great Expectations.

Great Expectations
April 14, 2020

TL;DR: We’re adding anonymized usage statistics to Great Expectations. You can opt out at any time, but we’re hoping that you won’t: this data will be very helpful for improving the product.

A data vacuum

As a data company (Superconductive) building a data product (Great Expectations), we’ve been operating in a world with surprisingly little data about our core product.

We can see GitHub stars:

[Figure: Graph of GitHub stars, going up and to the right]

We can see PyPI downloads:

[Figure: Graph of PyPI downloads, going up and to the right]

We also get a little bit of metadata when static site assets load from our CDN.

We can see that the Great Expectations user base is growing rapidly. But we haven’t had any data on how the project is actually being used, which is starting to make it difficult to decide how to design future features and prioritize current work.

Usage statistics

We want to build the best version of Great Expectations possible. To this end, we’ve added basic event tracking to the project, starting in the 0.10.0 release.

We do not track credentials, validation results, or arguments passed to Expectations. We consider these private, and frankly none of our business. User-created names are always hashed, to create a longitudinal record without leaking any private information. We track types of Expectations, to understand which are most useful to the community.
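The name hashing described above can be sketched roughly as follows. This is a minimal illustration, not necessarily how Great Expectations implements it: the `anonymize` function, the choice of SHA-256, and the use of a random per-project UUID as a salt are all assumptions made for the example. The point is that the same name always maps to the same token within a project, which gives a longitudinal record, while the token itself does not contain the original name.

```python
import hashlib
import uuid


def anonymize(name: str, salt: str) -> str:
    """Return a stable hex token for a user-created name.

    Hypothetical helper: salts the name with a per-project value and
    hashes the result, so the raw name never leaves the machine.
    """
    digest = hashlib.sha256()
    digest.update(salt.encode("utf-8"))
    digest.update(name.encode("utf-8"))
    return digest.hexdigest()


# Assumption: a random UUID generated once per project acts as the salt.
project_salt = str(uuid.uuid4())

# The same name hashes to the same token every time (longitudinal record).
token_a = anonymize("my_datasource", project_salt)
token_b = anonymize("my_datasource", project_salt)

# A different name produces a different token.
token_c = anonymize("other_datasource", project_salt)

print(token_a == token_b)
print(token_a == token_c)
```

Salting with a per-project value means the same name used in two different projects yields unrelated tokens, so names cannot be correlated across projects from the hashes alone.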

Usage events are fired whenever a DataContext is invoked, whether from the CLI or through a method call. For transparency, all event schemas are published in the code.

You can opt out of event tracking at any time by adding the following to the top of your project’s great_expectations/great_expectations.yml file:

    anonymous_usage_statistics:
      enabled: false
      data_context_id: <<uuid>>

Due credit

We’ve consciously modeled our approach and language on that of dbt. The dbt team has done an excellent job of respectfully requesting data that will help their whole community, while leaving full control in the hands of developers.

We hope that we can do the same. Please reach out in the #general channel on Slack if you have any questions.
