
The experiment
Last week, there was a bunch of buzz around finding the right word for a group of data scientists.
Lots of ideas were bouncing around, but there was no way to identify a clear winner. To settle it, Kyle and I threw together a quick A/B poll to see which ideas are genuinely most popular.
You know, for science.
We created the poll in allourideas, then posted it to Twitter, the locallyoptimized Slack channel, and a few subreddits.
The results
Nearly 500 people participated, submitting a total of 8,524 votes. Thanks to everyone who voted!
“ensemble” ran ahead for the first 12 hours or so, only to be overtaken by “cluster,” and then pushed to third by “distribution.” At the very last minute, “ensemble” pushed back into a tie for second. (Ref: xkcd: Sports)
This made me happy, because “ensemble” would have been my personal choice. It’s more elegant than cluster, even if the connotation of cluster comes closer to the reality of a lot of data work today.
Here are the top 10 ideas, plus a smattering of others that we liked.
For the curious, here’s the overall ranking from top to bottom.
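And for anyone wondering how a pile of head-to-head votes turns into a ranking at all, here's a minimal sketch. It assumes a plain win rate per idea, which is a simplification and not necessarily how allourideas scores things. The function name and the sample votes are made up for illustration; they are not the real poll data.

```python
from collections import Counter


def rank_by_win_rate(votes):
    """Rank ideas by the share of pairwise matchups they won.

    `votes` is a list of (winner, loser) tuples, one per vote.
    """
    wins = Counter()
    appearances = Counter()
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1

    rates = {idea: wins[idea] / appearances[idea] for idea in appearances}
    # Highest win rate first; ties are broken alphabetically so output is stable.
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical votes for illustration only, not the actual poll results.
sample_votes = [
    ("cluster", "ensemble"),
    ("cluster", "distribution"),
    ("ensemble", "distribution"),
    ("distribution", "ensemble"),
]
ranking = rank_by_win_rate(sample_votes)
```

A raw win rate gets noisy for ideas that only showed up in a handful of matchups, which matters for ideas submitted late in a poll, so treat this as a toy rather than a scoring method.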
New ideas and honorable mentions
In addition to the 43 ideas in the initial poll, 123 more ideas were submitted in the course of the poll.
Most of these were duplicates (it wouldn’t be a real data science problem without dupes), and several were unprintable (thanks, internet!). We dutifully screened and activated new ideas as they came through.
Many of the new ideas were very clever, and deserve honorable mentions.
- A “scatterplot” of data scientists?
- A “random forest” of data scientists?
- A “SQL” of data scientists?
- A “dag” of data scientists?
C’mon. These are awesome. I would love to have a collection of job descriptions and recruiter emails from an alternate universe where any of those terms caught on.
Other honorable mentions: Kaggle has clearly done some great brand work in the data science world.
So has Project Jupyter:
I'm just going to leave that here.
Couldn’t resist
Okay, last: since this poll created a bunch of fresh, new data, I couldn’t resist playing with the latest toys in Great Expectations.
I exported the data and ran it through Great Expectations' init flow, which includes automated data profiling and documentation. Happily, allourideas exports CSVs in a format that Great Expectations can parse without any special configuration, so I was able to get up and running in just a few seconds.
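To give a flavor of what "profiling" means here, this is a rough standard-library stand-in. It is not the Great Expectations API and does far less than the real profiler. The column names and rows in the sample are assumptions made up for the example, not the actual allourideas export schema.

```python
import csv
import io


def profile_csv(text):
    """Summarize each column: non-empty cell count and distinct value count."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    non_empty = {name: 0 for name in columns}
    distinct = {name: set() for name in columns}

    for row in reader:
        for name in columns:
            value = row.get(name)
            if value:  # skip empty strings and missing cells
                non_empty[name] += 1
                distinct[name].add(value)

    return {
        name: {"non_empty": non_empty[name], "distinct": len(distinct[name])}
        for name in columns
    }


# Hypothetical export with made-up columns and values.
sample_export = (
    "idea,wins,losses\n"
    "ensemble,10,5\n"
    "cluster,12,\n"
    "distribution,12,4\n"
)
profile = profile_csv(sample_export)
```

Even a summary this crude surfaces the usual suspects, like a column with a missing value or fewer distinct values than expected.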
Check out the results here. The docs are somewhat informative, and this little exercise sparked several new ideas for improving them. As I've said before, Great Expectations' compile-to-docs functionality is an area of active exploration and development for the project. But that's a topic for another time.
For now, here's the raw data for the poll results, in case anyone wants to dig deeper.
If you’d like to make data science less of a cluster, check out Great Expectations to see if automated data testing and documentation can help.