A pod of whales. A murder of crows. A [_____] of data scientists.
The data community weighs in on what we should call ourselves.
January 14, 2020
Last week, there was a bunch of buzz around finding the right word for a group of data scientists.
You know, for science.
Nearly 500 people participated, submitting a total of 8,524 votes. Thanks to everyone who voted!
“Ensemble” ran ahead for the first 12 hours or so, only to be overtaken by “cluster,” and then pushed to third by “distribution.” At the very last minute, “ensemble” pushed back into a tie for second. (Ref: xkcd: Sports)
This made me happy, because “ensemble” would have been my personal choice. It’s more elegant than cluster, even if the connotation of cluster comes closer to the reality of a lot of data work today.
Here are the top 10 ideas, plus a smattering of others that we liked.
For the curious, here’s the overall ranking from top to bottom.
New ideas and honorable mentions
In addition to the 43 ideas in the initial poll, 123 more were submitted while voting was open.
Most of these were duplicates (it wouldn’t be a real data science problem without dupes), and several were unprintable (thanks, internet!). We dutifully screened and activated new ideas as they came through.
Many of the new ideas were very clever and deserve honorable mentions.
- A “scatterplot” of data scientists?
- A “random forest” of data scientists?
- A “SQL” of data scientists?
- A “dag” of data scientists?
C’mon. These are awesome. I would love to have a collection of job descriptions and recruiter emails from an alternate universe where any of those terms caught on.
Other honorable mentions: Kaggle has clearly done some great brand work in the data science world.
So has Project Jupyter:
I’m just going to leave that here.
Okay, last: since this poll created a bunch of fresh, new data, I couldn’t resist playing with the latest toys in Great Expectations.
I exported the data and ran it through Great Expectations’ init flow, which includes automated data profiling and documentation. Happily, allourideas exports CSVs in a format that Great Expectations can parse without any special configuration, so I was able to get up and running in just a few seconds.
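For a rough feel of what automated profiling computes, here is a minimal sketch in plain Python. To be clear, this is not Great Expectations’ API or its init flow, just an illustration of the idea using the standard library. The `profile_rows` helper, the column names, and the sample rows are all made up for the example; the real export’s columns and numbers are different.

```python
import csv
import io
from collections import defaultdict


def profile_rows(rows):
    """Summarize each column: row count, missing values, distinct values."""
    stats = defaultdict(lambda: {"count": 0, "missing": 0, "values": set()})
    for row in rows:
        for column, value in row.items():
            col = stats[column]
            col["count"] += 1
            # Treat absent or blank cells as missing.
            if value is None or value.strip() == "":
                col["missing"] += 1
            else:
                col["values"].add(value.strip())
    return {
        column: {
            "count": col["count"],
            "missing": col["missing"],
            "distinct": len(col["values"]),
        }
        for column, col in stats.items()
    }


# Hypothetical sample data, NOT the actual poll export.
SAMPLE_CSV = """idea,wins,losses
ensemble,120,80
cluster,150,70
distribution,,90
"""

profile = profile_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column, summary in profile.items():
    print(column, summary)
```

Swapping `io.StringIO(SAMPLE_CSV)` for an open file handle would run the same summary over a real export. A profiler like the one in Great Expectations goes well beyond these three counts, but the basic move is the same: look at every column and write down what you see.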
The docs are somewhat informative, and this little exercise sparked several new ideas for improving them. As I’ve said before, Great Expectations’ compile-to-docs functionality is an area of active exploration and development for the project. But that’s a topic for another time.
For now, here’s the raw data for the poll results, in case anyone wants to dig deeper.
If you’d like to make data science less of a cluster, check out Great Expectations to see if automated data testing and documentation can help.
You should star us on GitHub.