Data Day Texas 2020
January 29, 2020
Several people have asked for my slides from Data Day Texas last week. Here they are!
This was my first time at the conference. I found it a fun and high-energy group of real, in-the-trenches data people. In other words, I felt very much at home.
Data Day has an unusually strong contingent of graph database aficionados. In years past, graph DBs have sometimes been a separate conference and sometimes just a track within the main conference. Attendees seemed split on this aspect of the conference: some were really excited about graph DBs; others said “yeah, I don’t really go to those talks.” I didn’t hear a lot of people in the middle.
Because Data Day is only one day and I had my talk in the morning plus office hours in the afternoon, I missed a pretty big slice of the rest of the presentations. Regrets and apologies to all the awesome people whose talks I can’t say anything specific about.
For example, I missed it myself, but I was told that Heidi Waterhouse’s talk on the Death of Data was amazing. I also heard that the Cassandra keynote on the Next Five Years in Databases did a good job bringing a bunch of different trends together.
The human-in-the-loop track
I caught the most talks from the human-in-the-loop ML track. There’s a ton of really interesting thinking going into workflows where humans and machines learn together. We’re rapidly moving past “humans provide labels; machines optimize fit” to a whole bunch of fascinating and specific questions about how:
- Which labels?
- In which order?
- How much trust should be assigned to labels?
- At what cost?
- Exactly what is a label? Can labels be linked or nested?
- When can computers suggest or hint at labels?
- How can we bring external data sources to bear?
- How do labels and models travel across domains?
- etc. etc.
Since each of these questions also implies a set of UX and infrastructure decisions, there turns out to be a lot of space for innovation. I left thinking that it’ll be interesting to see how general the solutions turn out to be: will we settle into a handful of widely scaled modes for training HitL ML models? Or will the right solutions be different for every organization?
IMO, the jury’s out, and it’s going to have a big impact on what data science looks like 5 years from now.
Written by The Great Expectations Team