Just One More Stratification!
Or: How to say “no” as a data person
March 17, 2020
Written by Sam Bail
While my previous role touched on pretty much all aspects of “data” in the context of an actual external-facing data product, ranging from data product management to data engineering, I also spent some time providing insights about feature usage of a SaaS product to internal stakeholders – i.e. classical BI or product analytics type of “data”. One of the things that stood out to me was the almost insatiable demand from consumers for more data – more analyses, more stratifications, more filtering. Often, these would come in as ad-hoc requests: “Hey, these numbers look great, could you group them by year in addition to month so we can see some trends?”, or, “Maybe if we exclude this kind of population…”
Since I had a direct comparison with my experience working on a data engineering team, I noticed a difference between how stakeholders approached software engineering work compared to data work. It almost seemed as though they considered it to be… somewhat easier to “just run some numbers real quick”, whereas I hardly ever received requests to just “build a new feature real quick”. Here are some of my thoughts as to why that might be the case, and what “data people” can do to avoid getting stuck in an endless loop of “just one more stratification”.
Understand the problem and ask “why” a lot: This is definitely not a problem that’s specific to data work, but a pretty well-known issue in software engineering and other disciplines. Stakeholders often tell you what they want based on what they think will solve their problem, but don’t always tell you the why. Oftentimes, when you start digging into the why, you’ll find that their proposed solution doesn’t really address the problem, or only part of it. Another thing I noticed is that people are often curious about data, but can’t really think of any concrete actions they might take based on what they see. Curiosity and getting a mental model of some numbers is great and might be a valid use case, but oftentimes this ends up having no direct impact on the business goals. While it may be tempting to dive right into the data and give your stakeholders what they ask for, it’s usually worth spending a significant amount of time on understanding their needs and how they fit into higher-level business needs, both immediate and long-term.
Agree on the specs: It’s easy to just brainstorm ideas and get a rough sense of what users want, then go off and pull some data, but this also sets you up for scope creep. Treat data work the same way as a software project: there’s a clear spec including stratifications and filters, some form of plan based on priorities, as well as customer acceptance criteria. Another option is to prototype some output using dummy data to make sure your stakeholders have the same idea in mind of what the final deliverable looks like – e.g. will they get a CSV file, an Excel file, a pivot table, an interactive dashboard, some numbers in an email…? Additionally, treat data work as regular work – if you’re working with a ticketing system, make sure to estimate and assign points, stick to a sprint cadence, and/or timebox work. This allows you to treat the project as an actual project with a plan – and if stakeholders ask for work outside of the currently scoped project, you can ask them to have it scoped and prioritized like any other piece of work.
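As a rough illustration of what “a clear spec including stratifications and filters” plus a dummy-data prototype could look like, here is a minimal Python sketch. Everything in it is hypothetical: the AnalysisSpec class, the column names (year, month, plan) and the metric name are made up for the example and are not taken from any real project.

```python
import random
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnalysisSpec:
    """Hypothetical written-down spec for a single data request."""
    metric: str
    stratify_by: tuple                            # agreed grouping columns
    filters: dict = field(default_factory=dict)   # column -> required value
    deliverable: str = "csv"                      # agreed output format


def make_dummy_events(n, seed=0):
    """Generate fake rows so the output shape can be reviewed early."""
    rng = random.Random(seed)
    return [
        {
            "year": rng.choice([2019, 2020]),
            "month": rng.randint(1, 12),
            "plan": rng.choice(["free", "paid"]),
        }
        for _ in range(n)
    ]


def prototype(spec, rows):
    """Count rows per stratum, applying only the filters named in the spec."""
    kept = [r for r in rows if all(r[c] == v for c, v in spec.filters.items())]
    counts = Counter(tuple(r[c] for c in spec.stratify_by) for r in kept)
    return sorted(counts.items())


spec = AnalysisSpec(
    metric="feature_usage_events",
    stratify_by=("year", "month"),
    filters={"plan": "paid"},
)
rows = make_dummy_events(200)
table = prototype(spec, rows)

# Print the prototype in the agreed deliverable shape (CSV-like text).
print(",".join(spec.stratify_by) + ",count")
for key, count in table:
    print(",".join(str(k) for k in key) + f",{count}")
```

With the spec written down like this, a request for “just one more stratification” becomes a visible change to stratify_by rather than a quick favor, which makes it easier to discuss as a scope change.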
Empower users: A lot of people in stakeholder positions are incredibly curious when it comes to data. They want to experiment with slicing and dicing data to get a mental model of what the space looks like. One way to take the load off analysts who might otherwise have to deal with ad-hoc requests is to give your non-technical users an interface to explore the data. It’s important to have data access and literacy in an organization to avoid making engineers bottlenecks, but there are a couple of things to consider here:
Any platform, even if it’s “just” internally facing, is a product that needs to be treated as such. As in: it takes time to build, it needs a roadmap, an owner and plan for maintenance/enhancements/bug fixes, a plan for authentication and authorization (who/which team gives and gets permissions to what?), training and onboarding of new users, a plan for availability/uptime… And if you’re thinking “oh hell that sounds like a lot of work”, you’re absolutely right. It’s an investment.
Data can be misinterpreted. Make sure to understand what users might want to do and manage expectations as to what they can and cannot infer from the data they’re being provided. Any uncontrolled use of data should probably just act as hypothesis generation, with a clear process for establishing “production-ready insights” in collaboration with a data team.
Due diligence: This is less about managing stakeholders, and more about avoiding extra work for yourself. Even for small ad-hoc requests and pulling some quick numbers, I believe code reviews or just sanity checks (looking over the shoulder style “reviews”) are important – first and foremost to ensure what you deliver is correct, of course, but also to avoid having to go back and redo work because of a bug. This also applies to understanding and cross-checking whether what you’re doing actually matches the specs.
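One cheap sanity check of this kind is to reconcile a stratified result against the raw row count before sending it out. The sketch below is only an assumption about what such a check might look like; the reconcile helper and the sample rows are invented for illustration.

```python
from collections import Counter


def reconcile(rows, strata_counts):
    """Compare the counts in the delivered strata against the raw row count."""
    total = len(rows)
    covered = sum(strata_counts.values())
    return {"total": total, "covered": covered, "missing": total - covered}


# Made-up sample rows; one of them has no month recorded.
rows = [
    {"year": 2020, "month": 1},
    {"year": 2020, "month": None},
    {"year": 2019, "month": 3},
]

# A stratification that silently drops rows with a missing month.
counts = Counter(
    (r["year"], r["month"]) for r in rows if r["month"] is not None
)

report = reconcile(rows, counts)
print(report)  # {'total': 3, 'covered': 2, 'missing': 1}
```

A non-zero “missing” value does not necessarily mean there is a bug, but it is the kind of discrepancy worth explaining before a stakeholder finds it.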
These are really just a few suggestions based on my experience working in a data role, and while I obviously can’t claim they’re typical, I’m hoping that the above points resonate with you, fellow data person. Though it can be difficult at times to create more friction up front with stakeholders, these steps may prove to be a worthwhile investment, both for them and for your sanity. I’d love to hear your thoughts on this article and find out more about the kinds of requests you typically receive and how you handle them - feel free to tweet me @spbail #onemorestratification!