Speed Changed Analytics, but Our Processes Need to Change Too.

Robert Yi

March 4, 2022

Let's take a breath and rethink how we work.

When the automobile went mainstream in the U.S. during the 1920s, the nation rebuilt roads, wrote new laws, and erected signs and signals to control the flow of traffic. Without these process improvements and infrastructural overhauls, the advent of the automobile would’ve been untenable. In the last decade, we’ve experienced something analogous in the world of data: the advent of the cloud data warehouse.

Modern cloud data warehouses are fast. The speed at which you can interrogate your data is unprecedented. But, just as with the advent of the automobile, we need to rethink how we work with data to ensure we don’t just recklessly drive [answer data requests] faster. We need:

📏 Better alignment with business objectives.
Speed means a greater volume of analyses. This makes it even more critical to reach consensus on objectives before dumpster diving, so that leadership is ingesting the correct data narratives. Otherwise, we risk falling victim to “garbage in, garbage everywhere.”
📒 Better practices around the process of producing and sharing analytics work.
The faster we work, the messier it all gets. We need standards that ensure that work is sustainable, reproducible, and organized.
🕵️‍♀️ Adjusted responsibilities and expectations.
We might consider a new title: we need “decision scientists” who are closely involved in decision-making processes.

I have no silver bullets to address the above within this article — just some observations that point to an emerging theme: the refactoring of our analytics processes.

The data ecosystems of the 2010s were… slow.

When I was at Wayfair (2017–2019), pulling data was painfully slow. It could take minutes to hours to extract even the most trivial slices of data from Hive or Vertica. You might suspect that this slowness simply limited the amount of work that could be done, but the repercussions were broader: it shaped the nature of the work we prioritized. I’d often favor work that didn’t require pulling from the warehouse, such as playing with data in-memory in Jupyter notebooks, clever algorithmic solutions, and machine learning.

And I don’t think I was alone in orienting this way. The culture of data work in this era seemed to inherently favor the deep and exhaustive. I’d wager that the use cases we emphasized by doing this sort of slow work even shaped collective corporate attitudes on data work: problem-solving over stakeholder management; sophistication over interpretation; perhaps even data science over analytics. What we saw was all there was, and all we had the patience (or time) to see was work that lived outside of the warehouse. Iterative SQL workflows were prohibitively slow and therefore took a back seat.

A trivial yet powerful principle was at play here:

If you cannot get data quickly, you can’t focus on things that rely on getting data quickly.

This quiet idea has pervaded data work over the last decade, with lasting consequences for how we use data. Because data was historically so cumbersome to pull out of legacy data warehouses, we designed everything, from our decision-making processes to our org structure, on this premise.

Things are now fast, but our processes haven’t caught up.

A step function change happened in how I personally related to data when I moved to Airbnb in early 2019. While Airbnb had a similar stack to Wayfair, their optimized Presto + Druid (Minerva) layers enabled queries to execute more quickly and reliably than at Wayfair. And because Airbnb had been this way for a while, the culture had time to organically adapt. The practical ramifications: product requests were nearly non-stop, exploratory work backed nearly every major decision, SQL was a foundational language rather than simply an access point, and data science folks (increasingly of the “analytics” flavor) were being embedded in every team.

But the shift presented a ton of new problems. The revolution was really stakeholder-led, meaning we were missing a clear definition from data leadership of our role in decision-making conversations. We instead followed the cues of our product counterparts and juggled often-orthogonal team initiatives alongside our embedded work. In the absence of crisp messaging from leadership, we were constantly navigating the tension between where we felt we were needed, what our managers wanted us to do, and what we felt was truly important. The prevailing sentiment from leadership at Airbnb (and, to be fair, across data science/analytics organizations at its tech unicorn peers) was “answer as many requests as you can, but also do your own projects.” At the end of the day, we built far too many dashboards, yet still did too much repeat work, and, ultimately, focused on the wrong problems.

The root cause: while we were empowered to answer substantially more questions than before, we did not set up our teams, infrastructure, or objectives in a way that aligned to this new world.

Our processes are improving, but we could do better.

If you’ve followed the literature in the data community over the last couple of years, you may have noticed emerging narratives trying to establish better practices around how we should work in this brave new world. We’ve reached the consensus that stakeholder alignment is critical. Much as product teams live on user feedback, every piece of data work can be viewed as a “Data Product” used by stakeholders. If stakeholder problems aren’t addressed, the product is failing. We’ve also started to structure our teams differently and hire differently, building tightly coupled working groups that can grease the pipeline from data preparation to decision. Still, we’ve made mistakes, focusing far too heavily on reactive speed and building far too many dashboards in the pursuit of self-service. We haven’t cracked the formula for a scalable, streamlined process, and I’m excited to see how we’ll tackle some of the big remaining problems:

📏 Analytics teams still need better alignment with business objectives.
There’s been a call in recent years to run your data team like a product team, and this has been an overall boon for the industry. But we still haven’t cracked how to incentivize our teams to put business objectives at their core, so that we can scale past our capacity to simply answer more questions about data. The first step is certainly to hire ICs who can span the technical-business context gulf, but this is only half the solution: the missing piece is still building alignment from the top down.
📒 Better practices around the process of producing and sharing analytics work.
We can now make data-driven decisions nearly all the time (and this will become even more true with the imminent rise of the metrics layer), so we need to think carefully about the processes around this work: how alignment is reached, how interpretations are curated and presented, and, in a world where so many more analyses are being made, how duplicate work is reduced. dbt has codified one aspect of this, enabling reusable data models to be centralized, stored as code, discovered, and leveraged by others. But the process of sharing insights still leaves a lot to be desired. More work needs to be written up, and less should live in your IDE as tabs (we believe this can be largely fixed by doing SQL work in a doc).
🕵️‍♀️ Do we need decision scientists?
The last decade felt like the decade of the data scientist. The last few years saw a push towards analytics engineering. I imagine the next few years are going to be all about the decision scientist. “Decision science” is already a role at some big tech companies, where core data models, metrics layers, and basic exec reporting dashboards have been largely built, scaled, and centralized, meaning there is less need for data platform work and more need for interpreting and faithfully leveraging that data.

Decision science is going to go mainstream.

The 2010s belonged to data science. Then we saw the rise of analytics eng. But aeng -> less time on prep + better data -> more questions -> decision science. Will purple people cleave into magenta and fuschia people?

— robert yi 🐳 (@imrobertyi) February 28, 2022

I’m excited to see how the narratives will evolve. The industry has undergone astronomical shifts in how we procure and prepare data, but it’s time for us to reconsider the workflow around decision-making and analytics — the reasons why we got all this data in the first place.

We’re making a bet that the world needs to move towards better alignment and processes, and we’ve built Hyperquery to reflect our ambitions to this end. Hyperquery is a doc workspace for SQL + analytics work. By doing analytics work in an organized doc workspace, we make it easier not only to write and share query-based analyses but also to drive alignment, scale impact, and avoid reproducing work. Check out what we’re building at Hyperquery.ai.

Tweet @imrobertyi / @hyperquery to say hi.👋
Follow us on LinkedIn. 🙂
To learn more about Hyperquery, visit hyperquery.ai.
