Data teams are often run like service organizations: a request goes in, an answer comes out. To stem the ceaseless flow of questions, we do what any reasonable analyst would: we answer questions faster, we build dashboards on dashboards, we push back, and, failing all that, we jump the chasm and try to teach our stakeholders SQL. Yet this particular cocktail of solutions is both all-too-common and all-too-commonly ineffective. The root cause of this chaos is that we've never defined best practices for how we should collaborate around analytics. In this post, I'd like to nail down once and for all what analytics collaboration is and how to make it better.
Data analysts analyze data to drive business value. This purpose is often pigeonholed into Supporting The Decision-Making Process, but this isn't quite the entire story. Analysts are experts at not only analyzing but, more importantly, studying, interpreting, and navigating the massive streams of data your company is likely ingesting. The best analysts are not transactional data APIs, taking in requests and returning data, but act as advisors and explorers in the overwhelming, high-opportunity, yet often deceptive world that is data. Analyst work can be bucketed into one of three categories:

- Reporting & self-service: creating data products (dashboards or apps) that enable others to directly look at data.
- Ad hoc requests: reactively supporting business decisions and problems with data.
- Strategic initiatives: proactively finding opportunities in the data.

In this post, we'll discuss each of these responsibilities, what they entail, and what qualifies as excellence therein...
When I was a data scientist at Airbnb, I once received an ad hoc request to identify Olympians who had accounts on Airbnb. No context given, just a colorless request in Slack. After a day of painstakingly scraping the web and munging data in SQL, I dumped the results into a table in our data warehouse, sent it off, then promptly moved on to other work.

Months later, I discovered that this work had contributed to the launch of a new business segment: Olympian and Paralympian online experiences. I felt proud that I'd been able to contribute so directly to the business, but I was bothered that no one had thought to let me know the launch had occurred, let alone include me more deeply in the launch process. I'd never felt more viscerally that my role in the product-building process was that of a data vending machine: request in, data out...
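For the curious, the deliverable itself was unglamorous: one table, produced by joining scraped athlete names against user accounts. Here's a minimal sketch of what that final step might have looked like; the table and column names (scraped_olympians, dim_users, and so on) are hypothetical placeholders, not Airbnb's actual schema.

```sql
-- Hypothetical sketch: publish a one-off match of scraped Olympian names
-- against user accounts into an ad hoc schema in the warehouse.
CREATE TABLE adhoc.olympian_accounts AS
SELECT
    u.user_id,
    u.full_name,
    o.sport,
    o.country
FROM scraped_olympians AS o
JOIN dim_users AS u
    -- Naive exact match on lowercased names; in practice this kind of join
    -- needs manual cleanup for nicknames, diacritics, and duplicates.
    ON LOWER(o.athlete_name) = LOWER(u.full_name);
```

The point isn't the SQL; it's that a deliverable like this gets dropped into the warehouse and forgotten, which is exactly the vending-machine dynamic I'm describing.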
Speed Changed Analytics, but Our Processes Need to Change Too
Let's take a breath and rethink how we work. When the automobile went mainstream in the U.S. during the 1920s, the nation rebuilt roads, wrote new laws, and erected signs and signals to control the flow of traffic.
Back when I was a data scientist, I spent a substantial amount of time doing product analytics work — opportunity sizing, experiment deep dives, ad hoc checks. But although I worked across a wide range of tools — Jupyter/Python, tidyverse, Superset, internal tools, even Java UDFs — the bulk of t...
We wanted to solve the following pain point: it's hard to find and get context on tables in your warehouse. But by working with customers over the last year, we've learned that standalone data discovery, while effective at controlling data sprawl in massive organizations, only provides a stopgap...
As data analysts, we spend too much time making dashboards for other people and not enough time answering deep questions about critical business issues. This is a waste of resources for the individual and for the business.
Why the IDE is not the future of SQL-based analytics
I’m just going to say it: the traditional IDE format is not great for writing queries for analytics work. I’ll start by explaining why, then tell you what you can do about it.