The plural of anecdote isn’t Data: Structuring the search for your Data Story


I think health advisories about eating and drinking are all made up. My uncle enjoyed having fried food, drank like a fish and smoked often. He lived to a ripe old age of 87. My aunt, on the other hand, was a teetotaler and used to go for her daily walks. And yet she passed away because of a heart attack at 62!

I don’t think we should invest in Madhya Pradesh. My cousin had a tough time getting his approvals for a factory in Jabalpur.

Nepotism exists in all sectors. Have you not heard of doctor/lawyer families? And by the way, the top management of ABC Bank is all from XYZ state.

I’m sure you have come across statements like these (and more) when debating with friends and colleagues.

What is the issue with statements like these? They are not data. They are opinions based on selective anecdotal evidence that is often presented as data (especially when some of them have ‘statistics’ embedded in them).

As some wise person said, “The plural of anecdote isn’t Data”. You cannot just string together a series of anecdotes and call them ‘facts’.

“In God we trust. All others must bring Data” – Unknown (frequently attributed to W Edwards Deming)

If you would like your recommendations (and arguments!) to be data-driven, you need the power of two tools:

  • MECE thinking: MECE stands for ‘Mutually Exclusive, Collectively Exhaustive’. It forces you to consider all possible angles and factors in the analysis. It is also described as ‘No overlaps, no gaps’. (This post is not intended to be a detailed explainer on MECE – will do that in a later post!)
  • The right Norm and variance: As part of MECE thinking, you need to choose the right norms for your data points. Norms can be compared across 3 basic parameters: Time, Competition and Component (share).
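
To make ‘No overlaps, no gaps’ concrete, here is a minimal Python sketch that checks whether a proposed breakdown is MECE. The function name `check_mece`, the list of industries and the bucket names are all hypothetical, made up purely for illustration.

```python
def check_mece(universe, buckets):
    """Return (overlaps, gaps) for a proposed breakdown of `universe`.

    overlaps: items placed in more than one bucket (not mutually exclusive)
    gaps:     items of the universe placed in no bucket (not collectively exhaustive)
    """
    seen = set()
    overlaps = set()
    for members in buckets.values():
        members = set(members)
        overlaps |= seen & members   # already seen in an earlier bucket
        seen |= members
    gaps = set(universe) - seen
    return overlaps, gaps


# Hypothetical universe and breakdown, for illustration only.
industries = {"film", "banking", "law", "medicine", "music"}
breakdown = {
    "creative": {"film", "music"},
    "professional": {"law", "medicine"},
    "performing": {"music"},         # overlaps with "creative"
}

overlaps, gaps = check_mece(industries, breakdown)
print("Overlaps:", overlaps)
print("Gaps:", gaps)
```

A breakdown is MECE only when both sets come back empty. Here ‘music’ sits in two buckets and ‘banking’ sits in none, so this breakdown fails on both counts.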

Let’s take a quick example – one that has been a raging debate in India for many months now: Nepotism.

There are broadly two arguments being made. One says that Bollywood is the worst industry when it comes to promoting its own. The other states that there’s nothing especially wrong about Bollywood – nepotism exists in all industries.

If you debate this just using anecdotes, you’ll never hear the end of it. There are plenty of anecdotes, on both sides of the aisle. No one can win.

What you need is a data-driven approach.

Disclaimer: Crafting a detailed MECE approach for this question is beyond the scope of this post (sorry!)

Still, here is an indication of what I mean by a data-driven approach:

1. Start by defining the main hypothesis that you wish to prove or disprove. Ideally, it should be quantifiable. For instance, here it could be something like: “The rate of nepotism in Bollywood is the highest across all industries and geographies, and has been so over time.”

2. Figure out a key metric that can be used to measure nepotism in some way. Say, the “Percent of movies in a year that have ‘star-kids’ in key roles v/s ‘non-star kids’” (Let’s call it the ‘Nepotism rate’.)

3. Measure and compare this metric across the right ‘norm-variance’ parameters:

– Time: Find the current ‘Nepotism rate’ for Bollywood, and then find the same for the last 5/10/20 years. Go as far back as you have data. The trend over time might give you a much better perspective than the number for a single year.

– Geography: Compare the metric in Bollywood vs. other industries (from India and from abroad). Sure, other countries are culturally different. Still, it might be useful to see how this shows up in the metrics. In addition, you can compare the trends in the ‘Nepotism rates’ for other film industries over time.

– Industry: This means taking the comparison outside of the movie industry. This is tricky. You may have to see something like “Percent of leadership roles in the industry that have ‘owner-kids’ in key roles v/s outsiders”. Sure, it may not be exactly comparable with the ‘movies released’ part… but it will give you some sense. You could also take other creative professions like music, art, stand-up comedy et al. The comparative numbers will give you interesting insights.

– Component: Dig deeper into Bollywood. How does the Nepotism rate differ across Studios? Across movie genres? Budgets? Big screen vs OTT? More fascinating insights will come your way.

4. Do the above exercise for other key metrics:

– How many Star/Non-star kids got a second-movie chance after the first flopped

– Number of years of struggle before getting first big break

– Total roles in a five-year period

– And so on!

5. While collecting this data, it is useful to have preliminary ‘hypotheses’ – your educated guesses on what the numbers/direction might be. Be super-careful of confirmation bias though!

6. Finally, once you collect all the data, the next step is to craft the messages from the data, and craft a coherent narrative from those messages. And then visually represent it and prepare to present it.
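
As a rough illustration of steps 2 and 3, here is a small Python sketch that computes the ‘Nepotism rate’ and compares it across different cuts. Everything in it is an assumption made for illustration: the records are made up (not real data), and the field names (`year`, `industry`, `star_kid_lead`), the label `IndustryB` and the function name `nepotism_rate` are hypothetical.

```python
from collections import defaultdict

# Made-up records for illustration only; NOT real data.
movies = [
    {"year": 2018, "industry": "Bollywood", "star_kid_lead": True},
    {"year": 2018, "industry": "Bollywood", "star_kid_lead": False},
    {"year": 2018, "industry": "IndustryB", "star_kid_lead": False},
    {"year": 2018, "industry": "IndustryB", "star_kid_lead": False},
    {"year": 2019, "industry": "Bollywood", "star_kid_lead": True},
    {"year": 2019, "industry": "Bollywood", "star_kid_lead": True},
    {"year": 2019, "industry": "IndustryB", "star_kid_lead": True},
    {"year": 2019, "industry": "IndustryB", "star_kid_lead": False},
]


def nepotism_rate(records, key):
    """Percent of movies with a 'star-kid' in a key role, grouped by `key`."""
    totals = defaultdict(int)
    star_kid = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        if record["star_kid_lead"]:
            star_kid[group] += 1
    return {group: 100.0 * star_kid[group] / totals[group] for group in totals}


# Same metric, different norms: over time, and across industries.
print("By year:", nepotism_rate(movies, "year"))
print("By industry:", nepotism_rate(movies, "industry"))
```

The same function works for any cut you can record as a field (studio, genre, budget band), which is what makes it easy to line up the Time, Geography and Component comparisons side by side.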

I didn’t say the process would be easy! But it would be comprehensive. And data-driven.

So, the next time you get a loose anecdote-based comment, decide if it is worth getting into the discussion (most of them are inconsequential).

If it is worth getting into, then make it a data-driven one.

And remind your collaborators: The plural of anecdote is not data!

*****

Featured image credit: Photo by Ben White on Unsplash
