Monday, August 7, 2023

Data (mis)Representation and Visualization

So, data is the buzzword and the biggest force out there today, widely considered the key to future growth and success. It has already spawned many different streams and quotes.

Streams like Data Design, Data Analysis, Data Engineering, Data Science, Data Management, Data Stewardship, Data Governance, Data Migration, Data Ingestion, Data Transmission, Data Storage, Data Streaming, Big Data, and a few others that I may not even know about.

In terms of quotes and equivalences that I've heard:

Data is the next Gold.

Data is like Air - everywhere around us and is absolutely needed for an organization to thrive.

Data is like Water - all around us, flowing from one place to another, stored in containers for future use.

So I get it. Everybody understands how important data is. But do we know how to use it properly?

Apparently, too well. 10-15 years ago, if someone said that their reasoning/argument/logic was backed by data, I would probably have been swayed over to their way of thinking. Today - not as easily. Why? Because we have learned how to misrepresent data. We have learned how to dissect the data to back our way of thinking. We have made huge strides in analysis, and that has enabled us to portray facts in a light that supports our logic.

I recently came back from a road trip. On one of the days during the trip, my wife & I were driving from Grand Tetons to Yellowstone. It was cold, and it was raining. I was not enjoying driving my Audi A3, a small sedan, on mountain roads that were wet and cold and could well have been slippery. I wasn't the only person on the road, though. There were a lot of vehicles. To justify my discomfort, I commented that not many of those other vehicles were small sedans or, for that matter, sedans at all.

Then I started wondering whether I could quantify my statement and back it with numbers. So I categorized the next 100 vehicles. Only 9 of them were sedans. I counted another 100, and only 6 of those were sedans. Of any size. That is 15 out of 200, which gives me a percentage to back up my theory - only 7.5% of the vehicles on the road that day were sedans. 92.5% of the people on the road on that wet, rainy day were driving a bigger vehicle. Does that make my point and back my theory up? It does. Would you conclude that very few people drive sedans, and consider that a weird fact? Maybe.

Until I tell you that only 20% of all cars sold are sedans. Until I tell you that when it comes to breaking down this data, out of the 200, my rough estimates for the other types of vehicles were: 3 big trucks, 20 RVs, 40 SUVs, 40 pickup trucks, 10 hatchbacks, 35 minivans, 30 crossovers, and 7 vans. That gives them shares of 1.5%, 10%, 20%, 20%, 5%, 17.5%, 15%, and 3.5%. The sedan numbers don't look too bad now - do they? It was mixed traffic, and the numbers show that. But because I picked a single category, I can cast it as an outlier. I can pick SUVs and say the same thing - 80% of the people on the road that day were not driving an SUV.
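To make the arithmetic easy to check, here is a minimal Python sketch of the tally. The counts are the ones from the road trip above (the non-sedan figures being my rough estimates); the names `counts`, `share`, and `not_share` are just illustrative helpers.

```python
# Vehicle tallies from the 200 vehicles counted on the drive.
# The sedan count is 9 + 6; the rest are rough estimates from the post.
counts = {
    "Sedans": 15,
    "Big trucks": 3,
    "RVs": 20,
    "SUVs": 40,
    "Pickup trucks": 40,
    "Hatchbacks": 10,
    "Minivans": 35,
    "Crossovers": 30,
    "Vans": 7,
}

total = sum(counts.values())


def share(category):
    """Percentage of all counted vehicles that fall in `category`."""
    return 100 * counts[category] / total


def not_share(category):
    """Percentage of all counted vehicles that are NOT in `category`."""
    return 100 - share(category)


# Print every category, largest first, next to its "everyone else" figure.
for name in sorted(counts, key=counts.get, reverse=True):
    print(f"{name:<14} {share(name):5.1f}%   not {name}: {not_share(name):5.1f}%")
```

Every single row can be spun the same way: whichever category you pick, a large majority of drivers were "not driving one of those". The number only means something when you see all the rows together.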

Data is no longer equivalent to truth. It's all about perspective. It's all about how you slice it and present it.

A friend recently mentioned that there was a travel freeze in their company despite an amazing H1, because travel costs had risen 25% while revenue was up only 15%. I do not know the actual numbers behind the scenes, but on its own that is weak logic. Why, you ask?

Because yes, you're showing that travel costs are rising at a sharper rate, but are you actually quantifying the impact? Let's say my revenue last year was 10mn USD, and this year it is 11mn USD. That's a 10% increase in revenue. Let's say my travel expenditure last year was 10K USD. But this year, to get more business, more people flew, and it's gone up to 20K USD. That's a 100% increase in my travel expenditure. However, the additional 10K USD in travel is only 1% of the additional 1mn USD in revenue - how much of an impact does it really make on the bottom line?

Now, during a discussion with another friend, they pointed out that I was only presenting one side of the story, and I see their point. To clarify the scenario above, I'm not questioning the decision to freeze travel. Here is another angle using different numbers: keep the revenue and its growth the same, but let's say the travel expenditure last year was 500K USD and this year it has been 625K USD, which gives us the 25% increase. That does have a significant impact, because while the growth in revenue is still 1mn USD, the additional 125K USD in travel takes 1/8th of it immediately.
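The two scenarios above can be put side by side in a few lines of Python. This is only a sketch using the made-up numbers from this post, not anyone's actual financials, and `travel_impact` is a hypothetical helper I'm introducing for illustration.

```python
def travel_impact(revenue_last, revenue_now, travel_last, travel_now):
    """Compare percentage growth with the absolute impact on new revenue."""
    revenue_growth = revenue_now - revenue_last
    travel_growth = travel_now - travel_last
    return {
        # The headline figures that usually get quoted.
        "revenue_growth_pct": 100 * revenue_growth / revenue_last,
        "travel_growth_pct": 100 * travel_growth / travel_last,
        # How much of the extra revenue is eaten by the extra travel.
        "travel_share_of_revenue_growth_pct": 100 * travel_growth / revenue_growth,
    }


# Scenario 1: travel doubles, but from a tiny base (10K -> 20K USD).
small_base = travel_impact(10_000_000, 11_000_000, 10_000, 20_000)

# Scenario 2: travel grows only 25%, but from a large base (500K -> 625K USD).
large_base = travel_impact(10_000_000, 11_000_000, 500_000, 625_000)

for label, result in [("small base", small_base), ("large base", large_base)]:
    print(label, result)
```

The 100% jump in the first scenario eats 1% of the new revenue, while the far more modest-sounding 25% jump in the second eats 12.5% of it. The growth rate alone tells you neither.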

What I'm trying to suggest here is that, as we face numbers every single day, we need to remember that coming across a big number or a trend does not mean we understand the full picture. Sometimes a summary can leave an impression that is very different from the truth, and the details are needed to understand the background and look at it holistically.
