Please note, a version of this post was first published on LinkedIn.
Your “average” stinks. Doesn’t matter what that average is, I know it’s rotten to the core.
Average cost per click.
Average user time on page.
Average length of time to close a sale.
Average height of a delivery driver who fills up your vending machine.
Regardless of what you’re talking about, if you’re talking about “average”, you’re doing it wrong.
Because average, as a measurement or a descriptor, sucks.
I’m not the only one saying it.
When you’re average, there are a lot of people who are better than you. Yes, there are a lot of people who are worse than you, too, but in this success-obsessed culture, being average is often seen as being a failure.
And, despite the subtle indications that this post is going to be another about self-improvement, I’m actually going to talk a bit differently about average and how its use is failing you.
I’m going to talk about why using average, as a descriptive metric, is holding you back. It’s actually obscuring your insights, and keeping you from learning what you could from your analyses and your models.
Okay, I shouldn’t have to do this, but just so everyone’s on the same page: when I’m using the word “average” here I mean the mean, or the sum of the individual components divided by the number of components. If there are 3 dice rolled, and they are a 1, a 3, and a 6, then the “average” roll is 3.33 ((1 + 3 + 6) / 3 = 3.333).
There’s also the “median” and “mode”, other measures which may sometimes be used for average. But here, we’re talking mean, as is usually considered when looking at values.
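If you want to check the arithmetic yourself, here is a minimal sketch using Python's standard statistics module. The three rolls come from the example above; the second list, more_rolls, is a made-up set I'm assuming only so that the mode has something to find.

```python
import statistics

rolls = [1, 3, 6]  # the three dice from the example above

mean_roll = statistics.mean(rolls)      # (1 + 3 + 6) / 3
median_roll = statistics.median(rolls)  # middle value once sorted

# Hypothetical extra rolls (not from the post), so one value repeats
more_rolls = [1, 3, 3, 6]
mode_roll = statistics.mode(more_rolls)  # most frequent value

print(f"mean   = {mean_roll:.2f}")
print(f"median = {median_roll}")
print(f"mode of {more_rolls} = {mode_roll}")
```

Three different answers to "what's typical?", which is part of why it pays to say which one you mean.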
Organizations and individuals all over the world use average to describe something like a central tendency of a group. They look at the average high or low temperature of the place they’re going to vacation next month, so they know what to pack.
The problem is, most of the time, things that are “average” don’t actually show up in real life. Like the dice rolls above: even though the average is 3.33, it’s actually impossible to get 3.33 on any one specific roll. Sure, roll a die a hundred times and you should get a total pretty close to 350, for an average of 3.5, but on any one specific roll? You will NEVER actually get 3.5.
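A quick way to convince yourself is to simulate it. This is only a sketch: the seed is an arbitrary choice of mine so the run is repeatable, and a different seed will give a different total.

```python
import random

random.seed(42)  # arbitrary seed (my assumption), just for repeatability

# Roll one six-sided die a hundred times
rolls = [random.randint(1, 6) for _ in range(100)]
total = sum(rolls)
average = total / len(rolls)

print(f"total = {total}, average = {average:.2f}")

# The average hovers near 3.5, yet no individual roll can ever equal it
print("any single roll equal to 3.5?", any(r == 3.5 for r in rolls))
```

The average tends to land somewhere near 3.5, while the last line always prints False, because every roll is a whole number from 1 to 6.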
[side note – don’t you hate it when Google doesn’t play nice? I have this idea that there’s a Far Side cartoon perfectly describing the phenomenon I’m thinking of, but I can’t find it. It’s a family with the average “0.5 dogs and 1.5 kids”, and the dog and one of the kids are just like the “left” side, not the whole thing.]
So in this situation, making some kind of prediction or evaluation based on averages is rather meaningless. Anything that’s a discrete count (how many times did someone click on my website?), or otherwise isn’t a continuous variable (what are the expected values from rolling 2d20?), probably shouldn’t be evaluated using “average”.
And yet, we do it all the time.
We Talk About Average Way Too Much. Distribution, Not Enough
So is there a solution? Clearly.
This guy goes into more detail than I’m prepared to right now. The point is, though, that talking about percentiles of a distribution provides vastly more information than simple averages.
Here’s a real clear example. These two data sets have exactly the same # of elements and the same average:
Obviously, I manipulated this data set a little bit to prove a point. The first column, Set #1, is just a set of random numbers. The second column, Set #2, is an operation on the first, for all but #20. That last one? Well, I solved for the value that would make the total and average the same (for this example).
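Here is roughly how that kind of construction could be reproduced. To be clear about the assumptions: the post's actual numbers and the actual operation aren't reproduced here, so the random values, the seed, and the "halve it" operation below are all stand-ins I made up.

```python
import math
import random

random.seed(1)  # arbitrary seed (assumption), for repeatability

# Set #1: twenty random numbers (range 0-100 is my assumption)
set_1 = [round(random.uniform(0, 100), 1) for _ in range(20)]

# Set #2: a hypothetical operation (halving) on the first 19 values
set_2 = [x * 0.5 for x in set_1[:-1]]

# Solve for the 20th value so both sets have the same total
set_2.append(sum(set_1) - sum(set_2))

mean_1 = sum(set_1) / len(set_1)
mean_2 = sum(set_2) / len(set_2)

print(f"mean of Set #1 = {mean_1:.2f}")
print(f"mean of Set #2 = {mean_2:.2f}")
print("means match:", math.isclose(mean_1, mean_2))
print(f"largest value: Set #1 = {max(set_1):.1f}, Set #2 = {max(set_2):.1f}")
```

With this particular operation, the solved-for 20th value has to absorb everything the halving took away, so it ends up as a large outlier even though the averages agree.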
The point is, if you were just looking at 2 different data sets, or perhaps how a data set changed over time (you’re looking at compile times for your program, for example, and checking to see whether your servers are performing better than they were last year), you may be missing crucial data if you only look at average.
Your average may be exactly the same, or only slightly worse than before, but you may have introduced significant outliers that are being obscured by the measurement. Here’s the same data, but with some percentiles added on (some rows omitted for clarity):
With this presentation, it would be obvious that something was systematically different between Set #1 and Set #2, and you’d have an indication that you had more investigating to do.
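A percentile summary like that takes only a few lines. The sketch below uses two tiny made-up lists (not the post's data) chosen so the means are identical, and it leans on statistics.quantiles, which needs Python 3.8 or later. The summarize helper is a name I'm introducing for illustration.

```python
import statistics

# Made-up illustrative data: both lists sum to 50, so both means are 5
set_1 = [4, 5, 5, 6, 5, 4, 6, 5, 5, 5]
set_2 = [3, 3, 3, 3, 3, 3, 3, 3, 3, 23]

def summarize(data):
    """Return the mean plus a five-number-style summary of the data."""
    p25, p50, p75 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(data),
        "min": min(data),
        "p25": p25,
        "p50": p50,
        "p75": p75,
        "max": max(data),
    }

for name, data in (("Set #1", set_1), ("Set #2", set_2)):
    print(name, summarize(data))
```

Same mean in both rows, but the median and the max tell two very different stories.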
And this example was for a set with very few elements. What happens if you’ve got a vast data set (something accessing big data, for example, or a time series with daily stock prices stretching back generations across thousands of stocks)? How might looking beyond averages help you identify problems?
Well, one way is to look at results graphically. Here’s what I mean.
Again, I have two data sets, much larger than before, but I’m still simplifying for the example. [This is based on an actual issue I encountered while still an actuary.]
Let’s assume that you’ve got some kind of measurement which produces these values:
What’s going on here? Average looks good. Minimum, 25th percentile and 75th percentile look good. Spot-checking one or two seems right. How are we getting a max of 11.79? And, importantly, is that a problem?
The reason this is a problem is that the maximum that should come out of this expression is 10.0. Mostly because I forced the issue by defining these two columns to be a formula of Rand()*10, which means I expect the max to be no more than 10.
Hmmm… is it something in the data? Let’s look. First, I sorted from smallest to largest (as you might do for a stochastic simulation):
As you can see, there’s something strange going on. [You can’t really see Set 1, but trust me, it’s there.] The strangeness is that jump at the end. Something systematic? Might this suggest there’s an error in the model, or a source data element?
Let’s dig further. What if I go back and reorder by item number (or scenario number, in a stochastic simulation)? What does that look like?
Now we can see that there’s clearly something going on with Set 2 in the first hundred trials or so.
When I look back through the model, I see that I had a slightly different formula in the first 100 cells. Instead of Rand()*10 (to force it to be a random number between 0 and 10), I had Rand()*10 + 2.
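The same experiment can be sketched outside a spreadsheet. The assumptions here are mine: the post doesn't say how many rows the sets had, so I picked 10,000, random.random() stands in for Rand(), and the seed and batch size are arbitrary.

```python
import random
import statistics

random.seed(7)   # arbitrary seed (assumption), for repeatability
N = 10_000       # assumed number of trials; the post doesn't give one
BAD = 100        # the first 100 cells carry the "+ 2" formula
BATCH = 100      # assumed batch size for the simulation-order check

set_1 = [random.random() * 10 for _ in range(N)]
set_2 = [random.random() * 10 + (2 if i < BAD else 0) for i in range(N)]

# Headline statistics: the means are close, only the max looks odd
for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    print(f"{name}: mean={statistics.mean(data):.2f} "
          f"min={min(data):.2f} max={max(data):.2f}")

# Simulation order: summarize consecutive batches instead of the whole run
for start in range(0, 5 * BATCH, BATCH):
    batch = set_2[start:start + BATCH]
    print(f"trials {start + 1:>4}-{start + BATCH:>4}: "
          f"mean={statistics.mean(batch):.2f} max={max(batch):.2f}")
```

Sorting tells you that something is off in the tail. Keeping the results in their original order tells you where it came from, which is usually the more useful clue.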
Yes, this was a little bit of a contrived example. But a similar experience actually happened during some stochastic testing of insurance liabilities once. When reviewing results, the average looked reasonable and ordered smallest-to-largest looked reasonable also. When we looked at the results in simulation order, though, we saw that there was something different about a set of early results.
It turned out that these stochastic scenarios were reading the inputs for a set of deterministic scenarios for that first batch, and throwing off the ultimate effect of the model.
It didn’t take long to correct those inputs, though, and re-run. But if we hadn’t looked at more than just the average, we never would have caught the mistake.
Yeah, But Is It Worth It?
I don’t know. Some might find this a level of detail too specific for much of your work. But, when you’re dealing with huge data sets, complex relationships, and razor-thin margins for error, perhaps it’s not too precise.
Would the company have made different decisions about the insurance portfolio had those erroneous results been incorporated into the regular reporting? Probably not. The magnitude of the error wasn’t that great, just like the magnitude in my contrived example wasn’t that big. Heck, the shift in those first 100 wasn’t even large enough to move the average. So is it that big a deal?
Well, unfortunately, the answer is the standard: It depends.
Sometimes it will be. Sometimes it won’t be. And there’s no cut and dried formula to tell when it is and when it isn’t worth it to investigate your results for anomalies further.
Some of it comes with experience. Some of it comes from just being curious and following intuition. Some of it comes from your superiors needing to be absolutely sure of every decimal point you can give them, so you do what you’re asked without worrying about it.
But, eventually, you’ll learn to add your own systems for spotting anomalies. And you’ll implement them early enough in your process that you can head off distractions before they appear.
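What might one of those systems look like? One very simple possibility, offered as a sketch rather than a recommendation: a bounds check that reports where the offending values sit, not just that they exist. The function name find_out_of_range and the sample numbers are hypothetical.

```python
def find_out_of_range(values, low, high):
    """Return (position, value) pairs for values outside [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]

# Made-up results that are supposed to fall between 0 and 10
results = [3.2, 11.4, 7.7, 0.5, 10.9, 9.9]

flagged = find_out_of_range(results, 0, 10)
print(flagged)  # [(1, 11.4), (4, 10.9)]
```

Run something like this right after the model produces output, and a stray formula gets caught before it ever reaches a summary table.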
Look, I’m all for taking shortcuts when they’re called for. Nobody really needs to take the back roads every time. That’s why we built the highways, damn it. That’s also why, to be frank, your average stinks. It’s a shortcut, and, as I’ve shown, just using an average (heck, even just using percentiles alone) can keep you from the insights you need to make informed decisions.
Because even with shortcuts, automation, dashboards, and whatever comprehensive views your C-suite is looking for, sometimes it’s good to actually get back into that data sandbox and play around a little bit.
Who knows – maybe you’ll see me there.