Why metrics plus accountability doesn’t always equal success

“You get what you measure” is a common phrase in business. You measure a thing. You focus on that measure. You reach out with your feelings to the measure, like a Jedi. And boom, the measure moves.

But here I am at home writing this article while clocking in at 202 lbs on the scale. I keep measuring myself. My goal is to get back to a svelte 180 lbs someday, so my pants will fit again. But I keep eating Jeni’s Ice Cream. And the scale stays put.

Am I getting what I measure?

Metrics on their own

I work at Mailchimp as chief product officer. I’ve worked with folks across the company for years as we’ve expanded what was once merely an email product into a complete marketing platform for small businesses. To achieve an increasingly sophisticated product, we’ve had to become increasingly sophisticated in how we measure performance. It’s been a journey, transitioning from “startup to grownup,” as we’re fond of saying.

At the outset, Mailchimp was primarily a qualitatively driven organization, heavy on customer empathy and the user-experience research needed to fuel it. Our company motto continues to be “listen hard, change fast.” Talking to customers, visiting customers, sitting with them as they use the product—that’s the qualitative “hard listening” that makes the dream work. I love our UX research team.

But we were always a little light on the hard numbers. Of course, this is by no means the worst way to be. If you’re using qualitative data to solve customer problems, you’re unlikely to stray off course. But sometimes folks, including me, would cherry-pick their favorite qualitative finding to make their case on a given matter. We had no “north star” measures that we could all point to in order to settle debate.

Qualitative research can be a little mismatched for the task of rapidly iterating on a release. Nor is it much help in driving A/B testing. You won’t be surprised to learn that during Mailchimp’s purely qualitative era, we did very little (basically zero) A/B testing of the product, that is, comparing one version against another. A/B testing requires that you have articulated quantitative measures that you care about.
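To make that dependency concrete, here is a minimal sketch of the arithmetic behind a basic A/B comparison. It assumes the team has already named one quantitative measure, here a hypothetical signup conversion rate, and uses a standard two-proportion z-test; the function name and the visitor and signup counts are illustrative, not Mailchimp data or tooling.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare the conversion rates of variants A and B.

    Returns (z, two-sided p-value). The test only works because
    "conversion" has been defined as a countable, quantitative measure.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 5,000 visitors per variant.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Without an agreed-upon numerator and denominator, there is nothing to plug into this comparison, which is why a purely qualitative practice leaves A/B testing on the table.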

So we eventually introduced quantitative measures to our product teams and built out our product analytics practice. Today it nicely complements our UX research practice, which is part of the same team.

But I’d be lying if I said our early forays into measurement went swimmingly. While different teams selected key performance indicators (KPIs) that aligned to their missions, when we checked in with those teams and asked how their KPIs were doing, we noticed a few things:

The movement in team KPIs seemed to track with the macro-level growth of the company. The teams couldn’t articulate how their actions affected the KPIs they’d chosen.
When we looked at the work the teams had chosen to prioritize, it was all undoubtedly valuable according to some (often qualitative) measure, but it wasn’t necessarily the most valuable work given their selected KPI.

In this case, we didn’t get what we measured. Instead, we got the same team behaviors we’d always had… plus some nice dashboards. A measure on its own, then, is a tree falling in the forest.

Adding in accountability

Now, what you’re probably thinking at this point is that what Mailchimp left out of the equation here is accountability. We left it out on purpose. Why? Because, like everyone else in the world, we have seen what happens when you mix measures and direct accountability in equal parts. It’s dynamite in the worst way.

A few years back at Mailchimp, we experimented with holding one of our teams directly accountable to a measure. What we found was that the team chose work that drove the metric they were accountable for, to the detriment of overall customer experience and revenue. It was a classic example of perverse incentives.

When you add accountability to the movement of a measure, you do indeed “get what you measure.” But it often feels Shakespearean. You get what you measure, but what you get is not what you wanted.

It’s the same debate around tying executive performance to stock price. You get a stock price that rises, but the price you pay is the long-term viability of the firm, not to mention the occasional but severely detrimental cost to the economy and society as a whole (see the whole 2008 financial crisis).

The saddest example of perverse incentives I can provide comes from my favorite fast-food restaurant. It’s a fried chicken joint that shall remain nameless. At my favorite chain, they measure drive-thru time from ordering at the window to completion of that order. This measure is used to track the effectiveness of the employees at getting cars through the process, with their hot, delicious chicken and biscuits served in a timely manner. In practice though, the measure slows everything down.

Every time (and I mean every time across multiple locations) I order at a drive-thru, the employees and I do the same choreographed “dance of the drive-thru timer.”

I order and the timer starts.
I get to the window and pay. My food isn’t ready, but I don’t expect it to be. I’m happy to wait at the window until it is.
The drive-thru window employee asks me to go park in the parking lot instead, and they will bring my food out to me later.
I join other cars in the parking lot.
Some time later, an employee saunters out to the cars with bags of chicken. They make the rounds, asking what each driver ordered. Eventually I’m matched with my order, and I drive away.
I eat the chicken, and all is forgiven.

You see, when I’m sent to the parking lot, the employees can tell the system that my car has cleared through, even though it honestly hasn’t. Now the incentive to serve me quickly is actually removed. And the walk around the counter and into the parking lot, not to mention the process of matching drivers to orders, adds more time to the delivery of the order.
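A toy model makes the gap between the measured number and the customer’s experience easy to see. This is a hypothetical sketch: the timings are made up, all times are minutes after the order is placed, and it generously assumes the kitchen finishes the food at the same moment either way (in practice, as described above, it may be slower once the incentive is gone).

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One hypothetical drive-thru visit; times are minutes after ordering."""
    food_ready: float        # when the kitchen finishes the order
    window_arrival: float    # when the car reaches the pay window
    walkout_delay: float     # extra time to walk out and match cars to orders


def timer_and_wait(v: Visit, send_to_parking: bool) -> tuple[float, float]:
    """Return (time recorded by the timer, time the customer actually waits)."""
    if send_to_parking:
        # The car is marked as cleared when it leaves the window, but the
        # customer still waits for the food plus the walk-out and matching.
        recorded = v.window_arrival
        actual = max(v.food_ready, v.window_arrival) + v.walkout_delay
    else:
        # The car waits at the window, so the timer sees the whole wait.
        recorded = actual = max(v.food_ready, v.window_arrival)
    return recorded, actual


visit = Visit(food_ready=6.0, window_arrival=2.0, walkout_delay=3.0)
print(timer_and_wait(visit, send_to_parking=False))  # (6.0, 6.0)
print(timer_and_wait(visit, send_to_parking=True))   # (2.0, 9.0)
```

With these made-up numbers, the parking-lot move cuts the recorded time from 6 minutes to 2 while the actual wait grows from 6 to 9: the metric improves exactly as the experience it was meant to track gets worse.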

Did the restaurant’s management “get what it measures”? If they’re measuring cars going past the window, then indeed they got what they measured. But I doubt that’s what they really wanted.

Indirect accountability by way of behaviors

The fact is, you can introduce accountability to metrics, but I find it’s best if done indirectly, by holding folks accountable to a set of behaviors that are meant to drive their measures.

For our teams building products at Mailchimp, then, the essential question is not “Did you move your measure?” but rather:

What are the levers you have on the team for moving your measures?
Why did you think the work you just completed would move your measures? Did it? Why do you think it did or didn’t have the desired results?
What have you learned from that experience that will influence your next work?
Why, then, is the work you’ve planned next going to move your measures?

These types of questions hold employees accountable not to the metric, but to desirable behaviors that support moving the metric. Specifically in Mailchimp’s case, once we introduced this line of questioning with teams, we saw changes in behavior. We finally got what we measured in the best way, which is to say that teams began to learn how to drive performance and could justify their work as being accretive to company and customer value.

So when I review my product managers at the end of the year, I don’t necessarily review them based on the movement of their KPIs. I review them based on their execution of the behaviors that should move their KPIs. I’m far more likely to ask, “Did this product manager run a work prioritization process that was true to the metrics and mission of the team, creating work that’s defensible from a metrics perspective?” than I am to ask, “Did this product manager move their metric by 5%?”

Going back to the fried chicken example, what then might this restaurant do? Well, time through the drive-thru is their ultimate customer experience (CX) metric, but direct accountability doesn’t work. Instead, they should identify the control metrics that feed into a swift drive-thru experience, things like making sure that new chicken is dropped in the fryer at the right time as previous batches run low, or making sure that quantities of sides are kept just high enough to absorb bursts in drive-thru traffic.

I don’t work in fast food, so don’t take me too literally here. The point is to select fine-grained measures that are less likely to be gamed in a way that’s detrimental to the business, and that are connected to the behaviors you want, so that you can drive your ultimate metric forward.

What happens when you add direct accountability back in?

The problem with tying accountability to specific behaviors, rather than the direct movement of a measure, is that you will ultimately skew toward greater learning and experimentation at the expense of some smaller “sure things.” You also have to tolerate situations in which the measures don’t move at all, so long as the team is doing what it can to move them.

In the technology space where I work, this is an acceptable risk, especially for teams whose metrics are complex and difficult to move. In a rapidly evolving and competitive landscape, you need to avoid penalizing failure if you want teams to attempt transformative work rather than leaning too heavily into incrementality.

But in instances where there’s a lot of direct control over the movement of a measure, it can be OK to add back in some direct accountability for moving the measure. Each company needs to decide how hard to lean on that, because as you lean harder on a team to make the measure move (say, by tying compensation to it), you’re going to see side effects. For example, big bets that have a high likelihood of failure will evaporate from the roadmap. Teams will instead focus on more reliable levers for changing the measure. More incrementality, less transformational thinking. Maybe that’s OK. Maybe not.

You should always hold folks accountable for behaviors that support driving measures. Beyond that, lean selectively on direct accountability, depending on how you view the work, the team, and the measure within your larger portfolio as an organization.

You get what you measure. Kinda.

To sum up, KPIs are not magic incantations. Just because you know at a quantitative level how well a part of your product or business is doing doesn’t mean you will magically change the measure merely by your knowledge of it.

But measuring is the first step in having a focal point for the right conversations about behaviors. And that’s where change happens—on the scale, at the drive-thru, in a tech company, or anywhere else.