KPIs in AI. How to Measure What You Want to See.

What's worth measuring in an AI project and what isn't

A small, subjective list. Based on what's given me trouble.

| Worth measuring early | Not worth measuring at MVP stage |
| --- | --- |
| Input data quality | Project ROI |
| Adoption rate - are people actually using it? | Cost-per-query in production |
| Decision time (before vs. after AI) | Model accuracy in isolation from context |
| Number of errors caught by users | NPS after one week |
| Time needed to fix a model error | Comparison with competitors' solutions |

This list is subjective and depends on project stage and system type. What makes sense at pilot stage may be a poor metric in a mature production system - and vice versa.

Why is ROI a bad metric at MVP stage? Because an MVP is a hypothesis. You're measuring the ROI of a hypothesis before you know whether the hypothesis is true. That's like rating a restaurant based on the first dish out of the test kitchen.

Adoption rate is an often undervalued metric. Don't ask "how many times was the model called." Ask "how many times did a user actually act on the model's output." Those are different numbers. Sometimes very different.
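To make the difference between "called" and "acted on" concrete, here is a minimal sketch. The `ModelCall` record and its `acted_on` flag are assumptions for illustration; how you decide that a user acted on an output (accepted a suggestion, kept a proposed category, and so on) depends on your system.

```python
from dataclasses import dataclass


@dataclass
class ModelCall:
    """One model invocation. Field names are hypothetical."""
    user_id: str
    acted_on: bool  # True if the user actually used the model's output


def adoption_summary(calls: list[ModelCall]) -> dict[str, float]:
    """Contrast raw call volume with calls whose output was acted on."""
    total = len(calls)
    acted = sum(1 for c in calls if c.acted_on)
    return {
        "calls": total,
        "acted_on": acted,
        "act_on_rate": acted / total if total else 0.0,
    }


# Illustrative data only, not taken from a real project.
calls = [
    ModelCall("ana", True),
    ModelCall("ana", False),
    ModelCall("ben", False),
    ModelCall("ben", False),
    ModelCall("cy", True),
]
summary = adoption_summary(calls)
print(summary)  # {'calls': 5, 'acted_on': 2, 'act_on_rate': 0.4}
```

Five calls look like healthy usage. Two acted-on outputs tell a different story.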

What happens after the board presentation

Back to the company from the beginning of this article.

May. The presentation goes beautifully. All green KPIs. The board is delighted. The project gets funding. Goes to production.

June. Users start complaining. The model makes errors nobody planned for in the KPIs. Because the KPIs didn't measure errors. They measured response speed.

July. The first serious incident. The model suggested the wrong category in the classification system. Three weeks of manual corrections.

August. The project manager is surprised. "But the KPIs were good." They still are. That's exactly the problem.

December. The annual report. The AI project described as "in stabilization." That's a euphemism. Technical debt. Data quality issues nobody measured in May. Six months on, the problems are the ones that were visible from the start - but nobody wanted to see them.

This isn't a hypothetical. It's a scenario I've watched play out in different variations several times.

How to design KPIs to learn, not just to win

Three principles I use in practice.

Principle 1: Ask "what do we want to find out?", not "how do we look good?"

Before every metric, ask one question: what does this number tell us about whether we're doing the right thing? If it can't answer that question - remove it from the dashboard.

Example: "Model response time" tells us nothing about whether we're doing the right thing. "Percentage of users who acted on the model's recommendation" - that does.

Principle 2: Measure early and ugly

An ugly metric measured early is better than a clean metric measured too late. Don't wait for production. Measure during the pilot. Even if the data is incomplete.

Early, ugly data gives you time to course-correct. Late, beautiful data gives you material for a presentation. That's all.
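One possible shape for an "ugly" pilot metric, sketched under assumptions: time needed to fix a model error, computed only from the errors already fixed, with the count of still-open errors reported next to it instead of hidden. The function name, the record layout, and the dates are all hypothetical.

```python
from datetime import datetime
from statistics import median
from typing import Optional


def time_to_fix_hours(errors: list[tuple[datetime, Optional[datetime]]]) -> dict:
    """Median hours from report to fix, plus how much data is still missing.

    Each item is (reported_at, fixed_at); fixed_at is None while the error is open.
    """
    durations = [
        (fixed - reported).total_seconds() / 3600
        for reported, fixed in errors
        if fixed is not None
    ]
    return {
        "median_hours": median(durations) if durations else None,
        "fixed": len(durations),
        "still_open": len(errors) - len(durations),
    }


# Illustrative pilot data only.
pilot_errors = [
    (datetime(2024, 3, 1, 9), datetime(2024, 3, 1, 15)),  # fixed after 6 h
    (datetime(2024, 3, 2, 9), datetime(2024, 3, 3, 9)),   # fixed after 24 h
    (datetime(2024, 3, 4, 9), None),                      # not fixed yet
]
report = time_to_fix_hours(pilot_errors)
print(report)  # {'median_hours': 15.0, 'fixed': 2, 'still_open': 1}
```

The number is incomplete, and it says so. That is the point: the gap is visible while there is still time to act on it.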

Principle 3: Separate learning metrics from reporting metrics

Learning metrics are measures that help you understand what's working and what isn't. Reporting metrics are what you show the board. They don't have to be the same numbers.

What does this look like in practice? Two separate views.

Board dashboard: percentage of users acting on model recommendations, decision time before/after, number of escalations.

Team dashboard: breakdown of error types, adoption rate by user segment, input data quality.

Everyone gets what they need - without filtering.

The board needs evidence of progress. The team needs signals for learning. Thinking that one dashboard can serve both goals is the source of many problems.
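A rough sketch of how the two views in this section could come from the same raw events, so that neither is a filtered copy of the other. The event fields, the segment names, and the error labels are assumptions. The board view is reduced to a single number here, computed per event rather than per user; decision time and escalations are left out for brevity.

```python
from collections import Counter

# Hypothetical event records; the field names are assumptions for this sketch.
events = [
    {"segment": "sales", "acted_on": True, "error_type": None},
    {"segment": "sales", "acted_on": False, "error_type": "wrong_category"},
    {"segment": "support", "acted_on": True, "error_type": None},
    {"segment": "support", "acted_on": False, "error_type": "stale_data"},
    {"segment": "support", "acted_on": False, "error_type": "wrong_category"},
]


def board_view(events):
    """Aggregate progress signal: share of events where the user acted."""
    acted = sum(e["acted_on"] for e in events)
    return {"act_on_rate": acted / len(events) if events else 0.0}


def team_view(events):
    """Diagnostic detail: error types and act-on rate per segment."""
    errors = Counter(e["error_type"] for e in events if e["error_type"])
    by_segment = {}
    for e in events:
        seen, acted = by_segment.get(e["segment"], (0, 0))
        by_segment[e["segment"]] = (seen + 1, acted + e["acted_on"])
    rates = {s: acted / seen for s, (seen, acted) in by_segment.items()}
    return {"error_types": dict(errors), "act_on_rate_by_segment": rates}


board = board_view(events)
team = team_view(events)
print(board)
print(team)
```

Both functions read the same list. The board sees the trend; the team sees where the trend comes from.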

A small, honest reflection

I'm not writing this from the position of someone who always did it right. I was in that conference room. I saw that choice. And sometimes I didn't have the energy - or the courage - to say out loud: "that's the wrong question."

The pressure for good results is real. The board wants green. The project sponsor wants green. You want green too - because the project matters and you genuinely want more resources.

But green KPIs in June and red in December is a far worse scenario than yellow KPIs all year with honest information about what you're learning.

The thing is, yellow KPIs require courage. And trust that the board will value honesty over marketing.

How do you talk to a board that wants nothing but green? One technique that helped me: instead of "we have yellow KPIs" - "we have an early signal about a problem we can fix now for X, or fix later for 10X." The board doesn't want yellow. The board wants control over risk. Yellow KPIs with context aren't failure - they're risk management.

Do the KPIs in your AI project tell you something real? Or are some of them there mostly to look good in May?