What does the score actually mean?

The score is a 0-100 normalisation of a 7-criterion rubric. Each criterion scores 0, 1, or 2. The KR-level criteria (Outcome Form and Measurability) apply per Key Result, so a set with three KRs has more points in play than a set with one. The raw points are divided by the maximum available for that set and expressed as a 0-100 percentage.
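A minimal sketch of that arithmetic, assuming the five set-level criteria plus the two per-KR criteria described above; the exact rounding is an assumption rather than the tool's published code:

```typescript
// Sketch of the normalisation: 5 set-level criteria plus 2 KR-level
// criteria (Outcome Form, Measurability) scored once per Key Result,
// each worth 0-2 points. Rounding behaviour is an assumption.
function normaliseScore(rawPoints: number, krCount: number): number {
  const maxPoints = (5 + 2 * krCount) * 2;
  return Math.round((rawPoints / maxPoints) * 100);
}

// A three-KR set has (5 + 2 * 3) * 2 = 22 points in play,
// so 17 raw points normalises to 77.
normaliseScore(17, 3); // => 77
```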

Score ranges map to four tiers: 0-33 is Critical issues (core structural failures, the OKR cannot be tracked as written), 34-55 is Weak (gaps that will cause problems mid-quarter), 56-77 is Strong (solid foundation, minor sharpening needed), 78-100 is Excellent (committable as written). See the full methodology for how each criterion is scored and what the tier names mean in practice.
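For reference, the tier lookup in code form, using the boundaries listed above:

```typescript
// Maps a normalised 0-100 score to the tier names used on this page.
type Tier = "Critical issues" | "Weak" | "Strong" | "Excellent";

function tierFor(score: number): Tier {
  if (score <= 33) return "Critical issues";
  if (score <= 55) return "Weak";
  if (score <= 77) return "Strong";
  return "Excellent";
}

tierFor(55); // => "Weak"
tierFor(56); // => "Strong"
```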

What if the LLM gives a bad rewrite?

Rewrites are starting points, not final answers. The rubric is the source of truth: if a suggested rewrite still contains an output verb or lacks a baseline, it is still broken regardless of how polished it sounds. You can re-run the diagnosis on any rewrite by copying it into the input field. The rule engine scores it locally in under a second with no key required.
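If you want to approximate the two checks mentioned above yourself, here is a rough sketch; the verb list and the baseline pattern are illustrative guesses, not the engine's actual rules:

```typescript
// Rough approximation of two rubric checks: does a KR open with an
// output verb, and does it contain a baseline-to-target pattern?
// The verb list and the regex are illustrative, not the tool's rule set.
const OUTPUT_VERBS = ["launch", "ship", "complete", "migrate", "deliver", "implement"];

function startsWithOutputVerb(kr: string): boolean {
  const firstWord = kr.trim().toLowerCase().split(/\s+/)[0] ?? "";
  return OUTPUT_VERBS.includes(firstWord);
}

function hasBaselineAndTarget(kr: string): boolean {
  // Looks for "from <number> to <number>", e.g. "from 12% to 45%".
  return /from\s+[\d.,]+%?\s+to\s+[\d.,]+%?/i.test(kr);
}

startsWithOutputVerb("Launch the onboarding redesign"); // => true
hasBaselineAndTarget("Customers who change their own plan without contacting support, from 12% to 45%"); // => true
```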

LLMs occasionally produce fluent but structurally weak OKRs. The discipline is to apply the rubric to the rewrite, not to accept it because it reads well. A rewrite that scores below 56 needs another pass.

How much does Coach mode cost per session?

Approximately $0.02-$0.04 for a 6-10 turn session using GPT-4o-mini. Claude Sonnet costs roughly three to four times as much per session, around $0.08-$0.15 for the same conversation length. The rule-engine pre-score in Diagnose mode is free regardless of key. Only the LLM analysis step consumes API credit, at roughly $0.001-$0.015 per analysis depending on provider and model.

Coach mode cost scales with conversation length: it sends your full conversation history on each turn, so longer sessions cost more. A session that goes past 15 turns will cost noticeably more than one that reaches a draft in 6. The cost note in the input panel shows an indicative estimate before you start.
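A back-of-envelope illustration of why turn count matters: because each turn resends everything before it, input tokens grow roughly with the square of the turn count. The token count and price below are illustrative placeholders, not measurements of this tool or current provider rates.

```typescript
// Back-of-envelope cost estimate for a Coach session where each turn
// resends the full conversation history. Token sizes and prices are
// illustrative assumptions.
function estimateSessionCost(
  turns: number,
  tokensPerTurn: number,        // prompt + reply added each turn (assumed)
  inputPricePerMTokens: number  // dollars per million input tokens (assumed)
): number {
  let inputTokens = 0;
  for (let turn = 1; turn <= turns; turn++) {
    // On turn n the model re-reads everything from turns 1..n.
    inputTokens += turn * tokensPerTurn;
  }
  return (inputTokens / 1_000_000) * inputPricePerMTokens;
}

// With ~600 tokens added per turn, a 6-turn session re-reads ~12,600
// input tokens; a 15-turn session re-reads ~72,000, nearly six times as many.
estimateSessionCost(6, 600, 0.15);
estimateSessionCost(15, 600, 0.15);
```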

What makes an OKR bad?

The most reliable signal is Key Results that describe work instead of results. "Launch the onboarding redesign," "Migrate to the new platform," "Complete user research": those are tasks. They belong in a sprint backlog. A Key Result should describe what changes for a real person after the work is done, not the work itself.

The second most common failure is vagueness that passes for ambition. "Improve customer satisfaction" sounds like a goal. It is not. Without a baseline, a target, and a data source, it is a field name. You could hit it or miss it and never know which. A rubric check catches both problems inside 60 seconds, which is faster than most team OKR review conversations.

Can an Objective be measurable?

Yes, and that is actually a sign of a well-written one. The OKR framework reserves numbers for Key Results, but there is no rule against an Objective that includes a specific, observable condition. "Cut median PR cycle time so teams can ship daily by end of Q3" is directional and measurable.

The distinction that matters is the level of abstraction. The Objective names the desired state; the KRs prove it was reached. If your Objective is specific enough to function as its own KR, consider whether it is actually describing the outcome rather than the aspiration. That is not wrong; it is just a different structure from the classic format.

What's wrong with "launch X" as a Key Result?

"Launch X" is Output-as-KR. It describes an action your team takes, not a change that happens in the world because you took it. The test is simple: if you complete the KR and nothing changes for any real user, it is not a KR.

Every launch is a bet on an outcome. Write the outcome instead. If you are launching a self-service billing portal, the outcome might be "customers who change their own plan without contacting support, from 12% to 45%." Now you can tell at week 8 whether the launch actually worked. A "launch by date" KR gives you a binary green on delivery day and no information on whether the bet paid off.

The fix is one question: what will be different for users after this ships? Write that.
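One way to see the gap is as a data shape: a hypothetical outcome KR carries fields that a launch-by-date string simply does not have. The field names and the source value below are illustrative, not the tool's schema.

```typescript
// Hypothetical shape of an outcome KR, versus a launch-by-date string.
// Field names and the example source are illustrative assumptions.
interface OutcomeKR {
  actor: string;      // who changes behaviour
  behaviour: string;  // what they do differently
  baseline: number;   // where the metric starts
  target: number;     // where it should end up
  source: string;     // where the number is read from
}

const outcomeKR: OutcomeKR = {
  actor: "customers",
  behaviour: "change their own plan without contacting support",
  baseline: 0.12,
  target: 0.45,
  source: "billing portal events",
};

// Compare: "Launch the self-service billing portal by 30 September"
// carries none of these fields, so week-8 progress cannot be read off it.
```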

How do you avoid vanity metrics?

Name the actor and the specific action. "Increase engagement by 25%" fails because engagement is undefined. Engagement of what, by whom, on which surface, compared to when? A vanity metric is any number that can grow while the thing you actually care about stays flat or gets worse.

The replacement test: can you imagine a plausible scenario where this metric goes up and the business gets worse? If yes, it is a vanity metric. Pageviews go up when you publish low-quality content that attracts the wrong audience. Email open rates go up when you send panic-inducing subject lines that erode trust. Swap the vanity metric for the behaviour change it was supposed to proxy: "blog readers who start a free trial, from 1.4% to 3.2%" cannot be gamed the same way.

I have seen teams celebrate a 40% engagement increase in a quarter where net revenue retention dropped. They were measuring the wrong thing. The rubric flags this pattern as Vanity Metric and asks you to name the actor.

Why does ambition matter for an OKR?

OKRs that are guaranteed to succeed are planning theatre. If the target is set at a level the team would reach anyway without changing how they work, the OKR is not driving anything. It is just a progress report in a different format.

Ambition matters because it forces a conversation about what would have to be true for this to happen. "Triple the activation rate" requires the team to rethink the onboarding funnel. "Grow activation by 5%" permits incremental tweaking. The first conversation is more interesting and more likely to produce meaningful change.

The calibration question is: if we hit 70% of this without changing how we work, would we be satisfied? If yes, the target is probably too low. If hitting 70% would feel like a significant achievement, the target is probably in the right range.

Can a team have too many OKRs?

Yes, and most teams do. Three to five Key Results per Objective is the practical ceiling. Beyond that, the set stops being a prioritisation mechanism and starts being a commitment catalogue. If everything is a Key Result, nothing is.

The more useful question is whether each KR passes the so-what test: if this KR turns red, does the team drop everything to investigate? If the answer is "we would notice but carry on," the KR is not important enough to be in the set. Teams that write seven or eight KRs are usually hedging, not planning. They are including everything that might be relevant rather than committing to what definitely is.

A set of three sharp KRs that genuinely cover the Objective is worth more than seven KRs where two do the heavy lifting and five are there for coverage.

How often should we score OKRs?

Weekly check-ins on KR progress, formal scoring at the mid-point and end of the cycle. The weekly check is not a scoring exercise; it is a signal check. Are the numbers moving? If not, why not? The mid-point score is where you decide whether to adjust scope, targets, or approach. The end-of-cycle score is the retrospective input.

The failure mode is teams that skip the mid-point review and discover at the end of the quarter that two KRs were never measurable because the instrumentation was never built. A mid-cycle check forces that conversation at week 6 rather than week 13.

Scoring format matters less than cadence. A 0-1 confidence rating in a weekly standup is more useful than a quarterly scoring ceremony that no one reads.

Is "score 7 out of 10" a good Key Result?

No. "Score 7 out of 10" is almost always a Vanity Metric or a Placeholder in disguise. The first question is: 7 out of 10 on what? If it is an NPS survey or CSAT form, that might be a real metric with a real baseline. If it is a manager's subjective rating of a process, it is not a metric at all.

The deeper problem is that point-scale scores on non-standardised instruments are gameable and not actionable. If you score 6.8 instead of 7, what do you do differently? If you hit 7, what changed? The metric needs to be tied to a specific actor, a specific behaviour, and a specific source before it functions as a KR.

The test I apply: could two different team members independently verify this score using the same data source? If yes, it might work. If one person would give it a 6.5 and another a 7.2 using the same evidence, the metric is not specific enough to be a KR.