The rubric is opinionated. It will flag some genuinely decent OKRs as incomplete because a baseline is missing or an alignment reference is not stated. That is a feature. The goal is not perfect OKRs by some abstract standard. It is OKRs that actually change outcomes rather than producing well-formatted planning theatre.

Try it now →

The 7 criteria

O1
Clarity

Does the Objective name a specific customer and a specific scope?

Vague beneficiaries produce vague KRs downstream. If the Objective does not say who benefits or from what, the team cannot prioritise among the many ways to reach the stated direction.

Passes (scores 2)
"Internal backend engineers stop losing time to environment failures" names a customer (internal backend engineers) and a scope (environment failures). No ambiguity about who benefits.
Fails (scores 0)
"Improve the developer experience" could mean internal engineers, external API consumers, or both. "Experience" covers everything and therefore describes nothing.
O2
Timebox

Is there an explicit date or quarter?

An Objective without a timebox cannot be tracked. Teams defer the hard conversation about whether they are on track because there is no date to be on track against.

Passes (scores 2)
"By end of Q3 2026" is an explicit reference that creates a review moment and a deadline for the team.
Fails or partial
"This year" scores 1. "Soon" and objectives with no time reference score 0.
O3
Strategy

Is the Objective problem-framed, with no solution prescribed in the text?

A team that writes solution-first objectives has usually skipped the problem definition step. If the solution changes mid-quarter, the Objective becomes false. Problem-framed Objectives survive pivots.

Passes (scores 2)
"Cut the time it takes customers to complete their first order" names a problem and a direction without specifying which features, platforms, or methods will be used.
Fails (scores 0)
"Launch the self-service checkout portal so customers can place orders faster" embeds the portal as the answer before any work has started.
KR
Outcome Form

Does the Key Result follow the structure "who does what by how much"?

Output verbs (launch, migrate, deliver, create, build, implement) score 0. A metric with a vague actor scores 1. The full "who + does what + by how much" structure scores 2. This criterion applies per Key Result.

Passes (scores 2)
"New customers complete checkout without contacting support, from 34% to 52%" has a named actor, a specific behaviour, and a measurable range.
Fails (scores 0)
"Launch checkout improvements by end of Q3" is work, not a result. The outcome version asks what changes for customers after the launch.
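The output-verb check above is mechanical enough to automate. A minimal sketch in Python; the function name and verb list are illustrative, not part of the rubric tooling:

```python
# Output verbs that mark a KR as work rather than a result,
# taken from the criterion text above.
OUTPUT_VERBS = {"launch", "migrate", "deliver", "create", "build", "implement"}

def starts_with_output_verb(kr_text: str) -> bool:
    """Return True if the KR leads with an output verb (scores 0)."""
    words = kr_text.lower().split()
    if not words:
        return False
    return words[0].strip(".,:;") in OUTPUT_VERBS
```

For example, "Launch checkout improvements by end of Q3" is flagged, while "New customers complete checkout without contacting support" is not. A first-word check is deliberately crude; a real implementation would also catch output verbs buried mid-sentence.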
KR
Measurability

Does the KR include both a baseline and a target?

One present, one missing scores 1. Neither scores 0. Both, plus an implied or named data source, scores 2. If the baseline is unknown, the correct OKR is to instrument the metric first, not to improve it.

Passes (scores 2)
"Session-to-signup conversion moves from 2.1% to 3.5% (source: GA4, 30-day rolling average)" tells you the current state, the target, and where to find the number.
Partial (scores 1)
"Increase conversion rate to 3.5%" has no baseline, so you cannot know if the market simply moved the number without any team effort.
A1
Alignment

Does the OKR set reference its parent objective or the strategy it contributes to?

Alignment is not just governance overhead; it is the mechanism that connects team effort to organisational outcomes. Without a stated link, the work may be well-intentioned and still be optimising the wrong thing.

Passes (scores 2)
"Contributes to company OKR: Become the lowest-friction checkout experience in our category" states the link explicitly rather than assuming it.
Fails (scores 0)
An OKR set with no reference to anything above it scores 0, regardless of how well-constructed the KRs are.
C1
Completeness

Are there placeholders in the OKR set?

Anything marked X%, TBD, (owner), (tbc), or "numbers tbd" scores 0. A placeholder is a deferred decision. Submitting an OKR with placeholders is submitting a draft as a commitment.

Passes (scores 2)
Every field populated with real numbers, real owners, and real data sources, with no follow-up conversation required to interpret the set.
Fails (scores 0)
"Increase NPS from X to Y (owner: TBD)" creates the appearance of measurability without the substance.
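The placeholder markers listed above are concrete enough to detect automatically. A minimal sketch in Python; the pattern covers only the markers named in this criterion and is an assumption you would extend for your own conventions:

```python
import re

# Placeholder markers from the completeness criterion: X%, Y%, TBD,
# (owner), (tbc), "numbers tbd". Case-insensitive.
PLACEHOLDER_RE = re.compile(
    r"\bX%|\bY%|\bTBD\b|\(owner\)|\(tbc\)|numbers tbd",
    re.IGNORECASE,
)

def has_placeholder(text: str) -> bool:
    """Return True if the OKR text contains an unresolved placeholder."""
    return bool(PLACEHOLDER_RE.search(text))
```

"Increase NPS from X to Y (owner: TBD)" is flagged via the TBD marker; "Session-to-signup conversion moves from 2.1% to 3.5%" passes.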

The 6 anti-patterns

Output-as-KR

A KR that describes work your team does rather than a change that happens in the world. The verb is the tell: migrate, launch, deliver, build, implement.

"Migrate 100% of orders to the new OMS by Q3." The outcome of migration might be speed, reliability, or error reduction. Write the KR about that instead.
Impact-as-KR

A KR so high-level and lagging that no single team can control it. A team that writes this kind of KR cannot tell at week 6 whether they are contributing or bystanders.

"Increase annual revenue by 20%." Revenue is the result of many teams' work. Find the specific behaviour one level down: what do customers do differently that drives the revenue?
Vanity Metric

A plausible-sounding number that does not connect to a specific actor or behaviour. Vanity metrics are easy to move without moving the thing that matters. The test: can you imagine a scenario where this metric goes up and the business gets worse?

"Increase engagement by 25%." Engagement of what, by whom, on which surface? Name the actor and the action: "Email subscribers who click a product card, from 6% to 11%."
Placeholder

A KR with unknown numbers committed as though they were known. If the baseline is unknown, the KR is a wish. Instrument the metric first, then revisit the improvement goal next cycle with real numbers.

"Reduce load time from X% to Y%." No baseline, no target. This is a direction, not a result.
Binary Milestone

A pass/fail milestone that tells you whether something happened, not whether it worked. Usually an Output-as-KR in disguise. Ask what the milestone was supposed to change, then measure that.

"100% of teams onboarded to the new framework." If the onboarding was supposed to reduce planning cycle time, measure that.
Task-List-in-Disguise

Three or more KRs that are really one project plan. These are inputs, not results. A set with seven KRs where two do the heavy lifting and five are there for coverage is a set with five hidden tasks.

"Assign two engineers. Create the mapping document. Get sign-off from Legal." These describe effort. Compress to one or two KRs about the outcome the tasks were supposed to produce.

The "So What?" test

For every KR, ask three questions before committing. Any "no" means the KR needs rewriting.

Question 1
If all KRs turn green, is the Objective obviously achieved? If not, the KRs are not tightly coupled to the Objective. Something is missing.
Question 2
If this KR turns red, does it signal a real problem the team must act on? If the answer is "we'd notice but carry on," the KR is not important enough to be in the set.
Question 3
Does the team actually control this metric? If the metric can move due to factors entirely outside the team's influence, it is a weak signal for team performance.

The test surfaces the gap between activity and outcome. Most OKR problems are visible the moment you ask these three questions. Teams that skip the test usually discover the gap in the retrospective, which is too late to act.

How the score is computed

Each of the 7 criteria scores 0, 1, or 2. The KR-level criteria (Outcome Form and Measurability) apply per Key Result, so a set with three KRs has more KR-level points in play than a set with one. The total raw score is normalised to a 0-100 percentage.

Score range (tier): what it means
0-33 (Critical issues): Core structural failures. The OKR cannot be tracked or evaluated as written.
34-55 (Weak): Some criteria pass, but the set has gaps that will cause problems mid-quarter.
56-77 (Strong): Solid foundation. A few criteria need sharpening before commitment.
78-100 (Excellent): All criteria met or nearly met. This is a committable OKR.

The tiers are diagnostic signals, not grades. A score of 42 means specific criteria are dragging the set down. The per-criterion breakdown shows exactly which ones and why.
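The computation itself is small. A sketch in Python, assuming per-criterion scores have already been assigned; the function signature and field names are illustrative, while the point structure and tier cutoffs mirror the description and table above:

```python
def score_okr_set(set_scores, kr_scores):
    """Normalise rubric scores to a 0-100 percentage and assign a tier.

    set_scores: dict of the 5 set-level criteria -> 0, 1, or 2
      (Clarity, Timebox, Strategy, Alignment, Completeness).
    kr_scores: list with one dict per Key Result, each holding the
      2 KR-level criteria (Outcome Form, Measurability) -> 0, 1, or 2.
    """
    raw = sum(set_scores.values())
    raw += sum(sum(kr.values()) for kr in kr_scores)
    # Max = 2 points per set-level criterion, plus 2 points per KR-level
    # criterion per KR: a set with more KRs has more points in play.
    max_raw = 2 * len(set_scores) + 2 * 2 * len(kr_scores)
    pct = round(100 * raw / max_raw)
    if pct <= 33:
        tier = "Critical issues"
    elif pct <= 55:
        tier = "Weak"
    elif pct <= 77:
        tier = "Strong"
    else:
        tier = "Excellent"
    return pct, tier
```

A set scoring 2, 2, 1, 0, 2 on the set-level criteria with one KR scoring 2 and 1 has a raw score of 10 out of 14, which normalises to 71 and lands in the Strong tier.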

Try it now →