F1 · PROVE · Pilot
The pilot worked. The business broke it.
Wire it to your real systems (Salesforce, the warehouse, the approval flow) and it stops working. Nobody can tell you which step failed.
90% per step × 6 steps = 53% · × 20 steps = 12%
Your auditor asks: “Show me the proof this multi-step action met its specification before it reached the customer.”
BCG · McKinsey 2025
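The compounding figures in F1 can be checked with a short sketch. It assumes every step succeeds independently with the same 90% reliability, which is a simplification the card does not state; the function name `end_to_end_success` is hypothetical and used only for illustration.

```python
def end_to_end_success(per_step: float, steps: int) -> float:
    """Chance that every step in a chain succeeds.

    Assumption: steps fail independently and share one success rate.
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


# Print the end-to-end rate for the chain lengths quoted in F1.
for n in (6, 20):
    print(f"{n:>2} steps at 90% each: {end_to_end_success(0.90, n):.0%}")
```

Under those assumptions the 6-step chain lands near 53% and the 20-step chain near 12%, matching the card. Correlated failures or uneven step reliability would change the numbers.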
F2 · REPLAY · Production
Six weeks after rollout, accuracy dropped.
It was flawless in the pilot. The model never changed — production data did. Pilot data is not production data.
42% of orgs abandoned most AI initiatives in 2025
Your auditor asks: “Show me the audit record that this performed within spec on the last 30 days of production traffic.”
Cursor · Apr 2025
F3 · BIND · Production
Internal tests pass. Customers complain.
Your evals look great; the support tickets keep climbing. The benchmark stopped measuring what the model actually does.
Eval scores quietly stop tracking reality
Your auditor asks: “What is the version, date, and contamination status of the dataset you used to show compliance?”
LMArena · Q4 2025
F4 · PREVENT · Audit
Your governance platform records. It does not stop.
It tells you when something went wrong. It cannot tell you it will not happen again.
90% use AI daily · 18% govern it
Your auditor asks: “Before the action was taken, what evidence existed that it was permitted?”
Replit · Jul 2025
F5 · SPECIFY · Audit
Risk will not sign off on a guess.
Engineering shipped six months ago. Risk still has not approved it — nobody can produce the document they are asking for.
Only 28% of enterprise AI projects fully pay off
Your auditor asks: “Where is the proof this system’s behavior was specified, verified, and bound to the audit record?”
UnitedHealth · 2024–26
F6 · LEAD · Audit
The deadline moved to 2027. Your build cycle is still longer than that.
Colorado and the EU pushed their AI mandates to 2027; SR 26-2 handed AI governance straight to you. The committee still meets monthly. Time is not on your side.
CO Jan 2027 · EU 2027 · SR 26-2 now
Your auditor asks: “In court, in front of a regulator, can you prove this system did what it should, and only that?”
EU · CO · SR 26-2