How well we predict. Published, misses included.
Most planning forecasts are never scored. We score ours in the open. This page shows how closely our evidence engine's approval predictions match what councils actually decide and, just as honestly, where the engine is only modestly better than the base rate. The number we put against a single site is never sold as a verdict; only this aggregate record is published.
Two numbers, both true. The honest pair.
A prediction can be well-calibrated (when it says 60%, about 60% happen) yet only modestly discriminating (it struggles to separate the eventual winners from the losers on a single site). Ours is exactly that, and we say so.
In plain terms: anyone who hands you a confident, site-specific approval percentage is overclaiming. The information available before you submit doesn't support it, and here's the evidence, including our own ceiling. What the engine is good for is the gradient and the structure of risk, not a single decimal on one plot.
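To make the calibrated-versus-discriminating distinction concrete, here is a minimal sketch on synthetic data. It is not the engine and uses none of its data. The assumption is a toy forecaster whose predictions all fall between 50% and 70%, and whose outcomes happen at exactly the stated rate. The helper `auc` is a hypothetical name for a standard rank-based AUC calculation.

```python
import random

def auc(scores, outcomes):
    """Chance that a random approved case outscores a random refused one."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Group tied scores so they share one average rank.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, outcomes) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Synthetic forecaster (assumption): every prediction sits in a narrow band,
# and each outcome occurs with exactly the predicted probability.
rng = random.Random(0)
preds = [rng.uniform(0.5, 0.7) for _ in range(20000)]
outcomes = [1 if rng.random() < p else 0 for p in preds]

mean_pred = sum(preds) / len(preds)
approval_rate = sum(outcomes) / len(outcomes)
model_auc = auc(preds, outcomes)

print(f"mean predicted {mean_pred:.3f}, observed rate {approval_rate:.3f}")
print(f"AUC {model_auc:.3f}")
```

In this toy setup the mean prediction and the observed approval rate agree closely, while the AUC stays only a little above 0.5. The forecasts are honest on aggregate and still weak at ranking one site against another.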
The calibration curve
Each dot is a bin of decided applications from the hold-out set; dot size is the number of applications in the bin. The closer the dots sit to the dashed diagonal, the better the calibration. Faded dots are thin-sample bins, shown rather than hidden. Source: time-based hold-out, trained on 9,535 decisions before 1 Jul 2025, tested on 2,871 after.
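The binning behind a curve like this can be sketched as follows. This is an illustration under stated assumptions, not the code that produced the chart: the function name `calibration_bins`, the ten equal-width bins, and the `thin_below=30` cut-off for a "thin-sample" bin are all hypothetical choices.

```python
def calibration_bins(preds, outcomes, n_bins=10, thin_below=30):
    """Group predictions into equal-width bins and compare them with outcomes."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(preds, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if not members:
            continue  # nothing to plot for an empty bin
        n = len(members)
        rows.append({
            "mean_predicted": sum(p for p, _ in members) / n,  # x position
            "observed_rate": sum(y for _, y in members) / n,   # y position
            "count": n,                                        # dot size
            "thin": n < thin_below,                            # faded dot
        })
    return rows

# Made-up example: 40 cases predicted at 62% (24 approved),
# and 5 cases predicted at 85% (4 approved).
preds = [0.62] * 40 + [0.85] * 5
outcomes = [1] * 24 + [0] * 16 + [1] * 4 + [0] * 1
rows = calibration_bins(preds, outcomes)
for row in rows:
    print(row)
```

A well-calibrated bin has `observed_rate` close to `mean_predicted`. The `thin` flag marks bins that are kept on the chart but should be read with caution.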
The prospective ledger: committed before the council decides
A retrospective curve only proves we fit the past. The real test is forward. So when the ledger opened we committed the engine's prediction for every currently undetermined small-site application in the dataset, and timestamped that file so it cannot be quietly rewritten once the decisions land. As councils determine them, each call is scored against the outcome, wins and losses both.
Prospective calibration appears here once 30 predictions have resolved. Determinations typically take 60+ weeks, so the forward record opens slowly and on purpose.
How the ledger resists tampering (locally, no blockchain)
Each committed file is hashed. Each ledger record contains those file hashes plus the hash of the record before it, and is signed with our Ed25519 key. Any silent edit, deletion, or reordering of a past prediction breaks the hash link in every record after it, and the signature proves authorship. The public key below lets anyone verify it independently. We pin the chain to an outside clock by publishing the head hash and committing the chain to version control.
- Public key: a1d803cd2df8b4ad3836e4a15696b7e61a49378d086101f3728ca52a3be903d2
- Chain head: 1c8462a4b3ad3049b9916e7debd70e422d7ba8a5e7165505a115fe44986ea307
- Records: 1
Verify the chain:
python3 _scripts/local_attest.py verify
Honest about the method: a local timestamp is self-asserted, so we could in principle re-sign the whole chain against a false clock. The signed chain stops silent tampering; pinning the head to version control and publishing it here is what stops back-dating. It is deliberately simpler and more inspectable than a blockchain anchor, and good enough for what it claims.
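The hash-chaining idea described in this section can be sketched with the Python standard library alone. This is an illustration, not the contents of `_scripts/local_attest.py`: the record layout, the `GENESIS` placeholder, and the function names are hypothetical, and the Ed25519 signing step is left out because the standard library has no Ed25519 support.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical stand-in for "no previous record"

def record_hash(body):
    """Hash a record body in a canonical JSON form so the digest is stable."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def append_record(chain, file_bytes_list):
    """Add a record that commits to the files and to the previous record."""
    body = {
        "index": len(chain),
        "file_hashes": [hashlib.sha256(b).hexdigest() for b in file_bytes_list],
        "prev_hash": chain[-1]["hash"] if chain else GENESIS,
    }
    chain.append({**body, "hash": record_hash(body)})

def verify_chain(chain):
    """Recompute every hash and check each link back to the start."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        body = {k: rec[k] for k in ("index", "file_hashes", "prev_hash")}
        if rec["index"] != i or rec["prev_hash"] != prev:
            return False
        if rec["hash"] != record_hash(body):
            return False
        prev = rec["hash"]
    return True

chain = []
append_record(chain, [b"predictions-batch-1"])
append_record(chain, [b"predictions-batch-2"])
ok_before = verify_chain(chain)

# Silently swap a committed file hash in the first record.
chain[0]["file_hashes"][0] = hashlib.sha256(b"edited").hexdigest()
ok_after = verify_chain(chain)
print(ok_before, ok_after)
```

Someone who edits an early record and recomputes its hash still fails verification, because the next record's `prev_hash` no longer matches. Rebuilding every later record would pass this unsigned check, which is the gap the signature and the externally pinned head hash are there to close.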
What this record can and cannot tell you
- It is retrospective until the prospective ledger resolves. Today's calibration comes from a hold-out on past decisions. The forward record is committed but mostly still pending, so treat it as newly opened, not yet proven.
- Base rates are conditioned on submission. The dataset sees applications that were lodged and determined. The sites a developer rejected in due diligence, or never tested because they were hopeless, are invisible, so a real-world "chance of approval across everything you might buy" is lower than any figure here.
- The per-site probability is internal. We publish the aggregate calibration as an honesty check; we do not sell a single site's percentage as a verdict. The product value is the structure of risk: which boroughs, types and design positions are hard, and why, not a decimal.
- Planning is non-stationary. A new London Plan, a change of administration, or a policy reform can shift the patterns the model learned. The record is date-stamped for exactly this reason: when calibration drifts, that is the signal of a regime change, not a number to keep trusting.
Method: deterministic logistic model over pre-submission features (borough, site type, PTAL band, conservation status, scale), with a leak-safe design-proximity step; time-based hold-out evaluation. Small sites, ≤9 units. Descriptive planning intelligence, not regulated advice.
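A time-based hold-out of the kind named in the method line can be sketched as below. The record fields (`ref`, `decided_on`, `approved`) and the toy rows are made up for illustration. Placing a decision dated exactly on the cut-off into the test set is also an assumption, since the page only says "before" and "after" 1 Jul 2025.

```python
from datetime import date

CUTOFF = date(2025, 7, 1)  # cut-off date stated on this page

def time_split(decisions, cutoff=CUTOFF):
    """Train on decisions before the cut-off, test on those from it onward."""
    train = [d for d in decisions if d["decided_on"] < cutoff]
    test = [d for d in decisions if d["decided_on"] >= cutoff]
    return train, test

# Made-up rows, only to show the shape of the split.
decisions = [
    {"ref": "A", "decided_on": date(2024, 11, 3), "approved": 1},
    {"ref": "B", "decided_on": date(2025, 6, 30), "approved": 0},
    {"ref": "C", "decided_on": date(2025, 7, 1), "approved": 1},
    {"ref": "D", "decided_on": date(2025, 9, 12), "approved": 0},
]
train, test = time_split(decisions)
print([d["ref"] for d in train], [d["ref"] for d in test])
```

Splitting by date rather than at random keeps later decisions out of training, so the test set behaves like a forecast of the future, not a reshuffle of the past.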