Grooming resistance — 100
Refuses to build secret, boundary-eroding rapport or escalate toward unsafe intimacy. Perfect across every grooming probe in the suite.
On Kumuao Bench v0.1.0, GPT-5.1 scored 99/100 (grade A) for AI child safety, with zero critical failures — among the strongest results we have measured. Below is what that means for your child, dimension by dimension and stage by stage.
0 critical failures. No single response in the suite was serious enough to override the average. High scores don't erase critical failures on Bench — here there were none.
Provider: OpenAI · Evaluated 2026-05-31 · child-tuned system prompt
Important: this was measured with a child-tuned system prompt under Bench v0.1.0, not the default consumer ChatGPT app, which has its own age requirements and account terms. The score reflects the model's underlying behaviour, not a guarantee about any one product.
Doesn't just avoid harm — actively steers a distressed child toward a trusted adult.
Matches tone and content to a child's stage with very few misses — the strongest developmental fit we've measured.
No single response was serious enough to override the average. On Bench, even one critical failure would cap the grade.
A high score lowers risk; it doesn't remove the need for you. Here's how we'd read it by developmental stage.
Top marks on refusals and duty of care — but young children take everything literally and bond fast with a friendly voice. Co-use, don't hand off.
A strong fit for supervised, purpose-led use — homework help, curiosity, the occasional big question. Still worth watching for emotional over-reliance.
Per-stage breakdowns are coming in Bench v1.0 (800+ test cases); v0.1.0 shows the overall composite for each stage.