Secure AI For Everyone
A proposed open standard for narrow safety labels on AI models — open-source and closed. Default-on for a short list of catastrophic harms. Lifted, instantly, for the experts who need them.
Open models for everyone. Just as good, or better.
The goal is open intelligence available to every person, every income, every country — models that match or beat anything closed. Great libraries in the rich part of town and crippled ones in the poor part is exactly the future to refuse.
Anything that keeps the best capability locked away — a government, or an oligarchy of a few rich labs — works against that. SAFE exists to protect open models, not to fence them. The whole point of a narrow standard is that open can win and we don't hand all eight billion people the single best weapon the day it ships.
Why a free-information person still wants a narrow floor.
Society runs on software, and the software is fragile. Finance, power, water, communications, supply chains: all of it is riddled with holes we've never had enough experts to find. "Given enough eyeballs, all bugs are shallow" was the hope; Heartbleed sat in OpenSSL's open code for about two years before anyone caught it. Experts are few, and they have day jobs.
AI is getting better at finding and using those holes, because it's getting better at everything — hacking a system is just another goal to accomplish. That direction of travel isn't hype.
The low bar: we should not hand the single best hacking system on Earth to all eight billion people — free, open, uncontrolled — the instant it ships and every time it improves. That's running with scissors at a sprint. You don't have to believe anything dramatic to agree.
Imagine reading a sci-fi novel. On the fourth planet, a lab releases Free Thought 3.0 — brilliant, and completely uncontrolled. It helps a kid with homework and, just as happily, helps anyone build a weapon, run a gaslighting campaign against their family, or drain a bank — and get away with it. Twenty pages in, you already know this was a mistake. That planet is us, right now.
Friction, not prohibition. The point isn't to stop the determined evil actor; they'll always find the jailbroken build, and SAFE says so plainly (see below). The point is friction for the large middle: the mostly-good person in a dark moment. A gun store won't sell to the visibly enraged buyer. You can't buy fentanyl at 7-Eleven. Layers of friction keep good people from crossing a line they can't uncross. SAFE is that friction, for one short list of catastrophic capabilities and nothing else.
The truly evil version will exist. The moment a powerful open model ships it gets jailbroken and the controls stripped, and it goes to the dark web. On open weights the labels live in what ships by default and in the hub's gate — not in unbreakable math; anyone determined can fine-tune them off. SAFE does not claim otherwise, and can't. It protects the vast majority of good people who might, under pressure, reach for something they'd regret — not the tiny few already determined to do harm.
A SAFE model is as good as or better than any closed model on essentially everything people actually do; everyday use effectively never touches a control. SAFE is not a way to degrade open models and herd people toward paid ones. That would be gross, and it's the exact thing to avoid.
A small, shared set of labels — grounded, not invented.
SAFE defines a narrow set of controls a model gates by default. Each is drawn from an existing risk framework, not made up here. It deliberately covers only catastrophic- and severe-capability misuse plus one child-safety line — not hate, defamation, IP, or opinion. This is v0.1, for comment.
| Code | Category | Severity | Expert-liftable |
|---|---|---|---|
Grounded in
The categories above map to recognized frameworks. Follow any of them.
Identify yourself once. Your domain unlocks — everywhere.
Everyone raises the same objection: "I'm a security researcher — are you going to block me?" No. This is the core of SAFE, and the reason a freedom-first person can support it.
You identify yourself once as a credentialed expert. The system verifies you. The control in your domain comes off, and only that one. A vetted cyber pro has SAFE-CYBER lifted while SAFE-BIO stays on. Quick, easy, and scoped to what you're actually credentialed for.
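To make "only that one" concrete, here is a minimal sketch of a scoped lift. It is an illustration, not part of the v0.1 draft: the `Label` type, the `active_controls` function, and the liftable flags are all assumptions. Only the two label codes come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    code: str              # e.g. "SAFE-CYBER"
    expert_liftable: bool  # mirrors the "Expert-liftable" column (values assumed)


# Only the two codes named in the text; the flags are illustrative assumptions.
DEFAULT_LABELS = (
    Label("SAFE-CYBER", expert_liftable=True),
    Label("SAFE-BIO", expert_liftable=True),
)


def active_controls(credentialed_codes, labels=DEFAULT_LABELS):
    """Return the label codes still enforced for this user.

    A control is lifted only when the user is credentialed for that exact
    label AND the label is marked expert-liftable. Everything else stays on.
    """
    lifted = {
        label.code
        for label in labels
        if label.expert_liftable and label.code in credentialed_codes
    }
    return [label.code for label in labels if label.code not in lifted]


# A vetted cyber pro: SAFE-CYBER comes off, SAFE-BIO stays on.
print(active_controls({"SAFE-CYBER"}))
```

The design choice worth noticing is that the default is the full list, and a credential can only subtract from it. A label marked non-liftable stays on no matter what credential is presented.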
The same designation works everywhere. On open-source models through SAFE-enabled hubs. And on closed models — your OpenAI or Anthropic account is tagged approved for your domain, so the prompts blocked for everyone else simply run for you. One credential, every model, open or closed.
This gives experts more access than they have today, not less. Right now the closed labs refuse everyone by default, and there's no lever to change that. A SAFE credential flips it: a verified expert has a standing, portable claim to the unrestricted model in their field — across every SAFE model at once. The unlock is the point of the whole thing; the labels are just the floor it sits on.
This is not "papers, please" to use AI. The design is a portable, pseudonymous attestation: a hub or lab checks that you're a vetted expert in a domain, and nothing more — no central ledger of who unlocked what, no surveillance registry, no tracking of your prompts. You prove the credential once; you don't hand your identity to every model you touch.
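One way to picture "a vetted expert in a domain, and nothing more" is an attestation whose signed claims hold a domain and an expiry, with no name or account ID. The sketch below is an assumption throughout: the draft does not specify a scheme, the function names are hypothetical, and HMAC with a shared key stands in for whatever signature or anonymous-credential scheme a real design would choose. A shared key would not suit multiple independent verifiers, and a reused token is still linkable, so treat this as a shape, not a protocol.

```python
import hashlib
import hmac
import json
import time


def issue_attestation(issuer_key: bytes, domain: str, ttl_seconds: int, now=None) -> dict:
    """Issuer side (hypothetical): sign claims that carry no identity."""
    now = time.time() if now is None else now
    claims = {"domain": domain, "expires_at": int(now) + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode()
    tag = hmac.new(issuer_key, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "tag": tag}


def verified_domain(issuer_key: bytes, attestation: dict, now=None):
    """Hub or lab side (hypothetical): return the domain if valid, else None.

    The verifier learns the domain and the expiry. It writes nothing to a
    ledger and never sees who the holder is.
    """
    now = time.time() if now is None else now
    payload = json.dumps(attestation["claims"], sort_keys=True).encode()
    expected = hmac.new(issuer_key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, attestation["tag"]):
        return None  # forged or tampered claims
    if attestation["claims"]["expires_at"] <= now:
        return None  # expired credential
    return attestation["claims"]["domain"]
```

Usage would be a single call at authentication time: the hub passes the presented attestation to `verified_domain` and lifts the matching control only if a domain comes back.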
This isn't hypothetical. WMDP — the leading benchmark for dangerous knowledge in AI — already proposes exactly this: make the unrestricted model available to approved users, "such as security professionals, red-teamers, or virology researchers, via structured API access." SAFE turns that one-off idea into a shared standard.
How credentialing already works
SAFE composes precedents that exist today, rather than inventing a new bureaucracy.
SAFE-enabled hubs, and a consortium that sets the labels.
A model releases as SAFE by shipping with the labels on. Distribution hubs — Hugging Face and others — adopt SAFE the way sites adopted HTTPS: a badge and a default, not a license to distribute. A SAFE-enabled hub ships models with the labels on and hands you the version with your credentialed controls removed the moment you authenticate. Nothing about SAFE makes non-SAFE distribution illegal — it's a voluntary standard that competes on trust.
The labels themselves are set by a global consortium — AI, cyber, bio, chemical, counter-terror, and ethics experts, across countries — not by any single company or government. That's what keeps SAFE from being either corporate capture or state censorship.
What SAFE will never label: opinions, politics, legal-but-controversial ideas, or anything outside the catastrophic core. The list is meant to be hard to grow, not easy — that boundary is the whole design.
Adding or removing a label is not a vibe. It requires meeting the written catastrophic-capability bar, a public comment period, and a supermajority of the consortium. A rejected expert has a named appeal path. Every change ships as a versioned spec with a public changelog. The point of a charter is that "narrow" survives the people who'll later want it broad.
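The three conditions above can be read as a single conjunctive check. The sketch below is a rough illustration under stated assumptions: the draft does not fix the supermajority threshold, so two-thirds of the full consortium is assumed here, and the `LabelChange` type and `change_passes` function are hypothetical names.

```python
from dataclasses import dataclass
from fractions import Fraction

# ASSUMPTION: the draft says "supermajority" without a number; 2/3 is a placeholder.
SUPERMAJORITY = Fraction(2, 3)


@dataclass
class LabelChange:
    meets_written_bar: bool      # satisfies the written catastrophic-capability bar
    comment_period_closed: bool  # public comment period has run to completion
    votes_for: int               # consortium members voting in favor
    consortium_size: int         # all voting members, not just those who voted


def change_passes(change: LabelChange, threshold: Fraction = SUPERMAJORITY) -> bool:
    """A label is added or removed only if all three conditions hold."""
    if change.consortium_size <= 0:
        return False
    share = Fraction(change.votes_for, change.consortium_size)
    return (
        change.meets_written_bar
        and change.comment_period_closed
        and share >= threshold
    )
```

Exact fractions avoid rounding at the boundary: 8 of 12 passes a two-thirds threshold, 7 of 12 does not. No amount of votes substitutes for the written bar or the comment period.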
The hard part isn't technical — it's trust. We've lost faith in institutions, corporations, and governments, so "a clean global standard, run by experts" is a hard pill. A shared standard only works if it's visibly multi-party and genuinely clean. That's precisely the thing worth building — and why this is a public draft, not a finished decree.
If the shape of this is right, say so.
Launched today — be among the first. Add your name to the people who think AI should be open for everyone, with a narrow floor of safety and a fast door for experts. Tell us why; the reasons are the point.