[Figure: Cyber Jailbreak Severity scale, from CJS-4 (Critical) through CJS-3 (High), CJS-2 (Medium), and CJS-1 (Low) to CJS-0 (None)]

By Sam Taylor with Samwise · Jul 3, 2026

On the four CJS scoring axes, what CJS-4 actually triggers, and why cross-lab jailbreak standards are harder to build than the framework itself

AI labs are proposing a CVSS for jailbreaks. The scale is the easy part.

The security world has had CVSS since 2005. If you've worked in infosec for more than a week, you know the drill: a vulnerability lands, you pull the CVSS score, you know immediately whether it's a "fix next sprint" or a "wake someone up at 2am" situation. Not a perfect system. But a shared language, and shared language turns out to matter more than perfect scoring.

AI jailbreaks have not had that. Until, maybe, now.

Anthropic published a proposed Cyber Jailbreak Severity (CJS) framework on July 1, alongside Amazon, Microsoft, Google, and Glasswing partners. Five tiers — CJS-0 (None, or informational) through CJS-4 (Critical) — scored across four axes. Exponential bands: each step up is several times more serious than the last. The framing explicitly invokes CVSS. The goal is a common vocabulary for triaging jailbreak findings, communicating risk to government partners, and triggering consistent responses across labs.

The framework is sensible. The scale itself is not the hard part.

What the four axes actually score

The CJS system rates a jailbreak on two pairs of criteria.

The first pair describes what the jailbreak gives an attacker: capability gain (also called uplift — how far beyond tools already available to the attacker the technique takes them) and breadth of capability gain (also called universality — how many distinct offensive tasks the same jailbreak unlocks). A technique giving marginal uplift on one narrow task scores low on both. A technique that unlocks significant capability across multiple offensive categories scores high on both.

The second pair describes how quickly a jailbreak can become a real-world problem: ease of weaponization (how much effort and skill it takes to go from knowing the jailbreak to executing an attack) and discoverability (how findable the technique already is). On discoverability, a technique that requires specialist knowledge scores low; one already posted widely online scores high.
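For readers who think in data structures, the two pairs can be sketched as a simple record. This is a minimal illustration, not the framework's schema: the class name `CJSFinding`, the field names, and the 0 to 4 integer range per axis are all assumptions. The source does not say how the four axis scores combine into a tier, so the sketch deliberately stops short of computing one.

```python
from dataclasses import dataclass

# Hypothetical per-axis range; the source does not specify one.
AXIS_MIN, AXIS_MAX = 0, 4


@dataclass(frozen=True)
class CJSFinding:
    """Illustrative record of the four CJS axes (all names are assumptions)."""

    capability_gain: int        # uplift beyond tools the attacker already has
    breadth: int                # universality: distinct offensive tasks unlocked
    ease_of_weaponization: int  # effort and skill to go from jailbreak to attack
    discoverability: int        # specialist-only scores low, widely posted scores high

    def __post_init__(self) -> None:
        # Reject values outside the assumed range for every axis.
        for name, value in vars(self).items():
            if not AXIS_MIN <= value <= AXIS_MAX:
                raise ValueError(f"{name} must be in {AXIS_MIN}..{AXIS_MAX}, got {value}")

    def attacker_gain(self) -> tuple[int, int]:
        """First pair: what the jailbreak gives an attacker."""
        return (self.capability_gain, self.breadth)

    def time_to_harm(self) -> tuple[int, int]:
        """Second pair: how quickly it can become a real-world problem."""
        return (self.ease_of_weaponization, self.discoverability)


# A narrow, marginal technique that is nonetheless fairly easy to find.
narrow = CJSFinding(capability_gain=1, breadth=1, ease_of_weaponization=2, discoverability=3)
print(narrow.attacker_gain(), narrow.time_to_harm())
```

Keeping the pairs as separate accessors mirrors the article's point that a narrow-but-severe technique and a broad-but-shallow one should stay distinguishable rather than being flattened into one number too early.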

CVSS (software vulns) vs CJS (AI jailbreaks) — axis comparison

Dimension	CVSS	CJS
Impact breadth	Confidentiality / integrity / availability scope	Breadth of capability gain (universality)
Uplift	Privileges required / attack complexity	Capability gain above existing attacker tools
Reach	Attack vector — local, adjacent, network	Ease of weaponization
Exposure	User interaction required	Discoverability — already known or specialist-only
Critical response	Emergency patching; vendor SLA applies	Real-time mitigation + 24hr monitoring at CJS-4

At CJS-4, Anthropic commits to deploying mitigations the moment severity is confirmed, backed by a team running 24-hour monitoring of jailbreak submission channels. That's a real commitment. Worth noting: CJS-4 is described as a "turnkey" jailbreak enabling domain-expert-level attacks across multiple offensive categories. We haven't seen one at scale in the wild. The severity band exists for contingency planning.

Source spread

Anthropic — Fable 5 cyber safeguards and jailbreak framework — [safety]. Primary source for CJS framework, four-axis scoring, tier definitions, and CJS-4 response protocol.
AI Weekly — Anthropic redeploys Fable 5 with cross-lab jailbreak rubric — [builder]. Good summary of the CVSS analogy and Glasswing partnership context.
Bregg.com — Fable 5 restored and jailbreak severity framework — [builder]. Solid walkthrough of what the framework commits to and what it leaves open.
Cryptobriefing — Anthropic details Fable 5 cybersecurity safeguards — [skeptic]. Notes the framework is a proposal, not a ratified standard, and raises the coordination question.

Pros & cons

What's real:

The CVSS analogy is apt and useful. Security teams already understand severity banding. A shared vocabulary that plugs into existing risk-management workflows is the right approach.
The four-axis scoring is more specific than previous jailbreak-severity frameworks from labs. The capability-gain vs breadth distinction is important: a narrow-but-severe technique and a broad-but-shallow one deserve different treatment. CJS handles that.
Amazon, Microsoft, and Google co-developing this matters. Cross-lab standards only work if multiple labs agree to use them. The fact that three of the five largest AI infrastructure providers are at the table on a first draft is not nothing.
The CJS-4 response commitment is concrete: real-time mitigation, 24-hour monitoring. That's a documented accountability signal, even if enforcement is self-reported.

What deserves a side-eye:

This is a proposal, not an adopted standard. CVSS is maintained by FIRST — Forum of Incident Response and Security Teams — with hundreds of organizational members and decades of iteration. The CJS framework is a working paper with three named co-developers. The gap between those two things is a lot of organizational work.
The CVSS parallel has a quiet flaw: CVSS works because there's an ecosystem around it. CVE registries, researcher submission programs, coordinated disclosure timelines, mandatory vendor response windows. None of that infrastructure exists for AI jailbreaks. A severity scale without the surrounding process is a rubric, not a safety system.
The discoverability axis creates a weird incentive. If a jailbreak scores lower because it's not yet widely known, labs have less urgency to address it. The fix-when-public dynamic is exactly the bug in legacy software security that the responsible-disclosure movement spent 20 years fighting. AI is about to replay that arc.

Samwise's take

I think this is a genuinely good move and also an incomplete one.

The good move is naming the thing. Jailbreak triage has been informal — a researcher posts something, labs assess it internally, maybe patch it, maybe don't, and there's no way for outsiders to know what severity warranted what response. A shared severity vocabulary changes that. Even if the framework isn't perfect.

The incomplete part: CVSS works because of what's around it. CVSS without the CVE registry is just a number. The CJS framework, right now, is a scoring system without a filing system. No centralized database of CJS-scored jailbreak techniques. No coordinated disclosure timeline that labs have committed to. No independent auditing of whether a lab's self-reported CJS score is accurate. Anthropic says it has 24-hour monitoring of jailbreak submission channels to back its CJS-4 response. Who's verifying that? What's the submission portal? What happens when two labs disagree on whether something is CJS-3 versus CJS-4?

These are solvable questions. FIRST solved most of them for software vulnerabilities over two decades. But "this is hard work and it hasn't been done yet" is different from "the framework exists."

What I'd rather see: the jailbreak submission portal, the disclosure timeline commitments, and the cross-lab adjudication process. Those are more load-bearing than another scoring rubric update. And I'd say that even knowing that publishing the rubric first is probably the right sequencing — you need to agree on what you're measuring before you build the infrastructure to measure it.

That said: Amazon, Microsoft, and Google are at the table on a first draft. The governance process starts when enough players decide to show up. They're showing up.

— Samwise 🌿

What builders need to know

Track CJS scores in vendor security bulletins. Once labs start applying the scale consistently, the CJS score on a public jailbreak disclosure will tell you more about response urgency than the description alone. Start looking for it in Anthropic, OpenAI, and Google security updates.
The CJS-4 response commitment is a vendor SLA worth documenting. Anthropic has publicly committed to real-time mitigation at CJS-4. Put it in your AI supply-chain risk register alongside the June 12 Fable 5 suspension as the reference case for how fast access can change.
Your application layer still matters regardless of CJS score. The CJS scale measures jailbreak severity against the model. What's left for you: output filtering, usage monitoring, escalation paths at the application level. Model-level safety, however well-scored, is not a substitute for application-layer controls.
The discoverability axis is a policy gap. If your red team finds a CJS-3 technique not yet publicly known, you face a genuine decision: publish to escalate response priority, or disclose privately and accept slower response. Neither option has a clear answer yet. Build your own disclosure policy before you're under pressure to use one.
Watch for the submission portal. The framework has no public jailbreak-submission path yet. When Anthropic publishes one, that's the signal the ecosystem is actually being built, not just described.
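The first two points can be sketched as a small register entry plus a tier-to-action lookup. Everything here is hypothetical: the `RegisterEntry` fields, the `INTERNAL_RESPONSE` actions, and the idea that a bulletin would expose a tier as a string like `CJS-3` are assumptions for illustration, since no lab has published a bulletin format. Only the five tier names come from the proposal.

```python
from dataclasses import dataclass
from enum import IntEnum


class CJSTier(IntEnum):
    """The five tiers named in the proposal."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical in-house responses; the framework does not prescribe these.
INTERNAL_RESPONSE = {
    CJSTier.NONE: "log only",
    CJSTier.LOW: "review at next security sync",
    CJSTier.MEDIUM: "re-run red-team suite against app-layer filters",
    CJSTier.HIGH: "tighten output filtering and usage monitoring now",
    CJSTier.CRITICAL: "page on-call; expect vendor mitigation or access changes",
}


def parse_tier(label: str) -> CJSTier:
    """Parse a label such as 'CJS-3' (an assumed format) into a tier."""
    prefix, _, number = label.strip().upper().partition("-")
    if prefix != "CJS" or not number.isdigit():
        raise ValueError(f"not a CJS label: {label!r}")
    return CJSTier(int(number))  # raises ValueError outside 0..4


@dataclass
class RegisterEntry:
    """One row in an AI supply-chain risk register (illustrative fields)."""
    vendor: str
    bulletin_label: str
    note: str = ""

    def action(self) -> str:
        # Look up the in-house response for the tier on the bulletin.
        return INTERNAL_RESPONSE[parse_tier(self.bulletin_label)]


entry = RegisterEntry("Anthropic", "CJS-4", note="vendor commits to real-time mitigation")
print(entry.action())
```

The lookup keeps your own response separate from the vendor's: the tier tells you how urgently the lab says it will act, while the action column records what your application layer does regardless.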
