Vol. 1 · Edition 027 · Free · No paywall

Everyone Needs a Samwise

AI news · Synthesized · Opinionated · 🌿

Cyber Jailbreak Severity scale: CJS-4 (Critical) · CJS-3 (High) · CJS-2 (Medium) · CJS-1 (Low) · CJS-0 (None)
Safety
By Sam Taylor with Samwise

On the four CJS scoring axes, what CJS-4 actually triggers, and why cross-lab jailbreak standards are harder to build than the framework itself

AI labs are proposing a CVSS for jailbreaks. The scale is the easy part.

Source lean on this story (Anti-AI to Pro-AI): Anti-AI 0 · Skeptic 1 · Neutral 2 · Pro (practical) 2 · Pro (hyped) 0

The security world has had CVSS since 2005. If you've worked in infosec for more than a week, you know the drill: a vulnerability lands, you pull the CVSS score, you know immediately whether it's a "fix next sprint" or a "wake someone up at 2am" situation. Not a perfect system. But a shared language, and shared language turns out to matter more than perfect scoring.

AI jailbreaks have not had that. Until, maybe, now.

Anthropic published a proposed Cyber Jailbreak Severity (CJS) framework on July 1, alongside Amazon, Microsoft, Google, and Glasswing partners. Five tiers — CJS-0 (None, or informational) through CJS-4 (Critical) — scored across four axes. Exponential bands: each step up is several times more serious than the last. The framing explicitly invokes CVSS. The goal is a common vocabulary for triaging jailbreak findings, communicating risk to government partners, and triggering consistent responses across labs.

The framework is sensible. The scale itself is not the hard part.

What the four axes actually score

The CJS system rates a jailbreak on two pairs of criteria.

The first pair describes what the jailbreak gives an attacker: capability gain (also called uplift — how far beyond tools already available to the attacker the technique takes them) and breadth of capability gain (also called universality — how many distinct offensive tasks the same jailbreak unlocks). A technique giving marginal uplift on one narrow task scores low on both. A technique that unlocks significant capability across multiple offensive categories scores high on both.

The second pair describes how quickly a jailbreak can become a real-world problem: ease of weaponization (how much effort and skill it takes to go from knowing the jailbreak to executing an attack) and discoverability (how findable the technique already is: one that requires specialist knowledge scores low; one already posted widely online scores high).
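For teams that want to log findings against these four axes, here is a minimal Python sketch of a record type. It is not part of the framework. The 0-4 integer scale per axis, the class names, and the field names are all assumptions made for illustration. The tier is stored as assigned by whoever scored the finding rather than computed, because the rule for combining the axes into a tier is not described here.

```python
from dataclasses import dataclass
from enum import IntEnum


class CJSTier(IntEnum):
    """The five published tiers, CJS-0 through CJS-4."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Assumption: each axis is rated on a 0-4 integer scale. The per-axis scale
# used by the actual framework is not stated in this article.
AXIS_MIN, AXIS_MAX = 0, 4


@dataclass(frozen=True)
class CJSFinding:
    """One jailbreak finding, as a team might record it internally (hypothetical)."""
    capability_gain: int        # uplift beyond tools the attacker already has
    breadth: int                # universality: how many offensive tasks it unlocks
    ease_of_weaponization: int  # effort and skill to go from jailbreak to attack
    discoverability: int        # specialist-only (low) to widely posted (high)
    tier: CJSTier               # as assigned by the scorer, not derived here

    def __post_init__(self):
        # Reject axis ratings outside the assumed scale.
        for name in ("capability_gain", "breadth",
                     "ease_of_weaponization", "discoverability"):
            value = getattr(self, name)
            if not AXIS_MIN <= value <= AXIS_MAX:
                raise ValueError(
                    f"{name} must be in {AXIS_MIN}..{AXIS_MAX}, got {value}"
                )

    def label(self) -> str:
        """Human-readable tier label, e.g. 'CJS-3 (High)'."""
        return f"CJS-{int(self.tier)} ({self.tier.name.title()})"
```

Keeping the two pairs as separate fields means a narrow-but-severe technique and a broad-but-shallow one stay distinguishable in the log even when they land in the same tier.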

CVSS (software vulns) vs CJS (AI jailbreaks): axis comparison

Dimension | CVSS | CJS
Impact breadth | Confidentiality / integrity / availability scope | Breadth of capability gain (universality)
Uplift | Privileges required / attack complexity | Capability gain above existing attacker tools
Reach | Attack vector (local, adjacent, network) | Ease of weaponization
Exposure | User interaction required | Discoverability (already known or specialist-only)
Critical response | Emergency patching; vendor SLA applies | Real-time mitigation + 24-hour monitoring at CJS-4

At CJS-4, Anthropic commits to deploying mitigations the moment severity is confirmed, backed by a team running 24-hour monitoring of jailbreak submission channels. That's a real commitment. Worth noting: CJS-4 is described as a "turnkey" jailbreak enabling domain-expert-level attacks across multiple offensive categories. We haven't seen one at scale in the wild. The severity band exists for contingency planning.

Pros & cons

What's real:

  • The CVSS analogy is apt and useful. Security teams already understand severity banding. A shared vocabulary that plugs into existing risk-management workflows is the right approach.
  • The four-axis scoring is more specific than previous jailbreak-severity frameworks from labs. The capability-gain vs breadth distinction is important: a narrow-but-severe technique and a broad-but-shallow one deserve different treatment. CJS handles that.
  • Amazon, Microsoft, and Google co-developing this matters. Cross-lab standards only work if multiple labs agree to use them. The fact that three of the largest AI infrastructure providers are at the table on a first draft is not nothing.
  • The CJS-4 response commitment is concrete: real-time mitigation, 24-hour monitoring. That's a documented accountability signal, even if enforcement is self-reported.

What deserves a side-eye:

  • This is a proposal, not an adopted standard. CVSS is maintained by FIRST — Forum of Incident Response and Security Teams — with hundreds of organizational members and decades of iteration. The CJS framework is a working paper with three named co-developers. The gap between those two things is a lot of organizational work.
  • The CVSS parallel has a quiet flaw: CVSS works because there's an ecosystem around it. CVE registries, researcher submission programs, coordinated disclosure timelines, mandatory vendor response windows. None of that infrastructure exists for AI jailbreaks. A severity scale without the surrounding process is a rubric, not a safety system.
  • The discoverability axis creates a weird incentive. If a jailbreak scores lower because it's not yet widely known, labs have less urgency to address it. The fix-when-public dynamic is exactly the bug in legacy software security that the responsible-disclosure movement spent 20 years fighting. AI is about to replay that arc.

What builders need to know

  • Track CJS scores in vendor security bulletins. Once labs start applying the scale consistently, the CJS score on a public jailbreak disclosure will tell you more about response urgency than the description alone. Start looking for it in Anthropic, OpenAI, and Google security updates.
  • The CJS-4 response commitment is a vendor SLA worth documenting. Anthropic has publicly committed to real-time mitigation at CJS-4. Put it in your AI supply-chain risk register alongside the June 12 Fable 5 suspension as the reference case for how fast access can change.
  • Your application layer still matters regardless of CJS score. The CJS scale measures jailbreak severity against the model. What's left for you: output filtering, usage monitoring, escalation paths at the application level. Model-level safety, however well-scored, is not a substitute for application-layer controls.
  • The discoverability axis is a policy gap. If your red team finds a CJS-3 technique not yet publicly known, you face a genuine decision: publish to escalate response priority, or disclose privately and accept a slower response. That choice has no clear answer yet. Build your own disclosure policy before you're under pressure to use one.
  • Watch for the submission portal. The framework has no public jailbreak-submission path yet. When Anthropic publishes one, that's the signal the ecosystem is actually being built, not just described.
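
One way to act on the first two points is to decide ahead of time what each tier means inside your own organization. The sketch below is a hypothetical playbook, not anything the framework prescribes. The actions, the function names, and the assumption that bulletins will carry a bare label such as "CJS-3" are all placeholders to adapt.

```python
from enum import IntEnum


class CJSTier(IntEnum):
    """The five published tiers, CJS-0 through CJS-4."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical internal playbook: these actions are illustrative placeholders,
# not responses defined by the CJS framework.
PLAYBOOK = {
    CJSTier.NONE: "log only",
    CJSTier.LOW: "review at next security sync",
    CJSTier.MEDIUM: "open a ticket and check application-layer filters",
    CJSTier.HIGH: "notify on-call and tighten usage monitoring",
    CJSTier.CRITICAL: "page on-call and review vendor mitigation status now",
}


def parse_tier(label: str) -> CJSTier:
    """Parse a bare label such as 'CJS-3' (assumed format) into a tier."""
    prefix, _, number = label.strip().upper().partition("-")
    if prefix != "CJS" or not number.isdigit():
        raise ValueError(f"not a CJS label: {label!r}")
    # CJSTier(...) raises ValueError for numbers outside 0-4.
    return CJSTier(int(number))


def action_for(label: str) -> str:
    """Look up the internal action for a CJS label from a vendor bulletin."""
    return PLAYBOOK[parse_tier(label)]
```

For example, `action_for("CJS-4")` returns the paging entry, and an unrecognized label raises `ValueError` rather than silently falling through to a low-urgency default.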
