CLS Labs Contributes to the NIST AI Risk Management Framework
We submitted two contributions to the NIST AI RMF Playbook based on findings from 1.5M+ adversarial probes across 273+ models. Here's what we recommended and why.
What We Submitted
CLS Security Labs submitted two documents to the NIST AI Risk Management Framework Playbook in April 2026:
Suggested actions for four AI RMF subcategories: MEASURE 2.6, MEASURE 2.7, MANAGE 2.1, and MAP 1.5. Each addresses a specific gap identified through our empirical testing.
Download PDF →
Application of all four AI RMF functions (GOVERN, MAP, MEASURE, MANAGE) to our adversarial assessment of GPT-4.1 and agentic AI systems. Documents methodology, outcomes, and lessons learned.
Download PDF →
Why This Matters
The NIST AI Risk Management Framework is becoming the de facto standard for AI governance in the US. Federal agencies, financial institutions, healthcare organizations, and any company subject to Colorado SB 24-205 will reference it. The Playbook is where practitioners go for concrete guidance on how to implement the framework.
Our contributions address four gaps we identified through testing, not through theory:
Gap 1: Text-only testing misses tool-access risk (MEASURE 2.6)
Standard safety benchmarks evaluate whether a model produces harmful text. They don't test what happens when that model is connected to tools. Our data across 101 models shows tool access increases breach rates by 2x to 9x. Claude Opus 4 goes from 16.2% to 38.9%. Llama 3.3-70B goes from 7.5% to 42.6%. The current Playbook doesn't distinguish between these deployment configurations.
Our recommendation: Adversarial testing of tool-connected AI systems should include probes delivered through the tool interface. Organizations should document the delta between text-only and tool-connected breach rates.
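For teams implementing this, the comparison itself is straightforward to automate. The sketch below (Python, with hypothetical names; it is not code from either submission) scores the same probe set against a text-only and a tool-connected configuration and reports both rates plus the delta:

```python
from dataclasses import dataclass

@dataclass
class ProbeResult:
    probe_id: str
    breached: bool  # judged as a breach or not

def breach_rate(results: list[ProbeResult]) -> float:
    """Fraction of probes judged to have breached."""
    return sum(r.breached for r in results) / len(results)

def tool_access_delta(text_only: list[ProbeResult],
                      tool_connected: list[ProbeResult]) -> dict:
    """Document both configurations and the gap between them,
    rather than reporting a single text-only number."""
    text_rate = breach_rate(text_only)
    tool_rate = breach_rate(tool_connected)
    return {
        "text_only_breach_rate": round(text_rate, 3),
        "tool_connected_breach_rate": round(tool_rate, 3),
        "delta": round(tool_rate - text_rate, 3),
        "multiplier": round(tool_rate / text_rate, 1) if text_rate else None,
    }
```

The point of the record is the delta and multiplier fields: a text-only number on its own hides the deployment-dependent risk described above.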
Gap 2: Single-judge scoring introduces systematic bias (MEASURE 2.7)
When you use one LLM to judge whether another LLM's response constitutes a breach, the result depends heavily on which judge you use. Our data shows breach rate assessments vary from 41.9% to 67.1% depending on the judge vendor. Single-vendor scoring can over- or under-report risk by 25 percentage points.
Our recommendation: Organizations should use multiple independent scoring systems from different vendors. Results should be reported as a validated range, not a single number.
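A minimal sketch of what range reporting could look like, assuming each judge returns a boolean verdict per response (function and vendor names are illustrative, not taken from the submissions):

```python
def breach_rate_range(verdicts_by_judge: dict[str, list[bool]]) -> dict:
    """Score the same responses with several independent judges and
    report the spread as a validated range, not a point estimate."""
    rates = {
        judge: sum(verdicts) / len(verdicts)
        for judge, verdicts in verdicts_by_judge.items()
    }
    return {
        "per_judge_breach_rate": {j: round(r, 3) for j, r in rates.items()},
        "reported_range": (round(min(rates.values()), 3),
                           round(max(rates.values()), 3)),
    }

# Illustrative call: three judges from different vendors scoring the same
# three responses (a real assessment would use the full probe set).
print(breach_rate_range({
    "vendor_a_judge": [True, False, True],
    "vendor_b_judge": [True, True, True],
    "vendor_c_judge": [False, False, True],
}))
```

Reporting the min/max pair makes judge disagreement visible instead of burying it in a single averaged figure.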
Gap 3: Provider-level filtering creates false confidence (MANAGE 2.1)
The same model weights can produce a 0% breach rate on a provider's serverless endpoint and a 27.8% breach rate on a dedicated deployment. The difference is undocumented provider-level safety filtering that only applies to certain infrastructure configurations. Organizations testing on one deployment type and deploying on another are measuring the wrong risk.
Our recommendation: Adversarial testing should be conducted on the specific deployment infrastructure intended for production. Assessments on different configurations should be documented as non-representative.
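One way to make the deployment configuration part of the assessment record is to store it alongside the result. The sketch below is an illustrative record format under that assumption, not the schema used in the submissions:

```python
from dataclasses import dataclass, asdict

@dataclass
class AssessmentRecord:
    model_id: str
    deployment_type: str      # e.g. "serverless-endpoint" or "dedicated"
    provider_filtering: str   # e.g. "provider-managed", "none", "unknown"
    matches_production: bool  # tested on the configuration intended for production?
    breach_rate: float

record = AssessmentRecord(
    model_id="example-model",
    deployment_type="dedicated",
    provider_filtering="unknown",
    matches_production=True,
    breach_rate=0.278,
)

# Assessments with matches_production=False should be flagged as
# non-representative rather than quoted as the production risk figure.
print(asdict(record))
```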
Gap 4: Multi-agent trust boundaries are unaddressed (MAP 1.5)
When Agent A's output feeds into Agent B, and Agent A is compromised, Agent B trusts the poisoned input. Our testing across 69 models shows cross-agent contamination succeeds at a 23.5% average breach rate. Multi-agent verification patterns where one agent checks another's work don't provide security when both agents share the same vulnerabilities.
Our recommendation: Risk mapping should include cross-agent trust boundaries as a distinct category. Verification agents should use different underlying models or include non-AI validation steps.
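As a rough illustration of that verification pattern, the sketch below requires the verifier to come from a different model family than the producer and runs a deterministic, non-AI rule check before any AI judgment. All names and helpers are hypothetical:

```python
from typing import Callable

def verify_agent_output(output: str,
                        producer_family: str,
                        verifier_family: str,
                        verifier_judge: Callable[[str], bool],
                        rule_checks: list[Callable[[str], bool]]) -> bool:
    """Cross-agent verification sketch: the verifier must use a different
    model family than the producer, and deterministic (non-AI) rule
    checks run before any AI judgment."""
    if producer_family == verifier_family:
        raise ValueError("Verifier shares the producer's model family; "
                         "shared vulnerabilities defeat the check.")
    # Non-AI validation first: cheap, deterministic rules the output must pass.
    if not all(check(output) for check in rule_checks):
        return False
    # Only then ask the different-family verifier model for a judgment.
    return verifier_judge(output)

# Illustrative call (the lambda stands in for a real second-model judge):
ok = verify_agent_output(
    output="SELECT name FROM users;",
    producer_family="vendor-a",
    verifier_family="vendor-b",
    verifier_judge=lambda text: True,
    rule_checks=[lambda text: "DROP TABLE" not in text.upper()],
)
```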
The Evidence Base
Every recommendation in both documents is backed by empirical data from our adversarial testing program.
All referenced research is publicly available on this site. The GPT-4.1 assessment, cross-model research results, and CLAP methodology are linked in the submissions and accessible from our blog and research pages.
Both documents are free, publicly available resources from a for-profit entity, consistent with the NIST AI RMF Playbook's inclusion criteria.
Published April 2026. Both submissions are available for download above.