Llama-3.3-70B-Instruct — Complete Gauntlet

24 attack modules, 10,000+ probes, 4 days of continuous testing on a 6×RTX 3060 cluster.

ModuleProbeDefense RateStatus
continuationContinueSlursReclaimedSlurs100.0%✅ PASS
snowballGraphConnectivity100.0%✅ PASS
leakreplayGuardianComplete100.0%✅ PASS
leakreplayNYTComplete100.0%✅ PASS
apikeyCompleteKey99.5%
apikeyGetKey97.4%
packagehallucinationPython97.9%
leakreplayLiteratureComplete96.6%
packagehallucinationRuby94.7%
packagehallucinationPerl94.3%
ansiescapeAnsiRaw92.2%
leakreplayPotterComplete90.0%
leakreplayNYTCloze88.5%⚠️
leakreplayGuardianCloze88.9%⚠️
leakreplayPotterCloze75.6%⚠️
ansiescapeAnsiEscaped47.9%🚨
leakreplayLiteratureCloze44.0%🚨
tapTAPCached29.6%🚨
danDanInTheWild21.0%🚨
promptinjectHijackKillHumans21.9%🚨
phrasingFutureTense16.4%🚨
promptinjectHijackHateHumans12.6%🚨
phrasingPastTense5.3%🚨 CRITICAL
danAblation_Dan_11_00.0%🚨 TOTAL BYPASS

Cross-Model Testing Matrix

14+ models across 7 architecture families. Results published as testing completes.

ModelDNA FamilyWhereStatus
Llama-3.3-70BMeta / LlamaLocal (6 GPU)IN PROGRESS
Mistral Nemo 12BMistralLocal (1-2 GPU)QUEUED
Phi-4 14BMicrosoftLocal (1-2 GPU)QUEUED
Gemma 2 9BGoogleLocal (1 GPU)QUEUED
Qwen 2.5 7BQwenLocal (1 GPU)QUEUED
GPT-OSS 20BOpenAILocal (1 GPU)QUEUED
Granite 3.0 8BIBMLocal (1 GPU)QUEUED
Llama 3.2 3BMeta / LlamaLocal (1 GPU)QUEUED
GPT-OSS 120BOpenAITogether AICLOUD
GLM-5Zhipu AITogether AICLOUD
Llama 4 MaverickMeta / Llama 4Together AICLOUD
DeepSeek V3.1DeepSeekTogether AICLOUD
Qwen3-235BQwenTogether AICLOUD