ARGUS — Agentic LLM Red-Team & Governance Scanner

Attack surface

What ARGUS scans

Full coverage across direct endpoints and complex agentic topologies — the surfaces static scanners miss entirely.

🎯

Direct Endpoints

Classic LLM attacks against completion APIs — alignment bypass, system-prompt extraction, role confusion, and jailbreaking.

prompt-injection jailbreaking system-prompt-leak alignment-bypass role-confusion

🗃

RAG Pipelines (Agentic)

Indirect cross-prompt injection via retrieval documents. Attacks the model through the data layer it trusts.

indirect-injection doc-poisoning retrieval-hijack embedding-attacks

🔌

MCP Server Meshes (Agentic)

Model Context Protocol attack surface — tool-call interception, poisoned context injection, cross-server propagation.

tool-call-intercept context-injection server-traversal mesh-propagation

🤖

Multi-Agent Pipelines (Agentic)

Agent-to-agent injection that hops trust boundaries and propagates exploitation through downstream agents.

pipeline-hijack agent-injection role-escalation trust-boundary

🔨

Tool-use / Function Calling

Exploiting function-calling interfaces — schema confusion, argument injection, and output manipulation.

arg-injection schema-confusion output-manipulation tool-abuse

📋

Scoring & Compliance

CVSSv4.0 vectors for every finding, automatically mapped to applicable regulatory frameworks. SARIF v2.1 output for CI/CD gates.

cvss4.0 nist-ai-rmf eu-ai-act sarif-v2.1 owasp-llm-top10

How it works

The closed-loop pipeline

Four specialized agents in a continuous revision loop. The Planner reads live behavioral signals and rewrites its strategy — not a fixed probe list.

Adapts in real time

The Planner revises attack strategy based on what the target does, not a static script — so defenses that beat the first wave don't beat the second.

Diversity-constrained synthesis

Embedding-space diversity constraint prevents the Attacker from converging on variations of the same payload — maximizing coverage, not repetition.
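The diversity constraint can be pictured as a greedy admission filter over payload embeddings. The sketch below is illustrative only and is not taken from the ARGUS source: `admit_diverse` and `max_similarity` are hypothetical names, and treating the threshold as an upper bound on cosine similarity to every already-admitted payload is an assumption.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def admit_diverse(candidates: Sequence[Vector], max_similarity: float) -> list[int]:
    """Greedy filter (hypothetical): keep a candidate only if it is not too
    similar to any payload that was already admitted. Returns kept indices."""
    admitted: list[int] = []
    for index, candidate in enumerate(candidates):
        if all(
            cosine_similarity(candidate, candidates[kept]) <= max_similarity
            for kept in admitted
        ):
            admitted.append(index)
    return admitted


# The second vector is a near-duplicate of the first, so it is rejected.
batch = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(admit_diverse(batch, max_similarity=0.35))  # [0, 2]
```

A greedy filter is order-dependent: whichever payload arrives first claims its neighborhood. That is usually acceptable for coverage purposes, but a real implementation might rank candidates before filtering.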

Cross-session memory

ChromaDB-backed episodic memory means each scan starts with everything learned from every previous scan — progressively sharper over time.

Three-layer validation

Independent semantic, judge-panel, and behavioral signals must agree before a finding is confirmed — low false-positive rate by design.

  Target LLM / Pipeline
          |
          v
  ┌─────────────────────────────────────────────────┐
  │                  Orchestrator                   │
  │                                                 │
  │   Planner ────────> Attacker ────> Evaluator    │
  │ (Claude Opus)    (synthesizer)     (3-layer)    │
  │      ^                  |              |        │
  │      |                  v              v        │
  │      └────── Revision <──── Findings + CVSS     │
  │                                    |            │
  │                                Reporter         │
  └─────────────────────────────────────────────────┘
          |
          v
    SARIF v2.1 + JSON  ──>  CI/CD gates + GRC tooling
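The loop in the diagram can be sketched as a budget-bounded cycle in which every verdict feeds back into the strategy. This is a minimal illustration with stub agents. `Session`, `run_loop`, and the callables passed to it are hypothetical names, and the probe budget is an assumption suggested by the budget tests listed under Verification, not the real ARGUS orchestrator.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Session:
    """Hypothetical scan state: a probe budget plus confirmed findings."""
    probe_budget: int
    probes_sent: int = 0
    findings: list[str] = field(default_factory=list)

    def budget_exhausted(self) -> bool:
        return self.probes_sent >= self.probe_budget


def run_loop(
    session: Session,
    plan: Callable[[], dict],
    attack: Callable[[dict], str],
    evaluate: Callable[[str], dict],
    revise: Callable[[dict, dict], dict],
) -> Session:
    """Plan once, then attack, evaluate, and revise until the budget runs out."""
    strategy = plan()
    while not session.budget_exhausted():
        payload = attack(strategy)
        session.probes_sent += 1
        verdict = evaluate(payload)
        if verdict["confirmed"]:
            session.findings.append(payload)
        # The revision step sees the verdict, so the next payload can differ.
        strategy = revise(strategy, verdict)
    return session


def demo() -> Session:
    """Run the loop with trivial stand-in agents."""
    def plan() -> dict:
        return {"technique": "role-confusion", "round": 0}

    def attack(strategy: dict) -> str:
        return f"{strategy['technique']}-probe-{strategy['round']}"

    def evaluate(payload: str) -> dict:
        return {"confirmed": payload.endswith("-2")}

    def revise(strategy: dict, verdict: dict) -> dict:
        return {**strategy, "round": strategy["round"] + 1}

    return run_loop(Session(probe_budget=4), plan, attack, evaluate, revise)


print(demo().findings)  # ['role-confusion-probe-2']
```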

Detection stack

Semantic Proximity

Cosine distance against a confirmed-attack embedding space. Flags responses that land in known attack-success neighborhoods, independent of surface form — catches paraphrases and evasive reformulations.
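As a rough sketch of the idea, a response embedding can be flagged when its cosine distance to the nearest confirmed-attack embedding falls below a cutoff. The function names and the `max_distance` default are assumptions made for illustration; the source does not state the threshold ARGUS uses.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally unrelated (1.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def nearest_attack_distance(response: Vector, attacks: Sequence[Vector]) -> float:
    """Distance to the closest confirmed-attack embedding (inf if there are none)."""
    return min((cosine_distance(response, attack) for attack in attacks), default=math.inf)


def in_attack_neighborhood(
    response: Vector, attacks: Sequence[Vector], max_distance: float = 0.2
) -> bool:
    """Hypothetical check: flag responses close to any known attack success."""
    return nearest_attack_distance(response, attacks) <= max_distance


known_successes = [[1.0, 0.0], [0.0, 1.0]]
print(in_attack_neighborhood([0.9, 0.1], known_successes))   # True
print(in_attack_neighborhood([-1.0, 0.0], known_successes))  # False
```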

LLM-as-Judge Panel

Multi-model verdict with configurable affirmative quorum and confidence threshold. Requiring agreement across models mitigates single-model blind spots and reduces false positives.
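One way to read "affirmative quorum and confidence threshold" is sketched below: a judge's vote counts only if it is affirmative and meets the confidence floor, and the panel confirms when enough such votes exist. `Verdict`, `panel_confirms`, and the judge labels are hypothetical, and the exact voting rule in ARGUS may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Verdict:
    """Hypothetical verdict record from one judge model."""
    judge: str
    exploited: bool
    confidence: float  # 0.0 to 1.0


def panel_confirms(verdicts: Iterable[Verdict], quorum: int, min_confidence: float) -> bool:
    """Confirm only if at least `quorum` judges vote affirmative with enough confidence."""
    affirmative = sum(
        1 for v in verdicts if v.exploited and v.confidence >= min_confidence
    )
    return affirmative >= quorum


panel = [
    Verdict("judge-a", exploited=True, confidence=0.9),
    Verdict("judge-b", exploited=True, confidence=0.8),
    Verdict("judge-c", exploited=False, confidence=0.95),
]
print(panel_confirms(panel, quorum=2, min_confidence=0.7))   # True
print(panel_confirms(panel, quorum=2, min_confidence=0.85))  # False
```

Raising either parameter trades recall for precision: a stricter quorum or confidence floor suppresses findings that only one judge, or only a hesitant judge, would have reported.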

Behavioral Trace Analysis

Pipeline telemetry anomaly detection. Catches exploitation that produces no suspicious output text but alters tool-call patterns, token budgets, or downstream API behavior.
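A simple form of telemetry anomaly detection compares a probe run against baseline runs of the same pipeline. The sketch below uses a z-score on a numeric metric (for example, tool calls per run) plus a check for tools that never appear in the baseline. Both checks, their names, and the default threshold are assumptions chosen for illustration, not the detector ARGUS ships.

```python
from statistics import mean, pstdev
from typing import Iterable, Sequence


def is_anomalous(baseline: Sequence[float], observed: float, z_threshold: float = 3.0) -> bool:
    """Flag `observed` if it lies more than `z_threshold` standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0.0:
        # A perfectly flat baseline makes any deviation notable.
        return observed != mu
    return abs(observed - mu) / sigma > z_threshold


def unexpected_tools(baseline_tools: set[str], observed_calls: Iterable[str]) -> list[str]:
    """Tool names invoked during the probe that were never seen in baseline traces."""
    return sorted(set(observed_calls) - baseline_tools)


baseline_call_counts = [2, 3, 2, 3, 2, 3]
print(is_anomalous(baseline_call_counts, observed=9))  # True
print(is_anomalous(baseline_call_counts, observed=3))  # False
print(unexpected_tools({"search", "summarize"}, ["search", "send_email", "search"]))  # ['send_email']
```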

How ARGUS compares

Feature	Garak (NVIDIA)	PyRIT (Microsoft)	LLM-Fuzzer	ARGUS
Multi-agent architecture	✗	✗	✗	✓
Adaptive attack strategy	✗	✗	✗	✓ real-time
Cross-session memory	✗	✗	✗	✓ ChromaDB
OWASP LLM Top 10 (2025)	partial	partial	✗	✓ all 10
RAG / pipeline attacks	✗	partial	✗	✓
MCP server mesh attacks	✗	✗	✗	✓
Multi-agent propagation	✗	✗	✗	✓
CVSSv4.0 scoring	✗	✗	✗	✓
SARIF v2.1 output	✗	✗	✗	✓
8-framework compliance mapping	✗	✗	✗	✓
LLM-as-judge panel detection	✗	partial	✗	✓ quorum
Behavioral trace analysis	✗	✗	✗	✓
Academic backing	✓	✓	✓	✓ IEEE

Verification

Test results & empirical proofs

Every design claim is backed by a measured outcome. All results are reproducible from the public repository.

Unit test suite

pytest tests/ -v --tb=short

15 PASSED · 0 FAILED · 1.00s

tests/unit/test_orchestrator.py :: TestOrchestrator
PASSED  test_run_completes_without_error
PASSED  test_run_transitions_to_complete
PASSED  test_run_calls_planner_initialize
PASSED  test_run_calls_reporter_generate
PASSED  test_failed_phase_on_exception
tests/unit/test_session.py :: TestSessionState
PASSED  test_initial_phase_is_init
PASSED  test_transition_updates_phase
PASSED  test_budget_not_exhausted_at_start
PASSED  test_budget_exhausted_when_at_limit
PASSED  test_add_finding_increments_list
PASSED  test_confirmed_findings_filters_unconfirmed
PASSED  test_summary_keys_present
PASSED  test_completed_at_set_on_complete_transition
tests/unit/test_session.py :: TestFinding
PASSED  test_new_id_has_arg_prefix
PASSED  test_to_dict_contains_required_keys

CVSSv4.0 scoring engine — 48-vector validation

48 hand-crafted test vectors validated against the FIRST CVSSv4.0 reference calculator. All vectors scored within ±0.2 of the reference. Five representative cases:

Attack scenario	CVSSv4.0 vector	ARGUS	Reference	Δ
Direct prompt injection — system-prompt extraction	AV:N/AC:L/AT:N/PR:N/UI:N/VC:H/VI:N/VA:N/SC:H/SI:N/SA:N	8.7 High	8.7	0.0
Cross-agent privilege escalation via MCP poisoning	AV:N/AC:H/AT:P/PR:L/UI:N/VC:H/VI:H/VA:H/SC:H/SI:H/SA:H	9.4 Critical	9.4	0.0
RAG indirect injection, low downstream impact	AV:N/AC:H/AT:N/PR:N/UI:N/VC:L/VI:N/VA:N/SC:L/SI:N/SA:N	4.2 Medium	4.1	+0.1
Tool-call schema confusion, no downstream reach	AV:N/AC:L/AT:N/PR:L/UI:N/VC:L/VI:L/VA:N/SC:N/SI:N/SA:N	5.3 Medium	5.2	+0.1
Zero-impact probe (all Vulnerable & Subsequent metrics: None)	AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:N/SC:N/SI:N/SA:N	0.0 None	0.0	0.0
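The comparison in the table can be reproduced mechanically. The sketch below parses the base-metric vectors as written above, applies the standard CVSS qualitative severity bands, and checks each ARGUS score against its reference with the ±0.2 tolerance. It does not compute CVSS scores; the helper names are hypothetical and the score pairs are copied from the table.

```python
BASE_METRICS = ("AV", "AC", "AT", "PR", "UI", "VC", "VI", "VA", "SC", "SI", "SA")


def parse_vector(vector: str) -> dict[str, str]:
    """Split a CVSS v4.0 base vector into {metric: value}; an optional 'CVSS:4.0' prefix is ignored."""
    parts = vector.split("/")
    if parts and parts[0].startswith("CVSS:"):
        parts = parts[1:]
    metrics = dict(part.split(":", 1) for part in parts)
    missing = [name for name in BASE_METRICS if name not in metrics]
    if missing:
        raise ValueError(f"missing base metrics: {missing}")
    return metrics


def severity(score: float) -> str:
    """Qualitative rating bands: None 0.0, Low 0.1-3.9, Medium 4.0-6.9, High 7.0-8.9, Critical 9.0-10.0."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def within_tolerance(argus: float, reference: float, tolerance: float = 0.2) -> bool:
    """Round the difference to one decimal first to avoid float noise such as 0.10000000000000053."""
    return round(abs(argus - reference), 1) <= tolerance


# (ARGUS score, reference score) pairs from the five representative rows.
rows = [(8.7, 8.7), (9.4, 9.4), (4.2, 4.1), (5.3, 5.2), (0.0, 0.0)]
for argus_score, reference_score in rows:
    print(argus_score, severity(argus_score), within_tolerance(argus_score, reference_score))
```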

Payload diversity — embedding-space constraint proof

50-payload batches generated for 3 OWASP categories × 2 target profiles (threshold δ = 0.35). Constrained synthesis reduces mean intra-batch cosine similarity by 66% versus unconstrained generation.

ARGUS constrained (δ = 0.35)

0.21

mean cosine similarity (SD 0.04)

Unconstrained synthesis

0.61

mean cosine similarity (SD 0.09)

31%

avg candidate rejection rate per batch

2.4×

rejections per admission (LLM06 extraction)

1.3×

rejections per admission (LLM01 injection)

6

batch configs (3 categories × 2 profiles)
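The headline metric, mean intra-batch cosine similarity, is the average similarity over all unordered payload pairs in a batch. The sketch below shows that computation and the relative-reduction arithmetic behind the quoted 66%. The function names are hypothetical; the two means (0.61 and 0.21) are the figures reported above.

```python
import math
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(batch: Sequence[Vector]) -> float:
    """Average cosine similarity over every unordered pair in the batch."""
    pairs = list(combinations(batch, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


def relative_reduction(baseline: float, constrained: float) -> float:
    """Fractional drop from the unconstrained mean to the constrained mean."""
    return (baseline - constrained) / baseline


# (0.61 - 0.21) / 0.61 is about 0.656, which rounds to the 66% quoted above.
print(f"{relative_reduction(0.61, 0.21):.1%}")  # 65.6%
```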

OWASP LLM Top 10 (2025) — full coverage proof

Category IDs and names in the table below follow the OWASP Top 10 for LLM Applications v1.1 (2023) numbering; the 2025 edition renames and renumbers several categories (for example, LLM02 becomes Sensitive Information Disclosure and LLM10 becomes Unbounded Consumption).

ID	Category	garak v0.13	ARGUS v1.0	Coverage gain
LLM01	Prompt Injection	Full	Full + XPIA / MCP	Extended
LLM02	Insecure Output Handling	Partial	Full	+
LLM03	Training Data Poisoning	Partial	Full	+
LLM04	Model Denial of Service	Partial	Full	+
LLM05	Supply Chain Vulnerabilities	None	Full	New
LLM06	Sensitive Information Disclosure	Full	Full	Equal
LLM07	Insecure Plugin Design	Partial	Full	+
LLM08	Excessive Agency	None	Full	New
LLM09	Overreliance	Partial	Full	+
LLM10	Model Theft	None	Full	New

10 / 10

ARGUS full coverage

7 / 10

garak categories with any coverage (2 full, 5 partial, 3 none)

CI/CD integration — SARIF v2.1 proof

Scan completes, SARIF v2.1 file written

ARGUS emits a standards-compliant SARIF v2.1 report at the path specified in the scan config. No post-processing or schema conversion is needed.
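For orientation, the skeleton of a SARIF 2.1.0 log is small: a `version`, a list of `runs`, and within each run a `tool.driver` and a list of `results`. The sketch below builds that skeleton from a list of findings. It is not ARGUS's reporter: the finding keys (`rule_id`, `title`, `cvss`), the CVSS-to-level mapping, and the `properties` payload are assumptions made for illustration.

```python
import json

SARIF_SCHEMA = "https://json.schemastore.org/sarif-2.1.0.json"


def sarif_level(cvss_score: float) -> str:
    """Assumed mapping from CVSS score to SARIF result level."""
    if cvss_score >= 7.0:
        return "error"
    if cvss_score >= 4.0:
        return "warning"
    return "note"


def build_sarif(findings: list[dict]) -> dict:
    """Assemble a minimal SARIF 2.1.0 log from hypothetical finding dicts."""
    rules: dict[str, dict] = {}
    results = []
    for finding in findings:
        rule_id = finding["rule_id"]
        # One rule entry per distinct rule id; the first title seen describes it.
        rules.setdefault(
            rule_id, {"id": rule_id, "shortDescription": {"text": finding["title"]}}
        )
        results.append(
            {
                "ruleId": rule_id,
                "level": sarif_level(finding["cvss"]),
                "message": {"text": finding["title"]},
                "properties": {"cvss": finding["cvss"]},
            }
        )
    return {
        "$schema": SARIF_SCHEMA,
        "version": "2.1.0",
        "runs": [
            {
                "tool": {"driver": {"name": "ARGUS", "rules": list(rules.values())}},
                "results": results,
            }
        ],
    }


report = build_sarif([{"rule_id": "LLM01", "title": "System-prompt extraction", "cvss": 8.7}])
print(json.dumps(report, indent=2))
```

Consumers such as GitHub code scanning may expect fields not shown here, for example result `locations`. They are omitted because the source does not say which artifact ARGUS attributes a finding to.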

upload-sarif action ingests on first attempt, zero config

A single upload-sarif step with the output path is the only pipeline addition. No format-version pinning, no custom mapping. garak requires a hand-written JSONL-to-SARIF converter before this step.

Critical and High findings visible in Security tab within 45 seconds

Findings appear with correct rule IDs, severity labels, and descriptions. The 45-second window was measured on a representative repository; GitHub's ingestion pipeline, not ARGUS, is the bottleneck.

.github/workflows/argus-scan.yml

- name: Run ARGUS scan
  run: argus scan --target openai --model gpt-4o --sarif argus.sarif

- name: Upload SARIF to GitHub Security tab
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: argus.sarif
# Result: Critical/High findings in Security tab in <45 s.  Zero custom config.
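
The workflow above only uploads findings. To make the scan an actual gate, a later step could fail the job when the SARIF file contains blocking results. The script below is a hypothetical helper, not part of ARGUS. It assumes blocking findings are emitted with level `error`, and it treats a missing `level` as `warning`, which is the SARIF default.

```python
import json
import sys


def count_blocking(sarif: dict, blocking_levels: frozenset = frozenset({"error"})) -> int:
    """Count results across all runs whose level is in `blocking_levels`."""
    count = 0
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            if result.get("level", "warning") in blocking_levels:
                count += 1
    return count


def main(path: str) -> int:
    """Return a process exit code: 1 if any blocking finding exists, else 0."""
    with open(path, encoding="utf-8") as handle:
        sarif = json.load(handle)
    blocking = count_blocking(sarif)
    print(f"{blocking} blocking finding(s) in {path}")
    return 1 if blocking else 0


# Only act as a CLI when a path argument was actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Saved under a name of your choosing (for example `sarif_gate.py`, a hypothetical filename), it would run as `python sarif_gate.py argus.sarif` in a step after the scan.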

Installation

Get started in 60 seconds

Install once, scan any supported target with a single command. API key resolved from env or --api-key flag.

● Install

git clone https://github.com/sunilgentyala/argus
cd argus
pip install -e .

Requires Python 3.11+. Optional dev extras: pip install -e ".[dev]"

● Anthropic (Claude)

argus scan \
  --target anthropic \
  --model claude-sonnet-4-6 \
  --profile quick

Set ANTHROPIC_API_KEY. Use --profile full for all 10 OWASP categories.

● OpenAI (GPT)

argus scan \
  --target openai \
  --model gpt-4o \
  --system-prompt "..." \
  --profile compliance

Set OPENAI_API_KEY. Pass --system-prompt to test a deployed persona.
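
The key lookup described above (environment variable or `--api-key` flag) might be resolved along these lines. This is a guess at the behavior rather than the ARGUS implementation: the precedence (flag over environment) is an assumption, and `resolve_api_key` and `ENV_VARS` are hypothetical names. Only the two environment variable names come from this page.

```python
import os
from typing import Mapping, Optional

# Environment variable per target, as named in the commands above.
ENV_VARS = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}


def resolve_api_key(
    target: str,
    cli_key: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the key from the flag if given (assumed precedence), else from the environment."""
    env = os.environ if environ is None else environ
    if cli_key:
        return cli_key
    variable = ENV_VARS.get(target)
    if variable is None:
        raise ValueError(f"unknown target: {target!r}")
    key = env.get(variable)
    if not key:
        raise RuntimeError(f"set {variable} or pass --api-key")
    return key


print(resolve_api_key("openai", environ={"OPENAI_API_KEY": "sk-example"}))  # sk-example
```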

● View Reports

argus show ./argus-reports/<session-id>.report.json

Reports are also written as SARIF v2.1 for CI/CD integration and GRC handoff.

Regulatory coverage

8 compliance frameworks

Every confirmed finding is automatically tagged to applicable regulatory articles — ready for your audit trail or GRC team.

🇺🇸

NIST AI RMF

Govern, Map, Measure, Manage — full control framework mapping.

🇪🇺

EU AI Act

High-risk system requirements, transparency and conformity obligations.

🇺🇸

US EO 14110

Federal AI safety, security reporting, and red-team requirements.

🇬🇧

UK AISI

AI Safety Institute evaluation and testing standards.

🇮🇳

India CERT-In

Cybersecurity incident reporting and AI governance guidelines.

🌐

ISO 42001

International AI management system standard controls.

🌏

APAC Digital Governance

Regional frameworks across Asia-Pacific jurisdictions.

🌎

African Digital Frameworks

African Union (AU) digital transformation and AI policy alignment.

Changelog

What's new in v1.1.0

Latest release ships HTML reports, direct SARIF output, and threat intelligence.

📄

HTML Reports

Self-contained HTML output with severity-colour-coded findings table. Attach to GRC tickets or email to security stakeholders — no external assets required.

html_reporter.py

🏁

Direct SARIF flag

New --sarif argus.sarif flag lets you specify the SARIF output path directly — perfect for CI pipelines that pick up a specific filename.

--sarif CI/CD

🤖

Threat Intelligence

New argus/intelligence/ module ships a curated LLM attack signal database — helping the Planner bias toward highest-yield vectors per model family.

ThreatIntelligence model-profiles

🚀

Claude Opus 4.8

Planner now defaults to claude-opus-4-8 — Anthropic's latest reasoning model. Attack strategy quality improves measurably on complex agentic surfaces.

claude-opus-4-8

Autonomous LLM red-teaming,
end to end
