The defensive barrier for production AI agents.
Production-grade open-source Python framework defending AI agents against prompt-injection attacks. Five-layer defense architecture — sanitizer, ML + pattern detector, compartmentalized RBAC, encrypted audit trail, and human confirmation gates. Vendor-neutral. MCP-native. Compliance-ready for HIPAA, SOC 2, and NERC CIP.
Prompt injection is no longer theoretical. Every URL your agent fetches, every document it summarizes, every MCP server it queries is a potential attack surface. Three real attack patterns Bulwark blocks today:
Adversaries embed instructions in zero-pixel fonts or fully transparent text. Your LLM reads them. Your users never see them.
Zero-width characters and right-to-left overrides hide adversarial payloads inside ordinary-looking text. Renders clean. Parses dirty.
Injected system:, assistant:, or "ignore previous instructions" tokens convince the model the prior context is over and a new operator is in charge.
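To make the third pattern concrete, here is a minimal sketch of how role-marker injection could be flagged with regular expressions. The patterns and the `find_role_markers` helper are illustrative assumptions, not Bulwark's actual pattern catalog.

```python
import re

# Hypothetical patterns for illustration; a real catalog would be much larger.
ROLE_MARKER_PATTERNS = [
    # A line that starts with a chat-role label such as "system:" or "assistant:".
    re.compile(r"^\s*(system|assistant)\s*:", re.IGNORECASE | re.MULTILINE),
    # The classic "ignore previous instructions" phrasing and close variants.
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
]


def find_role_markers(text: str) -> list[str]:
    """Return the matched snippets that look like context-switch attempts."""
    hits = []
    for pattern in ROLE_MARKER_PATTERNS:
        hits.extend(m.group(0).strip() for m in pattern.finditer(text))
    return hits
```

Pattern matching of this kind is cheap but easy to evade with paraphrasing, which is one reason to pair it with a second, independent signal.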
Each request flows through five independent checkpoints. No single point of failure. If the sanitizer misses something, the detector catches it. If content passes both, RBAC still enforces least privilege. If a tool call needs human judgement, the gate holds the line.
Zero-permission isolation. Strips HTML tricks, Unicode abuse, and bidirectional overrides before content ever reaches the model.
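As a rough illustration of the Unicode part of this layer, the sketch below removes invisible format characters using only the standard library. The `strip_invisible` name is hypothetical, and this is an assumption about one reasonable approach rather than Bulwark's own sanitizer.

```python
import unicodedata


def strip_invisible(text: str) -> str:
    """Drop characters in Unicode general category Cf (format).

    Cf covers zero-width spaces and joiners, the byte-order mark, and the
    bidirectional embedding, override, and isolate controls.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
```

Removing every Cf character is deliberately blunt: it also strips the zero-width joiner that emoji sequences and some scripts rely on, so a production sanitizer would likely need an allowlist for trusted contexts.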
Fine-tuned BERT classifier plus regex pattern catalog. Two independent signals must both pass for traffic to proceed.
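The "both must pass" rule can be sketched as a simple conjunction. In this sketch `score_fn` stands in for the classifier, the single regex stands in for the pattern catalog, and the threshold semantics (a score at or above the threshold blocks) are assumptions made for illustration.

```python
import re
from typing import Callable

# Hypothetical stand-in for a full pattern catalog.
INJECTION_PATTERN = re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE)


def is_allowed(
    text: str,
    score_fn: Callable[[str], float],
    threshold: float = 0.65,
) -> bool:
    """Allow content only if the classifier AND the pattern check both pass."""
    classifier_ok = score_fn(text) < threshold  # assumed: higher score = more suspicious
    pattern_ok = INJECTION_PATTERN.search(text) is None
    return classifier_ok and pattern_ok
```

Because either signal alone can block, an attacker has to evade both the learned model and the explicit patterns at the same time.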
Role × tool permissions. Default-deny. A research agent cannot send email. An email agent cannot move money.
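Default-deny is easiest to see in code. The following is a minimal sketch under assumed role and tool names; it is not Bulwark's permission model, only an example of the principle.

```python
# Hypothetical role-to-tool allowlist; anything not listed is denied.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "research": frozenset({"web_search", "read_database"}),
    "email": frozenset({"send_email"}),
}


def is_permitted(role: str, tool: str) -> bool:
    """Default-deny: unknown roles and unlisted tools are both rejected."""
    return tool in ROLE_PERMISSIONS.get(role, frozenset())
```

The key design choice is that the lookup falls back to an empty set, so forgetting to register a role or tool fails closed instead of open.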
Async approval workflow for high-stakes actions. Webhook, Slack, or email. Configurable timeout with auto-deny.
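The timeout-with-auto-deny behavior can be sketched with `asyncio.wait_for`. The `await_approval` helper is hypothetical, and how the decision arrives (webhook, Slack, or email) is left outside the sketch.

```python
import asyncio
from typing import Awaitable


async def await_approval(decision: Awaitable[bool], timeout_s: float) -> bool:
    """Wait for a human decision; treat a timeout as a denial."""
    try:
        return await asyncio.wait_for(decision, timeout=timeout_s)
    except asyncio.TimeoutError:
        # No answer within the window: fail closed.
        return False
```

A caller would typically pass a future that the notification handler resolves to `True` or `False`; if nobody responds in time, the action is simply not executed.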
AES-128 GCM encryption at rest. 7-year retention by default. Forensically queryable: "why did agent X recommend Y on date Z?"
Install, wrap your tool executors with guard(), ship. Bulwark is a drop-in security layer — no code rewrite, no model swap.
Python 3.10+. One package. Optional extras for transformers, dashboards, integrations.
Pass your tool functions to guard(). Choose an agent role. Specify which tools are outbound (monitored for exfiltration).
Each call now flows through all five defense layers. Blocked attempts surface as InjectionDetectedError; approved actions are signed into the audit trail.
Add MCP servers, custom patterns, compliance modes (HIPAA / SOC 2 / NERC CIP), and human-gate channels (Slack, webhook, email).
$ pip install bulwark-agent-security
# With ML detector + dashboard:
$ pip install "bulwark-agent-security[transformers,dashboard]"
import asyncio

from bulwark import BulwarkConfig, AgentRole, guard


async def read_database(args):
    return [{"id": 1, "name": "Alice"}]


async def main():
    secured = guard(
        executors={"read_database": read_database},
        config=BulwarkConfig(agent_role=AgentRole.RESEARCH),
    )
    print(await secured["read_database"]({"sql": "SELECT 1"}))


asyncio.run(main())
# HIPAA + SOC 2 production setup
import os

from bulwark import guard, BulwarkConfig, AgentRole

# query_phi and send_alert are your own async tool executors, defined elsewhere.
config = BulwarkConfig(
    agent_role=AgentRole.WRITE,
    compliance=["HIPAA", "SOC2"],
    audit_encryption_key=os.environ["AUDIT_KEY"],
    human_gate_webhook="https://slack.example/bulwark",
    detection_threshold=0.65,
)

secured = guard(
    executors={"query_phi": query_phi, "send_clinical_alert": send_alert},
    config=config,
    outbound_tools=["send_clinical_alert"],
)
Bulwark's audit trail, RBAC, and human-gate primitives map directly to enterprise compliance requirements. Ship agents in healthcare, energy, and finance — without rebuilding evidence trails.
Encrypted audit logs with 7-year retention. Per-tool access control. Tamper-evident integrity. Forensic reconstruction of every PHI-touching agent decision.
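One common way to get tamper-evident integrity is a hash chain, where each entry commits to the one before it. The sketch below uses that technique as an illustration only; it is an assumption, not a description of how Bulwark stores its audit trail, and the `append_entry` and `verify` names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical event encoding."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], event: dict) -> dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)}
    chain.append(entry)
    return entry


def verify(chain: list[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

An unkeyed chain like this only detects tampering if the latest hash is also stored somewhere the attacker cannot rewrite, so a real deployment would anchor it externally or use a keyed MAC.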
Role-based agent permissions, monitoring of system operations, and incident response evidence — all generated by default, exported on demand.
For energy operators running AI on or near control systems. Compartmentalized RBAC with default-deny, hardened sanitizer with zero-permission isolation.
Works with Anthropic, OpenAI, MCP, LangChain, or your own model. No lock-in and no SDK rewrite: wrap your existing tool executors with a single guard() call.
Ships with an MCP proxy integration. Drop Bulwark in front of any MCP server to enforce sanitization, RBAC, and audit on every tool call.
Designed from real agent failure modes seen in healthcare RCM, genomics, and energy operations — not from a CTF lab. Built to survive contact with users.
Free. Open source. Apache 2.0. The same defensive primitives you'd build internally — packaged and battle-tested.