Introduction
LLM Gateways is a prompt-security API that sits between your application and any large language model. Before a user prompt reaches your LLM, you send it to LLM Gateways — it runs multi-layer detection and returns a risk score, a list of detected threats, and a recommended action (allow or block) in milliseconds.
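The scan-then-gate flow can be sketched as follows. The response fields (`risk_score`, `threats`, `action`) and the `scan()` stand-in are illustrative assumptions, not the documented API; see the API Reference for the real request and response contract.

```python
# Sketch of the gating flow. scan() stands in for the HTTPS call to the
# scan endpoint; the response shape shown here is an assumption.

def scan(prompt: str) -> dict:
    """Stand-in for POSTing the prompt to the (hypothetical) scan endpoint."""
    # A real integration would send `prompt` over HTTPS and return the
    # parsed JSON body. Here we fake a benign verdict for illustration.
    return {"risk_score": 0.02, "threats": [], "action": "allow"}

def call_llm(prompt: str) -> str:
    """Stand-in for your existing LLM client."""
    return "(model response)"

def guarded_completion(prompt: str) -> str:
    verdict = scan(prompt)
    if verdict["action"] == "block":
        # Never forward flagged prompts to the upstream LLM.
        return "Request blocked: " + ", ".join(verdict["threats"])
    return call_llm(prompt)

print(guarded_completion("What is the capital of France?"))
```

The key design point is that the gate sits on your side: the prompt only reaches the model after the scan verdict comes back as `allow`.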
What it protects against
- Prompt injection — instructions hidden in user input designed to override your system prompt
- Jailbreaks — "DAN", role-play, and other attempts to bypass model safety guidelines
- System-prompt extraction — requests crafted to leak your confidential system prompt
- PII leakage — accidental inclusion of personal data that should never reach an external LLM
- Token smuggling — unicode tricks and invisible characters used to hide malicious content
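As a concrete illustration of the token-smuggling category, here is a toy check for invisible code points, the kind of signal a rules-style layer could catch. This is a minimal sketch for intuition, not the service's actual detector:

```python
import unicodedata

# Toy token-smuggling check: zero-width and other invisible code points
# (Unicode category Cf, "format") can hide instructions from a human
# reviewer while still being present in the text the model receives.

def invisible_chars(text: str) -> list[str]:
    """Return the names of format-category (Cf) code points in text."""
    return [unicodedata.name(ch, "UNKNOWN")
            for ch in text
            if unicodedata.category(ch) == "Cf"]

clean = "Summarise this article."
smuggled = "Summarise this\u200b article.\u2060"  # zero-width space, word joiner

print(invisible_chars(clean))     # []
print(invisible_chars(smuggled))
```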
How it works
Every scan runs through up to three detection layers, ordered from fastest to slowest:
- Rules layer — 78+ regex/keyword patterns, <1 ms
- Semantic layer — embedding similarity against known attack vectors, ~2–5 ms
- LLM judge — optional second-opinion from a fine-tuned classifier for borderline prompts, ~50–200 ms
The layers cascade: if an earlier layer returns a confident score, the later, slower layers are skipped, keeping median latency under 10 ms. See Concepts for the full detection model.
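The cascade can be pictured as an early-exit loop over the layers. The threshold value and the layer internals below are illustrative assumptions; only the cascading shape mirrors what the document describes:

```python
# Sketch of the early-exit cascade. The threshold and layer heuristics
# are assumptions for illustration, not the service's real logic.

CONFIDENCE_THRESHOLD = 0.9  # assumed cut-off for a "confident" score

def rules_layer(prompt: str) -> tuple[float, float]:
    """Return (risk_score, confidence). Cheap keyword heuristic."""
    hit = "ignore previous instructions" in prompt.lower()
    return (0.95, 0.99) if hit else (0.05, 0.95)

def semantic_layer(prompt: str) -> tuple[float, float]:
    """Stand-in for embedding similarity against known attack vectors."""
    return (0.5, 0.7)

def llm_judge(prompt: str) -> tuple[float, float]:
    """Stand-in for the optional fine-tuned classifier."""
    return (0.5, 1.0)

def cascade(prompt: str) -> float:
    score = 0.0
    for layer in (rules_layer, semantic_layer, llm_judge):
        score, confidence = layer(prompt)
        if confidence >= CONFIDENCE_THRESHOLD:
            return score  # early exit: later, slower layers never run
    return score
```

Because most traffic resolves in the first (sub-millisecond) layer, the expensive judge only pays its 50–200 ms cost on genuinely borderline prompts.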
Privacy
Raw prompts are never stored. Scan logs contain only a SHA-256 hash of the prompt, the risk score, threat labels, and timing metadata.
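The hash-only logging guarantee implies something like the following on the write path. This is a sketch of the idea, not the service's code; the record field names are assumptions:

```python
import hashlib
import time

# Sketch of hash-only scan logging: the raw prompt never enters the log
# record, only its SHA-256 digest plus the score and metadata.

def log_record(prompt: str, risk_score: float, threats: list[str]) -> dict:
    return {
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "risk_score": risk_score,
        "threats": threats,
        "logged_at": time.time(),
    }

record = log_record("Ignore previous instructions.", 0.95, ["prompt_injection"])
assert "Ignore" not in str(record)  # the raw prompt is absent from the record
```

Because SHA-256 is one-way, the stored digest still lets you correlate repeat submissions of the same prompt without being able to recover its content.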
Next steps
- Quick Start — make your first scan call in 2 minutes
- Authentication — create and manage API keys
- API Reference — full endpoint documentation
- Concepts — risk scores, threat types, and detection layers