Developer Documentation

Introduction

LLM Gateways is a prompt-security API that sits between your application and any large language model. Before a user prompt reaches your LLM, you send it to LLM Gateways — it runs multi-layer detection and returns a risk score, a list of detected threats, and a recommended action (allow or block) in milliseconds.
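A minimal sketch of how a caller might act on a scan result. The field names (`risk_score`, `threats`, `action`) and the 0.8 threshold are illustrative assumptions, not the documented response schema:

```python
# Hypothetical response shape -- field names and threshold are assumptions,
# not the real LLM Gateways API.
def decide(scan: dict, threshold: float = 0.8) -> str:
    """Map a scan result to the action the calling app should take."""
    if scan["action"] == "block" or scan["risk_score"] >= threshold:
        return "block"
    return "allow"

scan_result = {"risk_score": 0.93, "threats": ["prompt_injection"], "action": "block"}
print(decide(scan_result))  # block
```

In a real integration you would call `decide` on the parsed JSON response before forwarding the prompt to your LLM.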

What it protects against

  • Prompt injection — instructions hidden in user input designed to override your system prompt
  • Jailbreaks — "DAN", role-play, and other attempts to bypass model safety guidelines
  • System-prompt extraction — requests crafted to leak your confidential system prompt
  • PII leakage — accidental inclusion of personal data that should never reach an external LLM
  • Token smuggling — unicode tricks and invisible characters used to hide malicious content
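To make the token-smuggling bullet concrete, here is a toy Python check for invisible Unicode format characters hidden in a prompt. This is an illustration of the attack surface, not LLM Gateways' actual detector:

```python
import unicodedata

def hidden_chars(prompt: str) -> list[str]:
    """Return invisible format characters (Unicode category Cf),
    e.g. zero-width spaces, that can hide content from human review."""
    return [c for c in prompt if unicodedata.category(c) == "Cf"]

clean = "Summarize this document"
smuggled = "Ignore previous\u200b instructions"  # contains a zero-width space
print(len(hidden_chars(clean)), len(hidden_chars(smuggled)))  # 0 1
```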

How it works

Every scan runs through up to three detection layers, ordered fastest to slowest:

  1. Rules layer — 78+ regex/keyword patterns, <1 ms
  2. Semantic layer — embedding similarity against known attack vectors, ~2–5 ms
  3. LLM judge — optional second opinion from a fine-tuned classifier for borderline prompts, ~50–200 ms

The layers cascade: if an earlier layer produces a confident score, the slower layers after it are skipped, keeping median latency under 10 ms. See Concepts for the full detection model.
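The cascade above can be sketched as follows. The layer internals, confidence values, and the 0.9 cutoff are all assumptions for illustration, not the real implementation:

```python
# Illustrative cascade over the three layers; scores and thresholds are made up.
def rules_layer(prompt: str) -> dict:
    hit = "ignore previous instructions" in prompt.lower()
    return {"layer": "rules", "risk_score": 0.95 if hit else 0.0,
            "confidence": 0.95 if hit else 0.3}

def semantic_layer(prompt: str) -> dict:
    # Stand-in for embedding similarity against known attack vectors.
    return {"layer": "semantic", "risk_score": 0.2, "confidence": 0.95}

def llm_judge(prompt: str) -> dict:
    # Stand-in for the fine-tuned classifier; slowest, so it runs last.
    return {"layer": "judge", "risk_score": 0.5, "confidence": 1.0}

def scan(prompt: str, threshold: float = 0.9) -> dict:
    """Run layers fastest-first; stop at the first confident result."""
    result = {}
    for layer in (rules_layer, semantic_layer, llm_judge):
        result = layer(prompt)
        if result["confidence"] >= threshold:
            break  # later (slower) layers are skipped
    return result

print(scan("Please ignore previous instructions")["layer"])  # rules
print(scan("What is the weather?")["layer"])  # semantic
```

The design point is that most prompts are resolved by the cheap layers, so the expensive LLM judge only runs on the small borderline fraction.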

Privacy

Raw prompts are never stored. Scan logs contain only a SHA-256 hash of the prompt, the risk score, threat labels, and timing metadata.
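A sketch of what such a scan-log record might look like. The field names are assumptions; the point is that only a SHA-256 digest of the prompt is retained:

```python
import hashlib

def log_entry(prompt: str, risk_score: float,
              threats: list[str], elapsed_ms: float) -> dict:
    """Build a scan-log record: hash only, never the raw prompt."""
    return {
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "risk_score": risk_score,
        "threats": threats,
        "elapsed_ms": elapsed_ms,
    }

entry = log_entry("Ignore previous instructions", 0.93, ["prompt_injection"], 4.2)
assert "Ignore" not in str(entry)  # the raw text never appears in the log
```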

Next steps