Developer Documentation

Introduction

LLM Gateways is a prompt-security API that sits between your application and any large language model. Before a user prompt reaches your LLM, you send it to LLM Gateways — it runs multi-layer detection and returns a risk score, a list of detected threats, and a recommended action (allow or block) in milliseconds.
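A minimal sketch of how a caller might act on a scan result. The field names (`risk_score`, `threats`, `action`) and the 0.8 threshold are illustrative assumptions, not the documented response schema:

```python
# Hypothetical response shape -- field names and threshold are assumptions,
# not the real LLM Gateways API.
def decide(scan: dict, threshold: float = 0.8) -> str:
    """Map a scan result to the action the calling app should take."""
    if scan["action"] == "block" or scan["risk_score"] >= threshold:
        return "block"
    return "allow"

scan_result = {"risk_score": 0.93, "threats": ["prompt_injection"], "action": "block"}
print(decide(scan_result))  # block
```

In a real integration you would call `decide` on the parsed JSON response before forwarding the prompt to your LLM.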

What it protects against

  • Prompt injection — instructions hidden in user input designed to override your system prompt
  • Jailbreaks — "DAN", role-play, and other attempts to bypass model safety guidelines
  • System-prompt extraction — requests crafted to leak your confidential system prompt
  • PII leakage — accidental inclusion of personal data that should never reach an external LLM
  • Token smuggling — unicode tricks and invisible characters used to hide malicious content
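To make the token-smuggling bullet concrete, here is a toy Python check for invisible Unicode format characters hidden in a prompt. This is an illustration of the attack surface, not LLM Gateways' actual detector:

```python
import unicodedata

def hidden_chars(prompt: str) -> list[str]:
    """Return invisible format characters (Unicode category Cf),
    e.g. zero-width spaces, that can hide content from human review."""
    return [c for c in prompt if unicodedata.category(c) == "Cf"]

clean = "Summarize this document"
smuggled = "Ignore previous\u200b instructions"  # contains a zero-width space
print(len(hidden_chars(clean)), len(hidden_chars(smuggled)))  # 0 1
```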

How it works

Every scan runs through up to three detection layers, ordered fastest to slowest:

  1. Rules layer — 78+ regex/keyword patterns, <1 ms
  2. Semantic layer — embedding similarity against known attack vectors, ~2–5 ms
  3. LLM judge — optional second opinion from a fine-tuned classifier for borderline prompts, ~50–200 ms

The layers cascade: if an earlier layer produces a confident score, the slower layers after it are skipped, keeping median latency under 10 ms. See Concepts for the full detection model.
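The cascade above can be sketched as follows. The layer internals, confidence values, and the 0.9 cutoff are all assumptions for illustration, not the real implementation:

```python
# Illustrative cascade over the three layers; scores and thresholds are made up.
def rules_layer(prompt: str) -> dict:
    hit = "ignore previous instructions" in prompt.lower()
    return {"layer": "rules", "risk_score": 0.95 if hit else 0.0,
            "confidence": 0.95 if hit else 0.3}

def semantic_layer(prompt: str) -> dict:
    # Stand-in for embedding similarity against known attack vectors.
    return {"layer": "semantic", "risk_score": 0.2, "confidence": 0.95}

def llm_judge(prompt: str) -> dict:
    # Stand-in for the fine-tuned classifier; slowest, so it runs last.
    return {"layer": "judge", "risk_score": 0.5, "confidence": 1.0}

def scan(prompt: str, threshold: float = 0.9) -> dict:
    """Run layers fastest-first; stop at the first confident result."""
    result = {}
    for layer in (rules_layer, semantic_layer, llm_judge):
        result = layer(prompt)
        if result["confidence"] >= threshold:
            break  # later (slower) layers are skipped
    return result

print(scan("Please ignore previous instructions")["layer"])  # rules
print(scan("What is the weather?")["layer"])  # semantic
```

The design point is that most prompts are resolved by the cheap layers, so the expensive LLM judge only runs on the small borderline fraction.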

Privacy

Raw prompts are never stored. Scan logs contain only a SHA-256 hash of the prompt, the risk score, threat labels, and timing metadata.
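A sketch of what such a scan-log record might look like. The field names are assumptions; the point is that only a SHA-256 digest of the prompt is retained:

```python
import hashlib

def log_entry(prompt: str, risk_score: float,
              threats: list[str], elapsed_ms: float) -> dict:
    """Build a scan-log record: hash only, never the raw prompt."""
    return {
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "risk_score": risk_score,
        "threats": threats,
        "elapsed_ms": elapsed_ms,
    }

entry = log_entry("Ignore previous instructions", 0.93, ["prompt_injection"], 4.2)
assert "Ignore" not in str(entry)  # the raw text never appears in the log
```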

Next steps