Guardrail

Guardrail is the live monitoring and enforcement layer for your deployed AI systems. It extends the SPLX Platform by allowing you to define, activate, and manage runtime guardrails, detect misuse or policy violations as they occur, and maintain security, safety, compliance, and trust in production.

Figure 1: Guardrail Overview Page

Features

| Feature | What it does |
| --- | --- |
| Policy Definitions | Create and manage rules for disallowed inputs or outputs of your AI system, such as content filters or policy constraints. |
| Detection & Filtering | Continuously monitors both inputs and outputs to detect issues such as system prompt leakage, jailbreak attempts, PII exposure, and other policy violations. Messages that break a defined policy are flagged in real time, so unsafe or non-compliant interactions are identified as they occur. |
| Dashboards & Metrics | Provides live dashboards and tables that capture both flagged and unflagged messages. These views include detailed metrics on guardrail activity, violations, and runtime events, giving you visibility into how your AI system behaves under protection. |
| Low Latency Enforcement | Optimized for performance, so checks run without noticeable delay for end users. |
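To make the idea of flagging a message against a policy concrete, here is a minimal Python sketch. It is a toy illustration only: the detectors, the `Verdict` type, the `check_message` function, and the violation names are all hypothetical and are not the checks the guardrail service actually runs.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified detectors for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
JAILBREAK_PHRASES = (
    "ignore previous instructions",
    "ignore all previous instructions",
)


@dataclass
class Verdict:
    flagged: bool
    violations: List[str] = field(default_factory=list)


def check_message(text: str, system_prompt: str = "") -> Verdict:
    """Return which of the toy policies a message violates."""
    violations = []
    lowered = text.lower()
    # PII check: an email address appears in the message.
    if EMAIL_RE.search(text):
        violations.append("pii_exposure")
    # Jailbreak check: a known override phrase appears in the message.
    if any(phrase in lowered for phrase in JAILBREAK_PHRASES):
        violations.append("jailbreak_attempt")
    # Leak check: the message reproduces the system prompt verbatim.
    if system_prompt and system_prompt.lower() in lowered:
        violations.append("system_prompt_leakage")
    return Verdict(flagged=bool(violations), violations=violations)
```

Calling `check_message("Contact me at jane@example.com")` returns a verdict with `flagged` set to `True` and a single `pii_exposure` violation. A production detector would need far more than pattern matching; the sketch only shows the input, policy, verdict shape.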

Getting Started

To use Guardrail:

  1. Create and connect a guardrail. Start by creating a guardrail in the SPLX Platform. The Platform acts as the control plane, where you define and manage policies; the guardrail service is the runtime enforcement layer that evaluates your AI system’s input and output messages. During guardrail creation you also connect to the guardrail service, and the Platform generates a Guardrail ID. You include this ID when sending messages to the service so that the correct policies are enforced.

  2. Define protection policy. In the Platform UI, specify the policy to enforce for both inputs and outputs, that is, which messages should be flagged. For example, you can prevent system-prompt leakage, block jailbreak attempts, and flag PII exposure.

  3. Monitor real-time results. In the UI, use the live dashboards and tables to observe how messages are evaluated: which messages were flagged or allowed, violation metrics, and runtime behavior.

  4. Review and iterate on policy. Track effectiveness and update your guardrail policy in response to new threat patterns or usage behavior.
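The runtime flow described in these steps (evaluate the input, call your AI system, evaluate the output, always passing the Guardrail ID) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions: `guarded_reply`, `CheckFn`, `stub_check`, and the returned dictionary shape are hypothetical names, the real guardrail service API is not shown here, and replacing a flagged message with a fixed reply is one possible handling choice rather than documented behavior.

```python
from typing import Callable, Dict

# Hypothetical shape of a guardrail check:
# (guardrail_id, direction, text) -> True if the message is flagged.
CheckFn = Callable[[str, str, str], bool]

BLOCKED_REPLY = "This request could not be completed."


def guarded_reply(
    guardrail_id: str,
    user_message: str,
    model: Callable[[str], str],
    check: CheckFn,
) -> Dict[str, object]:
    """Check the input, call the model, then check the output."""
    if check(guardrail_id, "input", user_message):
        return {"reply": BLOCKED_REPLY, "flagged": "input"}
    answer = model(user_message)
    if check(guardrail_id, "output", answer):
        return {"reply": BLOCKED_REPLY, "flagged": "output"}
    return {"reply": answer, "flagged": None}


def stub_check(guardrail_id: str, direction: str, text: str) -> bool:
    # Stand-in for the call to the guardrail service (assumption):
    # flags any message that mentions the word "secret".
    return "secret" in text.lower()


# Example with a trivial stand-in model that upper-cases its input.
result = guarded_reply("my-guardrail-id", "Hello", lambda m: m.upper(), stub_check)
```

Passing the check as a function keeps the sketch independent of any particular client: in a real integration, `stub_check` would be replaced by a call to the guardrail service using the Guardrail ID generated in step 1.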

Continue by creating a guardrail.
