Guardrail Overview

The Overview page provides a high-level summary of how a selected guardrail is performing in production. It brings together metrics, charts, and recent flagged messages so you can monitor policy effectiveness and system behavior at a glance.

Figure 1: Guardrail Overview Page

Time Frame & Granularity

All data shown on the Overview page is scoped to the time frame you select in the calendar at the top of the page. You can quickly choose relative ranges (e.g. last 15 minutes, last 24 hours, yesterday) or pick a custom range from the date selector.

In addition, you can adjust the granularity of the data:

  • 1 hour

  • 1 day

  • 1 week

  • 1 month

The selected granularity is applied across all charts, allowing you to zoom in for fine-grained analysis or zoom out for long-term trends.
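
To make the effect of granularity concrete, here is a minimal sketch of how message timestamps could be grouped into chart buckets. It is an illustration only, not the product's implementation: the function names `bucket_start` and `count_per_bucket` are hypothetical, and the choice of Monday as the start of a week is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_start(ts: datetime, granularity: str) -> datetime:
    """Return the start of the bucket that `ts` falls into (hypothetical helper)."""
    if granularity == "1 hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "1 day":
        return day
    if granularity == "1 week":
        # Assumption: weeks start on Monday (weekday() == 0).
        return day - timedelta(days=day.weekday())
    if granularity == "1 month":
        return day.replace(day=1)
    raise ValueError(f"unsupported granularity: {granularity!r}")


def count_per_bucket(timestamps, granularity):
    """Count how many timestamps land in each bucket."""
    return Counter(bucket_start(ts, granularity) for ts in timestamps)
```

A coarser granularity merges many points into one bucket, which smooths the chart; a finer one keeps short spikes visible.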

The refresh button in the top-right corner lets you fetch the latest results immediately. Metrics and charts are also refreshed automatically at regular intervals to keep the dashboard up to date.

Key Metrics

At the top of the page, you will see key statistics for the selected guardrail:

  • Number of messages - total number of input and output messages processed in the selected time frame.

  • Number of flagged messages - how many messages violated one or more guards.

  • Flagged percentage - percentage of total messages that were flagged.

  • P95 Latency - 95th percentile latency for guardrail evaluation, useful for tracking runtime performance.
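
The four statistics above are related by simple arithmetic. The sketch below shows one way they could be derived from a list of processed messages. The `Message` type and the `summarize` and `p95` functions are hypothetical, and the nearest-rank percentile method is an assumption; the dashboard may use a different method, such as interpolation.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical record; field names are assumptions for illustration.
    flagged: bool
    latency_ms: float


def p95(values):
    """95th percentile using the nearest-rank method (an assumed method)."""
    if not values:
        return 0.0
    ordered = sorted(values)
    # Ceiling of 0.95 * n, computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def summarize(messages):
    """Compute the four key metrics for the messages in a time frame."""
    total = len(messages)
    flagged = sum(1 for m in messages if m.flagged)
    return {
        "messages": total,
        "flagged": flagged,
        "flagged_pct": 100.0 * flagged / total if total else 0.0,
        "p95_latency_ms": p95([m.latency_ms for m in messages]),
    }
```

Unlike an average, P95 is not dominated by the bulk of fast requests, so it shows whether the slowest 5% of evaluations are getting slower.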

Charts

The Guardrail Overview includes several charts, all filtered by your selected time frame and granularity:

  • Flagged / Safe messages - trend of flagged versus safe messages over the selected time frame.

  • Threats flagged - breakdown of violations by active guard type (e.g. Context Leakage, Jailbreak, Off Topic).

  • Flagged messages direction - shows whether policy violations are happening more often on user inputs or on system outputs. This helps you understand if risks are primarily coming from what users are submitting (e.g. jailbreak attempts) or from what the AI is generating.
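
The two breakdown charts are tallies over the flagged messages. As a rough sketch, assuming each flagged message is a dictionary with a `direction` field and a `threats` list (both field names are hypothetical), the counts could be computed like this:

```python
from collections import Counter


def threats_breakdown(flagged_messages):
    """Count violations per guard; a message can trigger several guards."""
    return Counter(t for m in flagged_messages for t in m["threats"])


def direction_breakdown(flagged_messages):
    """Count flagged messages per direction ("input" or "output")."""
    return Counter(m["direction"] for m in flagged_messages)


# Example data, invented for illustration only.
flagged_messages = [
    {"direction": "input", "threats": ["Jailbreak"]},
    {"direction": "input", "threats": ["Jailbreak", "Off Topic"]},
    {"direction": "output", "threats": ["Context Leakage"]},
]

print(threats_breakdown(flagged_messages))
print(direction_breakdown(flagged_messages))
```

Because one message can trigger more than one guard, the per-guard totals can add up to more than the number of flagged messages.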

Last Flagged Messages

The Last flagged messages table lists the most recent violations with details including:

  • Timestamp - when the message was processed.

  • Direction - whether the violation occurred on input or output.

  • Request type - sync or async.

  • Latency - how long the guardrail took to evaluate the message.

  • Threats flagged - which guard(s) caused the message to be flagged.

Click See all to view the full list of flagged messages.

Usage

Use the Overview to:

  • Monitor the effectiveness of the active policy.

  • Identify which guards are most frequently triggered.

  • Track latency to ensure runtime protection does not degrade user experience.

Keep in mind:

  • For detailed inspection of all traffic, use the Messages page.

  • All metrics and charts reflect only the selected time frame and granularity.

  • Combine the Guardrail Overview with Policy and Playground to continuously refine your protection.
