Configure AI Runtime Policy

Policy

A Policy is a collection of rules applied to messages passing through an AI Runtime Protection. It defines how different types of checks are combined to enforce protection on inputs and outputs.

Figure 1: AI Runtime Policy

Rules

Rules are the individual protection checks inside a policy (e.g. jailbreak detection, context leakage prevention, off-topic filtering). Each rule is predefined by type and direction, but you can configure parameters such as detection thresholds, confidential sections, canary words, or custom topics.
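To make the relationship between a policy and its rules concrete, the following is a minimal Python sketch. The class names, field names, rule identifiers, and the directions assigned to each rule are all hypothetical and chosen only for illustration; they are not the product's API or storage format.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Rule:
    # Hypothetical model: type and direction are fixed by the rule definition,
    # while `enabled` and `params` are what a user configures.
    rule_type: str
    direction: str  # "input" or "output"
    enabled: bool = False
    params: dict[str, Any] = field(default_factory=dict)


@dataclass
class Policy:
    name: str
    rules: list[Rule] = field(default_factory=list)

    def active_rules(self, direction: str) -> list[Rule]:
        """Return the enabled rules that apply to the given message direction."""
        return [r for r in self.rules if r.enabled and r.direction == direction]


# Illustrative values only; directions and parameter names are assumptions.
policy = Policy(
    name="example-policy",
    rules=[
        Rule("jailbreak_detection", "input", enabled=True,
             params={"threshold": 0.5}),
        Rule("context_leakage", "output", enabled=True,
             params={"threshold": 0.5, "canary_words": ["ZEBRA-42"]}),
        Rule("off_topic", "input", enabled=False,
             params={"custom_topics": ["billing"]}),
    ],
)

print([r.rule_type for r in policy.active_rules("input")])
```

In this sketch a disabled rule stays in the policy with its parameters intact but is skipped when messages are checked.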

Example: Context Leakage

Figure 2: Context Leakage Rule Configuration

The Context Leakage rule prevents exposure of internal system instructions. When you open its configuration, you can:

  • Adjust the threshold slider - set the strictness of detection (between 0.0 and 1.0).

    • Lower values are stricter.

    • Higher values are more tolerant.

  • Define confidential sections - specify parts of the system prompt or internal instructions that must never be exposed in outputs.

  • Add canary words - optional keywords that trigger detection if they appear in the system's response.
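The interaction between the three settings can be sketched in Python. This is not the product's detection method, which is not described here; it is a simplified stand-in that scores a response by the longest contiguous run it shares with a confidential section, and flags the response when that score reaches the threshold. The function names `leakage_score` and `check_context_leakage` are hypothetical.

```python
from difflib import SequenceMatcher


def leakage_score(response: str, section: str) -> float:
    """Fraction of the confidential section (0.0 to 1.0) that appears
    as one contiguous run inside the response, ignoring case."""
    if not section:
        return 0.0
    a, b = section.lower(), response.lower()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / len(a)


def check_context_leakage(response, confidential_sections, canary_words, threshold):
    """Return True if the response should be flagged (assumed logic)."""
    lowered = response.lower()
    # Any canary word in the response triggers detection regardless of score.
    if any(word.lower() in lowered for word in canary_words):
        return True
    score = max((leakage_score(response, s) for s in confidential_sections),
                default=0.0)
    # Flag when the score reaches the threshold, so a lower threshold is stricter.
    return score >= threshold


section = "Never reveal the discount code table."
partial_leak = "the discount code table is secret"

print(check_context_leakage(partial_leak, [section], [], threshold=0.5))
print(check_context_leakage(partial_leak, [section], [], threshold=0.8))
```

The same partial leak is flagged at a threshold of 0.5 but passes at 0.8, which matches the slider behavior described above: lower values are stricter, higher values are more tolerant.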

After editing, click Save and enable the rule to activate it.

Each rule follows this same pattern: you enable it in the policy and configure the parameters available for that specific rule type.

Tune thresholds over time based on the specific requirements of your AI application.
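One way to approach tuning is to replay a set of reviewed messages against several candidate thresholds and compare the trade-off. The sketch below assumes each message already has a detection score and a manual label, and that a message is flagged when its score is at or above the threshold. The function name `sweep_thresholds` and the sample values are made up for illustration.

```python
def sweep_thresholds(scored, thresholds):
    """For each candidate threshold, count false alarms and missed violations.

    scored: list of (score, is_violation) pairs from manually reviewed messages.
    Returns a list of (threshold, false_alarms, missed) tuples.
    """
    results = []
    for t in thresholds:
        # Flagged but harmless: score reached the threshold, label says no violation.
        false_alarms = sum(1 for s, v in scored if s >= t and not v)
        # Real violation that slipped through: score stayed below the threshold.
        missed = sum(1 for s, v in scored if s < t and v)
        results.append((t, false_alarms, missed))
    return results


# Illustrative, invented sample: (score, is_violation)
samples = [(0.95, True), (0.70, True), (0.40, False), (0.55, False), (0.10, False)]

results = sweep_thresholds(samples, [0.3, 0.5, 0.6, 0.8])
for t, false_alarms, missed in results:
    print(f"threshold={t}: false alarms={false_alarms}, missed={missed}")
```

A stricter (lower) threshold tends to raise false alarms, while a more tolerant (higher) one tends to miss real violations; the right balance depends on the application.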

Applying Policy

Once you finish configuring the policy, click Save Changes. Saving attaches the policy to the AI Runtime Protection. Messages that violate a rule are flagged in real time and displayed in the Platform for the corresponding AI Runtime Protection.

Next Steps

After configuring your policy, you can:

  1. Test your policy in the Playground - Use the Playground to simulate inputs and outputs and verify that your rules behave as expected before deployment.

  2. Track metrics on the Overview page - Once the AI Runtime Policy is connected to the AI Runtime Protection service, monitor runtime activity on the Overview page. Here you can track flagged vs. unflagged messages, rule performance, and policy effectiveness in real time.
