Prompt Hardening
Once you identify potential risks in your target using probes, the SplxAI Platform allows you to harden the target's system prompt to strengthen its security.
You can learn more about the importance and benefits of prompt hardening, along with use case comparisons to guardrails and our benchmark, in our blog post System Prompt Hardening: The Backbone of Automated AI Security.
To begin prompt hardening, navigate to the Prompt Hardening page in the Remediation section of the main navigation bar, and click the Harden System Prompt button in the top-right corner.
Start by selecting the probes you want to use to harden your system prompt. You can think of these as the vulnerabilities you wish to protect against. The prompt hardening tool then uses the results of your probe runs to strengthen your system prompt against the identified vulnerabilities.
The table displays the probes, their categories, the last probe run on the target, and the percentage of failed test cases. This percentage serves as an indicator of where your target is most vulnerable and where there is the greatest opportunity for improvement through hardening. Once all relevant probes are selected (at least one is required), click Continue.
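To make the role of the failure percentage concrete, here is a minimal sketch of how probes could be ranked by their share of failed test cases. It is not the platform's implementation or API: the `ProbeRun` class, the `rank_probes` helper, and the probe names and counts are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProbeRun:
    """Hypothetical record of one probe's last run (not a platform type)."""
    name: str
    category: str
    total_cases: int
    failed_cases: int

    @property
    def failed_pct(self) -> float:
        # Percentage of failed test cases; 0.0 if the probe has no test cases.
        if self.total_cases == 0:
            return 0.0
        return 100.0 * self.failed_cases / self.total_cases


def rank_probes(runs):
    """Return probe runs sorted from most to least vulnerable."""
    return sorted(runs, key=lambda run: run.failed_pct, reverse=True)


# Made-up numbers, purely for illustration.
runs = [
    ProbeRun("Probe A", "Category 1", total_cases=200, failed_cases=30),
    ProbeRun("Probe B", "Category 2", total_cases=50, failed_cases=20),
    ProbeRun("Probe C", "Category 1", total_cases=120, failed_cases=0),
]

ranked = rank_probes(runs)
for run in ranked:
    print(f"{run.name} ({run.category}): {run.failed_pct:.1f}% failed")
```

Under these assumed numbers, the probe with the highest failure percentage comes first, which mirrors how you might prioritize probes when deciding what to select for hardening.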
In the next step, provide your target's current system prompt and click Generate hardened system prompt to start the hardening process.
Depending on the number of selected probes and the length of the system prompt, prompt hardening may take a few minutes. Feel free to continue using other features of the app in the meantime. The hardening process runs in the background and will not be interrupted.
The latest prompt hardening will be displayed on the Prompt Hardening page. The header provides information about the generation date and time, the probes selected for hardening, the progress of the hardening, and the remediation status.
Once you have applied the hardened prompt to your target's system prompt, you can flag the prompt hardening as Applied.
This action is not reversible.
Below, there are three sections:
Current System prompt - displaying the system prompt before hardening.
Generated system prompt - showing the generated hardened system prompt with options to:
Highlight the differences,
Expand the prompt for better readability,
Copy the system prompt.
Actions - lists all prompt hardening actions performed on your system prompt by our tool.
Example: Stressing that competitor companies should neither be mentioned nor recommended.
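If you want to review the changes outside the platform after copying the generated prompt, a comparison similar in spirit to the Highlight the differences option can be sketched with Python's standard `difflib` module. This is only an illustration under stated assumptions: the `prompt_diff` helper and both example prompts are made up, and the platform's own diff view may work differently.

```python
import difflib


def prompt_diff(current: str, hardened: str) -> list:
    """Return unified-diff lines between two system prompts."""
    return list(
        difflib.unified_diff(
            current.splitlines(),
            hardened.splitlines(),
            fromfile="current_system_prompt",
            tofile="generated_system_prompt",
            lineterm="",
        )
    )


# Hypothetical prompts, for illustration only.
current_prompt = (
    "You are a helpful support assistant.\n"
    "Answer questions about our products."
)
hardened_prompt = (
    "You are a helpful support assistant.\n"
    "Answer questions about our products.\n"
    "Never mention or recommend competitor companies."
)

diff_lines = prompt_diff(current_prompt, hardened_prompt)

# Lines added by hardening start with "+", excluding the "+++" file header.
added = [
    line[1:]
    for line in diff_lines
    if line.startswith("+") and not line.startswith("+++")
]

print("\n".join(diff_lines))
```

Here `added` collects only the lines introduced by the hardened prompt, which makes it easy to check each new instruction against the listed actions before applying the prompt to your target.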
The second tab on the prompt hardening page is History, which features a table displaying all previous prompt hardenings. The table includes information such as the generation date and time, selected probes, progress (in progress, generated, ...), and status (applied, not applied, ...).