Shields are prompt-based blocking rules designed specifically for Direct Prompt Injection detection. They allow you to analyze any text content before it reaches your LLM, protecting against malicious instructions directly inserted into user input.Documentation Index
Fetch the complete documentation index at: https://docs.hipocap.com/llms.txt
Use this file to discover all available pages before exploring further.
What are Shields?
Shields analyze text content and decide whether to block or allow it based on custom rules you define. Unlike the multi-stage defense pipeline (which analyzes function calls), Shields work on raw text input - perfect for protecting direct user input. Use Shields when:- You want to analyze user input before it reaches your LLM
- You need fast, real-time protection against direct prompt injection
- You want custom blocking rules for specific attack patterns
Creating a Shield
Step 1: Access the Shields Section
- Navigate to your project in the HipoCap dashboard
- Go to the Shields section:
- Navigate to
/project/[your-project-id]/shields - Or click “Shields” in the sidebar under “Monitoring”
- Navigate to
Step 2: Create a New Shield
- Click “Create Shield” to open the creation form
-
Fill in the shield configuration:
- Shield Key: Unique identifier (e.g.,
jailbreak,data-extraction,system-prompt-leak) - Name: Human-readable name (e.g., “Jailbreak Protection”)
- Description: Optional description of the shield’s purpose
- Prompt Description: Description of the type of prompts this shield should analyze
- What to Block: Detailed description of content patterns to block
- What Not to Block: Exceptions or content that should be allowed
- Shield Key: Unique identifier (e.g.,
- Click “Save” - the shield will be active immediately
Example Shield Configuration
Shield Key:jailbreak
Name: Jailbreak Protection
Description: Protects against attempts to override system instructions
Prompt Description: User prompts that attempt to manipulate the AI system
What to Block:
- Instructions to ignore previous prompts
- Attempts to reveal system prompts
- Commands to override safety guidelines
- Requests to act as a different system
- Legitimate questions about how the system works
- Requests for information (not system manipulation)
Using Shields in Code
Once you’ve created a shield, use it in your code to analyze content:Shield Response
Theshield() method returns a simple response:
Shield Features
- Analyze any text input - Not limited to function calls
- Custom blocking rules - Define what to block per shield
- Fast decision-making - Real-time protection
- Optional reasoning - Get explanations for blocked content
- Active/Inactive toggle - Enable or disable shields as needed
Common Use Cases
1. Protecting User Input
2. Multiple Shields
You can use multiple shields for different protection layers:3. Getting Detailed Reasons
Best Practices
- Create Specific Shields - Create separate shields for different attack vectors (jailbreak, data extraction, etc.)
- Test Your Shields - Test shields with known attack patterns to ensure they work correctly
- Use for Direct Input - Use Shields for direct user input; use
analyze()for function call protection - Combine with Function Analysis - Use both Shields and the multi-stage pipeline for comprehensive protection
- Review Blocked Content - Regularly review blocked content to tune your shield rules
Shields vs. Function Analysis
Use Shields when:- Analyzing direct user input before it reaches your LLM
- You need fast, simple blocking rules
- You want to protect against direct prompt injection
analyze() when:
- Analyzing function calls and their results
- You need multi-stage analysis (Input → LLM → Quarantine)
- You want to detect indirect prompt injection in function results
Next Steps
- Prompt Injection Protection - Learn about multi-stage analysis for function calls
- Keyword Detection - Configure keyword detection
- Threat Categories - Understand what HipoCap protects against
