What are Shields?
Shields analyze text content and decide whether to block or allow it based on custom rules you define. Unlike the multi-stage defense pipeline (which analyzes function calls), Shields work on raw text input - perfect for protecting direct user input. Use Shields when:- You want to analyze user input before it reaches your LLM
- You need fast, real-time protection against direct prompt injection
- You want custom blocking rules for specific attack patterns
Creating a Shield
Step 1: Access the Shields Section
- Navigate to your project in the HipoCap dashboard
- Go to the Shields section:
- Navigate to
/project/[your-project-id]/shields - Or click “Shields” in the sidebar under “Monitoring”
- Navigate to
Step 2: Create a New Shield
- Click “Create Shield” to open the creation form
-
Fill in the shield configuration:
- Shield Key: Unique identifier (e.g.,
jailbreak,data-extraction,system-prompt-leak) - Name: Human-readable name (e.g., “Jailbreak Protection”)
- Description: Optional description of the shield’s purpose
- Prompt Description: Description of the type of prompts this shield should analyze
- What to Block: Detailed description of content patterns to block
- What Not to Block: Exceptions or content that should be allowed
- Shield Key: Unique identifier (e.g.,
- Click “Save” - the shield will be active immediately
Example Shield Configuration
Shield Key:jailbreak
Name: Jailbreak Protection
Description: Protects against attempts to override system instructions
Prompt Description: User prompts that attempt to manipulate the AI system
What to Block:
- Instructions to ignore previous prompts
- Attempts to reveal system prompts
- Commands to override safety guidelines
- Requests to act as a different system
- Legitimate questions about how the system works
- Requests for information (not system manipulation)
Using Shields in Code
Once you’ve created a shield, use it in your code to analyze content:Shield Response
Theshield() method returns a simple response:
Shield Features
- Analyze any text input - Not limited to function calls
- Custom blocking rules - Define what to block per shield
- Fast decision-making - Real-time protection
- Optional reasoning - Get explanations for blocked content
- Active/Inactive toggle - Enable or disable shields as needed
Common Use Cases
1. Protecting User Input
2. Multiple Shields
You can use multiple shields for different protection layers:3. Getting Detailed Reasons
Best Practices
- Create Specific Shields - Create separate shields for different attack vectors (jailbreak, data extraction, etc.)
- Test Your Shields - Test shields with known attack patterns to ensure they work correctly
- Use for Direct Input - Use Shields for direct user input; use
analyze()for function call protection - Combine with Function Analysis - Use both Shields and the multi-stage pipeline for comprehensive protection
- Review Blocked Content - Regularly review blocked content to tune your shield rules
Shields vs. Function Analysis
Use Shields when:- Analyzing direct user input before it reaches your LLM
- You need fast, simple blocking rules
- You want to protect against direct prompt injection
analyze() when:
- Analyzing function calls and their results
- You need multi-stage analysis (Input → LLM → Quarantine)
- You want to detect indirect prompt injection in function results
Next Steps
- Prompt Injection Protection - Learn about multi-stage analysis for function calls
- Keyword Detection - Configure keyword detection
- Threat Categories - Understand what HipoCap protects against
