Skip to main content
Shields are prompt-based blocking rules designed specifically for Direct Prompt Injection detection. They allow you to analyze any text content before it reaches your LLM, protecting against malicious instructions directly inserted into user input.

What are Shields?

Shields analyze text content and decide whether to block or allow it based on custom rules you define. Unlike the multi-stage defense pipeline (which analyzes function calls), Shields work on raw text input - perfect for protecting direct user input. Use Shields when:
  • You want to analyze user input before it reaches your LLM
  • You need fast, real-time protection against direct prompt injection
  • You want custom blocking rules for specific attack patterns

Creating a Shield

Step 1: Access the Shields Section

  1. Navigate to your project in the HipoCap dashboard
  2. Go to the Shields section:
    • Navigate to /project/[your-project-id]/shields
    • Or click “Shields” in the sidebar under “Monitoring”

Step 2: Create a New Shield

  1. Click “Create Shield” to open the creation form
  2. Fill in the shield configuration:
    • Shield Key: Unique identifier (e.g., jailbreak, data-extraction, system-prompt-leak)
    • Name: Human-readable name (e.g., “Jailbreak Protection”)
    • Description: Optional description of the shield’s purpose
    • Prompt Description: Description of the type of prompts this shield should analyze
    • What to Block: Detailed description of content patterns to block
    • What Not to Block: Exceptions or content that should be allowed
  3. Click “Save” - the shield will be active immediately

Example Shield Configuration

Shield Key: jailbreak Name: Jailbreak Protection Description: Protects against attempts to override system instructions Prompt Description: User prompts that attempt to manipulate the AI system What to Block:
  • Instructions to ignore previous prompts
  • Attempts to reveal system prompts
  • Commands to override safety guidelines
  • Requests to act as a different system
What Not to Block:
  • Legitimate questions about how the system works
  • Requests for information (not system manipulation)

Using Shields in Code

Once you’ve created a shield, use it in your code to analyze content:
from hipocap import Hipocap

# Initialize HipoCap (from quickstart guide)
client = Hipocap.initialize(
    project_api_key="your-api-key-here",
    base_url="http://localhost",
    http_port=8000,
    grpc_port=8001,
    hipocap_base_url="http://localhost:8006",
    hipocap_user_id="your-user-id-here"
)

# Analyze content with a shield
content = input("Enter content to analyze: ")
result = client.shield(
    shield_key="jailbreak",
    content=content,
    require_reason=True  # Optional: get explanation for decision
)

# Check the decision
if result["decision"] == "BLOCK":
    print(f"Content blocked: {result.get('reason')}")
else:
    print("Content allowed")
    # Safe to send to your LLM

Shield Response

The shield() method returns a simple response:
{
    "decision": str,      # "BLOCK" or "ALLOW"
    "reason": str         # Optional explanation if require_reason=True
}

Shield Features

  • Analyze any text input - Not limited to function calls
  • Custom blocking rules - Define what to block per shield
  • Fast decision-making - Real-time protection
  • Optional reasoning - Get explanations for blocked content
  • Active/Inactive toggle - Enable or disable shields as needed

Common Use Cases

1. Protecting User Input

# Before sending user input to your LLM
user_input = request.form.get("message")

result = client.shield(
    shield_key="jailbreak",
    content=user_input
)

if result["decision"] == "BLOCK":
    return {"error": "Invalid input detected"}
    
# Safe to proceed
response = llm_client.chat.completions.create(
    messages=[{"role": "user", "content": user_input}]
)

2. Multiple Shields

You can use multiple shields for different protection layers:
# Check against multiple shields
jailbreak_result = client.shield(shield_key="jailbreak", content=content)
data_extraction_result = client.shield(shield_key="data-extraction", content=content)

# Block if any shield blocks
if jailbreak_result["decision"] == "BLOCK" or data_extraction_result["decision"] == "BLOCK":
    return {"error": "Content blocked"}

3. Getting Detailed Reasons

result = client.shield(
    shield_key="jailbreak",
    content=content,
    require_reason=True  # Get explanation
)

if result["decision"] == "BLOCK":
    # Log the reason for review
    logger.warning(f"Blocked: {result['reason']}")

Best Practices

  1. Create Specific Shields - Create separate shields for different attack vectors (jailbreak, data extraction, etc.)
  2. Test Your Shields - Test shields with known attack patterns to ensure they work correctly
  3. Use for Direct Input - Use Shields for direct user input; use analyze() for function call protection
  4. Combine with Function Analysis - Use both Shields and the multi-stage pipeline for comprehensive protection
  5. Review Blocked Content - Regularly review blocked content to tune your shield rules

Shields vs. Function Analysis

Use Shields when:
  • Analyzing direct user input before it reaches your LLM
  • You need fast, simple blocking rules
  • You want to protect against direct prompt injection
Use analyze() when:
  • Analyzing function calls and their results
  • You need multi-stage analysis (Input → LLM → Quarantine)
  • You want to detect indirect prompt injection in function results

Next Steps