What is Prompt Injection?
Prompt injection is an attack where malicious instructions are embedded in content that an LLM processes. This can cause the LLM to:
- Execute unauthorized function calls
- Leak sensitive information
- Bypass safety controls
- Perform unintended actions
Multi-Stage Analysis Pipeline
HipoCap uses three stages of analysis to detect prompt injection. Each stage catches different types of attacks, and you can enable them based on your security needs.
Stage 1: Input Analysis (Prompt Guard)
Purpose: Detect malicious patterns in function inputs before execution.
How it works:
- Uses specialized models to analyze function arguments and user queries
- Fast, rule-based detection with low latency
- Checks for suspicious patterns and keywords
What it detects:
- Direct injection attempts in function inputs
- Suspicious patterns in user queries
- Malicious instructions embedded in arguments
Stage 2: LLM Analysis
Purpose: Analyze function results for threat indicators and attack patterns.
How it works:
- Uses structured LLM analysis with threat detection
- Analyzes the actual content returned by functions
- Detects sophisticated attack patterns
What it detects:
- Threat indicators (S1-S14 categories)
- Technical indicators (instruction_injection, contextual_blending, function_call_attempt)
- Attack patterns and function call attempts embedded in content
Stage 3: Quarantine Analysis
Purpose: Simulate infection by sending content to a quarantine LLM, then analyze the output.
How it works:
- Sends function result to quarantine LLM (simulates what would happen if malicious content reached your main LLM)
- Analyzes the quarantine LLM’s output for hidden instructions
What it detects:
- Hidden instructions that only trigger when processed by an LLM
- Contextual blending attacks
- Function call attempts that emerge after LLM processing
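The three stages above can be pictured as a sequential pipeline that stops at the first stage to flag the content. The following is a minimal sketch under stated assumptions, not HipoCap's implementation: the analyzer functions are toy stand-ins, and the names `run_pipeline`, `PipelineResult`, and the `0.5` threshold are hypothetical. Only `final_decision`, `final_score`, and `blocked_at` are field names taken from this page.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical analyzer signature: (function_input, function_result) -> risk score in [0.0, 1.0].
Analyzer = Callable[[str, str], float]


@dataclass
class PipelineResult:
    final_decision: str        # "ALLOWED" or "BLOCKED"
    final_score: float         # highest score among the stages that ran
    blocked_at: Optional[str]  # name of the stage that blocked, if any


def run_pipeline(function_input: str, function_result: str,
                 stages: list[tuple[str, Analyzer]],
                 threshold: float = 0.5) -> PipelineResult:
    """Run the enabled stages in order; stop at the first score >= threshold."""
    highest = 0.0
    for name, analyzer in stages:
        score = analyzer(function_input, function_result)
        highest = max(highest, score)
        if score >= threshold:
            return PipelineResult("BLOCKED", highest, name)
    return PipelineResult("ALLOWED", highest, None)


# Toy stand-ins for the real analyzers (assumptions for illustration only).
def input_analysis(function_input: str, function_result: str) -> float:
    # Stage 1 looks at the function input.
    return 0.9 if "ignore previous instructions" in function_input.lower() else 0.0


def llm_analysis(function_input: str, function_result: str) -> float:
    # Stage 2 looks at the content returned by the function.
    return 0.8 if "send email" in function_result.lower() else 0.1


def quarantine_analysis(function_input: str, function_result: str) -> float:
    # Stage 3 would inspect a quarantine LLM's output; stubbed out here.
    return 0.0


ALL_STAGES = [
    ("input_analysis", input_analysis),
    ("llm_analysis", llm_analysis),
    ("quarantine_analysis", quarantine_analysis),
]
```

Passing a shorter `stages` list corresponds to enabling only some stages, which trades coverage for latency.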
Attack Vectors Protected
1. Instruction Injection
Direct commands to override system behavior. Example:
2. Contextual Blending
Malicious instructions hidden in legitimate content. Example:
3. Function Call Attempts
Attempts to trigger unauthorized function calls. Example:
4. Hidden Instructions
Instructions encoded or obfuscated in content. Example:
Analysis Modes
Quick Analysis
Faster analysis with simplified output:
- final_decision - “ALLOWED” or “BLOCKED”
- final_score - Risk score (0.0-1.0)
- safe_to_use - Boolean indicating if safe
- blocked_at - Stage where blocking occurred (if any)
- reason - Reason for decision
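As an illustration of how calling code might consume these fields, here is a small sketch. The field names come from the list above; the Python types, the `QuickResult` and `summarize` names, and the sample values are assumptions, since this page does not specify the response format.

```python
from typing import Optional, TypedDict


class QuickResult(TypedDict):
    final_decision: str        # "ALLOWED" or "BLOCKED"
    final_score: float         # risk score, 0.0-1.0
    safe_to_use: bool
    blocked_at: Optional[str]  # stage where blocking occurred, if any (type assumed)
    reason: str


def summarize(result: QuickResult) -> str:
    """Turn a quick-analysis result into a one-line log message."""
    if result["safe_to_use"]:
        return f"allowed (score {result['final_score']:.2f})"
    stage = result["blocked_at"] or "unknown stage"
    return f"blocked at {stage}: {result['reason']}"


# Hypothetical sample value, not real HipoCap output.
example: QuickResult = {
    "final_decision": "BLOCKED",
    "final_score": 0.87,
    "safe_to_use": False,
    "blocked_at": "llm_analysis",
    "reason": "function call attempt detected",
}
```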
Full Analysis
Comprehensive analysis with detailed threat information:
- threat_indicators - Complete S1-S14 breakdown
- detected_patterns - Detailed pattern analysis
- function_call_attempts - Complete function call detection
- policy_violations - Policy rule violations
- severity - Detailed severity assessment
Function Call Detection
HipoCap specifically detects function call attempts embedded in content.
Detected patterns:
- Direct commands: “search the web”, “send email”, “execute command”
- Polite requests: “please search”, “can you search”, “would you search”
- Embedded instructions: “search for confidential information”, “look up this data”
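To make the three pattern families concrete, the sketch below matches the example phrases listed above with regular expressions. It is a simplified illustration only: the pattern set and the name `find_function_call_attempts` are hypothetical, and HipoCap's actual detection is presumably more sophisticated than keyword matching.

```python
import re

# Illustrative patterns built from the example phrases above (not an exhaustive set).
ACTIONS = r"(?:search|send|execute|look up)"
PATTERNS = {
    "polite_request": re.compile(
        rf"\b(?:please|can you|would you)\s+{ACTIONS}\b", re.IGNORECASE),
    "direct_command": re.compile(
        r"\b(?:search the web|send (?:an? )?email|execute command)\b", re.IGNORECASE),
    "embedded_instruction": re.compile(
        r"\b(?:search for|look up)\s+\w+", re.IGNORECASE),
}


def find_function_call_attempts(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs for every match in text."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits
```

A phrase can fall into more than one family: "Please search the web" matches both the polite-request and the direct-command patterns, so callers should treat the result as a list of signals rather than a single label.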
Decision Making
Based on the analysis, HipoCap makes one of two decisions (returned as final_decision):
ALLOWED
- No threats detected
- All policy rules passed
- Safe to execute
- safe_to_use: true
BLOCKED
- Threat detected (S1-S14 category)
- Policy violation
- Function call attempt detected
- High severity risk
- RBAC permission denied
- Function chaining violation
- safe_to_use: false
- blocked_at indicates which stage blocked it
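The two outcomes can be read as a simple rule: any single blocking condition yields BLOCKED, and ALLOWED requires that none apply. The sketch below encodes that reading. The `Findings` class and its attributes are hypothetical containers for the conditions listed above; only the returned keys `final_decision`, `safe_to_use`, `blocked_at`, and `reason` are names used on this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Findings:
    # Hypothetical container for the signals listed under ALLOWED and BLOCKED.
    threat_categories: list[str] = field(default_factory=list)  # e.g. ["S1"]
    policy_violations: list[str] = field(default_factory=list)
    function_call_attempts: list[str] = field(default_factory=list)
    high_severity: bool = False
    rbac_allowed: bool = True
    chaining_allowed: bool = True
    stage: Optional[str] = None  # stage that produced the findings


def decide(f: Findings) -> dict:
    """Block if any listed condition holds; otherwise allow."""
    reasons = []
    if f.threat_categories:
        reasons.append("threat detected: " + ", ".join(f.threat_categories))
    if f.policy_violations:
        reasons.append("policy violation")
    if f.function_call_attempts:
        reasons.append("function call attempt detected")
    if f.high_severity:
        reasons.append("high severity risk")
    if not f.rbac_allowed:
        reasons.append("RBAC permission denied")
    if not f.chaining_allowed:
        reasons.append("function chaining violation")

    blocked = bool(reasons)
    return {
        "final_decision": "BLOCKED" if blocked else "ALLOWED",
        "safe_to_use": not blocked,
        "blocked_at": f.stage if blocked else None,
        "reason": "; ".join(reasons) if reasons else "no threats detected",
    }
```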
Complete Example
Here’s a complete example showing all three stages:
Best Practices
- Enable All Stages for Critical Functions - Use all three stages for sensitive operations
- Use Quick Mode for Low Latency - Enable quick analysis when speed is critical
- Configure Policies - Set up governance policies to define blocking rules
- Monitor and Review - Regularly review blocked attempts to tune policies
- Combine with RBAC - Use role-based access control alongside analysis
Next Steps
- Threat Categories - Detailed S1-S14 reference
- Setting up the Shield - Configuration guide
- Keyword Detection - Configure keyword detection
