## Quick Reference
| Category | Name | Description |
|---|---|---|
| S1 | Violent Crimes | Content promoting or describing violent acts |
| S2 | Non-Violent Crimes | Content promoting illegal activities (fraud, theft, etc.) |
| S3 | Sex-Related Crimes | Content related to sexual crimes or exploitation |
| S4 | Child Sexual Exploitation | Content involving minors in sexual contexts |
| S5 | Defamation | Content that defames or harms reputation |
| S6 | Specialized Advice | Dangerous specialized advice (medical, legal, financial) |
| S7 | Privacy | Content violating privacy or containing personal information |
| S8 | Intellectual Property | Content violating copyright or intellectual property |
| S9 | Indiscriminate Weapons | Content about weapons of mass destruction |
| S10 | Hate | Content promoting hate speech or discrimination |
| S11 | Suicide & Self-Harm | Content promoting self-harm or suicide |
| S12 | Sexual Content | Inappropriate sexual content |
| S13 | Elections | Content manipulating or interfering with elections |
| S14 | Code Interpreter Abuse | Attempts to abuse code execution capabilities |
## Detailed Categories

### S1: Violent Crimes

Content promoting or describing violent acts. Examples:

- Instructions for violence
- Threats of harm
- Physical assault descriptions
- Terrorism-related content
### S2: Non-Violent Crimes

Content promoting illegal activities that don’t involve violence. Examples:

- Fraud schemes
- Theft instructions
- Hacking guides
- Identity theft
- Financial crimes
### S3: Sex-Related Crimes

Content related to sexual crimes or exploitation. Examples:

- Sexual assault content
- Non-consensual sexual content
- Sexual exploitation
### S4: Child Sexual Exploitation

Content involving minors in sexual contexts. Examples:

- Child exploitation material
- Grooming content
- Any sexual content involving minors
### S5: Defamation

Content that defames or harms reputation. Examples:

- False accusations
- Character assassination
- Libelous content
### S6: Specialized Advice

Content providing dangerous specialized advice. Examples:

- Medical advice (diagnosis, treatment)
- Legal advice (specific legal guidance)
- Financial advice (investment recommendations)
### S7: Privacy

Content violating privacy or containing sensitive information. Examples:

- Sensitive keyword exposure
- Privacy violations
- Unauthorized data sharing
- PII exposure
### S8: Intellectual Property

Content violating copyright or intellectual property. Examples:

- Copyright infringement
- Trademark violations
- Patent violations
- Plagiarism
### S9: Indiscriminate Weapons

Content about weapons of mass destruction. Examples:

- Nuclear weapons
- Biological weapons
- Chemical weapons
- Instructions for creating WMDs
### S10: Hate

Content promoting hate speech or discrimination. Examples:

- Racism
- Sexism
- Religious discrimination
- Homophobia
- Transphobia
### S11: Suicide & Self-Harm

Content promoting self-harm or suicide. Examples:

- Suicide instructions
- Self-harm encouragement
- Methods for self-harm
### S12: Sexual Content

Inappropriate sexual content. Examples:

- Explicit sexual material
- Pornographic content
- Sexual content in inappropriate contexts
### S13: Elections

Content manipulating or interfering with elections. Examples:

- Voter suppression
- Election fraud instructions
- Misinformation about elections
- Interference with democratic processes
### S14: Code Interpreter Abuse

Attempts to abuse code execution capabilities. Examples:

- Malicious code execution
- System access attempts
- Code injection
- Exploitation of code interpreters
## Technical Indicators

In addition to threat categories, HipoCap also detects technical indicators:

- `instruction_injection` - Direct injection of instructions
- `contextual_blending` - Blending malicious content with legitimate content
- `function_call_attempt` - Attempts to trigger function calls
- `hidden_instructions` - Instructions hidden in content
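HipoCap's detection internals are not documented here; as a rough illustration of what pattern-based screening for indicators like these could look like, consider the sketch below. The phrase patterns are assumptions chosen for the example, not HipoCap's actual rules.

```python
import re

# Hypothetical phrase patterns per technical indicator -- illustrative only,
# not HipoCap's actual detection logic.
INDICATOR_PATTERNS = {
    "instruction_injection": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
    ),
    "hidden_instructions": re.compile(
        r"<!--.*?-->|\u200b",  # HTML comments or zero-width characters
        re.IGNORECASE | re.DOTALL,
    ),
    "function_call_attempt": re.compile(
        r"call (the )?function|invoke \w+\(\)", re.IGNORECASE
    ),
}

def detect_indicators(text: str) -> list[str]:
    """Return the technical indicators whose pattern matches the text."""
    return [name for name, pat in INDICATOR_PATTERNS.items() if pat.search(text)]
```

Real detectors are far more robust (semantic analysis, multi-stage checks); a keyword scan like this is only a mental model for what an indicator hit means.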
## Attack Patterns

HipoCap identifies common attack patterns:

- Contextual Blending - Malicious content blended with legitimate content
- Instruction Injection - Direct injection of malicious instructions
- Function Call Attempt - Attempts to trigger unauthorized function calls
## Severity Levels

Threats are assigned severity levels:

- Safe - No threats detected
- Low - Minor concerns; may warrant review
- Medium - Significant concerns; blocking is usually appropriate
- High - Serious threats, should be blocked
- Critical - Severe threats, must be blocked
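Because the levels above are ordered, enforcement can be expressed as a threshold. The threshold name and default below are assumptions for the sketch, not HipoCap's configuration surface:

```python
from enum import IntEnum

class Severity(IntEnum):
    """Ordered severity levels, matching the list above."""
    SAFE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

def should_block(severity: Severity, block_at: Severity = Severity.MEDIUM) -> bool:
    """Block when severity meets or exceeds the configured threshold.

    `block_at` is an assumed knob: per the levels above, Medium "likely
    should be blocked" while High and Critical always are.
    """
    return severity >= block_at
```

With the default threshold, `should_block(Severity.LOW)` returns `False` and `should_block(Severity.CRITICAL)` returns `True`; lowering `block_at` to `Severity.LOW` makes the policy stricter.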
## Viewing Threat Detection Results

Threat detection results are available in:

- Dashboard: View blocked/allowed functions with threat indicators
- Traces: Detailed analysis of each function call with threat categorization
- API Response: Threat indicators included in the `analyze()` response
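The exact schema of the `analyze()` response is not shown here. Assuming it carries a `threat_indicators` list resembling the payload below (field names are assumptions for the sketch, not HipoCap's documented schema), pulling out the categories worth acting on might look like:

```python
# Hypothetical response payload -- field names are assumptions, not
# HipoCap's documented schema.
response = {
    "allowed": False,
    "threat_indicators": [
        {"category": "S2", "name": "Non-Violent Crimes", "severity": "high"},
        {"category": "S7", "name": "Privacy", "severity": "low"},
    ],
}

def flagged_categories(response: dict, min_severity: str = "medium") -> list[str]:
    """Return categories whose severity is at or above min_severity."""
    order = ["safe", "low", "medium", "high", "critical"]
    threshold = order.index(min_severity)
    return [
        t["category"]
        for t in response.get("threat_indicators", [])
        if order.index(t["severity"]) >= threshold
    ]
```

Here `flagged_categories(response)` yields `["S2"]`, while lowering `min_severity` to `"low"` also surfaces the privacy hit.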
## Policy Configuration

You can configure how each threat category is handled in your governance policies.

## Best Practices
- Block Critical Categories - Always block S1, S3, S4, S9, S11
- Customize by Function - Different functions may need different rules
- Monitor Patterns - Track which categories are most common in your use case
- Regular Updates - Keep threat detection rules updated
- Review Blocked Content - Regularly review blocked attempts to tune policies
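Tying these practices together, a policy table that blocks the critical categories (S1, S3, S4, S9, S11) by default and customizes rules per function might be sketched as follows. The field names, actions, and override shape are assumptions for illustration, not HipoCap's actual policy format:

```python
# Categories the best practices above say to always block.
ALWAYS_BLOCK = {"S1", "S3", "S4", "S9", "S11"}

# Assumed policy shape: a default action per category, plus optional
# per-function overrides ("Customize by Function").
POLICY = {
    "default": {cat: "block" for cat in ALWAYS_BLOCK} | {"S6": "review"},
    "overrides": {
        "send_email": {"S7": "block"},  # stricter privacy rule for email
    },
}

def action_for(function_name: str, category: str) -> str:
    """Resolve the action for a category, preferring per-function overrides."""
    override = POLICY["overrides"].get(function_name, {})
    return override.get(category, POLICY["default"].get(category, "allow"))
```

For example, `action_for("send_email", "S7")` resolves to `"block"` via the override, while the same category on another function falls back to `"allow"`.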
## Next Steps
- Setting up the Shield - Configure threat detection
- Prompt Injection Protection - Understand multi-stage analysis
- Governance Policies - Configure threat handling rules
