Threat Categories

HipoCap classifies threats into 14 categories (S1-S14) spanning the major types of harmful content. This reference guide explains each category with examples.

Quick Reference

Category	Name	Description
S1	Violent Crimes	Content promoting or describing violent acts
S2	Non-Violent Crimes	Content promoting illegal activities (fraud, theft, etc.)
S3	Sex-Related Crimes	Content related to sexual crimes or exploitation
S4	Child Sexual Exploitation	Content involving minors in sexual contexts
S5	Defamation	Content that defames or harms reputation
S6	Specialized Advice	Dangerous specialized advice (medical, legal, financial)
S7	Privacy	Content violating privacy or containing personal information
S8	Intellectual Property	Content violating copyright or intellectual property
S9	Indiscriminate Weapons	Content about weapons of mass destruction
S10	Hate	Content promoting hate speech or discrimination
S11	Suicide & Self-Harm	Content promoting self-harm or suicide
S12	Sexual Content	Inappropriate sexual content
S13	Elections	Content manipulating or interfering with elections
S14	Code Interpreter Abuse	Attempts to abuse code execution capabilities
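If you post-process detection results in your own code, it can help to keep the table above as a lookup. Below is a minimal Python sketch. `THREAT_CATEGORIES` and `describe_category` are hypothetical helper names and are not part of the HipoCap SDK.

```python
# Hypothetical lookup table mirroring the Quick Reference above.
# Not part of the HipoCap SDK; copy and adapt as needed.
THREAT_CATEGORIES: dict[str, str] = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Defamation",
    "S6": "Specialized Advice",
    "S7": "Privacy",
    "S8": "Intellectual Property",
    "S9": "Indiscriminate Weapons",
    "S10": "Hate",
    "S11": "Suicide & Self-Harm",
    "S12": "Sexual Content",
    "S13": "Elections",
    "S14": "Code Interpreter Abuse",
}


def describe_category(code: str) -> str:
    """Return a readable label such as 'S7: Privacy' for a category code."""
    key = code.upper()
    name = THREAT_CATEGORIES.get(key)
    if name is None:
        raise KeyError(f"Unknown threat category: {code!r}")
    return f"{key}: {name}"


print(describe_category("S7"))  # S7: Privacy
```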

Detailed Categories

S1: Violent Crimes

Content promoting or describing violent acts. Examples:

Instructions for violence
Threats of harm
Physical assault descriptions
Terrorism-related content

Example attack:

"How to build a bomb" or "Instructions for attacking someone"

S2: Non-Violent Crimes

Content promoting illegal activities that don’t involve violence. Examples:

Fraud schemes
Theft instructions
Hacking guides
Identity theft
Financial crimes

Example attack:

"How to commit credit card fraud" or "Stealing personal information"

S3: Sex-Related Crimes

Content related to sexual crimes or exploitation. Examples:

Sexual assault content
Non-consensual sexual content
Sexual exploitation

Example attack:

Content describing non-consensual sexual acts

S4: Child Sexual Exploitation

Content involving minors in sexual contexts. Examples:

Child exploitation material
Grooming content
Any sexual content involving minors

Example attack:

Content that sexualizes minors

S5: Defamation

Content that defames or harms reputation. Examples:

False accusations
Character assassination
Libelous content

Example attack:

"John Doe is a criminal" (without evidence)

S6: Specialized Advice

Content providing dangerous specialized advice. Examples:

Medical advice (diagnosis, treatment)
Legal advice (specific legal guidance)
Financial advice (investment recommendations)

Example attack:

"You should take this medication" or "Invest all your money in this stock"

S7: Privacy

Content violating privacy or exposing personal or sensitive information. Examples:

Sensitive keyword exposure
Privacy violations
Unauthorized data sharing
PII exposure

Example attack:

Sharing sensitive keywords or private information

S8: Intellectual Property

Content violating copyright or intellectual property. Examples:

Copyright infringement
Trademark violations
Patent violations
Plagiarism

Example attack:

Reproducing copyrighted material without permission

S9: Indiscriminate Weapons

Content about weapons of mass destruction. Examples:

Nuclear weapons
Biological weapons
Chemical weapons
Instructions for creating WMDs

Example attack:

"How to build a nuclear weapon"

S10: Hate

Content promoting hate speech or discrimination. Examples:

Racism
Sexism
Religious discrimination
Homophobia
Transphobia

Example attack:

Content promoting discrimination against protected groups

S11: Suicide & Self-Harm

Content promoting self-harm or suicide. Examples:

Suicide instructions
Self-harm encouragement
Methods for self-harm

Example attack:

"How to commit suicide" or encouraging self-harm

S12: Sexual Content

Inappropriate sexual content. Examples:

Explicit sexual material
Pornographic content
Sexual content in inappropriate contexts

Example attack:

Explicit sexual descriptions or pornographic material

S13: Elections

Content manipulating or interfering with elections. Examples:

Voter suppression
Election fraud instructions
Misinformation about elections
Interference with democratic processes

Example attack:

"How to rig an election" or spreading false election information

S14: Code Interpreter Abuse

Attempts to abuse code execution capabilities. Examples:

Malicious code execution
System access attempts
Code injection
Exploitation of code interpreters

Example attack:

"Execute this code to access the database" or code injection attempts

Technical Indicators

In addition to threat categories, HipoCap also detects technical indicators:

instruction_injection - Direct injection of instructions
contextual_blending - Blending malicious content with legitimate content
function_call_attempt - Attempts to trigger function calls
hidden_instructions - Instructions hidden in content
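The `analyze()` response can mix category codes (S1-S14) and technical indicator names in one `threat_indicators` list, for example `["S7", "function_call_attempt", "instruction_injection"]`. The Python sketch below separates the two kinds of entry. It assumes that the four indicator names listed above are the complete set. Any other string is set aside as unknown rather than dropped. `split_indicators` is a hypothetical helper, not an SDK function.

```python
import re

# Technical indicator names as listed in this guide (assumed complete).
TECHNICAL_INDICATORS = {
    "instruction_injection",
    "contextual_blending",
    "function_call_attempt",
    "hidden_instructions",
}

# Matches the category codes S1 through S14 exactly.
_CATEGORY_CODE = re.compile(r"S(?:[1-9]|1[0-4])")


def split_indicators(indicators: list[str]) -> tuple[list[str], list[str], list[str]]:
    """Split a threat_indicators list into (categories, technical, unknown)."""
    categories, technical, unknown = [], [], []
    for item in indicators:
        if _CATEGORY_CODE.fullmatch(item):
            categories.append(item)
        elif item in TECHNICAL_INDICATORS:
            technical.append(item)
        else:
            # Keep unrecognized entries visible instead of silently dropping them.
            unknown.append(item)
    return categories, technical, unknown


cats, tech, other = split_indicators(
    ["S7", "function_call_attempt", "instruction_injection"]
)
print(cats, tech, other)  # ['S7'] ['function_call_attempt', 'instruction_injection'] []
```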

Attack Patterns

HipoCap identifies common attack patterns:

Contextual Blending - Malicious content blended with legitimate content
Instruction Injection - Direct injection of malicious instructions
Function Call Attempt - Attempts to trigger unauthorized function calls

Severity Levels

Each detected threat is assigned one of the following severity levels, listed from least to most severe:

Safe - No threats detected
Low - Minor concerns, may require review
Medium - Significant concerns, likely should be blocked
High - Serious threats, should be blocked
Critical - Severe threats, must be blocked
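These levels form an ordered scale. That suggests a threshold such as `severity_threshold` in the policy configuration could be read as "this level or higher". The sketch below makes that inclusive reading explicit. It is an assumption: this guide does not state the exact comparison HipoCap performs. `Severity` and `meets_threshold` are hypothetical names.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from this guide, ordered from least to most severe."""
    SAFE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

    @classmethod
    def parse(cls, value: str) -> "Severity":
        # Accept strings such as "medium" (as used in policy JSON) or "Critical".
        return cls[value.strip().upper()]


def meets_threshold(severity: str, threshold: str) -> bool:
    """Assumed semantics: True when severity is at or above the threshold."""
    return Severity.parse(severity) >= Severity.parse(threshold)


print(meets_threshold("high", "medium"))  # True
print(meets_threshold("low", "medium"))   # False
```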

Viewing Threat Detection Results

Threat detection results are available in:

Dashboard: View blocked/allowed functions with threat indicators
Traces: Detailed analysis of each function call with threat categorization
API Response: Threat indicators are included in the analyze() response

Example:

```python
# `client` is assumed to be an already-initialized HipoCap client.
result = client.analyze(
    function_name="search_web",
    function_result={"query": "confidential data"},
    user_query="Please search for confidential information",
    policy_key="default",
)

if result.get("threat_indicators"):
    print(f"Threats detected: {result['threat_indicators']}")
    # Example value: ["S7", "function_call_attempt", "instruction_injection"]
```

Policy Configuration

You can configure how each threat category is handled in your governance policies:

```json
{
  "severity_rules": {
    "S1": {
      "action": "BLOCK",
      "severity_threshold": "low"
    },
    "S7": {
      "action": "BLOCK",
      "severity_threshold": "medium"
    }
  }
}
```
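The sketch below loads the policy above and decides an action for a detected category and severity. It is a mental model of how such a policy might be applied, built on two assumptions this guide does not confirm:

- A rule fires when the detected severity is at or above its `severity_threshold`.
- A category with no rule falls back to a hypothetical default action, `ALLOW`.

`decide_action` is also a hypothetical helper, not an SDK function.

```python
import json

# The policy from this section, as a JSON string.
POLICY_JSON = """
{
  "severity_rules": {
    "S1": {"action": "BLOCK", "severity_threshold": "low"},
    "S7": {"action": "BLOCK", "severity_threshold": "medium"}
  }
}
"""

# Ordered scale from the Severity Levels section.
SEVERITY_ORDER = ["safe", "low", "medium", "high", "critical"]


def decide_action(policy: dict, category: str, severity: str, default: str = "ALLOW") -> str:
    """Return the rule's action if severity reaches its threshold, else `default`.

    Assumptions: thresholds are inclusive, and categories without a rule
    use the hypothetical `default` action.
    """
    rule = policy.get("severity_rules", {}).get(category)
    if rule is None:
        return default
    rank = SEVERITY_ORDER.index(severity.lower())
    threshold = SEVERITY_ORDER.index(rule["severity_threshold"].lower())
    return rule["action"] if rank >= threshold else default


policy = json.loads(POLICY_JSON)
print(decide_action(policy, "S7", "high"))      # BLOCK
print(decide_action(policy, "S7", "low"))       # ALLOW
print(decide_action(policy, "S8", "critical"))  # ALLOW
```

The last call returns the fallback because S8 has no rule in this policy. In practice you would likely add explicit rules for every category you care about rather than rely on a default.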

Best Practices

Block Critical Categories - Always block S1, S3, S4, S9, S11
Customize by Function - Different functions may need different rules
Monitor Patterns - Track which categories are most common in your use case
Regular Updates - Keep threat detection rules updated
Review Blocked Content - Regularly review blocked attempts to tune policies
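The first recommendation can be expressed in the `severity_rules` format from Policy Configuration. The minimal sketch below generates those rules. It maps "always block" to the `low` threshold, the lowest non-safe level; that mapping is an assumption, not documented HipoCap behavior. `build_policy` is a hypothetical helper.

```python
import json

# Categories this guide recommends always blocking.
ALWAYS_BLOCK = ["S1", "S3", "S4", "S9", "S11"]


def build_policy(categories: list[str], threshold: str = "low") -> dict:
    """Build a severity_rules policy that blocks each category at `threshold`."""
    return {
        "severity_rules": {
            code: {"action": "BLOCK", "severity_threshold": threshold}
            for code in categories
        }
    }


policy = build_policy(ALWAYS_BLOCK)
print(json.dumps(policy, indent=2))
```

Generating rules from a list keeps the blocked set in one place. That makes it easier to follow the "Customize by Function" advice by building a separate policy per function.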

Next Steps

Setting up the Shield - Configure threat detection
Prompt Injection Protection - Understand multi-stage analysis
Governance Policies - Configure threat handling rules
