Code Documentation: prompt_email_blocker.py

This document provides a comprehensive breakdown of the Claude Code hook that prevents email addresses from being included in prompts sent to the LLM.

Script Header and Dependencies

#!/usr/bin/env python3
# /// script
# requires-python = ">=3.8"
# dependencies = []
# ///
"""
Claude Code UserPromptSubmit Hook: Email Address Detection and Blocking

This hook runs before Claude processes any user prompt and blocks prompts
that contain email addresses to prevent accidental PII exposure.

Privacy Guarantee: Prompts with emails never reach Claude.
"""

import json
import sys
import re
from datetime import datetime
from pathlib import Path

Description

The script header uses UV's inline script metadata format (PEP 723), which declares the required Python version and dependencies in a comment block at the top of the file. This keeps the script self-contained and portable across environments without a separate requirements.txt file.

Shebang: #!/usr/bin/env python3 ensures the script uses Python 3
UV metadata: The # /// script block tells UV this is a standalone script
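Python version: Declares Python 3.8+ as the minimum (pathlib has been in the standard library since Python 3.4, so this is a conservative floor rather than a pathlib requirement)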
Zero dependencies: Uses only Python standard library modules
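PEP 723 specifies how tools find this block. The sketch below uses the regular expression from the PEP's reference implementation to pull out the TOML text of a `# /// script` block. The function name `read_script_metadata` is hypothetical, and the sketch skips the PEP's extra rules (such as rejecting duplicate script blocks); a real tool would then parse the returned text with `tomllib` (Python 3.11+).

```python
import re
from typing import Optional

# Regex taken from the reference implementation in PEP 723
PEP723_BLOCK = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def read_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text of the first `# /// script` block, or None."""
    for match in PEP723_BLOCK.finditer(source):
        if match.group("type") == "script":
            lines = match.group("content").splitlines(keepends=True)
            # Drop the comment prefix: "# " on content lines, "#" on blank ones
            return "".join(line[2:] if line.startswith("# ") else line[1:] for line in lines)
    return None


if __name__ == "__main__":
    header = (
        "#!/usr/bin/env python3\n"
        "# /// script\n"
        '# requires-python = ">=3.8"\n'
        "# dependencies = []\n"
        "# ///\n"
    )
    print(read_script_metadata(header))
```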

Alternatives

Traditional approach: Create a separate requirements.txt file, but this complicates deployment and distribution.

Docker approach: Package the hook in a container, but this adds overhead for a simple script.

Compiled approach: Use PyInstaller to create a binary, but this reduces portability and transparency.

The UV inline format suits Claude Code hooks well because it is self-documenting, portable, and needs no separate build or packaging step.

Email Detection Function

def detect_emails(content):
    """
    Detect email addresses in text content.
    
    Returns:
        dict: {
            'has_emails': bool,
            'count': int,
            'redacted_emails': list of redacted email examples
        }
    """
    # Email regex pattern ([A-Za-z] for the TLD: a "|" inside [...] is a literal pipe, not "or")
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    
    # Find all email matches
    emails = re.findall(email_pattern, content, re.IGNORECASE)
    
    # Redact emails for safe logging (show only domain)
    redacted_emails = []
    for email in emails[:3]:  # Show max 3 examples
        # Each match contains exactly one "@", so keep only the domain after it
        domain = email.split('@', 1)[1]
        redacted_emails.append(f"***@{domain}")
    
    return {
        'has_emails': len(emails) > 0,
        'count': len(emails),
        'redacted_emails': redacted_emails
    }

Description

The core email detection function uses regular expressions to identify email addresses in text content. It is designed with privacy-first principles: it detects emails but never logs the complete addresses.

Regex breakdown:

\b - Word boundary (prevents matching partial strings)
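[A-Za-z0-9._%+-]+ - Local part, i.e. the "username" (letters, digits, and . _ % + -)
@ - Required @ symbol
[A-Za-z0-9.-]+ - Domain name part
\. - Required dot separator
[A-Za-z]{2,} - Top-level domain (2+ letters); written as [A-Z|a-z], the class would also accept a literal "|", because | has no special meaning inside square brackets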
\b - Ending word boundary
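A quick way to check the breakdown above is to run the pattern on sample text and compare it with the variant that has `|` inside the TLD class. The names `EMAIL_PATTERN`, `PIPE_VARIANT`, and `find_emails` are hypothetical and only serve this comparison.

```python
import re

# Pattern as described in the breakdown, with the TLD class written as [A-Za-z]
EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# Same pattern with "|" inside the TLD class, where it is a literal pipe character
PIPE_VARIANT = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b")


def find_emails(text, pattern=EMAIL_PATTERN):
    """Return every non-overlapping match of `pattern` in `text`."""
    return pattern.findall(text)


if __name__ == "__main__":
    sample = "Ping jane.doe+ops@mail.example.co.uk or x@y.c|m"
    print(find_emails(sample))                # ['jane.doe+ops@mail.example.co.uk']
    print(find_emails(sample, PIPE_VARIANT))  # ['jane.doe+ops@mail.example.co.uk', 'x@y.c|m']
```

The first address shows that dotted and plus-tagged local parts and multi-label domains are matched; the second shows the pipe variant accepting a "TLD" that contains `|`.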

Privacy features:

Redaction: Only shows domain part (***@company.com)
Limit examples: Shows maximum 3 examples to prevent log spam
No storage: Never stores complete email addresses
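The redaction and the three-example cap can be isolated into a small pure function, which makes the privacy behavior easy to test. `redact_examples` is a hypothetical name, and the sketch assumes every input came from the regex above and therefore contains an `@`.

```python
def redact_examples(emails, limit=3):
    """Keep only the domain of up to `limit` addresses, e.g. '***@company.com'."""
    return [f"***@{email.split('@', 1)[1]}" for email in emails[:limit]]


if __name__ == "__main__":
    found = ["ana@company.com", "bo@company.com", "cy@vendor.io", "di@vendor.io"]
    print(redact_examples(found))  # ['***@company.com', '***@company.com', '***@vendor.io']
```

Note that duplicate domains are kept as-is, so the examples still reflect how many of the first three matches share a domain.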

Alternatives

More sophisticated patterns:

# Closer to RFC 5322 (still an approximation, and much harder to read)
email_pattern = r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"

# Simpler, looser pattern (can swallow adjacent punctuation, e.g. a trailing period)
email_pattern = r'\S+@\S+\.\S+'

NLP library approach (spaCy):

import spacy
nlp = spacy.blank("en")  # tokenizer only; the pretrained NER models have no EMAIL label
doc = nlp(content)
emails = [token.text for token in doc if token.like_email]

Third-party libraries:

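from email_validator import validate_email, EmailNotValidError
# Validates one candidate address (syntax, optionally deliverability); it does not
# search free text, so it complements a regex rather than replacing it.
# Requires the external email-validator package.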

Why regex was chosen: Balances accuracy, performance, and zero dependencies. The pattern covers the common real-world formats (dotted and plus-tagged local parts, subdomains, multi-part suffixes such as .co.uk) while staying fast and self-contained.
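The accuracy trade-off between the chosen pattern and the simpler `\S+@\S+\.\S+` alternative shows up on ordinary punctuation. This sketch runs both on the same sentence; the constant names are hypothetical.

```python
import re

CHOSEN = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
LOOSE = r"\S+@\S+\.\S+"

text = "Mail a@b.com. Or (c@d.org)"
print(re.findall(CHOSEN, text))  # ['a@b.com', 'c@d.org']
print(re.findall(LOOSE, text))   # ['a@b.com.', '(c@d.org)']
```

Both patterns flag the same prompts here, but the looser one captures surrounding punctuation, which would leak into the redacted examples and logs.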

Logging Function

def log_hook_execution(event_type, data):
    """Log hook execution to JSON file."""
    log_dir = Path("logs")
    log_dir.mkdir(exist_ok=True)
    
    log_entry = {
        "timestamp": datetime.now().isoformat(),
        "event_type": event_type,
        "data": data
    }
    
    log_file = log_dir / "prompt_blocker.json"
    
    # Append to existing log file
    logs = []
    if log_file.exists():
        try:
            with open(log_file, 'r') as f:
                logs = json.load(f)
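        except (json.JSONDecodeError, OSError):
            logs = []  # Corrupted or unreadable log: start a fresh history
    if not isinstance(logs, list):
        logs = []  # Valid JSON but not a list: also start fresh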
    
    logs.append(log_entry)
    
    # Keep only last 50 entries
    logs = logs[-50:]
    
    with open(log_file, 'w') as f:
        json.dump(logs, f, indent=2)

Description

The logging function keeps a bounded audit trail for compliance and debugging. Capping the file at the 50 most recent entries prevents unbounded disk usage while preserving recent history.

Key features:

Structured logging: JSON format for easy parsing
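Timestamp precision: ISO 8601 timestamps from datetime.now() (local time, no timezone offset)
Bounded history: Keeps only the last 50 entries; older entries are discarded, not rotated into another file
Error resilience: Continues working even if the log file is corrupted (the unreadable history is discarded)
Directory creation: Automatically creates a logs/ directory relative to the current working directory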

Data logged:

Event type: prompt_allowed, prompt_blocked, no_prompt, hook_error
Metadata: Session ID, prompt length, email count
Privacy-safe details: Redacted emails, no full prompt content
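A minimal sketch of the same bounded JSON log, parameterized so it can be tested against a temporary directory. `append_bounded_log` is a hypothetical name, and the atomic write (temporary file plus `os.replace`) is a design choice swapped in here, not part of the original hook. It prevents a half-written file if the process dies mid-write, but it does not stop two concurrent hook runs from overwriting each other's entries, since the read-modify-write cycle is still unsynchronized.

```python
import json
import os
import tempfile
from datetime import datetime
from pathlib import Path


def append_bounded_log(log_file, event_type, data, max_entries=50):
    """Append one entry to a JSON-array log, keeping the newest `max_entries`."""
    log_file = Path(log_file)
    log_file.parent.mkdir(parents=True, exist_ok=True)

    logs = []
    if log_file.exists():
        try:
            logs = json.loads(log_file.read_text())
        except (json.JSONDecodeError, OSError):
            logs = []  # corrupted or unreadable file: start a fresh history
    if not isinstance(logs, list):
        logs = []  # valid JSON but not an array: also start fresh

    logs.append({
        "timestamp": datetime.now().isoformat(),
        "event_type": event_type,
        "data": data,
    })
    logs = logs[-max_entries:]

    # Write to a temporary file in the same directory, then swap it in with
    # os.replace so readers see either the old log or the new one, never a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=log_file.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(logs, f, indent=2)
    os.replace(tmp_path, log_file)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "logs" / "prompt_blocker.json"
        for i in range(60):
            append_bounded_log(path, "prompt_allowed", {"n": i})
        entries = json.loads(path.read_text())
        print(len(entries), entries[0]["data"]["n"])  # 50 10
```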

Alternatives

Syslog integration:

import logging
import logging.handlers  # the handlers submodule must be imported explicitly
logging.basicConfig(level=logging.INFO, handlers=[logging.handlers.SysLogHandler()])

Structured logging libraries:

import structlog
log = structlog.get_logger()
log.info("prompt_blocked", email_count=count)

Database logging:

import sqlite3
# Store logs in SQLite for querying

External services:

import requests
# Send logs to external monitoring service

Why JSON file logging: Simple, self-contained, human-readable, and doesn't require external services or complex setup. Perfect for development environments and easy to integrate with monitoring tools.

Main Function - Input Processing

def main():
    """Main hook execution logic."""
    try:
        # Read hook input from stdin
        hook_input = json.load(sys.stdin)
        
        # Extract the user's prompt
        prompt = hook_input.get("prompt", "")
        session_id = hook_input.get("session_id", "")
        
        if not prompt:
            log_hook_execution("no_prompt", {
                "session_id": session_id,
                "reason": "no prompt content found"
            })
            sys.exit(0)

Description

The main function handles Claude Code's hook protocol by reading JSON data from stdin. This is how Claude Code communicates with hooks: it sends structured data about the current operation.

Hook input structure (for UserPromptSubmit):

{
    "prompt": "The user's actual prompt text",
    "session_id": "unique-session-identifier", 
    "timestamp": "2025-07-31T...",
    "hook_event_name": "UserPromptSubmit"
}

Error handling: If no prompt is found, the hook logs the event and exits cleanly without blocking Claude Code operation.
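Because the input arrives as JSON on stdin, the parsing step can be tested by handing it any file-like object. This is a minimal sketch with a hypothetical `read_prompt` helper that mirrors the two `.get()` calls in `main()`; malformed input raises `json.JSONDecodeError`, which the hook's outer `except Exception` handler would catch.

```python
import io
import json
import sys


def read_prompt(stream=None):
    """Parse hook JSON from `stream` (default: stdin); return (prompt, session_id)."""
    if stream is None:
        stream = sys.stdin
    hook_input = json.load(stream)
    return hook_input.get("prompt", ""), hook_input.get("session_id", "")


if __name__ == "__main__":
    fake_stdin = io.StringIO('{"prompt": "Refactor the parser", "session_id": "abc"}')
    print(read_prompt(fake_stdin))  # ('Refactor the parser', 'abc')
```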

Alternatives

Direct argument parsing:

import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--prompt', required=True)
args = parser.parse_args()

Environment variables:

import os
prompt = os.environ.get('CLAUDE_PROMPT', '')  # hypothetical variable name

File-based communication:

# Read from a temporary file instead of stdin (hypothetical path)
with open('/tmp/claude_prompt.txt', 'r') as f:
    prompt = f.read()

Why stdin JSON: This is Claude Code's official protocol. Using stdin ensures compatibility with the hook system and allows Claude Code to pass rich metadata along with the prompt.

Email Detection and Decision Logic

        # Run email detection on the prompt
        email_result = detect_emails(prompt)
        
        if email_result['has_emails']:
            # Emails detected in prompt - block it
            error_message = f"""
🚫 EMAIL ADDRESSES DETECTED IN PROMPT - BLOCKED

Found {email_result['count']} email address(es) in your prompt:
Examples: {', '.join(email_result['redacted_emails'])}

Your prompt contains email addresses and has been blocked to protect privacy.

To proceed:
1. Remove or redact the email addresses from your prompt
2. Replace emails with descriptions like '[team email]' or '[customer email]'
3. Avoid placeholder addresses such as 'user@example.com'; they match the same pattern and are blocked too

Privacy Protection: Your prompt was not sent to Claude.

Original prompt length: {len(prompt)} characters
"""
            
            # Best-effort logging: a failed log write must not turn this block into an allow
            try:
                log_hook_execution("prompt_blocked", {
                    "session_id": session_id,
                    "email_count": email_result['count'],
                    "redacted_emails": email_result['redacted_emails'],
                    "prompt_length": len(prompt)
                })
            except Exception:
                pass
            
            print(error_message.strip(), file=sys.stderr)
            sys.exit(2)  # Exit code 2 blocks the prompt from reaching Claude

Description

This section implements the core decision logic and user communication. When emails are detected, it provides clear, actionable feedback to help developers fix their prompts.

Exit code 2 behavior: In Claude Code hooks, exit code 2 means "blocking error". For UserPromptSubmit, the prompt is blocked and erased, and the stderr message is shown to the user rather than passed to Claude.

User experience design:

Clear visual indicator: 🚫 emoji for immediate recognition
Quantified feedback: Shows exact count and examples
Actionable guidance: 3 specific steps to resolve the issue
Privacy assurance: Confirms the prompt was not sent to Claude
Context preservation: Shows prompt length for user reference
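The decision itself can be separated from stdin, stderr, and `sys.exit`, so the block/allow rule is testable in isolation. This is a minimal sketch: `decide`, `ALLOW`, and `BLOCK` are hypothetical names, the message is an abbreviated version of the one above, and the hook would print the message and exit with the returned code.

```python
import re

EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

ALLOW, BLOCK = 0, 2  # exit codes used by the hook


def decide(prompt):
    """Return (exit_code, stderr_message) for a prompt without doing any I/O."""
    emails = EMAIL_PATTERN.findall(prompt)
    if not emails:
        return ALLOW, ""
    examples = ", ".join(f"***@{e.split('@', 1)[1]}" for e in emails[:3])
    message = (
        "EMAIL ADDRESSES DETECTED IN PROMPT - BLOCKED\n"
        f"Found {len(emails)} email address(es): {examples}"
    )
    return BLOCK, message


if __name__ == "__main__":
    print(decide("Refactor the parser"))              # (0, '')
    print(decide("Email ana@corp.com about it")[0])   # 2
```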

Alternatives

Silent blocking:

# Just block without explanation
sys.exit(2)

Warning instead of blocking:

print("⚠️ Warning: Email detected but continuing", file=sys.stderr)
sys.exit(0)  # Allow with warning

Automatic redaction:

# Replace emails before sending to Claude
redacted_prompt = re.sub(email_pattern, '[EMAIL]', prompt)
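# Pass the redacted prompt on instead of blocking (requires a way to hand the
# rewritten prompt back to Claude Code; exit codes alone cannot do this)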

Interactive mode:

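# Ask user to confirm
# Caveat: a hook's stdin carries the JSON input, not a terminal, so input()
# cannot reach the user here (after json.load it hits EOF and raises EOFError)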
response = input("Email detected. Continue anyway? (y/n): ")

Why blocking with feedback: Provides the strongest privacy protection while educating users about the issue. The clear feedback helps developers understand and fix the problem rather than being confused by silent failures.
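For the automatic-redaction alternative listed above, `re.subn` performs the replacement and reports how many addresses were replaced in one call. This sketch only shows the text transformation; `redact_prompt` and the `[EMAIL]` placeholder are hypothetical, and getting the rewritten prompt back into Claude Code is a separate question.

```python
import re

EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def redact_prompt(prompt, placeholder="[EMAIL]"):
    """Replace every detected address; return (new_prompt, replacement_count)."""
    return EMAIL_PATTERN.subn(placeholder, prompt)


if __name__ == "__main__":
    print(redact_prompt("Ask ana@corp.com or bo@corp.com"))
    # ('Ask [EMAIL] or [EMAIL]', 2)
```

The count makes it easy to log how many addresses were redacted without keeping the addresses themselves.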

Successful Prompt Handling

        else:
            # No emails detected - allow prompt
            log_hook_execution("prompt_allowed", {
                "session_id": session_id,
                "prompt_length": len(prompt),
                "reason": "no emails detected"
            })
            
            # Optional: Show confirmation for transparency
            # print(f"✅ Prompt cleared (no emails detected)", file=sys.stderr)
            sys.exit(0)

Description

When no emails are detected, the hook allows the prompt to proceed normally. The successful case is logged for audit purposes but doesn't interfere with the user experience.

Design choices:

Silent success: By default, successful prompts proceed without notification
Optional feedback: Commented-out confirmation message for debugging
Exit code 0: Standard success code allows Claude Code to continue
Minimal overhead: Quick execution path for the common case

Error Handling

    except Exception as e:
        # Log unexpected errors (best effort: logging itself may be what failed)
        try:
            log_hook_execution("hook_error", {
                "error": str(e),
                "error_type": type(e).__name__
            })
        except Exception:
            pass
        
        # Don't block on hook errors - let Claude proceed
        print(f"Hook error (allowing prompt): {e}", file=sys.stderr)
        sys.exit(0)


if __name__ == "__main__":
    main()

Description

The error handling ensures fail-open behavior - if the hook encounters an unexpected error, it allows the prompt to proceed rather than blocking legitimate work.

Error handling principles:

Log all errors: Capture error details for debugging
Fail gracefully: Don't block work due to hook failures
User notification: Brief error message to stderr
Allow continuation: Exit code 0 lets Claude Code proceed

Common error scenarios:

Malformed JSON input from Claude Code
File system permissions issues
Disk full preventing log writes
Python environment issues

Why fail-open: Prioritizes developer productivity over strict enforcement. A broken hook shouldn't prevent developers from working, but errors are logged for later investigation.
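Fail-open is meant for hook failures, not for the blocking decision. If the logging call on the blocking path is not guarded, a failing log write (for example, a full disk) sends control to the outer handler before `sys.exit(2)` runs, and the prompt is allowed. One way to keep logging failures from changing the outcome is a best-effort wrapper, sketched here with hypothetical names (`best_effort`, `flaky_log`, `handle_blocked_prompt`).

```python
import functools


def best_effort(func):
    """Run func, swallowing any Exception; return True on success, False on failure."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            func(*args, **kwargs)
            return True
        except Exception:
            return False
    return wrapper


@best_effort
def flaky_log(event_type, data):
    # Stand-in for a log write that fails, e.g. on a full disk
    raise OSError("No space left on device")


def handle_blocked_prompt():
    logged = flaky_log("prompt_blocked", {"email_count": 1})
    exit_code = 2  # the block decision does not depend on whether logging worked
    return exit_code, logged


if __name__ == "__main__":
    print(handle_blocked_prompt())  # (2, False)
```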

Complete Code Flow

Hook invocation: Claude Code calls the hook with JSON on stdin
Input parsing: Extract prompt and metadata from JSON
Email detection: Regex scan for email patterns
Decision point: Block if emails found, allow otherwise
User feedback: Clear error message for blocked prompts
Logging: Record all decisions for audit trail
Exit code: 0 for allow, 2 for block
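The flow above can be exercised end to end without Claude Code by piping JSON into a hook script and reading its exit code and stderr. This harness is a hypothetical sketch that mimics the stdin/exit-code contract described here, not Claude Code's actual runner; to keep it self-contained it runs a tiny stand-in hook, but `run_hook` could be pointed at your own hook script instead (which would then write its logs/ directory into the current working directory).

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in hook: exits 2 when the prompt contains "@", else 0
STAND_IN_HOOK = """
import json, sys
data = json.load(sys.stdin)
if "@" in data.get("prompt", ""):
    print("blocked", file=sys.stderr)
    sys.exit(2)
sys.exit(0)
"""


def run_hook(script_path, payload):
    """Feed `payload` to the hook as JSON on stdin; return (exit_code, stderr)."""
    result = subprocess.run(
        [sys.executable, str(script_path)],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.returncode, result.stderr.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        hook = Path(tmp) / "hook.py"
        hook.write_text(STAND_IN_HOOK)
        event = {"hook_event_name": "UserPromptSubmit", "session_id": "test"}
        print(run_hook(hook, {**event, "prompt": "Refactor the parser"}))  # (0, '')
        print(run_hook(hook, {**event, "prompt": "Mail ana@corp.com"}))    # (2, 'blocked')
```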

This design combines detection, actionable feedback, and an audit trail while failing open on errors, so it guards against accidental email leaks without getting in the developer's way.
