Intermediate · Updated Dec 18, 2025

Content Safety and Filtering

Implement robust content moderation with AI to ensure that user prompts and generated content meet safety standards.

Maya Patel
API Architect
7 min read

Why Content Safety Matters

AI-generated content needs moderation to:

  • Prevent harmful outputs
  • Comply with regulations
  • Protect your brand
  • Create safe user experiences

Built-in Safety Features

Abstrakt models include automatic safety filtering:

python
result = client.run("fal-ai/flux/dev", {
    "input": {
        "prompt": "A beautiful landscape",
        "safety_checker": True  # Enabled by default
    }
})

# Check if content was flagged
if result.nsfw_detected:
    print("Content was filtered")

Pre-Generation Filtering

Check prompts before generation:

python
def check_prompt_safety(prompt):
    """Check if prompt is safe before generation."""
    
    result = client.run("fal-ai/content-moderation", {
        "input": {"text": prompt}
    })
    
    return {
        "safe": result.safe,
        "categories": result.flagged_categories,
        "scores": result.category_scores
    }

# Usage
safety = check_prompt_safety(user_prompt)

if not safety["safe"]:
    raise ValueError(f"Unsafe content detected: {safety['categories']}")

Post-Generation Filtering

Check generated content:

python
def moderate_image(image_url):
    """Moderate a generated image."""
    
    result = client.run("fal-ai/image-moderation", {
        "input": {"image_url": image_url}
    })
    
    return {
        "safe": result.safe,
        "nsfw_score": result.nsfw_score,
        "violence_score": result.violence_score,
        "categories": result.flagged_categories
    }
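
In practice the post-generation check runs immediately after the model call, before the image URL is returned to the user. A short usage sketch building on moderate_image above (user_prompt is a placeholder variable):

python
# Usage: moderate the generated image before returning it
result = client.run("fal-ai/flux/dev", {"input": {"prompt": user_prompt}})
check = moderate_image(result.images[0].url)

if not check["safe"]:
    raise ValueError(f"Generated image flagged: {check['categories']}")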

Safety Categories

Category      Description
sexual        Adult/explicit content
violence      Graphic violence
hate          Discriminatory content
self-harm     Self-injury content
illegal       Illegal activities
harassment    Bullying/harassment
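
If a single safe flag is too coarse, you can apply your own per-category thresholds to the category scores returned by the moderation model. A minimal sketch that assumes category_scores is a dict keyed by the category names above; the threshold values are illustrative, not recommendations:

python
# Illustrative thresholds: tune these to your own content policy
THRESHOLDS = {
    "sexual": 0.5,
    "violence": 0.6,
    "hate": 0.4,
    "self-harm": 0.3,
    "illegal": 0.5,
    "harassment": 0.5,
}

def flagged_categories(category_scores, default_threshold=0.5):
    """Return the categories whose score exceeds the policy threshold."""
    return [
        category
        for category, score in category_scores.items()
        if score >= THRESHOLDS.get(category, default_threshold)
    ]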

Custom Blocklists

Create custom word/phrase blocklists:

python
BLOCKED_TERMS = [
    "specific_term_1",
    "specific_term_2",
    # Add your terms
]

def check_blocklist(prompt):
    """Check against custom blocklist."""
    prompt_lower = prompt.lower()
    
    for term in BLOCKED_TERMS:
        if term in prompt_lower:
            return False, term
    
    return True, None

# Usage
is_safe, blocked_term = check_blocklist(user_prompt)
if not is_safe:
    raise ValueError(f"Blocked term detected: {blocked_term}")

Implementing a Safety Pipeline

python
class SafetyError(Exception):
    """Raised when a prompt or generated output fails a safety check."""

class SafetyPipeline:
    def __init__(self, client):
        self.client = client
        self.blocked_terms = set()
    
    def add_blocked_terms(self, terms):
        self.blocked_terms.update(terms)
    
    async def generate_safely(self, prompt):
        # Step 1: Check blocklist
        if not self._check_blocklist(prompt):
            raise SafetyError("Blocked term detected")
        
        # Step 2: AI moderation
        safety = await self._check_ai_safety(prompt)
        if not safety["safe"]:
            raise SafetyError(f"Unsafe: {safety['categories']}")
        
        # Step 3: Generate
        result = await self.client.run_async("fal-ai/flux/dev", {
            "input": {"prompt": prompt, "safety_checker": True}
        })
        
        # Step 4: Post-generation check
        if result.nsfw_detected:
            raise SafetyError("Generated content flagged")
        
        return result
    
    def _check_blocklist(self, prompt):
        return not any(term in prompt.lower() for term in self.blocked_terms)
    
    async def _check_ai_safety(self, prompt):
        result = await self.client.run_async("fal-ai/content-moderation", {
            "input": {"text": prompt}
        })
        return {"safe": result.safe, "categories": result.flagged_categories}

Handling Flagged Content

python
@app.post("/generate")
async def generate_endpoint(request: GenerateRequest):
    try:
        result = await safety_pipeline.generate_safely(request.prompt)
        return {"image_url": result.images[0].url}
    
    except SafetyError as e:
        # Log for review
        log_safety_incident(request.user_id, request.prompt, str(e))
        
        return {
            "error": "content_policy_violation",
            "message": "Your request couldn't be processed due to our content policy."
        }
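
If your API clients need to distinguish policy rejections from successful generations, return the error body with an explicit 4xx status rather than the default 200. A variant of the endpoint above using FastAPI's JSONResponse; the choice of 422 is ours:

python
from fastapi.responses import JSONResponse

@app.post("/generate")
async def generate_endpoint(request: GenerateRequest):
    try:
        result = await safety_pipeline.generate_safely(request.prompt)
        return {"image_url": result.images[0].url}

    except SafetyError as e:
        log_safety_incident(request.user_id, request.prompt, str(e))
        # 422 tells the client the request was understood but rejected by policy
        return JSONResponse(
            status_code=422,
            content={
                "error": "content_policy_violation",
                "message": "Your request couldn't be processed due to our content policy.",
            },
        )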

Logging & Monitoring

python
from datetime import datetime

def log_safety_incident(user_id, prompt, reason):
    """Log safety incidents for review."""
    incident = {
        "user_id": user_id,
        "prompt": prompt,
        "reason": reason,
        "timestamp": datetime.utcnow(),
        "reviewed": False
    }
    
    # Save to database
    db.safety_incidents.insert_one(incident)
    
    # Alert if needed
    if should_alert(user_id):
        send_alert(f"Multiple safety violations from {user_id}")

Best Practices

  1. Layer defenses - Use multiple checks
  2. Log everything - Enable review and improvement
  3. Human review - Have moderators review edge cases (see the review-queue sketch after this list)
  4. Update regularly - Keep blocklists current
  5. Clear policies - Communicate guidelines to users
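
To make points 2 and 3 concrete, the incidents logged earlier can feed a simple review queue. A sketch against the same safety_incidents collection; the field names beyond those logged above are our own:

python
def fetch_review_queue(limit=50):
    """Pull unreviewed incidents for human moderators, oldest first."""
    return list(
        db.safety_incidents
        .find({"reviewed": False})
        .sort("timestamp", 1)
        .limit(limit)
    )

def mark_reviewed(incident_id, action):
    """Record the moderator's decision on an incident."""
    db.safety_incidents.update_one(
        {"_id": incident_id},
        {"$set": {"reviewed": True, "review_action": action}},
    )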

Next Steps

#safety #moderation #content-filtering #trust