Intermediate · Updated Dec 18, 2025
Content Safety and Filtering
Implement robust content moderation using AI to ensure user-generated content meets safety standards.
Maya Patel, API Architect · 7 min read
Why Content Safety Matters
AI-generated content needs moderation to:
- Prevent harmful outputs
- Comply with regulations
- Protect your brand
- Create safe user experiences
Built-in Safety Features
Abstrakt models include automatic safety filtering:
```python
result = client.run("fal-ai/flux/dev", {
    "input": {
        "prompt": "A beautiful landscape",
        "safety_checker": True  # Enabled by default
    }
})

# Check if content was flagged
if result.nsfw_detected:
    print("Content was filtered")
```

Pre-Generation Filtering
Check prompts before generation:
```python
def check_prompt_safety(prompt):
    """Check if a prompt is safe before generation."""
    result = client.run("fal-ai/content-moderation", {
        "input": {"text": prompt}
    })
    return {
        "safe": result.safe,
        "categories": result.flagged_categories,
        "scores": result.category_scores
    }

# Usage
safety = check_prompt_safety(user_prompt)
if not safety["safe"]:
    raise ValueError(f"Unsafe content detected: {safety['categories']}")
```

Post-Generation Filtering
Check generated content:
```python
def moderate_image(image_url):
    """Moderate a generated image."""
    result = client.run("fal-ai/image-moderation", {
        "input": {"image_url": image_url}
    })
    return {
        "safe": result.safe,
        "nsfw_score": result.nsfw_score,
        "violence_score": result.violence_score,
        "categories": result.flagged_categories
    }
```
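The score fields are continuous, so in practice you compare them against thresholds you choose. A minimal sketch; the 0.7 cutoffs below are arbitrary placeholders, not Abstrakt defaults:

```python
# Placeholder thresholds: tune these for your own risk tolerance
NSFW_THRESHOLD = 0.7
VIOLENCE_THRESHOLD = 0.7

def is_image_acceptable(moderation):
    """Apply score thresholds to the dict returned by moderate_image()."""
    return (
        moderation["safe"]
        and moderation["nsfw_score"] < NSFW_THRESHOLD
        and moderation["violence_score"] < VIOLENCE_THRESHOLD
    )
```

Anything that fails the threshold check can be discarded or routed to human review rather than returned to the user.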
Safety Categories

| Category | Description |
|---|---|
| sexual | Adult/explicit content |
| violence | Graphic violence |
| hate | Discriminatory content |
| self-harm | Self-injury content |
| illegal | Illegal activities |
| harassment | Bullying/harassment |
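If you want to branch on specific categories, for example to decide between an outright block and a human-review queue, a small policy map keeps that decision in one place. A minimal sketch: the category names follow the table above, and the block/review choices are placeholders for your own policy.

```python
# Hypothetical policy: which categories hard-block vs. go to human review
CATEGORY_ACTIONS = {
    "sexual": "block",
    "violence": "block",
    "hate": "block",
    "self-harm": "review",
    "illegal": "block",
    "harassment": "review",
}

def decide_action(flagged_categories):
    """Return the strictest action required by the flagged categories."""
    actions = {CATEGORY_ACTIONS.get(c, "review") for c in flagged_categories}
    if "block" in actions:
        return "block"
    return "review" if actions else "allow"

# Example: a prompt flagged only for harassment goes to human review
print(decide_action(["harassment"]))  # review
```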
Custom Blocklists
Create custom word/phrase blocklists:
```python
BLOCKED_TERMS = [
    "specific_term_1",
    "specific_term_2",
    # Add your terms
]

def check_blocklist(prompt):
    """Check against custom blocklist."""
    prompt_lower = prompt.lower()
    for term in BLOCKED_TERMS:
        if term in prompt_lower:
            return False, term
    return True, None

# Usage
is_safe, blocked_term = check_blocklist(user_prompt)
if not is_safe:
    raise ValueError(f"Blocked term detected: {blocked_term}")
```
Implementing a Safety Pipeline

```python
class SafetyError(Exception):
    """Raised when a prompt or a generated result fails a safety check."""


class SafetyPipeline:
    def __init__(self, client):
        self.client = client
        self.blocked_terms = set()

    def add_blocked_terms(self, terms):
        self.blocked_terms.update(terms)

    async def generate_safely(self, prompt):
        # Step 1: Check blocklist
        if not self._check_blocklist(prompt):
            raise SafetyError("Blocked term detected")

        # Step 2: AI moderation
        safety = await self._check_ai_safety(prompt)
        if not safety["safe"]:
            raise SafetyError(f"Unsafe: {safety['categories']}")

        # Step 3: Generate
        result = await self.client.run_async("fal-ai/flux/dev", {
            "input": {"prompt": prompt, "safety_checker": True}
        })

        # Step 4: Post-generation check
        if result.nsfw_detected:
            raise SafetyError("Generated content flagged")

        return result

    def _check_blocklist(self, prompt):
        return not any(term in prompt.lower() for term in self.blocked_terms)

    async def _check_ai_safety(self, prompt):
        result = await self.client.run_async("fal-ai/content-moderation", {
            "input": {"text": prompt}
        })
        return {"safe": result.safe, "categories": result.flagged_categories}
```
Handling Flagged Content

```python
@app.post("/generate")
async def generate_endpoint(request: GenerateRequest):
    try:
        result = await safety_pipeline.generate_safely(request.prompt)
        return {"image_url": result.images[0].url}
    except SafetyError as e:
        # Log for review
        log_safety_incident(request.user_id, request.prompt, str(e))
        return {
            "error": "content_policy_violation",
            "message": "Your request couldn't be processed due to our content policy."
        }
```
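The decorator style above suggests FastAPI. If that's what you're using, consider signalling the rejection with an explicit HTTP status code instead of a 200 response; here is a sketch with a hypothetical helper, where the 422 status is a suggestion rather than a requirement:

```python
from fastapi.responses import JSONResponse

def policy_violation_response():
    """Build an error response for content-policy rejections."""
    return JSONResponse(
        status_code=422,  # pick the status code that fits your API conventions
        content={
            "error": "content_policy_violation",
            "message": "Your request couldn't be processed due to our content policy.",
        },
    )
```

In the `except SafetyError` branch you would then `return policy_violation_response()` instead of the plain dictionary.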
Logging & Monitoring

```python
from datetime import datetime

def log_safety_incident(user_id, prompt, reason):
    """Log safety incidents for review."""
    incident = {
        "user_id": user_id,
        "prompt": prompt,
        "reason": reason,
        "timestamp": datetime.utcnow(),
        "reviewed": False
    }

    # Save to database (`db` is your application's database handle)
    db.safety_incidents.insert_one(incident)

    # Alert if needed (`should_alert` and `send_alert` are application-specific)
    if should_alert(user_id):
        send_alert(f"Multiple safety violations from {user_id}")
```
Best Practices

- Layer defenses - Use multiple checks
- Log everything - Enable review and improvement
- Human review - Have moderators review edge cases
- Update regularly - Keep blocklists current
- Clear policies - Communicate guidelines to users
Next Steps
- Set up webhooks for async moderation
- Learn RAG patterns
- Explore API basics
#safety #moderation #content-filtering #trust