Moderation Systems
Our moderation pipeline combines automated detection, human review, and creator accountability to enforce policies at scale while minimizing errors.
Last updated: February 2026
Three-Layer Architecture
Layer 1: Automated Detection
Machine learning models scan uploads for policy violations before content is published, catching CSAM, spam, known harmful media, and dangerous content in real time.
Layer 2: Community Reporting
Users can report content, comments, messages, and accounts. Reports are prioritized by severity and volume. Reporters with a history of valid reports receive higher trust scores.
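As an illustration of how severity, volume, and reporter trust could combine into one ranking value, here is a minimal sketch. The weights, the `Report` fields, and the `priority_score` formula are all hypothetical; this page does not specify how the production system scores reports.

```python
from dataclasses import dataclass

# Hypothetical severity weights; the real values are not given in this document.
SEVERITY_WEIGHT = {"critical": 1000, "high": 100, "medium": 10, "low": 1}


@dataclass
class Report:
    content_id: str
    severity: str          # one of the SEVERITY_WEIGHT keys
    reporter_trust: float  # assumed range: 0.0 (untrusted) to 1.0 (fully trusted)


def priority_score(reports: list[Report]) -> float:
    """Score one piece of content from all reports filed against it."""
    if not reports:
        return 0.0
    # Severity dominates: take the most severe category reported.
    top_severity = max(SEVERITY_WEIGHT[r.severity] for r in reports)
    # Volume counts, but each report is weighted by the reporter's trust score.
    trusted_volume = sum(r.reporter_trust for r in reports)
    return top_severity * (1.0 + trusted_volume)
```

In this sketch a single critical report always outranks a handful of high-severity reports, while within one severity tier, more reports from more trusted reporters push the content up the queue.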
Layer 3: Human Review
Every enforcement action is reviewed by a trained human moderator. Complex and borderline cases are escalated to senior Trust & Safety staff.
Moderation Queue
Reports and automated flags enter a structured queue, where each case is assigned a priority level and a response target.
Service Level Agreements
| Priority Level | Description | Response Target |
|---|---|---|
| Critical | CSAM, terrorism, imminent violence threats | < 1 hour |
| High | Harassment, hate speech, dangerous misinformation | < 4 hours |
| Medium | Spam, impersonation, misleading content | < 24 hours |
| Low | Minor policy violations, quality disputes | < 72 hours |
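One way to turn the response targets above into a working queue is to compute a deadline for each case and always serve the earliest deadline first. The sketch below assumes that approach; the `ModerationQueue` class and its method names are hypothetical, and only the four time targets come from the table.

```python
import heapq
from datetime import datetime, timedelta, timezone

# Response targets taken from the table above.
RESPONSE_TARGET = {
    "critical": timedelta(hours=1),
    "high": timedelta(hours=4),
    "medium": timedelta(hours=24),
    "low": timedelta(hours=72),
}


class ModerationQueue:
    """Min-heap keyed by response deadline, so the most urgent case pops first."""

    def __init__(self) -> None:
        self._heap: list[tuple[datetime, int, str]] = []
        self._counter = 0  # tie-breaker: equal deadlines pop in insertion order

    def add(self, case_id: str, priority: str, received_at: datetime) -> datetime:
        """Queue a case and return the deadline implied by its priority level."""
        deadline = received_at + RESPONSE_TARGET[priority]
        heapq.heappush(self._heap, (deadline, self._counter, case_id))
        self._counter += 1
        return deadline

    def pop_next(self) -> tuple[str, datetime]:
        """Remove and return the case whose deadline is closest."""
        deadline, _, case_id = heapq.heappop(self._heap)
        return case_id, deadline


now = datetime(2026, 2, 1, 12, 0, tzinfo=timezone.utc)
queue = ModerationQueue()
queue.add("spam-1", "medium", now)
queue.add("threat-1", "critical", now)
print(queue.pop_next()[0])  # threat-1
```

Ordering by deadline rather than by priority label alone means a low-priority case that has waited nearly 72 hours can move ahead of a newer case, which keeps older reports from being starved.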
Automated Systems
- Image/video scanning: Hashing and ML classification for known harmful content (PhotoDNA integration for CSAM).
- Audio analysis: Copyright fingerprinting and harmful speech detection.
- Text classification: Real-time comment and message scanning for spam, hate speech, and threats.
- Behavioral analysis: Detection of coordinated inauthentic behavior, spam rings, and bot networks.
- Metadata analysis: Identifying suspicious upload patterns, account creation anomalies, and evasion tactics.
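The hash-matching step in the first item above can be illustrated with a simple lookup against a set of known hashes. This sketch uses SHA-256 as a stand-in and is not PhotoDNA: a cryptographic hash only matches byte-identical files, whereas perceptual hashing is designed to also match resized or re-encoded copies. The function names and the `known_hashes` set are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the raw upload bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_known_harmful(upload: bytes, known_hashes: set[str]) -> bool:
    """Exact-match lookup; a stand-in for perceptual hash matching."""
    return sha256_hex(upload) in known_hashes


# Hypothetical hash list; real deployments obtain these from vetted sources.
known_hashes = {sha256_hex(b"example-known-bad-file")}
print(is_known_harmful(b"example-known-bad-file", known_hashes))   # True
print(is_known_harmful(b"example-known-bad-file!", known_hashes))  # False
```

The second call shows the limitation of exact hashing: changing a single byte defeats the match, which is why hash lookup is paired with ML classification in the list above.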
Human Moderator Standards
- All moderators complete 40+ hours of training before reviewing live cases.
- Regular calibration sessions ensure consistency across the team.
- Quality audits review a random sample of decisions weekly.
- Moderators have access to wellness support and mandatory break schedules.
- Complex or borderline cases require senior reviewer sign-off.
False Positive Handling
We recognize that automated systems and human reviewers can make mistakes. We take the following steps to minimize harm from false positives:
- Automated removals for non-critical violations are held in a review queue before enforcement.
- Creators are notified immediately and can appeal with one click.
- False positive appeals receive expedited review (target: 24 hours).
- Repeated false positives on the same creator trigger a system recalibration review.
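The hold-before-enforcement rule and the recalibration trigger described above could be sketched as follows. The threshold of three upheld appeals, the assumption that only the Critical tier bypasses the hold, and all names here are hypothetical; this page does not state the actual values.

```python
from collections import Counter

CRITICAL_PRIORITIES = {"critical"}  # assumption: only the Critical tier bypasses the hold
RECALIBRATION_THRESHOLD = 3         # hypothetical; the document gives no number


def route_automated_removal(priority: str) -> str:
    """Decide whether an automated removal applies now or waits in the review queue."""
    return "enforce_now" if priority in CRITICAL_PRIORITIES else "hold_for_review"


class FalsePositiveTracker:
    """Counts upheld false-positive appeals per creator."""

    def __init__(self, threshold: int = RECALIBRATION_THRESHOLD) -> None:
        self.threshold = threshold
        self.counts: Counter[str] = Counter()

    def record_upheld_appeal(self, creator_id: str) -> bool:
        """Record a confirmed false positive; return True if a recalibration review is due."""
        self.counts[creator_id] += 1
        return self.counts[creator_id] >= self.threshold
```

Counting only upheld appeals, rather than every appeal filed, keeps the recalibration trigger tied to confirmed mistakes instead of appeal volume.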