Purpose: To protect users from harmful content or its consequences, and to ensure that legal and other compliance obligations are met and monitored.
| Organisational / Technical | Measure |
|---|---|
| A. Safeguards against misuse and prohibited usage | |
| TECH | AI models are trained on human-labelled datasets to automatically classify content into safety tiers at scale |
| TECH | Prohibited content is automatically detected and filtered during conversations |
| BOTH | Systems are in place to identify concerning behavioural patterns over time |
| BOTH | Measures are in place to identify and prevent attempts to circumvent safety systems |
| ORG | An escalation process is implemented and communicated, progressing from monitoring to highlighting, then warning, and finally a user ban |
| ORG | Regular legal team assessments of moderation decisions take place |
| B. Tiered severity system | |
| BOTH | Tiered severity system is periodically reviewed, inspected, and updated |
| BOTH | Tier definitions are continuously assessed against emerging patterns |
| ORG | Safety categories are developed collaboratively by manual moderators, data team, legal team, and payment platform representatives |
| C. Human oversight and moderation | |
| ORG | Datasets of examples labelled by human moderators are curated to train the automated systems |
| ORG | A team of human moderators is employed, with regular supervision and further training |
Source: AIOLIA deliverable 3.1
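The escalation process in measure A (monitoring → highlighting → warning → user ban) can be sketched as a small state machine fed by a severity classifier. The following is a minimal illustrative sketch, not the deliverable's implementation: the term-list classifier stands in for the human-label-trained AI model, and the `Tier` names, violation weights, and stage thresholds are all assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Safety tiers assigned to content by the classifier (names assumed)."""
    SAFE = 0
    SENSITIVE = 1
    PROHIBITED = 2


# Placeholder classifier: the deliverable describes an AI model trained on
# human-labelled datasets; a simple term list stands in for it here.
_PROHIBITED_TERMS = {"prohibited-term"}
_SENSITIVE_TERMS = {"sensitive-term"}


def classify(text: str) -> Tier:
    """Map a message to a safety tier (keyword stub, illustrative only)."""
    words = set(text.lower().split())
    if words & _PROHIBITED_TERMS:
        return Tier.PROHIBITED
    if words & _SENSITIVE_TERMS:
        return Tier.SENSITIVE
    return Tier.SAFE


# Escalation stages in ascending order, as listed in measure A.
STAGES = ("monitoring", "highlighting", "warning", "ban")


@dataclass
class EscalationLadder:
    """Accumulates a user's violations and reports the current stage."""
    violations: int = 0
    thresholds: tuple = (0, 2, 4, 6)  # illustrative, not from the source

    def record(self, tier: Tier) -> str:
        """Record one classified message and return the resulting stage."""
        if tier == Tier.PROHIBITED:
            self.violations += 2  # prohibited content escalates faster
        elif tier == Tier.SENSITIVE:
            self.violations += 1
        return self.stage()

    def stage(self) -> str:
        """Return the highest stage whose threshold has been reached."""
        current = STAGES[0]
        for stage, threshold in zip(STAGES, self.thresholds):
            if self.violations >= threshold:
                current = stage
        return current
```

In this sketch, safe messages leave the ladder untouched, while repeated prohibited content walks the user through each stage in order, matching the communicated escalation path rather than banning on first offence.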