Purpose: To protect users from harmful content or its consequences, and to ensure that legal and other compliance obligations are met and monitored.
| Organisational / Technical | Measure |
|---|---|
| A. Safeguards against misuse and prohibited usage | |
| TECH | AI models are trained on human-labelled datasets to automatically classify content into safety tiers at scale |
| TECH | Prohibited content is automatically detected and filtered during conversations |
| BOTH | Systems are in place to identify concerning behavioural patterns over time |
| BOTH | Measures are in place to identify and prevent attempts to circumvent safety systems |
| ORG | An escalation process is implemented and communicated, progressing from monitoring to highlighting, then warning, and finally a user ban |
| ORG | Regular legal team assessments of moderation decisions take place |
| B. Tiered severity system | |
| BOTH | Tiered severity system is periodically reviewed, inspected, and updated |
| BOTH | Tier definitions are continuously assessed against emerging patterns |
| ORG | Safety categories are developed collaboratively by manual moderators, data team, legal team, and payment platform representatives |
| C. Human oversight and moderation | |
| ORG | Datasets of examples labelled by human moderators are curated to train the automated systems |
| ORG | A team of human moderators is employed, with regular supervision and further training |
Source: AIOLIA deliverable 3.1
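The escalation process in measure A (monitoring → highlighting → warning → user ban) can be sketched as a small state machine fed by a severity classifier. The following is a minimal illustrative sketch, not the deliverable's implementation: the term-list classifier stands in for the human-label-trained AI model, and the `Tier` names, violation weights, and stage thresholds are all assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Safety tiers assigned to content by the classifier (names assumed)."""
    SAFE = 0
    SENSITIVE = 1
    PROHIBITED = 2


# Placeholder classifier: the deliverable describes an AI model trained on
# human-labelled datasets; a simple term list stands in for it here.
_PROHIBITED_TERMS = {"prohibited-term"}
_SENSITIVE_TERMS = {"sensitive-term"}


def classify(text: str) -> Tier:
    """Map a message to a safety tier (keyword stub, illustrative only)."""
    words = set(text.lower().split())
    if words & _PROHIBITED_TERMS:
        return Tier.PROHIBITED
    if words & _SENSITIVE_TERMS:
        return Tier.SENSITIVE
    return Tier.SAFE


# Escalation stages in ascending order, as listed in measure A.
STAGES = ("monitoring", "highlighting", "warning", "ban")


@dataclass
class EscalationLadder:
    """Accumulates a user's violations and reports the current stage."""
    violations: int = 0
    thresholds: tuple = (0, 2, 4, 6)  # illustrative, not from the source

    def record(self, tier: Tier) -> str:
        """Record one classified message and return the resulting stage."""
        if tier == Tier.PROHIBITED:
            self.violations += 2  # prohibited content escalates faster
        elif tier == Tier.SENSITIVE:
            self.violations += 1
        return self.stage()

    def stage(self) -> str:
        """Return the highest stage whose threshold has been reached."""
        current = STAGES[0]
        for stage, threshold in zip(STAGES, self.thresholds):
            if self.violations >= threshold:
                current = stage
        return current
```

In this sketch, safe messages leave the ladder untouched, while repeated prohibited content walks the user through each stage in order, matching the communicated escalation path rather than banning on first offence.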