AI Safety Incident Classification
All incidents in the AI Incident Database have been processed by an LLM and classified according to the MIT Risk Repository causal and domain taxonomies. Each incident was then scored for harm severity on 10 dimensions based on the CSET AI Harm Taxonomy, on a scale ranging from zero impact to 'worst-case catastrophe'.
This is intended as a proof of concept to explore the potential capabilities and limitations of a scalable incident analysis framework. A write-up of this work will follow; in the meantime, please feel free to explore and share feedback.
Home: Risk Classification - distribution of incident classifications (causal and domain) across the entire dataset
Record View - classification and harm severity scores for each individual record, including a summary of the reasoning and the confidence in the analysis
Timeline: Risk Classification - distribution of incident classifications by year
Timeline: Sub-domains - distribution of incident sub-domains by year
Timeline: High Severity Incidents - incidents with high direct harm severity scores by year
Timeline: High Severity Multiple Categories - incidents causing severe harm in more than one harm category
Timeline: Direct Harm Caused - distribution of harm severity scores by year
Information Quality - assessment of confidence in classifications and of whether the reports included adequate detail
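For readers who want to reason about the severity views outside the dashboard, the logic behind "High Severity Multiple Categories" can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `IncidentRecord` class, the harm category names, the integer scores, and the threshold of 7 are all assumptions made for the example, since the numeric scale and the cut-off for "high" severity are not specified here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed cut-off for "high" severity; the real threshold is not stated.
HIGH_SEVERITY_THRESHOLD = 7


@dataclass
class IncidentRecord:
    """Hypothetical per-incident record with one score per harm category."""
    incident_id: int
    year: int
    harm_scores: Dict[str, int] = field(default_factory=dict)


def high_severity_categories(record: IncidentRecord,
                             threshold: int = HIGH_SEVERITY_THRESHOLD) -> List[str]:
    """Return the harm categories scored at or above the threshold."""
    return sorted(name for name, score in record.harm_scores.items()
                  if score >= threshold)


def multi_category_by_year(records: List[IncidentRecord],
                           threshold: int = HIGH_SEVERITY_THRESHOLD) -> Dict[int, int]:
    """Count, per year, incidents with high severity in more than one category."""
    counts: Dict[int, int] = {}
    for record in records:
        if len(high_severity_categories(record, threshold)) > 1:
            counts[record.year] = counts.get(record.year, 0) + 1
    return dict(sorted(counts.items()))


# Illustrative data only; category names and scores are invented.
records = [
    IncidentRecord(1, 2019, {"physical": 8, "financial": 9, "reputational": 2}),
    IncidentRecord(2, 2019, {"physical": 7, "financial": 1}),
    IncidentRecord(3, 2021, {"psychological": 9, "financial": 7, "physical": 7}),
    IncidentRecord(4, 2021, {"physical": 0}),
]

result = multi_category_by_year(records)
print(result)
```

Keeping the threshold as a parameter makes it easy to check how sensitive the yearly counts are to the definition of "high" severity.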
Example outputs: