Scalable AI Incident Classification
AI incident data could provide valuable safety lessons to inform policy decisions, but existing databases are often inconsistently structured or have missing data. This project uses a large language model (LLM) to process raw incident reports, classifying the types of risk and the severity of harm across multiple categories. The aim is to enrich these datasets and present the results graphically through a set of dashboards, so that policymakers can explore trends and patterns, gain insights into the impacts of AI on society, and use those insights to inform policy decisions.
Over 4,000 raw reports, covering all 880 incidents in the AI Incident Database, have been processed using an LLM. Each report was classified against the MIT Risk Repository causal and domain taxonomies, then scored for harm severity on 10 dimensions based on the CSET AI Harm Taxonomy, using a scale that runs from zero impact to 'worst-case catastrophe'.
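As a rough illustration of how such a pipeline can work, the sketch below asks an LLM to return a structured JSON classification for a single report. It assumes an OpenAI-style chat API; the model name, the abbreviated taxonomy label lists, the three example harm dimensions and the 0-to-5 severity scale are all illustrative placeholders rather than the project's actual prompts or schema.

```python
import json
from openai import OpenAI  # assumes the openai Python package (v1 API)

# Abbreviated, illustrative label sets, not the full MIT Risk Repository
# taxonomies or the full set of CSET-style harm dimensions.
CAUSAL_FIELDS = {
    "entity": ["Human", "AI"],
    "intent": ["Intentional", "Unintentional"],
    "timing": ["Pre-deployment", "Post-deployment"],
}
DOMAINS = [
    "Discrimination & toxicity",
    "Privacy & security",
    "Misinformation",
    "AI system safety, failures & limitations",
]
HARM_DIMENSIONS = ["Physical health", "Financial loss", "Psychological harm"]

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_report(report_text: str) -> dict:
    """Ask the model for a structured classification of one raw incident report."""
    prompt = (
        "Classify the AI incident report below.\n"
        f"Causal taxonomy options: {json.dumps(CAUSAL_FIELDS)}\n"
        f"Domain options: {DOMAINS}\n"
        f"For each harm dimension in {HARM_DIMENSIONS}, give an integer severity "
        "from 0 (no harm) to 5 (worst-case catastrophe).\n"
        "Respond with JSON only, using the keys: causal, domains, severity.\n\n"
        f"Report:\n{report_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # request structured JSON output
    )
    return json.loads(response.choices[0].message.content)
```

Requesting structured JSON rather than free text keeps the per-report classifications machine-readable, so they can be aggregated directly into the datasets behind the dashboards.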
Important note on data and validity of analysis:
This classification database is intended to explore the potential capabilities and limitations of a scalable incident-analysis framework. The classification analysis uses reports from the AI Incident Database (AIID) as input data; these rely on submissions from the public and from subject-matter experts, and the quality, reliability and depth of detail in the reports varies across the dataset. Because reporting is voluntary, the dataset is also inevitably subject to some degree of sampling bias.
Patterns and trends observed in the data should therefore be treated as indicative and validated through further analysis.
The background to this work, the approach taken, preliminary results and next steps are discussed in this blog post.
All feedback welcome - to get in touch, help shape the direction of this work or sign up for updates, please use this feedback form.
Click through the links below to explore each of the interactive dashboards:
[Three interactive dashboards are embedded here, each accompanied by a 'Key Insights' summary.]