Scalable AI Incident Classification
AI incident data could provide valuable safety lessons to inform policy decisions, but existing databases are often inconsistently structured or have missing data. This project uses a large language model (LLM) to process raw incident reports, classifying the types of risk and the severity of harm across multiple categories. The aim is to enrich these datasets and present the results graphically through a set of dashboards, so that policymakers can explore trends and patterns, gain insights into the impacts of AI on society, and use those insights to inform policy decisions.
Over 4,000 raw reports, covering all 880 incidents in the AI Incident Database, have been processed using an LLM. Each report was classified against the MIT Risk Repository causal and domain taxonomies, then scored for harm severity on 10 dimensions based on the CSET AI Harm Taxonomy, using a scale that runs from zero impact to 'worst-case catastrophe'.
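As a rough illustration of how such a pipeline can work, the sketch below asks an LLM to return a structured JSON classification for a single report. It assumes an OpenAI-style chat API; the model name, the abbreviated taxonomy label lists, the three example harm dimensions and the 0-to-5 severity scale are all illustrative placeholders rather than the project's actual prompts or schema.

```python
import json
from openai import OpenAI  # assumes the openai Python package (v1 API)

# Abbreviated, illustrative label sets, not the full MIT Risk Repository
# taxonomies or the full set of CSET-style harm dimensions.
CAUSAL_FIELDS = {
    "entity": ["Human", "AI"],
    "intent": ["Intentional", "Unintentional"],
    "timing": ["Pre-deployment", "Post-deployment"],
}
DOMAINS = [
    "Discrimination & toxicity",
    "Privacy & security",
    "Misinformation",
    "AI system safety, failures & limitations",
]
HARM_DIMENSIONS = ["Physical health", "Financial loss", "Psychological harm"]

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_report(report_text: str) -> dict:
    """Ask the model for a structured classification of one raw incident report."""
    prompt = (
        "Classify the AI incident report below.\n"
        f"Causal taxonomy options: {json.dumps(CAUSAL_FIELDS)}\n"
        f"Domain options: {DOMAINS}\n"
        f"For each harm dimension in {HARM_DIMENSIONS}, give an integer severity "
        "from 0 (no harm) to 5 (worst-case catastrophe).\n"
        "Respond with JSON only, using the keys: causal, domains, severity.\n\n"
        f"Report:\n{report_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # request structured JSON output
    )
    return json.loads(response.choices[0].message.content)
```

Requesting structured JSON rather than free text keeps the per-report classifications machine-readable, so they can be aggregated directly into the datasets behind the dashboards.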
Important note on data and validity of analysis:
This classification database is intended to explore the potential capabilities and limitations of a scalable incident-analysis framework. The classification analysis uses reports from the AI Incident Database (AIID) as input data; these rely on submissions from the public and from subject-matter experts, and the quality, reliability and depth of detail in the reports varies across the dataset. Because reporting is voluntary, the dataset is also inevitably subject to some degree of sampling bias.
Patterns and trends observed in the data should therefore be treated as indicative and validated through further analysis.
The background to this work, the approach taken, preliminary results and next steps are discussed in this blog post.
All feedback welcome - to get in touch, help shape the direction of this work or sign up for updates, please use this feedback form.
Click through the links below to explore each of the interactive dashboards:
[Three interactive dashboards are embedded here, each accompanied by a 'Key Insights' summary.]