Root Cause Analysis of AI Safety Incidents
Summary
Applying Root Cause Analysis tools that are widely used in other areas of engineering could provide insights into the causes of AI safety incidents, helping to reduce the likelihood of recurrence and to prioritise risk mitigation and harm reduction measures.
This post proposes and demonstrates an approach that uses language models to process incident reports and infer potential causes.
If used with due consideration of its capabilities and shortcomings, the approach could offer a scalable methodology for aggregating causality data from historic incidents, and potentially from modelled future scenarios, contributing to a reduction in the number of future safety incidents and the severity of the harm they cause.
Note on feedback:
This is early-stage work, based on my project within the BlueDot Impact AI Safety Fundamentals course. I will continue to develop it, testing the validity of the approach and assessing the quality of the outputs.
Feedback is very welcome.
Problem Statement
Data-driven prioritisation of risk mitigation and harm reduction measures (both technical and governance) would help focus effort and resources where they have the greatest impact on reducing the likelihood and severity of harm from future AI safety incidents, and would systematically address the most significant vulnerabilities.
It is difficult to prioritise effort on developing interventions because there is currently limited data and analysis available on root causes of AI safety incidents.
The AI Incident Database (AIID) and other resources capture details of incidents - an opportunity exists to provide valuable insights through Root Cause Analysis.
Existing Causality Analysis
The AIID classifies failure causes according to the Goals, Methods, Failures (GMF) taxonomy, providing a high-level indication of cause areas.
From AI Incident Database: Known AI Technical Failures
Analysis at this level is valuable for identifying high-level cause areas on which to focus; however, it does not support a systematic evaluation of causes that takes into consideration combinations of multiple causes contributing to an incident. It would be insightful to dig deeper into the reports to understand the make-up of the large group of causes labelled "All Others" (see the note on taxonomies below).
Root Cause Analysis
Root Cause Analysis (RCA) is a methodical approach developed within a Systems Engineering framework. The intention is to explore and understand the causal factors that trigger unintended consequences. It is widely used in industries where failures could result in significant harm or safety issues and where system complexity makes problem diagnosis non-obvious.
Examples of RCA in ‘High-hazard industries’
Safety record improvements have been attributed to RCA in industries where significant safety risks need to be managed - a few examples include:
Rail - RCA of the 1988 Clapham Junction rail crash triggered rigorous testing protocols for signalling equipment, the 1994 Railways (Safety Case) Regulations and the establishment of industry-wide Safety Management Systems.
Automotive - following 'unintended acceleration' events in Toyota cars, RCA resulted in the U.S. National Highway Traffic Safety Administration proposing new brake-throttle override standards.
Nuclear - RCA following the 1979 partial meltdown of the Three Mile Island reactor resulted in improvements to processes, design standards and regulations, including the Nuclear Regulatory Commission introducing requirements for emergency response plans, regular safety drills and on-site technical support centres.
Aerospace - following the 1986 Challenger Space Shuttle disaster, NASA's safety protocols were overhauled and the Office of Safety and Mission Assurance was created.
In these and many other industries (including aviation, chemical, oil & gas, healthcare), the application of Root Cause Analysis within a wider Systems Engineering approach has contributed to the adoption of a safety culture and a marked improvement in the safety record of these industries.
RCA Techniques
A number of techniques have been developed, each with different benefits, and they can be used in a complementary manner within frameworks such as 8D. The ones I will focus on here are:
Five Whys Analysis - repeatedly asks 'why' to uncover potential causes at successively deeper levels. It is useful for probing issues that may not be immediately obvious; however, it works along only a single causal chain and does not take into consideration multiple interacting factors, which can lead to an oversimplified understanding of complex problems.
Ishikawa or Fishbone Diagram - a simple visualisation used to explore multiple potential causes that may contribute to a failure or incident, categorised into cause areas.
Fault Tree Analysis - a deductive method used to identify potential root causes by mapping out the logical relationships between sub-events leading to a top-level incident. Logic gates (AND, OR) connect basic events to higher-level events in a tree-like structure, helping to visualise and analyse the pathways to failure.
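To make the fault tree structure concrete, here is a minimal Python sketch showing how AND/OR gates can be represented and evaluated. The event names are invented purely for illustration and are not taken from a real analysis.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class BasicEvent:
    """A leaf node: a base event that either occurs or does not."""
    name: str

@dataclass
class Gate:
    """An intermediate or top event combining child events with AND/OR logic."""
    name: str
    gate_type: str  # "AND" or "OR"
    children: List[Union["Gate", BasicEvent]] = field(default_factory=list)

def occurs(node, active_events):
    """Return True if this event occurs, given the set of basic events that are active."""
    if isinstance(node, BasicEvent):
        return node.name in active_events
    results = [occurs(child, active_events) for child in node.children]
    return all(results) if node.gate_type == "AND" else any(results)

# Hypothetical tree: the top-level incident requires a false match AND a failure
# of independent human verification.
tree = Gate("Wrongful identification", "AND", [
    Gate("False match produced", "OR", [
        BasicEvent("Unrepresentative training data"),
        BasicEvent("Low-quality probe image"),
    ]),
    BasicEvent("No independent human verification"),
])

print(occurs(tree, {"Unrepresentative training data"}))  # False - human review still blocks it
print(occurs(tree, {"Unrepresentative training data",
                    "No independent human verification"}))  # True
```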
Opportunity
From conversations with people responsible for incident databases and others working in AI safety policy and research across the sector, it seems that whilst incident data is available, there is limited systematic work on causality analysis.
Creating a framework to apply these techniques to analyse AI safety incidents is intended to:
Improve the ability to identify actual and potential causes in order to avoid them in future
Generate collated datasets from multiple incidents to understand commonalities and themes
Visually present the output of the analysis in a consistent manner making it easier to extract salient points
Use the output of example analyses to create case studies that communicate different types of risks and causes and explore potential interventions
Subject to the availability of high-quality input data, quantify the likelihood of different potential causes and how they combine to contribute to the likelihood of incidents
Discover causal relationships that had not previously been identified.
Data Sources
Incident data has been collated and made public by a number of organisations including:
Methodologies used by incident databases vary - some rely on automated submission of news reports, while others have human editors review reports submitted through networks of contributors.
A recurring theme is that there are backlogs of reported incidents awaiting review to ensure accurate/reliable data is posted.
Scalability
With the rapid growth in the deployment and use of AI systems, the number of reported safety incidents has increased enormously, and this growth can be expected to continue for the foreseeable future.
Any RCA methodology will be much more valuable if it can scale to keep up with the increasing number of incidents. This has prompted exploration of using language models to perform RCA in place of a team of humans.
Theory of Change
This theory of change captures how the RCA framework could contribute to the high-level goal of: a reduction in the number of future AI safety incidents and in the severity of the harm they cause.
Assumptions (shown in parentheses in the diagram above)
The proposed inputs, outputs, outcomes and goals in this Theory of Change are based on a number of assumptions which must be tested robustly to evaluate the validity and usefulness of this approach - further work to come on this. They include:
Tools and frameworks such as Fault Tree Analysis, Five Whys and Ishikawa Diagrams can actually provide insight into AI safety incidents, and the quality of the output can be evaluated (e.g. compared with what a human RCA team would achieve)
Available incident reports contain sufficiently accurate and reliable details of incidents for analysis to be meaningful
Scenario modelling is sufficiently robust for RCA to be meaningful (high quality simulation)
A dataset of machine-generated Root Cause Analyses would be sufficient as input to a meta-analysis from which genuine understanding can be gained
Evidence is available mapping the effectiveness of interventions against causes of harm
Confidence in the validity of the analyses would be sufficient for researchers or policymakers as a basis for further work or policy decisions
There is a need for this type of analysis that is not already being adequately covered/addressed
There is a demand for a tool/framework to allow others to more easily conduct this type of analysis
Developers/deployers are willing to share data about systems involved in incidents and/or use third party tools for RCA
Improved understanding of root causes translates into more effective policy
Improved understanding of causes of AI risks results in safer AI systems
Root Cause Analysis will not contribute to overconfidence in the safety of systems (which would reduce the perceived imperative for governments and AI companies to act cautiously).
Proof of Concept
I have used GPT-4o combined with a Python script and some visualisation tools to take incident reports from the AI Incident Database as input.
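As an illustration of the mechanics, the sketch below shows (in heavily simplified form) how one analysis step can be driven from a prompt. It assumes the OpenAI Python client, and the prompt wording and report text are placeholders rather than the exact ones used in the project.

```python
from openai import OpenAI  # assumes the OpenAI Python client and an OPENAI_API_KEY in the environment

client = OpenAI()

FIVE_WHYS_PROMPT = (
    "You are assisting with Root Cause Analysis of an AI safety incident. "
    "Apply the Five Whys technique to the incident report below: state the problem, "
    "then ask and answer 'why' five times, each answer probing one level deeper, "
    "and finish with a short conclusion. Rely only on information in the report."
)

def five_whys(incident_report: str, model: str = "gpt-4o") -> str:
    """Run a Five Whys analysis over a single incident report and return the text."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": FIVE_WHYS_PROMPT},
            {"role": "user", "content": incident_report},
        ],
    )
    return response.choices[0].message.content

# Placeholder text standing in for a report pulled from the AI Incident Database.
report = "Incident 74: A facial recognition system misidentified an innocent man, leading to a wrongful arrest..."
print(five_whys(report))
```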
The program runs Five Whys, Ishikawa and Fault Tree Analysis and outputs a report including some commentary:
Example outputs for each type of analysis are shown below, using Incident 74 from the AI Incident Database as the example input.
Five Whys Analysis
Problem Statement:
Incident 74: The Detroit Police Department wrongfully arrested Robert Julian-Borchak Williams after facial recognition technology mistakenly identified him as a suspect.
Application of the Five Whys:
Why did the facial recognition technology mistakenly identify Robert as the suspect?
The facial recognition system had higher error rates when identifying black individuals.
Why did the system have higher error rates for black individuals?
The algorithm was trained on a dataset that did not have a sufficient variety of images of people from different racial backgrounds.
Why did the training dataset not have a sufficient variety of images?
The dataset was compiled without considering the need for racial diversity to ensure accuracy across different demographics.
Why was the need for racial diversity in the dataset not considered?
There was a lack of awareness or oversight regarding the impact of dataset diversity on the accuracy of facial recognition technology.
Why was there a lack of awareness or oversight regarding this impact?
The development and deployment guidelines lacked comprehensive standards and requirements that ensured ethical AI practices, including the necessity for diverse data to prevent bias.
Conclusion of Five Whys Analysis:
The wrongful arrest occurred primarily due to biased algorithmic output from the facial recognition technology, which in turn was caused by insufficiently diverse training data. This was a result of the lack of stringent standards and ethical considerations in the AI development and deployment process, particularly concerning the representation of different racial groups in training datasets. This analysis points to a need for better regulatory frameworks, greater awareness of AI ethics, and more rigorous testing against bias in AI systems used in law enforcement and other critical applications.
Ishikawa Diagram
Fault Tree Analysis
Is / Is Not…
This approach to Root Cause Analysis is not:
For the avoidance of doubt, I am not claiming that this type of analysis can definitively determine root causes for incidents.
The quality of a Root Cause Analysis depends on the quality of the information available and the knowledge and understanding of the team conducting the analysis. Typically, an RCA following an incident would be done by a team including people with knowledge of the incident as well as (independent) subject matter experts. Using a language model instead relies on the content of the input data plus the knowledge implicit in the language model.
This limitation is acknowledged and reviews of outputs will be necessary to ensure the analysis does not hallucinate assertions about a specific incident based on generalisations from training data.
This approach to Root Cause Analysis is/could be:
Rather, this type of analysis could be used to explore and capture factors that may have contributed to an incident. My hope is that it could be a shortcut to surfacing potential causes that had not previously been identified, which can then be used as a starting point for further analysis (e.g. by human teams).
Further development of the combinatorial decomposition from fault trees to identify Minimal Cut Sets (the smallest combinations of base events sufficient to cause the top-level incident) could be combined with a gap analysis of intervention measures to ensure that, for every risk, at least one of the requisite vulnerabilities has been blocked (the Swiss Cheese Model of Accident Causation).
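As a sketch of how that decomposition could work, the function below expands a small AND/OR fault tree (the same hypothetical structure as the earlier sketch, written here as nested tuples) into its minimal cut sets.

```python
from itertools import product

# Each node is either a string (basic event) or a tuple: ("AND" | "OR", [children]).
def cut_sets(node):
    """Return the cut sets (as frozensets of basic events) that cause this event."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":  # any one child's cut set is enough to cause this event
        return [cs for sets in child_sets for cs in sets]
    # AND: take one cut set from every child and merge them
    return [frozenset().union(*combo) for combo in product(*child_sets)]

def minimal_cut_sets(node):
    """Keep only minimal cut sets: discard any set that is a superset of another."""
    sets = cut_sets(node)
    return [s for s in sets if not any(other < s for other in sets)]

# Hypothetical tree: the incident needs a false match AND a failure of human review.
tree = ("AND", [
    ("OR", ["unrepresentative training data", "low-quality probe image"]),
    "no independent human verification",
])

for mcs in minimal_cut_sets(tree):
    print(sorted(mcs))
# -> ['no independent human verification', 'unrepresentative training data']
#    ['low-quality probe image', 'no independent human verification']
```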
Data Quality
Given the quality and depth of information available in current publicly available reports, which are frequently based on mainstream news feeds, the quality of the output will be limited.
Looking ahead, it may be possible to work more closely with AI companies, regulators or independent agencies (e.g. evaluation/audit bodies) and access far more, and better-quality, data, enabling deeper analysis and higher confidence in the assertions made.
Assuming that only a portion of incidents are reported, the resulting data does not reflect the true frequency and severity of risks, leading to an underreporting bias. Where reports originate from mainstream media, we might expect the proportion of unreported incidents to increase as such stories become less newsworthy. This underreporting bias needs to be considered before drawing conclusions from aggregated data.
To maximise benefits from RCA of incidents using this approach, a culture that encourages transparency around potentially harmful AI incidents needs to be instilled within and outside the industry (e.g. the FAA’s voluntary reporting program for safety incidents in aviation). Legislation stipulating that incidents must be reported in detail by AI companies and third parties who become aware of them could go some way towards addressing the underreporting issues, as well as improving data quality if it comes from accountable sources.
As noted in the Scalability section above, the number of incidents is expected to keep increasing. At present, AIID has an editing team reviewing reports that come in ahead of classifying and posting incidents. This human intervention provides some assurance as to data quality and we have to recognise that the assertions made by the LM are based on the classifications and comments made by the editors.
It would be interesting to apply RCA to ‘unfiltered’ reports and assess whether the framework can recognise incomplete/inadequate input data or reports from unreliable sources and make only valid assertions, potentially including a confidence rating to indicate how well supported each point is by evidence from the dataset.
Taxonomies
The application of taxonomies, such as the CSET v1 / GMF taxonomies used by the AI Incident Database and the taxonomy used by the AVID database, provides some consistency and supports high-level pattern identification. The proposed RCA approach could provide a wider range of classifications of potential causes of harm.
Cluster Analysis
The AI Incident Database has already implemented Natural Language Processing to identify similarities in the text of incident reports and group them into a spatial visualisation.
Applying a similar cluster analysis to the outputs of the RCA framework, across datasets comprising numerous incident reports, offers an opportunity to identify correlations and patterns in causes at a lower level. Data could be captured in a vector database, supporting a range of queries. Identification of outliers could suggest causal relationships that had not previously been understood, prompting further research.
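To illustrate the idea, the sketch below clusters a handful of placeholder cause statements (standing in for causes extracted by the RCA framework) using TF-IDF features and k-means from scikit-learn. For a real dataset, a sentence-embedding model and a vector database could be swapped in.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Placeholder cause statements, standing in for causes extracted from many incident reports.
causes = [
    "training data lacked demographic diversity",
    "insufficient human review of automated decisions",
    "model deployed without adversarial robustness testing",
    "dataset under-represented minority groups",
    "no override process for operators to question model output",
    "evaluation did not cover out-of-distribution inputs",
]

# Embed the statements (TF-IDF here; a sentence-embedding model could replace it).
vectors = TfidfVectorizer().fit_transform(causes)

# Group into a small number of clusters to surface recurring cause themes.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(vectors)

for label, cause in sorted(zip(kmeans.labels_, causes)):
    print(label, cause)
```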
An increased understanding of patterns where multiple causes interact to increase the likelihood of incidents would provide data on which to base future policy or intervention decisions. Whilst recognising that AI progress is non-linear, and that extrapolation from historic trends cannot be relied on to predict future scenarios, this type of analysis may still provide valuable early signals of changes in the distribution of causes of potential harm.
Analysis of Simulated Incidents/Scenarios
Besides reviewing historic data, there may also be value in applying machine-generated RCA to simulated future scenarios, particularly to understand combinatorial factors and whether interventions that prevent a single cause would prevent the top-level incident from occurring. Once again, data quality will be key to getting valuable outputs - this will require a high-fidelity simulation model.
Intervention Modelling
By combining these analyses with data on the efficacy of different interventions at preventing specific events (causes of AI safety incidents), it would be possible to model which interventions would have prevented an incident. This could feed into a cost-benefit analysis supporting prioritisation work, channelling resources towards developing the interventions that will have the greatest impact.
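To make this concrete, here is a rough sketch using entirely hypothetical data: minimal cut sets from a fault tree, plus an invented mapping of interventions to the base events they block, with made-up costs purely for illustration.

```python
from itertools import combinations

# Minimal cut sets for a hypothetical incident: each set of base events is sufficient
# on its own to cause the top-level incident.
cut_sets = [
    {"unrepresentative training data", "no independent human verification"},
    {"low-quality probe image", "no independent human verification"},
]

# Hypothetical interventions: the base events each one blocks, with illustrative costs.
interventions = {
    "diverse dataset audit":    {"blocks": {"unrepresentative training data"}, "cost": 3},
    "image quality thresholds": {"blocks": {"low-quality probe image"}, "cost": 1},
    "mandatory human review":   {"blocks": {"no independent human verification"}, "cost": 2},
}

def prevents_incident(selected):
    """True if every cut set contains at least one blocked base event."""
    if not selected:
        return False
    blocked = set().union(*(interventions[name]["blocks"] for name in selected))
    return all(cs & blocked for cs in cut_sets)

# Which single interventions block every pathway to the incident on their own?
for name in interventions:
    print(name, "prevents incident alone:", prevents_incident({name}))

# Brute-force the cheapest combination of interventions that prevents the incident.
names = list(interventions)
best = None
for r in range(1, len(names) + 1):
    for combo in combinations(names, r):
        if prevents_incident(set(combo)):
            cost = sum(interventions[n]["cost"] for n in combo)
            if best is None or cost < best[0]:
                best = (cost, combo)
print("Cheapest preventing combination:", best)
```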
Unfortunately such data, mapping interventions against risks, appears to be sparse - I would be grateful to hear about any systematic or quantitative work in this space.
Validation and Next Steps
In order to determine how useful this approach could be, I plan to compare LM generated RCAs with RCAs for the same incident reports conducted by human teams. It will be useful to understand where the strengths and weaknesses of the LM analysis lie.
It is worth bearing in mind that, even if there are flaws in analyses produced by currently available language models, we can expect the quality of analysis to improve as LM capabilities and performance improve in the coming months and years, and with it the value of the insights this approach could bring. Prompt engineering has already yielded significant improvements in the quality of the outputs, and I will continue to explore how to obtain the highest degree of confidence in analyses, as well as clear indications of where there was inadequate data to draw useful conclusions.
Besides testing out the assumptions in the Theory of Change above, I plan to sketch out how stakeholders could interact with this framework and explore with people working in the sector what would make it more useful to them, particularly with a view to informing policy decisions.
I will create a dataset of RCAs for a number of incidents and use cluster analysis to explore groupings, correlations and outliers.
Looking forward to receiving more feedback!
Appendix
Link to example Full RCA Report generated by the framework for AIID Incident 74, including:
Five Whys
Ishikawa Diagram
Fault Tree Analysis
Risk Mitigation and Harm Reduction Recommendations
Assessment of confidence in analysis and adequacy of data
The report should be seen as a demonstration of the framework and what is possible. The quality of the analysis itself and a review of ways to improve it is a subject for future work.
Revisions
11 June 2024 - Added example report as Appendix. Replaced FTA and Ishikawa diagrams in blog to match those in report for consistency.