This is a research prototype. The data and analyses are preliminary and not yet validated; we'd welcome your feedback.
Experimental. This compares how each corpus describes causes — not whether risk forecasts come true. Differences are driven as much by what researchers choose to study and how incidents get reported and coded as by anything substantive, so read them as differences in emphasis, not evidence that predictions are wrong.

Risks vs Incidents: Cause Mismatch

How two AI-risk corpora describe the causes of harm, per subdomain — the risks catalogued from academic and policy sources versus the incidents logged in real-world reports. The chart surfaces where the two place different emphasis on who caused a harm and with what intent. Percentages use coded-only denominators, excluding “Not coded” records.

Each subdomain row is a dumbbell on a shared 0–100% axis. For every causal value, a hollow dot marks the share among risks and a filled dot the share among incidents; the connecting line is the mismatch — the longer it is, the more the incident record emphasizes that cause differently from the risk literature. The strip at the top shows the systematic shift of each value across all reliable subdomains. Dot size reflects incident sample size; click any row for the full breakdown.
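The arithmetic behind a single dumbbell can be sketched as follows. This is a minimal illustration, not the prototype's actual code: the `entity` field name, the helper names, and the sample records are all hypothetical. It assumes each record carries one causal value, that "Not coded" records are dropped before computing shares, and that the gap is the incident share minus the risk share in percentage points.

```python
from collections import Counter

NOT_CODED = "Not coded"

def coded_shares(records, field="entity"):
    """Return {value: percent} using only coded records as the denominator."""
    coded = [r[field] for r in records if r[field] != NOT_CODED]
    if not coded:
        return {}
    counts = Counter(coded)
    return {value: 100.0 * n / len(coded) for value, n in counts.items()}

def gap_pp(risks, incidents, field="entity"):
    """Incident share minus risk share, in percentage points, per causal value."""
    risk_shares = coded_shares(risks, field)
    incident_shares = coded_shares(incidents, field)
    values = set(risk_shares) | set(incident_shares)
    return {v: incident_shares.get(v, 0.0) - risk_shares.get(v, 0.0) for v in values}

# Illustrative records only; they are not drawn from either corpus.
risks = [
    {"entity": "Human"}, {"entity": "Human"}, {"entity": "AI system"},
    {"entity": "Other"}, {"entity": "Not coded"},
]
incidents = [
    {"entity": "AI system"}, {"entity": "AI system"}, {"entity": "AI system"},
    {"entity": "Human"}, {"entity": "Not coded"},
]

gaps = gap_pp(risks, incidents)
print(gaps["AI system"])  # 50.0: 75% of coded incidents vs 25% of coded risks
```

A positive gap means the value is more common among incidents than among risks; the absolute value corresponds to the length of the connecting line.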
Legend: Human, AI system, Other | Risks (hollow dot), Incidents (filled dot)

Systematic shift, risks → incidents

median gap across 15 reliable subdomains
Human: -11.3pp
AI system: +28.8pp
Other: -15.9pp
Scale: ← fewer in incidents | more in incidents →
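One way to read the strip's summary statistic is as a median over per-subdomain gaps, restricted to subdomains that pass a sample-size filter. The sketch below works under that reading. The threshold of 5 incidents is an assumption inferred from the takeaways, and the subdomain names, counts, and gap values are invented for illustration.

```python
from statistics import median

MIN_INCIDENTS = 5  # assumed reliability threshold, not confirmed by the source

def median_gap(subdomains, value, min_incidents=MIN_INCIDENTS):
    """Median gap (in pp) for one causal value across reliable subdomains."""
    gaps = [
        s["gap_pp"][value]
        for s in subdomains
        if s["n_incidents"] >= min_incidents
    ]
    return median(gaps) if gaps else None

# Hypothetical subdomains; "D" falls below the threshold and is excluded.
subdomains = [
    {"name": "A", "n_incidents": 40, "gap_pp": {"AI system": 30.0}},
    {"name": "B", "n_incidents": 12, "gap_pp": {"AI system": -5.0}},
    {"name": "C", "n_incidents": 7, "gap_pp": {"AI system": 10.0}},
    {"name": "D", "n_incidents": 2, "gap_pp": {"AI system": 90.0}},
]

print(median_gap(subdomains, "AI system"))  # 10.0
```

Using the median rather than the mean keeps a single extreme subdomain from dominating the summary.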

Discrimination & Toxicity

1.2 Toxic content

76 coded risks · 90 coded incidents · 66% coded

1.3 Unequal performance

17 coded risks · 34 coded incidents

1.1 Discrimination

82 coded risks · 118 coded incidents

Privacy & Security

2.2 AI security vulnerabilities

111 coded risks · 21 coded incidents

2.1 Loss of privacy

77 coded risks · 88 coded incidents

Misinformation

3.1 False information

53 coded risks · 187 coded incidents

Malicious Actors & Misuse

4.1 Disinformation & influence

82 coded risks · 135 coded incidents

4.2 AI weapons & cyberattacks

80 coded risks · 13 coded incidents

4.3 AI fraud & scams

77 coded risks · 394 coded incidents

Human-Computer Interaction

5.1 Overreliance & unsafe use

60 coded risks · 36 coded incidents

Socioeconomic & Environmental

6.1 Power centralization

51 coded risks · 6 coded incidents

6.2 Inequality & unemployment

54 coded risks · 6 coded incidents

6.3 Devaluation of human creativity

31 coded risks · 5 coded incidents

AI System Safety, Failures & Limitations

7.3 Capability & robustness

123 coded risks · 302 coded incidents

7.4 Transparency & interpretability

41 coded risks · 5 coded incidents

Key Takeaways

  1. Subdomain 1.2 Toxic content has the largest reliable mismatch (Unintentional +52.6pp in intent).
  2. Incidents lean more AI system relative to the risk literature in 80% of reliable subdomains (median gap: +28.8pp).
  3. Incidents lean more Intentional relative to the risk literature in 87% of reliable subdomains (median gap: +14.4pp).
  4. Nine subdomains with fewer than 5 incidents are confidence-weighted. Percentages use coded-only denominators (excluding "Not coded" records).
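The "80%" and "87%" figures read as the fraction of reliable subdomains whose gap points toward the named value. A rough sketch of that count follows, assuming a positive gap means incidents lean more toward the value. The function name and the gap lists are illustrative only.

```python
def share_leaning_positive(gaps_pp):
    """Percent of subdomains whose gap (incidents minus risks) is positive."""
    if not gaps_pp:
        return None
    positive = sum(1 for g in gaps_pp if g > 0)
    return 100.0 * positive / len(gaps_pp)

# Made-up gaps for 15 subdomains: 12 positive, 3 negative.
example_gaps = [12.0] * 12 + [-3.0] * 3
print(round(share_leaning_positive(example_gaps)))  # 80
```

With 15 reliable subdomains, 12 positive gaps would give 80% and 13 would give 87% after rounding, which is consistent with the figures reported in the takeaways.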

Why the gaps aren’t “prediction errors.” The two corpora are sampled by opposite filters — the risk literature by research attention, the incident record by what gets publicly reported and submitted — and are coded by different teams against different source material. A risk is also an abstract claim while an incident is a specific event, so their distributions are not strictly commensurable. Timing is omitted because an incident is post-deployment by definition, leaving no meaningful comparison. Treat this as an exploratory map of where research concern and the documented incident record diverge, not as a measure of forecast accuracy.