When AI Generates Wrong Alt Text on Academic Figures, It's a Section 504 Risk
An AI scanner that hallucinates 'graph of student enrollment' on a chemistry diagram does not just fail accessibility — it actively misinforms one student in a way no other student is misinformed.
For procurement officers and general counsel evaluating accessibility-AI vendors, the most under-discussed failure mode has nothing to do with conformance benchmarks. It is the moment an AI tool generates plausible, fluent, well-formed alt text that misdescribes the figure it is attached to — and a disabled student receives that alt text as the operative description of an academic image their non-disabled peers are seeing in full. This is not an accessibility quality problem. It is, under the correct legal framing, a disparate-treatment problem under Section 504 of the Rehabilitation Act and Title II of the ADA. The broader argument that human-in-the-loop is the only defensible model rests in part on this specific failure mode — see the defensibility standard for the full thesis. This article narrows in on why the alt-text case is the strongest single example.
Consider three concrete cases. A chemistry textbook figure showing the Markovnikov addition of HBr to 2-methylpropene, with the carbocation intermediate clearly labelled, is described by a general-purpose vision model as "a diagram of molecular structures." A statistical chart showing a cohort retention curve is described as "a line graph showing data over time." An anatomical illustration of the ventricular conduction system is described as "an illustration of a human heart." Each description is grammatically fluent. Each is what a vision model trained primarily on photographs and stock illustrations is best at producing. And each fails in a specific, educationally consequential way: the disabled student receives a description that omits the information the figure was published to convey.
A scanner checking WCAG Success Criterion 1.1.1 (Non-text Content) sees that alt text is present. Conformance is registered. The accessibility metric reports green. The student reading the alt text gets a sentence that misrepresents the figure. The compliance score and the educational outcome have diverged, and it is the educational outcome, not the score, that disability-rights enforcement reviews in a complaint investigation.
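To make the divergence concrete, here is a minimal sketch of a presence-only check, written in Python with the standard-library HTML parser. It is a deliberate simplification and not how any particular scanner is implemented: real tools also handle decorative images with intentionally empty alt attributes, among other cases. The file names and the `presence_check` function are hypothetical, chosen only to mirror the three examples above.

```python
from html.parser import HTMLParser


class AltPresenceChecker(HTMLParser):
    """Records, for each <img>, whether a non-empty alt attribute exists."""

    def __init__(self):
        super().__init__()
        self.results = []  # list of (src, alt, passes)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        alt = (attr_map.get("alt") or "").strip()
        # The only question asked is "is there text?", never "is it right?"
        self.results.append((attr_map.get("src"), alt, bool(alt)))


def presence_check(html):
    """Hypothetical helper: run the presence-only check over an HTML string."""
    checker = AltPresenceChecker()
    checker.feed(html)
    checker.close()
    return checker.results


# Hypothetical sample page using the generic descriptions discussed above.
SAMPLE_HTML = """
<img src="markovnikov.png" alt="a diagram of molecular structures">
<img src="retention.png" alt="a line graph showing data over time">
<img src="conduction.png">
"""

for src, alt, passes in presence_check(SAMPLE_HTML):
    print(src, "PASS" if passes else "FAIL", repr(alt))
```

Under these assumptions the first two images pass and only the third, which has no alt attribute at all, fails. Nothing in the check compares the text to the figure, which is why a generic or inaccurate description is indistinguishable from a correct one at this layer.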
What hallucination looks like in academic content
The pattern is consistent across content types. STEM figures hit it hardest because their information density is highest and their visual conventions are most specialised. The DOJ's April 2026 Interim Final Rule preamble explicitly cites the position that "generative AI...cannot reliably automate the remediation of STEM materials at scale, and human oversight is required to ensure accessibility." The IFR's footnotes go further, citing Northeastern's TEALab finding that image-generation models "do not output alternative (alt) text with their images, rendering them largely inaccessible to screen reader users." Both passages are now in the federal regulatory record.
The cases extend beyond STEM. A historical photograph carrying a caption with information about the depicted figure may be described by a model as "a black-and-white photograph of two people," omitting the historical context the caption supplies. A musical score is described as "sheet music," omitting the key signature, time signature, or notation conventions a music student needs. A graph in an economics textbook is described as "a line chart" without the axis labels that distinguish a Phillips curve from a Laffer curve. In each case, the alt text passes the conformance check but fails the substantive test of conveying what the figure exists to convey.
What the AI tool does not produce — and cannot produce reliably from the figure alone — is content-aware description. A correct alt text for the chemistry figure requires the model to recognise both the molecular structures and the reaction mechanism being illustrated. A correct alt text for the statistical chart requires the model to extract the actual cohort percentages, retention timeline, and any inflection points the chart was published to highlight. The model has no access to the syllabus context that would tell it which features matter. It generates a plausible-sounding sentence, the scanner accepts it, and the failure becomes invisible.
Why this is active harm, not passive failure
The legal frame matters here. A wholly missing alt text is a passive failure: the disabled student receives no information, the institution has not met its obligation, and the remedy is to add an alt text. A hallucinated alt text is a different category. The disabled student is not deprived of information — they are given misleading information they reasonably trust. The non-disabled student looking at the figure receives the figure's actual content. The disabled student receives a sentence that misrepresents that content. The two students are not equally served, and the disability is the operative cause of the differential.
Under Section 504 (29 U.S.C. § 794) and ADA Title II, the affirmative obligation is to ensure that a qualified individual with a disability is "afforded the opportunity to acquire the same information, engage in the same interactions, and enjoy the same benefits and services" as a non-disabled peer. The Office for Civil Rights uses essentially this language across its resolution agreements. When an AI tool generates alt text that misdescribes the figure, that equal-information standard is not met: the disabled student has been given worse information. The differential is institutional, the cause is the institution's tooling choice, and the harm is concrete and ongoing.
This frame produces a procurement consequence that conformance-score metrics obscure. A vendor demonstrating "automated alt-text generation at 99% scanner conformance" is demonstrating a metric that does not address the substantive question. The substantive question is whether the alt text accurately describes the figure. Scanner-conformance metrics measure presence, not accuracy. A complaint investigation will not ask whether alt text was present. It will ask whether the alt text correctly conveyed what the figure showed.
What procurement should require
The procurement criterion that follows is concrete: any tool whose alt-text generation step is not gated by trained human review should be evaluated on how its outputs perform against actual academic-figure samples, not against the scanner-conformance benchmarks the tool's marketing material reports. A vendor demonstration on stock photography is not evidence about performance on chemistry diagrams. A vendor offering accuracy benchmarks should be asked to disclose the test-set composition; if the test set is dominated by photographs and contains no academic content, the benchmark does not measure the failure mode that matters.
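If a vendor does disclose its test-set composition, the first check is simple arithmetic: what share of the benchmark images are academic figures at all? The sketch below assumes the disclosure arrives as one category label per test image. The category names and the `academic_share` function are illustrative assumptions, not a standard taxonomy or an existing tool.

```python
from collections import Counter

# Assumed labels for academic content; an institution would define its own.
ACADEMIC_CATEGORIES = {
    "chemistry_diagram",
    "statistical_chart",
    "anatomical_illustration",
    "musical_score",
}


def academic_share(categories):
    """Return the fraction of test images labelled with an academic category."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("test set is empty")
    academic = sum(n for cat, n in counts.items() if cat in ACADEMIC_CATEGORIES)
    return academic / total


# Hypothetical disclosure: nine stock photos and one chemistry diagram.
disclosed = ["stock_photo"] * 9 + ["chemistry_diagram"]
print(f"academic share of test set: {academic_share(disclosed):.0%}")
```

A low share does not prove the tool performs badly on academic figures. It shows only that the reported benchmark says little about them either way.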
Three contract clauses follow directly. First, an SLA on alt-text accuracy measured against an institutionally defined sample of academic figures and evaluated by trained reviewers, not against vendor-supplied benchmarks. Second, a requirement that AI-proposed alt text be reviewed by a trained accessibility reviewer before delivery to students; faculty review is insufficient because faculty are not trained to detect plausible-sounding but inaccurate descriptions. Third, indemnification language that specifically addresses AI-introduced errors: not just remediation failures, but errors the AI affirmatively created in attempting to remediate.
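The first two clauses can be sketched as a data model: each AI-proposed description carries a reviewer identity and a verdict, delivery is gated on that verdict, and the SLA figure is computed from reviewer verdicts rather than from scanner output. This is a minimal illustration under assumed names (`AltTextRecord`, `releasable`, `accuracy_rate`, and the sample reviewer IDs are all hypothetical), not a description of any vendor's workflow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AltTextRecord:
    figure_id: str
    proposed_alt: str
    reviewer: Optional[str] = None   # trained accessibility reviewer, if any
    accurate: Optional[bool] = None  # reviewer verdict; None means unreviewed


def releasable(record):
    """Gate: deliver to students only after a reviewer attests to accuracy."""
    return record.reviewer is not None and record.accurate is True


def accuracy_rate(records):
    """SLA metric: share of reviewed records the reviewer judged accurate."""
    reviewed = [r for r in records if r.accurate is not None]
    if not reviewed:
        raise ValueError("no reviewed records to measure")
    return sum(r.accurate for r in reviewed) / len(reviewed)


# Hypothetical sample drawn from an institution's own academic figures.
sample = [
    AltTextRecord("fig-01", "a diagram of molecular structures", "reviewer-a", False),
    AltTextRecord("fig-02", "Retention curve for one cohort", "reviewer-a", True),
    AltTextRecord("fig-03", "Ventricular conduction pathway", "reviewer-b", True),
    AltTextRecord("fig-04", "sheet music"),  # not yet reviewed
]

print([r.figure_id for r in sample if releasable(r)])
print(f"accuracy on reviewed sample: {accuracy_rate(sample):.0%}")
```

Keeping the reviewer and verdict on each record also produces the documented review trail as a by-product of normal operation, rather than as a separate reporting exercise.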
The strongest defensible posture against a disability-discrimination claim is not a high conformance score on alt-text presence. It is a documented record of human review attesting to alt-text accuracy. Tools that produce that record survive an investigation. Tools that produce a conformance score in lieu of that record will not, regardless of the percentage. For the broader procurement framework these clauses fit into, see the defensibility standard.

Aelira Team
Accessibility Engineers
The Aelira team is building AI-powered accessibility tools for higher education. We're on a mission to help universities meet WCAG 2.1 compliance before the DOJ ADA Title II deadline (April 26, 2027 for large public entities).