Compliance•11 min read

The Defensibility Standard: Why Human-in-the-Loop AI Is the Only Compliant Model for University Accessibility Programs

After the April 2026 IFR, university accessibility compliance is judged on documented process, not scanner scores. Here is what that means for procurement.

RD (Reg) Crampton•May 10, 2026


§i — The post-IFR regulatory landscape: what defensibility means now

For the procurement officers, general counsel, and accessibility-program risk officers responsible for choosing how their universities will meet ADA Title II web-accessibility obligations, the Department of Justice's April 2026 Interim Final Rule (RIN 1190-AA82) introduces a question that none of the prior procurement playbooks address. The deadlines moved — large public entities now have until April 26, 2027, smaller entities until April 26, 2028 — but the technical standard, WCAG 2.1 Level AA, did not. What changed is more important than what stayed the same: the IFR's preamble describes, in DOJ's own words, why a year of additional time was necessary, and the answer it gives forecloses an entire category of compliance strategy.

The Department of Justice wrote that "advanced technology, such as generative AI, does not yet reliably automate the remediation of inaccessible content at scale, and staff resources and availability continue to pose significant challenges." Discussing science and engineering content in particular, the IFR's preamble cites the position that "generative AI...cannot reliably automate the remediation of STEM materials at scale, and human oversight is required to ensure accessibility." This is not a vendor's marketing assertion. It is the federal regulator's own characterisation, written into the record that will be available to any court, any Office for Civil Rights investigator, and any complainant from April 20, 2026 forward.

The corollary is that the substantive obligations of the 2024 rule — non-discrimination, equal access, ongoing remediation of barriers as they are identified — are unchanged. The IFR moved compliance dates only. A complaint filed against a covered institution on May 15, 2026 is evaluated against the same WCAG 2.1 AA conformance standard that applied on April 23, 2026; what shifts is the latest date by which the programme of remediation must reach a defined level of completeness. Defensibility, in that sense, is now a continuous condition rather than a deadline-day target. An institution either has a documented programme that is producing remediated content in a way the regulator recognises, or it does not.

The procurement consequence follows directly. Any compliance strategy that depends on a tool processing inaccessible content without human review is, as of the date of the IFR, a strategy DOJ has already labelled unreliable. Any university selecting such a tool today does so against the grain of the regulator's stated position. That is the new procurement problem: defensibility no longer means "we bought a scanner and remediated to a high score." It means "we adopted a process the federal record recognises as adequate, and we can produce the documentation to prove it."

The original 2024 Final Rule (28 CFR Part 35, Subpart H) sets the technical standard but speaks little to method. The IFR fills that gap by acknowledging what does not work. The remainder of this article maps the strategy that does: a model in which AI handles the volume that a human reviewer cannot, a human handles the judgment that AI cannot, and the system records the trail of decisions that an investigation requires. It is not a marketing claim. It is the only operating model whose components are individually defensible against the regulatory record now in front of every general counsel.

§ii — Why AI-only remediation fails

The strongest argument against AI-only accessibility tooling is not that AI scores poorly on benchmarks. It is that AI tools fail in ways the testing regime is not built to detect. Consider the three failure modes that recur across deployment data and that the IFR's preamble obliquely confirms.

The first is hallucinated alt text on academic content. A general-purpose vision model trained on photographs and stock illustrations encounters an organic-chemistry mechanism diagram, a phylogenetic tree, or a force-vector diagram and produces alt text that is fluent, grammatical, and wrong. "A diagram showing molecular structures" is not equivalent to "Markovnikov addition of HBr to 2-methylpropene, with the carbocation intermediate labelled." A scanner checking WCAG SC 1.1.1 (Non-text Content) sees alt text present and registers conformance. The disabled student receives a sentence that misrepresents the figure. The compliance score and the educational outcome diverge, and disability-rights litigation reviews the educational outcome, not the score. The IFR's footnote citing Northeastern's TEALab observation that image-generation models "do not output alternative (alt) text with their images, rendering them largely inaccessible to screen reader users" concerns a neighbouring gap (missing rather than inaccurate alt text), but it confirms that the shortcomings of AI output for screen-reader users are now in the official record.
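To make the gap concrete, here is a minimal sketch of a presence-only alt-text rule of the kind the paragraph above describes. It is a simplification for illustration, not the logic of any particular commercial scanner: it asks only whether each `<img>` carries an `alt` attribute, so the hallucinated description and the accurate one look identical to it. The file name and snippets are hypothetical.

```python
from html.parser import HTMLParser


class AltPresenceCheck(HTMLParser):
    """Counts <img> elements that lack an alt attribute (presence only, not accuracy)."""

    def __init__(self) -> None:
        super().__init__()
        self.images = 0
        self.missing_alt = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser routes self-closing <img ... /> tags through here as well.
        if tag == "img":
            self.images += 1
            if "alt" not in dict(attrs):
                self.missing_alt += 1


def passes_presence_check(html: str) -> bool:
    checker = AltPresenceCheck()
    checker.feed(html)
    checker.close()
    return checker.missing_alt == 0


hallucinated = '<img src="fig3.png" alt="A diagram showing molecular structures">'
accurate = ('<img src="fig3.png" alt="Markovnikov addition of HBr to 2-methylpropene, '
            'with the carbocation intermediate labelled">')
missing = '<img src="fig3.png">'

for label, snippet in [("hallucinated", hallucinated), ("accurate", accurate), ("missing", missing)]:
    print(label, passes_presence_check(snippet))
```

Both described images pass and only the image with no `alt` attribute fails; nothing in a check of this shape can tell the wrong sentence from the right one.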

The second is broken table semantics. An AI remediator working from inaccessible source documents — a scanned-and-OCRed PDF, a Word file with manually typed pseudo-tables built from spaces, an exported HTML page where the visual layout is achieved with nested divs — has to infer table structure from layout. It guesses at header rows, at scope, at row-and-column spans. When the guess is wrong, it produces a table that conforms to SC 1.3.1 (Info and Relationships) at the markup level but misrepresents the relationship between columns of data. A faculty member reviewing the remediated document sees plausible structure and approves. A screen-reader user navigating row-by-row hears the wrong header announced for each cell.
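A simplified model shows why a wrong header guess is invisible at the markup level. The sketch below assumes a plain grid with a single column-header row (roughly what `<th scope="col">` expresses) and pairs each data cell with the header text a screen reader would announce for it. Real assistive technology also honours `headers` attributes and spans, which this ignores; the table contents are hypothetical. Both versions carry valid header markup, but only one announces the right header.

```python
Grid = list[list[str]]


def announced_pairs(grid: Grid, header_row: int) -> list[list[tuple[str, str]]]:
    """Pair each cell below header_row with the column header announced for it."""
    headers = grid[header_row]
    return [
        [(headers[col], cell) for col, cell in enumerate(row)]
        for row in grid[header_row + 1:]
    ]


# Hypothetical source table: row 0 is a title line, row 1 holds the real headers.
grid = [
    ["Course enrolment, autumn term", "", ""],
    ["Section", "Enrolled", "Capacity"],
    ["CHEM 101-A", "212", "240"],
]

correct = announced_pairs(grid, header_row=1)  # what a trained reviewer would mark up
guessed = announced_pairs(grid, header_row=0)  # AI took the title line for the header

print(correct[0])  # [('Section', 'CHEM 101-A'), ('Enrolled', '212'), ('Capacity', '240')]
print(guessed[1])  # [('Course enrolment, autumn term', 'CHEM 101-A'), ('', '212'), ('', '240')]
```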

The third, and the one most damaging in regulatory review, is the false-positive fix. An AI tool trained to maximise its scanner score may add an ARIA role that conflicts with the underlying element, or generate a heading hierarchy that satisfies SC 1.3.1 superficially while misrepresenting the document's true structure. The remediation introduces a new accessibility issue, but one that falls beneath the threshold of the same automated scanner that judged the document compliant. There are now two failures: the original, which the scanner did not catch because it was content-specific; and the introduced one, which the scanner does not flag because the fix itself produced the conformance signal the scanner rewards.
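The heading case can be sketched the same way. Assume, for illustration, a scanner rule that only checks that heading levels never skip (an h2 followed directly by an h4, say). A hypothetical AI rewrite that flattens every subsection to h2 passes that rule just as the correct outline does; only a comparison against an outline a human reviewer has confirmed exposes the error.

```python
def no_skipped_levels(levels: list[int]) -> bool:
    """Simplified scanner rule: each heading may go at most one level deeper than the last."""
    previous = 0
    for level in levels:
        if level > previous + 1:
            return False
        previous = level
    return True


# Hypothetical outline: Title, Methods, Sampling, Assay, Results
source_outline = [1, 2, 3, 3, 2]   # structure confirmed by a human reviewer
ai_outline = [1, 2, 2, 2, 2]       # AI flattened the Methods subsections

print(no_skipped_levels(source_outline), no_skipped_levels(ai_outline))  # True True
print(ai_outline == source_outline)  # False: the error shows only against a reference outline
```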

The through-line across all three patterns is the same: AI-only tooling produces failures that automated testing cannot detect, in a regulatory environment that increasingly evaluates the outcome a disabled student experiences rather than the conformance score the institution reports. The DOJ's preamble language about AI's unreliability is not a soft observation; it is the predicate for procurement decisions that survive an investigation.

Compounding all three is the human factor: faculty members reviewing AI-generated remediation, asked to approve a fluent-sounding alt text or a structurally plausible table, lack the accessibility training to distinguish a correct fix from a confident-sounding wrong one. When faculty are the only human checkpoint after AI generation, the review step adds documentation overhead without adding accuracy. The procurement criterion that follows is concrete: any tool whose human-review step is performed by content authors rather than trained reviewers does not produce the evidentiary record OCR is asking for.

§iii — Why manual-only remediation fails

The mirror argument also has to be made. If AI-only tooling fails on accuracy grounds, manual-only remediation fails on arithmetic ones. Three numbers determine whether manual-only is a viable strategy for a university approaching its 2027 or 2028 deadline.

The first is archive size. A mid-size US university LMS holds, conservatively, 50,000 to 200,000 distinct documents: course readings, lecture slides, assessments, archived past-semester materials. The number is not theoretical; institutional data from any large Canvas or Brightspace tenant produces a count in this range without including faculty-uploaded media. The second is per-page remediation cost: published industry surveys put expert-reviewer remediation between $35 and $150 per page, depending on document complexity. A 30-page scanned PDF requiring full re-tagging is not at the bottom of that range. The third is reviewer throughput: a trained accessibility specialist can fully remediate a complex academic document in roughly half a day to a day, or five to ten documents in a standard five-day work-week.

The deadline math follows. A 100,000-document archive at a mid-range $80 per page and an average 15 pages per document is a $120 million budget, a number that exceeds the annual operating budget of most US university accessibility programmes by at least an order of magnitude. On throughput, four reviewers working steadily at about five documents each per week over a fifty-week year produce roughly 1,000 fully remediated documents per year: a 100-year backlog. Adding reviewers raises throughput only linearly, and the target keeps moving, because every new semester adds content to the archive. Even discounting heroically (assume the archive is half that size, the per-page cost half that figure, and a team of forty reviewers funded at full burdened cost), the math still does not close before April 2028 for any institution that started its programme in earnest after the IFR: forty reviewers at the same rate clear about 10,000 documents a year, so even the halved archive takes five years. And the institution that succeeds in 2028 by force of expenditure has no answer for 2029, when the next semester's content has arrived and the same throughput problem repeats.
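The figures above can be reproduced in a few lines. This is a back-of-envelope sketch: the 50-week working year and the rate of five documents per reviewer per week (the one-day-per-document end of the throughput range) are assumptions, and pages per document stay at 15 in the discounted case because the text halves only the archive and the per-page cost.

```python
def budget_usd(documents: int, pages_per_doc: int, cost_per_page: float) -> float:
    """Total expert-remediation cost for an archive."""
    return documents * pages_per_doc * cost_per_page


def backlog_years(documents: int, reviewers: int,
                  docs_per_reviewer_week: float = 5, weeks_per_year: int = 50) -> float:
    """Years needed to clear an archive at a steady, fixed team throughput (assumed rates)."""
    return documents / (reviewers * docs_per_reviewer_week * weeks_per_year)


base_cost = budget_usd(100_000, 15, 80)    # 100,000 documents at $80/page
base_years = backlog_years(100_000, 4)     # four reviewers
disc_cost = budget_usd(50_000, 15, 40)     # half the archive, half the per-page cost
disc_years = backlog_years(50_000, 40)     # forty reviewers

print(f"base:       ${base_cost:,.0f}, {base_years:.0f} years")
print(f"discounted: ${disc_cost:,.0f}, {disc_years:.0f} years")
```

The discounted case still needs five years of uninterrupted work against roughly two years between the IFR and April 2028, and neither function accounts for new content arriving each semester.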

The IFR did not extend deadlines because regulators expected universities would use the year for slower manual remediation. The preamble explicitly cites resource constraints and staffing limitations as reasons for the extension. The plain reading: the regulator believes there is no manual-only path that meets the standard, even with the additional year. Universities that respond to the IFR by hiring a few more remediators and continuing as before are pursuing a strategy the federal record has already characterised as inadequate to the task.

The procurement implication is symmetrical to the AI-only one. Manual-only is not just slow; it is non-completing. A budget that funds human review at a fixed, linear rate against an archive that grows every semester produces a programme whose end state, when reached, is "we ran out of time" rather than "we reached compliance." That outcome is no more defensible to OCR than the AI-only one. The remaining question is what configuration of AI and human review delivers the throughput of the first and the accuracy of the second.

§iv — What human-in-the-loop means technically

"Human-in-the-loop" is a phrase the disability-tech vendor market has overloaded. To be useful in a procurement context, it has to be defined more narrowly than the marketing usage. A defensible human-in-the-loop programme has four mandatory components — and it is the combination of all four, not any one in isolation, that produces the documented process OCR investigators look for.

The first is a per-document review queue with explicit reviewer actions. AI proposes a fix; a named human reviewer accepts, rejects, or overrides it. "Overrides" is not a placeholder for "edits the alt text" — it is a discrete recorded action, with the original AI suggestion, the reviewer's substitute, and the reason captured. A queue without rejections, or with a rejection rate close to zero, is evidence of a rubber-stamp process. The queue's data shape is what makes it auditable.
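One way to pin down that data shape is a record type that refuses to store an override without the substitute and the reason, plus a helper that reports the rejection rate. This is a minimal sketch; the field names and `Action` values are illustrative assumptions, not a published schema, and where a "near-zero" rate begins is left to the institution.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    OVERRIDE = "override"


@dataclass(frozen=True)
class ReviewDecision:
    document_id: str
    reviewer: str                  # a named person, not a team alias
    ai_suggestion: str
    action: Action
    substitute: Optional[str] = None
    reason: Optional[str] = None

    def __post_init__(self) -> None:
        # An override is a discrete recorded action: substitute and reason are mandatory.
        if self.action is Action.OVERRIDE and not (self.substitute and self.reason):
            raise ValueError("override requires substitute text and a reason")


def rejection_rate(decisions: list[ReviewDecision]) -> float:
    """Share of decisions where the reviewer rejected the AI suggestion outright."""
    if not decisions:
        return 0.0
    return sum(d.action is Action.REJECT for d in decisions) / len(decisions)


queue = [
    ReviewDecision("doc-001", "A. Reviewer", "A diagram", Action.ACCEPT),
    ReviewDecision("doc-002", "A. Reviewer", "A table", Action.ACCEPT),
    ReviewDecision("doc-003", "B. Reviewer", "A chart", Action.REJECT),
    ReviewDecision("doc-004", "B. Reviewer", "A diagram showing molecular structures",
                   Action.OVERRIDE,
                   substitute="Markovnikov addition of HBr to 2-methylpropene",
                   reason="AI description omitted the reaction shown"),
]
print(rejection_rate(queue))  # 0.25
```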

The second is a per-document audit trail. The Office for Civil Rights resolution agreement with Ohio State University (OCR Docket No. 15-16-2108) requires that "all problems identified through the Audit will be documented, evaluated, and, if necessary, remediated within a reasonable period of time." That language — documented, evaluated, remediated — appears in different forms across recent OCR resolution agreements. The minimum the trail must record is the document identifier, the issues flagged at scan time, the fix proposed, the reviewer name, the review timestamp, the action taken, and the retention period. A tool that cannot export this record in a format procurement can hand to outside counsel produces compliance theatre, not compliance.
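A sketch of one audit-trail record carrying the minimum fields listed above, exported as JSON Lines so outside counsel can load it without the vendor's software. The field names, the choice of JSON Lines, and expressing retention as a number of years are assumptions for illustration; the contract would define the real specification.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    document_id: str
    issues_flagged: list[str]   # issues found at scan time, e.g. WCAG success criteria
    fix_proposed: str
    reviewer: str
    reviewed_at: datetime       # timezone-aware review timestamp
    action: str                 # "accept", "reject", or "override"
    retention_years: int


def to_jsonl(records: list[AuditRecord]) -> str:
    """Serialise records one JSON object per line, with ISO 8601 timestamps."""
    lines = []
    for record in records:
        row = asdict(record)
        row["reviewed_at"] = record.reviewed_at.isoformat()
        lines.append(json.dumps(row, sort_keys=True))
    return "\n".join(lines)


record = AuditRecord(
    document_id="lms-course-1234/readings/week3.pdf",
    issues_flagged=["1.1.1", "1.3.1"],
    fix_proposed="Alt text for figure 2; header row marked on table 1",
    reviewer="A. Reviewer",
    reviewed_at=datetime(2026, 5, 4, 14, 30, tzinfo=timezone.utc),
    action="override",
    retention_years=5,
)
print(to_jsonl([record]))
```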

The third is fix verification. After the human-reviewed fix is applied, the document is re-scanned to confirm that the issue is closed and that no new issue was introduced. Without this loop, the false-positive failure mode from §ii is invisible. With it, every fix carries its own conformance evidence.
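The verification loop reduces to a set comparison, provided the scanner reports issues with stable identifiers (rule plus location). That identifier scheme and the example issues are assumptions; the point is that a fix passes only if its target issue is gone and nothing new has appeared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verification:
    target_closed: bool
    introduced: tuple[str, ...]

    @property
    def passed(self) -> bool:
        return self.target_closed and not self.introduced


def verify_fix(before: set[str], after: set[str], target: str) -> Verification:
    """Compare scans taken before and after a reviewed fix was applied."""
    return Verification(
        target_closed=target not in after,
        introduced=tuple(sorted(after - before)),
    )


before = {"1.1.1:figure-2", "1.3.1:table-1"}
# The alt text is fixed, but the same change added a conflicting ARIA role.
after = {"1.3.1:table-1", "4.1.2:figure-2"}

result = verify_fix(before, after, target="1.1.1:figure-2")
print(result.target_closed, result.introduced, result.passed)  # True ('4.1.2:figure-2',) False
```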

The fourth is traceable retention. FERPA-protected and accreditor-relevant materials carry retention obligations of their own; the audit trail must respect those rather than override them. A vendor that retains audit logs for 30 days and then deletes them cannot support a complaint investigation that arrives a year after the original review, which is a routine interval. The minimum retention period that survives realistic investigation timelines is the longer of five years and the institution's own records-retention period for the underlying course materials. Vendors that decline to commit to this retention floor at contract time are vendors whose product cannot serve the institutional purpose the contract is supposed to address.
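The retention floor is simple date arithmetic, but it is worth writing down because it is easy to get wrong in a contract schedule. A minimal sketch, assuming the floor runs from the review date and that a review on 29 February rolls to 28 February in a non-leap target year; both conventions are illustrative choices a records office would confirm.

```python
from datetime import date, timedelta


def add_years(start: date, years: int) -> date:
    """Same calendar date `years` later; 29 February falls back to 28 February."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def retention_floor(reviewed_on: date, policy_years: int, minimum_years: int = 5) -> date:
    """Earliest permissible deletion date: the longer of five years and local policy."""
    return add_years(reviewed_on, max(minimum_years, policy_years))


def vendor_commitment_sufficient(vendor_days: int, reviewed_on: date, policy_years: int) -> bool:
    """Does a vendor's log-retention window reach the retention floor?"""
    return reviewed_on + timedelta(days=vendor_days) >= retention_floor(reviewed_on, policy_years)


review = date(2026, 5, 10)
print(retention_floor(review, policy_years=3))      # 2031-05-10
print(retention_floor(review, policy_years=7))      # 2033-05-10
print(vendor_commitment_sufficient(30, review, 3))  # False
```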

The pattern across recent OCR higher-ed resolution agreements is consistent: agreements with Brown, SUNY Albany, Fairleigh Dickinson, Ohio State, and Framingham State all require a designated coordinator, a written policy, audits whose findings are recorded, third-party flow-down, training, and reporting. The word "score" does not appear in any of them. Every recorded compliance signal is process- and documentation-shaped, not metric-shaped. Human-in-the-loop is the architecture that produces those signals.

§v — What procurement and general counsel should require

The procurement consequence of the regulatory record is that the contract clauses universities sign with their accessibility-AI vendors should look different from those signed even eighteen months ago. Six requirements, derived directly from the IFR preamble and the OCR resolution-agreement pattern, distinguish vendors selling defensibility from vendors selling scores.

First, an SLA on human-review accuracy, not on scanner-score uplift. The vendor commits, in writing, to a floor on the share of AI-proposed fixes that human reviewers accept without modification, measured against a defined sample. Scanner-score SLAs reward gaming the scanner. Reviewer-acceptance SLAs reward producing fixes that survive a domain expert's eye.
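Measured this way, the SLA is a ratio over a sample of review outcomes. The sketch assumes the contract fixes a minimum sample size and a floor; the values used here (100 decisions, 90 per cent) are placeholders, not recommendations.

```python
def acceptance_rate(actions: list[str]) -> float:
    """Share of AI-proposed fixes accepted in human review without modification."""
    return actions.count("accept") / len(actions)


def meets_sla(actions: list[str], floor: float, min_sample: int) -> bool:
    """True if the sample is large enough and its acceptance rate reaches the floor."""
    if len(actions) < min_sample:
        raise ValueError(f"sample of {len(actions)} is below the agreed {min_sample}")
    return acceptance_rate(actions) >= floor


# Hypothetical quarterly sample drawn from the review queue.
sample = ["accept"] * 92 + ["override"] * 5 + ["reject"] * 3
print(acceptance_rate(sample))                        # 0.92
print(meets_sla(sample, floor=0.90, min_sample=100))  # True
```

Overrides count against the vendor here by design: a fix the reviewer had to rewrite did not survive the expert's eye.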

Second, an audit-trail export specification. The contract names the data fields the institution can export, the format (machine-readable, not PDF), and the retention window the vendor is obliged to maintain. The trail is the institution's evidence in any future investigation; an export the vendor controls is not the institution's evidence.

Third, an inference-location disclosure. Many AI accessibility tools route document content through third-party inference endpoints whose data residency and logging the contract does not specify. For materials that fall under FERPA's "education records" definition (20 USC § 1232g; 34 CFR Part 99), this gap is a procurement defect. The contract should require disclosure of where inference happens, what is logged at the endpoint, and whether the endpoint is covered by the institution's data-protection agreement. Vendors that subprocess inference through a foundation-model provider are passing institutional content to a fourth party whose retention policy the institution has not negotiated. Self-hosted inference — running the model on the institution's own infrastructure or a vendor environment that the data-protection agreement covers in full — eliminates the question. The same point applies to alt-text generation specifically: a vision model that processes an exam-paper figure should not be sending that figure outside the institutional perimeter under any circumstance.
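A procurement team can turn the three disclosure questions into a checklist that runs over whatever endpoint list the vendor supplies. The sketch below is hypothetical throughout: the record fields and hostnames are illustrative, and `None` stands for "the vendor did not say".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InferenceEndpoint:
    host: str
    location: Optional[str]       # where inference runs; None if undisclosed
    logs_content: Optional[bool]  # whether request content is logged; None if undisclosed
    covered_by_dpa: bool          # within the institution's data-protection agreement?


def disclosure_gaps(endpoints: list[InferenceEndpoint]) -> list[str]:
    """List every undisclosed item and every endpoint outside the DPA."""
    gaps = []
    for e in endpoints:
        if e.location is None:
            gaps.append(f"{e.host}: inference location not disclosed")
        if e.logs_content is None:
            gaps.append(f"{e.host}: endpoint logging not disclosed")
        if not e.covered_by_dpa:
            gaps.append(f"{e.host}: outside the institution's DPA")
    return gaps


endpoints = [
    InferenceEndpoint("inference.campus.example.edu", "institutional data centre", False, True),
    InferenceEndpoint("api.model-provider.example.com", None, None, False),
]
for gap in disclosure_gaps(endpoints):
    print(gap)
```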

Fourth, indemnification language for AI-introduced errors. Vendors selling AI-assisted remediation should accept liability for accessibility issues their AI introduces, as distinct from those it failed to fix. This clause separates vendors that have confidence in their accuracy from vendors that do not.

Fifth, exit clauses including full data export. The audit trail outlasts the vendor relationship. Contract-end provisions should require the vendor to deliver, within a defined window, every per-document record in a format the institution can ingest into a successor system. A vendor whose exit clause does not address audit-trail portability has built a switching-cost moat at the institution's expense.

Sixth, model-change notification. The accuracy of AI-proposed fixes is a function of the underlying model. When the vendor swaps models, as is common when foundation-model providers update their offerings, the contract should require the vendor to tell the institution in advance, with enough notice to re-baseline its review-acceptance metrics. A model change that arrives in production without notification breaks the SLA's evidentiary basis.
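Re-baselining is only possible if each review outcome records which model produced the suggestion. Assuming a hypothetical model-version field on every review record, the acceptance rate can be split by version so that a model change shows up as a step in the metric rather than as unexplained drift. The version names are placeholders.

```python
from collections import defaultdict


def acceptance_by_model(records: list[tuple[str, str]]) -> dict[str, float]:
    """Acceptance-without-modification rate per model version.

    Each record is a (model_version, action) pair, where action is
    "accept", "reject", or "override".
    """
    totals: dict[str, int] = defaultdict(int)
    accepted: dict[str, int] = defaultdict(int)
    for version, action in records:
        totals[version] += 1
        if action == "accept":
            accepted[version] += 1
    return {version: accepted[version] / totals[version] for version in totals}


history = (
    [("model-a", "accept")] * 9 + [("model-a", "override")]
    + [("model-b", "accept")] * 7 + [("model-b", "reject")] * 3
)
print(acceptance_by_model(history))  # {'model-a': 0.9, 'model-b': 0.7}
```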

These six are the minimum the regulatory record now supports. They map directly to the gaps the IFR's preamble identifies, and to the documentation and process requirements OCR has built into every recent higher-ed resolution agreement. A vendor that refuses any of them at contract time is selling a product that will not survive a complaint investigation, regardless of the scanner score the product produces.

Aelira was built around exactly this configuration: AI-proposed fixes that a human reviewer accepts, rejects, or overrides; a per-document audit trail exportable in machine-readable form; self-hosted inference that keeps FERPA-covered content under the institution's data-protection agreement; and the procurement-clause language to back each of those commitments.

If you are a procurement officer, general counsel, or accessibility-programme director evaluating accessibility-AI vendors under the new compliance dates and want a starting-point contract template that bakes in these six requirements, you can reach the team. The template is shareable inside your institution's procurement workflow, revisable for your standards, and carries no obligation to proceed to a vendor evaluation.

RD (Reg) Crampton

Founder & CEO

Founder, CEO & lead developer of Aelira. Passionate about making education accessible to everyone. Building the tools universities need to meet accessibility compliance.

Compliance•6 min read

What Procurement Should Demand in an Accessibility AI Contract: A Due-Diligence Checklist

A line-by-line checklist for procurement and general counsel evaluating AI-assisted accessibility vendors. Eleven contract requirements grounded in the IFR and recent OCR settlements.

Compliance•6 min read

FERPA, Cloud Inference, and the Data-Sovereignty Question Procurement Should Be Asking

Most AI accessibility vendors run inference in the cloud. If a university's student-content data flow passes through that endpoint, the institutional DPA had better cover it. Most do not.

Compliance•6 min read

Why Compliance Scores Aren't Defensible: The Case for Human-Review Audit Trails

Scanner scores measure conformance under ideal conditions. What regulators actually want to see is a per-document record of human review. Here is what that record needs to contain.

Ready to achieve accessibility compliance?

Join the pilot program for early access to Aelira's AI-powered accessibility platform
