Why Compliance Scores Aren't Defensible: The Case for Human-Review Audit Trails
Scanner scores measure conformance under ideal conditions. What regulators actually want to see is a per-document record of human review. Here is what that record needs to contain.
For accessibility-programme directors and risk officers preparing institutional documentation against the IFR-extended ADA Title II compliance dates, the most consequential procurement decision is rarely about which scanner to standardise on. It is about which records the institution will be able to produce when an investigator asks for them. Scanner-conformance scores and human-review audit trails are not the same artefact, and only one of them is what survives a complaint investigation. The defensibility framework these records anchor is set out in the defensibility standard; this article narrows in on the audit trail itself — its data shape, its evidentiary purpose, and the gameability problem that makes scanner scores an inadequate substitute.
The dichotomy is simple. A scanner-conformance score is a metric: a percentage or a count derived by running automated rules against content. A human-review audit trail is a per-document record: who reviewed what, when, with what action, and against what underlying issue. The metric reports a current state; the trail records a process. The Office for Civil Rights resolution agreements analysed in the OCR-pattern article are unanimous on which artefact regulators want, and the answer is the trail.
How scanner scores are gameable
Automated accessibility scanners detect approximately 30 to 40 per cent of WCAG conformance failures: the technical, structural issues their rule sets are written to catch. The remaining 60 to 70 per cent require human judgment: alt-text accuracy on academic figures, semantic correctness of heading hierarchies in complex layouts, the quality of caption synchronisation in lecture videos, the distinction between decorative and meaningful images. A 100 per cent scanner-conformance score therefore tells the institution only that roughly one third of the WCAG surface area passes; it says nothing about the other two thirds.
This statistical reality interacts badly with how scanners are typically deployed. A vendor optimising its tool to maximise the score it reports will detect violations of the rules its scanner checks and apply automated fixes that satisfy those rules, without addressing the issues outside the rule set. An institution shopping by score is therefore optimising for the variable that is most easily inflated and least correlated with substantive accessibility. The scanner reports green; the underlying content fails for screen-reader users in ways the scanner does not measure.
Worse, AI-generated remediation can introduce new accessibility issues that the same scanner does not detect: improper ARIA roles applied to elements that already had implicit semantics, heading hierarchies that satisfy structural rules superficially while misrepresenting document organisation, alt-text that conforms to SC 1.1.1 by being present while misdescribing the figure (the failure mode covered in hallucinated alt text as a Section 504 risk). The scanner that judges the remediated content compliant cannot see the fault its own remediation introduced. The score remains green; the actual outcome for disabled users degrades.
The gameability problem is not a vendor moral failing. It is a structural property of metrics whose definition is controlled by the entity being measured. The remedy is not a better scanner. The remedy is to evaluate against artefacts the institution cannot manipulate after the fact — which is what the audit trail is.
What an OCR investigator actually requests
OCR investigations of higher-ed accessibility complaints follow a documented pattern across the resolution agreements on file. The investigator does not ask for a scanner-conformance percentage. The request is for the institution's documented process: the policy adopted, the coordinator named, the audit cadence run, the findings recorded, and the remediation applied. The Ohio State agreement (OCR Docket No. 15-16-2108) captures the language with unusual clarity: "all problems identified through the Audit will be documented, evaluated, and, if necessary, remediated within a reasonable period of time." The compliance signal is the existence of that document trail.
What an investigator looking at a particular complaint requests is even more granular: the per-document history of the content the complainant identified. Which scan flagged the issue. Which reviewer evaluated the proposed fix. When the action was taken. What the fix was. Whether the document was re-scanned afterward and the verification result. An institution that cannot produce these per-document records is producing evidence of absence — the absence of a process for handling identified issues, regardless of how high its conformance score reads.
For programmes built on tools that emit per-scan PDFs but do not retain the underlying review actions in queryable form, this evidentiary requirement is the operational gap. The PDFs may attest to a state; they do not attest to the process by which the state was reached.
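What "queryable form" means in practice can be sketched in a few lines. The example below is a minimal illustration, assuming review events are retained as plain records with hypothetical keys (document_id, timestamp, event, detail); it is not any vendor's actual schema. It pulls the full history for the one document a complaint identifies, oldest event first.

```python
from datetime import datetime
from typing import Iterable


def document_history(rows: Iterable[dict], document_id: str) -> list[dict]:
    """Return every retained event for one document, oldest first."""
    matching = [row for row in rows if row["document_id"] == document_id]
    return sorted(matching, key=lambda row: datetime.fromisoformat(row["timestamp"]))


# Hypothetical retained events; identifiers and values are illustrative only.
events = [
    {"document_id": "doc-17", "timestamp": "2026-03-04T09:30:00",
     "event": "review", "detail": "accept"},
    {"document_id": "doc-17", "timestamp": "2026-03-02T14:00:00",
     "event": "scan", "detail": "SC 1.1.1 failure"},
    {"document_id": "doc-42", "timestamp": "2026-03-03T11:15:00",
     "event": "scan", "detail": "SC 1.3.1 failure"},
    {"document_id": "doc-17", "timestamp": "2026-03-04T10:00:00",
     "event": "rescan", "detail": "resolved"},
]

history = document_history(events, "doc-17")
for row in history:
    print(row["timestamp"], row["event"], row["detail"])
```

A per-scan PDF cannot answer this query; a retained event log can, and the ordered result is the scan, review, and re-scan sequence an investigator asks to see.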
The minimum viable audit trail
A defensible per-document record contains, at minimum, eight fields. Each maps to a question an investigator asks, and each is something an institution should be able to query and export without engaging the vendor.
First, the document identifier. A stable, institution-controlled identifier (typically scan ID or LMS document ID) that ties the trail row to the underlying content. Without it the trail is unjoinable to the content the complaint references.
Second, the scan timestamp. When the issue was identified — establishing the institution's awareness date and the start of the "reasonable period of time" referenced in the resolution-agreement language.
Third, the issues flagged at scan time. Which WCAG Success Criteria the scanner identified failures against, and at what severity. This anchors the per-fix actions to the specific problems they address.
Fourth, the fix proposed. The remediation the AI tool (or human reviewer) generated. Including this enables retrospective evaluation of fix quality, which is the basis for the alt-text-accuracy SLA discussed in hallucinated alt text as a Section 504 risk.
Fifth, the reviewer name. A specific human, not "system" — someone who can attest to the review under their own name. Programmes whose audit rows are predominantly attributed to "system" or "auto-approved" are programmes whose human-review step is not actually present.
Sixth, the review timestamp. When the review action occurred. Investigations rely on this to evaluate whether the institution's response time satisfies "reasonable period of time."
Seventh, the action taken. Accept, reject, override-with-substitute, escalate, or defer. Each is a discrete recorded action. Programmes whose action distribution is overwhelmingly "accept" without a meaningful rejection rate produce evidence of a rubber-stamp process rather than substantive review.
Eighth, the fix verification. After the action, was the document re-scanned and did the original issue resolve cleanly without introducing new ones? This loop closes the gameability problem from the earlier section: an audit trail that records the verification step makes false-positive fixes visible.
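The eight fields above can be sketched as a single record type. What follows is an illustrative shape only: the class name, field names, and the process_signals helper are assumptions made for this example, not a published schema. The helper computes the warning signs the fifth, seventh, and eighth fields expose: the share of rows attributed to a non-human reviewer, the share of "accept" actions, and the share of fixes never re-scanned.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Action(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    OVERRIDE = "override-with-substitute"
    ESCALATE = "escalate"
    DEFER = "defer"


@dataclass(frozen=True)
class AuditRow:
    document_id: str                             # 1. stable, institution-controlled ID
    scanned_at: datetime                         # 2. scan timestamp (awareness date)
    issues_flagged: tuple[tuple[str, str], ...]  # 3. (success criterion, severity) pairs
    fix_proposed: str                            # 4. remediation that was proposed
    reviewer: str                                # 5. a named human, not "system"
    reviewed_at: datetime                        # 6. review timestamp
    action: Action                               # 7. action taken
    verified_clean: Optional[bool]               # 8. re-scan result; None if never re-scanned

    def response_time(self) -> timedelta:
        """Elapsed time between identification and review."""
        return self.reviewed_at - self.scanned_at


# Reviewer labels treated as "no human present"; taken from the fifth field's description.
NON_HUMAN_REVIEWERS = {"system", "auto-approved"}


def process_signals(rows: list[AuditRow]) -> dict[str, float]:
    """Summarise shares that suggest a missing or rubber-stamp review step."""
    total = len(rows)
    if total == 0:
        raise ValueError("no audit rows to summarise")
    return {
        "non_human_share": sum(r.reviewer.lower() in NON_HUMAN_REVIEWERS for r in rows) / total,
        "accept_share": sum(r.action is Action.ACCEPT for r in rows) / total,
        "unverified_share": sum(r.verified_clean is None for r in rows) / total,
    }


def _row(doc: str, reviewer: str, action: Action, verified: Optional[bool]) -> AuditRow:
    # Hypothetical sample data; names, dates, and fixes are illustrative only.
    return AuditRow(
        document_id=doc,
        scanned_at=datetime(2026, 3, 2, 14, 0),
        issues_flagged=(("SC 1.1.1", "serious"),),
        fix_proposed="Add alt text describing Figure 2",
        reviewer=reviewer,
        reviewed_at=datetime(2026, 3, 4, 9, 30),
        action=action,
        verified_clean=verified,
    )


rows = [
    _row("doc-17", "J. Okafor", Action.ACCEPT, True),
    _row("doc-18", "system", Action.ACCEPT, None),
    _row("doc-19", "M. Reyes", Action.REJECT, None),
    _row("doc-20", "J. Okafor", Action.ACCEPT, True),
]
signals = process_signals(rows)
print(signals)
print(rows[0].response_time())
```

No threshold is built in, because what counts as a "meaningful rejection rate" is a judgment for the institution. The sketch only makes the distribution visible so that judgment can be exercised.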
A ninth field — retention metadata — sits at the trail-as-a-whole level rather than per row. The institution's records-retention policy, the relevant FERPA retention obligations (20 USC § 1232g; 34 CFR Part 99) where student work is implicated, and any accreditor-specific timelines all bear on how long the trail itself must be preserved. A trail retained for 30 days does not survive an investigation that arrives a year after the original review — a routine timing.
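The retention arithmetic is simple enough to check directly. The sketch below assumes a retention period counted in days from the review date; the function name and the three-year figure are hypothetical, chosen only to contrast with the 30-day case described above.

```python
from datetime import date, timedelta


def trail_available(reviewed_on: date, retention_days: int, requested_on: date) -> bool:
    """True if a row reviewed on `reviewed_on` still exists when it is requested."""
    return requested_on <= reviewed_on + timedelta(days=retention_days)


reviewed = date(2026, 3, 4)
request = date(2027, 3, 4)  # an investigator's request arriving one year later

print(trail_available(reviewed, 30, request))       # False: the row was purged long ago
print(trail_available(reviewed, 3 * 365, request))  # True under a hypothetical three-year policy
```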
Talking about audit trails in a procurement RFP
The procurement implication is concrete. RFP language for any accessibility-AI vendor should include a non-negotiable clause requiring per-document audit-trail export in a machine-readable format (JSON or CSV — not PDF, which is for the human consumer rather than the regulator), covering the eight fields above plus retention adequate to outlast the longest realistic investigation timeline. Vendors who refuse this clause are vendors whose product cannot serve the procurement purpose, regardless of marketing claims about conformance score uplift. Vendors who accept it are vendors whose product is shaped by the same evidentiary requirements OCR has built into the resolution-agreement record.
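An acceptance test for that clause might look like the following sketch. It uses only Python's standard library; the field names are assumptions standing in for the eight fields above, and a real export would follow whatever names the contract specifies. It checks a row for completeness and then writes the same rows as JSON and as CSV.

```python
import csv
import io
import json

# Hypothetical field names standing in for the eight required fields.
REQUIRED_FIELDS = [
    "document_id", "scanned_at", "issues_flagged", "fix_proposed",
    "reviewer", "reviewed_at", "action", "fix_verification",
]


def missing_fields(row: dict) -> list[str]:
    """List the required fields that are absent or empty in one row."""
    return [name for name in REQUIRED_FIELDS if row.get(name) in (None, "")]


def to_json(rows: list[dict]) -> str:
    return json.dumps(rows, indent=2)


def to_csv(rows: list[dict]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        flat = dict(row)
        # CSV cells are flat, so the list of flagged criteria is joined into one cell.
        flat["issues_flagged"] = "; ".join(row["issues_flagged"])
        writer.writerow(flat)
    return buffer.getvalue()


# Illustrative rows only; every value here is made up for the example.
complete = {
    "document_id": "doc-17",
    "scanned_at": "2026-03-02T14:00:00",
    "issues_flagged": ["SC 1.1.1", "SC 1.3.1"],
    "fix_proposed": "Add alt text describing Figure 2",
    "reviewer": "J. Okafor",
    "reviewed_at": "2026-03-04T09:30:00",
    "action": "accept",
    "fix_verification": "re-scanned; resolved",
}
incomplete = {**complete, "reviewer": ""}

print(missing_fields(complete))
print(missing_fields(incomplete))
print(to_csv([complete]))
```

Running a check like this against a vendor's sample export during evaluation, rather than after signing, is one way to confirm the clause is met in practice.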
For the broader contract-clause framework these requirements fit into, see the defensibility standard's procurement section. The audit-trail clause is the load-bearing one — the others build on the assumption that the trail exists and is exportable on demand.

Aelira Team
Accessibility Engineers
The Aelira team is building AI-powered accessibility tools for higher education. We're on a mission to help universities meet WCAG 2.1 compliance before the DOJ ADA Title II deadline (April 26, 2027 for large public entities).