Why Auto-Captions Aren't WCAG Compliant (And What to Do Instead)
Your university's auto-generated captions might feel like a solution, but they don't meet WCAG 2.1 standards. Here's what compliance actually requires.
"We've been told auto captions aren't acceptable."
This quote from a frustrated faculty member on Reddit captures a growing realization across higher education: the auto-captions your LMS generates aren't enough for WCAG 2.1 compliance.
But why? And what does "acceptable" actually mean?
What WCAG 2.1 Actually Requires for Captions
WCAG 2.1 Success Criterion 1.2.2 (Captions - Prerecorded) requires:
"Captions are provided for all prerecorded audio content in synchronized media."
Sounds simple. But the key word is "captions"—not "auto-generated text that approximates speech."
The WCAG Definition of Captions
According to WCAG, captions must:
- Be synchronized with the audio content
- Include speaker identification when multiple speakers are present
- Describe relevant sound effects (e.g., [applause], [phone ringing])
- Be accurate enough to convey the same information as the audio
Auto-captions fail on multiple counts.
Why Auto-Captions Fail WCAG Compliance
1. Accuracy Rates Are Unacceptable
Industry studies show auto-caption accuracy rates:
| Platform | Average Accuracy | WCAG-Acceptable? |
|---|---|---|
| YouTube Auto-Captions | 70-80% | No |
| Zoom Auto-Transcription | 75-85% | No |
| Teams Live Captions | 70-80% | No |
| Panopto Auto-Captions | 75-85% | No |
WCAG doesn't specify an accuracy percentage, but accessibility experts generally agree that 95%+ accuracy is the minimum for meaningful access.
A 75% accuracy rate means roughly 1 in 4 words is wrong. For a 50-minute lecture of about 7,500 words, that's nearly 1,900 errors.
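Accuracy here is typically quantified as word error rate (WER): the share of words inserted, deleted, or substituted relative to a verified transcript. If you want to measure it yourself, here is a minimal sketch assuming the open-source `jiwer` Python package and two plain-text transcripts you supply (the file names are placeholders):

```python
# Minimal sketch: measure caption accuracy as word error rate (WER).
# Assumes `pip install jiwer` and two hypothetical plain-text files:
# a human-verified transcript and the raw auto-caption text from your LMS.
import jiwer

reference = open("lecture_verified.txt", encoding="utf-8").read()
hypothesis = open("lecture_autocaptions.txt", encoding="utf-8").read()

error_rate = jiwer.wer(reference, hypothesis)  # 0.25 means 1 in 4 words wrong
print(f"Word error rate: {error_rate:.1%}")
print(f"Approximate accuracy: {1 - error_rate:.1%}")
```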
2. Technical Terms Are Massacred
Auto-captions struggle with:
- Discipline-specific vocabulary: "mitochondria" becomes "my toe Andrea"
- Names and proper nouns: "Professor Chakrabarti" becomes "Professor chalk robbery"
- Acronyms: "WCAG" becomes "W cag" or "double you cag"
- Mathematical expressions: "x squared" becomes "ex squared" or worse
- Foreign language terms: Completely garbled
For STEM courses, the problem is catastrophic. A chemistry lecture where "benzene ring" becomes "ben's green ring" is useless to a deaf student.
3. No Speaker Identification
When multiple people speak in a lecture:
What auto-captions show:
```
So the answer to question three is...
Actually I disagree with that interpretation...
Can you explain why?
```
What WCAG-compliant captions show:
```
[Professor Smith] So the answer to question three is...
[Student] Actually I disagree with that interpretation...
[Professor Smith] Can you explain why?
```
Without speaker identification, deaf students can't follow discussions, Q&A sessions, or debates.
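When captions ship as a WebVTT file (the format most video platforms accept), speaker identification can also live in the file itself as voice tags. A brief illustrative snippet, with made-up cue timings:

```
WEBVTT

00:12:03.000 --> 00:12:06.500
<v Professor Smith>So the answer to question three is...

00:12:06.500 --> 00:12:09.000
<v Student>Actually I disagree with that interpretation...
```

Many players don't display the voice-tag name on screen, though, so the visible [Professor Smith] convention shown above remains the safer choice.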
4. No Sound Effect Descriptions
Auto-captions only transcribe speech. They miss:
- [Video clip plays]
- [Laughter]
- [Alarm sound indicating time's up]
- [Background music during demonstration]
- [Phone notification sound - professor checks phone]
These audio cues provide context that deaf students need.
5. Timing and Synchronization Issues
Auto-captions often:
- Lag behind the speaker by 2-5 seconds
- Bunch multiple sentences into single caption blocks
- Flash too quickly to read (under 1 second display time)
- Overlap with visual content on screen
WCAG requires captions to be synchronized so users can follow along in real-time.
The Legal Reality
What Courts and Regulators Say
US (DOJ/OCR):
The Department of Justice has consistently held that auto-captions alone don't satisfy ADA requirements. OCR resolution agreements specifically require "accurate captions" and have rejected auto-caption defenses.
Australia (AHRC):
The Australian Human Rights Commission has found auto-captions insufficient under the DDA 1992. Multiple complaints have been upheld against universities relying solely on auto-generated transcripts.
UK (EHRC):
The Equality and Human Rights Commission guidance states that captions must be "accurate and properly synchronized" to meet the Equality Act 2010.
Real Enforcement Examples
Example 1: US Community College (2024)
- Issue: Relied on Zoom auto-captions for all online courses
- OCR finding: Auto-captions "do not provide equal access"
- Resolution: Re-caption 3,000+ hours of content
- Cost: $1.2M
Example 2: Australian University (2023)
- Issue: Used Panopto auto-captions for lecture recordings
- AHRC finding: Auto-captions "materially inaccurate" for STEM content
- Resolution: Manual captioning requirement + compensation
- Cost: $380K AUD
What "Acceptable" Captions Actually Look Like
The 99% Accuracy Standard
While WCAG doesn't specify a percentage, the industry standard for professional captioning is 99% accuracy. This means:
- For a 50-minute lecture (~7,500 words): Maximum 75 errors
- Technical terms spelled correctly
- Proper nouns capitalized
- Speaker identification included
- Sound effects described
Caption Quality Checklist
✅ Accuracy: 99%+ verbatim transcription
✅ Synchronization: Captions appear within 0.5 seconds of speech
✅ Readability: 1-3 lines, 32 characters max per line
✅ Duration: Each caption visible for 1-6 seconds
✅ Speaker ID: [Professor], [Student], [Guest Speaker]
✅ Sound effects: [applause], [video plays], [inaudible]
✅ Technical terms: Verified spelling for discipline vocabulary
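Several of these checks are mechanical and easy to automate. Here is a minimal sketch that flags line-length and display-duration problems in a WebVTT file using only the Python standard library (the file name and thresholds mirror the checklist and are placeholders):

```python
# Minimal sketch: flag captions that break the readability rules above
# (line length and display duration) in a WebVTT file.
import re

MAX_CHARS_PER_LINE = 32
MIN_SECONDS, MAX_SECONDS = 1.0, 6.0

def to_seconds(timestamp: str) -> float:
    """Convert an HH:MM:SS.mmm (or MM:SS.mmm) VTT timestamp to seconds."""
    parts = [float(p) for p in timestamp.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds

# Matches simple cues: "start --> end" followed by the caption text.
cue_pattern = re.compile(
    r"(\d[\d:.]+)\s-->\s(\d[\d:.]+)\n(.*?)(?:\n\n|\Z)", re.DOTALL
)

with open("lecture.vtt", encoding="utf-8") as f:
    for start, end, text in cue_pattern.findall(f.read()):
        duration = to_seconds(end) - to_seconds(start)
        if not MIN_SECONDS <= duration <= MAX_SECONDS:
            print(f"{start}: shown for {duration:.1f}s (outside 1-6s)")
        for line in text.strip().splitlines():
            if len(line) > MAX_CHARS_PER_LINE:
                print(f"{start}: line is {len(line)} chars (over {MAX_CHARS_PER_LINE})")
```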
The Solutions: How to Get Compliant Captions
Option 1: Professional Captioning Services
Cost: $1-3 per minute of video
Accuracy: 99%+
Turnaround: 24-72 hours
Best for: High-stakes content, recorded lectures used across multiple semesters
Providers:
- Rev.com ($1.50/min)
- 3Play Media ($2-3/min)
- Verbit ($1-2/min)
Option 2: AI-Enhanced Captioning (Human Review)
Cost: $0.50-1 per minute
Accuracy: 95-99%
Turnaround: 2-24 hours
How it works:
- AI generates initial transcript
- Human editor corrects errors
- Final review for technical terms
Best for: High volume, moderate budget
This is Aelira's approach: We use Whisper AI for initial transcription, then apply AI-powered cleanup that catches technical terms, adds speaker identification, and fixes timing issues.
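If you want to experiment with the first step yourself, here is a minimal sketch using the open-source openai-whisper package (ffmpeg required; the video file and course glossary are placeholders, not our production pipeline). Whisper's initial_prompt parameter nudges the model toward discipline vocabulary it would otherwise mangle:

```python
# Minimal sketch: generate an initial transcript with openai-whisper.
# Install with `pip install openai-whisper`; ffmpeg must be on your PATH.
import whisper

# A short prompt biases Whisper toward terms it would otherwise garble
# ("mitochondria", "benzene ring", proper nouns, acronyms like WCAG).
course_glossary = "mitochondria, benzene ring, Professor Chakrabarti, WCAG"

model = whisper.load_model("small")  # larger models trade speed for accuracy
result = model.transcribe("lecture.mp4", initial_prompt=course_glossary)

for segment in result["segments"]:
    print(f'{segment["start"]:7.2f}s  {segment["text"].strip()}')
```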
Option 3: Faculty-Edited Captions
Cost: Faculty time only
Accuracy: 90-99% (varies by effort)
Turnaround: 1-4 hours per lecture
How it works:
- Generate auto-captions
- Faculty edits in LMS (Canvas, Blackboard)
- Fix technical terms, add speaker IDs
Best for: Small volume, budget-constrained departments
Problem: Faculty rarely have time, and editing captions is tedious.
Option 4: Live Captioning (CART)
Cost: $100-200 per hour
Accuracy: 98-99%
Use case: Live lectures, real-time events
Best for: Synchronous classes with deaf students enrolled
The Aelira Approach: AI + Quality Assurance
Aelira's video accessibility tool takes a middle path:
- Whisper AI transcription (base layer, 85-90% accurate)
- Domain-specific vocabulary correction (STEM, medical, legal term libraries)
- Speaker diarization (automatic speaker identification)
- Sound effect detection (AI identifies non-speech audio)
- Human review queue (flag low-confidence segments)
- Export to VTT/SRT (standard caption formats)
Result: 95-98% accuracy at $0.25-0.50 per minute, with flagged segments for optional human review.
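The export step is also simple to sketch. Assuming you already have reviewed segments with start/end times, text, and a speaker label from an earlier diarization pass (the segment data and file name below are made up), writing a WebVTT file takes only a few lines:

```python
# Minimal sketch: write reviewed segments out as a WebVTT caption file.
# `segments` is assumed to come from earlier transcription/diarization steps;
# the speaker field, example data, and file name are illustrative.

def fmt(seconds: float) -> str:
    """Format seconds as an HH:MM:SS.mmm WebVTT timestamp."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

segments = [
    {"start": 723.0, "end": 726.5, "speaker": "Professor Smith",
     "text": "So the answer to question three is..."},
    {"start": 726.5, "end": 729.0, "speaker": "Student",
     "text": "Actually I disagree with that interpretation..."},
]

with open("lecture.vtt", "w", encoding="utf-8") as f:
    f.write("WEBVTT\n\n")
    for seg in segments:
        f.write(f"{fmt(seg['start'])} --> {fmt(seg['end'])}\n")
        f.write(f"<v {seg['speaker']}>{seg['text']}\n\n")
```

SRT output follows the same pattern, with numbered cues and comma-separated milliseconds in the timestamps.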
Action Plan for Universities
Immediate (This Week)
- Audit your current state: Pick 10 random lecture videos, check caption accuracy
- Identify high-risk content: STEM courses, high-enrollment courses
- Stop claiming auto-captions are compliant: Update any policies that suggest otherwise
Short-Term (This Month)
- Prioritize: Caption highest-enrollment courses first
- Choose a solution: Professional captioning, AI-enhanced, or hybrid
- Train faculty: How to edit captions in your LMS
Ongoing
- New content: Require caption review before publishing
- Quality checks: Random audits of caption accuracy
- Student feedback: Make it easy to report caption errors
The Bottom Line
Auto-captions are a starting point, not a solution.
They're useful for:
- Live caption approximation during synchronous lectures
- Initial transcript for editing
- Informal, low-stakes content
They're not acceptable for:
- Recorded lectures (WCAG 1.2.2 compliance)
- Assessment content (equal access to instructions)
- Any content where accuracy matters
The good news: Getting to compliant captions doesn't require manual transcription of every video. AI-enhanced solutions like Aelira can get you to 95%+ accuracy at a fraction of the cost of professional services.
The deadline is real. The standard is clear. Auto-captions aren't enough.
Learn how Aelira handles video accessibility or join the pilot program.

Aelira Team · Accessibility Engineers
The Aelira team is building AI-powered accessibility tools for higher education. We're on a mission to help universities meet WCAG 2.1 compliance before the April 2026 deadline.