Why Auto-Captions Aren't WCAG Compliant (And What to Do Instead)
Your university's auto-generated captions might feel like a solution, but they don't meet WCAG 2.1 standards. Here's what compliance actually requires.
"We've been told auto captions aren't acceptable."
This quote from a frustrated faculty member on Reddit captures a growing realization across higher education: the auto-captions your LMS generates aren't enough for WCAG 2.1 compliance.
But why? And what does "acceptable" actually mean?
What WCAG 2.1 Actually Requires for Captions
WCAG 2.1 Success Criterion 1.2.2 (Captions - Prerecorded) requires:
"Captions are provided for all prerecorded audio content in synchronized media."
Sounds simple. But the key word is "captions"—not "auto-generated text that approximates speech."
The WCAG Definition of Captions
According to WCAG, captions must:
- Be synchronized with the audio content
- Include speaker identification when multiple speakers are present
- Describe relevant sound effects (e.g., [applause], [phone ringing])
- Be accurate enough to convey the same information as the audio
Auto-captions fail on multiple counts.
Why Auto-Captions Fail WCAG Compliance
1. Accuracy Rates Are Unacceptable
Industry studies show auto-caption accuracy rates:
| Platform | Average Accuracy | WCAG-Acceptable? |
|---|---|---|
| YouTube Auto-Captions | 70-80% | No |
| Zoom Auto-Transcription | 75-85% | No |
| Teams Live Captions | 70-80% | No |
| Panopto Auto-Captions | 75-85% | No |
WCAG doesn't specify an accuracy percentage, but accessibility experts generally agree that 95%+ accuracy is the minimum for meaningful access.
A 75% accuracy rate means roughly 1 in 4 words is wrong. For a 50-minute lecture of about 7,500 words, that's nearly 1,900 errors.
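Accuracy here is typically quantified as word error rate (WER): the share of words inserted, deleted, or substituted relative to a verified transcript. If you want to measure it yourself, here is a minimal sketch assuming the open-source `jiwer` Python package and two plain-text transcripts you supply (the file names are placeholders):

```python
# Minimal sketch: measure caption accuracy as word error rate (WER).
# Assumes `pip install jiwer` and two hypothetical plain-text files:
# a human-verified transcript and the raw auto-caption text from your LMS.
import jiwer

reference = open("lecture_verified.txt", encoding="utf-8").read()
hypothesis = open("lecture_autocaptions.txt", encoding="utf-8").read()

error_rate = jiwer.wer(reference, hypothesis)  # 0.25 means 1 in 4 words wrong
print(f"Word error rate: {error_rate:.1%}")
print(f"Approximate accuracy: {1 - error_rate:.1%}")
```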
2. Technical Terms Are Massacred
Auto-captions struggle with:
- Discipline-specific vocabulary: "mitochondria" becomes "my toe Andrea"
- Names and proper nouns: "Professor Chakrabarti" becomes "Professor chalk robbery"
- Acronyms: "WCAG" becomes "W cag" or "double you cag"
- Mathematical expressions: "x squared" becomes "ex squared" or worse
- Foreign language terms: Completely garbled
For STEM courses, the problem is catastrophic. A chemistry lecture where "benzene ring" becomes "ben's green ring" is useless to a deaf student.
3. No Speaker Identification
When multiple people speak in a lecture:
What auto-captions show:
```
So the answer to question three is...
Actually I disagree with that interpretation...
Can you explain why?
```
What WCAG-compliant captions show:
```
[Professor Smith] So the answer to question three is...
[Student] Actually I disagree with that interpretation...
[Professor Smith] Can you explain why?
```
Without speaker identification, deaf students can't follow discussions, Q&A sessions, or debates.
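When captions ship as a WebVTT file (the format most video platforms accept), speaker identification can also live in the file itself as voice tags. A brief illustrative snippet, with made-up cue timings:

```
WEBVTT

00:12:03.000 --> 00:12:06.500
<v Professor Smith>So the answer to question three is...

00:12:06.500 --> 00:12:09.000
<v Student>Actually I disagree with that interpretation...
```

Many players don't display the voice-tag name on screen, though, so the visible [Professor Smith] convention shown above remains the safer choice.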
4. No Sound Effect Descriptions
Auto-captions only transcribe speech. They miss:
- [Video clip plays]
- [Laughter]
- [Alarm sound indicating time's up]
- [Background music during demonstration]
- [Phone notification sound - professor checks phone]
These audio cues provide context that deaf students need.
5. Timing and Synchronization Issues
Auto-captions often:
- Lag behind the speaker by 2-5 seconds
- Bunch multiple sentences into single caption blocks
- Flash too quickly to read (under 1 second display time)
- Overlap with visual content on screen
WCAG requires captions to be synchronized so users can follow along in real-time.
The Legal Reality
What Courts and Regulators Say
US (DOJ/OCR):
The Department of Justice has consistently held that auto-captions alone don't satisfy ADA requirements. OCR resolution agreements specifically require "accurate captions" and have rejected auto-caption defenses.
Australia (AHRC):
The Australian Human Rights Commission has found auto-captions insufficient under the DDA 1992. Multiple complaints have been upheld against universities relying solely on auto-generated transcripts.
UK (EHRC):
The Equality and Human Rights Commission guidance states that captions must be "accurate and properly synchronized" to meet the Equality Act 2010.
Real Enforcement Examples
Example 1: US Community College (2024)
- Issue: Relied on Zoom auto-captions for all online courses
- OCR finding: Auto-captions "do not provide equal access"
- Resolution: Re-caption 3,000+ hours of content
- Cost: $1.2M
Example 2: Australian University (2023)
- Issue: Used Panopto auto-captions for lecture recordings
- AHRC finding: Auto-captions "materially inaccurate" for STEM content
- Resolution: Manual captioning requirement + compensation
- Cost: $380K AUD
What "Acceptable" Captions Actually Look Like
The 99% Accuracy Standard
While WCAG doesn't specify a percentage, the industry standard for professional captioning is 99% accuracy. This means:
- For a 50-minute lecture (~7,500 words): Maximum 75 errors
- Technical terms spelled correctly
- Proper nouns capitalized
- Speaker identification included
- Sound effects described
Caption Quality Checklist
✅ Accuracy: 99%+ verbatim transcription
✅ Synchronization: Captions appear within 0.5 seconds of speech
✅ Readability: 1-3 lines, 32 characters max per line
✅ Duration: Each caption visible for 1-6 seconds
✅ Speaker ID: [Professor], [Student], [Guest Speaker]
✅ Sound effects: [applause], [video plays], [inaudible]
✅ Technical terms: Verified spelling for discipline vocabulary
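Several of these checks are mechanical and easy to automate. Here is a minimal sketch that flags line-length and display-duration problems in a WebVTT file using only the Python standard library (the file name and thresholds mirror the checklist and are placeholders):

```python
# Minimal sketch: flag captions that break the readability rules above
# (line length and display duration) in a WebVTT file.
import re

MAX_CHARS_PER_LINE = 32
MIN_SECONDS, MAX_SECONDS = 1.0, 6.0

def to_seconds(timestamp: str) -> float:
    """Convert an HH:MM:SS.mmm (or MM:SS.mmm) VTT timestamp to seconds."""
    parts = [float(p) for p in timestamp.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds

# Matches simple cues: "start --> end" followed by the caption text.
cue_pattern = re.compile(
    r"(\d[\d:.]+)\s-->\s(\d[\d:.]+)\n(.*?)(?:\n\n|\Z)", re.DOTALL
)

with open("lecture.vtt", encoding="utf-8") as f:
    for start, end, text in cue_pattern.findall(f.read()):
        duration = to_seconds(end) - to_seconds(start)
        if not MIN_SECONDS <= duration <= MAX_SECONDS:
            print(f"{start}: shown for {duration:.1f}s (outside 1-6s)")
        for line in text.strip().splitlines():
            if len(line) > MAX_CHARS_PER_LINE:
                print(f"{start}: line is {len(line)} chars (over {MAX_CHARS_PER_LINE})")
```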
The Solutions: How to Get Compliant Captions
Option 1: Professional Captioning Services
Cost: $1-3 per minute of video
Accuracy: 99%+
Turnaround: 24-72 hours
Best for: High-stakes content, recorded lectures used across multiple semesters
Providers:
- Rev.com ($1.50/min)
- 3Play Media ($2-3/min)
- Verbit ($1-2/min)
Option 2: AI-Enhanced Captioning (Human Review)
Cost: $0.50-1 per minute
Accuracy: 95-99%
Turnaround: 2-24 hours
How it works:
- AI generates initial transcript
- Human editor corrects errors
- Final review for technical terms
Best for: High volume, moderate budget
This is Aelira's approach: We use Whisper AI for initial transcription, then apply AI-powered cleanup that catches technical terms, adds speaker identification, and fixes timing issues.
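If you want to experiment with the first step yourself, here is a minimal sketch using the open-source openai-whisper package (ffmpeg required; the video file and course glossary are placeholders, not our production pipeline). Whisper's initial_prompt parameter nudges the model toward discipline vocabulary it would otherwise mangle:

```python
# Minimal sketch: generate an initial transcript with openai-whisper.
# Install with `pip install openai-whisper`; ffmpeg must be on your PATH.
import whisper

# A short prompt biases Whisper toward terms it would otherwise garble
# ("mitochondria", "benzene ring", proper nouns, acronyms like WCAG).
course_glossary = "mitochondria, benzene ring, Professor Chakrabarti, WCAG"

model = whisper.load_model("small")  # larger models trade speed for accuracy
result = model.transcribe("lecture.mp4", initial_prompt=course_glossary)

for segment in result["segments"]:
    print(f'{segment["start"]:7.2f}s  {segment["text"].strip()}')
```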
Option 3: Faculty-Edited Captions
Cost: Faculty time only
Accuracy: 90-99% (varies by effort)
Turnaround: 1-4 hours per lecture
How it works:
- Generate auto-captions
- Faculty edits in LMS (Canvas, Blackboard)
- Fix technical terms, add speaker IDs
Best for: Small volume, budget-constrained departments
Problem: Faculty rarely have time, and editing captions is tedious.
Option 4: Live Captioning (CART)
Cost: $100-200 per hour
Accuracy: 98-99%
Use case: Live lectures, real-time events
Best for: Synchronous classes with deaf students enrolled
The Aelira Approach: AI + Quality Assurance
Aelira's video accessibility tool takes a middle path:
- Whisper AI transcription (base layer, 85-90% accurate)
- Domain-specific vocabulary correction (STEM, medical, legal term libraries)
- Speaker diarization (automatic speaker identification)
- Sound effect detection (AI identifies non-speech audio)
- Human review queue (flag low-confidence segments)
- Export to VTT/SRT (standard caption formats)
Result: 95-98% accuracy at $0.25-0.50 per minute, with flagged segments for optional human review.
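The export step is also simple to sketch. Assuming you already have reviewed segments with start/end times, text, and a speaker label from an earlier diarization pass (the segment data and file name below are made up), writing a WebVTT file takes only a few lines:

```python
# Minimal sketch: write reviewed segments out as a WebVTT caption file.
# `segments` is assumed to come from earlier transcription/diarization steps;
# the speaker field, example data, and file name are illustrative.

def fmt(seconds: float) -> str:
    """Format seconds as an HH:MM:SS.mmm WebVTT timestamp."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

segments = [
    {"start": 723.0, "end": 726.5, "speaker": "Professor Smith",
     "text": "So the answer to question three is..."},
    {"start": 726.5, "end": 729.0, "speaker": "Student",
     "text": "Actually I disagree with that interpretation..."},
]

with open("lecture.vtt", "w", encoding="utf-8") as f:
    f.write("WEBVTT\n\n")
    for seg in segments:
        f.write(f"{fmt(seg['start'])} --> {fmt(seg['end'])}\n")
        f.write(f"<v {seg['speaker']}>{seg['text']}\n\n")
```

SRT output follows the same pattern, with numbered cues and comma-separated milliseconds in the timestamps.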
Action Plan for Universities
Immediate (This Week)
- Audit your current state: Pick 10 random lecture videos, check caption accuracy
- Identify high-risk content: STEM courses, high-enrollment courses
- Stop claiming auto-captions are compliant: Update any policies that suggest otherwise
Short-Term (This Month)
- Prioritize: Caption highest-enrollment courses first
- Choose a solution: Professional captioning, AI-enhanced, or hybrid
- Train faculty: How to edit captions in your LMS
Ongoing
- New content: Require caption review before publishing
- Quality checks: Random audits of caption accuracy
- Student feedback: Make it easy to report caption errors
The Bottom Line
Auto-captions are a starting point, not a solution.
They're useful for:
- Live caption approximation during synchronous lectures
- Initial transcript for editing
- Informal, low-stakes content
They're not acceptable for:
- Recorded lectures (WCAG 1.2.2 compliance)
- Assessment content (equal access to instructions)
- Any content where accuracy matters
The good news: Getting to compliant captions doesn't require manual transcription of every video. AI-enhanced solutions like Aelira can get you to 95%+ accuracy at a fraction of the cost of professional services.
The deadline is real. The standard is clear. Auto-captions aren't enough.
Learn how Aelira handles video accessibility or join the pilot program.

Aelira Team · Accessibility Engineers
The Aelira team is building AI-powered accessibility tools for higher education. We're on a mission to help universities meet WCAG 2.1 compliance before the April 2026 deadline.