Why Most PDF Accessibility Tools Only Find Problems
The accessibility industry built great scanners. But scanning isn't fixing. Here's why detection is the easy part — and what actual remediation requires.
The accessibility tool market has a dirty secret: most tools don't fix anything.
They scan. They report. They generate dashboards with red and yellow indicators. And then they hand you a list of problems and say, essentially, good luck.
This isn't a criticism of any specific vendor. It's a structural problem with how the industry has evolved — and it's worth understanding if you're evaluating tools for a compliance deadline.
The Scan-and-Report Model
Here's how most accessibility tools work:
- You upload a PDF (or connect your LMS)
- The tool scans it against WCAG 2.1 criteria
- You get a report: "This document has 34 accessibility issues"
- You fix them manually, or hire someone to do it
Step 3 is where the tool's job ends. Step 4 is where your problems begin.
The implicit assumption is that detection is the hard part and that once you know what's wrong, fixing it is straightforward. For a handful of documents, that might be true. For a university with 15,000 PDFs to remediate before a deadline, it's a fantasy.
Why Detection Is the Easy Part
Scanning a PDF for accessibility issues is fundamentally pattern matching:
- Missing alt text? Check if images have an /Alt tag. Binary yes/no.
- No document title? Check the metadata. Binary yes/no.
- Missing language tag? Check /Lang in the structure tree. Binary yes/no.
- Colour contrast failure? Extract foreground/background colours and calculate the ratio. Pure maths.
These checks are well-defined, deterministic, and automatable. A good scanner can evaluate a PDF against 100+ rules in seconds. This is a solved problem.
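"Pure maths" isn't an exaggeration. The contrast check is just the relative-luminance and contrast-ratio formulas from the WCAG 2.1 spec, sketched here in Python:

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

# WCAG 2.1 AA requires at least 4.5:1 for normal-size text
passes_aa = contrast_ratio((0, 0, 0), (255, 255, 255)) >= 4.5
```

Black on white comes out at the maximum ratio of 21:1. Deterministic, fast, and easy to automate — which is exactly why every scanner on the market implements it.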
Remediation is not pattern matching. It's reconstruction.
What "Fixing" Actually Requires
When a scanner says "incorrect reading order," fixing that means:
- Extracting every content block from the PDF with its bounding box
- Determining the visual layout — is this single-column, two-column, mixed?
- Detecting headers and footers that repeat across pages
- Establishing the correct reading sequence based on visual flow
- Rewriting the PDF's internal structure tree to reflect that sequence
- Marking repeated headers/footers as artifacts so screen readers skip them
That's not a report item. That's a document reconstruction task. And reading order is just one of dozens of issue types.
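To make the gap concrete, here's a toy sketch of just one step from the list above — sequencing blocks in a simple column layout. The `Block` type and the column heuristic are illustrative; real layout analysis (mixed columns, floats, sidebars) is far messier:

```python
from dataclasses import dataclass

@dataclass
class Block:
    text: str
    x0: float
    y0: float
    x1: float
    y1: float  # PDF-style bounding box; y grows upward from the page bottom

def reading_order(blocks: list[Block], page_width: float, columns: int = 1) -> list[Block]:
    """Toy heuristic: assign each block to a column by the x-centre of its
    bounding box, then read each column top-to-bottom."""
    def column(b: Block) -> int:
        centre = (b.x0 + b.x1) / 2
        return min(int(centre / (page_width / columns)), columns - 1)
    # Sort by column first, then top-to-bottom (largest y first in PDF space)
    return sorted(blocks, key=lambda b: (column(b), -b.y1))
```

Even this simplistic version has to know the page geometry and column count before it can say anything — and the scanner's "incorrect reading order" finding tells you neither.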
Table remediation requires detecting table boundaries, extracting cell structures, identifying which cells are headers vs. data, handling merged cells, and inserting the full semantic tag hierarchy (THead, TBody, TR, TH, TD) with proper Scope attributes.
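The target of that work looks something like this — a sketch using nested dicts to stand in for PDF structure elements (a real remediator writes these into the document's tag tree, and merged cells additionally need Headers/ID links):

```python
def table_tags(header_row: list[str], body_rows: list[list[str]]) -> dict:
    """Build the PDF/UA semantic tag hierarchy for a simple table.

    "S" is the structure type, "K" the child elements, "A" the attributes --
    mirroring how tag trees nest, not a real PDF library's API."""
    return {
        "S": "Table",
        "K": [
            {"S": "THead", "K": [
                {"S": "TR", "K": [
                    {"S": "TH", "A": {"Scope": "Column"}, "text": h}
                    for h in header_row
                ]}
            ]},
            {"S": "TBody", "K": [
                {"S": "TR", "K": [{"S": "TD", "text": c} for c in row]}
                for row in body_rows
            ]},
        ],
    }
```

Generating this structure is trivial once you know which cells are headers. Knowing which cells are headers, from nothing but positioned text on a page, is the hard part.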
Structure tagging requires analysing visual formatting — font sizes, weights, spacing — to infer heading levels, paragraph breaks, list structures, and block quotes, then building a complete tag tree from scratch.
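A minimal sketch of the font-size part of that heuristic — real taggers combine weight, spacing, numbering, and position as well, but the shape of the inference is the same:

```python
def infer_heading_levels(
    spans: list[tuple[str, float]], body_size: float
) -> list[tuple[str, str]]:
    """Map font sizes to heading tags relative to the body text size.

    Toy heuristic: distinct sizes larger than the body text become
    H1, H2, ... in descending size order; everything else is a paragraph."""
    heading_sizes = sorted({size for _, size in spans if size > body_size}, reverse=True)
    level = {size: f"H{i + 1}" for i, size in enumerate(heading_sizes)}
    return [(text, level.get(size, "P")) for text, size in spans]
```

Note what this can get wrong: a pull quote in 18pt type becomes a phantom H2, and a document whose headings differ only by weight gets no headings at all. That fragility is why heading inference needs review, not blind application.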
Alt text generation requires understanding image content in the context of the surrounding document — not just "what is this image" but "what is this image communicating in this context."
Each of these is a genuinely hard engineering problem. That's why most tools don't attempt them.
The Labour Arbitrage Model
Instead of solving remediation technically, much of the industry relies on labour arbitrage. The tool scans the documents, generates work orders, and routes them to human remediators — often offshore teams working at lower hourly rates.
This works, up to a point. But it has fundamental scaling problems:
- Cost per document remains high — typically $50-150 per PDF depending on complexity
- Throughput is limited by human capacity — you can't remediate 15,000 documents in a month with manual labour
- Quality varies — different remediators make different decisions about the same content
- It's not sustainable — new content is created every semester, so the backlog never ends
For a single compliance audit, outsourced remediation might work. As an ongoing operational model, it breaks down.
The Risk of Bad Auto-Fixes
There's a reason some vendors avoid automated remediation: wrong fixes can be worse than no fixes.
An auto-generated alt text that says "image of a graph" is technically present (passes the scanner check) but practically useless (a student learns nothing from it). A heading hierarchy that guesses wrong creates a misleading document structure. Table headers applied to the wrong cells make data actively confusing.
This is a real risk, and it's the main argument against automated remediation. But the answer isn't to avoid automation entirely — it's to score the confidence of each fix and route uncertain ones to human review.
A rule-based fix like "add a missing document title from the filename" is nearly certain to be correct. An AI-generated alt text for a complex STEM diagram is a best guess that needs expert review. These shouldn't be treated the same way.
Effective automated remediation requires:
- Tiered confidence scoring — rule-based fixes (high confidence) applied automatically, AI-generated fixes (lower confidence) flagged for review
- Transparent fix descriptions — show exactly what was changed and why, so reviewers can make informed decisions quickly
- Post-fix validation — run the scanner again after remediation to prove the fixes actually resolved the issues
Without these safeguards, automated remediation is a liability. With them, it's a force multiplier.
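The routing logic itself is simple; the hard part is assigning honest confidence scores in the first place. A sketch, with an illustrative threshold:

```python
from dataclasses import dataclass

@dataclass
class Fix:
    issue: str
    description: str   # what changed and why, shown to reviewers
    confidence: float  # 0.0-1.0, assigned by the fix generator

AUTO_APPLY_THRESHOLD = 0.9  # illustrative cut-off; tune per deployment

def route_fixes(fixes: list[Fix]) -> tuple[list[Fix], list[Fix]]:
    """Split fixes into an auto-applied set and a human-review queue."""
    applied = [f for f in fixes if f.confidence >= AUTO_APPLY_THRESHOLD]
    review = [f for f in fixes if f.confidence < AUTO_APPLY_THRESHOLD]
    return applied, review
```

Under this scheme, a rule-based fix like "set the document title from the filename" (confidence near 1.0) is applied automatically, while an AI-generated alt text for a STEM diagram (confidence well below the threshold) lands in the review queue with its description attached.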
Validation as Proof
This is the piece most tools skip entirely. After your documents are "fixed" — whether by humans or AI — how do you know they're actually compliant?
The answer is independent validation against published standards:
- Matterhorn Protocol — the machine-checkable subset of PDF/UA, covering structure trees, language tags, alt text, headings, tables, and role mappings
- veraPDF — the most comprehensive PDF/UA validator available, with 108 rules covering edge cases that simpler checkers miss
Running validation after remediation gives you something no scan report can: proof that the fixes worked. Not "we identified the issues" or "we attempted to fix them" — a verifiable record that the output document meets the standard.
If your current tool can't show you post-remediation validation results, you're taking its word that the work was done correctly.
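In practice this means wiring the validator into the pipeline, not running it by hand once a year. As a sketch, a remediation pipeline can shell out to the veraPDF CLI after each fix pass — `--flavour ua1` selects the PDF/UA-1 profile and `--format xml` produces a machine-readable report (the helper function here is ours, not part of veraPDF):

```python
def verapdf_command(path: str, flavour: str = "ua1") -> list[str]:
    """Build a veraPDF CLI invocation for a PDF/UA validation run.

    Run it with subprocess and archive the XML report alongside the
    remediated document to keep a verifiable compliance record."""
    return ["verapdf", "--flavour", flavour, "--format", "xml", path]
```

Stored next to each output document, those reports are the "verifiable record" — evidence you can hand to an auditor rather than a vendor's assurance.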
Open Source and Model Choice
There's one more dimension worth considering: can you see what the tool is doing to your documents?
Closed-source remediation tools are black boxes. You feed in a PDF, you get back a modified PDF, and you trust that the modifications were correct. For compliance purposes — especially if you're subject to audit — that trust model may not be sufficient.
Open-source remediation engines let you audit the logic. You can read exactly what rules are applied, how decisions are made, and what changes are written to the document structure.
On the AI side, vendor lock-in is a growing concern. If your remediation tool depends on a single AI provider, you inherit their pricing changes, their data handling policies, and their availability. Tools that support pluggable AI providers — letting you choose between hosted APIs and self-hosted open-source models — give you control over both cost and data sovereignty.
For universities with strict data handling requirements, the ability to run the entire pipeline on-premises with open-source models isn't a nice-to-have. It's a requirement.
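In code, that pluggability is just an interface boundary. A sketch — every name here is illustrative, not a real vendor API:

```python
from typing import Protocol

class AltTextProvider(Protocol):
    """The minimal surface a remediation pipeline codes against, so AI
    backends stay swappable: hosted API today, self-hosted model tomorrow."""
    def describe_image(self, image_bytes: bytes, context: str) -> str: ...

class SelfHostedProvider:
    """Hypothetical on-prem backend wrapping a local open-source vision model."""
    def describe_image(self, image_bytes: bytes, context: str) -> str:
        # A real implementation would invoke the locally hosted model here,
        # keeping the document content inside the institution's network.
        return "placeholder description"

def generate_alt_text(provider: AltTextProvider, image: bytes, context: str) -> str:
    # The pipeline never imports a vendor SDK directly -- only the protocol.
    return provider.describe_image(image, context)
```

Swapping providers then means changing one constructor call, not rewriting the pipeline — which is precisely the control over cost and data sovereignty described above.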
What to Ask When Evaluating Tools
If you're comparing accessibility tools, these questions separate scanners from remediators:
- Does it fix issues, or only report them? If the output is a report with no modified document, it's a scanner.
- What percentage of issues can it auto-fix? If the answer is zero or vague, remediation is manual.
- How does it handle uncertain fixes? If there's no confidence scoring or review workflow, either it's not automating fixes or it's applying them blindly.
- Can it validate its own output? If there's no post-fix validation, you can't verify the work.
- Can you audit the remediation logic? If the tool is closed-source, you're trusting the vendor's implementation.
- Can you choose your AI provider? If you're locked to a single vendor's API, consider the long-term implications.
The accessibility tool market is mature at detection and immature at remediation. As compliance deadlines approach and document volumes grow, the gap between "finding problems" and "fixing problems" becomes the most important factor in your evaluation.
Aelira is an open-core accessibility platform that scans and remediates documents — not just one or the other.

Aelira Team • Accessibility Engineers

The Aelira team is building AI-powered accessibility tools for higher education. We're on a mission to help universities meet WCAG 2.1 compliance before the April 2026 deadline.