Why Most PDF Accessibility Tools Only Find Problems
The accessibility industry built great scanners. But scanning isn't fixing. Here's why detection is the easy part — and what actual remediation requires.
The accessibility tool market has a dirty secret: most tools don't fix anything.
They scan. They report. They generate dashboards with red and yellow indicators. And then they hand you a list of problems and say, essentially, good luck.
This isn't a criticism of any specific vendor. It's a structural problem with how the industry has evolved — and it's worth understanding if you're evaluating tools for a compliance deadline.
The Scan-and-Report Model
Here's how most accessibility tools work:
- You upload a PDF (or connect your LMS)
- The tool scans it against WCAG 2.1 criteria
- You get a report: "This document has 34 accessibility issues"
- You fix them manually, or hire someone to do it
Step 3 is where the tool's job ends. Step 4 is where your problems begin.
The implicit assumption is that detection is the hard part and that once you know what's wrong, fixing it is straightforward. For a handful of documents, that might be true. For a university with 15,000 PDFs to remediate before a deadline, it's a fantasy.
Why Detection Is the Easy Part
Scanning a PDF for accessibility issues is fundamentally pattern matching:
- Missing alt text? Check if images have an /Alt tag. Binary yes/no.
- No document title? Check the metadata. Binary yes/no.
- Missing language tag? Check /Lang in the structure tree. Binary yes/no.
- Colour contrast failure? Extract foreground/background colours and calculate the ratio. Pure maths.
These checks are well-defined, deterministic, and automatable. A good scanner can evaluate a PDF against 100+ rules in seconds. This is a solved problem.
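"Pure maths" isn't an exaggeration. The contrast check is just the relative-luminance and contrast-ratio formulas from the WCAG 2.1 spec, sketched here in Python:

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

# WCAG 2.1 AA requires at least 4.5:1 for normal-size text
passes_aa = contrast_ratio((0, 0, 0), (255, 255, 255)) >= 4.5
```

Black on white comes out at the maximum ratio of 21:1. Deterministic, fast, and easy to automate — which is exactly why every scanner on the market implements it.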
Remediation is not pattern matching. It's reconstruction.
What "Fixing" Actually Requires
When a scanner says "incorrect reading order," fixing that means:
- Extracting every content block from the PDF with its bounding box
- Determining the visual layout — is this single-column, two-column, mixed?
- Detecting headers and footers that repeat across pages
- Establishing the correct reading sequence based on visual flow
- Rewriting the PDF's internal structure tree to reflect that sequence
- Marking repeated headers/footers as artifacts so screen readers skip them
That's not a report item. That's a document reconstruction task. And reading order is just one of dozens of issue types.
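To make the gap concrete, here's a toy sketch of just one step from the list above — sequencing blocks in a simple column layout. The `Block` type and the column heuristic are illustrative; real layout analysis (mixed columns, floats, sidebars) is far messier:

```python
from dataclasses import dataclass

@dataclass
class Block:
    text: str
    x0: float
    y0: float
    x1: float
    y1: float  # PDF-style bounding box; y grows upward from the page bottom

def reading_order(blocks: list[Block], page_width: float, columns: int = 1) -> list[Block]:
    """Toy heuristic: assign each block to a column by the x-centre of its
    bounding box, then read each column top-to-bottom."""
    def column(b: Block) -> int:
        centre = (b.x0 + b.x1) / 2
        return min(int(centre / (page_width / columns)), columns - 1)
    # Sort by column first, then top-to-bottom (largest y first in PDF space)
    return sorted(blocks, key=lambda b: (column(b), -b.y1))
```

Even this simplistic version has to know the page geometry and column count before it can say anything — and the scanner's "incorrect reading order" finding tells you neither.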
Table remediation requires detecting table boundaries, extracting cell structures, identifying which cells are headers vs. data, handling merged cells, and inserting the full semantic tag hierarchy (THead, TBody, TR, TH, TD) with proper Scope attributes.
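The target of that work looks something like this — a sketch using nested dicts to stand in for PDF structure elements (a real remediator writes these into the document's tag tree, and merged cells additionally need Headers/ID links):

```python
def table_tags(header_row: list[str], body_rows: list[list[str]]) -> dict:
    """Build the PDF/UA semantic tag hierarchy for a simple table.

    "S" is the structure type, "K" the child elements, "A" the attributes --
    mirroring how tag trees nest, not a real PDF library's API."""
    return {
        "S": "Table",
        "K": [
            {"S": "THead", "K": [
                {"S": "TR", "K": [
                    {"S": "TH", "A": {"Scope": "Column"}, "text": h}
                    for h in header_row
                ]}
            ]},
            {"S": "TBody", "K": [
                {"S": "TR", "K": [{"S": "TD", "text": c} for c in row]}
                for row in body_rows
            ]},
        ],
    }
```

Generating this structure is trivial once you know which cells are headers. Knowing which cells are headers, from nothing but positioned text on a page, is the hard part.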
Structure tagging requires analysing visual formatting — font sizes, weights, spacing — to infer heading levels, paragraph breaks, list structures, and block quotes, then building a complete tag tree from scratch.
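A minimal sketch of the font-size part of that heuristic — real taggers combine weight, spacing, numbering, and position as well, but the shape of the inference is the same:

```python
def infer_heading_levels(
    spans: list[tuple[str, float]], body_size: float
) -> list[tuple[str, str]]:
    """Map font sizes to heading tags relative to the body text size.

    Toy heuristic: distinct sizes larger than the body text become
    H1, H2, ... in descending size order; everything else is a paragraph."""
    heading_sizes = sorted({size for _, size in spans if size > body_size}, reverse=True)
    level = {size: f"H{i + 1}" for i, size in enumerate(heading_sizes)}
    return [(text, level.get(size, "P")) for text, size in spans]
```

Note what this can get wrong: a pull quote in 18pt type becomes a phantom H2, and a document whose headings differ only by weight gets no headings at all. That fragility is why heading inference needs review, not blind application.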
Alt text generation requires understanding image content in the context of the surrounding document — not just "what is this image" but "what is this image communicating in this context."
Each of these is a genuinely hard engineering problem. That's why most tools don't attempt them.
The Labour Arbitrage Model
Instead of solving remediation technically, much of the industry relies on labour arbitrage. The tool scans the documents, generates work orders, and routes them to human remediators — often offshore teams working at lower hourly rates.
This works, up to a point. But it has fundamental scaling problems:
- Cost per document remains high — typically $50-150 per PDF depending on complexity
- Throughput is limited by human capacity — you can't remediate 15,000 documents in a month with manual labour
- Quality varies — different remediators make different decisions about the same content
- It's not sustainable — new content is created every semester, so the backlog never ends
For a single compliance audit, outsourced remediation might work. As an ongoing operational model, it breaks down.
The Risk of Bad Auto-Fixes
There's a reason some vendors avoid automated remediation: wrong fixes can be worse than no fixes.
An auto-generated alt text that says "image of a graph" is technically present (passes the scanner check) but practically useless (a student learns nothing from it). A heading hierarchy that guesses wrong creates a misleading document structure. Table headers applied to the wrong cells make data actively confusing.
This is a real risk, and it's the main argument against automated remediation. But the answer isn't to avoid automation entirely — it's to score the confidence of each fix and route uncertain ones to human review.
A rule-based fix like "add a missing document title from the filename" is nearly certain to be correct. An AI-generated alt text for a complex STEM diagram is a best guess that needs expert review. These shouldn't be treated the same way.
Effective automated remediation requires:
- Tiered confidence scoring — rule-based fixes (high confidence) applied automatically, AI-generated fixes (lower confidence) flagged for review
- Transparent fix descriptions — show exactly what was changed and why, so reviewers can make informed decisions quickly
- Post-fix validation — run the scanner again after remediation to prove the fixes actually resolved the issues
Without these safeguards, automated remediation is a liability. With them, it's a force multiplier.
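The routing logic itself is simple; the hard part is assigning honest confidence scores in the first place. A sketch, with an illustrative threshold:

```python
from dataclasses import dataclass

@dataclass
class Fix:
    issue: str
    description: str   # what changed and why, shown to reviewers
    confidence: float  # 0.0-1.0, assigned by the fix generator

AUTO_APPLY_THRESHOLD = 0.9  # illustrative cut-off; tune per deployment

def route_fixes(fixes: list[Fix]) -> tuple[list[Fix], list[Fix]]:
    """Split fixes into an auto-applied set and a human-review queue."""
    applied = [f for f in fixes if f.confidence >= AUTO_APPLY_THRESHOLD]
    review = [f for f in fixes if f.confidence < AUTO_APPLY_THRESHOLD]
    return applied, review
```

Under this scheme, a rule-based fix like "set the document title from the filename" (confidence near 1.0) is applied automatically, while an AI-generated alt text for a STEM diagram (confidence well below the threshold) lands in the review queue with its description attached.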
Validation as Proof
This is the piece most tools skip entirely. After your documents are "fixed" — whether by humans or AI — how do you know they're actually compliant?
The answer is independent validation against published standards:
- Matterhorn Protocol — the machine-checkable subset of PDF/UA, covering structure trees, language tags, alt text, headings, tables, and role mappings
- veraPDF — the most comprehensive PDF/UA validator available, with 108 rules covering edge cases that simpler checkers miss
Running validation after remediation gives you something no scan report can: proof that the fixes worked. Not "we identified the issues" or "we attempted to fix them" — a verifiable record that the output document meets the standard.
If your current tool can't show you post-remediation validation results, you're taking its word that the work was done correctly.
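In practice this means wiring the validator into the pipeline, not running it by hand once a year. As a sketch, a remediation pipeline can shell out to the veraPDF CLI after each fix pass — `--flavour ua1` selects the PDF/UA-1 profile and `--format xml` produces a machine-readable report (the helper function here is ours, not part of veraPDF):

```python
def verapdf_command(path: str, flavour: str = "ua1") -> list[str]:
    """Build a veraPDF CLI invocation for a PDF/UA validation run.

    Run it with subprocess and archive the XML report alongside the
    remediated document to keep a verifiable compliance record."""
    return ["verapdf", "--flavour", flavour, "--format", "xml", path]
```

Stored next to each output document, those reports are the "verifiable record" — evidence you can hand to an auditor rather than a vendor's assurance.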
Open Source and Model Choice
There's one more dimension worth considering: can you see what the tool is doing to your documents?
Closed-source remediation tools are black boxes. You feed in a PDF, you get back a modified PDF, and you trust that the modifications were correct. For compliance purposes — especially if you're subject to audit — that trust model may not be sufficient.
Open-source remediation engines let you audit the logic. You can read exactly what rules are applied, how decisions are made, and what changes are written to the document structure.
On the AI side, vendor lock-in is a growing concern. If your remediation tool depends on a single AI provider, you inherit their pricing changes, their data handling policies, and their availability. Tools that support pluggable AI providers — letting you choose between hosted APIs and self-hosted open-source models — give you control over both cost and data sovereignty.
For universities with strict data handling requirements, the ability to run the entire pipeline on-premises with open-source models isn't a nice-to-have. It's a requirement.
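In code, that pluggability is just an interface boundary. A sketch — every name here is illustrative, not a real vendor API:

```python
from typing import Protocol

class AltTextProvider(Protocol):
    """The minimal surface a remediation pipeline codes against, so AI
    backends stay swappable: hosted API today, self-hosted model tomorrow."""
    def describe_image(self, image_bytes: bytes, context: str) -> str: ...

class SelfHostedProvider:
    """Hypothetical on-prem backend wrapping a local open-source vision model."""
    def describe_image(self, image_bytes: bytes, context: str) -> str:
        # A real implementation would invoke the locally hosted model here,
        # keeping the document content inside the institution's network.
        return "placeholder description"

def generate_alt_text(provider: AltTextProvider, image: bytes, context: str) -> str:
    # The pipeline never imports a vendor SDK directly -- only the protocol.
    return provider.describe_image(image, context)
```

Swapping providers then means changing one constructor call, not rewriting the pipeline — which is precisely the control over cost and data sovereignty described above.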
What to Ask When Evaluating Tools
If you're comparing accessibility tools, these questions separate scanners from remediators:
- Does it fix issues, or only report them? If the output is a report with no modified document, it's a scanner.
- What percentage of issues can it auto-fix? If the answer is zero or vague, remediation is manual.
- How does it handle uncertain fixes? If there's no confidence scoring or review workflow, either it's not automating fixes or it's applying them blindly.
- Can it validate its own output? If there's no post-fix validation, you can't verify the work.
- Can you audit the remediation logic? If the tool is closed-source, you're trusting the vendor's implementation.
- Can you choose your AI provider? If you're locked to a single vendor's API, consider the long-term implications.
The accessibility tool market is mature at detection and immature at remediation. As compliance deadlines approach and document volumes grow, the gap between "finding problems" and "fixing problems" becomes the most important factor in your evaluation.
Aelira is an open-core accessibility platform that scans and remediates documents — not just one or the other.

Aelira Team • Accessibility Engineers

The Aelira team is building AI-powered accessibility tools for higher education. We're on a mission to help universities meet WCAG 2.1 compliance before the April 2026 deadline.