How Aelira Actually Fixes Your PDFs (Not Just Flags Them)

You upload a PDF. Your accessibility tool scans it. The result: "47 issues found."

Now what?

Most tools stop there. You get a report full of problems — missing structure tags, incorrect reading order, unlabelled table headers — and the implicit message is: go fix these yourself. Manually. One by one. At 30-60 minutes per file.

If you have 500 PDFs to remediate before a compliance deadline, that is 250 to 500 hours of manual work. The maths doesn't work.

Aelira takes a different approach. When you upload a PDF, you get back a fixed PDF — not just a report. Here's what actually happens.

Step 1: Structure Analysis

The first thing Aelira does is analyse your PDF's internal structure. Most PDFs created by "Save as PDF" or scanning look fine visually, but internally they're a mess. Screen readers don't see what you see — they see the raw data layer, which often has:

No heading hierarchy — everything is flat text
No reading order — content might be read in the wrong sequence
Tables without headers — data grids with no context for what each column means
Images without alt text — invisible to anyone using a screen reader

Aelira maps out all of these structural gaps before applying any fixes.
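As a rough illustration of what a structural gap check can look like, here is a minimal sketch that inspects a PDF document catalog for three top-level signals: the structure tree, the tagged-PDF marker, and the document language. The function name `find_structural_gaps` and the plain-dict input are assumptions made for this example, not Aelira's actual code. Gaps such as missing headings, table headers, or alt text require walking the structure tree itself, which is not shown here.

```python
from typing import Any, Mapping


def find_structural_gaps(catalog: Mapping[str, Any]) -> list[str]:
    """Return human-readable gaps found in a PDF document catalog.

    `catalog` is a dict-like object keyed by PDF names such as "/Lang".
    (Hypothetical helper: a real pipeline would read these keys from the
    parsed PDF rather than from a hand-written dict.)
    """
    gaps = []
    # Tagged PDFs keep their logical structure under /StructTreeRoot.
    if "/StructTreeRoot" not in catalog:
        gaps.append("no structure tree (untagged PDF)")
    # /MarkInfo << /Marked true >> declares that the file is tagged.
    mark_info = catalog.get("/MarkInfo") or {}
    if not mark_info.get("/Marked", False):
        gaps.append("not marked as tagged")
    # /Lang gives screen readers the default document language.
    if not catalog.get("/Lang"):
        gaps.append("no document language")
    return gaps


untagged = {"/Type": "/Catalog"}
tagged = {"/StructTreeRoot": {}, "/MarkInfo": {"/Marked": True}, "/Lang": "en-GB"}
print(find_structural_gaps(untagged))
print(find_structural_gaps(tagged))
```

A typical "Save as PDF" export fails all three checks, which is why the visual page can look fine while a screen reader has nothing to work with.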

Step 2: Reading Order

This is one of the hardest problems in PDF accessibility, and where most tools give up entirely.

A two-column academic paper looks obvious to a sighted reader: left column first, then right column. But PDFs don't store content that way. The internal data might have the right column's first paragraph immediately after the left column's title, because that's how the authoring tool happened to write it.

Aelira uses a dual strategy to get reading order right:

For standard layouts (single column, two-column papers, slide handouts), a heuristic engine analyses the visual layout. It detects columns by clustering content blocks by their horizontal position, identifies headers and footers that repeat across pages, and establishes the correct top-to-bottom, left-to-right reading sequence. Headers and footers get marked as artifacts so screen readers skip them automatically.

For complex layouts (mixed columns, sidebars, pull quotes, unusual designs), Aelira uses AI vision. It renders the page as an image and asks an AI model to determine the correct reading sequence based on the visual layout — the same way a human reader would naturally scan the page.

The heuristic approach handles the majority of documents with high confidence. AI vision kicks in only when the layout is too complex for rule-based analysis.
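To make the heuristic side concrete, here is a minimal sketch of column clustering and repeated header/footer detection. It assumes text blocks have already been extracted with their left and top coordinates. The names `Block`, `order_blocks`, and `repeated_lines`, and the `column_gap` and `min_share` defaults, are illustrative assumptions rather than Aelira's implementation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    x0: float  # left edge of the block
    y0: float  # top edge (y grows downward, as in most extraction tools)


def order_blocks(blocks: list[Block], column_gap: float = 50.0) -> list[Block]:
    """Cluster blocks into columns by left edge, then read columns left to right."""
    columns: list[list[Block]] = []
    for block in sorted(blocks, key=lambda b: b.x0):
        # Stay in the current column unless the left edge jumps by more than column_gap.
        if columns and block.x0 - columns[-1][-1].x0 <= column_gap:
            columns[-1].append(block)
        else:
            columns.append([block])
    ordered: list[Block] = []
    for column in columns:
        # Within a column, read top to bottom.
        ordered.extend(sorted(column, key=lambda b: b.y0))
    return ordered


def repeated_lines(pages: list[list[str]], min_share: float = 0.8) -> set[str]:
    """Text that appears on at least min_share of pages is likely a header or footer."""
    counts = Counter(line for page in pages for line in set(page))
    return {line for line, n in counts.items() if n / len(pages) >= min_share}


# Blocks arrive in the file's storage order, not the visual order.
page = [
    Block("Right column, first paragraph", 320, 100),
    Block("Title", 40, 40),
    Block("Left column, second paragraph", 42, 300),
    Block("Left column, first paragraph", 40, 100),
    Block("Right column, second paragraph", 322, 300),
]
print([b.text for b in order_blocks(page)])
```

This simple version would not catch page numbers, which change on every page, and it treats a full-width title as part of the left column. Cases like these are where a production heuristic needs more rules, or where the fallback to vision becomes useful.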

Step 3: Table Remediation

Tables are everywhere in academic documents — grade rubrics, data tables, comparison charts, lab results. An untagged table is almost useless to a screen reader. The reader sees a stream of disconnected values with no way to know which column or row they belong to.

Aelira detects tables in your PDF, then:

Extracts the table structure: rows, columns, cell boundaries, merged cells
Identifies headers: analyses the first row and column for short, distinct text that looks like labels
Optionally confirms with AI vision: for ambiguous tables, sends a snapshot to the AI model for a second opinion on which cells are headers
Applies proper tags: creates the full semantic structure (THead, TBody, TR, TH, and TD elements), with Scope attributes on the TH cells, so screen readers can announce "Column: Grade, Row: Assignment 3"
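The header-identification and tagging steps above can be sketched as follows. This is a simplified heuristic under stated assumptions: a cell counts as a label if it is short and not purely numeric, and a row or column is a header only if every cell is a distinct label. The function names and the `max_words` cutoff are hypothetical, and merged cells are not handled.

```python
from typing import Optional


def looks_like_label(cell: str, max_words: int = 4) -> bool:
    """Short, non-numeric text is treated as a likely header label."""
    text = cell.strip()
    if not text or len(text.split()) > max_words:
        return False
    # Reject cells that are purely numeric, e.g. "85", "3.5", "70%".
    return not text.replace(".", "").replace("%", "").replace(",", "").isdigit()


def is_header(cells: list[str]) -> bool:
    """A header run must be non-empty, all labels, and free of duplicates."""
    return bool(cells) and all(looks_like_label(c) for c in cells) and len(set(cells)) == len(cells)


def tag_table(rows: list[list[str]]) -> list[list[tuple[str, Optional[str]]]]:
    """Return (tag, scope) per cell: TH with Column or Row scope, else TD with None."""
    header_row = is_header(rows[0])
    header_col = is_header([row[0] for row in rows[1:]])
    tagged = []
    for r, row in enumerate(rows):
        tagged_row = []
        for c in range(len(row)):
            if header_row and r == 0:
                tagged_row.append(("TH", "Column"))
            elif header_col and c == 0:
                tagged_row.append(("TH", "Row"))
            else:
                tagged_row.append(("TD", None))
        tagged.append(tagged_row)
    return tagged


rubric = [
    ["Assignment", "Grade", "Weight"],
    ["Assignment 1", "72", "20%"],
    ["Assignment 2", "85", "30%"],
]
for row in tag_table(rubric):
    print(row)
```

A table whose data cells are themselves short text (letter grades, for example) can fool a rule like this, which is the kind of ambiguity the optional AI vision check is meant to resolve.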

Merged cells, irregular grids, and nested tables all get handled. The more complex the table, the lower the confidence score — which brings us to the next step.

Step 4: Confidence Scoring

Here's where Aelira diverges most from other tools. Automated fixes aren't all created equal. Adding a missing document title is a near-certain fix. Generating alt text for a complex educational diagram is a judgment call.

Aelira scores every fix on a confidence scale:

| Fix Type | Confidence | What Happens |
|----------|-----------|--------------|
| Rule-based fixes (title, language, bookmarks) | ~0.95 | Applied automatically |
| Heuristic fixes (heading hierarchy, reading order) | ~0.70 | Applied, flagged if complex |
| AI text fixes (alt text from context) | ~0.60 | Flagged for review |
| AI vision fixes (alt text from image analysis) | ~0.55 | Flagged for review |

Fixes scoring above 0.85 are applied automatically. You don't need to review them — they're structural, deterministic, and well-understood.

Fixes scoring below 0.85 are flagged for your review. You'll see exactly what Aelira changed and why, and you can accept, modify, or reject each one.
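The threshold logic can be sketched in a few lines. The `Fix` class and `triage` function are hypothetical names for this example. The 0.85 cutoff and the sample scores come from the article, while sending a score of exactly 0.85 to review is an assumption, since only "above" and "below" are specified.

```python
from dataclasses import dataclass

AUTO_APPLY_THRESHOLD = 0.85  # cutoff stated in the article


@dataclass
class Fix:
    description: str
    confidence: float


def triage(fixes: list[Fix]) -> tuple[list[Fix], list[Fix]]:
    """Split fixes into (auto-applied, flagged for review)."""
    auto, review = [], []
    for fix in fixes:
        # Assumption: a score of exactly 0.85 is treated conservatively and reviewed.
        if fix.confidence > AUTO_APPLY_THRESHOLD:
            auto.append(fix)
        else:
            review.append(fix)
    return auto, review


fixes = [
    Fix("Set document title", 0.95),
    Fix("Rebuild heading hierarchy", 0.70),
    Fix("Alt text from surrounding context", 0.60),
    Fix("Alt text from image analysis", 0.55),
]
auto, review = triage(fixes)
print("auto-applied:", [f.description for f in auto])
print("needs review:", [f.description for f in review])
```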

This means you spend your time on the roughly 10% of fixes that need human judgment (educational images, discipline-specific diagrams, context-dependent decisions) instead of manually tagging hundreds of headings.

Step 5: Validation

After applying fixes, Aelira doesn't just assume the PDF is now accessible. It validates.

Matterhorn Protocol: a native validator checks 15 machine-checkable conditions drawn from the Matterhorn Protocol, the published set of failure conditions for PDF/UA. These cover the structure tree, language tags, alt text, heading hierarchy, table structure, and role mappings.

veraPDF (optional, 108 rules): one of the most comprehensive PDF/UA validators available. It covers edge cases that simpler validators miss.

You get a compliance report showing exactly which checks passed and which still need attention. Not "we think it's fixed" — proof it's fixed.
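A pass/fail report of this kind can be sketched as a set of named checks run over a document summary. The four checks below are illustrative stand-ins, not the actual Matterhorn or veraPDF rule list, and the `doc` dictionary layout is an assumption made for the example.

```python
from typing import Callable


def headings_do_not_skip(levels: list[int]) -> bool:
    """True if headings start at level 1 and never jump more than one level deeper."""
    previous = 0
    for level in levels:
        if level > previous + 1:
            return False
        previous = level
    return True


# Hypothetical check registry: name -> function returning True on pass.
CHECKS: dict[str, Callable[[dict], bool]] = {
    "structure tree present": lambda doc: doc["has_struct_tree"],
    "document language set": lambda doc: bool(doc["lang"]),
    "every figure has alt text": lambda doc: all(doc["figure_alts"]),
    "heading levels do not skip": lambda doc: headings_do_not_skip(doc["heading_levels"]),
}


def validate(doc: dict) -> dict[str, bool]:
    """Run every check and return a name -> passed mapping."""
    return {name: check(doc) for name, check in CHECKS.items()}


doc = {
    "has_struct_tree": True,
    "lang": "en",
    "figure_alts": ["Bar chart of grades by assignment", ""],  # second figure lacks alt text
    "heading_levels": [1, 2, 3, 2],
}
report = validate(doc)
for name, passed in report.items():
    print("PASS" if passed else "FAIL", name)
```

Keeping each check as a separate named entry is what lets the report say exactly which condition still needs attention.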

The AI Layer: Your Choice

For free and demo accounts, Aelira uses Google Gemini for AI-powered features like vision-based reading order, table header confirmation, and alt text generation.

For department and institutional plans, you can connect your own models — open-source options like Llama, Qwen, or Mistral running on your infrastructure via Ollama. Your documents never leave your servers.

And since Aelira's core is open source (MIT + AGPL), self-hosted users can run any model they choose. No vendor lock-in on the AI layer. Universities with data sovereignty requirements keep everything on-premises.
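One way to picture a pluggable AI layer is a small provider interface with interchangeable backends. The sketch below is an assumption about the shape of such a design, not Aelira's code: `VisionProvider`, `OllamaProvider`, `CannedProvider`, and `generate_alt_text` are hypothetical names, and the model name is a placeholder for whichever vision-capable model you have pulled. The request targets Ollama's `/api/generate` endpoint on its default local port.

```python
import base64
import json
import urllib.request
from typing import Protocol


class VisionProvider(Protocol):
    """Anything that can turn an image plus a prompt into text."""

    def describe_image(self, png_bytes: bytes, prompt: str) -> str: ...


class OllamaProvider:
    """Talks to a local Ollama server, so documents stay on your own machine."""

    def __init__(self, model: str, base_url: str = "http://localhost:11434"):
        self.model = model
        self.base_url = base_url

    def build_request(self, png_bytes: bytes, prompt: str) -> tuple[str, str]:
        """Return (url, JSON body) for a non-streaming generate call."""
        payload = {
            "model": self.model,
            "prompt": prompt,
            "images": [base64.b64encode(png_bytes).decode("ascii")],
            "stream": False,
        }
        return f"{self.base_url}/api/generate", json.dumps(payload)

    def describe_image(self, png_bytes: bytes, prompt: str) -> str:
        url, body = self.build_request(png_bytes, prompt)
        request = urllib.request.Request(
            url, data=body.encode("utf-8"), headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())["response"]


class CannedProvider:
    """Offline stand-in, useful for tests and demos."""

    def describe_image(self, png_bytes: bytes, prompt: str) -> str:
        return "placeholder alt text"


def generate_alt_text(provider: VisionProvider, png_bytes: bytes) -> str:
    # The caller never needs to know which backend is in use.
    return provider.describe_image(png_bytes, "Describe this image for a screen reader user.")


print(generate_alt_text(CannedProvider(), b""))
```

Swapping a hosted model for a self-hosted one then becomes a configuration change: construct `OllamaProvider("my-vision-model")` instead and pass it to the same function.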

What This Looks Like in Practice

Before Aelira: You have a 20-page PDF lecture handout. An accessibility checker tells you it has 34 issues. You open Adobe Acrobat, start manually adding structure tags, fixing reading order, adding alt text. An hour later, you're on page 6.

After Aelira: You upload the PDF. 30 seconds later, you get it back with 31 issues fixed automatically and 3 flagged for your review — two images that need discipline-specific alt text and one complex table where AI wasn't sure about the header row. You spend 5 minutes reviewing those three items. Done.

That's the difference between a tool that finds problems and a tool that fixes them.

Under the Hood (For the Technically Curious)

Aelira's PDF pipeline is built on:

pikepdf for low-level PDF structure manipulation (structure trees, tag insertion, reading order rewriting)
PyMuPDF for content extraction (text blocks, table detection, bounding boxes)
Tesseract 5 for OCR on scanned documents
LuaLaTeX + tagpdf for producing PDF/UA-1 compliant output from LaTeX source
Matterhorn Protocol validator — a native implementation checking 15 PDF/UA conditions
veraPDF REST API — optional integration for 108-rule deep validation
Gemini / Ollama — pluggable AI provider for vision and text generation

The entire remediation pipeline is open source. You can read exactly what it does to your documents, audit the logic, or contribute improvements.

PDF/UA-1 and PDF/UA-2 are both supported, with automatic version detection from document metadata.
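For readers curious how version detection from metadata can work: PDF/UA files declare their part number in XMP metadata through the `pdfuaid:part` property. The sketch below reads that property from an XMP string using only the standard library. The function name `detect_pdfua_part` is hypothetical, and how Aelira itself performs detection is not documented here.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace of the PDF/UA identification schema used in XMP metadata.
PDFUAID_NS = "http://www.aiim.org/pdfua/ns/id/"


def detect_pdfua_part(xmp: str) -> Optional[int]:
    """Return the declared PDF/UA part (1 or 2), or None if no identifier is present."""
    root = ET.fromstring(xmp)
    key = f"{{{PDFUAID_NS}}}part"
    for element in root.iter():
        # The identifier may be written as a child element...
        if element.tag == key and element.text:
            return int(element.text.strip())
        # ...or as an attribute on rdf:Description.
        if key in element.attrib:
            return int(element.attrib[key])
    return None


SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/">
      <pdfuaid:part>1</pdfuaid:part>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""
print(detect_pdfua_part(SAMPLE))
```

A declared part number is only a claim by the authoring tool. Validation is still what establishes whether the file actually conforms.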

Try It

Upload a PDF to the demo and see the pipeline in action. No signup required for the first scan.

If you're evaluating tools for your department, request a pilot — we'll process a batch of your real documents so you can see exactly what the output looks like.

Aelira is an open-core accessibility platform built for higher education. Learn more or view pricing.

Aelira Team
