PDFs of scanned documents are often just images of documents, similar to digital photographs, and typically do not contain readable text. Unless the PDF is created as a text-based file or processed using Optical Character Recognition (OCR), screen readers and many other types of assistive technology cannot interpret the content.

Why Text-Based PDFs Matter

Text-based PDFs are essential for accessibility because they allow assistive technologies such as screen readers, text-to-speech software, and magnifiers to interpret and interact with the content. They also allow the user to adjust the text as needed, for example by changing its size, color, or font. In addition, they improve searchability by allowing search engines to index the document’s text. Simply printing a document to PDF, or scanning it without text recognition (OCR), usually results in an image-based, non-accessible file.

Text Recognition / Optical Character Recognition (OCR): OCR is a technology that converts printed or handwritten text in scanned documents into machine-readable text. When a document is scanned using text recognition or OCR software, the system recognizes the characters and creates a digital version that can be read by assistive technology. Many modern scanners include text recognition or OCR functionality; refer to your device's manual to confirm this feature. Even after OCR processing, a scanned document may still pose accessibility issues.

Issues with Scanned PDFs

Magnification Causes Distortion

When a scanned PDF does not have an electronic text (eText) layer, it cannot be used with assistive technology such as text-to-speech software and screen readers. Even when a scanned PDF does have an eText layer, that layer is typically placed under the scanned image. For people who need to zoom in on the text, the output becomes pixelated and distorted because what is displayed is the image of the text, with the eText hidden beneath it.

Screenshot: A reading at 100% may seem very legible to most upon first glance.
Screenshot: When the same document (as above) is magnified, the text gets increasingly pixelated and distorted.
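
The pixelation described above can be estimated with simple arithmetic. The sketch below is only illustrative: the function name is hypothetical, and it assumes a screen of 96 pixels per inch and a viewer that shows the page at its physical size at 100% zoom, neither of which holds for every device. It computes how many screen pixels (per axis) each scanned pixel is stretched across at a given zoom level.

```python
def screen_pixels_per_scan_pixel(scan_dpi: float,
                                 zoom_percent: float,
                                 screen_ppi: float = 96.0) -> float:
    """Return how many screen pixels (per axis) one scanned pixel covers.

    screen_ppi defaults to 96, an assumed value; real displays vary.
    """
    if scan_dpi <= 0 or zoom_percent <= 0 or screen_ppi <= 0:
        raise ValueError("all arguments must be positive")
    # One inch of page contains scan_dpi scanned pixels and is drawn
    # across screen_ppi * (zoom / 100) screen pixels.
    return (screen_ppi * zoom_percent / 100.0) / scan_dpi


if __name__ == "__main__":
    for zoom in (100, 200, 400):
        ratio = screen_pixels_per_scan_pixel(scan_dpi=150, zoom_percent=zoom)
        print(f"150 dpi scan at {zoom}% zoom: {ratio:.2f} screen pixels per scanned pixel")
```

Once the ratio rises above 1, the viewer has no additional image detail to show and must enlarge the existing pixels, which is when the text starts to look blocky or blurred. Real text in a text-based PDF does not have this limit because it is redrawn at every zoom level.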

Electronic Text Accuracy

If your scanned PDF has an eText layer created through text recognition, that text sits under the image, so it is extremely difficult to tell whether it accurately reflects what is shown visually. As a result, users who rely on assistive technology, such as text-to-speech or screen readers, may receive a less accurate version of the document than those who read visually from the image layer.

Screenshot: Sidebar text with a bulleted list visually appears legible and well-formatted.
Screenshot: The same text (first bullet in image above), when pasted into Word, reveals gibberish in the underlying text layer.
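
One way to spot-check a text layer is to copy a short passage out of the PDF, type the same passage by hand from the visible page, and compare the two. The sketch below is a minimal illustration using Python's standard `difflib` module; the function names are hypothetical, and no particular score is being proposed as a pass/fail threshold.

```python
import difflib


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so layout differences are ignored."""
    return " ".join(text.split()).lower()


def text_layer_similarity(extracted: str, reference: str) -> float:
    """Return a similarity ratio from 0.0 to 1.0 between the text copied out
    of the PDF and a hand-typed reference of the same passage."""
    matcher = difflib.SequenceMatcher(None, normalize(extracted), normalize(reference))
    return matcher.ratio()


if __name__ == "__main__":
    reference = "Read the syllabus before class"   # typed by hand from the page
    good_layer = "Read the  syllabus before class"  # differs only in spacing
    bad_layer = "xq#7 ~~ zz"                        # gibberish text layer
    print(text_layer_similarity(good_layer, reference))
    print(text_layer_similarity(bad_layer, reference))
```

A ratio near 1.0 suggests the sampled passage matches what is visible, while a low ratio signals the kind of gibberish shown in the screenshot above. A check like this only covers the passages you sample, so it does not replace proofreading the converted document.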

Customization and Reflow

By their nature, PDFs provide a static layout that restricts users from customizing important features such as font and color from within the program. Additionally, the standard (free) version of Adobe Acrobat has no built-in mechanism for reliably exporting text into another, more flexible format. These restrictions limit the ability of those who need such customizations to read effectively. Adobe Reader does, however, provide users with a “reflow” view. In principle, this view allows users to zoom into their document as one column of text without the need to pan from left to right, much like an eBook or web page. Beyond more seamless text enlargement, reflow allows documents to be read easily on a user’s preferred device, such as a mobile phone or tablet. Unfortunately, with scanned PDFs it is difficult to ensure that this feature works.

Screenshot: Reading pane of a PDF document shows standard formatting at 100% in standard view.
Screenshot: The reading pane of the same PDF at 200% in "reflow" view. Rather than creating one block of text, both pages are preserved as columns and as the zoom increases (and/or the screen width decreases), the letters and words become increasingly crowded together to the point of illegibility.

HOW TO: Create Accessible PDFs from Scanned Documents

Scanned PDFs always require some reformatting before they can be posted online as an accessible document. Even though we have worked to find a solution that is as simple as possible for as many cases as possible, this reformatting can require extra effort, which is why we always encourage you to opt for easier methods of obtaining accessible documents.

Before choosing to use a scanned document, we encourage you to: 

  1. Supply a link to an online version of the document (e.g., an HTML version hosted on a journal or ebook’s website).
  2. Work with JMU Libraries in advance to procure electronic documents from publishers and vendors.
  3. Use well-formatted PDFs that are NOT scanned PDFs (e.g., a journal article PDF supplied alongside the HTML version). In this case, you will be able to not only select the text but also zoom into it without pixelation.

SensusAccess

If you are unable to find or procure a document in an accessible format and must work from a scanned PDF, you can use SensusAccess to help create an accessible version.

  1. In a web browser, navigate to JMU’s SensusAccess page. Scroll down until you see “Begin Converting Documents.”
  2. In the “Source” menu, make sure that the “File” radio button is checked.
  3. Under “Step 1 – Upload your document,” click “Choose Files.” In the pop-up window, navigate to and select one or more files that you wish to process. Click “Open,” and then, back on the SensusAccess page, press the “Upload” button.
  4. Under “Step 2 – Select your output format,” check the “Accessibility Conversion” radio button.
  5. Under “Step 3 – Specify accessibility conversion options,” choose either “docx – Word Document” or “PDF – Tagged PDF (text over image).” The benefit of choosing a Word document is that it allows you to manually correct any errors you may wish to fix from the text recognition process.
  6. Under “Step 4 – Enter email address and submit request,” type in your JMU email address and press the “Submit” button. In a few minutes, you will receive an email with your converted document(s) attached. Please note that text recognition is prone to errors and is only as good as the original scan quality. If you have opted to convert the document into a Word document, you will be able to fix any errors that occurred during this automated conversion.
  7. Provide the converted document only. Providing the original scanned PDF, even alongside the fixed version, will likely deny equal access.

As mentioned above, text recognition is only as good as the quality of the original scan. Even then, it is prone to errors, especially with text in multiple languages, STEM content, or with unique visual formatting. In such cases, the process above may not be adequate, and it may be necessary to seek out a different solution.

While not intuitive, it may save time and effort to re-scan old, poor-quality scanned PDFs. Generally, 300 dpi resolution is adequate for standard-sized text, but 400-600 dpi is more appropriate for smaller text. Resolution beyond this tends to have diminishing returns in text recognition accuracy. “Grayscale” or “color” scans are generally far more reliable than “black & white” scans. For more scanning tips and further information please see:

WCAG Technical Standards

WCAG are web standards designed with web pages in mind. That said, through law they have been applied to information and communication technology (ICT) more broadly for decades. Applying them outside the web sometimes requires more context-based interpretation of the success criteria than is typical for traditional web pages; this is noted below where applicable.

Scanned PDFs broadly and most directly violate Success Criterion 1.4.5 Images of Text (AA).

Inaccurate eText under the image layer breaks one or both of the following success criteria (based on interpretation):

The inability to customize text may violate the following success criteria indirectly due to lack of eText availability and restriction of customization:

Text that is not reflowable (within some parameters) breaks the following success criteria:
