Contact: (434) 243-8800

Text Scanning: A Basic Helpsheet

See the image scanning help sheet for information about graphics creation.

Optical Character Recognition: The Process

Optical Character Recognition (OCR) converts scanned images into text. It works well on most 19th- and 20th-century typefaces. With earlier printed material, or with poor reproductions of any typeface, the OCR software runs into time-consuming obstacles. Broken letters, ligatures, digraphs, uneven inking, and antiquated letterforms may go unrecognized by the software, and each unrecognized character adds time to the proofing and correction stage of your project.

Try a test scan before going ahead with any large amount of text. A little experimenting at the start can lower the error rate (and therefore leave less to correct in proofreading). Your results should be good with most modern typefaces, but even with clean text of a decent type size there will be occasional errors; the error rate increases as the text's size and clarity decrease. Adjusting the brightness and resolution can improve results, but little can be done with a badly faded photocopy or a 17th- or 18th-century typeface.
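One way to put a number on a test scan is to proofread a single page by hand and compare it with the raw OCR output. The sketch below is only an illustration, not part of OmniPage Pro or ABBYY FineReader: it assumes you have saved both versions as plain text, and the function names are made up for this example. It counts the single-character edits needed to turn the OCR text into the corrected text and divides by the length of the corrected text.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text, corrected_text):
    """Edits per character of the hand-corrected text (hypothetical helper)."""
    if not corrected_text:
        raise ValueError("corrected_text must not be empty")
    return edit_distance(ocr_text, corrected_text) / len(corrected_text)


if __name__ == "__main__":
    # Made-up sample strings, not taken from a real scan.
    corrected = "modern typeface"
    ocr = "modem typcface"
    print(f"{character_error_rate(ocr, corrected):.1%}")
```

Running the same page at two or three brightness or resolution settings and comparing the rates gives a rough basis for choosing settings before committing to a large scanning job.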

Anything that disrupts the integrity of a letter's shape can cause an error, although the software has some ability to compensate. Breaks in letters (and sometimes ornate italics) can cause what you will come to recognize as distinctive OCR errors: a d read as cl, a 1 or ! as l, an m as in, or an e as c.
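Because these misreadings are so regular, they can help during proofreading. The following is a minimal sketch, assuming you have the OCR output as plain text and a word list of your own; the names CONFUSIONS, candidate_corrections, and suggest are hypothetical and do not come from any OCR package. For a word that is not in the list, it reverses each of the confusions named above, one occurrence at a time, and keeps the results that are known words.

```python
# (misread, intended) pairs taken from the errors described above:
# d read as cl, 1 or ! read as l, m read as in, e read as c.
CONFUSIONS = [("cl", "d"), ("l", "1"), ("l", "!"), ("in", "m"), ("c", "e")]


def candidate_corrections(token):
    """Return every string made by undoing one confusion at one position."""
    candidates = set()
    for misread, intended in CONFUSIONS:
        start = token.find(misread)
        while start != -1:
            fixed = token[:start] + intended + token[start + len(misread):]
            candidates.add(fixed)
            start = token.find(misread, start + 1)
    return candidates


def suggest(token, known_words):
    """Suggest known words for a token missing from the word list."""
    if token in known_words:
        return []  # already a known word, nothing to suggest
    return sorted(candidate_corrections(token) & set(known_words))


if __name__ == "__main__":
    # A tiny made-up word list; a real one would be much larger.
    words = {"dog", "time", "the"}
    for token in ["clog", "tiine", "thc", "dog"]:
        print(token, suggest(token, words))
```

A check like this only proposes candidates. Some misreadings produce real words ("clog" for "dog" would pass unnoticed with a larger word list), so it supplements careful proofreading rather than replacing it.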

Optical Character Recognition: Some Sample Scans

If you are new to OCR scanning, you might want to look at the sample scans, which pair digital images of pages with the results of the OCR process.

Optical Character Recognition: Scholars' Lab Equipment

The Scholars' Lab currently has two Epson flat-bed scanners, two Fujitsu sheet-feed scanners, two Fujitsu duplex scanners, and one Fujitsu large-format scanner. Graphics scanning uses Epson Scan or ScandAll software; text scanning (optical character recognition, or OCR) uses OmniPage Pro or ABBYY FineReader. PDF creation is also available using Adobe Acrobat. In the Electronic Text Center, the icons for these applications are located on the desktop. Additionally, the two Fujitsu scanners have an automatic document feeder attachment for processing large amounts of text.

OCR Software Helpsheets



University of Virginia Library
PO Box 400113, Charlottesville, VA 22904-4113
ph: (434) 924-3021, fax: (434) 924-1431, library@virginia.edu

© 2007 by the Rector and Visitors of the University of Virginia