Title: Processing PDF: How to Go from PDF to E-text to Audio
1 Processing PDF: How to Go from PDF to E-text to Audio
- Gaeir Dietrich
- Director, High Tech Center Training Unit of the California Community Colleges, Foothill Community College District
2 PDF from Publishers
- Portable document format (PDF)
- Reads the same on any computer
- Looks like the book
- Smaller than TIFFs
- Contains all the text
- Always check to make sure the book is the right one!
- Easy for publishers
3 Requesting through ATN
- AccessText Network
- Now free for requesting files from ATN-member publishers
- Paid membership to exchange files
- www.accesstext.org
- Not all publishers are members
- But ATN does have the largest ones
4 Other Resources at ATN
- Accessible Textbook Finder
- http://www.accesstext.org/atf.php
- Link to Publisher Lookup
- http://www.publisherlookup.org/
- Will have to contact non-ATN member publishers directly
5 Using Publisher PDFs
- Sometimes students can use files directly
- Often files will need further processing for student use
- At the very least, large files may need to be broken into chapters
6 PDF Strengths
- Good format for large print
- Cropping
- Fit to page on large pages
- Print sections on large pages (tiling)
- Adobe Reader has some nice features
- Change colors
- Reflow
- Limited voicing
- Works on both Mac and PC
- Easy for most publishers to create
7 PDF Weaknesses
- Not always fully accessible
- Screen readers do not always like them, even when they are text-based
- Reading order can be problematic
- May be graphics (pictures of text)
- May have too much security
8 As an Aside
- When faculty create PDFs
- The PDF always started as something else, usually a Word file
- Try to get the starting document if the student prefers audio
- Security concerns?
- Word files can be password protected
- Office Button > Prepare > Encrypt
9 Types of PDF Documents
- Text-based
- Text can be selected
- Graphical
- Picture of text (i.e., a graphic)
- Text cannot be selected
- Use text-select tool to tell the difference
- Files may be locked
10 Processing PDFs
- Adobe Acrobat Professional
- Check on College Buys for discount
- Good OCR program
- ABBYY FineReader
- Nuance OmniPage
- If you are a Kurzweil campus, you will also need Kurzweil
11 Adobe Tools
- Adobe Reader
- Free
- Useful for students who need minimal accessibility features
- http://www.adobe.com/products/reader/
- Adobe Acrobat Professional
- Essential for alt media specialists
- Extract text, create accessible PDFs, enable Adobe Reader features
- www.uscollegebuy.com (discounted price)
12 Acrobat Reader
- Reads aloud
- But does not highlight or track
- Enlarges text
- Nice reflow feature
- Changes text/background colors
- Text highlighting, sticky notes, and comments
- Access for text-based PDFs
13 Production Features in Reader
- Really designed for reading, not reformatting
- Export PDF
- Subscription service (about $20/year)
- Upload PDF file, service auto-converts to Word, download
14 Process with Acrobat Pro
- Cropping
- Enlargement for printing
- Tiling
- Extracting/deleting pages
- Combining/inserting pages
- Text extraction
- Works best with text-based PDF
- Does have built-in OCR capability
15 Customize Quick Tools
- Click on the gear
- View > Show/Hide > Toolbar Items > Quick Tools
16 Quick Tools Menu
17 Customize
18 Please Note
- To enable single-key shortcuts
- Open Preferences dialog box: Ctrl+K
- Under General > select Use Single-Key Accelerators To Access Tools (first checkbox under Basic Tools)
19 Cropping
- Tools > Pages > Crop
- Shortcut: C
- (Please note: This shortcut brings up the mouse-driven cropping tool; you must double-click to open the dialog box!)
20 Crop Tool
21 Crop Toolbox
22 Enlarging
- Choose paper size/printer
- File > Print > Size to Fit
- Shortcut: Ctrl+P (tab through)
- Tip: Crop document before enlarging
23 Print to Fit
24 Tiling
- Choose paper size/printer
- File > Print > Poster > Tile Scale and Overlap
- Shortcut: Ctrl+P (tab through)
- Tip: Crop document before tiling
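
Before printing, it can help to estimate how many sheets a tiled enlargement will use. The following is a minimal Python sketch, not Acrobat's own calculation. It assumes all sizes are in inches, that the scale is a simple multiplier (2.0 means 200%), that each sheet after the first adds its size minus the overlap, and that printer margins are ignored. The function name `tiles_needed` is made up for this example.

```python
import math


def tiles_needed(page_w, page_h, scale, sheet_w, sheet_h, overlap=0.0):
    """Estimate (sheets across, sheets down, total sheets) for a tiled print.

    Assumptions (not taken from Acrobat): lengths are in inches, printer
    margins are ignored, and neighboring sheets share `overlap` inches.
    """
    if overlap < 0 or overlap >= min(sheet_w, sheet_h):
        raise ValueError("overlap must be non-negative and smaller than the sheet")
    scaled_w = page_w * scale
    scaled_h = page_h * scale
    # n sheets cover n * (sheet - overlap) + overlap inches in one direction.
    across = max(1, math.ceil((scaled_w - overlap) / (sheet_w - overlap)))
    down = max(1, math.ceil((scaled_h - overlap) / (sheet_h - overlap)))
    return across, down, across * down


if __name__ == "__main__":
    # A letter-size page at 200% on letter-size sheets, no overlap.
    print(tiles_needed(8.5, 11, 2.0, 8.5, 11))       # (2, 2, 4)
    # The same job with a half-inch overlap needs an extra row and column.
    print(tiles_needed(8.5, 11, 2.0, 8.5, 11, 0.5))  # (3, 3, 9)
```

The second call shows why cropping first matters: a smaller page at the same scale can avoid the extra row and column that overlap otherwise forces.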
25 Enlarge with Tiling
26 Extracting Pages
- Tools > Pages > Extract
- Delete shortcut: Ctrl+Shift+D
- Extract Pages shortcut: Alt+V, T, P (opens Pages pane; F6 focuses in pane and can arrow down)
27 Extraction Tool
28 Tips for Extracting Chapters
- Crop the complete file before extracting
- Work on a copy!!!!!
- Extract from end toward front!
- Use table of contents to help
- Place focus on first page of chapter to extract (beginning with last)
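
Working from the end toward the front means that, if each chapter is deleted from the working copy after it is extracted, the page positions of the earlier chapters never shift. The Python sketch below only plans the page ranges; it does not touch a PDF. It assumes you have read the chapter start pages from the table of contents and translated them into PDF page positions (which often differ from the printed page numbers). The function name `extraction_order` is made up for this example.

```python
def extraction_order(chapter_starts, last_page):
    """Return (chapter_number, first_page, last_page) tuples, last chapter first.

    `chapter_starts` holds the PDF page position where each chapter begins;
    `last_page` is the final page of the document.
    """
    starts = sorted(chapter_starts)
    if not starts or starts[0] < 1 or starts[-1] > last_page:
        raise ValueError("chapter starts must fall within the document")
    ranges = []
    for index, first in enumerate(starts):
        # A chapter ends one page before the next chapter begins.
        if index + 1 < len(starts):
            end = starts[index + 1] - 1
        else:
            end = last_page
        ranges.append((index + 1, first, end))
    # Reverse so the plan starts with the last chapter.
    return list(reversed(ranges))


if __name__ == "__main__":
    for number, first, end in extraction_order([1, 15, 40], 60):
        print(f"Chapter {number}: extract pages {first}-{end}")
```

For a 60-page file with chapters starting on pages 1, 15, and 40, the plan lists pages 40-60 first, then 15-39, then 1-14.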
29 Starting from the Back
30 Combining
- File > Pages > Insert
- OR
- Create > Combine files
31 Inserting Pages
32 Combining Pages
33 Auto Extracting Text
- File > Save As > MS Word
- Retains styles and paragraphs
- File > Save As > More options
- Text (Accessible)
- Loses styles, places hard returns at end of line
- Text (Plain)
- Loses styles, keeps paragraphs
- Shortcut: Alt+F, A
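
If a file has already been saved with hard returns at the end of every line, the lines can be rejoined into paragraphs afterward. This is a minimal Python sketch, assuming that paragraphs in the exported text are separated by at least one blank line; check a sample of your own export before relying on that. The function name `unwrap_hard_returns` is made up for this example.

```python
import re


def unwrap_hard_returns(text):
    """Join hard-wrapped lines into paragraphs.

    Assumption: a blank line marks a paragraph break; every other line
    break is a hard return that should become a single space.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for paragraph in paragraphs:
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        joined.append(" ".join(lines))
    return "\n\n".join(joined)


if __name__ == "__main__":
    sample = "The quick brown\nfox jumps.\n\nSecond paragraph\nhere."
    print(unwrap_hard_returns(sample))
```

Words that were hyphenated across a line break are not repaired by this sketch; those still need a manual pass.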
34 Save As Options
35 Better Text Extraction
- OCR programs analyze text and structure
- Acrobat Pro has built-in OCR, but other programs provide more control
- Can control which text to include
36 More Control over Text
- For graphical PDFs
- Or
- To maintain more control over extracting text from text-based PDFs
- Use an OCR program!
37 Processing Graphical PDFs
- Must run optical character recognition (OCR)
- Computers cannot read pictures
- OCR programs recognize the characters in the picture
- How you process the file depends on the end format the student wants!
38 Want to Stay in PDF?
- Sometimes students do want a text-based PDF
- Can OCR in Acrobat Pro
- Tools > Recognize Text
39 Under Tools
40 Want Text Out
- OmniPage or FineReader
- FineReader generally easier to learn
- Save to Word or HTML or Text based on student preference
- Use virtual printer with Kurzweil
- Create KESI files
- Read&Write (R&W)
- Save as Word
41 Which One When?
- Want a Word file?
- Best choice is OmniPage or FineReader
- Want a Kurzweil document?
- Use Kurzweil to process the PDF
- For students to do themselves?
- Whichever program they prefer
42 Why?
- OCR programs are designed to make extraction and editing easy
- Document readers (R&W, Kurzweil, etc.) are designed to make reading easy, NOT editing.
43 NEVER!!!
- Do NOT run OCR with FineReader or OmniPage, save to PDF, and then take into Kurzweil, R&W, etc.
- Kurzweil, R&W, WYNN will run their own OCR on the PDF!
- Wastes time, adds error to do OCR twice
44 OCR Programs
- Treat PDFs the same as a TIFF
- If you OCR scanned documents, use the same process
- Load image file
- Select zones
- Create templates as needed
45 OCR Process Details
- Crop before loading into OCR engine
- Turn on multiple languages as needed
- If doing math, turn on Greek
- Only turn on the languages you need
- Edit in the OCR program
- Some OCR programs have font matching features
- Save to Word
46 Captions and Such
- For students who want audio or who are using screen readers
- Separate the main body of the text and the ancillary text (captions, sidebars, footnotes)
- Create two documents: 00 Chapter and 00A Chapter
- Allows the student to hear main text uninterrupted
47 Two Doc Workflow
- Open PDF in OCR program
- Analyze layout for entire document
- Save a copy
- On one copy, delete all ancillary text
- Save to Word as 00 Chapter
- On other copy, delete all main body text
- Save as 00A Chapter
- Keep page numbers in both documents!
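
The naming pattern can be generated rather than typed by hand. This small Python sketch assumes that "00" stands for a two-digit chapter number and that the files are saved with a `.docx` extension; both are assumptions for illustration, as is the function name `chapter_file_names`.

```python
def chapter_file_names(chapter_number, extension=".docx"):
    """Return (main-text name, ancillary-text name) for one chapter.

    Assumption: "00" in the naming pattern is a zero-padded chapter number.
    """
    main = f"{chapter_number:02d} Chapter{extension}"
    ancillary = f"{chapter_number:02d}A Chapter{extension}"
    return main, ancillary


if __name__ == "__main__":
    for number in (1, 2, 12):
        print(chapter_file_names(number))
```

Because a space sorts before the letter A, each main file lists directly ahead of its ancillary file when the folder is sorted by name.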
48 Once in Word
- Learn to use Show/Hide (show hidden characters)
- Ctrl+Shift+8
- Beware of the optional hyphen
- Search and replace to delete
- Search for ^- and replace with nothing
- Run spell check
- Use styles to structure files for braille program
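
The same cleanup can be done outside Word when the text has already been saved as a plain-text file. This is a minimal Python sketch, assuming the optional hyphens arrive in the text as the Unicode soft hyphen character (U+00AD); verify that with a sample file, since programs differ in how they export them. The function name `strip_optional_hyphens` is made up for this example.

```python
SOFT_HYPHEN = "\u00ad"  # Unicode soft (optional) hyphen; assumed export form


def strip_optional_hyphens(text):
    """Remove soft hyphens while leaving ordinary hyphens untouched."""
    return text.replace(SOFT_HYPHEN, "")


if __name__ == "__main__":
    sample = "infor\u00admation about well-known text\u00adbooks"
    print(strip_optional_hyphens(sample))
```

Ordinary hyphens, as in "well-known", are a different character and are left in place.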
49 Converting Files
50 Mobile Readers?
- Check formats that device can handle
- Some handle PDF and DOC, some do not
- All readers handle TXT
- Also called text, ASCII
- Can save from Word as plain text
51 Magic Conversion Tool
- Calibre
- Converts to and from many formats
- Fairly intuitive
- Free!
- http://calibre-ebook.com/
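
Calibre also installs a command-line converter, `ebook-convert`, which takes an input file and an output file and picks the formats from the file extensions. The Python sketch below builds and runs that command for one file. It assumes Calibre is installed and `ebook-convert` is on the system path; the helper names `build_convert_command` and `convert` are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target_format):
    """Build the ebook-convert command for converting `source` to `target_format`."""
    source_path = Path(source)
    # The output file sits next to the source, with the new extension.
    target_path = source_path.with_suffix("." + target_format.lstrip("."))
    return ["ebook-convert", str(source_path), str(target_path)]


def convert(source, target_format):
    """Run the conversion and return the output file name."""
    command = build_convert_command(source, target_format)
    if shutil.which(command[0]) is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed and on the path?")
    subprocess.run(command, check=True)
    return command[2]


if __name__ == "__main__":
    # Shows the command without running it.
    print(build_convert_command("03 Chapter.docx", "epub"))
```

Check which formats the student's device accepts before choosing the target format.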
52 Another Conversion Tool
- TechAdapt
- http://www.techadapt.com/
- TechAdapt Accessible Media Center (TAMC)
- For converting NIMAS and DAISY
- DAISY to:
- RTF
- HTML
53 File Transfer
- Can use Dropbox or Box to transfer files for most readers
- Kindle and iPad can often use e-mail
54 Resource Info
- Gaeir Dietrich
- gdietrich_at_htctu.net
- 408-996-6047
- www.htctu.net
- Alt media listserv
- Manuals online