Batch OCR with Open Source Tools

Description:

Batch OCR with Open Source Tools, by Jonathan Brinley, Adelie Design (ex-Ball State University)

Provided by: jbri60
Learn more at: https://code4lib.org

Transcript and Presenter's Notes


1
Batch OCR with Open Source Tools
  • Jonathan Brinley
  • Adelie Design
  • (ex-Ball State University)

2
  • http://whatever.scalzi.com/2006/09/13/clearly-you-people-thought-i-was-kidding/

3
  • Tesseract
  • http://code.google.com/p/tesseract-ocr/
  • OCRopus
  • http://code.google.com/p/ocropus/

4
How to OCR an Image
  • ocroscript recognize /path/to/file.png > /path/to/output.html
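
The slide shows the command for a single image. For a batch, one possible approach is to loop over a folder and run that same command once per file. The sketch below is an illustration, not the presenter's script: the function names, the `*.png` pattern, and the `.html` output naming are hypothetical, and it assumes `ocroscript` is on the PATH and writes its hOCR to standard output, as the redirect on the slide suggests.

```python
import subprocess
from pathlib import Path


def build_jobs(input_dir, output_dir, pattern="*.png"):
    """Return (command, output_path) pairs, one per matching image."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    jobs = []
    for image in sorted(input_dir.glob(pattern)):
        out_path = output_dir / (image.stem + ".html")
        # Same command as on the slide; the shell's ">" redirect is
        # replaced by capturing stdout in run_jobs() below.
        jobs.append((["ocroscript", "recognize", str(image)], out_path))
    return jobs


def run_jobs(jobs):
    """Run each command and write its standard output to the hOCR file."""
    for command, out_path in jobs:
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with open(out_path, "wb") as out_file:
            # check=True raises CalledProcessError if ocroscript fails.
            subprocess.run(command, stdout=out_file, check=True)
```

Calling `run_jobs(build_jobs("scans", "hocr"))` would then process every PNG in a hypothetical `scans` folder. Building the job list separately from running it makes it easy to inspect what will be executed before any OCR starts.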

5
hOCR
  • <body>
  • <div class="ocr_page" title="bbox 0 0 2548 3300; image /path/to/scanned/image.png">
  • <span class="ocr_line" title="bbox 659 143 863 177">Some Text</span>
  • <span class="ocr_line" title="bbox 723 275 916 324">More Text</span>
  • </div>
  • </body>
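
To make the structure of the markup above concrete, here is a minimal sketch that pulls each line's bounding box and text out of an hOCR string using only the Python standard library. The names `parse_bbox`, `HocrLineParser`, and `extract_lines` are hypothetical, and the code assumes the layout shown on the slide: `span` elements with class `ocr_line` and a `bbox x0 y0 x1 y1` entry in the `title` attribute.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title string, or None."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrLineParser(HTMLParser):
    """Collect (bbox, text) for every span with class "ocr_line"."""

    def __init__(self):
        super().__init__()
        self.lines = []   # finished (bbox, text) tuples
        self._depth = 0   # span nesting depth inside the current line
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            # A nested span (for example a word-level element).
            self._depth += 1
            return
        attrs = dict(attrs)
        if "ocr_line" in (attrs.get("class") or "").split():
            self._depth = 1
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._depth:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.lines.append((self._bbox, "".join(self._text).strip()))


def extract_lines(hocr):
    """Parse an hOCR string and return its list of (bbox, text) tuples."""
    parser = HocrLineParser()
    parser.feed(hocr)
    parser.close()
    return parser.lines
```

Each returned tuple pairs pixel coordinates on the scanned page with the recognized text, which is the information a converter needs to place text over the image.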

6
  • http://www.brainofshawn.com/2007/08/05/givin-scalzi-a-hand/

7
HocrConverter.py
  • from HocrConverter import HocrConverter
  • hocr = HocrConverter("myHocrFile.html")
  • hocr.to_text("output.txt")
  • hocr.to_pdf("myImageFile.png", "output.pdf")
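
The calls above handle one file. As a hedged sketch of how they might be applied to a whole folder, the code below pairs each hOCR file with the image of the same base name and calls the same two methods on each pair. The helper names, the folder layout, and the `.png`/`.html` naming are assumptions; the converter class is passed in as a parameter, and only the constructor, `to_text`, and `to_pdf` calls shown on the slide are used.

```python
from pathlib import Path


def pair_hocr_with_images(hocr_dir, image_dir, image_ext=".png"):
    """Match each .html hOCR file to the image with the same base name."""
    pairs = []
    for hocr_path in sorted(Path(hocr_dir).glob("*.html")):
        image_path = Path(image_dir) / (hocr_path.stem + image_ext)
        if image_path.exists():
            pairs.append((hocr_path, image_path))
    return pairs


def convert_all(pairs, out_dir, converter_class):
    """Write a .txt and a .pdf per pair using the slide's two methods."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for hocr_path, image_path in pairs:
        hocr = converter_class(str(hocr_path))
        text_out = out_dir / (hocr_path.stem + ".txt")
        pdf_out = out_dir / (hocr_path.stem + ".pdf")
        hocr.to_text(str(text_out))
        hocr.to_pdf(str(image_path), str(pdf_out))
        written.append((text_out, pdf_out))
    return written
```

With the module from the slide importable, usage might look like `convert_all(pair_hocr_with_images("hocr", "scans"), "out", HocrConverter)`, where the three folder names are placeholders.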

8
Learn More or Get the Code
  • http://xplus3.net/2009/04/02/convert-hocr-to-pdf/
  • jonathanbrinley_at_gmail.com