Camera Based Document Image Analysis - PowerPoint PPT Presentation


1
Camera Based Document Image Analysis
  • David Doermann
  • University of Maryland, College Park

2
What defines the problem?
  • Traditional Document Analysis
  • Deals primarily with paper representations
  • Acquired with flatbed or sheetfed scanners
  • Camera Based Analysis
  • Clearly defined by the acquisition device, its
    properties, the impacts of its use, etc., but
  • The devices open up a wide range of new and
    interesting applications (and problems) and
    extend what we may consider document analysis.

3
Scanner Acquisition
  • Advantages
  • Reasonable quality
  • Controlled lighting, high resolution, fixed
    imaging plane
  • Rapid Acquisition
  • Relatively cheap
  • Disadvantages
  • Specialized Device
  • Fixed: documents must come to the device
  • Requires handling of documents, or documents in
    sheet form for feeders

4
Book Camera/Scanner Acquisition
  • One step removed from traditional scanners
  • Controlled environment lighting, image plane,
    orientation
  • Changes the nature of the content
  • Easier to image atypical documents
  • Rare, historic, fragile
  • Often very expensive
  • Can be relatively slow
  • Sheet scanners handle hundreds of pages per
    minute, although robotic cameras can image tens
    of pages a minute.

5
Industrial Cameras
  • Removes the constraints on configuration
  • Often still in a controlled environment
  • Custom (and expensive) solutions are common
  • Processing power bounded only by cost
  • Usage
  • Postal applications, document inspection
    (newspapers, etc.), industrial applications

6
(Portable) Digital Cameras
  • Provide much greater flexibility than scanners
  • Multiple uses
  • Device goes to the documents
  • Potentially removes the bottleneck of acquisition
    for simple tasks
  • Fewer constraints (lighting, image plane,
    focus)
  • Increases complexity of the resulting image and
    of image processing
  • Yet allows a wider variety of applications
  • A significant tradeoff

7
A broader view of a document?
  • Conceptually
  • The realization of a visual language for
    communicating information, typically meant for
    human consumption
  • Practically
  • Text, graphics, drawings, symbologies,
  • Properties
  • Exists over both space and time
  • Relatively easy to produce
  • Intended by the author to be read.

8
Roadmap
  • Discussion of some key related research and
    applications of non-scanner DIA
  • Influences of mobile devices on applications
  • Issues with processing traditional documents vs
    processing text
  • Future of camera based capture.
  • Open issues
  • New opportunities

9
What has been done?
  • Applications primarily centered on Image Text
  • Text in Video Graphics
  • Text from WWW pages
  • Text in Scenes
  • Some work on key challenges
  • Imaging of text in controlled environments such
    as parking lots, meeting rooms, assembly lines,
    etc.
  • Limited work on actually processing traditional
    documents.

10
Video Text Recognition
  • Indexing content from graphic or scene text in
    videos, used to supplement speech and closed
    captions
  • Countless papers published
  • Challenges are well known
  • Low Resolution, Complex background, Different
    font style and size, Lighting, Camera motion,
    Text/Object motion, Occlusion/distortion, all
    magnified for scene text
  • Benefits of multiple frames, repeated content
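
The benefit of multiple frames can be illustrated with a minimal sketch, assuming the frames of a static text region have already been registered (aligned) and that sensor noise is independent, zero-mean Gaussian with the same variance in every frame. Under those assumptions, averaging N frames reduces the noise standard deviation by roughly a factor of sqrt(N). The image size, noise level and frame count below are hypothetical.

```python
import numpy as np

def average_frames(frames):
    """Average a stack of already-registered frames (N x H x W)."""
    return np.asarray(frames, dtype=np.float64).mean(axis=0)

rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[20:40, 10:50] = 255.0  # hypothetical bright caption region
sigma, n_frames = 20.0, 16
frames = [clean + rng.normal(0.0, sigma, clean.shape) for _ in range(n_frames)]

single_err = np.std(frames[0] - clean)
avg_err = np.std(average_frames(frames) - clean)
print(f"single frame: {single_err:.1f}, {n_frames}-frame average: {avg_err:.1f}")
```

In real video, camera and text motion must be compensated before averaging; otherwise the average blurs the characters instead of denoising them.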

11
WWW Text Image Analysis
  • Applications
  • Identifying graphic text for indexing and
    retrieval
  • Identification of SPAM email in attachments
  • Uncovering hidden information.
  • Issues
  • Text style variations (font, font style,
    orientation).
  • Text Quality (Color, Size, Anti-aliasing)
  • Image resolution

12
Visual Input
  • Applications
  • General input for computer systems
  • Passive verification of signatures from cameras
    mounted over the writing surface
  • Has general implications for mobile devices that
    don't have traditional keyboard input
  • Challenges
  • Pen tip tracking
  • Identifying the temporal relations
  • Online recognition

13
Whiteboard Reading
  • Reading handwritten and printed material for
    meeting scenarios
  • Challenges
  • Must deal with unconstrained handwriting
  • Distinguish text from graphics and sketches
  • Parse and Interpret graphics (electronic ink)
  • Content is produced dynamically and can appear
    and disappear

14
Meeting and Lecture Processing
  • Meetings
  • Reading name plates and tags
  • Identifying and linking references to documents
  • Processing whiteboards
  • Lectures
  • Reading text on projected presentations
  • Detection, normalization and matching of text
    with source content (PowerPoint)
  • Challenges
  • Variable content
  • Animations

15
License Plate Reading
  • Applications
  • Parking lot tracking
  • Red-light and speed cameras
  • Vehicle Surveillance
  • Challenges
  • Moving Vehicles
  • Complex plates
  • Night and all weather imaging
  • Limited use of context

16
Road Sign Recognition
  • Applications
  • Driver Assisted Systems, Automated Mapping, Sign
    guideline enforcement (location, quality)
  • Challenges
  • Low resolution, motion blur
  • Real-time systems
  • Detecting signs under a variety of conditions

17
Sign Recognition and Translation
  • Application: integrated identification,
    recognition and translation of text found on
    foreign signs, maps, menus, transportation
    schedules, etc.
  • Extremely useful for other character sets
  • Primarily PDA or mobile phone based hardware
  • Networked or standalone solutions are being
    marketed
  • Ultimately, software solutions are desirable.

18
Systems for the Visually Impaired
  • Allows legally blind consumers access to a
    variety of information sources
  • Transportation, shopping,
  • The system builds an end-to-end application of
    detection, enhancement, recognition and speech
    transcription

A. Zandifar, A. Chahine, R. Duraiswami and L. S. Davis, "A Video-Based
Interface to Textual Information for the Visually Impaired," IEEE
Computer Society ICMI 2002, pp. 325-330.

19
Commonalities
  • Most of these systems can be (or have been)
    engineered, and with the right constraints more
    are technically feasible, though perhaps not cost
    effective.
  • But what is the catalyst that will promote more
    general applications?
  • Mobile devices and wireless networking are
    providing a platform which no longer requires
    special hardware

20
Mobile Devices
  • Examples PDAs, Digital Cameras, Cellular Phones
  • Devices are becoming common and pervasive
  • They are becoming increasingly powerful
    (processor, memory, power, resolution)
  • 3G networks promise multimedia support
  • They are easy to use
  • Devices go to the documents
  • Rapid and Flexible Acquisition
  • Acquisition becomes just another application of
    the device

21
How do they compare? (subjectively)
  Property     Scanner                    Camera
  Resolution   Adequate(?), 150-600 dpi   Improving
  Distortion   Minimal                    Lens/perspective
  Lighting     Controlled                 Sensor and environment
  Background   Domain dependent           Often complex
  Zoom/Focus   N/A                        Variable
  Blur         N/A                        Motion, focus
  Noise        Minimal                    Sensor

22
Are Digital Cameras being used for text?
  • Yes.
  • Active capture of information sources Paris 03
  • Note taking during presentations ICDAR 03
  • Japan: signs prohibit the use of digital cameras
    in bookstores! They are being used as portable
    photocopiers.
  • How about hardcopy documents?
  • Falcon MT system currently testing high
    resolution cameras for input to standard OCR
    systems
  • What are the challenges of imaging traditional
    documents?

23
Resolution and Large Documents
  • Related Work
  • Super-resolution
  • Irani (1991), Patti (1997), Capel (2000), Fekri
    (2000)
  • Mosaicing
  • Taylor (1997, 1999), Mirmehdi (2001),
  • State of the Art
  • Digital cameras > 6 megapixels (can provide
    effective 300 dpi)
  • PDAs > 1.3 megapixels
  • Mobile phones 1 megapixel
  • But better cameras are on the way
  • (4-megapixel phones by 2005)
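
The relation between sensor size and effective resolution is simple arithmetic. A minimal sketch, assuming the page exactly fills the frame with its long side along the sensor's long side, and ignoring lens and demosaicing losses; the 3000 x 2000 pixel (6 megapixel) sensor is a hypothetical example:

```python
def megapixels_needed(page_in, dpi):
    """Pixels (in millions) needed to sample a page_in = (w, h) inch page at dpi."""
    w, h = page_in
    return (w * dpi) * (h * dpi) / 1e6

def effective_dpi(sensor_px, page_in):
    """Sampling resolution when the page exactly fills the frame.

    The long page side is aligned with the long sensor side; the tighter of
    the two dimensions limits the usable resolution.
    """
    s_long, s_short = sorted(sensor_px, reverse=True)
    p_long, p_short = sorted(page_in, reverse=True)
    return min(s_long / p_long, s_short / p_short)

LETTER = (8.5, 11.0)       # US letter page, inches
HALF_LETTER = (5.5, 8.5)   # half-letter page, inches
SENSOR_6MP = (3000, 2000)  # hypothetical 3:2 sensor

print(f"{megapixels_needed(LETTER, 300):.1f} MP for 300 dpi on a letter page")
print(f"{effective_dpi(SENSOR_6MP, LETTER):.0f} dpi on a letter page")
print(f"{effective_dpi(SENSOR_6MP, HALF_LETTER):.0f} dpi on a half-letter page")
```

Under these assumptions a 6 megapixel sensor gives roughly 235 dpi over a full letter page and about 350 dpi over a half-letter page, while 300 dpi over a full letter page needs about 8.4 megapixels.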

24
Blur from Focus/Depth of Field
  • The imaging plane may not be parallel to the
    document, resulting in increased blur
  • Frequency domain strategy
  • Tsai (1984), Tom (1994), Kim (1990, 1993),
  • Iterative solution
  • Stark (1989), Tekalp (1992), Irani (1991),
  • Bayesian methods
  • Schultz (1994, 1995),
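
How much blur a tilted page produces can be estimated with the thin-lens circle-of-confusion formula c = A |S2 - S1| / S2 * f / (S1 - f), where A = f/N is the aperture diameter, S1 the focused distance and S2 the distance of a point on the page. This is a rough sketch under the thin-lens assumption; the focal length, f-number and distances are hypothetical values for a small camera focused on the middle of a tilted page.

```python
def blur_circle_mm(f_mm, f_number, focus_mm, object_mm):
    """Thin-lens blur-circle diameter on the sensor, in millimetres."""
    aperture_mm = f_mm / f_number
    return (aperture_mm * abs(object_mm - focus_mm) / object_mm
            * f_mm / (focus_mm - f_mm))

F_MM, F_NUMBER, FOCUS_MM = 6.0, 2.8, 300.0  # hypothetical camera settings

# A tilted page: its near edge is 250 mm away, its far edge 350 mm away.
for distance in (250.0, 275.0, 300.0, 325.0, 350.0):
    c_um = 1000 * blur_circle_mm(F_MM, F_NUMBER, FOCUS_MM, distance)
    print(f"{distance:5.0f} mm -> blur circle {c_um:.1f} um")
```

Blur is zero only on the focused line and grows toward both edges (faster on the near side), so a page that is not parallel to the imaging plane cannot be sharp everywhere at wide apertures.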

25
Lighting
  • Natural lighting can be uneven
  • Providing lighting can be challenging
  • Lighting correction
  • Global brightness / contrast
  • Uneven brightness
  • Adaptive thresholding
  • Too many to list
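
Adaptive thresholding handles uneven brightness by comparing each pixel with its own neighbourhood instead of a single global value. The sketch below is one simple variant (local mean minus an offset, computed with an integral image); Niblack and Sauvola thresholds are common alternatives. The window size, offset and the synthetic page with a left-to-right lighting gradient are hypothetical.

```python
import numpy as np

def adaptive_threshold(gray, window=15, offset=10.0):
    """Return a boolean ink mask: True where a pixel is darker than the mean
    of its window x window neighbourhood by more than offset.
    window should be odd; local means come from an integral image."""
    img = np.asarray(gray, dtype=np.float64)
    k, r = window, window // 2
    padded = np.pad(img, r, mode="edge")
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    window_sum = (ii[k:k + h, k:k + w] - ii[:h, k:k + w]
                  - ii[k:k + h, :w] + ii[:h, :w])
    return img < window_sum / (k * k) - offset

# Synthetic page: brightness ramps from 100 to 250, strokes are 60 darker.
h, w = 60, 200
page = np.tile(np.linspace(100.0, 250.0, w), (h, 1))
ink = np.zeros((h, w), dtype=bool)
ink[20:23, :] = True
ink[40:43, :] = True
page[ink] -= 60.0

local = adaptive_threshold(page)
global_ = page < page.mean()
print("adaptive errors:", int((local != ink).sum()))
print("global errors:  ", int((global_ != ink).sum()))
```

On this page no single global threshold can work, since ink on the bright side (190) is lighter than background on the dark side (100).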

26
Motion Blur
  • Controlled with adequate lighting and shutter
    speed
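
The shutter-speed requirement follows from simple arithmetic: the blur streak is the apparent speed of the page multiplied by the exposure time, converted to pixels at the capture resolution. A minimal sketch, assuming constant-velocity motion parallel to the page; the 20 mm/s hand-motion speed and the 200 dpi sampling rate are hypothetical:

```python
MM_PER_INCH = 25.4

def motion_blur_px(speed_mm_s, exposure_s, dpi):
    """Length of the blur streak in pixels, measured on the document plane."""
    return speed_mm_s * exposure_s / MM_PER_INCH * dpi

def max_exposure_s(speed_mm_s, dpi, max_blur_px=1.0):
    """Longest exposure that keeps the streak within max_blur_px pixels."""
    return max_blur_px * MM_PER_INCH / (dpi * speed_mm_s)

speed, dpi = 20.0, 200  # hypothetical hand motion and sampling resolution
print(f"1/30 s exposure: {motion_blur_px(speed, 1 / 30, dpi):.1f} px of blur")
print(f"1-pixel limit: 1/{1 / max_exposure_s(speed, dpi):.0f} s")
```

Under these assumptions a 1/30 s exposure smears text by about 5 pixels, while keeping blur under one pixel needs roughly 1/157 s or faster, which in turn needs more light.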

27
Warping and Perspective Distortion
  • Completely arbitrary viewing angles may not be
    realistic, however.
  • Remove perspective distortion of planar document
    pages
  • Clark (2000, 2001),
  • Unwarp curled pages using 3D shape
  • Brown (2001), Pilu (2001),
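
For a flat page, perspective distortion is a plane-to-plane homography, so once the four page corners are known the page can be rectified. The sketch below uses the standard direct linear transform with four point correspondences and nearest-neighbour inverse mapping; it is a generic textbook approach, not the specific methods of the works cited above, and it assumes the corners (in top-left, top-right, bottom-right, bottom-left order) have already been detected. The corner coordinates and the random test image are hypothetical.

```python
import numpy as np

def homography_from_points(src, dst):
    """3x3 homography H (with H[2, 2] = 1) mapping 4 src points onto 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and similarly for v.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map an (N, 2) array of points through H."""
    pts = np.asarray(pts, dtype=float)
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def rectify(image, corners, out_w, out_h):
    """Warp the quadrilateral `corners` (TL, TR, BR, BL) onto an out_w x out_h image."""
    rect = [(0, 0), (out_w - 1, 0), (out_w - 1, out_h - 1), (0, out_h - 1)]
    H_inv = homography_from_points(rect, corners)  # output pixel -> input pixel
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    src = np.rint(apply_homography(H_inv, np.column_stack([xs.ravel(), ys.ravel()])))
    sx = np.clip(src[:, 0].astype(int), 0, image.shape[1] - 1)
    sy = np.clip(src[:, 1].astype(int), 0, image.shape[0] - 1)
    return image[sy, sx].reshape(out_h, out_w)

corners = [(12, 8), (190, 20), (180, 150), (5, 140)]  # hypothetical detected corners
photo = np.random.default_rng(1).integers(0, 256, size=(160, 200), dtype=np.uint8)
page = rectify(photo, corners, out_w=100, out_h=140)
print(page.shape)
```

Inverse mapping (from each output pixel back into the photo) guarantees that every output pixel receives a value; bilinear interpolation would give smoother glyph edges than the nearest-neighbour lookup used here.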

28
  • The imaging plane is not guaranteed to be smooth
  • But can we simply enhance results and use existing
    tools?

29
For controlled imaging.
From scanner 300 dpi
From camera 200 dpi
30
Commercial OCR
  • OCR is almost identical.

From scanner
From camera
31
More Typical Example
32
OCR Result
33
Simple Rectification
Original
Rectified
34
Better OCR Results
Original
Rectified
35
Simple Unwarping
  • Text line straightening
  • Unwarping book-spine type deformation
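
Text line straightening can be sketched crudely by shifting each pixel column vertically so that a traced baseline becomes horizontal. This is a simplification for illustration, assuming the baseline row for every column is already known (for example from text line tracing) and that the deformation is purely vertical; it does not model the cylindrical page geometry, and the synthetic curved line in the example is hypothetical.

```python
import numpy as np

def straighten_columns(image, baseline, target_row, fill=255):
    """Shift each column so that baseline[x] lands on target_row.

    baseline[x] is the row of the traced text line in column x; shifts are
    rounded to whole pixels and uncovered pixels are set to `fill`.
    """
    img = np.asarray(image)
    out = np.full_like(img, fill)
    h = img.shape[0]
    for x, y in enumerate(baseline):
        shift = int(round(target_row - y))
        if abs(shift) >= h:
            continue  # the whole column would leave the image
        if shift >= 0:
            out[shift:, x] = img[:h - shift, x]
        else:
            out[:h + shift, x] = img[-shift:, x]
    return out

# Hypothetical curved line of dark pixels, as near a book spine.
h, w = 40, 50
xs = np.arange(w)
baseline = np.rint(20 + 5 * np.sin(np.pi * xs / (w - 1)))
img = np.full((h, w), 255, dtype=np.uint8)
img[baseline.astype(int), xs] = 0
flat = straighten_columns(img, baseline, target_row=20)
print(np.nonzero(flat == 0)[0])  # row index of every dark pixel after straightening
```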

36
Curved Surface
37
OCR Result
t(,_.t catcgo'r'jZ(tN071 Cap("rinI,ClLt c'. et
,io-r1. "i,,tO 1 fl classes A hdd 10 zr
for n.tc goriz r (l f n(,-l'(Lll fSet '
illi111t of t11-'1epo7't 011, OUT
eaPc?'i,rn.ets ), 'J1r n ij11111/iii'
c(ItPy..i,(Lt-io?L of optically jfl address
the 1111119 l1fs' . the of feCts OCR errors
ifl(Ly )LL v(" Olt 11Et1t qI.d-at197,,Sionahi,
ty red ction. (Lnd c,teynr.izfL_ iTII1i d t, 1
eprt on 'uvaqs that cateyvri.zatiorf, Iff l.
f o recti,orL and rctrie11al cf
fectVCflF,ss. f11f1, help f,foJ _iiiCtion
38
Rough Text Extraction
Original
Edge Areas
Text Areas
Threshold Surface
39
Local Direction Detection
40
Cylindrical Model
41
Text Line Tracing
42
Straightened Text Lines
43
OCR Result
  • 4 successful text categorizataon, capcri'raent
    divides
  • tcrtual collection into pre-defined clu,.sses. A
    true
  • ,lirrgtsentatzve for eachh class is generally
    obtained
  • tiiryttg training of the ca.tcgorzzer.
  • jri this paper, Yale report on oa.r
    caperiro.erats o-n,
  • t,pivtrtg and categorization of optically
    recognized
  • (pr tartaerrts. In partzctalar. ure in'll address
    the is-
  • ,ie,s regarding the of fcct.s OCR, cr-ror.s m ay
    have on
  • 10itaing. dirraensionality reduction, gnd
    categoriza_
  • tip. lire further report on ways that,
    categorization
  • pla,Uhelp error correctlor( and retrieval
    effectiveness.

44
Line Extraction
  • Using Extrapolation with overlapping direction
    estimates
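
One common way to estimate the local text direction inside a window is projection-profile analysis: shear the ink pixels by each candidate angle and keep the angle whose row profile is most sharply peaked. The sketch below assumes a binary ink mask for one window and uses the sum of squared profile counts as the peakedness score; it is a standard skew-estimation technique offered for illustration, not necessarily the direction detector used in this work. Extrapolating a text line would then chain such estimates from overlapping windows. The candidate angle range and the synthetic lines at 4 degrees are hypothetical.

```python
import numpy as np

def dominant_direction(ink_mask, angles_deg=np.arange(-15.0, 15.25, 0.25)):
    """Estimate the text-line angle, in degrees, of the ink pixels in a window.

    Rows grow downward, so a positive angle means lines descend to the right.
    """
    ys, xs = np.nonzero(ink_mask)
    best_angle, best_score = 0.0, -1.0
    for a in angles_deg:
        # Undo a candidate slope, then project the ink onto rows.
        rows = np.rint(ys - xs * np.tan(np.radians(a))).astype(int)
        counts = np.bincount(rows - rows.min()).astype(np.float64)
        score = float(np.sum(counts ** 2))  # peaked profiles score highest
        if score > best_score:
            best_angle, best_score = float(a), score
    return best_angle

# Hypothetical window containing three text lines sloping at 4 degrees.
mask = np.zeros((100, 200), dtype=bool)
cols = np.arange(200)
for y0 in (20, 50, 80):
    mask[np.rint(y0 + cols * np.tan(np.radians(4.0))).astype(int), cols] = True
print(dominant_direction(mask))
```

The angular step (0.25 degrees here) trades accuracy against run time; overlapping windows along a line would each return one such estimate.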

45
Improved OCR Result
  • A successful text categorization experiment
    divides
  • textual collection into pre-defined. classes. A
    true
  • presentative for each class is generally
    obtained
  • ing training of the categorizer.
  • In this paper, we report on our experiments on
  • joining and categorization of optically
    recognized
  • pcuments. In particular, we will address the is-
  • oes regarding the effects OCR errors may have on
  • joining, dimensionality reduction, and
    categoriza-
  • hon. We further- report on ways that
    categorization
  • moy help error correction and retrieval
    effectiveness.

46
Open Questions
  • Are existing tools good enough?
  • Can we simply enhance the images,
  • or do we need to develop new tools
  • or new constraints?
  • Can we make use of degradation knowledge?
  • apply constraints from clear parts of the
    document to recognize similar text blurred by
    perspective?

47
  • Will such devices replace scanners? No
  • Will they open up the market to new applications?
  • They already have
  • Integrated Information Services
  • Map locations
  • Tourist information
  • Promise of DIA on text captured with digital
    cameras (business cards, name tags, pages of
    notes, etc.)
  • Grand Challenges
  • Image Quality
  • Immediate feedback for processability
  • Moving processing to the device
  • Killer Applications

48
MyLifeLog
  • Record and Index anything and everything
  • Text from everywhere you have been
  • Identification of every document you have ever
    looked at (not necessarily read)
  • Recall of everything you have ever written
  • Is it possible?

49
IJDAR Special Issue
  • Special Issue on Camera Based Text and Document
    Recognition
  • Papers Due November 2003
  • http://ijdar.cfar.umd.edu/special_issues/TD-SI.html