Title: Camera Based Document Image Analysis
1Camera Based Document Image Analysis
- David Doermann
- University of Maryland, College Park
2What defines the problem?
- Traditional Document Analysis
- Deals primarily with paper representations
- Acquired with flatbed or sheetfed scanners
- Camera Based Analysis
- Clearly defined by the acquisition device, its
properties, the impacts of its use, etc., but
- The devices open up a wide range of new and
interesting applications (and problems) and
extend what we may consider document analysis.
3Scanner Acquisition
- Advantages
- Reasonable quality
- Controlled lighting, high resolution, fixed
imaging plane
- Rapid Acquisition
- Relatively cheap
- Disadvantages
- Specialized Device
- Fixed: documents must come to the device
- Requires handling of documents, or documents in
sheet form for feeders
4 Book Camera/Scanner Acquisition
- One step removed from traditional scanners
- Controlled environment: lighting, image plane,
orientation
- Changes the nature of the content
- Easier to image atypical documents
- Rare, historic, fragile
- Often very expensive
- Can be relatively slow
- Sheet scanners handle hundreds of pages per minute
- although robotic cameras can image tens of pages
a minute.
5Industrial Cameras
- Removes the constraints on configuration
- Often still in a controlled environment
- Custom (and expensive) solutions are common
- Processing power bounded only by cost
- Usage
- Postal applications, document inspection
(newspapers, etc.), industrial applications
6(Portable) Digital Cameras
- Provide much greater flexibility than scanners
- Multiple uses
- Devices go to documents
- Potentially removes the bottleneck of acquisition
for simple tasks
- Fewer constraints (lighting, image plane,
focus, ...)
- Increases complexity of resulting image and
image processing
- Yet allows a wider variation of applications
- A significant tradeoff
7A broader view of a document?
- Conceptually
- The realization of a visual language for
communicating information, typically meant for
human consumption
- Practically
- Text, graphics, drawings, symbologies, ...
- Properties
- Exists over both space and time
- Relatively easy to produce
- Intended by the author to be read.
8Roadmap
- Discussion of some key related research and
applications of non-scanner DIA
- Influences of mobile devices on applications
- Issues with processing traditional documents vs
processing text
- Future of camera based capture.
- Open issues
- New opportunities
9What has been done?
- Applications primarily centered on Image Text
- Text in Video Graphics
- Text from WWW pages
- Text in Scenes
- Some work on key challenges
- Imaging of text in controlled environments such
as parking lots, meeting rooms, assembly lines,
etc.
- Limited work on actually processing traditional
documents.
10Video Text Recognition
- Indexing content from graphic or scene text in
videos used to supplement speech, closed
captions, ...
- Countless papers published
- Challenges are well known
- Low Resolution, Complex background, Different
font style and size, Lighting, Camera motion,
Text/Object motion, Occlusion/distortion, all
magnified for scene text
- Benefits of multiple frames, repeated content
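The benefit of multiple frames can be illustrated with simple temporal averaging. This is a minimal sketch, not a method taken from the slides: it assumes the frames are already registered (the text region is aligned pixel for pixel) and that the noise is independent from frame to frame. The function name `average_frames` is hypothetical.

```python
import numpy as np

def average_frames(frames):
    """Average a sequence of aligned grayscale frames.

    Assumes every frame shows the same text region at the same position;
    registration of moving text is outside the scope of this sketch.
    """
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    # Independent noise shrinks roughly with the square root of the frame count.
    return stack.mean(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.zeros((20, 60))
    clean[8:12, 10:50] = 200.0          # a synthetic bright text bar
    noisy = [clean + rng.normal(0.0, 25.0, clean.shape) for _ in range(16)]
    merged = average_frames(noisy)
    print("single-frame error:", np.abs(noisy[0] - clean).mean())
    print("averaged error:    ", np.abs(merged - clean).mean())
```

Averaging only helps when the text is static relative to the frame; moving or scrolling text would first need to be tracked and aligned.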
11WWW Text Image Analysis
- Applications
- Identifying graphic text for indexing and
retrieval
- Identification of SPAM email in attachments
- Uncovering hidden information.
- Issues
- Text style variations (font, font style,
orientation).
- Text Quality (Color, Size, Anti-aliasing)
- Image resolution
12Visual Input
- Applications
- General input for computer systems
- Passive verification of signatures from cameras
mounted over the writing surface
- Has general implications for mobile devices that
don't have traditional keyboard input
- Challenges
- Pen tip tracking
- Identifying the temporal relations
- Online recognition
13Whiteboard Reading
- Reading handwritten and printed material for
meeting scenarios
- Challenges
- Must deal with unconstrained handwriting
- Distinguish text from graphics and sketches
- Parse and Interpret graphics (electronic ink)
- Content is produced dynamically and can appear
and disappear
14Meeting and Lecture Processing
- Meetings
- Reading name plates and tags
- Identifying and linking references to documents
- Processing whiteboards
- Lectures
- Reading text on projected presentations
- Detection, normalization and matching of text
with source content (PowerPoint)
- Challenges
- Variable content
- Animations
15License Plate Reading
- Applications
- Parking lot tracking
- Red-light and Speed Camera
- Vehicle Surveillance
- Challenges
- Moving Vehicles
- Complex plates
- Night and all weather imaging
- Limited use of context
16Road Sign Recognition
- Applications
- Driver Assisted Systems, Automated Mapping, Sign
guideline enforcement (location, quality)
- Challenges
- Low resolution, motion blur
- Real-time systems
- Detecting signs under a variety of conditions
17Sign Recognition and Translation
- Application: integrated identification,
recognition and translation of text found on
foreign signs, maps, menus, transportation
schedules, etc.
- Extremely useful for other character sets
- Primarily PDA or Mobile Phone Based Hardware
- Networked or standalone solutions are being
marketed
- Ultimately software solutions are desirable.
18Systems for the Visually Impaired
- Allows legally blind consumers access to a
variety of information sources
- Transportation, shopping, ...
- System builds an end-to-end application of
detection, enhancement, recognition and speech
transcription
A. Zandifar, A. Chahine, R. Duraiswami and L.S.
Davis, "A Video-based interface to textual
information for the visually impaired", IEEE
Computer Society ICMI 2002, pp. 325-330.
19Commonalities
- Most of these systems can be/have been engineered,
and with the right constraints more are
technically feasible but perhaps not cost
effective.
- But what is the catalyst that will promote more
general applications?
- Mobile devices and wireless networking are
providing a platform which no longer requires
special hardware
20Mobile Devices
- Examples: PDAs, Digital Cameras, Cellular Phones
- Devices are becoming common and pervasive
- They are becoming increasingly powerful
(processor, memory, power, resolution)
- 3G networks promise multimedia support
- They are easy to use
- Devices go to the documents
- Rapid and Flexible Acquisition
- Acquisition becomes just another application of
the device
21How do they compare? (subjectively)
Property      Scanner                    Camera
Resolution    Adequate(?), 150-600dpi    Improving
Distortion    Minimal                    Lens/Perspective
Lighting      Controlled                 Sensor and Environment
Background    Domain Dependent           Often Complex
Zoom/Focus    N/A                        Variable
Blur          N/A                        Motion, focus
Noise         Minimal                    Sensor
22Are Digital Cameras being used for text?
- Yes.
- Active capture of information sources (Paris 03)
- Note taking during presentations (ICDAR 03)
- Japan: signs prohibit the use of digital cameras
in bookstores! They are being used as portable
photocopiers.
- How about hardcopy documents?
- Falcon MT system is currently testing high
resolution cameras for input to standard OCR
systems
- What are the challenges of imaging traditional
documents?
23Resolution and Large Documents
- Related Work
- Super-resolution
- Irani (1991), Patti (1997), Capel (2000), Fekri
(2000)
- Mosaicing
- Taylor (1997, 1999), Mirmehdi (2001), ...
- State of the Art
- Digital Cameras > 6 megapixels (can provide
effective 300dpi)
- PDAs > 1.3 megapixels
- Mobile Phones 1 megapixel
- But better cameras are on the way.
- (4-megapixel phones by 2005)
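The link between megapixels and effective dpi is simple arithmetic, sketched below. The sensor dimensions (3000 x 2000 pixels for a nominal 6 megapixel camera) and the page sizes are assumptions chosen for illustration, not figures from the slides, and the function names are hypothetical.

```python
def effective_dpi(width_px, height_px, page_w_in, page_h_in):
    """Dots per inch when the page just fills the frame.

    The page is tried in both orientations; within each orientation the
    tighter axis limits the resolution.
    """
    def fit(w_px, h_px):
        return min(w_px / page_w_in, h_px / page_h_in)
    return max(fit(width_px, height_px), fit(height_px, width_px))

def pixels_needed(page_w_in, page_h_in, dpi):
    """Pixel count required to cover a page at the given dpi."""
    return round(page_w_in * dpi) * round(page_h_in * dpi)

if __name__ == "__main__":
    # Assumed 3:2 sensor of 3000 x 2000 pixels (about 6 megapixels).
    print("US Letter page:", effective_dpi(3000, 2000, 8.5, 11.0), "dpi")
    print("6 x 9 inch page:", effective_dpi(3000, 2000, 6.0, 9.0), "dpi")
    print("pixels for Letter at 300 dpi:", pixels_needed(8.5, 11.0, 300))
```

Under these assumptions the effective resolution depends strongly on how much of the frame the page occupies, which is one motivation for the mosaicing and super-resolution work listed above.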
24Blur from Focus/Depth of Field
- The imaging plane may not be parallel to the
document, resulting in increased blur
- Frequency domain strategy
- Tsai (1984), Tom (1994), Kim (1990, 1993), ...
- Iterative solution
- Stark (1989), Tekalp (1992), Irani (1991), ...
- Bayesian methods
- Schultz (1994, 1995), ...
25Lighting
- Natural lighting can be uneven
- Providing lighting can be challenging
- Lighting correction
- Global brightness / contrast
- Uneven brightness
- Adaptive thresholding
- Too many to list
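As one concrete instance of adaptive thresholding, the sketch below compares each pixel with the mean of its neighbourhood, computed with an integral image. It is a generic local-mean scheme, not a specific method from the slides; the `window` and `offset` values are assumptions that would need tuning per device.

```python
import numpy as np

def adaptive_threshold(gray, window=15, offset=10.0):
    """Binarize against a local-mean threshold surface.

    gray   : 2-D array of intensities (dark text on a lighter page).
    window : odd side length of the square neighbourhood (assumed value).
    offset : how far below the local mean a pixel must be to count as ink.
    Returns a boolean array, True where the pixel is classified as ink.
    """
    gray = np.asarray(gray, dtype=np.float64)
    pad = window // 2
    padded = np.pad(gray, pad, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = gray.shape
    # Sum over each window from four corner lookups.
    window_sum = (integral[window:window + h, window:window + w]
                  - integral[:h, window:window + w]
                  - integral[window:window + h, :w]
                  + integral[:h, :w])
    local_mean = window_sum / (window * window)
    return gray < local_mean - offset

if __name__ == "__main__":
    # Synthetic page: brightness ramps left to right, with a few dark strokes.
    page = np.tile(np.linspace(100, 220, 80), (40, 1))
    page[10:12, 5:75:6] -= 60
    print(adaptive_threshold(page, window=15, offset=20).sum(), "ink pixels")
```

Because the threshold follows the local brightness, strokes on the dark side and the bright side of an unevenly lit page are treated alike, which a single global threshold cannot do.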
26Motion Blur
- Controlled with adequate lighting and shutter
speed
27Warping and Perspective Distortion
- Completely arbitrary viewing angles may not be
realistic, however.
- Remove perspective distortion of planar document
pages
- Clark (2000, 2001), ...
- Unwarp curled pages using 3D shape
- Brown (2001), Pilu (2001), ...
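For a planar page, perspective distortion can be modelled by a 3x3 homography. The sketch below estimates one from four corner correspondences with the direct linear transform (DLT). It is a textbook construction, not the method of the cited papers; the corner coordinates are made up for illustration, and detecting the page corners and resampling the image are left out.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate H (3x3) such that dst ~ H @ src, from 4 or more point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    H = vt[-1].reshape(3, 3)
    # Normalizing by H[2, 2] assumes it is non-zero (true for ordinary views).
    return H / H[2, 2]

def apply_homography(H, points):
    """Map an (N, 2) array of points through H."""
    pts = np.asarray(points, dtype=np.float64)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

if __name__ == "__main__":
    # Hypothetical page corners as seen by the camera, and the target rectangle.
    quad = [(12.0, 20.0), (310.0, 42.0), (295.0, 410.0), (30.0, 380.0)]
    rect = [(0.0, 0.0), (300.0, 0.0), (300.0, 400.0), (0.0, 400.0)]
    H = homography_from_points(quad, rect)
    print(apply_homography(H, quad).round(3))
```

In practice the point coordinates are usually normalized before the SVD to improve conditioning, and the rectified image is produced by mapping each output pixel back through the inverse of H.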
28- imaging plane is not guaranteed to be smooth
- But can we simply enhance results and use
existing tools?
29For controlled imaging.
From scanner 300 dpi
From camera 200 dpi
30Commercial OCR
From scanner
From camera
31More Typical Example
32OCR Result
33Simple Rectification
Original
Rectified
34Better OCR Results
Original
Rectified
35Simple Unwarping
- Text line straightening
- Unwarping book-spine type deformation
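Text line straightening can be sketched as fitting a smooth curve to a line's baseline and shifting each pixel column so the curve becomes horizontal. The code below is a deliberately simplified illustration: it assumes the baseline samples are already available, uses a polynomial fit, and moves columns by whole pixels. It is not the cylindrical-model method shown in the later slides, and `straighten_line` is a hypothetical name.

```python
import numpy as np

def straighten_line(strip, xs, ys, degree=2, fill=255):
    """Shift each column of `strip` so a curved baseline becomes horizontal.

    strip  : 2-D grayscale array containing one text line.
    xs, ys : sampled baseline positions (column, row) along the line.
    degree : polynomial degree for the baseline fit (assumed value).
    fill   : value used where a shifted column leaves a gap.
    """
    strip = np.asarray(strip)
    coeffs = np.polyfit(xs, ys, degree)
    cols = np.arange(strip.shape[1])
    baseline = np.polyval(coeffs, cols)
    target = int(round(float(baseline.mean())))
    out = np.full_like(strip, fill)
    rows = np.arange(strip.shape[0])
    for c in cols:
        shift = target - int(round(float(baseline[c])))
        moved = rows + shift
        keep = (moved >= 0) & (moved < strip.shape[0])
        out[moved[keep], c] = strip[rows[keep], c]
    return out

if __name__ == "__main__":
    cols = np.arange(100)
    curve = 20 + 0.002 * (cols - 50) ** 2          # synthetic curved baseline
    strip = np.full((40, 100), 255, dtype=np.uint8)
    strip[np.round(curve).astype(int), cols] = 0
    flat = straighten_line(strip, cols[::10], curve[::10])
    print("rows containing ink after straightening:",
          sorted(set(np.nonzero(flat == 0)[0].tolist())))
```

Vertical shifts alone do not correct the horizontal foreshortening near a book spine, so this covers only the straightening half of the problem.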
36Curved Surface
37OCR Result
t(,_.t catcgo'r'jZ(tN071 Cap("rinI,ClLt c'. et
,io-r1. "i,,tO 1 fl classes A hdd 10 zr
for n.tc goriz r (l f n(,-l'(Lll fSet '
illi111t of t11-'1epo7't 011, OUT
eaPc?'i,rn.ets ), 'J1r n ij11111/iii'
c(ItPy..i,(Lt-io?L of optically jfl address
the 1111119 l1fs' . the of feCts OCR errors
ifl(Ly )LL v(" Olt 11Et1t qI.d-at197,,Sionahi,
ty red ction. (Lnd c,teynr.izfL_ iTII1i d t, 1
eprt on 'uvaqs that cateyvri.zatiorf, Iff l.
f o recti,orL and rctrie11al cf
fectVCflF,ss. f11f1, help f,foJ _iiiCtion
38Rough Text Extraction
Original
Edge Areas
Text Areas
Threshold Surface
39Local Direction Detection
40Cylindrical Model
41Text Line Tracing
42Straightened Text Lines
43OCR Result
- 4 successful text categorizataon, capcri'raent divides
- tcrtual collection into pre-defined clu,.sses. A true
- ,lirrgtsentatzve for eachh class is generally obtained
- tiiryttg training of the ca.tcgorzzer.
- jri this paper, Yale report on oa.r caperiro.erats o-n,
- t,pivtrtg and categorization of optically recognized
- (pr tartaerrts. In partzctalar. ure in'll address the is-
- ,ie,s regarding the of fcct.s OCR, cr-ror.s m ay have on
- 10itaing. dirraensionality reduction, gnd categoriza_
- tip. lire further report on ways that, categorization
- pla,Uhelp error correctlor( and retrieval effectiveness.
44Line Extraction
- Using Extrapolation with overlapping direction
estimates
45Improved OCR Result
- A successful text categorization experiment divides
- textual collection into pre-defined. classes. A true
- presentative for each class is generally obtained
- ing training of the categorizer.
- In this paper, we report on our experiments on
- joining and categorization of optically recognized
- pcuments. In particular, we will address the is-
- oes regarding the effects OCR errors may have on
- joining, dimensionality reduction, and categoriza-
- hon. We further- report on ways that categorization
- moy help error correction and retrieval effectiveness.
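One way to put a number on the difference between two OCR results is a character error rate based on edit distance. The slides do not say how the results were scored, so this is only an illustrative sketch; the reference string below is an assumed ground truth for a single line, and the two OCR strings are copied from the results above.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def char_error_rate(ocr_text, reference):
    """Edit distance normalized by the reference length."""
    return edit_distance(ocr_text, reference) / max(len(reference), 1)

if __name__ == "__main__":
    # Assumed ground truth for one line; not given in the slides.
    reference = "textual collection into pre-defined classes."
    before = "tcrtual collection into pre-defined clu,.sses."
    after = "textual collection into pre-defined. classes."
    print("before straightening:", char_error_rate(before, reference))
    print("after straightening: ", char_error_rate(after, reference))
```

A word-level variant of the same measure is often more relevant when the OCR output feeds retrieval or categorization.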
46Open Questions
- Are existing tools good enough?
- Can we simply enhance the images,
- or do we need to develop new tools,
- or new constraints?
- Can we make use of degradation knowledge?
- apply constraints from clear parts of the
document to recognize similar text blurred by
perspective?
47- Will such devices replace scanners? No
- Will they open up the market to new applications?
- They already have
- Integrated Information Services
- Map locations
- Tourist information
- Promise of DIA on text captured with digital
camera (business cards, nametags, pages of notes,
...)
- Grand Challenges
- Image Quality
- Immediate feedback for processability
- Moving processing to the device
- Killer Applications
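Immediate feedback for processability could start with a cheap sharpness score computed on the device before the image is sent to OCR. The sketch below uses the variance of a discrete Laplacian, a common focus measure; it is an illustration rather than anything proposed in the slides, and the `min_sharpness` threshold is an assumed value that would need calibration per camera.

```python
import numpy as np

def sharpness(gray):
    """Variance of a 4-neighbour Laplacian; higher values mean crisper edges."""
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def looks_processable(gray, min_sharpness=100.0):
    """Rough accept/reject decision; the threshold is an assumption."""
    return sharpness(gray) >= min_sharpness

if __name__ == "__main__":
    # Synthetic stroke pattern, then a softened copy to mimic defocus.
    stripes = np.tile(np.repeat([0.0, 255.0], 4), (32, 4))
    blurred = (stripes + np.roll(stripes, 1, axis=1)
               + np.roll(stripes, -1, axis=1)) / 3.0
    print("sharp:  ", sharpness(stripes))
    print("blurred:", sharpness(blurred))
```

A single score like this cannot separate blur from low-contrast content, so a practical check would likely combine it with resolution and lighting tests.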
48MyLifeLog
- Record and Index anything and everything
- Text from everywhere you have been
- Identification of every document you have ever
looked at (not necessarily read)
- Recall of everything you have ever written
- Is it possible?
49IJDAR Special Issue
- Special Issue on Camera Based Text and Document
Recognition
- Papers Due November 2003
- http://ijdar.cfar.umd.edu/special_issues/TD-SI.html