Document Image Analysis for Digital Libraries

Provided by: manidi
Transcript and Presenter's Notes


1
Document Image Analysis for Digital Libraries
  • Prateek Sarkar
  • Perceptual Document Analysis
  • Palo Alto Research Center
  • California, USA

2
Palo Alto Research Center (PARC)
  • 1970s
  • First laser printer (Xerox)
  • First object-oriented programming language,
    Smalltalk (ParcPlace)
  • Personal distributed computing/client server
    architecture
  • Ethernet, graphical user interface, and pop-up
    menus (Xerox, Apple)
  • Page Description Languages (Adobe), Bravo text
    editor (MSWord)
  • DFB solid state diode laser (SDL)
  • 1980s
  • Optical LAN (Synoptics), document management
    software (Documentum)
  • Linguistic technology in spell checkers
    (Microlytics)
  • Collective programming/multi-user domains
    invented
  • Liveboard (LiveWorks), information visualization
    (InXight)
  • Multibeam lasers (Xerox), a-Si TFTs in displays,
    medical imagers (dpiX)
  • 1990s
  • Encryption (Semaphore)
  • Ethnographic studies of field service (Xerox)
  • Constraint-based scheduler (Xerox)

3
PARC Research Labs
  • Computing Sciences
  • Ad Hoc and Sensor Networks, Security and Privacy,
    Interoperability, Ubiquitous Computing
    Applications, User-centered Design, Workscapes
    and Organizations
  • Intelligent Systems
  • Embedded Reasoning, Human Information
    Interaction, Natural Language Processing,
    Perceptual Document Analysis, User Interfaces,
    Visualization
  • Electronic Materials and Devices
  • Class III-V Materials, Large Area Electronics,
    MEMS, Optoelectronics, Optical Detectors
  • Hardware Systems
  • Biomedical Systems, Biodefense, Particle
    Manipulation, Piezo Materials and Devices,
    Printing Systems, Renewable Energy

4
Perceptual document analysis
  • Develop probabilistic modeling frameworks for
    perceptual analysis of documents

5
(No Transcript)
6
21 300 92 9.00-03
SALE INVOICE 1NNoICE NO
INCE DATE DUE DATE
FAX TO RILL TO REMIT
TO ACCOUNT NO
PO
NUMBER LOCATION OF UNITS SAME AS ABOVE
UNIT NO. SERIAL NO. 26.361.00 UNIT
NO. SERIAL NO 35.351.00 DOWN
PAYMENT BUILDING DELIVERY BUILDING DELIVERY BLOCK
AND LEVEL BLOCKAND LEVEL ANCHORMIEDOWN DECKING
ELECTRICAL PLUMBING INSTALLATION SITE MANAGEMENT
SKIRTING VINYL 0.00 0.00 400.00 0.00 2,100.00
780,00 960,00 1.350 00 3.925 00 1,100.00 1,300.00
TOTAL DUE THIS INVOICE 63,767.00
7
(No Transcript)
8
Document imaging services
  • Xerox document image scanning centres
  • Americas, Europe, Asia
  • Millions of pages scanned every day
  • These images need
  • Indexing
  • Storage
  • Classification and sorting
  • Functional Role Labeling
  • and much more

9
Document image libraries: subtasks
  • Document layout understanding
  • Character recognition
  • Functional role labeling
  • Image cleanup/enhancement
  • Indexing
  • Organizing
  • Restructuring
  • Summarizing
  • Cross-linking
  • Redaction
  • Privacy management
  • Distribution
  • Interaction: searching, browsing, learning,
    annotation, authoring, publishing

10
PARC research on Digital libraries
  • Digital repositories by intent
  • Information dissemination
  • Record keeping
  • Institutional document repositories/archives
  • Personal document collections
  • Collaborative collections

11
PARC research on Digital Libraries
  • Digital Libraries by content
  • Textual
  • Structured/semi-structured (databases, email)
  • Images with symbolic content
  • Images with natural content
  • Music
  • Video

12
Robust OCR Using Fisher Kernels
DICE Document Image Classification Engine
Provably optimal Functional Role Labeling
Contour labeling Using MRFs
SenseMaking
Ubitext
UpLib Personal DL
Beyond text and keywords
13
Provably optimal algorithm for Functional Role
Labeling
14
(No Transcript)
15
21 300 92 9.00-03
SALE INVOICE 1NNoICE NO
INCE DATE DUE DATE
FAX TO RILL TO REMIT
TO ACCOUNT NO
PO
NUMBER LOCATION OF UNITS SAME AS ABOVE
UNIT NO. SERIAL NO. 26.361.00 UNIT
NO. SERIAL NO 35.351.00 DOWN
PAYMENT BUILDING DELIVERY BUILDING DELIVERY BLOCK
AND LEVEL BLOCKAND LEVEL ANCHORMIEDOWN DECKING
ELECTRICAL PLUMBING INSTALLATION SITE MANAGEMENT
SKIRTING VINYL 0.00 0.00 400.00 0.00 2,100.00
780,00 960,00 1.350 00 3.925 00 1,100.00 1,300.00
TOTAL DUE THIS INVOICE 63,767.00
16
(No Transcript)
17
Functional Role Labeling
18
Functional Role Labeling
  • Identify salient groups in images and label them
  • Morphological cues
  • Proximity, color similarity
  • Perceptual cues
  • Alignments, curvilinear continuity, closure
  • Semantic cues
  • Semantic validation of a grouping hypothesis

19
Functional Role Labeling
20
Functional Role Labeling - Why it is a
difficult problem
21
Table parsing
22
Table parsing
Vertical Alignment
23
Table parsing
Symmetry
24
Table parsing
25
Table parsing
Horizontal Alignment (revisited)
26
Image Parsing
  • Parsing images is unlike parsing text
  • No natural ordering in 2D
  • Segmentation or grouping is intractable
  • Polynomial complexity with restrictions
  • Factored HMMs, XY-grammars

27
Image parsing
Simplify: let perceptual grouping principles
generate grouping hypotheses. Focus on the
labeling problem.
28
The labeling problem
Labels
Groups/Regions/Segments
29
(No Transcript)
30
The labeling problem - individual labeling
  • Individual labeling (simple case)
  • Computational complexity: $C \times N$
  • Number of objects to label N
  • Number of labels C

31
The labeling problem - joint labeling
32
Classification with complex joints
  • The joint distribution is a complex mess with all
    kinds of dependence factors.
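  • The general problem to solve:
  • $(\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N) = \arg\max_{y_1, \ldots, y_N} P(y_1, \ldots, y_N, x_1, \ldots, x_N)$
  • Have to search over $C^N$ interpretations
  • Compare to the simple case:
  • $(\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N) = \arg\max_{y_1, \ldots, y_N} P(y_1, x_1)\, P(y_2, x_2) \cdots P(y_N, x_N)$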
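
To make the gap between the two cases concrete, here is a minimal Python sketch. The score table and the toy joint term ("label A may be used at most once") are invented for illustration and are not from the talk. Individual labeling needs only N x C score lookups, while the naive joint search enumerates all C^N interpretations.

```python
from itertools import product

LABELS = ["A", "B", "C"]

# Hypothetical per-object scores P(y_i, x_i): one dict per object.
local = [
    {"A": 0.6, "B": 0.3, "C": 0.1},
    {"A": 0.5, "B": 0.4, "C": 0.1},
    {"A": 0.2, "B": 0.2, "C": 0.6},
]

def label_individually(local):
    """Pick the best label for each object on its own: N * C lookups."""
    return tuple(max(scores, key=scores.get) for scores in local)

def joint_score(labels, local):
    """Toy joint model: product of local scores, zero if 'A' is used twice."""
    if labels.count("A") > 1:
        return 0.0
    score = 1.0
    for y, scores in zip(labels, local):
        score *= scores[y]
    return score

def label_jointly(local):
    """Exhaustive search over all C**N interpretations."""
    return max(product(LABELS, repeat=len(local)),
               key=lambda labels: joint_score(labels, local))

individual = label_individually(local)   # ignores the joint constraint
joint = label_jointly(local)             # respects it, at exponential cost
n_interpretations = len(LABELS) ** len(local)
```

In this toy case the individually best labels violate the constraint, so only the joint search returns a valid interpretation.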

33
A search: factorizable bounds
  • Identify a factorizable upper bound
  • The general problem to solve
  • $f(y_1, x_1)\, f(y_2, x_2) \cdots f(y_N, x_N) > P(y_1, \ldots, y_N, x_1, \ldots, x_N)$
  • Sort interpretations by upper bound (easy!)
  • Evaluate an interpretation only if its upper
    bound is higher than best interpretation so far.
  • Excellent average case performance.
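
The search described above can be sketched in Python roughly as follows. This is an illustrative reconstruction, not the speaker's implementation: the function names, the lazy heap-based enumeration, and the toy factors and score are assumptions. It requires non-negative factors whose product is never smaller than the true joint score.

```python
import heapq
from math import prod

def best_labeling(factors, true_score):
    """Best-first search over labelings, ordered by a factorized upper bound.

    factors[i] maps label -> f(i, label) >= 0, and the product of the chosen
    factors must never be smaller than true_score(labels).
    """
    # Rank each object's labels by decreasing factor value.
    ranked = [sorted(f.items(), key=lambda kv: -kv[1]) for f in factors]

    def bound(ranks):
        return prod(ranked[i][r][1] for i, r in enumerate(ranks))

    start = (0,) * len(ranked)
    heap = [(-bound(start), start)]      # max-heap via negated bounds
    seen = {start}
    best, best_score, evaluated = None, float("-inf"), 0

    while heap:
        neg_bound, ranks = heapq.heappop(heap)
        if -neg_bound <= best_score:
            break                        # nothing left can beat the best so far
        labels = tuple(ranked[i][r][0] for i, r in enumerate(ranks))
        score = true_score(labels)       # the expensive joint evaluation
        evaluated += 1
        if score > best_score:
            best, best_score = labels, score
        # Successors: move one object to its next-best label (bound can only drop).
        for i, r in enumerate(ranks):
            if r + 1 < len(ranked[i]):
                nxt = ranks[:i] + (r + 1,) + ranks[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-bound(nxt), nxt))
    return best, best_score, evaluated

# Hypothetical factors and a toy joint score with a mutual exclusion constraint.
factors = [
    {"A": 0.6, "B": 0.3, "C": 0.1},
    {"A": 0.5, "B": 0.4, "C": 0.1},
    {"A": 0.2, "B": 0.2, "C": 0.6},
]

def toy_score(labels):
    if labels.count("A") > 1:            # at most one object may be labeled A
        return 0.0
    return prod(factors[i][y] for i, y in enumerate(labels))

best, best_score, evaluated = best_labeling(factors, toy_score)
```

Because interpretations are visited in order of decreasing bound, the search can stop as soon as the next bound fails to exceed the best score found, and the result is still the exact optimum.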

34
Sorting by the upper bound
$F(y_1, y_2, \ldots, y_N) = f(1, c_1) \cdot f(2, c_2) \cdots f(N, c_N)$
  • (C,B) (B,B) (C,A)
  • (B,B) (C,A) (A,B) (B,A)
  • (C,A) (A,B) (B,A) (C,C)

35
Drawing titles are close to drawing numbers
Drawing numbers are visually salient with respect to
their surroundings
There is at most one drawing number on a sheet.
36
Variable format engineering drawings: examples
37
Variable format engineering drawings: example
38
Variable format engineering drawings: segmentation
39
Variable format engineering drawings: feature
measurements
Mutual exclusion constraint
40
Variable format engineering drawings: results
  • C = five labels of interest
  • DN, DT, DNI, DTI, Other
  • N = 50 to 80 text regions to be labeled
  • Number of possible labelings: $50^4$
  • 1000 images tested
  • Drawing number correctly identified in about 80%
    of cases
  • Errors mostly: OCR > segmentation > labeling

41
Variable format engineering drawings: results
  • Typically only 200-1000 of $50^4$ label hypotheses
    are explored
  • No heuristic pruning of search space
  • Best labeling guaranteed
  • Application benefits
  • Faster search
  • Research benefits
  • Enables complex model design
  • If in trouble, fix the models, not the search

42
Document Image Classification Engine (DICE)
43
Visual classification of document images
44
Document Image Classification
  • Goal: sort and classify diverse scanned and
    electronic documents into generic categories and
    specific known templates.

[Figure: Example processing architecture. Labels: paper documents, electronic documents, OCR, layout analysis, text-based features, layout-based features, classifier engine, document category database (Document Classification Module); output categories: letters, statements, forms; downstream steps: free text analysis and redaction, acct. anonymization, personal information filtering.]
Slide courtesy: Eric Saund, PARC
45
Existing methods
  • Template matching
  • Universal
  • Explicit search over space of variations
  • High level features
  • Layout-based features
  • Dubiel, Dengel. FormClas. (DAS, 1998)
  • Hu, Kashi, Wilfong. (ICDAR, 1999)
  • Shin, Doermann, Rosenfeld (IJDAR, 2001)
  • OCR followed by text categorization
  • Feature extraction is a bottleneck
  • Key-feature based methods
  • Rule based, design by trial and error
  • Special mention
  • Diligenti et al., Hidden Tree Markov Models. PAMI
    2003

46
Expected within class variability
47
Expected within class variability
48
Our approach: key ideas
  • Compute low-level visual-features on document
    image
  • e.g., locations of large intensity variations
  • A document image produces a scatter-plot of these
    features
  • A document-image category is represented by a
    probability distribution that would generate such
    a scatter plot

49
Choice of features
  • Haar filter features
  • 6 FilterTypes
  • 100 different window sizes
  • Each filter is applied to an image
  • at multiple scales
  • at all locations of an image.
  • If the filter response exceeds preset thresholds, a
    5-dimensional feature fires:
  • (FeatureType, log(filterWidth),
    log(filterHeight), x, y)

[Figure: feature vector (f, w, h, x, y)]
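
As a rough sketch of how such features might be computed, the Python below applies a single two-rectangle Haar filter (left half minus right half) at every location for a list of window sizes and emits a 5-dimensional tuple wherever the response is large. The slides do not specify the six filter types, the window sizes, or the thresholds, so the filter shape, the area-relative threshold, and all function names here are assumptions.

```python
import math

def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat

def box_sum(sat, x, y, w, h):
    """Sum of pixels in the w-by-h box whose top-left corner is (x, y)."""
    return sat[y + h][x + w] - sat[y][x + w] - sat[y + h][x] + sat[y][x]

def haar_vertical_edge(sat, x, y, w, h):
    """Left half minus right half of a w-by-h window (w should be even)."""
    half = w // 2
    return box_sum(sat, x, y, half, h) - box_sum(sat, x + half, y, half, h)

def extract_features(img, sizes, threshold):
    """Return (filter_type, log w, log h, x, y) wherever the filter fires."""
    sat = integral_image(img)
    height, width = len(img), len(img[0])
    feats = []
    for (w, h) in sizes:
        for y in range(height - h + 1):
            for x in range(width - w + 1):
                resp = haar_vertical_edge(sat, x, y, w, h)
                # Assumed firing rule: response large relative to window area.
                if abs(resp) > threshold * w * h:
                    feats.append((0, math.log(w), math.log(h), x, y))
    return feats

# Toy 4x4 image: dark left half (1), light right half (0).
image = [[1, 1, 0, 0] for _ in range(4)]
features = extract_features(image, sizes=[(2, 2)], threshold=0.4)
```

On the toy image the filter fires only for windows that straddle the vertical edge, which gives the scatter of (f, w, h, x, y) points that the later slides model.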
50
A document image as a scatter-plot
51
Choice of probability model
  • Latent Conditional Independence
  • $p(x_1, x_2, \ldots, x_n) = \sum_k p(k)\, p(x_1 \mid k)\, p(x_2 \mid k) \cdots p(x_n \mid k)$
  • (generative model)
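
A minimal sketch of evaluating this likelihood, assuming every variable is discrete and stored as a lookup table (the talk's features also include continuous coordinates, whose distributions are not given). The function name and the toy numbers are illustrative only. The sum over k is done with log-sum-exp to avoid underflow when n is large.

```python
import math

def lci_log_likelihood(observation, prior, conditionals):
    """log p(x_1..x_n) = log sum_k p(k) prod_i p(x_i | k), discrete x_i.

    prior[k] is p(k); conditionals[k][i][v] is p(x_i = v | k).
    """
    log_terms = []
    for k, p_k in enumerate(prior):
        log_term = math.log(p_k)
        for i, value in enumerate(observation):
            log_term += math.log(conditionals[k][i][value])
        log_terms.append(log_term)
    # log-sum-exp over the latent component k
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

# Hypothetical two-component model over two binary variables.
prior = [0.5, 0.5]
conditionals = [
    [{0: 0.9, 1: 0.1}, {0: 0.8, 1: 0.2}],   # component k = 0
    [{0: 0.2, 1: 0.8}, {0: 0.3, 1: 0.7}],   # component k = 1
]
p00 = math.exp(lci_log_likelihood((0, 0), prior, conditionals))
total = sum(math.exp(lci_log_likelihood((a, b), prior, conditionals))
            for a in (0, 1) for b in (0, 1))
```

The variables are independent only once k is known; after summing k out, they are generally dependent, which is what lets the model capture structure.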

52
Latent Conditional Independence
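$p(x, y) = \sum_k p(k)\, p(x \mid k)\, p(y \mid k)$
[Figure: graphical model with latent variable k (prior p(k)) generating x and y through p(x|k) and p(y|k), inside a plate of size N_m]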
53
Latent Conditional Independence (LCI)
54
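LCI model: one per image category
[Figure: graphical model. Category c and latent component k with p(k|c); feature coordinates f, w, h, x, y with p(f|k,c), p(w|k,c), p(h|k,c), p(x|k,c), p(y|k,c); plates n = 1..N_m and m = 1..M]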
55
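Per category LCI model: training
Use EM algorithm for parameter estimation
[Figure: graphical model. Category c and latent component k with p(k|c); feature coordinates f, w, h, x, y with p(f|k,c), p(w|k,c), p(h|k,c), p(x|k,c), p(y|k,c); plates n = 1..N_m and m = 1..M]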
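
For one category, EM training of such a model might look like the following Python sketch. It assumes every feature dimension has been discretized into a small number of bins, which is a simplification made here for brevity; the function name, initialization, and smoothing constant are also assumptions rather than details from the talk.

```python
import random

def em_lci(data, n_values, K, iters=15, seed=0):
    """Fit p(k) and p(x_d | k) for discrete D-dimensional points by EM.

    data: list of tuples of ints; n_values[d]: number of bins in dimension d.
    """
    rng = random.Random(seed)
    D = len(n_values)
    prior = [1.0 / K] * K
    # Random positive initial tables, normalized per (k, d).
    cond = []
    for k in range(K):
        tables = []
        for d in range(D):
            raw = [rng.random() + 0.5 for _ in range(n_values[d])]
            s = sum(raw)
            tables.append([r / s for r in raw])
        cond.append(tables)

    for _ in range(iters):
        # E-step: responsibility of component k for each point.
        resp = []
        for x in data:
            joint = []
            for k in range(K):
                p = prior[k]
                for d in range(D):
                    p *= cond[k][d][x[d]]
                joint.append(p)
            z = sum(joint)
            resp.append([j / z for j in joint])
        # M-step: re-estimate tables from expected counts (lightly smoothed).
        for k in range(K):
            nk = sum(r[k] for r in resp)
            prior[k] = nk / len(data)
            for d in range(D):
                counts = [1e-3] * n_values[d]
                for x, r in zip(data, resp):
                    counts[x[d]] += r[k]
                total = sum(counts)
                cond[k][d] = [c / total for c in counts]
    return prior, cond

# Toy data: two clusters of 2-D binary points.
data = [(0, 0)] * 20 + [(1, 1)] * 20
prior, cond = em_lci(data, n_values=[2, 2], K=2)
```

Running one such fit per category yields the per-category models that the classification step compares.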
56
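Per category LCI model: testing
Use Maximum Likelihood classification
[Figure: graphical model. Category c and latent component k with p(k|c); feature coordinates f, w, h, x, y with p(f|k,c), p(w|k,c), p(h|k,c), p(x|k,c), p(y|k,c); plates n = 1..N_m and m = 1..M]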
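
Maximum likelihood classification then amounts to scoring a test image's features under each category model and picking the highest. The sketch below assumes discretized features treated as independent draws given the category, with models stored as (prior, tables) pairs; the category names and numbers are made up for illustration.

```python
import math

def log_likelihood(features, model):
    """Sum over features of log sum_k p(k | c) prod_d p(x_d | k, c)."""
    prior, cond = model
    total = 0.0
    for x in features:
        p = 0.0
        for k, p_k in enumerate(prior):
            term = p_k
            for d, value in enumerate(x):
                term *= cond[k][d][value]
            p += term
        total += math.log(p)
    return total

def classify(features, models):
    """Return the category whose model gives the features the highest likelihood."""
    return max(models, key=lambda c: log_likelihood(features, models[c]))

# Hypothetical single-component models for two form categories.
models = {
    "form_a": ([1.0], [[[0.9, 0.1], [0.8, 0.2]]]),
    "form_b": ([1.0], [[[0.2, 0.8], [0.3, 0.7]]]),
}
predicted_a = classify([(0, 0), (0, 0), (1, 0)], models)
predicted_b = classify([(1, 1), (1, 1)], models)
```

With equal category priors this is the same decision as picking the most probable category a posteriori.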
57
NIST Database of Tax Forms
20 form categories
5590 images with category labels
58
NIST Tax Forms Data
  • Fixed layout forms
  • Various degrees of markup (filling in)
  • Small translations and skew
  • Lightness/Darkness variations
  • Obscured corners

59
Example feature distributions
60
Experiments on NIST forms data
  • 5-dim Haar filter features
  • FeatureType, log(filterWidth),
    log(filterHeight), x, y
  • One feature dimension is discrete
  • 10,000 features for each image
  • Train on 10 images per category
  • K = 100 components
  • 15 EM iterations
  • 10 classification errors in 5390 test samples.
  • 1-1.5 sec per image (US Letter, 300dpi)
  • Java implementation, 3GHz Pentium

ML Classification
Feature extraction
61
Looking forward
  • 7 out of 10 errors were on a single category
  • K = 100 for that category was overkill
  • Automatically identify K or adopt a
    non-parametric model
  • Also applied to telling first page from second
    page of journal articles
  • Non-rigid layout categories require deformable
    models
  • Handled within the same graphical models
    framework
  • Incorporating appropriate hidden variables
  • Parts models, Displacement models
  • Acknowledgments: Mithun Das Gupta, Univ. Illinois

Csurka et al., ECCV (2004); Sudderth et al., CVPR
(2005)
62
Clustering by appearance
Use EM algorithm for parameter estimation
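[Figure: graphical model. Category c (shown twice) and latent component k with p(k|c); feature coordinates f, w, h, x, y with p(f|k,c), p(w|k,c), p(h|k,c), p(x|k,c), p(y|k,c); plates n = 1..N_m and m = 1..M]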
63
Robust OCR Using Fisher Kernels
DICE Document Image Classification Engine
Provably optimal Functional Role Labeling
Contour labeling Using MRFs
SenseMaking
Ubitext
UpLib Personal DL
Beyond text and keywords
64
My personal digital Library in UpLib
65
Images I keep in my UpLib
  • Articles I read
  • My notes on paper
  • Letters from family and friends
  • Receipts, bills and copies of paperwork
  • Family medical prescriptions
  • Cartoons I like
  • Sudokus I enjoyed
  • Artwork of my four-year-old daughter

66
Robust OCR Using Fisher Kernels
DICE Document Image Classification Engine
Provably optimal Functional Role Labeling
Contour labeling Using MRFs
SenseMaking
Ubitext
UpLib Personal DL
Beyond text and keywords
67
Fisher kernels: Robust OCR
OCR: independent pixel noise does not
matter. Test your system on correlated noise
(interference).
68
Fisher kernels: Robust OCR
  • Generative models
  • Learn p(features | class) from data
  • Compute p(class | features) using Bayes' rule
  • Modular, composable
  • Rejection criterion
  • Discriminative models
  • Learn p(class | features) directly from data
  • Learn the boundaries only
  • Often, lower error rate with small samples
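
The generative route can be illustrated with a few lines of Python. The numbers are invented, and treating the "rejection criterion" as a threshold on the total evidence p(features) is an assumption about what the slide means, not something it states.

```python
def posterior(likelihoods, priors):
    """p(class | features) from p(features | class) and p(class) via Bayes' rule."""
    joint = {c: likelihoods[c] * priors[c] for c in likelihoods}
    evidence = sum(joint.values())          # p(features)
    return {c: j / evidence for c, j in joint.items()}, evidence

def classify_with_reject(likelihoods, priors, min_evidence):
    """Return the most probable class, or None if no class explains the input."""
    post, evidence = posterior(likelihoods, priors)
    if evidence < min_evidence:
        return None
    return max(post, key=post.get)

priors = {"e": 0.5, "c": 0.5}
# A glyph that both character models explain reasonably well.
typical = classify_with_reject({"e": 0.02, "c": 0.01}, priors, min_evidence=1e-3)
# A blob that is very unlikely under every character model.
outlier = classify_with_reject({"e": 1e-9, "c": 1e-9}, priors, min_evidence=1e-3)
```

A purely discriminative model that outputs only p(class | features) has no direct counterpart to the evidence term used for rejection here.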

69
Fisher kernels: Robust OCR
70
Fisher kernels: Robust OCR
  • Fisher kernel
  • Observation likelihood
  • Maximum likelihood parameter estimate
  • Fisher kernel
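
The formulas that accompanied these bullets did not survive in the transcript. For orientation only, the standard textbook definitions that match the bullet labels are sketched below; the notation is an assumption and may differ from what the slide showed.

```latex
% Observation likelihood under a generative model with parameters \theta
p(x \mid \theta)

% Maximum likelihood parameter estimate from training samples x_1, \ldots, x_T
\hat{\theta} = \arg\max_{\theta} \sum_{t=1}^{T} \log p(x_t \mid \theta)

% Fisher score and Fisher information, evaluated at \hat{\theta}
U_x = \nabla_{\theta} \log p(x \mid \theta) \big|_{\theta = \hat{\theta}}, \qquad
I = \mathbb{E}_{x}\!\left[ U_x U_x^{\top} \right]

% Fisher kernel
K(x, x') = U_x^{\top} I^{-1} U_{x'}
```

The kernel compares two observations by how they would each pull the parameters of the generative model, which lets a discriminative classifier be built on top of it.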

71
Fisher kernels: Robust OCR
72
Fisher kernels: Robust OCR
73
The Bit Flip Model (DID)
  • For glyph c of size (w, h)

74
Fisher kernels: Robust OCR
ABBYY
Fisher
DID
75
Ubitext: Paper to PDA
76
Degree of interest tree
77
ScentIndex
  • End of book indices are carefully designed by
    author/editor/publisher
  • Make use of this index to provide an active,
    dynamic, clickable interface to users
  • On search, show search terms as well as related
    terms in rearranged clickable index.

78
Digital library content
  • Text, email, web-history, pdf
  • Photos, Music, Audio, Video, Movies
  • Addresses, bookmarks, calendars, quick notes
  • Recipes, travelogues, manuals, org-charts,
    genealogy
  • Computer code
  • Graphs and charts
  • Data: astronomical, genomic, medical,
    geopolitical, ...
  • Reference: dictionaries, encyclopedias, handbooks, ...
  • Design: VLSI, airplane, building architecture,
    fashion, ...
  • Collaborative artifacts: tags, annotations,
    ratings, links, ...

79
What do we want beyond search
  • User interface
  • Sub-second interactions
  • visual attention
  • Real-time interactions
  • search responsiveness
  • incremental query building
  • Long-term interaction
  • Support for information foraging
  • User modeling
  • Knowledge tools (annotation, content authoring,
    excerpting, collaborative tagging)

80
What do we want beyond search
  • The lay-of-the-land
  • Browsing
  • Search by scent and example
  • Knowledge discovery tools and aids