Document Analysis Systems

1
Document Analysis Systems
Bedola Roberto, Bordoni Davide, Franc Vojtech
Supervised by Luca Lombardi
  • Overview
    • Introduction
    • Our implementation of Wahl et al. framework
    • Multiresolution Approach for page segmentation
    • Conclusions

2
www.ip2001.5u.com
  • Id ip2001.5u.com
  • Pass 2001ip

3
Introduction
  • Document understanding
  • Extraction of relevant information from documents
    (letters, forms, engineering drawings, etc.)

[Diagram labels: Digitized Image, Page Segmentation, Graphics Processing, Textual Processing (OCR)]

4
Wahl et al.: Our implementation
  • Segmentation
  • Connected Component Extraction
  • Region merging
  • Block Characterization
  • Text extraction
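The connected component extraction step can be illustrated with a short sketch. The slides do not show the authors' code, so the function name `label_components`, the choice of 4-connectivity, and the convention that 1 marks a foreground pixel are all assumptions made for illustration.

```python
from collections import deque


def label_components(binary):
    """Label 4-connected foreground regions of a binary image.

    `binary` is a list of rows; 1 marks a foreground pixel and 0 the
    background (an assumed convention, not taken from the slides).
    Returns (labels, count): `labels` has the same shape as `binary`
    and holds 0 for background or a component id starting at 1.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or labels[y][x] != 0:
                continue  # background, or already part of a component
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            # Breadth-first flood fill over the 4 direct neighbours.
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1
                            and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Each labelled component could then be passed to the merging and characterization steps; how the authors merge regions is not described in the transcript.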
5
Example 1
[Figures: original document, after binarization, after segmentation]

6
Example 2
[Figures: labelled image before merging, labelled image after merging, classified image]

7
Multiresolution
  • Principles
    • Applying various filters makes it possible to find the
      different zones.
  • Techniques
    • Tuning some parameters makes it possible to discard some
      pixels.

8
Multiresolution

9
Mean and Variance
  • Mean
  • Variance
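The formulas on this slide did not survive in the transcript. As a minimal sketch, the standard mean and population variance of the gray levels in a square block can be computed as below; the function name `block_mean_variance` and the block-wise formulation are assumptions, not the authors' definitions.

```python
def block_mean_variance(image, top, left, size):
    """Mean and population variance of a size x size block of gray levels.

    `image` is a list of rows of gray levels (0..255 assumed).
    The block starts at row `top`, column `left`.
    """
    pixels = [image[y][x]
              for y in range(top, top + size)
              for x in range(left, left + size)]
    n = len(pixels)
    mean = sum(pixels) / n
    # Population variance: average squared deviation from the mean.
    variance = sum((p - mean) ** 2 for p in pixels) / n
    return mean, variance
```

A uniform block gives a variance of 0, while a block mixing black and white pixels gives a large variance.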

10
Background Condition
  • We can find the background
    • 240 < Mean < 255
    • 225 < Mean < 255 (image with noise)
    • 0 < Variance < 15
    • 0 < Variance < 25 (image with noise)
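The background condition can be written as a small predicate. This is a sketch only: the function name `is_background` and the `noisy` flag are hypothetical, and the bounds are treated as strict because that is how they appear on the slide; whether the endpoints were meant to be included is not stated.

```python
def is_background(mean, variance, noisy=False):
    """Return True when a region's statistics match the background ranges.

    Uses the ranges listed on the slide, with the wider ranges for
    noisy images. Bounds are strict (an assumption, see lead-in).
    """
    mean_low, variance_high = (225, 25) if noisy else (240, 15)
    return mean_low < mean < 255 and 0 < variance < variance_high
```

With strict bounds, a perfectly white region (mean 255, variance 0) is rejected, so a real implementation might relax the endpoints.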

11
Median and Threshold
  • Median
  • Threshold
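The slide only names the two filters, so the following is a generic sketch of a median filter followed by a threshold, not the authors' implementation. The window radius, the clipping of the window at the image border, and the 0/255 output convention are assumptions.

```python
from statistics import median_low


def median_filter(image, radius=1):
    """Replace each pixel by the median of its square neighbourhood.

    The window is (2 * radius + 1) pixels wide and is clipped at the
    image border (an assumed border policy). `median_low` is used so
    the result is always one of the original gray levels.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [image[ny][nx]
                      for ny in range(max(0, y - radius), min(height, y + radius + 1))
                      for nx in range(max(0, x - radius), min(width, x + radius + 1))]
            out[y][x] = median_low(window)
    return out


def threshold(image, level):
    """Binarize: 255 where the pixel is >= level, 0 elsewhere (assumed convention)."""
    return [[255 if p >= level else 0 for p in row] for row in image]
```

The median filter removes isolated outliers (for example a single dark pixel in a light area) before the threshold is applied.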

12
Image and Graphics
  • 4 steps
  • Middletone Segmentation
  • Median filter
  • Threshold filter
  • Pixel counting
    • If threshold ? 170 → counter
  • Classification
    • If counter/area ≥ 0.7 → Graphic
    • else → Pictures
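The counting and classification steps above can be sketched as follows. The comparison operators were lost in the transcript, so this sketch ASSUMES a pixel is counted when its filtered value is at least 170 and that a region is a graphic when the counted fraction is at least 0.7. The function name `classify_image_region` is hypothetical.

```python
def classify_image_region(region, level=170, ratio=0.7):
    """Classify a filtered gray-level region as 'graphic' or 'picture'.

    ASSUMPTION: a pixel is counted when its value is >= `level`, and
    the region is a graphic when counter / area >= `ratio`. Neither
    comparison direction is confirmed by the slides.
    """
    area = sum(len(row) for row in region)
    if area == 0:
        raise ValueError("region must contain at least one pixel")
    counter = sum(1 for row in region for p in row if p >= level)
    return "graphic" if counter / area >= ratio else "picture"
```

Under these assumptions a line drawing, which is mostly light background, is labelled a graphic, while a midtone photograph is labelled a picture.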

13
Text
  • 3 steps
  • Segmentation of mean image
  • Mean filter
  • Text pixel counting
    • If 240 ≥ mean ≥ 210 and variance ? 10 → counter
  • Classification
    • If counter/area ≥ 0.7 → Text
    • else if counter/area ≥ 0.3 → analyse a more
      detailed image
    • else → Unclassifiable
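The three-way decision above can be sketched as a small function. The direction of the variance comparison is not recoverable from the transcript, so it is exposed as the hypothetical parameter `variance_test`; its default (variance at least 10) is an ASSUMPTION and can be flipped. The function name and the per-pixel `(mean, variance)` input format are also illustrative choices.

```python
def classify_text_region(stats, variance_test=lambda v: v >= 10):
    """Classify a region as 'text', 'refine' or 'unclassifiable'.

    `stats` is a list of (mean, variance) pairs, one per pixel of the
    mean image inside the region. A pixel is counted when
    210 <= mean <= 240 and `variance_test(variance)` holds.
    ASSUMPTION: the default variance test (>= 10) is a guess.
    """
    if not stats:
        raise ValueError("region must contain at least one pixel")
    counter = sum(1 for mean, variance in stats
                  if 210 <= mean <= 240 and variance_test(variance))
    fraction = counter / len(stats)
    if fraction >= 0.7:
        return "text"
    if fraction >= 0.3:
        return "refine"  # analyse a more detailed (higher-resolution) image
    return "unclassifiable"
```

A caller that receives "refine" would repeat the test on the next, more detailed level of the multiresolution representation.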

14
Experimental Results

15
Conclusions
  • Wahl et al. method
    • Simple and accurate, but slower.
    • We finished implementing this method.
  • Multiresolution
    • Complex and faster, but needs many parameters.
    • Only partially implemented.

16
Goodbye
Meat or fish