Document Examiner Feature Extraction: Thinned vs Skeletonised Images

1
Document Examiner Feature Extraction: Thinned vs
Skeletonised Images
  • Vladimir Pervouchine and Graham Leedham

Forensics and Security Laboratory, School of
Computer Engineering, Nanyang Technological
University, Singapore
2
Outline
  • Forensic handwriting examination
  • The need for accurate stroke extraction
  • Thinning based method
  • Vector skeletonisation method
  • Feature extraction
      • From thinned images
      • From vector skeletons
  • Writer classification method
  • Results
  • Conclusions

3
Forensic handwriting examination
  • Variation of the word 'the' written by 8
    different writers.
  • Source: Harrison, 1981
4
Forensic handwriting examination
  • Variation of the letters G and R written by
    15 different writers.
  • Source: Harrison, 1981

5
Forensic handwriting examination
  • Example of variation in letter formation styles
    in 10 letters from 9 different writers.
  • Source: Harrison, 1981

6
Current Methods used by Forensic Document
Examiners
  • Examination primarily involves manual extraction
    and comparison of various global and local
    visible features.
  • Examiners usually compare a Questioned Document
    against a set of Known Documents.
  • The objective is to determine whether the
    Questioned Document was, or was not, written by
    a particular individual.
  • The Questioned Document may be in disguised
    handwriting.

7
Forgery / Disguise / Alteration
  • Is the writing GENUINE? (the author is who he
    claims to be)
  • Is the writing FORGED? (the author is not who he
    claims to be and is attempting to assert the
    writing is the same as someone else's) or
  • Is the writing DISGUISED? (the author wishes to
    deny doing the writing at a later date) or
  • Is the writing ALTERED? (Has someone modified or
    altered the original document?)

8
Extraction of handwritten strokes from images
  • Forensic document examiners analyse the pen tip
    trajectory
  • The trajectory is not readily available from the
    grayscale handwriting images
  • To mimic the extraction of document examiner
    features, it is necessary to approximate the pen
    trajectory
  • We need to preserve individual information in
    character shapes
  • Many algorithms have been proposed for a similar
    problem in offline handwriting recognition, but
    they do not need to preserve the individual
    traits of characters

9
Thinning based stroke approximation
Original image
  • Matlab Image Processing toolbox thinning (Zhang
    and Suen thinning algorithm) is used for the
    first approximation
  • Post-processing is applied to:
      • remove extra branches
      • remove spurious loops
      • remove small connected components
  • Feature extraction attempts to overcome remaining
    artifacts
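
A minimal sketch of the Zhang and Suen thinning iteration named above, written in plain NumPy. It is not MATLAB's toolbox routine and is not taken from the authors' software; the function name `zhang_suen_thin` and the 0/1 input convention (1 = ink) are assumptions made for illustration.

```python
import numpy as np

def zhang_suen_thin(binary):
    """Thin a 0/1 image (1 = ink) with the two-subiteration Zhang-Suen scheme."""
    img = np.pad(np.asarray(binary, dtype=np.uint8), 1)  # zero border simplifies indexing
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r, c in zip(*np.nonzero(img)):
                # Neighbours P2..P9, clockwise starting from north.
                p = [int(img[r - 1, c]), int(img[r - 1, c + 1]), int(img[r, c + 1]),
                     int(img[r + 1, c + 1]), int(img[r + 1, c]), int(img[r + 1, c - 1]),
                     int(img[r, c - 1]), int(img[r - 1, c - 1])]
                b = sum(p)  # number of ink neighbours
                if b < 2 or b > 6:
                    continue
                # Number of 0 -> 1 transitions around the pixel.
                a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                if a != 1:
                    continue
                p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                if step == 0:
                    ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                else:
                    ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                if ok:
                    to_delete.append((r, c))
            # Delete only after the whole sub-iteration has been evaluated.
            for r, c in to_delete:
                img[r, c] = 0
            changed = changed or bool(to_delete)
    return img[1:-1, 1:-1]
```

The pixel-by-pixel loop is slow on full pages but keeps the deletion conditions readable; a lookup-table version would be the usual choice for real images.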

Flowchart steps: Binarisation, Thinning, Remove small
connected components, Find junction points, Find end
points, Correct spurious loops, Prune short branches
(loop: while changes are made)
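
Three of the flowchart steps (removing small connected components, finding end points, finding junction points) can be sketched as follows. This is an illustrative NumPy version under simple assumptions, not the authors' code: a skeleton pixel with exactly one 8-neighbour is treated as an end point and one with three or more as a junction point, which can flag several adjacent pixels around a single junction. The function names and the `min_size` parameter are hypothetical.

```python
import numpy as np

def remove_small_components(binary, min_size):
    """Erase 8-connected components with fewer than min_size pixels."""
    img = np.asarray(binary, dtype=np.uint8).copy()
    seen = np.zeros(img.shape, dtype=bool)
    h, w = img.shape
    for r0, c0 in zip(*np.nonzero(img)):
        if seen[r0, c0]:
            continue
        stack, comp = [(r0, c0)], []
        seen[r0, c0] = True
        while stack:  # iterative flood fill
            r, c = stack.pop()
            comp.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w and img[rr, cc] and not seen[rr, cc]:
                        seen[rr, cc] = True
                        stack.append((rr, cc))
        if len(comp) < min_size:
            for r, c in comp:
                img[r, c] = 0
    return img

def neighbour_count(skeleton):
    """Number of 8-connected ink neighbours for every pixel."""
    s = np.pad(np.asarray(skeleton, dtype=np.uint8), 1)
    h, w = s.shape
    total = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr or dc:
                total += s[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
    return total

def end_and_junction_points(skeleton):
    """Boolean masks of end points (1 neighbour) and junction points (>= 3)."""
    ink = np.asarray(skeleton, dtype=bool)
    n = neighbour_count(skeleton)
    return ink & (n == 1), ink & (n >= 3)
```

Branch pruning and loop correction would then operate on the paths between these points; they are left out here because the slides do not give their rules.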
10
Thinning based stroke approximation
11
Vector skeletonisation method
Original image
  • 1st stage: vectorisation. Spline-approximated
    skeletal branches are formed
  • 2nd stage: the minimum-cost configuration of
    branch interconnections is found. Branches are
    grouped into strokes
  • For each retraced segment of a stroke,
    restoration of the hidden loop is attempted
  • 3rd stage: near-junction and loop spline knots
    are adjusted to make strokes smoother

Flowchart steps: Vectorisation, Binary encoding of
junction points configuration, GA optimisation to find
the configuration with lowest cost, Adjustment of loop
and near-junction knots
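
To illustrate the idea of a GA searching binary-encoded configurations for the lowest cost, here is a generic toy sketch. The slides do not give the encoding, the cost function, or the GA settings, so everything below is hypothetical: `ga_minimise`, its parameters, tournament selection, one-point crossover, and bit-flip mutation are common textbook choices, not the authors' method.

```python
import random

def ga_minimise(cost, n_bits, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Search bit strings of length n_bits (n_bits >= 2) for a low value of cost(bits)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=cost)

    def tournament(population):
        # Pick two individuals at random and keep the cheaper one.
        a, b = rng.sample(population, 2)
        return a if cost(a) <= cost(b) else b

    for _ in range(generations):
        nxt = [best[:]]  # elitism: the best configuration always survives
        while len(nxt) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
        best = min(pop, key=cost)
    return best, cost(best)
```

In the setting of the slides, each bit pattern would stand for one way of connecting branches at the junction points, and `cost` would score how plausible the resulting strokes are; both are placeholders here.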
12
Vector skeletonisation method
3. Strokes with retraced segments and loops
13
Feature extraction: list of features
  • Features extracted from both raster and vector
    skeletons
      1. Height
      2. Width
      3. Height to width ratio
      4. Distance HC
      5. Distance TC
      6. Distance TH
      7. Angle between TH and TC
      8. Slant of stem of t
      9. Slant of stem of h
      10. Position of t-bar
      11. Connected/disconnected t and h
      12. Average stroke width
      13. Average pseudo-pressure
      14. Standard deviation of average pseudo-pressure
  • Features extracted from vector skeleton only
      15. Standard deviation of stroke width
      16. Number of strokes
      17. Number of loops and retraced branches
      18. Straightness of t-stem
      19. Straightness of t-bar
      20. Straightness of h-stem
      21. Presence of loop at top of t-stem
      22. Presence of loop at top of h-stem
      23. Maximum curvature of h-knee
      24. Average curvature of h-knee
      25. Relative size (diameter) of h-knee
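
The simplest geometric features in this list (height, width, their ratio, point-to-point distances, and an angle between two segments) could be computed along the following lines. This is a hedged sketch: the slides do not define the landmark points T, H and C, so the functions take arbitrary (x, y) points, and all function names are hypothetical.

```python
import math
import numpy as np

def bounding_box_features(mask):
    """Height, width and height-to-width ratio of the ink in a 0/1 mask."""
    rows, cols = np.nonzero(mask)
    height = int(rows.max() - rows.min() + 1)
    width = int(cols.max() - cols.min() + 1)
    return height, width, height / width

def distance(p, q):
    """Euclidean distance between two (x, y) points, e.g. landmarks T and H."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def angle_between(origin, a, b):
    """Unsigned angle in degrees between segments origin->a and origin->b."""
    v1 = (a[0] - origin[0], a[1] - origin[1])
    v2 = (b[0] - origin[0], b[1] - origin[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))
```

With landmark coordinates in hand, "Angle between TH and TC" would be `angle_between(T, H, C)` under the assumption that both segments start at T.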

14
Feature extraction
  • The position of t-bar feature is binary: 1 if the
    t-bar crosses the stem, and 0 if it touches the
    stem, is separated from it, or is missing
  • Size of h-knee is measured parallel to a
    horizontal line
  • Pseudo-pressure is measured as the gray level
    normalised to 1.
  • Straightness is measured as the ratio of the
    stroke length to the distance between its ends
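
The straightness and pseudo-pressure definitions above translate directly into code. The sketch below assumes a stroke is given as a polyline of (x, y) points and that the image is 8-bit grayscale, so "normalised to 1" means division by 255; the slides do not say whether dark ink maps to high or low values, so no inversion is applied. Function names are hypothetical.

```python
import math
import numpy as np

def straightness(points):
    """Ratio of stroke length to the distance between its two ends (>= 1)."""
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        return math.inf  # closed stroke: ends coincide
    return length / chord

def pseudo_pressure(gray, skeleton):
    """Mean and standard deviation of gray level / 255 over skeleton pixels."""
    values = np.asarray(gray, dtype=float)[np.asarray(skeleton, dtype=bool)] / 255.0
    return float(values.mean()), float(values.std())
```

A perfectly straight stroke gives a straightness of 1, and the value grows as the stroke bends.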

Figure labels: h-knee, t-bar, t-stem, h-stem
15
Writer classification scheme
  • A constructive ANN with spherical threshold units
    (DistAl) was used as the classifier
  • 100 samples of the grapheme 'th' drawn from 20
    different writers
  • 5-fold cross-validation is used to evaluate
    classification accuracy
  • Three experiments:
      • Original feature set (features 1-14), features
        extracted using raster skeleton
      • Original feature set, features extracted using
        vector skeleton
      • Extended feature set (features 1-25), features
        extracted from vector skeleton
  • Additionally, accuracy of feature extraction was
    measured
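
The 5-fold cross-validation protocol can be sketched as below. DistAl is not implemented here; a nearest-centroid classifier is swapped in purely as a stand-in, and the data in the usage lines are synthetic. The slides do not say whether 100 is the total number of samples or a per-writer count; the sketch assumes 5 samples for each of 20 writers. All names are hypothetical.

```python
import numpy as np

def stratified_folds(labels, k, rng):
    """Split sample indices into k folds, spreading each writer across folds."""
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for i, j in enumerate(idx):
            folds[i % k].append(int(j))
    return [np.array(f) for f in folds]

def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier (NOT DistAl): assign the writer with the closest mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

def cross_validate(X, y, k=5, seed=0):
    """Return the accuracy on each of the k held-out folds."""
    rng = np.random.default_rng(seed)
    folds = stratified_folds(y, k, rng)
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = nearest_centroid_predict(X[train], y[train], X[test])
        accuracies.append(float((pred == y[test]).mean()))
    return accuracies

# Synthetic, well-separated data: 20 writers x 5 samples x 14 features.
y = np.repeat(np.arange(20), 5)
X = 10.0 * y[:, None] + np.random.default_rng(1).normal(0.0, 0.1, (100, 14))
accuracies = cross_validate(X, y)
```

Averaging `accuracies` gives the cross-validated estimate; on real features the per-fold values would of course differ from this artificial case.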

16
Results: accuracy of feature extraction
  • The extraction software analysed the shape to
    detect the various parts of the character
  • Analysis was performed step by step
  • At each step some feature was extracted
  • If at least one feature was not extracted or was
    extracted incorrectly, the sample was counted as
    a failure
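
The counting rule above can be written out as a short sketch. It assumes each sample carries a per-feature status of "ok", "missing" or "wrong"; how correctness was judged is not stated in the slides, and the status labels and function names are hypothetical.

```python
def is_failure(feature_status):
    """A sample fails if any feature is missing or was extracted incorrectly."""
    return any(status != "ok" for status in feature_status.values())

def failure_rate(samples):
    """Fraction of samples counted as failures under the all-or-nothing rule."""
    if not samples:
        return 0.0
    return sum(is_failure(s) for s in samples) / len(samples)
```

Because one bad feature fails the whole sample, this measure is stricter than a per-feature error rate.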

Flowchart labels: Input (original image, binarised
image, skeleton); Feature vector; Height, width,
height to width ratio; Analysis of branches
originating from top end points; Stem features;
Search for t-bar
17
Results: accuracy of writer classification
Conclusions
  • Use of the vector skeleton results in fewer
    feature extraction failures
  • Use of the vector skeleton produces higher writer
    classification accuracy even on the same feature
    set; this indicates that feature values are
    measured more accurately
  • Vector skeletonisation enables extraction of more
    structural features, which, in turn, increases
    writer classification accuracy