Document Examiner Feature Extraction: Thinned vs Skeletonised Images

1
Document Examiner Feature Extraction: Thinned vs Skeletonised Images

Vladimir Pervouchine and Graham Leedham

Forensics and Security Laboratory, School of Computer Engineering, Nanyang Technological University, Singapore
2
Outline

Forensic handwriting examination
The need for accurate stroke extraction
Thinning based method
Vector skeletonisation method
Feature extraction
  From thinned images
  From vector skeletons
Writer classification method
Results
Conclusions

3
Forensic handwriting examination

Variation of the word "the" written by 8 different writers.
Source: Harrison, 1981
4
Forensic handwriting examination

Variation of the letters "G" and "R" written by 15 different writers.
Source: Harrison, 1981

5
Forensic handwriting examination

Example of variation in letter formation styles in 10 letters from 9 different writers.
Source: Harrison, 1981

6
Current Methods used by Forensic Document Examiners

Examination primarily involves manual extraction and comparison of various global and local visible features.
Examiners usually compare a Questioned Document against a set of Known Documents.
The objective is to determine whether the Questioned Document was, or was not, written by a particular individual.
The Questioned Document may be in disguised handwriting.

7
Forgery / Disguise / Alteration

Is the writing GENUINE? (The author is who he claims to be.)
Is the writing FORGED? (The author is not who he claims to be and is attempting to pass the writing off as someone else's.) or
Is the writing DISGUISED? (The author wishes to deny doing the writing at a later date.) or
Is the writing ALTERED? (Has someone modified or altered the original document?)

8
Extraction of handwritten strokes from images

Forensic document examiners analyse the pen tip trajectory
The trajectory is not readily available from grayscale handwriting images
To mimic the extraction of document examiner features, it is necessary to approximate the pen trajectory
We need to preserve individual information in character shapes
Many algorithms have been proposed for a similar problem in offline handwriting recognition, but those algorithms do not need to preserve the individual traits of characters

9
Thinning based stroke approximation
[Figure: original image]

Matlab Image Processing Toolbox thinning (Zhang and Suen thinning algorithm) is used for the first approximation
Post-processing is applied to remove extra branches, spurious loops and small connected components
Feature extraction attempts to overcome the remaining artifacts
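The slide names the Zhang and Suen algorithm as the first approximation of the strokes. The following is a minimal pure-Python sketch of that algorithm, not the Matlab toolbox code used by the authors. It assumes a binary image stored as a list of rows (1 = ink, 0 = background) with at least a one-pixel background border; the function name `zhang_suen_thin` is illustrative.

```python
def zhang_suen_thin(image):
    """Thin a binary image (rows of 0/1, 1 = ink) towards a one-pixel-wide skeleton.

    Assumption: the outermost rows and columns are background.
    """
    img = [row[:] for row in image]  # work on a copy, leave the input untouched
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: clockwise, starting from the pixel directly above (north).
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two sub-iterations of each pass
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of ink neighbours
                    # number of 0 -> 1 transitions around the circular sequence
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((r, c))
            # Pixels are cleared only after the whole image has been scanned.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


# Usage: a 3-pixel-thick horizontal bar inside a 7 x 11 image.
bar = [[1 if 2 <= r <= 4 and 2 <= c <= 8 else 0 for c in range(11)] for r in range(7)]
skeleton = zhang_suen_thin(bar)
```

The result of plain thinning still contains the spurs and small loops that the post-processing steps on this slide are meant to remove.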

[Flowchart: Binarisation; Thinning; Remove small connected components; Find junction points; Find end points; Correct spurious loops; While changes are made; Prune short branches]
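Three of the flowchart steps (removing small connected components, finding end points, finding junction points) can be sketched with simple neighbour counting. This is an illustrative sketch, not the authors' implementation: it assumes 8-connectivity, treats a skeleton pixel with exactly one ink neighbour as an end point and one with three or more as a junction candidate, and uses hypothetical function names. Loop correction and branch pruning are not shown.

```python
from collections import deque

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def ink_neighbours(img, r, c):
    """Coordinates of the 8-connected ink neighbours of pixel (r, c)."""
    rows, cols = len(img), len(img[0])
    return [(r + dr, c + dc) for dr, dc in OFFSETS
            if 0 <= r + dr < rows and 0 <= c + dc < cols and img[r + dr][c + dc]]


def remove_small_components(img, min_size):
    """Return a copy of img without components smaller than min_size pixels."""
    out = [row[:] for row in img]
    seen = set()
    for r in range(len(img)):
        for c in range(len(img[0])):
            if not img[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill collects one connected component.
            component, queue = [(r, c)], deque([(r, c)])
            seen.add((r, c))
            while queue:
                pr, pc = queue.popleft()
                for n in ink_neighbours(img, pr, pc):
                    if n not in seen:
                        seen.add(n)
                        component.append(n)
                        queue.append(n)
            if len(component) < min_size:
                for pr, pc in component:
                    out[pr][pc] = 0
    return out


def end_and_junction_points(skel):
    """Classify skeleton pixels by their number of ink neighbours."""
    ends, junctions = [], []
    for r in range(len(skel)):
        for c in range(len(skel[0])):
            if skel[r][c]:
                n = len(ink_neighbours(skel, r, c))
                if n == 1:
                    ends.append((r, c))
                elif n >= 3:
                    junctions.append((r, c))  # candidate only, see note below
    return ends, junctions
```

With 8-connectivity, pixels adjacent to a true junction often also have three or more neighbours, so the junction candidates usually need to be clustered before the branches are traced.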
10
Thinning based stroke approximation
11
Vector skeletonisation method
[Figure: original image]

1st stage: vectorisation. Spline-approximated skeletal branches are formed
2nd stage: the minimum cost configuration of branch interconnections is found. Branches are grouped into strokes. For each retraced segment of a stroke, restoration of the hidden loop is attempted
3rd stage: near-junction and loop spline knots are adjusted to make strokes smoother

[Flowchart: Vectorisation; Binary encoding of junction points configuration; GA optimisation to find configuration with lowest cost; Adjustment of loop and near-junction knots]
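The second stage searches binary-encoded junction configurations with a genetic algorithm. The slides do not define the encoding details or the cost function, so the sketch below only illustrates the general technique: a small GA over bit strings with tournament selection, one-point crossover, bit-flip mutation and elitism. The function name, parameters and the toy cost used in the usage lines are all hypothetical.

```python
import random


def ga_minimise(cost, n_bits, pop_size=30, generations=60, p_mut=0.05, rng=None):
    """Search bit strings of length n_bits (n_bits >= 2) for a low cost(bits).

    Returns the best bit string found and the best cost after each generation.
    """
    rng = rng or random.Random()
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=cost)
    history = [cost(best)]

    def pick():
        # Tournament selection: the cheaper of two randomly chosen individuals.
        a, b = rng.choice(pop), rng.choice(pop)
        return a if cost(a) <= cost(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the current best survives unchanged
        while len(children) < pop_size:
            mum, dad = pick(), pick()
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Bit-flip mutation with probability p_mut per bit.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = min(pop, key=cost)
        history.append(cost(best))
    return best, history


# Usage with a toy cost (hypothetical): Hamming distance to a made-up target
# configuration. A real cost would score how plausibly branches join at junctions.
target = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_cost(bits):
    return sum(b != t for b, t in zip(bits, target))


best, history = ga_minimise(toy_cost, len(target), rng=random.Random(0))
```

Because of elitism the best cost never increases from one generation to the next, which makes the run easy to monitor even when the true optimum is unknown.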
12
Vector skeletonisation method
3. Strokes with retraced segments and loops
13
Feature extraction: list of features

Features extracted from both raster and vector skeletons:
1. Height
2. Width
3. Height to width ratio
4. Distance HC
5. Distance TC
6. Distance TH
7. Angle between TH and TC
8. Slant of stem of t
9. Slant of stem of h
10. Position of t-bar
11. Connected/disconnected t and h
12. Average stroke width
13. Average pseudo-pressure
14. Standard deviation of average pseudo-pressure

Features extracted from vector skeleton only:
15. Standard deviation of stroke width
16. Number of strokes
17. Number of loops and retraced branches
18. Straightness of t-stem
19. Straightness of t-bar
20. Straightness of h-stem
21. Presence of loop at top of t-stem
22. Presence of loop at top of h-stem
23. Maximum curvature of h-knee
24. Average curvature of h-knee
25. Relative size (diameter) of h-knee

14
Feature extraction

The position of t-bar feature is binary: 1 if the t-bar crosses the stem and 0 if it touches the stem, is separated from it, or is missing
The size of the h-knee is measured parallel to a horizontal line
Pseudo-pressure is measured as the gray level normalised to 1
Straightness is measured as the ratio of the stroke length to the distance between its ends
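The straightness and pseudo-pressure definitions translate directly into code. The sketch below assumes a stroke is available as a list of (x, y) points along the skeleton and that gray levels are 8-bit values; the function names and the `invert` option are illustrative, since the slides do not say whether dark or light pixels map to high pressure.

```python
import math


def straightness(points):
    """Ratio of the stroke (polyline) length to the distance between its ends.

    A perfectly straight stroke gives 1.0; more curved strokes give larger values.
    """
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    if chord == 0:
        raise ValueError("stroke ends coincide; the ratio is undefined")
    return length / chord


def mean_pseudo_pressure(gray_levels, max_level=255, invert=False):
    """Average gray level along a stroke, normalised to the range [0, 1].

    invert=True maps dark pixels to high values (an assumption, not from the slides).
    """
    scaled = [g / max_level for g in gray_levels]
    if invert:
        scaled = [1.0 - s for s in scaled]
    return sum(scaled) / len(scaled)


# Usage: an L-shaped stroke of length 3 + 4 = 7 whose ends are 5 apart.
ratio = straightness([(0, 0), (0, 3), (4, 3)])
pressure = mean_pseudo_pressure([0, 255])
```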

[Figure labels: h-knee, t-bar, t-stem, h-stem]
15
Writer classification scheme

A constructive ANN with spherical threshold units (DistAl) was used as the classifier
100 samples of the grapheme "th" drawn from 20 different writers
5-fold cross-validation is used to evaluate classification accuracy
Three experiments:
1. Original feature set (features 1-14), features extracted using raster skeleton
2. Original feature set, features extracted using vector skeleton
3. Extended feature set (features 1-25), features extracted from vector skeleton
Additionally, the accuracy of feature extraction was measured
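The 5-fold evaluation protocol can be sketched as follows. DistAl is not reproduced here; a nearest-centroid classifier is swapped in purely as a stand-in so that the example runs, and all function names are hypothetical. The split is a plain shuffled k-fold; whether the authors stratified the folds by writer is not stated.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(features, labels, train_fn, predict_fn, k=5, seed=0):
    """Return overall accuracy when each fold is held out once for testing."""
    correct = 0
    for held_out in k_fold_indices(len(features), k, seed):
        held = set(held_out)
        train = [i for i in range(len(features)) if i not in held]
        model = train_fn([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_fn(model, features[i]) == labels[i] for i in held_out)
    return correct / len(features)


def train_centroids(xs, ys):
    """Stand-in classifier (not DistAl): store the mean feature vector per writer."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Assign x to the writer whose centroid is closest (squared Euclidean)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))


# Usage with made-up one-dimensional features for two writers.
features = [[i * 0.01] for i in range(10)] + [[10 + i * 0.01] for i in range(10)]
labels = ["A"] * 10 + ["B"] * 10
accuracy = cross_validate(features, labels, train_centroids, predict_centroid)
```

Running the same function once per experiment (raster skeleton, vector skeleton, extended feature set) with identical folds keeps the three accuracies comparable.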

16
Results: accuracy of feature extraction

The extraction software analysed the shape to detect the various parts of the character
The analysis was performed step by step
At each step some feature was extracted
If at least one feature was not extracted or was extracted incorrectly, the sample was counted as a failure
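The failure-counting rule above can be written down in a few lines. This sketch assumes each sample has already been checked by hand and is represented as a mapping from feature name to one of the status strings "ok", "missing" or "wrong"; that representation and the function name are assumptions made for illustration.

```python
def extraction_failure_rate(samples):
    """Fraction of samples in which at least one feature is missing or wrong.

    samples: list of dicts mapping feature name -> "ok" | "missing" | "wrong".
    """
    failures = sum(
        1 for sample in samples
        if any(status != "ok" for status in sample.values())
    )
    return failures / len(samples)


# Usage with made-up check results for four samples.
checked = [
    {"height": "ok", "t-bar": "ok"},
    {"height": "ok", "t-bar": "missing"},
    {"height": "wrong", "t-bar": "ok"},
    {"height": "ok", "t-bar": "ok"},
]
rate = extraction_failure_rate(checked)
```

Under this rule a single bad feature fails the whole sample, so the rate is an upper bound on the per-feature error rate.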

[Flowchart: Input (original image, binarised image, skeleton); Feature vector; Height, width, height to width ratio; Analysis of branches originating from top end points; Stem features; Search for t-bar]
17
Results: accuracy of writer classification

Conclusions
Use of the vector skeleton results in fewer feature extraction failures
Use of the vector skeleton produces higher writer classification accuracy even on the same feature set; this indicates that feature values are measured more accurately
Vector skeletonisation enables extraction of more structural features, which, in turn, increases writer classification accuracy