Title: Towards Performance Evaluation of Symbol Recognition
1 Towards Performance Evaluation of Symbol Recognition and Spotting Systems in a Localization Context
- Mathieu Delalandre
- CVC, Barcelona, Spain
- EuroMed Meeting
- LORIA, Nancy, France
- Monday 18th of May 2009
3 Introduction
Performance evaluation: Information Retrieval [Salton1992], Computer Vision [Thacker2005], CBIR [Muller2001], DIA [Haralick2000]
Case of symbol recognition and spotting [Ezra2008] [Delalandre2008]
[Diagram: training data, system, groundtruthing, characterisation, performance evaluation]
4 Plan
- Groundtruth and test documents
- Performance characterization
- Conclusions and perspectives
5 Groundtruth and test documents: Overview of approaches
Approaches: real approach, synthetic approach
Rating scale: weak to good

Reference     Speed, Realism, Reliability  Symbols  Connected  Noise
Dosch06       - - -                        many     yes        no
Yan04         - - -                        many     yes        no
Rusinol09     - --                         many     yes        no
Aksoy00       - -                          many     no         yes
Zhai03        - -                          one      no         yes
Valveny07     - -                          one      no         yes
Delalandre08                               many     yes        no
7 Groundtruth and test documents: Overview of approaches
Synthetic approach [Delalandre2008]: use the same background layer with different symbol layers
10 Groundtruth and test documents: Existing datasets

Source  Name        Datasets  Images  Symbols  Degradations  Models
GREC    GREC03      30        3000    3000     10            5-50
GREC    GREC05      16        1000    1000     6             25-150
GREC    GREC07      6         2100    2100     6             50-150
ICPR    ICPR00      9         450     11250    9             25
SESYD   bags        16        1600    15046    none          25-150
SESYD   floorplans  10        1000    26830    none          16
SESYD   diagrams    10        1000    14100    none          21
SESYD   queries     6         6000    6000     none          16-21
Others  Rusinol09   1         42      344      none          38
17 Groundtruth and test documents: Existing datasets
Generator of queries, working from the groundtruth:
1. Random selection of a document
2. Random selection of a symbol
3. Random crop
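The three steps of the query generator (random document, random symbol, random crop) can be sketched as below. This is only an illustration, not the actual generator: the groundtruth layout (a list of dictionaries with `name`, `size` and `symbols`), the function name `generate_query` and the `crop_margin` parameter are all assumptions, and "random crop" is interpreted here as enlarging the symbol's bounding box by a random margin on each side.

```python
import random

def generate_query(documents, crop_margin=20, rng=None):
    """Draw one query from a groundtruth (hypothetical layout, see lead-in).

    documents: list of {"name": str, "size": (width, height),
                        "symbols": [{"label": str, "box": (x0, y0, x1, y1)}]}
    """
    rng = rng or random.Random()
    doc = rng.choice(documents)            # 1. random selection of a document
    symbol = rng.choice(doc["symbols"])    # 2. random selection of a symbol
    x0, y0, x1, y1 = symbol["box"]
    width, height = doc["size"]
    # 3. random crop (assumed meaning): grow the box by a random margin
    #    on each side and clamp the result to the page.
    crop = (
        max(0, x0 - rng.randint(0, crop_margin)),
        max(0, y0 - rng.randint(0, crop_margin)),
        min(width, x1 + rng.randint(0, crop_margin)),
        min(height, y1 + rng.randint(0, crop_margin)),
    )
    return {"document": doc["name"], "label": symbol["label"], "crop": crop}

# Usage with a made-up groundtruth entry.
docs = [{
    "name": "plan-01",
    "size": (100, 80),
    "symbols": [
        {"label": "door", "box": (10, 10, 30, 30)},
        {"label": "sink", "box": (60, 50, 95, 75)},
    ],
}]
query = generate_query(docs, crop_margin=15, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps a generated query set reproducible, which matters when the same queries are reused to compare several systems.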
20 Performance characterization: Introduction
- Performance characterization (segmented symbols) [Valveny2004] [Dosch2006] [Valveny2007, 2008a, 2008b]
  - Recognition rate
  - Precision/Recall
  - Homogeneity
  - Separability
- Performance characterization (real context)
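As a reminder of what the first two measures compute on segmented symbols, here is a minimal sketch using the standard definitions of recognition rate, precision and recall. The function names and the sample labels are illustrative assumptions and are not taken from the cited evaluation tools; homogeneity and separability are left out because the slides do not define them.

```python
def recognition_rate(predicted, truth):
    """Fraction of symbols whose predicted label equals the true label."""
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth) if truth else 0.0

def precision_recall(retrieved, relevant):
    """Standard retrieval precision and recall over two collections of ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up labels: 3 of 4 symbols are recognized correctly.
rate = recognition_rate(["door", "sink", "door", "table"],
                        ["door", "sink", "window", "table"])

# Made-up retrieval: 4 items returned, 3 relevant, 2 in common.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "b", "e"])
```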
21 Performance characterization: About mapping
Mapping cases
- Single: a model line matches only one detected line.
- Split: two model lines match one detected line.
- Merge: a model line matches two detected lines.
- False alarm: a detected line doesn't match any model line.
- Miss: a model line doesn't match any detected line.
Symbol spotting [Rusinol2009]
[Figure: mapping between groundtruth (g1, g2) and results (r); labels c1, c2]
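The five mapping cases can be read as degree tests on a bipartite match graph between groundtruth items and detected items. The sketch below assumes the matches are already available as `(groundtruth_id, result_id)` pairs; how two items are declared matching is exactly the open problem and is not addressed here. The function name and identifiers are hypothetical, and the split and merge labels follow the wording of the slide.

```python
from collections import defaultdict

def classify_mapping(groundtruth, results, matches):
    """Sort groundtruth and result items into the slide's mapping cases."""
    g_to_r = defaultdict(set)
    r_to_g = defaultdict(set)
    for g, r in matches:
        g_to_r[g].add(r)
        r_to_g[r].add(g)

    cases = {"single": [], "split": [], "merge": [],
             "false_alarm": [], "miss": []}
    for g in groundtruth:
        linked = g_to_r.get(g, set())
        if not linked:
            cases["miss"].append(g)            # no detected item at all
        elif len(linked) >= 2:
            cases["merge"].append(g)           # one model, several detections
        else:
            (r,) = linked
            if len(r_to_g[r]) == 1:
                cases["single"].append((g, r))  # exclusive one-to-one match
    for r in results:
        linked = r_to_g.get(r, set())
        if not linked:
            cases["false_alarm"].append(r)     # detection without any model
        elif len(linked) >= 2:
            cases["split"].append(r)           # several models, one detection
    return cases

cases = classify_mapping(
    groundtruth=["g1", "g2", "g3", "g4", "g5"],
    results=["r1", "r2", "r3", "r4", "r5"],
    matches=[("g1", "r1"), ("g2", "r2"), ("g3", "r2"),
             ("g4", "r3"), ("g4", "r4")],
)
```

In this made-up example, `g1`/`r1` form a single match, `r2` is a split, `g4` is a merge, `r5` is a false alarm and `g5` is a miss.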
22 Performance characterization: Mapping, application to symbols
Which representation?
- Wrapper box, ellipsis: the precision will depend on the model
- Convex polygon: could be of weak precision
- Concave polygon: precise, but comparison is time consuming
How to define the regions?
- Groundtruth: does the polarized part of the capacitor belong to the symbol? Same for the moving area of the door?
Compatibility with recognition systems?
- Point: a lot of systems use sliding windows to detect symbols, providing only points [Adam2001] [Dosch2004] [Rusinol2007]. How to define local thresholds?
- Segmentation: systems providing regions of interest can tune their results. How to limit the over-segmentation cases?
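The trade-off between representations can be illustrated for systems that return only a point. The sketch below tests a detected point against a wrapper box and against a concave polygon, using the classic ray-casting (even-odd) test. The L-shaped region, the coordinates and the function names are made up for illustration and do not come from any of the cited systems.

```python
def point_in_box(point, box):
    """Cheap test: is the point inside the axis-aligned wrapper box?"""
    x, y = point
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

def point_in_polygon(point, polygon):
    """Ray casting (even-odd rule); handles concave polygons, costs O(n)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        xa, ya = polygon[i]
        xb, yb = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (ya > y) != (yb > y):
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical L-shaped symbol region and its wrapper box.
region = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
wrapper = (0, 0, 4, 4)

in_box = point_in_box((3, 3), wrapper)         # True: the box accepts the point
in_region = point_in_polygon((3, 3), region)   # False: the point is in the notch
```

The point (3, 3) falls in the empty corner of the L shape, so the wrapper box counts it as a hit while the polygon rejects it. The polygon test is more precise, at the price of visiting every edge for every comparison.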
23 Performance characterization: Work in progress
Comparison of some criteria: system of [Qureshi08], 100 floorplans (2521 symbols)
Signature-based characterization
25 Conclusions and perspectives
- Conclusions
  - Large databases of segmented symbol images exist (GREC)
  - Synthetic databases in real context exist (SESYD)
  - True-life documents and groundtruth are around the corner (EPEIRES)
  - Characterization tools have been proposed (SymbolRec)
- Perspectives
  - Continue to produce other databases, using existing platforms
  - Mapping is the key problem today, to achieve a performance evaluation in real context
26 Thanks
- All the referenced papers can be found in:
- [1] M. Delalandre, E. Valveny and J. Lladós. Performance Evaluation of Symbol Recognition and Spotting Systems: An Overview. Workshop on Document Analysis Systems (DAS), pp. 497-505, 2008.