Augmenting the Generalized Hough Transform to Enable the Mining of Petroglyphs


1
Augmenting the Generalized Hough Transform to
Enable the Mining of Petroglyphs
  • Qiang Zhu, Xiaoyue Wang, Eamonn Keogh, ¹Sang-Hee
    Lee
  • Dept. of Computer Science & Eng., ¹Dept. of
    Anthropology
  • University of California, Riverside

2
Outline
  • Motivation
  • Approach
  • Evaluation
  • Conclusion

3
Motivation(1) -applications
  • Petroglyphs are one of the earliest expressions
    of abstract thinking.
  • They provide a rich source of information about:
  • -climate change
  • -the existence of certain species
  • -patterns of human migration and interaction

4
Motivation(2) -difficulties
  • Progress in petroglyph research has been
    frustratingly slow.
  • This is due to their extraordinarily diverse and
    complex structure.
  • Most matching algorithms cannot capture the
    similarity of petroglyphs.
  • Those that can, even in limited cases, do not
    scale to large collections.

5
Approach
  • How to preprocess the raw data?
  • How to define the distance measure?
  • How to speed up?

6
Preprocessing(1)
  • With rare exceptions, petroglyphs do not lend
    themselves to automatic extraction with
    segmentation algorithms.

[Figure: a segmentation algorithm may mistake the
border of this rock for the edge of the petroglyph]
7
PetroAnnotator
Load the raw image into our human computation tool
8
PetroAnnotator (cont.)
Draw an approximate boundary around the object, and
then trace the shape
9
Preprocessing(2) -downsampling
  • (A) Two overlaid skeleton traces (340 by 250) of
    the same image of a Bighorn sheep. Less than 3.5%
    of the pixels from each image overlap.
  • (B) The same two images after downsampling (30 by
    23). 75.6% of the pixels (denoted by black) are
    common to both.
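
The effect described on this slide can be reproduced with a small sketch. The slide does not say how the downsampling or the overlap percentage was computed, so everything here is an assumption: `downsample` marks a coarse cell as set if any source pixel falls in it, `overlap` is intersection over union, and "340 by 250" is read as width by height. The two vertical strokes are made-up data, not the Bighorn sheep traces.

```python
def downsample(points, src_shape, dst_shape):
    """Map (row, col) edge pixels onto a coarser grid.

    Assumed rule: a coarse cell is set if any source pixel lands in it.
    """
    src_rows, src_cols = src_shape
    dst_rows, dst_cols = dst_shape
    return {(r * dst_rows // src_rows, c * dst_cols // src_cols)
            for r, c in points}


def overlap(a, b):
    """Fraction of pixels common to both traces (intersection over union, assumed)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Two hypothetical traces of the same stroke, drawn two pixels apart.
trace_a = {(r, 103) for r in range(250)}
trace_b = {(r, 105) for r in range(250)}
full, small = (250, 340), (23, 30)  # (rows, cols)

print(overlap(trace_a, trace_b))
print(overlap(downsample(trace_a, full, small),
              downsample(trace_b, full, small)))
```

At full resolution the two strokes never share a pixel; after downsampling they fall into the same coarse cells, which is the tolerance to small tracing differences that the slide illustrates.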

10
Distance Measure -why GHT?
  • It makes essentially no assumptions about the data:
  • -open or closed boundaries
  • -connected or disconnected shapes
  • It correctly captures similarity:
  • -subjective similarity on unlabeled datasets and
    objective similarity on labeled datasets
  • Its distance can be tightly lower bounded:
  • -allowing for very efficient searches in large
    datasets

11
Classic GHT
  • GHT is a useful method for detecting arbitrary
    two-dimensional shapes.

12
(1) Find the star-pattern
13
(2) Superimpose & Accumulate
14
(3) Find the peak
[Figure: toy example with panels labeled Q, R, and A,
where A holds the accumulator counts]
15
A Basic Distance Measure
  • Classic GHT doesn't explicitly encode a
    similarity measure.
  • We can simply define a GHT-based distance:
  • minimal unmatched edge points (MUE)
  • = number of edge points in Q - maximal number of
    matched edge points
  • 4 - 3 = 1 (for our toy example)
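
The definition above can be sketched in a few lines. This is a minimal illustration, assuming a translation-only GHT in which each accumulator cell is a (row, col) offset; the function name `mue_distance` and the toy point sets are hypothetical and are not the shapes from the slide's figure.

```python
from collections import Counter


def mue_distance(query, candidate):
    """Minimal unmatched edge points (MUE) under translation.

    Every (query point, candidate point) pair votes for the offset that
    would map one onto the other; the accumulator peak is the largest
    number of query points matched by a single offset.
    """
    query, candidate = set(query), set(candidate)
    accumulator = Counter()
    for qr, qc in query:
        for cr, cc in candidate:
            accumulator[(cr - qr, cc - qc)] += 1
    peak = max(accumulator.values(), default=0)
    return len(query) - peak


# Hypothetical toy shapes: a 2x2 square and a shifted square missing a corner.
Q = {(0, 0), (0, 1), (1, 0), (1, 1)}
C = {(2, 2), (2, 3), (3, 2)}
print(mue_distance(Q, C))
```

Here Q has 4 edge points and the best offset matches 3 of them, so the distance is 4 - 3 = 1, the same arithmetic as in the slide's toy example.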

16
A New Cell Incrementation Strategy
  • When can we obtain the value of a particular cell
    in the accumulator?
  • In the classic GHT, not until all incrementation
    has finished.
  • Is it possible to obtain the values one cell at a
    time?
  • To do so, we need to check all positions that
    could increase the cell's value.
17
Lower Bound
[Figure: query Q and candidate C compared column by
column]
In column 1, Q needs 2 pixels in C, and C has 3
In column 2, Q needs 2 pixels in C, and C has 2
In column 3, Q needs 4 pixels in C, and C has
only 2
In column 4, Q needs 2 pixels in C, and C has 2
In column 5, Q needs 2 pixels in C, and C has 3
Minimal missed points: 2 (per column: 0, 0, 2, 0, 0)
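
The column-by-column comparison above can be sketched as follows. The per-column rule (a column of Q that needs more pixels than the aligned column of C can supply must miss the difference) comes from the slide; taking the minimum over all horizontal shifts is an assumption about how the bound is applied to a translation-only distance, and the function names are hypothetical.

```python
from collections import Counter


def missed_points(needed, available):
    """Per-column shortfall: pixels Q needs that the aligned column of C lacks."""
    return [max(0, n - a) for n, a in zip(needed, available)]


def lower_bound(query, candidate):
    """Lower bound on the number of unmatched query points.

    Both shapes are reduced to one-dimensional column-count signatures,
    and the smallest total shortfall over all horizontal shifts is returned.
    """
    q_sig = Counter(col for _, col in query)
    c_sig = Counter(col for _, col in candidate)
    if not q_sig or not c_sig:
        return len(query)
    lo = min(c_sig) - max(q_sig)
    hi = max(c_sig) - min(q_sig)
    return min(
        sum(max(0, need - c_sig.get(col + dx, 0)) for col, need in q_sig.items())
        for dx in range(lo, hi + 1)
    )


# Column counts read off the slide: Q needs 2, 2, 4, 2, 2 and C has 3, 2, 2, 2, 3.
print(missed_points([2, 2, 4, 2, 2], [3, 2, 2, 2, 3]))

# Hypothetical toy shapes: a 2x2 square and a shifted square missing a corner.
Q = {(0, 0), (0, 1), (1, 0), (1, 1)}
C = {(2, 2), (2, 3), (3, 2)}
print(lower_bound(Q, C))
```

Because a vertical shift cannot create pixels in a column that has too few, the bound never exceeds the number of query points left unmatched by the best translation.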
18
Time Complexity
  • Classic GHT
  • O(NQNCS2)
  • superimposes all query vectors on all edge points
    in the candidate image
  • Lower bound GHT
  • O(S^2)
  • compares one-dimensional signatures
  • further reduced by early abandoning and shift
    ordering
  • one to two orders of magnitude speed-up
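
A sketch of how early abandoning and shift ordering might fit into the signature comparison. The slide names both ideas without detail, so the specifics are assumptions: shifts are tried nearest to zero first, and a shift is abandoned as soon as its running shortfall reaches the best value known so far. Function names are hypothetical.

```python
from collections import Counter


def shift_order(lo, hi):
    """Shifts in [lo, hi], nearest to zero first (an assumed ordering)."""
    return sorted(range(lo, hi + 1), key=abs)


def lower_bound_early_abandon(query, candidate, best_so_far):
    """Return min(best_so_far, column-signature lower bound).

    A result equal to best_so_far means the candidate cannot beat the
    current best match, so its exact distance need not be computed.
    """
    q_sig = Counter(col for _, col in query)
    c_sig = Counter(col for _, col in candidate)
    if not q_sig or not c_sig:
        return min(len(query), best_so_far)
    best = best_so_far
    for dx in shift_order(min(c_sig) - max(q_sig), max(c_sig) - min(q_sig)):
        missed = 0
        for col, need in q_sig.items():
            missed += max(0, need - c_sig.get(col + dx, 0))
            if missed >= best:  # early abandon: this shift cannot improve
                break
        else:
            best = missed  # the shift finished below the current best
    return best


Q = {(0, 0), (0, 1), (1, 0), (1, 1)}
C = {(2, 2), (2, 3), (3, 2)}
print(lower_bound_early_abandon(Q, C, best_so_far=10))
```

Trying promising shifts first lowers `best` quickly, so later shifts are abandoned after fewer columns.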

19
Variants on the Basic Distance Measure
  • Query-by-Content
  • Clustering
  • Finding Motifs

20
Evaluation
  • We performed three sets of experiments
  • Evaluation of Utility
  • -on unlabeled data
  • Evaluation of Accuracy
  • -on labeled data
  • Evaluation of Scalability
  • -on synthetic data

21
Evaluation of Utility (1)
  1. Our GHT-based distance measure correctly groups
    all seven pairs
  2. The higher level structure of the dendrogram also
    correctly groups similar petroglyphs

[Figure: a clustering of typical Southwestern USA
petroglyphs, with groups labeled Atlatls,
Anthropomorphs, and Bighorn Sheep]
22
Evaluation of Utility (2)
24
Evaluation of Utility (3)
  • Can our distance measure find meaningful
    motifs?
  • 2,852 real petroglyphs
  • 4,065,526 possible pairs
  • 52 top motifs (0.00128%) by motif cutoff
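
The counts on this slide can be checked directly: the number of unordered pairs among 2,852 objects, and the share of those pairs that the 52 top motifs represent. The variable names are only illustrative.

```python
import math

petroglyphs = 2852
pairs = math.comb(petroglyphs, 2)  # unordered pairs: n * (n - 1) / 2
top_motifs = 52
percent = 100 * top_motifs / pairs

print(pairs)
print(round(percent, 5))
```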

[Figure: motif cutoff]
25
Evaluation of Accuracy -datasets
  • NicIcon dataset
  • -24,441 images
  • -14 categories
  • -33 volunteers
  • -234 × 234 pixels
  • -writer-dependent (WD) and writer-independent
    (WI) tests
  • Farsi digits dataset
  • -from 11,942 registration forms
  • -60,000 digits for training
  • -20,000 digits for testing
  • -54 × 64 pixels (largest MBR)

26
(1) Test the Downsampling Size
[Figure: error rate (%) versus resolution (R × R) of
downsampled images; one panel for NicIcon (WD and WI
curves) and one for Farsi]
In both datasets, the error rate of the
one-nearest-neighbor test varies little once the
resolution is greater than 10 × 10.
27
(2) Competitive accuracy
  • NicIcon dataset
  • -Error rate: 4.78% for WD, 8.46% for WI
  • -The dataset creators tested on the online data
    using three classifiers.
  • -Only one of them (DTWB) is more accurate, but it
    is slower.
  • Farsi digits dataset
  • -Error rate: 4.54%
  • -Borji et al. performed extensive empirical tests
    on this dataset.
  • -Of the twenty reported error rates, the mean was
    8.69%.
  • -Only four beat our approach, but they need at
    least six parameters to be set.

28
Evaluation of Scalability -datasets
  • We made 8 synthetic petroglyph datasets
  • Based on 22 classic petroglyphs
  • Duplicated by 10 volunteers on a tablet
  • Applied a Random Polynomial Transformation
  • Containing up to 1,280,000 objects

29
(1) Querying by Content
  • Leave-one-out one-nearest-neighbor test.
  • Repeated the test 10 times on each dataset.
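
For readers unfamiliar with the protocol, a leave-one-out one-nearest-neighbor test can be sketched as below. This is a generic illustration, not the authors' code: `leave_one_out_error` is a hypothetical name, and the size of the symmetric difference between point sets stands in for the GHT-based distance.

```python
def leave_one_out_error(items, labels, distance):
    """Classify each item by its nearest neighbor among all the others."""
    errors = 0
    for i, item in enumerate(items):
        nearest = min((j for j in range(len(items)) if j != i),
                      key=lambda j: distance(item, items[j]))
        errors += labels[nearest] != labels[i]
    return errors / len(items)


def set_distance(a, b):
    """Stand-in distance: number of points in exactly one of the two sets."""
    return len(a ^ b)


# Hypothetical data: two tight groups of point sets.
shapes = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]
print(leave_one_out_error(shapes, ["a", "a", "b", "b"], set_distance))
```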

30
(2) Finding Motifs
  • A brute force algorithm requires time quadratic
    in the size of dataset.
  • By using the triangle inequality of our
    distance measure, we only need to calculate a
    tiny fraction of the exact distances.
  • Even for the smallest dataset:
  • -our algorithm is 712 times faster
  • -we can prune 99.84% of the calculations
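
The slide does not describe the motif algorithm itself, so the following is only a sketch of the general idea of triangle-inequality pruning. It assumes a single reference object and uses the symmetric-difference size of point sets as a stand-in metric instead of the GHT-based distance; all names are hypothetical.

```python
def closest_pair(items, distance):
    """Find the closest pair, skipping pairs that a cheap bound rules out.

    For any metric, |d(a, ref) - d(b, ref)| <= d(a, b), so a pair whose
    bound already reaches the best distance found needs no exact computation.
    """
    ref = items[0]
    to_ref = [distance(x, ref) for x in items]
    best, best_pair, exact_calls = float("inf"), None, 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if abs(to_ref[i] - to_ref[j]) >= best:
                continue  # pruned by the triangle inequality
            exact_calls += 1
            d = distance(items[i], items[j])
            if d < best:
                best, best_pair = d, (i, j)
    return best_pair, best, exact_calls


def set_distance(a, b):
    """Stand-in metric: number of points in exactly one of the two sets."""
    return len(a ^ b)


shapes = [frozenset({1, 2, 3}), frozenset({1, 2, 4}),
          frozenset({7, 8, 9}), frozenset({1, 2, 3, 5})]
pair, dist, calls = closest_pair(shapes, set_distance)
print(pair, dist, calls)
```

On this toy input only some of the 6 possible pairs need an exact distance; on large collections the slide reports that nearly all of them are pruned.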

31
Conclusion
  • In this work we considered, for the first time,
    the problem of mining large collections of rock
    art.
  • Introduced a novel distance measure
  • Found an efficiently computable tight lower bound
    to this measure
  • Enabled mining large data archives effectively

32
All datasets and the code can be downloaded
from http://www.cs.ucr.edu/~qzhu/petro.html
Thanks for listening!