Title: Sound Detection
1. Sound Detection
- Derek Hoiem
- Rahul Sukthankar (mentor)
- August 24, 2004
2. Objective
- Learn a model of a sound object from a few (10-20) examples and distinguish it from all other sounds
- Examples of sound classes
  - Gunshots, screams, laughter, car horns, meows, dog barks, etc.
3. Applications
- Tell me if you hear a gunshot. (monitoring)
- Get me video clips containing dogs barking. (search and retrieval)
- What's going on? (scene understanding)
4. Why it's difficult
- Sound classes have large variations
- Sounds are often ambiguous without context
- Overlaid noise obscures sound
5. Sound or not?
Which of these sounds do not belong to their named classes?
Car horn
Dog bark
Laser gun
6. Previous work
- Sound Classification (Wold 1996, Casey 2001, etc.)
  - Categorize short sound clips
  - Reasonable accuracy (5-20% error)
- Sound Detection (Dufaux 2000, Piamsa-nga 1999)
  - Localize and recognize sound objects in long clips
  - Poor performance or assumption of unrealistic conditions (e.g., very quiet background)
7. Detection via Windowed Search
[Diagram: Long Track → Clip Classifier → locations of detected sound object]
- Break the audio track into short, overlapping clips
- Independently classify each short clip as object or non-object
- Return the locations of the detected sound object
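The windowed search above can be sketched as a loop that slides a fixed-length window over the track and keeps every clip the classifier scores above a threshold. This is a minimal sketch, not the author's implementation: the window length, hop size, threshold, and the `windowed_search` and `peak_score` names are all hypothetical, since the slides do not specify them.

```python
from typing import Callable, List, Sequence, Tuple


def windowed_search(
    samples: Sequence[float],
    sample_rate: int,
    classify_clip: Callable[[Sequence[float]], float],
    window_s: float = 1.0,   # hypothetical clip length in seconds
    hop_s: float = 0.5,      # hypothetical hop; hop < window gives overlapping clips
    threshold: float = 0.0,
) -> List[Tuple[float, float, float]]:
    """Return (start_s, end_s, score) for every clip whose score exceeds threshold."""
    win = max(1, int(window_s * sample_rate))
    hop = max(1, int(hop_s * sample_rate))
    detections = []
    # Each clip is scored independently of its neighbours.
    for start in range(0, max(len(samples) - win, 0) + 1, hop):
        clip = samples[start:start + win]
        score = classify_clip(clip)
        if score > threshold:
            detections.append((start / sample_rate, (start + win) / sample_rate, score))
    return detections


def peak_score(clip: Sequence[float]) -> float:
    # Hypothetical stand-in for a trained clip classifier: peak amplitude minus 0.5.
    return max(abs(x) for x in clip) - 0.5


# Toy usage: a 4 s "track" at 10 samples/s with a loud burst from 2.0 s to 2.5 s.
track = [0.0] * 40
track[20:25] = [1.0] * 5
print(windowed_search(track, sample_rate=10, classify_clip=peak_score))
# [(1.5, 2.5, 0.5), (2.0, 3.0, 0.5)]
```

Because the clips overlap, one sound event is usually reported by several adjacent windows; merging those into a single detection is left out of this sketch.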
8. Representation
[Figure: raw representation of example meows and phone rings]
9. Classification Features
- Diverse feature set
  - Different sound classes are distinctive in different ways
- Means and standard deviations of power at different frequencies
- Bandwidth, peaks, loudness, etc.
- 138 features in all
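One family of features listed above, the means and standard deviations of power at different frequencies, can be sketched as follows: split a clip into frames, measure the power in a few frequency bands per frame, then summarize each band over time. This covers only that one family, not the full 138-feature set. The frame length, number of bands, equal-width bands, and the function names are assumptions, and a naive DFT is used only so the sketch needs nothing beyond the standard library.

```python
import cmath
import math
from statistics import mean, pstdev
from typing import List, Sequence


def band_powers(frame: Sequence[float], n_bands: int) -> List[float]:
    """Power in n_bands equal-width frequency bands of one frame (naive O(n^2) DFT)."""
    n = len(frame)
    half = n // 2  # keep non-negative frequency bins 0 .. n/2 - 1
    power = []
    for k in range(half):
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(abs(coef) ** 2)
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def band_power_features(clip: Sequence[float], frame_len: int = 64, n_bands: int = 4) -> List[float]:
    """[mean_0, std_0, mean_1, std_1, ...]: per-band power summarized over frames."""
    frames = [clip[i:i + frame_len] for i in range(0, len(clip) - frame_len + 1, frame_len)]
    per_frame = [band_powers(f, n_bands) for f in frames]
    features: List[float] = []
    for b in range(n_bands):
        series = [p[b] for p in per_frame]
        features += [mean(series), pstdev(series)]
    return features


# A steady low-frequency tone: all power lands in band 0 and barely varies over time.
tone = [math.sin(2 * math.pi * 2 * t / 64) for t in range(128)]
print([round(v, 3) for v in band_power_features(tone)])
```

A production system would use an FFT library and perceptually spaced bands; the point here is only the structure "per-band power, then mean and standard deviation across frames".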
10. Classification by Decision Trees
- Try to find simple rules that discriminate object from non-object
- Each decision is based on a threshold of a feature value
- Assign a confidence at each leaf node based on the likelihood of the data for the object and non-object classes
[Figure: example decision tree showing decision nodes and leaf nodes]
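Evaluating such a tree means following threshold tests from the root down to a leaf and returning that leaf's confidence. A minimal sketch, assuming the leaf confidence is half the log ratio of (weighted) object to non-object training mass at that leaf, as in confidence-rated ("Real") AdaBoost; the slides only say the confidence is based on the class likelihoods, so this exact formula, the smoothing term `eps`, and the names `Node`, `leaf_confidence`, and `evaluate` are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Decision node if it has children, otherwise a leaf."""
    feature: int = 0              # feature index tested at a decision node
    threshold: float = 0.0        # go left if features[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    confidence: float = 0.0       # used only at leaves

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def leaf_confidence(w_object: float, w_nonobject: float, eps: float = 1e-6) -> float:
    """Half log ratio of object vs non-object mass; positive favors 'object'.
    eps avoids log(0) at pure leaves."""
    return 0.5 * math.log((w_object + eps) / (w_nonobject + eps))


def evaluate(node: Node, features: Sequence[float]) -> float:
    """Follow threshold decisions down to a leaf and return its confidence."""
    while not node.is_leaf():
        node = node.left if features[node.feature] <= node.threshold else node.right
    return node.confidence


# Hypothetical one-split tree on feature 0 (say, a normalized amplitude range):
# the low side held 1 object / 9 non-object training clips, the high side 8 / 2.
tree = Node(
    feature=0,
    threshold=0.5,
    left=Node(confidence=leaf_confidence(1, 9)),
    right=Node(confidence=leaf_confidence(8, 2)),
)
print(evaluate(tree, [0.9]) > 0, evaluate(tree, [0.1]) > 0)  # True False
```

Returning a signed, graded confidence rather than a hard label is what lets several trees be combined by simple addition.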
11. Boosted Trees
- Problem: one decision tree by itself may not be a great classifier
- Solution: use several trees, each one focusing on the mistakes of previously learned trees
- AdaBoost
  - Weight training data uniformly
  - Learn a decision tree classifier on the weighted data
  - Re-weight data, giving more weight to incorrectly classified examples
  - Final classification based on a linear combination of confidences from all learned decision trees
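The boosting loop above can be sketched with depth-1 trees (decision stumps) and confidence-rated ("Real") AdaBoost, where each weak learner outputs a signed confidence and examples are re-weighted by `exp(-y * confidence)`. The slides use deeper trees and do not state which AdaBoost variant was used, so the stumps, the split criterion `Z`, the smoothing `eps`, and all function names here are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

# A stump is a depth-1 decision tree: (feature, threshold, conf_if_le, conf_if_gt).
Stump = Tuple[int, float, float, float]


def _conf(w_pos: float, w_neg: float, eps: float = 1e-6) -> float:
    # Half log ratio of weighted object vs non-object mass on one side of a split.
    return 0.5 * math.log((w_pos + eps) / (w_neg + eps))


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int], w: Sequence[float]) -> Stump:
    """Pick the split minimizing Z = 2 * sum over leaves of sqrt(W+ * W-)."""
    best, best_z = None, float("inf")
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            le, gt = [0.0, 0.0], [0.0, 0.0]  # [non-object mass, object mass]
            for xi, yi, wi in zip(X, y, w):
                side = le if xi[f] <= thr else gt
                side[1 if yi > 0 else 0] += wi
            z = 2 * (math.sqrt(le[0] * le[1]) + math.sqrt(gt[0] * gt[1]))
            if z < best_z:
                best_z = z
                best = (f, thr, _conf(le[1], le[0]), _conf(gt[1], gt[0]))
    return best


def stump_confidence(s: Stump, x: Sequence[float]) -> float:
    f, thr, conf_le, conf_gt = s
    return conf_le if x[f] <= thr else conf_gt


def adaboost(X: Sequence[Sequence[float]], y: Sequence[int], rounds: int = 10) -> List[Stump]:
    n = len(X)
    w = [1.0 / n] * n  # weight training data uniformly
    model = []
    for _ in range(rounds):
        s = fit_stump(X, y, w)  # learn a weak tree on the weighted data
        model.append(s)
        # Re-weight: weight grows when the stump's signed confidence disagrees with the label.
        w = [wi * math.exp(-yi * stump_confidence(s, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model: List[Stump], x: Sequence[float]) -> float:
    # Linear combination (here, a plain sum) of the confidences of all stumps.
    return sum(stump_confidence(s, x) for s in model)


# Toy data: objects sit in the middle, so no single threshold separates them.
X = [[0.1], [0.2], [0.4], [0.5], [0.6], [0.8], [0.9]]
y = [-1, -1, 1, 1, 1, -1, -1]
model = adaboost(X, y)
print([predict(model, x) > 0 for x in X])
# [False, False, True, True, True, False, False]
```

A single stump misclassifies part of this toy set, but the second round concentrates on those mistakes and the summed confidences separate the classes.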
12. Examples of Decision Trees
[Figure: learned decision trees for Meow and Gunshot. Decision-node labels include "Low percentage of power in low frequencies in mid-time of sound", "High power amplitude range", and "Very high power amplitude range". A second Gunshot tree is more complex and focuses on examples misclassified by the tree above.]
13. Cascade of Classifiers
- Goal: eliminate false positives with few false negatives in early stages
- Advantages
  - Allows use of a large set of negative training examples
  - Improves classification speed
- Dangers: cannot recover from false negatives
[Diagram: Sound Clip → Stage 1 → Stage 2 → Stage 3 → Pass. A clip that fails any stage is rejected (Fail). The Pass arrows are annotated (5), (2), and (0.005).]
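The cascade logic reduces to an early-exit loop: a clip must pass every stage in order, and the first failure rejects it. This is a minimal sketch; the stage scoring functions, thresholds, and the `cascade_classify` name are hypothetical, and the pass-rate annotations from the diagram are not modeled.

```python
from typing import Callable, List, Sequence, Tuple

# Each stage is (score function, threshold); a clip passes a stage if score > threshold.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_classify(stages: List[Stage], features: Sequence[float]) -> Tuple[bool, int]:
    """Return (accepted, stages_evaluated).

    Evaluation stops at the first failing stage, so most non-object clips cost
    only the early stages, and a rejected object clip (a false negative) is
    never seen by later stages."""
    for i, (score, threshold) in enumerate(stages, start=1):
        if score(features) <= threshold:
            return False, i
    return True, len(stages)


# Hypothetical three-stage cascade over a 2-feature vector.
stages: List[Stage] = [
    (lambda f: f[0], 0.2),         # stage 1: cheap, permissive test
    (lambda f: f[1], 0.5),         # stage 2: stricter test
    (lambda f: f[0] + f[1], 1.5),  # stage 3: strictest test
]
print(cascade_classify(stages, [0.1, 0.9]))  # (False, 1)
print(cascade_classify(stages, [0.9, 0.3]))  # (False, 2)
print(cascade_classify(stages, [0.9, 0.8]))  # (True, 3)
```

Returning the number of stages evaluated makes the speed advantage visible: a clip rejected at stage 1 never pays for the later, more expensive stages.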
14. Results: Classification Error
Classification error by sound class after 1, 2, and 3 cascade stages (pos = error on positive clips, neg = error on negative clips)

Sound class    1 stage       2 stages      3 stages
               pos   neg     pos   neg     pos   neg
meow           0.0   1.4     0.0   1.2     2.2   0.8
phone          0.0   0.4     4.3   0.1     5.9   0.0
car horn       0.0   3.9     0.6   2.2     3.6   1.3
door bell      1.4   2.1     2.1   0.4     6.3   0.1
swords         6.1   1.3     6.7   0.1     6.7   0.0
scream         0.3   5.5     2.7   1.4     5.3   1.1
dog bark       0.7   1.0     6.0   0.3     7.7   0.2
laser gun      0.0   6.8     4.4   5.1     6.7   0.9
explosion      4.1   5.2     7.5   1.5     12.0  0.5
light saber    4.8   6.8     9.7   1.0     13.9  0.2
gunshot        8.1   6.1     12.5  2.3     14.5  1.1
close door     7.9   7.8     14.5  4.8     17.6  2.3
male laugh     4.3   14.7    9.5   9.7     13.3  7.0
average        2.9   4.4     6.0   2.2     8.5   1.1
15. Results: ROC curves
Note: to approximate the negative error rate, divide FP by 25,000.
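The conversion in the note is a single division. A minimal sketch, assuming (as the note implies but does not state) that 25,000 is roughly the number of negative clips evaluated; the function name is hypothetical.

```python
def approx_negative_error_rate(false_positives: int, scale: int = 25_000) -> float:
    """Approximate negative error rate from a false-positive count (FP / 25,000)."""
    return false_positives / scale


# Example: 250 false positives correspond to roughly a 1% negative error rate.
print(f"{approx_negative_error_rate(250):.2%}")  # 1.00%
```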
16. Results: Anecdotal
Gunshots
Female Laugh
Male Laugh
Swords
Scream