Classifying Motion Picture Audio

1
Classifying Motion Picture Audio
  • Eirik Gustavsen
  • 07.06.07

2
Outline
  • Motivation
  • Thesis
  • State of the Art
  • Proposed system
  • Experimental setup
  • Results
  • Future work
  • Conclusion

3
Motivation
  • Most existing projects classify either clean audio
    classes or classes with added noise
  • Few clear boundaries in motion picture audio
  • Subjective descriptions of movies
  • Difficult to compare movie content

4
Thesis
  • It is possible to automatically create a table
    of contents of a motion picture, based on its
    audio track only.

5
Research questions
  • Find best LLDs to classify motion picture audio
  • Detect boundaries between audio classes within
    complex audio segments
  • Automatically create a TOC based on the audio
    track only

6
Pre-Processing
  • 44100 Hz sample rate
  • Mono
  • 16 bits
  • 30 ms windows (LW)
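
The windowing step above can be sketched as follows. This is a minimal illustration, not the presenter's code: the function names are hypothetical, and the choice of non-overlapping windows is an assumption, since the slide does not say whether the 30 ms windows overlap.

```python
def pcm16_to_float(samples):
    """Scale signed 16-bit integer samples to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def frame_signal(samples, sample_rate=44100, window_ms=30):
    """Split a mono signal into consecutive windows.

    Assumption: windows do not overlap, and a trailing partial
    window is dropped.
    """
    window_len = int(sample_rate * window_ms / 1000)  # 1323 samples at 44100 Hz
    n_frames = len(samples) // window_len
    return [samples[i * window_len:(i + 1) * window_len] for i in range(n_frames)]
```

With these settings, one second of audio yields 33 full windows of 1323 samples each.
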
7
Low Level Descriptors
  • Time domain
  • Frequency domain

8
Low Level Descriptors
  • Total of 23 low level descriptors
  • TIME DOMAIN
      • Audio Power
      • Audio Wave Form
      • Root-Mean Square
      • Short Time Energy
      • Low Short Time Energy Ratio
      • Zero-Crossing Rate
      • High Zero-Crossing Rate Ratio
  • FREQUENCY DOMAIN
      • Audio Spectrum Centroid
      • Fundamental Frequency
      • 10 Mel-Frequency Cepstral Coefficients
      • Spectrum Flux
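
A few of the descriptors listed above can be computed per window roughly as follows. This is a sketch using common textbook definitions, which may differ from the exact definitions used in the presentation. In particular, the centroid here is the plain linear-frequency spectral centroid computed with a naive DFT, chosen for readability rather than speed. All function names are hypothetical.

```python
import cmath
import math


def short_time_energy(frame):
    """Mean squared amplitude of the window."""
    return sum(x * x for x in frame) / len(frame)


def root_mean_square(frame):
    """Square root of the mean squared amplitude."""
    return math.sqrt(short_time_energy(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid_hz(frame, sample_rate):
    """Magnitude-weighted mean frequency in Hz (naive O(n^2) DFT)."""
    n = len(frame)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0
```

For real audio, an FFT routine would replace the inner loop; the quadratic version is only practical for short windows.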

9
Dimensionality reduction
  • Principal components analysis (PCA) is a
    technique used to reduce multidimensional data
    sets to lower dimensions for analysis.

f(1) f(2) f(3) f(4) f(5) ... f(23)  →  PCA  →  d(1) d(2) d(3)
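
The reduction from 23 features f(1)..f(23) to 3 dimensions d(1)..d(3) can be illustrated with a small pure-Python PCA. This is a sketch under stated assumptions: it finds the leading eigenvectors of the covariance matrix by power iteration with re-orthogonalization, a technique picked here to keep the example dependency-free; a linear algebra library would normally be used instead. The function names, the iteration count and the 1e-12 threshold are all illustrative.

```python
import math
import random


def _orthonormalize(vec, basis):
    """Remove the components of vec along basis, then scale to unit length."""
    for b in basis:
        dot = sum(v * w for v, w in zip(vec, b))
        vec = [v - dot * w for v, w in zip(vec, b)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 1e-12 else None


def pca_project(rows, n_components=3, iterations=200, seed=0):
    """Project feature vectors onto their top principal components.

    Assumes n_components <= number of features and at least two rows.
    """
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        vec = _orthonormalize([rng.uniform(-1, 1) for _ in range(dim)], components)
        for _ in range(iterations):
            nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
            nxt = _orthonormalize(nxt, components)
            if nxt is None:
                break  # no variance left outside the components found so far
            vec = nxt
        components.append(vec)
    return [[sum(c[j] * comp[j] for j in range(dim)) for comp in components]
            for c in centered]
```

Called as `pca_project(feature_rows, n_components=3)`, it returns one 3-element vector per input window. The sign of each component is arbitrary, as with any PCA.
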
10
K Nearest Neighbors
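
A k-nearest-neighbors classifier labels a new feature vector by majority vote among the k closest labeled training vectors. The slide gives no parameters, so the sketch below assumes Euclidean distance and k = 3; the function name and the example labels are illustrative only.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Example with made-up 3-dimensional vectors, such as PCA outputs.
points = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]]
labels = ["speech", "speech", "music", "music"]
predicted = knn_classify(points, labels, [0.2, 0.0, 0.0])
```
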
11
Proposed system
Pre-processing → LLD → Norm → PCA → KNN → Post-processing → TOC generation
12
Classifying Audio
  • Music
  • Speech
  • Mixed audio classes
  • Noise (white)
  • Silence
13
Class Boundary Detection
14
Class Boundary Detection
15
Class Boundary Detection
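
The transcript preserves only the titles of the boundary-detection slides. One simple way to derive boundaries, not necessarily the method used in the presentation, is to merge runs of identical per-window labels into segments and treat each change of label as a boundary. The function name and the 30 ms default are assumptions taken from the pre-processing slide.

```python
def label_segments(window_labels, window_ms=30):
    """Merge runs of identical labels into (start_ms, end_ms, label) tuples."""
    segments = []
    start = 0
    for i in range(1, len(window_labels) + 1):
        # Close the current run at the end of input or when the label changes.
        if i == len(window_labels) or window_labels[i] != window_labels[start]:
            segments.append((start * window_ms, i * window_ms, window_labels[start]))
            start = i
    return segments


segments = label_segments(["speech"] * 3 + ["music"] * 2)
# segments == [(0, 90, "speech"), (90, 150, "music")]
```

Each segment's start time is a candidate entry for a table of contents; a real system would likely smooth isolated misclassified windows first.
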
16
Finding most suitable LLDs
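  • Most suitable: ASC (Audio Spectrum Centroid), AWF (Audio Wave Form),
    RMS (Root-Mean Square), HZCRR (High Zero-Crossing Rate Ratio)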
17
Sample Results
  • Music with low volume
  • Clear speech
  • Speech with background music
  • Speech with background environmental sounds
  • Jingle
  • Some mistakes
  • Fading between music and speech
18
Future Work
  • To be done in this thesis
      • Post-processing
      • TOC
  • Open research questions for future work
      • New motion picture audio classes
      • Detecting sound objects
      • Speech recognition

19
Conclusion
  • Pre-processing makes it possible to classify
    motion picture audio correctly
  • Using the right combination of LLDs improves the
    classification result

20
Questions?