Audio databases

Slides: 21
Provided by: carl116

1
Audio databases
2
Overview
  • Basics of audio data
  • Metadata
  • Segmentation
  • Features
  • Indexing
  • Retrieval

3
Audio basics
  • Sound: vibrations in air (or another medium)
  • Temporal signal (i.e. time-based)
  • Various frequencies
  • Humans hear roughly 20 Hz to 20 kHz
  • Amplitude / wavelength

4
Metadata
  • Use data about the audio data (metadata)
  • Identify segments with associated metadata
  • Objects / individuals in segment
  • Their properties
  • Activities
  • Dependent on application
  • Manual identification

5
Searching with metadata
  • Use searches on metadata to find segments that
    match the query
  • E.g. find segments with a particular object,
    such as segments with a violin playing
  • Organize metadata using standard indexing
    techniques to support efficient retrieval
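
A minimal sketch of metadata search, assuming each segment's metadata is just a set of tags. The file names, tags and function names here are hypothetical; an inverted index stands in for the "standard indexing techniques" mentioned above.

```python
from collections import defaultdict

# Hypothetical metadata records: (audio_id, segment_id, set of tags).
segments = [
    ("concert.wav", 0, {"violin", "applause"}),
    ("concert.wav", 1, {"piano"}),
    ("lesson.wav", 0, {"violin", "speech"}),
]

def build_index(records):
    """Map each metadata tag to the (audio_id, segment_id) pairs carrying it."""
    index = defaultdict(set)
    for audio_id, segment_id, tags in records:
        for tag in tags:
            index[tag].add((audio_id, segment_id))
    return index

def find_segments(index, *tags):
    """Return the segments whose metadata contains every requested tag."""
    if not tags:
        return set()
    result = set(index.get(tags[0], set()))
    for tag in tags[1:]:
        result &= index.get(tag, set())
    return result

index = build_index(segments)
violin_segments = sorted(find_segments(index, "violin"))
```

Here `violin_segments` holds the segments tagged with a violin, found without scanning the audio itself.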

6
Segmentation
  • Automatic segmentation is based on signal
    processing
  • Look at properties of the signal to identify
    short segments that have homogeneity
  • E.g. steady wavelength
  • This may result in a large number of short
    segments
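
One way to picture this, as a sketch only: cut the signal into fixed-size frames and start a new segment whenever a frame's zero-crossing count moves away from the current segment's. The zero-crossing count is an assumed stand-in for "steady wavelength"; the frame size and tolerance are arbitrary illustrative values.

```python
import math

def zero_crossings(frame):
    """Count sign changes, a rough proxy for the dominant frequency."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

def segment(signal, frame_size=100, tolerance=2):
    """Group consecutive frames whose zero-crossing counts stay within tolerance.

    Returns a list of (start_sample, end_sample) pairs, end exclusive.
    """
    segments = []
    start = 0
    reference = None  # zero-crossing count of the first frame in the segment
    for pos in range(0, len(signal) - frame_size + 1, frame_size):
        count = zero_crossings(signal[pos:pos + frame_size])
        if reference is None:
            reference = count
        elif abs(count - reference) > tolerance:
            segments.append((start, pos))
            start, reference = pos, count
    if reference is not None:
        segments.append((start, len(signal)))
    return segments

# Synthetic signal: 400 samples at 50 Hz, then 400 samples at 200 Hz.
rate = 1000
low = [math.sin(2 * math.pi * 50 * (n + 0.25) / rate) for n in range(400)]
high = [math.sin(2 * math.pi * 200 * (n + 0.25) / rate) for n in range(400)]
boundaries = segment(low + high)
```

On this synthetic input the boundary falls where the frequency changes. Real audio would give many more, much shorter segments.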

7
Features
  • For each segment, extract a set of features that
    represent the content of that segment
  • Intensity, loudness, pitch, etc.
  • These form a k-dimensional vector, where k is the
    number of features extracted.
  • Add in identifiers for the audio and segment to
    give a complete description of the features and
    where they are found.
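
A sketch of such a record, assuming three simple features: RMS level for intensity, peak amplitude, and zero-crossing rate as a crude pitch proxy. These are illustrative choices, not the features the slides prescribe, and `SegmentFeatures` and `describe` are hypothetical names.

```python
import math
from typing import NamedTuple, Tuple

class SegmentFeatures(NamedTuple):
    audio_id: str               # which audio file the segment came from
    segment_id: int             # which segment within that file
    vector: Tuple[float, ...]   # the k-dimensional feature vector

def rms(samples):
    """Root-mean-square level, used here as a measure of intensity."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs that change sign (crude pitch proxy)."""
    if len(samples) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return changes / (len(samples) - 1)

def describe(audio_id, segment_id, samples):
    """Build the feature record for one segment (k = 3 in this sketch)."""
    vector = (rms(samples), max(abs(s) for s in samples), zero_crossing_rate(samples))
    return SegmentFeatures(audio_id, segment_id, vector)

record = describe("concert.wav", 7, [1.0, -1.0, 1.0, -1.0])
```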

8
Indexing
  • For efficient retrieval, a suitable index
    structure is needed
  • R-tree is possible, but with high dimensions it
    becomes less satisfactory
  • Each region covers a large hypercube
  • Overlapping hypercubes introduce inefficiencies

9
TV-tree
  • Telescopic vector tree (TV-tree) is a variation
    on the R-tree that performs better with higher
    dimensions
  • It is a tree structure that stores the vectors
    in the leaf nodes
  • Each node has a maximum number of children and is
    at least half full (except leaves and root node)

10
TV-tree
  • Some variations use spheres instead of rectangles
    for the regions
  • Hyperspheres instead of hypercubes.
  • For these, each node has a central point, a
    radius and a set of active dimensions.

11
Active dimensions
  • The active dimensions in a node are those
    components of the vectors that differentiate
    between the values.
  • The tree will allow a maximum number of active
    dimensions per node
  • This technique keeps down the number of
    dimensions to consider

12
Example
  • Consider three vectors of dimension 6
  • (2, 4, 67, 32, 9, 1)
  • (2, 4, 88, 35, 17, 1)
  • (2, 4, 5, 60, 3, 1)
  • Components in dimensions 1, 2 and 6 are all the
    same, so only dimensions 3, 4 and 5 differentiate
    between the values
  • Active dimensions are 3, 4 and 5.
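
The example can be checked with a short sketch. It follows the simplified definition used on these slides (a dimension is active if the vectors differ in it). The optional `max_active` cap, which keeps the dimensions with the widest spread of values, is an assumption about how "most differentiating" might be measured.

```python
def active_dimensions(vectors, max_active=None):
    """Return the 1-based dimensions in which the vectors differ.

    If max_active is given, keep only that many dimensions, preferring
    those with the largest spread (max minus min) of values.
    """
    spreads = {}
    for d in range(len(vectors[0])):
        values = [v[d] for v in vectors]
        spread = max(values) - min(values)
        if spread > 0:
            spreads[d + 1] = spread
    ranked = sorted(spreads, key=spreads.get, reverse=True)
    if max_active is not None:
        ranked = ranked[:max_active]
    return sorted(ranked)

vectors = [
    (2, 4, 67, 32, 9, 1),
    (2, 4, 88, 35, 17, 1),
    (2, 4, 5, 60, 3, 1),
]
active = active_dimensions(vectors)
capped = active_dimensions(vectors, max_active=2)
```

With no cap, `active` is `[3, 4, 5]`, matching the example. With a cap of two, dimension 5 is dropped because its values vary least.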

13
Adding into a TV-tree
  • Start with the first vector in one node (both
    leaf and root)
  • The second vector is added to the same node
  • Continue until the node is full, then split it
    into two leaf nodes, with half of the values in
    one leaf and the rest in the other
  • Create a new root to cover these two leaf nodes
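
A sketch of this first stage only, where the single node is both leaf and root. The capacity of 4 and the split rule (cut along the dimension with the widest spread of values) are assumptions, since the slides do not say how the two halves are chosen.

```python
CAPACITY = 4  # assumed maximum number of vectors per leaf

def split(vectors):
    """Split into two halves along the dimension with the widest spread."""
    dims = range(len(vectors[0]))
    widest = max(
        dims,
        key=lambda d: max(v[d] for v in vectors) - min(v[d] for v in vectors),
    )
    ordered = sorted(vectors, key=lambda v: v[widest])
    mid = (len(ordered) + 1) // 2
    return ordered[:mid], ordered[mid:]

def add_vector(leaf, vector):
    """Add a vector to a leaf; return the leaf, or a new root if it split."""
    vectors = leaf["vectors"] + [vector]
    if len(vectors) <= CAPACITY:
        return {"vectors": vectors}
    left, right = split(vectors)
    return {"children": [{"vectors": left}, {"vectors": right}]}

tree = {"vectors": []}
for v in [(1, 1), (2, 1), (9, 8), (10, 9)]:
    tree = add_vector(tree, v)
full_leaf = tree                 # still a single node, now full
tree = add_vector(tree, (2, 2))  # overflow: split and create a new root
```

After the fifth vector, `tree` is a root with two leaf children. The region descriptions (centre, radius, active dimensions) are left out to keep the sketch short.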

14
Adding further vectors
  • Further vectors are compared with the root
    entries to see which child node to go down into.
  • Pick a child node which minimises the change to
    the radius of the child.
  • Continue down the tree to find a leaf node to
    hold the value.
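
The child-selection step might look like the sketch below, assuming spherical regions with a fixed centre. A real implementation would also compare only the active dimensions and might move the centre; both are left out here, and the dictionary layout is hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def radius_increase(child, vector):
    """How far the child's sphere must grow to contain the vector."""
    return max(0.0, distance(child["centre"], vector) - child["radius"])

def choose_child(children, vector):
    """Pick the child whose radius changes least when the vector is added."""
    return min(children, key=lambda c: radius_increase(c, vector))

children = [
    {"name": "A", "centre": (0.0, 0.0), "radius": 1.0},
    {"name": "B", "centre": (10.0, 0.0), "radius": 2.0},
]
chosen = choose_child(children, (7.0, 0.0))
```

Here `chosen` is child B: its radius would grow by 1, whereas A's would grow by 6. Repeating the choice at each level leads down to a leaf.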

15
Insertion and active dimensions
  • When a node receives a new vector, you might
    alter the active dimensions
  • Look for the components that most differentiate
    the vectors up to a maximum specified for the
    tree.

16
Splitting a node and active dimensions
  • When a node splits, the active dimensions from
    the original may be used for the two new nodes,
    but there may be changes
  • Some active dimensions may be removed if the
    vectors left in the node have similar values in
    certain components
  • Some new active dimensions may be added if the
    inserted vector differs considerably on other
    components.

17
Retrieval in a TV-tree
  • A query has to find the segments that match a
    given feature vector
  • Exact matching follows the tree structure from
    the root to the leaf node
  • Nearest-neighbour matching works down the tree
    looking for the n closest matches, where n is a
    specified number
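
A sketch of nearest-neighbour search over spherical regions, assuming a hypothetical dictionary layout for nodes. A child is skipped when even the nearest point of its sphere is no closer than the n matches already found. Active dimensions are ignored to keep the sketch short.

```python
import heapq
import math

def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_distance(node, query):
    """Smallest possible distance from the query to anything in the sphere."""
    return max(0.0, distance(node["centre"], query) - node["radius"])

def nearest(root, query, n):
    """Return the n closest stored vectors as (distance, vector), nearest first."""
    best = []  # max-heap of (-distance, vector), holding at most n entries

    def visit(node):
        if "vectors" in node:  # leaf node: compare against stored vectors
            for v in node["vectors"]:
                d = distance(v, query)
                if len(best) < n:
                    heapq.heappush(best, (-d, v))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, v))
            return
        # Internal node: try the most promising children first.
        for child in sorted(node["children"], key=lambda c: min_distance(c, query)):
            if len(best) == n and min_distance(child, query) >= -best[0][0]:
                continue  # this region cannot contain a closer match
            visit(child)

    visit(root)
    return sorted((-neg_d, v) for neg_d, v in best)

leaf_a = {"centre": (1.0, 1.0), "radius": 1.5,
          "vectors": [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)]}
leaf_b = {"centre": (10.0, 10.0), "radius": 1.5,
          "vectors": [(9.0, 10.0), (11.0, 10.0), (10.0, 9.0)]}
root = {"children": [leaf_a, leaf_b]}
matches = nearest(root, (1.0, 1.5), n=2)
```

For this query both matches come from `leaf_a`, and `leaf_b` is never opened. Exact matching is the special case of following only the regions that contain the query vector.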

18
Reducing number of segments
  • One problem is that there may be 100,000 segments
    in a short audio clip
  • A technique to reduce the size is to compress the
    signal using the Discrete Fourier Transform
  • This lets you transform a signal into a set of
    coefficients that represent the frequency
    components of the signal
  • You can use a small number of components (up to
    10, say) to represent the signal

19
Using indexes on DFTs
  • For each audio signal, compute the coefficients
    for its DFT
  • Add a component to identify the particular signal
  • Store the vector in a TV-tree (or other suitable
    index)
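
A sketch of building such a vector with a naive DFT from the standard library, keeping the magnitudes of the first k coefficients. Using magnitudes scaled by the signal length is an assumption made for illustration, and `dft_record` is a hypothetical name; for real signals an FFT routine would be used instead.

```python
import cmath
import math

def dft_coefficients(signal, k):
    """First k DFT coefficients X[0..k-1] of a signal (naive O(N*k) sum)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(signal))
        for f in range(k)
    ]

def dft_record(signal_id, signal, k=10):
    """Index entry: the signal identifier plus k scaled coefficient magnitudes."""
    n = len(signal)
    vector = tuple(abs(c) / n for c in dft_coefficients(signal, k))
    return (signal_id, vector)

# Synthetic signal: a cosine completing exactly 3 cycles over 64 samples.
signal = [math.cos(2 * math.pi * 3 * t / 64) for t in range(64)]
record = dft_record("tone-3", signal, k=10)
```

For this pure tone the vector is close to zero everywhere except position 3, where the scaled magnitude is about 0.5. The resulting 10-component vector is what would be stored in the index.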

20
Summary
  • Audio data can be split into segments and have
    metadata created for each segment
  • Searching is then on the metadata
  • Automatic segmentation can be done by signal
    processing
  • Feature vectors of each segment are constructed
    and stored in an index
  • TV-trees can be used for the indexes
  • DFTs can be used to reduce the number of segments
    in a signal.