Audio databases - PowerPoint PPT Presentation

Slides: 21

Provided by: carl116


1
Audio databases
2
Overview

Basics of audio data
Metadata
Segmentation
Features
Indexing
Retrieval

3
Audio basics

Sound vibrations in air (or other medium)
Temporal signal (i.e. time-based)
Various frequencies
Humans hear roughly 20 Hz to 20 kHz
Amplitude / wavelength

4
Metadata

Use data about the audio data (metadata)
Identify segments with associated metadata
Objects / individuals in segment
Their properties
Activities
Dependent on application
Manual identification

5
Searching with metadata

Use searches on metadata to find segments that
match the query
E.g. find segments with a particular object
find segments with a violin playing
Organize metadata using standard indexing
techniques to support efficient retrieval
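
As a minimal sketch of metadata search, assuming each segment carries a hand-assigned set of labels, an inverted index maps each label to the segments that hold it. The records, label names and function names here are hypothetical and not from the slides.

```python
from collections import defaultdict

# Hypothetical metadata records: (audio_id, segment_id, set of labels).
SEGMENTS = [
    ("clip1", 0, {"violin", "solo"}),
    ("clip1", 1, {"piano"}),
    ("clip2", 0, {"violin", "orchestra"}),
]

def build_index(segments):
    """Map each metadata label to the segments that carry it."""
    index = defaultdict(set)
    for audio_id, segment_id, labels in segments:
        for label in labels:
            index[label].add((audio_id, segment_id))
    return index

def find_segments(index, *labels):
    """Return the segments whose metadata contains every requested label."""
    if not labels:
        return set()
    result = set(index.get(labels[0], set()))
    for label in labels[1:]:
        result &= index.get(label, set())
    return result

index = build_index(SEGMENTS)
violin_segments = find_segments(index, "violin")
```

A query such as "find segments with a violin playing" then becomes a single lookup, and multi-label queries are set intersections.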

6
Segmentation

Automatic segmentation is based on signal
processing
Look at properties of the signal to identify
short segments that have homogeneity
E.g. steady wavelength
This may result in a large number of short
segments
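
One possible way to look for homogeneous stretches, sketched under assumptions: cut the signal into fixed-size frames and start a new segment whenever a per-frame property jumps. The zero-crossing count is used here as a rough stand-in for "steady wavelength"; the frame size, tolerance and test signal are hypothetical choices, not values from the slides.

```python
import math

def zero_crossings(frame):
    """Count sign changes in a frame; a rough proxy for dominant frequency."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

def segment(samples, frame_size, tolerance):
    """Group consecutive frames whose zero-crossing count stays within
    `tolerance` of the first frame of the current segment.
    Returns (start_sample, end_sample) pairs, end exclusive."""
    segments = []
    start = 0
    reference = None
    for pos in range(0, len(samples) - frame_size + 1, frame_size):
        count = zero_crossings(samples[pos:pos + frame_size])
        if reference is None:
            reference = count
        elif abs(count - reference) > tolerance:
            segments.append((start, pos))
            start, reference = pos, count
    if reference is not None:
        segments.append((start, len(samples)))
    return segments

# Synthetic signal: 0.5 s at 100 Hz followed by 0.5 s at 400 Hz, sampled at 8 kHz.
RATE = 8000
signal = [math.sin(2 * math.pi * 100 * n / RATE) for n in range(RATE // 2)]
signal += [math.sin(2 * math.pi * 400 * n / RATE) for n in range(RATE // 2)]
boundaries = segment(signal, frame_size=400, tolerance=4)
```

On real audio a small tolerance would produce many short segments, which is the effect the slide warns about.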

7
Features

For each segment, extract a set of features that
represent the content of that segment
Intensity, loudness, pitch, etc.
These form a k-dimensional vector, where k is the
number of features extracted.
Add in identifiers for the audio and segment to
give a complete description of the features and
where they are found.
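
A minimal sketch of a feature record, assuming three simple stand-in features (RMS level for intensity, peak amplitude, and a zero-crossing pitch estimate). These are illustrative choices; perceptual loudness and robust pitch tracking need more careful methods. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class SegmentFeatures:
    audio_id: str    # which recording the segment belongs to
    segment_id: int  # which segment within that recording
    vector: tuple    # the k feature values

def extract_features(samples, rate):
    """Return a 3-dimensional feature vector (k = 3) for one segment."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    # Two zero crossings per cycle gives a crude frequency estimate in Hz.
    pitch_estimate = crossings * rate / (2 * len(samples))
    return (rms, peak, pitch_estimate)

RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / RATE) for n in range(RATE)]
record = SegmentFeatures("clip1", 0, extract_features(tone, RATE))
```

The identifiers travel with the vector, so a match in the index can be traced back to its audio and segment.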

8
Indexing

For efficient retrieval, a suitable index
structure is needed
R-tree is possible, but with high dimensions it
becomes less satisfactory
Each region covers a large hypercube
Overlapping hypercubes introduce inefficiencies

9
TV-tree

Telescopic vector tree (TV-tree) is a variation
on the R-tree that performs better with higher
dimensions
It organises the vectors in a tree structure,
with the vectors themselves held in the leaf nodes
Each node has a max number of children and is at
least half full (except leaves and root node)

10
TV-tree

Some variations use spheres instead of rectangles
for the regions
Hyperspheres instead of hypercubes.
For these, each node has a central point, a
radius and a set of active dimensions.

11
Active dimensions

The active dimensions in a node are those
components of the vectors that differentiate
between the values.
The tree will allow a maximum number of active
dimensions per node
This technique keeps down the number of
dimensions to consider

12
Example

Consider three vectors of dimension 6
2,4,67,32,9,1
2,4,88,35,17,1
2,4,5,60,3,1
Components in dimensions 1,2 and 6 are all the
same so only dimensions 3,4,5 differentiate
between the values
Active dimensions are 3,4 and 5.
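
The example above can be checked with a few lines of code. This sketch treats a dimension as active when the vectors do not all agree on it; the function name is hypothetical.

```python
VECTORS = [
    (2, 4, 67, 32, 9, 1),
    (2, 4, 88, 35, 17, 1),
    (2, 4, 5, 60, 3, 1),
]

def active_dimensions(vectors):
    """Return the 1-based dimensions in which the vectors do not all agree."""
    k = len(vectors[0])
    return [d + 1 for d in range(k) if len({v[d] for v in vectors}) > 1]

print(active_dimensions(VECTORS))  # [3, 4, 5]
```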

13
Adding into a TV-tree

Start with first vector in one node (both leaf
and root)
Second vector added in to same node
Continue until the node is full, then split it
into 2 leaf nodes, with half of the values in one
leaf and the rest in the other
Create a new root to cover these two leaf nodes
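
A minimal sketch of the first split, under assumptions: nodes are plain dictionaries, the capacity is a hypothetical 4, and the split orders vectors along the component with the greatest spread. The slides do not say how the halves are chosen, so that rule is a stand-in.

```python
CAPACITY = 4  # hypothetical maximum number of vectors per leaf

def split_leaf(vectors):
    """Split along the dimension of greatest spread, about half in each part."""
    k = len(vectors[0])
    dim = max(
        range(k),
        key=lambda i: max(v[i] for v in vectors) - min(v[i] for v in vectors),
    )
    ordered = sorted(vectors, key=lambda v: v[dim])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

def add_to_single_node(node, vector, capacity=CAPACITY):
    """Add a vector to a node that is both root and leaf.
    Returns the root, which is new if the node had to split."""
    vectors = node["vectors"] + [vector]
    if len(vectors) <= capacity:
        return {"vectors": vectors}
    left, right = split_leaf(vectors)
    return {"children": [{"vectors": left}, {"vectors": right}]}

root = {"vectors": []}
for vec in [(1, 1), (2, 9), (3, 2), (4, 8), (5, 3)]:
    root = add_to_single_node(root, vec)
```

After the fifth vector the single node overflows, so `root` becomes an internal node covering two leaves.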

14
Adding further vectors

Further vectors are compared with the root
entries to see which child node to go down into.
Pick a child node which minimises the change to
the radius of the child.
Continue down the tree to find a leaf node to
hold the value.
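
The descent can be sketched as follows, assuming the sphere variant in which each child has a centre and a radius. The dictionary layout and function names are hypothetical, and for brevity the distance uses all components rather than only the active dimensions.

```python
import math

def radius_increase(child, vector):
    """How much the child's radius must grow to cover the vector."""
    distance = math.dist(child["centre"], vector)
    return max(0.0, distance - child["radius"])

def choose_child(children, vector):
    """Pick the child whose radius changes least."""
    return min(children, key=lambda c: radius_increase(c, vector))

def find_leaf(node, vector):
    """Walk down from the root until a leaf (a node holding vectors) is reached."""
    while "children" in node:
        node = choose_child(node["children"], vector)
    return node

tree = {
    "children": [
        {"centre": (0, 0), "radius": 1.0, "vectors": [(0, 0)]},
        {"centre": (10, 10), "radius": 1.0, "vectors": [(10, 10)]},
    ]
}
leaf = find_leaf(tree, (9, 9))
```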

15
Insertion and active dimensions

When a node receives a new vector, you might
alter the active dimensions
Look for the components that most differentiate
the vectors up to a maximum specified for the
tree.
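
One way to pick the components that "most differentiate" the vectors, sketched under an assumption: rank components by their spread (maximum minus minimum) and keep the top few. The spread measure and the names are hypothetical; the slides do not fix a particular measure.

```python
def choose_active_dimensions(vectors, max_active):
    """Return up to `max_active` 1-based dimensions with the largest spread."""
    k = len(vectors[0])
    spread = {
        d + 1: max(v[d] for v in vectors) - min(v[d] for v in vectors)
        for d in range(k)
    }
    candidates = [d for d in spread if spread[d] > 0]
    candidates.sort(key=lambda d: spread[d], reverse=True)
    return sorted(candidates[:max_active])

node_vectors = [
    (2, 4, 67, 32, 9, 1),
    (2, 4, 88, 35, 17, 1),
    (2, 4, 5, 60, 3, 1),
]
before = choose_active_dimensions(node_vectors, max_active=3)
# A new vector that differs strongly in dimension 1 arrives.
after = choose_active_dimensions(node_vectors + [(90, 4, 70, 33, 9, 1)], max_active=3)
```

Here `before` is `[3, 4, 5]`, while `after` is `[1, 3, 4]`: dimension 1 now separates the vectors more than dimension 5, which drops out because of the cap.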

16
Splitting a node and active dimensions

When a node splits, the active dimensions from
the original may be used for the two new nodes,
but there may be changes
Some active dimensions may be removed if the
vectors left in the node have similar values in
certain components
Some new active dimensions may be added if the
inserted vector differs considerably on other
components.

17
Retrieval in a TV-tree

A query has to find the segments that match a
given feature vector
Exact matching follows the tree structure from
the root to the leaf node
Nearest-neighbour matching works down the tree
looking for the n closest matches, where n is a
specified number
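
A minimal sketch of nearest-neighbour matching over a sphere-based tree, assuming each child stores a centre and a radius that covers everything beneath it. It visits nodes in order of their closest possible distance and stops once no remaining node can beat the current n-th best. The layout and names are hypothetical, and active dimensions are ignored for brevity.

```python
import heapq
import math

def nearest(root, query, n):
    """Return the n stored vectors closest to `query`, nearest first."""
    best = []                    # max-heap via negated distance: (-distance, vector)
    counter = 0                  # tie-breaker so nodes are never compared directly
    frontier = [(0.0, 0, root)]  # (lower bound on distance, tie-breaker, node)
    while frontier:
        bound, _, node = heapq.heappop(frontier)
        if len(best) == n and bound >= -best[0][0]:
            break  # nothing left can be closer than the current n-th best
        if "vectors" in node:
            for v in node["vectors"]:
                d = math.dist(v, query)
                if len(best) < n:
                    heapq.heappush(best, (-d, v))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, v))
        else:
            for child in node["children"]:
                gap = math.dist(child["centre"], query) - child["radius"]
                counter += 1
                heapq.heappush(frontier, (max(0.0, gap), counter, child))
    return [v for _, v in sorted(best, reverse=True)]

tree = {
    "children": [
        {"centre": (0, 0), "radius": 2.0, "vectors": [(0, 0), (1, 1)]},
        {"centre": (10, 10), "radius": 2.0, "vectors": [(9, 9), (10, 11)]},
    ]
}
matches = nearest(tree, (8, 8), n=2)
```

In this query the first leaf is never opened, because its sphere lies farther away than both matches already found.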

18
Reducing number of segments

One problem is that there may be 100,000 segments
in a short audio clip
A technique to reduce the size is to compress the
signal using the Discrete Fourier Transform
This lets you transform a signal into a set of
coefficients that represent the components of the
frequencies in the signal
You can use a small number of coefficients (up to
10, say) to represent the signal
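
As a sketch of the idea, assuming a real-valued signal: compute only the first few DFT coefficients, then rebuild an approximation from them. The naive sums below are for illustration only (a real system would use an FFT library), and the function names and test signal are hypothetical.

```python
import cmath
import math

def dft_coefficients(signal, count):
    """First `count` DFT coefficients of a real signal (naive, normalised by N)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)) / n
        for k in range(count)
    ]

def reconstruct(coefficients, n):
    """Approximate the signal from its low-frequency coefficients.
    Relies on the conjugate symmetry of real signals, so it assumes
    len(coefficients) < n / 2."""
    out = []
    for t in range(n):
        value = coefficients[0].real
        for k, c in enumerate(coefficients[1:], start=1):
            value += 2 * (c * cmath.exp(2j * math.pi * k * t / n)).real
        out.append(value)
    return out

N = 64
signal = [
    math.sin(2 * math.pi * t / N) + 0.5 * math.cos(2 * math.pi * 3 * t / N)
    for t in range(N)
]
coeffs = dft_coefficients(signal, 10)   # 10 coefficients instead of 64 samples
approx = reconstruct(coeffs, N)
max_error = max(abs(a - b) for a, b in zip(signal, approx))
```

This test signal contains only two low frequencies, so ten coefficients recover it almost exactly; real audio would lose detail above the retained frequencies.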

19
Using indexes on DFTs

For each audio signal, compute the coefficients
for its DFT
Add a component to identify the particular signal
Store the vector in a TV-tree (or other suitable
index)
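
The steps above can be sketched as follows, with two loud assumptions: coefficient magnitudes are used as the vector (which discards phase), and a plain list with a linear scan stands in for the TV-tree. All names and signals are hypothetical.

```python
import cmath
import math

def dft_magnitudes(signal, count):
    """Magnitudes of the first `count` DFT coefficients (naive, normalised by N)."""
    n = len(signal)
    return tuple(
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))) / n
        for k in range(count)
    )

def index_entry(signal_id, signal, count=4):
    """Pair the identifying component with the signal's DFT vector."""
    return (signal_id, dft_magnitudes(signal, count))

def closest(index, signal, count=4):
    """Linear scan standing in for a lookup in a TV-tree or similar index."""
    query = dft_magnitudes(signal, count)
    return min(index, key=lambda entry: math.dist(entry[1], query))[0]

N = 32
tone_a = [math.sin(2 * math.pi * 1 * t / N) for t in range(N)]
tone_b = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
index = [index_entry("tone-a", tone_a), index_entry("tone-b", tone_b)]

quieter_a = [0.9 * x for x in tone_a]
match = closest(index, quieter_a)
```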

20
Summary

Audio data can be split into segments and have
metadata created for each segment
Searching is then on the metadata
Automatic segmentation can be done by signal
processing
Feature vectors of each segment are constructed
and stored in an index
TV-trees can be used for the indexes
DFTs can be used to reduce the number of segments
in a signal.