Polyphonic Queries - PowerPoint PPT Presentation

About This Presentation
Title:

Polyphonic Queries

Slides: 35
Provided by: Win6232

Transcript and Presenter's Notes

Title: Polyphonic Queries


1
Polyphonic Queries
  • A Review of Recent Research
  • by Cory McKay

2
Overview
  • Introduction to polyphonic music queries
  • Review of five recent systems
  • Conclusions

3
Melodic Fragments as Queries
  • Melodic fragments are a useful way of searching
    electronic music databases
  • Convenient for humans to remember musical phrases
  • Easy to enter musical fragments:
  • Monophonic: query by humming
  • Polyphonic: notation software

4
Search Requirements
  • Want searches to return all occurrences of a
    given set of notes in a database
  • May also want records that contain similar sets
    of notes
  • Should be able to deal with incomplete or
    partially erroneous queries
  • Ideally, want to search music stored in:
  • Symbolic formats (MIDI, scores, sketches)
  • Raw audio data

5
Monophonic Databases
  • There has been some success with the special case of
    monophonic databases:
  • Andreas Kornstädt's Themefinder
  • Rodger J. McNab's MELDEX
  • Searching polyphonic databases is much more
    difficult

6
Polyphonic Databases: Problems
  • Notes may begin simultaneously. Makes it
    impossible to outline an unambiguous sequence of
    events
  • Multiple voices, with varying roles and relevance
    to particular queries
  • Hard to deal with both symbolic and raw audio
    representations
  • Monophonic: can just transcribe audio
  • Polyphonic: no good transcription system

7
Current Polyphonic Systems
  • Wide-spread systems rely on meta-data
  • This is no good for searches of melodic fragments
  • Currently no widely accepted content-based system
  • A number of papers on topic have recently been
    published

8
Wiggins et al. (2002)
  • Designed a new algorithm called SIA(M)ESE
  • For making transposition-invariant queries
  • Matches a query even if there are events in a
    score being searched that separate musical events
    in the query
  • Assumes that both queries and database files are
    in symbolic form and are accurate
  • This limits generality of algorithm
  • No implementation of the algorithm given
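
Since the slide notes that no implementation was given, the following is only a brute-force illustration of the two properties listed above (transposition invariance and tolerance of intervening events), not the SIA(M)ESE algorithm itself. It assumes notes are (onset, pitch) points with no duplicates; the function name `translation_matches` and the sample data are hypothetical.

```python
from collections import Counter

def translation_matches(query, score):
    """Return every (time shift, pitch shift) that maps all query notes onto score notes.

    Each pairing of a query note with a score note votes for one translation
    vector. A vector that collects one vote per query note is a complete match,
    however many unrelated score notes lie in between.
    """
    votes = Counter(
        (s_time - q_time, s_pitch - q_pitch)
        for q_time, q_pitch in query
        for s_time, s_pitch in score
    )
    return sorted(v for v, count in votes.items() if count == len(query))

# Query: C-D-E, one note every 2 ticks.
query = [(0, 60), (2, 62), (4, 64)]
# Score: the same shape a fifth higher, starting at tick 8,
# with two extra notes (pitches 55 and 50) mixed in.
score = [(8, 67), (8, 55), (10, 69), (11, 50), (12, 71)]

print(translation_matches(query, score))
```

Lowering the vote threshold below `len(query)` would admit partial matches, at the cost of more false hits.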

9
Doraisamy & Rüger (2001)
  • Polyphonic queries
  • Partial queries permitted
  • Symbolic queries
  • Symbolic records
  • Pitch and rhythm features

10
Doraisamy & Rüger (2001)
  • N-grams are produced by converting notes into
    interval-based representation and grouping
    intervals into subdivisions of length n using a
    gliding window
  • Leads to transposition-invariant data
  • N-grams work well with monophonic queries
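
A minimal sketch of the interval n-gram idea for a monophonic line, assuming pitches are given as MIDI note numbers and that n counts intervals rather than notes (the slides do not say which). The function name `interval_ngrams` is hypothetical.

```python
def interval_ngrams(pitches, n):
    """Return overlapping n-grams of successive pitch intervals (in semitones)."""
    # Convert absolute pitches to the intervals between neighbouring notes.
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    # Slide a window of n intervals along the sequence, one step at a time.
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]

melody = [60, 62, 64, 65, 67]            # C D E F G
transposed = [p + 5 for p in melody]     # the same melody a fourth higher

print(interval_ngrams(melody, 3))
print(interval_ngrams(melody, 3) == interval_ngrams(transposed, 3))
```

Because only differences between pitches are kept, the transposed melody yields exactly the same n-grams as the original.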

11
Doraisamy & Rüger (2001)
  • To deal with potential for simultaneous note
    onsets in polyphonic music, constructs exhaustive
    melodic strings
  • Divide piece into overlapping windows of n
    adjacent onset times
  • Find all possible combinations of onsets within
    each window
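
The windowing step above might be sketched as follows, assuming the piece is already grouped into (onset time, pitches sounding at that onset) pairs sorted by time, and that a "combination" means choosing one pitch per onset. The name `melodic_strings` and the data layout are assumptions, not taken from the paper.

```python
from itertools import product

def melodic_strings(onsets, n):
    """Enumerate every monophonic path through each window of n adjacent onsets.

    onsets: list of (time, [pitches]) pairs sorted by time.
    Returns a list of pitch tuples, one pitch taken from each onset in a window.
    """
    paths = []
    for i in range(len(onsets) - n + 1):
        window = onsets[i:i + n]
        # Cartesian product: one pitch chosen from each onset in the window.
        paths.extend(product(*(pitches for _, pitches in window)))
    return paths

piece = [(0, [60, 64]), (1, [62]), (2, [65, 69])]
for path in melodic_strings(piece, 3):
    print(path)
```

The number of paths per window is the product of the chord sizes, so dense textures produce many more n-grams than monophonic lines do.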

12
Doraisamy & Rüger (2001)
  • Incorporates rhythmic information as well as
    intervals into each n-gram window
  • Done by calculating ratio of time differences
    between adjacent note onsets
  • Ratio_i = (Onset_{i+2} - Onset_{i+1}) /
    (Onset_{i+1} - Onset_i)
  • Avoids need to quantize events based on a
    predetermined beat duration
  • By using onsets only, avoids needing to determine
    the duration of notes, which can be difficult to
    detect in raw audio recordings
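
A short sketch of the onset-ratio calculation, assuming onset times are strictly increasing and measured in any consistent unit. The function name `onset_ratios` is hypothetical.

```python
def onset_ratios(onsets):
    """Ratio of each inter-onset gap to the gap that precedes it."""
    return [
        (onsets[i + 2] - onsets[i + 1]) / (onsets[i + 1] - onsets[i])
        for i in range(len(onsets) - 2)
    ]

played = [0.0, 0.5, 1.0, 2.0]            # onset times in seconds
slower = [t * 2 for t in played]         # same rhythm at half the tempo

print(onset_ratios(played))
print(onset_ratios(played) == onset_ratios(slower))
```

Scaling every onset by the same factor leaves the ratios unchanged, which is why no reference beat duration is needed.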

13
Doraisamy & Rüger (2001)
  • N-grams converted into text-based representations
    to allow use of text-based search engines
  • Interval and rhythmic ratio histograms
    constructed in order to search for patterns in
    each piece
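
One way an interval n-gram could be turned into a "word" for a text search engine is sketched below. The letter mapping and the clamping of large intervals into a single bin are purely illustrative assumptions; the slides do not describe the actual encoding.

```python
def interval_to_char(interval, max_abs=12):
    """Map one interval to one character (hypothetical encoding).

    Intervals beyond +/- max_abs semitones share the outermost bin.
    Rising intervals become 'A', 'B', ...; falling ones 'a', 'b', ...;
    a repeated note becomes '0'.
    """
    clamped = max(-max_abs, min(max_abs, interval))
    if clamped == 0:
        return "0"
    base = ord("A") if clamped > 0 else ord("a")
    return chr(base + abs(clamped) - 1)

def ngram_to_word(ngram):
    """Concatenate the characters for every interval in the n-gram."""
    return "".join(interval_to_char(i) for i in ngram)

def piece_to_text(ngrams):
    """A piece becomes a space-separated 'document' of n-gram words."""
    return " ".join(ngram_to_word(g) for g in ngrams)

print(piece_to_text([(2, 2, 1), (2, 1, 2), (-3, 0, 15)]))
```

Once pieces and queries are both plain strings of such words, an ordinary text index can rank pieces by how many query words they share.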

14
Doraisamy & Rüger (2001)
  • Tested using a database of 3096 MIDI
    representations of classical music
  • Studied effects of varying window sizes, bin
    ranges and query lengths
  • 95% success rate with window sizes of 4 onset
    times, variable bin ranges and queries involving
    50 notes
  • 74% with query lengths of ten notes
  • 65% with queries containing errors

15
Doraisamy & Rüger (2001)
  • Evaluation
  • Performs well under ideal conditions
  • Databases or queries containing raw audio not
    considered
  • Only transposition-invariant searches possible
  • Undesirably long query lengths necessary
  • Only classical music records tested
  • Successful in showing the potential utility of
    n-grams
  • Most recent work uses a variant of this n-gram
    approach

16
Doraisamy & Rüger (2002)
  • Monophonic queries
  • Partial queries permitted
  • Symbolic queries
  • Symbolic records
  • Pitch and rhythm features

17
Doraisamy & Rüger (2002)
  • Focused on monophonic queries because of query by
    humming
  • N-grams particularly appropriate for error-prone
    queries resulting from query by humming
  • One or two mistakes only lead to a few incorrect
    n-grams among a larger number of correct ones
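
The error-tolerance claim can be checked on a toy example. The sketch below assumes interval n-grams (n counted in intervals) and a hummed query of the same length as the reference with a single wrong pitch; the helper names are hypothetical.

```python
def interval_ngrams(pitches, n):
    """Overlapping n-grams of successive pitch intervals."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]

def changed_ngrams(reference, query, n):
    """Return (number of n-gram positions that differ, total n-grams)."""
    ref = interval_ngrams(reference, n)
    qry = interval_ngrams(query, n)
    return sum(1 for a, b in zip(ref, qry) if a != b), len(ref)

reference = [60, 62, 64, 65, 67, 69, 71, 72, 74, 76, 77, 79]
hummed = list(reference)
hummed[6] = 70                           # one note sung a semitone flat

changed, total = changed_ngrams(reference, hummed, 3)
print(changed, "of", total, "n-grams affected")
```

One wrong pitch alters two intervals, so at most n + 1 n-grams can change no matter how long the query is; the longer the query, the smaller the damaged fraction.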

18
Doraisamy & Rüger (2002)
  • Used same overall design as previous system
  • Includes more sophisticated error models to test
    effects of query inaccuracies
  • Database expanded to include popular music as
    well as classical music
  • 80% of the relevant compositions were returned in
    the first 15 hits, on average

19
Doraisamy & Rüger (2002)
  • Evaluation
  • N-grams are an effective and error-tolerant tool
    for searching polyphonic music with monophonic
    queries
  • Improvements still need to be made
  • Queries still symbolic, so true applicability of
    system to query by humming not tested

20
Pickens et al. (2002)
  • Polyphonic queries
  • Full-length queries only
  • Audio queries
  • Symbolic records
  • Harmonic features

21
Pickens et al. (2002)
  • Relies on transcription algorithms to transform
    audio queries into symbolic form
  • Relies on error tolerance of searches to
    compensate for transcription errors
  • Two types of polyphonic transcription systems
    used, including a blackboard system

22
Pickens et al. (2002)
  • Transcribed queries and database records analyzed
    and stored using a harmonic modelling module
  • Characterizes pieces by mapping chords to a
    probability distribution
  • Breaks into sequences of independent note sets
  • Applies smoothing procedure to sets
  • Markov models created from smoothed sets

23
Pickens et al. (2002)
  • Performing searches
  • Scoring function used to compare query models
    with each model stored in database
  • Dissimilarity scores produced
  • Allows search hits to be ranked
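
A sketch of ranking by dissimilarity, assuming each model is a transition matrix whose rows are strictly positive probability distributions. The slides do not name the scoring function; mean row-wise Kullback-Leibler divergence is used here only as one plausible choice, and the toy models are invented.

```python
import math

def kl_divergence(p, q):
    """KL divergence D(p || q) for two discrete distributions (q must be > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dissimilarity(query_model, record_model):
    """Mean row-wise KL divergence between two transition matrices."""
    pairs = zip(query_model, record_model)
    return sum(kl_divergence(p, q) for p, q in pairs) / len(query_model)

def rank(query_model, database):
    """Return record names ordered from least to most dissimilar."""
    scored = [(dissimilarity(query_model, m), name) for name, m in database.items()]
    return [name for _, name in sorted(scored)]

query = [[0.9, 0.1], [0.2, 0.8]]
database = {
    "uniform": [[0.5, 0.5], [0.5, 0.5]],
    "close":   [[0.85, 0.15], [0.25, 0.75]],
}
print(rank(query, database))
```

A score of zero means the two models are identical; larger scores push a record further down the hit list.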

24
Pickens et al. (2002)
  • Tested with database containing 3150 classical
    piano pieces
  • Queries consisted of full-length audio recordings
  • On average, searches assigned a rank of between 2
    and 6 to the correct database record
  • Moderately successful at matching variations of a
    piece: on average, 3 of the top 5 hits were relevant

25
Pickens et al. (2002)
  • Evaluation
  • Can use polyphonic audio queries
  • Limited by effectiveness of its transcription
    systems and error tolerance of the query system
  • System only tested on piano music
  • Full-length queries limit usefulness
  • Only one feature used

26
Song, Bae & Yoon (2002)
  • Monophonic queries
  • Partial queries permitted
  • Audio queries
  • Audio records
  • Melodic features

27
Song, Bae & Yoon (2002)
  • Intended to be used with query by humming
  • Avoids disadvantages of automated transcription
    by mapping audio data directly to mid-level
    melody-based feature set description
  • Contrasts with approach of first transcribing
    audio data and then extracting features from this
    high-level symbolic representation

28
Song, Bae & Yoon (2002)
  • Mid-level representation produced by processing
    audio frames using a five-step process

29
Song, Bae & Yoon (2002)
  • Instead of making definite decision on which
    notes were present, as a blackboard system would
    have done, vectors of all possible notes were
    kept for each audio segment
  • A DP-matching method was used during searches so
    that potentially error-prone patterns of
    different lengths could be compared
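
The slides do not specify the DP-matching method, so the sketch below substitutes a standard dynamic-time-warping recurrence to show how sequences of per-frame note vectors of different lengths can be compared. The city-block frame distance and the function names are assumptions.

```python
def frame_distance(u, v):
    """City-block distance between two note-strength vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def dp_match(query, record):
    """Minimum cumulative frame distance over all monotonic alignments."""
    inf = float("inf")
    n, m = len(query), len(record)
    # cost[i][j]: best alignment cost of the first i query frames
    # against the first j record frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], record[j - 1])
            # A frame may be matched, or either sequence may be stretched.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

hummed = [[1, 0], [0, 1]]                 # two frames: note A, then note B
stored = [[1, 0], [1, 0], [0, 1]]         # the same tune with a longer first note

print(dp_match(hummed, stored))
```

Because frames hold a vector of note strengths rather than a single decided note, an uncertain frame contributes a small distance instead of a hard mismatch.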

30
Song, Bae & Yoon (2002)
  • Tested by attempting to match 176 hummed samples
    to 92 short extracts (15-20 seconds long) of
    Korean and Western popular songs
  • Resulted in exact matches roughly 43% of the time
    and a match in the top ten from 69% to 76% of the
    time (depending on window size)
  • Search time varied from 3 to 14 seconds

31
Song, Bae & Yoon (2002)
  • Evaluation
  • Performance relatively poor
  • Only monophonic queries possible
  • Only tested using a database of short recordings
  • This was only system using audio data for both
    queries and database records

32
Conclusions
  • No truly viable systems produced yet
  • Promising approaches have been proposed
  • Recurring problems:
  • None of the systems can deal with both symbolic and
    audio data
  • None have been tested with both polyphonic and
    monophonic queries
  • Tend to require long queries to achieve good
    results
  • Searches do not allow much flexibility (e.g. must
    be transposition invariant)

33
Conclusions
  • Difficult to compare systems because they each
    use different performance evaluation metrics
  • Limited data sets used during testing

34
Conclusions
  • Possible improvement: use a greater number of
    feature classes
  • Aside from Doraisamy & Rüger, all systems
    discussed limited themselves to one feature class
  • Harmonic, melodic, timbral and rhythm-based
    features could all prove useful
  • Would allow much more flexible searches