Using Blackboard Systems for Polyphonic Transcription

1
Using Blackboard Systems for Polyphonic
Transcription
  • A Literature Review
  • by Cory McKay

2
Outline
  • Intro to polyphonic transcription
  • Intro to blackboard systems
  • Keith Martin's work
  • Kunio Kashino's work
  • Recent contributions
  • Conclusion

3
Polyphonic Transcription
  • Represent an audio signal as a score
  • Must segregate notes belonging to different
    voices
  • Problems: variations of timbre within a voice,
    voice crossing, identification of correct octave
  • No successful general purpose system to date

4
Polyphonic Transcription
  • Can use simplified models
  • Music for a single instrument (e.g. piano)
  • Extract only a given instrument from mix
  • Use music which obeys restrictive rules
  • Simplified systems have had success rates of
    between 80% and 90%
  • These rates may be exaggerated, since only very
    limited testing suites generally used

5
Polyphonic Transcription
  • Systems to date generally identify only rhythm,
    pitch and voice
  • Would like systems that also identify other
    notated aspects such as dynamics and vibrato
  • Ideal is to have system that can identify and
    understand parameters of music that humans hear
    but do not notate

6
Blackboard Systems
  • Used in AI for decades but only applied to music
    transcription in early 1990s
  • Term "blackboard" comes from notion of a group of
    experts standing around a blackboard working
    together to solve a problem
  • Each expert writes contributions on blackboard
  • Experts watch problem evolve on blackboard,
    making changes until a solution is reached

7
Blackboard Systems
  • Blackboard is a central dataspace
  • Usually arranged in hierarchy so that input is at
    lowest level and output is at highest
  • Experts are called knowledge sources
  • KSs generally consist of a set of heuristics and
    a precondition whose satisfaction results in a
    hypothesis that is written on blackboard
  • Each KS forms hypotheses based on information
    from front end of system and hypotheses presented
    by other KSs
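
The structure described above can be sketched roughly as follows. This is a minimal illustration, not taken from any of the systems reviewed: the class names, the level names and the "partials-to-note" knowledge source are all hypothetical, and the control loop simply visits each KS in turn until a full pass changes nothing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Blackboard:
    """Central dataspace: one list of hypotheses per level."""
    levels: Dict[str, List[str]] = field(default_factory=dict)

    def post(self, level: str, hypothesis: str) -> bool:
        """Write a hypothesis; return True only if it was not already present."""
        entries = self.levels.setdefault(level, [])
        if hypothesis in entries:
            return False
        entries.append(hypothesis)
        return True


@dataclass
class KnowledgeSource:
    name: str
    precondition: Callable[[Blackboard], bool]
    action: Callable[[Blackboard], bool]  # returns True if the blackboard changed


def run(blackboard, sources, max_passes=10):
    """Visit each KS in turn until a full pass changes nothing."""
    for _ in range(max_passes):
        changed = False
        for ks in sources:
            if ks.precondition(blackboard) and ks.action(blackboard):
                changed = True
        if not changed:
            break
    return blackboard


# Hypothetical KS: once two partials are present, hypothesize a note.
note_ks = KnowledgeSource(
    name="partials-to-note",
    precondition=lambda bb: len(bb.levels.get("partials", [])) >= 2,
    action=lambda bb: bb.post("notes", "A3"),
)

bb = Blackboard({"partials": ["220 Hz", "440 Hz"]})
run(bb, [note_ks])
```

Because each KS only sees the blackboard, a new expert can be added by appending it to the list passed to `run`.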

8
Blackboard Systems
  • Problem is solved when all KSs are satisfied with
    all hypotheses on blackboard to within a given
    margin of error
  • Eliminates need for global control module
  • Each KS can be easily updated and new KSs can be
    added with little difficulty
  • Combines top-down and bottom-up processing

9
Blackboard Systems
  • Music has a naturally hierarchical structure that
    lends itself well to blackboard systems
  • Allow integration of different types of
    expertise
  • signal processing KSs at low level
  • human perception KSs at middle level
  • musical knowledge KSs at upper level

10
Blackboard Systems
  • Limitation: giving upper-level KSs too much
    specialized knowledge and influence limits
    generality of transcription systems
  • Ideal system would not use knowledge above the
    level of human perception and the most
    rudimentary understanding of music
  • Current trend is to increase significance of
    upper-level musical KSs in order to increase
    success rate

11
Keith Martin (1996 a)
  • A Blackboard System for Automatic Transcription
    of Simple Polyphonic Music
  • Used a blackboard system to transcribe a
    four-voice Bach chorale with appropriate
    segregation of voices
  • Limited input signal to synthesized piano
    performances
  • Gave system only rudimentary musical knowledge,
    although choice of Bach chorale allowed the use
    of generally unacceptable assumptions by lower
    level KSs

12
Keith Martin (1996 a)
  • Front-end system used short-time Fourier
    transform on input signal
  • Equivalent to a filter bank that is a gross
    approximation of the way the human cochlea processes
    auditory signals
  • Blackboard system fed sets of associated onset
    times, frequencies and amplitudes
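
As a rough illustration of what one frame of a short-time Fourier transform yields, the sketch below windows a single frame, takes its discrete Fourier transform and reports the strongest frequency. The frame length, sample rate, Hann window and function names are assumptions made for the example; the slides do not give Martin's actual parameters, and a real front end would use an FFT over many overlapping frames.

```python
import cmath
import math


def stft_frame(samples, sample_rate):
    """Return (frequency_hz, magnitude) pairs for one Hann-windowed frame."""
    n = len(samples)
    windowed = [s * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT sum, kept simple for clarity; an FFT is the usual choice.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        spectrum.append((k * sample_rate / n, abs(coeff)))
    return spectrum


def strongest_frequency(samples, sample_rate):
    """Centre frequency of the bin with the largest magnitude."""
    return max(stft_frame(samples, sample_rate), key=lambda pair: pair[1])[0]


sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(256)]
peak_hz = strongest_frequency(frame, sample_rate)
```

With 256 samples at 8000 Hz the bins are 31.25 Hz apart, so the reported peak can only be as close to 440 Hz as that resolution allows.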

13
Keith Martin (1996 a)
  • Knowledge sources made five classes of
    hierarchically organized hypotheses
  • Tracks
  • Partials
  • Notes
  • Intervals
  • Chords
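
One step up this hierarchy, from partials to notes, can be sketched as grouping partials that lie near integer multiples of a candidate fundamental. This is an illustrative assumption about how such a step might work, not Martin's actual knowledge source; the function name and the 3% tolerance are hypothetical.

```python
def group_partials(partials_hz, f0_hz, tolerance=0.03):
    """Return the partials lying within a relative tolerance of a harmonic of f0."""
    supported = []
    for f in partials_hz:
        harmonic = round(f / f0_hz)  # nearest harmonic number
        if harmonic >= 1 and abs(f - harmonic * f0_hz) <= tolerance * harmonic * f0_hz:
            supported.append(f)
    return supported


partials = [220.0, 441.0, 659.0, 300.0]
support_a3 = group_partials(partials, 220.0)  # candidate note A3
support_a2 = group_partials(partials, 110.0)  # candidate one octave lower
```

Both candidates are supported by the same three partials, which hints at why choosing the correct octave is hard from harmonic evidence alone.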

14
Keith Martin (1996 a)
  • Three types of knowledge sources
  • Garbage collection
  • Physics
  • Musical practice
  • Thirteen knowledge sources in all
  • Each KS only authorized to make certain classes
    of hypotheses

15
Keith Martin (1996 a)
  • KSs with access to upper-level hypotheses can put
    pressure on KSs with lower-level access to make
    certain hypotheses and vice versa
  • Example: if the hypotheses have been made that
    the notes C and G are present in a beat, a KS
    with information about chords might put forward
    the hypothesis that there is a C chord, thus
    putting pressure on other KSs to find an E or Eb.
  • Used a sequential scheduler to coordinate KSs
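
The C and G example can be sketched as a lookup that turns observed notes into chord hypotheses and the notes each hypothesis still expects. The chord table and function name are hypothetical and far smaller than anything a real chord KS would need.

```python
CHORDS = {
    "C major": {"C", "E", "G"},
    "C minor": {"C", "Eb", "G"},
    "G major": {"G", "B", "D"},
}


def chord_expectations(observed_notes):
    """Map each chord consistent with the observed notes to the notes it still needs."""
    observed = set(observed_notes)
    expectations = {}
    for name, tones in CHORDS.items():
        if observed <= tones:  # every observed note belongs to this chord
            missing = tones - observed
            if missing:
                expectations[name] = missing
    return expectations


wanted = chord_expectations(["C", "G"])
```

Here `wanted` holds one entry per candidate C chord, and the missing notes (E or Eb) are what lower-level KSs would be asked to look for.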

16
Keith Martin (1996 b)
  • Automatic Transcription of Simple Polyphonic
    Music: Robust Front End Processing
  • Previous system often misidentified octaves
  • Attempted to improve performance by shifting
    octave identification task from a top-down
    process to a bottom-up process

17
Keith Martin (1996 b)
  • Proposes the use of log-lag correlograms in front
    end
  • Models the inner hair cells in the cochlea with a
    bank of filters
  • Determines pitch by measuring the periodic energy
    in each filter channel as a function of lag
  • Correlograms now basic unit fed to blackboard
    system
  • No definitive results as to which approach is
    better
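
The idea of measuring periodic energy as a function of lag can be sketched with a plain autocorrelation on a single channel. This is a simplification under stated assumptions: the input stands in for the output of one filter, the lags are linearly spaced rather than logarithmically, and the function names are hypothetical.

```python
import math


def periodic_energy(channel, min_lag, max_lag):
    """Autocorrelation of one filter channel for each lag in [min_lag, max_lag]."""
    return {
        lag: sum(channel[i] * channel[i + lag] for i in range(len(channel) - lag))
        for lag in range(min_lag, max_lag + 1)
    }


def best_lag(channel, min_lag, max_lag):
    """Lag (in samples) at which the channel is most similar to itself."""
    energy = periodic_energy(channel, min_lag, max_lag)
    return max(energy, key=energy.get)


sample_rate = 8000
channel = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(400)]
lag = best_lag(channel, 20, 100)
pitch_hz = sample_rate / lag
```

A correlogram stacks one such lag profile per filter channel; a log-lag version would sample the lag axis logarithmically so that equal musical intervals are equally spaced.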

18
Kashino, Nakadai, Kinoshita and Tanaka (1995)
  • Application of Bayesian Probability Networks to
    Music Scene Analysis
  • Work slightly preceded that of Martin
  • Used test patterns involving more than one
    instrument
  • Uses principles of stream segregation from
    auditory scene analysis
  • Implements more high-level musical knowledge
  • Uses Bayesian network instead of Martin's simple
    scheduler to coordinate KSs

19
Kashino, Nakadai, Kinoshita and Tanaka (1995)
  • Knowledge sources used
  • Chord transition dictionary
  • Chord-note relation
  • Chord naming rules
  • Tone memory
  • Timbre models
  • Human perception rules
  • Used very specific instrument timbres and musical
    rules, so has limited general applicability

20
Kashino, Nakadai, Kinoshita and Tanaka (1995)
  • Tone memory: frequency components of different
    instruments played with different parameters
  • Found that the integration of tone memory with
    the other KSs greatly improved success rates

21
Kashino, Nakadai, Kinoshita and Tanaka (1995)
  • Bayesian networks well known for finding good
    solutions despite noisy input or missing data
  • Often used in implementing learning methods that
    trade off prior belief in a hypothesis against
    its agreement with current data
  • Therefore seem to be a good choice for
    coordinating KSs
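
The trade-off between prior belief and agreement with the data is Bayes' rule, which can be sketched for two competing chord hypotheses. The numbers are invented for illustration, and a single application of Bayes' rule is of course much simpler than the network Kashino et al. describe.

```python
def posterior(priors, likelihoods):
    """Combine prior beliefs with data likelihoods and renormalize."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical values: the prior favours C major, but the data favour C minor.
priors = {"C major": 0.6, "C minor": 0.4}
likelihoods = {"C major": 0.2, "C minor": 0.9}
beliefs = posterior(priors, likelihoods)
```

In this example the evidence outweighs the prior, so the belief in C minor rises from 0.4 to 0.75.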

22
Kashino, Nakadai, Kinoshita and Tanaka (1995)
  • No experimental comparisons of this approach and
    Martin's simple scheduler
  • Only used simple test patterns rather than real
    music

23
Kashino and Hagita (1996)
  • A Music Scene Analysis System with the MRF-Based
    Information Integration Scheme
  • Suggests replacing Bayesian networks with Markov
    Random Field hypothesis network
  • Successful in correcting two most common problems
    in previous system
  • Misidentification of instruments
  • Incorrect octave labelling

24
Kashino and Hagita (1996)
  • MRF-based networks use simulated annealing to
    converge to a low-energy state
  • MRF approach enables information to be integrated
    on a multiply connected hypothesis network
  • Bayesian networks only allow singly connected
    networks
  • Could now deal with two kinds of transition
    information within a single hypothesis network
  • chord transitions
  • note transitions
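
Simulated annealing on a small, multiply connected network can be sketched as below. The three sites, their labels, the cost tables and the linear cooling schedule are all hypothetical; the sketch only shows the mechanism of accepting occasional uphill moves while the temperature falls, not the energy function used by Kashino and Hagita.

```python
import math
import random

LABELS = ["C", "E", "G"]
# Hypothetical costs: how poorly each label fits the data at each of 3 sites.
DATA_COST = [
    {"C": 0.1, "E": 1.0, "G": 1.0},
    {"C": 0.9, "E": 0.8, "G": 1.0},
    {"C": 1.0, "E": 1.0, "G": 0.2},
]
# Three sites joined in a loop, so the network is multiply connected.
EDGES = [(0, 1), (1, 2), (0, 2)]


def energy(state):
    """Sum of per-site data costs plus a penalty for each edge whose labels differ."""
    unary = sum(DATA_COST[i][label] for i, label in enumerate(state))
    pairwise = sum(0.0 if state[i] == state[j] else 0.5 for i, j in EDGES)
    return unary + pairwise


def anneal(initial, steps=2000, start_temp=2.0, seed=0):
    """Return the lowest-energy labelling seen while cooling."""
    rng = random.Random(seed)
    state, best = list(initial), list(initial)
    for step in range(steps):
        temp = start_temp * (1 - step / steps) + 1e-6  # linear cooling
        candidate = list(state)
        candidate[rng.randrange(len(state))] = rng.choice(LABELS)
        delta = energy(candidate) - energy(state)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            state = candidate
        if energy(state) < energy(best):
            best = list(state)
    return best


start = ["E", "E", "E"]
result = anneal(start)
```

Tracking the best state seen guarantees the result is never worse than the starting labelling, though annealing in general offers no guarantee of reaching the global minimum in finite time.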

25
Kashino and Hagita (1996)
  • Instrument and octave identification errors
    corrected, but some new errors introduced
  • Overall, performed roughly 10% better than
    Bayesian-based system at transcribing 3-part
    arrangement of Auld Lang Syne
  • Still only had a recognition rate of 71.7%

26
Kashino and Murase (1998)
  • Shifts some work away from blackboard system by
    feeding it higher-level information
  • Simplifies and mathematically formalizes notion
    of knowledge sources
  • Switches back to Bayesian network
  • Perhaps not truly a blackboard system anymore
  • Has very good recognition rate
  • Scalability of system is seriously compromised by
    new approach

27
Kashino and Murase (1998)
  • Uses adaptive template matching
  • Implemented using a bank of filters arranged in
    parallel and a number of templates corresponding
    to particular notes played by particular
    instruments
  • The correlation between the outputs of the
    filters is calculated and a match is then made to
    one of the templates
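
A much-simplified view of the matching step is to compare a vector of filter outputs against stored templates and keep the best-correlated one. The templates, channel count and function names below are invented for illustration, and the adaptive part of the method (updating templates to fit the recording) is not modelled.

```python
import math


def normalized_correlation(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical templates: energy per filter channel for (instrument, note) pairs.
TEMPLATES = {
    ("piano", "C4"): [1.0, 0.5, 0.3, 0.1],
    ("flute", "C4"): [1.0, 0.1, 0.05, 0.0],
    ("violin", "C4"): [0.6, 0.8, 0.7, 0.5],
}


def best_template(filter_outputs):
    """Return the (instrument, note) key whose template correlates best."""
    return max(TEMPLATES,
               key=lambda key: normalized_correlation(filter_outputs, TEMPLATES[key]))


match = best_template([0.9, 0.45, 0.35, 0.1])
```

Every extra instrument adds one template per note, which is the growth in search cost that the scalability concern refers to.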

28
Kashino and Murase (1998)
  • Achieved recognition rate of 88.5% on real
    recordings of piano, violin and flute
  • Including templates for many more instruments
    could make adaptive template matching intractable
  • Particularly a problem for instruments with
  • Similar frequency spectra
  • A great deal of spectral variation from note to
    note

29
Hainsworth and Macleod (2001)
  • Automatic Bass Line Transcription from
    Polyphonic Music
  • Wanted to be able to extract a single given
    instrument from an arbitrary musical signal
  • Contrast to previous approaches of using
    recordings of only one instrument or a set of
    pre-defined instruments

30
Hainsworth and Macleod (2001)
  • Chose to work with bass
  • Can filter out high frequencies
  • Notes usually fairly steady
  • Used simple mathematical relations to trim
    hypotheses rather than a true blackboard system
  • Had a 78.7% success rate on a Miles Davis
    recording
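
Filtering out high frequencies before looking for bass notes can be sketched with a one-pole low-pass filter. The filter type, the 200 Hz cutoff and the test tones are assumptions chosen for the example; the slides do not say which filter Hainsworth and Macleod used.

```python
import math


def low_pass(samples, cutoff_hz, sample_rate):
    """One-pole low-pass: each output moves a fraction alpha toward the input."""
    alpha = 1 - math.exp(-2 * math.pi * cutoff_hz / sample_rate)
    out, previous = [], 0.0
    for x in samples:
        previous += alpha * (x - previous)
        out.append(previous)
    return out


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


sample_rate = 8000
bass = [math.sin(2 * math.pi * 60 * i / sample_rate) for i in range(4000)]
treble = [math.sin(2 * math.pi * 2000 * i / sample_rate) for i in range(4000)]
bass_level = rms(low_pass(bass, 200, sample_rate))
treble_level = rms(low_pass(treble, 200, sample_rate))
```

The 60 Hz tone passes almost unchanged while the 2000 Hz tone is strongly attenuated, leaving fewer competing partials for the later stages.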

31
Bello and Sandler (2000)
  • Blackboard Systems and Top-Down Processing for
    the Transcription of Simple Polyphonic Music
  • Return to a true blackboard system
  • Based on Martin's implementation, using a
    conventional scheduler
  • Refines knowledge sources and adds high-level
    musical knowledge
  • Implements one of knowledge sources as a neural
    network

32
Bello and Sandler (2000)
  • The chord recognizer KS is a feedforward network
  • Trained using the spectrogram of different
    chords of a piano
  • Trained network fed a spectrogram and outputs
    possible chords
  • Can therefore output more than one hypothesis at
    each iteration
  • Gives other KSs more information and allows
    parallel exploration of solution space
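
How a feedforward recognizer can return several chord hypotheses at once is sketched below with a single layer of sigmoid units. Everything here is a stand-in: the weights are set by hand rather than trained, the input is a set of active pitch classes rather than a spectrogram, and the threshold of 0.4 is arbitrary.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]
CHORD_TONES = {
    "C major": {"C", "E", "G"},
    "C minor": {"C", "Eb", "G"},
    "G major": {"G", "B", "D"},
}
# Hand-set stand-ins for trained weights: +2 for chord tones, -2 otherwise.
WEIGHTS = {chord: [2.0 if pc in tones else -2.0 for pc in PITCH_CLASSES]
           for chord, tones in CHORD_TONES.items()}
BIAS = -4.0


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def recognize(active_notes, threshold=0.4):
    """Return every chord whose output unit reaches the threshold."""
    inputs = [1.0 if pc in active_notes else 0.0 for pc in PITCH_CLASSES]
    scores = {chord: sigmoid(BIAS + sum(w * x for w, x in zip(weights, inputs)))
              for chord, weights in WEIGHTS.items()}
    return sorted(chord for chord, score in scores.items() if score >= threshold)


ambiguous = recognize({"C", "G"})
clear = recognize({"C", "E", "G"})
```

With only C and G active, both C chords pass the threshold, so the blackboard receives two hypotheses to explore in parallel; adding E leaves only C major.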

33
Bello and Sandler (2000)
  • Could automatically retrain network to recognize
    spectrograms of other instruments with no manual
    modifications needed
  • Preliminary testing showed tendency to
    misidentify octaves and make incorrect
    identification of note onsets
  • These problems could potentially be corrected by
    signal processing system that feeds blackboard
    system

34
Conclusions
  • Bass transcription system and more recent work of
    Kashino useful for specific applications, but
    limited potential for general transcription
    purposes
  • True blackboard approach scales well and appears
    to hold the most potential for general-purpose
    polyphonic transcription

35
Conclusions
  • Use of adaptive learning in knowledge sources
    seems promising
  • Interchangeable modules could be automatically
    trained to specialize in different areas
  • Could have semi-automatic transcription, where
    user chooses correct modules and system performs
    transcription using them