Title: Using Blackboard Systems for Polyphonic Transcription
1 Using Blackboard Systems for Polyphonic Transcription
- A Literature Review
- by Cory McKay
2 Outline
- Intro to polyphonic transcription
- Intro to blackboard systems
- Keith Martin's work
- Kunio Kashino's work
- Recent contributions
- Conclusion
3 Polyphonic Transcription
- Represent an audio signal as a score
- Must segregate notes belonging to different voices
- Problems: variations of timbre within a voice, voice crossing, identification of correct octave
- No successful general-purpose system to date
4 Polyphonic Transcription
- Can use simplified models:
  - Music for a single instrument (e.g. piano)
  - Extract only a given instrument from mix
  - Use music which obeys restrictive rules
- Simplified systems have had success rates of between 80% and 90%
- These rates may be exaggerated, since only very limited testing suites are generally used
5 Polyphonic Transcription
- Systems to date generally identify only rhythm, pitch and voice
- Would like systems that also identify other notated aspects such as dynamics and vibrato
- Ideal is to have a system that can identify and understand parameters of music that humans hear but do not notate
6 Blackboard Systems
- Used in AI for decades but only applied to music transcription in the early 1990s
- Term "blackboard" comes from the notion of a group of experts standing around a blackboard working together to solve a problem
- Each expert writes contributions on the blackboard
- Experts watch the problem evolve on the blackboard, making changes until a solution is reached
7 Blackboard Systems
- Blackboard is a central dataspace
- Usually arranged in a hierarchy so that input is at the lowest level and output is at the highest
- Experts are called knowledge sources (KSs)
- KSs generally consist of a set of heuristics and a precondition whose satisfaction results in a hypothesis that is written on the blackboard
- Each KS forms hypotheses based on information from the front end of the system and hypotheses presented by other KSs
8 Blackboard Systems
- Problem is solved when all KSs are satisfied with all hypotheses on the blackboard to within a given margin of error
- Eliminates need for global control module
- Each KS can be easily updated and new KSs can be added with little difficulty
- Combines top-down and bottom-up processing
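The blackboard idea described in the last two slides can be sketched roughly as follows. This is a minimal illustration, not any of the reviewed systems: the class names, the level names and the toy rule that maps three partials to the note A3 are all assumptions made for the example, and the simple polling loop stands in for whatever control strategy a real system uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Blackboard:
    # Hypotheses grouped by level; the level names are illustrative only.
    levels: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, level: str, hypothesis: str) -> bool:
        """Write a hypothesis; return True only if it was new."""
        entries = self.levels.setdefault(level, [])
        if hypothesis in entries:
            return False
        entries.append(hypothesis)
        return True


@dataclass
class KnowledgeSource:
    name: str
    precondition: Callable[[Blackboard], bool]
    action: Callable[[Blackboard], bool]  # returns True if the board changed


def run(board: Blackboard, sources: List[KnowledgeSource], max_cycles: int = 100) -> int:
    """Poll each KS in turn until a full pass changes nothing."""
    for cycle in range(max_cycles):
        changed = False
        for ks in sources:
            if ks.precondition(board) and ks.action(board):
                changed = True
        if not changed:
            return cycle
    return max_cycles


# Hypothetical KS: partials at 220, 440 and 660 Hz suggest the note A3.
partials_ks = KnowledgeSource(
    name="partials-to-note",
    precondition=lambda b: {"220Hz", "440Hz", "660Hz"} <= set(b.levels.get("partials", [])),
    action=lambda b: b.add("notes", "A3"),
)

board = Blackboard({"partials": ["220Hz", "440Hz", "660Hz"]})
cycles = run(board, [partials_ks])
print(board.levels["notes"], cycles)
```

New knowledge sources can be appended to the list passed to `run` without touching the existing ones, which is the modularity the slides point to.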
9 Blackboard Systems
- Music has a naturally hierarchical structure that lends itself well to blackboard systems
- Allow integration of different types of expertise:
  - signal processing KSs at low level
  - human perception KSs at middle level
  - musical knowledge KSs at upper level
10 Blackboard Systems
- Limitation: giving upper-level KSs too much specialized knowledge and influence limits the generality of transcription systems
- Ideal system would not use knowledge above the level of human perception and the most rudimentary understanding of music
- Current trend is to increase the significance of upper-level musical KSs in order to increase success rate
11 Keith Martin (1996a)
- "A Blackboard System for Automatic Transcription of Simple Polyphonic Music"
- Used a blackboard system to transcribe a four-voice Bach chorale with appropriate segregation of voices
- Limited input signal to synthesized piano performances
- Gave system only rudimentary musical knowledge, although the choice of a Bach chorale allowed lower-level KSs to use assumptions that would generally be unacceptable
12 Keith Martin (1996a)
- Front-end system used short-time Fourier transform on input signal
- Equivalent to a filter bank that is a gross approximation of the way the human cochlea processes auditory signals
- Blackboard system is fed sets of associated onset times, frequencies and amplitudes
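A short-time Fourier transform of the kind mentioned above can be sketched in a few lines. This is a naive, standard-library-only illustration, not Martin's front end: the Hann window, the frame size, the hop and the 500 Hz test tone are all assumptions chosen for the example.

```python
import cmath
import math


def stft(signal, frame_size, hop):
    """Return a list of magnitude spectra, one per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        spectrum = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT of one frame at bin k (slow but dependency-free).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


sample_rate = 8000   # Hz, assumed for the example
frame_size = 256
tone = [math.sin(2 * math.pi * 500 * n / sample_rate) for n in range(1024)]

spectra = stft(tone, frame_size, hop=128)
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
peak_hz = peak_bin * sample_rate / frame_size
print(len(spectra), peak_bin, peak_hz)
```

Each frame gives one time slice of frequencies and amplitudes; a front end would then pick peaks and group them by onset time before writing them to the blackboard.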
13 Keith Martin (1996a)
- Knowledge sources made five classes of hierarchically organized hypotheses:
  - Tracks
  - Partials
  - Notes
  - Intervals
  - Chords
14 Keith Martin (1996a)
- Three types of knowledge sources:
  - Garbage collection
  - Physics
  - Musical practice
- Thirteen knowledge sources in all
- Each KS only authorized to make certain classes of hypotheses
15 Keith Martin (1996a)
- KSs with access to upper-level hypotheses can put pressure on KSs with lower-level access to make certain hypotheses, and vice versa
- Example: if the hypotheses have been made that the notes C and G are present in a beat, a KS with information about chords might put forward the hypothesis that there is a C chord, thus putting pressure on other KSs to find an E or Eb
- Used a sequential scheduler to coordinate KSs
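The C and G example above can be illustrated with a toy chord rule. This is only a sketch of the top-down idea: the function name, the pitch-class encoding and the restriction to root plus perfect fifth are assumptions for the example, not Martin's actual knowledge source.

```python
# Pitch classes: C=0, C#=1, ..., B=11
NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]


def expected_thirds(pitch_classes):
    """If a root and its perfect fifth are both present, propose a chord on
    that root and list the thirds (minor, major) still to be looked for."""
    present = set(pitch_classes)
    proposals = []
    for root in sorted(present):
        if (root + 7) % 12 in present:
            candidates = ((root + 3) % 12, (root + 4) % 12)
            missing = [NOTE_NAMES[pc] for pc in candidates if pc not in present]
            proposals.append((NOTE_NAMES[root], missing))
    return proposals


proposals = expected_thirds([0, 7])  # C and G heard in the same beat
print(proposals)  # [('C', ['Eb', 'E'])]
```

The returned list of missing thirds is what a higher-level KS would post to encourage lower-level KSs to re-examine the signal for those pitches.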
16 Keith Martin (1996b)
- "Automatic Transcription of Simple Polyphonic Music: Robust Front End Processing"
- Previous system often misidentified octaves
- Attempted to improve performance by shifting the octave identification task from a top-down process to a bottom-up process
17 Keith Martin (1996b)
- Proposes the use of log-lag correlograms in the front end
- Models the inner hair cells in the cochlea with a bank of filters
- Determines pitch by measuring the periodic energy in each filter channel as a function of lag
- Correlograms are now the basic unit fed to the blackboard system
- No definitive results as to which approach is better
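The idea of measuring periodic energy as a function of lag can be sketched with a plain autocorrelation of a single channel. This is a simplification and an assumption-laden one: a real log-lag correlogram runs this over every channel of a cochlear filter bank and samples lag on a logarithmic axis, whereas the sketch below uses one unfiltered channel, linear lag and a hypothetical 200 Hz test tone.

```python
import math


def autocorrelation(channel, max_lag):
    """Periodic energy of one channel as a function of lag (in samples)."""
    n = len(channel) - max_lag
    return [sum(channel[i] * channel[i + lag] for i in range(n))
            for lag in range(max_lag + 1)]


def best_lag(channel, min_lag, max_lag):
    """Lag with the largest autocorrelation, ignoring very short lags."""
    acf = autocorrelation(channel, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])


sample_rate = 8000  # Hz, assumed for the example
f0 = 200            # Hz, hypothetical test tone
channel = [math.sin(2 * math.pi * f0 * i / sample_rate) for i in range(800)]

lag = best_lag(channel, min_lag=20, max_lag=60)
pitch_hz = sample_rate / lag
print(lag, pitch_hz)
```

Because the peak sits at the period of the fundamental rather than at a harmonic, this kind of measure gives bottom-up evidence about the octave.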
18 Kashino, Nakadai, Kinoshita and Tanaka (1995)
- "Application of Bayesian Probability Networks to Music Scene Analysis"
- Work slightly preceded that of Martin
- Used test patterns involving more than one instrument
- Uses principles of stream segregation from auditory scene analysis
- Implements more high-level musical knowledge
- Uses a Bayesian network instead of Martin's simple scheduler to coordinate KSs
19 Kashino, Nakadai, Kinoshita and Tanaka (1995)
- Knowledge sources used:
  - Chord transition dictionary
  - Chord-note relation
  - Chord naming rules
  - Tone memory
  - Timbre models
  - Human perception rules
- Used very specific instrument timbres and musical rules, so has limited general applicability
20 Kashino, Nakadai, Kinoshita and Tanaka (1995)
- Tone memory: frequency components of different instruments played with different parameters
- Found that the integration of tone memory with the other KSs greatly improved success rates
21 Kashino, Nakadai, Kinoshita and Tanaka (1995)
- Bayesian networks are well known for finding good solutions despite noisy input or missing data
- Often used in implementing learning methods that trade off prior belief in a hypothesis against its agreement with current data
- Therefore seem to be a good choice for coordinating KSs
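The trade-off between prior belief and agreement with current data is Bayes' rule. The sketch below applies it to a single pair of competing octave hypotheses; it is not a full Bayesian network, and the hypothesis names and all probabilities are made-up numbers for illustration.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a discrete set of hypotheses.

    priors[h] is the belief before seeing the data;
    likelihoods[h] is P(data | h).
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical numbers: musical context favours C4, the spectrum favours C5.
priors = {"C4": 0.7, "C5": 0.3}
likelihoods = {"C4": 0.2, "C5": 0.6}

beliefs = posterior(priors, likelihoods)
print(beliefs)
```

With these numbers the evidence outweighs the prior and C5 ends up slightly more probable (0.5625 against 0.4375); stronger priors or weaker evidence would tip the result the other way.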
22 Kashino, Nakadai, Kinoshita and Tanaka (1995)
- No experimental comparisons of this approach and Martin's simple scheduler
- Only used simple test patterns rather than real music
23 Kashino and Hagita (1996)
- "A Music Scene Analysis System with the MRF-Based Information Integration Scheme"
- Suggests replacing Bayesian networks with a Markov Random Field (MRF) hypothesis network
- Successful in correcting the two most common problems in the previous system:
  - Misidentification of instruments
  - Incorrect octave labelling
24 Kashino and Hagita (1996)
- MRF-based networks use simulated annealing to converge to a low-energy state
- MRF approach enables information to be integrated on a multiply connected hypothesis network
- Bayesian networks only allow singly connected networks
- Could now deal with two kinds of transition information within a single hypothesis network:
  - chord transitions
  - note transitions
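Simulated annealing toward a low-energy labelling can be sketched on a toy problem. Everything here is an assumption for illustration: the binary labels, the energy function (disagreement with observations plus a penalty for neighbouring labels that differ), the linear cooling schedule and the simple chain of nodes, which is far smaller and less connected than the hypothesis network the slides describe.

```python
import math
import random


def energy(labels, observed, coupling=1.0):
    """Data term: labels that disagree with the observations.
    Smoothness term: neighbouring labels that differ."""
    data = sum(1.0 for l, o in zip(labels, observed) if l != o)
    smooth = sum(coupling for a, b in zip(labels, labels[1:]) if a != b)
    return data + smooth


def anneal(observed, steps=2000, start_temp=2.0, seed=0):
    rng = random.Random(seed)
    labels = list(observed)
    current = energy(labels, observed)
    best, best_energy = list(labels), current
    for step in range(steps):
        temp = start_temp * (1.0 - step / steps) + 1e-3  # linear cooling
        i = rng.randrange(len(labels))
        labels[i] ^= 1  # propose flipping one binary label
        candidate = energy(labels, observed)
        accept = candidate <= current or rng.random() < math.exp((current - candidate) / temp)
        if accept:
            current = candidate
            if current < best_energy:
                best, best_energy = list(labels), current
        else:
            labels[i] ^= 1  # reject: undo the flip
    return best, best_energy


observed = [0, 0, 1, 0, 0, 1, 1, 1]
best_labels, best_energy = anneal(observed)
print(best_labels, best_energy)
```

Early on, the high temperature lets the search accept moves that raise the energy, which is what allows it to escape poor local minima before it settles.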
25 Kashino and Hagita (1996)
- Instrument and octave identification errors corrected, but some new errors introduced
- Overall, performed roughly 10% better than the Bayesian-based system at transcribing a 3-part arrangement of Auld Lang Syne
- Still only had a recognition rate of 71.7%
26 Kashino and Murase (1998)
- Shifts some work away from the blackboard system by feeding it higher-level information
- Simplifies and mathematically formalizes the notion of knowledge sources
- Switches back to a Bayesian network
- Perhaps not truly a blackboard system anymore
- Has very good recognition rate
- Scalability of system is seriously compromised by new approach
27 Kashino and Murase (1998)
- Uses adaptive template matching
- Implemented using a bank of filters arranged in parallel and a number of templates corresponding to particular notes played by particular instruments
- The correlation between the outputs of the filters is calculated and a match is then made to one of the templates
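The matching step can be sketched as a normalized correlation between a vector of filter outputs and a set of stored templates. This is a rough illustration under stated assumptions: the four-channel vectors and their values are invented, the templates are fixed (the adaptive updating that gives the method its name is not shown), and cosine similarity stands in for whatever correlation measure the original system uses.

```python
import math


def normalized_correlation(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_template(filter_outputs, templates):
    """Return the (instrument, note) key whose template correlates best."""
    return max(templates,
               key=lambda key: normalized_correlation(filter_outputs, templates[key]))


# Hypothetical templates: relative energy in four filter channels.
templates = {
    ("piano", "C4"): [1.0, 0.5, 0.25, 0.1],
    ("flute", "C4"): [1.0, 0.1, 0.05, 0.0],
    ("violin", "C4"): [0.6, 0.8, 0.7, 0.5],
}
observed = [0.9, 0.45, 0.3, 0.1]

match = best_template(observed, templates)
print(match)  # ('piano', 'C4')
```

The cost of `best_template` grows with the number of templates, one per note per instrument, which hints at why adding many instruments becomes expensive.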
28 Kashino and Murase (1998)
- Achieved recognition rate of 88.5% on real recordings of piano, violin and flute
- Including templates for many more instruments could make adaptive template matching intractable
- Particularly a problem for instruments with:
  - Similar frequency spectra
  - A great deal of spectral variation from note to note
29 Hainsworth and Macleod (2001)
- "Automatic Bass Line Transcription from Polyphonic Music"
- Wanted to be able to extract a single given instrument from an arbitrary musical signal
- Contrast to previous approaches of using recordings of only one instrument or a set of pre-defined instruments
30 Hainsworth and Macleod (2001)
- Chose to work with bass:
  - Can filter out high frequencies
  - Notes usually fairly steady
- Used simple mathematical relations to trim hypotheses rather than a true blackboard system
- Had a 78.7% success rate on a Miles Davis recording
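Filtering out high frequencies before looking for a bass line can be illustrated with a generic first-order low-pass filter. The slides do not describe the filter Hainsworth and Macleod actually used, so the one-pole design, the 300 Hz cutoff and the two test tones below are assumptions made purely for the example.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """First-order low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


sample_rate = 8000  # Hz, assumed for the example
bass = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(1600)]
treble = [math.sin(2 * math.pi * 2000 * n / sample_rate) for n in range(1600)]

bass_out = one_pole_lowpass(bass, cutoff_hz=300, sample_rate=sample_rate)
treble_out = one_pole_lowpass(treble, cutoff_hz=300, sample_rate=sample_rate)
print(round(rms(bass_out), 3), round(rms(treble_out), 3))
```

The 100 Hz tone passes almost unchanged while the 2000 Hz tone is strongly attenuated, so later stages see far fewer competing partials.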
31 Bello and Sandler (2000)
- "Blackboard Systems and Top-Down Processing for the Transcription of Simple Polyphonic Music"
- Return to a true blackboard system
- Based on Martin's implementation, using a conventional scheduler
- Refines knowledge sources and adds high-level musical knowledge
- Implements one of the knowledge sources as a neural network
32 Bello and Sandler (2000)
- The chord recognizer KS is a feedforward network
- Trained using spectrograms of different chords played on a piano
- Trained network is fed a spectrogram and outputs possible chords
- Can therefore output more than one hypothesis at each iteration
- Gives other KSs more information and allows parallel exploration of the solution space
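The way a feedforward network can emit several chord hypotheses at once can be sketched with a tiny two-layer network and an output threshold. This is not Bello and Sandler's network: the layer sizes, the hand-set (untrained) weights, the three-value input vector and the 0.5 threshold are all invented for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, weights, biases):
    """One fully connected layer with sigmoid activations."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def chord_hypotheses(spectrum, hidden, output, labels, threshold=0.5):
    """Return every (label, score) whose output unit exceeds the threshold."""
    h = forward(spectrum, *hidden)
    scores = forward(h, *output)
    return [(label, score) for label, score in zip(labels, scores) if score > threshold]


# Hand-set, untrained weights chosen only to make the example concrete.
hidden = ([[4.0, 0.0, 4.0], [0.0, 4.0, 0.0]], [-4.0, -2.0])
output = ([[3.0, -3.0], [2.0, -1.0], [-3.0, 3.0]], [-1.0, -1.0, 0.0])
labels = ["C major", "C minor", "G major"]

hypotheses = chord_hypotheses([1.0, 0.0, 1.0], hidden, output, labels)
print([label for label, _ in hypotheses])  # ['C major', 'C minor']
```

Both surviving labels would be written to the blackboard as competing hypotheses, leaving other KSs to decide between them.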
33 Bello and Sandler (2000)
- Could automatically retrain network to recognize spectrograms of other instruments with no manual modifications needed
- Preliminary testing showed tendency to misidentify octaves and make incorrect identification of note onsets
- These problems could potentially be corrected by the signal processing system that feeds the blackboard system
34 Conclusions
- Bass transcription system and more recent work of Kashino are useful for specific applications, but have limited potential for general transcription purposes
- True blackboard approach scales well and appears to hold the most potential for general-purpose polyphonic transcription
35 Conclusions
- Use of adaptive learning in knowledge sources seems promising
- Interchangeable modules could be automatically trained to specialize in different areas
- Could have semi-automatic transcription, where user chooses correct modules and system performs transcription using them