Using Blackboard Systems for Polyphonic Transcription

1
Using Blackboard Systems for Polyphonic
Transcription

A Literature Review
by Cory McKay

2
Outline

Intro to polyphonic transcription
Intro to blackboard systems
Keith Martin's work
Kunio Kashino's work
Recent contributions
Conclusion

3
Polyphonic Transcription

Represent an audio signal as a score
Must segregate notes belonging to different
voices
Problems: variations of timbre within a voice,
voice crossing, identification of correct octave
No successful general-purpose system to date

4
Polyphonic Transcription

Can use simplified models:
Music for a single instrument (e.g. piano)
Extract only a given instrument from mix
Use music which obeys restrictive rules
Simplified systems have had success rates of
between 80% and 90%
These rates may be exaggerated, since only very
limited testing suites are generally used

5
Polyphonic Transcription

Systems to date generally identify only rhythm,
pitch and voice
Would like systems that also identify other
notated aspects such as dynamics and vibrato
Ideal is to have system that can identify and
understand parameters of music that humans hear
but do not notate

6
Blackboard Systems

Used in AI for decades but only applied to music
transcription in the early 1990s
Term "blackboard" comes from the notion of a group of
experts standing around a blackboard working
together to solve a problem
Each expert writes contributions on blackboard
Experts watch problem evolve on blackboard,
making changes until a solution is reached

7
Blackboard Systems

Blackboard is a central dataspace
Usually arranged in hierarchy so that input is at
lowest level and output is at highest
Experts are called knowledge sources
KSs generally consist of a set of heuristics and
a precondition whose satisfaction results in a
hypothesis that is written on blackboard
Each KS forms hypotheses based on information
from front end of system and hypotheses presented
by other KSs

8
Blackboard Systems

Problem is solved when all KSs are satisfied with
all hypotheses on blackboard to within a given
margin of error
Eliminates need for global control module
Each KS can be easily updated and new KSs can be
added with little difficulty
Combines top-down and bottom-up processing
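
The control flow described on the last two slides can be sketched in a few lines. This is a minimal illustration, not any of the reviewed systems: the `Blackboard` and `KnowledgeSource` classes, the string-encoded hypotheses and the two toy KSs are all assumptions made for the example, and the loop simply polls each KS in turn until a full round adds nothing new.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Blackboard:
    """Central dataspace: a set of hypotheses, encoded here as plain strings."""
    hypotheses: Set[str] = field(default_factory=set)


@dataclass
class KnowledgeSource:
    name: str
    precondition: Callable[[Blackboard], bool]  # when may this KS contribute?
    action: Callable[[Blackboard], Set[str]]    # hypotheses it proposes


def run(blackboard: Blackboard, sources: List[KnowledgeSource],
        max_rounds: int = 10) -> int:
    """Poll every KS in turn; stop after the first round that adds nothing."""
    for round_number in range(1, max_rounds + 1):
        changed = False
        for ks in sources:
            if ks.precondition(blackboard):
                new = ks.action(blackboard) - blackboard.hypotheses
                if new:
                    blackboard.hypotheses |= new
                    changed = True
        if not changed:
            return round_number
    return max_rounds


# Hypothetical toy KSs: partials -> notes -> interval.
sources = [
    KnowledgeSource(
        "interval",
        lambda bb: {"note:C4", "note:G4"} <= bb.hypotheses,
        lambda bb: {"interval:C4-G4:perfect fifth"},
    ),
    KnowledgeSource(
        "note",
        lambda bb: {"partial:262Hz", "partial:392Hz"} <= bb.hypotheses,
        lambda bb: {"note:C4", "note:G4"},
    ),
]

board = Blackboard({"partial:262Hz", "partial:392Hz"})
rounds = run(board, sources)
```

The interval KS is listed first on purpose: it cannot fire until the note KS has written its hypotheses, so the order in which KSs are declared does not determine the order in which the problem is solved.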

9
Blackboard Systems

Music has a naturally hierarchical structure that
lends itself well to blackboard systems
Allow integration of different types of
expertise:
signal processing KSs at low level
human perception KSs at middle level
musical knowledge KSs at upper level

10
Blackboard Systems

Limitation: giving upper-level KSs too much
specialized knowledge and influence limits
generality of transcription systems
Ideal system would not use knowledge above the
level of human perception and the most
rudimentary understanding of music
Current trend is to increase significance of
upper-level musical KSs in order to increase
success rate

11
Keith Martin (1996 a)

A Blackboard System for Automatic Transcription
of Simple Polyphonic Music
Used a blackboard system to transcribe a
four-voice Bach chorale with appropriate
segregation of voices
Limited input signal to synthesized piano
performances
Gave system only rudimentary musical knowledge,
although choice of Bach chorale allowed the use
of generally unacceptable assumptions by lower
level KSs

12
Keith Martin (1996 a)

Front-end system used short-time Fourier
transform on input signal
Equivalent to a filter bank that is a gross
approximation of the way the human cochlea processes
auditory signals
Blackboard system fed sets of associated onset
times, frequencies and amplitudes
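
A short-time Fourier transform front end of this general kind might look like the sketch below. It is an illustration under stated assumptions, not Martin's implementation: the function name `stft_peaks`, the Hann window, the frame and hop sizes and the peak-picking rule are all choices made for the example, and the times returned are frame start times rather than detected note onsets.

```python
import numpy as np


def stft_peaks(signal, sample_rate, frame_size=1024, hop=512, n_peaks=1):
    """Return (time, frequency, amplitude) for the strongest bins of each frame."""
    window = np.hanning(frame_size)
    results = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size] * window
        magnitude = np.abs(np.fft.rfft(frame))
        # Indices of the n_peaks largest bins, strongest first.
        strongest = np.argsort(magnitude)[-n_peaks:][::-1]
        for bin_index in strongest:
            results.append((start / sample_rate,
                            bin_index * sample_rate / frame_size,
                            float(magnitude[bin_index])))
    return results


# One second of a 440 Hz sine as a stand-in for real audio.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
peaks = stft_peaks(tone, sample_rate)
```

Each reported frequency is a bin centre, so it can differ from the true frequency by up to one bin width (`sample_rate / frame_size`).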

13
Keith Martin (1996 a)

Knowledge sources made five classes of
hierarchically organized hypotheses:
Tracks
Partials
Notes
Intervals
Chords

14
Keith Martin (1996 a)

Three types of knowledge sources:
Garbage collection
Physics
Musical practice
Thirteen knowledge sources in all
Each KS only authorized to make certain classes
of hypotheses

15
Keith Martin (1996 a)

KSs with access to upper-level hypotheses can put
pressure on KSs with lower-level access to make
certain hypotheses and vice versa
Example: if the hypotheses have been made that
the notes C and G are present in a beat, a KS
with information about chords might put forward
the hypothesis that there is a C chord, thus
putting pressure on other KSs to find an E or Eb
Used a sequential scheduler to coordinate KSs
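
The C and G example can be made concrete with a small sketch. The function `expected_thirds` and the pitch-class table are hypothetical names introduced here; the code only works out which third would complete a major or minor triad for every root and fifth pair already on the blackboard, which is the "pressure" a chord-level KS could pass down.

```python
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]


def expected_thirds(present):
    """Map each root whose fifth is also present to the thirds worth looking for."""
    expectations = {}
    for root_index, root in enumerate(NOTE_NAMES):
        fifth = NOTE_NAMES[(root_index + 7) % 12]  # perfect fifth: 7 semitones
        if root in present and fifth in present:
            expectations[root] = {
                f"{root} major": NOTE_NAMES[(root_index + 4) % 12],
                f"{root} minor": NOTE_NAMES[(root_index + 3) % 12],
            }
    return expectations


pressure = expected_thirds({"C", "G"})
# pressure == {"C": {"C major": "E", "C minor": "Eb"}}
```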

16
Keith Martin (1996 b)

Automatic Transcription of Simple Polyphonic
Music: Robust Front End Processing
Previous system often misidentified octaves
Attempted to improve performance by shifting
octave identification task from a top-down
process to a bottom-up process

17
Keith Martin (1996 b)

Proposes the use of log-lag correlograms in front
end
Models the inner hair cells in the cochlea with a
bank of filters
Determines pitch by measuring the periodic energy
in each filter channel as a function of lag
Correlograms now basic unit fed to blackboard
system
No definitive results as to which approach is
better
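
The idea of measuring periodic energy per channel as a function of lag can be sketched as follows. This is a simplified illustration, not Martin's front end: the filter bank is a crude FFT mask rather than a cochlear model, the lag axis is linear rather than logarithmic, and the names `band_channels` and `correlogram` and the band edges are assumptions made for the example.

```python
import numpy as np


def band_channels(signal, sample_rate, edges):
    """Crude filter bank: keep only the FFT bins inside each band."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    channels = []
    for low, high in zip(edges[:-1], edges[1:]):
        mask = (freqs >= low) & (freqs < high)
        channels.append(np.fft.irfft(spectrum * mask, n=len(signal)))
    return channels


def correlogram(channels, max_lag):
    """Rows are channels, columns are lags (autocorrelation of each channel)."""
    rows = []
    for channel in channels:
        rectified = np.maximum(channel, 0.0)      # half-wave rectification
        rectified = rectified - rectified.mean()  # remove the DC offset
        rows.append([np.dot(rectified[:len(rectified) - lag], rectified[lag:])
                     for lag in range(max_lag + 1)])
    return np.array(rows)


sample_rate = 8000
t = np.arange(2000) / sample_rate
# Harmonic tone with a 200 Hz fundamental, i.e. a period of 40 samples.
tone = sum(np.sin(2 * np.pi * 200.0 * k * t) / k for k in range(1, 6))

channels = band_channels(tone, sample_rate, edges=[100, 300, 500, 700, 900, 1100])
gram = correlogram(channels, max_lag=60)

# Summing over channels gives one curve; its largest peak away from lag 0
# marks the common period of all channels.
summary = gram.sum(axis=0)
min_lag = 20
period = min_lag + int(np.argmax(summary[min_lag:]))
pitch_hz = sample_rate / period
```

Because every channel is periodic at the fundamental period even when it contains only a higher harmonic, the summed curve peaks at the fundamental's lag, which is why this representation is less prone to octave errors than picking a single spectral peak.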

18
Kashino, Nakadai, Kinoshita and Tanaka (1995)

Application of Bayesian Probability Networks to
Music Scene Analysis
Work slightly preceded that of Martin
Used test patterns involving more than one
instrument
Uses principles of stream segregation from
auditory scene analysis
Implements more high-level musical knowledge
Uses Bayesian network instead of Martin's simple
scheduler to coordinate KSs

19
Kashino, Nakadai, Kinoshita and Tanaka (1995)

Knowledge sources used:
Chord transition dictionary
Chord-note relation
Chord naming rules
Tone memory
Timbre models
Human perception rules
Used very specific instrument timbres and musical
rules, so has limited general applicability

20
Kashino, Nakadai, Kinoshita and Tanaka (1995)

Tone memory: frequency components of different
instruments played with different parameters
Found that the integration of tone memory with
the other KSs greatly improved success rates

21
Kashino, Nakadai, Kinoshita and Tanaka (1995)

Bayesian networks well known for finding good
solutions despite noisy input or missing data
Often used in implementing learning methods that
trade off prior belief in a hypothesis against
its agreement with current data
Therefore seem to be a good choice for
coordinating KSs
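
The trade-off between prior belief and agreement with current data is Bayes' rule. The sketch below applies it to two competing octave hypotheses; the function name `posterior` and all of the probabilities are invented for illustration and do not come from the paper.

```python
def posterior(priors, likelihoods):
    """Bayes' rule: P(h | data) is proportional to P(data | h) * P(h)."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical numbers: musical context favours C4, the spectrum favours C5.
priors = {"C4": 0.7, "C5": 0.3}
likelihoods = {"C4": 0.2, "C5": 0.6}
beliefs = posterior(priors, likelihoods)
# beliefs is approximately {"C4": 0.4375, "C5": 0.5625}
```

Here the evidence is strong enough to overturn the prior, but only narrowly; with a weaker likelihood ratio the prior would win.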

22
Kashino, Nakadai, Kinoshita and Tanaka (1995)

No experimental comparisons of this approach and
Martin's simple scheduler
Only used simple test patterns rather than real
music

23
Kashino and Hagita (1996)

A Music Scene Analysis System with the MRF-Based
Information Integration Scheme
Suggests replacing Bayesian networks with Markov
Random Field hypothesis network
Successful in correcting the two most common problems
in previous system:
Misidentification of instruments
Incorrect octave labelling

24
Kashino and Hagita (1996)

MRF-based networks use simulated annealing to
converge to a low-energy state
MRF approach enables information to be integrated
on a multiply connected hypothesis network
Bayesian networks only allow singly connected
networks
Could now deal with two kinds of transition
information within a single hypothesis network:
chord transitions
note transitions
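
Simulated annealing on a small, multiply connected label network can be sketched as below. This is a generic illustration rather than the system described in the paper: the four sites, the two octave labels, the cost values, the linear cooling schedule and the function names `energy` and `anneal` are all assumptions. The edges form a loop, so the network is not singly connected.

```python
import math
import random


def energy(labels, unary, edges, disagreement_cost):
    """Per-site costs plus a penalty for each edge whose two ends disagree."""
    total = sum(unary[i][label] for i, label in enumerate(labels))
    total += sum(disagreement_cost for a, b in edges if labels[a] != labels[b])
    return total


def anneal(initial, choices, unary, edges, disagreement_cost,
           steps=2000, start_temp=2.0, seed=0):
    rng = random.Random(seed)
    state = list(initial)
    current = energy(state, unary, edges, disagreement_cost)
    best_state, best_energy = list(state), current
    for step in range(steps):
        temperature = start_temp * (1.0 - step / steps) + 1e-3  # linear cooling
        proposal = list(state)
        proposal[rng.randrange(len(state))] = rng.choice(choices)
        candidate = energy(proposal, unary, edges, disagreement_cost)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if (candidate <= current
                or rng.random() < math.exp((current - candidate) / temperature)):
            state, current = proposal, candidate
            if current < best_energy:
                best_state, best_energy = list(state), current
    return best_state, best_energy


# Hypothetical costs: how poorly each octave label fits the data at each site.
unary = [
    {"C3": 0.2, "C4": 1.0},
    {"C3": 0.4, "C4": 0.6},
    {"C3": 0.9, "C4": 0.3},
    {"C3": 0.1, "C4": 1.2},
]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a loop: multiply connected
initial = ["C4", "C4", "C4", "C4"]

best_state, best_energy = anneal(initial, ["C3", "C4"], unary, edges, 0.5)
```

On this toy problem the lowest-energy labelling is all C3: site 2 taken alone prefers C4, but the disagreement penalties on its two edges outweigh that preference. This is the kind of context-driven correction of octave labels that the approach is meant to provide.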

25
Kashino and Hagita (1996)

Instrument and octave identification errors
corrected, but some new errors introduced
Overall, performed roughly 10% better than
Bayesian-based system at transcribing 3-part
arrangement of Auld Lang Syne
Still only had a recognition rate of 71.7%

26
Kashino and Murase (1998)

Shifts some work away from blackboard system by
feeding it higher-level information
Simplifies and mathematically formalizes notion
of knowledge sources
Switches back to Bayesian network
Perhaps not truly a blackboard system anymore
Has very good recognition rate
Scalability of system is seriously compromised by
new approach

27
Kashino and Murase (1998)

Uses adaptive template matching
Implemented using a bank of filters arranged in
parallel and a number of templates corresponding
to particular notes played by particular
instruments
The correlation between the outputs of the
filters is calculated and a match is then made to
one of the templates
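
The matching step can be sketched as a normalized correlation between a vector of filter outputs and each stored template. This is an illustration only: the template values, the four-channel layout and the function name `best_template` are made up, cosine similarity stands in for whatever correlation measure the paper uses, and the adaptive updating of templates is not modelled.

```python
import numpy as np


def best_template(observed, templates):
    """Return the key of the best-matching template and all similarity scores."""
    obs = np.asarray(observed, dtype=float)
    scores = {}
    for key, template in templates.items():
        ref = np.asarray(template, dtype=float)
        scores[key] = float(np.dot(obs, ref)
                            / (np.linalg.norm(obs) * np.linalg.norm(ref)))
    return max(scores, key=scores.get), scores


# Hypothetical templates: energy in four filter channels per (instrument, note).
templates = {
    ("piano", "C4"): [1.0, 0.5, 0.25, 0.1],
    ("flute", "C4"): [1.0, 0.1, 0.05, 0.0],
    ("violin", "C4"): [0.6, 0.8, 0.7, 0.5],
}

match, scores = best_template([0.9, 0.15, 0.05, 0.02], templates)
# match == ("flute", "C4")
```

Every observation is compared against every template, so the cost grows with the number of instruments and notes covered.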

28
Kashino and Murase (1998)

Achieved recognition rate of 88.5% on real
recordings of piano, violin and flute
Including templates for many more instruments
could make adaptive template matching intractable
Particularly a problem for instruments with:
Similar frequency spectra
A great deal of spectral variation from note to
note

29
Hainsworth and Macleod (2001)

Automatic Bass Line Transcription from
Polyphonic Music
Wanted to be able to extract a single given
instrument from an arbitrary musical signal
Contrast to previous approaches of using
recordings of only one instrument or a set of
pre-defined instruments

30
Hainsworth and Macleod (2001)

Chose to work with bass:
Can filter out high frequencies
Notes usually fairly steady
Used simple mathematical relations to trim
hypotheses rather than a true blackboard system
Had a 78.7% success rate on a Miles Davis
recording

31
Bello and Sandler (2000)

Blackboard Systems and Top-Down Processing for
the Transcription of Simple Polyphonic Music
Return to a true blackboard system
Based on Martin's implementation, using a
conventional scheduler
Refines knowledge sources and adds high-level
musical knowledge
Implements one of the knowledge sources as a neural
network

32
Bello and Sandler (2000)

The chord recognizer KS is a feedforward network
Trained using the spectrograms of different
chords of a piano
Trained network is fed a spectrogram and outputs
possible chords
Can therefore output more than one hypothesis at
each iteration
Gives other KSs more information and allows
parallel exploration of solution space
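
How a feedforward network can emit several hypotheses at once is sketched below. Nothing here is taken from Bello and Sandler's network: the weights are set by hand rather than trained, the input is a 12-element pitch-class energy vector instead of a spectrogram, and the three output classes, the threshold and the names `chord_hypotheses` and `profile` are assumptions made to keep the example small.

```python
import numpy as np

PITCH_CLASSES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]
CHORDS = ["C major", "C minor", "no chord"]

# Hand-set (NOT trained) weights. Hidden units respond to
# root plus fifth (C and G), the major third (E) and the minor third (Eb).
W1 = np.zeros((3, 12))
W1[0, [0, 7]] = 0.5
W1[1, 4] = 1.0
W1[2, 3] = 1.0
W2 = np.array([[2.0, 3.0, -3.0],    # C major: wants E, penalizes Eb
               [2.0, -3.0, 3.0],    # C minor: wants Eb, penalizes E
               [-2.0, 0.0, 0.0]])   # no chord: penalizes root plus fifth
b2 = np.array([-1.5, -1.5, 0.5])


def chord_hypotheses(pitch_class_energy, threshold=0.5):
    """Forward pass; every output above the threshold becomes a hypothesis."""
    hidden = np.maximum(W1 @ pitch_class_energy, 0.0)      # ReLU layer
    outputs = 1.0 / (1.0 + np.exp(-(W2 @ hidden + b2)))    # sigmoid layer
    return [(chord, float(p)) for chord, p in zip(CHORDS, outputs)
            if p > threshold]


def profile(names):
    """Build a pitch-class energy vector with 1.0 at each named pitch class."""
    vector = np.zeros(12)
    for name in names:
        vector[PITCH_CLASSES.index(name)] = 1.0
    return vector


ambiguous = chord_hypotheses(profile(["C", "G"]))
resolved = chord_hypotheses(profile(["C", "E", "G"]))
```

With only C and G present, both C major and C minor clear the threshold, so two hypotheses are handed to the other KSs; once E is present, only C major remains.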

33
Bello and Sandler (2000)

Could automatically retrain network to recognize
spectrograms of other instruments with no manual
modifications needed
Preliminary testing showed tendency to
misidentify octaves and make incorrect
identification of note onsets
These problems could potentially be corrected by
signal processing system that feeds blackboard
system

34
Conclusions

Bass transcription system and more recent work of
Kashino useful for specific applications, but
limited potential for general transcription
purposes
True blackboard approach scales well and appears
to hold the most potential for general-purpose
polyphonic transcription

35
Conclusions

Use of adaptive learning in knowledge sources
seems promising
Interchangeable modules could be automatically
trained to specialize in different areas
Could have semi-automatic transcription, where
user chooses correct modules and system performs
transcription using them
