Title: Elementary Acoustics
1 Elementary Acoustics
- based on
- http://www.ling.mq.edu.au/units/sph301/main/schedule.html (no longer online)
2 What is Sound?
Sound is a wave-like disturbance of a physical medium.
There are two classes of wave that can disturb a physical medium: transverse waves and longitudinal waves.
In transverse waves, the elements of the medium move orthogonally (at 90°) to the direction of movement of the wave.
A typical example of a transverse wave is a wave
pattern on the surface of a body of water.
In such a wave the molecules of water move up and
down whilst the wave front moves along the
surface of the water.
3 An example of a transverse wave: a wave induced in a piece of string.
4 Longitudinal waves
In longitudinal waves the elements of the medium
move back and forth in line with the direction
of propagation of the wave fronts.
In a spring, a hand can induce a longitudinal wave by periodically moving back and forth along the axis of the spring.
This causes the regions of high and low spring
compression to move along the spring.
This movement propagates through the spring
producing a series of wavefronts which move
towards the fixed wall with a velocity v.
Individual parts of the spring only move
backwards and forwards short distances in the
direction of wave propagation.
This causes the coils to periodically come closer
to and further from adjacent coils than would be
the case for the spring at rest.
A longitudinal wave is a compression wave in
which particles move back and forth in the
direction of wavefront movement.
5 An example of a longitudinal wave: a wave induced in a spring.
6 Sound is a longitudinal compression wave
Sound is a longitudinal compression wave which
distorts a medium by creating moving fronts of
high and low particle compression.
Sound can occur in any medium (solid, liquid and
gas). Sound cannot occur in a vacuum as there is
no medium to compress.
Individual particles only move short distances
backward and forward in the direction of wave
propagation whilst the compression wave front can
move considerable distances.
Sound in air consists of consecutive regions of
higher and lower air pressure relative to
ambient air pressure (typically 1 atmosphere at
sea level). These fluctuations in air pressure
are extremely small relative to ambient air
pressure.
7 Acoustic Units of Measurement
The wavelength (λ) of a wave is the distance between successive wave fronts (i.e. peak-to-peak distance). Wavelength is measured in metres (m).
The frequency (f) of a wave is the number of times per second that a complete wave cycle passes an observer. Frequency is measured in hertz (Hz), i.e. cycles per second (s⁻¹) in basic units.
The period (T) of a wave is the time it takes for
one wave cycle to pass an observer. The period
is measured in seconds (s) or milliseconds (ms).
The speed or velocity of sound (c) is the number of metres that a wave front travels in one second. The speed of sound is measured in metres per second (m·s⁻¹).
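The definitions above are linked by T = 1/f and c = f·λ. The sketch below computes period and wavelength from frequency; the speed of sound of 343 m/s (dry air at roughly 20 °C) is an assumed value, not one given on the slides.

```python
# Relationships between frequency f (Hz), period T (s), wavelength lambda (m)
# and speed of sound c (m/s): T = 1 / f and c = f * lambda.

SPEED_OF_SOUND_AIR = 343.0  # assumed value for dry air at about 20 degrees C, in m/s


def period_s(frequency_hz: float) -> float:
    """Period T in seconds of a wave with frequency f."""
    return 1.0 / frequency_hz


def wavelength_m(frequency_hz: float, speed_m_s: float = SPEED_OF_SOUND_AIR) -> float:
    """Wavelength lambda in metres, from lambda = c / f."""
    return speed_m_s / frequency_hz


for f in (100.0, 1000.0):
    print(f"{f:6.0f} Hz: T = {period_s(f) * 1000:.1f} ms, "
          f"lambda = {wavelength_m(f):.3f} m")
```

So a 100 Hz tone has a period of 10 ms and, under the 343 m/s assumption, a wavelength of about 3.4 m.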
8 Sound "Amplitude"
The human ear and the microphone (the main
artificial transducer of sound) both measure the
tiny changes in pressure that result from the
passage of a longitudinal wave through a medium.
The average air pressure at sea level is approximately equivalent to the pressure exerted by a column of mercury 76 cm high (in a barometer) at 0 °C under standard gravity. This is defined as 1 atm:
$$1\,\text{atm} = 1.013 \times 10^{5}\,\text{Pa}$$
The sound pressure that is only just perceivable (i.e. the threshold of hearing for a 1000 Hz tone) is taken to be
$$2 \times 10^{-5}\,\text{Pa} = 20\,\mu\text{Pa}$$
The threshold of pain (ie. the maximum sound
pressure that can be perceived without pain) is
about 100 Pa or about 1/1000 atm, which is
5,000,000 times the threshold sound pressure.
9 The intensity of a sound with a sound pressure of 20 µPa is very close to
$$10^{-12}\,\text{W}\cdot\text{m}^{-2}$$
The sensitivity of the ear to changes in
intensity is not related linearly to either
intensity or pressure.
The ear's sensitivity to sound intensity or sound pressure is approximately logarithmic, so levels are measured in decibels (dB):
$$\text{dB} = 10 \log_{10}\left(\frac{I_1}{I_2}\right)$$
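A minimal sketch of the decibel formula, using the slides' reference values (20 µPa and 10⁻¹² W·m⁻²). The pressure form, 20·log10(p1/p2), is not on the slide; it follows from the formula above because intensity is proportional to the square of sound pressure.

```python
import math

P_REF = 20e-6   # threshold of hearing at 1000 Hz, in Pa (20 uPa)
I_REF = 1e-12   # corresponding reference intensity, in W/m^2


def db_from_intensity(i1: float, i2: float = I_REF) -> float:
    """Level difference in dB between intensities I1 and I2: 10 * log10(I1 / I2)."""
    return 10.0 * math.log10(i1 / i2)


def db_from_pressure(p1: float, p2: float = P_REF) -> float:
    """Level difference in dB between sound pressures p1 and p2.

    Intensity is proportional to pressure squared, so
    10 * log10(p1**2 / p2**2) = 20 * log10(p1 / p2).
    """
    return 20.0 * math.log10(p1 / p2)


print(f"threshold of hearing: {db_from_pressure(20e-6):.1f} dB")
print(f"threshold of pain (100 Pa): {db_from_pressure(100.0):.1f} dB")
print(f"doubling the intensity: {db_from_intensity(2.0, 1.0):.2f} dB")
```

The 5,000,000:1 pressure ratio between the threshold of pain and the threshold of hearing therefore corresponds to about 134 dB, and doubling the intensity adds about 3 dB.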
The acoustic intensity, or average rate at which
work is being transferred through a unit area
(on the surface of the spherical wave front
radiating out from the source in all directions)
diminishes with distance in accordance with the
inverse square law:
$$I \propto \frac{1}{r^{2}}$$
where I is the intensity of a sound and r is the distance from the source of the sound.
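As a rough illustration of the inverse square law, the sketch below scales a reference intensity with distance and converts the change to decibels. It assumes a point source radiating equally in all directions with no reflections (free field).

```python
import math


def intensity_at(distance_m: float, ref_intensity: float, ref_distance_m: float) -> float:
    """Inverse square law: I(r) = I(r0) * (r0 / r)**2 for a point source
    radiating equally in all directions (free field, no reflections assumed)."""
    return ref_intensity * (ref_distance_m / distance_m) ** 2


def level_change_db(r1_m: float, r2_m: float) -> float:
    """Change in level (dB) when moving from distance r1 to distance r2."""
    return 10.0 * math.log10((r1_m / r2_m) ** 2)


# Doubling the distance quarters the intensity, i.e. a drop of about 6 dB.
print(intensity_at(2.0, ref_intensity=1.0, ref_distance_m=1.0))  # 0.25
print(f"{level_change_db(1.0, 2.0):.2f} dB")
```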
11 A two-dimensional simulation of the inverse square law.
12 Simple Harmonic Motion
A single cycle of a sine wave can be depicted as a point moving anti-clockwise around a circle (the two are mathematically equivalent).
At its starting point (when the sine wave is moving up from the baseline), the point is at 0° (0 radians; the 3 o'clock position on the circle).
At the top of the sine wave's first peak, it is equivalent to being at the 90° position (π/2 radians; 12 o'clock) on the circle.
When the sine wave reaches the baseline on its way down, it is equivalent to the 180° position (π radians; 9 o'clock).
When the sine wave reaches the bottom of the first dip, it is at 270° (3π/2 radians; 6 o'clock).
When it completes its first cycle, it is back at the starting point: 360° = 0° (2π = 0 radians).
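The circle picture can be checked numerically: a point at angle θ on a unit circle sits at (cos θ, sin θ), and its height sin θ traces the sine wave. This is a minimal sketch; the rounding only hides floating-point noise such as sin(π) ≈ 1.2 × 10⁻¹⁶.

```python
import math


def point_on_circle(theta: float) -> tuple[float, float]:
    """Position (x, y) of a point on a unit circle at angle theta (radians),
    measured anti-clockwise from the 3 o'clock position."""
    x = round(math.cos(theta), 12) + 0.0  # "+ 0.0" turns -0.0 into 0.0
    y = round(math.sin(theta), 12) + 0.0
    return x, y


for theta, clock in [(0.0, "3 o'clock"), (math.pi / 2, "12 o'clock"),
                     (math.pi, "9 o'clock"), (3 * math.pi / 2, "6 o'clock"),
                     (2 * math.pi, "3 o'clock again")]:
    x, y = point_on_circle(theta)
    print(f"{math.degrees(theta):5.0f} deg  {clock:15s}  height (sine) = {y:+.2f}")
```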
13 Simple Harmonic Motion
14 Continuous Waveforms and Damping
A sine wave is a waveform generated by a system
that is characterised by simple harmonic motion.
An ideal sine wave, which exhibits simple harmonic motion, loses no energy (or has its energy replenished from outside the system).
A sound wave exhibiting these characteristics
would be a pure tone.
A continuous waveform - a pure tone
15 Damping
The loss of energy in an oscillating system is
known as damping.
A damped waveform is non-continuous.
Damping is a characteristic of systems that
produce sounds with very complex spectral
patterns.
A non-continuous or damped waveform.
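One common way to model damping, assumed here rather than taken from the slides, is an exponentially decaying envelope on a sine wave: x(t) = A·e^(−αt)·sin(2πft). With α = 0 the sketch reduces to the continuous pure tone of the previous slide.

```python
import math


def damped_sine(t: float, amplitude: float, freq_hz: float, decay_per_s: float) -> float:
    """Exponentially damped sine: A * exp(-alpha * t) * sin(2 * pi * f * t).

    decay_per_s (alpha) = 0 gives an undamped, continuous pure tone.
    The exponential envelope is an assumed, simple damping model.
    """
    return amplitude * math.exp(-decay_per_s * t) * math.sin(2 * math.pi * freq_hz * t)


def envelope(t: float, amplitude: float, decay_per_s: float) -> float:
    """Maximum amplitude still available at time t under the same model."""
    return amplitude * math.exp(-decay_per_s * t)


# With alpha = 50 per second the envelope falls to 1/e (about 37%) after 20 ms.
for t_ms in (0, 10, 20, 40):
    print(t_ms, "ms:", round(envelope(t_ms / 1000, 1.0, 50.0), 3))
```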
16 Waveforms and Phase
Adding together two pure tones of 100 Hz and
500 Hz (and of different amplitudes).
17 The vast majority of natural sounds are not pure
tones but are complex sounds that can be thought
of as the combination of two or more pure tones.
The diagram shows the effect of adding two pure
tones, one of 100 Hz and the other of 500 Hz.
The 500 Hz tone has half the amplitude (sound pressure) of the 100 Hz tone.
In the bottom part of the diagram we can see the
two pure tones as dashed lines. A simple
addition of the dashed lines results in the
unbroken line.
The unbroken line clearly has a more complex
pattern than either of the two pure tones.
18 The complex pattern repeats with the same period as the 100 Hz tone.
100 Hz is the highest common integer factor of the frequencies of the two tones.
The repetition frequency of a complex wave is equal to the highest common factor of the frequencies of the sine waves added together to form it; its period is the reciprocal of that frequency.
The repetition frequency of the complex pattern can be called its fundamental frequency (F0 or F₀).
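The highest-common-factor rule can be checked with a short sketch. It assumes component frequencies that are whole numbers of hertz, uses `math.gcd` for the highest common factor, and confirms with NumPy that the summed waveform repeats after 1/F0.

```python
import math
from functools import reduce

import numpy as np


def fundamental_hz(component_freqs_hz):
    """Repetition (fundamental) frequency of a sum of sine waves whose
    frequencies are whole numbers of Hz: their highest common factor."""
    return reduce(math.gcd, component_freqs_hz)


def complex_wave(t, components):
    """Sum of sine waves; components is a list of (frequency_hz, amplitude)."""
    return sum(a * np.sin(2 * np.pi * f * t) for f, a in components)


# 100 Hz + 500 Hz with the 500 Hz tone at half amplitude, as in the diagram.
components = [(100, 1.0), (500, 0.5)]
f0 = fundamental_hz([f for f, _ in components])
t = np.linspace(0.0, 0.02, 2001)
# The waveform one full period (1 / f0) later is identical, up to rounding.
shift_ok = np.allclose(complex_wave(t, components), complex_wave(t + 1 / f0, components))
print(f0, shift_ok)
```

Note that the fundamental need not be one of the components: tones at 200 Hz and 300 Hz alone also give a 100 Hz repetition frequency.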
19 Adding together three pure tones of 100 Hz, 200 Hz and 300 Hz.
20 The three pure tones at 100, 200 and 300 Hz are of different amplitudes. They all start from the 0° position.
The highest common factor of 100, 200 and 300 is
100 and so the resultant complex wave has a
fundamental frequency of 100 Hz.
When the component tones have non-zero phase relationships, the difference in phase can result in a quite different complex wave shape.
In sounds with a continuous musical tone, the human ear is largely insensitive to such phase differences.
What the ear picks up are the frequency and amplitude characteristics.
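The sketch below illustrates the point with NumPy: the same three components (amplitudes chosen for illustration) summed with different starting phases give clearly different waveforms but identical amplitude spectra. It shows only the signal property; it says nothing by itself about perception.

```python
import numpy as np

fs = 10_000                      # sampling rate in Hz (assumed for the demo)
t = np.arange(fs) / fs           # 1 second of signal
components = [(100, 1.0), (200, 0.6), (300, 0.3)]  # (Hz, amplitude), illustrative


def tone_sum(phases_rad):
    """Sum of the three components, each started at its own phase."""
    return sum(a * np.sin(2 * np.pi * f * t + ph)
               for (f, a), ph in zip(components, phases_rad))


x_zero = tone_sum([0.0, 0.0, 0.0])
x_shifted = tone_sum([0.0, np.pi / 2, np.pi])

# The waveform shapes differ...
print(np.max(np.abs(x_zero - x_shifted)))
# ...but the amplitude spectra are the same.
mag_zero = np.abs(np.fft.rfft(x_zero))
mag_shifted = np.abs(np.fft.rfft(x_shifted))
print(np.allclose(mag_zero, mag_shifted, atol=1e-6))
```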
21 Speech Waveforms
There are two types of speech sound source:
- periodic vibration of the vocal folds, resulting in voiced speech
- aperiodic sound produced by turbulence at some constriction in the vocal tract, resulting in voiceless speech
These two sound sources are modified by the
frequency-selective (filtering) effects of
different vocal tract shapes to produce the
various sounds of speech.
The voiced source can be filtered ("modulated") by the position of the tongue, lips and velum.
22 Close-up (40 ms) views of the waveforms of one voiceless fricative (/h/) and three vowel tokens
23 The sound /h/ is aperiodic.
The three vowel sounds are periodic. Their
patterns are repeated at regular intervals.
The period of these patterns is about 10 ms (1/100 s), so their frequency is about 100 Hz.
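Reading the period off a waveform and inverting it (F0 = 1/T) can be automated. The sketch below uses normalised autocorrelation, a common technique swapped in here for illustration (the slides do not name a method), on a hypothetical synthetic 40 ms "vowel"; the 75-500 Hz search range is an assumption.

```python
import numpy as np


def estimate_f0(x, fs, fmin=75.0, fmax=500.0):
    """Rough F0 estimate by normalised autocorrelation: find the lag (within
    the fmin..fmax search range) at which the signal best matches a delayed
    copy of itself, then F0 = fs / lag. A sketch only: not robust to noise,
    and prone to octave errors if the range admits two or more periods."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    lags = range(int(fs / fmax), int(fs / fmin) + 1)
    scores = []
    for lag in lags:
        a, b = x[:-lag], x[lag:]
        scores.append(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))
    best_lag = lags[int(np.argmax(scores))]
    return fs / best_lag


def synthetic_voiced(f0_hz, fs, duration_s=0.04):
    """Hypothetical stand-in for a vowel: 40 ms of a periodic wave made of
    the fundamental plus two weaker harmonics."""
    t = np.arange(int(duration_s * fs)) / fs
    return sum(a * np.sin(2 * np.pi * k * f0_hz * t)
               for k, a in ((1, 1.0), (2, 0.5), (3, 0.25)))


fs = 16_000
print(estimate_f0(synthetic_voiced(100.0, fs), fs))  # period 160 samples -> 100.0
```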
Each repetition or period of these patterns
corresponds to one glottal cycle, or one cycle
of vocal fold opening and closing in the larynx.
An F0 (fundamental frequency) of 100 Hz is a
normal value for an adult male voice.
The more familiar term pitch refers to the way we
perceive F0. A voice with a high-sounding pitch
has a high F0.
24 Close-up (40 ms) views of the waveforms of four voiced consonants
25 Close-ups of the fricative /z/ illustrating varying degrees of source mixing
26 Three long vowels in an /h_d/ context
27 Identification of Speech Waveforms
Phones/phonemes (here, vowels) contrasting in an identical environment.
Because the environment is the same, the differences between the waveforms are mainly due to the differences between the vowels.
Waveforms can tell you that you are looking at a
vowel, but they can't reliably identify the
vowel.
The intensity of the vowels rises rapidly at the
start, reaches a peak by about 1/4 of the way
through the vowel and then gradually drops.
28 Three English voiceless oral stops in CV context
29 All three stops commence with a burst.
The burst occurs when a build-up of air pressure is suddenly released.
The bursts are very short (about 1-5 ms) and are
followed by about 100 ms of aspiration (or
fricative-like voiceless sound).
30 Waveforms of two of the English voiceless fricatives in CV (consonant-vowel) context
31 Voiceless fricatives are aperiodic, which means
that they don't consist of periodically
repeating patterns as occurs in voiced sounds.
The frication noise in these two examples is very long, 250 to 300 ms, compared to the aspiration of the voiceless stops.
32 Analog and Digital Sound
Sound has properties (the dimensions of
frequency, intensity, time and phase) that exist
in the real world as infinite continua of
infinitesimal changes.
"Representations" of sound are the result of
transformations of sound into other analog or
digital forms.
Sound can be recreated from these representations with the appropriate technology.
Until the invention of the digital computer all
representations of speech sounds were analog
signals.
33 Transduction
Transduction is the conversion of a signal from
one analog form into another.
A device that transforms a signal from one form
into another is called a transducer.
Microphones and audio speakers are transducers.
Sound is transduced into an electrical signal by
a microphone.
In this electrical signal, continuously changing
voltage is the analog representation of
continuously changing sound pressure level.
This electrical signal is transduced back into sound via a loudspeaker.
The ear is also a transducer that converts sound
into neural signals.
34 Digitisation: Sampling and Quantisation
Windowing
Acoustic analyses attempt to extract the sine
waves that add up to produce the variations
evident in the waveform.
To select a series of speech samples for spectral
analysis we need to "window" the original
waveform.
The simplest window is a "rectangular window".
A rectangular window has a starting point "t1"
and an end point "t2" with all values between t1
and t2 multiplied by one and all values before t1
or after t2 multiplied by zero.
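The rectangular window just described translates directly into code. In this minimal NumPy sketch, t1 and t2 are sample indices (inclusive) and a simple ramp stands in for speech samples.

```python
import numpy as np


def rectangular_window(n_samples: int, t1: int, t2: int) -> np.ndarray:
    """Rectangular window: 1 for sample indices t1..t2 (inclusive),
    0 before t1 and after t2."""
    w = np.zeros(n_samples)
    w[t1:t2 + 1] = 1.0
    return w


signal = np.arange(10, dtype=float)          # stand-in for speech samples
w = rectangular_window(len(signal), t1=3, t2=6)
windowed = signal * w
print(windowed)        # [0. 0. 0. 3. 4. 5. 6. 0. 0. 0.]
```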
A rectangular window has a complex spectrum of
its own which contaminates the spectrum of
speech.
35 Rectangular filter
Filter / window
36 Hanning window
A Hanning window is a member of a family of
windows known as raised cosine windows.
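A Hanning function is frequently used to reduce spectral leakage, i.e. the spreading of a component's energy into neighbouring frequencies caused by the abrupt edges of a rectangular window. (It does not reduce aliasing, the distortion that results from too low a sampling rate.)
This class of windows has much less effect than a rectangular window on the shape of the spectrum of the resulting windowed speech.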
These windows are often used during the frequency
analysis of speech sounds.
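The difference between the two windows shows up clearly in a test tone placed halfway between two FFT bins, where leakage is worst. This sketch uses NumPy's `np.hanning` (the symmetric Hann/"Hanning" window); the sampling rate, window length and 1 kHz "far away" threshold are assumptions chosen for the demonstration.

```python
import numpy as np

fs = 8_000                     # sampling rate, Hz (assumed)
n = 512                        # window length in samples (assumed)
t = np.arange(n) / fs
bin_hz = fs / n                # 15.625 Hz between FFT bins
tone_hz = 79.5 * bin_hz        # deliberately halfway between two bins
x = np.sin(2 * np.pi * tone_hz * t)

freqs = np.fft.rfftfreq(n, d=1 / fs)
far = np.abs(freqs - tone_hz) >= 1000   # bins at least 1 kHz from the tone

worst_far_db = {}
for name, w in {"rectangular": np.ones(n), "hanning": np.hanning(n)}.items():
    spectrum = np.abs(np.fft.rfft(x * w))
    db = 20 * np.log10(spectrum / spectrum.max() + 1e-12)
    # Highest level found far from the tone: leaked energy, ideally very low.
    worst_far_db[name] = db[far].max()
    print(f"{name:12s} {worst_far_db[name]:7.1f} dB")
```

The rectangular window leaves energy far from the tone only a few tens of dB down, while the Hanning window pushes it much lower, which is why the latter is preferred for spectral analysis.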
37 Hanning filter
Filter / window
38 Digitisation
The basic digitisation hardware is an
analog-to-digital converter.
It takes snapshots of an input analog signal at regular intervals, outputting the number closest to the magnitude of each snapshot measurement.
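Choosing "the number closest to the magnitude" is quantisation. The sketch below models an idealised converter that rounds each snapshot to the nearest of 2ⁿ evenly spaced integer codes; the 4-bit resolution, the ±1 full-scale range and the test signal are assumptions for illustration.

```python
import numpy as np


def quantise(samples, n_bits: int, full_scale: float = 1.0) -> np.ndarray:
    """Map each sample (assumed to lie in -full_scale..+full_scale) to the
    nearest of 2**n_bits evenly spaced integer codes, as an idealised
    analog-to-digital converter would. Returns the integer codes."""
    levels = 2 ** (n_bits - 1)
    codes = np.round(np.asarray(samples) / full_scale * levels)
    return np.clip(codes, -levels, levels - 1).astype(int)


fs = 8_000                                   # assumed sampling rate, Hz
t = np.arange(8) / fs                        # eight "snapshots"
analog = 0.8 * np.sin(2 * np.pi * 1000 * t)  # stand-in for the analog signal
codes = quantise(analog, n_bits=4)
print(codes)
# Without clipping, the error is at most half a quantisation step (1/16 here).
print(np.max(np.abs(codes / 8 - analog)))
```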
Taking a series of snapshots of a signal can only
capture an approximation of the original.
The sampling frequency or sampling rate is a
measure of the number of snapshots taken from
the signal each second.
The absolute minimum number of samples per cycle needed to reproduce a sinusoid is two (strictly, slightly more than two), e.g. one at the peak and one at the trough.
The sampling frequency should therefore be more than twice the highest frequency being digitised. Half the sampling frequency is called the Nyquist frequency.
In studying speech recorded in quiet conditions we often use a sampling frequency of 20 000 Hz, which gives information up to 10 000 Hz.
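What goes wrong above the Nyquist frequency can be shown in a few lines: a component above fs/2 "folds back" and becomes indistinguishable from a lower one (aliasing). The sketch assumes ideal sampling with no anti-aliasing filter.

```python
import numpy as np


def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Apparent frequency of a sinusoid of frequency f after sampling at fs:
    frequencies above the Nyquist frequency (fs / 2) fold back below it."""
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)


fs = 20_000.0
print(fs / 2)                         # Nyquist frequency: 10000.0
print(alias_frequency(8_000.0, fs))   # below Nyquist: unchanged, 8000.0
print(alias_frequency(12_000.0, fs))  # above Nyquist: appears at 8000.0

# The sampled values of a 12 kHz and an 8 kHz cosine are identical at 20 kHz.
n = np.arange(50)
same = np.allclose(np.cos(2 * np.pi * 12_000 * n / fs),
                   np.cos(2 * np.pi * 8_000 * n / fs))
print(same)
```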
39 Examining sound sequences: samples
40 Spectra
A two-dimensional spectrum is effectively a
snapshot of the spectrum of a sound at one point
in time.
This "point" in time is always a window of some
length.
Most often the amplitude axis will be in decibels (dB).
The frequency axis is usually in hertz (Hz) or kilohertz (kHz).
41 Line Spectra
A line spectrum is a spectral representation that
displays the frequencies and relative
intensities of the component sine waves.
Each sine wave is displayed as a single vertical
line placed at the appropriate frequency on the
x-axis.
The height of the line represents the amplitude
of the component sine wave.
The amplitude is usually displayed as a relative sound pressure (i.e. in pascals) or as a decibel value.
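A line spectrum is essentially a list of (frequency, amplitude) pairs. This sketch tabulates hypothetical data (the earlier 100 Hz + 500 Hz example) both as relative pressure and in decibels, and draws a crude text "line" for each component.

```python
import math


def line_spectrum(components):
    """For each (frequency_hz, amplitude) pair, return the frequency, the
    amplitude relative to the strongest component, and that ratio in dB."""
    strongest = max(amp for _, amp in components)
    rows = []
    for freq, amp in sorted(components):
        rel = amp / strongest
        rows.append((freq, rel, 20 * math.log10(rel)))   # pressure ratio -> dB
    return rows


# Hypothetical data: 100 Hz + 500 Hz, the 500 Hz tone at half the pressure.
for freq, rel, level_db in line_spectrum([(100.0, 1.0), (500.0, 0.5)]):
    line = "|" * round(rel * 20)       # text "line" whose length ~ amplitude
    print(f"{freq:6.0f} Hz  {rel:4.2f}  {level_db:6.1f} dB  {line}")
```

Half the pressure corresponds to about −6 dB.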
42 Fourier Transforms
Fourier Transforms remain the primary method for
carrying out frequency analyses of sounds and
other phenomena.
The Fourier transform transforms a time domain
signal into a frequency domain representation of
that signal.
This means that it generates a description of the
distribution of the energy in the signal as a
function of frequency.
This is normally displayed as a plot of frequency
(x-axis) against amplitude (y-axis) called a
spectrum.
In digital signal processing the Fourier
transform is almost always performed using an
algorithm called the Fast Fourier Transform or
FFT.
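As a minimal illustration, NumPy's `np.fft.rfft` (an FFT for real-valued signals) recovers the two components of the earlier 100 Hz + 500 Hz example. The sampling rate and 1 s duration are assumptions chosen so that both tones fall exactly on FFT bins.

```python
import numpy as np

fs = 10_000                    # sampling rate, Hz (assumed)
t = np.arange(fs) / fs         # 1 s of signal -> 1 Hz frequency resolution
x = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 500 * t)

spectrum = np.fft.rfft(x)                      # FFT of a real-valued signal
freqs = np.fft.rfftfreq(len(x), d=1 / fs)      # frequency (Hz) of each bin
amplitude = 2 * np.abs(spectrum) / len(x)      # scale bins back to sine amplitudes

# The two largest peaks recover the component frequencies and amplitudes.
top = np.argsort(amplitude)[-2:][::-1]
for i in top:
    print(f"{freqs[i]:.0f} Hz: amplitude {amplitude[i]:.2f}")
```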
43 Fast Fourier Transform (FFT) of the vowel in the word "heard"
44 Linear Prediction Coefficient (LPC) analysis
Of particular interest are the major spectral peaks (formants), which correspond to the resonant frequencies of the vocal tract.
Linear Prediction Coefficient (LPC) analysis
attempts to predict the major spectral peaks
(formants) seen in the Fourier transform.
The resulting LPC spectrum is a smoothed spectrum
with the peaks representing the formants
(resulting from the vocal tract resonances) of
the spectrum of a vowel or vowel-like consonant.
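A sketch of one standard way to compute LPC, the autocorrelation method, with NumPy. It is tested on a synthetic signal (noise through a single known resonance, an assumption rather than real speech) so that the recovered coefficients and the peak of the smoothed LPC spectrum can be checked; real analyses use higher orders and pre-processed speech frames.

```python
import numpy as np


def lpc_coefficients(x, order):
    """Linear prediction by the autocorrelation method: find a[1..order] so
    that x[n] is approximated by sum_k a[k] * x[n - k]. The normal
    equations R a = r are solved directly here (Levinson-Durbin recursion
    is the usual faster alternative)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)]) / n
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


def lpc_spectrum_db(a, fs, n_points=1024):
    """Smoothed (all-pole) LPC spectrum: 20*log10 |1 / (1 - sum_k a_k e^{-jwk})|."""
    freqs = np.linspace(0.0, fs / 2, n_points)
    w = 2 * np.pi * freqs / fs
    k = np.arange(1, len(a) + 1)
    denom = 1 - np.exp(-1j * np.outer(w, k)) @ a
    return freqs, -20 * np.log10(np.abs(denom))


# Synthetic test signal (an assumption, not real speech): white noise passed
# through a known two-pole resonance, x[n] = 1.3 x[n-1] - 0.8 x[n-2] + e[n].
rng = np.random.default_rng(0)
fs = 10_000
e = rng.standard_normal(20_000)
x = np.zeros_like(e)
for n in range(2, len(e)):
    x[n] = 1.3 * x[n - 1] - 0.8 * x[n - 2] + e[n]

a_hat = lpc_coefficients(x, order=2)
freqs, env_db = lpc_spectrum_db(a_hat, fs)
peak_hz = freqs[np.argmax(env_db)]
print(np.round(a_hat, 2), round(peak_hz))  # coefficients near [1.3, -0.8]; peak near 1.2 kHz
```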
45 An LPC analysis of the vowel of "heard"
46 Combined FFT and LPC analysis of the vowel in "heard"
47 Spectrograms: Time, Frequency and Intensity
A spectrograph is a machine or a computer
algorithm that performs a series of spectral
analyses at different times and then displays
them using a three dimensional display of time,
frequency and amplitude.
In most cases time is displayed on the X-axis,
frequency is displayed on the Y-axis and
amplitude is displayed as variations in greyscale darkness or in colour.
The speech spectrograph consists of a series of
band pass (BP) filters.
A band pass filter permits frequency components
between two cut-off frequencies to pass
unattenuated and attenuates frequency components
below the lower (HP) cut-off frequency and above
the higher (LP) cut-off frequency.
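Digital spectrograms are commonly computed with a short-time Fourier transform (STFT), in which each FFT bin behaves like one band-pass filter of a filter bank. This is a common digital approach, not necessarily what a given spectrograph does internally; the 5 ms Hann window and 2 ms hop below are assumed values. A short window gives broad-band analysis and a longer one narrow-band analysis.

```python
import numpy as np


def spectrogram(x, fs, window_s=0.005, hop_s=0.002):
    """Short-time Fourier transform magnitude in dB.

    Each frame is Hann-windowed and Fourier transformed, so every FFT bin acts
    like one band-pass filter. A short window (5 ms here, an assumed value)
    gives broad-band analysis; a longer window gives narrow-band analysis.
    Returns (times_s, freqs_hz, levels_db) with levels_db[frame, bin].
    """
    win_len = int(window_s * fs)
    hop = int(hop_s * fs)
    window = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.array([x[s:s + win_len] * window for s in starts])
    mags = np.abs(np.fft.rfft(frames, axis=1))
    times = (np.array(starts) + win_len / 2) / fs
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)
    return times, freqs, 20 * np.log10(mags + 1e-12)


# Test signal (illustrative): 1000 Hz for 0.25 s, then 2000 Hz for 0.25 s.
fs = 8_000
t = np.arange(int(0.5 * fs)) / fs
x = np.where(t < 0.25, np.sin(2 * np.pi * 1000 * t), np.sin(2 * np.pi * 2000 * t))
times, freqs, levels = spectrogram(x, fs)
# Strongest "filter band" in the first and last frames.
print(freqs[np.argmax(levels[0])], freqs[np.argmax(levels[-1])])
```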
48 Broad-band spectrogram of the word "heard" spoken by an adult male speaker of Australian English
49 Narrow-band spectrogram of the word "heard"
50 Speech Production: Source-Filter Theory
The source-filter theory describes speech
production as a two stage process involving the
generation of a sound source, which is then
shaped or filtered by the resonant properties of
the vocal tract.
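The two stages can be simulated in a few lines. In this sketch an impulse train at 100 Hz stands in for the voiced glottal source, and a single two-pole resonator at 500 Hz stands in for one vocal-tract resonance; all of these values, and the pole radius, are assumptions for illustration, not a model of a real vocal tract.

```python
import numpy as np

fs = 10_000          # sampling rate, Hz
f0 = 100             # source: glottal pulse rate, Hz (assumed)
resonance = 500      # filter: a single resonance, Hz (assumed)
r = 0.98             # pole radius; closer to 1 = narrower, less damped resonance

# Source: an impulse train, one pulse per glottal cycle (a crude stand-in for
# the real glottal waveform). Its spectrum has equal-strength harmonics at k * f0.
source = np.zeros(fs)                 # 1 second
source[:: fs // f0] = 1.0

# Filter: a two-pole resonator, y[n] = x[n] + 2 r cos(theta) y[n-1] - r^2 y[n-2].
theta = 2 * np.pi * resonance / fs
b1, b2 = 2 * r * np.cos(theta), -r * r
output = np.zeros_like(source)
for n in range(len(source)):
    output[n] = source[n]
    if n >= 1:
        output[n] += b1 * output[n - 1]
    if n >= 2:
        output[n] += b2 * output[n - 2]

# The output keeps the source's harmonic spacing (F0), but the filter decides
# which harmonics are strong: the largest one sits at the resonance.
spectrum = np.abs(np.fft.rfft(output))
freqs = np.fft.rfftfreq(len(output), d=1 / fs)
print(freqs[np.argmax(spectrum)])     # 500.0
```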
51 Sound sources
Sound sources can be either periodic or aperiodic.
Glottal sound sources can be periodic (voiced),
aperiodic (whisper and /h/) or mixed (eg.
breathy voice).
Supra-glottal sound sources that are used contrastively in speech are almost always aperiodic (i.e. random noise).
Most of the filtering of a source spectrum is
carried out by that part of the vocal tract
anterior to the sound source.
In the case of a glottal source, the filter is
the entire supra-glottal vocal tract.
A voiced glottal source has its own spectrum
which includes spectral fine structure
(harmonics and some noise) and a characteristic
spectral slope.
In voiced speech the fundamental frequency
(perceived as vocal pitch) is a characteristic
of the glottal source acoustics whilst features
such as vowel formants are characteristics of
the vocal tract filter (resonances).
52 Resonance
All physical objects resonate.
Some have simple, uniform resonance patterns and
some have complex resonance patterns.
Some resonators are highly damped and some are
weakly damped.
Some resonators may generate sound by exciting
adjacent air particles in the surrounding
medium.
For example, a guitar string vibrates upon being
plucked.
The guitar string collides with the surrounding
air and generates longitudinal pressure waves
(sound) in that medium.
Some resonators (eg. the supra-glottal vocal
tract) may act upon sound waves generated
elsewhere (eg. at the glottis) and selectively
permit some frequencies (the resonant
frequencies) to pass unattenuated whilst causing
other frequencies to be attenuated (reduced in
intensity) to some extent.
53 Reflection of a wave and resonance
On to the demo at mind.net:
http://id.mind.net/zona/mstm/physics/waves/waves.html
Our topics: Interference, Wave Reflection, Standing Waves
54 Standing waves and resonance
When a wave front is reflected at a fixed end, it is reflected with inversion, so that the resultant wave interference pattern always maintains zero displacement at each barrier.
Wherever the end of the resonating body is free to move, wave reflection occurs without inversion.
Resonant frequencies have wavelengths that all
result in standing waves with nodes at the fixed
ends.
For a string fixed at both ends, the resonance frequencies are all integer multiples of the first resonance frequency.
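Because a node is needed at each fixed end, the allowed wavelengths are 2L/n, so the resonances are fₙ = n·v/(2L), where v is the speed of waves along the string (not the speed of sound in air). The string length and wave speed below are hypothetical values.

```python
def string_resonances(length_m: float, wave_speed_m_s: float, count: int = 4):
    """First `count` resonance frequencies of a string fixed at both ends.

    Standing waves need a node at each end, so the allowed wavelengths are
    2L/n (2L, L, 2L/3, L/2, ...), giving f_n = n * v / (2L).
    """
    return [n * wave_speed_m_s / (2 * length_m) for n in range(1, count + 1)]


# Hypothetical string: 0.65 m long, waves travel along it at 286 m/s.
print(string_resonances(0.65, 286.0))  # approximately [220, 440, 660, 880] Hz
```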
55 Nodes at two fixed ends (figure labels: node, antinode)
Four wavelengths that would result in nodes at the two fixed ends. In descending order, the wavelengths are 2L, L, 2L/3 and L/2, where L is the length of the string.
56 Resonance in a tube open at one end
Standing wave patterns for the first four
resonances in a tube open at one end and closed
at the other.
57 Resonance in the vocal tract: formants of sonorants (vowels and voiced consonants)
The vocal tract during the production of vowels
and vowel-like consonants can be described as a
tube open at one end, the mouth, and closed at
the other, the glottis.
Resonance in a tube of uniform cross-sectional
area is a physical characteristic of that tube.
It is dependent upon the length of that tube and
the open or closed state of the two ends.
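For a uniform tube closed at one end and open at the other, the closed end is a pressure antinode and the open end a pressure node, so the tube holds 1/4, 3/4, 5/4, ... of a wavelength and fₙ = (2n − 1)·c/(4L). The tract length (17.5 cm) and c = 350 m/s below are assumed, textbook-style values.

```python
def closed_open_tube_resonances(length_m: float, speed_m_s: float = 350.0, count: int = 3):
    """Resonances of a uniform tube closed at one end and open at the other.

    The closed end is a pressure antinode (velocity node) and the open end a
    pressure node, so the tube holds 1/4, 3/4, 5/4, ... wavelengths:
    f_n = (2n - 1) * c / (4L).
    """
    return [(2 * n - 1) * speed_m_s / (4 * length_m) for n in range(1, count + 1)]


# Assumed values: a 17.5 cm vocal tract and c = 350 m/s
# (roughly the speed of sound in warm, moist air).
print(closed_open_tube_resonances(0.175))  # about 500, 1500 and 2500 Hz
```

Only the length and the open/closed state of the ends enter the formula, matching the statement above.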
What actually vibrates, however, is the medium
contained in that tube.
When we produce vowel sounds the resonances of
the vocal tract selectively enhance sound
vibrations close to the resonance frequencies and
selectively attenuate sound vibrations remote
from the resonance frequencies.
This results in peaks in the acoustic spectrum of
the resulting speech sound. These acoustic
spectral peaks are called formants, particularly
when they occur in vowels and vowel-like
consonants.
58 Praat
Next, we continue with practical experiments in Praat.
First, take a look around Praat:
http://www.fon.hum.uva.nl/praat/
http://www.germanistik.unibe.ch/siebenhaar/SiebenhaarFolder/subfolder/PraatEinfuehrung/PraatManual/PraatManual_home.html