Elementare Akustik - PowerPoint PPT Presentation

Title: Elementare Akustik Author: Brigitte Endres-Niggemeyer Last modified by: xx xx Created Date: 10/14/2004 4:38:32 AM Document presentation format – PowerPoint PPT presentation


Transcript and Presenter's Notes


1
Elementare Akustik (Elementary Acoustics)
  • based on
  • http://www.ling.mq.edu.au/units/sph301/main/schedule.html
  • (no longer online)

2
What is Sound?
Sound is a wave-like distortion of a physical
medium.
There are two classes of wave that can distort a
physical medium: transverse waves and
longitudinal waves.

In transverse waves, the elements of the medium
move orthogonally (at 90°) to the direction of
movement of the wave.
A typical example of a transverse wave is a wave
pattern on the surface of a body of water.
In such a wave the molecules of water move up and
down whilst the wave front moves along the
surface of the water.
3
An example of a transverse wave: a wave induced
in a piece of string.
4
Longitudinal waves
In longitudinal waves the elements of the medium
move back and forth in line with the direction
of propagation of the wave fronts.
In a spring a hand can induce a longitudinal wave
by periodically moving back and forth in line
with the direction of the spring.
This causes the regions of high and low spring
compression to move along the spring.
This movement propagates through the spring
producing a series of wavefronts which move
towards the fixed wall with a velocity v.
Individual parts of the spring only move
backwards and forwards short distances in the
direction of wave propagation.
This causes the coils to periodically come closer
to and further from adjacent coils than would be
the case for the spring at rest.
A longitudinal wave is a compression wave in
which particles move back and forth in the
direction of wavefront movement.
5
An example of a longitudinal wave: a wave
induced in a spring.
6
Sound is a longitudinal compression wave.
Sound is a longitudinal compression wave which
distorts a medium by creating moving fronts of
high and low particle compression.
Sound can occur in any medium (solid, liquid and
gas). Sound cannot occur in a vacuum as there is
no medium to compress.
Individual particles only move short distances
backward and forward in the direction of wave
propagation whilst the compression wave front can
move considerable distances.
Sound in air consists of consecutive regions of
higher and lower air pressure relative to
ambient air pressure (typically 1 atmosphere at
sea level). These fluctuations in air pressure
are extremely small relative to ambient air
pressure.
7
Acoustic Units of Measurement
The wavelength (λ) of a wave is the distance
between successive wave fronts (ie. peak-to-peak
distance). Wavelength is measured in metres (m).
The frequency (f) of a wave is the number of
times per second that a complete wave cycle
passes an observer. Frequency is measured in
Hertz (Hz), or per second (s⁻¹) in basic units.
The period (T) of a wave is the time it takes for
one wave cycle to pass an observer. The period
is measured in seconds (s) or milliseconds (ms).
The speed or velocity of sound (c) is the number
of metres that a wave front can travel in a
second. The speed of sound is measured in
metres/second (m·s⁻¹).
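The units above are linked by two standard relations that the slide does not spell out: T = 1/f and λ = c/f. The following is a minimal sketch of both. The function names and the value of 343 m/s for the speed of sound in air (roughly 20 °C) are assumptions made for illustration.

```python
def period_from_frequency(frequency_hz: float) -> float:
    """Period T in seconds for a frequency f in Hz (T = 1 / f)."""
    return 1.0 / frequency_hz


def wavelength(speed_m_per_s: float, frequency_hz: float) -> float:
    """Wavelength in metres (lambda = c / f)."""
    return speed_m_per_s / frequency_hz


# Assumed speed of sound in air at about 20 degrees Celsius.
SPEED_OF_SOUND_AIR = 343.0  # m/s

period_s = period_from_frequency(100.0)
wavelength_m = wavelength(SPEED_OF_SOUND_AIR, 100.0)
print(f"100 Hz tone: period {period_s * 1000:.1f} ms, wavelength {wavelength_m:.2f} m")
```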
8
Sound "Amplitude"
The human ear and the microphone (the main
artificial transducer of sound) both measure the
tiny changes in pressure that result from the
passage of a longitudinal wave through a medium.
The average air pressure at sea level is
approximately equivalent to the pressure exerted
by a column of mercury 76 cm high (in a
barometer) at 0 °C under standard gravity. This
is equivalent to 1 atm.
1 atm = 1.013 × 10⁵ Pa
The sound pressure that is only just perceivable
(ie. the threshold of hearing for a 1000 Hz
tone) is taken to be
2 × 10⁻⁵ Pa (ie. 20 µPa)
The threshold of pain (ie. the maximum sound
pressure that can be perceived without pain) is
about 100 Pa or about 1/1000 atm, which is
5,000,000 times the threshold sound pressure.
9
The intensity of a sound with a sound pressure
of 20 µPa is very close to
10⁻¹² W·m⁻².
The sensitivity of the ear to changes in
intensity is not related linearly to either
intensity or pressure.
The ear's sensitivity to sound intensity or sound
pressure is approximately logarithmic and
measured in deciBels (dB):
dB = 10 × log10(I1/I2)
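As a minimal sketch, the deciBel formula can be applied to the reference values quoted on the previous slides (20 µPa and 10⁻¹² W·m⁻²). The pressure version uses a factor of 20 rather than 10 because intensity is proportional to pressure squared; that relation is standard acoustics and is not stated in the slides. The function names are illustrative.

```python
import math

P_REF = 20e-6   # Pa, threshold of hearing at 1000 Hz (from the slides)
I_REF = 1e-12   # W/m^2, intensity at that threshold (from the slides)


def intensity_level_db(intensity: float, reference: float = I_REF) -> float:
    """Level in dB of an intensity relative to a reference intensity."""
    return 10.0 * math.log10(intensity / reference)


def pressure_level_db(pressure: float, reference: float = P_REF) -> float:
    """Level in dB of a sound pressure relative to a reference pressure.

    Intensity is proportional to pressure squared, hence the factor 20.
    """
    return 20.0 * math.log10(pressure / reference)


# Threshold of pain: about 100 Pa, i.e. 5,000,000 times the threshold pressure.
pain_db = pressure_level_db(100.0)
print(f"Threshold of pain is roughly {pain_db:.0f} dB above the threshold of hearing")
```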
The acoustic intensity, or average rate at which
work is being transferred through a unit area
(on the surface of the spherical wave front
radiating out from the source in all directions)
diminishes with distance in accordance with the
inverse square law:
I ∝ 1/r²
where I = the intensity of a sound and r = the
distance from the source of the sound
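Assuming the inverse square law in its usual form (intensity proportional to 1/r²), the following sketch shows what it implies in practice: each doubling of distance quarters the intensity, a drop of about 6 dB. The function names and the reference distance are illustrative choices.

```python
import math


def relative_intensity(distance: float, reference_distance: float = 1.0) -> float:
    """Intensity at `distance` relative to the intensity at `reference_distance`."""
    return (reference_distance / distance) ** 2


def level_change_db(distance: float, reference_distance: float = 1.0) -> float:
    """Change in level (dB) when moving from the reference distance to `distance`."""
    return 10.0 * math.log10(relative_intensity(distance, reference_distance))


for r in (1.0, 2.0, 4.0, 8.0):
    print(f"r = {r:>3} m: intensity x{relative_intensity(r):.4f}, {level_change_db(r):+.1f} dB")
```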
10
(No Transcript)
11
A two dimensional simulation of the inverse
square law.
12
Simple Harmonic Motion
A single cycle of a sine wave can be depicted as
if it were a point on a circle moving
anti-clockwise (they are mathematically
equivalent).
At its starting point (when the sine wave is
moving up from the baseline) the point is at zero
degrees (or zero radians, the 3 o'clock position
on the circle).
At the top of the sine wave's first peak it is
equivalent to being at the 90° (or π/2 radians,
12 o'clock) position in the circle.
When the sine wave reaches the baseline on its
way down it is equivalent to the 180° (π
radians, 9 o'clock) position.
When the sine wave reaches the bottom of the
first dip it is at 270° (3π/2 radians, 6
o'clock).
When it completes its first cycle it is back at
the starting point: 360° = 0° (2π = 0 radians).
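The equivalence between the circle and the sine wave can be checked numerically. In this small sketch the height of a point moving anti-clockwise on a unit circle is the value of the sine wave at that angle. The function name is illustrative.

```python
import math


def point_on_circle(angle_degrees: float) -> tuple[float, float]:
    """(x, y) position on a unit circle, measured anti-clockwise from 3 o'clock."""
    angle = math.radians(angle_degrees)
    return math.cos(angle), math.sin(angle)


# The y coordinate traces out one cycle of a sine wave.
for angle in (0, 90, 180, 270, 360):
    x, y = point_on_circle(angle)
    print(f"{angle:>3} degrees: x = {x:+.2f}, sine value y = {y:+.2f}")
```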
13
Simple Harmonic Motion
14
Continuous Waveforms and Damping
A sine wave is a waveform generated by a system
that is characterised by simple harmonic motion.
An ideal sine wave which exhibits simple harmonic
motion loses no energy (or has its energy
replenished from outside the system).
A sound wave exhibiting these characteristics
would be a pure tone.
A continuous waveform - a pure tone
15
Damping
The loss of energy in an oscillating system is
known as damping.
A damped waveform is non-continuous.
Damping is a characteristic of systems that
produce sounds with very complex spectral
patterns.
A non-continuous or damped waveform.
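A damped waveform is often modelled as a sine wave whose amplitude decays exponentially. The sketch below uses that model purely as an illustration. The exponential envelope and the decay rate are assumptions, not something the slides specify.

```python
import math


def damped_sine(t: float, frequency_hz: float, decay_per_s: float,
                amplitude: float = 1.0) -> float:
    """Sine wave multiplied by an exponentially decaying envelope (assumed model)."""
    envelope = amplitude * math.exp(-decay_per_s * t)
    return envelope * math.sin(2.0 * math.pi * frequency_hz * t)


# Compare two successive positive peaks of a 100 Hz oscillation (10 ms apart).
first_peak = damped_sine(0.0025, 100.0, decay_per_s=100.0)
second_peak = damped_sine(0.0125, 100.0, decay_per_s=100.0)
print(f"peak ratio after one cycle: {second_peak / first_peak:.3f}")
```

With a decay rate of zero the same function produces the continuous pure tone of the previous slide.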
16
Waveforms and Phase
Adding together two pure tones of 100 Hz and
500 Hz (and of different amplitudes).
17
The vast majority of natural sounds are not pure
tones but are complex sounds that can be thought
of as the combination of two or more pure tones.
The diagram shows the effect of adding two pure
tones, one of 100 Hz and the other of 500 Hz.
The 500 Hz tone has half the sound pressure
level of the 100 Hz tone.
In the bottom part of the diagram we can see the
two pure tones as dashed lines. A simple
addition of the dashed lines results in the
unbroken line.
The unbroken line clearly has a more complex
pattern than either of the two pure tones.
18
The complex pattern repeats with the same period
as the 100 Hz tone.
100 Hz is the highest common integer factor of
the frequencies of the two tones.
The frequency of a complex wave is always equal
to the highest common factor of the frequencies
of the sine waves being added together, and its
period is the reciprocal of that frequency.
The repetition frequency of the complex pattern
can be called its fundamental frequency (F0 or
F₀).
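The following sketch adds the 100 Hz and 500 Hz tones and checks that the sum repeats with the period of the highest common factor. The amplitudes (1.0 and 0.5) and the helper names are assumptions for illustration, and the frequencies are taken to be whole numbers of Hz.

```python
import math
from functools import reduce


def fundamental_frequency(frequencies_hz: list[int]) -> int:
    """Highest common factor of a list of integer frequencies in Hz."""
    return reduce(math.gcd, frequencies_hz)


def complex_wave(t: float, components: list[tuple[int, float, float]]) -> float:
    """Sum of sine waves given as (frequency_hz, amplitude, phase_radians)."""
    return sum(a * math.sin(2.0 * math.pi * f * t + phase)
               for f, a, phase in components)


tones = [(100, 1.0, 0.0), (500, 0.5, 0.0)]
f0 = fundamental_frequency([f for f, _, _ in tones])
period_s = 1.0 / f0

# The complex wave has the same value one fundamental period later.
t = 0.003
print(f0, complex_wave(t, tones), complex_wave(t + period_s, tones))
```

Note that the fundamental does not have to be present as a component: tones of 200 Hz and 300 Hz also repeat at 100 Hz.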
19
Adding together three pure tones of 100 Hz,
200 Hz and 300 Hz.
20
The three pure tones at 100, 200 and 300 Hz are
of different amplitudes. They all start from the
0° position.
The highest common factor of 100, 200 and 300 is
100 and so the resultant complex wave has a
fundamental frequency of 100 Hz.
In tones with non-zero phase relationships the
difference in phase results in a totally
different complex wave shape.
In sounds with a continuous musical tone the
human ear is insensitive to phase differences.
What the ear picks up is the frequency and
amplitude characteristics.
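To illustrate the point about phase, this sketch builds the 100, 200 and 300 Hz combination twice, once with all tones starting at 0° and once with shifted phases. The waveforms differ, but a discrete Fourier transform gives the same amplitude for each component. The amplitudes, phase shifts, sampling rate and helper names are all assumptions chosen for the example.

```python
import cmath
import math


def synthesise(components, sample_rate_hz, n_samples):
    """Sample a sum of sines given as (frequency_hz, amplitude, phase_radians)."""
    return [sum(a * math.sin(2.0 * math.pi * f * n / sample_rate_hz + phase)
                for f, a, phase in components)
            for n in range(n_samples)]


def amplitude_spectrum(samples):
    """Amplitude per DFT bin (the 2/N scaling is valid for bins strictly
    between 0 Hz and half the sampling rate)."""
    n = len(samples)
    return [2.0 * abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                          for i, s in enumerate(samples))) / n
            for k in range(n // 2 + 1)]


SAMPLE_RATE = 2000   # Hz; with 20 samples each bin is 100 Hz wide
N = 20               # exactly one period of the 100 Hz fundamental

in_phase = synthesise([(100, 1.0, 0.0), (200, 0.5, 0.0), (300, 0.25, 0.0)],
                      SAMPLE_RATE, N)
shifted = synthesise([(100, 1.0, 0.0), (200, 0.5, math.pi / 2), (300, 0.25, math.pi)],
                     SAMPLE_RATE, N)

spectrum_a = amplitude_spectrum(in_phase)
spectrum_b = amplitude_spectrum(shifted)
largest_waveform_difference = max(abs(a - b) for a, b in zip(in_phase, shifted))
```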
21
Speech Waveforms
There are two types of speech sound source:
  • periodic vibration of the vocal folds resulting
    in voiced speech
  • aperiodic sound produced by turbulence at some
    constriction in the vocal tract resulting in
    voiceless speech
These two sound sources are modified by the
frequency-selective (filtering) effects of
different vocal tract shapes to produce the
various sounds of speech.
The voiced source can be filtered ("modulated")
by the position of the tongue, lips and velum.
22
Close-up (40 ms) views of the waveforms of one
voiceless fricative (/h/) and 3 vowel tokens
23
The sound /h/ is aperiodic.
The three vowel sounds are periodic. Their
patterns are repeated at regular intervals.
The period of these patterns is about 10 ms
(1/100 secs) and so their frequency is about 100
Hz.
Each repetition or period of these patterns
corresponds to one glottal cycle, or one cycle
of vocal fold opening and closing in the larynx.
An F0 (fundamental frequency) of 100 Hz is a
normal value for an adult male voice.
The more familiar term pitch refers to the way we
perceive F0. A voice with a high-sounding pitch
has a high F0.
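Reading F0 off a waveform means finding the interval at which the pattern repeats. One common way to automate this is autocorrelation, sketched below on a synthetic periodic signal. This is an illustrative technique and not a method described in the slides. The function name, the search range of 60 to 400 Hz and the test signal are assumptions.

```python
import math


def estimate_f0(samples, sample_rate_hz, window_length,
                min_f0_hz=60.0, max_f0_hz=400.0):
    """Estimate F0 as the lag at which the signal best matches a shifted copy."""
    min_lag = int(sample_rate_hz / max_f0_hz)
    max_lag = int(sample_rate_hz / min_f0_hz)
    if len(samples) < max_lag + window_length:
        raise ValueError("signal too short for the requested search range")

    def similarity(lag):
        # Correlation between the first window and the window `lag` samples later.
        return sum(samples[n] * samples[n + lag] for n in range(window_length))

    best_lag = max(range(min_lag, max_lag + 1), key=similarity)
    return sample_rate_hz / best_lag


# Synthetic "vowel-like" signal: 40 ms at 8000 Hz with a 10 ms repetition period.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
          + 0.25 * math.sin(2 * math.pi * 300 * n / SAMPLE_RATE)
          for n in range(320)]

f0_hz = estimate_f0(signal, SAMPLE_RATE, window_length=160)
print(f"estimated F0: {f0_hz:.1f} Hz")
```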
24
Close-up (40 ms) views of the waveforms of four
voiced consonants
25
Close-ups of the fricative /z/ illustrating
varying degrees of source mixing
26
Three long vowels in an /h_d/ context
27
Identification of Speech Waveforms
Phones / phonemes, e.g. vowels, contrast in an
identical environment.
The differences between the waveforms are mainly
due to the differences between the waveforms of
the vowels.
Waveforms can tell you that you are looking at a
vowel, but they can't reliably identify the
vowel.
The intensity of the vowels rises rapidly at the
start, reaches a peak by about 1/4 of the way
through the vowel and then gradually drops.
28
Three English voiceless oral stops in CV context
29
All three stops commence with a burst.
The burst occurs when a build up of air pressure
is suddenly released.
The bursts are very short (about 1-5 ms) and are
followed by about 100 ms of aspiration (or
fricative-like voiceless sound).
30
Waveforms of two of the English voiceless
fricatives in CV (consonant-vowel) context
31
Voiceless fricatives are aperiodic, which means
that they don't consist of periodically
repeating patterns as occurs in voiced sounds.
The fricative aspiration in these two examples is
very long, 250 to 300 ms, compared to the
aspiration of the voiceless stops.
32
Analog and Digital Sound
Sound has properties (the dimensions of
frequency, intensity, time and phase) that exist
in the real world as infinite continua of
infinitesimal changes.
"Representations" of sound are the result of
transformations of sound into other analog or
digital forms.
Sound can be recreated from these
representations with the appropriate
technology.
Until the invention of the digital computer all
representations of speech sounds were analog
signals.
33
Transduction
Transduction is the conversion of a signal from
one analog form into another.
A device that transforms a signal from one form
into another is called a transducer.
Microphones and audio speakers are transducers.
Sound is transduced into an electrical signal by
a microphone.
In this electrical signal, continuously changing
voltage is the analog representation of
continuously changing sound pressure level.
This electrical signal is transduced back into
sound via a loudspeaker.
The ear is also a transducer that converts sound
into neural signals.
34
Digitisation: Sampling and Quantisation
Windowing
Acoustic analyses attempt to extract the sine
waves that add up to produce the variations
evident in the waveform.
To select a series of speech samples for spectral
analysis we need to "window" the original
waveform.
The simplest window is a "rectangular window".
A rectangular window has a starting point "t1"
and an end point "t2" with all values between t1
and t2 multiplied by one and all values before t1
or after t2 multiplied by zero.
A rectangular window has a complex spectrum of
its own which contaminates the spectrum of
speech.
35
Rectangular filter
Filter / window
36
Hanning window
A Hanning window is a member of a family of
windows known as raised cosine windows.
A Hanning function is frequently used to reduce
spectral leakage (the smearing of energy across
neighbouring frequencies caused by the abrupt
edges of a rectangular window).
This class of windows has far less effect than a
rectangular window on the shape of the spectrum
of the resulting windowed speech.
These windows are often used during the frequency
analysis of speech sounds.
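The difference between the two windows can be sketched numerically. Below, a tone that does not complete a whole number of cycles inside the window is analysed with a rectangular and with a Hanning window, and the spectral magnitude far away from the tone is compared. The window length, the tone (10.5 cycles per window), the inspected bin and the helper names are arbitrary choices for illustration.

```python
import cmath
import math


def rectangular_window(n):
    """All samples inside the window are multiplied by one."""
    return [1.0] * n


def hanning_window(n):
    """Raised cosine window that tapers to zero at both ends."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitude(samples, k):
    """Magnitude of DFT bin k."""
    n = len(samples)
    return abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                   for i, s in enumerate(samples)))


N = 64
tone = [math.sin(2.0 * math.pi * 10.5 * i / N) for i in range(N)]

rect_frame = [s * w for s, w in zip(tone, rectangular_window(N))]
hann_frame = [s * w for s, w in zip(tone, hanning_window(N))]

# Bin 30 is far from the tone (near bin 10.5), so energy there is contamination.
rect_leak = dft_magnitude(rect_frame, 30)
hann_leak = dft_magnitude(hann_frame, 30)
print(f"rectangular: {rect_leak:.4f}, Hanning: {hann_leak:.4f}")
```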
37
Hanning filter
Filter / window
38
Digitisation
The basic digitisation hardware is an
analog-to-digital converter.
It takes snapshots of an input analog signal at
regular intervals outputting a number which is
closest to the magnitude of the snapshot
measurement.
Taking a series of snapshots of a signal can only
capture an approximation of the original.
The sampling frequency or sampling rate is a
measure of the number of snapshots taken from
the signal each second.
The absolute minimum number of samples per cycle
needed to properly reproduce a sinusoid is two -
one at the peak, one at the trough.
The sampling frequency should therefore be at
least twice the frequency of the sinusoid being
digitised. Half the sampling frequency is known
as the Nyquist frequency.
In studying speech recorded in quiet conditions
we often use a sampling frequency of 20,000 Hz
which gives information up to 10,000 Hz.
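A small sketch of the two digitisation steps: quantisation (choosing the nearest available number for each snapshot) and the frequency limit imposed by sampling. The 16-bit signed format and the function names are assumptions for the example. The folding rule for frequencies above half the sampling rate is the standard aliasing relation, which the slides do not state explicitly.

```python
def nyquist_frequency(sample_rate_hz: float) -> float:
    """Highest frequency that can be represented at a given sampling rate."""
    return sample_rate_hz / 2.0


def aliased_frequency(frequency_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a sampled sinusoid actually appears after sampling."""
    folded = frequency_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


def quantise(value: float, n_bits: int = 16, full_scale: float = 1.0) -> int:
    """Nearest signed integer code for a value in the range -full_scale..+full_scale."""
    levels = 2 ** (n_bits - 1)
    code = round(value / full_scale * levels)
    return max(-levels, min(levels - 1, code))  # clip to the representable range


print(nyquist_frequency(20000))         # limit for the 20,000 Hz example
print(aliased_frequency(12000, 20000))  # a 12 kHz tone is misrepresented
print(quantise(0.5))
```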
39
Examining sound sequences: samples
40
Spectra
A two-dimensional spectrum is effectively a
snapshot of the spectrum of a sound at one point
in time.
This "point" in time is always a window of some
length.
Most often the amplitude axis will be in
deciBels (dB).
The frequency axis is usually in Hertz (Hz) or
kiloHertz (kHz).
41
Line Spectra
A line spectrum is a spectral representation that
displays the frequencies and relative
intensities of the component sine waves.
Each sine wave is displayed as a single vertical
line placed at the appropriate frequency on the
x-axis.
The height of the line represents the amplitude
of the component sine wave.
The amplitude is usually displayed as a sound
pressure (ie. in Pascals) or as a relative
deciBel value.
42
Fourier Transforms
Fourier Transforms remain the primary method for
carrying out frequency analyses of sounds and
other phenomena.
The Fourier transform transforms a time domain
signal into a frequency domain representation of
that signal.
This means that it generates a description of the
distribution of the energy in the signal as a
function of frequency.
This is normally displayed as a plot of frequency
(x-axis) against amplitude (y-axis) called a
spectrum.
In digital signal processing the Fourier
transform is almost always performed using an
algorithm called the Fast Fourier Transform or
FFT.
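As a minimal sketch of the idea, the following is a textbook recursive radix-2 FFT applied to a sampled mixture of a 100 Hz and a 500 Hz tone. It is written for readability rather than speed and only handles lengths that are powers of two. The sampling rate, frame length and amplitudes are assumptions chosen so that each spectral bin is exactly 100 Hz wide.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Fast Fourier Transform (length must be a power of two)."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


SAMPLE_RATE = 6400   # Hz
N = 64               # bin spacing = 6400 / 64 = 100 Hz

signal = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 500 * n / SAMPLE_RATE)
          for n in range(N)]

spectrum = fft(signal)
# Convert bin magnitudes to sine-wave amplitudes (valid between 0 Hz and Nyquist).
amplitudes = [2.0 * abs(value) / N for value in spectrum[:N // 2]]

for k, amplitude in enumerate(amplitudes):
    if amplitude > 0.01:
        print(f"{k * SAMPLE_RATE / N:.0f} Hz: amplitude {amplitude:.2f}")
```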
43
Fast Fourier Transform (FFT) of the vowel in the
word "heard"
44
Linear Prediction Coefficient (LPC) analysis
Of specific interest are the major
spectral peaks (formants) which correspond to
the resonant frequencies of the vocal tract.
Linear Prediction Coefficient (LPC) analysis
attempts to predict the major spectral peaks
(formants) seen in the Fourier transform.
The resulting LPC spectrum is a smoothed spectrum
with the peaks representing the formants
(resulting from the vocal tract resonances) of
the spectrum of a vowel or vowel-like consonant.
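One widely used way to compute LPC coefficients is the autocorrelation method with the Levinson-Durbin recursion. The sketch below applies it to a synthetic single-resonance signal and recovers the resonance frequency from the coefficients. This is an illustrative implementation under stated assumptions (a decaying 500 Hz resonance, 8000 Hz sampling, prediction order 2), not necessarily the procedure behind the figures in these slides.

```python
import math


def autocorrelation(samples, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite signal."""
    return [sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor polynomial A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error                      # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        error *= (1.0 - k * k)
    return a, error


SAMPLE_RATE = 8000
RESONANCE_HZ = 500.0          # assumed "formant" of the synthetic signal
POLE_RADIUS = 0.95            # controls how quickly the resonance decays

# Impulse response of a two-pole resonator (a decaying 500 Hz oscillation).
theta = 2.0 * math.pi * RESONANCE_HZ / SAMPLE_RATE
signal = [1.0, 2.0 * POLE_RADIUS * math.cos(theta)]
for n in range(2, 400):
    signal.append(2.0 * POLE_RADIUS * math.cos(theta) * signal[n - 1]
                  - POLE_RADIUS ** 2 * signal[n - 2])

coefficients, _ = levinson_durbin(autocorrelation(signal, 2), 2)

# The angle of the complex pole pair of 1/A(z) gives the peak frequency.
radius = math.sqrt(coefficients[2])
angle = math.acos(-coefficients[1] / (2.0 * radius))
formant_hz = angle * SAMPLE_RATE / (2.0 * math.pi)
print(f"estimated spectral peak: {formant_hz:.1f} Hz")
```

Real speech needs a higher prediction order (several coefficients per expected formant) and a windowed frame, but the steps are the same.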
45
An LPC analysis of the vowel in "heard"
46
Combined FFT and LPC analysis of the vowel in
"heard"
47
Spectrograms: Time, Frequency and Intensity
A spectrograph is a machine or a computer
algorithm that performs a series of spectral
analyses at different times and then displays
them using a three dimensional display of time,
frequency and amplitude.
In most cases time is displayed on the X-axis,
frequency is displayed on the Y-axis and
amplitude is displayed as variations in greyscale
darkness or in colour.
The speech spectrograph consists of a series of
band pass (BP) filters.
A band pass filter permits frequency components
between two cut-off frequencies to pass
unattenuated and attenuates frequency components
below the lower (HP) cut-off frequency and above
the higher (LP) cut-off frequency.
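A software spectrogram is usually computed with short-time Fourier analysis rather than a physical filter bank: each frequency bin of each windowed frame plays the role of one band pass filter. The sketch below uses that substitute technique on a synthetic signal that jumps from 500 Hz to 1500 Hz. The frame length, hop size, sampling rate and helper names are assumptions made for the example.

```python
import cmath
import math


def hanning_window(n):
    """Raised cosine window that tapers to zero at both ends."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_length, hop_length):
    """List of magnitude spectra, one per windowed frame (time x frequency)."""
    window = hanning_window(frame_length)
    frames = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frame = [samples[start + i] * window[i] for i in range(frame_length)]
        magnitudes = [abs(sum(v * cmath.exp(-2j * math.pi * k * i / frame_length)
                              for i, v in enumerate(frame)))
                      for k in range(frame_length // 2 + 1)]
        frames.append(magnitudes)
    return frames


SAMPLE_RATE = 8000
FRAME_LENGTH = 64    # bin spacing = 8000 / 64 = 125 Hz

# 256 samples of a 500 Hz tone followed by 256 samples of a 1500 Hz tone.
signal = ([math.sin(2 * math.pi * 500 * n / SAMPLE_RATE) for n in range(256)]
          + [math.sin(2 * math.pi * 1500 * n / SAMPLE_RATE) for n in range(256, 512)])

frames = spectrogram(signal, FRAME_LENGTH, hop_length=64)
peak_frequencies = [magnitudes.index(max(magnitudes)) * SAMPLE_RATE / FRAME_LENGTH
                    for magnitudes in frames]
print(peak_frequencies)
```

A shorter frame gives finer time resolution and coarser frequency resolution (broad band), a longer frame the reverse (narrow band).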
48
Broad band spectrogram of the word "heard" spoken
by an adult male speaker of Australian English
49
Narrow band spectrogram of the word "heard"
50
Speech Production: Source-Filter Theory
The source-filter theory describes speech
production as a two stage process involving the
generation of a sound source, which is then
shaped or filtered by the resonant properties of
the vocal tract.
51
Sound sources
Sound sources can be either periodic or aperiodic.
Glottal sound sources can be periodic (voiced),
aperiodic (whisper and /h/) or mixed (eg.
breathy voice).
Supra-glottal sound sources that are used
contrastively in speech are almost always
aperiodic (ie. random noise)
Most of the filtering of a source spectrum is
carried out by that part of the vocal tract
anterior to the sound source.
In the case of a glottal source, the filter is
the entire supra-glottal vocal tract.
A voiced glottal source has its own spectrum
which includes spectral fine structure
(harmonics and some noise) and a characteristic
spectral slope.
In voiced speech the fundamental frequency
(perceived as vocal pitch) is a characteristic
of the glottal source acoustics whilst features
such as vowel formants are characteristics of
the vocal tract filter (resonances).
52
Resonance
All physical objects resonate.
Some have simple, uniform resonance patterns and
some have complex resonance patterns.
Some resonators are highly damped and some are
weakly damped.
Some resonators may generate sound by exciting
adjacent air particles in the surrounding
medium.
For example, a guitar string vibrates upon being
plucked.
The guitar string collides with the surrounding
air and generates longitudinal pressure waves
(sound) in that medium.
Some resonators (eg. the supra-glottal vocal
tract) may act upon sound waves generated
elsewhere (eg. at the glottis) and selectively
permit some frequencies (the resonant
frequencies) to pass unattenuated whilst causing
other frequencies to be attenuated (reduced in
intensity) to some extent.
53
Reflection of a wave and resonance
On to the demo from mind.net:
http://id.mind.net/zona/mstm/physics/waves/waves.html
Our topics: Interference, Wave Reflection,
Standing Waves
54
Standing waves and resonance
When a wave front is reflected at a fixed end it
must reflect with inversion so that the resultant
wave interference pattern always maintains zero
displacement at each barrier.
In all cases where the end of the resonating body
is free to move wave reflection occurs without
inversion.
Resonant frequencies have wavelengths that all
result in standing waves with nodes at the fixed
ends.
For a string fixed at both ends, the resonance
frequencies are all multiples of the first
resonance frequency.
55
Nodes at two fixed ends
node
antinode
Four wavelengths that would result in nodes at
the two fixed ends. In descending order, the
wave's wavelength is 2L, L, 2L/3, L/2, where L
is the length of the string.
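The wavelengths listed above follow the pattern 2L/n for n = 1, 2, 3, 4. The corresponding resonance frequencies are n·v/(2L), where v is the speed of the wave along the string. A minimal sketch follows. The string length and wave speed are made-up example values.

```python
def string_wavelengths(length_m: float, count: int) -> list[float]:
    """Wavelengths of the first `count` standing waves on a string fixed at both ends."""
    return [2.0 * length_m / n for n in range(1, count + 1)]


def string_resonances(length_m: float, wave_speed_m_per_s: float, count: int) -> list[float]:
    """Resonance frequencies: whole-number multiples of the first resonance."""
    return [n * wave_speed_m_per_s / (2.0 * length_m) for n in range(1, count + 1)]


wavelengths = string_wavelengths(1.0, 4)            # 2L, L, 2L/3, L/2 for L = 1 m
frequencies = string_resonances(1.0, 200.0, 4)      # assumed wave speed of 200 m/s
print(wavelengths)
print(frequencies)
```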
56
Resonance in a tube that is open at one end
Standing wave patterns for the first four
resonances in a tube open at one end and closed
at the other.
57
Resonance in the vocal tract: formants of
sonorants (vowels and voiced consonants)
The vocal tract during the production of vowels
and vowel-like consonants can be described as a
tube open at one end, the mouth, and closed at
the other, the glottis.
Resonance in a tube of uniform cross-sectional
area is a physical characteristic of that tube.
It is dependent upon the length of that tube and
the open or closed state of the two ends.
What actually vibrates, however, is the medium
contained in that tube.
When we produce vowel sounds the resonances of
the vocal tract selectively enhance sound
vibrations close to the resonance frequencies and
selectively attenuate sound vibrations remote
from the resonance frequencies.
This results in peaks in the acoustic spectrum of
the resulting speech sound. These acoustic
spectral peaks are called formants, particularly
when they occur in vowels and vowel-like
consonants.
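For a uniform tube closed at one end and open at the other, the standard quarter-wavelength result gives resonances at odd multiples of c/(4L). The sketch below applies it to an idealised vocal tract. The tract length of 17.5 cm and the speed of sound of 350 m/s are textbook-style assumptions, not values from these slides, and a real vocal tract is not a uniform tube.

```python
def tube_resonances(length_m: float, speed_of_sound_m_per_s: float, count: int) -> list[float]:
    """Resonances of a uniform tube closed at one end and open at the other.

    f_n = (2n - 1) * c / (4 * L) for n = 1, 2, 3, ...
    """
    return [(2 * n - 1) * speed_of_sound_m_per_s / (4.0 * length_m)
            for n in range(1, count + 1)]


# Assumed values: 17.5 cm tract, 350 m/s speed of sound in warm, moist air.
neutral_vowel_formants = tube_resonances(0.175, 350.0, 4)
print([round(f) for f in neutral_vowel_formants])
```

Halving the tube length doubles every resonance, which is one reason shorter vocal tracts tend to show higher formant frequencies.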
58
Praat
We continue with practical experiments in Praat.
First have a look around Praat:
http://www.fon.hum.uva.nl/praat/
http://www.germanistik.unibe.ch/siebenhaar/SiebenhaarFolder/subfolder/PraatEinfuehrung/PraatManual/PraatManual_home.html
59
(No Transcript)