Title: Information Coding for Transmission (Voice/Speech/Image Coding, error protection and encryption)
1. Information Coding for Transmission
(Voice/Speech/Image Coding, error protection and encryption)
- Southern Methodist University
- EETS8320
- Fall 2005
- Lecture 10
- Slides only. (No notes.)
2. Improved Transmission
- We have considered improving the transmission medium
  - Fiber optics is superior to wire/cable in many aspects
  - Radio, in both microwave point-to-point and cellular/PCS forms, has many advantages (portability, for example)
  - Microwave channels (little discussed in this course) have good digital accuracy but no portability
  - UHF cellular/PCS radio channels have portability but are usually error-filled
- Most transmission media can benefit from well-designed coding and modulation
  - Efficient (high data rate) use of the bandwidth
  - Example: data compression makes the difference between feasible video or image transmission and economically infeasible analog transmission
- Two capabilities are particularly valuable in cellular/PCS radio
  - Error-free (or reduced-error) operation on an intrinsically high-error channel
  - Privacy and secrecy via encryption
3. Source vs. Channel Coding
- Source Coding¹: data compression via removing known or redundant information
- Lossy coding exploits properties of the eye and ear to omit non-essential data. Typical examples include:
  - Speech
    - The ear does not require perfect audio waveforms; it accepts a nearly correct audio frequency power spectrum
  - Vision
    - Tri-stimulus 3-color representation of the complete visible spectrum
    - Missing color is acceptable in fine-detail areas of an image
  - Gray/color pictures (and video coded via DCT, etc.)
- Lossless coding allows perfect reconstruction of binary data
  - Black/white pictures (particularly FAX)
  - Text and data files in general (Lempel-Ziv-Welch and PKZip lossless coding)
- Note 1: "source code" has a different meaning in computer programming
- DCT = Discrete Cosine Transform
4. Channel Coding
- Encode information (usually with extra intentional redundant information inserted) to reduce or correct errors due to (statistically described) channel errors
- Also use a modulation method with minimum analog bandwidth, particularly for radio and for data modems on telephone channels
- Error Correcting Codes
  - A limited number of bit errors and their locations can be detected, and once an erroneous bit location is known, that bit can be inverted
- Error Detecting Codes
  - The presence of errors is known, but their exact location is not. Different patterns of many bit errors all produce the same indication of error, so a unique correction cannot be made
  - Often combined with an Automatic Repeat reQuest (ARQ) algorithm for delay-able data
- Any error protection scheme can be fooled by a sufficiently high error rate
  - So long as distinct messages exist, a sufficiently high error rate can convert one valid message into another!
- Many codes act as error correction codes at low error rates, but as error detection codes at high error rates
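The "detect the location, then invert the bit" idea above can be made concrete with the classic Hamming(7,4) code, a minimal sketch (not a code used in any particular cellular standard): three parity bits are interleaved with four data bits so that the parity-check pattern (the syndrome) directly names the position of any single flipped bit.

```python
# Hamming(7,4): 4 data bits + 3 parity bits; corrects any single-bit error.
# Codeword positions 1..7 hold [p1, p2, d1, p3, d2, d3, d4].

def encode(d):
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4           # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4           # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4           # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(c):
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based error position, 0 if no error
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1          # location known -> invert that bit
    return [c[2], c[4], c[5], c[6]]   # extract the data bits

sent = encode([1, 0, 1, 1])
received = sent[:]
received[4] ^= 1                      # channel flips one bit
print(decode(received))               # recovers [1, 0, 1, 1]
```

Two or more flipped bits defeat this code, illustrating the slide's point: enough errors can turn one valid codeword into another.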
5. Speech Acoustics and Terminology
- The physics of speech production is studied by two diverse groups of people:
  - 1. Speech and singing therapists, foreign language teachers, medical-related professionals, etc.
  - 2. Engineers and scientists interested in encoding, storing, or synthesizing speech electronically
- A peculiar jargon has developed historically from group 1 in these areas of study
  - Knowledge of this jargon is needed to read the literature on this topic
- Alexander Graham Bell, the inventor of the telephone, and his father, Alexander Melville Bell, were both speech therapists
  - A. M. Bell developed a once-popular phonetic alphabet, called "visible speech," which is no longer used (supplanted by the IPA)
  - Many historians believe that Professor Higgins, in the G. B. Shaw play Pygmalion (source of the musical My Fair Lady), was based on A. M. Bell, or possibly on Daniel Jones or Henry Sweet, two other contemporary British phoneticians
6. Phonetic Alphabets
- While many traditional alphabets (Spanish, German, Russian, etc.) come very close to one sound for each symbol, there are many inconsistencies in English, French and other widely used languages
  - English partially compensates by having simpler grammar and a huge existing body of literature
- The International Phonetic Association (IPA) developed the International Phonetic Alphabet (also abbreviated IPA) about 1895, and has continuously revised it, most recently in 1998. It is used today by both types of researchers
  - Some of the original inventors were Paul Passy (France), J. O. H. Jesperson (Denmark), and Daniel Jones (England)
- IPA is based on the use of moveable-type printing
  - After using appropriate type characters from various European languages (Latin, Greek, Scandinavian languages, etc.), additional needed symbols use existing characters inverted or reversed. A few characters are specially invented only for the IPA
  - Computer typesetting and word processing have made use of the IPA easier with today's technology. It is not well adapted to mechanical typewriters
7. IPA Principles
- The sounds themselves must be learned from a recording, or from a speaker who knows them. The chart is only a memory aid!
- Distinguishable sounds of speech are called phonemes (explained later)
- IPA uses one symbol for each phoneme
  - Distinctions between phoneme use in different languages may be indicated by special characters or diacritical marks (e.g., dental T)
- There is no traditional alphabetic order. The IPA chart for consonants is organized according to the point of articulation (columns) and the type of sound characteristic (rows)
  - Within each chart entry having two characters, the left character is unvoiced and the right character is voiced
- Vowels are classified in a modified Viëtor triangle diagram, where each point indicates the position of the highest part of the tongue
  - The lips are at left, the pharynx at right in the IPA charts and diagrams
  - When two vowel symbols are shown, the right symbol indicates the same tongue position with extended (rounded) lips. For example, y (front, close) represents the sound in French "rue" (street) or German ü or y, not the sound in English "yes"
8. Reading IPA
- Prosody markings
  - Stress (lengthening, loudness and/or pitch change) of a syllable is indicated by supra-seg'mental marks: a small vertical bar at the top of the line of text be'fore the syllable
  - Palatalization (using semivowels like j as pronounced in German)
  - Nasalization, using the tilde (~), etc.
  - Tone pitch (audio frequency) is indicated by tone marks. Important in Chinese, Vietnamese, Swedish, and in English questions, etc.
- Much Latin terminology is used on the charts
  - Bilabial (both lips), labio-dental (lip and teeth), velar (velum: soft roof-of-mouth area behind the palate), uvular (uvula: hanging fold of membrane at rear of velum)...
  - Nasal (with air passage to nose open at back of mouth), trill (vibration of the tongue or lips), lateral (air and sound flow around the sides of the tongue), approximant (e.g., the tongue is near but not touching the roof of the mouth)
- IPA is used in many foreign language textbooks and in some multi-lingual dictionaries (not the US Merriam-Webster dictionary)
- Many textbooks use older versions of IPA or make arbitrary alterations to IPA. For example, y represents the sound in "yes" in some textbooks. Most foreign phrase books for casual readers use an arbitrary phonetic spelling used in that one phrase book only
9. Phonemes
- A phoneme is a distinguishable sound element in at least one specific language
- The "pair test": when two words differ in only one element of their sound, and they have distinguishable meanings to a native user, then the two distinct sounds are distinct phonemes. Example: "bed" vs. "bet"
- If the two sounds only produce accepted variants (regional accents, for example), then they are not distinct phonemes in that language
- Example 1: The vowel sounds in the words Ma, me, my, Moe, Moo are all phonemically distinct in "standard" North American English. For a "southern" accent, the vowels in Ma and my are not phonemically distinct (both /a/). Southerners often speak of their mother as "mom" or "mommy" to avoid confusion
- Example 2: Palatalization (raising the tongue near the palate at the beginning or end of a vowel) is a regional or non-distinctive feature in English. The word "new" can be pronounced either /nju/ (British or "radio announcer" English) or /nu/. Traditional English writing does not indicate this distinction either. But in Russian, palatal vowels are phonemes distinct from non-palatal ones, even in traditional writing: /a/ (but, and) differs from /ja/ (I, first person subject case). The first is written like Latin a, the second like a backwards R
- One phoneme may describe multiple allophones
10. Allophones¹
- Many phonemes have different specific forms
  - Example: the /t/ in "top" is an opening form, while the /t/ in "pot" is a closing form. The /p/ in "pot" is an opening form (which, in English, is also immediately followed by aspiration, a puff of breath), while the /p/ in "top" is a closing form
  - We have just pointed out two allophones of /p/ and two allophones of /t/. The allophones involve the same parts of the mouth, but in this case the microscopic time scale of events is reversed (opening vs. closing of air flow)
- When we want to represent specific allophones, they are written in square brackets [ ] rather than the / / used for phonemes
- Choice of the correct allophone is usually uniquely dependent on context (e.g., beginning of word, end of word, etc.) but automatic for a native speaker (so-called "hidden rules")
- A speech synthesis machine must specifically use the correct allophone
- Note 1: In Québec, Canada, the term "allophone" refers to a person whose native language is neither French nor English
11. Formants
- When the lips, jaw and tongue are held in a particular position, the air space from the glottis (vocal cords) to the lips forms several coupled acoustic resonant cavities. Each cavity has one or more resonant frequencies, called formant frequencies in phonetics
- The "toot" sound made by blowing air across the neck of a bottle is due to oscillation of the air contained in the bottle. A taller bottle has lower resonant frequencies (lower formant frequencies)
- 19th century phoneticians would listen for the pitch (audio frequency) of the "thump" sound produced by tapping the cheek while holding the mouth and jaw in a particular phoneme configuration
- In most cases, each phoneme has several specific formant frequencies. Each formant frequency corresponds to a configuration of acoustic standing waves having a different frequency
- The lowest resonant frequency is called the first formant, the next higher resonant frequency is the second formant, and so forth
12. Formants Determine the Recognizable Phoneme
- Some formants/resonances have more loss than others
  - Most formant frequencies have low Q (self-oscillations die out quickly and sound like a "thump" instead of a "ping")
- The formant frequencies do not change with the pitch of the voice (as in singing, raised pitch at the end of question phrases, etc.). This is an important reason why we can recognize the same phoneme (a particular vowel, for example) regardless of whether it is spoken by a low, medium or high pitch voice, or whether it is sung or spoken
- Formant frequencies are slightly different for a person with a large head vs. another person with a small head. However, head sizes of the smallest speaker (a newborn infant) and the largest adult differ by less than 2 to 1. (Sizes of the trunk, arms and legs change much more significantly during growth.) By contrast, the singing pitch (frequency) of the voice may range over 64 to 1 when comparing the lowest (basso) male voice and the high-pitched (soprano) voice of small children
- Most voice recognition and speech recognition (such as speech-to-text) systems recognize distinct phonemes by examining the short-term audio frequency power spectrum of the speech
- In engineering terms, resonant oscillations that die out rapidly have low Q
13. Formant Display and Analysis
- 20th century phoneticians find formant frequencies by using short-term audio spectrum analyzers. One type of visible display is the sound spectrogram on p. 126 of the Bellamy reference book
- This type of sound spectrogram is often called "Visible Speech" (named after A. M. Bell's 19th century phonetic alphabet, but quite different)
- Sound spectrograms display audio power density by means of different shades of gray/black: darker gray indicates higher power
- Time is shown on the horizontal axis. Short-term audio frequency is shown on the vertical axis
- Calculation of the short-term audio frequency power may be done using wider audio sub-bands with very short successive time windows, or
- Alternatively, calculation uses longer successive time windows but very narrow sub-bands of frequency
- The latter display indicates the individual fundamental and harmonic frequency components more clearly and separately on the display
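The short-term analysis above can be sketched numerically. This is a minimal illustration, not a production spectrogram: the signal, sample rate, and 700 Hz "formant" are made-up values, and the window length is the knob that trades time resolution (short windows, wide sub-bands) against frequency resolution (long windows, narrow sub-bands), exactly the trade-off the slide describes.

```python
import numpy as np

# Short-term power spectrum: chop the signal into fixed-length windows,
# taper each with a Hanning window, and take the squared FFT magnitude.
fs = 8000                                  # sample rate, Hz (illustrative)
t = np.arange(0, 1.0, 1 / fs)
signal = np.sin(2 * np.pi * 700 * t)       # pretend formant resonance at 700 Hz

window = 256                               # samples per analysis window
frames = signal[: len(signal) // window * window].reshape(-1, window)
spectrum = np.abs(np.fft.rfft(frames * np.hanning(window), axis=1)) ** 2

# Each row of `spectrum` is one time slice of a spectrogram; the frequency
# resolution is fs/window = 31.25 Hz per bin with these numbers.
peak_bin = spectrum.mean(axis=0).argmax()
print("strongest frequency near", peak_bin * fs / window, "Hz")
```

Doubling `window` to 512 halves the sub-band width but covers twice the time span per slice, which is why the long-window display separates fundamental and harmonic components more clearly.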
14. Speech Production I
- The mechanism of human speech production can be viewed as two processes
- A sound source producing one or both of these characteristics:
  - Voiced sounds, originating from periodic air pulses flowing through the so-called "vocal cords" (vocal folds) in the glottis or larynx part of the throat
  - Un-voiced sounds, mostly from forcing air flow at high velocity through a narrow opening. Called fricative sounds by acoustic phoneticians; called turbulence by physicists and aeronautical engineers
  - In a rising column of smoke (as from a cigarette) shown in the sketch here, the straight flow section is called laminar flow, and the curved section is turbulent, indicating a random breakup and circulation of the straight flow pattern
- Unvoiced sounds may originate at several different places in the mouth or throat, while voiced sound mainly originates in the larynx. (Speech coders and synthesizers often "cheat" and produce all sound sources at the equivalent of the glottis, even when the actual fricative sound source is, for example, just behind the teeth.)
(Sketch: laminar flow in the lower straight section, turbulent flow above.)
15. Speech Production II
- Individual sounds may start or end either suddenly or gradually. Sudden onset or ending of many sounds is due to rapid opening/closing of the lips, the space between the tongue and the roof of the mouth (the palate, or the alveolar ridge behind the teeth), the teeth themselves, or sometimes the throat or larynx
- Sudden onset sounds are called plosives by phoneticians. Examples include P, B, T, K
- In some languages (English, Chinese) it is customary to exhale a short puff of breath (aspiration) after opening the air flow for an unvoiced sound. In others (German) this does not happen. Therefore English-speaking people perceive one aspect of a German accent as the sound of G in place of K
  - The German speaker produces a non-voiced sound, but omits the aspiration!
- Some plosives (T, D) are produced at the teeth in Romance and Slavic languages, but at the alveolar ridge in English. English-speaking people perceive this dental sound as one aspect of a foreign accent (or a New York City accent, since many European immigrants there preserved a dental t in their English speech). Traditional writing does not distinguish these two different phonemes, but IPA does
16. Speech Production III
- Phonemes produced in the larynx or the back of the mouth (pharynx) are designated in the IPA (International Phonetic Alphabet) by symbols similar to the question mark (glottal plosive) and a left-right reversed question mark (pharyngeal fricative), respectively
- The glottal plosive occurs in most languages, but only the Semitic languages (modern Arabic, ancient Hebrew) and Hawaiian "recognize" it and have a traditional symbol for it
- The phrases "a nice man" and "an ice man" differ in speaking due to the glottal plosive (glottal stop) before the word "ice." Similar stops between un-connected vowels in many other languages also utilize (but do not traditionally recognize in writing) the glottal stop. For example, the English word "pre-existing" or Spanish "contra-ataque"
- Small children learn to imitate the necessary mouth configuration without diagrams, yet adults with already-fixed speech habits in one language find this more difficult to learn, and benefit from pictures of the configuration of the mouth and tongue
17. Written Language
- "Who was that forgotten genius who first invented written language, whereby we can read the innermost thoughts of people far away or long gone from the earth?" -- Galileo Galilei, ca. 1632 (paraphrased)
- Galileo is best known as an early physicist and astronomer, but he was a renaissance man with interests in languages and philology as well
- Traditional written language is the basis of much data communication
- Machine inter-conversion between spoken language (sound) and traditional written language, apparently so simple for humans, has been a difficult goal for many telecommunications engineers
  - Accurate automatic speech-to-text conversion could be much less costly than use of a human stenographer!
- The bit rate for a voice telephone channel is 64,000 b/s. If a stenographer transcribes the same speech into writing and sends it via traditional character codes, the data rate is only about 100 bits/second!
- In many cases a spoken command or announcement is needed in a telephone connection to a human user. Automatic text-to-speech synthesis is desirable for this, and practical text-to-speech is available
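The ~100 b/s figure for transcribed speech can be sanity-checked with back-of-the-envelope arithmetic. The speaking rate and word length below are assumed typical values, not numbers from the slide.

```python
# Rough check of the ~100 b/s estimate for transcribed (written) speech.
# Assumptions: ~150 spoken words per minute, ~6 characters per word
# (including the trailing space), 8 bits per character code.
words_per_min = 150
chars_per_word = 6
bits_per_char = 8

bits_per_sec = words_per_min * chars_per_word * bits_per_char / 60
print(bits_per_sec)             # 120.0 b/s, the same order as the slide's ~100
print(64000 / bits_per_sec)     # ~533x more bits on a 64 kb/s PCM voice channel
```

The factor of several hundred between the two rates is the reason speech-to-text conversion (or low-rate speech coding) is so attractive for transmission.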
18. Phonetics and Traditional Writing
- Pictographic writing was used in prehistory, is known from ancient artifacts, and is used today in Chinese Han and Japanese Kanji symbols
  - Aside from the 2000 Kanji pictograms taught in Japanese schools today, other modern Japanese writing uses a syllable alphabet of 50 symbols, which has two stylistic forms called Hiragana and Katakana
- Most other languages use a phonetic or semi-phonetic writing method
- With the possible exception of Korean Hangul (which was apparently invented independently), most present-day alphabetic writing systems are descendants of the ancient Phoenician proto-Semitic alphabet
  - Even though the vocabulary and grammar of Turkish, Russian, Arabic and English are quite different, each is written using an alphabet that traces back historically to the Phoenician alphabet
- The Phoenicians lived where modern Lebanon is today. They are mentioned in the Bible. They were great seafaring merchants and introduced writing around the Mediterranean, and from there to parts of Asia and Africa
- For many years most European Egyptologists incorrectly assumed that ancient Egyptian hieroglyphics were pictographic. Perhaps pre-hieroglyphs were pictographic, but French linguist J. F. Champollion deciphered the Rosetta stone (ca. 1820) when he realized that hieroglyphics are alphabetic writing, evolved from earlier pictograms
19. Sound-to-symbol Correspondence
- Except for some recently created artificial written alphabets (e.g., Turkish in 1922, or written alphabets created by 20th century missionaries¹ for previously unwritten languages), and some reformed spelling rules issued by national academies or similar "language purifiers," most modern languages have somewhat irregular spelling rules
  - Seldom is there one sound for each symbol, one symbol for each sound
- Historians believe that each traditional alphabet started with a one-to-one phoneme-to-symbol correspondence, and that the irregularities crept into the scheme at a later date when sound changes occurred
- Pronunciation changes over the years, but written forms still represent the old pronunciation
  - We know from historical information and the languages of Scotland and Holland (the Netherlands) that the English word "light" (and other GH spellings which are silent in modern English) was pronounced in Shakespeare's 17th century (when English spelling became "frozen") like the modern German word "licht" /lIxt/, which has the same meaning
- Note 1: The Summer Institute of Linguistics in Dallas is a modern center of linguistic research on obscure languages which have no significant commercial use, motivated by a desire to communicate religious beliefs to speakers of these languages
20. Spelling Irregularities: Other Causes
- Many languages have regional dialects with different sounds for the same word (or different regional words for the same thing¹)
- When regional dialects become very different over a long time, they may each become recognized as a separate language
  - The modern Romance languages (French, Spanish, Italian, Romanian, Portuguese, etc.) are all divergent dialects of Latin (following centuries of changes). We know this from historical documents
  - "A separate language is a dialect whose speakers have their own army and navy." - anonymous
- For better communication (TV, radio and movies), some de facto standard dialect is often in use: mid-western US English, Parisian French, Egyptian Arabic for movies but standard modern Arabic (an artificial semi-classical Arabic dialect) for newspaper writing
- Note 1: Examples: A paper "bag" (so-called in most of the USA) is called a paper "sack" in the south and southeast. In the north, the word "sack" is used only for a cloth (not paper) bag. The word "insurance" is accented on the first syllable in the south, but on the second syllable elsewhere. In British pronunciation, written syllables ending -er or -ar are mostly pronounced without the /r/. "Father" is /faða/ in the Queen's English, but /faðɚ/ in North American English. A "guagua" is a motor bus vehicle in Cuban Spanish, but is a baby in other Latin American Spanish dialects
21. New Languages Written With Old Alphabets
- When it is necessary to write down the words and sounds of an unfamiliar language, most people naturally do this using the alphabet that they already know
- Repeatedly in history, when one group already having a written alphabet meets another group with a distinct language but no alphabet, the alphabet of the first group is used to write the sounds of the second group's language
  - Many examples are listed on the following slide
- If the people in the first group know multiple alphabets from different languages, they may use bits and pieces of all these previous alphabets to write the new language! (As Saints Cyril and Methodius did to first write Russian, using a combination of Latin, Greek and Hebrew characters.)
22. Alphabet Borrowing
- When the written alphabet of one language is applied to writing a second language which contains sounds not present in the first language, the writers either:
- Make up a new symbol...
  - Two Eastern Orthodox monks (later Saints), Cyril (also called Kiril) and Methodius, invented the "Slavonic" Russian alphabet about the year 863. To get enough unique characters, they used symbols taken from Greek, Latin and Hebrew. Russian writing is still very phonetic despite later sound changes. Czar Peter (the Great) "European-ized" the appearance of these Old Church Slavonic letters, producing the modern Russian alphabet. Some Greek-root letters were eliminated in 1918
  - Many other Slavic-language-family alphabets are based on Latin (Czech, Polish, etc.) as the result of a historical religious division of Europe between the Eastern Orthodox and Roman Catholic Christian churches
- ...or find a new use for an old symbol
  - Use a symbol which represents a sound not present (or not recognized) in the second language but explicitly represented in the first language. This symbol is no longer needed for its original purpose
  - The "alif" (first letter) of the Phoenician alphabet represented a glottal stop consonant. In Greek this sound was not "recognized." Alif was then used for the Greek vowel sound A. The Phoenicians did not write most vowels explicitly (like modern shorthand) because their language had few word pairs with a phonemic vowel difference
- These inconsistencies between different descendant forms of Phoenician writing thus arise partly from the existence of different phonemes in different languages
23. Alphabet Notes
- Many languages that are not related by vocabulary, grammar or history nevertheless use an alphabet derived from Phoenician!
- Phoenician writing was adapted to Arabic, Hebrew, and Greek; via Greek to Latin; and from Latin to most European languages
  - Semitic languages followed the original Phoenician method of writing only the consonants explicitly
  - Greek and Latin adapted or invented explicit vowel symbols
  - Modern Arabic and Hebrew use various dots surrounding the consonants as vowel symbols (and to make some consonant distinctions in Arabic). Vowel markings are used in children's or religious documents and omitted in newspapers or general printing
- The right-to-left direction of Phoenician writing evolved first into alternate lines in both directions, and then finally into left-to-right for Latin
- Arabic is studied for religious reasons where the Islamic religion is prevalent. Arabic characters are used to write non-related languages such as Urdu (and Turkish, until the 1920s, when Latin characters were adopted, based primarily on German spelling rules)
- Due to the Roman empire, most European languages are written using the Latin alphabet, with the addition of the "new" letters J, K, U and W. Specific European languages also added more new letters (æ, ð, ø, etc.) and diacritical marks (é, ö, ç, etc.)
  - V was used for both the consonant V and the modern vowel U by the ancient Romans
24. Character Codes
- The first machine character codes were the Baudot-Murray¹ teletypewriter codes (now labeled ITU-T Alphabet No. 2)
  - These are 5-bit binary codes used with early teletypewriter machines, also designed to be readable by ear by human telegraphers at low bit rates
  - These codes do not have the binary numerical codes in the order corresponding to alphabetic sequence (collating sequence), which is a problem for many data processing operations
  - ITU No. 2 code survives today mostly via teletypewriters for the deaf (TTY/TDD). Historically it was used in TELEX networks. TTY/TDD devices use a special modem via telephone lines. Dialing 711 connects to a US state TRS (Telecommunications Relay Service) center, where human communications assistants connect with the destination and translate calls (voice-TTY) for deaf people
- Most modern data processing systems use an 8-bit (or 7-bit with 1 parity check bit) code related to ASCII (American Standard Code for Information Interchange). The first historical application of ASCII was TWX networks
  - The ISO-646, ITU No. 5, and ISO-8859-1 codes are derived from and similar to ASCII, except for punctuation marks and additional characters
  - Alphabetic ordering, sorting, and data look-up are straightforward
- Several proprietary alphabetic codes, such as IBM EBCDIC used in mainframe computers, have faded into minor use due to the growth in popularity of personal computers, which all use ASCII or some variant
- Note 1: Named for Emile Baudot (19th c. French inventor of an early teleprinter machine) and Donald Murray (20th c. inventor of an improved teletypewriter)
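The collating-sequence point above is easy to demonstrate: in ASCII, numeric code order matches alphabetic order, so sorting by byte value sorts alphabetically; in ITU No. 2 it does not. The four Baudot code values below are taken from the standard ITA2 table as commonly published (bit-order conventions vary between references, so treat them as illustrative).

```python
# ASCII: contiguous, alphabetically ordered codes, so byte-value sorting
# is the same as alphabetic sorting.
print([ord(c) for c in "ABC"])                 # [65, 66, 67]
print(sorted("banana apple cherry".split()))   # alphabetical, for free

# ITU No. 2 (Baudot-Murray): code values are NOT in alphabetic order,
# so sorting by code value scrambles the alphabet.
baudot = {"A": 0b00011, "B": 0b11001, "C": 0b01110, "D": 0b01001}
print(sorted(baudot, key=baudot.get))          # ['A', 'D', 'C', 'B']
```

This is exactly why non-alphabetic collating sequences were "a problem for many data processing operations": every sort or lookup needed a translation table first.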
25. ASCII Codes 0-127 and Extensions 128-255
Note: other character assignments are used for codes 128-255 in addition to those shown here. Some characters do not display here due to limitations in the PowerPoint character set.
26. Explanation of ASCII Codes 0-31
- 000 (nul) Null               016 (dle) Data link escape
- 001 (soh) Start of heading   017 (dc1) Device control 1
- 002 (stx) Start of text      018 (dc2) Device control 2
- 003 (etx) End of text        019 (dc3) Device control 3
- 004 (eot) End of transmit    020 (dc4) Device control 4
- 005 (enq) Enquiry            021 (nak) Neg. acknowledge
- 006 (ack) Acknowledge        022 (syn) Synchronous idle
- 007 (bel) Bell               023 (etb) End trans. block
- 008 (bs) Back space          024 (can) Cancel
- 009 (tab) Hor. tab           025 (em) End medium
- 010 (lf) Line feed           026 (eof) End of file (sub.)
- 011 (vt) Vert. tab           027 (esc) Escape
- 012 (np) Form feed           028 (fs) File separator
- 013 (cr) Carriage return     029 (gs) Group separator
- 014 (so) Shift out           030 (rs) Record separator
- 015 (si) Shift in            031 (us) Unit separator
- 032 Blank space
These character code meanings were used historically for character-oriented data communications in the 1960s. Again, many computer operating systems and file systems do not use all these meanings today, but assign proprietary meanings to many codes. For example, in Internet browsers, 004 (control with D) indicates "save this URL in my directory list," a new meaning not intended by the 1960s designers of the ASCII codes. When the simple parity check error-detection code is used, only the ASCII codes from 0-127 are used. When parity check methods (explained in a later lecture) are not used, all codes up to 255 are used for characters.
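The "7 bits plus 1 parity check bit" arrangement mentioned above can be sketched in a few lines. This is a minimal illustration using even parity in the 8th bit; real systems also used odd parity, but the detection principle is the same.

```python
# 7-bit ASCII plus an even parity bit in bit position 7 (the 8th bit).
# Any single flipped bit makes the total number of 1-bits odd, which the
# receiver detects (but cannot locate or correct).

def add_parity(ch):
    code = ord(ch)                      # 7-bit ASCII code, 0-127
    assert code < 128
    parity = bin(code).count("1") % 2   # 1 if the code has an odd 1-bit count
    return code | (parity << 7)         # set bit 7 so the byte has even parity

def check_parity(byte):
    return bin(byte).count("1") % 2 == 0

byte = add_parity("A")                  # 'A' = 65 = 0b1000001, already even
print(check_parity(byte))               # True: no error
print(check_parity(byte ^ 0b0000100))   # False: single-bit error detected
```

Note the limitation the slides describe: two flipped bits restore even parity, so the error pattern goes undetected, which is why parity is an error-detecting (not error-correcting) code.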
27. Syllabic Alphabets
- Japanese writing is the best known example of a syllabic alphabet (called a syllabary). In schools, Japanese students learn 2000 Kanji pictograms, borrowed from Chinese Han characters but mostly pronounced with different sounds. All other words are written using 50 syllabic characters. The syllables ka, ke, ki, ko and ku each have a distinct symbol, for example. The non-vowel syllable "n" also has a symbol
- There are 50 Japanese syllable symbols, with two styles of writing syllable characters:
  - Hiragana, used for most ordinary Japanese writing or printing
  - Katakana, used somewhat like italic writing in English -- that is, for foreign words and for special emphasis
- There are two recognized Roman-letter (Romaji) transliteration methods, Kunrei-shiki (official government method) and Hepburn (historical method), which are almost the same
- Arabic and Hebrew traditionally have characters only for the consonants. (Both have a consonant transliterated as "w" that sometimes is pronounced "u" and is thus used as a vowel.)
  - Thus, most printed material effectively has one character per syllable in many cases. Arabic and Hebrew have a special symbol to indicate that there is no vowel in a syllable. In Arabic the no-vowel symbol is named "sukun," a small circle above the consonant. In Hebrew, the symbol is named "shva," a symbol like an English colon written below the consonant. (Shva is not silent in the first syllable of a modern Hebrew word, but instead has a central neutral vowel sound.)
  - Vowel marks are optional and used only for religious texts or beginners' documents
  - You must know the vocabulary well to read such "defective" (no-vowel) writing correctly
- Certain types of English shorthand writing (example: speedwriting) omit vowels. You must know vocabulary well to read these abbreviations
28. Hiragana Syllables
This alphabetic order is used in dictionaries for foreigners. Supplementary marks (not shown) are used to indicate doubling of consecutive letters, palatalization ("y" sound within a syllable), and voicing (which makes "ka" read as "ga," "sa" read as "za," etc.). The transliteration shown here is the Hepburn system. In Kunrei-shiki romanization, "si" is used instead of "shi," "ti" instead of "chi," "tu" instead of "tsu," and "hu" instead of "fu." "Hu" is the bilabial fricative sound in the IPA chart. The sounds of the syllables are close to those represented to unsophisticated English speakers by the Hepburn system.
29. Arabic Alphabet
The chart shows only the basic consonant letters. Vowels (not shown) are indicated by means of dots and other marks below or above the character. Note that several characters have distinctive shapes when used in an initial, medial or final position, in order to provide links to the surrounding letters, as in English handwriting.
30. Hebrew Alphabet
< Alphabetic order begins here. Read right-to-left.
The chart shows only the basic consonant letters. Vowels (not shown) are indicated by means of dots and other marks below, above or to the side of the character. The character forms identified on this chart as left kap, left mem, left nun, left peh, and left sadeh are "final" character writing styles used only at the end of a word. In documents for beginners, sin is distinguished from shin by a dot in the upper left corner vs. the upper right corner, respectively. Number values for characters are used only in ancient Hebrew. Modern Hebrew and modern Arabic use "Arabic" numerals as used in English.
31Changing Alphabets
- Chinese is still written primarily in the Han
pictographic form, although many characters are
now simplified. For romanized writing with the
Latin alphabet, the standard sound of each word is the
Putonghua (common language, or Mandarin)
pronunciation, based on the Beijing dialect and
taught in schools throughout China. - Most regional dialects have very different
sounds for each word. - Pinyin ("phonetic spelling") was standardized in
1977, based on alphabetic transliteration rules
developed in Mongolia in the 1950s. The name of
the Chinese capital city is written Beijing in
pinyin, but was variously spelled Peiping, Pekin,
or Peking in earlier non-standard romanizations
of Chinese. - Several computer software programs exist to allow
a typist to input Chinese on an English character
keyboard in pinyin spelling, and produce Han
pictogram results. Similarly, some computers
allow Japanese to be typed in Romaji spelling and
produce Kanji pictograms, Hiragana and Katakana
in traditional forms. - Azeri, the language of Azerbaijan, was written in
Arabic characters from approximately 900 to 1924.
Like the related Turkish language, Azeri was then
written using Latin characters. After Azerbaijan
became part of the U.S.S.R., the Soviet
government required the use of the Cyrillic
alphabet from 1939. On August 1, 2001, the Azerbaijan
government officially changed back to the Latin
alphabet. Public signs and newspapers are now
printed with both alphabets during the transition.
32Expansions of ASCII
- The existence of many essentially different
alphabets for natural languages has been
addressed differently in various historical
telecommunications and data processing systems - Chinese Han pictographs are tabulated in a
Chinese Telegraphic Code book, with a 5-digit
decimal number for each pictograph. Invented for
telegrams in the 19th century, this still has
some limited uses today. - As previously described, a standard Roman
letter phonetic transliteration (pinyin)
dictionary has been established for the
Putonghua-Mandarin spoken language of China.
Romanized text can be stored and communicated
like other alphabetic languages. - A standard romanization system (romaji) exists
for Japanese as well. - Pictographic writing systems have no obvious
alphabetic order - Chinese Han and Japanese Kanji pictographs are
placed in dictionaries according to the number of
pen strokes used to draw them and on their stroke
placement (strokes are generally written top to
bottom and left to right)
33Code Pages and Unicode
- In most word processing and similar computer
systems, 8-bit code pages of 256 characters each
are defined. - Decimal code values 0-31 are non-printing control
characters. In teletypewriters, these are used
for carriage return, horizontal tab, line feed,
etc. - Code values 32-255 are used for printing
characters. You can view this table in many word
processing systems via the Insert > Symbol
pull-down menu. - Different code pages (fonts) can be substituted
in different parts of a document for different
appearance or to handle different language
alphabets. - ISO-8859 is a standard with numerous code pages
for various languages (ISO-8859-1 is so-called
Latin-1, covering English and other Western
European languages; ISO-8859-6 is Arabic; etc.) - Unicode 4.0 (ISO-10646:2003) is a universal code
page set. It has 96,248 printable characters,
each represented internally by a 32-bit binary
number, and includes ISO-8859-1 (with 24
binary zeros preceding each 8-bit code), the
repertoires of the other ISO-8859 parts, and
approximately 18,000 Chinese Han pictographs,
among other scripts - In theory, one can write a multilingual document
in Unicode with no confusion. Portions of the
overall range of Unicode may be converted into
specific pre-existing limited code pages. Strong
support by Apple and Microsoft (in Windows NT,
Windows 2000, etc.) has made Unicode widely
available. - For more information on Unicode and how it can be
incorporated into data processing and
communication systems, see Internet URL
http://www.unicode.org - More unofficial information at
http://czyborra.com/charsets/iso646.html and
http://www.wps.com/texts/codes/ - Historically, Unicode 3.0 used a 16-bit code and
described only 65,536 characters.
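The relationship between 8-bit code pages and Unicode code points can be shown with a short Python sketch; the specific characters below are only illustrative examples:

```python
# Sketch: Unicode assigns one integer code point per character,
# independent of the byte encoding used to store it.

for ch in ("A", "é", "中"):
    cp = ord(ch)                      # the Unicode code point (an integer)
    utf8 = ch.encode("utf-8")         # variable-length byte encoding
    utf32 = ch.encode("utf-32-be")    # fixed 32-bit form, as on the slide
    print(f"U+{cp:04X}  utf-8={utf8.hex()}  utf-32={utf32.hex()}")

# "A" (U+0041) fits in one byte; an ISO-8859-1 character such as "é"
# (U+00E9) keeps its 8-bit value, padded with leading zero bits in
# UTF-32; a Han character such as "中" (U+4E2D) needs three bytes in UTF-8.
```

This shows why the slide describes ISO-8859-1 as "24 binary zeros preceding each 8-bit code" inside the 32-bit Unicode form.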
34Text-Speech Inter-conversion
- Telephones are widely available, but most
telephones do not have a data terminal attached. - In many situations (airline reservations,
inquiries about bank accounts, etc.) a human being
(employee) is interposed between a caller and a
data processing system for the sole purpose of
converting between speech and text, since the
caller cannot directly produce text or read the
text appearing from the data processing system. - There has been interest in automatic systems in
recent decades to convert - Text to speech reading machine (e.g.,
Kurtzweil reader for the blind) voice output for
telephone inquiries aid, instructions, or
warnings to aircraft pilot, repair technician,
surgeon, or other person whose hands and eyes are
occupied while working. - Uses text-to-phoneme/allophone (spelling/reading
rules) algorithm, followed by allophone-to-sound
algorithm (including syllable stress), and sound
generator. - MITalk system is one of the best documented.
- Speech to text: automatic stenographer; the input
part of a database inquiry system; etc. - Some recent systems (Dragon, IBM) are partly
successful after extensive training for use by
one speaker. When that speaker has a head cold,
there are word recognition errors!
35Text-to-Speech Systems
- First, process words on a list of exceptions to
all the rules - Foreign words frequently used in English Des
Moines, Iowa chic zeitgeist, voilà,... - Frequently used unruly native words two, iron,
Penelope, - Then use category and context rules. Examples
- All Germanic-language root words uniformly
pronounce g and c hard, regardless of the
succeeding vowel. Romance-language root words
change the sound of g and c before the vowels i
and e to a different (so-called soft) form. - The vowel o between consonants in a first
syllable is often pronounced /a/ (bother,
position, shower, ...), except for a list of
exceptions (both, home, show, ...) - Then use text-to-phoneme rules for remaining
words - Example: -ie and -ey after a consonant at the
end of a word are pronounced /i/ - Produce correct allophones, with stress and
pauses due to word boundaries, commas, periods,
etc. - Convert properly formed allophone string to sound
- pre-recorded or synthesized allophone sound
- Works rather well. Many applications in data
retrieval systems with touch-tone input and voice
output. - Some of these systems with limited vocabulary use
the older technology of pre-recorded phrases
rather than phoneme/allophone level synthesis.
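The layered strategy above (exception list first, then spelling rules) can be sketched as follows; the word list and phoneme spellings are illustrative placeholders, not taken from MITalk or any real TTS system:

```python
# Minimal sketch of rule-based text-to-phoneme conversion:
# look the word up in an exception dictionary first, then fall
# back to spelling rules. Phoneme strings are illustrative only.

EXCEPTIONS = {"two": "tu", "iron": "aIern"}   # unruly words, checked first

def letter_rules(word: str) -> str:
    """A deliberately tiny example rule set, not a complete algorithm."""
    # Rule from the slide: word-final "-ie"/"-ey" after a consonant -> /i/
    for suffix in ("ie", "ey"):
        if (len(word) > len(suffix) + 1 and word.endswith(suffix)
                and word[-3] not in "aeiou"):
            return word[:-len(suffix)] + "i"
    return word  # fall through: pass spelling along unchanged (placeholder)

def to_phonemes(word: str) -> str:
    word = word.lower()
    return EXCEPTIONS.get(word) or letter_rules(word)

print(to_phonemes("two"))     # exception lookup: "tu"
print(to_phonemes("monkey"))  # suffix rule: "monki"
```

A production system would follow this stage with allophone selection, stress assignment and a sound generator, as the slide describes.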
36Speech-to-Text Limitations
- Two capabilities which are difficult to achieve
in the same system in the present state of the
art - Large vocabulary of spoken connected natural
speech - Speaker-independent accuracy. Some systems turn
this disadvantage around by marketing their
system as a speaker authenticating system! - Large vocabulary and connected speech for a
single speaker is on the brink of practicality
(next 5 to 10 years). - Several personal computer software packages allow
85-95% accurate dictation from only one speaker
after extensive training. - For large speaker population, very small
vocabulary and isolated spoken words, recognition
is practical - Nortels and TIs recognition systems support low
cost collect telephone service (e.g., MCIs 1 800
COLLECT) by recognizing only the words yes or
no spoken by the recipient. - Spoken digit dialing has been a long-time
telephone research goal, and is available on some
cellular systems. Rather sensitive to background
noise, etc. Some of these services have been
discontinued due to lack of sufficient accuracy
or lack of subscriber satisfaction/demand.
37Speech Recognition Problems
- Phonemes can sometimes be recognized by the
frequencies and pattern of change of the formants
(see p.126 in Bellamy). - One source of error is the difference in relative
time duration of each phoneme or syllable with
different speakers, different occurrences. - Time warping software can stretch/shrink all
syllable/phoneme data to equal time intervals - This improves large population accuracy, but
removes the clues of syllable stress, and
prevents distinguishing homonyms such as
'PERmit(noun)/per'MIT(verb), which differ only in
syllable stress patterns. - Syllable stress involves a combination of changes
in syllable time duration, pitch, and loudness.
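The time warping mentioned above is commonly formalized as dynamic time warping (DTW); the slides do not name a specific algorithm, so the following is a minimal generic sketch, using toy 1-D values in place of real per-frame spectral features:

```python
# Classic O(n*m) dynamic time warping: aligns two sequences of
# different lengths by stretching/shrinking either one, and returns
# the total mismatch cost along the best alignment path.

def dtw_distance(a, b):
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # stretch a
                                 d[i][j - 1],      # stretch b
                                 d[i - 1][j - 1])  # match one-to-one
    return d[n][m]

fast = [0, 2, 4, 2, 0]
slow = [0, 0, 2, 2, 4, 4, 2, 2, 0, 0]   # same shape, spoken twice as slowly
print(dtw_distance(fast, slow))          # 0.0: identical after warping
```

Note how the alignment deliberately discards duration information, which is exactly why (as the slide says) stress-based distinctions are lost.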
38Language Limitations in the Telephone Market
- Telephone traffic between two population centers
which speak the same language is generally
proportional to the product of the two
populations, and (on average) is inversely
proportional to the square of the distance
between the centers - Similar to Newton's law of gravitation!
- Lack of a common language reduces this traffic in
proportion to the smaller number of people at
each end who speak the language of the other end - A low-cost mechanism which overcomes language
incompatibility could greatly increase this
traffic - Or widespread knowledge of a universal auxiliary
language like English as a language of business
(the role once planned for the artificial
language Esperanto) - Telex (teletypewriter exchange), a switched
teleprinter network, now mostly superseded by
e-mail, was popular in parts of the world where
language differences preclude direct voice
conversations. - You can take a text printout to your desk and
translate with a multi-lingual dictionary.. - In the North American market, two competing
services, Western Union Telex and ATT TWX,
merged when WU purchased TWX. The entire US and
Canada user population never exceeded 100,000 - Internet E-mail has almost totally replaced Telex
services world wide
39Automatic Voice Translation System
[Diagram: processing chain for automatic voice translation]
- Voice waveform (source language) → voice-to-phonetic-text conversion (speech-to-text) → phonetic symbols (e.g., IPA)
- Phonetic-to-traditional-text conversion → traditional writing (source language)
- Source-to-object-language translation → traditional writing (target language)
- Traditional-to-phonetic-text conversion → phonetic symbols (e.g., IPA)
- Phonetic-text-to-voice conversion (text-to-speech) → voice waveform (target language)
- Example in the diagram: English hello translated to Japanese kon'nichiwa, each shown in both traditional writing and phonetic symbols.
- Color of each box indicates difficulty level: pink = difficult to do; blue = adequate software exists; orange = already works OK.
- Note: an alternative to the three middle boxes is a direct phonetic-source-to-phonetic-target translator.
40Ultimately Automatic Translation?
- Science fiction writers have described automatic
machines for spoken natural language translation
for decades. - This would require
- Speech content recognition in the speaker's
language - Still a difficult problem, but more manageable
for a single speaker with a trained system - Translation from one written language
representation to another - Text translation for a limited number of simple
sentence or phrase structures is fairly practical
today. Many software packages are available. Don't
expect good translations of poetry! - Automatic text translation is already popular for
some e-mail systems (without the speech parts!) - Text-to-speech conversion in the target language
- The most fully developed part of the system.
41Language Line Services, Inc.
- Founded by AT&T but now a separate business, it
has pursued business development with two aspects - Support of automatic real-time spoken-language
translation systems research, which could reach a
practical stage for English-Japanese within a few
years (jointly sponsored with some Japanese
firms). This research has been cut back in recent
years due to economic slowdown. - An existing service using human
translators/interpreters in 3-way telephone
conference calls between 2 people who do not
speak the same language - The translators (who contractually agree to
confidentiality in all their work) are mostly in
Monterey, CA, which has a large and varied
foreign language speaker population due to the US
Defense Dept. language training center located
there. - The major activities of language line are
(surprisingly) domestic rather than
international. Typical call is from a hospital
emergency room. A patient who speaks an unknown
language (Gujarati?) cannot describe the symptoms
of his/her illness or injury. - First step is to identify the language spoken by
this person. Second step is to conference in an
interpreter skilled in that particular language. - See www.languageline.com for more information.
Berlitz and other translation firms have similar
services as well.
42Human Ear and Hearing
- The ears each contain a sealed coiled tube called
the cochlea, filled with a fluid. A flexible
membrane at one end is coupled to the eardrum by
three small bones (the ossicles) with flexible joints.
The cochlea is internally divided by a
longitudinal membrane - Sound waves in the air vibrate the eardrum and,
through the linkage bones, cause sound waves to
move through the fluid in the cochlea - The dividing membrane is permeated with many
nerve endings, terminating on small hair cells
(organs of Corti) which are moved by the sound
vibrations - from the Latin word for snail, which it resembles
43Human Ear Mechanism
- The human ear is known to contain many nerve
endings in the cochlea. Various frequency
components of a sound produce peak audio
frequency standing wave amplitude at different
locations in the cochlea. Nerve endings at or
near each such location are therefore sensitive
to particular frequency components of the audio
waveform. - The nerves appear to respond to signal power by
producing more electrical nerve pulsations per
second in response to higher audio signal power
in the vicinity of that nerve. - The electrical waveforms in the nerves are not a
replica of the analog sound waveform.
44Ear and Hearing
- The nerves from the ear to the brain appear to
transmit an audio power spectrum analysis of the
sound in the ear. More frequent nerve impulses
from certain nerves indicate portions of the
cochlea, and thus portions of the audio frequency
spectrum, that have high audio spectrum power. - This is the accepted explanation of why the ear
is not sensitive to phase shifts or delays in
various frequency components of a complicated
waveform - The ear is most sensitive to tones near 1 kHz
audio frequency (shown by the Fletcher-Munson
loudness tests in the 1930s) - More audio power is required at higher or lower
frequencies to produce the same perception of
loudness - Deterioration of hearing clarity is often related
to further decrease in high and low frequency
sensitivity, producing muffled speech - Exposure to extremely loud sounds can damage
Corti (hair) cells, thus producing nerve
deafness. Once an occupational hazard of
boilermakers in loud factory workplaces, nerve
deafness today is often experienced by rock band
musicians.
45Cochlear Implants
- Essential deafness due to congenital conditions
or to injuries to the Corti/hair cells can be
partly alleviated with recently developed
cochlear implants - An external electronic filtering device performs
continual spectrum analysis of sound from a
microphone. Signals from this analyzer, after
coupling (via a transformer coil) to an implanted
pickup coil under the skin near the ear, cause
surgically implanted electrodes in the cochlea to
electrically stimulate nerve endings at
approximately the location estimated to respond
to the specified frequency component of the
signal. This produces a sensation of sound even
in people who have always been deaf. - The sound is sometimes indistinct and muffled, as
reported by patients who became deaf after having
normal hearing. However, the implant is beneficial
in conjunction with lip reading as an aid to
understanding, and subjective sound quality has
improved year by year as new signal processing
methods come into use. - There is some controversy about the cochlear
implant. Some deaf people consider it too
complex, expensive, and risky in relation to the
relatively small improvement in overall ability
to hear. - The documentary motion picture Sound and Fury,
released in fall 2000, has an accompanying
Internet web page http//www.pbs.org/soundandfury
that gives more information about cochlear
implants, both pro and con
46Audio Signal Processes
- Telephone Conference Bridges
- Combining speech from 3 or more lines into one
conversation, without adding the background noise
from all inputs, is not simple - Exploitation of Silent Intervals in Speech
- Undersea cables in the 1960s made use of time
assignment speech interpolation (TASI); modern
satellite circuits have a similar process called
Digital Speech Interpolation (DSI), and GSM and
IS-95 CDMA cellular/PCS radio use a similar
feature to reduce average radio interference,
conserve battery power and increase the number of
conversations occurring in a radio cell - In GSM systems, mobile handset transmit power is
off during intervals of voice channel silence. - Because normal speech is about 40-60% silence
(between syllables and phrases), a channel can be
re-assigned to other speakers as required, thus
carrying more traffic. - Echo Cancellation
- In telephone circuits with long time delay
(e.g., satellite links), echoes are particularly
disturbing. Very short-time echoes are perceived
as simultaneous side tone.
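The silence-exploitation idea behind TASI/DSI and GSM discontinuous transmission can be sketched with a simple energy-based voice activity detector; real systems use far more robust detectors, and the frame length and threshold below are only illustrative values:

```python
# Frame-by-frame energy-based voice activity detection: frames below
# the energy threshold are "silence," so their channel (or transmitter
# power, in GSM) can be given up for reuse.

def frame_energy(samples):
    return sum(s * s for s in samples) / len(samples)

def active_frames(signal, frame_len=4, threshold=0.01):
    """Return True for each frame loud enough to need the channel."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    return [frame_energy(f) > threshold for f in frames]

speech  = [0.5, -0.4, 0.3, -0.5]       # a loud (voiced) frame
silence = [0.001, -0.002, 0.001, 0.0]  # background noise between syllables
print(active_frames(speech + silence))  # [True, False]
```

With speech roughly half silence, an interpolation system can nearly double the number of conversations carried, as the slide notes.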
473-way Conference Calls
- To establish multi-line conference calls, a
device is needed to add signals from all relevant
speakers - In a digital telephone switch, a three-port
conference bridge is switched into the
conversation by one of the participants who uses
the telephone dial and cradle switch to add a 3rd
party. - The audio PCM samples must be converted from
mu-law code into uniform (linear) binary code with
12 to 16 bits of resolution, then added, and then
the sum is converted back to mu-law - This process can also be done for two inputs via
a ROM look-up table which implements in one step
the equivalent of mu-law > linear > add the two
inputs > reconvert to a mu-law value. - On the line returning to each speaker, his/her
own audio signal component is almost completely
subtracted back out (not so for non-speakers) to
avoid high artificial side tone.
48Many-line Conference Bridge
- Avoiding background noise accumulation is a
serious problem