1
Information Coding for Transmission
(Voice/Speech/Image Coding, error protection and
encryption)
  • Southern Methodist University
  • EETS8320
  • Fall 2005
  • Lecture 10
  • Slides only. (No notes.)

2
Improved Transmission
  • We have considered improving the transmission
    medium
  • Fiber optics is superior to wire/cable in many
    aspects
  • Radio, in both microwave point-to-point and
    cellular/PCS forms, has many advantages
    (portability, for example).
  • Microwave channels (little discussed in this
    course) have good digital accuracy but no
    portability.
  • UHF cellular/PCS radio channels have portability
    but are usually error-filled
  • Most transmission media can benefit from
    well-designed coding and modulation
  • Efficient (high data rate) use of the bandwidth
  • Example: data compression makes the difference
    between feasible video or image transmission
    vs. economically infeasible analog transmission
  • Two capabilities particularly valuable in
    cellular/PCS radio
  • Error-free or reduced errors in an intrinsically
    high error channel
  • Privacy and secrecy via encryption

3
Source vs. Channel Coding
  • Source Coding1: Data compression via removing
    known or redundant information
  • Lossy Coding: exploits properties of the eye and
    ear to omit non-essential data. Typical examples
    include
  • Speech
  • The ear does not require perfect audio waveforms;
    it accepts a nearly correct audio frequency power
    spectrum
  • Vision
  • Tri-stimulus (3-color) representation of the
    complete visible spectrum
  • Missing color is acceptable in fine-detail
    areas of an image
  • Gray/Color Pictures (and video coded via DCT
    etc.)
  • Lossless Coding: allows perfect reconstruction of
    binary data
  • Black/White Pictures (particularly FAX)
  • Text and data files in general (Lempel-Ziv-Welch
    and PKZip lossless coding)
  • Note 1: source code has a different meaning in
    computer programming
  • DCT = Discrete Cosine Transform

4
Channel Coding
  • Encode information (usually with extra
    intentional redundant information inserted) to
    reduce or correct errors due to (statistically
    described) channel noise
  • Also use a modulation method with minimum analog
    bandwidth, particularly for radio, and for data
    modems via telephone channels
  • Error Correcting Codes
  • A limited number of bit errors and their
    locations can be detected, and once an erroneous
    bit location is known, that bit can be inverted
  • Error Detecting Codes
  • Presence of errors is known, but their exact
    location is not known. Different patterns of many
    bit errors all produce the same indication of
    error, so a unique correction cannot be made.
  • Often combined with an Automatic Repeat reQuest
    (ARQ) algorithm for delay-able data
  • Any error protection scheme can be fooled by
    sufficiently high error rate
  • So long as distinct messages exist, a
    sufficiently high error rate can convert one
    valid message into another!
  • Many codes act as error correction codes at low
    error rates, but as error detection codes at high
    error rates (see the sketch below).
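To make the correction/detection distinction concrete, here is a minimal Python sketch of a Hamming(7,4) code (an illustrative example, not one from the lecture): it corrects any single bit error, but enough errors turn one valid codeword into another, exactly as stated above.

    def hamming74_encode(d):            # d = [d1, d2, d3, d4], data bits
        p1 = d[0] ^ d[1] ^ d[3]         # parity bits cover overlapping subsets
        p2 = d[0] ^ d[2] ^ d[3]
        p3 = d[1] ^ d[2] ^ d[3]
        return [p1, p2, d[0], p3, d[1], d[2], d[3]]   # codeword positions 1..7

    def hamming74_decode(c):            # c = 7 received bits
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # syndrome bits point at the error
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
        pos = s1 + 2 * s2 + 4 * s3      # 0 means "no error detected"
        if pos:
            c = c[:]
            c[pos - 1] ^= 1             # invert the located bit
        return [c[2], c[4], c[5], c[6]] # recovered data bits

    data = [1, 0, 1, 1]
    code = hamming74_encode(data)
    code[4] ^= 1                             # one channel error
    print(hamming74_decode(code) == data)    # True: corrected
    code[1] ^= 1                             # a second error
    print(hamming74_decode(code) == data)    # False: the code was fooled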

5
Speech Acoustics and Terminology
  • The physics of speech production is studied by
    two diverse groups of people
  • 1. Speech and singing therapists, foreign
    language teachers, medical related professionals,
    etc.
  • 2. Engineers and scientists interested in
    encoding, storing, or synthesizing speech
    electronically.
  • A peculiar jargon has developed historically from
    group 1 in these areas of study
  • Knowledge of this jargon is needed to allow
    reading the literature in this topic
  • Alexander Graham Bell, the inventor of the
    telephone, and his father Alexander Melville
    Bell, were both speech therapists
  • A.M. Bell developed a once-popular type of
    phonetic alphabet, called visible speech, which
    is no longer used (supplanted by IPA).
  • Many historians believe that Professor Higgins,
    in the G.B. Shaw play Pygmalion (source of the
    musical My Fair Lady), was based on A.M. Bell, or
    possibly Daniel Jones or Henry Sweet, two other
    contemporary British phoneticians.

6
Phonetic Alphabets
  • While many traditional alphabets (Spanish,
    German, Russian, etc.) come very close to one
    sound for each symbol, there are many
    inconsistencies in English, French and other
    widely used languages.
  • English partially compensates by having simpler
    grammar and a huge existing database of
    literature
  • The International Phonetic Association (IPA)
    developed the International Phonetic Alphabet
    (also abbreviated IPA) about 1895, and has
    continuously revised it, most recently in 1998.
    Used today by both types of researchers.
  • Some of the original inventors were Paul Passy
    (France), J.O.H. Jesperson (Denmark), Daniel
    Jones (England)
  • IPA is based on use of moveable type printing
  • After using appropriate type characters from
    various European languages (Latin, Greek,
    Scandinavian languages, etc.), additional needed
    symbols use existing characters inverted or
    reversed. A few characters are specially invented
    only for IPA.
  • Computer typesetting and word processing have
    made use of IPA easier with today's technology.
    It is not well adapted to mechanical typewriters.

7
IPA Principles
  • The sounds themselves must be learned from a
    recording, or a speaker who knows them. The chart
    is only a memory aid!
  • Distinguishable sounds of speech are called
    phonemes (explained later)
  • IPA uses one symbol for each phoneme
  • Distinctions between phoneme use in different
    languages may be indicated by special characters
    or diacritical marks (e.g. dental T)
  • There is no traditional alphabetic order. The IPA
    chart for consonants is organized according to
    the point of articulation (columns) and the
    type of sound characteristic (rows).
  • Within each chart entry having two characters,
    the left character is unvoiced and the right
    character is voiced
  • Vowels are classified in a modified Viëtor
    triangle diagram where each point indicates the
    position of the highest part of the tongue
  • The lips are at left, the pharynx at right in the
    IPA charts and diagrams
  • When two vowel symbols are shown, the right
    symbol indicates the same tongue position with
    rounded (protruded) lips. For example, y (front,
    close) represents the sound in French rue
    (street) or German ü or y, not the sound in
    English yes.

8
Reading IPA
  • Prosody markings:
  • Stress (lengthening, loudness and/or pitch
    change) of a syllable is indicated by
    supra-segmental marks: a small vertical bar at
    the top of the line of text, placed be'fore the
    stressed syllable.
  • Palatalization (using semivowels like j, as
    pronounced in German),
  • Nasalization, using the tilde (~), etc.
  • Tone pitch (audio frequency) is indicated by tone
    marks. Important in Chinese, Vietnamese, Swedish,
    and in English questions, etc.
  • Much Latin terminology is used on the charts
  • Bilabial (both lips), labio-dental (lip and
    teeth), velar (the velum, the soft roof-of-mouth
    area behind the palate), uvular (the uvula, the
    hanging fold of membrane at the rear of the
    velum)...
  • Nasal (with air passage to nose open at back of
    mouth), trill (vibration of the tongue or lips),
    lateral (air and sound flow around the sides of
    the tongue), approximant (e.g., the tongue is
    near but not touching the roof of the mouth)
  • IPA is used in many foreign language textbooks,
    in some multi-lingual dictionaries (not US
    Merriam-Webster dictionary).
  • Many textbooks use older versions of IPA or make
    arbitrary alterations to IPA. For example, y
    represents the sound in yes in some textbooks.
    Most foreign phrase books for casual readers use
    an arbitrary phonetic spelling used in that one
    phrase book only.

9
Phonemes
  • A phoneme is a distinguishable sound element in
    at least one specific language
  • The pair test: when two words differ in only one
    element of their sound, and they have
    distinguishable meaning to a native user, then
    the two distinct sounds are distinct phonemes.
    Example: bed vs. bet
  • If the two sounds only produce accepted variants
    (regional accents for example) then they are not
    distinct phonemes in that language.
  • Example 1: The vowel sounds in the words Ma, me,
    my, Moe, Moo are all phonemically distinct in
    standard North American English. For a
    southern accent, the vowels in Ma and my are not
    phonemically distinct (both /a/). Southerners
    often speak of their mother as mom or mommy
    to avoid confusion.
  • Example 2: palatalization (raising the tongue
    near the palate at the beginning or end of a
    vowel) is a regional or non-distinctive feature
    in English. The word new can be pronounced either
    /nju/ (British or radio announcer English) or
    /nu/. Traditional English writing does not
    indicate this distinction either. But in Russian,
    palatal vowels are distinct phonemes from
    non-palatal, even in traditional writing /a/
    (but, and) differs from /ja/ (I, first person
    subject case). The first is written like Latin a,
    the second like a backwards R.
  • One phoneme may describe multiple allophones.

10
Allophones1
  • Many phonemes have different specific forms
  • Example: the /t/ in top is an opening form,
    while the /t/ in pot is a closing form. The /p/
    in pot is an opening form (which is also
    immediately followed by aspiration - a puff of
    breath - in English), while the /p/ in top is a
    closing form.
  • We have just pointed out two allophones of /p/
    and two allophones of /t/. The allophones involve
    the same parts of the mouth, but in this case the
    microscopic time scale of events is reversed
    (opening vs. closing of air flow).
  • When we want to represent specific allophones,
    they are written in square brackets rather
    than the // used for phonemes.
  • Choice of the correct allophone is usually
    uniquely dependent on context (e.g., beginning of
    word, end of word, etc.) but automatic for a
    native speaker (so-called hidden rules)
  • A speech synthesis machine must specifically
    use the correct allophone.
  • Note 1. In Québec, Canada, the term allophone
    refers to a person whose native language is
    neither French nor English

11
Formants
  • When the lips, jaw and tongue are held in a
    particular position, the air space between the
    glottis (vocal cords) to the lips forms several
    coupled acoustic resonant cavities. Each cavity
    has one or more resonant frequencies, called
    formant frequencies in phonetics.
  • The toot sound made by blowing air across the
    neck of a bottle is due to oscillation of the air
    contained in the bottle. A taller bottle has
    lower resonant frequencies (lower formant
    frequencies).
  • 19th century phoneticians would listen for the
    pitch (audio frequency) of the thump sound
    produced by tapping the cheek while holding mouth
    and jaw in a particular phoneme configuration
  • In most cases, each phoneme has several specific
    formant frequencies. Each formant frequency
    corresponds to a configuration of acoustic
    standing waves having different frequencies.
  • The lowest resonant frequency is called the first
    formant, the next higher resonant frequency is
    the second formant, and so forth.
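As a rough quantitative illustration (a sketch under assumed values, not from the slides): modeling the vocal tract as a uniform tube closed at the glottis and open at the lips gives resonances at odd quarter-wavelength frequencies, f_n = (2n - 1)c / 4L. In Python:

    C = 343.0          # speed of sound in air, m/s (room temperature assumed)

    def tube_formants(length_m, n_formants=3):
        # resonances of a tube closed at one end and open at the other
        return [(2 * n - 1) * C / (4 * length_m)
                for n in range(1, n_formants + 1)]

    print(tube_formants(0.17))   # adult tract ~17 cm: ~504, 1513, 2522 Hz
    print(tube_formants(0.09))   # infant tract ~9 cm: roughly double

The ~17 cm case predicts formants near 500, 1500 and 2500 Hz, the classic values for the neutral (schwa) vowel; moving the tongue and lips away from the uniform tube shape shifts these resonances to produce the other vowels.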

12
Formants Determine Recognizable Phoneme
  • Some formants/resonances have more loss than
    others.
  • Most formant frequencies have low Q (self
    oscillations die out quickly, and sound like a
    thump instead of a ping)
  • The formant frequencies do not change with the
    pitch of the voice (as in singing, raised pitch
    at the end of question phrases, etc.) This is an
    important reason why we can recognize the same
    phoneme (a particular vowel, for example)
    regardless of whether it is spoken by a low,
    medium or high pitch voice, or whether it is sung
    or spoken.
  • Formant frequencies are slightly different for a
    person with a large head vs. another person with
    a small head. However, head size differences
    between the smallest (a newborn infant) and the
    largest adult differ by less than 2 to 1. (Sizes
    of the trunk, arms and legs change much more
    significantly during growth.) The difference in
    the singing pitch (frequency) of voice may range
    over 64 to 1 when comparing the lowest (basso)
    male voice and the high pitched (soprano) voice
    of small children.
  • Most voice recognition and speech recognition
    (such as speech-to-text) systems recognize
    distinct phonemes by examining the short term
    audio frequency power spectrum of the speech.
  • In engineering terms, resonant oscillations that
    die out rapidly have low Q

13
Formant Display and Analysis
  • 20th century phoneticians find formant
    frequencies by using short term audio spectrum
    analyzers. One type of visible display is the
    sound spectrogram on p. 126 of the Bellamy
    reference book.
  • This type of sound spectrogram is often called
    Visible Speech (named after A.M. Bell's 19th
    century phonetic alphabet, but quite different).
  • Sound spectrograms display audio power density by
    means of different shades of gray/black: darker
    gray indicates higher power.
  • Time is shown on the horizontal axis. Short term
    audio frequency is shown on the vertical axis.
  • Calculation of the short term audio frequency
    power may be done using wider audio sub-bands
    with very short successive time windows, or
  • Alternatively, calculation uses longer successive
    time windows but very narrow sub-bands of
    frequency.
  • The latter display indicates the individual
    fundamental and harmonic frequency components
    more clearly and separately on the display.
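A minimal short-term spectrum sketch in Python/NumPy (illustrative only; the signal and parameters are assumptions, not taken from the slides) shows the trade-off just described: short windows give fine time resolution, long windows resolve individual harmonics.

    import numpy as np

    def short_term_spectra(x, window_len, hop):
        # returns one power spectrum (in dB) per time window
        win = np.hanning(window_len)
        frames = []
        for start in range(0, len(x) - window_len, hop):
            seg = x[start:start + window_len] * win
            power = np.abs(np.fft.rfft(seg)) ** 2
            frames.append(10 * np.log10(power + 1e-12))  # avoid log(0)
        return np.array(frames)

    fs = 8000                               # telephone-band sampling rate
    t = np.arange(fs) / fs                  # 1 s test tone: 120 Hz "voicing"
    x = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 240 * t)

    wideband = short_term_spectra(x, window_len=64, hop=32)
    narrowband = short_term_spectra(x, window_len=1024, hop=256)
    # narrowband bins are fs/1024 ~ 8 Hz apart, so the 120 and 240 Hz
    # components appear as separate horizontal bands, as described above
    print(wideband.shape, narrowband.shape)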

14
Speech Production I
  • The mechanism of human speech production can be
    viewed as two processes
  • A sound source producing one or both of these
    characteristics
  • Voiced sounds originating from periodic air
    pulses flowing through the so-called vocal
    cords (vocal folds) in the glottis or larynx
    part of the throat
  • Un-voiced sounds, mostly from forcing air flow at
    high velocity through a narrow opening. Called
    fricative sounds by acoustic phoneticians;
    called turbulence by physicists and aeronautical
    engineers
  • In a rising column of smoke (as from a cigarette)
    shown in the sketch here, the straight flow
    section is called laminar flow, and the curved
    section is turbulent, indicating a random
    breakup and circulation of the straight flow
    pattern
  • Unvoiced sounds may originate at several
    different places in the mouth or throat, while
    voiced sound mainly originates in the larynx.
    (Speech coders and synthesizers often cheat and
    produce all sound sources at the equivalent of
    the glottis, even when the actual fricative sound
    source is, for example, just behind the teeth.)

(Sketch: a rising column of smoke, with a straight laminar-flow section below breaking up into turbulent flow above.)
15
Speech Production II
  • Individual sounds may start or end either
    suddenly or gradually. Sudden onset or ending of
    many sounds is due to rapid opening/closing of
    lips or the space between the tongue and roof of
    mouth (the palate or the alveolar ridge behind
    teeth) or the teeth themselves, or sometimes the
    throat or larynx.
  • Sudden onset sounds are called plosives by
    phoneticians. Examples include P, B, T, K
  • In some languages, (English, Chinese) it is
    customary to exhale a short puff of breath
    (aspiration) after opening the air flow for an
    unvoiced sound. In others (German) this does not
    happen. Therefore English speaking people
    perceive one aspect of a German accent as sound
    of G in place of K.
  • German speaker produces non-voiced sound, but
    omits the aspiration!
  • Some plosives (T, D) are produced at the teeth in
    Romance and Slavic languages, but at the alveolar
    ridge in English. English speaking people
    perceive this dental sound as one aspect of a
    foreign accent (or a New York City accent, since
    many European immigrants there preserved a dental
    t in their English speech). Traditional writing
    does not distinguish these two different
    phonemes, but IPA does.

16
Speech Production III
  • Phonemes produced in the larynx or the back of
    the mouth (pharynx) are designated by symbols
    similar to the question mark (glottal plosive)
    and a left-right reversed question mark
    (pharyngeal fricative), respectively, in IPA
    (International Phonetic Alphabet)
  • The glottal plosive occurs in most languages, but
    only the Semitic languages (modern Arabic,
    ancient Hebrew) and Hawaiian recognize and
    have a traditional symbol for it
  • The phrases a nice man and an ice man differ
    in speaking due to the glottal plosive (glottal
    stop) before the word ice. Similar stops
    between un-connected vowels in many other
    languages also utilize (but do not traditionally
    recognize in writing) the glottal stop. For
    example, the English word pre-existing or
    Spanish contra-ataque
  • Small children learn to imitate the necessary
    mouth configuration without diagrams, yet adults
    with already fixed speech habits in one language
    find this more difficult to learn and benefit
    from use of pictures of the configuration of the
    mouth and tongue.

17
Written Language
  • Who was that forgotten genius who first invented
    written language, whereby we can read the
    innermost thoughts of people far away or long
    gone from the earth? -- Galileo Galilei, ca.
    1632 (paraphrased)
  • Galileo is best known as an early physicist and
    astronomer, but he was a renaissance man with
    interests in languages and philology as well.
  • Traditional written language is the basis of much
    data communication.
  • Machine inter-conversion between spoken language
    (sound) and traditional written language,
    apparently so simple for humans, has been a
    difficult goal of many telecommunications
    engineers
  • Accurate automatic speech-to-text conversion
    could be much less costly than use of a human
    stenographer!
  • The bit rate for a voice telephone channel is
    64000 b/s. If a stenographer transcribes the same
    speech into writing and sends it via traditional
    character codes, the data rate is only about 100
    bits/second!
  • In many cases a spoken command or announcement is
    needed in a telephone connection to a human user.
    Automatic text-to-speech synthesis is desirable
    for this. Practical text-to-speech is available.
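A quick back-of-envelope check of that 64,000 b/s vs ~100 b/s comparison (the speaking rate and word length below are assumed typical values, not figures from the slides):

    words_per_min = 150        # assumed conversational speaking rate
    chars_per_word = 6         # ~5 letters plus a space
    bits_per_char = 8          # one ASCII byte per character

    text_rate = words_per_min * chars_per_word * bits_per_char / 60
    print(f"text: ~{text_rate:.0f} b/s vs PCM voice: 64000 b/s")
    print(f"ratio: ~{64000 / text_rate:.0f}x")   # ~120 b/s, roughly 500x

That roughly 500-to-1 gap is the theoretical prize that motivates automatic speech-to-text conversion.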

18
Phonetics and Traditional Writing
  • Pictographic writing was used in prehistory, and
    is known from ancient artifacts, and is used
    today in Chinese Han and Japanese Kanji symbols.
  • Aside from 2000 Kanji pictograms taught in
    Japanese schools today, other modern Japanese
    writing uses a syllable alphabet of 50 symbols,
    which has two stylistic forms called Hiragana and
    Katakana.
  • Most other languages use a phonetic or
    semi-phonetic writing method.
  • With the possible exception of Korean Hangul
    (which was apparently invented independently),
    most present day alphabetic writing systems are
    descendants of the ancient Phoenician
    proto-Semitic alphabet
  • Even though the vocabulary and grammar of
    Turkish, Russian, Arabic and English are quite
    different, they each are written using an
    alphabet that traces back historically to the
    Phoenician alphabet.
  • Phoenicians lived where modern Lebanon is today.
    They are mentioned in the Bible. They were great
    seafaring merchants and introduced writing around
    the Mediterranean and from there to parts of Asia
    and Africa.
  • For many years most European Egyptologists
    incorrectly assumed that ancient Egyptian
    hieroglyphics were pictographic. Perhaps
    pre-hieroglyphs were pictographic, but the French
    linguist J.F. Champollion deciphered the
    Rosetta stone (ca. 1820) when he realized that
    hieroglyphics are alphabetic writing, evolved
    from earlier pictograms.

19
Sound-to-symbol Correspondence
  • Except for some recently created artificial
    written alphabets (e.g., Turkish in 1928, or
    written alphabets created by 20th century
    missionaries1 for previously unwritten
    languages), and some reformed spelling rules
    issued by national academies or similar
    language purifiers, most modern languages have
    somewhat irregular spelling rules
  • Seldom is there one sound for each symbol, one
    symbol for each sound.
  • Historians believe that each traditional alphabet
    started with a one-to-one phoneme to symbol
    correspondence, and the irregularities crept into
    the scheme at a later date when sound changes
    occurred.
  • Pronunciation changes over the years, but written
    forms still represent the old pronunciation
  • We know from historical information and the
    languages of Scotland and Holland (the
    Netherlands) that the English word light (and
    other GH spellings which are silent in modern
    English) was pronounced in Shakespeare's 17th
    century (when English spelling became frozen)
    like the modern German word Licht /lIxt/, which
    has the same meaning.
  • 1 The Summer Institute of Linguistics in Dallas
    is a modern center of linguistic research for
    obscure languages which have no significant
    commercial use, motivated by a desire to
    communicate religious beliefs to speakers of
    these languages.

20
Spelling Irregularities: Other Causes
  • Many languages have regional dialects with
    different sounds for the same word (or different
    regional words for the same thing1)
  • When regional dialects become very different over
    a long time, they may each become recognized as a
    separate language
  • The modern romance languages (French, Spanish,
    Italian, Romanian, Portuguese, etc.) are all
    divergent dialects of Latin (following centuries
    of changes). We know this from historical
    documents.
  • A separate language is a dialect whose speakers
    have their own army and navy. - anonymous
  • For better communication (TV,radio and movies)
    some de facto standard dialect is often in use
    mid-western US English, Parisian French, Egyptian
    Arabic for movies but standard modern Arabic (an
    artificial semi-classical Arabic dialect) for
    newspaper writing.
  • 1 Examples: A paper bag (so-called in most of
    the USA) is called a paper sack in the south
    and southeast. In the north, the word sack is
    used only for a cloth (not paper) bag. The word
    insurance is accented on the first syllable in
    the south, but on the second syllable elsewhere.
    In British pronunciation, written syllables
    ending -er or -ar are mostly pronounced without
    the /r/. Father is /faðə/ in the Queen's
    English, but /faðɚ/ in North American English. A
    guagua is a motor bus vehicle in Cuban Spanish,
    but is a baby in other Latin American Spanish
    dialects.

21
New Languages Written With Old Alphabets
  • When necessary to write down the words and sounds
    of an unfamiliar language, most people naturally
    do this using the alphabet that they already
    know.
  • Repeatedly in history, when one group already
    having a written alphabet meets another group
    with a distinct language but no alphabet, the
    alphabet of the first group is used to write the
    sounds of the second groups language.
  • Many examples listed on the following slide.
  • If the people in the first group know multiple
    alphabets from different languages, they may use
    bits and pieces of all these previous alphabets
    to write the new language! (As Saints Cyril and
    Methodius did to first write Russian using a
    combination of Latin, Greek and Hebrew
    characters.)

22
Alphabet Borrowing
  • When the written alphabet of one language is
    applied to writing a second language which
    contains sounds not present in the first
    language, the writers either
  • make up a new symbol...
  • Two Eastern Orthodox monks (later Saints), Cyril
    (also called Kiril) and Methodius, invented
    the Slavonic Russian alphabet about the year
    863. To get enough unique characters, they used
    symbols taken from Greek, Latin and Hebrew.
    Russian writing is still very phonetic despite
    later sound changes. Czar Peter (the Great)
    European-ized the appearance of these Old
    Church Slavonic letters, producing the modern
    Russian alphabet. Some Greek-root letters were
    eliminated in 1918.
  • Many other Slavic language family alphabets are
    based on Latin (Czech, Polish, etc.) as the
    result of a historical religious division of
    Europe between the Eastern Orthodox and Roman
    Catholic Christian churches.
  • ...or find a new use for an old symbol
  • use a symbol which represents a sound not present
    (or not recognized) in the second language but
    explicitly represented in the first language.
    This symbol is no longer needed for its original
    purpose.
  • The alif (first letter) of the Phoenician
    alphabet represented a glottal stop consonant. In
    Greek this sound was not recognized. Alif was
    then used for the Greek vowel sound A. The
    Phoenicians did not write most vowels explicitly
    (like modern shorthand) because their language
    had few word pairs which had a phonemic vowel
    difference.
  • These inconsistencies between different
    descendant forms of Phoenician writing thus arise
    partly from the existence of different phonemes
    in different languages

23
Alphabet Notes
  • Many languages that are not related by
    vocabulary, grammar or history nevertheless use
    an alphabet derived from Phoenician!
  • Phoenician writing was adapted to Arabic, Hebrew,
    Greek and via Greek to Latin, and from Latin to
    most European languages
  • Semitic languages followed the original
    Phoenician method of writing only the consonants
    explicitly.
  • Greek and Latin adapted or invented explicit
    vowel symbols.
  • Modern Arabic and Hebrew use various dots
    surrounding the consonants for vowel symbols (and
    to make some consonant distinctions in Arabic).
    Vowel markings are used in children's or
    religious documents and omitted in newspapers or
    general printing.
  • The right-to-left direction of writing Phoenician
    evolved first into alternate lines in both
    directions and finally into left-to-right for
    Greek and Latin.
  • Arabic is studied for religious reasons where the
    Islamic religion is prevalent. Arabic characters
    are used to write the non-related languages such
    as Urdu (and Turkish, until the 1920s when Latin
    characters were adopted, based primarily on
    German spelling rules).
  • Due to the Roman empire, most European languages
    are written using the Latin alphabet, with the
    addition of new letters such as J, U and W.
    Specific European languages also added more new
    letters (æ, ð, ø, etc.) and diacritical marks (é,
    ö, ç, etc.)
  • V was used for both the consonant V and the
    modern vowel U by ancient Romans.

24
Character Codes
  • The first machine character codes were the
    Baudot-Murray1 teletypewriter systems (now
    labeled ITU-T alphabet code No. 2)
  • These are 5-bit binary codes used with early
    teletypewriter machines and also designed to be
    readable by ear for human telegraphers at low bit
    rates.
  • These codes do not have the binary numerical
    codes in the order corresponding to alphabetic
    sequence (collating sequence), which is a problem
    for many data processing operations.
  • ITU No.2 code survives today mostly via
    teletypewriters for the deaf (TTY/TDD).
    Historically used in TELEX networks. TTY/TDD
    devices use a special modem via telephone lines.
    Dialing 711 connects to a US state TRS service
    center, where human communications assistants
    connect with the destination and translate calls
    (voice-TTY) for deaf people.
  • Most modern data processing systems use an 8-bit
    (or 7-bit with 1 parity check bit) code related
    to ASCII (American Standard Code for Information
    Interchange). First historical application of
    ASCII was TWX networks.
  • ISO-646 and ITU No. 5 code and ISO-8859-1 code
    are derived from and are similar to ASCII except
    for punctuation marks and additional characters.
  • Alphabetic ordering, sorting, and data look-up
    are straightforward (see the sketch at the end of
    this slide).
  • Several proprietary alphabetic codes such as IBM
    EBCDIC, used in mainframe computers, have faded
    into minor use due to the growth in popularity of
    personal computers, which all use ASCII or some
    variant
  • 1 Named for Emile Baudot (19th c. French inventor
    of an early teleprinter machine) and Donald
    Murray (20th c. inventor of an improved
    teletypewriter)
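A small Python sketch (illustrative, not from the slides) of why collating sequence matters: because ASCII codes run in alphabetic order, plain numeric comparison of the codes sorts text correctly, which the scrambled ITA2 bit assignments could not do.

    words = ["modem", "baudot", "ascii", "telex"]
    print(sorted(words))               # ['ascii', 'baudot', 'modem', 'telex']
    print([ord(c) for c in "ABC abc"]) # [65, 66, 67, 32, 97, 98, 99]
    # Upper and lower case occupy separate runs (65-90 and 97-122) exactly
    # 32 apart, so case-insensitive ordering needs only a trivial key:
    print(sorted(words, key=str.upper))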

25
ASCII codes 0-127 and Extensions 128-255
[Character code table not reproduced in this transcript.]

Note other character assignments are used for
codes 128-255 in addition to those shown here.
Some characters do not display here due to
limitations in PowerPoint character set.
26
Explanation of ASCII Codes 0-31
  000 (nul) Null               016 (dle) Data link escape
  001 (soh) Start of heading   017 (dc1) Device control 1
  002 (stx) Start of text      018 (dc2) Device control 2
  003 (etx) End of text        019 (dc3) Device control 3
  004 (eot) End of transmit    020 (dc4) Device control 4
  005 (enq) Enquiry            021 (nak) Neg. acknowledge
  006 (ack) Acknowledge        022 (syn) Synchronous idle
  007 (bel) Bell               023 (etb) End trans. block
  008 (bs)  Back space         024 (can) Cancel
  009 (tab) Hor. tab           025 (em)  End medium
  010 (lf)  Line feed          026 (eof) End of file (sub.)
  011 (vt)  Vert. tab          027 (esc) Escape
  012 (np)  Form feed          028 (fs)  File separator
  013 (cr)  Carriage return    029 (gs)  Group separator
  014 (so)  Shift out          030 (rs)  Record separator
  015 (si)  Shift in           031 (us)  Unit separator
  032 Blank space

These character code meanings were used
historically for character-oriented data
communications in the 1960s. Many computer
operating systems and file systems do not use all
these meanings today, but assign proprietary
meanings to many codes. For example, in Internet
browsers, 004 (Control-D) indicates save
this URL in my directory list, a new meaning
not intended by the 1960s designers of the ASCII
codes. When the simple parity-check error-detection
code is used, only the ASCII codes from 0-127 are
used (see the sketch below). When parity-check
methods (explained in a later lecture) are not
used, all codes up to 255 are used for characters.
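A minimal even-parity sketch in Python (an illustration, not the lecture's own example): the 8th bit is set so every byte carries an even number of 1s; a single flipped bit is detected but cannot be located, and an even number of flips slips through.

    def add_even_parity(byte7):
        ones = bin(byte7).count("1")
        return byte7 | (0x80 if ones % 2 else 0)   # parity in the top bit

    def parity_ok(byte8):
        return bin(byte8).count("1") % 2 == 0

    coded = add_even_parity(ord("A"))   # 'A' = 65 = 0b1000001 (two 1s)
    print(parity_ok(coded))             # True:  no error
    print(parity_ok(coded ^ 0x08))      # False: one bit error detected
    print(parity_ok(coded ^ 0x18))      # True:  two bit errors undetected!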
27
Syllabic Alphabets
  • Japanese writing is the best known example of a
    syllabic alphabet, (called a syllabary). In
    schools, Japanese students learn 2000 Kanji
    pictograms, borrowed from Chinese Han characters
    but mostly pronounced with different sounds. All
    other words are written using 50 syllabic
    characters. The syllables ka, ke, ki, ko and ku
    each have a distinct symbol, for example. The
    non-vowel syllable n also has a symbol.
  • There are 50 Japanese syllable symbols. There
    are two styles of writing syllable characters
  • Hiragana, used for most ordinary Japanese writing
    or printing
  • Katakana, used somewhat like italic writing in
    English -- that is, for foreign words and for
    special emphasis.
  • There are two recognized English-character
    transliteration methods: Kunrei-shiki romaji (the
    official government method) and Hepburn (the
    older, historical method); they are almost the
    same.
  • Arabic and Hebrew traditionally have characters
    only for the consonants. (Both have a consonant
    transliterated as w that sometimes is
    pronounced u and is thus used as a vowel. )
  • Thus, most printed material effectively has one
    character per syllable in many cases. Arabic and
    Hebrew have a special symbol to indicate that
    there is no vowel in a syllable. In Arabic the
    no-vowel symbol is named sukun, a small circle
    above the consonant. In Hebrew, the symbol is
    named shva, a symbol like an English colon
    used below the consonant. (Shva is not silent in
    the first syllable of a modern Hebrew word, but
    instead has a central neutral vowel sound.)
  • Vowel marks are optional and used only for
    religious texts or beginners documents.
  • You must know the vocabulary well to read such
    defective (no vowel) writing correctly.
  • Certain types of English shorthand writing
    (for example, Speedwriting) omit vowels. You must
    know the vocabulary well to read these abbreviations.

28
Hiragana Syllables
[Chart of the hiragana syllabary not reproduced here.]
This alphabetic order is used in dictionaries
for foreigners. Supplementary marks (not shown)
are used to indicate doubling of consecutive
letters, palatalization (a y sound within the
syllable), and voicing (which makes ka read as ga,
sa read as za, etc.). The transliteration shown
here is the Hepburn system. In Kunrei-shiki
romaji, si is used instead of shi, ti instead of
chi, tu instead of tsu, and hu instead of fu. Hu
is the bilabial fricative sound in the IPA chart.
The sounds of the syllables are close to those
represented to unsophisticated English speakers
by the Hepburn spellings.
29
Arabic Alphabet
Chart shows only basic consonant letters. Vowels
(not shown) are indicated by means of dots and
other marks below or above the character. Note
that several characters have distinctive shapes
when used in an initial, medial or final
position, in order to provide links to the
surrounding letters, as in English cursive handwriting.
30
Hebrew Alphabet
< Alphabetic order begins here. Read
right-to-left.
Chart shows only basic consonant letters. Vowels
(not shown) are indicated by means of dots and
other marks below, above or to the side of the
character. Character forms identified on this
chart as left kap, left mem, left nun, left peh,
left sadeh are final character writing styles
used only at the end of a word. In documents for
beginners, sin is distinguished from shin by a
dot in the upper left corner vs. the upper right
corner, respectively. Number values for characters
are only used in ancient Hebrew. Modern Hebrew
and modern Arabic use Arabic numerals as used in
English.
31
Changing Alphabets
  • Chinese is still written primarily in the Han
    pictographic form, although many characters are
    now simplified. For writing with the English
    (Latin) alphabet, the standard sound of each word
    is the Putonghua (common language, or Mandarin)
    pronunciation, based on the Beijing dialect and
    taught in schools throughout China.
  • Most regional dialects have very different
    sounds for each word.
  • Pinyin (phonetic spelling) was standardized in
    1977, based on alphabetic transliteration rules
    developed in China in the 1950s. The name of
    the Chinese capital city is written Beijing in
    pinyin, but was variously called Peiping, Pekin,
    Peking in previous non-standard alphabetic
    versions of Chinese.
  • Several computer software programs exist to allow
    a typist to input Chinese on an English character
    keyboard in pinyin spelling, and produce Han
    pictogram results. Similarly, some computers
    allow Japanese to be typed in Romaji spelling and
    produce Kanji pictograms, Hiragana and Katakana
    in traditional forms.
  • Azeri, the language of Azerbaijan, was written in
    Arabic characters from approximately 900 to 1924.
    Like the related Turkish language, Azeri was then
    written using Latin characters. In 1939, while
    Azerbaijan was part of the U.S.S.R., the Soviet
    government required the use of the Cyrillic
    alphabet. On August 1, 2001, the Azerbaijan
    government officially changed back to the Latin
    alphabet. Public signs and newspapers were
    printed with both alphabets during the transition.

32
Expansions of ASCII
  • The existence of many essentially different
    alphabets for natural languages has been
    addressed differently in various historical
    telecommunications and data processing systems
  • Chinese Han pictographs are tabulated in a
    Chinese Telegraphic Code book, with a 5-digit
    decimal number for each pictograph. Invented for
    telegrams in the 19th century, this still has
    some limited uses today.
  • As previously described, a standard Roman
    letter phonetic transliteration (pinyin)
    dictionary has been established for the
    Putonghua-Mandarin spoken language of China.
    Romanized text can be stored and communicated
    like other alphabetic languages.
  • A standard romanization system (romaji) exists
    for Japanese as well.
  • Pictographic writing systems have no obvious
    alphabetic order
  • Chinese Han and Japanese Kanji pictographs are
    placed in dictionaries according to the number of
    pen strokes used to draw them, and the stroke
    placement (lower left first, then upper left,
    then upper right, lower right)

33
Code Pages and Unicode
  • In most word processing and similar computer
    systems, 8-bit code pages of 256 characters each
    are defined.
  • Decimal code values 0-31 are non-printing control
    characters. In teletypewriters, these are used
    for carriage return, horizontal tab, line feed,
    etc.
  • Code values 32-255 are used for printing
    characters. You can view this table in many word
    processing systems from the Insert > Symbol
    pull-down menu.
  • Different code pages (fonts) can be substituted
    in different parts of a document for different
    appearance or to handle different language
    alphabets.
  • ISO-8859 is a standard with numerous code pages
    for various languages (ISO-8859-1 is so-called
    Latin-1, for Western European languages;
    ISO-8859-6 is Arabic; etc.)
  • Unicode 4.0 (ISO-10646:2003) is a universal code
    page set. It has 96,248 printable characters,
    each represented internally by a 32-bit binary
    number, and includes most of ISO-8859 (with 24
    binary zeros preceding each 8-bit code) as well
    as approximately 18,000 Chinese Han pictographs
    and many other languages
  • In theory, one can write a multilingual document
    in Unicode with no confusion. Portions of the
    overall range of Unicode may be converted into
    specific pre-existing limited code pages. Strong
    support by Apple and Microsoft (in Windows NT and
    Windows 2000, etc.) has made Unicode widely
    available.
  • For more information on Unicode and how it can be
    incorporated into data processing and
    communication systems, see Internet URL
    http://www.unicode.org
  • More unofficial information at http://czyborra.com
    /charsets/iso646.html and http://www.wps.com/texts
    /codes/
  • Historically Unicode 3.0 used a 16 bit code and
    described only 65536 characters.
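A small Python sketch of code pages vs Unicode in practice (illustrative; the sample string is an invented example). Python 3 strings are Unicode; .encode() maps them onto a byte representation or a legacy code page:

    text = "naïve Ω 漢"         # Latin-1 accent, Greek letter, Han pictograph

    print([hex(b) for b in text.encode("utf-8")])   # variable-length bytes
    print([hex(ord(c)) for c in text])              # Unicode code points

    try:                        # a single-language code page cannot hold it
        text.encode("iso-8859-1")
    except UnicodeEncodeError as e:
        print("code page failure:", e)

    # ISO-8859-1 really is the first 256 code points, as the "24 binary
    # zeros preceding each 8-bit code" remark above implies:
    print(ord("é") == "é".encode("iso-8859-1")[0])  # True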

34
Text-Speech Inter-conversion
  • Telephones are widely available, but most
    telephones do not have a data terminal attached.
  • In many situations (airline reservations,
    inquiries about bank accounts, etc.) a human being
    (employee) is interposed between a caller and a
    data processing system for the sole purpose of
    converting between speech and text, since the
    caller cannot directly produce text or read the
    text appearing from the data processing system.
  • There has been interest in automatic systems in
    recent decades to convert
  • Text to speech: reading machine (e.g.,
    Kurzweil reader for the blind); voice output for
    telephone inquiries; aid, instructions, or
    warnings to an aircraft pilot, repair technician,
    surgeon, or other person whose hands and eyes are
    occupied while working.
  • Uses text-to-phoneme/allophone (spelling/reading
    rules) algorithm, followed by allophone-to-sound
    algorithm (including syllable stress), and sound
    generator.
  • MITalk system is one of the best documented.
  • Speech to text: automatic stenographer; input
    part of a database inquiry system; etc.
  • Some recent systems (Dragon, IBM) are partly
    successful after extensive training for use by
    one speaker. When that speaker has a head cold,
    there are word recognition errors!

35
Text-to-Speech Systems
  • First, process words on a list of exceptions to
    all the rules
  • Foreign words frequently used in English: Des
    Moines, Iowa; chic; zeitgeist; voilà, ...
  • Frequently used unruly native words: two, iron,
    Penelope, ...
  • Then use category and context rules. Examples:
  • All Germanic-language root words uniformly
    pronounce hard g, c regardless of succeeding
    vowel. Romance-language root words change sound
    of g, c before i, e vowels to a different
    (so-called soft) form.
  • Vowel o between consonants in the first syllable
    is often pronounced /a/ (bother, position,
    shower, ...) except for a list of exceptions
    (both, home, show, ...)
  • Then use text-to-phoneme rules for remaining
    words
  • Example: -ie and -ey after a consonant at the end
    of a word are pronounced /i/
  • Produce correct allophones, with stress and
    pauses due to word boundaries, commas, periods,
    etc.
  • Convert properly formed allophone string to sound
  • pre-recorded or synthesized allophone sound
  • Works rather well. Many applications in data
    retrieval systems with touch-tone input and voice
    output.
  • Some of these systems with limited vocabulary use
    the older technology of pre-recorded phrases
    rather than phoneme/allophone level synthesis
    (a toy sketch of the rule pipeline follows).
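A toy Python sketch of the exceptions-first, then-rules pipeline above (purely illustrative: the word lists, rules and phoneme spellings are invented here, and real systems such as MITalk use hundreds of rules):

    EXCEPTIONS = {"two": "/tu/", "iron": "/aIrn/"}   # step 1: unruly words

    RULES = [                   # step 3: (word-final spelling, phoneme)
        ("ie", "i"),            # e.g. "genie"
        ("ey", "i"),            # e.g. "money"
    ]

    def to_phonemes(word):
        word = word.lower()
        if word in EXCEPTIONS:               # exception dictionary first
            return EXCEPTIONS[word]
        for ending, phoneme in RULES:        # then spelling-to-sound rules
            if word.endswith(ending):
                return "/" + word[:-len(ending)] + phoneme + "/"
        return "/" + word + "/"              # fallback: letter-by-letter

    for w in ["two", "money", "genie", "cat"]:
        print(w, "->", to_phonemes(w))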

36
Speech-to-Text Limitations
  • Two capabilities which are difficult to achieve
    in the same system in the present state of the
    art
  • Large vocabulary of spoken connected natural
    speech
  • Speaker-independent accuracy. Some systems turn
    this disadvantage around by marketing their
    system as a speaker-authenticating system!
  • Large vocabulary and connected speech for a
    single speaker is on the brink of practicality
    (next 5 to 10 years).
  • Several personal computer software packages allow
    85-95% accurate dictation from only one speaker
    after extensive training.
  • For large speaker population, very small
    vocabulary and isolated spoken words, recognition
    is practical
  • Nortel's and TI's recognition systems support low
    cost collect telephone service (e.g., MCI's
    1-800-COLLECT) by recognizing only the words yes
    or no spoken by the recipient.
  • Spoken digit dialing has been a long-time
    telephone research goal, and is available on some
    cellular systems. Rather sensitive to background
    noise, etc. Some of these services have been
    discontinued due to lack of sufficient accuracy
    or lack of subscriber satisfaction/demand.

37
Speech Recognition Problems
  • Phonemes can sometimes be recognized by the
    frequencies and pattern of change of the formants
    (see p.126 in Bellamy).
  • One source of error is the difference in relative
    time duration of each phoneme or syllable with
    different speakers, different occurrences.
  • Time warping software can stretch/shrink all
    syllable/phoneme data to equal time intervals
    (see the sketch below)
  • This improves large-population accuracy, but
    removes the clues of syllable stress, and
    prevents distinguishing homonyms such as
    'PERmit (noun) vs. per'MIT (verb), which differ
    only in syllable stress patterns.
  • Syllable stress involves a combination of changes
    in syllable time duration, pitch, and loudness.
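A minimal dynamic time warping (DTW) sketch in Python (illustrative; real recognizers align spectral feature vectors, not raw numbers). It shows exactly the trade-off noted above: after alignment, duration differences no longer contribute to the distance.

    def dtw_distance(a, b):
        INF = float("inf")
        n, m = len(a), len(b)
        d = [[INF] * (m + 1) for _ in range(n + 1)]
        d[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                d[i][j] = cost + min(d[i - 1][j],      # stretch a
                                     d[i][j - 1],      # stretch b
                                     d[i - 1][j - 1])  # match
        return d[n][m]

    slow = [1, 1, 2, 2, 3, 3]      # same "syllable" spoken twice as slowly
    fast = [1, 2, 3]
    print(dtw_distance(slow, fast))   # 0.0: the duration cue is erased
    print(sum(abs(x - y) for x, y in zip(slow, fast)))  # naive: nonzero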

38
Language Limitations in the Telephone Market
  • Telephone traffic between two population centers
    which speak the same language is generally
    proportional to the product of the two
    populations, and (on average) inversely
    proportional to the square of the distance
    between the centers (see the sketch at the end of
    this slide)
  • Similar to Newton's law of gravitation!
  • Lack of a common language reduces this traffic in
    proportion to the smaller number of people at
    each end who speak the language of the other end
  • A low-cost mechanism which overcomes language
    incompatibility could greatly increase this
    traffic
  • Or widespread knowledge of a universal auxiliary
    language like English as a language of business
    (the rôle planned for the Esperanto artificial
    language)
  • Telex (teletypewriter exchange), a switched
    teleprinter network, now mostly superseded by
    e-mail, was popular in parts of the world where
    language differences preclude direct voice
    conversations.
  • You can take a text printout to your desk and
    translate it with a multi-lingual dictionary.
  • In the North American market, two competing
    services, Western Union Telex and AT&T TWX,
    merged when WU purchased TWX. The entire US and
    Canada user population never exceeded 100,000
  • Internet E-mail has almost totally replaced Telex
    services world wide
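A minimal Python sketch of the gravity model from the top of this slide (all numbers are made-up illustrations; the constant k and the language fraction are assumptions):

    def traffic(pop1, pop2, distance_km, shared_lang_fraction=1.0, k=1e-6):
        # traffic ~ k * P1 * P2 / d^2, scaled by the fraction of people
        # at the two ends who can converse in a common language
        return k * pop1 * pop2 / distance_km**2 * shared_lang_fraction

    same_lang = traffic(5e6, 8e6, 1000)         # two same-language cities
    diff_lang = traffic(5e6, 8e6, 1000, 0.05)   # only 5% share a language
    print(f"{same_lang:.1f} vs {diff_lang:.1f} (arbitrary traffic units)")
    # 40.0 vs 2.0: that 20x gap is the market a low-cost translation
    # mechanism could unlock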

39
Automatic Voice Translation System


[Block diagram, reconstructed as a text pipeline:]

Voice Waveform, Source Language: "hello"
  -> Voice-to-Phonetic-Text (speech-to-text) -> Phonetic Symbols (e.g. IPA): /hɛlo/
  -> Phonetic-to-Traditional Text -> Traditional Writing, Source Language: hello
  -> Source-to-Object Language Translate -> Traditional Writing, Target Language: kon'nichiwa
  -> Traditional-to-Phonetic Text -> Phonetic Symbols (e.g. IPA): /koɲitɕiwa/
  -> Phonetic-Text-to-Voice (text-to-speech)
  -> Voice Waveform, Target Language: "Kon'nichi wa"

Color of box indicates difficulty level:
Pink = difficult to do!
Blue = adequate software exists
Orange = already works OK
Note: an alternative to the three middle boxes is a
direct phonetic-source to phonetic-target
translator.
40
Ultimately Automatic Translation?
  • Science fiction writers have described automatic
    machines for spoken natural language translation
    for decades.
  • This would require
  • Speech content recognition in the speaker's
    language
  • Still a difficult problem, but more manageable
    for a single speaker with a trained system
  • Translation from one written language
    representation to another
  • Text translation for a limited number of simple
    sentence or phrase structures is fairly practical
    today. Many software packages are available. Don't
    expect good translations of poetry!
  • Automatic text translation is already popular for
    some e-mail systems (without the speech parts!)
  • Text-to-speech conversion in the target language
  • The most fully developed part of the system.

41
Language Line Services, Inc.
  • Founded by AT&T, and now a separate business, it
    has pursued a business development with two aspects
  • Support of automatic real-time spoken language
    translation systems research, which could reach a
    practical stage for English-Japanese within a few
    years (jointly sponsored with some Japanese
    firms). This research has been cut back in recent
    years due to economic slowdown.
  • An existing service using human
    translators/interpreters in 3-way telephone
    conference calls between 2 people who do not
    speak the same language
  • The translators (who contractually agree to
    confidentiality in all their work) are mostly in
    Monterey, CA, which has a large and varied
    foreign language speaker population due to the US
    Defense Dept. language training center located
    there.
  • The major activities of language line are
    (surprisingly) domestic rather than
    international. Typical call is from a hospital
    emergency room. A patient who speaks an unknown
    language (Gujarati?) cannot describe the symptoms
    of his/her illness or injury.
  • First step is to identify the language spoken by
    this person. Second step is to conference in an
    interpreter skilled in that particular language.
  • See www.languageline.com for more information.
    Berlitz and other translation firms have similar
    services as well.

42
Human Ear and Hearing
  • The ears each contain a sealed coiled tube called
    the cochlea, filled with a fluid. A flexible
    membrane at one end is coupled to the eardrum by
    two small bones with flexible cartilage joints.
    The cochlea is internally divided by a
    longitudinal membrane
  • Sound waves in the air vibrate the eardrum and,
    through the linkage bones, cause sound waves to
    move through the fluid in the cochlea
  • The dividing membrane is permeated with many
    nerve endings, terminating on small hair cells
    (organs of Corti) which are moved by the sound
    vibrations
  • (The name cochlea comes from the Latin word for
    snail, which the organ resembles.)

43
Human Ear Mechanism
  • The human ear is known to contain many nerve
    endings in the cochlea. Various frequency
    components of a sound produce peak audio
    frequency standing wave amplitude at different
    locations in the cochlea. Nerve endings at or
    near each such location are therefore sensitive
    to particular frequency components of the audio
    waveform.
  • The nerves appear to respond to signal power by
    producing more electrical nerve pulsations per
    second in response to higher audio signal power
    in the vicinity of that nerve.
  • The electrical waveforms in the nerves are not a
    replica of the analog sound waveform.

44
Ear and Hearing
  • The nerves from the ear to the brain appear to
    transmit an audio power spectrum analysis of the
    sound in the ear. More frequent nerve impulses
    from certain nerves indicate portions of the
    cochlea, and thus portions of the audio frequency
    spectrum, that have high audio spectrum power.
  • This is the accepted explanation of why the ear
    is not sensitive to phase shifts or delays in
    various frequency components of a complicated
    waveform
  • The ear is most sensitive to tones near 1 kHz
    audio frequency (disclosed by Fletcher-Munson
    tests in the 1930s)
  • More audio power is required at higher or lower
    frequencies to produce the same perception of
    loudness
  • Deterioration of hearing clarity is often related
    to further decrease in high and low frequency
    sensitivity, producing muffled speech
  • Exposure to extremely loud sounds can damage
    Corti (hair) cells, thus producing nerve
    deafness. Once an occupational hazard of boiler
    makers in loud factory workplaces, today nerve
    deafness is often experienced by rock band
    musicians.

45
Cochlear Implants
  • Essential deafness due to congenital conditions
    or to injuries to the Corti/hair cells can be
    partly alleviated with recently developed
    cochlear implants
  • An external electronic filtering device performs
    continual spectrum analysis of sound from a
    microphone. Signals from this analyzer, after
    coupling (via a transformer coil) to an implanted
    pickup coil under the skin near the ear, cause
    surgically implanted electrodes in the cochlea to
    electrically stimulate nerve endings at
    approximately the location estimated to respond
    to the specified frequency component of the
    signal. This produces a sensation of sound even
    in people who have always been deaf.
  • The sound is sometimes indistinct and muffled, as
    reported by patients who become deaf after having
    normal hearing. However, it is beneficial in
    conjunction with lip reading to aid in
    understanding, and subjective sound quality has
    improved year by year as new signal processing
    methods come into use.
  • There is some controversy about the cochlear
    implant. Some deaf people consider it too
    complex, expensive, and risky in relation to the
    relatively small improvement in overall ability
    to hear.
  • The documentary motion picture Sound and Fury,
    released in fall 2000, has an accompanying
    Internet web page http://www.pbs.org/soundandfury
    that gives more information about cochlear
    implants, both pro and con

46
Audio Signal Processes
  • Telephone Conference Bridges
  • Combining speech from 3 or more lines into one
    conversation, without adding the background noise
    from all inputs, is not simple
  • Exploitation of Silent Intervals in Speech
  • Undersea cables in the 1960s made use of time
    assignment speech interpolation (TASI); modern
    satellite circuits have a similar process called
    Digital Speech Interpolation (DSI); and GSM and
    IS-95 CDMA cellular/PCS radio use a similar
    feature to reduce average radio interference,
    conserve battery power and increase the number of
    conversations occurring in a radio cell
  • In GSM systems, mobile handset transmit power is
    off during intervals of voice channel silence.
  • Because normal speech is about 40-60% silence
    (between syllables, phrases), a channel can be
    re-assigned to other speakers as required, thus
    carrying more traffic (a toy silence detector is
    sketched after this list).
  • Echo Cancellation
  • Particularly in telephone circuits with long time
    delay (e.g. satellite links) echoes are
    particularly disturbing. Very short-time echoes
    are perceived as simultaneous side tone.
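A toy energy-threshold silence detector in Python/NumPy (a bare-bones stand-in for the voice-activity decision behind TASI/DSI and GSM discontinuous transmission; the frame length and threshold are assumed values, and real systems are far more robust):

    import numpy as np

    def active_frames(x, frame_len=160, threshold=0.01):
        # mark each 20 ms frame (at 8 kHz) as speech or silence by energy
        n = len(x) // frame_len
        frames = x[:n * frame_len].reshape(n, frame_len)
        energy = (frames ** 2).mean(axis=1)
        return energy > threshold        # True = transmit, False = suppress

    rng = np.random.default_rng(0)
    talk = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)  # 1 s talk spurt
    hush = 0.001 * rng.standard_normal(8000)                 # 1 s line noise
    signal = np.concatenate([talk, hush, talk])

    mask = active_frames(signal)
    print(f"channel needed {mask.mean():.0%} of the time")   # ~67% here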

47
3-way Conference Calls
  • To establish multi-line conference calls, a
    device is needed to add signals from all relevant
    speakers
  • In a digital telephone switch, a three-port
    conference bridge is switched into the
    conversation by one of the participants who uses
    the telephone dial and cradle switch to add a 3rd
    party.
  • The audio PCM samples must be converted from
    Mu-law code into uniform binary code with 12 to
    16 bits resolution, then added, then the sum is
    converted back to Mu-law (see the sketch after
    this list).
  • This process can also be done for two inputs via
    a ROM look-up table which implements in one step
    the equivalent of mu-law > linear > add two
    inputs > reconvert to Mu-law value.
  • On the line returning to each speaker, his/her
    own audio signal component is almost completely
    subtracted back out (not so for non-speakers) to
    avoid high artificial side tone.
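A minimal Python sketch of that bridge arithmetic (illustrative: the continuous mu-law formulas are used for clarity, while real G.711 bridges use the segmented 8-bit code and ROM look-up tables):

    import math

    MU = 255.0

    def mulaw_compress(x):     # linear x in [-1, 1] -> companded value
        return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

    def mulaw_expand(y):       # inverse of mulaw_compress
        return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

    def bridge(legs_mulaw):
        # expand each leg to linear, add, clip, and re-compress
        total = sum(mulaw_expand(s) for s in legs_mulaw)
        total = max(-1.0, min(1.0, total))    # clip rather than wrap
        return mulaw_compress(total)

    a, b, c = (mulaw_compress(v) for v in (0.20, -0.05, 0.10))
    print(bridge([a, b, c]))    # correct: sum performed in the linear domain
    print(a + b + c)            # wrong: companded values must not be added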

48
Many-line Conference Bridge
  • Avoiding background noise accumulation is a
    serious problem