Title: Application Areas
1. Application Areas
- For two lectures, we will examine a number of application areas of AI research
- Visual image understanding
- Speech recognition
- Natural language processing
- mostly we'll look at NL understanding
- but we will briefly talk about NL generation and machine translation
- Search engine technology
- Next time, we will cover the following topics and possibly others (specific topics to be determined)
- Homeland security
- Sensor interpretation
- Robotic vehicles
- AI in space
2. Perception Problems
- Vision understanding, natural language processing and speech recognition all have several things in common
- Each problem is so complex that it must be solved through a series of mappings from subproblem to subproblem
- Each problem requires a great deal of knowledge that is not necessarily available or well understood, such that successful applications often utilize non-knowledge-based mechanisms
- Each problem contains some degree of uncertainty, often implemented using HMMs or neural networks
- Early approaches in AI were symbolic and often suffered for several reasons
- Poor run-time performance (because of the sheer amount of knowledge needed and the slow processors of the time)
- Models that were based on our incomplete knowledge of human (or animal) vision, auditory abilities and language
- Lack of learning, so that knowledge acquisition was essential
- Research into all three areas has progressed, but slowly
3. Vision Understanding Mapping
- The vision problem is shown to the left as a series of mappings
- Low level processing and filtering to convert from analog to digital
- Pixels → edges/lines
- Lines → regions
- Regions → surfaces
- Add texture, shading, contours
- Surfaces → objects
- Classify objects
- Analyze scene (if necessary)
4. A Few Details
- Computer vision has been studied for decades
- There is no single solution to the problem
- Each of the mappings has its own solution
- often mathematical, such as the edge detection and the mapping from edges to regions
- often applies constraint satisfaction algorithms to reduce the amount of search or computation required for low level processes
- The intelligence part really comes in toward the end of the process
- Object classification
- Scene analysis
- Surface and object disambiguation (determining which object a particular surface belongs to, dealing with optical illusions)
- Computer vision is practically an entire CS discipline and so is beyond the scope of what we can cover here (sadly)
5. Edge Detection
- Waltz created an algorithm for edge detection by
- finding junction points (intersections of lines)
- determining the orientation of the lines into the junction points
- applying constraint satisfaction to select which lines belong to which surfaces (a small constraint-propagation sketch follows below)
- Below, convex edges are denoted with + and concave edges with -
Other approaches may be necessary for curve, contour or blob detection and analysis. Often, these approaches use such mathematical models as eigen-models (eigenform, eigenface), quadratics or superquadratics, distance measures, closest point computations, etc.
For trihedral junction points (intersections of 3 lines), these are the 18 legal connections
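As a rough, simplified illustration of the constraint-propagation idea (not Waltz's actual junction tables), the sketch below keeps a set of candidate labelings per junction and repeatedly discards labelings that no neighboring junction can agree with on a shared edge. The data layout and the tiny example labels are assumptions for illustration.

    # Simplified Waltz-style filtering: a labeling maps each incident edge to
    # '+' (convex), '-' (concave) or '>' (occluding); a real system draws its
    # candidates from the 18 legal trihedral junction labelings.
    def waltz_filter(candidates, shared_edges):
        """candidates: {junction: list of labelings (dict edge -> label)}
        shared_edges: iterable of (junction_a, junction_b, edge) they share."""
        changed = True
        while changed:
            changed = False
            for a, b, edge in shared_edges:
                for j, other in ((a, b), (b, a)):
                    keep = [lab for lab in candidates[j]
                            if any(ol[edge] == lab[edge] for ol in candidates[other])]
                    if len(keep) != len(candidates[j]):
                        candidates[j] = keep
                        changed = True
        return candidates

    # Toy example: junctions J1 and J2 share edge 'e'; only consistent labelings survive.
    candidates = {
        "J1": [{"e": "+", "f": "-"}, {"e": "-", "f": "-"}],
        "J2": [{"e": "+", "g": ">"}],
    }
    print(waltz_filter(candidates, [("J1", "J2", "e")]))
    # -> J1 keeps only the labeling with e = '+'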
6. Vision Sub-Applications
- Machine-produced character recognition
- Solved satisfactorily through neural networks
- Hand-written character recognition
- Many solutions: neural networks, genetic algorithms, HMMs, Bayesian probabilities, nearest neighbor matching approaches, symbolic approaches
- printed character recognition is highly accurate, cursive recognition varies greatly
- Face recognition
- Many solutions; often the solutions attempt to map the face's contours and texture into mathematical equations and Gaussian distributions
- Image stabilization and image (object) tracking
- Solutions include neural networks, fuzzy logic, best-fit search
- UAV input - we'll discuss UAVs next time
7. Two Approaches to Handwritten Character Recognition
Neural networks with voting
Symbolic approach using pattern matching
8. Speech Recognition
- In spite of the fact that research began in earnest in 1970, speech recognition is a problem far from solved
- Problems
- Multispeaker: people speak at different frequencies
- Continuous speech: the speech signal for a given sound is dependent on the preceding and succeeding sounds, and in continuous speech it is extremely hard to find the beginning of a new word, making the search process even more computationally difficult
- Large vocabularies: not only does this complicate the search process because there might be many words that match, but it also brings in ambiguity
- SR attempts
- Knowledge-based approaches (particularly Hearsay)
- Neural networks
- Hidden Markov Models
- Hybrid approaches
9. The Task Pictorially
The speech signal is segmented; overlapping segments are processed to create a small window of speech. Processing typically involves FFT and Linear Predictive Coding (LPC) analysis. This provides a series of energy patterns at different frequencies.
10. Phonetic Dependence
- Below are two waveforms created by uttering the same vowel sound, "ee" as in "three" (on the left) and "tea" (on the right)
- notice how dissimilar the "ee" portion is; in fact the one on the right is even longer
- this problem is caused by co-articulation: one sound will directly impact the next sound
11. Hearsay-II
- Hearsay-II attempted to solve the problem through symbolic problem solvers
- Each called a knowledge group (KG)
- They would communicate through a global mechanism called a blackboard (BB)
- Each KG knew what part of the BB to read from and where to post partial conclusions
- A scheduler would use a complex algorithm to decide which KG should be invoked next, based on the priority of the KGs and what knowledge was currently available on the BB (a minimal sketch follows below)
Hearsay could recognize 1000 words of continuous speech and several speakers, with a limited syntax, at an accuracy of around 90%.
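As a rough illustration of the blackboard idea (not Hearsay-II's actual design), the sketch below keeps hypotheses at named levels of a shared blackboard; each knowledge group declares what level it reads and writes, and a naive scheduler repeatedly invokes the highest-priority group that has something to work on. The KGs, levels and priorities are invented for illustration.

    # A toy blackboard with two invented knowledge groups and a naive scheduler.
    blackboard = {"signal": ["raw speech frames"], "phonetic": [], "lexical": []}

    class KnowledgeGroup:
        def __init__(self, name, reads, writes, priority, work):
            self.name, self.reads, self.writes = name, reads, writes
            self.priority, self.work = priority, work

        def can_run(self, bb):
            # has input at its level and has not yet posted a conclusion
            return bool(bb[self.reads]) and not bb[self.writes]

        def run(self, bb):
            bb[self.writes].append(self.work(bb[self.reads]))

    kgs = [
        KnowledgeGroup("segmenter", "signal", "phonetic", 2,
                       lambda data: "phone hypotheses from %d item(s)" % len(data)),
        KnowledgeGroup("lexicon", "phonetic", "lexical", 1,
                       lambda data: "word hypotheses from %d item(s)" % len(data)),
    ]

    # Scheduler: keep invoking the highest-priority runnable KG.
    while True:
        ready = [kg for kg in kgs if kg.can_run(blackboard)]
        if not ready:
            break
        max(ready, key=lambda kg: kg.priority).run(blackboard)

    print(blackboard)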
12. Sample of Hearsay-II's Blackboard
The sentence is "Are any by Feigenbaum and Feldman?"
13. One Neural Network Approach
- One approach is to build a separate network for every word known in the system
- For a small-vocabulary, isolated-word system, this approach may work
- no need to worry about finding the separation between words or the effect that a word ending might have on the next word's beginning
Notice that NNs have fixed-size inputs. An input here will be the processed speech signal in the form of LPCs.
14. Continuous Speech
- The preceding solution does not work for continuous speech
- Since there is no easy way to determine where one word ends and the next begins, we cannot just rely on word models
- Instead, we need phonetic models
- The problem here is that the sound of a phoneme is influenced by the preceding and succeeding sounds
- a neural network only learns a snapshot of data, and what we need is context dependent
- One solution is to use a recurrent NN, which remembers the output of the previous input to provide context or memory
- Note that the RNN is much more difficult to train, but can solve the speech problem more effectively than the normal NN
15. Another Solution: Multiple Nets
Here, there are multiple neural networks for the various levels. The segmentation module is responsible for dividing the continuous signal into segments. The unit level generates phonetic units. The word module combines possible phonetic units into words.
16. HMM Approach
- The HMM approach views speech recognition as finding a path through a graph of connected phonetic and/or grammar models, each of which is an HMM
- In this case, the speech signal is the observable, and the unit of speech uttered is the hidden state
- Typically, several frames of the speech signal are mapped into a codebook (a fancy name for selecting one of a set of acoustic classifications)
- Separate HMMs are developed for every phonetic unit
- For instance, we might have a /d/ HMM and an /i/ HMM, etc.
- There may be multiple paths through a single HMM to allow for differences in duration caused by co-articulation and other effects
Here, the speech problem is one of working through several layers of HMMs; at the lowest level are the phonetic HMMs, which combine to make up word HMMs, which combine to make up grammar HMMs.
17. Discrete Speech Using HMMs
- Here, digits spoken one at a time are recognized
- The HMM word model for a digit consists of 5 states, any of which can repeat
- each model is trained before being used, to adjust transition probabilities to the speaker
- The process is simple: given the LPC, work through each word model and find the one which yields the greatest probability using Viterbi (a small sketch follows below)
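As a rough illustration of that process, the sketch below scores an observation sequence (codebook indices) against a few word-level HMMs with Viterbi and picks the word whose model yields the highest best-path probability. The toy models use 3 states rather than 5, and all probabilities are made up rather than trained.

    import numpy as np

    def log(x):
        return np.log(np.maximum(x, 1e-12))   # avoid log(0)

    def viterbi_best_logprob(obs, start, trans, emit):
        """Log-probability of the best state path for one HMM.
        obs: list of codebook indices; start: initial state probabilities;
        trans: state-transition matrix; emit: per-state emission probabilities."""
        lp = log(start) + log(emit[:, obs[0]])
        for o in obs[1:]:
            # best predecessor for each state, then emit the next observation
            lp = np.max(lp[:, None] + log(trans), axis=0) + log(emit[:, o])
        return float(np.max(lp))

    left_right = np.array([[0.6, 0.4, 0.0],   # each state may repeat or move on
                           [0.0, 0.6, 0.4],
                           [0.0, 0.0, 1.0]])
    models = {
        "one": (np.array([1.0, 0.0, 0.0]), left_right,
                np.array([[0.7, 0.1, 0.1, 0.1],
                          [0.1, 0.7, 0.1, 0.1],
                          [0.1, 0.1, 0.7, 0.1]])),
        "two": (np.array([1.0, 0.0, 0.0]), left_right,
                np.array([[0.1, 0.1, 0.7, 0.1],
                          [0.1, 0.1, 0.1, 0.7],
                          [0.7, 0.1, 0.1, 0.1]])),
    }

    observed = [0, 1, 2]   # codebook indices produced from the LPC frames
    best = max(models, key=lambda w: viterbi_best_logprob(observed, *models[w]))
    print("recognized:", best)   # "one" fits this observation sequence better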
18. Continuous Speech with HMM
- Many simplifications made for discrete speech do not work for continuous speech
- HMMs will have to model smaller units, possibly phonemes
- To reduce the search space, use a beam search
- To ease the word-to-word transitions, use bigrams or unigrams
The process is similar to the previous slide, where all phoneme HMMs are searched using Viterbi, but here transition probabilities are included along with more codebooks to handle the phoneme-to-phoneme transitions and word-to-word transitions.
A successful path through the phoneme HMMs to match the word "seeks"
A trained phoneme HMM for /d/
19. HMM Codebook
- HMMs need to have an observable state
- For the speech problem, the observable is a speech signal; this needs to be converted to a state
- The LPC coefficients are mapped to a codebook using a nearest-neighbor type of search (see the sketch after this list)
- More codebook entries mean larger HMMs (more observable states to model), but the more accurate the nearest-neighbor matching will be
HMM systems will commonly use codebooks with at least 256 entries.
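A minimal sketch of that nearest-neighbor mapping (vector quantization), assuming a 256-entry codebook of 12 LPC coefficients per entry; the random values stand in for a codebook that would really be built from training speech.

    import numpy as np

    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(256, 12))     # 256 entries, 12 LPC coefficients each

    def quantize(lpc_frame, codebook):
        """Return the index of the closest codebook entry (Euclidean distance)."""
        distances = np.linalg.norm(codebook - lpc_frame, axis=1)
        return int(np.argmin(distances))

    frame = rng.normal(size=12)               # one processed speech frame
    symbol = quantize(frame, codebook)        # this index is the HMM observable
    print("observation symbol:", symbol)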
20. HMM/NN Hybrid
- The strength of the NN is in its low-level recognition ability
- The strength of the HMM is in its ability to match the LPC values to a codebook and select the right phoneme
- Why not combine them?
Here, a neural network is trained and used to determine the classification of the frames rather than matching a codebook; thus, the system can learn to better match the acoustic information to a phonetic classification. Phonetic classifications are gathered together into an array and mapped to the HMMs.
21. Outstanding Problems
- Most current solutions are stochastic or neural network based and therefore exclude potentially useful symbolic knowledge which might otherwise aid during the recognition process (e.g., semantics, discourse, pragmatics)
- Speech segmentation: dividing the continuous speech into individual words
- Selection of the proper phonetic units: we have seen phonemes and words, but also common are diphones, demisyllables and other possibilities; speech science still has not determined which type of unit is the proper one to model for speech recognition
- Handling intonation, stress, dialect, accent, etc.
- Dealing with very large vocabularies (currently, speech systems recognize no more than a few thousand words, not an entire language)
- Accuracy is still too low for reliability (95-98% is common)
22. NLP
- We mainly look at NLU here
- How can a machine understand natural language?
- As we know, machines don't understand, so the goal is to transform a natural language input (speech or text) into an internal representation
- the system may be one that takes that representation and selects an action, or merely stores it
- for instance, if the NLU system is the front end of a database, then the goal is to form a DB query, submit it, and respond to the user with the DB results; if it is the front end of an OS, the goal is to generate an OS command (see the sketch after this list)
- Research began in the 1940s and virtually died in the early 1950s, until progress was made in the field of linguistics in the 1960s
- Since then, a number of approaches have been tried
- Symbolic
- Stochastic/probabilistic
- Neural network
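To make the database front-end idea concrete, here is a minimal template-matching sketch that maps a restricted English question to a query; the patterns and the hypothetical "books" table are assumptions made up for illustration, not a technique the slides prescribe.

    import re

    TEMPLATES = [
        # (pattern over the input sentence, function building a query from the match)
        (re.compile(r"list (?:all )?books by (?P<author>.+)", re.I),
         lambda m: "SELECT title FROM books WHERE author = '%s'" % m["author"]),
        (re.compile(r"how many books were published in (?P<year>\d{4})", re.I),
         lambda m: "SELECT COUNT(*) FROM books WHERE year = %s" % m["year"]),
    ]

    def to_query(utterance):
        text = utterance.strip().rstrip("?")
        for pattern, build in TEMPLATES:
            m = pattern.fullmatch(text)
            if m:
                return build(m)
        return None   # no template matched: fall back or ask for clarification

    print(to_query("List books by Feigenbaum"))
    print(to_query("How many books were published in 1973?"))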
23. NLP
NLU on the left: input text (or speech) and come to a meaning via syntactic parsing followed by semantic analysis. NLG on the right: a planning problem, how to create a sentence to convey a given idea?
24. NLU Processes
- Morphological analysis
- Syntactic parsing
- identifying grammatical categories
- top-down parser
- bottom-up parser
- Semantic parsing
- Identifying meaning
- template matching
- alternatives are semantic grammars and using other forms of representation
- ambiguity handled by some form of word sense disambiguation
- Discourse analysis
- Combining word meanings for a full sentence
- handling references
- applying world knowledge: causality, expectations, inferences
- Pragmatic analysis
- Speech acts, cognitive states and beliefs, illocutionary acts
25. Parsing
Recursive Transition Network parser
Augmented Transition Network parser
26. An NLU Problem: Ambiguity
- At the grammatical level
- words can take on multiple grammatical roles
- "Our company is training workers" has 3 syntactic parses
- "List the sales of the products produced in 1973 with the products produced in 1972" has 455 parses!
- At the semantic level
- words can take on multiple meanings even within one grammatical category
- consider the sentence "I made her duck"
- At the pragmatic and discourse levels
- what is the meaning of "it sure is cold in here"? Is this just a statement of discomfort, or a request to make it warmer?
- identifying the relationship for references: in "The chef made the boy some stew. He ate it and thanked him", who do "he" and "him" refer to? What does "it" refer to?
- Some real US headlines with multiple interpretations
27. Fun Headlines
- Hospitals are Sued by 7 Foot Doctors
- Astronaut Takes Blame for Gas in Spacecraft
- New Study of Obesity Looks for Larger Test Group
- Chef Throws His Heart into Helping Feed Needy
- Include your Children when Baking Cookies
28. Stochastic Solutions Offer Promise
- HMMs and Bayesian probabilities are often used - how?
- in either case, build a corpus: the probability that a given word or a given word transition to another word will have a specific meaning
- this requires obtaining statistics on word frequencies, collocation frequencies (phrases), and interpretation frequencies
- Problem: Zipf's law
- many words appear infrequently and therefore will have low probabilities in stochastic models
- rather than using the word count to determine the probability of a given word, rank the words by frequency of occurrence and then use the position in the list to compute a probability (see the sketch after this list)
- frequency ∝ 1 / rank
- In addition, we need to filter collocations to remove common phrases like "and so", "one of the", etc., to obtain more reasonable frequency rankings
- otherwise, collocation frequencies will be misleading toward very common phrases
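A minimal sketch of the rank-based estimate described above: rank words by frequency and let probability fall off as 1/rank, normalized over the vocabulary. The tiny word-count table is made up for illustration.

    from collections import Counter

    counts = Counter({"the": 5000, "of": 3000, "duck": 40, "stew": 8, "trihedral": 1})

    ranked = [word for word, _ in counts.most_common()]        # rank 1 = most frequent
    normalizer = sum(1.0 / r for r in range(1, len(ranked) + 1))

    def zipf_prob(word):
        rank = ranked.index(word) + 1
        return (1.0 / rank) / normalizer

    for word in ranked:
        print("%-10s rank %d  P = %.3f" % (word, ranked.index(word) + 1, zipf_prob(word)))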
29. Other Solutions
- Symbolic solutions include
- Logic-based models for parsing and context-free grammars to generate parsers
- finite state automata such as RTNs and ATNs
- Parsing by dynamic programming
- Knowledge representation approaches and case grammars (word models) for syntactic and semantic parsing
- Ad hoc knowledge-based approaches
- Neural network solutions include
- Parsing (combining recurrent networks and self-organizing maps)
- Parsing relative clauses using recurrent networks
- Case role assignment
- Word sense disambiguation
30. NLG, Machine Translation
- NLG: given a concept to relate, translate it into a legal statement
- Like NLU, a mapping process, but this time in reverse
- much more straightforward than NLU because ambiguity is not present
- but there are many ways to say something; a good NLG system will know its audience and select the proper words through register (audience context)
- a sophisticated NLG system will use reference and possibly even parts of speech
- Machine Translation
- This is perhaps the hardest problem in NLP because it must combine NLU and NLG
- Simple word-to-word translation is insufficient
- Meaning, references, idioms, etc. must all be taken care of
- Current MT systems are highly inaccurate
31. Application Areas
- MS Word: spell checker/corrector, grammar checker, thesaurus
- WordNet
- Search engines (more generically, information retrieval, including library searches)
- Database front ends
- Question-answering systems within restricted domains
- Automated documentation generation
- News categorization/summarization
- Information extraction
- Machine translation
- for instance, web page translation
- Language composition assistants: help non-native speakers with the language
- On-line dictionaries
32. Search Engine Technology
- Search engines generally comprise three components
- Web crawler (a minimal crawler sketch follows after this list)
- simple, non-AI traverser of web pages/sites
- given a web page, accumulate all URLs and add them to a queue or stack
- retrieve the next page given the URL from the queue (breadth-first) or stack (depth-first/recursive)
- convert material to text format when possible
- Summary extractor
- take a web page and summarize its content (possibly just create a bag of words, possibly attempt some form of classification); store the summary, classification and URL in a DB
- note: some engines only save summaries, others store summaries and the entire web page but only use summaries during the search process
- create an index of terms to web pages (possibly a hash table)
- Search engine portal
- accept a query
- find all related items in the DB via hashing
- sort using some form of rating scheme
- display URLs, titles and possibly brief summaries
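A minimal sketch of the crawler loop described above, using only the Python standard library: pop a URL from a queue (breadth-first), fetch the page, collect its links, and queue any unseen ones; each fetched page would then be handed to the summary extractor. Real crawlers add politeness delays, robots.txt handling, and much more; the seed URL is just a placeholder.

    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seed, limit=10):
        frontier, seen = deque([seed]), {seed}        # deque -> breadth-first
        while frontier and len(seen) <= limit:
            url = frontier.popleft()
            try:
                html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
            except OSError:
                continue
            collector = LinkCollector()
            collector.feed(html)
            for link in collector.links:
                absolute = urljoin(url, link)
                if absolute not in seen:
                    seen.add(absolute)
                    frontier.append(absolute)
            yield url, html                           # hand the page to the summarizer

    # for url, page in crawl("https://example.com"):
    #     print(url, len(page))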
33. Page Categorization/Summaries
- The tricky part of the search engine is to properly categorize or summarize a web page
- Information retrieval techniques are common
- Keywords from a bag of words
- Statistical analysis to gauge similarities between pages
- Link information such as page rank, hits, hubs, etc.
- Filtering
- Many web pages (e.g., stores) try to take advantage of the syntactic nature of search engines and place meta tags in their pages that contain all English words
- Filtering is useful in eliminating pages that attempt such tricks
- Sorting
- Once web pages have been found that match the given query, how are they sorted?
- using word count, giving extra credit if any of the words are found in the page's title or the link text, examining font size and style for the importance of the words in the document, etc. (a small scoring sketch follows after this list)
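A minimal sketch of that word-count sorting idea, with extra credit for query words found in the title or link (anchor) text; the bonus weights and the toy page records are assumptions for illustration (font size and style are ignored here).

    def score(page, query_words, title_bonus=3.0, link_bonus=2.0):
        body = page["text"].lower().split()
        title = page["title"].lower().split()
        link = page["link_text"].lower().split()
        total = 0.0
        for w in (q.lower() for q in query_words):
            total += body.count(w)                  # base credit: raw word count
            total += title_bonus * title.count(w)   # extra credit for title hits
            total += link_bonus * link.count(w)     # extra credit for anchor text
        return total

    pages = [
        {"url": "a.html", "title": "Speech recognition", "link_text": "HMM tutorial",
         "text": "speech recognition with hidden markov models"},
        {"url": "b.html", "title": "Cooking stew", "link_text": "recipes",
         "text": "a note on speech at the dinner table"},
    ]
    query = ["speech", "recognition"]
    for page in sorted(pages, key=lambda p: score(p, query), reverse=True):
        print(page["url"], score(page, query))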
34. Page Ranking
- Based on the idea of academic citation to determine something's importance
- PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))  (an iterative sketch follows after this list)
- PR(A): page rank of page A
- d: a damping factor between 0 and 1 (usually set to 0.85)
- C(A): number of links leaving page A
- T1..Tn are the n pages that point at A
- The page rank corresponds to the principal eigenvector of a normalized link matrix (that is, a matrix of pages and their links)
- One way to view page rank is to think of an average web surfer who randomly walks around pages by clicking on links (and never clicking the back button)
- the page rank is in essence the probability that this page will be reached randomly
- the damping factor is the likelihood that the surfer will get bored at this page and request another random page (rather than following a specific link of interest)
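A minimal sketch of computing page rank by repeatedly applying the formula above to a tiny, made-up three-page link graph until the values settle.

    links = {            # page -> pages it links to
        "A": ["B", "C"],
        "B": ["C"],
        "C": ["A"],
    }
    d = 0.85
    pr = {page: 1.0 for page in links}                  # initial guess

    for _ in range(50):                                 # iterate until values settle
        pr = {page: (1 - d) + d * sum(pr[p] / len(links[p])
                                      for p in links if page in links[p])
              for page in links}

    for page, rank in sorted(pr.items(), key=lambda kv: -kv[1]):
        print(page, round(rank, 3))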
35. Google's Architecture
- There are numerous distributed crawlers working all the time
- Web pages are compressed before being stored
- Each page has a unique document ID provided by the store server
- The indexer uncompresses files and parses them into word occurrences, including the position of the given word in the document
- These word occurrences are stored in barrels to create an index of word-to-document mappings (using ISAM)
- The sorter re-sorts the barrel information by word to create a reverse index (see the sketch after this list)
- The URL resolver converts relative URLs into absolute URLs
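As a rough illustration (much simplified from Google's barrels and ISAM storage), the sketch below records word occurrences with positions per document and then inverts them so each word maps to the documents and positions where it appears. The documents are made up.

    docs = {
        1: "speech recognition with hidden markov models",
        2: "page rank and search engines",
        3: "speech signals and hidden states",
    }

    # forward pass: record (position, word) occurrences per document
    forward = {doc_id: list(enumerate(text.split())) for doc_id, text in docs.items()}

    # "sorter" pass: invert to word -> list of (doc_id, position)
    inverted = {}
    for doc_id, occurrences in forward.items():
        for position, word in occurrences:
            inverted.setdefault(word, []).append((doc_id, position))

    print(inverted["speech"])   # [(1, 0), (3, 0)]
    print(inverted["hidden"])   # [(1, 3), (3, 3)]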
36. An Architecture for Improved Search
- Search engines are limited in that they do not take advantage of personalized knowledge
- This is of course understandable, given that search engines have to support millions of users
- Imagine that you had your own search engine DB; could you provide user-specific knowledge to improve search?
Three different techniques were tried using the architecture to the left. A search engine provides a number (millions?) of URLs, which are downloaded and kept locally. User-personalized knowledge is applied to filter the items found locally in order to provide only those pages most likely to be of use.
37. User Profiles
Partial user profile accumulated from text files (word and frequency listed):
debugging 1100, dec 1785, degree 4138, def 4938, default 1349, define 2752, defined 1140, department 2587, dept 1328, description 2691, design 1780, deskset 6472, development 1403, different 2336, digital 1424, directory 2517, disk 1907, distributed 5127
- The first approach merely created a bag of words annotated by their frequency
- Both the words and the frequencies were derived by accumulating text in user-stored files
- Word counts were summed based on which words appeared in the document; this score was used to order the retrieved pages (a small sketch follows below)
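A minimal sketch of that first approach: a retrieved page is scored by summing the profile frequencies of the words it contains, and pages are ordered by that score. The profile values come from the sample above; the pages are made up.

    profile = {"debugging": 1100, "design": 1780, "deskset": 6472,
               "digital": 1424, "distributed": 5127}

    def profile_score(page_text, profile):
        """Sum the profile frequency of every word appearing in the page."""
        return sum(profile.get(word, 0) for word in page_text.lower().split())

    pages = {
        "p1.html": "distributed systems design notes",
        "p2.html": "recipes for stew and cookies",
    }
    for url in sorted(pages, key=lambda u: profile_score(pages[u], profile), reverse=True):
        print(url, profile_score(pages[url], profile))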
38. User-Defined Knowledge Base
An alternative approach is to allow the user to define his/her own knowledge base of topics, people, phrases and keywords that are of importance. Classification is done through a user-defined hierarchy where each concept has its own matching knowledge. Two concepts are shown to the left; "threshold" means the number of matches required for the category to be established.
39. Learning-Driven Approach
- Learning what bag of words best represents a category by having viewers vote on which web pages are relevant for a given query and which are not
- Voting causes the weights of the words in the bag to be altered (a perceptron-style sketch follows after this list)
- Implementations have included linear matrix forms, perceptrons and neural networks
- this approach can also be used for recommendation systems
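A minimal perceptron-style sketch of the voting idea: each relevance vote adjusts the weights of the words on the voted page, and only when the current weights misclassify it. The vocabulary, pages and learning rate are assumptions for illustration.

    weights = {"speech": 0.0, "hmm": 0.0, "stew": 0.0, "recipe": 0.0}
    rate = 0.1

    def relevant(page_words, threshold=0.0):
        return sum(weights.get(w, 0.0) for w in page_words) > threshold

    def vote(page_words, is_relevant):
        """User vote: push word weights up for relevant pages, down otherwise."""
        predicted = relevant(page_words)
        if predicted != is_relevant:             # perceptron-style: learn only on mistakes
            delta = rate if is_relevant else -rate
            for w in page_words:
                if w in weights:
                    weights[w] += delta

    # Simulated votes for the query "speech recognition"
    vote(["speech", "hmm"], True)
    vote(["stew", "recipe"], False)
    vote(["speech", "hmm"], True)
    print(weights)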