Title: Application Areas
1. Application Areas
- For two lectures, we will examine a number of application areas of AI research
- Visual image understanding
- Speech recognition
- Natural language processing
- mostly we'll look at NL understanding
- but we will briefly talk about NL generation and machine translation
- Search engine technology
- Next time, we will cover the following topics and possibly others (specific topics to be determined)
- Homeland security
- Sensor interpretation
- Robotic vehicles
- AI in space
2. Perception Problems
- Vision understanding, natural language processing and speech recognition all have several things in common
- Each problem is so complex that it must be solved through a series of mappings from subproblem to subproblem
- Each problem requires a great deal of knowledge that is not necessarily available or well understood, such that successful applications often utilize non-knowledge-based mechanisms
- Each problem contains some degree of uncertainty, often implemented using HMMs or neural networks
- Early approaches in AI were symbolic and often suffered for several reasons
- Poor run-time performance (because of the sheer amount of knowledge needed and the slow processors of the time)
- Models that were based on our incomplete knowledge of human (or animal) vision, auditory abilities and language
- Lack of learning, so that knowledge acquisition was essential
- Research into all three areas has progressed, but slowly
3. Vision Understanding Mapping
- The vision problem is shown to the left as a series of mappings
- Low level processing and filtering to convert from analog to digital
- Pixels → edges/lines
- Lines → regions
- Regions → surfaces
- Add texture, shading, contours
- Surfaces → objects
- Classify objects
- Analyze scene (if necessary)
4. A Few Details
- Computer vision has been studied for decades
- There is no single solution to the problem
- Each of the mappings has its own solution
- often mathematical, such as the edge detection and the mapping from edges to regions
- often applies constraint satisfaction algorithms to reduce the amount of search or computation required for low level processes
- The intelligence part really comes in toward the end of the process
- Object classification
- Scene analysis
- Surface and object disambiguation (determining which object a particular surface belongs to, dealing with optical illusions)
- Computer vision is practically an entire CS discipline and so is beyond the scope of what we can cover here (sadly)
5. Edge Detection
- Waltz created an algorithm for edge detection by
- finding junction points (intersections of lines)
- determining the orientation of the lines into the junction points
- applying constraint satisfaction to select which lines belong to which surfaces (a small constraint-propagation sketch follows below)
- Below, convex edges are denoted with + and concave edges with -
Other approaches may be necessary for curve, contour or blob detection and analysis. Often, these approaches use such mathematical models as eigen-models (eigenform, eigenface), quadratics or superquadratics, distance measures, closest point computations, etc.
For trihedral junction points (intersections of 3 lines), these are the 18 legal connections
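As a rough, simplified illustration of the constraint-propagation idea (not Waltz's actual junction tables), the sketch below keeps a set of candidate labelings per junction and repeatedly discards labelings that no neighboring junction can agree with on a shared edge. The data layout and the tiny example labels are assumptions for illustration.

    # Simplified Waltz-style filtering: a labeling maps each incident edge to
    # '+' (convex), '-' (concave) or '>' (occluding); a real system draws its
    # candidates from the 18 legal trihedral junction labelings.
    def waltz_filter(candidates, shared_edges):
        """candidates: {junction: list of labelings (dict edge -> label)}
        shared_edges: iterable of (junction_a, junction_b, edge) they share."""
        changed = True
        while changed:
            changed = False
            for a, b, edge in shared_edges:
                for j, other in ((a, b), (b, a)):
                    keep = [lab for lab in candidates[j]
                            if any(ol[edge] == lab[edge] for ol in candidates[other])]
                    if len(keep) != len(candidates[j]):
                        candidates[j] = keep
                        changed = True
        return candidates

    # Toy example: junctions J1 and J2 share edge 'e'; only consistent labelings survive.
    candidates = {
        "J1": [{"e": "+", "f": "-"}, {"e": "-", "f": "-"}],
        "J2": [{"e": "+", "g": ">"}],
    }
    print(waltz_filter(candidates, [("J1", "J2", "e")]))
    # -> J1 keeps only the labeling with e = '+'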
6. Vision Sub-Applications
- Machine-produced character recognition
- Solved satisfactorily through neural networks
- Hand-written character recognition
- Many solutions: neural networks, genetic algorithms, HMMs, Bayesian probabilities, nearest neighbor matching approaches, symbolic approaches
- printed character recognition is highly accurate, cursive recognition varies greatly
- Face recognition
- Many solutions; often the solutions attempt to map the face's contours and texture into mathematical equations and Gaussian distributions
- Image stabilization and image (object) tracking
- Solutions include neural networks, fuzzy logic, best-fit search
- UAV input - we'll discuss UAVs next time
7. Two Approaches to Handwritten Character Recognition
Neural networks with voting
Symbolic approach using pattern matching
8. Speech Recognition
- In spite of the fact that research began in earnest in 1970, speech recognition is a problem far from solved
- Problems
- Multispeaker: people speak at different frequencies
- Continuous speech: the speech signal for a given sound is dependent on the preceding and succeeding sounds, and in continuous speech it is extremely hard to find the beginning of a new word, making the search process even more computationally difficult
- Large vocabularies: not only does this complicate the search process because there might be many words that match, but it also brings in ambiguity
- SR attempts
- Knowledge-based approaches (particularly Hearsay)
- Neural networks
- Hidden Markov Models
- Hybrid approaches
9. The Task Pictorially
The speech signal is segmented; overlapping segments are processed to create a small window of speech. Processing typically involves FFT and Linear Predictive Coding (LPC) analysis. This provides a series of energy patterns at different frequencies.
10. Phonetic Dependence
- Below are two waveforms created by uttering the same vowel sound, "ee" as in "three" (on the left) and "tea" (on the right)
- notice how dissimilar the "ee" portion is; in fact the one on the right is even longer
- this problem is caused by co-articulation: one sound will directly impact the next sound
11. Hearsay-II
- Hearsay-II attempted to solve the problem through symbolic problem solvers
- Each called a knowledge group (KG)
- They would communicate through a global mechanism called a blackboard (BB)
- Each KG knew what part of the BB to read from and where to post partial conclusions
- A scheduler would use a complex algorithm to decide which KG should be invoked next, based on the priority of the KGs and what knowledge was currently available on the BB (a minimal sketch follows below)
Hearsay could recognize 1000 words of continuous speech and several speakers, with a limited syntax, at an accuracy of around 90%.
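As a rough illustration of the blackboard idea (not Hearsay-II's actual design), the sketch below keeps hypotheses at named levels of a shared blackboard; each knowledge group declares what level it reads and writes, and a naive scheduler repeatedly invokes the highest-priority group that has something to work on. The KGs, levels and priorities are invented for illustration.

    # A toy blackboard with two invented knowledge groups and a naive scheduler.
    blackboard = {"signal": ["raw speech frames"], "phonetic": [], "lexical": []}

    class KnowledgeGroup:
        def __init__(self, name, reads, writes, priority, work):
            self.name, self.reads, self.writes = name, reads, writes
            self.priority, self.work = priority, work

        def can_run(self, bb):
            # has input at its level and has not yet posted a conclusion
            return bool(bb[self.reads]) and not bb[self.writes]

        def run(self, bb):
            bb[self.writes].append(self.work(bb[self.reads]))

    kgs = [
        KnowledgeGroup("segmenter", "signal", "phonetic", 2,
                       lambda data: "phone hypotheses from %d item(s)" % len(data)),
        KnowledgeGroup("lexicon", "phonetic", "lexical", 1,
                       lambda data: "word hypotheses from %d item(s)" % len(data)),
    ]

    # Scheduler: keep invoking the highest-priority runnable KG.
    while True:
        ready = [kg for kg in kgs if kg.can_run(blackboard)]
        if not ready:
            break
        max(ready, key=lambda kg: kg.priority).run(blackboard)

    print(blackboard)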
12. Sample of Hearsay-II's Blackboard
The sentence is "Are any by Feigenbaum and Feldman?"
13. One Neural Network Approach
- One approach is to build a separate network for every word known in the system
- For a small-vocabulary, isolated-word system, this approach may work
- no need to worry about finding the separation between words or the effect that a word ending might have on the next word's beginning
Notice that NNs have fixed-size inputs. An input here will be the processed speech signal in the form of LPCs.
14. Continuous Speech
- The preceding solution does not work for continuous speech
- Since there is no easy way to determine where one word ends and the next begins, we cannot just rely on word models
- Instead, we need phonetic models
- The problem here is that the sound of a phoneme is influenced by the preceding and succeeding sounds
- a neural network only learns a snapshot of data, and what we need is context dependent
- One solution is to use a recurrent NN, which remembers the output of the previous input to provide context or memory
- Note that the RNN is much more difficult to train, but can solve the speech problem more effectively than the normal NN
15. Another Solution: Multiple Nets
Here, there are multiple neural networks for the various levels. The segmentation module is responsible for dividing the continuous signal into segments. The unit level generates phonetic units. The word module combines possible phonetic units into words.
16. HMM Approach
- The HMM approach views speech recognition as finding a path through a graph of connected phonetic and/or grammar models, each of which is an HMM
- In this case, the speech signal is the observable, and the unit of speech uttered is the hidden state
- Typically, several frames of the speech signal are mapped into a codebook (a fancy name for selecting one of a set of acoustic classifications)
- Separate HMMs are developed for every phonetic unit
- For instance, we might have a /d/ HMM and an /i/ HMM, etc.
- There may be multiple paths through a single HMM to allow for differences in duration caused by co-articulation and other effects
Here, the speech problem is one of working through several layers of HMMs; at the lowest level are the phonetic HMMs, which combine to make up word HMMs, which combine to make up grammar HMMs.
17. Discrete Speech Using HMMs
- Here, digits spoken one at a time are recognized
- The HMM word model for a digit consists of 5 states, any of which can repeat
- each model is trained before being used, to adjust transition probabilities to the speaker
- The process is simple: given the LPC, work through each word model and find the one which yields the greatest probability using Viterbi (a small sketch follows below)
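As a rough illustration of that process, the sketch below scores an observation sequence (codebook indices) against a few word-level HMMs with Viterbi and picks the word whose model yields the highest best-path probability. The toy models use 3 states rather than 5, and all probabilities are made up rather than trained.

    import numpy as np

    def log(x):
        return np.log(np.maximum(x, 1e-12))   # avoid log(0)

    def viterbi_best_logprob(obs, start, trans, emit):
        """Log-probability of the best state path for one HMM.
        obs: list of codebook indices; start: initial state probabilities;
        trans: state-transition matrix; emit: per-state emission probabilities."""
        lp = log(start) + log(emit[:, obs[0]])
        for o in obs[1:]:
            # best predecessor for each state, then emit the next observation
            lp = np.max(lp[:, None] + log(trans), axis=0) + log(emit[:, o])
        return float(np.max(lp))

    left_right = np.array([[0.6, 0.4, 0.0],   # each state may repeat or move on
                           [0.0, 0.6, 0.4],
                           [0.0, 0.0, 1.0]])
    models = {
        "one": (np.array([1.0, 0.0, 0.0]), left_right,
                np.array([[0.7, 0.1, 0.1, 0.1],
                          [0.1, 0.7, 0.1, 0.1],
                          [0.1, 0.1, 0.7, 0.1]])),
        "two": (np.array([1.0, 0.0, 0.0]), left_right,
                np.array([[0.1, 0.1, 0.7, 0.1],
                          [0.1, 0.1, 0.1, 0.7],
                          [0.7, 0.1, 0.1, 0.1]])),
    }

    observed = [0, 1, 2]   # codebook indices produced from the LPC frames
    best = max(models, key=lambda w: viterbi_best_logprob(observed, *models[w]))
    print("recognized:", best)   # "one" fits this observation sequence better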
18. Continuous Speech with HMM
- Many simplifications made for discrete speech do not work for continuous speech
- HMMs will have to model smaller units, possibly phonemes
- To reduce the search space, use a beam search
- To ease the word-to-word transitions, use bigrams or unigrams
The process is similar to the previous slide, where all phoneme HMMs are searched using Viterbi, but here transition probabilities are included along with more codebooks to handle the phoneme-to-phoneme transitions and word-to-word transitions.
A successful path through the phoneme HMMs to match the word "seeks"
A trained phoneme HMM for /d/
19. HMM Codebook
- HMMs need to have an observable state
- For the speech problem, the observable is a speech signal; this needs to be converted to a state
- The LPC coefficients are mapped to a codebook using a nearest-neighbor type of search (see the sketch after this list)
- More codebook entries mean larger HMMs (more observable states to model), but the more accurate the nearest-neighbor matching will be
HMM systems will commonly use codebooks with at least 256 entries.
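A minimal sketch of that nearest-neighbor mapping (vector quantization), assuming a 256-entry codebook of 12 LPC coefficients per entry; the random values stand in for a codebook that would really be built from training speech.

    import numpy as np

    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(256, 12))     # 256 entries, 12 LPC coefficients each

    def quantize(lpc_frame, codebook):
        """Return the index of the closest codebook entry (Euclidean distance)."""
        distances = np.linalg.norm(codebook - lpc_frame, axis=1)
        return int(np.argmin(distances))

    frame = rng.normal(size=12)               # one processed speech frame
    symbol = quantize(frame, codebook)        # this index is the HMM observable
    print("observation symbol:", symbol)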
20. HMM/NN Hybrid
- The strength of the NN is in its low-level recognition ability
- The strength of the HMM is in its ability to match the LPC values to a codebook and select the right phoneme
- Why not combine them?
Here, a neural network is trained and used to determine the classification of the frames rather than matching a codebook; thus, the system can learn to better match the acoustic information to a phonetic classification. Phonetic classifications are gathered together into an array and mapped to the HMMs.
21. Outstanding Problems
- Most current solutions are stochastic or neural network based and therefore exclude potentially useful symbolic knowledge which might otherwise aid during the recognition process (e.g., semantics, discourse, pragmatics)
- Speech segmentation: dividing the continuous speech into individual words
- Selection of the proper phonetic units: we have seen phonemes and words, but also common are diphones, demisyllables and other possibilities; speech science still has not determined which type of unit is the proper one to model for speech recognition
- Handling intonation, stress, dialect, accent, etc.
- Dealing with very large vocabularies (currently, speech systems recognize no more than a few thousand words, not an entire language)
- Accuracy is still too low for reliability (95-98% is common)
22. NLP
- We mainly look at NLU here
- How can a machine understand natural language?
- As we know, machines don't understand, so the goal is to transform a natural language input (speech or text) into an internal representation
- the system may be one that takes that representation and selects an action, or merely stores it
- for instance, if the NLU system is the front end of a database, then the goal is to form a DB query, submit it, and respond to the user with the DB results; if it is the front end of an OS, the goal is to generate an OS command (see the sketch after this list)
- Research began in the 1940s and virtually died in the early 1950s, until progress was made in the field of linguistics in the 1960s
- Since then, a number of approaches have been tried
- Symbolic
- Stochastic/probabilistic
- Neural network
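To make the database front-end idea concrete, here is a minimal template-matching sketch that maps a restricted English question to a query; the patterns and the hypothetical "books" table are assumptions made up for illustration, not a technique the slides prescribe.

    import re

    TEMPLATES = [
        # (pattern over the input sentence, function building a query from the match)
        (re.compile(r"list (?:all )?books by (?P<author>.+)", re.I),
         lambda m: "SELECT title FROM books WHERE author = '%s'" % m["author"]),
        (re.compile(r"how many books were published in (?P<year>\d{4})", re.I),
         lambda m: "SELECT COUNT(*) FROM books WHERE year = %s" % m["year"]),
    ]

    def to_query(utterance):
        text = utterance.strip().rstrip("?")
        for pattern, build in TEMPLATES:
            m = pattern.fullmatch(text)
            if m:
                return build(m)
        return None   # no template matched: fall back or ask for clarification

    print(to_query("List books by Feigenbaum"))
    print(to_query("How many books were published in 1973?"))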
23. NLP
NLU on the left: input text (or speech) and come to a meaning via syntactic parsing followed by semantic analysis. NLG on the right: a planning problem, how to create a sentence to convey a given idea?
24. NLU Processes
- Morphological analysis
- Syntactic parsing
- identifying grammatical categories
- top-down parser
- bottom-up parser
- Semantic parsing
- Identifying meaning
- template matching
- alternatives are semantic grammars and using other forms of representation
- ambiguity handled by some form of word sense disambiguation
- Discourse analysis
- Combining word meanings for a full sentence
- handling references
- applying world knowledge: causality, expectations, inferences
- Pragmatic analysis
- Speech acts, cognitive states and beliefs, illocutionary acts
25. Parsing
Recursive Transition Network parser
Augmented Transition Network parser
26. An NLU Problem: Ambiguity
- At the grammatical level
- words can take on multiple grammatical roles
- "Our company is training workers" has 3 syntactic parses
- "List the sales of the products produced in 1973 with the products produced in 1972" has 455 parses!
- At the semantic level
- words can take on multiple meanings even within one grammatical category
- consider the sentence "I made her duck"
- At the pragmatic and discourse levels
- what is the meaning of "it sure is cold in here"? Is this just a statement of discomfort, or a request to make it warmer?
- identifying the relationship for references: in "The chef made the boy some stew. He ate it and thanked him", who do "he" and "him" refer to? What does "it" refer to?
- Some real US headlines with multiple interpretations
27. Fun Headlines
- Hospitals are Sued by 7 Foot Doctors
- Astronaut Takes Blame for Gas in Spacecraft
- New Study of Obesity Looks for Larger Test Group
- Chef Throws His Heart into Helping Feed Needy
- Include your Children when Baking Cookies
28. Stochastic Solutions Offer Promise
- HMMs and Bayesian probabilities are often used - how?
- in either case, build a corpus: the probability that a given word or a given word transition to another word will have a specific meaning
- this requires obtaining statistics on word frequencies, collocation frequencies (phrases), and interpretation frequencies
- Problem: Zipf's law
- many words appear infrequently and therefore will have low probabilities in stochastic models
- rather than using the word count to determine the probability of a given word, rank the words by frequency of occurrence and then use the position in the list to compute a probability (see the sketch after this list)
- frequency ∝ 1 / rank
- In addition, we need to filter collocations to remove common phrases like "and so", "one of the", etc., to obtain more reasonable frequency rankings
- otherwise, collocation frequencies will be misleading toward very common phrases
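A minimal sketch of the rank-based estimate described above: rank words by frequency and let probability fall off as 1/rank, normalized over the vocabulary. The tiny word-count table is made up for illustration.

    from collections import Counter

    counts = Counter({"the": 5000, "of": 3000, "duck": 40, "stew": 8, "trihedral": 1})

    ranked = [word for word, _ in counts.most_common()]        # rank 1 = most frequent
    normalizer = sum(1.0 / r for r in range(1, len(ranked) + 1))

    def zipf_prob(word):
        rank = ranked.index(word) + 1
        return (1.0 / rank) / normalizer

    for word in ranked:
        print("%-10s rank %d  P = %.3f" % (word, ranked.index(word) + 1, zipf_prob(word)))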
29. Other Solutions
- Symbolic solutions include
- Logic-based models for parsing and context-free grammars to generate parsers
- finite state automata such as RTNs and ATNs
- Parsing by dynamic programming
- Knowledge representation approaches and case grammars (word models) for syntactic and semantic parsing
- Ad hoc knowledge-based approaches
- Neural network solutions include
- Parsing (combining recurrent networks and self-organizing maps)
- Parsing relative clauses using recurrent networks
- Case role assignment
- Word sense disambiguation
30. NLG, Machine Translation
- NLG: given a concept to relate, translate it into a legal statement
- Like NLU, a mapping process, but this time in reverse
- much more straightforward than NLU because ambiguity is not present
- but there are many ways to say something; a good NLG system will know its audience and select the proper words through register (audience context)
- a sophisticated NLG system will use reference and possibly even parts of speech
- Machine Translation
- This is perhaps the hardest problem in NLP because it must combine NLU and NLG
- Simple word-to-word translation is insufficient
- Meaning, references, idioms, etc. must all be taken care of
- Current MT systems are highly inaccurate
31. Application Areas
- MS Word: spell checker/corrector, grammar checker, thesaurus
- WordNet
- Search engines (more generically, information retrieval, including library searches)
- Database front ends
- Question-answering systems within restricted domains
- Automated documentation generation
- News categorization/summarization
- Information extraction
- Machine translation
- for instance, web page translation
- Language composition assistants: help non-native speakers with the language
- On-line dictionaries
32. Search Engine Technology
- Search engines generally comprise three components
- Web crawler (a minimal crawler sketch follows after this list)
- simple, non-AI traverser of web pages/sites
- given a web page, accumulate all URLs and add them to a queue or stack
- retrieve the next page given the URL from the queue (breadth-first) or stack (depth-first/recursive)
- convert material to text format when possible
- Summary extractor
- take a web page and summarize its content (possibly just create a bag of words, possibly attempt some form of classification); store the summary, classification and URL in a DB
- note: some engines only save summaries, others store summaries and the entire web page but only use summaries during the search process
- create an index of terms to web pages (possibly a hash table)
- Search engine portal
- accept a query
- find all related items in the DB via hashing
- sort using some form of rating scheme
- display URLs, titles and possibly brief summaries
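A minimal sketch of the crawler loop described above, using only the Python standard library: pop a URL from a queue (breadth-first), fetch the page, collect its links, and queue any unseen ones; each fetched page would then be handed to the summary extractor. Real crawlers add politeness delays, robots.txt handling, and much more; the seed URL is just a placeholder.

    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def crawl(seed, limit=10):
        frontier, seen = deque([seed]), {seed}        # deque -> breadth-first
        while frontier and len(seen) <= limit:
            url = frontier.popleft()
            try:
                html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
            except OSError:
                continue
            collector = LinkCollector()
            collector.feed(html)
            for link in collector.links:
                absolute = urljoin(url, link)
                if absolute not in seen:
                    seen.add(absolute)
                    frontier.append(absolute)
            yield url, html                           # hand the page to the summarizer

    # for url, page in crawl("https://example.com"):
    #     print(url, len(page))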
33. Page Categorization/Summaries
- The tricky part of the search engine is to properly categorize or summarize a web page
- Information retrieval techniques are common
- Keywords from a bag of words
- Statistical analysis to gauge similarities between pages
- Link information such as page rank, hits, hubs, etc.
- Filtering
- Many web pages (e.g., stores) try to take advantage of the syntactic nature of search engines and place meta tags in their pages that contain all English words
- Filtering is useful in eliminating pages that attempt such tricks
- Sorting
- Once web pages have been found that match the given query, how are they sorted?
- using word count, giving extra credit if any of the words are found in the page's title or the link text, examining font size and style for the importance of the words in the document, etc. (a small scoring sketch follows after this list)
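A minimal sketch of that word-count sorting idea, with extra credit for query words found in the title or link (anchor) text; the bonus weights and the toy page records are assumptions for illustration (font size and style are ignored here).

    def score(page, query_words, title_bonus=3.0, link_bonus=2.0):
        body = page["text"].lower().split()
        title = page["title"].lower().split()
        link = page["link_text"].lower().split()
        total = 0.0
        for w in (q.lower() for q in query_words):
            total += body.count(w)                  # base credit: raw word count
            total += title_bonus * title.count(w)   # extra credit for title hits
            total += link_bonus * link.count(w)     # extra credit for anchor text
        return total

    pages = [
        {"url": "a.html", "title": "Speech recognition", "link_text": "HMM tutorial",
         "text": "speech recognition with hidden markov models"},
        {"url": "b.html", "title": "Cooking stew", "link_text": "recipes",
         "text": "a note on speech at the dinner table"},
    ]
    query = ["speech", "recognition"]
    for page in sorted(pages, key=lambda p: score(p, query), reverse=True):
        print(page["url"], score(page, query))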
34. Page Ranking
- Based on the idea of academic citation to determine something's importance
- PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))  (an iterative sketch follows after this list)
- PR(A): page rank of page A
- d: a damping factor between 0 and 1 (usually set to 0.85)
- C(A): number of links leaving page A
- T1..Tn are the n pages that point at A
- The page rank corresponds to the principal eigenvector of a normalized link matrix (that is, a matrix of pages and their links)
- One way to view page rank is to think of an average web surfer who randomly walks around pages by clicking on links (and never clicking the back button)
- the page rank is in essence the probability that this page will be reached randomly
- the damping factor is the likelihood that the surfer will get bored at this page and request another random page (rather than following a specific link of interest)
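A minimal sketch of computing page rank by repeatedly applying the formula above to a tiny, made-up three-page link graph until the values settle.

    links = {            # page -> pages it links to
        "A": ["B", "C"],
        "B": ["C"],
        "C": ["A"],
    }
    d = 0.85
    pr = {page: 1.0 for page in links}                  # initial guess

    for _ in range(50):                                 # iterate until values settle
        pr = {page: (1 - d) + d * sum(pr[p] / len(links[p])
                                      for p in links if page in links[p])
              for page in links}

    for page, rank in sorted(pr.items(), key=lambda kv: -kv[1]):
        print(page, round(rank, 3))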
35. Google's Architecture
- There are numerous distributed crawlers working all the time
- Web pages are compressed before being stored
- Each page has a unique document ID provided by the store server
- The indexer uncompresses files and parses them into word occurrences, including the position of the given word in the document
- These word occurrences are stored in barrels to create an index of word-to-document mappings (using ISAM)
- The sorter re-sorts the barrel information by word to create a reverse index (see the sketch after this list)
- The URL resolver converts relative URLs into absolute URLs
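As a rough illustration (much simplified from Google's barrels and ISAM storage), the sketch below records word occurrences with positions per document and then inverts them so each word maps to the documents and positions where it appears. The documents are made up.

    docs = {
        1: "speech recognition with hidden markov models",
        2: "page rank and search engines",
        3: "speech signals and hidden states",
    }

    # forward pass: record (position, word) occurrences per document
    forward = {doc_id: list(enumerate(text.split())) for doc_id, text in docs.items()}

    # "sorter" pass: invert to word -> list of (doc_id, position)
    inverted = {}
    for doc_id, occurrences in forward.items():
        for position, word in occurrences:
            inverted.setdefault(word, []).append((doc_id, position))

    print(inverted["speech"])   # [(1, 0), (3, 0)]
    print(inverted["hidden"])   # [(1, 3), (3, 3)]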
36. An Architecture for Improved Search
- Search engines are limited in that they do not take advantage of personalized knowledge
- This is of course understandable, given that search engines have to support millions of users
- Imagine that you had your own search engine DB; could you provide user-specific knowledge to improve search?
Three different techniques were tried using the architecture to the left. A search engine provides a number (millions?) of URLs, which are downloaded and kept locally. User-personalized knowledge is applied to filter the items found locally in order to provide only those pages most likely to be of use.
37. User Profiles
Partial user profile accumulated from text files (word and frequency listed):
debugging 1100, dec 1785, degree 4138, def 4938, default 1349, define 2752, defined 1140, department 2587, dept 1328, description 2691, design 1780, deskset 6472, development 1403, different 2336, digital 1424, directory 2517, disk 1907, distributed 5127
- The first approach merely created a bag of words annotated by their frequency
- Both the words and the frequencies were derived by accumulating text in user-stored files
- Word counts were summed based on which words appeared in the document; this score was used to order the retrieved pages (a small sketch follows below)
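A minimal sketch of that first approach: a retrieved page is scored by summing the profile frequencies of the words it contains, and pages are ordered by that score. The profile values come from the sample above; the pages are made up.

    profile = {"debugging": 1100, "design": 1780, "deskset": 6472,
               "digital": 1424, "distributed": 5127}

    def profile_score(page_text, profile):
        """Sum the profile frequency of every word appearing in the page."""
        return sum(profile.get(word, 0) for word in page_text.lower().split())

    pages = {
        "p1.html": "distributed systems design notes",
        "p2.html": "recipes for stew and cookies",
    }
    for url in sorted(pages, key=lambda u: profile_score(pages[u], profile), reverse=True):
        print(url, profile_score(pages[url], profile))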
38. User-Defined Knowledge Base
An alternative approach is to allow the user to define his/her own knowledge base of topics, people, phrases and keywords that are of importance. Classification is done through a user-defined hierarchy where each concept has its own matching knowledge. Two concepts are shown to the left; "threshold" means the number of matches required for the category to be established.
39. Learning-Driven Approach
- Learning what bag of words best represents a category by having viewers vote on which web pages are relevant for a given query and which are not
- Voting causes the weights of the words in the bag to be altered (a perceptron-style sketch follows after this list)
- Implementations have included linear matrix forms, perceptrons and neural networks
- this approach can also be used for recommendation systems
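A minimal perceptron-style sketch of the voting idea: each relevance vote adjusts the weights of the words on the voted page, and only when the current weights misclassify it. The vocabulary, pages and learning rate are assumptions for illustration.

    weights = {"speech": 0.0, "hmm": 0.0, "stew": 0.0, "recipe": 0.0}
    rate = 0.1

    def relevant(page_words, threshold=0.0):
        return sum(weights.get(w, 0.0) for w in page_words) > threshold

    def vote(page_words, is_relevant):
        """User vote: push word weights up for relevant pages, down otherwise."""
        predicted = relevant(page_words)
        if predicted != is_relevant:             # perceptron-style: learn only on mistakes
            delta = rate if is_relevant else -rate
            for w in page_words:
                if w in weights:
                    weights[w] += delta

    # Simulated votes for the query "speech recognition"
    vote(["speech", "hmm"], True)
    vote(["stew", "recipe"], False)
    vote(["speech", "hmm"], True)
    print(weights)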