1
For Wednesday
  • Read chapter 22, sections 4-6
  • Homework
  • Chapter 18, exercise 7

2
Program 4
  • Any questions?

3
Model Neuron (Linear Threshold Unit)
  • Neuron modelled by a unit (j) connected by
    weights, wji, to other units (i)
  • Net input to a unit is defined as
  • netj = Σi wji oi
  • Output of a unit is a threshold function on the
    net input
  • 1 if netj > Tj
  • 0 otherwise
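  • A minimal Python sketch of such a unit (the weights, inputs, and
    threshold below are made-up illustrations, not values from the slides):

    # Minimal linear threshold unit; weights/threshold are illustrative.
    def ltu_output(weights, inputs, threshold):
        """Return 1 if the weighted sum of inputs exceeds the threshold, else 0."""
        net = sum(w * o for w, o in zip(weights, inputs))  # net_j = sum_i w_ji * o_i
        return 1 if net > threshold else 0

    # Example: a unit that fires only when both inputs are active (AND-like).
    print(ltu_output(weights=[1.0, 1.0], inputs=[1, 1], threshold=1.5))  # -> 1
    print(ltu_output(weights=[1.0, 1.0], inputs=[1, 0], threshold=1.5))  # -> 0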

4
Multilayer Neural Networks
  • Multilayer networks can represent arbitrary
    functions, but building an effective learning
    method for such networks was thought to be
    difficult.
  • Generally networks are composed of an input
    layer, hidden layer, and output layer and
    activation feeds forward from input to output.
  • Patterns of activation are presented at the
    inputs and the resulting activation of the
    outputs is computed.
  • The values of the weights determine the function
    computed.
  • A network with one hidden layer with a sufficient
    number of units can represent any boolean
    function.

5
Basic Problem
  • General approach to the learning algorithm is to
    apply gradient descent.
  • However, for the general case, we need to be able
    to differentiate the function computed by a unit
    and the standard threshold function is not
    differentiable at the threshold.

6
Differentiable Threshold Unit
  • Need some sort of nonlinear output function to
    allow computation of arbitrary functions by
    multilayer networks (a multilayer network of
    linear units can still only represent a linear
    function).
  • Solution: Use a nonlinear, differentiable output
    function such as the sigmoid or logistic function
  • oj = 1/(1 + e-(netj - Tj))
  • Can also use other functions such as tanh or a
    Gaussian.
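  • A small Python sketch of the sigmoid output (the function name and the
    sample net values are illustrative):

    import math

    def sigmoid_output(net, threshold=0.0):
        """o_j = 1 / (1 + e^-(net_j - T_j)): a smooth, differentiable step."""
        return 1.0 / (1.0 + math.exp(-(net - threshold)))

    print(sigmoid_output(-2.0))  # ~0.12
    print(sigmoid_output(0.0))   # 0.5
    print(sigmoid_output(2.0))   # ~0.88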

7
Error Measure
  • Since there are multiple continuous outputs, we
    can define an overall error measure
  • E(W) = 1/2 Σd∈D Σk∈K (tkd - okd)^2
  • where D is the set of training examples, K is
    the set of output units, tkd is the target output
    for the kth unit given input d, and okd is the
    network output for the kth unit given input d.
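  • A short Python sketch of this error measure (the target and output
    values are invented for illustration):

    # E(W) = 1/2 * sum over examples d in D and output units k in K of (t_kd - o_kd)^2
    def network_error(targets, outputs):
        """targets and outputs are lists of per-example output vectors."""
        return 0.5 * sum((t - o) ** 2
                         for t_vec, o_vec in zip(targets, outputs)
                         for t, o in zip(t_vec, o_vec))

    # Two examples, two output units each.
    print(network_error(targets=[[1, 0], [0, 1]],
                        outputs=[[0.9, 0.2], [0.1, 0.7]]))  # 0.075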

8
Gradient Descent
  • The derivative of the output of a sigmoid unit
    given the net input is
  • ∂oj/∂netj = oj (1 - oj)
  • This can be used to derive a learning rule which
    performs gradient descent in weight space in an
    attempt to minimize the error function.
  • Δwji = -η (∂E/∂wji)

9
Backpropagation Learning Rule
  • Each weight wji is changed by
  • Δwji = η δj oi
  • δj = oj (1 - oj)(tj - oj) if j is an output unit
  • δj = oj (1 - oj) Σk δk wkj otherwise
  • where η is a constant called the learning rate,
  • tj is the correct output for unit j,
  • δj is an error measure for unit j.
  • First determine the error for the output units,
    then backpropagate this error layer by layer
    through the network, changing weights
    appropriately at each layer.
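  • A sketch of these per-unit quantities in Python (the activations,
    target, and learning rate below are illustrative numbers only):

    def output_delta(o, t):
        """delta_j = o_j (1 - o_j)(t_j - o_j) for an output unit."""
        return o * (1 - o) * (t - o)

    def hidden_delta(o, downstream_deltas, downstream_weights):
        """delta_j = o_j (1 - o_j) * sum_k delta_k w_kj for a hidden unit."""
        return o * (1 - o) * sum(d * w for d, w in zip(downstream_deltas, downstream_weights))

    def weight_update(eta, delta_j, o_i):
        """Delta w_ji = eta * delta_j * o_i."""
        return eta * delta_j * o_i

    # Output unit with activation 0.8, target 1.0, fed by a hidden unit with activation 0.6:
    d_out = output_delta(0.8, 1.0)                         # 0.8 * 0.2 * 0.2 = 0.032
    print(weight_update(eta=0.5, delta_j=d_out, o_i=0.6))  # 0.0096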

10
Backpropagation Learning Algorithm
  • Create a three layer network with N hidden units
    and fully connect input units to hidden units and
    hidden units to output units with small random
    weights.
  • Until all examples produce the correct output
    within ε or the mean-squared error ceases to
    decrease (or other termination criteria):
  • Begin epoch
  • For each example in training set do
  • Compute the network output for this example.
  • Compute the error between this output and the
    correct output.
  • Backpropagate this error and adjust weights
    to decrease this error.
  • End epoch
  • Since continuous outputs only approach 0 or 1 in
    the limit, must allow for some ε-approximation to
    learn binary functions.
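  • A compact, self-contained Python sketch of this training loop (a 2-2-1
    network learning XOR; the network size, learning rate, epoch count, and
    random seed are illustrative choices, not values from the slides):

    import math, random

    random.seed(0)
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))

    # 2 inputs -> 2 hidden units -> 1 output; small random weights, bias folded in
    # as an extra input that is always 1.
    w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
    w_out = [random.uniform(-0.5, 0.5) for _ in range(3)]
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR
    eta = 0.5

    for epoch in range(20000):          # fixed number of epochs as the termination criterion
        for x, t in data:
            xs = x + [1]
            h = [sigmoid(sum(w * v for w, v in zip(w_hidden[j], xs))) for j in range(2)]
            hs = h + [1]
            o = sigmoid(sum(w * v for w, v in zip(w_out, hs)))
            # Backpropagate: output delta, hidden deltas, then weight updates.
            d_o = o * (1 - o) * (t - o)
            d_h = [h[j] * (1 - h[j]) * d_o * w_out[j] for j in range(2)]
            w_out = [w + eta * d_o * v for w, v in zip(w_out, hs)]
            for j in range(2):
                w_hidden[j] = [w + eta * d_h[j] * v for w, v in zip(w_hidden[j], xs)]

    # Outputs only approach 0/1 in the limit; with unlucky initial weights the
    # network may also settle in a local minimum (see the next slide).
    for x, t in data:
        xs = x + [1]
        hs = [sigmoid(sum(w * v for w, v in zip(w_hidden[j], xs))) for j in range(2)] + [1]
        print(x, round(sigmoid(sum(w * v for w, v in zip(w_out, hs))), 2), "target", t)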

11
Comments on Training
  • There is no guarantee of convergence; the network
    may oscillate or reach a local minimum.
  • However, in practice many large networks can be
    adequately trained on large amounts of data for
    realistic problems.
  • Many epochs (thousands) may be needed for
    adequate training; large data sets may require
    hours or days of CPU time.
  • Termination criteria can be
  • Fixed number of epochs
  • Threshold on training set error

12
Representational Power
  • Multilayer sigmoidal networks are very
    expressive.
  • Boolean functions: Any Boolean function can be
    represented by a two-layer network by simulating
    a two-layer AND-OR network, but the number of
    required hidden units can grow exponentially in
    the number of inputs.
  • Continuous functions: Any bounded continuous
    function can be approximated with arbitrarily
    small error by a two-layer network. Sigmoid
    functions provide a set of basis functions from
    which arbitrary functions can be composed, just
    as any function can be represented by a sum of
    sine waves in Fourier analysis.
  • Arbitrary functions: Any function can be
    approximated to arbitrary accuracy by a
    three-layer network.

13
Sample Learned XOR Network
  [Network diagram: inputs X and Y feed hidden units A and B, which feed the
  output unit; the learned weights shown include 3.11, 6.96, -7.38, -2.03,
  -5.24, -3.58, -5.57, -3.6, -5.74]
  • Hidden unit A represents ¬(X ∧ Y)
  • Hidden unit B represents (X ∨ Y)
  • Output O represents A ∧ B
  • ¬(X ∧ Y) ∧ (X ∨ Y)
  • X ⊕ Y
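  • A quick truth-table check of this decomposition (assuming, as noted
    above, that unit A computes the negated conjunction):

    # (not (X and Y)) and (X or Y) equals X xor Y on all four inputs.
    for X in (False, True):
        for Y in (False, True):
            A = not (X and Y)             # hidden unit A (NAND-like)
            B = X or Y                    # hidden unit B (OR-like)
            print(X, Y, A and B, X != Y)  # the last two columns always agree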

14
Hidden Unit Representations
  • Trained hidden units can be seen as newly
    constructed features that re-represent the
    examples so that they are linearly separable.
  • On many real problems, hidden units can end up
    representing interesting recognizable features
    such as vowel detectors, edge detectors, etc.
  • However, particularly with many hidden units,
    they become more distributed and are hard to
    interpret.

15
Input/Output Coding
  • Appropriate coding of inputs and outputs can make
    learning problem easier and improve
    generalization.
  • Best to encode each binary feature as a separate
    input unit and for multivalued features include
    one binary unit per value rather than trying to
    encode input information in fewer units using
    binary coding or continuous values.

16
I/O Coding cont.
  • Continuous inputs can be handled by a single
    input by scaling them between 0 and 1.
  • For disjoint categorization problems, best to
    have one output unit per category rather than
    encoding n categories into log n bits. Continuous
    output values then represent certainty in various
    categories. Assign test cases to the category
    with the highest output.
  • Continuous outputs (regression) can also be
    handled by scaling between 0 and 1.
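  • A small Python sketch of these coding conventions (the feature names,
    value sets, and ranges are invented for illustration):

    # One input unit per value of a multivalued feature, one output unit per
    # category, and min-max scaling of a continuous value into [0, 1].
    def one_hot(value, values):
        return [1.0 if value == v else 0.0 for v in values]

    def scale(x, lo, hi):
        return (x - lo) / (hi - lo)

    colors = ["red", "green", "blue"]
    print(one_hot("green", colors))       # [0.0, 1.0, 0.0]
    print(scale(72.0, lo=0.0, hi=100.0))  # 0.72

    # Decoding a disjoint categorization: pick the category with the highest output.
    outputs = [0.1, 0.7, 0.3]
    print(colors[outputs.index(max(outputs))])  # green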

17
Neural Net Conclusions
  • Learned concepts can be represented by networks
    of linear threshold units and trained using
    gradient descent.
  • Analogy to the brain and numerous successful
    applications have generated significant interest.
  • Generally much slower to train than other
    learning methods, but exploring a rich hypothesis
    space that seems to work well in many domains.
  • Potential to model biological and cognitive
    phenomena and increase our understanding of real
    neural systems.
  • Backprop itself is not very biologically
    plausible.

18
Natural Language Processing
  • What's the goal?

19
Communication
  • Communication for the speaker:
  • Intention: Deciding why, when, and what
    information should be transmitted. May require
    planning and reasoning about agents' goals and
    beliefs.
  • Generation: Translating the information to be
    communicated into a string of words.
  • Synthesis: Output of the string in the desired
    modality, e.g. text on a screen or speech.

20
Communication (cont.)
  • Communication for the hearer:
  • Perception: Mapping the input modality to a
    string of words, e.g. optical character
    recognition or speech recognition.
  • Analysis: Determining the information content of
    the string.
  • Syntactic interpretation (parsing): Find the
    correct parse tree showing the phrase structure.
  • Semantic interpretation: Extract the (literal)
    meaning of the string in some representation,
    e.g. FOPC.
  • Pragmatic interpretation: Consider the effect of
    the overall context on the meaning of the
    sentence.
  • Incorporation: Decide whether or not to believe
    the content of the string and add it to the KB.

21
Ambiguity
  • Natural language sentences are highly ambiguous
    and must be disambiguated.
  • I saw the man on the hill with the telescope.
  • I saw the Grand Canyon flying to LA.
  • I saw a jet flying to LA.
  • Time flies like an arrow.
  • Horse flies like a sugar cube.
  • Time runners like a coach.
  • Time cars like a Porsche.

22
Syntax
  • Syntax concerns the proper ordering of words and
    its effect on meaning.
  • The dog bit the boy.
  • The boy bit the dog.
  • Bit boy the dog the
  • Colorless green ideas sleep furiously.

23
Semantics
  • Semantics concerns the meaning of words, phrases,
    and sentences; generally restricted to literal
    meaning:
  • plant as a photosynthetic organism
  • plant as a manufacturing facility
  • plant as the act of sowing

24
Pragmatics
  • Pragmatics concerns the overall communicative
    and social context and its effect on
    interpretation.
  • Can you pass the salt?
  • Passerby: Does your dog bite?
  • Clouseau: No.
  • Passerby: (pets the dog) Chomp!
  • Passerby: I thought you said your dog didn't bite!!
  • Clouseau: That, sir, is not my dog!

25
Modular Processing
  [Pipeline diagram: sound waves → speech recognition (acoustic/phonetic) →
  words → parsing (syntax) → parse trees → semantics → literal meaning →
  pragmatics → meaning]
26
Examples
  • Phonetics
  • grey twine vs. great wine
  • youth in Asia vs. euthanasia
  • yawanna → do you want to
  • Syntax
  • I ate spaghetti with a fork.
  • I ate spaghetti with meatballs.

27
More Examples
  • Semantics
  • I put the plant in the window.
  • Ford put the plant in Mexico.
  • The dog is in the pen.
  • The ink is in the pen.
  • Pragmatics
  • The ham sandwich wants another beer.
  • John thinks vanilla.

28
Formal Grammars
  • A grammar is a set of production rules which
    generates a set of strings (a language) by
    rewriting the top symbol S.
  • Nonterminal symbols are intermediate results that
    are not contained in strings of the language.
  • S → NP VP
  • NP → Det N
  • VP → V NP

29
  • Terminal symbols are the final symbols (words)
    that compose the strings in the language.
  • Production rules for generating words from part
    of speech categories constitute the lexicon.
  • N → boy
  • V → eat

30
Context-Free Grammars
  • A context-free grammar only has productions with
    a single symbol on the left-hand side.
  • CFG: S → NP VP
  • NP → Det N
  • VP → V NP
  • not CFG: A B → C
  • B C → F G

31
Simplified English Grammar
  • S → NP VP    S → VP
  • NP → Det Adj N    NP → ProN    NP → Name
  • VP → V    VP → V NP    VP → VP PP
  • PP → Prep NP
  • Adj → ε    Adj → Adj Adj
  • Lexicon
  • ProN → I    ProN → you    ProN → he    ProN → she
  • Name → John    Name → Mary
  • Adj → big    Adj → little    Adj → blue    Adj → red
  • Det → the    Det → a    Det → an
  • N → man    N → telescope    N → hill    N → saw
  • Prep → with    Prep → for    Prep → of    Prep → in
  • V → hit    V → took    V → saw    V → likes
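  • One convenient way to hold this grammar in a program is a dictionary
    from each nonterminal to its right-hand sides. The Python encoding below
    (with the empty production written as an empty list, and the slide's
    PName/Name treated as one category) is an illustrative choice, not
    something prescribed by the slides:

    GRAMMAR = {
        "S":   [["NP", "VP"], ["VP"]],
        "NP":  [["Det", "Adj", "N"], ["ProN"], ["Name"]],
        "VP":  [["V"], ["V", "NP"], ["VP", "PP"]],
        "PP":  [["Prep", "NP"]],
        "Adj": [[], ["Adj", "Adj"]],   # [] encodes Adj -> epsilon
    }
    LEXICON = {
        "ProN": ["I", "you", "he", "she"],
        "Name": ["John", "Mary"],
        "Adj":  ["big", "little", "blue", "red"],
        "Det":  ["the", "a", "an"],
        "N":    ["man", "telescope", "hill", "saw"],
        "Prep": ["with", "for", "of", "in"],
        "V":    ["hit", "took", "saw", "likes"],
    }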

32
Parse Trees
  • A parse tree shows the derivation of a sentence
    in the language from the start symbol to the
    terminal symbols.
  • If a given sentence has more than one possible
    derivation (parse tree), it is said to be
    syntactically ambiguous.

33
(No Transcript)
34
(No Transcript)
35
Syntactic Parsing
  • Given a string of words, determine if it is
    grammatical, i.e. if it can be derived from a
    particular grammar.
  • The derivation itself may also be of interest.
  • Normally want to determine all possible parse
    trees and then use semantics and pragmatics to
    eliminate spurious parses and build a semantic
    representation.

36
Parsing Complexity
  • Problem: Many sentences have many parses.
  • An English sentence with n prepositional phrases
    at the end has at least 2^n parses.
  • I saw the man on the hill with a telescope on
    Tuesday in Austin...
  • The actual number of parses is given by the
    Catalan numbers
  • 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796...
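  • A quick Python check that reproduces this sequence (using the closed
    form C_n = C(2n, n) / (n + 1)):

    from math import comb

    def catalan(n):
        return comb(2 * n, n) // (n + 1)

    print([catalan(n) for n in range(1, 11)])
    # [1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796]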

37
Parsing Algorithms
  • Top Down: Search the space of possible
    derivations of S (e.g. depth-first) for one that
    matches the input sentence.
  • I saw the man.
  • S → NP VP
  • NP → Det Adj N
  • Det → the
  • Det → a
  • Det → an
  • NP → ProN
  • ProN → I

  • VP → V NP
  • V → hit
  • V → took
  • V → saw
  • NP → Det Adj N
  • Det → the
  • Adj → ε
  • N → man
38
Parsing Algorithms (cont.)
  • Bottom Up: Search upward from the words, finding
    larger and larger phrases until a sentence is
    found.
  • I saw the man.
  • ProN saw the man      ProN → I
  • NP saw the man        NP → ProN
  • NP N the man          N → saw (dead end)
  • NP V the man          V → saw
  • NP V Det man          Det → the
  • NP V Det Adj man      Adj → ε
  • NP V Det Adj N        N → man
  • NP V NP               NP → Det Adj N
  • NP VP                 VP → V NP
  • S                     S → NP VP

39
Bottom-Up Parsing Algorithm
  • function BOTTOM-UP-PARSE(words, grammar) returns a parse tree
  • forest ← words
  • loop do
  •   if LENGTH(forest) = 1 and CATEGORY(forest[1]) = START(grammar) then
  •     return forest[1]
  •   else
  •     i ← choose from 1...LENGTH(forest)
  •     rule ← choose from RULES(grammar)
  •     n ← LENGTH(RULE-RHS(rule))
  •     subsequence ← SUBSEQUENCE(forest, i, i+n-1)
  •     if MATCH(subsequence, RULE-RHS(rule)) then
  •       forest[i...i+n-1] ← MAKE-NODE(RULE-LHS(rule), subsequence)
  •     else fail
  • end
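  • The "choose" steps above are nondeterministic; the Python sketch below
    replaces them with an exhaustive backtracking search over positions,
    rules, and lexical categories. The tiny grammar, the lexicon, and the
    (category, children) node representation are illustrative choices, not
    part of the slide's algorithm:

    from itertools import product

    RULES = [("S", ["NP", "VP"]), ("NP", ["ProN"]), ("NP", ["Det", "N"]), ("VP", ["V", "NP"])]
    LEXICON = {"I": ["ProN"], "saw": ["N", "V"], "the": ["Det"], "man": ["N"]}

    def reduce_forest(forest, start="S"):
        """Backtracking search over (position, rule) choices."""
        if len(forest) == 1 and forest[0][0] == start:
            return forest[0]
        for i in range(len(forest)):
            for lhs, rhs in RULES:
                n = len(rhs)
                sub = forest[i:i + n]
                if [cat for cat, _ in sub] == rhs:
                    tree = reduce_forest(forest[:i] + [(lhs, sub)] + forest[i + n:], start)
                    if tree is not None:
                        return tree
        return None   # dead end, e.g. after treating "saw" as a noun

    def parse(words):
        # Also search over lexical category choices ("saw" as N or as V).
        for cats in product(*(LEXICON[w] for w in words)):
            tree = reduce_forest([(c, [w]) for c, w in zip(cats, words)])
            if tree is not None:
                return tree
        return None

    print(parse("I saw the man".split()))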