CSA405: Advanced Topics in NLP

Computational Morphology III, November 2003

Provided by: michael307

1
CSA405 Advanced Topics in NLP
  • Xerox Notation

2
3 Related Perspectives
[Diagram; surviving labels: LANGUAGE, NOTATION]
3
3 Related Perspectives
[Diagram; surviving labels: REGULAR LANGUAGES, REGULAR EXPRESSIONS; relation labels: denotes, encodes, compiles into]
4
Xerox R.E. Notation
  • Atomic Expressions
      • Normal Symbol
      • Special Symbol
  • Complex Expressions
      • Union
      • Intersection
      • Concatenation
      • Closure
  • Other Operators
  • Abbreviations

5
Atomic Expressions
  • The simplest kind of RE is a symbol. Typically, a
    symbol is the sort of item that can appear on the
    arc of a network.
  • For example, the symbol a is an RE that
    designates the language containing the string "a"
    and nothing else
  • Multicharacter symbols such as Plur are also
    symbols, but they happen to have multicharacter
    print names.

6
Special Atomic Expressions
  • The epsilon (ε) symbol 0 denotes the empty string
    language {""}.
  • The ANY symbol ? denotes the language of all
    single symbol strings. The empty string is not
    included in ?.

7
Empty and Universal Language Machines
[Diagram; surviving labels: "set of strings", "empty language"]
8
Brackets
  • [A] denotes the same language as A
  • [] can also be used to denote the empty string
    language
  • Brackets ensure unique syntax but can sometimes
    be dropped.
  • Brackets are not the same as ( ), which are used
    for optional elements
  • Checkpoint: what are the FSAs
      • for [a]
      • for (a)

9
Complex REs: Union
  • If A and B are arbitrary REs, A | B is the
    union of A and B, which denotes the union of the
    languages denoted by A and B respectively.
  • Union is associative and commutative.
  • Checkpoint: Write down the strings in the
    language denoted by a | b | ab.
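As a minimal sketch of the definition above, finite languages can be modelled as Python sets of strings, so that the union of two REs corresponds to set union. The variable names are illustrative only and are not part of the Xerox notation.

```python
# Assumption: each finite language is modelled as a Python set of strings.
lang_a = {"a"}      # language denoted by the symbol a
lang_b = {"b"}      # language denoted by the symbol b
lang_ab = {"ab"}    # language containing only the string "ab"

# Union of the three languages.
union = lang_a | lang_b | lang_ab

# Commutativity and associativity are inherited from set union.
commutative = (lang_a | lang_b) == (lang_b | lang_a)
associative = ((lang_a | lang_b) | lang_ab) == (lang_a | (lang_b | lang_ab))
```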

10
Complex REs: Intersection
  • If A and B are arbitrary REs, A & B is the
    intersection of A and B, which denotes the
    intersection of the languages denoted by A and B
    respectively.
  • Intersection is associative and commutative.
  • Checkpoint: Write down the strings in the
    language denoted by
  • a b c d e ab d e f g

11
Complex REs: Concatenation
  • If A and B are arbitrary REs, A B is the
    concatenation of A and B.
  • Checkpoint: do the following denote the same
    languages?
      • d o g
      • dog
      • d og
  • What are the strings in the language denoted by
    [a | b] [c | d]?
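Concatenation of languages can be sketched in the same set-based style: every string of the first language is followed by every string of the second. The helper name `concat` and the two sample languages are assumptions made for illustration.

```python
def concat(lang_a, lang_b):
    """Concatenate every string of lang_a with every string of lang_b."""
    return {x + y for x in lang_a for y in lang_b}

left = {"a", "b"}     # hypothetical first language
right = {"c", "d"}    # hypothetical second language
result = concat(left, right)

# The empty string language {""} is the identity for concatenation,
# while the empty language set() absorbs everything.
same_as_left = concat(left, {""})
nothing = concat(left, set())
```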

12
Concatenation of 2 Networks
[Diagram; surviving arc labels: a, c, b, d (shown twice)]
13
Complex REs: Closures
  • A+ denotes the concatenation of A with itself 0
    or more times, i.e. one or more copies of A.
  • What is the FSA for a+ ?
  • A* (Kleene Star) denotes A+ | 0.
  • What is the FSA for a* ?
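Because both closures denote infinite languages, they can only be approximated in code. The sketch below, under the assumption of an arbitrary bound `max_copies`, builds the strings obtained from one up to `max_copies` copies (plus) and then adds the empty string (star). The function names are hypothetical.

```python
def concat(lang_a, lang_b):
    """Concatenate every string of lang_a with every string of lang_b."""
    return {x + y for x in lang_a for y in lang_b}

def plus(lang, max_copies):
    """Strings made of 1..max_copies concatenated members of lang."""
    result = set()
    power = {""}
    for _ in range(max_copies):
        power = concat(power, lang)   # one more copy of lang
        result |= power
    return result

def star(lang, max_copies):
    """Bounded Kleene star: the bounded plus closure together with ""."""
    return plus(lang, max_copies) | {""}

a_plus = plus({"a"}, 3)
a_star = star({"a"}, 3)
```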

14
Complex REs: Closures
[Diagram; surviving arc label: a]
15
Other Operations
  • Complementation: ~A denotes the complement
    language of A, i.e. the set of strings not in A.
  • Minus: A - B denotes the set difference of the
    languages denoted by A and B (A - B = A & ~B).
  • Checkpoint: Write a definition of complementation
    involving minus.
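The identity relating minus to intersection and complement can be checked on a bounded universe. This is only a sketch: the alphabet "ab", the length bound 3, and the sample languages are assumptions, and the finite set `SIGMA_STAR` merely stands in for the infinite universal language.

```python
from itertools import product

def universe(alphabet, max_len):
    """All strings over alphabet of length 0..max_len."""
    return {"".join(p) for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)}

SIGMA_STAR = universe("ab", 3)   # bounded stand-in for the universal language

def complement(lang):
    """Complement relative to the bounded universe."""
    return SIGMA_STAR - lang

A = {"a", "ab", "bb"}   # hypothetical languages
B = {"ab", "b"}

minus_direct = A - B
minus_via_complement = A & complement(B)
```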

16
Abbreviations
  • A*      Closure (Kleene Star)
  • (A)     Optional Element
  • ?       Any symbol
  • \b      Any symbol other than b
  • ~A      Complement ( ?* - A )
  • 0       Empty string language
  • $A      ?* A ?*
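The last abbreviation in this list is, in Xerox notation, the contains operator $A, shorthand for ?* A ?*. A bounded sketch can confirm that it picks out exactly the strings having a member of A as a substring. The alphabet, the length bound, and all function names below are assumptions for illustration.

```python
from itertools import product

ALPHABET = "ab"   # assumed two-symbol alphabet
MAX_LEN = 3       # assumed length bound standing in for "any length"

def universe(max_len):
    """All strings over ALPHABET of length 0..max_len (bounded ?*)."""
    return {"".join(p) for n in range(max_len + 1)
            for p in product(ALPHABET, repeat=n)}

def concat(lang_a, lang_b):
    """Concatenate every string of lang_a with every string of lang_b."""
    return {x + y for x in lang_a for y in lang_b}

def contains(lang):
    """Bounded ?* A ?*: keep only results within the length bound."""
    any_star = universe(MAX_LEN)
    padded = concat(concat(any_star, lang), any_star)
    return {s for s in padded if len(s) <= MAX_LEN}

by_definition = contains({"ab"})
by_substring_test = {s for s in universe(MAX_LEN) if "ab" in s}
```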

17
String Relations
  • Ordered pair: a set having two members, <a,b>
    (distinct from <b,a>).
  • A relation is simply a set of ordered pairs.
  • Some familiar relations over integers:
      • {<0,1>, <1,2>, <2,3>, <3,4>, <4,5>, ...}
      • {<0,0>, <1,2>, <2,4>, <3,6>, <4,8>, ...}
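Relations translate directly into sets of tuples. The sketch below assumes the two integer relations are "successor" and "doubling" (the latter inferred from the pairs for 2, 3 and 4) and truncates both to the first five pairs.

```python
# A relation is a set of ordered pairs; Python tuples keep the order.
successor = {(n, n + 1) for n in range(5)}   # (0,1), (1,2), ...
doubling = {(n, 2 * n) for n in range(5)}    # assumed reading: (0,0), (1,2), (2,4), ...

# Order matters: (3, 4) is in the relation, (4, 3) is not.
forward = (3, 4) in successor
backward = (4, 3) in successor
```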

18
Relations and Morphology
  • In morphological analysis and generation, we are
    typically interested in relations made up of
    ordered pairs of strings over lexical and surface
    languages, e.g.
  • <"dogs","dog+PL">, <"walked","walk+ED">.

19
Describing Relations
  • Notation: Regular Expressions with extra
    operations including
      • Cross Product
      • Composition
  • Mechanism: Finite State Transducers (i.e. FS
    networks whose arcs are labelled with ordered
    pairs of symbols).

20
3 Related Perspectives
[Diagram; surviving labels: REGULAR RELATIONS, XEROX R.E. NOTATION; relation labels: denotes, encodes, compiles into]
21
Cross Product
  • A .x. B denotes the relation that pairs every
    string of language A with every string of
    language B.
  • Example: [c a t] .x. [c h a t]
  • Special case and special notation when A and B
    are symbols: a .x. b = a:b
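For finite languages the cross product is just the Cartesian product of two string sets. The sketch below uses the cat/chat example from the slide plus one made-up pair of languages; the function name `cross` is hypothetical.

```python
from itertools import product

def cross(upper_lang, lower_lang):
    """Pair every string of upper_lang with every string of lower_lang."""
    return set(product(upper_lang, lower_lang))

cat_chat = cross({"cat"}, {"chat"})

# With larger languages the relation has |A| * |B| pairs.
bigger = cross({"a", "b"}, {"x", "y", "z"})
```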

22
Symbol Pairs
  • Any pair of symbols a:b denotes a relation that
    consists of the corresponding pair of strings,
    i.e.
  • {<"a","b">}
  • The left symbol a is the upper, lexical, symbol;
    the right symbol b is the lower, surface symbol.
  • No distinction between a:a and a

23
Checkpoint
  • A abc
  • B bcd
  • What are the elements in the relation A .x. B?

24
Composition
  • A .o. B denotes the composition of relations A
    and B.
  • Definition:
      • If A contains <x,y>
      • and B contains <y,z>
      • then A .o. B contains <x,z>
  • A and B must be relations. If either is just a
    language, it is assumed to abbreviate the
    identity relation.
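The definition of composition, including the convention that a language abbreviates its identity relation, can be sketched over finite relations as follows. The word pairs are invented sample data and the function names are assumptions.

```python
def identity(lang):
    """Identity relation of a language: each string paired with itself."""
    return {(s, s) for s in lang}

def compose(rel_a, rel_b):
    """Pairs (x, z) such that rel_a has (x, y) and rel_b has (y, z)."""
    return {(x, z) for (x, y) in rel_a for (y2, z) in rel_b if y == y2}

# Hypothetical sample relations.
english_french = {("cat", "chat"), ("dog", "chien")}
french_german = {("chat", "Katze"), ("chien", "Hund")}

english_german = compose(english_french, french_german)

# Composing with a language restricts the relation to that language.
only_cat = compose(identity({"cat"}), english_french)
```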

25
Examples
Expression    Language/Relation    Network
?
              {""}
a             {"a"}                a
(a)           {"", "a"}
26
Exercise: fill in the blanks
Expression    Language/Relation       Network
a*            {"", "a", "aa", ...}    a
[a 0 b]
[a:0 b:a]
[a b:0]
27
Issue
  • The class of Regular Languages is closed under
    the operations union, intersection,
    concatenation, and complementation.
  • Is the same true of Regular Relations?
  • Is the same true if we include the operations of
    cross product and composition?

28
Closure Properties: Definition of P
  • Consider P = [a:b]* [0:c]*.
  • P is a relation that maps a string of zero or
    more a to an equal length string of b followed by
    zero or more c:
  • {<"","">, <"","c">, <"","cc">, ...
    <"a","b">, <"a","bc">, <"a","bcc">, ...
    <"aa","bb">, <"aa","bbc">, <"aa","bbcc">, ...}
  • P is a regular relation
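One way to see which pairs belong to P is to build it from its parts: the pair a:b iterated, followed by the pair 0:c iterated. The sketch below assumes a bound of two iterations and uses hypothetical helpers for concatenation and bounded closure of relations.

```python
def rel_concat(r, s):
    """Concatenate relations pairwise on the upper and lower sides."""
    return {(u1 + u2, l1 + l2) for (u1, l1) in r for (u2, l2) in s}

def rel_star(r, max_copies):
    """Bounded closure: 0..max_copies concatenated copies of r."""
    result = {("", "")}
    power = {("", "")}
    for _ in range(max_copies):
        power = rel_concat(power, r)
        result |= power
    return result

a_b = {("a", "b")}     # the symbol pair a:b
eps_c = {("", "c")}    # the symbol pair 0:c (empty string on the upper side)

P = rel_concat(rel_star(a_b, 2), rel_star(eps_c, 2))
```

Every upper string of n a's is paired with exactly n b's followed by any number of c's (up to the bound).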

29
Closure Properties: Definition of Q
  • Consider Q = [0:b]* [a:c]*.
  • Q is a relation that maps a string of zero or
    more a to an equal length string of c preceded by
    zero or more b:
  • {<"","">, <"","b">, <"","bb">, ...
    <"a","c">, <"a","bc">, <"a","bbc">, ...
    <"aa","cc">, <"aa","bcc">, <"aa","bbcc">, ...}
  • N.B. Q is a regular relation

30
Closure Properties: P & Q
  • Let us now consider the intersection of P and Q:
  • {<"","">, <"a","bc">, <"aa","bbcc">,
    <"aaa","bbbccc">, ...}
  • The lower side language is clearly b^n c^n
  • Is this finite state?
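The intersection can be checked on bounded versions of the two relations. In this sketch the functions `P` and `Q` enumerate the pairs described on the previous slides up to an assumed bound; the names and the bound are illustrative only.

```python
def P(max_n):
    """Pairs (a^n, b^n c^m): as many b's as a's, then any number of c's."""
    return {("a" * n, "b" * n + "c" * m)
            for n in range(max_n + 1) for m in range(max_n + 1)}

def Q(max_n):
    """Pairs (a^n, b^m c^n): any number of b's, then as many c's as a's."""
    return {("a" * n, "b" * m + "c" * n)
            for n in range(max_n + 1) for m in range(max_n + 1)}

both = P(3) & Q(3)
lower_side = {lower for _, lower in both}
```

Only pairs whose lower string has equally many b's and c's survive the intersection.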

31
b^n c^n
  • This language is generated by the following
    context free grammar: S → ε, S → b S c
  • It cannot be generated by a regular grammar, whose
    rules must take the form A → a or A → aB
32
Closure Properties: b^n c^n
  • Consequently, the language cannot be generated by
    an FSA, and the same goes for any relation
    involving that language.
  • Therefore, there is no operation on the FSTs for
    P and Q that yields their intersection.
  • If regular relations are not closed under
    intersection, then they are not closed under
    complementation or subtraction either.
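The final step can be spelled out with two standard set identities. Here R_1 and R_2 stand for arbitrary relations and the overline for complementation; these symbols are introduced only for this sketch. Since regular relations are closed under union, closure under complementation would make the right-hand side of the first identity regular, and closure under subtraction would do the same for the second, contradicting the result for P and Q.

```latex
\begin{align*}
R_1 \cap R_2 &= \overline{\,\overline{R_1} \cup \overline{R_2}\,} \\
R_1 \cap R_2 &= R_1 \setminus (R_1 \setminus R_2)
\end{align*}
```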

33
Closure Properties of Regular Languages and Relations
Operation          Regular Languages    Regular Relations
Union              yes                  yes
Concatenation      yes                  yes
Iteration          yes                  yes
Intersection       yes                  no
Subtraction        yes                  no
Complementation    yes                  no
Composition        n/a                  yes