Syntactic Pattern Recognition - PowerPoint PPT Presentation

Slides: 19
Provided by: meenaksh
1
Syntactic Pattern Recognition
  • Statistical PR: find a feature vector x
  • Train a system using a set of labeled patterns
  • Classify unknown patterns
  • Statistical PR ignores relational information
    contained in the structure of the pattern
  • Most structural methods use hierarchical
    decomposition
  • Note the similarity between a sentence structure
    and a pattern description

[Figure: hierarchical decomposition of picture A into triangle B (edges a, b, c) and rectangle C (edges d, e, f, g)]
2
Language
  • Alphabet is a finite set of symbols,
    V = {x1, x2, ..., xn}
  • Sentence over V is a finite string of ordered
    symbols (left to right) from V
  • Example: V = {a, b, c}; valid sentences are abb,
    abba, aaa, null
  • Length of a sentence s, |s|, is the number of
    symbols
  • s1 o s2 is the concatenation of the two sentences
  • V o V o ... o V = V^n is the set of all sentences
    with n symbols over V
  • V+ = V ∪ V^2 ∪ V^3 ∪ ... is the set of all
    non-empty sentences over V
  • V* is the closure of V (V+ together with the
    null sentence)
  • Language is an arbitrary subset L of V*
  • Example: V = {0, 1}; then L1 = {001, 110, 111, 0,
    null} is a finite language
  • L2 = {s | s = 1^n 0^2 1^m, n ≥ 1, 1 ≤ m ≤ 10} is
    an infinite language

3
Languages
  • L1 o L2 = {s | s = s1 s2, s1 belongs to L1 and s2
    belongs to L2} is the concatenation of L1 and L2
  • L1^it = {s | s = s1 s2 ... sn, n ≥ 0, si belongs
    to L1} is the iterate of L1
  • L1 o L2 and L1^it are both languages
  • Example: V = {a, b}, L1 = {aa, ab, bb}, L2 = {a, b}
  • L1 o L2 = {a^3, aba, b^2 a, a^2 b, ab^2, b^3}
  • L1^it is infinite, since n = 0, 1, 2, ...
  • s is called a substring of t if t = usv for some
    strings u, v belonging to V*
  • Every string is a substring of itself, as u and/or
    v can be null
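
The concatenation and iterate operations above can be sketched in a few lines of Python. This is only an illustration: the function names `concat`, `iterate` and `is_substring` are made up here, and because L1^it is infinite the sketch stops after a chosen number of pieces (`max_n`).

```python
from itertools import product

def concat(l1, l2):
    """L1 o L2: every sentence of l1 followed by every sentence of l2."""
    return {s1 + s2 for s1, s2 in product(l1, l2)}

def iterate(l1, max_n):
    """Sentences of the iterate of l1 built from 0..max_n pieces.

    The full iterate is infinite, so max_n is an artificial cut-off.
    """
    result = {""}   # n = 0 gives the null sentence
    layer = {""}
    for _ in range(max_n):
        layer = concat(layer, l1)   # sentences made of one more piece
        result |= layer
    return result

def is_substring(s, t):
    """s is a substring of t if t = u + s + v for some (possibly null) u, v."""
    return s in t

L1 = {"aa", "ab", "bb"}
L2 = {"a", "b"}
print(sorted(concat(L1, L2)))
print(len(iterate(L1, 2)))
print(is_substring("abba", "abba"))
```

With the slide's example sets, `concat(L1, L2)` contains the six strings listed above, written out in full (aaa, aab, aba, abb, bba, bbb).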

4
Grammars
  • Grammar G = (VT, VN, P, S) has 4 entities
  • VT is a set of terminal symbols, called
    primitives or constants
  • VN is a set of non-terminal symbols, called
    variables
  • VT and VN are disjoint subsets of the alphabet V
  • P is the set of production rules A -> B, where A
    contains at least one variable and B is a mix of
    variables and constants
  • S is the starting symbol or the root; S belongs
    to VN
  • L(G) is a formal language (a set of strings)
    generated by the grammar G
  • Each string is composed of only primitives
  • Each string can be derived from S using the
    production rules P
  • Example: VT = {a, b}, VN = {S}, P = {S -> aSb,
    S -> ab} => L(G) = {a^n b^n, n ≥ 1}
  • Grammar is used to
    (i) generate the strings (sentences) of L(G),
    (ii) check if a sentence belongs to L(G),
    (iii) analyze the structure of a sentence
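
As a rough illustration of use (i), the sketch below generates sentences of the example grammar by repeatedly rewriting the leftmost variable. The names `PRODUCTIONS` and `generate` are hypothetical, every symbol is assumed to be a single character, and the length cut-off only works because no right-hand side is empty.

```python
from collections import deque

# Grammar from the example: VT = {a, b}, VN = {S}, P = {S -> aSb, S -> ab}.
PRODUCTIONS = {"S": ["aSb", "ab"]}

def generate(productions, start, max_len):
    """Return every sentence of length <= max_len derivable from start.

    Assumes context-free productions with non-empty right-hand sides,
    so sentential forms never shrink and long ones can be discarded.
    """
    sentences = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if any.
        idx = next((i for i, sym in enumerate(form) if sym in productions), None)
        if idx is None:
            sentences.add(form)  # only primitives left: a sentence of L(G)
            continue
        for rhs in productions[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sentences

print(sorted(generate(PRODUCTIONS, "S", 6), key=len))
```

For a length limit of 6 this yields ab, aabb and aaabbb, matching a^n b^n for n = 1, 2, 3.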

5
Grammar Types
  • Unrestricted Grammar (UR)
  • Context Sensitive Grammar (CS)
  • Context Free Grammar (CF)
  • Finite State Grammar (FS)
Example: VT = {a, b, c}, VN = {S, A, B}
[Example productions for the UR, CS, CF and FS cases not captured]
6
Finite State Grammars and Graphical Representations
  • Nodes are nonterminals in VN and an additional
    terminal node T not in V
  • Productions of type Ai -> aAj are represented by an
    edge labeled a directed from Ai to Aj
  • Productions of type Ai -> a are represented by an
    edge labeled a directed from Ai to T

[Figure: example graph with nodes S, A, B and terminal node T; all edges are labeled a]
For a FS grammar G, an arbitrary string
x = x1 x2 ... xn, xi in VT, is in L(G) iff there exists
at least one path (x1, x2, ..., xn) from S to T
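
The path criterion can be sketched as follows. The grammar used here (S -> aA, A -> bA, A -> b) is a hypothetical one chosen for illustration, not the grammar drawn on the slide, and the names `build_graph` and `in_language` are likewise assumptions.

```python
from collections import defaultdict

# Hypothetical FS grammar (not the one in the slide's figure):
#   S -> aA,  A -> bA,  A -> b
# Each production is (left variable, primitive, right variable or None).
PRODUCTIONS = [("S", "a", "A"), ("A", "b", "A"), ("A", "b", None)]

def build_graph(productions, terminal_node="T"):
    """Map (node, primitive) to the set of nodes reachable over that edge."""
    edges = defaultdict(set)
    for left, primitive, right in productions:
        # Ai -> a Aj becomes an edge to Aj; Ai -> a becomes an edge to T.
        edges[(left, primitive)].add(right if right is not None else terminal_node)
    return edges

def in_language(edges, x, start="S", terminal_node="T"):
    """True iff some path labeled x1, x2, ..., xn leads from start to T."""
    current = {start}
    for primitive in x:
        current = {nxt for node in current
                   for nxt in edges.get((node, primitive), ())}
        if not current:   # no path can continue
            return False
    return terminal_node in current

EDGES = build_graph(PRODUCTIONS)
for candidate in ["ab", "abbb", "a", "ba"]:
    print(candidate, in_language(EDGES, candidate))
```

Tracking a set of current nodes, rather than a single node, is what lets the check handle several edges with the same label leaving one node.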
7
Syntactic Pattern Recognition
2-class problem: C1 and C2 are composed of
features from a set VT
Let G be a grammar such that L(G) consists only of
sentences (patterns) from C1
Example: VT = {a, b}, VN = {S, A}, P = {S -> aSb,
S -> b}, L(G) = {b, a^n b^(n+1), n ≥ 1}
Classification rule: x belongs to C1 iff x belongs
to L(G); otherwise x belongs to C2
Classification algorithm has to correctly answer
whether or not a given string is grammatically correct.
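
A minimal sketch of this classification rule for the example grammar: membership is tested by undoing S -> aSb from the outside in until only the string produced by S -> b should remain. The function names are illustrative, and the approach is specific to this one grammar rather than a general parser.

```python
def in_language(x):
    """Membership test for L(G) with P = {S -> aSb, S -> b}."""
    # Undo S -> aSb: strip one leading a and one trailing b at a time.
    while x.startswith("a") and x.endswith("b"):
        x = x[1:-1]
    # What is left must be exactly the result of S -> b.
    return x == "b"

def classify(x):
    """x belongs to C1 iff x is in L(G); otherwise it belongs to C2."""
    return "C1" if in_language(x) else "C2"

for pattern in ["b", "abb", "aabbb", "ab", "ba"]:
    print(pattern, classify(pattern))
```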
8
Pattern Grammars
2-class problem: rectangles and other
quadrilaterals
Select primitives: a = 0° edge, b = 90° edge,
c = 180° edge, d = 270° edge
Set of rectangles: [expression not captured], if
a0, b0, c0, d0 represent unit length lines
9
Consider a = 0° horizontal unit length, b = 120°
unit length, c = 240° unit length
L(G) represents the class of equilateral triangles
What is the grammar? Make it up from domain
knowledge. There is no unique solution
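
One possible reading, stated here as an assumption rather than the slide's own answer, is that an equilateral triangle with side n traced from its horizontal side corresponds to the string a^n b^n c^n. Under that assumption a direct membership check (not a grammar) looks like this; `is_equilateral` is a hypothetical name.

```python
import re

def is_equilateral(chain):
    """True iff chain has the form a^n b^n c^n with n >= 1.

    Assumes the boundary is traced starting with the 0-degree side,
    so the string starts with the a primitives.
    """
    match = re.fullmatch(r"(a+)(b+)(c+)", chain)
    if match is None:
        return False
    # All three sides must have the same number of unit-length primitives.
    return len(match.group(1)) == len(match.group(2)) == len(match.group(3))

for chain in ["abc", "aabbcc", "aabbc", "acb"]:
    print(chain, is_equilateral(chain))
```

The three equal counts are unbounded, which a finite state grammar cannot express for arbitrary n; this is one reason different grammar types lead to different solutions.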
10
FS grammar solution: VT = {a, b, c}, VN = {S, A, B,
C, D, E, F, G, H, I, J, K}
CS grammar solution: VT = {a, b, c}, VN = {S, A, B,
C, D, E, F}
11
Syntax Analysis
  • Let x be the unknown pattern. The recognition task
    is to find the grammar Gi such that x belongs to
    L(Gi)
  • i.e. given a string x and a grammar G, construct
    a triangle with the top vertex S and the bottom
    side x, inside which lies the derivation (parse)
    tree
  • Top-down and bottom-up parsing methods can be used
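[Figure: parse triangle with the start symbol S at the top vertex and the string x along the bottom side]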
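
To make the top-down idea concrete, here is a small recursive-descent sketch for the earlier example grammar P = {S -> aSb, S -> ab}. It is a hand-written parser for that one grammar, not a general method; the tuple representation of the tree and the names `parse_S` and `parse` are assumptions made for illustration.

```python
def parse_S(x, pos=0):
    """Try to derive a prefix of x[pos:] from S.

    Returns (tree, next_position) on success, or None on failure.
    """
    if pos < len(x) and x[pos] == "a":
        # Try S -> a S b first.
        inner = parse_S(x, pos + 1)
        if inner is not None:
            subtree, nxt = inner
            if nxt < len(x) and x[nxt] == "b":
                return ("S", "a", subtree, "b"), nxt + 1
        # Fall back to S -> a b.
        if pos + 1 < len(x) and x[pos + 1] == "b":
            return ("S", "a", "b"), pos + 2
    return None

def parse(x):
    """Return the parse tree with root S and bottom side x, or None."""
    result = parse_S(x)
    if result is not None and result[1] == len(x):
        return result[0]
    return None

print(parse("aabb"))
print(parse("aab"))
```

The first call returns a nested tuple whose root is S and whose leaves, read left to right, spell x; the second returns None because aab is not in L(G).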
12
Stochastic Languages
Probabilities are associated with production
rules: stochastic grammar
Stochastic language is one obtained by such a
grammar
Probability of obtaining x is [expression not captured]
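
As a hedged illustration, a common convention is to take the probability of a string as the product of the probabilities of the productions used in its derivation. The sketch below applies that convention to the earlier grammar S -> aSb, S -> ab; the probability values 0.3 and 0.7 are invented for the example, and the grammar is unambiguous so each string has a single derivation.

```python
# Hypothetical rule probabilities; they sum to 1 for the variable S.
RULE_PROB = {"S -> aSb": 0.3, "S -> ab": 0.7}

def string_probability(x):
    """Probability of deriving x from S under RULE_PROB.

    A string a^n b^n uses S -> aSb (n - 1) times and S -> ab once;
    any other string cannot be derived and gets probability 0.
    """
    n, remainder = divmod(len(x), 2)
    if remainder or n == 0 or x != "a" * n + "b" * n:
        return 0.0
    return RULE_PROB["S -> aSb"] ** (n - 1) * RULE_PROB["S -> ab"]

for x in ["ab", "aabb", "aaabbb", "aab"]:
    print(x, string_probability(x))
```

Summed over all n, these probabilities form a geometric series that adds up to 1, so the values define a distribution over L(G).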
13
Tree representations
A string s1 is directly derived from string s2 in
G (s2 => s1) if there exists a rule α -> β
in G such that s1 is the result of replacing
α by β in s2.
In general, s is derived from the initial symbol
of G, S, if there exists a sequence of strings
s1, ..., sk through which we can derive s from S,
i.e., S => s1 => ... => sk => s.
Parsing is the reverse of generation