Title: Syntactic Pattern Recognition
1 Syntactic Pattern Recognition
- Statistical PR: find a feature vector x
- Train a system using a set of labeled patterns
- Classify unknown patterns
- Statistical PR ignores relational information contained in the structure
- Most structural methods use hierarchical decomposition
- Note the similarity between a sentence structure and a pattern description
[Figure: Picture A decomposed hierarchically into Triangle B and Rectangle C; Triangle B consists of edges a, b, c and Rectangle C of edges d, e, f, g]
2 Language
- Alphabet is a finite set of symbols, $V = \{x_1, x_2, \ldots, x_n\}$
- Sentence over $V$ is a finite string of ordered symbols (left to right) from $V$
- Example: $V = \{a, b, c\}$; valid sentences are abb, abba, aaa, null
- Length of a sentence $s$, $|s|$, is the number of symbols
- $s_1 \circ s_2$ is the concatenation of the two sentences
- $V \circ V \circ \cdots \circ V = V^n$ is the set of all sentences with $n$ symbols over $V$
- $V^+ = V \cup V^2 \cup V^3 \cup \cdots$ is the set of all non-empty sentences over $V$
- $V^* = V^+ \cup \{\text{null}\}$ is the closure of $V$
- Language is an arbitrary subset $L$ of $V^*$
- Example: $V = \{0, 1\}$, then $L_1 = \{001, 110, 111, 0, \text{null}\}$ is a finite language
- $L_2 = \{s \mid s = 1^n 0^2 1^m,\ n \ge 1,\ 1 \le m \le 10\}$ is an infinite language
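The definitions of $V^n$ and the closure can be tried out directly. The following is a minimal sketch, assuming sentences are stored as Python strings and the null sentence is the empty string; the helper names `sentences_of_length` and `closure_up_to` are illustrative and not part of the slides. Because the closure is infinite, only the part up to a chosen length is enumerated.

```python
from itertools import product

def sentences_of_length(alphabet, n):
    """Return V^n: all sentences with exactly n symbols over the alphabet."""
    return {"".join(symbols) for symbols in product(sorted(alphabet), repeat=n)}

def closure_up_to(alphabet, max_len):
    """Return the sentences of the closure V* whose length is at most max_len.

    The empty string "" plays the role of the null sentence.
    """
    result = set()
    for n in range(max_len + 1):
        result |= sentences_of_length(alphabet, n)
    return result

V = {"0", "1"}
L1 = {"001", "110", "111", "0", ""}          # the finite language from the example

print(sorted(sentences_of_length(V, 2)))      # ['00', '01', '10', '11']
print(L1 <= closure_up_to(V, 3))              # True: L1 is a subset of V*
```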
3 Languages
- $L_1 \circ L_2 = \{s \mid s = s_1 s_2,\ s_1 \in L_1 \text{ and } s_2 \in L_2\}$ is the concatenation of $L_1$ and $L_2$
- $L_1^{it} = \{s \mid s = s_1 s_2 \cdots s_n,\ n \ge 0,\ s_i \in L_1\}$ is the iterate of $L_1$
- $L_1 \circ L_2$ and $L_1^{it}$ are both languages
- Example: $V = \{a, b\}$, $L_1 = \{aa, ab, bb\}$, $L_2 = \{a, b\}$
- $L_1 \circ L_2 = \{a^3, aba, b^2a, a^2b, ab^2, b^3\}$
- $L_1^{it}$ is infinite, since $n = 0, 1, 2, \ldots$
- $s$ is called a substring of $t$ if $t = usv$ for some strings $u, v$ belonging to $V^*$
- Every string is a substring of itself, as $u$ and/or $v$ can be null
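The concatenation example can be checked mechanically. This is a small sketch, assuming languages are represented as Python sets of strings; `concat`, `iterate` and `is_substring` are illustrative names. Since the iterate is infinite, `iterate` only builds the part that uses at most `max_n` sentences.

```python
def concat(L1, L2):
    """Concatenation of two languages: every s1 followed by every s2."""
    return {s1 + s2 for s1 in L1 for s2 in L2}

def iterate(L, max_n):
    """Finite part of the iterate: concatenations of n sentences of L, 0 <= n <= max_n."""
    result = {""}            # n = 0 contributes the null sentence
    current = {""}
    for _ in range(max_n):
        current = concat(current, L)
        result |= current
    return result

def is_substring(s, t):
    """True iff t = u + s + v for some (possibly null) strings u and v."""
    return s in t

L1 = {"aa", "ab", "bb"}
L2 = {"a", "b"}
print(sorted(concat(L1, L2)))   # ['aaa', 'aab', 'aba', 'abb', 'bba', 'bbb']
print(len(iterate(L1, 2)))      # 13
```

The printed concatenation matches the slide's set, written there in power notation.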
4 Grammars
- Grammar $G = (V_T, V_N, P, S)$ has 4 entities
- $V_T$ is a set of terminal symbols, called primitives or constants
- $V_N$ is a set of non-terminal symbols, called variables
- $V_T$ and $V_N$ are subsets of $V$
- $P$ is the set of production rules $A \to B$, where $A$ has at least one variable and $B$ is a mix of variables and constants
- $S$ is the starting symbol or the root; $S$ belongs to $V_N$
- $L(G)$ is a formal language (a set of strings) generated by the grammar $G$
- Each string is composed of only primitives
- Each string can be derived from $S$ using the production rules $P$
- Example: $V_T = \{a, b\}$, $V_N = \{S\}$, $P = \{S \to aSb,\ S \to ab\}$ $\Rightarrow$ $L(G) = \{a^n b^n,\ n \ge 1\}$
- Grammar is used to (i) generate the strings (sentences) in $L(G)$, (ii) check if a sentence belongs to the language of a grammar, (iii) analyze the structure of a sentence
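Use (i), generating the sentences of $L(G)$, can be sketched as a breadth-first search over derivations. The sketch below assumes every symbol is a single character and that no rule shortens a string, which is what makes the length cut-off safe; the function name `generate` and the representation of rules as `(left, right)` pairs are assumptions for illustration.

```python
from collections import deque

def generate(rules, start, terminals, max_len):
    """Collect all terminal strings of length <= max_len derivable from start.

    rules is a list of (left, right) pairs. Forms longer than max_len are
    discarded, which is only valid when no rule shortens a string.
    """
    language = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            language.add(form)          # only primitives left: a sentence of L(G)
            continue
        for left, right in rules:
            pos = form.find(left)
            while pos != -1:            # apply the rule at every position it matches
                new = form[:pos] + right + form[pos + len(left):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(left, pos + 1)
    return language

RULES = [("S", "aSb"), ("S", "ab")]     # the example grammar from the slide
print(sorted(generate(RULES, "S", {"a", "b"}, 6), key=len))
# ['ab', 'aabb', 'aaabbb']
```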
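5 Grammar Types
- Unrestricted Grammar (UR)
- Context Sensitive Grammar (CS)
- Context Free Grammar (CF)
- Finite State Grammar (FS)
- Example: $V_T = \{a, b, c\}$, $V_N = \{S, A, B\}$, with one production set each for UR, CS, CF and FS [production sets not recoverable from the source]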
6 Finite State Grammars, and Graphical Representations
- Nodes are nonterminals in $V_N$ and an additional terminal node $T$ not in $V$
- Productions of type $A_i \to aA_j$ are represented by an edge labeled $a$ directed from $A_i$ to $A_j$
- Productions of type $A_i \to a$ are represented by an edge labeled $a$ directed from $A_i$ to $T$
[Figure: example graph with nodes S, A, B and T, with edges labeled a]
- For an FS grammar $G$, an arbitrary string $x = x_1 x_2 \ldots x_n$, $x_i \in V_T$, is in $L(G)$ iff there exists at least one path $(x_1, x_2, \ldots, x_n)$ from $S$ to $T$
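The path criterion translates into a short membership test that tracks every node reachable after each symbol. This is a sketch under assumptions: the example productions ($S \to aA$, $A \to bA$, $A \to b$) are hypothetical and are not the grammar drawn on the slide, and `build_graph` and `accepts` are illustrative names.

```python
def build_graph(productions):
    """Build the edge map of an FS grammar.

    productions is a list of triples: (Ai, a, Aj) for Ai -> a Aj, and
    (Ai, a, None) for Ai -> a, which becomes an edge into the extra node "T".
    """
    graph = {}
    for src, symbol, dst in productions:
        target = dst if dst is not None else "T"
        graph.setdefault((src, symbol), set()).add(target)
    return graph

def accepts(graph, string, start="S", final="T"):
    """True iff some path labeled with the symbols of string leads from start to final."""
    current = {start}
    for symbol in string:
        current = {nxt for node in current for nxt in graph.get((node, symbol), ())}
        if not current:
            return False                 # no edge matches: every path is dead
    return final in current

# Hypothetical grammar: S -> aA, A -> bA, A -> b
GRAPH = build_graph([("S", "a", "A"), ("A", "b", "A"), ("A", "b", None)])
print(accepts(GRAPH, "abbb"))   # True
print(accepts(GRAPH, "a"))      # False
```

Tracking a set of nodes rather than a single node handles grammars where the same symbol labels several edges leaving one node.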
7 Syntactic Pattern Recognition
- 2-class problem: $C_1$ and $C_2$ are composed of features from a set $V_T$
- Let $G$ be a grammar such that $L(G)$ consists only of sentences (patterns) from $C_1$
- Example: $V_T = \{a, b\}$, $V_N = \{S, A\}$, $P = \{S \to aSb,\ S \to b\}$, $L(G) = \{b\} \cup \{a^n b^{n+1},\ n \ge 1\}$
- Classification rule: $x$ belongs to $C_1$ iff $x$ belongs to $L(G)$; otherwise $x$ belongs to $C_2$
- The classification algorithm has to correctly answer whether or not a given string is grammatically correct
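For the example grammar, the classification rule can be sketched as a recursive test that mirrors the two productions $S \to aSb$ and $S \to b$. The function names `in_language` and `classify` are illustrative, and the test is specific to this grammar rather than a general parser.

```python
def in_language(x):
    """True iff x is derivable from S with the rules S -> aSb and S -> b."""
    if x == "b":                         # rule S -> b
        return True
    # rule S -> aSb: strip one leading a and one trailing b, then recurse
    return len(x) >= 3 and x[0] == "a" and x[-1] == "b" and in_language(x[1:-1])

def classify(x):
    """Classification rule: C1 iff x is in L(G), otherwise C2."""
    return "C1" if in_language(x) else "C2"

for pattern in ["b", "abb", "aabbb", "ab", "bb", ""]:
    print(repr(pattern), classify(pattern))
```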
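8 Pattern Grammars
- 2-class problem: rectangles and other quadrilaterals
- Select primitives: $a$ = $0^\circ$ edge, $b$ = $90^\circ$ edge, $c$ = $180^\circ$ edge, $d$ = $270^\circ$ edge
- Set of rectangles: [expression not recoverable from the source], if a0, b0, c0, d0 represent unit length lines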
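A rectangle test over these primitives might look as follows. This sketch rests on an assumption that is not stated in the surviving text: that a rectangle is traced from one corner as $n$ unit edges $a$, then $m$ edges $b$, then $n$ edges $c$, then $m$ edges $d$, with $n, m \ge 1$. The name `is_rectangle` is illustrative.

```python
import re

def is_rectangle(chain):
    """True iff chain has the assumed form a^n b^m c^n d^m with n, m >= 1.

    Each character is one unit-length edge; opposite sides must have equal length.
    """
    match = re.fullmatch(r"(a+)(b+)(c+)(d+)", chain)
    if match is None:
        return False
    n_a, n_b, n_c, n_d = (len(group) for group in match.groups())
    return n_a == n_c and n_b == n_d

print(is_rectangle("aabccd"))   # True: 2 x 1 rectangle
print(is_rectangle("aabcd"))    # False: the a and c sides differ in length
```

The equal-length constraint between non-adjacent runs is what a plain regular expression cannot express on its own, which is why the lengths are compared after matching.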
9 Consider the primitives:
- $a$ = $0^\circ$ horizontal, unit length
- $b$ = $120^\circ$, unit length
- $c$ = $240^\circ$, unit length
- $L(G)$ represents the class of equilateral triangles
- What is the grammar? Make it up from domain knowledge
- There is no unique solution
10 Grammar solutions
- FS Grammar solution: $V_T = \{a, b, c\}$, $V_N = \{S, A, B, C, D, E, F, G, H, I, J, K\}$
- CS Grammar solution: $V_T = \{a, b, c\}$, $V_N = \{S, A, B, C, D, E, F\}$
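To illustrate what a context-sensitive solution can look like, the sketch below searches for a derivation of a triangle string. It assumes equilateral triangles are encoded as $a^n b^n c^n$, and it uses a common textbook grammar for that language ($S \to abc$, $S \to aSBc$, $cB \to Bc$, $bB \to bb$). This is not the slide's grammar, which uses the variables $S, A, \ldots, F$ and whose productions are not given here. The name `find_derivation` is illustrative.

```python
from collections import deque

# Assumed textbook grammar for a^n b^n c^n (not the grammar from the slide).
RULES = [("S", "abc"), ("S", "aSBc"), ("cB", "Bc"), ("bB", "bb")]

def find_derivation(target, rules=RULES, start="S"):
    """Breadth-first search for a derivation start => ... => target.

    Returns the list of sentential forms, or None if target is not derivable.
    Forms longer than the target are pruned, which is safe here because no
    rule shortens a string.
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            chain = []
            while form is not None:      # walk back to the start symbol
                chain.append(form)
                form = parent[form]
            return chain[::-1]
        for left, right in rules:
            pos = form.find(left)
            while pos != -1:
                new = form[:pos] + right + form[pos + len(left):]
                if len(new) <= len(target) and new not in parent:
                    parent[new] = form
                    queue.append(new)
                pos = form.find(left, pos + 1)
    return None

print(find_derivation("aabbcc"))
# ['S', 'aSBc', 'aabcBc', 'aabBcc', 'aabbcc']
print(find_derivation("aabbc"))   # None
```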
11 Syntax Analysis
- Let $x$ be the unknown pattern. The recognition task is finding $L(G_i)$ such that $x$ belongs to $L(G_i)$
- i.e., given a string $x$ and a grammar $G$, construct a triangle with the top vertex $S$ and the bottom side $x$, inside which will be the derivation (parse) tree
- Top-down and bottom-up parsing methods can be used
[Figure: triangle with apex S and base x enclosing the parse tree]
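A top-down method can be sketched as a backtracking recursive-descent parser that starts at $S$ and tries each production in turn until the whole of $x$ is covered. The sketch assumes a context-free grammar without left recursion (otherwise the recursion would not terminate) and uses the grammar $S \to aSb \mid ab$ from the earlier example; `parse`, `expand` and `parse_tree` are illustrative names.

```python
GRAMMAR = {"S": [["a", "S", "b"], ["a", "b"]]}   # S -> aSb | ab

def parse(symbol, x, pos, grammar=GRAMMAR):
    """Yield (tree, next_pos) for every way symbol can derive a prefix of x[pos:]."""
    if symbol not in grammar:                     # terminal: must match the input
        if x.startswith(symbol, pos):
            yield symbol, pos + len(symbol)
        return
    for body in grammar[symbol]:                  # nonterminal: try each production
        yield from expand(symbol, body, 0, x, pos, [], grammar)

def expand(head, body, i, x, pos, children, grammar):
    """Match body[i:] starting at pos, collecting the subtrees found so far."""
    if i == len(body):
        yield (head, children), pos
        return
    for subtree, nxt in parse(body[i], x, pos, grammar):
        yield from expand(head, body, i + 1, x, nxt, children + [subtree], grammar)

def parse_tree(x, start="S", grammar=GRAMMAR):
    """Return a parse tree with root start covering all of x, or None."""
    for tree, end in parse(start, x, 0, grammar):
        if end == len(x):
            return tree
    return None

print(parse_tree("aabb"))   # ('S', ['a', ('S', ['a', 'b']), 'b'])
print(parse_tree("abab"))   # None
```

The returned nested tuple is the content of the triangle: the root is $S$ and the leaves, read left to right, spell out $x$.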
12 Stochastic Languages
- Probabilities are associated with production rules: stochastic grammar
- Stochastic language is one obtained by such a grammar
- Probability of obtaining $x$ is [equation not recoverable from the source]
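As an illustration, the usual convention for an unambiguous stochastic grammar is that the probability of a string is the product of the probabilities of the rules used in its derivation. The sketch below assumes that convention (it may differ in form from the slide's formula) and assigns hypothetical probabilities 0.4 and 0.6 to $S \to aSb$ and $S \to ab$; `probability` is an illustrative name.

```python
# Hypothetical rule probabilities; they sum to 1 over the rules for S.
RULES = {"aSb": 0.4, "ab": 0.6}

def probability(x):
    """Probability that the stochastic grammar generates x (0.0 if x is not in L(G)).

    Each string a^n b^n has exactly one derivation: n - 1 uses of S -> aSb
    followed by one use of S -> ab, so the rule probabilities are multiplied.
    """
    p = 1.0
    while x != "ab":
        if len(x) < 4 or x[0] != "a" or x[-1] != "b":
            return 0.0                   # x cannot come from S -> aSb
        p *= RULES["aSb"]
        x = x[1:-1]
    return p * RULES["ab"]

for s in ["ab", "aabb", "aaabbb", "abab"]:
    print(s, probability(s))
```

With these numbers the probabilities of $ab$, $aabb$, $aaabbb$, and so on form a geometric series that sums to 1, as expected for a consistent stochastic grammar.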
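13 Tree representations
- A string $s_1$ is directly derived from string $s_2$ in $G$ ($s_2 \Rightarrow s_1$) if there exists a rule $\alpha \to \beta$ in $G$ such that $s_1$ is the result of replacing $\alpha$ by $\beta$ in $s_2$
- In general, $s$ is derived from the initial symbol of $G$, $S$, if there exists a sequence of strings $s_1, s_2, \ldots, s_k$ through which we can derive $s$ from $S$, i.e., $S \Rightarrow s_1 \Rightarrow s_2 \Rightarrow \cdots \Rightarrow s_k \Rightarrow s$
- Parsing is the reverse of generation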
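Direct derivation is a single rewrite step, and a derivation is a chain of such steps. A minimal sketch, assuming rules are `(left, right)` string pairs with a non-empty left side; `direct_derivations` and `is_derivation` are illustrative names.

```python
def direct_derivations(s2, rules):
    """Return every string s1 that is directly derived from s2 by one rule application."""
    results = set()
    for left, right in rules:
        pos = s2.find(left)
        while pos != -1:                 # the rule may apply at several positions
            results.add(s2[:pos] + right + s2[pos + len(left):])
            pos = s2.find(left, pos + 1)
    return results

def is_derivation(chain, rules, start="S"):
    """True iff chain starts at the start symbol and each step is a direct derivation."""
    return bool(chain) and chain[0] == start and all(
        nxt in direct_derivations(cur, rules) for cur, nxt in zip(chain, chain[1:])
    )

RULES = [("S", "aSb"), ("S", "ab")]
print(sorted(direct_derivations("aSb", RULES)))        # ['aaSbb', 'aabb']
print(is_derivation(["S", "aSb", "aabb"], RULES))      # True
```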