Title: 91.304 Foundations of Theoretical Computer Science
1. 91.304 Foundations of (Theoretical) Computer Science
- David Martin
- dm_at_cs.uml.edu
This work is licensed under the Creative Commons Attribution-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
2. g/re/p
- What does grep do?
- (int|float)_rec.*emp becomes
- (Σ*)(int ∪ float)_rec(Σ*)emp(Σ*)
- What does it mean?
- How does it work?
- Regular expression → NFA → DFA → state reduction
- Then run the DFA against each line of input, printing out the lines that it accepts
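The accept-and-print behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pattern is taken to be (int|float)_rec.*emp, the sample lines are invented, and Python's re module is a backtracking matcher rather than the regular expression → NFA → DFA pipeline on the slide.

```python
import re

def grep(pattern, lines):
    """Return the lines in which `pattern` matches somewhere (grep-style)."""
    compiled = re.compile(pattern)  # compile once, reuse for every line
    # search() looks for a match anywhere in the line, which plays the role
    # of the implicit "anything before, anything after" around the pattern.
    return [line for line in lines if compiled.search(line)]

# Hypothetical input lines, invented for this illustration.
sample = [
    "int_rec_temp = 3",
    "float_rec.emp",
    "double_rec.emp",
    "int_record",
]
matches = grep(r"(int|float)_rec.*emp", sample)
print(matches)
```

Only the first two sample lines contain int_rec or float_rec followed later by emp, so only those are kept.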
3. This course so far
- 1.1 Introduction to languages & DFAs
- 1.2 NFAs and DFAs recognize the same class of languages
- 1.3 REX generates the same class of languages
- Three different programming "languages" specified in different levels of formality that solve the same types of computational problems
- Four, if you count GNFAs
- Five, if you count UFAs
4. Strategies
- If you're investigating a property of regular languages, then as soon as you know L ∈ REG, you know there are DFAs, NFAs, and regexes that describe it. Use whatever representation is convenient
- But sometimes you're investigating the properties of the programs themselves: changing states, adding a * to a regex, etc. Then the knowledge that other representations exist might be relevant and might not
5. All finite languages are regular
- Theorem (not in book): FIN ⊆ REG
- Proof: Suppose L ∈ FIN.
- Then either L = ∅, or L = {s_1, s_2, …, s_n} where n ∈ N and each s_i ∈ Σ*.
- A regular expression describing L is, therefore, either ∅ or s_1 ∪ s_2 ∪ … ∪ s_n. QED
- Note that this proof does not work for n = ∞
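The construction in the proof (one union branch per string) can be tried out directly. The sketch below is an illustration, not part of the course material: the function names are made up, Python's | stands in for ∪, and the never-matching pattern (?!) stands in for ∅.

```python
import re

def finite_language_to_regex(strings):
    """Build a regex for a finite language: one alternative per string."""
    words = sorted(set(strings))
    if not words:
        return r"(?!)"  # empty negative lookahead never matches: plays the role of ∅
    # re.escape keeps characters such as '*' or '.' literal inside each word
    return "|".join(re.escape(w) for w in words)

def accepts(regex, s):
    """True if the whole string s matches the regex."""
    return re.fullmatch(regex, s) is not None

rex = finite_language_to_regex(["01", "0011", "000111"])
print(rex, accepts(rex, "0011"), accepts(rex, "00011"))
```

The regex has exactly n alternatives, which is why the same idea gives nothing for an infinite language.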
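6. Picture so far
[Figure: Venn diagram in which each point is a language. The outer region is ALL; inside it is REG = L(DFA) = L(NFA) = L(REX) = L(UFA) = L(GNFA), which strictly contains FIN. L(DFA) is annotated "the class of languages generated by DFAs". The part of ALL outside REG is annotated "is there a language out here?"]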
7. 1.4 Nonregular languages
- For each possible language L,
- ∅ ⊆ L. So ∅ is the smallest language. And ∅ is regular
- L ⊆ Σ*. So Σ* is the largest language. And Σ* is regular
- Yet there are languages in between these two extremes that are not regular
8. A nonregular language
- B = { 0^n 1^n | n ≥ 0 }
- = { ε, 01, 0011, 000111, … }
- is not regular
- Why?
- Q: How many bits of memory would a DFA need in order to recognize B?
- A: There appears to be no single number of bits that's big enough to work for every element of B
- Remember, the DFA needs to reject all strings that are not in B
9. Other examples
- C = { w ∈ {0,1}* | n_0(w) = n_1(w) }
- Needs to count a potentially unbounded number of '0's... so nonregular
- D = { w ∈ {0,1}* | n_01(w) = n_10(w) }
- Needs to count a potentially unbounded number of '01' substrings... so ??
- Need a technique for establishing nonregularity that is more formal and... less intuitive?
10. Proving nonregularity
- To prove that a language is nonregular, you have to show that no DFA whatsoever recognizes the language
- Not just the DFA that is your best effort at recognizing the language
- The pumping lemma can be used to do that
- The pumping lemma says that every regular language satisfies the "regular pumping property" (RPP)
- Given this, if we can show that a language like B doesn't satisfy the RPP, then it's not regular
11. Pumping lemma, informally
- Roughly: "if a regular language contains any 'long' strings, then it contains infinitely many strings"
- Start with a regular language and suppose that some DFA M = (Q, Σ, δ, q_0, F) for it has |Q| = 10 states.
- What if M accepts some particular string s where s = c_1 c_2 … c_15, so that |s| = 15?
[Figure: state diagram of M, starting at q_0]
12. Pigeonhole principle
- With 15 input characters, the machine will visit at most 16 states
- But there are only 10 states in this machine
- So clearly it will visit at least one of its states more than once
- Let rpt be our name for the first state that is visited multiple times on that particular input s
- Let acc be our name for the accepting state that s leads to, namely δ(q_0, s) = acc
- Let y be our name for the leftmost substring of s for which δ(rpt, y) = rpt
- Since there are no ε-transitions in a DFA, a state being "visited multiple times" means that it read at least one character. Therefore, |y| > 0
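The pigeonhole argument can be watched in action on a small machine. The sketch below is an illustration under assumptions: the 3-state DFA (binary strings whose number of 1s is divisible by 3) and all function names are invented here, not taken from the slides. It runs the DFA, stops at the first repeated state rpt, and reports the pieces x, y, z of the input.

```python
def run(delta, start, s):
    """Return the state a DFA reaches from `start` after reading s."""
    state = start
    for ch in s:
        state = delta[(state, ch)]
    return state

def split_at_first_repeat(delta, start, s):
    """Run the DFA on s and split s = x + y + z at the first repeated state."""
    seen = {start: 0}  # state -> number of characters read when first visited
    state = start
    for k, ch in enumerate(s, start=1):
        state = delta[(state, ch)]
        if state in seen:  # pigeonhole: this state (rpt) was visited before
            j = seen[state]
            return s[:j], s[j:k], s[k:], state
        seen[state] = k
    return None  # no repetition: s is shorter than the number of states

# Hypothetical 3-state DFA: accepts strings whose number of 1s is divisible by 3.
delta = {(q, "0"): q for q in range(3)}
delta.update({(q, "1"): (q + 1) % 3 for q in range(3)})
accepting = {0}

x, y, z, rpt = split_at_first_repeat(delta, 0, "1101")
print(x, y, z, rpt)
# Repeating y any number of times keeps the string accepted.
print([run(delta, 0, x + y * i + z) in accepting for i in range(4)])
```

Here y is nonempty and |xy| is at most the number of states, matching the constraints derived on the following slides.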
13. [Figure: the sequence of states that M visits after reading the characters below, annotated "> 0" and "≤ 10"]
After reading c_1 … c_10 (the first 10 characters of s), M must have already been to state rpt and returned to it at least once, because there are only 10 states in M. Of course, the repetition could have been encountered earlier than 10 characters too.
14. [Figure: the same sequence of states]
Assigning new names to the pieces of s...
15. [Figure: the same sequence of states]
Assigning new names to the pieces of s... So s = xyz as shown above. With these names, the other constraints can be written |y| > 0 and |xy| ≤ 10.
16. M accepts other strings too
17. M accepts other strings too
- Consider the string xz
- δ(q_0, x) = rpt
- δ(rpt, z) = acc (from previous slide)
- So xz ∈ L(M) too
18. M accepts other strings too
- Consider the string xyyz
- δ(q_0, xy) = rpt (from 2 slides ago)
- δ(rpt, y) = rpt (from same previous result)
- δ(rpt, z) = acc (from same previous result)
- So xyyz ∈ L(M) also
- Apparently we can repeat y as many times as we want
19. p-regular-pumpable strings
- Definition (not in textbook): A string s is said to be p-regular-pumpable in a language L ⊆ Σ* if there exist x, y, z ∈ Σ* such that
  1. s = xyz ("x, y, z are a decomposition of s")
  2. |y| > 0
  3. |xy| ≤ p
  4. For all i ≥ 0, x y^i z ∈ L ("the y part of s can be pumped to produce other strings in the language")
- It follows that s must be a member of L for it to be p-pumpable
- The 15-character string s in the previous example was 10-pumpable in L(M)
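To get a feel for the definition, one can enumerate every decomposition allowed by conditions 1 to 3 and test condition 4 for a few values of i. The sketch below is a bounded experiment, not a decision procedure: condition 4 quantifies over all i ≥ 0, while the code only tries i = 0..max_i. The function names and the example language (ab)* are assumptions made for illustration.

```python
def pumpable_decompositions(s, p, in_language, max_i=5):
    """Yield (x, y, z) with s = xyz, |y| > 0, |xy| <= p that survive i = 0..max_i."""
    for xy_len in range(1, min(p, len(s)) + 1):      # condition 3: |xy| <= p
        for x_len in range(xy_len):                  # condition 2: |y| > 0
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            # condition 4, checked only for finitely many i
            if all(in_language(x + y * i + z) for i in range(max_i + 1)):
                yield x, y, z

def in_ab_star(w):
    """Membership test for the example language (ab)*."""
    return w == "ab" * (len(w) // 2)

found = list(pumpable_decompositions("abab", 2, in_ab_star))
print(found)
```

For s = abab and p = 2 the only surviving decomposition is x = ε, y = ab, z = ab; a string outside the language, such as aba, has none, because i = 1 already reproduces s itself.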
20. p-regular-pumpable languages
- Definition: A language L is p-regular-pumpable if
- for every s ∈ L such that |s| ≥ p, the string s is p-pumpable in L
- in other words, "every long enough string in L is pumpable"
- Our previous example language was 10-regular-pumpable
21. RPP(p) and RPP
- Definition: RPP(p) is the class of languages that are p-regular-pumpable. In other words, RPP(p) = { L ⊆ Σ* | L is p-regular-pumpable }
- Definition: RPP is the class of languages that are p-regular-pumpable for some p. In other words, RPP = ⋃_{p ≥ 0} RPP(p)
- Lots of notation and apparent complexity, but the idea is simple: RPP is the class of languages in which every long string is pumpable
22. Pumping lemma
- Theorem 1.37 (rephrased): If L ⊆ Σ* is recognized by a p-state DFA, then L ∈ RPP(p)
- Proof: Just like our example, but use p instead of the constant 10 (the number of states)
- Corollaries
- REG ⊆ RPP
- If L ∉ RPP, then L ∉ REG (the primary application of the Pumping Lemma)
23. Proving a language nonregular
- First unravel these definitions, but it amounts to proving that L is not a member of RPP. Then it follows that L isn't regular
- Proving that L isn't in RPP allows you to concentrate on the language rather than considering all possible proposed programs that might recognize it
24. Unraveling RPP: a direct rephrasing
- Rephrasing: L is a member of RPP if
- There exists p ≥ 0 such that
- For every s ∈ L satisfying |s| ≥ p,
- There exist x, y, z ∈ Σ* such that
  1. s = xyz
  2. |y| > 0
  3. |xy| ≤ p
  4. For all i ≥ 0, x y^i z ∈ L
(∃p) (∀s) (∃x,y,z) (∀i): pretty complicated!
25. Question from last time
- (Question) Didn't you earlier say "regular languages are closed under concatenation"?
- (Answer) No, I wrote that REG is closed under concatenation
- Subtle but important distinction. REG (the class of all regular languages) is closed under language concatenation
- If A, B ∈ REG then AB ∈ REG
- That does not mean that each regular language is itself closed under string concatenation
- {10, 1} ∈ REG but 101 ∉ {10, 1}
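The two notions of concatenation can be told apart with finite sets of strings. In the sketch below (names chosen for illustration), concat builds the language AB from two languages, while the last line asks whether one language already contains the concatenation of two of its own strings.

```python
def concat(A, B):
    """Language concatenation: AB = { ab | a in A, b in B }."""
    return {a + b for a in A for b in B}

A = {"10", "1"}
AA = concat(A, A)   # a new (still finite, hence regular) language
print(sorted(AA))   # the four strings formed by pairing elements of A
print("101" in AA)  # 101 = 10 followed by 1, so it is in AA
print("101" in A)   # but A itself does not contain it
```

Closure of REG is a statement about the first operation: AA is again a regular language, even though A is not closed under joining its own strings.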
26. Nonregularity proof by contradiction
- Claim: Let B = { 0^n 1^n | n ≥ 0 }. Then B is not regular
- Proof: We show by contradiction that B is not a member of RPP.
- So assume that B ∈ RPP (and hope to reach a contradiction soon). Then there exists p ≥ 0 associated with the definition of RPP.
- We let s = 0^p 1^p. (Not the exact same variable as in the RPP property, but an example of one such possible setting of it.) Now we know that s ∈ B because it has the right form.
27. Proof continued
- Now |s| = 2p ≥ p. By the assumption that B ∈ RPP, there exist x, y, z such that
  1. s = xyz (= 0^p 1^p, remember)
  2. |y| > 0
  3. |xy| ≤ p
  4. For all i ≥ 0, x y^i z ∈ B
- Part (3) implies that xy ∈ 0* because the first p-many characters of s = xyz are all 0
- So y consists solely of '0' characters
- ... at least one of them, according to (2)
28. Proof continued
- But consider
- s = xyz = x y^1 z = 0^p 1^p (where we started)
- y consists of one or more '0' characters
- so x y^2 z contains more '0' characters than '1' characters. In other words,
- x y^2 z = 0^(p+|y|) 1^p
- so x y^2 z ∉ B = { 0^n 1^n | n ≥ 0 }.
- This contradicts part (4)!!
- Since the contradiction followed merely from the assumption that B ∈ RPP (and right and meet and true reasoning about which we have no doubt), that assumption must be wrong. QED
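The key step of the proof, that every legal decomposition of 0^p 1^p leaves B when y is doubled, can be checked by brute force for small p. This is a sanity check under assumptions (the function names are invented, and only finitely many p are tried); the proof above is what covers every p.

```python
def in_B(w):
    """Membership in B = { 0^n 1^n | n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def legal_decompositions(s, p):
    """All (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):
            yield s[:x_len], s[x_len:xy_len], s[xy_len:]

def pumping_up_always_escapes(p):
    """True if x y^2 z falls outside B for every legal split of 0^p 1^p."""
    s = "0" * p + "1" * p
    return all(not in_B(x + y * 2 + z) for x, y, z in legal_decompositions(s, p))

print([pumping_up_always_escapes(p) for p in range(1, 8)])
```

Because |xy| ≤ p forces y into the block of zeros, doubling y always produces more 0s than 1s, which is what the check observes.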
29. Observations
- We needed (and got) a contradiction that was a necessary consequence of the assumption that B ∈ RPP, and then relied on the Theorem 1.37 corollaries
- RPP mainly concerns strings that are longer than p
- So you should concentrate on strings longer than p...
- even though p is a variable. But clearly |0^p 1^p| > p
- In our example we didn't "do" much after our initial choice of s and thinking about the implications: we found a contradiction right away
- Many other choices of s would work, but many don't, and even some that do work require more complex arguments, for example s = 0^(⌊p/2⌋+1) 1^(⌊p/2⌋+1)
- Choosing s wisely is usually the most important thing
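30. Picture so far
[Figure: Venn diagram in which each point is a language. Nested regions, from outermost to innermost: ALL, RPP, REG, FIN. Labeled points: {0101, ε} in FIN; 0*(101)* in REG; B = { 0^n 1^n | n ≥ 0 } outside RPP. The part of RPP outside REG is annotated "We'll see an example later".]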
31. More on contradictions
- Consider this shortcut attempt to prove that B = { 0^n 1^n | n ≥ 0 } is not regular
- Proof: Suppose B ∈ RPP. By RPP,
- There exists p ≥ 0 such that
- For every s ∈ B satisfying |s| ≥ p,
- There exist x, y, z ∈ Σ* such that
  1. s = xyz
  2. |y| > 0
  3. |xy| ≤ p
  4. For all i ≥ 0, x y^i z ∈ B
- So let s = (1010)^p. Then s ∉ B, which is inconsistent with the RPP statement. Contradiction??
NO: the RPP statement only speaks about strings s ∈ B, so choosing an s ∉ B contradicts nothing.
32. Simplifying RPP proofs
- I find it easier to forget about contradiction proofs and instead prove directly that a language is not in RPP
- So we need a direct, formal version of the statement that L ∉ RPP
33. Unraveling RPP (repeat)
- Rephrasing: L is a member of RPP if
- There exists p ≥ 0 such that
- For every s ∈ L satisfying |s| ≥ p,
- There exist x, y, z ∈ Σ* such that
  1. s = xyz
  2. |y| > 0
  3. |xy| ≤ p
  4. For all i ≥ 0, x y^i z ∈ L
(∃p) (∀s) (∃x,y,z) (∀i): pretty complicated!
34. Unraveling non-RPP
- Rephrasing: L is not in RPP if
- For every p ≥ 0
- There exists some s ∈ L satisfying |s| ≥ p such that
- For every x, y, z ∈ Σ* satisfying 1-3
  1. s = xyz,
  2. |y| > 0, and
  3. |xy| ≤ p
- There exists some i ≥ 0 for which
  4. x y^i z ∉ L
(∀p) (∃s) (∀x,y,z) (∃i): still complicated, but you don't have to use contradiction now
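Written as single formulas, the RPP statement and its negation line up quantifier by quantifier. The LaTeX below is only a restatement of the two rephrasings, using the same symbols p, s, x, y, z, i; the negation is obtained by flipping each quantifier and negating the innermost membership claim, while the side conditions on s and on x, y, z stay as they are.

```latex
\[
L \in \mathrm{RPP} \iff
\exists p \ge 0\;\;
\forall s \in L \text{ with } |s| \ge p\;\;
\exists x,y,z \in \Sigma^* \text{ with } s = xyz,\ |y| > 0,\ |xy| \le p\;\;
\forall i \ge 0:\; x y^i z \in L
\]
\[
L \notin \mathrm{RPP} \iff
\forall p \ge 0\;\;
\exists s \in L \text{ with } |s| \ge p\;\;
\forall x,y,z \in \Sigma^* \text{ with } s = xyz,\ |y| > 0,\ |xy| \le p\;\;
\exists i \ge 0:\; x y^i z \notin L
\]
```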
35. A direct proof of nonregularity
- Let D = { a^(n^2) | n ≥ 0 } = { ε, a^1, a^4, a^9, … } ('a' is just some character). Then D is not regular.
- Proof idea: The pumping lemma says there's a fixed-size loop in any DFA that accepts long strings. You can repeat the characters in that loop as many times as you want to get longer strings that the machine accepts. Each time you add a repetition you grow the pumped string by a constant length.
- But the spacing between strings in D above keeps changing; it's never constant. So D doesn't have the pumping property.
36. A direct proof of nonregularity
- Let D = { a^(n^2) | n ≥ 0 } = { ε, a^1, a^4, a^9, … }. Then D is not in RPP and thus not regular.
- Proof: Let p ≥ 0 and set s = a^((p+1)^2). Then s ∈ D and |s| > p (so such an s certainly exists).
- Now let x, y, z ∈ Σ* be any strings satisfying
  1. xyz = s = a^((p+1)^2)
  2. |y| > 0, and
  3. |xy| ≤ p
- Our goal is to produce some i such that x y^i z ∉ D
37. Direct proof continued
- (We'll actually show that x y^0 z ∉ D)
- Observe that y = a^j for some 1 ≤ j ≤ p, so
- |x y^0 z| = |a^((p+1)^2 - j)| < (p+1)^2
- Since j ≤ p we know that -j ≥ -p and thus
- |x y^0 z| = (p+1)^2 - j
- ≥ (p+1)^2 - p
- = p^2 + p + 1
- > p^2
- In other words, x y^0 z has > p^2 characters and < (p+1)^2 characters. So |x y^0 z| is not a perfect square and thus x y^0 z ∉ D. QED
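The inequality chain can be spot-checked numerically. The sketch below (helper names are made up for illustration) lists, for a given p, every possible length (p+1)^2 - j with 1 ≤ j ≤ p and confirms that each lies strictly between the consecutive squares p^2 and (p+1)^2. It checks finitely many p only; the algebra above covers all of them.

```python
import math

def is_square(n):
    """True if n is a perfect square."""
    r = math.isqrt(n)
    return r * r == n

def pumped_down_lengths(p):
    """Possible lengths of x y^0 z: (p+1)^2 - j for every 1 <= j <= p."""
    return [(p + 1) ** 2 - j for j in range(1, p + 1)]

for p in (1, 2, 3):
    print(p, pumped_down_lengths(p))

ok = all(
    p * p < n < (p + 1) ** 2 and not is_square(n)
    for p in range(1, 50)
    for n in pumped_down_lengths(p)
)
print(ok)
```

For p = 3, for instance, the candidate lengths are 15, 14 and 13, all strictly between 9 and 16.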
38. Direct or contradiction proof?
- Both work fine... it's your choice
- But you must clearly state what you are doing
- If proof by contradiction, say so
- If direct proof, say so
39. Game theory formulation
- The direct proof technique can be formulated as a two-player game
- You are the player who wants to establish that L is not pumpable
- Your opponent wants to make it difficult for you to succeed
- Both of you have to play by the rules
40. Game theory continued
- The game has just four steps.
  1. Your opponent picks p ≥ 0
  2. You pick s ∈ L such that |s| ≥ p
  3. Your opponent chooses x, y, z ∈ Σ* such that s = xyz, |y| > 0, and |xy| ≤ p
  4. You produce some i ≥ 0 such that x y^i z ∉ L
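The four steps can be played out mechanically for the language D = { a^(n^2) | n ≥ 0 }. The sketch below is an illustration with invented function names: your two moves are written as functions of the opponent's choices, and the opponent is simulated by trying every legal move in step 3. Winning for a handful of p values is evidence, not a proof; a proof has to argue that the strategy wins for every p.

```python
import math

def in_D(w):
    """Membership in D = { a^(n^2) | n >= 0 }."""
    return set(w) <= {"a"} and math.isqrt(len(w)) ** 2 == len(w)

def my_string(p):
    """Step 2 (your move): a string in D of length at least p."""
    return "a" * (p + 1) ** 2

def opponent_moves(s, p):
    """Step 3 (opponent's move): every legal split s = xyz, |y| > 0, |xy| <= p."""
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):
            yield s[:x_len], s[x_len:xy_len], s[xy_len:]

def my_exponent(x, y, z):
    """Step 4 (your move): this strategy always answers i = 0."""
    return 0

def you_win_every_round(p):
    """Step 1 is the opponent's p; test your strategy against all step-3 moves."""
    s = my_string(p)
    if not (in_D(s) and len(s) >= p):
        return False  # your step-2 move would be illegal
    return all(
        not in_D(x + y * my_exponent(x, y, z) + z)
        for x, y, z in opponent_moves(s, p)
    )

print([you_win_every_round(p) for p in range(6)])
```

Note that my_string and my_exponent take the opponent's choices as arguments: your moves may depend on p and on x, y, z, but you never get to pick them.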
41. Game theory continued
- If you are able to succeed through step 4, then you have won only one round of the game
- Like winning one round of Tic-tac-toe
- Do example for a member of D
- To show that a language is not in RPP you must show that you can always win, regardless of your opponent's legal moves
- Realize that the opponent is free to choose the most inconvenient or difficult p and x, y, z imaginable that are consistent with the rules
42. Game theory continued
- So you have to present a strategy for always winning and convincingly argue that it will always win
- So your choices in steps 2 & 4 have to depend on the opponent's choices in steps 1 & 3
- And you don't know what the opponent will choose
- So your choices need to be framed in terms of the variables p, x, y, z
43. Game theory continued
- Ultimately it is not very different from the direct proof
- But it states clearly what choices you may make and what you may not: a common cause of errors in proofs
- Repeat previous proof in this framework
44. A direct proof of nonregularity
- Let D = { a^(n^2) | n ≥ 0 } = { ε, a^1, a^4, a^9, … }. Then D is not in RPP and thus not regular.
- Proof: [Step 1, opponent's choice] Let p ≥ 0
- [Step 2, your choice and reasoning] and set s = a^((p+1)^2). Then s ∈ D and |s| > p (so such an s certainly exists).
- [Step 3, opponent's choice] Now let x, y, z ∈ Σ* be any strings satisfying
  1. xyz = s = a^((p+1)^2)
  2. |y| > 0, and
  3. |xy| ≤ p
- Our goal is to produce some i such that x y^i z ∉ D
45. Direct proof continued
- [Step 4, your choice] (We'll actually show that x y^0 z ∉ D)
- [Step 4, your reasoning] Observe that y = a^j for some 1 ≤ j ≤ p, so
- |x y^0 z| = |a^((p+1)^2 - j)| < (p+1)^2
- Since j ≤ p we know that -j ≥ -p and thus
- |x y^0 z| = (p+1)^2 - j
- ≥ (p+1)^2 - p
- = p^2 + p + 1
- > p^2
- In other words, x y^0 z has > p^2 characters and < (p+1)^2 characters. So |x y^0 z| is not a perfect square and thus x y^0 z ∉ D. QED