Title: Strings
- A string over a set A is a finite sequence of elements from A.
- The set of elements from which the strings are built is called an alphabet.
Definition. An alphabet is a nonempty, finite set of indivisible symbols. We are going to denote it by Σ.
- Any program is a string of keywords, variable names, and permissible symbols.
- A programming language should satisfy general rules (a grammar) to be understood by a computer (compiler). These rules are studied by the formal theory of programming languages.
Definition. A string w over an alphabet Σ is a sequence of symbols, w = a1 a2 … an, where each ai ∈ Σ, 1 ≤ i ≤ n.
- The number of symbols is called the length of the string, |w| = n.
- There is one special string that has zero length (contains no symbols).
- It is called the empty string and has the special notation λ.
- |w| = 0 ⇔ w = λ.
- λ is not an element of any alphabet, λ ∉ Σ.
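Example. Let Σ = {a, b, c}. Find all possible strings of length less than or equal to 3 built from Σ.
Length 0: λ
Length 1: a b c
Length 2: aa ab ac ba bb bc ca cb cc
Length 3: aaa aab aac aba abb abc aca acb acc, and likewise the 9 strings that begin with b and the 9 that begin with c (27 in total)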
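The enumeration in this example can be reproduced mechanically. The following is a minimal Python sketch; the function name strings_up_to is an illustrative choice, not part of the lecture, and the empty string is represented by Python's "".

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` of length 0..max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        # product(..., repeat=n) yields every n-tuple of symbols;
        # for n = 0 it yields a single empty tuple, i.e. the empty string
        result.extend("".join(t) for t in product(alphabet, repeat=n))
    return result

words = strings_up_to("abc", 3)
by_length = [sum(1 for w in words if len(w) == n) for n in range(4)]
print(by_length)  # [1, 3, 9, 27]
```

The counts 1, 3, 9, 27 are the powers of the alphabet size, since each position of a string can be filled independently.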
- Two strings u and v can be concatenated to form a single string uv that consists of the symbols of string u followed by the symbols of string v. The length is |uv| = |u| + |v|.
- Concatenation is associative, (uv)w = u(vw),
- but not commutative: in general uv ≠ vu (the order is important!).
Definition. Any set of strings over some alphabet is called a language.
Examples. The set of all executable computer programs is a language.
The alphabet itself is a language as well (the language of all one-symbol words).
Since languages are sets, we can apply all set operations to languages: union, intersection and set difference. There is one operation specific to languages, the concatenation of two languages:
L1·L2 = {uv | u ∈ L1 and v ∈ L2}
In general, L1·L2 ≠ L2·L1.
Example. Take the alphabet Σ = {a, b, c}. Consider two languages over Σ: L1 = {a, ab} and L2 = {b, bc, c}. Find L1·L2 and L2·L1.
We need to take every string from L1 and concatenate it with every string from L2. In this way we get for L1·L2 the strings ab, abc, ac, abb, abbc, abc. Note that not all of them are distinct: abc appears twice (as a·bc and as ab·c). So L1·L2 = {ab, abc, ac, abb, abbc}.
In the same way, L2·L1 = {ba, bab, bca, bcab, ca, cab}.
The cardinality |L1·L2| is the number of distinct strings resulting from the concatenation.
In general, |L1·L2| ≤ |L1|·|L2|, and |L1·L2| may differ from |L2·L1|. In the example, |L1·L2| = 5 < |L1|·|L2| = 6, while |L2·L1| = 6.
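The example can be checked with a short Python sketch in which a finite language is modelled as a set of strings. The helper name concat is an assumption made for illustration only.

```python
def concat(L1, L2):
    """Concatenation of two finite languages given as Python sets of strings."""
    return {u + v for u in L1 for v in L2}

L1 = {"a", "ab"}
L2 = {"b", "bc", "c"}

L1L2 = concat(L1, L2)
L2L1 = concat(L2, L1)

print(sorted(L1L2))  # ['ab', 'abb', 'abbc', 'abc', 'ac']
print(sorted(L2L1))  # ['ba', 'bab', 'bca', 'bcab', 'ca', 'cab']
print(len(L1L2), len(L2L1), len(L1) * len(L2))  # 5 6 6
```

Because the result is a set, the duplicate abc collapses automatically, which is why the first cardinality is 5 rather than 6.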
In particular, we can consider the concatenation of an alphabet Σ with itself: Σ·Σ is the language of all two-symbol words. Notation: Σ·Σ = Σ^2.
Example. Σ = {a, b}, Σ·Σ = Σ^2 = {aa, ab, ba, bb}.
Similarly, Σ^3 = Σ^2·Σ is the language that consists of all 3-symbol words: Σ^3 = {aaa, aba, baa, bba, aab, abb, bab, bbb}.
So, we can define recursively for any n > 1: Σ^n = Σ^(n-1)·Σ.
To make this recursive definition agree with the basis case n = 1, Σ = Σ^0·Σ, the zero power Σ^0 is defined as Σ^0 = {λ} (no matter what Σ is).
Then {λ}·Σ = {λx | x ∈ Σ} = {x | x ∈ Σ} = Σ.
What is Σ ∪ Σ^2?
What is Σ ∪ Σ^2 ∪ Σ^3 ∪ … ∪ Σ^n?
Kleene star notation: Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ …
So, Σ* is the (infinite) set of all possible words over the alphabet Σ, including the empty string λ.
Example. Σ = {0, 1}. Σ* is the infinite set of all possible bit strings (all binary numerals, including those with leading 0s, and the empty string).
Any language L over the alphabet Σ is a subset of Σ*, L ⊆ Σ*.
Note that λ ∈ Σ* although λ ∉ Σ, because Σ = Σ^1 while λ ∈ Σ^0.
A language L may contain λ, or may not.
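Powers and the Kleene star can be explored with a small Python sketch. Since the star is an infinite set, the sketch only collects its strings up to a chosen length. The names concat, power and star_up_to are illustrative assumptions, and "" stands for the empty string.

```python
def concat(L1, L2):
    """Concatenation of two finite languages given as sets of strings."""
    return {u + v for u in L1 for v in L2}

def power(L, n):
    """The n-th power of L, with the zero power defined as {empty string}."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

def star_up_to(L, max_len):
    """The strings of the Kleene star of L whose length is at most max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        # extend every newly found string by one more factor taken from L
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result

sigma = {"a", "b"}
print(sorted(power(sigma, 2)))  # ['aa', 'ab', 'ba', 'bb']
print(len(power(sigma, 3)))     # 8
print(sorted(star_up_to(sigma, 2), key=lambda w: (len(w), w)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

The length bound is what keeps the loop finite; without it the frontier would never become empty for a nonempty alphabet.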
Example. Consider two languages over the alphabet Σ = {a}: L1 = {aa}, L2 = {λ, aa, aaaa}. What is L1*?
By the definition of the Kleene star,
L1* = L1^0 ∪ L1^1 ∪ L1^2 ∪ … = {λ} ∪ {aa} ∪ {aaaa} ∪ {aaaaaa} ∪ … = {λ, aa, aaaa, aaaaaa, …},
the infinite set of strings of even length built from the symbol a.
What is L2*?
L2* = L2^0 ∪ L2^1 ∪ L2^2 ∪ … = {λ} ∪ {λ, aa, aaaa} ∪ {λ, aa, aaaa, aaaaaa, aaaaaaaa} ∪ … = {λ, aa, aaaa, aaaaaa, …} = L1*
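The claim that the two stars coincide can be spot-checked up to a fixed length. This is a finite check under the assumption that comparing all strings up to length 10 is a convincing illustration; it is not a proof. The helper star_up_to is a hypothetical name.

```python
def star_up_to(L, max_len):
    """Strings of the Kleene star of L with length at most max_len ("" is the empty string)."""
    result = {""}
    frontier = {""}
    while frontier:
        # append one more factor from L to every newly found string
        extended = {w + v for w in frontier for v in L if len(w + v) <= max_len}
        frontier = extended - result
        result |= frontier
    return result

L1 = {"aa"}
L2 = {"", "aa", "aaaa"}

print(sorted(star_up_to(L1, 6), key=len))             # ['', 'aa', 'aaaa', 'aaaaaa']
print(star_up_to(L1, 10) == star_up_to(L2, 10))       # True
```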
Definition. A string u is called a substring of v if there exist two strings x and y such that v = xuy, where x, y ∈ Σ*.
Definition. A string u is called a prefix of v if there exists a string x ∈ Σ* such that v = ux.
Similarly, a string u is called a suffix of v if there exists a string y ∈ Σ* such that v = yu.
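These three definitions map directly onto Python string operations. The sketch below also lists the witnesses x and y required by the substring definition; all function names are illustrative assumptions.

```python
def is_prefix(u, v):
    """True if v = u + x for some string x."""
    return v.startswith(u)

def is_suffix(u, v):
    """True if v = y + u for some string y."""
    return v.endswith(u)

def decompositions(u, v):
    """All pairs (x, y) with v = x + u + y; u is a substring of v iff the list is nonempty."""
    k = len(u)
    return [(v[:i], v[i + k:]) for i in range(len(v) - k + 1) if v[i:i + k] == u]

print(is_prefix("ab", "abcb"))     # True
print(is_suffix("cb", "abcb"))     # True
print(decompositions("b", "abcb")) # [('a', 'cb'), ('abc', '')]
print(decompositions("ba", "abcb"))  # []
```

Since x and y may be empty, every prefix and every suffix of v is also a substring of v, and the empty string is a substring of every string.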
Theorem 1. Let A, B and C be sets of strings. Then (A ∪ B)·C = A·C ∪ B·C.
Proof. We need to prove the equality of two sets of strings. We can do it by double inclusion, i.e. by showing that
i) (A ∪ B)·C ⊆ A·C ∪ B·C and
ii) A·C ∪ B·C ⊆ (A ∪ B)·C.
i) To prove (A ∪ B)·C ⊆ A·C ∪ B·C, it suffices to show that for any string w, w ∈ (A ∪ B)·C ⇒ w ∈ A·C ∪ B·C.
Take any w ∈ (A ∪ B)·C
⇒ ∃ x, y such that w = xy and x ∈ (A ∪ B) and y ∈ C   (dfn of concat)
⇒ (x ∈ A or x ∈ B) and y ∈ C   (dfn of ∪)
⇒ (x ∈ A and y ∈ C) or (x ∈ B and y ∈ C)   (distributive property)
⇒ w ∈ A·C or w ∈ B·C   (dfn of concat)
⇒ w ∈ A·C ∪ B·C   (dfn of ∪)
ii) To prove that A·C ∪ B·C ⊆ (A ∪ B)·C, we need to show that for any string w, w ∈ A·C ∪ B·C ⇒ w ∈ (A ∪ B)·C.
Take any w ∈ A·C ∪ B·C
⇒ w ∈ A·C or w ∈ B·C   (dfn of ∪)
⇒ ∃ x, y such that w = xy and ((x ∈ A and y ∈ C) or (x ∈ B and y ∈ C))   (dfn of concat)
So we can have two cases. In the first case, (x ∈ A and y ∈ C) implies that (x ∈ A ∪ B and y ∈ C) because A ⊆ (A ∪ B).
In the second case, (x ∈ B and y ∈ C) implies that (x ∈ A ∪ B and y ∈ C) because B ⊆ (A ∪ B). So, in either case we have
⇒ w ∈ (A ∪ B)·C   (dfn of concat)
So we proved A·C ∪ B·C ⊆ (A ∪ B)·C and (A ∪ B)·C ⊆ A·C ∪ B·C, which means (A ∪ B)·C = A·C ∪ B·C.
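The statement of Theorem 1 can also be spot-checked on small finite languages. The sketch below draws random languages over {a, b} and compares both sides. It is only a sanity check on the chosen samples, not a substitute for the proof, and the names concat and theorem1_holds are assumptions made for illustration.

```python
import random
from itertools import product

def concat(X, Y):
    """Concatenation of two finite languages given as sets of strings."""
    return {x + y for x in X for y in Y}

def theorem1_holds(A, B, C):
    """Compare (A ∪ B)·C with A·C ∪ B·C for concrete finite sets."""
    return concat(A | B, C) == concat(A, C) | concat(B, C)

# the 7 strings over {a, b} of length at most 2, used as a pool for random languages
words = ["".join(t) for n in range(3) for t in product("ab", repeat=n)]
rng = random.Random(0)  # fixed seed so the run is repeatable
trials = [
    tuple(set(rng.sample(words, rng.randint(0, 4))) for _ in range(3))
    for _ in range(200)
]
print(all(theorem1_holds(A, B, C) for A, B, C in trials))  # True
```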
Theorem 2. Let A, B and C be sets of strings. Then (A ∩ B)·C ⊆ A·C ∩ B·C.
Proof. To prove the subset relation we need to show that for any string w, w ∈ (A ∩ B)·C ⇒ w ∈ A·C ∩ B·C. Indeed, if w = xy with x ∈ A ∩ B and y ∈ C, then x ∈ A gives w ∈ A·C and x ∈ B gives w ∈ B·C.
Why not prove A·C ∩ B·C ⊆ (A ∩ B)·C as well?
Let's try. Take an arbitrary w ∈ A·C ∩ B·C ⇒ w ∈ A·C and w ∈ B·C
⇒ (∃ x, y: w = xy, x ∈ A and y ∈ C) and (∃ u, v: w = uv, u ∈ B and v ∈ C)
Can we conclude that xy = uv ⇒ x = u?
No, because the same string abc may come from a·bc and from ab·c.
Example. A = {a}, B = {ab}, C = {c, bc}.
Then A ∩ B = ∅ and (A ∩ B)·C = ∅.
A·C = {ac, abc}
B·C = {abc, abbc}
abc ∈ A·C ∩ B·C, but abc ∉ (A ∩ B)·C, so the reverse inclusion fails.
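The counterexample can be verified directly. The sketch below uses Python sets for the three languages; concat is a hypothetical helper name.

```python
def concat(X, Y):
    """Concatenation of two finite languages given as sets of strings."""
    return {x + y for x in X for y in Y}

A, B, C = {"a"}, {"ab"}, {"c", "bc"}

lhs = concat(A & B, C)             # (A ∩ B)·C
rhs = concat(A, C) & concat(B, C)  # A·C ∩ B·C

print(lhs)         # set()
print(rhs)         # {'abc'}
print(lhs <= rhs)  # True: the inclusion stated in Theorem 2
print(rhs <= lhs)  # False: the reverse inclusion fails
```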
Using set operations to specify languages.
- The specification of a language requires an unambiguous description of the strings that belong to the language.
- Set notation can be used for strict definitions of languages.
- Consider a few examples of set notation for languages over {a, b}.
1) The language L1 consists of the strings containing the substring bb.
L1 = {a, b}*·{bb}·{a, b}*. The set {a, b}* permits any number of a's and b's to precede and follow the occurrence of bb.
2) The language L2 consists of all strings that begin with aa and end with bb.
L2 = {aa}·{a, b}*·{bb}.
3) The language L3 consists of all strings that begin with aa or end with bb.
L3 = {aa}·{a, b}* ∪ {a, b}*·{bb}.
4) The set of even-length strings.
L4 = {aa, ab, bb, ba}*.
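To see that the set expressions describe the same strings as the verbal descriptions, both can be compared on all strings over {a, b} up to some length. The bound N = 6 and the helper names below are arbitrary illustrative choices, so this is a finite check rather than a proof.

```python
from itertools import product

def concat(X, Y):
    """Concatenation of two finite languages given as sets of strings."""
    return {x + y for x in X for y in Y}

N = 6
# all strings over {a, b} of length at most N, a finite stand-in for {a, b}*
sigma_star = {"".join(t) for n in range(N + 1) for t in product("ab", repeat=n)}

def bounded(L):
    """Keep only the strings of length at most N."""
    return {w for w in L if len(w) <= N}

L1 = bounded(concat(concat(sigma_star, {"bb"}), sigma_star))
L2 = bounded(concat(concat({"aa"}, sigma_star), {"bb"}))
L3 = bounded(concat({"aa"}, sigma_star) | concat(sigma_star, {"bb"}))

# {aa, ab, bb, ba}* up to length N: add one two-symbol block per iteration
L4, block = {""}, {""}
for _ in range(N // 2):
    block = concat(block, {"aa", "ab", "bb", "ba"})
    L4 |= block

print(L1 == {w for w in sigma_star if "bb" in w})                                 # True
print(L2 == {w for w in sigma_star if w.startswith("aa") and w.endswith("bb")})   # True
print(L3 == {w for w in sigma_star if w.startswith("aa") or w.endswith("bb")})    # True
print(L4 == {w for w in sigma_star if len(w) % 2 == 0})                           # True
```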
Regular Languages
Regular languages are the simplest languages: they are the ones that can be built from a few basic sets using only union, concatenation and the Kleene star.
Definition. Let Σ be an alphabet. A regular language over Σ is defined recursively as follows:
i) Basis: ∅, {λ}, and {a} for any a ∈ Σ are regular languages.
ii) Recursive step: If X and Y are regular languages, then X ∪ Y, X·Y and X* are regular languages.
iii) Closure: X is a regular language over Σ only if it can be obtained from the basis elements by a finite number of applications of the recursive step.
Example. Show that L = {ab}·{a, b}*·{ba} is a regular language.
Consider all the steps:
{a} and {b} are regular by the Basis.
{a}·{b} = {ab} is regular as a concatenation of regular languages.
{b}·{a} = {ba} is regular as a concatenation of regular languages.
{a} ∪ {b} = {a, b} is regular as the union of regular languages.
{a, b}* is regular as the Kleene closure of a regular language.
{ab}·{a, b}*·{ba} is regular as a concatenation of regular languages.
All finite languages are regular. Infinite languages may or may not be regular.
Regular languages are often described by algebraic expressions called regular expressions. Regular expressions are used to abbreviate the specification of regular languages.
Definition. Let Σ be an alphabet. A regular expression over Σ is defined recursively as follows:
i) Basis: ∅, λ, and a are regular expressions for all a ∈ Σ.
ii) Recursive step: Let u and v be regular expressions over Σ. Then (u + v) (union), u·v (concatenation) and u* (Kleene star) are regular expressions.
iii) Closure: u is a regular expression over Σ only if it can be obtained from the basis elements by a finite number of applications of the recursive step.
Examples of regular expressions over the alphabet Σ = {a, b}: ∅, λ, a, b, λ + a, b*, a + ba, (a + b)·a, a·b*, a* + b*, etc.
With each regular expression E we associate a regular language L(E) by the following rules:
L(∅) = ∅, L(λ) = {λ}, L(a) = {a},
L(R + S) = L(R) ∪ L(S), L(R·S) = L(R)·L(S), L(R*) = L(R)*.
Example. Let's find the language of the regular expression a + bc* over Σ = {a, b, c} (here + denotes union).
L(a + bc*) = L(a) ∪ L(bc*)
= {a} ∪ L(b)·L(c*)
= {a} ∪ {b}·{c}*
So, the language described by the expression a + bc* consists of the string a and the strings that start with one b followed by any number of c's.
L = {a, b, bc, bcc, bccc, …}
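Membership in this language can be tested with Python's re module. The sketch assumes the usual translation of the union operator into Python's | and uses fullmatch so that the whole string, not just a part of it, has to match. The function name in_language is hypothetical.

```python
import re

# Python writes union as "|", so the expression a + bc* becomes the pattern a|bc*
pattern = re.compile(r"a|bc*")

def in_language(w):
    """True if the whole string w belongs to the language of a + bc*."""
    return pattern.fullmatch(w) is not None

for w in ["a", "b", "bc", "bccc", "", "ab", "c", "abc"]:
    print(repr(w), in_language(w))
# 'a', 'b', 'bc', 'bccc' are accepted; '', 'ab', 'c', 'abc' are rejected
```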
Describe the language for each of the following regular expressions.
1) a + b: L1 = {a, b}
2) a + bc: L2 = {a, bc}
3) ab* + c: L3 = {c, a, ab, abb, …}
4) ab* + bc*: L4 = {a, b, ab, abb, …, bc, bcc, …}
5) a*bc* + ac: L5 = {ac, b, ab, bc, abc, aabc, …}
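Answers to exercises of this kind can be cross-checked by listing the shortest strings that a regular expression accepts. The sketch below assumes the expressions are rewritten in Python's re syntax, with | in place of +. The helper language_up_to is an illustrative name.

```python
import re
from itertools import product

def language_up_to(pattern, alphabet, max_len):
    """Strings over `alphabet` of length <= max_len matched in full by `pattern`, shortest first."""
    regex = re.compile(pattern)
    words = ("".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n))
    return [w for w in words if regex.fullmatch(w)]

print(language_up_to(r"a|b", "abc", 3))      # ['a', 'b']
print(language_up_to(r"a|bc", "abc", 3))     # ['a', 'bc']
print(language_up_to(r"ab*|c", "abc", 3))    # ['a', 'c', 'ab', 'abb']
print(language_up_to(r"ab*|bc*", "abc", 3))  # ['a', 'b', 'ab', 'bc', 'abb', 'bcc']
print(language_up_to(r"a*bc*|ac", "abc", 2)) # ['b', 'ab', 'ac', 'bc']
```

The first two languages are finite, so the lists are complete; for the other three the output shows only the strings within the length bound.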
Distinct regular expressions may represent the same language: a + b and b + a represent the same language {a, b}.
Two expressions R and S are considered equal if they represent the same language, i.e. L(R) = L(S).
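Properties of Regular Expressions (+ denotes union, λ the empty string)
1) + properties: R + T = T + R; R + ∅ = ∅ + R = R; R + R = R; (R + S) + T = R + (S + T)
2) · properties: R·∅ = ∅·R = ∅; R·λ = λ·R = R; (RS)T = R(ST)
3) Distributive properties: R(S + T) = RS + RT; (S + T)R = SR + TR
4) Closure properties: ∅* = λ* = λ
5) R* = R*R* = (R*)* = R + R*
   R* = λ + R* = (λ + R)* = (λ + R)R* = λ + RR*
   R* = (R + … + R^k)* for any k ≥ 1
   R* = λ + R + R^2 + … + R^(k-1) + R^k R* for any k ≥ 1
6) R*R = RR*
7) (R + S)* = (R* + S*)* = (R*S*)* = (R*S)*R* = R*(SR*)*
8) R(SR)* = (RS)*R
9) (R*S)* = λ + (R + S)*S; (RS*)* = λ + R(R + S)*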
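Identities of this kind can be spot-checked for concrete R and S by comparing which strings two patterns accept. The sketch takes R = a and S = b, writes union as | in Python's re syntax, and compares all strings up to length 8. The helper same_language and the length bound are assumptions, and agreement on a bounded set is evidence rather than proof.

```python
import re
from itertools import product

def same_language(p, q, alphabet="ab", max_len=8):
    """True if patterns p and q accept exactly the same strings of length <= max_len."""
    rp, rq = re.compile(p), re.compile(q)
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            w = "".join(t)
            if bool(rp.fullmatch(w)) != bool(rq.fullmatch(w)):
                return False
    return True

# (R + S)* = (R*S)*R*
print(same_language(r"(a|b)*", r"(a*b)*a*"))    # True
# R(SR)* = (RS)*R
print(same_language(r"a(ba)*", r"(ab)*a"))      # True
# (R*S)* = λ + (R + S)*S, with "λ + X" written as the optional group (X)?
print(same_language(r"(a*b)*", r"((a|b)*b)?"))  # True
# concatenation is not commutative: ab and ba differ
print(same_language(r"ab", r"ba"))              # False
```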