CS 3240: Languages and Computation

1
CS 3240 Languages and Computation
  • Course Review

2
Full Compiler Structure
  • Most compilers make two passes
  • [Diagram: Start → Scanner → Parser → Semantic
    Action (reporting Semantic Errors) → Code
    Generation → CODE]
3
The Big Picture
  • Parsing
  • Matching code against the rules of the grammar
    and building a representation of the code.
  • Scanning
  • Converting input text into a stream of known
    objects called tokens. Simplifies the parsing
    process.
  • The grammar dictates the syntactic rules of the
    language, i.e., how a legal sentence can be formed
  • The lexical rules of the language dictate how a
    legal word is formed by concatenating symbols of
    the alphabet.

4
Regular Expressions
  • Symbols and alphabet
  • A symbol is a valid character in a language
  • An alphabet is the set of legal symbols
  • Typically denoted Σ
  • Metacharacters/metasymbols
  • Define reg-ex operations
  • Escape character (\)
  • Empty string ε and empty set ∅
  • Basic regular expressions
  • Basic operations: union, concatenation, repetition
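The three basic operations can be exercised directly with Python's re module (a sketch; the pattern (a|b)c* is an illustrative example, not from the slides):

```python
import re

# Union (a|b), concatenation, and repetition (c*) are the three
# basic regular-expression operations; fullmatch tests whole strings.
pattern = re.compile(r"(a|b)c*")  # union of a,b concatenated with zero or more c's

print(pattern.fullmatch("ac") is not None)    # True
print(pattern.fullmatch("bccc") is not None)  # True
print(pattern.fullmatch("cc") is not None)    # False: must start with a or b
```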

5
DFA
  • A DFA is a five-tuple consisting of
  • An alphabet Σ
  • A set of states Q
  • A transition function δ: Q × Σ → Q
  • One start state q0
  • One or more accepting states F ⊆ Q
  • The language accepted by a DFA is the set of
    strings such that the DFA ends at an accepting
    state after processing the string
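The five-tuple definition translates directly into a short simulator (a sketch; the even-number-of-1s machine is a hypothetical example):

```python
def run_dfa(states, alphabet, delta, start, accepting, s):
    """Return True iff the DFA ends in an accepting state after reading s."""
    q = start
    for ch in s:
        if ch not in alphabet:
            return False
        q = delta[(q, ch)]  # the transition function delta: Q x Sigma -> Q
    return q in accepting

# Hypothetical DFA over {0,1} accepting strings with an even number of 1s
delta = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}
print(run_dfa({"even", "odd"}, {"0", "1"}, delta, "even", {"even"}, "1011"))  # False
print(run_dfa({"even", "odd"}, {"0", "1"}, delta, "even", {"even"}, "1001"))  # True
```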

6
NFA
  • Nondeterministic Finite Automata
  • The same input may produce multiple paths
  • Allows transitions on the empty string, and
    transitions from one state to several different
    states on the same character
  • NFAs and DFAs are equivalent in power
  • Proof by construction
  • They differ only in implementation detail
  • Regular languages are closed under the regular
    operations
  • If a language is regular, then it can be
    described by a regular expression.
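The proof by construction (the subset construction) can be sketched in a few lines; the strings-ending-in-"ab" NFA below is a hypothetical example:

```python
from itertools import chain

def eps_closure(states, eps):
    """All states reachable from `states` via ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in eps.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def nfa_to_dfa(alphabet, delta, eps, start, accepting):
    """Subset construction: each DFA state is a set of NFA states."""
    d_start = eps_closure({start}, eps)
    d_delta, worklist, seen = {}, [d_start], {d_start}
    while worklist:
        S = worklist.pop()
        for a in alphabet:
            T = eps_closure(set(chain.from_iterable(
                delta.get((q, a), ()) for q in S)), eps)
            d_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                worklist.append(T)
    d_accepting = {S for S in seen if S & accepting}
    return d_start, d_delta, d_accepting

# Hypothetical NFA accepting strings over {a,b} ending in "ab"
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
start, d_delta, acc = nfa_to_dfa({"a", "b"}, delta, {}, 0, {2})
q = start
for ch in "aab":
    q = d_delta[(q, ch)]
print(q in acc)  # True
```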

7
Pumping lemma
  • Pumping lemma
  • For every regular language L, there is a finite
    pumping length p, s.t. for every s ∈ L with
    |s| ≥ p, we can write s = xyz with 1) xyⁱz ∈ L
    for every i ∈ {0,1,2,...}, 2) |y| ≥ 1, 3) |xy| ≤ p

8
State Machines
  • Lexical analyzer is a state machine
  • State machines are very similar to finite automata

[Diagram: identifier DFA — a Letter transition
enters the accepting Identifier state, which loops
on Letter or Digit]
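The identifier state machine above can be coded as a longest-match scan (a sketch, assuming the usual lexical rule Letter (Letter|Digit)*):

```python
def scan_identifier(text, i=0):
    """Longest-match identifier scan: Letter (Letter|Digit)*."""
    if i >= len(text) or not text[i].isalpha():
        return None  # reject: identifiers must start with a letter
    j = i + 1
    while j < len(text) and (text[j].isalpha() or text[j].isdigit()):
        j += 1  # self-loop on the accepting state consumes letters and digits
    return text[i:j]

print(scan_identifier("x42+y"))   # 'x42'
print(scan_identifier("9lives"))  # None
```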
9
Context-Free Grammar
  • A context-free grammar (V, Σ, R, S) is a grammar
    where all rules are of the form A → x, with
    A ∈ V and x ∈ (V ∪ Σ)*
  • A string is accepted if there is a derivation
    from S to the string
  • Representation of derivations by ⇒ or by parse
    trees
  • Left-most and right-most derivations

10
Ambiguity
  • A string w ∈ L(G) is derived ambiguously if it has
    more than one derivation tree (or, equivalently,
    if it has more than one leftmost (or
    rightmost) derivation).
  • A grammar is ambiguous if some strings are
    derived ambiguously.
  • Some languages are inherently ambiguous

11
Chomsky normal form
  • A method of simplifying a CFG
  • Definition: A context-free grammar is in Chomsky
    normal form if every rule is of one of the
    following forms
  • A → BC
  • A → a
  • where a is any terminal, A is any variable,
    and B and C are any variables other than the
    start variable
  • If S is the start variable, then
    the rule S → ε is the only permitted ε-rule

12
Pushdown automata
  • Similar to finite automata, but for CFGs
  • PDAs are finite automata with a stack
  • Theorem: A language is context-free if and only
    if some pushdown automaton recognizes it

13
Pumping Lemma for CFL
Theorem: For every context-free language L, there
is a pumping length p, such that for any string
s ∈ L with |s| ≥ p, we can write s = uvxyz with
1) uvⁱxyⁱz ∈ L for every i ∈ {0,1,2,...},
2) |vy| ≥ 1, 3) |vxy| ≤ p.
Note that 1) implies that uxz ∈ L (take i = 0),
requirement 2) says that v and y cannot both be
the empty string ε, and condition 3) is useful in
proving that a language is not context-free.
14
Parser Classification
  • Parsers are broadly broken down into
  • LL - Top-down parsers
  • L - Scan left to right
  • L - Traces leftmost derivation of input string
  • LR - Bottom-up parsers
  • L - Scan left to right
  • R - Traces rightmost derivation of input string
  • LL is a subset of LR
  • Typical notation
  • LL(0), LL(1), LR(1), LR(k)
  • The number k refers to the maximum lookahead
  • Lower is better!

15
Top-down Parsing
  • Top-down parsing
  • Recursive-descent: recursive or non-recursive
  • LL(1) parsing: table-driven, stack-based
    implementation similar to Pushdown Automata
  • Removal of left recursion
  • Why?
  • Left recursion may lead to an infinite loop
  • How?
  • EBNF
  • Immediate left recursion
  • Indirect left recursion
  • Left factoring
  • LL(1) parsing
  • First set and Follow set

16
First Set
  • Let X be a grammar symbol (a terminal or
    nonterminal) or ε. Then the set First(X) is
    defined as follows
  • If X is a terminal or ε, then First(X) = {X}.
  • If X is a nonterminal, then for each production
    rule X → X1X2...Xn, First(X) contains
    First(X1) − {ε}.
  • If for some i < n, First(X1),...,First(Xi) all
    contain ε, then First(X) contains First(Xi+1) − {ε}
  • If First(X1),...,First(Xn) all contain ε, then
    First(X) contains ε
  • First(α) for any string α = X1X2...Xn is defined
    using rules 2–4.
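Rules 2–4 amount to a fixed-point iteration, sketched below with ε written as "" (the E/T grammar is a standard illustrative example, not from the slides):

```python
def first_sets(grammar, terminals):
    """Iterate the First-set rules to a fixed point.
    grammar: nonterminal -> list of right-hand sides (tuples of symbols)."""
    first = {t: {t} for t in terminals}  # rule 1 for terminals
    for nt in grammar:
        first[nt] = set()
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                add = set()
                for sym in rhs:
                    add |= first[sym] - {""}      # rules 2-3
                    if "" not in first[sym]:
                        break
                else:                             # all symbols derive ε
                    add.add("")                   # rule 4 (ε-production included)
                if not add <= first[nt]:
                    first[nt] |= add
                    changed = True
    return first

# Hypothetical grammar: E -> T E', E' -> + T E' | ε, T -> id
grammar = {"E": [("T", "E'")], "E'": [("+", "T", "E'"), ()], "T": [("id",)]}
f = first_sets(grammar, {"+", "id"})
print(f["E"])            # {'id'}
print(sorted(f["E'"]))   # ['', '+']  (ε shown as '')
```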

17
Follow Set
  • Given a nonterminal A, the set Follow(A) is
    defined as
  • If A is the start symbol, then $ is in Follow(A)
  • If there is a production rule B → αAβ, then
    Follow(A) contains First(β) − {ε}
  • If there is a production rule B → αAβ and
    ε ∈ First(β), then Follow(A) contains Follow(B)
  • Notes
  • $ is needed to indicate the end of the string
  • ε is never a member of a Follow set
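The three Follow rules are also a fixed-point iteration; the sketch below assumes precomputed First sets and the same hypothetical E/T grammar used for First sets (ε written as ""):

```python
def first_of(seq, first):
    """First(β) for a symbol string β, using precomputed First sets."""
    out = set()
    for sym in seq:
        out |= first[sym] - {""}
        if "" not in first[sym]:
            return out
    out.add("")  # every symbol of β (or β = ε) can derive ε
    return out

def follow_sets(grammar, first, start):
    """Iterate the three Follow rules to a fixed point; '$' marks end of input."""
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                        # rule 1
    changed = True
    while changed:
        changed = False
        for b, prods in grammar.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:        # terminals have no Follow set
                        continue
                    beta = first_of(rhs[i + 1:], first)
                    add = beta - {""}             # rule 2
                    if "" in beta:                # rule 3: β can derive ε
                        add |= follow[b]
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow

# Hypothetical grammar: E -> T E', E' -> + T E' | ε, T -> id
grammar = {"E": [("T", "E'")], "E'": [("+", "T", "E'"), ()], "T": [("id",)]}
first = {"+": {"+"}, "id": {"id"}, "E": {"id"}, "E'": {"+", ""}, "T": {"id"}}
fol = follow_sets(grammar, first, "E")
print(sorted(fol["E'"]))  # ['$']
print(sorted(fol["T"]))   # ['$', '+']
```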

18
LR Parsing
  • Traverse rightmost derivation in reverse order
  • Also uses a stack
  • Main actions of LR parsing: shift and reduce
  • LR(0)
  • What are LR(0) items?
  • How is the DFA for LR(0) defined?
  • Simple LR(1)
  • How does SLR(1) extend LR(0)?
  • General LR(1)
  • What are LR(1) items?
  • How is DFA for LR(1) items defined?
  • Their corresponding parsing algorithms
  • Conflicts of actions

19
General LR(1) Parsing
  • Also called canonical LR(1)
  • More complex than SLR(1)
  • An efficient variation is LALR(1)
  • Key difference between General LR(1) and SLR(1)
    is that lookahead is built into DFA
  • LR(1) items differ from LR(0) items, as they
    include a single lookahead token in each item
  • e.g., [A → α.β, a]

20
GLR(1) Parsing Algorithm
  • If the state contains any item [A → α.Xβ, a] and
    the current input token is X, then
  • shift X
  • push the new state containing [A → αX.β, a]
  • If the state contains any item [A → α., a] and the
    next input token is a, then
  • remove α and the corresponding states from the
    stack
  • back up the DFA to the state s from which
    construction of α began
  • push A and the new state δ(s, A) onto the stack
  • If the rule is [S′ → S., $], then accept if the
    input token is $

21
LALR(1) Parsing
  • LR(1) has too high a complexity (too many states)
  • How to reduce the number of states?
  • If the LR(1) items in two states differ only by
    lookahead tokens, then merge the states
  • Definition: The core of a state of the DFA of
    LR(1) items is the set of LR(0) items consisting
    of the first components of all LR(1) items in the
    state.

22
Principles of LALR(1) Parsing
  • First principle (observation)
  • The core of a state of the LR(1) DFA is a state
    of the LR(0) DFA
  • Second principle (observation)
  • Given two states s1 and s2 of the LR(1) DFA with
    the same core, if there is a transition
    t1 = δ(s1, X), then there is a transition
    t2 = δ(s2, X), where t1 and t2 have the same core
  • Based on these principles, we merge LR(1) states
    with the same cores, keeping a set of lookahead
    symbols in each item

23
Grammar Relationships
24
Attribute Grammars
  • Attributes: properties of a programming-language
    construct
  • Data type, value of an expression, etc.
  • Attribute grammar: a collection of attribute
    equations or semantic rules associated with the
    grammar rules of a language
  • Each attribute equation in general has the form
    A.a = f(X1.a1, X1.a2, ..., X1.ak, ...,
    Xm.a1, Xm.a2, ..., Xm.ak)

25
Dependencies of Attribute Equations
  • Synthesis
  • An attribute of the LHS depends on attributes of
    the RHS
  • E.g., arithmetic expressions
  • Inheritance
  • An attribute of an RHS symbol depends on
    attributes of the LHS or of other symbols on the
    RHS

26
Equivalence between S- and L-Attributed Grammars
  • Theorem: Given an attribute grammar, all
    inherited attributes can be changed into
    synthesized attributes by modification of the
    grammar, without changing the language.
  • However, it may be difficult to change the
    grammar in practice.

27
Symbol Table
  • Compilers use a symbol table to keep track of the
    various names encountered in a program
  • Symbol table entries
  • Main fields: Name, Attributes
  • The attribute field contains various info,
    including binding information, the type of the
    name, etc.
  • Interacts with all phases of compilation
  • Basic operations: insert, lookup, delete
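The basic operations can be sketched as a stack of hash tables, one per scope (a minimal sketch; the attribute names are illustrative):

```python
class SymbolTable:
    """Scoped symbol table: a stack of hash tables, one per scope."""
    def __init__(self):
        self.scopes = [{}]                    # global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()                     # deletes all names of the closed scope

    def insert(self, name, **attributes):     # e.g. type, binding information
        self.scopes[-1][name] = attributes

    def lookup(self, name):
        for scope in reversed(self.scopes):   # innermost declaration wins
            if name in scope:
                return scope[name]
        return None

st = SymbolTable()
st.insert("x", type="integer")
st.enter_scope()
st.insert("x", type="real")                   # shadows the outer x
print(st.lookup("x")["type"])                 # real
st.exit_scope()
print(st.lookup("x")["type"])                 # integer
```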

28
Type Expressions
  • The type of a language construct will be denoted
    by a type expression.
  • Type expressions are either basic types or they
    are constructed from basic types using type
    constructors.
  • Basic types: boolean, char, integer, real,
    type_error, void
  • array(I, T), where T is a type expression and I is
    an integer range. E.g., int A[10] has the type
    expression array(0..9, integer)
  • We can take cartesian products of
    type expressions. E.g.,
  • struct entry { char letter; int value; } is of
    type (letter × char) × (value × integer)

29
Type Expressions, II
  • Pointers: int *aaaa is of type pointer(integer).
  • Functions
  • int divide(int i, int j) is of type integer ×
    integer → integer
  • Representing type expressions as trees
  • e.g., char × char → pointer(integer)

[Tree: root →, with left child × (children: char,
char) and right child pointer (child: integer)]
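The tree representation can be sketched directly: basic types are leaves, constructors are internal nodes, and structural equivalence is tree equality (a minimal sketch; the class and operator names are illustrative):

```python
class TypeExpr:
    """Type expressions as trees: basic types are leaves,
    type constructors are internal nodes."""
    def __init__(self, op, *children):
        self.op, self.children = op, children

    def __eq__(self, other):  # structural equivalence of type expressions
        return (isinstance(other, TypeExpr) and self.op == other.op
                and self.children == other.children)

    def __repr__(self):
        if not self.children:
            return self.op
        return f"{self.op}({', '.join(map(repr, self.children))})"

CHAR, INT = TypeExpr("char"), TypeExpr("integer")
# char x char -> pointer(integer)
t = TypeExpr("->", TypeExpr("x", CHAR, CHAR), TypeExpr("pointer", INT))
print(t)  # ->(x(char, char), pointer(integer))
print(t == TypeExpr("->", TypeExpr("x", CHAR, CHAR),
                    TypeExpr("pointer", INT)))  # True
```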
30
Type Systems
  • A type system: a collection of rules for
    assigning type expressions to the variable parts
    of a program.
  • A type checker implements a type system.
  • It is most convenient to implement a type checker
    within the semantic rules of a syntax-directed
    definition (and thus it will be implemented
    during translation).
  • Many checks can be done statically (at
    compilation).
  • Not all checks can be done statically. E.g., int
    A[10]; int i; ... printf("%d", A[i]);

31
Formal definition of TM
  • Definition: A Turing machine is a 7-tuple
    (Q, Σ, Γ, δ, q0, qaccept, qreject), where Q, Σ,
    and Γ are finite sets and
  • Q is the set of states,
  • Σ is the input alphabet, not containing the
    special blank symbol ␣,
  • Γ is the tape alphabet, where ␣ ∈ Γ and Σ ⊆ Γ,
  • δ: Q × Γ → Q × Γ × {L, R} is the transition
    function,

32
Formal definition of a TM
  • Definition: A Turing machine is a 7-tuple
    (Q, Σ, Γ, δ, q0, qaccept, qreject), where Q, Σ,
    and Γ are finite sets and
  • q0 ∈ Q is the start state,
  • qaccept ∈ Q is the accept state, and
  • qreject ∈ Q is the reject state, where
    qreject ≠ qaccept

33
TM configurations
  • The configuration of a Turing machine is its
    current setting
  • Current state
  • Current tape contents
  • Current tape location
  • Notation: uqv
  • Current state: q
  • Current tape contents: uv
  • Only blanks appear after the last symbol of v
  • Current tape location: the first symbol of v
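A small simulator makes the uqv notation concrete by printing the configuration at every step (a sketch; the even-length machine and the "_" blank symbol are illustrative assumptions):

```python
def run_tm(delta, q0, accept, reject, tape_input, max_steps=100):
    """Simulate a single-tape TM; configurations are printed as uqv."""
    tape = list(tape_input) or ["_"]  # "_" stands for the blank symbol
    q, head = q0, 0
    for _ in range(max_steps):
        print("".join(tape[:head]) + q + "".join(tape[head:]))  # uqv
        if q in (accept, reject):
            return q == accept
        sym = tape[head]
        q, write, move = delta[(q, sym)]
        tape[head] = write
        head += 1 if move == "R" else -1
        head = max(head, 0)           # the head cannot move off the left end
        if head == len(tape):
            tape.append("_")          # only blanks after the last symbol of v
    raise RuntimeError("no decision within max_steps")

# Hypothetical TM deciding whether the input over {0} has even length
delta = {("qe", "0"): ("qo", "0", "R"), ("qo", "0"): ("qe", "0", "R"),
         ("qe", "_"): ("qacc", "_", "R"), ("qo", "_"): ("qrej", "_", "R")}
print(run_tm(delta, "qe", "qacc", "qrej", "0000"))  # True
```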

34
Equivalence of machines
  • Theorem: Every multitape Turing machine has an
    equivalent single-tape Turing machine
  • Proof method: by construction.

35
Equivalence of machines
  • Theorem: Every nondeterministic Turing machine
    has an equivalent deterministic Turing machine
  • Proof method: construction
  • Proof idea: Use a 3-tape Turing machine to
    deterministically simulate the nondeterministic
    TM. The first tape keeps a copy of the input, the
    second tape is the computation tape, and the
    third tape keeps track of the choices.

36
Decidability
  • A language is decidable if some Turing machine
    decides it
  • Not all languages are decidable
  • How to show a language is decidable?
  • Write a decider that decides it
  • Accepts w iff w is in the language
  • Must halt on all inputs

37
Summary of Decidable Languages
38
Turing Machine Acceptance Problem
  • Consider the following language
  • ATM = { ⟨M, w⟩ | M is a TM that accepts w }
  • Theorem: ATM is Turing-recognizable
  • Theorem: ATM is undecidable
  • Proof idea: Construct a universal Turing machine
    that recognizes, but does not decide, ATM

39
Undecidable languages
[Diagram: the decidable languages are exactly the
languages that are both Turing-recognizable and
co-Turing-recognizable]
40
The Halting Problem HALTTM
  • HALTTM = { ⟨M, w⟩ | M is a TM and M halts on
    input w }
  • Theorem: HALTTM is undecidable
  • Proof idea (by contradiction):
  • Show that if HALTTM is decidable, then ATM is
    also decidable

41
Reductions and Decidability
  • To prove a language is decidable, we have
    converted it to another language and used the
    decidability of that language
  • Example: use the decidability of EDFA to
    determine the decidability of EQDFA
  • Thus, we reduce the problem of determining
    whether EQDFA is decidable to the problem of
    determining whether EDFA is decidable

42
Reductions and Undecidability
  • To prove a language is undecidable, we have
    assumed it is decidable and derived a
    contradiction
  • Example: assume the decidability of HALTTM and
    show that ATM is then decidable, which is a
    contradiction
  • In each case, we have to do a computation to
    convert one problem into another problem
  • What kinds of computations can we do?

43
Rice's Theorem
  • Determining whether a TM M satisfies any
    non-trivial property is undecidable
  • A property is non-trivial if
  • It depends only on the language of M, and
  • Some, but not all, Turing machines have the
    property
  • Examples: Is L(M) regular? Context-free? Finite?

44
Linear Bounded Automata
  • LBA definition: a TM that is prohibited from
    moving its head off the right side of the input.
  • The machine prevents such a move, just as a TM
    prevents a move off the left end of the tape
  • How many possible configurations are there for an
    LBA M on input w with |w| = n, m states, and p
    tape symbols?
  • Counting gives m · n · pⁿ

45
Mapping Reducibility
  • Definition: Language A is mapping reducible to
    language B, written A ≤m B, if there is a
    computable function f: Σ* → Σ*, where for every w,
  • w ∈ A iff f(w) ∈ B
  • The function f is called the reduction of A to B.

[Diagram: f maps members of A to members of B and
non-members of A to non-members of B]
46
Applications of Mapping Reductions
  • If A ≤m B and B is decidable, then A is decidable
  • If A ≤m B and A is undecidable, then B is
    undecidable
  • If A ≤m B and B is Turing-recognizable, then A is
    Turing-recognizable
  • Equivalently (by contraposition), if A ≤m B and A
    is not Turing-recognizable, then B is not
    Turing-recognizable

47
Complexity relationships
  • Theorem: Let t(n) be a function, where t(n) ≥ n.
    Then every t(n)-time multitape TM has an
    equivalent O(t²(n))-time single-tape TM
  • Proof idea: Consider the structure of the
    equivalent single-tape TM. Analyzing its behavior
    shows that each step of the multitape machine
    takes O(t(n)) steps on the single-tape machine

48
Determinism vs. non-determinism
  • Definition: Let N be a non-deterministic Turing
    machine. The running time of N is the function
    f: ℕ → ℕ, where f(n) is the maximum number of
    steps that N uses on any branch of its
    computation on any input of length n.

49
NP-completeness
  • A problem C is NP-complete if finding a
    polynomial-time solution for C would imply P = NP
  • Definition: A language B is NP-complete if it
    satisfies two conditions
  • B is in NP, and
  • Every A in NP is polynomial-time reducible to B

50
Cook-Levin theorem
  • SAT = { ⟨B⟩ | B is a satisfiable Boolean
    expression }
  • Theorem: SAT is NP-complete
  • If SAT can be solved in polynomial time, then any
    problem in NP can be solved in polynomial time

51
Showing a problem is NP-complete
  • Two steps to proving a problem L is NP-complete
  • Show the problem is in NP
  • Demonstrate there is a polynomial-time verifier
    for the problem
  • Show some NP-complete problem can be polynomially
    reduced to L
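The first step, a polynomial-time verifier, can be illustrated with SAT: given a truth assignment as the certificate, checking every clause takes only linear time (a sketch; the CNF encoding as lists of (variable, polarity) literals is an assumption for illustration):

```python
def verify_sat(cnf, assignment):
    """Polynomial-time verifier for SAT: check that the certificate
    (a truth assignment) satisfies every clause of a CNF formula.
    A literal is a (variable, polarity) pair."""
    return all(any(assignment[var] == polarity for var, polarity in clause)
               for clause in cnf)

# (x OR y) AND (NOT x OR y): satisfied by x=False, y=True
cnf = [[("x", True), ("y", True)], [("x", False), ("y", True)]]
print(verify_sat(cnf, {"x": False, "y": True}))  # True
print(verify_sat(cnf, {"x": True, "y": False}))  # False
```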

52
Space Complexity Classes
Definition: Let f: ℕ → ℕ be a function. The space
complexity classes SPACE(f(n)) and NSPACE(f(n))
are the following sets of languages:
SPACE(f(n)) = { L | there is a TM that decides the
language L in space O(f(n)) }
NSPACE(f(n)) = { L | there is a nondeterministic
TM that decides the language L in space O(f(n)) }
53
Savitch's Theorem
Theorem: For any function f: ℕ → ℕ with f(n) ≥ n,
we have NSPACE(f(n)) ⊆ SPACE((f(n))²). In other
words: nondeterminism does not give you much
extra for space complexity classes. Compare this
with our (lack of) understanding of the time
complexity classes TIME and NTIME.
54
A Hierarchy of Classes
[Diagram: nested classes P ⊆ NP ⊆ PSPACE ⊆ EXPTIME]
P ⊆ NP ⊆ PSPACE = NPSPACE ⊆ EXPTIME
We don't know how to prove P ≠ PSPACE or
NP ≠ EXPTIME. But we do know P ≠ EXPTIME.