Compilers and Language Translation

Slides: 52
Provided by: VIP

1
Chapter 9
  • Compilers and Language Translation

2
The Compilation Process
  • Phase I Lexical analysis
  • Phase II Parsing
  • Phase III Semantics and code generation
  • Phase IV Code Optimization

3
Introduction
  • High-level languages are more difficult to
    translate than assembly languages.
  • Assembly language and machine language are
    related 1-to-1.
  • The relationship between a high-level language
    and machine language is 1-to-many.

4
Compiler
  • The piece of software that translates high-level
    programming language code into machine language
    code.
  • Two distinct goals of a compiler:
  • Correctness
  • Efficient and concise code
  • Example: 2x0 + 2x1 + ... + 2x50000

5
The Compilation Process
  • Scanner → Parser → Code Generator → Optimizer → Object file

6
Lexical Analysis
  • The compiler examines the individual characters
    in the source program and groups them into
    syntactical units, called tokens, that will be
    analyzed in succeeding stages.
  • Analogous to grouping letters into words prior to
    analyzing text.

7
Parsing
  • During this stage the sequence of tokens formed
    by the scanner is checked to see whether it is
    syntactically correct according to the rules of
    the programming language.
  • Equivalent to checking whether the words in the
    text form grammatically correct sentences.

8
Semantic Analysis and Code Generation
  • If the high-level language statement is
    structurally correct, then the compiler analyzes
    its meaning and generates the proper sequence of
    machine language instructions to carry out these
    actions.

9
Code Optimization
  • The compiler takes the generated code and sees
    whether it can be made more efficient, either by
    making it run faster or by having it occupy less
    memory.

10
Phase I Lexical Analysis
  • The scanner, or lexical analyzer, groups input
    characters into tokens.
  • Example: a = b + 319 - delta
  • The scanner discards nonessential characters,
    such as blanks and tabs, and then groups the
    remaining characters into high-level syntactic
    units such as symbols, numbers, and operators.

11
Token Classifications
  • Token type: classification number
  • symbol: 1
  • number: 2
  • Others: = (3), + (4), - (5), ; (6), == (7), if (8),
    else (9), ( (10), ) (11)
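
The scanner's work on the example statement can be sketched in a few lines of Python. This is only an illustration, not the chapter's implementation: the function name scan and the regular expression are hypothetical, and the classification numbers used for =, +, ; and == (3, 4, 6, 7) are assumptions.

```python
import re

# Classification numbers for operators and keywords; the entries for
# "=", "+", ";" and "==" are assumed for this sketch.
CLASSIFICATION = {"=": 3, "+": 4, "-": 5, ";": 6, "==": 7,
                  "if": 8, "else": 9, "(": 10, ")": 11}

# One alternative per kind of token; "==" must be tried before "=".
TOKEN_PATTERN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(==|[=+\-;()]))")

def scan(source):
    """Group characters into (text, classification number) pairs."""
    tokens = []
    position = 0
    source = source.rstrip()
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise ValueError(f"unexpected character at position {position}")
        number, word, operator = match.groups()
        if number is not None:
            tokens.append((number, 2))
        elif word is not None:
            # Keywords have their own number; any other word is a symbol.
            tokens.append((word, CLASSIFICATION.get(word, 1)))
        else:
            tokens.append((operator, CLASSIFICATION[operator]))
        position = match.end()
    return tokens

print(scan("a = b + 319 - delta"))
```

Blanks are skipped by the leading \s* in the pattern, which mirrors the scanner discarding nonessential characters.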

12
Phase II Parsing
  • During the parsing phase, a compiler determines
    whether the tokens recognized by the scanner fit
    together in a grammatically meaningful way.
  • Analogous to the operation of diagramming a
    sentence.

13
Example
  • To prove the sequence of words
  • The man bit the dog
  • is a correctly formed sentence.

14
Another Example
  • The man bit the

15
Programming Language Example
  • Statement: a = b + c

16
Parse Tree
  • The structure shown in the previous example is
    called a parse tree.
  • It starts from the individual tokens a, =, b, +, c
    and shows how these tokens can be grouped together
    into predefined grammatical categories such as
    <symbol>, <addition operator>, and <expression>
    until the desired goal is reached (in this case,
    <assignment statement>).

17
Grammars, Languages and BNF
  • How does a parser know how to construct the parse
    tree?
  • The parser must be given a formal description of
    the syntax, the grammatical structure, of the
    language that it is going to analyze.
  • The most widely used notation for representing the
    syntax of a programming language is BNF, short
    for Backus-Naur form.

18
BNF
  • The syntax of a language is specified as a set of
    rules, also called productions.
  • The entire collection of rules is called a
    grammar.
  • BNF rule: left-hand side ::= definition

19
BNF Example
  • <assignment statement> ::= <symbol> = <expression>
  • The rule says that the syntactical construct
    called <assignment statement> is defined as a
    <symbol> followed by the token = followed by the
    syntactical construct called <expression>.

20
Terminal/Nonterminals
  • BNF uses two types of objects on the right-hand
    side of a production:
  • Terminals: actual tokens of the language,
    recognized and returned by a scanner.
  • Nonterminals: intermediate grammatical
    categories used to help explain and organize the
    language.

21
Goal Symbol
  • The goal symbol is the highest-level nonterminal.
  • When the goal symbol has been produced, the parser
    has finished building the tree, and the
    statements have been successfully parsed.
  • The collection of all statements that can be
    successfully parsed is called the language
    defined by a grammar.

22
Meta-symbols
  • Meta-symbol: a symbol used to describe the
    characteristics of another language.
  • BNF has five meta-symbols:
  • <
  • >
  • ::=
  • | (OR). Ex: <digit> ::= 0|1|2|3|4|5|6|7|8|9
  • Λ (null string).
    Ex: <signed integer> ::= <sign><number>
        <sign> ::= + | - | Λ
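
The OR and null-string meta-symbols map naturally onto code. The following sketch takes the rules to be <digit> ::= 0|1|...|9 and <sign> ::= + | - | Λ, and additionally assumes that <number> means one or more digits (the slide does not define it). The function names are hypothetical.

```python
DIGITS = "0123456789"      # <digit> ::= 0|1|2|3|4|5|6|7|8|9
SIGNS = ("+", "-", "")     # <sign> ::= + | - | Λ  ("" plays the role of Λ)

def is_number(text):
    """<number>: assumed here to be one or more <digit>s."""
    return len(text) > 0 and all(ch in DIGITS for ch in text)

def is_signed_integer(text):
    """<signed integer> ::= <sign><number>"""
    # Try each alternative for <sign>; the null alternative consumes nothing.
    return any(text.startswith(sign) and is_number(text[len(sign):])
               for sign in SIGNS)

print(is_signed_integer("-42"), is_signed_integer("42"), is_signed_integer("-"))
```

Each alternative separated by | becomes one entry to try, and Λ becomes the empty string, which is why an unsigned number such as 42 is also accepted.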

23
Fundamental Rule of Parsing
  • If, by repeated applications of the rules of the
    grammar, a parser can convert the sequence of
    input tokens into the goal symbol, then that
    sequence of tokens is a syntactically valid
    statement of the language.

24
Example
  • A three-rule grammar
  • <sentence> ::= <noun> <verb>
  • <noun> ::= bees | dogs
  • <verb> ::= buzz | bite
  • Example 1: Dogs bite. (valid)
  • Example 2: Bees dogs. (invalid)
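
Reading the three rules as <noun> ::= bees | dogs and <verb> ::= buzz | bite, the two examples can be checked mechanically. This is a minimal sketch; the function name is_sentence is hypothetical, and the trailing period and capitalization are simply ignored.

```python
NOUNS = {"bees", "dogs"}   # <noun> ::= bees | dogs
VERBS = {"buzz", "bite"}   # <verb> ::= buzz | bite

def is_sentence(text):
    """Return True if text fits <sentence> ::= <noun> <verb>."""
    words = text.lower().rstrip(".").split()
    return len(words) == 2 and words[0] in NOUNS and words[1] in VERBS

print(is_sentence("Dogs bite."))   # fits <noun> <verb>
print(is_sentence("Bees dogs."))   # second word is not a <verb>
```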

25
Another Example
  • Grammar for a simplified assignment statement
  • <assignment statement> ::= <variable> = <expression>
  • <expression> ::= <variable> | <variable> + <variable>
  • <variable> ::= x | y | z

26
Generated Parse Tree

27
Wrong Path

28
How to parse?
  • Parsing is a complex sequence of applying rules,
    building grammatical constructs, and checking
    whether things are moving toward the correct
    answer (the goal symbol). If not, the parser
    undoes the rule just applied and tries another.
  • Look-ahead parsing algorithms look down the
    road a few tokens to see what would happen if a
    certain choice were made.
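
The try, check, and undo cycle can be sketched as an exhaustive search. The code below is a naive illustration rather than a practical parsing algorithm: it assumes the simplified assignment grammar is <assignment statement> ::= <variable> = <expression>, <expression> ::= <variable> | <variable> + <variable>, <variable> ::= x | y | z, and the names RULES, GOAL and can_reduce are hypothetical.

```python
# Each rule is (left-hand side, right-hand side); terminals are plain strings.
RULES = [
    ("<assignment statement>", ("<variable>", "=", "<expression>")),
    ("<expression>", ("<variable>", "+", "<variable>")),
    ("<expression>", ("<variable>",)),
    ("<variable>", ("x",)),
    ("<variable>", ("y",)),
    ("<variable>", ("z",)),
]
GOAL = "<assignment statement>"

def can_reduce(symbols, seen=None):
    """Try every rule at every position; back up when a choice dead-ends."""
    seen = set() if seen is None else seen
    if symbols == (GOAL,):
        return True
    if symbols in seen:          # this sequence has already failed
        return False
    seen.add(symbols)
    for lhs, rhs in RULES:
        for i in range(len(symbols) - len(rhs) + 1):
            if symbols[i:i + len(rhs)] == rhs:
                reduced = symbols[:i] + (lhs,) + symbols[i + len(rhs):]
                if can_reduce(reduced, seen):
                    return True
                # Otherwise "undo" the rule by moving on to the next option.
    return False

print(can_reduce(("x", "=", "y", "+", "z")))
print(can_reduce(("x", "=", "x", "+", "y", "+", "z")))
```

Along the way the search takes wrong paths, such as turning the left-hand x into an <expression>, and abandons them. A look-ahead parser avoids most of that wasted work by peeking at upcoming tokens before choosing a rule.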

29
Example
Not possible to build a parse tree with the
grammar.

30
Major Challenge
  • Design a grammar that
  • Includes every valid statement that we want to be
    in the language
  • Excludes every invalid statement that we do not
    want to be in the language

31
Assignment Statement (2nd try)
  • <assignment statement> ::= <variable> = <expression>
  • <expression> ::= <variable> | <expression> + <expression>
    (recursive definition)
  • <variable> ::= x | y | z

32
Resulting Parse Tree

33
Using Recursive Definition

34
Validity vs. Ambiguity
  • It is possible to construct two parse trees of
    x = x + y + z using the 2nd grammar, which gives
    two different meanings:
  • x = (x + y) + z
  • x = x + (y + z)
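
The ambiguity can be made concrete by enumerating every parse tree for the right-hand side x + y + z. This sketch assumes the recursive rule is <expression> ::= <variable> | <expression> + <expression>; the function name expression_trees and the nested-tuple tree format are hypothetical.

```python
def expression_trees(tokens):
    """Return every parse tree for tokens under
    <expression> ::= <variable> | <expression> + <expression>."""
    if len(tokens) == 1 and tokens[0] in ("x", "y", "z"):
        return [tokens[0]]
    trees = []
    # Try each "+" as the top-level operator of the expression.
    for i, token in enumerate(tokens):
        if token == "+":
            for left in expression_trees(tokens[:i]):
                for right in expression_trees(tokens[i + 1:]):
                    trees.append((left, "+", right))
    return trees

for tree in expression_trees(["x", "+", "y", "+", "z"]):
    print(tree)
```

Two distinct trees come back, one grouping y + z first and one grouping x + y first, which is exactly what makes the grammar ambiguous.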

35
If-else grammar

36
Parse Tree

37
Phase III Semantics and Code Generation
  • <sentence> ::= <noun> <verb>
  • <noun> ::= bees | dogs
  • <verb> ::= buzz | bite
  • Possible combinations:
  • Dogs bite.
  • Dogs buzz.
  • Bees bite.
  • Bees buzz.
  • Not all combinations make sense.

38
Semantics and Code Generation
  • A compiler examines the semantics of a
    programming language statement. It analyzes the
    meaning of the tokens and tries to understand the
    actions they perform.
  • If the statement is meaningless, it is
    semantically rejected. Otherwise it is translated
    into machine language.

39
Example
  • The statement sum = a + b
  • is syntactically correct.
  • But what if the variables are defined as follows?
  • char a
  • double b
  • int sum
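
One way to picture the semantic check for sum = a + b is a symbol table lookup. The sketch below assumes a deliberately strict rule (all three types must match) purely for illustration; a language such as C would instead apply implicit conversions. The names SYMBOL_TABLE and check_assignment are hypothetical.

```python
# Declared types from the slide, held in a simple symbol table.
SYMBOL_TABLE = {"a": "char", "b": "double", "sum": "int"}

def check_assignment(target, left, right, table=SYMBOL_TABLE):
    """Check `target = left + right` under an assumed strict typing rule."""
    errors = [f"{name} is not declared"
              for name in (target, left, right) if name not in table]
    if errors:
        return errors
    if table[left] != table[right]:
        errors.append(
            f"cannot add {table[left]} {left} to {table[right]} {right}")
    elif table[target] != table[left]:
        errors.append(
            f"cannot store {table[left]} result in {table[target]} {target}")
    return errors

print(check_assignment("sum", "a", "b"))
```

An empty list means the statement passed the check; anything else is a list of reasons for rejecting it.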

40
Semantic Records
  • Each nonterminal symbol is associated with a
    semantic record, a data structure that stores
    information about a nonterminal, such as the
    actual name of the object and its data type.

41
Semantic Records (II)
  • Grows gradually.

42
Another Situation

43
Two-Stage Process
  • Semantic analysis: a pass over the parse tree to
    determine whether all branches of the tree are
    semantically valid.
  • Code generation: the compiler makes a second pass
    over the parse tree to produce the translated
    code.
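
The two passes can be sketched over a small parse tree held as nested tuples. Everything here is an assumption made for illustration: the tree format, the function names, and the PUSH/ADD/POP stack-machine instructions, which are made up and do not correspond to the machine language used in the slides' examples.

```python
def analyze(node, declared):
    """Pass 1: every leaf of the tree must be a declared variable."""
    if isinstance(node, str):
        return node in declared
    _, left, right = node
    return analyze(left, declared) and analyze(right, declared)

def generate(node, code):
    """Pass 2: emit instructions for a hypothetical stack machine."""
    if isinstance(node, str):
        code.append(f"PUSH {node}")
        return code
    operator, left, right = node
    if operator == "=":
        generate(right, code)          # compute the expression
        code.append(f"POP {left}")     # store the result in the target
    else:                              # "+"
        generate(left, code)
        generate(right, code)
        code.append("ADD")
    return code

def compile_statement(tree, declared):
    if not analyze(tree, declared):
        raise ValueError("semantic error: undeclared variable")
    return generate(tree, [])

tree = ("=", "x", ("+", "y", "z"))     # x = y + z
print(compile_statement(tree, {"x", "y", "z"}))
```

Keeping the passes separate means no code is produced for a statement that fails the semantic check.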

44
Example

45
Example (contd)

46
Example (contd)

47
Example (contd)

48
Example (contd)

49
Code Optimization
  • To make the code more efficient
  • Local optimization
  • Global optimization
  • Different from programmer optimization with
    compiler tools such as
  • Visual development environments
  • On-line debuggers
  • Reusable code libraries

50
Local Optimization
  • Look at a very small block of instructions and
    try to improve it.
  • Possible approaches
  • Constant evaluation: x = 1 + 1
  • Strength reduction: x = x * 2
  • Eliminating unnecessary operations
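
Two of these approaches can be sketched on a tiny expression tree. The tuple format (operator, left, right), the restriction to + and *, and the function name optimize are all assumptions of this illustration, not the chapter's method.

```python
def optimize(node):
    """Apply two local optimizations to an expression tree (op, left, right)."""
    if not isinstance(node, tuple):
        return node                      # a variable name or a constant
    operator, left, right = node
    left, right = optimize(left), optimize(right)
    # Constant evaluation: both operands are known at compile time.
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return left + right if operator == "+" else left * right
    # Strength reduction: replace a multiplication by 2 with an addition.
    if operator == "*" and right == 2 and isinstance(left, str):
        return ("+", left, left)
    return (operator, left, right)

print(optimize(("+", 1, 1)))
print(optimize(("*", "x", 2)))
```

Constant evaluation replaces 1 + 1 with its value before the program ever runs, and strength reduction rewrites x * 2 as x + x because addition is typically the cheaper operation.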

51
Global Optimization
  • Look at large segments of the program and decide
    how to improve performance.
  • A much harder problem.