Title: Compiler Design Chapter 4
1 Compiler Design - Chapter 4
Abstract Syntax - Semantic Actions
Abstract Syntax Trees
2 Semantic Actions in a Compiler
- Compiler must
  - Recognize if a sentence belongs to a grammar/language
  - then do something with that sentence using semantic actions
- Recursive Descent Parser
  - semantic actions are interspersed with the control flow of the parsing actions
- JavaCC
  - semantic actions are fragments of Java code attached to productions
3 Semantic Actions in a Compiler
- With each terminal and non-terminal symbol, we associate a type of semantic value representing the phrases derived from that symbol
- Tokens (terminals)
  - ID carries a semantic value of type string
  - NUM carries a semantic value of type int
- Example semantic values: ID(match0), NUM(3)
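4 Semantic Action in a Compiler
- Production: non-terminal A on the left-hand side, terminals and non-terminals on the right-hand side
- The semantic action can use the values of the matched terminals and non-terminals
- The semantic action returns a value whose type is the one associated with A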
5 Semantic Actions in Recursive Descent Parser
- Needs a semantic value type for each non-terminal and token
- Assume a lookup table mapping identifiers (e.g., variables) to their values (integers)
6 Semantic Actions in Recursive Descent Parser
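As a rough illustration of semantic actions interspersed with parsing control flow, here is a minimal Java sketch of a recursive descent interpreter for the expression grammar used in this chapter (S -> E, E -> T ( "+" T | "-" T )*, T -> F ( "*" F | "/" F )*, F -> id | num | "(" E ")"). The class name RdInterpreter, the toy lexer, and the "<EOF>" marker are assumptions made for this example. Every parsing function returns an int, the semantic value type assumed for E, T, and F.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: class, method, and token names are illustrative assumptions.
class RdInterpreter {
    private final List<String> tokens;          // output of the toy lexer
    private final Map<String, Integer> table;   // lookup table: identifier -> value
    private int next = 0;                       // index of the lookahead token

    RdInterpreter(String input, Map<String, Integer> table) {
        this.tokens = tokenize(input);
        this.table = table;
    }

    // Toy lexer: numbers, identifiers, and single-character symbols.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int j = i + 1;
            if (Character.isDigit(c)) {
                while (j < s.length() && Character.isDigit(s.charAt(j))) j++;
            } else if (Character.isLetter(c)) {
                while (j < s.length() && Character.isLetterOrDigit(s.charAt(j))) j++;
            }
            out.add(s.substring(i, j));
            i = j;
        }
        return out;
    }

    private String peek() { return next < tokens.size() ? tokens.get(next) : "<EOF>"; }

    private void eat(String expected) {
        if (!peek().equals(expected))
            throw new IllegalStateException("expected " + expected + ", found " + peek());
        next++;
    }

    // S -> E
    int s() {
        int v = e();
        eat("<EOF>");
        return v;
    }

    // E -> T ( "+" T | "-" T )*
    private int e() {
        int a = t();
        while (peek().equals("+") || peek().equals("-")) {
            String op = peek();
            next++;
            int b = t();
            a = op.equals("+") ? a + b : a - b;   // semantic action
        }
        return a;
    }

    // T -> F ( "*" F | "/" F )*
    private int t() {
        int a = f();
        while (peek().equals("*") || peek().equals("/")) {
            String op = peek();
            next++;
            int b = f();
            a = op.equals("*") ? a * b : a / b;   // semantic action
        }
        return a;
    }

    // F -> id | num | "(" E ")"
    private int f() {
        String tok = peek();
        char c = tok.charAt(0);
        if (c == '(') { next++; int v = e(); eat(")"); return v; }
        if (Character.isDigit(c)) { next++; return Integer.parseInt(tok); }
        if (Character.isLetter(c)) {
            next++;
            Integer v = table.get(tok);           // semantic action: look up the identifier
            if (v == null) throw new IllegalStateException("undefined identifier " + tok);
            return v;
        }
        throw new IllegalStateException("unexpected token " + tok);
    }

    public static void main(String[] args) {
        Map<String, Integer> table = new HashMap<>();
        table.put("x", 4);
        System.out.println(new RdInterpreter("x * (2 + 3) - 6 / 2", table).s()); // prints 17
    }
}
```

Note that the arithmetic sits directly inside the parsing functions, so this style evaluates the input without ever building a tree.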
7 Automatically Generated Parsers
- JavaCC parser specification
  - set of grammar rules, each annotated with a semantic action (Java statement)
  - the generated parser executes the action when it matches the rule
- Tools compared with JavaCC
  - JTB
    - generates syntax tree classes and inserts semantic actions into the grammar to build syntax trees
    - syntax trees supported by the generated code are not as abstract as one would desire
  - SableCC
    - no action code attached to grammar rules
    - automatically generates syntax tree classes
    - generated parser will build syntax trees
8 JavaCC version
- Grammar in JavaCC
  - S -> E
  - E -> T ( "+" T | "-" T )*
  - T -> F ( "*" F | "/" F )*
  - F -> id | num | "(" E ")"
- Note: E -> T ( "+" T | "-" T )* is equivalent to E -> T E', E' -> "+" T E' | "-" T E' | ε
9 Concrete Syntax (Parse) Trees
- Separating syntax (parsing) and semantics
  - Improves modularity
- Parse Tree
  - Data structure traversed in later compiler phases
- Concrete Syntax Tree
  - One leaf node per terminal symbol
  - One internal node per non-terminal symbol
- Inconvenient to use directly
  - Depends too much on grammar structure
  - Removal of left recursion and left factoring introduce extra non-terminals and productions
  - Many punctuation symbols are redundant for semantic analysis
10 A Simple Example
- Consider the grammar
  - E -> num | ( E ) | E + E
- Input string
  - 5 + (2 + 3)
- After lexical analysis
  - a list of tokens (with associated semantic values)
  - num(5) + ( num(2) + num(3) )
- During parsing we build a parse tree
11 Example of Concrete Parse Tree
- Traces the operation of the parser
- Does capture the nesting structure
- But too much information
  - Parentheses
  - Single-successor nodes
- Parse tree for 5 + (2 + 3), children indented under their parent:
E
  E
    num(5)
  +
  E
    (
    E
      E
        num(2)
      +
      E
        num(3)
    )
12 Abstract Syntax (Parse) Trees
- Abstract Syntax
  - Clean interface between parser and later compiler phases
- Abstract Syntax Tree
  - Represents structure of phrases with all parsing issues resolved (by concrete parse tree)
  - No semantic interpretation
13 Example of Abstract Syntax Tree
- Abstract syntax tree for 5 + (2 + 3): PLUS(5, PLUS(2, 3))
- Also captures the nesting structure
- But abstracts from the concrete syntax
  - more compact and easier to use
- What does semantic analysis care about?
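14 Dependency Graph
- A value (with associated type) must be computed after all its successors in the dependency graph have been computed
- Figure: dependency graph over the parse tree of 5 + (2 + 3), with nodes E and E1 to E5, tokens num(5), num(2), num(3), and their values 5, 2, 3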
15 Concrete Syntax vs. Abstract Syntax
- Concrete Syntax
  - Meant for parsing
- Abstract Syntax
  - Meant for building the abstract syntax tree for semantic analysis
16 JavaCC grammar with semantic actions
- Concrete syntax: for parsing
- Abstract syntax: for semantic analysis
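To make the split between concrete and abstract syntax more tangible, the following hand-written Java sketch (not JavaCC input) shows semantic actions that build abstract syntax tree nodes instead of computing values. It assumes the ambiguous grammar E -> num | ( E ) | E + E has been rewritten as E -> P ( "+" P )*, P -> num | "(" E ")" so that it can be parsed top-down. The names Exp, NumExp, PlusExp, and AstBuilder are illustrative assumptions.

```java
// Abstract syntax: what later phases see.
abstract class Exp { }

class NumExp extends Exp {
    final int value;
    NumExp(int value) { this.value = value; }
    @Override public String toString() { return Integer.toString(value); }
}

class PlusExp extends Exp {
    final Exp left, right;
    PlusExp(Exp left, Exp right) { this.left = left; this.right = right; }
    @Override public String toString() { return "PLUS(" + left + ", " + right + ")"; }
}

// Concrete syntax: handled entirely inside the parser.
class AstBuilder {
    private final String src;   // input with blanks removed
    private int pos = 0;        // offset of the next unread character

    private AstBuilder(String src) { this.src = src.replace(" ", ""); }

    static Exp parse(String input) {
        AstBuilder p = new AstBuilder(input);
        Exp e = p.expr();
        if (p.pos != p.src.length())
            throw new IllegalStateException("unexpected input at offset " + p.pos);
        return e;
    }

    // E -> P ( "+" P )*
    private Exp expr() {
        Exp left = primary();
        while (pos < src.length() && src.charAt(pos) == '+') {
            pos++;
            left = new PlusExp(left, primary());   // semantic action: build a node
        }
        return left;
    }

    // P -> num | "(" E ")"
    private Exp primary() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;
            Exp inner = expr();
            expect(')');
            return inner;                          // parentheses leave no node behind
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos)
            throw new IllegalStateException("number expected at offset " + pos);
        return new NumExp(Integer.parseInt(src.substring(start, pos)));
    }

    private void expect(char c) {
        if (pos >= src.length() || src.charAt(pos) != c)
            throw new IllegalStateException("expected '" + c + "' at offset " + pos);
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(AstBuilder.parse("5 + (2 + 3)")); // prints PLUS(5, PLUS(2, 3))
    }
}
```

The helper non-terminal P and the parentheses exist only in the parser; the resulting tree contains nothing but PLUS and number nodes.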
17 Abstract Syntax Classes
- Non-terminal: abstract class
- Production: subclass
- Each symbol in RHS of rule: field in class
- Each class: eval function that returns the value of the represented expression
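The class layout just described could look roughly like the Java sketch below, assuming the grammar E -> num | E + E (parentheses need no class of their own). The class and field names are illustrative assumptions.

```java
// One abstract class for the non-terminal E.
abstract class Exp {
    abstract int eval();   // value of the represented expression
}

// Subclass for the production E -> num; one field for the RHS symbol.
class NumExp extends Exp {
    final int value;
    NumExp(int value) { this.value = value; }
    @Override int eval() { return value; }
}

// Subclass for the production E -> E + E; one field per RHS expression.
class PlusExp extends Exp {
    final Exp e1, e2;
    PlusExp(Exp e1, Exp e2) { this.e1 = e1; this.e2 = e2; }
    @Override int eval() { return e1.eval() + e2.eval(); }
}

class EvalDemo {
    public static void main(String[] args) {
        // Tree for 5 + (2 + 3)
        Exp tree = new PlusExp(new NumExp(5), new PlusExp(new NumExp(2), new NumExp(3)));
        System.out.println(tree.eval()); // prints 10
    }
}
```

Each eval call evaluates its subtrees before combining their results, so a value is computed only after the values it depends on.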
18 Positions
- One-pass Compiler
  - Lexical Analysis, Parsing and Semantic Analysis done simultaneously
  - If a type error must be reported, the current position of the lexical analyzer is a reasonable approximation of the source position of the error
  - The current position is kept in a global variable; the error message prints the value of this variable
19 Positions
- Compiler that uses abstract-syntax tree data structures
  - Parsing and Semantic Analysis do not have to be done in one pass
  - Complicates the production of error messages
  - Lexer reaches EOF before semantic analysis starts
  - Current position of lexer (EOF) is not useful for reporting the line number of a semantic error
  - Source file position of each node in the abstract syntax tree must be remembered to report positions of semantic errors
20 Remembering Positions
- Use pos variables: the position in the source file from which the abstract syntax was derived
  - Useful error messages can then be created
- Lexer must pass start position and end position of each token to parser
- Abstract Syntax Tree Types (e.g. Exp)
  - can be augmented with a position field
  - constructor takes a pos argument to initialize this field
  - position of leaf nodes: from positions of tokens from lexical analyzer
  - position of internal nodes: from position of subtrees (tedious but straightforward)
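A minimal Java sketch of this scheme follows. It assumes positions are plain character offsets and that an internal node takes the position of its left subtree. The classes IdExp and SemanticError and the lookup-table parameter of eval are assumptions added for illustration.

```java
import java.util.HashMap;
import java.util.Map;

abstract class Exp {
    final int pos;                        // source offset this node was derived from
    Exp(int pos) { this.pos = pos; }
    abstract int eval(Map<String, Integer> table);
}

class NumExp extends Exp {
    final int value;
    NumExp(int pos, int value) { super(pos); this.value = value; }
    @Override int eval(Map<String, Integer> table) { return value; }
}

class IdExp extends Exp {
    final String name;
    IdExp(int pos, String name) { super(pos); this.name = name; }
    @Override int eval(Map<String, Integer> table) {
        Integer v = table.get(name);
        // The lexer is long past this point, so the node's own pos is reported.
        if (v == null) throw new SemanticError(pos, "undefined identifier " + name);
        return v;
    }
}

class PlusExp extends Exp {
    final Exp e1, e2;
    PlusExp(int pos, Exp e1, Exp e2) { super(pos); this.e1 = e1; this.e2 = e2; }
    @Override int eval(Map<String, Integer> table) { return e1.eval(table) + e2.eval(table); }
}

class SemanticError extends RuntimeException {
    final int pos;
    SemanticError(int pos, String message) {
        super("position " + pos + ": " + message);
        this.pos = pos;
    }
}

class PositionDemo {
    public static void main(String[] args) {
        // Hand-built tree for "x + (y + 3)": x at offset 0, y at offset 5, 3 at offset 9.
        Exp x = new IdExp(0, "x");
        Exp y = new IdExp(5, "y");
        Exp inner = new PlusExp(y.pos, y, new NumExp(9, 3));   // position from left subtree
        Exp tree = new PlusExp(x.pos, x, inner);

        Map<String, Integer> table = new HashMap<>();
        table.put("x", 1);                                     // y is deliberately missing
        try {
            tree.eval(table);
        } catch (SemanticError e) {
            System.out.println(e.getMessage()); // prints position 5: undefined identifier y
        }
    }
}
```

A real compiler would usually convert the offset to a line and column before printing, and might also store the end position that the lexer passes along.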