Title: A brief yacc tutorial
1. A brief yacc tutorial
- Saumya Debray
- The University of Arizona
- Tucson, AZ 85721
2. Yacc Overview
- Parser generator
- Takes a specification for a context-free grammar.
- Produces code for a parser.
- Input: a set of grammar rules and actions.
- Tool: yacc (or bison).
- Output: C code implementing a parser function yyparse(); the default output file is y.tab.c.
3. Scanner-Parser interaction
- Parser assumes the existence of a function int yylex() that implements the scanner.
- Scanner:
- return value indicates the type of token found
- other values communicated to the parser using yytext, yylval (see man pages).
- Yacc determines integer representations for tokens:
- Communicated to scanner in file y.tab.h
- use yacc -d to produce y.tab.h
- Token encodings:
- end of file: represented by 0
- a character literal: its ASCII value
- other tokens: assigned numbers ≥ 257.
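The token conventions above can be illustrated with a small hand-written scanner. This is only a sketch under stated assumptions: the token codes INTCON and ID are hypothetical stand-ins for what y.tab.h would define, yylval and yytext are defined locally so the fragment stands alone (normally the generated parser and lex provide them), and the helper scan_set_input() and its in-memory input string are inventions for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token codes; in a real build these come from y.tab.h (yacc -d). */
#define INTCON 257
#define ID     258

int  yylval;        /* extra value of the current token (here simply an int) */
char yytext[64];    /* text of the current token */

static const char *scan_pos = "";   /* assumed input source: an in-memory string */

/* Hypothetical helper: point the scanner at a string to scan. */
void scan_set_input(const char *s)
{
    assert(s != NULL);
    scan_pos = s;
}

int yylex(void)
{
    size_t n = 0;

    while (isspace((unsigned char)*scan_pos))
        scan_pos++;                          /* skip white space */

    if (*scan_pos == '\0')
        return 0;                            /* end of input is reported as 0 */

    if (isdigit((unsigned char)*scan_pos)) {
        while (isdigit((unsigned char)*scan_pos) && n < sizeof yytext - 1)
            yytext[n++] = *scan_pos++;
        yytext[n] = '\0';
        yylval = atoi(yytext);               /* the value travels through yylval */
        return INTCON;                       /* named token: code >= 257 */
    }

    if (isalpha((unsigned char)*scan_pos)) {
        while (isalnum((unsigned char)*scan_pos) && n < sizeof yytext - 1)
            yytext[n++] = *scan_pos++;
        yytext[n] = '\0';
        return ID;
    }

    yytext[0] = *scan_pos;                   /* any other character is its own token */
    yytext[1] = '\0';
    return (unsigned char)*scan_pos++;       /* character literal: its character code */
}
```

Calling scan_set_input("x1 + 42") and then yylex() repeatedly would yield ID, '+', INTCON (with yylval set to 42), and finally 0.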
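4. Using Yacc
- lexical rules → flex → lex.yy.c, which defines yylex()
- grammar rules → yacc → y.tab.c, which defines yyparse()
- yacc -d also produces y.tab.h
- yacc -v also produces y.output, which describes states, transitions of parser (useful for debugging)
- At run time: input → yylex() → tokens → yyparse() → parsed input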
5. int yyparse()
- Called once from main() (user-supplied)
- Repeatedly calls yylex() until done
- On syntax error, calls yyerror() (user-supplied)
- Returns 0 if all of the input was processed
- Returns 1 if aborting due to syntax error.
- Example:
    int main() { return yyparse(); }
6. yacc input format
- A yacc input file has the following structure:
    definitions
    %%            (required)
    rules
    %%            (optional)
    user code
- Shortest possible legal yacc input:
    %%
7. Definitions
- Information about tokens:
- token names
- declared using %token
- single-character tokens don't have to be declared
- any name not declared as a token is assumed to be a nonterminal.
- start symbol of grammar, using %start (optional)
- operator info:
- precedence, associativity
- stuff to be copied verbatim into the output (e.g., declarations, #includes): enclosed in %{ ... %}
8. Rules
- Rule RHS can have arbitrary C code embedded, within { ... }. E.g.:
    A : B1 { printf("after B1\n"); x = 0; } B2 { x++; } B3 ;
- Left-recursion more efficient than right-recursion:
    A : A x    rather than    A : x A
9. Conflicts
- Conflicts arise when there is more than one way to proceed with parsing.
- Two types:
- shift-reduce (default action: shift)
- reduce-reduce (default: reduce with the first rule listed)
- Removing conflicts:
- specify operator precedence, associativity
- restructure the grammar
- use y.output to identify reasons for the conflict.
12. Specifying Operator Properties
- Binary operators: %left, %right, %nonassoc
    %left '+' '-'
    %left '*' '/'
    %right '^'
- Unary operators: %prec
- Changes the precedence of a rule to be that of the token specified. E.g.:
    %left '+' '-'
    %left '*' '/'
    %%
    expr : expr '+' expr
         | '-' expr %prec '*'
         | ...
- Operators in the same group have the same precedence.
- Across groups, precedence increases going down.
- The rule for unary '-' has the same (high) precedence as '*'.
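How these declarations settle a shift-reduce conflict can be sketched as a small decision function. The comparison follows the behavior documented for yacc and bison: a rule takes the precedence of its last terminal (or of the %prec token), and that is compared with the precedence of the lookahead token. The function itself is not part of yacc, and the names resolve, enum assoc and enum action, as well as the integer precedence levels, are hypothetical.

```c
#include <assert.h>

enum assoc  { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE };
enum action { DO_SHIFT, DO_REDUCE, DO_ERROR };

/* Precedence levels are hypothetical small integers: a larger number stands for
 * a declaration that appears further down, i.e. a higher precedence. */
enum action resolve(int rule_prec, int tok_prec, enum assoc tok_assoc)
{
    assert(rule_prec > 0 && tok_prec > 0);   /* both sides need a declared precedence */

    if (rule_prec > tok_prec) return DO_REDUCE;   /* rule binds tighter */
    if (rule_prec < tok_prec) return DO_SHIFT;    /* lookahead binds tighter */

    /* Same precedence: associativity decides. */
    switch (tok_assoc) {
    case ASSOC_LEFT:  return DO_REDUCE;
    case ASSOC_RIGHT: return DO_SHIFT;
    default:          return DO_ERROR;            /* %nonassoc: syntax error */
    }
}
```

With levels 1 for '+' and '-', 2 for '*' and '/', the call resolve(1, 2, ASSOC_LEFT) models seeing '*' after expr '+' expr and returns DO_SHIFT, while resolve(2, 1, ASSOC_LEFT) models a unary minus rule with %prec '*' followed by '+' and returns DO_REDUCE.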
13. Error Handling
- The token error is reserved for error handling:
- can be used in rules
- suggests places where errors might be detected and recovery can occur.
- Example:
    stmt : IF '(' expr ')' stmt
         | IF '(' error ')' stmt
         | FOR ...
         ...
- The rule containing error is intended to recover from errors in expr.
14. Parser Behavior on Errors
- When an error occurs, the parser:
- pops its stack until it enters a state where the token error is legal
- then behaves as if it saw the token error
- performs the action encountered
- resets the lookahead token to the token that caused the error.
- If no error rules specified, processing halts.
15. Controlling Error Behavior
- Parser remains in error state until three tokens successfully read in and shifted:
- prevents cascaded error messages
- if an error is detected while parser in error state:
- no error message given
- input token causing the error is deleted.
- To force the parser to believe that an error has been fully recovered from:
- yyerrok
- To clear the token that caused the error:
- yyclearin
16. Placing error tokens
- Some guidelines:
- Close to the start symbol of the grammar:
- To allow recovery without discarding all input.
- Near terminal symbols:
- To allow only a small amount of input to be discarded on an error.
- Consider tokens like ')', ';' that follow nonterminals.
- Without introducing conflicts.
17. Error Messages
- On finding an error, the parser calls a function
- void yyerror(char *s) /* s points to an error msg */
- user-supplied, prints out error message.
- More informative error messages:
- int yychar: token no. of token causing the error.
- user program keeps track of line numbers, as well as any additional info desired.
18. Error Messages example
    #include <stdio.h>
    #include "y.tab.h"

    extern int yychar, curr_line;

    /* Print the token that caused the error. */
    static void print_tok()
    {
        if (yychar < 255) {
            fprintf(stderr, "%c", yychar);   /* single-character token */
        }
        else {
            switch (yychar) {
            case ID:
                ...
            case INTCON:
                ...
            ...
            }
        }
    }

    void yyerror(char *s)
    {
        fprintf(stderr, "line %d: %s", curr_line, s);
        print_tok();
    }
19. Debugging the Parser
- To trace the shift/reduce actions of the parser:
- when compiling:
- #define YYDEBUG 1
- at runtime:
- set yydebug = 1 /* extern int yydebug; */
20. Adding Semantic Actions
- Semantic actions for a rule are placed in its body:
- an action consists of C code enclosed in { }
- may be placed anywhere in rule RHS
- Example:
    expr : ID { symTbl_lookup(idname); }
    decl : type_name { tval = ... } id_list ;
21. Synthesized Attributes
- Each nonterminal can return a value.
- The return value for a nonterminal X is returned to a rule that has X in its body, e.g.:
    A : ... X ...      (the value returned by X is available here)
    X : ...
- This is different from the values returned by the scanner to the parser!
22. Attribute return values
- To access the value returned by the ith symbol in a rule body, use $i
- an action occurring in a rule body counts as a symbol. E.g.:
    decl : type { tval = $1 } id_list { symtbl_install($3, tval); }
           $1   $2            $3      $4
- To set the value to be returned by a rule, assign to $$
- by default, the value of a rule is the value of its first symbol, i.e., $1.
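One way to picture $i and $$ is as slots on a value stack that runs alongside the parser's state stack. The fragment below is a simplified model of that idea, not the code yacc generates; the names vstack, push_value, reduce and add_action are hypothetical, and values are plain ints.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 64

static int vstack[STACK_MAX];   /* hypothetical value stack: one slot per symbol */
static int vtop = 0;            /* number of values currently on the stack */

/* A shift pushes the token's value (what the scanner left in yylval). */
void push_value(int v)
{
    assert(vtop < STACK_MAX);
    vstack[vtop++] = v;
}

/* Reduce by a rule whose body has n symbols. `action` plays the role of the
 * rule's { ... } block: rhs[0] is $1, rhs[1] is $2, and so on; the value it
 * returns becomes $$. With no action, $$ defaults to $1. */
int reduce(int n, int (*action)(const int *rhs))
{
    int result;

    assert(n >= 0 && n <= vtop);
    result = (n > 0) ? vstack[vtop - n] : 0;   /* default value: $1 (0 for an empty body) */
    if (action != NULL)
        result = action(&vstack[vtop - n]);
    vtop -= n;                                 /* pop the values of the body ... */
    push_value(result);                        /* ... and push the value of the left-hand side */
    return result;
}

/* Models the action in:  expr : expr '+' expr  { $$ = $1 + $3; }  */
int add_action(const int *rhs)
{
    return rhs[0] + rhs[2];
}
```

Pushing 2, '+' and 40 and then calling reduce(3, add_action) leaves the single value 42 on the stack, which is what a rule higher up would then see as one of its own $i.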
23. Example
    /* A variable declaration is an identifier followed by
       an optional subscript, e.g., x or x[10] */
    var_decl : ident opt_subscript
        {
            if ( symtbl_lookup($1) != NULL )
                ErrMsg("multiple declarations", $1);
            else {
                st_entry = symtbl_install($1);
                if ( $2 == ARRAY ) {
                    st_entry->base_type = ARRAY;
                    st_entry->element_type = tval;
                }
                else {
                    st_entry->base_type = tval;
                    st_entry->element_type = UNDEF;
                }
            }
        }
      ;
    opt_subscript : '[' INTCON ']'   { $$ = ARRAY; }
                  | /* null */       { $$ = INTEGER; }
      ;
24. Declaring Return Value Types
- Default type for nonterminal return values is int.
- Need to declare return value types if nonterminal return values can be of other types:
- Declare the union of the different types that may be returned:
    %union {
        symtbl_ptr st_ptr;
        id_list_ptr list_of_ids;
        tree_node_ptr syntax_tree_ptr;
        int value;
    }
- Specify which union member a particular grammar symbol will return:
    %token <value> INTCON, CHARCON;        (terminals)
    %type <st_ptr> identifier;             (nonterminals)
    %type <syntax_tree_ptr> expr, stmt;    (nonterminals)
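On the C side, the union declaration becomes the type of yylval and of every slot on the parser's value stack. The fragment below sketches roughly what ends up in y.tab.h and how a scanner would fill the member named in the %token declaration. The struct names behind the pointer typedefs, the token code for INTCON, and the helper scan_intcon are assumptions made only so the fragment stands alone.

```c
#include <assert.h>

/* Hypothetical stand-ins for the pointer types named in the %union. */
typedef struct symtbl_entry *symtbl_ptr;
typedef struct id_list      *id_list_ptr;
typedef struct tree_node    *tree_node_ptr;

/* Roughly what the %union declaration turns into. */
typedef union {
    symtbl_ptr    st_ptr;
    id_list_ptr   list_of_ids;
    tree_node_ptr syntax_tree_ptr;
    int           value;
} YYSTYPE;

YYSTYPE yylval;      /* normally defined by the generated parser */

#define INTCON 257   /* hypothetical token code; really assigned by yacc */

/* What a scanner action for an integer constant would do: since INTCON is
 * declared as %token <value>, the parser reads yylval.value for this token. */
int scan_intcon(int n)
{
    assert(n >= 0);
    yylval.value = n;
    return INTCON;
}
```

In a rule such as opt_subscript : '[' INTCON ']', a reference to $2 would then be translated into an access to the value member, with no cast needed in the action.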
25. Conflicts
- A conflict occurs when the parser has multiple possible actions in some state for a given next token.
- Two kinds of conflicts:
- shift-reduce conflict
- The parser can either keep reading more of the input (shift action), or it can mimic a derivation step using the input it has read already (reduce action).
- reduce-reduce conflict
- There is more than one production that can be used for mimicking a derivation step at that point.
26. Example of a conflict
- Grammar rules:
    S → if ( e ) S              /* 1 */
      | if ( e ) S else S       /* 2 */
- Input: if ( e1 ) if ( e2 ) S2 else S3
- Parser state when input token = else:
- Input already seen: if ( e1 ) if ( e2 ) S2
- Choices for continuing:
- 1. keep reading input (shift):
- else part of innermost if
- eventual parse structure: if (e1) { if (e2) S2 else S3 }
- 2. mimic derivation step using S → if ( e ) S (reduce):
- else part of outermost if
- eventual parse structure: if (e1) { if (e2) S2 } else S3
- This is a shift-reduce conflict.
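The effect of the shift choice can be reproduced with a small hand-written recognizer for the same two rules. This is a recursive-descent analogue, not yacc output: taking an else as soon as one is available corresponds to yacc's default of shifting. The one-character token encoding and the names parse_stmt and else_binding are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* One character per token (an assumed encoding):
 *   'i' = "if ( e )",  's' = a simple statement,  'e' = "else". */
static const char *tok;
static int else_level;   /* nesting level of the if that took the last else; 0 = none */

/* Parse one statement; `level` is the nesting level an if starting here would have. */
static int parse_stmt(int level)
{
    if (*tok == 's') {
        tok++;
        return 1;
    }
    if (*tok == 'i') {
        tok++;
        if (!parse_stmt(level + 1))     /* the then-branch */
            return 0;
        if (*tok == 'e') {              /* an else is available: take it now ("shift") */
            tok++;
            else_level = level;
            return parse_stmt(level + 1);
        }
        return 1;                       /* no else: finish this if ("reduce") */
    }
    return 0;                           /* syntax error */
}

/* Returns the nesting level of the if that the last else attached to
 * (1 = outermost), 0 if there was no else, or -1 on a syntax error. */
int else_binding(const char *input)
{
    assert(input != NULL);
    tok = input;
    else_level = 0;
    if (!parse_stmt(1) || *tok != '\0')
        return -1;
    return else_level;
}
```

For the input of this slide, encoded as "iises", else_binding returns 2: the else goes with the innermost if, which is the first of the two parse structures listed above.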
27. Handling Conflicts
- General approach:
- Iterate as necessary:
- Use yacc -v to generate the file y.output.
- Examine y.output to find parser states with conflicts.
- For each such state, examine the items to figure out why the conflict is occurring.
- Transform the grammar to eliminate the conflict.
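28. Understanding y.output
- Input grammar:
    S : '0' S '0'
      | /* epsilon */
      ;
- y.output file:
    0  $accept : S $end
    1  S : '0' S '0'
    2    |
    1: shift/reduce conflict (shift 1, reduce 2) on '0'
    state 1
      ...
- Rule 0 is an artificial rule, introduced by yacc.
- Rules 1 and 2 are the rules from the input grammar; the numbers on the left are rule numbers.
- The "1:" in front of "shift/reduce conflict" indicates which state has the conflict.
- The entry starting with "state 1" gives information about state 1.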
29. Understanding y.output (cont'd)
- 1: shift/reduce conflict (shift 1, reduce 2) on '0'
- the leading "1:" is the no. of the state with the conflict
- nature of the conflict:
- shift 1: shift and go to state 1
- reduce 2: reduce using rule 2 (S → ε)
- '0' is the input token on which the conflict occurs
- state 1:
    S : '0' . S '0'
    S : .
- What's going on?
- Haven't reached midpoint of input string:
- seen 0, looking to process S
- ⇒ if input has 0, continue processing
- ⇒ shift
- Reached midpoint of input string:
- next 0 in input is from second half of string
- ⇒ if input is 0, reduce using (S → ε)