Title: PL/0 and the 655 Project
2 CIS 655 Project PL/0
- Niklaus Wirth, Algorithms + Data Structures = Programs, 1976, Prentice-Hall (ISBN 0-13-022418-9)
- PL/0 subset of Pascal
- Illustrated the way the Pascal P-code compiler was built
- http://www.cs.rochester.edu/u/www/courses/254/PLzero/guide/guide.html
- 655 Project (100 pts, 40% of total grade)
- Project proposal (10 pts for turning in, revisable)
- Parser
- Intermediate step (non-graded) (but 10 pts for turning in)
- Input in syntax of programming language you're building a compiler/interpreter for
- Some kind of output, maybe with XML markup
- Develop your own test cases
- Freedom to make the project into something you'll enjoy and be proud of
3 Traditional 655 Programming Project(s)
- Hybrid compiler / interpreter for small language with C/Java-style syntax
- Transform a high-level language to low-level form
- Reasonable use of tools and software encouraged
- Primarily individual, some 2-person groups
- Build on what you already know (e.g., 560)
- Project done in stages
- Proposals in class, demonstrations, documentation
- Develop your own appropriate tests
- Software to CIS computers using submit command
- Possible alternatives may be proposed
- No Perl implementations (use C or Java)
4 655 Project Ideas
- Small language like C or Pascal or Basic
- Mathis has used since 1971 in various forms
- Lisp / Scheme interpreter
- Very similar to other sections of 655
- XML parser, XSLT processor
- Experimental but of current interest
- Project Tests
- Responsible for writing test cases
- Grader will review, not do
- Illustrate all the features you're claiming
- Illustrate all your error checking
- Project you can describe with pride
5 655 Project Options
- Encourage you to make this into something you'll enjoy and be proud of
- Flexibility probably unusual
- Available resources (books, Internet, etc.)
- Acknowledge their use
- Do significant work of your own
- Many different backgrounds and interests
- Proposals required as a first step: you may want an alternative language or alternative techniques.
- Target machine: PL/0 machine (easy to expand); Java Virtual Machine (JVM) has too many checks
- Evolving write-up and software together
6 Project Basics
- Language processor
- Define the syntax rules for simple language you want to process (PL/0, C subset, Lisp)
- Convert from high-level language to low-level directly machine executable version
- Using PL/0 machine and interpreter is easiest
- Documentation about use of your processor (use cases and user documentation)
- Test cases to illustrate your project's capabilities (use cases to test cases and test driven development)
7 Project Stages
- Proposal (preliminary write-up) for project
- By e-mail to grader
- Simple parser for simple imperative language
- To exercise submit process
- Simple interpreter
- (step that doesn't have to be turned in)
- Final complete project
- significant write-up electronic submission
- No additional program in Lisp/Scheme
8 655 Project Unified Process
- Unified Software Development Process (Rational)
- Unified Modeling Language (UML)
- Only to begin understanding, not required to use
- Unified Process
- Inception Phase: begin understanding the problem and what you might do
- Spiral Approach: try to have some partial version at each stage
- Project Report - proposal - introductory part
- Risk analysis: small steps rather than being overwhelmed, some small test programs
9 Possible Approaches to Project
- Do a good job with what you know
- Use Resolve/C++ to implement compiler
- Extension of what you did in CSE 321 & 560
- Add skills described in this course
- Use Pascal PL/0 example as a guide
- Implement in Java as talked about in class
- Investigate additional skills
- Use Visual C or Lisp for different language
- Define your own approach to the project
- Project Step 1: Detailed Proposal
- Test Driven Development (based on use cases) for refinement
- Refactoring: improving design of existing code
10 Test Input
- It's your language, your implementation, and you know the features and restrictions, therefore
- You supply the test input (lots of tests)
- Sample input programs/syntax in your language
- Intermediate results of tokenizing
- Intermediate results of code generation
- Top-level execution
- Tell the grader what he should expect when running the tests and why you chose what you did (show off this or that feature, exercise an error message, clever program in your language)
- Illustrate the capabilities of your language and processor
- Syntax error processing not required
- Grader not testing your project, but evaluating if you adequately tested
- Even if code is generated by JavaCC or done with pre-built classes (StringTokenizer for example)
- Build test cases and automatically run them at some point in build
12 Why Study PL/0
- Need to look at large programs
- PL/0 is a classic
- Understand how Pascal (and Algol and Ada) works
- Local variables
- Recursion
- How recursive descent parsing works
- How typical language features added
- Code generation
- Working of a computer interpreter/emulator
- See how everything is brought together
13 PL/0 program structure
- Code for body of procedure after declarations of subprograms (main code is at end of listing)
- Initialize keyword arrays, operator symbols, mnemonics, and so forth
- Initialize variables controlling scanning (getting the individual characters), lexical analysis (forming tokens), and parsing
- Call the <block> recognizing routine
- Note that block ends with a call to listcode
- Call the virtual machine interpreter
- Machine code kept in an array between phases
- Need to add to the output capabilities of PL/0
14 Simple Syntax Processing Prerequisites
- 321, 560, 625: basics of processing simple langs.
- 655 to advance and unify that understanding
- (multi-char) symbols are syntax components
- Low-level, read a character at a time and build symbols / tokens (PL/0 getsym, getch, low-level)
- Higher-level implementation language might have string tokenizer in language (Java, StringTokenizer)
- Compiler generating tools: lex/yacc, JavaCC
- Wirth approach to describing syntax: graphs as flow charts for programming a parser, recursive descent
- Textbook Chapters 3 & 4
15 Specification of Syntax
- PL/0
- How the nesting of expression, term and factor in PL/0 work together and generate code
- How the nesting of recognition routines has the effect of static scoping
- Project questions and answers
- UML Use Case Modeling
- General Problem of Describing Syntax
- Recursive Descent Parsing
- Attribute Grammars
- Describing the Meanings of Programs: Dynamic Semantics
16 Unstated Assumptions
- Input program read top-to-bottom, left-to-right, with no backtracking
- Things declared before they are used
- No redefining at same level
- Inner declarations hidden by nesting
- Inner can locally hide outer declarations
- Other information about the language not specified with the BNF
- Identifier length
- Maximum integer value
- Other restrictions on your compiler
- Symbol table size
- Code array size
- Specify these in your description of your language processor
- Recognize the restrictions you've implied
17 Syntax, semantics, language
- Syntax - the form or structure of the expressions, statements, and program units
- Semantics - the meaning of the expressions, statements, and program units
- Sentence - string of characters over some alphabet (maybe what are usually words)
- Language - set of sentences
- Lexeme - lowest level syntactic unit of a language (e.g., *, sum, begin)
- Token - category of lexemes (e.g., identifier)
18 Language (following Wirth)
- L = L(T, N, P, S)
- Vocabulary T of terminal symbols
- Set N of non-terminal symbols (grammatical categories)
- Set P of productions (syntactical rules)
- Symbol S (from N) called the start symbol
- Language is set of sequences of terminal symbols that can be generated (directly or indirectly) (that's his points 3 and 4)
19 Backus Normal Form (1959)
- Invented by John Backus to describe Algol 58
- BNF is equivalent to context-free grammars
- A metalanguage is a language used to describe another language.
- In BNF, abstractions are used to represent classes of syntactic structures--they act like syntactic variables (also called nonterminal symbols)
- e.g. <while_stmt> -> while <logic_expr> do <stmt>
- This is a rule; it describes the structure of a while statement
20 Syntax rules
- A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and non-terminal symbols
- A grammar is a finite nonempty set of rules
- An abstraction (or non-terminal symbol) can have more than one RHS
- <stmt> -> <single_stmt> | begin <stmt_list> end
- Syntactic lists are described in BNF using recursion
- <ident_list> -> ident | ident, <ident_list>
- A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)
21 An example grammar
- <program> -> <stmts>
- <stmts> -> <stmt> | <stmt> ; <stmts>
- <stmt> -> <var> = <expr>
- <var> -> a | b | c | d
- <expr> -> <term> + <term> | <term> - <term>
- <term> -> <var> | const
22 An example derivation
- <program> => <stmts>
- => <stmt>
- => <var> = <expr>
- => a = <expr>
- => a = <term> + <term>
- => a = <var> + <term>
- => a = b + <term>
- => a = b + const
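The derivation on this slide can be replayed mechanically. The following is a minimal Python sketch, assuming the example grammar reads with its `=`, `+`, `;` and `|` symbols in place; the `GRAMMAR` table, the `leftmost_derivation` helper, and the list of rule choices are hypothetical names introduced only for illustration.

```python
# Example grammar: each non-terminal maps to its alternative right-hand sides.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}


def leftmost_derivation(start, choices):
    """Expand the leftmost non-terminal once per entry in `choices`.

    Each choice is the index of the alternative to apply. Returns every
    sentential form produced, beginning with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Locate the leftmost non-terminal in the current sentential form.
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:pos] + GRAMMAR[form[pos]][choice] + form[pos + 1:]
        steps.append(" ".join(form))
    return steps


# Alternatives picked at each step to derive the sentence "a = b + const".
steps = leftmost_derivation("<program>", [0, 0, 0, 0, 0, 0, 1, 1])
for n, form in enumerate(steps):
    print(("   " if n == 0 else "=> ") + form)
```

Every intermediate string in `steps` is a sentential form; only the last one, which contains no non-terminals, is a sentence.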
23 Derivation explanation
- Every string of symbols in the derivation is a sentential form
- A sentence is a sentential form that has only terminal symbols
- A leftmost derivation is one in which the leftmost non-terminal in each sentential form is the one that is expanded
- A derivation may be neither leftmost nor rightmost
- Parse tree is a hierarchical representation of a derivation
24 Parsing: another view
25 Static Semantics
- Other information about the language not specified with the BNF
- Identifier length
- Maximum integer value
- Other restrictions on your compiler
- Symbol table size
- Code array size
- Specify these in your description of your language processor
- Recognize the restrictions you've implied
26 Unstated Assumptions
- Input program read top-to-bottom, left-to-right, with no backtracking
- Things declared before they are used
- No redefining at same level
- Inner declarations hidden by nesting
- Inner can locally hide outer declarations
27 Ambiguity & Right Recursive
- A grammar is ambiguous iff (if and only if) it generates a sentential form that has two or more distinct parse trees
- If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity
- Operator associativity can also be indicated by a grammar
- <expr> -> <expr> + <expr> | const (ambiguous)
- <expr> -> <expr> + const | const (unambiguous)
- Left recursive (left associative) (recursive descent will require right recursive)
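To make the ambiguity concrete, the sketch below counts the parse trees that the ambiguous rule `<expr> -> <expr> + <expr> | const` allows for a chain of constants, and then shows why the choice of grouping matters once the operator is not associative. This is an illustrative Python sketch; the function names are hypothetical, and subtraction is used in the second half only because it makes the difference between left and right grouping visible.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parse_trees(n):
    """Number of parse trees for n constants under <expr> -> <expr> + <expr> | const."""
    if n == 1:
        return 1  # <expr> -> const
    # Any of the n-1 operators can sit at the root of the tree.
    return sum(count_parse_trees(k) * count_parse_trees(n - k) for k in range(1, n))


def eval_left(values):
    """Group to the left, as a left-recursive rule does: ((a - b) - c)."""
    result = values[0]
    for v in values[1:]:
        result = result - v
    return result


def eval_right(values):
    """Group to the right, as a right-recursive rule does: (a - (b - c))."""
    if len(values) == 1:
        return values[0]
    return values[0] - eval_right(values[1:])


for n in (1, 2, 3, 4):
    print(n, "constants:", count_parse_trees(n), "parse tree(s)")
print("left grouping of 8 - 3 - 2:", eval_left([8, 3, 2]))
print("right grouping of 8 - 3 - 2:", eval_right([8, 3, 2]))
```

A grammar with more than one tree for the same sentence leaves the grouping undecided; the unambiguous left-recursive rule fixes it to the left grouping.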
28 Extended BNF (abbreviations)
- Optional parts are placed in brackets ([])
- <proc_call> -> ident [ ( <expr_list> ) ]
- Put alternative parts of RHSs in parentheses and separate them with vertical bars
- <term> -> <term> (+ | -) const
- Put repetitions (0 or more) in braces ({})
- <ident> -> letter {letter | digit}
29 BNF / EBNF
- BNF
- <expr> -> <expr> + <term>
-         | <expr> - <term>
-         | <term>
- <term> -> <term> * <factor>
-         | <term> / <factor>
-         | <factor>
- EBNF
- <expr> -> <term> {(+ | -) <term>}
- <term> -> <factor> {(* | /) <factor>}
30 Syntax Graphs
- Put the terminals in circles or ellipses and put the non-terminals in rectangles
- Connect with lines with arrowheads
- e.g., Pascal type declarations
31 Wirth's Rules
- B1: Reduce system of syntax graphs to a few of reasonable size (not consistent with modern Java)
- B2: Translate each graph to a procedure according to subsequent rules
- B3: Sequence of elements translates to
- begin T(S1); T(S2); ...; T(Sn) end, or { T(S1); T(S2); ...; T(Sn) }
- procedure TSx() begin TS1(); getsym(); TS2(); getsym(); ... end
32 <term> -> <factor> {(* | /) <factor>}
    { Pascal comment }
    begin
      factor;
      while sym in [times, slash] do
      begin
        mulop := sym; getsym; factor; gen_proper_op
      end
    end
33 <term> -> <factor> {(* | /) <factor>}
    void term() {
      factor();               /* parse the first factor */
      while (next_token == aster_code ||
             next_token == slash_code) {
        lexical();            /* get next token */
        factor();             /* parse the next factor */
      }
    }
34 Recursive Descent Parsing
- Parsing - constructing a parse / derivation tree for a given input string
- Lexical analyzer is called by the parser
- A recursive descent parser traces out a parse tree in top-down order; it is a top-down parser
- Each non-terminal in the grammar has a subprogram associated with it; the subprogram parses all sentential forms that the nonterminal can generate
- The recursive descent parsing subprograms are built directly from the grammar rules
- Recursive descent parsers cannot be built from left-recursive grammars
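As an illustration of one subprogram per non-terminal, here is a small Python sketch of a recursive descent parser for the EBNF rules for `<expr>` and `<term>`. The rule `<factor> -> number | ( <expr> )` is an assumption added so the example is complete, and the `tokenize`, `Parser`, and `parse` names are hypothetical. The EBNF repetition braces become `while` loops, which sidesteps left recursion while still producing left-associative trees.

```python
import re


def tokenize(text):
    """Split the input into numbers, operators, and parentheses."""
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # <expr> -> <term> {(+ | -) <term>}
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # earlier operands nest on the left
        return node

    def term(self):
        # <term> -> <factor> {(* | /) <factor>}
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # Assumed rule: <factor> -> number | ( <expr> )
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("8 - 3 - 2"))
print(parse("2 + 3 * (4 - 1)"))
```

The nesting of `expr`, `term`, and `factor` is what gives `*` and `/` higher precedence than `+` and `-`: a term is fully parsed before the enclosing expression sees the next operator.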
35 PL/0 Program Structure
- Initialize keyword arrays, operator symbols, mnemonics, and so forth
- Initialize variables controlling scanning (getting the individual characters), lexical analysis (forming tokens), and parsing
- Call the <block> recognizing routine
- Note that block ends with a call to listcode
- Call the virtual machine interpreter
- Machine code kept in an array between phases
- Need to add to the output capabilities of PL/0
36 Blocks and Static Scoping
- Blocks are different than sequences of statements or compound statements
- Blocks can include declarations
- Sort of like a single use subprogram used and defined here
- Where can blocks appear?
- Ada: almost anywhere a statement could be
- Pascal: only as bodies of procedures
- Java: inner classes
37 Data Specific to a Procedure
- To be able to return from call
- Program address of its call (return address)
- Address of data segment of caller
- Keep in data segment of procedure as
- RA (return address), DL (dynamic link)
- Location of variables
- Relative address only (since memory dynamic)
- Displacement off base address of appropriate data segment (locally B register or by descending chain of static links)
- What does static scoping mean here?
38 Example of Static Scoping
    void a {
      local variable one
      void b {
        local variable two
        void c {
          local variable three
          // beginning of code for c
          reference one, two
          call b
        } // end of c
        // beginning of code for b
        reference one, two
        call c
      } // end of b
      // beginning of code for a
      call b
    } // end of a
- a -> b -> c -> b
39 Example of Static Scoping
- In a, one is local
- In b, two is local
- In b, one is a single static level out
- In c, three is local
- In c, two is a single static level out
- In c, one is double static levels out
- Then c calls b
- In b, one is still a single static level out
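The level differences listed here depend only on where names are declared, not on who called whom. The Python sketch below computes them for the nested procedures a, b, and c; the two lookup tables and the `level_difference` helper are hypothetical, and the visibility check is deliberately simplistic (it is only valid for a single chain of nested procedures like this one).

```python
# Static nesting level of each procedure's block (a is outermost).
PROC_LEVEL = {"a": 0, "b": 1, "c": 2}
# Procedure in which each variable is declared.
DECLARED_IN = {"one": "a", "two": "b", "three": "c"}


def level_difference(proc, var):
    """Static levels between the procedure using `var` and the one declaring it."""
    diff = PROC_LEVEL[proc] - PROC_LEVEL[DECLARED_IN[var]]
    if diff < 0:
        # Declared in a more deeply nested block, so not visible here.
        raise NameError(f"{var} is not visible in {proc}")
    return diff


for proc, var in [("a", "one"), ("b", "two"), ("b", "one"),
                  ("c", "three"), ("c", "two"), ("c", "one")]:
    print(f"in {proc}, {var} is {level_difference(proc, var)} static level(s) out")
```

Because the function never looks at the call chain, `level_difference("b", "one")` gives the same answer whether b was called from a or from c, which is the point of the last bullet.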
40 Block Recognition Processing
- Block(level, symbolTableStartingIndex)
- Page 13, left
- <block> -> <const_decl> <var_decl> <proc_decl> <statement_body>
- <proc_decl> -> procedure <name> <block>
- Recognize inner block
- Block(currentLevel+1, currentSymbolTableIndex)
- Jump around declarations
    tx0 := tx; table[tx0].adr := cx; gen(jmp,0,0);
    ...
    code[table[tx0].adr].a := cx;
    table[tx0].adr := cx;
    statement();
    gen(opr,0,0)  { return }
41 Symbol Table and Static Scope
- Variable declaration: storage allocated by incrementing DX (data index) by 1
- Initially DX is 3 to allocate space for the block mark (RA, DL, and SL)
- Symbol table (table)
- enter - enter object into table
- Nested in block which determines static scoping
- Recursive calls make table act like a stack
- position - find identifier id in table
- Linear search backward
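A minimal Python sketch of this stack-like symbol table follows. It mirrors the roles of `enter` and `position`, but the class itself, the `mark`/`release` helpers, the dictionary layout of an entry, and the `-1` "not found" result are assumptions made for the illustration rather than a transcription of the Pascal listing.

```python
class SymbolTable:
    def __init__(self):
        self.table = []  # grows on declarations, shrinks when a block ends

    def enter(self, name, kind, level, adr=None):
        """Append a declaration and return its index."""
        self.table.append({"name": name, "kind": kind, "level": level, "adr": adr})
        return len(self.table) - 1

    def position(self, name):
        """Search backward so the innermost declaration is found first."""
        for i in range(len(self.table) - 1, -1, -1):
            if self.table[i]["name"] == name:
                return i
        return -1  # assumption: -1 signals "not declared"

    def mark(self):
        """Remember the table size on entry to a block."""
        return len(self.table)

    def release(self, mark):
        """Drop the declarations of the block that just ended."""
        del self.table[mark:]


table = SymbolTable()
dx = 3                                  # first free slot after the block mark
table.enter("x", "variable", 0, dx)     # outer x
saved = table.mark()                    # entering a nested block
table.enter("x", "variable", 1, 3)      # inner x hides the outer one
print(table.table[table.position("x")]["level"])   # inner declaration
table.release(saved)                    # leaving the nested block
print(table.table[table.position("x")]["level"])   # outer declaration again
```

Restoring the saved index on block exit is what makes inner definitions invisible to outer contexts.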
42 Blocks and Scoping
- Nesting blocks does scope
- Restoring symbol table pointers makes symbol table work like stack
- Inner definitions lost to outer contexts
- Idea: make symbol table work like a tree (one branch along a tree looks like a stack)
43 PL/0 Virtual Machine
- Section 5.10 (page 6 of handout)
- Stack machine: primary data store is stack
- push, pop, insert or retrieve from within
- Operations on top of stack (add, test, etc.)
- Program store: array named code
- Unchanged during interpretation
- I: instruction register
- P: program address register
- Data store: array named S (stack)
44 Example of Static Scoping (Repeat)
    void a {
      local variable one
      void b {
        local variable two
        void c {
          local variable three
          // beginning of code for c
          reference one, two
          call b
        } // end of c
        // beginning of code for b
        reference one, two
        call c
      } // end of b
      // beginning of code for a
      call b
    } // end of a
- a -> b -> c -> b
45 Example of Static Scoping (Repeat)
- In a, one is local
- In b, two is local
- In b, one is a single static level out
- In c, three is local
- In c, two is a single static level out
- In c, one is double static levels out
- Then c calls b
- In b, one is still a single static level out
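46 Stack of PL/0 Machine (Fig. 5.7)
[Figure: stack of data segments for A, B, C, and B, each holding a block mark (DL, RA, SL) and its local vars; dynamic links and static links are drawn between segments; T marks the top of the stack]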
47 Data Specific to a Procedure (again)
- To be able to return from call
- Program address of its call (return address)
- Address of data segment of caller
- Keep in data segment of procedure as
- RA (return address), DL (dynamic link)
- Location of variables
- Relative address only (since memory dynamic)
- Displacement off base address of appropriate data segment (locally B register or by descending chain of static links)
- What does static scoping mean here?
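One way to answer that question is to follow the static links by hand. The Python sketch below assumes the block mark layout used by the interpreter in these slides (static link in the first cell, then dynamic link, then return address) and a 1-based data store. The `base` helper, the concrete stack contents for the call chain a, b, c, b, and the variable values are all made up for the illustration.

```python
def base(s, b, l):
    """Return the base address reached by following `l` static links from base `b`."""
    b1 = b
    while l > 0:
        b1 = s[b1]  # the static link (SL) is the first cell of each block mark
        l -= 1
    return b1


# Data store after the call chain a -> b -> c -> b (index 0 unused).
# Each segment is [SL, DL, RA, local variable]; RA values are placeholders.
s = [None,
     0, 0, 0, 11,   # a at base 1:  one = 11
     1, 1, 0, 22,   # b at base 5:  SL -> a, DL -> a, two = 22
     5, 5, 0, 33,   # c at base 9:  SL -> b, DL -> b, three = 33
     1, 9, 0, 44]   # b at base 13: SL -> a, DL -> c, two = 44

b = 13  # base register while the second activation of b is running
print("two =", s[base(s, b, 0) + 3])   # local: displacement 3 off the B register
print("one =", s[base(s, b, 1) + 3])   # one static level out: a's segment
print("caller's segment starts at", s[b + 1])  # the dynamic link points at c
```

The second activation of b was called from c, so its dynamic link leads to c's segment, but its static link still leads to a's segment, where `one` lives.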
48 Machine Definition
- Java 5 enum example of machine operations
- PL/0 virtual machine emulator
- But how do high-level (programming language) structures relate to low-level (machine level) structures?
- Control structures
- Data structures
- Program component re-combination
49 Compilation Mapping
- Input: c = a + b
- Output
- Symbol table: c and corresponding location
- = save to later generate store
- Symbol table: a and corresponding location
- + save to generate addition
- Symbol table: b and corresponding location
- end of statement, generate saved operations
- Stack oriented machine code: load a, load b, add, store c
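The mapping above can be sketched as a tiny translator. This Python example assumes the input statement is an assignment of a sum of variables (written here as `c = a + b`), that the symbol table is a plain dictionary from name to location, and that instructions are simple tuples; `compile_assignment` and the locations 3, 4, 5 are hypothetical.

```python
def compile_assignment(tokens, table):
    """Translate `target = operand {+ operand}` into stack-machine code."""
    target, assign, *rest = tokens
    if assign != "=":
        raise SyntaxError("expected '='")
    code = []
    pending = None  # operator saved until its right operand has been loaded
    for tok in rest:
        if tok == "+":
            pending = "add"
        else:
            code.append(("load", table[tok]))   # look the operand up in the symbol table
            if pending is not None:
                code.append((pending,))         # emit the saved operator
                pending = None
    code.append(("store", table[target]))       # the saved store comes last
    return code


table = {"a": 3, "b": 4, "c": 5}
for instruction in compile_assignment(["c", "=", "a", "+", "b"], table):
    print(instruction)
```

The store for `c` is seen first but emitted last, and the addition is emitted only after both operands are on the stack, which is the "save to later generate" idea from the bullets.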
50 PL/0 Code Generation
- (page 7 of handout)
- Addresses are generated as pairs of numbers indicating the static level difference and the relative displacement within a data segment.
- But how does the compiler figure this out?
- PL/0 code
- Other question: how does PL/0 handle forward references?
51 PL/0 Machine Commands
- LIT - load numbers (literals) onto the stack
- LOD - fetch variable values to top of stack
- STO - store values at variable locations
- CAL - call a subprogram
- INT - allocate storage by incrementing stack pointer (T)
- JMP - transfer of control
- (new program address - P)
- JPC - conditional transfer of control
- OPR - arithmetic and relational operators
52 More on PL/0 Code Generation
    fct = (lit, opr, lod, sto, cal, int, jmp, jpc);
    instruction = packed record
      f: fct;           { function code }
      l: 0 .. levmax;   { level }
      a: 0 .. amax      { displacement address }
    end;

    procedure gen(x: fct; y, z: integer);
    begin
      code[cx].f := x; code[cx].l := y; code[cx].a := z;
      cx := cx + 1
    end;

    procedure listcode;
    var i: integer;
    begin { list code generated for this block }
      for i := cx0 to cx-1 do
        writeln(i, mnemonic[code[i].f]:5, code[i].l:3, code[i].a:5)
    end;
53 PL/0 Interpreter
    t := 0; b := 1; p := 0;             { initialize registers }
    s[1] := 0; s[2] := 0; s[3] := 0;    { initialize memory }
    repeat                              { instruction fetch loop }
      i := code[p]; p := p + 1;
      with i do
        case f of                       { decode instruction }
          lit: begin t := t + 1; s[t] := a end;
          opr: case a of
                 1: s[t] := -s[t];
                 2: begin t := t - 1; s[t] := s[t] + s[t+1] end;
               end;
          jmp: p := a;
          sto: begin s[base(l)+a] := s[t]; writeln(s[t]); t := t - 1 end;
          cal: begin                    { generate new block mark }
                 s[t+1] := base(l); s[t+2] := b; s[t+3] := p;
                 b := t + 1; p := a
               end;
        end
    until p = 0                         { not a good way to end }
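For experimenting outside Pascal, the fetch-decode loop can be sketched in Python as follows. This is an illustrative sketch, not the handout's code: instructions are assumed to be `(f, l, a)` tuples, only a few `opr` codes are implemented (0 return, 1 negate, 2 add, 3 subtract, 4 multiply), and the values that `sto` would print are collected in a list and returned instead.

```python
def interpret(code, stack_size=100):
    s = [0] * (stack_size + 1)  # data store, used from index 1 as in the Pascal version
    t, b, p = 0, 1, 0           # top of stack, base register, program address
    output = []                 # values that sto would write out

    def base(l):
        """Follow l static links from the current base."""
        b1 = b
        while l > 0:
            b1 = s[b1]
            l -= 1
        return b1

    while True:
        f, l, a = code[p]       # instruction fetch
        p += 1
        if f == "lit":
            t += 1
            s[t] = a
        elif f == "lod":
            t += 1
            s[t] = s[base(l) + a]
        elif f == "sto":
            s[base(l) + a] = s[t]
            output.append(s[t])
            t -= 1
        elif f == "int":
            t += a
        elif f == "jmp":
            p = a
        elif f == "jpc":
            if s[t] == 0:
                p = a
            t -= 1
        elif f == "cal":        # build a new block mark: SL, DL, RA
            s[t + 1], s[t + 2], s[t + 3] = base(l), b, p
            b = t + 1
            p = a
        elif f == "opr":
            if a == 0:          # return from procedure
                t = b - 1
                p = s[t + 3]
                b = s[t + 2]
            elif a == 1:
                s[t] = -s[t]
            elif a in (2, 3, 4):
                t -= 1
                x, y = s[t], s[t + 1]
                s[t] = x + y if a == 2 else x - y if a == 3 else x * y
            else:
                raise ValueError(f"opr {a} not implemented in this sketch")
        else:
            raise ValueError(f"unknown instruction {f!r}")
        if p == 0:              # same termination test as the original
            return output


# Hand-assembled program: a at displacement 3, b at 4, c at 5; a = 2; b = 3; c = a + b.
program = [
    ("jmp", 0, 1), ("int", 0, 6),
    ("lit", 0, 2), ("sto", 0, 3),
    ("lit", 0, 3), ("sto", 0, 4),
    ("lod", 0, 3), ("lod", 0, 4), ("opr", 0, 2), ("sto", 0, 5),
    ("opr", 0, 0),
]
print(interpret(program))
```

The final `opr 0 0` returns from the main block, which loads the zero stored in the initial block mark into `p` and so ends the loop.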
54 Project Virtual Machine
- Can use the design of the PL/0 one
- Operations in PL/0 are integer oriented; you probably want to add to this
- Can also use other machine designs
- Hybrid approach: compiles to intermediate form, then interprets that
- Direct interpretation possible if clearly proposed
- Idea: add output whenever computation done
- Idea: build some messages in that could be output with new opr instructions
55 Adding to PL/0
- Predefined variable names
- New operator
- Built-in function
- Pre-defined function
- New statement type
56 Adding Predefined (Variable) Names
- procedure block has 2 parameters
- lev (the nesting level for the block)
- tx (starting index for the symbol table)
- The nested procedure enter is what puts symbols (variable names) into the symbol table
- Right-side page 14 of handout
- Initialize symbol table
- Make initial call of block with non-zero table index
- Can initialize or do other things not normal in the user visible input language
57 Adding New Operator
- Add to getsym to recognize new symbol
- Look in condition, expression, term, factor
- Is new operator parallel to one of those operators?
- Basically another option in code generation
- If not like existing operators, add new syntactic construct.
- New action: add to PL/0 machine
- Generate new instruction: gen(opr,0,14) (square)
- Implement new instruction functionality (page 14, left-side): 14: begin s[t] := s[t]*s[t] end
- Add it into list of mnemonics
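On the machine side, the new action amounts to one more case in the `opr` decoding. The Python sketch below shows the idea with a dispatch table of unary operations; the table, the `exec_unary` helper, and the mnemonic strings are hypothetical, and only the operation numbers 1 (negate) and 14 (square) come from the slides.

```python
# Unary opr codes: each replaces the top of stack s[t] with a function of itself.
UNARY_OPS = {
    1: lambda x: -x,          # existing operation: negate
}
MNEMONICS = {1: "neg"}

# Adding the new operator: one table entry plus one mnemonic.
UNARY_OPS[14] = lambda x: x * x   # s[t] := s[t] * s[t]
MNEMONICS[14] = "sqr"


def exec_unary(s, t, a):
    """Apply unary operation number `a` to the top of stack in place."""
    if a not in UNARY_OPS:
        raise ValueError(f"opr {a} is not a unary operation")
    s[t] = UNARY_OPS[a](s[t])


stack = [None, 7]      # index 0 unused; the value 7 is on top at t = 1
exec_unary(stack, 1, 14)
print(MNEMONICS[14], "->", stack[1])
```

The compiler side is the matching half: once `getsym` recognizes the new symbol, the parser emits `gen(opr, 0, 14)` where the operator appears.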
58 Adding Built-in Function
- Design new indicator for symbol table
- Put function name in symbol table
- Parser will recognize as defined name (there will be no way for user to put in)
- In term:
    if sym = ident then
    begin
      i := position(id);
      case kind of
        constant: ...
        variable: ...
        procedure: ...
        built-in:
          begin
            getsym;       { left paren }
            expression;
            getsym;       { right paren }
            gen(opr, 0, new-thing)
          end
59 Adding Pre-defined Function
- Another approach
- Put entry into symbol table
- Make it a regular procedure
- Initialize the code array to represent the code that might have been generated
- Adding new statement type
- Add new syntax into body of statement (page 12)
- Look at call as an example
- Syntactic sugar
60 How To Start on the Project
- Get your tokenizer working
- This is the getsym procedure of the Pascal version of PL/0 distributed in class
- Can also be done with classes in C and Java
- Read in sample programs in the language you're trying to compile and output the tokens (with some other information)
- Benefits
- Written some programs in your language
- Can leave the output statements for debugging
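A first tokenizer can be very small. The Python sketch below reads a PL/0-like program and prints one token per line, which is the kind of intermediate output suggested above. The keyword set and the symbol set are assumptions (the character set of the distributed Pascal version may differ), and `tokenize` is a hypothetical name rather than a port of `getsym`.

```python
import re

# Assumed keyword list for a PL/0-like language.
KEYWORDS = {"begin", "call", "const", "do", "end", "if", "odd",
            "procedure", "then", "var", "while"}

TOKEN_RE = re.compile(
    r"(?P<number>\d+)"
    r"|(?P<ident>[A-Za-z][A-Za-z0-9]*)"
    r"|(?P<symbol>:=|<=|>=|<>|[-+*/()=<>,;.])"
    r"|(?P<space>\s+)"
    r"|(?P<error>.)"
)


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "space":
            continue
        if kind == "error":
            raise SyntaxError(f"unexpected character {lexeme!r} at offset {m.start()}")
        if kind == "ident" and lexeme.lower() in KEYWORDS:
            kind = "keyword"   # reserved words are recognized after scanning an identifier
        tokens.append((kind, lexeme))
    return tokens


sample = "var x;\nbegin\n  x := x + 10\nend."
for kind, lexeme in tokenize(sample):
    print(f"{kind:8} {lexeme}")
```

Here the lexeme is the text as written and the kind is its token category, matching the lexeme/token distinction made earlier in the slides.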
61 Requirements Analysis
- Part of Object-Oriented Analysis & Design (O-O A&D)
- Collect potential requirements
- Ask users (or think about) how users will use the system
- For incremental development, rank normal and exceptional flows
- Use case diagrams: UML (Unified Modeling Language) for talking
- Use case documents for details
- Non-functional (and other) requirements (security, background tracing, existing s/w)
- Use Case to Use Your Processor
- Command line approach
- Graphical user interface
- Use Cases -> Boundary Classes
- Boundary class: façade pattern
- Implementing those is successive refinement
62 UML Use Case Modeling
- program actions from the user viewpoint, e.g., directions for the grader of how to execute your program
- begin developing different aspects of the program and planning its eventual actions as soon as possible
- Boundary class is a façade for interaction
[Use case diagram: user; command line interpreter; compilation; execution]
63 Software Development Steps
- Narrative requirements from the users
- Requirements analysis to be sure needs are well understood
- Use cases: user perspective
- Use case analysis from the development perspective
- Use case analysis from a testing perspective
- Identification of boundary classes in the design
- Determination of business rules and logic
- Design of supporting classes (behind façade)
- rest of the design and development process
64 - PL/0 and the 655 Project
- Lisp and then XML/XSLT ?