Compiler - PowerPoint PPT Presentation

Provided by: bagginsNo


1
Introduction
  • Compiler

2
References
  • Textbook
  • Compilers: Principles, Techniques, and Tools,
    Alfred V. Aho, Monica S. Lam, Ravi Sethi, and
    Jeffrey D. Ullman, Second Edition, Addison-Wesley, 2007
  • References
  • Programming Language Processors in Java:
    Compilers and Interpreters, D.A. Watt and D.F.
    Brown, Pearson Education Ltd.
  • Assessment: 25% Coursework, 75% Final Exam

3
Objectives
  • To introduce principles, techniques, and tools
    for compiler construction
  • To gain knowledge of what a compiler does
    and how to build one.

4
Course Outline
  • 1. Introduction: structure of a compiler
  • 2. Lexical Analysis: tokens, regular expressions
  • 3. Parsing: context-free grammars, predictive
    parsing
  • 4. Abstract Syntax: semantic actions, abstract
    parse trees
  • 5. Semantic Analysis: symbol tables, bindings,
    type-checking
  • 6. Stack Frames: representation and abstraction
  • 7. Intermediate Code: representation trees,
    translation

5
Why do we need to know about compilers?
  • All software is written in a programming
    language.
  • Learning about compilers will teach you a lot
    about the programming languages you already know.
  • Seeing how a compiler is developed gives you a
    feeling for how programs work.
  • Compilers are a great example of the interplay
    between theory and practice.
  • Many algorithms and models you will use in
    compilers are fundamental and will be useful to
    you elsewhere:
  • automata, regular expressions (lexing)
  • context-free grammars, trees (parsing)
  • hash tables (symbol table)
  • dynamic programming, graph coloring (code gen.)

6
Why study compilers?
  • Compilers improve programming productivity
  • To enhance understanding of programming languages
  • To gain in-depth knowledge of low-level
    machine executables
  • To write compilers and interpreters for various
    programming languages and domain-specific
    languages, e.g. Java, JavaScript, C, C++, C#,
    Modula-3, Scheme, ML, Tcl/Tk, database query
    languages, Mathematica, Matlab, shell command
    languages, Awk, Perl, your .mailrc file, HTML, TeX,
    PostScript, Kermit scripts, ...
  • To learn interesting compiler theory and
    algorithms
  • To learn the beauty of programming in modern
    programming languages
  • To learn how to use them well
  • To learn how to write them
  • To illuminate programming language design
  • As an example of a large software system
  • To motivate interest in formal language theory

7
Computer Organization
Application
Compiler
Operating System
Hardware
8
History, Programming Languages
  • Machine coding (binary programming, punched holes)
    (first generation)
  • The computer's native language: binary digits
    (0s, 1s)
  • 0100 0001 0110 1110 0100 0001
  • 0001 0010 1100 0100 0000 1101
  • Programming in machine code
  • is very slow,
  • is error prone,
  • requires a detailed knowledge of the relevant
    computer architecture,
  • makes it difficult to understand other people's code,
  • produces code that becomes obsolete if the machine
    is changed.
  • Assembly Language (second generation)
  • One-to-one correspondence to machine language
  • MOV AX, 5h
  • MOV DX, 3h
  • ADD AX, DX
  • Assembler translates assembly language programs
    into machine language

9
History, Programming Languages (High-Level
Languages)
  • Procedural Languages (third generation)
  • Instructions translate into machine language
    instructions
  • Uses common words rather than abbreviated
    mnemonics
  • C, C++, Java, Fortran, QuickBasic
  • A = 3
  • B = A * 2 - 1
  • D = A / B + A ^ 5
  • Compiler - translates the entire program at once
  • Interpreter - translates and executes one source
    program statement at a time

10
History, Programming Languages (High-Level
Languages)
  • Nonprocedural Languages (fourth generation)
  • Allows the user to specify the desired result
    without having to specify the detailed procedures
    needed for achieving the result.
  • Structured Query Language (SQL)
  • Natural Language Programming Languages
  • (fifth generation (intelligent) languages).
  • Translates natural languages into a structured,
    machine-readable form

11
High-Level Languages
  • Expressions: operators such as +, -, *, /
  • Data types: simple types (e.g. Boolean, int,
    float) as well as composite structures (records)
    and arrays
  • - types can also be defined by the programmer
  • Control structures: allow programming of
    selective computation as well as iterative
    computation
  • Declarations: introduce identifiers to denote
    constant values, variables, procedures, etc.
  • Abstraction: separation of concerns, i.e. break a
    problem up and deal with subsets
  • Encapsulation (data abstraction): grouping
    related data and operations and selectively hiding
    specific information (e.g. classes)

12
Why high-level languages?
  • Understandability (readability)
  • Naturalness (languages for different
    applications)
  • Portability (machine-independent)
  • Efficient to use (development time)

13
Language Processors
  • Editors (to enter text): they can process text
    based on its logical structure.
  • Translator: translates text from one language to
    another
  • Compiler: translates from a high-level language
    to a low-level language
  • Interpreter: takes a text (in a particular
    language) and runs it immediately
  • Assembler: translates from an assembly language
    into the corresponding machine code. Assembly
    language is easier for a compiler to produce as
    output and is easier to debug

14
Language Processors
  • Simulator, emulator: machine code → machine code
    (the machine code is interpreted)
  • e.g. simulate a processor on an
    existing processor.
  • Preprocessor: extended high-level language →
    high-level language. Preprocessors are sometimes
    called before the actual compilation process, e.g.
    to remove comments, include the text of other
    files, and perform macro substitutions (replace
    shorthand notation with a longer piece of text)
  • Natural language translators
  • e.g. Chinese → English
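As a rough illustration of the preprocessing step above, the following Python sketch removes comments and performs macro substitution. It assumes a hypothetical toy syntax: `//` line comments and `#define NAME text` object-like macros only. A real C preprocessor also handles `#include`, function-like macros, and string literals, none of which are modeled here.

```python
import re

def preprocess(source: str) -> str:
    """Toy preprocessor: strips // comments and expands #define macros."""
    macros = {}
    output_lines = []
    for line in source.splitlines():
        # Remove comments: drop everything after // (string literals are ignored).
        line = line.split("//", 1)[0].rstrip()
        # Record an object-like macro definition; it produces no output line.
        m = re.match(r"\s*#define\s+(\w+)\s+(.*)$", line)
        if m:
            macros[m.group(1)] = m.group(2)
            continue
        # Macro substitution: replace each whole-word macro name with its text.
        for name, text in macros.items():
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m: text, line)
        if line:
            output_lines.append(line)
    return "\n".join(output_lines)
```

For example, `preprocess("#define RATE 60\nx = rate * RATE // comment")` yields `x = rate * 60`: the comment is removed and the shorthand `RATE` is replaced by its longer text, while the lowercase identifier `rate` is left alone.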

15
Assembler
  • The Assembler is responsible for translating the
    target code (usually assembly code) into
    executable machine code.
  • The assembly code is a mnemonic version of
    machine code in which
  • 1. Names are used instead of binary codes for
    operations (Code Table).
  • 2. Names are used for operands instead of memory
    locations (Symbol Tables).
  • Assembly level programming
  • - improves the productivity,
  • - is less error prone,
  • - is somewhat easier to understand,
  • - code is as efficient as the machine code.
  • but
  • - it requires detailed knowledge of a computer
    architecture,
  • - code is machine dependent,
  • - code is obsolete when a machine is changed.
  • It soon became apparent that programming needed
    to be done in a machine-independent language
    (HLL).
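The two tables mentioned above (code table and symbol table) can be seen in a minimal two-pass assembler. This sketch assumes a made-up toy machine: the opcodes in `CODE_TABLE`, the one-word instruction format (opcode in the high byte, operand address in the low byte), and the `DATA` directive are all hypothetical, not x86. Pass 1 builds the symbol table of label addresses; pass 2 looks up opcodes in the code table and operand names in the symbol table.

```python
# Hypothetical code table for a toy machine: mnemonic -> 8-bit opcode.
CODE_TABLE = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}

def assemble(lines):
    """Two-pass assembler: one word per line, addresses start at 0."""
    symbol_table = {}
    # Pass 1: assign an address to every label ("NAME: ...").
    for address, line in enumerate(lines):
        if ":" in line:
            label = line.split(":", 1)[0].strip()
            symbol_table[label] = address
    # Pass 2: translate mnemonics (code table) and operand names (symbol table).
    words = []
    for line in lines:
        statement = line.split(":", 1)[-1].split()
        mnemonic = statement[0]
        operand = statement[1] if len(statement) > 1 else None
        if mnemonic == "DATA":               # reserve one word holding a constant
            words.append(int(operand))
        else:
            address = symbol_table[operand] if operand else 0
            words.append((CODE_TABLE[mnemonic] << 8) | address)
    return symbol_table, words
```

For the program `LOAD X`, `ADD Y`, `STORE Z`, `HALT` followed by three `DATA` words, the symbol table comes out as `{'X': 4, 'Y': 5, 'Z': 6}`. Two passes are needed because a label such as `X` is used before the line that defines it.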

16
Compilers and Interpreters
  • Interpreters are another class of translators.
  • A compiler translates a program once and for all
    into the target language (e.g. C).
  • An interpreter effectively translates a source
    program every time it is run (e.g. BASIC).
  • Compilers and interpreters are sometimes used
    together as a hybrid (e.g. Java):
  • Java is compiled into Java bytecode,
  • the bytecode is interpreted by a Java Virtual
    Machine (JVM).
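The Java arrangement (compile once to bytecode, then interpret the bytecode) can be imitated on a small scale. In this Python sketch the bytecode format (`PUSH`, `LOAD`, `BINOP`) and the nested-tuple expression trees are invented for illustration; they are not Java bytecode or the JVM's instruction set.

```python
import operator

# Arithmetic used by the toy virtual machine's BINOP instruction.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def compile_expr(node, code=None):
    """Compile (once) a nested-tuple expression tree into stack bytecode."""
    if code is None:
        code = []
    if isinstance(node, (int, float)):
        code.append(("PUSH", node))          # constant operand
    elif isinstance(node, str):
        code.append(("LOAD", node))          # variable operand
    else:
        op, left, right = node               # (operator, left subtree, right subtree)
        compile_expr(left, code)
        compile_expr(right, code)
        code.append(("BINOP", op))
    return code

def run(code, env):
    """Tiny stack-based virtual machine: interprets the bytecode on every run."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "LOAD":
            stack.append(env[arg])
        else:                                # BINOP: pop two operands, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[arg](left, right))
    return stack.pop()
```

`compile_expr` runs once; the resulting list can then be run repeatedly by `run` with different inputs, e.g. `run(bytecode, {"initial": 10, "rate": 2})` gives 130 for the tree of `initial + rate * 60`.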

17
What is a Compiler?
  • A compiler is a program that reads a program
    written in one language (the source language) and
    translates it into an equivalent program in
    another language (the target language).

Source Program → Compiler → Target Program
(the compiler also produces error messages)
18
Compiler
  • Source programs: many possible source languages,
    from traditional to application-specific
    languages.
  • Programming languages (high-level)
  • Modeling languages
  • Document description languages
  • Database query languages
  • Target programs: another programming language,
    often the machine language of a particular
    computer system.
  • High-level programming language
  • Low-level programming language (assembler or
    machine code)
  • Application-specific target language
  • Error messages: essential for program development

19
Do we need Compilers?
  • Machines understand only 1s and 0s. High-level
    languages make it easier for the user to program,
    but not for the machine to understand.
  • Once the programmer has written and edited the
    program (in an editor), it needs to be translated
    into machine language (1s and 0s) before it can
    be executed.
  • Compilers are used to do this conversion.

20
Where are compilers used?
  • Implementation of programming languages
  • C, C++, Java, Lisp, Prolog, SML, Haskell, Ada,
    Fortran
  • Document processing
  • DVI → PostScript,
  • Word documents → PDF
  • Natural language processing
  • NL → database query language → database commands
  • Hardware design
  • silicon compilers, CAD data → machine
    operations, equipment lists
  • Report generation
  • CAD data → list of parts,
  • All kinds of input/output translations
  • various UNIX text filters, ...

21
Interpreter
  • Given the program source code and the run-time
    input, interpret the source code directly, i.e.
    parse and simulate it, statement by statement
    (syntax-directed interpretation)
  • UNIX shells (command-line interpreters)
  • Early interpreters for BASIC, LISP, APL
  • Good for debugging
  • Very slow, but OK for small scripts
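Syntax-directed interpretation can be sketched with Python's standard `ast` module: each statement is parsed and then executed immediately by walking its tree, before the next statement is read. The input language (one Python-style assignment per line using `+`, `-`, `*`, `/`) is an assumption chosen to echo the third-generation example on slide 9; no target code is ever produced.

```python
import ast
import operator

# Map Python AST operator node types to the arithmetic they denote.
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(node, env):
    """Evaluate an expression tree directly against the current variable values."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]                  # KeyError signals an undefined variable
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        return BIN_OPS[type(node.op)](evaluate(node.left, env), evaluate(node.right, env))
    raise SyntaxError(f"unsupported construct: {ast.dump(node)}")

def interpret(program):
    """Read, parse and execute one assignment statement at a time."""
    env = {}
    for line in program.splitlines():
        if not line.strip():
            continue
        stmt = ast.parse(line.strip()).body[0]          # parse only this statement
        if not (isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name)):
            raise SyntaxError(f"expected 'name = expression': {line!r}")
        env[stmt.targets[0].id] = evaluate(stmt.value, env)   # execute it right away
    return env
```

For example, `interpret("A = 3\nB = A * 2 - 1\nD = A / B")` returns `{'A': 3, 'B': 5, 'D': 0.6}`. Because each line runs before the next is parsed, an error on line 3 is reported only after lines 1 and 2 have already executed.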

22
Compiler / Translator and Interpreter
  • A translator is used to produce an equivalent
    program in another language (e.g. from C to
    Pascal)
  • A compiler is a translator that generally takes in
    a higher-level language (e.g. C) and transforms
    it into a low-level language
    (usually object or machine code).
  • A compiler/translator produces the entire output
    code before it is executed.
  • An interpreter translates and executes one
    statement at a time before moving on to the next
    statement.

23
Compiler / Translator and Interpreter
  • The machine-language target program produced by a
    compiler is usually much faster than an
    interpreter at mapping inputs to outputs.
  • An interpreter, however, can usually give better
    error diagnostics than a compiler,
    because it executes the source program statement
    by statement.

24
Interpreters versus Compilers
The tradeoffs between compilation and
interpretation?
  • Compilers typically offer more advantages when
  • programs are deployed in a production setting
  • programs are run repeatedly
  • the instructions of the programming language are
    complex
  • Interpreters typically are a better choice when
  • we are in a development/testing/debugging stage
  • programs are run once and then discarded
  • the instructions of the language are simple
  • the execution speed is overshadowed by other
    factors
  • e.g. on a web server, where communication costs
    are much higher than execution costs

25
Hybrid compiler / interpreter
26
How does Java work?
A benefit of this arrangement in Java is that
bytecodes compiled on one machine can be
interpreted on another machine, perhaps across a
network.
27
Program execution
  • Three phases of execution
  • Compile time:
  • 1. Source program → object program (compiling)
  • 2. Linking, loading → absolute program
  • Large programs are often compiled in pieces, so
    the relocatable machine code may have to be
    linked together with other relocatable object
    files and library files into the code that
    actually runs on the machine.
  • The linker resolves external memory addresses,
    where the code in one file may refer to a
    location in another file.
  • The loader then puts all of the executable object
    files into memory for execution.
  • Run time:
  • 3. Input → output
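The linker's job of resolving references from one file to a location in another can be sketched as follows. The `ObjectFile` structure is a hypothetical simplification (real object formats such as ELF carry far more information), and patched references receive absolute addresses in an image that starts at address 0; placing that image elsewhere in memory is left to the loader.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    """Hypothetical object-file model: code words, exported symbols, external references."""
    code: list                                    # machine words; holes are patched by link()
    defines: dict = field(default_factory=dict)   # exported symbol -> offset in this file
    refs: list = field(default_factory=list)      # (offset, symbol) references to patch

def link(objects):
    """Concatenate object files and resolve external references (image starts at 0)."""
    symbols, base = {}, 0
    # Step 1: lay the files out one after another; record each symbol's final address.
    for obj in objects:
        for name, offset in obj.defines.items():
            if name in symbols:
                raise ValueError(f"duplicate definition of {name}")
            symbols[name] = base + offset
        base += len(obj.code)
    # Step 2: copy the code, patching every reference with the address from step 1.
    image = []
    for obj in objects:
        words = list(obj.code)
        for offset, name in obj.refs:
            if name not in symbols:
                raise ValueError(f"undefined symbol {name}")
            words[offset] = symbols[name]
        image.extend(words)
    return image, symbols
```

Here a `main` file of three words whose second word refers to `sqrt` is linked with a two-word library file: `sqrt` ends up at address 3, and the hole in `main` is patched with 3 (the other words are arbitrary placeholders).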

28
Loader and Linker
  • The machine code generated by the Assembler can
    be executed only if allocated in Main Memory
    starting from the address 0.
  • Since this is generally not possible, the Loader
    alters the relocatable addresses of the code to
    place both instructions and data in the right
    place in Main Memory.
  • The starting free address, L, in Main Memory at
    which the program is allocated is called the
    Relocation Factor.
  • The Loader must
  • 1. add the relocation factor L to each
    relocatable address;
  • 2. leave each absolute address unaltered, e.g.
    the address of an I/O device.
  • The Linker links together the different
    files/modules of a single program and, possibly,
    adds library files.
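The two loader rules above amount to a single loop. In this sketch the set of words that hold relocatable addresses is passed in explicitly as indices (an assumption; real object files record this in relocation entries), and Main Memory is simulated by a plain Python list.

```python
def load(words, relocatable, relocation_factor, memory_size=64):
    """Place a program assembled for address 0 into memory starting at address L.

    words             -- the program as assembled for a start address of 0
    relocatable       -- set of word indices that contain relocatable addresses
    relocation_factor -- L, the first free address in main memory
    """
    if relocation_factor + len(words) > memory_size:
        raise MemoryError("program does not fit in main memory")
    memory = [0] * memory_size
    for i, word in enumerate(words):
        if i in relocatable:
            memory[relocation_factor + i] = word + relocation_factor   # rule 1: add L
        else:
            memory[relocation_factor + i] = word                       # rule 2: unchanged
    return memory
```

Loading `[2, 500, 42]` with word 0 marked relocatable and L = 16 stores `18, 500, 42` at addresses 16 to 18: the address of the data word now points to 18, where 42 actually lives, while the absolute I/O address 500 is left unchanged.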

29
The phases of a compiler
Lexical analyser
Syntax analyser
Symbol table manager
Error Handler
Semantic analyser
Intermediate code generator
Code optimizer
Code generator
30
Analysis-Synthesis Model of Compilation
  • There are two parts of compilation
  • Part 1, Analysis: breaks up the source program
    into constituent pieces and creates an
    intermediate representation of the source
    program.
  • Part 2, Synthesis: constructs the desired target
    program from the intermediate representation. It
    requires the most specialized techniques.

31
Part1, Analysis of the Source Program
  • Analysis consists of three phases:
  • Lexical analysis (linear analysis or scanning):
    the stream of characters is read from left to
    right and grouped into tokens, which are
    sequences of characters having a collective
    meaning.
  • Syntax analysis (hierarchical analysis or
    parsing): tokens are grouped hierarchically into
    nested collections with collective meaning.
  • Semantic analysis: certain checks are performed
    to ensure that the components of a program fit
    together meaningfully.

32
Lexical Analysis (Linear Analysis/ Scanning)
  • Input: sequence of characters
  • Output: tokens (basic symbols, groups of
    successive characters which belong together
    logically)
  • Translate the input program, entered as a
    sequence of characters, into a sequence of words
    or symbols (tokens). For example, the keyword for
    should be treated as a single entity, not as a
    3-character string.
  • position := initial + rate * 60
  • The assignment statement would be grouped into
    the following tokens:
  • 1. The identifier position
  • 2. The assignment symbol :=
  • 3. The identifier initial
  • 4. The plus sign +
  • 5. The identifier rate
  • 6. The multiplication sign *
  • 7. The number 60
  • Note: the blanks separating the characters of
    these tokens would normally be eliminated during
    lexical analysis.
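A scanner for the example statement can be written with one regular expression per token class, tried at each position from left to right. The token class names (`ID`, `NUM`, `ASSIGN`, `OP`) are illustrative choices, not taken from the slides; the `re` calls used (`compile`, `finditer`, `Match.lastgroup`) are standard Python.

```python
import re

# Token classes for the example statement; order matters (first match wins).
TOKEN_SPEC = [
    ("NUM",    r"\d+(?:\.\d+)?"),   # integer or decimal number
    ("ID",     r"[A-Za-z_]\w*"),    # identifier
    ("ASSIGN", r":="),              # assignment symbol
    ("OP",     r"[+\-*/]"),         # arithmetic operator
    ("SKIP",   r"\s+"),             # blanks are discarded
    ("ERROR",  r"."),               # any other character cannot start a token
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Scan left to right, grouping characters into (kind, lexeme) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"illegal character {lexeme!r} at position {match.start()}")
        tokens.append((kind, lexeme))
    return tokens
```

`tokenize("position := initial + rate * 60")` returns the seven tokens listed above, from `('ID', 'position')` to `('NUM', '60')`, with the blanks discarded.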

33
Lexical Analysis
S o m e o n e b r e a k s t h e i c e  → Lexical Analysis →  Someone breaks the ice
final := initial + rate * 60  → Lexical Analysis →  id1 := id2 + id3 * 60
34
Syntax Analysis (Hierarchical Analysis or
Parsing)
  • Input: sequence of tokens
  • Output: parse tree, error messages
  • It involves grouping the tokens of the source
    program into grammatical phrases that are used
    by the compiler to synthesize output. Usually,
    the grammatical phrases of the source program are
    represented by a parse tree.
  • Determine the structure of the program, for
    example, identify the components of each
    statement and expression and check for syntax
    errors.
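A recursive-descent parser is one common way to perform this grouping. The sketch below assumes a small hypothetical grammar for assignments and takes `(kind, lexeme)` tokens such as a scanner might produce. For brevity it builds a compact syntax tree (operators as interior nodes) rather than a full parse tree with a node for every grammar symbol.

```python
class Parser:
    """Recursive-descent parser for the (assumed) grammar
         assign -> ID ':=' expr
         expr   -> term (('+' | '-') term)*
         term   -> factor (('*' | '/') factor)*
         factor -> ID | NUM
    Tokens are (kind, lexeme) pairs; the result is a nested-tuple syntax tree."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def error(self, expected):
        found = self.peek()[1] or "end of input"
        raise SyntaxError(f"expected {expected}, found {found!r}")

    def parse_assign(self):
        if self.peek()[0] != "ID":
            self.error("identifier")
        target = ("id", self.advance()[1])
        if self.peek()[0] != "ASSIGN":
            self.error("':='")
        self.advance()
        tree = (":=", target, self.parse_expr())
        if self.pos != len(self.tokens):
            self.error("end of statement")
        return tree

    def parse_expr(self):
        node = self.parse_term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.advance()[1]
            node = (op, node, self.parse_term())     # left-associative
        return node

    def parse_term(self):
        node = self.parse_factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.advance()[1]
            node = (op, node, self.parse_factor())   # '*' binds tighter than '+'
        return node

    def parse_factor(self):
        kind, lexeme = self.peek()
        if kind == "ID":
            self.advance()
            return ("id", lexeme)
        if kind == "NUM":
            self.advance()
            return ("num", float(lexeme) if "." in lexeme else int(lexeme))
        self.error("identifier or number")
```

Because `term` handles `*` one level below `expr`, the statement `position := initial + rate * 60` is grouped as `initial + (rate * 60)`; a statement that ends right after `+` is rejected with a syntax error.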

35
Syntax Analysis
id1 := id2 + id3 * 60  → Syntax Analysis →  (parse tree)
Someone breaks the ice  → Syntax Analysis →
    sentence
      subject: Someone
      verb: breaks
      object: the ice
36
Semantic Analysis
  • Input: parse tree, symbol table
  • Output: annotated tree (abstract syntax tree with
    attributes); symbol table with information on
    variables, such as their types ...
  • Checks the source program for semantic errors and
    gathers type information for the subsequent code
    generation phase
  • It uses the hierarchical structure determined by
    the syntax-analysis phase
  • Checks that the program is reasonable, for
    example, that it does not include references to
    undefined variables.
  • An important component of semantic analysis is
    type checking.
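Type checking over the syntax tree can be sketched as a recursive function that computes each node's type from its children. Assumptions for illustration: the tree is a nested tuple such as `(':=', ('id', 'position'), ('+', ...))`, only the types `int` and `real` exist, and an `i2r` (integer-to-real) conversion node is inserted wherever an `int` operand meets a `real` one, mirroring the `i2r ( 60 )` in the intermediate code on slide 39.

```python
def check(node, symbols):
    """Return (checked_node, type); symbols maps each declared name to 'int' or 'real'."""
    kind = node[0]
    if kind == "num":
        return node, ("real" if isinstance(node[1], float) else "int")
    if kind == "id":
        if node[1] not in symbols:
            raise NameError(f"undeclared variable {node[1]!r}")
        return node, symbols[node[1]]
    op, left, right = node
    left, ltype = check(left, symbols)
    right, rtype = check(right, symbols)
    if op == ":=":
        if (ltype, rtype) == ("int", "real"):
            raise TypeError("cannot assign a real value to an int variable")
        if (ltype, rtype) == ("real", "int"):
            right = ("i2r", right)               # widen the value to the target's type
        return (op, left, right), ltype
    if ltype != rtype:                           # mixed int/real arithmetic
        if ltype == "int":
            left = ("i2r", left)
        else:
            right = ("i2r", right)
        return (op, left, right), "real"
    return (op, left, right), ltype
```

With `position`, `initial`, and `rate` declared `real`, the constant `60` gets wrapped as `('i2r', ('num', 60))`, and an undeclared name raises `NameError`, which is the "reference to an undefined variable" check mentioned above.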

37
Semantic Analysis
Someone plays the piano
(meaningful)
Semantic Analysis
The piano plays someone
(meaningless)
38
Part 2, Synthesis
  • Internal form
  • Intermediate Code Generation: produces a program
    for an abstract machine. It should be easy to
    produce and easy to translate into the target
    program.
  • Internal form, hopefully improved
  • Code Optimization: attempts to improve the
    intermediate code so that faster-running machine
    code will result.
  • Machine code/assembly code Generation: memory
    locations are selected for each of the variables
    used by the program. Intermediate instructions
    are each translated into a sequence of machine
    instructions that perform the same task. A
    crucial aspect is the assignment of variables to
    registers.

39
Intermediate Code Generation
temp1 := i2r ( 60 )
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
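Three-address code of this form can be produced by a post-order walk of the checked tree that allocates a fresh temporary for every interior node. This Python sketch assumes the nested-tuple tree format `(':=', ('id', name), expr)` with an `i2r` conversion node; the names and output formatting are illustrative.

```python
def gen_three_address(tree):
    """Translate an assignment tree into three-address code with fresh temporaries."""
    code, counter = [], 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def gen(node):
        kind = node[0]
        if kind in ("id", "num"):
            return str(node[1])                  # leaves need no instruction
        if kind == "i2r":
            operand = gen(node[1])
            temp = new_temp()
            code.append(f"{temp} := i2r({operand})")
            return temp
        op, left, right = node                   # binary operator node
        l, r = gen(left), gen(right)             # operands first (post-order)
        temp = new_temp()
        code.append(f"{temp} := {l} {op} {r}")
        return temp

    _, target, value = tree
    result = gen(value)
    code.append(f"{target[1]} := {result}")
    return code
```

For the tree of `id1 := id2 + id3 * i2r(60)` it emits the four instructions of the slide, `temp1 := i2r(60)` through `id1 := temp3`, in the same order.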
40
Code Optimization
temp1 := i2r ( 60 )
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
  → Code Optimization →
temp1 := id3 * 60.0
id1 := id2 + temp1
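The two improvements in this example can be sketched as separate passes over instruction tuples `(dest, op, arg1, arg2)`. Constant folding evaluates `i2r(60)` at compile time as `60.0`; copy elimination merges `temp3 := id2 + temp2` followed by `id1 := temp3` into one instruction when the temporary has no other use. The tuple encoding and the `copy` opcode are assumptions for illustration, and real optimizers rely on data-flow analysis rather than these local checks.

```python
def count_uses(code, name):
    """Number of times `name` appears as an operand in `code`."""
    return sum((a1, a2).count(name) for _, _, a1, a2 in code)

def optimize(code):
    """Local optimizations on (dest, op, arg1, arg2) tuples; op "copy" means dest := arg1."""
    constants, folded = {}, []
    # Pass 1: constant folding. i2r of an integer literal is computed at compile time,
    # and the resulting literal is substituted wherever the temporary was used.
    for dest, op, a1, a2 in code:
        a1, a2 = constants.get(a1, a1), constants.get(a2, a2)
        if op == "i2r" and a1.isdigit():
            constants[dest] = str(float(a1))     # "60" -> "60.0"
            continue
        folded.append((dest, op, a1, a2))
    # Pass 2: copy elimination. "t := a op b; x := t" becomes "x := a op b"
    # when t is a temporary computed just before and used nowhere else.
    result = []
    for dest, op, a1, a2 in folded:
        if (op == "copy" and result and result[-1][0] == a1
                and a1.startswith("temp") and count_uses(folded, a1) == 1):
            _, prev_op, p1, p2 = result.pop()
            result.append((dest, prev_op, p1, p2))
        else:
            result.append((dest, op, a1, a2))
    return result
```

The result matches the optimized code on the slide apart from the name of the remaining temporary (`temp2` here instead of `temp1`).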
41
Code Optimization
temp1 := id3 * 60.0
id1 := id2 + temp1
  → Code Generator →
MOVF rate, R2
MULF 60, R2
MOVF initial, R1
ADDF R2, R1
MOVF R1, position
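A naive code generator for the optimized code follows the pattern of the slide: load the first operand into a register, apply the floating-point operation with the second operand, and store the result if it belongs to a program variable. Assumptions: every temporary keeps its own register (no reuse), `#` marks an immediate constant, `MOVF`/`MULF`/`ADDF` are the slide's hypothetical mnemonics, and `SUBF`/`DIVF` are added by analogy.

```python
OPCODES = {"+": "ADDF", "-": "SUBF", "*": "MULF", "/": "DIVF"}   # floating-point ops

def operand(name, registers):
    """Register for a temporary, '#' immediate for a literal, otherwise a memory name."""
    if name in registers:
        return registers[name]
    try:
        float(name)
        return "#" + name
    except ValueError:
        return name

def generate(code):
    """Translate (dest, op, arg1, arg2) instructions into register-machine code."""
    registers, asm = {}, []
    for dest, op, a1, a2 in code:
        # Temporaries occupy R1..Rn, so R(n+1) is always free here.
        reg = f"R{len(registers) + 1}"
        asm.append(f"MOVF {operand(a1, registers)}, {reg}")
        asm.append(f"{OPCODES[op]} {operand(a2, registers)}, {reg}")
        if dest.startswith("temp"):
            registers[dest] = reg                # keep the temporary in its register
        else:
            asm.append(f"MOVF {reg}, {dest}")    # store a program variable to memory
    return asm
```

For `temp2 := id3 * 60.0` followed by `id1 := id2 + temp2` the output has the same five instructions as the slide, except that the register numbers are swapped (R1 holds the product here).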
42
Symbol Table
  • Helps the other phases during compilation
  • A symbol table is a data structure containing a
    record for each identifier, with fields for the
    attributes of the identifier. The data structure
    allows us to find the record for each identifier
    quickly and to store or retrieve data from that
    record quickly.
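A common way to get fast lookups is a hash table per scope; Python's `dict` is a hash table. This sketch adds a stack of scopes so that an inner declaration hides an outer one; the class, its method names, and the attribute keywords (`type=...`) are hypothetical.

```python
class SymbolTable:
    """Symbol table as a stack of hash tables, one dict per scope."""

    def __init__(self):
        self.scopes = [{}]                       # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **attributes):
        """Store a record of attributes (e.g. type) for a new identifier."""
        scope = self.scopes[-1]
        if name in scope:
            raise ValueError(f"double declaration of {name!r}")
        scope[name] = attributes

    def lookup(self, name):
        """Return the record of the innermost declaration, or None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None
```

Declaring the same name twice in one scope raises an error, which corresponds to the "double declaration" error listed under table management on slide 44.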

43
Error Handler
  • Discover an error.
  • Write an error message.
  • Correct the error (or guess, very difficult!)
  • Restart from the error (try to continue)
  • Each phase can encounter errors. However, after
    detecting an error, a phase must somehow deal
    with that error, so that compilation can proceed,
    allowing further errors in the source program to
    be detected.
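A minimal form of "restart from the error" is a scanner that reports each illegal character, skips it, and keeps going, so that one run reports every lexical error rather than only the first. The token classes are illustrative assumptions; real compilers use more elaborate recovery, for example skipping ahead to the next `;` in the parser.

```python
import re

# Illustrative token classes; BAD catches any character that cannot start a token.
TOKEN_RE = re.compile(
    r"(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>:=|[+\-*/;])|(?P<SKIP>\s+)|(?P<BAD>.)")

def scan_with_recovery(text):
    """Collect tokens and error messages; an illegal character is reported and skipped."""
    tokens, errors = [], []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind == "BAD":
            errors.append(f"position {m.start()}: illegal character {m.group()!r}")
        elif kind != "SKIP":
            tokens.append((kind, m.group()))
    return tokens, errors
```

For `x := y @ 5 # 2` both the `@` at position 7 and the `#` at position 11 are reported, and scanning still delivers the remaining tokens to the next phase.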

44
Examples of error messages
  • Lexical analysis
  • Faulty sequence of characters
    which does not result in a token,
  • e.g. Ö, 5EL, K, string
  • Syntax analysis
  • Syntax error (e.g. missing semicolon), (4 * (y
    + 5) - 12))
  • Semantic analysis
  • Type conflict, e.g. "HEJ" + 5
  • Code optimization
  • Uninitialized variables, anomaly detection.
  • Code generation
  • Too large integers, run out of memory.
  • Table management
  • Double declaration, table overflow.
  • A good compiler finds an error at the earliest
    occasion.
  • Usually, some errors are left to run time, e.g.
    array index out of bounds.

45
Inside the Compiler
Sequence of characters
  → Lexical Analysis (scanner) →
Sequence of tokens
  → Syntactic Analysis / Parsing (parser) →
Abstract Syntax Tree (AST)
  → Contextual Analysis / Checking Static Semantics (checker) →
Verified / annotated AST
  → Optimization and Code Generation (optimizer, code generator)
46
Language Processing System
skeletal source program
  → Preprocessor: performs macro-processing, file inclusion,
    rational preprocessing, language extension
source program
  → Compiler: split into 6 phases; produces assembly code
    (some compilers include the assembler too)
target assembly program
  → Assembler: converts mnemonics (assembly code) into object
    code. Two-pass assembler: 1. denote storage locations for
    identifiers in the symbol table; 2. translate code into
    machine code, translating locations into addresses
relocatable machine code
  → Loader/Linker: the linker links other object and library
    files with the object code; the loader reads the file,
    placing relocatable addresses into proper locations in
    memory
47
The Phases of a Compiler
48
Compiler Pass
  • A compiler often finds it convenient to process the
    entire source program several times before
    generating code.
  • Each of these repetitions is called a pass.
  • A collection of phases is done either only once
    (single pass) or multiple times (multi-pass).
  • A single-pass compiler usually requires everything
    to be declared before it is used in the source
    program.
  • A multi-pass compiler may have to keep the entire
    program representation in memory.
  • A multi-pass compiler makes several passes
    over the program. The output of a preceding
    phase is stored in a data structure and used by
    subsequent phases.
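The declare-before-use restriction of a single-pass compiler, and how an extra pass lifts it, can be shown with a toy program given as `('decl', name)` and `('use', name)` pairs (a hypothetical representation). The single-pass checker judges each use as it reads it; the two-pass checker first collects every declaration and only then checks the uses.

```python
def single_pass(statements):
    """One pass: a name must already have been declared when it is used."""
    declared, errors = set(), []
    for kind, name in statements:
        if kind == "decl":
            declared.add(name)
        elif name not in declared:
            errors.append(f"{name} used before declaration")
    return errors

def two_pass(statements):
    """Pass 1 collects all declarations; pass 2 checks every use against them."""
    declared = {name for kind, name in statements if kind == "decl"}
    return [f"{name} undeclared" for kind, name in statements
            if kind == "use" and name not in declared]
```

For `[('use', 'f'), ('decl', 'f'), ('use', 'g')]` the single-pass version rejects both uses, while the two-pass version accepts the forward reference to `f` and reports only `g`; the price is keeping the whole statement list in memory between the passes.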

49
Single Pass Compiler
A single pass compiler makes a single pass over
the source text, parsing, analyzing and
generating code all at once.
Dependency diagram of a typical Single Pass
Compiler
Compiler Driver
  calls → Syntactic Analyzer
            calls → Contextual Analyzer
            calls → Code Generator
50
Compiler passes
Dependency diagram of a typical Multi Pass
Compiler
Compiler Driver
  calls → Syntactic Analyzer
  calls → Contextual Analyzer
  calls → Code Generator
51
  • Lexical analyser: decompose statement into tokens
  • Syntax analyser: parsing; check order of tokens
    against the grammar; create Abstract Syntax Tree
  • Semantic analyser: type checking; identify
    operators and operands
  • Intermediate code generator: first translation;
    create temporary sub-result variables
  • Code optimizer: improve speed, efficiency
  • Code generator: generates final assembly code
  • Symbol table manager: stores a record for each
    identifier and its attributes
  • Error Handler: detects errors, reports errors