Python's Implementation - PowerPoint PPT Presentation

About This Presentation
Title:

Python's Implementation


Slides: 31
Provided by: paulpr5


Transcript and Presenter's Notes



1
Python's Implementation
2
The Python Interpreter
  • Parser -> Parse Tree -> Compiler -> Code objects -> Virtual Machine
3
Definitions
  • Raw code: a text file or string containing Python code
  • Parser: parses raw code and generates a parse tree
  • Compiler: walks the parse tree and generates code
    objects (bytecodes plus metadata)
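The deck describes Python 2. As a rough Python 3 sketch of the same three stages, the pipeline can be driven by hand. It assumes modern CPython, where the parser's output is exposed through the `ast` module as an abstract syntax tree rather than as a raw parse tree:

```python
import ast
import types

source = "a = 2\nb = a + 3\n"

# Parser stage: Python 3 exposes the parser's output as an AST.
tree = ast.parse(source, filename="<example>", mode="exec")

# Compiler stage: turn the tree into a code object (bytecode plus metadata).
code = compile(tree, filename="<example>", mode="exec")

# Virtual machine stage: run the code object in a fresh namespace.
namespace = {}
exec(code, namespace)

print(type(tree).__name__)               # Module
print(isinstance(code, types.CodeType))  # True
print(namespace["b"])                    # 5
```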

4
How Python Handles Modules
  • When a Python script imports a module, Python:
    • parses the file,
    • compiles it in memory,
    • generates code objects, and
    • saves a cached copy to a .pyc file (with an 8-byte header).
  • This speeds up later imports by skipping the parsing
    and compiling steps, as long as the source has not changed.
  • Smarter than Java!!!
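Here is a Python 3 sketch of the caching step. It assumes CPython 3.2 or later, where cached files live in a `__pycache__/` directory under version-tagged names instead of next to the source. `py_compile.compile` does the parse-and-compile work and writes the .pyc file. `hello_mod.py` is a hypothetical throwaway module:

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello_mod.py")  # hypothetical throwaway module
    with open(src, "w") as f:
        f.write("GREETING = 'hello'\n")

    # Where the import system would look for the cached bytecode.
    expected = importlib.util.cache_from_source(src)

    # Parse, compile and write the .pyc, as import does on a cache miss.
    written = py_compile.compile(src, doraise=True)
    existed = os.path.exists(written)

print(written == expected)  # True
print(existed)              # True
```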

5
How does Python load a .pyc?
    >>> import sys
    >>> file = sys.path[1] + '/string.pyc'
    >>> f = open(file, 'rb')
    >>> import marshal
    >>> magic = f.read(4)
    >>> magic
    '\002\231\231\000'
    >>> mtime = f.read(4)
    >>> mtime
    '\326\360-
    >>> c = marshal.load(f)
    >>> c
    <code object ? at 100e2c74, file "/ufs/guido/lib/python/string.py", line 0>
  • http://www.python.org/search/hypermail/python-1994q1/0308.html
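The same inspection works differently in Python 3. Since Python 3.7 (PEP 552), the header is 16 bytes rather than the 8 bytes read above. It holds the magic number, a 4-byte flags field, and then either the source mtime and size or an 8-byte source hash. The file must also be opened in binary mode. This sketch assumes CPython 3.7 or later; `demo_mod.py` is hypothetical:

```python
import importlib.util
import marshal
import os
import py_compile
import struct
import tempfile
import types

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo_mod.py")  # hypothetical throwaway module
    with open(src, "w") as f:
        f.write("x = 40 + 2\n")
    pyc_path = py_compile.compile(src, doraise=True)

    with open(pyc_path, "rb") as f:  # binary mode: a .pyc is not text
        magic = f.read(4)                          # version-specific magic number
        (flags,) = struct.unpack("<I", f.read(4))  # 0 = timestamp-based, else hash-based
        f.read(8)                                  # mtime + size, or the source hash
        code = marshal.load(f)                     # the module's code object

print(magic == importlib.util.MAGIC_NUMBER)  # True
print(isinstance(code, types.CodeType))      # True
namespace = {}
exec(code, namespace)
print(namespace["x"])  # 42
```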

6
The Parser
  • Python uses an ancient parser-generator technology
    called pgen.
  • It lives in the Parser/ directory of the source tree.
  • pgen generates LL(1) parsers: one token of look-ahead
    and no backtracking.
  • Positive spin: this helps keep Python's grammar simple.
  • Before building Python you build pgen, and pgen
    builds the Python parser.
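To make "one token of look-ahead" concrete, here is a hand-written LL(1) recursive-descent parser for a hypothetical two-rule grammar. pgen did not generate it, and the grammar is not Python's. It only shows that every parsing decision comes from peeking at the single next token. It builds nested tuples shaped like the parse trees shown later:

```python
import re

# Hypothetical toy grammar, one function per rule (not Python's real grammar):
#   arith_expr: term ('+' term)*
#   term:       NAME | NUMBER
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>\S))")

def tokenize(text):
    """Split text into (kind, value) pairs, ending with an ENDMARKER."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    tokens.append(("ENDMARKER", ""))
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        """The one token of look-ahead an LL(1) parser may use."""
        return self.tokens[self.pos]

    def expect(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok[1]!r}")
        self.pos += 1
        return tok

    def arith_expr(self):
        children = [self.term()]
        while self.peek() == ("OP", "+"):  # decided by the next token alone
            children.append(self.expect("OP", "+"))
            children.append(self.term())
        return ("arith_expr", *children)

    def term(self):
        kind = self.peek()[0]
        if kind in ("NAME", "NUMBER"):
            return ("term", self.expect(kind))
        raise SyntaxError(f"unexpected token {self.peek()[1]!r}")

    def parse(self):
        tree = self.arith_expr()
        self.expect("ENDMARKER")
        return tree

print(Parser("a + 5").parse())
# ('arith_expr', ('term', ('NAME', 'a')), ('OP', '+'), ('term', ('NUMBER', '5')))
```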

7
Pgen
  • You feed a grammar (Grammar/Grammar) into pgen.
  • It generates two files:
    • Python/graminit.c gets the grammar as a bunch
      of initialized data.
    • Include/graminit.h gets the grammar's
      non-terminals as #defines.
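The numbers in the parse-tree dumps that follow come from these generated definitions. Python exposes token numbers through the `token` module. In Python 2, the `symbol` module mirrored the non-terminal numbers, but it was removed in Python 3.10. This Python 3 sketch therefore only checks the numbering convention:

```python
import token

# Terminals (tokens) are numbered below NT_OFFSET; non-terminals start at it.
print(token.NT_OFFSET)               # 256
print(token.tok_name[token.NAME])    # NAME
print(token.ISTERMINAL(token.NAME))  # True
print(token.ISNONTERMINAL(258))      # True: 258 is a grammar-symbol number
```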

8
What does the parser do?
  • Reads text files or strings.
  • Generates a parse tree
  • Python documentation often wrongly calls this an
    Abstract Syntax Tree.
  • We'll talk later about what an AST really is.

9
The Parser Module
  • Python gives access to the parser as a module.
  • For instance, you can parse an expression:
    >>> import parser
    >>> st = parser.expr('a + 5')
    >>> print st
    <parser.st object at 0x15c7f0>

10
Printing the parse tree
    >>> import pprint
    >>> pprint.pprint(st.totuple())
    (258,
     (313,
      (292,
       (293,
        (294,
         (295,
          (297,
           (298,
            (299,
             (300,
              (301,
               (302, (303, (304, (305, (1, 'a'))))),
               (14, '+'),
               (302, (303, (304, (305, (2, '5'))))))))))))))),
     (4, ''),
     (0, ''))

11
Rewriting it nicely
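    import symbol
    import token
    def rewrite(st_tuple):
        """Replace numeric codes in a parse-tree tuple with their names."""
        symbol_or_token = st_tuple[0]
        rest = list(st_tuple[1:])
        if token.ISTERMINAL(symbol_or_token):
            # Leaf: (token number, text), e.g. (1, 'a') -> ['NAME', 'a']
            symbol_or_token = token.tok_name[symbol_or_token]
        else:
            # Interior node: rewrite each child subtree recursively
            symbol_or_token = symbol.sym_name[symbol_or_token]
            rest = [rewrite(child) for child in rest]
        return [symbol_or_token] + rest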
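The `parser` and `symbol` modules used here were removed in Python 3.10. This Python 3 sketch therefore runs the same `rewrite` logic on a hand-built tuple. `SYM_NAME` is a hypothetical stand-in for `symbol.sym_name`. It holds only the five non-terminal numbers from the `totuple()` dump that the example needs. `operand` is also a hypothetical helper:

```python
import token

# Hypothetical stand-in for Python 2's symbol.sym_name (numbers from the dump).
SYM_NAME = {301: "arith_expr", 302: "term", 303: "factor", 304: "power", 305: "atom"}

def rewrite(st_tuple):
    """Replace numeric codes in a parse-tree tuple with their names."""
    symbol_or_token = st_tuple[0]
    rest = list(st_tuple[1:])
    if token.ISTERMINAL(symbol_or_token):
        symbol_or_token = token.tok_name[symbol_or_token]
    else:
        symbol_or_token = SYM_NAME[symbol_or_token]
        rest = [rewrite(child) for child in rest]
    return [symbol_or_token] + rest

def operand(tok):
    """Hypothetical helper: wrap a token in the term/factor/power/atom chain."""
    return (302, (303, (304, (305, tok))))

tree = (301, operand((token.NAME, "a")), (token.PLUS, "+"), operand((token.NUMBER, "5")))
print(rewrite(tree))
# ['arith_expr', ['term', ['factor', ['power', ['atom', ['NAME', 'a']]]]], ['PLUS', '+'], ['term', ['factor', ['power', ['atom', ['NUMBER', '5']]]]]]
```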

12
Looking at the parse tree
    >>> pprint.pprint(rewrite(st.totuple()))
    ['eval_input',
     ['testlist',
      ['test',
       ['and_test',
        ['not_test',
         ['comparison',
          ['expr',
           ['xor_expr',
            ['and_expr',
             ['shift_expr',
              ['arith_expr',
               ['term', ['factor', ['power', ['atom', ['NAME', 'a']]]]],
               ['PLUS', '+'],
               ['term', ['factor', ['power', ['atom', ['NUMBER', '5']]]]]]]]]]]]]]]],
     ['NEWLINE', ''],
     ['ENDMARKER', '']]

13
Parse trees are ugly!
  • They are totally dependent on niggly details of
    the grammar.
  • Changes to the grammar break code that works on the
    parse tree.
  • They have too many levels that are usually not of
    interest.
  • They keep track of irrelevant details like the
    newlines and indents.
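One way to see how many of these levels carry no information is to drop every node that has exactly one child. This Python 3 sketch does that on a hand-built tuple in the `totuple()` shape. The helpers `chain`, `collapse` and `depth` are hypothetical, not library functions. The result is much smaller but is still a parse tree: '+' stays a token instead of becoming an operator node.

```python
def chain(codes, node):
    """Wrap node in one single-child node per code: chain([1, 2], x) == (1, (2, x))."""
    for code in reversed(codes):
        node = (code, node)
    return node

def collapse(node):
    """Skip every interior node that has exactly one child."""
    if isinstance(node[1], str):  # leaf: (token number, text)
        return node
    children = node[1:]
    if len(children) == 1:
        return collapse(children[0])
    return (node[0],) + tuple(collapse(child) for child in children)

def depth(node):
    """Count nesting levels, leaves included."""
    if isinstance(node[1], str):
        return 1
    return 1 + max(depth(child) for child in node[1:])

# The testlist ... arith_expr part of the 'a + 5' dump shown earlier.
arith = (301, chain([302, 303, 304, 305], (1, "a")), (14, "+"),
         chain([302, 303, 304, 305], (2, "5")))
tree = chain([313, 292, 293, 294, 295, 297, 298, 299, 300], arith)

print(depth(tree), depth(collapse(tree)))  # 15 2
print(collapse(tree))                      # (301, (1, 'a'), (14, '+'), (2, '5'))
```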

14
The problem with parse trees
  • A parse tree is a record of the rules (and
    tokens) used to match some input text whereas a
    syntax tree records the structure of the input
    and is insensitive to the grammar that produced
    it. Note that there are an infinite number of
    grammars for any single language and hence every
    grammar will result in a different parse tree
    form for a given input sentence because of all
    the different intermediate rules.
  • - http://www.jguru.com/faq/view.jsp?EID=814505

15
A better strategy
  • Pass information from the parser to the compiler
    through true Abstract Syntax Trees
  • An abstract syntax tree is a far superior
    intermediate form precisely because of this
    insensitivity and because it highlights the
    structure of the language, not the grammar.
  • http://www.jguru.com/faq/view.jsp?EID=814505
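Python 3's `ast` module shows this insensitivity directly. In the sketch below, `ast_of` is a hypothetical helper. It relies on `ast.dump` leaving out line and column attributes by default, so only the tree's structure is compared:

```python
import ast

def ast_of(expr_source):
    """Parse an expression and dump its AST (positions are omitted by default)."""
    return ast.dump(ast.parse(expr_source, mode="eval"))

# Different surface text, same structure: parentheses leave no trace in the AST.
print(ast_of("a + 5") == ast_of("(a)+(5)"))    # True
print(ast_of("a + 5") == ast_of("((a + 5))"))  # True
print(ast_of("a + 5") == ast_of("a - 5"))      # False
print(ast_of("a + 5"))
```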

16
The AST Branch of Python
  • Jeremy Hylton is leading a project to migrate
    Python to an AST model.
  • Once it becomes easier to work with Python
    compiler output, it will be easier to optimize
    Python code and evolve Python.
  • It will also become easier to write Python
    compilers and translators.
  • Join the compiler-sig for more information.

17
Aside: the Compiler module
  • While we wait for the AST branch to be finished,
    there is an AST module as part of the Compiler
    project.
  • The Compiler project is a Python compiler
    re-implemented in Python.
  • It uses the standard Python parser but then
    converts to an AST before compiling to bytecodes.
  • The only problem: it is really slow.

18
The Compiler Module
    >>> import compiler
    >>> compiler.parse("a+3")
    Module(None, Stmt([Discard(Add((Name('a'), Const(3))))]))
    >>> compiler.parse("a = a + 3")
    Module(None, Stmt([Assign([AssName('a', 'OP_ASSIGN')], Add((Name('a'), Const(3))))]))
  • http://mail.python.org/pipermail/compiler-sig/2002-March/000093.html

19
AST of an expression
    >>> compiler.parse("a = a + 3")
    Module(None,
      Stmt([
        Assign([AssName('a', 'OP_ASSIGN')],
          Add((Name('a'),
               Const(3))))]))
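The compiler package was removed in Python 3, and the standard `ast` module now fills its role. This Python 3 sketch parses the same statement. The node names differ from the compiler package. The assignment target is a `Name` in a `Store` context instead of an `AssName`. `a + 3` becomes a `BinOp` with an `Add` operator. An expression statement like the `Discard` above would appear as an `Expr` node.

```python
import ast

tree = ast.parse("a = a + 3")
assign = tree.body[0]

print(type(assign).__name__)                 # Assign
print(type(assign.targets[0].ctx).__name__)  # Store (the role AssName played)
print(type(assign.value).__name__)           # BinOp
print(type(assign.value.op).__name__)        # Add
print(ast.dump(tree))
```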

20
Back to the real world
  • In current Python, there are no ASTs, just parse
    trees (mislabelled as ASTs).
    >>> import parser
    >>> st = parser.expr('a + 5')   # really a parse tree!
    >>> code = st.compile('Blah')

21
Shortcut to get a code object
  • The compile built-in function generates a code
    object directly from text.
    >>> help(compile)
    Help on built-in function compile:
    compile(...)
        compile(source, filename, mode[, flags[, dont_inherit]]) -> code object
        Compile the source string (a Python module, statement or expression)
        into a code object that can be executed by the exec statement or eval().
        The filename will be used for run-time error messages.
        The mode must be 'exec' to compile a module, 'single' to compile a
        single (interactive) statement, or 'eval' to compile an expression.
        The flags argument, if present, controls which future statements influence
        the compilation of the code.
        The dont_inherit argument, if non-zero, stops the compilation inheriting
        the effects of any future statements in effect in the code calling
        compile; if absent or zero these statements do influence the compilation,
        in addition to any features explicitly specified.
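This Python 3 sketch exercises the three modes described in the docstring. In Python 3, `exec` is a function rather than a statement. The sketch also uses `ast.PyCF_ONLY_AST`, a real flag that makes `compile` return the AST instead of a code object. Passing that AST back to `compile` is roughly the Python 3 counterpart of `st.compile()` from the previous slide:

```python
import ast

# 'eval' mode: a single expression; eval() returns its value.
expr_code = compile("a + 5", "Blah", "eval")
print(eval(expr_code, {"a": 2}))  # 7

# 'exec' mode: a whole module; exec() runs it for its side effects.
namespace = {}
exec(compile("a = 2\nb = a * 10\n", "Blah", "exec"), namespace)
print(namespace["b"])  # 20

# 'single' mode: one interactive statement; expression values are echoed.
exec(compile("1 + 1", "Blah", "single"))  # prints 2

# flags=ast.PyCF_ONLY_AST returns the AST instead of a code object...
tree = compile("a + 5", "Blah", "eval", flags=ast.PyCF_ONLY_AST)
print(type(tree).__name__)  # Expression

# ...and compile() accepts that AST back to produce a code object.
print(eval(compile(tree, "Blah", "eval"), {"a": 2}))  # 7
```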

22
What attributes does this thing have?
    >>> def getAllAttributes(obj):
    ...     for key in dir(obj):
    ...         print key, repr(getattr(obj, key))
    ...

23
Interesting Attributes
    >>> code = compile("a+5", "Blah", "eval")
    >>> getAllAttributes(code)
    co_argcount 0
    co_cellvars ()
    co_code 'e\x00\x00d\x00\x00\x17S'
    co_consts (5,)
    co_filename 'Blah'
    co_firstlineno 0
    co_flags 64
    co_freevars ()
    co_lnotab ''
    co_name '?'
    co_names ('a',)
    co_nlocals 0
    co_stacksize 2
    co_varnames ()
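Most of these attributes still exist in Python 3, with some differences. `co_code` is a `bytes` object. A module-level code object is named `'<module>'` instead of `'?'`. The exact bytes and constants vary between CPython versions, so this sketch prints a few attributes without claiming specific values:

```python
code = compile("a+5", "Blah", "eval")

# A few of the attributes listed above, as Python 3 reports them.
for key in ("co_argcount", "co_filename", "co_names", "co_consts",
            "co_stacksize", "co_flags", "co_code"):
    print(key, repr(getattr(code, key)))
```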

24
Code objects from functions
    >>> def foo():
    ...     print "Hello world"
    ...
    >>> foo.func_code
    <code object foo at 0098C5A0, file "<stdin>", line 1>
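Python 3 renamed `func_code` to `__code__`. This sketch uses the same function, written with Python 3's `print()` call:

```python
def foo():
    print("Hello world")

code = foo.__code__  # Python 3 spelling of Python 2's foo.func_code
print(code.co_name)                     # foo
print(code.co_argcount)                 # 0
print("Hello world" in code.co_consts)  # True
```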

25
The actual instructions
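  • Stored in codeobj.co_code as bytecodes.
  • Each instruction is a one-byte opcode, followed by a
    two-byte argument if the opcode takes one (hence the
    offsets 0, 3, 4, 5, 8 in the listing below).
  • We can disassemble code objects and functions
    with the dis module.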
    >>> def foo():
    ...     print "Hello world"
    ...
    >>> import dis
    >>> dis.dis(foo)
      2           0 LOAD_CONST               1 ('Hello world')
                  3 PRINT_ITEM
                  4 PRINT_NEWLINE
                  5 LOAD_CONST               0 (None)
                  8 RETURN_VALUE
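Python 3 bytecode for the same function looks different. There is no PRINT_ITEM, because `print` is an ordinary call. Since CPython 3.6, instructions are also fixed-size 2-byte "wordcode" units, so offsets are always even. This sketch uses `dis.get_instructions` and does not assume any particular opcode names, since they change between versions:

```python
import dis

def foo():
    print("Hello world")

instructions = list(dis.get_instructions(foo))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

# CPython 3.6+ uses 2-byte wordcode: every instruction offset is even.
print(all(ins.offset % 2 == 0 for ins in instructions))  # True
```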

26
Stack machine
  • Python's virtual machine is a stack machine: it keeps
    intermediate values on a stack that instructions push
    onto and pop from (yes, even Stackless Python does
    this!)
  • This value stack is unrelated to the C stack that
    Stackless Python uses less of.

27
Decoding the instructions
  • LOAD_CONST consti: pushes co_consts[consti] onto
    the stack.
  • PRINT_ITEM: prints TOS (the top of the stack) to the
    file-like object bound to sys.stdout. There is one
    such instruction for each item in the print statement.
  • PRINT_NEWLINE: prints a new line on sys.stdout. This
    is generated as the last operation of a print
    statement, unless the statement ends with a comma.
  • RETURN_VALUE: returns TOS to the caller of the
    function.
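The LOAD_CONST rule can be checked mechanically in Python 3. For each LOAD_CONST instruction, the argument is an index into `co_consts`, and `dis` reports the value found there as `argval`. The sketch asserts that relationship. It does not assume how many LOAD_CONSTs a given CPython version emits:

```python
import dis

def foo():
    print("Hello world")

consts = foo.__code__.co_consts
loaded = []
for ins in dis.get_instructions(foo):
    if ins.opname == "LOAD_CONST":
        # The argument is an index into co_consts; argval is the value stored there.
        assert consts[ins.arg] == ins.argval
        loaded.append(ins.argval)

print(loaded)
```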

28
Understanding the bytecodes
      2           0 LOAD_CONST               1 ('Hello world')
                  3 PRINT_ITEM
                  4 PRINT_NEWLINE
                  5 LOAD_CONST               0 (None)
                  8 RETURN_VALUE
    >>> print foo.func_code.co_consts
    (None, 'Hello world')

29
Another Example
    >>> def foo():
    ...     print 1 + 2
    ...
    >>> dis.dis(foo)
      2           0 LOAD_CONST               1 (1)
                  3 LOAD_CONST               2 (2)
                  6 BINARY_ADD
                  7 PRINT_ITEM
                  8 PRINT_NEWLINE
                  9 LOAD_CONST               0 (None)
                 12 RETURN_VALUE
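Current CPython would not emit an addition instruction for this function at all. Its compiler folds constant expressions such as `1 + 2` into a single constant at compile time. This Python 3 sketch checks only that no `BINARY_*` instruction survives. Exact opcode names differ between versions:

```python
import dis

def foo():
    print(1 + 2)

opnames = [ins.opname for ins in dis.get_instructions(foo)]
print(opnames)

# No addition instruction survives: the compiler folded 1 + 2 into the constant 3.
print(any(name.startswith("BINARY_") for name in opnames))  # False
```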

30
What does binary_add do?
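BINARY_ADD pops the top two stack items and pushes their sum. The dis documentation writes this as TOS = TOS1 + TOS. Below is a toy stack-machine sketch that runs the instruction list from the previous slide under that rule. The `run` function and the `(opname, arg)` instruction format are hypothetical. PRINT_ITEM is simplified to writing `str(TOS)`. The real evaluation loop in CPython's ceval.c also handles reference counting, errors, and type-specific fast paths.

```python
import io

def run(instructions, consts, out):
    """Execute (opname, arg) pairs on a value stack; consts plays co_consts."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            stack.append(consts[arg])   # push co_consts[arg]
        elif opname == "BINARY_ADD":
            right = stack.pop()         # TOS
            left = stack.pop()          # TOS1
            stack.append(left + right)  # TOS = TOS1 + TOS
        elif opname == "PRINT_ITEM":
            out.write(str(stack.pop()))
        elif opname == "PRINT_NEWLINE":
            out.write("\n")
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode {opname!r}")
    raise ValueError("reached the end without RETURN_VALUE")

program = [
    ("LOAD_CONST", 1),
    ("LOAD_CONST", 2),
    ("BINARY_ADD", None),
    ("PRINT_ITEM", None),
    ("PRINT_NEWLINE", None),
    ("LOAD_CONST", 0),
    ("RETURN_VALUE", None),
]
out = io.StringIO()
result = run(program, consts=(None, 1, 2), out=out)
print(repr(out.getvalue()), result)  # '3\n' None
```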