Implementation of the Python Bytecode Compiler

1
Implementation of the Python Bytecode Compiler
  • Jeremy Hylton
  • Google

2
What to expect from this talk
  • Intended for developers
  • Explain key data structures and control flow
  • Lots of code on slides

3
The New Bytecode Compiler
  • Rewrote compiler from scratch for 2.5
  • Emphasizes modularity
  • Work was almost done for Python 2.4
  • Still uses original parser, pgen
  • Traditional compiler abstractions
  • Abstract Syntax Tree (AST)
  • Basic blocks
  • Goals
  • Ease maintenance, extensibility
  • Expose AST to Python programs

4
Compiler Architecture
  • Source Text -> Tokenizer -> Tokens
  • Tokens -> Parser -> Parse Tree
  • Parse Tree -> AST Converter -> AST
  • AST -> __future__ check, Symbol Table
  • AST -> Code Generator -> Blocks
  • Blocks -> Assembler -> bytecode
  • bytecode -> Peephole Optimizer -> bytecode
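
The stages on this slide can be driven from Python itself. The following is a minimal sketch, assuming a modern CPython 3 rather than the 2.5 compiler the talk describes, and assuming `ast.parse()` and the built-in `compile()` as the entry points; the individual C stages are not exposed one by one.

```python
import ast

source = "answer = 6 * 7\n"

# Source text -> AST (tokenizing and parsing happen inside ast.parse).
tree = ast.parse(source, filename="<demo>", mode="exec")

# AST -> code object (symbol table, code generation, assembly, optimization).
code = compile(tree, filename="<demo>", mode="exec")

# Run the resulting bytecode in a fresh namespace.
namespace = {}
exec(code, namespace)
print(namespace["answer"])
```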

5
Compiler Organization

6
Tokenize, Parse, AST
  • Simple, hand-coded tokenizer
  • Synthesizes INDENT and DEDENT tokens
  • pgen parser generator
  • Input in Grammar/Grammar
  • Extended LL(1) grammar
  • AST conversion
  • Collapses parse tree into abstract form
  • Future: extend pgen to generate the AST directly
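
The synthesized INDENT and DEDENT tokens are easy to observe. This is a small sketch that assumes the standard-library `tokenize` module on a modern CPython 3 as a stand-in for the C tokenizer discussed in the talk; the sample source string is made up for illustration.

```python
import io
import token
import tokenize

source = "if x:\n    y = 1\nz = 2\n"

# generate_tokens takes a readline callable and yields TokenInfo tuples.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
token_names = [token.tok_name[tok.type] for tok in tokens]

for tok in tokens:
    print(token.tok_name[tok.type], repr(tok.string))
```

The INDENT and DEDENT entries have no counterpart character in the source; the tokenizer derives them from changes in leading whitespace.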

7
Grammar vs. Abstract Syntax
    compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | funcdef
    if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
    for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
    suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
    test: and_test ('or' and_test)* | lambdef
    and_test: not_test ('and' not_test)*
    not_test: 'not' not_test | comparison
    comparison: expr (comp_op expr)*
    comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is'|'is' 'not'

    stmt = For(expr target, expr iter, stmt* body, stmt* orelse)
         | If(expr test, stmt* body, stmt* orelse)
    expr = BinOp(expr left, operator op, expr right)
         | Compare(expr left, cmpop* ops, expr* comparators)
         | Call(expr func, expr* args, keyword* keywords,
                expr? starargs, expr? kwargs)
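
The abstract syntax definitions surface in Python as the `_fields` tuple on each node class. A brief sketch, assuming the `ast` module of a modern CPython 3; the field lists there have drifted from the 2.5 definitions on the slide (for example, `Call` no longer has `starargs` or `kwargs`).

```python
import ast

# Map each node class name to the fields declared for it in the ASDL.
fields = {
    node_type.__name__: node_type._fields
    for node_type in (ast.For, ast.If, ast.BinOp, ast.Compare, ast.Call)
}

for name, names in fields.items():
    print(name, names)
```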

8
AST node types
  • Modules (mod)
  • Statements (stmt)
  • Expressions (expr)
  • Expressions allowed on LHS have context slot
  • Extras
  • Slots, comprehension, excepthandler, arguments
  • Operator types
  • FunctionDef is complex
  • Children in two namespaces
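
The context slot on expressions that may appear on the left-hand side can be inspected directly. A short sketch, assuming the `ast` module of a modern CPython 3:

```python
import ast

assign = ast.parse("x = y").body[0]   # an ast.Assign node
target = assign.targets[0]            # Name on the left-hand side
value = assign.value                  # Name on the right-hand side

print(type(target.ctx).__name__)  # context of the assigned name
print(type(value.ctx).__name__)   # context of the name being read
```

The same `Name` node type serves both roles; only the context tells the code generator whether to emit a store or a load.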

9
Example Code
    L = []
    for x in range(10):
        if x > 5:
            L.append(x * 2)
        else:
            L.append(x + 2)

10
Concrete Syntax Example
    (if_stmt,
     (1, 'if'),
     (test,
      (and_test,
       (not_test,
        (comparison,
         (expr,
          (xor_expr,
           (and_expr,
            (shift_expr,
             (arith_expr, (term, (factor,
              (power, (atom, (1, 'x')))))))))),
         (comp_op, (21, '>')),
         (expr,
          (xor_expr,
           (and_expr,
            (shift_expr,
             (arith_expr, (term, (factor,
              (power, (atom, (2, '5')))))))))))))),
     (11, ':'),

11
Abstract Syntax Example
    For(Name('x', Store),
        Call(Name('range', Load), Num(10)),
        If(Compare(Name('x', Load), Gt, Num(5)),
           Call(Attribute(Name('L', Load),
                          Name('append', Load)),
                BinOp(Name('x', Load), Mult,
                      Num(2))),
           Call(Attribute(Name('L', Load),
                          Name('append', Load)),
                BinOp(Name('x', Load), Add,
                      Num(2)))))
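
The same tree can be produced and inspected from Python. This sketch assumes the `ast` module of a modern CPython 3, whose node names differ in places from the 2.5-era notation above (for instance, numbers are `Constant` nodes and each call statement is wrapped in an `Expr` node).

```python
import ast

source = """\
L = []
for x in range(10):
    if x > 5:
        L.append(x * 2)
    else:
        L.append(x + 2)
"""

tree = ast.parse(source)
loop = tree.body[1]       # the For statement
branch = loop.body[0]     # the If statement inside the loop
print(ast.dump(loop))
```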

12
Our Goal Bytecode
      2           0 BUILD_LIST               0
                  3 STORE_FAST               1 (L)
      3           6 SETUP_LOOP              71 (to 80)
                  9 LOAD_GLOBAL              1 (range)
                 12 LOAD_CONST               1 (10)
                 15 CALL_FUNCTION            1
                 18 GET_ITER
            >>   19 FOR_ITER                57 (to 79)
                 22 STORE_FAST               0 (x)
      4          25 LOAD_FAST                0 (x)
                 28 LOAD_CONST               2 (5)
                 31 COMPARE_OP               4 (>)
                 34 JUMP_IF_FALSE           21 (to 58)
                 37 POP_TOP
      5          38 LOAD_FAST                1 (L)
                 41 LOAD_ATTR                3 (append)
                 44 LOAD_FAST                0 (x)
                 47 LOAD_CONST               3 (2)
                 50 BINARY_MULTIPLY
                 51 CALL_FUNCTION            1
                 54 POP_TOP
                 55 JUMP_ABSOLUTE           19
            >>   58 POP_TOP
      7          59 LOAD_FAST                1 (L)
                 62 LOAD_ATTR                3 (append)
                 65 LOAD_FAST                0 (x)
                 68 LOAD_CONST               3 (2)
                 71 BINARY_ADD
                 72 CALL_FUNCTION            1
                 75 POP_TOP
                 76 JUMP_ABSOLUTE           19
            >>   79 POP_BLOCK
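
A listing like this one can be regenerated with the `dis` module. The sketch below assumes a modern CPython 3, so the opcodes and offsets it prints will not match the 2.x listing above; the wrapper function name `example` is made up for illustration.

```python
import dis

def example():
    L = []
    for x in range(10):
        if x > 5:
            L.append(x * 2)
        else:
            L.append(x + 2)
    return L

# Collect opcode names for inspection, then print the full listing.
opnames = [ins.opname for ins in dis.get_instructions(example)]
dis.dis(example)
```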

13
Strategy for Compilation
  • Module-wide analysis
  • Check future statements
  • Build symbol table
  • For each variable: is it local, global, or free?
  • Makes two passes over block structure
  • Compile one function at a time
  • Generate basic blocks
  • Assemble bytecode
  • Optimize generated code (out of order)
  • Code object stored in parent's constant pool
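
The last point is visible from Python: a function's code object appears among the constants of the enclosing code object. A minimal sketch, assuming a modern CPython 3 and a made-up one-function module:

```python
import types

source = "def f():\n    return 1\n"
module_code = compile(source, "<demo>", "exec")

# Code objects for nested functions live in the parent's co_consts.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]
print([c.co_name for c in nested])
```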

14
Symbol Table
  • Collect basic facts about symbols, block
  • Variables assigned, used params, global stmts
  • Check for import *, unqualified exec, yield
  • Other tricky details
  • Identify free, cell variables in second pass
  • Parent passes bound names down
  • Child passes free variables up
  • Implicit vs. explicit global vars
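
The results of this analysis can be queried through the standard-library `symtable` module. This is a sketch assuming a modern CPython 3, using a nested-function example to show a name that is local in the parent block and free in the child block.

```python
import symtable

source = """\
def make_adder(n):
    x = n
    def adder(y):
        return x + y
    return adder
"""

top = symtable.symtable(source, "<demo>", "exec")
outer = top.get_children()[0]      # block for make_adder
inner = outer.get_children()[0]    # block for adder

print(outer.get_name(), "x local:", outer.lookup("x").is_local())
print(inner.get_name(), "x free:", inner.lookup("x").is_free())
print(inner.get_name(), "y parameter:", inner.lookup("y").is_parameter())
```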

15
Name operations
  • Five different load name opcodes
  • LOAD_FAST: array access for function locals
  • LOAD_GLOBAL: dict lookups for globals, builtins
  • LOAD_NAME: dict lookups for locals, globals
  • LOAD_DEREF: load free variable
  • LOAD_CLOSURE: loads cells to make closure
  • Cells
  • Separate allocation for mutable variable
  • Stored in flat closure list
  • Separately garbage collected
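
Which load opcode the compiler picks depends on the scope the symbol table assigned to the name. The sketch below assumes a modern CPython 3, where the opcode set has changed since 2.5 (newer releases use variants such as specialized LOAD_FAST forms); the helper `opnames` and the example functions are made up for illustration.

```python
import dis

def opnames(code):
    """Return the list of opcode names in a code object."""
    return [ins.opname for ins in dis.get_instructions(code)]

def outer():
    captured = 1
    def inner(local_arg):
        # captured is free, local_arg is local, len is a global/builtin.
        return captured + local_arg + len("ab")
    return inner

inner_ops = opnames(outer().__code__)

# Module-level code uses dict-based name lookups instead.
module_ops = opnames(compile("print(spam)", "<demo>", "exec"))

print(inner_ops)
print(module_ops)
```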

16
Class namespaces
    class Spam:
        id = id(1)

      1           0 LOAD_GLOBAL              0 (__name__)
                  3 STORE_NAME               1 (__module__)
      2           6 LOAD_NAME                2 (id)
                  9 LOAD_CONST               1 (1)
                 12 CALL_FUNCTION            1
                 15 STORE_NAME               2 (id)
                 18 LOAD_LOCALS
                 19 RETURN_VALUE
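
Because a class body uses LOAD_NAME rather than LOAD_FAST, the right-hand `id` can still reach the builtin. A small sketch contrasting this with a function body, assuming a modern CPython 3; the function name `function_scope` is made up for illustration.

```python
class Spam:
    # The right-hand id is not yet bound in the class namespace, so the
    # name lookup falls through to globals and then builtins.
    id = id(1)

def function_scope():
    # Here id is a function local (it is assigned below), so reading it
    # before the assignment fails instead of falling back to the builtin.
    id = id(1)
    return id

try:
    function_scope()
    outcome = "ok"
except UnboundLocalError:
    outcome = "UnboundLocalError"

print(type(Spam.id).__name__, outcome)
```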

17
Closures
    def make_adder(n):
        x = n
        def adder(y):
            return x + y
        return adder

    def make_adder(n):
      2           0 LOAD_FAST                0 (n)
                  3 STORE_DEREF              0 (x)
      3           6 LOAD_CLOSURE             0 (x)
                  9 LOAD_CONST               1 (<code>)
                 12 MAKE_CLOSURE             0
                 15 STORE_FAST               2 (adder)
      5          18 LOAD_FAST                2 (adder)
                 21 RETURN_VALUE

    def adder(y):
      4           0 LOAD_DEREF               0 (x)
                  3 LOAD_FAST                0 (y)
                  6 BINARY_ADD
                  7 RETURN_VALUE
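
The cell that links the two functions can be examined at run time. A minimal sketch, assuming a modern CPython 3 and the standard code-object and function attributes:

```python
def make_adder(n):
    x = n
    def adder(y):
        return x + y
    return adder

add3 = make_adder(3)

print(make_adder.__code__.co_cellvars)     # names the outer function stores in cells
print(add3.__code__.co_freevars)           # names the inner function reads from cells
print(add3.__closure__[0].cell_contents)   # the value held by the shared cell
```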

18
Code generation input
  • Discriminated unions
  • One for each AST type
  • Struct for each option
  • Constructor functions
  • Literals
  • Stored as PyObject *
  • AST pass parses them
  • Identifiers
  • Also PyObject *
  • string

    typedef struct _stmt *stmt_ty;
    struct _stmt {
        enum { ..., For_kind=8,
               While_kind=9, If_kind=10,
               ... } kind;
        union {
            struct {
                expr_ty target;
                expr_ty iter;
                asdl_seq *body;
                asdl_seq *orelse;
            } For;
            struct {
                expr_ty test;
                asdl_seq *body;
                asdl_seq *orelse;
            } If;
        } v;
        int lineno;
    };

19
Code generation output
  • Basic blocks
  • Start with jump target
  • Ends if there is a jump
  • Function is graph of blocks
  • Instructions
  • Opcode and argument
  • Jump targets are pointers
  • Helper functions
  • Create new blocks
  • Add instr to current block

    struct instr {
        unsigned char i_opcode;
        int i_oparg;
        struct basicblock_ *i_target;
        int i_lineno;
        // plus some one-bit flags
    };
    struct basicblock_ {
        int b_iused;
        int b_ialloc;
        struct instr *b_instr;
        struct basicblock_ *b_next;
        int b_startdepth;
        int b_offset;
        // several details elided
    };

20
Code generation
  • One visitor function for each AST type
  • Switch on kind enum
  • Emit bytecodes
  • Return immediately on error
  • Heavy use of C macros
  • ADDOP(), ADDOP_JREL(),
  • VISIT(), VISIT_SEQ(),
  • Hides control flow
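
The one-function-per-node-type pattern can be sketched in Python with `ast.NodeVisitor`, which dispatches on the node class much as the C code switches on the kind enum. This is a toy under stated assumptions: a modern CPython 3 `ast` module, a hypothetical class `ExprCodeGen`, and 2.x-style opcode names emitted as plain tuples rather than real bytecode.

```python
import ast

class ExprCodeGen(ast.NodeVisitor):
    """Toy code generator: one visit_* method per AST node type."""

    BINOPS = {ast.Add: "BINARY_ADD", ast.Mult: "BINARY_MULTIPLY"}

    def __init__(self):
        self.instructions = []

    def visit_Expression(self, node):
        self.visit(node.body)

    def visit_BinOp(self, node):
        # Post-order: operands first, then the operator.
        self.visit(node.left)
        self.visit(node.right)
        self.instructions.append((self.BINOPS[type(node.op)], None))

    def visit_Name(self, node):
        self.instructions.append(("LOAD_NAME", node.id))

    def visit_Constant(self, node):
        self.instructions.append(("LOAD_CONST", node.value))

    def generic_visit(self, node):
        # Fail fast on unsupported nodes, in the spirit of
        # "return immediately on error".
        raise NotImplementedError(type(node).__name__)

gen = ExprCodeGen()
gen.visit(ast.parse("x * 2 + 1", mode="eval"))
for instruction in gen.instructions:
    print(instruction)
```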

21
Code generation example
    static int
    compiler_if(struct compiler *c, stmt_ty s)
    {
        basicblock *end, *next;
        if (!(end = compiler_new_block(c)))
            return 0;
        if (!(next = compiler_new_block(c)))
            return 0;
        VISIT(c, expr, s->v.If.test);
        ADDOP_JREL(c, JUMP_IF_FALSE, next);
        ADDOP(c, POP_TOP);
        VISIT_SEQ(c, stmt, s->v.If.body);
        ADDOP_JREL(c, JUMP_FORWARD, end);
        compiler_use_next_block(c, next);
        ADDOP(c, POP_TOP);
        if (s->v.If.orelse)
            VISIT_SEQ(c, stmt, s->v.If.orelse);
        compiler_use_next_block(c, end);
        return 1;
    }
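
The block bookkeeping in this function may be easier to follow without the macros. The following Python model is a sketch only: `Block`, `ToyCompiler`, and their methods are hypothetical stand-ins for the C helpers, and pre-built instruction lists stand in for VISIT() and VISIT_SEQ().

```python
class Block:
    def __init__(self, label):
        self.label = label
        self.instrs = []      # list of (opname, argument) pairs

class ToyCompiler:
    def __init__(self):
        self.order = []       # blocks in the order they were activated
        self.count = 0
        self.current = None
        self.use_next_block(self.new_block())

    def new_block(self):
        block = Block("B%d" % self.count)
        self.count += 1
        return block

    def use_next_block(self, block):
        self.order.append(block)
        self.current = block

    def addop(self, opname, arg=None):
        self.current.instrs.append((opname, arg))

    def compile_if(self, test, body, orelse):
        end = self.new_block()
        nxt = self.new_block()
        self.current.instrs.extend(test)
        self.addop("JUMP_IF_FALSE", nxt)   # target is a block, not an offset
        self.addop("POP_TOP")
        self.current.instrs.extend(body)
        self.addop("JUMP_FORWARD", end)
        self.use_next_block(nxt)
        self.addop("POP_TOP")
        self.current.instrs.extend(orelse)
        self.use_next_block(end)

c = ToyCompiler()
c.compile_if(test=[("LOAD_FAST", "x")],
             body=[("LOAD_CONST", 1), ("POP_TOP", None)],
             orelse=[("LOAD_CONST", 2), ("POP_TOP", None)])
for block in c.order:
    print(block.label,
          [(op, getattr(arg, "label", arg)) for op, arg in block.instrs])
```

Jump arguments stay as references to blocks until assembly, which is why the blocks can be created before their position in the output is known.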

22
Assembler
  • Lots of fiddly details
  • Linearize code
  • Compute stack space needed
  • Compute line number table (lnotab)
  • Compute jump offsets
  • Call PyCode_New()
  • Peephole optimizer
  • Integrated at wrong end of assembler
  • Constant folding, simplify jumps

23
AST transformation
  • Expose AST to Python programmers
  • Simplify analysis of programs
  • Generate code from modified AST
  • Example
  • Implement with statement as AST transform
  • Ongoing work
  • BOF this afternoon at 3:15, Preston Trail
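
Generating code from a modified AST looks roughly like this in a modern CPython 3, where the `ast` module provides `NodeTransformer`. The transform itself, a hypothetical `AddToMult` class, is deliberately trivial and only illustrates the parse, rewrite, compile sequence.

```python
import ast

class AddToMult(ast.NodeTransformer):
    """Rewrite every a + b into a * b."""

    def visit_BinOp(self, node):
        self.generic_visit(node)          # transform children first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

tree = ast.parse("3 + 4", mode="eval")
tree = ast.fix_missing_locations(AddToMult().visit(tree))

# compile() accepts an AST object as well as source text.
result = eval(compile(tree, "<ast>", "eval"))
print(result)
```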

24
Loose ends
  • compiler package
  • Should revise to support new AST types
  • Tricky compatibility issue
  • Revise pgen to generate AST directly
  • Develop toolkit for AST transforms
  • Extend analysis, e.g. PEP 267