Detecting Space Leaks in a Lazy Language

1
Detecting Space Leaks in a Lazy Language
  • Zhanyong Wan
  • Yale University, Dept. of Computer Science
  • Advisor: Paul Hudak

2
What is a Space Leak?
  • Informal definition: waste of heap space due to improper usage
  • In some languages (C, Pascal)
      • no GC
      • user may forget to release memory
      • not the topic
  • In some other languages (Java, ML, Haskell)
      • type-safe
      • automatic GC
      • the program may hold on to something it no longer needs
      • this is the topic
      • more specifically: lazy FP languages

3
What is Lazy?
  • Lazy (call-by-need)
      • an expression is not evaluated until its value is demanded by the computation.
          • select x y z = if x then y else z
          • select (a > b) (f a) (f b)
          • exactly one of (f a) and (f b) is evaluated.
      • a shared value is evaluated at most once.
          • double x = x + x
          • double (f a)
          • (f a) is evaluated only once.
  • What's cool
      • eliminates unnecessary work
      • allows infinite data structures
      • frees users from low-level memory usage
      • can use huge data structures (while eager languages encourage recursive functions)
  • What's not cool
      • hard to predict how much memory a program will use
      • makes space leaks
          • easy to introduce
          • hard to explain
          • hard to locate
          • hard to remove
  • Space leaks have become a major problem for lazy languages.
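
As a rough illustration of the two call-by-need properties above, here is a minimal Python sketch that simulates laziness with explicit thunks. The `Thunk` class, the body of `f`, and the `calls` log are assumptions made for this example; they are not part of the talk, and a lazy language does all of this implicitly.

```python
class Thunk:
    """A delayed computation that is evaluated at most once (call-by-need)."""

    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._compute()
            self._done = True
            self._compute = None  # drop the closure once the value is known
        return self._value


calls = []  # records every actual evaluation of f (hypothetical helper)


def f(n):
    calls.append(n)
    return n * 10


def select(x, y, z):
    # select x y z = if x then y else z: only the chosen branch is forced
    return y.force() if x.force() else z.force()


def double(x):
    # double x = x + x: the shared thunk is forced twice but computed once
    return x.force() + x.force()


a, b = 3, 5
picked = select(Thunk(lambda: a > b), Thunk(lambda: f(a)), Thunk(lambda: f(b)))
calls_after_select = list(calls)  # only f(b) has run so far

doubled = double(Thunk(lambda: f(a)))
calls_after_double = list(calls)  # f(a) has run exactly once
```

Note that an unforced `Thunk` keeps its closure alive, and with it everything the closure refers to. That is the kind of retention that makes memory use hard to predict.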

4
Earlier Work
  • Heap Profiling (C. Runciman, N. Röjemo)
      • Pros
          • visual
      • Cons
          • not at compile-time
          • no automatic analysis -- relies on the programmer
          • what's left unsolved can still be hard
  • Sized Type System (J. Hughes et al.)
      • programmer provides sized type signatures
      • type-check
      • detect leak (manual!)
      • Pros
          • automatic type checking
          • modular
      • Cons
          • an incorrect signature will cause a valid program to be rejected
          • still requires human intuition (unsystematic)
          • linear data types only
  • How hard is this?
      • unsolved for years
  • Why is it so hard?
      • no good model to reason about memory usage at a suitably abstract level

5
Mission Impossible?
  • The ideal analysis
      • automatic -- no human intervention
      • static -- no run-time hassle, detects space leaks before they happen
      • finds the source of the space leaks (constructive)
      • scalable (modular)
  • Can it be done?
      • partially yes
      • undecidable in general
      • approximation
          • may err
          • serves as a hint
  • What we have achieved
      • automatic -- almost
      • static -- yes
      • finds source of leaks -- sometimes
      • scalable -- yes

6
What's Covered Today
  • our ideas
      • what consumption rate is
      • how it is related to space leaks
      • how we detect leaks using consumption rates
  • tech details
      • a small lazy language called Lazy
      • a renaming mechanism to distinguish different occurrences of the same variable
      • consumption rate analysis
      • encoding the criterion for a leak
  • example
      • how we solve an example in J. Hughes et al.'s paper
  • summary
      • our achievements and limitations
      • open problems

7
The Idea -- Consumption-Rate Analysis
  • What's consumption rate?
      • how fast one data structure is consumed to produce the result data structure
      • in lazy languages, a data structure is constructed on demand
  • Why should we be interested in it?
      • it captures the nature of lazy evaluation
      • the unexpanded part of a value is stored as a closure
      • the expanded part is remembered if it's shared, or will be GC-ed when no longer in use
      • different rates => unbounded buffer => space leak
  • How to acquire it?
      • by what we call consumption-rate analysis
      • an equation system based on structural induction
      • uncomputable in general, so we choose to overshoot

8
A Language Called Lazy
  • First-order, purely functional, lazy, monomorphic
  • Data types: integers, lists of integers
  • Syntax
      • c ::= ... | -2 | -1 | 0 | 1 | 2 | ...   integer constants
      • x ::= x | y | z | ...   variables
      • f ::= f | g | h | ...   function names
      • op ::= + | - | / | > | ...   strict binary operators
      • e ::= c | x
          | []   empty list
          | e1 : e2   list construction
          | e1 op e2   strict binary operation
          | if e1 then e2 else e3   conditional
          | case x1 of [] -> e1; x2:x3 -> e2   pattern matching
          | f e1 ... en   function application
      • p ::= {fi x1 ... xn = ei}   programs
  • Semantics
      • Standard first-order
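
One way to make the grammar concrete is to write it down as an abstract syntax tree. The following Python sketch is only an assumed encoding: the class and field names are invented for this example, and `ADD` encodes the `add` function that appears in the talk's later example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Const:          # c: integer constant
    value: int


@dataclass(frozen=True)
class Var:            # x: variable
    name: str


@dataclass(frozen=True)
class Nil:            # []: empty list
    pass


@dataclass(frozen=True)
class Cons:           # e1 : e2
    head: object
    tail: object


@dataclass(frozen=True)
class BinOp:          # e1 op e2, strict in both operands
    op: str
    left: object
    right: object


@dataclass(frozen=True)
class If:             # if e1 then e2 else e3
    cond: object
    then: object
    orelse: object


@dataclass(frozen=True)
class Case:           # case x1 of [] -> e1; x2:x3 -> e2
    scrutinee: str
    nil_branch: object
    head: str
    tail: str
    cons_branch: object


@dataclass(frozen=True)
class App:            # f e1 ... en
    func: str
    args: Tuple[object, ...]


@dataclass(frozen=True)
class FunDef:         # f x1 ... xn = e
    name: str
    params: Tuple[str, ...]
    body: object


# add xs ys = case xs of [] -> []; u:us -> case ys of [] -> []; v:vs -> (u+v) : add us vs
ADD = FunDef(
    "add", ("xs", "ys"),
    Case("xs", Nil(), "u", "us",
         Case("ys", Nil(), "v", "vs",
              Cons(BinOp("+", Var("u"), Var("v")),
                   App("add", (Var("us"), Var("vs")))))))
```

Because the language is first-order, `App` stores the function as a plain name rather than as an expression.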

9
An Example of a Leak (from J. Hughes et al.)
  • fil xs = case xs of
        [] -> []
        y:ys -> case ys of
            [] -> []
            z:zs -> ((y+z)/2) : fil zs
  • add xs ys = case xs of
        [] -> []
        u:us -> case ys of
            [] -> []
            v:vs -> (u+v) : add us vs
  • net in = add (fil in) (fil (fil in))

[Figure: dataflow network for net, with three fil nodes and one add node between in and out, a buffer, and stream annotations i, 2i and 4i.]

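
The leak in this network can be observed with a small simulation. The sketch below models the lazy lists as Python generators and shares the input through a hand-written `fork` whose buffers can be inspected. `fork`, `pending` and the use of `itertools.count` as the input stream are assumptions made for this example. This is a run-time demonstration, not the static analysis proposed in the talk.

```python
from collections import deque
from itertools import count


def fork(source):
    """Split one stream into two lazy branches with explicit, inspectable buffers."""
    it = iter(source)
    buffers = (deque(), deque())

    def branch(mine, other):
        while True:
            if mine:
                yield mine.popleft()
            else:
                try:
                    item = next(it)
                except StopIteration:
                    return
                other.append(item)  # the other branch has not consumed it yet
                yield item

    return branch(buffers[0], buffers[1]), branch(buffers[1], buffers[0]), buffers


def fil(xs):
    """Average adjacent pairs: two inputs are consumed per output."""
    xs = iter(xs)
    while True:
        try:
            y = next(xs)
            z = next(xs)
        except StopIteration:
            return
        yield (y + z) / 2


def add(xs, ys):
    """Element-wise sum: one input from each stream per output."""
    for u, v in zip(xs, ys):
        yield u + v


def net(inp):
    left, right, buffers = fork(inp)
    return add(fil(left), fil(fil(right))), buffers


out, buffers = net(count())
pending = []  # size of the slow branch's buffer after each output
for _ in range(5):
    next(out)
    pending.append(len(buffers[0]))
```

In this simulation the branch that goes through one `fil` needs 2 inputs per output and the branch through two `fil`s needs 4. The buffer holding elements for the slower consumer therefore grows by 2 with every output and is never drained.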
10
Definition: Consumption Length/Rate
  • The consumption length of an integer expression e w.r.t. a variable x
      • $\mathit{Cl}_x[e] = k$, where k = the length of the prefix of x needed to compute the value of e.
      • k = 0 or 1 when x is an integer variable, representing conventional strictness.
  • The consumption rate function of a list expression e w.r.t. a variable x
      • $\mathit{Cr}_x[e]\ n = k$, where k = the length of the prefix of x needed to compute the first n elements of e.
      • Again, k = 0 or 1 when x is an integer.
  • Now we need a way to calculate $\mathit{Cl}_x[e]$ (read "length x e") and $\mathit{Cr}_x[e]$ (read "rate function x e").
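
The definition can be checked empirically on a concrete function before any analysis is attempted. The following Python sketch counts how many input elements a stream transformer pulls in order to yield its first n outputs. The helper `consumed_prefix` and the generator version of `fil` are assumptions made for this example. It measures one run, whereas the talk aims to derive the same function statically.

```python
from itertools import count, islice


def fil(xs):
    """Average adjacent pairs, as in the talk's fil: two inputs per output."""
    xs = iter(xs)
    while True:
        try:
            y = next(xs)
            z = next(xs)
        except StopIteration:
            return
        yield (y + z) / 2


def consumed_prefix(transform, n):
    """Length of the input prefix that `transform` pulls to yield n outputs."""
    pulled = 0

    def source():
        nonlocal pulled
        for i in count():
            pulled += 1
            yield i

    list(islice(transform(source()), n))  # demand exactly n outputs
    return pulled


prefix_fil = [consumed_prefix(fil, n) for n in range(4)]
prefix_fil_fil = [consumed_prefix(lambda xs: fil(fil(xs)), n) for n in range(4)]
```

For these inputs the measured prefix lengths grow linearly in n, with slopes 2 and 4 respectively.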

11
Step 1: Renaming
  • A renaming transformation
      • $S : \mathit{Var} \to \mathit{Exp} \to \mathit{Exp}$
  • Purpose
      • distinguish different free occurrences of a variable (say, x)
      • free = unbound in a case x ... expression
      • why case x ... is special
          • case x of [] -> e1; x1:x2 -> e2
          • it introduces aliases; aliases mean sharing
          • it gives us information about x, so we don't need to explore the branch
  • Examples
      • S x (x + f x / g x) = x(1) + f x(2) / g x(3)
      • S x (f x - case x of [] -> z; y:ys -> g x x) = f x(1) - case x(2) of [] -> z; y:ys -> g x(2) x(2)
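
The renaming can be sketched in a few lines of Python. This is an assumed reading of the two examples above, not the talk's definition. Expressions are nested tuples whose tags ("var", "op", "app", "case") are invented for this example. Occurrences are numbered left to right, and everything under a `case x` shares the index of the scrutinee. Pattern variables that shadow x are not handled.

```python
from itertools import count


def rename(x, e):
    """Index the free occurrences of variable x in e, in textual order."""
    fresh = count(1)

    def go(node, alias):
        tag = node[0]
        if tag == "var":
            if node[1] != x:
                return node
            # Inside `case x of ...` every occurrence shares the scrutinee's index.
            return ("var", x, alias if alias is not None else next(fresh))
        if tag == "case":
            _, scrutinee, nil_branch, head, tail, cons_branch = node
            inner = alias
            if scrutinee == ("var", x) and alias is None:
                inner = next(fresh)
            return ("case", go(scrutinee, inner), go(nil_branch, inner),
                    head, tail, go(cons_branch, inner))
        # Any other node: rename tuple-shaped children from left to right.
        return (tag,) + tuple(go(c, alias) if isinstance(c, tuple) else c
                              for c in node[1:])

    return go(e, None)


X = ("var", "x")

# x + f x / g x
first = rename("x", ("op", "+", X,
                     ("op", "/", ("app", "f", X), ("app", "g", X))))

# f x - case x of [] -> z; y:ys -> g x x
second = rename("x", ("op", "-", ("app", "f", X),
                      ("case", X, ("var", "z"), "y", "ys", ("app", "g", X, X))))
```

Here `("var", "x", 2)` stands for the renamed occurrence written x(2) on the slide.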

12
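Step 2: Consumption Length
  • $\mathit{Cl}_x[e] = \max_i \mathit{Cl}_{x(i)}[S\,x\,e]$
  • $\mathit{Cl}_x[c] = 0$
  • $\mathit{Cl}_x[x] = 1$
  • $\mathit{Cl}_x[x_1] = 0$, for $\neg(x_1 \equiv x)$
  • $\mathit{Cl}_x[e_1\ \mathit{op}\ e_2] = \mathit{Cl}_x[e_1] + \mathit{Cl}_x[e_2]$
  • $\mathit{Cl}_x[\text{if } e_1 \text{ then } e_2 \text{ else } e_3] = \mathit{Cl}_x[e_1] + \mathit{Cl}_x[e_2] + \mathit{Cl}_x[e_3]$
  • $\mathit{Cl}_x[\text{case } x \text{ of } [\,] \to e_1;\ x_2{:}x_3 \to e_2] = \max\{\mathit{Cl}_x[e_2],\ 1 + \mathit{Cl}_{x_3}[e_2]\}$
  • $\mathit{Cl}_x[\text{case } x_1 \text{ of } [\,] \to e_1;\ x_2{:}x_3 \to e_2] = \mathit{Cl}_x[e_1] + \mathit{Cl}_x[e_2]$
  • $\mathit{Cl}_x[f\ e_1 \ldots e_k] = \sum_{i=1}^{k} n_i$, where
      • $n_i = \mathit{Cr}_x[e_i]\ (\mathit{Cl}_{x_i}[e_f])$ if $x_i$ is a list, or
      • $n_i = (\mathit{Cl}_x[e_i]) \times (\mathit{Cl}_{x_i}[e_f])$ otherwise
      • $e_f$ is the body of function f, i.e. $f\ x_1 \ldots x_k = e_f$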
  • ??? ?, ? ? ?
  • ? ?????? ?
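
As a hedged illustration of these rules, consider the integer expression e = if x > 0 then x + y else y, which is not taken from the talk. The derivation assumes that sub-results are combined with + (the operator is not legible in the transcript) and that renaming gives S x e = if x(1) > 0 then x(2) + y else y.

```latex
\begin{align*}
\mathit{Cl}_{x(1)}[S\,x\,e]
  &= \mathit{Cl}_{x(1)}[x(1) > 0] + \mathit{Cl}_{x(1)}[x(2) + y] + \mathit{Cl}_{x(1)}[y] \\
  &= (1 + 0) + (0 + 0) + 0 = 1 \\
\mathit{Cl}_{x(2)}[S\,x\,e]
  &= (0 + 0) + (1 + 0) + 0 = 1 \\
\mathit{Cl}_x[e] &= \max\{1,\ 1\} = 1
\end{align*}
```

Under this reading, renaming is what keeps the result within the 0-or-1 range expected for an integer variable. Without it, the two occurrences of x would add up to 2.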

13
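Step 3: Consumption Rate
  • $\mathit{Cr}_x[e]\ n = \max_i \mathit{Cr}_{x(i)}[S\,x\,e]\ n$
  • $\mathit{Cr}_x[x]\ n = n$
  • $\mathit{Cr}_x[x_1]\ n = 0$, for $\neg(x_1 \equiv x)$
  • $\mathit{Cr}_x[[\,]]\ n = 0$
  • $\mathit{Cr}_x[e_1 : e_2]\ n = \mathit{Cl}_x[e_1] + \mathit{Cr}_x[e_2]\ (n - 1)$
  • $\mathit{Cr}_x[\text{if } e_1 \text{ then } e_2 \text{ else } e_3]\ n = \mathit{Cl}_x[e_1] + \mathit{Cr}_x[e_2]\ n + \mathit{Cr}_x[e_3]\ n$
  • $\mathit{Cr}_x[\text{case } x \text{ of } [\,] \to e_1;\ x_2{:}x_3 \to e_2]\ n = \max\{\mathit{Cr}_x[e_2]\ n,\ 1 + \mathit{Cr}_{x_3}[e_2]\ n\}$
  • $\mathit{Cr}_x[\text{case } x_1 \text{ of } [\,] \to e_1;\ x_2{:}x_3 \to e_2]\ n = \mathit{Cr}_x[e_1]\ n + \mathit{Cr}_x[e_2]\ n$
  • $\mathit{Cr}_x[f\ e_1 \ldots e_k]\ n = \sum_{i=1}^{k} n_i$, where
      • $n_i = \mathit{Cr}_x[e_i]\ (\mathit{Cr}_{x_i}[e_f]\ n)$ if $x_i$ is a list, or
      • $n_i = (\mathit{Cl}_x[e_i]) \times (\mathit{Cr}_{x_i}[e_f]\ n)$ otherwise
      • $e_f$ is the body of function f, i.e. $f\ x_1 \ldots x_k = e_f$
  • A linear approximation for Cr
      • $\mathit{Cr}_x[e]\ n \approx \mathit{Rt}_x[e] \times n + b$
      • only $\mathit{Rt}_x[e]$ (the consumption rate) needs to be computed.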
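
To see where the linear form can come from, the following sketch unrolls a recurrence of the shape that these rules tend to produce for simple recursive list functions. The constant a and the base case $\mathit{Cr}_x[e]\ 0 = 0$ are assumptions made for illustration; the talk does not state them.

```latex
\begin{align*}
\mathit{Cr}_x[e]\ n &= a + \mathit{Cr}_x[e]\ (n - 1), \qquad \mathit{Cr}_x[e]\ 0 = 0 \\
  &= a + a + \mathit{Cr}_x[e]\ (n - 2) \\
  &= \cdots = a \times n \\
\Rightarrow\ \mathit{Rt}_x[e] &= a, \qquad b = 0
\end{align*}
```

In this special case the linear form is exact. In general it is only an approximation, as the slide says.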

14
Step 4: Encoding the Criterion for a Leak
  • For each sub-expression e in the program, and each list variable x in e, add the following equation
      • $\mathit{Rt}_{x(1)}[e'] = \mathit{Rt}_{x(2)}[e'] = \cdots = \mathit{Rt}_{x(k)}[e']$
      • where x occurs free k times in e and $e' = S\,x\,e$
  • For each pattern-matching expression case x of [] -> e1; x2:x3 -> e2 in the program, add
      • $\mathit{Rt}_x[e_2] = \mathit{Rt}_{x_3}[e_2]$
  • inconsistency of the equation system => space leak; the offending equation => source of the leak
  • detecting a space leak is reduced to deciding whether the equation system has a solution
  • how to determine this
      • heuristics
      • a partial solution might be enough
      • divide-and-conquer (modularity)
      • numerical methods
      • people (the last resort)
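
In the simplest situation, where every rate in the first kind of equation has already been reduced to a number, checking consistency amounts to comparing the rates of the renamed occurrences of each variable. The Python sketch below covers only that special case. The function name, the dictionary layout and the sample data are assumptions for illustration. A real system would contain unknowns and need one of the solving strategies listed above.

```python
def offending_equations(occurrence_rates):
    """Return the variables whose renamed occurrences have unequal rates.

    occurrence_rates maps a variable name to {occurrence label: numeric rate}.
    A non-empty result means the equation system is inconsistent.
    """
    return {var: rates
            for var, rates in occurrence_rates.items()
            if len(set(rates.values())) > 1}


# Hypothetical input: two occurrences of `in` with different rates,
# and two occurrences of `xs` that agree.
sample = {
    "in": {"in(1)": 2, "in(2)": 4},
    "xs": {"xs(1)": 1, "xs(2)": 1},
}
offenders = offending_equations(sample)
```

The offending entries point at the variable whose occurrences are consumed at different speeds, which is where the slide says to look for the source of the leak.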

15
Example Revisited
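  • Now try our analysis on the filter example
      • $\mathit{Cr}_{xs}[\mathit{fil}\ xs]\ n = 2 + \mathit{Cr}_{xs}[\mathit{fil}\ xs]\ (n - 1)$
      • $\mathit{Cr}_{xs}[\mathit{add}\ xs\ ys]\ n = 1 + \mathit{Cr}_{xs}[\mathit{add}\ xs\ ys]\ (n - 1)$
      • $\mathit{Cr}_{ys}[\mathit{add}\ xs\ ys]\ n = 1 + \mathit{Cr}_{ys}[\mathit{add}\ xs\ ys]\ (n - 1)$
  • $\Rightarrow \mathit{Rt}_{xs}[\mathit{fil}\ xs] = 2$
      • $\mathit{Rt}_{xs}[\mathit{add}\ xs\ ys] = \mathit{Rt}_{ys}[\mathit{add}\ xs\ ys] = 1$
  • $\Rightarrow \mathit{Rt}_{in(1)}[\mathit{add}\ (\mathit{fil}\ in(1))\ (\mathit{fil}\ (\mathit{fil}\ in(2)))]$
      • $= (\mathit{Rt}_{xs}[\mathit{add}\ xs\ ys]) \times (\mathit{Rt}_{xs}[\mathit{fil}\ xs]) = 1 \times 2 = 2$
  • $\mathit{Rt}_{in(2)}[\mathit{add}\ (\mathit{fil}\ in(1))\ (\mathit{fil}\ (\mathit{fil}\ in(2)))]$
      • $= (\mathit{Rt}_{ys}[\mathit{add}\ xs\ ys]) \times (\mathit{Rt}_{xs}[\mathit{fil}\ xs]) \times (\mathit{Rt}_{xs}[\mathit{fil}\ xs]) = 1 \times 2 \times 2 = 4$
  • contradicts with
      • $\mathit{Rt}_{in(1)}[\mathit{add}\ (\mathit{fil}\ in(1))\ (\mathit{fil}\ (\mathit{fil}\ in(2)))] = \mathit{Rt}_{in(2)}[\mathit{add}\ (\mathit{fil}\ in(1))\ (\mathit{fil}\ (\mathit{fil}\ in(2)))]$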
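
The arithmetic of this example can be replayed mechanically. Under the linear approximation, the rate of an input occurrence is the product of the rates of the function parameters it passes through. The Python sketch below hard-codes the three rates derived on this slide. The `RATE` table and the path encoding are assumptions made for this example.

```python
from math import prod

# Consumption rates from the slide, keyed by (function, parameter).
RATE = {("fil", "xs"): 2, ("add", "xs"): 1, ("add", "ys"): 1}


def rate_along(path):
    """Multiply the rates of the parameters an occurrence flows through."""
    return prod(RATE[step] for step in path)


# in(1) reaches the result through add's xs and one fil.
rate_in1 = rate_along([("add", "xs"), ("fil", "xs")])
# in(2) reaches the result through add's ys and two fils.
rate_in2 = rate_along([("add", "ys"), ("fil", "xs"), ("fil", "xs")])

# The criterion requires both occurrences of `in` to have the same rate.
leak_suspected = rate_in1 != rate_in2
```

The two occurrences of `in` come out at rates 2 and 4, so the equality demanded by the criterion cannot hold and the analysis flags `net`.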

16
Conclusion
  • Our achievements
      • static analysis
      • automatic, unless the consistency/inconsistency cannot be proved
      • finds source of leaks sometimes
      • modular (can be thought of as a type-based analysis)
      • the consumption rates are conservative -- potential uses
  • Our limitations
      • linear data types only -- so is Hughes' approach
      • first-order
          • transform higher-order program into first-order first
      • may fail to produce an answer
      • unsound and incomplete
          • so is Hughes'
          • impossible to achieve both
  • Open problems
      • the existence of solutions of the equation system
      • higher-order functions
      • arbitrary algebraic data types