Title: Trustless Grid Computing in ConCert
1Trustless Grid Computing in ConCert
- Robert Harper
- Carnegie Mellon University
2Acknowledgements
- Co-PIs: Karl Crary, Frank Pfenning, Peter Lee.
- Support: NSF ITR program.
- Students (who do the real work): Chang, Delap, Dreyer, Kliger, Magill, Moody, Murphy, Petersen, Sarkar, Vanderwaart, Watkins.
- Thank you to GALT Organizers for the invitation!
3Grid Computing
- The network as computer.
- Exploit idle resources on the network.
- Many ad hoc grids.
- SETI@home
- Folding@home
- But what is a general grid model?
- Application model?
- Trust model?
- Participation model?
4Application Model
- Current applications focus on cycles.
- Massively parallel (depth 1) problems.
- SETI@home, Folding@home, many others.
- Current approaches are centralized.
- SETI data goes back and forth to UCB.
- UCB assigns tasks to hosts.
- Most grids are local to a project/site.
- All machines in a lab.
- But there are well-known global grids too.
5Application Model
- Can we handle depth > 1 problems?
- E.g., theorem provers, search problems.
- Introduce scheduling dependencies.
- Is a decentralized grid feasible?
- Avoid bottlenecks at collection / distribution points.
- Reduce hot spots for network traffic.
- What about resource locality?
- Data resident at a site.
- Delivery of results.
6Participation Model
- Active intervention required.
- Must download code, apply upgrades.
- Must decide on which grids to participate.
- Motivation to participate?
- At scale, largely altruism, coolness.
- Ad hoc grids on an intranet.
- Economic models? (Cf. Lillibridge et al.)
7Trust Model
- Currently, hosts trust applications.
- Denial of service attacks.
- Privacy/secrecy attacks.
- Accidental misbehavior (e.g., SETI).
- Applications may also trust hosts.
- Spoofed answers.
- Collusion among participants.
- Can we minimize trust?
- Reduce risks.
- Permit passive participation.
8The ConCert Grid Model
- One computer, many keyboards.
- Decentralized scheduling.
- Emphasis on code mobility.
- Policy-based participation.
- Declarative statement of participation criteria.
- Applications must prove compliance.
- Dependency-based scheduling.
- Arbitrary depth.
- And/or dependencies.
- Inspired by CILK/NOW.
9The ConCert Network
Client
Hosts
10Host Setup
Peer-to-Peer Discovery Protocol
Locator
Scheduler
Distributed Scheduler
Worker
Loader/Verifier/Runner
11Locator
- Variant of Gnutella ping-pong protocol.
- Start with well-known neighbors.
- Ping known sites, expect pong back.
- Record whoever pings you.
- We do not (yet) bother with anonymity.
- Not hard to generalize.
- Establishes and maintains the grid.
- Periodic update of accessibility.
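The ping-pong bookkeeping above can be sketched as a small neighbor table. This is a minimal illustration, not the ConCert implementation: the class name `Locator`, its methods, and the timeout-based notion of "accessible" are all assumptions made for the example, and no network traffic is modeled.

```python
import time

class Locator:
    """Hypothetical neighbor table for a ping/pong discovery protocol."""

    def __init__(self, well_known, timeout=60.0):
        # Start with well-known neighbors; none has been heard from yet.
        self.last_seen = {addr: None for addr in well_known}
        self.timeout = timeout

    def on_ping(self, sender, now=None):
        """Record whoever pings us and answer with a pong."""
        self.last_seen[sender] = time.time() if now is None else now
        return ("pong", sender)

    def on_pong(self, sender, now=None):
        """A pong confirms that a site we pinged is still reachable."""
        self.last_seen[sender] = time.time() if now is None else now

    def live_neighbors(self, now=None):
        """Sites heard from within the timeout window (assumed policy)."""
        now = time.time() if now is None else now
        return sorted(addr for addr, seen in self.last_seen.items()
                      if seen is not None and now - seen <= self.timeout)
```

Calling `live_neighbors` periodically plays the role of the accessibility update: sites that stop answering age out of the result.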
12Scheduler
- Work-stealing model.
- Who has work to do?
- Grab work, compute result, deliver to owner.
- Fully decentralized.
- Dependency-based scheduling.
- Supports depth > 1, don't-care and don't-know parallelism.
- Well-founded dependencies only: no cycles.
- Cf. join calculus, JoCaml (more general).
- Maintain ready and waiting queues.
- Ready queue available for stealing.
- Wait queue awaiting satisfying assignment.
13Chords
- The unit of work on the grid is a chord.
14Chords
- Semantics
- Idempotent: can always be re-run.
- Non-blocking: runs to completion (but may create more chords).
- Communication via dependencies. Satisfying assignment passed on activation.
- Dependencies
- And/or dependencies on results of other chords.
- Certificate
- Proof of compliance with host policy.
- Generated by a certifying compiler.
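One way to picture and/or dependencies and the "satisfying assignment" handed to a chord on activation is the sketch below. The encoding of dependencies as nested tuples and the function name `satisfy` are assumptions made for illustration; the slides do not specify a representation.

```python
def satisfy(dep, results):
    """Return a satisfying assignment for `dep`, or None if not yet satisfied.

    `dep` uses a hypothetical encoding: a chord id (str), ("and", [deps]),
    or ("or", [deps]). `results` maps finished chord ids to their answers.
    """
    if isinstance(dep, str):
        return {dep: results[dep]} if dep in results else None
    kind, parts = dep
    if kind == "and":
        assignment = {}
        for part in parts:
            sub = satisfy(part, results)
            if sub is None:
                return None          # some conjunct is still outstanding
            assignment.update(sub)
        return assignment
    if kind == "or":
        for part in parts:
            sub = satisfy(part, results)
            if sub is not None:
                return sub           # first satisfied alternative wins
        return None
    raise ValueError(f"unknown dependency kind: {kind!r}")
```

A chord would move from the wait queue to the ready queue once `satisfy` returns something other than `None`, and the returned dictionary is what gets passed to it.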
15Worker
- Steal work from (self or) neighbor.
- Fetch chord from host.
- Typically arguments and dependencies.
- Code cached at host to reduce traffic.
- Verify safety certificate.
- Load and execute as a DLL.
- Currently combined with verification.
- Should verify at most once (cache result).
- Deliver result to owner.
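The "verify at most once" point can be sketched with a verdict cache keyed by a hash of the code. Everything here is a stand-in: `Worker`, the `verify` and `execute` callables, and the use of SHA-256 as the cache key are assumptions for the example, not the actual loader/verifier interface.

```python
import hashlib

class Worker:
    """Hypothetical worker: verify each distinct code blob at most once."""

    def __init__(self, verify):
        self.verify = verify        # assumed: callable(code_bytes) -> bool
        self.verdicts = {}          # SHA-256 digest -> cached verdict
        self.verifications = 0      # how many times the verifier actually ran

    def is_safe(self, code):
        digest = hashlib.sha256(code).hexdigest()
        if digest not in self.verdicts:
            self.verifications += 1
            self.verdicts[digest] = self.verify(code)
        return self.verdicts[digest]

    def run_chord(self, code, args, execute):
        """Check the (possibly cached) verdict, then run and return the result."""
        if not self.is_safe(code):
            raise PermissionError("certificate check failed; refusing to load")
        return execute(code, args)
```

Because chords that share code differ only in their arguments, repeated steals of the same code pay for verification once.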
16Moving Chords Around
A client submits work, broken into chords, to the
local conductor.
17Moving Chords Around
Idle peers steal chords to work on. Chords have
destinations for their answers, shown by color.
18Moving Chords Around
Some chords spawn new chords. They might depend on
other chords before they can run. The destination
of F and G is the green node, since they will be
used to fill H's dependencies.
19Moving Chords Around
When a chord finishes, the result is sent to its
destination. The client interprets and displays
the results. Simultaneously, unfinished chords
continue to be stolen...
20Moving Chords Around
When the green node has answers for F and G, H is
then ready to be stolen.
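The walkthrough above can be replayed with a toy model of the ready and waiting queues. This is a sketch under simplifying assumptions: only and-dependencies are modeled, chords are just ids, and the names `Node`, `submit`, `deliver`, and `steal_from` are invented for the example.

```python
from collections import deque

class Node:
    """Hypothetical per-host scheduler state: a ready queue and a wait queue."""

    def __init__(self, name):
        self.name = name
        self.ready = deque()    # chords whose dependencies are satisfied
        self.waiting = {}       # chord id -> set of unmet dependency ids

    def submit(self, chord_id, deps=()):
        if deps:
            self.waiting[chord_id] = set(deps)
        else:
            self.ready.append(chord_id)

    def deliver(self, dep_id):
        """A result arrived: release chords whose dependencies are all met."""
        for chord_id in list(self.waiting):
            unmet = self.waiting[chord_id]
            unmet.discard(dep_id)
            if not unmet:
                del self.waiting[chord_id]
                self.ready.append(chord_id)

    def steal_from(self, victim):
        """An idle node takes the oldest ready chord from a neighbor, if any."""
        return victim.ready.popleft() if victim.ready else None

green, idle = Node("green"), Node("idle")
green.submit("F")
green.submit("G")
green.submit("H", deps=["F", "G"])   # H waits on the answers of F and G
```

In this model H stays in the wait queue, invisible to thieves, until both `green.deliver("F")` and `green.deliver("G")` have been called.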
21Grid Programming
- What is a good language for grid applications?
- Functional language is natural.
- Manage chord creation, distribution, and coordination.
- Permit binding to local resources.
- Compiler generation of safety certificates.
22A Low-Level Grid Language
- Popcorn/Grid: a rudimentary language.
- Compiles to Typed Assembly Language.
- Compliance checking = type checking.
- Programmer handles marshalling.
- Separate program for each chord.
- Proof-of-concept for basic applications.
- But too simple for real work.
- Used to build early demos.
23A Low-Level Grid Language
- Chords are essentially continuations.
- my_cord : string × witness → string.
- Witness: satisfying assignment of dependencies.
- Cf. join patterns.
- Chords are typically dispatch functions.
- Input: entry point and arguments.
- Unmarshall arguments, branch to designated entry point.
24A High-Level Grid Language
- ML/Grid
- One program for client and its chords.
- Compiler handles marshalling, distribution, coordination.
- Compiles to TAL(T).
- Currently, TALx86.
- Eventually, TALT (more on this later).
- Run-time checks enforce restrictions.
- Chord cannot perform I/O.
- Client is not a chord.
- Want a static type discipline (more on this later).
25High-Level Grid Model
- Fundamental abstraction: task.
- Type α task is a task returning a value of type α.
- Compiles down to chord model on the grid.
- Primitive operations
- spawn : (unit → α) → α task
- sync : α task → α
- relax : α task list → α × α task list
- Sufficient to build richer mechanisms.
- E.g., continuation-based parallelism.
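To get a feel for the three primitives, here is a rough Python analogue built on `concurrent.futures`, with a local thread pool standing in for the grid. The slide gives only the types; the reading of `relax` as "wait for some task to finish, then return its value and the remaining tasks" is an assumption of this sketch, as is everything about the implementation.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

_pool = ThreadPoolExecutor(max_workers=4)   # stand-in for the grid

def spawn(thunk):
    """spawn : (unit -> a) -> a task"""
    return _pool.submit(thunk)

def sync(task):
    """sync : a task -> a   (block until the task's value is available)"""
    return task.result()

def relax(tasks):
    """relax : a task list -> a * a task list

    Assumed meaning: wait until at least one task has finished, then
    return its value together with the tasks that are left.
    """
    if not tasks:
        raise ValueError("relax needs at least one task")
    done, _ = wait(tasks, return_when=FIRST_COMPLETED)
    first = next(t for t in tasks if t in done)
    return first.result(), [t for t in tasks if t is not first]

# Usage: and-parallelism via sync, or-style draining via relax.
answer = sync(spawn(lambda: 6 * 7))
pending = [spawn(lambda i=i: i * i) for i in range(3)]
squares = []
while pending:
    value, pending = relax(pending)
    squares.append(value)
```

Under this reading, `sync` covers and-dependencies and `relax` covers or-dependencies, which is why the two together suffice for richer coordination patterns.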
26Current Demos
- Available after talk if you're interested.
- Runs remotely at CMU.
- Demo uses a single node, but supports any number of hosts.
- GML ray-tracer.
- From ICFP 01 contest.
- Depth 1.
- Written in Popcorn/Grid.
27Current Demos
- Chess player.
- Depth > 1, and/or dependencies.
- Written in Popcorn/Grid.
- Uses jamboree search algorithm, but a woeful board evaluation function.
- Simple theorem prover.
- Depth > 1, and/or dependencies.
- Written in ML/Grid.
- Intuitionistic propositional logic.
- MLL prover runs on simulator (could be ported).
28Two Foundational Questions
- What is an appropriate type system for a grid programming language?
- Enforce mobility constraints.
- Clean type system to support development, compilation, certification.
- What safety policies can we support?
- How to state policies?
- How to prove compliance?
- How to support multiple policies?
29Modalities for Mobility
- Curry-Howard interpretation of modal logic.
- Cf. related ideas by Cardelli, Gordon, Walker, et al.
- Modalities enforce locality and mobility constraints by type checking.
- Hosts are possible worlds.
- Each host provides an execution site for chords.
- Accessibility between possible worlds.
- B accessible from A means that code may move from A to B.
30Modalities for Mobility
- Accessibility should be an equivalence.
- Reflexive: can stay here.
- Transitive: can move from host to host.
- Symmetric: can go back to source.
- This suggests looking at S5 modal logic.
- Appropriate for RST accessibility.
- Intuitionistic variant for computational
interpretation.
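The three requirements on accessibility can be checked mechanically for any concrete set of hosts. The sketch below is only an illustration of the definition; `hosts` and `can_move` (a set of source/destination pairs) are hypothetical names introduced for the example.

```python
def is_equivalence(hosts, can_move):
    """Check that `can_move`, a set of (src, dst) pairs, is an equivalence on `hosts`."""
    # Reflexive: code can stay where it is.
    reflexive = all((h, h) in can_move for h in hosts)
    # Symmetric: code can go back to its source.
    symmetric = all((b, a) in can_move for (a, b) in can_move)
    # Transitive: code can hop from host to host.
    transitive = all((a, d) in can_move
                     for (a, b) in can_move
                     for (c, d) in can_move if b == c)
    return reflexive and symmetric and transitive

hosts = {"a", "b"}
everyone_reaches_everyone = {(x, y) for x in hosts for y in hosts}
one_way_only = {("a", "a"), ("b", "b"), ("a", "b")}
```

Here `everyone_reaches_everyone` passes the check, while `one_way_only` fails symmetry, so it would not fit an S5-style reading.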
31A Candidate Type System
- Necessity (□A): an A anywhere.
- Classifies mobile code of type A.
- Enforces marshalling and access restrictions.
- Runnable at any accessible site.
- Possibility (◇A): an A somewhere.
- Classifies remote code of type A.
- Expresses resource locality.
- Can only depend on remote resources.
- Other modalities are imaginable.
- Walker broadcast/multicast modality.
32Modalities for Mobility
- Truth (local) typing judgment
[Figure: the typing judgment, with its resources labeled Possible (Remote), True (Local), and Valid (Mobile).]
33Mobility as Necessity
- Validity (mobile) typing judgement
- Mobile does not use local resources.
34Mobility as Necessity
- Box: marshal value and bindings.
- Values of boxed type are mobile code available
here that can run anywhere.
35Mobility as Necessity
- Unboxing: extract and run mobile code.
- Implicit un-marshalling
36Locality as Possibility
- Possible (somewhere) typing judgement
- What is here is somewhere
37Locality as Possibility
- Go to remote site, rendering local resources possible.
- Can only use specified remote resources!
38Locality as Possibility
- Create a local proxy
- Access it
39Joint Possibility
- Cannot consider only a single possibility at a time!
- A and B both possible does not imply that they are true at the same world.
- Could resort to explicit worlds.
- M : A @ w means M is of type A at w.
- Seems unnatural for our grid model in which code
moves spontaneously.
40Joint Possibility
- Solution: take joint possibility as primitive.
- Possibility context is clustered into records
- Possibility modality for a cluster {?}.
41Joint Possibility
42Warning: Work in Progress
- Cut elimination for a sequent variant.
- Work in progress.
- Needs of proof suggested clustering.
- It really is S5.
- Relate to explicit worlds formulation.
- Cf. Alex Simpson's PhD work.
- Operational interpretation.
- Abstract machine for mobile code.
- Type safety proof for the semantics.
43Policies and Certification
- Policy should specify what is permissible.
- Memory safety (no wild pointers)
- Control safety (no illegal instructions or jumps)
- Current approaches specify how to ensure compliance.
- Fix a particular type system (or equivalent) and certificate format, baked into TCB.
- For example, PCC or TAL certified code formats.
44Foundational Certification
- This raises two issues.
- Flexibility: different type systems for different problems.
- Robustness: what if the committed type system is unsound?
- Moving the type system out of the TCB solves both problems.
- Appel emphasizes robustness.
- We're concerned with flexibility.
45Foundational Safety
- Specify operational semantics of target architecture.
- Fully realistic, e.g., IA-32 + OS + RTS.
- No unsafe transitions.
- Safety: target does not get stuck.
- Any type system must come with a proof of progress and preservation.
- Experience shows that these proofs may be mechanized fairly easily (using Twelf).
46Foundational Certification
Certified Binary of Grid Application
47Foundational Certification
- Object code compiled as a DLL.
Compiled Machine Code
48Foundational Certification
- Annotations facilitate type checking.
Typing Annotations for Object Code
49Foundational Certification
- Type checking program (written in Twelf)
Type Checking Program
50Foundational Certification
- Proof that type checks ⇒ safe (i.e., partial correctness of checker)
Soundness Proof for Type Checker
51Examples
- TALT
- Similar to TALx86, but more realistic and with a safety proof.
- Safety proof is mechanically checked.
- Structured as a safety proof for an abstract machine plus a simulation lemma of AM on target architecture.
- TALT Resource Bounds
- Goal: ensure that object code yields processor at set intervals.
- Precludes denial of CPU service.
52Resource Bound Certification
- Type system enforces upper bound on yield interval.
- A parameter of the type system.
- Uses dependent types to manage counts.
- Type correctness proves that code yields at specified interval.
- Ensures that grid application plays nicely with other programs in system.
53Resource Bound Certification
- Rudimentary method
- Conservative instruction counting.
- Approximations arise at join points.
- Yield processor at start of every basic block.
- Cf. GC check at block entry.
- Type checking proves that each block can complete before yield is required.
- Otherwise, compiler must split the block.
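The rudimentary method can be pictured as a pass that puts a yield at the head of every block and splits any block too long to finish within the bound. The sketch below is a deliberately crude model, not the TALT type system: it assumes every instruction costs one unit, counts the yield itself as an instruction, and uses invented names (`split_blocks`, `max_interval`).

```python
def split_blocks(blocks, max_interval):
    """Split basic blocks so each runs at most `max_interval` instructions.

    `blocks` is a list of basic blocks, each a list of instruction strings.
    A "yield" is placed at the start of every resulting block and counts
    as one instruction, so each piece carries max_interval - 1 others.
    """
    if max_interval < 2:
        raise ValueError("max_interval must leave room for the yield itself")
    body = max_interval - 1
    out = []
    for block in blocks:
        # An empty block still becomes one (yield-only) block.
        pieces = [block[i:i + body] for i in range(0, len(block), body)] or [[]]
        out.extend(["yield"] + piece for piece in pieces)
    return out

program = [["a", "b", "c", "d", "e"], ["f"], []]
bounded = split_blocks(program, 3)
```

With a bound of 3, the five-instruction block is cut into three pieces, and the check "every block fits before the next yield is required" reduces to a length test on each piece.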
54Resource Bound Certification
- Better methods are under development.
- Better analysis across procedure calls.
- Based on Feeley's balanced polling.
- Still conservative, because fully static.
- Adding run-time checks reduces yields.
- Too many yields leads to poor utilization.
- Minor yield: use run-time checks to recalibrate static approximation, doing a major yield to acquire more time.
- Major yield: actually yield the processor.
55A Meta-Grid?
- ConCert Conductor represents one model of grid computing.
- Compute-intensive, distributed scheduling.
- Not much reason to believe this is canonical.
- Can we support a variety of models inside of a single meta-grid?
- Applications choose grid model.
- Hosts are indifferent to programming model.
56A Meta-Grid?
- The ur-grid
- A TCP port.
- Foundational code certification.
- A grid framework
- Scheduler, recovery model, host policy.
- Runs application chords.
57A Meta-Grid?
- Key capability: safe dynamic loading and linking.
- Current ConCert framework must be certified against host safety policy.
- It must be able to load application policies and application code.
- Requires a general theory of safe linking.
- Network type system
- Theory of marshalling / certification.
58Summary
- Declarative approach to safe grids.
- Passive, policy-based participation model.
- Logic and proof technology for specifying policies and proving compliance.
- Close interplay between systems building and foundational theory.
- Type systems for mobile code.
- Type systems for various safety policies.
59Thanks!
- Web site: http://www.cs.cmu.edu/concert.
- Demonstration available after talk.
- Any questions or comments?
60Some Current Problems
- Failures.
- Fail-stop model is easily supported.
- Demonic failures require result certification.
- Abandoning chords.
- Or-dependencies are satisfied by first chord to deliver answer.
- Parent must be prepared to receive result long after it is no longer needed.
- Result sharing.
- Grid-wide cache of answers?
61Result Certification
- Host proves validity of answer.
- Avoid need for application to trust hosts.
- Avoid byzantine agreement problems.
- Some applications naturally admit result certification.
- For a theorem prover, the proof.
- For factoring, the factors.
- General result certification methods?
- Work-stealing model precludes random allocation / redundancy methods (SETI, Bayanihan).
- Centralized methods are not robust or scalable.
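Factoring shows why some answers certify themselves: the owner can check a claimed result far more cheaply than it could compute one, so it need not trust the host. A minimal sketch, with the function name `check_factoring` invented for the example:

```python
from math import prod

def check_factoring(n, factors):
    """Accept a host's claimed factorization only if it multiplies back to n.

    Checking is cheap even when finding the factors was expensive. This
    sketch requires nontrivial factors but does not check primality.
    """
    return (len(factors) >= 2
            and all(1 < f < n for f in factors)
            and prod(factors) == n)
```

A spoofed answer such as `check_factoring(91, [7, 12])` is rejected without rerunning the computation, whereas `check_factoring(91, [7, 13])` is accepted.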
62Result Certification
- A crazy idea: use the PCP theorem.
- Use interactive dialog to spot-check a proof.
- Host proves that it ran given code on given data.
- Execution trace is a proof that it did.
- But traces can be huge!
- Engage in a dialog with O(1) rounds to check proof with high probability.
- Avoids need to transmit trace itself.
- But the representation is enormous!
- And the implementation is prohibitively complex (at present, at least).
63Foundational Certification