Trustless Grid Computing in ConCert

1
Trustless Grid Computing in ConCert

Robert Harper
Carnegie Mellon University

2
Acknowledgements

Co-PIs: Karl Crary, Frank Pfenning, Peter Lee.
Support: NSF ITR program.
Students (who do the real work): Chang, Delap,
Dreyer, Kliger, Magill, Moody, Murphy, Petersen,
Sarkar, Vanderwaart, Watkins.
Thank you to GALT Organizers for the invitation!

3
Grid Computing

The network as computer.
Exploit idle resources on the network.
Many ad hoc grids.
SETI@home
Folding@home
But what is a general grid model?
Application model?
Trust model?
Participation model?

4
Application Model

Current applications focus on cycles.
Massively parallel (depth 1) problems.
SETI@home, Folding@home, many others.
Current approaches are centralized.
SETI data goes back and forth to UCB.
UCB assigns tasks to hosts.
Most grids are local to a project/site.
All machines in a lab.
But there are well-known global grids too.

5
Application Model

Can we handle depth > 1 problems?
E.g., theorem provers, search problems.
Introduce scheduling dependencies.
Is a decentralized grid feasible?
Avoid bottlenecks at collection / distribution
points.
Reduce hot spots for network traffic.
What about resource locality?
Data resident at a site.
Delivery of results.

6
Participation Model

Active intervention required.
Must download code, apply upgrades.
Must decide on which grids to participate.
Motivation to participate?
At scale, largely altruism, coolness.
Ad hoc grids on an intranet.
Economic models? (Cf. Lillibridge et al.)

7
Trust Model

Currently, hosts trust applications.
Denial of service attacks.
Privacy/secrecy attacks.
Accidental misbehavior (e.g., SETI).
Applications may also trust hosts.
Spoofed answers.
Collusion among participants.
Can we minimize trust?
Reduce risks.
Permit passive participation.

8
The ConCert Grid Model

One computer, many keyboards.
Decentralized scheduling.
Emphasis on code mobility.
Policy-based participation.
Declarative statement of participation criteria.
Applications must prove compliance.
Dependency-based scheduling.
Arbitrary depth.
And/or dependencies.
Inspired by CILK/NOW.

9
The ConCert Network

[Figure: a client and the hosts of the network.]

10
Host Setup

Locator: peer-to-peer discovery protocol.
Scheduler: distributed scheduler.
Worker: loader/verifier/runner.

11
Locator

Variant of Gnutella ping-pong protocol.
Start with well-known neighbors.
Ping known sites, expect pong back.
Record whoever pings you.
We do not (yet) bother with anonymity.
Not hard to generalize.
Establishes and maintains the grid.
Periodic update of accessibility.
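The ping-pong bookkeeping on this slide can be sketched as a small in-memory simulation. This is a simplified illustration, not the ConCert locator: the `Node` class, the `network` dictionary, and the host names are all hypothetical, and there is no forwarding, time-to-live, or real networking.

```python
class Node:
    """A grid host that tracks which peers it believes are reachable."""

    def __init__(self, name, well_known=()):
        self.name = name
        self.known = set(well_known)  # peers we will ping
        self.alive = set()            # peers that answered the last round

    def receive_ping(self, sender):
        # Record whoever pings us, then answer with a pong.
        self.known.add(sender.name)
        return "pong"

    def ping_round(self, network):
        # Ping every known peer; keep only those that pong back.
        self.alive = set()
        for peer_name in sorted(self.known):
            peer = network.get(peer_name)  # None models an unreachable host
            if peer is not None and peer.receive_ping(self) == "pong":
                self.alive.add(peer_name)


# Hypothetical three-host grid: "a" is the well-known neighbor.
network = {}
for name, neighbors in [("a", []), ("b", ["a"]), ("c", ["a", "gone"])]:
    network[name] = Node(name, neighbors)

for _ in range(2):  # periodic update of accessibility
    for node in network.values():
        node.ping_round(network)
```

After two rounds, host "a" has learned about "b" and "c" purely from being pinged, while "c" still lists the unreachable host "gone" as known but not alive.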

12
Scheduler

Work-stealing model.
Who has work to do?
Grab work, compute result, deliver to owner.
Fully decentralized.
Dependency-based scheduling.
Supports depth > 1, don't-care and don't-know
parallelism.
Well-founded dependencies only: no cycles.
Cf. join calculus, JoCaml (more general).
Maintain ready and waiting queues.
Ready queue: available for stealing.
Wait queue: awaiting satisfying assignment.
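A minimal single-process sketch of the ready and waiting queues with and/or dependencies follows. It only illustrates the bookkeeping under simplifying assumptions: the `Chord` class, the `run_all` function, and the example chord names are hypothetical, and a real scheduler would let other hosts steal from the ready queue instead of running everything locally.

```python
from collections import deque


class Chord:
    def __init__(self, name, run, deps=(), mode="and"):
        self.name = name
        self.run = run          # function from a witness dict to a result
        self.deps = tuple(deps)
        self.mode = mode        # "and": all deps needed; "or": any one dep

    def witness(self, results):
        """Return a satisfying assignment, or None if not yet satisfied."""
        have = {d: results[d] for d in self.deps if d in results}
        if self.mode == "and" or not self.deps:
            return have if len(have) == len(self.deps) else None
        return have if have else None


def run_all(chords):
    results, order = {}, []
    ready = deque(c for c in chords if c.witness(results) is not None)
    waiting = [c for c in chords if c.witness(results) is None]
    while ready:
        chord = ready.popleft()  # a thief would take work from this queue
        results[chord.name] = chord.run(chord.witness(results))
        order.append(chord.name)
        still_waiting = []
        for w in waiting:
            # Move chords whose dependencies are now satisfied to ready.
            target = ready if w.witness(results) is not None else still_waiting
            target.append(w)
        waiting = still_waiting
    return results, order


chords = [
    Chord("h", lambda w: w["f"] + w["g"], deps=("f", "g"), mode="and"),
    Chord("f", lambda w: 2),
    Chord("g", lambda w: 3),
    Chord("any", lambda w: sorted(w)[0], deps=("f", "missing"), mode="or"),
]
results, order = run_all(chords)
```

Here "h" waits for both "f" and "g", while "any" becomes ready as soon as one of its dependencies has an answer, even though "missing" never runs.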

13
Chords

The unit of work on the grid is a chord.

14
Chords

Semantics
Idempotent: can always be re-run.
Non-blocking: runs to completion (but may create
more chords).
Communication via dependencies. Satisfying
assignment passed on activation.
Dependencies
And/or dependencies on results of other chords.
Certificate
Proof of compliance with host policy.
Generated by a certifying compiler.

15
Worker

Steal work from (self or) neighbor.
Fetch chord from host.
Typically arguments + dependencies.
Code cached at host to reduce traffic.
Verify safety certificate.
Load and execute as a DLL.
Currently combined with verification.
Should verify at most once (cache result).
Deliver result to owner.
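The "verify at most once" point can be sketched as a cache keyed by a digest of the code and its certificate. Everything here is an assumption made for illustration: the `Worker` class, its method names, and especially `toy_verifier`, which is a stand-in string comparison and not a real certificate checker.

```python
import hashlib


class Worker:
    def __init__(self, verifier):
        self.verifier = verifier   # hypothetical: (code, certificate) -> bool
        self.verdicts = {}         # digest -> cached verification result
        self.verifier_calls = 0

    def _key(self, code, certificate):
        h = hashlib.sha256()
        h.update(len(code).to_bytes(8, "big"))  # length prefix avoids ambiguity
        h.update(code)
        h.update(certificate)
        return h.hexdigest()

    def is_safe(self, code, certificate):
        key = self._key(code, certificate)
        if key not in self.verdicts:
            self.verifier_calls += 1
            self.verdicts[key] = self.verifier(code, certificate)
        return self.verdicts[key]

    def execute(self, code, certificate, run, args):
        # Refuse to run anything whose certificate does not check.
        if not self.is_safe(code, certificate):
            raise PermissionError("certificate rejected by host policy")
        return run(args)


def toy_verifier(code, certificate):
    # Stand-in only: a real host would check a safety proof here.
    return certificate == b"ok:" + code


worker = Worker(toy_verifier)
first = worker.execute(b"square", b"ok:square", lambda x: x * x, 7)
second = worker.execute(b"square", b"ok:square", lambda x: x * x, 8)
```

The second call reuses the cached verdict, so the verifier runs once even though the cached code is executed twice with different arguments.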

16
Moving Chords Around

A client submits work, broken into chords, to the
local conductor.

17
Moving Chords Around

Idle peers steal chords to work on. Chords have
destinations for their answers, shown by color.

18
Moving Chords Around

Some chords spawn new chords. They might depend on
other chords before they can run. The destination
of F and G is the green node, since they will be
used to fill H's dependencies.

19
Moving Chords Around

When a chord finishes, the result is sent to its
destination. The client interprets and displays
the results. Simultaneously, unfinished chords
continue to be stolen...

20
Moving Chords Around

When the green node has answers for F and G, H is
then ready to be stolen.

21
Grid Programming

What is a good language for grid applications?
Functional language is natural.
Manage chord creation, distribution, and
coordination.
Permit binding to local resources.
Compiler generation of safety certificates.

22
A Low-Level Grid Language

Popcorn/Grid: a rudimentary language.
Compiles to Typed Assembly Language.
Compliance checking = type checking.
Programmer handles marshalling.
Separate program for each chord.
Proof-of-concept for basic applications.
But too simple for real work.
Used to build early demos.

23
A Low-Level Grid Language

Chords are essentially continuations.
my_cord : string × witness → string.
Witness: satisfying assignment of dependencies.
Cf. join patterns.
Chords are typically dispatch functions.
Input: entry point + arguments.
Unmarshall arguments, branch to designated entry
point.

24
A High-Level Grid Language

ML/Grid
One program for client and its chords.
Compiler handles marshalling, distribution,
coordination.
Compiles to TAL(T).
Currently, TALx86.
Eventually, TALT (more on this later).
Run-time checks enforce restrictions.
Chord cannot perform I/O.
Client is not a chord.
Want a static type discipline (more on this
later).

25
High-Level Grid Model

Fundamental abstraction: task.
Type τ task is the type of a task returning a value
of type τ.
Compiles down to chord model on the grid.
Primitive operations:
spawn : (unit → τ) → τ task
sync : τ task → τ
relax : τ task list → τ × τ task list
Sufficient to build richer mechanisms.
E.g., continuation-based parallelism.
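The three primitives can be approximated on a single machine with Python's standard `concurrent.futures` module. This is only a local analogy, not ML/Grid: the thread pool stands in for the grid, and the reading of `relax` as "wait for any one task, return its value and the remaining tasks" is an assumption based on its type.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

_pool = ThreadPoolExecutor(max_workers=4)  # stand-in for the grid


def spawn(thunk):
    """spawn : (unit -> t) -> t task"""
    return _pool.submit(thunk)


def sync(task):
    """sync : t task -> t"""
    return task.result()


def relax(tasks):
    """relax : t task list -> t * t task list (assumed meaning: first to finish)"""
    if not tasks:
        raise ValueError("relax needs at least one task")
    done, _ = wait(tasks, return_when=FIRST_COMPLETED)
    finished = next(t for t in tasks if t in done)
    rest = [t for t in tasks if t is not finished]
    return finished.result(), rest


tasks = [spawn(lambda n=n: n * n) for n in (2, 3, 4)]
value, rest = relax(tasks)                      # whichever square is ready
total = value + sum(sync(t) for t in rest)      # and-style join over the rest
```

In this reading `sync` expresses an and-dependency on one task, and `relax` expresses an or-dependency over a list, which is enough to rebuild the and/or scheduling of the chord model.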

26
Current Demos

Available after talk if you're interested.
Runs remotely at CMU.
Demo uses a single node, but supports any number
of hosts.
GML ray-tracer.
From the ICFP '01 contest.
Depth 1.
Written in Popcorn/Grid.

27
Current Demos

Chess player.
Depth > 1, and/or dependencies.
Written in Popcorn/Grid.
Uses the Jamboree search algorithm, but a woeful
board evaluation function.
Simple theorem prover.
Depth > 1, and/or dependencies.
Written in ML/Grid.
Intuitionistic propositional logic.
MLL prover runs on simulator (could be ported).

28
Two Foundational Questions

What is an appropriate type system for a grid
programming language?
Enforce mobility constraints.
Clean type system to support development,
compilation, certification.
What safety policies can we support?
How to state policies?
How to prove compliance?
How to support multiple policies?

29
Modalities for Mobility

Curry-Howard interpretation of modal logic.
Cf. related ideas by Cardelli, Gordon, Walker, et
al.
Modalities enforce locality and mobility
constraints by type checking.
Hosts are possible worlds.
Each host provides an execution site for chords.
Accessibility between possible worlds.
B is accessible from A when code may move from A to B.

30
Modalities for Mobility

Accessibility should be an equivalence relation.
Reflexive: can stay here.
Transitive: can move from host to host.
Symmetric: can go back to source.
This suggests looking at S5 modal logic.
Appropriate for RST (reflexive, symmetric,
transitive) accessibility.
Intuitionistic variant for computational
interpretation.
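The three conditions on accessibility can be checked mechanically for a finite set of hosts. The sketch below is illustrative only; the function name `is_equivalence` and the host names are hypothetical, and accessibility is modeled as a plain set of (source, destination) pairs.

```python
def is_equivalence(worlds, access):
    """access is a set of (a, b) pairs meaning code may move from a to b."""
    reflexive = all((w, w) in access for w in worlds)
    symmetric = all((b, a) in access for (a, b) in access)
    transitive = all(
        (a, d) in access
        for (a, b) in access
        for (c, d) in access
        if b == c
    )
    return reflexive and symmetric and transitive


hosts = {"h1", "h2", "h3"}
# Every host can reach every host: the S5 situation.
full = {(a, b) for a in hosts for b in hosts}
# Code can go from h1 to h2 but never back: symmetry fails.
one_way = {(h, h) for h in hosts} | {("h1", "h2")}
```

With `full`, every host is accessible from every other, so a result can always be sent back to its source; `one_way` violates symmetry and so falls outside the S5 reading.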

31
A Candidate Type System

Necessity (□A): an A anywhere.
Classifies mobile code of type A.
Enforces marshalling and access restrictions.
Runnable at any accessible site.
Possibility (◇A): an A somewhere.
Classifies remote code of type A.
Expresses resource locality.
Can only depend on remote resources.
Other modalities are imaginable.
Walker: broadcast/multicast modality.

32
Modalities for Mobility

Truth (local) typing judgment.

[Figure: the typing judgment, with its resources
labeled: possible (remote), true (local), valid
(mobile).]

33
Mobility as Necessity

Validity (mobile) typing judgement
Mobile code does not use local resources.

34
Mobility as Necessity

Box: marshal value and bindings.
Values of boxed type are mobile code available
here that can run anywhere.

35
Mobility as Necessity

Unboxing: extract and run mobile code.
Implicit un-marshalling.

36
Locality as Possibility

Possible (somewhere) typing judgement
What is here is somewhere

37
Locality as Possibility

Go to remote site, rendering local resources
possible.
Can only use specified remote resources!

38
Locality as Possibility

Create a local proxy
Access it

39
Joint Possibility

Cannot consider only a single possibility at a
time!
A and B both possible does not imply that they
are true at the same world.
Could resort to explicit worlds.
M : A @ w means M is of type A at w.
Seems unnatural for our grid model in which code
moves spontaneously.

40
Joint Possibility

Solution: take joint possibility as primitive.
Possibility context is clustered into records.
Possibility modality for a cluster {Γ}.

41
Joint Possibility

Generalized go to rule

42
Warning: Work in Progress

Cut elimination for a sequent variant.
Work in progress.
Needs of proof suggested clustering.
It really is S5.
Relate to explicit worlds formulation.
Cf. Alex Simpson's PhD work.
Operational interpretation.
Abstract machine for mobile code.
Type safety proof for the semantics.

43
Policies and Certification

Policy should specify what is permissible.
Memory safety (no wild pointers)
Control safety (no illegal instructions or jumps)
Current approaches specify how to ensure
compliance.
Fix a particular type system (or equivalent) and
certificate format, baked into TCB.
For example, PCC or TAL certified code formats.

44
Foundational Certification

This raises two issues:
Flexibility: different type systems for different
problems.
Robustness: what if the committed type system is
unsound?
Moving the type system out of the TCB solves both
problems.
Appel emphasizes robustness.
We're concerned with flexibility.

45
Foundational Safety

Specify operational semantics of target
architecture.
Fully realistic, e.g., IA-32 + OS + RTS.
No unsafe transitions.
Safety: target does not get stuck.
Any type system must come with a proof of
progress and preservation.
Experience shows that these proofs may be
mechanized fairly easily (using Twelf).

46
Foundational Certification

Certified binary of grid application.

47
Foundational Certification

Object code compiled as a DLL.
Component: compiled machine code.

48
Foundational Certification

Annotations facilitate type checking.
Component: typing annotations for object code.

49
Foundational Certification

Type checking program (written in Twelf).
Component: type checking program.

50
Foundational Certification

Proof that type checks ⇒ safe (i.e., partial
correctness of checker).
Component: soundness proof for type checker.

51
Examples

TALT
Similar to TALx86, but more realistic and with a
safety proof.
Safety proof is mechanically checked.
Structured as a safety proof for an abstract
machine plus a simulation lemma of AM on target
architecture.
TALT Resource Bounds
Goal: ensure that object code yields processor at
set intervals.
Precludes denial of CPU service.

52
Resource Bound Certification

Type system enforces upper bound on yield
interval.
A parameter of the type system.
Uses dependent types to manage counts.
Type correctness proves that code yields at
specified interval.
Ensures that grid application plays nicely with
other programs in system.

53
Resource Bound Certification

Rudimentary method
Conservative instruction counting.
Approximations arise at join points.
Yield processor at start of every basic block.
Cf. GC check at block entry.
Type checking proves that each block can complete
before yield is required.
Otherwise, compiler must split the block.
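The rudimentary method can be sketched as counting instructions per basic block and splitting any block that exceeds the bound. This is a toy model under stated assumptions: blocks are plain lists of instruction names, the `yield_check` marker is not itself counted, and the names `split_blocks` and `within_bound` are hypothetical. A real compiler would also have to insert the control transfer between the pieces of a split block.

```python
YIELD = "yield_check"  # marker for the yield test at block entry


def split_blocks(blocks, bound):
    """Split basic blocks so none runs more than `bound` instructions
    between yield checks; each resulting block starts with a check."""
    if bound < 1:
        raise ValueError("bound must be at least 1")
    result = []
    for block in blocks:
        chunks = [block[i:i + bound] for i in range(0, len(block), bound)] or [[]]
        result.extend([YIELD] + chunk for chunk in chunks)
    return result


def within_bound(blocks, bound):
    """The property a checker would verify: every block begins with a
    yield check and completes within the bound."""
    return all(b and b[0] == YIELD and len(b) - 1 <= bound for b in blocks)


program = [["load", "add", "mul", "store", "jump"], ["ret"]]
checked = split_blocks(program, bound=2)
```

With a bound of 2, the five-instruction block is split into three blocks, each short enough to finish before the next yield check is due.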

54
Resource Bound Certification

Better methods are under development.
Better analysis across procedure calls.
Based on Feeley's balanced polling.
Still conservative, because fully static.
Adding run-time checks reduces yields.
Too many yields leads to poor utilization.
Minor yield: use run-time checks to recalibrate
static approximation, doing a major yield to
acquire more time.
Major yield: actually yield the processor.

55
A Meta-Grid?

ConCert Conductor represents one model of grid
computing.
Compute-intensive, distributed scheduling.
Not much reason to believe this is canonical.
Can we support a variety of models inside of a
single meta-grid?
Applications choose grid model.
Hosts are indifferent to programming model.

56
A Meta-Grid?

The ur-grid
A TCP port.
Foundational code certification.
A grid framework
Scheduler, recovery model, host policy.
Runs application chords.

57
A Meta-Grid?

Key capability: safe dynamic loading and linking.
Current ConCert framework must be certified
against host safety policy.
It must be able to load application policies and
application code.
Requires a general theory of safe linking.
Network type system
Theory of marshalling / certification.

58
Summary

Declarative approach to safe grids.
Passive, policy-based participation model.
Logic and proof technology for specifying
policies and proving compliance.
Close interplay between systems building and
foundational theory.
Type systems for mobile code.
Type systems for various safety policies.

59
Thanks!

Web site: http://www.cs.cmu.edu/concert.
Demonstration available after talk.
Any questions or comments?

60
Some Current Problems

Failures.
Fail-stop model is easily supported.
Demonic failures require result certification.
Abandoning chords.
Or-dependencies are satisfied by first chord to
deliver answer.
Parent must be prepared to receive result long
after it is no longer needed.
Result sharing.
Grid-wide cache of answers?
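The or-dependency issue can be sketched as a slot that accepts the first answer and tolerates later ones. The `OrDependency` class and its field names are hypothetical; the sketch shows only the parent-side bookkeeping, not how abandoned chords would be cancelled.

```python
class OrDependency:
    """Holds the answer to an or-dependency: first delivery wins."""

    def __init__(self):
        self.filled = False
        self.value = None
        self.ignored = 0   # late answers that arrived after the first

    def deliver(self, value):
        # The parent must accept deliveries at any time, even long
        # after the dependency has been satisfied.
        if self.filled:
            self.ignored += 1
            return False
        self.filled = True
        self.value = value
        return True


slot = OrDependency()
accepted = [slot.deliver(answer) for answer in ("fast", "slow", "slower")]
```

Because chords are idempotent and results are simply dropped when late, a straggler that finishes after the dependency is satisfied costs wasted cycles but does not corrupt the parent's state.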

61
Result Certification

Host proves validity of answer.
Avoid need for application to trust hosts.
Avoid byzantine agreement problems.
Some applications naturally admit result
certification.
For a theorem prover, the proof.
For factoring, the factors.
General result certification methods?
Work-stealing model precludes random allocation /
redundancy methods (SETI, Bayanihan).
Centralized methods are not robust or scalable.
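For factoring, checking a returned answer is far cheaper than finding it, which is what makes the factors a natural certificate. The sketch below is a minimal illustration; the function name `check_factoring` is hypothetical, and it checks only that the factors are nontrivial and multiply to the input, not that they are prime.

```python
import math


def check_factoring(n, factors):
    """Accept a claimed factorization of n only if there are at least two
    integer factors, each greater than 1, whose product is exactly n."""
    if len(factors) < 2:
        return False
    if not all(isinstance(f, int) and f > 1 for f in factors):
        return False
    return math.prod(factors) == n


honest = check_factoring(91, [7, 13])
spoofed = check_factoring(91, [7, 12])
trivial = check_factoring(91, [1, 91])
```

The application never needs to trust the host that did the work: a spoofed or trivial answer fails the check, and an honest one passes regardless of who computed it.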

62
Result Certification