Title: CS696 Talk
1. The Bugs and the Bees: Research in Programming Languages and Security
David Evans, evans_at_cs.virginia.edu, http://www.cs.virginia.edu/evans
University of Virginia, Department of Computer Science
2. Computer Science
- How-to knowledge:
  - Ways of describing imperative processes (computations)
  - Ways of reasoning about (predicting) what imperative processes will do
- Most interesting CS problems concern:
  - Better ways of describing computations
  - Ways of reasoning about what they do (and don't do)
3. My Research Projects
- The Bugs: Splint
  - How can we detect code that describes unintended computations?
- The Bees: Programming the Swarm
  - How can we program massively distributed collections of simple devices and reason about their behavior in hostile environments?
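4. A Gross Oversimplification
[Figure: Bugs Detected (none to all) plotted against Effort Required (low to unfathomable). Compilers sit at low effort and few bugs, Formal Verifiers at unfathomable effort and all bugs, and Splint in between.]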
5. (Almost) Everyone Likes Types
- Easy to understand
- Easy to use
- Quickly detect many programming errors
- Useful documentation
- Even though they are a lot of work!
  - 1/4 of the text of a typical C program is for types
7. Limitations of Standard Types
Standard Types                  | Attributes
------------------------------- | -------------------------------------------
Type of reference never changes | State changes along program paths
Language defines checking rules | System or programmer defines checking rules
One type per reference          | Many attributes per reference
8. Approach
- Programmers add annotations (formal specifications)
  - Simple and precise
  - Describe programmers' intent
  - Types, memory management, data hiding, aliasing, modification, null-ity, buffer sizes, security, etc.
- Splint detects inconsistencies between annotations and code
  - Simple (fast!) dataflow analyses
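The following is a minimal sketch of what such annotations look like. It assumes Splint's comment-based syntax (`/*@null@*/`, `/*@observer@*/`). The functions `find_name` and `name_length` and the `names` table are hypothetical. Because the annotations are comments, any C99 compiler builds this unchanged. The snippet has not itself been run through Splint.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table used only for this illustration. */
static const char *const names[] = { "alice", "bob", "carol" };

/* Splint-style annotations live in comments, so ordinary compilers ignore them.
   null:     the result may be NULL, so callers must check before using it.
   observer: callers may read the returned string but must not modify or free it. */
/*@null@*/ /*@observer@*/ const char *find_name(const char *prefix)
{
    size_t i;
    size_t len = strlen(prefix);
    for (i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (strncmp(names[i], prefix, len) == 0) {
            return names[i];            /* first name starting with prefix */
        }
    }
    return NULL;                        /* allowed by the null annotation */
}

/* Returns the length of the first name starting with prefix, or 0 if none.
   The declared type of n never changes, but its null-state does: after the
   check, a dataflow analysis knows n is non-NULL on the path to strlen(n). */
size_t name_length(const char *prefix)
{
    const char *n = find_name(prefix);
    if (n == NULL) {
        return 0;
    }
    return strlen(n);
}
```

Deleting the `if (n == NULL)` test produces exactly the kind of inconsistency between annotation and code that a null-checking analysis is meant to report.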
9. Security Flaws
- 190 vulnerabilities; only 4 had to do with crypto
- 108 of them could have been detected with simple static analyses!
Reported flaws in the Common Vulnerabilities and Exposures (CVE) database, Jan-Sep 2001. Evans and Larochelle, IEEE Software, Jan 2002.
10. Example: Buffer Overflows (David Larochelle)
- Most commonly exploited security vulnerability
  - 1988 Internet Worm
  - Still the most common attack
    - Code Red exploited a buffer overflow in IIS
    - >50% of CERT advisories, 23% of CVE entries in 2001
- Attributes describe sizes of allocated buffers
- Heuristics for analyzing loops
- Found several known and unknown buffer overflow vulnerabilities in wu-ftpd
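The sketch below shows how a buffer-size attribute can be written down. It assumes Splint's `requires` clause and its `maxSet` function, which gives the highest index that can safely be written. The function `copy_bounded` and the exact form of the clause are hypothetical, and the example has no connection to wu-ftpd. The body also enforces the bound at run time, so the code is safe even when compiled without any checker.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded copy. The requires clause (assumed Splint syntax)
   states that dest_size - 1 is the highest index that can safely be written,
   so a checker can compare each call site against the buffer it passes. */
void copy_bounded(char *dest, size_t dest_size, const char *src)
    /*@requires maxSet(dest) >= dest_size - 1 @*/
{
    size_t n;
    if (dest_size == 0) {
        return;                     /* no room even for the terminator */
    }
    n = strlen(src);
    if (n > dest_size - 1) {
        n = dest_size - 1;          /* truncate rather than overflow */
    }
    memcpy(dest, src, n);
    dest[n] = '\0';
}

/* The unchecked pattern a buffer-overflow analysis looks for:
       char buf[8];
       strcpy(buf, input);   // overflows whenever strlen(input) >= 8
   A strcpy precondition of the form maxSet(dest) >= maxRead(src)
   cannot be proven here without knowing the length of input. */
```

Truncating is only one design choice. A caller that must not lose data would instead check the length and report an error.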
11. Some Open Issues
- Differential Program Analysis (Joel Winstead)
  - We usually don't have just one program; we have lots of versions of similar programs
  - How can we discover interesting differences between two versions of a program?
    - e.g., find a test case that reveals the difference, or find invariants that differ
- Design-level Properties
  - Can we develop annotations and checks that deal with design-level properties?
- Integrate Run-time Checking
  - Combine static and run-time checking to enable additional checking and completeness guarantees
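One naive way to "find a test case that reveals the difference" is brute-force differential testing: run both versions on the same inputs and report the first input where they disagree. This sketch uses two hypothetical versions of a clamping function and searches a small input range exhaustively. It only illustrates the question; it is not the analysis developed in this project.

```c
#include <stdbool.h>

/* Version 1 (hypothetical): clamp x into [0, 100]. */
int clamp_v1(int x)
{
    if (x < 0) return 0;
    if (x > 100) return 100;
    return x;
}

/* Version 2 (hypothetical): a "refactoring" with an off-by-one at the top. */
int clamp_v2(int x)
{
    if (x < 0) return 0;
    if (x >= 100) return 99;
    return x;
}

/* Searches [lo, hi] (assumes hi < INT_MAX) for an input on which f and g
   disagree. Returns true and stores that input in *witness if found. */
bool find_difference(int (*f)(int), int (*g)(int), int lo, int hi, int *witness)
{
    int x;
    for (x = lo; x <= hi; x++) {
        if (f(x) != g(x)) {
            *witness = x;
            return true;
        }
    }
    return false;
}
```

On the range -10..200 the first witness is 100. On the range -10..50 the two versions look identical, which shows why the choice of inputs matters.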
12. Splint
- More information: splint.org
- IEEE Software '02, USENIX Security '01, PLDI '96
- Public release: real users; mentioned in the C FAQ, C Unleashed, Linux Journal, etc.
- Students (includes other PL/SE/security-related projects)
  - David Larochelle: buffer overflows, automatic annotations
  - Joel Winstead: differential program analysis
  - Greg Yukl: source code generation
- Current Funding: NASA (joint with John Knight)
13. Programming the Swarm
14. Really Brief History of Computer Science
- 1950s: Programming in the small...
  - Programmable computers
  - Learned that programming is hard
  - Birth of higher-order languages
  - Tools for reasoning about trivial programs
- 1970s: Programming in the large...
  - Abstraction, objects
  - Methodologies for development
  - Tools for reasoning about component-based systems
- 2000s: Programming the Swarm!
15. What's Changing
- Execution platforms
  - Small, cheap, and unreliable
  - Limited power: communication is expensive
- Execution environment
  - Interacts with the physical world
  - Unpredictable, dynamic
- Programs
  - The old style of programming won't work
  - Is there a new paradigm?
16. Programming the Swarm: Long-Range Goal
Cement 10 GFlop
17. Why Might This Be Possible?
- We are surrounded by systems that:
  - Contain 50 trillion (5 × 10^13) components
  - Continue to function when 50 million components fail every second
  - Survive in hostile environments (even Canada!)
  - Self-organize starting from a single component and a program that is smaller than Windows XP
18. A Biological Programming Model (Selvin George)
- Program systems the way biology does
- Literal interpretation:
  - Cells can change state (genes turn on and off)
  - Cells can divide
    - Asymmetrically
  - Cells can communicate over short distances
    - Chemical diffusion
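Short-range communication by chemical diffusion can be approximated with a simple discrete diffusion update. In this sketch, a row of `NCELLS` cells exchanges chemical with its neighbours at each step, and the chemical decays at a fixed rate. The cell count, the `rate` and `decay` parameters, and the 1-D geometry are all hypothetical; the slide does not specify the model's actual diffusion rule. The explicit update is stable for `rate <= 0.5`.

```c
#include <stddef.h>

#define NCELLS 11   /* hypothetical row of cells */

/* One explicit diffusion step: each cell moves toward its neighbours by a
   fraction `rate` of the discrete Laplacian, then a fraction `decay` of the
   chemical degrades. End cells reuse their own value as the missing
   neighbour, which gives a no-flux (reflecting) boundary. */
void diffuse_step(double conc[NCELLS], double rate, double decay)
{
    double next[NCELLS];
    size_t i;
    for (i = 0; i < NCELLS; i++) {
        double left  = (i > 0) ? conc[i - 1] : conc[i];
        double right = (i + 1 < NCELLS) ? conc[i + 1] : conc[i];
        double lap   = left + right - 2.0 * conc[i];
        next[i] = (conc[i] + rate * lap) * (1.0 - decay);
    }
    for (i = 0; i < NCELLS; i++) {
        conc[i] = next[i];
    }
}

/* Runs `steps` rounds in which cell `source` secretes `amount` before each
   diffusion step. Because of decay, concentration falls off with distance,
   so only cells near the source see a strong signal. */
void simulate(double conc[NCELLS], size_t source, double amount,
              double rate, double decay, int steps)
{
    int t;
    for (t = 0; t < steps; t++) {
        conc[source] += amount;
        diffuse_step(conc, rate, decay);
    }
}
```

Starting from an all-zero row with the source at cell 0, the resulting profile decreases steadily along the row. A cell could compare its local concentration against a threshold to decide whether it is "near" the source.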
19. Example Cell Program
state s1 transitions -> (s1, s1)
normal
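The rule "state s1 transitions -> (s1, s1)" means that a cell in state s1 divides into two daughters, both in state s1. This sketch counts the cells in each state over synchronous rounds of division. The s1 rule comes from the slide. The second state s2, with its asymmetric rule s2 -> (s2, s1), is hypothetical and illustrates asymmetric division. Spatial layout, failures, and chemical signals are ignored.

```c
#include <stddef.h>

/* Cell states: S1 is from the slide; S2 is a hypothetical extra state. */
enum state { S1, S2, NSTATES };

/* A division rule: a cell in a given state becomes daughters (left, right). */
struct rule { enum state left, right; };

/* s1 -> (s1, s1) as on the slide; s2 -> (s2, s1) is a hypothetical
   asymmetric rule in which one daughter keeps the parent's state. */
const struct rule example_rules[NSTATES] = {
    [S1] = { S1, S1 },
    [S2] = { S2, S1 },
};

/* Starts from one cell in state `start` and runs `gens` rounds in which every
   cell divides once. counts[s] receives the number of cells in state s.
   (unsigned long overflows for large gens; fine for a sketch.) */
void grow(const struct rule rules[NSTATES], enum state start, int gens,
          unsigned long counts[NSTATES])
{
    unsigned long next[NSTATES];
    size_t s;
    int g;
    for (s = 0; s < NSTATES; s++) counts[s] = 0;
    counts[start] = 1;
    for (g = 0; g < gens; g++) {
        for (s = 0; s < NSTATES; s++) next[s] = 0;
        for (s = 0; s < NSTATES; s++) {
            next[rules[s].left]  += counts[s];
            next[rules[s].right] += counts[s];
        }
        for (s = 0; s < NSTATES; s++) counts[s] = next[s];
    }
}
```

Starting from one s1 cell, five rounds give 32 s1 cells. Starting from one s2 cell, three rounds give a single s2 cell and 7 s1 cells, because the asymmetric rule preserves exactly one cell in the parent's state.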
20. Cell Programs
- Use chemicals to control development
- How can we produce cell programs that generate particular structures?
- How can we reason about the behavior of cell programs in the presence of failures and randomness?
- How can we describe cell programs at a higher level? (Making abstractions)
21. Less Literal Interpretation
- Learn about self-organization and robustness by mimicking biology
  - Learn principles from biology, not programs
- Use this to build real systems
  - Sensor networks
  - Distributed file sharing
22. Sensor Networks
[Figure: a high-power base station and thousands of small, low-powered devices with sensors and actuators, communicating wirelessly]
23. Sensor Networks
[Figure: the same network with the high-power base station, now including a compromised node and an enemy base station]
24. Security for Sensor Networks
- Control Messages
  - Only messages from the base station (or other nodes) should change device behavior
- Data Collection
  - A few compromised nodes should not be able to prevent or tamper with data collection
- Data Confidentiality
  - For some applications, an eavesdropper shouldn't be able to interpret messages
25. Why Security for Sensor Networks Is Hard
- Low-power devices
  - Cannot run traditional public-key algorithms
- Limited device communication
  - Sending messages is extremely expensive
- Communication is wireless
  - All messages are vulnerable to eavesdropping and forgery
- Devices start identical: no stored secrets
26. Asymmetric Cryptography
- Cryptography depends on either:
  - Shared secrets
  - Asymmetry (normally of information)
- Exploit time and space asymmetries
  - Public-key systems get asymmetry because only one party knows the private key
  - In sensor networks, we can get asymmetry from time (a key is revealed later, but in a verifiable way) and space (only nodes within a certain distance can hear)
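A key that is "revealed later, but in a verifiable way" is commonly built from a one-way key chain, as in hash-chain schemes such as μTESLA. The slide does not name a scheme, so treat this as an illustration only.

The base station picks a secret seed and repeatedly applies a function F to derive earlier keys. It then publishes the last key derived, `chain[0]`, as a commitment. Later it discloses `chain[1]`, `chain[2]`, and so on. A node accepts a disclosed key if applying F enough times lands on a key it already trusts.

The `toy_hash` function here is a 64-bit mixer (the splitmix64 finalizer) and is NOT one-way, since it is invertible. It is a placeholder that makes the sketch runnable; a real system needs a cryptographic function. `CHAIN_LEN` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHAIN_LEN 8   /* hypothetical number of disclosure intervals */

/* Placeholder for a one-way function F. This mixer is invertible, so it
   provides no security; it only stands in so the structure can be tested. */
uint64_t toy_hash(uint64_t x)
{
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

/* Base station: chain[CHAIN_LEN] is the secret seed and chain[i-1] = F(chain[i]).
   chain[0] is published up front; chain[1], chain[2], ... are disclosed later. */
void make_chain(uint64_t seed, uint64_t chain[CHAIN_LEN + 1])
{
    int i;
    chain[CHAIN_LEN] = seed;
    for (i = CHAIN_LEN; i > 0; i--) {
        chain[i - 1] = toy_hash(chain[i]);
    }
}

/* Node: accept `disclosed` as the key for interval i if applying F (i - j)
   times yields `last_key`, the most recent trusted key (interval j < i). */
bool verify_key(uint64_t disclosed, int i, uint64_t last_key, int j)
{
    uint64_t k = disclosed;
    int steps;
    if (i <= j) {
        return false;                 /* keys must move forward in time */
    }
    for (steps = 0; steps < i - j; steps++) {
        k = toy_hash(k);
    }
    return k == last_key;
}
```

Verification lets a node skip intervals whose disclosures it missed, which matters when sending messages is expensive and some are lost. In such schemes, messages sent during interval i are authenticated with the key for interval i and buffered until that key is disclosed.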
27. Non-Cryptographic Techniques
- Redundancy
  - Lots of sensors; only a few will be compromised or bogus
- Snooping
  - Because communication is wireless, nodes can hear what their neighbors are saying
  - If they are lying, tattle on them!
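Redundancy pays off when readings are combined with a robust aggregate. This sketch contrasts the median with the mean. As long as fewer than half of the readings are bogus, the median stays within the range of the honest readings, while a single extreme value can drag the mean arbitrarily far. The function names are hypothetical, and choosing the median is an illustrative assumption rather than a method stated in the talk.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles; avoids the subtraction idiom, which
   can lose precision or overflow when converted to int. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts readings in place and returns the median (the mean of the two
   middle values when n is even). Requires n > 0. */
double median_reading(double *readings, size_t n)
{
    qsort(readings, n, sizeof readings[0], cmp_double);
    if (n % 2 == 1) {
        return readings[n / 2];
    }
    return (readings[n / 2 - 1] + readings[n / 2]) / 2.0;
}

/* Plain arithmetic mean, for comparison. Requires n > 0. */
double mean_reading(const double *readings, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++) {
        sum += readings[i];
    }
    return sum / (double)n;
}
```

For the readings 20.0, 21.0, 19.5, 20.5 and a bogus 1000.0, the median is 20.5, while the mean exceeds 200.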
28. Programming the Swarm
- swarm.cs.virginia.edu
- Students
  - Selvin George: Biological Programming Model
  - Undergraduates: Keen Browne, Jacques Fournier, Chris Frost, Ami Malaviya, Jon McCune
- Funding: NSF CAREER Award, NSF ITR
29. Summary
- Programming the Swarm: describing and reasoning about the behavior of large ad hoc collections in hostile environments
- Splint: detecting differences between what programs express and what programmers intend
- Be proactive about finding an advisor
  - The most important decision you will make in grad school
  - The matching process is a last resort
- Email to arrange meetings: evans_at_cs.virginia.edu