Title: CS711: Reference Monitors Part 1: OS
1. CS711 Reference Monitors, Part 1: OS & SFI
- Greg Morrisett
- Cornell University
2. A Reference Monitor
- Observes the execution of a program and halts the program if it is about to violate the security policy.
- Common examples:
  - operating system (hardware-based)
  - interpreters (software-based)
  - firewalls
- Claim: the majority of today's enforcement mechanisms are instances of reference monitors.
3. Reference Monitors Outline
- Analysis of the power and limitations.
- What is a security policy?
- What policies can reference monitors enforce?
- Traditional Operating Systems.
- Policies and practical issues
- Hardware-enforcement of OS policies.
- Software-enforcement of OS policies.
- Why?
- Software-Based Fault Isolation
- Java and CLR Stack Inspection
- Inlined Reference Monitors
4. Requirements for a Monitor
- Must have (reliable) access to information about what the program is about to do.
  - e.g., what instruction is it about to execute?
- Must have the ability to stop the program.
  - can't stop a program running on another machine that you don't own.
  - really, stopping isn't necessary; a transition to a good state is enough.
- Must protect the monitor's state and code from tampering.
  - key reason why a kernel's data structures and code aren't accessible by user code.
- In practice, must have low overhead.
5. What Policies?
- We'll see that, under quite liberal assumptions:
  - there's a nice class of policies that reference monitors can enforce (safety properties).
  - there are desirable policies that no reference monitor can enforce precisely.
    - "precisely" means the monitor rejects a program if and only if it violates the policy.
- Assumptions:
  - monitor can have access to the entire state of the computation.
  - monitor can have infinite state.
  - but the monitor can't guess the future: the predicate it uses to determine whether to halt a program must be computable.
6. Schneider's Formalism
- A reference monitor only sees one execution sequence of a program.
- So we can only enforce policies P s.t.
  - (1) $P(S) \iff \forall \sigma \in S.\ \hat{P}(\sigma)$
  - where $\hat{P}$ is a predicate on individual sequences.
- A set of execution sequences S is a property if membership is determined solely by the sequence and not the other members in the set.
7. More Constraints on Monitors
- Shouldn't be able to see the future.
  - Assumption: must make decisions in finite time.
  - Suppose $\hat{P}(\sigma)$ is true but $\hat{P}(\sigma[..i])$ is false for some prefix $\sigma[..i]$ of $\sigma$. When the monitor sees $\sigma[..i]$ it can't tell whether the execution will yield $\sigma$ or some other sequence, so the best it can do is rule out all sequences involving $\sigma[..i]$, including $\sigma$.
- So in some sense, $\hat{P}$ must be continuous:
  - (2) $\forall \sigma.\ \hat{P}(\sigma) \Rightarrow (\forall i.\ \hat{P}(\sigma[..i]))$
8. Safety Properties
- A predicate P on sets of sequences s.t.
  - (1) $P(S) \iff \forall \sigma \in S.\ \hat{P}(\sigma)$
  - (2) $\forall \sigma.\ \hat{P}(\sigma) \Rightarrow (\forall i.\ \hat{P}(\sigma[..i]))$
- is a safety property: "no bad thing will happen."
- Conclusion: a reference monitor can't enforce a policy P unless it's a safety property. In fact, Schneider shows that reference monitors can (in theory) implement any safety property.
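As a rough illustration of a monitor enforcing a safety property, here is a minimal sketch in C. The policy ("no send after reading a secret") and all names (`event_t`, `monitor_step`, `monitor_run`) are assumptions made up for this example, not something taken from the slides or from Schneider's paper. The point it tries to show is that the decision at each step depends only on the prefix seen so far.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels the monitor sees on each transition. */
typedef enum { EV_COMPUTE, EV_READ_SECRET, EV_SEND } event_t;

/* Monitor state: one bit summarizing the history seen so far. */
typedef struct { bool has_read_secret; } monitor_t;

/* Returns true if the next step is allowed, false if the program
   must be halted. Only the prefix seen so far influences the answer. */
bool monitor_step(monitor_t *m, event_t ev) {
    assert(m != NULL);
    if (ev == EV_SEND && m->has_read_secret)
        return false;                 /* the "bad thing" is about to happen */
    if (ev == EV_READ_SECRET)
        m->has_read_secret = true;
    return true;
}

/* Runs the monitor over a finite trace and returns how many events
   were allowed before halting (n if the whole trace is accepted). */
size_t monitor_run(const event_t *trace, size_t n) {
    monitor_t m = { false };
    for (size_t i = 0; i < n; i++)
        if (!monitor_step(&m, trace[i]))
            return i;
    return n;
}
```

Once a prefix is rejected, every extension of it is rejected too, which is the prefix-closure condition (2) in executable form.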
9. Safety vs. Security
- Safety is what we can implement, but is it what we want?
  - lack of info. flow isn't a property.
- Safety ensures something bad won't happen, but it doesn't ensure something good will eventually happen:
  - program will terminate
  - program will eventually release the lock
  - user will eventually make payment
- These are examples of liveness properties.
  - policies involving availability aren't safety properties.
  - so a ref. monitor can't handle denial-of-service?
10. Safety Is Nice
- Safety does have its benefits:
  - They compose: if P and Q are safety properties, then $P \wedge Q$ is a safety property (just the intersection of allowed traces).
  - Safety properties can approximate liveness by setting limits, e.g., we can require that a program terminate within k steps.
  - We can also approximate many other security policies (e.g., info. flow) by simply choosing a stronger safety property.
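The "terminates within k steps" approximation can be sketched as a step budget. This is only an illustration: `budget_t`, `budget_step`, and the toy `run_with_budget` driver are hypothetical names, and a real monitor would count machine instructions or timer ticks rather than loop iterations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Monitor state: how many more steps the program may take. */
typedef struct { unsigned long steps_left; } budget_t;

/* Called before every step; returns false once the budget is spent. */
bool budget_step(budget_t *b) {
    assert(b != NULL);
    if (b->steps_left == 0)
        return false;
    b->steps_left--;
    return true;
}

/* Toy "program" that needs `work` steps. Returns true if it finished
   within the budget k, false if the monitor halted it first. */
bool run_with_budget(unsigned long work, unsigned long k) {
    budget_t b = { k };
    for (unsigned long i = 0; i < work; i++)
        if (!budget_step(&b))
            return false;
    return true;
}
```

The bound is a safety property because exceeding k steps is detectable on a finite prefix. The price is that programs needing more than k steps are rejected even if they would have terminated.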
11. Practical Issues
- In theory, a monitor could:
  - examine the entire history and the entire machine state to decide whether or not to allow a transition.
  - perform an arbitrary computation to decide whether or not to allow a transition.
- In practice, most systems:
  - keep a small piece of state to track history
  - only look at labels on the transitions
  - have small labels
  - perform simple tests
- Otherwise, the overheads would be overwhelming.
  - so policies are practically limited by the vocabulary of labels, the complexity of the tests, and the state maintained by the monitor.
12. Reference Monitors Outline
- Analysis of the power and limitations.
- What is a security policy?
- What policies can reference monitors enforce?
- Traditional Operating Systems.
- Policies and practical issues
- Hardware-enforcement of OS policies.
- Software-enforcement of OS policies.
- Why?
- Software-Based Fault Isolation
- Inlined Reference Monitors
13. Operating Systems circa '75
- Simple model: the system is a collection of running processes and files.
  - processes perform actions on behalf of a user.
    - open, read, write files
    - read, write, execute memory, etc.
  - files have access control lists dictating which users can read/write/execute/etc. the file.
- (Some) high-level policy goals:
  - Integrity: one user's processes shouldn't be able to corrupt the code, data, or files of another user.
  - Availability: processes should eventually gain access to resources such as the CPU or disk.
  - Secrecy? Confidentiality? Access control?
14. What Can Go Wrong?
- A process reads, writes, executes, or changes the ACL of a file for which it doesn't have proper access.
  - check file access against the ACL
- A process writes into the memory of another process.
  - isolate the memory of each process (and the OS!)
- A process pretends it is the OS and executes its code.
  - maintain a process ID and keep certain operations privileged; this needs some way to transition.
- A process never gives up the CPU.
  - force the process to yield in some finite time
- A process uses up all the memory or disk.
  - enforce quotas
- The OS or hardware is buggy...
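The first remedy above, checking file access against the ACL, might look roughly like the following. The data layout (`acl_entry_t`, `acl_t`) and the permission bits are assumptions for illustration and are not taken from any particular OS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical permission bits. */
enum { PERM_READ = 1, PERM_WRITE = 2, PERM_EXEC = 4 };

/* One ACL entry: which permissions a given user holds on the file. */
typedef struct { int uid; unsigned perms; } acl_entry_t;

typedef struct { const acl_entry_t *entries; size_t count; } acl_t;

/* Returns true iff `uid` holds every permission bit in `wanted`.
   Users with no entry are denied (default deny). */
bool acl_allows(const acl_t *acl, int uid, unsigned wanted) {
    assert(acl != NULL);
    for (size_t i = 0; i < acl->count; i++)
        if (acl->entries[i].uid == uid)
            return (acl->entries[i].perms & wanted) == wanted;
    return false;
}
```

The test is deliberately simple (a small label, a linear scan, one mask comparison), in line with the earlier point that practical monitors keep little state and perform cheap checks.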
15. Key Mechanisms in Hardware
- Translation Lookaside Buffer (TLB)
  - provides an inexpensive check for each memory access.
  - maps virtual address to physical address
  - small, fully associative cache (8-10 entries)
  - cache miss triggers a trap (see below)
  - granularity of map is a page (4-8 KB)
- Distinct user and supervisor modes
  - certain operations (e.g., reload TLB, device access) require that the supervisor bit is set.
- Invalid operations cause a trap
  - set supervisor bit and transfer control to an OS routine.
- Timer triggers a trap for preemption.
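A software model of the TLB check may help make the mechanism concrete. This is a sketch under stated assumptions: 4 KB pages, 8 entries, and a per-entry `writable` bit that the slides do not mention. The names (`tlb_entry_t`, `tlb_translate`, `xlate_t`) are hypothetical, and real hardware compares all entries in parallel rather than in a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS   12u   /* assumed 4 KB pages */
#define TLB_ENTRIES 8     /* small, fully associative */

typedef struct {
    bool     valid;
    uint32_t vpn;         /* virtual page number */
    uint32_t ppn;         /* physical page number */
    bool     writable;    /* assumed protection bit, for illustration */
} tlb_entry_t;

typedef enum { XLATE_OK, XLATE_MISS, XLATE_PROT } xlate_t;

/* Translates vaddr. XLATE_MISS and XLATE_PROT model the two cases in
   which the hardware would trap to an OS routine. */
xlate_t tlb_translate(const tlb_entry_t *tlb, uint32_t vaddr,
                      bool is_write, uint32_t *paddr) {
    assert(tlb != NULL && paddr != NULL);
    uint32_t vpn    = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1u);
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            if (is_write && !tlb[i].writable)
                return XLATE_PROT;
            *paddr = (tlb[i].ppn << PAGE_BITS) | offset;
            return XLATE_OK;
        }
    }
    return XLATE_MISS;    /* the OS would reload the TLB in supervisor mode */
}
```

The check is per page rather than per byte, which is why the granularity of the map limits how finely memory can be protected.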
16. Steps in a System Call
(Timeline, top to bottom; each step is labeled with where it runs.)
- User process: calls f = fopen("foo"); library executes break (trap).
- Kernel: saves context, flushes TLB, etc.
- Kernel: checks UID against ACL, sets up IO buffers and file context, pushes ptr to context on user's stack, etc.
- Kernel: restores context, clears supervisor bit.
- User process: calls fread(f,n,buf); library executes break.
- Kernel: saves context, flushes TLB, etc.
- Kernel: checks f is a valid file context, does disk access into local buffer, copies results into user's buffer, etc.
- Kernel: restores context, clears supervisor bit.
17. Hardware Trends
- The functionality provided by the hardware hasn't changed much over the years. Clearly, the raw performance in terms of throughput has.
- Certain trends are clear:
  - small to large number of registers: 8 16-bit to 128 64-bit
  - small to large pages: 4 KB to 16 KB
  - flushing TLB, caches is increasingly expensive
  - computed jumps are increasingly expensive
  - copying data to/from memory is increasingly expensive
- So a trap into a kernel is costing more over time.
18. OS Trends
- In the 1980s, a big push for microkernels:
  - Mach, Spring, etc.
  - Only put the bare minimum into the kernel.
    - context switching code, TLB management
    - trap and interrupt handling
    - device access
  - Run everything else as a process.
    - file system(s)
    - networking protocols
    - page replacement algorithm
  - Sub-systems communicate via remote procedure call (RPC)
- Reasons: increase flexibility, minimize the TCB
19. A System Call in Mach
(Timeline, top to bottom, across the user process, the kernel, and the Unix server.)
- User process: f = fopen("foo"); break.
- Kernel: saves context; checks capabilities, copies arguments; switches to Unix server context.
- Unix server: checks ACL, sets up buffers, etc.; returns to user.
- Kernel: saves context; checks capabilities, copies results; restores user's context.
20. Microkernels
- Claim was that flexibility and increased assurance would win out.
- But performance overheads were non-trivial.
  - Many PhDs on minimizing overheads of communication
  - Even highly optimized implementations of RPC cost 2-3 orders of magnitude more than a procedure call.
- Result: a backlash against the approach.
  - Windows, Linux, Solaris continue the monolithic tradition.
  - and continue to grow for performance reasons (e.g., GUI) and for functionality gains (e.g., specialized file systems).
  - Mac OS X, some embedded or specialized kernels (e.g., Exokernel) are exceptions. VMware achieves multiple personalities but has monolithic personalities sitting on top.
21. Performance Matters
- The hit of crossing the kernel boundary:
  - Original Apache forked a process to run each CGI
    - could attenuate file access for sub-process
    - protected memory/data of server from rogue script
    - i.e., closer to least privilege
  - Too expensive for a small script: fork, exec, copy data to/from the server, etc.
  - So current push is to run the scripts in the server.
    - i.e., throw out least privilege
- Similar situation with databases, web browsers, file systems, etc.
22. The Big Question?
- From a least privilege perspective, many systems should be decomposed into separate processes. But if the overheads of communication (i.e., traps, copying, flushing TLB) are too great, programmers won't do it.
- Can we achieve isolation and cheap communication?
23. Reference Monitors Outline
- Analysis of the power and limitations.
- What is a security policy?
- What policies can reference monitors enforce?
- Traditional Operating Systems.
- Policies and practical issues
- Hardware-enforcement of OS policies.
- Software-enforcement of OS policies.
- Why?
- Software-Based Fault Isolation
- Java Stack Inspection
- Inlined Reference Monitors
24. Software Fault Isolation (SFI)
- Wahbe et al. (SOSP '93)
- Keep software components in the same hardware-based address space.
- Use a software-based reference monitor to isolate components into logical address spaces.
  - conceptually: check each read, write, and jump to make sure it's within the component's logical address space.
  - hope: communication as cheap as procedure call.
  - worry: overheads of checking will swamp the benefits of communication.
- Note: doesn't deal with other policy issues
  - e.g., availability of CPU
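25. One Way to SFI
#include <stdbool.h>
#include <stdlib.h>
/* Interpret the untrusted code, checking every fetch and load. */
void interp(int pc, int reg[], int mem[], int code[], int memsz, int codesz) {
  while (true) {
    /* pc is a signed int, so negative values are rejected as well */
    if (pc < 0 || pc >= codesz) exit(1);
    int inst = code[pc], rd = RD(inst), rs1 = RS1(inst),
        rs2 = RS2(inst), immed = IMMED(inst);
    switch (opcode(inst)) {
      case ADD: reg[rd] = reg[rs1] + reg[rs2]; break;
      case LD: {
        int addr = reg[rs1] + immed;
        if (addr < 0 || addr >= memsz) exit(1);
        reg[rd] = mem[addr];
        break;
      }
      case JMP: pc = reg[rd]; continue;  /* target is checked at the top of the loop */
      ...
    }
    pc++;
  }
}
Original binary:
0: add r1,r2,r3
1: ld r4,r3(12)
2: jmp r4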
26. Pros & Cons of Interpreter
- Pros:
  - easy to implement (small TCB).
  - works with binaries (high-level language-independent).
  - easy to enforce other aspects of OS policy
- Cons:
  - terrible execution overhead (x25? x70?)
- but it's a start.
27. Partial Evaluation (PE)
- A technique for speeding up interpreters.
  - we know what the code is.
  - specialize the interpreter to the code.
    - unroll the loop: one copy for each instruction
    - specialize the switch to the instruction
    - compile the resulting code
- For a cool example of this, see Fred Smith's thesis (hanging off my web page).
28. Example PE
Original Binary:
0: add r1,r2,r3
1: ld r4,r3(12)
2: jmp r4
...
Interpreter:
while (true) {
  if (pc >= codesz) exit(1);
  int inst = code[pc];
  ...
}
Specialized interpreter:
reg1 = reg2 + reg3;
addr = reg3 + 12;
if (addr >= memsz) exit(1);
reg4 = mem[addr];
pc = reg4;
Resulting Compiled Code:
0: add r1,r2,r3
1: addi r5,r3,12
2: subi r6,r5,memsz
3: jab _exit
4: ld r4,r5(0)
...
29. SFI in Practice
- Used a hand-written specializer or rewriter.
  - Code and data for a domain in one contiguous segment.
    - upper bits are all the same and form a segment id.
    - separate code space to ensure code is not modified.
  - Inserts code to ensure stores (and optionally loads) are in the logical address space.
    - force the upper bits in the address to be the segment id
    - no branch penalty: just mask the address
    - may have to re-allocate registers and adjust PC-relative offsets in code.
    - simple analysis used to eliminate unnecessary masks
  - Inserts code to ensure jump is to a valid target
    - must be in the code segment for the domain
    - must be the beginning of the translation of a source instruction
    - in practice, limited to instructions with labels.
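The "force the upper bits to be the segment id" step can be sketched in C as follows. The segment size (4 KB), the segment id value, and the function names are assumptions chosen for the example. The real rewriter inserts the equivalent machine instructions before each store, whereas this sketch models the data segment as a byte array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OFFSET_BITS     12u                       /* assumed 4 KB segment */
#define OFFSET_MASK     ((1u << OFFSET_BITS) - 1u)
#define DATA_SEGMENT_ID 0x5Au                     /* hypothetical segment id */

/* Clear the upper bits and OR in the segment id. There is no compare
   and no branch, so the result is always inside the data segment. */
static uint32_t sandbox_addr(uint32_t addr) {
    return (addr & OFFSET_MASK) | (DATA_SEGMENT_ID << OFFSET_BITS);
}

/* Rewritten store: `segment` is the base of the domain's data segment,
   which must hold 1 << OFFSET_BITS bytes. */
void sandboxed_store(uint8_t *segment, uint32_t addr, uint8_t value) {
    assert(segment != NULL);
    uint32_t safe = sandbox_addr(addr);
    segment[safe & OFFSET_MASK] = value;
}
```

A stray address is silently redirected into the domain's own segment rather than reported, so a buggy component can only corrupt its own data.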
30. More on Jumps
- PC-relative jumps are easy:
  - just adjust to the new instruction's offset.
- Computed jumps are not:
  - must ensure code doesn't jump into or around a check, or else that it's safe for code to do the jump.
  - for this paper, they ensured the latter:
    - a dedicated register is used to hold the address that's going to be written, so all writes are done using this register.
    - only inserted code changes this value, and it's always changed (atomically) with a value that's in the data segment.
    - so at all times, the address is valid for writing.
    - works with little overhead for almost all computed jumps.
31. More SFI Details
- Protection vs. Sandboxing
  - Protection is fail-stop
    - stronger security guarantees (e.g., reads)
    - required 5 dedicated registers, 4-instruction sequence
    - 20% overhead on 1993 RISC machines
  - Sandboxing covers only stores
    - requires only 2 registers, 2-instruction sequence
    - 5% overhead
- Remote Procedure Call
  - 10x cost of a procedure call
  - 10x faster than a really good OS RPC
- Sequoia DB benchmarks: 2-7% overhead for SFI compared to 18-40% overhead for OS.
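To contrast with sandboxing, a fail-stop ("protection") check compares the segment bits and reports a violation instead of redirecting the access. The sketch below uses the same illustrative assumptions as before (4 KB segment, made-up segment id, hypothetical function names) and models the fault as a `false` return value; it does not reproduce the paper's actual instruction sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OFFSET_BITS     12u                       /* assumed 4 KB segment */
#define OFFSET_MASK     ((1u << OFFSET_BITS) - 1u)
#define DATA_SEGMENT_ID 0x5Au                     /* hypothetical segment id */

/* True iff the upper bits of addr equal the domain's segment id. */
static bool in_segment(uint32_t addr) {
    return (addr >> OFFSET_BITS) == DATA_SEGMENT_ID;
}

/* Fail-stop store: returns false on a violation (the caller would halt
   the domain) and leaves memory untouched. The compare and branch are
   the extra cost relative to masking. */
bool protected_store(uint8_t *segment, uint32_t addr, uint8_t value) {
    assert(segment != NULL);
    if (!in_segment(addr))
        return false;
    segment[addr & OFFSET_MASK] = value;
    return true;
}
```

The design trade-off is the one on the slide: the check detects the offending access at the cost of extra instructions and registers, while masking is cheaper but only confines the damage.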
32. Questions
- What happens on the x86?
  - small number of registers
  - variable-length instruction encoding
- What happens with discontiguous hunks of memory?
- What would happen if we really didn't trust the extension?
  - i.e., check the arguments to an RPC?
  - timeouts on upcalls?
- Does this really scale to secure systems?