Title:
1. Improving IPC by Kernel Design
- J. Liedtke, German National Research Center
for Computer Science
- Presented by Karthik Chandrasekar
2. Introduction
- The µ-kernel performs
- Memory Management - Virtual Address Space
- Thread Management
- IPC
- IPC lends the following
- Modularity
- Security
- Scalability
3. Related Work
- Mach
- RPC oriented IPC
- SRC - RPC
- Special path for Context Switching
- Shared memory buffers
- LRPC
- Simple stubs
- Direct Context Switching
- Shared memory buffers
4. Approach Needed
- Synergetic approach in design and implementation,
guided by IPC requirements
- Architectural Level
- Algorithm Level
- Interface Level
- Coding Level
5. L3 Operating System
- Data type: Task
- Threads communicate via messages using task and
thread ids (even for device drivers and hardware
interrupts - delivered by the µ-kernel)
- Data space → address space
- Persistence of Data and Threads
- Clans and Chiefs Model for Message Integrity
6. Principles and Methods for IPC Improvement
- Reconstruction of the following in L3
- Process Control
- Communication
- Newer Implementation of the following in L3
- IPC
- Thread Management
- Architecture
- Principles
- IPC performance is the key
- Design decisions require a performance discussion
- Poor performers are punished
- Synergetic effects
- Synergy at all levels
7. Performance Metric Defined Using Null IPC
- Achieved 250 cycles (5 µs)
8. Architectural Level
- System Calls - Kernel Mode
- Messages
- Direct Transfer by Temporary Mapping
- Strict Process Orientation
- Control Blocks as Virtual Objects
9. Architectural Level - System Calls
- I. System Calls - Kernel Mode (40)
- Call
- Reply & Receive Next
- Instead of
- Call
- Reply
- Send
- Receive
- No need for scheduling to handle replies
differently from requests!
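The gain from the combined call can be sketched as a count of user/kernel crossings. This is a toy model, not L3 code; `reply_and_receive_next` and the crossing counter are illustrative names.

```c
#include <assert.h>

/* Hypothetical model: each system call costs one user/kernel
   crossing. Counting crossings shows why combining reply and
   receive-next into one call helps a server loop. */
static int crossings = 0;

static void kernel_entry(void) { crossings++; }

/* Classic server loop step: reply, then receive - two entries,
   plus a scheduler decision in between. */
static void reply_then_receive(void) {
    kernel_entry();   /* reply  */
    kernel_entry();   /* receive */
}

/* L3-style combined call: one entry, no scheduling in between. */
static void reply_and_receive_next(void) { kernel_entry(); }

/* Handle n requests; returns total kernel crossings. */
int handle_requests(int n, int combined) {
    crossings = 0;
    for (int i = 0; i < n; i++) {
        if (combined) reply_and_receive_next();
        else          reply_then_receive();
    }
    return crossings;
}
```

The combined call halves the crossings for a request/reply server, which is where the "no need for scheduling" point comes from: the kernel switches directly to the replied-to thread.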
10. Architectural Level - Messages
- II. Messages (20)
- A sequence of send operations can be combined if
no intermediate reply is required!
- E.g. text to screen driver:
- Operation Code
- Co-ordinates
- Text String specified by address and length
11. Architectural Level - Messages
- User-level sender and receiver buffer structures
are the same
- Sending a complex message maintaining program
variables
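The complex-message idea can be sketched in C, loosely following the text-to-screen-driver example above. The struct layout and the `gather` helper are illustrative assumptions, not the real L3 message format.

```c
#include <assert.h>
#include <string.h>

/* Sketch: a complex message mixes a direct part (values stored in
   the message) with indirect "string items" (address + length),
   so program variables are sent in place without pre-copying. */
struct string_item { const void *addr; unsigned len; };

struct msg {
    unsigned opcode;       /* direct part: operation code  */
    unsigned x, y;         /* direct part: coordinates     */
    struct string_item s;  /* indirect part: text in place */
};

/* The kernel would gather all parts in one copy to the receiver;
   here we model the gather into a flat buffer. Returns bytes. */
unsigned gather(const struct msg *m, unsigned char *out) {
    unsigned n = 0;
    memcpy(out + n, &m->opcode, sizeof m->opcode); n += sizeof m->opcode;
    memcpy(out + n, &m->x, sizeof m->x);           n += sizeof m->x;
    memcpy(out + n, &m->y, sizeof m->y);           n += sizeof m->y;
    memcpy(out + n, m->s.addr, m->s.len);          n += m->s.len;
    return n;
}
```

Because sender and receiver use the same buffer structure, no reformatting step is needed on either side.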
12. Architectural Level - Transfer
- III. Direct Transfer by Temporary Mapping
- Map the target region of the destination address
space into the sender's communication window and
then copy the message into it! (One window per
address space, only kernel accessible)
- Issues
- Mapping must be fast
- Parallel Threads must co-exist in the same
address space
13. Architectural Level - Transfer
- 1024 entries in the page directory, each
corresponding to a 1024-entry 2nd-level table → 4 GB
- Copying 1 word of B's page directory into A's
makes 4 MB of B's space (one 1024-entry 2nd-level
table) visible in A
- Data page size: 4 KB
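The temporary mapping reduces to two word copies, matching the page-directory arithmetic above. A minimal sketch, assuming 386/486-style two-level paging; `map_window`, the window slot index, and the array representation are illustrative, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: each page-directory entry (PDE) covers 4 MB (1024 pages
   of 4 KB). A message buffer of up to 4 MB spans at most two 4 MB
   regions, so mapping it into the kernel-only communication
   window is at most two PDE copies. */
#define PDE_SHIFT 22   /* bits 31..22 of an address select the PDE */

typedef uint32_t pde_t;

/* Copy the two directory entries covering the destination buffer
   into the window slots of the current page directory. */
void map_window(pde_t *my_pdir, unsigned window,
                const pde_t *dest_pdir, uint32_t buf) {
    my_pdir[window]     = dest_pdir[buf >> PDE_SHIFT];
    my_pdir[window + 1] = dest_pdir[(buf >> PDE_SHIFT) + 1];
}
```

This is why "mapping must be fast" is achievable: no page tables are walked or allocated, only two 32-bit words are copied.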
14. Architectural Level - Transfer
- TLB flush is mandatory for proper mapping - data
integrity!
- Approaches to maintain a clean TLB
- One thread per address-space process → clean TLB
- Thread access → TLB clean
- Transfer (copy) → TLB clean
- Address space switch after copying in an IPC →
use TLB; window clean
- Multiple threads per address space → using one
window!!
- TLB is flushed when thread switching does not
affect the address space
- Invalidate communication window values (will lead
to a page fault)
- Multiprocessor: one window per address space per
processor
- Different processors: special TLB flush for
multiple address space support
15. Architectural Level - Process
- IV. Strict Process Orientation
- Allocate one kernel stack per thread!
- No stack switching or copying as in continuations
- Cheap, since the stacks live in virtual memory
- Also supports thread control blocks (TCBs)
- TCB information includes
- A pointer so it can be chained into a linked list
- Value of its stack pointer
- A stack area that includes local variables
- Thread number, type, priority and name
- Age and resources granted
16. Architectural Level - TCB
- V. Control Blocks as Virtual Objects
- Virtual array to hold TCBs
- Faster access: tcb address = array base + tcb
number × tcb size
- Saves 3 TLB misses
- Direct access to destination TCB
- Kernel stack is accessed using the stack pointer
with a bit mask!
- Sender kernel stack access
- Receiver kernel stack access
- Applicable to page directories:
- mypdir[window] = pdir_dest[buffer >> 22]
- mypdir[window + 1] = pdir_dest[(buffer >> 22) + 1]
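The two address computations on this slide can be sketched directly. The TCB size and array base below are assumed example values (any power-of-two size works); only the masking/indexing pattern is the point.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: with one kernel stack per thread placed inside its TCB,
   and TCBs held in a virtual array, both lookups are a couple of
   ALU operations - no table walk, no extra memory reference. */
#define TCB_SIZE 1024u         /* power of two; assumed value   */
#define TCB_BASE 0xC0000000u   /* virtual array base; assumed   */

/* Current TCB: mask the stack-offset bits off the stack pointer. */
static inline uint32_t current_tcb(uint32_t esp) {
    return esp & ~(TCB_SIZE - 1);
}

/* Any TCB: array base + tcb number * tcb size. */
static inline uint32_t tcb_of(uint32_t thread_no) {
    return TCB_BASE + thread_no * TCB_SIZE;
}
```

Because the array lives in virtual memory, a TCB not yet resident simply page-faults on first touch rather than requiring an explicit existence check.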
17. Algorithmic Level
- Thread Identifier
- Virtual Queues
- Timeouts and Wakeups
- Lazy Scheduling
- Direct Process Switch
- Short Messages Via Registers
18. Algorithmic Level - Thread Identifier
- Thread addressed by 64-bit UID in user mode
- Thread number
- Generation
- Station number
- Chief id
- Thread number in lower 32 bits of UID
- AND with bit mask, ADD to TCB array base
- Check validity: compare the TCB's stored UID with
the requested UID - 4 cycles
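The UID-to-TCB lookup and validity check can be sketched as follows. The field widths and mask are assumed example values; the pattern (mask, index, single compare) is what the slide describes.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: the low 32 bits of the 64-bit UID hold the thread
   number; mask them to index the TCB array, then one compare of
   the stored UID against the requested UID validates generation,
   station and chief in a single step. */
struct tcb { uint64_t uid; };

static struct tcb tcbs[1 << 12];   /* stands in for the virtual
                                      TCB array; size assumed    */

struct tcb *lookup(uint64_t uid) {
    /* AND with bit mask, ADD to array base (here: array index) */
    struct tcb *t = &tcbs[(uint32_t)uid & 0xFFF];
    /* validity check: one 64-bit compare */
    return (t->uid == uid) ? t : 0;
}
```

A stale UID (e.g. an old generation for a reused thread number) lands on the right TCB slot but fails the compare, so dangling identifiers are rejected without any extra table.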
19. Algorithmic Level - Virtual Queues
- Busy queue, present queue and a polling-me queue
per thread
- Doubly linked lists, where the links are held in
the TCBs
- TCBs are chained in virtual address space, but
parsing the chains and inserting or deleting TCBs
never leads to page faults - TCBs are unlinked from
the queues when they are unmapped!
20. Algorithmic Level - Timeouts and Wakeups
- The frequently used values t = ∞ and t = 0
- Wakeups
- Array indexed by thread number - but sequential
- A set of n unordered wakeup lists implemented by
doubly linked lists. If a thread is entered with
wakeup time t, its TCB is linked into list (t mod
n).
- For a total of k threads, the scheduler will have
to inspect k/n entries per clock interrupt, on
average.
- Wakeup point far in the future → long-time wakeup
list
- n = 8, wakeup time 4 ms → 400 threads → 12,500
IPC/sec
- ≤ 1% of total (6% is IPC → 16%), but 25% of IPC
use wakeups → 50,000 IPC/sec! (1.5% IPC → 4%)
- Base + offset representation of time → offset
updated every 2^24
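The (t mod n) bucketing above can be sketched in a few lines. Singly linked lists are used here for brevity (the slide says doubly linked); names are illustrative.

```c
#include <assert.h>

/* Sketch: n unordered wakeup lists. A thread with wakeup time t
   goes into list (t mod n), so on a tick for time t the scheduler
   scans only that one list - about k/n of k pending wakeups. */
#define N_LISTS 8

struct wakeup { unsigned t; struct wakeup *next; };

static struct wakeup *lists[N_LISTS];

void enqueue_wakeup(struct wakeup *w) {
    unsigned i = w->t % N_LISTS;   /* pick bucket by t mod n */
    w->next = lists[i];
    lists[i] = w;                  /* unordered: push at head */
}
```

The lists stay unordered on purpose: insertion is O(1), and the per-tick scan only has to test each entry's wakeup time against the current time.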
21. Algorithmic Level - Lazy Scheduling
- IPC operation call or reply & receive next:
- Delete sending thread from ready queue
- Insert into waiting queue
- Delete receiving thread from waiting queue
- Insert into ready queue
- L3 queue invariants
- Ready queue contains all ready threads
- Waiting queue contains at least all waiting
threads
- TCB contains the thread's state (ready/waiting) -
kept updated!
- Scheduler removes all threads no longer belonging
to a queue during queue parsing - the delete
operation can be omitted!
- Call and reply & receive next - the thread need
not be inserted!
- Performs better with increasing IPC rate!
22. Algorithmic Level - Direct Process Switch
- When B sends a reply to A and another thread C is
waiting to send a message to B (polling B), C's IPC
to B is immediately initiated before continuing A.
- When multiple threads try to send messages to one
receiver, it gets the messages in the sequence in
which the IPC operations were invoked.
23. Algorithmic Level - Short Messages Via Registers
- A high proportion of messages are short
- E.g. driver ack/error, hardware interrupts
- On the 486:
- 7 general registers
- 3 needed (sender id, result code)
- 4 available
- 8-byte messages using a coding scheme
24. Interface Level
- Simple and short RPC stubs
- Load registers, issue system call, check success
- Compiler generates stubs inline
- Avoiding unnecessary copies
- Complex messages not arbitrarily mixed
- Similar structuring helps in parsing and tracing
- Sharing common variables
- Parameter passing
- Use registers when possible
- More efficient than stacks
- Supports better code optimization
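A minimal inline stub following the "load registers, issue system call, check success" pattern can be sketched like this. The system call is replaced by a plain function so the example runs anywhere; `fake_ipc` and `rpc_increment` are hypothetical names, not L3's API.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's short-message IPC path: takes the
   "register" arguments, returns a result word and a status. */
static int fake_ipc(uint32_t dest, uint32_t w0, uint32_t *result) {
    (void)dest;          /* destination unused in this stand-in */
    *result = w0 + 1;    /* pretend the server increments w0    */
    return 0;            /* 0 = success                         */
}

/* The whole stub: load registers, trap, check success. Short
   enough that the compiler inlines it at every call site. */
static inline uint32_t rpc_increment(uint32_t dest, uint32_t x) {
    uint32_t r;
    int rc = fake_ipc(dest, x, &r);  /* load regs + system call */
    assert(rc == 0);                 /* check success           */
    return r;
}
```

Keeping the stub this small is what makes register-based parameter passing pay off: there is no marshaling layer between the caller's variables and the IPC registers.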
25. Coding Level
- Reduce cache and TLB misses
- Short kernel code
- Use short jumps, registers, short address
displacements
- Frequently accessed data: 1-byte displacement
- Data frequently used together: same cache line
and access sequence
- IPC-related kernel code in 1 page - else extra
TLB misses
- Internal tables should be with the data - same
page (at least the heavily used entries) - 4 TLB
misses avoided!
- Handle save/restore of the coprocessor lazily
- Delayed until a different thread needs to use it
26. Coding Level
- Segment registers and general registers
- Segment registers take 9 cycles to load - instead
use one flat segment covering the entire address
space!
- Loading a flat descriptor for every segment
register: 66 cycles
- Checking on entry and loading before returning:
10 cycles
- Fit message in one 32-bit register
- Counter accessed through an 8-bit register
27. Coding Level
- Avoiding Jumps
- Reduce Jump statements
- Process Switch
- Stack pointer change
- Address space change
- Co-processor
- Lazy handling save/restore
28. Summary of Techniques
- Add new system calls (5.2.1)
- Rich message structure, symmetry of send and
receive buffers (5.2.2)
- Single copy through temporary mapping (5.2.3)
- Kernel stack per thread (5.2.4)
- Control blocks held in virtual memory (5.2.5)
- Thread uid structure (5.3.1)
- Unlink tcbs from queues when unmapping (5.3.2)
- Optimized timeout bookkeeping (5.3.3)
- Lazy scheduling (5.3.4)
- Direct process switch (5.3.5)
- Pass short messages in register (5.3.6)
- Reduce cache misses (5.5.1) and TLB misses
(careful placement) (5.5.2)
- Optimize use of segment registers (5.5.3)
- Make best use of general registers (5.5.4)
- Avoid jumps and checks (5.5.5)
- Minimize process switch activities (5.5.6)
29-32. Results (charts only; no transcript)
33. Remarks
- Introducing ports
- 1 port link table per address space → global port
table
- Port access: port link table + port index →
global port table
- DASH-like message passing
- Same virtual address in both address spaces
- Cache
- Cache thrashing
- Processor dependencies
- Virtual address space and hierarchical mapping
- Kernel access expensive on the 486!
34. Conclusion
- IPC improved by applying
- Performance-based reasoning
- Synergetic effects
- Architecture → coding
35. Discussion
- Are the techniques used to improve the speed of
IPC minor tweaks or significantly novel ideas?
- Security impact?
- Virtual machine monitors!!
36. Thank you!