Title: POSH Python Object Sharing
1POSHPython Object Sharing
- Steffen Viken Valvåg
- In collaboration with
- Kjetil Jacobsen
- Åge Kvalnes
University of Tromsø, Norway Sponsored by Fast
Search Transfer
2Python Execution Model
3Python Threading Model
Thread A
Thread B
Byte codes
- Each thread executes a separate sequence of byte
codes
- All threads must contend for one global
interpreter lock
4Example Matrix Multiplication
- Performs a matrix multiplication A B x C
- The work is split between several worker threads
- The application runs on a machine with 8 CPUs
- Threads do not scale for multiple CPUs due to
lock contention on the GIL
5Workaround Processes
Process A
Process B
IPC
- Each process has its own interpreter lock
- Requires inter-process communication, using e.g.
message passing by means of pipes
6Matrix Multiplication using Processes
- A master process distributes the input matrices
to a set of worker processes - Each worker process computes some part of the
output matrix, and returns its result to the
master - The master process assembles the final result
matrix - More communication, and more complex pattern than
using threads
7Ways Ahead
- Communication through standard Python container
objects favors threads - Scalability on multiprocessor architectures
favors processes - The GIL is here to stay, so making threads scale
better is hard - However, there might be room for improvement of
inter-process communication mechanisms
8Using Shared Memory for IPC
Process B
Process A
Shared Memory
- Processes communicate by accessing a shared
memory region
- Requires explicit synchronization and data
marshalling, imposes a flat data structure
9Using POSH for IPC
Process B
Process A
Shared Memory
- Allocates regular Python objects in shared memory
- Shared objects are accessed transparently through
regular method calls
- IPC is done by modifying shared, mutable objects
10Complications
- Processes must synchronize their access to
shared, mutable objects (just like threads) - Explicit synchronization of critical regions must
be possible, while implicit synchronization upon
accessing shared objects is desireable - Pythons regular garbage collection algorithm is
inadequate for shared objects, which may be
referenced by multiple processes
11Proxy Objects
X.method1()
X
return value
Shared Object
Proxy Object
- Provides transparent access to a shared object by
forwarding all attribute accesses and method
calls - Provides a single entry point to a shared object,
where synchronization policies may be enforced
12Multi-Process Garbage Collection
- Must account for references from all live
processes - Must stay up-to-date when processes fork, as this
may create new references to shared objects - Should be able to handle abnormal process
termination without leaking shared objects
13Garbage Collection in POSH
Process A
Shared Memory
Process B
X
M
Y
L
Shared Object
Regular Python reference
Reference from a process to a shared object (type
I)
Reference from one shared object to another (type
II)
14Garbage Collection Details
- POSH creates at most one proxy object per process
for any given shared object - Shared objects are always referenced through
their proxy objects - A bitmap in each shared object records the
processes that have a corresponding proxy object.
This tracks references of type I (from a
process) - A separate count in each shared object records
the number of references to the object from other
shared objects. This tracks references of type
II - Shared objects are deleted when there are no
references to them of either type
15Performance
- Performing a matrix multiplication A B x C
using POSH - The work is split between several worker
processes - The application runs on a machine with 8 CPUs
- More overhead, but scales for multiple CPUs
16Summary
- Python uses a global interpreter lock (GIL) to
serialize execution of byte codes - This entails a lack of scalability on
multiprocessor architectures for CPU-intensive
multi-threaded apps - However, threads offer an attractive programming
model, with implicit communication - Processes shared memory reduce IPC overheads,
but normally impose flat data structures and
require data marshalling - POSH uses processes shared memory to offer a
programming model similar to threads, with the
scalability of processes
17Availability
- Open source, hosted at SourceForge
- http//poshmodule.sf.net/
- Still not very stable
- Developers wanted
18Example Usage
import posh class Stuff(object)
pass posh.allow_sharing(Stuff,
posh.generic_init) mystuff posh.share(Stuff())
def worker1() mystuff.money 0 def
worker2() mystuff.debt 100000 def
worker3() mystuff.balance mystuff.money -
mystuff.debt for w in worker1, worker2,
worker3 posh.forkcall(w) posh.waitall()