1 Transactional Memory and Dynamic Load Balancing for Multiplayer Games
Cristiana Amza
University of Toronto
2 Once Upon a Time
- Locks were painful for dynamic and complex applications.
- e.g., Massively Multiplayer Games
3 Massively Multiplayer Games
- Must support many concurrent players
- Must keep the update interval to players low
4 So, game developers said
- Forget locks!
- We'll use our secret sauce!
5 State-of-the-art in Game Code
- Ad-hoc parallelization: segments/shards
- e.g., World of Warcraft, Ultima Online
- Sequential code, admission control
- e.g., Quake III
6 Ad-hoc Partitioning (segments)
7 Artificial Admission Control
- Admission control
- Gateways
- e.g., airports, doors
8 But, gamers said
- We want to interact, and we hate lag!
9 Problem with State-of-the-art
- Flocking
- Players move to one area, e.g., for quests
- This overloads the server hosting the hotspot
10 So I said
- Forget painful locks!
- Transactional Memory will make game developers and players happy!
- Story endorsed by Intel (fall of 2006).
11 Our Goals
- Parallelize server code into transactions
- Easy to thread any game
- Dynamic load balancing of transactions (tx)
- On any platform
- e.g., clusters, multi-cores, mobile devices
- Beats locks any day!
12 Ideal Solution: Contiguous World
- Seamless partition
- Players can see across partition boundaries
- Players can smoothly transfer
- Regardless of game map
13 Challenge on Multi-core
- Inter-thread conflicts
- Mostly at the boundary
14 Roadmap
- The game
- Parallelization Using TM
- Compiler code transformations for TM
- Runtime TM design choices
- Dynamic load balancing of tx in game
15 Game Benchmark (SimMud)
- Players can move and interact
- Terrain is fixed and restricts movement
- Food objects
- Interactions: player-object, player-player
16 Game Benchmark (SimMud)
- Actions: move, eat, fight
- Quest: flocking of players to a spot on the game map
17 Flocking in SimMud
[Figure: game map with areas labeled S1, S2, S3, S4 and a quest location]
18 Parallelization of Server Code
[Figure: server loop: Select (Rx), Process Requests (stage 2, read-write phase), Form and Send Replies (stage 3, read-only phase, Tx)]
19 Example: Move Request
    Move()
    {
        region1->removePlayer( p );
        region2->addPlayer( p );
    }
20 Parallelize Move Request
- Insert atomic keyword in code
- Compiler makes it a transaction
- Ex: #pragma omp critical / __tm_atomic
    {
        region1->removePlayer( p );
        region2->addPlayer( p );
    }
21 Ex: SimMud Data Structure
    struct Region
    {
        int x, y;
        int width, height;
        set_t players;
        set_t objects;
    };
22 Example Code for Action Move
    void movePlayer( Player *p, int new_x, int new_y )
    {
        Region *r_old = getRegion( p->x, p->y );
        Region *r_new = getRegion( new_x, new_y );
        if( isVacant_position( r_new, new_x, new_y ) )
        {
            set_remove( r_old->players, p );
            set_insert( r_new->players, p );
            p->x = new_x; p->y = new_y;
        }
    }
23 Manual Transformations (Locks)
    void movePlayer( Player *p, int new_x, int new_y )
    {
        lock_player( p );
        Region *r_old = getRegion( p->x, p->y );
        Region *r_new = getRegion( new_x, new_y );
        lock_regions( r_old, r_new );
        if( isVacant_position( r_new, new_x, new_y ) )
        {
            set_remove( r_old->players, p );
            set_insert( r_new->players, p );
            p->x = new_x; p->y = new_y;
        }
        unlock_regions( r_old, r_new );
        unlock_player( p );
    }
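The lock-based move can be sketched in standard C++ as follows. This is a minimal illustration, not the SimMud code: the `Player` and `Region` layouts, the 4x4 grid, and `kRegionSize` are assumptions, and the vacancy check is left out. It shows the extra care that locks demand: both region locks must be acquired together (here with `std::lock`) to avoid deadlock between two opposite moves, and the same-region case needs separate handling.

```cpp
#include <cassert>
#include <mutex>
#include <set>

// Hypothetical, simplified stand-ins for the SimMud types.
struct Player {
    int id, x, y;
    std::mutex lock;
    Player(int id_, int x_, int y_) : id(id_), x(x_), y(y_) {}
};
struct Region {
    std::set<int> players;  // ids of the players currently in this region
    std::mutex lock;
};

const int kRegionSize = 10;  // assumed region width and height, in cells
const int kGridDim = 4;      // assumed 4x4 grid of regions
Region g_regions[kGridDim][kGridDim];

Region* getRegion(int x, int y) {
    return &g_regions[y / kRegionSize][x / kRegionSize];
}

// Caller must pass coordinates inside the 40x40 world.
void movePlayer(Player& p, int new_x, int new_y) {
    std::lock_guard<std::mutex> player_guard(p.lock);
    Region* r_old = getRegion(p.x, p.y);
    Region* r_new = getRegion(new_x, new_y);
    if (r_old == r_new) {
        // Same region: locking the same mutex twice would deadlock.
        std::lock_guard<std::mutex> region_guard(r_old->lock);
        p.x = new_x; p.y = new_y;
        return;
    }
    // std::lock acquires both mutexes without risking deadlock.
    std::lock(r_old->lock, r_new->lock);
    std::lock_guard<std::mutex> old_guard(r_old->lock, std::adopt_lock);
    std::lock_guard<std::mutex> new_guard(r_new->lock, std::adopt_lock);
    r_old->players.erase(p.id);
    r_new->players.insert(p.id);
    p.x = new_x; p.y = new_y;
}
```

Every new action that touches more shared state needs the same kind of lock-ordering reasoning, which is the maintenance burden the talk argues against.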
24 Manual Transformations (TM)
    void movePlayer( Player *p, int new_x, int new_y )
    {
        #pragma omp critical
        {
            Region *r_old = getRegion( p->x, p->y );
            Region *r_new = getRegion( new_x, new_y );
            if( isVacant_position( r_new, new_x, new_y ) )
            {
                set_remove( r_old->players, p );
                set_insert( r_new->players, p );
                p->x = new_x; p->y = new_y;
            }
        }
    }
25 My Story
- TM will make game developers and players happy!
- So far, the developers should be!
26 It Gets Worse for Locks
- Move
- May impact objects within bounding box
- Short-range or long-range
- Lock all impacted objects
- Need to search objects
[Figure: top view of world showing short-range and long-range bounding boxes and the objects inside them]
27 e.g., Quake III Area Node Tree
- Each region corresponds to a leaf
[Figure: top view of world]
28 e.g., Quake III Area Node Tree
- Each region corresponds to a leaf
- Lock all leaf nodes in bounding box
- atomically
[Figure: top view of world with a bounding box overlapping several regions]
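To make the cost of bounding-box locking concrete, here is a minimal C++ sketch that collects the leaves of a small area node tree that a move's bounding box overlaps. The `Box`, `AreaNode`, `build`, and `collectLeaves` names are illustrative assumptions and are not taken from Quake III. A lock-based server would then have to lock every returned leaf, in a fixed order, before applying the move.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Box { int x0, y0, x1, y1; };  // half-open: [x0, x1) x [y0, y1)

bool overlaps(const Box& a, const Box& b) {
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

struct AreaNode {
    Box bounds;
    int id;
    std::unique_ptr<AreaNode> left, right;
    bool isLeaf() const { return !left && !right; }
};

// Builds a binary tree by halving the longer side at each level.
std::unique_ptr<AreaNode> build(Box b, int depth, int& nextId) {
    std::unique_ptr<AreaNode> n(new AreaNode());
    n->bounds = b;
    n->id = nextId++;
    if (depth > 0) {
        bool splitX = (b.x1 - b.x0) >= (b.y1 - b.y0);
        Box lo = b, hi = b;
        if (splitX) { lo.x1 = hi.x0 = (b.x0 + b.x1) / 2; }
        else        { lo.y1 = hi.y0 = (b.y0 + b.y1) / 2; }
        n->left = build(lo, depth - 1, nextId);
        n->right = build(hi, depth - 1, nextId);
    }
    return n;
}

// Appends the ids of all leaves whose area intersects the bounding box.
void collectLeaves(const AreaNode& n, const Box& box, std::vector<int>& out) {
    if (!overlaps(n.bounds, box)) return;
    if (n.isLeaf()) { out.push_back(n.id); return; }
    collectLeaves(*n.left, box, out);
    collectLeaves(*n.right, box, out);
}
```

In an 8x8 world split into four leaves, a short-range box in one corner touches a single leaf, while a box around the center touches all four, so a long-range action near a boundary serializes against most of the map.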
29 Area Node Tree: Even Worse!
- Objects linked to leaf nodes
- If an object crosses a leaf boundary, link it to the parent node
[Figure: top view of world with non-overlapping regions (tree leaves), their object lists, and objects that cross a boundary]
30 Area Node Tree: Even Worse!
- Need to lock parent nodes
- False sharing
- The whole tree may be locked
[Figure: top view of world with non-overlapping regions (tree leaves), their object lists, and objects that cross a boundary]
31 My Story
- TM will make game developers and players happy!
- Locks: lock down a whole box/tree; TM: just read/write what you need.
- Players should be happy too!
32 Compiler/Runtime TM Support
- Compiler
- Automatic source transformations to tx
- Runtime
- track accesses
- resolve conflicts between transactions
- adapt to application pattern
33 Manual Transformations (TM)
    void movePlayer( Player *p, int new_x, int new_y )
    {
        #pragma omp critical
        {
            Region *r_old = getRegion( p->x, p->y );
            Region *r_new = getRegion( new_x, new_y );
            if( isVacant_position( r_new, new_x, new_y ) )
            {
                set_remove( r_old->players, p );
                set_insert( r_new->players, p );
                p->x = new_x; p->y = new_y;
            }
        }
    }
34 Automatic Transformations (TM)
    void tm_movePlayer( tm_Player *p, int new_x, int new_y )
    {
        Begin_transaction;
        tm_Region *r_old = tm_getRegion( p->x, p->y );
        tm_Region *r_new = tm_getRegion( new_x, new_y );
        if( tm_isVacant_position( r_new, new_x, new_y ) )
        {
            tm_set_remove( r_old->players, p );
            tm_set_insert( r_new->players, p );
            p->x = new_x; p->y = new_y;
        }
        Commit_transaction;
    }
35 Automatic Transformations (TM)
    struct tm_Region
    {
        tm_int x, y;
        tm_int width, height;
        tm_set_t players;   // recursively re-type
        tm_set_t objects;   // nested structures
    };
36 Compiler TM Code Translation
- #pragma -> begin/end transaction
- Re-type variables: tm_shared<> or tm_private<>
37 TM Runtime (libTM)
- Access tracking: tm_type<>
- Operator overloading for intercepting reads and writes
- Access granularity: basic-type level
- Conflict detection and resolution
- Several design choices
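Access tracking through operator overloading can be illustrated with a small C++ wrapper. This is only a sketch of the idea and not the libTM implementation: the `AccessLog` counters and the body of `tm_shared` below are assumptions standing in for real read and write sets. A read goes through the conversion operator and a write through the assignment operator, so ordinary-looking code is intercepted at basic-type granularity.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-thread counters standing in for a real read/write set.
struct AccessLog {
    std::size_t reads;
    std::size_t writes;
    AccessLog() : reads(0), writes(0) {}
};
thread_local AccessLog g_log;

template <typename T>
class tm_shared {
public:
    tm_shared(T v = T()) : value_(v) {}

    // Read interception: any use of the variable as a T lands here.
    operator T() const {
        ++g_log.reads;
        return value_;
    }

    // Write interception: any assignment of a T lands here.
    tm_shared& operator=(const T& v) {
        ++g_log.writes;
        value_ = v;
        return *this;
    }

    // Assigning one tracked variable to another is a read plus a write.
    tm_shared& operator=(const tm_shared& other) {
        T v = other;
        return *this = v;
    }

private:
    T value_;
};
```

With this wrapper, `x = x + 1` on a `tm_shared<int>` records one read and one write without any change to the statement itself, which is what makes automatic re-typing of variables practical.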
38 TM Conflict Resolution Choices
- Pessimistic
- Reader/Writer Locks
- Read Optimistic
- Only writer locks
- Fully Optimistic
- No locks
- Adaptive
39 Pessimistic
- A transaction (tx) locks an object before use
- Waits for locks held by other tx
- Releases all locks at the end
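The pessimistic policy above can be sketched as a transaction object that locks each item on first use and releases everything at the end. The `TmObject` and `PessimisticTx` names are hypothetical, a plain exclusive `std::mutex` stands in for the reader/writer locks, and deadlock detection is omitted, so this is an illustration of the locking discipline only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

struct TmObject {
    std::mutex lock;
    int value;
    TmObject() : value(0) {}
};

class PessimisticTx {
public:
    ~PessimisticTx() { release(); }

    int read(TmObject& o) { acquire(o); return o.value; }
    void write(TmObject& o, int v) { acquire(o); o.value = v; }

    // Commit simply releases every lock taken during the transaction.
    void commit() { release(); }

    std::size_t heldLocks() const { return held_.size(); }

private:
    void acquire(TmObject& o) {
        // Lock each object only once per transaction.
        if (std::find(held_.begin(), held_.end(), &o) != held_.end()) return;
        o.lock.lock();  // blocks while another transaction holds the lock
        held_.push_back(&o);
    }

    void release() {
        for (std::size_t i = held_.size(); i > 0; --i) held_[i - 1]->lock.unlock();
        held_.clear();
    }

    std::vector<TmObject*> held_;
};
```

Because locks are held until the end, two transactions that acquire the same objects in opposite orders can deadlock; a real runtime would need to detect that and abort one of them.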
40 Reader-writer Locks
[Figure: transaction timeline from BEGIN to END]
- Reader lock excludes writers
- Writer lock excludes readers and writers
41 Read Optimistic
- Writers take locks, readers do not
- A write invalidates (aborts) all readers
- a) Encounter-time: at the write
T1: BEGIN_TRANSACTION ... WRITE A ... COMMIT_TRANSACTION
T2: BEGIN_TRANSACTION READ A ... INVALID
T3: BEGIN_TRANSACTION ... READ A ... INVALID
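The encounter-time variant can be modeled with a visible-readers set per object. The sketch below is a deterministic, single-threaded model of the bookkeeping only, with no real synchronization; `Tx`, `TmObject`, `txRead`, `txWrite`, and `txCommit` are assumed names, not the libTM interface.

```cpp
#include <cassert>
#include <set>

struct Tx {
    bool valid;
    Tx() : valid(true) {}
};

struct TmObject {
    int value;
    Tx* writer;             // owner of the writer lock, if any
    std::set<Tx*> readers;  // visible-readers set
    TmObject() : value(0), writer(nullptr) {}
};

// Readers take no lock; they only register themselves as visible readers.
int txRead(Tx& tx, TmObject& o) {
    o.readers.insert(&tx);
    return o.value;
}

// Encounter-time policy: readers are invalidated at the write itself.
// Returns false when another transaction holds the writer lock
// (a real runtime would wait instead).
bool txWrite(Tx& tx, TmObject& o, int v) {
    if (o.writer != nullptr && o.writer != &tx) return false;
    o.writer = &tx;
    for (Tx* r : o.readers) {
        if (r != &tx) r->valid = false;  // abort every other reader
    }
    o.readers.clear();
    o.value = v;
    return true;
}

// Commit releases the writer lock.
void txCommit(Tx& tx, TmObject& o) {
    if (o.writer == &tx) o.writer = nullptr;
}
```

Replaying the timeline above, T2 and T3 read A, then T1 writes A, and both readers are marked invalid at that moment rather than at T1's commit.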
42 Read Optimistic
- Writers take locks, readers do not
- A write invalidates (aborts) all readers
- b) Commit-time: at commit
T1: BEGIN_TRANSACTION ... WRITE A ... COMMIT_TRANSACTION
T2: BEGIN_TRANSACTION READ A ... COMMIT_TRANSACTION
T3: BEGIN_TRANSACTION ... READ A ... INVALID
43 Fully Optimistic
- A write invalidates (aborts) all active readers, but supports multiple writers
- Commit-time: at commit
T1: BEGIN_TRANSACTION ... WRITE A ... COMMIT_TRANSACTION
T2: BEGIN_TRANSACTION WRITE A ... COMMIT_TRANSACTION
T3: BEGIN_TRANSACTION ... READ A ... INVALID
44 Implementation Details
- Meta-data kept with each tm_shared<> variable
- Lock, visible-readers set
45 Implementation Details
- Validation of each read
- Recoverability
- Undo-logging
- Write-buffering
- Private thread data (needs to be searchable)
- Necessary for fully optimistic
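The two recoverability options can be contrasted in a few lines of C++. This is a simplified sketch over plain `int` locations, with hypothetical class names `UndoLogTx` and `WriteBufferTx`; conflict detection is left out so that only the logging behavior is visible.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Undo-logging: write in place, remember the old value, restore on abort.
class UndoLogTx {
public:
    void write(int& loc, int v) {
        log_.push_back(std::make_pair(&loc, loc));
        loc = v;
    }
    int read(int& loc) const { return loc; }
    void commit() { log_.clear(); }
    void abort() {
        // Restore in reverse order so the oldest value wins.
        for (std::size_t i = log_.size(); i > 0; --i) {
            *log_[i - 1].first = log_[i - 1].second;
        }
        log_.clear();
    }
private:
    std::vector<std::pair<int*, int> > log_;
};

// Write-buffering: keep writes in private thread data, publish on commit.
class WriteBufferTx {
public:
    void write(int& loc, int v) { buf_[&loc] = v; }
    int read(int& loc) const {
        // The private buffer must be searched on every read.
        std::map<int*, int>::const_iterator it = buf_.find(&loc);
        return it != buf_.end() ? it->second : loc;
    }
    void commit() {
        for (std::map<int*, int>::iterator it = buf_.begin(); it != buf_.end(); ++it) {
            *it->first = it->second;
        }
        buf_.clear();
    }
    void abort() { buf_.clear(); }  // nothing to restore
private:
    std::map<int*, int> buf_;
};
```

The buffered version never exposes uncommitted values in shared memory, which is why several writers can work on the same location concurrently, at the price of a lookup on each read.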
46 Factors Determining Trade-offs
- Conflict type
- w-w conflicts favor fully optimistic
- Conflict span
- Long spans cause a domino effect (no progress) for read optimistic
47 Evaluation of Design Trade-offs
[Figure: evaluation results; number of threads: 4]
48 Roadmap
- The game
- Parallelization Using TM
- Compiler code transformations for TM
- Runtime TM design choices
- Dynamic load balancing of tx in game
49 Parallel Server Phase Types
[Figure: server loop: Select (Rx), Load balancing, Process Requests (read-write phase), Form and Send Replies (stage 3, read-only phase, Tx)]
50 Dynamic Load Management
- Region: grid unit
- Dynamic load balancing
- Reassign regions from one server/thread to another
51 Conflicts vs Load Management
- Locality, fewer conflicts
- Keep adjacent regions on same thread
[Figure panels: block partition; global reshuffle]
52 Overload due to Quest
53 Reassign Load, Minimize Conflicts
54 Locality-Aware Load Balancing
[Figure: SimMud game map with quest in upper left; recorded dynamic load balancing]
55 Dynamic Load-balancing Algorithms
- Lightest
- Shed regions to lightest loaded thread
- Spread
- Best load spread across all threads
- Locality aware
- Keep nearby regions on same thread
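As an illustration of the simplest policy, the sketch below sheds regions from the most loaded thread to the lightest one until the heaviest load drops to a threshold. It is one plausible reading of "Lightest", not the algorithm evaluated in the talk; `RegionLoad`, `threadLoads`, `shedToLightest`, and the use of player counts as the load metric are assumptions.

```cpp
#include <cassert>
#include <vector>

struct RegionLoad {
    int owner;    // thread currently responsible for the region
    int players;  // load metric: number of players in the region
};

std::vector<int> threadLoads(const std::vector<RegionLoad>& regions, int nthreads) {
    std::vector<int> load(nthreads, 0);
    for (const RegionLoad& r : regions) load[r.owner] += r.players;
    return load;
}

// Moves one region at a time from the heaviest thread to the lightest one.
// A region is moved only if that strictly improves the balance between the
// two threads, which guarantees termination. Returns the number of migrations.
int shedToLightest(std::vector<RegionLoad>& regions, int nthreads, int threshold) {
    int migrations = 0;
    for (;;) {
        std::vector<int> load = threadLoads(regions, nthreads);
        int heavy = 0, light = 0;
        for (int t = 1; t < nthreads; ++t) {
            if (load[t] > load[heavy]) heavy = t;
            if (load[t] < load[light]) light = t;
        }
        if (load[heavy] <= threshold) break;
        bool moved = false;
        for (RegionLoad& r : regions) {
            if (r.owner == heavy && r.players > 0 &&
                load[light] + r.players < load[heavy]) {
                r.owner = light;
                ++migrations;
                moved = true;
                break;
            }
        }
        if (!moved) break;  // no single move helps any further
    }
    return migrations;
}
```

Note that this policy ignores where a region sits on the map, so a shed region may end up on a thread that owns none of its neighbors, which is the source of the extra border conflicts that the locality-aware policy tries to avoid.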
56 Locality-aware (Quad-tree)
- Split task when
- Load > thresh
- Reassign tasks
- Reduce conflicts
- Can approximate!
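The splitting rule can be sketched as a recursive quad-tree decomposition of the region grid: a square task is split into four quadrants whenever its load exceeds the threshold. The `Task` layout, the `Grid` of per-region player counts, and the requirement that the grid side be a power of two are assumptions made for this illustration; task reassignment is not shown.

```cpp
#include <cassert>
#include <vector>

typedef std::vector<std::vector<int> > Grid;  // grid[y][x] = players in region

struct Task { int x, y, size; };  // square block of regions

int taskLoad(const Grid& grid, const Task& t) {
    int sum = 0;
    for (int y = t.y; y < t.y + t.size; ++y)
        for (int x = t.x; x < t.x + t.size; ++x)
            sum += grid[y][x];
    return sum;
}

// Splits a task into quadrants while its load exceeds thresh.
// A single region (size 1) cannot be split any further.
void splitTasks(const Grid& grid, const Task& t, int thresh, std::vector<Task>& out) {
    if (t.size == 1 || taskLoad(grid, t) <= thresh) {
        out.push_back(t);
        return;
    }
    int h = t.size / 2;
    splitTasks(grid, Task{t.x,     t.y,     h}, thresh, out);
    splitTasks(grid, Task{t.x + h, t.y,     h}, thresh, out);
    splitTasks(grid, Task{t.x,     t.y + h, h}, thresh, out);
    splitTasks(grid, Task{t.x + h, t.y + h, h}, thresh, out);
}
```

With a quest hotspot in the upper-left corner of a 4x4 map, only that quadrant is split down to single regions while the rest of the map stays in coarse tasks, so adjacent lightly loaded regions remain together on one thread.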
57 Task Splitting
58 Task Re-assignment
- Assign tasks to reduce conflicts
- Keep load < threshold
[Figure: tasks assigned to threads T0, T1, T2]
59 Dynamic Load-balancing Algorithms
- All algorithms implemented on
- A cluster (single thread on each node)
- A multi-core (with multiple threads)
60 Results on Multi-core
- Load balancing algorithms
- Static
- Lightest
- Spread
- Locality (Quad-tree)
- Metrics
- Number of clients per thread
- Border conflicts
- Client update latency
61 Thread Load on Multi-core
62 Border Conflicts on Multi-core
63 Client Update Latency on Multi-core
64 Conclusion
- Support for seamless world partitioning
- Compiler and runtime parallelization support
- Tx much simpler than locks
- Locality-aware dynamic load balancing
- Can apply in server clusters, P2P mobile environments and multi-cores
65 I need your help.
- "When TM first beat locks" is a good story
- I need a more sophisticated game to make the story happen!
66 Backup Slides
67 Client Update Latency on Cluster
[Figure: client update latency; labels: STATIC, LOCALITY, most loaded, least loaded]
- All dynamic load balancing algorithms perform similarly
68 Number of Player Migrations
- Locality aware has fewest migrations
69 Average Execution Time / Request (when App changes access pattern)
70 Trade-offs
- Private thread data
- Per-thread data copy overhead (-)
- Search private data on read (-)
- No need to restore data on abort (+)
- Allows multiple concurrent writers (+)
71 Trade-offs (contd)
- Private thread data
- Per-thread data copy overhead (-)
- Search private data on read (-)
- No need to restore data on abort (+)
- Allows multiple concurrent writers (+)
- Locks
- Aborts due to deadlock (-)
- No other aborts (+)
72 A WAN Distributed Server System
- Quest lasts from 0 to 1000 sec
73 TM Code Translation (cont.)
- Based on Omni OpenMP compiler
74 Average Execution Time / Request