1
Transactional Memory and Dynamic Load Balancing
for Multiplayer Games
  • Cristiana Amza

University of Toronto
2
Once Upon a Time
  • locks were painful for dynamic and complex
    applications.
  • e.g., Massively Multiplayer Games

3
Massively Multiplayer Games
  • Support many concurrent players
  • Keep the update interval to players low

4
So, game developers said
  • Forget locks!
  • We'll use our secret sauce!

5
State-of-the-art in Game Code
  • Ad-hoc parallelization: segments/shards
  • e.g., World of Warcraft, Ultima Online
  • Sequential code, admission control
  • e.g., Quake III

6
Ad-hoc Partitioning (segments)
  • Countries, rooms

7
Artificial Admission Control
  • Admission control
  • Gateways
  • E.g., airports, doors

8
But, gamers said
  • We want to interact, and we hate lag!

9
Problem with State-of-the-art
  • Flocking
  • Players move to one area, e.g., for quests
  • This overloads the server hosting the hotspot

10
So I said
  • Forget painful locks!
  • Transactional Memory will make game developers
    and players happy!
  • Story endorsed by Intel (fall of 2006).

11
Our Goals
  • Parallelize server code into transactions
  • Easy to thread any game
  • Dynamic load balancing of transactions (tx)
  • On any platform
  • e.g., clusters, multi-cores, mobile devices
  • Beats locks any day!

12
Ideal Solution: Contiguous World
  • Seamless partition
  • Players can see across partition boundaries
  • Players can smoothly transfer
  • Regardless of game map

13
Challenge: On Multi-core
  • Inter-thread conflicts
  • Mostly at the boundary

14
Roadmap
  • The game
  • Parallelization Using TM
  • Compiler code transformations for TM
  • Runtime TM design choices
  • Dynamic load balancing of tx in game

15
Game Benchmark (SimMud)
  • Players can move and interact
  • Terrain is fixed and restricts movement
  • Food objects
  • Interactions: player-object, player-player

16
Game Benchmark (SimMud)
  • Actions: move, eat, fight
  • Quest: flocking of players to a spot on the game
    map

17
Flocking in SimMud
[Figure: game map with areas S1 to S4 and the quest location]
18
Parallelization of Server Code
[Figure: server loop in three phases; diagram labels Rx and Tx]
  • 1. Select
  • 2. Process Requests (read-write phase)
  • 3. Form & Send Replies (read-only phase)
19
Example Move Request
    Move() {
        region1->removePlayer( p );
        region2->addPlayer( p );
    }

20
Parallelize Move Request
  • Insert atomic keyword in code
  • Compiler makes it a transaction
  • Ex: #pragma omp critical / __tm_atomic
    {
        region1->removePlayer( p );
        region2->addPlayer( p );
    }

21
Ex SimMud Data Structure
    struct Region {
        int x, y;
        int width, height;
        set_t players;
        set_t objects;
    };

22
Example Code for Action Move
    void movePlayer( Player *p, int new_x, int new_y )
    {
        Region *r_old = getRegion( p->x, p->y );
        Region *r_new = getRegion( new_x, new_y );
        if( isVacant_position( r_new, new_x, new_y ) ) {
            set_remove( r_old->players, p );
            set_insert( r_new->players, p );
            p->x = new_x; p->y = new_y;
        }
    }

23
Manual Transformations (Locks)
    void movePlayer( Player *p, int new_x, int new_y )
    {
        lock_player( p );
        Region *r_old = getRegion( p->x, p->y );
        Region *r_new = getRegion( new_x, new_y );
        lock_regions( r_old, r_new );
        if( isVacant_position( r_new, new_x, new_y ) ) {
            set_remove( r_old->players, p );
            set_insert( r_new->players, p );
            p->x = new_x; p->y = new_y;
        }
        unlock_regions( r_old, r_new );
        unlock_player( p );
    }
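
A note on the lock-based version above: `lock_regions( r_old, r_new )` has to take both region locks without deadlocking against a move in the opposite direction. The slides do not show how this is done. The following is only a minimal C++ sketch under stated assumptions: the per-region `std::mutex`, the `std::set<int>` of player ids, and the function name `movePlayerLocked` are all hypothetical, and the player lock is left out for brevity. It relies on `std::lock`, which acquires several mutexes with deadlock avoidance.

```cpp
#include <cassert>
#include <mutex>
#include <set>

struct Player { int id; int x, y; };

struct Region {
    std::mutex lock;         // hypothetical per-region lock
    std::set<int> players;   // ids of the players currently in this region
};

// Moves p from r_old to r_new while holding both region locks.
void movePlayerLocked(Player& p, Region& r_old, Region& r_new,
                      int new_x, int new_y) {
    if (&r_old == &r_new) {
        // Same region: a single lock is enough (locking it twice would be an error).
        std::lock_guard<std::mutex> g(r_old.lock);
        p.x = new_x; p.y = new_y;
        return;
    }
    // std::lock takes both mutexes without risking deadlock, whatever the
    // argument order used by other threads.
    std::lock(r_old.lock, r_new.lock);
    std::lock_guard<std::mutex> g1(r_old.lock, std::adopt_lock);
    std::lock_guard<std::mutex> g2(r_new.lock, std::adopt_lock);
    r_old.players.erase(p.id);
    r_new.players.insert(p.id);
    p.x = new_x; p.y = new_y;
}
```

Even in this small case the programmer must reason about lock order and the same-region corner case, which is the burden the talk argues transactions remove.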

24
Manual Transformations (TM)
    void movePlayer( Player *p, int new_x, int new_y )
    {
        #pragma omp critical
        {
            Region *r_old = getRegion( p->x, p->y );
            Region *r_new = getRegion( new_x, new_y );
            if( isVacant_position( r_new, new_x, new_y ) ) {
                set_remove( r_old->players, p );
                set_insert( r_new->players, p );
                p->x = new_x; p->y = new_y;
            }
        }
    }

25
My Story
  • TM will make game developers and players happy!
  • So far, the developers should be!

26
It Gets Worse for Locks
  • Move
  • May impact objects within bounding box
  • Short-range or long-range
  • Lock all impacted objects
  • need to search objects

[Figure: top view of the world, showing objects and the short-range and long-range bounding boxes]
27
e.g., Quake III Area Node Tree
[Figure: top view of the world]
  • Each region corresponds to a leaf
28
e.g., Quake III Area Node Tree
[Figure: top view of the world with overlapping regions]
  • Each region corresponds to a leaf
  • Lock all leaf nodes in the bounding box atomically
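
To make "lock all leaf nodes in the bounding box" concrete, here is a small C++ sketch of the search step that a lock-based server would need before it can lock anything. It is a simplified stand-in, not Quake III code: the `Box` and `AreaNode` types, the alternating split rule in `build`, and `collectLeaves` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Box { int x0, y0, x1, y1; };   // half-open: [x0, x1) x [y0, y1)

inline bool overlaps(const Box& a, const Box& b) {
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

struct AreaNode {
    Box bounds;
    std::unique_ptr<AreaNode> child[2];   // both null for a leaf
    bool isLeaf() const { return !child[0]; }
};

// Builds a binary tree that halves the longer side at each level (assumed rule).
std::unique_ptr<AreaNode> build(const Box& b, int depth) {
    std::unique_ptr<AreaNode> n(new AreaNode());
    n->bounds = b;
    if (depth == 0) return n;
    bool splitX = (b.x1 - b.x0) >= (b.y1 - b.y0);
    Box lo = b, hi = b;
    if (splitX) { lo.x1 = hi.x0 = (b.x0 + b.x1) / 2; }
    else        { lo.y1 = hi.y0 = (b.y0 + b.y1) / 2; }
    n->child[0] = build(lo, depth - 1);
    n->child[1] = build(hi, depth - 1);
    return n;
}

// Collects every leaf whose area overlaps the query box; with locks,
// all of these leaves would have to be locked together.
void collectLeaves(const AreaNode* n, const Box& query,
                   std::vector<const AreaNode*>& out) {
    if (!overlaps(n->bounds, query)) return;
    if (n->isLeaf()) { out.push_back(n); return; }
    collectLeaves(n->child[0].get(), query, out);
    collectLeaves(n->child[1].get(), query, out);
}
```

In this sketch a short-range action well inside one leaf touches a single leaf, while an action near the center of an 8x8 world split into four leaves touches all four.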
29
Area Node Tree: Even Worse!
  • Objects linked to leaf nodes
  • If an object crosses a leaf boundary, link it to the parent node

[Figure: top view of the world with non-overlapping region leaves, object lists, and objects crossing a boundary]
30
Area Node Tree: Even Worse!
  • Need to lock parent nodes
  • False sharing
  • The whole tree may be locked

[Figure: top view of the world with non-overlapping region leaves, object lists, and objects crossing a boundary]
31
My Story
  • TM will make game developers and players happy!
  • Locks: lock down a whole box/tree. TM: just read/write
    what you need.
  • Players should be happy too!

32
Compiler/Runtime TM Support
  • Compiler
  • Automatic source transformations to tx
  • Runtime
  • track accesses
  • resolve conflicts between transactions
  • adapt to application pattern

33
Manual Transformations (TM)
    void movePlayer( Player *p, int new_x, int new_y )
    {
        #pragma omp critical
        {
            Region *r_old = getRegion( p->x, p->y );
            Region *r_new = getRegion( new_x, new_y );
            if( isVacant_position( r_new, new_x, new_y ) ) {
                set_remove( r_old->players, p );
                set_insert( r_new->players, p );
                p->x = new_x; p->y = new_y;
            }
        }
    }

34
Automatic Transformations (TM)
    void tm_movePlayer( tm_Player *p, int new_x, int new_y )
    {
        Begin_transaction
        tm_Region *r_old = tm_getRegion( p->x, p->y );
        tm_Region *r_new = tm_getRegion( new_x, new_y );
        if( tm_isVacant_position( r_new, new_x, new_y ) ) {
            tm_set_remove( r_old->players, p );
            tm_set_insert( r_new->players, p );
            p->x = new_x; p->y = new_y;
        }
        Commit_transaction
    }

35
Automatic Transformations (TM)
    struct tm_Region {
        tm_int x, y;
        tm_int width, height;
        tm_set_t players;   // recursively re-type
        tm_set_t objects;   // nested structures
    };

36
Compiler TM code translation
  • #pragma → begin/end
  • Re-type variables: tm_shared<> or tm_private<>

37
TM Runtime (libTM)
  • Access tracking: tm_type<>
  • Operator overloading for intercepting reads and
    writes
  • Access granularity: basic-type level
  • Conflict detection and resolution
  • Several design choices
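
The slides do not show libTM's actual interface, so the following is only a rough C++ sketch of how operator overloading can intercept reads and writes at basic-type granularity. The names `tm_shared_sketch`, `AccessLog`, and `currentLog` are hypothetical; a real runtime would also do conflict detection at these interception points rather than just recording addresses.

```cpp
#include <cassert>
#include <set>

// Per-thread record of which shared variables the current transaction touched.
struct AccessLog {
    std::set<const void*> reads;
    std::set<const void*> writes;
};

inline AccessLog& currentLog() {
    static thread_local AccessLog log;
    return log;
}

template <typename T>
class tm_shared_sketch {
public:
    tm_shared_sketch(T v = T()) : value_(v) {}

    // Read interception: any use as a plain T goes through this conversion.
    operator T() const {
        currentLog().reads.insert(this);
        return value_;
    }

    // Write interception: assignment from a plain T.
    tm_shared_sketch& operator=(const T& v) {
        currentLog().writes.insert(this);
        value_ = v;
        return *this;
    }

    // Assignment from another wrapped variable is a read of o plus a write of *this.
    tm_shared_sketch& operator=(const tm_shared_sketch& o) {
        return *this = static_cast<T>(o);
    }

private:
    T value_;
};
```

With this wrapper, an unmodified statement such as `y = x + 1` records a read of `x` and a write of `y`, which is why re-typing the variables is enough for the runtime to track accesses.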

38
TM Conflict Resolution Choices
  • Pessimistic
  • Reader/Writer Locks
  • Read Optimistic
  • Only writer locks
  • Fully Optimistic
  • No locks
  • Adaptive

39
Pessimistic
  • A transaction (tx) locks an object before use
  • Waits for locks held by other tx
  • Releases all locks at the end

40
Reader-Writer Locks
[Figure: transaction timeline from BEGIN to END]
  • Reader lock excludes writers
  • Writer lock excludes readers/writers
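
The pessimistic scheme and the reader-writer compatibility rules above can be modeled in a few lines. This is a single-threaded C++ sketch for illustration only: `RWLock` and `PessimisticTx` are hypothetical names, "waiting" is modeled by returning `false`, and lock upgrades (read then write by the same transaction) are not handled.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Lock state only; a real lock would block instead of returning false.
struct RWLock {
    int readers = 0;
    bool writer = false;
    bool tryRead()  { if (writer) return false; ++readers; return true; }
    bool tryWrite() { if (writer || readers > 0) return false; writer = true; return true; }
};

// A transaction locks each object before use and releases everything at the end.
class PessimisticTx {
public:
    bool read(RWLock& l) {
        if (!l.tryRead()) return false;      // would wait for the writer
        held_.push_back({&l, false});
        return true;
    }
    bool write(RWLock& l) {
        if (!l.tryWrite()) return false;     // would wait for readers/writer
        held_.push_back({&l, true});
        return true;
    }
    void end() {                             // release all locks at once
        for (auto& h : held_) {
            if (h.second) h.first->writer = false;
            else --h.first->readers;
        }
        held_.clear();
    }
private:
    std::vector<std::pair<RWLock*, bool>> held_;   // (lock, held as writer?)
};
```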
41
Read Optimistic
  • Writers take locks, readers do not
  • A write invalidates (aborts) all readers
  • a) Encounter-time: at the write

T1: BEGIN_TRANSACTION ... WRITE A ... COMMIT_TRANSACTION
T2: BEGIN_TRANSACTION READ A ... INVALID
T3: BEGIN_TRANSACTION ... READ A ... INVALID
42
Read Optimistic
  • Writers take locks, readers do not
  • A write invalidates (aborts) all readers
  • b) Commit-time: at commit

T1: BEGIN_TRANSACTION ... WRITE A ... COMMIT_TRANSACTION
T2: BEGIN_TRANSACTION READ A ... COMMIT_TRANSACTION
T3: BEGIN_TRANSACTION ... READ A ... INVALID
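
The two read-optimistic variants differ only in when the writer invalidates readers. The following single-threaded C++ sketch models that difference with a writer lock and a visible-readers set per variable. All names (`Tx`, `SharedVar`, `txRead`, `txWrite`, and so on) are hypothetical, the write is buffered in a single `pending` slot for simplicity, and a busy writer lock is modeled by returning `false` instead of waiting.

```cpp
#include <cassert>
#include <set>

enum class Invalidate { AtWrite, AtCommit };

struct Tx {
    explicit Tx(int i) : id(i) {}
    int id;
    bool valid = true;        // cleared when a conflicting writer aborts this tx
};

struct SharedVar {
    int value = 0;            // last committed value
    int pending = 0;          // value written by the current lock owner
    Tx* writer = nullptr;     // owner of the writer lock, if any
    std::set<Tx*> readers;    // visible readers
};

// Readers take no lock; they only make themselves visible.
int txRead(Tx& t, SharedVar& v) { v.readers.insert(&t); return v.value; }

void invalidateReaders(Tx& writer, SharedVar& v) {
    for (Tx* r : v.readers) if (r != &writer) r->valid = false;
    v.readers.clear();
}

// Writers take the lock; encounter-time invalidation happens right here.
bool txWrite(Tx& t, SharedVar& v, int x, Invalidate when) {
    if (v.writer && v.writer != &t) return false;   // would wait
    v.writer = &t;
    v.pending = x;
    if (when == Invalidate::AtWrite) invalidateReaders(t, v);
    return true;
}

// A reader that finishes before the writer commits is no longer at risk.
void txCommitRead(Tx& t, SharedVar& v) { v.readers.erase(&t); }

// Commit-time invalidation: readers still active at this point are aborted.
void txCommitWrite(Tx& t, SharedVar& v) {
    if (v.writer != &t) return;
    invalidateReaders(t, v);
    v.value = v.pending;
    v.writer = nullptr;
}
```

Replaying slide b) with this model, T2 reads A and commits before T1 commits, so it stays valid, while T3 is still active at T1's commit and is invalidated.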
43
Fully Optimistic
  • A write invalidates (aborts) all active readers,
    but supports multiple writers
  • Commit-time: at commit

T1: BEGIN_TRANSACTION ... WRITE A ... COMMIT_TRANSACTION
T2: BEGIN_TRANSACTION WRITE A ... COMMIT_TRANSACTION
T3: BEGIN_TRANSACTION ... READ A ... INVALID
44
Implementation Details
  • Metadata kept with each tm_shared<> variable
  • Lock, visible-readers set

45
Implementation Details
  • Validation of each read
  • Recoverability
  • Undo-logging
  • Write-buffering
  • Private thread data (needs to be searchable)
  • Necessary for fully optimistic
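
The two recoverability options can be contrasted with a small C++ sketch. This is an illustration under assumptions, not libTM code: `UndoLog` and `WriteBuffer` are hypothetical classes that handle only `int` variables, and neither does any conflict detection.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Undo-logging: write in place, remember the old value, restore on abort.
class UndoLog {
public:
    void write(int* addr, int v) {
        log_.push_back({addr, *addr});
        *addr = v;
    }
    void abort() {
        // Restore in reverse order so the oldest saved value wins.
        for (auto it = log_.rbegin(); it != log_.rend(); ++it) *it->first = it->second;
        log_.clear();
    }
    void commit() { log_.clear(); }
private:
    std::vector<std::pair<int*, int>> log_;
};

// Write-buffering: keep writes in private thread data, publish them at commit.
class WriteBuffer {
public:
    void write(int* addr, int v) { buf_[addr] = v; }
    int read(int* addr) const {
        // The private data must be searched on every read.
        auto it = buf_.find(addr);
        return it != buf_.end() ? it->second : *addr;
    }
    void abort() { buf_.clear(); }           // nothing to restore
    void commit() {
        for (auto& kv : buf_) *kv.first = kv.second;
        buf_.clear();
    }
private:
    std::map<int*, int> buf_;
};
```

Because buffered writes stay private until commit, several transactions can hold tentative writes to the same variable at once, which is what the fully optimistic scheme needs.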

46
Factors Determining Trade-offs
  • Conflict type
  • w-w conflicts favor fully optimistic
  • Conflict span
  • long → domino effect (no progress) for read
    optimistic

47
Evaluation of Design Trade-offs
[Figure: evaluation results; no. of threads: 4]
48
Roadmap
  • The game
  • Parallelization Using TM
  • Compiler code transformations for TM
  • Runtime TM design choices
  • Dynamic load balancing of tx in game

49
Parallel Server Phase Types
[Figure: server loop with a load-balancing step; diagram labels Rx and Tx]
  • Select
  • Load balancing
  • Process Requests (read-write phase)
  • Form & Send Replies (read-only phase)
50
Dynamic Load Management
  • Region: grid unit
  • Dynamic load balancing
  • Reassign regions from one server/thread to
    another

51
Conflicts vs Load Management
  • Locality, fewer conflicts
  • Keep adjacent regions on same thread

[Figure: block partition vs. global reshuffle]
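
One way to see why keeping adjacent regions on the same thread reduces conflicts is to count the borders between neighboring regions owned by different threads. The C++ sketch below does this for a small grid. The function names and the interleaved assignment are assumptions chosen for contrast, not the partitions measured in the talk.

```cpp
#include <cassert>
#include <vector>

// owner[y * w + x] is the thread that owns region (x, y).
// Counts pairs of adjacent regions owned by different threads.
int crossThreadBorders(const std::vector<int>& owner, int w, int h) {
    int count = 0;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int me = owner[y * w + x];
            if (x + 1 < w && owner[y * w + x + 1] != me) ++count;    // right neighbor
            if (y + 1 < h && owner[(y + 1) * w + x] != me) ++count;  // lower neighbor
        }
    }
    return count;
}

// Block partition: each of 4 threads owns one quadrant of the map.
std::vector<int> blockPartition(int w, int h) {
    std::vector<int> owner(w * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            owner[y * w + x] = (y >= h / 2 ? 2 : 0) + (x >= w / 2 ? 1 : 0);
    return owner;
}

// Interleaved assignment: neighbors always land on different threads.
std::vector<int> interleaved(int w, int h, int threads) {
    std::vector<int> owner(w * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            owner[y * w + x] = (x + y) % threads;
    return owner;
}
```

On a 4x4 grid with 4 threads, the block partition leaves 8 cross-thread borders, while the interleaved assignment makes all 24 borders cross-thread.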
52
Overload due to Quest
53
Reassign Load, Minimize Conflicts
54
Locality-Aware Load Balancing
[Figure: SimMud game map with quest in upper left; recorded dynamic load balancing]
55
Dynamic Load-balancing Algorithms
  • Lightest
  • Shed regions to lightest loaded thread
  • Spread
  • Best load spread across all threads
  • Locality aware
  • Keep nearby regions on same thread
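
The slides name the "Lightest" policy but give no details, so the following C++ sketch is only one plausible reading of it: while some thread is over a load threshold, it hands one region to the least loaded thread. The `RegionLoad` type, the threshold parameter, and the rule for picking which region to shed are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct RegionLoad { int owner; int players; };   // owning thread, player count

std::vector<int> threadLoads(const std::vector<RegionLoad>& regions, int threads) {
    std::vector<int> load(threads, 0);
    for (const auto& r : regions) load[r.owner] += r.players;
    return load;
}

// While the heaviest thread is over the threshold, move one of its regions
// to the lightest thread, as long as the move actually improves the balance.
void shedToLightest(std::vector<RegionLoad>& regions, int threads, int threshold) {
    for (std::size_t step = 0; step < regions.size(); ++step) {   // bounded loop
        std::vector<int> load = threadLoads(regions, threads);
        int heavy = static_cast<int>(std::max_element(load.begin(), load.end()) - load.begin());
        int light = static_cast<int>(std::min_element(load.begin(), load.end()) - load.begin());
        if (load[heavy] <= threshold) return;
        RegionLoad* best = nullptr;
        for (auto& r : regions) {
            bool improves = load[light] + r.players < load[heavy];
            if (r.owner == heavy && improves && (!best || r.players > best->players))
                best = &r;
        }
        if (!best) return;        // no move would help
        best->owner = light;
    }
}
```

Note that this policy ignores where the shed region lies on the map, so it can scatter neighboring regions across threads, which is the weakness the locality-aware algorithm targets.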

56
Locality-aware (Quad-tree)
  • Split a task when its load > threshold
  • Reassign tasks to reduce conflicts
  • Can approximate!
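
The splitting step can be sketched as a recursive quad-tree subdivision: a square task covering several regions is split into four quadrants whenever its load exceeds the threshold. This C++ sketch is an assumption-laden illustration: the `Task` type and function names are hypothetical, the map is assumed square with a power-of-two side, and load is taken to be the number of players per region.

```cpp
#include <cassert>
#include <vector>

struct Task { int x, y, size, load; };   // square of size x size regions

// Sums the players in the square starting at (x0, y0); w is the map width.
int areaLoad(const std::vector<int>& players, int w, int x0, int y0, int size) {
    int sum = 0;
    for (int y = y0; y < y0 + size; ++y)
        for (int x = x0; x < x0 + size; ++x)
            sum += players[y * w + x];
    return sum;
}

// Splits a task into quadrants while its load is above the threshold.
// A single region cannot be split further, even if it is overloaded.
void splitTask(const std::vector<int>& players, int w, const Task& t,
               int threshold, std::vector<Task>& out) {
    if (t.load <= threshold || t.size == 1) { out.push_back(t); return; }
    int half = t.size / 2;
    for (int dy = 0; dy < t.size; dy += half) {
        for (int dx = 0; dx < t.size; dx += half) {
            Task child{t.x + dx, t.y + dy, half,
                       areaLoad(players, w, t.x + dx, t.y + dy, half)};
            splitTask(players, w, child, threshold, out);
        }
    }
}
```

With a quest hotspot in the upper-left region of a 4x4 map, only the upper-left quadrant is subdivided; the lightly loaded quadrants stay whole, so nearby regions remain in the same task.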

57
Task Splitting
58
Task Re-assignment
  • Assign tasks to reduce conflicts
  • Keep load < threshold

[Figure: tasks assigned to threads T0, T1, and T2]
59
Dynamic Load-balancing Algorithms
  • All algorithms implemented on
  • A cluster (single thread on each node)
  • A multi-core (with multiple threads)

60
Results on Multi-core
  • Load balancing algorithms
  • Static
  • Lightest
  • Spread
  • Locality (Quad-tree)
  • Metrics
  • Number of clients per thread
  • Border conflicts
  • Client update latency

61
Thread Load on Multi-core
62
Border Conflicts on Multi-core
63
Client Update Latency on Multi-core
64
Conclusion
  • Support for seamless world partitioning
  • Compiler & runtime parallelization support
  • Transactions (tx) are much simpler than locks
  • Locality aware dynamic load balancing
  • Can apply in server clusters, P2P mobile
    environments and multi-cores

65
I need your help.
  • "When TM first beat locks" is a good story
  • I need a more sophisticated game to make the
    story happen!

66
Backup Slides
67
Client Update Latency on Cluster
[Figure: STATIC vs. LOCALITY, most loaded and least loaded]
  • All dynamic load balancing algorithms perform similarly

68
Number of Player Migrations
  • Locality aware has fewest migrations

69
Average Execution Time / Request (when the application changes its access pattern)
70
Trade-offs
  • Private thread data
  • Per-thread data copy overhead (-)
  • Search private data on read (-)
  • No need to restore data on abort (+)
  • Allows multiple concurrent writers (+)

71
Trade-offs (cont'd)
  • Private thread data
  • Per-thread data copy overhead (-)
  • Search private data on read (-)
  • No need to restore data on abort (+)
  • Allows multiple concurrent writers (+)
  • Locks
  • Aborts due to deadlock (-)
  • No other aborts (+)

72
A WAN Distributed Server System
  • Quest lasts from 0 to 1000 sec
73
TM code translation (cont.)
  • Based on Omni OpenMP compiler

74
Average Execution Time / Request