Title: Parallel Rendering Immediate Mode
1Parallel Rendering (Immediate Mode)
Tamer Fahmy
2Overview
- Immediate / retained mode
- Why immediate mode rendering
- Different kinds of application types
- Sorting algorithms
- Display reassembly in hard- and software
- A working example: Chromium
- Resources
4Immediate mode
- Application itself maintains the data that describes a model
- Model data is immediately available and not duplicated by the graphics system
5Retained mode
- The graphics system retains a copy of all the data describing a model
- The application must completely specify a model by passing model data to the system in predefined data structures
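The difference between the two modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RetainedSystem` and `ImmediateSystem` are hypothetical stand-ins, not the API of any real graphics library, and triangles are reduced to vertex-index triples.

```python
class RetainedSystem:
    """Hypothetical retained-mode system: keeps its own copy of the model."""

    def __init__(self):
        self.scene = []

    def add_model(self, triangles):
        # The model is copied into the system's own data structure.
        self.scene.append([tuple(t) for t in triangles])

    def render(self):
        # Drawing uses the private copy, not the application's data.
        return [t for model in self.scene for t in model]


class ImmediateSystem:
    """Hypothetical immediate-mode system: stores nothing between frames."""

    def __init__(self):
        self.frame = []

    def begin_frame(self):
        self.frame = []

    def draw_triangle(self, triangle):
        # The primitive is consumed as soon as it is submitted.
        self.frame.append(tuple(triangle))


# The application owns this model (two triangles as vertex-index triples).
model = [[0, 1, 2], [2, 3, 0]]

retained = RetainedSystem()
retained.add_model(model)
immediate = ImmediateSystem()

model[0][0] = 9  # the application edits its own data

immediate.begin_frame()
for tri in model:
    immediate.draw_triangle(tri)

retained_frame = retained.render()   # still shows the old copy
immediate_frame = immediate.frame    # reflects the edit at once
```

In the retained case the edit is invisible until the application passes the model to the system again; in the immediate case there is no second copy to keep in sync.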
6Why immediate mode clustering
- Cheap, off-the-shelf standard components
- Scalability
- Can be easily upgraded as technology improves
7Different kinds of applications
- Compute limited
- Graphics limited
- Geometry limited
- Pixel fill limited
- Interface limited
- Resolution limited
8Sorting algorithms
- Sort-first
- Sort-last
- Sort-middle
- Hybrid sort-first sort-last
10Sort-first
- Screen space is partitioned into non-overlapping 2D tiles
- Each tile rendered independently by a PC graphics card
11Sort-first
- Advantage: communication requirements are relatively small
- Drawback: extra work must be done to transform graphics primitives
- Drawback: graphics primitives are rendered redundantly if they overlap multiple tiles
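A minimal sketch of the sort-first assignment step, assuming each primitive has already been reduced to a screen-space bounding box `(x0, y0, x1, y1)` and that tiles are square. The function names are hypothetical. It shows where the redundant rendering comes from: a primitive is sent to every tile its box overlaps.

```python
def tiles_for_bbox(bbox, tile_size, grid_w, grid_h):
    """Return the (col, row) tiles overlapped by a screen-space bounding box."""
    x0, y0, x1, y1 = bbox
    col0 = max(0, int(x0 // tile_size))
    row0 = max(0, int(y0 // tile_size))
    col1 = min(grid_w - 1, int(x1 // tile_size))
    row1 = min(grid_h - 1, int(y1 // tile_size))
    return [(c, r) for r in range(row0, row1 + 1) for c in range(col0, col1 + 1)]


def sort_first(bboxes, tile_size, grid_w, grid_h):
    """Bucket primitive indices by the tiles they overlap."""
    buckets = {(c, r): [] for r in range(grid_h) for c in range(grid_w)}
    for prim_id, bbox in enumerate(bboxes):
        for tile in tiles_for_bbox(bbox, tile_size, grid_w, grid_h):
            buckets[tile].append(prim_id)
    return buckets


# A 2x2 grid of 256-pixel tiles (a 512x512 screen).
bboxes = [
    (10, 10, 100, 100),    # inside tile (0, 0) only
    (200, 50, 300, 120),   # straddles tiles (0, 0) and (1, 0)
    (250, 250, 260, 260),  # sits on the corner shared by all four tiles
]
buckets = sort_first(bboxes, 256, 2, 2)

# Primitives rendered summed over all tiles; larger than len(bboxes)
# whenever a primitive overlaps more than one tile.
total_work = sum(len(ids) for ids in buckets.values())
```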
12Sort-last
- Each processor renders a separate image containing a portion of the graphics primitives
- Resulting images are composited (with depth comparisons)
13Sort-last
- Advantage: scalability
- Drawback: requires an image composition network with very high bandwidth and processing capabilities
- Drawback: no strict primitive ordering semantics
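The compositing step can be illustrated with a small sketch. It assumes each node delivers a full-screen image as a list of `(color, depth)` pairs, with smaller depth meaning closer to the viewer; the names are hypothetical and real systems do this per pixel in hardware or over a network.

```python
def composite(image_a, image_b):
    """Merge two partial renders pixel by pixel, keeping the nearer fragment."""
    return [a if a[1] <= b[1] else b for a, b in zip(image_a, image_b)]


FAR = float("inf")
BACKGROUND = ("bg", FAR)  # pixel where a node drew nothing

# Each node rendered only its own share of the primitives.
node0 = [("red", 0.5), BACKGROUND, ("red", 0.9), BACKGROUND]
node1 = [("blue", 0.7), ("blue", 0.2), ("blue", 0.4), BACKGROUND]

final = composite(node0, node1)
colors = [color for color, _ in final]
```

Every node ships a whole image for every frame, which is why the composition network needs so much bandwidth. The result also depends only on depth, not on the order in which primitives were submitted, which is why strict ordering semantics are lost.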
14Sort-middle
- Processing of graphics primitives is partitioned equally among geometry processors
- Processing of pixels is partitioned among rasterization processors
15Sort-middle
- Drawback: graphics accelerators offer no high-performance access to the results of geometry processing
- Drawback: network communication performance is currently too slow
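A rough sketch of the two sort-middle stages, under simplifying assumptions: primitives are single points, the geometry stage is reduced to a precomputed screen x coordinate, and each rasterizer owns one vertical strip of the screen. The function name is hypothetical.

```python
def sort_middle(primitives, n_geometry, n_raster, screen_width):
    # Stage 1: deal primitives round-robin so each geometry processor
    # gets an equal share, regardless of where they land on screen.
    geometry_queues = [primitives[i::n_geometry] for i in range(n_geometry)]

    # Stage 2: after transformation, route each primitive to the
    # rasterizer that owns its vertical screen strip.
    strip_width = screen_width / n_raster
    raster_queues = [[] for _ in range(n_raster)]
    for queue in geometry_queues:
        for name, screen_x in queue:  # screen_x stands in for the transform result
            strip = min(n_raster - 1, int(screen_x // strip_width))
            raster_queues[strip].append(name)
    return geometry_queues, raster_queues


prims = [("a", 10), ("b", 700), ("c", 300), ("d", 20), ("e", 790), ("f", 410)]
geometry_queues, raster_queues = sort_middle(prims, 3, 2, 800)
```

The redistribution in stage 2 is the "sort in the middle": transformed primitives have to travel from geometry processors to rasterizers, which is the traffic the drawbacks above refer to.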
16Hybrid Sort-first Sort-last
- Dynamically partitions both the 2D screen into tiles and the 3D polygons into groups
- 3 to 4 times better than sort-first
- 33% to 55% better than sort-last
17Hardware (Lightning-2)
- 4 Digital Visual Interface (DVI) inputs (digital scan-out of the framebuffer)
- 8 DVI outputs
- A "pixel bus" for more inputs
- Can be chained to provide more outputs
- Advantage: no overhead
- Drawback: special hardware needed
18Display reassembly in hard- and software
19Software
- Advantage: cheap, no special hardware needed
- Drawback: network overhead
- Drawback: pixel read performance of the framebuffer
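Software reassembly can be sketched as pasting tiles, read back from each render node, into one framebuffer. The sketch below assumes tiles arrive as lists of pixel rows together with their screen offset; the helper names and the example numbers are hypothetical. The second function is plain arithmetic showing why readback and network traffic become the limiting factors.

```python
def reassemble(tiles, width, height):
    """Paste tiles read back from render nodes into one framebuffer.

    tiles: list of (x, y, rows), where rows is a list of pixel rows.
    """
    frame = [[None] * width for _ in range(height)]
    for x, y, rows in tiles:
        for dy, row in enumerate(rows):
            frame[y + dy][x:x + len(row)] = row
    return frame


def bytes_per_second(width, height, bytes_per_pixel, frames_per_second):
    """Pixel data that must be read back and moved for the whole display."""
    return width * height * bytes_per_pixel * frames_per_second


# Two 2x2 tiles side by side form a 4x2 display.
tiles = [
    (0, 0, [["A", "A"], ["A", "A"]]),
    (2, 0, [["B", "B"], ["B", "B"]]),
]
frame = reassemble(tiles, 4, 2)

# Assumed example: 1024x768 display, 3 bytes per pixel, 30 frames per second.
traffic = bytes_per_second(1024, 768, 3, 30)
```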
20A working example: Chromium
- Features
- Synchronization primitives
- SPUs (stream processing units)
- Configuration
21A working example: Chromium
- Features
- Sort-first, sort-last, hybrid parallel rendering
- OpenGL command stream filtering
- Existing OpenGL programs can be used without modification
- Special synchronization primitives
- Runs on Linux, IRIX and Windows-based systems
- An open-source project
22A working example: Chromium
- Synchronization primitives
- Problem: OpenGL has ordered semantics
- Each parallel process is responsible for a model as if it were the only process in the world
- Barriers
- Semaphores
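How a barrier restores ordering between parallel submitters can be sketched with Python's standard `threading` module. This is a stand-in for the idea only, not Chromium's own synchronization API: each worker draws its part as if it were alone, and the buffer swap is issued only after every worker has passed the barrier.

```python
import threading

N_WORKERS = 3
barrier = threading.Barrier(N_WORKERS)
log = []                      # shared record of issued commands
log_lock = threading.Lock()   # protects the log


def worker(rank):
    # Phase 1: each process submits its part of the frame independently.
    with log_lock:
        log.append(("draw", rank))
    # No worker proceeds until every worker has finished drawing.
    barrier.wait()
    # Phase 2: exactly one worker issues the buffer swap.
    if rank == 0:
        with log_lock:
            log.append(("swap", rank))


threads = [threading.Thread(target=worker, args=(r,)) for r in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The draw entries may appear in any order, but the swap is always last. A semaphore would be used in the same spirit when only one process at a time may touch a shared resource.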
23A working example: Chromium
- SPUs (stream processing units)
- "OpenGL Stream Processing Units"
- Implemented as dynamically loadable modules
- Can be chained
- Some SPUs: tilesort, readback, passthrough, nop, hiddenline, etc.
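The chaining idea can be sketched as a pipeline of small objects that each receive a command, optionally change or drop it, and hand it on. The classes below are hypothetical illustrations, not Chromium's SPU interface, and commands are reduced to tuples of a function name and its arguments.

```python
class StreamUnit:
    """Hypothetical stand-in for a stream processing unit."""

    def __init__(self, downstream=None):
        self.downstream = downstream

    def process(self, command):
        return command  # default behaviour: pass the command through unchanged

    def submit(self, command):
        result = self.process(command)
        if result is not None and self.downstream is not None:
            self.downstream.submit(result)


class DropUnit(StreamUnit):
    """Filters the stream by discarding selected commands."""

    def __init__(self, dropped, downstream=None):
        super().__init__(downstream)
        self.dropped = set(dropped)

    def process(self, command):
        return None if command[0] in self.dropped else command


class RecordUnit(StreamUnit):
    """End of the chain: records what it receives (stands in for rendering)."""

    def __init__(self):
        super().__init__()
        self.received = []

    def process(self, command):
        self.received.append(command)
        return command


# Chain: pass-through -> filter -> sink.
sink = RecordUnit()
chain = StreamUnit(downstream=DropUnit({"glLightfv"}, downstream=sink))

for cmd in [("glClear",), ("glLightfv",), ("glVertex3f", 0.0, 0.0, 0.0)]:
    chain.submit(cmd)

received = sink.received
```

Because every unit has the same interface, units can be reordered or swapped without touching the application that produces the stream.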
26A working example: Chromium
- Configuration
- Using Python(!) scripts
- Configuration mothership
- components need to know what to do
- controlled by scripts
- crappfaker, crserver
27A working example: Chromium
- Part of a configuration script in Python:
- import sys
- sys.path.append('../server')
- from mothership import *
- ...
- server_spu = SPU('render')
- client_spu = SPU(clientspuname)
- server_spu.Conf('window_geometry', 100, 100, 500, 500)
- server_node = CRNetworkNode()
- server_node.AddSPU(server_spu)
- if (clientspuname == 'tilesort'):
-     server_node.AddTile(0, 0, 500, 500)
28A working example: Chromium (cont.)
- client_node = CRApplicationNode()
- client_node.AddSPU(client_spu)
- client_spu.AddServer(server_node, 'tcpip')
- ...
- cr = CR()
- cr.MTU(16*1024)
- cr.AddNode(client_node)
- cr.AddNode(server_node)
- cr.Go()
30Resources (1/3)
- WireGL: A Scalable Graphics System for Clusters http://graphics.stanford.edu/papers/wiregl/clust_papi.pdf
- Chromium: A Stream Processing Framework for Interactive Rendering on Clusters http://graphics.stanford.edu/papers/cr/cr_lowquality.pdf
- Hybrid Sort-First and Sort-Last Parallel Rendering with a Cluster of PCs http://www.cs.princeton.edu/rudro/gh2k.pdf
31Resources (2/3)
- Parallel Texture Caching http://graphics.stanford.edu/papers/parallel_texture/parallel_texture.pdf
- Prefetching in a Texture Cache Architecture http://graphics.stanford.edu/papers/texture_prefetch/texture_prefetch_down.pdf
- Lightning-2: A High-Performance Display Subsystem http://graphics.stanford.edu/papers/lightning2/lightning2.pdf
32Resources (3/3)
- Retained and Immediate Modes http://developer.apple.com/techpubs/quicktime/qtdevdocs/QD3D/qd3dintroduction.7.htm
- High Performance Parallel Rendering on a PC Cluster http://www.cs.princeton.edu/rudro/cluster-rendering/