Title: Computing Architectures for Virtual Reality
1 Computing Architectures for Virtual Reality
Electrical and Computer Engineering Dept.
2 Computer (rendering pipeline)
System architecture
3 Computing Architectures
The VR Engine
Definition: A key component of the VR system which reads its input devices, accesses task-dependent databases, updates the state of the virtual world and feeds the results to the output displays. It is an abstraction: it can mean one computer, several co-located cores in one computer, several co-located computers, or many remote computers collaborating in a distributed simulation.
4 Computing Architectures
- The real-time characteristic of VR requires a VR engine which is powerful in order to assure
- fast graphics and haptics refresh rates (30 fps for graphics and hundreds of Hz for haptics)
- low latencies (<100 ms to avoid simulation sickness)
- At the core of such an architecture is the rendering pipeline.
- Within the scope of this course, rendering is extended to include haptics.
5 Computing Architectures
The Graphics Rendering Pipeline
The process of creating a 2-D image from a 3-D model is called rendering. The rendering pipeline has three functional stages. The speed of the pipeline is that of its slowest stage.
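Since the pipeline runs at the speed of its slowest stage, the achievable frame rate is set by the largest per-stage time. A minimal sketch of this bottleneck rule (the stage timings below are made-up examples, not measured values):

```python
# Pipeline throughput is limited by the slowest stage.
# Per-stage times are hypothetical, in milliseconds per frame.
stage_ms = {"application": 8.0, "geometry": 12.0, "rasterizer": 10.0}

def pipeline_fps(stage_ms):
    # The slowest (largest) stage time sets the output frame interval,
    # even though the other stages could run faster.
    slowest = max(stage_ms.values())
    return 1000.0 / slowest

fps = pipeline_fps(stage_ms)  # limited by the 12 ms geometry stage
```

Speeding up a non-bottleneck stage leaves the frame rate unchanged; only shortening the slowest stage helps.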
6 The Graphics Rendering Pipeline
Old rendering pipelines were implemented in software (slow). Modern pipeline architectures use parallelism and buffers. The application stage is implemented in software, while the other stages are hardware-accelerated.
7 - Modern pipelines also do anti-aliasing for
points, lines or the whole scene
Aliased polygons (jagged edges)
Anti-aliased polygons
8 - How is anti-aliasing done? Each pixel is subdivided (sub-sampled) into n regions, and each sub-pixel has a color.
The anti-aliased pixel is given a shade of green-blue (5/16 blue + 11/16 green). Without sub-sampling the pixel would have been entirely green, the color of the center of the pixel.
(from Wildcat manual)
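The sub-sampling scheme above can be sketched directly: the final pixel color is the average of its sub-sample colors, reproducing the 5/16 blue + 11/16 green example:

```python
def antialiased_color(subsamples):
    """Average the (r, g, b) colors of a pixel's sub-samples."""
    n = len(subsamples)
    return tuple(sum(c[i] for c in subsamples) / n for i in range(3))

GREEN, BLUE = (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
# 11 of 16 sub-samples fall on the green polygon, 5 on the blue one.
pixel = antialiased_color([GREEN] * 11 + [BLUE] * 5)
# pixel is the intermediate green-blue shade (0.0, 11/16, 5/16)
```

With a single sample at the pixel center, the whole pixel would snap to pure green, which is exactly the jagged-edge artifact anti-aliasing removes.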
9 - More samples produce better anti-aliasing
8 sub-samples/pixel
16 sub-samples/pixel
From Wildcat SuperScene manual
http://62.189.42.82/product/technology/superscene_antialiasing.htm
10 Ideal vs. real pipeline output (fps) vs. scene complexity (influence of pipeline bottlenecks)
HP 9000 workstation
11 Computing Architectures
The Rendering Pipeline
12 - The application stage
- Is done entirely in software by the CPU
- It reads input devices (such as gloves, mouse)
- It changes the coordinates of the virtual camera
- It performs collision detection and collision response (based on object properties) for haptics
- One form of collision response is force feedback.
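The collision detection and response steps can be sketched for the simplest haptic case, a fingertip pressing into a virtual wall. The penalty-based (spring) force law is a common choice; the stiffness value and wall position here are assumptions for illustration:

```python
def wall_contact_force(finger_x, wall_x=0.0, stiffness=500.0):
    """Penalty-based collision response: force proportional to penetration.
    finger_x, wall_x in meters; stiffness in N/m (hypothetical value).
    Returns 0 when there is no collision."""
    penetration = wall_x - finger_x   # finger is past the wall when x < wall_x
    if penetration <= 0.0:
        return 0.0                    # no contact, so no force feedback
    return stiffness * penetration    # spring force pushing the finger out

# 2 mm of penetration with k = 500 N/m gives a 1 N restoring force.
force = wall_contact_force(-0.002)
```

A real haptic loop evaluates a law like this at hundreds to a thousand Hz, which is why the haptics refresh rate requirement is so much higher than the graphics one.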
13 - Application stage optimization
- Reduce model complexity (models with fewer polygons mean less to feed down the pipe)
Higher-resolution model: 134,754 polygons.
Low-resolution model: 600 polygons
14 - Application stage optimization
- Reduce floating-point precision (single precision instead of double precision)
- Minimize the number of divisions
- Since all is done by the CPU, to increase speed a dual-processor (super-scalar) architecture is recommended.
15 Computing Architectures
The Rendering Pipeline
Rendering pipeline
16 - The geometry stage
- Is done in hardware
- Consists first of model and view transforms (to be discussed in Chapter 5)
- Next the scene is shaded based on light models
- Finally the scene is projected, clipped, and mapped to the screen coordinates.
17 - The lighting sub-stage
- It calculates the surface color based on
- the type and number of simulated light sources
- the lighting model
- the reflective surface properties
- atmospheric effects such as fog or smoke.
- Lighting results in object shading, which makes the scene more realistic.
18 Computing architectures
Iλ = Iaλ Ka Odλ + fatt Ipλ [Kd Odλ cos θ + Ks Osλ cosⁿ α]
where
Iλ is the intensity of light of wavelength λ
Iaλ is the intensity of ambient light
Ka is the surface ambient reflection coefficient
Odλ is the object diffuse color
fatt is the atmospheric attenuation factor
Ipλ is the intensity of the point light source of wavelength λ
Kd is the diffuse reflection coefficient
Ks is the specular reflection coefficient
Osλ is the object specular color
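The illumination equation translates directly into code. A minimal sketch for one wavelength and one point light; θ is the angle between the surface normal and the light direction, α the angle between the reflection and view directions, and all coefficient values in the example call are made up:

```python
import math

def illumination(Ia, Ka, Od, f_att, Ip, Kd, Ks, Os, theta, alpha, n):
    """One-wavelength illumination with ambient, diffuse and specular terms.
    n is the specular (shininess) exponent. Cosines are clamped at zero so
    surfaces facing away from the light receive no diffuse/specular light."""
    ambient = Ia * Ka * Od
    diffuse = Kd * Od * max(math.cos(theta), 0.0)
    specular = Ks * Os * max(math.cos(alpha), 0.0) ** n
    return ambient + f_att * Ip * (diffuse + specular)

# Light along the normal and view along the reflection (theta = alpha = 0):
# both the diffuse and specular terms are at their maximum.
I = illumination(Ia=1.0, Ka=0.1, Od=0.5, f_att=1.0, Ip=1.0,
                 Kd=0.6, Ks=0.4, Os=1.0, theta=0.0, alpha=0.0, n=10)
```

In a full renderer this is evaluated per light source and summed, one channel per wavelength band (R, G, B).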
19 - The lighting sub-stage optimization
- It takes less computation for fewer lights in the scene
- The simpler the shading model, the fewer the computations (and the less realism):
- Wire-frame models
- Flat-shaded models
- Gouraud-shaded
- Phong-shaded.
20 - The lighting models
- Wire-frame is simplest: it only shows the visible polygon edges
- The flat-shaded model assigns the same color to all pixels on a polygon (or side) of the object
- Gouraud or smooth shading interpolates colors inside the polygons based on the colors computed at the vertices
- Phong shading interpolates the vertex normals before calculating the light intensity based on the model described; it is the most realistic shading model.
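Gouraud shading's per-pixel color interpolation can be sketched with barycentric weights over a triangle whose vertex colors were already computed by the lighting sub-stage (the vertex colors here are arbitrary examples):

```python
def gouraud_color(c0, c1, c2, w0, w1, w2):
    """Interpolate already-lit vertex colors across a triangle using
    barycentric weights (w0 + w1 + w2 == 1). Each c is an (r, g, b) tuple."""
    return tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(c0, c1, c2))

# A pixel at the triangle centroid blends the three vertex colors equally.
red, green, blue = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
center = gouraud_color(red, green, blue, 1 / 3, 1 / 3, 1 / 3)
```

Phong shading interpolates the vertex *normals* with the same kind of weights and then evaluates the illumination equation per pixel, which is why it is costlier but more realistic.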
21 Computing architectures
Wire-frame model
Flat shading model
Gouraud shading model
22 - The rendering speed vs. surface polygon type
- The way surfaces are described influences rendering speed. If surfaces are described by triangle meshes, the rendering will be faster than for the same object described by independent quadrangles or higher-order polygons. This is due to the graphics board architecture, which may be optimized to render triangles.
- Example: the rendering speed of the SGI Reality Engine.
23 SGI Onyx 2 with Infinite Reality
24 Computing Architectures
The Rendering Pipeline
25 - The Rasterizer Stage
- Performs operations in hardware for speed
- Converts vertex information from the geometry stage (x, y, z, color, texture) into pixel information on the screen
- The pixel color information is in the color buffer
- The pixel z-value is stored in the Z-buffer (which has the same size as the color buffer)
- Assures that the primitives that are visible from the point of view of the camera are displayed.
26 - The Rasterizer Stage - continued
- The scene is rendered in the back buffer
- It is then swapped with the front buffer, which stores the current image being displayed
- This process eliminates flicker and is called double buffering
- All the buffers on the system are grouped into the frame buffer.
27 - Testing for pipeline bottlenecks
- If the CPU operates at 100%, then the pipeline is CPU-limited (bottleneck in the application stage)
- If the performance increases when all light sources are removed, then the pipeline is transform-limited (bottleneck in the geometry stage)
- If the performance increases when the resolution of the display window, or its size, is reduced, then the pipeline is fill-limited (bottleneck in the rasterizer stage).
28 Transform-limited (reduce level of detail)
Fill-limited (increase realism)
29 The Pipeline Balancing
Single buffering
Application (75%)
Geometry (75%)
Rasterizer (100%)
Double buffering, balanced pipeline
Application (90%)
Geometry (95%)
Rasterizer (100%)
30 Computing Architectures
The Haptics Rendering Pipeline
The process of computing the forces and mechanical textures associated with haptic feedback. It is done in software and in hardware. It has three stages too.
31 PC graphics architecture: the PC is King!
- Went from a 66 MHz Intel 486 in 1994 to a 3.6 GHz Pentium IV today
- Newer PC CPUs are dual (or quad) core; this improves performance by 50%
- Went from 7,000 G-shaded poly/sec (Spea Fire board) in 1994 to 27 million G-shaded poly/sec (Fire GL 2, used to be in our lab)
- Today PCs are used for single or multiple users, single or tiled displays
- Intensely competitive industry.
32 PC bus architecture: just as important
- Went from the 33 MHz Peripheral Component Interconnect (PCI) bus to the 264 MHz Accelerated Graphics Port (AGP 4x) bus, and doubled again in AGP 8x
- Larger throughput and lower latency, since address bus lines are decoupled from data lines. AGP uses sideband lines.
33 Intel 820/850 chipset
Graphics accelerator (memory, processors)
AGP 8x rate: 2 GB/s unidirectional (533 MHz x 32 bits)
PCI transfer rate: 133 MB/s (33 MHz x 32 bits)
PCI Express rate: 4 GB/s bidirectional
Today's PC system architecture
34 PC system architecture for the VR Teaching Lab
35 PC system architecture for VR Teaching Lab
36 Fire GL 2
Stereo glasses connector
Passive coolers
AGP bus connector
37 Fire GL 2 architecture
38 - Fire GL 2 features
- 27 million G-shaded, non-textured polygons/sec
- Fill rate is 410 MPixels/sec
- Supports up to 16 light sources
- Has a 300 MHz D/A converter
39 Stereo glasses connector
Fire GL X3 256
Passive coolers
DVI-I video output
AGP bus connector
40 Fire GL X3-256 architecture
- 24-bit pixel processing, 12 pixel pipes
- dual 10-bit DAC and dual DVI-I connections
- does not have Genlock
- anti-aliased points and lines
- quad-buffered stereo 3D support (2 front and 2
back buffers)
41 NVIDIA Quadro FX 4000
500 MHz DDR Memory
Graphics Processing Unit (GPU)
42 NVIDIA Quadro FX 4000 architecture
- dual DVI-I connections
- 32-bit pixel processing, 16 pixel pipes
- has Genlock
- anti-aliased points and lines
- quad-buffered stereo 3D support
43 FireGL X3-256 vs. NVIDIA Quadro vs. 3DLabs
44 CPU Evolution to Multi-Core
- Places several processors on a single chip.
- It has faster communication between cores than between separate processors
- Each core has its own resources (L1 and L2 caches), unlike multiple threads on a single core.
- It is more energy efficient and results in higher performance
45 Multi-core details
46 AMD64 x2 Architecture
47 Guts of Native Quad Core (Next Gen)
48 The X-Box 360
- Aims at a balance between hardware, software and service
- Has a flexible design by abandoning the nVidia-only deal of the Xbox
- Uses a multi-core design on a single die, like having three PowerPC CPUs running at 3.2 GHz
- Each of the three cores can process two threads at a time (like 6 conventional processors)
- Each core has a SIMD unit, which exploits real-time graphics data parallelism
49 The X-Box 360
- The GPU has a Unified Shader Architecture, meaning one unit that does both the geometry and rasterization stages (vs. separate vertex and pixel shaders)
- The Arbiter retrieves commands from the Reservation Stations and delivers them to the appropriate Processing Engine
- The Xbox 360 has several Arbiters and 48 ALUs
50 The X-Box 360
- The GPU has embedded 10 MB DRAM for use as a frame buffer
- Resolution up to 1920x1080 with full-screen anti-aliasing
- The GPU has the memory controller connecting to the 3 cores at 22 GB/sec
- Renders 500 million triangles/sec with a fill rate of 16 Gsamples/sec
51 PlayStation 3 Information
- Two simultaneous high-definition television streams, for use on a title screen for an HD Blu-ray movie.
- High-definition IP video conferencing.
- EyeToy interactive reality games.
- EyeToy voice command recognition.
- EyeToy virtual object manipulation.
- Digital photograph display (JPEG).
- MP3 and ATRAC download and playback.
- Simultaneous World Wide Web access and gameplay.
- Hub/Home Ethernet Gaming Network.
- The ability to have 7 controllers at once.
52 PS3 Specs
- PS3 CPU: Cell processor, developed by IBM
- PowerPC-based core @ 3.2 GHz
- 1 VMX vector unit per core
- 512 KB L2 cache
- 7 x SPE @ 3.2 GHz
- 7 x 128-bit, 128 SIMD GPRs
- 7 x 256 KB SRAM for SPE
- 1 of 8 SPEs reserved for redundancy
- Total floating-point performance: 218 GFLOPS
53 Cell Processor Architecture
- The PowerPC core present in the system is a general-purpose 64-bit PowerPC processor that handles the Cell BE's general-purpose workload (e.g., the operating system) and manages special-purpose workloads for the SPEs.
- The SPEs are SIMD units capable of operating on 128-bit vectors consisting of four 32-bit operand types at a time. Each SPE has a large register file of 128 x 128-bit registers for operating on 128-bit vector data types and has an instruction set heavily biased towards vector computation. The SPEs have a fairly simple implementation to save power and silicon area.
54 Element Interconnect Bus (the communication path)
- It turns out that the physical center of the processor is not any of the processor elements, but the bus which connects them.
- Main memory bandwidth: about 25.6 GB/s
- I/O bandwidth: 35 GB/s inbound and another 40 GB/s outbound
- And a fair amount of bandwidth left over for moving data within the processor.
55 PlayStation 3 use of the multi-core processor (IEEE Spectrum, 2006)
56 PS3 chip physical layout
57 Screenshot - Resident Evil
58 Screenshot - Gran Turismo
59PlayStation 3 Videos
FFVII Tech Demo
Madden Nextgen Demo
60 Other I/O Components
- Audio/video output
- Supported screen sizes: 480i, 480p, 720p, 1080i, 1080p
- Two HDMI (Type A) outputs (dual-screen HD outputs)
- S/PDIF optical output for digital audio
- Multiple analog outputs (Composite, S-Video, Component video)
- Sound
- Dolby Digital 5.1, DTS, LPCM (DSP functionality handled by the Cell processor)
61 The Nintendo Wii
- Nintendo's fifth video game console; 1.2 million sold by February 1, 2007.
- The concept involved focusing on a new form of player interaction: accelerometer and IR tracking
- Contains solid-state accelerometers and gyroscopes.
- Tilting and rotation up and down, left and right, and along the main axis (as with a screwdriver).
- Acceleration up/down, left/right, toward the screen and away.
- Dramatically improved interface for video games.
- Innovative controller, integrates vibration feedback.
- Uses Bluetooth technology, 30-foot range.
- As a pointing device, can send a signal up to 15 feet away. Up to 4 Wii Remotes connected at once.
62Playing tennis with Nintendo Wii
63 http://www.winsupersite.com/showcase/xbox360_vs_ps3.asp
64 - Graphics Benchmarks
- Benchmarks established by independent organizations
- Allow comparison of graphics card performance based on standardized application cases.
- Can be application-specific, like SPECapc (Application Performance Characterization)
- Or general-purpose for OpenGL architectures, like SPECviewperf
65 for OpenGL-based systems
66 Accelerator boards: viewperf 8.0.1 comparison
- SPECviewperf is a portable OpenGL performance benchmark program written in C. SPECviewperf reports performance in frames per second.
- There are eight tests:
- 3ds max, for graphics design software.
- CATIA (DX), for CAD design applications.
- EnSight (DRV), a 3D visualization package.
- Maya, an animation application.
- ProEngineer
- Lightscape, a radiosity application for large data sets.
- Solidworks
- Unigraphics
for OpenGL-based systems
67 Accelerator boards: viewperf 9.1
- Larger, more complex viewsets that place greater stress on graphics hardware
- Memory and list allocation improvements that allow data to be reused and shared in the same manner as within actual applications
- Better compression, enabling the inclusion of larger viewsets
- Mixing of primitive types and graphics modes, helping to ensure that optimizations for a viewset will be reflected in real-world performance.
68 Accelerator boards: Viewperf comparison
- Updated regularly at www.spec.org
- SPECviewperf uses a weighted geometric mean formula to determine scores:
- Geometric mean (fps) = (test1)^weight1 x (test2)^weight2 x ... x (testN)^weightN
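The score formula above is a weighted geometric mean of the per-test frame rates. A minimal sketch, assuming the weights sum to 1 (the test values below are hypothetical, not real Viewperf results):

```python
def viewperf_score(fps_list, weights):
    """Weighted geometric mean: product of test_i ** weight_i,
    with the weights summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    score = 1.0
    for fps, w in zip(fps_list, weights):
        score *= fps ** w
    return score

# Two hypothetical tests, equally weighted: the score is the
# geometric mean of 20 and 80 fps, i.e. about 40 fps.
score = viewperf_score([20.0, 80.0], [0.5, 0.5])
```

Unlike an arithmetic mean, the geometric mean prevents one very fast test from masking a very slow one, which is why SPEC uses it.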
69 Accelerator boards Viewperf comparison
70 Accelerator boards Viewperf comparison
71
72 - Workstation-based architectures
- Second-largest computation base
- The Unix system is well suited for VR multi-tasking needs
- A multi-processor, superscalar architecture is also appropriate for VR real-time needs
- Example: SGI InfiniteReality
73 The SGI InfiniteReality computer
- A massively parallel architecture based on proprietary ASIC technology. Was considered for a long time the crème-de-la-crème of VR computers.
- Can have up to 24 R10000 CPUs in the application stage.
- The geometry board consists of a host interface processor (HIP), a geometry distributor and geometry engines (with a FIFO queue)
- The HIP's task is to pull data from main memory (using DMA); it also has its own 16 MB cache, such that the need to pull data is reduced.
74 Influence of HIP Display List caching
75
76 The SGI InfiniteReality - continued
- The HIP sends data to the geometry distributor, which distributes the load to the geometry engines in a least-busy fashion (with a FIFO queue)
- Each Geometry Engine uses SIMD (single-instruction-multiple-data), processing the three coordinates of a vertex in parallel on three floating-point cores.
- Each GE floating-point core has its own ALU, multiplier and 32-word register file in a four-stage pipeline
- The FIFO holds the results of the GEs' output and writes the merged stream to the vertex bus
77 SGI InfiniteReality system architecture
- Data from the vertex bus are received by the fragment generators on the raster memory board
- The fragment generator performs the texturing, color and depth pixel interpolation, and anti-aliasing (4 to 8 sub-samples/pixel)
- Their output is then distributed equally among 80 image engines on the raster board
- The image engine tiling pattern is 320x80 pixels
- The display hardware has dynamic video resize, video timing and D/A conversion
78 - Distributed VR architectures
- Single-user systems
- multiple side-by-side displays
- multiple LAN-networked computers
- Multi-user systems
- client-server systems
- peer-to-peer systems
- hybrid systems
79 Single-user, multiple displays
(3DLabs Inc.)
80 - Side-by-side displays
- Used in VR workstations (desktop), or in large-volume displays (the CAVE or the Wall)
- One solution is to use one PC with a graphics accelerator for every projector
- This results in a rack-mounted architecture, such as the MetaVR Channel Surfer used in flight simulators, or the Princeton Display Wall
81 - Side-by-side displays
- Another (cheaper) solution is to use one PC only, with several graphics accelerator cards (one for every monitor). Windows 2000 allows this option, while Windows NT allowed only one accelerator per system
- Accelerators need to be installed on a PCI bus
82 - Genlock
- If the output of two or more graphics pipes is used to drive monitors placed side-by-side, then the display channels need to be synchronized pixel-by-pixel
- Moreover, the edges have to be blended by creating a region of overlap.
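Edge blending across the overlap region can be sketched as a linear cross-fade: each projector's intensity is ramped so that the two contributions always sum to full brightness. This is a simplified linear model; real systems often use a smoother (e.g., gamma-corrected) ramp:

```python
def blend_weights(x, overlap_start, overlap_end):
    """Linear cross-fade across the overlap region of two side-by-side
    displays. x is the horizontal position; returns (left_weight,
    right_weight), which always sum to 1 so brightness stays constant."""
    if x <= overlap_start:
        return 1.0, 0.0                 # left projector only
    if x >= overlap_end:
        return 0.0, 1.0                 # right projector only
    t = (x - overlap_start) / (overlap_end - overlap_start)
    return 1.0 - t, t                   # cross-fade inside the overlap

# Mid-overlap: each projector contributes half the brightness.
mid = blend_weights(0.5, 0.0, 1.0)
```

Without this ramp the overlap strip would appear twice as bright as the rest of the image.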
83 (Courtesy of Quantum3D Inc.)
84 - Problems with non-synchronized displays...
- CRTs that are side-by-side induce fields in each other, resulting in electron beam distortion and flicker; they need to be shielded
- Image artifacts reduce simulation realism, increase latencies, and induce simulation sickness.
85 Problems with non-synchronized CRT displays...
86 (Courtesy of Quantum3D Inc.)
87 - Synchronization of displays
- Software-synchronized: the system commands that frame processing start at the same time on the different rendering pipes
- Does not work if one pipe is overloaded: one image finishes first
Synchronization command
88 - Synchronization of displays
- Frame-buffer-synchronized: the system commands that frame buffer swapping start at the same time on the different rendering pipes
- Does not work because swapping depends on the electron gun refresh: one buffer will swap up to 1/72 sec before the other.
CRT
Synchronization command
Buffer
89 - Synchronization of displays
- Video-synchronized: the system commands that the CRT vertical beams start at the same time; one CRT becomes the master
- Does not work if the horizontal beam is not synchronized too (one line too many or too few).
Master CRT
Buffer
Synchronization command
Buffer
Slave CRT
90 - Synchronization of displays
- The best method is to have combined software, buffer and video synchronization of the two (or more) rendering pipes
Master CRT
Buffer
Synchronization command
Synchronization command
Synchronization command
Buffer
Slave CRT
91 Video synchronized displays (three PCs)
done
release
(Digital Video Interface- Video out)
Wildcat 4210
92 (Courtesy of Quantum3D Inc.)
93 - Graphics and Haptics Pipeline Synchronization
- Has to be done at the application stage to allow decoupling of the rendering stages (which have vastly different output rates)
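The decoupling above is typically realized by running the two pipes at independent rates off a shared application state. A minimal sketch that interleaves the two loops on one timeline, using the 30 fps graphics and ~1 kHz haptics rates mentioned earlier (the scheduling function itself is an illustrative simplification):

```python
def schedule(duration_s, graphics_hz=30.0, haptics_hz=1000.0):
    """Interleave two decoupled rendering loops on a shared timeline.
    Returns the ordered list of ("graphics" | "haptics", time) events."""
    events = []
    t_g, t_h = 0.0, 0.0
    while t_g < duration_s or t_h < duration_s:
        if t_h <= t_g:                      # haptics runs ~33x per frame
            events.append(("haptics", t_h))
            t_h += 1.0 / haptics_hz
        else:
            events.append(("graphics", t_g))
            t_g += 1.0 / graphics_hz
    return events

# In 0.1 s there are only a few graphics frames among ~100 haptic updates.
events = schedule(0.1)
```

The application stage updates the world state once, and each pipe samples that state at its own rate, so a slow graphics frame never stalls the fast force-feedback loop.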
94 Haptic Interface Controller (embedded Pentium)
Graphics pipe and Haptics pipe
Pentium II Dual-processor Host computer
Haptic Interface
95 Physics Processing Unit (PPU)
- The first Physics Processing Unit, made by Ageia Inc., is called PhysX
- PhysX is available as an add-on card (see above).
- Helps the CPU do computations related to material properties (elasticity, friction, density)
- Better smog and fog effects, and more realistic clothing simulation (characters' clothes will react differently based on the material and other factors like rain and wind)
- Better fluid dynamics simulation and collision effects. Cost: $160
96 Physics Processing Unit (PPU)
97 - Co-located Rendering Pipelines
- Another, cheaper, solution is to use a single multi-pipe graphics accelerator, with one output channel for every monitor.
Wildcat II 5110
98 Wildcat II 5110
99 - Wildcat4 7210 features
- 38 million Gouraud-shaded, Z-buffered triangles/sec
- 400 Megapixel/sec texture fill rate
- 32 light sources in hardware
- Independent dual display support
- 1529x856 frame-sequential stereo @ 120 Hz.
100 - Wildcat Realizm 800 features
- Uses a Visual Processing Unit (VPU)
- Uses OpenGL Shading Language
101 - Wildcat Realizm 800 features
- Texture sizes up to 4K x 4K
- 32 light sources in hardware
- Independent dual 400 MHz 10-bit DAC
- 3D textures are applied throughout the volume of
a model, not just on the external surfaces
102 Computing architectures
- PC Clusters
- multiple LAN-networked computers
- used for multiple-PC video output
- used for multiple-computer collaboration (when computing power is insufficient on a single machine); an older approach.
103 Chromium cluster of 32 rendering servers and four control servers
104 Chromium networking architecture
105 Frame refresh rate comparison
106 Princeton display wall using eight LCD rear
projectors (1998)
107 Princeton display wall: eight 4-way Pentium-Pro SMPs with ES graphics accelerators. They drive 8 Proxima 9200 LCD projectors. (1998)
108 VRX Rack - Ciara Technologies: 256 Xeon processors and 1.T terabytes of DDR memory. Best price/performance ratio; Linux and Windows OS
Ciara VRX
109 Computing architectures
- Multi-user distributed remote system architecture
- multiple modem-networked computers
- multiple LAN-networked computers
- multiple WAN-networked computers
- What is the network topology and its influence on the number of users?
110 Network connections
111 - Two-User Shared Virtual Environments
- These were the first multi-user environments to be introduced (they are the simplest)
- Communicate over a LAN using unicast packets with TCP/IP protocols
112 Server-mediated communication: unicast mode. The server is the bottleneck on the allowable number of clients
Server
Client 1
Client 2
Client n
(adapted from Networked Virtual Environments
Singhal and Zyda, 1999)
113 Client 2,1
Client 2,2
Client 2,n
Server-mediated communication: allows more clients to be networked over LANs
Server 2
LAN
LAN
Server 1
LAN
Client 1,1
Client 1,2
Client 1,n
(adapted from Networked Virtual Environments
Singhal and Zyda, 1999)
114 Peer-to-peer communication: allows more clients to be networked over LANs. Can use broadcast or multicast. Reduces network traffic, BUT it is more vulnerable to viruses, and does not work well over a WAN.
LAN
Multicast packets
Area of interest management
AOIM 1
AOIM 3
AOIM n
User 1
User 3
User n
(adapted from Networked Virtual Environments
Singhal and Zyda, 1999)
115 Hybrid network using multiple servers communicating through multicast: allows deployment over a WAN (no broadcasting allowed)
WAN
Unicast packets
Unicast packets
Proxy Server 1
Proxy Server 2
Proxy Server 3
Proxy Server n
Multicast packets
LAN
User 1,1
User 1,2
User 1,n
For very large DVEs: current WANs do not support multicasting
(adapted from Avatars in Networked Virtual Environments, Capin, Pandzic, Magnenat-Thalmann and Thalmann, 1999)
116 Example of a distributed Virtual Environment (connection between Geneva and Lausanne in Switzerland): Cybertennis