Reality Engine

The University of North Carolina at Chapel Hill

1
Reality Engine

Anselmo Lastra
COMP290-052

2
Topics

Look at RE pipeline
Examine the data that flows between stages
Look at bandwidths

3
Reality Engine

First OpenGL machine
Change from proprietary IrisGL
Fastest commercial machine of its time
Generations
Akeley's definitions
Begins at flat-shaded polygons

4
Design Goals

½ million textured tris/sec
Mip-mapped textures
Antialiasing
High fill rate
Perform as well as VGX on 2nd-generation work

5
Three types of Boards

Geometry board
6, 8, or 12 geometry processors
Raster memory board
1, 2, or 4 boards to increase fill rate
Or antialiasing capability
Display/video generator

6
Command Processor

Controls work sent to geometry engines
Broadcasts some state info
Sends tris to a particular GE
Round-robin assignment
Static load balancing
Size of primitives (t-strips) sent to GEs is
important
Primitive ordering
FIFOs between stages
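
The round-robin assignment on this slide can be illustrated with a minimal sketch. The function name, the queue representation, and the sequence tag used to preserve primitive ordering are all hypothetical; the slides do not say how the hardware tracks order.

```python
from itertools import cycle

def assign_round_robin(strips, num_engines):
    """Hand out primitives to geometry engines in fixed rotation (static load balancing)."""
    queues = [[] for _ in range(num_engines)]   # one FIFO per geometry engine
    engines = cycle(range(num_engines))
    for seq, strip in enumerate(strips):
        # Assumption: tag each strip with a sequence number so that
        # primitive order can be restored after the geometry stage.
        queues[next(engines)].append((seq, strip))
    return queues

queues = assign_round_robin(["strip%d" % i for i in range(8)], num_engines=6)
```

Because the assignment ignores how much work each strip contains, one very long strip can stall a single engine, which is one way to read the remark that primitive size matters.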

7
Geometry Engines

Intel i860
RISC, pipelined FP
All polygons converted to tris
Single precision computation
Typical
Broadcasts data for rasterization
Note also path to load code into DRAM

8
Fragment Generators

Custom ASICs
Each handles a portion of the frame buffer
Interleaved
Pipeline with several fragments in flight
Tasks: z and color for center, coverage mask,
texture addresses, texture lookup, final color
computation (blend, fog)
Takes care to keep color inside the triangle (not
always at the pixel center)
We will talk about fragment generation /
rasterization later

9
Subpixel mask

Fixed 8x8 grid
Select 4, 8, or 16 samples on grid
Compute coverage of samples
Only one depth and texture coord chosen
Depths expanded later from dX, dY
Color at center also
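
As a rough illustration of computing sample coverage on the 8x8 grid, the sketch below tests each chosen sample against the three triangle edges. The edge-function test and the four sample positions in `SAMPLES_4` are assumptions made for the example; the slides do not give the hardware's sample pattern or inside test.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: non-negative when (px, py) is on or left of the edge a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def coverage_mask(tri, pixel_x, pixel_y, samples):
    """Return a bitmask with bit i set when sample i lies inside the triangle.

    tri     : three (x, y) vertices in counter-clockwise order, in pixel units
    samples : (i, j) positions on the 8x8 subpixel grid (hypothetical pattern)
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    mask = 0
    for bit, (i, j) in enumerate(samples):
        # Each subpixel center sits at (i + 0.5) / 8 within the pixel.
        px = pixel_x + (i + 0.5) / 8.0
        py = pixel_y + (j + 0.5) / 8.0
        inside = (edge(x0, y0, x1, y1, px, py) >= 0 and
                  edge(x1, y1, x2, y2, px, py) >= 0 and
                  edge(x2, y2, x0, y0, px, py) >= 0)
        if inside:
            mask |= 1 << bit
    return mask

SAMPLES_4 = [(1, 1), (5, 2), (2, 5), (6, 6)]   # illustrative 4-sample pattern
```

A triangle that fully covers the pixel sets every bit, while a sliver near one corner sets only the samples it actually contains.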

10
Texturing

Texture replicated at each fragment processor
5-20 times
Eight DRAM chips
One for each mipmap sample

11
Image Engine

16 per fragment generator
One DRAM each
Each computes depth at subpixel covered by
fragment
Bits/pixel depends on number of boards and display
resolution (256, 512, 1024)
12 bits / color component
32 bit depth
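
The per-sample depth computation can be sketched as a plane evaluation from the single center depth and its screen-space slopes. This assumes depth varies linearly across the pixel and that sample offsets are measured from the pixel center; the function and parameter names are hypothetical.

```python
def expand_depths(z_center, dzdx, dzdy, mask, samples):
    """Return {sample_index: depth} for the covered samples of one fragment.

    z_center   : depth at the pixel center
    dzdx, dzdy : depth slopes per pixel in x and y
    mask       : coverage bitmask, bit i corresponds to samples[i]
    samples    : (i, j) positions on the 8x8 subpixel grid
    """
    depths = {}
    for bit, (i, j) in enumerate(samples):
        if mask & (1 << bit):
            # Offset of the sample from the pixel center, in pixel units.
            dx = (i + 0.5) / 8.0 - 0.5
            dy = (j + 0.5) / 8.0 - 0.5
            depths[bit] = z_center + dzdx * dx + dzdy * dy
    return depths
```

Sending one depth plus two slopes keeps the fragment small on the way to the image engines, which then do the per-sample work in parallel.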

12
Display Board

Display color computed by image engines every
fragment
OpenGL has no explicit end-of-frame
50MHz single bit paths to board from each image
engine
Color maps, etc.

13
Antialiasing

Alpha
Coverage on 8x8 grid computed
Ordering must be observed

14
Multisample antialiasing

Point sampling
Including accurate edges
Not always good representation of actual area
Area sampling
Can produce artifacts
Screen-door transparency
Alpha to coverage
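
Alpha to coverage can be sketched as turning an alpha value into a count of enabled samples and combining that with the geometric coverage mask. This is a simplified illustration with hypothetical function names; always enabling the lowest bits is an assumption, and a fixed pattern like this is one source of the screen-door artifacts mentioned above.

```python
def alpha_to_coverage(alpha, num_samples):
    """Return a mask with about alpha * num_samples of the low bits set."""
    alpha = min(max(alpha, 0.0), 1.0)          # clamp to [0, 1]
    enabled = int(alpha * num_samples + 0.5)   # round to nearest sample count
    return (1 << enabled) - 1

def apply_alpha(geometric_mask, alpha, num_samples):
    """Keep only the geometrically covered samples that alpha also enables."""
    return geometric_mask & alpha_to_coverage(alpha, num_samples)
```

With 4 samples only five transparency levels are available, so more samples give smoother transparency as well as smoother edges.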

15
Texturing

Default is 16-bits / texel
Because of bandwidth issues
Can increase to 32 or 48 bits

16
Clipping

FIFOs even out load
MIMD better for clipping
SIMD must execute wasted cycles to compute both
if and else
Far and near planes
Also less clipping because rasterizers scissor
the primitives

17
Antialiasing

Single pass
Multisample
vs. A-Buffer
Makes case for utility of supersampling as
opposed to multi-pass
Less overall hardware
Transform/texture only once

18
Triangle Bus

Argues that sorting before the fragment engines
is better than after
Compares to ES
Notes PixelFlow frame latency
Let's defer discussion until we talk about the
sorting classification of parallel rendering

19
Commodity DRAM

Many other machines used specialized video RAM
RE stole memory cycles to refresh display
Commodity parts lowered cost of RE

20
Data Flow of RE

What flows between stages?

21
Bandwidths

Compute bandwidths (at dots) required to render a
million 50-pixel triangles
We'll make assumptions, such as all visible

22
Retained vs. Immediate

Retained
Potentially better performance when bandwidth to
host a problem
Difficult if much of model is edited
Immediate
Add-ons to cache data on GPU
Often have scene graph anyway

23
Geometry

Often vertices treated independently
Primitives reassembled
Parallelism at fine or coarse scale
Vertex 20-30 bytes
About 50-100 Flops/vertex

24
Rasterization/Fragment

Output is pixels × triangles
Standard then - 50-100 pixel tris
Total about 50M fragments for our example

25
Texturing

Per fragment cost is 8 32-bit texels
About 1.6GB / million tris
Can use compression here
Modern systems use caching
How fast a memory would we need for 10 M tris?
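
The texture figure follows from the running example of 50-pixel triangles. The sketch below reproduces it and scales it to 10 million triangles, assuming no compression or caching and the same 8 texels of 32 bits per fragment; the constant and function names are made up for the illustration.

```python
TEXELS_PER_FRAGMENT = 8     # mip-mapped lookup reads 8 texels
BYTES_PER_TEXEL = 4         # 32-bit texels
PIXELS_PER_TRIANGLE = 50    # triangle size used in the running example

def texture_bytes(num_triangles):
    """Bytes of texel traffic needed to texture the given number of triangles."""
    fragments = num_triangles * PIXELS_PER_TRIANGLE
    return fragments * TEXELS_PER_FRAGMENT * BYTES_PER_TEXEL

print(texture_bytes(1_000_000) / 1e9)    # 1.6  (GB per million triangles)
print(texture_bytes(10_000_000) / 1e9)   # 16.0 (GB for 10 million triangles)
```

At 10 million triangles per second these assumptions would call for roughly 16 GB/s of texture memory bandwidth, which suggests why compression and caching matter.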

26
Frame Buffer Bandwidth

Assume z of 32 bits, color 4 bytes
Could go with z of 24 bits
Must read Z for every fragment
200 MB/s for Z
Must write some fraction
Say ½
400 MB/s for Z and color
Total 600 MB/s
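
The frame buffer estimate can be parameterized as below, assuming 50 million fragments per second from the running example. The names are hypothetical. Under these assumptions the 400 MB/s write figure corresponds to every fragment writing both Z and color; a write fraction of ½ gives 200 MB/s of writes instead.

```python
FRAGMENTS_PER_SEC = 1_000_000 * 50   # one million 50-pixel triangles per second
Z_BYTES = 4                          # 32-bit depth
COLOR_BYTES = 4                      # 4 bytes of color

def framebuffer_mb_per_s(write_fraction):
    """Return (z_read, write, total) in MB/s for a given fraction of fragments written."""
    z_read = FRAGMENTS_PER_SEC * Z_BYTES / 1e6   # Z is read for every fragment
    write = FRAGMENTS_PER_SEC * write_fraction * (Z_BYTES + COLOR_BYTES) / 1e6
    return z_read, write, z_read + write

print(framebuffer_mb_per_s(1.0))   # (200.0, 400.0, 600.0)
print(framebuffer_mb_per_s(0.5))   # (200.0, 200.0, 400.0)
```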

27
Frame Buffer Clear

Tricks to avoid clearing
Example: One indicator bit cleared at start of
frame
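
One way to picture the indicator-bit trick is a buffer that resets only a per-pixel flag on clear and substitutes the clear values on read. The class below is a software sketch under that assumption, with hypothetical names; it is not a description of any particular hardware.

```python
class LazyClearBuffer:
    """Frame buffer whose clear touches one flag per pixel instead of z and color."""

    def __init__(self, num_pixels, clear_z=float("inf"), clear_color=(0, 0, 0)):
        self.z = [clear_z] * num_pixels
        self.color = [clear_color] * num_pixels
        self.valid = bytearray(num_pixels)   # one indicator flag per pixel
        self.clear_z = clear_z
        self.clear_color = clear_color

    def clear(self):
        # Only the indicator flags are reset; z and color keep stale data.
        self.valid = bytearray(len(self.valid))

    def read(self, i):
        # A pixel not yet written this frame reads as the clear value.
        if self.valid[i]:
            return self.z[i], self.color[i]
        return self.clear_z, self.clear_color

    def write(self, i, z, color):
        self.z[i], self.color[i] = z, color
        self.valid[i] = 1
```

The clear then moves one bit per pixel rather than the full depth and color, at the cost of a flag check on every read.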

28
Scanout Bandwidth

3 bytes/pixel
1024x1024, approx. 1M pixels
60 Hz
About 180 MB/s
1280x1024, 72Hz, 280 MB/s
Can go over 500 MB/s for HDTV at 72Hz
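
The scanout numbers are simply pixels times bytes per pixel times refresh rate. A small sketch, with a hypothetical function name, using exact pixel counts rather than the rounded 1M pixels:

```python
def scanout_mb_per_s(width, height, refresh_hz, bytes_per_pixel=3):
    """Bandwidth in MB/s needed to read the whole display every refresh."""
    return width * height * bytes_per_pixel * refresh_hz / 1e6

print(round(scanout_mb_per_s(1024, 1024, 60), 1))   # 188.7
print(round(scanout_mb_per_s(1280, 1024, 72), 1))   # 283.1
```

Using exactly 1,000,000 pixels for the first case gives the 180 MB/s quoted above.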

29
Modeling

We have found a spreadsheet performance model
useful
Can explore what-if scenarios
Functional model in HLL
Gate-level model in HLL or Verilog

30
Additional Reading

The architecture of the pipeline
Mark Segal and Kurt Akeley, The design of the
OpenGL graphics interface

31
Next Time

Rasterization
Pineda, "A parallel algorithm for polygon
rasterization", SIGGRAPH 88, and
Olano, Marc and Trey Greer, "Triangle Scan
Conversion Using 2D Homogeneous Coordinates",
Proceedings of the 1997 SIGGRAPH/Eurographics
Workshop on Graphics Hardware.
Supplementary reading: Joel McCormack, Robert
McNamara, "Tiled polygon traversal using
half-plane edge functions", Graphics Hardware
2000. The Pineda paper presents a method to
decide whether a fragment is in a triangle, but
only sketches ways to find the candidate
fragments. This paper describes a method to
traverse triangles in tile order (good for
caching).