Ray Tracing Hardware

1
Ray Tracing Hardware

Anselmo Lastra

2
Types/Examples of Hardware

Fairly programmable machines
Pixel Machine, Pixel-Planes
Duke ray casting machine
For CAD
ART for off-line acceleration
Proposed single-chip
SaarCOR
GI-Cube
Global illumination for volume rendering

3
SaarCOR

Proposed architecture from U. of Saarland, Germany
Based on their fast PC and PC-cluster ray tracers
Renders triangles
Shades conventionally
Maybe shadow rays

4
Coherence

Traverse packets of rays
BSP node fetched if any ray in the packet intersects it
Avoid traversing a node with rays that are not in it
Keep a bit vector indicating which rays are active in each BSP tree branch
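
The bit-vector idea above can be sketched in a few lines. This is only an illustration under assumptions: the tree is treated as an axis-aligned BSP (kd-tree), the names `Node` and `traverse_packet` are made up for the sketch, and front-to-back ordering and early termination are left out. Each node is fetched once for the whole packet, and only if at least one ray is active in it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str                         # label used only for the demo log
    axis: int = 0                     # split axis (0=x, 1=y, 2=z)
    split: float = 0.0                # split plane position
    left: Optional["Node"] = None     # child below the plane
    right: Optional["Node"] = None    # child above the plane

def traverse_packet(node, rays, spans, active, log):
    """Visit a node once per packet; bit i of `active` is set if ray i is in the node."""
    if node is None or active == 0:
        return                        # no active ray, so the node is never fetched
    log.append((node.name, active))   # one fetch shared by all active rays
    if node.left is None and node.right is None:
        return                        # leaf: the triangle list would be handed on here
    below_mask, above_mask = 0, 0
    below_spans, above_spans = list(spans), list(spans)
    for i, (origin, direction) in enumerate(rays):
        bit = 1 << i
        if not active & bit:
            continue                  # ray i is not in this branch
        t0, t1 = spans[i]
        o, d = origin[node.axis], direction[node.axis]
        starts_below = o < node.split or (o == node.split and d <= 0)
        t = (node.split - o) / d if d != 0 else float("inf")
        if t <= 0 or t > t1:          # plane not reached inside the span: near child only
            visit_below, visit_above = starts_below, not starts_below
        elif t < t0:                  # plane crossed before the span: far child only
            visit_below, visit_above = not starts_below, starts_below
        else:                         # plane crossed inside the span: both children
            visit_below = visit_above = True
            near, far = (t0, t), (t, t1)
            below_spans[i], above_spans[i] = (near, far) if starts_below else (far, near)
        if visit_below:
            below_mask |= bit
        if visit_above:
            above_mask |= bit
    traverse_packet(node.left, rays, below_spans, below_mask, log)
    traverse_packet(node.right, rays, above_spans, above_mask, log)
```

With a packet of 64 rays the mask fits in one 64-bit word, which is presumably part of the appeal of that packet size.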

5
Packet Size

They propose 64

6
Block Diagram

7
Ray Generation

Master generates eye rays
Slaves manage each ray until it is complete
Each slave has shading unit(s)
Memory interface to FB

8
Questions

Separate frame buffer DRAM?
Looks like it. If so, good idea or bad?

9
Ray Tracing Core

Traces rays and computes intersections
Traversal unit traces through BSP tree
When it reaches a leaf, sends address of list of triangles
List unit traverses triangles
Intersection unit computes intersections
Traversed until intersection found
Then result sent back to Slave
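
A rough sketch of the leaf-level part of this flow: a "list unit" walks the triangle list of a leaf and an "intersection unit" tests each triangle, keeping the nearest hit. The intersection test here is the well-known Möller-Trumbore method, swapped in for illustration; it is not claimed to be the test the SaarCOR paper uses, and `intersect` and `trace_leaf` are hypothetical names.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect(origin, direction, tri, eps=1e-9):
    """Moeller-Trumbore ray/triangle test; returns hit distance t or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None                   # ray parallel to the triangle plane
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def trace_leaf(origin, direction, triangle_ids, triangles):
    """List unit: walk the leaf's triangle list and keep the nearest hit."""
    best = None
    for tid in triangle_ids:
        t = intersect(origin, direction, triangles[tid])
        if t is not None and (best is None or t < best[1]):
            best = (tid, t)
    return best                       # (triangle id, distance) or None
```

If `trace_leaf` returns None, traversal would continue to the next leaf along the ray; otherwise the result goes back to the slave.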

10
Operation Costs

Traversal
3 FP adds, 1 FP multiply
Intersection
12 FP adds and 13 FP multiplies
Suggest balancing traversal and intersection by making a deeper BSP tree
Say that BSP tree creation is automatic and can be done in hardware
No design given

11
Trav Ops vs Int Ops

They choose 4 traversals to 1 intersection

12
Multiple Traversal Units

Several paragraphs discussing multiple traversal and intersection units
All subsequent discussion is about traversal units
Other paper says an average of 40-50 traversals and 5-10 intersections per ray
Not clear what that architecture would look like
All work on one packet of rays (64)
Talk about running them asynchronously. Not sure what they mean
Why multiple traversal units instead of intersection units?
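
A quick back-of-the-envelope check, combining the per-operation costs from the Operation Costs slide with the per-ray counts quoted above. It assumes every FP add and multiply counts as one operation; the function name is made up for the sketch.

```python
TRAVERSAL_OPS = 3 + 1         # 3 FP adds + 1 FP multiply per traversal step
INTERSECTION_OPS = 12 + 13    # 12 FP adds + 13 FP multiplies per triangle test

def ops_per_ray(traversals, intersections):
    """Return (traversal FP ops, intersection FP ops) for one ray."""
    return traversals * TRAVERSAL_OPS, intersections * INTERSECTION_OPS

for traversals, intersections in [(40, 5), (50, 10)]:
    trav_ops, int_ops = ops_per_ray(traversals, intersections)
    print(f"{traversals} traversals / {intersections} intersections: "
          f"{trav_ops} traversal ops, {int_ops} intersection ops, "
          f"{traversals / intersections:.0f}:1 steps")
```

Under these numbers a ray takes 5 to 8 traversal steps per intersection test, while the FP work of the two stages is of similar size. That may be part of the reason for several traversal units per intersection unit, though the slides do not say so.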

13
Memory Accesses

They propose multiple ray packets in flight
Hyperthreading
Why? Seems simpler to have a FIFO like for texture and Z
Is there something very different about ray casting?
They propose local copies at traversal and intersection units
Not quite sure what this means. Local register?
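
One way to see why several packets in flight might help: an idealised round-robin model in which each packet computes for a few cycles and then waits on memory. The cycle counts below are hypothetical, picked only to show the shape of the curve; they do not come from the paper.

```python
def utilization(threads, compute_cycles, memory_latency):
    """Fraction of cycles a unit is busy when `threads` packets take turns.

    Idealised model: each packet computes for `compute_cycles`, then stalls
    for `memory_latency` cycles while other packets use the unit.
    """
    busy = threads * compute_cycles
    period = compute_cycles + memory_latency
    return min(1.0, busy / period)

# Hypothetical numbers: 4 compute cycles per step, 12 cycles of memory latency.
for threads in (1, 2, 4, 16):
    print(threads, utilization(threads, 4, 12))
```

In this model one packet keeps the unit 25% busy and four are enough to hide the latency. A FIFO that prefetches ahead, as used for texture and Z, attacks the same latency in a different way.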

14
RTC Memory Interface

Data memory read only
Connection from MI to RTCs
Single bus; RTCs pick off their labelled data
Memory requests with round-robin multiplexer
They say this will be OK with up to 8-16 RTCs

15
Caches

Propose 64KB-144KB caches
Cache hit rate of 95%
Say type of cache not big issue (3)
They expect to need bandwidth to memory of 250MB/s-1GB/s
No mention of DRAM efficiency
Nothing about banks, trying to increase memory bandwidth, etc.
Just says simple address hashing to avoid hot spots
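
The paper's hashing scheme is not given, so the following is only one plausible form of "simple address hashing": XOR-folding higher address bits into the bank index. The bank count of 4 matches the four SDRAMs proposed later; the shift amount and function names are assumptions.

```python
def bank_plain(line, banks=4):
    """Bank chosen by the low address bits only."""
    return line % banks

def bank_hashed(line, banks=4, shift=2):
    """Bank chosen after XOR-folding higher bits into the low bits."""
    return (line ^ (line >> shift)) % banks

strided = range(0, 16, 4)                    # cache lines 0, 4, 8, 12
print([bank_plain(l) for l in strided])      # all land in the same bank
print([bank_hashed(l) for l in strided])     # spread over the banks
```

A stride equal to the bank count is the classic hot spot; the fold spreads it out at the cost of a few XOR gates.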

16
Shading

Simple Phong shading with bilinear texture mapping
They assume 20-80 cycles/ray
Say that they need 3 FP adders and 4 FP multipliers
Would they need FP?
They say memory bandwidth requirements for shading are similar to RTC (250MB/s-1GB/s)
Is this from the FB DRAM port?
Let's compute it
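
One way to set up that computation, as a sketch with loudly stated assumptions: the 1024x768 resolution comes from the test conditions, and bilinear filtering needs 4 texels per sample, but the frame rate, the 4-byte texel, one shaded ray per pixel and the absence of a texture cache are all assumed values, not figures from the paper.

```python
def texture_bandwidth(width, height, fps,
                      rays_per_pixel=1, texels_per_sample=4, bytes_per_texel=4):
    """Uncached texture traffic in bytes per second."""
    samples_per_second = width * height * rays_per_pixel * fps
    return samples_per_second * texels_per_sample * bytes_per_texel

# Assumed: 25 frames/s, 32-bit texels, one shaded ray per pixel, no cache.
bytes_per_second = texture_bandwidth(1024, 768, fps=25)
print(bytes_per_second / 1e6, "MB/s")
```

Under these assumptions the result is about 315 MB/s, inside the quoted 250MB/s-1GB/s range. A texture cache would lower it; shadow and reflection rays would raise it.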

17
Shading(2)

Shading parameters not carried through pipeline
Fetched once they know what to shade
Like deferred shading
How are they fetched? From where?
Sentence about asynchronous shading
Page 6
What does that mean?

18
Other Rays

Reflection, shadow rays
How are they generated?
They propose tossing away partially completed rays to avoid deadlocking if they run out of room
Which rays?
Do they mean toss them away and regenerate later?
How?

19
Proposed Chip

4 RTCs, each with 16 threads and four traversal units
Clock rate of 533 MHz
Quite fast
Four SDRAMs at 133 MHz
Slow
Caches: 64KB traversal, 64KB list, 144KB intersection
192 floating point units
See next slide
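
Some totals that follow from these figures, as a sketch. Two assumptions are made: the three caches are per RTC, and each floating point unit finishes one operation per clock. Neither is stated on the slide.

```python
RTCS = 4
CLOCK_HZ = 533_000_000
FP_UNITS = 192
CACHE_KB_PER_RTC = 64 + 64 + 144      # traversal + list + intersection

total_cache_kb = RTCS * CACHE_KB_PER_RTC      # assumes caches are per RTC
peak_flops = FP_UNITS * CLOCK_HZ              # assumes 1 FP op per unit per clock

print(total_cache_kb, "KB of cache")
print(peak_flops / 1e9, "GFlops peak")
```

This gives 1088 KB of cache, roughly the 1MB of registers/cache mentioned two slides later, and a peak of about 102 GFlops.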

20
GeForce3 Flops

Believe their 76 GFlops and 380 floating point units are way too high
GeForce3 marketing: 100M tris/s
200 MHz clock -> 2 clocks/vert
That probably means 5 floating point units
Based on a simple vertex taking 10 flops
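
The arithmetic behind this estimate, spelled out. All inputs are the slide's own figures; treating one marketed triangle as one vertex and one flop per unit per clock follows the slide's reasoning.

```python
CLOCK_HZ = 200_000_000          # GeForce3 clock from the slide
VERTS_PER_SECOND = 100_000_000  # marketing figure, one vertex per triangle
FLOPS_PER_VERTEX = 10           # "simple vertex" from the slide

clocks_per_vertex = CLOCK_HZ // VERTS_PER_SECOND
fp_units = FLOPS_PER_VERTEX // clocks_per_vertex
print(clocks_per_vertex, "clocks/vert ->", fp_units, "FP units")

# The figure being questioned: 380 units at the same clock.
claimed_flops = 380 * CLOCK_HZ
print(claimed_flops / 1e9, "GFlops")
```

Note that 380 units at 200 MHz is exactly 76 GFlops, so the two disputed numbers are one claim, not two.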

21
Floating Point Units

Their number seems higher than could be put on a chip
They're assuming much hardware shading calculation done in FP
1MB of registers/cache
Is this too high?

22
Test Scenes

23
Test Conditions

1024x768, point sampled
All lights cast shadows
One shadow ray per light

24
(No Transcript)

25
Cruiser Scene

Bandwidth Limited
The BDQ scene needs 1MB cache and 2 GB/s memory bandwidth

26
Usage

27
More Complex Effects

(a) eye rays (er) only, (b) er and reflections up to 3 levels (r3), (c) er and 3 lights (3l), (d) er, reflections and 3 lights, (e) er with a simple four times oversampling (4os)
Looks like performance is computation limited, not bandwidth limited
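
To get a feel for how much work each configuration adds, here is an upper bound on rays per pixel. It assumes every hit spawns one reflection ray up to the level limit and one shadow ray per light, per the test conditions; real scenes would generate fewer. The function name is made up.

```python
PIXELS = 1024 * 768   # resolution from the test conditions

def rays_per_pixel(reflection_levels=0, lights=0, samples=1):
    """Upper bound: every hit reflects and sends one shadow ray per light."""
    hits = 1 + reflection_levels          # eye-ray hit plus one hit per level
    return samples * hits * (1 + lights)  # incoming ray + shadow rays at each hit

configs = {
    "er": rays_per_pixel(),
    "r3": rays_per_pixel(reflection_levels=3),
    "3l": rays_per_pixel(lights=3),
    "r3+3l": rays_per_pixel(reflection_levels=3, lights=3),
    "4os": rays_per_pixel(samples=4),
}
for name, count in configs.items():
    print(name, count, "rays/pixel,", count * PIXELS, "rays/frame")
```

Configuration (d) can cost up to 16 times the eye-ray-only case, while (b), (c) and (e) each cost up to 4 times.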

28
Summary

Static scenes
Simple shading
That could be fixed
Is ray tracing necessarily frame based?

29
Advanced Rendering Technology

Make accelerators for offline ray tracing
High-quality
RenderMan shading
Chip
1.8M gates
64 32-bit FPUs
Block diagram next

30
AR 350

31
GI-Cube

Global illumination for volumes
Architecture paper from SUNY Stony Brook

32
Examples

33
Algorithm

Two pass
Rays from lights to volume faces, with k supersamples
These distribute energy through volume
Rays from eye
Rays can be transmitted, absorbed or scattered
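
The three outcomes could be chosen per voxel by a single random draw. This is a generic Monte Carlo sketch, not the GI-Cube method: the probabilities, the function name and the use of one uniform number `u` are all assumptions.

```python
import random

def classify(u, p_absorb, p_scatter):
    """Map a uniform number u in [0, 1) to a ray event.

    p_absorb and p_scatter are per-voxel probabilities (assumed inputs);
    whatever is left over is the probability of transmission.
    """
    if u < p_absorb:
        return "absorbed"
    if u < p_absorb + p_scatter:
        return "scattered"      # direction would then be changed using the BRDF
    return "transmitted"

rng = random.Random(0)          # fixed seed so the demo is repeatable
events = [classify(rng.random(), 0.2, 0.3) for _ in range(10)]
print(events)
```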

34
Machine Block Diagram

35
Block Processor

Scattering uses BRDF to change ray direction
If simple, done in hardware; if complex, sent to software
I assume software means the PC, not the DSP

36
Queues

Enough queues for all blocks
For 256^2 volume with 32^3 block size, 128 queues over 4 processors
Queue Overflow
Rays sent to DSP
They say this is OK because it means there's enough work at block processor anyway
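
A sketch of where a queue count like this could come from, with one queue per block. It assumes the volume is 256 voxels along each of three axes, the blocks are 32 voxels along each axis, and the 128 queues are per processor. These readings are assumptions, chosen because they make the numbers agree.

```python
def block_queues(volume_size, block_size, processors):
    """Return (total blocks, queues per processor) for a cubic volume."""
    blocks_per_axis = volume_size // block_size
    total_blocks = blocks_per_axis ** 3       # one queue per block
    return total_blocks, total_blocks // processors

total, per_processor = block_queues(256, 32, 4)
print(total, "blocks,", per_processor, "queues per processor")
```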

37
Queue Importance