Ray Tracing Hardware - PowerPoint PPT Presentation

1
Ray Tracing Hardware
  • Anselmo Lastra

2
Types/Examples of Hardware
  • Fairly programmable machines
  • Pixel Machine, Pixel-Planes
  • Duke ray casting machine
  • For CAD
  • ART for off-line acceleration
  • Proposed single-chip
  • SaarCOR
  • GI-Cube
  • Globillum for vol rend

3
SaarCOR
  • Proposed architecture from U. of Saarland,
    Germany
  • Based on their fast PC and PC-cluster ray tracers
  • Renders triangles
  • Shades conventionally
  • Maybe shadow rays

4
Coherence
  • Traverse packets of rays
  • BSP node fetched if any ray intersects
  • Avoid traversing a node with rays that are not in it
  • Keep a bit vector indicating which rays active in
    BSP tree branch
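The packet idea on this slide can be sketched in a few lines. This is a minimal illustration, not the SaarCOR design: it assumes an axis-aligned BSP (kd-tree style) node layout, stores the active-ray bit vector in a Python integer, and ignores front-to-back ordering. The names `Node`, `traverse_packet` and `fetches` are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    axis: int = 0                  # split axis: 0=x, 1=y, 2=z (inner nodes only)
    split: float = 0.0             # split plane position (inner nodes only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def traverse_packet(node, rays, active, t_min, t_max, fetches):
    """Fetch `node` once for the whole packet; `active` has bit i set if ray i is in it."""
    if active == 0:
        return                                  # no ray enters this branch: skip the fetch
    fetches.append((node.name, active))         # one fetch shared by all active rays
    if node.left is None and node.right is None:
        return                                  # leaf: triangle list would be handed on here
    masks = [0, 0]                              # active bits for the left and right child
    lo = [list(t_min), list(t_min)]             # per-child copies of the ray intervals
    hi = [list(t_max), list(t_max)]
    for i, (origin, direction) in enumerate(rays):
        if not (active >> i) & 1:
            continue                            # inactive rays cost nothing in this node
        o, d = origin[node.axis], direction[node.axis]
        near = 0 if (o < node.split or (o == node.split and d <= 0)) else 1
        far = 1 - near
        t = (node.split - o) / d if d != 0 else float("inf")
        if t > t_max[i] or t <= 0:
            masks[near] |= 1 << i               # ray stays on the near side
        elif t < t_min[i]:
            masks[far] |= 1 << i                # ray is already past the plane
        else:
            masks[near] |= 1 << i               # ray crosses the plane: visit both
            hi[near][i] = t
            masks[far] |= 1 << i
            lo[far][i] = t
    for side, child in enumerate((node.left, node.right)):
        if child is not None and masks[side]:
            traverse_packet(child, rays, masks[side], lo[side], hi[side], fetches)

# Three rays, one split plane at x = 0.
root = Node("root", axis=0, split=0.0, left=Node("L"), right=Node("R"))
rays = [((-1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),   # stays left of the plane
        ((-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),   # crosses from left to right
        (( 1.0, 0.0, 0.0), (0.0, 1.0, 0.0))]   # stays right of the plane
fetches = []
traverse_packet(root, rays, 0b111, [0.0] * 3, [10.0] * 3, fetches)
```

In this toy case the packet needs three node fetches, while tracing the same rays one at a time would fetch seven nodes in total (2 + 3 + 2).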

5
Packet Size
  • They propose 64

6
Block Diagram

7
Ray Generation
  • Master generates eye rays
  • Slaves manage ray until ray complete
  • Each slave has shading unit(s)
  • Memory interface to FB

8
Questions
  • Separate frame buffer DRAM?
  • Looks like it. If so, good idea or bad?

9
Ray Tracing Core
  • Traces rays and computes intersections
  • Traversal unit traces through BSP tree
  • When reaches leaf sends addr of list of triangles
  • List unit traverses triangles
  • Intersection unit computes intersections
  • Traversed until intersection found
  • Then result sent back to Slave

10
Operation Costs
  • Traversal
  • 3 FP adds, 1 FP multiply
  • Intersection
  • 12 FP adds and 13 FP multiplies
  • Suggest balancing traversal and intersection by
    making deeper BSP tree
  • Say that BSP tree creation is automatic and can
    be done in hardware
  • No design given

11
Trav Ops vs Int Ops
  • They choose 4 trav to 1 intersection

12
Multiple Traversal Units
  • Several paragraphs discussing multiple traversal
    and intersection units
  • All subsequent discussion about traversal units
  • Other paper says ave. 40-50 traversals and 5-10
    intersections per ray
  • Not clear what that architecture would look like
  • All work on one packet of rays (64)
  • Talk about running them asynchronously. Not sure
    what they mean
  • Why multiple traversal units instead of
    intersection units?
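As a rough check on the balance question, the per-operation costs and the per-ray operation counts quoted on these slides can be multiplied out. This is only arithmetic on the quoted figures; treating every FP add or multiply as one "flop" is an assumption of the sketch.

```python
# Per-operation costs as quoted on the "Operation Costs" slide.
TRAVERSAL_FLOPS = 3 + 1        # 3 FP adds + 1 FP multiply
INTERSECTION_FLOPS = 12 + 13   # 12 FP adds + 13 FP multiplies

def flops_per_ray(traversals, intersections):
    """Return (traversal flops, intersection flops) for one ray."""
    return traversals * TRAVERSAL_FLOPS, intersections * INTERSECTION_FLOPS

# Extremes of the quoted 40-50 traversals and 5-10 intersections per ray.
few_trav_many_int = flops_per_ray(40, 10)
many_trav_few_int = flops_per_ray(50, 5)

# Ratio of traversal operations to intersection operations at those extremes.
op_ratio_low = 40 / 10
op_ratio_high = 50 / 5
```

On these numbers the operation-count ratio runs from 4:1 to 10:1, so a 4:1 ratio of traversal to intersection units matches the low end of that range, which may be part of the answer to the last question on the slide.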

13
Memory Accesses
  • They propose multiple ray packets in flight
  • Hyperthreading
  • Why? Seems simpler to have a FIFO like for
    texture and Z
  • Is there something very different about ray
    casting?
  • They propose local copies at traversal and
    intersection units
  • Not quite sure what this means. Local register?

14
RTC Memory Interface
  • Data memory read only
  • Connection from MI to RTCs
  • Single bus RTCs pick off their labelled data
  • Memory requests with round-robin multiplexer
  • They say this will be OK with up to 8-16 RTCs
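A round-robin multiplexer of the kind mentioned here can be modelled as follows. This is a generic arbiter sketch, not the circuit from the paper; the class name `RoundRobinArbiter` and its interface are invented for illustration.

```python
class RoundRobinArbiter:
    """Grant one requester per cycle, starting the search after the last winner."""

    def __init__(self, requesters):
        self.n = requesters
        self.last = requesters - 1      # so that requester 0 is checked first

    def grant(self, requests):
        """`requests[i]` is True if RTC i wants the memory interface this cycle."""
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                self.last = candidate
                return candidate
        return None                     # nobody asked

arbiter = RoundRobinArbiter(4)
pending = [True, False, True, True]     # RTCs 0, 2 and 3 keep requesting
grants = [arbiter.grant(pending) for _ in range(4)]
```

Because the search always restarts just after the previous winner, no requesting RTC waits more than one full rotation, which is why the scheme stays reasonable for a small number of RTCs.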

15
Caches
  • Propose 64KB-144KB caches
  • Cache hit rate of 95%
  • Say type of cache not big issue (3)
  • They expect to need bandwidth to memory of
    250MB/s-1GB/s
  • No mention of DRAM efficiency
  • Nothing about banks, trying to increase memory
    bandwidth, etc.
  • Just says simple address hashing to avoid hot
    spots
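The slides do not say which hash is meant. As one possible reading, the sketch below XOR-folds the address bits to pick one of four memory banks and compares that with plain modulo interleaving on a strided access pattern. The bank count, the function names and the choice of XOR folding are all assumptions made for illustration.

```python
from collections import Counter

def bank_modulo(line_addr, banks=4):
    """Plain interleaving: the low address bits select the bank."""
    return line_addr % banks

def bank_xor_fold(line_addr, banks=4):
    """XOR all groups of log2(banks) address bits together to select the bank."""
    if banks < 2 or banks & (banks - 1):
        raise ValueError("banks must be a power of two, at least 2")
    bits = banks.bit_length() - 1
    h = 0
    while line_addr:
        h ^= line_addr & (banks - 1)
        line_addr >>= bits
    return h

# A stride of 4 cache lines is a worst case for modulo interleaving with 4 banks.
strided = range(0, 64, 4)
modulo_use = Counter(bank_modulo(a) for a in strided)
hashed_use = Counter(bank_xor_fold(a) for a in strided)
```

With modulo interleaving all 16 accesses land on bank 0; with the folded hash each of the four banks receives 4 of them.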

16
Shading
  • Simple Phong shading with bilinear texture
    mapping
  • They assume 20-80 cycles/ray
  • Say that need 3 FP adders and 4 FP multipliers
  • Would they need FP?
  • They say memory bandwidth requirements for
    shading similar to RTC (250MB/s-1GB/s)
  • Is this from the FB DRAM port?
  • Let's compute it
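One way to start that computation is sketched below. The byte counts are assumptions, not figures from the paper: an uncached bilinear lookup is taken to read four 4-byte texels, and each shaded ray is taken to write one 4-byte pixel to the frame buffer. The 1024x768 resolution is the one given under Test Conditions.

```python
# Assumed traffic per shaded ray (hypothetical, no texture cache).
TEXELS_PER_BILINEAR_LOOKUP = 4
BYTES_PER_TEXEL = 4
BYTES_PER_PIXEL_WRITE = 4
BYTES_PER_SHADED_RAY = TEXELS_PER_BILINEAR_LOOKUP * BYTES_PER_TEXEL + BYTES_PER_PIXEL_WRITE

PIXELS_PER_FRAME = 1024 * 768

def rays_per_second(bandwidth_bytes_per_s):
    """Shaded rays per second that a given memory bandwidth could feed."""
    return bandwidth_bytes_per_s / BYTES_PER_SHADED_RAY

def frames_per_second(bandwidth_bytes_per_s, rays_per_pixel=1):
    """Frame rate if every pixel needs `rays_per_pixel` shaded rays."""
    return rays_per_second(bandwidth_bytes_per_s) / (PIXELS_PER_FRAME * rays_per_pixel)

low_rate = rays_per_second(250e6)    # at 250 MB/s
high_rate = rays_per_second(1e9)     # at 1 GB/s
```

Under these assumptions the quoted 250MB/s-1GB/s range corresponds to roughly 12.5 to 50 million shaded rays per second, or about 16 to 64 frames per second with one shaded ray per pixel.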

17
Shading(2)
  • Shading parameters not carried through pipeline
  • Fetched once they know what to shade
  • Like deferred shading
  • How are they fetched? From where?
  • Sentence about asynchronous shading
  • Page 6
  • What does that mean?

18
Other Rays
  • Reflection, shadow rays
  • How are they generated?
  • They propose tossing away partially completed
    rays to avoid deadlocking if run out of room
  • Which rays?
  • Do they mean toss them away and regenerate later?
  • How?

19
Proposed Chip
  • 4 RTCs each with 16 threads and four traversal
    units
  • Clock rate of 533 MHz
  • Quite fast
  • Four SDRAMs at 133 MHz
  • Slow
  • Caches 64KB traversal, 64KB list, 144KB
    intersection
  • 192 floating point units
  • See next slide
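The 192 figure can be reproduced from numbers on earlier slides under one plausible accounting. The breakdown below is an assumption, not something the slides state: every FP add and multiply in the traversal, intersection and shading costs is counted as a dedicated unit, with one intersection unit and one shading unit per RTC.

```python
RTCS = 4
TRAVERSAL_UNITS_PER_RTC = 4
TRAVERSAL_FPUS = 3 + 1        # 3 adds + 1 multiply per traversal unit
INTERSECTION_FPUS = 12 + 13   # 12 adds + 13 multiplies, one unit per RTC (assumed)
SHADING_FPUS = 3 + 4          # 3 adders + 4 multipliers, one unit per RTC (assumed)
CLOCK_HZ = 533e6

fpus_per_rtc = TRAVERSAL_UNITS_PER_RTC * TRAVERSAL_FPUS + INTERSECTION_FPUS + SHADING_FPUS
total_fpus = RTCS * fpus_per_rtc

# Peak rate if every unit produced one result per clock.
peak_gflops = total_fpus * CLOCK_HZ / 1e9
```

This gives 48 units per RTC and 192 in total, for a peak of about 102 GFlops at 533 MHz if every unit were busy every cycle.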

20
GeForce3 Flops
  • Believe their 76 GFlops and 380 floating point
    units way too high
  • GeForce3 marketing: 100M tris/s
  • 200 MHz clock -> 2 clocks/vert
  • That probably means 5 floating point units
  • Based on a simple vertex taking 10 flops

21
Floating Point Units
  • Their number seems higher than could be put on a
    chip
  • They're assuming much hardware shading
    calculation done in FP
  • 1MB of registers/cache
  • Is this too high?

22
Test Scenes

23
Test Conditions
  • 1024x768, point sampled
  • All lights cast shadows
  • One shadow ray per light

24
(No Transcript)

25
Cruiser Scene
  • Bandwidth Limited
  • The BDQ scene needs 1MB cache and 2 GB/s memory
    bandwidth

26
Usage

27
More Complex Effects
  • (a) eye rays (er) only, (b) er and reflections up to
    3 levels (r3), (c) er and 3 lights (3l), (d) er,
    reflections and 3 lights, (e) er with a simple
    four times oversampling (4os)
  • Looks like performance is computation limited,
    not bandwidth
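To see why these configurations load the hardware so differently, it helps to count rays. The sketch below gives an upper bound per pixel, assuming every ray hits a surface, every surface reflects, and each hit spawns one shadow ray per light (as in the test conditions). The function name and the config labels are only for illustration.

```python
def rays_per_pixel(reflection_depth=0, lights=0, samples=1):
    """Upper bound on rays traced per pixel."""
    surface_hits = 1 + reflection_depth          # eye-ray hit plus one per reflection level
    return samples * surface_hits * (1 + lights) # each hit: the ray itself + shadow rays

PIXELS_PER_FRAME = 1024 * 768

configs = {
    "er": rays_per_pixel(),
    "r3": rays_per_pixel(reflection_depth=3),
    "3l": rays_per_pixel(lights=3),
    "r3+3l": rays_per_pixel(reflection_depth=3, lights=3),
    "4os": rays_per_pixel(samples=4),
}
rays_per_frame = {name: n * PIXELS_PER_FRAME for name, n in configs.items()}
```

By this bound the combined reflections-and-lights case traces up to 16 rays per pixel, about 12.6 million rays per frame, against 786,432 for eye rays alone.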

28
Summary
  • Static scenes
  • Simple shading
  • That could be fixed
  • Is ray tracing necessarily frame based?

29
Advanced Rendering Technology
  • Make accelerators for offline ray tracing
  • High-quality
  • RenderMan shading
  • Chip
  • 1.8M gates
  • 64 32-bit FPUs
  • Block diagram next

30
AR 350

31
GI-Cube
  • Global illumination for volumes
  • Architecture paper from SUNY Stony Brook

32
Examples

33
Algorithm
  • Two pass
  • Rays from lights to volume faces, with k
    supersamples
  • These distribute energy through volume
  • Rays from eye
  • Rays can be transmitted, absorbed or scattered

34
Machine Block Diagram

35
Block Processor
  • Scattering uses BRDF to change ray direction
  • If simple, done in hardware; if complex, sent to
    software
  • I assume software means the PC, not the DSP

36
Queues
  • Enough queues for all blocks
  • For 256^2 volume with 32^3 block size, 128 queues
    over 4 processors
  • Queue Overflow
  • Rays sent to DSP
  • They say this is OK because it means there's
    enough work at block processor anyway
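With one queue per block, the queue count is just the number of blocks in the volume. The helper below is a generic sketch; the volume dimensions tried are assumptions, since the slide's volume size is ambiguous as transcribed.

```python
from math import ceil, prod

def block_count(volume_dims, block_edge):
    """Number of cubic blocks (and hence queues) needed to tile the volume."""
    return prod(ceil(d / block_edge) for d in volume_dims)

full_cube = block_count((256, 256, 256), 32)   # a full 256^3 volume
thin_slab = block_count((256, 256, 64), 32)    # a 256 x 256 x 64 volume
queues_per_processor = full_cube // 4
```

A full 256^3 volume with 32^3 blocks gives 512 queues, or 128 per processor across four processors, while a 256 x 256 x 64 volume gives 128 queues in total. Either reading is compatible with the 128 figure on the slide.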

37
Queue Importance
  • Pipelined insertion sorter
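A pipelined insertion sorter is typically a chain of cells, each holding one entry: a new item enters at the head, every cell keeps the more important of its stored and incoming items and passes the other along. The software model below is a generic sketch of that idea, not the GI-Cube circuit. In hardware all cells compare in parallel, so one item can enter per clock; here the chain is walked sequentially. The class name and the "larger key is more important" convention are assumptions.

```python
class PipelinedInsertionSorter:
    """Chain of cells kept in descending key order; overflow falls off the end."""

    def __init__(self, capacity):
        self.cells = [None] * capacity

    def insert(self, key):
        """Insert `key`; return the entry pushed out of the last cell, if any."""
        carry = key
        for i in range(len(self.cells)):
            if self.cells[i] is None:
                self.cells[i] = carry          # first empty cell takes the carried item
                return None
            if carry > self.cells[i]:
                # The cell keeps the more important item and passes the other on.
                self.cells[i], carry = carry, self.cells[i]
        return carry                           # least important item is evicted

sorter = PipelinedInsertionSorter(capacity=3)
evicted = [sorter.insert(k) for k in (3, 1, 4, 1, 5)]
```

After these five insertions the cells hold the three most important keys in order, and the two least important keys have been pushed out.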

38
Redistribution Scheme
  • Works well for this volume because space is
    bounded
  • Known number of bins
  • Can the same thing be done for conventional ray
    tracer?

39
Next Week
  • Molnar here on Tuesday
  • Any questions for him?
  • Thursday we look at compositors
  • Lightning2
  • Metabuffer
  • Sepia?