Title: Neon Graphics Accelerator


1
Neon Graphics Accelerator
  • Anselmo Lastra

2
History
  • A proposed DEC product
  • Never sold, as far as I know
  • DEC was bought by Compaq
  • Compaq was later acquired by HP
  • Paper at Graphics Hardware 1998
  • Keep in mind that this was in the days of the
    3dfx Voodoo
  • Not many sources of info on single-chip GPUs
  • A more detailed tech report is available

3
Rasterizer-Side Only
  • Geometry processing on the host
    • A fast Alpha workstation
  • Also common on early PC products

4
Performance
  • Note the historical increase in performance
  • Memory bandwidth
    • Neon 3.2 GB/s, now about 40 GB/s, over 10x more
  • Fill
    • Neon about 2M 50-pixel textured triangles/s,
      i.e. 100 Mpixels/s
    • Now fill rates of 10 Gpixels/s
  • Transform
    • None, but 7M tris/s setup
    • Now 800 Mverts/s
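
The ratios on this slide are simple arithmetic. A small sketch that derives them from the slide's own figures (the constant and function names are illustrative; the "now" numbers are the presenter's contemporary figures):

```python
# Figures taken from the slide; the "now" values are the presenter's.
NEON_BANDWIDTH_MB_S = 3_200              # 3.2 GB/s
NOW_BANDWIDTH_MB_S = 40_000              # about 40 GB/s
NEON_TRIS_PER_S = 2_000_000              # 50-pixel textured triangles per second
PIXELS_PER_TRI = 50
NOW_FILL_PIXELS_PER_S = 10_000_000_000   # 10 Gpixels/s


def neon_fill_rate() -> int:
    """Pixel fill rate implied by the triangle rate and triangle size."""
    return NEON_TRIS_PER_S * PIXELS_PER_TRI


def growth() -> dict:
    """How many times larger the 'now' figures are than Neon's."""
    return {
        "bandwidth": NOW_BANDWIDTH_MB_S / NEON_BANDWIDTH_MB_S,
        "fill": NOW_FILL_PIXELS_PER_S / neon_fill_rate(),
    }


print(neon_fill_rate())  # 100000000
print(growth())          # {'bandwidth': 12.5, 'fill': 100.0}
```

Bandwidth grew about 12.5x, but fill rate grew 100x. That is one reason later chips lean so heavily on caches and compression.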

5
Block Diagram

6
Unified Memory
  • They argue for a single, unified memory
  • Common now, of course
  • Can trade off one type of storage for another
    • Smaller frame buffer, more texture memory
  • They note that texturing can cause thrashing
    between texel fetches and color/Z accesses
  • Is unified memory the best way to go?

7
Fragment Generator
  • Each cycle generates one of:
    • A single textured fragment
    • A 2x2 square of Z-buffered fragments with color
      (64 bits of data), comprising
      • RGB and alpha
      • Z
      • Fog intensity
    • 8 fragments with 32-bit solid color (or stipple
      pattern)
    • 32 8-bit 2D fragments

8
Texel Central
  • More than just texturing
  • Frame buffer transfers also go through here
    • DMA to memory, BitBlts
  • Textures one fragment per clock

9
Fetch Before Z Compare
  • Why?
  • Pre-texture data: approx. 350 bits per fragment
  • Post-texture data: 100 bits per fragment
  • Couldn't afford more (and wider) queues to
    texture map after the Z test
  • OpenGL requires the Z update to come after
    texturing (e.g., the alpha test depends on the
    textured alpha)
  • Distributing textured fragments to the memory
    controllers and then using only some of them
    would complicate maintaining spatial coherence (?)

10
Texture Interpolation
  • c is the fraction of u, v, or the LOD
  • They also added a table to implement separable
    filters (see TR WRL-99-1)

http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-99-1.html

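Each filtering step is a linear interpolation weighted by the fraction c. The sketch below builds bilinear and trilinear filtering from that one operation. It assumes a lerp of the form a + c(b − a), single-channel textures stored as lists of rows, edge clamping, and integer coordinates landing on texel centers. Those choices and the function names are illustrative, not Neon's actual datapath:

```python
import math


def lerp(a: float, b: float, c: float) -> float:
    """Linear interpolation; c is the fractional weight in [0, 1]."""
    return a + c * (b - a)


def bilinear(tex, u: float, v: float) -> float:
    """Sample a 2D texture (list of rows) at texel-space coords (u, v).

    c is the fraction of u for the horizontal lerps and the fraction of
    v for the vertical one. Coordinates clamp to the edge (assumption).
    """
    h, w = len(tex), len(tex[0])
    u = min(max(u, 0.0), w - 1.0)
    v = min(max(v, 0.0), h - 1.0)
    x0, y0 = math.floor(u), math.floor(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    cu, cv = u - x0, v - y0
    top = lerp(tex[y0][x0], tex[y0][x1], cu)
    bottom = lerp(tex[y1][x0], tex[y1][x1], cu)
    return lerp(top, bottom, cv)


def trilinear(mips, u: float, v: float, lod: float) -> float:
    """Blend bilinear samples from two mip levels; c is the LOD fraction.

    (u, v) are in level-0 texel units and are halved for each level.
    """
    lod = min(max(lod, 0.0), len(mips) - 1.0)
    l0 = math.floor(lod)
    l1 = min(l0 + 1, len(mips) - 1)
    a = bilinear(mips[l0], u / 2.0 ** l0, v / 2.0 ** l0)
    b = bilinear(mips[l1], u / 2.0 ** l1, v / 2.0 ** l1)
    return lerp(a, b, lod - l0)


mips = [[[0.0, 1.0], [2.0, 3.0]], [[4.0]]]
print(bilinear(mips[0], 0.5, 0.5))    # 1.5
print(trilinear(mips, 0.5, 0.5, 0.5))  # 2.75
```

Trilinear filtering is just seven lerps: three per bilinear sample and one between levels. That is why a single well-built interpolator unit can be reused throughout.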
11
Pixel Processors
  • Eight of them
  • One per memory controller
  • Responsible for
    • Z test
    • Alpha test
    • Stencil
    • Fog
    • Blending
    • Dithering

12
Arithmetic
  • Make a point of doing correct math
  • We ran into this on PixelFlow
  • Reference: Blinn, "Three Wrongs Make a Right,"
    IEEE CG&A, Nov. 1995
  • Conversion
    • Float to fixed point
    • Fixed point to float
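
A minimal sketch of the two conversions for 8-bit color. It assumes the convention used on the color-multiplication slide (255 represents 1.0), clamping, and round-to-nearest. The function names and the clamping policy are illustrative:

```python
def float_to_fixed(x: float, s: int = 255) -> int:
    """Map x in [0, 1] to an integer in [0, s], rounding to nearest."""
    x = min(max(x, 0.0), 1.0)   # clamp out-of-range input
    return int(x * s + 0.5)     # add 0.5, then truncate: round half up


def fixed_to_float(n: int, s: int = 255) -> float:
    """Map an integer in [0, s] back to [0, 1]; s maps exactly to 1.0."""
    return n / s
```

Plain truncation (`int(x * s)`) would instead bias every result downward, for example mapping 0.999 to 254. With rounding, every 8-bit value survives a round trip through float unchanged.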

13
Multiplication
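  • To see how this works, Blinn begins by converting
    to FP, then doing the multiplication
  • With A and B representing a = A/s and b = B/s,
    the product is $ab = AB/s^2$, stored as $C = s \cdot ab$
  • or $C = AB/s$, rounded as $C = \lfloor AB/s + 0.5 \rfloor$
  • Scaling the 0.5 by s: $C = \lfloor (AB + s/2)/s \rfloor$
  • If s is a power of 2, like 16384, can use shifts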
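
A minimal sketch of rounded fixed-point multiplication with s = 16384 = 2^14, following the steps on this slide. The raw product AB carries a scale of s², so one factor of s is divided out. Adding s/2 first makes the result round instead of truncate, and because s is a power of two the division becomes a right shift. Non-negative operands are assumed, and the names are illustrative:

```python
S_BITS = 14
S = 1 << S_BITS  # s = 16384 represents 1.0


def fixed_mul(a: int, b: int, s_bits: int = S_BITS) -> int:
    """Multiply fixed-point values a/s and b/s, returning (a*b/s) rounded.

    Assumes non-negative inputs. Adding s/2 before the shift rounds to
    nearest (ties go up); the shift replaces division by s = 2**s_bits.
    """
    half = 1 << (s_bits - 1)
    return (a * b + half) >> s_bits


print(fixed_mul(S, S))           # 16384  (1.0 * 1.0 = 1.0)
print(fixed_mul(S // 2, S // 2))  # 4096   (0.5 * 0.5 = 0.25)
print(fixed_mul(3, S // 2))       # 2      (1.5 rounds up; truncation gives 1)
```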

14
Color Multiplication
  • Say we're using 8-bit color, 0 to 255
  • To represent the range [0, 1], 255 is
    equivalent to 1.0, so s = 255
  • So, to multiply a × b, compute AB/255, rounded
  • In hardware?

15
Blinn Says

16
Why?
  • Division generates quotient j and remainder k:
    $AB = 255j + k$, with $0 \le k \le 254$
  • The border cases of the remainder are 127 and 128
    (255 is odd, so k/255 is never exactly 0.5)

17
The Two Biases
  • Bias of 128
  • Bias of 127

18
Explanation
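  • Turns out that, for exact division by 255, the bias b
    must satisfy $127 + b < 255$ and $128 + b \ge 255$
  • or $127 \le b < 128$; for an integer bias, b = 127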
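
The border-case argument can be checked by brute force over every 8-bit product. A sketch that assumes the form ⌊(AB + bias)/255⌋ with exact integer division and compares it against the correctly rounded quotient. The helper names are illustrative:

```python
def exact_round_div255(x: int) -> int:
    """Nearest integer to x / 255 (no ties occur, since 255 is odd)."""
    return (2 * x + 255) // 510


def biased_div255(x: int, bias: int) -> int:
    """Add a bias, then divide by 255 with integer (floor) division."""
    return (x + bias) // 255


def failures(bias: int):
    """All 8-bit pairs (a, b) for which the biased division rounds wrong."""
    return [
        (a, b)
        for a in range(256)
        for b in range(256)
        if biased_div255(a * b, bias) != exact_round_div255(a * b)
    ]


print(len(failures(127)))  # 0
print(failures(128)[0])    # (1, 127): remainder 127 is pushed up to 1
print(failures(126)[0])    # (1, 128): remainder 128 is not pushed up
```

Only bias 127 is correct for every product. A bias of 128 fails whenever the remainder is 127, and a bias of 126 fails whenever the remainder is 128.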

19
Implementation
  • Don't want to divide by 255
  • Blinn attributes this trick to Alvy Ray Smith
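
The slide does not reproduce the trick itself. A widely used divide-free form, assumed here to be the one meant, adds a bias of 128 and then replaces x/255 with (x + (x >> 8)) >> 8. It relies on 1/255 ≈ (1/256)(1 + 1/256). A sketch comparing it against a true division (function names are illustrative):

```python
def mul8_shift(a: int, b: int) -> int:
    """Rounded a * b / 255 for 8-bit a, b, using only adds and shifts.

    Bias by 128, then approximate x / 255 by (x + x / 256) / 256.
    """
    t = a * b + 128
    return (t + (t >> 8)) >> 8


def mul8_divide(a: int, b: int) -> int:
    """Reference: correctly rounded a * b / 255 with a real division."""
    return (a * b + 127) // 255


print(mul8_shift(255, 255))  # 255
print(mul8_shift(255, 128))  # 128
```

An exhaustive loop over all 65,536 input pairs shows the two functions agree. The shift version therefore gives exactly the correctly rounded result, and in hardware it costs one multiply, two adds, and wiring for the shifts.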

20
Video Controller
  • Always requests data from both banks, A and B
  • The memory controller chooses which bank to
    serve to maximize memory throughput
  • Video controller can request data immediately
    when reply will be late

21
Memory Controllers
  • 8 separate memory controllers
  • Each 32 bits wide
  • 32 SDRAM chips in total, running at 100 MHz
  • SDRAMs can have up to 4 banks
  • 23 address and control pins per controller
  • Total of 440 pins for memory
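
The totals on this slide are consistent with each other and with the 3.2 GB/s quoted earlier. A sketch of the arithmetic, assuming each controller's 32 data pins count toward the 440 and each controller moves one 32-bit word per SDRAM clock (names are illustrative):

```python
CONTROLLERS = 8
DATA_PINS_PER_CONTROLLER = 32        # each controller is 32 bits wide
ADDR_CTRL_PINS_PER_CONTROLLER = 23
SDRAM_CLOCK_HZ = 100_000_000         # 100 MHz


def memory_pins() -> int:
    """Data plus address/control pins across all controllers."""
    return CONTROLLERS * (DATA_PINS_PER_CONTROLLER + ADDR_CTRL_PINS_PER_CONTROLLER)


def peak_bandwidth_bytes_per_s() -> int:
    """Assumes one 32-bit transfer per clock per controller."""
    bytes_per_clock = CONTROLLERS * DATA_PINS_PER_CONTROLLER // 8
    return bytes_per_clock * SDRAM_CLOCK_HZ


print(memory_pins())                 # 440  (256 data + 184 address/control)
print(peak_bandwidth_bytes_per_s())  # 3200000000, i.e. 3.2 GB/s
```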

22
Memory Controller
  • Frame buffer partitioned across the 8 controllers
  • Five request queues each
    • Reads from Texel Central
    • Reads and writes from the Pixel Processors
    • A and B bank reads from the video controller
  • Chooses among the queues to minimize wasted
    memory cycles
  • Each controller has a texture cache: 8 32-bit
    texels, fully associative
    • Very small

23
Fragment Batches
  • Batches Z comparisons and Z/color writes
  • Need to detect when two fragments belong to the
    same pixel
  • 8-way fully associative overlap detector
  • Detector terminates the batch if an incoming
    fragment overlaps one already in it
  • Writes all data for the terminated batch before
    reading data for the next one
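
A minimal sketch of the batching rule. It assumes fragments are identified by pixel coordinates and that the 8-way detector also caps a batch at eight fragments. A Python set stands in for the associative match hardware, and `form_batches` is a hypothetical name:

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int]


def form_batches(fragments: Iterable[Pixel], max_batch: int = 8) -> List[List[Pixel]]:
    """Group fragments into batches that never touch the same pixel twice.

    A batch is terminated when an incoming fragment hits a pixel already
    in it, or when it is full; that fragment then starts the next batch.
    """
    batches: List[List[Pixel]] = []
    current: List[Pixel] = []
    in_batch = set()  # stands in for the 8-entry fully associative detector
    for frag in fragments:
        if frag in in_batch or len(current) == max_batch:
            batches.append(current)
            current, in_batch = [], set()
        current.append(frag)
        in_batch.add(frag)
    if current:
        batches.append(current)
    return batches


print(form_batches([(0, 0), (1, 0), (0, 0)]))  # [[(0, 0), (1, 0)], [(0, 0)]]
```

Terminating on overlap keeps the batched reads, compares, and writes correct. The second fragment at (0, 0) must see the Z value written by the first.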

24
Interleaved Frame Buffer
  • Checkerboard, but rotated
  • They suggest that 2x2 or 4x4 pixel batches would
    be better
    • Would need more memory at the controllers
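
The slides give only "checkerboard, but rotated" for the interleave, so the mapping below is hypothetical and not Neon's actual pattern. It shifts the controller assignment by three pixels on each scanline. That way, every 8-pixel run along a row or down a column, and every 2x2 block, is spread across distinct controllers:

```python
def controller_for(x: int, y: int, n: int = 8, rotate: int = 3) -> int:
    """Hypothetical rotated interleave: each scanline shifts the
    controller pattern by `rotate` pixels (rotate coprime to n)."""
    return (x + rotate * y) % n


for y in range(3):
    print([controller_for(x, y) for x in range(8)])
# [0, 1, 2, 3, 4, 5, 6, 7]
# [3, 4, 5, 6, 7, 0, 1, 2]
# [6, 7, 0, 1, 2, 3, 4, 5]
```

A rotation of 1 would put the two diagonal pixels of every 2x2 block on the same controller. An even rotation would leave a column covering only half the controllers. That is why 3 is used here.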

25
Fragment Generation
  • Chunked or tiled
  • Already discussed this

26
Memory Usage (from slides)

27
Performance Counters
  • Two 64-bit counters
  • Programmable
    • To select what to count
  • Not much detail provided

28
Chip
  • Shows how much area goes to each function
  • A lot of space is devoted to the memory
    controllers (MC)
    • Includes caches and request queues

29
Design Details
  • They used a C simulator
  • Wrote a C-to-Verilog tool
  • Developed some synthesis tools to help Synopsys

30
Technical Report
  • More detail in the tech report
  • http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-98-1.pdf
  • Might be useful for people implementing
    particular parts of the pipeline
    • Texture addressing and filtering, for example