Implementation of High-Rate JPEG2000 Coding on a Virtex-2 Pro Reconfigurable Computing Board



1
Implementation of High-Rate JPEG2000 Coding on a
Virtex-2 Pro Reconfigurable Computing Board
  • Presented by Damon Van Buren
  • SEAKR Engineering
  • MAPLD 2004
  • Submission 133

2
The Sensor Bandwidth Problem
  • Commercial satellite imaging systems are
    experiencing growth in imaging capability...
  • Higher resolution: < 1 m
  • Larger images: > 10k pixels in width and height
  • More spectral components
  • Panchromatic
  • Red/Green/Blue
  • Multi-spectral
  • Improved capabilities are leading to high sensor
    data rates
  • Data output rates > 2 Gbps for some systems
  • Providing storage and downlink bandwidth for the
    data is becoming a significant challenge for
    system designers
  • The largest data recorders can store less than 20
    minutes of data at 2 Gbps
  • Downlinks must be several hundred Mbps to
    downlink 15 minutes of data in under an hour
  • Data storage and high-bandwidth downlinks require
    lots of power
  • By reducing the amount of image data, compression
    provides a solution to the bandwidth problem!
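
The storage and downlink figures on this slide can be checked with simple arithmetic. The sketch below assumes decimal units and ignores protocol overhead; the function names are illustrative only and not part of the presentation.

```python
# Back-of-the-envelope check of the storage and downlink figures on this slide.
# Assumptions: decimal units (1 Gbit = 10**9 bits), no framing or protocol overhead.

SENSOR_RATE_GBPS = 2.0  # sensor output rate quoted on the slide


def recorded_gigabytes(rate_gbps: float, minutes: float) -> float:
    """Data volume in gigabytes for a recording of the given length."""
    return rate_gbps * minutes * 60 / 8


def downlink_rate_mbps(rate_gbps: float, record_minutes: float,
                       downlink_minutes: float) -> float:
    """Sustained downlink rate needed to empty the recording in the given time."""
    return rate_gbps * 1000 * record_minutes / downlink_minutes


print(recorded_gigabytes(SENSOR_RATE_GBPS, 20))      # 300.0
print(downlink_rate_mbps(SENSOR_RATE_GBPS, 15, 60))  # 500.0
```

Twenty minutes at 2 Gbps corresponds to 300 GB of raw data, and emptying 15 minutes of data in exactly one hour needs a sustained 500 Mbps, which matches the "several hundred Mbps" on the slide.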

3
Desired Compressor Features
  • Real Time
  • Compression must be performed in real time, prior
    to storage.
  • High throughput (> 2 Gbps)
  • Excellent Performance in Lossy and Lossless Modes
  • Purchasers of satellite imagery are sensitive to
    reductions in image quality caused by lossy
    compression.
  • Scientific users prefer undistorted data (bit
    true).
  • Space-Qualified
  • Must survive hazards of launch and space
    operation, including radiation.
  • Low Risk
  • Satellite imaging companies seek high-reliability
    solutions.
  • Low Cost
  • Commercial customers require cost effective
    solutions.
  • Flexible
  • The ability to support varying compression ratios
    and contents would allow more effective use of
    available storage and bandwidth.

4
JPEG2000 Algorithm
  • JPEG2000 is an excellent choice for satellite
    image compression.
  • Latest still image compression standard from the
    JPEG committee
  • Meets two key requirements for satellite image
    compression
  • Excellent performance in both lossy and lossless
    modes.
  • 1.7 to 1 lossless compression for typical
    satellite imagery: a 70% improvement!
  • Visually lossless compression > 2 to 1: a 100%
    improvement in storage and downlink performance.
  • Very flexible
  • Many options for compressed images.
  • Other advantages
  • International Standard
  • Wavelet based
  • High quality lossy images with compression
    ratios > 100:1
  • Packet oriented
  • Allows random access to the compressed code
    stream.
  • Makes compressed data more robust in the presence
    of bit errors.
  • Allows selection of image quality, spatial
    region, resolution, and color component after
    compression.
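
The percentage improvements quoted above follow directly from the compression ratio. A minimal sketch of that relationship, with hypothetical helper names and the 2 Gbps sensor rate taken from the earlier slide:

```python
def capacity_gain_percent(compression_ratio: float) -> float:
    """Extra image data that fits in the same storage or downlink, in percent."""
    return (compression_ratio - 1.0) * 100.0


def compressed_rate_gbps(sensor_rate_gbps: float, compression_ratio: float) -> float:
    """Data rate that storage and downlink must handle after compression."""
    return sensor_rate_gbps / compression_ratio


for ratio in (1.7, 2.0):
    gain = capacity_gain_percent(ratio)
    rate = compressed_rate_gbps(2.0, ratio)
    print(f"{ratio}:1 -> {gain:.0f}% more data, {rate:.2f} Gbps after compression")
```

A ratio of N:1 lets the same recorder hold N times as much imagery, so 1.7:1 corresponds to 70% more data and 2:1 to 100% more.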

5
JPEG2000 Implementation Challenges
  • JPEG2000 is a very complex algorithm.
  • More Features = More Complexity.
  • Operation intensive
  • Several hundred operations per pixel, because
    each bit must be processed many times, for the
    wavelet transform, entropy coding, MQ coding,
    packet generation, etc.
  • Complex
  • Many different stages to produce compressed
    output.
  • Wavelet transform.
  • Quantization.
  • Context generation.
  • Arithmetic coding.
  • Packet generation.
  • Many parameters must be tracked individually for
    each code block (64x64).
  • Memory intensive
  • Each pixel must be accessed many times, so many
    small buffers are needed to get good throughput.
  • Few processors are capable of implementing
    JPEG2000 at high rates!

6
High-Performance Processing Using Xilinx FPGAs
  • Xilinx FPGAs have many advantages for fast
    parallel processing
  • Millions of gates.
  • System clocks of several hundred MHz.
  • High speed I/O
  • 622 Mbps LVDS
  • Multi-Gigabit serial I/O
  • Hundreds of internal block RAMs.
  • Hundreds of internal 18 bit multipliers.
  • Xilinx FPGAs are available in space-qualified
    versions
  • Radiation testing is complete on the Virtex and
    Virtex-II devices.
  • 200 kRad total dose, latchup immune.
  • Radiation testing to begin on the Virtex-II Pro
    devices soon.
  • Xilinx FPGAs are very flexible, reducing risk
  • May be re-programmed an effectively unlimited
    number of times.
  • Configurations may be uploaded at any time during
    the mission to fix errors or add new capability.
  • Xilinx FPGAs are the best solution for fast
    compression in space!

7
Challenges for Xilinx Use in Space
  • The effects of radiation in spacecraft
    electronics are well known.
  • Caused primarily by charged particles.
  • May cause permanent damage over time by ionizing
    SiO2 (total dose).
  • May also cause errors in digital logic by
    upsetting registers (single event effects).
  • Mitigation techniques are used to reduce or
    eliminate the effect of radiation upsets.
  • Triple Modular Redundancy (TMR) uses voting to
    select the correct output from 3 separate
    instances of the design.
  • Mitigation of radiation effects in SRAM-based
    FPGAs presents an additional challenge
  • As with other digital electronics, the functional
    logic of the device is susceptible to upset,
    however...
  • Another layer of logic (configuration logic)
    controls the routing of the part, giving the
    device its capability to be reprogrammed to
    perform different functions.
  • Configuration logic is also susceptible to
    radiation upsets.
  • Xilinx FPGAs require system level mitigation
    strategies in addition to the device level
    mitigation techniques (such as TMR) that are
    commonly used for space electronics.
  • Configuration data must be continuously
    re-written, or scrubbed using a read-and-correct
    approach.
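
The TMR voting mentioned on this slide can be illustrated with a bitwise majority function. This is a software sketch of the idea only; in the FPGA the voter is logic inserted between triplicated copies of the design, and the name `tmr_vote` is hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote across three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


golden = 0b1011_0010
upset = golden ^ 0b0000_1000  # a single-event upset flips one bit in one copy

# One corrupted copy is outvoted by the two good ones.
assert tmr_vote(golden, upset, golden) == golden

# If two copies are upset in the same bit, the vote follows the wrong value.
assert tmr_vote(golden, upset, upset) == upset
```

The second assertion shows why voting alone is not enough for SRAM-based FPGAs: upsets that accumulate in the configuration logic can eventually defeat the vote, which is the motivation for scrubbing.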

8
SEAKR's RCC Board Processing Solutions
  • SEAKR has developed a line of Reconfigurable
    Computing (RCC) products based on the Xilinx
    FPGAs.
  • RCC 1: 4x Virtex 1000s
  • RCC 2: 4x Virtex II 6000s
  • RCC 3 (NTRCC): 4x Virtex II Pro 70/100s
  • Boards include system-level upset mitigation
    (scrub) for the Xilinx devices.
  • Configuration data is continuously read and
    checked for errors.
  • Errors are corrected by overwriting the corrupted
    frames, without interrupting the operation of the
    device.
  • Other devices on board employ radiation
    mitigation strategies as well
  • Radiation hardened
  • EDAC
  • Boards also have dedicated resources to support
    high-performance processing
  • High speed I/O.
  • External memories.
  • Industry standard form-factor 6U Compact PCI.
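
The read-and-correct scrub described on this slide can be sketched as a loop over configuration frames. Everything below is a simplified model under stated assumptions: the frame size, the use of a CRC, and the in-memory "golden" copy are illustrative, and the actual board accesses the device through its configuration interface, which the slides do not detail.

```python
import zlib


def scrub_pass(config_frames: list, golden_frames: list) -> list:
    """One read-and-correct pass; returns the indices of frames that were repaired."""
    repaired = []
    for index, golden in enumerate(golden_frames):
        # Read back the live frame and compare its checksum with the reference.
        if zlib.crc32(config_frames[index]) != zlib.crc32(golden):
            config_frames[index] = golden  # rewrite only the corrupted frame
            repaired.append(index)
    return repaired


# Hypothetical device with eight 16-byte frames.
golden = [bytes([i]) * 16 for i in range(8)]
live = list(golden)

# Inject a single-bit upset into frame 5.
corrupted = bytearray(live[5])
corrupted[3] ^= 0x01
live[5] = bytes(corrupted)

print(scrub_pass(live, golden))  # [5]
print(scrub_pass(live, golden))  # []
```

Only the corrupted frame is overwritten, which reflects the slide's point that errors are corrected without interrupting the operation of the device.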

9
Network RCC (NTRCC)
  • Four Xilinx XC2VP70-6FF1704 FPGA coprocessors (COPs)
  • Design compatible with XC2VP100-6FF1706 and V2P-X
  • (4) banks of 1Mx36 Quad Data Rate (QDR) SRAMs for
    each COP
  • 512 MB of DDRII shared SDRAM memory for prototype
  • 1 GB of 128M x 64 EDAC (R-S) protected DDRII SDRAM
    shared memory (19.2 Gbps @ 150 MHz) using 1 Gbit
    memory
  • Network IF
  • (2) parallel 16bit RapidIO ports to front panel
    (8 Gbps)
  • (1) 4x3.125 Gbps serial port to front panel
    (> 10 Gbps)
  • 4x3.125 Gbps ports from NIC to each COP (> 10 Gbps)
  • 4x3.125 Gbps ports from each COP to each neighbor
    COP (> 10 Gbps)
  • Shared Data Buses
  • COP Interconnect Bus (4.224 Gbps)
  • cPCI: 32-bit, 33 MHz
  • Read and write COP configurations via cPCI
  • Extended 6U form factor
  • Configuration RAM SEU detection and correction
  • DDRII SDRAM on configuration controller for
    shadow config program storage
  • Non-Volatile memory for 16 different
    configurations (1 Gbit Flash)

10
Network RCC Block Diagram

11
NTRCC Layout
  • 24 Layer board
  • MicroVias, blind vias, via-in-pad
  • High speed 3.125 Gbps Serial links
  • 82 pages of schematic capture
  • 10 weeks of PCB layout time

12
Implementation of the JPEG2000 Algorithm
  • The JPEG2000 core has been in development for
    over a year.
  • Eventual target data rate: 600 Mbps/device.
  • Written in VHDL.
  • Simulations performed in ModelSim.
  • Synthesis in Synplify Pro.
  • Targeted to the NTRCC-R in summer '04.
  • Targeted to a reduced version of the NTRCC with a
    single coprocessor.
  • Take advantage of improved external memory
    throughput.
  • Ultimately use the high-speed serial I/O to move
    image information on the board.
  • Designed for high throughput.
  • Cycle efficient coding style.
  • Highly parallel design.
  • Pipelined architecture.
  • Rolling wavelet transform.
  • Designed for flexible output file format.
  • Output is divided into quality layers for easy
    selection of compression ratio.

13
JPEG2000 Block Diagram

13
JPEG2000 Coding Steps
  • Image is broken into tiles
  • Tiles are wavelet transformed
  • 5/3 reversible or 9/7 irreversible, also user
    defined.
  • Selectable number of transform levels.
  • Each subband from the transform is further broken
    up into code blocks (typically 32x32 or 64x64)
    for entropy coding.
  • Each code block is entropy coded, starting from
    the top bit plane and working down.
  • The current bit of each pixel is passed to an
    arithmetic coder, along with context information.
  • The MQ encoder takes advantage of any skewing of
    the probability for each context, and adapts
    contexts as the coding progresses.
  • Packets are formed by combining the entropy coder
    outputs from a single resolution.
  • Tile parts are formed from all the packets in a
    given bit plane.
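
The reversible 5/3 transform named above is usually implemented with integer lifting steps. The following is a minimal one-dimensional, single-level sketch for even-length integer input, with symmetric extension at the edges. It is an illustration in software and not the VHDL design from the presentation, and the function names are hypothetical. A 2-D tile transform would apply it along rows and then columns.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting transform on an even-length list of ints."""
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    # Predict step: odd samples become high-pass (detail) coefficients.
    high = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]  # mirror at the edge
        high.append(x[2 * i + 1] - (left + right) // 2)
    # Update step: even samples become low-pass (smooth) coefficients.
    low = []
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[0]  # mirror at the edge
        low.append(x[2 * i] + (prev + high[i] + 2) // 4)
    return low, high


def dwt53_inverse(low, high):
    """Undo dwt53_forward exactly by running the lifting steps in reverse order."""
    half = len(low)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[0]
        x[2 * i] = low[i] - (prev + high[i] + 2) // 4
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = high[i] + (left + right) // 2
    return x


samples = [10, 12, 14, 13, 9, 7, 8, 11]
low, high = dwt53_forward(samples)
assert dwt53_inverse(low, high) == samples  # bit-true reconstruction
```

Because every step uses only integer additions and floor divisions that the inverse repeats exactly, the round trip is lossless, which is what makes the 5/3 filter suitable for the bit-true mode.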

15
JPEG2000 Architecture Drivers
  • To achieve high data rates, the processing must
    be parallelized as much as possible.
  • The tall pole in the tent is the arithmetic
    coding, because the coding of a single data bit
    with its context can take several clock cycles.
  • Significance propagation coding is also a
    challenge, because each coefficient must be
    accessed many times, as each bit plane is
    processed.
  • Other operations, such as wavelet transform, code
    block loading, and packet generation are much
    more efficient, and require fewer parallel paths.
  • A pipelined architecture with many entropy coders
    in parallel was used to achieve the required
    throughput.
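
A rough sizing model shows why many entropy coders are needed. The sketch below assumes each coder handles one input bit every `cycles_per_bit` clocks and that the load is perfectly balanced across coders; both are simplifications, and the function name is hypothetical. The inputs are figures quoted elsewhere in these slides (600 Mbps target, 66 MHz clock goal, at least 2 cycles per coded bit).

```python
import math


def coders_needed(target_mbps: float, clock_mhz: float, cycles_per_bit: float) -> int:
    """Smallest number of parallel coders that meets the target input rate."""
    per_coder_mbps = clock_mhz / cycles_per_bit
    return math.ceil(target_mbps / per_coder_mbps)


for cycles in (2, 3, 4):
    count = coders_needed(600, 66, cycles)
    print(f"{cycles} cycles/bit -> {count} coders")
```

The count grows in proportion to the cycles spent per bit, which is why the arithmetic coder dominates the architecture.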

16
Architecture Description
  • Processes 256x256 tiles.
  • Pipelined architecture, using separate external
    memories for image, tile, and compressed data
    storage.
  • 19 Entropy coders working in parallel to improve
    throughput, one for each code block.
  • 64x64 code blocks.
  • FIFO buffering between the stages improves data
    flow efficiency.
  • A rolling wavelet transform is used to reduce
    memory accesses and improve efficiency.
  • Entropy coder outputs are formed into layers,
    giving each tile a progressive output format.
  • Tile parts are interleaved as the image tiles are
    processed.
  • Performs lossy or lossless compression.
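
The number of code blocks in a tile depends on the tile size, the code-block size, and the number of wavelet levels. The sketch below counts them for a square tile, assuming the code-block grid is aligned to each subband's origin. The slides do not state the number of transform levels; three levels is an assumption that happens to give 19 code blocks for a 256x256 tile with 64x64 blocks.

```python
import math


def code_blocks_per_tile(tile_size: int, block_size: int, levels: int) -> int:
    """Count code blocks in a square tile after a dyadic wavelet decomposition."""
    total = 0
    size = tile_size
    for _ in range(levels):
        size = math.ceil(size / 2)                     # subband edge length at this level
        per_subband = math.ceil(size / block_size) ** 2
        total += 3 * per_subband                       # HL, LH and HH subbands
    total += math.ceil(size / block_size) ** 2         # remaining LL subband
    return total


for levels in range(5):
    print(levels, code_blocks_per_tile(256, 64, levels))
```

With three levels the count is 12 + 3 + 3 + 1 = 19, where the blocks in the two coarsest levels are only partially filled (32x32 samples each).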

17
NTRCC-R Implementation Results
  • The JPEG2000 encoder was targeted to the V2Pro 70
    FPGA on the NTRCC-R.
  • Lossless or Lossy compression.
  • Data precision up to 13 bits.
  • Simulation and Routing Results
  • Slices: 30043 out of 33088 (90%)
  • Block RAMs: 148 out of 328 (45%)
  • Max system clock: 43 MHz without optimization.
  • Hardware Throughput
  • 140 Mbps w/ 33 MHz clock (depending on image).
  • 180 Mbps w/ 43 MHz clock.
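
The utilization percentages and the throughput per clock cycle can be derived from the numbers above. A short sketch, with hypothetical helper names:

```python
def utilization_percent(used: int, available: int) -> float:
    """Share of a device resource consumed by the design."""
    return 100.0 * used / available


def bits_per_clock(throughput_mbps: float, clock_mhz: float) -> float:
    """Average number of input bits processed per clock cycle."""
    return throughput_mbps / clock_mhz


print(f"Slices:     {utilization_percent(30043, 33088):.1f}%")
print(f"Block RAMs: {utilization_percent(148, 328):.1f}%")
print(f"33 MHz: {bits_per_clock(140, 33):.2f} bits/clock")
print(f"43 MHz: {bits_per_clock(180, 43):.2f} bits/clock")
```

Both clock rates work out to roughly 4.2 bits per clock, which suggests throughput tracked the clock frequency in these measurements.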

18
JPEG2000 Floorplan
  • The Pro 70 Device is quite full!

19
Planned Improvements
  • Optimize design to hit 66 MHz.
  • Un-optimized design will operate at up to 43 MHz.
  • Use of asynchronous FIFOs will allow optimal
    clocking of various parts of the design.
  • Improve pipelining of code block loader and
    wavelet transform.
  • Allow autonomous operation of each stage, so
    that operations take place as soon as input data
    and output buffers are ready.
  • Make use of additional QDR SRAMs available to
    each coprocessor by creating separate buffers for
    wavelet transform and packetizer output.
  • NTRCC has 4 QDR memories for each coprocessor.
  • Arithmetic coder bypass.
  • Arithmetic coder requires > 2 cycles per bit
    coded, on average.
  • 9/7 wavelet transform with quantization.
  • Use of the 9/7 wavelet results in better SNR and
    max error performance for lossy compression.
  • Add RapidIO serial interface to Network Interface
    Chip (NIC).
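
A simple projection indicates how much of the 600 Mbps target the clock-rate improvement alone could deliver. The sketch assumes throughput scales linearly with clock frequency, which is a simplification, and uses the 180 Mbps at 43 MHz measurement reported earlier; the function name is hypothetical.

```python
def scaled_throughput_mbps(measured_mbps: float, measured_mhz: float,
                           target_mhz: float) -> float:
    """Throughput at a new clock rate, assuming linear scaling with frequency."""
    return measured_mbps * target_mhz / measured_mhz


projected = scaled_throughput_mbps(180, 43, 66)
remaining_factor = 600 / projected

print(f"Projected at 66 MHz: {projected:.0f} Mbps")
print(f"Further speedup needed for 600 Mbps: {remaining_factor:.2f}x")
```

Under that assumption the 66 MHz clock yields roughly 276 Mbps, so the other items on this list (pipelining, additional memories, coder bypass) would need to contribute a little over 2x between them.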

20
Conclusions
  • The JPEG2000 core is expected to provide a
    valuable option for satellite imagery systems.
  • Compression will result in a dramatic improvement
    in system performance.
  • Lossless compression will allow 70% more image
    data to be stored and downlinked by a system.
  • Lossy compression will allow even greater
    improvements.
  • NTRCC hardware is an excellent platform for the
    compressor.
  • High bandwidth interconnect and I/O (several
    Gbps).
  • High bandwidth external memories.
  • Excellent processing capability with the
    Virtex-II Pro devices.
  • The sky's the limit!
  • Target rate of 600 Mbps per device appears to be
    a realistic goal.
  • Some improvements are left to be made to the
    clock rate and pipelining of the design.