Emulated Digital CNN-UM Implementation of a - PowerPoint PPT Presentation

Emulated Digital CNN-UM Implementation of a 3-dimensional Ocean Model on FPGAs, by Zoltán Nagy and Péter Szolgay

Provided by: nag72
Learn more at: http://www.klabs.org

1
Emulated Digital CNN-UM Implementation of a
3-dimensional Ocean Model on FPGAs
  • Zoltán Nagy, Péter Szolgay

2
Introduction
  • Cellular Neural/Nonlinear Networks Universal
    Machine (CNN-UM)
  • Ocean modeling
  • Results
  • Conclusions

3
Cellular Neural/Nonlinear Networks (CNN)
  • 2- or N-dimensional grid
  • Locally connected
  • Analog processing elements
  • State value is continuous in time

4
Structure of a CNN cell
  • u_ij: input
  • x_ij: state
  • y_ij: output
  • z_ij: constant bias
  • A_ij,kl: feedback template
  • B_ij,kl: feed-forward template
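
The quantities listed above fit the standard Chua-Yang CNN cell equation, in which the state derivative is -x_ij plus the feedback template applied to neighboring outputs, the feed-forward template applied to neighboring inputs, and the bias. A minimal Python sketch of one forward-Euler update of such an array follows. The function names, the 3x3 template size, the zero boundary condition, and the uniform bias are illustrative assumptions, not details taken from the slides.

```python
def output(x):
    """Standard CNN output nonlinearity: piecewise-linear saturation to [-1, 1]."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def euler_step(x, u, A, B, z, h):
    """One forward-Euler update of every cell in a 2D CNN with 3x3 templates.

    x, u : state and input arrays (lists of rows)
    A, B : 3x3 feedback and feed-forward templates
    z    : constant bias (the same for every cell here, an assumption)
    h    : time step
    Cells outside the array are treated as zero (assumed boundary condition).
    """
    rows, cols = len(x), len(x[0])
    y = [[output(v) for v in row] for row in x]
    new_x = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = -x[i][j] + z
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    k, l = i + di, j + dj
                    if 0 <= k < rows and 0 <= l < cols:
                        acc += A[di + 1][dj + 1] * y[k][l]
                        acc += B[di + 1][dj + 1] * u[k][l]
            new_x[i][j] = x[i][j] + h * acc
    return new_x
```

Calling `euler_step` repeatedly advances the whole array in time. An emulated digital design performs the same kind of multiply-accumulate per cell, but in fixed-point hardware.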

5
CNN-UM implementations
  • Software simulation
      • Easy to implement
      • Slow, even when using processor-specific
        instructions
  • Emulated digital VLSI
      • Specialized digital architecture
      • Selectable computing precision (Castle
        architecture: 1, 6, or 12 bits)
      • Orders of magnitude faster than software
        simulation
      • Long design time
  • Analog VLSI
      • Huge computing power (TeraOP/s)
      • Low accuracy (7-8 bits)
      • Noise and temperature sensitivity

6
Structure of the Falcon emulated digital CNN-UM
  • Mixer
      • Contains cell values for the next updates
  • Memory unit
      • Contains a belt of the cell array
  • Template memory
  • Arithmetic unit
  • Processors can be connected on a grid
      • Linear speedup

7
Structure of the arithmetic unit
  • Cells are updated in row-wise order
  • Cycle time depends on template size
  • Fully pipelined

8
Configurable parameters
  • State, template, and constant widths from 2 to
    64 bits
  • Number of templates
  • Size of the templates
  • Width of the cell array slice
  • Number of layers
  • Number and arrangement of the processor cores

9
Example: Solution of a simple PDE on CNN
  • The wave equation
  • Spatial discretization
  • 2-layer CNN
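
The equations from this slide did not survive in the transcript. As a hedged sketch of the usual approach, the second-order wave equation can be split into two first-order equations, one per CNN layer, and the Laplacian replaced by a five-point finite difference. The wave speed c, the grid spacing Δx, and the auxiliary variable v are assumed symbols, not ones given in the slides.

```latex
\[
\frac{\partial^2 u}{\partial t^2} = c^2 \nabla^2 u
\quad\Longrightarrow\quad
\begin{cases}
\dfrac{d u_{ij}}{dt} = v_{ij} \\[6pt]
\dfrac{d v_{ij}}{dt} = \dfrac{c^2}{\Delta x^2}
  \left(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} - 4u_{ij}\right)
\end{cases}
\]
\[
A_{v \leftarrow u} = \frac{c^2}{\Delta x^2}
\begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\]
```

Under these assumptions one layer holds u and the other holds v. The inter-layer template from the u layer to the v layer is the scaled Laplacian stencil, and the template from the v layer to the u layer is a single central 1.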

10
Ocean models
  • Barotropic model
  • Baroclinic models
      • z-coordinate model
      • σ-coordinate model
      • Isopycnal model
  • Fine resolution models
      • Real-time forecast
      • Fishing industry
      • Search and rescue
  • Coarse resolution models
      • Long-term predictions
      • Climate modeling

11
The Princeton Ocean Model (POM)
  • Sigma coordinate model
      • Vertical coordinate is scaled on the water
        column depth
  • Second moment turbulence closure sub-model
      • Provides vertical mixing coefficients
  • Solution technique: mode splitting
      • Internal mode (3D)
          • Vertical structure equations
          • Implicit solution
      • External mode (2D)
          • Vertically integrated equations
          • Explicit solution (leapfrog method)
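
The leapfrog method named for the external mode advances the solution with a centered time difference, u(n+1) = u(n-1) + 2·Δt·f(u(n)). The Python sketch below applies it to a toy oscillator rather than to the ocean equations. The function names, the forward-Euler start-up step, and the test problem are illustrative assumptions.

```python
def leapfrog(f, u0, dt, steps):
    """Integrate du/dt = f(u) with the leapfrog (centered-difference) scheme.

    u^{n+1} = u^{n-1} + 2*dt*f(u^n). The first step uses forward Euler
    because leapfrog needs two previous time levels.
    u is a tuple of floats; f returns a tuple of the same length.
    """
    prev = u0
    curr = tuple(a + dt * b for a, b in zip(u0, f(u0)))  # Euler start-up step
    for _ in range(steps - 1):
        nxt = tuple(p + 2.0 * dt * d for p, d in zip(prev, f(curr)))
        prev, curr = curr, nxt
    return curr


def oscillator(u):
    """Toy stand-in for a wave-like system: x' = v, v' = -x."""
    x, v = u
    return (v, -x)


# Integrate to t = 10 and check that the energy x^2 + v^2 stays close to 1.
x, v = leapfrog(oscillator, (1.0, 0.0), dt=0.01, steps=1000)
energy = x * x + v * v
```

Each step needs only the two previous time levels and one evaluation of the right-hand side, which suits a pipelined arithmetic unit. The price is a stability limit on the time step.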

12
Governing equations of the external (2D) mode
  • H: depth of the ocean
  • g: gravitational acceleration
  • τ_w, τ_b: wind and bottom stress
  • A: lateral viscosity
  • u_x, u_y: mass transport
  • η: free surface elevation
  • Ω: angular rotation of the Earth
  • θ: latitude
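
The equations themselves are missing from the transcript. For orientation only, a commonly used vertically integrated (shallow-water) form consistent with the listed symbols is sketched below, writing η for free surface elevation, Ω for the Earth's rotation, θ for latitude, and τ for the stresses. The arrangement of terms, the x and y stress components, and stresses taken per unit density are assumptions and may differ from what the slide showed.

```latex
\begin{align}
\frac{\partial \eta}{\partial t} &=
  -\frac{\partial u_x}{\partial x} - \frac{\partial u_y}{\partial y} \\
\frac{\partial u_x}{\partial t} &=
  -gH\frac{\partial \eta}{\partial x} + \tau_{wx} - \tau_{bx}
  + A\nabla^2 u_x + 2\Omega\sin\theta\, u_y
  - \left[\frac{\partial}{\partial x}\!\left(\frac{u_x^2}{H}\right)
        + \frac{\partial}{\partial y}\!\left(\frac{u_x u_y}{H}\right)\right] \\
\frac{\partial u_y}{\partial t} &=
  -gH\frac{\partial \eta}{\partial y} + \tau_{wy} - \tau_{by}
  + A\nabla^2 u_y - 2\Omega\sin\theta\, u_x
  - \left[\frac{\partial}{\partial x}\!\left(\frac{u_x u_y}{H}\right)
        + \frac{\partial}{\partial y}\!\left(\frac{u_y^2}{H}\right)\right]
\end{align}
```

The bracketed advection terms contain products of state variables, so they cannot be expressed with a linear template.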

13
Solution on CNN
  • Spatial discretization on a uniform grid
  • 3-layer CNN structure
  • Non-linear template required for the advection term
      • Cannot be solved on analog VLSI CNN chips
  • Solvable on the modified Falcon architecture
      • Support of non-linearity
      • Specialized cell model

14
The modified arithmetic unit of the Falcon architecture

15
Implementation on FPGA
  • Complicated arithmetic unit
  • Fixed-point number representation
      • Configurable precision
  • High-level hardware description language
    required (e.g. Handel-C)
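
To illustrate what configurable fixed-point precision means in practice, the Python sketch below quantizes a real value to a signed fixed-point code and measures the rounding error at two word widths. The function names, the saturation behavior, and the split between integer and fraction bits are assumptions chosen for illustration, not the Falcon design.

```python
def to_fixed(value, frac_bits, total_bits):
    """Quantize a real number to a signed two's-complement fixed-point code.

    Saturates at the representable range instead of wrapping on overflow
    (an assumed policy).
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    code = int(round(value * scale))
    return max(lo, min(hi, code))


def to_real(code, frac_bits):
    """Convert a fixed-point code back to a real number."""
    return code / (1 << frac_bits)


def quantization_error(value, frac_bits, total_bits):
    """Absolute error introduced by one round trip through fixed point."""
    return abs(value - to_real(to_fixed(value, frac_bits, total_bits), frac_bits))


# Compare an 8-bit word (6 fraction bits) with a 24-bit word (22 fraction bits).
err_8 = quantization_error(0.1, 6, 8)
err_24 = quantization_error(0.1, 22, 24)
```

Wider words shrink the rounding error of each operation but cost more FPGA area per arithmetic unit, which is the trade-off a configurable width exposes.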

16
Performance

17
The Seamount problem

18
Results after 72 hours
  • Circulation pattern
  • Elevation

19
Error of the solution

20
Error of the solution

21
Memory requirements of the internal (3D) equations
  • Extended memory hierarchy
      • New level stores 3 cross-sectional slices from
        the 3D array
      • Large memory required (e.g. for a 512x512x64
        grid, 3x512x64 elements per state variable)
      • Cannot be stored on-chip
      • Off-chip storage requires huge I/O bandwidth
  • Processor array should be used
      • The 3D array is divided between the processors
      • Optimal data set for on-chip storage: 2048
        elements per cross-sectional slice (512x32x64
        grid per processor)
      • Each processor is located on a separate FPGA
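
The sizes quoted above can be checked with a few lines of arithmetic. The Python sketch below uses hypothetical helper names and assumes the grid is split only along its second dimension. The processor count is derived from the quoted grid sizes and is not stated in the slides.

```python
def slice_elements(ny, nz):
    """Elements in one cross-sectional slice of an nx x ny x nz grid."""
    return ny * nz


def buffer_elements(ny, nz, slices=3):
    """Elements buffered per state variable when `slices` slices are kept."""
    return slices * slice_elements(ny, nz)


full_grid = (512, 512, 64)       # whole model grid from the slide
per_processor = (512, 32, 64)    # sub-grid handled by one processor

full_buffer = buffer_elements(full_grid[1], full_grid[2])
proc_slice = slice_elements(per_processor[1], per_processor[2])
processors = full_grid[1] // per_processor[1]   # derived, assumes a 1D split

print(full_buffer, proc_slice, processors)      # prints: 98304 2048 16
```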

22
Solution of the internal (3D) equations
  • Implicit solution
      • Fixed-point solution
          • Requires large precision to avoid rounding
            errors
          • Seems to be impractical
      • Floating-point solution
          • Requires large area (especially add/sub)
  • Explicit solution
      • Smaller timestep
      • Simpler arithmetic unit

23
Conclusions
  • Ocean modeling using emulated digital CNN is very
    promising
  • Moderate precision is required in 2D mode
      • 1 accuracy using 24 bits
  • Expected speedup (compared to a 2 GHz Athlon64
    microprocessor)
      • 80 times on our RC200 prototyping board
      • 3700 times on the largest available FPGA