Title: ALTERA FPGAs and NIOSII
1ALTERA FPGAs and NIOSII
- ELG6158 Computer Systems Architecture
- Miodrag Bolic
2Presentation Outline
- Basic description of Stratix Altera Devices
- NIOS II processor architecture
- How to design a system using NIOS II processor
3Stratix EP1S10 [2]
6TriMatrix Memory [1]
Dedicated External Memory Interface
M512 Blocks (512 bits per block + parity)
- Small FIFOs
- Shift Register
- Rake Receiver Correlator
- FIR Filter Delay Line
M4K Blocks (4 Kbits per block + parity)
- Header / Cell Storage
- Channelized Functions
- ATM Cell/Packet Processing
- Nios Program Memory
- Look-Up Schemes
- Packet & Cell Buffering
- Cache
M-RAM (512 Kbits per block + parity)
- Packet / Data Storage
- Nios Program Memory
- System Cache
- Video Frame Buffers
- Echo Canceller Data Storage
More Bits For Larger Memory Buffering
More Data Ports for Greater Memory Bandwidth
7Memory Bandwidth Summary: Stratix Device Family [1]
Device   Total RAM Bits   M-RAM Blocks   M4K Blocks   M512 Blocks   Maximum Bandwidth (Mbps)
EP1S10   920,448          1              60           94            1,245,024
EP1S20   1,669,248        2              82           194           2,096,928
EP1S25   1,944,576        2              138          224           2,894,400
EP1S30   3,317,184        4              171          295           3,750,192
EP1S40   3,423,744        4              183          384           4,384,800
EP1S60   5,215,104        6              292          574           6,762,528
EP1S80   7,427,520        9              364          767           8,784,720
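The Total RAM Bits column can be cross-checked against the block counts. The following is a minimal C sketch, assuming each block's capacity includes its parity bits (576 bits per M512, 4,608 per M4K, 589,824 per M-RAM). Those per-block figures are an assumption that the slide does not state explicitly, although they reproduce every row of the table. The function name `total_ram_bits` is illustrative only.

```c
#include <stdint.h>

/* Assumed per-block capacities in bits, parity included
   (512 + 64, 4096 + 512, 524288 + 65536). */
enum {
    M512_BITS = 576,
    M4K_BITS  = 4608,
    MRAM_BITS = 589824
};

/* Total embedded RAM bits for a device, given its block counts. */
uint32_t total_ram_bits(uint32_t mram, uint32_t m4k, uint32_t m512)
{
    return mram * MRAM_BITS + m4k * M4K_BITS + m512 * M512_BITS;
}
```

For example, `total_ram_bits(1, 60, 94)` gives the 920,448 bits listed for the EP1S10.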
9Logic Array Blocks (LAB) [2]
- 10 LEs
- Local Interconnect
- LAB-Wide Control Signals
[Figure: LAB with its LEs, local interconnect, and control signals]
10LAB Arrangement
- LABs Communicate Directly with Each Other & Other Blocks, Both Horizontally & Vertically
[Figure: LAB column and LAB row, with M512 blocks]
11Logic Elements
- Smallest Units of Logic
- Used for Combinatorial/Registered Logic
[Figure: Stratix LE. Inputs: Register Chain Input, Carry-In, LUT Chain Input. Outputs: Carry-Out, Register Chain Output, LUT Chain Output. Connections: General Routing, Local Routing]
12Total LE Resources
Device Total LEs
EP1S10 10,570
EP1S20 18,460
EP1S25 25,660
EP1S30 32,470
EP1S40 41,250
EP1S60 57,120
EP1S80 79,040
13LE Datasheet Image
14LE Features
- 4-Input Look-Up Table (LUT)
- Configurable Register
- 2 Operation Modes
- Dynamic Add/Subtract Control
- Carry-Select Chain Logic
- Performance-Enhancing Features
- LUT & Register Chain
- Area-Enhancing Features
- Register Packing & Feedback
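The 4-input LUT at the heart of each LE can be pictured as a 16-entry truth table indexed by the four data inputs. The C model below is only an illustrative sketch: the type `lut4_t`, the function names, and the choice of data1 as the least significant index bit are assumptions made for the example, not details taken from the Stratix datasheet.

```c
#include <stdint.h>

/* A 4-input LUT stores one output bit for each of the 16 input
   combinations; the 16-bit mask is the truth table. */
typedef struct {
    uint16_t mask;
} lut4_t;

/* Evaluate the LUT: the four data inputs form a 4-bit index
   (data1 is the least significant bit in this sketch). */
int lut4_eval(lut4_t lut, int data1, int data2, int data3, int data4)
{
    unsigned index = (unsigned)(data1 & 1)
                   | (unsigned)(data2 & 1) << 1
                   | (unsigned)(data3 & 1) << 2
                   | (unsigned)(data4 & 1) << 3;
    return (lut.mask >> index) & 1;
}

/* Build the mask for any 4-input Boolean function by enumerating
   all 16 input combinations. */
lut4_t lut4_program(int (*fn)(int, int, int, int))
{
    lut4_t lut = { 0 };
    for (unsigned i = 0; i < 16; i++) {
        if (fn(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1))
            lut.mask |= (uint16_t)(1u << i);
    }
    return lut;
}

/* Example function: 4-input XOR (odd parity). */
int xor4(int a, int b, int c, int d)
{
    return a ^ b ^ c ^ d;
}
```

Calling `lut4_program(xor4)` yields the mask 0x6996, and `lut4_eval` then reproduces the XOR for any input combination. Any other 4-input function fits in the same 16 bits, which is why one LE can implement arbitrary 4-input combinational logic.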
15LE Inputs/Outputs
- Inputs
- 4 Data
- 2 LE Carry-Ins & 1 LAB Carry-In
- 1 Dynamic Addition/Subtraction Control
- Register Controls
- Outputs
- 2 LE Carry-Outs
- 2 Row/Column/DirectLink Outputs
- 1 Local Output
- 1 LUT Chain & 1 Register Chain
16Operation Modes
- Normal
- General Combinatorial or Registered Logic
- Dynamic Arithmetic
- Used for
- Adders
- Counters
- Accumulators
- Comparators
- Uses Carry Chain for Faster Operation
- Chosen Automatically by Quartus II & NativeLink Synthesis Tools
- Based on Design & Design Constraints
17LE Register Controls
- Clock/Clock Enable
- Synchronous & Asynchronous Clear
- Synchronous & Asynchronous Load Data
- Asynchronous Preset
- Preset Function Loads a 1
[Figure: LE register with ports D, Q, ENA, CLRN, ADATA, and ALD/PRE]
18Normal Mode
[Figure: LE in normal mode, showing the 4-input LUT (data1 to data4, cin, addnsub), sync load & clear logic, register control signals, LUT and register chain inputs and outputs, register feedback, and row, column, DirectLink & local routing]
- Note
- Functional Diagram Only. Please See Datasheet for more Details.
- Addnsub & data1 connected via XOR logic
19Combinatorial Logic Only
[Figure: LE used for combinatorial logic only, showing the 4-input LUT (data1 to data4, cin, addnsub), sync load & clear logic, register control signals, LUT and register chain inputs and outputs, register feedback, and row, column, DirectLink & local routing]
- Note
- Functional Diagram Only. Please See Datasheet for more Details.
- Addnsub & data1 connected via XOR logic
20Sequential Logic Only
[Figure: LE used for sequential logic only, showing the 4-input LUT (data1 to data4, cin, addnsub), sync load & clear logic, register control signals, LUT and register chain inputs and outputs, register feedback, and row, column, DirectLink & local routing]
- Note
- Functional Diagram Only. Please See Datasheet for more Details.
- Addnsub & data1 connected via XOR logic
21Dynamic Arithmetic Mode
[Figure: LE in dynamic arithmetic mode, showing LAB carry-in, carry-in logic (Carry-In0, Carry-In1), addnsub, data1 to data3, sum calculator, carry calculator, carry-out logic (Carry-Out0, Carry-Out1), sync load & clear logic, register control signals, register chain input and output, and row, column, DirectLink & local routing]
- Note: Functional Diagram Only. Please See Datasheet for more Details.
22Carry-Select Logic
- Each Cell Pre-Calculates Sum & Carry-Out for Carry = 1 & Carry = 0
- Carry-In Selects which Pre-Calculation Is Used
[Figure: a single LUT computes A0 + B0 + 1 and A0 + B0 + 0; CIN selects SUMOUT, and selects COUT from COUT1 and COUT0]
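The carry-select idea on this slide can be sketched as a behavioral C model: each cell works out both possible results in advance and the incoming carry only has to pick one. This is a software illustration under simplifying assumptions, not a description of the actual LE circuit; the names `cell_out_t`, `carry_select_cell`, and `carry_select_add` are made up for the example, and the loop evaluates cells one after another, whereas hardware pre-calculates all of them in parallel.

```c
#include <stdint.h>

/* Result of one carry-select cell. */
typedef struct {
    int sum;   /* selected sum bit   */
    int cout;  /* selected carry-out */
} cell_out_t;

/* One bit of carry-select addition: pre-calculate sum and carry-out
   for carry-in = 0 and carry-in = 1, then let the real carry-in pick. */
cell_out_t carry_select_cell(int a, int b, int cin)
{
    int sum0  = a ^ b;        /* a + b + 0 */
    int cout0 = a & b;
    int sum1  = !(a ^ b);     /* a + b + 1 */
    int cout1 = a | b;

    cell_out_t out;
    out.sum  = cin ? sum1  : sum0;
    out.cout = cin ? cout1 : cout0;
    return out;
}

/* Add the low n bits (n <= 32) of a and b by chaining cells.
   The final carry is stored through cout when it is not NULL. */
uint32_t carry_select_add(uint32_t a, uint32_t b, int n, int cin, int *cout)
{
    uint32_t sum = 0;
    int carry = cin & 1;
    for (int i = 0; i < n; i++) {
        cell_out_t c = carry_select_cell((a >> i) & 1, (b >> i) & 1, carry);
        sum |= (uint32_t)c.sum << i;
        carry = c.cout;
    }
    if (cout)
        *cout = carry;
    return sum;
}
```

With n = 10 the model matches the 10 LEs of one LAB: adding 0x2A5 and 0x15B produces a 10-bit sum of 0 with a carry-out of 1.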
23Carry Chain Details
- Carry Chains Begin & End in Any LE
- 2 Carry Chains Can Exist In Any LAB
- Carry-Select Generated in LEs 5 & 10
- Every LE Not in Critical Timing Path
[Figure: carry chain across the 10 LEs of a LAB: LAB Carry-In, LE1 to LE10 with inputs A1 to A10 and B1 to B10 and outputs Sum1 to Sum10, LAB Carry-Out]
24LUT & Register Chains
- LUT Chain
- Output of LUT Connects Directly to LUT Below
- Available Only In Normal Mode
- Ex. Wide Fan-In Functions
- Register Chain
- Output of Register Connects Directly to Register Below (Shift Register)
- LUT Can Be Used for Unrelated Function
- Ex. LE Shift Register
- Both Chains End at LAB Boundary
[Figure: LUT chain and register chain running from LE1 through LE2 to LEs 3 - 10]
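A register chain behaves like a shift register whose length is bounded by the LAB. The short C model below is a sketch of that behavior for a single LAB of 10 LEs; the type `reg_chain_t`, the function `reg_chain_clock`, and the bit layout (bit i holds the register of LE i+1) are assumptions chosen for illustration.

```c
#include <stdint.h>

#define LES_PER_LAB 10  /* a LAB groups 10 LEs */

/* Register chain in one LAB: bit i holds the register of LE(i+1). */
typedef struct {
    uint16_t regs;
} reg_chain_t;

/* One clock edge: every register loads the value of the register
   above it and LE1 loads the serial input.
   Returns the bit that leaves LE10. */
int reg_chain_clock(reg_chain_t *chain, int serial_in)
{
    int serial_out = (chain->regs >> (LES_PER_LAB - 1)) & 1;
    chain->regs = (uint16_t)(((chain->regs << 1) | (serial_in & 1))
                             & ((1u << LES_PER_LAB) - 1));
    return serial_out;
}
```

A bit shifted into LE1 appears at the output of LE10 on the eleventh clock, while the LUTs of the same LEs stay free for unrelated logic.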
25Stratix Interconnects
- Global Signals
- LE Register Chains
- Carry Chains
- Local Interconnect
- DirectLink
- MultiTrack Interconnects
- Row Interconnects
- Column Interconnects
26Local Interconnect
- Groups 10 LEs Together
- Provides Input Signals to Blocks (LABs, Memory, DSP Blocks)
- # of Local Lines Depends on Block
[Figure: local interconnect of a LAB and of an M512 block]
27DirectLink
- Allows Blocks to Drive Local Interconnects of Neighboring Blocks in the Same Row
28DirectLink (cont.)
- Provides Fast Communication between Neighboring Blocks
- One LE Has Fast Access to Up to 29 Other LEs in Area
- Saves Row Resources
29MultiTrack Interconnect Architecture
- Provides Connections between All Device Blocks
- Series of 3 Types of Continuous Row & Column Interconnects
- Each Has a Fixed Speed and Length
- Constant Performance Across Family Members within Given Area
- Simplifies Block Design
- Same Routing Resources Available Regardless of Location
30Row Resources
- 3 Row Interconnect Lengths
- R4
- R8
- R24
[Figure: R4 lines span 4 LABs and are 160 lines wide; R8 lines are 48 lines wide; R24 lines are 24 lines wide]
31Row Resources (cont.)
- Each Block Has Own Row Resource to Drive Right and Left
[Figure: R4 routing lines driving right and driving left]
32Row Resource Details
- R4
- Terminate at M-RAM
- R8
- Only Connect to Local R8/C8 Interconnects
- Terminate at M-RAM
- Faster than 2 R4s
- R24
- Do Not Interface with Blocks Directly
- Can Cross M-RAM
- Fastest Resource for Long Connections (Ex. Design Block to Design Block)
33Column Resources
- 3 Interconnect Lengths
- C4
- C8
- C16
- Features Similar to Row Interconnects
- Each Block Has Column Resource to Drive Up and Down
- Interconnects Are Staggered
- Interconnects Can Drive End-to-End
[Figure: C4, C8, and C16 column interconnects; C4 spans 4 LABs]
34Presentation Outline
- Basic description of Stratix Altera Devices
- NIOS II processor architecture
- How to design a system using NIOS II processor
36NIOS II Overview [3]
- Soft IP Core
- A soft-core processor is a microprocessor fully described in software, usually in an HDL, which can be synthesized in programmable hardware, such as FPGAs.
- Reduced Instruction Set Computer (RISC)
- Configurations with no pipeline or with a 5- or 6-stage pipeline
- Full 32-bit instruction set, data path, and address space
- 32 general-purpose registers
- 32 external interrupt sources
- Access to a variety of on-chip peripherals, and interfaces to off-chip memories and peripherals
- Software development environment based on the GNU C/C++ tool chain and Eclipse IDE
37NIOS II Scalability
- Powerful multiprocessing systems can be built
38NIOS II Processor Core [3]
39Implementation
- The functional units of the Nios II architecture form the foundation for the Nios II instruction set.
- The Nios II architecture describes an instruction set, not a particular hardware implementation.
- Trade-offs
- More or less of a feature, e.g. the amount of instruction cache memory.
- Inclusion or exclusion of a feature, e.g. the JTAG debug module.
- Hardware implementation or software emulation, e.g. the divider.
40Types of Processors
41Memory Organization
42Cache Performance
Memory   I-Cache   D-Cache   Normalised Performance
SDRAM    No        No        40.2
SDRAM    No        Yes       55.2
SDRAM    Yes       No        64.3
SDRAM    Yes       Yes       96.4
OnChip   No        No        100.0
OnChip   No        Yes       98.0
OnChip   Yes       No        110.2
OnChip   Yes       Yes       105.6
Performance relative to on-chip RAM with no cache, running dhry.c modified for unbuffered I/O.
43Tightly Coupled Memory
- Fast data buffers
- Fast sections of code
- Fast interrupt handler
- Critical loop
- Constant access time, guaranteed not to have arbitration delays
- Up to 4 tightly coupled memories
- Software Guidelines
- Software accesses tightly-coupled memory addresses just like any other addresses.
- Cache operations have no effect when targeting tightly-coupled memory.
44Pipelining
- Static branch prediction is implemented using the branch offset direction
- a negative offset is predicted as taken
- a positive offset is predicted as not-taken
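The prediction rule above reduces to a sign test on the branch offset. The C sketch below illustrates it; the function names are hypothetical, and two details are assumptions rather than statements from the slide: the offset is taken relative to the instruction after the branch (pc + 4), and a zero offset is treated as not taken.

```c
#include <stdint.h>
#include <stdbool.h>

/* Static prediction from the sign of the branch offset:
   backward branches (negative offset, typically loops) are predicted
   taken; forward branches are predicted not taken.
   A zero offset is treated as not taken in this sketch. */
bool predict_taken(int32_t branch_offset)
{
    return branch_offset < 0;
}

/* Address fetched next under that prediction. Assumes the offset is
   relative to the instruction following the branch (pc + 4). */
uint32_t predicted_next_pc(uint32_t pc, int32_t branch_offset)
{
    uint32_t fall_through = pc + 4;
    return predict_taken(branch_offset)
        ? fall_through + (uint32_t)branch_offset
        : fall_through;
}
```

A loop-closing branch at 0x100 with offset -8 is predicted to continue at 0xFC, while a forward branch from the same address is predicted to fall through to 0x104. The rule favors loops, whose backward branch is taken on every iteration except the last.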
46Presentation Outline
- Basic description of Stratix Altera Devices
- NIOS II processor architecture
- Review pipelining techniques
- Review memory access techniques
- How to design a system using NIOS II processor
48Hardware Abstraction Layer (HAL) [4]
- Isolates the application software from hardware modifications.
- Applications are device-independent because the HAL abstracts devices such as
- Character mode devices: UART core, JTAG UART core, LCD display controller
- Flash memory devices
- Timer devices
- DMA controller core
- Ethernet MAC/PHY Controller
- HAL application program interface (API) is integrated with the ANSI C standard library.
49Layers of HAL API [4]
- HAL library generation
- SOPC Builder generates a hardware system
- Nios II IDE generates a custom HAL system library to match the hardware configuration
- Changes in the hardware configuration automatically propagate to the HAL device driver configuration
- NIOS II is programmed in C
50Programming NIOS II Processor [4]
- Programming UART
- Standard Input, Standard Output routines in C
--------------------------------------------------
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Message sent to the UART */
    char *msg = "hello world";
    FILE *fp;

    /* Open the UART through its device name, like a file */
    fp = fopen("/dev/uart1", "w");
    if (fp)
    {
        fprintf(fp, "%s", msg);
        fclose(fp);
    }
    return 0;
}
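Because the UART is reached through ordinary C file routines, the same pattern can be wrapped in a small helper that also reports failures. This is a sketch rather than part of the HAL API: `write_message` is a hypothetical name, and the device path is passed in so that the helper works with whatever device name the generated system provides (the slide uses /dev/uart1).

```c
#include <stdio.h>

/* Write msg to a character device opened by path.
   Returns the number of characters written, or -1 on failure. */
int write_message(const char *device_path, const char *msg)
{
    FILE *fp = fopen(device_path, "w");
    if (fp == NULL)
        return -1;              /* device missing or not writable */

    int written = fprintf(fp, "%s", msg);
    if (written < 0) {
        fclose(fp);
        return -1;              /* write failed */
    }
    if (fclose(fp) != 0)
        return -1;              /* flush or close failed */
    return written;
}
```

On a system with a UART named uart1, `write_message("/dev/uart1", "hello world")` would send the same text as the program above. Since only standard C I/O is used, the helper can also be exercised on a desktop machine by passing the path of a regular file.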
51References
- [1] Altera Corp., Stratix & Stratix II Module 3: Using TriMatrix Memories, 2004.
- [2] Altera Corp., Stratix Module 2: Logic Structure & MultiTrack Interconnect, 2004.
- [3] Altera Corp., Nios II Processor Reference Handbook, 2005.
- [4] Altera Corp., Nios II Software Developer's Handbook, 2005.