Title: PCI Express DMA Engine f
1. PCI Express DMA Engine for the Active Buffer Project
in the CBM Experiment
- Wenxue Gao, Andreas Kugel, Reinhard Männer,
Holger Singpiel, Andreas Wurz - University of Mannheim
- DPG Conference (DPG Tagung), Gießen
- 14 March 2007
2. Contents
- Introduction
- Block diagram
- Implementation
- Performance
3. Introduction: CBM Experiment
CBM TSR, Jan. 2006
4. Introduction: PCI Express
- 2.5 Gbps per lane (x1 link)
- Point-to-point
- TLP (Transaction Layer Packet) types:
- Posted: MWr (Memory Write Request)
- Non-posted: MRd (Memory Read Request)
- Completion: CplD (with data), Cpl (without data)
- Message: Msg
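The TLP categories listed above can be written down as a small lookup table. This is only an illustrative sketch: the names `TlpKind`, `CATEGORY` and `expects_completion` are made up for this example and do not come from the presented design. The one rule it encodes is that only non-posted requests are answered by a completion.

```python
from enum import Enum


class TlpKind(Enum):
    MWR = "MWr"    # Memory Write Request
    MRD = "MRd"    # Memory Read Request
    CPLD = "CplD"  # Completion with data
    CPL = "Cpl"    # Completion without data
    MSG = "Msg"    # Message


# Category of each TLP type, following the grouping on the slide.
CATEGORY = {
    TlpKind.MWR: "posted",
    TlpKind.MRD: "non-posted",
    TlpKind.CPLD: "completion",
    TlpKind.CPL: "completion",
    TlpKind.MSG: "message",
}


def expects_completion(kind: TlpKind) -> bool:
    """Return True if the sender must wait for a Cpl/CplD in response."""
    return CATEGORY[kind] == "non-posted"


if __name__ == "__main__":
    for kind in TlpKind:
        print(kind.value, CATEGORY[kind], expects_completion(kind))
```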
5-13. PCI Express: Posted TLP (MWr)
[Animation over slides 5 to 13; only the figure labels survive: Trn., Host, End-Point, Rx, Tx. MWr1 and MWr2 each appear first under Tx and then move under Rx; MWr3 is still under Tx in the last frame. Posted requests are not answered by a completion.]
14-30. PCI Express: Non-posted TLP (MRd)
[Animation over slides 14 to 30; only the figure labels survive: Trn., End-Point, Host, Rx, Tx. MRd1 appears under Tx and then moves under Rx, followed by MRd2 under Tx. From slide 21 on, the completions CplD1 and CplD2 appear under Rx, listed first as CplD2, CplD1 and from slide 26 on as CplD1, CplD2. The last frames repeat the MRd1 and MRd2 labels and add the label "Tag70".]
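The animation suggests that completions are matched back to their read requests by tag, independent of arrival order. A minimal sketch of such bookkeeping follows; the class `ReadTracker`, its methods and the tag-space size of 128 (taken from the 7-bit tags mentioned later in the deck) are assumptions for illustration, not the presented hardware.

```python
class ReadTracker:
    """Remember outstanding read requests by tag (hypothetical helper)."""

    NUM_TAGS = 128  # assumption: 7-bit tags, as mentioned later in the deck

    def __init__(self):
        self.outstanding = {}  # tag -> request name
        self.next_tag = 0

    def issue(self, name: str) -> int:
        """Assign the next tag to a new read request and remember it."""
        tag = self.next_tag
        if tag in self.outstanding:
            raise RuntimeError(f"tag {tag} is still in flight")
        self.outstanding[tag] = name
        self.next_tag = (tag + 1) % self.NUM_TAGS
        return tag

    def complete(self, tag: int) -> str:
        """Match an incoming completion to its request, in any order."""
        return self.outstanding.pop(tag)


if __name__ == "__main__":
    tracker = ReadTracker()
    t1 = tracker.issue("MRd1")
    t2 = tracker.issue("MRd2")
    # Completions arrive in reverse order and are still matched correctly.
    print(tracker.complete(t2), tracker.complete(t1))
```

The dictionary stands in for what a hardware design would keep in a small tag-indexed RAM.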
31. Introduction: SG DMA
- SG (Scatter/Gather)
- Multiple-descriptor chain
- Full duplex
- Downstream: Host -> Endpoint
- Upstream: Endpoint -> Host
- Done state
- Status register
- Interrupt
32. Block diagram
33. Channel Buffer
- TLP channel FIFO
- Width: 128 bits
- Depth: 15
- TLP without payload
- Everything fits in one word
- TLP with payload
- Local address
- Additional information
34. Implementation: Splitting DMA transfers
- A transfer must not cross a 4 KB boundary
- Address/length combination
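The 4 KB rule means a long transfer has to be cut into pieces based on its start address and length. The sketch below shows one way to do that; the function name `split_dma` and the default maximum payload of 128 bytes are assumptions for this example and are not taken from the presented design.

```python
def split_dma(address: int, length: int, max_payload: int = 128):
    """Split a transfer into (address, size) chunks.

    No chunk crosses a 4 KB boundary and no chunk exceeds max_payload
    bytes. max_payload=128 is an assumed value for illustration.
    """
    if length < 0 or max_payload <= 0:
        raise ValueError("length must be >= 0 and max_payload > 0")
    chunks = []
    while length > 0:
        # Bytes left before the next 4 KB boundary.
        to_boundary = 4096 - (address % 4096)
        size = min(length, max_payload, to_boundary)
        chunks.append((address, size))
        address += size
        length -= size
    return chunks


if __name__ == "__main__":
    # A 256-byte transfer starting 64 bytes below a 4 KB boundary.
    for addr, size in split_dma(0x0FC0, 256):
        print(hex(addr), size)
```

In this example the first chunk is shortened to 64 bytes so that the second chunk starts exactly at the boundary 0x1000.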
35. Implementation: Confirming Done
- When is the DMA finished?
- A Done state is needed
- CplDs for different MRds do not arrive in order
- Possible solutions
- Read the tag RAM
- Count CplDs
- Channel buffer empty
- Trigger on the last tag (x)
- Fill a bitmap
- 128-bit register for 7-bit tags
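The bitmap idea can be sketched in software as one bit per tag in a 128-bit value. This is one possible reading of the slide, not the presented implementation: the class `DoneBitmap` and its methods are hypothetical, and the sketch assumes that each tag is completed exactly once (a read answered by several CplDs would need extra byte counting).

```python
class DoneBitmap:
    """Track outstanding read tags in a 128-bit bitmap (7-bit tags)."""

    NUM_TAGS = 128

    def __init__(self):
        self.pending = 0         # bit t set = tag t still outstanding
        self.all_issued = False  # set once the last MRd has been sent

    def _check(self, tag: int) -> None:
        if not 0 <= tag < self.NUM_TAGS:
            raise ValueError(f"tag {tag} does not fit in 7 bits")

    def mark_issued(self, tag: int) -> None:
        self._check(tag)
        self.pending |= 1 << tag

    def finish_issuing(self) -> None:
        self.all_issued = True

    def mark_completed(self, tag: int) -> None:
        self._check(tag)
        self.pending &= ~(1 << tag)

    def done(self) -> bool:
        """Done only when every request was sent and every bit is clear."""
        return self.all_issued and self.pending == 0


if __name__ == "__main__":
    bitmap = DoneBitmap()
    for tag in (5, 6, 127):
        bitmap.mark_issued(tag)
    bitmap.finish_issuing()
    for tag in (127, 5, 6):  # completions arrive out of order
        bitmap.mark_completed(tag)
    print(bitmap.done())
```

Unlike triggering on the last tag, this check does not depend on the order in which the completions arrive.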
36. Performance parameters
- Target device
- Virtex4 XC4VFX60-11ff672
- FFs
- 9,834 out of 50,560 (19%)
- LUT4s
- 11,464 out of 50,560 (22%)
- RAMB16
- 58 out of 232 (25%)
- Slices
- 9,426 out of 25,280 (37%)
- Frequency (trn_clk)
- 250 MHz
- Latency (transaction layer)
- PIO: 52 ns (MRd -> CplD)
- DMA: 80 ns (DMA start -> Tx TLP)
- Theoretical bandwidth
- 2 Gbps x 4 lanes = 8 Gbps, bidirectional
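The bandwidth and latency figures above can be cross-checked with a little arithmetic. The sketch assumes the 2.5 Gbps line rate from the introduction and the 8b/10b line coding of first-generation PCI Express, which together give the 2 Gbps per lane quoted on the slide; the function names are made up for this example, and TLP header overhead is ignored.

```python
LINE_RATE_MBPS = 2500  # raw line rate per lane, from the introduction
LANES = 4
TRN_CLK_MHZ = 250


def payload_rate_mbps(line_rate_mbps: int, lanes: int) -> int:
    """Data rate after 8b/10b coding (8 data bits per 10 line bits)."""
    return line_rate_mbps * 8 // 10 * lanes


def ns_to_cycles(latency_ns: int, clk_mhz: int) -> int:
    """Convert a latency in ns to whole clock cycles at clk_mhz."""
    return latency_ns * clk_mhz // 1000


if __name__ == "__main__":
    print(payload_rate_mbps(LINE_RATE_MBPS, LANES), "Mbps per direction")
    print(ns_to_cycles(52, TRN_CLK_MHZ), "cycles for PIO")
    print(ns_to_cycles(80, TRN_CLK_MHZ), "cycles for DMA start")
```

At 250 MHz the quoted latencies correspond to 13 and 20 trn_clk cycles respectively.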
37. 4-Lane Tests
38. Open questions
- Smaller channel buffer
- 64 bits are usually enough, instead of 128 bits
- Better error handling
- Partly incomplete
- Avoid overwriting CplDs
- Time-out
- Tag recycling
- Higher bandwidth for downstream DMA
39. Summary
- PCI Express advantages
- Parallelism
- Scalability
- Virtual channels
- 2 DMA channels
- 1 PIO channel
- Xilinx solution
- 62.5 MHz for x1
- 250 MHz for x4
41. x4-ABB
- Design Summary
- Logic Utilization
- Number of Slice Flip Flops: 9,834 out of 50,560 (19%)
- Number of 4 input LUTs: 11,464 out of 50,560 (22%)
- Logic Distribution
- Number of occupied Slices: 9,426 out of 25,280 (37%)
- Total Number 4 input LUTs: 12,993 out of 50,560 (25%)
- Number used as logic: 11,464
- Number used as a route-thru: 643
- Number used for Dual Port RAMs: 202
- Number used as Shift registers: 684
- Number of bonded IPADs: 18 out of 62 (29%)
- Number of bonded OPADs: 16 out of 24 (66%)
- Number of bonded IOBs: 1 out of 352 (1%)
- Number of BUFG/BUFGCTRLs: 5 out of 32 (15%)
- Number used as BUFGs: 4
42. x4 Test
43. DMA process
- Buffer descriptor
- SA (Source Address)
- DA (Destination Address)
- NXA (Next Descriptor Address)
- Length (length in bytes)
- Control (control register)
- Start/Stop command
- Upstream: MWr, MRd (dex)
- Downstream: MRd
- Detect Busy/Done states
- Status register
- Interrupt (Msg)
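The descriptor fields above form a linked list that the engine follows from one buffer to the next. The sketch below walks such a chain in software. The field names follow the slide, but everything else is an assumption for illustration: the `Descriptor` class, the `walk_chain` helper, the dictionary used as descriptor memory and in particular the end-of-chain marker `NXA == 0`, which the slides do not specify.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    sa: int       # SA: source address
    da: int       # DA: destination address
    nxa: int      # NXA: address of the next descriptor
    length: int   # length in bytes
    control: int  # control register value


END_OF_CHAIN = 0  # assumption: NXA == 0 terminates the chain


def walk_chain(memory: dict, first_addr: int, max_descriptors: int = 1024):
    """Follow NXA pointers and return the (sa, da, length) transfers."""
    transfers = []
    seen = set()
    addr = first_addr
    while addr != END_OF_CHAIN:
        if addr in seen or len(seen) >= max_descriptors:
            raise ValueError("descriptor chain loops or is too long")
        seen.add(addr)
        desc = memory[addr]
        transfers.append((desc.sa, desc.da, desc.length))
        addr = desc.nxa
    return transfers


if __name__ == "__main__":
    memory = {
        0x100: Descriptor(sa=0x8000, da=0x0000, nxa=0x120, length=4096, control=0),
        0x120: Descriptor(sa=0xA000, da=0x1000, nxa=END_OF_CHAIN, length=512, control=0),
    }
    for transfer in walk_chain(memory, 0x100):
        print(transfer)
```

The loop guard is there because a corrupted NXA pointer would otherwise keep the walker running forever.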
44. Block diagram
45. Verification
- PIO and DMA (random)
- Transaction length
- Address pair
- Chain length (DMA)
- Descriptor address (DMA)
- Flow control: _rdy_n
- Output checking
- tsof/teof
- Data
- Descriptor splitting
46. Memory space
- BRAM
- 16 KB
- FIFO
- 32 x 32
- Loop-back
- Registers
- Write / Read
- Control / Status
- Possible extensions
- DDR (similar to BRAM)
- GbE (similar to FIFO)