APIC Stalling problem Notes - PowerPoint PPT Presentation

Title: APIC Stalling problem Notes


1
  • APIC Stalling problem Notes
  • with additional notes on Interrupts
  • John DeHart
  • Washington University
  • jdd_at_arl.wustl.edu
  • http://www.arl.wustl.edu/jdd

2
Issue
  • There seems to be a bug in the system
  • Right now only shows up on SPC-II
  • In the past, has shown up on SPC-I
  • But this could be similar symptoms of different
    problems.
  • No recollection of it ever showing up on end
    hosts.
  • All these different systems have different timing
  • We ran into this problem in preparing for and
    doing the WU 150th anniversary demo.
  • Fred is having this problem in his kernel
    testing.
  • JohnD is having this problem in his final SPC-II
    performance testing.

3
Issue (continued)
  • Symptoms
  • Transmit queue stalls for paced connections
  • Resuming connection as BE (then Paced) clears the
    queue most of the time
  • sometimes it then stalls again and eventually we
    can not resume it.
  • Also stalls for BE connections
  • Resuming as BE gets the data flowing again
  • actually resuming ANOTHER channel causes the
    stalled channel to resume
  • This seems to imply a possible global pacer
    problem?
  • When it stalls and the APIC runs out of
    descriptors, we do get an ERROR interrupt for the
    out of descriptors state.
  • This seems to imply that the APIC and ICU are in
    a state such that they can generate an APIC
    interrupt to the CPU.
  • If the APIC had generated an interrupt that had
    been lost the APIC and/or ICU would probably
    not be in a state that would allow another APIC
    interrupt to reach the CPU.
  • Seems to be traffic rate related
  • as the traffic rate approaches the limit of what
    we can process the problem is more likely to show
    itself
  • Seems to be SPC related
  • some SPCs show the problem more readily than
    others

4
Tools Assembled
  • Monitoring GUI
  • PCI Bus analyzer
  • setup for it saved in
  • jddlap/C/SPC_II/PCI_Traces/SPC_II_PCI_Setup.stp
  • /project/arl/jdd/SPC_II/PCI_Traces/SPC_II_PCI_Setup.stp
  • SPCWatch
  • using APIC control cells, dumps portions of memory
    and APIC registers without going through the
    kernel. Going through the kernel sometimes
    changes the state of the current problem by
    resuming a stalled xmit connection. Depending on
    how much memory is being dumped, this may take a
    long time (16 bytes of memory per APIC control
    cell).
  • scripts using it are in
  • /d/jdd/wu_arl/HARDWARE_TESTS/SPC_TEST_PCI/
  • dumpAllMSRDescs dumps ALL 64K APIC Descriptors
  • dumpMSRrxDescs dumps all 8K Rx descriptors
  • dumpMSRtxDescs dumps all 8K Tx descriptors
  • getTxConnAndChanStatusRegs retrieves the
    Connection and Channel Status regs for a Tx chan.
  • sencmd
  • SPC/usr/local/bin/datatest
  • SPC/usr/local/bin/readCounts
  • Jammer

5
Useful Notes
  • APIC Sync Bits
  • 0: DONE_VALIDLINK (APIC is done, belongs to the
    Driver now)
  • 1: DONE_INVALIDLINK (should never happen for Tx)
  • 2: NOT_READY (Belongs to the Driver)
  • 3: READY (Belongs to the APIC)
  • Kernel modified to support PCI Bus Analyzer
  • Bus Analyzer requires line card to be removed
    from SPC-II
  • SPC-II with no line card does not get a grant for
    sending data to the line card so the FPGA Fifos
    fill up and drop cells. Ick.
  • Kernel modified to send external data to switch
  • with this we can also monitor the output rate of
    the external data VCs.
  • APIC Descriptor Address Ranges
  • Index 0: Addr 0x1d17000: Invalid descriptor
  • Index 0001 - 8192: Addr 0x1d17010 - 0x1d37000:
    Rx Descs
  • Index 8193 - 16384: Addr 0x1d37010 - 0x1d57000:
    Tx Descs
  • APIC Registers of interest
  • 0x518 Interrupt Acknowledge Register
  • 0x530 Notification Register
  • 0xD500CH08 TX Channel status register
  • 0xD500CHF0 TX Channel BE Resume register
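
The address ranges listed above are consistent with 16-byte descriptors laid out contiguously from 0x1d17000. A minimal sketch of the index/address arithmetic follows, assuming that spacing holds across the whole table; the helper names are hypothetical and do not come from the driver.

```c
#include <stdint.h>

/* Values taken from the "APIC Descriptor Address Ranges" list. */
#define DESC_BASE 0x1d17000u /* address of index 0 (invalid descriptor) */
#define DESC_SIZE 16u        /* ASSUMPTION: inferred from the 0x10 spacing */
#define RX_FIRST  1u
#define RX_LAST   8192u
#define TX_FIRST  8193u
#define TX_LAST   16384u

enum desc_kind { DESC_INVALID, DESC_RX, DESC_TX, DESC_OUT_OF_RANGE };

/* Hypothetical helper: memory address of descriptor `index`. */
static uint32_t desc_addr(uint32_t index)
{
    return DESC_BASE + index * DESC_SIZE;
}

/* Hypothetical helper: descriptor index for `addr`, or -1 if the
 * address is below the table or not on a descriptor boundary. */
static int32_t desc_index(uint32_t addr)
{
    if (addr < DESC_BASE || (addr - DESC_BASE) % DESC_SIZE != 0)
        return -1;
    return (int32_t)((addr - DESC_BASE) / DESC_SIZE);
}

/* Hypothetical helper: which range an index falls in. */
static enum desc_kind desc_classify(uint32_t index)
{
    if (index == 0)
        return DESC_INVALID;
    if (index >= RX_FIRST && index <= RX_LAST)
        return DESC_RX;
    if (index >= TX_FIRST && index <= TX_LAST)
        return DESC_TX;
    return DESC_OUT_OF_RANGE;
}
```

With these ranges, an index such as 15851 classifies as a Tx descriptor and an index such as 128 classifies as an Rx descriptor, which is useful when reading descriptor indexes out of kernel error messages.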

6
Questions
  • Does the driver handle multiple buffers chained
    together on receive properly?
  • It is possible for the last cell of a packet to
    get dropped making the packet look like a long
    packet spanning multiple buffers.
  • Are there any buffer start address concerns?
  • old notes on APIC bug which caused us to align
    buffers on 48 and 56 byte boundaries
  • This is the RX Sync bug (July 1999 Kits slides)
    which locks up the APIC and needs a reset to get
    going again. This does not sound like what is
    happening to us now.
  • although, could this be what eventually happens
    after a few resumes when the SPC locks up?

7
Issue (continued)
  • Suspects
  • Lost interrupt
  • APIC Hardware bug
  • interrupt handling
  • timing between two instances of INTR signal being
    asserted.
  • descriptor handling
  • pacer
  • flow control
  • other?
  • APIC driver bug
  • interrupt handling
  • descriptor handling
  • other?
  • NetBSD Interrupt handling bug
  • SPC-II FPGA flow control bug

8
Issue (continued)
  • Plan of Attack
  • Analyze apic driver code
  • compare MSR vs. end host driver code.
  • Get details of descriptor chain when it stalls
  • dump APIC descriptor chain as it exists in memory
  • dump APIC current descriptor chain register for
    stalled channel
  • monitor interrupt counts on SPC-II and compare to
    packet counts
  • vmstat -i
  • Note what IRQs are assigned to what at boot time.
  • Turn off SPC-II FPGA flow control to APIC
  • change VHDL
  • rebuild bitfile
  • re-program SPC-II FPGA
  • retest

9
Additional Issue/Symptom
  • We sometimes get into a state where
  • We send an MSR command/control cell to a port
  • The APIC does not register a cell arrival.
  • Neither the OPP transmit cell counter nor the OPP
    drop cell counters on that port increment.
  • Suspect APIC or FPGA flow control issue

10
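SPC II FPGA Architecture
[Figure: SPC-II FPGA block diagram showing the SPC-II clock domains. Labeled blocks: PCI Bus Port, APIC (Port 0, Port 1, Reset), OSC, SPC-II FPGA, FPX, Switch, LC. FIFOs are numbered 1 to 6 and interfaces are lettered A to H; data paths are marked 16, 32, or 16/32 bits wide. VPI/VCI annotations: VPI01, VPI01 VCI 38, VPI00, 64<VCI<127???]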
11
SPC FPGA Fifos
  • FIFO 1: Large Sync Fifo, 512 Words, 36 cells
  • FIFO 2: Large Async Fifo, 512 Words, 36 cells
  • FIFO 3: Tiny Sync Fifo, 64 Words, 4 cells
  • FIFO 4: Tiny Sync Fifo, 64 Words, 4 cells
  • FIFO 5: Medium Async Fifo, 128 Words, 9 cells
  • FIFO 6: Medium Sync Fifo, 128 Words, 9 cells
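
The word and cell counts above are consistent with 14 words per cell, counting whole cells only. That figure is inferred from the listed numbers and is not stated in the notes. A small sketch of the arithmetic, with a hypothetical helper name:

```c
/* ASSUMPTION: 14 FIFO words per cell, inferred from 512 -> 36,
 * 128 -> 9 and 64 -> 4; the slide does not state the cell size. */
#define WORDS_PER_CELL 14u

/* Hypothetical helper: whole cells that fit in a FIFO of `words` words. */
static unsigned fifo_cells(unsigned words)
{
    return words / WORDS_PER_CELL; /* integer division drops the partial cell */
}
```

Under this assumption the tiny FIFOs (3 and 4) hold only 4 cells each, so they are the first place back pressure would show up if the downstream side stops draining.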

12
Flow Control Test 1
  • Send data from Switch to SPC-II
  • transit through APIC from Port 1 to Port 0
  • SPC-II is reset, no kernel running
  • No data crossing PCI bus
  • No descriptors/buffers used
  • Overload 16 bit APIC interface
  • Send 1.2 Gb/s
  • 982 Mb/s goes through APIC
  • 220 Mb/s is dropped in OPP CS0 buffer
  • Turn data on/off repeatedly
  • no stall/hang-up
  • when data turned back on it continues to transit
    APIC

13
Flow Control Test 2A (AAL5Generator)
  • Send data from Switch to SPC-II
  • Load kernel (JDD's BE Debug Kernel) and process
    packets
  • Configure switch and routes so that two input
    ports (P1, P5) get a copy of the traffic to be
    routed.
  • Configure the two input ports (P1, P5) routes so
    that they route the traffic to Egress port 0
  • Overload APIC processing in Kernel on Port 0
  • send 60 Mb/s at each input port
  • using AAL5Generator smooth pacing at batch (8)
    of cells level
  • total of 120 Mb/s at output port
  • pkt sz 1500 bytes (
  • Kernel error messages
  • RX CID (65 and 69) out of descriptors
  • indicates we are sending more data at the kernel
    than it can handle
  • Bad CRC
  • indicates cells are being dropped somewhere
  • either APIC or SPC-II FPGA.
  • Probably the APIC; if it were the FPGA, it would
    flow control the switch
  • but we may not be sending enough for FC to back
    up all the way through the OPP buffer.
  • But no cells are dropped in OPP
  • indicates SPC-II FPGA is not flow controlling
    switch

14
dips in output are due to Kernel printing error
msgs
15
Flow Control Test 2B (sendpkts)
  • Send data from Switch to SPC-II
  • Load kernel (JDD's BE Debug Kernel) and process
    packets
  • Configure switch and routes so that two input
    ports (P1, P5) get a copy of the traffic to be
    routed.
  • Configure the two input ports (P1, P5) routes so
    that they route the traffic to Egress port 0
  • Overload APIC processing in Kernel on Port 0
  • send 60 Mb/s at each input port
  • using sendpkts sends batches of packets
  • total of 120 Mb/s at output port
  • pkt sz 1500 bytes (
  • Kernel error messages
  • RX CID (65 and 69) out of descriptors
  • indicates we are sending more data at the kernel
    than it can handle
  • Bad CRC
  • indicates cells are being dropped somewhere
  • either APIC or SPC-II FPGA.
  • Probably the APIC; if it were the FPGA, it would
    flow control the switch
  • but we may not be sending enough for FC to back
    up all the way through the OPP buffer.
  • But no cells are dropped in OPP
  • indicates SPC-II FPGA is not flow controlling
    switch

16
Fred's Kernel: sendpkts B 40 p 10 a 20 c -S
17
Fred's Kernel: sendpkts B 80 p 10 a 20 c -S
18
Analysis of previous screen dump
  • P0 has stopped sending any pkts out to the link
  • P0 has stopped back pressuring the switch
  • APIC interrupts still being generated and counted
    by kernel (vmstat -i)
  • APIC still counting cells arriving
  • APIC NOT counting cells on PCI bus
  • APIC thinks it is getting cells and generating
    Interrupts.
  • What does the kernel think in this state?
  • channel is suspended and needs resuming...
  • when resumed things start working again.
  • This is probably the Ready descriptor error
  • So in this state the APIC is out of descriptors
    and all of its cell buffers are probably full.
  • Is it just continually generating ERROR
    interrupts?
  • And discarding every cell it receives (after
    counting it)?

19
JDD's version of Fred's Kernel: sendpkts c v S
a 20 x 8000
BE Resume of P0 channel 80 resumes data output.
20
Analysis of previous screen dump
  • After resuming BE twice (each worked), the third
    time it stalled, the kernel had crashed.
  • panic: kernel assertion 0 failed: apic.c, line
    1045
  • This assert checks that a TX descriptor
    being allocated from the free list has its SYNC
    bits set to NOT_READY.
  • Need to repeat the test with proper debug turned
    on so we can see which descriptor it is and what
    the sync bits are actually set to.
  • Repeated; the panic came after 8 successful BE
    resumes:
  • Port 0 (APIC/Crit) msr_apic_txdesc_alloc
    Desc->MatchFlags != DESC_SYNC_NOT_READY!, offset
    15851 sync 0
  • panic: kernel assertion 0 failed: file
    ../../../../dev/ic/apic.c line 1045
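
To read the "sync 0" value in the message above, the sync bit encoding from the Useful Notes slide can be written down directly. The sketch below only restates that table and the condition the assert is described as checking; the function names are hypothetical and the real driver code in apic.c is not shown in these notes.

```c
#include <stdbool.h>

/* Encoding from the "APIC Sync Bits" list in these notes. */
enum apic_sync {
    SYNC_DONE_VALIDLINK   = 0, /* APIC is done, belongs to the driver now */
    SYNC_DONE_INVALIDLINK = 1, /* should never happen for Tx */
    SYNC_NOT_READY        = 2, /* belongs to the driver */
    SYNC_READY            = 3  /* belongs to the APIC */
};

/* Hypothetical helper: printable name for a 2-bit sync value. */
static const char *apic_sync_name(unsigned sync)
{
    switch (sync) {
    case SYNC_DONE_VALIDLINK:   return "DONE_VALIDLINK";
    case SYNC_DONE_INVALIDLINK: return "DONE_INVALIDLINK";
    case SYNC_NOT_READY:        return "NOT_READY";
    case SYNC_READY:            return "READY";
    default:                    return "(not a 2-bit sync value)";
    }
}

/* Hypothetical helper: the condition the assert is described as
 * checking when a Tx descriptor is taken from the free list. */
static bool tx_alloc_sync_ok(unsigned sync)
{
    return sync == SYNC_NOT_READY;
}
```

By this table, the failing descriptor (sync 0) was in the DONE_VALIDLINK state rather than NOT_READY when it was pulled off the free list.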

21
APIC errors detected
  • APIC errors that occurred during different runs.
  • ------------------------------------------------------
  • Port -1 (Ctl/Info) msr_process_ctlcell cmd 0x1,
    ver 0, seq 0, len 4, flags 0x9
  • Port 0 (APIC/Error) apic_intr Unexpected RX
    Error on CID 65, chanstatus 0x07
  • apic0 Descriptor Error Match incorrect (not
    0xcafe) 0x07
  • ------------------------------------------------------
  • Port 0 (APIC/Crit) msr_free_txdescs Invalid tx
    desc index (current 14250 or next 128)
  • panic: kernel assertion "((((txindx) >=
    ((0x00000001 + 8192 - 1) + 1)) && ((txindx) <=
    (((0x00000001 + 8192 - 1) + 1) + 8192 - 1))) &&
    (((nextindx) >= ((0x00000001 + 8192 - 1) + 1)) &&
    ((nextindx) <= (((0x00000001 + 8192 - 1) + 1) +
    8192 - 1))))" failed: file
    "../../../../dev/ic/apic.c", line 2332
  • Stopped at 0xf018ff8c: leave
  • db>
  • ------------------------------------------------------
22
State when we set debug and get stats causes the
xmit channel to come alive again!
23
Fred's Kernel: sendpkts B 80 p 10 a 20 c S
sendpkts has stopped sending data???
24
Fred's Kernel: sendpkts c v S a 20 x 8000
25
Flow Control Test 3
  • Send data from Switch to SPC-II
  • Load kernel and process packets
  • Configure classifier and data pkts so they are
    dropped
  • i.e. no route for destination address.
  • Overload APIC processing in Kernel
  • Turn data on/off repeatedly

26
SPC-I System FPGA
  • Supported
  • Four Interrupts supported and statically
    assigned
  • PIT (IRQ 0)
  • APIC (IRQ 5)
  • COM1 (IRQ 4)
  • COM2 (IRQ 3)
  • Static fully-nested interrupt priority structure.
  • Specific End of Interrupt is the only EOI mode
    supported
  • Not Supported
  • Special Mask Mode
  • Automatic End of Interrupt (AUTO_EOI_1,
    AUTO_EOI_2)
  • Special Fully Nested Mode

27
SPC-II Interrupts
  • Supported by a real Southbridge/ICU
  • FPGA provides flow control
  • but with the traffic patterns and rates we are
    using there should be no flow control asserted.

28
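Hardware Interrupt Structure (Ignoring Bus)
[Figure: CPU, ICU, and APIC in a chain, with INTR and ACK signals between the APIC and the ICU and between the ICU and the CPU, plus MASK/UNMASK between the CPU and the ICU.]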
29
Overview of what happens
  • APIC generates INTR to ICU
  • APIC will not generate another INTR until ACKed
  • ICU pushes INTR(IRQ) onto Bus
  • ICU will only send higher priority interrupts
  • CPU gets INTR
  • MASK IRQ in ICU
  • ICU will not send this IRQ again
  • ACK IRQ in ICU
  • Allows lower priority interrupts from ICU
  • Check priority and hold if lower than current
  • Call APIC interrupt handler
  • ACK Intr in APIC
  • APIC can generate another INTR to ICU
  • Intr processing
  • process all packets that have been received
  • put packets being forwarded on transmit queue and
    resume transmit queue if needed
  • Return
  • UNMASK IRQ in ICU
  • ICU can send us this IRQ again
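
The sequence above can be modeled as a small state machine, which makes the "lost interrupt" argument from the Symptoms slide concrete: if any step that re-arms the path is skipped, no further APIC interrupt can reach the CPU. This is a simplified sketch for a single IRQ only. It ignores interrupt priorities and level-triggered re-delivery, and all names are hypothetical.

```c
#include <stdbool.h>

/* Simplified state of the APIC -> ICU -> CPU path for one IRQ. */
struct intr_state {
    bool apic_awaiting_ack; /* APIC raised INTR and has not been ACKed */
    bool icu_awaiting_ack;  /* ICU delivered the IRQ and has not been ACKed */
    bool icu_irq_masked;    /* IRQ is masked in the ICU */
};

/* True if a new APIC interrupt could reach the CPU right now. */
static bool can_deliver(const struct intr_state *s)
{
    return !s->apic_awaiting_ack && !s->icu_awaiting_ack && !s->icu_irq_masked;
}

/* APIC tries to raise an interrupt; returns true if the CPU sees it. */
static bool raise_intr(struct intr_state *s)
{
    if (!can_deliver(s))
        return false;
    s->apic_awaiting_ack = true;
    s->icu_awaiting_ack = true;
    return true;
}

/* CPU-side handling in the order listed on the slide. `ack_apic`
 * lets a caller model a handler that fails to ACK the APIC. */
static void handle_intr(struct intr_state *s, bool ack_apic)
{
    s->icu_irq_masked = true;         /* MASK IRQ in ICU */
    s->icu_awaiting_ack = false;      /* ACK IRQ in ICU */
    if (ack_apic)
        s->apic_awaiting_ack = false; /* ACK Intr in APIC */
    /* packet processing and transmit queue resume would happen here */
    s->icu_irq_masked = false;        /* UNMASK IRQ in ICU */
}
```

In this model a missing APIC ACK leaves `can_deliver()` false forever, which is why an ERROR interrupt still arriving during a stall suggests the path itself is not wedged.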

30
sys/arch/i386/isa/vector.s
  • #include "opt_ddb.h"
  • #include <i386/isa/icu.h>
  • #include <dev/isa/isareg.h>
  • #define ICU_HARDWARE_MASK
  • #define IRQ_BIT(irq_num) (1 << ((irq_num) % 8))
  • #define IRQ_BYTE(irq_num) ((irq_num) / 8)
  • #ifdef ICU_SPECIAL_MASK_MODE // SPC System FPGA
    does not support SMM
  • #define ACK1(irq_num)
  • #define ACK2(irq_num) \
  • movb $(0x60|IRQ_SLAVE),%al /* specific EOI for
    IRQ2 */ ;\
  • outb %al,$IO_ICU1
  • #define MASK(irq_num, icu)
  • #define UNMASK(irq_num, icu) \
  • movb $(0x60|(irq_num%8)),%al /* specific EOI
    */ ;\
  • outb %al,$icu

31
sys/arch/i386/isa/vector.s
  • #else /* I.E. NOT ICU_SPECIAL_MASK_MODE */
  • #ifndef AUTO_EOI_1
  • #define ACK1(irq_num) \
  • movb $(0x60|(irq_num%8)),%al /* specific EOI
    */ ;\
  • outb %al,$IO_ICU1
  • #else
  • #define ACK1(irq_num)
  • #endif
  • #ifndef AUTO_EOI_2
  • #define ACK2(irq_num) \
  • movb $(0x60|(irq_num%8)),%al /* specific EOI
    */ ;\
  • outb %al,$IO_ICU2 /* do the second ICU first
    */ ;\
  • movb $(0x60|IRQ_SLAVE),%al /* specific EOI for
    IRQ2 */ ;\
  • outb %al,$IO_ICU1
  • #else
  • #define ACK2(irq_num)
  • #endif

32
sys/arch/i386/isa/vector.s
  • #ifdef ICU_HARDWARE_MASK
  • #define MASK(irq_num, icu) \
  • movb _C_LABEL(imen) + IRQ_BYTE(irq_num),%al
    /* imen: interrupt mask enable (2 bytes) */
  • orb $IRQ_BIT(irq_num),%al
    /* mask our irq (put a 1 in its
    place) */
  • movb %al,_C_LABEL(imen) + IRQ_BYTE(irq_num)
  • FASTER_NOP
  • outb %al,$(icu+1)
    /* write it to the
    ICU */
  • #define UNMASK(irq_num, icu)
  • cli
  • movb _C_LABEL(imen) + IRQ_BYTE(irq_num),%al
  • andb $~IRQ_BIT(irq_num),%al
  • movb %al,_C_LABEL(imen) + IRQ_BYTE(irq_num)
  • FASTER_NOP
  • outb %al,$(icu+1)
  • sti
  • #else /* ICU_HARDWARE_MASK */
  • #define MASK(irq_num, icu)
  • #define UNMASK(irq_num, icu)
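
The bit arithmetic in the IRQ_BIT, IRQ_BYTE, ACK, MASK, and UNMASK macros can be restated in C. This is a sketch under the assumption that the operators missing from the transcript are the usual ones (modulo 8 for the bit position, OR with 0x60 for a specific EOI, OR to mask, AND-NOT to unmask). The struct and function names are hypothetical, and no port I/O is performed.

```c
#include <stdint.h>

/* Bit within an ICU mask byte for this IRQ (0..7 per ICU). */
static uint8_t irq_bit(unsigned irq)
{
    return (uint8_t)(1u << (irq % 8));
}

/* Which of the two mask bytes (ICU1 = 0, ICU2 = 1) holds this IRQ. */
static unsigned irq_byte(unsigned irq)
{
    return irq / 8;
}

/* Command byte for a specific EOI: 0x60 ORed with the IRQ level. */
static uint8_t specific_eoi(unsigned irq)
{
    return (uint8_t)(0x60u | (irq % 8));
}

/* Hypothetical software copy of the two ICU mask bytes ("imen"). */
struct icu_shadow {
    uint8_t imen[2];
};

/* MASK: set the IRQ's bit; returns the byte that would be written
 * to the ICU mask port (icu + 1). */
static uint8_t mask_irq(struct icu_shadow *s, unsigned irq)
{
    s->imen[irq_byte(irq)] |= irq_bit(irq);
    return s->imen[irq_byte(irq)];
}

/* UNMASK: clear the IRQ's bit; returns the byte that would be written. */
static uint8_t unmask_irq(struct icu_shadow *s, unsigned irq)
{
    s->imen[irq_byte(irq)] &= (uint8_t)~irq_bit(irq);
    return s->imen[irq_byte(irq)];
}
```

For the APIC on IRQ 5 this gives mask bit 0x20 in the first mask byte and a specific EOI command of 0x65.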

33
sys/arch/i386/isa/vector.s
  • #ifdef __ELF__
  • #define XINTR(irq_num) Xintr/**/irq_num
  • #define XHOLD(irq_num) Xhold/**/irq_num
  • #define XSTRAY(irq_num) Xstray/**/irq_num
  • #else
  • #define XINTR(irq_num) _Xintr/**/irq_num
  • #define XHOLD(irq_num) _Xhold/**/irq_num
  • #define XSTRAY(irq_num) _Xstray/**/irq_num
  • #endif

34
sys/arch/i386/isa/vector.s
  • /* Beginning of INTR Macro */
  • #define INTR(irq_num, icu, ack)
  • IDTVEC(resume/**/irq_num)
  • cli
  • jmp 1f
  • IDTVEC(recurse/**/irq_num)
  • pushfl
  • pushl %cs
  • pushl %esi
  • cli

Block the CPU from accepting any more interrupts.
35
sys/arch/i386/isa/vector.s
  • XINTR(irq_num):
  • pushl $0 /* dummy error code */
  • pushl $T_ASTFLT /* trap # for doing ASTs */
  • INTRENTRY
  • MAKE_FRAME
  • MASK(irq_num, icu) /* mask it in hardware */
  • ack(irq_num) /* and allow other intrs */
  • incl MY_COUNT+V_INTR /* statistical info */

ICU will not send us any more of this IRQ.
ACK this IRQ to the ICU. This allows it to generate
other interrupts; without it the ICU would only
generate higher priority interrupts.
When an interrupt occurs the CPU clears the
interrupt enable bit (the equivalent of cli). An iret
restores the bit.
36
sys/arch/i386/isa/vector.s
  • testb $IRQ_BIT(irq_num),_C_LABEL(cpl) +
    IRQ_BYTE(irq_num)
  • jnz XHOLD(irq_num) /* currently masked; hold it
    */
  • 1: movl _C_LABEL(cpl),%eax /* cpl to restore on
    exit */
  • pushl %eax
  • orl _C_LABEL(intrmask) + (irq_num) * 4,%eax
  • movl %eax,_C_LABEL(cpl) /* add in this intr's
    mask */
  • sti /* safe to take intrs now */

Pre-computed masks for each IRQ:
IRQ 0: 0xe0000021
IRQ 3: 0xe0000039
IRQ 4: 0xe0000039
IRQ 5: 0xc0000020
[Figure: layout of the low mask bits, labeled by IRQ number (bits 5 4 3 2 1 0)]
Add IRQ bit to ipending
In Kernel interrupt mask
Allow CPU to accept more interrupts.
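
The pre-computed masks can be decoded bit by bit to see which hardware IRQs each handler blocks while it runs. The sketch below treats bit n of a mask as IRQ n for n in 0..15, which matches the bit labels in the annotation; the meaning of the upper bits is not covered in these notes and is left alone here. The helper names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Mask values as listed in the annotation for this slide. */
#define INTRMASK_IRQ0 0xe0000021u /* PIT */
#define INTRMASK_IRQ3 0xe0000039u /* COM2 */
#define INTRMASK_IRQ4 0xe0000039u /* COM1 */
#define INTRMASK_IRQ5 0xc0000020u /* APIC */

/* True if `mask` has the bit for hardware IRQ `irq` set. */
static bool mask_blocks_irq(uint32_t mask, unsigned irq)
{
    return ((mask >> irq) & 1u) != 0;
}

/* Mirrors "orl intrmask + irq*4, eax": add a handler's mask to cpl. */
static uint32_t raise_cpl(uint32_t cpl, uint32_t irq_mask)
{
    return cpl | irq_mask;
}

/* Mirrors the "testb IRQ_BIT, cpl" check: an arriving IRQ whose bit
 * is set in cpl is held (recorded in ipending) instead of handled. */
static bool must_hold(uint32_t cpl, unsigned irq)
{
    return mask_blocks_irq(cpl, irq);
}
```

Read this way, the listed values mean the clock and serial handlers hold off IRQ 5 (the APIC) while they run, whereas the APIC handler's own mask blocks only IRQ 5 among the hardware IRQs.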
37
sys/arch/i386/isa/vector.s
  • movl _C_LABEL(intrhand) + (irq_num) * 4,%ebx /*
    head of chain */
  • testl %ebx,%ebx
  • jz XSTRAY(irq_num) /* no handlers; we're stray
    */
  • STRAY_INITIALIZE /* nobody claimed it yet */
  • incl _C_LABEL(intrcnt) + (4*(irq_num)) /* XXX */

38
sys/arch/i386/isa/vector.s
  • 7: movl IH_ARG(%ebx),%eax /* get handler arg */
  • testl %eax,%eax
  • jnz 4f
  • movl %esp,%eax /* 0 means frame pointer */
  • 4: pushl %eax
  • call IH_FUN(%ebx) /* call it */
  • addl $4,%esp /* toss the arg */
  • STRAY_INTEGRATE /* maybe he claimed it */
  • incl IH_COUNT(%ebx) /* count the intrs */
  • movl IH_NEXT(%ebx),%ebx /* next handler in chain
    */
  • testl %ebx,%ebx
  • jnz 7b
  • STRAY_TEST /* see if it's a stray */
  • 5: UNMASK(irq_num, icu) /* unmask it in hardware
    */
  • jmp _C_LABEL(Xdoreti) /* lower spl and do ASTs */

Call NetBSD Interrupt Handler
Locate a handler for this IRQ
ICU is now able to send us another interrupt for
this IRQ
Return from Interrupt: resume other interrupts,
check for pending interrupts, restore stack, iret
39
sys/arch/i386/isa/vector.s
  • IDTVEC(stray/**/irq_num)
  • pushl $irq_num
  • call _C_LABEL(isa_strayintr)
  • addl $4,%esp
  • incl _C_LABEL(strayintrcnt) + (4*(irq_num))
  • jmp 5b
  • IDTVEC(hold/**/irq_num) // XHOLD()
  • orb $IRQ_BIT(irq_num),_C_LABEL(ipending) +
    IRQ_BYTE(irq_num)
  • INTRFASTEXIT
  • /* End of INTR Macro */

40
sys/arch/i386/isa/vector.s
  • INTR(0, IO_ICU1, ACK1) /* Clock interrupt */
  • INTR(1, IO_ICU1, ACK1)
  • INTR(2, IO_ICU1, ACK1)
  • INTR(3, IO_ICU1, ACK1) /* COM 2 Interrupt */
  • INTR(4, IO_ICU1, ACK1) /* COM 1 Interrupt */
  • INTR(5, IO_ICU1, ACK1) /* APIC Interrupt */
  • INTR(6, IO_ICU1, ACK1)
  • INTR(7, IO_ICU1, ACK1)
  • INTR(8, IO_ICU2, ACK2)
  • INTR(9, IO_ICU2, ACK2)
  • INTR(10, IO_ICU2, ACK2)
  • INTR(11, IO_ICU2, ACK2)
  • INTR(12, IO_ICU2, ACK2)
  • INTR(13, IO_ICU2, ACK2)
  • INTR(14, IO_ICU2, ACK2)
  • INTR(15, IO_ICU2, ACK2)

41
sys/arch/i386/isa/vector.s
  • /* Add a mask to cpl, and return the old value of
    cpl. */
  • static __inline int
  • splraise(ncpl)
  • register int ncpl;
  • {
  • register int ocpl = cpl;
  • cpl = ocpl | ncpl;
  • return (ocpl);
  • }
  • /* Restore a value to cpl (unmasking interrupts).
    If any unmasked interrupts are pending,
    call Xspllower() to process them. */
  • static __inline void
  • splx(ncpl)
  • register int ncpl;
  • {
  • cpl = ncpl;
  • if (ipending & ~ncpl)
  • Xspllower();
  • }

/* Same as splx(), but we return the old value of
spl, for the benefit of some splsoftclock()
callers. */
static __inline int
spllower(ncpl)
register int ncpl;
{
register int ocpl = cpl;
cpl = ncpl;
if (ipending & ~ncpl)
Xspllower();
return (ocpl);
}
Call Xspllower if there is something pending that
is higher priority than our new cpl
42
sys/arch/i386/isa/icu.s spllower()
  • IDTVEC(spllower) // Xspllower()
  • pushl %ebx
  • pushl %esi
  • pushl %edi
  • movl _C_LABEL(cpl),%ebx # save
    priority
  • movl $1f,%esi # address to resume
    loop at
  • 1: movl %ebx,%eax
  • notl %eax
  • andl _C_LABEL(ipending),%eax
  • jz 2f
  • bsfl %eax,%eax
  • btrl %eax,_C_LABEL(ipending)
  • jnc 1b
  • jmp *_C_LABEL(Xrecurse)(,%eax,4)
  • 2: popl %edi
  • popl %esi
  • popl %ebx
  • ret

Is there a pending interrupt that is high enough
priority?
If yes, then restart it?
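
The selection logic in this loop can be sketched in C: compute the pending interrupts that the current cpl does not mask, pick the lowest-numbered set bit (what bsfl does), and clear that bit in ipending (what btrl does) before jumping to the matching recurse entry. The function names below are hypothetical and the dispatch itself is left out.

```c
#include <stdint.h>

/* Lowest-numbered pending interrupt bit that `cpl` does not mask,
 * or -1 if nothing is runnable. Mirrors notl/andl/jz/bsfl. */
static int next_unmasked_pending(uint32_t ipending, uint32_t cpl)
{
    uint32_t runnable = ipending & ~cpl;
    int bit = 0;

    if (runnable == 0)
        return -1;
    while (((runnable >> bit) & 1u) == 0)
        bit++;
    return bit;
}

/* Pick the next runnable interrupt and clear its bit in *ipending,
 * as btrl does; the caller would then dispatch to its handler. */
static int take_pending(uint32_t *ipending, uint32_t cpl)
{
    int bit = next_unmasked_pending(*ipending, cpl);

    if (bit >= 0)
        *ipending &= ~(1u << bit);
    return bit;
}
```

A held APIC interrupt (bit 5 in ipending) is therefore only restarted once cpl drops far enough that bit 5 is no longer masked.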