Title: What's needed to receive?
1What's needed to receive?
A look at the minimum steps required for
programming our anchor nics to receive packets
2A disappointment
- Our former nicwatch.cpp application does not
seem to work reliably to show packets being
received by the 82573L controller
- It was based on the raw sockets protocol
implemented within the Linux kernel's vast
networking subsystem, thus offering us the
prospect of a hardware-independent tool -- if
only it would show us all the packets!
3Two purposes
- So let's discard nicwatch.cpp in favor of
writing our own hardware-specific module that
WILL be able to show us all the nic's received
packets, independently of Linux's various layers
of networking protocol code
- And let's keep it as simple as possible, so we
can see which programming steps are the truly
essential ones for the 82573L nic
4Accessing 82573L registers
- Device registers are hardware mapped to a range
of addresses in physical memory
- We can get the location and extent of this
memory-range from a BAR register in the 82573L
device's PCI Configuration Space
- We then request the Linux kernel to set up an I/O
remapping of this memory-range to virtual
addresses within kernel-space
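The BAR decoding described above can be sketched in plain user-space C. This is only an illustration of the bit layout of a PCI memory BAR; the helper names and the sample values in the usage note are hypothetical. Inside a kernel module one would more likely let pci_resource_start(), pci_resource_len() and ioremap() do this work.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit 0 of a PCI Base Address Register is 0 for a memory range
 * and 1 for an I/O-port range. */
static bool bar_is_memory(uint32_t bar)
{
    return (bar & 0x1u) == 0;
}

/* In a memory BAR the low four bits are flag bits (type and
 * prefetchable), so they are masked off to get the base address. */
static uint32_t bar_mem_base(uint32_t bar)
{
    return bar & ~0xFu;
}

/* After all 1s are written to a BAR, the value read back has zeros
 * in the address bits the device does not decode; this turns that
 * read-back value into the extent of the range in bytes. */
static uint32_t bar_mem_size(uint32_t readback)
{
    return ~(readback & ~0xFu) + 1u;
}
```

For example, a hypothetical BAR value of 0xFEBC0004 decodes to a memory range based at 0xFEBC0000, and a hypothetical read-back of 0xFFFE0000 corresponds to a 128-KB extent.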
5i/o-memory remapping
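[Diagram: i/o-memory remapping. Physical address-space:
dynamic ram, vram, nic registers, IO-APIC, Local-APIC.
Virtual address-space: user space (3-GB) and kernel space
(1-GB), where kernel space holds the kernel code/data and
the remapped vram, nic registers and APIC registers.]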
6Kernel memory allocation
- The NIC requires some host memory for
packet-buffers and receive descriptors
- The kernel provides a helper function for
reserving a suitable region of memory in
kernel-space which is both non-pageable and
physically contiguous (i.e., kzalloc())
- It's our job to decide how much memory our
network controller hardware will need
7Ethernet packet layout
- Total size normally can vary from 64 bytes up to
1522 bytes (unless jumbo packets and/or
undersized packets are enabled)
- The NIC expects a 14-byte packet header and it
appends a 4-byte CRC check-sum
offset 0: destination MAC address (6 bytes)
offset 6: source MAC address (6 bytes)
offset 12: Type/length (2 bytes)
offset 14: the packet's data payload goes here (usually
varies from 46 to 1500 bytes)
after the payload: Cyclic Redundancy Checksum (4 bytes)
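The 14-byte header layout can be written down as a C structure. This is a minimal sketch; the type and helper names are hypothetical and are not taken from nicspy.c.

```c
#include <stdint.h>
#include <stddef.h>

#define MAC_LEN 6

/* The 14-byte ethernet header: only byte arrays are used, so the
 * compiler inserts no padding and the offsets are 0, 6 and 12. */
struct eth_header {
    uint8_t dst_mac[MAC_LEN];   /* destination MAC address */
    uint8_t src_mac[MAC_LEN];   /* source MAC address */
    uint8_t type_len[2];        /* Type/length, big-endian on the wire */
};

/* Assemble the Type/length field in host byte-order. */
static uint16_t eth_type(const struct eth_header *h)
{
    return (uint16_t)((h->type_len[0] << 8) | h->type_len[1]);
}

/* Payload size for a frame whose length excludes the 4-byte CRC. */
static size_t eth_payload_len(size_t frame_len_without_crc)
{
    if (frame_len_without_crc < sizeof(struct eth_header))
        return 0;
    return frame_len_without_crc - sizeof(struct eth_header);
}
```

A minimum-size 64-byte frame is 60 bytes without its CRC, which leaves 46 bytes of payload after the header.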
8Rx-Descriptor Ring-Buffer
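[Diagram: the Rx-Descriptor ring-buffer. Descriptors sit at
offsets 0x00, 0x10, 0x20, ... 0x80 from the RDBA base-address;
RDLEN gives the ring's length in bytes; RDH (head) and RDT
(tail) mark the boundary between the descriptors owned by
hardware (nic) and those owned by software (cpu).]
Circular buffer (128-bytes minimum and must be
a multiple of 128 bytes)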
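The size rule for the circular buffer and the wrap-around of a ring index can be sketched as follows. The helper names are hypothetical; the 16-byte descriptor size is the one used for the legacy Rx descriptor format.

```c
#include <stdint.h>
#include <stdbool.h>

#define DESC_SIZE 16u   /* bytes per legacy Rx descriptor */

/* RDLEN must be at least 128 bytes and a multiple of 128 bytes. */
static bool rdlen_is_valid(uint32_t rdlen_bytes)
{
    return rdlen_bytes >= 128u && (rdlen_bytes % 128u) == 0;
}

/* Number of descriptors that fit in a ring of the given length. */
static uint32_t ring_count(uint32_t rdlen_bytes)
{
    return rdlen_bytes / DESC_SIZE;
}

/* Advance a head or tail index by one, wrapping at the ring's end. */
static uint32_t ring_next(uint32_t index, uint32_t count)
{
    return (index + 1u) % count;
}
```

With sixteen descriptors the ring is 256 bytes long, which satisfies the rule, and index 15 wraps around to index 0.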
9Our nicspy.c module
- It will be a character-mode device-driver
- It will only implement read() and ioctl()
- The read() function will cause a task to sleep
until a network packet has arrived
- An interrupt-handler will wake up the task
- A get_info function will be provided as a
debugging aid, so the NIC's Rx descriptor-queue
can be conveniently inspected
10Sixteen packet-buffers
- Our nicspy.c driver allocates 16 buffers of
size 1536 bytes (i.e., for normal ethernet)
[Diagram: 32-KB allocated (16 packet-buffers, plus
Rx-Descriptor Queue): a region for the sixteen
packet-buffers, a region for the Rx Descriptor Queue
(256 bytes), and the remainder unused]

    #define KMEM_SIZE 0x8000   // 32KB size of kernel memory allocation

    void *kmem = kzalloc( KMEM_SIZE, GFP_KERNEL );
    if ( !kmem ) return -ENOMEM;
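The arithmetic behind the 32-KB figure can be checked with a short sketch. The ordering of the regions (packet-buffers first, descriptor queue immediately after) is an assumption made for illustration, and all names except KMEM_SIZE are hypothetical.

```c
#include <stddef.h>

#define KMEM_SIZE  0x8000u   /* 32KB kernel memory allocation */
#define N_BUFFERS  16u       /* sixteen packet-buffers */
#define BUF_SIZE   1536u     /* bytes per packet-buffer */
#define DESC_SIZE  16u       /* bytes per Rx descriptor */

/* Assumed layout: the packet-buffers come first ... */
static size_t buffer_offset(unsigned int i)
{
    return (size_t)i * BUF_SIZE;
}

/* ... and the Rx Descriptor Queue follows the last buffer. */
static size_t queue_offset(void)
{
    return (size_t)N_BUFFERS * BUF_SIZE;
}

/* Size of the descriptor queue: one descriptor per buffer. */
static size_t queue_bytes(void)
{
    return (size_t)N_BUFFERS * DESC_SIZE;
}

/* Whatever is left of the allocation goes unused. */
static size_t unused_bytes(void)
{
    return KMEM_SIZE - queue_offset() - queue_bytes();
}
```

Under this assumption the buffers occupy 0x6000 bytes, the queue occupies 256 bytes at offset 0x6000, and 7936 bytes remain unused.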
11Format for an Rx Descriptor
16 bytes:
  bytes 0-7    Base-address (64-bits)
  bytes 8-9    Packet-length
  bytes 10-11  Packet-checksum
  byte 12      status
  byte 13      errors
  bytes 14-15  VLAN tag
The device-driver initializes the base-address field
with the physical address of a packet-buffer
The network controller will write-back the values for
the other fields when it has transferred a received
packet's data into this packet-buffer
12Suggested C syntax
    typedef struct {
        unsigned long long base_address;
        unsigned short     packet_length;
        unsigned short     packet_cksum;
        unsigned char      desc_status;
        unsigned char      desc_errors;
        unsigned short     VLAN_tag;
    } RX_DESCRIPTOR;

Legacy Format for the Intel Pro1000 network
controller's Receive Descriptors
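A variant of the suggested structure using fixed-width types makes the 16-byte size explicit and lets the compiler verify it. This is a sketch, not the course's code; the names rx_descriptor_t and rx_desc_init are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Legacy Rx descriptor with fixed-width fields. */
typedef struct {
    uint64_t base_address;    /* physical address of a packet-buffer */
    uint16_t packet_length;   /* written back by the controller */
    uint16_t packet_cksum;    /* written back by the controller */
    uint8_t  desc_status;     /* written back by the controller */
    uint8_t  desc_errors;     /* written back by the controller */
    uint16_t vlan_tag;        /* written back by the controller */
} rx_descriptor_t;

/* Refuse to compile if padding ever changes the size. */
_Static_assert(sizeof(rx_descriptor_t) == 16,
               "an Rx descriptor must occupy exactly 16 bytes");

/* Driver-side initialization: set the buffer address and clear
 * every field the controller will later write back. */
static void rx_desc_init(rx_descriptor_t *d, uint64_t buf_phys)
{
    d->base_address  = buf_phys;
    d->packet_length = 0;
    d->packet_cksum  = 0;
    d->desc_status   = 0;
    d->desc_errors   = 0;
    d->vlan_tag      = 0;
}
```

Clearing the status byte at initialization matters because a zero status is what later tells the driver that no packet has arrived in that buffer yet.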
13RxDesc Status-field
bit 7: PIF    bit 6: IPCS   bit 5: TCPCS   bit 4: UDPCS
bit 3: VP     bit 2: IXSM   bit 1: EOP     bit 0: DD

DD = Descriptor Done (1=yes, 0=no) shows if nic is finished with descriptor
EOP = End Of Packet (1=yes, 0=no) shows if this packet is logically last
IXSM = Ignore Checksum Indications (1=yes, 0=no)
VP = VLAN Packet match (1=yes, 0=no)
UDPCS = UDP Checksum calculated in packet (1=yes, 0=no)
TCPCS = TCP Checksum calculated in packet (1=yes, 0=no)
IPCS = IPv4 Checksum calculated on packet (1=yes, 0=no)
PIF = Passed In-exact Filter (1=yes, 0=no) shows if software must check
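Testing the status bits comes down to a few mask operations. The constant and function names below are hypothetical; the bit positions are those of the legacy descriptor's status byte (DD in bit 0 up through PIF in bit 7).

```c
#include <stdint.h>
#include <stdbool.h>

enum {
    RXD_STAT_DD    = 1u << 0,   /* Descriptor Done */
    RXD_STAT_EOP   = 1u << 1,   /* End Of Packet */
    RXD_STAT_IXSM  = 1u << 2,   /* Ignore Checksum Indications */
    RXD_STAT_VP    = 1u << 3,   /* VLAN Packet match */
    RXD_STAT_UDPCS = 1u << 4,   /* UDP Checksum calculated */
    RXD_STAT_TCPCS = 1u << 5,   /* TCP Checksum calculated */
    RXD_STAT_IPCS  = 1u << 6,   /* IPv4 Checksum calculated */
    RXD_STAT_PIF   = 1u << 7    /* Passed In-exact Filter */
};

/* Has the nic finished with this descriptor? */
static bool rx_desc_done(uint8_t status)
{
    return (status & RXD_STAT_DD) != 0;
}

/* Is a whole packet available (done, and logically the last piece)? */
static bool rx_packet_complete(uint8_t status)
{
    const uint8_t both = RXD_STAT_DD | RXD_STAT_EOP;
    return (status & both) == both;
}

/* Must software re-check the destination address itself? */
static bool rx_needs_filter_check(uint8_t status)
{
    return (status & RXD_STAT_PIF) != 0;
}
```

A status byte of 0x03, for instance, means the descriptor is done and holds the end of a packet.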
14RxDesc Error-field
bit 7: RXE    bit 6: IPE    bit 5: TCPE   bit 4: reserved (0)
bit 3: reserved (0)   bit 2: SEQ   bit 1: SE   bit 0: CE

RXE = Received-data Error (1=yes, 0=no)
IPE = IPv4-checksum error (1=yes, 0=no)
TCPE = TCP/UDP checksum error (1=yes, 0=no)
SEQ = Sequence error (1=yes, 0=no)
SE = Symbol Error (1=yes, 0=no)
CE = CRC Error or alignment error (1=yes, 0=no)
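The error byte can be screened the same way as the status byte. The sketch below separates frame-level errors from checksum errors; that grouping is a choice made here for illustration, and the names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

enum {
    RXD_ERR_CE   = 1u << 0,   /* CRC Error or alignment error */
    RXD_ERR_SE   = 1u << 1,   /* Symbol Error */
    RXD_ERR_SEQ  = 1u << 2,   /* Sequence error */
    RXD_ERR_TCPE = 1u << 5,   /* TCP/UDP checksum error */
    RXD_ERR_IPE  = 1u << 6,   /* IPv4-checksum error */
    RXD_ERR_RXE  = 1u << 7    /* Received-data Error */
};

/* Was the frame itself damaged on the wire or during reception? */
static bool rx_frame_error(uint8_t errors)
{
    return (errors & (RXD_ERR_CE | RXD_ERR_SE |
                      RXD_ERR_SEQ | RXD_ERR_RXE)) != 0;
}

/* Did a protocol checksum inside the frame fail? */
static bool rx_checksum_error(uint8_t errors)
{
    return (errors & (RXD_ERR_TCPE | RXD_ERR_IPE)) != 0;
}
```

A packet-display tool might still show a packet with checksum errors, since the frame arrived intact, while flagging one with frame errors as unreliable.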
15Essential receive registers
    enum {
        E1000_CTRL   = 0x0000,  // Device Control
        E1000_STATUS = 0x0008,  // Device Status
        E1000_ICR    = 0x00C0,  // Interrupt Cause Read
        E1000_IMS    = 0x00D0,  // Interrupt Mask Set
        E1000_IMC    = 0x00D8,  // Interrupt Mask Clear
        E1000_RCTL   = 0x0100,  // Receive Control
        E1000_RDBAL  = 0x2800,  // Rx Descriptor Base Address Low
        E1000_RDBAH  = 0x2804,  // Rx Descriptor Base Address High
        E1000_RDLEN  = 0x2808,  // Rx Descriptor Length
        E1000_RDH    = 0x2810,  // Rx Descriptor Head
        E1000_RDT    = 0x2818,  // Rx Descriptor Tail
        E1000_RXDCTL = 0x2828,  // Rx Descriptor Control
        E1000_RA     = 0x5400,  // Receive address-filter Array
    };
16Receive Control (0x0100)
bits 31-16:
  31     R=0
  30-27  FLXBUF
  26     SECRC
  25     BSEX
  24     R=0
  23     PMCF
  22     DPF
  21     R=0
  20     CFI
  19     CFIEN
  18     VFE
  17-16  BSIZE
bits 15-0:
  15     BAM
  14     R=0
  13-12  MO
  11-10  DTYP
  9-8    RDMTS
  7-6    LBM
  5      LPE
  4      MPE
  3      UPE
  2      SBP
  1      EN
  0      R=0

EN = Receive Enable
SBP = Store Bad Packets
UPE = Unicast Promiscuous Enable
MPE = Multicast Promiscuous Enable
LPE = Long Packet reception Enable
LBM = Loopback Mode
RDMTS = Rx-Descriptor Minimum Threshold Size
DTYP = Descriptor Type
MO = Multicast Offset
BAM = Broadcast Accept Mode
BSIZE = Receive Buffer Size
VFE = VLAN Filter Enable
CFIEN = Canonical Form Indicator Enable
CFI = Canonical Form Indicator bit-value
DPF = Discard Pause Frames
PMCF = Pass MAC Control Frames
BSEX = Buffer Size Extension
SECRC = Strip Ethernet CRC
FLXBUF = Flexible Buffer size

We used 0x0000801C in RCTL to prepare the
receive engine prior to enabling it
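The value 0x0000801C can be rebuilt from named bits, which shows what it asks of the receive engine: accept broadcasts, be promiscuous for unicast and multicast, and store bad packets, with the enable bit still clear. The constant and function names are hypothetical.

```c
#include <stdint.h>

enum {
    RCTL_EN  = 1u << 1,    /* Receive Enable */
    RCTL_SBP = 1u << 2,    /* Store Bad Packets */
    RCTL_UPE = 1u << 3,    /* Unicast Promiscuous Enable */
    RCTL_MPE = 1u << 4,    /* Multicast Promiscuous Enable */
    RCTL_LPE = 1u << 5,    /* Long Packet reception Enable */
    RCTL_BAM = 1u << 15    /* Broadcast Accept Mode */
};

/* The preparation value: everything chosen, receiver not yet on. */
static uint32_t rctl_prepare(void)
{
    return RCTL_BAM | RCTL_MPE | RCTL_UPE | RCTL_SBP;
}

/* The later step: the same settings with Receive Enable added. */
static uint32_t rctl_enable(uint32_t rctl)
{
    return rctl | RCTL_EN;
}
```

Keeping the enable step separate lets the descriptor ring registers be programmed while the receiver is still off; adding EN turns 0x0000801C into 0x0000801E.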
17Device Control (0x0000)
Bit layout (82573L)
bits 31-16:
  31     PHYRST
  30     VME
  29     R=0
  28     TFCE
  27     RFCE
  26     RST
  25-21  R=0
  20     ADVD3WUC
  19     R=0
  18     D/UD status
  17-16  R=0
bits 15-0:
  15-13  R=0
  12     FRCDPLX
  11     FRCSPD
  10     R=0
  9-8    SPEED
  7      R=0
  6      SLU
  5-4    R=0
  3      R=1
  2      GIOMD
  1      R=0
  0      FD

FD = Full-Duplex
GIOMD = GIO Master Disable
SLU = Set Link Up
SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
FRCSPD = Force Speed
FRCDPLX = Force Duplex
D/UD = Dock/Undock status
ADVD3WUC = Advertise Cold Wake Up Capability
RST = Device Reset
RFCE = Rx Flow-Control Enable
TFCE = Tx Flow-Control Enable
VME = VLAN Mode Enable
PHYRST = Phy Reset

We used 0x040C0241 to initiate a device reset
operation
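Picking the value 0x040C0241 apart shows which named fields it sets. The sketch below extracts only the fields that are easy to name (RST, SLU, FD and SPEED); the constant and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

enum {
    CTRL_FD  = 1u << 0,     /* Full-Duplex */
    CTRL_SLU = 1u << 6,     /* Set Link Up */
    CTRL_RST = 1u << 26     /* Device Reset */
};

#define CTRL_SPEED_SHIFT 8
#define CTRL_SPEED_MASK  (0x3u << CTRL_SPEED_SHIFT)

/* Does this value ask the device to reset itself? */
static bool ctrl_requests_reset(uint32_t ctrl)
{
    return (ctrl & CTRL_RST) != 0;
}

/* Translate the two SPEED bits into Mbps (0 for the reserved code). */
static unsigned int ctrl_speed_mbps(uint32_t ctrl)
{
    switch ((ctrl & CTRL_SPEED_MASK) >> CTRL_SPEED_SHIFT) {
    case 0:  return 10;
    case 1:  return 100;
    case 2:  return 1000;
    default: return 0;
    }
}
```

For 0x040C0241 the RST, SLU and FD bits are all set and the SPEED field holds 10 binary, i.e. 1000Mbps.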
18Device Status (0x0008)
Bit layout (82573L)
bits 31-16:
  31     ? (some undocumented functionality?)
  30-20  0
  19     GIO Master EN
  18-16  0
bits 15-0:
  15-11  0
  10     PHYRA
  9-8    ASDV
  7-6    SPEED
  5      0
  4      TXOFF
  3-2    Function ID
  1      LU
  0      FD

FD = Full-Duplex
LU = Link Up
TXOFF = Transmission Paused
SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
ASDV = Auto-negotiation Speed Detection Value
PHYRA = PHY Reset Asserted
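A driver typically consults the Device Status register to learn whether the link came up and at what speed. The following sketch decodes those fields; the names are hypothetical, and the sample value in the usage note is made up for illustration rather than read from real hardware.

```c
#include <stdint.h>
#include <stdbool.h>

enum {
    STATUS_FD    = 1u << 0,    /* Full-Duplex */
    STATUS_LU    = 1u << 1,    /* Link Up */
    STATUS_TXOFF = 1u << 4,    /* Transmission Paused */
    STATUS_PHYRA = 1u << 10    /* PHY Reset Asserted */
};

/* Is the link up? */
static bool status_link_up(uint32_t status)
{
    return (status & STATUS_LU) != 0;
}

/* The raw two-bit SPEED field, bits 7-6. */
static unsigned int status_speed_field(uint32_t status)
{
    return (status >> 6) & 0x3u;
}

/* The two-bit Function ID field, bits 3-2. */
static unsigned int status_function_id(uint32_t status)
{
    return (status >> 2) & 0x3u;
}
```

With a made-up status value of 0x00080083, the link is up in full-duplex mode, the SPEED field reads 10 binary (1000Mbps), and the Function ID is 0.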
19PCI Bus Master DMA
[Diagram: PCI Bus Master DMA. The 82573L i/o-memory holds
the on-chip RX descriptors, the on-chip TX descriptors, and
the RX and TX FIFOs (32-KB total). The Host's Dynamic Random
Access Memory holds the Rx Descriptor Queue and the
packet-buffers. DMA transfers move data between the two.]
20Our read() algorithm
    unsigned int rx_curr;

    ssize_t my_read( struct file *file, char *buf, size_t len, loff_t *pos )
    {
        // our global variable 'rx_curr' is the descriptor-array index
        // for the next receive-buffer descriptor to be processed

        if ( this descriptor's status is zero )
            put calling task to sleep
            // wakeup the task when a fresh packet has been received

        copy received data from the packet-buffer to user's buffer
        clear this descriptor's status
        advance our global variable 'rx_curr' to the next descriptor
        return the number of data-bytes transferred
    }
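The bookkeeping in this algorithm can be tried out in user space with a small simulation. This is not the driver's code: sim_receive() stands in for the nic writing back a descriptor, sim_read() returns -1 where the real read() would put the task to sleep, and memcpy() stands in for the kernel's copy to the user's buffer. All names other than rx_curr are hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

#define N_DESC   16
#define BUF_SIZE 1536

struct sim_desc { uint16_t packet_length; uint8_t status; };

static struct sim_desc ring[N_DESC];
static uint8_t buffers[N_DESC][BUF_SIZE];
static unsigned int rx_curr;   /* next descriptor to be processed */

/* Stand-in for the nic: fill a buffer, then mark the descriptor. */
static void sim_receive(unsigned int idx, const void *data, uint16_t n)
{
    if (n > BUF_SIZE)
        n = BUF_SIZE;
    memcpy(buffers[idx], data, n);
    ring[idx].packet_length = n;
    ring[idx].status = 0x03;   /* Descriptor Done + End Of Packet */
}

/* The read() steps, minus the sleeping. */
static ssize_t sim_read(char *buf, size_t len)
{
    struct sim_desc *d = &ring[rx_curr];
    size_t n;

    if (d->status == 0)
        return -1;             /* the real driver would sleep here */

    n = d->packet_length;
    if (n > len)
        n = len;               /* never overrun the caller's buffer */
    memcpy(buf, buffers[rx_curr], n);

    d->status = 0;                         /* clear the status */
    rx_curr = (rx_curr + 1) % N_DESC;      /* advance, with wrap */
    return (ssize_t)n;
}
```

Calling sim_read() on an empty ring returns -1; after sim_receive(0, "hello", 5) the next call returns 5, clears that descriptor's status, and leaves rx_curr at 1.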
21nicspy.cpp
- This application calls our device-driver's
read() function repeatedly, and displays the
raw ethernet packet-data each time
- It requires our nicspy.c device-driver to be
installed in the kernel, obviously
- There's no clash of filenames here and their
similarity helps keep them together
  - nicspy.c and nicspy.ko (the kernel-side)
  - nicspy.cpp and nicspy (the user-side)
22in-class demo
- We can install nicspy.ko on one of our anchor
machines (making sure eth1 is down before we
do our module-install) and then we run nicspy
on that machine
- Next we install our nicping.ko module on some
other anchor machine (be sure its eth1
interface is down beforehand) and then use
cat /proc/nicping for a transmit