1
Flow Stats Module--Control Design
John DeHart
2
SPP V1 LC Egress with 1x10Gb/s Tx
[Block diagram of the LC egress pipeline: MSF/RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, Flow Stats1, Flow Stats2, QM0-QM3, Stats (1 ME), MSF/TBUF, RTM, 1x10G Tx1, 1x10G Tx2, SRAM1-SRAM3, switch, NAT Miss Scratch Ring and NAT Pkt return to the XScale, Archive Records to the XScale.]
3
SPP V1 LC Egress with 10x1Gb/s Tx
[Block diagram of the LC egress pipeline: MSF/RBUF, Rx1, Rx2, Key Extract, Lookup (TCAM), Hdr Format, Port Splitter, Flow Stats1, Flow Stats2, QM0-QM3, Stats (1 ME), MSF/TBUF, RTM, 5x1G Tx1 (P0-P4), 5x1G Tx2 (P5-P9), SRAM1-SRAM3, switch, NAT Miss Scratch Ring and NAT Pkt return to the XScale, Archive Records to the XScale.]
4
Overview of Flow Stats
  • 2 MEs in Fastpath to collect flow data for each
    pkt
  • Byte counter per flow
  • Pkt counter per flow
  • Archive data to XScale via SRAM ring every 5
    minutes
  • XScale control daemon(s) to process data
  • Receive flow information from MEs
  • Reformat to put into PlanetFlow format
  • Maintain databases for PlanetLab archiving and
    for identifying internal flows (pre-NAT
    translation) when an external flow (post-NAT) has
    a complaint lodged against it.

5
SPP V1 LC Egress with 10x1Gb/s Tx
[Block diagram as on slide 3 (SPP V1 LC Egress with 10x1Gb/s Tx), repeated here.]
6
Flow Record
  • Total Record Size: 8 32-bit words
  • V is valid bit
  • Only needed at head of chain
  • 1 for valid record
  • 0 for invalid record
  • Start timestamp (16-bits) is set when record
    starts counting flow
  • Reset to zero when record is archived
  • End timestamp (16-bits) is set each time a packet
    is seen for the given flow
  • Packet and Byte counters are incremented for each
    packet on the given flow
  • Reset to zero when record is archived
  • For TCP flows, the TCP Flags are OR'ed in from
    each packet
  • Next Record Number is the next record in the hash chain
  • 0x1FFFF if the record is the tail
  • Address of next record = (next_record_num *
    record_size) + collision_table_base_addr

LW0: Source Address (32b)
LW1: Destination Address (32b)
LW2: SrcPort (16b) | DestPort (16b)
LW3: Protocol (8b) | Slice ID (VLAN) (12b) | Reserved (6b) | TCP Flags (6b)
LW4: Next Record Number (17b) | Reserved (14b) | V (1b)
LW5: Packet Counter (32b)
LW6: Byte Counter (32b)
LW7: Start Timestamp (16b) | End Timestamp (16b)
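
As a reference for the layout above, here is a minimal C sketch of the 8-word record and the next-record address calculation; the type and field names are illustrative, and the bit-fields are shown only for readability (the ME microcode manipulates raw 32-bit words).

#include <stdint.h>

#define FLOW_REC_TAIL 0x1FFFF   /* Next Record Number value marking the tail of a chain */

/* Illustrative view of the 8 x 32-bit flow record (LW0-LW7). */
typedef struct flow_record {
    uint32_t src_addr;                        /* LW0 */
    uint32_t dst_addr;                        /* LW1 */
    uint32_t src_port   : 16,                 /* LW2 */
             dst_port   : 16;
    uint32_t protocol   : 8,                  /* LW3 */
             slice_vlan : 12,
             reserved3  : 6,
             tcp_flags  : 6;
    uint32_t next_rec   : 17,                 /* LW4: next record number in hash chain */
             reserved4  : 14,
             valid      : 1;                  /* only needed at head of chain */
    uint32_t pkt_count;                       /* LW5 */
    uint32_t byte_count;                      /* LW6 */
    uint32_t start_ts   : 16,                 /* LW7 */
             end_ts     : 16;
} flow_record_t;

/* SRAM address of the next record in the chain, per the formula above. */
static inline uint32_t next_record_addr(uint32_t next_record_num,
                                        uint32_t collision_table_base_addr)
{
    return collision_table_base_addr + next_record_num * (uint32_t)sizeof(flow_record_t);
}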
7
Archiving Hash Table Records
  • Send all valid records in hash table to XScale
    for archiving every 5 minutes
  • Set Command field to indicate FLOW_RECORD
  • For each record in the main table (i.e. start of
    chain) ...
  • For each record in hash chain ...
  • If record is valid ...
  • If packet count > 0 then
  • Send record to XScale via SRAM ring
  • Set packet count to 0
  • Set byte count to 0
  • Leave record in table
  • If packet count == 0 then
  • Flow has already been archived
  • No packet has arrived on flow in 5 minutes
  • Record is no longer valid
  • Delete record from hash table to free memory

Info sent to XScale for each flow every 5 minutes:
LW0: Source Address (32b)
LW1: Destination Address (32b)
LW2: SrcPort (16b) | DestPort (16b)
LW3: Protocol (8b) | Slice ID (VLAN) (12b) | TCP Flags (6b) | Command (6b)
LW4: Packet Counter (32b)
LW5: Byte Counter (32b)
LW6: Start Timestamp_high (32b)
LW7: Start Timestamp_low (32b)
LW8: End Timestamp_high (32b)
LW9: End Timestamp_low (32b)
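
A minimal C sketch of the 5-minute archive sweep described above, reusing the flow_record_t sketch from the previous slide; the helper functions are assumptions standing in for the ME microcode and the SRAM ring interface.

/* Hypothetical helpers standing in for the real ME/SRAM-ring operations. */
void send_to_xscale_ring(const flow_record_t *rec);              /* Command = FLOW_RECORD */
void delete_from_chain(flow_record_t *head, flow_record_t *rec); /* unlink record, free memory */
flow_record_t *record_at(uint32_t rec_num);                      /* resolve record number */

/* Sweep one hash chain; called for every chain head each archive period. */
void archive_chain(flow_record_t *head)
{
    flow_record_t *rec = head;
    while (rec != NULL) {
        uint32_t next = rec->next_rec;            /* save before a possible delete */
        if (rec->valid) {
            if (rec->pkt_count > 0) {
                send_to_xscale_ring(rec);         /* archive to XScale via SRAM ring */
                rec->pkt_count  = 0;              /* reset counters, leave record in table */
                rec->byte_count = 0;
                rec->start_ts   = 0;
            } else {
                /* no packet in the last 5 minutes: already archived, reclaim it */
                delete_from_chain(head, rec);
            }
        }
        rec = (next == FLOW_REC_TAIL) ? NULL : record_at(next);
    }
}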
8
Sending Time Records to XScale
  • ME Precedes a series of Flow Records with a time
    record.
  • Set Command field to indicate TIME_RECORD
  • Time Record must be same size as Flow Record,
    currently 10 Words

Time Record sent to XScale preceding Flow Records:
LW0: Timestamp_high (32b)
LW1: Timestamp_low (32b)
LW2: Reserved (32b)
LW3: Reserved (26b) | Command (6b)
LW4: Reserved (32b)
LW5: Reserved (32b)
LW6: Reserved (32b)
LW7: Reserved (32b)
LW8: Reserved (32b)
LW9: Reserved (32b)
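
A minimal C sketch of the two 10-word message formats pushed onto the SRAM ring, with illustrative command codes for the 6-bit Command field.

#include <stdint.h>

enum { CMD_FLOW_RECORD = 1, CMD_TIME_RECORD = 2 };    /* illustrative values */

/* 10 x 32-bit flow archive record sent to the XScale (LW0-LW9). */
typedef struct archive_record {
    uint32_t src_addr;                                /* LW0 */
    uint32_t dst_addr;                                /* LW1 */
    uint32_t src_port  : 16, dst_port   : 16;         /* LW2 */
    uint32_t protocol  : 8,  slice_vlan : 12,         /* LW3 */
             tcp_flags : 6,  command    : 6;
    uint32_t pkt_count;                               /* LW4 */
    uint32_t byte_count;                              /* LW5 */
    uint32_t start_ts_hi;                             /* LW6 */
    uint32_t start_ts_lo;                             /* LW7 */
    uint32_t end_ts_hi;                               /* LW8 */
    uint32_t end_ts_lo;                               /* LW9 */
} archive_record_t;

/* Time record: same size, only the timestamp and Command fields are used. */
typedef struct time_record {
    uint32_t ts_hi;                                   /* LW0 */
    uint32_t ts_lo;                                   /* LW1 */
    uint32_t reserved2;                               /* LW2 */
    uint32_t reserved3 : 26, command : 6;             /* LW3: command = CMD_TIME_RECORD */
    uint32_t reserved[6];                             /* LW4-LW9 */
} time_record_t;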
9
Overview of Flow Stats Control
  • Main functions
  • Collection of Flow Information for PlanetLab Node
  • Used when a complaint is lodged about a
    misbehaving flow
  • Must be able to identify flow and the Slice that
    produced it.
  • Aggregation of Flow Information from
  • Multiple GPEs
  • Multiple NPEs
  • Correlation with NAT records to identify internal
    flow and external flow
  • External flow will be what complaint will be
    about.
  • Internal flow will be what involved PlanetLab
    researcher will know about.

10
Translations needed
  • NPE Flow Records
  • VLAN to SliceID
  • Comes from SRM
  • IXP timestamp to wall clock time
  • SCD records the wall-clock time at which it started the IXP
  • How do we manage time slip between clocks?
  • GPE Flow Records
  • NAT Port translations
  • Src Port from GPE record becomes SPP Orig Src
    Port
  • Src Port from natd translation record becomes Src
    Port
  • natd provides port translation updates

11
Merging of DBs
  • NPE Flows
  • No NAT
  • Goes directly into Ext PF DB
  • SPP Orig Src Port = SrcPort
  • Do they need SliceID translation?
  • We use the VLAN, but this probably needs to be
    the PlanetLab version of a Slice ID.
  • SRM will provide a VLAN to SliceID translation
  • Where and When?
  • GPE Configured Flows
  • How do we identify configured flow pkt?
  • Because they don't match a NAT Record?
  • No NAT
  • Goes directly into Ext PF DB
  • SPP Orig Src Port = SrcPort
  • GPE NAT Flows
  • Find corresponding NAT Record, extract Translated
    SrcPort
  • Insert record into Ext PF DB with original
    SrcPort moved to SPP Orig Src Port
  • Set Src Port to translated SrcPort
  • CP Traffic?
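
A minimal C sketch of the GPE merge step described above (NAT and configured flows into the Ext PF DB); the flow view and helper functions are assumptions about fsd internals.

#include <stdint.h>
#include <stdbool.h>

/* Simplified view of a GPE flow record inside fsd (illustrative fields only). */
struct pf_flow {
    uint32_t src_addr, dst_addr;
    uint16_t src_port, dst_port;
    uint16_t spp_orig_src_port;
    uint8_t  protocol;
    uint32_t slice_id;
};

/* Hypothetical helpers: NAT table built from natd updates, and the Ext PF DB writer. */
bool lookup_nat_translation(uint32_t src_addr, uint16_t internal_port,
                            uint8_t protocol, uint16_t *external_port);
void insert_ext_pf_db(const struct pf_flow *f);

void merge_gpe_flow(struct pf_flow *f)
{
    uint16_t translated;
    if (lookup_nat_translation(f->src_addr, f->src_port, f->protocol, &translated)) {
        /* NAT flow: original SrcPort becomes SPP Orig Src Port,
         * SrcPort becomes the translated port */
        f->spp_orig_src_port = f->src_port;
        f->src_port          = translated;
    } else {
        /* configured (non-NAT) flow: SPP Orig Src Port = SrcPort */
        f->spp_orig_src_port = f->src_port;
    }
    insert_ext_pf_db(f);
}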

12
Overview of PlanetFlow
  • PlanetFlow
  • Unprivileged slice
  • Flow Collector
  • Ulogd (fprobe-ulog)
  • Netlink socket
  • Uses VSys for privileged operations
  • Every 5 minutes dumps its cache to DB
  • DB
  • On PlanetLab Node
  • 5-minute records
  • Flows spanning 5-minute intervals aggregated
    daily.
  • Central Archive
  • At Princeton?
  • Updated periodically by using rsync to retrieve
    new DB entries from ALL PlanetLab nodes.

13
PlanetFlow Raw Data
  • 0005 0011 8e10638b 48a40477 00062638
  • 0000371d 0000 0000 80fc99cd 80fc99d3
  • 00000000 0000 0004 0000000b 0000062d
  • 8dae5570 8dae558b cc1f 01bb 00 1f 0600
  • 0000 0000 02000000 80fc99cd 80fc99d3
  • 00000000 0000 0004 0000001a 000008b7
  • 8dae54eb 8dae5533 cc1e 01bb 001e 0600
  • 0000 0000 02000000

NetFlow Header (at the beginning of the file; repeats every 30 flow records): Uptime, Flow Sequence, Engine Type (unused), Engine Id (unused), Pad16 (unused)
NetFlow Flow Record: SA, DA, IPv4 NextHop (unused), In SNMP (if_nametoindex), Out SNMP (if_nametoindex), Pkt Count, Byte Count, First Switched (flow creation time), Last Switched (time of last pkt), Src Port, Dst Port, Pad, Tcp flags, Proto, Src Tos, Src As (unused), Dst As (unused), XID (SliceID)
First record above: SA 128.252.153.205, DA 128.252.153.211, Pkt Count 11, Byte Count 1581, Src Port 52255, Dst Port 443
Second record above: SA 128.252.153.205, DA 128.252.153.211, Pkt Count 26, Byte Count 2231, Src Port 52254, Dst Port 443
14
SPP/PlanetFlow Raw Data
  • 0005 0011 8e10638b 48a40477 00062638
  • 0000371d xx yy 0000 80fc99cd 80fc99d3
  • 00000000 0000 0004 0000000b 0000062d
  • 8dae5570 8dae558b cc1f 01bb 00 1f 0600
  • zzzz 0000 02000000 80fc99cd 80fc99d3
  • 00000000 0000 0004 0000001a 000008b7
  • 8dae54eb 8dae5533 cc1e 01bb 001e 0600
  • zzzz 0000 02000000

NetFlow Header (at the beginning of the file; repeats every 30 flow records): Uptime (msecs), Flow Sequence, SPP Engine Type, SPP Engine Id, Pad16 (unused)
NetFlow Flow Record: SA, DA, IPv4 NextHop (unused), In SNMP (if_nametoindex), Out SNMP (if_nametoindex), Pkt Count, Byte Count, First Switched (msec, flow creation time), Last Switched (msec, time of last pkt), Src Port, Dst Port, Pad, Tcp flags, Proto, Src Tos, SPP Orig Src Port, Dst As (unused), XID (SliceID)
The example records carry the same values as on the previous slide; xx and yy in the header mark the SPP Engine Type and SPP Engine Id, and zzzz in each record marks the SPP Orig Src Port.
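
For reference, a C sketch of the NetFlow v5 export layout annotated above, with the SPP reinterpretations (Engine Type/Id carry the SPP values, the otherwise unused Src As field carries SPP Orig Src Port, and the trailing word carries the XID/SliceID). Field names follow common NetFlow v5 conventions; on the wire the records are packed and big-endian, so a real parser should read field by field.

#include <stdint.h>

/* NetFlow v5 export header as emitted by fprobe-ulog. */
struct nf5_header {
    uint16_t version;          /* 0x0005 */
    uint16_t count;            /* number of valid flow records that follow (up to 30) */
    uint32_t sys_uptime;       /* Uptime, msecs since a base start time */
    uint32_t unix_secs;        /* same instant as sys_uptime, in Unix time */
    uint32_t unix_nsecs;
    uint32_t flow_sequence;    /* running total of emitted flow records */
    uint8_t  engine_type;      /* SPP: SPP Engine Type */
    uint8_t  engine_id;        /* SPP: SPP Engine Id */
    uint16_t pad16;            /* unused */
};

/* NetFlow v5 flow record with the PlanetFlow/SPP reinterpretations. */
struct nf5_record {
    uint32_t src_addr, dst_addr;
    uint32_t next_hop;            /* unused */
    uint16_t in_snmp, out_snmp;   /* if_nametoindex() values */
    uint32_t pkt_count, byte_count;
    uint32_t first_switched;      /* flow creation time, msecs of uptime */
    uint32_t last_switched;       /* time of last pkt, msecs of uptime */
    uint16_t src_port, dst_port;
    uint8_t  pad, tcp_flags, protocol, src_tos;
    uint16_t spp_orig_src_port;   /* standard v5: Src As (unused) */
    uint16_t dst_as;              /* unused */
    uint32_t xid;                 /* SliceID */
};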
15
Issues and Notes
  • Time
  • Keeping time in sync among various machines
  • Flow Stats ME timestamps with IXP clock ticks.
  • Something has to convert this to a Unix time.
  • GPE(s) timestamps with Unix gettimeofday().
  • CP collects flow records and aggregates based on
    time.
  • Proposal
  • XScale, GPE(s) and CP will use ntp to keep their
    Unix times in sync
  • At the beginning of each reporting cycle, the
    Flow Stats ME should send a timestamp record just
    to allow the XScale and CP to keep the time in
    sync.
  • OR: Can the XScale read the IXP clock tick and
    report it to the CP along with the XScale's Unix
    time?
  • What are the times that are recorded in the
    Header and Flow Records?
  • Header
  • Uptime (msecs): msecs since a base start time
  • Time since Unix Epoch: time since January 1, 1970
  • Unix secs
  • Unix nSecs
  • Uptime and Unix (secs, nSecs) represent the SAME
    time
  • So that the Flow times can be calculated based on
    them.
  • Flow Record
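
A minimal C sketch of one way fsd could convert ME timestamps to Unix time, assuming it keeps the most recent (ME tick, gettimeofday()) reference pair reported by scd; the tick-rate constant is a placeholder that would have to match the IXP timestamp prescaler.

#include <stdint.h>
#include <sys/time.h>

/* Reference pair reported by scd at the start of each archive cycle (assumed). */
struct time_ref {
    uint64_t       me_ticks;    /* IXP timestamp at the reference instant */
    struct timeval unix_time;   /* gettimeofday() taken at the same instant */
};

#define ME_TICKS_PER_USEC 1400ULL   /* placeholder tick rate */

/* Convert an ME timestamp to Unix time relative to the reference pair. */
static struct timeval me_to_unix(uint64_t me_ticks, const struct time_ref *ref)
{
    int64_t delta_usec = (int64_t)(me_ticks - ref->me_ticks) / (int64_t)ME_TICKS_PER_USEC;
    struct timeval tv = ref->unix_time;
    tv.tv_sec  += delta_usec / 1000000;
    tv.tv_usec += delta_usec % 1000000;
    if (tv.tv_usec >= 1000000) { tv.tv_sec++; tv.tv_usec -= 1000000; }
    if (tv.tv_usec < 0)        { tv.tv_sec--; tv.tv_usec += 1000000; }
    return tv;
}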

16
Issues and Notes (continued)
  • NetFlow Header
  • Filled in AFTER 30 flow records are filled in OR
    we get a timeout (10 minutes)
  • COUNT field tells how many flow records are
    valid.
  • File or data packet is ALWAYS padded out to a
    size that would hold 30 flow records
  • Flow Sequence: running total of the number of flow
    records emitted.
  • Flow Header and Flow Records
  • Emitted in chunks of 30 flow records plus a Flow
    Header
  • Emitted either by writing to a file or sending
    over a socket to a mirror site.
  • Padded out to a size that would hold 30 flow
    records.
  • A flow is emitted when it has been inactive for
    at least a minute or when it has been active for
    at least 5 minutes.
  • Fprobe-ulog threads
  • emit_thread
  • scan_thread
  • cap_thread
  • unpending_thread
  • Flow lists
  • flows: a hashed array of flows, with buckets chained
    off the head of the list
  • These are flows that have been reported over
    netlink socket

17
Issues and Notes (continued)
  • VLANs and SliceIDs
  • NPE and LC use VLANs to differentiate Slices
  • Flow records must record slice IDs
  • SRM will provide VLAN to SliceID translation
  • GPE(s) do not differentiate Slices by VLAN.
  • All flows from a GPE will use the same VLAN
  • GPE keeps flow records locally using Slice ID
  • Flow Stats ME could ignore GPE flow packets if it
    was told what the default GPE VLAN was.
  • Otherwise, one of the fs daemons could drop the
    flow records for the GPE flows that the Flow
    Stats ME reports.
  • Slice ID
  • What exactly is it?
  • Is the XID that is recorded by PlanetFlow
    actually the slice id or is it the VServer id?

18
Issues and Notes (continued)
  • NAT Port Translations
  • GPE flow records are the ones that need the NAT
    Port translation data
  • GPE flow records will come across from the GPE(s)
    to the CP via rsync or similar
  • natd will report NAT port translations with
    timestamps to the fs daemon
  • fs daemon will have to maintain NAT port
    translations (with their timestamps) for possible
    later correlation with GPE flow records
  • GPE(s) will all use the same default VLAN
  • SRM will send this VLAN to scd so it can write it
    to SRAM for the fs ME to read in
  • FS ME will then filter out GPE flow records.
  • SRM ↔ fsd messaging
  • srm will push out VLAN → SliceID translation
    creation and deletion messages
  • srm will wait 10 minutes before re-using a VLAN
  • srm will send the delete VLAN message after
    waiting the 10 minutes.
  • fsd should not have to keep any history of
    VLAN/SliceID translations
  • It should get the creation before it receives any
    flow records for it
  • It should get the last flow record before it gets
    the deletion
  • fsd will also be able to query SRM for current
    translation
  • This will facilitate a restart of the fsd while
    the SRM maintains current state.
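
A minimal C sketch of the VLAN-to-SliceID table fsd could keep from these srm messages, with a query-on-miss fallback for the restart case; the srm query interface shown is an assumption.

#include <stdint.h>
#include <stdbool.h>

#define NUM_VLANS 4096                 /* 12-bit VLAN space */

static uint32_t vlan_to_slice[NUM_VLANS];
static bool     vlan_valid[NUM_VLANS];

/* srm -> fsd: Start_vlan_to_sliceId_translate(vlan, sliceId) */
void on_start_translation(uint16_t vlan, uint32_t slice_id)
{
    vlan_to_slice[vlan] = slice_id;
    vlan_valid[vlan]    = true;
}

/* srm -> fsd: Stop_vlan_to_sliceId_translate(vlan, sliceId).
 * srm has already waited 10 minutes, so no flow records for this VLAN should
 * still be in flight when the delete arrives. */
void on_stop_translation(uint16_t vlan, uint32_t slice_id)
{
    if (vlan_valid[vlan] && vlan_to_slice[vlan] == slice_id)
        vlan_valid[vlan] = false;
}

/* Hypothetical synchronous query to srm, used after an fsd restart. */
bool srm_query_translation(uint16_t vlan, uint32_t *slice_id);

uint32_t slice_for_vlan(uint16_t vlan)
{
    if (!vlan_valid[vlan]) {
        uint32_t sid;
        if (srm_query_translation(vlan, &sid))
            on_start_translation(vlan, sid);
    }
    return vlan_valid[vlan] ? vlan_to_slice[vlan] : 0;
}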

19
Issues and Notes (continued)
  • rsync of flow record files from GPE(s) to CP
  • A particular run of rsync may get a file that is
    still being written to by fprobe-ulog on the GPE
  • A subsequent rsync may get the file again
    with additional records in it.
  • Sample rsync command
  • rsync --timeout 15 -avzu -e "ssh -i
    /vservers/plc1/etc/planetlab/root_ssh_key.rsa"
    root@drn02:/vservers/pl_netflow/pf /root/pf
  • This will report the files that have been copied
    over

20
Issues and Notes (continued)
  • Sample fprobe-ulog command
  • /sbin/fprobe-ulog -M -e 3600 -d 3600 -E 60 -T 168
    -f pf2 -q 1000 -s 30 -D 250000
  • Started from /etc/rc.d/rc2345.d/S56fprobe-ulog
  • All linked to /etc/init.d/fprobe-ulog
  • GPE Flow record collection daemon fprobe-ulog
  • Scan thread
  • Collects flow records into a linked list
  • Emit thread
  • Periodically writes flow records out to a file
  • Every 600 seconds (ten minutes)!
  • Daemon can also send flow records to a remote
    collector!
  • So we could have the GPEs emit their flow records
    directly to the flow stats daemon on the CP.
  • Sample command
  • /sbin/fprobe-ulog -M -e 3600 -d 3600 -E 60 -T 168
    -f pf2 -q 1000 -s 30 -D 250000
    <remote>:<port>/<local>/<type>
  • There can be multiple remote host specifications
  • Where
  • remote: remote host to send to
  • port: destination port to send to
  • local: local hostname to use

21
SPP PlanetFlow
[Block diagram: CP (fsd, srm), GPEs (fprobe) connected to the CP via rsync, Ingress XScale, Egress XScale, scd, natd, FlowStats SRAM Ring, NAT Scratch Rings, and MEs (HF, LK, FS2).]
Central Archive Record: <time, sliceID, Proto, SrcIP, SrcPort, DstIP, DstPort, PktCnt, ByteCnt>
Ext PF DB Record: <Central Archive Record>
22
Plan/Design
  • Flow Stats daemon, fsd, runs on CP
  • Collects flow records from GPE(s) and NPE(s) and
    writes them into a series of PlanetFlow2 files
    with names
  • pf2.<n>, where <n> is 0-162
  • The current file is closed after N minutes, the
    file index is incremented, and a new file is
    opened and started.
  • This mimics what fprobe-ulog does now on the
    GPE(s)
  • These files are then collected periodically by
    PLC for use and archiving
  • I don't think there is any explicit indication
    that PLC has picked up the files, but the timing
    must be such that we know it is done before we
    roll over the file names and overwrite an old
    file.
  • Gets NAT data from natd
  • Keep records of this with timestamps so we can
    correlate with flow records coming from GPE(s)
  • Keep NAT records on a per Src IP Address basis.
  • One set of NAT records per external interface
  • Check with Mart on how this will work
  • Gets VLAN to sliceID data from srm
  • srm will send start translation, stop translation
    msgs with a 10 minute wait period when stopping a
    translation to make sure we are done with flow
    records for that slice
  • FS ME archives records every 5 minutes.
  • Slices are long lived (right?) so this should not
    be a problem
  • Fsd can also request a translation from srm
  • This is in case fsd has to be restarted while srm
    and other daemons continue running.
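
A minimal C sketch of the pf2 file rotation described above, assuming the 0-162 index range; the rotation period N is left as a placeholder.

#include <stdio.h>
#include <time.h>

#define PF2_MAX_INDEX 162              /* file names wrap: pf2.0 ... pf2.162 */
#define ROTATE_SECS   (10 * 60)        /* placeholder for the N-minute rotation period */

static FILE  *pf2_file;
static int    pf2_index = -1;
static time_t pf2_opened;

/* Close the current pf2 file, bump the index, and open the next one. */
static void pf2_rotate(void)
{
    char name[32];
    if (pf2_file)
        fclose(pf2_file);
    pf2_index = (pf2_index + 1) % (PF2_MAX_INDEX + 1);
    snprintf(name, sizeof(name), "pf2.%d", pf2_index);
    pf2_file   = fopen(name, "wb");    /* overwrites the old file of the same name */
    pf2_opened = time(NULL);
}

/* Called before each write; rotates when the current file is old enough. */
static FILE *pf2_current(void)
{
    if (!pf2_file || time(NULL) - pf2_opened >= ROTATE_SECS)
        pf2_rotate();
    return pf2_file;
}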

23
Plan/Design (continued)
  • Fsd gathers records from GPE(s) and NPE(s)
  • Gathers flow records from GPE(s) via socket(s)
    from fprobe-ulog on GPE(s)
  • Come across as one data packet with up to 30 flow
    records
  • Packet is padded out to full 30 flow records with
    Count in Header indicating how many of them are
    valid
  • Update the NetFlow header to indicate that this
    is an SPP, and which SPP node it is, using the
    Engine Type and Engine ID fields
  • Update with NAT data and write immediately out to
    current pf2 file keeping its NetFlow header.
  • Gathers flow records from NPE(s) via socket from
    scd on XScale
  • Come across one flow record at a time
  • No NetFlow Header
  • Create NetFlow Header
  • With appropriate Uptime and UnixTime (secs,
    nsecs)
  • With SPP Engine Type and SPP Engine ID
  • Modify Flow Record times to be msecs correlated
    with Uptime
  • Update NPE flow record with SliceID from srm.
  • Collect NPE records for a period of time or until
    we get 30 and then write them out to current pf2
    file with NetFlow header.
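
A minimal C sketch of batching NPE records under a synthesized NetFlow header, reusing the nf5_header/nf5_record sketches from the raw-data slide; the SPP engine constants and the fwrite-based output are assumptions (a real implementation would serialize the packed big-endian wire format field by field).

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <arpa/inet.h>                 /* htons/htonl */

#define NF5_BATCH           30
#define SPP_ENGINE_TYPE_NPE 2          /* illustrative Engine Type / Engine Id values */
#define SPP_ENGINE_ID       1

static struct nf5_record batch[NF5_BATCH];
static int      batch_count;
static uint32_t flow_sequence;

/* Flush the collected NPE records to the current pf2 file with a fresh header. */
static void flush_npe_batch(FILE *out, uint32_t uptime_ms,
                            uint32_t unix_secs, uint32_t unix_nsecs)
{
    struct nf5_header hdr;
    memset(&hdr, 0, sizeof(hdr));
    hdr.version       = htons(5);
    hdr.count         = htons(batch_count);
    hdr.sys_uptime    = htonl(uptime_ms);      /* same instant as unix_secs/unix_nsecs */
    hdr.unix_secs     = htonl(unix_secs);
    hdr.unix_nsecs    = htonl(unix_nsecs);
    hdr.flow_sequence = htonl(flow_sequence);
    hdr.engine_type   = SPP_ENGINE_TYPE_NPE;
    hdr.engine_id     = SPP_ENGINE_ID;

    fwrite(&hdr, sizeof(hdr), 1, out);
    /* pad out to a full 30 records, as fprobe-ulog does */
    memset(&batch[batch_count], 0, (NF5_BATCH - batch_count) * sizeof(batch[0]));
    fwrite(batch, sizeof(batch[0]), NF5_BATCH, out);

    flow_sequence += batch_count;
    batch_count = 0;
}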

24
Plan/Design (continued)
  • FS ME and scd
  • Use a command field in records coming across from
    FS ME to scd
  • Use one command to set current time
  • When FS ME is starting an archive cycle, first it
    sends a timestamp command
  • When scd gets this timestamp command, it
    associates it with a gettimeofday() time and
    sends the FS ME time and the gettimeofday() time
    to the fsd on the CP so it can associate ME
    times with Unix times.
  • Use another command to indicate flow records
  • Flow records can be sent directly on to fsd on CP
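
A minimal C sketch of the scd side of this handshake; the message layout and send_to_fsd() transport are assumptions.

#include <stdint.h>
#include <sys/time.h>

/* Hypothetical message scd sends to fsd for each TIME_RECORD from the FS ME. */
struct timestamp_msg {
    uint64_t me_timestamp;     /* 64-bit ME timestamp from the time record */
    uint32_t unix_secs;        /* gettimeofday() taken when the record arrived */
    uint32_t unix_usecs;
};

void send_to_fsd(const struct timestamp_msg *msg);   /* assumed transport to the CP */

/* Called when scd dequeues a TIME_RECORD from the FlowStats SRAM ring. */
void on_time_record(uint32_t ts_hi, uint32_t ts_lo)
{
    struct timeval now;
    gettimeofday(&now, NULL);

    struct timestamp_msg msg = {
        .me_timestamp = ((uint64_t)ts_hi << 32) | ts_lo,
        .unix_secs    = (uint32_t)now.tv_sec,
        .unix_usecs   = (uint32_t)now.tv_usec,
    };
    send_to_fsd(&msg);   /* lets fsd associate ME times with Unix times */
}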

25
Data to fsd
  • srm → fsd
  • Start_vlan_to_sliceId_translate(vlan, sliceId)
  • Stop_vlan_to_sliceId_translate(vlan, sliceId)
  • scd → fsd
  • Timestamp command
  • ME Timestamp
  • Unix time
  • flowRecord(Saddr, Daddr, Sport, Dport, tcpFlags,
    VLAN, protocol, pktCnt, ByteCnt,
    startTimeStampHigh, startTimestampLow,
    endTimestampHigh, endTimestampLow)

26
Data to fsd (continued)
  • natd → fsd
  • startNatTranslation(Saddr, Daddr, internalPort,
    externalPort, protocol, srcMAC, timeStampHigh)
  • stopNatTranslation(Saddr, Daddr, internalPort,
    externalPort, protocol, srcMAC, timeStampHigh)
  • gpe → fsd
  • NetFlow Header
  • 30 NetFlow Flow Records
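
A minimal C sketch of how fsd could retain the natd messages above for later correlation with GPE flow records; the list structure, the fields used for matching in the stop handler, and the retention policy are assumptions.

#include <stdint.h>
#include <stdlib.h>

/* One NAT port translation, kept with its start/stop timestamps so it can be
 * matched against GPE flow records that arrive later. */
struct nat_entry {
    uint32_t src_addr, dst_addr;
    uint16_t internal_port, external_port;
    uint8_t  protocol;
    uint8_t  src_mac[6];
    uint32_t start_ts_hi;
    uint32_t stop_ts_hi;              /* 0 means the translation is still active */
    struct nat_entry *next;
};

static struct nat_entry *nat_list;    /* per-Src-IP-address hashing omitted for brevity */

/* natd -> fsd: startNatTranslation(...) */
void on_start_nat(uint32_t sa, uint32_t da, uint16_t internal_port,
                  uint16_t external_port, uint8_t protocol,
                  const uint8_t src_mac[6], uint32_t ts_hi)
{
    struct nat_entry *e = calloc(1, sizeof(*e));
    e->src_addr      = sa;
    e->dst_addr      = da;
    e->internal_port = internal_port;
    e->external_port = external_port;
    e->protocol      = protocol;
    e->start_ts_hi   = ts_hi;
    for (int i = 0; i < 6; i++)
        e->src_mac[i] = src_mac[i];
    e->next  = nat_list;
    nat_list = e;
}

/* natd -> fsd: stopNatTranslation(...).  The entry is only marked stopped; it is
 * kept until the GPE flow records covering that interval have been processed. */
void on_stop_nat(uint32_t sa, uint16_t internal_port, uint8_t protocol, uint32_t ts_hi)
{
    for (struct nat_entry *e = nat_list; e; e = e->next)
        if (e->src_addr == sa && e->internal_port == internal_port &&
            e->protocol == protocol && e->stop_ts_hi == 0) {
            e->stop_ts_hi = ts_hi;
            return;
        }
}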

27
  • End