LINUX NETWORK IMPLEMENTATION Jianyong Zhang - PowerPoint PPT Presentation

Provided by: Amul
Learn more at: https://www.cse.psu.edu
1
LINUX NETWORK IMPLEMENTATION Jianyong Zhang
2
Introduction
  • The layer structure of the network:
  • BSD socket layer: general data structures for
    different protocols.
  • INET socket layer: end points for the IP-based
    protocols TCP and UDP.
  • ARP layer
  • Link layer: Ethernet, SLIP, PLIP
  • Hardware: NIC, serial port, parallel port

3
Socket system call
  • C interface system call routines: socket(),
    bind(), listen(), connect(), accept(), send(),
    sendto(), recv(), recvfrom(), getsockopt(),
    setsockopt().
  • All are based on the system call socketcall().
  • socket() returns a file descriptor; read(),
    write(), select(), ioctl() use struct file:
    file → f_op → sock_read
  • Socket inode: struct socket *sock_alloc(void)
  • inode->i_mode = S_IFSOCK|S_IRWXUGO;
  • inode->i_sock = 1;
  • inode->i_uid = current->fsuid;
  • inode->i_gid = current->fsgid;
  • sock->inode = inode;

4
Generic system call
  • socketcall() function
  • asmlinkage int sys_socketcall(int call, unsigned
    long *args)
  • unsigned long a0, a1;
  • /* copy_from_user should be SMP safe. */
  • if (copy_from_user(a, args, nargs[call]))
  • return -EFAULT;
  • a0 = a[0];
  • a1 = a[1];
  • switch (call)
  • case SYS_SOCKET:
  • err = sys_socket(a0, a1, a[2]);
  • break;
  • case SYS_BIND:
  • err = sys_bind(a0, (struct sockaddr *)a1, a[2]);
  • break; ...
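
The multiplexing idea behind socketcall() can be tried in user space. This is a rough sketch, not the kernel code: the DEMO_ call numbers, the demo_nargs table, and the two stand-in handlers are assumptions made for illustration, and memcpy() takes the place of copy_from_user().

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical call numbers for this sketch only. */
enum { DEMO_SOCKET = 1, DEMO_BIND = 2, DEMO_MAX = 2 };

/* Number of argument bytes to copy for each call number. */
#define DEMO_AL(x) ((x) * sizeof(unsigned long))
static const size_t demo_nargs[DEMO_MAX + 1] = {
    DEMO_AL(0), DEMO_AL(3), DEMO_AL(3)
};

/* Stand-ins for sys_socket() and sys_bind(); each just echoes one
 * argument so the dispatch can be observed. */
static long demo_socket(unsigned long family, unsigned long type,
                        unsigned long protocol)
{
    (void)type;
    (void)protocol;
    return (long)family;
}

static long demo_bind(unsigned long fd, unsigned long addr,
                      unsigned long addrlen)
{
    (void)fd;
    (void)addr;
    return (long)addrlen;
}

/* One entry point: validate the call number, copy the argument block,
 * then dispatch on the call number. */
long demo_socketcall(int call, const unsigned long *args)
{
    unsigned long a[3] = { 0, 0, 0 };

    if (call < 1 || call > DEMO_MAX)
        return -EINVAL;

    /* The kernel uses copy_from_user() here; memcpy() is the stand-in. */
    memcpy(a, args, demo_nargs[call]);

    switch (call) {
    case DEMO_SOCKET:
        return demo_socket(a[0], a[1], a[2]);
    case DEMO_BIND:
        return demo_bind(a[0], a[1], a[2]);
    }
    return -EINVAL;
}
```

The per-call size table is what lets a single system call carry a different number of arguments for each socket routine.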

5
Important structures
  • 1. struct socket
  • socket_state state; /* SS_FREE,
    SS_UNCONNECTED, SS_CONNECTING, SS_CONNECTED,
    SS_DISCONNECTING */
  • unsigned long flags;
  • struct proto_ops *ops;
  • struct inode *inode;
  • struct fasync_struct *fasync_list; /*
    Asynchronous wake up list */
  • struct file *file;
    /* File back pointer */
  • struct sock *sk;
  • struct wait_queue *wait;
  • short type; /* SOCK_STREAM,
    SOCK_DGRAM, SOCK_RAW */
  • unsigned char passcred;
  • unsigned char tli;

6
Important structures
  • 2. struct proto_ops
  • int family;
  • int (*dup) (struct socket *newsock,
    struct socket *oldsock);
  • int (*release) (struct socket *sock,
    struct socket *peer);
  • int (*bind) (); int (*connect)
    ();
  • int (*socketpair) (struct socket *sock1,
    struct socket *sock2);
  • int (*accept) ();
  • int (*getname) ();
  • unsigned int (*poll) (); int (*ioctl)
    ();
  • int (*listen) (struct socket *sock, int
    len);
  • int (*shutdown) (struct socket *sock, int
    flags);
  • int (*setsockopt) (struct socket *sock, int
    level, int optname, ...);
  • int (*getsockopt) ();
  • int (*fcntl) ();
  • int (*sendmsg) ();
  • int (*recvmsg) ();
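
A table of function pointers like struct proto_ops lets the generic socket layer call protocol code without knowing which protocol it is. Here is a cut-down user-space sketch of that pattern. All demo_ names are hypothetical, only two operations are modelled, and the family value 2 is just an illustrative constant.

```c
#include <stddef.h>

struct demo_socket;

/* A reduced operations table: one function pointer per operation. */
struct demo_proto_ops {
    int family;
    int (*listen)(struct demo_socket *sock, int len);
    int (*shutdown)(struct demo_socket *sock, int flags);
};

struct demo_socket {
    const struct demo_proto_ops *ops; /* protocol-specific operations */
    int backlog;
    int shut_flags;
};

/* "Protocol" implementations for this sketch. */
static int demo_inet_listen(struct demo_socket *sock, int len)
{
    sock->backlog = len;
    return 0;
}

static int demo_inet_shutdown(struct demo_socket *sock, int flags)
{
    sock->shut_flags |= flags;
    return 0;
}

const struct demo_proto_ops demo_inet_ops = {
    .family = 2, /* illustrative value only */
    .listen = demo_inet_listen,
    .shutdown = demo_inet_shutdown,
};

/* Generic layer: it only follows the pointer stored in the socket. */
int demo_sys_listen(struct demo_socket *sock, int len)
{
    if (sock == NULL || sock->ops == NULL || sock->ops->listen == NULL)
        return -1;
    return sock->ops->listen(sock, len);
}

int demo_sys_shutdown(struct demo_socket *sock, int flags)
{
    if (sock == NULL || sock->ops == NULL || sock->ops->shutdown == NULL)
        return -1;
    return sock->ops->shutdown(sock, flags);
}
```

Supporting another protocol family then means supplying another table, with no change to the generic functions.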

7
Important structures
  • 3. struct sk_buff
  • manages individual communication packets;
  • kept in a doubly linked list.
  • 4. struct sock
  • INET socket
  • 5. struct device
  • controls an abstract network device (network
    interface).
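
Packet buffers are queued on doubly linked lists. The sketch below shows one common way to build such a queue, a circular list with a sentinel head node. The demo_ names and the id field are assumptions for illustration; this is not the kernel's sk_buff layout.

```c
#include <stddef.h>

/* A buffer node with links in both directions. */
struct demo_buf {
    struct demo_buf *next;
    struct demo_buf *prev;
    int id;
};

/* The queue head is a sentinel node; an empty queue points at itself. */
void demo_queue_init(struct demo_buf *head)
{
    head->next = head;
    head->prev = head;
}

/* Append a buffer at the tail of the queue. */
void demo_queue_tail(struct demo_buf *head, struct demo_buf *b)
{
    b->next = head;
    b->prev = head->prev;
    head->prev->next = b;
    head->prev = b;
}

/* Remove and return the buffer at the front, or NULL if empty. */
struct demo_buf *demo_dequeue(struct demo_buf *head)
{
    struct demo_buf *b = head->next;

    if (b == head)
        return NULL;
    head->next = b->next;
    b->next->prev = head;
    b->next = NULL;
    b->prev = NULL;
    return b;
}
```

Appending at the tail and removing from the front gives first-in, first-out order, which is what a packet queue needs.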

8
Getting the data from A to B
  • 1. A and B call socket(), then are connected by
    calling connect() and accept().
  • 2. A calls write(socket, data, len), which runs
    verify_area().
  • file = fget(socket); inode =
    file->f_dentry->d_inode;
  • if (!file->f_op || !(write = file->f_op->write))
    goto out;
  • down(&inode->i_sem);
  • ret = write(file, data, len, &file->f_pos);
  • up(&inode->i_sem);
  • 3. sock_write(): struct socket *sock;
  • sock = socki_lookup(file->f_dentry->d_inode);
  • msg.msg_iov = &iov; iov.iov_base = (void *)ubuf;
  • return sock_sendmsg(sock, &msg, size);
  • 4. For an INET socket, it will call inet_sendmsg().

9
Getting the data from A to B
  • 5. inet_sendmsg()
  • struct sock *sk = sock->sk;
  • return sk->prot->sendmsg(sk, msg, size);
  • /* call tcp_v4_sendmsg() */
  • 6. Call tcp_do_sendmsg(sk, msg)
  • struct sk_buff *skb;
  • tmp = MAX_HEADER + sk->prot->max_header;
  • skb = sock_wmalloc(sk, tmp, 0, GFP_KERNEL);
  • skb_reserve(skb, MAX_HEADER + sk->prot->max_header);
  • skb->csum = csum_and_copy_from_user(from,
    skb_put(skb, copy),
    copy, 0, &err);
  • /* TCP data bytes are SKB_PUT() on top, later
    TCP+IP+DEV headers are SKB_PUSH()'d beneath. */
  • tcp_send_skb(sk, skb, queue_it);
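
The reserve, put, and push pattern in this step can be modelled with two pointers into a flat buffer. This is a minimal sketch under simplifying assumptions: a fixed 128-byte buffer, hypothetical demo_ names, and 20-byte placeholder headers filled with 'T' and 'I' instead of real TCP and IP headers.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_SKB_SIZE 128

struct demo_skb {
    unsigned char buf[DEMO_SKB_SIZE];
    unsigned char *data; /* first valid byte */
    unsigned char *tail; /* one past the last valid byte */
};

void demo_skb_init(struct demo_skb *skb)
{
    skb->data = skb->buf;
    skb->tail = skb->buf;
}

/* Leave headroom in an empty buffer by moving both pointers forward. */
int demo_skb_reserve(struct demo_skb *skb, size_t len)
{
    if (len > (size_t)(skb->buf + DEMO_SKB_SIZE - skb->tail))
        return -1;
    skb->data += len;
    skb->tail += len;
    return 0;
}

/* Extend the valid area at the tail; returns the start of the new area. */
unsigned char *demo_skb_put(struct demo_skb *skb, size_t len)
{
    unsigned char *start = skb->tail;

    if (len > (size_t)(skb->buf + DEMO_SKB_SIZE - skb->tail))
        return NULL;
    skb->tail += len;
    return start;
}

/* Extend the valid area at the front, into the reserved headroom. */
unsigned char *demo_skb_push(struct demo_skb *skb, size_t len)
{
    if (len > (size_t)(skb->data - skb->buf))
        return NULL;
    skb->data -= len;
    return skb->data;
}

size_t demo_skb_len(const struct demo_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}

/* Payload first, then two placeholder headers prepended beneath it.
 * Returns the total length, or 0 on failure. */
size_t demo_build_packet(struct demo_skb *skb, const char *payload, size_t n)
{
    unsigned char *p;

    demo_skb_init(skb);
    if (demo_skb_reserve(skb, 40) < 0)
        return 0;
    p = demo_skb_put(skb, n);
    if (p == NULL)
        return 0;
    memcpy(p, payload, n);
    p = demo_skb_push(skb, 20); /* placeholder transport header */
    if (p == NULL)
        return 0;
    memset(p, 'T', 20);
    p = demo_skb_push(skb, 20); /* placeholder network header */
    if (p == NULL)
        return 0;
    memset(p, 'I', 20);
    return demo_skb_len(skb);
}
```

Reserving the headroom up front is what allows each lower layer to prepend its header without moving the payload.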

10
Getting the data from A to B
  • 5. tcp_send_skb() calls tcp_transmit_skb(sk,
    skb_clone(skb, GFP_KERNEL)).
  • 6. tcp_transmit_skb(struct sock *sk, struct
    sk_buff *skb): struct tcp_opt *tp =
    &(sk->tp_pinfo.af_tcp);
  • /* Build TCP header and checksum it. */
  • tp->af_specific->queue_xmit(skb);
  • 7. ip_queue_xmit() /* Queues a packet to be sent,
    and starts the transmitter if necessary. This
    routine also needs to put in the total length and
    compute the checksum. */
  • /* Make sure we can route this packet. */
  • skb->dst = dst_clone(sk->dst_cache);
  • /* OK, we know where to send it, allocate and
    build IP header. */
  • /* Do we need to fragment. Again this is
    inefficient. We need to somehow lock the
    original buffer and use bits of it. */
  • /* Add an IP checksum. */

11
Getting the data from A to B
  • skb->dst->output(skb);
  • 7. Bottom-half synchronization with barriers:
  • start_bh_atomic(void), end_bh_atomic(void)
  • 8. dev_queue_xmit()
  • start_bh_atomic(); q = dev->qdisc;
  • if (q->enqueue)
  • q->enqueue(skb, q);
  • qdisc_wakeup(dev);
  • end_bh_atomic(); return;
  • if (dev->flags & IFF_UP)
  • dev->hard_start_xmit(skb, dev);
  • end_bh_atomic();
  • return;
  • 9. For the WD8013 card, ei_start_xmit() is called; it
    passes the data to the network adaptor, which in turn
    sends the packet onto the Ethernet.

12
Getting the data from A to B
  • 10. The data, embedded in an Ethernet packet,
    are received by the NIC in B. (The NIC is assumed
    to be a WD8013.)
  • 11. The NIC triggers an interrupt. This is handled by
    ei_interrupt(), which calls ei_receive(). (The ei_
    functions are chip-specific code for many
    8390-based Ethernet adaptors.)
  • 12. ei_receive(): struct sk_buff *skb;
  • skb = dev_alloc_skb(pkt_len + 2);
  • netif_rx(skb);
  • 13. netif_rx() receives a packet from a device
    driver and queues it for the upper (protocol)
    levels. It calls skb_queue_tail(&backlog, skb) and
    mark_bh(NET_BH).
  • 14. There is only one backlog list in the
    entire system.
  • 15. do_bottom_half() calls net_bh().

13
Getting the data from A to B
  • 10. net_bh()
  • skb = skb_dequeue(&backlog);
  • /* Bump the pointer to the next structure.
    skb->data and skb->nh.raw point to the MAC and
    encapsulated data. */
  • skb->h.raw = skb->nh.raw = skb->data;
  • /* Fetch the packet protocol ID. */
  • type = skb->protocol;
  • /* We got a packet ID. Now loop over the "known
    protocols" list. There are two lists. The
    ptype_all list of taps (normally empty) and the
    main protocol list which is hashed perfectly for
    normal protocols. */
  • if (ptype->type == type && (ptype->dev == skb->dev))
  • /* We already have a match queued. Deliver to
    it. */
  • skb2 = skb_clone(skb, GFP_ATOMIC);
  • pt_prev->func(skb2, skb->dev, pt_prev);
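
The protocol lookup in net_bh() amounts to walking a list of registered handlers and calling those whose type and device match the packet. A simplified user-space sketch follows. The demo_ names, the integer device ids, and the counters are assumptions for illustration; 0x0800 and 0x0806 are the standard EtherType values for IPv4 and ARP.

```c
#include <stddef.h>

#define DEMO_ETH_P_IP  0x0800 /* EtherType for IPv4 */
#define DEMO_ETH_P_ARP 0x0806 /* EtherType for ARP */

struct demo_packet {
    unsigned short protocol; /* protocol ID taken from the link header */
    int dev;                 /* id of the receiving device */
};

/* One registered protocol handler. */
struct demo_packet_type {
    unsigned short type;
    int dev;
    int (*func)(struct demo_packet *pkt);
    struct demo_packet_type *next;
};

/* Counters so the effect of the dispatch can be observed. */
int demo_ip_seen = 0;
int demo_arp_seen = 0;

int demo_ip_rcv(struct demo_packet *pkt)
{
    (void)pkt;
    demo_ip_seen++;
    return 0;
}

int demo_arp_rcv(struct demo_packet *pkt)
{
    (void)pkt;
    demo_arp_seen++;
    return 0;
}

/* Walk the handler list and deliver to every entry that matches both
 * the protocol type and the device. Returns the number of deliveries. */
int demo_deliver(struct demo_packet_type *list, struct demo_packet *pkt)
{
    struct demo_packet_type *p;
    int delivered = 0;

    for (p = list; p != NULL; p = p->next) {
        if (p->type == pkt->protocol && p->dev == pkt->dev) {
            p->func(pkt);
            delivered++;
        }
    }
    return delivered;
}
```

Unlike the kernel, this sketch does not clone the buffer for each handler, so every handler sees the same packet object.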

14
Getting the data from A to B
  • 10. Call ip_rcv()
  • /* Check the header for correctness and deal with
    all the IP options. ip_forward() and ip_defrag().
    */
  • return skb->dst->input(skb);
  • 11. ip_local_deliver()
  • /* Reassemble IP fragments. */ skb =
    ip_defrag(skb);
  • /* Deliver to raw sockets. This is fun as to avoid
    copies we want to make no surplus copies. */
  • /* Pass on the datagram to each protocol that
    wants it, based on the datagram protocol.
    */ ...
  • ipprot->handler(skb2, ntohs(iph->tot_len) -
    (iph->ihl * 4));
  • 12. The handler is tcp_v4_rcv(), udp_rcv(), or
    icmp_rcv().
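
The length passed to the protocol handler is the IP total length minus the header length, where the header length field counts 32-bit words. A small worked sketch of that arithmetic on raw IPv4 header bytes follows. The function name demo_ip_payload_len is made up for this illustration.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Given the first bytes of an IPv4 header, return the payload length
 * (total length minus header length), or -1 if the fields are
 * inconsistent. */
int demo_ip_payload_len(const uint8_t *hdr)
{
    unsigned ihl = hdr[0] & 0x0f; /* header length in 32-bit words */
    uint16_t tot_len_net;
    unsigned tot_len;

    /* Bytes 2 and 3 hold the total length in network byte order. */
    memcpy(&tot_len_net, hdr + 2, sizeof(tot_len_net));
    tot_len = ntohs(tot_len_net);

    if (ihl < 5 || tot_len < ihl * 4)
        return -1;
    return (int)(tot_len - ihl * 4);
}
```

For a header starting with 0x45 and a total length of 60, the header is 5 * 4 = 20 bytes, so the handler is given 40 bytes.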

15
Getting the data from A to B
  • 13. tcp_v4_rcv()
  • /* check the header for correctness */
  • if (!atomic_read(&sk->sock_readers))
  • return tcp_v4_do_rcv(sk, skb);
  • __skb_queue_tail(&sk->back_log, skb);
  • do_time_wait: case TCP_TW_ACK:
    tcp_v4_send_ack()
  • 14. tcp_v4_do_rcv() calls
  • __skb_queue_tail(&nsk->back_log, skb);
  • if (sk->state == TCP_ESTABLISHED) /* Fast path
    */
  • if (tcp_rcv_established(sk, skb,
    skb->h.th, skb->len))
  • goto reset;
  • return 0;
  • tcp_rcv_state_process(sk, skb, skb->h.th,
    skb->len);

16
Getting the data from A to B
  • 15. TCP receive function for the ESTABLISHED
    state.
  • It is split into a fast path and a slow path.
    The fast path is disabled when:
  • - A zero window was announced from us; zero
    window probing is only handled properly in the
    slow path.
  • - Out of order segments arrived.
  • - Urgent data is expected.
  • - There is no buffer space left.
  • - Unexpected TCP flags/window
    values/header lengths are received (detected by
    checking the TCP header against pred_flags).
  • - Data is sent in both directions. Fast
    path only supports pure senders or pure
    receivers (this means either the sequence number
    or the ack value must stay constant).
  • When these conditions are not satisfied it
    drops into a standard receive procedure
    patterned after RFC 793 to handle all cases.
  • The first three cases are guaranteed by
    proper pred_flags setting; the rest is checked
    inline. Fast processing is turned on in
    tcp_data_queue when everything is OK.
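
The pred_flags check can be pictured as comparing one 32-bit word of the incoming TCP header (data offset, flags, window) against a predicted value. The sketch below is heavily simplified and uses host byte order throughout. The demo_ names are hypothetical, and the real kernel test masks some bits and consults more state than is shown here.

```c
#include <stdint.h>

/* Flag bits as they appear in the fourth 32-bit word of the TCP header
 * (data offset, flags, window), in host byte order. */
#define DEMO_FLAG_ACK 0x00100000u
#define DEMO_FLAG_FIN 0x00010000u

struct demo_tcp_state {
    uint32_t pred_flags; /* predicted header word; 0 disables the fast path */
    uint32_t rcv_nxt;    /* next sequence number expected */
};

/* Compose the header word from data offset (in 32-bit words), flag bits,
 * and window size. */
uint32_t demo_flag_word(unsigned doff_words, uint32_t flags, uint16_t window)
{
    return ((uint32_t)doff_words << 28) | flags | window;
}

/* The fast path is taken only when the header word matches the
 * prediction exactly and the segment arrives in order. */
int demo_fast_path_ok(const struct demo_tcp_state *tp,
                      uint32_t flag_word, uint32_t seq)
{
    if (tp->pred_flags == 0)
        return 0;
    return flag_word == tp->pred_flags && seq == tp->rcv_nxt;
}
```

A single word comparison rejects unexpected flags, window values, and header lengths at once, which is why those cases need no separate tests on the fast path.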

17
Getting the data from A to B
  • 16. tcp_data() enters the sk_buff buffer into the
    list.
  • 17. data_ready() wakes up the waiting processes.
  • 18. The preceding actions are carried out in the
    kernel, outside of any process.
  • 19. B executes read(socket, data, len).
  • 20. The call goes through sys_read(), sock_read(),
    inet_recvmsg(), and tcp_recvmsg().
  • 21. This completes the data's travel from
    process A to process B.
  • 22. The data is copied only four times:
  • 1) From the user space of A to kernel memory.
  • 2) From kernel memory to the network card.
  • 3) From the network card to the other computer's
    kernel memory.
  • 4) From B's kernel memory to B's user space.
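
From user space, the whole journey looks like one write() on A's descriptor and one read() on B's. The sketch below shows only that user-space view. To stay self-contained it uses a connected AF_UNIX pair from socketpair() instead of a TCP connection, so the kernel path beneath the socket layer differs from the one traced in the slides and no network card is involved. The function name demo_a_to_b is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends msg from endpoint A to endpoint B over a connected socket pair.
 * Returns the number of bytes B received, or -1 on error. */
ssize_t demo_a_to_b(const char *msg, char *out, size_t outlen)
{
    int sv[2]; /* sv[0] plays A, sv[1] plays B */
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* A: data is copied from A's user space into kernel memory. */
    if (write(sv[0], msg, strlen(msg)) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* B: data is copied from kernel memory into B's user space. */
    n = read(sv[1], out, outlen);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With a TCP socket between two machines, the same two calls would bracket all four copies listed above.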