1
Scalable I/O Virtualization via Self-Virtualizing Devices
  • Himanshu Raj, Ivan Ganev, Karsten Schwan

2
Device Virtualization in Xen
  • Example: NIC.
  • An Ethernet bridge in Dom0 forwards packets between the physical NIC and the guests' virtual interfaces.
  • Each virtual interface listens to all frames and drops packets whose MAC address differs from its own.
  • The physical NIC runs in promiscuous mode, so it examines every packet on the wire.

3
Self Virtualizing Devices
  • High-end devices.
  • Virtualization functionality is offloaded onto the devices themselves.
  • E.g., the Intel IXP 2400 network processor as a self-virtualizing NIC.

4
IXP as a NIC
  • Fast-path layer-2 processing runs on the IXP.
  • Management functions are performed by Dom0 and the XScale core:
  • - creation and destruction of VIFs.
  • The Ethernet bridge runs on the IXP.
  • Benefit: Dom0 does not need to be scheduled for each packet transmit or receive.

5
Virtual Interface
  • For each VIF there are two queues: send and receive.
  • Two types of signals are defined:
  • - a signal from the NP to the host device driver,
  • - a signal from the host device driver to the NP.
  • (NP = network processor; a C sketch of such a VIF descriptor follows below.)
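
A minimal C sketch of the per-VIF state described above: two queues and one signal flag per direction. All names and the ring layout are illustrative assumptions, not taken from the actual driver.

    /* Hypothetical VIF descriptor: two ring-buffer queues plus one
     * doorbell-style signal flag in each direction. */
    #include <stdint.h>

    #define VIF_RING_SLOTS 256

    struct vif_ring {
        uint32_t producer;                 /* next slot to fill  */
        uint32_t consumer;                 /* next slot to drain */
        uint64_t desc[VIF_RING_SLOTS];     /* packet buffer descriptors */
    };

    struct vif {
        uint16_t id;                       /* unique VIF identifier */
        uint8_t  mac[6];                   /* MAC address of this VIF */
        struct vif_ring *send_q;           /* lives in NP SDRAM */
        struct vif_ring *recv_q;           /* lives in host memory */
        volatile uint32_t np_to_host_sig;  /* NP signals the host driver */
        volatile uint32_t host_to_np_sig;  /* host driver signals the NP */
    };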

6
VIF (Contd)
  • Each virtual interface has a unique ID.
  • The ID is derived by hashing the MAC address obtained from the packet.
  • This makes packet demultiplexing on the ingress path scalable (a sketch follows below).
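
A minimal sketch of this hash-based ingress demultiplexing, assuming a small power-of-two table of VIF entries and a simple djb2-style hash; both the hash function and the table layout are assumptions for illustration.

    /* Hash the destination MAC of an incoming frame to find its VIF.
     * Table size, hash function and collision handling are illustrative. */
    #include <stdint.h>
    #include <string.h>

    #define VIF_TABLE_SIZE 64              /* power of two */

    struct vif_entry {
        uint8_t  mac[6];
        uint16_t vif_id;
        int      in_use;
    };

    static struct vif_entry vif_table[VIF_TABLE_SIZE];

    static uint32_t mac_hash(const uint8_t mac[6])
    {
        uint32_t h = 5381;                 /* djb2-style, illustrative */
        for (int i = 0; i < 6; i++)
            h = (h * 33) ^ mac[i];
        return h & (VIF_TABLE_SIZE - 1);
    }

    /* Returns the VIF ID owning the frame's destination MAC, or -1 to drop. */
    int demux_ingress(const uint8_t *frame)
    {
        const uint8_t *dst = frame;        /* destination MAC: first 6 bytes */
        struct vif_entry *e = &vif_table[mac_hash(dst)];
        if (e->in_use && memcmp(e->mac, dst, 6) == 0)
            return e->vif_id;
        return -1;
    }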

7
Processing on IXP
  • The IXP has 8 microengines, each with 8 hardware contexts.
  • The task is divided among 4 microengines:
  • 1. Ingress path processing.
  • 2. Egress path processing.
  • Physical network I/O:
  • 3. Receiving packets from the network.
  • 4. Sending packets to the network.
  • One microengine context is assigned to each VIF on both the egress and ingress paths.

8
PCI
  • The IXP board is connected to the host via the PCI bus.
  • PCI reads vs. PCI writes: reads across the bus are far more expensive than (posted) writes.

9
PCI
  • PCI reads are always avoided.
  • The send queue is placed in NP SDRAM.
  • The receive queue is placed in host memory.
  • Through the PCI bridge, both the host and the NP can write to each other's memory, so each side only writes remotely and reads locally (see the sketch below).
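
A minimal sketch of this write-only-across-PCI discipline on the host transmit path, assuming the send ring is mapped from NP SDRAM into the host address space and that the NP mirrors its consumer index back into host RAM; the names and the ring layout are assumptions, not the paper's exact scheme.

    /* Host-side transmit: every access that crosses PCI is a write. */
    #include <stdint.h>
    #include <string.h>

    #define RING_SLOTS 256
    #define SLOT_BYTES 2048

    struct ring {
        volatile uint32_t prod;        /* written by host over PCI, read locally by NP */
        uint8_t slot[RING_SLOTS][SLOT_BYTES];
    };

    struct host_tx_state {
        struct ring *send_q;           /* ring in NP SDRAM, mapped over PCI  */
        uint32_t prod;                 /* host-local copy of producer index  */
        volatile uint32_t cons_mirror; /* in host RAM; NP writes it over PCI */
    };

    int host_tx(struct host_tx_state *tx, const void *pkt, uint32_t len)
    {
        if (len > SLOT_BYTES || tx->prod - tx->cons_mirror == RING_SLOTS)
            return -1;                 /* oversized packet or ring full */
        /* Both statements below are PCI writes; the host never reads NP memory. */
        memcpy(tx->send_q->slot[tx->prod % RING_SLOTS], pkt, len);
        tx->send_q->prod = ++tx->prod;
        return 0;
    }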

10
Memory Mapping (avoids multiple buffer copies)
  • Send queue:
  • The send queue resides in the NP's SDRAM.
  • Initially, only the privileged domain has access to it.
  • Through Xen's grant table mechanism, access is granted to the corresponding DomU.
  • The DomU then asks Xen to map this memory into its page tables.

11
Memory Mapping
  • Receive queue (resides in host memory):
  • The host memory accessible to the NP is owned by the controller domain.
  • The controller domain grants the corresponding guest access to that VIF's region.
  • The guest asks Xen to map this region into its page tables.
  • Now the guest can read the contents of its receive queue directly.
  • All of this happens once, at VIF creation time (see the sketch below).
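
The creation-time flow for both queues might look roughly like the sketch below. The helpers xen_grant_foreign_access() and xen_map_granted_pages() are hypothetical wrappers standing in for Xen's grant-table operations, not the real API names.

    /* Illustrative-only sketch of the one-time mapping done at VIF creation. */
    #include <stdint.h>

    typedef uint16_t domid_t;
    typedef int32_t  grant_ref_t;

    /* Hypothetical wrappers around Xen's grant-table operations. */
    extern grant_ref_t xen_grant_foreign_access(domid_t guest,
                                                unsigned long first_frame,
                                                unsigned int nframes);
    extern void *xen_map_granted_pages(domid_t backend, grant_ref_t ref,
                                       unsigned int nframes);

    struct vif_backend {
        unsigned long send_q_frame, recv_q_frame;   /* first frame of each queue */
        unsigned int  send_q_nframes, recv_q_nframes;
        grant_ref_t   send_gref, recv_gref;
    };

    struct vif_frontend {
        grant_ref_t  send_gref, recv_gref;          /* passed from the backend */
        unsigned int send_q_nframes, recv_q_nframes;
        void *send_q, *recv_q;                      /* mapped queue memory */
    };

    /* Controller/privileged-domain side: grant the guest access to the pages
     * backing the send queue (NP SDRAM over PCI) and its receive region. */
    int vif_grant_queues(struct vif_backend *be, domid_t guest)
    {
        be->send_gref = xen_grant_foreign_access(guest, be->send_q_frame,
                                                 be->send_q_nframes);
        be->recv_gref = xen_grant_foreign_access(guest, be->recv_q_frame,
                                                 be->recv_q_nframes);
        return (be->send_gref < 0 || be->recv_gref < 0) ? -1 : 0;
    }

    /* Guest (DomU) side: ask Xen to map the granted pages into its own
     * page tables; afterwards it accesses its queues directly. */
    int vif_map_queues(struct vif_frontend *fe, domid_t backend)
    {
        fe->send_q = xen_map_granted_pages(backend, fe->send_gref, fe->send_q_nframes);
        fe->recv_q = xen_map_granted_pages(backend, fe->recv_gref, fe->recv_q_nframes);
        return (fe->send_q && fe->recv_q) ? 0 : -1;
    }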

12
Security Isolation
  • Memory-space isolation (via the grant table mechanism):
  • One guest cannot read or write another guest's send/receive queues.
  • Resource isolation:
  • Each VIF is assigned one hardware context. When multiple VIFs share the same hardware context, they are serviced in round-robin order (see the sketch below).
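
A minimal sketch of round-robin servicing when several VIFs share a single microengine context; vif_has_work() and vif_service_one() are hypothetical helpers.

    struct vif;                               /* opaque here */
    extern int  vif_has_work(struct vif *v);  /* hypothetical: packets pending? */
    extern void vif_service_one(struct vif *v);

    /* Main loop of one hardware context shared by several VIFs. */
    void context_main_loop(struct vif *vifs[], int nvifs)
    {
        int next = 0;
        for (;;) {
            struct vif *v = vifs[next];
            if (vif_has_work(v))
                vif_service_one(v);           /* handle at most one packet, then move on */
            next = (next + 1) % nvifs;        /* strict round-robin across the VIFs */
        }
    }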

13
Interrupts
  • Interrupts are used in only one direction, from the NP to the host, for the receive queue.
  • Interrupt processing:
  • It is normally performed by the driver domain in Xen.
  • Interrupts are routed through Xen.
  • This has high latency because of multiple protection-level switches (hypervisor to OS).
  • To avoid this overhead, interrupt virtualization is implemented in Xen.
  • The PCI interrupt from the NP is intercepted by Xen; based on the interrupt identifier register, the corresponding guest is interrupted directly.

14
Limited Interrupt Space
  • The host is interrupted via a PCI interrupt.
  • The interrupt identifier register has 8 bits; one bit is used per VIF.
  • If there are more than 8 VIFs, bits are shared by multiple VIFs.
  • When the master interrupt occurs, the host examines the interrupt identifier register to determine which VIF had activity.
  • This leads to redundant signaling for VIFs that share the same bit (see the dispatch sketch below).
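
A sketch of the bit-shared dispatch, assuming VIFs map onto identifier bits as vif_id % 8; both that mapping and the notify_guest_for_vif() upcall are assumptions for illustration.

    #include <stdint.h>

    #define IDENT_BITS 8

    extern void notify_guest_for_vif(int vif_id);   /* hypothetical upcall into the guest */

    /* Called when the master PCI interrupt from the NP fires. */
    void dispatch_np_interrupt(uint8_t ident_reg, int nvifs)
    {
        for (int bit = 0; bit < IDENT_BITS; bit++) {
            if (!(ident_reg & (1u << bit)))
                continue;
            /* Every VIF sharing this bit must be signalled, even though only
             * one of them may actually have had activity -- this is the
             * redundant signaling noted above. */
            for (int vif = bit; vif < nvifs; vif += IDENT_BITS)
                notify_guest_for_vif(vif);
        }
    }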

15
Experimental Setup
  • Two hosts, each running n guests.
  • One host runs server processes, the other runs client processes; one process per VM.
  • Two metrics are used to evaluate performance:
  • Latency: an application called psapp measures the packet RTT between two guests.
  • Throughput: measured with the iperf benchmark.

16
Experimental Results
  • Latency

17
Latency
  • Close to 50% latency reduction when using self-virtualization.
  • Latency increases with the number of guest domains.
  • This is due to the increased CPU contention among guests.
  • For 32 VIFs, the self-virtualized NP NIC performs worse than the non-self-virtualized NIC.
  • Reason: redundant interrupts to guests cause redundant domain schedules.

18
Experiments show an increase in throughput with self-virtualization
19
Future Architectural Considerations
  • Insights for future heterogeneous multi-cores.
  • Different buses connect different components of the system.
  • This implies non-uniform communication latency.
  • E.g., communication between memory and CPU cores is very fast, while communication between CPU cores and I/O devices is slower.
  • This is problematic for devices that issue many short transfers and cannot take advantage of bursts.
  • It has a negative impact on self-virtualized devices.
  • Future multi-core systems: heterogeneous cores.
  • One of the cores acts like an NP. Since core-to-core latency is very low, such an architecture would suit self-virtualized devices very well.

20
Experiment
  • A ping-pong test compares RTT values between two CPU cores and between a CPU core and an NP.
  • Two mechanisms are used:
  • Both cores poll for received data.
  • One core polls; the other is asynchronously notified via interrupts.
  • (A sketch of the polling variant follows below.)
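
A minimal, self-contained sketch of the polling variant, assuming two pthreads on separate cores exchanging sequence numbers through shared atomics; core pinning and the interrupt-driven variant are omitted.

    /* Polling ping-pong between two threads; reports the average RTT. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <time.h>

    #define ROUNDS 100000

    static atomic_int ping, pong;            /* shared mailboxes, polled by each side */

    static void *responder(void *arg)
    {
        (void)arg;
        for (int i = 1; i <= ROUNDS; i++) {
            while (atomic_load_explicit(&ping, memory_order_acquire) != i)
                ;                            /* spin until the ping arrives */
            atomic_store_explicit(&pong, i, memory_order_release);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, responder, NULL);

        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        for (int i = 1; i <= ROUNDS; i++) {
            atomic_store_explicit(&ping, i, memory_order_release);
            while (atomic_load_explicit(&pong, memory_order_acquire) != i)
                ;                            /* spin until the pong arrives */
        }
        clock_gettime(CLOCK_MONOTONIC, &b);
        pthread_join(t, NULL);

        double ns = (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
        printf("average RTT: %.1f ns\n", ns / ROUNDS);
        return 0;
    }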

21
Graph: time for ping-pong.
22
Results
  • Core-to-core communication is much cheaper.
  • This implies heterogeneous cores are beneficial for self-virtualized devices.
  • Polling is more efficient than interrupts.
  • Interrupts pay the cost of saving and restoring CPU context.
  • This means it is beneficial to have heterogeneous systems with multiple cores.
  • Self-virtualized devices can then resort to low-latency polling rather than interrupts.

23
Questions?