Accelerating Two-Dimensional Page Walks for Virtualized Systems
1
Accelerating Two-Dimensional Page Walks for
Virtualized Systems
  • Jun Ma

2
Introduction
  • Native, non-virtualized system
  • An OS runs on a physical system.
  • The OS communicates with the physical system
    directly.
  • Address mapping
  • Virtual Address (VA): the address used by the OS
    and application software.
  • Physical Address (PA): the address in the physical
    machine.
  • For a native system: VA → PA.

3
Introduction
  • Virtualization
  • Multiple OSes can run simultaneously but
    separately on one physical system.
  • Hypervisor: the underlying software that inserts
    abstractions into the virtualized system and
    mediates communication between the OSes and the
    physical system.

4
Introduction
  • Virtualization
  • Address mapping for a virtual machine
  • Guest OS: Guest Virtual Address (GVA), Guest
    Physical Address (GPA).
  • Physical system: System Physical Address (SPA).
  • Address translation:
  • GVA → GPA → SPA
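The two-stage translation can be sketched as function composition over two hypothetical single-entry page tables (the addresses here are made up for illustration, not from the paper):

```python
# Sketch: GVA -> GPA via the guest page tables, then GPA -> SPA via the
# nested page tables. Both tables map 4 KB page frames; the low 12 bits
# (the page offset) pass through unchanged.
guest_pt = {0x4000: 0x8000}    # GVA page -> GPA page (guest mapping)
nested_pt = {0x8000: 0xC000}   # GPA page -> SPA page (nested mapping)

def translate(gva):
    offset = gva & 0xFFF
    gpa = guest_pt[gva & ~0xFFF] | offset   # stage 1: guest tables
    return nested_pt[gpa & ~0xFFF] | offset # stage 2: nested tables

translate(0x4123)  # -> 0xC123
```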

5
Introduction
  • Virtualization
  • Traditional idea for memory translation:
    manipulated by the hypervisor.
  • Drawback: the hypervisor intercepts the operation,
    exits the guest, emulates the operation, performs
    the memory translation, and then returns to the
    guest → high overhead.
  • Alternative idea
  • Use hardware to perform the translation.
  • No hypervisor intervention is needed, saving the
    overhead.

6
Background
  • x86 native page translation
  • Page table
  • A hierarchy of address-translation tables that
    maps VAs to PAs.
  • Page walk
  • An iterative process: to obtain the final PA from
    a VA, the page walk must traverse every level of
    the page table hierarchy.

7
Background
  • x86 native page translation
  • The walk proceeds from level 4 down to level 1.
  • At each level, the physical address from the level
    above is used as the base address and a 9-bit
    field of the VA is used as the index.
  • The TLB (translation look-aside buffer) caches the
    final physical address to reduce the frequency of
    page walks.
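The index extraction above can be sketched in a few lines (a minimal illustration of 4-level x86-64 long-mode addressing, not code from the paper):

```python
# Sketch: split a 48-bit x86-64 virtual address into the four 9-bit
# page-table indices (one per level) and the 12-bit page offset.
def split_va(va):
    offset = va & 0xFFF                     # bits 0-11: offset in page
    idx = [(va >> (12 + 9 * lvl)) & 0x1FF   # 9 bits per table level
           for lvl in range(4)]             # lvl 0 = L1 ... lvl 3 = L4
    return idx[::-1], offset                # walk order: L4, L3, L2, L1

indices, offset = split_va(0x1000)  # -> ([0, 0, 0, 1], 0)
```

Each 9-bit index selects one of the 512 entries in a 4 KB table at that level.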

8
Background
  • Memory management for virtualization
  • Without hardware support, the hypervisor must
    manipulate this translation (using a shadow page
    table to map GVA to SPA); this is a major source
    of hypervisor overhead.
  • Hardware mechanism
  • Same idea as the x86 page walk (a 2D page walk).
  • Nested paging maps GPA to SPA.

9
Background
  • Memory management for virtualization
  • The guest page table is traversed to translate
    GVA to GPA. At each level, the GPA must first be
    translated to an SPA by walking the nested page
    table before the guest page table (gL) entry can
    be read. The TLB caches the final SPA to reduce
    page walk overhead.
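The reference count of such a 2D walk follows directly from this structure, and can be sketched as a small formula (assuming n-level guest and nested tables, n = 4 in long mode):

```python
# Sketch: memory references in a 2D (nested) page walk. Each of the n
# guest page entries, plus the final guest physical address, needs a
# full n-level nested walk before it can be read.
def walk_refs(n=4):
    nested = (n + 1) * n   # n-level nested walk for n entries + the gPA
    guest = n              # the guest page-table entry reads themselves
    return nested + guest

walk_refs(4)  # -> 24 references, versus 4 for a native walk
```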

10
Background
  • Large page size advantages
  • Memory saving
  • With 4 KB pages, the OS must use an entire L1
    table, which is itself 4 KB. If all 512 4 KB pages
    can be combined into one contiguous 2 MB block,
    the L1 level is skipped, saving the 4 KB used by
    the L1 table.
  • Reduced TLB pressure
  • Each large-page entry occupies a single TLB entry,
    whereas the corresponding regular pages require
    512 TLB entries to map the same 2 MB range of
    virtual addresses.
  • Shorter page walk
  • Skipping the entire L1 level makes the page walk
    shorter, saving some overhead.

11
Page walk characterization
  • Page walk cost
  • "Perfect TLB opportunity" means the performance
    improvement that could be achieved with a perfect
    TLB, which would eliminate cold misses as well as
    conflict and capacity misses.

12
Page walk characterization
  • Page entry reuse

13
Page walk characterization
  • Page entry reuse

14
Page walk characterization
  • Page entry reuse
  • Nested page tables have much higher reuse than
    guest page tables, in part due to the inherent
    redundancy of the nested page walk.
  • There are many more nested accesses than guest
    accesses in a 2D page walk. Each level of the
    nested page table hierarchy must be accessed for
    each guest level. In many cases the same nested
    page entries are accessed multiple times in a 2D
    page walk (high reuse rate).

15
Page walk characterization
  • Page entry reuse

<gL1, G> and <gPA, nL1> both have many unique page
entries because both of them map guest data into
their respective address spaces: <gL1, G> maps
GVA → GPA and <gPA, nL1> maps GPA → SPA. These
two are therefore the most difficult to cache.
16
Page Walk Acceleration
  • AMD Opteron translation caching
  • Page walk cache (PWC)
  • Stores page entries from all page table levels
    except L1, whose entries are cached in the TLB.
  • All page entries are initially brought into the L2
    cache. On a PWC miss, the page entry data may
    still reside in the L2 cache or the L3 cache (if
    present).

17
Page Walk Acceleration
  • Translation caching for 2D page walks

18
Page Walk Acceleration
  • Translation caching for 2D page walks
  • One-dimensional PWC (1D PWC)
  • Only page entry data from the guest dimension is
    stored in the PWC, and entries are tagged by
    system physical address.
  • The lowest-level guest page table entry <G, gL1>
    is not cached in the PWC because of its low reuse
    rate.
  • Two-dimensional PWC (2D PWC)
  • Extends the 1D PWC into the nested dimension of
    the 2D page walk, turning the 20 unconditional
    cache hierarchy accesses into 16 likely PWC hits
    (dark-filled references in Figure 5(b)) and four
    possible PWC hits (checkered references). As in
    the 1D PWC, all page entries are tagged with their
    system physical address and <G, gL1> is not
    cached.
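The PWC behavior described above can be sketched as a small cache keyed by system physical address (a hypothetical structure for illustration; the capacity, eviction policy, and interface are assumptions, not AMD's actual design):

```python
# Sketch: a page walk cache tagged by the system physical address of a
# page entry. Guest and nested entries share the cache; <G, gL1> is
# never inserted because of its low reuse rate.
class PageWalkCache:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}             # SPA of page entry -> entry data

    def lookup(self, spa):
        return self.entries.get(spa)  # hit: skip the memory hierarchy

    def insert(self, spa, data, is_guest_l1=False):
        if is_guest_l1:               # <G, gL1> bypasses the PWC
            return
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # crude eviction
        self.entries[spa] = data
```

On a PWC hit, the walker reads the cached entry instead of issuing a cache hierarchy access, which is what converts the 20 unconditional accesses into likely hits.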

19
Page Walk Acceleration
  • Translation caching for 2D page walks
  • Two-dimensional PWC with nested translations
    (2D PWC+NT)
  • Augments the 2D PWC with a dedicated GPA-to-SPA
    translation buffer, the Nested TLB (NTLB), which
    reduces the average number of page entry
    references that take place during a 2D page walk.
  • The NTLB uses the guest physical address of a
    guest page entry to cache the corresponding nL1
    entry.
  • The page walk begins by accessing the NTLB with
    the guest physical address of <G, gL4>. On an NTLB
    hit, it produces the data of <nL1, gL4>, allowing
    nested references 1-4 to be skipped, and the
    system physical address of <G, gL4> needed for the
    PWC access is calculated.
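A minimal sketch of the NTLB lookup (the interface and frame-number tagging are assumptions for illustration, not the hardware's exact organization):

```python
# Sketch: a Nested TLB caching GPA -> SPA frame translations, so that a
# hit skips the nested walk for a guest page entry. The tag is the
# guest page frame of the guest page entry's address.
class NestedTLB:
    def __init__(self):
        self.map = {}   # guest frame number -> system frame number

    def translate(self, gpa):
        gfn, offset = gpa >> 12, gpa & 0xFFF
        sfn = self.map.get(gfn)
        if sfn is None:
            return None               # miss: full nested walk required
        return (sfn << 12) | offset   # SPA used for the PWC access

ntlb = NestedTLB()
ntlb.map[0x1234] = 0x5678  # filled in by a prior nested walk
ntlb.translate(0x1234ABC)  # hit -> 0x5678ABC; nested refs 1-4 skipped
```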

20
Result
  • Benchmarks used in the following slides

21
Result
The three hardware-only page walk caching schemes
improve performance by turning page entry memory
hierarchy references into lower-latency PWC
accesses and, in the case of 2D PWC+NT, by skipping
some page entry references entirely.

22
Result

Left side: the G column is not skipped, so it does
not change, and the same holds for the gPA row.
gL1 is skipped in 2D PWC+NT even though it has a
low reuse rate, so it occupies a shorter span in
2D PWC+NT than in 2D PWC. Right side: the NTLB
eliminates many of the PWC accesses, but it does
not eliminate a significant portion of the accesses
that have the highest penalty.
23
Result
  • The first data column states that L2 accesses
    incurred during a 2D page walk using the
    2D PWC+NT configuration generate 2.7-5.5 times
    more L2 misses than the native page walk.
  • This increase is primarily because the native
    page walk has fewer entries that are difficult
    to cache (L1 and sometimes L2) compared to the
    2D page walk (<G, gL1>, <nL1, gPA>, and
    sometimes <G, gL2>, <nL2, gPA>, <nL1, gL1>, and
    <nL2, gL1>).
  • The second data column shows the L2 cache miss
    percentage due only to page entries from

24
Result
The 8096 w/<G, gL1> configuration is unique in
that it writes the <G, gL1> guest page entry to
the PWC.

25
Result

Large pages allow the TLB to cover a larger data
region with fewer translations, which leads to
fewer TLB misses (the nL1 references for the
gPA, gL1, gL2, gL3, and gL4 levels are all
eliminated). The ability to eliminate
poor-locality references, like <nL1, gL1> and
<nL1, gPA>, reduces the number of L2 cache misses
by 60-64%.
26
Conclusion
  • Nested paging is a hardware technique that reduces
    the complexity of software memory management
    during system virtualization. Nested page tables
    combine with the guest page tables to map GPA to
    SPA, resulting in a two-dimensional (2D) page walk
    (2D PWC, 2D PWC+NT).
  • A hypervisor is no longer required to trap on all
    guest page table updates, and significant
    virtualization overhead is eliminated. However,
    nested paging can introduce new overhead due to
    the increase in page entry references.
  • Therefore, the overall performance of a
    virtualized system is improved by nested paging
    when the eliminated hypervisor memory management
    overhead is greater than the new 2D page walk
    overhead.