Transcript and Presenter's Notes

Title: Linux NUMA Summit


1
(No Transcript)
2
Linux NUMA Summit
Kanoj Sarcar, Member of Technical Staff, Linux Kernel
John Wright, Manager, Linux Kernel
3
IRIX NUMA features
  • Kernel text and low-level exception handler replication
  • Per-node kernel data structures: mem_map, bootmem, intr_desc, per-node kmalloc (see the sketch after this slide)
  • Global memory stealer
  • NUMA-aware scheduler (aware of per-node memory)
  • Processor pinning/isolation and cpusets (sysmp())
  • HWGfs (expose machine arch to user)
  • NUMA driver API

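A note on the per-node kernel data structures bullet above: the recurring NUMA pattern is to replicate bookkeeping per node so that the common case touches only local memory. The C sketch below is illustrative only; the names (pernode_data, kmalloc_pool) are assumptions, not the IRIX or Linux layout.

    #define MAX_NODES 64                 /* assumed upper bound, not from the slides */

    struct page;                         /* page descriptor, opaque here */

    /* Hypothetical per-node bookkeeping, allocated on the node it describes
     * so a CPU normally touches only its own node's copy. */
    struct pernode_data {
        struct page   *mem_map;          /* this node's page descriptor array */
        unsigned long  free_pages;       /* this node's free page count */
        void          *kmalloc_pool;     /* this node's allocator state */
    };

    static struct pernode_data *node_data[MAX_NODES];

    /* Return the bookkeeping for a given node id. */
    static inline struct pernode_data *get_node_data(int node)
    {
        return node_data[node];
    }
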
4
IRIX NUMA features
  • Node Affinity (memory, threads, devices)
  • Specifiable policies: allocation, page size, fallback, replication, memory migration; address-space wide and address-range wide
  • Rmaps for memory migration (sketch after this slide)
  • Workload managers (soft partition): LSF, PBS

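The rmaps bullet refers to reverse mappings: given a physical page, find every virtual mapping of it so the page can be unmapped, copied to another node, and remapped. The sketch below shows only the data-structure shape, with hypothetical names (pte_chain, page_rmap); real implementations differ.

    struct pte;                          /* page-table entry, opaque here */

    /* Hypothetical reverse-map chain: one entry per mapping of a page. */
    struct pte_chain {
        struct pte       *pte;           /* a mapping to unmap before moving the page */
        struct pte_chain *next;
    };

    struct page_rmap {
        struct pte_chain *chain;         /* every known mapping of this physical page */
    };

    /* Migration outline: walk the chain, unmap each entry, copy the page to
     * the target node, then point the mappings at the new copy. */
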
5
IRIX NUMA lessons learned
  • Topology aware algorithms
  • Memory migration/replication
  • Performance variability
  • Dedicated usage
  • Tools
    • dplace, dprof, kernprof

6
Where are we in Linux
  • CONFIG_DISCONTIGMEM
    • MIPS64, ARM, IA64-DiG, IA64-SN1
  • CONFIG_NUMA (first-touch placement; see the sketch after this slide)
    • MIPS64, IA64-SN1
  • CONFIG_REPLICATE_KTEXT
    • MIPS64
    • TODO: mem driver, kdb
  • Bug fixes (patches)
    • NUMA-specific page allocation and fallback, especially for user programs
    • Bootmem

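First-touch placement means a page is allocated on the node of the CPU that first faults it, falling back to other nodes when local memory runs out. The C sketch below shows only that fallback loop; alloc_on_node() and the distance-ordered fallback table are assumed helpers for illustration, not the kernel's actual zonelist code.

    #define MAX_NODES 64

    struct page;                                     /* page descriptor, opaque here */

    /* Assumed helpers: a per-node allocator and a distance-ordered fallback
     * table whose row n begins with node n itself (first touch = local node). */
    extern struct page *alloc_on_node(int node);
    extern int fallback_order[MAX_NODES][MAX_NODES];
    extern int num_nodes;

    /* Allocate for a fault taken on local_node: local first, then fall back. */
    struct page *numa_alloc_page(int local_node)
    {
        int i;

        for (i = 0; i < num_nodes; i++) {
            struct page *page = alloc_on_node(fallback_order[local_node][i]);
            if (page)
                return page;
        }
        return NULL;                                 /* every node is exhausted */
    }
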
7
Where are we in Linux
  • Per-CPU IRQ tables, device interrupt distribution
    • IA64-SN1
  • Process pinning via pr_sysctl() (sketch after this slide)
    • TODO: isolation, cpusets
  • NUMA-aware scheduling
    • Arch-specific algorithm calls at the CPU/node level (nested nodes?)
    • TODO: based on IBM MQ scheduler

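The pr_sysctl() pinning interface discussed here predates the affinity syscall Linux eventually grew. Purely as a user-space illustration of what pinning looks like, the sketch below uses the later sched_setaffinity() interface (not the API this slide proposes) to pin the calling process to CPU 0.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        cpu_set_t mask;

        /* Restrict the calling process to CPU 0; cpusets/isolation (the TODO
         * above) would instead carve out a set of CPUs and memory for a job. */
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);

        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pinned pid %d to CPU 0\n", (int)getpid());
        return 0;
    }
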
8
Where do we want to get to
  • What reasonable demands might user programs make?
  • NUMA conceptual machine (nodes with holes/interleaving); see the sketch after this slide
    • internode distance vector
    • node-device distance vector (/proc/nodeinfo??)
    • local/remote
    • cpus/node
  • Expose machine arch (portability vs. performance)
  • Alternatives to exposing machine arch: toy example
  • Well-known topology mapping to specific archs

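The conceptual-machine bullets list what a portable topology description would need: node count, CPUs per node, and internode and node-device distance vectors. The C struct below is purely illustrative (no such kernel interface is implied), and /proc/nodeinfo remains the slide's own open question.

    #define MAX_NODES 64
    #define MAX_DEVS  32

    /* Illustrative conceptual-machine description, not an existing interface. */
    struct numa_topology {
        int           num_nodes;
        int           cpus_per_node[MAX_NODES];          /* cpus/node */
        unsigned char node_dist[MAX_NODES][MAX_NODES];   /* internode distance vector */
        unsigned char dev_dist[MAX_DEVS][MAX_NODES];     /* node-device distance vector */
    };

    /* "local/remote" is the distance vector collapsed to two values. */
    static inline int is_local(const struct numa_topology *t, int from, int to)
    {
        return t->node_dist[from][to] == t->node_dist[from][from];
    }
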
9
Where do we want to get to
  • PIO write race handling DKI (pci_pio_sync(pci_dev))
  • Automatic and tagged text replication (fs/pagecache work)
  • Automatic and directed migration (sketch after this slide)
    • process, pgroup, page, address range
  • Policies on objects or address ranges/spaces?
  • Need versioning scheme?

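Directed migration at page granularity later surfaced in Linux as the move_pages() call. As a hedged illustration of the directed case only (not this proposal's interface), the sketch below asks the kernel to move one page of the calling process to node 1 using move_pages() from libnuma.

    #define _GNU_SOURCE
    #include <numaif.h>              /* move_pages(); link with -lnuma */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        long psz = sysconf(_SC_PAGESIZE);
        void *page = NULL;

        /* Allocate and touch one page so first-touch places it somewhere. */
        if (posix_memalign(&page, psz, psz) != 0)
            return 1;
        memset(page, 0, psz);

        void *pages[1]  = { page };
        int   nodes[1]  = { 1 };     /* assumed target node, purely for illustration */
        int   status[1] = { -1 };

        /* Directed migration of a single page of the calling process. */
        if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE) != 0)
            perror("move_pages");
        else
            printf("page now on node %d\n", status[0]);

        free(page);
        return 0;
    }
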
10
Where do we want to get to
  • Affinity input from users, possibly dynamic
    • attraction/repulsion: mem-thread, mem-device, thread-device
  • Policy input from users (see the sketch after this slide)
    • address-space wide and address-range wide
  • Advanced options for (dynamic) nodeset definitions

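The split between address-space-wide and address-range-wide policy input is exactly what later landed in Linux as set_mempolicy() and mbind(). The sketch below is a hedged illustration using those later calls (neither existed when this was presented): a process-wide preferred node, plus a bind policy on one mapping.

    #define _GNU_SOURCE
    #include <numaif.h>              /* set_mempolicy(), mbind(); link with -lnuma */
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        unsigned long nodemask = 1UL << 0;           /* node 0 only, for illustration */
        unsigned long maxnode  = sizeof(nodemask) * 8;

        /* Address-space-wide policy: prefer node 0 for this process's pages. */
        if (set_mempolicy(MPOL_PREFERRED, &nodemask, maxnode) != 0)
            perror("set_mempolicy");

        /* Address-range-wide policy: bind one 1 MiB mapping to node 0. */
        size_t len = 1 << 20;
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return 1;
        if (mbind(buf, len, MPOL_BIND, &nodemask, maxnode, 0) != 0)
            perror("mbind");

        return 0;
    }
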
11
Where do we want to get to
  • At least one new syscall to cover everything discussed; options to pass in new architecture-defined parameters (names, etc.); madvise() extended
  • One field each in struct mm and struct vma, a few in struct thread (sketch after this slide)
  • Architecture-specific parts of pgdat? Internode distances
  • Affinity hooks during fork/clone/exec

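The "one field each in struct mm and struct vma" item can be pictured as below. Every name here is a hypothetical placeholder for whatever the proposed syscall would install; these are not actual kernel members of the time.

    /* Hypothetical policy object the proposed syscall would attach. */
    struct numa_policy {
        int           mode;              /* e.g. local, bind, interleave */
        unsigned long nodemask;          /* nodes the policy may use */
    };

    struct mm {                          /* one field: address-space-wide policy */
        struct numa_policy *policy;
        /* ... existing members ... */
    };

    struct vma {                         /* one field: address-range-wide policy */
        struct numa_policy *policy;
        /* ... existing members ... */
    };

    struct thread {                      /* a few fields: affinity and home node */
        int allowed_nodes;               /* node-level affinity hint */
        int home_node;                   /* best memory node (see slide 12) */
        /* ... existing members ... */
    };
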
12
Where do we want to get to
  • Multi-queue scheduler with NUMA-aware load balancing; hierarchical queues?
  • NUMA code to identify the best memory node for a thread (under different policies) as input to the CPU scheduler; the memory scheduler might use more CPU-scheduling state (see the sketch after this slide)
  • Generic single private user page migration code; single-object page migration (kernel page migration is a huge beast!); share code with object page swapping

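A memory scheduler that identifies the best memory node for a thread could, under the simplest policy, just pick the node holding most of the thread's resident pages and hand that to the CPU scheduler as a hint. The C sketch below shows only that one policy; per_node_rss() is an assumed helper, and real policies would also weigh CPU-scheduler state as the slide says.

    #define MAX_NODES 64

    struct thread;                                   /* opaque here */

    /* Assumed helper: resident page count of the thread's address space on a node. */
    extern unsigned long per_node_rss(struct thread *t, int node);
    extern int num_nodes;

    /* Simplest "best memory node" policy: the node holding most of the thread's
     * pages.  The result feeds NUMA-aware load balancing as a preferred node. */
    int best_memory_node(struct thread *t)
    {
        int node, best = 0;
        unsigned long best_rss = 0;

        for (node = 0; node < num_nodes; node++) {
            unsigned long rss = per_node_rss(t, node);
            if (rss > best_rss) {
                best_rss = rss;
                best = node;
            }
        }
        return best;
    }
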
13
(No Transcript)