Transcript and Presenter's Notes

Title: Linux NUMA Summit


1
(No Transcript)
2
Linux NUMA Summit
Kanoj Sarcar, Member of Technical Staff, Linux Kernel
John Wright, Manager, Linux Kernel
3
IRIX NUMA features
  • Kernel text and low-level exception handler replication
  • Per-node kernel data structures: mem_map, bootmem, intr_desc, per-node kmalloc (see the sketch after this slide)
  • Global memory stealer
  • NUMA-aware scheduler (aware of per-node memory)
  • Processor pinning/isolation and cpusets (sysmp())
  • HWGfs (expose machine arch to user)
  • NUMA driver API

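A note on the per-node kernel data structures bullet above: the recurring NUMA pattern is to replicate bookkeeping per node so that the common case touches only local memory. The C sketch below is illustrative only; the names (pernode_data, kmalloc_pool) are assumptions, not the IRIX or Linux layout.

    #define MAX_NODES 64                 /* assumed upper bound, not from the slides */

    struct page;                         /* page descriptor, opaque here */

    /* Hypothetical per-node bookkeeping, allocated on the node it describes
     * so a CPU normally touches only its own node's copy. */
    struct pernode_data {
        struct page   *mem_map;          /* this node's page descriptor array */
        unsigned long  free_pages;       /* this node's free page count */
        void          *kmalloc_pool;     /* this node's allocator state */
    };

    static struct pernode_data *node_data[MAX_NODES];

    /* Return the bookkeeping for a given node id. */
    static inline struct pernode_data *get_node_data(int node)
    {
        return node_data[node];
    }
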
4
IRIX NUMA features
  • Node Affinity (memory, threads, devices)
  • Specifiable policies: allocation, page size, fallback, replication, memory migration; address-space wide and address-range wide
  • Rmaps for memory migration (sketch after this slide)
  • Workload managers (soft partition): LSF, PBS

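The rmaps bullet refers to reverse mappings: given a physical page, find every virtual mapping of it so the page can be unmapped, copied to another node, and remapped. The sketch below shows only the data-structure shape, with hypothetical names (pte_chain, page_rmap); real implementations differ.

    struct pte;                          /* page-table entry, opaque here */

    /* Hypothetical reverse-map chain: one entry per mapping of a page. */
    struct pte_chain {
        struct pte       *pte;           /* a mapping to unmap before moving the page */
        struct pte_chain *next;
    };

    struct page_rmap {
        struct pte_chain *chain;         /* every known mapping of this physical page */
    };

    /* Migration outline: walk the chain, unmap each entry, copy the page to
     * the target node, then point the mappings at the new copy. */
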
5
IRIX NUMA lessons learned
  • Topology aware algorithms
  • Memory migration/replication
  • Performance variability
  • Dedicated usage
  • Tools
    • dplace, dprof, kernprof

6
Where are we in Linux
  • CONFIG_DISCONTIGMEM
    • MIPS64, ARM, IA64-DiG, IA64-SN1
  • CONFIG_NUMA (first-touch placement; see the sketch after this slide)
    • MIPS64, IA64-SN1
  • CONFIG_REPLICATE_KTEXT
    • MIPS64
    • TODO: mem driver, kdb
  • Bug fixes (patches)
    • NUMA-specific page allocation and fallback, especially for user programs
    • Bootmem

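First-touch placement means a page is allocated on the node of the CPU that first faults it, falling back to other nodes when local memory runs out. The C sketch below shows only that fallback loop; alloc_on_node() and the distance-ordered fallback table are assumed helpers for illustration, not the kernel's actual zonelist code.

    #define MAX_NODES 64

    struct page;                                     /* page descriptor, opaque here */

    /* Assumed helpers: a per-node allocator and a distance-ordered fallback
     * table whose row n begins with node n itself (first touch = local node). */
    extern struct page *alloc_on_node(int node);
    extern int fallback_order[MAX_NODES][MAX_NODES];
    extern int num_nodes;

    /* Allocate for a fault taken on local_node: local first, then fall back. */
    struct page *numa_alloc_page(int local_node)
    {
        int i;

        for (i = 0; i < num_nodes; i++) {
            struct page *page = alloc_on_node(fallback_order[local_node][i]);
            if (page)
                return page;
        }
        return NULL;                                 /* every node is exhausted */
    }
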
7
Where are we in Linux
  • Per-CPU IRQ tables, device interrupt distribution
    • IA64-SN1
  • Process pinning via pr_sysctl() (sketch after this slide)
    • TODO: isolation, cpusets
  • NUMA-aware scheduling
    • Arch-specific algorithm calls at the CPU/node level (nested nodes?)
    • TODO: based on IBM MQ scheduler

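The pr_sysctl() pinning interface discussed here predates the affinity syscall Linux eventually grew. Purely as a user-space illustration of what pinning looks like, the sketch below uses the later sched_setaffinity() interface (not the API this slide proposes) to pin the calling process to CPU 0.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        cpu_set_t mask;

        /* Restrict the calling process to CPU 0; cpusets/isolation (the TODO
         * above) would instead carve out a set of CPUs and memory for a job. */
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);

        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pinned pid %d to CPU 0\n", (int)getpid());
        return 0;
    }
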
8
Where do we want to get to
  • What reasonable demands might user programs make?
  • NUMA conceptual machine (nodes with holes/interleaving); see the sketch after this slide
    • internode distance vector
    • node-device distance vector (/proc/nodeinfo??)
    • local/remote
    • cpus/node
  • Expose machine arch (portability vs. performance)
  • Alternatives to exposing machine arch: toy example
  • Well-known topology mapping to specific archs

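The conceptual-machine bullets list what a portable topology description would need: node count, CPUs per node, and internode and node-device distance vectors. The C struct below is purely illustrative (no such kernel interface is implied), and /proc/nodeinfo remains the slide's own open question.

    #define MAX_NODES 64
    #define MAX_DEVS  32

    /* Illustrative conceptual-machine description, not an existing interface. */
    struct numa_topology {
        int           num_nodes;
        int           cpus_per_node[MAX_NODES];          /* cpus/node */
        unsigned char node_dist[MAX_NODES][MAX_NODES];   /* internode distance vector */
        unsigned char dev_dist[MAX_DEVS][MAX_NODES];     /* node-device distance vector */
    };

    /* "local/remote" is the distance vector collapsed to two values. */
    static inline int is_local(const struct numa_topology *t, int from, int to)
    {
        return t->node_dist[from][to] == t->node_dist[from][from];
    }
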
9
Where do we want to get to
  • PIO write race handling DKI (pci_pio_sync(pci_dev))
  • Automatic and tagged text replication (fs/pagecache work)
  • Automatic and directed migration (sketch after this slide)
    • process, pgroup, page, address range
  • Policies on objects or address ranges/spaces?
  • Need versioning scheme?

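Directed migration at page granularity later surfaced in Linux as the move_pages() call. As a hedged illustration of the directed case only (not this proposal's interface), the sketch below asks the kernel to move one page of the calling process to node 1 using move_pages() from libnuma.

    #define _GNU_SOURCE
    #include <numaif.h>              /* move_pages(); link with -lnuma */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        long psz = sysconf(_SC_PAGESIZE);
        void *page = NULL;

        /* Allocate and touch one page so first-touch places it somewhere. */
        if (posix_memalign(&page, psz, psz) != 0)
            return 1;
        memset(page, 0, psz);

        void *pages[1]  = { page };
        int   nodes[1]  = { 1 };     /* assumed target node, purely for illustration */
        int   status[1] = { -1 };

        /* Directed migration of a single page of the calling process. */
        if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE) != 0)
            perror("move_pages");
        else
            printf("page now on node %d\n", status[0]);

        free(page);
        return 0;
    }
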
10
Where do we want to get to
  • Affinity input from users, possibly dynamic
    • attraction/repulsion: mem-thread, mem-device, thread-device
  • Policy input from users (see the sketch after this slide)
    • address-space wide and address-range wide
  • Advanced options for (dynamic) nodeset definitions

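The split between address-space-wide and address-range-wide policy input is exactly what later landed in Linux as set_mempolicy() and mbind(). The sketch below is a hedged illustration using those later calls (neither existed when this was presented): a process-wide preferred node, plus a bind policy on one mapping.

    #define _GNU_SOURCE
    #include <numaif.h>              /* set_mempolicy(), mbind(); link with -lnuma */
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        unsigned long nodemask = 1UL << 0;           /* node 0 only, for illustration */
        unsigned long maxnode  = sizeof(nodemask) * 8;

        /* Address-space-wide policy: prefer node 0 for this process's pages. */
        if (set_mempolicy(MPOL_PREFERRED, &nodemask, maxnode) != 0)
            perror("set_mempolicy");

        /* Address-range-wide policy: bind one 1 MiB mapping to node 0. */
        size_t len = 1 << 20;
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return 1;
        if (mbind(buf, len, MPOL_BIND, &nodemask, maxnode, 0) != 0)
            perror("mbind");

        return 0;
    }
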
11
Where do we want to get to
  • At least one new syscall to cover everything discussed; options to pass in new architecture-defined parameters (names, etc.); madvise() extended
  • One field each in struct mm and struct vma, a few in struct thread (sketch after this slide)
  • Architecture-specific parts of pgdat? Internode distances
  • Affinity hooks during fork/clone/exec

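The "one field each in struct mm and struct vma" item can be pictured as below. Every name here is a hypothetical placeholder for whatever the proposed syscall would install; these are not actual kernel members of the time.

    /* Hypothetical policy object the proposed syscall would attach. */
    struct numa_policy {
        int           mode;              /* e.g. local, bind, interleave */
        unsigned long nodemask;          /* nodes the policy may use */
    };

    struct mm {                          /* one field: address-space-wide policy */
        struct numa_policy *policy;
        /* ... existing members ... */
    };

    struct vma {                         /* one field: address-range-wide policy */
        struct numa_policy *policy;
        /* ... existing members ... */
    };

    struct thread {                      /* a few fields: affinity and home node */
        int allowed_nodes;               /* node-level affinity hint */
        int home_node;                   /* best memory node (see slide 12) */
        /* ... existing members ... */
    };
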
12
Where do we want to get to
  • Multi-queue scheduler with NUMA-aware load balancing; hierarchical queues?
  • NUMA code to identify the best memory node for a thread (under different policies) as input to the CPU scheduler; the memory scheduler might use more CPU-scheduling state (see the sketch after this slide)
  • Generic single private user page migration code; single-object page migration (kernel page migration is a huge beast!); share code with object page swapping

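A memory scheduler that identifies the best memory node for a thread could, under the simplest policy, just pick the node holding most of the thread's resident pages and hand that to the CPU scheduler as a hint. The C sketch below shows only that one policy; per_node_rss() is an assumed helper, and real policies would also weigh CPU-scheduler state as the slide says.

    #define MAX_NODES 64

    struct thread;                                   /* opaque here */

    /* Assumed helper: resident page count of the thread's address space on a node. */
    extern unsigned long per_node_rss(struct thread *t, int node);
    extern int num_nodes;

    /* Simplest "best memory node" policy: the node holding most of the thread's
     * pages.  The result feeds NUMA-aware load balancing as a preferred node. */
    int best_memory_node(struct thread *t)
    {
        int node, best = 0;
        unsigned long best_rss = 0;

        for (node = 0; node < num_nodes; node++) {
            unsigned long rss = per_node_rss(t, node);
            if (rss > best_rss) {
                best_rss = rss;
                best = node;
            }
        }
        return best;
    }
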
13
(No Transcript)