Memory Management in Linux - PowerPoint PPT Presentation

Slides: 46
Provided by: jianzho

Transcript and Presenter's Notes

1
Memory Management in Linux
  • Q36921204 ???

2
Memory Model
  • Linear Address Space
  • Each process has its own memory address space
  • Mapping to Physical Memory by kernel MM
  • X86 architecture 32bit (4GB)
  • Actual Physical Memory
  • Can range from KB to TB
  • Larger than 4GB in i386 architecture? Need PAE
  • Linux Memory Model
  • Architecture-independent (mm/)
  • Must be mapped to architecture specific
    (arch/xxx/mm)

3
Process Memory Layout
  • User Space
  • Code segment, data segment (include/asm/page.h)
  • Addressable 0 to PAGE_OFFSET-1
  • Kernel Space
  • Code segment, data segment
  • Shared by all processes
  • Addressable PAGE_OFFSET to (unsigned)(-1)
  • In i386 architecture
  • PAGE_OFFSET = 0xc0000000
  • This means 1GB reserved for kernel, 3GB for user

4
A Very Abstract Model
5
Architecture Specific
  • Page: basic unit of memory management
  • PAGE_SIZE (include/asm/page.h)
  • i386 architecture: page size is 4KB
  • Alpha architecture: page size is 8KB
  • Addressable memory space
  • 32-bit x86 architectures: 4GB
  • 64-bit Alpha architecture: 8TB
  • Memory management highly related to hardware
  • Lots of routines implemented in assembly code

6
Page Translation Tables
  • Linear virtual address
  • Low bits: offset within the page (e.g., 12 bits on a
    32-bit architecture)
  • High bits: identify the page (e.g., 20 bits)
  • Page translation
  • Mapping the high bits to the base address of the
    physical page
  • The OS kernel prepares the tables (pointed to by CR3 on x86)
  • Actual translation is done by hardware
  • Multi-level page tables
  • Single-level is not efficient for large address
    space

7
3-Level Page Table
  • 3-level of indirect table access to address a
    page
  • Page Global Directory -> Page Middle Directory ->
    Page Table -> Page
  • Look for pgd_t, pmd_t, pte_t
  • In include/asm-xxx/ page.h, pgalloc.h, pgtable.h
  • x86 memory address
  • 10-bit (PGD), 0-bit (PMD), 10-bit (PTE),
    12-bit (offset in page)
  • include/asm-i386/pgtable-2level.h
  • Alpha memory address
  • 10-bit, 10-bit, 10-bit, 13-bit (page)
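
The x86 field widths listed above can be tried out in user space. The following is a minimal sketch, assuming i386 without PAE (the middle directory is folded away, so only two indices remain). The struct `linear_parts` and the functions `split_linear()` and `join_linear()` are hypothetical names invented for this illustration; only the shift constants mirror kernel macro names.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths for i386 without PAE, as listed on the slide:
 * 10-bit directory index, 10-bit table index, 12-bit page offset. */
#define PAGE_SHIFT   12
#define PGDIR_SHIFT  22
#define PTRS_PER_PTE 1024u
#define PAGE_SIZE    (1u << PAGE_SHIFT)

/* Hypothetical container for the three fields of a linear address. */
struct linear_parts {
    uint32_t pgd_index;  /* top 10 bits: entry in the page global directory */
    uint32_t pte_index;  /* middle 10 bits: entry in the page table */
    uint32_t offset;     /* low 12 bits: byte offset inside the page */
};

/* Split a 32-bit linear address into its table indices and offset. */
struct linear_parts split_linear(uint32_t addr)
{
    struct linear_parts p;
    p.pgd_index = addr >> PGDIR_SHIFT;
    p.pte_index = (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
    p.offset    = addr & (PAGE_SIZE - 1);
    return p;
}

/* Rebuild the address from its parts; the inverse of split_linear(). */
uint32_t join_linear(struct linear_parts p)
{
    assert(p.pgd_index < 1024 && p.pte_index < PTRS_PER_PTE &&
           p.offset < PAGE_SIZE);
    return (p.pgd_index << PGDIR_SHIFT) | (p.pte_index << PAGE_SHIFT) | p.offset;
}
```

For example, `split_linear(0xC0101234)` yields directory index 768, table index 257, and offset 0x234.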

8
Page Table Entries
  • Page Table: pte_t
  • Each entry: 32-bit integer
  • Fields: the high bits (page frame address) and the flags
  • Flags: present, accessed, dirty, R/W, user, ...
  • Look for _PAGE_* macros
  • Functions
  • Look for pte_*() macros
  • Page Directories: Functions
  • Look for pgd_*() and pmd_*() macros
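
To make the entry layout concrete, here is a small user-space sketch of a 32-bit i386 page table entry. The flag values are the hardware bit positions. The helper bodies are simplified stand-ins for the kernel's pte_*() helpers, not copies of them, and `make_pte()` and `pte_frame()` are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Low flag bits of a 32-bit i386 page table entry. */
#define _PAGE_PRESENT  0x001u
#define _PAGE_RW       0x002u
#define _PAGE_USER     0x004u
#define _PAGE_ACCESSED 0x020u
#define _PAGE_DIRTY    0x040u
#define PAGE_MASK      0xFFFFF000u  /* high 20 bits: page frame base */

typedef struct { uint32_t pte_low; } pte_t;

/* Simplified stand-ins for the kernel's pte_*() helpers. */
int pte_present(pte_t pte)   { return (pte.pte_low & _PAGE_PRESENT) != 0; }
int pte_write(pte_t pte)     { return (pte.pte_low & _PAGE_RW) != 0; }
int pte_dirty(pte_t pte)     { return (pte.pte_low & _PAGE_DIRTY) != 0; }
pte_t pte_mkdirty(pte_t pte) { pte.pte_low |= _PAGE_DIRTY; return pte; }

/* Hypothetical helper: build an entry from a page-aligned frame address. */
pte_t make_pte(uint32_t frame_base, uint32_t flags)
{
    pte_t pte;
    assert((frame_base & ~PAGE_MASK) == 0);  /* frame must be 4KB aligned */
    pte.pte_low = frame_base | flags;
    return pte;
}

/* Hypothetical helper: physical base address of the mapped frame. */
uint32_t pte_frame(pte_t pte) { return pte.pte_low & PAGE_MASK; }
```

Because the frame address is page aligned, its low 12 bits are always zero, which is what leaves room for the flags in the same word.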

9
Virtual Memory
  • Linux Virtual Memory Management
  • Mapping, allocating, and managing physical pages
  • Managing secondary memory: swapping
  • Linux terminology
  • Swapping = demand paging
  • Architecture independent
  • A clean interface to support different memory
    mapping used under different architecture
  • To start: include/linux/mm.h, mm/

10
Using Memory in Kernel
  • Kernel space is permanently mapped
  • No virtual or secondary memory
  • Be careful when you write code in kernel space
  • Don't use too much memory
  • Never use recursive calls (the kernel stack is limited
    too)
  • Routines
  • kmalloc(), kfree()
  • vmalloc(), vfree()
  • Macros to allocate pages

11
Managing Physical Memory
  • NUMA (Non-Uniform Memory Access)
  • Memory in a machine is divided into nodes
  • Memory access costs differ (non-uniform)
  • Same node: fast
  • Different node: slower
  • Memory within a node has the same access time
  • Data structure: pg_data_t (include/linux/mmzone.h)
  • Memory Zones
  • Each node has several zones
  • Each zone is a different type of memory
  • Data structure: zone_t (include/linux/mmzone.h)
  • Pages (Frames)
  • Data structure: struct page (include/linux/mm.h)

12
Nodes and Zones
[Figure: pgdat_list links the nodes (pg_data_t); each node contains zones (zone_t) with fields zone_pgdat, zone_mem_map, and free_area[0] ... free_area[9]]
13
3 Zones
  • ZONE_DMA
  • 0-16M (i386)
  • DMA capability: some device drivers need to use
    this memory for I/O
  • ZONE_NORMAL
  • 16M-896M (i386)
  • Normal memory, directly mapped by the kernel
  • ZONE_HIGHMEM
  • >896M (i386)
  • Not used on 64-bit architectures
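
The zone boundaries above can be written as a simple classification. This is a sketch only, assuming the i386 limits from the slide and the i386 value of PAGE_OFFSET (0xc0000000). The names `zone_of()` and `direct_map_va()` are hypothetical; the kernel itself sizes zones at boot rather than classifying addresses with a function like this.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u
#define MB(x) ((uint64_t)(x) << 20)

enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM };

/* Hypothetical helper: zone of a physical address on i386,
 * using the boundaries from the slide (16M and 896M). */
enum zone_type zone_of(uint64_t phys)
{
    if (phys < MB(16))
        return ZONE_DMA;
    if (phys < MB(896))
        return ZONE_NORMAL;
    return ZONE_HIGHMEM;
}

/* Hypothetical helper: kernel virtual address of a directly mapped
 * physical address. Memory below 896M sits at a fixed offset from
 * PAGE_OFFSET; ZONE_HIGHMEM has no such permanent mapping. */
uint32_t direct_map_va(uint64_t phys)
{
    assert(zone_of(phys) != ZONE_HIGHMEM);
    return (uint32_t)phys + PAGE_OFFSET;
}
```

The 896M limit follows from the 1GB kernel window: part of it is kept for other kernel mappings, so only the lower portion of physical memory can be mapped directly.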

14
Physical Pages
[Figure: free_area[0] ... free_area[9] each head a list of struct page entries (mem_map_t; fields shown: struct list_head list, ..., atomic_t count, unsigned long flags, struct list_head lru); zone fields zone_pgdat and zone_mem_map]
15
Page Allocation
  • Contiguous and non-contiguous allocation
  • The Buddy System Algorithm
  • Pages are allocated in blocks whose sizes are powers of
    2 (1, 2, 4, 8, 16, 32, 64, 128, 256, 512
    pages)
  • Each size has its own free list (called a free
    area)
  • De-allocation: if the adjacent buddy block is
    also free, combine the two to form a free block of
    the next larger size
  • Data structure: free_area_t
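
The split and merge behaviour described above can be sketched in a toy user-space model. This is not the kernel's code: it assumes a hypothetical zone of 1024 pages and uses a flag array per order instead of the linked free lists in free_area_t. All function names (`buddy_init()`, `buddy_alloc()`, `buddy_free()`, `buddy_count()`) are invented for the illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_ORDER 10     /* orders 0..9: blocks of 1..512 pages */
#define NPAGES    1024   /* hypothetical toy zone: two 512-page blocks */

/* free_map[o][i] != 0 means the block of 2^o pages starting at page i
 * is free. A flag array keeps the sketch short. */
static unsigned char free_map[MAX_ORDER][NPAGES];

void buddy_init(void)
{
    memset(free_map, 0, sizeof free_map);
    for (unsigned i = 0; i < NPAGES; i += 1u << (MAX_ORDER - 1))
        free_map[MAX_ORDER - 1][i] = 1;
}

/* Returns the first page index of a 2^order block, or -1 if none fits. */
int buddy_alloc(unsigned order)
{
    assert(order < MAX_ORDER);
    for (unsigned o = order; o < MAX_ORDER; o++) {
        for (unsigned i = 0; i < NPAGES; i += 1u << o) {
            if (!free_map[o][i])
                continue;
            free_map[o][i] = 0;
            /* Split down: keep the lower half, free the upper half. */
            while (o > order) {
                o--;
                free_map[o][i + (1u << o)] = 1;
            }
            return (int)i;
        }
    }
    return -1;
}

/* Frees a block and merges it with its buddy while the buddy is free. */
void buddy_free(unsigned page, unsigned order)
{
    assert(order < MAX_ORDER && page % (1u << order) == 0);
    while (order < MAX_ORDER - 1) {
        unsigned buddy = page ^ (1u << order);  /* buddy differs in one bit */
        if (!free_map[order][buddy])
            break;
        free_map[order][buddy] = 0;             /* remove buddy from its list */
        page &= ~(1u << order);                 /* merged block starts lower */
        order++;
    }
    free_map[order][page] = 1;
}

/* Counts the free blocks of one order, for inspection. */
unsigned buddy_count(unsigned order)
{
    unsigned n = 0;
    for (unsigned i = 0; i < NPAGES; i += 1u << order)
        n += free_map[order][i];
    return n;
}
```

The key step is the XOR in `buddy_free()`: two buddies of a given order differ only in the bit that corresponds to that order, so the buddy's index needs no search.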

16
Page Allocation Functions
  • Allocate a block of 2^order contiguous pages
  • struct page *alloc_pages(unsigned int gfp_mask,
    unsigned int order)
  • Allocate a single page
  • struct page *alloc_page(unsigned int gfp_mask)
  • Other functions: see include/linux/mm.h
  • GFP flags: where to allocate the page(s)
  • GFP_KERNEL, GFP_USER, GFP_DMA, ...
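
Since alloc_pages() takes an order rather than a byte count, callers need to convert sizes. The sketch below shows that arithmetic in user space, assuming 4KB pages. `order_for_size()` and `bytes_for_order()` are hypothetical names for this example and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)

/* Smallest order such that 2^order pages can hold `size` bytes. */
unsigned order_for_size(size_t size)
{
    assert(size > 0);
    size_t pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;  /* round up */
    unsigned order = 0;
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}

/* Number of bytes covered by a block of the given order. */
size_t bytes_for_order(unsigned order)
{
    return PAGE_SIZE << order;
}
```

A request for three pages needs order 2 (four pages), which illustrates the internal waste that power-of-two blocks can cause.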

17
Noncontiguous Memory
  • Noncontiguous page frames in contiguous linear
    address
  • Not all virtual memory maps to contiguous
    page frames
  • Makes sense for infrequent use
  • To allocate (in include/linux/vmalloc.h)
  • void *vmalloc(unsigned long size)
  • To release
  • void vfree(const void *addr)

18
Contiguous vs Non-Contiguous
19
Dynamic Kernel Memory
  • For small memory use: 32, 64, ..., 131072 bytes
  • For kernel data objects that need to be
    frequently allocated and released
  • Examples: file descriptors, sockets, inodes, ...
  • Slab Allocator
  • Group memory area (called cache) by kernel data
    object type
  • For each type, maintain memory cache (free list)
    for objects that are previously allocated then
    released
  • Interface with page frame allocator
  • Source code mm/slab.c

20
Caches
  • One cache for each type of object
  • Look at /proc/slabinfo
  • size-N and size-N(DMA): general purpose caches
  • Data structures
  • Cache: struct kmem_cache_s
  • Slab: struct slab_s
  • Functions
  • To create a cache: kmem_cache_create(...)
  • To allocate an object: kmem_cache_alloc(...)
  • To free (return to cache): kmem_cache_free(...)
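
The idea of a per-type cache that recycles released objects can be shown with a toy user-space model. This is a sketch under simplifying assumptions: there are no slabs, malloc() stands in for the page frame allocator, and every name (`toy_cache`, `toy_cache_init()`, `toy_cache_alloc()`, `toy_cache_free()`) is hypothetical rather than part of the kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy cache: one per object type. Released objects are kept
 * on a free list, linked through the first word of each free object. */
struct toy_cache {
    size_t        obj_size;
    void         *free_list;
    unsigned long backing_allocs;  /* times we fell back to malloc() */
};

void toy_cache_init(struct toy_cache *c, size_t obj_size)
{
    assert(obj_size >= sizeof(void *));  /* room for the list link */
    c->obj_size = obj_size;
    c->free_list = NULL;
    c->backing_allocs = 0;
}

/* Reuse a previously released object if one exists, else get new memory. */
void *toy_cache_alloc(struct toy_cache *c)
{
    if (c->free_list) {
        void *obj = c->free_list;
        c->free_list = *(void **)obj;
        return obj;
    }
    c->backing_allocs++;
    return malloc(c->obj_size);
}

/* Return an object to the cache instead of to the backing allocator. */
void toy_cache_free(struct toy_cache *c, void *obj)
{
    *(void **)obj = c->free_list;
    c->free_list = obj;
}
```

An allocate, free, allocate sequence hands back the same object the second time, so the backing allocator is only called once.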

21
Caches and Slabs
22
General Purpose Cache
  • For general purpose objects
  • 13 general purpose caches in the slab allocator
  • Powers of 2: 32, 64, ..., 131072
  • To allocate memory in this category
  • #include <linux/slab.h>
  • void *kmalloc(size_t size, int flags)
  • Flags: SLAB_*
  • To free memory
  • void kfree(const void *objp)
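
The power-of-2 size classes above imply that a request is served from the smallest cache that fits. The following user-space sketch shows that rounding, assuming the 32 to 131072 byte range from the slide. `size_class()` and `class_count()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_SIZE 32u       /* smallest general purpose cache */
#define MAX_SIZE 131072u   /* largest general purpose cache */

/* Returns the object size of the general purpose cache that would serve
 * a request of `size` bytes, or 0 if the request is too large. */
size_t size_class(size_t size)
{
    size_t c = MIN_SIZE;
    assert(size > 0);
    if (size > MAX_SIZE)
        return 0;
    while (c < size)
        c <<= 1;
    return c;
}

/* Number of classes between MIN_SIZE and MAX_SIZE inclusive. */
unsigned class_count(void)
{
    unsigned n = 0;
    for (size_t c = MIN_SIZE; c <= MAX_SIZE; c <<= 1)
        n++;
    return n;
}
```

A 33-byte request lands in the 64-byte class, so nearly half of the object is unused. This is one reason frequently used object types get dedicated caches.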

23
Summary Kernel Memory
Kernel data objects
Allocation functions: kmem_cache_alloc(), kmalloc()
Caches and slabs
Kernel virtual address space
Allocation functions: alloc_pages(), vmalloc()
Page Tables
Physical Memory: nodes, zones, pages
24
Process Address Space
  • Abstract model of user-space memory use

25
Address Space in ELF Format
  • Program segments
  • Code: start_code to end_code
  • Data: start_data to end_data
  • BSS (Heap): start_brk to brk (growable)
  • Can have more than one segment (program, shared
    libraries, etc.)
  • Run-Time segment
  • Stack: start_stack (growable)
  • Arguments: arg_start to arg_end
  • Environment: env_start to env_end (0xbfffffff)

26
Process Memory Descriptor
  • One per process (in include/linux/sched.h)
  • struct mm_struct
  • struct vm_area_struct *mmap
  • pgd_t *pgd
  • unsigned long start_code, end_code, start_data,
    end_data
  • unsigned long start_brk, brk, start_stack
  • unsigned long arg_start, arg_end, env_start,
    env_end

27
VM Area Descriptor
  • Linear Address Interval (in include/linux/mm.h)
  • struct vm_area_struct
  • struct mm_struct *vm_mm
  • unsigned long vm_start
  • unsigned long vm_end
  • struct vm_area_struct *vm_next
  • ....

28
VM Area
29
VM Area Handlers
  • To Create VM Area and Do the Mapping
  • do_mmap() (in include/linux/mm.h)
  • Used in system calls execve(), brk()
  • To Release a VM Area and Shrink the Address Space
  • do_munmap() (in mm/mmap.c)
  • Used in system call brk()
  • To Find a VM Area that includes a given address
  • find_vma() (in mm/mmap.c)

30
VM Area Lookup by Address
  • Given a virtual address, need to look up a VM
    Area fast
  • Used in page fault, memory mapping, VM area
    operations (like locking, etc.)
  • Data structure: red-black tree and cache
  • struct mm_struct
  • ....
  • rb_root_t mm_rb
  • struct vm_area_struct *mmap_cache
  • ....

31
Find VMA
  • More data structures
  • struct vm_area_struct
  • ....
  • rb_node_t vm_rb
  • ....
  • Function
  • struct vm_area_struct *find_vma(struct mm_struct
    *mm, unsigned long addr)
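
The lookup contract can be sketched in user space with reduced versions of the two descriptors. This is a simplified model: it walks the sorted vm_next list instead of the red-black tree, and the structs keep only the fields needed here. What it preserves is the behaviour of returning the first area that ends above the address, plus the one-entry cache.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced versions of the descriptors from the slides (most fields omitted). */
struct vm_area_struct {
    unsigned long vm_start;          /* first address inside the area */
    unsigned long vm_end;            /* first address after the area */
    struct vm_area_struct *vm_next;  /* next area, sorted by address */
};

struct mm_struct {
    struct vm_area_struct *mmap;        /* head of the sorted list */
    struct vm_area_struct *mmap_cache;  /* last area found by find_vma() */
};

/* Returns the first area with addr < vm_end, or NULL if there is none.
 * Note that the returned area may start above addr; the caller must
 * still compare vm_start. A list walk replaces the tree search here. */
struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
{
    struct vm_area_struct *vma;

    assert(mm != NULL);
    vma = mm->mmap_cache;
    if (vma && vma->vm_start <= addr && addr < vma->vm_end)
        return vma;                   /* cache hit */

    for (vma = mm->mmap; vma; vma = vma->vm_next) {
        if (addr < vma->vm_end) {
            mm->mmap_cache = vma;     /* remember for the next lookup */
            return vma;
        }
    }
    return NULL;
}
```

The cache pays off because consecutive faults and mapping operations tend to hit the same area; the tree keeps the miss path logarithmic when a process has many areas.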

32
Shared Memory
  • Two (or more) processes can share the same
    physical memory region
  • May be mapped to different virtual addresses
  • Through system calls
  • shmat(), shmdt()
  • Implicitly
  • Shared library

33
VM Area Shared Memory
34
Backing Store
  • Each VM area can be mapped to a file (in
    secondary memory)
  • Explicit memory mapping through system call
  • mmap(), munmap(), mremap()
  • Implicit mapping
  • Code segment (loaded from an executable binary
    file)
  • Swapping (mapped to the swap file)

35
VM Area Backing Store
  • Data structure (in include/linux/mm.h)
  • struct vm_area_struct
  • ....
  • unsigned long vm_pgoff /* offset in page */
  • struct file *vm_file /* mapped file */
  • ....

36
Swapping and Page Cache
  • For Pages in a Process's User Space
  • Swap: secondary memory on the disk
  • Page Cache: main memory
  • Data structure: 3 sets of lists
  • Active pages, usually mapped by a process's PTE
  • active_list (in mm/page_alloc.c)
  • Inactive, unmapped, clean or dirty
  • inactive_dirty_list (in mm/page_alloc.c)
  • Clean pages, unmapped (one list per zone)
  • zone_t.inactive_clean_list (in
    include/linux/mmzone.h)

37
Kernel Swap Daemon
  • Implemented as a kernel thread
  • kswapd() (in mm/vmscan.c)
  • Wakes up periodically
  • Wakes up more frequently when memory is short
  • Checks memory and, if memory is tight:
  • Age pages that have not been used
  • Move pages to inactive lists
  • Write dirty pages to disk
  • Swap pages out if necessary

38
More kswapd()
  • Call swap_out() to scan inactive page lists
  • Removes the page reference from the process's page table
  • Actual swapping is done independently by file I/O
  • Call refill_inactive_scan() to
  • Scan the active_list to find unused pages
  • Call age_page_down() to reduce the page->age count
  • If page->age is zero, move to inactive_dirty_list
  • Call page_launder() to clean dirty pages
  • Scan inactive_dirty_list for dirty pages, write
    to disk
  • Move clean pages to the zone's
    inactive_clean_list

39
Life Cycle of a Page in Cache
  • A page is read into memory from disk
  • Caused by a page fault or read-ahead
  • Added to the page cache active_list
  • Page is dirty if written by the process
  • If not used, the page->age count is gradually reduced
  • If page->age is zero, moved to inactive_dirty_list
  • May be moved to an inactive_clean_list later
  • Page can be recovered from an inactive list
  • Inactive page can be released
  • reclaim_page() to free the page for use by others
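
The life cycle above behaves like a small state machine, sketched here in user space. Everything in this block is a toy model: the type `toy_page` and the functions `scan_step()` and `touch_page()` are hypothetical, and both aging rules (halving the age on each scan, adding 3 on a reference) are assumptions chosen for illustration, not values taken from the slides.

```c
#include <assert.h>

enum page_list { ACTIVE, INACTIVE_DIRTY, INACTIVE_CLEAN, FREED };

/* Hypothetical, heavily reduced stand-in for struct page. */
struct toy_page {
    unsigned       age;    /* plays the role of page->age */
    int            dirty;  /* written since it was read in */
    enum page_list list;   /* which list the page is on */
};

/* One scan pass over a page that has not been referenced. */
void scan_step(struct toy_page *p)
{
    assert(p->list != FREED);
    switch (p->list) {
    case ACTIVE:
        p->age /= 2;                  /* assumed aging rule */
        if (p->age == 0)
            p->list = INACTIVE_DIRTY;
        break;
    case INACTIVE_DIRTY:
        p->dirty = 0;                 /* stands in for the write to disk */
        p->list = INACTIVE_CLEAN;
        break;
    case INACTIVE_CLEAN:
        p->list = FREED;              /* role of reclaim_page() */
        break;
    case FREED:
        break;
    }
}

/* A reference recovers a page from an inactive list and raises its age. */
void touch_page(struct toy_page *p)
{
    assert(p->list != FREED);
    p->age += 3;                      /* assumed increment */
    p->list = ACTIVE;
}
```

Starting from age 4, three scans without a reference move the page to the inactive dirty list; a reference at any point before it is freed puts it back on the active list.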

40
Demand Paging
  • Page frame for a VM area is not in core
  • Page frame is not allocated when VM area is
    created
  • Page frame can be swapped out
  • Handled by page fault

41
Page Fault
  • Exception handler
  • Raised by address translation (hardware)
  • Calls do_page_fault() to handle this exception
  • do_page_fault()
  • Architecture-specific
  • i386: arch/i386/mm/fault.c
  • Find a page frame in the physical memory
  • Load the missing page in
  • Update the page tables

42
do_page_fault()
  • vma = find_vma(mm, address)
  • if (!vma)
  • goto bad_area
  • if (vma->vm_start <= address)
  • goto good_area
  • good_area:
  • handle_mm_fault(mm, vma, address, write)
  • bad_area:
  • force_sig_info(SIGSEGV, &info, tsk)
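
The good_area/bad_area decision can be modelled in user space as a pure function. This sketch goes slightly beyond the slide: it also includes the stack-growth case (an area flagged VM_GROWSDOWN just above the address) and a write-permission check, both simplified. The names `struct area`, `classify_fault()`, and `enum fault_result` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VM_WRITE     0x0002ul
#define VM_GROWSDOWN 0x0100ul

enum fault_result { FAULT_HANDLE, FAULT_SIGSEGV };

/* Hypothetical reduced VM area: [start, end) with flags, sorted list. */
struct area {
    unsigned long start, end, flags;
    const struct area *next;
};

/* Decide whether a fault at `address` would reach handle_mm_fault()
 * (FAULT_HANDLE) or end in SIGSEGV. `write` is nonzero for a write. */
enum fault_result classify_fault(const struct area *list,
                                 unsigned long address, int write)
{
    const struct area *vma = list;

    /* Role of find_vma(): first area that ends above the address. */
    while (vma) {
        assert(vma->start < vma->end);
        if (address < vma->end)
            break;
        vma = vma->next;
    }
    if (!vma)
        return FAULT_SIGSEGV;                 /* bad_area */

    /* Address in a hole below the area: only a stack may grow down. */
    if (vma->start > address && !(vma->flags & VM_GROWSDOWN))
        return FAULT_SIGSEGV;                 /* bad_area */

    /* good_area: the access type must be permitted. */
    if (write && !(vma->flags & VM_WRITE))
        return FAULT_SIGSEGV;
    return FAULT_HANDLE;
}
```

The kernel applies further checks before growing a stack, so the VM_GROWSDOWN branch here is more permissive than the real handler.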

43
handle_mm_fault()
  • pgd = pgd_offset(mm, address)
  • pmd = pmd_alloc(mm, pgd, address)
  • if (pmd)
  • pte = pte_alloc(mm, pmd, address)
  • if (pte)
  • return handle_pte_fault(...)
  • handle_pte_fault()
  • do_no_page() if the pte entry is all-zero
  • do_swap_page() if the pte entry is non-zero

44
Summary Process Memory
  • Process virtual address space

Memory Area
Page Tables
Page Fault
Backing Store
Kswapd
Physical Memory
45
Memory Management Summary
  • Physical Page Management
  • Page Table Structure
  • Memory Zones and Page Allocation
  • Kernel Memory Management
  • Slab and Allocation Routines
  • Virtual Memory Management (Process Address Space)
  • Memory Layout and VM Area
  • Page Cache and Swapping
  • Page Fault