Memory Management in Linux - PowerPoint PPT Presentation

About This Presentation
Title:

Memory Management in Linux

Description:

Should be flexible and ... Void *kmalloc(size, priority), Void kfree (void *) // in mm/kmalloc.c ... For each physical frame (mm.h): typedef struct page ... – PowerPoint PPT presentation

Number of Views:713
Avg rating:3.0/5.0
Slides: 31
Provided by: AnandSivas2
Learn more at: https://www.cse.psu.edu
Category:

less

Transcript and Presenter's Notes

Title: Memory Management in Linux


1
Memory Management in Linux
  • Anand Sivasubramaniam

2
Two Parts
  • Architecture Independent Memory
  • Should be flexible and portable enough across
    platforms
  • Implementation for a specific architecture

3
Architecture Independent Memory Model
  • Process virtual address space divided into pages
  • Page size given in PAGE_SIZE macro in asm/page.h
  • (4K for x86 and 8K for Alpha)
  • The pages are divided between 4 segments
  • User Code, User Data, Kernel Code, Kernel Data
  • In User mode, access only User Code and User Data
  • But in Kernel mode, access also needed for User
    Data

4
  • put_user(), get_user(), memcpy_tofs(),
    memcpy_fromfs() allow kernel to access user data
    (defined in asm/segment.h)
  • Registers cs and ds point to the code and data
    segments of the current mode
  • fs points to the data segment of the calling
    process in kernel mode.
  • Get_ds(), get_fs(), and set_fs() are defined in
    asm/segment.h

5
  • Segment Offset 4 GB Linear address (32 bits)
  • Of this, user space 3 GB (defined by
    TASK_SIZE macro) and kernel space 1GB
  • Linear Address converted to physical address
    using 3 levels

Index into Page Middle Dir.
Index into Page Table
Index into Page Dir.
Page Offset
6
Page Dir. And Middle Dir. Access Functions (in
asm/page.h and asm/pgtable.h)
  • Structures pgd_t and pmd_t define an entry of
    these tables.
  • pgd_alloc_alloc()/pgd_free() to allocate and free
    a page for the page directory
  • pmd_alloc(),pmd_alloc_kernel()/pmd_free(),pmd_free
    _kernel() allocate and free a page middle
    directory in user and kernel segments.
  • pgd_set(),pgd_clear()/pmd_set(),pmd_clear() set
    and clear a entry of their tables.
  • pgd_present()/pmd_present() checks for presence
    of what the entries are pointing to.
  • pgd_page()/pmd_page() returns the base address
    of the page to which the entry is pointing
  • ..

7
Page Table Entry (pte_t)
  • Attributes
  • Presence (is page present in VAS?)
  • Read, Write and Execute
  • Accessed ? (age)
  • Dirty
  • Macros of Pgprot_type
  • PAGE_NONE (invalid)
  • PAGE_SHARED (read-write)
  • PAGE_COPY/READ_ONLY (read only, used by
    copy-on-write)
  • PAGE_KERNEL (accessibe only by kernel)

8
Page Table Functions
  • mk_pte(), Pte_clear(), set_pte()
  • pte_mkclean(), pte_mkdirty(), pt_mkread(), .
  • pte_none() (check whether entry is set)
  • pte_page() (returns address of page)
  • pte_dirty(), pte_present(), pte_young(),
    pte_read(), pte_write()

9
Process Address Space (not to scale!)
Kernel
0xC0000000
File name, Environment
Arguments
Stack
_end
bss
_bss_start
_edata
Data
_etext
Code
Header
0x84000000
Shared Libs
10
Address Space Descriptor
  • mm_struct defined in the process descriptor. (in
    linux/sched.h)
  • This is duplicated if CLONE_VM is specified on
    forking.
  • struct mm_struct
  • int count // no. of processes sharing this
    descriptor
  • pgd_t pgd //page directory ptr
  • unsigned long start_code, end_code
  • unsigned long start_data, end_data
  • unsigned long start_brk, brk
  • unsigned long start_stack
  • unsigned long arg_start, arg_end, env_start,
    env_end
  • unsigned long rss // no. of pages resident in
    memory
  • unsigned long total_vm // total of bytes in
    this address space
  • unsigned long locked_vm // of bytes locked in
    memory
  • unsigned long def_flags // status to use when
    mem regions are created
  • struct vm_area_struct mmap // ptr to first
    region desc.
  • struct vm_area_struct mmap_avl // faster
    search of region desc.

11
Region Descriptors
  • Why even allocate all of the VAS? Allocate only
    on demand.
  • Use region descriptors for each allocated region
    of VAS
  • Map allocated but unused regions to same physical
    page to save space.
  • struct vm_area_struct
  • struct mm_struct vm_mm // descriptor of VAS
  • unsigned long vm_start, vm_end // of this region
  • pgprot_t vm_page_prot // protection attributes
    for this region
  • short vm_avl_height
  • struct vm_avl_left
  • vm_area_struct vm_avl_permission // right hand
    child
  • vm_area_struct vm_next_share, vm_prev_share
    // doubly linked
  • vm_operations_struct vm_ops
  • struct inode vm_inode // of file mapped, or
    NULL anonymous mapping
  • unsigned long vm_offset // offset in file/device

12
  • If vm_inode is NULL (anonymous mapping), all PTEs
    for this region point to the same page.
  • If the process does a write to any of these
    pages, the faulting mechanism creates a new
    physical page (copy-on-write).
  • This is used by the brk() system call.
  • Operations specific to this region (including
    fault handling) are specified in
    vm_operations_struct.
  • Hence, different regions can have different
    functions.

13
  • Struct vm_operations_struct
  • void (open)(struct vm_area_struct )
  • void (close)(struct vm_area_struct )
  • void (unmap)()
  • void (protect)()
  • void (sync)()
  • unsigned long (nopage)(struct vm_area_struct ,
    unsigned long address, unsigned long page, int
    write_access)
  • void (swapout)(struct vm_area_struct , unsigned
    long, pte_t )
  • pte_t (swapin)(struct vm_area_struct , unsigned
    long, unsigned long)

14
Traditional mmap()
  • int do_mmap(struct file , unsigned long addr,
    unsigned long len, unsigned long prot, unsigned
    long flags, unsigned long off)
  • Creates a new memory region
  • Creates the required PTEs
  • Sets the PTEs to fault later
  • The handler (nopage) will either copy-on-write if
    anonymous mapping, or will bring in the required
    page of file.

15
How is brk() implemented?
  • Check whether to allocate (deny if not enough
    physical memory, exceeds its VA limits, or
    crosses stack).
  • Then call do_mmap() for anonymous mapping between
    the old and new values of brk (in process table).
  • Return the new brk value.

16
Kernel Segment
  • On a sys call, CS points to kernel segment. DS
    and ES are set to kernel segment as well.
  • Next, FS is set to user data segment.
  • Put_user() and get_user() can then access user
    space if needed.
  • The address parameters to these functions cannot
    exceed 0xc0000000.
  • Violation of this should result in a trap,
    together with any writes to a read-only page
    (creates a problem on 386, while the problem does
    not exist in 486/Pentium)
  • Hence, verify_area() is typically called before
    performing such operations.
  • Physical and Virtual addresses are same except
    for those allocated using vmalloc().
  • Kernel segment shared across processes (not
    switched!)

17
Memory Allocn for Kernel Segment
  • Static
  • Memory_start console_init(memory_start,
    memory_end)
  • Typically done for drivers to reserve areas, and
    for some other kernel components.
  • Dynamic
  • Void kmalloc(size, priority), Void kfree (void
    ) // in mm/kmalloc.c
  • Void vmalloc(size), void vmfree(void ) // in
    mm/vmalloc.c
  • Kmalloc is used for physically contiguous pages
    while vmalloc does not necessarily allocate
    physically contiguous pages
  • Memory allocated is not initialized (and is not
    paged out).

18
kmalloc() data structures
sizes
size_descriptor
page_descriptor
bh
bh
bh
bh
bh
bh
Null
Null
19
vmalloc()
  • Allocated virtually contiguous pages, but they do
    not need to be physically contiguous.
  • Uses __get_free_page() to allocate physical
    frames.
  • Once all the required physical frames are found,
    the virtual addresses are created (and mappings
    set) at an unused part.
  • The virtual address search (for unused parts) on
    x86 begins at the next address after physical
    memory on an 8 MB boundary.
  • One (virtual) page is left free after each
    allocation for cushioning.

20
vmalloc vs kmalloc
  • Contiguous vs non-contiguous physical memory
  • kmalloc is faster but less flexible
  • vmalloc involves __get_free_page() and may need
    to block to find a free physical page
  • DMA requires contiguous physical memory

21
Paging
  • All kernel segment pages are locked in memory (no
    swapping)
  • User pages can be paged out
  • Complete block device
  • Fixed length files in a file system
  • First 4096 bytes are a bitmap indicating that
    space for that page is available for paging.
  • At byte 4086, string SWAP_SPACE is stored.
  • Hence, max swap of 40868-1 32687 pages
    130784KB per device or file
  • MAX_SWAPFILES specifies number of swap files or
    devices
  • Swap device is more efficient than swap file.

22
  • Inform swap space to kernel using
  • int sys_swapon(char swapfile, int swapflags)
  • Ceates an entry in swap_info table.
  • struct swap_info_struct
  • unsigned int flags
  • kdev_t swap_device
  • struct indoe swap_file
  • unsigned char swap_map // ptr to table, with 1
    byte for each page to indicate how many processes
    are referring to this page
  • unsigned char swap_lockmap // ptr to bitmap,
    bit indicating lock
  • int lowest_bit, highest_bit // to calculate
    maximum page number
  • unsigned long max // highest_bit 1
  • int prio // priority for this swap space
  • int cluster_nr, cluster_next // to cluster pages
    on storage device
  • int next

23
  • For each physical frame (mm.h)
  • typedef struct page
  • struct page prev, next // doubly linked
  • struct inode inode unsigned long offset //
    where to swap
  • struct page prev_hash, next_hash // in hash
    list of pages in page cache
  • atomic_t count // number of users of this page
  • unsigned dirty16, age8
  • struct buffer_head buffers // if it is part of
    a block buffer
  • unsigned long map_nr // frame
  • struct wait_queue wait // Tasks waiting for
    page to be unlocked
  • unsigned flags
  • mem_map_t

24
Finding a Physical Page
  • unsigned long __get_free_pages(int priority,
    unsigned long order, int dma) in
    mm/page_alloc.c
  • Priority
  • GFP_BUFFER (free page returned only if available
    in physical memory)
  • GFP_ATOMIC (return page if possible, do not
    interrupt current process)
  • GFP_USER (current process can be interrupted)
  • GFP_KERNEL (kernel can be interrupted)
  • GFP_NOBUFFER (do not attempt to reduce buffer
    cache)
  • order says give me 2order pages (max is 128KB)
  • dma specifies that it is for DMA purposes

25
  • First tries to find a free frame using Buddy
    system.
  • Table free_area keeps appropriate data
    structures.

26
  • If you cannot find a free page,
  • int try_to_free_page(int priority, int dma, int
    wait)
  • static int state 6 int I 6 int stop
  • stop 3 if (wait) stop 0
  • switch (state)
  • do
  • Case 0
  • if (shrink_mmap(i,dma)) return 1
  • state 1
  • Case 1
  • if (shm_swap(i,dma)) return 1
  • state 2
  • Default
  • if (swap_out(i,dma,wait)) return 1
  • state 0
  • i--
  • while ((i-stop) gt 0)
  • return 0

27
  • shrink_mmap() tries to discard pages in page
    cache or buffer cache that have only one user
    currently, and have not been references since the
    last cycle. The number of examined pages depends
    on priority.
  • shm_swap() tries pages allocated for shared
    memory.
  • swap_out()
  • Uses swap_cnt to determine how many pages to swap
    out for current process before moving on to next.
  • Always start where you left off last time (Clock
    algorithm)
  • Uses swap_out_process() function, which then
    calls try_to_swap_out() for each possible page
    present in memory (and is not locked).
  • try_to_swap_out() checks the age attribute in
    mem_map data structure, and the page is selected
    if this is 0.
  • VM areas swapout() operation is called.
  • Write back if the page is dirty
  • Invalidate page table entry.

28
  • kswapd kernel thread running in background is
    activated each time the number of free pages
    falls below a critical level.
  • This thread calls the try_to_free_page()
    function.
  • A block of memory is released using free_pages().
    When the number of users reaches 0, the frames
    are entered in free_area.

29
Page Fault
  • Error code written onto stack, and the VA is
    stored in register CR2
  • do_page_fault(struct pt_regs regs, unsigned long
    error_code) is now called.
  • If faulting address is in kernel segment, alarm
    messages are printed out and the process is
    terminated.
  • If faulting address is not in a virtual memory
    area, check if VM_GROWSDOWN for the nexy virtual
    memory area is set (I.e. Stack). If so, expand
    VM. If error in expanding send SIGSEGV.
  • If faulting address is in a virtual memory area,
    check if protection bits are OK. If not legal,
    send SIGSEGV. Else, call do_no_page() or
    do_wp_page().

30
  • void do_wp_page(struct task_struct task, struct
    vm_area_struct vma, unsigned long address, int
    write_access)
  • Check if page is mapped in
  • If it is referenced only once, change permissions
  • Else, create a copy and entry in page table to
    this copy with write permissions on
  • void do_no_page(struct task_struct task, struct
    vm_area_struct vma, unsigned long address, int
    write_access)
  • If there is no nopage() handler, an empty page is
    mapped to this area
  • Else, do_swap_page(struct task_struct, struct
    vm_area_struct, unsigned long address, pte_t, int
    write_access) is called.
  • If no swap_in() handler is defined,
    swap_in(struct task_struct, struct
    vm_area_struct, pte_t , unsigned long entry, int
    write_access) is called.
  • entry contains swap space and page number.
  • This calls swap_free() to release te page in swap
    space and the swap_map counter is decremented.
Write a Comment
User Comments (0)
About PowerShow.com