Title: Linux2.2 on x86 ?? ???????
1 Linux2.2 on x86 ?????????
2 ??
- x86????????????????
- ???????????????
- ???????
- Linux ?????????
- Linux?????????
- ???????????????
- ????????
- ???????(demand paging, copy on write)
- ???????(buddy system, slab allocator)
3 x86???????????????
- x86?????????????????????????
- ???????????
- ??????(????)??????????????????????????????????????
???????(required) - ????(?????)?????????????????
- ???????
- ???????????????????(??????????????????????????????
???)??????????(optional) - ????????????????????
4 ??????????
- x86????????????????
- Physical address(??????)
- ??????????????????????????????(????????????)??????
??(0x0 - 0xFFFFFFFF) 4G (2^32) ??????????????????
- Linear address(???????)
- 32bit ????????????????????????????????????????????
??????????????????????????????????????????????????
??? - Logical address(??????)
- 16bit???????????32bit?????????????????????????
??????
??????
???????
Segmentation Unit
Paging Unit
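5 ???????????????
[Figure: Segmentation and paging. Segmentation: the segment selector of a logical address picks a Segment Descriptor from the Global Descriptor Table, and adding the offset gives the linear address. Paging: CR3 locates the Page Directory; the Directory, Table, and offset fields of the linear address select the Page Directory Entry, the Page Table Entry, and the byte within the Page, which gives the physical address.]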
6 x86??
- ?????????
- OS ???????????????????????????????????????????????
?????????????1??????? ?????????????? - ?????????
- 4K ???????????? 32bit ? PDE(?????????????)????1024
???? PDE ?????? - ???????
- 4K ???????????? 32bit ? PTE(???????????)????1024
???? PTE ?????? - ???
- 4K ???, 2M ???, ???? 4M ??????????????????????????
??? - TLB (Translation Lookaside Buffer)
- PDE, PTE ??????????????
7 Linux????????????????????
- Linux?????????????????????????????
- ???????????????????????????
- ??????????????????????????????????????
- ???????????????????
- RISC??????????????????????????
1???????????????????? ??????????????
8 Linux ??????????
- ?????????????
- ??????????????????????????????????
- ?????????/????????
- ????????????????????????????????????????/?????????
????? - ??????
- ??????????????????????????????????????????????????
???????????????????????????? - ?????????????
- ??????????????????????????????????????
9 Linux?????????????
- ?????????(buffer cache)
- ??????????????????????????????????????????????????
?????????????????????????? ID ?????? ??????? - ????????(page cache)
- ?????????????????????????(mmap ??file???????)?????
?????????????????????????????????? - ?????????(swap cache)
- ??????????????????????????????????????(???????????
????)??????????????????????????????????????????
10 ??????????????
- ???????
- Linux ?? 64bit???????????????3 ?????????(PGD, PMD, PTE)??????
- alpha ?3????????????
- x86 ? 2 ???????PMD ????????????????????(??????????)?x86 ????????????????????????????????????????????????
- ??????
- ??????PAGE_OFFSET (0xC0000000) ??????
- ????? ????????????????? PAGE_OFFSET ?????????????
- ?????????????????????
- 1GB?????????????????PAGE_OFFSET????
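The fixed relationship between kernel linear addresses and physical addresses can be sketched as plain arithmetic. This is a minimal sketch assuming 32-bit x86 with PAGE_OFFSET at 0xC0000000 as stated above; the helper names `kernel_virt_to_phys` and `kernel_phys_to_virt` are hypothetical and chosen only for illustration (the kernel itself expresses the same idea with its `__pa()` and `__va()` macros).

```c
#include <assert.h>
#include <stdint.h>

/* PAGE_OFFSET on 32-bit x86, as given in the slide. */
#define PAGE_OFFSET 0xC0000000u

/* Hypothetical helper: physical address behind a directly mapped
 * kernel linear address (linear = physical + PAGE_OFFSET). */
static uint32_t kernel_virt_to_phys(uint32_t vaddr)
{
    assert(vaddr >= PAGE_OFFSET);   /* only kernel-space addresses qualify */
    return vaddr - PAGE_OFFSET;
}

/* Hypothetical helper: kernel linear address of a physical address.
 * Only the first 1 GB of physical memory fits above PAGE_OFFSET. */
static uint32_t kernel_phys_to_virt(uint32_t paddr)
{
    assert(paddr <= UINT32_MAX - PAGE_OFFSET);
    return paddr + PAGE_OFFSET;
}
```

For example, under these assumptions physical address 0x00100000 (1 MB) appears in kernel space at 0xC0100000, and the assertion in `kernel_phys_to_virt` reflects why roughly 1 GB is the ceiling for directly mapped memory.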
11 Linux???????(32bit)
[Figure: Linux memory map (32bit). Address labels: 0x00000000, 0x40000000, 0xc0000000, 0xffffffff. Region labels: kernel, vmalloc space, High mem, File System, Swap.]
12 2????????(x86)
[Figure: x86 linear address layout. Bits 31-22: Directory (index into the Page Directory); bits 21-12: Table (index into the Page Table); bits 11-0: Offset (byte within the Page).]
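The bit layout of the x86 linear address (31-22 Directory, 21-12 Table, 11-0 Offset) can be illustrated with a few shifts and masks. This is a small sketch, not kernel code; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the three fields of a 32-bit linear address. */
struct linear_parts {
    uint32_t directory;  /* bits 31..22: index into the Page Directory */
    uint32_t table;      /* bits 21..12: index into the Page Table */
    uint32_t offset;     /* bits 11..0 : byte offset inside the 4K page */
};

static struct linear_parts split_linear(uint32_t addr)
{
    struct linear_parts p;
    p.directory = addr >> 22;             /* top 10 bits */
    p.table     = (addr >> 12) & 0x3FFu;  /* middle 10 bits */
    p.offset    = addr & 0xFFFu;          /* low 12 bits */
    return p;
}
```

With 10 bits each for Directory and Table, both levels hold 1024 entries, and 1024 x 1024 x 4K covers the full 4G linear address space.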
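13 3????????(Linux)
[Figure: Linux linear address layout. Fields: Global Dir (index into the Page Global Directory), Middle Dir (index into the Page Middle Directory), Table (index into the Page Table), Offset (byte within the Page).]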
?x86 Linux?? PMD??????????? (PGD?????????????PT?
??)
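14 ?????????
[Figure: task_struct.mm points to an mm_struct. mm_struct.mmap heads a list of vm_area_struct entries (vm_start, vm_end, vm_file) linked by vm_next and terminated by NULL. mm_struct.pgd points to the page tables (PGD, PMD, PTE) that map the virtual address space to physical memory, which is backed by the page cache (file system) and the swap cache (swap device).]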
15 ????????????????
- struct task_struct
- ???????????????
- ???UNIX ??u?????????????
- ????????
- struct mm_struct
- ?????????????????????
- ????????
- task_struct ?????????
- struct vm_area_struct
- ?????????????????????????????
- ???????????????????????(vm_area_struct)??????/AVL
??????????? - ??????? mm_struct ?????????
16 ?????????
- fork(2)????
- init ???????????
- ???????(kernel/fork.c)
- 1) ??? mm_struct????????????????????(copy_mm())
- 2) ?????????????????????????????(dup_mmap())
- PTE???????????ReadOnly?????
- Read? ????
- Write? Page Fault ????????????????
17 ????????????
- execve(2)????
- ???????(fs/exec.c)
- 1) ??mm_struct ?????????????(exec_mmap())
- 2) ?????????????????????????mmap??
- vm_area_struct ? mmap???(do_mmap())??????
- ???????????????????????????(????????????)
- exit(2)????
- ???????(kernel/exit.c)
- 1) mm_struct ???(exit_mm())
18 ????????Page fault???????
- Does the address belong to the process address space?
  - YES: Does the access type match the memory region access rights?
    - YES: Legal access. Allocate a new page frame: handle_mm_fault()
    - NO: Illegal access. Send a SIGSEGV signal.
  - NO: Did the exception occur in User Mode?
    - YES: Illegal access. Send a SIGSEGV signal.
    - NO: Kernel bug. Kill the process.
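The three questions in this flowchart reduce to a small decision function. The sketch below only mirrors the chart; the enum and function names are hypothetical, and the real handler derives these three facts from the faulting address, the memory regions of the process, and the error code.

```c
#include <stdbool.h>

/* Hypothetical outcome labels matching the boxes of the flowchart. */
enum fault_action {
    FAULT_HANDLE,      /* legal access: allocate a page frame (handle_mm_fault()) */
    FAULT_SIGSEGV,     /* illegal access: send SIGSEGV */
    FAULT_KERNEL_BUG   /* kernel bug: kill the process */
};

static enum fault_action classify_fault(bool in_address_space,
                                        bool access_allowed,
                                        bool user_mode)
{
    if (in_address_space)
        /* The address lies in a memory region: check the access rights. */
        return access_allowed ? FAULT_HANDLE : FAULT_SIGSEGV;

    /* The address is outside every region: who caused the fault? */
    return user_mode ? FAULT_SIGSEGV : FAULT_KERNEL_BUG;
}
```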
19 Page fault
[Figure: page fault handler flowchart. Outcomes: Kill process and kernel Oops; Copy on write; Demand Paging; Send SIGSEGV; Fixup code (typically send SIGSEGV).]
20 ??????????????? (1)
- Is the PTE present?
  - YES: go to *3
  - NO: Is the PTE empty?
    - NO: invoke do_swap_page() (go to *2)
    - YES: invoke do_no_page() (Demand paging). Is a file already mapped?
      - YES: invoke vma->vm_ops->nopage() (go to *1)
      - NO: invoke do_anonymous_page(). Write access?
        - YES: Allocate a page, zero-clear it, set PTE as writable
        - NO: Map ZERO_PAGE, set PTE as ReadOnly
21 ???????????????(2)
[Figure: Demand paging flowchart, continued from *1. Decisions: Found the page in page cache? Is the cache valid? Is the page to be shared? Actions: invoke page_cache_read(); allocate a page, set PTE, load the file; set PTE as shared page.]
22 ???????????????(3)
[Figure: Demand paging flowchart, continued from *2. Decisions: Is vm_ops->swapin present? Is swap cache present? Is read access or shared? Actions: invoke swap_in(); make a swap cache; set PTE; set PTE with Writable and dirty.]
23 ???????????????(4)
- *3: Set PTE aging. Is write access?
  - NO: Do nothing
  - YES: Is writable page?
    - YES: Set PTE with dirty flag
    - NO: invoke do_wp_page() (Copy on write, go to *4)
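Taken together, the charts for a single PTE choose one of a handful of handlers from four facts about the fault. The sketch below condenses that dispatch; the enum values and the function name are hypothetical labels for the chart boxes, not kernel identifiers.

```c
#include <stdbool.h>

/* Hypothetical labels for the leaves of the flowcharts. */
enum pte_action {
    ACT_DO_NO_PAGE,    /* PTE empty: demand paging via do_no_page() */
    ACT_DO_SWAP_PAGE,  /* PTE not present but not empty: do_swap_page() */
    ACT_NOTHING,       /* present, read access: only the aging update */
    ACT_MARK_DIRTY,    /* present, write to a writable page: set dirty flag */
    ACT_DO_WP_PAGE     /* present, write to a read-only page: do_wp_page() */
};

static enum pte_action dispatch_pte_fault(bool present, bool empty,
                                          bool write_access, bool writable)
{
    if (!present)
        return empty ? ACT_DO_NO_PAGE : ACT_DO_SWAP_PAGE;
    if (!write_access)
        return ACT_NOTHING;
    return writable ? ACT_MARK_DIRTY : ACT_DO_WP_PAGE;
}
```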
24 ???????????????(5)
- *4 (Copy on write): Do multiple processes refer to the page?
  - NO: Set PTE as writable and dirty (no page copy)
  - YES: Allocate a page, set PTE as writable, copy data from the old page
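The copy-on-write decision can be tried out in user space with a toy reference-counted page. This is a sketch of the idea only: `struct toy_page`, `TOY_PAGE_SIZE`, and `toy_wp_fault` are hypothetical names, and `malloc` stands in for the kernel's page allocator.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical page frame: a reference count plus the page contents. */
struct toy_page {
    int count;                          /* number of mappings sharing the page */
    unsigned char data[TOY_PAGE_SIZE];
};

/* Resolve a write fault on the mapping *slot.
 * Returns the page the writer may now modify, or NULL if allocation fails. */
static struct toy_page *toy_wp_fault(struct toy_page **slot)
{
    struct toy_page *old = *slot;
    struct toy_page *copy;

    if (old->count == 1)
        return old;                     /* sole user: make writable, no copy */

    copy = malloc(sizeof *copy);        /* shared: give the writer its own page */
    if (copy == NULL)
        return NULL;
    memcpy(copy->data, old->data, TOY_PAGE_SIZE);
    copy->count = 1;
    old->count--;                       /* the writer no longer shares the old page */
    *slot = copy;
    return copy;
}
```

After a write fault on a page shared by two mappings, the writer owns a private copy and the other mapping keeps the original, which is why a later write through that remaining mapping needs no copy.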
25 ?????????(Demand Paging)
- ?????????????????????????????????????????????????
- ????
- ????????CPU???????????
- ????????????
- ?????
- Page fault ??CPU??????????
- ????
- ???????????page fault???????????
- ????????????????????
- Linux???????????????PTE ?????????NULL???
- ????????? page fault???
26 ????????(Copy on Write)
- ??????????????????????????????????????????????????
- ????
- ????????CPU???????????
- ????????????
- ?????
- Page fault ??CPU??????????
- ????
- ???????????????????
- ???????????????????
- Linux?????????????????????????????????ReadOnly??????????????
- ?????? page fault ??? -> Copy on Write ???
27 ??????(Swapping)
- ??????????????????????????????????????????????????
??? - ????
- ?????????????????????????
- ??????????????????????????????
- ?????
- ?????????CPU???????????
- ?????????????????????????????????????????
- ????
- ???/????????????????????
- Linux?????????????????????????????
- CPU???????????????
28 ??????????????
- ??? LRU (Least Recently Used)???
- ??????LRU????
- x86??????????????(?????????)
- Linux x86????????? ?????aging???
- ?????????????????????????????????????
- ????????????????????????????????????????(?????????
?????) - ???????????????????????????????????????????
- ????????????????????????????????????
29 ?????????
[Figure: swapping call graph.
A page must be swapped out: swap_out() -> swap_out_process() -> swap_out_vma() -> swap_out_pgd() -> swap_out_pmd() -> try_to_swap_out() -> rw_swap_page()
The page fault handler must swap in a page: do_swap_page() -> swap_in() -> swapin_readahead() -> read_swap_cache_async() -> rw_swap_page()
Low-level swapping function: rw_swap_page() -> brw_page() (block device driver function)]
30 ???????????(1)
- do_try_to_free_pages()
- try_to_free_pages() -> do_try_to_free_pages()
- ????????????? kswapd ??????
for (priority = 6; priority; priority--) {
    while (shrink_mmap(priority, gfp_mask))   /* ??????????? */
        if (????????) return;
    while (shm_swap(priority, gfp_mask))      /* ???????? */
        if (????????) return;
    while (swap_out(priority, gfp_mask))      /* ???????????? */
        if (????????) return;
    shrink_dcache_memory();                   /* ????????????? */
}
31 ???????????(2)
for (counter = nr_tasks / (priority + 1); counter; counter--) {
    int max_cnt = 0;
    struct task_struct *pbest;
    for (init???????????)
        if (??????????????? ???????? > max_cnt) {
            max_cnt = ????????;
            pbest = ??????;
        }
    swap_out_process(pbest, gfp_mask);
}
32 ???????????(3)
- try_to_swap_out()
- swap_out_process() ? ? try_to_swap_out()
if (????????? ??/?????????) return
if (??????????) PTE ???????????? page ????????????? return
if (?????????????) ????????????????? PTE ??? __free_page() return
if (??????????) PTE ??? __free_page() return
if (??swapout ?????) PTE??? swapout ??????? __free_page() return
????????? PTE??? ??????????????? __free_page() return
33 kswapd
- ??????????????????????????????
- ???(1?/?)??????????????????????????
- ?????????????2563???
- ???????????
while (1) {
    while (???????????) {
        if (do_try_to_free_pages(GFP_KSWAPD))
            if (????????) schedule();
    }
    schedule_timeout(10*HZ);
}
34 ?????????(1) Buddy system
- ???????????????2??????????????
- ??? split, coalescing ????????????????????????????????????
- ????????????2??????????????
- ?????????????????1???????????????(split)
- ?????????????????????1????????????????????(coalescing)
- External Fragmentation ??????????
- ???????????????????????
- Linux x86?? 2^0 ?? 2^9 ???????10??
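Because every block is a power-of-two number of pages and is aligned to its own size, the partner ("buddy") of a block can be found with one XOR. The following sketch shows that index arithmetic; the function names are hypothetical and this is not the kernel's allocator.

```c
#include <stddef.h>

/* Page index of the buddy of the block that starts at page idx
 * and spans 2^order pages (idx must be aligned to 2^order). */
static size_t buddy_of(size_t idx, unsigned order)
{
    return idx ^ ((size_t)1 << order);
}

/* Start index of the block of 2^(order+1) pages obtained by
 * coalescing the block at idx with its buddy. */
static size_t merged_start(size_t idx, unsigned order)
{
    return idx & ~((size_t)1 << order);
}

/* Smallest order whose block (2^order pages) can hold npages. */
static unsigned order_for(size_t npages)
{
    unsigned order = 0;
    while (((size_t)1 << order) < npages)
        order++;
    return order;
}
```

A request for 3 pages, for instance, is rounded up to an order-2 (4-page) block; if only an 8-page block is free, it is split into two 4-page buddies, and the unused half goes back on the 4-page free list.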
35 ?????????(1) Buddy system
[Figure: Buddy system. Free area lists hold blocks of n = 2^0, 2^1, 2^2, 2^3 pages (1, 2, 4, 8 pages). A larger block is split to satisfy a request, and free buddies are merged back into a larger block. Example label: 10 kbytes.]
36 ?????????(2) Slab allocator
- ??????????????????????????????????????????
- ?????????????????
- ???????????????????????
- ????????????????
- ???????????????????????????
- Internal Fragmentation ??????????
- ????????????????????????
- e.g.) /proc/slabinfo ??
- i-node cache, socket buffer,
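37 ?????????(2) Slab allocator
[Figure: Slab allocator structure. Back end: page-level allocator (buddy system). Front end: slab allocator with per-object caches (vnode cache, proc cache, file cache) handing out active vnodes, active procs, active files. Slab layout: N pages containing a coloring area, slab data, unused space, and a linked list terminated by NULL.]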
38 ?????????(2) Slab allocator
- Linux ??????? slab ???
- ????????????structure??????
- slabinfo,kmem_cache, tcp_tw_bucket,
tcp_bind_bucket, tcp_open_request,
inet_peer_cache, ip_fib_hash, ip_dst_cache,
arp_cache, uhci_urb_priv, blkdev_requests,
nfs_read_data, nfs_inode_cache, nfs_write_data,
nfs_page, journal_head, revoke_table,
revoke_record, dnotify, file, fasync, uid_cache,
skbuff_head_cache, sock, sigqueue, kiobuf,
cdev_cache, bdev_cache, mnt_cache, inode_cache,
dentry_cache, filp, names_cache, buffer_head,
mm_struct, vm_area_struct, fs_cache, files_cache
- 2^5, 2^6, ..., 2^17 bytes?????
- ??/DMA?
- ? /proc/slabinfo ?????
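The power-of-two sizes listed above (2^5 through 2^17 bytes) suggest how a request of arbitrary size is matched to a general-purpose cache: round up to the next size in the series. The sketch below assumes exactly that rounding rule; `general_cache_size` is a hypothetical name, not a kernel function.

```c
#include <stddef.h>

#define MIN_CACHE_SIZE ((size_t)1 << 5)    /* 32 bytes */
#define MAX_CACHE_SIZE ((size_t)1 << 17)   /* 131072 bytes */

/* Object size of the smallest general-purpose cache that can hold
 * `request` bytes, or 0 if the request exceeds the largest cache. */
static size_t general_cache_size(size_t request)
{
    size_t size = MIN_CACHE_SIZE;

    while (size < request) {
        if (size == MAX_CACHE_SIZE)
            return 0;                      /* too large for any cache */
        size <<= 1;
    }
    return size;
}
```

Under this rule a 100-byte request is served from the 128-byte cache, so up to 28 bytes per object are lost to internal fragmentation; dedicated caches for fixed-size structures avoid that loss.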
39 ????
- Understanding the Linux Kernel
- Daniel P. Bovet, Marco Cesati, O'Reilly, 2001
- UNIX Internals: The New Frontiers
- Uresh Vahalia, Prentice Hall, 1996
- Intel Architecture Software Developer's Manual, Vol. 1, 2, 3
- Intel Corp., 1999
40 ??????????
- ?????? 1 (real mode)
- 4MB ?????????????????????????????(pg0, arch/kernel/head.S ???)
- ???(static ???)
- ??PAGE_OFFSET, PAGE_OFFSET+0x3fffff
- ?????????????
- ??????? 2 ??????????
- ?????? 2 (protected mode)
- paging_init() ????????(swapper_pg_dir, arch/mm/init.c)
- ???????? PAGE_OFFSET ?????????????????????????
- 0x0 ????????(NULL Pointer access???????)
- ?Pentium ???? 4MB ????????????