Title: Windows Heap Exploitation Win2KSP0 through WinXPSP2
1Windows Heap Exploitation (Win2K SP0 through WinXP SP2)
Original CanSecWest 04 presentation: Matt Conover and Oded Horovitz
XP SP2 additions added and presented by Matt Conover at SyScan 2004
2Agenda
- Practical Windows heap internals
- How to exploit Win2K through WinXP SP1 heap overflows
- 3rd-party (me) assessment of WinXP SP2 improvements
- How to exploit WinXP SP2 heap overflows
- Summary
3Windows Heap Internals
- Many heaps can coexist in one process (normally
2-3)
[Diagram: PEB, Default Heap, 2nd Heap]
4Windows Heap Internals
- Important heap structures
Segments
Segment List
Virtual Allocation list
Free Lists
Lookaside List
5Windows Heap Internals
- Introduction to Free Lists
- 128 doubly-linked lists of free chunks (from 8 bytes to 1024 bytes)
- Chunk size is table row index * 8 bytes
- Entry 0 is a variable-sized free list that contains buffers of 1KB < size < 512KB, sorted in ascending order
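[Diagram: example free lists. Entry 0 holds chunks of 1400, 2000, 2000, and 2408 bytes; the 16-byte and 48-byte entries hold two chunks each]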
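As a rough illustration of the sizing rule on this slide, the following Python sketch maps a chunk size to a free list entry. The name `free_list_index`, the rounding of sizes that are not multiples of 8, and the treatment of the boundary between the fixed-size rows and entry 0 are assumptions made for the example, not details taken from the heap manager.

```python
GRANULARITY = 8        # chunk size = table row index * 8 bytes
NUM_FREE_LISTS = 128   # entries 0..127; entry 0 is the variable-sized list


def free_list_index(chunk_size: int) -> int:
    """Return the free list entry that would hold a chunk of chunk_size bytes.

    Assumption: any chunk too large for rows 1..127 goes to entry 0.
    """
    if chunk_size <= 0:
        raise ValueError("chunk size must be positive")
    # Round up to the 8-byte granularity (assumed for illustration).
    units = (chunk_size + GRANULARITY - 1) // GRANULARITY
    return units if units < NUM_FREE_LISTS else 0


if __name__ == "__main__":
    for size in (16, 48, 1400, 2408):
        print(size, "->", free_list_index(size))
```

With the sizes from the diagram, the 16-byte and 48-byte chunks land in rows 2 and 6, while the 1400-byte and larger chunks fall into entry 0.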
6Windows Heap Internals
- Lookaside Table
- Used for fast allocates and deallocates when available
- Starts empty
- 128 singly-linked lists of busy chunks (free but left marked as busy)
[Diagram: example lookaside lists holding one 16-byte chunk and two 48-byte chunks]
7Windows Heap Internals
- Why have lookasides at all? Speed!
- Singly-linked
- Used to quickly allocate or deallocate
- No coalescing (leads to fragmentation)
- So the lookaside lists fill up quickly (4
entries)
8Windows Heap Internals
- Basic chunk structure: 8 bytes
[Diagram: chunk header layout, with the overflow direction marked]
9Windows Heap Internals
- Free chunk structure: 16 bytes
- Previous chunk size
- Self size
- Segment index
- Flags
- Unused bytes
- Tag index (debug)
- Next chunk
- Previous chunk
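To make the 16-byte figure concrete, here is a minimal Python sketch that unpacks a free chunk header with the `struct` module. The field order and widths (two 16-bit sizes, four single-byte fields, two 32-bit pointers) are assumptions chosen to total 16 bytes on a little-endian 32-bit system; the slide's diagram does not survive well enough to confirm the exact layout, and the sample values are invented.

```python
import struct
from collections import namedtuple

# Assumed layout: sizes are 16-bit, the four small fields are 8-bit,
# and the two list pointers are 32-bit (little-endian).
HEADER_FORMAT = "<HHBBBBII"

FreeChunkHeader = namedtuple(
    "FreeChunkHeader",
    "self_size prev_size segment_index flags unused_bytes tag_index "
    "next_chunk prev_chunk",
)


def parse_free_chunk(raw: bytes) -> FreeChunkHeader:
    """Decode the first 16 bytes of raw as a free chunk header."""
    size = struct.calcsize(HEADER_FORMAT)
    return FreeChunkHeader(*struct.unpack(HEADER_FORMAT, raw[:size]))


if __name__ == "__main__":
    # Hypothetical header bytes, built only to exercise the parser.
    sample = struct.pack(HEADER_FORMAT, 6, 2, 0, 0, 0, 0, 0x00150178, 0x00150178)
    print(parse_free_chunk(sample))
```

The first 8 bytes correspond to the basic chunk header; the two pointers exist only while the chunk is free, which is why an overflow into a neighbouring free chunk reaches them.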
10Windows Heap Internals
- Allocation algorithm (high level)
- If size > 512K, virtual memory is used (not on heap)
- If < 1K, first check the Lookaside lists. If there are no free entries on the Lookaside, check the matching free list
- If > 1K or no matching entry was found, use the heap cache (not discussed in this presentation)
- If > 1K and no free entry in the heap cache, use FreeLists[0] (the variable-sized free list)
- If still can't find any free entry, extend heap as needed
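The decision order on this slide can be sketched as a small Python function. This is only a model of the high-level description: the boolean parameters stand in for "an entry is available in that structure", the function and constant names are hypothetical, and treating a request of exactly 1K as a large request is an assumption.

```python
ONE_K = 1024
VIRTUAL_THRESHOLD = 512 * 1024


def allocation_source(size, lookaside_hit=False, free_list_hit=False,
                      heap_cache_hit=False, free_list0_hit=False):
    """Return which structure would satisfy a request, per the slide's order."""
    if size > VIRTUAL_THRESHOLD:
        return "virtual memory"
    if size < ONE_K:
        if lookaside_hit:
            return "lookaside"
        if free_list_hit:
            return "free list"
    # Large requests, and small ones with no match, try the heap cache next.
    if heap_cache_hit:
        return "heap cache"
    # The slide only mentions FreeLists[0] for requests above 1K.
    if size >= ONE_K and free_list0_hit:
        return "FreeLists[0]"
    return "extend heap"


if __name__ == "__main__":
    print(allocation_source(64, lookaside_hit=True, free_list_hit=True))
    print(allocation_source(4096, free_list0_hit=True))
```

The point worth noticing is the first small-request branch: a lookaside entry wins over a matching free list entry whenever both exist.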
11Windows Heap Internals
- Allocate algorithm: FreeLists[0]
- This is usually what happens for chunk sizes > 1K
- FreeLists[0] is sorted from smallest to biggest
- Check FreeLists[0]->Blink to see if it is big enough (the biggest block)
- Then return the smallest free entry from FreeLists[0] to fulfill the request, like this:
- While (Entry->Size < NeededSize)
- Entry = Entry->Flink
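The FreeLists[0] walk can be modelled in a few lines of Python. The sketch below uses a plain ascending list of sizes in place of the real linked list, so the last element plays the role of FreeLists[0]->Blink; the function name is hypothetical.

```python
def pick_from_free_list0(sorted_sizes, needed_size):
    """Return the index of the smallest entry that fits, or None.

    sorted_sizes is an ascending list of free chunk sizes standing in for
    the doubly-linked FreeLists[0].
    """
    # Check the biggest block first (the Blink of the list head).
    if not sorted_sizes or sorted_sizes[-1] < needed_size:
        return None
    # Walk forward (Flink) until an entry is big enough.
    index = 0
    while sorted_sizes[index] < needed_size:
        index += 1
    return index


if __name__ == "__main__":
    free_list0 = [1400, 2000, 2000, 2408]
    print(pick_from_free_list0(free_list0, 1500))
```

Checking the largest block first means a request that cannot be satisfied is rejected without walking the list at all.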
12Windows Heap Internals
- Allocate algorithm: Virtual Allocate
- Used when ChunkSize > VirtualAlloc threshold (508K)
- Virtual allocate header is placed at the beginning of the buffer
- Buffer is added to the busy list of virtually allocated buffers (this is what Halvar's VirtualAlloc overwrite is faking)
13Windows Heap Internals
- Free Algorithm (high level)
- If the chunk < 512K, it is returned to a lookaside or free list
- If the chunk < 1K, put it on the lookaside (can only hold 4 entries)
- If the chunk < 1K and the lookaside is full, put it on the free list
- If the chunk > 1K, put it on the heap cache (if present) or FreeLists[0]
14Windows Heap Internals
- Free Algorithm: Free to Lookaside
- Free buffer to Lookaside list only if:
- The lookaside is available (e.g., present and unlocked)
- Requested size is < 1K (to fit the table)
- Lookaside is not full yet (no more than 3 entries already)
- To add an entry to the Lookaside:
- Put to the head of Lookaside
- Point to former head of Lookaside
- Keep the buffer flags set to busy (to prevent coalescing)
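The free-to-lookaside rules can be sketched as a tiny Python model of one lookaside list. The class and method names are hypothetical, and a single list stands in for the table entry of one chunk size; the model only captures the three admission conditions and the push-to-head behaviour described on this slide.

```python
LOOKASIDE_MAX_DEPTH = 4   # a lookaside list holds at most 4 entries
ONE_K = 1024


class Chunk:
    def __init__(self, size):
        self.size = size
        self.busy = True   # allocated chunks are marked busy
        self.next = None   # singly-linked: only a forward pointer


class LookasideList:
    def __init__(self):
        self.head = None
        self.depth = 0

    def try_free(self, chunk, available=True):
        """Push chunk onto the list if the slide's conditions hold."""
        if not available:
            return False
        if chunk.size >= ONE_K:
            return False
        if self.depth >= LOOKASIDE_MAX_DEPTH:
            return False
        chunk.next = self.head   # point to the former head
        self.head = chunk        # put at the head
        self.depth += 1
        # chunk.busy is left True so the chunk is never coalesced.
        return True


if __name__ == "__main__":
    lookaside = LookasideList()
    results = [lookaside.try_free(Chunk(48)) for _ in range(5)]
    print(results)
```

In this model the fifth free of the same size is refused, which corresponds to the chunk going to the free list instead.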
15Windows Heap Internals
[Diagram: coalescing on free]
Step 1: Buffer freed
Step 2: Buffer removed from free list (A + B coalesced)
Step 3: Buffer removed from free list (A + B + C coalesced)
Step 4: Buffer placed back on the free list
16Windows Heap Internals
- Free Algorithm Coalesce
- Where coalesce cannot happen
- Chunk to be freed is virtually allocated
- Chunk to be freed will be put on Lookaside
- Chunk to be coalesced with is busy
- Highest bit in chunk flags is set
17Windows Heap Internals
- Free Algorithm Coalesce (cont)
- Where coalesce cannot happen
- Chunk to be freed is first, so there is no backward coalesce
- Chunk to be freed is last, so there is no forward coalesce
- The size of the coalesced chunk would be > 508K
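The conditions from the two coalesce slides can be gathered into one predicate. This Python sketch is a simplification: the parameter names are hypothetical, `neighbor_busy=None` is used to mean "there is no neighbour in that direction" (the chunk is first or last), and applying the highest-flag-bit test to the chunk being freed is an assumption, since the slide does not say which chunk's flags are meant.

```python
from typing import Optional

COALESCE_LIMIT = 508 * 1024   # coalesced chunk may not exceed 508K
HIGHEST_FLAG_BIT = 0x80


def can_coalesce(chunk_flags: int, is_virtual: bool, goes_to_lookaside: bool,
                 neighbor_busy: Optional[bool], combined_size: int) -> bool:
    """Return True if the freed chunk may merge with one neighbour."""
    if is_virtual or goes_to_lookaside:
        return False
    if chunk_flags & HIGHEST_FLAG_BIT:
        return False
    if neighbor_busy is None:     # first or last chunk: nothing to merge with
        return False
    if neighbor_busy:
        return False
    return combined_size <= COALESCE_LIMIT


if __name__ == "__main__":
    print(can_coalesce(0x00, False, False, False, 4096))
    print(can_coalesce(0x00, False, True, False, 4096))
```

The second call shows why chunks headed for a lookaside list never coalesce, regardless of the state of their neighbours.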
18Windows Heap Internals
- Summary Questions?
- Just remember
- Lookasides are allocated from and freed to before free lists
- FreeLists[0] is mainly used for 1K < ChunkSize < 512K
- Coalescing only happens for entries going onto a FreeList, not a lookaside list
- Entries on a certain lookaside will stay there until they are allocated from
19Heap Exploitation Basic Terms
- 4-byte Overwrite
- Able to overwrite any arbitrary 32-bit address (WhereTo) with an arbitrary 32-bit value (WithWhat)
- 4-to-n-byte Overwrite
- Using a 4-byte overwrite to indirectly cause an overwrite of an arbitrary n bytes
20Arbitrary Memory Overwrite Explained
- Coalesce-On-Free 4-byte Overwrite
- Utilize coalescing algorithms of the heap
- This is the method first discussed by Oded and me at CSW04; it is our preferred method for reliable heap exploitation on all versions < XPSP2
- Just make sure to fill Lookaside[ChunkSize] (put 4 entries on heap) before freeing a chunk of ChunkSize to ensure coalescing
- Arbitrary overwrite happens when the overflowed buffer gets freed
[Diagram: overflowed chunk, with the overflow start marked]
21Arbitrary Memory Overwrite
- Lookaside List Head Overwrite
- 4-to-n-byte overwrite
- What we want to do is overwrite a Lookaside list head and then allocate from it
- We must be the first one to allocate that size
- We will get a chunk back pointing to whatever location in memory we want
- Use this to overwrite a function pointer or put the shellcode at a known writable location
22Arbitrary Memory Overwrite
- Lookaside List Head Overwrite: How To
- Use the Coalesce-on-Free Overwrite, with these values:
- FakeChunk.Blink = Lookaside[ChunkSize], where ChunkSize is a pretty infrequently allocated size
- FakeChunk.Flink = what we want a pointer to
- To calculate the FakeChunk.Blink value:
- LookasideTable = HeapBase + 0x688
- Index = (ChunkSize/8)+1
- FakeChunk.Blink = LookasideTable + Index * EntrySize (0x30)
- Set FakeChunk.Flags = 0x20, FakeChunk.Index = 1-63, FakeChunk.PreviousSize = 1, FakeChunk.Size = 1
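The address arithmetic on this slide can be checked with a short Python calculation. The constants 0x688 and 0x30 and the index formula come from the slide; the function name and the heap base used in the example are hypothetical values chosen only to show the computation.

```python
LOOKASIDE_TABLE_OFFSET = 0x688   # LookasideTable = HeapBase + 0x688
LOOKASIDE_ENTRY_SIZE = 0x30      # EntrySize


def lookaside_entry_address(heap_base: int, chunk_size: int) -> int:
    """Address of the lookaside entry for chunk_size, per the slide's formula."""
    index = chunk_size // 8 + 1
    lookaside_table = heap_base + LOOKASIDE_TABLE_OFFSET
    return lookaside_table + index * LOOKASIDE_ENTRY_SIZE


if __name__ == "__main__":
    # Hypothetical heap base; real values depend on the process.
    print(hex(lookaside_entry_address(0x00070000, 512)))
```

For a 512-byte chunk the index is 65, so the entry sits 0x688 + 65 * 0x30 = 0x12B8 bytes past the heap base in this example.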
23Exploitation Made Simple
- Overwrite PEB lock routine to point to PEB space
- Put shellcode into PEB space
- Then cause the PEB lock routine to execute
[Diagram: PEB page layout]
PEB header
PEB lock/unlock function pointers: 0x7ffdf020, 0x7ffdf024
0x7ffdf130: 1K of payload
24Exploitation Made Simple
- Win2K through WinXP SP1 in a single attempt
- First 4-byte overwrite
- Blink = 0x7ffdf020
- Flink = 0x7ffdf154
- 4-to-n-byte overwrite
- Blink = Lookaside[(n/8)+1]
- Flink = 0x7ffdf154
- Be the first to allocate n bytes (cause HeapAlloc(n))
- Put your shellcode into the returned buffer
- All done! Either wait, or cause a crash immediately
- For example, do a 4-byte overwrite with Blink = 0xABABABAB
25Exploitation Made Simple
- Forcing Shellcode To Run
- Most applications (read: everyone but MSSQL) don't specially handle access violations
- An access violation results in ExitProcess() being called
- Once the process attempts to exit, ExitProcess() is called
- The first thing ExitProcess() does is call the PEB lock routine
- Thus, causing a crash = instant shellcode execution
- Nice
26Exploitation Made Simple
27Heap Exploitation
- Questions?
- The technique we just covered is very reliable, providing success almost every time on all Win2K (all service packs) and WinXP (up to, but not including, SP2)
- On to XP SP2.
28XP Service Pack 2
- Effects on Heap Exploitation
- New low fragmentation heap for chunks < 16K
- PEB shuffling (aka randomization)
- New security cookie in each heap chunk
- Safe unlinking (usually) stops 4-byte overwrites
29XP Service Pack 2
- PEB Randomization
- In theory, it could have a big impact on heap exploitation, though not in reality
- Prior to XP SP2, the PEB used to always be at the highest page available (0x7ffdf000)
- The first (and ONLY the first) TEB is also randomized
- They seem to never be below 0x7ffd4000
30XP Service Pack 2
- PEB Randomization Does it make any difference?
- Not much, randomization is definitely a misnomer
- If 2 threads are present
- We can write to 0x7ffdf000-0x7ffdffff, and
- 2 other pages between 0x7ffd4000-0x7ffdefff
- If 3 threads are present
- 0x7ffde000-0x7ffdffff
- 2 other pages between 0x7ffd4000-0x7ffdefff
- ...
- If 11 threads are present
- 100% success, no empty pages
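To see how small the randomization range is, the candidate pages can simply be enumerated. This Python sketch assumes 4K pages and uses the two bounds quoted on the slides (never below 0x7ffd4000, highest page at 0x7ffdf000); the names are hypothetical.

```python
PAGE_SIZE = 0x1000
LOWEST_PAGE = 0x7ffd4000    # "they seem to never be below" this address
HIGHEST_PAGE = 0x7ffdf000   # pre-SP2 fixed PEB location


def candidate_pages():
    """List every page base in the observed PEB/TEB range, lowest first."""
    return list(range(LOWEST_PAGE, HIGHEST_PAGE + PAGE_SIZE, PAGE_SIZE))


if __name__ == "__main__":
    pages = candidate_pages()
    print(len(pages), hex(pages[0]), hex(pages[-1]))
```

The range covers only 12 pages, and each additional thread occupies another page in it with its TEB, which is why a process with enough threads leaves no page in the range empty.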
31XP Service Pack 2
- PEB Randomization Summary
- Provides little protection for:
- Any application that has m workers per n connections (IIS? Exchange?)
- Any service in dllhost/services/svchost or any other active surrogate process
32XP Service Pack 2
[Diagram: XP SP2 chunk header compared with the current (pre-SP2) header; reminder: overflow direction]
33XP Service Pack 2
- Heap header cookie calculation
- If ((AddressOfChunkHeader / 8) XOR Chunk->Cookie XOR Heap->Cookie != 0) CORRUPT
- Since the cookie has only 8 bits, it has 2^8 = 256 possible keys
- We'll randomly guess the security cookie, on average, 1 of every 256 attempts
34XP Service Pack 2
- On the normal WinXP SP2 system, corrupting a chunk will do nothing
- Since we only overwrite the Flink/Blink of the chunk, we corrupt no other chunks
- Thus we can keep trying until we run out of memory
35XP Service Pack 2
- Summary so far
- At this point, we see that we can, with enough time, trivially defeat all the other protection mechanisms.
- On to safe unlinking
36XP Service Pack 2
- Safe Unlinking
- Safe unlinking means that RemoveListEntry(B) will make this check:
- (B->Flink)->Blink == B && (B->Blink)->Flink == B
- In other words:
- C->Blink == B && A->Flink == B
- Can it be evaded? Yes, in one particular case.
[Diagram: header to free]
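The safe unlinking check can be illustrated with a small doubly-linked list model in Python. The `Entry` class and helper names are hypothetical; the sketch only shows that the check passes for an intact list A, B, C and fails once B's forward pointer has been tampered with.

```python
class Entry:
    def __init__(self, name):
        self.name = name
        self.flink = self   # an empty circular list points to itself
        self.blink = self


def link_after(prev, entry):
    """Insert entry into the circular list right after prev."""
    entry.flink = prev.flink
    entry.blink = prev
    prev.flink.blink = entry
    prev.flink = entry


def safe_unlink(b):
    """Remove b, refusing if its neighbours do not point back at it."""
    if b.flink.blink is not b or b.blink.flink is not b:
        raise ValueError("list entry corrupted")
    b.blink.flink = b.flink
    b.flink.blink = b.blink


if __name__ == "__main__":
    a, b, c = Entry("A"), Entry("B"), Entry("C")
    link_after(a, b)
    link_after(b, c)
    safe_unlink(b)
    print(a.flink.name, c.blink.name)
```

Without the check, the two pointer writes at the end of `safe_unlink` would go wherever the tampered pointers lead, which is the 4-byte overwrite described earlier in the talk.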
37XP Service Pack 2
- UnSafe-Unlinking FreeList Overwrite Technique
- p = HeapAlloc(n)
- FillLookaside(n)
- HeapFree(p)
- EmptyLookaside(n)
- Overwrite p[0] (somewhere on the heap) with:
- p->Flags = Busy (to prevent accidental coalescing)
- p->Flink = (BYTE *)ListHead[(n/8)+1] - 4
- p->Blink = (BYTE *)ListHead[(n/8)+1] + 4
- HeapAlloc(n) // defeats safe unlinking (ignore result)
- p = HeapAlloc(n) // defeats safe unlinking
- // p now points to ListHead[(n/8)].Blink
38XP Service Pack 2
- Defeating Safe Unlinking (before overwrite)
[Diagram: FreeChunk linked into ListHead[n], shown between ListHead[n-1] and ListHead[n+1]; each entry has Flink at offset +0 and Blink at offset +4]
39XP Service Pack 2
- Defeating Safe Unlinking Step 1 (Overwrite)
[Diagram: the same lists after FreeChunk's Flink and Blink have been overwritten; each entry has Flink at offset +0 and Blink at offset +4]
Now call HeapAlloc(n) to unlink FreeChunk from ListHead:
FreeChunk->Blink->Flink == *(*(FreeChunk+4)+0)
FreeChunk->Flink->Blink == *(*(FreeChunk+0)+4)
Both point to FreeChunk, unlink proceeds!
40XP Service Pack 2
- Defeating Safe Unlinking Step 2 (1st alloc)
[Diagram: ListHead[n-1], ListHead[n], and ListHead[n+1] after the first allocation; FreeChunk is no longer linked]
FreeChunk->Blink->Flink = FreeChunk->Flink
FreeChunk->Flink->Blink = FreeChunk->Blink
Returns pointer to previous FreeChunk
41XP Service Pack 2
- Defeating Safe Unlinking Step 3 (2nd alloc)
[Diagram: ListHead[n-1], ListHead[n], and ListHead[n+1] after the second allocation]
Returns pointer to ListHead[n-1].Blink
Now the FreeLists point to whatever data the user puts in it
42XP Service Pack 2
43XP Service Pack 2
- Unsafe-Unlinking FreeList Overwrite Technique
- For vulnerabilities where you can control the allocation size, safe unlinking can be evaded.
- But is this reliable? Hardly.
44XP Service Pack 2
- Unsafe-Unlinking FreeList Overwrite Technique (cont)
- We have to flood the heap with this repeating 8-byte sequence:
- [FreeListHead-4][FreeListHead+4]
- And hope the chunk's Flink/Blink pair is within the range we can overflow
- But there is an even easier method
45XP Service Pack 2
- Chunk-on-Lookaside Overwrite Technique
- In fact on XP SP2, there is an even easier method
- Lookaside lists take precedence over free lists
- This is quite convenient because
- Lookaside lists (singly linked) are easier to
exploit than the free lists (doubly linked)
46XP Service Pack 2
- Chunk-on-Lookaside Overwrites
- HeapAlloc checks the lookaside before the free list
- There is no check to see if the cookie was overwritten since it was freed
- It is a singly-linked list, thus the safe unlinking check doesn't apply
- Result: a clean exploitation technique (albeit with brute-forcing required)
47XP Service Pack 2
- Chunk-on-Lookaside Overwrites (Technique Summary)
- // We need at least 2 entries on lookaside
- a_n0 = HeapAlloc(n)
- a_n1 = HeapAlloc(n)
- HeapFree(a_n1)
- HeapFree(a_n0)
- Overwrite a_n0 (somewhere on the heap) with:
- a_n0.Flags = Busy (to prevent accidental coalescing)
- a_n0.Flink = AddressWeWant
- HeapAlloc(n) // discard, this returns a_n0
- p = HeapAlloc(n)
- // p now points to AddressWeWant
48XP Service Pack 2
- Chunk-on-Lookaside Overwrite: Success rate?
- Requires overwriting a chunk already freed to the lookaside
- If an attacker overflows a buffer repeatedly, how many times will he/she need to do so before succeeding?
49XP Service Pack 2
- Chunk-on-Lookaside Overwrite Empirical results
- 64K heap with 1 segment
- All chunk sizes between 8-1024 bytes
- Max overflow size: 1016 bytes
- Random number of allocs between 10-1000
- Free probability of 50%
- Took an average of 84 allocations to be within overflow range
- It will take at least 2 overwrites (one to overwrite a function pointer, one to place shellcode)
50XP Service Pack 2
- Chunk-on-Lookaside Overwrite Empirical results
- Application-specific function pointer and writable location for shellcode
- 84*2 = 168 attempts to execute shellcode
- Using PEB lock routine + PEB space (application generic)
- 84*2*12 = 2,016 attempts to execute shellcode
- The 12 is for the 12 possible locations of the PEB due to PEB randomization
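The attempt counts are simple products, reproduced here in Python so the factors are explicit. The three inputs are the figures quoted on the slides; the variable names are hypothetical.

```python
ALLOCS_TO_REACH_RANGE = 84   # average allocations to land within overflow range
OVERWRITES_NEEDED = 2        # one for the function pointer, one for shellcode
PEB_LOCATIONS = 12           # possible PEB pages under PEB randomization

# Application-specific function pointer and shellcode location.
app_specific_attempts = ALLOCS_TO_REACH_RANGE * OVERWRITES_NEEDED

# Application-generic: PEB lock routine plus PEB space, PEB page unknown.
generic_attempts = app_specific_attempts * PEB_LOCATIONS

if __name__ == "__main__":
    print(app_specific_attempts, generic_attempts)
```

The factor of 12 is the whole cost added by PEB randomization in this estimate.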
51XP Service Pack 2
- Chunk-on-Lookaside Overwrite Summary
- A non-application-specific heap exploit will take about 2000 attempts to work reliably
- But now ask yourself: how long does it take to generate 2000 heap overwrite attempts?
- Let's be overly conservative and assume 5 minutes
- That will really slow down a worm
- But will it help you if someone is specifically trying to hack your machine?
52XP Service Pack 2
- Low Fragmentation Heap (LFH)
- Looks really solid, kudos to its author
- Uses 32-bit cookie
- Obscures address of Lookaside list heads
- ChunkSizes = *((DWORD *)Chunk) // (ChunkSize<<16|PrevChunkSize)
- pLookasideEntry = (DWORD)Chunk / 8
- pLookasideEntry ^= Lookaside->Key
- pLookasideEntry ^= ChunkSizes
- pLookasideEntry ^= RtlpLFHKey
53XP Service Pack 2
- Low Fragmentation Heap (LFH)
- The RtlpLFHKey is a show stopper
- push eax
- call _RtlRandomEx@4
- mov _RtlpLFHKey, eax
- lea eax, [ebp+var_4]
- push eax
- call _RtlRandomEx@4
- imul eax, _RtlpLFHKey
- push esi
- mov _RtlpLFHKey, eax
54XP Service Pack 2
- Low Fragmentation Heap (LFH)
- Must be enabled manually (via NTDLL!RtlSetHeapInformation or KERNEL32!HeapSetInformation)
- It is used for chunks < 16K
- It is not used by anything on XP SP2 Professional
- What irony
55Summary
- Win2K through WinXP SP1
- Fixed heap base and fixed PEB allow for writing very stable exploits
- Overwriting FreeList/Lookaside list heads gives us the ability to overwrite any writable address with 1K of data
56Summary
- WinXP SP2
- Decreases reliability (more bruteforcing is necessary)
- But with enough time, exploitation will still succeed
- XP SP2 will really slow worm propagation, but not help a targeted victim
- ...
57Summary
- WinXP SP2
- Heap corruption handling is weak
- PEB randomization is weak
- Safe unlinking is evadable
- Non-LFH cookie checks are weak
- LFH looks good
58Summary
- Solutions
- Use low fragmentation heap by default
- Just be sure it is the lowest address on the heap
- Expand PEB randomization over 1MB or so
- Most machines have 1GB RAM these days
- Inform user if heap corruption exceeds a threshold
- If I have an application with 50 corrupt chunks in 60 seconds, I want to know someone is owning me
- Check security cookies on allocation also
59Summary
- The eventual death of 4 byte overwrites
- Whether an attacker can predict the ChunkSize/PrevSize or not, he/she won't be able to predict a larger security cookie (like LFH has).
- Heap exploits will focus more on attacking application data on the heap (not the heap itself)