Title: How much does Exception Handling cost, really?
1. How much does Exception Handling cost, really?
- Kevin Frei
- Visual C++ Code Generation Tools
- http://blogs.msdn.com/freik
2. Reasons for this talk (too many assumptions)
- Pros of EH I've heard
- More centralized error handling & recovery
- More robust code
- More readable code
- Cons of EH I've heard
- Can result in people not thinking about error conditions
- Can make error recovery difficult (must put handler in the right place)
- Enables abuse of exceptions
3. Summary of the previous Pros & Cons
- They can all be dealt with
- Coding Convention enforcement
- Code Reviews
- Good initial architecture
- Consistent API designs
4. #1 reason I hear to not use EH
- "Exception handling makes my code too slow"
- May be true, but may also be masking a more serious problem
- Some Facts
- EH performance cost is dependent on the runtime, CPU architecture, and ABI/OS specifics.
- You can't simply examine source code to determine performance impact.
- Deciding whether to use EH should depend on the team, the libraries you're using, and a myriad of other issues.
5. Classes of Code Quality impact
- Usage Penalty: "EH tax"
- General overhead of a function with any EH construct
- Cost of entering a protected region
- __try, try, C++ object with a destructor
- Cleanup costs
- __finally invocation
- C++ object destructors
- Optimization constraints
- Cost of actually handling an exception
- If you're really concerned about this, you're probably abusing exceptions.
6. EH tax for Structured Exception Handling
- X86
- All functions with SEH contain a complex prolog & epilog
- X64
- No required cost to the function itself
7. EH tax for C++ exception handling
- X86
- All functions with C++ EH contain a complex prolog & epilog
- X64
- 1 additional DWORD allocated on stack, initialized to -2, never again used in the function's code
- It's used by the C++ runtime in the event of an exception being thrown or caught.
8. Protected Region entry & exit costs
- X86
- Entry & exit from any protected region requires a 1 or 4 byte constant value written to the stack
- /EHs can reduce this cost
- /EHa may be required by your code base, though
- X64
- If an entry or an exit is preceded by a call, there is a single byte NOP to properly identify region boundaries
- Entry preceded by a call is pretty common for C++ EH (constructors)
9. Non-exception cleanup costs
- X86
- SEH: __finally clause is called
- current implementation, not required
- call/ret overhead
- Some other minor register allocation issues
- C++ EH: Destructor invoked inline (C++ standard)
- Destructor can be inlined, based on compiler (& user) decision
- X64
- SEH: __finally clause inlined, zero overhead
- again, current implementation, not required
- C++ EH: same as x86
10. Optimization Constraints Disclaimer
- Consider the complete alternative solution!
- HRESULT checking is messy, and error prone
- The "goto" solution to handle termination can result in pessimized dataflow
- Most optimizations that must be constrained for EH should be constrained for implementations that don't use EH.
11. Optimization constraints
- Mandatory optimization constraints
- Limitations required by the language standard
- ABI specific limitations
- Current Implementation constraints
- I'll focus on UTC (current optimizer) in VC8
- Code base from VC5 origins.
- Many constraints that exist in earlier versions have been removed
12. Mandatory optimization constraints: Language specific limitations
- The C++ language standard does not specify anything about non-C++ "throw" exceptions!
- The C language standard does not specify anything about exceptions at all, really.
- I know nothing about C99
13. Language specific limitations: C++
- Flow from trys to catch (and out)
- Results in additional flow edges at call sites that may throw exceptions
- Variable values must be updated accordingly
- Slightly less constant propagation, common sub-expression elimination, dead stores, etc.
- /EHs: assume only the C++ throw statement can cause an exception
- Prior to VC8.0, you could compile /EHs, and even with an AV, most destructors would be invoked.
- For VC8.0 /EHs:
- If you throw a C++ exception, destructors will be run.
- If any other exception occurs, no destructors will run.
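
The extra flow edges described above can be pictured with a small standard C++ sketch. The names `may_throw` and `last_completed_step` are hypothetical and exist only for this illustration. Each call that may throw is a possible exit into the catch block, so the store to `step` before each call has to be completed before the call is made.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper: stands in for any call that may throw.
void may_throw(bool fail) {
    if (fail) throw std::runtime_error("failed");
}

// Returns the value of `step` seen by whichever path leaves the function.
int last_completed_step(bool fail_first, bool fail_second) {
    int step = 0;
    try {
        step = 1;                // must be stored before the call: the catch may read it
        may_throw(fail_first);   // flow edge #1 into the catch block
        step = 2;                // cannot be merged with the store above
        may_throw(fail_second);  // flow edge #2 into the catch block
        step = 3;
    } catch (const std::runtime_error&) {
        assert(step == 1 || step == 2);  // reached only through one of the two edges
        return step;
    }
    return step;
}
```

Without the try block, a compiler would be free to treat `step = 1` and `step = 2` as dead stores; with it, both values are observable.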
14. Language specific limitations: /EHa
- /EHa: all exceptions should be considered when destroying C++ objects
- Results in far more potential flow from a try block to a catch block
- Less stack packing (no stack pack prior to VC8)
- Much less constant propagation, common sub-expression elimination, etc.
15. Quick /EHc description
- Only has impact with /EHs
- Tells the compiler that any extern "C" function will not throw any C++ exceptions
- Win32 API calls fall under this class
- Sometimes true, sometimes not: be careful.
- Only side effect is pruning a few additional edges in the flow graph
- A few more opportunities for optimization
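
As a rough analogy only, and not a description of how /EHc is implemented, standard C++ lets a single function make the same promise with `noexcept`. The sketch below assumes a hypothetical C-style function `c_api_add_one`; the `Cleanup` type and `call_c_api` are also made up for the illustration.

```cpp
#include <cassert>

// Hypothetical C-style API. noexcept states, per function, the promise that
// /EHc assumes for every extern "C" function.
extern "C" int c_api_add_one(int x) noexcept { return x + 1; }

// The same function without the promise.
int cpp_add_one(int x) { return x + 1; }

// A local object with a destructor makes the enclosing function a protected region.
struct Cleanup {
    int* counter;
    ~Cleanup() { ++*counter; }
};

int call_c_api(int x, int& cleanups) {
    Cleanup c{&cleanups};
    return c_api_add_one(x);  // callee promises not to throw: no unwind edge needed here
}

static_assert(noexcept(c_api_add_one(0)), "promise is visible to the compiler");
static_assert(!noexcept(cpp_add_one(0)), "no promise, so a throw must be assumed");
```

As the slide warns, the promise is only as good as the code behind it: if a function marked this way does throw, standard C++ calls `std::terminate`.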
16. Mandatory Optimization Constraints: Win32/Win64 ABI specific limitations
- Tail-call (call/return -> jump) is illegal inside a protected region
- Instruction level performance hit is typically negligible
- Stack usage increase (can be serious)
- Instruction scheduling constraints
- Scheduling into & out of handler regions is limited
- rarely worth doing, even if it is legal
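
A minimal portable sketch of why a call inside a protected region cannot become a jump. `Guard`, `leaf`, `guarded`, `unguarded` and the `g_log` trace are hypothetical names for this illustration. The destructor has to run after the callee returns, so the caller's frame must stay alive across the call.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> g_log;  // hypothetical trace of the order in which things happen

struct Guard {
    ~Guard() { g_log.push_back("~Guard"); }
};

int leaf() {
    g_log.push_back("leaf");
    return 42;
}

// leaf() is the last statement, but it sits inside Guard's protected region:
// ~Guard must run after leaf() returns (or throws), so the call cannot
// simply be turned into a jump.
int guarded() {
    Guard g;
    return leaf();
}

// No cleanup is pending here, so a compiler is free to emit a jump to leaf.
int unguarded() {
    return leaf();
}
```

The stack usage point follows from the same picture: in a deep or recursive call chain, every `guarded`-style frame stays on the stack until its callee is finished.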
17. VC8.0 optimization constraints
- No impact on any functions that do not contain some EH construct
- Sometimes requires the programmer to add volatile to get required constraints to occur in a function invoked inside a try
- Exception handling is only one of a large number of things that can artificially constrain optimizations
- setjmp/longjmp (old school EH in C)
- __alloca
- __declspecs
- /GS
- /fpexcept, /fpprecise, /fprestrict
- Many many more.
18. VC8.0 optimization constraints: Specifics
- Late flow optimizations for x64
- Primarily head & tail merging
- Loop optimizer disabled (all platforms) for any function with a try/__try
- Loop unrolling/peeling
- Induction variable creation
- Some strength reduction
- Doesn't impact functions with only C++ objects!
- Stack Packing restrictions
- Prior to VC8, all variables inside a try block were written back to the stack whenever their values were updated
- With VC8, only variable values that may be visible outside of the try are written back to the stack.
19. Source code used for samples
- SEH Version
- void seh_finally() {
-   init();
-   __try {
-     foo();
-     bar();
-     blah();
-   } __finally {
-     done();
-   }
- }
- C++ Version
- struct obj {
-   obj() { init(); }
-   ~obj() { done(); }
- };
- void cpp_dtor() {
-   obj a;
-   foo();
-   bar();
-   blah();
- }
- No EH Version
- int noeh_cleanup() {
-   int result = 0;
-   init();
-   result = foo_err();
-   if (result)
-     goto fail;
-   result = bar_err();
-   if (result)
-     goto fail;
-   result = blah_err();
- fail:
-   done();
-   return result;
- }
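
The C++ and No EH samples above can be compared in portable C++ with a short sketch. Everything beyond `obj`, `init`, `done` and `noeh_cleanup` is an assumption made for the illustration: `step` and `step_err` stand in for `foo`/`bar`/`blah` and their `_err` variants, `g_trace` records the call order, and `cpp_dtor_checked` adds a catch so that both flavors return an error code.

```cpp
#include <cassert>
#include <string>

std::string g_trace;  // hypothetical call trace, one letter per call

void init() { g_trace += 'i'; }
void done() { g_trace += 'd'; }

// Error-code flavor: returns nonzero on failure (stand-in for foo_err and friends).
int step_err(char name, bool fail) { g_trace += name; return fail ? 1 : 0; }

// Exception flavor: throws on failure (stand-in for foo and friends).
void step(char name, bool fail) { g_trace += name; if (fail) throw 1; }

struct obj {
    obj() { init(); }
    ~obj() { done(); }
};

// C++ EH flavor: cleanup is attached to the lifetime of `a`.
int cpp_dtor_checked(bool bar_fails) {
    try {
        obj a;
        step('f', false);
        step('b', bar_fails);
        step('l', false);
    } catch (int code) {
        return code;
    }
    return 0;
}

// No EH flavor: every call site checks and jumps to the shared cleanup.
int noeh_cleanup(bool bar_fails) {
    int result = 0;
    init();
    result = step_err('f', false);
    if (result) goto fail;
    result = step_err('b', bar_fails);
    if (result) goto fail;
    result = step_err('l', false);
fail:
    done();
    return result;
}
```

Both functions make the same calls in the same order on the success path and on the failure path; the difference is where the checks and the cleanup call appear in the source.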
20. Generated code for x86 SEH /O2
- push ebp
- mov ebp, esp
- push -1
- push OFFSET __sehtable$?seh_finally@@YAXXZ
- push OFFSET __except_handler3
- mov eax, DWORD PTR fs:0
- push eax
- mov DWORD PTR fs:0, esp
- sub esp, 8 ; End Prolog
- call init
- mov DWORD PTR __$SEHRec$[ebp+20], 0 ; Enter __try
- call foo
- call bar
- call blah
- mov DWORD PTR __$SEHRec$[ebp+20], -1 ; Exit __try
- call seh_finally_funclet ; Invoke __finally
- mov ecx, DWORD PTR __$SEHRec$[ebp+8] ; Begin Epilogue
- mov DWORD PTR fs:0, ecx
- mov esp, ebp
21. Generated code for x86 SEH /O1
- push 8
- push OFFSET __sehtableseh_finally
- call __SEH_prolog ; End Prologue
- call init
- and __$SEHRec$[ebp+20], 0 ; Entry __try
- call foo
- call bar
- call blah
- or __$SEHRec$[ebp+20], -1 ; Exit __try
- call seh_finally_funclet ; Invoke __finally
- call __SEH_epilog ; Begin Epilogue
- ret 0
- seh_finally_funclet:
- call done
- ret 0
22. Generated code for x86 C++ /O2
- push -1
- push __ehhandler$?cpp_dtor@@YAXXZ
- mov eax, DWORD PTR fs:0
- push eax
- mov DWORD PTR fs:0, esp ; End Prologue
- push ecx ; allocate space for obj
- call init ; obj() inlined
- mov DWORD PTR __$EHRec$[esp+24], 0 ; Enter try
- call foo
- call bar
- call blah
- mov DWORD PTR __$EHRec$[esp+24], -1 ; Exit try
- call done ; ~obj() inlined
- mov ecx, DWORD PTR __$EHRec$[esp+16] ; Begin Epilogue
- mov DWORD PTR fs:0, ecx
- add esp, 16
- ret 0
23. Generated code for x86 C++ /O1
- mov eax, __ehhandler$?cpp_dtor@@YAXXZ
- call __EH_prolog ; End Prologue
- push ecx ; allocate space for obj
- call init ; obj() inlined
- and DWORD PTR __$EHRec$[ebp+8], 0 ; Entry try
- call foo
- call bar
- call blah
- or DWORD PTR __$EHRec$[ebp+8], -1 ; Exit try
- call done ; ~obj() inlined
- mov ecx, DWORD PTR __$EHRec$[ebp] ; Begin Epilogue
- mov DWORD PTR fs:0, ecx
- leave
- ret 0
24. Generated code for x86 No EH (/O1 & /O2 are basically identical)
- push esi ; Save nonvolatile register for result
- call init
- call foo_err
- mov esi, eax ; Save return code
- test esi, esi ; Return code check
- jne SHORT fail
- call bar_err
- mov esi, eax ; Save return code
- test esi, esi ; Return code check
- jne SHORT fail
- call blah_err
- mov esi, eax ; Save return code
- fail:
- call done
- mov eax, esi ; Return result
- pop esi
- ret 0
25. Generated code for x64 SEH
- sub rsp, 40 ; End Prologue
- call init
- nop
- call foo ; First instruction of __try
- call bar
- call blah
- nop ; Last instruction of __try
- call done ; __finally invoked inline
- add rsp, 40 ; Begin Epilogue
- ret 0
26. Generated code for x64 C++ EH
- sub rsp, 56 ; End Prologue
- mov QWORD PTR $T[rsp], -2 ; C++ setup
- call init
- nop
- call foo ; First instruction of try
- call bar
- call blah
- nop ; Last instruction of try
- add rsp, 56 ; Begin Epilogue
- jmp done ; ~obj() inlined & tail called
27. Generated code for x64 No EH
- push rbx ; Save nonvolatile register for result
- sub rsp, 32 ; End Prologue
- call init
- call foo_err
- mov ebx, eax ; Save return code
- test eax, eax ; Return code check
- jne SHORT fail
- call bar_err
- mov ebx, eax ; Save return code
- test eax, eax ; Return code check
- jne SHORT fail
- call blah_err
- mov ebx, eax ; Save return code
- fail:
- call done
- mov eax, ebx ; Get return code
- add rsp, 32
- pop rbx ; Restore nonvolatile register
- ret 0
28. Costs of handling an exception
- Disclaimer
- If you are really concerned about this, there is a good chance you're abusing or misusing exceptions.
- Exceptions are not for dealing with standard scenarios! Performance of exceptions is generally stacked in favor of the non-exceptional case
- There's a reason the term is "exception"!
29. Costs of handling an exception: X86 Win32 SEH & C++ EH
- Without /SAFESEH (this is a big no-no: potential security hole)
- O(n)
- n is the number of frames on the stack with a protected region between throw & catch
- Walk a linked list of elements on fs:0
- Invoke filters to determine handler
- C++ type check is just a special filter
- Walk the list again, invoking __finally funclets & destructors
- Finally, jump to __except block or call catch block
- With /SAFESEH (this is good)
- O(n log(m))
- n is the number of frames on the stack with a protected region between throw & catch
- m is the number of EH entry points in the entire program
- For SEH, only 1. For C++ EH, one for each function!
- Walk a linked list of elements on fs:0
- For each element, verify the callback is in a list: O(log(m))
- Invoke the filter to determine the handler
- Walk the list again, invoking __finallys, with callback verification: O(log(m))
30. Costs of handling an exception: x64 Win64 SEH & C++ EH
- O(n log(m))
- n is the number of functions on the stack between throw & catch (not just the number with EH code in them!)
- m is the number of distinct regions in the image (.pdata size)
- Not just a function count: hot/cold sections and register allocation regions can increase this pretty dramatically (1-4x)
- Walk each function frame on the stack: O(n)
- Find its .pdata entry to get its unwind information: O(log(m))
- If it has a filter, call it to determine the handler
- Restore nonvolatile registers as described in the unwind information
- Once a handler has been determined
- Walk the stack again (using .pdata lookup)
- Each frame that has cleanup code, invoke the finallys or destructors
- Jump to handler (or call catch)
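
The two-pass walk above can be modeled in a few lines. This is a toy model under stated assumptions, not the operating system's unwinder: `FunctionEntry`, `lookup`, `dispatch` and `UnwindResult` are hypothetical names, the table is a plain sorted vector standing in for .pdata, and register restoration is left out. It only shows where the O(n) frame walk and the O(log(m)) lookup per frame come from.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one .pdata entry: a code range plus EH flags.
struct FunctionEntry {
    std::uint32_t begin;  // first address of the region
    std::uint32_t end;    // one past the last address
    bool has_handler;     // region can handle the exception
    bool has_cleanup;     // region has a finally/destructor to run
};

// O(log m): binary search a table sorted by `begin` for the region containing pc.
const FunctionEntry* lookup(const std::vector<FunctionEntry>& table, std::uint32_t pc) {
    auto it = std::upper_bound(table.begin(), table.end(), pc,
        [](std::uint32_t addr, const FunctionEntry& e) { return addr < e.begin; });
    if (it == table.begin()) return nullptr;
    --it;
    return pc < it->end ? &*it : nullptr;
}

struct UnwindResult {
    int handler_frame = -1;  // index of the stack frame that handles the exception
    int cleanups_run = 0;    // cleanup funclets invoked during the second pass
    int lookups = 0;         // total table lookups performed
};

// stack[0] is the throwing frame; later entries are its callers.
UnwindResult dispatch(const std::vector<FunctionEntry>& table,
                      const std::vector<std::uint32_t>& stack) {
    // Debug-only sanity check: binary search needs a sorted table.
    assert(std::is_sorted(table.begin(), table.end(),
        [](const FunctionEntry& a, const FunctionEntry& b) { return a.begin < b.begin; }));
    UnwindResult r;
    // Pass 1: walk every frame, O(n), looking for a handler.
    for (std::size_t i = 0; i < stack.size(); ++i) {
        const FunctionEntry* e = lookup(table, stack[i]);
        ++r.lookups;
        if (e && e->has_handler) { r.handler_frame = static_cast<int>(i); break; }
    }
    if (r.handler_frame < 0) return r;  // unhandled: no second pass in this model
    // Pass 2: walk again up to the handler, running cleanup code on the way.
    for (int i = 0; i < r.handler_frame; ++i) {
        const FunctionEntry* e = lookup(table, stack[static_cast<std::size_t>(i)]);
        ++r.lookups;
        if (e && e->has_cleanup) ++r.cleanups_run;
    }
    return r;
}
```

Note that frames with no EH code still cost a lookup in both passes, which is the point of the "not just the number with EH code in them" remark.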
31. Cost of handling an exception: x86 WoW64 SEH & C++ EH
- There is some degree of thunking between the 64 bit kernel and 32 bit subsystem, so performance really varies.
- Worst case, it's as slow as x64 on Win64.
- Best case, it's about the same as x86 on Win32.
- If you use exception handling in performance sensitive areas of code, you may notice a difference in your application
- If you do notice a difference, this should be a red flag regarding your use of exceptions.
32. Final gotchas (non-standard C++!)
- Some optimizations that are constrained inside of a try result in observable differences, based on program structure, compiler settings, and compiler implementation.
- int g; // add a volatile to fix the problem
- int *p;
- void func1() {
-   g = 0;
-   __try {
-     g = 1;
-     *p = 0;
-     g = 2;
-   } __except(1) {
-     printf("%d\n", g);
-   }
- }
- void update() {
-   g = 1;
-   *p = 0;
-   g = 2;
- }
- void func2() {
-   g = 0;
-   __try {
-     update();
-   } __except(1) {
-     printf("%d\n", g);
-   }
- }
33. Summary & Conclusions
- Do not use exceptions for normal program flow.
- Exception handling does have a performance cost
- Not always measurable
- Cost really depends on usage
- Frequently similar to what correct code would be, without EH
- at least in VC8
- Do not use exceptions for normal program flow.
- C++ is cheaper than SEH for cleanup in VC8.
- Use common sense, and knowledge of your team's strengths/weaknesses if you're mandating SEH/C++ EH/No EH
- New hires rarely know about SEH.
- Source level readability & visibility of performance
- And finally, do not use exceptions for normal program flow.
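
One way to read "do not use exceptions for normal program flow" is sketched below. The parser and both summing functions are hypothetical and exist only for this illustration. When bad input is routine, the first version pays for a throw, an unwind and a catch on every bad token, while the second treats the failure as an ordinary result.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parser: accepts only non-empty strings of decimal digits.
bool try_parse(const std::string& s, int& out) {
    if (s.empty()) return false;
    int value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return false;
        value = value * 10 + (c - '0');  // overflow is ignored to keep the sketch short
    }
    out = value;
    return true;
}

// Throwing wrapper around the same parser.
int parse_or_throw(const std::string& s) {
    int value = 0;
    if (!try_parse(s, value)) throw std::invalid_argument("not a number: " + s);
    return value;
}

// Anti-pattern: bad tokens are expected here, yet each one is reported by throwing.
int sum_with_exceptions(const std::vector<std::string>& tokens) {
    int sum = 0;
    for (const std::string& t : tokens) {
        try {
            sum += parse_or_throw(t);
        } catch (const std::invalid_argument&) {
            // skip the token: this is the normal flow, not an exceptional one
        }
    }
    return sum;
}

// Expected failures are handled with an ordinary check instead.
int sum_with_checks(const std::vector<std::string>& tokens) {
    int sum = 0;
    for (const std::string& t : tokens) {
        int value = 0;
        if (try_parse(t, value)) sum += value;
    }
    return sum;
}
```

Both functions return the same result; only the second keeps the exception machinery out of the common path.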
34. More info
- If you're looking for detailed ABI docs for X64, check my blog.
- http://blogs.msdn.com/freik
- Herb Sutter's got some good books on using exceptions with C++
- He doesn't give me kickbacks