Title: .NET
1 .NET Framework Performance Tips, Tricks and Tools
Peter Ty, Developer Evangelist, .NET and Developer Group
2 Outline
- Performance of .NET Framework for various application types
- ASP .NET Apps
- Windows.Forms Apps
- CLR drill down: GC, JIT, Transitions...
- Useful Performance Tools/Pointers
- CLR, ASP .NET Performance Counters
- Profilers
3 .NET Framework
Orchestration
.NET Enterprise Servers
Building Block Services
.NET Framework
Windows (CE, ME, 2000, and .NET)
4 Performance: Web Apps
- Porting and Measuring Real Customer Applications
- MSN, Dell, US West DSL ordering app, etc.
- Porting/measuring enterprise samples
- FMStocks, Duwamish, IBuySpy, etc.
- Measuring published benchmarks such as Doculabs Nile
- Benchmarking against technologies such as JSP
5 Nile Web Application Benchmark
- Application server: 1 Compaq ProLiant 8500, 550 MHz, 1.2 GB RAM
- Database server: 1 Compaq ProLiant 8500, 8 CPU, 550 MHz, 4 GB RAM
[Chart: Dynamic Pages Served per Second (axis 700 to 4900) for 8 CPU, 4 CPU, and 2 CPU configurations]
7 Web Apps: Why the Win?
- Early versus late bound (ASP) environment
- Compiled versus interpreted environment
- Output caching gives big wins in some scenarios
- Highly scalable Framework and foundation
- CLR (GC, JIT, Threadpool, etc.)
- Efficient ASP.NET HTTP pipeline
- Super efficient data access layer ADO.NET
8 Performance Best Practices: All Managed Apps
9 Minimize Throwing Exceptions
- Exception handling is fast in managed code
- Throwing exceptions degrades performance
- Perf counters tell you exactly how many your app is throwing
- Port Visual Basic 6 On Error/GoTo -> Try/Catch
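One way to keep the exception count down is to report expected failures, such as malformed input, through a return value instead of a throw. A minimal C# sketch, assuming a framework version where int.TryParse is available (it was added after V1); the ParseOrDefault helper name is hypothetical:

```csharp
using System;

static class ParsingExample
{
    // Hypothetical helper: returns fallback instead of throwing on bad input.
    public static int ParseOrDefault(string text, int fallback)
    {
        int value;
        // TryParse reports failure through its return value, so nothing is thrown.
        return int.TryParse(text, out value) ? value : fallback;
    }

    static void Main()
    {
        Console.WriteLine(ParseOrDefault("42", -1));           // 42
        Console.WriteLine(ParseOrDefault("not a number", -1)); // -1
    }
}
```

Exceptions remain the right tool for truly unexpected conditions; the sketch only moves a routine validation failure off the throw path.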
10 Use Early Binding
- Visual Basic and JScript support early and late binding
- Late binding can negate performance
- Late Binding requires work at runtime
- Late Binding (much slower)
Dim ds
ds = New DataSet
- Early Binding (better performance)
Dim ds As New DataSet
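The same distinction can be sketched in C#, where late binding has to be spelled out through reflection. This is an illustration under that assumption, not the slide's Visual Basic example; the class and method names are hypothetical:

```csharp
using System;
using System.Reflection;
using System.Text;

static class BindingExample
{
    // Early bound: the compiler resolves Append at compile time.
    public static string EarlyBound()
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("hi");
        return sb.ToString();
    }

    // Late bound: the method is looked up by name at run time,
    // which is the extra work the slide refers to.
    public static string LateBound()
    {
        object sb = new StringBuilder();
        MethodInfo append = sb.GetType().GetMethod("Append", new Type[] { typeof(string) });
        append.Invoke(sb, new object[] { "hi" });
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(EarlyBound());
        Console.WriteLine(LateBound());
    }
}
```

Both calls produce the same string; the late-bound version pays for the name lookup and argument boxing on every call.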
11 Make Chunky, Not Chatty, Calls
- P/Invoke
- Interop
- Intra-App Domain
- Remoting (x-proc, x-machine)
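The idea can be sketched with a stand-in component. CustomerService below is hypothetical and runs in-process; its Crossings counter only simulates the round trips that a real P/Invoke, interop, or remoting boundary would incur:

```csharp
using System;

// Hypothetical component assumed to sit behind an expensive boundary.
// Crossings counts how many times a caller crossed it.
class CustomerService
{
    public int Crossings;
    string name = "";
    string city = "";

    // Chatty: one crossing per field.
    public void SetName(string value) { Crossings++; name = value; }
    public void SetCity(string value) { Crossings++; city = value; }

    // Chunky: one crossing carries all fields.
    public void Update(string newName, string newCity)
    {
        Crossings++;
        name = newName;
        city = newCity;
    }

    public string Describe() { return name + ", " + city; }
}

static class ChunkyExample
{
    static void Main()
    {
        CustomerService chatty = new CustomerService();
        chatty.SetName("Ann");
        chatty.SetCity("Oslo");

        CustomerService chunky = new CustomerService();
        chunky.Update("Ann", "Oslo");

        Console.WriteLine("chatty crossings: " + chatty.Crossings); // 2
        Console.WriteLine("chunky crossings: " + chunky.Crossings); // 1
    }
}
```

The end state is identical; the chunky interface reaches it with fewer boundary crossings, and the gap widens as the number of fields grows.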
12 Use Value Types
- Think OO structs on the stack
- Passed by value (default) and reference
- Use Value Types for small data
- The .NET Framework's primitive types are value types (for perf)
- Useful for items like Point (x,y coordinates)
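A small sketch of the Point case, showing the default pass-by-value behavior next to an explicit pass by reference. The type and method names are illustrative, not taken from the Framework:

```csharp
using System;

// A small value type: copied on assignment and on pass-by-value.
struct Point
{
    public int X;
    public int Y;
    public Point(int x, int y) { X = x; Y = y; }
}

static class ValueTypeExample
{
    // Receives a copy; the caller's Point is unchanged.
    public static void MoveByValue(Point p) { p.X += 10; }

    // Receives a reference to the caller's Point and modifies it in place.
    public static void MoveByRef(ref Point p) { p.X += 10; }

    static void Main()
    {
        Point p = new Point(1, 2);
        MoveByValue(p);
        Console.WriteLine(p.X); // 1
        MoveByRef(ref p);
        Console.WriteLine(p.X); // 11
    }
}
```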
13 Strings
- Use the mutable StringBuilder object when stitching together strings
- Strings are immutable, i.e. String.Concat creates a new object each time
- Don't do (in V1)
- foreach (Char c in str) // use c
- Instead do
- for (int i = 0; i < str.Length; i++)
- // use str[i]
- Both of the above cases make a big difference in large loops
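A minimal sketch of the StringBuilder advice; the JoinNumbers helper is a hypothetical name chosen for illustration:

```csharp
using System;
using System.Text;

static class StringExample
{
    // Builds "0,1,2,..." in one growing buffer instead of
    // allocating a new string on every concatenation.
    public static string JoinNumbers(int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0) sb.Append(',');
            sb.Append(i);
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(JoinNumbers(5)); // 0,1,2,3,4
    }
}
```

For a handful of concatenations the difference is negligible; the benefit shows up when the loop runs many times.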
14 Performance Best Practices: ASP .NET Apps
15 Design For Caching
- Leverage the built-in ASP .NET caching features
- Output Caching
- Declarative <%@ OutputCache %> directive
- Vary by duration, param, header, custom etc.
- Fragment Caching
- Cache regions of a page by using user controls
- Cache API
- System.Web.Caching.Cache class
- Recommendation
- Specifically designing your pages around these features can lead to massive perf wins
16 Nile Web Application Benchmark
- Application server: 1 Compaq ProLiant 8500, 8 CPU, 550 MHz, 1.2 GB RAM
- Database server: 1 Compaq ProLiant 8500, 8 CPU, 550 MHz, 4 GB RAM
[Chart: Dynamic Pages Served per Second (axis 700 to 4900), Output Cache off versus Output Cache on, for three configurations: Major J2EE App Server / Linux 7.1 / Major DB Vendor; ATL Server / Win 2K / SQL Server; ASP.NET In Proc / Win 2K / SQL Server]
17 EnableSessionState=false
- Session state is enabled out of the box by default
- Is an overhead if you don't leverage it
- Recommendation
- Disable session state for all pages that don't need session data
- Set to readonly if you read but do not update session state
18 EnableViewState=false
- ASP .NET allows pages/controls to maintain state across round trips
- State stored within viewstate hidden field
- Disabled with EnableViewState attribute
- Some downsides
- Increases network payload
- Performance overhead to serialize this
- Recommendation
- Examine your usage of this feature
- Always disable if you are not doing postback
19 Avoid Apartment Transition
- Managed code is free threaded
- Optimized for new .NET Components
- Very sub-optimal for apartment threaded components (i.e., STA COM objects)
- Recommendations
- Enable the <%@ Page AspCompat="true" %> directive for pages that utilize apartment COM objects in ASP .NET
- Don't create STA COM objects in page constructors
- Windows Forms apps are marked STA
- If possible, upgrade apartment threaded components to .NET (VB6 -> VB .NET)
- Always generate early-bound managed wrappers for COM components (avoid late bound hit) using tlbimp
20 Performance Best Practices: WinForms Apps
21 Startup Working Set - Model
- Console.WriteLine("Hello World")
- 2.5MB
- Winform Empty Form
- 4.7MB
- Winform Form with 100 controls
- 5.6MB
- Once you bring in System.Data
- 1MB
22 Windows Forms Tips
- Use AddRange instead of Add
- Windows Forms control collections provide a method called AddRange
- AddRange takes a collection and adds it at once
- Instead of iteratively Adding each item use AddRange, e.g. treeView1.Nodes.AddRange(treeNodes)
- Minimize the number of assemblies loaded
- If there's no need for data connection pooling, don't use it (saves 1MB)
23 Pre-JIT To Start Up Faster
- A Pre-JITed Assembly is a persisted form of JITed MSIL with class/v-table layout
- Done at install time or on demand
- Reduces start-up time
- SDK contains several Pre-JITed assemblies
- mscorlib, Windows Forms and drawing, etc.
- Native code has version checks and reverts to runtime JIT if they fail
- ngen.exe
24 Performance Best Practice: CLR (Common Language Runtime) Drill Down
25 GC Overview
- State of the art Mark and Compact Garbage Collector
- Supports pinning and interior pointers
- Two GC architectures provided
- Server: multi-threaded, highly scalable GC
- Client: single-threaded, concurrent GC
- Self tuning
26 GC Details
- Maintains multiple generations
- Based on object lifetime (G0 -> G1 -> G2)
- Cache conscious
- Gen 0 fits in L2 cache
- Low fragmentation, low memory overhead
- Keeps locality of reference intact
- Separate large object heap
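Generation promotion can be observed from managed code with System.GC. A small sketch; the numbers printed depend on the runtime and its GC settings, so none are asserted here:

```csharp
using System;

static class GenerationExample
{
    static void Main()
    {
        object o = new object();
        Console.WriteLine("generation after allocation: " + GC.GetGeneration(o));

        // A surviving object is normally promoted by a collection.
        GC.Collect();
        Console.WriteLine("generation after one collection: " + GC.GetGeneration(o));

        Console.WriteLine("highest generation: " + GC.MaxGeneration);
        GC.KeepAlive(o); // keeps o reachable through the calls above
    }
}
```

Calling GC.Collect is done here only to make promotion visible; production code normally leaves collection timing to the self-tuning collector.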
27 GC Perf
- Perf of GC allocation similar to Heap allocation
- Limited by PC Bus speed
- Typical Pause time (on a P200)
- < 1ms for Gen 0
- < 10ms for Gen 1
- Full GC varies with working set
- Infrequent
28 Transitions (mgd/unmgd)
- Runtime makes it simple to call the OS
- Transitions happen when you call
- Unmanaged code from managed code
- Managed code from unmanaged code
- At a transition, the Runtime
- Performs requested data type marshaling
- Fixes calling convention
- Protects callee saved registers
- Switches thread mode so that GC won't block on threads in unmanaged code
- Erects an EH frame on calls in to managed code for EH clean up
29 Transition (Fixed Cost)
- Baseline cost
- P/Invoke
- As low as 8 instructions if no marshaling required
- 31 if marshaling is required
- COM Interop
- About 65 instructions
- SuppressUnmanagedCodeSecurity attribute is set in these measurements
30 Transition (Variable Cost)
- Incremental parameter marshaling cost
- Primitives are almost free
- Classes with explicit layout are almost free
- Data transformation (e.g., Unicode -> Ansi, Object -> IUnknown, etc.) is expensive
31 Just-In-Time Compilation
- All code runs as compiled native code
- No interpretation
- JIT performs standard optimizations
- Method Inlining
- Constant Folding
- Copy Propagation
- Common SubExpression Elimination
- Dead Code Elimination
- Enregistration of locals
- Instruction scheduling
- But, optimizations are limited in order to reduce
compile time
32 Performance Best Practice: Performance Tools
33 Perf Counters Overview
- Perf Counters are metrics about resource usage exposed by the .NET Framework
- First line of defense
- Always available
- Can be obtained non-intrusively even in production environment
34 Runtime Perf Counters (I)
- Loader
- % of Execution Time Loading
- # of AppDomains, Assemblies, Classes etc.
- Memory
- % Time in GC, bytes/sec, heap sizes and various metrics related to GC GenX
- Interop
- # of managed to unmanaged transitions
- Metrics on marshalling, stubs etc.
- Contexts and Remoting
- Remote Calls/Sec, Total Remote Calls
- # of Channels, Contexts, Proxies etc.
35 Runtime Perf Counters (II)
- Threads and Locks
- # of logical and physical threads
- Contention Rate/sec, Total # of contentions
- JIT Compilation
- % of time in JIT, # of methods JITed, MSIL Bytes JITed/sec, etc.
- Others include Exceptions, Security and Networking
- All of the above are available through System.Diagnostics.PerformanceCounter
- Easy to create new perf counters
36 ASP .NET Perf Counters
- Requests/Sec
- Request Bytes in Total
- Request Bytes out Total
- Requests Executing
- Requests Total
- Application Restarts
37 Profiler Architecture
Console
Program under test
38 Profiling API
- CLR provides a rich set of services to monitor the health of your app
- These services expose events as your .NET app executes in CLR
- Through two COM interfaces: ICorProfilerCallback and ICorProfilerInfo
39 Resources
- Intel's VTune
- http://msdn.microsoft.com/vstudio/partners/tools/intel.asp
- NuMega's TrueTime
- http://www.compuware.com/products/devpartner/profiler/
- Rational's PurifyPlus
- http://www.rational.com/solutions/solutions_dotnet/
40 Summary
- .NET Framework is built for high performance and scalability
- With .NET Framework, your app inherits high performance
- Easily leverage rich set of tools and techniques to build performant apps
- It is just easier to be faster on the .NET Framework