Title: Customizing Middleware to Improve Performance and Footprint
1. Customizing Middleware to Improve Performance and Footprint
Arvind S. Krishna arvindk_at_dre.vanderbilt.edu
Institute for Software Integrated Systems
Vanderbilt University Nashville, Tennessee
2. Motivation (1/2)
- Where are we right now?
  - Maturation of Distributed Object Computing (DOC) middleware
  - ACE+TAO middleware
    - Open-source implementation of CORBA and Real-time CORBA
    - Highly optimized implementation of almost all CORBA features
  - From stovepiped to reusable architectures: functionality factored into middleware
- Product Line Architectures
  - Set of systems that share common core features
  - Families of systems are then built using the core features
  - Reduce time-to-market pressures and cost, improve productivity, etc.
  - Example: Boeing Bold Stroke architecture
Product line architectures minimize the cost of building variants
3. Motivation (2/2)
- Model Driven Development (MDD) Paradigm
  - Reduces the cost of building new families of systems
  - Compose different systems at the modeling level
  - Model-check for correctness
  - Code generators synthesize artifacts: XML deployment information, configuration information, benchmarking code, ...
Models capture system properties: structure and behavior
- Middleware for Product Lines
  - Still general-purpose and layered
  - Enables different variants to be hosted by different configurations
  - However, not optimized for each variant
Information propagation
What do we need? Optimizations that customize middleware based on system invariants
4. Customizing Middleware via Partial Evaluation
- Partial Evaluation
  - Technique for automatically specializing programs based on parameters known ahead of time
  - Two-level mechanism
    - First level: annotating information
    - Second level: synthesizing code
  - Templates and template meta-programming
- Research will examine
  - How techniques used in programming languages can be applied to middleware
  - How to move from a general-purpose to a more specialized architecture
Optimized Implementation Stack
General Purpose Layered Architecture
Optimize the known knowns, leave the known unknowns to the middleware, and use exceptions for the unknown unknowns
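As a minimal sketch of partial evaluation, assuming C++ templates as the specialization mechanism (one of the options listed above), the following contrasts a general-purpose routine with a version specialized on a parameter that is known ahead of time. The names `power_generic`, `Power`, and `check_cube_equivalence` are illustrative only and do not come from ACE or TAO.

```cpp
#include <cassert>

// General-purpose version: the exponent is only known at run time, so the
// loop counter and its termination check are evaluated on every call.
inline double power_generic(double base, unsigned exponent) {
    double result = 1.0;
    for (unsigned i = 0; i < exponent; ++i) {
        result *= base;
    }
    return result;
}

// Specialized version: the exponent is fixed ahead of time as a template
// parameter. The recursion is resolved during compilation, which allows the
// compiler to emit a plain chain of multiplications without a loop.
template <unsigned Exponent>
struct Power {
    static double apply(double base) {
        return base * Power<Exponent - 1>::apply(base);
    }
};

// Base case that terminates the compile-time recursion.
template <>
struct Power<0> {
    static double apply(double) { return 1.0; }
};

// A specialization must not change results: for the exponent fixed at 3,
// the general and the specialized version have to agree.
inline void check_cube_equivalence(double base) {
    assert(power_generic(base, 3) == Power<3>::apply(base));
}
```

A caller that knows the exponent is always 3 would use `Power<3>::apply(x)` and keep `power_generic` for the cases where the value is only available at run time.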
5. Existing Middleware Optimizations
- Footprint Reduction Optimizations
  - Micro ORB architecture → Virtual Component pattern
  - Micro POA architecture → Pluggable components
- Request Demux/Dispatch Optimizations
  - Connection management → Acceptor-Connector pattern, Reactor
  - Buffer management strategies
  - Request demultiplexing → Active Demultiplexing and Perfect Hashing
- Aren't these optimizations enough?
  - They have worked really well for different applications across domains
  - General-purpose middleware is still layered
  - Needed: techniques that fold layers (code and run-time checks) to improve performance
  - These will add to the general-purpose optimizations
6. Capturing System Invariants in Models (1/2)
- Example System
  - Basic Simple (BasicSP): a three-component Distributed Real-time Embedded (DRE) application scenario
  - Timer component triggers periodic refresh rates
  - GPS component generates periodic position updates
  - Airframe component processes input from the GPS component and feeds it to the Navigation Display
  - Navigation Display displays GPS position updates
- Hypothesis → Solution Approach
  - Use early-binding parameters to tailor middleware
  - Techniques applied could range from
    - Conditional compilation
    - Optimized stub/skeleton generation
    - Strategy pattern to handle alternatives
- Program Specialization Invariants
  - Must hold for all specializations
  - output(p_orig) = output(p_spl)
  - speed(p_spl) > speed(p_orig)
Boeing product-line scenario: representative rate-based DRE application
ACE_wrappers/TAO/CIAO/DaNCE/examples/BasicSP
CoSMIC/examples/BasicSP
7. Capturing System Invariants in Models (2/2)
Component Deployment
Component Interactions
Same Endianness
Periodic Timer
Single-method interfaces
Collocated Components
- Mapping Ahead-of-Time (AOT) System Properties to Specializations
  - Periodicity → Pre-create marshaled request
  - Single-interface operations → Pre-fetch the POA, Servant, and Skeleton servicing the request
  - Same endianness → Avoid de-marshaling (byte-order swapping)
  - Collocated components → Specialize for target location (remove remoting)
  - Same operation invoked → Cache the CORBA Request header and update the arguments only
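The mapping above can be read as a small decision table. The following is a minimal sketch of how a model interpreter might turn captured invariants into a list of specializations to enable; the `SystemInvariants` struct, its field names, and `select_specializations` are hypothetical and are not part of CoSMIC, CIAO, or TAO.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Invariants a model could capture ahead of time (field names are illustrative).
struct SystemInvariants {
    bool periodic_requests;
    bool single_operation_interface;
    bool same_endianness;
    bool collocated_components;
    bool same_operation_invoked;
};

// Returns the specializations that the captured invariants make applicable.
// Each entry corresponds to one row of the mapping in the slide.
inline std::vector<std::string> select_specializations(const SystemInvariants& inv) {
    std::vector<std::string> selected;
    if (inv.periodic_requests) {
        selected.push_back("pre-create marshaled request");
    }
    if (inv.single_operation_interface) {
        selected.push_back("pre-fetch POA, servant, and skeleton");
    }
    if (inv.same_endianness) {
        selected.push_back("avoid byte-order swapping");
    }
    if (inv.collocated_components) {
        selected.push_back("remove remoting");
    }
    if (inv.same_operation_invoked) {
        selected.push_back("cache request header");
    }
    return selected;
}
```

A generator could then translate each selected entry into whatever mechanism implements it, for example a build flag or a configuration setting.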
8. Specializations Implemented in TAO
- Client-Side Specialization
  - Request header caching
  - Pre-creating requests
  - Marshaling checks
  - Target location
- Server-Side Specialization
  - Specialize request processing
  - Avoid demarshaling checks
- Cumulative Effect
  - More-than-additive gain from combining specializations
  - For example
    - Client-side request caching
    - Server-side specialized request processing
    - 1 + 1 = 3?
9. Specialize for Target Location (1/2)
Intent: Specialize a path based on the knowledge that objects are collocated
- Model Invariants
  - All communication between the GPS, Airframe, and Display components is collocated
  - All invocations are local
  - Remoting code is not needed (connection code not required)
- Transformations to TAO (footprint)
  - Eliminate connection handling code
    - Connection strategies, flushing strategies
  - Eliminate invocation classes
    - Remote invocation classes
    - One-way and two-way invocation classes
- Transformations to TAO (performance)
  - Eliminate remoting checks
    - Object proxy checks for remoting
    - Invocation adapter checks for remoting on each invocation
    - Checks for one-way or two-way invocation
10. Specialize for Target Location (2/2)
- TAO Implementation and Automation
  - All implementations are present in the branch TAO_PE_Collocation
  - Specialization implemented via conditional compilation (TAO_HAS_COLLOCATION flag) to remove remoting
  - Profiled the optimistic case of absolutely no remoting (i.e., no code to handle requests and replies)
- Configuration
  - 2.4.21-27.0.1.ELsmp 1 SMP Red Hat kernel
  - Dual-processor Athlon, 2 GHz
  - 1 GB RAM and 256 KB cache for each processor
  - Test run: TAO's performance-tests/Latency/Collocation
Optimization: Code subsetting (removed connection-related code); performance (elimination of remoting checks)
Performance Improvements: libTAO 6% (100 kB reduction); application 15%; improved by 10% (over and above Thru_POA collocation)
CORBA Compliance: Compliant with CORBA specification
Automation: Realized by macros; invocation classes can be separated out as libraries
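A minimal sketch of the conditional-compilation idea, assuming a build in which every target is known to be collocated. `SKETCH_COLLOCATED_ONLY` is a hypothetical flag that plays the role the slides give to TAO_HAS_COLLOCATION; `Servant`, `ObjectRef`, `remote_get_data`, and `invoke_get_data` are stand-ins and not TAO classes or functions.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical build flag for this sketch; it is not a TAO macro.
#ifndef SKETCH_COLLOCATED_ONLY
#define SKETCH_COLLOCATED_ONLY 1
#endif

// Stand-in for a component implementation living in the same process.
struct Servant {
    int value;
    int get_data() const { return value; }
};

// Stand-in for an object reference; local_servant is null for remote targets.
struct ObjectRef {
    const Servant* local_servant;
};

#if !SKETCH_COLLOCATED_ONLY
// Remote path, compiled only into the general-purpose build. A real ORB
// would marshal the request and send it over a connection here.
inline int remote_get_data(const ObjectRef&) {
    throw std::runtime_error("remote invocation is not modelled in this sketch");
}
#endif

inline int invoke_get_data(const ObjectRef& ref) {
#if SKETCH_COLLOCATED_ONLY
    // Specialized build: the model guarantees collocation, so the remote path
    // is not compiled in. The assert documents the invariant and disappears
    // when NDEBUG is defined.
    assert(ref.local_servant != nullptr);
    return ref.local_servant->get_data();
#else
    // General-purpose build: decide between local and remote on every call.
    if (ref.local_servant != nullptr) {
        return ref.local_servant->get_data();
    }
    return remote_get_data(ref);
#endif
}
```

With the flag set, both the per-invocation branch and the remote code drop out of the binary, which corresponds to the performance and footprint transformations listed for this specialization.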
11. Specialize CORBA Request Header (1/4)
Intent: Avoid the considerable overhead of creating new CORBA requests and replies for each of a series of request calls
- Model Invariants
  - Timer component periodically sends the same event
  - Operations to retrieve data from the models are also the same
- Update Rather than Create
  - Do not create a new request each time
  - Use the old request and re-use the Request Header
- Various levels of re-use are possible
  - Reuse only the Request Header
  - Reuse both the Request Header and the Message-Specific Header
  - Reuse the entire request
This approach is similar to TCP header prediction
12. Specialize CORBA Request Header (2/4)
- Request Header Caching
  - First-level specialization: cache only the Request Header part
  - Everything else in the request is variable
  - Avoid marshaling and de-marshaling costs for the header part alone
  - Implemented on the client side
- TAO Implementation
  - The first request creates the entire request (code flow same as the normal path)
  - Cache the (marshaled) header information
  - On subsequent messages, update only the total size and ID after request creation
  - Implemented via conditional compilation
Optimization: Cache GIOP Request Header part
Performance Improvements: Round-trip throughput improved by 50-100 calls/sec
CORBA Compliance: Compliant with CORBA specification
Automation: Realized by macros
Note: Not much gain by doing this
13. Specialize CORBA Request Header (3/4)
- TAO Implementation
  - Move the buffer pointer to the start of the data segment
  - Write out the arguments for the call
  - Update the total size of the request (SIZE) and the REQUEST_ID field in the request
- Message-Specific Header Caching
  - Cache both the Request Header and the Message-Specific Header
  - Object key is the same
  - Service context information is the same
  - Operation name is the same, e.g., get_data
Server side → only when thread-per-connection is used
GIOP formats → only for GIOP 1.2, as in 1.0 and 1.1 the service contexts are written first
Optimization: Cache Request Header and Request Message
Performance Improvements: Round-trip throughput improved by 300-350 calls/sec (~5%); latency 3 µsecs (~5%)
CORBA Compliance: Compliant with CORBA specification (service contexts)
Automation: Realizable by using policies at the object level on the client side
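The update-rather-than-create steps can be sketched as follows. This is a minimal illustration under a deliberately simplified request layout that is not the GIOP wire format; `CachedRequest`, `read_u32`, and the offset constants are hypothetical names, and the fields are written in host byte order for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Simplified layout used only in this sketch:
//   bytes 0..3  SIZE        total message length
//   bytes 4..7  REQUEST_ID
//   bytes 8..   operation name, followed by the marshaled arguments
constexpr std::size_t kSizeOffset = 0;
constexpr std::size_t kIdOffset = 4;
constexpr std::size_t kFixedFields = 8;

// Reads a 32-bit field back out of a message (used to inspect the result).
inline std::uint32_t read_u32(const std::vector<std::uint8_t>& msg, std::size_t offset) {
    assert(offset + sizeof(std::uint32_t) <= msg.size());
    std::uint32_t value = 0;
    std::memcpy(&value, msg.data() + offset, sizeof(value));
    return value;
}

class CachedRequest {
public:
    // First request: marshal the invariant header part once and remember
    // where the data segment (the arguments) begins.
    explicit CachedRequest(const std::string& operation)
        : buffer_(kFixedFields, std::uint8_t(0)), args_offset_(0) {
        buffer_.insert(buffer_.end(), operation.begin(), operation.end());
        args_offset_ = buffer_.size();
    }

    // Subsequent requests: keep the cached header and rewrite only the
    // variable parts (arguments, SIZE, REQUEST_ID).
    const std::vector<std::uint8_t>& prepare(std::uint32_t request_id,
                                             const std::vector<std::uint8_t>& args) {
        buffer_.resize(args_offset_);                              // back to start of data segment
        buffer_.insert(buffer_.end(), args.begin(), args.end());   // write out the arguments
        write_u32(kSizeOffset, static_cast<std::uint32_t>(buffer_.size()));
        write_u32(kIdOffset, request_id);
        return buffer_;
    }

private:
    void write_u32(std::size_t offset, std::uint32_t value) {
        std::memcpy(buffer_.data() + offset, &value, sizeof(value));
    }

    std::vector<std::uint8_t> buffer_;
    std::size_t args_offset_;
};
```

Each call to `prepare` leaves the bytes between the fixed fields and the data segment untouched, so the operation name is marshaled exactly once per `CachedRequest`.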
14. Specialize CORBA Request Header (4/4)
- Intent
  - Instead of caching only the headers (Request and Message-Specific), pre-create the entire CORBA request
- Model Invariants
  - Timer component sends triggers (heartbeats) to the recipient component; timeouts are a similar situation
  - Request and data contents are the same
- Proposed TAO Implementation
  - Special IDL flag that will pre-create (marshal) the request
  - Each time, the same request is sent to the client
  - Update only the request ID of the request
  - Saves the cost of request construction and marshaling
Optimization: Entire CORBA Request
Performance Improvements: Avoids marshaling data completely; can eliminate multiple layers by directly sending the request
CORBA Compliance: Not compliant with the specification
Automation: IDL compiler can pre-create and generate the entire request
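A minimal sketch of the pre-created request, assuming a message whose first four bytes are reserved for the request ID. That layout and the `PrecreatedRequest` class are inventions for illustration; they are not what the IDL compiler generates.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Pre-created request: everything is marshaled once, before the first send.
class PrecreatedRequest {
public:
    // `marshaled` is assumed to be a complete request whose first four bytes
    // are reserved for the request ID (layout invented for this sketch).
    explicit PrecreatedRequest(std::vector<std::uint8_t> marshaled)
        : bytes_(std::move(marshaled)) {
        assert(bytes_.size() >= sizeof(std::uint32_t));
    }

    // Per-send work: patch the request ID in place and touch nothing else.
    const std::vector<std::uint8_t>& next() {
        ++request_id_;
        std::memcpy(bytes_.data(), &request_id_, sizeof(request_id_));
        return bytes_;
    }

    std::uint32_t last_request_id() const { return request_id_; }

private:
    std::vector<std::uint8_t> bytes_;
    std::uint32_t request_id_ = 0;
};
```

Compared with header caching, no argument is marshaled on the send path at all, which only works because the model states that request and data contents never change.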
15. Specialized Request Processing (1/2)
- Intent
  - Resolve the mapping of incoming requests to the POA, Servant, Skeleton, and operation to which they are dispatched only once, then use these pre-computed results to optimize the dispatch of subsequent requests
- Model Invariants
  - The get_data operation is invoked on the same component, located in the same POA, and serviced by the same servant and operation
- Once-per-Connection Resolution of Dispatch
  - TAO provides Active Demultiplexing and Perfect Hashing for an O(1) lookup time bound
  - Caching just the POA may not give much performance improvement
16. Specialized Request Processing (2/2)
- TAO Implementation
  - As the operation names are the same, we directly cache the skeleton and advance the current buffer pointer to the beginning of the arguments
  - The length is calculated only for the first request and re-used; the cost is amortized over the number of operations
  - Implemented via the TAO_CACHE_SERVANT_REF conditional compilation macro
  - TAO_ROOT/performance-tests/Latency/Single-Threaded
Optimization: Cache skeleton directly
Performance Improvements: Round-trip latency 6 µsecs (~5%); throughput 300 calls/sec (~5%)
CORBA Compliance: Caching skeletons is not compliant; cannot be used with Default Servant and Servant Locator classes
Automation: Provide policies at the POA (now that it is refactored) to implement this layer folding; implemented as a separate IIOPConnection handler class
This is similar to the Direct Collocation optimization for a collocated request
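A minimal sketch of once-per-connection dispatch resolution, assuming every request on a connection targets the same object and operation. `Demultiplexer`, `CachingConnection`, and `Skeleton` are hypothetical stand-ins; TAO's POA, servant, and skeleton classes are not modelled here.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Stand-in for a generated skeleton: takes one argument, returns one result.
using Skeleton = std::function<int(int)>;

// General-purpose demultiplexer: every resolve() pays for a table lookup.
class Demultiplexer {
public:
    void add(const std::string& object_key, const std::string& operation, Skeleton skeleton) {
        table_[std::make_pair(object_key, operation)] = std::move(skeleton);
    }

    // Throws std::out_of_range if the target was never registered.
    const Skeleton& resolve(const std::string& object_key, const std::string& operation) {
        ++lookups_;
        return table_.at(std::make_pair(object_key, operation));
    }

    int lookups() const { return lookups_; }

private:
    std::map<std::pair<std::string, std::string>, Skeleton> table_;
    int lookups_ = 0;
};

// Per-connection handler that resolves the dispatch target on the first
// request and re-uses the cached skeleton afterwards.
class CachingConnection {
public:
    explicit CachingConnection(Demultiplexer& demux) : demux_(demux) {}

    int handle(const std::string& object_key, const std::string& operation, int argument) {
        if (cached_ == nullptr) {
            cached_ = &demux_.resolve(object_key, operation);  // first request only
        }
        // Later requests skip the lookup; this is valid only under the
        // invariant that the target never changes on this connection.
        return (*cached_)(argument);
    }

private:
    Demultiplexer& demux_;
    const Skeleton* cached_ = nullptr;
};
```

The cached pointer stays valid because `std::map` does not relocate its elements on insertion; a design that removes entries at run time would need to invalidate the cache.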
17. Specialize Marshaling/De-marshaling
- Intent
  - To mask endianness, the GIOP Request header contains a flag that indicates the endianness of the request
  - If the endianness differs, do byte swapping
- Model Invariants
  - The two machines on which the components are hosted have the same endianness (byte order)
  - No checks for byte order are required
- ACE Implementation
  - ACE_CDR streams provide the ACE_SWAP_ON_WRITE and ACE_DISABLE_SWAP_ON_READ macros, which can be used to eliminate checks for byte ordering
  - These macros are not set by default; model interpreters could generate configuration settings to enable them
Optimization: Demarshaling check elimination
Performance Improvements: Eliminates more than 10 if-conditions for a normal CORBA request; improvements on both the client and the server side; used in conjunction with header caching optimizations
CORBA Compliance: Compliant with CORBA specification
Automation: Conditional compilation techniques
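A minimal sketch of eliminating the byte-order check at build time, assuming both hosts share the same endianness. `SKETCH_SAME_BYTE_ORDER` is a hypothetical flag standing in for the kind of switch described above and is not an ACE macro; `swap32` and `read_u32` are illustrative helpers rather than ACE_CDR functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag for this sketch; it is not an ACE macro.
#ifndef SKETCH_SAME_BYTE_ORDER
#define SKETCH_SAME_BYTE_ORDER 1
#endif

// Reverses the byte order of a 32-bit value.
inline std::uint32_t swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

// Demarshals one 32-bit value. sender_matches_host models the byte-order
// flag carried in the request header.
inline std::uint32_t read_u32(std::uint32_t wire_value, bool sender_matches_host) {
#if SKETCH_SAME_BYTE_ORDER
    // Specialized build: the model guarantees matching byte order, so the
    // per-value check and the swap code are not compiled in.
    assert(sender_matches_host);
    (void)sender_matches_host;  // silences the unused warning under NDEBUG
    return wire_value;
#else
    // General-purpose build: test the flag for every value that is read.
    return sender_matches_host ? wire_value : swap32(wire_value);
#endif
}
```

Because a request is demarshaled field by field, removing this one branch removes a check from every field read, which is consistent with the slide's count of more than ten conditions per request.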
18. Concluding Remarks and Future Work
- Specialization can be used to fold layers based on system invariants
- The current first-cut implementation uses conditional compilation strategies; examine more appropriate strategies for implementing these specializations
  - Request header caching: strategies controlled by svc.conf
  - Specialized request processing: POA request processing policy
  - Marshaling/de-marshaling: ACE level
  - Pre-created requests: IDL-generated code
  - Collocation specialization: macros, strategies (invocation classes)
Examine specialization at the Component Middleware level and the Infrastructural Middleware level