Customizing Middleware to Improve Performance and Footprint presentation

1
Customizing Middleware to Improve Performance
and Footprint
Arvind S. Krishna arvindk_at_dre.vanderbilt.edu
Institute for Software Integrated Systems
Vanderbilt University Nashville, Tennessee
2
Motivation (1/2)

Where are we right now?
Maturation of Distributed Object Computing
Middleware (DOC)
ACE+TAO middleware
Open-source implementation of CORBA and Real-time
CORBA
Highly optimized implementation covering
almost all features of CORBA
From Stovepiped to reusable architectures

Functionality factored in middleware

Product Line Architectures
Set of Systems that share common core features
Families of systems then built using core
features
Reduce time-to-market pressure and cost,
improve productivity, etc.
Example: Boeing Bold Stroke architecture

Product line architectures minimize cost for
building variants
3
Motivation (2/2)

Model Driven Development Paradigm (MDD)
Reduces costs of building new families of
systems
Compose different systems at modeling level
Model Check for correctness
Code generators synthesize artifacts: XML
deployment information, configuration
information, benchmarking code, etc.

Models capture system properties: structure and
behavior

Middleware for Product-Lines
Still general-purpose and layered
Enables different variants to be hosted by
different configurations
However, it is not optimized for each variant

Information propagation
What do we need? Optimizations that customize
middleware based on system invariants
4
Customizing Middleware via Partial Evaluation

Partial Evaluation
Technique for automatically specializing programs
based on parameters known ahead of time
Two-level mechanism
First level: annotating information
Second level: synthesizing code
Templates and Template meta-programming
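
As a minimal sketch of the idea (the classic textbook exponent example, not code from TAO), a general routine takes its parameter at run time, while a template version fixes the same parameter at compile time so the loop and its checks disappear. The names power and Power are illustrative assumptions.

```cpp
#include <cassert>

// General-purpose version: the exponent is only known at run time,
// so the loop and its bound check execute on every call.
inline long power(long base, unsigned exp) {
    long result = 1;
    for (unsigned i = 0; i < exp; ++i) {
        result *= base;
    }
    return result;
}

// Specialized version: the exponent is known ahead of time and passed
// as a template parameter, so the compiler unrolls the recursion.
template <unsigned Exp>
struct Power {
    static long apply(long base) { return base * Power<Exp - 1>::apply(base); }
};

// Base case that ends the compile-time recursion.
template <>
struct Power<0> {
    static long apply(long) { return 1; }
};
```

Both versions must produce the same output for the same input; only the amount of work left for run time differs.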

Research will examine
How techniques used in programming languages can be
applied to middleware
How to move from a general-purpose to a more specialized
architecture

Optimized Implementation Stack
General Purpose Layered Architecture
Optimize the known knowns, leave known unknowns
to the middleware, and use exceptions for unknown
unknowns
5
Existing Middleware Optimizations

Footprint Reduction Optimization
Micro ORB Architecture → Virtual Component
Pattern
Micro POA Architecture → Pluggable components
Request Demux/Dispatch Optimizations
Connection Management → Acceptor-Connector
pattern, Reactor
Buffer Management Strategies
Request Demultiplexing → Active Demultiplexing,
Perfect Hashing

Aren't these optimizations enough?
They have worked really well for applications
in different domains
General-purpose middleware is still layered
We need techniques that fold layers (code and
run-time checks) to improve performance
These will add to the general-purpose optimizations

6
Capturing System Invariants in Models (1/2)

Example System
Basic Simple (BasicSP): three-component
Distributed Real-time Embedded (DRE) application
scenario
Timer Component triggers periodic refresh rates
GPS Component generates periodic position
updates
Airframe Component processes input from the GPS
component and feeds it to the Navigation Display
Navigation Display displays GPS position
updates

Hypothesis → Solution Approach
Use early-binding parameters to tailor middleware
Techniques applied could range from
Conditional compilation
Optimized stub/skeleton generation
Strategy pattern to handle alternatives

Program Specialization Invariants
Must hold for all specializations
output(p_orig) = output(p_spl)
speed(p_spl) > speed(p_orig)

Boeing product-line scenario: representative rate-based
DRE application
ACE_wrappers/TAO/CIAO/DaNCE/examples/BasicSP
CoSMIC/examples/BasicSP
7
Capturing System Invariants in Models (2/2)
Component Deployment
Component Interactions
Same Endianness
Periodic Timer
Single method interfaces
Collocated Components

Mapping Ahead of Time (AOT) System Properties to
Specializations
Periodicity → Pre-create marshaled Request
Single Interface Operations → Pre-fetch POA,
Servant, Skeleton servicing request
Same Endianness → Avoid de-marshaling (byte order
swapping)
Collocated Components → Specialize for target
location (remove remoting)
Same operation invoked → Cache CORBA Request
header/update arguments only

8
Specializations Implemented in TAO

Client Side Specialization
Request Header Caching
Pre-creating Requests
Marshaling checks
Target Location

Server Side Specialization
Specialize Request Processing
Avoid Demarshaling checks

Cumulative Effect
Combining specializations gives a more than
additive improvement
For example
Client side request caching
Server side specialize request processing
1 + 1 = 3?

9
Specialize for Target Location (1/2)
Intent: Specialize a path based on the knowledge that
objects are collocated

Model Invariants
All communication between the GPS, Airframe and
Display components is collocated
All invocations are local
Remoting code is not needed (connection code not
required)

Transformations to TAO (footprint)
Eliminate Connection handling code
Connection Strategies, Flushing Strategies
Eliminate Invocation classes
Remote Invocation classes
One way and two way invocation classes

Transformations to TAO (performance)
Eliminate Remoting Checks
Object Proxy checks for remoting
Invocation Adapter checks for remoting for each
invocation
Checks for one-way or two-way invocation

10
Specialize for Target Location (2/2)

TAO Implementation Automation
All implementations are present in the branch
TAO_PE_Collocation
Specialization implemented via conditional
compilation (the TAO_HAS_COLLOCATION flag)
to remove remoting
Profiled the optimistic case of absolutely no remoting
(i.e. no code to handle requests and replies)
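
The conditional compilation technique described above can be sketched as follows. This is an illustrative toy, not TAO code: the macro SKETCH_COLLOCATED_ONLY and the types Servant and ObjectRef are hypothetical stand-ins, and the remote path is a placeholder.

```cpp
#include <cassert>
#include <string>

// Hypothetical build-time flag standing in for a collocation switch.
// Set to 1 when the model guarantees that every target is in-process.
#ifndef SKETCH_COLLOCATED_ONLY
#define SKETCH_COLLOCATED_ONLY 1
#endif

struct Servant {
    int get_data() const { return 42; }
};

struct ObjectRef {
    Servant* local_servant;  // non-null when the target is in-process
    std::string endpoint;    // only meaningful on the remote path
};

#if !SKETCH_COLLOCATED_ONLY
// Placeholder for the connection and marshaling code that the
// collocated-only build leaves out of the binary entirely.
inline int remote_invoke(const ObjectRef&) { return -1; }
#endif

inline int invoke_get_data(const ObjectRef& ref) {
#if SKETCH_COLLOCATED_ONLY
    // Specialized path: no remoting check, direct call on the servant.
    return ref.local_servant->get_data();
#else
    // General path: check on every invocation whether the target is local.
    if (ref.local_servant != nullptr) {
        return ref.local_servant->get_data();
    }
    return remote_invoke(ref);
#endif
}
```

With the flag set, both the per-invocation check and the remote code path are removed at compile time, which is where the footprint and performance effects come from.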

Configuration
2.4.21-27.0.1.ELsmp #1 SMP Red Hat kernel
Dual-processor Athlon, 2 GHz
1 GB RAM and 256 KB cache for each processor
Test run: TAO's performance-tests/Latency/Collocation

Optimization: Code subsetting (removed connection-related code); performance (elimination of remoting checks)
Performance Improvements: Footprint: libTAO 6% (100 kB reduction), application 15%. Performance: improved by 10% over and above Thru_POA collocation
CORBA Compliance: Compliant with CORBA specification
Automation: Realized by macros; Invocation classes can be separated out as libraries
11
Specialize CORBA Request Header (1/4)
Intent: Avoid the considerable overhead of
creating new CORBA requests and replies for each
call in a series of requests

Model Invariants
Timer Component periodically sends same event
Operations to retrieve data from the models are
also the same.

Update Rather than Create
Do not create a new Request each time
Use the old request and re-use the Request Header
Various levels of re-use are possible
Reuse only the Request Header
Reuse both the Request Header and the Message Specific
Header
Reuse the entire request

This approach is similar to TCP header prediction
12
Specialize CORBA Request Header (2/4)

Request Header Caching
First-level specialization: cache only the
Request Header part
Everything else in the request is variable
Avoid marshaling and de-marshaling costs for the
header part alone
Implemented at client side

TAO Implementation
The first request creates the entire request (code
flow is the same as the normal path)
Cache the header information (marshaled)
On subsequent messages, update only the total size
and ID after request creation
Implemented via conditional compilation
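
The update-rather-than-create flow above can be sketched with a toy buffer. Everything here is an assumption for illustration: the wire layout is a simplified stand-in and not the real GIOP format, and CachedHeaderRequest and load_u32 are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical, simplified layout (NOT real GIOP):
//   bytes 0-3  total size      (patched on every call)
//   bytes 4-7  request ID      (patched on every call)
//   then       operation name  (marshaled once, then cached)
//   then       one 32-bit argument (rewritten on every call)
class CachedHeaderRequest {
public:
    explicit CachedHeaderRequest(const std::string& operation)
        : operation_(operation), args_offset_(0), full_builds_(0) {}

    const std::vector<unsigned char>& build(std::uint32_t request_id,
                                            std::uint32_t arg) {
        if (args_offset_ == 0) {
            // First call: marshal the header once and remember
            // where the argument area starts.
            buf_.assign(8, 0);
            buf_.insert(buf_.end(), operation_.begin(), operation_.end());
            args_offset_ = buf_.size();
            ++full_builds_;
        }
        // Every call: rewrite the arguments, then patch size and ID in place.
        buf_.resize(args_offset_ + sizeof arg);
        store_u32(args_offset_, arg);
        store_u32(0, static_cast<std::uint32_t>(buf_.size()));
        store_u32(4, request_id);
        return buf_;
    }

    int full_builds() const { return full_builds_; }

private:
    void store_u32(std::size_t offset, std::uint32_t value) {
        std::memcpy(&buf_[offset], &value, sizeof value);
    }

    std::string operation_;
    std::vector<unsigned char> buf_;
    std::size_t args_offset_;
    int full_builds_;
};

// Helper for inspecting a 32-bit field of the buffer (host byte order).
inline std::uint32_t load_u32(const std::vector<unsigned char>& buf,
                              std::size_t offset) {
    std::uint32_t value = 0;
    std::memcpy(&value, &buf[offset], sizeof value);
    return value;
}
```

Calling build repeatedly with new IDs and arguments marshals the operation name only once; later calls touch just the size, ID and argument bytes.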

Optimization: Cache GIOP Request Header part
Performance Improvements: Roundtrip throughput improved by 50-100 calls/sec
CORBA Compliance: Compliant with CORBA specification
Automation: Realized by macros. Not much gain by doing this
13
Specialize CORBA Request Header (3/4)

TAO Implementation
Move buffer pointer to start of data segment
Write out the arguments for the call
Update the total size of the request (SIZE) and
REQUEST_ID fields in the request

Message Specific Header Caching
Cache both Request Header and Message Specific
Header
Object Key is the same
Service context information is the same
Operation name is the same, e.g., get_data

Server side → Only when thread-per-connection is used
GIOP formats → Only for GIOP 1.2, since in 1.0 and
1.1 the service contexts are written first
Optimization: Cache Request Header and Message Specific Header
Performance Improvements: Roundtrip throughput improved by 300-350 calls/sec (5%); latency 3 µsecs (5%)
CORBA Compliance: Compliant with CORBA specification (service contexts)
Automation: Realizable by using policies at object level at client side
14
Specialize CORBA Request Header (4/4)

Intent
Instead of caching only the headers (Request and
Message Specific), pre-create the entire CORBA request

Model Invariants
Timer component sends trigger (heart beats) to
recipient component. Similar situation for
timeouts
Request and data contents are the same

Proposed TAO implementation
Special IDL flag that will pre-create (marshal)
the request
Each time same request is sent to the client
Update request ID of the request only
Save cost of request construction and marshaling
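
A minimal sketch of the proposed pre-creation follows, assuming a toy byte layout rather than the real GIOP format. PrecreatedRequest is a hypothetical name; in the proposal the equivalent code would be generated by the IDL compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical layout (NOT real GIOP): a 4-byte request ID slot
// followed by the operation name and a fixed payload.
class PrecreatedRequest {
public:
    PrecreatedRequest(const std::string& operation, const std::string& payload)
        : next_id_(0) {
        // Marshal everything exactly once, at construction time.
        bytes_.assign(4, 0);
        bytes_.insert(bytes_.end(), operation.begin(), operation.end());
        bytes_.insert(bytes_.end(), payload.begin(), payload.end());
    }

    // Per send: patch only the request ID; no marshaling is repeated.
    const std::vector<unsigned char>& next() {
        const std::uint32_t id = next_id_++;
        std::memcpy(&bytes_[0], &id, sizeof id);
        return bytes_;
    }

private:
    std::vector<unsigned char> bytes_;
    std::uint32_t next_id_;
};
```

Two consecutive sends differ only in the first four bytes, which is the property a periodic trigger or heartbeat relies on.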

Optimization: Entire CORBA Request
Performance Improvements: Avoids marshaling data completely; can eliminate multiple layers by directly sending the request
CORBA Compliance: Not compliant with spec
Automation: IDL compiler can pre-create and generate the entire request
15
Specialized Request Processing (1/2)

Intent
Resolve the mapping of incoming requests to the
POA, Servant, Skeleton, and operation to which
they are dispatched only once, then use these
precomputed results to optimize the dispatch of
subsequent requests

Model Invariants
Each get_data request invokes an operation on the same
component, located in the same POA and serviced by
the same servant and operation

Once Per Connection Resolution of Dispatch
TAO provides Active Demultiplexing and Perfect
Hashing for an O(1) lookup time bound
Caching just the POA may not give much
performance improvement

16
Specialized Request Processing (2/2)

TAO Implementation
As the operation names are the same, we directly
cache the skeleton and advance the current buffer
pointer to the beginning of the arguments
The length is calculated only for the first
request and re-used; the cost is amortized over the number
of operations
Implemented via the TAO_CACHE_SERVANT_REF conditional
compilation macro
TAO_ROOT/performance-tests/Latency/Single-Threaded
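
The once-per-connection resolution can be illustrated with a small sketch. It is not TAO's implementation: Dispatcher, Connection and Skeleton are hypothetical names, and the nested std::map lookup merely stands in for the POA, servant and operation demultiplexing steps.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

typedef std::function<int(int)> Skeleton;

// General-purpose demultiplexer: object key first, then operation name.
class Dispatcher {
public:
    Dispatcher() : lookups_(0) {}

    void register_operation(const std::string& object_key,
                            const std::string& operation, Skeleton skeleton) {
        table_[object_key][operation] = skeleton;
    }

    const Skeleton* resolve(const std::string& object_key,
                            const std::string& operation) {
        ++lookups_;
        auto object = table_.find(object_key);
        if (object == table_.end()) return nullptr;
        auto entry = object->second.find(operation);
        return entry == object->second.end() ? nullptr : &entry->second;
    }

    int lookups() const { return lookups_; }

private:
    std::map<std::string, std::map<std::string, Skeleton> > table_;
    int lookups_;
};

// Specialized connection handler. It relies on the model invariant that
// every request on this connection targets the same object and operation,
// so the key and operation are only inspected for the first request.
class Connection {
public:
    explicit Connection(Dispatcher& dispatcher)
        : dispatcher_(dispatcher), cached_(nullptr) {}

    int handle(const std::string& object_key, const std::string& operation,
               int argument) {
        if (cached_ == nullptr) {
            cached_ = dispatcher_.resolve(object_key, operation);
            if (cached_ == nullptr) throw std::runtime_error("unknown target");
        }
        return (*cached_)(argument);  // later requests skip the lookup
    }

private:
    Dispatcher& dispatcher_;
    const Skeleton* cached_;
};
```

If the invariant does not hold, for example with a default servant or servant locator, the cached pointer would dispatch to the wrong target, which is why this folding cannot be applied in those cases.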

Optimization: Cache skeleton directly
Performance Improvements: Round-trip latency 6 µsecs (5%); throughput 300 calls/sec (5%)
CORBA Compliance: Caching skeletons is not compliant; cannot be used in Default Servant and Servant Locator classes
Automation: Provide policies at POA (now that it is refactored) to implement this layer folding; implemented as a separate IIOPConnection handler class
This is similar to Direct Collocation
optimization for a collocated request
17
Specialize Marshaling/De-marshaling

Intent
To mask endianness, the GIOP Request header contains a
flag that indicates the endianness of the request
If the endianness differs, do byte swapping
Model Invariants
The two machines on which the components are
hosted have the same endianness (byte order), so no
checks for byte order are required
ACE Implementation
ACE_CDR streams provide for ACE_SWAP_ON_WRITE and
ACE_DISABLE_SWAP_ON_READ macros that can be used
to eliminate checks for byte ordering
These macros are not set by default. Model interpreters
could generate configuration settings to enable
these macros
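
As a rough sketch of what eliminating the byte-order check means (not the ACE_CDR implementation), the general read path below tests a flag for every value, while the specialized build compiles the test away. The macro SKETCH_SAME_BYTE_ORDER and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the byte order of a 32-bit value.
inline std::uint32_t swap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// General path: every value read pays for a byte-order check.
inline std::uint32_t read_u32_checked(std::uint32_t wire, bool same_order) {
    return same_order ? wire : swap32(wire);
}

// Hypothetical build-time flag, set when the model guarantees that
// sender and receiver share the same byte order.
#ifndef SKETCH_SAME_BYTE_ORDER
#define SKETCH_SAME_BYTE_ORDER 1
#endif

inline std::uint32_t read_u32_specialized(std::uint32_t wire, bool same_order) {
#if SKETCH_SAME_BYTE_ORDER
    (void) same_order;  // the invariant makes the check unnecessary
    return wire;
#else
    return read_u32_checked(wire, same_order);
#endif
}
```

The specialized function is only correct while the invariant holds; a model interpreter would be the natural place to decide whether the flag may be set.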

Optimization: Demarshaling check elimination
Performance Improvements: Eliminates more than 10 if conditions for a normal CORBA request; improvements on both client and server side; used in conjunction with header caching optimizations
CORBA Compliance: Compliant with CORBA specification
Automation: Conditional compilation techniques
18
Concluding Remarks and Future Work

Specialization can be used as a
technique for folding layers based on system
invariants
The current first-cut implementation uses
conditional compilation. Examine more
appropriate strategies for implementing these
specializations
Request Header Caching: strategies controlled by
svc.conf
Specialize Request Processing: POA request
processing policy
Marshaling/de-marshaling: ACE level
Pre-create request: IDL-generated code
Collocation specialization: macros, strategies
(Invocation classes)

Examine specialization at the Component
Middleware level and Infrastructural Middleware
level
