Title: Software Performance Engineering Failure Modes and Effects Analysis
1. Software Performance Engineering Failure Modes and Effects Analysis
- Presented by Kevin Mobley
2. SPE FMEA Agenda
- Definition
- What problem does an SPE FMEA solve?
- SPE FMEA anatomy
- Risk: anti-pattern assessment
- Likelihood of occurrence: frequency analysis
- Voice of the customer: willingness to wait
- Detection: control plan
- SPE FMEA life cycle with examples
- SPE FMEA prerequisites and tools
3. Modeling
- Problem/Goal/Scope
  - Problem: To simulate the day-to-day and peak-day operations of an application, the most critical business processes must be identified accurately
  - Goal: Define the top 20% of business processes that create 80% of the server requests, as well as outlier business processes that are severe performance risks
  - In Scope: Definition of business processes, anti-pattern analysis, business frequency, willingness to wait, and detection review
- Resource Plan
  - Business Analyst
  - Development Architect and Leads
  - Performance Architects
- Business Case
  - Ensures performance engineering focuses on the most critical user activity that will impact the application and system performance
  - Sources of Financial Benefits: Ensures investment in performance delivers the operational readiness of the application
  - First Year Annualized Benefits: Establishes a traceable and defensible methodology of how and why business processes were and were not included in the performance engineering effort. The confidence in this BP set will be tested during the first year of production
- Milestones
  - Gather the business processes currently used by client(s)
  - Rank anti-patterns for each business process
  - Analyze historical business process usage
  - Complete willingness to wait with client (implementations only)
  - Complete detection section
  - Review business process selection with stakeholders
  - Sign off on business processes
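The goal of finding the business processes that account for 80% of server requests can be sketched as a cumulative-share selection over historical usage counts. This is a minimal illustration, not the presenter's tooling: the function name `select_pareto`, the process names, and the request counts are all hypothetical, and outlier processes that are severe performance risks would still need to be added by hand from the risk section.

```python
def select_pareto(request_counts, coverage=0.80):
    """Return the busiest business processes that together reach `coverage`
    of all server requests (hypothetical helper, not from the slides)."""
    total = sum(request_counts.values())
    if total <= 0:
        return []
    selected = []
    running = 0
    # Walk the processes from most to least requested.
    for name, count in sorted(request_counts.items(),
                              key=lambda kv: kv[1], reverse=True):
        if running >= coverage * total:
            break
        selected.append(name)
        running += count
    return selected


# Made-up usage data for illustration only.
usage = {
    "Login": 520,
    "Search Loan": 310,
    "View Statement": 90,
    "Update Address": 50,
    "Export Report": 30,
}
top = select_pareto(usage)
print(top)
```

With this made-up data, the first two processes already cover 83% of requests, so the remaining three would only enter the analysis as risk outliers.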
4. SPE FMEA is a structured approach
- Identifies ways in which software can fail to meet critical performance requirements (response time, CPU and network utilization, etc.)
- Estimates the risk of an unanticipated performance failure
- Evaluates the current control plan for identifying and/or preventing these performance failures from occurring
- Prioritizes actions that should be taken to improve the software
5. SPE FMEA Anatomy
- Risk: Anti-Pattern Assessment
- Likelihood of Occurrence: Frequency Analysis
- Voice of the Customer: Willingness to Wait
- Detection: Control Plan
Columns: Business process | Description | Key risks | Risk | Frequency | Willingness to wait | Detection | Phase 1 Rank Priority Number (RPN)
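The slides do not state how the RPN is derived from the four scored sections. A minimal sketch, assuming the conventional FMEA practice of multiplying the rankings and assuming each section is scored from 1 to 10. The class name `FmeaRow`, its field names, and the sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    """One worksheet row; every score is assumed to be a 1-10 ranking."""
    business_process: str
    risk: int
    frequency: int
    willingness_to_wait: int
    detection: int

    def rpn(self) -> int:
        scores = (self.risk, self.frequency,
                  self.willingness_to_wait, self.detection)
        for score in scores:
            if not 1 <= score <= 10:
                raise ValueError(f"score out of range 1-10: {score}")
        # Assumption: RPN is the product of the four section rankings.
        return self.risk * self.frequency * self.willingness_to_wait * self.detection


# Made-up rows for illustration only.
rows = [
    FmeaRow("Validate User", risk=5, frequency=10, willingness_to_wait=2, detection=7),
    FmeaRow("Export Report", risk=10, frequency=2, willingness_to_wait=8, detection=10),
    FmeaRow("View Statement", risk=3, frequency=6, willingness_to_wait=4, detection=1),
]

# Highest RPN first: these are the candidates for testing and optimization.
for row in sorted(rows, key=lambda r: r.rpn(), reverse=True):
    print(f"{row.business_process}: RPN {row.rpn()}")
```

A product rather than a sum means a single low score (for example, detection fully covered) pulls the whole priority down sharply, which is the usual reason FMEA worksheets multiply.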
6. SPE FMEA Risk Section
TP Anti-Patterns (risk rank - criterion):
- Customization to Core Product: 1 - used out of box; 2 - minor; 5 - modest; 9 - substantial; 10 - new functionality
- Parse Cycles: 1 - no parse cycles; 3 - one parse cycle; 10 - 2 or more parse cycles
- % of Returned Data Used: 1 - 100%; 2 - 90%; 3 - 80%; 5 - 70%; 7 - 60%; 8 - 50%; 9 - 40%; 10 - 30% or less
- RT - Browser to Web Server: 1 - 1 or less round trips; 2 - 2 round trips; 5 - 3 round trips; 8 - 4 round trips; 10 - 5 or more round trips
- DB Inserts: 1 - 1 or less inserts; 2 - 2 inserts; 3 - 3 inserts; 4 - 4 inserts; 5 - 5 inserts; 6 - 6 inserts; 7 - 7 inserts; 8 - 8 inserts; 9 - 9 inserts; 10 - 10 or more inserts
7. SPE FMEA Risk Section
TP Anti-Patterns (risk rank - criterion):
- Content/Message Size: 1 - 95-100%; 2 - 90-95%; 3 - 85-90%; 4 - 80-85%; 5 - 70-80%; 6 - 60-70%; 7 - 50-60%; 8 - 40-50%; 9 - 30-40%; 10 - 30% or less
- Sort Tier: 1 - no sort; 2 - client sort; 3 - database sort; 6 - application layer sort; 10 - web server sort
- Debug Configuration: 1 - full debug admin; 3 - limited debug admin; 8 - no debug admin; 10 - no debug data
- RT ES to DB: 1 - 5 or less round trips; 3 - 10 round trips; 7 - 15 round trips; 10 - 20 or more round trips
- DB Reads: 1 - 1 or less reads; 2 - up to 10 reads; 3 - up to 20 reads; 4 - up to 30 reads; 5 - up to 50 reads; 7 - up to 100 reads; 10 - up to 500 reads
8. SPE FMEA Risk Section
TP Anti-Patterns (risk rank - criterion):
- Message Size: 1 - small (< 5 KB); 3 - average (< 20 KB); 5 - high (> 20 KB and < 50 KB); 8 - very high (> 50 KB and < 100 KB); 10 - extreme (> 100 KB)
- Cache Hit Ratio: 1 - no cache used; 1 - 100%; 2 - 90%; 3 - 80%; 4 - 70%; 5 - 60%; 6 - 50%; 7 - 40%; 8 - 30%; 9 - 20%; 10 - 10% or less
- RT ES to Host: 1 - 3 or less round trips; 5 - 5 or less round trips; 10 - 6 or more round trips
- DB Updates: 1 - 1 or less updates; 2 - 2 updates; 3 - 3 updates; 4 - 4 updates; 5 - 5 updates; 6 - 6 updates; 7 - 7 updates; 8 - 8 updates; 9 - 9 updates; 10 - 10 updates
9. SPE FMEA Risk Section
TP Anti-Patterns (risk rank - criterion):
- Bandwidth Impacts: 1 - 0 empty XML tags; 2 - 5 empty XML tags; 5 - 10 empty XML tags; 8 - 15 empty XML tags; 10 - 20 or greater empty XML tags
- XSL Transformation: 1 - 0 transformations; 3 - 1 transformation; 10 - 2 or more transformations
- RT WS to ES: 1 - 5 or less round trips; 3 - 10 round trips; 7 - 15 round trips; 10 - 20 or more round trips
10. Detection with a Zero Wait
Detection rank and criteria (likelihood that the existence of a defect will be detected by test content before the software advances to the next life cycle phase):
1 - Fully covered by previous SPE: SPE has already analyzed the business process in its current software and implementation state.
7 - Increased usage: Business process will be used at least 20% more than in the previous SPE analysis.
8 - Infrastructure differs from previous SPE effort: SPE has analyzed the business process with different infrastructure.
10 - SPE has never analyzed business process.
11. Detection with a Two Month Wait
Detection rank and criteria (likelihood that the existence of a defect will be detected by test content before the software advances to the next life cycle phase):
1 - Dashboard Covers: Dashboard as is will detect and report the performance problem, and the monitoring team knows how to detect the problem.
5 - Dashboard Admin Training Update Required: Dashboard as is will detect and report the performance problem, but the monitoring team requires detection training.
6 - Dashboard Admin Change Required: Dashboard with administration updates will detect and report the performance problem. Note: the monitoring team will require detection training.
12. Detection with a Six Month Wait
Detection rank and criteria (likelihood that the existence of a defect will be detected by test content before the software advances to the next life cycle phase):
1 - Dashboard Covers: Dashboard as is will detect and report the performance problem, and the monitoring team knows how to detect the problem.
2 - Dashboard Admin Training Update Required: Dashboard as is will detect and report the performance problem, but the monitoring team requires detection training.
3 - Dashboard Admin Change Required: Dashboard with administration updates will detect and report the performance problem. Note: the monitoring team will require detection training.
5 - Dashboard Code Modification Required: Dashboard with customization will detect and report the performance problem. Note: the monitoring team will require detection training.
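The three detection slides amount to a lookup keyed by wait horizon and detection condition. A minimal sketch of that lookup follows; the dictionary keys and the function name `detection_rank` are hypothetical labels, while the rank values are copied from the slides.

```python
# Rank values come from the three detection slides; the key names are
# shorthand invented for this sketch.
DETECTION_RANKS = {
    "zero_wait": {
        "fully_covered_by_previous_spe": 1,
        "increased_usage": 7,
        "infrastructure_differs": 8,
        "never_analyzed": 10,
    },
    "two_month_wait": {
        "dashboard_covers": 1,
        "training_update_required": 5,
        "admin_change_required": 6,
    },
    "six_month_wait": {
        "dashboard_covers": 1,
        "training_update_required": 2,
        "admin_change_required": 3,
        "code_modification_required": 5,
    },
}


def detection_rank(wait: str, condition: str) -> int:
    """Look up the detection rank; raises KeyError for combinations
    the slides do not define."""
    return DETECTION_RANKS[wait][condition]


print(detection_rank("two_month_wait", "training_update_required"))
print(detection_rank("six_month_wait", "training_update_required"))
```

The same dashboard condition scores lower when the wait is longer: a training update ranks 5 with a two month wait but 2 with a six month wait.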
13. SPE FMEA Life Cycle
- SPE FMEA during the design phase
- SPE FMEA during software development
- SPE FMEA during performance testing and optimization
- SPE FMEA during production
14. SPE FMEA During Design
- Guessing is okay as long as the process is structured and consistent
- Review code and interview application developers to complete the risk section
- Survey clients and product management for frequency data
- Advantages
  - Huge opportunity for redesign with smaller software budget impact
  - Localize the concept of performance into architecture design and coding decisions
  - Establish early how the non-functional requirements will be assessed
- Challenges
  - Stakeholders have lower confidence in SPE FMEA
  - High resistance by developers because most have never thought of the anti-patterns during design
  - Non-functional requirements are not standard in software development
15. Risk Example
- Message size is 24 KB -- Risk Rank is 5
- Content to Message Ratio is 11.2% -- Risk Rank is 10
- A snippet of the XML message:
    <InterestDueFromClosingAmount>0.00</InterestDueFromClosingAmount>
    <DailySimpleInterestOverdueInterestAmount>0.00</DailySimpleInterestOverdueInterestAmount>
    <PiggybackPrincipalBalance>0.00</PiggybackPrincipalBalance>
    <BuydownSubsidyRemainingBalance>0.00</BuydownSubsidyRemainingBalance>
    <AccruedLateChargeBalance>0.00</AccruedLateChargeBalance>
    <RuleOf78sUnearnedInterestUnpaidBalance>0.00</RuleOf78sUnearnedInterestUnpaidBalance>
    <RuleOf78sOriginalUnearnedInterestDueBalance>0.00</RuleOf78sOriginalUnearnedInterestDueBalance>
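The two ranks in this example follow from the Message Size and Content/Message Size scales in the risk section. A minimal sketch of those two lookups is below. The function names are hypothetical, and the handling of exact boundary values (for example, a message of exactly 20 KB) is an assumption, since the slides leave it unspecified.

```python
def message_size_rank(size_kb: float) -> int:
    """Message Size scale: <5 KB, <20 KB, <50 KB, <100 KB, otherwise extreme.
    Assumption: a size exactly on a boundary falls into the higher band."""
    if size_kb < 5:
        return 1
    if size_kb < 20:
        return 3
    if size_kb < 50:
        return 5
    if size_kb < 100:
        return 8
    return 10


def content_ratio_rank(ratio_pct: float) -> int:
    """Content/Message Size scale, ratio given in percent.
    Assumption: a ratio exactly on a boundary falls into the lower band,
    so 30% ranks 10 ('30 or less')."""
    bands = [(95, 1), (90, 2), (85, 3), (80, 4), (70, 5),
             (60, 6), (50, 7), (40, 8), (30, 9)]
    for lower_bound, rank in bands:
        if ratio_pct > lower_bound:
            return rank
    return 10


# Values from the risk example on this slide.
print(message_size_rank(24))     # 24 KB message
print(content_ratio_rank(11.2))  # 11.2% content-to-message ratio
```

A message like the snippet above, where long tag names wrap values such as 0.00, is what drives the content ratio toward the bottom of the scale.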
16. SPE FMEA During Development
- Measure data values for anti-patterns
- Make code changes
- Each developer delivers an anti-pattern spec sheet with checked-in code
- Advantages
  - Greatest breadth of analysis of software code
  - Product SPE FMEA is more comprehensive with each release
- Challenges
  - Architecture changes are harder
  - Impact to software budget increases
17. Risk Example
- We wrote a bridge parse tool; sample output:
  - Message Name: SMValidateUser (REQUEST)
  - Total Tags: 5
  - Total Empty Tags: 0
  - Content/Message Size Ratio: 15.13
  - Empty Tag/Total Tag Ratio: 0.00
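The slides do not show the parse tool itself. The following is a rough sketch of how such measurements could be taken with Python's standard `xml.etree.ElementTree`, under stated assumptions: an "empty tag" is taken to be an element with no child elements and no text, and the content/message size ratio is taken to be element text characters divided by total message characters, expressed in percent. The function name `message_stats` and the sample message are hypothetical.

```python
import xml.etree.ElementTree as ET


def message_stats(xml_text: str) -> dict:
    """Compute tag counts and size ratios for one XML message.
    The definitions used here are assumptions, not the original tool's."""
    root = ET.fromstring(xml_text)
    elements = list(root.iter())
    total_tags = len(elements)
    # Assumed definition: no children and no non-whitespace text.
    empty_tags = sum(
        1 for e in elements
        if len(e) == 0 and not (e.text or "").strip()
    )
    content_chars = sum(len((e.text or "").strip()) for e in elements)
    message_chars = len(xml_text)
    return {
        "total_tags": total_tags,
        "empty_tags": empty_tags,
        "content_message_ratio_pct": round(100 * content_chars / message_chars, 2),
        "empty_tag_ratio_pct": round(100 * empty_tags / total_tags, 2),
    }


# Made-up message for illustration only.
sample = "<Req><User>bob</User><Pwd></Pwd></Req>"
stats = message_stats(sample)
print(stats)
```

Running a tool like this during unit testing gives each developer measured values to put on the anti-pattern spec sheet instead of estimates.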
18. SPE FMEA During Performance Testing and Optimization
- Demonstrate correlation between anti-patterns and performance metrics
- Key driver for the focus of the testing and optimization effort
- Advantages
  - High-risk areas are the focus of performance testing and optimization, assuring the performance of the most important business processes
  - Creates a paper trail of why certain business processes were focused on and others were left out
  - Forces a review of the performance monitoring solution to determine whether adequate and timely detection of poor performance in second- and third-tier business processes is in place
  - Statistical analysis between performance and anti-patterns is available
- Challenges
  - Early generations of an application's SPE FMEA will be less accurate, causing some business processes to be improperly categorized
  - Anti-pattern weights may require adjusting
19. Risk Example
- Relationship between response size and response time
- One-way ANOVA: Time versus Response Size
- S = 2243, R-Sq = 38.24%, R-Sq(adj) = 36.96%
Source          DF   SS          MS         F      P
Response Size   10   1498158538  149815854  29.79  0
Error           481  2419190490  5029502
Total           491  3917349029
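The summary statistics on this slide can be re-derived from the degrees of freedom and sums of squares in the table using the standard one-way ANOVA formulas. The sketch below only checks the arithmetic; it does not recompute the P value, which would need the F distribution. Variable names are illustrative.

```python
import math

# Degrees of freedom and sums of squares copied from the ANOVA table.
df_factor, ss_factor = 10, 1498158538
df_error, ss_error = 481, 2419190490
df_total, ss_total = 491, 3917349029

ms_factor = ss_factor / df_factor              # mean square for Response Size
ms_error = ss_error / df_error                 # mean square error
f_stat = ms_factor / ms_error                  # F statistic
s = math.sqrt(ms_error)                        # pooled standard deviation (S)
r_sq = ss_factor / ss_total                    # share of variance explained
r_sq_adj = 1 - ms_error / (ss_total / df_total)  # adjusted for degrees of freedom

print(f"MS factor = {ms_factor:.0f}, MS error = {ms_error:.0f}")
print(f"F = {f_stat:.2f}, S = {s:.0f}")
print(f"R-Sq = {100 * r_sq:.2f}%, R-Sq(adj) = {100 * r_sq_adj:.2f}%")
```

An R-Sq of about 38% means response size accounts for a bit more than a third of the variation in response time in this sample, so other anti-patterns would be needed to explain the rest.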
20. SPE FMEA Prerequisites and Tools
- Develop application anti-patterns
- Put system under load
- Create a cause and effect model
- Assign initial risk values
- Collect and maintain usage data from clients
- Create parse tools to use during unit and
performance testing
21. Thank You
- Kevin Mobley
- kevin.mobley_at_gmail.com
- kevin.mobley_at_fnf.com