Title: A Brave New Frontier: Testing Live Production Applications
1. Presentation at AsiaSTAR 2004, Canberra,
Australia, 7 Sep 2004
- A Brave New Frontier
- Testing Live Production Applications
Dr Kelvin Ross, Steve Woodyatt, Dr Steven Butler
SMART Testing Technologies Pty Ltd
2. Roadmap
- Avoiding Production Problems
- Testing for Service Level Management
- Case Study
- Considerations Unique to Production Testing
- Information for SLM
- Implementation Choices
- Wrap-Up
3. Why Test On Production
- Despite best efforts to test an application prior
to deployment there are still post-deployment
problems that frequently occur
- Server offline
- No response
- Functions not available
- Incorrect response
- Slow response
- Security breach
- Data out-of-date
4. The user experience
- What is it that the user will experience in
dealing with our application
- E.g. Airline Reservation business process
- Search for flights
- Make a reservation
- Pay with credit card
- Obtain electronic ticket reservation code
- Confirmation by email with matching details
- Reservation details reported in frequent flyer
Information Flow
5. Distributed Architecture
[Diagram: Airline, External Systems, Remote Prices, Payment, Payment Gateway, Firewall, Email Gateway, Email]
7. Service Level Management
- Service Level Management (SLM)
- set of people and systems that allows the
organisation to ensure that SLAs are being met
and that the necessary resources are being
provided efficiently
- Service Level Agreement (SLA)
- contracts between service providers and
customers that define the services provided, the
metrics associated with these services,
acceptable and unacceptable service levels,
liabilities on the part of the service provider
and the customer, and actions to be taken in
specific circumstances
Definitions from IEC, Service Level Management
tutorial, www.iec.org
8. SLM in the context of ITIL
9. SLA KPIs
10. Approaches
11. Business Process Auditing (BPA)
[Diagram: Business Process Auditing]
Performance, Availability, Security, Functionality, Correctness, Accuracy, Completion
- Automated
- Real-time
Fault Diagnosis, Remedy
Reporting
Alerting
Service Level Management
12. Post-Deployment Testing and SLM
- Testing can be used to synthesise business
transactions
- Interact with system through various interfaces
- Collect and report metrics
- Transfer of technology predominantly used
pre-deployment
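A minimal sketch of a synthesised business transaction that collects metrics, assuming each business step can be wrapped in a Python callable. The names `ProbeResult`, `run_probe`, `fake_search` and `broken_search` are illustrative and do not come from any tool named in this presentation.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    name: str                    # business-level step name, e.g. "searchFlight"
    ok: bool                     # True if the step completed without raising
    elapsed_s: float             # response time in seconds
    error: Optional[str] = None  # description of the failure, if any


def run_probe(name: str, step: Callable[[], object]) -> ProbeResult:
    """Run one synthetic business step and record availability and timing."""
    start = time.perf_counter()
    try:
        step()
        return ProbeResult(name, True, time.perf_counter() - start)
    except Exception as exc:  # any failure is reported as "not available"
        return ProbeResult(name, False, time.perf_counter() - start, repr(exc))


def fake_search() -> str:
    # Hypothetical stand-in for a real interface call (HTTP, GUI, API, ...).
    return "3 flights found"


def broken_search() -> str:
    # Hypothetical stand-in for a step that hits an offline server.
    raise ConnectionError("server offline")
```

Calling `run_probe("searchFlight", fake_search)` yields a passing result with a measured response time; swapping in `broken_search` yields a failed result carrying the error text, which is the raw material for reporting and alerting.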
13. Problems Detected
- Problems detected
- End-To-End processes not available
- Responses slow
- Incorrect data
- Problems not detected
- Issues localised to individual clients
- Actual response times to all clients
14. Who Owns Production Testing
- The testing group?
- The support group?
- The operations group?
- The application owners?
- Marketing?
- Marriage of skills and technology required for
efficiency
The "we don't call that testing" syndrome
15. Which applications most benefit
- Those with real time dependence for completion of
vital business processes
- High risk dependence
- Financial
- Market reputation
- Probity, Accountability and Liability
- Potentially unreliable or difficult to manage
technology dependencies
- increasingly complex linkages
- distributed application architectures
- history of failure, problems
- Risk assessment
- SEVERITY x LIKELIHOOD
17. Objectives
- BPA Planning checklist
- What are the critical business processes
- Who are the users
- What is the user experience
- How can success be determined
- How can the test be automated
18. Airline Reservation Case Study
- Critical business processes
- Search available flights
- Make online booking
- Change booking
- Cancel booking
- Etc.
- Users
- Consumers
- Travel agents
- Call centre
19. Airline Reservation Case Study
- What is the user experience
- Search for flights
- Available
- Function accessible
- Response returned
- Correct
- Correct flights, source and destination, time,
etc.
- Complete
- No missing flights with available seats
- Responsive
- With tolerable response times
- How can success be determined
- What is the source of truth
20. Airline Reservation Case Study
- Choose what to monitor based on risk
- SEVERITY x LIKELIHOOD
- Previous operational reliability problems,
complex dynamic behaviour
- What was previously tested and will continue to
function
- Are there problems with distributed components
continuing to run appropriately, e.g. Tuxedo
services, LDAP authentication, payment gateway
not accessible
- Are there problems with timely propagation/retrieval
of data, e.g. flight data not retrieved
consistently, bookings not updated in timely
manner
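The SEVERITY x LIKELIHOOD ranking can be sketched as below. The 1 to 5 scales and every rating are assumptions made for illustration; they are not figures from the case study.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    process: str
    severity: int    # 1 (minor) to 5 (critical); scale is an assumption
    likelihood: int  # 1 (rare) to 5 (frequent); scale is an assumption

    @property
    def risk(self) -> int:
        return self.severity * self.likelihood


def rank_by_risk(candidates: List[Candidate]) -> List[Candidate]:
    """Highest risk first; ties keep their input order because sorted() is stable."""
    return sorted(candidates, key=lambda c: c.risk, reverse=True)


# Illustrative ratings only.
CANDIDATES = [
    Candidate("Search available flights", severity=3, likelihood=4),
    Candidate("Make online booking", severity=5, likelihood=3),
    Candidate("Cancel booking", severity=2, likelihood=2),
]
```

With these made-up ratings, `rank_by_risk(CANDIDATES)` puts "Make online booking" (risk 15) ahead of "Search available flights" (12) and "Cancel booking" (4), suggesting where monitoring effort goes first.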
21. Test Frameworks
- Outcomes have to be reported at business level,
not application object level
- Object level: too low level for audience
- getURL search.jsp
- saveForm, submitflight
- setParam, submitflight, startime, 200412011100
-
- submitForm, submitflight
- Business level: appropriate for audience
- searchFlight, return, 20041201110000, SYD,
- Action Word approaches recommended
- See Carl Nagle's or Hans Buwalda's work
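One way an action word layer might look, as a rough sketch rather than the frameworks referenced above. The registry, the `action_word` decorator and the parameter names `trip_type`, `depart_time` and `origin` are assumptions; the source example line is truncated, so the full argument list is not known.

```python
from typing import Callable, Dict, List

ACTION_WORDS: Dict[str, Callable[..., List[str]]] = {}


def action_word(func):
    """Register a function under its own name as a business-level action word."""
    ACTION_WORDS[func.__name__] = func
    return func


@action_word
def searchFlight(trip_type: str, depart_time: str, origin: str) -> List[str]:
    # The object-level steps stay hidden inside the action word.
    # They are returned as strings here; a real implementation would
    # drive the application instead. trip_type and origin are unused
    # in this sketch.
    return [
        "getURL search.jsp",
        "saveForm, submitflight",
        f"setParam, submitflight, startime, {depart_time}",
        "submitForm, submitflight",
    ]


def run_line(line: str) -> List[str]:
    """Parse 'word, arg1, arg2, ...' and dispatch to the registered action word."""
    word, *args = [part.strip() for part in line.split(",")]
    return ACTION_WORDS[word](*args)
```

A test table line such as `searchFlight, return, 200412011100, SYD` is passed to `run_line`; reports can then show the action word while the object-level steps stay out of view.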
22. Dynamic behaviour
- searchFlight, return, 200412011100, SYD,
- Won't remain useful for long as production data
is dynamic
- Dynamic input data
- searchFlight
- Type: return
- DepartTime: today()_at_10am + 1 month
- ReturnTime: today()_at_10am + 1 month + 5 days
- Depart: Sydney
- Arrive: Melbourne
- May even want to randomise data
- Vary depart and arrive on successive runs
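A sketch of computing the dynamic inputs at run time, assuming the intent is to depart at 10am one month from today and return five days later. The helper names, the date format and the list of cities to randomise over are assumptions.

```python
import calendar
import random
from datetime import datetime, timedelta


def add_months(moment: datetime, months: int) -> datetime:
    """Shift by whole months, clamping the day to the end of shorter months."""
    index = moment.month - 1 + months
    year = moment.year + index // 12
    month = index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def search_params(now: datetime, rng: random.Random) -> dict:
    """Build searchFlight inputs relative to 'now' instead of hard-coding dates."""
    ten_am = now.replace(hour=10, minute=0, second=0, microsecond=0)
    depart_time = add_months(ten_am, 1)
    return_time = depart_time + timedelta(days=5)
    # Vary the city pair on successive runs; the city list is an assumption.
    depart, arrive = rng.sample(["Sydney", "Melbourne", "Brisbane"], 2)
    return {
        "Type": "return",
        "DepartTime": depart_time.strftime("%Y%m%d%H%M"),
        "ReturnTime": return_time.strftime("%Y%m%d%H%M"),
        "Depart": depart,
        "Arrive": arrive,
    }
```

Passing a seeded `random.Random` makes a run repeatable when a failure needs to be reproduced, while an unseeded one varies the route on each cycle.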
23. The Test Oracle
- Mechanisms for determining correct response
- Get any response
- Get a response containing predefined expected
values
- Expected values are checked using an oracle
- E.g. formula determining whether valid date
returned
- Results are compared to reference data
- 3rd party data feed
- Trusted internal source, e.g. Mainframe
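The oracle mechanisms above, from weakest to strongest, might be sketched as follows. Function names and the date format are assumptions, and the reference data would in practice come from the third-party feed or trusted internal source.

```python
from datetime import datetime
from typing import Iterable, List


def got_response(body: str) -> bool:
    """Weakest check: any non-empty response."""
    return bool(body.strip())


def contains_expected(body: str, expected: Iterable[str]) -> bool:
    """The response contains every predefined expected value."""
    return all(value in body for value in expected)


def is_valid_date(text: str, fmt: str = "%Y%m%d%H%M") -> bool:
    """Formula-style oracle: is the returned date well formed?"""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def missing_from_response(returned: Iterable[str], reference: Iterable[str]) -> List[str]:
    """Reference-data oracle: items the trusted source lists but the response omits."""
    returned_set = set(returned)
    return sorted(item for item in set(reference) if item not in returned_set)
```

For example, `missing_from_response(["XX101"], ["XX101", "XX205"])` reports the hypothetical flight `XX205` as absent from the search result, which is a completeness failure rather than an availability failure.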
24. 3rd Party Reference
- Trending against price data
25. Airline Reservation Case Study
- Verification failures for searchFlight response
27. Scheduling the test
- How often
- 1 minute, 5 minutes, hourly, daily, weekly
- Depends on how quickly support can respond
- What business hours
- 24x7, 9 to 5, higher frequency at certain events
- What about scheduled outages
- Planned outages, public holidays
- Coordinating tests
- Locking to prevent simultaneous tests
- E.g. don't check prices or submit orders unless
logged in
- Semaphores
28. Sensitive Data
- Frequently there may be sensitive information
stored in scripts and test logs
- Logins and passwords
- Credit card IDs
- Personal details, e.g. phone numbers, ABNs, etc.
- Where possible avoid
- Use dummy accounts
- Don't log sensitive information
- Can be difficult to control, e.g. failure may save
screen shot that then displays credentials
- Use encryption
- Sensitive data is stored encrypted, but test
engine still requires the key to send it
- At least it is obfuscated
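One way to keep sensitive values out of test logs is to mask them before they are written. This is a sketch using the standard `logging` module; the two patterns are assumptions, will not match every format, and may over-mask other long digit runs.

```python
import logging
import re

# Patterns are illustrative assumptions; real data formats will differ.
_CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators
_PASSWORD = re.compile(r"(password\s*=\s*)\S+", re.IGNORECASE)


def redact(text: str) -> str:
    """Mask card-like digit runs and password values."""
    text = _CARD.sub("[CARD]", text)
    return _PASSWORD.sub(r"\1[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Logging filter that rewrites each record's message through redact()."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # message is already formatted
        return True       # keep the record, now masked
```

Attaching the filter with `logger.addFilter(RedactingFilter())` covers log text only; screen shots saved on failure still need separate handling.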
29. Where tests should be run from
- Many tools allow tests to be run from multiple
locations
- Simulate users of different geographies
- Different connection speeds to report on a
variety of user experiences
- Inside/outside firewall
- Probably the largest concern
- Consumer users outside, Corporate users inside
- To provide end-to-end scenarios, may need
combination
- Scenario initiated internally, and end results
are propagated to external, or vice-versa
- External view of web may be verified using Test
Oracle data that is internal
- Agents may be deployed internal and external to
run tests
30. Problems to Avoid
- Need to be aware of impact of testing
- Performance hits
- Volatile features
- Intrusive tests
- Biased results
- Compliance restrictions
- Impact on Business KPIs
- Taking measurements may distort the system being
measured
31. Minimising the Effect of Transactions
- Cost of Transaction
- Financial: purchasing a flight may incur credit card
merchant fee
- Resource: seats unavailable until refund
provided, searching places additional load on
resource pool
- Reversing the transaction
- Providing a refund, merchant fee may still apply
- What if the transaction is incomplete
- What happens if refund process doesn't
occur/complete
- Compliance issues
- Corporate
- Legislative
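A sketch of reversing a test transaction and catching the case where the reversal itself does not complete. The callables `book`, `verify` and `cancel` are hypothetical hooks into the application under test, and the follow-up list stands in for whatever manual fallback process is agreed.

```python
from typing import Callable, List

manual_followups: List[str] = []  # booking references that need manual reversal


def probe_booking(book: Callable[[], str],
                  verify: Callable[[str], bool],
                  cancel: Callable[[str], None]) -> bool:
    """Book, verify, then always attempt reversal; queue a follow-up if it fails."""
    reference = book()
    try:
        return verify(reference)
    finally:
        try:
            cancel(reference)
        except Exception:
            # The refund did not occur or complete: leave a trail for manual cleanup.
            manual_followups.append(reference)


def failing_cancel(reference: str) -> None:
    # Hypothetical refund step that breaks, to exercise the fallback path.
    raise RuntimeError("refund did not complete")
```

The reversal runs even when verification fails or raises, so a broken test does not leave seats held. Fees that apply regardless of the refund still have to be budgeted for.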
32. Managing the Test Impact
- Modifications to the application under test to
clean up data or control test effects
- Manual fallback may be convenient option
- Test Objects
- Dummy frequent flyer accounts
- Dummy cost centres
- Testing the tests
- Access to test environment pre-deployment
- Endurance test that can be part of application
test strategy
- Transfer of load, stress and endurance test
scripts
34. Effective Reporting
- Who are the users of the reports, different
expectations on presentation/content
- Business/Application Manager
- Operations
- Development
- Support
- SLM
- How do they access reports?
- Web, email, Thick client
- Which reports are real-time or batched
- Is data summarised, or is original data accessible
35. Historic Reporting
- Service level reports
- Trends
- Progress
- Post Mortem Analysis
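A service level report can be derived from stored test outcomes. The sketch below computes availability and a 95th percentile response time and compares them with targets; the target values, the nearest-rank percentile method and all names are assumptions, not SLA figures from this presentation.

```python
import math
from typing import List, Tuple

Sample = Tuple[bool, float]  # (succeeded, response time in seconds)


def availability_pct(samples: List[Sample]) -> float:
    """Share of test runs that succeeded, as a percentage."""
    return 100.0 * sum(1 for ok, _ in samples if ok) / len(samples)


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def sla_report(samples: List[Sample],
               min_availability: float = 99.0,
               max_p95_s: float = 3.0) -> dict:
    """Compare measured levels with targets; the default targets are placeholders."""
    times = [t for ok, t in samples if ok]
    avail = availability_pct(samples)
    p95 = percentile(times, 95) if times else float("inf")
    return {
        "availability_pct": avail,
        "p95_s": p95,
        "met": avail >= min_availability and p95 <= max_p95_s,
    }
```

Running the same report over successive periods gives the trend and progress views; keeping the raw samples allows post mortem analysis to go back to the original data.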
36. Realtime Reporting
- Alerts
- Current status
- Diagnosis
37. Diagnosing root cause and remedies
- Accessing fault and failure data for multiple
components
- Pinpoint failures
- Correlation is a skill
- manual, expert analysis required
- Variety of support
- Saved actual results
- Unattended collection for debugging
- Correlation with component performance analysis
- Automated correlation with component failure
modes
- Sophisticated expert system
- Rules that correlate tested events to arrive at
diagnosis of root cause(s)
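A very small sketch of rules that correlate tested events to a diagnosis. It is far simpler than the expert system described above; the check names, the causes and the "most specific rule first" strategy are all assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple

# Each rule maps a set of failed checks to a suspected cause.
# Check names and causes are illustrative assumptions.
RULES: List[Tuple[FrozenSet[str], str]] = [
    (frozenset({"login", "searchFlight", "makeBooking"}), "web tier or network unreachable"),
    (frozenset({"login"}), "LDAP authentication unavailable"),
    (frozenset({"payment"}), "payment gateway not accessible"),
]


def diagnose(results: Dict[str, bool]) -> List[str]:
    """Return suspected causes; a matched rule explains the failures it covers."""
    unexplained = {name for name, ok in results.items() if not ok}
    causes = []
    # Try rules covering the most failures first so broad outages win.
    for pattern, cause in sorted(RULES, key=lambda rule: len(rule[0]), reverse=True):
        if pattern <= unexplained:
            causes.append(cause)
            unexplained -= pattern
    if unexplained:
        causes.append("unexplained failures: " + ", ".join(sorted(unexplained)))
    return causes
```

A lone `login` failure points to authentication, while `login`, `searchFlight` and `makeBooking` failing together point to a shared upstream cause. Anything the rules do not cover is reported as unexplained for manual analysis.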
38. Fault Analysis
40. Tool Requirements
- Evaluation Checklist
- Test script can interact with a variety of
systems
- GUI, Terminal, APIs, HTTP, SOAP, POP/SMTP, etc.
- Test script can respond to dynamic behaviour
- Agents can be deployed internal/external of the
WAN
- Ability to control frequency
- Time based functions can be used to control
execution
- Functions available for data manipulation for
dynamic responses (time, extraction, etc.)
- Inter-process coordination between tests using
locking/semaphores
- Test steps can be reported on business process
steps, object actions can be hidden in reports
- Test outcomes saved to repository for later
analysis
- Ability to export data for other purposes, e.g.
trending, visualisation, etc.
- Reporting capability on stored data
- Online ability to drill into test data for
problem diagnosis
- Alerting mechanisms to email, SMS, online
dashboards
- Alerting can be controlled, i.e. escalation,
filtering
- Apply weighting to each criterion according to need
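Applying weights to the checklist can be sketched as a weighted average. The criteria shown are a subset of the checklist, and the weights, ratings and tool names are made up for illustration; they are not evaluations of any real product.

```python
from typing import Dict


def weighted_score(ratings: Dict[str, int], weights: Dict[str, int]) -> float:
    """Weighted average of ratings; criteria missing from ratings score 0."""
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings.get(c, 0) for c in weights) / total_weight


# Weights (1-5) reflect need; ratings (0-5) reflect how well a tool meets each criterion.
WEIGHTS = {"interfaces": 5, "dynamic data": 4, "agents": 3, "alerting": 2}
TOOL_A = {"interfaces": 4, "dynamic data": 5, "agents": 2, "alerting": 3}
TOOL_B = {"interfaces": 5, "dynamic data": 2, "agents": 4}  # no alerting support
```

Here hypothetical `TOOL_A` scores 52/14 and `TOOL_B` scores 45/14, so A ranks higher even though B is stronger on the most heavily weighted criterion.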
41. Implementation Choices
- Available Commercial Tools/Services
- SmartTestTech - SMARTCat
- Mercury Topaz
- Compuware Vantage
- Keynote
- To a lesser extent, enterprise monitoring tools
- BMC Patrol, Tivoli, HP OpenView
- Home Brew Tools
- Extensive support for testing protocols in open
source frameworks
- E.g. Java/JUnit, .NET/NUnit, Perl/Ruby/Python
- Extend Existing In-house Regression Test Suites
- Automated scripts may be adapted
- Robot, QARun, WinRunner, Silk
- Post results to Database
- Provide reporting capability
- e.g. Crystal Reports, Cognos, etc.
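For the home brew route, posting results to a database might look like the sketch below, which uses Python's built-in `sqlite3` module. The table name, columns and summary query are assumptions; a reporting tool would point at whichever database is actually used.

```python
import sqlite3
from datetime import datetime, timezone


def open_repository(path: str = ":memory:") -> sqlite3.Connection:
    """Open the results store and create the table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        " run_at TEXT NOT NULL, step TEXT NOT NULL,"
        " ok INTEGER NOT NULL, elapsed_s REAL NOT NULL)"
    )
    return conn


def post_result(conn: sqlite3.Connection, step: str, ok: bool, elapsed_s: float) -> None:
    """Store one business-level test outcome for later reporting."""
    run_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO results (run_at, step, ok, elapsed_s) VALUES (?, ?, ?, ?)",
            (run_at, step, int(ok), elapsed_s),
        )


def summary(conn: sqlite3.Connection) -> list:
    """Per-step run count, success count and mean response time."""
    return conn.execute(
        "SELECT step, COUNT(*), SUM(ok), AVG(elapsed_s)"
        " FROM results GROUP BY step ORDER BY step"
    ).fetchall()
```

An adapted regression script would call `post_result` after each business step; `summary` shows the kind of query a report could be built on.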
43. Wrap-up
- Strong business case
- Benefit in bringing testing to the production
world
- Small percentage availability increase translates to
large
- Manages reputational risk with user base
- Large investment in SLM
- SLAs very ad-hoc and not measured
- Uses tests to provide SLM reports to Business /
Application Managers
- Leveraging the investment in test resources
- Protects overall investment
44. Questions & Answers
- Contact details
- Dr Kelvin Ross
- SMART Testing Technologies Pty Ltd
- PO Box 131, West Burleigh, Q4219
- AUSTRALIA
- Ph +61 7 5522 5131
- Email kelvinr_at_smarttesttech.com