Title: Managing your Blackboard
1Managing your Blackboard System for Growth and
Performance
- Presented By Steve Feldman
April 13, 2005
2Welcome
- Session Objectives
- Introduction to Capacity Planning
- Introduction to Performance Management
- Handling Performance and Capacity Issues
- Introduction to Load Testing
- Innovation
- Methodology for Resolving Issues
- Results/Outcomes
- Awareness of what you are doing well or not doing
at all.
3Introduction: About Your Presenter
- What do I do at Blackboard?
- Director, Software Performance Engineering and Architecture
- Part of Product Development, but interface with every department in Blackboard.
- Manage the Software Performance Engineering (SPE) process as part of the development lifecycle.
- A few key points
- Been at Blackboard since the Fall of 2003.
- Worked on AP2, AP3 and R7.0
- Manage a team of several developer/engineers.
- Practicing Member of CMG
4Performance Maturity Model: Where do you fit in?
- Level 1: Reactive Fire Fighting
- Level 2: Monitoring and Instrumenting
- Level 3: Performance Optimizing
- Level 4: Business Optimizing
- Level 5: Process Optimizing
- Source: Michael Maddox, MCI, "A Performance Process Maturity Model"
5Invest Your Time Understanding Performance and
Capacity
- Set Performance Objectives from the Start
- Optimize Your Environment from the Start.
6Set Performance and Capacity Objectives from the Start
- It's never too late to define a performance or capacity objective.
- Come as the result of a problem or issue
- Solving a maintenance window or schedule
- Planning for an upgrade
- Planning for a rollout to new users
- New Blackboard Building Blocks, Features or Integration
- Define Clear and Concise Objectives
- Measurable/Quantifiable and Achievable
- Differentiate between Performance and Capacity
- Processing Time versus Workload
- Growth versus Adoption
- Resource Utilization and Maintenance
7Optimize Your Environment from the Start
- Blackboard environments are moving from supported to mission critical (Application Management Maturity Model)
- Dedicate equipment and even network bandwidth.
- Understand the working parts
- Acquire knowledge about the integrated sub-systems.
- You don't need to be a web, app or db guru, but know enough to:
- Manage and Maintain Independently
- Research Knowledge Gaps
- Solve Common Issues without Help
8Optimize Your Environment from the Start
- Optimize the Environment from the Start based on Knowledge of Sub-Systems
- Monitor and Instrument Regularly
- Talk to Your Users about their Experience.
- Investigate Yourself
- Finding the Right Configuration takes time
- Make 1 Change at a Time
- Make the Change Based on Empirical Information (Not Hunches)
- Maintain a Consistent Configuration for 1 period of time (month, semester or a grading period)
9Introduction to Capacity Planning
10Capacity Planning: Building an Ideal Blackboard Environment
- What is Capacity Planning?
- Capacity Planning Factors
- Determine an Initial Deployment Architecture.
- Handling Adoption and Growth
- Archiving Data
- Backups and Restoration
- Maintenance Windows and Tasks
- Integrating with External Systems
- Redundancy and Failover
- Business Processes
- Upgrades
- Rolling out New Features
- Capacity Planning Tools
11Capacity Planning Factors: Determine an Initial Deployment Architecture
- It's Never Too Late to Consider or Reconsider Your Deployment Architecture.
- Try to Understand Key Components
- Eventual Audience Rollout
- User Behavior
- Session Patterns
- Frequency
- Concurrency
- Data Management Strategy
- Resource Needs
- Processing
- Storage
12Capacity Planning Factors: Handling Adoption and Growth
- Work with Functional Leaders to Understand Deployment Strategy
- Adoption Patterns of Users and Features
- Study Growth
- Not just users and courses, but data and content.
- Instrument daily, weekly, monthly, yearly, etc.
- Study the Activity Patterns of your Users (Behavior Modeling)
- Session Times
- Where they go and what they do
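Studying growth from regular instrumentation can be sketched in a few lines. The following is a minimal Python sketch, not a Blackboard tool: the monthly storage figures, the 200 GB limit and the function names are all made-up assumptions, and it assumes growth compounds at the average observed rate.

```python
def average_growth_rate(samples):
    """Average period-over-period growth rate across consecutive samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    rates = [(later - earlier) / earlier
             for earlier, later in zip(samples, samples[1:])]
    return sum(rates) / len(rates)


def periods_until(current, limit, rate):
    """Whole periods until `current` reaches `limit` at a compound `rate`.

    Returns None when the rate is not positive (no growth to project).
    """
    if rate <= 0:
        return None
    periods = 0
    while current < limit:
        current *= 1 + rate
        periods += 1
    return periods


# Hypothetical monthly content-store sizes in GB (illustrative only).
monthly_content_gb = [100.0, 110.0, 121.0, 133.1]
rate = average_growth_rate(monthly_content_gb)
months_left = periods_until(monthly_content_gb[-1], 200.0, rate)
print(f"average monthly growth: {rate:.1%}, months until 200 GB: {months_left}")
```

The same calculation applies to any instrumented quantity (users, courses, content, database size), provided the samples are taken at a consistent interval.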
13Capacity Planning Factors: Archiving Data
- A Lot of Data Is Viewed as Disposable by Many and Priceless by a Few
- Define a Strategy Early On About Archiving Data.
- Enable Tracking and Study Last Modified
- Use BB Tools to Archive and Export
- Remove from the System
- Maintain Activity Accumulator Data
- Export
- Purge Regularly
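Studying last-modified dates to pick archive candidates can be sketched as follows. This is a minimal Python sketch under stated assumptions: the course IDs, dates and the one-year idle threshold are hypothetical, and in practice the last-modified data would come from your own tracking rather than from a hard-coded dictionary.

```python
from datetime import date, timedelta


def archive_candidates(last_modified_by_course, today, max_idle_days):
    """Return course IDs not modified within the last `max_idle_days` days."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(course_id
                  for course_id, modified in last_modified_by_course.items()
                  if modified < cutoff)


# Hypothetical course IDs and last-modified dates (illustrative only).
courses = {
    "BIO101_F03": date(2003, 12, 15),
    "HIS110_S04": date(2004, 5, 10),
    "ENG200_S05": date(2005, 3, 30),
}
stale = archive_candidates(courses, today=date(2005, 4, 13), max_idle_days=365)
print(stale)
```

Courses on the resulting list would then be archived and exported before being removed from the system.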
14Capacity Planning Factors: Backups and Restoration
- Database Backups
- Differential versus Full
- Depends on Size, Confidence in Process and Usage
- Plan for the Unexpected
- Restore on Development Environments Routinely
- Store in a Safe Place
- Practice During Maintenance Windows
- File System Backups
- Perform Regularly
- Just as Valuable as database back-ups
- Not just data, but configuration
15Capacity Planning Factors: Maintenance Windows and Tasks
- Keep Your Users Informed
- Downtime/Outages
- Periods where Performance Can be Affected
- Schedule Regularly
- Log Rotations
- Server Restarts
- Database Statistics, Index Rebuilds and Extent Management
- Data Fragmentation
- Archiving and Purging Data
- Service Packs and Upgrades (discussed later)
16Capacity Planning Factors: Integrating with External Systems
- Understand the integration
- What data is affected
- Inbound versus Outbound
- Frequency of Integration
- Real-time versus Batched/Scheduled
- Ideally without manual intervention
- The performance of neither system should be affected by the integration
17Capacity Planning Factors: Failover and Redundancy
- Have a Plan
- Make a Budget
- If no budget, communicate plan and downtime
- Practice for the Unexpected
- Be Realistic
- Built-In Capabilities for Redundancy and Failover
- Blackboard Load-Balancing
- SQL-Server Clustering and Oracle RAC
- Quality of Service Models
- Tomcat Clusters
18Capacity Planning Factors: Business Processes
- Define Schedule with Functional and Technical Leaders
- Schedule for an extended period of time
- Map out window based on need and usage
- Model and Prototype
- Make Sure the Window is Large Enough
- Business processes should make sense and be realistic
- Schedule During Periods of Low Usage and Non-Peak Times
- Make it Repeatable, Automated and Easy to Debug
19Capacity Planning Factors: Planning for Upgrades
- Updating Versions of Blackboard
- Take Advantage of New Features
- Functional Patches
- Performance Same or Optimized
- Performance Requirement for Every Development Release
- Updating Platform Technology
- Platform Patches
- Operating System Upgrades
- Plan for Downtime (Data Restoration)
- Updating Hardware Architecture
- Plan for Downtime (Data Restoration)
- Take Advantage of Faster, Cheaper Equipment
20Capacity Planning Factors: Rolling Out New Features
- Understand How New Features Change the Following:
- Customer/User Behavior
- Adoption
- Growth
- Resource Utilization
- Integration Patterns
- Business Process Changes
21Capacity Planning Tools
- Behavior Modeling
- What is it?
- What tools can you use?
- Valid Instrumentation Periods.
- What to look for and to learn from the data.
- Homegrown Tools (What to Mine)
- Last Modified
- Growth Changes
- Adoption Patterns
- Concurrency Patterns
- Business Processes (Run Times)
22Behavior Modeling
23Capacity Planning Resources
- Modeling
- SPEED
- IBM Rational
- Simul8
- Opnet
- NetIq (WebTrends)
- Many Freeware Products on SourceForge
- Resources
- Performance by Design: Computer Capacity Planning by Example (Menasce, Daniel)
24Introduction to Performance Management
25Measuring Performance
- What to Focus On
- Response Time
- Processing Time
- Storage/Growth (volumetric patterns)
- Workload (Processing and Memory)
- Network Utilization/Bandwidth
- Adoption/Behavior
- New Features and Deployments
- Plot, Measure and Model
- Distinct Sessions
- Physical Resource Utilization (Workload)
- Logical Resource Utilization
26Measuring Performance
[Figure: chart with axes labeled Users, Workload, Sessions Per Hour and Time (0 to 60), annotated with Peak of Concurrency, Peak of Saturation, Point of Max Workload, Slope of Abandonment and Slope of Recovery.]
27Quality of Service Paradigm
- A web application's quality of service is measured by response time, throughput and availability.
- Poor quality of service leads to abandonment, decline in adoption and potentially permanently lost users.
- QoS is key to assessing how well Web-based applications meet user expectations on two primary measures: availability and response time.
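The three QoS measures named above can be computed from basic monitoring data. The following is a minimal Python sketch, assuming you already collect per-request durations and total downtime for a measurement window; the function name, field names and sample numbers are illustrative, not part of any Blackboard tooling.

```python
def qos_summary(request_durations_s, window_s, downtime_s):
    """Summarize response time, throughput and availability for one window.

    request_durations_s: duration in seconds of each completed request
    window_s: length of the measurement window in seconds
    downtime_s: seconds within the window that the application was unavailable
    """
    if not request_durations_s or window_s <= 0:
        raise ValueError("need at least one request and a positive window")
    count = len(request_durations_s)
    return {
        "mean_response_s": sum(request_durations_s) / count,
        "throughput_rps": count / window_s,
        "availability": (window_s - downtime_s) / window_s,
    }


# Illustrative numbers: four requests in a one-hour window with 36 s of downtime.
summary = qos_summary([0.5, 1.5, 1.0, 1.0], window_s=3600, downtime_s=36)
print(summary)
```

Tracking these three numbers per window (hour, day, grading period) gives a simple baseline against which abandonment and adoption changes can be compared.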
28Quality of Service: All for One and One for All Architecture
- What exactly does this mean?
- In today's architecture no system, sub-system, use case, transaction, data element, etc. has a greater utility value than its neighbor component in the system.
- Is this an accurate representation of the product?
- In Blackboard, all things are not created equally or weighted equally in value as deemed by our users.
- However, our architecture is such that all things are created and weighted equally.
- Why is this bad?
- The QoS of the application becomes unpredictable.
- No guarantees can be made for capacity planning and utilization.
- Clients rarely have the comfort level that their application environment is ever stable other than during periods of light usage.
29Quality of Service: All Things are Not Equal, So Let's Not Treat Them Equally
- From a psychological perspective, it's easy to predict which systems have greater QoS needs than others.
- Taking an assessment has a greater utility than reading an announcement.
- Entering gradebook scores has a greater utility than adding a course document or folder.
- From a workload perspective, it's easy to conceptualize which systems have greater QoS needs than others.
- A lab of 20 students taking an assessment places a greater workload on the system than a lab of 20 students reading a course document.
- A virtual workshop of 20 users collaborating places a greater workload on the system than 20 students navigating through a course.
30Quality of Service: Where Can We Go With This?
- Resource management policies and procedures can be implemented to support the workload needs of the system.
- Sub-system or potentially task workload monitoring.
- Administrator-defined thresholds for application management.
- Seasonal deployment changes based on patterns/trends of usage or even predefined scheduling by course administrators.
- Better utilization of capital expenditures.
- Potentially more expensive with greater adoption.
- Quantifiably reliable.
31Quality of Service: Example
[Figure: General Workload versus Distributed Workload]
32Dealing with Performance and Capacity Issues
33Dealing with Performance Issues
- Solving a performance issue is no different than solving a functional issue. The same level of care and effort should be given to solving it. We recommend the following three steps as the appropriate path for problem determination and resolution:
- Decompose the Problem
- Resolve the Issue
- Follow Up and Prevent
34Dealing with Performance and Capacity Issues
- Most clients fail to report performance issues. The bulk users of the system (students) rarely report issues.
- Most issues are reported when:
- Administrators experience performance issues first hand for their own tasks.
- Instructors are performing course administration activities.
- Instructors are working on the product in a classroom environment.
- Administrators pick up student chatter in blogs and discussion boards.
- What does that mean?
- Identifying the actual performance bottleneck is hard and requires a well-formulated approach.
- Performance issues are primarily the result of:
- Poor System Management in Dealing with Growth
- Changes in Adoption Patterns (Concurrency Thresholds)
- Functional Issues in the Application
- Undersized Hardware and Resources
- User Error (Unrealistic Operations)
35Characteristics of a Good Problem Resolution
Methodology
- Measurable
- Reliable
- Deterministic
- Practical
- Finite
- Predictive
- Efficient
- Impact Aware
36Performance Resolution Methods
- Trial and Error Method
- Response Time Method
- Do Nothing and Ignore Method
- Blame the Users Sub-Method
- Blame the Hardware Sub-Method
- Blame the Vendor Sub-Method
37Trial and Error Method
- Identify that a particular operation X has an unacceptable response time.
- Make changes with the intent of improving X.
- Remove any changes that make X worse.
- If improvement is not perceived, go back and make additional changes.
- If the improvement is minor, go back and make more changes, as additional changes may produce further improvements.
38Response Time Method
- Select the critical operations for which the business needs improved performance.
- Collect proper diagnostic data during periods of poor performance, with a focus on:
- Response Time Consumption
- Execute the optimization activity that will have the greatest net payoff to the business.
- If the best payoff activity fails to yield the desired results, suspend optimization activities until something changes.
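One way to read "response time consumption" is as a profile of where the elapsed time of a critical operation goes. The following is a minimal Python sketch of that reading: the component names and timings are hypothetical, and the diagnostic data would in practice come from whatever tracing or profiling tool you use.

```python
from collections import defaultdict


def rank_by_time_consumed(samples):
    """Rank components by total time consumed, largest first.

    samples: iterable of (component, seconds) pairs collected while the
    critical operation was performing poorly.
    Returns a list of (component, total_seconds, share_of_total).
    """
    totals = defaultdict(float)
    for component, seconds in samples:
        totals[component] += seconds
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(name, total, total / grand_total) for name, total in ranked]


# Hypothetical timing samples for one slow operation (illustrative only).
samples = [
    ("render_page", 2.0),
    ("db_query", 6.0),
    ("db_query", 5.0),
    ("file_io", 1.0),
    ("render_page", 2.0),
]
profile = rank_by_time_consumed(samples)
for name, total, share in profile:
    print(f"{name:12s} {total:5.1f}s {share:6.1%}")
```

The component at the top of the profile is the first candidate for optimization; whether it has the greatest net payoff still depends on the cost of fixing it.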
39Example 1
- Scenario: Butch (Student) logs into Blackboard to access music files he stores in the Content Collection. He selects the appropriate tab and waits for the left navigation frame to completely load. He ends up waiting for 2 minutes until the tree fully loads. Angered by repeated incidents of this, he sends a furious email to the system administrators complaining about the time he lost waiting for the tree to load.
- Question: How do we address this problem appropriately?
40Example 2
- Scenario: The accounting department has decided to use the Blackboard assessment engine for high-stakes testing during semester mid-terms. The department has issued a 1000-question random block assessment, in which students will be responsible for answering 25 questions in an all-at-once deployment fashion. The department wants all 500 students to complete testing during a 2-hour window over the course of a week.
- The last time the department used Blackboard for high-stakes assessment, students complained about page load times and a few incidents in which students were kicked out of the application, resulting in a locked assessment.
- Question: The department has approached you for help. How do you avoid a repeat of the issue?
41Example 3
- Scenario: An integration between the campus SCT system and Blackboard must take place to ensure students and faculty exist in the system with the appropriate course enrollment based on recent course registration. The integration must take place prior to the beginning of the semester. The same integration took place last semester, but was deemed a failure by the faculty as it took over a week for all courses, faculty and students to be entered and associated on the system.
- You were/are the administrator in charge of the integration. Part of the problem was that your data feeds from SCT were unorganized. Another problem was that you ran into a large number of system-level issues that caused your integrations to fail.
- Question: How do you reduce the risk and ensure successful integration?
42Example 4
- Scenario: You have procured budgetary funding to replace the older Blackboard servers and storage device with newer hardware. This new hardware is expected to solve all of your performance problems. The new servers will arrive in late May, which will give you 45 days to configure and convert your Blackboard environment before the bulk of your students get back on the system. You have been told by your boss that the system can only be down for 48 hours, as the summer school still uses Blackboard.
- Question: How do you ensure a smooth conversion with minimal downtime? What can you do in advance? How would you spend your 48 hours of downtime?
43Example 5
- Scenario: Suzie (Blackboard Administrator) has been contacted by her boss about a change in the school's Blackboard licensing. The school had been using a Blackboard Learning System - Basic license for the past two years. They have upgraded to the Blackboard Learning System and purchased the Blackboard Community System and Blackboard Content System in order to support a new distance learning initiative. Her boss tells Suzie that she is responsible for the following:
- Purchasing of hardware and storage to support new products.
- Software Upgrade from Blackboard Learning System Basic Edition to Blackboard Learning System
- Installation and Configuration of the new implementation.
- The new software components are expected to change the way Blackboard has traditionally been used at the school. There will be much more data, and the system will cater to a community 10X the size of the present implementation.
- Question: What can Suzie do in order to prepare for the change in features, adoption and growth?
44Performance Resources
- Measurement
- Windows Tool Kit, Top, Sar, VMStat, Prstat
- JProbe, OptimizeIt, HPJmeter, JMPI/Thread Dumps
- Hotsos, Statspack, TKProf, Enterprise Manager, Query Analyzer
- Performasure, Spotlight, Patrol, Unicenter
- Apache Server-Status, JVMStat, VerboseGC
- Resources
- http://support.microsoft.com/kb/224587
- http://www.sql-server-performance.com/jc_sql_server_quantative_analysis1.asp
- http://www.javaperformancetuning.com
- http://www.oraperf.com
- http://www.ixora.com.au
- http://www.hotsos.com
- http://perl.apache.org/docs/1.0/guide/performance.html
45Introduction to Load Testing
46Introduction to Load Testing
- Load Testing is the process of:
- Simulating synthetic workload on a software application.
- Identifying where bottlenecks exist
- Software Layer
- Hardware and/or Interface Layer
- Determining software and system capacity capabilities under a given workload.
- Attempting to meet or exceed predefined performance objectives.
- Representing conditional patterns of application usage.
47Introduction to Load Testing
- Software load testing requires a significant investment from an organization, both financially and operationally.
- Most commercially available load testing tools cost tens of thousands of dollars to purchase and maintain.
- Organizing and managing a staff focused on using these specialized tools bears similar expense.
- Organizations must be prepared to deal with the results of the load tests.
- Optimizing Software (Refactoring)
- Identifying Accurate Sizing and Capacity Configurations
48Components of Load Testing
Library of Test Assets
- Reusable autonomous actions in the application (Create, Read, Delete, Update and Execute)
- Isolated verification points
- Incorporation of abandonment (patience rating)
Volumetrics and Usage Analysis
- Capture statistical overview of current implementations (data models)
- Study usage patterns and trends for simulation
- Develop performance data models based on findings.
Scenario Definition
- Simulation of realistic scenarios based on actual usage (artifacts)
- Focus on sessions per hour rather than solely on concurrency
- Session Outcomes: Abandon, Abort, Continue or Idle.
Abandonment
- Define user patience rating (will users abandon if the transaction or site is slow?)
- Incorporate as a means of preserving realistic/expected usage patterns.
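The relationship between sessions per hour and concurrency can be made concrete with Little's Law (average concurrency = arrival rate x average time in system). The following is a minimal Python sketch; the session rates and lengths are made-up numbers, and the function name is an assumption for illustration.

```python
def average_concurrency(sessions_per_hour, avg_session_minutes):
    """Average number of simultaneous sessions, by Little's Law.

    concurrency = arrival rate (sessions/hour) * average session length (hours)
    """
    return sessions_per_hour * avg_session_minutes / 60.0


# Two hypothetical workloads with the same average concurrency:
long_sessions = average_concurrency(sessions_per_hour=600, avg_session_minutes=10)
short_sessions = average_concurrency(sessions_per_hour=1200, avg_session_minutes=5)
print(long_sessions, short_sessions)
```

Both workloads average the same number of concurrent sessions, yet the second one starts twice as many sessions (logins, page loads) per hour. That is one reason a scenario defined only by concurrency can understate the real workload.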
49Load Testing as a Part of the Blackboard SDLC
50Load Testing as a Part of the Blackboard SDLC
- Five-step process deeply rooted in designing for performance before a feature is developed.
- Part of the requirements process by assessing risk, defining performance requirements and isolating high-impact use cases.
- Study artifacts of performance within current implementations
- Usage Analysis
- Data Collection (Volumetrics within the Data Model)
- Isolate software contention by identifying software anti-patterns.
- Refactor and optimize the software application layer (business logic and database structure).
- Performance test the software under conditional and common load on standard/recommended configurations.
- Simulate Abandonment for Calibration Purposes
- Generate enough samples of a given function
- Stay within 2 Sigma (95% response time)
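The 2 sigma guideline can be checked against collected samples. The following is a minimal Python sketch of one interpretation: it computes the mean and standard deviation of response times and the fraction of samples that fall within two standard deviations of the mean. The sample values are made up, and the "about 95%" rule of thumb only holds when response times are roughly normally distributed, which real response times often are not.

```python
import statistics


def two_sigma_report(response_times_s):
    """Report the mean, sigma and share of samples within mean +/- 2 sigma."""
    mean = statistics.mean(response_times_s)
    sigma = statistics.pstdev(response_times_s)  # population standard deviation
    low, high = mean - 2 * sigma, mean + 2 * sigma
    within = [t for t in response_times_s if low <= t <= high]
    return {
        "mean": mean,
        "sigma": sigma,
        "upper_bound": high,
        "fraction_within": len(within) / len(response_times_s),
    }


# Hypothetical samples: 19 fast responses and one slow outlier (seconds).
samples = [1.0] * 19 + [9.0]
report = two_sigma_report(samples)
print(report)
```

A function with too few samples produces an unstable mean and sigma, which is why generating enough samples comes before applying the bound.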
51Load Testing Tools and Resources
- Simulation
- Mercury LoadRunner
- Segue Silk Performer
- Grinder and Apache JMeter
- Open STA
- Rational Test Studio
- Microsoft WCAT and WAST
- Resources
- http://www.keynote.com/downloads/articles/tradesecrets.pdf (Abandonment)
- http://www-128.ibm.com/developerworks/rational/library/4169.html (Great Starter Article)
- Performance Analysis for Java Websites (Joines, Stacy)
52Closing Slide
- Innovating Together in '05
- Managing Performance and Capacity is something everyone can do.
- The more quantifiable something is, the more attainable it can be.
- Resources Available
- Provided throughout the presentation.
- Follow up Contact(s)
- Steve Feldman, sfeldman_at_blackboard.com
- IF YOU ONLY REMEMBER 1 THING
- It is never too late to think about performance
and capacity.