Title: 3178 24 x 7 StarTeam
3178: 24 x 7 StarTeam
- Randy Guck
- Chief Scientist, DSP
- Borland
Overview
- High availability fundamentals
- How available is highly-available?
- High availability at what price?
- Enemies of high availability
Overview
- StarTeam high availability best practices
- Administrative practices
- Flash (demand peak) control
- Backup procedures
- Redundancy
- Failover and clustering
- Disaster recovery and replication
High Availability Fundamentals: How available is highly-available?
A Distorted Term
- Depending on who you ask, high availability means:
- 24 x 7 uptime
- Clustering
- Failover
- Online backups
- Five nines or Six sigma
- Currently not dating
Availability by the Numbers
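As a sketch of how availability percentages translate into downtime, assuming a 365-day year (525,600 minutes) and ignoring leap years:

```python
# Sketch: yearly downtime implied by an availability percentage.
# Assumes a 365-day year (525,600 minutes); leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the expected minutes of downtime per year for a given availability."""
    return (1.0 - availability_pct / 100.0) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999, 99.9999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct:>8}%  {minutes:10.1f} min/year  ({minutes / 60:6.2f} h/year)")
```

For 99.99% this gives about 52.6 minutes per year; for 99.9999% it is about 0.53 minutes (roughly 32 seconds) per year.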
The Myth of the Nines
- Most people want more than they need
- Actual reliability is difficult to compute (complex mathematics)
- Example: 7 components in series, each 99.99% reliable (downtime about 52 minutes/year), give a system that is only 99.93% reliable (downtime about 6 hours/year)
- Downtime is often affected by future, unforeseeable business decisions
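The 7-component example assumes every component must be up for the service to be up, so the availabilities multiply (0.9999^7 is about 0.9993). A minimal sketch, assuming independent failures and a 365-day year:

```python
import math

HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year


def series_availability(component_availabilities: list[float]) -> float:
    """Availability of a chain in which every component must be up.

    Assumes failures are independent, so the probabilities multiply.
    """
    return math.prod(component_availabilities)


if __name__ == "__main__":
    a = series_availability([0.9999] * 7)
    print(f"system availability: {a:.4%}")                         # about 99.93%
    print(f"downtime: {(1 - a) * HOURS_PER_YEAR:.1f} hours/year")  # about 6.1
```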
MTBF versus MTTR
- MTBF: mean time between failures
- MTTR: mean time to repair
- Availability:
- $A = \mathrm{MTBF} / (\mathrm{MTBF} + \mathrm{MTTR})$
- Availability stays high when MTTR is low relative to MTBF
- 99.9999% availability (six nines) allows only 6 minutes of downtime in 11.4 years!
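The formula can be explored numerically. In this sketch the MTBF and MTTR values are hypothetical; it holds MTBF fixed and shrinks MTTR to show why fast repair matters:

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical numbers: the same MTBF with two different repair times.
    mtbf = 2000.0  # roughly one failure every 12 weeks
    for mttr in (4.0, 0.25):
        a = availability(mtbf, mttr)
        print(f"MTTR {mttr:5.2f} h -> A = {a:.5%}, "
              f"downtime {(1 - a) * HOURS_PER_YEAR:.1f} h/year")
```

Cutting MTTR from 4 hours to 15 minutes takes yearly downtime from roughly 17.5 hours to about 1.1 hours, with no change in how often failures occur.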
A Better Approach
- Focus on scenarios and probabilities
- Examine the organization's needs
- Identify possible service disruptions
- Prioritize failures by probability
- Address scenarios on a cost/benefit basis
- Test each failure scenario
- The result is your high availability plan!
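One way to combine probability and cost/benefit is to rank scenarios by expected annual loss (outages per year × cost per outage) and keep only those whose expected loss exceeds the cost of mitigating them. The sketch below is hypothetical: the scenario names and all numbers are placeholders, not data from this talk.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    outages_per_year: float   # estimated frequency (placeholder)
    cost_per_outage: float    # estimated business cost per occurrence
    mitigation_cost: float    # estimated yearly cost to address it

    @property
    def expected_annual_loss(self) -> float:
        return self.outages_per_year * self.cost_per_outage


def prioritize(scenarios: list[Scenario]) -> list[Scenario]:
    """Rank by expected loss, keeping only scenarios worth mitigating."""
    ranked = sorted(scenarios, key=lambda s: s.expected_annual_loss, reverse=True)
    return [s for s in ranked if s.expected_annual_loss > s.mitigation_cost]


if __name__ == "__main__":
    plan = prioritize([
        Scenario("disk failure", 0.5, 20_000, 2_000),
        Scenario("site flood", 0.01, 500_000, 50_000),
        Scenario("demand peak brown-out", 12, 1_000, 5_000),
    ])
    for s in plan:
        print(f"{s.name}: expected loss {s.expected_annual_loss:,.0f}/year")
```

With these placeholder numbers the brown-out and disk failure make the plan, while the flood scenario costs more to mitigate than it is expected to lose.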
High Availability Fundamentals: High availability at what price?
Availability versus Investment
- Disaster recovery planning
- Demand peak management (flash control)
- Basic administrative practices
(Figure: these layers plotted against Availability and Investment axes.)
What kinds of systems need the highest availability?
- Life-rated systems
- Space shuttle onboard systems
- Emergency response systems
- Command-and-control systems
- High financial cost systems
- Stock-trading systems
- Reservation systems
- Banking systems
ALM High Availability in Perspective
- Although ALM systems are becoming more mission-critical, they do not have the same financial or loss-of-life impact as some systems, so it doesn't make sense to model their high availability on those systems
- Bottom line: strike a reasonable balance between investment and high availability
High Availability Fundamentals: Enemies of availability
Infrastructure Issues
- Hardware failures
- Disk, CPU, memory, power supply, fans, network card, motherboard, disk controller, etc.
- Environmental failures
- Power, cooling, fire, flood, hurricane, earthquake, terrorism, etc.
Infrastructure Issues
- Network outages
- LAN outages: switch/cable failures (server-to-DB network segment, client-to-server network segment, etc.)
- WAN outages: VPN failure, ISP failure, physical network issues, etc.
- Service outages: DNS, DHCP, directory server, email, etc.
Infrastructure Issues
- Database outages
- Out-of-disk issues, recovery time after reboot, index corruption, etc.
- Bandwidth issues
- Network congestion, database congestion, resource starvation
- Denial-of-service attacks
- Viruses, worms, DDoS
Application Issues
- Application brown-outs
- Locking/bottleneck issues, demand peaks, etc.
- Application outages
- Hangs, fatal exceptions, out-of-memory
- Scheduled outages
- Offline backups, application patches, database
upgrades, etc.
Plan of Attack
- To a specific user, a service is down when it is not available for any reason
- A comprehensive high availability plan must consider all potential outages from end to end, on a cost/benefit basis
StarTeam High Availability: Best Practices
- Disaster recovery planning
- Demand peak management (flash control)
- Basic administrative practices
(Figure: these layers plotted against Availability and Investment axes.)
Administrative Practices
- Administrative best practices top 10 list
- 10. Don't be cheap
- 9. Enforce security
- 8. Centralize your servers
- 7. Enforce change control
- 6. Document everything
Administrative Practices
- Administrative best practices top 10 list
- 5. Test everything
- 4. Design for growth
- 3. Choose mature software
- 2. Choose mature hardware
- 1. K.I.S.S.
Flash Control
- Client/server systems have natural demand peaks
- Peaks are often time-based, e.g.:
- Everyone logs in in the morning
- Big reports are launched just before lunch
- Peaks are often calendar-based, e.g.:
- End-of-week builds
- End-of-month reports
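To find your own time-based peaks, one approach is to bucket request timestamps (for example, parsed from a server log) by hour of day. A minimal sketch; extracting the timestamps from StarTeam logs is not shown, and the sample data is made up:

```python
from collections import Counter
from datetime import datetime


def requests_by_hour(timestamps: list[datetime]) -> Counter:
    """Count requests per hour of day (0-23)."""
    return Counter(ts.hour for ts in timestamps)


def peak_hours(timestamps: list[datetime], top: int = 3) -> list[tuple[int, int]]:
    """Return the busiest hours as (hour, request_count), busiest first."""
    return requests_by_hour(timestamps).most_common(top)


if __name__ == "__main__":
    sample = [datetime(2006, 5, 1, h, m) for h, m in
              [(8, 55), (9, 0), (9, 1), (9, 2), (11, 50), (11, 55), (14, 10)]]
    print(peak_hours(sample, top=2))  # [(9, 3), (11, 2)]
```

Grouping by `ts.weekday()` or `ts.day` instead of `ts.hour` would expose calendar-based peaks such as end-of-week builds.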
Client/Server Architecture
(Figure: several StarTeam Clients connect through the Command API to the StarTeam Server, which is backed by the DB and the Vault; demand peak congestion areas are marked.)
All information is pulled by clients using a request/reply command API.
StarTeamMPX
(Figure: the StarTeam Server, backed by the DB and the Vault, sends an event publish stream to a Message Broker, which delivers it to StarTeam Clients.)
Updated objects are pushed to clients, avoiding poll and refresh requests and smoothing demand peaks.
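The difference between pulling and pushing can be sketched with a tiny in-process publish/subscribe broker. This is a hypothetical illustration of the pattern, not the StarTeamMPX API: with polling, every client refresh is a server request whether or not anything changed; with push, the server publishes each change once and the broker fans it out.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Toy message broker: delivers each published event to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: str) -> None:
        for handler in self._subscribers[topic]:
            handler(event)


def server_requests_polling(clients: int, polls_per_client: int) -> int:
    # Every poll is a request the server must answer, changed or not.
    return clients * polls_per_client


if __name__ == "__main__":
    broker = Broker()
    received: list[str] = []
    for _ in range(100):                              # 100 subscribed clients
        broker.subscribe("project/files", received.append)
    broker.publish("project/files", "file.c rev 12")  # one publish by the server
    print(len(received))                              # 100 deliveries
    print(server_requests_polling(100, 60))           # 100 clients polling each minute for an hour: 6000
```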
New for 7.0: MPX Cache Agent
(Figure: the StarTeam Server, backed by the DB and the Vault, sends a file publish stream through the Message Broker to a Cache Agent with an encrypted cache; StarTeam Clients send check-out requests to the Cache Agent.)
The Cache Agent is trickle-charged with file contents, providing an alternate check-out source for remote clients.
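The Cache Agent idea can be sketched as a read-through cache: content published ahead of time is served locally, and only a miss goes back to the server. This is a hypothetical sketch, not the Cache Agent's actual protocol; it assumes content is keyed by its MD5 digest, and encryption is omitted.

```python
import hashlib
from typing import Callable


class CheckoutCache:
    """Hypothetical local cache consulted before the central server."""

    def __init__(self, fetch_from_server: Callable[[str], bytes]) -> None:
        self._store: dict[str, bytes] = {}
        self._fetch_from_server = fetch_from_server
        self.server_fetches = 0

    def preload(self, content: bytes) -> str:
        """'Trickle charge' the cache with content published ahead of time."""
        digest = hashlib.md5(content).hexdigest()
        self._store[digest] = content
        return digest

    def checkout(self, digest: str) -> bytes:
        """Serve from the cache; on a miss, fetch from the server once and keep it."""
        if digest not in self._store:
            self.server_fetches += 1
            self._store[digest] = self._fetch_from_server(digest)
        return self._store[digest]
```

Remote clients that check out preloaded content never touch the central server, which is what takes load off it during demand peaks.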
Backups for High Availability
- Mirroring does not replace backups
- Backups are an important part of high availability
- Test the integrity of backups periodically
- Consider a rotating/hierarchical storage system, which can also serve disaster recovery scenarios
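One simple way to test backup integrity is to record a checksum for every file at backup time and re-verify the copies later. A sketch, assuming a manifest that maps relative paths to SHA-256 digests (the manifest format is an assumption, not a StarTeam feature):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large archive files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_backup(backup_root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing from the backup or whose contents changed."""
    problems = []
    for rel, expected in manifest.items():
        copy = backup_root / rel
        if not copy.is_file() or sha256_of(copy) != expected:
            problems.append(rel)
    return problems
```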
StarTeam Backups
- StarTeam 6.0 backup procedure
- 1. Lock the server
- 2. Back up the database and vault
- Disk-to-disk and differential dumps can speed things up
- 3. Unlock the server
- Why does the server need to be locked?
Review: StarTeam 6.0 Vault
(Figure: a single-volume Archive Folder stores text files as a base version plus Delta 1, Delta 2, Delta 3, and binary files as a base version plus Rev 1, Rev 2, Rev 3; a single-volume Cache Folder holds uncompressed full versions.)
New StarTeam 7.0 Vault
(Figure: the StarTeam Server, the DB, and a Hive Index in front of a Vault made up of multiple Hives.)
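7.0 Vault: Inside the Hive
(Figure: each Hive contains an Archive Root and a Cache Root, both using MD5-based storage: each file is named by an MD5 digest and kept in subfolders named after its leading hex digits. Archive Root: 00/0/000a807b9f393f58a69998b2cd7db7d2.gz and 00/0/000752242cc7e16d573f299a127903f2.gz (compressed), ff/f/fff16c26e911ac72abad5557ac44d84c (uncompressed). Cache Root (uncompressed): 00/0/000a807b9f393f58a69998b2cd7db7d2, 00/0/000752242cc7e16d573f299a127903f2, ff/f/fffb865605a09eef1f06be92a38bc8da.)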
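The naming scheme in the figure can be sketched as follows. The two-level subfolder layout (first two hex digits, then the third) is inferred from the example names; that the digest is computed over the file contents, and the `.gz` suffix rule for compressed files, are assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def hive_path(root: str, content: bytes, compressed: bool) -> PurePosixPath:
    """Sketch of MD5-based placement: <root>/<hex 1-2>/<hex 3>/<md5>[.gz].

    Layout inferred from the figure (000a807b... under 00/0, fff16c26... under ff/f).
    """
    digest = hashlib.md5(content).hexdigest()
    name = digest + ".gz" if compressed else digest
    return PurePosixPath(root, digest[:2], digest[2], name)
```

For example, `hive_path("Archive Root", b"hello", True)` yields `Archive Root/5d/4/5d41402abc4b2a76b9719d911017c592.gz`. If names are derived from contents, a stored file never has to be rewritten, because a new revision simply gets a new name; that is the write-once property the asynchronous replication slide relies on.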
StarTeam 7.0 On-line Backup Procedure
- The new vault allows on-line backups
- 1. Back up the database on-line
- 2. When that is complete, back up the archive and attachment folders
- 2.1 Perform full backups weekly
- 2.2 Perform incremental backups daily
- No need to lock the server!
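If archive and attachment files are write-once, a daily incremental of those folders can be as simple as copying files that do not yet exist in the backup. A sketch under that assumption; it does not handle deleted files, and the database is backed up separately on-line:

```python
import shutil
from pathlib import Path


def incremental_copy(source: Path, backup: Path) -> list[str]:
    """Copy files under source that do not yet exist under backup.

    Relies on write-once files: an existing copy is assumed to be identical,
    so it is skipped rather than compared.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dest = backup / rel
        if dest.exists():
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        copied.append(rel.as_posix())
    return copied
```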
StarTeam 7.0 Recovery Procedure
- To recover a full StarTeam configuration
- 1. Reload the database
- 2. Simultaneously reload the archive and attachment folders
- 2.1 Load the latest full backup
- 2.2 Load all incrementals since the last full backup, in parallel
- Modify this procedure for partial recoveries
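Steps 2.1 and 2.2 require picking the latest full backup plus every incremental taken after it. A sketch of that selection over a hypothetical backup catalog (your backup tool's catalog will look different):

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Backup:
    taken_on: date
    kind: str  # "full" or "incremental"


def restore_set(backups: list[Backup]) -> tuple[Backup, list[Backup]]:
    """Return the latest full backup and all incrementals taken after it."""
    fulls = [b for b in backups if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available")
    latest_full = max(fulls, key=lambda b: b.taken_on)
    incrementals = sorted(
        (b for b in backups
         if b.kind == "incremental" and b.taken_on > latest_full.taken_on),
        key=lambda b: b.taken_on,
    )
    return latest_full, incrementals
```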
Reducing SPOFs (Single Points of Failure)
- Servers
- Dual power supplies, ECC/mirrored memory, dual fans, etc.
- Storage
- Dual controllers, mirrored/RAID disks
- Network
- Dual network cards, redundant switches, dual ISP connections, etc.
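Redundancy pays off because a redundant pair is down only when both halves are down. Assuming independent failures and instant switchover, n parallel units each with availability a give 1 - (1 - a)^n. A sketch with a hypothetical per-unit availability:

```python
def parallel_availability(a: float, n: int) -> float:
    """Availability of n redundant units when any single unit suffices.

    Assumes independent failures and instant, perfect switchover.
    """
    return 1.0 - (1.0 - a) ** n


if __name__ == "__main__":
    single = 0.999  # hypothetical availability of one power supply
    print(f"{parallel_availability(single, 1):.6f}")  # 0.999000
    print(f"{parallel_availability(single, 2):.6f}")  # 0.999999
```

Doubling a 99.9% component takes it to about 99.9999%, which is why eliminating SPOFs is often the cheapest availability gain.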
Redundant Everything
(Figure: a StarTeam Server with ECC memory, dual fans, etc. and dual NICs, connected through redundant switches to a Database Server; the vault and DB disks are RAID arrays with dual controllers.)
Failover Checklist
- At least two identically configured systems
- Shared disks
- Network connections
- Heartbeat/server network
- Client-facing service network
- Optional administrative network
- Failure Management System (FMS)
- Cluster set: application, DB connections, IP address
StarTeam Active/Passive Configuration Requirements
- Each system identically configured
- StarTeam release (including patches)
- starteam-server-configs.xml
- EventServices\<config>.xml
- ServerLicenses.st
- Access to shared vault and database
- Only one instance can be running at a time
- Failover time equals the secondary server's startup time
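"Identically configured" is easy to lose over time. A sketch that compares the configuration files named above on the active and passive nodes by checksum; the install-root paths are placeholders, and the EventServices file is left out because its name depends on your configuration:

```python
import hashlib
from pathlib import Path
from typing import Optional, Sequence

# Files named on the slide; assumed to sit directly under a (hypothetical) install root.
CONFIG_FILES = ("starteam-server-configs.xml", "ServerLicenses.st")


def file_digest(path: Path) -> Optional[str]:
    """SHA-256 of the file, or None if it does not exist."""
    if not path.is_file():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def config_drift(active_root: Path, passive_root: Path,
                 files: Sequence[str] = CONFIG_FILES) -> list[str]:
    """List config files that are missing on either node or differ between nodes."""
    drift = []
    for name in files:
        active = file_digest(active_root / name)
        passive = file_digest(passive_root / name)
        if active is None or active != passive:
            drift.append(name)
    return drift
```

Running a check like this after every patch or license change helps ensure the passive server can actually take over.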
Active/Passive Configuration
(Figure: an Active Server and a Passive Server linked by a heartbeat network and sharing mirrored disks; clients reach the service at 12.34.56.78 over the client-facing service network.)
Failover Condition
(Figure: the same active/passive configuration, with a failure marked X; mirrored disks, service address 12.34.56.78, client-facing service network.)
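The passive node concludes that the active node has failed when heartbeats stop arriving for longer than a timeout. A minimal sketch of that decision with an injectable clock; the class, the timeout value, and the takeover step are hypothetical, and a real FMS does considerably more:

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Runs on the passive node and tracks heartbeats from the active node."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_beat = clock()
        self.active = False  # becomes True once this node takes over

    def beat(self) -> None:
        """Call whenever a heartbeat arrives over the heartbeat network."""
        self._last_beat = self._clock()

    def check(self) -> bool:
        """Take over if the active node has been silent longer than the timeout."""
        if not self.active and self._clock() - self._last_beat > self.timeout_s:
            # An FMS would now take the shared disks and service IP and start the server.
            self.active = True
        return self.active
```

The timeout is a trade-off: too short and a busy network triggers a false failover; too long and it adds directly to the outage.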
StarTeam and BDOC
- Borland Deployment Op-Center (BDOC) can assist with process monitoring and restart for:
- The StarTeam Server process
- MPX processes
- Message Broker
- Multicast Service
- Cache Agent
- Workflow Notification Agent
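At its core, process monitoring and restart is a supervision loop. A hypothetical sketch (not how BDOC works internally) that restarts a command when it exits with an error, capped so a crash loop cannot spin forever:

```python
import subprocess
import time


def supervise(cmd: list[str], max_restarts: int = 3, delay_s: float = 1.0) -> int:
    """Run cmd; restart it after a non-zero exit, at most max_restarts times.

    Returns the number of restarts performed.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(delay_s)  # back off briefly before restarting
```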
Op-Center Example
Replication for DR
- Types of replication, based on latency
- Synchronous: the remote site is always up to date
- Asynchronous: the remote site lags by a small amount of time
- Batch: the remote site receives periodic snapshots (e.g., backups)
Synchronous Replication
- Long-distance mirroring
- Fibre Channel: 10 km or more with newer technologies
- Variation: disk replication software (e.g., Veritas Volume Replicator)
- Advantages: real-time replication
- Disadvantages: cost
Asynchronous Replication
- Possible strategy for StarTeam
- Database-provided replication, e.g.:
- SQL Server Log Shipping
- Oracle Standby Database Replication
- Continuous/incremental copy of attachment and archive files
- Exploits the write-once feature of the StarTeam 7.0 vault
- Possible because not yet in use!
Asynchronous Replication
- Advantages
- Less network bandwidth needed than synchronous replication
- Database currency window can be tuned
- Disadvantages
- Requires a reliable network
- Not yet tested!
Batch Replication
- Sending backups offsite
- Never underestimate the bandwidth of a station wagon filled with tapes barreling down the highway
- Make copies of backups or rotate backups through offsite storage
- Send backups via FedEx, UPS, Volvo net, etc.
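The station wagon line is a bandwidth calculation in disguise: data shipped divided by transit time. A sketch with hypothetical numbers (100 tapes of 400 GB each, a 4-hour drive):

```python
def effective_bandwidth_gbps(tapes: int, gb_per_tape: float, transit_hours: float) -> float:
    """Effective throughput of shipping media, in gigabits per second."""
    bits = tapes * gb_per_tape * 1e9 * 8
    return bits / (transit_hours * 3600) / 1e9


if __name__ == "__main__":
    # Hypothetical load: 100 tapes of 400 GB each, 4-hour drive.
    print(f"{effective_bandwidth_gbps(100, 400, 4):.1f} Gbit/s")  # 22.2 Gbit/s
```

The catch is latency: the data arrives hours later, which is why batch replication lags the most of the three approaches.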
Batch Replication
- Advantages
- Reliable
- Low cost
- Full backups ensure recoverability
- Disadvantages
- Asynchronous (time lag)
- Manual process (media handling) unless network
bandwidth is available
StarTeam High Availability Best Practices: Other Topics
Other High Availability Features for StarTeam 7.0
- New StarTeam 7.0 vault
- Conversion from the StarTeam 6.0 vault can occur in real time as a background or scheduled process
- Vault space can be increased dynamically by adding new hives
- Archive files can be offloaded/reloaded dynamically
Other High Availability Features for StarTeam 7.0
- New StarTeam 7.0 memory management
- New memory management caps memory growth when XxxCaching values are > 0 (where Xxx = Files, ChangeRequests, etc.)
- Allows the server to run for very long periods without restarting
Summary
- High availability is a cost/benefit pursuit
- Review administrative practices
- Smooth demand peaks (MPX)
- Establish on-line backup procedures
- Eliminate SPOFs
- Consider clustering for failover
- Create a disaster recovery plan
- Document and test everything!
Questions?
Thank You
- 3178: 24 x 7 StarTeam
- Please fill out the speaker evaluation
- You can contact me further at randy.guck@borland.com