3178 24 x 7 StarTeam - PowerPoint PPT Presentation

About This Presentation

Title:

3178 24 x 7 StarTeam

Slides: 60

Provided by: randy48

Tags: starteam

Transcript and Presenter's Notes

Title: 3178 24 x 7 StarTeam

1
3178 24 x 7 StarTeam

Randy Guck
Chief Scientist, DSP
Borland

2
Overview

High availability fundamentals
How available is highly-available?
High availability at what price?
Enemies of high availability

3
Overview

StarTeam high availability best practices
Administrative practices
Flash (demand peak) control
Backup procedures
Redundancy
Failover and clustering
Disaster recovery and replication

4
High Availability Fundamentals: How available is highly-available?
5
A Distorted Term

Depending on who you ask, high availability means:
24 x 7 uptime
Clustering
Failover
Online backups
Five nines or Six sigma
Currently not dating

6
Availability by the Numbers
7
The Myth of the Nines

Most people want more than they need
Actual reliability is difficult to compute (complex mathematics)
Example: 7 components in series, each with 99.99% reliability (downtime of about 52 minutes/year), result in 99.93% overall (downtime of about 6 hours/year).
Downtime often affected by future, unforeseeable
business decisions
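
The arithmetic behind this example can be checked with a short sketch. It assumes the seven components sit in series and fail independently, so their availabilities multiply; the function names are illustrative and not from the presentation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years are ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year, in minutes, for an availability in [0, 1]."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(component_availabilities) -> float:
    """Availability of a chain in which every component must be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    total = 1.0
    for a in component_availabilities:
        total *= a
    return total


single = 0.9999
chain = serial_availability([single] * 7)

print(f"one component:   {single:.4%}, "
      f"{downtime_minutes_per_year(single):.1f} min/year")
print(f"seven in series: {chain:.4%}, "
      f"{downtime_minutes_per_year(chain) / 60:.1f} hours/year")
```

Running it gives roughly 52.6 minutes per year for one component and about 6.1 hours per year for the chain, in line with the rounded figures on the slide.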

8
MTBF versus MTTR

MTBF: mean time between failures
MTTR: mean time to repair
Availability:
A = MTBF / (MTBF + MTTR)
Availability is good if MTTR is low
99.9999% availability (six sigma) = 6 minutes of downtime in 11.4 years!
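
As a quick check of the formula, the sketch below computes availability from MTBF and MTTR and verifies the 11.4-year figure. The sample MTBF and MTTR values are made up for illustration only.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical server with a fixed MTBF: availability rises as MTTR shrinks.
MTBF_HOURS = 2000.0
for mttr in (24.0, 4.0, 0.5):
    print(f"MTTR {mttr:5.1f} h -> A = {availability(MTBF_HOURS, mttr):.5%}")

# Check of the slide's figure: 99.9999% availability over 11.4 years.
minutes_in_period = 11.4 * 365 * 24 * 60
downtime_minutes = (1 - 0.999999) * minutes_in_period
print(f"downtime over 11.4 years: {downtime_minutes:.2f} minutes")
```

The last line comes out just under six minutes, which is where the slide's "6 minutes in 11.4 years" comes from.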

9
A Better Approach

Focus on scenarios and probabilities
Examine the organization's needs
Identify possible service disruptions
Prioritize failures by probability
Address scenarios on a cost/benefit basis
Test each failure scenario
The result is your high availability plan!
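
One possible way to put the steps above into practice is to score each failure scenario by its expected yearly loss and compare that with the cost of mitigating it. The sketch below is only an illustration: the scenario names, probabilities, and costs are invented, and the ranking rule (expected loss divided by mitigation cost) is an assumption, not something the presentation prescribes.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    annual_probability: float  # chance of the failure occurring in a year
    outage_cost: float         # cost of one unmitigated occurrence
    mitigation_cost: float     # yearly cost of the countermeasure

    @property
    def expected_annual_loss(self) -> float:
        return self.annual_probability * self.outage_cost

    @property
    def benefit_ratio(self) -> float:
        """Expected loss avoided per unit of money spent on mitigation."""
        return self.expected_annual_loss / self.mitigation_cost


def prioritize(scenarios):
    """Order scenarios so the best cost/benefit candidates come first."""
    return sorted(scenarios, key=lambda s: s.benefit_ratio, reverse=True)


# Hypothetical numbers, for illustration only.
scenarios = [
    Scenario("disk failure", 0.30, 20_000, 1_500),
    Scenario("site-wide disaster", 0.01, 500_000, 40_000),
    Scenario("morning login peak", 0.90, 2_000, 500),
]

for s in prioritize(scenarios):
    print(f"{s.name:20s} expected loss {s.expected_annual_loss:9.0f}  "
          f"benefit ratio {s.benefit_ratio:5.2f}")
```

Each scenario that makes the cut would still need to be tested, as the list above says; the ranking only suggests where to spend first.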

10
High Availability Fundamentals: High availability at what price?
11
Availability versus Investment

[Figure: availability plotted against investment. Practices as listed on the slide, top to bottom:]
Disaster recovery planning
Failover management
Redundancy (no SPOFs)
Backup procedures
Demand peak management (flash control)
Basic administrative practices
12
What kind of systems need the highest
availability?

Life-rated systems
Space shuttle onboard systems
Emergency response systems
Command-and-control systems
High financial cost systems
Stock-trading systems
Reservation systems
Banking systems

13
ALM High Availability in Perspective

Although ALM systems are becoming more mission-critical, they do not have the same financial or loss-of-life impact as some systems, so it doesn't make sense to model high availability after them
Bottom line: strike a reasonable balance between investment and high availability

14
High Availability Fundamentals: Enemies of availability
15
Infrastructure Issues

Hardware failures
Disk, CPU, memory, power supply, fans, network
card, motherboard, disk controller, etc.
Environmental failures
Power, cooling, fire, flood, hurricane,
earthquake, terrorism, etc.

16
Infrastructure Issues

Network outages
LAN outages: switch/cable failures (server-to-DB network segment, client-to-server network segment, etc.)
WAN outages: VPN failure, ISP failure, physical network issues, etc.
Service outages: DNS, DHCP, directory server, email, etc.

17
Infrastructure Issues

Database outages
Out-of-disk issues, recovery time after reboot,
index corruption, etc.
Bandwidth issues
Network congestion, database congestion, resource
starvation
Denial-of-service attacks
Viruses, worms, DDoS

18
Application Issues

Application brown-outs
Locking/bottleneck issues, demand peaks, etc.
Application outages
Hangs, fatal exceptions, out-of-memory
Scheduled outages
Offline backups, application patches, database
upgrades, etc.

19
Plan of Attack

To a specific user, a service is down when it
is not available for any reason
A comprehensive high availability plan must
consider all potential outages from end-to-end,
on a cost/benefit basis

20
StarTeam High Availability Best Practices

[Figure: the availability versus investment diagram from slide 11, repeated as a section divider]
21
Administrative Practices

Administrative best practices: top 10 list
10. Don't be cheap
9. Enforce security
8. Centralize your servers
7. Enforce change control
6. Document everything

22
Administrative Practices

Administrative best practices: top 10 list (continued)
5. Test everything
4. Design for growth
3. Choose mature software
2. Choose mature hardware
1. K.I.S.S.

23
StarTeam High Availability Best Practices

[Figure: the availability versus investment diagram from slide 11, repeated as a section divider]
24
Flash Control

Client/server systems have natural demand peaks
Peaks are often time-based, e.g.:
Everyone logs in in the morning
Big reports launched just before lunch
Peaks are often calendar-based, e.g.:
End-of-week builds
End-of-month reports
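
Time-based peaks like these can often be spotted by bucketing request timestamps by hour of day. The sketch below assumes a list of request timestamps is available from some server or proxy log; the log source, the threshold factor, and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def requests_per_hour(timestamps):
    """Count requests falling in each hour of the day (0-23)."""
    return Counter(ts.hour for ts in timestamps)


def peak_hours(timestamps, factor=2.0):
    """Hours whose request count exceeds `factor` times the mean count.

    The mean is taken over hours that saw at least one request.
    """
    counts = requests_per_hour(timestamps)
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(hour for hour, n in counts.items() if n > factor * mean)


# Made-up sample: a burst of logins at 09:00 and light traffic afterwards.
sample = [datetime(2005, 1, 3, 9, minute) for minute in range(60)]
for hour in (10, 11, 14, 15):
    sample += [datetime(2005, 1, 3, hour, minute) for minute in range(5)]

print("requests per hour:", dict(sorted(requests_per_hour(sample).items())))
print("peak hours:", peak_hours(sample))
```

The same bucketing applied to weekday or day of month would surface the calendar-based peaks.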

25
Client/Server Architecture

[Diagram labels: StarTeam Clients, Command API, StarTeam Server, DB, Vault; demand peak congestion areas are marked]
All information is pulled by clients using a request/reply command API
26
StarTeamMPX

[Diagram labels: StarTeam Clients, StarTeam Server, Message Broker, event publish stream, DB, Vault]
Updated objects are pushed to clients, preventing poll and refresh requests and smoothing demand peaks
27
New for 7.0: MPX Cache Agent

[Diagram labels: StarTeam Clients, StarTeam Server, Message Broker, Cache Agent, encrypted cache, check-out requests, file publish stream, DB, Vault]
The Cache Agent is trickle-charged with file contents, providing an alternate check-out source for remote clients.
28
StarTeam High Availability Best Practices

[Figure: the availability versus investment diagram from slide 11, repeated as a section divider]
29
Backups for High Availability

Mirroring does not replace backups
Backups are an important part of high
availability
Test integrity of backups periodically
Consider a rotating/hierarchical storage system, which can serve disaster recovery scenarios

30
StarTeam Backups

StarTeam 6.0 backup procedure:
1. Lock the server
2. Back up the database and vault
Disk-to-disk and differential dumps can speed things up
3. Unlock the server
Why does the server need to be locked?

31
Review: StarTeam 6.0 Vault

[Diagram: Archive Folder (single volume): text files stored as a base version plus deltas (Delta 1, Delta 2, Delta 3, ...); binary files stored as a base version plus revisions (Rev 1, Rev 2, Rev 3, ...). Cache Folder (single volume): uncompressed full versions.]
32
New StarTeam 7.0 Vault

[Diagram labels: StarTeam Server, Hive Index, DB, Vault made up of multiple Hives]
33
7.0 Vault: Inside the Hive

MD5-based storage, organized in subfolders

Hive
  Archive Root (compressed and uncompressed files)
    00/0
      000752242cc7e16d573f299a127903f2.gz
      000a807b9f393f58a69998b2cd7db7d2.gz
    ff/f
      fff16c26e911ac72abad5557ac44d84c (uncompressed)
  Cache Root (uncompressed files)
    00/0
      000752242cc7e16d573f299a127903f2
      000a807b9f393f58a69998b2cd7db7d2
    ff/f
      fffb865605a09eef1f06be92a38bc8da
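
The file names on this slide suggest a content-addressed layout: the first two hex characters of the MD5 digest pick the top-level folder, the third picks the subfolder, and the full digest is the file name. The sketch below illustrates that reading. It assumes the digest is computed over the file contents and that compressed files carry a `.gz` suffix; neither detail is confirmed by the slide, and the function names are invented.

```python
import gzip
import hashlib
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def hive_relative_path(md5_hex: str, compressed: bool = True) -> Path:
    """Map an MD5 hex digest to '<chars 1-2>/<char 3>/<digest>[.gz]'."""
    if len(md5_hex) != 32 or not set(md5_hex) <= HEX_DIGITS:
        raise ValueError("expected a 32-character lowercase MD5 hex digest")
    name = md5_hex + (".gz" if compressed else "")
    return Path(md5_hex[:2]) / md5_hex[2] / name


def store(content: bytes, archive_root: Path) -> Path:
    """Write content once, under a name derived from its MD5 digest.

    Assumption: the digest covers the uncompressed file contents.
    """
    digest = hashlib.md5(content).hexdigest()
    target = archive_root / hive_relative_path(digest)
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        with gzip.open(target, "wb") as fh:
            fh.write(content)
    return target


# Digests taken from the slide.
print(hive_relative_path("000a807b9f393f58a69998b2cd7db7d2").as_posix())
print(hive_relative_path("fff16c26e911ac72abad5557ac44d84c",
                         compressed=False).as_posix())
```

A layout like this means a stored file is never rewritten once it exists, which is the property the on-line backup and replication slides rely on.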
34
StarTeam 7.0 On-line Backup Procedure

The new vault allows on-line backups:
1. Back up the database on-line
2. When complete, back up the archive and attachment folders
2.1 Perform full backups weekly
2.2 Perform incremental backups daily
No need to lock the server!
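
The archive part of step 2 can be sketched as a copy of files that the backup does not yet hold. This is a simplified illustration, not Borland's procedure: it assumes archive files are never modified after being written, and it keeps one cumulative backup tree rather than separate weekly full and daily incremental sets. The database backup is left to the database's own on-line backup tool and is expected to finish before this runs.

```python
import shutil
from pathlib import Path


def incremental_backup(archive_root: Path, backup_root: Path) -> list[Path]:
    """Copy archive files that are not yet present in the backup.

    Relies on the assumption that files are write-once, so a file that
    already exists in the backup never needs to be copied again.
    Returns the relative paths that were copied.
    """
    copied = []
    for src in sorted(archive_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(archive_root)
        dst = backup_root / rel
        if dst.exists():
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        copied.append(rel)
    return copied
```

A nightly job could call `incremental_backup(Path("/vault/archive"), Path("/backup/archive"))`; both paths are placeholders.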

35
StarTeam 7.0 Recovery Procedure

To recover a full StarTeam configuration:
1. Reload the database
2. Simultaneously reload the archive and attachment folders
2.1 Load the latest full backup
2.2 Load all incrementals since the last full backup in parallel
Modify this procedure for partial recoveries

36
StarTeam High Availability Best Practices

[Figure: the availability versus investment diagram from slide 11, repeated as a section divider]
37
Reducing SPOFs

Servers
Dual power supplies, ECC/mirrored memory, dual
fans, etc.
Storage
Dual controllers, mirrored/RAID disks
Network
Dual network cards, redundant switches, dual ISP
connections, etc.

38
Redundant Everything

[Diagram labels: StarTeam Server (ECC memory, dual fans, etc.), Database Server, two Switches, dual NICs, dual controllers, RAID vault disks, RAID DB disks]
39
StarTeam High Availability Best Practices

[Figure: the availability versus investment diagram from slide 11, repeated as a section divider]
40
Failover Checklist

At least two identically configured systems
Shared disks
Network connections
Heartbeat/server network
Client-facing service network
Optional administrative network
Failure Management System (FMS)
Cluster set: app, DB connections, IP address
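
A failure management system typically decides that a peer is down when heartbeats stop arriving. A minimal sketch of that timeout check follows. The class name, the timeout value, and the injectable clock are assumptions made for illustration; real cluster software adds quorum and fencing logic that is not shown here.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat from a peer and flags a timeout."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._last_beat = clock()

    def beat(self) -> None:
        """Record that a heartbeat was just received from the peer."""
        self._last_beat = self._clock()

    def peer_is_down(self) -> bool:
        """True when no heartbeat has arrived within the timeout."""
        return self._clock() - self._last_beat > self.timeout_seconds


# Hypothetical use on the passive node: poll and start failover on timeout.
monitor = HeartbeatMonitor(timeout_seconds=5.0)
monitor.beat()
print("peer down?", monitor.peer_is_down())
```

On a timeout, the passive node would take over the cluster set (application, database connections, and service IP address).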

41
StarTeam Active/Passive Configuration Requirements

Each system identically configured
StarTeam release (including patches)
starteam-server-configs.xml
EventServices\<config>\.xml
ServerLicenses.st
Access to shared vault and database
Only one instance can be running at a time
Failover time is the secondary's startup time
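
One way to check that both systems really are identically configured is to compare checksums of the configuration files on each node. The sketch below assumes both nodes' configuration directories are reachable as local paths (for example through a mounted share); the directory layout and the function names are hypothetical, and only the two file names with unambiguous spelling on the slide are listed.

```python
import hashlib
from pathlib import Path

# File names taken from the slide; add others as appropriate.
FILES = ["starteam-server-configs.xml", "ServerLicenses.st"]


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def config_differences(active_dir: Path, passive_dir: Path,
                       filenames) -> list[str]:
    """Names of files that are missing on either node or differ in content."""
    different = []
    for name in filenames:
        a, b = active_dir / name, passive_dir / name
        if not (a.is_file() and b.is_file()):
            different.append(name)
        elif file_digest(a) != file_digest(b):
            different.append(name)
    return different
```

An empty result from `config_differences(active, passive, FILES)` suggests the two nodes match for the listed files; it says nothing about the installed release or patch level, which would need a separate check.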

42
Active/Passive Configuration

[Diagram labels: Active Server, Passive Server, heartbeat network, mirrored disks, client-facing service network, address 12.34.56.78]
43
Failover Condition

[Diagram: the active/passive configuration again (Active Server, Passive Server, heartbeat network, mirrored disks, client-facing service network, address 12.34.56.78), with an X marking the failure]
44
StarTeam and BDOC

Borland Deployment Op-Center can assist with
process monitoring and restart
StarTeam Server process
MPX Processes
Message Broker
Multicast Service
Cache Agent
Workflow Notification Agent

45
Op-Center Example
46
StarTeam High Availability Best Practices

[Figure: the availability versus investment diagram from slide 11, repeated as a section divider]
47
Replication for DR

Types of replication, based on latency:
Synchronous: remote site is always up-to-date
Asynchronous: remote site lags by a small amount of time
Batch: remote site receives periodic snapshots (e.g., backups)

48
Synchronous Replication

Long-distance mirroring
Fibre Channel: 10 km or more with newer technologies
Variation: disk replication software (e.g., Veritas Volume Replicator)
Advantages: real-time replication
Disadvantages: cost

49
Asynchronous Replication

Possible strategy for StarTeam
Database-provided replication, e.g.:
SQL Server Log Shipping
Oracle Standby Database Replication
Continuous/incremental copy of attachment and
archive files
Exploits write-once feature of StarTeam 7.0 vault
Possible because not yet in use!

50
Asynchronous Replication

Advantages
Less network bandwidth needed than synchronous
replication
Database currency window can be tuned
Disadvantages
Requires reliable network
Not yet tested!

51
Batch Replication

Sending backups offsite
Never underestimate the bandwidth of a station
wagon filled with tapes barreling down the
highway
Make copies of backups or rotate backups through
offsite storage
Send backups via FedEx, UPS, Volvo net, etc.
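
The station-wagon remark can be made concrete with a little arithmetic. The sketch below converts a shipped payload and its transit time into an average throughput; the tape count, tape capacity, and delivery time are made-up numbers for illustration only.

```python
def effective_bandwidth_mbps(payload_gigabytes: float,
                             transit_hours: float) -> float:
    """Average throughput, in megabits per second, of physically shipping data.

    Uses decimal units: 1 GB = 10**9 bytes, 1 Mbit = 10**6 bits.
    """
    if transit_hours <= 0:
        raise ValueError("transit time must be positive")
    bits = payload_gigabytes * 8 * 1000**3
    seconds = transit_hours * 3600
    return bits / seconds / 1e6


# Hypothetical shipment: 100 tapes of 200 GB each, delivered in 24 hours.
payload_gb = 100 * 200
print(f"{effective_bandwidth_mbps(payload_gb, 24):.0f} Mbit/s average")
```

With these numbers the shipment averages roughly 1,850 Mbit/s, although with a latency of a full day.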

52
Batch Replication

Advantages
Reliable
Low cost
Full backups ensure recoverability
Disadvantages
Asynchronous (time lag)
Manual process (media handling) unless network
bandwidth is available

53
StarTeam High Availability Best Practices: Other Topics
54
Other High Availability Features for StarTeam 7.0

New StarTeam 7.0 Vault
Conversion from StarTeam 6.0 vault can occur in
real-time as background or scheduled process
Vault space can be increased dynamically by
adding new hives
Archive files can be offloaded/reloaded dynamically

55
Other High Availability Features for StarTeam 7.0

New StarTeam 7.0 Memory Management
New memory management caps memory growth with XxxCaching values > 0 (where Xxx = Files, ChangeRequests, etc.)
Allows the server to run for very long periods
without restarting

56
Summary

High availability is a cost/benefit pursuit
Review administrative practices
Smooth demand peaks (MPX)
Establish on-line backup procedures
Eliminate SPOFs
Consider clustering for failover
Create a disaster recovery plan
Document and test everything!

57
References

Blueprints for High Availability, 2nd Edition, Evan Marcus and Hal Stern, Wiley Publishing Inc. (2003): detailed discussion of all issues related to high availability
Applied Reliability, Paul Tobias and David Trindade, Kluwer Academic Publishers (1995): detailed mathematical treatment of failure rates and renewability

58
Questions?
59
Thank You