OpenEdge High Availability - PowerPoint PPT Presentation

About This Presentation
Title:

OpenEdge High Availability

Description:

Title: Progress Client/Server for developers Author: Alan Wilkinson Last modified by: Adam Backman Created Date: 7/2/2006 12:05:35 PM Document presentation format – PowerPoint PPT presentation

Slides: 40
Provided by: AlanW192

Transcript and Presenter's Notes


1
OpenEdge High Availability
  • Adam Backman
  • Grand Poobah, White Star Software

2
About the speaker
  • Head Winemaker, White Star Software
  • One of the oldest and most respected consulting
    and training companies in the Progress OpenEdge
    sector
  • Lackey, DBAppraise
  • Managed database services backed up by
    experienced Progress OpenEdge professionals not
    rookies off the bench
  • Read a book or two
  • Snappy Dresser
  • Knows a bit about systems and OpenEdge

3
Agenda
  • Are you really 24X7?
  • Redundancy
  • Replication
  • Maintenance
  • Failing over
  • Conclusion

4
What is High Availability?
  • A real business need that requires full access to
    current data at any time of the day or night
  • Many sites are kind of 24X7 but only a small
    percentage of companies have real business
    requirements that necessitate access to the data
    24 hours a day.
  • Some applications have high availability needs
    but only during given hours which simplifies
    maintenance
  • The need is growing every day

5
Are You Really 24X7?
  • Business runs 24 hours a day
  • 3-shift manufacturing, utility, casino, website, ...
  • Business needs access 24 hours
  • Work during the day, report and plan at night
  • Weekend requirements

6
What is High Availability?
  • The ability to keep running your business
  • Continuous Access which allows for failures with
    zero impact to the users
  • Minimally Invasive failure management like using
    HACMP clustering with OpenEdge as a cluster
    service
  • Major Failover where physical location of the
    application must be changed
  • Minimal recovery time in case of disaster
  • It is not disaster recovery: DR is only used
    when HA fails

7
Before you begin
  • Understand your business
  • Understand the cost of downtime
  • Do not build a solution that costs more than what
    you are protecting
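
The cost rule above comes down to simple arithmetic. The following is a minimal sketch, assuming downtime can be expressed as outage hours per year times a cost per hour; the function names and all figures are hypothetical and only illustrate the comparison.

```python
def annual_downtime_cost(outage_hours_per_year: float, cost_per_hour: float) -> float:
    """Expected yearly loss from outages, in the same currency as cost_per_hour."""
    return outage_hours_per_year * cost_per_hour


def is_worth_building(solution_cost_per_year: float,
                      outage_hours_without: float,
                      outage_hours_with: float,
                      cost_per_hour: float) -> bool:
    """True when the downtime cost avoided exceeds what the solution costs."""
    avoided = annual_downtime_cost(outage_hours_without - outage_hours_with,
                                   cost_per_hour)
    return avoided > solution_cost_per_year


# Hypothetical figures: 20 h/year of outage today, 2 h/year with the HA setup,
# 5,000 per hour of downtime, and an HA solution costing 60,000 per year.
# Avoided cost is 18 * 5,000 = 90,000, which exceeds 60,000.
print(is_worth_building(60_000, 20, 2, 5_000))  # True
```

If the same solution cost 100,000 per year, the check would return False, which is the case the slide warns against.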

8
People
  • Who owns the data
  • Be inclusive with invites; most will drop out
  • This is not solely an IT decision
  • You are the keeper, not owner of the data
  • You know what is technically possible
  • You know the cost of the tech needed to build the
    solution
  • The goal is to eliminate surprises if/when a
    problem occurs

9
Planning
  • Budget: it is not free
  • Hardware: fault tolerant, redundancy, ...
  • Software: OpenEdge plus ALL the other stuff you
    have to run the operation
  • Knowledge: buy or rent
  • Time: schedule and outage time
  • Personnel constraints: who is on call and who is
    their backup

10
Causes of Downtime
  • Hardware
  • Disks are most vulnerable as they are the only
    moving part unless you have SSD
  • Power - All the hardware requires power
  • Software
  • OS bug
  • OpenEdge (core or application) bug
  • Natural disaster
  • Fire
  • Flood
  • Sabotage
  • Human Error

11
Basic Rules
  • Good Hardware
  • Trusted vendor
  • Good support (local support if possible)
  • No Windows (OK, maybe 2008)
  • You need a good recovery plan
  • You will run with after imaging enabled

12
Redundancy
  • Hardware
  • Software
  • Personnel

13
Redundancy: Hardware
  • Power (UPS or UPS + generator)
  • Mirrored disks
  • Network - in machine and general network
  • Non-interleaved memory (some use FT memory)
  • Multiple CPUs
  • Support hardware (PCs, terminals, phones, ...)
  • Complete failover environment

14
Hardware
  • Why have a UPS and a generator?
  • UPS has limited capacity
  • Generators can run for a long time
  • Have a reliable source of extra fuel

15
Hardware
  • Do not let standby systems sit idle
  • Use them for development or test
  • Keep copies of all support files
  • .pf
  • .ini
  • .d

16
Redundancy: Software
  • Host-based are least fault tolerant
  • Web-based can provide a good environment provided
    the AppServer calls are stateless
  • In client/server model remember that file servers
    need to be redundant as well

17
Redundancy: Software
  • NameServer on the broadcast and clustered
  • Don't use the NameServer
  • Cluster your AppServers so if a single AppServer
    fails there is another to pick up the load

18
Redundancy: Staffing
  • Is the failover machine close?
  • Can it reliably be accessed remotely? (Remote
    access is itself a failure point)
  • Possible to call in additional resources?
  • More hands
  • Different skills
  • Relief of tired staff
  • Is it necessary to support all functions or only
    core?

19
Replication of Data
  • Database data
  • OpenEdge replication (synchronous)
  • Log-based replication (asynchronous)
  • Hardware-based replication (?)
  • Application and User files
  • OS utility (fsync, rsync, ...)
  • Hardware (remote mirroring)
  • Third-party (PolyServe)

20
Replication: OpenEdge
  • Pros
  • Supported product
  • Synchronous
  • Fast (Really Fast)
  • Cons
  • Cost
  • Yet another thing to support
  • Additional resource usage

21
Replication: Log-based
  • Pros
  • Cheap (Not free, but close)
  • Easy to setup and maintain
  • Cons
  • No formal support
  • Additional resource utilization

22
Hardware Replication
  • Pros
  • Easy setup
  • Easy Maintenance
  • Cons
  • Expensive
  • Possibility of data corruption unless ALL writes
    are guaranteed

23
Maintenance
  • Script everything to eliminate human error
  • Scheduled Maintenance
  • Application changes
  • Backups
  • Index maintenance
  • Adding space
  • Unscheduled maintenance
  • Eliminate unscheduled maintenance by monitoring
    and trending
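
As an illustration of trending, the sketch below fits a straight line to daily disk-usage samples and projects when the volume fills, so that adding space can be scheduled rather than forced. It assumes roughly linear growth; the function name and the sample figures are invented for the example.

```python
from typing import Optional, Sequence


def days_until_full(used_gb_by_day: Sequence[float],
                    capacity_gb: float) -> Optional[float]:
    """Fit a least-squares line to daily usage and project when it hits capacity.

    Returns the number of days after the last sample, or None when usage is
    flat or shrinking (no projected fill date).
    """
    n = len(used_gb_by_day)
    if n < 2:
        raise ValueError("need at least two samples to compute a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(used_gb_by_day) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, used_gb_by_day))
             / sum((x - mean_x) ** 2 for x in xs))
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope
    return max(0.0, day_full - (n - 1))


# Invented samples: usage grows 2 GB per day on a 128 GB volume.
print(days_until_full([100, 102, 104, 106, 108], 128))  # 10.0
```

Real growth is rarely this regular, so the projection is best treated as an early warning and not a deadline.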

24
Maintenance: Application
  • Schema
  • Use fast schema add then add default value
  • Still requires an outage for some changes due to
    table locks
  • Code changes
  • If you are n-tier you can stop the AppServer to
    reduce the interruption
  • Switch to a different propath and move clients
    over through natural attrition

25
Maintenance: Backups
  • Progress backup
  • Reliable
  • Online option
  • Split mirror backup
  • Replication backup
  • Eliminate overhead on production db
  • Must be a no recover backup for log-based
    replication

26
Maintenance: Index
  • Index rebuild cannot be run against a replicated
    database
  • Use index compact online
  • proutil <dbname> -C idxcompact <table.index>
  • Notes
  • Watch for open transactions, as idxcompact will
    do a significant amount of logging
  • Schedule outside of busy times to allow
    replication to keep up
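
In the spirit of scripting everything, the idxcompact command above can be wrapped so the index list is checked before anything runs. This is a minimal sketch: the helper names, the database name `sports` and the index names are hypothetical, and only the command form shown on the slide is used.

```python
import shlex
import subprocess
from typing import Iterable, List


def idxcompact_command(dbname: str, table_index: str) -> List[str]:
    """Build the argv for: proutil <dbname> -C idxcompact <table.index>."""
    parts = table_index.split(".")
    if len(parts) < 2 or any(not p or " " in p for p in parts):
        raise ValueError(f"expected table.index, got {table_index!r}")
    return ["proutil", dbname, "-C", "idxcompact", table_index]


def compact_indexes(dbname: str, indexes: Iterable[str],
                    dry_run: bool = True) -> List[List[str]]:
    """Validate every index name first, then print or run one command per index."""
    commands = [idxcompact_command(dbname, ix) for ix in indexes]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure
    return commands


# Dry run with hypothetical names; nothing is executed.
compact_indexes("sports", ["customer.cust-num", "order.order-num"])
```

Running the indexes one at a time, outside busy hours, gives replication a chance to keep up between commands.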

27
Maintenance: Add Space (Online and Offline Approaches)
  • prostrct addonline to add space while you are
    running
  • Online process
  • Make sure your umask is correct
  • Validate your add.st file
  • prostrct addonline db add.st
  • prostrct is supported for both source and target
    databases with the exception of prostrct unlock
  • Offline process
  • Shutdown source and target
  • Make changes to source
  • Make changes to target
  • Start both databases
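
The online steps above (check the umask, check the structure file, then run prostrct addonline) can be scripted along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, the expected umask is site-specific and must be supplied by the caller, and the structure-file check is only a stand-in that confirms the file exists and is not empty.

```python
import os
from pathlib import Path
from typing import List


def current_umask() -> int:
    """Read the process umask (os.umask can only read it by setting it)."""
    mask = os.umask(0)
    os.umask(mask)  # restore the original value immediately
    return mask


def addonline_command(dbname: str, st_file: str, expected_umask: int) -> List[str]:
    """Run the pre-checks, then build: prostrct addonline <db> <st file>."""
    mask = current_umask()
    if mask != expected_umask:
        raise RuntimeError(f"umask is {mask:03o}, expected {expected_umask:03o}")
    st_path = Path(st_file)
    if not st_path.is_file():
        raise FileNotFoundError(f"structure file not found: {st_file}")
    if not st_path.read_text().strip():
        raise ValueError(f"structure file is empty: {st_file}")
    # The command is returned, not executed, so it can be reviewed or logged first.
    return ["prostrct", "addonline", dbname, str(st_path)]
```

A caller would pass the returned list to `subprocess.run(..., check=True)` once the checks pass; keeping the build and the execution separate makes the script easy to rehearse in a test environment.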

28
Maintenance
  • All maintenance should be scripted and tested in
    a test environment before proceeding with the
    Production run
  • Eliminate the human element (no typos)
  • Know how long it will take
  • Make sure maintenance does not cause a problem
  • Apply and test schema changes thoroughly

29
Building a failover plan
  • Who
  • Business and technical personnel
  • Gets informed: email, conference call, call
    tree, ...
  • Makes Decisions
  • Does the work
  • What
  • What resources are affected?
  • Where
  • Location of physical resources
  • Location of personnel
  • Location of replacement/replication target

30
Building a failover plan - continued
  • When
  • Times of backups
  • Times of data archiving
  • Times of backup archiving
  • Times of log archiving
  • Why
  • What are we protecting ourselves from
  • Why did we choose not to deal with some event

31
Risk Assessment
  • Things to consider
  • Risk: natural disaster, human caused, hardware, ...
  • Likelihood
  • Impact to application environment
  • Time to recover
  • It is OK to say we considered that and it was not
    high enough in likelihood in our eyes to create a
    solution
  • Determine the dependency of each level
  • Hardware requires power
  • OpenEdge application requires PostalSoft
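
One way to make the considerations above concrete is a simple scoring table. The sketch below assumes 1 to 5 scales for likelihood and impact and a cut-off chosen by the business; the scales, the threshold and the sample risks are all invented for illustration. Risks below the threshold are the ones consciously accepted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Risk:
    name: str
    likelihood: int          # 1 (rare) to 5 (expected); an assumed scale
    impact: int              # 1 (minor) to 5 (business stops); an assumed scale
    hours_to_recover: float

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def triage(risks: List[Risk], threshold: int) -> Tuple[List[Risk], List[Risk]]:
    """Rank risks by score, then split into (mitigate, accept) at the threshold."""
    ranked = sorted(risks, key=lambda r: (r.score, r.hours_to_recover), reverse=True)
    mitigate = [r for r in ranked if r.score >= threshold]
    accept = [r for r in ranked if r.score < threshold]
    return mitigate, accept


# Invented sample values.
risks = [
    Risk("disk failure", likelihood=4, impact=4, hours_to_recover=6),
    Risk("flood", likelihood=1, impact=5, hours_to_recover=72),
    Risk("power loss", likelihood=3, impact=4, hours_to_recover=2),
]
mitigate, accept = triage(risks, threshold=10)
print([r.name for r in mitigate])  # ['disk failure', 'power loss']
print([r.name for r in accept])    # ['flood']
```

Recording the accepted list alongside the mitigated one documents why a given event was left without a solution.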

32
Solutions
  • Document redundancy where it exists
  • Document places where redundancy is missing or
    unknown (on purpose or omission)
  • Ensure reasonable software update procedures are
    in place and documented
  • Verify security, division of responsibilities and
    software release policies per layer
  • Need to develop Risk Assessment form

33
Aspects of a failover plan
  • When
  • When do we decide to move to the standby
    environment?
  • Who makes the decision?
  • Who does the work along with a backup for who
    does the work
  • Defined process
  • Service level agreements with customers
  • Milestones in the process
  • Why
  • This is a tougher decision than you think
  • Fix or flee: lost time vs. lost data

34
Documenting your plan
  • Your plan should be able to be executed by anyone
  • You cannot have enough detail
  • Automate as much of the process as possible to
    eliminate the human element
  • Document and automate both the failover and the
    failback

35
Test your plan
  • Switch over to your standby environment and run
    for a day or more
  • You don't want to cause an extended outage
    testing your plan
  • You will only find issues if you run at full load
  • Do this at least once a year
  • Follow your document and correct mistakes as you
    go

36
Keep documents and support files up-to-date
  • Keep your failover and failback documents
    up-to-date
  • Keep contact lists up-to-date
  • Keep all individual process documents up-to-date
  • Keep copies of your support files
  • Scripts
  • Application (.pf, .ini, .properties, ...)
  • Good password management
  • Keep everything accessible (online and hard
    copies)

37
Points to Remember
  • Build redundancy into all aspects of your
    operation
  • Look at the likelihood of a failure and its
    impact to the customer
  • Protect your entire application environment both
    hardware and software
  • Build a total solution but think about the
    cost/benefit of each component
  • Automate tasks to eliminate human error
  • Test your failover plan at least once a year

38
Questions?
Adam Backman adam_at_wss.com

39
Thank you for your time!