Title: Fault Tolerant and Resilient Web Services
1. Fault Tolerant and Resilient Web Services
- By Terry B. Bobbie
- Systems Engineer, Raytheon ITSS
- Bobbie_at_usgs.gov
- Raytheon Contractor for the USGS at the EROS Data Center
2. Goals of this briefing
- Examine what "fault tolerant" and "resilient" mean
- Introduce an approach to mapping your requirements to service offerings
- Foster "out of the cube" thinking
- Learn from open discussions
- Gather some feedback and have some fun
3. So why all the hubbub?
- Your data and service needs may have an elevated importance not known before
- Importance of hazard and emergency response information
- Protection of life and property
- Business continuity
- Matured technology and services
- Improved and reliable services
4. Have our requirements changed? (Suppliers and consumers: private industry, academia, and government)
- Needs and requirements are diverse, unique, varied, and may not lend themselves to stove-piped solutions
- Diverse community of users
- Uniqueness of use, data, and user elements (e.g., sophistication, access requirements, delivery requirements)
- The "we've always done it this way" approach may no longer be valid
5. Fault Tolerant and Resilient
- What is "Fault Tolerant and Resilient"?
"No Fault" Web: where packets collide with each other (injury) on the information superhighway without individual ownership of responsibility (no individual packet liability)
"Resilient" Web: where injured packets repair themselves to "good as new" while on the fly (and sue the switches for pain, suffering and
BIG $)
6. How about a design based on replication of < ? >
aa->xxhost.domain
aa->xxhost.domain
aa->xxhost.domain
7. Example: USGS National Web
8. Example technologies used: 3 servers, one interface
- 3 Sun quad 450s
- Replicated file systems (Andrew File System)
- DNS configurations
- Cisco Distributed Director will provide uninterrupted access to mirrored information
- Load balance between available National modules
- Only available modules remain in pick list
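The "pick list" idea above can be illustrated with a minimal sketch. It does not reproduce how Cisco Distributed Director or the DNS configuration works internally; the module names, the health table, and the round-robin policy are all assumptions made for illustration.

```python
from itertools import cycle

def build_pick_list(modules, is_available):
    """Return only the modules that currently pass the availability check."""
    return [m for m in modules if is_available(m)]

# Hypothetical module names and health states, for illustration only.
modules = ["module-a", "module-b", "module-c"]
health = {"module-a": True, "module-b": False, "module-c": True}

pick_list = build_pick_list(modules, lambda m: health[m])

# Simple round-robin over the surviving modules (an assumed balancing policy).
rotation = cycle(pick_list)
assignments = [next(rotation) for _ in range(4)]
```

Because the failed module never enters the pick list, requests are spread only across the modules that are still reachable.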
9. Benefits of Fault Tolerance and Resiliency
- Improved reliability: geographically distribute public access to content
- Improved customer service: serve the public from high-bandwidth sites and reclaim bandwidth for data transfer
- Improved management of content
- Allow for distributed content management where appropriate while consolidating physical location
10. Benefits of Fault Tolerance and Resiliency
- Improved security
- Authentication and firewalls
- Sophisticated file access
- Kerberos-authenticated editing of web pages from any system with an AFS client: desktop, laptop, or server. At the office, home, or away!
- Reduced system administration requirements
- Near 100% reliability for data and information
- Protects against
- network failures
- server failure
- natural disaster
11. An approach to analyzing service opportunities
- Phased approach
- 1st phase is discovery, understanding, and translation of Web service requirements
- 2nd phase is discovery, understanding, and translation of vendor market opportunities
- 3rd phase is crosswalking (mapping) requirements to vendor services available
12. Phase 1: Analyzing requirements
- Characterize Web hosting requirements
- Examples include
- Real-Time gathering and reporting
- WWW pages
- Images
- Flat files
- Databases
- Each may differ in its characteristics relative to
- Data
- Manipulation
- Access
13. Phase 1 - Web hosting requirements - Real-Time
- Real-Time
- An event or series of events that, by its nature and mission characteristics, requires periodic data collection and subsequent delivery in a timely manner. In some cases this could be described as on-demand: a master process collects the changing data, and a corresponding slave process is made available for query and delivery, returning an element or series of data collected at a specific moment in real time and delivered quickly and efficiently. Should another request for the data be made with all parameters equal, one could expect the delivered content to be different. An example would be sampling a digital clock. Each second, the new time is passed to a query and delivery staging area. This area is made available for query and, when queried, delivers its contents in real time (no delay). Each second's value may overwrite the previous one or may be appended to it to build a series. The query process is repeated, with the parameters allowing responses that range from the single entry for the current time to a series of collections from current to oldest, or any subset of that series.
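The clock example can be sketched in a few lines. This is only an illustration of the staging-area idea, not an implementation from the briefing; the class name `StagingArea`, its methods, and the sample timestamps are all hypothetical.

```python
from collections import deque

class StagingArea:
    """Holds the most recent samples; the oldest entries fall off when full."""

    def __init__(self, max_samples=None):
        # max_samples=1 models "each second overwrites the previous";
        # a larger value (or None) models appending samples into a series.
        self._samples = deque(maxlen=max_samples)

    def publish(self, sample):
        """Called by the collection process each time a new value is sampled."""
        self._samples.append(sample)

    def query(self, count=1):
        """Return up to `count` samples, newest first."""
        newest_first = list(reversed(self._samples))
        return newest_first[:count]

# Hypothetical clock readings pushed in by the collection process.
area = StagingArea(max_samples=5)
for reading in ["12:00:01", "12:00:02", "12:00:03"]:
    area.publish(reading)

latest = area.query()          # single entry: the current time
series = area.query(count=3)   # a series from current back to oldest
```

Repeating `query()` a second later with the same parameters would return a different value, which is the defining property of the real-time case described above.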
14. Phase 1 - Web hosting requirements - WWW pages
- WWW pages
- Content delivered within WWW pages may be text-based information, documents, or graphics that are vital as a basis for information and research but can generally be described as static. Each user request or "hit" returns the same front page information.
- Front pages of WWW servers that act as directories or portals of information may be static, requiring updates only as often as the listing changes. One would say that this page (a listing of directories) is static until a directory is added or deleted. The actual content of a directory may not be hosted by the same source as the directory listing, and so may not itself be static.
15. Phase 1 - Web hosting requirements - Images
- Images
- Images can be large or small, compressed or uncompressed, and of different formats. Many images are JPEG (or another common format) and are used for logos, pictures for hosting, graphic representations, etc. Other images may be of different formats. Some images are pre-generated (like a logo), while others are generated dynamically from user input.
- Images may be static: they are generated one time and rarely change (most often attached to WWW pages as static graphics delivered on each request or "hit").
- Some images may be dynamically created, where user input defines the criteria for graphic generation (e.g., geo-spatial data and rendering).
16. Phase 1 - Web hosting requirements - Databases
- Databases
- Many of today's WWW pages contain user-selectable parameters that may change and differ by user subject matter or interest. Custom user input may involve a broad, open-ended (effectively infinite) range of input parameters, much like a query based upon a key word. A good example of custom user input is where results are returned based upon input parameters selected, chosen, or otherwise obtained from a very large number of choices. WWW search engines are designed and built with the idea that user input may not be entirely predictable (e.g., in a key word search the key word could be any word, or combination of words, in the English language of over one million words).
- One may counter this concept by arguing that anything called infinite still has a known set of boundaries (i.e., everything has an end limit or boundary). In the context of this definition, we should assume that "infinite" means a very large order of magnitude.
- USGS has many examples of this requirement today. One example is where user-selectable boundaries are used as input criteria to deliver geo-spatial data. The same database, populated with a known set of data files, is queried with different input parameters and combinations, and different geo-spatial information is delivered for each unique query.
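As a rough sketch of the boundary-driven query described above, the same small table can return different results for different user-selected boundaries. The table layout, tile names, and coordinates are invented for illustration and do not describe any actual USGS database.

```python
import sqlite3

# In-memory database populated with a fixed, known set of records (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tiles (name TEXT, lon REAL, lat REAL)")
conn.executemany(
    "INSERT INTO tiles VALUES (?, ?, ?)",
    [("tile-1", -100.0, 40.0), ("tile-2", -96.7, 43.7), ("tile-3", -80.0, 35.0)],
)

def tiles_in_box(conn, west, south, east, north):
    """Return the names of tiles whose point falls inside the bounding box."""
    rows = conn.execute(
        "SELECT name FROM tiles "
        "WHERE lon BETWEEN ? AND ? AND lat BETWEEN ? AND ? ORDER BY name",
        (west, east, south, north),
    )
    return [name for (name,) in rows]

# Same database, different user-selected boundaries, different answers.
plains = tiles_in_box(conn, -105.0, 38.0, -95.0, 45.0)
east_coast = tiles_in_box(conn, -85.0, 30.0, -75.0, 40.0)
```

The data set is fixed, but the space of possible boundary inputs is effectively unbounded, which is why this kind of requirement cannot be served from pre-generated static pages.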
17. Phase 1 - Web hosting data characteristics
- Data Characteristics
- Frequency of update requirements
- How often the data requires updating, modification, or deletion (e.g., hourly, weekly, monthly, dynamic)
- Volume of data
- Quantity as it relates to storage requirements
- Geographic Scope and Context
- Data may be relevant to global, national, regional, or local needs and may require service from multiple locations
18. Phase 1 - Web hosting manipulation requirements
- Manipulation Characteristics
- None (text-like)
- Data is served as a flat file without manipulation
- On-the-fly graphics generation
- Generation or rendering of graphics before presentation to a user
- Database query
- Lookup is executed based upon user input parameters
- Other special (Java based, map object rendering, etc.)
19. Phase 1 - Web hosting access requirements
- Access characteristics
- Frequency of use (hits, files served, etc.)
- How often requests are serviced in a period
- Fault tolerance limit (low, medium, high)
- Importance of availability (L, M, H)
- Volume of units served per period
- 150 WWW page (25KB ea.) deliveries hourly
- 250 images (500MB ea.) delivered per 24 hr day
- 500 database query responses per 8 hr business day
- 350 GIF-on-the-fly deliveries per 24 hr day
- Expected delivery time per request
20. An approach to analyzing service opportunities
- Phased approach: Phase 2
- 1st phase is discovery, understanding, and translation of Web service requirements
- 2nd phase is discovery, understanding, and translation of vendor market opportunities
- 3rd phase is crosswalking (mapping) requirements to vendor services available
21. Phase 2 - Gain an understanding of services available
- Web Services opportunities
- Vendor supplied
- Host site supplied
- Combinations of any or all
- Others?
22. Phase 2 - Characteristics of Service Opportunities
- Key Characteristic Descriptions
- Data
- Local storage capability / capacity
- Responsiveness to (periodic or cyclic) changes in source data (e.g., new WWW page or content, add/delete/change image files, database content and architecture, real-time data gathering)
- Change Management Strategy and Plans (out-of-service maintenance, scheduled maintenance, access permissions, content change, software and platform changes, etc.)
- Geographic context (local, regional, national, global)
23. Phase 2 - Characteristics of Service Opportunities
- Key Characteristic Descriptions
- Manipulation
- Local processing capability/capacity
- Scalability of end-to-end response to events (e.g., excess capacity or headroom of resources, networks, CPU, memory, I/O interfaces, storage, other surge capability, etc.)
24. Phase 2 - Characteristics of Service Opportunities
- Key Characteristic Descriptions
- Access
- Bandwidth capability/capacity
- Service redundancy (networks, platforms, other infrastructure)
- Responsiveness (response time) to requests for serving data to end user
- Geographic context (locations are local, regional, national, global)
- Delivery of data guarantee
25. Phase 2 - Characteristics of Service Opportunities
- Key Characteristic Descriptions
- Misc. (may apply to any or all of the categories)
- Uptime guarantee
- Security Management Strategy and Plans (system level, content, customer identity, etc.)
- Prioritized Users (i.e., can the vendor offer a scheme for prioritizing users based upon volume, frequency, emergency response, etc.)
- Operations and Service Level agreements (backup strategies, 24x7 system monitoring, trouble analysis and resolution, network management, technical support to end users and customer, contingency plans, etc.)
26. An approach to analyzing service opportunities
- Phased approach: Phase 3
- 1st phase is discovery, understanding, and translation of Web service requirements
- 2nd phase is discovery, understanding, and translation of vendor market opportunities
- 3rd phase is crosswalking (mapping) requirements to vendor services available
27. Phase 3 - A crosswalk analysis of requirements and services
- Case Study
- Requirement 1: WWW pages
- Data Requirements
- Frequency of update (monthly)
- Volume is 500 MB (stored pages, graphics, work area)
- Manipulation
- None (text-based page with small static graphics)
28. Phase 3 - A crosswalk analysis of requirements and services
- Case Study: Requirement 1 (cont.)
- Access
- 180 pages served per hour (historically, 3 per minute)
- Fault tolerance is high (outages are OK)
- Importance of availability is low (not required to safeguard human life and property)
- Volume is 180 x 50KB, or 9000KB (9MB) per hour, or 150KB/min, or 2500 bytes per sec (sustained rate)
- With an expected delivery time of 5 sec per 50KB page, the delivery rate requirement is 10KB/sec
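The arithmetic for Requirement 1 can be checked with a short calculation. The variable names are illustrative, and the sketch assumes decimal units (1 KB = 1000 bytes), which is what makes the slide's figures line up.

```python
KB = 1000  # assumption: decimal kilobytes, matching the slide's round numbers

pages_per_hour = 180
page_size_bytes = 50 * KB
expected_delivery_seconds = 5

# Aggregate volume over time.
bytes_per_hour = pages_per_hour * page_size_bytes       # 9,000,000 bytes (9 MB)
bytes_per_minute = bytes_per_hour / 60                  # 150,000 bytes (150 KB)
sustained_bytes_per_second = bytes_per_hour / 3600      # 2,500 bytes

# Rate needed to deliver one page within the expected time.
per_request_bytes_per_second = page_size_bytes / expected_delivery_seconds  # 10 KB/s
```

Note that the per-request rate (10 KB/sec) is four times the sustained average (2.5 KB/sec), so a service sized only for the average would not meet the 5-second delivery expectation.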
29. Phase 3 - A crosswalk analysis of requirements and services
- Case Study
- Requirement 2: Image file generation
- Data Requirements
- Frequency of update (hourly updates required)
- Volume is 300 TB (image graphics)
- Manipulation
- High (GIF-on-the-fly graphics generation)
30. Phase 3 - A crosswalk analysis of requirements and services
- Case Study: Requirement 2 (cont.)
- Access
- 10 files served per hour (history)
- Fault tolerance is low (very few outages)
- Importance of availability is medium (some requirement to safeguard human life and property)
- Volume is 10 x 500MB per hour, or 1.389MB/sec (sustained rate)
- An expected delivery time of 1 hr/file
- (delivery rate requirement)
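The Requirement 2 figures can be worked through the same way. The slide leaves the delivery rate figure blank; the per-file rate computed below is derived from the stated 500MB file size and 1 hr delivery time and is not a number taken from the source. Variable names are illustrative.

```python
files_per_hour = 10
file_size_mb = 500
expected_delivery_seconds = 3600  # 1 hr per file, as stated on the slide

# Sustained aggregate rate: 5000 MB spread over one hour.
sustained_mb_per_second = files_per_hour * file_size_mb / 3600

# Derived (not from the source): rate needed to move one file in one hour.
per_file_mb_per_second = file_size_mb / expected_delivery_seconds

# Derived: average number of transfers in flight if each takes the full hour.
concurrent_transfers = files_per_hour * (expected_delivery_seconds / 3600)
```

Under these assumptions the sustained rate is roughly ten times the per-file rate, because about ten hour-long transfers overlap at any moment.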
31. Requirements Matrix
32. Service Opportunities Matrix
33. The Magic Algorithm Matrix
Score
Overall Score: Pass
34. The Magic Algorithm Matrix
Score
Overall Score: Fail
35. Crosswalk Matrix
In this case, only Vendor B meets all requirements
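The matrices themselves are not reproduced here, so the following is only a sketch of what a requirement-by-requirement crosswalk with an overall pass/fail score might look like. The scoring rule (pass only if every requirement is met), the requirement names, all threshold values, and the offerings of Vendors A, B, and C are assumptions made for illustration.

```python
# Hypothetical minimum requirements, for illustration only.
requirements = {
    "storage_gb": 0.5,
    "bandwidth_kb_per_s": 10,
    "uptime_percent": 95.0,
}

# Hypothetical vendor offerings, for illustration only.
vendors = {
    "Vendor A": {"storage_gb": 10, "bandwidth_kb_per_s": 8, "uptime_percent": 99.0},
    "Vendor B": {"storage_gb": 20, "bandwidth_kb_per_s": 50, "uptime_percent": 99.5},
    "Vendor C": {"storage_gb": 0.2, "bandwidth_kb_per_s": 100, "uptime_percent": 99.9},
}

def crosswalk(requirements, offering):
    """Score each requirement as met or not; overall passes only if all are met."""
    scores = {name: offering.get(name, 0) >= minimum
              for name, minimum in requirements.items()}
    return scores, all(scores.values())

results = {vendor: crosswalk(requirements, offering)
           for vendor, offering in vendors.items()}
passing = [vendor for vendor, (_, overall) in results.items() if overall]
```

With these invented numbers only Vendor B passes, mirroring the outcome stated on the slide; a weighted score could replace the all-or-nothing rule where some requirements are negotiable.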
36. Summary
- Using facts and data, characterize your requirements
- Analyze vendor service offerings and opportunities
- Map requirements to vendor services
- Perform cost analysis
- Explore other options (½ full or ½ empty)
- Expect that not all needs can be fully met by vendors
- Analyze the cost benefit and tradeoffs
37. Example: USGS National Web