Title: ISM 4300
1ISM 4300
- Computer Architectures
- Data Warehouses
2Mainframe Architecture
- With mainframe software architectures all
intelligence is within the central host computer.
- Users interact with the host through a terminal.
- Mainframe software architectures are not tied to
a hardware platform. User interaction can be done
using PCs and UNIX workstations.
3Mainframe Architecture (cont)
- A limitation of mainframe software architectures
is that they do not easily support graphical user
interfaces or access to multiple databases from
geographically dispersed sites.
- Can be used as a server in distributed
client/server architectures.
4File Sharing Architecture
- Original PC networks were based on file sharing
architectures, where the server downloads files
from the shared location to the desktop
environment. The requested user job is then run
(including logic and data) in the desktop
environment. File sharing architectures work if
shared usage is low, update contention is low,
and the volume of data to be transferred is low.
5File Sharing Architecture (cont)
- In the 1990s, PC LAN computing changed because
the capacity of file sharing was strained as
the number of online users grew (it can only
satisfy about 12 users simultaneously) and
graphical user interfaces (GUIs) became popular
(making mainframe and terminal displays appear
out of date).
6Client/Server Architecture
- C/S introduced a database server to replace the
file server. Using a relational database
management system, user queries are answered
directly.
- Reduces network traffic by providing a query
response rather than total file transfer. It
improves multi-user updating through a GUI front
end to a shared database.
- Remote Procedure Calls (RPCs) or Structured Query
Language (SQL) statements are typically used to
communicate between the client and server.
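The traffic difference can be sketched in a few lines of Python. This is only an illustration: the standard-library `sqlite3` module stands in for a networked database server, and the `orders` table, its columns, and both function names are hypothetical.

```python
import sqlite3

def make_db():
    """Build a small in-memory database (stand-in for a shared server)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    rows = [(i, "east" if i % 2 else "west", 10.0 * i) for i in range(1, 101)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn

def file_sharing_style(conn):
    """File-sharing style: pull every row to the desktop, then filter locally."""
    all_rows = conn.execute("SELECT id, region, amount FROM orders").fetchall()
    total = sum(amount for _, region, amount in all_rows if region == "east")
    return total, len(all_rows)  # rows transferred = the whole table

def client_server_style(conn):
    """Client/server style: send a query, receive only the answer."""
    result = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE region = ?", ("east",)
    ).fetchall()
    return result[0][0], len(result)  # rows transferred = one summary row

conn = make_db()
fs_total, fs_rows = file_sharing_style(conn)
cs_total, cs_rows = client_server_style(conn)
print(fs_total == cs_total, fs_rows, cs_rows)
```

Both functions compute the same total, but the first moves all 100 rows to the client while the second moves a single result row.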
7Two-Tier Client Server
- User system interface is usually located in the
user's desktop environment and the database
management services are usually in a server that
is a more powerful machine that services many
clients.
- Processing management is split between the user
system interface environment and the database
management server environment. The database
management server provides stored procedures and
triggers.
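A trigger is logic that runs inside the database server rather than in the client. The sketch below uses Python's `sqlite3` module as a stand-in server; SQLite supports triggers but not stored procedures, so only the trigger half is shown. The `accounts` and `audit_log` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL);
CREATE TABLE audit_log (account_id INTEGER, old_balance REAL, new_balance REAL);

-- Trigger: runs inside the database on every balance update.
CREATE TRIGGER log_balance_change
AFTER UPDATE OF balance ON accounts
BEGIN
    INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
END;

INSERT INTO accounts VALUES (1, 100.0);
""")

# The client only issues the update; the server-side trigger does the logging.
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
audit_rows = conn.execute("SELECT * FROM audit_log").fetchall()
print(audit_rows)
```

The client code contains no logging logic at all, which is the sense in which processing management is split between the two tiers.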
8Three-Tier Client Server
- A middle tier is added between the user system
interface and the database management server.
- There are a variety of ways of implementing this
middle tier, such as transaction processing
monitors, message servers, or application
servers.
- The middle tier can perform queuing, application
execution, and database staging.
9Three-Tier Client Server (cont)
- For example, if the middle tier provides queuing,
the client can deliver its request to the middle
layer and disengage because the middle tier will
access the data and return the answer to the
client.
- Adds scheduling and prioritization for work in
progress.
- The three-tier client/server architecture
improves performance for large groups (in the
thousands).
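The queuing behavior described above can be sketched with Python's standard `queue` and `threading` modules. The `MiddleTier` class, its method names, and the dictionary standing in for the database tier are all assumptions made for illustration, not a real transaction processing monitor.

```python
import queue
import threading

class MiddleTier:
    """Hypothetical middle tier: queues requests and serves them by priority."""

    def __init__(self, database):
        self.database = database            # stand-in for the database tier
        self.requests = queue.PriorityQueue()
        self.results = {}
        self.order_served = []
        self._seq = 0                       # ticket counter; also breaks priority ties

    def submit(self, priority, key):
        """Client hands off a request and returns immediately with a ticket."""
        self._seq += 1
        ticket = self._seq
        self.requests.put((priority, ticket, key))
        return ticket

    def run(self):
        """Worker loop: the lowest priority number is served first."""
        while True:
            priority, ticket, key = self.requests.get()
            if key is None:                 # sentinel: stop the worker
                break
            self.results[ticket] = self.database.get(key)
            self.order_served.append(key)

tier = MiddleTier({"a": 1, "b": 2, "c": 3})
t_low = tier.submit(priority=5, key="a")
t_high = tier.submit(priority=1, key="b")
tier.submit(priority=99, key=None)          # shutdown sentinel, served last

worker = threading.Thread(target=tier.run)
worker.start()
worker.join()
print(tier.order_served, tier.results[t_low], tier.results[t_high])
```

`submit` returns before any data is fetched, so the client can disengage; the request for `"b"` is served first even though it arrived second, which is the prioritization the slide mentions.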
10Client/Server
- A Client Server System is more structured than
general distributed computing. A client is
defined as a requester of services and a server
is defined as the provider of services. A single
machine can be both a client and a server
depending on the software configuration.
11Client/Server (cont)
- A client sends requests to servers to execute
tasks
- The tasks may be just to provide information, or
to perform a complex computation (perhaps
returning information)
- Clients and servers are asymmetric
- A server may be a client of another server
12Client Network - Server
13C/S Desired Properties
- Interoperability
- Portability
- Integration
- Transparency
- Security
- Scalability
- Flexibility
14Interoperability
- The ability of two or more systems or components
to exchange information and to use the
information that has been exchanged
- Allows different systems to exchange meaningful
information
- Requires standard exchange formats
- Requires standard message formats
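As a rough illustration of a standard exchange format, the sketch below has two systems with different internal representations agree on one JSON message layout. The field names (`cust_no`, `nm`, `customer_id`, `name`) and both functions are hypothetical.

```python
import json

# Hypothetical internal record of system A (field names are assumptions).
system_a_record = {"cust_no": 42, "nm": "Ada Lovelace"}

def a_to_message(record):
    """System A maps its internal fields onto the agreed message format."""
    return json.dumps({"customer_id": record["cust_no"], "name": record["nm"]})

def message_to_b(message):
    """System B parses the agreed format into its own internal tuple."""
    data = json.loads(message)
    return (data["customer_id"], data["name"].upper())

wire = a_to_message(system_a_record)
b_record = message_to_b(wire)
print(wire, b_record)
```

Neither system needs to know the other's internal layout; each only needs to know the agreed message format.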
15Portability
- A system in one environment can be installed in
another
- Can be within the same hardware environment
- Can be within the same Operating System
environment
- Can be within the same network environment
- Can be within the same database environment
16Transparency
- The user can read data from a site without
knowing where it is
- The user can update data without knowing whether
it is duplicated or not
- Tasks can be executed at various sites without
the user knowing where they are
- Failures are dealt with
17Flexibility and Scalability
- Flexibility
- the ease with which a system or component can be
modified for use in applications or environments
other than those for which it was specifically
designed
- Scalability
- the ease with which a system or component can be
modified to fit the problem area.
18Points of Failure on C/S
- The client side of the application could crash
- The client system may have h/w problems
- The client's network card could fail
- Network contention could cause timeouts
- There may be network address conflicts
- Network elements such as routers could fail
- Transmission errors may lose messages
19Points of Failure (cont)
- The client and server versions may be
incompatible
- The server's network card could fail
- The server system may have h/w problems
- The server s/w may crash
- The server's database may become corrupted
20Disadvantages of C/S
- Harder to build
- Less stable
- Susceptible to network load
- Lacking in specialists
- Difficult to debug
- Difficult to test
- Can be more costly than mainframe
21Distributed Computing
- A type of computing in which different components
and objects comprising an application can be
located on different computers connected to a
network.
- In some distributed computing systems, each of
the computers could even be running a
different operating system.
22 Distributed Computing (cont)
- A set of computers connected in some way (serial
lines, Ethernet, ATM, etc.)
- Each computer is able to communicate with some of
the others
- Programs running on each computer are able to
share information and request tasks to be
executed
23 Questions
- Is distributed computing preferred over the
mainframe architecture?
- What are the business reasons that support (a)
mainframe computing? (b) distributed computing?
24Distributed Computing (cont)
- One of the requirements of distributed computing
is a set of standards that specify how objects
communicate with one another.
- Two chief distributed computing standards are
CORBA and DCOM.
25 Data Warehouse
- DATA WAREHOUSE: An organization's electronic
library; stores consolidated current and historic
data for management reporting and analysis
- DATA MART: Small data warehouse for a special
function, e.g., focused marketing based on
customer info
26What is a Data Warehouse?
- "A warehouse is a subject-oriented, integrated,
time-variant and non-volatile collection of data
in support of management's decision making
process."
- Bill Inmon (1990)
- "A Data Warehouse is a repository of integrated
information, available for queries and analysis.
Data and information are extracted from
heterogeneous sources as they are generated."
- Anonymous
27COMPONENTS OF DATA WAREHOUSE
28 Data Mining
- ON-LINE ANALYTICAL PROCESSING (OLAP): ability to
manipulate and analyze large volumes of data from
multiple perspectives
- MINING: Seeking relationships that are not known
in advance. A function of the software and data
organization.
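"Multiple perspectives" can be read as grouping the same facts by different dimensions. The sketch below is a minimal pure-Python illustration of that idea; the fact rows, the dimension names, and the `roll_up` function are hypothetical, and a real OLAP tool would work against far larger data.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales)
facts = [
    ("east", "widget", "Q1", 100),
    ("east", "gadget", "Q1", 50),
    ("west", "widget", "Q1", 70),
    ("west", "widget", "Q2", 30),
]
DIMENSIONS = {"region": 0, "product": 1, "quarter": 2}

def roll_up(rows, *dims):
    """Sum sales grouped by the chosen dimensions (one 'perspective')."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)

by_region = roll_up(facts, "region")
by_product_quarter = roll_up(facts, "product", "quarter")
print(by_region)
print(by_product_quarter)
```

The same four rows answer two different questions depending only on which dimensions are passed to `roll_up`.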
29 DW Characteristics
- Subject Oriented: Data that gives information
about a particular subject instead of about a
company's ongoing operations.
- Integrated: Data that is gathered into the data
warehouse from a variety of sources and merged
into a coherent whole.
- Time Variant: All data in the data warehouse is
identified with a particular time period.
30Data Acquisition
- The process of moving company data from the
source systems into the warehouse.
- Often the most time-consuming and costly effort.
- Performed with software products known as ETL
(Extract/Transform/Load) tools.
- Over 50 ETL tools on market.
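The three ETL stages can be sketched with the Python standard library. This is a toy illustration, not a stand-in for a commercial ETL tool: the CSV text, the `sales` table, and the function names are all hypothetical, and `sqlite3` plays the role of the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from a feeder system.
SOURCE_CSV = "id,name,amount\n1, alice ,10.50\n2,BOB,3\n"

def extract(text):
    """Extract: read raw rows from the source system's export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names and convert amounts to integer cents."""
    return [
        (int(r["id"]), r["name"].strip().title(), round(float(r["amount"]) * 100))
        for r in rows
    ]

def load(conn, rows):
    """Load: insert the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, cents INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))
loaded = warehouse.execute("SELECT * FROM sales ORDER BY id").fetchall()
print(loaded)
```

Keeping the three stages as separate functions mirrors the Extract/Transform/Load split and lets each stage be tested on its own.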
31Data Cleansing
- Typically performed in conjunction with data
acquisition.
- A complicated process that validates and, if
necessary, corrects the data before it is
inserted.
- AKA "data scrubbing" or "data quality assurance".
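"Validates and, if necessary, corrects" can be sketched as a function that fixes what it can and rejects what it cannot. The record fields (`state`, `amount`) and the rules below are assumptions chosen for illustration; real cleansing rules depend on the feeder systems.

```python
def cleanse(record):
    """Return (clean_record, problems); clean_record is None if unrecoverable."""
    problems = []
    clean = dict(record)

    # Correctable: stray whitespace and inconsistent case in the state code.
    state = str(clean.get("state", "")).strip().upper()
    if state != clean.get("state"):
        problems.append("state normalized")
    clean["state"] = state

    # Not correctable: a missing or non-numeric amount is rejected.
    try:
        clean["amount"] = float(clean["amount"])
    except (KeyError, TypeError, ValueError):
        problems.append("invalid amount")
        return None, problems

    # Validation rule (an assumption for this sketch): amounts are non-negative.
    if clean["amount"] < 0:
        problems.append("negative amount")
        return None, problems
    return clean, problems

good = cleanse({"state": " fl", "amount": "12.5"})
bad = cleanse({"state": "FL", "amount": "n/a"})
print(good, bad)
```

Returning the list of problems alongside the record keeps a trail of what was corrected or rejected, which can be reported back to the owners of the source data.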
32Business Intelligence's Four Areas
- Multi-dimensional Analysis Tools
- look at data from different angles
- Query Tools
- SQL and others; may be 4GL
- Mining Tools
- OLAP and statistical tools
- Data Visualization Tools
- show graphical representation of data
33 DW Problems
- Extracting, cleaning, and loading are
time-consuming and costly
- Difficult to contain scope; users want more
- Mistakes will be found in feeder systems
- DW needs data not available in any other
corporate systems
- Training may not be applied
- Maintenance will be high
- Security needs of a DW are greater than for TPS
34Application Service Providers
- ASP - a service provider whose specialization is
the implementation and ongoing operations
management of one or more networked applications
on behalf of its customer. One key attribute
beginning to rapidly evolve is the emphasis on
Web-based e-business application management as an
important differentiator from the more
traditional outsourced client-server application
management services.
35Why ASPs?
- ASPs own the application and operate the server
- Customers rent the application on a per-use or
monthly basis
- An advantage is low cost of entry and short setup
time
- May be less expensive except for highly used
applications
- Eliminates need for specialized IT infrastructure
- Can shift Internet bandwidth to the ASP who can
leverage cost over many customers
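The "except for highly used applications" point comes down to break-even arithmetic under per-use pricing. The sketch below works one case; every figure in it is hypothetical and chosen only to make the arithmetic easy to follow, and the function names are assumptions.

```python
def monthly_cost_asp(uses, fee_per_use):
    """Per-use rental: cost grows with usage."""
    return uses * fee_per_use

def break_even_uses(in_house_monthly_cost, fee_per_use):
    """Usage level above which running the application in-house is cheaper."""
    return in_house_monthly_cost / fee_per_use

# Hypothetical figures, chosen only to illustrate the arithmetic.
IN_HOUSE = 5000.0   # fixed monthly cost of owning and operating the application
FEE = 2.0           # ASP charge per use

threshold = break_even_uses(IN_HOUSE, FEE)
light = monthly_cost_asp(1000, FEE)
heavy = monthly_cost_asp(4000, FEE)
print(threshold, light < IN_HOUSE, heavy > IN_HOUSE)
```

With these made-up numbers the break-even point is 2,500 uses per month: a customer with 1,000 uses pays less by renting, while one with 4,000 uses would pay more than the in-house cost.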
36Mid-Tier Heavy Users of ASP
- The main beneficiary of ASPs is the mid-tier
market. For small businesses, outsourcing
business application software and computing
services to an ASP is cost-prohibitive. The
business applications in this tier of the market
are few and simple. The small business can make
do with a small LAN and a PC-based desktop and
server environment, using the Web for WAN
communication and perhaps outsourcing home page
and e-mail operations to an ISP.
37Large Enterprises Outsource
- The majority of large enterprises already have
outsourced or are contemplating outsourcing
internal IT operations to a third party and their
Web applications to a content-host-based ISP.
Economics drives this market, but application
control, security and legacy data force the
high-end tier to adopt a dedicated operations
environment rather than a shared multicompany
environment.
38ASP Infrastructure
- The midtier market is the home territory of the
IBM AS/400 and S/390, as well as Sun,
Hewlett-Packard and Compaq/DEC machines. These
will be the computing platforms of the ASP and
the LAN-based servers will be downgraded to
office application devices. Access to the ASP
will be via the Web for extranet applications and
a virtual private network (VPN) for intranet
applications.
39Managing ASPs
- Managing and monitoring the ASP's performance
will replace operations and applications
development. The large enterprise will flourish,
with demands for increased bandwidth, reduced
response time and new Web-based
transaction/database applications driving the
upgrade of the desktop, LAN/WAN and servers.
42Infrastructure Is tres Important
43 End of Data Warehouse