Title: Components and Architecture
1Components and Architecture
2Architecture
- What are the key components of a data warehouse?
- Architecture is the structure that binds the
components into an integrated whole - DW architecture provides the overall framework
for developing and deploying DW solutions
3Architectural Components
4Distinguishing Characteristics
- Different objectives and scope
- Data content
- Complex analysis and quick response
- Flexible and dynamic
- Metadata driven
5Architecture Supporting Flow of Data
6Technical Architecture
- The technical architecture of a DW is the
complete set of functions and services provided
within its components - Functions
- Services
- Rules and procedures
- Data stores
- Tools are the means to implement an architecture
- Architecture comes first, then the tools select
the appropriate tools based on the architecture
7Data Acquisition (1)
- This component includes
- Extraction
- Transfer into staging area
- Preparation for loading (transformation,
cleansing, and integration)
8Data Acquisition (2)
9Data Acquisition Functions and Services (1)
- Data extraction
- Select data sources and determine the types of
filters to apply to individual sources - Generate automatic extract files from operational
systems using replication and other techniques - Create intermediary files to store selected data
to be merged later - Transport extracted files from multiple platforms
- Provide automated job control services for
creating extract files - Reformat input from outside sources, departmental
files, databases, and spreadsheets - Resolve inconsistencies for common data elements
from multiple sources - Generate common application code for data
extraction
10Data Acquisition Functions and Services (2)
- Data transformation
- Map input data to data for DW repository
- Clean data, remove duplicates, merge/purge
- De-normalize extracted data structures as
required by the dimensional model of the DW - Convert data types
- Calculate and derive attribute values
- Check for referential integrity
- Aggregate data as needed
- Resolve missing values
- Consolidate and integrate data
11Data Acquisition Functions and Services (3)
- Data staging
- Provide backup and recovery for staging area
repository - Sort and merge files
- Create files as input to make changes to
dimension tables - If staging area storage is a relational database,
create and populate database
12Data Storage
- This architectural component covers the process
of loading the prepared data from the data
staging area into the data warehouse repository
13Data Storage Functions and Services
- Load data for full refreshes of DW tables
- Perform incremental loads at regular prescribed
intervals - Support loading into multiple tables at the
detailed and summarized levels - Optimize the loading process
- Provide automated job control services for
loading the data warehouse - Provide backup and recovery for the DW database
- Provide security
- Monitor and fine-tune the database
- Periodically archive data from the database
according to preset conditions
14Information Delivery (1)
- This architectural component spans a broad
spectrum of many different methods of making
information available to the users of the DW - To the users, information delivery is the DW it
is the front-end through which the users retrieve
information from the DW - Information
- Online queries and interactive analyses
- Regular and ad-hoc reports
- Specialized applications (e.g. executive
information system) - Data mining
15Information Delivery (2)
16Information Delivery Functions and Services
- Provide security to control information access
- Monitor user access to improve service and for
future enhancements - Allow users to browse data warehouse content
- Simplify access by hiding internal complexities
of data storage from users - Automatically reformat queries for optimal
execution - Enable queries to be aware of aggregate tables
for faster results - Govern queries and control runaway queries
- Provide self-service report generation for users
- Store result sets for queries and reports for
future use - Provide multiple levels of data granularity
- Provide event triggers to monitor data loading
- Make provision for the users to perform complex
analysis - Enable data feeds to downstream, specialized data
support systems such as EIS and data mining
17Infrastructure Supporting Architecture
- The architecture defines the functions and
services the infrastructure defines the elements
to support the architecture - Infrastructure is the foundation supporting the
architecture - Hardware servers
- OSs
- Data management systems
- Networking elements
- Supporting tools and applications
- People
- Procedures
18Operational Infrastructure
- Operational infrastructure includes
- People
- Procedures
- Training
- Management software
- Operational infrastructure are the people and
procedures that keep the DW functioning, and not
those who develop the DW
19Physical Infrastructure (1)
20Physical Infrastructure (2)
- Physical infrastructure includes
- Computing hardware (e.g. server)
- OS and utilities
- Networking hardware and software
- Software tools
- Decisions about the physical infrastructure are
critical for a DW. Two principles - Leverage as much of the existing physical
infrastructure - Keep the infrastructure as modular as possible
21Hardware and Operating System
- Hardware
- Scalability
- Support
- Vendor reference
- Vendor stability
- Operating system
- Compatibility
- Scalability
- Security
- Reliability
- Availability
- Preemptive multitasking
- Multi-threaded approach
- Memory protection
22Single Platform Option
- Simplest option, where all functions and services
are performed by a single computing platform - Typically used by small to medium sized companies
who have mainframes or large Unix servers already
in use with capacity to spare - Some shortcomings of using mainframes
- Stretched to capacity
- Non availability of tools
- Multiple legacy platforms
- Companys migration policy
23Hybrid Option
- Most companies opt for the hybrid option where
multiple platforms are used for data warehousing
(data acquisition, data storage, information
delivery)
24Data Extraction
- Data extraction
- Best performed on each source systems own
computing platform - Initial reformatting and merging
- Best performed on each source systems own
computing platform - Extract files are reformatted and merged into a
smaller number of files performing verification
against the source system - Initial data cleansing
- Also performed on source system platform
- Transformation and consolidation
- Performed on the staging area platform
- Validation and final quality check
- Performed on the staging area platform
- Creation of load images
- Performed on the staging area platform
25Options for the Data Staging Area
- In one of the legacy platforms
- On the data storage platform
- On a separate optional platform
- You can optimize the platform for complex
transformations and cleaning - Install specialized tools for transformations
and cleaning - Keep track of entire data content in the staging
area
26Data Movement
27Client/Server Architecture (1)
28Client/Server Architecture (2)
- Application server (middle tier)
- To run middleware and establish connectivity
- To execute management and control software
- To handle data access from the Web
- To manage metadata
- For authentication
- As front end
- For managing and running standard reports
- For sophisticated query management
- For OLAP applications
29Maturing of the Infrastructure