Introduction to Data Warehousing - PowerPoint PPT Presentation

1 / 38
About This Presentation
Title:

Introduction to Data Warehousing

Description:

Owens&Minor -- data warehousing has supported integration along the supply chain. ... the nation's leading distributor of name-brand medical and surgical supplies ... – PowerPoint PPT presentation

Number of Views:54
Avg rating:3.0/5.0
Slides: 39
Provided by: ter7
Category:

less

Transcript and Presenter's Notes

Title: Introduction to Data Warehousing


1
Introduction to Data Warehousing
  • Hugh J. Watson
  • Terry College of Business
  • University of Georgia

2
The Importance of Data Warehousing
  • Provide a single version of the truth
  • Improve decision making
  • Support key corporate initiatives such as
    performance management, B2C and B2B e-commerce,
    and customer relationship management
  • Estimated to be a 113.5 billion market in 2002
    for systems, software, services, and in-house
    expenditures (Palo Alto Management Group)

3
(No Transcript)
4
Data Warehouse Characteristics
  • Subject oriented -- data are organized around
    sales, products, etc.
  • Integrated -- data are integrated to provide a
    comprehensive view
  • Time variant -- historical data are maintained
  • Nonvolatile -- data are not updated by users

5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
Two Data Warehousing Strategies
  • Enterprise-wide warehouse, top down, the Inmon
    methodology
  • Data mart, bottom up, the Kimball methodology
  • When properly executed, both result in an
    enterprise-wide data warehouse, but with
    different architectures

10
The Data Mart Strategy
  • The most common approach
  • Begins with a single mart and architected marts
    are added over time for more subject areas
  • Relatively inexpensive and easy to implement
  • Can be used as a proof of concept for data
    warehousing
  • Can perpetuate the silos of information problem
  • Can postpone difficult decisions and activities
  • Requires an overall integration plan

11
The Enterprise-wide Strategy
  • A comprehensive warehouse is built initially
  • An initial dependent data mart is built using a
    subset of the data in the warehouse
  • Additional data marts are built using subsets of
    the data in the warehouse
  • Like all complex projects, it is expensive, time
    consuming, and prone to failure
  • When successful, it results in an integrated,
    scalable warehouse

12
Data Sources and Types
  • Primarily from legacy, operational systems
  • Almost exclusively numerical data at the present
    time
  • External data may be included, often purchased
    from third-party sources
  • Technology exists for storing unstructured data
    and expect this to become more important over time

13
Extraction, Transformation, and Loading (ETL)
Processes
  • The plumbing work of data warehousing
  • Data are moved from source to target data bases
  • A very costly, time consuming part of data
    warehousing

14
Data Extraction
  • Often performed by COBOL routines (not
    recommended because of high program maintenance
    and no automatically generated meta data)
  • Sometimes source data is copied to the target
    database using the replication capabilities of
    standard RDMS (not recommended because of dirty
    data in the source systems)
  • Increasing performed by specialized ETL software

15
Sample ETL Tools
  • DataStage from Ascential Software
  • SAS System from SAS Institute
  • Power Mart/Power Center from Informatica
  • Sagent Solution from Sagent Software
  • Hummingbird Genio Suite from Hummingbird
    Communications

16
Reasons for Dirty Data
  • Dummy Values
  • Absence of Data
  • Multipurpose Fields
  • Cryptic Data
  • Contradicting Data
  • Inappropriate Use of Address Lines
  • Violation of Business Rules
  • Reused Primary Keys,
  • Non-Unique Identifiers
  • Data Integration Problems

17
Data Cleansing
  • Source systems contain dirty data that must be
    cleansed
  • ETL software contains rudimentary data cleansing
    capabilities
  • Specialized data cleansing software is often
    used. Important for performing name and address
    correction and householding functions
  • Leading data cleansing vendors include Vality
    (Integrity), Harte-Hanks (Trillium), and
    Firstlogic (i.d.Centric)

18
Data Staging
  • Often used as an interim step between data
    extraction and later steps
  • Accumulates data from asynchronous sources using
    native interfaces, flat files, FTP sessions, or
    other processes
  • At a predefined cutoff time, data in the staging
    file is transformed and loaded to the warehouse
  • There is usually no end user access to the
    staging file
  • An operational data store may be used for data
    staging

19
Data Transformation
  • Transforms the data in accordance with the
    business rules and standards that have been
    established
  • Example include format changes, deduplication,
    splitting up fields, replacement of codes,
    derived values, and aggregates

20
Data Loading
  • Data are physically moved to the data warehouse
  • The loading takes place within a load window
  • The trend is to near real time updates of the
    data warehouse as the warehouse is increasingly
    used for operational applications

21
Meta Data
  • Data about data
  • Needed by both information technology personnel
    and users
  • IT personnel need to know data sources and
    targets database, table and column names
    refresh schedules data usage measures etc.
  • Users need to know entity/attribute definitions
    reports/query tools available report
    distribution information help desk contact
    information, etc.

22
Database Vendors
  • High end (i.e., terabyte plus) vendors include
    IBM (DB2) and NCR-Teradata (Teradata)
  • Oracle (8i) and Microsoft (SQL Server 7) are
    major players for smaller databases

23
On-line Analytical Processing (OLAP)
  • A set of functionality that facilitates
    multidimensional analysis
  • Allows users to analyze data in ways that are
    natural to them
  • Comes in many varieties -- ROLAP, MOLAP, DOLAP,
    etc.

24
ROLAP
  • Relational OLAP
  • Uses a RDBMS to implement and OLAP environment
  • Typically involves a star schema to provide the
    multidimensional capabilities
  • OLAP tool manipulates RDBMS star schema data
  • Called slowlap by MOLAP vendors

25
MOLAP
  • Multidimensional OLAP
  • Uses a MDDBS (e.g., Essbase) to store and access
    data
  • Usually requires proprietary (non SQL) data
    access tools
  • Provides exceptionally fast response times

26
Star Schema
  • Creates non-normalized data structures
  • Easier for users to understand
  • Optimized for OLAP
  • Uses fact (facts or measures in the business) and
    dimension (establishes the context of the facts)
    tables

27
OLAP Tools
  • Products come from vendors such as Brio, Cognos,
    Hyperion, and BusinessObjects
  • Typically available as a fat or thin (i.e.,
    browser) client
  • In a web environment, the browser communicates
    with a web server, which talks to an application
    server, which connects to backend databases
  • The application server provides query, reporting,
    and OLAP analysis functionality over the web
  • Java applets or downloaded components augment the
    thin client
  • A broadcast server may be used to schedule, run,
    publish, and broadcast reports, alerts, and
    responses over the LAN, email, or personal
    digital assistant.

28
(No Transcript)
29
Dimension Table Examples
  • Retail -- store name, zip code, product name,
    product category, day of week
  • Telecommunications -- call origin, call
    destination
  • Banking -- customer name, account number, branch,
    account officer
  • Insurance -- policy type, insured party

30
Fact Table Examples
  • Retail -- number of units sold, sales amount
  • Telecommunications -- length of call in minutes,
    average number of calls
  • Banking -- average monthly balance
  • Insurance -- claims amount

31
Warehouse Users
  • Analysts
  • Managers
  • Executives
  • Operational personnel
  • Customers and suppliers

32
Warehouse Tools and Applications
  • SQL queries
  • Managed query environments
  • Structured and ad hoc reports
  • DSS/EIS
  • Portals
  • Data mining
  • Packaged applications
  • Custom-built applications

33
Owens Minor
  • OwensMinor -- data warehousing has supported
    integration along the supply chain. Winner of
    the 1999 TDWI Leadership Award
  • the nation's leading distributor of name-brand
    medical and surgical supplies
  • has transformed its business model by integrating
    supply chain management, e-business, data
    warehousing, and Internet technologies
  • as part of this initiative, WISDOM
    (WebIntelligence Supporting Decisions from Owens
    Minor) has been especially valuable

34
(No Transcript)
35
WISDOM
  • a Web-based decision support system that provides
    information to OMs employees, suppliers and
    customers
  • accesses data from a data warehouse that
    maintains supplier and customer transaction data
  • sold to trading partners as a value added product
  • WISDOM II provides data about the transactions
    that suppliers and customers have with all of
    their trading partners

36
Sample Applications
  • Supports reporting and queries for internal
    personnel
  • Supports an EIS for senior management
  • Suppliers can determine their market share in
    specific hospitals
  • Hospitals can identify which products are being
    bought off contract
  • WISDOM II extends data warehousing to trading
    partners through an outsourcing arrangement

37
Questions
38
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com