TEACHING THE DATA WAREHOUSE COURSE - PowerPoint PPT Presentation

1 / 42
About This Presentation
Title:

TEACHING THE DATA WAREHOUSE COURSE

Description:

Data warehouses are not cheap. Median cost to create (does not ... People who buy scuba gear take Australian vacations. Fraud detection, consumer loan analysis ... – PowerPoint PPT presentation

Number of Views:94
Avg rating:3.0/5.0
Slides: 43
Provided by: paul97
Category:

less

Transcript and Presenter's Notes

Title: TEACHING THE DATA WAREHOUSE COURSE


1
TEACHING THE DATA WAREHOUSE COURSE
  • Paul Gray
  • ISECON 2001

2
Outline
  • Overview of what is Data Warehousing
  • The 5 and 10 week courses
  • The Indiana University of Pennsylvania course

3
ORIGINS
  • Data warehouses are the results of two software
    solutions needing and finding one another
  • Data base firms developed data warehouses and
    were looking for applications
  • EIS and DSS software developers and vendors
    needed to deal with ever-increasing data bases
  • About 10 years ago, the two groups started
    interacting with the results described here.

4
ORIGINS
  • Database developers long understood that their
    software was required for both transactional and
    analytic processing
  • However, their principal developments were
    directed to ever-larger transactional data bases.
    This process occurred even through operational
    and analytic data are separate with different
    requirements and different user communities.

5
ORIGINS
  • Once these differences were understood, new data
    bases were created specifically for analysis use.
  • Today, data warehouses have 3 major applications
  • On-line analytic processing for business
    intelligence
  • Data Mining
  • Customer Relationship Management

6
WHAT IS DATA WAREHOUSING
  • A data warehouse is typically a dedicated data
    base system for decision making that is separate
    from the production data base(s) used
    operationally. It differs from production system
    in that
  • it covers a much longer time horizon than
    transaction systems
  • it includes multiple data bases that have been
    processed so that the warehouses data are
    defined uniformly (i.e., clean data)
  • it is optimized for answering complex queries
    from managers and analysts.

7
WHAT IS DATA WAREHOUSING?
  • In the last 5 years, data warehousing has become
    a major industry within computing which has
    brought together the ideas of data bases and
    decision support. It has also been the foundation
    for efforts in data mining and in CRM
  • Data mining refers to finding answers about an
    organization from the information in the data
    warehouse that the executive or the analyst had
    not thought to ask. Data mining is made possible
    by the very presence of large databases in the
    data warehouse. It provides techniques that allow
    managers to obtain managerial information from
    their legacy systems. Its objective is to
    identify valid, novel, potentially useful, and
    understandable patterns in data.

8
WHAT IS DATA WAREHOUSING?
  • The objective of a data warehouse is to create a
    single truth
  • Data warehousing is a major new application area.
    It rates extremely high salaries (up to 100,000
    for specialists, 300,000 for consultants).

9
DEFINITION
  • A data warehouse is a
  • Subject oriented
  • Integrated
  • Time-variant
  • Non-volatile
  • Collection of data in support of management
    decision processes

10
NOTE
  • Data warehouse is physically separated from
    operational systems and operational data bases
  • Data warehouses hold both aggregated and detailed
    data for management separate from the databases
    used for On-Line Transaction Processing (OLTP)

11
CHARACTERISTICS
12
CHARACTERISTICS
13
SUBJECT ORIENTATION
  • Data is organized around major subjects of the
    enterprise
  • Example
  • OPERATIONAL DATA WAREHOUSE
  • Loans Customer
  • Savings Vendor
  • Bank card Product
  • Trust Activity
  • An application A subject
  • orientation orientation

14
USING THE WAREHOUSE
  • The higher the level of summarization, the more
    the data is used
  • The more summarized the data, the quicker it is
    to retrieve
  • However, the higher the level of summarization,
    the lower the level of detail

15
DATA MARTS
  • Data Mart A scaled-down version of the data
    warehouse
  • A data mart is a small warehouse designed for the
    SBU or department level.
  • It is often a way to gain entry and provide an
    opportunity to learn
  • Major problem if they differ from department to
    department, they can be difficult to integrate
    enterprise-wide

16
COST
  • Data warehouses are not cheap
  • Median cost to create (does not include operating
    cost) 2.2M
  • Multimillion dollar costs are common
  • Their design and implementation is still an art
    and they require considerable time to create

17
SIZE
  • Being designed for the enterprise so that
    everyone has a common data set, they are large
    and increase in size with time.
  • Typical storage sizes run from 50 Gigabytes to
    several Terabytes

18
SIZE OF INDUSTRY
  • Data warehouses are a major industry within
    information systems. 6B/year
  • Estimates vary but it is clear that many more
    than 90 of Fortune 1000 have data warehouse
    projects
  • Major players include
  • Oracle IBMInformix Sybase NCR
  • BI companies (Brio, Cognos, Pilot,.)

19
MARKET COMPONENTS
  • High end business intelligence (OLAP)
  • Low end query tools
  • Data cleansing
  • Data marts
  • Data mining
  • Customer relationship management (CRM)

20
APPLICATION OLAP/BI
  • OLAP On Line analytic processing
  • Basic idea of OLAP managers should be able to
    manipulate enterprise data models across many
    dimensions to understand changes that are
    occurring
  • Vendors claim they are OLAP compliant even if
    they are not

21
APPLICATION - DATA MINING
  • Also known as Knowledge Data Discovery (KDD)
  • Mining terminology refers to finding answers
    about a business from the data warehouse that the
    executive or analyst had not thought to ask

22
APPLICATION - DATA MINING
  • KDD applies techniques mostly from artificial
    intelligence and statistics to discover new
    information.
  • It is designed to find information that queries
    and reports dont reveal effectively
  • KDD uses AI and statistics to find pattern in
    data and to infer rules.

23
APPLICATION - DATA MINING
  • Some successes
  • People who buy scuba gear take Australian
    vacations
  • Fraud detection, consumer loan analysis
  • Optimizing production lines
  • IBMs SCOUT
  • (apocryphal?) Men who buy diapers buy beer

24
APPLICATION - DATA MINING
25
CUSTOMER RELATIONSHIP MANAGEMENT (CRM)
  • Successor to data base marketing
  • Implies marketing to customers on a 11 basis
  • Requires data granularity at the level of the
    individual customer large amounts of data
  • Data warehouse is only a part of the CRM concept.

26
History of Course I
  • First given as PhD Seminar on Management of
    Information Systems 1996
  • Overview lecture by instructor
  • Students present topics each week typically 2
    or 3
  • Great source of material!

27
History of Course II
  • Attend Data Warehouse Institute conference in San
    Diego. Agree to write book with Watson. January
    1997
  • Complete book in late 1997. Desk-top published.
    Used sabbatical at UCI to do the writing.
  • Move course to regular MS course in Spring 1997.

28
Two Versions
  • 14 week for Claremont Graduate University
  • 5 Week for UC Irvine
  • Taught 1/year for last 5 years
  • WHY DO STUDENTS TAKE COURSE?
  • High salaries (100K specialist, 300K for
    consultant)

29
Both Versions
  • Text Decision Support in the Data Warehouse by
    P. Gray and H.J. Watson
  • Prentice Hall 1998
  • PLUS readings
  • PLUS hands-on exercises
  • PLUS term paper

30
5 WEEK VERSION
  • Wk Part 1 Part 2
  • Overview of DW OLAP, applications
  • Demo Startracker Strategic use,
  • software Framework
  • Data sources, Planning and
    cleansing,metadata operating the DW
  • Data marts,ODS Industry, economics
  • Mining, BI CRM, Developments

31
(No Transcript)
32
Using SoftwareStar Tracker
  • Simple DW
  • The Data Warehouse Toolkit Practical Techniques
    for Building Dimensional Data Warehouses by Ralph
    Kimball (Wiley 1996)
  • Commercial version Synchrony from If no
    longer available

33
Using Software Commercial
  • Red Brick (now part of Informix which is now
    owned by IBM)
  • Gift from Red Brick
  • Mostly used with student projects

34
Data Sources
  • Journal of Data Warehousing
  • DM Review
  • On-line (dmreview.com)
  • Hard copy
  • Lots and lots on Internet e.g.,
  • www.Dw-institute.com
  • Pwp.starnetinc.com/larryg
  • www.datawarehouse.org

35
Assignments
  • Find 10 articles on Internet
  • teaches search techniques, nomenclature, rapid
    course immersion
  • Exercise with software
  • The DW industry

36
Term Projects for DW
  • Next chart shows 22 of the 30 suggested topics.
    Last topic allows student to choose topic subject
    to instructor approval.

37
DW Architecture Data sources, loading, cleaning,
summarizing, granularity Metadata in a DW Data
Modeling and the DW Multiple Dimensions in DWs
via ROLAP and MOLAP Indexing in DWs DW Interfaces
conventional, window, browser Design principles
for DWs Strategic use of the DW Justification of
the DW including C/B analysis DW Security Data
Marts Operational Data Stores Maintaining the
DW Economics of DWs Querying and reporting in a
DW OLAP and the DW Decision Support Systems and
the DW Database marketing and the DW Data Mining
and the DW Personnel considerations for building
the DW and for maintaining the DW Organizing the
DW Project (Project management etc.)
38
Elizabeth Pierces Course at Indiana University
of Penn.
  • Paper published in Communications of AIS (CAIS)
    Sept. 1999 (Vol. 2 Article 16)
  • Developing and Delivering a Data Warehousing and
    Mining Course
  • Introduces students to the strategies,
    technologies, and techniques

39
Elizabeth Pierces Course at Indiana University
of Penn.
  • Students learn what is involved in planning,
    designing, building, using, and managing a data
    warehouse. Students also learn about how a data
    warehouse must fit into an over-all corporate
    data architecture that may include legacy
    systems, operational data stores, enterprise data
    warehouses, and data marts. In addition, students
    are exposed to the different data mining
    techniques used by organizations to derive
    information from the data warehouse for strategic
    and long-term business decision making.
    (abstract)

40
U. Of Indiana Course
  • 14 weeks 7 weeks on DW
  • Challenges
  • Evolving nature of the subject
  • Hands on experience for students
  • Textbook selectionmostly tradebooks

41
Useful Tables in Pierce
  • List of Internet sources
  • Available software demos
  • Some on-line data warehouses with public access
  • Schools that offered data mining courses in 1999

42
Conclusions
  • Students come because it is a job source
  • A good 2nd course for data base (can be taken w/o
    database course)
  • Allows covering the new topics of the 90s and
    00sOLAP,CRM,Mining, in more depth than Intro to
    IS course.
  • Its a fun topic and you can even do research on
    it.
Write a Comment
User Comments (0)
About PowerShow.com