Title: TEACHING THE DATA WAREHOUSE COURSE
1TEACHING THE DATA WAREHOUSE COURSE
2Outline
- Overview of what is Data Warehousing
- The 5 and 10 week courses
- The Indiana University of Pennsylvania course
3ORIGINS
- Data warehouses are the results of two software
solutions needing and finding one another - Data base firms developed data warehouses and
were looking for applications - EIS and DSS software developers and vendors
needed to deal with ever-increasing data bases - About 10 years ago, the two groups started
interacting with the results described here.
4ORIGINS
- Database developers long understood that their
software was required for both transactional and
analytic processing - However, their principal developments were
directed to ever-larger transactional data bases.
This process occurred even through operational
and analytic data are separate with different
requirements and different user communities.
5ORIGINS
- Once these differences were understood, new data
bases were created specifically for analysis use. - Today, data warehouses have 3 major applications
- On-line analytic processing for business
intelligence - Data Mining
- Customer Relationship Management
6WHAT IS DATA WAREHOUSING
- A data warehouse is typically a dedicated data
base system for decision making that is separate
from the production data base(s) used
operationally. It differs from production system
in that - it covers a much longer time horizon than
transaction systems - it includes multiple data bases that have been
processed so that the warehouses data are
defined uniformly (i.e., clean data) - it is optimized for answering complex queries
from managers and analysts.
7WHAT IS DATA WAREHOUSING?
- In the last 5 years, data warehousing has become
a major industry within computing which has
brought together the ideas of data bases and
decision support. It has also been the foundation
for efforts in data mining and in CRM - Data mining refers to finding answers about an
organization from the information in the data
warehouse that the executive or the analyst had
not thought to ask. Data mining is made possible
by the very presence of large databases in the
data warehouse. It provides techniques that allow
managers to obtain managerial information from
their legacy systems. Its objective is to
identify valid, novel, potentially useful, and
understandable patterns in data.
8WHAT IS DATA WAREHOUSING?
- The objective of a data warehouse is to create a
single truth - Data warehousing is a major new application area.
It rates extremely high salaries (up to 100,000
for specialists, 300,000 for consultants).
9DEFINITION
- A data warehouse is a
- Subject oriented
- Integrated
- Time-variant
- Non-volatile
- Collection of data in support of management
decision processes
10NOTE
- Data warehouse is physically separated from
operational systems and operational data bases - Data warehouses hold both aggregated and detailed
data for management separate from the databases
used for On-Line Transaction Processing (OLTP)
11CHARACTERISTICS
12CHARACTERISTICS
13SUBJECT ORIENTATION
- Data is organized around major subjects of the
enterprise - Example
- OPERATIONAL DATA WAREHOUSE
- Loans Customer
- Savings Vendor
- Bank card Product
- Trust Activity
- An application A subject
- orientation orientation
14USING THE WAREHOUSE
- The higher the level of summarization, the more
the data is used - The more summarized the data, the quicker it is
to retrieve - However, the higher the level of summarization,
the lower the level of detail
15DATA MARTS
- Data Mart A scaled-down version of the data
warehouse - A data mart is a small warehouse designed for the
SBU or department level. - It is often a way to gain entry and provide an
opportunity to learn - Major problem if they differ from department to
department, they can be difficult to integrate
enterprise-wide
16COST
- Data warehouses are not cheap
- Median cost to create (does not include operating
cost) 2.2M - Multimillion dollar costs are common
- Their design and implementation is still an art
and they require considerable time to create
17SIZE
- Being designed for the enterprise so that
everyone has a common data set, they are large
and increase in size with time. - Typical storage sizes run from 50 Gigabytes to
several Terabytes
18SIZE OF INDUSTRY
- Data warehouses are a major industry within
information systems. 6B/year - Estimates vary but it is clear that many more
than 90 of Fortune 1000 have data warehouse
projects - Major players include
- Oracle IBMInformix Sybase NCR
- BI companies (Brio, Cognos, Pilot,.)
19MARKET COMPONENTS
- High end business intelligence (OLAP)
- Low end query tools
- Data cleansing
- Data marts
- Data mining
- Customer relationship management (CRM)
20APPLICATION OLAP/BI
- OLAP On Line analytic processing
- Basic idea of OLAP managers should be able to
manipulate enterprise data models across many
dimensions to understand changes that are
occurring - Vendors claim they are OLAP compliant even if
they are not
21APPLICATION - DATA MINING
- Also known as Knowledge Data Discovery (KDD)
- Mining terminology refers to finding answers
about a business from the data warehouse that the
executive or analyst had not thought to ask
22APPLICATION - DATA MINING
- KDD applies techniques mostly from artificial
intelligence and statistics to discover new
information. - It is designed to find information that queries
and reports dont reveal effectively - KDD uses AI and statistics to find pattern in
data and to infer rules.
23APPLICATION - DATA MINING
- Some successes
- People who buy scuba gear take Australian
vacations - Fraud detection, consumer loan analysis
- Optimizing production lines
- IBMs SCOUT
- (apocryphal?) Men who buy diapers buy beer
24APPLICATION - DATA MINING
25CUSTOMER RELATIONSHIP MANAGEMENT (CRM)
- Successor to data base marketing
- Implies marketing to customers on a 11 basis
- Requires data granularity at the level of the
individual customer large amounts of data - Data warehouse is only a part of the CRM concept.
26History of Course I
- First given as PhD Seminar on Management of
Information Systems 1996 - Overview lecture by instructor
- Students present topics each week typically 2
or 3 - Great source of material!
27History of Course II
- Attend Data Warehouse Institute conference in San
Diego. Agree to write book with Watson. January
1997 - Complete book in late 1997. Desk-top published.
Used sabbatical at UCI to do the writing. - Move course to regular MS course in Spring 1997.
-
28Two Versions
- 14 week for Claremont Graduate University
- 5 Week for UC Irvine
- Taught 1/year for last 5 years
- WHY DO STUDENTS TAKE COURSE?
- High salaries (100K specialist, 300K for
consultant)
29Both Versions
- Text Decision Support in the Data Warehouse by
P. Gray and H.J. Watson - Prentice Hall 1998
- PLUS readings
- PLUS hands-on exercises
- PLUS term paper
305 WEEK VERSION
- Wk Part 1 Part 2
- Overview of DW OLAP, applications
- Demo Startracker Strategic use,
- software Framework
- Data sources, Planning and
cleansing,metadata operating the DW - Data marts,ODS Industry, economics
- Mining, BI CRM, Developments
31(No Transcript)
32Using SoftwareStar Tracker
- Simple DW
- The Data Warehouse Toolkit Practical Techniques
for Building Dimensional Data Warehouses by Ralph
Kimball (Wiley 1996) - Commercial version Synchrony from If no
longer available
33Using Software Commercial
- Red Brick (now part of Informix which is now
owned by IBM) - Gift from Red Brick
- Mostly used with student projects
34Data Sources
- Journal of Data Warehousing
- DM Review
- On-line (dmreview.com)
- Hard copy
- Lots and lots on Internet e.g.,
- www.Dw-institute.com
- Pwp.starnetinc.com/larryg
- www.datawarehouse.org
35Assignments
- Find 10 articles on Internet
- teaches search techniques, nomenclature, rapid
course immersion - Exercise with software
- The DW industry
36Term Projects for DW
- Next chart shows 22 of the 30 suggested topics.
Last topic allows student to choose topic subject
to instructor approval.
37DW Architecture Data sources, loading, cleaning,
summarizing, granularity Metadata in a DW Data
Modeling and the DW Multiple Dimensions in DWs
via ROLAP and MOLAP Indexing in DWs DW Interfaces
conventional, window, browser Design principles
for DWs Strategic use of the DW Justification of
the DW including C/B analysis DW Security Data
Marts Operational Data Stores Maintaining the
DW Economics of DWs Querying and reporting in a
DW OLAP and the DW Decision Support Systems and
the DW Database marketing and the DW Data Mining
and the DW Personnel considerations for building
the DW and for maintaining the DW Organizing the
DW Project (Project management etc.)
38Elizabeth Pierces Course at Indiana University
of Penn.
- Paper published in Communications of AIS (CAIS)
Sept. 1999 (Vol. 2 Article 16) - Developing and Delivering a Data Warehousing and
Mining Course - Introduces students to the strategies,
technologies, and techniques
39Elizabeth Pierces Course at Indiana University
of Penn.
- Students learn what is involved in planning,
designing, building, using, and managing a data
warehouse. Students also learn about how a data
warehouse must fit into an over-all corporate
data architecture that may include legacy
systems, operational data stores, enterprise data
warehouses, and data marts. In addition, students
are exposed to the different data mining
techniques used by organizations to derive
information from the data warehouse for strategic
and long-term business decision making.
(abstract)
40U. Of Indiana Course
- 14 weeks 7 weeks on DW
- Challenges
- Evolving nature of the subject
- Hands on experience for students
- Textbook selectionmostly tradebooks
41Useful Tables in Pierce
- List of Internet sources
- Available software demos
- Some on-line data warehouses with public access
- Schools that offered data mining courses in 1999
42Conclusions
- Students come because it is a job source
- A good 2nd course for data base (can be taken w/o
database course) - Allows covering the new topics of the 90s and
00sOLAP,CRM,Mining, in more depth than Intro to
IS course. - Its a fun topic and you can even do research on
it.