Title: Data Wharehousing OLAP Data Mining
1Data Wharehousing OLAP Data Mining
- S. Costantini
- Università degli Studi di LAquila
- stefcost_at_di.univaq.it
2Ringraziamenti (Acknowledgment)
- Part of this material is taken from Database
Systems The Complete Book, by Hector
Garcia-Molina, Jeff Ullman, and Jennifer Widom,
edited by Prentice-Hall. - URL http//www-db.stanford.edu/ullman/dscb.html
3Cosè in sostanza un Data Wharehouse?
- E una vista materializzata
- Aggiornata a intervalli stabiliti (a seconda
dellapplicazione) - E un cosiddetto sistema di integrazione di
dati perché può contenere dati provenienti da
vari database (detti sorgenti)
4Perché i Data Warehouse?
- Perché le query di analisi statistica ed esame
dei dati per estrarne varie informazioni (dette
query OLAP, vedi seguito) sono pesanti e
diminuiscono troppo la performance del sistema.
Però non necessitano della versione più
aggiornata dei dati.
5Perché i Data Wharehouse
- Allora conviene separare le query usuali dalle
query OLAP, creando per queste ultime un Data
Wharehouse - Per le query OLAP il modello relazionale non è
ottimale, quindi nel creare un Data Wharehouse il
modello dei dati viene modificato.
6Observation
- Traditional database systems are tuned to many,
small, simple queries. - Some new applications use fewer, more
time-consuming, complex queries. - New architectures have been developed to handle
complex analytic queries efficiently.
7The Data Warehouse
- The most common form of data integration.
- Copy sources into a single DB (warehouse) and try
to keep it up-to-date. - Usual method periodic reconstruction of the
warehouse, perhaps overnight. - Frequently essential for analytic queries.
8OLTP
- Most database operations involve On-Line
Transaction Processing (OTLP). - Short, simple, frequent queries and/or
modifications, each involving a small number of
tuples. - Examples Answering queries from a Web interface,
sales at cash registers, selling airline tickets.
9OLAP
- Of increasing importance are On-Line Application
Processing (OLAP) queries. - Few, but complex queries --- may run for hours.
- Queries do not depend on having an absolutely
up-to-date database.
10OLAP Examples
- Amazon analyzes purchases by its customers to
come up with an individual screen with products
of likely interest to the customer. - Analysts at Wal-Mart look for items with
increasing sales in some region.
11Data Warehouses
- Doing OLTP and OLAP in the same database system
is often impractical - Different performance requirements
- Analysis queries require data from many sources
- Solution Build a data warehouse
- Copy data from various OLTP systems
- Optimize data organization, system tuning for
OLAP - Transactions arent slowed by big analysis
queries - Periodically refresh the data in the warehouse
12Common Architecture
- Relational Databases handle OLTP.
- Local databases copied to a central warehouse
overnight. - Analysts use the warehouse for OLAP.
13Definition of data warehousing
A data warehouse is a subject-oriented,
integrated, time-variant and non-volatile
collection of data in support of managements
decision making process.
14Loading the Data Warehouse
Data is periodically extracted
Data is cleansed and transformed
Data Staging Area
Users query the data warehouse
Source Systems
(OLTP)
Data Warehouse
15Data Mining
- Data mining is a popular term for queries that
summarize big data sets in useful ways. - Examples
- Clustering all Web pages by topic.
- Finding characteristics of fraudulent credit-card
use.
16Data Warehouse
Enterprise Database
Customers
Orders
Transactions
Vendors
Etc
Etc
- Data Miners
- Farmers they know
- Explorers - unpredictable
Copied, organized summarized
Data Warehouse
Data Mining
17Market-Basket Data
- An important form of mining from relational data
involves market baskets sets of items that
are purchased together as a customer leaves a
store. - Summary of basket data is frequent itemsets
sets of items that often appear together in
baskets.
18Data Mining Flavors
- Directed Attempts to explain or categorize some
particular target field such as income or
response. - Undirected Attempts to find patterns or
similarities among groups of records without the
use of a particular target field or collection of
predefined classes.
19Data Mining Examples in Enterprises
- Government
- Track down criminals (Police also)
- Treasury Dept suspicious intl funds transfer
- Phone companies
- Supermarkets Superstores
- Mail-Order, On-Line Order
20Data Mining Examples in Enterprises
- Financial Institutions
- Insurance Companies
- Web sites
- Many others