Title: By: Mihir Mehta
1KM, IRM, IT - Data Mining
By Mihir Mehta
Date 11/11/03
2Menu
- Introduction
- Terminology
- Learn about Data Warehouse
- Learn about Data Mining
3Introduction
- Key topics of this presentation
  - Data Warehouse
  - OLTP and OLAP
  - Data Mining
- Relation between Data Mining, IRM, KM, and IT
- Main focus on Data Warehouse and Data Mining
4What is Data?
- Data is an individual fact or multiple facts, a
value, or a set of values that is not significant
to a business in and of itself. Data is the raw
material, stored in a structured manner, that
turns into information when given context.
http://www.kmtool.net/vocabulary.htm
5Data Warehouse
- A logical collection of information, gathered
from many different operational databases, used
to create business intelligence that supports
business analysis activities and decision-making
tasks.
Burlton www.datawarehouse.dci.com
6What can Data Warehouse Do?
- A data warehouse is an attempt to integrate
separate decision support systems so that users
can query one place to find the answers to their
questions
- A data warehouse holds the key corporate data of
the organization
- A data warehouse tracks historical data
Burlton www.datawarehouse.dci.com
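As a rough illustration of the integration idea above, the sketch below loads rows from two separate operational sources into one warehouse table and then answers a question with a single query. The table name, column names, and figures are invented for the example, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Two hypothetical operational sources, each with its own layout.
store_system = [("2003-10-01", "North", 120.0), ("2003-11-01", "North", 150.0)]
web_system = [{"day": "2003-11-01", "amount": 80.0}]

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_history (sale_date TEXT, channel TEXT, amount REAL)"
)

# Load step: bring both sources into one common, historical table.
warehouse.executemany(
    "INSERT INTO sales_history VALUES (?, 'store', ?)",
    [(day, amount) for day, _region, amount in store_system],
)
warehouse.executemany(
    "INSERT INTO sales_history VALUES (?, 'web', ?)",
    [(row["day"], row["amount"]) for row in web_system],
)

# One query now covers every channel and every month on record.
monthly_totals = warehouse.execute(
    "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) "
    "FROM sales_history GROUP BY month ORDER BY month"
).fetchall()
print(monthly_totals)
```

Users query one place (sales_history) instead of asking each source system separately, and older months stay available for comparison.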
7Data Warehouse Architecture
- A Data Warehouse is a repository for information
that allows a company to make business decisions
based on facts. Many business decisions are made
on historical business knowledge and intuition,
which in this day and age is not enough to stay
ahead of your competitors.
- Data extraction from sites across the world
- Business query reports to aid information-led
decision making
- Automatic population of data to spreadsheets,
Word, or e-mail
- Decision making: identify new business
opportunities
- Graphical representation of data
- Dimensional view of data to provide drill-down
and analysis of data
8Data Warehouse Knowledge Management Cycle
[Diagram labels: Explicit, Tacit, New Tacit; arrows: takes, create, made]
It doesn't change tacit knowledge into explicit;
rather, it takes explicit knowledge and helps
create new tacit knowledge. - Burlton
Burlton www.datawarehouse.dci.com
9Data Warehouse A Success Story
- The largest data warehouse is Wal-Mart's
- Wal-Mart's data warehouse
  - Identifies where a new store should be built
  based on customer demand
  - Identifies how stores are performing across the
  nation
  - Contains every scan from every purchase
- Benefits Wal-Mart gained from its data warehouse
  - Provided a competitive advantage over K-Mart
  - Reduced excess inventory in individual stores
  - Avoided wasting funds on building stores that
  would fail
www.walmart.com
10Why do we need Data Warehouse?
- Decisions can be made quickly and correctly.
- A data warehouse is also a place to store and
access historical data.
- Users can measure performance goals for their
company over a period of time.
- Company statistics are available.
- A single query can be used to access key data.
- A data warehouse by itself will respond to
queries from users.
- It will not tell users about patterns in data
that they may not have thought about.
- To find patterns in data, data mining is used to
try to mine key information from a data
warehouse.
- Data warehouses provide a single place to store
key corporate data.
- The idea is that users can go to one place to
find this key data using an enterprise
information system (EIS).
Alex Berson, Stephen J. Smith
11Enterprise Information System
- An EIS (Enterprise Information System) allows
users to query data in a data warehouse.
- EIS tools predate reporting and managed query
tools
http://www.iec.org/online/tutorials/bus_int
12Security in Data Warehouse
- Building a data warehouse increases security
risk because key corporate information is all in
one place.
- Database system components can be used to protect
the data warehouse. They are:
  - Views
  - Access Control
  - Security Administration
  - Encryption
  - Audit
Alex Berson, Stephen J. Smith
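To make the "Views" item concrete, here is a minimal sketch of a view that exposes summary figures while hiding a sensitive column. The table, column names, and values are invented. SQLite is used only because it ships with Python. It has no user accounts or GRANT statements, so the access-control half of the idea (letting analysts read the view but not the base table) is assumed to be handled by a full database system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (store TEXT, customer_card TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('North', '4111-0001', 20.0),
        ('North', '4111-0002', 35.0),
        ('South', '4111-0003', 15.0);

    -- The view exposes per-store totals only; card numbers stay hidden.
    CREATE VIEW store_totals AS
        SELECT store, SUM(amount) AS total
        FROM sales
        GROUP BY store;
""")

# An analyst who queries the view never sees the customer_card column.
cursor = conn.execute("SELECT * FROM store_totals ORDER BY store")
visible_columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(visible_columns, rows)
```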
13Introduction - Data Mining
14Introduction - Data Mining
- Data mining is done by running software that
examines a database and looks for patterns in the
data.
- Data mining is a powerful analytical tool that
enables business executives to advance from
describing historical customer behavior to
predicting the future. It finds patterns that
unlock the mysteries of customer behavior.
http://www.teradata.com
15Data Mining
- Data mining solves complex business problems
  - Increase revenue
  - Reduce expenses
  - Identify business opportunities
  - Gain competitive advantages
Alex Berson, Stephen J. Smith
16Data Mining Benefits
- Fraud detection in banking and telecommunications
- Marketing, such as the stock market
- Science: data analysis involving cataloging
objects of interest in large data sets (for
instance, finding atmospheric events in remote
sensing data, or volcanoes on Venus)
- Problem diagnosis in manufacturing
Reference book in Library - Computer Information
System pg 496-499
17Data Mining Benefits
- Data mining allows companies to collect
information that makes them more productive and
helps them beat their competition.
- Data mining helps identify
  - why customers buy certain products
  - ideas for very direct marketing
  - ideas for shelf placement
  - training of employees vs. employee retention
  - employee benefits vs. employee retention
Reference book in Library - Computer Information
System pg 496-499
18Implementing Data Mining
- Apply data mining tools to run data mining
algorithms against data.
- There are two approaches
  - Copy data from the Data Warehouse and mine it
  - Mine the data in the Data Warehouse
- Popular tools use a variety of different data
mining algorithms
  - association rules
  - genetic algorithms
  - decision trees
  - neural networks
Alex Berson, Stephen J. Smith
19Data Mining Using Separate Data
- You can move data from the data warehouse to data
mining tools
- Advantages
  - Data mining tools may organize data so they can
  run faster
- Disadvantages
  - Could be very expensive to move large amounts of
  data
[Diagram labels: Data Warehouse; Copy of data made by the Data Mining Tool; Data Mining Tool]
Reference book in Library - Computer Information
System pg 496-499
20Data Mining Against the Data Warehouse
- Data mining tools can access data directly in the
Data Warehouse
- Advantage
  - No copy of data is needed for data mining
- Disadvantage
  - Data may not be organized in a way that is
  efficient for the tool
[Diagram labels: Data Warehouse; Data Mining Tool]
Reference book in Library - Computer Information
System pg 496-499
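The two approaches can be sketched side by side. In this hypothetical example a simple per-customer total stands in for a real mining algorithm, an in-memory SQLite database stands in for the warehouse, and the table and function names are invented.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE purchases (customer TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("ann", 10.0), ("bob", 40.0), ("ann", 30.0), ("bob", 20.0)],
)

def mine_copy(conn):
    """Approach 1: copy the rows out, then analyse the copy."""
    rows = conn.execute("SELECT customer, amount FROM purchases").fetchall()
    totals = {}
    for customer, amount in rows:  # the tool works on its own copy
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals

def mine_in_place(conn):
    """Approach 2: the warehouse does the work; only results come back."""
    query = "SELECT customer, SUM(amount) FROM purchases GROUP BY customer"
    return dict(conn.execute(query).fetchall())

print(mine_copy(warehouse))
print(mine_in_place(warehouse))
```

Both functions give the same answer. The difference is where the data lives while it is analysed: the first transfers every row to the tool, the second transfers only the summary.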
21How Data Is Transformed into Knowledge
[Diagram: process stages Selection/Sampling, Preprocessing/Cleaning, Transformation and Reduction, Data Mining, Evaluation; data labels Database/Warehouse, Target data, Pre-processed data, Input data, Graph]
22How Data Is Transformed into Knowledge (continued)
- Selection: selecting or segmenting the data
according to some criteria.
- Preprocessing: the data cleansing stage, where
information that is deemed unnecessary and may
slow down queries is removed.
- Transformation: the data is made usable and
navigable.
- Data mining: this stage is concerned with the
extraction of patterns from the data.
- Interpretation and Evaluation: the patterns
identified by the system are interpreted into
knowledge, which can then be used to support
human decision-making.
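The five stages above can be walked through on a toy data set. This is only a sketch: the records, the "east" selection criterion, and the frequency count used as the mining step are all invented for illustration.

```python
from collections import Counter

# Hypothetical raw records: (customer, region, item); None marks a missing value.
raw = [
    ("ann", "east", "Bread"), ("ann", "east", "milk "),
    ("bob", "west", "bread"), ("cy", "east", None),
    ("dee", "east", "BREAD"),
]

# 1. Selection: keep only the records that match a criterion.
target = [r for r in raw if r[1] == "east"]

# 2. Preprocessing: drop records that are incomplete.
cleaned = [r for r in target if r[2] is not None]

# 3. Transformation: put values into one consistent, usable form.
transformed = [
    (customer, item.strip().lower()) for customer, _region, item in cleaned
]

# 4. Data mining: extract a pattern (here, simply item frequencies).
pattern = Counter(item for _customer, item in transformed)

# 5. Interpretation and evaluation: turn the pattern into a statement
#    that a person can act on.
top_item, count = pattern.most_common(1)[0]
print(f"Most purchased item in the east region: {top_item} ({count} purchases)")
```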
23Data Mining - Information Process
- OLTP (Online Transaction Processing): the
processing of transaction information.
- OLAP (Online Analytical Processing): the
manipulation of information to support decision
making.
24What is OLTP
[Diagram labels: OLTP; Client/Server; Web-based; Mainframe]
25Defining OLTP and DSS
- An application that updates the database is
called an on-line transaction processing (OLTP)
application
- An application that issues queries to the
read-only database is called a decision support
system (DSS)
[Diagram labels: OLTP Application; DSS Application]
26OLTP vs. OLAP
- Online Transaction Processing (OLTP)
  - Day-to-day operations
  - Application-oriented
  - Data: current, up to date, detailed
  - Database size: 100 MB to GB
- Online Analytical Processing (OLAP)
  - Decision support
  - Subject-oriented
  - Data: historical, multidimensional, summarized
  - Database size: 100 GB to TB
Principle of Knowledge Discovery in Database
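The contrast can be sketched with two kinds of statement against the same table. The table and values are hypothetical. For brevity both styles run in one in-memory SQLite database, whereas the comparison above assumes the OLAP side lives in a separate, much larger historical store.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, year INTEGER, region TEXT, amount REAL)"
)

def place_order(year, region, amount):
    """OLTP style: one small transaction that records one current event."""
    with db:  # commits on success, rolls back if an error is raised
        db.execute(
            "INSERT INTO orders (year, region, amount) VALUES (?, ?, ?)",
            (year, region, amount),
        )

for order in [(2002, "east", 100.0), (2002, "west", 50.0),
              (2003, "east", 75.0), (2003, "east", 25.0)]:
    place_order(*order)

# OLAP style: one read-only query that summarizes history by dimensions.
summary = db.execute(
    "SELECT year, region, SUM(amount) FROM orders "
    "GROUP BY year, region ORDER BY year, region"
).fetchall()
print(summary)
```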
27OLTP vs. OLAP (example)
McGraw-Hill Companies, Inc.
28Data Mining Algorithm
A data mining algorithm consists of three parts
- Model: the purpose of the algorithm is to fit a
model to the data.
- Preference: criteria used to prefer one model
over another.
- Search: a technique used to search the data.
Margaret H. Dunham
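One way to picture the three parts is a tiny curve-fitting exercise. This is a sketch under assumptions of my own choosing: the model family is y = slope * x, the preference is total squared error, and the search is a brute-force scan over candidate slopes. The data points are invented.

```python
# Hypothetical observations: (x, y) pairs that roughly follow y = 2x.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]

def model(slope, x):
    """Model: the family of candidate descriptions, here y = slope * x."""
    return slope * x

def preference(slope):
    """Preference: lower total squared error means a better model."""
    return sum((y - model(slope, x)) ** 2 for x, y in data)

def search(candidates):
    """Search: try each candidate and keep the preferred one."""
    return min(candidates, key=preference)

candidate_slopes = [s / 10 for s in range(0, 41)]  # 0.0, 0.1, ..., 4.0
best_slope = search(candidate_slopes)
print(best_slope)
```

Real algorithms differ mainly in how rich the model family is and how cleverly the search avoids trying every candidate.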
29What can Data Mining Do?
Basic Data Mining Tasks
- Classification: maps data into predefined groups
or classes.
- Regression: used to map a data item to a
real-valued prediction variable. It assumes that
the target data fit some known type of function
(e.g., linear, logistic) and determines the best
function.
- Time Series Analysis: the value of an attribute
is examined as it varies over time. Values are
usually obtained at evenly spaced time points
(daily, weekly, hourly).
- Prediction: predicting a future state rather than
a current state.
Margaret H. Dunham
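A small sketch of three of these tasks together: a least-squares line is fitted to past data (regression), the line is used to estimate an unseen case (prediction), and the estimate is mapped into a predefined class (classification). The figures, the 8.0 threshold, and the class names are assumptions made for the example.

```python
# Hypothetical history: advertising spend (x) and sales (y), in thousands.
history = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

def fit_line(points):
    """Regression: least-squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in points)
        / sum((x - mean_x) ** 2 for x, _ in points)
    )
    return mean_y - slope * mean_x, slope

def classify(sales):
    """Classification: map a value into one of two predefined classes."""
    return "high" if sales >= 8.0 else "low"

intercept, slope = fit_line(history)
predicted = intercept + slope * 5.0  # prediction for an unseen spend level
print(intercept, slope, predicted, classify(predicted))
```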
30What can Data Mining Do?
Basic Data Mining Tasks
- Clustering: similar to classification, except
that the groups are not predefined but rather
defined by the data alone. The most similar data
are grouped into clusters.
- Summarization: maps data into subsets with
associated simple descriptions.
- Association Rules: link analysis, alternatively
referred to as affinity analysis or association,
refers to the data mining task of uncovering
relationships among data.
- Sequence Discovery: used to determine sequential
patterns in data.
Margaret H. Dunham
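Association rules are usually scored by support (how often the items appear together) and confidence (how often the rule holds when its left side is present). The sketch below computes both for pairs of items. The baskets and the 0.5 and 0.6 thresholds are invented, and checking every pair by brute force is only practical for a toy example.

```python
from itertools import permutations

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Share of baskets with the antecedent that also hold the consequent."""
    return support(antecedent | consequent) / support(antecedent)

all_items = sorted(set().union(*baskets))
rules = [
    (a, b, support({a, b}), confidence({a}, {b}))
    for a, b in permutations(all_items, 2)
    if support({a, b}) >= 0.5 and confidence({a}, {b}) >= 0.6
]
for a, b, s, c in rules:
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```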
31Different Styles - Data Mining
Two styles of Data Mining
- Directed Data Mining: a top-down approach, used
when we know what we are looking for. This often
takes the form of predictive modeling, where we
know exactly what we want to predict.
  - Ex: Classification, Estimation, and Prediction
- Undirected Data Mining: a bottom-up approach that
lets the data speak for itself.
  - Ex: Clustering, Summarization, Sequence
  Discovery
Mastering Data Mining - Michael J. A. Berry
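The difference between the two styles shows up in whether labels are supplied. In this sketch the directed part learns from labelled examples and classifies a new value, while the undirected part groups the same kind of values with no labels at all, using a one-dimensional two-means clustering. The spend figures and class names are invented, and the clustering assumes the values are not all identical.

```python
# Hypothetical customer spend values.
spend = [5.0, 6.0, 7.0, 50.0, 52.0, 54.0]

# Directed: the classes are known in advance, so labelled examples guide it.
labelled = [(5.0, "occasional"), (7.0, "occasional"),
            (50.0, "frequent"), (54.0, "frequent")]

def class_means(examples):
    """Average value of each labelled class."""
    groups = {}
    for value, label in examples:
        groups.setdefault(label, []).append(value)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}

def classify(value, means):
    """Assign the class whose mean is closest to the value."""
    return min(means, key=lambda label: abs(value - means[label]))

# Undirected: no labels; two groups emerge from the data alone.
def two_means(values, rounds=10):
    """1-D k-means with k = 2; assumes the values are not all identical."""
    centers = [min(values), max(values)]
    for _ in range(rounds):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) for c in clusters]
    return clusters

print(classify(6.0, class_means(labelled)))
print(two_means(spend))
```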
32How Data Mining Transfers Data into IRM, KM, and IT
33How Data Transfers to IRM, IT, and KM
34If IRM, KM, and IT are still unclear, here is an
easier explanation
35Overview - Business View Diagram
- Taking a wide range of data sources and turning
it into knowledge that can be used to make better
business decisions.
- The data warehouse/data mart is a repository for
data that has been extracted from one or more
sources, cleansed, and transformed into a format
suitable for analysis.
www.xwave.com/industries/telecom/solutions/images/bi_diagram_jpg
36Data Mining
Important Considerations
- Do you need a data warehouse?
- Do all your employees need an entire data
warehouse?
- How up-to-date must the information be?
- What data mining tools do you need?
37Conclusion
- I talked about how data mining can be used to
turn information into knowledge.
- Data mining is not a one-step procedure. It is
also not the final step in the decision-making
process. It is only one part of the
decision-making support system, which includes
data warehousing, data mining, online analytical
processing, and so on.
- Data mining is the natural evolution of query and
reporting tools. Everyone who creates queries and
reports benefits from having data mining
capabilities.
- The data mining process is able to discover
information that is completely hidden.
38Data Mining - Software
- AIM Learning: offers fast data mining tools based
on genetic programming and simulated annealing.
- Acknosoft: developers of KATE-tools for induction
and CBR, and other tools for decision support and
data mining.
- http://www.salford-systems.com
- Demonstration of Data Mining
  - https://www.statsoft.com/dm2.html
39The End
40Questions?