Title: Data Mining
1Data Mining
- Ahmed M. Zeki
- Semester III
- April 2007
2Introduction
- What is Data?
- Data, Information, Knowledge.
- What is Mining?
3Machine Learning
Knowledge Engineering (Expert Systems)
Diagram: Rules -> System -> Output (according to the rules)
- Example: MYCIN (Medical Diagnosis System).
4Expert Systems (Example)
9Definition
- Data Mining is the process of exploration and
analysis, by automatic or semi-automatic means,
of large quantities of data in order to discover
meaningful patterns and rules.
10Data Mining vs. Other Techniques
Data Mining: Hypothesis-Free
- Why are my discount coupons not attracting the
sort of return I was expecting?
- How can I increase the share I have of my
customers' total spending on electronic goods?
- How can I get my other stores to match the
incredibly successful sales figures of the main
branch?
Statistics, Query, Reporting, OLAP, ...: Hypothesis
- Not suitable for large databases and data
warehouses within the time limits.
- Volume of TVs sold in one store last month.
- Analyzing the price sensitivity of a new line of
TVs.
- Comparing the sales of various products in
different stores over time.
- Hypotheses: the manager knows that there are
stores, products, sensitivity and sales figures,
and he is checking out the interrelationships.
11Traditional Data Analysis
Diagram: Hypothesis -> Query Language, Graphics,
Statistics, OLAP (over the Database) -> Output
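As a rough illustration of this hypothesis-driven flow, the sketch below answers a question the analyst already knows how to ask (the volume of TVs sold in one store in one month) with a plain query. The `sales` table, its columns, and the sample rows are hypothetical and were invented only for this example.

```python
import sqlite3

# Hypothetical sales table; schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (store TEXT, product TEXT, month TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Main", "TV", "2007-03", 40),
        ("Main", "TV", "2007-03", 15),
        ("Main", "Radio", "2007-03", 30),
        ("North", "TV", "2007-03", 12),
        ("Main", "TV", "2007-02", 22),
    ],
)

def units_sold(store, product, month):
    """Hypothesis-driven query: the analyst decides up front what to ask."""
    row = conn.execute(
        "SELECT COALESCE(SUM(units), 0) FROM sales "
        "WHERE store = ? AND product = ? AND month = ?",
        (store, product, month),
    ).fetchone()
    return row[0]

tv_units = units_sold("Main", "TV", "2007-03")
print(tv_units)
```

The query returns only what was asked for; it cannot surface a relationship the analyst did not think to check.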
12Relationship between Data Mining and Statistics
- Statistics is closest to data mining.
- Many of the analyses now done with data mining,
such as building predictive models or discovering
associations in databases, have long been used in
statistics.
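Discovering associations in a database can be sketched by counting how often items occur together. The code below is a minimal illustration under assumed data (the baskets and the `min_support` threshold are invented); it is a brute-force pair count, not a full association-rule algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, invented for illustration.
baskets = [
    {"tv", "cable", "stand"},
    {"tv", "cable"},
    {"radio", "batteries"},
    {"tv", "stand"},
    {"tv", "cable", "batteries"},
]

def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs that appear in at least
    a min_support fraction of the baskets."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing antecedent that also contain consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

pairs = frequent_pairs(baskets, min_support=0.4)
for pair, support in sorted(pairs.items()):
    print(pair, support)
```

No hypothesis about which items belong together is supplied; the frequent pairs emerge from the counts.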
13Data Mining is not Magic!!
- Examples:
  - Older and wealthy customers were buying large
  sedans!
  - People born under the sign of Pisces were most
  prone to accidents!
  - Males with incomes between 50k-65k who
  subscribe to certain magazines are likely
  purchasers of a certain product!
- DM just assists business analysis by finding
patterns and relationships in the data.
- These patterns and relationships are not
necessarily the causes of an action.
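To see why a discovered pattern need not be a cause, the sketch below assigns accidents completely at random, independent of birth sign, and still reports a "most accident-prone" sign. All data is simulated; the 10% accident probability, the sample size, and the seed are arbitrary assumptions.

```python
import random

SIGNS = ["Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
         "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces"]

def accident_rates(n_people=1200, accident_prob=0.10, seed=0):
    """Simulate people whose accidents are independent of their sign and
    return the observed accident rate per sign."""
    rng = random.Random(seed)
    totals = {s: 0 for s in SIGNS}
    accidents = {s: 0 for s in SIGNS}
    for i in range(n_people):
        sign = SIGNS[i % len(SIGNS)]  # equal-sized groups for simplicity
        totals[sign] += 1
        if rng.random() < accident_prob:
            accidents[sign] += 1
    return {s: accidents[s] / totals[s] for s in SIGNS}

rates = accident_rates()
worst = max(rates, key=rates.get)
overall = sum(rates.values()) / len(rates)
print(f"Highest rate: {worst} ({rates[worst]:.1%}) vs overall {overall:.1%}")
```

Some sign always comes out on top, even though the simulation contains no causal link at all.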
14Data Warehousing
- Data Warehousing: Collection of data, in an
organized, integrated, subject-oriented,
nonvolatile, documented and time-dimensioned way,
to support more effective data-driven decision
making.
- 90% of major organizations have or are building
some kind of data warehouse.
15Data Warehousing
- Subject Oriented: The data is grouped under
business headings, such as customers, products,
sales analysis reports (this subject orientation
is achieved through data modeling).
- Integrated: The contents of the data warehouse
are defined such that they are valid across the
enterprise and its operational and external data
sources.
- Time Dimensioned: All data in the data warehouse
is time stamped at the time of entry into the
warehouse or when it is summarized.
- Non-volatile: Once loaded into the data
warehouse, the data is not updated. Thus it acts
as a stable resource for consistent reporting and
comparative analysis.
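The time-dimensioned and non-volatile properties can be sketched as an append-only store whose records are stamped on entry and cannot be changed afterwards. The `Warehouse` and `Fact` names are hypothetical, and a real data warehouse is a database system rather than an in-memory list.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Fact:
    subject: str          # business heading, e.g. "Customers" or "Sales"
    payload: tuple        # the loaded values
    loaded_at: datetime   # time stamp set on entry into the warehouse

class Warehouse:
    """Append-only store: supports loading and reading, not updating."""

    def __init__(self):
        self._facts = []

    def load(self, subject, payload):
        fact = Fact(subject, tuple(payload), datetime.now(timezone.utc))
        self._facts.append(fact)
        return fact

    def by_subject(self, subject):
        return [f for f in self._facts if f.subject == subject]

wh = Warehouse()
sale = wh.load("Sales", ("TV", 3))
wh.load("Customers", ("C001", "Main branch"))

try:
    sale.payload = ("TV", 99)  # attempted update of already-loaded data
    updated = True
except FrozenInstanceError:
    updated = False

print("update allowed:", updated)
```

Grouping reads by `subject` loosely mirrors subject orientation; integration across sources is not modeled here.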
16Data Mining and Data Warehousing
Diagram: Data Source -> Data Warehouse -> Geographic
Data Mart, Analysis Data Mart, Data Mining Data
Mart; or Data Source -> Data Mining Data Mart
- Data warehousing is very closely associated with
data mining.
- Data warehousing is not a prerequisite for a data
mining solution.
17Data Mining Data Mart
- Setting up a data warehouse is not a trivial
task, especially if the aim is to service the
entire enterprise.
- Recently, many organizations have used the data
mart, which is more specialized, more accessible
and a lot smaller than an enterprise-wide data
warehouse.
- That is, a data mart is a subset of a data
warehouse.
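The subset relationship can be illustrated with a view over a warehouse table. This is only a sketch: the `warehouse_sales` table, the `electronics_mart` view, and the rows are invented, and real data marts are often separate physical stores rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, department TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("North", "Electronics", 10),
        ("North", "Clothing", 7),
        ("South", "Electronics", 4),
        ("South", "Grocery", 25),
    ],
)

# The mart exposes one department only: smaller and more specialized.
conn.execute(
    "CREATE VIEW electronics_mart AS "
    "SELECT region, units FROM warehouse_sales "
    "WHERE department = 'Electronics'"
)

mart_rows = conn.execute(
    "SELECT region, units FROM electronics_mart ORDER BY region"
).fetchall()
warehouse_count = conn.execute(
    "SELECT COUNT(*) FROM warehouse_sales"
).fetchone()[0]

print(len(mart_rows), "of", warehouse_count, "warehouse rows are in the mart")
```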
18From Data Warehouse to Data Mining
- If the organization has already invested in a
data warehouse, it already knows the strategic
value of the corporate data asset and is
therefore well disposed to the concept of data
mining.
- Much of the hard work in understanding, gathering
and cleaning the business data has been done, so
the organization is well positioned to further
capitalize on its investment in the data
warehouse.
19From Data Mining to Data Warehouse
- After implementing a data mining solution (since
a data warehouse is not a prerequisite for data
mining), an organization could decide to integrate
the solution into a broader data-driven approach
to business decision making.
20Data Mining for Business Intelligence
- Business intelligence: all of the processes,
techniques and tools that support business
decision-making based on information technology.
- The approaches can range from a simple
spreadsheet to a major competitive intelligence
undertaking.
- Data mining is an important new component of
business intelligence.
21Data Mining and Business Intelligence
Positioning of different business intelligence
approaches according to their potential value as a
basis for tactical and strategic business decisions
Pyramid diagram, levels from top to bottom:
- Making Decisions
- Data Presentation: Visualization Techniques
- Data Mining: Information Discovery
- Data Exploration: OLAP, MDA, Statistical Analysis,
Querying and Reporting
- Data Warehouses / Data Marts
- Data Sources: Paper, Files, Information
Providers, Database Systems
Roles shown beside the pyramid: Decision Maker,
Business Analyst, Data Analyst, Database
Administrator
Axis label: Increasing potential to support
business decisions
- The value of the information to support decision
making increases from the bottom of the pyramid
to the top.