Data mining - PowerPoint PPT Presentation

About This Presentation
Title:

Data mining

Slides: 16
Provided by: Non129
Tags: data | mining

Transcript and Presenter's Notes

Title: Data mining


1
Data mining
  • By Aung Oo

2
What is Data Mining?
  • Different perspectives: CS, Business, IT
  • As a field of research in CS:
  • Science of extracting useful information from
    large data sets or databases
  • Also known as:
  • Knowledge Discovery and Data Mining (KDD)
  • Knowledge Discovery in Databases (KDD)

3
Knowledge Discovery and Data Mining (KDD)
  • KDD can be said to lie at the intersection of
    statistics, machine learning, databases, pattern
    recognition, information retrieval and artificial
    intelligence.

4
Data Mining Definitions
  • Analysis of datasets to find unsuspected
    relationships
  • Summarize data in novel ways that are
    understandable and useful to the data owner
  • Extraction of knowledge from data
  • Non-trivial extraction of implicit, previously
    unknown, and potentially useful knowledge from data
  • Process of discovering patterns, automatically or
    semi-automatically, in large quantities of data
  • Patterns discovered must be useful and meaningful
    in that they lead to some advantage, usually
    economic

5
Why Data Mining?
  • Large datasets are common due to advances in
    digital data acquisition and storage
    technology.
  • Automatic data production leads to need for
    automatic data consumption
  • Large databases mean vast amounts of information
  • Difficulty lies in accessing it
  • Business:
      • Supermarket transactions
      • Credit card usage records
      • Telephone call details
      • Government statistics
  • Scientific:
      • Images of astronomical bodies
      • Molecular databases
      • Medical records

6
Why Data Mining?
  • Data mining is ready for application in the
    business community because it is supported by
    three technologies that are now sufficiently
    mature:
  • Massive data collection
  • Powerful multiprocessor computers
  • Data mining algorithms

7
Example of Data Mining
  • If a store tracks the purchases of a customer and
    notices that a customer buys a lot of silk
    shirts, the data mining system will make a
    correlation between that customer and silk
    shirts.
  • The store may begin direct mail marketing of silk
    shirts to that customer, or it may alternatively
    attempt to get the customer to buy a wider range
    of products.
  • Another example: analysts found that beer and
    diapers were often bought together.
  • So the store can place the high-profit diapers
    next to the high-profit beer.
  • This technique is often referred to as "Market
    Basket Analysis".
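
As a rough illustration of market basket analysis, the sketch below counts how often pairs of items appear in the same basket and derives the usual support, confidence and lift measures. The transactions and the function name pair_stats are invented for this example and are not from the presentation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread", "chips"},
]

def pair_stats(baskets):
    """Return (support, confidence, lift) for each ordered item pair a -> b."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    stats = {}
    for (a, b), both in pair_counts.items():
        support = both / n  # share of baskets containing both items
        for x, y in ((a, b), (b, a)):
            confidence = both / item_counts[x]        # P(y in basket | x in basket)
            lift = confidence / (item_counts[y] / n)  # above 1: bought together more than chance
            stats[(x, y)] = (support, confidence, lift)
    return stats

stats = pair_stats(transactions)
support, confidence, lift = stats[("beer", "diapers")]
print(f"beer -> diapers: support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 for a pair suggests the two items are bought together more often than their individual popularity would predict, which is the kind of signal behind the shelf-placement decision described above.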

8
Steps in the Evolution of Data Mining
Evolutionary Step | Business Question | Enabling Technologies
Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks
Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), Structured Query Language (SQL), ODBC
Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | On-line analytic processing (OLAP), multidimensional databases, data warehouses
Data Mining (Emerging Today) | "What's likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases
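
The last question in the table asks about the future rather than the past. One very simple way to approach "what's likely to happen next month" is to fit a straight-line trend to past sales and extend it by one step. The sketch below assumes a linear trend and uses invented monthly figures; the presentation does not name a specific forecasting method, and a trend line does not answer the "Why?" part of the question.

```python
# Hypothetical monthly unit sales for one city; the numbers are invented.
monthly_units = [120, 126, 131, 139, 144, 150]

def fit_line(ys):
    """Ordinary least squares fit of y = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    intercept = y_mean - slope * t_mean
    return intercept, slope

intercept, slope = fit_line(monthly_units)
# Extrapolate one step past the last observed month.
next_month = intercept + slope * len(monthly_units)
print(f"estimated trend: {slope:.2f} units/month, next month is about {next_month:.1f}")
```
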
9
The Scope of Data Mining
  • Automated prediction of trends and behaviors.
  • Data mining uses data on past promotional
    mailings to identify the targets most likely to
    maximize return on investment in future mailings.
  • Automated discovery of previously unknown
    patterns.
  • An example of pattern discovery is the analysis
    of retail sales data to identify seemingly
    unrelated products that are often purchased
    together.
  • More columns.
  • High performance data mining allows users to
    explore the full depth of a database, without
    pre-selecting a subset of variables.
  • More rows.
  • Larger samples yield lower estimation errors and
    variance, and allow users to make inferences
    about small but important segments of a
    population.
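
The claim that more rows reduce estimation error can be checked with a small simulation. The sketch below assumes normally distributed values with a mean of 100 and a standard deviation of 15 (both chosen arbitrarily) and compares the observed spread of the sample mean with the textbook standard error, sigma / sqrt(n).

```python
import random
import statistics

def spread_of_sample_means(sample_size, trials=200, seed=0):
    """Empirical standard deviation of the sample mean over repeated samples."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(100.0, 15.0) for _ in range(sample_size))
        for _ in range(trials)
    ]
    return statistics.stdev(means)

for n in (25, 400, 2500):
    theory = 15.0 / n ** 0.5  # standard error of the mean: sigma / sqrt(n)
    simulated = spread_of_sample_means(n)
    print(f"n={n:5d}  simulated={simulated:.3f}  theory={theory:.3f}")
```

Because the error shrinks with the square root of the sample size, a small segment of the population only becomes measurable once the full data set is large enough to contain many members of that segment.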

10
Data Mining vs. Statistics
  • Objective of data mining exercise plays no role
    in data collection strategy
  • In this way it differs from much of statistics
  • For this reason, data mining is referred to as
    secondary data analysis
  • KDD is more complicated than initially thought:
  • 80% preparing data
  • 20% mining data

11
Query Data Base vs. Data Mining
  • Database query: when you know exactly what you
    are looking for.
  • Data mining: when you only vaguely know what you
    are looking for.
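
The difference can be sketched in a few lines. The first computation below is a query: the region and the quantity are fixed in advance. The second only asks whether customers fall into natural spending groups, and lets a tiny two-cluster k-means find them. The records and the helper two_means are hypothetical and chosen for illustration; the slides do not prescribe clustering for this purpose.

```python
# Hypothetical records: (customer_id, region, monthly_spend). Invented data.
records = [
    ("c1", "Boston", 20.0), ("c2", "Boston", 22.0), ("c3", "Hartford", 19.0),
    ("c4", "Boston", 95.0), ("c5", "Hartford", 101.0), ("c6", "Boston", 98.0),
]

# Database query: the question is fully specified before looking at the data.
boston_total = sum(spend for _, region, spend in records if region == "Boston")

def two_means(values, iterations=10):
    """Tiny 1-D k-means with k=2: split values into a low group and a high group."""
    low, high = min(values), max(values)  # initial cluster centers
    if low == high:
        return list(values), []  # all values identical: nothing to separate
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(low_group) / len(low_group)
        high = sum(high_group) / len(high_group)
    return low_group, high_group

# Data mining: the groups are not known in advance; they emerge from the data.
low_spenders, high_spenders = two_means([spend for _, _, spend in records])
print("Boston total:", boston_total)
print("low group:", low_spenders, "high group:", high_spenders)
```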

12
Data Mining Tasks and Techniques
  • Not so much a single technique
  • Idea that there is more knowledge hidden in the
    data than shows itself on the surface
  • Any technique that helps to extract more out of
    data is useful
  • Five major task types
  • 1. Exploratory Data Analysis (Visualization)
  • 2. Descriptive Modeling (Density estimation,
    Clustering)
  • 3. Predictive Modeling (Classification and
    Regression)
  • 4. Discovering Patterns and Rules (Association
    rules)
  • 5. Retrieval by Content (Retrieve items similar
    to pattern of interest)
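
As one concrete instance of the fifth task type, the sketch below retrieves the stored texts most similar to a query text, using cosine similarity over simple word counts. The document collection, the identifiers d1 to d3, and the function names are assumptions made for this example; real retrieval systems typically use richer features and weighting.

```python
import math
from collections import Counter

# Hypothetical document collection, invented for illustration.
documents = {
    "d1": "supermarket transactions and credit card usage records",
    "d2": "images of astronomical bodies",
    "d3": "credit card fraud in supermarket transactions",
}

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, top_k=2):
    """Rank documents by similarity to the query text, most similar first."""
    query_vec = Counter(query.lower().split())
    scored = [
        (cosine_similarity(query_vec, Counter(text.lower().split())), doc_id)
        for doc_id, text in docs.items()
    ]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)[:top_k]]

print(retrieve("credit card transactions", documents))
```

The same pattern applies to the other examples in the deck, such as images or molecular records, once each item is represented as a numeric vector.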

13
Privacy concerns
  • For example, if an employer has access to medical
    records, they may screen out people who have
    diabetes or have had a heart attack. Screening
    out such employees will cut costs for insurance,
    but it creates ethical and legal problems.
  • Essentially, data mining gives information that
    would not be available otherwise. It must be
    properly interpreted to be useful. When the data
    collected involves individual people, there are
    many questions concerning privacy, legality, and
    ethics.

14
Notable Uses of Data Mining
  • Data mining has been cited as the method by which
    the U.S. Army intelligence unit, Able Danger,
    supposedly had identified the 9/11 attack leader,
    Mohamed Atta, and three other 9/11 hijackers as
    possible members of an al Qaeda cell operating in
    the U.S. more than a year before the attack.

15
References
  • http://www.cedar.buffalo.edu/srihari/CSE626
  • http://en.wikipedia.org/wiki/Data_Mining
  • http://www.thearling.com/text/dmwhite/dmwhite.htm