Introduction to Data Analysis and Mining - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Introduction to Data Analysis and Mining
  • By Laura Jordana

2
Decision-Support Systems
  • Database applications can be classified as either
    transaction-processing or decision-support
    systems.
  • Transaction-processing systems are used
    extensively today, for example for bank
    transactions and online sales.
  • These systems generate a large amount of
    information.

3
Decision-Support Systems
  • Decision-support systems attempt to extract
    useful information from the generated information
    in order to make business decisions.
  • For example, they can analyze customer behavior
    to help managers decide which products to stock
    in a store or which market to advertise to.

4
OLAP
  • Many decision-support queries can be written in
    SQL. However, others cannot, or cannot be
    expressed easily.
  • Extensions are available to make data analysis
    easier.
  • OLAP (Online Analytical Processing) consists of
    tools for data analysis.
  • Examples: statistical analysis such as finding
    percentiles and cumulative distributions.
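
The two statistics named above can be sketched in a few lines of Python. This is only an illustration: the sale amounts and the function names are invented here, and real OLAP tools compute such values inside the database rather than in application code.

```python
import math

# Hypothetical sale amounts, invented for illustration.
sales = [120, 80, 200, 150, 50, 300, 90, 110]

def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that
    at least p percent of the data is at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * p / 100))
    return ordered[rank - 1]

def cumulative_distribution(values):
    """Return (value, fraction of data <= value) for each distinct value."""
    n = len(values)
    return [(v, sum(1 for x in values if x <= v) / n)
            for v in sorted(set(values))]

print(percentile(sales, 50))              # median by nearest rank
print(cumulative_distribution(sales)[:3]) # three smallest values
```

The nearest-rank definition is one of several percentile conventions; interpolating definitions give slightly different answers on small data sets.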

5
Data Warehousing
  • A data warehouse is an archive of information
    gathered from multiple data sources.
  • A company may have different databases for
    different purposes.
  • These databases might only contain current data.
  • The purpose of the data warehouse is to store ALL
    the data for a long time.
  • Decision-support queries are easier to write, and
    online transaction-processing systems are not
    affected by this additional workload.

6
Components of a Data Warehouse

7
Issues
  • When and how to gather data: source-driven (the
    data source sends new data to the warehouse) or
    destination-driven (the warehouse sends requests
    for new data).
  • Schema to be used: different data sources are
    likely to have different schemas.
  • Data transformation and cleansing: correcting
    minor errors such as a misspelled street name.
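
As a rough sketch of the cleansing step, a misspelled street name can be matched against a list of known names. The street list, the function name clean_street, and the 0.8 similarity cutoff are assumptions made for this example; the presentation does not describe a specific method.

```python
import difflib

# Hypothetical list of valid street names, invented for illustration.
KNOWN_STREETS = ["Main Street", "Maple Avenue", "Oak Street"]

def clean_street(name, known=KNOWN_STREETS, cutoff=0.8):
    """Return the closest known street name, or the input unchanged
    if nothing is similar enough (cutoff is an assumed threshold)."""
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else name

print(clean_street("Mian Street"))
```

A production cleansing pipeline would usually also normalize case and abbreviations before matching; this sketch compares the strings exactly as given.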

8
Issues (cont.)
  • Propagating updates: how to update the data
    warehouse when an update occurs at the data
    source.
  • Summarizing data: the warehouse may not need, or
    have room, to store all raw data.
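
One way to picture summarization is to replace raw rows with per-group totals. The sketch below assumes a made-up row layout of (store, item, price) and a hypothetical summarize function; an actual warehouse would normally do this with aggregate queries.

```python
from collections import defaultdict

# Hypothetical raw sale records: (store, item, price).
raw_sales = [
    ("store_a", "bread", 2.50),
    ("store_a", "milk", 1.20),
    ("store_a", "bread", 2.50),
    ("store_b", "milk", 1.20),
]

def summarize(rows):
    """Collapse raw rows into a count and total revenue per (store, item)."""
    totals = defaultdict(lambda: {"count": 0, "revenue": 0.0})
    for store, item, price in rows:
        entry = totals[(store, item)]
        entry["count"] += 1
        entry["revenue"] += price
    return dict(totals)

summary = summarize(raw_sales)
print(summary[("store_a", "bread")])
```

The summary keeps one entry per group instead of one per sale, which is the storage saving the slide refers to; the cost is that individual transactions can no longer be recovered from it.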

9
Data Mining
  • The process of analyzing large databases to find
    useful patterns.
  • Data mining attempts to discover rules and
    patterns from data.
  • Also called knowledge discovery.

10
Knowledge Discovery
  • A rule can be the result of knowledge discovery.
  • For example: young women with annual incomes
    above some threshold are most likely to buy
    small sports cars.
  • These rules are not universally true, and have
    degrees of support and confidence.

11
Applications of Knowledge Discovery
  • Predictions: for example, a credit-card company
    may want to predict a person's credit risk based
    on known factors.
  • Associations: suggesting books to a customer who
    has purchased books at an online bookstore, or
    suggesting accessories to go with an item.
  • Real-world example: the National Basketball
    Association uses a data-mining application in
    conjunction with video recordings of basketball
    games to analyze plays and discover interesting
    patterns in game data.
  • (Source: http://citeseer.ist.psu.edu/cachedpage/421882/1)

12
Classification
  • Items belong to one of several classes.
  • The problem is to predict which class a new item
    belongs to (e.g. predicting a person's credit
    risk).
  • Attributes of the item are used to predict its
    class (e.g. age, education, annual income,
    current debts).
  • A decision tree is one way to perform
    classification.

13
Decision-Tree
  • A decision tree has leaf nodes that represent
    classes.
  • Each internal node is associated with a predicate
    or function which is used to determine which
    child to traverse to.
  • Basically, a decision-tree is a flow chart of
    if-then scenarios.
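
The structure described above can be sketched as a small Python data structure: leaves carry a class label, and each internal node carries a predicate that picks the child to follow. The class names, the income and debt thresholds, and the risk labels are all invented for illustration; a real decision tree would be learned from training data.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Leaf:
    label: str  # the class this leaf represents

@dataclass
class Node:
    predicate: Callable[[dict], bool]  # test applied to the item
    if_true: "Tree"                    # child followed when the test holds
    if_false: "Tree"                   # child followed otherwise

Tree = Union[Leaf, Node]

def classify(tree, item):
    """Walk from the root to a leaf, following each node's predicate."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.predicate(item) else tree.if_false
    return tree.label

# Hypothetical hand-built tree for credit risk; thresholds are assumptions.
credit_tree = Node(
    predicate=lambda p: p["annual_income"] >= 50_000,
    if_true=Node(
        predicate=lambda p: p["current_debts"] <= 10_000,
        if_true=Leaf("low risk"),
        if_false=Leaf("medium risk"),
    ),
    if_false=Leaf("high risk"),
)

print(classify(credit_tree, {"annual_income": 60_000, "current_debts": 5_000}))
```

Each path from the root to a leaf corresponds to one chain of if-then tests, which is why the tree reads like a flow chart.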

14
Decision-Tree

15
Association
  • Association is a topic of interest particularly
    in the retail industry.
  • Companies are interested in the associations
    among different items that people purchase.
  • For example: someone who buys bread will probably
    buy milk. Someone who bought a book on PHP is
    likely to purchase a book on MySQL.

16
Association Rules
  • bread => milk
  • PHP => MySQL
  • As mentioned before, rules have degrees of
    support and confidence.
  • Support measures what percentage of the
    population satisfies both sides of the rule (e.g.
    what percentage of all purchases include both
    milk and bread).
  • Confidence is a measure of how often the
    population satisfies the right-hand side of the
    rule when the left-hand side is true (e.g. what
    percentage of the purchases that include bread
    also include milk).
  • Note: the confidence of bread => milk can differ
    from that of milk => bread, although both rules
    have the same support.
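
A small worked example makes the two measures concrete. The purchase data and function names below are invented for illustration; the definitions follow the bullets above, with each purchase modeled as a set of items.

```python
# Hypothetical purchases, each a set of items bought together.
purchases = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread"},
    {"milk"},
    {"milk", "eggs"},
]

def support(transactions, lhs, rhs):
    """Fraction of all transactions that contain both sides of the rule."""
    both = lhs | rhs
    return sum(1 for t in transactions if both <= t) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Among transactions containing lhs, the fraction that also contain rhs."""
    with_lhs = [t for t in transactions if lhs <= t]
    if not with_lhs:
        return 0.0  # rule never applies; 0.0 is a convention chosen here
    return sum(1 for t in with_lhs if rhs <= t) / len(with_lhs)

bread, milk = {"bread"}, {"milk"}
print(support(purchases, bread, milk))      # 2 of 5 purchases
print(confidence(purchases, bread, milk))   # 2 of 3 bread purchases
print(confidence(purchases, milk, bread))   # 2 of 4 milk purchases
```

With this data both rules have support 2/5, while the confidence is 2/3 for the bread rule and 2/4 for the milk rule, which shows how the two directions can differ.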

17
Other Types Of Mining
  • Text mining uses data mining techniques on text
    documents
  • Data visualization helps users observe patterns
    visually

18
References
  • http://www.purdue.edu/UNS/html4ever/2004/041018.Caruthers.discover.html
  • A. Silberschatz, H. F. Korth, S. Sudarshan,
    Database System Concepts, 5th ed., McGraw-Hill,
    2006