Advanced Database Applications: Database Indexing and Data Mining, CS562 -- Fall 2005



1
Advanced Database Applications: Database Indexing and Data Mining
CS562 -- Fall 2005
  • George Kollios
  • Boston University

2
  • Prof. George Kollios
  • Office: MCS 288
  • Office Hours: Monday 2:30pm-4:00pm,
    Thursday 11:00am-12:30pm
  • Mailing List: cs562
  • Web: http://www.cs.bu.edu/faculty/gkollios/ada05
  • Book1: http://www.cs.bu.edu/faculty/gkollios/ada05/Book/

3
History of Database Technology
  • 1960s: Data collection, database creation, IMS
    and network DBMS
  • 1970s: Relational data model, relational DBMS
    implementation
  • 1980s: RDBMS, advanced data models
    (extended-relational, OO, deductive, etc.) and
    application-oriented DBMS (spatial, scientific,
    engineering, etc.)
  • 1990s-2000s: Data mining and data warehousing,
    multimedia databases, and Web databases

4
Structure of a RDBMS
Modern Database Systems Extend these layers
  • A DBMS is an OS for data!
  • A typical RDBMS has a layered architecture.

5
Index Methods for RDBMS
  • Hashing methods: Linear Hashing, Extensible Hashing
  • B-tree family: B-trees and variations
  • Both are one-dimensional
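To make "one-dimensional" concrete, here is a minimal sketch. It uses a sorted list of keys as a toy stand-in for the leaf level of a B-tree and a Python dict as a stand-in for a hash index; neither is a real B-tree or Linear/Extensible Hashing implementation, and the names `SortedIndex`, `range_query` and the sample points are hypothetical.

```python
import bisect

class SortedIndex:
    """Toy stand-in for a B-tree leaf level: (key, record id) pairs sorted by key."""

    def __init__(self, pairs):
        self.entries = sorted(pairs)
        self.keys = [k for k, _ in self.entries]

    def range_query(self, low, high):
        """Return record ids with low <= key <= high using two binary searches."""
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        return [rid for _, rid in self.entries[start:end]]

# Hypothetical records: record id -> (x, y) point.
points = {1: (2, 9), 2: (5, 1), 3: (7, 4), 4: (8, 8)}

index_on_x = SortedIndex((x, rid) for rid, (x, _) in points.items())
hash_on_x = {}
for rid, (x, _) in points.items():
    hash_on_x.setdefault(x, []).append(rid)

exact = hash_on_x.get(5, [])              # hash index: exact-match lookup
in_range = index_on_x.range_query(4, 8)   # ordered index: range on one attribute

# A 2-D window query (4 <= x <= 8 and 3 <= y <= 8) can only use the index
# on x; the condition on y has to be checked record by record.
window = [rid for rid in in_range if 3 <= points[rid][1] <= 8]
```

Both structures organize records by a single key, which is why queries over two or more dimensions motivate the spatial index structures covered later in the course.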

6
Overview of the course
  • Spatial database systems: manage points, lines and
    regions (GIS, CAD/CAM, NASA's EOSDIS project)
  • Temporal database systems: billing, medical records
  • Spatio-temporal databases: moving objects, changing
    regions, etc.

7
Overview of the course
  • Multimedia databases: a multimedia system can store
    and retrieve objects/documents with text, voice,
    images, video clips, etc.
  • Time series databases: stock market, ECG,
    trajectories, etc.

8
Multimedia databases
  • Applications: digital libraries, entertainment,
    office automation
  • Medical imaging: digitized X-rays and MRI images
    (2- and 3-dimensional)
  • Query by content (or QBE) should be:
  • Efficient
  • Complete (no false dismissals)
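One common way to get both properties is filter-and-refine with a lower-bounding distance. The slides do not say which method the course uses, so the following is only a sketch under that assumption. The feature reduction (keeping the first `k` coordinates) and all names are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def reduce_features(v, k=2):
    """Keep the first k coordinates; stand-in for a real feature extraction step."""
    return v[:k]

def range_query(database, query, radius, k=2):
    q_small = reduce_features(query, k)
    # Filter: the distance on the first k coordinates never exceeds the full
    # distance, so discarding objects here cannot dismiss a true answer.
    candidates = [obj for obj in database
                  if euclidean(reduce_features(obj, k), q_small) <= radius]
    # Refine: remove false alarms using the exact distance.
    answers = [obj for obj in candidates if euclidean(obj, query) <= radius]
    return candidates, answers

database = [(0, 0, 0, 0), (1, 0, 0, 3), (1, 1, 1, 1), (5, 5, 5, 5)]
candidates, answers = range_query(database, (0, 0, 0, 0), radius=2.0)
```

Here the filter step keeps three candidates, one of which, `(1, 0, 0, 3)`, is a false alarm that the refine step removes. False alarms cost extra work; false dismissals would lose answers, which is what completeness rules out.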

9
What is Data Mining?
  • Data mining (knowledge discovery in databases): the
    efficient discovery of previously unknown, valid,
    potentially useful and understandable information or
    patterns from data in large databases
  • Alternative names: knowledge discovery (mining) in
    databases (KDD), knowledge extraction, data/pattern
    analysis, data archeology, data dredging, etc.

10
DM Applications
  • Database analysis and decision support
  • Market analysis: target marketing, market basket
    analysis, market segmentation
  • Fraud detection and management
  • Biology and medicine
  • Text mining (newsgroups, email, documents) and
    Web analysis

11
Data Mining: Confluence of Multiple Disciplines
  • Data mining draws on database technology, statistics,
    machine learning, visualization, information science,
    and other disciplines

12
Overview of terms
  • Data: a set of facts (items) D, stored in a
    database
  • Pattern: an expression E in a language L that
    describes a subset of facts
  • Attribute: a field in an item i in D
  • Interestingness: a function I_{D,L} that maps an
    expression to a measure space M

13
The Data Mining Task
  • For a given dataset D, language of facts L,
    interestingness function I_{D,L} and threshold c,
    efficiently find the expressions E such that
    I_{D,L}(E) > c
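As a concrete instance of this task, the sketch below takes the expressions E to be itemsets, the interestingness function to be support (the fraction of transactions containing the itemset), and c to be a minimum support. These choices, the function names, and the sample transactions are assumptions for illustration. The search is exhaustive, so it does not address the "efficiently" part of the task.

```python
from itertools import combinations

def support(itemset, transactions):
    """Interestingness used here: fraction of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def mine(transactions, interestingness, c, max_size=2):
    """Return every expression E (here: an itemset) with interestingness(E) > c."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            e = frozenset(combo)
            score = interestingness(e, transactions)
            if score > c:
                result[e] = score
    return result

# Hypothetical dataset D of four market-basket transactions.
D = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
patterns = mine(D, support, c=0.5)
```

With this data only `{bread}` and `{milk}` exceed the threshold (support 0.75 each); `{butter}` and `{bread, milk}` have support exactly 0.5 and are excluded by the strict inequality.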

14
How Data Mining is used
  • Identify the problem
  • Use data mining techniques to transform the data
    into information
  • Act on the information
  • Measure the results

15
DM Functionalities
  • Concept description: generalize, summarize, and
    contrast data characteristics, e.g., dry vs. wet
    regions
  • Association (correlation and causality)
  • Multi-dimensional vs. single-dimensional
    association
  • age(X, 20..29) ^ income(X, 20..29K) → buys(X, PC)
    [support = 2%, confidence = 60%]
  • contains(T, computer) → contains(T, software)
    [support = 1%, confidence = 75%]
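The support and confidence figures attached to each rule can be computed directly from transaction data. The sketch below uses the standard definitions (support: fraction of transactions containing both sides; confidence: fraction of transactions containing the left side that also contain the right side). The function name `rule_stats` and the five sample transactions are made up for illustration and do not reproduce the percentages on the slide.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    support = both / n
    confidence = both / ante if ante else 0.0
    return support, confidence

# Hypothetical transactions T, each a set of purchased items.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"computer", "software"},
    {"printer"},
]
s, c = rule_stats(transactions, {"computer"}, {"software"})
```

In this toy data, 3 of 5 transactions contain both items (support 60%), and 3 of the 4 transactions with a computer also contain software (confidence 75%).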

16
DM Functionalities
  • Cluster analysis
  • Class label is unknown: group data to form new
    classes, e.g., cluster houses to find
    distribution patterns
  • Clustering is based on the principle of maximizing
    intra-class similarity and minimizing inter-class
    similarity
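k-means is one widely used algorithm that follows this principle; the slides do not name a specific algorithm, so it is used here only as an example. The sketch clusters one-dimensional values, and the house prices and initial centers are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd-style k-means on numbers; returns (centers, clusters)."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        # Assignment step: put each value in the cluster of its nearest center.
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

house_prices = [100, 110, 120, 300, 310, 320]  # hypothetical, in thousands
centers, clusters = kmeans_1d(house_prices, centers=[100, 120])
```

No class labels are supplied; the two groups (around 110 and around 310) emerge from the data alone, even though both initial centers start inside the cheaper group.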

17
DM Functionalities
  • Classification and Prediction
  • Finding models (functions) that describe and
    distinguish classes or concepts for future
    prediction
  • E.g., classify countries based on climate, or
    classify cars based on gas mileage
  • Presentation: decision tree, classification rules,
    neural network
  • Prediction: predict some unknown or missing
    numerical values
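As a small illustration of the cars example, the sketch below learns a one-level decision tree (a single threshold test, equivalent to one classification rule) from labeled gas-mileage values. The training data, labels, and function names are hypothetical, and a real decision-tree learner would handle many attributes and more than two classes.

```python
def train_stump(samples):
    """Pick the threshold on one numeric feature that misclassifies the fewest samples.

    samples: list of (value, label) pairs with exactly two distinct labels.
    Returns (threshold, label_below, label_above).
    """
    values = sorted({v for v, _ in samples})
    labels = sorted({lab for _, lab in samples})
    best = None
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        # Try both ways of assigning the two labels to the two sides.
        for below, above in (labels, labels[::-1]):
            errors = sum(1 for v, lab in samples
                         if (below if v <= threshold else above) != lab)
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    return best[1:]

def classify(stump, value):
    threshold, below, above = stump
    return below if value <= threshold else above

# Hypothetical training data: (miles per gallon, mileage class).
cars = [(12, "low"), (15, "low"), (18, "low"),
        (28, "high"), (33, "high"), (40, "high")]
stump = train_stump(cars)
```

The learned model reads as the rule "if mileage <= 23 then low, else high", and `classify(stump, 20)` applies it to a car that was not in the training data.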