Data and Databases - PowerPoint PPT Presentation

Slides: 28
Provided by: madh

Transcript and Presenter's Notes



1
Data and Databases
2
The Data Basics
  • Data
  • Facts concerning things such as people, objects,
    or events
  • Information
  • data that have been processed and presented in a
    form suitable for human interpretation
  • Database
  • a collection of interrelated, shared, and
    controlled data

3
Drawbacks of the Traditional File Processing System
  • Data Redundancy
  • Program-Data Dependence
  • Inflexibility
  • Poor Data Security
  • Lack of Data Sharing
  • Lack of Data Standards

4
Modern Database Systems
[Diagram: the Accounting, Finance, and Sales departments each run their own application programs, which all access a single integrated database through the DBMS]

5
Advantages of Modern Database Environments
  • Minimal data redundancy
  • Data consistency
  • Integration of data
  • Data sharing
  • Ease of application development
  • Security, privacy, and integrity controls
  • Data accessibility and responsiveness
  • Data independence
  • Reduced program maintenance

6
Drawbacks of Modern Database Environments
  • Need for new specialized personnel
  • Need for explicit backup
  • because of minimal data redundancy
  • Interference with shared data
  • concurrent access is a problem
  • Organizational conflict

7
Data Storage
8
Data Representation
  • Binary digit (bit)
  • String of bits (Byte), normally 8 bits
  • EBCDIC vs. ASCII
  • Picture Element (Pixel)
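The bit/byte and EBCDIC vs. ASCII points can be made concrete with a small sketch. It prints the 8 bits that encode the same characters under each code. It assumes Python's built-in `cp037` codec as a representative EBCDIC code page (US/Canada). Other EBCDIC variants exist, and `show_encoding` is a hypothetical helper.

```python
# Each character below is stored as one byte (8 bits); the bit pattern
# depends on which character code is used.
# Assumption: "cp037" (Python's EBCDIC US/Canada codec) stands in for EBCDIC.

def show_encoding(text: str, codec: str) -> list[str]:
    """Return each encoded byte of `text` as an 8-bit binary string."""
    return [format(b, "08b") for b in text.encode(codec)]

for codec in ("ascii", "cp037"):
    print(codec, show_encoding("A1", codec))
# ascii ['01000001', '00110001']
# cp037 ['11000001', '11110001']
```

The same letter "A" is 0x41 in ASCII but 0xC1 in EBCDIC. This is why data moved between the two worlds must be translated, not just copied.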

9
Data Storage
  • In the Web era, data is piling up quickly, so storage space is at a
    premium
  • Storage solutions
  • Server-hosted storage
  • SCSI Arrays
  • Network Attached Storage (NAS)
  • Storage Area Networks (SAN)

10
Server-hosted storage
  • Both applications and storage on same server
  • Advantages
  • Server, OS, and storage all from the same vendor
  • Easy to replicate
  • Disadvantages
  • Expansion limited by server architecture (may
    need to replace existing media)
  • Free space on one server not easily accessed by
    another server
  • Maintenance affects server and storage (CPUs
    become obsolete before storage)

11
SCSI Arrays (Small Computer System Interface, pronounced "scuzzy")
  • SCSI interfaces allow faster data transmission than traditional serial
    and parallel ports.
  • In a survey by InfoWorld, 67% of those surveyed were using SCSI arrays
    for storage
  • Often used with RAID
  • Advantages
  • Embedded computer to manage configuration and
    monitor performance
  • Can be made fault-tolerant
  • SCSI cable offers good throughput
  • Disadvantages
  • Expansion difficult once space is used
  • Significant cabling layout costs (SCSI cable length is limited)
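SCSI arrays "can be made fault-tolerant", often by pairing them with RAID. The sketch below shows the XOR-parity idea behind parity-based RAID levels such as RAID 5, assuming a simplified layout with one dedicated parity block. Real RAID 5 rotates parity across disks. The block contents and the helper `xor_blocks` are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical data blocks striped across three disks.
data = [b"ACCT", b"FINA", b"SALE"]
parity = xor_blocks(data)  # stored on a fourth disk

# Simulate losing disk 1: XOR of the survivors and the parity rebuilds it.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'FINA'
```

Because x ^ x = 0, XOR-ing the parity with every surviving block cancels them out and leaves the missing block. The array can therefore survive the loss of any one disk.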

12
Network Attached Storage (NAS)
  • A server that is dedicated to nothing more than
    file sharing.
  • Devices can be plugged into LAN using standard
    network cables and accessed by client PCs via a
    NAS gateway
  • Advantages
  • Easiest and cheapest of the storage solutions listed
  • Pre-configured with OS tailored for data handling
  • Capacity ranges from a few GB to several TB
  • Easy to connect
  • Faulty components can be changed without downtime
  • Disadvantages
  • Adds burden to LAN traffic
  • Access speed limited by bandwidth
  • Each NAS device has to be managed independently

13
Storage Area Networks (SAN)
  • Dedicated network of servers and storage devices
  • Uses hubs and switches
  • No limit to number of storage servers
  • Uses fiber, so it can extend over long distances with good
    bandwidth
  • Easy to set up, but needs special adaptors
  • Works with any OS
  • Easy migration from old systems

14
Storage Virtualization
  • Pooling of physical storage from multiple network storage devices
    into what appears to be a single storage device that is managed
    from a central console (source: whatis.com)
  • For more information, visit http://www.storage.com
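The pooling idea in the definition above can be sketched as an address map. Several devices are concatenated into one logical block range, and a logical block number is translated into a (device, physical block) pair. This is a hypothetical illustration: the device names, block counts, and the simple concatenation scheme are assumptions. Real virtualization layers also stripe, mirror, and migrate data.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str    # hypothetical device identifier
    blocks: int  # capacity in blocks

class StoragePool:
    """Present several devices as one logical block address space (concatenated)."""

    def __init__(self, devices: list[Device]) -> None:
        self.devices = devices

    @property
    def capacity(self) -> int:
        return sum(d.blocks for d in self.devices)

    def locate(self, logical_block: int) -> tuple[str, int]:
        """Map a logical block number to (device name, physical block on that device)."""
        if not 0 <= logical_block < self.capacity:
            raise IndexError("logical block out of range")
        offset = logical_block
        for device in self.devices:
            if offset < device.blocks:
                return device.name, offset
            offset -= device.blocks
        raise AssertionError("unreachable: range already checked")

pool = StoragePool([Device("nas-1", 100), Device("san-lun-7", 250), Device("server-disk", 50)])
print(pool.capacity)     # 400
print(pool.locate(120))  # ('san-lun-7', 20)
```

Clients see only one 400-block "device". Which physical unit holds a given block is decided by the central mapping.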

15
Data Warehouses and Data Mining
16
Data Requirements
  • Organizations need access to
  • operational data
  • historical data
  • legacy data
  • subscription databases
  • internet data
  • Organizations need to
  • combine data, slice and dice, do complex
    analysis...

17
Data Warehouses
  • Aimed at supporting all levels of analysis and
    information formats
  • Decision support systems (DSS) have existed for many years
  • The label "data warehouse" arrived in the 1990s, and top
    executives began to take notice
  • Many different definitions (some relating to
    data, others to people or processes)

18
Simple Definition
A data warehouse is a collection of integrated,
subject-oriented databases designed to support
the decision support function, where each unit of
data is relevant to some moment in time.
19
Four Defining Concepts
  • Subject-oriented
  • Integrated
  • Time-variant
  • Non-volatile

20
Concepts....
  • Subject-oriented
  • requires database design
  • revolves around specific business entities
  • many companies simply pull together old files
  • Integrated data
  • data warehouse database designed using a proper
    methodology
  • consistency in naming conventions for keys,
    relationships etc.
  • warehouses require large design effort

21
Concepts...
  • Time-variant
  • data warehouse design organizes data by different
    time periods
  • fundamental to temporal analysis
  • usually years or quarters or months
  • Non-volatile
  • not updated in real-time
  • staged into warehouse on a nightly/weekly basis
  • users cannot update the data (in DW) directly
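The time-variant and non-volatile properties can be illustrated with a toy fact store. Rows arrive only through periodic batch loads, are immutable once loaded, and are analyzed by period. This is a minimal sketch under stated assumptions: the `SalesFact` schema, the `Warehouse` class, and quarterly grouping are all hypothetical. The optional region filter stands in for "slicing" the data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)  # frozen: a loaded row cannot be changed in place
class SalesFact:
    sale_date: date
    region: str
    amount: float

class Warehouse:
    """Append-only store: batches are staged in; no update or delete method exists."""

    def __init__(self) -> None:
        self._facts: list[SalesFact] = []

    def load_batch(self, batch: list[SalesFact]) -> None:
        """Stage one periodic (e.g. nightly) batch by appending it."""
        self._facts.extend(batch)

    def totals_by_quarter(self, region: Optional[str] = None) -> dict[tuple[int, int], float]:
        """Sum amounts per (year, quarter), optionally sliced to one region."""
        totals = defaultdict(float)
        for f in self._facts:
            if region is not None and f.region != region:
                continue
            quarter = (f.sale_date.month - 1) // 3 + 1
            totals[(f.sale_date.year, quarter)] += f.amount
        return dict(totals)

wh = Warehouse()
wh.load_batch([SalesFact(date(2023, 1, 15), "East", 100.0),
               SalesFact(date(2023, 2, 3), "West", 50.0)])
wh.load_batch([SalesFact(date(2023, 4, 9), "East", 75.0)])
print(wh.totals_by_quarter())        # {(2023, 1): 150.0, (2023, 2): 75.0}
print(wh.totals_by_quarter("East"))  # {(2023, 1): 100.0, (2023, 2): 75.0}
```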

22
Data Mining
True genius resides in the capacity for
evaluation of uncertain, hazardous, and often
conflicting information - Sir Winston Churchill
23
What is data mining?
  • Large databases can be searched for relationships, patterns, and
    trends which, prior to the search, were not known to exist.
  • Data mining is the process of asking a processing
    engine to show answers to questions that we do
    not know how to ask.

24
Data Mining techniques
  • Four major types of processing algorithms (or
    rules)
  • associations
  • clustering
  • classification
  • sequential patterns

25
1. Associations (Link Analysis)
  • Find correlations between one set of items or
    events and another such set
  • e.g., 78% of all people who buy a desktop PC also buy
    add-ons
  • e.g., a large percentage of buyers will buy potato chips if they
    are stacked near the beverage aisle
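Association results like the desktop PC example are usually expressed through two measures. Support is the share of all transactions containing both items. Confidence is the share of transactions with the first item that also contain the second. This sketch computes both measures and finds frequent item pairs, a simplified form of what association-rule algorithms such as Apriori do. The baskets are hypothetical, and "printer" stands in for an add-on.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket data: one set of purchased items per transaction.
transactions = [
    {"desktop PC", "monitor", "printer"},
    {"desktop PC", "printer"},
    {"desktop PC"},
    {"beverages", "potato chips"},
    {"beverages"},
]

def frequent_pairs(baskets: list[set[str]], min_support: float) -> dict[tuple[str, str], float]:
    """Support of every item pair found in at least `min_support` of the baskets."""
    counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def rule_stats(antecedent: str, consequent: str, baskets: list[set[str]]) -> tuple[float, float]:
    """(support, confidence) of the rule 'antecedent -> consequent'."""
    with_antecedent = [b for b in baskets if antecedent in b]
    with_both = [b for b in with_antecedent if consequent in b]
    support = len(with_both) / len(baskets)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence

print(frequent_pairs(transactions, min_support=0.4))      # {('desktop PC', 'printer'): 0.4}
print(rule_stats("desktop PC", "printer", transactions))  # (0.4, 0.6666666666666666)
```

A statement like "N% of desktop PC buyers also buy add-ons" corresponds to the confidence of the rule "desktop PC -> add-on".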

26
2. Clustering
  • Used to discover previously unknown or unsuspected classes of
    data
  • Defect analysis or group affinity analysis
  • e.g., find a common characteristic shared by good customers who
    cancel their own credit cards
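Clustering groups records by similarity without predefined classes. The sketch below uses k-means, one common clustering algorithm, which the slides do not name. It is shown on hypothetical customer records with two numeric features. As a simplification, the first k points serve as starting centroids. Production tools use smarter initialization and feature scaling.

```python
import math

Point = tuple[float, float]

def kmeans(points: list[Point], k: int, iterations: int = 20) -> list[int]:
    """Plain k-means clustering; returns a cluster label for each point.

    Simplification: the first k points are used as the starting centroids.
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels

# Hypothetical customers described by (monthly spend in $1000s, years as customer).
customers = [(0.2, 1.0), (5.0, 9.0), (0.25, 1.5), (4.8, 8.5), (0.18, 2.0), (5.1, 9.5)]
print(kmeans(customers, k=2))  # [0, 1, 0, 1, 0, 1]
```

The algorithm is never told which group is which. The two segments emerge from the data, which is the "hitherto unknown class" the slide describes.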

27
3. Classification
  • Discovers the rules that determine whether an item belongs to a
    particular subset of data (a subtype)
  • e.g., credit card approval
  • do a customer's characteristics place him or her in the subset of
    customers who can be approved to charge?
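Classification learns a rule from records whose class is already known, then applies it to new records. The credit-approval case can be sketched with the simplest possible learner, a one-attribute threshold rule (a "decision stump"). Real systems typically use richer methods such as decision trees over many attributes. The income feature, the data, and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    income: float   # hypothetical feature: annual income in $1000s
    approved: bool  # known outcome from historical data

def learn_cutoff(history: list[Applicant]) -> float:
    """Learn the rule 'approve if income >= cutoff' by picking the cutoff
    that misclassifies the fewest historical applicants."""
    best_cutoff, best_errors = 0.0, len(history) + 1
    for cutoff in sorted({a.income for a in history}):
        errors = sum((a.income >= cutoff) != a.approved for a in history)
        if errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff

def classify(income: float, cutoff: float) -> bool:
    """Apply the learned rule to a new applicant."""
    return income >= cutoff

past_applicants = [Applicant(18.0, False), Applicant(25.0, False), Applicant(40.0, True),
                   Applicant(55.0, True), Applicant(32.0, False), Applicant(47.0, True)]
cutoff = learn_cutoff(past_applicants)
print(cutoff)                  # 40.0
print(classify(36.0, cutoff))  # False
```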

28
4. Sequential Patterns
  • Mostly used for analyzing patterns that unfold over time
  • Uses the historical store of all transactions in a warehouse
  • e.g., buyers who purchase window coverings and then buy linens
    within three months will purchase furniture within the next 12
    months (a new-residence furnishing pattern)
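Checking a customer's history against the window coverings, linens, furniture pattern means finding the purchases in order, each within its time limit of the previous one. This sketch makes several assumptions: "three months" is treated as 90 days and "12 months" as 365 days, each measured from the previous step. The purchase data and names are hypothetical. The search backtracks over candidates, because always taking the earliest match can miss a valid sequence.

```python
from datetime import date, timedelta
from typing import Optional

Purchase = tuple[date, str]             # (purchase date, product category)
Step = tuple[str, Optional[timedelta]]  # (category, max gap after previous step)

# Assumed encoding of the slide's pattern (months approximated in days).
NEW_RESIDENCE: list[Step] = [
    ("window coverings", None),
    ("linens", timedelta(days=90)),
    ("furniture", timedelta(days=365)),
]

def matches(purchases: list[Purchase], steps: list[Step], prev: Optional[date] = None) -> bool:
    """True if the purchases contain the steps in order, each strictly after the
    previous step and within its allowed gap; tries every candidate purchase."""
    if not steps:
        return True
    category, max_gap = steps[0]
    for when, cat in sorted(purchases):
        if cat != category:
            continue
        if prev is not None:
            if when <= prev:
                continue
            if max_gap is not None and when - prev > max_gap:
                continue
        if matches(purchases, steps[1:], when):
            return True
    return False

customer_history = {
    "c1": [(date(2023, 1, 10), "window coverings"), (date(2023, 3, 1), "linens"),
           (date(2023, 11, 20), "furniture")],
    "c2": [(date(2023, 1, 10), "window coverings"), (date(2023, 8, 1), "linens")],
}
print([c for c, p in customer_history.items() if matches(p, NEW_RESIDENCE)])  # ['c1']
```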