Modern Data Warehousing, Mining, and Visualization: Core Concepts

Author: Jack Becker. Created: 11/30/2004.

1
Chapter 7: The Future of Data Mining,
Warehousing, and Visualization
  • Modern Data Warehousing, Mining, and
    Visualization: Core Concepts

4
7-1 The Future of Data Warehousing
  • As a DW becomes a mature part of an organization,
    it is likely that it will become as transparent
    as any other part of the IS.
  • One challenge is devising a workable set of rules
    that ensures privacy while still facilitating the
    use of large data sets.
  • Another is the need to store unstructured data
    such as multimedia, maps and sound.
  • The growth of the Internet allows integration of
    external data into a DW, but its varying quality
    is likely to lead to the evolution of third-party
    intermediaries whose purpose is to rate data
    quality.

5
Predicting the Future
  • In a technology-intensive area, it doesn't pay to
    get too far ahead of the curve.
  • The past is the best prelude to history.
  • Old example: Josephson junctions.
  • A switching element based on superconductivity,
    rendered useless by ICs.
  • New example: quantum computing.
  • We're inventing clever algorithms for a device
    that may well never exist.

6
The data explosion
  • The amount of data stored in electronic storage
    media is increasing at a fast pace.
  • UC Berkeley estimated that 5 Exabytes of new data
    were generated in 2002.
  • The amount of data doubles every 18-24 months.
  • 1 Exabyte = 1 billion Gigabytes
  • It took 300,000 years for humans to accumulate 12
    Exabytes of information; it took only 2.5 years
    more for the next 12 Exabytes.
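
As a rough illustration of what that doubling time implies, the sketch below converts it into a yearly growth rate. It assumes smooth exponential growth, which the slide does not state, and the function names are purely illustrative.

```python
def annual_growth_rate(months_to_double):
    """Yearly growth rate implied by a given doubling time in months."""
    return 2 ** (12 / months_to_double) - 1


def growth_factor(months_elapsed, months_to_double):
    """How many times larger the data volume is after months_elapsed."""
    return 2 ** (months_elapsed / months_to_double)


# The slide's range: doubling every 18 to 24 months.
fast_rate = annual_growth_rate(18)   # roughly 59% per year
slow_rate = annual_growth_rate(24)   # roughly 41% per year

# Growth over a decade under the slower assumption: 120 / 24 = 5 doublings.
decade_factor = growth_factor(120, 24)
print(f"{slow_rate:.0%} to {fast_rate:.0%} per year; x{decade_factor:.0f} in ten years")
```

Even at the slower end of the range, five doublings in a decade mean a 32-fold increase.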

7
A guide to collective names for scientific units
  • Kilo 10^3
  • Mega 10^6
  • Giga 10^9
  • Tera 10^12
  • Peta 10^15
  • Exa 10^18
  • Zetta 10^21
  • Yotta 10^24
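
The prefixes can be used to cross-check the figures quoted on the neighbouring slides. The following is a minimal sketch; the dictionary and the `convert` helper are illustrative names, and decimal (power-of-ten) prefixes are assumed throughout.

```python
# Decimal (SI) prefixes from the slide, expressed as powers of ten.
SI_PREFIXES = {
    "kilo": 10**3,
    "mega": 10**6,
    "giga": 10**9,
    "tera": 10**12,
    "peta": 10**15,
    "exa": 10**18,
    "zetta": 10**21,
    "yotta": 10**24,
}


def convert(value, from_prefix, to_prefix):
    """Convert a quantity between two prefixes, e.g. exabytes to gigabytes."""
    return value * SI_PREFIXES[from_prefix] / SI_PREFIXES[to_prefix]


# Figures quoted elsewhere in the slides:
gigabytes_per_exabyte = convert(1, "exa", "giga")      # 1 exabyte in gigabytes
idc_2007_exabytes = convert(281e9, "giga", "exa")      # 281 billion gigabytes in exabytes
cisco_2013_zettabytes = convert(667, "exa", "zetta")   # 667 exabytes in zettabytes
```

Under these assumptions, one exabyte is a billion gigabytes, 281 billion gigabytes is 281 exabytes, and 667 exabytes is about two-thirds of a zettabyte, which matches the slides.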

8
The data explosion
  • In March 2007, an IDC study reported that 161
    Exabytes of new data were generated in the year
    2006. At the same time, 185 Exabytes of storage
    were available.

9
The data explosion
  • In March 2008, another IDC study reported that,
    at 281 billion gigabytes (281 exabytes), the
    digital universe in 2007 was 10% bigger than
    originally estimated!
  • http://www.emc.com/digital_universe

10
The data explosion
  • According to the June 2009 update of the Cisco
    Visual Networking Index IP traffic forecast, by
    2013, annual global IP traffic will reach
    two-thirds of a zettabyte or 667 exabytes.
  • Internet video will generate over 18 exabytes per
    month in 2013.
  • Global mobile data traffic will grow at a CAGR of
    131 percent between 2008 and 2013, reaching over
    two exabytes per month by 2013.
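
To see what a 131 percent CAGR means in practice, the sketch below works backwards from the 2013 figure. It assumes five annual compounding periods between 2008 and 2013 and treats "over two exabytes" as exactly 2.0; both are simplifications, and the function names are illustrative.

```python
def growth_multiplier(cagr, years):
    """Total growth factor after compounding at rate `cagr` for `years` years."""
    return (1 + cagr) ** years


def implied_start(end_value, cagr, years):
    """Starting value that reaches `end_value` after compounding."""
    return end_value / growth_multiplier(cagr, years)


YEARS = 2013 - 2008          # assumption: five annual compounding periods
CAGR = 1.31                  # 131 percent, from the forecast
END_EB_PER_MONTH = 2.0       # "over two exabytes per month" taken as 2.0

multiplier = growth_multiplier(CAGR, YEARS)                  # about 66-fold
start_eb = implied_start(END_EB_PER_MONTH, CAGR, YEARS)      # in exabytes per month
print(f"x{multiplier:.1f} overall; 2008 baseline of about {start_eb * 1000:.0f} petabytes per month")
```

Under these assumptions the forecast implies roughly a 66-fold increase in mobile traffic over five years.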

11
Long Term, What Does a Database Person Care
About?
  • What is the largest amount of data we can deal
    with? Terabytes (10^12)? Petabytes (10^15)?
    Exabytes (10^18)?
  • What can we do with it?
  • How?
  • There are lots of new places with big data:
  • The Web
  • Scientific databases
  • Digital libraries

12
Long-Lived Themes
  • Very high-level query languages.
  • If you are going to deal with very large amounts
    of data, there has to be a lot of uniformity in
    what you do.
  • SQL-based user interfaces, like QBE in Access,
    will be central to the future of Data Warehouses.
  • Query optimization.
  • The success of a very high-level language depends
    on the ability to produce efficient
    implementations.
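
The link between a high-level query language and query optimization can be illustrated with Python's built-in `sqlite3` module. This is only a sketch: the `sales` table, its columns, and the index name are hypothetical, and SQLite stands in for a real warehouse engine. The same declarative query is left unchanged while the engine is given a better access path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)

# The query says WHAT is wanted, not HOW to fetch it.
query = "SELECT SUM(amount) FROM sales WHERE region = ?"


def plan(sql, params):
    """Return the optimizer's chosen plan as one string (last column is the detail text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query, ("east",))   # no index yet, so the table is scanned
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
plan_after = plan(query, ("east",))    # the optimizer can now use the index

total_east = conn.execute(query, ("east",)).fetchone()[0]
print(plan_before)
print(plan_after)
print(total_east)
```

The query text never changes; only the plan does, which is the sense in which a high-level language depends on the optimizer for efficiency.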

13
Some Good, New Directions
  • Languages and systems for automating the process
    of integrating databases.
  • Everyone acts as if this problem were solved, but
    it is not.
  • Stream data collection and processing.
  • In many applications, data whizzes by so fast
    that storage and processing are limited.
  • E.g., telecom billing, intrusion detection, etc.
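
A minimal sketch of the stream-processing idea: each event is seen once, and only the events inside a recent time window are kept in memory. The class name and the ten-second window are illustrative assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events in the last `window_seconds`, storing only those events."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, timestamp):
        """Record one event and return the number of events still in the window."""
        self.timestamps.append(timestamp)
        self._expire(timestamp)
        return len(self.timestamps)

    def _expire(self, now):
        # Drop events that are a full window or more older than `now`.
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()


counter = SlidingWindowCounter(window_seconds=10)
counts = [counter.add(t) for t in [0, 1, 2, 11, 12, 30]]
print(counts)
```

Memory use is bounded by the number of events in one window rather than by the length of the stream, which is the constraint the slide describes.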

14
More New Directions
  • New kinds of data
  • e.g., images, audio.
  • Data mining
  • SAS Enterprise Miner-style GUIs
  • Automation of database design and tuning.
  • Exploiting new architectures
  • Parallel database machines.
  • Peer-to-peer and distributed systems.

15
Integrated Architecture
  • Historically, market and business forces have
    moved organizations toward ineffective
    nonintegrated DW systems.
  • Far too often, a silo DW simply replaces a silo
    OLTP system.
  • To survive in a future world of low-cost, turnkey
    application systems, the transition to a
    federated architecture must be made.

16
Typical Nonintegrated Information Architecture
17
Federated Integrated Information Architecture
[Diagram labels: i2 Supply Chain; Oracle Financials; Siebel CRM; 3rd Party Data; Common Data Staging Area; Federated Supply Chain Data Mart; Federated Financial DW; Federated Marketing DW; Subset Non-Architected Data Marts]
18
Future
  • The future of data warehousing is clearly
    multi-faceted.
  • There is a lot of blurring today with CRM,
    Enterprise Systems and E-commerce initiatives.
  • Data warehousing is really becoming the method
    for storing analytic-capable data for all these
    applications and more, many of which are
    packaged.
  • Architectures will need to be more tightly
    integrated.
  • E-commerce is cranking up data volumes.

19
Customer Relationship Management (CRM)
  • Whether it's for traditional catalog sales, 24 x
    7 customer support, or day-to-day banking,
    consumers demand that their suppliers support
    consistent, unified customer interactions across
    multiple communications channels, including
    voice, fax, Web-based email, personal
    interaction, and browsers.
  • In addition, aggressive competition for market
    share, together with the need to increase
    profitability from each customer transaction,
    has created a pressing need for enterprise-wide
    customer relationship management (CRM)
    solutions.
  • A successful CRM solution requires the
    integration of all types of customer interactions
    with enterprise-wide business functions,
    including sales, marketing, customer service, and
    provisioning.

20
Enterprise Resource Planning (ERP) System
[Diagram labels: Vendors; Customers]
21
Benefits Of Enterprise Systems
  • FIRM STRUCTURE AND ORGANIZATION: One organization
  • MANAGEMENT: Firm-wide knowledge-based management
    processes
  • TECHNOLOGY: Unified platform
  • BUSINESS: More efficient operations and
    customer-driven business processes

22
Challenges Of Enterprise Systems
  • Daunting implementation
  • High up-front costs and future benefits
  • Inflexibility
  • Hard to realize strategic value

23
7-2 Alternate Storage and the Data Warehouse
  • Surprisingly, the future of data warehousing is
    not high-performance disk storage, but an array
    of alternative storage.
  • This involves two forms of alternative storage:
  • Near-line storage, which involves an automated
    silo where tape cartridges are handled
    automatically.
  • Secondary storage, which is slower and less
    expensive, such as CD-ROMs and floppy disks.
  • Firms like Teradata, Inc., Storage Technology
    Corp. (STC) and others specialize in high-volume
    storage systems.

24
Speed and Capacity of Various Near-Line Storage Media
Device          | Capacity       | Data Access Speed           | Media Lifetime | Write Once (WO) or Write Many (WM)
DAT DDS2        | 4-8 Gbyte      | 510 Kbyte/s                 | 10-25 Yrs      | WM
DAT DDS3        | 12-24 Gbyte    | 1 Mbyte/s                   | 10-25 Yrs      | WM
CD-ROM          | 640 Mbyte      | X times 1.5 Mbits/s to Read | 10 Yrs Plus    | WO
CD-RW           | 640 Mbyte      | X times 1.5 Mbits/s to Read | 10 Yrs Plus    | WM
Exabyte         | 20-40 Gbyte    | 3-6 Mbyte/s                 | 10-25 Yrs      | WM
DLT Tape        | 35 Gbyte       | 5 Mbyte/s                   | 30 Yrs         | WM
DVD             | up to 15 Gbyte | Not Known                   | Not Known      | WO
DTF Tape        | 42 Gbyte       | 12 Mbyte/s                  | 10-25 Yrs      | WM
Data D3         | 50 Gbyte       | 12 Mbyte/s                  | 10-25 Yrs      | WM
DVD-RAM         | up to 3 Gbyte  | Not Known                   | Not Known      | WM
Magneto-optical | 2.6-5.6 Gbyte  | Not Known                   | Not Known      | WM
25
Typical Near-Line Tape Storage Silo
26
Why Use Alternative Storage?
  1. The data in a DW are stable. They are placed
    there once and left alone, so do not need to be
    updated at high speed.
  2. The queries that operate on the DW data often
    require long streams of data stored sequentially.
    Operational access, by contrast, requires
    different units of data from different storage
    areas.
  3. The DW is of indeterminate size and is always
    increasing in volume, requiring flexible
    capacity.
  4. As data ages and is accessed less often, it can
    be moved to secondary storage, making access to
    newer data more efficient.

27
To make this two-level storage work, we need both
an Activity Monitor (shown here) and a Cross
Media Storage Monitor (manages traffic between
active storage and alternative storage).
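
A minimal sketch of what an activity monitor might track. The slides do not describe its internals, so the class, the partition names, and the 90-day threshold below are all hypothetical: the monitor records the last access per partition and reports the ones that have gone cold and are candidates for alternative storage.

```python
from dataclasses import dataclass, field


@dataclass
class ActivityMonitor:
    """Tracks the most recent access day per partition (hypothetical design)."""

    cold_after_days: int
    last_access: dict = field(default_factory=dict)

    def record_access(self, partition, day):
        # Keep the latest access day seen for this partition.
        previous = self.last_access.get(partition, day)
        self.last_access[partition] = max(day, previous)

    def cold_partitions(self, today):
        """Partitions not accessed for at least `cold_after_days` days."""
        return sorted(
            name
            for name, day in self.last_access.items()
            if today - day >= self.cold_after_days
        )


monitor = ActivityMonitor(cold_after_days=90)
monitor.record_access("sales_2003", day=10)
monitor.record_access("sales_2004", day=200)
monitor.record_access("sales_2003", day=40)

cold = monitor.cold_partitions(today=220)
print(cold)
```

A cross media storage monitor would then act on this list by moving the cold partitions to near-line or secondary storage.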
28
7-3 Trends in Data Warehousing
  • Customer interaction and learning relationships
    require capturing information everywhere and
    massive scalability.
  • Enterprise applications generate data that is
    doubling every 9-12 months.
  • The time available for working with data is
    shrinking and the need for 24/7 access is
    becoming the norm.
  • Fast implementation and ease of management are
    becoming more and more important.
  • In the future, more organizations will build Web
    applications that operate in conjunction with the
    DW.

29
7-4 The Future of Data Mining
  • As promising as the field may be, it has
    pitfalls:
  • The quality of data can make or break the data
    mining effort.
  • In order to mine the data, companies first have
    to integrate, transform and cleanse it.
  • To obtain value from data mining, organizations
    must be able to change their mode of operation
    and maintain the effort (agile corporations).
  • Finally, there are concerns about privacy.

30
Personalization versus Privacy
  • Companies that use data mining for target
    marketing walk a tightrope between
    personalization and privacy.
  • Implementation of the recent FTC guidelines about
    information practices can be a problem since
    companies often do not know how they will use
    information ahead of time. Signed releases from
    customers are increasingly required.
  • Further, technology appears to create new ways to
    acquire information faster than the legal system
    can handle the ethical and property issues.

31
7-5 Using Data Mining to Protect Privacy
  • While Internet use has grown, so have the
    problems of network intrusion.
  • One current intrusion detection technique is
    misuse detection: scanning for known malicious
    activity patterns, known as signatures.
  • Another technique is anomaly detection, which
    attempts to identify malicious activity based on
    deviations from norms.
  • Most intrusion detection systems operate by the
    signature approach.
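
The two techniques can be contrasted in a minimal sketch. The signature strings, the traffic counts, and the three-standard-deviation threshold are illustrative assumptions, not taken from any real intrusion detection system.

```python
from statistics import mean, stdev

# Hypothetical signature list; real systems maintain far larger ones.
KNOWN_SIGNATURES = ["/etc/passwd", "' OR 1=1"]


def misuse_detect(payload, signatures=KNOWN_SIGNATURES):
    """Misuse detection: flag a payload containing any known signature."""
    return any(sig in payload for sig in signatures)


def anomaly_detect(value, baseline, threshold=3.0):
    """Anomaly detection: flag a value far from the baseline's norm."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Requests per minute observed during normal operation (made-up numbers).
baseline = [10, 12, 11, 9, 10, 11, 12, 10]

known_attack = misuse_detect("GET /etc/passwd")      # matches a signature
variant_attack = misuse_detect("GET /etc//passwd")   # small change, no match
traffic_spike = anomaly_detect(50, baseline)         # far outside the norm
normal_minute = anomaly_detect(11, baseline)         # within the norm
```

The variant request slips past the signature check even though it differs by a single character, while the anomaly check needs no signature at all but depends entirely on how well the baseline reflects normal activity.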

32
Shortfalls of Current Detection Schemes
  • Variants: although signature lists are updated
    frequently, minor changes in the exploit code
    can produce a new undetected intruder.
  • False positives: a detection system may be too
    conservative and declare an intrusion when there
    is none.
  • E.g., intruder scoring techniques for email
  • False negatives: an intrusion won't be detected
    until a signature has been identified.
  • Data overload: as traffic grows, finding new
    hacks becomes harder and harder.

33
How Can Data Mining Help?
  • Data mining can help mainly by its ability to
    identify patterns of valid network activity.
  • Variants: anomalies can be detected by comparing
    connection attempts to lists of known traffic.
  • False positives: data mining can be used to
    identify recurring patterns of false alarms.
  • False negatives: if valid activity patterns are
    identified, invalid activity will be easier to
    spot.
  • Data overload: data reduction is one of the
    major features of data mining.
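
One simple reading of the false-positive point is frequency counting over past alarms. The sketch below assumes a hypothetical alarm log in which each record carries a `rule`, a `source`, and a `confirmed` flag set by an analyst; none of these field names come from the slides.

```python
from collections import Counter


def recurring_false_alarms(alarms, min_count=3):
    """Return (rule, source) pairs that repeatedly fired without being confirmed."""
    counts = Counter(
        (alarm["rule"], alarm["source"])
        for alarm in alarms
        if not alarm["confirmed"]
    )
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Hypothetical alarm log.
alarms = [
    {"rule": "port-scan", "source": "10.0.0.5", "confirmed": False},
    {"rule": "port-scan", "source": "10.0.0.5", "confirmed": False},
    {"rule": "sql-injection", "source": "203.0.113.9", "confirmed": True},
    {"rule": "port-scan", "source": "10.0.0.5", "confirmed": False},
    {"rule": "port-scan", "source": "10.0.0.7", "confirmed": False},
]

recurring = recurring_false_alarms(alarms, min_count=3)
print(recurring)
```

Pairs that keep raising unconfirmed alarms are candidates for tuning or suppression, which also reduces the volume of data an analyst has to review.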

34
7-6 Trends Affecting the Future of Data Mining
  • While the available data increases exponentially,
    the number of new data analysts graduating each
    year has been fairly constant. Either a lot of
    data will go unanalyzed or automatic procedures
    will be needed.
  • Increases in hardware speed and capacity make it
    possible to analyze data sets that were too large
    just a few years ago.
  • The next generation Internet will connect sites
    100 times faster than current speeds.
  • To be more profitable, businesses will need to
    react more quickly and offer better service, and
    do it all with fewer people and at a lower cost.

35
7-7 The Future of Data Visualization
  • Weapons performance and safety
  • Data visualization coupled with simulation models
    can show how weapons perform under typical
    conditions and the effect of weapons aging.
  • Medical trauma treatment
  • Today's surgeons use computer vision to assist in
    surgery. This trend suggests that, in the future,
    local medical personnel can also be assisted from
    afar by specialists through telepresence.
  • X-ray transmission resolution is now at
    acceptable limits.

36
Visualization of a Simulated Warhead Impact
37
Augmented-reality Headset Worn by Surgeon
38
Surgery Being Conducted Via Telepresence
39
7-8 Components of Future Visualization
Applications
  • The data visualization environment links the
    critical components and enables the smooth flow
    of information among the components.
  • In the future, the bounds between computers,
    graphics and human knowledge will become more
    blurred.
  • Many advances in technology will be needed to
    handle the visualization environment of the
    future.
  • Intelligent file systems and data management
    software will contend with thousands of coupled
    storage devices.

40
Conceptual Mapping of an Information Architecture
[Diagram labels: Enterprise Network; Enterprise Metadata System; Metadata Browser; Global Query System; System Simulation; Information Modeler; Enterprise Metadatabase; Visualization Environment; Visual Interpreter; Visualization Interface Management System]
41
Cooperation Between Statistical Analysis And Data
Mining
  • The enhancement of data mining techniques with
    mature statistical methods may produce
    interesting new techniques which may work well
    with different kinds of problems and on different
    data.
  • For example, statistical techniques may help in
    judging the interestingness and significance of
    rules.
  • E.g., Neural Networks
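
As one concrete example of a statistical judgment on a rule, the sketch below computes support, confidence, and lift for an association rule. The slides do not name these measures, so choosing them is an assumption, and the shopping-basket data is made up.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    with_antecedent = sum(1 for t in transactions if antecedent <= t)
    with_consequent = sum(1 for t in transactions if consequent <= t)
    with_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if with_antecedent == 0 or with_consequent == 0:
        raise ValueError("rule sides must each occur in at least one transaction")

    support = with_both / n
    confidence = with_both / with_antecedent
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (with_consequent / n)
    return support, confidence, lift


# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

support, confidence, lift = rule_metrics(transactions, {"bread"}, {"milk"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift below 1, as in this toy data, suggests that the rule is no more informative than the base rate of the consequent, so it would not be judged interesting despite its reasonable confidence.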

42
Multidimensional Rule Use Visualization
Techniques
  • Discovering knowledge is not enough because it
    has to be presented in a manner that the user can
    understand easily.
  • One of the most effective ways of digesting the
    rules discovered is through graphical
    visualizations.
  • Humans are very good at interpreting visual data
    and scenes.
  • This fact should be exploited in the data mining
    process.

See http://www.cs.uml.edu/phoffman/kdd/miv.htm
43
Intelligent GIS
  • Methods for mining spatial data should be
    combined with advanced spatial databases, such as
    object-oriented spatial databases and
    spatial-temporal databases, as well as
    statistical analysis, spatial reasoning, and
    expert system technology to create Intelligent
    GIS Systems

44
In Conclusion
  • Data explosion: recent years have seen a dramatic
    increase in the amount of information stored in
    electronic format.
  • It has been estimated that the amount of
    information in the world doubles every 20 months
    and the size and number of databases are
    increasing even faster.
  • Data and information are crucial for decision
    making, especially in business operations. As a
    prominent top manager aid said,
  • "Whoever has information fastest and uses it
    wins"
  • Watterson K., from BYTE.

45
Future Vision
  • The objective of taking a view on the future is
    not so much to guess lottery numbers as to
    combine the past and the present with what we
    think is likely to occur.
  • That way we believe we are able to forecast with
    some accuracy.
  • Predicting the future is like predicting the
    weather: unexpected events will occur, and
    geniuses have a habit of seeing things
    differently, leading to major shifts in the way
    things are done.

46
Accounting and DW
  • http://ledgerism.net/datamart.htm
  • http://www.finance.state.mn.us/agencyapps/training/ia/ia150s_accounting.pdf
  • http://www.geocities.com/SiliconValley/Horizon/9144/artdb003.html