Data Management: Warehousing, Analyzing, Mining - PowerPoint PPT Presentation

About This Presentation
Title:

Data Management: Warehousing, Analyzing, Mining

Description:

Chapter 11 Data Management: Warehousing, Analyzing, Mining & Visualization – PowerPoint PPT presentation

Slides: 45
Provided by: Syste366

Transcript and Presenter's Notes

Title: Data Management: Warehousing, Analyzing, Mining


1
Chapter 11
  • Data Management: Warehousing, Analyzing, Mining &
    Visualization

2
Learning Objectives
  • Recognize the importance of data, their
    managerial issues, and their life cycle.
  • Describe the sources of data, their collection,
    and quality issues.
  • Relate data management to multimedia and document
    management.
  • Explain the operation of data warehousing and its
    role in decision support.

3
Learning Objectives (cont.)
  • Understand the data access and analysis problem
    and the data mining and online analytical
    processing solutions.
  • Describe data presentation methods and explain
    geographical information systems, visual
    simulations, and virtual reality as decision
    support tools.
  • Discuss the role and provide examples of
    marketing databases.
  • Recognize the role of the Web in data management.

4
Case: Sears Data Warehouses
  • Problem
  • Sears was caught by surprise in the 1980s as
    shoppers defected to specialty stores and
    discount mass merchandisers.
  • Solution
  • Sears constructed a single sales information data
    warehouse, replacing 18 old databases that were
    packed with redundant, conflicting, and obsolete
    data.
  • By 2001, Sears had launched the following Web
    initiatives:
  • e-Commerce home improvement center
  • B2B supply exchange for the retail industry
  • Online toy catalog, and much more

5
Case: Sears Data Warehouses (cont.)
  • Results
  • The ability to monitor sales by item per store
    enables Sears to create a sharp local market
    focus.
  • Monitoring Web-based sales data helps Sears plan
    its marketing and Web advertising.
  • Response time to queries has dropped from days to
    minutes.
  • The data warehouse offers Sears employees a tool
    for making better decisions.
  • Sears' retailing profits have climbed more than
    20% annually since the data warehouse was
    implemented.

6
Difficulties of Managing Data
  • The amount of data increases exponentially.
  • Data are scattered throughout organizations and
    are collected by many individuals using several
    methods and devices.
  • Only small portions of an organization's data are
    relevant for any specific decision.
  • An ever-increasing amount of external data needs
    to be considered in making organizational
    decisions.
  • Data are frequently stored in several servers and
    locations in an organization.

7
Difficulties of Managing Data (cont.)
  • Raw data may be stored in different computing
    systems, databases, formats, and human and
    computer languages.
  • Legal requirements relating to data differ among
    countries and change frequently.
  • Selecting data management tools can be a major
    problem because of the huge number of products
    available.
  • Data security, quality, and integrity are
    critical yet are easily jeopardized.

8
Data Life Cycle
9
Data Sources & Collection
  • Internal Data. An organization's internal data
    are about people, products, services, and
    processes.
  • Personal Data. IS users or other corporate
    employees may document their own expertise by
    creating personal data.
  • External Data. There are many sources for
    external data, ranging from commercial databases
    to sensors and satellites.
  • The Internet & Commercial Database Services. Some
    external data flow to an organization through
    electronic data interchange (EDI), through other
    company-to-company channels, or over the Internet.

10
Data Quality
  • Data Quality (DQ) is an extremely important issue
    since quality determines the data's usefulness as
    well as the quality of the decisions based on the
    data.

11
Data Quality Problems (Strong et al., 1997)
  • Intrinsic DQ: Accuracy, objectivity,
    believability, and reputation.
  • Accessibility DQ: Accessibility and access
    security.
  • Contextual DQ: Relevancy, value added,
    timeliness, completeness, amount of data.
  • Representation DQ: Interpretability, ease of
    understanding, concise representation, consistent
    representation.
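
Two of the contextual DQ dimensions above, completeness and timeliness, can be checked mechanically. The following is a minimal sketch only: the records, the field names, and the one-year freshness threshold are assumptions made for illustration and do not come from Strong et al.

```python
from datetime import date

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "name": "Ann Lee", "city": "Austin", "updated": date(2001, 5, 1)},
    {"id": 2, "name": "", "city": "Dallas", "updated": date(1996, 1, 15)},
    {"id": 3, "name": "Raj Patel", "city": None, "updated": date(2001, 3, 9)},
]

REQUIRED_FIELDS = ("name", "city")


def completeness(record):
    """Fraction of required fields that are filled in (contextual DQ)."""
    filled = sum(1 for field in REQUIRED_FIELDS if record.get(field))
    return filled / len(REQUIRED_FIELDS)


def is_timely(record, today, max_age_days=365):
    """True if the record was updated within max_age_days (contextual DQ)."""
    return (today - record["updated"]).days <= max_age_days


# Assumed "current" date so the example is reproducible.
today = date(2001, 6, 1)
report = {
    r["id"]: {"completeness": completeness(r), "timely": is_timely(r, today)}
    for r in records
}
```

A report like this only flags candidates for review. Intrinsic dimensions such as accuracy and believability usually need a comparison against a trusted source.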

12
Object-Oriented Databases
  • The object-oriented database is the most widely
    used of the newest methods of data organization,
    especially for Web applications.
  • An object-oriented database is a part of the
    object-oriented paradigm, which also includes
    object-oriented programming, operating systems,
    and modeling.
  • Object-oriented databases are sometimes referred
    to as multimedia databases and are managed by
    special multimedia database management systems.

13
Document Management
  • Document Management is the automated control of
    electronic documents, page images, spreadsheets,
    word processing documents, and complex, compound
    documents through their entire life cycle within
    an organization, from initial creation to final
    archiving.
  • Benefits of Document Management
  • Greater control over production, storage, and
    distribution of documents
  • Greater efficiency in the reuse of information
  • Control of a document through a workflow process
  • Reduction of product cycle times

14
Case: United Services Automobile Association (USAA)
  • Problem
  • The USAA is a large insurance company in Texas
    that serves over 2 million officers. In the
    1980s, the company experienced extreme delays in
    data retrieval and searches.
  • Solution
  • Using an environment called Automated Insurance
    Environment, USAA has been transformed into a
    completely paperless company.
  • Results
  • The system reduces the cost of storing documents,
    improves customer service, and improves
    productivity of employees.
  • USAA now saves $70,500,000 for the 10,000,000
    documents handled annually.

15
Data Processing
Data processing in organizations can be viewed
either as transactional or analytical.
  • Transactional
  • The data in transaction processing systems (TPS)
    are organized mainly in a hierarchical structure
    and are centrally processed.
  • Databases and processing systems are known as
    operational systems.
  • Analytical
  • Analytical processing involves analysis of
    accumulated data, mainly by end-users.
  • Includes DSS, EIS, Web applications, and other
    end-user activities.

16
Delivery Systems
  • A good data delivery system should be able to
    support:
  • Easy data access by the end-users themselves.
  • A quick decision-making process.
  • Accurate and effective decision making.
  • Flexible decision making.

17
Data Warehouses
  • The purpose of a data warehouse is to establish a
    data repository that makes operational data
    accessible in a form readily acceptable for
    analytical processing activities (e.g., decision
    support, EIS).
  • Data warehouses include a companion called
    metadata, meaning data about data.
  • Major Benefits of Data Warehouses
  • (1) The ability to reach data quickly, as they
    are located in one place.
  • (2) The ability to reach data easily, frequently
    by end-users themselves, using Web browsers.

18
Data Warehouses
19
Characteristics of Data Warehouses
  1. Organization. Data are organized by detailed
    subjects.
  2. Consistency. Data in different operational
    databases may be encoded differently. In the
    warehouse they will be coded in a consistent
    manner.
  3. Time variant. The data are kept for 5 to 10 years
    so they can be used for trends, forecasting, and
    comparisons over time.
  4. Non-volatile. Once entered into the warehouse,
    data are not updated.
  5. Relational. The data warehouse uses a relational
    structure.
  6. Client/server. The data warehouse uses the
    client/server architecture to give end users easy
    access to its data.
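
The consistency and non-volatility characteristics above can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular warehouse product works: the gender encodings, the `Warehouse` class, and the `to_warehouse_code` helper are all hypothetical.

```python
# Hypothetical encodings used by different operational systems for one field.
GENDER_CODES = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "0": "F",
}


def to_warehouse_code(raw_value):
    """Map a source-system gender code to the warehouse's single encoding."""
    key = str(raw_value).strip().lower()
    return GENDER_CODES.get(key, "UNKNOWN")


class Warehouse:
    """Append-only store: rows are added with a load date, never updated."""

    def __init__(self):
        self._rows = []

    def load(self, load_date, customer_id, raw_gender):
        # Recode on the way in so every row uses the same encoding.
        self._rows.append((load_date, customer_id, to_warehouse_code(raw_gender)))

    def rows(self):
        # Return an immutable copy so analysis code cannot alter history.
        return tuple(self._rows)


wh = Warehouse()
wh.load("2001-01-31", 101, "Male")
wh.load("2001-01-31", 102, 0)
wh.load("2001-02-28", 103, "x")
```

Keeping the load date on every row is what makes the data time variant: later loads add new rows instead of overwriting earlier ones.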

20
Data Warehouse Suitability
  • Data warehousing is most appropriate for
    organizations in which some of the following
    apply:
  • Large amounts of data need to be accessed by
    end-users.
  • The operational data are stored in different
    systems.
  • An information-based approach to management is in
    use.
  • There is a large, diverse customer base.
  • The same data are represented differently in
    different systems.
  • Data are stored in highly technical formats that
    are difficult to decipher.
  • Extensive end-user computing is performed.

21
Data Marts
  • A data mart is a lower-cost, scaled-down version
    of a data warehouse that many firms create as an
    alternative. Data marts are small warehouses
    designed for a strategic business unit (SBU) or a
    department.
  • Two major types of Data Marts:
  • 1) Replicated (dependent) Data Marts. In such
    cases one can replicate functional subsets of the
    data warehouse in smaller databases.
  • 2) Stand-Alone Data Marts. A company can have
    one or more independent data marts without having
    a data warehouse.
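
A replicated (dependent) data mart can be pictured as a filtered copy of a warehouse table. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table names, columns, and sample rows are assumptions for illustration only.

```python
import sqlite3

# In-memory database standing in for the enterprise data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (dept TEXT, store TEXT, item TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("toys", "S1", "puzzle", 12.0),
        ("toys", "S2", "robot", 30.0),
        ("tools", "S1", "drill", 80.0),
    ],
)

# Replicate the functional subset for one department into its own table.
conn.execute(
    "CREATE TABLE mart_toys AS "
    "SELECT store, item, amount FROM warehouse_sales WHERE dept = 'toys'"
)

# The department now queries its smaller mart instead of the full warehouse.
mart_rows = conn.execute(
    "SELECT store, item, amount FROM mart_toys ORDER BY store"
).fetchall()
total = conn.execute("SELECT SUM(amount) FROM mart_toys").fetchone()[0]
```

A stand-alone data mart would be loaded directly from operational sources instead of from `warehouse_sales`.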

22
Knowledge Discovery in Databases (KDD)
  • KDD is the process of extracting useful knowledge
    from volumes of data.
  • It is the subject of extensive research.
  • KDD's objective is to identify valid, novel,
    potentially useful, and ultimately understandable
    patterns in data.
  • KDD is useful because it is supported by three
    technologies that are now sufficiently mature:
  • Massive data collection
  • Powerful multiprocessor computers
  • Data mining algorithms

23
Evolution of KDD
24
Tools & Techniques of KDD
  • Ad-hoc queries allow users to request in real
    time information from the computer that is not
    available in the periodical reports. Such answers
    are needed to expedite decision making.
  • Online analytical processing (OLAP) refers to
    such end-user activities as DSS modeling using
    spreadsheets and graphics, which are done online.
  • Ready-made Web-based Analysis. Many vendors
    provide ready-made analytical tools, mostly in
    finance, marketing, and operations.

25
Data Mining
  • Data mining derives its name from the
    similarities between searching for valuable
    business information in a large database and
    mining a mountain for valuable ore.
  • Data mining technology can generate new business
    opportunities by providing these capabilities:
  • Automated prediction of trends and behaviors.
    Data mining automates the process of finding
    predictive information in large databases.
  • Automated discovery of previously unknown
    patterns. Data mining tools identify previously
    hidden patterns in one step.
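
As a toy illustration of automated pattern discovery, the sketch below counts which pairs of items appear together in shopping baskets and keeps the pairs that occur often enough. It is a simplified co-occurrence count, not a full data mining algorithm. The baskets and the `min_support` threshold are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs whose share of baskets is at least min_support."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair has one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


pairs = frequent_pairs(baskets, min_support=0.75)
```

With these baskets, only the beer and diapers pair reaches 75% support. Real tools apply the same idea to millions of transactions and to larger item sets.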

26
Applications of Data Mining
Data mining is currently being used in the
following areas:
  • Insurance
  • Police work
  • Government & Defense
  • Airlines
  • Health care
  • Broadcasting
  • Marketing
  • Retailing & Sales
  • Banking
  • Manufacturing & Production
  • Brokerage & Securities trading
  • Computer hardware & software

27
Text & Web Mining
  • Text mining is the application of data mining to
    non-structured or less structured text files.
  • Text mining helps organizations to do the
    following:
  • Find the hidden content of documents, including
    additional useful relationships.
  • Group documents by common themes.
  • Web mining refers to mining tools used to analyze
    a large amount of data on the Web, such as what
    customers are doing on the Web; that is, to
    analyze clickstream data.
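
A very small example of clickstream analysis is counting which page-to-page moves customers make most often. The session IDs, page paths, and the `page_transitions` helper below are hypothetical and only show the idea.

```python
from collections import Counter

# Hypothetical clickstream: (session id, page) in the order visited.
clicks = [
    ("s1", "/home"), ("s1", "/toys"), ("s1", "/cart"),
    ("s2", "/home"), ("s2", "/toys"),
    ("s3", "/home"), ("s3", "/tools"), ("s3", "/cart"),
]


def page_transitions(clicks):
    """Count page-to-page moves within each session."""
    last_page = {}
    transitions = Counter()
    for session, page in clicks:
        if session in last_page:
            transitions[(last_page[session], page)] += 1
        last_page[session] = page
    return transitions


transitions = page_transitions(clicks)
# Most frequent move across all sessions.
top = transitions.most_common(1)[0]
```

Here the most common move is from the home page to the toys page, the kind of finding that could feed Web advertising or site layout decisions.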

28
Data Visualization
  • Data visualization refers to the presentation of
    data by technologies such as digital images,
    geographical information systems, graphical user
    interfaces, multidimensional tables and graphs,
    virtual reality, three-dimensional presentations,
    and animation.

29
Case: Data Visualization Helps Haworth
  • Problem
  • Haworth Corporation, a major office furniture
    manufacturer, has maintained a competitive edge
    by offering customization.
  • But many customers are unable to visualize the 21
    million potential product combinations.
  • Solution
  • Computer visualization software enables sales
    representatives with laptops to show customers
    exactly what they are ordering.
  • Results
  • Reduction in time spent between sales reps and
    CAD operators, increased customer satisfaction
    with quicker delivery.

30
Multidimensionality
  • Modern data and information may have several
    dimensions.
  • e.g. Management may be interested in examining
    sales figures in a certain city by product, by
    time period, by salesperson, and by store.
  • It is important to provide the user with a
    technology that allows him or her to add,
    replace, or change dimensions quickly and easily
    in a table and/or graphical presentation.
  • The technology of slicing, dicing, and similar
    manipulations is called Multidimensionality.
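
Slicing and dicing can be sketched with plain Python: each fact carries its dimension values, and a roll-up sums the measure over whichever dimensions the user picks. The sales facts, dimension names, and the `rollup` helper are assumptions for illustration, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, quarter, store, sales amount).
facts = [
    ("chair", "Q1", "north", 100),
    ("chair", "Q2", "north", 150),
    ("desk", "Q1", "north", 200),
    ("desk", "Q1", "south", 250),
    ("chair", "Q1", "south", 50),
]
DIMENSIONS = ("product", "quarter", "store")


def rollup(facts, by, where=None):
    """Sum sales grouped by the dimensions in `by`, after filtering on `where`."""
    where = where or {}
    totals = defaultdict(int)
    for *dims, amount in facts:
        row = dict(zip(DIMENSIONS, dims))
        if all(row[d] == v for d, v in where.items()):
            totals[tuple(row[d] for d in by)] += amount
    return dict(totals)


# Change the view by changing the dimensions, not the data.
by_product = rollup(facts, by=("product",))
q1_slice = rollup(facts, by=("product", "store"), where={"quarter": "Q1"})
```

Changing `by` swaps the dimensions shown. Changing `where` takes a slice, here the first quarter only.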

31
Multidimensionality (cont.)
  • Three factors are considered in
    multidimensionality: dimensions, measures, and
    time.
  • Examples of dimensions: Products, salespeople,
    market segments, business units, geographical
    locations, distribution channels, countries,
    industries.
  • Examples of measures: Money, sales volume, head
    count, inventory profit, actual versus
    forecasted results.
  • Examples of time: Daily, weekly, monthly,
    quarterly, yearly.

32
Advantages of Multidimensionality
  • Data can be presented and navigated with relative
    ease.
  • Multidimensional databases are easier to
    maintain.
  • Multidimensional databases are significantly
    faster than relational databases as a result of
    the additional dimensions and the anticipation of
    how the data will be accessed by users.

33
Geographic Information Systems (GIS)
  • A geographical information system (GIS) is a
    computer-based system for capturing, storing,
    checking, integrating, manipulating, and
    displaying data using digitized maps.
  • Every record or digital object has an identified
    geographical location.
  • Banks are using GIS for plotting the following:
  • Branch and ATM locations
  • Customer demographics
  • Volume and traffic patterns of business
    activities
  • Geographical area served by each branch
  • Market potential for banking activities
  • Strengths and weaknesses against the competition
  • Branch performance
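
Because every record in a GIS has a geographical location, simple spatial questions become computable. The sketch below finds the branch nearest to a customer using the haversine great-circle distance. The branch names, the coordinates, and the mean Earth radius of 6371 km are assumptions for illustration.

```python
import math

# Hypothetical branch locations as (latitude, longitude) in degrees.
branches = {
    "Downtown": (30.2672, -97.7431),
    "Airport": (30.1975, -97.6664),
}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_branch(customer):
    """Name of the branch closest to the customer's location."""
    return min(branches, key=lambda name: haversine_km(customer, branches[name]))


customer = (30.2700, -97.7400)
closest = nearest_branch(customer)
```

Repeating this for every customer record gives a rough picture of the geographical area served by each branch.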

34
Geographic Information Systems (GIS) (cont.)
  • GIS Software varies in its capabilities, from
    simple computerized mapping systems to
    enterprise-wide tools for decision support and
    data analysis.
  • GIS Data are available from a wide variety of
    sources. Government sources (via the Internet and
    CD-ROM) provide some data, while vendors provide
    diversified commercial data as well.
  • GIS and Decision Making. The graphical format of
    GIS makes it easy for managers to visualize the
    data and make decisions.
  • GIS and the Internet or intranet. Most major GIS
    software vendors are providing Web access, such
    as embedded browsers, or a Web/Internet/intranet
    server that hooks directly into their software.
  • Emerging GIS Applications. 

35
Visual Interactive Modeling (VIM)
  • Visual interactive simulation (VIS) is one of the
    most developed areas in VIM.
  • It is a decision simulation in which the end-user
    watches the progress of the simulation model in
    an animated form using graphics terminals.
  • Visual interactive modeling (VIM) uses computer
    graphic displays to represent the impact of
    different management decisions on goals such as
    profit or market share.
  • VIM can be used both for supporting decisions
    and for training.
  • It can represent a static or a dynamic system.

36
Virtual Reality
  • Virtual reality (VR) is interactive,
    computer-generated, three-dimensional graphics
    delivered to the user through a head-mounted
    display.
  • VR applications to date have been used to support
    decision making indirectly.
  • Boeing has developed a virtual aircraft mock-up
    to test designs.
  • At Volvo, VR is used to test virtual cars in
    virtual accidents.
  • Data visualization helps financial decision
    makers by using visual, spatial, and aural
    immersion virtual systems.
  • Some stock brokerages have a VR application in
    which users surf over a landscape of stock
    futures, with color, hue, and intensity.

37
Marketing Transaction Database
  • The marketing transaction database (MTD) combines
    many of the characteristics of static databases
    and marketing data sources into a new database that
    allows marketers to engage in real-time
    personalization and target every interaction with
    customers.
  • The MTD provides dynamic, or interactive,
    functions not available with traditional types of
    marketing databases.
  • Exchanging information allows marketers to refine
    their understanding of each customer
    continuously.
  • Data mining, data warehousing, and MTDs are
    delivered on the Internet and intranets.

38
Implementation Examples
  • The following examples illustrate how companies
    use data mining and warehousing to support the
    new marketing approaches:
  • Alamo Rent-a-Car discovered that German tourists
    liked bigger cars. So now, when Alamo advertises
    its rental business in Germany, the ads include
    information about its larger models.
  • Au Bon Pain Company discovered that they were not
    selling as much cream cheese as planned. When
    they analyzed point-of-sale data, they found that
    customers preferred small, one-serving packaging.
  • AT&T and MCI sift through terabytes of customer
    phone data to fine-tune marketing campaigns and
    determine new discount calling plans.

39
Case: Data Mining Powers Wal-Mart
  • Wal-Mart's formula for success owes much to the
    company's multimillion-dollar investment in data
    warehousing.
  • The systems house data on point of sale,
    inventory, products in transit, market
    statistics, customer demographics, finance,
    product returns, and supplier performance.
  • The data are used for three broad areas of
    decision support:
  • analyzing trends
  • managing inventory
  • understanding customers
  • The data warehouse is available over an extranet
    to store managers and suppliers.
  • In 2001, 5,000 users made over 35,000 database
    queries each day.

40
Web-based Data Management Systems
  • Business intelligence activities from data
    acquisition, through warehousing, to mining can
    be performed with Web tools or are interrelated
    with Web technologies and e-Commerce.
  • e-Commerce software vendors are providing Web
    tools that connect the data warehouse with EC
    ordering and cataloging systems.
  • e.g. Tradelink, a product of Hitachi
  • Data warehousing and decision support vendors are
    connecting their products with Web technologies
    and EC.
  • e.g. Comshare's DecisionWeb, Brio's Brio One, Web
    Intelligence from Business Objects, and Cognos's
    DataMerchant.

41
Corporate Portals
42
Web-based Data Acquisition Agents
  • Intelligent Data Warehouse
  • The amount of data in the data warehouse can be
    very large.
  • While the organization of data is done in a way
    that permits easy search, it still may be useful
    to have a search engine for specific
    applications.
  • Web-based Data Acquisition
  • Traditional data acquisition has become a
    pervasive element in today's business
    environment.
  • This acquisition includes both the recording of
    information from online surveys and
    questionnaires, and direct measurements taken in
    the manufacturing environment.

43
Managerial Issues
  • Cost-benefit issues & justification. A
    cost-benefit analysis must be undertaken before
    any commitment to new technologies.
  • Where to store data physically. Should data be
    distributed close to their sources? Or should
    data be centralized for easier control?
  • Legal issues. Data mining gives rise to a
    variety of legal issues.
  • The legacy data problem. What should be
    done with masses of information already stored in
    a variety of formats, often known as the legacy
    data acquisition problem?

44
Managerial Issues (cont.)
  • Disaster recovery. How well can an organization's
    business processes recover after an information
    system disaster?
  • Internal or external? Should a firm store and
    maintain its databases internally or externally?
  • Data security and ethics. Are the company's
    competitive data safe from external snooping or
    sabotage?
  • Ethics. Should people have to pay for use of
    online data?
  • Privacy. Collecting data in a warehouse and
    conducting data mining may result in the invasion
    of privacy.
  • Data purging. When is it beneficial to clean
    house and purge information systems of obsolete
    or non-cost-effective data?
  • Data delivery. A problem regarding how to move
    data efficiently around an enterprise also
    exists.