Title: Data Management: Warehousing, Analyzing, Mining
1Chapter 11
- Data Management: Warehousing, Analyzing, Mining, and Visualization
2Learning Objectives
- Recognize the importance of data, their managerial issues, and their life cycle.
- Describe the sources of data, their collection, and quality issues.
- Relate data management to multimedia and document management.
- Explain the operation of data warehousing and its role in decision support.
3Learning Objectives (cont.)
- Understand the data access and analysis problem and the data mining and online analytical processing solutions.
- Describe data presentation methods and explain geographical information systems, visual simulations, and virtual reality as decision support tools.
- Discuss the role and provide examples of marketing databases.
- Recognize the role of the Web in data management.
4Case: Sears' Data Warehouses
- Problem
  - Sears was caught by surprise in the 1980s as shoppers defected to specialty stores and discount mass merchandisers.
- Solution
  - Sears constructed a single sales information data warehouse, replacing 18 old databases that were packed with redundant, conflicting, and obsolete data.
  - By 2001, Sears had launched the following Web initiatives:
    - e-Commerce home improvement center
    - B2B supply exchange for the retail industry
    - Online toy catalog and much more
5Case: Sears' Data Warehouses (cont.)
- Results
  - The ability to monitor sales by item per store enables Sears to create a sharp local market focus.
  - Data monitoring of Web-based sales helps Sears shape its marketing and Web advertising plans.
  - Response time to queries has dropped from days to minutes.
  - The data warehouse offers Sears employees a tool for making better decisions.
  - Sears' retailing profits have climbed more than 20 percent annually since the data warehouse was implemented.
6Difficulties of Managing Data
- The amount of data increases exponentially.
- Data are scattered throughout organizations and are collected by many individuals using several methods and devices.
- Only small portions of an organization's data are relevant for any specific decision.
- An ever-increasing amount of external data needs to be considered in making organizational decisions.
- Data are frequently stored in several servers and locations in an organization.
7Difficulties of Managing Data (cont.)
- Raw data may be stored in different computing systems, databases, formats, and human and computer languages.
- Legal requirements relating to data differ among countries and change frequently.
- Selecting data management tools can be a major problem because of the huge number of products available.
- Data security, quality, and integrity are critical yet are easily jeopardized.
8Data Life Cycle
9Data Sources and Collection
- Internal Data. An organization's internal data are about people, products, services, and processes.
- Personal Data. IS users or other corporate employees may document their own expertise by creating personal data.
- External Data. There are many sources for external data, ranging from commercial databases to sensors and satellites.
- The Internet and Commercial Database Services. Some external data flow to an organization through electronic data interchange (EDI), through other company-to-company channels, or through the Internet.
10Data Quality
- Data quality (DQ) is an extremely important issue because quality determines the data's usefulness as well as the quality of the decisions based on the data.
11Data Quality Problems (Strong et al., 1997)
- Intrinsic DQ: Accuracy, objectivity, believability, and reputation.
- Accessibility DQ: Accessibility and access security.
- Contextual DQ: Relevancy, value added, timeliness, completeness, amount of data.
- Representation DQ: Interpretability, ease of understanding, concise representation, consistent representation.
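Two of the contextual DQ dimensions above, completeness and timeliness, lend themselves to simple automated checks. The following is only a minimal sketch: the record layout, field names, and the one-year staleness threshold are illustrative assumptions, not part of the Strong et al. framework.

```python
from datetime import date

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "name": "Ada", "city": "Austin", "updated": date(2001, 5, 1)},
    {"id": 2, "name": "Ben", "city": None, "updated": date(1998, 3, 9)},
    {"id": 3, "name": None, "city": "Dallas", "updated": date(2001, 1, 20)},
]

def completeness(rows, fields):
    """Share of non-missing values across the given fields (0.0 to 1.0)."""
    total = len(rows) * len(fields)
    filled = sum(1 for row in rows for f in fields if row.get(f) is not None)
    return filled / total if total else 1.0

def stale_ids(rows, as_of, max_age_days):
    """IDs of records last updated more than max_age_days before as_of."""
    return [r["id"] for r in rows if (as_of - r["updated"]).days > max_age_days]

print(completeness(records, ["name", "city"]))      # completeness score
print(stale_ids(records, date(2001, 6, 1), 365))    # timeliness check
```

Checks like these cover only what can be measured mechanically; dimensions such as believability or reputation still require human judgment.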
12Object-Oriented Databases
- The object-oriented database is the most widely used of the newest methods of data organization, especially for Web applications.
- An object-oriented database is a part of the object-oriented paradigm, which also includes object-oriented programming, operating systems, and modeling.
- Object-oriented databases are sometimes referred to as multimedia databases and are managed by special multimedia database management systems.
13Document Management
- Document management is the automated control of electronic documents, page images, spreadsheets, word processing documents, and complex, compound documents through their entire life cycle within an organization, from initial creation to final archiving.
- Benefits of document management:
  - Greater control over production, storage, and distribution of documents
  - Greater efficiency in the reuse of information
  - Control of a document through a workflow process
  - Reduction of product cycle times
14Case: United Services Automobile Association (USAA)
- Problem
  - USAA is a large insurance company in Texas that serves over 2 million officers. In the 1980s, the company experienced extreme delays in data retrieval and searches.
- Solution
  - Using an environment called the Automated Insurance Environment, USAA has been transformed into a completely paperless company.
- Results
  - The system reduces the cost of storing documents, improves customer service, and improves employee productivity.
  - USAA now saves $70,500,000 on the 10,000,000 documents handled annually.
15Data Processing
Data processing in organizations can be viewed as either transactional or analytical.
- Transactional
  - The data in transaction processing systems (TPS) are organized mainly in a hierarchical structure and are centrally processed.
  - These databases and processing systems are known as operational systems.
- Analytical
  - Analytical processing involves analysis of accumulated data, mainly by end-users.
  - It includes DSS, EIS, Web applications, and other end-user activities.
16Delivery Systems
- A good data delivery system should be able to support:
  - Easy data access by the end-users themselves.
  - A quick decision-making process.
  - Accurate and effective decision making.
  - Flexible decision making.
17Data Warehouses
- The purpose of a data warehouse is to establish a data repository that makes operational data accessible in a form readily acceptable for analytical processing activities (e.g., decision support, EIS).
- Data warehouses include a companion called metadata, meaning data about data.
- Major benefits of data warehouses:
  - (1) The ability to reach data quickly, as they are located in one place.
  - (2) The ability to reach data easily, frequently by end-users themselves, using Web browsers.
18Data Warehouses
19Characteristics of Data Warehouses
- Organization. Data are organized by detailed subjects.
- Consistency. Data in different operational databases may be encoded differently. In the warehouse they will be coded in a consistent manner.
- Time variant. The data are kept for 5 to 10 years so they can be used for trends, forecasting, and comparisons over time.
- Non-volatile. Once entered into the warehouse, data are not updated.
- Relational. The data warehouse uses a relational structure.
- Client/server. The data warehouse uses the client/server architecture to give end users easy access to its data.
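The consistency characteristic can be illustrated with a small translation step applied while loading the warehouse. This is a sketch under stated assumptions: the two source systems ("billing" and "crm"), their codes, and the function name are hypothetical and chosen only for illustration.

```python
# Hypothetical encodings used by two operational systems for the same attribute.
CODE_MAPS = {
    "billing": {"M": "male", "F": "female"},
    "crm": {"1": "male", "2": "female"},
}

def to_warehouse_code(source, raw_value):
    """Translate a source-specific code into the single warehouse code."""
    try:
        return CODE_MAPS[source][str(raw_value).strip().upper()]
    except KeyError:
        # Flag unmapped sources or values rather than guessing.
        return "unknown"

rows = [("billing", "m"), ("crm", 2), ("crm", "9")]
loaded = [to_warehouse_code(src, val) for src, val in rows]
print(loaded)
```

Routing unmapped values to an explicit "unknown" keeps bad codes visible in the warehouse instead of silently dropping them.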
20Data Warehouse Suitability
- Data warehousing is most appropriate for organizations in which some of the following apply:
  - Large amounts of data need to be accessed by end-users.
  - The operational data are stored in different systems.
  - An information-based approach to management is in use.
  - There is a large, diverse customer base.
  - The same data are represented differently in different systems.
  - Data are stored in highly technical formats that are difficult to decipher.
  - Extensive end-user computing is performed.
21Data Marts
- An alternative used by many firms is to create a data mart: a lower-cost, scaled-down version of a data warehouse. Data marts are small warehouses designed for a strategic business unit (SBU) or a department.
- Two major types of data marts:
  - 1) Replicated (dependent) data marts. In such cases one can replicate functional subsets of the data warehouse in smaller databases.
  - 2) Stand-alone data marts. A company can have one or more independent data marts without having a data warehouse.
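A replicated (dependent) data mart can be pictured as a departmental subset copied out of the warehouse. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table names, columns, and sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A tiny stand-in for the enterprise warehouse table.
conn.execute("CREATE TABLE warehouse_sales (dept TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("tools", "drill", 120.0), ("toys", "robot", 35.0), ("tools", "saw", 45.0)],
)

# Replicate only the rows one department needs into its own smaller table.
conn.execute(
    "CREATE TABLE tools_mart AS "
    "SELECT item, amount FROM warehouse_sales WHERE dept = 'tools'"
)

mart_rows = conn.execute(
    "SELECT item, amount FROM tools_mart ORDER BY item"
).fetchall()
print(mart_rows)
```

A stand-alone data mart would look the same from the user's side, but it would be loaded directly from operational systems rather than from a warehouse table.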
22Knowledge Discovery in Databases (KDD)
- KDD is the process of extracting useful knowledge from volumes of data.
- It is the subject of extensive research.
- KDD's objective is to identify valid, novel, potentially useful, and ultimately understandable patterns in data.
- KDD is useful because it is supported by three technologies that are now sufficiently mature:
  - Massive data collection
  - Powerful multiprocessor computers
  - Data mining algorithms
23Evolution of KDD
24Tools and Techniques of KDD
- Ad hoc queries allow users to request, in real time, information from the computer that is not available in the periodic reports. Such answers are needed to expedite decision making.
- Online analytical processing (OLAP) refers to such end-user activities as DSS modeling using spreadsheets and graphics, which are done online.
- Ready-made Web-based analysis. Many vendors provide ready-made analytical tools, mostly in finance, marketing, and operations.
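An ad hoc query is simply a question posed on demand rather than a report prepared in advance. The following sketch, which assumes a hypothetical `sales` table and uses Python's built-in sqlite3 module, shows one such question answered in real time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, item TEXT, week INTEGER, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("north", "paint", 1, 40),
        ("north", "paint", 2, 10),
        ("south", "paint", 1, 5),
        ("south", "nails", 1, 90),
    ],
)

def units_by_store(item, week):
    """Ad hoc question: how many units of `item` did each store sell in `week`?"""
    cur = conn.execute(
        "SELECT store, SUM(units) FROM sales "
        "WHERE item = ? AND week = ? GROUP BY store ORDER BY store",
        (item, week),
    )
    return cur.fetchall()

print(units_by_store("paint", 1))
```

The item and week are passed as query parameters, so the same function answers a different question each time it is called.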
25Data Mining
- Data mining derives its name from the similarities between searching for valuable business information in a large database and mining a mountain for valuable ore.
- Data mining technology can generate new business opportunities by providing these capabilities:
  - Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases.
  - Automated discovery of previously unknown patterns. Data mining tools identify previously hidden patterns in one step.
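As one illustration of automated pattern discovery, the sketch below counts which items are bought together and keeps the pairs that occur often enough. It is a simplified co-occurrence count, not a full data mining algorithm, and the baskets and the support threshold are made-up assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs appearing together in at least min_support baskets."""
    counts = Counter()
    for basket in transactions:
        # Sort so each pair is counted under one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(frequent_pairs(baskets, min_support=2))
```

Production tools apply the same idea to millions of transactions and to larger item sets, which is where specialized algorithms and multiprocessor hardware become necessary.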
26Applications of Data Mining
Data mining is currently being used in the following areas:
- Insurance
- Police work
- Government and defense
- Airlines
- Health care
- Broadcasting
- Marketing
- Retailing and sales
- Banking
- Manufacturing and production
- Brokerage and securities trading
- Computer hardware and software
27Text and Web Mining
- Text mining is the application of data mining to non-structured or less structured text files.
- Text mining helps organizations to do the following:
  - Find the hidden content of documents, including additional useful relationships.
  - Group documents by common themes.
- Web mining refers to mining tools used to analyze a large amount of data on the Web, such as what customers are doing on the Web; that is, to analyze clickstream data.
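Clickstream analysis can start from something as simple as counting page-to-page movements within each visitor session. The sketch below assumes a hypothetical log of (session id, page) events already sorted in visit order; real Web logs need parsing and session reconstruction first.

```python
from collections import Counter

# Hypothetical clickstream: (session id, page), in the order visited.
clicks = [
    ("s1", "/home"), ("s1", "/toys"), ("s1", "/cart"),
    ("s2", "/home"), ("s2", "/toys"),
    ("s3", "/home"), ("s3", "/cart"),
]

def page_transitions(events):
    """Count how often visitors move from one page directly to another."""
    last_page = {}
    transitions = Counter()
    for session, page in events:
        if session in last_page:
            transitions[(last_page[session], page)] += 1
        last_page[session] = page
    return transitions

top = page_transitions(clicks).most_common(1)
print(top)
```

The most common transitions suggest which paths customers actually follow through a site, which can then inform page layout or promotions.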
28Data Visualization
- Data visualization refers to the presentation of
data by technologies such as digital images,
geographical information systems, graphical user
interfaces, multidimensional tables and graphs,
virtual reality, three-dimensional presentations,
and animation.
29CASE: Data Visualization Helps Haworth
- Problem
  - Haworth Corporation, a major office furniture manufacturer, has maintained a competitive edge by offering customization.
  - But many customers are unable to visualize the 21 million potential product combinations.
- Solution
  - Computer visualization software enables sales representatives with laptops to show customers exactly what they are ordering.
- Results
  - Less time spent between sales reps and CAD operators, and increased customer satisfaction with quicker delivery.
30Multidimensionality
- Modern data and information may have several dimensions.
  - e.g., management may be interested in examining sales figures in a certain city by product, by time period, by salesperson, and by store.
- It is important to provide the user with a technology that allows him or her to add, replace, or change dimensions quickly and easily in a table and/or graphical presentation.
- The technology of slicing, dicing, and similar manipulations is called multidimensionality.
31Multidimensionality
- Three factors are considered in multidimensionality: dimensions, measures, and time.
- Examples of dimensions: Products, salespeople, market segments, business units, geographical locations, distribution channels, countries, industries.
- Examples of measures: Money, sales volume, head count, inventory profit, actual versus forecasted results.
- Examples of time: Daily, weekly, monthly, quarterly, yearly.
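Slicing and aggregating over dimensions can be sketched with plain Python data structures. In the example below, each fact row carries two dimensions (product, city), a time period (quarter), and one measure (sales); all names and figures are illustrative assumptions.

```python
from collections import defaultdict

# Each fact: dimensions (product, city), time (quarter), and a measure (sales).
facts = [
    {"product": "desk", "city": "Austin", "quarter": "Q1", "sales": 100},
    {"product": "desk", "city": "Dallas", "quarter": "Q1", "sales": 150},
    {"product": "chair", "city": "Austin", "quarter": "Q1", "sales": 80},
    {"product": "desk", "city": "Austin", "quarter": "Q2", "sales": 120},
]

def slice_facts(rows, **fixed):
    """Slice: keep only rows that match the fixed dimension values."""
    return [r for r in rows if all(r[d] == v for d, v in fixed.items())]

def roll_up(rows, dims, measure="sales"):
    """Aggregate the measure over the chosen dimensions."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)

print(roll_up(facts, ["product"]))                               # sales by product
print(roll_up(slice_facts(facts, city="Austin"), ["quarter"]))   # Austin by quarter
```

Changing the list passed as `dims` is the code-level equivalent of the user adding, replacing, or changing dimensions in a table.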
32Advantages of Multidimensionality
- Data can be presented and navigated with relative ease.
- Multidimensional databases are easier to maintain.
- Multidimensional databases are significantly faster than relational databases as a result of the additional dimensions and the anticipation of how the data will be accessed by users.
33Geographic Information Systems (GIS)
- A geographical information system (GIS) is a computer-based system for capturing, storing, checking, integrating, manipulating, and displaying data using digitized maps.
- Every record or digital object has an identified geographical location.
- Banks are using GIS for plotting the following:
  - Branch and ATM locations
  - Customer demographics
  - Volume and traffic patterns of business activities
  - Geographical area served by each branch
  - Market potential for banking activities
  - Strengths and weaknesses against the competition
  - Branch performance
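Because every GIS record carries a location, questions such as "which branch serves this customer?" reduce to distance calculations. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6,371 km; the branch names and coordinates are hypothetical, and nearest-branch assignment is only one simple way to approximate a service area.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Hypothetical branch locations as (latitude, longitude).
branches = {"downtown": (30.27, -97.74), "north": (30.40, -97.72)}

def nearest_branch(lat, lon):
    """Assign a customer location to the closest branch."""
    return min(branches, key=lambda name: haversine_km(lat, lon, *branches[name]))

print(nearest_branch(30.28, -97.75))
```

A commercial GIS would add road networks, travel times, and map display on top of this kind of basic geometric calculation.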
34Geographic Information Systems (GIS)
- GIS software varies in its capabilities, from simple computerized mapping systems to enterprise-wide tools for decision support data analysis.
- GIS data are available from a wide variety of sources. Government sources (via the Internet and CD-ROM) provide some data, while vendors provide diversified commercial data as well.
- GIS and decision making. The graphical format of GIS makes it easy for managers to visualize the data and make decisions.
- GIS and the Internet or intranet. Most major GIS software vendors are providing Web access, such as embedded browsers, or a Web/Internet/intranet server that hooks directly into their software.
- Emerging GIS applications.
35Visual Interactive Modeling (VIM)
- Visual interactive simulation (VIS) is one of the most developed areas in VIM.
  - It is a decision simulation in which the end-user watches the progress of the simulation model in an animated form using graphics terminals.
- Visual interactive modeling (VIM) uses computer graphic displays to represent the impact of different management decisions on goals such as profit or market share.
  - A VIM can be used both for supporting decisions and for training.
  - It can represent a static or a dynamic system.
36Virtual Reality
- Virtual reality (VR) is interactive, computer-generated, three-dimensional graphics delivered to the user through a head-mounted display.
- VR applications to date have been used to support decision making indirectly.
  - Boeing has developed a virtual aircraft mock-up to test designs.
  - At Volvo, VR is used to test virtual cars in virtual accidents.
- Data visualization helps financial decision makers by using virtual systems with visual, spatial, and aural immersion.
  - Some stock brokerages have a VR application in which users surf over a landscape of stock futures, with color, hue, and intensity.
37Marketing Transaction Database
- The marketing transaction database (MTD) combines many of the characteristics of static databases and marketing data sources into a new database that allows marketers to engage in real-time personalization and target every interaction with customers.
- The MTD provides dynamic, or interactive, functions not available with traditional types of marketing databases.
- Exchanging information allows marketers to refine their understanding of each customer continuously.
- Data mining, data warehousing, and MTDs are delivered on the Internet and intranets.
38Implementation Examples
- The following examples illustrate how companies use data mining and warehousing to support the new marketing approaches:
  - Alamo Rent-a-Car discovered that German tourists liked bigger cars. So now, when Alamo advertises its rental business in Germany, the ads include information about its larger models.
  - Au Bon Pain Company discovered that it was not selling as much cream cheese as planned. When it analyzed point-of-sale data, it found that customers preferred small, one-serving packaging.
  - AT&T and MCI sift through terabytes of customer phone data to fine-tune marketing campaigns and determine new discount calling plans.
39CASE: Data Mining Powers Wal-Mart
- Wal-Mart's formula for success owes much to the company's multimillion-dollar investment in data warehousing.
- The systems house data on point of sale, inventory, products in transit, market statistics, customer demographics, finance, product returns, and supplier performance.
- The data are used for three broad areas of decision support:
  - analyzing trends
  - managing inventory
  - understanding customers
- The data warehouse is available over an extranet to store managers and suppliers.
- In 2001, 5,000 users made over 35,000 database queries each day.
40Web-based Data Management Systems
- Business intelligence activities, from data acquisition through warehousing to mining, can be performed with Web tools or are interrelated with Web technologies and e-Commerce.
- e-Commerce software vendors are providing Web tools that connect the data warehouse with EC ordering and cataloging systems.
  - e.g., Tradelink, a product of Hitachi
- Data warehousing and decision support vendors are connecting their products with Web technologies and EC.
  - e.g., Comshare's DecisionWeb, Brio's Brio One, Web Intelligence from Business Objects, and Cognos's DataMerchant.
41Corporate Portals
42Web-based Data Acquisition Agents
- Intelligent Data Warehouse
  - The amount of data in the data warehouse can be very large.
  - While the organization of data is done in a way that permits easy search, it still may be useful to have a search engine for specific applications.
- Web-based Data Acquisition
  - Traditional data acquisition has become a pervasive element in today's business environment.
  - This acquisition includes both the recording of information from online surveys and questionnaires, and direct measurements taken in the manufacturing environment.
43Managerial Issues
- Cost-benefit issues and justification. A cost-benefit analysis must be undertaken before any commitment to new technologies is made.
- Where to store data physically. Should data be distributed close to their sources? Or should data be centralized for easier control?
- Legal issues. Data mining gives rise to a variety of legal issues.
- The legacy data problem. What should be done with masses of information already stored in a variety of formats? This is often known as the legacy data acquisition problem.
44Managerial Issues (cont.)
- Disaster recovery. How well can an organization's business processes recover after an information system disaster?
- Internal or external? Should a firm store and maintain its databases internally or externally?
- Data security and ethics. Are the company's competitive data safe from external snooping or sabotage?
- Ethics. Should people have to pay for use of online data?
- Privacy. Collecting data in a warehouse and conducting data mining may result in the invasion of privacy.
- Data purging. When is it beneficial to clean house and purge information systems of obsolete or non-cost-effective data?
- Data delivery. A problem also exists regarding how to move data efficiently around an enterprise.