Title: External Unstructured Data And The Data Warehouse
1External / Unstructured Data And The Data
Warehouse
- Chapter 8
- Kumar Neti
- Amresh Mohanlal
- Susan Shanlever
2Internal Structured Data
- Data that comes internally from the corporation
and has been already shaped into a regularly
occurring format
3External Data
- Data that is of legitimate use to a corporation
but is not generated by the corporation's own
systems
- It enters the corporation in an unstructured,
unpredictable format
- If external data is not stored in a centrally
located place, several problems are sure to arise
- The data warehouse is an ideal place to store
external and unstructured data
4- The above figure shows external and
unstructured data entering the data warehouse
5- The above figure shows what happens when external
data enters the corporation in an undisciplined fashion
- The identity of the source of the data is lost
- There is no coordination whatsoever in the
orderly use of the data
6External/Unstructured Data in the Data Warehouse
- Problems of external and unstructured data are
- - Frequency of availability
- - Total lack of discipline
- - Unpredictability
7Most Common Types of Unstructured Data
- Image data, stored as pictures
- Voice data, stored digitally, which can be
translated back into voice format
- The technology to capture and
manipulate image and voice data is not
nearly as mature as more conventional
technology.
8Methods to Capture and Store Unstructured
Information
- Place it on some bulk storage medium such as
near-line storage
- Create two stores of unstructured data
- - One store contains all of the unstructured
data
- - The other is a much smaller store
containing only a subset
9Meta Data and External Data
- Meta data is an important component of the data
warehouse
- The above figure shows the importance and role of
meta data
10Meta Data and External Data contd
- Through meta data, the manager determines much
information about the external data
- Scanning meta data eliminates much work because
it filters out documents that are not relevant or
are out of date
- Properly built and maintained meta data is
absolutely essential to the operation of the data
warehouse, particularly with regard to
external data
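The filtering role of meta data described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ExternalDocMeta` record, its fields (`doc_id`, `source`, `entry_date`, `keywords`), the `scan_metadata` helper, and the sample catalog are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExternalDocMeta:
    """Hypothetical meta data record for one external document."""
    doc_id: str
    source: str
    entry_date: date      # when the document entered the warehouse
    keywords: frozenset   # coarse description of the content


def scan_metadata(catalog, wanted_keywords, not_older_than):
    """Return meta data entries that are both relevant and current.

    Only the small meta data records are scanned; the document bodies
    are never opened.
    """
    wanted = set(wanted_keywords)
    return [
        meta for meta in catalog
        if meta.keywords & wanted and meta.entry_date >= not_older_than
    ]


# Made-up catalog entries, for illustration only.
catalog = [
    ExternalDocMeta("d1", "trade journal", date(2001, 3, 1), frozenset({"pricing", "retail"})),
    ExternalDocMeta("d2", "newspaper", date(1995, 6, 12), frozenset({"pricing"})),
    ExternalDocMeta("d3", "industry report", date(2001, 5, 20), frozenset({"logistics"})),
]

relevant = scan_metadata(catalog, {"pricing"}, not_older_than=date(2000, 1, 1))
print([meta.doc_id for meta in relevant])  # d2 is out of date, d3 is not relevant
```

The point of the sketch is that irrelevant and stale documents are rejected by looking at the meta data alone.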
11Notification Data
- The figure below shows notification data
- It is a file created for users of the system that
indicates the classifications of data that are
of interest to those users
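One way to picture a notification file is as a mapping from users to the classifications they care about. The sketch below assumes the file is consulted whenever new external data arrives; the user names, the classifications, and the helpers `build_index` and `users_to_notify` are hypothetical.

```python
from collections import defaultdict

# Hypothetical notification file: user -> classifications of interest.
notification_file = {
    "analyst_a": {"competitor pricing", "interest rates"},
    "analyst_b": {"interest rates"},
    "manager_c": {"regulation"},
}


def build_index(notifications):
    """Invert the notification file: classification -> interested users."""
    index = defaultdict(set)
    for user, classifications in notifications.items():
        for classification in classifications:
            index[classification].add(user)
    return index


def users_to_notify(index, classification):
    """Return, in sorted order, the users interested in a classification."""
    return sorted(index.get(classification, set()))


index = build_index(notification_file)
print(users_to_notify(index, "interest rates"))  # two interested users
print(users_to_notify(index, "weather"))         # nobody registered interest
```

Inverting the file once keeps the lookup cheap when many documents arrive.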
12Storing External/Unstructured Data
- An entry is made in the meta data of the
warehouse describing where the actual body of
external data can be found
- The external data is then stored elsewhere, wherever
it is convenient, as shown in the figure
13- External Data may be stored in/on
- - Filing Cabinet
- - Fiche
- - Magnetic Tape
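The arrangement on the last two slides, where the warehouse holds only a pointer and the body lives elsewhere, might be sketched as follows. The class names, the fields, and the free-text `location` format are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalDataEntry:
    """Hypothetical meta data entry pointing at externally stored data."""
    doc_id: str
    description: str
    medium: str    # e.g. "filing cabinet", "fiche", "magnetic tape"
    location: str  # free-text locator; the format is an assumption


class ExternalDataRegistry:
    """Holds only the pointers; the bodies stay wherever is convenient."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.doc_id] = entry

    def locate(self, doc_id):
        """Return (medium, location) for a document, or None if unknown."""
        entry = self._entries.get(doc_id)
        return (entry.medium, entry.location) if entry else None


# Made-up entries, for illustration only.
registry = ExternalDataRegistry()
registry.register(ExternalDataEntry("d7", "1999 market survey", "magnetic tape", "tape 0042, file 3"))
registry.register(ExternalDataEntry("d8", "supplier brochure", "filing cabinet", "cabinet B, drawer 2"))

print(registry.locate("d7"))
print(registry.locate("missing"))
```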
14Components of External/Unstructured Data
- External/unstructured data contains many different
components, some of which are more useful than others
- Large amounts of unstructured data can be
efficiently stored and managed in the following
manner
- - To manage the data, an experienced DSS
analyst needs to determine which are the most
important units of data
- - These units are then stored in an easy-to-reach
location
- - The remaining, less important data is placed
in a bulk storage location
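The split between an easy-to-reach store and bulk storage can be illustrated with a small ranking step. This is a sketch, assuming the analyst expresses importance as a numeric score; `partition_units`, the unit names, and the scores are hypothetical.

```python
def partition_units(units, importance, keep):
    """Split units into a small ready-access set and a bulk remainder.

    `importance` maps each unit name to a score assigned by the analyst;
    `keep` is how many of the top-scoring units stay within easy reach.
    """
    ranked = sorted(units, key=lambda unit: importance[unit], reverse=True)
    return ranked[:keep], ranked[keep:]


# Made-up units of a single external report and analyst-assigned scores.
units = ["title", "author", "summary", "full text", "appendix tables"]
importance = {"title": 5, "author": 3, "summary": 4, "full text": 2, "appendix tables": 1}

ready_access, bulk = partition_units(units, importance, keep=3)
print(ready_access)  # units kept in the easy-to-reach location
print(bulk)          # units sent to bulk storage
```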
15Modeling and External/Unstructured Data
[Figure labels: external data, unstructured data, data model, data warehouse]
Figure 8.6 There is only a faint resemblance of
external data/unstructured data to a data model.
Furthermore, nothing can be done about reshaping
external data and unstructured data.
From Building the Data Warehouse, 3rd Ed. by W.
H. Inmon
16Secondary Reports
From Building the Data Warehouse, 3rd Ed. by W.
H. Inmon
17Archiving External Data
- The useful lifetime of external data
- Discard or archive
- Storage
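The discard-or-archive decision can be written as a simple rule. The sketch below assumes the useful lifetime is expressed in days and that someone has already judged whether an item is worth archiving; the function name `disposition` and all dates are hypothetical.

```python
from datetime import date, timedelta


def disposition(entry_date, useful_lifetime_days, today, worth_archiving):
    """Decide what to do with one item of external data.

    Returns "keep" while the item is within its useful lifetime,
    otherwise "archive" or "discard" depending on `worth_archiving`.
    """
    expires = entry_date + timedelta(days=useful_lifetime_days)
    if today <= expires:
        return "keep"
    return "archive" if worth_archiving else "discard"


# Made-up dates, for illustration only.
today = date(2002, 1, 1)
print(disposition(date(2001, 11, 1), 90, today, worth_archiving=True))
print(disposition(date(2000, 1, 1), 365, today, worth_archiving=True))
print(disposition(date(2000, 1, 1), 365, today, worth_archiving=False))
```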
18Comparing Internal Data to External Data
From Building the Data Warehouse, 3rd Ed. by W.
H. Inmon