Data Warehousing Version 5.0 - 10/13/99 - PowerPoint PPT Presentation

1 / 44
About This Presentation
Title:

Data Warehousing Version 5.0 - 10/13/99

Description:

Data Warehousing Version 5.0 - 10/13/99 – PowerPoint PPT presentation

Number of Views:128
Avg rating:3.0/5.0
Slides: 45
Provided by: Davi1338
Learn more at: http://mis.njit.edu
Category:

less

Transcript and Presenter's Notes

Title: Data Warehousing Version 5.0 - 10/13/99


1
Data WarehousingVersion 5.0 - 10/13/99
2
Can your database answer questions like these?
  • What is the cost of staff to break into a new
    line of business?
  • What are the travel routes of my competitions
    inventory?
  • At what velocity is my competitor moving toward a
    common goal?
  • How will a transaction on a certain date be
    affected by currency exchange rates?
  • Is a foreign labor source likely to produce a
    higher quality product?
  • Which 20 of the problem creates 80 of the
    problems?

3
Can your database answer questions like these?
  • By product and location, how can we regain a lost
    customer base?
  • Which skill and staff levels are most likely to
    accept the voluntary layoff package?

4
What is a data warehouse?
  • A data warehouse is a pool of data organized in a
    format that enables users to interpret data and
    convert it into useful information to gain
    knowledge from this interpretation.
  • It is a single place that contains complete and
    consistent data from multiple sources.
  • Data warehousing is the act of a business person
    extracting business value from the data stored in
    the data warehouse.

5
Why Data Warehousing?
  • Managers do not make decisions that are good or
    bad, they make decisions on the basis of good
    or bad information.
  • Management information
  • a. the right information
  • b. in the right form
  • c. at the right time.
  • Most transaction-based information systems have
    difficulty delivering this information.

6
Why Data Warehousing - 2
  • Not the right information
  • data not easily accessible
  • meaning is subtly (or significantly different)
    from the question context.
  • Information is presented with too much or too
    little detail, covers the wrong time spans, or is
    in the wrong intervals.

7
Why Data Warehousing - 3
  • Not the right time
  • Getting this information may require the efforts
    of highly skilled professionals who are not
    generally available at the whim of business
    managers.
  • Data comes from a variety of different systems
    which are resident on a variety of different
    technology platforms.

8
Why Data Warehousing - 4
  • Not the right format
  • If data is extracted, merged, and converted into
    a meaningful information, often it is not in a
    usable format.
  • Users will want it loadable into a particular PC
    tool or spreadsheet with which he/she is
    familiar.
  • Printouts weighing 10 pounds are not in the right
    format.
  • a diskette with a COBOL file description is not
    in the right format.

9
The Dilemma for Corporate IT
  • How to control scarce IT resources consumed by
    insatiable user demand for ad-hoc reports.
  • Each ad-hoc report generated by IT and analyzed
    by the user generates three more reports to
    further illuminate the insights gleamed in the
    first.
  • Often the extract programs have few reusable
    components.
  • The user is on a voyage of discovery in a sea of
    data.

10
The Response of Corporate IT
  • New methodologies Align the IT systems with the
    business goals and requirements.
  • These techniques concentrate on business process
    requirements, not decision support requirements.
  • Transaction systems must be rigorously specified
    in advance. The are an intersection between the
    organization and the customer.
  • These systems should not be a voyage of
    discovery for either.

11
Transaction Systems vs. Analytical Support Systems
  • Transaction Systems
  • Insert an order for 300 baseballs
  • Update this passengers airline reservation.
  • close-out accounts payable records for this
    vendor.
  • What is the current checking account balance for
    this customer?
  • Analytical Support Systems
  • Did the sales promotion last quarter do better
    than the same promotion last year?
  • Is the five-day moving average for this security
    leading or trailing actual prices?
  • Which product line sells best in middle-America
    and how does this correlate to demographic data.

12
Key Difference OLTP vs. OLAP
  • OLTP Processing specific functions
  • OLAP providing flexibility for undetermined
    analysis.

13
A Multidimensional database
14
Data for Decision Support
  • The data must be integrated - requires data from
    many separate internal corporate databases.
  • The data must be enriched - through integration
    with other external data.
  • The data must be available - and not constrained
    by machine resources.

15
Sources of Data
  • Internal Data
  • Financial Systems
  • Logistics Systems
  • Sales Systems
  • Production Systems
  • Personnel Systems
  • Billing Systems
  • Information Systems
  • External Data Needs
  • to recognize opportunities
  • to detect threats
  • to identify synergies

16
Sources of Data - 2
  • External Data Categories
  • Competitor Data
  • Economic Data
  • Industry Data
  • Credit Data
  • Commodity Data
  • Econometric Data
  • Psychometric Data
  • Meteorological Data
  • Demographic Data
  • Sales Marketing Data

17
Operational Control vs. Operational Strategy
  • Data is a source not just of operational control,
    but of operational strategy.
  • Operational strategy is an attempt to describe
    the need, in a competitive and turbulent market,
    to continually innovate and re-align strategy
    with time scales too short to be comprehended by
    strategic planning in the conventional corporate
    sense.

18
Comparison of Control and Strategy Data
  • Operational Data
  • short-lived, rapidly changing
  • requires record-level access
  • repetitive standard transactions and access
    patterns
  • updated in real-time
  • event-driven process generates data
  • Strategic Data
  • long-living, static
  • data aggregated into sets (which is why warehouse
    data is friendly to RDBMS).
  • ad-hoc queries with some periodic reporting
  • updated periodically with mass loads
  • data-driven data governs process.

19
Information Requirements by Management
Level(Source Gorry and Scott Morton)
20
Dimensional Modeling
  • Dimensional Modeling gives us a way to visualize
    data.
  • The CEOs perspective
  • We sell products in various markets, and we
    measure our performance over time.
  • From the data warehouse designers perspective,
    we hear three dimensions
  • We sell Products
  • in various Markets
  • and measure performance over time.

21
Dimensional Model of the Business
e
m
i
T
M a r k e t
Product
22
Data Dependencies Model of a Business
Product
Ship To
Shipper
Ship Type
Product Line
District Credit
Contact Location
Order Item
Sales Order
Customer Location
Product Group
Contact
Contract
Customer
Contract Type
Sales Rep
Sales District
Sales Region
Sales Division
23
Transaction Processing
  • The Relation Model was full of promises for equal
    access to data.
  • In the early 1980s the relational model was a
    dream. Typical transaction rates were one per
    second.
  • Today the SABRE system typically processes 4,000
    transactions per second, with peak bursts of
    13,000 per second.
  • OLTP - (On-line Transaction Processing) The point
    is to get data in to the database.

24
Segregating Operational and Warehouse Data
  • In the past, data administrators were constantly
    told to build data sharing, normalization, and
    non-redundant corporate databases.
  • Early attempts at data warehousing tried to share
    the data with transaction-based systems. This
    resulted in LONG response times for complex
    queries.
  • The idea today is to keep the two separate.
  • Separate databases, and perhaps separate DBMS
    products and processor platforms are used.
  • Controlled and practical redundancy is better
    than out-of-control theoretical purity.

25
Fundamental Obstacles With Traditional Systems
  • Systems Integration - Disintegration grew slowly
    from islands of automation.
  • ownership, planning, economic, organizational
    development issues all contribute.
  • Hardware Architecture
  • Inconsistent Data
  • Data Pollution
  • Bad Application Design (semantic and syntactical
    differences).
  • Ownership
  • Data Entry Conventions

26
The Data Warehouse
  • Active, tactical, and current events flow from
    the operational systems to the data warehouse to
    become static, strategic, and historical data.
  • The data warehouse becomes a middle ground
    where a large number of disparate and
    incompatible legacy systems are tied to an
    equally diverse collection of end-user
    workstations.
  • Legacy systems usually comprise a hodge-podge of
    assorted hardware, software, and operational
    systems accumulated over many decades, are by
    nature, incompatible with one another and unique
    to each organization.

27
Practical Facts About the Warehouse
  • The chances are remote that any single vendor
    will be able to develop a product that can
    interface with all legacy systems painlessly
    and seamlessly.
  • Instead warehouse product vendors develop
    specialized capabilities to work with various
    environments.

28
Typical Dimensional Model
Product Dimension
Sales Fact
Product_key description brand category
Time_key product_key store_key dollars_sold units_
sold dollars_cost
Time Dimension
Time_key day-of-week month quarter year holiday_fl
ag
Store Dimension
Store_key store_name address floor_plan_type
29
Fact Table
  • Fact Table is where numerical measurements of the
    business are stored.
  • Each measurement is taken at the intersection of
    all the dimensions.
  • The best facts are numeric, continuously valued
    and additive.
  • For every query made against the fact table may
    use hundreds of thousands of individual records
    to construct an answer set.

30
Dimension Tables
  • Dimension tables are where textual descriptions
    of the business are stored.
  • Each textual description helps to describe a
    member of the dimension.
  • Example each member in the product dimension is
    a specific product. The product dimension
    database has many attributes to describe the
    product. A key role of the dimension table
    attribute is to serve as the source of
    constraints in a query.

31
Example
32
Example Query
  • Find all product brands that were sold in the
    first quarter of 1995 and present the total
    dollar sales as well as the number of units.
  • Brand is a collection of individual products.
  • To construct
  • A. Drag attribute brand from product dimension.
    Place as Row Header.
  • B. Drag Dollar Sales and Units Sold from the
    Fact Table, and place to the right of the Brand
    row header.
  • C. Specify row constraint 1st Q 1995 on the
    quarter attribute in the Time Dimension Table.

33
Components of a Data Warehouse - 1
  • Acquisition - The first component handles
    acquisition of data from legacy systems and
    outside sources.
  • Data is identified, copied, formatted, and
    prepared for loading into a warehouse.
  • Vendors provide tools for extraction and
    preparation.

34
Components of a Data Warehouse - 2
  • Storage Area - The second component is the
    storage area managed by relational databases,
    multi-dimensional databases, specialized hardware
    - symmetric multiprocessor (SMP) or massively
    parallel processors (MPP) machines - or by
    software.
  • The storage component hold the data so that many
    different data mining, executive information and
    decision support systems can make use of it
    effectively.

35
Components of a Data Warehouse - 3
  • Access - The third component of the warehouse is
    the access area.
  • Different end-user PCs and workstations draw data
    from the warehouse with the help of
    multi-dimensional analysis tools, neural
    networks, data discover tools, or analysis tools.
  • These smart data-mining tools are the driving
    force behind the data warehouse concept.
  • What good is it to store all the information
    without some way to understand it in new and
    different ways.

36
Data Warehouse Access Tools
  • Intelligent Agents and Agencies - tools work and
    think for user.
  • Query Facilities and Managed Query environments.
  • Statistical Analysis - One of the biggest
    surprises in the data warehousing marketplace is
    the resurgence of interest in traditional
    statistical analysis, and the concomitant
    resurrection of the popularity to products like
    SAS and SPSS.

37
Data Warehouse Access Tools - 2
  • Data Discovery - A large class of tools formerly
    classified as decision support, artificial
    intelligence and expert systems. They now make
    use of neural networks, fuzzy logic, decision
    trees, and other tools from advanced mathematics
    to allow a user to sift through massive amounts
    of raw data to discover new, interesting,
    insightful, and in many cases useful things about
    the organization, its operations, and its
    markets.
  • Currently there are nearly 60 different data
    discovery tools/products on the market.

38
Data Warehouse Access Tools - 3
  • OLAP - On-line Analytical Processing often uses
    multi-dimensional spreadsheet tools allowing
    users to look at information from many different
    angles.
  • Users are able to slice and dice reports and to
    look at the same kinds of information at
    different levels at the same time.
  • Typical OLAP application might allow a product
    manager to view sales figures for a given product
    at the national level, see them broken down by
    division, drill down to see territories within a
    division, check sales numbers for each store
    within a territory, and then compare them against
    sales of stores from another territory.

39
Data Warehouse Access Tools - 4
  • Data Visualization - These tools turn ugly,
    boring numbers into exciting visual
    presentations.
  • These tools bring graphical representation to new
    heights. Example Geographical information
    systems turn data about stores, individuals, or
    anything else into compelling, easy to
    understand, dynamic maps.
  • PC-based Geographic Information systems have the
    ability to display spatial occurrences and the
    relationship between and among geographically
    specific variables.

40
Developing the Data Warehouse
  • The most expensive warehousing ventures involve
    major new hardware acquisitions and significant
    investments in training, analysis, and systems
    development costs.
  • Typical startup projects allocate 60 of budget
    for hardware and software for creation of a
    powerful storage component. 30 on data mining
    and acquisition tools.
  • Budgeting for Systems Analysis and Development
    has 50 of budget on acquisition capabilities,
    30 fund user solutions, 20 creation of
    databases in the storage component.

41
Developing the Data Warehouse
  • Clarify what you want to do with the Warehouse -
    How Will It be Used.
  • Scrutinize the offerings of vendors and systems
    integrators. Make sure you understand which
    functions they provide, and which you must build.
  • Most successful projects start as small, tightly
    defined tactical systems to solve pressing
    business needs, and develop into larger systems
    over time.

42
DW Summary Key Concepts
  • The DW is a collection of integrated,
    subject-oriented databases designed to support
    the decision support function where each unit of
    data is non-volatile and relevant to some moment
    in time (W.H. Inmon, 1992).
  • Implicit Assumptions
  • physically separate from operational systems
  • hold aggregated data and transactional (atomic)
    data for management separate from those used for
    OLTP.

43
DW Summary Characteristics
  • Subject-orientation
  • integrated
  • non-volatile
  • time variant
  • summarized
  • large volume
  • not normalized
  • metadata
  • data sources

44
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com