Designing a Data Warehouse - PowerPoint PPT Presentation

About This Presentation
Title:

Designing a Data Warehouse

Description:

The Storage Area Managed by Relational databases like those from Oracle Corp. or Informix Software Inc. Specialized hardware symmetric multiprocessor ... – PowerPoint PPT presentation

Number of Views:140
Avg rating:3.0/5.0
Slides: 22
Provided by: AlbertSch4
Learn more at: https://www.ou.edu
Category:

less

Transcript and Presenter's Notes

Title: Designing a Data Warehouse


1
Designing a Data Warehouse
  • Issues in DW design

2
Three Fundamental Processes
  • Data Acquisition
  • Data Storage
  • Data a
  • Access

3
(No Transcript)
4
Data Acquisition
  • Handles acquisition of data from legacy
    systems and outside sources.
  • Data is identified, copied, formatted and
    prepared for loading into the warehouse.

5
Acquisition steps
  • Catalog the data
  • Develop an inventory of where it is and what it
    means.
  • Clean and prepare the data.
  • Extract from legacy files and reformat to make it
    usable.
  • Transport data from one location to
    another.

6
Storage
  • The storage component holds the
    data so that the many different data mining,
    executive information and
    decision support systems can make use of it
    effectively.

7
The Storage Area
  • Managed by
  • Relational databases
  • like those from Oracle Corp. or Informix Software
    Inc.
  • Specialized hardware
  • symmetric multiprocessor (SMP)
  • or massively parallel processor
    (MPP) machines

8
Storage
  • The majority of warehouse storage today is being
    managed by relational databases running on Unix
    platforms.
  • Oracle, Sybase Inc., IBM Corp. and Informix
    control 65 percent of the warehouse storage
    market. Meta Group Inc. (1996)

9
Access
  • Different end-user PCs and workstations draw data
    from the warehouse with the help of
    multidimensional analysis products, neural
    networks, data discovery tools or analysis tools.
  • These powerful, "smart" software products are the
    real driving force behind the viability of data
    warehousing.

10
Access Tools
  • Intelligent Agents and Agencies
  • Query Facilities and Managed Query Environments
  • Statistical Analysis
  • Data Discovery.
  • (decision support, artificial intelligence and
    expert systems)
  • OLAP
  • Data Visualization

11
Hardware Budget
  • A typical startup warehouse project allocates
    more than 60 percent of its budget for hardware
    and software to the creation of a powerful
    storage component, spending just 30 percent on
    data mining and user access technologies.

12
Systems Analysis Budget
  • Budgeting for systems analysis and development,
    however, follows a very different pattern.
  • More than 50 percent of development dollars are
    spent on building acquisition capabilities,
  • 30 percent fund the development of user solutions
    and
  • 20 percent are dedicated to the creation of
    databases in the storage component.

13
(No Transcript)
14
Design Issues
  • Relational and Multidimensional Models
  • Denormalized and indexed relational models more
    flexible
  • Multidimensional models simpler to use and more
    efficient

15
Star Schemas in a RDBMS
  • In most companies doing ROLAP, the DBAs have
    created countless indexes and summary tables in
    order to avoid I/O-intensive table scans against
    large fact tables. As the indexes and summary
    tables proliferate in order to optimize
    performance for the known queries and
    aggregations that the users perform, the build
    times and disk space needed to create them has
    grown enormously, often requiring more time than
    is allotted and more space than the original
    data!

16
Building a Data Warehouse from a Normalized
Database
  • The steps
  • Develop a normalized entity-relationship business
    model of the data warehouse.
  • Translate this into a dimensional model. This
    step reflects the information and analytical
    characteristics of the data warehouse.
  • Translate this into the physical model. This
    reflects the changes necessary to reach the
    stated performance objectives.

17
The Business Model
  • Identify the data structure, attributes and
    constraints for the clients data warehousing
    environment.
  • Stable
  • Optimized for update
  • Flexible

18
Business Model
  • As always in life, there are some disadvantages
    to 3NF
  • Performance can be truly awful. Most of the work
    that is performed on denormalizing a data
    model is an attempt to reach performance
    objectives.
  • The structure can be overwhelmingly complex. We
    may wind up creating many small relations which
    the user might think of as a single relation or
    group of data.

19
Structural Dimensions
  • The first step is the development of the
    structural dimensions. This step corresponds very
    closely to what we normally do in a relational
    database.
  • The star architecture that we will develop here
    depends upon taking the central intersection
    entities as the fact tables and building the
    foreign key gt primary key relations as
    dimensions.

20
Simple DW pattern.
21
Other Dimensions
  • Categorical dimensions generated groups
    (additional key components)
  • Partitioning dimensions subtypes (planned vs.
    actual)
  • Informational dimensions generate different
    types of data (messy).
Write a Comment
User Comments (0)
About PowerShow.com