1
Data Warehousing
  • Denis Manley
  • Enterprise Systems
  • FT228/3

2
Data Warehousing
  • A data warehouse is a repository for all an
    organisation's data. It is designed to present
    data in very user-friendly formats to end-users.
  • Why is Data Warehousing necessary?
  • Think of a typical company that uses computers
    for order processing, invoicing, etc.
  • Data is gathered for every transaction that
    occurs. Different departments probably use
    different systems, each designed for a specific
    task.

3
Data Warehousing (contd)
  • There will be lots of different database tables,
    in different formats, on different machines.
    There will be tie-ups between data items, but
    those tie-ups may not be obvious outside the
    applications that manipulate the tables.
  • Also, the amount of data is constantly
    increasing. Some estimates state that the size of
    a typical data processing system doubles every
    two years.
  • Both standard, and ad-hoc, reports must be
    catered for, and they must run in a reasonable
    time-frame.

4
Data Warehousing (contd)
  • Some solutions to the issues listed can be
    implemented in the applications that manage
    specific data.
  • E.g. if you want to ask "How much does this
    customer owe?", then the original package is
    probably the one to use.
  • But if you want to ask "Was this ad campaign more
    successful than that one?", you require data from
    more disparate sources, and one application may
    not provide all of it.

5
Data Warehousing (contd)
  • The new alternative is a Data Warehouse
  • A D.W. is a way of organising data from a wide
    range of formats, possibly from different
    locations, so it's all available in a single
    location.
  • Once this stage is complete, the collection of
    data is usually replicated around multiple
    locations.
  • This means users have a local copy of the data
    they need to inspect. This improves query
    run-times, and reduces communications overheads.

6
Data Warehousing (contd)
  • The data is not only collected and joined,
    however. What the data means is described in a
    way that the end-users can easily understand,
    e.g. using the name "Customer Account Number"
    instead of "order>cacno" for a field (see the
    sketch after this slide).
  • The data must be quickly accessible, so that
    users can find data quickly and easily.
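
One hedged way such business-friendly naming can be exposed is through a relational view over the operational table; the table and column names below (orders, cacno, etc.) are hypothetical illustrations, not taken from the presentation.

-- Sketch: expose a cryptic operational column under a business-friendly name.
CREATE VIEW customer_orders AS
SELECT
    cacno   AS Customer_Account_Number,  -- cryptic field renamed for end-users
    ord_dt  AS Order_Date,
    ord_amt AS Order_Total_Amount
FROM orders;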

7
Data Warehousing process
Operational data in legacy systems (e.g. OLTP apps)
  -> Data fusion (assembles diverse data) and data
     cleansing (fixing data)
  -> Meta data (shows transformations of data,
     origins of data, etc.)
  -> Data migration (load data and meta-data into
     the warehouse periodically)
  -> Data Warehouse
  -> Decision-support analyst queries
8
Data Warehousing Architecture
  • A data warehouse is an architectural construct
    of an information system that provides users with
    current and historical decision-support
    information that is hard to access or present in
    traditional operational data stores.
  • It comprises a number of components, illustrated
    in figure 1.

9
Data Warehousing Architecture
Figure 1: A data warehouse architecture
10
Data Warehousing Architecture
  • Data Warehouse database
  • This is a cornerstone of the data warehouse (item
    2 in figure 1)
  • Almost always implemented on a relational
    database management system
  • However, very large databases, ad hoc processing
    and the need for flexible user views (e.g.
    aggregates and drill-downs) are calling for
    different technological approaches
  • Parallel relational database designs requiring
    the use of symmetric multiprocessors, massively
    parallel processors and clusters

11
Data Warehousing Architecture
  • Speeding up traditional RDBMS
  • The use of multidimensional databases (MDDBs),
    which are tightly coupled to on-line analytical
    processing

12
Data Warehousing Architecture
  • Sourcing, acquisition, cleanup and transformation
    of data
  • Implementing data warehouses involves extracting
    data from operational systems including legacy
    systems and putting it into a suitable format.
  • The various tools are illustrated as item 1 in
    figure 1
  • These tools perform all the conversions,
    summarisations, key changes, structural changes,
    and condensations needed to transform disparate
    data into information that can be used by
    decision-support tools

13
Data Warehousing Architecture
  • It produces programs and control statements
    required to move data into the data warehouse
    from multiple operational systems
  • Maintains the metadata
  • Removes unwanted data
  • Converts to common data names and definitions
  • Calculates summaries
  • Establishes defaults for missing data
  • Keeps track of source data definition changes
    (a transformation sketch follows this slide)
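
A minimal, hedged sketch of this kind of cleanup and transformation step; the table and column names (stg_orders, dw_sales, cacno, etc.) are hypothetical illustrations, not taken from the presentation.

-- Sketch: transform staged operational data into warehouse form.
INSERT INTO dw_sales (customer_account_number, product_code, sale_date, quantity)
SELECT
    UPPER(TRIM(cacno)),            -- convert to a common key format
    COALESCE(prod_cd, 'UNKNOWN'),  -- establish a default for missing data
    ord_dt,
    SUM(qty)                       -- calculate a summary
FROM stg_orders
WHERE qty IS NOT NULL              -- remove unwanted (invalid) rows
GROUP BY UPPER(TRIM(cacno)), COALESCE(prod_cd, 'UNKNOWN'), ord_dt;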

14
Data Warehousing Architecture
  • The tools have to deal with
  • Database heterogeneity: DBMSs can be very
    different in data models, data access language,
    etc.
  • Data heterogeneity: the difference in the way
    data is defined and used, e.g. synonyms and
    different attributes for the same entity, etc.

15
Data Warehousing Architecture
  • Metadata (item 3, figure 1): data about data that
    describes the data warehouse
  • Technical metadata: contains information for
    warehouse designers and administrators
  • Information about data sources
  • Transformation descriptions
  • Rules used to perform data clean-up
  • Access authorisation, information delivery
    history, data acquisition history, data access,
    etc.
  • Business metadata: information that gives users
    an understanding of the information stored in the
    data warehouse (a catalogue sketch follows this
    slide)
  • Queries, reports, images
  • Data warehouse operational information, e.g. data
    history and ownership
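
A hedged sketch of how technical and business metadata for a single warehouse column might be catalogued together; the table and its columns are hypothetical illustrations, not part of the presentation.

-- Sketch of a combined technical/business metadata catalogue.
CREATE TABLE dw_metadata (
    warehouse_column  VARCHAR(64),   -- e.g. 'dw_sales.customer_account_number'
    source_system     VARCHAR(64),   -- technical: where the data came from
    transformation    VARCHAR(256),  -- technical: rule used to derive/clean it
    business_name     VARCHAR(128),  -- business: user-friendly name
    business_meaning  VARCHAR(512)   -- business: definition users understand
);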

16
Meta Data Versioning
  • In the operational environment, there tends to be
    a single instance of data and meta data at any
    one moment in time.
  • In the data warehouse environment, there are
    multiple definitions of data and meta data over
    an historically long period of time.
  • Therefore, versioning of data warehouse meta data
    is crucial to the success of the data warehouse
    vis-a-vis the end users' ability to access and
    understand the data in the data warehouse (a
    versioning sketch follows this slide).
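
One hedged way to version such definitions is to timestamp each one, as in the sketch below, which builds on the hypothetical catalogue table sketched earlier; the column and value names are illustrative only.

-- Sketch: keep every historical definition, with its validity interval.
ALTER TABLE dw_metadata ADD effective_from DATE;
ALTER TABLE dw_metadata ADD effective_to   DATE;  -- NULL means current version

-- Which business definition was in force at the end of 1996?
SELECT business_meaning
FROM dw_metadata
WHERE warehouse_column = 'dw_sales.customer_account_number'
  AND effective_from <= DATE '1996-12-31'
  AND (effective_to IS NULL OR effective_to >= DATE '1996-12-31');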

17
Guidelines for Metadata Management
  • Develop an Information Directory that integrates
    technical and business metadata.
  • Keep the metadata current and accurate!
  • Maintain the time variant history of the
    metadata!
  • Provide meaningful descriptions and definitions
    (use business definitions, not technical
    definitions, where possible).
  • The end users must be educated about what
    metadata is, how to access it, how to use it, etc.

18
Meta Data Answers Questions for Users of the
Data Warehouse
  • How do I find the data I need?
  • What is the original source of the data?
  • How was this summarization created?
  • What queries are available to access the data?
  • How have business definitions and terms changed
    over time?
  • How do product lines vary across organizations?
  • What business assumptions have been made?

19
The Role of Meta Data in the Data Warehouse
Architecture
Meta Data enables data to become information,
because with it you
  • Know what data you have, and
  • You can trust it!

20
Data marts (item 4 figure 1)
  • A data mart is a data store that is subsidiary to
    a data warehouse of integrated data.
  • The data mart is directed at a partition of data
    (subject area) that is created for the use of a
    dedicated group of users and is sometimes termed
    a "subject warehouse".
  • The data mart might be a set of denormalised,
    summarised or aggregated data that can be placed
    on the data warehouse database or, more often,
    placed on a separate physical store.
  • Data marts can be dependent data marts when the
    data is sourced from the data warehouse (a
    dependent-mart sketch follows this slide).
  • Independent data marts represent fragmented
    solutions to a range of business problems in the
    enterprise; however, such a concept should not be
    deployed, as it doesn't have the data
    integration concept that's associated with data
    warehouses.
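
A hedged sketch of a dependent data mart populated by summarising the warehouse's atomic data; the fact, dimension and mart names are hypothetical illustrations.

-- Sketch: a departmental mart built as a summarised subset of warehouse data.
CREATE TABLE mart_marketing_monthly AS
SELECT
    d.month,
    p.product_type,
    SUM(f.total_quantity) AS quantity_sold
FROM dw_sales_fact f
JOIN date_dim d    ON d.date_key    = f.date_key
JOIN product_dim p ON p.product_key = f.product_key
GROUP BY d.month, p.product_type;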

21
Independent Data Marts
(Diagram: a data mart fed directly from the systems
of record by its own extract, transform, clean,
integrate and summarize process.)
22
Independent Data Marts
(Diagram: three independent marts, each repeating
the extract, transform, clean, integrate and
summarize work. Three times!)
23
Independent Data Marts

Significant and expensive duplication of effort
and data.
24
Independent Data Marts


Maintenance of proliferating unarchitected marts
expensive and cumbersome.
25
Unarchitected Data Marts
There may be metadata for some marts, but what
about consistency and history?
26
Contrast: Architected Data Warehouse
(Diagram: the systems of record feed a single data
warehouse, which in turn feeds dependent
(architected) departmental marts, each with the
appropriate subset of metadata, accessed through a
data access layer.)
27
Independent Data Marts vs. The Real Thing
  • Architected to meet organizational as well as
    departmental requirements
  • Data and results consistent
  • Redundancy is managed
  • Detailed history available for drill-down
  • Metadata is consistent!
  • Easy to do, but...
  • Are the extracts, transformations, integrations
    and loads consistent?
  • Is the redundancy managed?
  • What is the impact on the sources?

28
Independent Data marts
  • However, such marts are not necessarily all bad.
  • Often a valid solution to a pressing business
    problem
  • Extremely urgent user requirements
  • The absence of a budget for a full data warehouse
  • The decentralisation of business units

29
Data Warehousing Architecture
  • Access Tools (item 5 figure 1)
  • The principal purpose of the data warehouse is to
    provide information for strategic decision
    making.
  • The main tools used to achieve this objective
    are
  • Data query and reporting tools
  • Executive information system tools
  • On-line analytical processing tools
  • Data mining tools

30
Data Warehousing Architecture
  • Query and reporting tools
  • Reporting tools
  • Production reporting tools, e.g. to generate
    operational reports
  • Report writers: inexpensive desktop tools
  • Managed Query tools
  • Shield users from the complexities of SQL and
    database structures by inserting a metalayer
    between the users and the database

31
A Few Definitions
  • OLAP
  • On-Line Analytical Processing
  • A set of functionality that attempts to
    facilitate multidimensional analysis
  • Multidimensional Analysis
  • The ability to manipulate information by a
    variety of relevant categories or dimensions to
    facilitate analysis and an understanding of that
    data
  • Has also been called "drill-down",
    "drill-across" and "slice and dice" (query
    sketches follow this slide)
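
Hedged sketches of two of these operations against a hypothetical sales star schema (all table and column names are illustrative): a slice fixes one dimension member, while a drill-down regroups the same facts at a finer level of a dimension.

-- Slice: fix one dimension member (the year 1996) and analyse the rest.
SELECT p.product_type, s.store_location, SUM(f.total_quantity) AS qty
FROM dw_sales_fact f
JOIN product_dim p ON p.product_key = f.product_key
JOIN store_dim s   ON s.store_key   = f.store_key
JOIN date_dim d    ON d.date_key    = f.date_key
WHERE d.year = 1996
GROUP BY p.product_type, s.store_location;

-- Drill-down: regroup at a finer grain (month within year instead of year).
SELECT d.year, d.month, p.product_type, SUM(f.total_quantity) AS qty
FROM dw_sales_fact f
JOIN product_dim p ON p.product_key = f.product_key
JOIN date_dim d    ON d.date_key    = f.date_key
GROUP BY d.year, d.month, p.product_type;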

32
A Few Definitions
  • Hypercube
  • A means of visually representing multidimensional
    data
  • Star Schema
  • A means of aggregating data based on a set of
    known dimensions, attempting to store data
    multidimensionally in a two dimensional RDBMS
  • Snowflake Schema
  • An extension of the star schema by means of
    applying additional dimensions to the dimensions
    of a star schema in a relational environment

33
A Few Definitions
  • Multidimensional Database
  • Also known as MDDB or MDDBS
  • A class of proprietary, non-relational database
    management tools that store and manage data in a
    multidimensional manner, as opposed to the two
    dimensions associated with traditional relational
    database management systems
  • OLAP Tools
  • A set of software products that attempt to
    facilitate multidimensional analysis
  • Can incorporate data acquisition, data access,
    data manipulation, or any combination thereof

34
A Few Definitions
  • ROLAP
  • Relational OLAP
  • Using an RDBMS to implement an OLAP environment
  • Typically involves a star schema to provide the
    multidimensional capabilities
  • OLAP tool manipulates RDBMS star schema data

35
A Few Definitions
  • MOLAP (1)
  • Multidimensional OLAP
  • Using an MDDBS to store and access data
  • MDDBS directly manages data multidimensionally
  • Usually requires proprietary (non-SQL) access
    tools
  • MOLAP (2)
  • OLAP tool facilitates multidimensional
    capabilities without the need for a star schema
  • Often utilizes a 3-Tier environment, where middle
    tier server preprocesses data from an RDBMS
  • Some OLAP tools access an RDBMS directly and
    build cubes as a fat client

36
How Can OLAP Be Accomplished?
  • Use the Data Warehouse as the architected
    foundation for the organization's informational
    processing requirements
  • Use the appropriate design techniques to ensure
    that the data required is at the appropriate
    degree of granularity at the atomic level, and
    the appropriate degree of summarization at the
    departmental level (see the sketch after this
    list)
  • Use the appropriate tools to either access
    relational data in a multidimensional manner, or
    manage multidimensional data
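
As a hedged illustration of serving several degrees of summarization from the same atomic facts, the sketch below uses GROUP BY ROLLUP (part of standard SQL and supported by most relational OLAP engines) over the hypothetical fact table used in the earlier sketches.

-- One query produces monthly, yearly and grand-total rows from atomic facts.
SELECT d.year, d.month, SUM(f.total_quantity) AS qty
FROM dw_sales_fact f
JOIN date_dim d ON d.date_key = f.date_key
GROUP BY ROLLUP (d.year, d.month);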

37
Data mining
  • A critical success factor (CSF) for any business
    is its ability to effectively use information.
  • This strategic use can come from discovering
    hidden, undetected and frequently valuable
    information about customers, suppliers,
    retailers, etc.
  • This information can be used to formulate
    effective business, marketing and sales
    strategies.
  • A relatively new technology that can be used to
    achieve this strategic advantage is data mining

38
Data visualisation
  • Data visualisation is a method of presenting the
    output of the previously mentioned methods in
    such a way that the problem or solution is
    clearly visible to domain experts and even casual
    observers.
  • It goes way beyond simple bar and pie charts
  • It is a collection of complex techniques that
    focus on determining the best way to display
    complex patterns on a 2-D computer monitor.
  • Such techniques involve experimenting with
    various colours, shapes, 3-D images, sound and
    virtual reality to help users really see and feel
    the problem and solution.

39
DW administration and management (item 6, figure 1)
  • Managing data warehouses involves
  • Security and priority management
  • Monitoring updates from multiple sources
  • Data quality checks
  • Managing and updating metadata
  • Replicating, subsetting and distributing data
  • Backup and recovery
  • Data warehouse storage management e.g. capacity
    planning, hierarchical storage management,
    purging of aged data.

40
Information delivery system (item 7, figure 1)
  • The IDS is used to enable the process of
    subscribing for data warehouse information and
    having it delivered to one or more destinations
    of choice according to some user-specified
    scheduling algorithm
  • Delivery may be based on time of day or on
    completion of an external event.
  • The IDS can be achieved with a client/server
    architecture, and now via the Internet/intranet
    and the World Wide Web.

41
Top 10 Don'ts of Data Warehousing
42
10. Pre-selecting Your Technical Environment
  • This is a very common trap in which many
    organizations find themselves. It is traditional
    to select the hardware, software and other
    physical, technical components of a system as one
    of the earliest activities.
  • However, a data warehouse is an unstable
    environment from a sizing perspective. How do
    you know the hardware/RDBMS/end user tool is
    appropriate for your data warehouse before
    conducting even the first round of analysis?
  • If at all possible, wait to select your technical
    environment until after you have analyzed the
    business requirements for information, data, and
    potential systems of record.

43
9. Allowing Access Tool to Determine Data
Architecture
  • This is an extension of 10, but is important
    enough to list by itself.
  • If you select an end user tool before developing
    your data architecture, it is very likely that
    that architecture will suffer at the hands of
    design requirements delivered by the tool.
  • If you have to sacrifice design requirements in
    order to meet functional requirements of a tool,
    it is probably time to put that tool aside and
    select another one.

44
8. Unarchitected Data Marts
  • OK. Data marts are good; they are an essential
    part of the data warehouse architecture. But to
    build only a data mart and to ignore the rest of
    the data warehouse (specifically the atomic level
    data and centralized meta data) will lead you
    down a path that will be more expensive and
    deliver less quality of data than the
    alternative.
  • The alternative is to architect and build the
    data warehouse incrementally, iteratively.
    Include data marts as departmental instances of
    the architecture, and populate them from the
    atomic level data. This will ensure accuracy
    across the architecture, and reduce costs by
    eliminating unnecessary population of stand-alone
    data marts.

45
7. Boiling the Ocean
  • It is more efficient to implement the data
    warehouse in small, achievable and palatable
    chunks than to try to implement it all at once.
    When I say "boil the ocean", I mean trying to do
    too many things for too many people all at the
    same time.
  • There is an old adage: "You can't have
    everything; where would you put it all?" The same
    holds true for a data warehouse. If you try to
    design, develop and implement a data warehouse
    that is all-encompassing as your first iteration,
    how will the users be able to use all that you
    delivered? And in the meantime, while you've
    been trying to meet all of their needs, you have
    failed to meet any needs. And users won't forget
    that for a long time.

46
6. "If you build it, they will come"
  • If you design, develop and implement an
    operational system, such as an order processing
    system, that new system is typically going to
    replace an existing system. In other words, the
    old system goes away and the users have no choice
    but to use the new one. Not so with the data
    warehouse.
  • "If you build it" implies an analysis that
    includes only bottom-up activities. It is
    crucial to the success of a data warehouse that a
    top-down analysis of user requirements for
    information be conducted.
  • After that, users must be tutored, mentored and
    otherwise have their hands held as part of the
    implementation of the data warehouse. Existence
    does not guarantee utilization and, therefore,
    value.

47
5. Putting ROI before RFI (Requirements for
Information)
  • It is very difficult to quantify the intangible
    benefits that a data warehouse can provide to an
    organization. How can you put a price on
    increased customer loyalty. Somewhere, sometime,
    someone has probably made this calculation. In
    most cases, however, the determination of how
    beneficial the data warehouse will be is based on
    criteria that was developed for operational
    systems. Just as you cannot use operational data
    to meet your strategic informational
    requirements, it is difficult to calculate the
    return on investment (ROI) of a data warehouse.
  • In terms of benefits to the organization, it is
    more appropriate to concentrate on how well the
    data warehouse addresses the target users'
    requirements for information.

48
4. No Committed User Involvement
  • Write this down:
  • The success of any data warehouse is directly
    proportional to the amount of end user
    participation!
  • A data warehouse cannot be successful without
    active participation on the part of the target
    users. Period. If you do not have user
    participation, you will find yourself in a
    situation where you will build it and hope that
    they will come.
  • If there is no serious user participation in a
    data warehouse project, you have to seriously
    question whether or not the organization truly
    needs a data warehouse.

49
3. No Dedicated DBA
  • In many situations the lack of a dedicated
    database administrator (DBA) has prevented a data
    warehouse project from being completed 1) on
    time, or 2) successfully.
  • "Borrowing" a DBA from the operational pool
    will only result in questions about the nature of
    the data warehouse data models and database
    design. "It's too flat", "not normalized
    properly", "too much redundancy" and other
    criticisms are well suited to an operational
    system's database design, but not a data
    warehouse.
  • Considering that "data" is the first word in
    "data warehouse", be sure you have a dedicated
    database administration resource committed to
    this important project.

50
2. No Meta Data
  • Meta data is like documentation and training:
    everyone knows it is necessary, but it usually
    gets dropped somewhere along the route to
    implementation.
  • For the data warehouse, meta data is more
    important than just your typical documentation.
    Remember, in order to turn data into information
    you have to have the data, know that you have it,
    be able to access it, and trust it. Meta data is
    the means by which the users will be able to
    understand and trust the data. A time-variant
    record of where data came from, what happened to
    it along the way, where it is in the data
    warehouse, and what vehicles exist to access it
    will spell the difference between success and
    frustration.

51
1. Analysis Paralysis
  • The inability to proceed past a sticking
    question. Wanting to "boil the ocean" and
    model/design everything before proceeding with
    development. Having to resolve political issues
    surrounding a "standard" or "common" definition.
    All of these things (and more!) will result in
    analysis paralysis.
  • The 80/20 rule is very applicable to the
    development of a data warehouse. Execute 20% of
    the effort to get 80% of the total outcome, then
    move on to the next set of challenges and
    opportunities. Many data warehouse failures
    started when the development team stopped.
  • Get your hands around an idea, understand what
    the users' requirements for information are, and
    build something that produces something that can
    be evaluated. Don't just stand there; do
    something!

52
Parallel Data Management
  • A topic that's closely linked to Data Warehousing
    is that of Parallel Data Management.
  • The argument goes:
  • if your main problem is that your queries run too
    slowly, use more than one machine at a time to
    make them run faster (Parallel Processing).
  • Oracle uses this strategy in its warehousing
    products.
  • There are two types of parallel processing -
    Symmetric Multiprocessing (SMP), and Massively
    Parallel Processing (MPP)

53
Parallel Data Management (contd)
  • SMP - means the O.S. runs and schedules tasks on
    more than one processor without distinction.
  • in other words, all processors are treated
    equally in an effort to get the list of jobs
    done.
  • MPP - more varied in its design, but essentially
    consists of multiple processors, each running
    their own program.
  • the problem with MPP is to harness all these
    processors to solve a single problem.

54
Parallel Data Management (contd)
  • Regardless of the architecture used, there are
    still alternatives regarding the use of the
    parallel processing capabilities.
  • In normal Transaction processing, each
    transaction runs on a separate processor, because
    transactions are small units of work that run in
    a reasonable time-span.
  • However, the type of analysis carried out in data
    warehouse applications isn't like that. Typically
    you want to run a query that looks at all the
    data in a set of tables. The problem is splitting
    that into chunks that can be assigned to the
    multiple processors.

55
Parallel Data Management (contd)
  • There are two possible solutions to this problem:
    Static and Dynamic Partitioning.
  • In Static Partitioning you break up the data into
    a number of sections. Each section is placed on a
    different processor with its own data storage and
    memory. The query is then run on each of the
    processors, and the results combined at the end
    to give the entire picture (a partitioning sketch
    follows this slide).
  • This is like joining a queue in a supermarket.
    You stay with it until you reach the check-out.
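
A hedged sketch of static partitioning using declarative range partitioning (Oracle-style syntax; the table, columns and date ranges are hypothetical), so that each section of the data can be scanned independently.

-- Sketch (Oracle-style syntax): split the fact table into fixed sections,
-- each of which can be placed on and scanned by a different processor.
CREATE TABLE dw_sales_fact (
    date_key       DATE,
    product_key    NUMBER,
    total_quantity NUMBER
)
PARTITION BY RANGE (date_key) (
    PARTITION sales_q1 VALUES LESS THAN (DATE '1996-04-01'),
    PARTITION sales_q2 VALUES LESS THAN (DATE '1996-07-01'),
    PARTITION sales_q3 VALUES LESS THAN (DATE '1996-10-01'),
    PARTITION sales_q4 VALUES LESS THAN (DATE '1997-01-01')
);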

56
Parallel Data Management (contd)
  • The main problem with Static Partitioning is that
    you can't tell how much processing the various
    sections need. If most of the relevant data is
    processed by one processor you could end up
    waiting almost as long as if you didn't use
    parallel processing at all.
  • In Dynamic Partitioning the data is stored in one
    place, and the data server takes care of
    splitting the query into multiple tasks, which
    are allocated to processors as they become
    available (a parallel-query sketch follows this
    slide).
  • This is like the single queue in a bank. As a
    counter position becomes free, the person at the
    head of the queue takes that position.
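
With dynamic partitioning the server decides how to split the work. In Oracle, for example, a query can simply request parallel execution with a hint, as in the hedged sketch below; the degree of parallelism and the table name are illustrative only.

-- Sketch: ask the server to divide the scan among four parallel workers;
-- the server allocates the resulting tasks to processors as they free up.
SELECT /*+ PARALLEL(f, 4) */
       product_key, SUM(total_quantity)
FROM   dw_sales_fact f
GROUP BY product_key;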

57
Parallel Data Management (contd)
  • With Dynamic Partitioning the performance
    improvement can be dramatic, but the partitioning
    is out of the user's hands.

58
2 Tiered Architecture
  • End User Functionality
  • Data Display
  • Personal Data Storage
  • Personal Application Logic
  • Shared Global Data Storage
  • Shared Application Logic

This slide defines the fundamentals of a
two-tiered architecture
59
2 Tiered Architecture in the DW Environment
  • Data Access Processing
  • Individual Level Data
  • Data Manipulation
  • Atomic Data Acquisition
  • Organizational Level Data
  • Secondary Data Acquisition
  • Departmental Level Data

This slide depicts where the various components
of the data warehouse architecture would fall in
a two-tiered technical architecture.
60
2 Tiered Architecture in the DW Environment
This is a graphical representation of the
two-tiered data warehouse architecture. The
atomic level, metadata and departmental levels
would reside on the Enterprise Server, while any
individual level data would be stored on the
client. In this scenario the client is a thin
one, with little processing beyond query and
simple data manipulation.
61
2 Tiered Architecture in the DW Environment
In this variation, a fat client at the second
tier facilitates a considerable amount of data
manipulation, freeing up capacity on the
Enterprise Server for other, more global
activities.
62
2 Tiered Architecture in the DW Environment
In this variation the second tier is a thin
client, and volumes of data are pre-processed on
the Enterprise Server. This enables access to
sophisticated analyses by more than one thin
client.
63
3 Tiered Architecture
  • Shared Global Data Storage
  • Shared Application Logic
  • Shared Local Data
  • Shared Application Logic
  • End User Functionality
  • Personal Data Storage
  • Personal Application Logic

In a three-tiered technical environment a
middle tier has been added for shared data and
application logic.
64
3 Tiered Architecture in the DW Environment
  • Atomic Data Acquisition
  • Organizational Level Data
  • Secondary Data Acquisition
  • Departmental Level Data
  • Data Access Processing
  • Individual Data
  • Data Manipulation

In the data warehouse, this middle tier can take
on the responsibility for departmental level
(data mart) processing. There is little or no
impact on the third tier, while the first tier is
relieved of some mid-level functionality.
65
3 Tiered Architecture in the DW Environment
In this graphical representation of the
three-tiered technical architecture, the
departmental data has been moved to the middle
tier. In reality, departmental instances can
still exist on the first tier. In cases where the
middle tier is geographically distant from the
first tier, it is advisable to provide a subset
of local metadata for the users of the middle
    tier's data.
66
3 Tiered Architecture in the DW Environment
This scenario also supports a fat client at the
third tier, where the individual level data is
maintained and manipulated.
67
3 Tiered OLAP Arch. in the DW Environment
One of the biggest benefits of a three-tiered
technical architecture for a data warehouse is
the ability to facilitate tremendous amounts of
pre-processing of data on the middle tier.
  • Secondary Data Acquisition
  • Departmental Level Data
  • Shared Data Access Processing
  • Data Access
  • Individual Data
  • Data Manipulation
  • Atomic Data Acquisition
  • Organizational Level Data

68
3 Tiered OLAP Arch. in the DW Environment
In this case, the middle tier pre-processes data
for use by many thin clients, rather than
requiring the individual levels on the third tier
to repetitively process the same data with
complex algorithms.
69
The Atomic Schema
70
The Star Schema
(Diagram: a central Fact Table whose composite key
is Dimension Key 1 through Dimension Key 4 and
whose remaining columns are Fact 1 ... Fact n,
surrounded by Dimension Tables 1-4; each dimension
table holds its Dimension Key, a Description and
Aggregation Levels 1 through n.)
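
A hedged DDL sketch of this structure, following the generic names in the diagram (only one sample dimension table is shown; column types and lengths are illustrative assumptions).

-- Sketch: one dimension table plus the fact table keyed by its dimensions.
CREATE TABLE dimension_table_1 (
    dimension_key_1     INTEGER PRIMARY KEY,
    description_1       VARCHAR(100),
    aggregation_lvl_1_1 VARCHAR(50),
    aggregation_lvl_1_2 VARCHAR(50)
);

CREATE TABLE fact_table (
    dimension_key_1 INTEGER,
    dimension_key_2 INTEGER,
    dimension_key_3 INTEGER,
    dimension_key_4 INTEGER,
    fact_1          NUMERIC(12,2),
    fact_2          NUMERIC(12,2),
    PRIMARY KEY (dimension_key_1, dimension_key_2,
                 dimension_key_3, dimension_key_4)
);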
71
Dimension Table
  • Describes the data that has been organized in
    the Fact Table
  • Key should either be the most detailed
    aggregation level necessary (e.g. country vs.
    county), if possible, or...
  • Surrogate keys may be necessary, but will
    decrease the natural value of the key
  • Manageable number of aggregation levels

72
Fact Table
  • Quantifies the data that has been described by
    the Dimension Tables
  • Key made up of unique combination of values of
    dimension keys
  • ALWAYS contains date or date dimension
  • Fact values should be additive
  • Aggregations of quantities or amounts from atomic
    level
  • No percentages or ratios (ratios can be derived
    from additive facts at query time; see the sketch
    after this list)
  • May be non-additive, time-variant data
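
A hedged example of why ratios are not stored as facts: averaging pre-computed row-level ratios weights every row equally and generally gives a wrong answer, whereas the ratio of two additive sums is always correct (names follow the generic fact table sketched earlier).

-- Correct: derive the ratio from additive facts at query time.
SELECT dimension_key_1,
       SUM(fact_1) * 1.0 / SUM(fact_2) AS derived_ratio
FROM   fact_table
GROUP BY dimension_key_1;
-- Storing fact_1 / fact_2 per row and then averaging that column would
-- not, in general, equal SUM(fact_1) / SUM(fact_2).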

73
For Example
74
Star Schema Query
Select E.Month, B.Customer_Type,
C.Product_Type, D.Store_Location,
sum(A.Total_Quantity) From Purchases_1 A,
Customer_Type B, Product C, Selling_Responsibilit
y D, Date_Information E Where B.Customer_Type
A.Customer_Type and C.Product_Code
A.Product_Code and D.Sales_Rep_ID
A.Sales_Rep_ID and E.Week_Ending_Date
A.Week_Ending_Date and E.Year 1996
and C.Product_Category V Group
by E.Month, B.Customer_Type, C.Product_Type, D.St
ore_Location
75
Caution! Overly Complex Dimension
  • Number of aggregation levels within the dimension
    becomes unmanageable
  • Logically or functionally incorrect combination
    of aggregation levels within a dimension

76
Answer 1 Split Dimension
Split overly complex dimension into several
logical, more manageable dimensions, based on
business function of aggregation levels
77
Answer 2 The Snowflake
  • Identify hierarchies of aggregation levels and
    dimensionalize the primary dimension
  • Secondary dimensions are descriptive of the
    primary dimension (see the sketch after this
    list)
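
A hedged sketch of snowflaking the generic dimension sketched earlier: one hierarchy level is moved into a secondary dimension that describes the primary one (all names and types are illustrative assumptions).

-- Secondary dimension holding the higher aggregation level.
CREATE TABLE dimension_1_level_2 (
    level_2_key  INTEGER PRIMARY KEY,
    level_2_name VARCHAR(50)
);

-- Primary dimension now references the secondary dimension instead of
-- repeating the higher-level columns on every row.
CREATE TABLE dimension_table_1 (
    dimension_key_1 INTEGER PRIMARY KEY,
    description_1   VARCHAR(100),
    level_2_key     INTEGER REFERENCES dimension_1_level_2 (level_2_key)
);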

78
Answer Distinct Time Period Fact Tables
  • Create separate fact tables to account for
    different time periods (see the sketch after
    this list)
  • Date still part of each fact table key
  • Same dimension tables used by both fact tables
  • Improves overall performance (loading and
    accessing) for each time period
  • Will not increase amount of managed redundancy
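
A hedged sketch of the idea: a weekly fact table and a monthly fact table, each keyed by its own date grain, share the same dimension tables (all names and types are illustrative assumptions).

-- Weekly-grain facts.
CREATE TABLE sales_fact_weekly (
    week_ending_date DATE,
    product_key      INTEGER,
    store_key        INTEGER,
    total_quantity   NUMERIC(12,2),
    PRIMARY KEY (week_ending_date, product_key, store_key)
);

-- Monthly-grain facts, joined to the very same dimension tables.
CREATE TABLE sales_fact_monthly (
    month_start_date DATE,
    product_key      INTEGER,
    store_key        INTEGER,
    total_quantity   NUMERIC(12,2),
    PRIMARY KEY (month_start_date, product_key, store_key)
);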