Title: Chapter%208%20(Part%20C)
1Chapter 8 (Part C)
Database Modeling and Design
- OLAP IN THE DATA WAREHOUSE
Paul Chen
2 Summary of Topics
- 1. OLAP Definition, Key features and Benefits
- 2. How OLAP differs from OATP?
- 3. Multidimensional data ? What? Why? How To
Use? -
- 4. OLAP Server Query, Features and
Applications. - 5. Category of OLAP tools -Multi-Dimensional
OLAP (MOLAP), Relational OLAP (ROLAP), and
Managed Query Environment (MQE). - 6. OLAP extensions to SQL.
- 7. The Microsoft Data Warehousing Framework.
3Data Warehousing and End-User Access Tools
- Accompanying growth in data warehouses is
increasing demands for more powerful access tools
providing advanced analytical capabilities. - Key developments include
- Online analytical processing (OLAP)
- SQL extensions for complex data analysis
- Data mining tools.
4Limitations of Other Analysis Methods
- SQL has been the accepted interface for
retrieving and manipulating data from relational
databases. These methods are used in OLTP systems
and in data warehousing environments (referring
to the environments with simple queries and
routine reports). - Now consider information retrieval and
manipulation in these environments- reports
writers and spreadsheets. - Report writers two features- the ability to
point and click for generating and issuing SQL
calls, and the capability to format the output
reports. However, report writers do not support
multidimensionality. With basic report writers,
you cannot drill down to lower levels in the
dimensions. You cannot rotate the results by
switching rows and columns. The report writers do
not provide aggregate navigation. Once the report
is formatted and run, you cannot alter the
presentation of the result data sets.
Spreadsheets are still cumbersome for showing all
the aggregate levels and multidimensional views,
let alone doing calculations for roll-up and
drill-down. -
5Topic 1 OLAP Definition
- On-Line Analytical Processing (OLAP) is a
category of software technology that enables
analysts, managers and executives to gain insight
into data through fast, consistent, interactive
access to a wide variety of possible views of
information that has been transformed from raw
data to reflect the real dimensionality of the
enterprise as understood by the user - -- DBMS Magazine, April, 1995
6Key Features of OLAP
- Supports analysis, dynamic synthesis and
consolidation of large volumes of
multi-dimensional data. Types of analysis ranges
from basic navigation and browsing (slicing and
dicing) to calculations, to more complex analyses
such as time series and complex modeling. -
- Is able to drill down or roll up with each
dimension. - Is capable of applying mathematical formulas and
calculations to measures.
7Key Features of OLAP
- Can easily answer who? and what?.
- Ability to answer what if? and why? type
questions. - Distinguishes OLAP from general-purpose query
- tools.
- Enables users to gain a deeper understanding and
knowledge about various aspects of their
corporate data through fast, consistent,
interactive access to a wide variety of possible
views of the data. - Can be implemented on the web.
8OLAP Benefits
- Increased productivity of end-users.
- Reduced backlog of applications development for
IT staff. - Retention of organizational control over the
integrity of corporate data. - Reduced query drag and network traffic on OLTP
systems or on the data warehouse. - Improved potential revenue and profitability.
9Topic 2 OLAP VS OLTP (ON-LINE TRANSACTION
PROCESSING)
- OLTP (RELATIONAL)
- ATOMIZED
- PRESENT
- RECORD-AT-A-TIME
- PROCESS ORIENTED
- OLAP (MULTIDIMENSIONAL)
- SUMMARIZED
- HISTORICAL
- MANY RECORDS-AT-A-TIME
- SUBJECT ORIENTED
10OLAP VS OLTP
- WHILE OLTP APPLICATIONS GENERALLY DO NOT
REQUIRE HISTORICAL DATA, NEARLY EVERY OLAP
APPLICATION IS CONCERNED WITH VIEWING TRENDS AND
THEREFORE REQUIRES HISTORICAL DATA. OLTP
APPLICATIONS AND DATABASE TEND TO BE ORGANIZED
AROUND SPECIFIC PROCESSES (SUCH AS ORDER ENTRY),
OLAP APPLICATIONS TEND TO BE SUBJECT-ORIENTED
ANSWERING SUCH QUESTIONS AS WHAT PRODUCTS ARE
SELLING WELL OR WHAT ARE MY WEAKEST SALES
OFFICES?
11Topic 3 Multi-dimensional Database
- In a multidimensional database, data is stored as
Facts and Dimensions instead of rows and columns - Multi-dimensional structures are best visualized
as cubes of data, and cubes within cubes of data.
Each side of a cube is a dimension.
12Sample Star Schema
Time
Fact Table
Sales
Time key Date Month quarter year
Store
Product Key Store Key Time Key Fixed
Cost Variable cost Profit margin YTD_Sales_dollars
_by_store YTD_Sales_dollar_by_category YTD_Sales_B
y_department
Month
Product
Product
Store
Product Key
Store key Store name region
Product Name
Category
Product line
Department
13Kinds of Queries
- Display the total sales of all products for past
five years in all stores. - Compare total sales for all stores, product by
product, between years 2000 and 1999 - Show comparison of total sales for all stores,
product by product, between years 2000 and 1999
only for those products for reduced sales.
14Cube
- Users often view and analyze data
multidimensionality, using hierarchical
segmentation along each dimension. Thus a user
may analyze sales along the time dimension (such
as months within quarters with years), along
geographical dimension (cities with regions
within countries), along the organizational
dimension (sales persons within branches within
territories). We can conceptualize the approach
as a cube.
15Cube
Fact Table View
Multi-Dimensional Cube
Property sale
p
Branch
c1
c2
c3
sale
week
price
p2
2
week 2
p1 p2 p3 p4
1 2 2 1
1 2 4 3
c1 c2 c3 c1
4
p3
C3
C1
C2
week 1
1
P1 P4
3
P property
Cells roughly equivalent to records in a
relational database.
16WHAT IS MULTIDIMENSIONAL DATA?
- RELATIONAL DATABASES ARE ORGANIZED AROUND A
LIST OF RECORDS. EACH RECORD CONTAINS RELATED
INFORMATION WHICH ARE ORGANIZED INTO FIELDS. - CUS NAME CUSTOMER TELEPHONE ADDRESS
- JACK 10001 345-4444 40
MAIN - WALTER 10002 345-6666 30
ELM - HOOVER 10003 345-8588 6
BELLRED - THIS TABLE HAS ONLY ONE DIMENSION.
17WHAT IS MULTIDIMENSIONAL DATA?
- LOOKING AT CUSTOMER BY TELEPHONE OR
- TELEPHONE BY CUSTOMER ONLY PRODUCES
- A ONE-FOR-ONE CORRESPONDENCE.
18WHAT IS MULTIDIMENSIONAL DATA?
- Lets take a look at an example of a relational
table where - there is more than one-to-one correspondence
between - the fields.
- In the following table, we have sales data for
each product - In each region-- four products sold in three
regions. The - Sales data is a two-dimensional matrix (Product
and - Region).
19product/region/sales table
Product Region Sales Nuts
East 50 Nuts
West 40 Nuts
Central 30 Screws East
60 Screws West
50 Screws Central 60
Bolts East 100
Bolts West 120 Bolts
Central 80 Washers
East 90 Washers
West 100 Washers Central
40
20A Much Better Way To Represent the Data
Two-Dimensional Matrix
- PRODUCT EAST WEST CENTRAL
- NUTS 50 40 30
- SCREWS 60 50 60
- BOLTS 100 120 80
- WASHERS 90 100 40
21QUERY ON MULTIDIMENSIONAL DATA
- QUESTIONS LIKE WHAT WERE TOTAL SALES OF
NUTS? OR WHAT WERE TOTAL SALES FOR THE EAST? TO
FIND THE ANSWER IN THE TWO DIMENSIONAL TABLE,
JUST FIND THE CELL CALLED EAST AND ADD UP ALL
THE NUMBERS IN THE COLUMN.
22AGGREGATION THE KEY TO CONSISTENTLY FAST RESPONSE
- PRODUCT EAST WEST CENTRAL TOTAL
- NUTS 50 40 30
120 - SCREWS 60 50 60 170
- BOLTS 100 120 80
300 - WASHERS 90 100 40 230
- TOTAL 300 310
210 820
23Multiple Reads Database Writes
- In the above example, computing the totals
involves 28 (44 34) database reads and eight
database writes. A typical relational database
can read about 200 records per second and writes
perhaps 20 records per second. So consolidating
this tiny database would take less than one
second. However, for some larger tables,
computing for totals could take days or even
weeks to consolidate.
24Relational Database vs. Multidimensional OLAP
Server
- With a multidimensional OLAP server, we can
perform the same consolidation with row and
column arithmetic. Whereas a relational database
can access a few hundred records per second, a
good OLP server should be capable of
consolidating 20,000 to 30,000 cells (equivalent
to records in the relational table) per second,
including the time to write the totals to the
database. The multidimensional OLAP database will
take up less space since the names of the regions
and products are not repeated in the
multidimensional database as they are in the
relational database.
25Multiple Hierarchies And Classes Within Dimensions
- The single biggest factor in determining how many
dimensions for a database is the existence of
multiple hierarchies and classes within
dimensions. - Classes are typically attributes such as size,
color and other characteristics that define a
subset of the members of a dimension. - For example, a database of shampoo sales might
want to roll up product sales by size (6 oz, 15
oz), by type(dry hair, oily hair) and possibly by
other attributes such as scented/unscented, brand
name, and so on.
26TERMINOLOGY
- Dimension roughly equivalent to Fields in a
relational database. - In the multidimensional data, Product and
Region are both dimensions. - Cells roughly equivalent to Records in a
relational - database.
27Simple Hierarchies (Roll up) Within Dimensions
- The preceding table can be represented
graphically as follows
Region Total
Central
East
West
Product Total
Bolts
Washers
nuts
Screws
Individual products roll up into a Product Total
28Multiple Levels of Hierarchies
29Slice and Dice
- A three-dimensional array has a total of six
faces, or views. A four-dimensional array has
twelve views. An n-dimensional array has n(n-1)
views. The ability to rotate the data cube is the
main technology for multi-dimensional reporting
and is sometimes called slice and dice.
Region
Product
Actual/Forecast
30Practical Limitations on Database Size
- In general, as the number of dimensions
increases, the number of cells in the database
increases exponentially. - for ex., a two-dimensional database with 100
products and 100 regions would have 10,000 cells.
If we add a third dimension for time with 52
weeks, we now have 520,000 cells. - Most commercial OLAP servers hit the cell
limit long before they run out of dimensions.
31Topic 4 A list of some important features
supported by some OLAP Servers
- Special time-series data types
- Special dimensions for variables (complex
mathematical relationships, such as computed
averages, and simultaneous equations) - Multiple hierarchies within a dimension
- Classes with a dimension
- Virtual variables (computed on the fly at run
time, such as gross margin derived from
revenues and expenses.
32Examples of OLAP applications in various
functional areas
33OLAP Applications
- Although OLAP applications are found in widely
divergent functional areas, all have following
key features - multi-dimensional views of data
- support for complex calculations
- time intelligence.
34OLAP Applications - multi-dimensional views of
data
- Core requirement of building a realistic
business model. - Provides basis for analytical processing through
flexible access to corporate data. - The underlying database design that provides the
multi-dimensional view of data should treat all
dimensions equally.
35OLAP Applications - support for complex
calculations
- Must provide a range of powerful computational
methods such as that required by sales
forecasting, which uses trend algorithms such as
moving averages and percentage growth. - Mechanisms for implementing computational methods
should be clear and non-procedural.
36OLAP Applications time intelligence
- Key feature of almost any analytical application
as performance is almost always judged over time. - Time hierarchy is not always used in same manner
as other hierarchies. - Concepts such as year-to-date and
period-over-period comparisons should be easily
defined.
37Representing Multi-Dimensional Data
- Example of two-dimensional query.
- What is the total revenue generated by property
sales in each city, in each quarter of 1997? - Choice of representation is based on types of
queries end-user may ask. - Compare representation - three-field relational
table versus two-dimensional matrix.
38Multi-dimensional Data as Three-field table
versus Two-dimensional Matrix
39Representing Multi-Dimensional Data
- Example of three-dimensional query.
- What is the total revenue generated by property
sales for each type of property (Flat or House)
in each city, in each quarter of 1997? - Compare representation - four-field relational
table versus three-dimensional cube.
40Multi-dimensional Data as Four-field Table versus
Three-dimensional Cube
41Representing Multi-Dimensional Data
- Cube represents data as cells in an array.
- Relational table only represents
multi-dimensional data in two dimensions.
42Multi-Dimensional OLAP Servers
- Use multi-dimensional structures to store data
and relationships between data. - Multi-dimensional structures are best visualized
as cubes of data, and cubes within cubes of data.
Each side of cube is a dimension. - A cube can be expanded to include other
dimensions.
43Multi-Dimensional OLAP Servers
- A cube supports matrix arithmetic.
- Multi-dimensional query response time depends on
how many cells have to be added on the fly. - As number of dimensions increases, number of the
cubes cells increases exponentially.
44Multi-Dimensional OLAP Servers
- However, majority of multi-dimensional queries
use summarized, high-level data. - Solution is to pre-aggregate (consolidate) all
logical subtotals and totals along all
dimensions. - Pre-aggregation is valuable, as typical
dimensions are hierarchical in nature. - (e.g. Time dimension hierarchy - years, quarters,
months, weeks, and days)
45Multi-Dimensional OLAP Servers
- Predefined hierarchy allows logical
pre-aggregation and, conversely, allows for a
logical drill-down. - Supports common analytical operations
- Consolidation
- Drill-down
- Slicing and dicing.
46Multi-Dimensional OLAP Servers
- Consolidation - aggregation of data such as
simple roll-ups or complex expressions
involving inter-related data. - Drill-Down - is reverse of consolidation and
involves displaying the detailed data that
comprises the consolidated data. - Slicing and Dicing - (also called pivoting)
refers to the ability to look at the data from
different viewpoints.
47Multi-Dimensional OLAP servers
- Can store data in a compressed form by
dynamically selecting physical storage
organizations and compression techniques that
maximize space utilization. - Dense data (ie., data that exists for high
percentage of cells) can be stored separately
from sparse data (ie., significant percentage of
cells are empty).
48Multi-Dimensional OLAP Servers
- Ability to omit empty or repetitive cells can
greatly reduce the size of the cube and the
amount of processing. - Allows analysis of exceptionally large amounts of
data.
49Multi-Dimensional OLAP Servers
- In summary, pre-aggregation, dimensional
hierarchy, and sparse data management can
significantly reduce the size of the cube and the
need to calculate values on-the-fly. - Removes need for multi-table joins and provides
quick and direct access to arrays of data, thus
significantly speeding up execution of
multi-dimensional queries.
50Codds Rules for OLAP Systems
- In 1993, E.F. Codd formulated twelve rules as the
basis for selecting OLAP tools. - Multi-dimensional conceptual view
- Transparency
- Accessibility
- Consistent reporting performance
- Client-server architecture
- Generic dimensionality
51Codds rules for OLAP Systems
- Dynamic sparse matrix handling
- Multi-user support
- Unrestricted cross-dimensional operations
- Intuitive data manipulation
- Flexible reporting
- Unlimited dimensions and aggregation levels.
52Codds Rules for OLAP Systems
- There are proposals to re-defined or extended the
rules. For example, to also include - Comprehensive database management tools
- Ability to drill down to detail (source record)
level - Incremental database refresh
- SQL interface to the existing enterprise
environment
53Categories of OLAP Tools
- OLAP tools are categorized according to the
architecture of the underlying database. - Three main categories of OLAP tools include
- Multi-dimensional OLAP (MOLAP or MD-OLAP)
- Relational OLAP (ROLAP), also called
multi-relational OLAP - Managed query environment (MQE)
54Topic 5 Multi-Dimensional OLAP (MOLAP)
- Use specialized data structures and
multi-dimensional Database Management Systems
(MDDBMSs) to organize, navigate, and analyze
data. - Data is typically aggregated and stored according
to predicted usage to enhance query performance.
55Multi-Dimensional OLAP (MOLAP)
- Use array technology and efficient storage
techniques that minimize the disk space
requirements through sparse data management. - Provides excellent performance when data is used
as designed, and the focus is on data for a
specific decision-support application.
56Multi-Dimensional OLAP (MOLAP)
- Traditionally, require a tight coupling with the
application layer and presentation layer. - Recent trends segregate the OLAP from the data
structures through the use of published
application programming interfaces (APIs).
57Typical Architecture for MOLAP Tools
58MOLAP Tools - Development Issues
- Underlying data structures are limited in their
ability to support multiple subject areas and to
provide access to detailed data. - Navigation and analysis of data is limited
because the data is designed according to
previously determined requirements.
59MOLAP Tools - Development Issues
- MOLAP products require a different set of skills
and tools to build and maintain the database,
thus increasing the cost and complexity of
support.
60Relational OLAP (ROLAP)
- Fastest-growing style of OLAP technology.
- Supports RDBMS products using a metadata layer -
avoids need to create a static multi-dimensional
data structure - facilitates the creation of
multiple multi-dimensional views of the
two-dimensional relation.
61Relational OLAP (ROLAP)
- To improve performance, some products use SQL
engines to support complexity of
multi-dimensional analysis, while others
recommend, or require, the use of highly
denormalized database designs such as the star
schema.
62Typical Architecture for ROLAP Tools
63ROLAP Tools - Development Issues
- Middleware to facilitate the development of
multi-dimensional applications. (Software that
converts the two-dimensional relation into a
multi-dimensional structure). - Development of an option to create persistent,
multi-dimensional structures with facilities to
assist in the administration of these structures.
64Managed Query Environment (MQE)
- Relatively new development.
- Provide limited analysis capability, either
directly against RDBMS products, or by using an
intermediate MOLAP server.
65Managed Query Environment (MQE)
- Deliver selected data directly from DBMS or via a
MOLAP server to desktop (or local server) in form
of a datacube, where it is stored, analyzed, and
maintained locally. - Promoted as being relatively simple to install
and administer with reduced cost and maintenance.
66Typical Architecture for MQE Tools
67MQE Tools - Development Issues
- Architecture results in significant data
redundancy and may cause problems for networks
that support many users. -
- Ability of each user to build a custom datacube
may cause a lack of data consistency among users.
- Only a limited amount of data can be efficiently
maintained.
68Topic 6 OLAP Extensions to SQL
- SQL promoted as easy-to-learn, nonprocedural,
free-format, DBMS-independent, and international
standard. - However, major disadvantage has been inability to
represent many of the questions most commonly
asked by business analysts. - IBM and Oracle jointly proposed OLAP extensions
to SQL early in 1999, adopted as an amendment to
SQL.
69OLAP Extensions to SQL
- Many database vendors including IBM, Oracle,
Informix, and Red Brick Systems have already
implemented portions of specifications in their
DBMSs. - Red Brick Systems was first to implement many
essential OLAP functions (as Red Brick
Intelligent SQL (RISQL)), albeit in advance of
the standard.
70OLAP Extensions to SQL - RISQL
- Designed for business analysts.
- Set of extensions that augments SQL with a
variety of powerful operations appropriate to
data analysis and decision-support applications
such as ranking, moving averages, comparisons,
market share, this year versus last year.
71Use of the RISQL CUME Function
- Show the quarterly sales for branch office B003,
along with the monthly year-to-date figures. - SELECT quarter, quarterlySales,
CUME(quarterlySales) AS Year-to-Date - FROM BranchSales
- WHERE branchNo 'B003'
72Use of the RISQL MOVINGAVG / MOVINGSUM Function
- Show the first six monthly sales for branch
office B003 without the effect of seasonality. - SELECT month, monthlySales,
- MOVINGAVG(monthlySales) AS 3-MonthMovingAvg,
- MOVINGSUM(monthlySales) AS 3-MonthMovingSum
- FROM BranchSales
- WHERE branchNo 'B003'
73Topic 7 The Microsoft Data Warehousing Framework
- The Microsoft Data Warehousing Framework is an
open architecture that is easily integrated with
existing systems. The Microsoft SQL Server DTS
tool is used to import, export, and repair or
transform data (where it is necessary). The
Framework contains a rich object-oriented
programming interface for customized data
warehousing implementations. There is also a
user interface, the Microsoft SQL Server Analysis
Services Manager that can be used to configure
the data warehouse and to populate or update the
content in a cube. It can be used to schedule
tasks, monitor performance, and perform queries
on the data warehouse.
74The Microsoft Data Warehousing Framework
Microsoft Office
OLAP Services
Operation data
Web Brouser, tools, portals Servers
OLE DB
OLE DB
OLE DB
Data Transformation Services
User
Other DW
Relational Data Store
DWA member query Reporting, and Analytical tools
External data
Microsoft Repository
75Object Linking and Embedding for Databases (OLE
DB)
- Microsoft has defined set of data objects,
collectively known as OLE DB. - Allows OLE-oriented applications to share and
manipulate sets of data as objects. - OLE DB is an object-oriented specification based
on C API. - Components can be treated as data consumers and
data providers. Consumers take data from OLE DB
interfaces and providers expose OLE DB
interfaces.
76Microsoft Tools To Implement A Data Warehouse
System
- Microsoft SQL Server 2000 provides a set of tools
to implement a data - warehouse system, including decision support and
analytical tools - SQL Server Relational Databases
- Data Transform Services
- Meta Data Servicesusually stored in SQL Server
or Microsoft Access Databases that can be
accessed through administrative interfaces. - Analytical Services providing OLAP technology
to organize large quantities of DW data into
cubes for rapid analysis and sophisticated data
mining technology. - Replicationoften used to distribute data and
coordinate updates of distributed data in OLTP
systems. - English querya development tool for creating
client applications that transform English into
the syntax of SQL to query relational databases,
or the syntax of Multidimensional (MDX) to query
OLAP cubes. -
77Microsoft Data WarehouseProduct
- Data warehousing software has been included with
Microsoft SQL Servertrade since the release
of version 7.0 in 1998.
78The Cube as a Model for the Data Warehouse
- The cube is an imperfect yet satisfactory name
for a data warehouse repository. How does a data
warehouse cube differ from a geometrical cube?
There are a few important differences. A data
warehouse cube is defined by any number of
dimensions (it is not limited to three, and
sometimes a data-warehousing cube may have fewer
than three dimensions). Dimensions describe a
data-warehousing cube just as width, height, and
depth describe a geometrical cube. Where it is
appropriate, dimensions can be organized into
any number of levels.
79The Cube as a Model for the Data Warehouse
- The relationship between two dimensions can be
modeled using a grid. Dimensions are like the
labels of along the axes of the grid. The cells
are facts. Facts correspond to the cross product
of each dimension of the cube. The data in the
cell is a measure. Measures are the whole reason
for the cube. If the cube is about the number of
items sold, the measure is a count of the number
of items sold. To repeat the grid example, the
measure is the number that you would find in the
grid cell.
80Dimensions and Levels
- Levels are used to organize dimensions into
smaller units where necessary. Levels may also
contain other levels, depending on how they are
configured in the cube. For example, consider
that there is a region dimension. Perhaps this
grocery store operates in three states and uses
the state boundaries as territorial boundaries.
Let's say that the region dimension contains
three levels California, Oregon, and Washington.
If the business has additional sub-regions such
as Seattle, Olympia, Yakima, and Spokane in the
state of Washington, these levels can be added as
levels to the Washington region, even if such
detail is not needed for the California and
Oregon region. Levels are just a convenient way
to organize facts for a dimension.
81Facts and Measures
- A fact is about the combination of the various
dimensions. Locating a fact is like using a
coordinate system. Just like a position in a
mathematical cube such as the origin, which
might be represented by (x0,y0,z0), a fact
would be represented by specific combination of
dimensions such as - (Productbroccoli, RegionSeattle,
TimeWednesday) yielding a specific fact about
broccoli being sold in Seattle on Wednesday.
Depending on the way that the cube is being
used, the fact may show a measure of something
like 580 units sold or perhaps a different
measure like 860.00 in sales. The meaning of
the measure depends on how the cube is defined.
82Aggregations
- The mathematical operations of count and sum are
an essential part of the reason data warehouses
are useful. These are aggregations. Once
dimensions are organized and a cube is being
processed, the aggregations are calculated.
Generally, aggregations are calculated
immediately after the cube is initially
populated or there is a change to the content of
the cube.
83Using a Data Warehouse to Make Decisions
- Consider a grocery store. Let's say that a
promotion has been running for a few days and
the grocer needs to decide whether or not to run
the promotion again. A question that the grocer
might pose would be something like, "Has more of
the product been sold during the promotion
period compared to prior periods?"
84Using a Data Warehouse to Make Decisions
- A grocery store inventory system may record
prices, products, sales, and - promotions in a transactional database using
normalized structures. - Inventory systems are optimized for inserting
and updating records and - perhaps for simple procedural selections such
as retrieving the cost of an - item. It is much less likely that the system is
organized in such a way that it - would be just as efficient to produce a report
that details on a day-by-day - and product-by-product basis the effectiveness
of a sale. In fact, there is - usually a contradiction between systems that
are designed for transactional - efficiency and those that are designed for
efficient queries. This is where a - data warehouse should be used. The data
warehouse is a separate - repository that uses the relevant data from
existing sources in a structure - that has been optimized for selection.
85Using a Data Warehouse to Make Decisions
- In this case, it is easy to use a data warehouse
to answer the grocer's question. The sum of the
fact records that measure the number of items
sold using the cube dimensions of products,
promotion, and time can produce the needed
results.
86Using a Data Warehouse to Make Decisions
- To contrast this technique, the information in
the other systems may not even be in the same
database. The inventory data source may not be
the same data source as the customer data
source or the employee data source. Even if the
systems are in the same database, it would still
be a chore to build a system of queries that
will combine and aggregate the results in a way
that will produce the correct answer. In fact,
this effort of combining data sources and
aggregating the results is just what the data
warehousing software does best.
87Viewing a Slice and the Programming Interface to
A Cube
- Although fashioning a data warehouse in multiple
dimensions may be a simple design choice, and
performing queries that produce results that span
several dimensions may not be a significant
chore for the processor, the very constitution
of multi-dimensional output can make it difficult
to display. Charts, graphs, and tables are
almost always presented in two dimensions.
There are some good three-dimensional charting
tools, but beyond that, the chart becomes more
of a puzzle than a visual aid. A common
technique of viewing multi-dimensional output is
to view the output one two-dimensional "slice"
of a cube at a time. This is the way that the
Microsoft SQL Server Analysis Services Tool
displays output.
88Using DSO (Decision Support Object)
- Fortunately, output is not restricted to two
dimensions. Microsoft SQL Server Analysis
Services provides a programming interface to
multi-dimensional data warehouse output the DSO
(Decision Support Object). DSO can be used
programmatically to access the various
dimensions.
89Using MDX
- MDX (multidimensional extensions) is a syntax
designed for querying multidimensional objects
and data. For such systems, it is more efficient
and intuitive to use than SQL, which was designed
for an entirely different set of objects. The
grammar of an MDX query has a similar feel to the
grammar of an SQL query. Observe the MDX query
below which would select the sales figures for
the broccoli sold in Seattle on Wednesday that I
used in the example above
90Using MDX
- Microsoft SQL Server OLAP Services provides an
architecture for access to multidimensional data.
This data is summarized, organized, and stored
in multidimensional structures for rapid response
to user queries. Through OLE DB for OLAP, a
PivotTable Service provides client access to
this multidimensional online analytical
processing - (OLAP) data. For expressing queries to this
data, OLE DB for OLAP employs a full-fledged,
highly functional expression syntax
multidimensional expressions (MDX). - Â
91Using MDX
- OLE DB for OLAP is a set of Component Object
Model (COM) interfaces designed to extend OLE DB
for efficient access to multidimensional data.
ADO has been extended with new objects,
collections, and methods that are designed to
take advantage of OLE DB for OLAP. These
extensions are collectively known as ADO MD
(multidimensional) and are designed to provide a
simple, high-level object model for accessing
tabular and OLAP data. - Â
92MDX expression
- SELECT axis specification ON COLUMNS,
- axis specification ON ROWS
- FROM cube_name
- WHERE slicer_specification
93SELECT
- Measures.Sales ON COLUMNS
- Time.Wednesday ON ROWS
- FROM MySalesCube
- WHERE Region.Washington.Seattle
- AND Product.Vegetable.Broccholi
- Â
- The output of this query would be a column
labeled Sales, a row labeled Wednesday and a
single grid cell at the intersection with the
sales figure - 860.00.
94SELECT
- The SQL Server Analysis Server Manager has an
interface that accepts MDX queries.
Alternatively, MDX queries can be incorporated
into programs that employ the DSO.
95Microsoft SQL Data Transformation Services (DTS)
- DTS is a set of graphical tools and programmable
objects - to extract, transform, and consolidate data from
disparate - services into single or multiple destinations.
- By using DTS, you can
- Access heterogeneous data sources-including
relational and non-relational data sources. - Import, export, and transform data- moving entire
tales, and copying other types of database
objects. - Create reusable data transformations and
functions-creating data transformations that
manipulate data as it moves from sources to
destination and creating functions that perform
operations external to DTS, such as sending
e-mail. -
96Microsoft SQL Data Transformation Services
(Contd)
- Automate data loads-to accommodate differing
deployment environments, you can automate DTS
data loads for recurring executions by using the
SQL Server Agent a scheduling component of SQL
Serve. - Manage metadata- recording metadata about data
loads, such as the owner of a data load. - Customize and extend functionality-providing a
customized data transport solution that you can
use to integrate supplied functionality with
customized programmatic extensions.
97Microsoft SQL Data Transformation Services
(Contd)
Transformation Package Workflow
Transformation Package Designer
Schema Designer
- Define Data Flow
- Define Transformations
- Populate Star Schema
98Populating Data Warehouse Structures
- By using DTS you can
- Access heterogeneous data structures by using
OLE DB. - Import, export, and transform datausing the DTS
import/export wizard. - Create reusable data transformations and
functions. - Automate data loads.
- Manage metadata.
-
99Applying the Wizard
- Because of its generic, easy-to-use features,
you can apply the DTS - Import/Export Wizard in many data transfer
situations - Performing ad-hoc table and data transfers.
- Copying data subsets.
- Transferring data to new destinations.
- Developing a prototyping package.
- When you configure the source and destination,
you can connect to - Both
- OLE DB data sources.
- ODBC data sources.
-
100The Microsoft Data Warehousing Framework
- There is more information about how to build a
cube using the Microsoft SQL Server Analysis
Services Manager at MSDN Online, as well as
guidelines to consider for the design and
configuration of your data warehouse. See the
"How to" article at this link - http//msdn.microsoft.com/library/psdk/sql/aghti
ntro_2vov.htm.