Title: Introduction to Dimensional Analysis Session 2
1Introduction to Dimensional AnalysisSession 2
Dan McCreary President Dan McCreary
Associates dan_at_danmccreary.com (952) 931-9198
2Agenda
- General introduction to Data Dictionaries that
drive Business Intelligence (BI) concepts and
terminology - Understand why Data Dictionaries are so critical
in accurate BI - Understand how BI looks at the world in different
ways - Understand how data warehouse concepts and data
dictionary impact analysis and research
3What is a Data Warehouse?
- Fast Retrieval
- Internally Consistent
- Slice and Dice Capability
- Easy to Browse
- Complete and Reliable
- Data Quality Controls
- GI-GO (Garbage-In, Garbage-Out)
- Source Ralph Kimball
4Factors Driving Business Intelligence
- Computer process and store twice as much data per
dollar every 18 months (Moores Law) - People can make better decisions if they have
tools to quickly see only the data they are
interested in seeing - People frequently want to analyze data in new
ways that was unanticipated by people creating
"canned reports" - Tools can be designed to allow non-technical
(non-SQL programmers) to generate their own
reports - People have an incredible ability to categorize
things base on their properties and attributes
but if they don't have consistent definitions of
these properties they will not be generating
consistent results
5The BI Iterative Process
Access Data Warehouse
Analysis
BI ProjectManagement
Publishing, Change, DataGap Analysis, New
DataGathered
Insights, Conclusions andFindings
- The BI process in an on-going iterative process
where the structure of the data warehouse changes
based on what data is critical to an
organizations business objectives.
6BI Evolution
Increasing Responsiveness
Monthly Green Bar Reports
BrowseableGraphical Interface
- Shorten the time-to-report interval
- Allow users to "browse" data sets interactively
- Remove programmers with "backlogs" of reports
- Users frequently waited days, weeks for months to
get a custom report created
7Dimensions of BI
Technical SophisticationRequired
Low (analysts)
Highly Responsive to "What If" Scenarios
Degree ofEnd User Control
High (programmers)
Few Dimensions few parameters, few filters
Many Dimensions many variables
8Overlapping Terminology
Data Mining
Statistical Analysis
PatternDiscovery
Data Dictionaries Data Modeling Semantics
Business Intelligence
Data Warehousing
Transaction Processing (OLTP)
Indexing
Dimensional Analysis
Aggregates
Data Storage (RDBMS)
9Key Terms Covered in This Class
- Properties
- Dimension
- Aggregation and Levels
- Enumerations of Categorical Data
- Labeling Categories
- Giving precise definitions to Labels
- Dimension Hierarchies and Levels
- Cubes
- Measures
- Filters
- Data Warehouse Presentation
10Things Have Many "Properties"
People are very good at recognizing and sorting
things by their properties.
11Sorting by an Object's Property
- Sort objects by their color
12Sorting by A Property
- Sort objects by their shape
13Sorting by Color AND Shape
Shape Dimension
Color Dimension
14Dimensional Analysis
- The science of figuring out intuitive ways that
people want to categorize information using
independent variables to graphically filter and
browse their data
15Dimension
- List of categories used to partition the
information based on a property of the objects - Dimension Names Color, Shape
16Labels
- A name given to a non-overlapping category within
a dimensions
Labels
"red"
"blue"
"green"
17Enumeration
- Whenever we decide to break the continuous
observable world into a predefined list of
categories when each category has a label we call
this an "enumerated value domain". These will
then become the "dimensions" of our cube.
"green"
"red"
"blue"
Note NO OVERLAP!
Statisticians call this type of "categorical
data" and it requires the categories to be
non-overlapping.
18The Challenge of Semantic Classification
- People are good at sorting based on a property
they see - People are good at assigning names to a property
type - People usually come up with different names for
properties - Some dimensions people easily agree on
- Some are very difficult to classifyand even more
difficult go get peopleto agree on a
non-overlapping classification system
"Red Circle"
"Polygon"
"Square"
"Blue"
"Green"
"Blue-Green"
What happens with a small percentage of data does
not quite fit into a discrete category?
19Level
- A layer of "aggregation" within a single
dimension categorization of properties
All Shapes
Shapes With Curves
Shapes Without Curves
Levels
Circle
Heart
Square
Trapezoid
Moon
Star
Diamond
20Measures (example weight)
5.7
7.4
3.5
6.6
10
1.1
6.1
8.2
5.5
3.8
9.1
9.3
2.6
8.4
A measure is any property that you can perform
math on (sums, averages).
21Measures
- Something that you can do math on.
-
X
/
average
sum
22Sample Object "Fact Table"
Measures tend to have data types of integers
and floating point numbers.
Note that categorical data can not beadded
together. But we can count thefrequencies of
items with a category!
23Shape Dimension
Note that there is no reference to "Has Curves"
in the prior table. "HasCurves" is a property
of the shape value domain because it can be
"inferred"from the shape of the object. Some
categorical definitions use "exclusionary"
language.Note that "Has Curves" also must have
a precise definition in the data dictionary.
24Facts and Dimension
Shape Facts
Color_FK
Color Dim
Shape_FK
Shape Dim
Color Name
Weight
Has Curves
Shape Name
Note that "Has curves" does not need to be in the
central fact table.It is a property of the shape!
25Adding Dimensions
5.7
7.4
3.5
6.6
10
1.1
6.1
8.2
5.5
3.8
9.1
9.3
2.6
8.4
We have now added a 3rd dimension "Dash Style"
26Each New Property is Another Dimension
Shape Facts
Color_FK
Color Dim
Shape Dim
Shape_FK
Color Code
Has Curves
DashStyle_FK
Shape Code
Weight
DashStype Dim
Shape
27Filters
5.7
7.4
3.5
6.6
10
1.1
6.1
8.2
5.5
3.8
9.1
9.3
2.6
8.4
A filter will exclude all objects with a
specified property. For example we can exclude
all shapes with a property of "Circle"
28Example Discarding Invalid Scores
This example filter removes all scores EXCEPT the
valid scores.
29Selecting Only Scale Scores
This filter removes all scores EXCEPT the
assessments Scale Scoreusing the Test Score Type
dimension.
30The Star Schema
31Adding Measures
Shape Facts
Color_FK
Color Dim
Shape_FK
Shape Dim
ColorCode
DashStyle_FK
ShapeCode
WeightValue
DashStype Dim
Measures are Integers or floats that you can
perform math on.
HeightValue
ShapeCode
PriceAmount
DensityValue
Measures can be easily be added to the fact table
without changing any of the dimensions.
32Cube
- A Cube is a pre-built structure that has facts
and many dimensions (not necessarily just three) - Designed to have averages and sums for most
levels "pre-calculated" to make analysis fast
Color Dimension
Dash-Style Dimension
Shape Dimension
33Build a Mental Model
(aka "Page Fields")
Filter Funnel
Measure count
Vertical Dimension (rows)
Presentation
Horizontal Dimension (columns)
34Using Cubes in Excel
Filter Dropped Here
Measures Dropped Here
Row and Column Dropped Here
35Count of Year vs. Assessment Name
The measure is the count of records in the cube.
The Column Dimension is the "Fiscal Year"
There are around 25 million test results
The Row Dimension is the "Test Name".
36Conformed Dimensions
- When building many cubes, there is a large
benefit to "reusing" dimensions - Commonly reused dimensions
- Time (Fiscal Year, Quarter)
- Organization (School, District)
- Expense Category
- Student
37Each bar represents the sum of all the
expendituresin the category (Expenditures On
girls athletics for the fiscalyear 1991)
38Sample of National Conformed Dimensions
School Incident Data
School Technology
Organization
Assessment
Financial
Student
Teacher
Claims
Process
Date
Student Attendance
Student Assessment
District Financial Reporting
School and District Status
Teacher Licensing
School Food and Nutrition
Student Disciplinary Reporting
Student Safety Reporting
District Technology Planning
39Role of Data Architecture
- Facilitate how business users want to identify
and categorize data - Assist in the creation and documentation of
categorical value domains and measures - Creation of machine-readable data dictionaries
for use in building data warehouse structures
40Summary
- We found a way for non-SQL programmers to analyze
complex data by looking at one dimension at a
time - Users don't have to memorize "codes"
- Users do need to understand how continuous data
is mapped into categories and what the labels on
these categories mean