Introduction to Dimensional Analysis Session 2 - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction to Dimensional Analysis Session 2

Description:

Understand why Data Dictionaries are so critical in accurate BI ... 'Polygon' 'Square' 'Red Circle' 'Green' 'Blue' 'Blue-Green' ... – PowerPoint PPT presentation

Number of Views:42
Avg rating:3.0/5.0
Slides: 41
Provided by: danmcc6
Category:

less

Transcript and Presenter's Notes

Title: Introduction to Dimensional Analysis Session 2


1
Introduction to Dimensional AnalysisSession 2
  • 5/20/2005

Dan McCreary President Dan McCreary
Associates dan_at_danmccreary.com (952) 931-9198
2
Agenda
  • General introduction to Data Dictionaries that
    drive Business Intelligence (BI) concepts and
    terminology
  • Understand why Data Dictionaries are so critical
    in accurate BI
  • Understand how BI looks at the world in different
    ways
  • Understand how data warehouse concepts and data
    dictionary impact analysis and research

3
What is a Data Warehouse?
  • Fast Retrieval
  • Internally Consistent
  • Slice and Dice Capability
  • Easy to Browse
  • Complete and Reliable
  • Data Quality Controls
  • GI-GO (Garbage-In, Garbage-Out)
  • Source Ralph Kimball

4
Factors Driving Business Intelligence
  • Computer process and store twice as much data per
    dollar every 18 months (Moores Law)
  • People can make better decisions if they have
    tools to quickly see only the data they are
    interested in seeing
  • People frequently want to analyze data in new
    ways that was unanticipated by people creating
    "canned reports"
  • Tools can be designed to allow non-technical
    (non-SQL programmers) to generate their own
    reports
  • People have an incredible ability to categorize
    things base on their properties and attributes
    but if they don't have consistent definitions of
    these properties they will not be generating
    consistent results

5
The BI Iterative Process
Access Data Warehouse
Analysis
BI ProjectManagement
Publishing, Change, DataGap Analysis, New
DataGathered
Insights, Conclusions andFindings
  • The BI process in an on-going iterative process
    where the structure of the data warehouse changes
    based on what data is critical to an
    organizations business objectives.

6
BI Evolution
Increasing Responsiveness
Monthly Green Bar Reports
BrowseableGraphical Interface
  • Shorten the time-to-report interval
  • Allow users to "browse" data sets interactively
  • Remove programmers with "backlogs" of reports
  • Users frequently waited days, weeks for months to
    get a custom report created

7
Dimensions of BI
Technical SophisticationRequired
Low (analysts)
Highly Responsive to "What If" Scenarios
Degree ofEnd User Control
High (programmers)
Few Dimensions few parameters, few filters
Many Dimensions many variables
8
Overlapping Terminology
Data Mining
Statistical Analysis
PatternDiscovery
Data Dictionaries Data Modeling Semantics
Business Intelligence
Data Warehousing
Transaction Processing (OLTP)
Indexing
Dimensional Analysis
Aggregates
Data Storage (RDBMS)
9
Key Terms Covered in This Class
  • Properties
  • Dimension
  • Aggregation and Levels
  • Enumerations of Categorical Data
  • Labeling Categories
  • Giving precise definitions to Labels
  • Dimension Hierarchies and Levels
  • Cubes
  • Measures
  • Filters
  • Data Warehouse Presentation

10
Things Have Many "Properties"
People are very good at recognizing and sorting
things by their properties.
11
Sorting by an Object's Property
  • Sort objects by their color

12
Sorting by A Property
  • Sort objects by their shape

13
Sorting by Color AND Shape
Shape Dimension
Color Dimension
14
Dimensional Analysis
  • The science of figuring out intuitive ways that
    people want to categorize information using
    independent variables to graphically filter and
    browse their data

15
Dimension
  • List of categories used to partition the
    information based on a property of the objects
  • Dimension Names Color, Shape

16
Labels
  • A name given to a non-overlapping category within
    a dimensions

Labels
"red"
"blue"
"green"
17
Enumeration
  • Whenever we decide to break the continuous
    observable world into a predefined list of
    categories when each category has a label we call
    this an "enumerated value domain". These will
    then become the "dimensions" of our cube.

"green"
"red"
"blue"
Note NO OVERLAP!
Statisticians call this type of "categorical
data" and it requires the categories to be
non-overlapping.
18
The Challenge of Semantic Classification
  • People are good at sorting based on a property
    they see
  • People are good at assigning names to a property
    type
  • People usually come up with different names for
    properties
  • Some dimensions people easily agree on
  • Some are very difficult to classifyand even more
    difficult go get peopleto agree on a
    non-overlapping classification system

"Red Circle"
"Polygon"
"Square"
"Blue"
"Green"
"Blue-Green"
What happens with a small percentage of data does
not quite fit into a discrete category?
19
Level
  • A layer of "aggregation" within a single
    dimension categorization of properties

All Shapes
Shapes With Curves
Shapes Without Curves
Levels
Circle
Heart
Square
Trapezoid
Moon
Star
Diamond
20
Measures (example weight)
5.7
7.4
3.5
6.6
10
1.1
6.1
8.2
5.5
3.8
9.1
9.3
2.6
8.4
A measure is any property that you can perform
math on (sums, averages).
21
Measures
  • Something that you can do math on.

-
X


/
average
sum
22
Sample Object "Fact Table"
Measures tend to have data types of integers
and floating point numbers.
Note that categorical data can not beadded
together. But we can count thefrequencies of
items with a category!
23
Shape Dimension
Note that there is no reference to "Has Curves"
in the prior table. "HasCurves" is a property
of the shape value domain because it can be
"inferred"from the shape of the object. Some
categorical definitions use "exclusionary"
language.Note that "Has Curves" also must have
a precise definition in the data dictionary.
24
Facts and Dimension
Shape Facts
Color_FK
Color Dim
Shape_FK
Shape Dim
Color Name
Weight
Has Curves
Shape Name
Note that "Has curves" does not need to be in the
central fact table.It is a property of the shape!
25
Adding Dimensions
5.7
7.4
3.5
6.6
10
1.1
6.1
8.2
5.5
3.8
9.1
9.3
2.6
8.4
We have now added a 3rd dimension "Dash Style"
26
Each New Property is Another Dimension
Shape Facts
Color_FK
Color Dim
Shape Dim
Shape_FK
Color Code
Has Curves
DashStyle_FK
Shape Code
Weight
DashStype Dim
Shape
27
Filters
5.7
7.4
3.5
6.6
10
1.1
6.1
8.2
5.5
3.8
9.1
9.3
2.6
8.4
A filter will exclude all objects with a
specified property. For example we can exclude
all shapes with a property of "Circle"
28
Example Discarding Invalid Scores
This example filter removes all scores EXCEPT the
valid scores.
29
Selecting Only Scale Scores
This filter removes all scores EXCEPT the
assessments Scale Scoreusing the Test Score Type
dimension.
30
The Star Schema
31
Adding Measures
Shape Facts
Color_FK
Color Dim
Shape_FK
Shape Dim
ColorCode
DashStyle_FK
ShapeCode
WeightValue
DashStype Dim
Measures are Integers or floats that you can
perform math on.
HeightValue
ShapeCode
PriceAmount
DensityValue
Measures can be easily be added to the fact table
without changing any of the dimensions.
32
Cube
  • A Cube is a pre-built structure that has facts
    and many dimensions (not necessarily just three)
  • Designed to have averages and sums for most
    levels "pre-calculated" to make analysis fast

Color Dimension
Dash-Style Dimension
Shape Dimension
33
Build a Mental Model
(aka "Page Fields")
Filter Funnel
Measure count
Vertical Dimension (rows)
Presentation
Horizontal Dimension (columns)
34
Using Cubes in Excel
Filter Dropped Here
Measures Dropped Here
Row and Column Dropped Here
35
Count of Year vs. Assessment Name
The measure is the count of records in the cube.
The Column Dimension is the "Fiscal Year"
There are around 25 million test results
The Row Dimension is the "Test Name".
36
Conformed Dimensions
  • When building many cubes, there is a large
    benefit to "reusing" dimensions
  • Commonly reused dimensions
  • Time (Fiscal Year, Quarter)
  • Organization (School, District)
  • Expense Category
  • Student

37
Each bar represents the sum of all the
expendituresin the category (Expenditures On
girls athletics for the fiscalyear 1991)
38
Sample of National Conformed Dimensions
School Incident Data
School Technology
Organization
Assessment
Financial
Student
Teacher
Claims
Process
Date
Student Attendance
Student Assessment
District Financial Reporting
School and District Status
Teacher Licensing
School Food and Nutrition
Student Disciplinary Reporting
Student Safety Reporting
District Technology Planning
39
Role of Data Architecture
  • Facilitate how business users want to identify
    and categorize data
  • Assist in the creation and documentation of
    categorical value domains and measures
  • Creation of machine-readable data dictionaries
    for use in building data warehouse structures

40
Summary
  • We found a way for non-SQL programmers to analyze
    complex data by looking at one dimension at a
    time
  • Users don't have to memorize "codes"
  • Users do need to understand how continuous data
    is mapped into categories and what the labels on
    these categories mean
Write a Comment
User Comments (0)
About PowerShow.com