Title: Data Warehousing and Decision Support
1Data Warehousingand Decision Support
- Mickey Yost
- United States Department of Agriculture
- National Agricultural Statistics Service
- ASC 99
- September 24, 1999
2Metadata Standards
The most visible aspects of the data warehouse
and the ones by which the project will be judged
are the portions that rise above the surface the
schema, metadata, and the desktop data access
tools you provide. Managing the Data Warehouse
(1997) Inmon, Welch, and Glassey
3Data Warehousing
4The Simplest Dimensional Model or Star Schema is
characterized by a long and narrow central Fact
table surrounded by 6 to 12 Dimension tables.
5In the NASS model, the Fact table primarily
contains the Survey Responses reported by
respondents in a survey or census.
6The dimension tables store and manage the
information that describes the facts stored in
the Fact table, and are each joined to that table.
7The Varname dimension table, for example,
contains the question text used in a sample
survey or census. The response to a particular
question is not in this table.
8Likewise, the Reporter dimension table contains
the names, addresses, and demographics of the
individuals who were interviewed in a survey or
census.
9The Survey Dimension Table merely contains one
row for each unique survey or census being
tracked. For example 1997 CENSUS OF
AGRICULTURE OR 1999 SEPTEMBER HOG SURVEY
10The Location dimension table merely contains one
row for each State and County in the U. S.
11Dimensional database models are primarily
designed to allow for AD HOC data analysis by the
dimensional attributes being tracked in the model.
12- Questions that may be posed to this database
include - Show me the entire reporting history for a single
reporter. - Add up the number of Hogs reported in a survey by
state, county, race, sex and age.
13Data Warehousing
- Uses a Dimensional Model
- Tracks the Facts
14Unlike transaction system databases that replace
specific data in a specific field, the Data
Warehouse accumulates Facts over time.
15In the NASS Data Warehouse, the Facts are the
individual Survey Responses from 1997 to the
present covering multiple surveys, a census, 3600
questions, and 1.8 million respondents.
16Data Warehousing
- Uses a Dimensional Model
- Tracks the Facts
- Tracks the Metadata
17Metadata is defined as the data about the data.
18For our purposes, metadata describes the facts
stored in the dimension tables, and is primarily
intended for the business end user.
19One of the main responsibilities of the Data
Warehouse is to correctly represent prior
history. Ralph Kimball That is done by adding
rows to the dimension tables.
20Slowly changing dimensional attributes, and their
relationship to the facts they describe, are
managed by the addition of new rows when any of
the key attributes change.
21For example, a key attribute in the Varname table
is the Varname Survey Name. If a specific
question in a Hog survey is used in an
Environmental survey, a new row is added for that
survey.
22Not only is the survey source of the variable
maintained, but all of the original attributes
are also maintained.
23This has created a golden opportunity to
standardize all variable names and their
definitions. This is accomplished by using the
Master Varname and Master Varname Description
columns in the Varname table.
24Master Master Varname
Varname Original Varname Description
Survey Name Varname CCRNXXHV CORN ALL
AG SURVEY CCRNXXHV HARVESTED
HOG SURVEY CCRNXXHV ACRES
COUNTY ESTIMATES C133 COUNTY
ESTIMATES C202 COUNTY ESTIMATES C203
COUNTY ESTIMATES C531 COUNTY
ESTIMATES C538 COUNTY ESTIMATES C543
COUNTY ESTIMATES CNAHCURR COUNTY
ESTIMATES CYDGCHRV COUNTY
ESTIMATES HARVGRN COUNTY
ESTIMATES IC321 CENSUS K67
25- Standardization benefits
- Browsing and selection of variables is much
easer. - Difference between original variable and master
variable reduced over time. - Survey and Census specifications integrated.