Title: Cash Registers
1Cash Registers Satellites
- Briefing to the 2006 NOAATech Conference
- November 2, 2005
-
- Stan Cutler
- Stanley.cutler_at_noaa.gov
- 301-457-5210 x 163
- Mitretek Systems/NESDIS/OSD
2 Purpose
- Improve communication between NOAAs developers
and the wider community of data management
professionals - Introduce vocabulary
- Identify NOAA applications that can be described
using common vocabulary
3 Agenda
- Universal Data Management Challenges
- Notional Data Warehouse Architecture
- Data Modeling Approaches
- Relational
- Dimensional
4I. Universal Data Management Challenges
5Data Mining Example Market Basket Analysis
- Decisions
- Move beer display closer to the diaper display
- On Thursdays, sell beer diapers at full price
- Rationale
- 1) When men bought diapers on Thursdays and
Saturdays, they also tended to buy beer - 2) Men typically did their weekly grocery
shopping on Saturdays - 3) On Thursdays, they only bought a few items
6Many Disciplines Mine Their Data
- Law Enforcement - Optimal Deployment
- Health Care Coverage Risks
- E-Commerce Pop-up/Link Selection
- Medicine Gene/Disease Associations
- Etc.
Data Management Goal Develop systems in which the
data and procedures are configured to answer
questions that are important to the enterprise
7NOAAs Future
We are not unique. Any enterprise that collects
large amounts of data has the same kind of
challenges and goals
- Integrating Global (Environmental Observations)
and Data Management - Ensuring Sound, State-of-the-Art (Research)
- Developing, Valuing, and Sustaining a World-Class
Workforce
8We can find valuable expertise outside the NOAA
community
- Ask the same kinds of questions as those
challenged with similar problems - Understand the constructs and vocabulary
- Architectures
- Data Modeling
9II. Notional Data Warehouse Architecture
10Hub and Spoke Architecture
Application Specific Data Marts use OLAP
Technologies ()
ETL Extract, Transform and Load
Internal Data
Transform Cleanse
Data Staging Area
Data Ware-house
External Data
Application Neutral
OLAP Online Analytical Processing
11Retail ApplicationHub and Spoke Architecture
Sales Data
Transform Cleanse
External Customer Lists
Application Neutral
OLAP Data Marts (Application Specific)
12Notional NOAA Hub and Spoke Architecture
NOAA Applications (Data Marts using OLAP)
CLASS
ESPC
Data Staging Area (Rich Inventory?)
Data Ware-house
Ecosystems Management
Transform Cleanse
Other Satellite Archives
Data Centers
Commerce Transportation
External Customers
Application Neutral
13III. Data Modeling Approaches
14Relational Vocabulary
- Relational technologies
- Relational Data Base Management Systems (RDBMS)
- COTS Products (INFORMIX, DB2, ORACLE, MS/SS,
etc.) - Proprietary data management/manipulation software
- RDBMS Extensions (Most COTS products built on an
RDBMS) - GUIs, CASE Tools, COOP, Application Generators,
Security, etc. - Relational Data Models - Evolutionary approach
to data base design - Conceptual Entity Relationship Diagrams (ERD)
used to identify data requirements,
relationships, rules - Diagrams
- Data Dictionaries
- Logical ERDs used to normalize (eliminate
redundancies) - Physical models are the Table Schema entered into
the RDBMS - Online Transaction Processing (OLTP)
- e.g., CLASS
15Entity Relationship Diagram (ERD)
The foundation of all OLTP systems, such as CLASS
Entity
Entity Class
Relationship
Attributes
Cardinality (1, Many, or 0)
Attributes, entities, and relationships are
described in the data dictionary
16Object Models inherit ERD constructs
Object Class
key Behavior gtgtgtgt gtgtgtgt
17Pros Cons of systems based on Relational models
- Strengths
- Referential integrity
- Data locking
- Fast Look-up and Retrieval
- GUIs
- Weaknesses
- Entity proliferation
- Users dont understand them
- Complex code must be written to accumulate
multiple instances (Hard to use for Data Mining)
18Dimensional Data Models
- Fact
- An instance of numeric data
- Dimension
- Foreign key
- Fact Table
- Key is a concatenation of foreign keys
(dimensions) - An instance can have dozens of foreign keys
- Millions of instances (rows) often required
- Programmers revenge on Data Base Administrators
- Break many relational rules
- Re-invented often
19A Dimensional Data Model for Retailing
- Who (buys, sells)
- Customer (age, gender, marital status,
occupation, etc.) - Sales person ( , , training, etc.)
- Cash Register
- What (products)
- Brand, color, size, type, etc
- When
- Time of day, day of week, season
- Where
- Store (location, size, type), Shelf
- Why
- Promotions, advertising, discounts, economic
trends - How much (was spent)
- Per product, per total sale
20Classical Star Schema Point of Sale
FACT
Time_key Customer_key Store_key Clerk_key Promo_ke
y Product_key Register_key Dollars Sold Units
Sold Dollars Cost
21Snowflake Schema Point of Sale
Brand-ID_PK Maker-Desc
Model-Num_PK Model-Desc
FACT
Sub-Type_PK Sub-Type-Desc
Time_key Customer_key Store_key Clerk_key Promo_ke
y Product_key Register_key Dollars Sold Units
Sold Dollars Cost
Product Dimension
Product_Type_PK Product_Type_Desc
Sub-Type_PK Sub-Type-Desc
Sub-Type_PK Sub-Type-Desc
Brand-ID_PK Maker-Desc
Model-Num_PK Model-Desc
Sub-Type_PK Sub-Type-Desc
22Metadata in Dimensional Modeling
- NOAA usage
- If its not a fact
- If its not a key
- Its metadata
- Conventional Dimensional usage
- If its not a fact
- If its not a key
- Its documentation
- BUT
- If its a key
- Its metadata (because it describes the
fact)
23Dimensional Models for NOAA
- Which
- Satellite
- Instrument
- When
- Orbit, UTC, Season, decade, epoch, etc
- Where
- Geospatial coordinates
- Who
- User affiliation
- Developer affiliation
- FACT How much?
- Temperature, moisture, radiance, color, etc.
24A NOAA Star Schema?
FACT TABLE
Time_key (fk) Location-key (fk) Altitude key
(fk) Product_key (fk) Satellite_key
(fk) Instrument_key (fk) Temperature
25Pros Cons of systems based on dimensional models
- Strengths
- Very few entity types needed
- Decision Support Systems (DSS)
- End-Users construct complex queries by selecting
dimensions from a GUI - Statistical analysis of very large data bases
- Artificial Intelligence (AI)
- Automated scheduling of continuous executions
- System identifies (discovers) new relationships
- Discoveries shape successive execution
- Weaknesses
- Development Cost
- Storage
- Operational Cost - Requires much care and
feeding
26False Dichotomy Relational vs. Dimensional
- Relational and dimensional systems are not
mutually exclusive - Data warehouses usually extract fact tables from
relational data bases - Data warehouse capabilities are extensions in
RDBMSs - Depends on the business
- Feasibility Is the application data good enough
for ETL? - ROI Does the business benefit outweigh the cost?
27SUMMARY
- NOAAs data mining challenge is similar to that
of other enterprises - A world-wide community of IT professionals uses a
particular vocabulary to address the challenge - Relational technologies models are the
essential first step - Dimensional technologies models come next
28Questions
Stan Cutler Mitretek System/NESDIS/OSD Stanley.Cut
ler_at_nooa.gov 301-457-5210 ex 163