A Field Study and Framework about Data Warehouse Refresh Policies

1
A Field Study and Framework about Data Warehouse
Refresh Policies
  • Michael V. Mannino and Ping Walter
  • The Business School
  • University of Colorado at Denver

2
Outline
  • 1. Background
  • 2. Field Study
  • 3. Analysis of Results
  • 4. Contributions and Future Work

3
Refresh Process Background
4
Refresh Process Flow
5
Refresh Process Activities
6
Push-Pull Policies
  • Push policy
  • Server-initiated information requests
  • Periodic refreshment after update or time
    threshold
  • Pull policy
  • Client-initiated information requests
  • Web browsing on-demand requests
  • Precise measure
  • Server and client involvement with information
    delivery
  • Dimensions: number of messages and message size
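
The push and pull policies above can be illustrated with a minimal Python sketch. The names PushTrigger and pull_refresh, and the threshold values, are hypothetical and chosen only for illustration; the slides do not specify an implementation.

```python
from dataclasses import dataclass


@dataclass
class PushTrigger:
    """Hypothetical push trigger: refresh once either threshold is reached."""
    max_pending_updates: int  # update threshold (number of changed rows)
    max_seconds: float        # time threshold since the last refresh

    def should_refresh(self, pending_updates: int,
                       seconds_since_refresh: float) -> bool:
        # Server-initiated: fire when the update OR the time threshold is met.
        return (pending_updates >= self.max_pending_updates
                or seconds_since_refresh >= self.max_seconds)


def pull_refresh(pending_updates: int, client_requested: bool) -> bool:
    """Pull policy: refresh only on a client request with changes to load."""
    return client_requested and pending_updates > 0


# Assumed thresholds: 1,000 pending rows or one hour, whichever comes first.
trigger = PushTrigger(max_pending_updates=1000, max_seconds=3600)
print(trigger.should_refresh(pending_updates=250, seconds_since_refresh=4000))  # True
print(trigger.should_refresh(pending_updates=250, seconds_since_refresh=60))    # False
print(pull_refresh(pending_updates=250, client_requested=True))                 # True
```

In this sketch the push side decides on its own when to refresh, while the pull side does nothing until a client asks.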

7
Refresh Policy Specification
  • List of refresh jobs
  • Frequency
  • Scope: data sources and change data volumes
  • Weighted load time lag
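
One way to picture such a specification is as a small data structure. The sketch below is an assumption, not the authors' definition: the slides do not define weighted load time lag, so it is taken here to be the average lag of each job weighted by its change data volume, with changes assumed to arrive uniformly between refreshes. The job names and volumes are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RefreshJob:
    name: str
    refreshes_per_day: float  # frequency
    sources: list[str]        # scope: data sources
    rows_per_day: int         # scope: change data volume

    def average_lag_hours(self) -> float:
        # Assumption: changes arrive uniformly, so a changed row waits
        # on average half of one refresh interval before it is loaded.
        return 24.0 / self.refreshes_per_day / 2.0


def weighted_load_time_lag(jobs: list[RefreshJob]) -> float:
    """Volume-weighted average lag in hours (an assumed definition)."""
    total_rows = sum(job.rows_per_day for job in jobs)
    return sum(job.average_lag_hours() * job.rows_per_day
               for job in jobs) / total_rows


# Hypothetical policy: a list of refresh jobs with frequency and scope.
policy = [
    RefreshJob("sales", refreshes_per_day=1, sources=["pos"], rows_per_day=3_000_000),
    RefreshJob("inventory", refreshes_per_day=4, sources=["erp"], rows_per_day=1_000_000),
]
print(weighted_load_time_lag(policy))  # 9.75
```

Here the daily job contributes a 12-hour average lag and the six-hourly job a 3-hour average lag, so the volume-weighted result is (12 × 3 + 3 × 1) / 4 = 9.75 hours.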

8
Refresh Policy Orientation
9
Industry Trends
  • Operational data stores
  • Real-time data warehouses
  • Virtual data warehouse
  • Right-time data warehouse

10
Field Study
11
Field Study Design
  • Multiple case format
  • Cross section of industries
  • Semi-structured interview format
  • 1 to 3 hours of interview time with each data
    warehouse administrator

12
Interview Format
  • Organizational representative background
  • Boundary between data warehouse and source
    systems
  • Data warehouse characteristics
  • Refresh process
  • Data warehouse usage
  • Organizational issues

13
Organization Summary
  • Industries: retail, financial services,
    telecommunications (retail and wholesale),
    engineering services, health care, manufacturing,
    wholesale services
  • Size: 3,000 to 125,000 employees; 350 million to
    31 billion in revenue
  • Data warehouse scope: most integrated 4 to 6
    functional areas

14
Data Warehouse Summary
  • Schema size: 30 to 340 tables; 100 typical
  • Physical size: 2 GB to 150 TB; 1.5 TB typical
  • Source systems: 3 to 70 source systems; hundreds
    of data sources; 20 source systems typical
  • Change data volumes: thousands to 200 million
    rows per day; millions of rows per day typical

15
Refresh Policy Summary
  • Most common: daily refresh during non-business
    hours
  • Significant deviations
  • Telecommunications firm with 5-minute refreshes
  • Retail firm with 15-minute and 3-hour refreshes
  • Some firms with 24- to 48-hour lag
  • One beverage manufacturer with 7- to 10-day lag

16
Refresh Process Constraints
  • Source system constraints
  • Data warehouse constraints
  • Lack of fixed costs
  • Online availability is a constraint, not a fixed
    cost
  • Index costs can be fixed or variable
  • Monitoring costs are not dependent on refresh
    frequency

17
Data Timeliness Requirements
  • Governed by SLAs
  • Content more important than refresh frequency
  • Data timeliness requirements not directly
    connected to refresh frequency

18
Problems and Investments
  • Process improvements
  • Capacity planning
  • New content
  • Infrastructure for change data capture

19
Analysis of Results
20
Refresh Policy Framework
21
Framework Support I
  • User satisfaction
  • Prevalence of user committees and SLAs
  • Willingness to respond to problems
  • Perceived net benefits
  • Business process changes
  • Refresh policies directly affect timeliness,
    availability, and response time

22
Framework Support II
  • Constraints: most significant influence on
    requirements and refresh policies
  • Direct influence on requirements
  • Direct influence on refresh policies

23
Framework Support III
  • DW characteristics and user expectations had
    lesser influence on requirements
  • DW characteristics: nature of data, type of user,
    type of usage
  • User expectations: industry standards and work
    environment

24
Long-Term Influences
  • Investment level
  • Source system technology
  • Middleware
  • Replication/Partitioning
  • Standardized interfaces
  • Organizational variables
  • Operational benefits
  • Control of source systems
  • Profit motive for the data warehouse

25
Practice Recommendations
  • Daily refresh: dominant policy
  • Consider refresh process improvements to
    compensate for source system deficiencies
  • Major IT investments for large reductions in load
    time lag
  • Operational benefits imperative for cost
    justification

26
Research Recommendations
  • Process design
  • Capacity planning
  • Investment models: data timeliness evaluation
  • Efficiency models
  • Optimal refresh policy

27
Contributions
  • Empirical evidence about refresh policies and
    influencing factors
  • Framework
  • Agenda for future work

28
Optimal Refresh Policy Research
  • Constraint driven
  • No minimization of data staleness costs
  • Use currency constraints for data staleness
    impact
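
A constraint-driven choice of this kind might look like the following sketch. It is illustrative only and makes several assumptions: the function name, the candidate intervals, and the idea that worst-case staleness roughly equals the refresh interval are all hypothetical. No staleness cost is minimized; staleness enters only as a currency constraint, and the least frequent feasible refresh is chosen on the assumption that fewer refreshes cost less.

```python
from typing import Optional, Sequence


def choose_refresh_interval(candidates_hours: Sequence[float],
                            min_feasible_hours: float,
                            currency_limit_hours: float) -> Optional[float]:
    """Pick the longest interval that satisfies both constraints.

    min_feasible_hours: shortest interval the source systems and the
        warehouse can sustain (assumed to be known).
    currency_limit_hours: currency constraint, the maximum staleness
        users will accept (assumed equal to the refresh interval).
    """
    feasible = [hours for hours in candidates_hours
                if min_feasible_hours <= hours <= currency_limit_hours]
    # No cost function is optimized; any feasible interval is acceptable,
    # so take the least frequent one. None signals an infeasible problem.
    return max(feasible) if feasible else None


candidates = [0.25, 3.0, 24.0, 48.0]  # hypothetical candidate intervals
print(choose_refresh_interval(candidates, min_feasible_hours=1.0,
                              currency_limit_hours=24.0))  # 24.0
print(choose_refresh_interval(candidates, min_feasible_hours=1.0,
                              currency_limit_hours=0.5))   # None
```

The second call returns None because the currency constraint is tighter than the constraints allow, which in this sketch marks a case where constraints would have to be relaxed through investment.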

29
Refresh Policy Influences
  • Constraints
  • Source system deficiencies
  • Competitive environment
  • Lack of compelling data timeliness requirements
  • Simple heuristics