Decision Support Systems 1201311 Data Warehousing - PowerPoint PPT Presentation

1 / 32
About This Presentation
Title:

Decision Support Systems 1201311 Data Warehousing

Description:

Decision Support Systems 1201311 Data Warehousing Chattrakul Sombattheera Agenda Definitions and Concepts Process Overview Architecture Data Integration and the ETL ... – PowerPoint PPT presentation

Number of Views:203
Avg rating:3.0/5.0
Slides: 33
Provided by: msu149
Category:

less

Transcript and Presenter's Notes

Title: Decision Support Systems 1201311 Data Warehousing


1
Decision Support Systems1201311Data Warehousing
  • Chattrakul Sombattheera

2
Agenda
  • Definitions and Concepts
  • Process Overview
  • Architecture
  • Data Integration and the ETL Processes
  • Development
  • Real-Time Data Warehousing
  • Administration and Security Issues

3
Definitions and Concepts
  • A data warehouse (DW) is a pool of data produced
    to support decision making it is also a
    repository of current and historical data of
    potential interest to managers throughout the
    organization.
  • Data are usually structured to be available in
    form ready for analytical processing activities
    (e.g., online analytical processing OLAP, data
    mining, querying, reporting, other decision
    support applications).
  • A data warehouse is a subject-oriented,
    integrated, time-variant, nonvolatile collection
    of data in support of managements
    decision-making process.

4
Characteristics of DW
  • Subject-oriented. Data are organized by detailed
    subject, such as sales, products, or customers,
    containing only information relevant for decision
    support that allows decision makers to both know
    how their business performs and why.
  • A DW differs from typical DBs that they are more
    product-oriented and are tuned to handle
    transactions of DBs. Subject orientation allows a
    more comprehensive view of organization
  • Integrated. Data are from different sources and
    must be in a consistent form. DWs must deal with
    naming conflicts and discrepancies among units of
    measure.
  • Time variant (time series). A DW usually contains
    historical data (containing multiple time points,
    e.g. daily, weekly, monthly), except in real-time
    systems. They detect trends, deviations,
    long-term relationships for forecasting and
    comparisons, leading to decision making.
  • Nonvolatile. Users cannot change data in DWs.
    Obsolete data are discarded, and changes are
    recorded as new data.

5
Additional DW Characteristics
  • Web-based. DWs are typically web-based
    application.
  • Relational/Multidimensional. DWs use either a
    relational structure or multidimensional
    structure.
  • Client/server. DWs use client/server architecture
    to provide easy access to end users.
  • Real-time. Newer DWs provide real-time or active,
    data access and analysis capabilities.
  • Include metadata. DWs contain metadata (data
    about data) about how data organized and how to
    effectively use them.

6
Parts of DWs.
  • Data Marts. A data mart is a subset of a DW,
    typically consisting of a single subject area
    (e.g. marketing, operations).
  • Operational Data Stores (ODS). An ODS provides a
    fairly recent form of customer information file
    (CIF) and is used for short-term decision
    involving mission-critical applications. The
    contents of an ODS are updated through the course
    of business operations.
  • Enterprise Data Warehouses (EDW). An EDW is a
    large scale DW that is used across the enterprise
    for decision support. EDW are used to provide
    data for many types of DSS, including customer
    relation management (CRM), supply chain
    management (SCM), business performance management
    (BPM), business activity monitoring (BAM),
    product lifecycle management (PLM), revenue
    management, knowledge management systems (KMS),
    etc.

7
DW Process Overview
  • Data sources. Data come from various sources
    including legacy systems, external data
    providers, online transaction processing (OLTP),
    enterprise resource planning (ERP) system, Web
    logs, etc.
  • Data extractions. Data are extracted using
    custom-written or commercial software (ETL).
  • Data loading. Data are loaded into staging area,
    where they are transformed and cleansed.
  • Comprehensive database. EDWs support all decision
    analysis by providing relevant summarised and
    detailed information originating from many
    different sources.
  • Metadata. Medata are maintained so that they can
    be accessed by IT personnel and users. Metadata
    include software programs about data and rules
    for organizing data summaries that are easy to
    index and search.
  • Middleware tools. Middleware tools enable access
    to the data warehouse. Middleware tools include
    SQL, Business Object, applications (data mining,
    OLAP, reporting tools, data visualize tools),
    etc.

8
DW Process Overview
9
DW Architectures
10
DW Architectures
11
DW Architectures
12
DW Architectures
13
DW Architecture
14
DW Architectures
15
DW Architectures
16
DW Architectures
17
DW Architectures
18
10 Factors for choosing DW architecture
  • Information interdependence between
    organizational units
  • Upper managements information needs
  • Urgency of need for a data warehouse
  • Nature of end-user tasks
  • Constraints on resources
  • Strategic view of the data warehouse prior to
    implementation
  • Compatibility with existing systems
  • Perceived ability of the in-house IT staff
  • Technical issues
  • Social/Political factors

19
Data Integration
  • Data integration comprises three major processes
    that, when correctly implemented, permit data to
    be accessed and made accessible to an array of
    ETL and analysis tools and data warehousing
    environment
  • Data access (i.e., the ability to access and
    extract data from any data source),
  • Data federation (i.e., the integration of
    business views across multiple data stores), and
  • Change capture (i.e., based on the
    indentification, cpature, and delivery of the
    changes made to enterprise data sources).

20
Data Integration Techniques
  • Enterprise application integration (EAI) provides
    a vehicle for pushing data from source systems
    into the data warehouse.
  • EAI involves integrating application
    functionality and is focused on sharing
    functionality (rather than data) across systems,
    thereby enabling flexibility and reuse.
  • Traditional EAP focuses on enabling application
    reuse at the programming level, whereas modern
    EAP uses services-oriented architecture (SOA).
  • Enterprise information integration (EII) proposes
    real-time data integration from a variety of
    sources, such as relational databases, Web
    services, and multidimensional databases.
  • EII tools use predefined metadata to populate
    views that make integrated data appear relational
    to end users. XML seems to be the most
    appropriate tool to define metadata.

21
Extraction, Transformation, and Load (ETL)
  • ETL is the heart of DW.
  • ETL is composed of
  • Extraction reading data from one or more
    databases,
  • Transformation converting the extracted data
    from its previous form into the form in which it
    needs to be so that it can be placed into a data
    warehouse or simply another database, and
  • Load putting the data into datawarehouse.
  • Transformation occurs by using rules or lookup
    tables or by combining the data with other data.
  • The three database functions are integrated into
    one tool to pull data out of one or more
    databases and place them into another,
    consolidated database or a data warehouse.
  • ETL tools transport data between sources and
    targets, document how data elements (e.g.
    metadata) change as the move between source and
    target, exchange metadata with other application
    as needed, and administer all runtime processes
    and operations (e.g., scheduling, error
    management, audit logs, statistics).

22
ETL Process
23
DW Development Approaches
  • The inmon Model The EDW Approach
  • The Kimbell Model The Data Mart Approach

24
DW Structure Star Schema
25
DW Implementation Issues
  • The following is a list of major tasks that could
    be performed
  • Establishment of service-level agreements and
    data-refresh requirements
  • Identification of data sources and their
    governance policies
  • Data quality planning
  • Data model design
  • ETL tool selection
  • Relational database software and platform
    selection
  • Data transport
  • Data conversion
  • Reconciliation process
  • Purge and archive planning
  • End-user support

26
DW Implementation Issues
  • Some of the best practices for implementing a
    data warehouse
  • The project must fit with corporate strategy and
    business objectives.
  • There must be complete buy-in to the project by
    executives, managers, and users.
  • It is important to manage user expectation about
    the completed project.
  • The DW must be built incrementally.
  • Build in adaptability.
  • The project must be managed by both IT and
    business professionals.
  • Develop a business/supplier relationship.
  • Only load data that have been cleansed and are of
    a quality understood by the organization.
  • Do not overlook training requirements.
  • Be politically aware.

27
DW Implementation issues
  • Risks in implementing DW
  • No mission or objective
  • Quality of source data unknown
  • Skills not in place
  • Inadequate budget
  • Lack of supporting software
  • Source data not understood
  • Weak sponsor
  • Users not computer literate
  • Political problems or turf wars
  • Unrealistic user expectations
  • Architectural and design risks
  • Scope creep and changing requirements
  • Vendors out of control
  • Multiple platforms
  • Key people leaving the project
  • Loss of the sponsor
  • Too much new technology
  • Having to fix an operating system

28
DW Implementation Issues
  • When developing a DW, the followings are not to
    things
  • Starting with the wrong sponsorship chain
  • Setting expectations that you cannot meet and
    frustrating executives at the moment of truth
  • Engaging in politically naïve behavior
  • Loading the warehouse with information just
    because it is available
  • Believing that data warehousing database design
    is the same as transactional database design
  • Choosing a data warehouse manager who is
    technology oriented rather than user oriented.
  • Focusing on traditional internal record-oriented
    data and ignoring the value of external data and
    of text, images, and, perhaps, sound and video.
  • Delivering data with overlapping and confusing
    definitions.
  • Believing promises of performance, capacity, and
    scalability.
  • Believing that your problems are over when the
    data warehouse is up and running.
  • Focusing on ad hoc data mining and periodic
    reporting instead of alerts.

29
Massive DW and Scalability
  • Scalability is the ability of a system to
    maintain its level of performance when the size
    of input grows.
  • When the size of input in DWs grows, DWs must be
    able to handle the number of users and the
    complexity of their queries.
  • Wal-Mart use hundreds of terabytes of data to
    study sales trends and other tasks.
  • US Defence use 5 petabyte DW to maintain the
    medical records of its 9 million personnel.
  • Good scalability of DW means that queries and
    other data access functions will grow (ideally)
    linearly with the size of the DW.

30
Real-Time Data Warehousing
  • Traditionally, DWs work mainly on historical data
    to support strategic and tactical decision
    making.
  • For many business, making fast and consistent
    decisions across the enterprise requires
    real-time data warehousing. Decision support has
    become operational.
  • Real-time data warehouse (RDW), also known as
    active data warehouse (ADW), is the process of
    loading and providing data via the data warehouse
    as they become available.
  • At a basic level, ADW simply report what
    happened. At the next level, some analysis
    occurs. As the system evolves, it provides
    prediction capabilities, which lead to the next
    level of operationalization.
  • At its highest evolution, ADW is capable of
    making events happen (e.g. activities such as
    creating sales and marketing compaigns or
    idnetifying and exploiting opportunities).

31
Traditional vs Active DW
  • Strategic, tactic and Operational decisions
  • Results measured with operations
  • Only comprehensive detailed data available within
    minutes is acceptable
  • High number (1000 or more) of users accessing and
    querying the system simultaneously
  • Flexible ad hoc reporting, as well as
    machine-assisted modeling (e.g., data mining) to
    discover new hypotheses and relationships
  • Operational staffs, call centers, external users
  • Strategic decision only
  • Results sometimes hard to measure
  • Daily, weekly, monthly data currency acceptable
    summaries often appropriate
  • Moderate user concurrency
  • Highly restrictive reporting used to confirm or
    check existing processes and patterns often uses
    predeveloped summary tables or data marts
  • Power users, knowledge workers, internal users

32
DW Administration and Security Issues
  • Establishing effective corporate security
    policies and procedures. An effective security
    policy should start at the top, with executive
    management, and should be communicated to all
    individuals within the organization.
  • Implementing logical security procedures and
    techniques to restrict access. This includes user
    authentication, access controls, and encryption
    technology.
  • Limiting physical access to the data center
    environment.
  • Establishing an effective internal control review
    process with an emphasis on security and privacy.
Write a Comment
User Comments (0)
About PowerShow.com