Tour of Data Warehousing - PowerPoint PPT Presentation

1 / 27
About This Presentation
Title:

Tour of Data Warehousing

Description:

... to build and manage a data warehouse still require in-house customization of ... The virtual data warehouse (VDW) is a concept being touted by some as a more ... – PowerPoint PPT presentation

Number of Views:38
Avg rating:3.0/5.0
Slides: 28
Provided by: nitin94
Category:

less

Transcript and Presenter's Notes

Title: Tour of Data Warehousing


1
Tour of Data Warehousing
  • CIS 609

2
Introduction
  • The ultimate goal of advanced data analysis is to
    make decisions that lead as directly as possible
    to benefits.
  • Data warehouses aim at physically framing
    multiple sources of data (e.g., databases and
    file collections) in an architecture that
    requires the mapping of data from one or more
    operational data sources to a target database
    management system that supports the many decision
    making processes and business intelligence (BI)
    systems of an enterprise.

3
The Value Chain
4
Data to Decisions
  • Historically, BI systems have evolved through
    three main phases.
  • Phase 1. Enterprises started to put in place
    structured data stores filing data relevant to
    their needs. Typically a snapshot of data from a
    single operational source (i.e., information
    center) or sometimes directly against the
    operational source. In the 1980s, decision
    support tasks were performed centrally, with
    highly skilled individuals analyzing
    mainframe-resident data. The results were
    delivered to management as hard-copy reports and
    graphs.

5
Data to Decisions
  • Phase 2. Competitive factors pushed enterprises
    toward a better leverage of the data and they
    adopted augmented Decision Support Systems (DSS).
  • With data analysis techniques such as statistics,
    creating a focused store of data with subject
    matter from two or three operational sources.
  • Along with the storage, tools taking advantage of
    the structure have been setup for enterprises to
    question these data stores.

6
Data to Decisions
  • Phase 3. Finally, to cope with the multiplicity
    and diversity of the data stores, enterprises are
    starting to unify and rationalize these through
    data warehouse frameworks.
  • In addition to this effort, and due to an
    increased competitive pressure, advanced data
    analysis techniques are also implemented to
    leverage further - and gain business advantages -
    the resulting and ever growing amount of data.

7
Three Phases of Data Leverage
8
Data Warehouse Definition
  • A data warehouse is a process and architecture
    that requires robust planning to implement a
    platform, which consists of the selection,
    conversion, transformation, consolidation,
    integration, cleansing and mapping of data (i.e.,
    recent and historical) from multiple operational
    data sources to a target DBMS that supports an
    enterprises decision-making processes and BI
    systems.
  • As an architecture, a data warehouse is more than
    a single product and requires significant
    planning of five essential components
    operational data sources, data conversion and
    extraction, data warehouse DBMS, data warehouse
    administration, and BI tools.

9
Data Mining Definition
  • Data mining is the process of discovering
    meaningful correlations, patterns, and trends by
    sifting through large amounts of data stored in
    repositories.
  • Whether or not this discovery or exploration is
    performed by human analysts, software agents or
    machine learning techniques, it is important that
    the results provide enterprises with insights not
    available through traditional techniques or
    predefined relationships (e.g., relational
    tables).

10
Business Intelligence Definition
  • Business Intelligence (BI) is the user-centered
    process of exploring data, data relationships and
    trends - thereby helping to improve overall
    decision making.
  • This involves an iterative process of accessing
    data (ideally stored in the data warehouse) and
    analyzing it - thereby deriving insights, drawing
    conclusions and communicating findings - to
    effect change positively within the enterprise.
  • BI is an application of a data warehouse, but
    does not necessitate a data warehouse. BI is
    comprised of four major product segments
    interactive query tools, reporting tools,
    advanced DSSs, and EISs.

11
Decision Support Systems (DSS)
  • DSSs, as a subset of BI, utilizes organized
    collections of data, systems, applications, tools
    and techniques by which enterprises gather and
    interpret relevant information about the business
    and turn it into highly quantifiable plans,
    policies, procedures and metrics (see Figure 1).

12
Data Warehousing
  • Data warehousing technology and architectures are
    becoming a mainstream activity
  • However, many enterprises are balking at the high
    cost of implementation and experience difficulty
    in quantifying benefits and the return on
    investment (ROI).
  • In addition, the vast majority of efforts to
    build and manage a data warehouse still require
    in-house customization of products, thus
    consuming significant resources.
  • Many enterprises will need to turn to outside
    help to obtain the expertise necessary to perform
    the architecture integration.
  • There are several key trends that indicate that
    data warehousing is a market that is growing and
    will produce maturing technology

13
Data Warehousing contd
  • Increased investment by portable RDBMS vendors is
    improving support for data warehousing and
    complex DSS applications.
  • Niche tool vendors are moving to provide
    comprehensive solutions those that do not could
    be acquired.
  • Increasing consulting business, with systems
    integrators bringing vertical expertise and data
    models to the table.
  • Specific industries (e.g., retail) will increase
    granularity and the amount of historical data,
    creating 100 terabyte databases easily.
  • Inclusion of external data (e.g., demographic)
    that occurs in a more flexible and integrated way
  • Query monitors that give administrators some
    relief in managing the data warehouse and
    anticipate enterprises data needs, but many
    problems for administrators in managing the data
    warehouse environment (e.g., metadata redundancy
    and summary table maintenance) continue with
    tools slow to emerge and mature.

14
Supporting Multiple BI Applications
  • A data warehouse is more than a single product
    and requires significant planning of five
    essential components
  • Operational data source
  • Data conversion and extraction
  • Data warehouse DBMS
  • Data warehouse administration
  • BI tools

15
Operational Data Source
  • Data administration must take an active role to
    help plan extracts and to work with the business
    administrator to define the data needs.
  • The data administrator can help gather the
    information about the operational data and assist
    in designing the data model that will be used for
    the data warehouse DBMS.

16
Data Conversion and Extraction
  • A data warehouse architecture should include
    extracts of operational data that are frozen
    views of information trapped in time capsules,
    which in some cases have some level of
    summarization and history associated with the
    view of information.
  • The extracts are created either by handcrafted
    programs that take time and expense to maintain,
    or through tools that help automate the
    generation of the extract applications or
    processes.
  • These applications should provide the
    capabilities to perform the complex task of
    integrating data from multiple sources to create
    a consolidated view of the data, as well as the
    transformation of data for use by BI
    applications.

17
Data Warehouse DBMS
  • The RDBMS vendors (e.g., Oracle, IBM, and
    Microsoft) have significantly increased the
    amount of research and development to improve
    support for data warehousing and complex DSS
    applications.
  • This investment is geared toward providing strong
    support of complex database schemas with
    databases approaching several hundred terabytes.
  • Some of the returns (e.g., improved parallel
    techniques, bitmap indexing and improved query
    algorithms) from this investment are beginning to
    emerge out of the labs and into the products.

18
Data Warehouse Administration
  • Data warehousing brings many complex
    administration issues that are much different
    from handling transactional applications
  • With the increasing number of subject areas and
    the increasing amount of historical data (i.e.,
    more than five years) typically maintained, a
    data warehouse requires significant amounts of
    disk storage, memory and processing power.
  • These new administration requirements need
    extensive planning to provide data usage
    auditing, business data model, directory
    management, chargeback, summary tables, security,
    request control, query catalog, subscription
    services, and managing of operational data
    extracts.

19
Business Intelligence (BI)
  • BI empowers enterprises with systems that
    facilitate the access and analysis of data
    contained in the data warehouse.
  • A data warehouse in combination with the right BI
    tools can be an important part of supporting a
    business mission.
  • The selection of BI tools needs to be done after
    an analysis of the business and enterprise needs
    have been performed.
  • It is likely that a single tool will not support
    the needs of all users. Therefore, many times
    several tools will be selected to perform each of
    the following functions ad hoc queries and
    report writing, OLAP, advanced DSS and EIS.

20
The Data Warehouse Market Clarifying the Hype
and Confusion
  • With most market size estimates reaching as high
    as 6 billion, it is clear that data warehousing
    has captured the imagination of the masses.
  • How these estimates have been drawn, or that they
    cover a laundry list of sundry items, is less
    clear. This is not surprising since the data
    warehouse market is a challenging one to quantify
    accurately.
  • It is difficult to ascertain definitively whether
    a particular product is being used in a
    structured data warehouse environment or merely
    in a conventional DSS context

21
Virtual Data Warehouse
  • The virtual data warehouse (VDW) is a concept
    being touted by some as a more timely and less
    risky approach for implementing DSSs than
    traditional data warehousing.
  • VDW is an architecture for accessing data
    directly from data stores, regardless of the data
    source, thus eliminating the need to construct a
    separate database of redundant data.
  • This is not a new idea Information Builders
    incorporated the concept with the Enterprise Data
    Access/SQL (EDA/SQL) product in the early 1990s,
    and it was previously known as universal data
    access.

22
Virtual Data Warehouse contd
  • Proponents of VDW tout the speed of deployment
    over the traditional processes required for
    building a data warehouse, by focusing on user
    data access without the heavy analysis and
    infrastructure requirements.
  • However, this is misleading because many of the
    planning activities (e.g., identifying source
    data and transformation of the data) are still
    required with the VDW architecture to ensure that
    the correct data is available for user access.
  • The identification of source data is a
    complicated and time-consuming process.

23
VDW Challenges
  • These challenges are plentiful
  • Data integration, consolidation and
    transformation
  • Performance issues of SQL access
  • Impact on performance of operational systems
  • The cost of repeated extracts to obtain the same
    data vs. a single extract to create a database
  • Unification of different data management systems
  • Schema integration and reconciliation of the
    different data models implemented with each
    database or application
  • Distributed query processing (i.e., decompose,
    distribute and determine where the query should
    be processed based on the cost of the query)

24
Data Marts
  • A data mart is a subset of data from the data
    warehouse, designed to support the unique
    business unit requirements of a specific
    application.
  • If the right planning is done to incorporate the
    data mart into the overall data warehouse
    architecture, data marts can be beneficial.
  • This may save valuable server resources and give
    the business analysts more freedom to perform
    heavy analysis activities.
  • A users data does not have to reside physically
    in their departments for them to be empowered,
    but instead could be a set of aggregate tables
    within the data warehouse designed to support a
    specific set of users or application.

25
Data Mart Challenges
  • The need to create multiple data extraction and
    copy management applications could consume a
    great deal of IS resources and prove problematic
    to manage.
  • Improved data quality, consolidation and sharing
    are primary goals for creating a data warehouse,
    a strategy that could be jeopardized without
    centralized control for the distribution of data
    to the data marts.

26
Data Warehouse/Mart Architecture
27
Data Mart Considerations
  • The data mart and data warehouse are not mutually
    exclusive. A data mart should be a component of a
    data warehouse architecture.
  • The data mart should not be used as a replacement
    for accessing the data warehouse.
  • The cost and benefits of implementing each data
    mart, which makes the data architecture a great
    deal more complex, needs to be understood and
    compared with directly accessing the data
    warehouse. This is analogous to the higher cost
    of items found at a small convenience store, over
    those at a large grocery store.
Write a Comment
User Comments (0)
About PowerShow.com