Data Warehousing De-Mystified - PowerPoint PPT Presentation

1 / 24
About This Presentation
Title:

Data Warehousing De-Mystified

Description:

www.adobe.com/products/vdp/glossary.html ... web.mit.edu/sapr3/docs/webdocs/glossary/glCD.html ... www.payorid.com/glossary.asp ... – PowerPoint PPT presentation

Number of Views:23
Avg rating:3.0/5.0
Slides: 25
Provided by: baytre
Category:

less

Transcript and Presenter's Notes

Title: Data Warehousing De-Mystified


1
Data Warehousing De-Mystified
  • Presented to the Annual
  • Georgia Oracle Users Group
  • Conference
  • on March 20, 2006
  • By Rusty Barnett

2
Data Warehousing De-Mystified
  • What do people mean when they call a grouping of
    data a Data Warehouse?
  • Large Data Volume, Loaded using an ETL Process
    (Extract, Transform, Load)
  • Data is Time-sensitive and Subject Oriented
  • Enterprise Data Warehouse, Data Marts, ODS
    (Operational Data Stores), DSS, EIS
  • Dimensional Model, Third-Normal Form

3
Data Warehousing De-Mystified
  • What do people mean when they refer to Data
    Warehousing?
  • Dimensional Data Modeling
  • ETL processing and administrating
  • Business Intelligence Reporting
  • Database managing and administrating
  • As it pertains to a Data Warehouse

4
Data Warehousing De-Mystified
  • What is a Data Warehouse?
  • "A data warehouse is a subject-oriented,
    integrated, time-variant and non-volatile
    collection of data in support of management's
    decision making process. Bill Inmon, 1990.
  • "A data warehouse is the queryable presentation
    resource for an enterprises data". Ralph
    Kimball, page 19, "The Data Warehouse Lifecycle
    Toolkit.

5
Data Warehousing De-Mystified
  • Definitions of Data Warehouse on the Web
  • A data warehouse is a database geared towards the
    business intelligence requirements of an
    organization. www.oranz.co.uk/glossary_text.htm
  • An information infrastructure that enables
    businesses to access and analyze detailed data
    and trends.www.adobe.com/products/vdp/glossary.ht
    ml
  • A collection of integrated, subject-oriented
    databases designed to support the DSS function.
    The data warehouse contains atomic data and
    lightly summarized data.it.csumb.edu/departments/
    data/glossary.html
  • A data warehouse is a central repository for all
    or significant parts of the data that an
    enterprise's various business systems
    collect.www.pdacortex.com/glossary.htm

6
Data Warehousing De-Mystified
  • Definitions of Data Warehouse on the Web
    Continued
  • The Data Warehouse is a central repository of
    data that provides the MIT community with
    integrated, up-to-date data from various
    administrative systems. web.mit.edu/sapr3/docs/web
    docs/glossary/glCD.html
  • An information repository from which queries
    analysis are made.www.pcai.com/web/glossary/pcai_
    d_f_glossary.html
  • This vast database stores information like a data
    repository, but goes a step further, allowing
    users to access data to perform research-oriented
    analyses.www.payorid.com/glossary.asp
  • A collection of databases combined with a
    flexible data extraction system.www.genpromag.com
    /GlossaryLETTERD.html

7
Data Warehousing De-Mystified
  • Definitions of Data Warehouse on the Web
    Continued
  • Datawarehouse (database) A generic term for a
    system for storing, retrieving and managing large
    amounts of any type of data. www2.themanualpage.or
    g/glossary/glo_d.php3
  • The department or entity charged with collecting
    organization-wide data, verifying its accuracy,
    and analyzing, managing, and distributing it
    throughout the organization. In organizations
    without a data warehouse, each department may
    collect, analyze, manage, and distribute the data
    it needs for its operations.www2.uta.edu/ssw/trai
    nasfa/glossary.htm
  • A repository made up of databases of data
    extracted from a variety of sources, with a view
    to analysis to reveal additional
    information.www.gbc.t-online.hu/english/bszotare2
    .htm

8
Data Warehousing De-Mystified
  • Definitions of Data Warehouse on the Web
    Continued
  • Central repository of data extracted from various
    sources. The current CMSU DW is a read-only
    system with extracted data from SIS, AFINS, HRS
    and Enrollment Management.www.cmsu.edu/x18299.xml
  • A collection of data and information from various
    source systems.www.gov.bc.ca/prem/popt/service_pl
    ans/srv_pln/pssg/appen_a.htm
  • A subject-oriented non-volatile collection of
    data used to support strategic decision making.
    The warehouse is the central point of data
    integration for business intelligence. It is the
    source of data for data marts within an
    enterprise and delivers a common view of
    enterprise data.publib.boulder.ibm.com/tividd/td/
    TEDW/GC32-0744-01/en_US/HTML/insmst342.htm

9
Data Warehousing De-Mystified
  • Definitions of Data Warehouse on the Web
    Continued
  • A logical collection of information, gathered
    from many different operational databases, that
    supports business analysis activities and
    decision-making taskswww.321site.com/greg/courses
    /mis1/glossary.htm
  • A data warehouse is, primarily, a record of an
    enterprise's past transactional and operational
    information, stored in a database designed to
    favor efficient data analysis and reporting
    (especially OLAP). Data warehousing is not meant
    for current, "live" data. en.wikipedia.org/wiki/D
    ata_warehouse

10
Data Warehousing De-Mystified
  • Definitions of Data Warehouse on the Web
    Continued
  • A very large repository of data comprising nearly
    all of a companys information.www.mosaictec.com/
    storage/storage_terms.htm
  • A large database capable of storing all the
    information possessed by a large
    organization.www.jqjacobs.net/edu/cis105/concepts
    /CIS105_concepts_13.html
  • A database used for storing historical data,
    which is used for data analysis.docs.rinet.ru/O8/
    glossary.htm
  • One or more data stores originating from prime
    authoritative data sources by an auditable
    replication process.www.data-core.com/glossary-of
    -terms.htm

11
Data Warehousing De-Mystified
  • Goal of Data Warehousing
  • A System that displays Large Volumes of Data as
    Business INFORMATION!

12
Data Warehousing De-Mystified
  • Terms
  • Data Warehouse ---The term Data Warehouse was
    coined by Bill Inmon in 1990
  • Enterprise Data Warehouse --- Accumulation of
    data from all significant areas of an Enterprise
  • Data Mart --- A subset of a data warehouse, for
    use by a functional area (department) on a
    subject
  • Dimensional Modeling --- A design technique that
    seeks to present the data in a framework thats
    intuitive, allows for high-performance access,
    and resistance to change
  • ERD --- Entity Relational Diagram

13
Data Warehousing De-Mystified
  • Terms
  • Schema --- A collection of database objects,
    including tables, views, indexes, and synonyms
  • Star Schema --- A method of designing tables with
    a Fact Table in the middle and Dimension Tables
    around it a standard technique for designing and
    building multi-dimensional databases
  • Snowflake Schema --- Star Schema with child
    Dimensions
  • Star Query --- A join between a Fact table and a
    number of Dimension tables, where the Dimension
    tables are not joined to each other, only to the
    Fact table

14
Data Warehousing De-Mystified
  • Terms
  • Fact Table --- Primary table in Dimensional Model
    that is meant to contain measurements of the
    business, whose most useful columns are numeric
    and additive
  • Dimension Table --- One of the set of companion
    tables to a Fact Table, whose columns are usually
    textual and are used for constraining, grouping,
    ordering within queries
  • Grain --- The level of the data, like Weekly or
    Yearly
  • Measures --- The numeric columns in a Fact Table
  • Audit Columns --- Columns specifying change
    (who,when)

15
Data Warehousing De-Mystified
  • Terms
  • Conformance --- A Dimension is said to be
    conformed if it can be used by more than one Fact
    Table and/or Data Mart and/or subject area, i.e.,
    shareable
  • Associative (Intersection) Table --- A table that
    is used to associate a value from one table to a
    value in another table in order to reduce
    many-to-many relationships between tables
  • Staging Tables --- Tables used to stage data
    along the way toward loading data, especially
    useful in ETL
  • ETL --- Extract, Transform, and Load also a
    Verb, meaning to load data into one system from
    data in another

16
Data Warehousing De-Mystified
  • Data Warehousing Roles
  • Business Users --- People with knowledge of the
    business and needs for reporting
  • Report Writers --- People who write reports
    (Developers) tools for writing reports
  • Data Modelers --- People who create data models
    from Business Requirements
  • ETL Developers --- People who write / develop ETL
    processes
  • DBA --- People who manage and support databases

17
Data Warehousing De-Mystified
  • Tips for Data Modelers
  • Understanding the capabilities of the ETL tool,
    the RDBMS, and the Reporting tool, is just as
    important as understanding the needs of the
    business and the users requirements
  • All Data Warehouse primary keys (PK) should be
    single-column surrogate keys (sequence)
  • Keep Referential Integrity in the DW
  • Use Associative tables to handle MM
  • Order columns in tables by their fill factor

18
Data Warehousing De-Mystified
  • Tips for ETL Developers
  • Embrace the use of multiple Staging Tables
  • Primarily INSERT data in each ETL process step,
    especially into the Staging Tables
  • Invalidate Indexes (other than Primary Keys,
    Unique Keys, and Foreign Keys) before
    loading/updating massive amounts of data. After
    the data is loaded/updated, then rebuild these
    Indexes
  • Be careful in usage of Oracle ROWNUM
  • Test new features, like 10gs MERGE enhancements,
    Pipeline Functions, Grouped Table Outer Joins,
    Conditional Update/Insert/Delete statements, etc.

19
Data Warehousing De-Mystified
  • Tips for Report Developers
  • Avoid using Hints in your SQL
  • Consider using Materialized Views for reporting
    on a common set of pre-processed data
  • Be very careful (in fact, be hesitant) in using
    normal Views in Data Warehouses
  • Consider using the SQL clauses intersect,
    minus, and union all in set processing
  • JOINs are expensive --- NOT, I/O is expensive
  • Test new features, like 10gs Connect_by, MView
    and MERGE enhancements, Pipeline Functions,
    Grouped Table Outer Joins, Time Series Interrow
    Calcs, etc.

20
Data Warehousing De-Mystified
  • Tips for DBAs
  • Use as much of each Data Block as possible
    (PCTFREE, PCTUSED)
  • Set optimizer_index_cost_adj to lower than 100
  • Use ARCHIVELOG mode RMAN for backups
  • Collect Statistics in best way for your Oracle
    Release
  • Test Benefits of Parallel Query and Query
    Re-write
  • Test new features of RDBMS --- 10g features like
    BFT, new Flashback, Security, Partitioning, and
    RMAN features, Improved VLDB support, etc.

21
Data Warehousing De-Mystified
  • Current Debates in Data Warehousing
  • Fresh Data versus Live Data
  • Dimensional Model versus Third-Normal Form
  • Is a Snowflake schema bad for performance?
  • Time spent to physicalize the Data Model?
  • Who is responsible for Data Model?
  • Manage DB via Data Model? (OWB)
  • When to partition and when not to?
  • Acceptance of Level 3 Changing Dimensions

22
Data Warehousing De-Mystified
  • Current Standards DW Tools
  • Database --- Oracle 9i / 10g
  • ERD Data Modeling --- ERwin (Oracle Designer)
  • ETL --- Informatica (OWB)
  • Report Writing --- Oracle Discoverer,
    MicroStrategy, Business Objects, Cognos

23
Data Warehousing De-Mystified
  • Questions
  • And
  • Answer
  • Session

24
Data Warehousing De-Mystified
  • The End
Write a Comment
User Comments (0)
About PowerShow.com