Data-Driven Business Intelligence Systems: Part I - PowerPoint PPT Presentation

1 / 65
About This Presentation
Title:

Data-Driven Business Intelligence Systems: Part I

Description:

IMS3001 BUSINESS INTELLIGENCE SYSTEMS SEM 1 , 2004. Data-Driven ... errors: misspelled names, erroneous birthdates, missing data; inconsistent data formats ... – PowerPoint PPT presentation

Number of Views:48
Avg rating:3.0/5.0
Slides: 66
Provided by: SIMS
Category:

less

Transcript and Presenter's Notes

Title: Data-Driven Business Intelligence Systems: Part I


1
Data-Driven Business Intelligence Systems Part I
  • Week 5
  • Dr. Jocelyn San PedroSchool of Information
    Management Systems
  • Monash University

2
Lecture Outline
  • Data-driven BIS
  • Data warehouse
  • Data warehouse architectures
  • Entity-Relationship Modelling
  • Multi-dimensional Modelling
  • Star Schema
  • An Example Retail Trading

3
Learning Objectives
  • At the end of this lecture, the students will
  • Have better understanding of concepts, tools and
    technology underlying data-driven business
    intelligence systems
  • Have knowledge of multidimensional modelling and
    star schema for data modelling for data
    warehouses

4
  • Data-Driven Business
  • Intelligence Systems

5
Data-Driven BIS
  • Data-driven BIS
  • information systems that provide BI through
    access and manipulation of large databases of
    structured data
  • includes tools for
  • drill down for more detailed information
  • drill up for broader, more summarised view
  • slice and dice for a change in data dimensions

6
Data-Driven BIS
7
Data-Driven BIS
8
Data-Driven BIS
Slicing the cube
Product
Salesperson
Time
9
Data-Driven BIS
Dicing the cube
225,764
201,196
162,504
182,500
123,033
72,528
116,963
75,048
68,792
Peacock
Leverling
Fuller
Davolio
Callahan
Suyaman
King
Dodsworth
Buchanan
Q1 Q2 Q3 Q4
10
  • Data Warehouse

11
Data Warehouse
  • A data warehouse is a subject-oriented,
    integrated, time-variant, nonvolatile collection
    of data in support of managements decision
    making process Bill Inmon (1995)
  • Subject-oriented focus is on subjects related to
    business or organisational activity like
    customers, employees, suppliers (instead of
    applications-oriented (finance, marketing,
    production)
  • Integrated data from various databases is stored
    in a consistent format through use of naming
    conventions, domain constraints, physical
    attributes and measurements
  • Time-variant associating data with specific
    points in time
  • Nonvolatile data does not change once it is in
    the data warehouse and stored in data warehouse

12
Data Warehouse
  • Data warehouse is a copy of transaction data
    specifically structured for query and analysis
    Ralph Kimball (1996)
  • Data warehouse is a specific database designed
    and populated to provide decision support in an
    organisation - Gray and Watson (1998)

13
Data Warehouse
  • Data warehousing emerged as result of
  • improvements in database technology relational
    data model and relational database management
    systems (DBMS)
  • advances in computer hardware - emergence of
    affordable mass storage and parallel computer
    architectures
  • emergence of end-user computing, facilitated by
    powerful, intuitive computer interfaces and tools
  • advances in middleware products that enable
    enterprise database connectivity across
    heterogeneous platforms

14
Data Warehouse
  • triggered by recognition of fundamental
    differences between operational (or production)
    systems and informational (or decision support)
    systems
  • Operational system system that is used to run a
    business in real time, based on current data
    e.g. sales order processing, reservation systems,
    patient registration
  • Informational systems designed to support
    decision making based on stable point-in-time or
    historical data for complex read-only queries or
    data mining applications e.g. sales trend
    analysis, customer segmentation, human resources
    planning

15
Data Warehouse
Comparison of Operational and informational
Systems McFadden, Hoffer and Prescott 1999
Characteristic Operational Systems Informational Systems
Primary purpose Run the business on a current basis Support managerial decision making
Type of data Current representation of state of the business Historical or point-in-time (snapshots)
Primary users Clerks, salespersons, administrators Managers, business analysts, customers
Scope of usage Narrow vs. simple updates and queries Broad vs. complex queries and analysis
Design goal performance Ease of access and use
16
Data Warehouse Architectures
  • Generic two-level architecture

17
Data Warehouse Architectures
Three-level architecture
18
Data Warehouse Architectures
  • Data mart
  • a data warehouse that is limited in scope
  • contains selected and summarised data to support
    specific decision support applications of
    specific end-user group
  • e.g., marketing data mart, finance data mart

19
Data Warehouse Architectures
  • Three-layer data architecture

Enterprise data model
20
Data Warehouse Architectures
  • Enterprise data model
  • Presents a total picture explaining the data
    required by an organisation
  • Must be developed prior to designing a data
    warehouse
  • Entity-Relationship Models traditional approach
    in relational database design
  • Multidimensional Models are commonly used in
    data warehouses and data marts for faster
    retrieval for querying and analysis

21
Data Warehouse Architectures
  • Operational Data
  • current or transient, not historical
  • restricted in scope to a particular application
  • poor quality
  • not normalised (there are multi-valued attributes
    or repeating groups, partial dependencies,
    transitive dependencies in data relations)

22
Data Warehouse Architectures
Sample Operational Data from Northwind database
23
Data Warehouse Architectures
  • Reconciled Data
  • Detailed - rather than summarised
  • Historical snapshots, periodic
  • Comprehensive should reflect enterprise-wide
    perspective conform to enterprise data model
  • Quality controlled
  • Normalised 3NF or higher
  • 3NF no multi-valued attributes, no partial
    dependencies, no transitive dependencies

24
Data Warehouse Architectures
Steps in Normalisation
25
Data Warehouse Architectures
Sales relation with sample data
Relation in 3NF
26
Data Warehouse Architectures
  • Derived Data
  • selected, formatted, aggregated
  • provides ease of use for decision support
    applications
  • provides fast response for user queries
  • supports ad-hoc queries and data mining
    applications
  • data model commonly used is star schema

27
Data Warehouse Architectures
  • Metadata
  • data that describe the properties or
    characteristics of other data
  • Operational metadata describe the data in
    various operational systems (as well as external
    data) that feed the EDW
  • EDW metadata describe the reconciled data layer
    as well as the rules for transforming operational
    data to reconciled data
  • Data mart metadata describe the data in derived
    data layer and rules of transforming reconciled
    data to derived data

28
Data Warehouse Architectures
Sample data description
29
Data Warehouse Architectures
Sample data description
30
Data Warehouse Architectures
  • Data Reconciliation Process
  • Stage 1 Initial load, when EDW is first created
  • Stage 2 Subsequent updates
  • Steps in Data Reconciliation Process
  • Capture extract relevant data from source/s

31
Data Warehouse Architectures
  • Scrub clean or upgrade the quality of raw data
    before transformation and loading (using pattern
    recognition, artificial intelligence techniques)
  • Track and correct errors misspelled names,
    erroneous birthdates, missing data inconsistent
    data formats

32
Data Warehouse Architectures
  • Transform - includes
  • converting data format or representation from
    source to target system
  • partitioning data according to predefined
    criteria
  • aggregating data from detailed to summary level

33
Data Warehouse Architectures
  • Load and Index
  • Refresh mode filling the EDW by bulk rewriting
    of target data
  • Update mode only changes in source data are
    written to the data warehouse at periodic
    intervals, data warehouse is rewritten, replacing
    previous contents without overwriting or deleting
    previous contents
  • Create necessary indexes

34
  • Entity-Relationship Modelling

35
Entity-Relationship Modelling
Customer
Region
Customer Type
groups
within
contains
makes
Sale
located at
Product
Store
Product Type
groups
in
within
Period
(based on Kimball (1996), p29, and Simsion-Bowles
(1996), p2)
36
Entity-Relationship Modelling
  • Entities, attributes and relationships
  • Rules of normalisation
  • 3NF is typical
  • Protection of integrity of database by avoiding
    anomalies
  • Every logical thing is represented only once
  • Separate consideration of logical and physical
    aspects

37
Entity-Relationship Modelling
ER Model for the Northwind sample database
38
Entity-Relationship Modelling
  • Large numbers of tables
  • Oracle Financials - 1,800 SAP 7 up to 8,000
  • Commonly used
  • Feels natural once you get used to it
  • Research shows that they are not easily
    understood by IT people
  • Especially concepts like abstraction,
    generalisation, sub-types, etc.

39
  • Multi-dimensional Modelling

40
Multi-dimensional Modelling
  • It is possible to conceptualise data as
    multi-dimensional
  • Difficult to design
  • Easy to use resulting reports
  • Advocated by Ralph Kimball (see his manifesto,
    and a rebuttal, available on the web site).
  • A logical design technique that seeks to present
    data in a standard framework that is intuitive
    and allows for high-performance access.

41
Multi-dimensional Modelling
  • An approach to database design that provides an
    easy to understand and navigate database
  • The aim is to encourage understanding,
    exploration and learning
  • Each number in a database has a set of associated
    attributes
  • What it measures, what point of time it was
    created, what location its from, what product
    its associated with, what promotion, etc.
  • This makes the number meaningful.

42
Multi-dimensional Modelling
  • Each attribute associated with each number
    represents a dimension
  • Measure, time, location, product, location, etc.
  • Resulting views are easy to navigate and move
    around
  • Slice and dice
  • Report template

43
Multi-dimensional Modelling
Example Widget Sales (Million)
One Dimension (State)
43.6
53.4
31.4
27.5
28.3
14.7
Vic
NSW
QLD
WA
SA
TAS
State
48.2
53.4
31.4
28.4
25.1
15.4
2002
Two Dimensions (location x time)
50.1
57.2
33.6
28.1
22.5
16.3
2001
56.3
62.3
35.1
29.4
21.5
13.3
2000
Year
46.2
52.1
29.6
25.1
27.1
18.2
1999
43.6
53.4
31.4
27.5
28.3
14.7
1998
Vic
NSW
QLD
WA
SA
TAS
State
44
Multi-dimensional Modelling
Three Dimensions (location x time x product)
48.2
53.4
31.4
28.4
25.1
15.4
2002
50.1
57.2
33.6
28.1
22.5
16.3
2001
56.3
62.3
35.1
29.4
21.5
13.3
2000
Year
46.2
52.1
29.6
25.1
27.1
18.2
1999
Flanges
43.6
53.4
31.4
27.5
28.3
14.7
Gaskets
1998
Sprockets
Vic
NSW
QLD
WA
SA
TAS
Product
Widgets
State
45
Multi-dimensional Modelling
  • Usually talk about information spaces as cubes,
    or hyper-cubes, or n-cubes
  • Resulting views of databases are easy to navigate
    and move around
  • Slicing and dicing
  • Report Template

46
Multi-dimensional Modelling
  • Slicing and Dicing
  • Select certain dimension values to examine a set
    of data

47
Multi-dimensional Modelling
  • Report Templates
  • One template is produced for a set of slices
  • Data changes, layout doesnt

Location Drop Down Box
Year Drop Down Box
48
Multi-dimensional Modelling
From Traditional Relational to Multi-dimensional
Typical relational data-base
From Pilot Software OLAP White Paper
Same data displayed in two-dimensions Easy!
(The key is to identify the continuous and
discrete variables in the flat file.)
49
  • Star Schema

50
Star Schema
  • Used to implement dimensional analysis using
    standard relational database technology
  • Very common in data warehousing
  • Many variations
  • Two components
  • Fact Table contains measurements of business,
    eg. sales, purchase order, shipment
  • Dimension Tables stores the textual
    descriptions of the dimensions of the business,
    eg. product, customer, vendor, store.

51
Star Schema
  • Fact tables store the hard data
  • Dimension tables store all the information about
    our dimensions.
  • The fact table has a many-to-one relationship
    with each dimension table
  • Each dimension table has a primary key that
    appears as a foreign key in the fact table, whose
    primary key is a concatenation of all of the
    foreign keys.

52
Star Schema
  • Dimension tables in star schemas are denormalised
    resulting in
  • Fewer tables
  • Simpler for users to navigate
  • Reduced number of complex multi-join tables.

53
Star schema
Customer Customer key Name Customer type
Sale Time key Store key Customer key Product
key Dollar sales Unit sales
Product Product key Product type weight
Store Store key Address Region
Time Time key Day Month
Legend Primary Key Foreign key
54
Snowflake schema
Customer Type
Do not snowflake your dimensions, even if very
large. If you do snowflake your dimensions,
prepare to live with poor performance Kimball
(1996)
Customer
Sale
Product
Store
Product Type
Region
Time
55
Star Schema
  • Dimensions can be shared amongst fact tables.

56
Star Schema
  • ER schemas are useful for data mapping to legacy
    systems and for integration of the data warehouse
  • Star schemas are useful for the design of
    warehouse databases as they are efficient and
    easy to understand and use
  • Allow relational databases to support
    multi-dimensional data cubes

57
Star Schema
  • Steps in the design process
  • Choose a business process
  • Choose the grain of the fact table
  • Too fine gt Oversized database
  • Too large gt Loss of meaningful information
  • Choose the dimensions
  • Choose the measured facts
  • (usually numeric, additive quantities)
  • Complete the dimension tables
  • Kimball (1996)

58
Extra steps in the design process
  • Determine strategy for slowly changing dimensions
  • Create aggregations and other physical storage
    components
  • Determine the historical duration of the database
  • Determine the urgency with which the data is to
    be extracted and loaded into the data warehouse.
  • Kimball (1996)

59
An Example Retail Trading
  • A large grocery store with approx. 500 stores
  • Each store has approx. 60,000 products on shelves
  • Need to maximise profit and keep shelves stocked
  • Important decisions concern pricing and promotion
  • Promotion types are
  • Temporary price reductions
  • Newspaper advertisements
  • Shelf and end-aisle displays
  • Coupons

60
An Example Retail Trading
  • Choose a Business Process
  • Daily Item Movement
  • Choose the grain of the fact table
  • Stock Keeping Unit (SKU) by store by promotion by
    day
  • Choose the Dimensions
  • Time, product, store and promotion

61
An Example Retail Trading
Retail Trading Dimensions
Promotion Promotion key Other Promotion attributes
Sale Time key Product key Store Key Promotion
Key Facts to be detailed next
Product Product key Other product attributes
Store Store key Other Store Attributes
Time Time key Other Time Attributes
62
An Example Retail Trading
  • 4. Choose the measured facts

Promotion Promotion key Other Promotion attributes
Sale Time key Product key Store Key Promotion
Key Dollar Sales Unit Sales Dollar Costs Customer
Count
Product Product key Other product attributes
Store Store key Other Store Attributes
Time Time key Other Time Attributes
63
An Example Retail Trading
  • 5. Complete the dimension tables

Promotion Promotion key Other Promotion attributes
Product Product key SKU Description SKU
Number Package Size Brand Sub Category Department
Package Type Diet Type Weight Weight unit of
measure Units per retail case Units per ship
case Cases per pallet
Sale Time key Product key Store Key Promotion
Key Dollar Sales Unit Sales Dollar Costs Customer
Count
Store Store key Other Store Attributes
Time Time key Other Time Attributes
64
References
  • Inmon, W. H. (1996) Building the Data Warehouse
    (2nd ed), Wiley, NY.
  • Kimball, R. (1996) The Data Warehouse Toolkit,
    Wiley, NY.
  • McFadden, F., Hoffer, J. and Prescott, M. (1999)
    Modern Database Management, Addison-Wesley.

65
  • Questions?
  • Jocelyn.sanpedro_at_sims.monash.edu.au
  • School of Information Management and Systems,
    Monash University
  • T1.28, T Block, Caulfield Campus
  • 9903 2735
Write a Comment
User Comments (0)
About PowerShow.com