Title: Chapter 6: Foundations of Business Intelligence - Databases and Information Management
1Chapter 6Foundations of Business Intelligence -
Databases and Information Management
- Dr. Andrew P. Ciganek, Ph.D.
2File Organization Concepts
- Computer system uses hierarchies
- Database Group of related files
- File Group of records of same type
- Record Group of related fields
- Record Describes an entity (person, place,
thing) - Field Group of characters
- Attribute Characteristic describing the entity
- e.g., Date or Grade belong to entity COURSE
3The Data Hierarchy
4Problems With Traditional File Environment
- Files maintained by different departments
- Data redundancy and inconsistency
- Data redundancy Duplicate data in multiple files
- Data inconsistency Same attribute, different
values - Program-data dependence
- Changes in program requires changes to data
accessed by program
5Traditional File Processing
6The Database Approach to Data Management
- Database
- Data organized to serve many applications by
centralizing data and controlling redundant data - Database management system (DBMS)
- Separates logical and physical views of data
- Solves problems of traditional file environment
- Controls redundancy
- Eliminated inconsistency
- Enables central management and security
7Human Resources Database with Multiple Views
8Relational DBMS
- Data as 2-dimension tables called relations or
files - Each table contains data on entity and attributes
- Table Grid of columns and rows
- Rows Records for different entities
- Columns Represents attribute (field) for entity
- Key field Field used to uniquely identify each
record - Primary key Field in table used for key fields
- Foreign key Primary key used in second table as
look-up field to identify records from original
table
9Relational Database Tables
10Relational Database Tables
11Operations of a Relational DBMS
- Basic operations to develop useful sets of data
- SELECT Creates subset of data of all records
that meet stated criteria - JOIN Combines relational tables to provide more
information than available in individual tables
12Basic Relational DBMS Operations
Select Part_Number 137 or 150, Join by
Supplier_Number
13Example of an SQL Query
- Select Statement Query data for specific info
- Conditional Selection ID which rows of a table
are displayed, based on criteria contained in the
WHERE clause - Joining Two Tables Used to combine data from two
or more tables and display the results
14An Access Query
15Designing Databases
- Design process identifies
- Relationships among data elements, redundant
database elements - Most efficient way to group data elements to meet
business requirements, needs of app programs - Normalization
- Minimize redundant data elements
16Normalization of Order
17Using Databases to Improve Performance and
Decision Making
- For very large databases and systems, special
capabilities and tools are required for analyzing
large quantities of data and for accessing data
from multiple systems - Data warehousing
- Data mining
18Database Warehouses
- Store current and historical data from many core
operational transaction systems - Consolidates and standardizes information for use
across enterprise, but data cannot be altered - Provide query, analysis, and reporting tools
19Components of a Data Warehouse
20Business Intelligence
- Tools for consolidating, analyzing, and providing
access to vast amounts of data to help users make
better business decisions - e.g., Harrahs Entertainment analyzes customers
to develop gambling profiles and identify most
profitable customers - Principle tools include
- Software for database query and reporting
- Online analytical processing (OLAP)
- Data mining
21Online Analytical Processing (OLAP)
- Supports multidimensional data analysis
- Gives first glimpse of possible relationships
- Enables viewing data using multiple dimensions
- Each aspect of information (product, pricing,
cost, region, time period) is different dimension - e.g., How many washers sold in East in June?
- OLAP enables rapid, online answers to ad hoc
queries
22Multidimensional Data Model
23Data Mining
- More discovery driven than OLAP
- Finds hidden patterns, relationships in large dbs
- Infers rules to predict future behavior
- The patterns and rules are used to guide decision
making and forecast the effect of those decisions - Popularly used to provide detailed analyses of
patterns in customer data for 11 marketing
campaigns or to identify profitable customers
24Using Databases to Improve Performance and
Decision Making
- Predictive analysis
- Uses data mining techniques, historical data, and
assumptions about future conditions to predict
outcomes of events - e.g., Probability a customer will respond to an
offer or purchase a specific product - Data mining seen as challenge to individual
privacy - Combines information from many diverse sources to
create detailed data image about each of us - e.g., income, driving habits, hobbies, families,
and political interests
25Text MiningFor and Against Exercise
- Read the article and the following statement.
- Summarize the best evidence you can give FOR, or
in support of, the statement. - Summarize the best evidence you can give AGAINST
the statement. - Include only accurate evidence
- The benefits of text mining greatly outweigh the
costs.
26Web Mining
- Discovery and analysis of useful patterns and
information from WWW - e.g., to understand customer behavior, evaluate
effectiveness of Web site, etc. - Web content mining
- Knowledge extracted from content of Web pages
- Web structure mining
- e.g., links to and from Web page
- Web usage mining
- User interaction data recorded by Web server
27Managing Data Resources
- Establishing an information policy
- Information policy Specifies firms rules,
procedures, roles for sharing, standardizing data - Data administration Responsible for specific
policies and procedures data governance - Database administration Database design and
management group responsible for defining,
organizing, implementing, maintaining database
28Ensuring Data Quality
- More than 25 critical data in Fortune 1000
company databases is inaccurate or incomplete - Before new database in place, need to identify
and correct faulty data and establish better
routines for editing data once database in
operation - Most data quality problems stem from faulty input
29Managing Data Resources
- Data quality audit
- Structured survey of the accuracy and level of
completeness of the data in an IS - Data cleansing
- Detecting, and correcting data that are
incorrect, incomplete, improperly formatted, or
redundant - Enforces consistency among different sets of data
from separate IS