Title: Business Intelligence Technologies
1. Business Intelligence Technologies: Data Mining
2. Agenda
- Course Objectives
- Course Logistics
- Case discussion
- Introduction to BI Methods
3. Discuss where you see data and how companies deal with the data they have.
4. Data Is Everywhere
- Data in our daily life
- Retailers
- Manufacturers, supply chain
- Financial services: credit cards, credit scores
- Scientific data
- Remote sensors on a satellite, telescopes scanning the skies
- Gene expression data
- Scientific simulations generating terabytes of data
- Surveillance cameras
- Insurance
- Telecommunications: cell phone calls
- Social networking
- RFID, video
5. Course Objectives
- How to uncover business intelligence from data.
- Understand BI process
- Learn popular BI methods
- Master a data mining package
- Connect with business problems
7. Course Logistics
- Catherine Yang
- yiyang_at_ucdavis.edu
- Gallagher Hall, Room 3418
- 530-754-5967
- Office hours
- Walk-in
- By appointment
- Before and after class
- Call me
8. Class Resources
- Class homepage
- http://faculty.gsm.ucdavis.edu/yiyang/teaching/269win2011/269win2011.html
- Posted there: slides, additional articles, announcements, downloads
- Textbook; Text Pak; articles posted on the class homepage
9. Textbook
Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Second Edition. Michael Berry and Gordon Linoff, 2004, Wiley, ISBN 0-471-47064-3
- Course Schedule, Due dates
- Open Syllabus
10. Group Term Project
- Group of 2-3 or individual
- Identify a company to study
- Focus: data and business intelligence
- Current practice
- Your recommendations
- Two phases
- Phase 1: Describe the chosen company
- Phase 2: Final report and class presentation
11. Software
- WEKA (free)
- Used for homework assignments
- Supports both Windows and Mac
- I'll demo WEKA in most classes.
- Tutorial available on course website
- Every student is encouraged to have a copy in order to follow the class demos.
- Microsoft Access is optional
12. Grading
- 15% Participation
- 3: Excellent
- 2: Good
- 1: OK
- 0: Absent with good reason and advance notification
- -3: Absent with no reason
- 60% Homework
- 6 assignments
- Problem solving, data analysis, and/or case discussion
- 25% Term Project
- Phase 1 report: 5%
- Final report: 15%
- Class presentation: 5%
13. Misc. Issues
- Slides are available before class
- Download or print them before class
- Lectures may be different from the textbook
- Some materials in the lectures may not be in the book, so please pay attention in class
- The book is a great reference book, not a bible
- Finish assigned case readings before each class
- Attendance is required
- In-class random cold call
15. Case 1: Bank of America
- Discussion Questions
- What is BoA trying to achieve?
- What are the alternative solutions? Pros and cons of each?
- What are the stages of data mining? Describe each.
- What are the data mining techniques used, and what are the findings from each technique?
16. Case 2: A Wireless Company
- Discussion Questions
- What is the company trying to achieve?
- How can data mining help?
- Where did the data come from, and how are the data processed?
- How is the data mining approach evaluated?
17. Case 3: SUV
- Discussion Questions
- What is the company trying to achieve?
- How can data mining help?
- What data files are used? What information is contained in these files?
- How are the two data mining techniques combined, and why is the combination more powerful?
19. Business Intelligence Technologies
- Enabling Technologies
- Simple data summary
- Database queries
- Data Warehouse tools
- Statistics
- Data Mining
20. Simple Data Summary
- Histogram
- Distribution
- Average/Max/Min/Sum
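The summaries listed above can be sketched in a few lines of Python. This is only a minimal illustration: the `quantities` list is made-up data, and the bin width of 3 is an arbitrary choice.

```python
from collections import Counter

# Hypothetical purchase quantities (made up for illustration).
quantities = [3, 7, 7, 2, 9, 4, 7, 3, 5, 8]

# Average / Max / Min / Sum
summary = {
    "sum": sum(quantities),
    "min": min(quantities),
    "max": max(quantities),
    "average": sum(quantities) / len(quantities),
}
print(summary)

# Histogram: count how many values fall into each bin of width 3
# (0-2, 3-5, 6-8, 9-11). The bin is identified by its lower edge.
bin_width = 3
histogram = Counter((q // bin_width) * bin_width for q in quantities)
for low in sorted(histogram):
    print(f"{low}-{low + bin_width - 1}: {'#' * histogram[low]}")
```

The histogram shows the distribution at a glance, while the four numbers compress it into single values.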
21. Data Tables in a Database
Take the following database:
22. Using database queries, we can get
The type of query used to achieve the above:
SELECT Description, Location, SUM(Quantity)
FROM Purchases P, Product Pr, Store S
WHERE P.ProdID = Pr.ProdID AND P.StoreID = S.StoreID
GROUP BY Description, Location
Other types of questions that can be answered using queries:
- Return the stores with more than 1M in revenue per day.
- Rank the cities according to sales.
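To try the join-and-group query, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names follow the query on the slide, but the column types and all rows are assumptions made up for illustration. An `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

# In-memory database with hypothetical contents; only the table and
# column names come from the slide's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProdID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE Store (StoreID INTEGER PRIMARY KEY, Location TEXT);
CREATE TABLE Purchases (ProdID INTEGER, StoreID INTEGER, Quantity INTEGER);
INSERT INTO Product VALUES (1, 'Beer'), (2, 'Diapers');
INSERT INTO Store VALUES (10, 'Davis'), (20, 'Sacramento');
INSERT INTO Purchases VALUES
    (1, 10, 3), (1, 10, 2), (2, 10, 1), (1, 20, 4), (2, 20, 6);
""")

# Total quantity sold per product description and store location.
rows = conn.execute("""
SELECT Description, Location, SUM(Quantity)
FROM Purchases P, Product Pr, Store S
WHERE P.ProdID = Pr.ProdID AND P.StoreID = S.StoreID
GROUP BY Description, Location
ORDER BY Description, Location
""").fetchall()

for description, location, total in rows:
    print(description, location, total)
```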
23. Data Warehouse Tools
- Managers often don't know how to write complex database queries to retrieve desired information.
- Having to request help from technical staff prevents managers from making quick decisions in this competitive world.
- Data warehouse tools allow managers to view data in many ways without writing queries.
- "Data warehouse" and "OLAP" are terms that are often used interchangeably. A data warehouse holds the organization's historical data, stored for end-user analysis, while OLAP is a technology that enables a data warehouse to be used effectively for analysis using complex queries.
24. Make Sure to Use the Right Dimension
An analysis of the number of deaths per month
revealed no patterns in data for a South African
hospital.
However, drilling down to deaths per hour
revealed that, over the past 3 years, more people
were dying on Wednesdays around 9am. The
hospital subsequently discovered that the
cleaning staff had been unplugging the life
support machines to plug in the floor polishing
equipment. (This is a true story.)
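The effect of choosing the granularity can be sketched as follows. The event log is entirely synthetic (one event every Wednesday at 09:00 plus scattered background events), so it only illustrates the drill-down idea and is not the hospital's data.

```python
from collections import Counter
from datetime import datetime, timedelta

# Synthetic event log covering 12 weeks, starting on Monday 2024-01-01.
start = datetime(2024, 1, 1)
events = []
for week in range(12):
    monday = start + timedelta(weeks=week)
    # The hidden pattern: one event every Wednesday at 09:00.
    events.append(monday + timedelta(days=2, hours=9))
    # Background noise: one event on a varying day and hour.
    events.append(monday + timedelta(days=week % 7, hours=(5 * week) % 24))

# Coarse view: events per month. The weekly pattern is invisible here.
by_month = Counter(e.strftime("%Y-%m") for e in events)
print(sorted(by_month.items()))

# Drill-down: events per (weekday, hour); weekday 2 is Wednesday.
by_slot = Counter((e.weekday(), e.hour) for e in events)
top_slot, top_count = by_slot.most_common(1)[0]
print(top_slot, top_count)
```

Grouping by month only reports volumes, while grouping by weekday and hour makes the Wednesday 09:00 slot stand out.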
25. Simpson's Paradox
- Simpson's Paradox refers to the reversal of the direction of a comparison or an association when data from several groups are combined to form a single group.
- Here the reversal is caused by the different admission percentages in the two tables; the tables really shouldn't be combined.
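A small numeric sketch makes the reversal concrete. The admission counts below are hypothetical and are not the tables from the slide. Women have the higher admission rate in each department, yet the lower rate overall, because most women apply to the department that admits few applicants.

```python
# Hypothetical (admitted, applicants) counts per department and group.
admissions = {
    "Dept A": {"men": (80, 100), "women": (9, 10)},
    "Dept B": {"men": (2, 10), "women": (30, 100)},
}

def rate(admitted, applicants):
    """Admission rate as a fraction."""
    return admitted / applicants

# Within each department, compare the two groups.
for dept, groups in admissions.items():
    print(dept, {g: round(rate(*counts), 3) for g, counts in groups.items()})

# Combine the departments and compare again.
totals = {}
for group in ("men", "women"):
    admitted = sum(admissions[d][group][0] for d in admissions)
    applicants = sum(admissions[d][group][1] for d in admissions)
    totals[group] = rate(admitted, applicants)
print("Combined", {g: round(r, 3) for g, r in totals.items()})
```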
26. Statistics and Data Mining Methods
- Statistics
- Correlation analysis, regression, time series analysis
- Data Mining Techniques
- A.k.a. business analytics, business intelligence tech.
- Data mining aims to uncover previously unknown, valuable, and actionable patterns and trends. Output is generalized rules or (predictive or descriptive) models, induced from the data.
- Association rules (beer and diapers)
- Clustering (market segmentation)
- Classification (whether a user will buy)
- Others: personalization, link analysis (Google), text mining
27. What is data mining?
- Informal definition: finding patterns in data
- More formal definition: the non-trivial process of identifying valid, novel, potentially useful, and understandable patterns in data
- Business intelligence: a process for increasing the competitive advantage of a business by intelligent use of available data in decision making (one definition)
28. What is a pattern?
- Informal definition: any structure that can be found in the data, e.g.
- People with good credit ratings have fewer accidents
- A linear risk score with weights 0.93 on prior_default, 0.23 on num_cards, and 1.3 on employed
- On Friday nights, male customers who buy diapers also tend to buy beer
- Not every pattern is desirable
- People with high income buy expensive cars (valid, but not novel)
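One common way to quantify a pattern such as "customers who buy diapers also tend to buy beer" is the support and confidence of an association rule. The sketch below uses made-up baskets; the measures are the standard ones, not something taken from the slides.

```python
# Hypothetical shopping baskets (made up for illustration).
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(1 for b in baskets if items <= b) / len(baskets)

# Rule: diapers -> beer
rule_support = support({"diapers", "beer"})        # how often both occur
confidence = rule_support / support({"diapers"})   # P(beer | diapers)
lift = confidence / support({"beer"})              # > 1 means positive association

print(f"support={rule_support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A rule with high confidence but a lift close to 1 says little, since the item on the right-hand side is simply popular with everyone.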
29. Examples from Different Industries
- My consulting projects
- Chinese Supermarket Promotion Planning
- Auto Lead Price Prediction
- Distribution Center
- Newspaper (the Boston Globe)
- Airlines issuing credit cards to learn more about customers (do they travel a lot, do they use competitors' products?)
- Financial market (Neural fair value)
- Pfizer pharmaceuticals
- Construct a predictive model that tells patients their cholesterol risk score. High-risk patients can request Lipitor, Pfizer's cholesterol medication.
- Fidelity
- Cross-selling: when a customer calls, know what other services to offer
30. An Example: Building Online User Profiles. What data is needed?
- Personal information, preferences, and interests
- Registration data, including demographic data
- Customer ratings
- Purchasing data
- What was bought, when and where
- Browsing and visitation data
- Clickstream (Weblog files)
- Build an integrated (360°) view of a customer
- Collect customer data across all the communication channels
31. Data Sources: Explicit vs. Implicit
- Explicit: solicited from the user; easy to get, but
- Demographics, interests, etc.
- Intrusive: inconveniences users
- Misleading/deceptive: inaccurate information provided (inadvertently or on purpose)
- Static: preferences change over time
- Implicit: collected automatically from touchpoints
- Data based on users' actions
- Non-intrusive: transparent to users
- Accurate/factual: data speaks objectively (a hope)
- Dynamic: changes can be learned and included
- Messy: need to figure out how to utilize these data
- Privacy concerns
32. Building Profiles Using Different Techniques
- Factual information (simple summary, queries)
- Demographic (e.g., name, address, age)
- Behavioral (e.g., favorite type of book: adventure; largest transaction: 295)
- Things learned from data (statistics, data mining)
- Rules, e.g.,
- If a customer visits the children's book section of BN from Amazon, she tends to go back soon
- Sequences, e.g.,
- Usually, Joe visits page X, then Y, then Z
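A sequence such as "X, then Y, then Z" can be found by counting consecutive page triples in clickstream sessions. This is a minimal sketch with made-up sessions for one user; real sequence mining methods handle gaps and support thresholds, which are omitted here.

```python
from collections import Counter

# Hypothetical sessions: the ordered pages one user visited per visit.
sessions = [
    ["X", "Y", "Z"],
    ["X", "Y", "Z", "W"],
    ["Y", "X", "Z"],
    ["X", "Y", "Z"],
]

# Count every run of three consecutive pages across all sessions.
triples = Counter(
    tuple(session[i:i + 3])
    for session in sessions
    for i in range(len(session) - 2)
)

most_common_path, count = triples.most_common(1)[0]
print(most_common_path, count)
```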
33. Steps for Data-driven Solutions
- Finding information from data is not enough
- Must respond to the information by taking actions
- Turning
- Data into Information
- Information into Action
- Action into Value
- Four-step process
- 1. Identify the business problem
- 2. Analyze data to transform the data into actionable information
- 3. Act on the information
- 4. Measure the results
34. Step 1: Identify the Business Problem
- Business problems can often be big and vague
- Data analysis tasks need to be more concrete
- Sample business problems
- How to improve response rate to a direct marketing campaign?
- Which ads to place on web pages in order to maximize ad revenue?
- Understanding customer attrition/churn
- Or more specific problems
- What types of customers responded to our last campaign?
- Where do the best customers live?
- Are long waits in check-out lines a cause of customer attrition?
- What products should be promoted with our XYZ product?
- Another goal of this lecture is for you to think strategically about what business problems can be addressed using data.
35. Step 2: Analyze Data to Transform It into Actionable Information
- Success is making business sense of the data
- Need to figure out the specific data analysis tasks used to address the business problems identified in the first step.
- Deal with messy data
- Don't expect clean data. Data cleaning accounts for 70% of the effort
- Consolidate data from different sources
- Need to collect additional data? Handle missing values
- Transform data to the right format for analysis
- Implementation problems
- What information can different techniques bring out from the data?
- What techniques to use?
- How to use the techniques?
36. Step 3: Take Action
- Taking action is the whole purpose of data analysis
- Now, with information discovered from data, we can make better-informed decisions.
- Examples
- Select customers to target
- Adjust inventory levels
- Rearrange products on the shelves
- Customize products for different segments
- Adjust price levels
37. Step 4: Measure Results
- Assess the impact of the action taken
- Often overlooked, ignored, skipped
- Planning for the measurement should begin when analyzing the business opportunity, not after it is all over
- Assessment questions (examples)
- Did this campaign do what we hoped?
- Did some offers work better than others?
- Did we lower cost or increase profit?
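One simple way to answer "did this campaign do what we hoped?" is to compare the response rate of customers who received the offer with that of a held-out control group. The control-group design and all counts below are assumptions for illustration; the slides do not prescribe a particular measurement method.

```python
# Hypothetical results: the treated group received the offer,
# the control group did not. All counts are made up.
treated = {"customers": 1000, "responses": 50}
control = {"customers": 1000, "responses": 20}

def response_rate(group):
    """Share of customers in the group who responded."""
    return group["responses"] / group["customers"]

treated_rate = response_rate(treated)
control_rate = response_rate(control)

# Uplift: the extra response rate attributable to the campaign,
# assuming the two groups are otherwise comparable.
uplift = treated_rate - control_rate
print(f"treated={treated_rate:.1%} control={control_rate:.1%} uplift={uplift:.1%}")
```

Setting aside the control group has to be planned before the campaign runs, which is why measurement planning belongs at the start of the process.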
38. Business Value of Data
- Companies invest in data-related hardware, software and services.
- How to quantify the return on the investment?
- Realize value in data by transforming data to information and information to action
- It is not always easy to quantify the exact value data provides.
39. Data-Driven Applications and Business Models
- Market Segmentation
- Personalization/product recommendation
- Google
- Capital One
- BroadVision
- comScore
- Tricision
40. Take-Away Messages
- Decisions should be supported by real data. Don't assume; use real data to back up your decisions and avoid risks.
- A lot can be learned from data.
- Innovative business strategies can be derived from data