Massive Choice Data

1
Massive Choice Data
  • Co-Chairs: Prasad Naik and Michel Wedel
  • 7th Triennial Choice Symposium
  • Wharton Business School
  • June 13-17, 2007

2
Impetus for Massive Data?
  • Technological advances (Internet, RFID)
  • Computing advances
  • Methodological advances
  • Detailed data
      • Large sample, N
      • Many variables, p
      • Long time-series, T
      • Several products and SKUs, K

3
Different Types of Massive Data
  • Structured Data
      • Scanner panel, loyalty card, CRM, click-stream
  • Unstructured Data
      • Text data (e.g., product reviews, blogs, complaints)
      • Images, music
  • Emerging Data Types
      • RFID, video, social networks, recommendations,
        auctions, games, eye tracking, semantic Web 2.0

4
Is the data set just getting bigger?
  • What is the qualitative difference?
  • Sometimes nothing
      • Just a scale-up problem
      • But the bigger size makes it harder to analyze in
        real-time
  • Sometimes everything
      • Empty space phenomenon
      • Statistical inference, diagnostics, sparseness
      • Visualization becomes tricky when p > 10
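
The empty space phenomenon can be illustrated with a small calculation. The sketch below is not from the slides; it assumes data spread uniformly over the unit hypercube [0, 1]^p, and the function names are hypothetical. It computes (a) the share of the cube covered by the inscribed ball of radius 0.5 and (b) the edge length a sub-cube needs in order to hold a given share of the data.

```python
import math


def inscribed_ball_fraction(p: int) -> float:
    """Share of the unit hypercube [0, 1]^p covered by the inscribed ball (radius 0.5).

    Uses the p-dimensional ball volume pi^(p/2) / Gamma(p/2 + 1) * r^p.
    """
    return math.pi ** (p / 2) / math.gamma(p / 2 + 1) * 0.5 ** p


def edge_for_fraction(p: int, fraction: float = 0.10) -> float:
    """Edge length of a sub-cube holding `fraction` of uniformly spread data."""
    return fraction ** (1.0 / p)


if __name__ == "__main__":
    # Illustrative dimensions only; p = 10 is the threshold mentioned on the slide.
    for p in (1, 2, 3, 10, 20):
        print(p, inscribed_ball_fraction(p), edge_for_fraction(p))
```

Under these assumptions, the ball's share of the cube shrinks toward zero as p grows, while a "local" neighborhood holding 10% of the data must span most of the range of every variable. That is one way to see why inference and visualization change qualitatively, not just in scale.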

5
Managers and Models
  • Managers need
      • Real-time computation
      • Decision optimization
  • Man-machine engagement
      • Managerial inputs plus data analyses
  • Models need to be both
      • Simple: for quick computation (real-time decisions)
      • Complex: for realism in assumptions
  • How?
      • The notion of a workbench
      • Model averaging, forecast combination
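
Forecast combination can be sketched in a few lines. The slides do not say which weighting scheme was discussed; the example below assumes one common choice, weights proportional to the inverse of each model's past mean squared error. The model names, error histories, and function names are hypothetical.

```python
def inverse_mse_weights(errors_by_model):
    """Return combination weights proportional to 1 / MSE of each model.

    errors_by_model maps a model name to a list of its past forecast errors.
    Assumes every model has a nonzero mean squared error.
    """
    mse = {
        name: sum(e * e for e in errs) / len(errs)
        for name, errs in errors_by_model.items()
    }
    inverse = {name: 1.0 / value for name, value in mse.items()}
    total = sum(inverse.values())
    return {name: w / total for name, w in inverse.items()}


def combine_forecasts(forecasts, weights):
    """Weighted average of the individual model forecasts."""
    return sum(weights[name] * forecasts[name] for name in forecasts)


if __name__ == "__main__":
    # Hypothetical history: a simple (fast) model and a complex (realistic) one.
    past_errors = {"simple": [2.0, -2.0], "complex": [1.0, -1.0]}
    weights = inverse_mse_weights(past_errors)
    print(weights)
    print(combine_forecasts({"simple": 100.0, "complex": 110.0}, weights))
```

The design point is that neither model has to be discarded: the simple model stays available for real-time use, and the combination leans toward whichever model has forecast better so far.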

6
Estimation and Computation
  • Estimation methods
      • Identified promising approaches for massive data
        analysis
          • Inverse regression methods
          • Regularization techniques (e.g., Lasso)
          • Particle filters
          • Logistic regression or Support Vector Machines
  • Computation power
      • Grid computing is needed
      • Waiting for faster computers is not an option
      • Gap between academia and industry
          • Google has 2 million processors
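
Of the estimation approaches listed, regularization is the easiest to show compactly. The sketch below is an assumption-laden illustration, not the presenters' method: it fits a Lasso by cyclic coordinate descent with soft-thresholding, minimizing (1/(2n))·||y − Xβ||² + λ·||β||₁, with no intercept (it assumes centered data). The function names and toy data are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent.

    X is a list of n rows with p entries each, y a list of n responses.
    Minimizes (1 / (2n)) * ||y - X beta||^2 + lam * ||beta||_1 (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes beta[j].
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] = new_beta
    return beta


if __name__ == "__main__":
    # Toy data: y depends strongly on the first column and weakly on the second.
    X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    y = [3.1, 2.9, -2.9, -3.1]
    print(lasso_coordinate_descent(X, y, lam=0.5))
```

With many variables p, the appeal of this kind of penalty is that weak coefficients are set exactly to zero, so the fitted model stays small enough to compute and interpret.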

7
Directions and Action Points
  • Incentives for academics?
  • Industry-Academic partnerships
  • Cross-disciplinary collaborations

8
Thank you for this forum to share ideas!
Credits
  • Peter Lenk (Michigan)
  • David Madigan (Rutgers)
  • Alan Montgomery (CMU)
  • Lynd Bacon (LBA Inc)
  • Anand Bodapati (UCLA)
  • Wagner Kamakura (Duke)
  • Jeffrey Kreulen (IBM Research)