Discovering Digital Behavior: Data Mining and the Web

1
Discovering Digital Behavior: Data Mining and the Web
  • Padhraic Smyth
  • Information and Computer Science
  • University of California, Irvine
  • and
  • Jet Propulsion Laboratory
  • smyth_at_ics.uci.edu
  • www.ics.uci.edu/smyth/

2
Outline
  • Data Mining
  • techniques for analyzing massive data sets
  • ideas from computer science and statistics
  • Digital Behavior
  • behavior of individuals in a digital environment
  • e.g., Web, Windows, virtual reality, etc.
  • Web Mining
  • inferring behavioral patterns from user data
  • applications: clustering, prediction,
    personalization

3
Massive Data Sets
[Figure: data records numbered 1, 2, ..., N]
  • Characteristics
  • very large N (billions)
  • very high dimensionality d (thousands or
    millions)
  • heterogeneous data types
  • dynamic, non-stationary
  • Large N is relatively easy
  • Dimensionality, heterogeneity, non-stationarity
    are hard

4
Data Mining
  • What is data mining?
  • the search for structure and patterns in
    (massive) data sets
  • interdisciplinary: typically uses ideas from
    computer science and statistics as appropriate
  • data-driven rather than theory-driven, in the
    spirit of exploratory data analysis (EDA)
  • applications-focused
  • emphasis often on "massive" data sets

5
Origins of Data Mining
[Figure: timeline of the origins of data mining, pre-1960 through the 1990s. Labels: Pencil and Paper, EDA, Data Dredging, Pattern Recognition, AI, Machine Learning, Flexible Models, Relational Databases, Hardware (sensors, storage, computation), Data Mining]
6
Data Mining Research Communities
[Figure: research communities connected to data mining. Labels: Statistics, Databases, KDD and Data Mining, Visualization, Machine Learning, Applications, Artificial Intelligence]
7
What do people want to do with their data?
  • Explore
  • visualization, EDA, etc
  • Summarize
  • aggregate patterns, data compression (no notion
    of inference)
  • Understand
  • build generative and descriptive models, e.g.,
    clusters
  • Prediction
  • predictive modeling (classification, regression,
    etc)
  • Change Detection
  • discover trends, unusual patterns, outliers, etc

8
Data Mining of your Telephone Calls
  • Background
  • AT&T has about 100 million customers
  • It logs 300 million calls per day, 40 attributes
    each
  • 350 million unique telephone numbers
  • The Data Mining Approach (Pregibon and Cortes,
    KDD 1998)
  • Statistical model trained to adaptively track
    p(business | calling data)
  • Every call to or from an AT&T number is used to
    update the models
  • 350 million models (one per phone number)
  • Significant systems engineering
  • data are downloaded nightly, model updated
  • 20 processors, 6 GB RAM, terabyte disk farm
  • Provides AT&T with a unique daily 'snapshot' of
    US calling patterns
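
The slide does not say how each per-number model is updated. As a rough illustration only, the sketch below keeps one running estimate of p(business | calling data) per phone number and nudges it with every call using exponential smoothing. The smoothing rule, the constants `PRIOR` and `RATE`, and the `business_evidence` scoring function are all assumptions made for this example; they are not taken from Pregibon and Cortes.

```python
from collections import defaultdict

PRIOR = 0.5   # hypothetical starting estimate for a number never seen before
RATE = 0.1    # hypothetical weight given to each new call


class NumberTracker:
    """Keeps one running estimate per phone number (illustrative only)."""

    def __init__(self, prior=PRIOR, rate=RATE):
        self.rate = rate
        self.estimate = defaultdict(lambda: prior)

    def update(self, number, evidence):
        """Blend the old estimate with new evidence in [0, 1]."""
        old = self.estimate[number]
        self.estimate[number] = (1 - self.rate) * old + self.rate * evidence
        return self.estimate[number]


def business_evidence(hour, weekday):
    """Hypothetical per-call score: weekday daytime calls look business-like."""
    return 1.0 if weekday < 5 and 9 <= hour < 17 else 0.0


if __name__ == "__main__":
    tracker = NumberTracker()
    # One made-up call at 10:00 on a Wednesday (weekday index 2).
    print(tracker.update("555-0100", business_evidence(hour=10, weekday=2)))
```

Because each update touches only the state of the numbers involved in the call, a scheme of this shape can process a day's calls in one pass, which fits the nightly update cycle described on the slide.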

9
Digital Behavior
  • Modeling behavior of individuals in a digital
    environment
  • Motivated by availability of massive data sets
  • mouse clicks, keystrokes
  • Web navigation
  • search queries
  • biometrics (cameras)
  • Goal: develop better models of dynamic behavior
  • population level (aggregate)
  • individual level (personalized)
  • Use these models for improved design, feedback,
    prediction
  • Privacy issues

10
Digital Data Sets
  • Navigation Patterns
  • Server Side Web Access Logs
  • several gigabytes per day is not unusual
  • data can be noisy (difficult to identify users)
  • Client Side Browser Monitoring Software
  • e.g., Alexa.com software assistant which
    downloads all page requests nightly
  • Web Connectivity
  • patterns of connectivity, graphs
  • Demographics
  • background information on the user

11
Models for Digital Behavior
  • Information goals
  • hidden, difficult to assess
  • we can try to infer what we can
  • Behavioral Patterns
  • type of behavior: reading, searching, browsing,
    etc.
  • dynamics of this process
  • click rates
  • is a function of the user's background and
    general characteristics
  • Coupled models
  • behavior is driven by information goals and
    static characteristics
  • vary over time, context-dependent

12
[Figure: Dynamic Behavior at the Population, Group, and Individual levels; Real-Time Behavior]
13
[Figure: Information Goals and Dynamic Behavior, each at the Population, Group, and Individual levels; Real-Time Behavior]
14
[Figure: Information Goals, Dynamic Behavior, and Static Characteristics, each at the Population, Group, and Individual levels; Real-Time Behavior]
15
[Figure: Information Goals, Dynamic Behavior, and Static Characteristics, each at the Population, Group, and Individual levels; Real-Time Behavior; Digital Environment]
16
Where Data Mining fits in
  • Observed data
  • the real-time behavior (pages navigated, timing
    between clicks, etc)
  • the environment (Web page content, connectivity,
    etc)
  • static characteristics (perhaps), e.g.,
    demographics of each user.
  • Hidden data
  • information goals
  • behavioral characteristics
  • Data Mining
  • postulate relatively simple but flexible models
    for behavior and information goals
  • discover which models best describe your
    individuals and populations from the data, i.e.,
    fit to the data

17
Examples of Models
  • Information Goal Modeling
  • information retrieval: model a document as a bag
    of words
  • term vector: a vector counting occurrences of
    terms or phrases in a document
  • model the user's interests the same way:
  • a weighted term vector, weights depending on
    interest
  • Behavior Modeling
  • static models
  • histograms of pages a user tends to visit
  • dynamic models
  • stochastic finite-state machines, Markov models

18
Behavior Models at the Population Level
  • Huberman et al (Xerox PARC)
  • Science 1997
  • model the value of a page for any user as V
  • V_t = V_{t-1} + e_t
  • where e_t is random zero-mean Gaussian noise
  • Expected stopping time seems to match observed
    session lengths for large populations of Web
    users
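
A short simulation can show how a random walk in page value produces a distribution of session lengths. The slide does not state the stopping rule, so this sketch assumes a user stops clicking the first time V falls below a threshold. The starting value, noise scale, threshold, and click cap are arbitrary illustrative settings, not values from Huberman et al.

```python
import random


def session_length(v0=1.0, sigma=1.0, threshold=0.0, max_clicks=10_000, rng=random):
    """Number of clicks until the page value V first drops below `threshold`."""
    v = v0
    for clicks in range(1, max_clicks + 1):
        v += rng.gauss(0.0, sigma)   # V_t = V_{t-1} + e_t
        if v < threshold:
            return clicks
    return max_clicks                # cap very long sessions


def simulate(n_users=2000, seed=0, **kwargs):
    """Simulate one session per user with a reproducible random stream."""
    rng = random.Random(seed)
    return [session_length(rng=rng, **kwargs) for _ in range(n_users)]


if __name__ == "__main__":
    lengths = sorted(simulate())
    print("median clicks:", lengths[len(lengths) // 2])
    print("mean clicks:", sum(lengths) / len(lengths))
```

Under these assumptions most simulated sessions end after a few clicks while a small fraction run very long, so the mean sits well above the median.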

19
Behavior Models at the Individual Level
  • Zukerman et al
  • User Modeling Conference 1999
  • model each user's navigation patterns as a
    Markov model (a finite state machine)
  • learn a user's model from observed data over time
  • use the model to predict next page for the user
  • some empirical success
  • intended application is pre-fetching
  • but this type of prediction is very hard indeed
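
As a minimal sketch of the idea on this slide, the class below learns a first-order Markov model from one user's page sequences and predicts the most frequently observed next page. It is a generic illustration, not the model from Zukerman et al.; the class name, method names, and page labels are hypothetical.

```python
from collections import Counter, defaultdict


class NextPagePredictor:
    """First-order Markov model over pages, learned from one user's sessions."""

    def __init__(self):
        self.counts = defaultdict(Counter)   # counts[page][next_page]

    def observe(self, session):
        """Count every consecutive (page, next page) pair in a session."""
        for current, following in zip(session, session[1:]):
            self.counts[current][following] += 1

    def transition_prob(self, current, following):
        """Estimated probability of moving from `current` to `following`."""
        total = sum(self.counts[current].values())
        return self.counts[current][following] / total if total else 0.0

    def predict(self, current):
        """Most frequently observed next page, or None if `current` is unseen."""
        followers = self.counts.get(current)
        return followers.most_common(1)[0][0] if followers else None


if __name__ == "__main__":
    model = NextPagePredictor()
    model.observe(["home", "news", "sports"])
    model.observe(["home", "news", "weather"])
    model.observe(["home", "mail"])
    print(model.predict("home"), model.transition_prob("home", "news"))
```

A pre-fetcher would call `predict` after each click and request the returned page in the background. With few observations per user the counts are sparse, which is one reason this kind of prediction is hard.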

20
Dice Factories and the Reverend Bayes
Population Parameters
Group Parameters
Individual Parameters
Observed Data
Bayesian framework => infer parameters given data
21
Application of Hierarchical Models
  • Clustering of individuals given their
    page-sequences
  • Clustering Markov models (Smyth, 1997)
  • Generative model: mixture of Markov processes
  • each group is characterized by a Markov state
    machine
  • different groups have different navigation
    patterns
  • can use the EM algorithm to learn the different
    Markov models given the data
  • probabilistic model handles different sequence
    lengths naturally
  • Applied to a large commercial Web log (with Igor
    Cadez)
  • produced novel insights into user behavior
  • valuable as an exploration tool for massive Web
    logs
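
The clustering idea above can be sketched as EM for a mixture of first-order Markov chains. This is a simplified illustration under stated assumptions, not the implementation used on the commercial Web log: pages are pre-mapped to integer states `0..n_states-1`, initialization is random, and the additive `smooth` term (my addition) keeps every probability positive. All function and parameter names are hypothetical.

```python
import math
import random


def normalize(row):
    total = sum(row)
    return [x / total for x in row]


def log_likelihood(seq, init, trans):
    """Log-probability of one page sequence under one Markov chain."""
    ll = math.log(init[seq[0]])
    for a, b in zip(seq, seq[1:]):
        ll += math.log(trans[a][b])
    return ll


def fit_mixture(seqs, n_states, n_clusters, n_iter=50, seed=0, smooth=0.1):
    """Return (weights, inits, trans, resp) after `n_iter` EM iterations."""
    rng = random.Random(seed)

    def rand_row():
        return normalize([rng.random() + 0.5 for _ in range(n_states)])

    weights = [1.0 / n_clusters] * n_clusters
    inits = [rand_row() for _ in range(n_clusters)]
    trans = [[rand_row() for _ in range(n_states)] for _ in range(n_clusters)]
    resp = []

    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each sequence.
        resp = []
        for seq in seqs:
            logs = [math.log(weights[k]) + log_likelihood(seq, inits[k], trans[k])
                    for k in range(n_clusters)]
            top = max(logs)
            resp.append(normalize([math.exp(v - top) for v in logs]))

        # M-step: re-estimate each chain from responsibility-weighted counts.
        for k in range(n_clusters):
            init_c = [smooth] * n_states
            trans_c = [[smooth] * n_states for _ in range(n_states)]
            for seq, r in zip(seqs, resp):
                init_c[seq[0]] += r[k]
                for a, b in zip(seq, seq[1:]):
                    trans_c[a][b] += r[k]
            inits[k] = normalize(init_c)
            trans[k] = [normalize(row) for row in trans_c]
        weights = normalize([sum(r[k] for r in resp) + smooth
                             for k in range(n_clusters)])
    return weights, inits, trans, resp
```

Each sequence contributes one likelihood term regardless of its length, which is how the probabilistic model handles sequences of different lengths without padding or truncation. The returned `resp` holds, for every sequence, its soft assignment to each group.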

22
Hierarchical Models for Online Prediction
  • Model a user's information goal as a term vector
  • infer (from Web page content) that a user is
    interested in French wine
  • can infer (from a population model) that user is
    unlikely to be interested in Bugs Bunny
  • population model acts as a prior

23
Hierarchical Models for Online Prediction
  • Model a user's information goal as a term vector
  • infer (from Web page content) that a user is
    interested in French wine
  • can infer (from a population model) that user is
    unlikely to be interested in Bugs Bunny
  • population model acts as a prior
  • but wait
  • user is really a parent ordering Christmas gifts
  • online data drives us away from initial prior
    (once we see enough data)
  • Bayesian framework provides a robust mechanism
    for these inferences
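
A toy calculation shows how observed data pulls an estimate away from the population prior. The sketch assumes a Beta prior on the probability that a given page view by this user is about children's cartoons, with a Binomial likelihood for the observed views. The slides do not specify this model, and the prior values and view counts are made up.

```python
def posterior_mean(prior_alpha, prior_beta, hits, views):
    """Posterior mean of a Beta-Binomial model after `hits` out of `views`."""
    return (prior_alpha + hits) / (prior_alpha + prior_beta + views)


if __name__ == "__main__":
    # Hypothetical population prior: Beta(1, 19), i.e. a prior mean of 5%.
    alpha, beta = 1.0, 19.0
    # The parent buying gifts views only cartoon pages in this session.
    for views in (0, 2, 20, 200):
        estimate = posterior_mean(alpha, beta, hits=views, views=views)
        print(views, round(estimate, 3))
```

With no data the estimate equals the prior mean, a couple of page views barely move it, and a long run of consistent evidence eventually dominates the prior. That is the behavior described in the bullets above.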

24
Research on Better Dynamic Models
  • Markov models are not ideal
  • constrain state-durations to be geometric
  • semi-Markov models provide a useful
    generalization
  • Introduce time (not just order)
  • model bursts of activity as a Poisson process
  • modulated by other variables, e.g.,
    age/experience
  • Couple to dynamically evolving information goals
  • use input-driven Markov models
  • inputs are coming from inferred information goals
  • much more realistic

25
Applications
  • Understanding
  • visualization of navigation patterns
  • Discovery
  • population patterns
  • individual behavior
  • Feedback
  • better design, real-time tools for information
    feedback
  • Prediction
  • trend detection, population-level predictions,
    bootstrapping a new Web site, etc

26
Summary
  • Data Mining
  • searching for structure in massive data sets
  • Digital Logs
  • records of individual behavior in digital
    environments
  • Web Mining
  • extracting models of user behavior from Web data
  • relatively early, but expect to see more new
    ideas
  • challenging research problems, useful applications