MINING USERS' NAVIGATION PATTERNS AND PREDICTING THEIR NEXT STEPS Mark Levene School of Computer Science and Information Systems Birkbeck University of London - PowerPoint PPT Presentation

1
MINING USERS' NAVIGATION PATTERNS AND PREDICTING
THEIR NEXT STEPS
Mark Levene
School of Computer Science and Information Systems
Birkbeck, University of London
2
Observation 1: The Web is a Complex Network
(Map of the Internet, Bell Labs, 1998)
3
Observation 2: Need to make sense of practically
infinite information
  • cr008r01-3.sac2.fastsearch.net - - [21/Sep/2003:00:48:40 +0100] "GET /~mark/handheld.html HTTP/1.0" 200 1730 "-" "FAST-WebCrawler/3.8 (atw-crawler at fast dot no; http://fast.no/support/crawler.asp)"
  • cr008r01-3.sac2.fastsearch.net - - [21/Sep/2003:00:49:16 +0100] "GET /~mark/games.html HTTP/1.0" 200 6582 "-" "FAST-WebCrawler/3.8 (atw-crawler at fast dot no; http://fast.no/support/crawler.asp)"
  • cr008r01-3.sac2.fastsearch.net - - [21/Sep/2003:01:02:21 +0100] "GET /~mark/bookshops.html HTTP/1.0" 200 3568 "-" "FAST-WebCrawler/3.8 (atw-crawler at fast dot no; http://fast.no/support/crawler.asp)"
  • cr008r01-3.sac2.fastsearch.net - - [21/Sep/2003:01:41:04 +0100] "GET /~mark/web.html HTTP/1.0" 200 14639 "-" "FAST-WebCrawler/3.8 (atw-crawler at fast dot no; http://fast.no/support/crawler.asp)"
  • cr008r01-3.sac2.fastsearch.net - - [21/Sep/2003:01:42:17 +0100] "GET /~mark/download/optdb_plan.pdf HTTP/1.0" 304 - "-" "FAST-WebCrawler/3.8 (atw-crawler at fast dot no; http://fast.no/support/crawler.asp)"
  • ip68-98-199-25.mc.at.cox.net - - [21/Sep/2003:02:10:27 +0100] "GET /~mark/download/optdb_integrity_constraints.pdf HTTP/1.0" 200 32768 "http://search.yahoo.com/search?p=definition+of+superkeys&ei=UTF-8&fr=fp-top&n=20&fl=0&x=wrt" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
  • ip68-98-199-25.mc.at.cox.net - - [21/Sep/2003:02:10:28 +0100] "GET /~mark/download/optdb_integrity_constraints.pdf HTTP/1.0" 206 158146 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
  • adsl-68-74-73-241.dsl.emhril.ameritech.net - - [21/Sep/2003:02:39:29 +0100] "GET /~mark/book.html HTTP/1.1" 200 3373 "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=relational+databases+basic" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
  • adsl-68-74-73-241.dsl.emhril.ameritech.net - - [21/Sep/2003:02:39:30 +0100] "GET /~mark/front_cover.gif HTTP/1.1" 200 64168 "http://www.dcs.bbk.ac.uk/~mark/book.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
  • crawler14.googlebot.com - - [21/Sep/2003:03:35:52 +0100] "GET /~mark/games.html HTTP/1.0" 200 6582 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
  • cr008r01-3.sac2.fastsearch.net - - [21/Sep/2003:04:15:59 +0100] "GET /~mark/download/optdb_table.pdf HTTP/1.0" 304 - "-" "FAST-WebCrawler/3.8 (atw-crawler at fast dot no; http://fast.no/support/crawler.asp)"
  • drone10.sv.av.com - - [21/Sep/2003:04:47:09 +0100] "GET /~mark/ HTTP/1.0" 200 5183 "-" "Scooter/3.3_SF"
  • crawler14.googlebot.com - - [21/Sep/2003:04:49:22 +0100] "GET /~mark HTTP/1.0" 301 309 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
  • cr008r01-3.sac2.fastsearch.net - - [21/Sep/2003:05:18:46 +0100] "GET /~mark/optdb_mailing_list.html HTTP/1.0" 200 622 "-" "FAST-WebCrawler/3.8 (atw-crawler at fast dot no; http://fast.no/support/crawler.asp)"
  • pool-68-162-19-184.nwrk.east.verizon.net - - [21/Sep/2003:05:35:01 +0100] "GET /~mark/download/optdb_erd.pdf HTTP/1.1" 200 0 "http://www.google.com/search?q=entity+relationship+concept&hl=zh-TW&lr=&ie=UTF-8&oe=UTF-8&start=10&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; YComp 5.0.2.6)"
  • pool-68-162-19-184.nwrk.east.verizon.net - - [21/Sep/2003:05:35:02 +0100] "GET /~mark/download/optdb_erd.pdf HTTP/1.1" 206 0 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; YComp 5.0.2.6)"
  • pool-68-162-19-184.nwrk.east.verizon.net - - [21/Sep/2003:05:35:07 +0100] "GET /~mark/download/optdb_erd.pdf HTTP/1.1" 206 275480 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; YComp 5.0.2.6)"
  • crawler14.googlebot.com - - [21/Sep/2003:05:49:47 +0100] "GET /~mark/ HTTP/1.0" 200 5183 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
  • cr008r01-3.sac2.fastsearch.net - - [21/Sep/2003:06:14:12 +0100] "GET /~mark/download/webgraph.pdf HTTP/1.0" 304 - "-" "FAST-WebCrawler/3.8 (atw-crawler at fast dot no; http://fast.no/support/crawler.asp)"
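
Server log entries of this kind can be split into fields before any mining is done. The sketch below assumes the lines are in the Apache "combined" log format (host, identity, user, bracketed timestamp, request, status, size, referrer, user agent); the names `LOG_PATTERN` and `parse_log_line` are hypothetical and chosen only for illustration.

```python
import re
from datetime import datetime

# Assumed layout: Apache "combined" log format, one entry per line.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A size of "-" means no body was sent (for example a 304 response).
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

example = (
    'crawler14.googlebot.com - - [21/Sep/2003:03:35:52 +0100] '
    '"GET /~mark/games.html HTTP/1.0" 200 6582 "-" '
    '"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"'
)
record = parse_log_line(example)
print(record["host"], record["request"], record["status"])
```

The user-agent field is what would let crawler requests be separated from human visits before trails are built.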

4
Overview
  • Mining of sequential patterns
  • Web mining
  • Prediction
  • Detection of unexpected events
  • Mobile mining
  • A navigation engine
  • What next?

5
Availability of Sequential Log Data
  • Log files (web site, search engine, wireless
    network, ...) contain access data (time-stamp, id
    e.g. IP, cookie, tag, location, query,
    clickstream data, ...)
  • Time-ordered sessions of sequential clicks or
    movement can be mined.
  • Logs are a valuable source of information for
    understanding what users are doing and how a
    space is being used.
  • Log files are not enough!
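
A minimal sketch of how time-ordered sessions might be cut out of log records. It assumes each record has already been reduced to a `(user_id, timestamp_in_seconds, page)` tuple and that a gap of more than 30 minutes starts a new session; both the tuple layout and the timeout are assumptions made for illustration, not something the slides specify.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed cut-off, not taken from the slides

def build_sessions(records, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp, page) records into time-ordered sessions."""
    by_user = defaultdict(list)
    for user_id, timestamp, page in records:
        by_user[user_id].append((timestamp, page))

    sessions = []
    for user_id, hits in by_user.items():
        hits.sort()  # order each user's accesses by time
        current = [hits[0][1]]
        for (prev_time, _), (time, page) in zip(hits, hits[1:]):
            if time - prev_time > timeout:
                # Long gap: close the current session and start a new one.
                sessions.append((user_id, current))
                current = []
            current.append(page)
        sessions.append((user_id, current))
    return sessions

records = [
    ("u1", 0, "/home"), ("u1", 60, "/book"), ("u2", 30, "/home"),
    ("u1", 4000, "/games"), ("u2", 90, "/web"),
]
sessions = build_sessions(records)
for user_id, trail in sessions:
    print(user_id, trail)
```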

6
Mining Navigation Patterns
  • Each user session induces a user trail through
    the site/space
  • A trail is a sequence of accesses (interactions
    with landmarks) followed by a user during a
    session, ordered by time of access.
  • A pattern in this context is a popular trail.
  • Co-occurrence of accesses is important, e.g.
    shopping-basket and checkout.
  • Use a variable length history Markov chain model.
    (It matters where you came from!)
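
To illustrate why the length of the history matters, the sketch below counts which page follows each context of one or two previous pages and compares the resulting estimates. The trails, the function names and the limit of two pages of history are all made up for the example.

```python
from collections import Counter, defaultdict

def context_counts(trails, max_order=2):
    """Count next-page occurrences after each context of 1..max_order pages."""
    counts = defaultdict(Counter)
    for trail in trails:
        for i in range(1, len(trail)):
            for order in range(1, min(max_order, i) + 1):
                context = tuple(trail[i - order:i])
                counts[context][trail[i]] += 1
    return counts

def probability(counts, context, page):
    """Estimate P(page | context) by relative frequency."""
    total = sum(counts[context].values())
    return counts[context][page] / total if total else 0.0

trails = [
    ["home", "basket", "checkout"],
    ["home", "basket", "checkout"],
    ["search", "basket", "home"],
    ["search", "basket", "home"],
]
counts = context_counts(trails)
first_order = probability(counts, ("basket",), "checkout")
second_order = probability(counts, ("home", "basket"), "checkout")
print(first_order, second_order)
```

In this toy data, knowing only that the user is at the basket gives an even chance of checkout, while knowing that they arrived from home makes checkout certain.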

7
Interaction Network (1st Order Markov Chain, no
past history) of Users and Landmarks
8
Interaction Tree (Suffix Tree) Representing a
Variable Length Markov Chain (Record access
information for each node)
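
A simplified sketch of such a tree: a plain trie over all sub-trails up to a fixed depth, with an access count recorded at each node. This is an assumption-laden stand-in for illustration, not a compressed suffix tree and not necessarily the structure used in the talk; `Node`, `build_tree` and `lookup` are hypothetical names.

```python
class Node:
    """One node of the tree: its access count and its child nodes by page."""
    def __init__(self):
        self.count = 0
        self.children = {}

def build_tree(trails, max_depth=3):
    """Insert every sub-trail of up to max_depth pages, counting accesses."""
    root = Node()
    for trail in trails:
        for start in range(len(trail)):
            node = root
            for page in trail[start:start + max_depth]:
                node = node.children.setdefault(page, Node())
                node.count += 1
    return root

def lookup(root, path):
    """Return how often the given sequence of pages was observed."""
    node = root
    for page in path:
        node = node.children.get(page)
        if node is None:
            return 0
    return node.count

tree = build_tree([["a", "b", "c"], ["a", "b", "d"], ["b", "c"]])
print(lookup(tree, ["a", "b"]), lookup(tree, ["b", "c"]), lookup(tree, ["b"]))
```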
9
Web Usage Mining
  • Analyse trails that emerge from a user (or a
    group of users) surfing through a web space.
  • Applications:
    • Prefetching and caching web pages
    • Prediction (Recommender systems)
    • Personalisation
    • Clickstream analysis, e.g. eCommerce
    • Web site reorganisation
    • Detection of unexpected accesses

10
Hit and Miss Prediction
  • Try to predict the next link the user will follow
    from the longest suffix of the trail that can be
    matched in the suffix tree.
  • Assume that the maximum probability link is the
    one followed.
  • Count a hit as 1 and a miss as 0.
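
The scoring scheme above can be sketched as follows. The model here is a plain dictionary of context counts rather than a real suffix tree, and the trails, function names and maximum order of 2 are assumptions made for the example.

```python
from collections import Counter, defaultdict

def train(trails, max_order=2):
    """Count which page follows each context of 1..max_order pages."""
    counts = defaultdict(Counter)
    for trail in trails:
        for i in range(1, len(trail)):
            for order in range(1, min(max_order, i) + 1):
                counts[tuple(trail[i - order:i])][trail[i]] += 1
    return counts

def predict(counts, history, max_order=2):
    """Predict from the longest suffix of the history seen in training."""
    for order in range(min(max_order, len(history)), 0, -1):
        context = tuple(history[-order:])
        if context in counts:
            return counts[context].most_common(1)[0][0]
    return None  # no suffix of the history is known to the model

def hit_rate(counts, test_trails, max_order=2):
    """Score each prediction as 1 (hit) or 0 (miss) and return the mean."""
    hits = total = 0
    for trail in test_trails:
        for i in range(1, len(trail)):
            hits += int(predict(counts, trail[:i], max_order) == trail[i])
            total += 1
    return hits / total if total else 0.0

model = train([
    ["home", "basket", "checkout"],
    ["home", "basket", "checkout"],
    ["search", "basket", "home"],
])
rate = hit_rate(model, [
    ["search", "basket", "home"],
    ["home", "basket", "checkout"],
    ["news", "basket", "home"],
])
print(rate)
```

The third test trail starts at a page the model has never seen, so the first step has no matching suffix and the second falls back to the one-page context; both count as misses.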

11
Rank-based Prediction
  • Try to predict the next link as before.
  • Rank the candidate links by probability; if the
    link actually followed has rank r, record an
    absolute error of r-1. Averaging these gives the
    MAE (Mean Absolute Error).
  • Can generalise to top-n prediction.
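
A small sketch of the rank-based score, assuming the model supplies a probability for every candidate link and that the followed link is among the candidates. The link names and probabilities are invented for the example.

```python
def rank_error(link_probabilities, followed):
    """Return r - 1, where r is the rank of the link that was followed."""
    ranked = sorted(link_probabilities, key=link_probabilities.get, reverse=True)
    return ranked.index(followed)  # 0-based position equals r - 1

def mean_absolute_error(cases):
    """Average the rank errors over (link_probabilities, followed) pairs."""
    errors = [rank_error(probs, followed) for probs, followed in cases]
    return sum(errors) / len(errors)

def top_n_hit(link_probabilities, followed, n):
    """Top-n generalisation: a hit if the followed link is in the first n ranks."""
    return rank_error(link_probabilities, followed) < n

probs = {"checkout": 0.6, "home": 0.3, "help": 0.1}
mae = mean_absolute_error([(probs, "checkout"), (probs, "help")])
print(rank_error(probs, "home"), mae, top_n_hit(probs, "home", 2))
```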

12
Probability-based Prediction
  • Try to predict the next link as before.
  • Let p be the probability of the maximum
    probability link.
  • The score recorded is the ignorance, defined as
    -log2(p).
  • Ignorance is nonlinear, ranging from zero to
    infinity, i.e. the penalty for performing worse
    than random is large.
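
A short sketch of the ignorance score as a function of a probability p. The function simply scores whatever probability it is given; the comparison with a uniform random guess over four links is an illustrative assumption.

```python
import math

def ignorance(p):
    """Ignorance score -log2(p); infinite when p is 0."""
    if p <= 0.0:
        return math.inf
    return -math.log2(p)

# A uniform random guess among 4 links assigns p = 1/4 to each of them.
random_baseline = ignorance(1 / 4)
worse_than_random = ignorance(0.01)
print(ignorance(0.5), random_baseline, round(worse_than_random, 2))

# The score can be mapped back to a probability as 2 ** (-ignorance).
recovered = 2 ** (-ignorance(0.125))
```

Halving p always adds one bit of ignorance, so the score grows without bound as p approaches zero.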

13
Unexpected Accesses
  • Cannot predict rare events but can detect them.
  • If the probability of an access is small, say
    less than or equal to alpha, then the access is
    unexpected.
  • Alternatively, could use the 80/20 rule.
  • If prediction is good for expected events, can
    detect the unexpected.
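
The alpha threshold can be sketched as a simple filter over accesses that have already been given a probability by the model. The value alpha = 0.05 and the sample accesses are assumptions for illustration only.

```python
def unexpected_accesses(accesses, alpha=0.05):
    """Return the (page, probability) pairs whose probability is at most alpha."""
    return [(page, p) for page, p in accesses if p <= alpha]

# Each pair is a page that was accessed and the probability the model gave it.
observed = [("checkout", 0.60), ("help", 0.05), ("admin", 0.01), ("home", 0.30)]
flagged = unexpected_accesses(observed)
print(flagged)
```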

14
Proof of Concept
  • Temporal evaluation: split the data into k
    sequences ordered by time, infer the model from
    sequences 1 to i, and evaluate prediction on
    trails from sequence i+1.
  • Analysis in progress!
  • I present trends on a sample of trails, in
    collaboration with Jose Borges from the
    University of Porto.
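
The temporal evaluation can be sketched as a generator of train/test pairs, assuming the data has already been split into k time-ordered sequences, each a list of trails. The function name and the toy trails are hypothetical.

```python
def temporal_splits(sequences):
    """Yield (train, test) pairs: train on sequences 1..i, test on sequence i+1."""
    for i in range(1, len(sequences)):
        train = [trail for seq in sequences[:i] for trail in seq]
        yield train, sequences[i]

# Three time-ordered sequences of trails (k = 3).
sequences = [
    [["a", "b"]],
    [["a", "c"], ["b", "c"]],
    [["a", "b", "c"]],
]
splits = list(temporal_splits(sequences))
for train, test in splits:
    print(len(train), "training trails ->", len(test), "test trails")
```

Because the model never sees trails from later sequences, each evaluation only uses information that would have been available at prediction time.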

15
Data Analysis: Hit and Miss error against model
order
16
Data Analysis: MAE error against model order
17
Data Analysis: Ignorance error against model
order
18
Ignorance Probabilities: 2^(-Ignorance)

MSWEB
Probabilities   All      0        0.01     0.025    0.05     0.1
                0.0450   0.0574   0.0938   0.1158   0.1290   0.1875
                0.0342   0.0871   0.1186   0.1439   0.1684   0.2263
                0.0249   0.1053   0.1357   0.1630   0.1896   0.2478
                0.0227   0.1101   0.1426   0.1720   0.1997   0.2596

LTM
Probabilities   All      0        0.01     0.025    0.05     0.1
                0.0722   0.1220   0.2562   0.3513   0.4670   0.6073
                0.0702   0.2085   0.3199   0.4026   0.5059   0.6171
                0.0624   0.2502   0.3578   0.4323   0.5280   0.6382
                0.0568   0.2807   0.3913   0.4623   0.5506   0.6536
19
Mobile Usage Mining
  • Analyse trails that emerge from a user (or a
    group of users) moving through a physical space.
  • Applications:
    • Prediction (Recommender systems)
    • Personalisation (location awareness)
    • Movement analysis
    • Ubiquitous search
    • Space reorganisation
    • Detection of unexpected interactions
  • In collaboration with colleagues from Birkbeck,
    University of London

20
Trail in a Museum Exhibition
21
Trail from entrance to exit in a Zoo
22
Navigation Engine Architecture
23
Query Interface
24
Landmark Query
25
Trail Query
26
Hot Spots
27
Popular Trails
28
What Next?
  • More evaluation.
  • Implement and evaluate prediction and pattern
    detection algorithms within the navigation
    engine.
  • Investigate novel visualisation methods of usage
    within the navigation engine.
  • Thank you!

29
Data Analysis: Hit and Miss
30
Data Analysis: MAE
31
Data Analysis: Ignorance
  • Probabilities
  • LTM, MSWEB