Title: Visualising Web Visitations: A Probabilistic Approach
1 Visualising Web Visitations: A Probabilistic Approach
2 the problem
visualise activity on web sites: not for a particular analysis (e.g. by web designers) but to give a sense of activity, a human-ness, to the web site
www.hcibook.com
3 Cyber-space is lonely
- browsing web pages is usually solitary
- like to feel that others have either
  - trod these paths (pages) before
  - or are currently visiting the site
- ambient displays present information in the environment or user's periphery
- CSCW systems promote awareness for collaborative work (e.g. Tower, Awareness Maps)
5 web logs have (most of) it
- simple record of all requests to a web site (pages, graphics, document downloads etc.)
- can count the number of hits to each page
- can obtain paths through a site (based on the visitor's IP address)
- (well-known problems with caching, proxy servers, timeouts etc.)
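The two uses of a web log listed above (hit counts and per-visitor paths) can be sketched roughly as follows. This is a minimal illustration, not the pre-processing used for the talk: the record layout `(ip, timestamp, path)`, the sample data and the 30-minute `SESSION_TIMEOUT` are all assumptions, and it ignores the caching and proxy problems noted above.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical, simplified log records: (ip, timestamp, path).
# Real server logs carry more fields and need proper parsing.
LOG = [
    ("10.0.0.1", "2001-05-01 10:00:00", "/index.html"),
    ("10.0.0.1", "2001-05-01 10:01:00", "/hci/search.html"),
    ("10.0.0.2", "2001-05-01 10:02:00", "/index.html"),
    ("10.0.0.1", "2001-05-01 11:30:00", "/index.html"),
]

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed cut-off, not from the slides


def hits_per_page(records):
    """Count requests to each page."""
    return Counter(path for _ip, _ts, path in records)


def session_paths(records, timeout=SESSION_TIMEOUT):
    """Group requests by IP address and split on gaps longer than `timeout`."""
    by_ip = defaultdict(list)
    for ip, ts, path in records:
        by_ip[ip].append((datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), path))

    sessions = []
    for ip, visits in by_ip.items():
        visits.sort()  # chronological order per visitor
        current, last_time = [], None
        for when, path in visits:
            if last_time is not None and when - last_time > timeout:
                sessions.append((ip, current))
                current = []
            current.append(path)
            last_time = when
        if current:
            sessions.append((ip, current))
    return sessions
```

Treating one IP address as one visitor is exactly the simplification that proxies and shared machines break, so the resulting paths are approximate.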
6 visualising web sites
- numerous tools for studying structure and usage
- discover navigation problems
- categorise users, predict site usage, provide map
- often based on the premise that sites are hierarchical
- in general they are not, and paths through sites are definitely not
[figures: MAPA, Narcissus]
7 human footsteps
but if a web site is visualised as a linked structure then we are more likely to see
8 alternative representation of site
- structure in terms of links between pages
- ball spring layout
- tension reflects popularity
- only show popular links
- can show paths
9 more self-organising maps
- lots of other pages pulling popular ones
- a page can only be in one place, where do you place it?
inspiration
- Kohonen Maps
- 2D arrangement of cells
- unsupervised learning neural network
- normally uses feature vectors (multi-dimensional data)
- clusters based on these vectors
- then find cell with closest match for any particular page
- similar pages would end up in same cell
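The "find cell with closest match" step of a standard Kohonen map can be sketched as below. This covers only the lookup in a conventional feature-vector map, not training and not the modified algorithm used in this work; the function name `best_matching_cell` and the grid layout (a 2D list of weight vectors) are illustrative assumptions.

```python
def best_matching_cell(grid, vector):
    """Return (row, col) of the cell whose weight vector is closest to `vector`.

    grid: 2D list of cells, each holding a weight vector (list of floats)
    vector: feature vector of the page being placed
    """
    best, best_dist = None, float("inf")
    for r, row in enumerate(grid):
        for c, weights in enumerate(row):
            # Squared Euclidean distance is enough for finding the minimum.
            dist = sum((w - v) ** 2 for w, v in zip(weights, vector))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best
```

Because every page maps to exactly one best cell, two very similar pages land in the same cell, which is the limitation raised on this slide.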
10 modified Kohonen
- we use a radically modified version of Kohonen Maps which
  - uses a similarity matrix (co-occurrence of highly used links)
  - doesn't have feature vectors but uses the pages themselves
  - tends to place pages close to, but not in, the same cell (algorithm has a built-in "choose a neighbour" function)
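One plausible way to build such a similarity matrix is to count how often two pages sit next to each other in session paths. The sketch below assumes that reading of "co-occurrence"; the function names and the symmetric treatment of links are illustrative choices, not details given in the slides.

```python
from collections import Counter


def cooccurrence(sessions):
    """Count how often two different pages are adjacent in a session path."""
    counts = Counter()
    for path in sessions:
        for a, b in zip(path, path[1:]):
            if a != b:  # ignore reloads of the same page
                counts[frozenset((a, b))] += 1
    return counts


def similarity_matrix(sessions, pages):
    """Symmetric matrix of adjacency counts for the chosen `pages`."""
    counts = cooccurrence(sessions)
    index = {page: i for i, page in enumerate(pages)}
    n = len(pages)
    matrix = [[0] * n for _ in range(n)]
    for pair, count in counts.items():
        a, b = tuple(pair)
        if a in index and b in index:
            matrix[index[a]][index[b]] = count
            matrix[index[b]][index[a]] = count
    return matrix
```

Restricting `pages` to the most heavily used ones would correspond to keeping only "highly used links".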
11 bring on Schrödinger's equation
inspiration
- from the world of quantum mechanics, very very small particles don't necessarily exist at any one particular point in space
- it solves some of the problems if particles are treated as wave functions, and we can talk about the probability that a particle will be at any point in space
- we could treat our web pages as wave functions
- through the clustering algorithm, calculate the probability that a page will be in any cell on the grid
-> pages not restricted to one place
- can also cope with large sites
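The idea of a per-page probability over grid cells can be illustrated with a simple normalisation. The slides do not say how the clustering algorithm produces its values, so the input here is an assumed grid of non-negative scores for one page, and `web_field` is a hypothetical helper name.

```python
def web_field(scores):
    """Turn a grid of non-negative scores into probabilities summing to 1.

    scores: 2D list, one score per grid cell, for a single page
    """
    total = sum(sum(row) for row in scores)
    if total <= 0:
        raise ValueError("scores must contain at least one positive value")
    return [[score / total for score in row] for row in scores]
```

For example, `web_field([[1, 1], [0, 2]])` gives a field in which the bottom-right cell holds half of the probability, so the page is spread over three cells rather than pinned to one.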
12 Quantum Web Fields (1)
- the darker the square, the higher the probability that the page wants to be at that point in the visualisation space
- all pages are relative to each other
hci/search.html (444 visits)
home (1015 visits)
Derived from a web log of 64110 records from the HCI book (Dix et al.) website, pre-processed to 11992 visits by 1266 visitors. The 150 most popular pages, in terms of adjacent pages in session paths, have been used in this example.
13 Quantum Web Fields (2)
hci/ex/chapt7.html (251 visits)
hci/ex/chapt16.html (87 visits)
bimodal: high probability of a visitor coming from/going to the home page (circled) and also other exercise pages
14 session paths
- Web fields are not particularly useful on their own
- but are the basis for plotting real session paths
- random page placement, yet proportionate to probabilities in its web field
- subsequent pages are biased by the distance from last page in session
- each step occupies a new cell
- trail fades with time (numbered to ease interpretation)
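The placement rules above can be sketched as weighted random sampling. This is one possible reading, not the actual algorithm: the weighting `p / (1 + distance_bias * d)`, the `distance_bias` parameter and the function names are assumptions made for illustration, and trail fading is left out.

```python
import math
import random


def place_step(field, last_cell, used, rng, distance_bias=1.0):
    """Pick a cell for the next page in a session path.

    field: 2D list of probabilities (the page's web field)
    last_cell: (row, col) of the previous step, or None for the first step
    used: set of cells already occupied by this path
    distance_bias: assumed tuning knob; larger values favour nearby cells
    """
    cells, weights = [], []
    for r, row in enumerate(field):
        for c, p in enumerate(row):
            if (r, c) in used or p <= 0:
                continue  # each step must occupy a new cell
            weight = p
            if last_cell is not None:
                d = math.hypot(r - last_cell[0], c - last_cell[1])
                weight = p / (1.0 + distance_bias * d)
            cells.append((r, c))
            weights.append(weight)
    if not cells:
        raise ValueError("no free cell with non-zero probability")
    return rng.choices(cells, weights=weights, k=1)[0]


def plot_path(fields, pages, seed=0):
    """Place each page of a session in turn; returns the list of cells."""
    rng = random.Random(seed)
    used, path, last = set(), [], None
    for page in pages:
        cell = place_step(fields[page], last, used, rng)
        used.add(cell)
        path.append(cell)
        last = cell
    return path
```

Because a page is sampled afresh at every step, a page visited twice in one session can appear in two different cells of its web field.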
15 example session path
- session paths visit cells which tend to be close to one another but have the occasional jump across the web field
- fairly compact trail
- commonly browsed pages, as confirmed by web log (contents page, early chapters of the book, overviews and resources)
16 more session paths
- 23 page session
- browsing common pages
- shows problem with layout algorithm for long session paths
- possible solution: fade out?
- hopping rather than strolling?
- actually a sequential pass through most of the chapter pages
- may be a web crawler?
- if a common user activity then pages would have been clustered
17 multiple page visits
- visitor has alternated between the home page (circled) and other pages
- compare with ball spring visualisation (on right)
- QWF still gives intelligible path and shows home page is distributed across the Web field
18 other paths
- algorithm not constrained by site structure or size
- examples of long wandering session paths
19 summary
- designed to give a sense of current human activity on a web site rather than for analysis
- can be installed on a web server
- the initial Quantum web fields are calculated from past web logs (collaborative filtering/interaction history)
- web fields can be updated to reflect changing user browsing habits
- visitor paths can be shown but i) there will be some latency and ii) probably need to sample
- appears to give recognisable paths for normal web browsing activity
20 summary (2)
- can visualise sites where there are many links from a page
- can use for large sites without changing the size of the grid
- have tried with web logs from other sites
- further work
  - different ways of showing the paths (transparency etc.)
  - dealing with multiple paths
  - further experiments with tuning the algorithm
  - install on a site and run in real time
  - give to a web site's users to display their own path
  - effective, scientific evaluation with real users in a working environment, including some case studies!
21 the end