1
Sitemap Creation
Andrej Wunrau, Sitemap Creation, Seminar Web Mining, 16. 1. 2003
2
Outline
  • 1. Task
  • 2. Definition
  • 3. Site Topology in Web Mining
  • 4. Pre-processing
  • 5. Rules and Patterns
  • 6. Concept Hierarchies
  • 7. My Approach
3
1. Task
  • automated creation of a sitemap
  • input: a URL
  • output: a sitemap
  • the sitemap represents the linkage between all pages
    that can be reached from the given URL

5
2. Definition
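  • a sitemap is a graph: the nodes are pages, the edges are
    hyperlinks
  • spanning tree: a minimal subgraph that still connects all nodes
  • - includes all nodes but only a subset of the edges
  • - exactly one path between any two nodes
  • - no cycles
  • - no redundancy
  • - minimum cost: the fewest edges that keep all nodes connected
  • one graph can have several different spanning trees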
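
A minimal sketch of how one spanning tree can be derived from a sitemap graph, assuming the sitemap is held as a dictionary that maps each page to the pages it links to. Breadth-first search is only one possible choice here; the page names and the function name `spanning_tree` are made up for illustration and do not come from the slides.

```python
from collections import deque

def spanning_tree(graph, root):
    """Return {page: parent} for a breadth-first spanning tree rooted at `root`."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in parent:      # the first link that reaches a page becomes its tree edge
                parent[target] = page
                queue.append(target)
    return parent

# Hypothetical sitemap: 4 pages, 5 hyperlinks, one cycle (index <-> a).
sitemap = {
    "index": ["a", "b"],
    "a": ["index", "c"],
    "b": ["c"],
    "c": [],
}
tree = spanning_tree(sitemap, "index")
tree_edges = [(p, child) for child, p in tree.items() if p is not None]
```

The tree keeps all four pages but only three of the five links. Choosing b instead of a as the parent of c would give a different, equally valid spanning tree.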

7
3. Site Topology in Web Mining
  • automated creation of sitemaps for Web Usage Mining systems
  • difficulties: the historical development of the web
  • static and single-frame sites are easy to map
  • problems: CGI, JavaScript, frames, Flash, ...
  • personalisation techniques

8
3. Site Topology in Web Mining
  • Web Mining is divided into 3 categories: Web Structure Mining,
    Web Usage Mining and Web Content Mining
  • sitemap creation is part of Web Structure Mining
  • topology is also important in Web Usage Mining and
    Web Content Mining
  • it helps to solve different problems in data preparation and
    analysis

9
3. Site Topology in Web Mining
(Architecture of WEBMINER by Robert Cooley)
10
3. Site Topology in Web Mining
  • together with other information, we need knowledge of the
    site topology to complete the following tasks:
  • - user identification
  • - path completion
  • - filtering rules and patterns
  • - exploring user behaviour with concept hierarchies

12
4. Pre-processing
  • log files are mostly the only source of information
    about user behaviour
  • other sources are often unavailable: cookies, for instance,
    are often seen as privacy leaks
  • unfortunately, logs don't contain enough information
  • sometimes the user's behaviour has to be guessed
  • sitemaps can help

13
4. Pre-processing: user identification
  • task: identifying the unique user
  • - which log file entries have been caused by one user?
  • two log entries with the same IP
  • - we assume that only one user has caused them
  • - this can be wrong
  • a possible heuristic
  • - assume a new user if the requested page cannot be
    reached from the pages already visited

14
4. Pre-processing: user identification
- example
log file:
#  IP            URL
1  123.456.78.9  A.html
2  123.456.78.9  B.html
3  123.456.78.9  C.html
4  123.456.78.9  Y.html
[Figure: sitemap]
- this is only a heuristic: the user may have used a bookmark or
typed the URL in directly
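
The heuristic can be sketched as follows. The sitemap figure is not part of the transcript, so the link structure below (A links to B, B links to C, Y is only linked from X) is an assumption chosen to match the log example; the function name `identify_users` is likewise hypothetical.

```python
def identify_users(log, links):
    """Group (ip, url) log entries into users; returns a list of (ip, [urls])."""
    users = []
    for ip, url in log:
        for user_ip, pages in users:
            # reachable: already visited, or linked from a page this user has visited
            reachable = url in pages or any(url in links.get(p, ()) for p in pages)
            if user_ip == ip and reachable:
                pages.append(url)
                break
        else:
            # no known user with this IP could have reached the page: assume a new user
            users.append((ip, [url]))
    return users

# Assumed sitemap (not taken from the slides).
links = {"A.html": ["B.html"], "B.html": ["C.html"], "X.html": ["Y.html"]}
log = [
    ("123.456.78.9", "A.html"),
    ("123.456.78.9", "B.html"),
    ("123.456.78.9", "C.html"),
    ("123.456.78.9", "Y.html"),
]
users = identify_users(log, links)
```

Under these assumptions the four entries are split into two users with the same IP: one who visited A, B and C, and one who requested Y. As the slide notes, a bookmark or a typed URL would produce the same split for a single real user.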
15
4. Pre-processing: path completion
  • problems: proxies and browser cache
  • - although the user requests a page, there are no entries
    in the logs
  • - e.g. the Back button of the browser
  • - while the user uses this button, no log entries are written
  • - if the user requests a page he has not visited before,
    a log entry is caused
  • possible heuristic
  • - if the new page can be reached from one of the pages the
    user visited earlier (rather than from the current one)
  • → he used the back button

16
4. Pre-processing: path completion
- example
log file:
#  IP            URL
1  123.456.78.9  A.html
2  123.456.78.9  B.html
3  123.456.78.9  C.html
4  123.456.78.9  X.html
[Figure: sitemap]
- the entries B, A are added to the log between C
and X
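
A minimal sketch of this completion step, assuming a sitemap in which A links to B and X, and B links to C (the original figure is not available, so this structure is inferred from the example). The function name `complete_path` is hypothetical.

```python
def complete_path(session, links):
    """Insert the pages a user presumably revisited via the Back button."""
    completed = []
    history = []                      # current navigation path, oldest page first
    for url in session:
        # walk back through the history until a page that links to `url` is found
        i = len(history) - 1
        while i >= 0 and url not in links.get(history[i], ()):
            i -= 1
        if i >= 0:
            # pages passed while going back, most recent first
            completed.extend(history[i:-1][::-1])
            history = history[:i + 1]
        completed.append(url)
        history.append(url)
    return completed

# Assumed sitemap (not taken from the slides).
links = {"A.html": ["B.html", "X.html"], "B.html": ["C.html"]}
session = ["A.html", "B.html", "C.html", "X.html"]
completed = complete_path(session, links)
```

With these inputs the completed path is A, B, C, B, A, X, which matches the slide: B and A are inserted between C and X. If no earlier page links to the requested one, nothing is inserted and the page is simply appended.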
18
5. Rules and Patterns
  • sitemaps help to compare the developer's view and the
    user's view
  • - developer's view: sitemap, proposed usage
  • - user's view: real usage, patterns and rules
  • the bigger the difference, the more interesting
  • - every rule that confirms the site topology is
    uninteresting
  • - rules that are expected but not present are important
    information
  • - for instance: head page
20
6. Concept Hierarchies
  • a tree of concepts with increasing level of abstraction
    is called a concept hierarchy

[Figure: concept hierarchy with the concepts user, potential customer, high potential customer, non customer and customer, and the example pages head page (index.html), external links page and "Thanks for ordering..." page]
21
6. Concept Hierarchies
  • a more abstract view of the pages
  • used to compare the intended user behaviour with the
    real behaviour
  • finding ideal navigation paths
  • helps to redesign the site, e.g. to increase
  • - user-to-customer conversion
  • - usability
  • sitemaps help to map the given concepts to all pages found
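
A small sketch of how pages might be mapped to concepts and then generalised. The exact tree from the figure is not recoverable from the transcript, so the parent relations, the page-to-concept mapping and the file names other than index.html are all assumptions made for illustration.

```python
# Assumed hierarchy (concept -> parent concept); not taken from the slides.
PARENT = {
    "user": None,
    "non customer": "user",
    "potential customer": "user",
    "high potential customer": "potential customer",
    "customer": "user",
}

# Assumed mapping of pages to concepts; links.html and thanks.html are hypothetical names.
PAGE_CONCEPT = {
    "index.html": "user",
    "links.html": "non customer",
    "thanks.html": "customer",
}

def generalise(concept):
    """Return the concept and all of its ancestors, most specific first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def abstract_session(pages):
    """Replace each visited page by its concept; pages without a concept are skipped."""
    return [PAGE_CONCEPT[p] for p in pages if p in PAGE_CONCEPT]
```

For example, `abstract_session(["index.html", "a.html", "thanks.html"])` reduces a visit to the concept sequence user, customer, which can then be compared with the behaviour the site designers intended.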

23
7. My Approach
  • input: URL → path and filename
  • output: the complete sitemap and a spanning tree
  • the first file is opened, and its filename, path, level
    and referrers are written into an array
  • the file is searched for hyperlinks
  • these links are filtered and each URL is pre-processed
    (direct, relative)
  • the file to which a link points is opened, ...

24
7. My Approach
- very simple example
[Figure: link graph of the pages index, A/a.html, B/b.html, C/c1.html and C/c2.html]
L  path       filename    referrer
0  test.de/   index.html
1  test.de/A  a.html      index.html
1  test.de/B  b.html      index.html
2  test.de/C  c1.html     B/b.html
2  test.de/C  c2.html     B/b.html
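
The procedure can be sketched with the Python standard library. This is not the author's implementation: the breadth-first order, the `fetch` callback, the in-memory example site and the link structure (index links to A/a.html and B/b.html, b.html links to both pages in C) are assumptions chosen so that the result resembles the table above.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit
import posixpath

class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

def crawl(start_url, fetch):
    """Return rows (level, path, filename, referrer); fetch(url) gives HTML or None."""
    host = urlsplit(start_url).netloc
    rows, seen = [], {start_url}
    queue = deque([(start_url, 0, None)])
    while queue:
        url, level, referrer = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parts = urlsplit(url)
        path, filename = posixpath.split(parts.path)
        rows.append((level, parts.netloc + path, filename, referrer))
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # resolve relative links and drop "#fragment" parts
            target, _ = urldefrag(urljoin(url, href))
            # filter: stay on the same host, visit every page only once
            if urlsplit(target).netloc != host or target in seen:
                continue
            seen.add(target)
            queue.append((target, level + 1, parts.path.lstrip("/")))
    return rows

# Hypothetical in-memory site standing in for real HTTP requests.
pages = {
    "http://test.de/index.html": '<a href="A/a.html">A</a> <a href="B/b.html">B</a>',
    "http://test.de/A/a.html": '<a href="../index.html">home</a>',
    "http://test.de/B/b.html": '<a href="../C/c1.html">c1</a> <a href="/C/c2.html">c2</a>',
    "http://test.de/C/c1.html": "",
    "http://test.de/C/c2.html": "",
}
rows = crawl("http://test.de/index.html", pages.get)
```

Because each page is recorded with the first page that linked to it, the rows also describe a spanning tree of the sitemap. To crawl a live site, `fetch` would be replaced by a function that downloads the page.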
25
7. My Approach
  • visualisation with H3Viewer by Tamara Munzner
  • spanning tree of the sitemap

26
Literature
- Tamara Munzner: Interactive Visualization of Large Graphs and Networks
- Robert Cooley, Bamshad Mobasher, Jaideep Srivastava: Data Preparation for Mining World Wide Web Browsing Patterns
- Robert Cooley: The Importance of Understanding Web Site Structure and Content when Performing Web Usage Mining
- Bettina Berendt, Myra Spiliopoulou: Analysis of navigation behaviour in web sites integrating multiple information systems
- Wie werden Surfer zu Kunden? Navigationsanalyse zur Ermittlung des Konversionspotenzials verschiedener Bereiche einer Site