Topical Locality in the Web

1
Topical Locality in the Web
  • David J. Manura
  • Lehigh University, Dept. Computer Science and
    Engineering
  • 2002-09-24
  • (A presentation of the paper: B. Davison, "Topical
    Locality in the Web," in Proceedings of the 23rd
    International ACM SIGIR Conference on Research
    and Development in Information Retrieval, July
    2000.)

2
The question
  • Proposition 1: Web pages are linked to pages with
    related content
  • Proposition 2: HTML anchors describe the pages to
    which they point
  • To what extent do these hold?

3
Outline
  • Problem statement
  • Motivation
  • Methods
  • Results
  • Summary

4
Problem
<!-- blue.html -->
<html><head>
<title>Color Blue</title>
<meta name="desc" content="What blue is.">
</head><body>
Blue is the color of the sky.
</body></html>

<!-- sky.html -->
<html><head>
<title>Sky</title>
<meta name="desc" content="Sky info.">
</head><body>
The sky is <a href="blue.html">blue</a> and
<a href="air.html">contains oxygen</a>.
</body></html>

<!-- air.html -->
<html><head>
<title>Air Composition</title>
<meta name="desc" content="What air is made of.">
</head><body>
Air consists of nitrogen and oxygen.
</body></html>

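Proposition 2 concerns the text inside each anchor, so a first step in testing it is to pull (href, anchor text) pairs out of a page. The following is a minimal sketch using Python's standard html.parser module. The names AnchorCollector, extract_anchors, and SKY_HTML are illustrative and do not come from the paper or the slides.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._parts = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._parts = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            text = " ".join("".join(self._parts).split())
            self.anchors.append((self._href, text))
            self._href = None


def extract_anchors(html):
    """Return the list of (href, anchor text) pairs found in html."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.anchors


# The sky.html page from the slide, used as sample input.
SKY_HTML = """<html><head><title>Sky</title></head><body>
The sky is <a href="blue.html">blue</a> and
<a href="air.html">contains oxygen</a>.
</body></html>"""

print(extract_anchors(SKY_HTML))
```

For sky.html this yields the pairs (blue.html, "blue") and (air.html, "contains oxygen"), which can then be compared against the text of the target pages.
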
5
Motivation
  • Web indexing
  • Search ranking
  • Meta-search engines
  • Focused crawlers
  • Intelligent browsing agents

6
Methods: Dataset
  • 100,000 pages out of 3,000,000 pages in the
    neighborhood of highly ranked pages (1999)
  • Two random outgoing links per page
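The sampling step in the last bullet can be sketched as follows. This is only an illustration under assumptions: the slide does not say how duplicate links or pages with fewer than two links were handled, so the sketch deduplicates links and returns whatever is available. The function name sample_outlinks is hypothetical.

```python
import random


def sample_outlinks(links, k=2, rng=None):
    """Pick up to k distinct outgoing links uniformly at random.

    Assumption: duplicate links count once, and a page with fewer
    than k distinct links contributes all of them.
    """
    if rng is None:
        rng = random.Random()
    unique = sorted(set(links))  # sorted for reproducibility with a seeded rng
    return rng.sample(unique, min(k, len(unique)))


links = ["blue.html", "air.html", "sun.html", "blue.html"]
print(sample_outlinks(links, rng=random.Random(0)))
```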

7
Methods: Textual Similarity Measures
  • TFIDF cosine similarity
  • Query term probability
  • Query-document overlap
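
To make the first measure concrete, here is a minimal sketch of TF-IDF cosine similarity in plain Python. The slide does not give the exact weighting, so the sketch assumes raw term counts and a smoothed IDF; the tokenizer and the names tokenize, tfidf_vectors, and cosine are likewise assumptions, not the paper's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption): never zero, even for ubiquitous terms.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Body text of blue.html, sky.html, and air.html from the Problem slide.
docs = [
    "Blue is the color of the sky.",
    "The sky is blue and contains oxygen.",
    "Air consists of nitrogen and oxygen.",
]
blue, sky, air = tfidf_vectors(docs)
print(cosine(blue, sky), cosine(blue, air))
```

On the three toy pages, blue.html scores higher against sky.html than against air.html, because it shares more terms with the page that links to it.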

8
Results 1
http://a.com/b/c.html
http://a.com/b/d/e.html
Distributions of URL match length.

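The slide does not define URL match length. One plausible reading, assumed here, is the number of leading components (host name, then each directory) that two URLs share, ignoring the final file name. The sketch below implements that reading; url_components and url_match_length are hypothetical names.

```python
from urllib.parse import urlsplit


def url_components(url):
    """Split a URL into [host, dir1, dir2, ...], dropping the file name."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Drop the last segment (file name, or empty after a trailing slash)
    # and any empty segments produced by leading or doubled slashes.
    dirs = [s for s in segments[:-1] if s]
    return [parts.netloc.lower()] + dirs


def url_match_length(a, b):
    """Count the leading components that the two URLs have in common."""
    length = 0
    for x, y in zip(url_components(a), url_components(b)):
        if x != y:
            break
        length += 1
    return length


print(url_match_length("http://a.com/b/c.html", "http://a.com/b/d/e.html"))
```

Under this assumed definition, the two URLs on the slide share the host a.com and the directory b, giving a match length of 2.
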
9
Results 2
Similarity of title, description, and
title+description compared to page text.

10
Results 3
Similarity of pages that are random, siblings,
and linked (same and different domains)

11
Results 4
Similarities between sibling pages as a function
of distance between referring URLs

12
Results 5
Similarities of anchor text to pages that are
random, siblings, and linked (same and different
domains)

13
Results 6
Similarities of anchor text to linked text as a
function of amount of anchor text
To order online, click here.

14
Summary
  • Empirical evidence that topical locality mirrors
    spatial locality in web pages
  • Anchor text amount does not greatly affect
    similarities
  • Title, description, and anchor text represent
    target page well