Title: Informetrics, Webometrics and Web Use metrics
1Informetrics, Webometrics and Web Use metrics
Huimin Lu 10/21/2004
2Outline
History
Article 1: Bibliometrics and the WWW
Article 2: Bibliometrics of the WWW
Article 3: Authoritative Sources
Article 4: ParaSite
Conclusion
3History
The term "bibliometrics" was introduced by Pritchard in 1969.
Pritchard's definition: the application of
mathematical and statistical methods to books and
other media of communication.
4A1 Bibliometrics and the World Wide Web By Don
Turnbull
- Bibliometrics
- Bibliometric laws
- Applying bibliometrics to the WWW
- Metrics design
5A1 Bibliometrics
Classic citation analysis
Refined classic bibliometrics
- Standard formula for impact: n journal citations / n citable articles published
- Basic formula for immediacy index of influence: n citations received by article during the year / total number of citable articles published
Bibliometric Coupling
- Measure the number of references two papers have in common to test for similarity
Cocitation Analysis
- Measure the relations between cited documents
Common Errors
- Multiple authors lost, self-citation, similar author names, human error, etc.
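The measures above are simple ratios and counts, so a small sketch may make them concrete. The function names and the toy citation data below are illustrative assumptions, not something taken from Turnbull's article.

```python
from itertools import combinations


def impact_factor(journal_citations, citable_articles):
    """Impact: n journal citations / n citable articles published."""
    if citable_articles <= 0:
        raise ValueError("citable_articles must be positive")
    return journal_citations / citable_articles


def immediacy_index(citations_during_year, citable_articles):
    """Immediacy: citations received during the year / citable articles."""
    if citable_articles <= 0:
        raise ValueError("citable_articles must be positive")
    return citations_during_year / citable_articles


def coupling_strength(refs_a, refs_b):
    """Coupling: number of references two papers have in common."""
    return len(set(refs_a) & set(refs_b))


def cocitation_counts(reference_lists):
    """For every pair of cited documents, count the papers citing both."""
    counts = {}
    for refs in reference_lists.values():
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


# Hypothetical toy data: citing paper -> documents it cites.
papers = {
    "P1": ["A", "B", "C"],
    "P2": ["B", "C", "D"],
    "P3": ["A", "C"],
}

print(impact_factor(120, 40))
print(coupling_strength(papers["P1"], papers["P2"]))
print(cocitation_counts(papers))
```

Coupling looks at what two citing papers share, while cocitation looks at how often two cited documents appear together; the same reference lists feed both.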
6A1 Bibliometric Laws
- Bradford's Law of Scattering
- - clustering method: $R a^n$ ($n$ from 0, $a < 1$), sum $R/(1-a)$
- Lotka's Law
- - inverse square
- Zipf's Law
- - familiar words with high frequency ($n$th word occurs $k/n$ times)
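A rough numerical sketch of the three laws, assuming the Bradford line describes a geometric series with first term R and ratio a (which is what gives the sum R/(1-a)). The function names and sample numbers are hypothetical.

```python
def zipf_frequency(rank, k):
    """Zipf: the nth most frequent word occurs about k/n times."""
    return k / rank


def lotka_authors(n, single_paper_authors):
    """Lotka (inverse square): authors with n papers ~ authors with one / n^2."""
    return single_paper_authors / n ** 2


def bradford_partial_sum(R, a, terms):
    """Sum of R * a^n for n = 0 .. terms-1."""
    return sum(R * a ** n for n in range(terms))


def bradford_limit(R, a):
    """Limit of the series R * a^n (n from 0) when 0 <= a < 1."""
    if not 0 <= a < 1:
        raise ValueError("a must satisfy 0 <= a < 1")
    return R / (1 - a)


# Hypothetical numbers, chosen only to show the shape of each law.
print(zipf_frequency(4, 1000))           # expected count of the 4th-ranked word
print(lotka_authors(2, 100))             # authors with two papers
print(bradford_partial_sum(10, 0.5, 60), bradford_limit(10, 0.5))
```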
7A1 Applying Bibliometrics to the Web
- Web surveys
- - Georgia Tech Graphics, Visualization, and Usability Web Surveys
- Web servers
- Add programming logic
- - Inaccurate data gathered: skipping standard procedures, missing state information between usage hits; server hits themselves don't represent true usage.
8A1 Metrics Design
Configure Web server to gather comprehensive metrics
Manage log files
- Enhance reliability: regular backups, store log file analysis results and logs, begin new logs on a regular schedule, post results and log information for comparison.
- Log analysis tools: Analog, WWWStat, GetStats, Perl scripts.
- Standardization: Extended Log File Format by the WWW Consortium Standards Committee
Downie's attempt: user-based, request-based, and byte-based analysis
Optimal Web content setup
External bibliometric gathering
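As an illustration of user-based, request-based, and byte-based counts, here is a minimal sketch that tallies hosts, requested paths, and bytes from server log lines. It assumes the widely used Common Log Format rather than the Extended Log File Format named above, and the sample lines are invented; it is not how Analog, WWWStat, or GetStats work internally.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [time] "METHOD path protocol" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)$'
)


def summarize(lines):
    """Return (hits per host, hits per path, bytes per path)."""
    hosts = Counter()
    requests = Counter()
    bytes_by_path = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        size = 0 if match["size"] == "-" else int(match["size"])
        hosts[match["host"]] += 1
        requests[match["path"]] += 1
        bytes_by_path[match["path"]] += size
    return hosts, requests, bytes_by_path


# Invented sample lines for illustration only.
sample = [
    '10.0.0.1 - - [21/Oct/2004:10:00:00 -0500] "GET /index.html HTTP/1.0" 200 2048',
    '10.0.0.2 - - [21/Oct/2004:10:00:05 -0500] "GET /index.html HTTP/1.0" 200 2048',
    '10.0.0.1 - - [21/Oct/2004:10:01:00 -0500] "GET /papers.html HTTP/1.0" 200 512',
    '10.0.0.1 - - [21/Oct/2004:10:02:00 -0500] "GET /missing.html HTTP/1.0" 404 -',
]

hosts, requests, bytes_by_path = summarize(sample)
print(hosts.most_common())
print(requests.most_common())
print(bytes_by_path.most_common())
```

Counting by host is only a stand-in for counting users: proxies and shared addresses blur the two, which is one reason raw server hits do not represent true usage.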
9A2 Bibliometrics of the World Wide Web: An
Exploratory Analysis of the Intellectual
Structure of Cyberspace By Ray R. Larson
- Analysis of 30 GB of Web pages collected by the Inktomi Web Crawler
- Cocitation analysis using the DEC AltaVista search engine
10A2 Growth and Usage of Web
WWW
11A2 Cocitation Analysis of Web
Attempt: Map the intellectual structure of the Web
Question: Can cocitation techniques be applied to charting the contents of cyberspace?
12A2 Methods
- Selection of core set of items for study
- Retrieval of cocitation frequency information
- Compilation of the raw cocitation frequency matrix
- Correlation analysis to convert the raw frequencies into correlation coefficients
- Multivariate analysis of the correlation matrix
- Interpretation of the resulting map and validation
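The correlation step can be sketched as follows: each item's row of raw cocitation counts is treated as a profile, and pairs of profiles are compared with Pearson's r. The 5x5 matrix is invented, and leaving out the two cells that involve the pair itself is one common convention assumed here, not necessarily the one Larson used. The multivariate step (for example clustering or scaling) is not shown.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(raw):
    """Convert a symmetric raw cocitation matrix into correlation coefficients."""
    size = len(raw)
    result = [[1.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            # Assumption: skip columns i and j so the diagonal and the
            # pair's own cocitation count do not distort the comparison.
            keep = [k for k in range(size) if k not in (i, j)]
            r = pearson([raw[i][k] for k in keep], [raw[j][k] for k in keep])
            result[i][j] = result[j][i] = r
    return result


# Invented raw cocitation frequencies for five core items A..E.
raw = [
    [0, 8, 7, 1, 0],  # A
    [8, 0, 6, 2, 1],  # B
    [7, 6, 0, 1, 1],  # C
    [1, 2, 1, 0, 9],  # D
    [0, 1, 1, 9, 0],  # E
]

corr = correlation_matrix(raw)
for row in corr:
    print([round(value, 2) for value in row])
```

In this toy matrix A, B, and C are cocited with the same partners and so correlate strongly, while D and E form a separate group.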
13A2 Results
14A3 Authoritative Sources in a Hyperlinked
Environment By Jon M. Kleinberg
A new method for automatically extracting certain
types of information about a hypermedia
environment from its link structure.
15A3 Goal
- Types of search query and their problems
- - Specific queries: scarcity problem
- - Broad-topic queries: abundance problem
- - Similar-page queries
- Synthesize the unreliable information contained
in the presence of individual links to provide a
set of authoritative pages relevant to an initial
query.
16A3 Common Approaches
Only S
- Define S to be the top k pages indexed by AltaVista
- Rank pages according to their in-degree
S -> T
- Define the same root set S
- Grow S to a larger base set T
- Rank pages by their in-degree
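A minimal sketch of the second approach, assuming the link graph is already available as a dictionary from each page to the set of pages it links to. The page names are hypothetical, and the sketch does not limit how many in-pointing pages are added per root page.

```python
def grow_base_set(root, links):
    """Grow root set S into base set T: add pages S points to and pages pointing into S."""
    root = set(root)
    base = set(root)
    for page in root:
        base |= set(links.get(page, ()))      # pages a root page links to
    for source, targets in links.items():
        if root & set(targets):               # pages that link into the root set
            base.add(source)
    return base


def rank_by_in_degree(pages, links):
    """Rank pages by the number of links they receive from other pages in the set."""
    pages = set(pages)
    in_degree = {page: 0 for page in pages}
    for source, targets in links.items():
        if source not in pages:
            continue
        for target in set(targets):
            if target in pages and target != source:
                in_degree[target] += 1
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical link graph: page -> pages it links to.
links = {
    "s1": {"a", "b"},
    "s2": {"a"},
    "h1": {"s1", "a", "b"},
    "h2": {"s2", "a"},
    "x": {"y"},
}

base = grow_base_set({"s1", "s2"}, links)
print(sorted(base))
print(rank_by_in_degree(base, links))
```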
17A3 Their Approach
Extract small core sets (a community of hubs and authorities) from T
Authoritative pages
- A novel type of quality measure of the document in hypermedia, obtained by algorithmic means
- Large in-degree; considerable overlap in the sets of pages that point to them
Hub pages
- Have links to multiple relevant authoritative pages
18A3 Algorithm and Output
Method: Iteratively propagates authority weight
and hub weight across links of the web graph,
converging simultaneously to steady states for
both types of weights.
Output: a pair of sets (X, Y), where X is a small
set of authorities and Y is a small set of hubs,
referred to by the author as a community of hubs
and authorities.
Claim: authoritative pages can be identified as
belonging to dense bipartite communities in the
link graph of the WWW via this algorithm.
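The iterative propagation described above can be sketched in a few lines. This is a simplified illustration on an invented graph with a fixed number of iterations, not Kleinberg's exact procedure; the names `hits` and `top` are assumptions.

```python
from math import sqrt


def normalize(weights):
    """Scale weights so their squares sum to 1 (unchanged if all are zero)."""
    norm = sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)
    return {page: w / norm for page, w in weights.items()}


def hits(links, iterations=50):
    """Propagate authority and hub weights across links; return (auth, hub)."""
    pages = set(links)
    for targets in links.values():
        pages |= set(targets)
    auth = {page: 1.0 for page in pages}
    hub = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority weight: sum of hub weights of the pages pointing to it.
        new_auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                new_auth[target] += hub[source]
        # Hub weight: sum of authority weights of the pages it points to.
        new_hub = {
            page: sum(new_auth[target] for target in links.get(page, ()))
            for page in pages
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


def top(weights, k):
    """Return the k pages with the largest weight."""
    return sorted(weights, key=lambda page: (-weights[page], page))[:k]


# Invented graph: three pages that link out, two pages that are linked to.
links = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
}

auth, hub = hits(links)
print("authorities X:", top(auth, 1))
print("hubs Y:", top(hub, 2))
```

Good hubs point to good authorities and good authorities are pointed to by good hubs, so the two weights reinforce each other until they settle.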
19A4 ParaSite: Mining Structural Information on
the Web By Ellen Spertus
- Varieties of link information on the Web
- How the Web differs from conventional hypertext
- How the links can be exploited to build useful applications
20A4 Classical Hypertext vs. Web
Classical hypertext
- Links don't cross site or even document boundaries
- Documents limited to a single topic
- A manual answers each question in exactly one place or in none
- Hardly changes
Web
- Links can cross site and document boundaries
- Multiple topics permitted in one Web page
- An answer could appear any number of times on the Web
- Constantly changing
21A4 Mining Links
Naïve Link Geometry
- A useful technique for finding pages on a given set of topics
Hypertext Links (example)
- Categorized into upward, downward, crosswise, and outward
Directory Links
- Directory structure relates pages in the absence of hypertext links
Structure within a Page
- A page can be considered a tree of nodes, each with attached text and links embedded in the text
Other
- Domain names, relationships between concepts represented by words and phrases, paths traveled through Web sites by visitors
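One way to picture the four hypertext link categories is to compare the host and directory of the source and target URLs. The rules below are an assumed reading (different host is outward, ancestor directory is upward, descendant directory is downward, anything else on the same host is crosswise) and may differ in detail from the definitions in Spertus's paper; the URLs are made up.

```python
import posixpath
from urllib.parse import urlparse


def directory_parts(path):
    """Split the directory portion of a URL path into its components."""
    directory = posixpath.dirname(path or "/")
    return [part for part in directory.split("/") if part]


def classify_link(source_url, target_url):
    """Classify a link as upward, downward, crosswise, or outward (assumed rules)."""
    source, target = urlparse(source_url), urlparse(target_url)
    if source.netloc.lower() != target.netloc.lower():
        return "outward"
    src = directory_parts(source.path)
    dst = directory_parts(target.path)
    if len(dst) < len(src) and src[:len(dst)] == dst:
        return "upward"      # target sits in an ancestor directory
    if len(dst) > len(src) and dst[:len(src)] == src:
        return "downward"    # target sits in a descendant directory
    return "crosswise"       # same host, neither above nor below


# Hypothetical URLs for illustration.
page = "http://example.edu/a/b/page.html"
print(classify_link(page, "http://example.edu/a/index.html"))
print(classify_link(page, "http://example.edu/a/b/c/deep.html"))
print(classify_link(page, "http://example.edu/c/other.html"))
print(classify_link(page, "http://other.example.org/"))
```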
22A4 Application
Finding Moved Pages
- Exploiting hyperlinks
- Exploiting directory links
Finding Related Pages
- Collaborative filtering
- Given a set of similar pages the user has already found, ParaSite can find the page (A) that has the most links to those pages and return the other pages referenced by A.
A Person Finder
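The related-pages idea can be sketched as a small search over a link table: pick the page A that links to the most pages the user already has, then return A's other links. The function name, the link table, and the tie-breaking rule are assumptions for illustration, not ParaSite's actual implementation.

```python
def find_related(known_pages, links):
    """Return (A, related pages): A links to the most known pages."""
    known = set(known_pages)
    best_page, best_overlap = None, 0
    for page in sorted(links):            # sorted so ties resolve predictably
        if page in known:
            continue
        overlap = len(known & set(links[page]))
        if overlap > best_overlap:
            best_page, best_overlap = page, overlap
    if best_page is None:
        return None, []
    return best_page, sorted(set(links[best_page]) - known)


# Hypothetical link table: page -> pages it references.
links = {
    "listA": ["p1", "p2", "p3", "p4"],
    "listB": ["p1", "p5"],
    "p1": ["p2"],
}

print(find_related(["p1", "p2"], links))
```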
23Conclusion
Information on the World Wide Web is growing
exponentially, and the Internet's architecture is
becoming more complicated. Applying bibliometrics to
the Web will help us control and manage Web
information wisely.
24Example of Hypertext Link