Webometrics - PowerPoint PPT Presentation

Slides: 37
Provided by: frl
Transcript and Presenter's Notes

Title: Webometrics


1

Webometrics

2
History
  • Since 2004, the Webometrics ranking has been
    published twice a year (January and July).
  • The ranking covers more than 16,000 higher
    education institutions.
  • The most recent ranking is the January 2009
    Edition.

3
Methodology
  • The unit of analysis is the institutional
    domain, so only universities and research centers
    with an independent web domain are considered.
  • University activity is multi-dimensional, so the
    ranking combines a group of web-presence
    indicators that measure these different aspects.

4
Indicators
  • Size: the number of pages in a domain (as
    recovered by search engines)
  • Visibility: the number of unique external links
    received by a domain
  • Rich Files: the number of files of certain file
    types in a domain
  • Scholar: the number of papers and citations in a
    domain

12
Metrics
  • For each indicator, the universities are ranked.
  • Then the ranks of four indicators are combined
    according to a formula as follows.
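
A minimal sketch of these two steps in Python. The combining formula itself does not appear in this transcript, so the weights below are illustrative placeholders only, not the Webometrics weights. The helper names `rank_by` and `combine_ranks`, the domains, and the indicator values are all hypothetical.

```python
def rank_by(values):
    """Map each institution to its rank (1 = largest value) for one indicator."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def combine_ranks(indicator_values, weights):
    """Weighted sum of per-indicator ranks; a lower score is better."""
    ranks = {ind: rank_by(vals) for ind, vals in indicator_values.items()}
    names = next(iter(indicator_values.values())).keys()
    return {
        name: sum(weights[ind] * ranks[ind][name] for ind in weights)
        for name in names
    }


# Made-up indicator values for three hypothetical domains.
indicator_values = {
    "size": {"a.edu": 900, "b.edu": 500, "c.edu": 100},
    "visibility": {"a.edu": 40, "b.edu": 70, "c.edu": 10},
}
# Placeholder weights, NOT the formula from the slide.
weights = {"size": 0.25, "visibility": 0.75}

scores = combine_ranks(indicator_values, weights)
final_order = sorted(scores, key=scores.get)  # best institution first
```

With these placeholder weights, b.edu comes out ahead of a.edu because its better visibility rank outweighs its lower size rank.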

13
Verifiable Data
  • The only source for the data of this ranking is a
    small set of globally available, free access
    search engines.
  • All the results can be duplicated according to
    the described methodologies, taking into account
    the explosive growth of the web contents, their
    volatility and the irregular behavior of the
    commercial engines.

14
Bad Practices
  • Using link farms and paid backlinks to improve
    position in this ranking is not acceptable.
  • Institutions involved have no place in this
    ranking and will not be classified in future
    editions.
  • Random checks are made to ensure the correctness
    of the data obtained.

15
Ranking of Interests
  No.   Rank  Institution                            Size  Visibility  Rich files  Scholar
    1     55  National_Taiwan_University              116          87          46       13
    2    179  National_Chiao_Tung_University           90         178         171      590
    3    273  National_Taiwan_Normal_University       211         270         400      527
    4    274  National_Cheng_Kung_University          316         421         235       53
    5    282  National_Sun_Yat-Sen_University         333         405         348       28
    6    308  National_Tsing_Hua_University_Taiwan    161         502         220      328
    7    370  National_Central_University             427         576         355       30
    8    384  National_Chung_Cheng_University         321         446         390      644
    9    391  National_Chengchi_University            336         414         492      675
   10    491  Tamkang_University                      461         563         806      550
   11    529  I-Shou_University                       318         855         397      469
   12    564  National_Chung_Hsing_University         448         713         529      861

We need to work on size, visibility, and rich
files, while keeping our strength in scholar.
16
Ranking of Interests
  No.   Rank  Institution                            Size  Visibility  Rich files  Scholar
    1    659  Providence_University                   543       1,067         600      280
    2    716  Fu_Jen_Catholic_University              622         848         610    1,321
    3    748  Feng_Chia_University                    616       1,191         959      156
    4    772  Yuan_Ze_University                      409       1,021         574    1,564
    5    836  NTUST                                 1,068       1,325         563      120
    6    851  Shih_Hsin_University                    617       1,356         948      336
    7    896  Tunghai_University                      418       1,235         810    1,470
    8    905  National_Dong_Hwa_U                     816       1,544         580      200
    9    914  Soochow_University_Taiwan               560       1,393       1,042      665
   10    921  Chaoyang_University_of_T                885       1,500         803      174
   11    924  NYUST                                   719       1,470       1,212      109

17
URL Naming
  • Each institution should choose a unique
    institutional domain that can be used by all the
    websites of the institution.
  • Avoid changing the institutional domain as it has
    a devastating effect on the visibility values.
  • The alternative or mirror domains should be
    disregarded.
  • Use well-known acronyms.
  • Consider including a descriptive word, such as
    the name of the city, in the domain name.
  • Replace URLs that use a bare IP address with a
    domain name.

18
Content: Create
  • Allow a large proportion of staff, researchers or
    graduate students to be potential authors.
  • Individual persons or teams should maintain their
    own websites.
  • Libraries, documentation centers and similar
    services can be responsible for large databases,
    including bibliographic ones, and large
    repositories (theses, pre-prints, and reports).
  • Hosting external resources can be of interest to
    third parties and increase visibility: conference
    websites, software repositories, scientific
    societies and their publications, especially
    electronic journals.

19
Content: Convert
  • Important resources available in non-electronic
    format can be converted to web pages easily.
  • Most of the universities have a long record of
    activities that can be published in historical
    web sites.
  • Other candidates for conversion include past
    activity reports and picture collections.

20
Interlinking
  • Measuring and classifying the links from others
    can be insightful.
  • You should expect links from your natural
    partners:
    • your locality or region
    • similar organizations
    • portals covering your topics
    • colleagues' or partners' personal pages
  • Make an impact in your common language community.
  • Check for orphaned pages, i.e. pages that no
    other page links to.
  • Most popular pages or directories are relevant.
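
The orphaned-page check can be sketched as follows, assuming the site's internal links have already been collected into a mapping from each page to the pages it links to. The function name `find_orphans`, the `entry_points` parameter, and the sample paths are hypothetical.

```python
def find_orphans(links, entry_points):
    """Return pages that no other page links to, excluding known entry points."""
    linked = {
        target
        for source, targets in links.items()
        for target in targets
        if target != source  # a self-link does not rescue a page
    }
    return sorted(
        page for page in links
        if page not in linked and page not in entry_points
    )


# Hypothetical site map: page -> pages it links to.
links = {
    "/": ["/research", "/library"],
    "/research": ["/", "/library"],
    "/library": ["/"],
    "/old-report": ["/"],
}
orphans = find_orphans(links, entry_points={"/"})
```

Here `/old-report` is reported as orphaned: it links out to the home page, but nothing links to it, so a crawler starting from the home page would never reach it.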

21
Language
  • The WWW audience is truly global, so one should
    not think locally.
  • Language versions, especially in English, are
    mandatory not only for the main pages, but for
    certain selected sections such as scientific
    documents.

22
Rich Files
  • Although HTML is the standard format of web
    pages, sometimes it is better to use rich file
    formats.
  • Provide versions in different formats.

23
Search Engine Issues
  • Use a search-engine-friendly design.
  • Avoid cumbersome navigation menus based on Flash,
    Java or JavaScript that can block robot access.
  • Deeply nested directories or complex interlinking
    can block robots too.
  • Databases and even highly dynamic pages can be
    invisible to some search engines, so use
    directories or static pages instead, or offer
    them as an alternative.
  • Plain is good.

24
Archiving
  • Maintain a copy of old or outdated material on
    the site.
  • Archive media materials in web repositories.
    Collections of videos, interviews, presentations,
    animated graphs, and even digital pictures could
    be very useful in the long term.

25
Standards for Sites
  • The use of meaningful titles and descriptive
    meta-tags can increase the visibility of the
    pages.
  • Add authoring info, keywords and other data about
    the web sites.

26
Challenge
  • If the web performance of an institution is below
    the position expected from its academic
    excellence, university authorities should
    reconsider their web policy and promote
    substantial increases in the volume and quality
    of their electronic publications.
  • Again, NSYSU needs to improve on size,
    visibility, and rich files, while keeping the
    strength in scholar.

30
Experiments
  • For each institutional domain, we collect the
    data from search engines, per the description of
    methodology.
  • Then we compare our ranking against the
    Webometrics ranking.
  • We need to verify whether our data agree with
    theirs. They may not agree exactly, but we can
    evaluate the correlation.

31
Size
  • Number of pages recovered from four engines:
    Google, Yahoo, Live Search and Exalead
  • For each engine, results are log-normalized to 1
    for the highest value.
  • For each domain, maximum and minimum results are
    excluded.
  • An institution is assigned a rank according to
    the combined sum.
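
The Size procedure can be sketched in Python as below. The slide does not give the exact normalization formula, so dividing log(1 + count) by log(1 + highest count) is an assumption, as are the function names, the engine labels, and the made-up page counts.

```python
import math


def log_normalize(counts):
    """Scale one engine's counts so the largest maps to 1.

    Assumed form: log(1 + x) / log(1 + max). Requires a non-zero maximum.
    """
    top = math.log1p(max(counts.values()))
    return {domain: math.log1p(c) / top for domain, c in counts.items()}


def size_scores(counts_by_engine):
    """Per domain: drop the highest and lowest normalized value, sum the rest."""
    normalized = [log_normalize(c) for c in counts_by_engine.values()]
    domains = next(iter(counts_by_engine.values())).keys()
    scores = {}
    for domain in domains:
        values = sorted(n[domain] for n in normalized)
        scores[domain] = sum(values[1:-1])  # exclude minimum and maximum
    return scores


def rank(scores):
    """Rank 1 goes to the highest combined score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {domain: i for i, domain in enumerate(ordered, start=1)}


# Made-up page counts for three hypothetical domains on four engines.
counts_by_engine = {
    "engine_a": {"x.edu": 120000, "y.edu": 45000, "z.edu": 900},
    "engine_b": {"x.edu": 80000, "y.edu": 60000, "z.edu": 1500},
    "engine_c": {"x.edu": 200000, "y.edu": 30000, "z.edu": 700},
    "engine_d": {"x.edu": 95000, "y.edu": 52000, "z.edu": 1100},
}
size_rank = rank(size_scores(counts_by_engine))
```

Dropping the two extreme values per domain limits the effect of a single engine that over- or under-reports a site, which matters given the irregular behavior of commercial engines noted earlier.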

32
Visibility
  • The total number of unique external links
    received by a site
  • Data gathered from Yahoo, Live and Exalead
    (Google excluded)
  • For each engine, results are log-normalized to 1
    for the highest value.
  • An institution is assigned a rank according to
    the combined sum.

33
Rich Files
  • Four different file formats:
    • Adobe Acrobat (pdf)
    • Adobe PostScript (ps)
    • Microsoft Word (doc)
    • Microsoft PowerPoint (ppt)
  • Data (number of files) are extracted using Google
  • Merging the results for each file type after
    log-normalization, in the same way as described
    before

34
Scholar
  • Google Scholar provides the number of papers and
    citations for each academic domain.
  • These results from the Scholar database represent
    papers, reports and other academic items.

35
Number of Swaps
  • For two rankings of domains (institutions), say r
    and s, the number of swaps needed to bring
    ranking r to s is defined computationally:
    1. If the top-ranked domain in s, say x, ranks
       5th in r, then 4 swaps are needed to bring x
       to the top.
    2. Find the second-ranked domain of s in r and
       bring it to second place.
    3. Continue until the entire order is correct.
    4. Accumulate the number of swaps, N.
  • Smaller N is better.
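
A minimal Python sketch of this procedure, assuming each ranking is a list of domain names ordered from best to worst and that both lists contain the same domains. The function name `count_swaps` and the sample domains are hypothetical.

```python
def count_swaps(r, s):
    """Number of adjacent swaps needed to reorder ranking r into ranking s."""
    if sorted(r) != sorted(s):
        raise ValueError("both rankings must contain the same domains")
    work = list(r)
    swaps = 0
    for target_pos, domain in enumerate(s):
        current_pos = work.index(domain)
        # Bubbling the domain up to its target position costs one
        # adjacent swap per place moved.
        swaps += current_pos - target_pos
        work.insert(target_pos, work.pop(current_pos))
    return swaps


# The top domain of s ("a") sits 5th in r, so 4 swaps bring it to the top;
# after that the remaining order already matches.
r = ["b", "c", "d", "e", "a"]
s = ["a", "b", "c", "d", "e"]
n_swaps = count_swaps(r, s)
```

For the example above the count is 4, matching the first step described on the slide. A fully reversed ranking of 4 domains needs 3 + 2 + 1 = 6 swaps.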

36
Test 1: 03/27/2009
  • Scholar (n = 23): N = 17
  • Size (n = 23): N = 28
  • Rich files (n = 23): N = 62
  • Scholar (n = 100): N = 555
  • Note the worst-case scenario is n(n-1)/2 swaps,
    and a random ranking is around n(n-1)/4.
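
To put these counts in context, the two reference values can be computed directly. For n = 23 the worst case is 253 swaps and a random ranking averages about 126.5; for n = 100 the figures are 4950 and 2475. All four reported values of N fall well below the random baseline. The sketch below, with hypothetical helper names, prints each result as a share of the worst case.

```python
def worst_case_swaps(n):
    """Swaps needed when one ranking is the exact reverse of the other."""
    return n * (n - 1) // 2


def expected_random_swaps(n):
    """Average swap count for a uniformly random ranking."""
    return n * (n - 1) / 4


# (indicator, n, N) as reported for Test 1.
results = [
    ("Scholar", 23, 17),
    ("Size", 23, 28),
    ("Rich files", 23, 62),
    ("Scholar", 100, 555),
]

for label, n, swaps in results:
    share = swaps / worst_case_swaps(n)
    print(f"{label} (n = {n}): N = {swaps}, "
          f"{share:.1%} of worst case, "
          f"random baseline {expected_random_swaps(n)}")
```

The swap count used here equals the number of pairs of domains that the two rankings order differently, a quantity often called the Kendall tau distance, so dividing by n(n-1)/2 gives a value between 0 (identical rankings) and 1 (reversed rankings).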