Title: International Accesses to a Digital Library of ETDs


1
ETD 2005
International Accesses to a Digital Library of ETDs
2
ETD 2005
Ana Pavani
Departamento de Engenharia Elétrica
Pontifícia Universidade Católica do Rio de Janeiro
apavani_at_lambda.ele.puc-rio.br
http://www.maxwell.lambda.ele.puc-rio.br/
3
  • Presentation outline
  • Profile of the digital library
  • Generation of data
  • Combination and analysis of data: interesting
    results
  • Next steps

4
  • Profile of the digital library
  • Beginning of the collection: 2nd semester of
    1995
  • Items to start the collection: courseware
    (texts, exercises, technical manuals, tests, etc.)

5
  • The digital library is part of a system that:
  • Is an LMS (Learning Management System)
  • Has administrative functions that allow data
    exchange with the university's administrative
    system
  • Is linked (in both directions) to CNPq's Lattes
    Platform (a curricula database with more than
    595,000 CVs)
  • Allows the control of series collections
  • Is multilingual and has interfaces in 3 languages

6
  • Evolution of the collection
  • Administrative documents
  • Preprints, published papers and online articles
  • Interactive courseware
  • ETDs (2000)
  • Online journals (2003)
  • Senior projects (2003)
  • Online bulletins distributed through mailing
    lists, archived and published automatically
    (2004)
  • Books (Oct. 2005)

7
  • Numbers of titles in the collection
  • Courseware (many types): 2,700
  • Administrative documents: 33
  • Technical documents: 94
  • ETDs: 1,873 (PUC-Rio) and 31 (UNICAP)
  • Preprints, published papers and online articles:
    280
  • Senior projects: 305
  • Online journals: 3 (plus 1 in Oct. 2005 and 1 in
    Dec. 2005)
  • Online bulletins: 2
  • Books: 1 (to be published in Oct. 2005)
  • Total number of digital objects (DOs): 16,400

8
  • Technological characteristics
  • Machine IBM RS/6000
  • Operating system IBM AIX
  • Web server Apache
  • DBMS IBM DB2
  • The Apache log contains information on accesses
    to ALL digital contents on the system, as well as
    all transactions that users perform (clicking
    buttons, reading posts, reading help pages, etc.).
    Data on transactions with contents must therefore
    be extracted from the server log to generate the
    numbers to be analyzed.

9
  • Generation of data
  • Data are of 2 different kinds: production and
    accesses
  • Production data come from functions of the system
    that are not related to the Apache server but
    only to the DB

Example:
10
Notes: PUC-Rio started requiring ETDs in Aug.
2002. UNICAP does not require ETDs.
11
  • Access data are obtained from both the Apache
    Server log and the DB
  • Logs are mined (according to the following
    definitions) and the results are stored on the DB
  • Mined data are combined with production data
    (metadata) already in the database (types of
    contents, authors, programs, areas of knowledge,
    dates, countries, etc.) to yield results

12
  • Definitions for mining the log
  • When access statistics came under discussion, it
    was necessary to define how data should be mined
    from the log and how they should be combined
    afterwards
  • The definitions follow: (M) mining definitions
    and (C) combining definitions

13
  • (M) Visits and complete visits
  • An ETD can have one or many digital objects. The
    number of visits is the sum of all accesses to
    all digital objects in a given month. A complete
    visit is a set of visits to all digital objects
    from a country in a given month.
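
The two counts defined on this slide can be illustrated with a minimal Python sketch. The record layout (ETD id, digital object id, country, access count for one month) and the sample values are hypothetical. Reading the number of complete visits as the smallest count over all digital objects of an ETD from one country is an assumption made for illustration, not the system's documented method.

```python
from collections import defaultdict

def monthly_visits(rows):
    """Sum the accesses to all digital objects of each ETD (one month of data)."""
    totals = defaultdict(int)
    for etd, _do, _country, count in rows:
        totals[etd] += count
    return dict(totals)

def complete_visits(rows, objects_per_etd):
    """Per (ETD, country): how many times every digital object was accessed.

    Assumption: a complete visit needs one access to each digital object,
    so the count is the minimum over the ETD's objects.
    """
    per_object = defaultdict(lambda: defaultdict(int))
    for etd, do, country, count in rows:
        per_object[(etd, country)][do] += count
    result = {}
    for (etd, country), counts in per_object.items():
        result[(etd, country)] = min(
            counts.get(do, 0) for do in objects_per_etd[etd]
        )
    return result

# Hypothetical sample: one ETD made of two digital objects.
rows = [
    ("etd-1", "ch1.pdf", "BR", 5),
    ("etd-1", "ch2.pdf", "BR", 3),
    ("etd-1", "ch1.pdf", "PT", 2),
]
objects_per_etd = {"etd-1": ["ch1.pdf", "ch2.pdf"]}

visits = monthly_visits(rows)
complete = complete_visits(rows, objects_per_etd)
```

With this sample, the ETD has 10 visits in the month, 3 complete visits from BR and none from PT, because the second object was never accessed from PT.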

14
  • (M) Country x IP address
  • The decision to use the country and not the IP
    address to establish a visit was based on the
    fact that the visits to an ETD can be made at
    different times (and reconnecting may assign a
    new IP address) and from different locations
    (with fixed IP addresses).

15
(M) Counting visits from the same IP address
Visits from the same IP address are counted
individually because a network with many machines
can appear under the single IP address of its
firewall.
16
(M) Counting visits to restricted digital objects
Some ETDs are totally or partially restricted;
approximately 30 have some type of permanent
or temporary restriction. Metadata, abstracts
included, are publicly available for all of them.
It was decided that attempts followed by denials
of access would be counted as accesses.
Note: this is explained in the help pages of the
system, where it is suggested that authors
consider allowing their contents to become public
if many attempts occur.
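
The counting rule for restricted objects could be sketched as below. The slides do not say how a denial appears in the log; this sketch assumes it is reported with HTTP status 401 or 403 and that a delivered object is reported with 200, 206 or 304. The function name and the sample statuses are hypothetical.

```python
# Assumption: denials are logged as 401/403, deliveries as 200/206/304.
DELIVERED = {200, 206, 304}
DENIED = {401, 403}

def counts_as_access(status: int) -> bool:
    """Return True if a request with this HTTP status is counted as an access."""
    return status in DELIVERED or status in DENIED

# Hypothetical statuses taken from five log lines.
statuses = [200, 403, 404, 401, 500]
counted = sum(counts_as_access(s) for s in statuses)
```

Here three of the five requests are counted: the delivered one and the two denied attempts.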
17
(C) Lines to mine
Since the interest was in access to digital
objects, the decision was to get the lines with
extensions .dcr, .doc, .htm, .pdf, etc. All
extensions present in the database are considered,
as long as the corresponding item is cataloged in
the digital library, so that an occasional static
HTML system page is not counted.
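
A minimal sketch of this filter follows. The extension set holds only the four extensions named on the slide (the "etc." is not filled in), and the catalog is represented by a hypothetical set of paths; in the real system both would come from the database.

```python
import os

# Only the extensions named on the slide; the real list comes from the DB.
MINED_EXTENSIONS = {".dcr", ".doc", ".htm", ".pdf"}

def should_mine(path, cataloged_paths):
    """Keep a log line only if its extension is mined AND the item is cataloged."""
    extension = os.path.splitext(path)[1].lower()
    return extension in MINED_EXTENSIONS and path in cataloged_paths

# Hypothetical catalog with a single digital object.
cataloged = {"/etd/1234/chapter1.pdf"}

keep_pdf = should_mine("/etd/1234/chapter1.pdf", cataloged)
keep_help_page = should_mine("/help/index.htm", cataloged)
keep_image = should_mine("/img/logo.gif", cataloged)
```

The static help page has a mined extension but is not cataloged, so it is skipped, which is the case the slide describes.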
18
  • Observations
  • Statistics were planned on a monthly basis. The
    model treats data as sequences of points at
    discrete-time intervals of one month. Data for
    past months are unchanged and the current month
    is updated according to the Update definition.
  • IP addresses are resolved to countries using a
    plug-in called GeoIP Free that is available with
    AWStats.

19
(C) Information to get from a log line
The month and the year are extracted, along with
the identification of the digital object and the
country of the IP address that accessed the
digital object.
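
As an illustration, the extraction could look like the sketch below. It assumes the log uses the Apache Common Log Format and uses the request path as the identification of the digital object. `lookup_country` is a hypothetical stand-in for the GeoIP resolution step; the IP address, its country and the sample line are invented for the example.

```python
import re
from datetime import datetime

# Assumption: Apache Common Log Format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def lookup_country(ip):
    """Hypothetical stand-in for the GeoIP lookup (mapping is invented)."""
    return {"192.0.2.10": "BR"}.get(ip, "unknown")

def extract(line):
    """Return (year, month, digital object path, country), or None if unparsable."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return (stamp.year, stamp.month, match.group("path"),
            lookup_country(match.group("ip")))

line = ('192.0.2.10 - - [28/Sep/2005:10:15:02 -0300] '
        '"GET /etd/1234/chapter1.pdf HTTP/1.1" 200 51234')
record = extract(line)
```

The day, time and client details are discarded on purpose, since the statistics are monthly and per country.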
20
(C) Update of the DB
The lines are read every hour on the hour (00:00,
01:00, etc.); only incremental lines are mined.
Accesses are summed for each
month-year-DO-country combination, so the table is
not very big: in the first 6 months of 2005 the
average number of lines per month was 10,000.
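
The hourly update can be sketched as follows, under the assumption that the miner remembers how many log lines it has already processed. The in-memory `Counter` stands in for the DB table, and the record tuples (year, month, digital object, country) are hypothetical.

```python
from collections import Counter

def new_lines(all_lines, already_mined):
    """Return the lines added since the last run and the new position."""
    return all_lines[already_mined:], len(all_lines)

def add_accesses(table, records):
    """Sum accesses per (year, month, digital object, country) key."""
    for key in records:
        table[key] += 1
    return table

table = Counter()  # stands in for the DB table
log = [(2005, 9, "do-1", "BR"), (2005, 9, "do-1", "BR")]

# First hourly run: both records are new.
batch, mined = new_lines(log, 0)
add_accesses(table, batch)

# One more access arrives; the next run mines only that record.
log.append((2005, 9, "do-1", "PT"))
batch, mined = new_lines(log, mined)
add_accesses(table, batch)
```

Because each key holds a running sum, the table grows with the number of distinct month-year-DO-country combinations rather than with the number of log lines.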
21
(C) When to start computing
The log of the Apache Server started being saved
on Jun 01, 2004. So either this date could be used
or a later one, for example Jan 01, 2005. The
decision was to use all available monthly logs.
When the process started, some days of offline
processing were required. Afterwards, updating
became automatic according to the Update
definition.
22
  • Observations
  • Maybe these were not the best definitions; we
    are willing to discuss alternatives!
  • The (original) logs are stored and saved offline
    in case some change in the mining strategy is
    decided (we have not sunk the ships!).

23
  • Definitions for computing statistics
  • By author
  • Visited ETDs by year, month and country
  • Visited ETDs by country, month and year
  • 25 most visited ETDs (on the system: PUC-Rio and
    UNICAP)
  • 20 most visited ETDs by institution

24
  • 10 most visited ETDs by graduate program
  • Visited ETDs by institution, program, year and
    month
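
The "most visited" rankings listed above could be computed along these lines. This is a sketch only: the function names, the ETD ids, the visit totals and the institution mapping are all hypothetical, and ties are broken by ETD id simply to make the order deterministic.

```python
from collections import defaultdict

def most_visited(visits, n):
    """Return the n ETDs with the most visits, as (ETD id, visits) pairs."""
    ranked = sorted(visits.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]

def most_visited_by_group(visits, group_of, n):
    """Rank ETDs inside each group (institution, graduate program, etc.)."""
    grouped = defaultdict(dict)
    for etd, total in visits.items():
        grouped[group_of[etd]][etd] = total
    return {group: most_visited(members, n) for group, members in grouped.items()}

# Hypothetical totals and institutions.
visits = {"etd-a": 120, "etd-b": 300, "etd-c": 45}
institution = {"etd-a": "PUC-Rio", "etd-b": "PUC-Rio", "etd-c": "UNICAP"}

top_two = most_visited(visits, 2)
top_by_institution = most_visited_by_group(visits, institution, 1)
```

The same grouping function serves the per-institution and per-program lists; only the mapping passed as `group_of` changes.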

25
Initial Results
26
Access to ETDs is increasing (Sep 28, 2005)
May to Sep: ETDs +13%, accesses +54.6%
27
Number of total visits is increasing (Sep 28,
2005)
May to Sep: ETDs +13%, accesses +54.6%
28
Accumulated average total visits is increasing
(Sep 28, 2005)
May to Sep: ETDs +13%, accesses +54.6%
29
Brazil accounts for 55% of the accesses since Jun
01, 2004 (Sep 28, 2005)
But Brazil + pt-speaking + es-speaking countries: 75%
Brazil + US + pt-speaking + es-speaking countries: 87%
30
On Jun 15, 2007 the numbers of ETDs in Iberian
languages on the NDLTD DB were
Brazilian ETDs were 83% of all ETDs in Iberian
languages (total number: 13,369)
31
Percentage of visits from Brazil is decreasing
(Sep 28, 2005)
32
Accumulated percentage averages of visits from
Brazil (Sep 28, 2005)
33
Total accesses: top 10 countries (Sep 28, 2005)
Identified countries: 122, plus unidentified
countries, satellite access and host
34
  • Some interesting results
  • Some ETDs are permanent best sellers
  • They are on specific subjects (examples: a
    specific philosopher and the history of modern
    architecture in Brazil)
  • They are linked from sites on the subjects
    (examples: the first from the US and Brazil, and
    the second from Germany)
  • They are accessed from different countries
  • Some topics are permanent best sellers
    (example: energy)

35
  • Some ETDs are temporary best sellers; this
    seems to happen when they are displayed by the
    "last published ETDs" functions (system and
    graduate program)
  • Some graduate programs are permanent best
    sellers
  • They research topics that are very specific to
    the country (examples: education and history of
    culture)
  • They are indexed in other sites and/or digital
    libraries (example: Universia in Spain for
    social sciences and humanities)
  • They are accessed from different countries

36
The 25 most visited ETDs have a large number of
visits
No average is lower than 100 visits per month
37
  • Next steps
  • Find out how readers got to ETDs (BDTD, NDLTD,
    SCIRUS, etc.); an online survey is planned
  • Interview faculty to check if some ETDs are
    recommended reading in courses
  • Gather more data and analyze in a more
    scientific manner (must find a student!!)

38
  • Develop additional functions comparing accesses
    with production
  • Extend to other digital contents (at the moment
    only ETDs and online journals have access
    statistics)

39
Thank you! Muito obrigada!