Prepared by: Stephen Edmonds December 2004 - PowerPoint PPT Presentation

1
Prepared by Stephen Edmonds, December 2004
  • Developing the Monash Research Directory

2
What is it?
  • A searchable web-based directory of research
    publications and researchers at Monash
    University.
  • Developed using perl and open source modules.

3
Search form
4
Author search results
5
Publication search results
6
Author details
7
Publication details
8
Why?
  • Each year the research activities at Monash
    University produce a significant amount of output
    in the form of
  • Journal articles
  • Books
  • Conference papers
  • and more
  • Unfortunately only a limited number of people are
    aware of the full range of output.

9
Why?
  • A publicly available directory could potentially
    raise the profile of research activities at the
    University.
  • Additionally the Monash Research Directory would
    be the first of a series of research oriented
    tools for
  • Researchers at Monash
  • People interested in research

10
Initial requirements
  • Publicly available through the Monash website.
  • Restricted access interface through the my.monash
    staff and student portal.
  • Utilise existing information from systems around
    the University.
  • Present the most up-to-date information possible.
  • Only display research output generated by current
    staff members of the University.

11
Research Master
  • A commercial product used to track research
    activities around the University.
  • Information regarding the research activities is
    entered by representatives from each faculty
    within the University.
  • Within Research Master one module contains
    details of the research output.

12
Research Master
  • and another contains details of the authors of
    the research output.
  • 30,000 publications covering 8 years.
  • 25,000 distinct authors.
  • The information is stored in an Oracle database
    for use with a client application.

13
Monash Directory Service
  • Contains an entry for each current student or
    member of staff of the University.
  • Automatically updated from a number of sources
    such as the payroll system or the internal
    telephone directory.
  • Staff members have the ability to enter
    additional information into their entry such as
  • Research interests
  • Professional associations
  • Biography
  • Photograph (as a JPEG)
  • A standard LDAP service.

14
Public Monash website
  • A farm of Linux boxes running Apache web servers.
  • Perl CGI is one of many technologies available.

15
my.monash portal
  • An integrated view of the University for both
    staff members and students.
  • Uses HTML::Mason, a dynamic web site authoring
    system written in perl.

16
The problem so far
  • Two backend systems
  • Research Master (Oracle database)
  • Monash Directory Service (LDAP service)
  • Two frontend environments
  • my.monash portal (perl through HTML::Mason)
  • Public website (perl CGI)

17
The problem so far
  • Some kind of glue is required between these four
    systems

18
And the answer was
  • A module or set of modules.
  • Written in perl.

19
But how?
  • The preliminary analysis showed that an author
  • Has a variety of details.
  • Relates to one or more publications.
  • While a publication
  • Has a variety of details.
  • Relates to one or more authors.

20
But how?
  • This data can be represented by a simple
    hierarchy
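
A minimal Perl sketch of this hierarchy, assuming hypothetical Author and Publication classes whose method names (name, publications, add_publication, title, authors, add_author) are borrowed from the usage code in these slides; the real class internals are not shown in the presentation. Each object keeps a list of the objects it relates to, so the many-to-many link can be followed in both directions.

```perl
use strict;
use warnings;

# Hypothetical author class: holds its own details plus a list of publications.
package Author;

sub new {
    my ($class, %args) = @_;
    return bless { %args, publications => [] }, $class;
}
sub name           { $_[0]{name} }
sub employeenumber { $_[0]{employeenumber} }
sub publications   { @{ $_[0]{publications} } }
sub add_publication {
    my ($self, $publication) = @_;
    push @{ $self->{publications} }, $publication;
}

# Hypothetical publication class: holds its own details plus a list of authors.
package Publication;

sub new {
    my ($class, %args) = @_;
    return bless { %args, authors => [] }, $class;
}
sub title   { $_[0]{title} }
sub authors { @{ $_[0]{authors} } }
sub add_author {
    my ($self, $author) = @_;
    push @{ $self->{authors} }, $author;
}

package main;

# Link one author to two publications, and each publication back to the author.
my $author = Author->new(name => 'John Smith', employeenumber => 12345);
my @pubs   = map { Publication->new(title => $_) } ('Paper A', 'Paper B');
for my $publication (@pubs) {
    $author->add_publication($publication);
    $publication->add_author($author);
}

print $author->name(), "\n";
print "  ", $_->title(), "\n" for $author->publications();
```

The two-way links form reference cycles, so in a long-running process these objects would not be freed by reference counting alone; weakening one direction with the weaken function from Scalar::Util is one common remedy.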

21
But how?
  • This complete encapsulation of business logic
    within classes means that the usage code is
    simply

my $research = MonashResearchDirectory->new( ... );
if ($research->search('name' => 'john smith')) {
    foreach my $author ($research->authors()) {
        print $author->name(), "\n";
        foreach my $publication ($author->publications()) {
            print $publication->title(), "\n";
        }
    }
}

22
Publication data issues
  • The data contained within the Monash Directory
    Service is clearly defined.
  • However the data stored in Research Master for a
    publication can vary from category to category
  • and even from year to year.

23
Publication data issues
24
Publication data issues
  • A solution was to retrieve the field labels from
    the database and then generalise the access
    methods on the publication class

foreach my $field ($publication->fields()) {
    my ($label, $value) = $publication->field($field);
    if ($value) {
        print $label, "\t", $value, "\n";
    }
}
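
One way the generic accessors could work, sketched in Perl under the assumption that each publication carries an ordered list of (field, label) pairs read from Research Master alongside its values. The storage layout, the field names and the sample data are hypothetical; the loop at the end is the display code from the slide.

```perl
use strict;
use warnings;

# Hypothetical publication class with generic accessors instead of one
# method per field, since labels vary by category and year.
package Publication;

sub new {
    my ($class, %args) = @_;
    # labels: ordered [field, label] pairs, e.g. as read from the database
    # values: field => value
    return bless { labels => $args{labels}, values => $args{values} }, $class;
}

# Field keys in display order.
sub fields { map { $_->[0] } @{ $_[0]{labels} } }

# Returns (label, value) for one field; the value may be undefined.
sub field {
    my ($self, $field) = @_;
    my ($pair) = grep { $_->[0] eq $field } @{ $self->{labels} };
    return unless $pair;
    return ($pair->[1], $self->{values}{$field});
}

package main;

my $publication = Publication->new(
    labels => [ [ title => 'Title' ], [ journal => 'Journal name' ], [ pages => 'Pages' ] ],
    values => { title => 'An example title', journal => 'An example journal' },
);

# The display loop from the slide, printing only fields that have a value.
my @printed;
foreach my $field ($publication->fields()) {
    my ($label, $value) = $publication->field($field);
    if ($value) {
        print $label, "\t", $value, "\n";
        push @printed, $label;
    }
}
```

Because the labels travel with the data, a new category or a relabelled field in a later year needs no new accessor method.
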
25
Internals
  • As already stated, encapsulating as much
    business logic as possible in the classes
    means that the CGI script and HTML::Mason
    components become trivial.
  • At first it appeared to be the opposite case for
    the internals of the classes
  • however it fortunately did not become as
    complicated as feared.

26
Publication title search
  • A walkthrough of some of the interesting parts of
    the publication title search process when the
    following call is made:

$research->search('name' => 'john smith');
27
Querying Research Master
  • Simplified by being able to query the backend
    Oracle database directly.
  • A compromise between performance and maintenance
    resulted in a single SQL query.
  • Unfortunately information is now duplicated in
    the results

28
Querying Research Master
  • which can be selectively ignored during
    processing

while (my $row = $sth->fetchrow_hashref('NAME_lc')) {
    my $author      = $self->_find_or_create_author($row);
    my $publication = $self->_find_or_create_publication($row);
    $author->add_publication($publication);
    $publication->add_author($author);
}
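
A hedged sketch of how _find_or_create_author and _find_or_create_publication might collapse the duplicated rows: objects are cached in a hash keyed by an identifier, so a repeated author or publication returns the existing object. The column names (author_id, author_name, publication_id, title) are assumptions, and plain hashes stand in for the real author and publication classes.

```perl
use strict;
use warnings;

# Hypothetical search object caching authors and publications by key.
package Search;

sub new { bless { authors => {}, publications => {} }, shift }

sub _find_or_create_author {
    my ($self, $row) = @_;
    my $key = $row->{author_id};    # assumed key column
    return $self->{authors}{$key} //= { name => $row->{author_name}, publications => [] };
}

sub _find_or_create_publication {
    my ($self, $row) = @_;
    my $key = $row->{publication_id};    # assumed key column
    return $self->{publications}{$key} //= { title => $row->{title}, authors => [] };
}

package main;

# Simulated fetchrow_hashref('NAME_lc') output: one row per author/publication pair,
# so author and publication details repeat across rows.
my @rows = (
    { author_id => 1, author_name => 'J Smith', publication_id => 10, title => 'Paper A' },
    { author_id => 2, author_name => 'A Jones', publication_id => 10, title => 'Paper A' },
    { author_id => 1, author_name => 'J Smith', publication_id => 11, title => 'Paper B' },
);

my $search = Search->new();
foreach my $row (@rows) {
    my $author      = $search->_find_or_create_author($row);
    my $publication = $search->_find_or_create_publication($row);
    push @{ $author->{publications} }, $publication;
    push @{ $publication->{authors} }, $author;
}

printf "%d authors, %d publications\n",
    scalar(keys %{ $search->{authors} }), scalar(keys %{ $search->{publications} });
```

Three joined rows collapse to two authors and two publications, and the script prints "2 authors, 2 publications".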

29
Querying the Monash Directory Service
  • A filter is constructed from the results obtained
    by querying Research Master
  • Which is then used to query the Monash Directory
    Service using Net::LDAP

my @numbers = map { $_->employeenumber() || () } $self->authors();
my $ldap_filter =
    q{(|} . join(q{}, map { qq{(employeenumber=$_)} } @numbers) . q{)};
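
The slide interpolates employee numbers straight into the filter. They are probably plain digits, but a defensive Perl sketch escapes the characters that RFC 4515 treats as special before building the same (|(employeenumber=...)...) filter. The helper names here are illustrative; Net::LDAP's Net::LDAP::Util module also provides an escape_filter_value function that could be used instead.

```perl
use strict;
use warnings;

# Escape a value for an LDAP search filter (RFC 4515): '*', '(', ')',
# '\' and NUL become a backslash followed by two hex digits.
sub escape_filter_value {
    my ($value) = @_;
    $value =~ s/([*()\\\x00])/sprintf('\\%02x', ord $1)/ge;
    return $value;
}

# Build an OR filter matching any of the given employee numbers
# (illustrative helper, same shape as the slide's filter).
sub employee_filter {
    my @numbers = @_;
    return q{(|}
        . join(q{}, map { my $n = escape_filter_value($_); "(employeenumber=$n)" } @numbers)
        . q{)};
}

my $filter = employee_filter(12345, 67890);
print "$filter\n";    # (|(employeenumber=12345)(employeenumber=67890))
```

If no author has an employee number, the list is empty and the filter degenerates to (|), which not every directory server accepts, so it seems safer to skip the LDAP query in that case.
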
30
Correlating results
  • Results from the Monash Directory Service are
    then attached to the appropriate author object

foreach my $author ($self->authors()) {
    my $entry = $self->_get_ldap_entry($author->employeenumber());
    $author->set_ldap_entry($entry) if $entry;
}
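
A sketch of what _get_ldap_entry might do, assuming the Net::LDAP search results are indexed once by employee number so that each author lookup is a single hash access. A small FakeEntry class stands in for Net::LDAP::Entry (only its get_value method is mimicked), and the lookup is written as a plain subroutine rather than a method for brevity; the names and data are hypothetical.

```perl
use strict;
use warnings;

# Stand-in for Net::LDAP::Entry, which provides get_value($attribute).
package FakeEntry;
sub new       { my ($class, %attrs) = @_; bless {%attrs}, $class }
sub get_value { $_[0]{ $_[1] } }

package main;

# Entries as they might come back from the Directory Service search.
my @entries = (
    FakeEntry->new(employeenumber => '12345', cn => 'John Smith'),
    FakeEntry->new(employeenumber => '67890', cn => 'Ann Jones'),
);

# Index the entries once by employee number.
my %entry_by_number = map { ($_->get_value('employeenumber'), $_) } @entries;

# Hypothetical lookup: undef when there is no number or no matching entry.
sub _get_ldap_entry {
    my ($number) = @_;
    return defined $number ? $entry_by_number{$number} : undef;
}

# Authors without a matching entry (for example former staff) get nothing attached.
my %attached;
foreach my $number ('12345', '99999') {
    my $entry = _get_ldap_entry($number);
    $attached{$number} = $entry->get_value('cn') if $entry;
}
```
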
31
Correlating results
  • The publications which do not have at least one
    current staff member of the University as an
    author are now removed from the results

foreach my $publication ($self->publications()) {
    unless (grep { $_->is_monash() } $publication->authors()) {
        $self->destroy_publication($publication);
    }
}
32
Correlating results
  • Finally all the authors without any publications
    are removed from the results

foreach my $author ($self->authors()) {
    unless ($author->publications()) {
        $self->remove_author($author);
    }
}
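
The two pruning passes, sketched on plain Perl hashes. The model is hypothetical: is_monash is assumed to mean that a Directory Service entry was attached to the author, and removing a publication also unlinks it from its authors, which the second pass relies on.

```perl
use strict;
use warnings;

# Hypothetical model: each author has an is_monash flag and a set of
# publication ids; each publication lists its author ids.
my %authors = (
    a1 => { is_monash => 1, pubs => { p1 => 1, p2 => 1 } },
    a2 => { is_monash => 0, pubs => { p1 => 1, p3 => 1 } },
    a3 => { is_monash => 0, pubs => { p3 => 1 } },
);
my %publications = (
    p1 => [qw(a1 a2)],
    p2 => [qw(a1)],
    p3 => [qw(a2 a3)],
);

# Pass 1: drop publications with no current staff author, unlinking them
# from their authors so the second pass sees the change.
foreach my $pub_id (sort keys %publications) {
    unless (grep { $authors{$_}{is_monash} } @{ $publications{$pub_id} }) {
        delete $authors{$_}{pubs}{$pub_id} for @{ $publications{$pub_id} };
        delete $publications{$pub_id};
    }
}

# Pass 2: drop authors left without any publications.
foreach my $author_id (sort keys %authors) {
    delete $authors{$author_id} unless %{ $authors{$author_id}{pubs} };
}

print join(' ', sort keys %publications), "\n";
print join(' ', sort keys %authors), "\n";
```

Running the passes in this order removes p3 (no current staff author) and then a3 (left with no publications), while a2 survives because it still shares p1 with a current staff member.
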
33
Results
  • At this point the search object holds enough
    author and publication objects for the search
    results to be displayed

$research->search('name' => 'john smith');
foreach my $author ($research->authors()) {
    print $author->name(), "\n";
    foreach my $publication ($author->publications()) {
        print $publication->title(), "\n";
    }
}
34
Limitations
  • At no point do the author or publication objects
    in existence represent the entire Research
    Directory.
  • Which means that a fresh search is required for
    the various pages in the interface.
  • Not much of an issue, given the stateless nature
    of the web.

35
Complicated scientific formulae in titles
  • Plain text
  • 2
  • Rich text formatted
  • \rtf1\ansi\deff0\fonttbl\f0\fswiss
    Arial\f1\fnil\fcharset2 Symbol
    \viewkind4\uc1\pard\lang1033\f0\fs24 2 \fs18
    Unprecedented \f1\fs24 m-h\up5\fs14 2\up0\fs24
    h\up5\fs14 2\up0\f0\fs18 - pyrazolate
    coordination in \Yb(\f1\fs24 h\up5\f0\fs14
    2\up0\fs18 - \f1\fs24\'a6\f0\fs18 Bu\dn5\fs14
    2\up0\fs18 pz)(\f1\fs24 m\f0\fs18 -\f1\fs24
    h\up5\f0\fs14 2\up0\fs18 \f1\fs24 h\up5\f0\fs14
    2\up0\fs18 -\f1\fs24\'a6\f0\fs18 Bu\dn5\fs14
    2\up0\fs18 pz)(thf)\\dn5\fs14 2\up0\fs18 \par
  • Correctly rendered
  • 2 Unprecedented µ-η²η²-pyrazolate coordination
    in Yb(η²-Bu₂pz)(µ-η²η²-Bu₂pz)(thf)₂

36
Complicated scientific formulae in titles
  • Unfortunately this cannot be reliably rendered
    using HTML.
  • The perl module RTF::HTMLConverter is able to
    convert the RTF above to
  • 2 Unprecedented m-h2h2- pyrazolate coordination
    in Yb(h2 - Bu2pz)(m-h2h2 -Bu2pz)(thf)2
  • While not perfect it is a significant improvement
    and deemed satisfactory.

37
Conclusion
  • A practical example of how perl can be used to
    draw information from two sources, one a
    commercial application, and present the
    information in two similar but disparate
    environments.
  • All by using two widely used modules
  • DBI (and DBD::Oracle)
  • Net::LDAP
  • And a third publicly available module
  • RTF::HTMLConverter

38
Thank you
  • Any questions?
  • The publicly available version of the Monash
    Research Directory is available at
  • http://monash.edu/research/directory/