Title: Developing the Monash Research Directory
Prepared by: Stephen Edmonds, December 2004
2 What is it?
- A searchable web-based directory of research
publications and researchers at Monash
University.
- Developed using Perl and open source modules.
3 Search form
4 Author search results
5 Publication search results
6 Author details
7 Publication details
8 Why?
- Each year the research activities at Monash
University produce a significant amount of output
in the form of:
- Journal articles
- Books
- Conference papers
- and more
- Unfortunately only a limited number of people are
aware of the full range of output.
9 Why?
- A publicly available directory could potentially
raise the profile of research activities at the
University.
- Additionally, the Monash Research Directory would
be the first of a series of research-oriented
tools for:
- Researchers at Monash
- People interested in research
10 Initial requirements
- Publicly available through the Monash website.
- Restricted-access interface through the my.monash
staff and student portal.
- Utilise existing information from systems around
the University.
- Present the most up-to-date information possible.
- Only display research output generated by current
staff members of the University.
11 Research Master
- A commercial product used to track research
activities around the University.
- Information regarding the research activities is
entered by representatives from each faculty
within the University.
- Within Research Master one module contains
details of the research output,
12 Research Master
- and another contains details of the authors of
the research output.
- 30,000 publications covering 8 years.
- 25,000 distinct authors.
- The information is stored in an Oracle database
for use with a client application.
13 Monash Directory Service
- Contains an entry for each current student or
member of staff of the University.
- Automatically updated from a number of sources
such as the payroll system or the internal
telephone directory.
- Staff members have the ability to enter
additional information into their entry such as:
- Research interests
- Professional associations
- Biography
- Photograph (as a JPEG)
- A standard LDAP service.
14 Public Monash website
- Farm of Linux boxes running Apache web servers.
- Perl CGI is one of many technologies available.
15 my.monash portal
- An integrated view of the University for both
staff members and students.
- Uses HTML::Mason, a dynamic web site authoring
system written in Perl.
16 The problem so far
- Two backend systems
- Research Master (Oracle database)
- Monash Directory Service (LDAP service)
- Two frontend environments
- my.monash portal (Perl through HTML::Mason)
- Public website (Perl CGI)
17 The problem so far
- Some kind of glue is required between these four
systems.
18 And the answer was
- A module or set of modules.
- Written in Perl.
19 But how?
- The preliminary analysis showed that an author
- Has a variety of details.
- Relates to one or more publications.
- While a publication
- Has a variety of details.
- Relates to one or more authors.
20 But how?
- This data can be represented by a simple
hierarchy.
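A minimal sketch of such a hierarchy, assuming one class per entity with links in both directions. The package names `Sketch::Author` and `Sketch::Publication` and the sample data are hypothetical; only the accessor names follow the method names used in these slides.

```perl
use strict;
use warnings;

# Hypothetical classes for illustration only; the real module's
# package names and constructors are not shown in the slides.
package Sketch::Author;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, publications => [] }, $class;
}
sub name            { $_[0]{name} }
sub publications    { @{ $_[0]{publications} } }
sub add_publication { push @{ $_[0]{publications} }, $_[1] }

package Sketch::Publication;
sub new {
    my ($class, %args) = @_;
    return bless { title => $args{title}, authors => [] }, $class;
}
sub title      { $_[0]{title} }
sub authors    { @{ $_[0]{authors} } }
sub add_author { push @{ $_[0]{authors} }, $_[1] }

package main;

my $author      = Sketch::Author->new(name => 'John Smith');
my $publication = Sketch::Publication->new(title => 'An example title');

# Link in both directions so either object can reach the other.
$author->add_publication($publication);
$publication->add_author($author);

print $_->title(), "\n" for $author->publications();
```

The two objects reference each other, which is harmless in a short-lived script; a long-running process would probably want to weaken one side of the link.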
21 But how?
- This complete encapsulation of business logic
within classes means that the usage code is
simply

    my $research = MonashResearchDirectory->new( ... );
    if ($research->search('name' => 'john smith')) {
        foreach my $author ($research->authors()) {
            print $author->name(), "\n";
            foreach my $publication ($author->publications()) {
                print $publication->title(), "\n";
            }
        }
    }
22 Publication data issues
- The data contained within the Monash Directory
Service is clearly defined.
- However the data stored in Research Master for a
publication can vary from category to category,
and even from year to year.
23 Publication data issues
24 Publication data issues
- A solution was to retrieve the field labels from
the database and then generalise the access
methods on the publication class

    foreach my $field ($publication->fields()) {
        my ($label, $value) = $publication->field($field);
        if ($value) {
            # Print the label next to its value, skipping empty fields.
            print $label, "\t", $value, "\n";
        }
    }
25 Internals
- As already stated, encapsulating as much
business logic as possible in the classes means
that the CGI script and HTML::Mason component
aspects become trivial.
- At first it appeared to be the opposite case for
the internals of the classes; fortunately they
did not become as complicated as feared.
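As one illustration of such internals, the generalised field access on the publication class might be built on two lookup tables, one for labels and one for values. This is a sketch under assumptions: the package name, the field keys and the labels are invented, and in the real system the labels come from the Research Master database.

```perl
use strict;
use warnings;

package Sketch::GenericPublication;

# 'labels' stands in for the field labels retrieved from the database;
# the keys and labels used below are invented for this example.
sub new {
    my ($class, %args) = @_;
    return bless { labels => $args{labels}, values => $args{values} }, $class;
}

# All field keys known for this publication, in a stable order.
sub fields { sort keys %{ $_[0]{labels} } }

# Returns (label, value); value is undef when the field is not populated.
sub field {
    my ($self, $key) = @_;
    return ($self->{labels}{$key}, $self->{values}{$key});
}

package main;

my $publication = Sketch::GenericPublication->new(
    labels => { f1 => 'Title', f2 => 'Journal', f3 => 'Volume' },
    values => { f1 => 'An example title', f2 => 'An example journal' },
);

my @lines;
foreach my $field ($publication->fields()) {
    my ($label, $value) = $publication->field($field);
    push @lines, "$label\t$value" if $value;   # skip unpopulated fields
}
print "$_\n" for @lines;
```

Because the calling code only ever asks for a list of fields and a label/value pair, a category with different fields needs no change to the display code.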
26 Publication title search
- A walkthrough of some of the interesting parts of
the publication title search process when the
following call is made

    $research->search('name' => 'john smith')
27 Querying Research Master
- Simplified by being able to query the backend
Oracle database directly.
- A compromise between performance and maintenance
resulted in a single SQL query.
- Unfortunately information is now duplicated in
the results,
28 Querying Research Master
- which can be selectively ignored during
processing

    while (my $row = $sth->fetchrow_hashref('NAME_lc')) {
        # Reuse existing objects when a row repeats an author or publication.
        my $author      = $self->_find_or_create_author($row);
        my $publication = $self->_find_or_create_publication($row);
        $author->add_publication($publication);
        $publication->add_author($author);
    }
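One way the find-or-create step could work is to cache objects by identifier so that repeated rows reuse the existing object. The sketch below uses plain hashes and invented column names (`author_id`, `pub_id` and so on) in place of the real classes and the real query.

```perl
use strict;
use warnings;

# Rows as a joined query might return them: one row per
# (author, publication) pair, so details repeat. Column names are invented.
my @rows = (
    { author_id => 1, author_name => 'J. Smith', pub_id => 10, title => 'First paper'  },
    { author_id => 1, author_name => 'J. Smith', pub_id => 11, title => 'Second paper' },
    { author_id => 2, author_name => 'A. Jones', pub_id => 10, title => 'First paper'  },
);

my (%authors, %publications);

# Return the cached record for this id, creating it on first sight.
sub find_or_create {
    my ($cache, $id, %fields) = @_;
    return $cache->{$id} ||= { %fields, links => [] };
}

foreach my $row (@rows) {
    my $author      = find_or_create(\%authors,      $row->{author_id}, name  => $row->{author_name});
    my $publication = find_or_create(\%publications, $row->{pub_id},    title => $row->{title});

    # Record the relationship in both directions.
    push @{ $author->{links} },      $row->{pub_id};
    push @{ $publication->{links} }, $row->{author_id};
}

printf "%d authors, %d publications\n", scalar(keys %authors), scalar(keys %publications);
```

Three rows collapse to two authors and two publications, which is the "selectively ignored" duplication described above.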
29 Querying the Monash Directory Service
- A filter is constructed from the results obtained
by querying Research Master,
- which is then used to query the Monash Directory
Service using Net::LDAP

    # Collect employee numbers, skipping authors that have none.
    my @numbers = map { $_->employeenumber() || () } $self->authors();
    # OR together one equality test per employee number.
    my $ldap_filter = q{(|}
        . join(q{}, map { qq{(employeenumber=$_)} } @numbers)
        . q{)};
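For illustration, the same filter construction wrapped in a standalone function. The function name is hypothetical, and the sketch assumes employee numbers are plain digits; other values would need escaping before being placed in an LDAP filter.

```perl
use strict;
use warnings;

# Build an OR filter matching any of the given employee numbers.
# Undefined or empty values are skipped, mirroring authors that have
# no employee number recorded.
sub employee_filter {
    my @numbers = grep { defined($_) && length($_) } @_;
    return '(|' . join('', map { "(employeenumber=$_)" } @numbers) . ')';
}

print employee_filter(1001, undef, 1002), "\n";
# prints (|(employeenumber=1001)(employeenumber=1002))
```

A single filter keeps the directory lookup to one request per search. With no employee numbers at all the result would be `(|)`, which is not a usable filter, so a caller would want to skip the query in that case.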
30 Correlating results
- Results from the Monash Directory Service are
then attached to the appropriate author object

    foreach my $author ($self->authors()) {
        my $entry = $self->_get_ldap_entry($author->employeenumber());
        $author->set_ldap_entry($entry) if $entry;
    }
31 Correlating results
- The publications which do not have at least one
current staff member of the University as an
author are now removed from the results

    foreach my $publication ($self->publications()) {
        unless (grep { $_->is_monash() } $publication->authors()) {
            $self->destroy_publication($publication);
        }
    }
32 Correlating results
- Finally all the authors without any publications
are removed from the results

    foreach my $author ($self->authors()) {
        unless ($author->publications()) {
            $self->remove_author($author);
        }
    }
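The two pruning passes can be tried out on plain data. In this sketch the hashes, identifiers and the `is_monash` flag are invented stand-ins for the author and publication objects.

```perl
use strict;
use warnings;

# Invented sample data: author id => record, publication id => record.
my %authors = (
    1 => { name => 'J. Smith', is_monash => 1 },   # matched a directory entry
    2 => { name => 'A. Jones', is_monash => 0 },   # external co-author
    3 => { name => 'B. Brown', is_monash => 0 },   # external, sole author
);
my %publications = (
    10 => { title => 'First paper',  authors => [ 1, 2 ] },
    11 => { title => 'Second paper', authors => [ 3 ] },
);

# Pass 1: drop publications with no current staff member among the authors.
foreach my $id (keys %publications) {
    my @staff = grep { $authors{$_}{is_monash} } @{ $publications{$id}{authors} };
    delete $publications{$id} unless @staff;
}

# Pass 2: drop authors who no longer appear on any remaining publication.
my %still_used = map { ($_ => 1) } map { @{ $_->{authors} } } values %publications;
foreach my $id (keys %authors) {
    delete $authors{$id} unless $still_used{$id};
}

print join(', ', map { $authors{$_}{name} } sort keys %authors), "\n";
```

The order matters: an external co-author of a retained publication stays in the results, while an author whose only publications were removed in the first pass is dropped in the second.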
33 Results
- At this point the research object holds enough
author and publication objects for the search
results to be displayed

    $research->search('name' => 'john smith');
    foreach my $author ($research->authors()) {
        print $author->name(), "\n";
        foreach my $publication ($author->publications()) {
            print $publication->title(), "\n";
        }
    }
34 Limitations
- At no point do the author or publication objects
in existence represent the entire Research
Directory.
- Which means that a fresh search is required for
the various pages in the interface.
- Not much of an issue due to the stateless nature
of the web.
35 Complicated scientific formulae in titles
- Plain text
- 2
- Rich text formatted
- \rtf1\ansi\deff0\fonttbl\f0\fswiss
Arial\f1\fnil\fcharset2 Symbol
\viewkind4\uc1\pard\lang1033\f0\fs24 2 \fs18
Unprecedented \f1\fs24 m-h\up5\fs14 2\up0\fs24
h\up5\fs14 2\up0\f0\fs18 - pyrazolate
coordination in \Yb(\f1\fs24 h\up5\f0\fs14
2\up0\fs18 - \f1\fs24\'a6\f0\fs18 Bu\dn5\fs14
2\up0\fs18 pz)(\f1\fs24 m\f0\fs18 -\f1\fs24
h\up5\f0\fs14 2\up0\fs18 \f1\fs24 h\up5\f0\fs14
2\up0\fs18 -\f1\fs24\'a6\f0\fs18 Bu\dn5\fs14
2\up0\fs18 pz)(thf)\\dn5\fs14 2\up0\fs18 \par
- Correctly rendered
- 2 Unprecedented µ-η²:η²-pyrazolate coordination
in Yb(η²-Bu₂pz)(µ-η²:η²-Bu₂pz)(thf)₂
36 Complicated scientific formulae in titles
- Unfortunately this cannot be reliably rendered
using HTML.
- The Perl module RTF::HTMLConverter is able to
convert the RTF above to
- 2 Unprecedented m-h2h2- pyrazolate coordination
in Yb(h2 - Bu2pz)(m-h2h2 -Bu2pz)(thf)2
- While not perfect it is a significant improvement
and deemed satisfactory.
37 Conclusion
- A practical example of how Perl can be used to
draw information from two sources, one a
commercial application, and present the
information in two similar but disparate
environments.
- All by using two widely used modules
- DBI (and DBD::Oracle)
- Net::LDAP
- And a third publicly available module
- RTF::HTMLConverter
38 Thank you
- Any questions?
- The publicly available version of the Monash
Research Directory is available at
- http://monash.edu/research/directory/