Title: EPrints Training Course
1 Advanced Customisation: Scripting EPrints
- EPrints Training Course
- Southampton, May 3-4th 2007
2 Taking Control: the EPrints API
- EPrints configuration files offer many opportunities for customisation and control
- branding, workflow, controlled vocabs, authority lists, deposit types, metadata...
- EPrints API offers many more opportunities
- the more Perl-intensive configuration files
- e.g. eprint_render.pl
- and beyond...
- plugins
- command-line tools
3 Roadmap
- Core API
- manipulating your data
- accessing data collections
- searching your data
- Scripting techniques
- essentials: putting it all together
- writing export plugins
- writing screen plugins
- writing command-line tools
- writing CGI scripts
4 Part 1: Core API
5 About This Part of the Talk
- Light on syntax
- $object->function($arg1, $arg2)
- Incomplete
- Designed to
- give you a feel for the EPrints data model
- introduce you to the most significant (and useful!) objects
- how they relate to one another
- their most common methods
- act as a jumping off point for exploring
6 Finding Documentation
- EPrints modules have embedded documentation
- Extract it using perldoc
- perldoc perl_lib/EPrints/Search.pm
7 Core API: Manipulating Your Data
8 Data Model: 3 Core Objects
- EPrint
- single deposit in the repository
- Document
- single document attached to an EPrint
- User
- single registered user
[Diagram: the three core objects, EPrint, Document and User]
9 Data Model: Core Relationships
- 1 User owns (deposits) many EPrints
- 1 EPrint has many documents attached to it
- 1 Document may contain many files, but these are not part of the API
- e.g. PDF = 1 file
- e.g. HTML + images = many files
[Diagram: 1 User to many EPrints, 1 EPrint to many Documents]
10 Data Model: DataObj
- All data objects inherit from DataObj
- Provides common interface to data
[Diagram: EPrint, Document and User each inherit from DataObj]
11 Accessing Data: DataObj Interface
- get_id()
- get_url()
- EPrint: abstract page
- User: user summary page
- Document: document download
- get_type()
- EPrint: article, book, thesis...
- User: user, editor, admin
- Document: pdf, html, word...
12 Manipulating Data: DataObj Interface
- get_value(fieldname)
- get the value of the named data field
- $eprint->get_value( "title" )
- set_value(fieldname, value)
- set the value of the named field
- $doc->set_value( "format", "pdf" )
- is_set(fieldname)
- true if the named field has a value
- $user->is_set( "email" )
13 Manipulating Data: DataObj Interface (2)
- commit()
- write any changes made to the object through to the database
- e.g. after using set_value
- remove()
- erase the object from the database
- also removes any sub-objects and files
- e.g. $eprint->remove
- removes EPrint and associated Documents from DB
- removes Document files from filesystem
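The get/set/commit cycle described on the two slides above can be sketched as follows. This is a minimal illustration, not EPrints code: `retitle_eprint` is a hypothetical helper, and `$eprint` is assumed to be an EPrint data object (or anything offering the same DataObj methods) obtained elsewhere.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): change the title of an
# EPrint and write the change through to the database.
# Returns the previous title, or undef if none was set.
sub retitle_eprint
{
    my( $eprint, $new_title ) = @_;

    # Remember the old value, if the field has one
    my $old_title = $eprint->is_set( "title" )
        ? $eprint->get_value( "title" )
        : undef;

    # set_value only changes the object in memory...
    $eprint->set_value( "title", $new_title );

    # ...commit writes the change through to the database
    $eprint->commit;

    return $old_title;
}
```

Forgetting the `commit` call is the usual pitfall here: the object would look updated inside the script while the stored record stays unchanged.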
14 Getting Hold of Existing Data Objects
- new(session, id)
- returns data object for an existing record
- EPrints::DataObj::EPrint->new($session, 1)
- EPrints::DataObj::User->new($session, 1)
- EPrints::DataObj::Document->new($session, 1)
- User object has extra options
- user_with_email(session, email)
- user_with_username(session, username)
15 Creating New Data Objects
- Slightly different for each data object
- EPrint
- create(session, dataset, data)
- User
- create(session, user_type)
- Document
- create(session, eprint)
16 Specific Methods
- Each data object also has specific methods for manipulating its data
17 EPrint Methods
- get_user()
- get a User object representing the user to whom the EPrint belongs
- get_all_documents()
- get a list of all the Document objects associated with the EPrint
- generate_static()
- generate the static abstract page for the eprint
- useful when you've modified the eprint values!
- in a multi-language archive this will generate a page in each language
18 User Methods
- get_eprints(dataset)
- get a list of EPrints owned by the user
- mail(subject, message)
- send an email to the user
19 Document Methods
- get_eprint()
- get the EPrint object the document is associated with
- local_path()
- get the full path of the directory where the document is stored in the filesystem
- files()
- get a list of (filename, file size) pairs
20 Document Methods: Main File
- get_main()
- set_main(main_file)
- get/set the main file for the document
- this is the file that gets linked to
- in the majority of cases, a Document will have 1 file
- e.g. PDF
- but there may be some cases where a Document has many files
- e.g. HTML document: .html files, images, stylesheets
- set main to top level index.html
21 Document Methods: Adding Files
- add_file(file, filename)
- upload(filehandle, filename)
- both add a file to the document
- add_file uses full path to file
- upload uses file handle
- in both cases the file within the document will be named filename
22 Document Methods: Adding Files (2)
- upload_url(url)
- grab file(s) from given URL
- in the case of HTML, only relative links will be followed
- add_archive(file, format)
- add files from a .zip or .tar.gz file
- remove_file(filename)
- remove the named file
23 Other Data Objects
- Subject
- a node in the subjects tree
- SavedSearch
- a saved search associated with a User
- History
- an event that took place on another data object
- e.g. change to eprint metadata
- Access
- a Web access to an object
- e.g. document download
- Request
- a request for a (restricted) document
- Explore these using perldoc
24 Core API: Accessing Data Collections
25 Accessing Data Collections
- We've looked at individual data objects
- but a repository holds many eprints and documents and has many registered users
- 2 key ways to manipulate data objects collectively
- built-in datasets
- large fixed sets of data objects
- by searching the repository
- set of data objects matching specific criteria
26 Datasets
- All data objects in the repository are part of a collection called a dataset
- 3 core datasets
- eprint
- all eprints
- user
- all registered users
- document
- all documents
27 Datasets (2)
- Also 4 subsets within the eprint dataset which collect eprints in the same state
- archive
- all eprints in live archive
- inbox
- all eprints which users are still working on
- buffer
- all eprints submitted for editorial review
- deletion
- all eprints retired from live archive
28 The DataSet Object
- Gives access to all the data objects in a particular dataset
- Also
- tells us which data fields apply to that dataset
- recall get_value and set_value methods
- a repository's metadata is configurable so this gives us a way to find out
- which fields are available in a particular repository
- the properties of individual fields
29 Accessing DataSets
- count(session)
- get the number of items in the dataset
- get_item_ids(session)
- get the IDs of the objects in the dataset
- map(function, args)
- apply function to each object in the dataset
- function is called with args
- (session, dataset, dataobj, args)
30 Fields in a DataSet
- has_field(fieldname)
- true if the dataset has a field of that name
- get_field(fieldname)
- get a MetaField object describing the named field
- get_fields()
- get list of MetaField objects describing all
fields in the dataset
31 Datasets and MetaFields
- A MetaField
- is a single field in a dataset
- tells us properties of the field
- get_property(name)
- set_property(name, value)
- e.g. name, type, input_rows, maxlength, multiple...
- but not the field value
- the value is specific to the individual data object
- e.g. $eprint->get_value("title")
32 Core API: Searching the Repository
33 Searching the Repository
- The Search object allows us to search datasets for data objects matching specific criteria
- Provides access to the results
34 Starting a New Search
- new(options)
- create a new search expression
- must specify which dataset to search in
- $search = EPrints::Search->new(
session => $session,
dataset => $dataset,
custom_order => "title" )
- many other options can be specified
- explore with perldoc
35 Adding Search Fields
- add_field(metafield, value)
- add a new search field with the given value (search text) to the search expression
- add as many fields as you like to the search criteria
36 Adding Search Fields: Example
- Example: full text search
- $search->add_field(
$dataset->get_field("title"),
"routing",
"IN",
"ALL" )
37 Adding Search Fields: Example (2)
- Example: full text search which matches word in title or abstract
- $search->add_field(
[ $dataset->get_field("title"),
$dataset->get_field("abstract") ],
"routing",
"IN",
"ALL" )
38 Adding Search Fields
- Example: date search
- $search->add_field(
$dataset->get_field("date"),
"2000-2004",
"EQ",
"ALL" )
39 Processing Search Results
- Carry out a search using
- $list = $search->perform_search()
- Returns a List object which gives access to search results
40 The List Object
- Any ordered collection of data objects
- usually the results of a search
41 Processing Lists
- count()
- get the number of results
- get_ids(offset, count)
- get_records(offset, count)
- get an array of data objects, or just their ids
- optionally specify a range using count and offset
- map(function, args)
- apply the function to each data object in the list
42 Manipulating Lists
- $newlist = $list->reorder( $neworder )
- $newlist = $list->union( $list2 )
- $newlist = $list->intersect( $list2 )
- $newlist = $list->remainder( $list2 )
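As a rough illustration of working with a List, the sketch below pulls the titles of the first few results using only count() and get_records(). `first_titles` is a hypothetical helper, and `$list` is assumed to be the object returned by `$search->perform_search()`; the field name "title" follows the earlier examples.

```perl
use strict;
use warnings;

# Hypothetical helper: return the titles of (at most) the first $max
# records in a List of data objects.
sub first_titles
{
    my( $list, $max ) = @_;

    # Never ask for more records than the list holds
    my $n = $list->count;
    $n = $max if $max < $n;
    return () if $n == 0;

    # get_records( offset, count ) returns the data objects in that range
    my @records = $list->get_records( 0, $n );

    return map { $_->get_value( "title" ) } @records;
}
```

In a script this might be called as `first_titles( $search->perform_search, 10 )`. For large result sets, map() is likely the better fit, since it avoids holding every record in an array at once.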
43 Part 2: Scripting Techniques
44 Roadmap
- Core API
- manipulating your data
- accessing data collections
- searching your data
- Scripting techniques
- essentials: putting it all together
- writing export plugins
- writing screen plugins
- writing command-line tools
- writing CGI scripts
45 Scripting Techniques: Essentials
46 Putting it all together
- Two essential objects
- Session
- connects to the repository
- many useful methods
- Repository
- provides access to
- datasets
- $session->get_repository->get_dataset("archive")
- configuration settings
- Explore using perldoc
47 Scripting for the Web
- API provides lots of methods to help you build Web pages and display (render) data
- these methods return (X)HTML
- but not strings!
- XML DOM objects
- DocumentFragment, Element, TextNode...
- Build pages from these nodes
- $node1->appendChild($node2)
- why? it's easier to manipulate a tree than to manipulate a large string
48 XML DOM vs. Strings
- $p = $session->make_element("p")
- $text = $session->make_text( "Hello World" )
- $p->appendChild($text)
- Can manipulate tree to add extra text, elements etc.
- $p = "<p>"
- $p .= "Hello World"
- $p .= "</p>"
- <p>Hello World</p>
- Difficult to make changes to the string: would need to find the right position first
[Diagram: DOM tree with a p element containing the text node "Hello World"]
49 Render Methods: Session
- Session provides many useful Web page building blocks
- make_doc_fragment()
- create an empty XHTML document fragment
- fill it with things!
- make_text(text)
- create an XML TextNode
- make_element(name, attrs)
- create an XHTML Element
- make_element("p", align => "right")
- <p align="right" />
50 Render Methods: Session (2)
- render_link(uri, target)
- create an XHTML link
- $link = $session->render_link("http://www.eprints.org")
- $text = $session->make_text("EPrints")
- $link->appendChild($text)
- <a href="http://www.eprints.org">EPrints</a>
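The building blocks above combine by nesting nodes. The following sketch wraps a labelled link in a paragraph; `render_link_paragraph` is a hypothetical helper, and `$session` is assumed to be a connected Session object providing the make_element, make_text and render_link methods listed on these slides.

```perl
use strict;
use warnings;

# Hypothetical helper: build <p><a href="...">label</a></p> as a DOM
# tree rather than as a string.
sub render_link_paragraph
{
    my( $session, $url, $label ) = @_;

    my $p = $session->make_element( "p" );

    # render_link returns an empty link element; the label is added
    # as a child text node
    my $link = $session->render_link( $url );
    $link->appendChild( $session->make_text( $label ) );

    $p->appendChild( $link );

    return $p;
}
```

Because the result is a tree, a caller can keep appending to `$p` (more text, another link) without any string surgery.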
51 Render Methods: Session (3)
- html_phrase(phraseid, inserts)
- render an XHTML phrase in the current language
- looks up phraseid from the phrases files
- inserts can be used to add extra information to the phrase
- there must be a corresponding <epc:pin> in the phrase
- <epp:phrase>Number of results <epc:pin name="count"/></epp:phrase>
52 Render Methods: Session (4)
- Many methods for building input forms, including
- render_form(method, dest)
- render_option_list(params)
- render_hidden_field(name, value)
- render_upload_field(name)
- render_action_buttons(buttons)
- ...
53 Rendering Methods: Data Objects
- render_citation(style)
- render_citation_link(style)
- create an XHTML citation for the object
- if style is set then use the named citation style
- render_value(fieldname)
- get an XHTML fragment containing the rendered version of the value of the named field
- in the current language
54 Rendering Methods: MetaFields
- render_name(session)
- render_help(session)
- get an XHTML fragment containing the name/description of the field in the current language
55 Rendering Methods: Searches
- render_description()
- get some XHTML describing the search parameters
- render_search_form(help)
- render an input form for the search
- if help is true then this also renders the help for each search field in current language
56 Getting User Input (CGI parameters)
- Session object also provides useful methods for getting user input
- e.g. from an input form
- have_parameters
- true if parameters (POST or GET) are available
- param(name)
- get the value of a named parameter
57 Scripting Techniques: Writing Export Plugins
58 Plugin Framework
- EPrints provides a framework for plugins
- registration of plugin capabilities
- standard interface which plugins need to implement
- Several types of plugin interface provided
- import and export
- get data in and out of the repository
- interface screens
- add new tools and reports to UI
- input components
- add new ways for users to enter data
59 Plugin Framework (2)
- Not just a plugin framework for 3rd party extensions!
- Used extensively by EPrints itself
- majority of (dynamic) Web pages you see are screen plugins
- search, deposit workflow, editorial review, item control page, user profile, saved searches, administration tools...
- all import/export options implemented as plugins
- all input components in deposit workflow are plugins
- subject browser input, file upload...
60 Plugin Framework (3)
- EPrints is really a generic plugin framework
- with a set of plugins that implement the functions of a repository
- Gives plugin developers many examples to work from
- find a plugin that does something similar to what you want to achieve and explore how it works
[Diagram: layered architecture, with Plugins on top of the Plugin Framework on top of the Backend (data model)]
61 Writing Export Plugins
- Typically a standalone Perl module in
- perl_lib/EPrints/Plugin/Export/
- Writing export plugins
- register plugin
- define how to convert data objects to an
output/interchange format
62 Export Plugin: Registration
- Register
- name
- the name of the plugin
- visible
- who can use it
- accept
- what the plugin can convert
- lists of data objects or single data objects (or both)
- type of record (eprint, user...)
- suffix and mimetype
- file extension and MIME type of format plugin converts to
63 Registration Example: BibTeX
- $self->{name} = "BibTeX"
- $self->{accept} = [ 'list/eprint', 'dataobj/eprint' ]
- $self->{visible} = "all"
- $self->{suffix} = ".bib"
- $self->{mimetype} = "text/plain"
- Converts lists or single EPrint objects
- Available to all users
- Produces plain text file with .bib extension
64 Registration Example: FOAF
- $self->{name} = "FOAF Export"
- $self->{accept} = [ 'dataobj/user' ]
- $self->{visible} = "all"
- $self->{suffix} = ".rdf"
- $self->{mimetype} = "text/xml"
- Converts single User objects
- Available to all users
- Produces XML file with .rdf extension
65 Registration Example: XML
- $self->{name} = "EP3 XML"
- $self->{accept} = [ 'list/*', 'dataobj/*' ]
- $self->{visible} = "all"
- $self->{suffix} = ".xml"
- $self->{mimetype} = "text/xml"
- Converts any data object
- Available to all users
- Produces XML file with .xml extension
66 Export Plugin: Conversion
- For a straight conversion plugin, this usually includes
- mapping data objects to output/interchange format
- serialising the output/interchange format
- e.g. EndNote conversion section
- $data->{K} = $dataobj->get_value( "keywords" )
- $data->{T} = $dataobj->get_value( "title" )
- $data->{U} = $dataobj->get_url
67 Export Plugin: Conversion (2)
- But export plugins aren't limited to straight conversions!
- Explore
- Google Maps export plugin
- plot location data on map
- http://files.eprints.org/224/
- Timeline export plugin
- plot date data on timeline
- http://files.eprints.org/225/
68 Export Plugin: Template
- Register
- subclass EPrints::Plugin::Export
- inherits all the mechanics so you don't have to worry about them
- could subclass existing plugin e.g. XML, Feed
- define name, accept, visible etc.
- in constructor new() of plugin module
- Conversion
- define output_dataobj function
- will be called by plugin subsystem for every data object that needs to be converted
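Put together, the template above might look like the sketch below: a made-up plugin that exports one title per line. The plugin name `TitleList` and its output format are hypothetical, and the sketch assumes that the base class constructor returns a blessed hash and that the "title" field holds a plain string. It would live in perl_lib/EPrints/Plugin/Export/.

```perl
package EPrints::Plugin::Export::TitleList;

use strict;
use warnings;

# Subclass the export base class. In a real installation EPrints has
# already loaded EPrints::Plugin::Export; this sketch only names it.
our @ISA = ( "EPrints::Plugin::Export" );

# Register: describe the plugin in the constructor
sub new
{
    my( $class, %params ) = @_;

    my $self = $class->SUPER::new( %params );

    $self->{name} = "Title List";    # hypothetical plugin name
    $self->{accept} = [ 'list/eprint', 'dataobj/eprint' ];
    $self->{visible} = "all";
    $self->{suffix} = ".txt";
    $self->{mimetype} = "text/plain";

    return $self;
}

# Conversion: called for every data object that needs to be converted
sub output_dataobj
{
    my( $plugin, $dataobj ) = @_;

    my $title = $dataobj->is_set( "title" )
        ? $dataobj->get_value( "title" )
        : "(untitled)";

    return $title . "\n";
}

1;
```

Only the single-object conversion is defined here; handling of whole lists is left to the inherited mechanics, which is the point of subclassing.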
69 Writing Import Plugins
- Typically a standalone Perl module in
- perl_lib/EPrints/Plugin/Import/
- Reading input can be harder than writing output
- need to detect and handle errors in input
- many existing libraries available for parsing a wide variety of file formats
- Writing import plugins
- register
- define how to convert input/interchange format into data objects
- reverse of export
70 Scripting Techniques: Writing Screen Plugins
71 Plugins: Writing Screen Plugins
- One or more Perl modules in
- perl_lib/EPrints/Plugin/Screen/
- may be bundled with phrases, config files, stylesheets etc.
- Writing screen plugins
- register
- where it appears in UI
- who can use it
- define functionality
72 Screen Plugin: Registration
- Register
- actions
- the actions the plugin can carry out (if any)
- appears
- where in the interface the plugin and/or actions will appear
- named list
- position in list
- will be displayed as link, button or tab
73 Registration Example: Manage Deposits
- $self->{appears} = [
{ place => "key_tools", position => 100, },
]
[Screenshot: the key_tools list]
74 Registration Example: EPrint Details
- $self->{appears} = [
{ place => "eprint_view_tabs", position => 100, },
]
[Screenshot: the eprint_view_tabs list (each tab is a single screen plugin)]
75 Registration Example: New Item
- $self->{appears} = [
{ place => "item_tools", position => 100, action => "create", },
]
[Screenshot: the item_tools list (the create action will be invoked when the button is pressed)]
76 Screen Plugin: Define Functionality
- 3 types of screen plugin
- Render only
- define how to produce output display
- examples: Admin::Status, EPrint::Details
- Action only (no output display)
- define how to carry out action(s)
- examples: Admin::IndexerControl, EPrint::Move, EPrint::NewVersion
- Combined (interactive)
- define how to produce output/carry out action(s)
- examples: EPrint::RejectWithEmail, EPrint::Edit, User::Edit
77 Screen Plugins: Displaying Messages
- Action plugins produce no output display but can still display messages to user
- add_message(type, message)
- register a message that will be displayed to the user on the next screen they see
- type can be
- error
- warning
- message (informational)
78 Screen Plugin Template: Render Only
- Register
- subclass EPrints::Plugin::Screen
- inherits all the mechanics so you don't have to worry about them
- could subclass existing plugin e.g. EPrint, User
- define where plugin appears
- in constructor new() of plugin module
- define who can view plugin (if required)
- can_be_viewed function
- e.g. check user privileges
- Define functionality
- define render function
- produce output display using API render_ methods
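A render-only screen following this template could be sketched as below. The plugin name `HelloWorld` is hypothetical, and the sketch assumes that the base class constructor returns a blessed hash and that the plugin can reach the current Session as `$self->{session}`. A real plugin would usually check user privileges in can_be_viewed rather than returning true for everyone.

```perl
package EPrints::Plugin::Screen::HelloWorld;

use strict;
use warnings;

# Subclass the screen base class. In a real installation EPrints has
# already loaded EPrints::Plugin::Screen; this sketch only names it.
our @ISA = ( "EPrints::Plugin::Screen" );

# Register: say where the plugin appears
sub new
{
    my( $class, %params ) = @_;

    my $self = $class->SUPER::new( %params );

    $self->{appears} = [
        { place => "key_tools", position => 100, },
    ];

    return $self;
}

# Who can view the plugin: everyone, in this sketch
sub can_be_viewed
{
    my( $self ) = @_;

    return 1;
}

# Functionality: produce the output display as a DOM tree
sub render
{
    my( $self ) = @_;

    # Assumption: the plugin holds the Session in $self->{session}
    my $session = $self->{session};

    my $p = $session->make_element( "p" );
    $p->appendChild( $session->make_text( "Hello World" ) );

    return $p;
}

1;
```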
79 Screen Plugin Template: Action Only
- Register
- subclass EPrints::Plugin::Screen
- define actions supported
- define where actions appear
- define who can use actions
- allow_ACTION function(s)
- Define functionality
- define action_ACTION function(s)
- carry out the action
- use add_message to show result/error
- redirect to a different screen when done
80 Screen Plugin Template: Combined
- render function usually displays links/buttons which invoke the plugin's actions
- e.g. EPrint::Remove
- registers remove and cancel actions
- render function displays "Are you sure?" screen
- OK/Cancel buttons invoke remove/cancel actions
81 Scripting Techniques: Writing Command Line Scripts
82 Command Line Scripts
- Usually stored in bin directory
- Add batch/offline processes to your repository
- e.g. duplicate detection: compare each record to every other record
- e.g. file integrity: check stored MD5 sums against actual MD5 sums
83 Connecting to the Repository
- Command line scripts (and CGI scripts) must explicitly connect to the repository by creating a new Session object
- new(mode, repositoryid)
- set mode to 1 for command line scripts
- set mode to 0 for CGI scripts
- And disconnect from the repository when complete
- terminate()
- performs necessary cleanup
84 Using Render Functions
- XHTML is good for building Web pages
- but not so good for command line output!
- often no string equivalent
- use tree_to_utf8()
- extracts a string from the result of any rendering method
- tree_to_utf8( $eprint->render_citation )
85 Search and Modify Template
- Common pattern for command line tools
- Connect to repository
- Get desired dataset
- Search dataset
- Apply function to matching results
- modify result
- commit changes
- Disconnect from repository
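The middle steps of this pattern (get dataset, search, apply, commit) can be sketched as one function. `search_and_modify` is a hypothetical helper, not part of EPrints; it assumes the calling script has already loaded EPrints and connected, and that the search class is `EPrints::Search` as the perl_lib/EPrints/Search.pm path suggests. The "EQ" and "ALL" arguments simply mirror the date search example earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: find the objects in a dataset whose field matches
# a value, apply $modify to each one and commit it.
# Returns the number of objects processed.
sub search_and_modify
{
    my( $session, $dataset_id, $fieldname, $value, $modify ) = @_;

    # Get desired dataset
    my $dataset = $session->get_repository->get_dataset( $dataset_id );

    # Search dataset
    my $search = EPrints::Search->new(
        session => $session,
        dataset => $dataset );
    $search->add_field( $dataset->get_field( $fieldname ), $value, "EQ", "ALL" );
    my $list = $search->perform_search;

    # Apply function to matching results; map calls it with
    # (session, dataset, dataobj, args)
    my $changed = 0;
    $list->map( sub {
        my( undef, undef, $dataobj ) = @_;

        $modify->( $dataobj );    # modify result
        $dataobj->commit;         # commit changes
        $changed++;
    } );

    return $changed;
}
```

A command line script would wrap a call to this between `EPrints::Session->new( 1, $repositoryid )` and `$session->terminate`, passing a code reference that calls set_value on each object.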
86 Example: lift_embargos
- Removes access restrictions on documents with expired embargos
- Connect to repository
- Get document dataset
- Search dataset
- embargo date field earlier than today's date
- Apply function to matching results
- remove access restriction
- clear embargo date
- commit changes
- Disconnect from repository
87 Scripting Techniques: Writing CGI Scripts
88 CGI Scripts
- Usually stored in cgi directory
- Largely superseded by screen plugins but can still be used to add e.g. custom reports to your repository
- Similar template to command-line scripts but build Web page output using API render_ methods
89 Building Pages
- In Screen plugins, mechanics of sending Web pages to the user's browser are handled by the plugin subsystem
- need to do this yourself with CGI scripts
- methods provided by the Session object
- build_page(title, body)
- wraps your XHTML document in the archive template
- send_page()
- flatten page and send it to the user
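The page-building steps for a CGI script might be sketched as follows. `send_report_page` is a hypothetical helper; `$session` is assumed to be a Session the script has already connected, and the title is passed to build_page as a DOM node rather than a string, which is an assumption in keeping with the render methods above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a one-paragraph page and send it to the
# user's browser.
sub send_report_page
{
    my( $session, $title_text, $body_text ) = @_;

    # Title and body are DOM nodes, not strings
    my $title = $session->make_text( $title_text );

    my $body = $session->make_doc_fragment;
    my $p = $session->make_element( "p" );
    $p->appendChild( $session->make_text( $body_text ) );
    $body->appendChild( $p );

    # Wrap the fragment in the archive template, then flatten and send
    $session->build_page( $title, $body );
    $session->send_page;
}
```

The script remains responsible for connecting beforehand and calling terminate() afterwards.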
90 Summary
- Use the core API to manipulate data in the repository
- individual data objects
- EPrint, Document, User
- sets of data objects
- DataSet, List, Search
- Wrap this in a plugin or script
- Session, Repository
- Web output using render_ methods
- templates
- Next: hands-on exercises designed to get you started with these techniques