EPrints Training Course - PowerPoint PPT Presentation

Slides: 90
Provided by: epri1
Learn more at: https://www.eprints.org

Transcript and Presenter's Notes


1
Advanced Customisation: Scripting EPrints
  • EPrints Training Course
  • Southampton, May 3-4th 2007

2
Taking Control: the EPrints API
  • EPrints configuration files offer many
    opportunities for customisation and control
  • branding, workflow, controlled vocabs, authority
    lists, deposit types, metadata...
  • EPrints API offers many more opportunities
  • the more perl-intensive configuration files
  • e.g. eprint_render.pl
  • and beyond...
  • plugins
  • command-line tools

3
Roadmap
  • Core API
  • manipulating your data
  • accessing data collections
  • searching your data
  • Scripting techniques
  • essentials: putting it all together
  • writing export plugins
  • writing screen plugins
  • writing command-line tools
  • writing CGI scripts

4
Part 1: Core API
5
About This Part of the Talk
  • Light on syntax
  • $object->function(arg1, arg2)
  • Incomplete
  • Designed to
  • give you a feel for the EPrints data model
  • introduce you to the most significant (and
    useful!) objects
  • how they relate to one another
  • their most common methods
  • act as a jumping off point for exploring

6
Finding Documentation
  • EPrints modules have embedded documentation
  • Extract it using perldoc
  • perldoc perl_lib/EPrints/Search.pm

7
Core API: Manipulating Your Data
8
Data Model: 3 Core Objects
  • EPrint
  • single deposit in the repository
  • Document
  • single document attached to an EPrint
  • User
  • single registered user

[Diagram: the three core objects EPrint, Document and User]
9
Data Model: Core Relationships
  • 1 User owns (deposits) many EPrints
  • 1 EPrint has many documents attached to it
  • 1 Document may contain many files, but these are
    not part of the API
  • e.g. PDF = 1 file
  • e.g. HTML + images = many files

[Diagram: 1 User to many EPrints, 1 EPrint to many Documents]
10
Data Model: DataObj
  • All data objects inherit from DataObj
  • Provides common interface to data

[Diagram: EPrint, Document and User all inherit from DataObj]
11
Accessing Data: DataObj Interface
  • get_id()
  • get_url()
  • EPrint: abstract page
  • User: user summary page
  • Document: document download
  • get_type()
  • EPrint: article, book, thesis...
  • User: user, editor, admin
  • Document: pdf, html, word...

12
Manipulating Data: DataObj Interface
  • get_value(fieldname)
  • get the value of the named data field
  • $eprint->get_value( "title" )
  • set_value(fieldname, value)
  • set the value of the named field
  • $doc->set_value( "format", "pdf" )
  • is_set(fieldname)
  • true if the named field has a value
  • $user->is_set( "email" )

13
Manipulating Data: DataObj Interface (2)
  • commit()
  • write any changes made to the object through to
    the database
  • e.g. after using set_value
  • remove()
  • erase the object from the database
  • also removes any sub-objects and files
  • e.g. $eprint->remove
  • removes EPrint and associated Documents from DB
  • removes Document files from filesystem
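
The get/set/commit cycle described on the last two slides can be sketched without an EPrints install. This is a minimal illustration, not the EPrints implementation: `Sketch::DataObj` and its in-memory `%DATABASE` are hypothetical stand-ins, and only the method names (`get_value`, `set_value`, `is_set`, `commit`) come from the slides. It assumes, as the slides describe, that changes reach the database only when `commit` is called.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a data object; NOT the EPrints implementation.
package Sketch::DataObj;

# Pretend database table, keyed by record id
my %DATABASE = ( 1 => { title => "Old title" } );

sub new {
    my( $class, $id ) = @_;
    # Work on a copy, so edits stay local until commit() is called
    return bless { id => $id, data => { %{ $DATABASE{$id} } } }, $class;
}

sub get_value { my( $self, $field ) = @_; return $self->{data}->{$field}; }

sub set_value {
    my( $self, $field, $value ) = @_;
    $self->{data}->{$field} = $value;
    return;
}

sub is_set { my( $self, $field ) = @_; return defined $self->{data}->{$field} ? 1 : 0; }

# Write the local copy through to the pretend database
sub commit {
    my( $self ) = @_;
    $DATABASE{ $self->{id} } = { %{ $self->{data} } };
    return;
}

package main;

my $eprint = Sketch::DataObj->new( 1 );
$eprint->set_value( "title", "New title" );

# A second handle on the same record still sees the old value...
my $before = Sketch::DataObj->new( 1 )->get_value( "title" );
$eprint->commit;
# ...until the change has been committed
my $after = Sketch::DataObj->new( 1 )->get_value( "title" );

print "before commit: $before\n";
print "after commit:  $after\n";
```

Forgetting the `commit` call is the usual reason a script appears to run correctly but leaves the record unchanged.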

14
Getting Hold of Existing Data Objects
  • new(session, id)
  • returns data object for an existing record
  • EPrints::DataObj::EPrint->new($session, 1)
  • EPrints::DataObj::User->new($session, 1)
  • EPrints::DataObj::Document->new($session, 1)
  • User object has extra options
  • user_with_email(session, email)
  • user_with_username(session, username)

15
Creating New Data Objects
  • Slightly different for each data object
  • EPrint
  • create(session, dataset, data)
  • User
  • create(session, user_type)
  • Document
  • create(session, eprint)

16
Specific Methods
  • Each data object also has specific methods for
    manipulating their data

17
EPrint Methods
  • get_user()
  • get a User object representing the user to whom
    the EPrint belongs
  • get_all_documents()
  • get a list of all the Document objects associated
    with the EPrint
  • generate_static()
  • generate the static abstract page for the eprint
  • useful when you've modified the eprint values!
  • in a multi-language archive this will generate a
    page in each language

18
User Methods
  • get_eprints(dataset)
  • get a list of EPrints owned by the user
  • mail(subject, message)
  • send an email to the user

19
Document Methods
  • get_eprint()
  • get the EPrint object the document is associated
    with
  • local_path()
  • get the full path of the directory where the
    document is stored in the filesystem
  • files()
  • get a list of (filename, file size) pairs

20
Document Methods: Main File
  • get_main()
  • set_main(main_file)
  • get/set the main file for the document
  • this is the file that gets linked to
  • in majority of cases, Document will have 1 file
  • e.g. PDF
  • but there may be some cases where a Document has
    many files
  • e.g. HTML document = .html files, images,
    stylesheets
  • set main to top level index.html

21
Document Methods: Adding Files
  • add_file(file, filename)
  • upload(filehandle, filename)
  • both add a file to the document
  • add_file uses full path to file
  • upload uses file handle
  • in both cases the added file will be named filename

22
Document Methods: Adding Files (2)
  • upload_url(url)
  • grab file(s) from given URL
  • in the case of HTML, only relative links will be
    followed
  • add_archive(file, format)
  • add files from a .zip or .tar.gz file
  • remove_file(filename)
  • remove the named file

23
Other Data Objects
  • Subject
  • a node in the subjects tree
  • SavedSearch
  • a saved search associated with a User
  • History
  • an event that took place on another data object
  • e.g. change to eprint metadata
  • Access
  • a Web access to an object
  • e.g. document download
  • Request
  • a request for a (restricted) document
  • Explore these using perldoc

24
Core API: Accessing Data Collections
25
Accessing Data Collections
  • We've looked at individual data objects
  • but a repository holds many eprints and documents
    and has many registered users
  • 2 key ways to manipulate data objects
    collectively
  • built-in datasets
  • large fixed sets of data objects
  • by searching the repository
  • set of data objects matching specific criteria

26
Datasets
  • All data objects in the repository are part of a
    collection called a dataset
  • 3 core datasets
  • eprint
  • all eprints
  • user
  • all registered users
  • document
  • all documents

27
Datasets (2)
  • Also 4 subsets within eprint dataset which
    collect eprints in same state
  • archive
  • all eprints in live archive
  • inbox
  • all eprints which users are still working on
  • buffer
  • all eprints submitted for editorial review
  • deletion
  • all eprints retired from live archive

28
The DataSet Object
  • Gives access to all the data objects in a
    particular dataset
  • Also
  • tells us which data fields apply to that dataset
  • recall get_value and set_value methods
  • a repository's metadata is configurable so this
    gives us a way to find out
  • which fields are available in a particular
    repository
  • the properties of individual fields

29
Accessing DataSets
  • count(session)
  • get the number of items in the dataset
  • get_item_ids(session)
  • get the IDs of the objects in the dataset
  • map(function, args)
  • apply function to each object in the dataset
  • function is called with args
  • (session, dataset, dataobj, args)
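
The callback style of `map` can be sketched in plain Perl. This is an illustration under stated assumptions, not EPrints code: `Sketch::DataSet` is a hypothetical stand-in holding plain hashes, and passing the session as the first argument to `map` is an assumption made to match `count(session)` above. The part taken from the slide is that the function is called with (session, dataset, dataobj, args).

```perl
use strict;
use warnings;

# Hypothetical stand-in for a dataset; only map() is modelled.
package Sketch::DataSet;

sub new {
    my( $class, @items ) = @_;
    return bless { items => [ @items ] }, $class;
}

# Call $fn once per item with ( session, dataset, dataobj, args )
sub map {
    my( $self, $session, $fn, $info ) = @_;
    foreach my $dataobj ( @{ $self->{items} } )
    {
        $fn->( $session, $self, $dataobj, $info );
    }
    return;
}

package main;

my $dataset = Sketch::DataSet->new(
    { eprintid => 1, type => "article" },
    { eprintid => 2, type => "thesis" },
    { eprintid => 3, type => "article" },
);

# $info is shared between calls, so the callback can accumulate results in it
my $info = { counts => {} };
$dataset->map( undef, sub {
    my( $session, $ds, $dataobj, $args ) = @_;
    $args->{counts}->{ $dataobj->{type} }++;
}, $info );

foreach my $type ( sort keys %{ $info->{counts} } )
{
    print "$type: $info->{counts}->{$type}\n";
}
```

Passing a hash reference as args is a convenient way to get results back out of the callback, since the callback's return value is not used in this sketch.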

30
Fields in a DataSet
  • has_field(fieldname)
  • true if the dataset has a field of that name
  • get_field(fieldname)
  • get a MetaField object describing the named field
  • get_fields()
  • get list of MetaField objects describing all
    fields in the dataset

31
Datasets and MetaFields
  • A MetaField
  • is a single field in a dataset
  • tells us properties of the field
  • get_property(name)
  • set_property(name, value)
  • e.g. name, type, input_rows, maxlength,
    multiple...
  • but not the field value
  • the value is specific to the individual data
    object
  • e.g. $eprint->get_value( "title" )

32
Core API: Searching the Repository
33
Searching the Repository
  • The Search object allows us to search datasets
    for data objects matching specific criteria
  • Provides access to the results

34
Starting a New Search
  • new(options)
  • create a new search expression
  • must specify which dataset to search in
  • $search = new Search(
  • session => $session,
  • dataset => $dataset,
  • custom_order => "title" )
  • many other options can be specified
  • explore with perldoc

35
Adding Search Fields
  • add_field(metafield, value)
  • add a new search field with the given value
    (search text) to the search expression
  • add as many fields as you like to the search
    criteria

36
Adding Search Fields: Example
  • Example: full text search
  • $search->add_field(
  • $dataset->get_field( "title" ),
  • "routing",
  • "IN",
  • "ALL" )

37
Adding Search Fields: Example (2)
  • Example: full text search which matches word in
    title or abstract
  • $search->add_field(
  • [ $dataset->get_field( "title" ),
  • $dataset->get_field( "abstract" ) ],
  • "routing",
  • "IN",
  • "ALL" )

38
Adding Search Fields
  • Example: date search
  • $search->add_field(
  • $dataset->get_field( "date" ),
  • "2000-2004",
  • "EQ",
  • "ALL" )

39
Processing Search Results
  • Carry out a search using
  • $list = $search->perform_search()
  • Returns a List object which gives access to
    search results

40
The List Object
  • Any ordered collection of data objects
  • usually the results of a search

41
Processing Lists
  • count()
  • get the number of results
  • get_ids(offset, count)
  • get_records(offset, count)
  • get an array of data objects, or just their ids
  • optionally specify a range using count and offset
  • map(function, args)
  • apply the function to each data object in the list

42
Manipulating Lists
  • $newlist = $list->reorder( $neworder )
  • $newlist = $list->union( $list2 )
  • $newlist = $list->intersect( $list2 )
  • $newlist = $list->remainder( $list2 )
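
The three set operations above can be illustrated on plain arrays of ids. This is a sketch only: the helper functions are hypothetical and merely named after the List methods, and it assumes that remainder means "items in the first list that are not in the second".

```perl
use strict;
use warnings;

# Items from either list, first occurrence order, no duplicates
sub list_union {
    my( $list1, $list2 ) = @_;
    my %seen;
    return [ grep { !$seen{$_}++ } @$list1, @$list2 ];
}

# Items of the first list that also appear in the second
sub list_intersect {
    my( $list1, $list2 ) = @_;
    my %in2 = map { ( $_ => 1 ) } @$list2;
    return [ grep { $in2{$_} } @$list1 ];
}

# Items of the first list that do NOT appear in the second (assumed meaning)
sub list_remainder {
    my( $list1, $list2 ) = @_;
    my %in2 = map { ( $_ => 1 ) } @$list2;
    return [ grep { !$in2{$_} } @$list1 ];
}

my $list  = [ 1, 2, 3, 4 ];
my $list2 = [ 3, 4, 5 ];

my @union     = @{ list_union( $list, $list2 ) };
my @intersect = @{ list_intersect( $list, $list2 ) };
my @remainder = @{ list_remainder( $list, $list2 ) };

print "union:     @union\n";      # 1 2 3 4 5
print "intersect: @intersect\n";  # 3 4
print "remainder: @remainder\n";  # 1 2
```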

43
Part 2: Scripting Techniques
44
Roadmap
  • Core API
  • manipulating your data
  • accessing data collections
  • searching your data
  • Scripting techniques
  • essentials: putting it all together
  • writing export plugins
  • writing screen plugins
  • writing command-line tools
  • writing CGI scripts

45
Scripting Techniques: Essentials
46
Putting it all together
  • Two essential objects
  • Session
  • connects to the repository
  • many useful methods
  • Repository
  • provides access to
  • datasets
  • $session->get_repository->get_dataset( "archive" )
  • configuration settings
  • Explore using perldoc

47
Scripting for the Web
  • API provides lots of methods to help you build
    Web pages and display (render) data
  • these methods return (X)HTML
  • but not strings!
  • XML DOM objects
  • DocumentFragment, Element, TextNode...
  • Build pages from these nodes
  • $node1->appendChild( $node2 )
  • why? it's easier to manipulate a tree than to
    manipulate a large string

48
XML DOM vs. Strings
  • XML DOM
  • $p = make_element( "p" )
  • $text = make_text( "Hello World" )
  • $p->appendChild( $text )
  • can manipulate tree to add extra text, elements
    etc.
  • String
  • $p = "<p>"
  • $p .= "Hello World"
  • $p .= "</p>"
  • <p>Hello World</p>
  • difficult to make changes to the string: would
    need to find the right position first

[Diagram: a p element node with a "Hello World" text node as its child]
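
The tree-versus-string argument can be made concrete with a toy node class. This is a minimal sketch, not the XML DOM classes EPrints uses: `Sketch::Node` is hypothetical, handles only elements without attributes and text without escaping, and borrows just the `appendChild` name from the slides.

```perl
use strict;
use warnings;

# Minimal stand-in for DOM nodes; NOT a real XML DOM implementation.
package Sketch::Node;

sub element {
    my( $class, $name ) = @_;
    return bless { name => $name, children => [] }, $class;
}

sub text {
    my( $class, $text ) = @_;
    return bless { text => $text }, $class;
}

sub appendChild {
    my( $self, $child ) = @_;
    push @{ $self->{children} }, $child;
    return $child;
}

# Flatten the tree to a string only at the very end
sub to_string {
    my( $self ) = @_;
    return $self->{text} if exists $self->{text};
    my $inner = join "", map { $_->to_string } @{ $self->{children} };
    return "<$self->{name}>$inner</$self->{name}>";
}

package main;

my $p = Sketch::Node->element( "p" );
$p->appendChild( Sketch::Node->text( "Hello World" ) );

# Adding more content later is one more appendChild: no string searching
my $em = $p->appendChild( Sketch::Node->element( "em" ) );
$em->appendChild( Sketch::Node->text( "!" ) );

my $html = $p->to_string;
print "$html\n";   # <p>Hello World<em>!</em></p>
```

With the string approach, inserting the `em` element would mean locating the closing `</p>` and splicing text in front of it.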
49
Render Methods: Session
  • Session provides many useful Web page building
    blocks
  • make_doc_fragment()
  • create an empty XHTML document
  • fill it with things!
  • make_text(text)
  • create an XML TextNode
  • make_element(name, attrs)
  • create an XHTML Element
  • make_element( "p", align => "right" )
  • <p align="right" />

50
Render Methods: Session (2)
  • render_link(uri, target)
  • create an XHTML link
  • $link = $session->render_link( "http://www.eprints.org" )
  • $text = $session->make_text( "EPrints" )
  • $link->appendChild( $text )
  • <a href="http://www.eprints.org">EPrints</a>

51
Render Methods: Session (3)
  • html_phrase(phraseid, inserts)
  • render an XHTML phrase in the current language
  • looks up phraseid from the phrases files
  • inserts can be used to add extra information to
    the phrase
  • must be a corresponding <epc:pin> in the phrase
  • <epp:phrase>Number of results <epc:pin
    name="count"/></epp:phrase>

52
Render Methods: Session (4)
  • Many methods for building input forms, including
  • render_form(method, dest)
  • render_option_list(params)
  • render_hidden_field(name, value)
  • render_upload_field(name)
  • render_action_buttons(buttons)
  • ...

53
Rendering Methods: Data Objects
  • render_citation(style)
  • render_citation_link(style)
  • create an XHTML citation for the object
  • if style is set then use the named citation style
  • render_value(fieldname)
  • get an XHTML fragment containing the rendered
    version of the value of the named field
  • in the current language

54
Rendering Methods: MetaFields
  • render_name(session)
  • render_help(session)
  • get an XHTML fragment containing the
    name/description of the field in the current
    language

55
Rendering Methods: Searches
  • render_description()
  • get some XHTML describing the search parameters
  • render_search_form(help)
  • render an input form for the search
  • if help is true then this also renders the help
    for each search field in current language

56
Getting User Input (CGI parameters)
  • Session object also provides useful methods for
    getting user input
  • e.g. from an input form
  • have_parameters
  • true if parameters (POST or GET) are available
  • param(name)
  • get the value of a named parameter

57
Scripting Techniques: Writing Export Plugins
58
Plugin Framework
  • EPrints provides a framework for plugins
  • registration of plugin capabilities
  • standard interface which plugins need to
    implement
  • Several types of plugin interface provided
  • import and export
  • get data in and out of the repository
  • interface screens
  • add new tools and reports to UI
  • input components
  • add new ways for users to enter data

59
Plugin Framework (2)
  • Not just a plugin framework for 3rd party
    extensions!
  • Used extensively by EPrints itself
  • majority of (dynamic) Web pages you see are
    screen plugins
  • search, deposit workflow, editorial review, item
    control page, user profile, saved searches,
    administration tools...
  • all import/export options implemented as plugins
  • all input components in deposit workflow are
    plugins
  • subject browser input, file upload...

60
Plugin Framework (3)
  • EPrints is really a generic plugin framework
  • with a set of plugins that implement the
    functions of a repository
  • Gives plugin developers many examples to work
    from
  • find a plugin that does something similar to what
    you want to achieve and explore how it works

[Diagram: three layers labelled Plugins, Plugin Framework and Backend (data model)]
61
Writing Export Plugins
  • Typically a standalone Perl module in
  • perl_lib/EPrints/Plugin/Export/
  • Writing export plugins
  • register plugin
  • define how to convert data objects to an
    output/interchange format

62
Export Plugin Registration
  • Register
  • name
  • the name of the plugin
  • visible
  • who can use it
  • accept
  • what the plugin can convert
  • lists of data objects or single data objects (or
    both)
  • type of record (eprint, user...)
  • suffix and mimetype
  • file extension and MIME type of format plugin
    converts to

63
Registration Example: BibTeX
  • $self->{name} = "BibTeX";
  • $self->{accept} = [ 'list/eprint',
    'dataobj/eprint' ];
  • $self->{visible} = "all";
  • $self->{suffix} = ".bib";
  • $self->{mimetype} = "text/plain";
  • Converts lists or single EPrint objects
  • Available to all users
  • Produces plain text file with .bib extension

64
Registration Example: FOAF
  • $self->{name} = "FOAF Export";
  • $self->{accept} = [ 'dataobj/user' ];
  • $self->{visible} = "all";
  • $self->{suffix} = ".rdf";
  • $self->{mimetype} = "text/xml";
  • Converts single User objects
  • Available to all users
  • Produces XML file with .rdf extension

65
Registration Example: XML
  • $self->{name} = "EP3 XML";
  • $self->{accept} = [ 'list/*', 'dataobj/*' ];
  • $self->{visible} = "all";
  • $self->{suffix} = ".xml";
  • $self->{mimetype} = "text/xml";
  • Converts any data object
  • Available to all users
  • Produces XML file with .xml extension

66
Export Plugin: Conversion
  • For a straight conversion plugin, this usually
    includes
  • mapping data objects to output/interchange format
  • serialising the output/interchange format
  • e.g. EndNote conversion section
  • $data->{K} = $dataobj->get_value( "keywords" );
  • $data->{T} = $dataobj->get_value( "title" );
  • $data->{U} = $dataobj->get_url;
67
Export Plugin: Conversion (2)
  • But export plugins aren't limited to straight
    conversions!
  • Explore
  • Google Maps export plugin
  • plot location data on map
  • http://files.eprints.org/224/
  • Timeline export plugin
  • plot date data on timeline
  • http://files.eprints.org/225/

68
Export Plugin Template
  • Register
  • subclass EPrints::Plugin::Export
  • inherits all the mechanics so you don't have to
    worry about them
  • could subclass existing plugin e.g. XML, Feed
  • define name, accept, visible etc.
  • in constructor new() of plugin module
  • Conversion
  • define output_dataobj function
  • will be called by plugin subsystem for every data
    object that needs to be converted

69
Writing Import Plugins
  • Typically a standalone Perl module in
  • perl_lib/EPrints/Plugin/Import/
  • Reading input can be harder than writing output
  • need to detect and handle errors in input
  • many existing libraries available for parsing a
    wide variety of file formats
  • Writing import plugins
  • register
  • define how to convert input/interchange format
    into data objects
  • reverse of export

70
Scripting Techniques: Writing Screen Plugins
71
Plugins: Writing Screen Plugins
  • One or more Perl modules in
  • perl_lib/EPrints/Plugin/Screen/
  • may be bundled with phrases, config files,
    stylesheets etc.
  • Writing screen plugins
  • register
  • where it appears in UI
  • who can use it
  • define functionality

72
Screen Plugin Registration
  • Register
  • actions
  • the actions the plugin can carry out (if any)
  • appears
  • whereabouts in the interface the plugin and/or
    actions will appear
  • named list
  • position in list
  • will be displayed as link, button or tab

73
Registration Example: Manage Deposits
  • $self->{appears} = [
  • { place => "key_tools", position => 100, },
  • ];

[Screenshot: the key_tools list]
74
Registration Example: EPrint Details
  • $self->{appears} = [
  • { place => "eprint_view_tabs", position => 100,
    },
  • ];

[Screenshot: the eprint_view_tabs list (each tab is a single screen plugin)]
75
Registration Example: New Item
  • $self->{appears} = [
  • { place => "item_tools", position => 100,
  • action => "create", },
  • ];

[Screenshot: the item_tools list (the create action is invoked when the button is pressed)]
76
Screen Plugin: Define Functionality
  • 3 types of screen plugin
  • Render only
  • define how to produce output display
  • examples: Admin::Status, EPrint::Details
  • Action only (no output display)
  • define how to carry out action(s)
  • examples: Admin::IndexerControl, EPrint::Move,
    EPrint::NewVersion
  • Combined (interactive)
  • define how to produce output/carry out action(s)
  • examples: EPrint::RejectWithEmail, EPrint::Edit,
    User::Edit

77
Screen Plugins: Displaying Messages
  • Action plugins produce no output display but can
    still display messages to user
  • add_message(type, message)
  • register a message that will be displayed to the
    user on the next screen they see
  • type can be
  • error
  • warning
  • message (informational)

78
Screen Plugin Template: Render Only
  • Register
  • subclass EPrints::Plugin::Screen
  • inherits all the mechanics so you don't have to
    worry about them
  • could subclass existing plugin e.g. EPrint, User
  • define where plugin appears
  • in constructor new() of plugin module
  • define who can view plugin (if required)
  • can_be_viewed function
  • e.g. check user privileges
  • Define functionality
  • define render function
  • produce output display using API render_ methods

79
Screen Plugin Template: Action Only
  • Register
  • subclass EPrints::Plugin::Screen
  • define actions supported
  • define where actions appear
  • define who can use actions
  • allow_ACTION function(s)
  • Define functionality
  • define action_ACTION function(s)
  • carry out the action
  • use add_message to show result/error
  • redirect to a different screen when done
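
The action-only template can be outlined with stubs. This is a sketch of the pattern only: `Sketch::Processor`, `Sketch::Screen::Archive`, the `archive` action and the `is_editor` flag are all hypothetical, and the message is a plain string here for brevity. The parts taken from the slides are the `allow_ACTION` / `action_ACTION` naming and `add_message(type, message)`.

```perl
use strict;
use warnings;

# Stub that collects messages and remembers the next screen (stand-in only)
package Sketch::Processor;

sub new {
    my( $class ) = @_;
    return bless { messages => [], screen => undef }, $class;
}

sub add_message {
    my( $self, $type, $message ) = @_;
    push @{ $self->{messages} }, [ $type, $message ];
    return;
}

# Hypothetical action-only screen plugin
package Sketch::Screen::Archive;

sub new {
    my( $class, %opts ) = @_;
    my $self = bless { %opts }, $class;
    $self->{actions} = [ "archive" ];   # actions supported
    return $self;
}

# Who can use the action: here, only users flagged as editors
sub allow_archive {
    my( $self ) = @_;
    return $self->{user}->{is_editor} ? 1 : 0;
}

# Carry out the action, report the result, then pick the next screen
sub action_archive {
    my( $self ) = @_;
    $self->{item}->{status} = "archive";
    $self->{processor}->add_message( "message", "Item moved to archive" );
    $self->{processor}->{screen} = "Sketch::Screen::View";
    return;
}

package main;

my $processor = Sketch::Processor->new;
my $item      = { status => "buffer" };
my $plugin    = Sketch::Screen::Archive->new(
    processor => $processor,
    user      => { is_editor => 1 },
    item      => $item,
);

# The plugin subsystem would make this check before invoking the action
if( $plugin->allow_archive )
{
    $plugin->action_archive;
}
else
{
    $processor->add_message( "error", "Not allowed" );
}

print "$item->{status}: $processor->{messages}->[0]->[1]\n";
```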

80
Screen Plugin Template: Combined
  • render function usually displays links/buttons
    which invoke the plugin's actions
  • e.g. EPrint::Remove
  • registers remove and cancel actions
  • render function displays "Are you sure?" screen
  • OK/Cancel buttons invoke remove/cancel actions

81
Scripting Techniques: Writing Command Line Scripts
82
Command Line Scripts
  • Usually stored in bin directory
  • Add batch/offline processes to your repository
  • e.g. duplicate detection: compare each record to
    every other record
  • e.g. file integrity: check stored MD5 sums
    against actual MD5 sums

83
Connecting to the Repository
  • Command line scripts (and CGI scripts) must
    explicitly connect to the repository by creating
    a new Session object
  • new(mode, repositoryid)
  • set mode to 1 for command line scripts
  • set mode to 0 for CGI scripts
  • And disconnect from the repository when complete
  • terminate()
  • performs necessary cleanup

84
Using Render Functions
  • XHTML is good for building Web pages
  • but not so good for command line output!
  • often no string equivalent
  • use tree_to_utf8()
  • extracts a string from the result of any
    rendering method
  • tree_to_utf8(
  • $eprint->render_citation )

85
Search and Modify Template
  • Common pattern for command line tools
  • Connect to repository
  • Get desired dataset
  • Search dataset
  • Apply function to matching results
  • modify result
  • commit changes
  • Disconnect from repository

86
Example: lift_embargos
  • Removes access restrictions on documents with
    expired embargos
  • Connect to repository
  • Get document dataset
  • Search dataset
  • embargo date field earlier than today's date
  • Apply function to matching results
  • remove access restriction
  • clear embargo date
  • commit changes
  • Disconnect from repository
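
The search-and-modify pattern behind lift_embargos can be sketched on in-memory records. This is an illustration, not the script shipped with EPrints: the records are plain hashes, the field names `security` and `date_embargo` and the value `staffonly` are assumptions made for the example, and the connect, commit and disconnect steps appear only as comments because they need a live repository.

```perl
use strict;
use warnings;
use POSIX qw( strftime );

# (Connect to repository and get the document dataset: omitted in this sketch)

# In-memory stand-ins for document records; field names are assumptions
my @documents = (
    { docid => 1, security => "staffonly", date_embargo => "2001-01-01" },
    { docid => 2, security => "staffonly", date_embargo => "2999-12-31" },
    { docid => 3, security => "public",    date_embargo => undef },
);

my $today = strftime( "%Y-%m-%d", localtime );

# "Search dataset": embargo date set and earlier than today.
# ISO 8601 dates sort correctly when compared as strings.
my @expired = grep {
    defined $_->{date_embargo} && $_->{date_embargo} lt $today
} @documents;

# "Apply function to matching results"
my @lifted;
foreach my $doc ( @expired )
{
    $doc->{security}     = "public";   # remove access restriction
    $doc->{date_embargo} = undef;      # clear embargo date
    push @lifted, $doc->{docid};       # stand-in for committing the change
}

# (Disconnect from repository: omitted in this sketch)

print "lifted embargo on document(s): @lifted\n";
```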

87
Scripting Techniques: Writing CGI Scripts
88
CGI Scripts
  • Usually stored in cgi directory
  • Largely superseded by screen plugins but can
    still be used to add e.g. custom reports to your
    repository
  • Similar template to command-line scripts but
    build Web page output using API render_ methods

89
Building Pages
  • In Screen plugins, mechanics of sending Web pages
    to the user's browser are handled by the plugin
    subsystem
  • need to do this yourself with CGI scripts
  • methods provided by the Session object
  • build_page(title, body)
  • wraps your XHTML document in the archive template
  • send_page()
  • flatten page and send it to the user

90
Summary
  • Use the core API to manipulate data in the repository
  • individual data objects
  • EPrint, Document, User
  • sets of data objects
  • DataSet, List, Search
  • Wrap this in a plugin or script
  • Session, Repository
  • Web output using render_ methods
  • templates
  • Next: hands-on exercises designed to get you
    started with these techniques