Scripting EPrints

Slides: 89
Provided by: epri1
Learn more at: https://www.eprints.org

1
Scripting EPrints
2
About This Talk
  • Light on syntax
  • object->function(arg1, arg2)
  • Incomplete
  • Designed to
  • give you a feel for the EPrints data model
  • introduce you to the most significant objects
  • how they relate to one another
  • their most common methods
  • act as a jumping off point for exploring

3
Finding Documentation
  • EPrints modules have embedded documentation
  • Extract it using perldoc
  • perldoc perl_lib/EPrints/EPrint.pm

4
EPrints 3.0
  • This talk based on EPrints 2.3 series
  • 3.0 API still being finalised
  • tidies up object hierarchy
  • resolves some of 2.3's naming clashes
  • lots of extra functionality
  • but core data model remains the same
  • EPrints 3.0 is fully back-compatible
  • 2.3 scripts will work with EPrints 3.0

5
Roadmap
  • Data
  • EPrints, Users, Documents, Subjects,
    Subscriptions
  • Data collections
  • DataSets, MetaFields
  • Searching your data
  • SearchExpressions
  • Scripting your archive
  • Archives, Session

6
1. Data
  • EPrints, Users, Documents, Subjects, Subscriptions

7-14
Data Model Sketch
  • [Diagram built up over slides 7 to 14; only its labels survive in this transcript]
  • EPrint to Document: "all documents" (documents shown in PDF and HTML formats)
  • EPrint to User: "owner"
  • User to EPrint: "owned eprints"
  • User to Subscription: "subscriptions"
  • Subject to EPrint: "posted eprints"
  • Subject to Subject: "parent" and "child"

15
EPrint
  • An EPrint object represents a single deposit in
    your EPrints archive
  • has some metadata fields
  • has one or more documents
  • is owned by a user

16
Creating EPrints
  • new(session, id)
  • create an EPrint object for an existing deposit
  • create(session, dataset, data)
  • create a new EPrint object
  • More on sessions and datasets later!

17
Introducing DataObj
  • EPrint is a subclass of DataObj
  • DataObj provides common methods for
  • accessing metadata
  • rendering XHTML output

18
Inherited from DataObj
  • get_id
  • get_url(staff)
  • get the URL of an EPrint
  • e.g. URL to the abstract page of an eprint in the
    archive
  • if staff is true then returns the URL to the
    staff view, which shows more detail
  • get_type()
  • get the EPrint type
  • e.g. article, book, thesis, conference paper...

19
Inherited from DataObj
  • get_value(fieldname)
  • get the value of the named field
  • set_value(fieldname, value)
  • set the value of the named field
  • Remember to call commit() to make changes in
    database!
  • is_set(fieldname)
  • true if the named field has a value
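
The accessors on this slide combine into a common read, modify, commit pattern. The sketch below is only an illustration of that pattern, not EPrints code: StubDataObj is a hypothetical stand-in that mimics just the methods named here so the example runs without an EPrints installation, and ensure_title is an illustrative helper name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an EPrints data object. A real script would
# work with an EPrint or User object from the EPrints libraries instead.
package StubDataObj;
sub new       { my ($class, %data) = @_; return bless { data => {%data}, saved => 0 }, $class; }
sub get_value { my ($self, $name) = @_; return $self->{data}{$name}; }
sub set_value { my ($self, $name, $value) = @_; $self->{data}{$name} = $value; return; }
sub is_set    { my ($self, $name) = @_; my $v = $self->{data}{$name}; return defined($v) && $v ne ''; }
sub commit    { my ($self) = @_; $self->{saved}++; return 1; }

package main;

# Give the object a default title only if it has none, then commit.
# Returns 1 if a change was written, 0 if the title was already set.
sub ensure_title {
    my ($obj, $default) = @_;
    return 0 if $obj->is_set('title');
    $obj->set_value( 'title', $default );
    $obj->commit();    # without this the change never reaches the database
    return 1;
}

my $eprint = StubDataObj->new( abstract => 'An example abstract' );
ensure_title( $eprint, 'Untitled' );
print $eprint->get_value('title'), "\n";
```

Calling ensure_title a second time on the same object is a no-op, because is_set('title') is then true.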

20
EPrint Methods
  • remove()
  • erase the eprint and any associated records/files
    from the database and filesystem
  • this should only be called on EPrints in the
    "inbox" or "buffer" datasets
  • commit()
  • commit any changes made to the database
  • datestamp()
  • set the last modified date to today

21
Moving EPrints Around
  • move_to_deletion()
  • transfer the eprint to the deletion dataset
  • should only be called on eprints in the archive
    dataset
  • See also
  • move_to_inbox()
  • move_to_buffer()
  • move_to_archive()

22
Rendering EPrints
  • generate_static()
  • generate the static abstract page for the eprint
  • in a multi-language archive this will generate a
    page in each language

23
Rendering - Inherited from DataObj
  • render_citation(style)
  • create an XHTML citation for the EPrint
  • if style is set then use the named citation style
  • defined in citations-en.xml
  • render_citation_link(style)
  • as above, but citation is linked to the EPrint's
    abstract page

24
Rendering - Inherited from DataObj
  • render_value(fieldname, showall)
  • get an XHTML fragment containing the rendered
    version of the value of the named field
  • in the current language
  • if showall is true then all languages are
    rendered
  • usually used for staff viewing (checking) data

25
Rendering Tips
  • Most rendering methods return XHTML
  • but not a string!
  • XML Node objects
  • DocumentFragment, Element, TextNode...
  • In your scripts, build a document tree from these
    nodes
  • e.g. $node1->appendChild($node2)
  • then flatten it to a string
  • Why? It's easier to manipulate a tree than to
    manipulate a large string

26
More Rendering Tips
  • XML Node objects are not part of EPrints
  • XML::DOM or XML::GDOME libraries
  • explore these libraries using perldoc
  • XHTML is good for building Web pages
  • but not so good for command line output!
  • use tree_to_utf8()
  • extracts a string from the result of any
    rendering method
  • tree_to_utf8( $eprint->render_citation )
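
The "build a tree, then flatten it" idea from these two slides can be sketched in plain Perl. This is a toy model under stated assumptions: StubNode is a hypothetical stand-in for the XML::DOM or XML::GDOME node objects that rendering methods return, and flatten_text only plays the role that tree_to_utf8() plays for real rendering results.

```perl
use strict;
use warnings;

# Hypothetical minimal node class. Real XML node objects have a much
# richer interface; only appendChild is mimicked here.
package StubNode;
sub element     { my ($class, $name) = @_; return bless { name => $name, kids => [] }, $class; }
sub text        { my ($class, $text) = @_; return bless { text => $text }, $class; }
sub appendChild { my ($self, $child) = @_; push @{ $self->{kids} }, $child; return $child; }

package main;

# Flatten a tree to plain text by concatenating its text nodes depth-first.
sub flatten_text {
    my ($node) = @_;
    return $node->{text} if exists $node->{text};
    my $out = '';
    $out .= flatten_text($_) for @{ $node->{kids} };
    return $out;
}

# Build <p>Title: <em>Scripting EPrints</em></p> as a tree of nodes.
my $para = StubNode->element('p');
my $em   = StubNode->element('em');
$em->appendChild( StubNode->text('Scripting EPrints') );
$para->appendChild( StubNode->text('Title: ') );
$para->appendChild($em);

print flatten_text($para), "\n";
```

Because the pieces stay as nodes until the last step, a sub-tree such as $em can be moved or reused without any string surgery.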

27
Navigating to Related Objects
  • get_user()
  • get a User object representing the user to whom
    the EPrint belongs
  • get_all_documents()
  • get a list of all the Document objects associated
    with the EPrint
  • We will look at these objects next...

28
User
  • A User object represents a single registered user
  • Also a subclass of DataObj
  • inherits metadata access methods
  • get_url, get_type, get_value, set_value, is_set
  • inherits rendering methods
  • render_citation, render_citation_link, render_value
  • Also has commit and remove
  • inherited from DataObj in 3.0

29
Creating Users
  • new(session, id)
  • create a User object from an existing user record
  • user_with_email(session, email)
  • user_with_username(session, username)
  • create_user(session, access_level)
  • create a new User

30
User Accessors
  • get_editable_eprints()
  • get a list of EPrints that the user can edit
  • get_owned_eprints(dataset)
  • get a list of EPrints owned by the user in the
    dataset
  • is_owner(eprint)
  • true if the user is the owner of the EPrint
  • get_subscriptions()
  • get a list of Subscriptions associated with the
    user

31
Document
  • A single document associated with an eprint
  • may actually contain one or more physical files
  • PDF = 1 file
  • HTML + images = many files
  • Another subclass of DataObj

32
Creating a Document Object
  • new(session, docid)
  • create a Document object from an existing record
  • create(session, eprint)
  • create a new Document object for the given EPrint

33
Document Accessors
  • get_eprint()
  • get the EPrint object the document is associated
    with
  • local_path()
  • get the full path of the directory where the
    document is stored in the filesystem
  • files()
  • get a list of (filename, file size) pairs
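
As a rough illustration of navigating from an eprint to its documents and their files, the sketch below totals the file sizes for one eprint. StubEPrint and StubDocument are hypothetical stand-ins, and the shape of the files() result (a flat list of filename and size pairs that can be read into a hash) is an assumption based on the wording of this slide, so check perldoc for the real return value.

```perl
use strict;
use warnings;

# Hypothetical stand-ins that mimic only the two accessors used below.
package StubDocument;
sub new   { my ($class, %files) = @_; return bless { files => {%files} }, $class; }
sub files { my ($self) = @_; return %{ $self->{files} }; }

package StubEPrint;
sub new               { my ($class, @docs) = @_; return bless { docs => [@docs] }, $class; }
sub get_all_documents { my ($self) = @_; return @{ $self->{docs} }; }

package main;

# Total size in bytes of every file in every document of an eprint.
sub total_file_size {
    my ($eprint) = @_;
    my $total = 0;
    foreach my $doc ( $eprint->get_all_documents() ) {
        my %files = $doc->files();    # filename => size (assumed shape)
        $total += $_ for values %files;
    }
    return $total;
}

my $eprint = StubEPrint->new(
    StubDocument->new( 'paper.pdf'  => 120_000 ),
    StubDocument->new( 'index.html' => 4_000, 'fig1.png' => 30_000 ),
);
print total_file_size($eprint), "\n";
```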

34
Main File and Format
  • get_main()
  • set_main(main_file)
  • get/set the main file for the document
  • e.g. if the document is multipage HTML with
    images, the main file needs to be set to the top
    index.html file
  • when rendering document links, EPrints always
    links to the main file in the document
  • set_format(format)
  • sets the document format

35
Adding Files to Documents
  • upload(filehandle, filename)
  • uploads the contents of the given file handle
  • adds the file to the document (using the given
    filename)
  • add_file(file, filename)
  • adds a file to the document (using the given
    filename)
  • file is the full path to the file

36
Adding Files to Documents
  • upload_url(url)
  • grab file(s) from given URL
  • in the case of HTML, only relative links will be
    followed
  • add_archive(file, format)
  • add files from a .zip or .tar.gz archive
  • remove_file(filename)
  • remove the named file from the Document

37
Subject
  • A single subject from the subject hierarchy
  • Another subclass of DataObj

38
Creating Subjects
  • new(session, subjectid)
  • create a Subject object from an existing subject
  • create(session, id, name, parent, depositable)
  • create a new Subject
  • depositable specifies whether or not users can
    deposit eprints in the subject

39
Subject Accessors
  • children()
  • get a list of Subjects which are the children of
    the subject
  • get_parents()
  • get a list of Subjects which are the parents of
    the subject
  • subject_label(session, subject_tag)
  • get the full label of a subject, including parents
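
Since children() leads from a subject to its sub-subjects, the whole hierarchy can be walked recursively. The sketch below assumes children() returns a plain Perl list; StubSubject is a hypothetical stand-in used so the example runs on its own, and subject_lines is an illustrative helper name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Subject: an id plus a list of child subjects.
package StubSubject;
sub new      { my ($class, $id, @children) = @_; return bless { id => $id, children => [@children] }, $class; }
sub get_id   { return $_[0]{id}; }
sub children { return @{ $_[0]{children} }; }

package main;

# Depth-first walk returning one indented line per subject.
sub subject_lines {
    my ($subject, $depth) = @_;
    $depth = 0 unless defined $depth;
    my @lines = ( ( '  ' x $depth ) . $subject->get_id() );
    foreach my $child ( $subject->children() ) {
        push @lines, subject_lines( $child, $depth + 1 );
    }
    return @lines;
}

my $tree = StubSubject->new( 'H',
    StubSubject->new( 'HD', StubSubject->new('HD28') ),
    StubSubject->new('HF'),
);
print "$_\n" for subject_lines($tree);
```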

40
Subject Accessors
  • count_eprints(dataset)
  • get the number of eprints associated with the
    subject
  • posted_eprints(dataset)
  • get a list of EPrints associated with the subject

41
Rendering Subjects
  • render_with_path(session, topsubjid)
  • get a DocumentFragment containing the subject
    path
  • example of a subject path
  • H Social Sciences > HD Industries. Land use.
    Labor > HD28 Management. Industrial Management

42
Subscription
  • A stored search which is performed every
    day/week/month on behalf of a user
  • get_user()
  • get the User who owns the subscription
  • Another subclass of DataObj

43
Creating Subscriptions
  • new(session, id)
  • create a Subscription object from an existing
    subscription
  • create(session, userid)
  • create a new Subscription object for the given
    user

44
Processing Subscriptions
  • send_out_subscription()
  • search for new items matching the subscription
    settings
  • email them to the user owning the subscription

45
DataObj Hierarchy
46
So Far...
  • We've looked at individual data objects
  • but an EPrints archive holds many eprints and
    documents, has many registered users etc.
  • how do we access them collectively?
  • We've seen the get_value and set_value methods
    for metadata
  • but an archive's metadata is configurable
  • so how do we know what metadata fields an EPrint,
    User etc. has?
  • how do we access properties of the fields?

47
2. Data Collections
  • DataSets and MetaFields

48
Dataset
  • A collection of data items
  • Tells us all the possible types in the collection
  • e.g. EPrints may be article, thesis
  • Tells us the fields in each type
  • e.g. article has title, authors, publication...
  • e.g. conference_item has title, authors,
    event_title, event_date..
  • Can also tell us all the fields that apply to a
    dataset
  • title, authors, publication, event_title..

49
Dataset Configuration
  • ArchiveMetadataFieldsConfig.pm
  • fields in each dataset
  • additional system fields defined in EPrint.pm,
    User.pm etc.
  • metadata-types.xml
  • types in each dataset
  • fields that apply to each type

50
Datasets in EPrints
  • archive
  • EPrints that are live in the main archive
  • buffer
  • EPrints that have been submitted for editorial
    approval
  • deletion
  • EPrints that have been deleted from the archive
  • inbox
  • EPrints which users are still working on
  • eprint
  • All EPrints from archive, buffer, deletion and
    inbox

51
Datasets in EPrints
  • user
  • all registered Users
  • subject
  • all Subjects in the subject tree
  • document
  • the Documents belonging to all EPrints in the
    archive
  • subscription
  • the Subscriptions which Users have requested

52
DataSet Accessors
  • id()
  • get the id of the dataset
  • count(session)
  • get the number of items in the dataset
  • get_item_ids(session)
  • get a list of ids of the items in the dataset

53
Datasets and MetaFields
  • Many Dataset methods return MetaField objects
  • A MetaField
  • is a single field in a dataset
  • tells us properties of the field
  • e.g. name, type, input_rows, maxlength, multiple
    etc.
  • configured in ArchiveMetadataFieldsConfig.pm
  • but not the field value
  • the value is specific to the individual EPrint,
    User etc.
  • e.g. $eprint->get_value("title")

54
MetaField Methods
  • get_name()
  • get the field name
  • get_type()
  • get the field type
  • get_property(name)
  • set_property(name, value)
  • get/set the named property to the given value

55
MetaField Type Hierarchy
56
DataSet Accessors
  • has_field(fieldname)
  • true if the dataset has a field of that name
  • get_field(fieldname)
  • get a MetaField object describing the named field

57
DataSet Accessors
  • get_fields()
  • get a list of MetaFields belonging to the dataset
  • get_types()
  • get a list of all types in the dataset
  • e.g. EPrint types: article, book, book_section,
    conference_item, monograph, patent, thesis, other
  • e.g. User types: user, editor, admin
  • get_type_name(session, type)
  • get a string containing a human-readable name for
    the specified type in current language

58
DataSet Accessors
  • get_type_fields(type)
  • get a list of MetaFields belonging to the given
    type
  • get_required_type_fields(type)
  • get a list of the MetaFields which are required
    for the given type
  • field_required_in_type(field, type)
  • true if given field is required in given type
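
The dataset accessors make it possible to discover an archive's metadata at run time instead of hard-coding field names. The sketch below lists the fields of one type and flags the required ones. It rests on assumptions: StubDataSet and StubField are hypothetical stand-ins, get_type_fields is taken to return a plain Perl list, field_required_in_type is taken to accept a MetaField object, and the field type names ("text", "name") are purely illustrative.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a MetaField.
package StubField;
sub new      { my ($class, $name, $type) = @_; return bless { name => $name, type => $type }, $class; }
sub get_name { return $_[0]{name}; }
sub get_type { return $_[0]{type}; }

# Hypothetical stand-in for a DataSet, mimicking two accessors.
package StubDataSet;
sub new             { my ($class, %spec) = @_; return bless {%spec}, $class; }
sub get_type_fields { my ($self, $type) = @_; return @{ $self->{fields}{$type} || [] }; }
sub field_required_in_type {
    my ($self, $field, $type) = @_;
    my @hits = grep { $_ eq $field->get_name() } @{ $self->{required}{$type} || [] };
    return scalar @hits;
}

package main;

# One descriptive line per field of the given type.
sub describe_type {
    my ($dataset, $type) = @_;
    my @lines;
    foreach my $field ( $dataset->get_type_fields($type) ) {
        my $flag = $dataset->field_required_in_type( $field, $type ) ? ' (required)' : '';
        push @lines, sprintf( '%s [%s]%s', $field->get_name(), $field->get_type(), $flag );
    }
    return @lines;
}

my @conf_fields = (
    StubField->new( 'title',       'text' ),
    StubField->new( 'authors',     'name' ),
    StubField->new( 'event_title', 'text' ),
);
my $dataset = StubDataSet->new(
    fields   => { conference_item => \@conf_fields },
    required => { conference_item => [ 'title', 'authors' ] },
);
print "$_\n" for describe_type( $dataset, 'conference_item' );
```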

59
Rendering DataSets
  • render_name(session)
  • get an XHTML fragment containing the name of the
    dataset in the language of the current session
  • render_type_name(session, type)
  • get an XHTML fragment containing the name of the
    given type in the language of the session

60
Rendering MetaFields
  • render_name(session)
  • get an XHTML fragment containing the name of the
    field in the current language
  • e.g. from phrases-en.xml
  • <ep:phrase ref="eprint_fieldname_title">Title</ep:phrase>

61
Rendering MetaFields
  • render_input_field(session, value)
  • get some XHTML containing input controls that
    will allow a user to input data to the field
  • value is the default value

62
Rendering MetaFields
  • render_help(session, type)
  • get some XHTML containing help text for a user
    inputting some data for the field
  • if an optional type is specified then specific
    help for that type will be used if available
  • e.g. from phrases-en.xml
  • <ep:phrase ref="eprint_fieldhelp_title">The title
    of the item.</ep:phrase>
  • <ep:phrase ref="eprint_fieldhelp_title.book">The
    title of the book, usually found on the title
    page.</ep:phrase>

63
So Far...
  • We know how to access data objects in EPrints
  • EPrint, User, Document ...
  • We know how to access collections of these
    objects
  • Datasets
  • MetaFields
  • Now, how do we search for items?

64
Searching Your Archive
  • SearchExpressions

65
SearchExpression
  • The conditions of a single search
  • new(data)
  • create a new search expression from the given
    data
  • $se = new SearchExpression(
  • session => $session,
  • dataset => $dataset,
  • custom_order => "title" );
  • sorted by title, ascending

66
Adding Search Fields
  • add_field(metafield, value)
  • add a new search field with the given value
    (search text) to the search expression
  • if the search field already exists in the search
    expression, its value is replaced

67
Adding Search Fields
  • Example: full text search
  • $searchexp->add_field(
  • $dataset->get_field("title"),
  • "routing",
  • "IN",
  • "ALL" );

68
Adding Search Fields
  • Example: full text search
  • matches word in title OR abstract
  • $searchexp->add_field(
  • [ $dataset->get_field("title"),
  • $dataset->get_field("abstract") ],
  • "routing",
  • "IN",
  • "ALL" );

69
Adding Search Fields
  • Example: date range search
  • $searchexp->add_field(
  • $dataset->get_field("date"),
  • "2000-2004",
  • "EQ",
  • "ALL" );

70
Serialising Searches
  • serialise()
  • get a text representation of the search
    expression, for persistent storage
  • from_string(string)
  • unserialises the contents of string
  • but only into the fields already existing in the
    SearchExpression

71
Rendering SearchExpressions
  • render_description()
  • get some XHTML describing the current parameters
    of the search expression
  • render_search_form(help)
  • render an input form for the search expression
  • if help is true then this also renders the help
    for each search field in current language

72
Processing Results
  • Carry out a search using
  • perform_search()
  • The results can then be accessed
  • count()
  • get the number of results
  • get_records(offset, count)
  • get_ids(offset, count)
  • get a list of DataObjs (e.g. EPrint, User)
    representing the result set, or just their ids
  • optionally specify a range of results to return
    from result set using count and offset
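
The offset and count arguments allow a result set to be processed a page at a time. The sketch below shows the loop shape only. StubSearch is a hypothetical stand-in for a search expression, and get_ids is assumed here to return a plain Perl list; the real method may return a reference instead, so check its perldoc before copying the pattern.

```perl
use strict;
use warnings;

# Hypothetical stand-in that mimics the three result methods named above.
package StubSearch;
sub new            { my ($class, @ids) = @_; return bless { ids => [@ids] }, $class; }
sub perform_search { return; }
sub count          { return scalar @{ $_[0]{ids} }; }
sub get_ids {
    my ($self, $offset, $count) = @_;
    my @ids = @{ $self->{ids} };
    return splice( @ids, $offset, $count );
}

package main;

# Run the search, then collect the ids in pages of $page_size.
sub pages {
    my ($searchexp, $page_size) = @_;
    $searchexp->perform_search();
    my $total = $searchexp->count();
    my @pages;
    for ( my $offset = 0; $offset < $total; $offset += $page_size ) {
        push @pages, [ $searchexp->get_ids( $offset, $page_size ) ];
    }
    return @pages;
}

my @pages = pages( StubSearch->new( 101 .. 107 ), 3 );
print join( ',', @$_ ), "\n" for @pages;
```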

73
Processing Results
  • map(function, args)
  • using get_records to get results uses a lot of
    memory if there are 1000s of results
  • apply the function to each result without
    overhead
  • function is called with args
  • (session, dataset, dataobj, args)
  • the DataSet object also has a map function
  • creates a SearchExpression over dataset
  • sets allow_blank = 1
  • passes args to $searchexp->map
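
The callback convention described on this slide, where the function receives (session, dataset, dataobj, args), can be sketched as follows. StubSearch and StubItem are hypothetical stand-ins so the example runs without EPrints; only the argument order is taken from the slide, and tally_type is an illustrative helper name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a data object with a type.
package StubItem;
sub new      { my ($class, $type) = @_; return bless { type => $type }, $class; }
sub get_type { return $_[0]{type}; }

# Hypothetical stand-in for a search whose map() calls the function once
# per result. Session and dataset are undef because nothing here needs them.
package StubSearch;
sub new { my ($class, @items) = @_; return bless { items => [@items] }, $class; }
sub map {
    my ($self, $fn, $info) = @_;
    foreach my $item ( @{ $self->{items} } ) {
        $fn->( undef, undef, $item, $info );
    }
    return;
}

package main;

# Callback: tally results by type in the shared $info hash.
sub tally_type {
    my ($session, $dataset, $item, $info) = @_;
    $info->{counts}{ $item->get_type() }++;
    return;
}

my $searchexp = StubSearch->new(
    StubItem->new('article'),
    StubItem->new('thesis'),
    StubItem->new('article'),
);
my %info = ( counts => {} );
$searchexp->map( \&tally_type, \%info );
print "$_: $info{counts}{$_}\n" for sort keys %{ $info{counts} };
```

Passing a hash reference as args gives the callback somewhere to accumulate results, so only one data object needs to be in memory at a time.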

74
Aside: Lists in EPrints 3.0
  • In EPrints 3.0, searches return a List
  • ordered collection of DataObjs
  • In fact, any 2.3 function which returns a list
    (array) of DataObjs returns a List in 3.0
  • $list->reorder( $neworder )
  • $list->union( $list2 )
  • $list->intersect( $list2 )
  • $list->remainder( $list2 )
  • map over items in the list
  • even arbitrarily constructed ones

75
Scripting Your Repository
  • Archives and Sessions

76
Archive
  • One EPrints installation can host multiple
    archives
  • An Archive object is a single EPrints archive
  • access archive-specific configuration
  • Don't confuse the Archive object with the archive
    DataSet!
  • $archive->get_dataset("archive")
  • renamed Repository in 3.0

77
Archive Accessors
  • get_id()
  • get the id string of the archive.
  • get_conf(key, subkeys)
  • get a named configuration setting
  • probably set in ArchiveConfig.pm
  • get_conf( "stuff", "en", "foo" )

78
Calling Archive Subs
  • call(cmd, params)
  • calls the subroutine named cmd specified in the
    archive configuration (ArchiveConfig.pm etc.)
    with the given parameters and returns the result
  • can_call(cmd)
  • true if the named cmd exists in the archive
    configuration
  • lets you delegate processing to user space

79
Session
  • Not a session in the traditional Web sense
  • not stateful (although it might be in future!)
  • 3.0 introduces cookie-based authentication
  • global object which provides access to
  • current language
  • generic rendering functions
  • CGI parameters (input from forms etc.)
  • http request
  • Always create a session object at the beginning
    of your script
  • don't forget to terminate it at the end

80
Creating and Ending a Session
  • new(mode, param)
  • set mode to 0 for online session (CGI script)
  • uses language from cookie, http headers, or
    default language
  • set mode to 1 for offline session (cmd line
    script)
  • param is the id of archive, uses default language
  • terminate()
  • terminate session, performing necessary cleanup
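
One way to make sure terminate() is never forgotten is to wrap the body of a script in a helper that always calls it, even when the work fails. The sketch below is one possible pattern, not part of EPrints: with_session is an illustrative helper name and StubSession is a hypothetical stand-in that only counts calls to terminate().

```perl
use strict;
use warnings;

# Hypothetical stand-in for a session; it only records terminate() calls.
package StubSession;
sub new       { my ($class) = @_; return bless { terminated => 0 }, $class; }
sub terminate { my ($self) = @_; $self->{terminated}++; return; }

package main;

# Run $work with the session, then terminate the session whether or not
# $work died. Any error is re-thrown after cleanup.
sub with_session {
    my ($session, $work) = @_;
    my $result = eval { $work->($session) };
    my $error  = $@;
    $session->terminate();
    die $error if $error;
    return $result;
}

my $session = StubSession->new();
my $answer  = with_session( $session, sub { return 6 * 7 } );
print "result=$answer terminated=$session->{terminated}\n";
```

In a real command-line script the session passed in would come from new(1, archiveid) as described on this slide, rather than from the stand-in.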

81
Web Page Building Blocks
  • make_doc_fragment()
  • create an empty XHTML document
  • fill it with things!
  • make_text(text)
  • create an XML TextNode
  • make_element(name, attrs)
  • create an XHTML Element
  • make_element("p", align => "right")
  • <p align="right" />

82
Web Page Building Blocks
  • render_link(uri, target)
  • create an XHTML link
  • $link = $session->render_link("foo.html",
    "frame1")
  • $link->appendChild($session->make_text("Foo"))
  • <a href="foo.html" target="frame1">Foo</a>

83
Web Page Building Blocks
  • Many methods for building input forms, including
  • render_form(method, dest)
  • render_option_list(params)
  • render_hidden_field(name, value)
  • render_upload_field(name)
  • render_action_buttons(buttons)
  • ...

84
Web Pages
  • build_page(title, body)
  • wraps your XHTML document in the archive template
  • send_page()
  • flatten page and send it to the user

85
Language
  • change_lang(langid)
  • change the session language to the given language
    ID
  • phrase(phraseid, inserts)
  • get given phrase (as a string) in the current
    language
  • looks up phraseid in language-specific phrase
    file
  • e.g. phrases-en.xml
  • let's look at an example of the inserts
    parameter...

86
Language
  • html_phrase(phraseid, inserts)
  • render an XHTML phrase in the current language
  • $session->html_phrase( 'link_to_google', link =>
    $session->render_link("http://www.google.com"))
  • gets phrase link_to_google from phrases-en.xml
  • <ep:phrase><ep:pin ref="link">Search
    Google</ep:pin></ep:phrase>
  • <a href="http://www.google.com">Search Google</a>

87
User Input
  • have_parameters
  • true if parameters (POST or GET) were passed to
    the CGI script (e.g. from an input form)
  • param(name)
  • get the value of a parameter passed to the CGI
    script
  • get_action_button()
  • get the id of the button the user pressed
  • client
  • get the name of the user's browser

88
Navigating the API