Building Greenstone Collections from the Command Line

Provided by: mong6

1
Building Greenstone Collections from the Command Line
2
Basic commands
  • Type setup.bat (Windows users) or source setup.sh
    (Unix/Linux users) when you're in the Greenstone
    installation directory
  • To create a collection, type
      perl -S mkcol.pl -creator youremail@somewhere.com collection_name
  • To import documents into a collection, type
      perl -S import.pl collection_name
  • To build a collection, type
      perl -S buildcol.pl collection_name
  • For further details, read pages 9-19 of the
    Developer's Guide

3
Building A Collection In Greenstone
[Diagram: source documents (XML documents, Web) go through Import, driven by import.pl (plugins), into the Archives; build.pl (classifiers) then produces the Index used for browsing and full-text search]
4
Importing documents
  • Plugins are used to process source documents in
    different formats and associate the corresponding
    metadata with them
  • The output of this process is XML documents
    encoded in the Greenstone Archive format
    specified by the following DTD
      <!DOCTYPE GreenstoneArchive [
        <!ELEMENT Section (Description,Content,Section*)>
        <!ELEMENT Description (Metadata*)>
        <!ELEMENT Content (#PCDATA)>
        <!ELEMENT Metadata (#PCDATA)>
        <!ATTLIST Metadata name CDATA #REQUIRED>
      ]>

5
Automating collection building tasks
  • Batch files can automate many of the tasks
  • You can create a batch file to import and rebuild
    a collection
  • Try copying and pasting the following lines into a
    batch file named rebuild.bat
      perl -S import.pl -removeold %1
      perl -S buildcol.pl %1
  • Execute the batch file by typing rebuild.bat
    collection_name (%1 is replaced by the collection
    name)
  • There are many commands that you can combine in
    a batch file
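
The same import-and-build sequence can be scripted in other languages. The following is a rough sketch in Python, not part of Greenstone: the function names rebuild_commands and rebuild are made up for illustration, and it assumes perl and the Greenstone scripts are reachable after the setup script has been run. By default it only prints the commands.

```python
import subprocess


def rebuild_commands(collection):
    """Return the two command lines from rebuild.bat, in order."""
    return [
        ["perl", "-S", "import.pl", "-removeold", collection],
        ["perl", "-S", "buildcol.pl", collection],
    ]


def rebuild(collection, dry_run=True):
    """Print the import and build steps, or run them if dry_run is False."""
    for argv in rebuild_commands(collection):
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises on the first failing step, so buildcol.pl
            # is not run after a failed import.
            subprocess.run(argv, check=True)


# Dry run: shows what would be executed for a collection.
rebuild("collection_name")
```

Stopping at the first failure is a design choice of this sketch; the two-line batch file would carry on to the build step regardless.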

6
Importing documents (cont.)
  • An example
      <Section>
        <Description>
          <Metadata name="gsdlsourcefilename">ec158e.txt</Metadata>
          <Metadata name="Title">Freshwater Resources in Arid Lands</Metadata>
          <Metadata name="Identifier">HASH0158f56086efffe592636058</Metadata>
          <Metadata name="gsdlassocfile">cover.jpg:image/jpeg:</Metadata>
          <Metadata name="gsdlassocfile">p07a.png:image/png:</Metadata>
        </Description>
      </Section>
  • Note: gsdlsourcefilename is the original file from
    which the Greenstone archive file was generated,
    and gsdlassocfile is a file associated with the
    document (e.g. an image file)
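
To check what metadata ended up in an archive file, the Description block can be read with any XML parser. The following is a minimal sketch using Python's standard library; the function name section_metadata is hypothetical, and the fragment is a shortened copy of the example above with a placeholder Content element and colon-separated gsdlassocfile values assumed.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's example; the Content text is a placeholder.
ARCHIVE_FRAGMENT = """
<Section>
  <Description>
    <Metadata name="gsdlsourcefilename">ec158e.txt</Metadata>
    <Metadata name="Title">Freshwater Resources in Arid Lands</Metadata>
    <Metadata name="gsdlassocfile">cover.jpg:image/jpeg:</Metadata>
    <Metadata name="gsdlassocfile">p07a.png:image/png:</Metadata>
  </Description>
  <Content>(document text)</Content>
</Section>
"""


def section_metadata(xml_text):
    """Map each metadata name to the list of its values in the top Section."""
    section = ET.fromstring(xml_text)
    values = {}
    for element in section.findall("./Description/Metadata"):
        # A name such as gsdlassocfile may occur more than once.
        values.setdefault(element.get("name"), []).append(element.text or "")
    return values


metadata = section_metadata(ARCHIVE_FRAGMENT)
print(metadata["Title"])
print(metadata["gsdlassocfile"])
```

Values are collected into lists because the same metadata name can appear several times in one Description.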

7
Document Metadata
  • Greenstone plugins recognize only a small set of
    metadata tags
  • There are three ways to assign metadata to
    documents in a collection: 1) index.txt, 2)
    metadata.xml and 3) modifying an existing
    Greenstone plugin
  • An index.txt file is a space-separated file that
    assigns a list of metadata to documents in a
    collection. It should be placed in the collection's
    import directory

8
Document Metadata (cont.)
  • To inform Greenstone about the existence of this
    file, include the IndexPlug plugin in your
    collect.cfg file or add this plugin to your
    plugin list in GLI
  • An example of the index.txt file is as follows
      key: Title Date Cast Director
      "analyze.html" "Analyze That" "2002" "Robert De Niro, Billy Crystal, Lisa Kudrow" "Harold Ramis"
      "majestic.html" "Majestic, The" "2001" "Jim Carrey, Bob Balaban, Jeffrey DeMunn" "Frank Darabont"
  • The fields in this file are separated by spaces
    and enclosed in double quotes. Their positions
    are matched against the list of field names
    shown in the first line of the file
  • Note that the first field of each entry must be
    the filename of a document
  • The trailers collection uses this approach to
    assign metadata to documents in a collection
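
To see how the fields line up with the names on the first line, the file can be read with a quote-aware splitter. The sketch below uses Python's shlex module; parse_index is a hypothetical helper, not what IndexPlug does internally, and it assumes the header starts with a "key:" token followed by the metadata names.

```python
import shlex

INDEX_TXT = '''key: Title Date Cast Director
"analyze.html" "Analyze That" "2002" "Robert De Niro, Billy Crystal, Lisa Kudrow" "Harold Ramis"
"majestic.html" "Majestic, The" "2001" "Jim Carrey, Bob Balaban, Jeffrey DeMunn" "Frank Darabont"
'''


def parse_index(text):
    """Return {filename: {metadata name: value}} for an index.txt-style file."""
    lines = [line for line in text.splitlines() if line.strip()]
    # The header's first token is the "key:" marker; the rest are field names.
    names = shlex.split(lines[0])[1:]
    records = {}
    for line in lines[1:]:
        fields = shlex.split(line)  # honours the double quotes
        filename, values = fields[0], fields[1:]
        records[filename] = dict(zip(names, values))
    return records


records = parse_index(INDEX_TXT)
print(records["majestic.html"]["Director"])
```

Because the split respects quotes, values containing spaces or commas (such as the cast list) stay in one field.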

9
Document Metadata (cont.)
  • The second approach uses an XML file to assign
    metadata to documents in a collection
  • To inform Greenstone that you would like to use
    the metadata.xml file, include the line "plugin
    RecPlug -use_metadata_files" in your collect.cfg
    file, or check the use_metadata_files flag after
    clicking on the configure plugin button in the
    GLI
  • The benefit of using an XML file over the
    previous approach is that the browser can perform
    tag checking for you

10
Document Metadata (cont.)
      <?xml version="1.0" ?>
      <DirectoryMetadata>
        <FileSet>
          <FileName>MARTYN_DR_02002066.html</FileName>
          <Description>
            <Metadata name="PlayerID">MARTYN_DR_02002066</Metadata>
            <Metadata name="PlayerProfile"></Metadata>
            <Metadata name="PlayerName">Damien Richard Martyn</Metadata>
            <Metadata name="FullSizeImage">http://www-usa.cricket.org//perl/picture.cgi/030730</Metadata>
            <Metadata name="ThumbnailImage">http://www-usa.cricket.org//perl/picture.cgi/030730/inline?alt=1</Metadata>
            <Metadata name="CoverImage">MARTYN_DR_02002066.jpg</Metadata>
            <Metadata name="Country">Australia</Metadata>
            <Metadata name="BattingStyle">Right Hand Bat</Metadata>
            <Metadata name="BowlingStyle">Right Arm Medium</Metadata>
          </Description>
        </FileSet>
        <FileSet>
          <FileName>POTHECARY_JE_03001137.html</FileName>
          <Description>

11
Document Metadata (cont.)
  • Here's the answer
      <DirectoryMetadata>
        <FileSet>
          <FileName>text</FileName>
          <Description>
            <Metadata name="name1">some text</Metadata>
            <Metadata name="name2">some text</Metadata>
            (other Metadata tags)
          </Description>
        </FileSet>
        (other FileSet tags)
      </DirectoryMetadata>
  • Note that XML is case sensitive
  • The cricket collection uses the metadata.xml file
    to assign metadata to the documents
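
When there are many documents, a metadata.xml file with this structure can be generated rather than typed by hand. The following sketch uses Python's standard library; build_directory_metadata is a hypothetical helper name, and the sample values are taken from the cricket example.

```python
import xml.etree.ElementTree as ET


def build_directory_metadata(files):
    """Build a DirectoryMetadata tree from {filename: {name: value}}."""
    root = ET.Element("DirectoryMetadata")
    for filename, metadata in files.items():
        file_set = ET.SubElement(root, "FileSet")
        ET.SubElement(file_set, "FileName").text = filename
        description = ET.SubElement(file_set, "Description")
        for name, value in metadata.items():
            ET.SubElement(description, "Metadata", name=name).text = value
    return root


root = build_directory_metadata({
    "MARTYN_DR_02002066.html": {
        "PlayerName": "Damien Richard Martyn",
        "Country": "Australia",
    },
})
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Letting the library write the tags avoids the case and nesting mistakes that hand-edited XML tends to pick up.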

12
Document Metadata (cont.)
  • We can also customize a plugin to extract
    metadata from a document
  • We will look at modifying the TextPlug to extract
    Ratings, Genre and Subject from a few documents
    in the trailers collection

13
Structuring Documents into Sections
  • Sometimes source documents have to be structured
    into sections and subsections
  • This can be done easily by incorporating the
    following tags, wrapped in HTML comments, into
    your documents
      <!--
      <Section>
      <Description>
      <Metadata name="Title">Realizing human rights for poor people: Strategies for achieving the international development targets</Metadata>
      </Description>
      -->
      (text of section goes here)
      <!--
      </Section>
      -->
  • You can also embed subsections within a section
    by placing another <Section> ... </Section> block
    before the section's closing </Section> tag
  • Look at one of the HTML files in the demo
    collection for an example
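
The comment markers are repetitive, so they are easy to generate. The following Python sketch is only an illustration: wrap_section is a hypothetical helper, and it assumes titles contain no "--" sequence, which would end the HTML comment early.

```python
def wrap_section(title, body_html, subsections=()):
    """Wrap body_html in Section comment markers; nest subsections inside."""
    if "--" in title:
        raise ValueError("title must not contain '--' inside an HTML comment")
    parts = [
        "<!--",
        "<Section>",
        "<Description>",
        f'<Metadata name="Title">{title}</Metadata>',
        "</Description>",
        "-->",
        body_html,
    ]
    # Subsections are already-wrapped strings placed before the closing tag.
    parts.extend(subsections)
    parts += ["<!--", "</Section>", "-->"]
    return "\n".join(parts)


inner = wrap_section("Background", "<p>Subsection text</p>")
outer = wrap_section("Chapter 1", "<p>Section text</p>", [inner])
print(outer)
```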

14
Browsing Indexes
15
Types of Browsing Indexes
  • SectionList
  • AZList
  • AZSectionList
  • DateList
  • Hierarchy

16
Creating Browsing Indexes
  • Certain classifiers generate browsing structures
    that are hierarchical
  • They are useful for subject classifications and
    organization hierarchies
  • For these, the specific hierarchy has to be
    provided using the flag -hfile <filename> when
    the classifier is defined in the collect.cfg file
  • For example
      classify Hierarchy -hfile sub.txt -metadata Subject -sort Title

17
Creating Browsing Indexes (cont.)
  • Note that sub.txt has to reside in the
    collection's etc directory
  • Certain classifiers don't require explicit
    hierarchies to be defined. For instance, the
    AZList, DateList and List classifiers each
    generate a selection list from the corresponding
    metadata
      classify List -metadata Howto
      classify AZList -metadata Title

18
Creating Browsing Indexes (cont.)
  • Explicit hierarchies have to be defined according
    to the following format
      <identifier> <position in hierarchy> <name>
  • For example
      1 1 General reference
      1.2 1.2 Something else
      2 2 ...
  • What this means is that a document is assigned
    to the first classification if its value for the
    metadata type associated with the current
    classifier is 1
  • Look at the demo collections for examples
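
To make the three-column format concrete, the sketch below parses a hierarchy file and looks up the entry a metadata value maps to. It is written in Python as an illustration only, not Greenstone's own parsing; parse_hierarchy and classify are hypothetical names, and the third sample entry ("Another subject") is invented because the slide elides it.

```python
HIERARCHY_TXT = """\
1 1 General reference
1.2 1.2 Something else
2 2 Another subject
"""


def parse_hierarchy(text):
    """Parse '<identifier> <position> <name>' lines, sorted by position."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        identifier, position, name = line.split(None, 2)
        entries.append({
            "identifier": identifier,
            # "1.2" becomes (1, 2): second child of the first top-level node.
            "position": tuple(int(part) for part in position.split(".")),
            "name": name.strip().strip('"'),
        })
    return sorted(entries, key=lambda entry: entry["position"])


def classify(entries, metadata_value):
    """Return the entry whose identifier equals the metadata value, if any."""
    for entry in entries:
        if entry["identifier"] == metadata_value:
            return entry
    return None


entries = parse_hierarchy(HIERARCHY_TXT)
print(classify(entries, "1")["name"])
```

The length of each position tuple gives the depth of the node in the browsing tree.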

19
Creating Browsing Indexes (cont.)
  • Documents are treated internally as tree nodes by
    Greenstone
  • There are three types of nodes: VList, HList and
    DateList
  • For example, an AZList consists of a collection
    of VList nodes that represent documents
  • Arguments accepted by the various classifiers are
    listed on page 48 of the Developer's Guide

20
Formatting Browsing Indexes
  • Each classifier has an implicit name from its
    position in the collect.cfg file. For example,
    the third classifier specified in the file is
    called CL3
  • Tags in the formatting strings
  • [Text]: document text
  • [link] ... [/link]: link to the document itself
  • [icon]: icon representing the resource
  • [metadata-name]: value of the metadata
    associated with this document

21
Formatting Browsing Indexes (cont.)
  • For example
      format CL4VList <br>[link][Howto][/link]
  • Conditional statements are supported in the
    formatting string. They are enclosed by the {
    and } characters in these formats
      {If}{[metadata], then clause, else clause}
      {Or}{action, another-action, another-action, etc}
  • The If statement works the same way as in most
    programming languages
  • The Or statement evaluates the items in the
    list in order and stops at the first one that is
    non-null. Its value is sent to the output and
    evaluation is terminated.
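
The behaviour of the two conditionals can be modelled in a few lines. The Python sketch below is not Greenstone's format-string parser: the function names are hypothetical, only [metadata-name] tags are substituted (not [link], [icon] or [Text]), and an action counts as null when it expands to an empty string.

```python
import re


def substitute(text, metadata):
    """Replace each [name] with the document's value, or '' if it is unset."""
    return re.sub(
        r"\[([^\[\]/]+)\]",
        lambda match: metadata.get(match.group(1), ""),
        text,
    )


def evaluate_if(metadata, name, then_clause, else_clause=""):
    """Model {If}: pick then_clause when the metadata is set, else else_clause."""
    chosen = then_clause if metadata.get(name) else else_clause
    return substitute(chosen, metadata)


def evaluate_or(metadata, actions):
    """Model {Or}: return the first action that expands to a non-empty value."""
    for action in actions:
        result = substitute(action, metadata)
        if result:
            return result
    return ""


doc = {"Title": "Analyze That", "HasAudio": "1", "audioURL": "clip.wav"}
print(evaluate_if(doc, "HasAudio", "<a href=[audioURL]>listen</a>"))
print(evaluate_or(doc, ["[Subject]", "[Title]", "Untitled"]))
```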

22
Formatting Browsing Indexes (cont.)
  • For example
      format VList "<td valign=top>[link]<img src=_httpprefix_/collect/cricket/images/[PlayerID].jpg border=0>[/link]</td><td>[link][Title][/link]</td><td>{If}{[HasAudio],<a href=[audioURL]><img src=_httpprefix_/collect/cricket/images/wav.jpg border=0></a>}</td>"

23
Customizing the look and feel of Greenstone
24
Customizing the look and feel of Greenstone
  • The files involved are in the gsdl/macros directory
  • Base.dm: global macros, such as custom buttons
  • English.dm: text for the corresponding language
  • Home.dm: the main GSDL page
  • Gsdl.dm: the About Greenstone page
  • Style.dm: page layout
  • Query.dm: query form layout

25
Customizing the look and feel of Greenstone (cont.)
  • Background image (chalk.gif)
  • Base.dm
      _httpiconchalk_ {_httpimg_/chalk.gif}
      _widthchalk_ {2000}
      _heightchalk_ {10}
  • Custom Button
  • Base.dm
      _Genrewidth_ {_widthtGenrex_}
      _imageGenre_ {_gsimage_(_httpbrowseGenre_,_httpicontGenreof_,_httpicontGenreon_,Genre,_textimageGenre_)}
      _icontabGenregreen_ {<img src="_httpicontGenregr_" width=_widthtGenrex_ border=0>}
      _icontabGenregreen_ [v=1] {_texticontabGenregreen_}

26
Customizing the look and feel of Greenstone (cont.)
  • Document.dm
      _textGenrepage_ {_texticonhGenre_}
      _iconGenrepage_ {<img src="_httpiconhGenre_" width="_widthhGenre_" height="_heighthGenre_">}
      _iconGenrepage_ [v=1] {<h2>_texticonhGenre_</h2>}
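
Macros refer to one another by name, so the final text only appears once the references are expanded. The Python sketch below illustrates that idea and is not Greenstone's macro processor: it assumes definitions of the form _name_ {value} on a single line, ignores parameters such as the v1 variants and backslash escapes, and the value given for _httpimg_ is made up.

```python
import re

DEFINITION = re.compile(r"^(_\w+?_)\s*\{(.*)\}\s*$")
MACRO_TOKEN = re.compile(r"_[A-Za-z0-9]+_")

SAMPLE_DM = """\
_httpimg_ {/gsdl/images}
_httpiconchalk_ {_httpimg_/chalk.gif}
_widthchalk_ {2000}
_texticonhGenre_ {Genre}
_textGenrepage_ {_texticonhGenre_}
"""


def parse_macros(text):
    """Collect {name: raw value} from single-line '_name_ {value}' entries."""
    macros = {}
    for line in text.splitlines():
        match = DEFINITION.match(line.strip())
        if match:
            macros[match.group(1)] = match.group(2)
    return macros


def expand(value, macros, depth=0):
    """Replace known _name_ tokens recursively; unknown names are left alone."""
    if depth > 10:  # guard against definitions that refer to themselves
        return value

    def replace(match):
        name = match.group(0)
        if name not in macros:
            return name
        return expand(macros[name], macros, depth + 1)

    return MACRO_TOKEN.sub(replace, value)


macros = parse_macros(SAMPLE_DM)
print(expand("_httpiconchalk_", macros))
print(expand("_textGenrepage_", macros))
```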

27
Customizing the look and feel of Greenstone (cont.)
  • English.dm
      _textimageGenre_ {Browse by Genre}
      _texticontabGenregreen_ {Genre}
      _httpicontGenregr_ {_httpimg_/tGenregr.gif}
      _httpicontGenreon_ {_httpimg_/tGenreon.gif}
      _httpicontGenreof_ {_httpimg_/tGenreof.gif}
      _widthtGenrex_ {114}
      _texticonhGenre_ {Genre}
      _httpiconhGenre_ {_httpimg_/h\_Genre.gif}
      _widthhGenre_ {250}
      _heighthGenre_ {57}
      _textGenreshort_ {access publications by Genre}
      _textGenrelong_ {<p>You can <i>access my documents by