FITS: The File Information Tool Set

Slides: 14
Provided by: Spen195
1
FITS: The File Information Tool Set
2
Background
  • FITS is part of the second-generation Harvard
    University Library Digital Repository
    Service (DRS2), which supports content models and
    METS/PREMIS object descriptors.
  • Developed Fall 2008
  • First public release Spring 2009:
    http://fits.googlecode.com

3
Why?
  • Needed an automatic way to identify and extract
    metadata for a wide range of file types
  • No single file analysis tool satisfied our needs

4
Design Goals
  • Act as a wrapper around other open source tools
  • Extensible
  • Needs to be a standalone command line tool and
    also provide an API
  • Allow priority setting for tools
  • Open source
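
The wrapper and priority goals can be illustrated with a small sketch. This is not FITS code (FITS itself is a Java tool); `ToolResult`, `run_tools`, and the two stand-in tool functions are hypothetical names, and the rule that the first tool in the list wins is an assumption drawn from the priority bullet above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToolResult:
    tool: str                # name of the wrapped tool
    values: Dict[str, str]   # metadata fields the tool reported


# A "tool" here is any callable that takes a file path and returns a
# ToolResult, or None when it cannot handle the file (hypothetical interface).
Tool = Callable[[str], Optional[ToolResult]]


def run_tools(path: str, tools: List[Tool]) -> Dict[str, str]:
    """Run every tool, then merge fields so earlier (higher priority) tools win."""
    merged: Dict[str, str] = {}
    for tool in tools:                     # list order is the priority order
        result = tool(path)
        if result is None:
            continue
        for key, value in result.values.items():
            merged.setdefault(key, value)  # keep the first value seen
    return merged


# Stand-in tools for illustration only; they do not inspect the file.
def fake_jhove(path: str) -> Optional[ToolResult]:
    return ToolResult("Jhove", {"format": "Graphics Interchange Format"})


def fake_exiftool(path: str) -> Optional[ToolResult]:
    return ToolResult("Exiftool", {"format": "GIF", "imageWidth": "100"})


merged = run_tools("example.gif", [fake_jhove, fake_exiftool])
```

Because `fake_jhove` comes first in the list, its `format` value is kept, while `imageWidth`, which only the second tool reports, is still included.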

5
The Tools
  • Current tools
      • Jhove 1.5
      • Exiftool
      • National Library of New Zealand Metadata
        Extractor (NLNZ)
      • DROID
      • FFIdent
      • File Utility
  • 3 Categories
      • File Identification (all of them)
      • Metadata Extraction (Jhove, Exiftool, NLNZ)
      • Format Validation (Jhove)

6
Process

7
Features
  • Conflict management
  • Value normalization
      • "inches" vs. "2"
  • Tool prioritization
  • Format tree for understanding more specific
    format identities
      • PDF/A is a more specific version of PDF
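
Value normalization and the format tree can be sketched in a few lines. The alias table and the tree below are hypothetical examples, not FITS's actual mappings. Reading "2" as a numeric code for inches is an assumption based on the TIFF ResolutionUnit convention, where 2 means inch and 3 means centimeter.

```python
from typing import Dict, Optional

# Hypothetical normalization table: raw value from a tool -> canonical value.
# In TIFF, ResolutionUnit 2 means inches, so "2" and "inches" should agree.
UNIT_ALIASES: Dict[str, str] = {
    "2": "inches",
    "inch": "inches",
    "inches": "inches",
    "3": "centimeters",
    "cm": "centimeters",
}


def normalize_unit(raw: str) -> str:
    """Map a tool-specific unit spelling to one canonical spelling."""
    return UNIT_ALIASES.get(raw.strip().lower(), raw)


# Hypothetical format tree: specific format -> its more general parent.
FORMAT_PARENT: Dict[str, str] = {"PDF/A": "PDF"}


def is_same_or_more_specific(candidate: str, general: str) -> bool:
    """True if `candidate` equals `general` or descends from it in the tree."""
    current: Optional[str] = candidate
    while current is not None:
        if current == general:
            return True
        current = FORMAT_PARENT.get(current)
    return False


def most_specific(a: str, b: str) -> Optional[str]:
    """Return the more specific of two format names, or None if unrelated."""
    if is_same_or_more_specific(a, b):
        return a
    if is_same_or_more_specific(b, a):
        return b
    return None
```

With these tables, one tool reporting "inches" and another reporting "2" no longer counts as a conflict, and a "PDF" versus "PDF/A" disagreement resolves to "PDF/A" instead of being reported as two different formats.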

8
Example Output
    <fits>
      <identification>
        <identity format="Graphics Interchange Format" mimetype="image/gif">
          <tool toolname="Jhove" toolversion="1.5" />
          ...
        </identity>
      </identification>
      <fileinfo>
        <size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">40149</size>
        <md5checksum toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">265c9345ebf93c89d472766fda095de4</md5checksum>
        ...
      </fileinfo>
      <filestatus>
        <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
        <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</valid>
      </filestatus>
      <metadata>
        <image>
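
Output of this shape can be read with any standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a shortened, closed copy of the excerpt; the closing tags are added only so that the sample parses, and the `summary` dictionary is a hypothetical structure. If real output declares an XML namespace (the excerpt does not show one), the element paths would need to account for it.

```python
import xml.etree.ElementTree as ET

# Shortened document modeled on the slide's excerpt, closed so it parses.
SAMPLE = """
<fits>
  <identification>
    <identity format="Graphics Interchange Format" mimetype="image/gif">
      <tool toolname="Jhove" toolversion="1.5" />
    </identity>
  </identification>
  <fileinfo>
    <size toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">40149</size>
  </fileinfo>
  <filestatus>
    <well-formed toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</well-formed>
    <valid toolname="Jhove" toolversion="1.5" status="SINGLE_RESULT">true</valid>
  </filestatus>
</fits>
""".strip()

root = ET.fromstring(SAMPLE)
identity = root.find("identification/identity")

# Collect the fields a caller would most likely want from the report.
summary = {
    "format": identity.get("format"),
    "mimetype": identity.get("mimetype"),
    "identified_by": [t.get("toolname") for t in identity.findall("tool")],
    "size": int(root.findtext("fileinfo/size")),
    "well_formed": root.findtext("filestatus/well-formed") == "true",
    "valid": root.findtext("filestatus/valid") == "true",
}
```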

9
Configuration
  • All settings are in the fits.xml config file
  • Enable/disable tools (available in the API too)
  • Prevent tools from processing files with specific
    file extensions
  • Set tool priority
  • Add new tools
  • Use your own consolidator code
  • Report or ignore conflicts
  • Options to display original tool output

10
Sample Configuration File
    <fits_configuration>
      <!-- Order of the tools determines preference -->
      <tools>
        <!-- exclude-exts attribute is a comma delimited list of file extensions that the tool should not try to process -->
        <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng,mbx"/>
        <tool class="edu.harvard.hul.ois.fits.tools.fileutility.FileUtility" exclude-exts="dng,wps"/>
        <tool class="edu.harvard.hul.ois.fits.tools.exiftool.Exiftool" exclude-exts="txt,wps,vsd"/>
        <tool class="edu.harvard.hul.ois.fits.tools.droid.Droid" exclude-exts="dng"/>
        <tool class="edu.harvard.hul.ois.fits.tools.nlnz.MetadataExtractor" exclude-exts="dng,zip,odb,ott,odg,otg,odp,otp,ods,ots,odc,otc,odi,oti,odf,otf,odm,oth"/>
        <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.FileInfo"/>
        <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.XmlMetadata"/>
        <tool class="edu.harvard.hul.ois.fits.tools.ffident.FFIdent" exclude-exts="dng,wps,vsd"/>
      </tools>
      <output>
        <dataConsolidator class="edu.harvard.hul.ois.fits.consolidation.OISConsolidator"/>
        <display-tool-output>true</display-tool-output>
        <report-conflicts>true</report-conflicts>
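
The two rules stated in the configuration comments (tool order sets preference, and `exclude-exts` lists extensions a tool should skip) can be sketched as follows. This is an illustration, not how FITS reads its config: `load_tools` and `tools_for` are hypothetical helpers, and the config below is a shortened copy of the sample with closing tags added so that it parses.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple

# Shortened config modeled on the sample, closed so it parses.
CONFIG = """
<fits_configuration>
  <tools>
    <tool class="edu.harvard.hul.ois.fits.tools.jhove.Jhove" exclude-exts="dng,mbx"/>
    <tool class="edu.harvard.hul.ois.fits.tools.exiftool.Exiftool" exclude-exts="txt,wps,vsd"/>
    <tool class="edu.harvard.hul.ois.fits.tools.oisfileinfo.FileInfo"/>
  </tools>
  <output>
    <report-conflicts>true</report-conflicts>
  </output>
</fits_configuration>
""".strip()


def load_tools(config_xml: str) -> List[Tuple[str, Set[str]]]:
    """Return (class name, excluded extensions) pairs in document order."""
    root = ET.fromstring(config_xml)
    tools = []
    for tool in root.findall("tools/tool"):
        raw = tool.get("exclude-exts", "")
        excluded = {ext.strip().lower() for ext in raw.split(",") if ext.strip()}
        tools.append((tool.get("class"), excluded))
    return tools


def tools_for(filename: str, tools: List[Tuple[str, Set[str]]]) -> List[str]:
    """List the tool classes allowed to process `filename`, highest priority first."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return [cls for cls, excluded in tools if ext not in excluded]


tools = load_tools(CONFIG)
```

For example, `tools_for("notes.txt", tools)` leaves out Exiftool because `txt` appears in its `exclude-exts` list, while a `.dng` file skips Jhove instead.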

11
Some Limitations...
  • Speed
  • Technical metadata is only returned if the tool that
    reported it is in the first <identity> block
  • FITS considers a successful identification to be
    a combination of the format name and MIME type
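
One consequence of treating an identification as a (format name, MIME type) pair is that tools agreeing on the MIME type but spelling the format name differently end up in separate identities. The sketch below shows that effect with made-up reports; `group_identities` is a hypothetical helper, and how FITS groups results internally is an assumption here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (tool name, format name, MIME type): made-up sample reports.
reports: List[Tuple[str, str, str]] = [
    ("Jhove", "Graphics Interchange Format", "image/gif"),
    ("Exiftool", "Graphics Interchange Format", "image/gif"),
    ("File Utility", "GIF image data", "image/gif"),
]


def group_identities(
    reports: List[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], List[str]]:
    """Group tool names by the (format, mimetype) pair each tool reported."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for tool, fmt, mime in reports:
        groups[(fmt, mime)].append(tool)
    return dict(groups)


identities = group_identities(reports)
has_conflict = len(identities) > 1   # more than one distinct pair was reported
```

Here all three tools agree on `image/gif`, yet two identities result because the format names differ. Normalizing format names before grouping would remove that kind of disagreement.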

12
Future Plans
  • More tools
      • Apache Tika (text document formats)
      • Jhove 2
      • Aduna Aperture (text, documents, email formats)
      • Mediainfo (audio and video formats)
  • Better audio and video format support as we add
    object support for them to DRS2

13
Wrap Up
  • http://fits.googlecode.com
  • http://ots-schemas.googlecode.com
      • Java library for reading and writing METS
        (limited support), MODS, PREMIS, MIX, TextMD,
        DocumentMD, and soon AES audio metadata
  • More information on DRS2:
    http://hul.harvard.edu/ois/systems/drs/enhancements.html