D2D: Digital Archive to MPEG-21 DIDL

Slides: 57
Provided by: giridharm

Transcript and Presenter's Notes

1
D2D: Digital Archive to MPEG-21 DIDL
  • Suchitra Manepalli

2
Overview
  • Digital Preservation
  • Analysis Techniques
  • MPEG-21 DIDL
  • D2D Tool
  • Qualitative Measures
  • Demo

3
Digital Item
  • An atomic unit of information in digital form
  • a document
  • an image
  • a video
  • Item may refer to external entities
  • not part of the digital item
  • Current context
  • preservation intent
  • accessible for analysis
  • digital archive

4
Digital Archive
  • Collection of digital items
  • collective meaning may exist
  • Archive exists as
  • a discrete set of items
  • a directory of a file system
  • a cohesive unit, with the help of external programs
  • tar, gzip, etc.

5
Digital Preservation
  • Preserving the digital items
  • byte stream
  • metadata
  • reproduce the original digital item
  • Continued access to digital item contents,
    independent of time and technology

6
Metadata
  • Perform various analyses
  • to gather information
  • to verify the integrity
  • to understand clearly
  • Prepare a container
  • to store digital item
  • to store metadata

7
PDI
  • Preservation Description Information
  • Provenance
  • Context
  • Reference
  • Fixity

8
Provenance
  • History of a digital item
  • When
  • Who
  • Maintains the integrity of the digital item's past

9
Context
  • Scope of a digital item
  • Environment
  • How

10
Reference
  • Identification systems used for digital item
  • Typical reference includes assigned identifiers

11
Fixity
  • Maintains the integrity of the digital content
  • Protects content information from
    unauthorized alteration

12
Example - PDI
13
Analysis
  • Applying various analysis techniques to the
    digital items
  • produces metadata
  • Metadata is the primary source for understanding the
    content

14
Archive Content Analysis
  • JHOVE
  • FRED
  • File
  • Strings
  • Checksum

15
JHOVE
  • JHOVE is JSTOR/Harvard Object Validation
    Environment
  • Format-specific identification
  • format of the digital item
  • Format validation
  • conformance to standards of the format
  • Format characterization
  • significant characteristics of the digital item

16
JHOVE Output
17
JHOVE PDI
18
File
  • Digital item classified by
  • purpose: content storage block
  • type: machine-language file
  • format: file format or the language
  • file command on UNIX machines performs
  • file system tests
  • magic number tests
  • language tests
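
The magic number test can be illustrated with a minimal sketch. It is not how the file command or the D2D tool is implemented; the class name MagicSketch and the returned labels are assumptions made for illustration, and only a handful of well-known signatures are checked.

```java
final class MagicSketch {
    /** Classifies a byte prefix by its leading signature; falls back to "data". */
    static String identify(byte[] head) {
        if (startsWith(head, 0x89, 'P', 'N', 'G')) return "PNG image";
        if (startsWith(head, '%', 'P', 'D', 'F')) return "PDF document";
        if (startsWith(head, 'P', 'K', 0x03, 0x04)) return "Zip archive";
        if (startsWith(head, 0x1F, 0x8B)) return "gzip compressed data";
        if (startsWith(head, 'B', 'Z', 'h')) return "bzip2 compressed data";
        return "data";
    }

    /** True when head begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] head, int... magic) {
        if (head.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((head[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] gz = {(byte) 0x1F, (byte) 0x8B, 8, 0};
        System.out.println(identify(gz));
    }
}
```

A real classifier would consult a much larger signature database and fall through to file system and language tests when no signature matches.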

19
File Output
20
File PDI
21
Raw Characters
  • Digital item consists of
  • a bit stream
  • a character set or encoding
  • ASCII (7-bit), UTF-8, UTF-16, octet-stream
  • strings command on UNIX prints
  • printable characters from the specified file,
    regardless of the file format
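
As a rough illustration of what the strings command does, the sketch below collects runs of printable ASCII bytes from an arbitrary byte stream. The class name StringsSketch is hypothetical, and the minimum run length is passed explicitly (GNU strings defaults to 4); this is an approximation, not the utility's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class StringsSketch {
    /** Returns every run of at least minLen printable ASCII bytes (0x20..0x7E, plus tab). */
    static List<String> printableRuns(byte[] data, int minLen) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF;
            if ((c >= 0x20 && c <= 0x7E) || c == '\t') {
                current.append((char) c);
            } else {
                // A non-printable byte ends the current run.
                if (current.length() >= minLen) runs.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() >= minLen) runs.add(current.toString());
        return runs;
    }

    public static void main(String[] args) {
        byte[] sample = {0, 'J', 'P', 'E', 'G', 1, 'a', 'b', 2, 'h', 'e', 'l', 'l', 'o'};
        System.out.println(printableRuns(sample, 4));
    }
}
```

Runs shorter than the minimum length, such as "ab" in the sample, are discarded, which keeps incidental printable bytes in binary data out of the result.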

22
Strings Output
23
Strings PDI
24
Checksums
  • MD5
  • a cryptographic hash algorithm
  • produces a 128-bit hash value
  • the same input always produces the same message digest
  • verifies the integrity of the original data object content
  • UNIX command line utility
  • md5sum is used to compute the digest
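
Since the tool is written in Java, the same digest can also be computed in-process. The following is a minimal sketch using the standard java.security.MessageDigest API; the class name Md5Sketch is an assumption, and the slides do not say whether D2D shells out to md5sum or computes the digest itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Sketch {
    /** Returns the MD5 digest of the given bytes as 32 lowercase hex characters. */
    static String md5Hex(byte[] data) {
        return toHex(newMd5().digest(data));
    }

    /** Streams a file through MD5 so large items need not fit in memory. */
    static String md5Hex(Path file) throws IOException {
        MessageDigest md = newMd5();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            // Same layout as md5sum: digest, two spaces, file name.
            System.out.println(md5Hex(Paths.get(name)) + "  " + name);
        }
    }
}
```

Recomputing the digest later and comparing it with the stored value is what makes it usable as fixity information: any change to the byte stream yields a different 128-bit value with overwhelming probability.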

25
Checksums Output
26
Checksums PDI
27
FRED
  • Format Registry Demonstration
  • demonstrates a simple format registry service
  • proof of concept prototype for GDFR
  • brother of TOM
  • Provides information regarding registered formats

28
FRED Output
29
FRED PDI
30
MPEG-21
  • Open standards-based multimedia framework
  • Supports interoperability of content across
    communities
  • Content is managed, described, exchanged, collected
  • Content + Metadata = Digital Item
  • MPEG-21 Digital Item Declaration
  • data model describes the set of abstract terms and
    concepts that define a digital item

31
DID Data Model
  • Container
  • Item
  • Component
  • Anchor
  • Descriptor
  • Condition
  • Choice
  • Selection
  • Annotation
  • Assertion
  • Resource
  • Fragment
  • Statement
  • Predicate

32
Abstract Entities
  • Container
  • Multiple containers or items or combination of
    both
  • Item
  • Encapsulates sub-items or components
  • Component
  • Encapsulates Resources
  • Specify control information for resources defined

33
Abstract Entities
  • Descriptor
  • Express relationship between entities
  • Define the specific entity
  • Resource
  • Star of DID
  • Digital Object or a Physical Object
  • Statement
  • Not part of resource
  • Control, descriptive, identification information

34
DID Data Model
35
DAAP Data Model
  • Signature
  • specifies the type of analyzer
  • environment
  • versioning information
  • Provenance
  • type and value associated with each digital
    item's history
  • Context
  • metadata about the digital item's existence
  • Reference
  • the digital item's identity

36
DAAP Data Model
  • Fixity
  • information that remains invariant to time.
  • Raw output
  • capture the analysis from the techniques as is
  • Type
  • elements defined for provenance, context,
    reference and fixity.
  • Value
  • associated with the corresponding type.
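
One way to picture the DAAP data model is as a small record per analyzer run: a signature, the raw output, and a list of type/value pairs filed under provenance, context, reference, or fixity. The sketch below is only an interpretation of the slides; every class, field, and method name in it is hypothetical, and the actual DAAP element names are not given in the source.

```java
import java.util.ArrayList;
import java.util.List;

final class DaapRecordSketch {
    enum Category { PROVENANCE, CONTEXT, REFERENCE, FIXITY }

    /** A single type/value pair filed under one PDI category. */
    static final class Entry {
        final Category category;
        final String type;
        final String value;

        Entry(Category category, String type, String value) {
            this.category = category;
            this.type = type;
            this.value = value;
        }
    }

    // Signature: which analyzer ran, in what environment, at which version.
    final String analyzer;
    final String environment;
    final String version;
    // Raw output: the analyzer's result captured as is.
    final String rawOutput;
    private final List<Entry> entries = new ArrayList<>();

    DaapRecordSketch(String analyzer, String environment, String version, String rawOutput) {
        this.analyzer = analyzer;
        this.environment = environment;
        this.version = version;
        this.rawOutput = rawOutput;
    }

    /** Adds a type/value pair and returns this record so calls can be chained. */
    DaapRecordSketch add(Category category, String type, String value) {
        entries.add(new Entry(category, type, value));
        return this;
    }

    /** Returns the entries filed under the given category, in insertion order. */
    List<Entry> entriesOf(Category category) {
        List<Entry> matches = new ArrayList<>();
        for (Entry e : entries) {
            if (e.category == category) matches.add(e);
        }
        return matches;
    }

    public static void main(String[] args) {
        DaapRecordSketch r = new DaapRecordSketch("checksum", "Linux", "1.0", "(raw output here)")
                .add(Category.FIXITY, "MD5", "d41d8cd98f00b204e9800998ecf8427e");
        System.out.println(r.entriesOf(Category.FIXITY).get(0).type);
    }
}
```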

37
DIDL View
38
D2D Tool - Overview
  • Implemented in Java
  • Command line interface
  • Tool performs functions in three discrete realms
  • Submission
  • Analysis
  • Dissemination

39
Tool Usage
  • Usage
  • D2D <archive> <archive format> <comma separated
    analyzers> <didl name>
  • Supported Analyzers
  • JHOVE, Strings, FRED, File, Checksum
  • Archive Formats
  • tar, tar.gz, gz, tar.bz, bz, zip
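
The four-argument command line could be parsed and validated along the following lines. This is a sketch under assumptions: the class D2DArgsSketch is hypothetical, the exact spelling and case the tool accepts for analyzer and format names is not stated on the slide, and here both are simply compared case-insensitively.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class D2DArgsSketch {
    // Names taken from the slide; lower-casing them is an assumption.
    static final List<String> ANALYZERS = Arrays.asList("jhove", "strings", "fred", "file", "checksum");
    static final List<String> FORMATS = Arrays.asList("tar", "tar.gz", "gz", "tar.bz", "bz", "zip");

    final String archive;
    final String format;
    final List<String> analyzers;
    final String didlName;

    private D2DArgsSketch(String archive, String format, List<String> analyzers, String didlName) {
        this.archive = archive;
        this.format = format;
        this.analyzers = analyzers;
        this.didlName = didlName;
    }

    /** Validates the four positional arguments and splits the analyzer list. */
    static D2DArgsSketch parse(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException(
                    "Usage: D2D <archive> <archive format> <comma separated analyzers> <didl name>");
        }
        String format = args[1].toLowerCase(Locale.ROOT);
        if (!FORMATS.contains(format)) {
            throw new IllegalArgumentException("Unsupported archive format: " + args[1]);
        }
        List<String> analyzers = new ArrayList<>();
        for (String raw : args[2].split(",")) {
            String name = raw.trim().toLowerCase(Locale.ROOT);
            if (!ANALYZERS.contains(name)) {
                throw new IllegalArgumentException("Unsupported analyzer: " + raw.trim());
            }
            analyzers.add(name);
        }
        return new D2DArgsSketch(args[0], format, analyzers, args[3]);
    }

    public static void main(String[] args) {
        D2DArgsSketch parsed = parse(args);
        System.out.println(parsed.archive + " (" + parsed.format + ") -> " + parsed.didlName
                + " using " + parsed.analyzers);
    }
}
```

Rejecting unknown formats and analyzers before the archive is exploded avoids doing expensive work for a run that cannot complete.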

40
Design
41
API Structure - Submission
  • Identify submitted archive format
  • supports a few archiving formats, namely tar, zip,
    gzip, bzip2, etc.
  • Explode the archive
  • Initializes Analysis Manager
  • driven by various analysis techniques passed as
    a parameter
  • Initialize the digital item objects
  • run-time identity
  • referencing the file system location
  • identifying the file size
  • assigning the identifier
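
The initialization step can be sketched as a walk over the exploded archive that records, for each file, its location, its size, and a freshly assigned identifier. The class DigitalItemSketch and the use of random UUIDs as identifiers are assumptions; the slides do not say which identifier scheme D2D uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DigitalItemSketch {
    final String identifier;   // assigned at run time (UUID scheme is an assumption)
    final Path location;       // where the exploded file lives on the file system
    final long sizeInBytes;

    DigitalItemSketch(String identifier, Path location, long sizeInBytes) {
        this.identifier = identifier;
        this.location = location;
        this.sizeInBytes = sizeInBytes;
    }

    /** Creates one item per regular file found under the exploded archive root. */
    static List<DigitalItemSketch> fromExplodedArchive(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .sorted()
                    .map(p -> new DigitalItemSketch("urn:uuid:" + UUID.randomUUID(), p, sizeOf(p)))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (DigitalItemSketch item : fromExplodedArchive(java.nio.file.Paths.get(args[0]))) {
            System.out.println(item.identifier + " " + item.sizeInBytes + " " + item.location);
        }
    }
}
```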

42
API Structure - Analysis
  • Analysis manager performs the requested
    techniques
  • Presently supports JHOVE, FRED, File, Checksum,
    Strings
  • Easy addition of new analysis technique
  • the manager automatically applies
  • Responsible for producing the timing report
  • individual analyzers
  • overall analysis
  • Serialize the analyzed digital item
  • creation of MPEG-21 DIDL components
  • syncing them to disk
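
A pluggable manager of this kind might look like the sketch below: each technique implements one small interface, and the manager applies whatever analyzers it was given while accumulating per-analyzer and overall timings. The names ItemAnalyzer and AnalysisManagerSketch are hypothetical and do not come from the D2D API; serialization to DIDL components is left out.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One analysis technique; adding a new technique means adding one implementation. */
interface ItemAnalyzer {
    String name();
    String analyze(byte[] item);
}

final class AnalysisManagerSketch {
    private final List<ItemAnalyzer> analyzers;
    private final Map<String, Long> nanosByAnalyzer = new LinkedHashMap<>();
    private long totalNanos;

    AnalysisManagerSketch(List<ItemAnalyzer> analyzers) {
        this.analyzers = analyzers;
    }

    /** Runs every configured analyzer on the item and returns raw output keyed by analyzer name. */
    Map<String, String> analyze(byte[] item) {
        Map<String, String> raw = new LinkedHashMap<>();
        for (ItemAnalyzer a : analyzers) {
            long start = System.nanoTime();
            raw.put(a.name(), a.analyze(item));
            long elapsed = System.nanoTime() - start;
            nanosByAnalyzer.merge(a.name(), elapsed, Long::sum);
            totalNanos += elapsed;
        }
        return raw;
    }

    /** Cumulative time spent in each analyzer across all items analyzed so far. */
    Map<String, Long> timingReport() {
        return Collections.unmodifiableMap(nanosByAnalyzer);
    }

    /** Cumulative time spent in all analyzers. */
    long totalNanos() {
        return totalNanos;
    }
}
```

Because the manager only depends on the interface, the individual and aggregate timings come for free for any analyzer added later.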

43
API Structure - Dissemination
  • Serialized digital items are read
  • Concatenated to produce the complete DIDL
  • DIDL produced is well-formed and valid
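
The concatenation step can be sketched as wrapping already-serialized Item fragments in a DIDL root and then parsing the result to confirm it is well-formed. The class DisseminationSketch is hypothetical, and the namespace URI used here is an assumption about which edition of the DIDL schema applies; the check covers well-formedness only, not validity against the schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;

final class DisseminationSketch {
    // Assumed namespace; confirm against the DIDL schema edition actually in use.
    static final String DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS";

    /** Wraps serialized Item fragments in a single DIDL document with one Container. */
    static String assemble(List<String> serializedItems) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<didl:DIDL xmlns:didl=\"").append(DIDL_NS).append("\">\n");
        sb.append("<didl:Container>\n");
        for (String item : serializedItems) {
            sb.append(item).append('\n');
        }
        sb.append("</didl:Container>\n</didl:DIDL>\n");
        return sb.toString();
    }

    /** Returns true when the text parses as namespace-aware XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String item = "<didl:Item><didl:Component>"
                + "<didl:Resource mimeType=\"text/plain\" ref=\"a.txt\"/>"
                + "</didl:Component></didl:Item>";
        String didl = assemble(java.util.Arrays.asList(item));
        System.out.println(isWellFormed(didl));
    }
}
```

Plain string concatenation only yields a well-formed document if every fragment is itself balanced, which is why a parse of the final output is a useful last check.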

44
Performance of D2D Tool
  • Individual analysis timing
  • Aggregate timing

45
Performance of D2D Tool
  • Individual analysis timing
  • Aggregate timing

46
JHOVE
encountered the XML files
47
File
48
Raw Characters
49
Performance of D2D Tool
  • Individual analysis timing
  • Aggregate timing

50
Archive Size
51
Number of Files
52
Performance
  • Number of files vs. time
  • Archive size vs. time

53
Demo
54
Conclusion
  • Analysis on the digital archive
  • valuable for digital preservation
  • D2D tool
  • proof-of-concept for other future implementations
  • Timing analysis performed
  • understand the amount of computational resources
    required

55
Future Work
  • Other facets of performance should be explored
  • Preserve archive
  • Existing XML APIs are slow at processing XML
    data
  • Tool should be extended to a multi-node
    environment such as Jini

56
Questions / Comments