Digital Library Content Model

1
Digital Library Content Model
  • Dagobert Soergel
  • College of Information Studies, University of
    Maryland
  • Department of Library and Information Studies,
    University at Buffalo

2
The Problem
  • Digital libraries must
  • Store a wide variety of often complex information
    objects and display these objects on different
    platforms. This requires modeling information
    objects, their internal structure, and
    relationships among them.
  • Provide data that support discovery,
    interpretation, use, and management of
    information objects. This requires a good
    metadata model.
  • Support annotation of information objects.
    Annotations turn out to be surprisingly diverse.
    An annotation may refer to only a part of an
    information object. This requires an elegant
    model that can deal with many cases.

3
Purpose of the talk
  • To reexamine a number of basic notions regarding
    the content of a digital library (or, more
    generally, any information system) to achieve
    sound definitions
  • Developed in the framework of the
  • DELOS Digital Library Reference Model
  • a framework for describing digital libraries,
    their content, users, and functions and, for
    each, their qualities and associated policies

4
Premises
  • Modeling the content domain is complex and much
    thinking is muddled
  • Need to be able to handle both data and
    documents
  • Any reference model
  • needs to be abstract and must not commit to any
    particular standard or design decision
  • rather, it must provide a framework for
    specifying the commitments of any particular DL
    (or information system)

5
Issues
  • 0 Scope of this talk and modeling constructs
  • 1a Content in the overall context of a DL
    reference model
  • 1b Modeling information objects
  • 1c Levels, versions, and relationships
  • 1d Composite information objects / resources
  • 1e Resource identifiers
  • 2 Metadata, including provenance, context, usage
  • 3 Annotation

6
Scope of this talk
  • A reference model for a broadly conceived digital
    library will be able to model almost any
    information system and thus will be useful very
    broadly.
  • The focus on digital libraries is in the
    application, especially the type of collection,
    to which the model is applied.

7
Scope: level of abstraction
  • The reference model should stay on an abstract
    level. It should not require specific standards
    but rather allow for plugging in any standard,
    such as RDA or DC.
  • A DL should indicate to the users what standard
    it uses for things like time, place, type of
    relationship, type of resource.
  • The reference model should not require design
    choices but rather provide a framework for
    specifying design choices, such as selectivity of
    the collection. A DL will then indicate whether
    its collection is selective or fully inclusive.

8
Modeling constructs
  • The reference model should be based on an
    entity-relationship model (E-R model).
  • Second-order logic: relationship instances are
    resources that can in turn be related to
    anything. Apply pragmatically for useful
    navigation and common-sense inferences; stay away
    from types of reasoning that run into problems
    with second-order logic.
  • Must add mechanisms for indicating the degree of
    precision or the degree of certainty of
    statements.

9
Issues
  • 1a Content in the overall context of a DL
    reference model
  • 1b Modeling information objects
  • 1c Levels, versions, and relationships
  • 1d Composite information objects / resources
  • 1e Resource identifiers
  • 2 Metadata, including provenance, context, usage
  • 3 Annotation

10
Content in the overall context of a DL reference
model
  • Resources
  • Structured data
  • Unstructured data, text
  • Uses of data

11
Everything is a resource
  • W3C definition:
  • A resource is anything that can be identified or
    named. Any resource is represented by a resource
    identifier.
  • Resource includes external (non-digital)
    objects or events and digital objects or
    events, wherever that digital object or event may
    reside or occur.
  • Same as topic in topic maps
  • In an E-R model, entity types, entity instances
    (entity values), relationship types, and
    relationship instances are all resources
  • In RDA, Resource is restricted to information
    objects. The advantages of the broader definition
    will become clear.

12
Structured data statements
  • Resource 1 <relationship> Resource 2
  • SoftwareModule <createdBy> LegalEntity
  • SoftwareModule <annotatedBy> Information object
  • Event <happenedIn> (Date1, Date2)
  • Multi-way relationships, frames
  • Statements are information objects, that is, they
    are resources that can in turn be related to
    anything
  • A statement is also called a proposition or
    assertion (or fact)
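
The idea that a statement is itself a resource can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names Resource and Statement, the identifiers SoftwareModule17, LegalEntity3, and Cataloger9, and the assertedBy relationship are hypothetical and do not come from the reference model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """Anything that can be identified or named (hypothetical class)."""
    identifier: str


@dataclass(frozen=True)
class Statement:
    """A relationship instance: subject <relationship> object.

    Because a statement is itself a resource, it may appear as the
    subject or object of another statement.
    """
    subject: "Node"
    relationship: str
    obj: "Node"


Node = Union[Resource, Statement]

# Hypothetical identifiers, used only for illustration.
module = Resource("SoftwareModule17")
entity = Resource("LegalEntity3")
cataloger = Resource("Cataloger9")

# A plain statement about two resources.
created = Statement(module, "createdBy", entity)

# A statement about the statement (assertedBy is an assumed relationship type).
provenance = Statement(created, "assertedBy", cataloger)

print(provenance.subject.relationship)
```

The second statement takes the first one as its subject, which is all that "statements are resources that can in turn be related to anything" requires of the data structure.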

13
More on structured data
  • Data consist of statements about resources.
  • Such statements can be conceived as relationship
    instances in which the resource in focus
    occupies one argument slot. A statement may be a
    simple statement using a binary relationship or a
    multi-way relationship (a frame instance with
    slots filled, like an object in an
    object-oriented database).

14
More on structured data
  • Slot fillers are also known as data values.
  • A data value makes sense only when it is seen in
    relation to one or more resources, for example as
    a slot filler in a frame.
  • Examples
  • The value 55 makes sense only in the right
    context, such as in the success slot of a drug
    treatment frame
  • The value 185 cm makes sense only if we know it
    is the height of a person or the length of a pair
    of skis.
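
A minimal sketch of the point that a bare value carries no meaning outside its frame. The Frame class, the frame types Person and PairOfSkis, and the slot names are assumptions made for this illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A multi-way relationship instance: a frame type plus filled slots."""
    frame_type: str
    slots: dict = field(default_factory=dict)

    def describe(self, slot: str) -> str:
        # The value is reported together with the frame and slot that
        # give it meaning.
        return f"{self.frame_type}.{slot} = {self.slots[slot]}"


# The same data value, 185, fills two different slots in two frames.
person = Frame("Person", {"height_cm": 185})
skis = Frame("PairOfSkis", {"length_cm": 185})

print(person.describe("height_cm"))
print(skis.describe("length_cm"))
```

The two slot fillers are equal as numbers, yet they answer different questions; only the frame type and slot name tell them apart.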

15
  • There are two ways to communicate such
    statements.
  • 1. Structured data: One learns what one wants to
    know about the resource in focus immediately from
    a relationship instance.
  • Hamlet <authoredBy> Shakespeare
  • The drug treatment frame on Taxoteer
  • The actual data of interest are represented in a
    database

16
  • There are two ways to communicate such
    statements.
  • 2. Unstructured data: One needs to extract what
    one wants to know from a text or image that is
    related to the resource in focus.
  • Shakespeare wrote Hamlet in the year 1625
  • Hamlet was written by Shakespeare
  • Taxoteer is effective in the treatment of
    cancers that have no receptors for estrogen. In
    older persons the success rate is 50.
  • The data of interest are stored in what is
    commonly known as a document.

17
Functions of data
  • Data about a resource may serve any of the
    following functions
  • learn about the resource and its various
    characteristics
  • learn about the history and context of the
    resource
  • learn how to use the resource
  • manage the resource
  • preserve the resource
  • The sections about metadata (roughly data about
    an information object) will specialize this list

18
Relationship as the basic modeling construct
  • Important principle
  • Many concepts in a DL reference model are best
    modeled based on relationships rather than based
    on entities
  • For example, annotation-hood resides not in an
    information object but in the relationship
  • InformationObjectA <annotates> InformationObjectB
  • InformationObjectB <annotatedBy>
    InformationObjectA
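
The principle can be sketched as follows: nothing in the object itself marks it as an annotation; a resource counts as one only because an annotates statement points from it to something else. The function names and the triple representation are assumptions for this illustration.

```python
# Each relationship type and its inverse, as on the slide.
INVERSE = {"annotates": "annotatedBy", "annotatedBy": "annotates"}


def with_inverses(statements):
    """Return the statements plus the inverse of each one."""
    result = set(statements)
    for subject, relationship, obj in statements:
        if relationship in INVERSE:
            result.add((obj, INVERSE[relationship], subject))
    return result


def is_annotation(resource, statements):
    """A resource is an annotation only by virtue of a relationship."""
    return any(
        subject == resource and relationship == "annotates"
        for subject, relationship, _ in statements
    )


statements = {("InformationObjectA", "annotates", "InformationObjectB")}
closed = with_inverses(statements)

print(sorted(closed))
print(is_annotation("InformationObjectA", closed))
```

With an empty statement set the same InformationObjectA is not an annotation at all, which is the sense in which annotation-hood resides in the relationship.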

19
Resource type examples
  • Information objects, incl. documents, data
    streams, databases, queries and their results
    (virtual information objects, such as database
    reports, virtual collections)
  • Actors that can search for, create, and manage
    resources
  • Functions and services
  • Software modules
  • Policies
  • Languages
  • Ideas, concepts

20
Inheritance
  • Many reference model constructs are specified at
    the level of resource.
  • They inherit down to the different resource
    types, especially information objects
  • For example, the following statement types are
    valid for Resource
  • Resource <identifiedBy> Identifier
  • Resource <characterizedBy> QualityParameter
  • Resource <regulatedBy> Policy
  • Therefore, they are also valid for
    InformationObject or Actor or Policy
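
One way to picture this inheritance is a lookup that walks up the resource type hierarchy and collects the statement types declared at each level. The table layout and the function name are assumptions; the hasCreator entry is borrowed from the metadata examples later in the talk.

```python
# Resource type hierarchy: subtype -> supertype.
SUBTYPE_OF = {
    "InformationObject": "Resource",
    "Actor": "Resource",
    "Policy": "Resource",
}

# Statement types declared at each level: (relationship, target type).
STATEMENT_TYPES = {
    "Resource": [
        ("identifiedBy", "Identifier"),
        ("characterizedBy", "QualityParameter"),
        ("regulatedBy", "Policy"),
    ],
    "InformationObject": [("hasCreator", "Actor")],
}


def valid_statement_types(resource_type):
    """Collect statement types declared on the type and all its supertypes."""
    result = []
    current = resource_type
    while current is not None:
        result.extend(STATEMENT_TYPES.get(current, []))
        current = SUBTYPE_OF.get(current)
    return result


print(valid_statement_types("InformationObject"))
print(valid_statement_types("Actor"))
```

Statement types declared once for Resource become available to every subtype, while a type-specific one such as hasCreator stays with InformationObject.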

21
Issues
  • 1a Content in the overall context of a DL
    reference model
  • 1b Modeling information objects
  • 1c Levels, versions, and relationships
  • 1d Composite information objects / resources
  • 1e Resource identifiers
  • 2 Metadata, including provenance, context, usage
  • 3 Annotation

22
Information objects 1
  • A formal relationship instance (such as a row in a
    table or a structured data record)
  • A document (written or spoken text, image, sound)
    from which a human reader can learn about the
    resource in focus or about the relationships
    among several resources.
  • Information extraction: document → formal
    relationship instances.
  • A collection of information objects is in turn
    an information object
  • a table in a relational database: a collection
    of rows, each representing a relationship
    instance or a collection of relationship
    instances
  • a collection of documents

23
Information objects 2
  • An information object may be a close
    representation of an external object or event,
    for example
  • An image (photograph or painting) of a building.
    There may be many such images taken from
    different angles etc.
  • A video recording of a soccer game. There may be
    several such video recordings, each capturing
    different scenes, or capturing the same scene
    from different angles, or following different
    players, etc. These are different information
    objects representing the same external event.

24
Real world objects, concepts, ideas
  • To provide full access to the information objects
    it contains, a digital library must manage data
    about any kind of object (real world objects,
    concepts, ideas) in its subject domain.
  • Why?
  • The DL may represent data in the form of a
    database
  • Users look for information objects that deal with
    or are digital representations of any kind of
    object.
  • This idea underlies Topic Maps which were
    originally designed to improve access to
    documents by relating the topics discussed in
    these documents.

25
Real world objects, concepts, ideas
  • Examples (these are all resources)
  • People (focus of biographical reference tools)
  • Organizations (focus of organization directories)
  • Events (focus of developing "event gazetteers")
  • Places (focus of gazetteers)
  • Dates
  • Mathematical theorems (focus of mathematical
    encyclopedias)
  • Concepts, ideas
  • Problems and proposed solutions
  • Computer programs (focus of software directories
    or libraries)
  • The reference model should have a more complete
    list and indicate sources dealing with these

26
Issues
  • 1a Content in the overall context of a DL
    reference model
  • 1b Modeling information objects
  • 1c Levels, versions, and relationships
  • 1d Composite information objects / resources
  • 1e Resource identifiers
  • 2 Metadata, including provenance, context, usage
  • 3 Annotation

27
Levels, versions, and relationships
  • Work, manifestation, item (individual copy)
  • Linked through relationships

28
Work
  • Intellectual or artistic entity, as the abstract
    essence or as a text, image, or piece of music.
  • Range
  • A basic story or theme
  • the story of Faust
  • the myth of the Great Flood
  • A text telling the story, such as
  • Goethe's Faust
  • the account of the Great Flood in the Bible
    (original Hebrew)
  • the account of the same myth in another culture
  • A specific version of the account in the Hebrew
    Bible
  • a Latin translation of the account in the
    Hebrew Bible

29
Manifestation
  • A specific rendering of a work by means of a
    graphical image or sound, taken in the abstract:
    the idea of such a rendering.
  • Examples
  • The text of Goethe's Faust printed in a
    particular typeface and layout. A performance at
    which the text is recited also renders the text
    but is more properly considered a separate, but
    related, work.
  • A specific score of a given version of Schubert's
    Fifth. A performance of that version of
    Schubert's Fifth also renders the piece of music
    but is considered a separate, but related, work.
  • Also the rendering of a work in the form of
    digital storage that can be transformed to a
    graphical image or sound, again taken as the
    abstract pattern of digital signals.

30
Item, individual copy
  • The embodiment of a manifestation in a physical
    object
  • We can perceive the content of a manifestation
    only through an individual copy of it (unless we
    have memorized the visual expression manifest in
    a manifestation and can conjure it up from
    memory).
  • There are works that have only one manifestation
    of which there is only one copy.

31
Relationships among information objects
  • The story of Faust <dealsWith> Pact with the
    devil
  • The story of Faust <isToldIn> Marlowe's Faust
  • The story of Faust <isToldIn> Goethe's Faust
  • Goethe's Faust <authoredBy> Goethe, Johann
    Wolfgang von
  • Goethe's Faust <hasManifestation> R1231
  • R1231 <publishedBy> Cotta
  • R1231 <hasDate> 1871
  • R1232 <isCopyOf> R1231
  • R1232 <ownedBy> (HRieth, 1896, 1956)
  • R1232 <ownedBy> (DSoergel, 1956, )

32
Hierarchical inheritance
  • Data about a work inherit to all works below it
    along <isToldIn>, <hasVersion> etc. Therefore
  • Goethe's Faust <dealsWith> Pact with the devil
  • Data about a work inherit to all its
    manifestations. Therefore
  • R1231 <authoredBy> Goethe, Johann Wolfgang von
  • Data about a manifestation inherit to all its
    items
  • Hierarchical inheritance increases efficiency
  • More efficient catalog input
  • More efficient catalog storage
  • More efficient representation and reading of
    search results
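
Hierarchical inheritance can be sketched as a walk from an item up through its manifestation and works, collecting the statements stored at each level. The identifiers StoryOfFaust and GoethesFaust, the PARENT table, and the function name are assumptions for this illustration, and the sketch treats every relationship as inheritable, which a real system would need to restrict.

```python
# Statements stored once, at the level where they apply.
STATEMENTS = [
    ("StoryOfFaust", "dealsWith", "Pact with the devil"),
    ("GoethesFaust", "authoredBy", "Goethe, Johann Wolfgang von"),
    ("R1231", "publishedBy", "Cotta"),
]

# Child -> parent links, derived from isToldIn, hasManifestation, isCopyOf.
PARENT = {
    "GoethesFaust": "StoryOfFaust",  # StoryOfFaust <isToldIn> GoethesFaust
    "R1231": "GoethesFaust",         # GoethesFaust <hasManifestation> R1231
    "R1232": "R1231",                # R1232 <isCopyOf> R1231
}


def inherited_statements(resource):
    """Return own statements plus those inherited from all ancestors."""
    result = []
    current = resource
    while current is not None:
        for subject, relationship, obj in STATEMENTS:
            if subject == current:
                result.append((resource, relationship, obj))
        current = PARENT.get(current)
    return result


for statement in inherited_statements("R1232"):
    print(statement)
```

Nothing is stored for the individual copy R1232, yet a query for it yields publisher, author, and subject, which is where the savings in catalog input and storage come from.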

33
More relationships
  • R271: The man I killed, by Michael Halliday
  • R519: The man I killed, play by Christopher Wern
  • R519 <isBasedOn> R271
  • R315: Handbook of commercial geography, by Robert
    Chisholm
  • R783: Chisholm's handbook of commercial geography,
    entirely rewritten by L. Dudley Stamp and S.
    Carter Gilmour.
  • R783 <entirelyRewrittenFrom> R315

34
Relationship to FRBR: Notes on Terminology
  • The FRBR distinction between work and expression
    should be rethought. It is unclear and
    consequently poorly understood, and it may not be
    necessary. Just have work. The intuition FRBR
    tries to capture in this distinction is better
    handled through relationships among works as
    defined here.
  • Following FRBR I use the term manifestation.
    Another term is edition (in the sense of German
    Ausgabe), but edition also means German Auflage,
    so use of the term edition can be confusing.
  • It would be nice to be able to use graphic
    expression as a synonym for rendering, but to
    avoid any further confusion with FRBR it is best
    not to use the term expression at all.

35
Version control
  • Important, but not elaborated here

36
Issues
  • 1a Content in the overall context of a DL
    reference model
  • 1b Modeling information objects
  • 1c Levels, versions, and relationships
  • 1d Composite information objects / resources
  • 1e Resource identifiers
  • 2 Metadata, including provenance, context, usage
  • 3 Annotation

37
Composite information objects / resources
  • Examples
  • Book divided into chapters, sections, paragraphs,
    words (XML Document Object Model, DOM, or
    TEI). Each part can be seen as a separate
    information object
  • Movie with images, soundtrack, closed captions,
    script, all coordinated (MPEG-7)
  • A medical record with patient data, test data,
    images, live monitoring data streams, diagnoses,
    drugs prescribed, etc.

38
Composite information objects / resources
  • Abstractly: each component is a separate
    information object, composition expressed through
    relationships
  • In practice
  • Many document models for composite (or compound)
    documents supporting presentation
  • DL needs to allow specification, for each
    document, of the particular document model used

39
Issues
  • 1a Content in the overall context of a DL
    reference model
  • 1b Modeling information objects
  • 1c Levels, versions, and relationships
  • 1d Composite information objects / resources
  • 1e Resource identifiers
  • 2 Metadata, including provenance, context, usage
  • 3 Annotation

40
Identifying information objects
  • 1 Initial definition upon entry into the digital
    library.
  • 2 Definition on the spot
  • Examples: Annotate a specific segment of a text
    document or a region of an image or sound
    document, or anchor an annotation to a specific
    location in a document.
  • The segment or anchor is a new information object
    that is included in the original information
    object, and this new information object is linked
    with any of several annotation relationships to a
    new information object created by the user.
  • Related to composite objects. More on this under
    annotation

41
Issues
  • 1a Content in the overall context of a DL
    reference model
  • 1b Modeling information objects
  • 1c Levels, versions, and relationships
  • 1d Composite information objects / resources
  • 1e Resource identifiers
  • 2 Metadata, including provenance, context, usage
  • 3 Annotation

42
Data about information objects
  • Metadata: data about information objects, if
    used for discovering, interpreting, and using
    information objects
  • Relate information objects to other types of
    resources. Examples
  • InformationObject <hasCreator> Actor
  • InformationObject <dealsWith> Actor
  • InformationObject <containsText> Text (or, more
    specifically, Word)
  • Relate a word in a text to the concept that is
    the meaning in which the word is used in this
    particular position.
  • InformationObjectA <hasAbstract>
    InformationObjectB
  • InformationObjectA <hasCriticalCommentary>
    InformationObjectC
  • InformationObjectD <hasSupportiveCommentary>
    InformationObjectC

43
More on defining metadata
  • The metadata-hood of an information object does
    not reside in the information object, but in its
    relationship to another information object and,
    more specifically, in its use
  • A piece of data
  • is used as metadata
  • if it is used for the purpose of discovering,
    interpreting, and using information objects,
    which then give the ultimate data wanted.
  • The same piece of data may fill the ultimate need
    of the user in one situation and be used as
    metadata in another situation.

44
Not metadata
  • Data about resources that are not information
    objects are not metadata even if they are similar
    in form.
  • Data about information objects are not always
    used as metadata. For example, using author data
    to count a faculty member's publications or
    citation data to compute impact.
  • Extensive discussion of what exactly is the
    definition of metadata is not a good use of
    resources. A system should provide the data
    that are useful to a user for whatever purpose;
    what each piece of data is called is less
    important.

45
Metadata typologies
  • Metadata (and data in general) can be divided
    into categories from several perspectives, and
    within each perspective there exist several
    approaches. Some examples of how to categorize
    metadata
  • by purposes or use. Since the same unit of
    metadata can be used for several purposes, the
    resulting categories overlap.
  • by source, for example, extracted, assigned by
    cataloger, assigned by user (social tagging),
    from usage tracking
  • by intrinsic characteristics, for example data
    about provenance or about the format of the
    information object

46
Some metadata uses
  • A Learn about information objects and interpret
    them; this includes
  • A1 Learn about the identity and characteristics
    of information objects (descriptive metadata)
  • A2 Learn about the history and other features of
    the context of the information object
    (contextual metadata)
  • B Learn how to use an information object,
    including
  • B1 Learn how to gain legal access (access and
    rights metadata)
  • B2 Learn how to gain technical access to the
    information object (what machinery and software
    is needed to access the information object for
    a given purpose, such as assimilation by a
    person or processing by a computer program)
  • C Manage information objects (administrative
    metadata), in particular
  • C1 Manage the preservation of information
    objects (preservation metadata).

47
Usage data
  • Data on usage of resources and on usage rights,
    usage history, and future use / preservation are
    important for discovering, interpreting, and
    using resources as well as managing resources
  • Some of these data can be collected automatically
  • If the resource in question is an information
    object, this kind of data is often used as
    metadata

48
Issues
  • 1a Content in the overall context of a DL
    reference model
  • 1b Modeling information objects
  • 1c Levels, versions, and relationships
  • 1d Composite information objects / resources
  • 1e Resource identifiers
  • 2 Metadata, including provenance, context, usage
  • 3 Annotation

49
Annotation
  • InformationObjectA <annotatedBy>
    InformationObjectB
  • InformationObjectB may be created on the spot in
    order to annotate A (InformationObjectB and the
    annotation relationship have the same author) or
    B may preexist (the annotation relationship
    between A and B is introduced by a third party)
  • Specific type of annotation expressed by
    specializing the annotatedBy relationship, for
    example
  • InformationObjectA <criticizedBy>
    InformationObjectB
  • InformationObjectA <hasCriticalCommentary>
    InformationObjectC
  • InformationObjectD <hasSupportiveCommentary>
    InformationObjectC
  • InformationObjectE <isPartOfSpeech> PartOfSpeech
  • Annotation-hood is in the relationship, not in
    the information object

50
Annotation
  • Annotation-hood is in the relationship, not in
    the information object
  • There is a wide range of relationship types that
    are called annotations. Linguists think of
    annotations differently than scholars making
    comments on a text.
  • Rather than trying to define exactly what
    annotation means, the reference model should
    include a comprehensive list of relationship
    types that might be considered annotation by
    somebody so that anybody can define their meaning
    of annotation by giving the appropriate subset of
    annotation relationship types.
  • The same thought applies to metadata, discussed
    on a later slide.
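
The proposal to let each community define annotation as a subset of an inventory of relationship types can be sketched as follows. The community names and their chosen subsets are invented for this illustration; the relationship type names are taken from the slides.

```python
# Inventory of relationship types that somebody might call annotation.
INVENTORY = {
    "criticizedBy",
    "hasCriticalCommentary",
    "hasSupportiveCommentary",
    "isPartOfSpeech",
    "hasAbstract",
}

# Each community picks its own subset (hypothetical choices).
ANNOTATION_DEFINITIONS = {
    "linguist": {"isPartOfSpeech"},
    "scholar": {
        "criticizedBy",
        "hasCriticalCommentary",
        "hasSupportiveCommentary",
    },
}


def annotations_for(community, statements):
    """Select the statements that this community regards as annotations."""
    wanted = ANNOTATION_DEFINITIONS[community]
    return [s for s in statements if s[1] in wanted]


statements = [
    ("InformationObjectA", "hasCriticalCommentary", "InformationObjectC"),
    ("InformationObjectE", "isPartOfSpeech", "PartOfSpeech"),
    ("InformationObjectA", "hasAbstract", "InformationObjectB"),
]

print(annotations_for("linguist", statements))
print(annotations_for("scholar", statements))
```

The stored statements never change; only the chosen subset of relationship types differs, so no single definition of annotation has to be imposed.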

51
Special resource types for annotations
  • Some annotations require special types of
    resources.
  • Examples
  • Annotate a text with part-of-speech indications.
    Annotated resource: a one-word fragment of the
    text. Annotating resource: a value from a list of
    parts of speech.
  • Annotate a text with meaning for word sense
    disambiguation. Annotated resource: a word or
    phrase in the text. Annotating resource: a value
    from a list of meanings defined in some way.
  • Annotation through underlining or other
    marks. Annotated resource: a fragment of text or
    other information object. Annotating resource: a
    pair (sign, meaning), e.g. (underline,
    important) or (?, check this out) or (X,
    nonsense).
  • The annotated resource and the annotating
    resource may be very short

52
Annotation and metadata
  • Metadata and annotation data overlap, and
    different communities and individuals have
    different definitions of what is included in
    metadata and what is included in annotations.
  • The precise nature of a unit of data about an
    information object is determined by the
    relationship type and the resource that is linked
    to. The interpretation of each type of data is
    in the eye of the beholder.
  • Need an inventory of relationship types (a type
    of ontology). For example, the CIDOC Conceptual
    Reference Model (CIDOC/CRM) is an inventory of
    broad relationship types.
  • In such an inventory, one could indicate who
    considers a given relationship type as usable as
    metadata and/or as belonging to annotation.

53
Take-home message 1
  • The entity-relationship model (E-R model)
    provides the unifying principle for a digital
    library content model
  • The E-R model allows representation of structured
    data of any complexity on a conceptual level.
  • Defining relationships between information
    objects handles
  • Modeling information objects
  • Levels, versions, and relationships
  • Composite information objects / resources
  • Metadata
  • Annotation
  • Many notions are captured better through
    relationships than fine distinctions of entity
    types

54
Take-home message 2
  • Any reference model
  • needs to be abstract and must not commit to any
    particular standard or design decision
  • rather, it must provide a framework for
    specifying the commitments of any particular DL
    (or information system)
  • A reference model provides a systematic
    framework for description and analysis, not a
    prescription

55
  • Dagobert Soergel
  • dsoergel at umd.edu
  • www.dsoergel.com

56
Omitted slides

57
Construction process
  • Need to be sure all applicable concepts from
    various sources such as the 5S model and FRBR/CRM
    are included, either in the skeleton model or in
    a list of values / choices, as appropriate
  • There is still work to be done to pull reference
    model subject matter out of the reference
    architecture document, and vice versa.

58
Construction process
  • We should have an online version of the reference
    model document with the following properties
  • Links to discussion of issues and underlying
    rationale, capturing some of the discussion in
    the group.
  • Links from the reference model to the appropriate
    section of the reference architecture
  • The Wiki page may not quite do it.

59
  • There are two ways to communicate such
    statements.
  • One learns what one wants to know about the
    resource in focus immediately from a relationship
    instance.
  • Hamlet <authoredBy> Shakespeare
  • The drug treatment frame on Taxoteer
  • The actual data of interest are represented in a
    database that captures these statements
    (relationship instances), such as
  • a collection of Prolog statements
  • a relational database
  • an object-oriented database
  • One needs to consult an information object that
    is related to the resource in focus.
  • Shakespeare wrote Hamlet in the year 1625
  • Hamlet was written by Shakespeare
  • Taxoteer is effective in the treatment of
    cancers that have no receptors for estrogen. In
    older persons the success rate is 50

60
  • The DL designer must decide how to identify the
    new resource that is a part of an existing
    resource and the new text object created by the
    annotator and how to store the link between
    these two information objects

61
Identifying information objects: Architecture
issues
  • Definition on the spot, options
  • (1) use completely independent identifiers
    and store the relationship explicitly
  • (2) use dependent identifiers
  • The part of a document can be identified by
    document identifier followed by information that
    uniquely identifies the part. The part relation
    is implied by the structure of the identifier.
  • The annotation information object could be
    identified by the identifier of the resource
    being annotated followed by a short string that
    identifies the nth annotation of this resource
    (like a footnote). The relationship between the
    resource and the resource annotating it would be
    implied by the identifier (however, the specific
    type of the annotation relationship would not be
    captured this way). The resource that annotates
    still can be referenced from any other context.
  • Implicit representation: embedded annotations. The
    annotation is embedded in the document, linked to
    a point in a text that is identified only by the
    place of the annotation. This could be converted
    to an explicit representation.
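
Option (2), dependent identifiers, can be sketched as below. The separator characters "#" and "@a", the locator syntax, and all function names are assumptions chosen for this illustration; the slides do not prescribe an identifier syntax. As noted above, only a generic annotation relationship can be recovered this way, not its specific type.

```python
def part_id(document_id, locator):
    """Identify a part by the document identifier plus a locator."""
    return f"{document_id}#{locator}"


def annotation_id(resource_id, n):
    """Identify the nth annotation of a resource, like a footnote number."""
    return f"{resource_id}@a{n}"


def implied_relationships(identifier):
    """Recover the relationships implied by the identifier structure."""
    relationships = []
    target, separator, _ = identifier.rpartition("@a")
    if separator:
        # The identifier names an annotation of `target`.
        relationships.append((target, "annotatedBy", identifier))
    else:
        target = identifier
    document, separator, _ = target.partition("#")
    if separator:
        # `target` names a part of `document`.
        relationships.append((target, "isPartOf", document))
    return relationships


segment = part_id("doc42", "chars=100-180")
note = annotation_id(segment, 1)

print(note)
print(implied_relationships(note))
```

Under option (1) the same two relationships would instead be stored as explicit statements between independently assigned identifiers, which also allows the specific annotation type to be recorded.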

62
Some metadata uses
  • This is a specialization of the functions of data
    given above
  • A learn about other data, that is, information
    objects, and understand them; this includes
  • A1 learn about the identity and characteristics
    of information objects (descriptive metadata)
  • A2 learn about the history and other features of
    the context of the information object
    (contextual metadata)
  • B learn how to use an information object (source
    of data), including
  • B1 learn how to gain legal access to the
    information object (access and use rights
    metadata)
  • B2 learn how to gain technical access to the
    information object (what machinery and software
    is needed to access the information object for
    a given purpose, such as assimilation by a person
    or processing by a computer program)
  • C manage information objects (administrative
    metadata), in particular
  • C1 manage the preservation of information
    objects (preservation metadata).

63
Metadata in the reference model
  • When describing a DL using the reference model,
    need to be able to indicate any typology of
    metadata used in the DL