Faceted Semantic Subject Annotation System - PowerPoint PPT Presentation

Provided by: sss52

1
Faceted Semantic Subject Annotation System
  • Anand Kumar Pandey
  • Junior Research Fellow
  • Documentation Research Training Centre
  • Indian Statistical Institute, Bangalore, India

2
(No Transcript)
3
Commentary
  • Faceted classification is one of the most
    powerful, yet least understood, methods of
    organizing information.
  • Peter Merholz, Innovations in Classification.
    http://www.peterme.com/archives/00000063.html

"I personally find the term facet to be
confusing. I prefer the terms attributes and
attribute values. These terms are used in both
the database world and the artificial
intelligence world, to describe a very similar
functionality, sometimes the exact same
functionality." (Re: SIGIA-L, "Faceted approach
applied to content", from Donna M. Fritzsche,
Fri Nov 14 2003, 13:54:23 EST,
http://www.info-arch.org/lists/sigia-l/0311/0161.html)
"My complaint is that there is a lot of talk about
facets, but little of any substance. Most of it
won't help you build your own faceted
classification scheme. It amounts to saying the
grass is greener on the other (faceted) side, but
fails to give you a map explaining how to get
there and what obstacles you'll face along the
way. And the academic literature doesn't help
much either. It's too dense and I can't recommend
it to the practitioner (not the stuff I've seen)."
(Gordon Luk, May 27, 2004,
http://www.getluky.net/archives/000052.html,
making reference to Christina Wodtke's posting on
her blog Elegant Hack,
http://www.eleganthack.com/MT/mt-tb.cgi/2)
"Faceted classification serves up multiple "pure"
classification schemes rather than a single
motley taxonomy." Rosenfeld, L. and Morville, P.
(2002). Information Architecture for the World
Wide Web, 2nd ed. Cambridge, MA: O'Reilly.
4
Overview
  • Present state of document annotation
  • Alternative approach
  • Discussion about Facets
  • Faceted Subject Indexing-POPSI
  • Elementary Categories of POPSI
  • SKOS (Simple Knowledge Organization System)
  • Model of the proposed Faceted Semantic Annotation
    System
  • Conclusion

5
Present state of document annotation
  • At present, subject metadata are assigned to
    express the subject of a document.
  • Limitation: they are not always in context.
  • For example, in KIM, named entities (NEs) are
    identified and relationships between them are
    established.
  • Limitation: the context may change depending on
    how an NE is used in a document, e.g.
  • Plant in agriculture
  • Steel plant

6
Alternative Approach
  • Represent the basic constituent elements of the
    subject content; in other words, provide context
    for keywords.
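A minimal Python sketch of why context matters, using the "plant" example from the previous slide. A bare keyword cannot separate the two senses of "plant". The same keyword stored together with a facet value can. The `Annotation` class, the helper functions and the facet values (`Agriculture`, `Steel Technology`) are hypothetical illustrations. They are not part of KIM or of the proposed system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A keyword plus the (facet, value) pairs that give it context."""
    keyword: str
    context: tuple = ()


def keyword_only(keyword: str) -> Annotation:
    # Plain keyword annotation: nothing about the context is recorded.
    return Annotation(keyword.lower())


def faceted(keyword: str, **facets: str) -> Annotation:
    # Faceted annotation: the keyword keeps its facet context. Pairs are
    # sorted so that the same context always compares equal.
    return Annotation(keyword.lower(), tuple(sorted(facets.items())))


# "Plant in agriculture" vs "steel plant": keyword-only annotations collide.
print(keyword_only("Plant") == keyword_only("plant"))  # True

# A (hypothetical) discipline facet keeps the two senses apart.
agri = faceted("plant", discipline="Agriculture")
steel = faceted("plant", discipline="Steel Technology")
print(agri == steel)  # False
```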

7
Efficient information retrieval language
  • It should be capable of
  • Dealing with the complex structure of knowledge
  • Providing for the sequencing of a set of selected
    terms according to probable relevance to a
    particular topic
  • Contextualizing concepts
  • Helping the searcher choose the right keywords
    for searching
  • Combining searching and browsing facilities so
    that they work in coordination

Vickery, B.C. (2006). "Structure and Function in
Retrieval Language", Journal of Documentation,
Vol. 62 No. 1, pp. 7-20.
8
Why a Faceted Subject Indexing Language?
  • It uses the faceted classification structure,
    which
  • Uses a logical structure for organization
  • Uses a standard set of categories to analyze
    concepts; these categories are not locked but
    are left free to combine with one another
  • Breaks free from traditional classification's
    restriction to hierarchical, genus-species
    relations. By combining terms in compound
    subjects, it introduces new logical relations
    between them, thus better reflecting the
    complexity of knowledge

9
What is a Facet?
  • "A generic term used to denote any component, be
    it a basic subject or an isolate, of a compound
    subject... Facets inhere in the subjects
    themselves, whether we sense them or not."
    (S.R. Ranganathan)
  • A homogeneous group or category derived according
    to the principles of facet analysis

10
What is a Facet?
  • Near synonyms
  • Small components of larger entities/units
  • Properties, attributes, characteristics,
    categories,
  • classes, groups, concepts, and dimensions
  • Facets are the flat faces on a diamond, which
    reflect the underlying symmetry of the crystal
    structure.

11
Quick recipe for building a faceted classification
  • Define the subject field: what entities are of
    interest to the intended users of the system?
  • Formulate facets: sort the terms and arrange them
    in homogeneous groups known as facets.
  • Structure each facet, following the postulates
    and principles given by Ranganathan.
  • Arrange the facets.
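The recipe's mechanical steps can be sketched in a few lines of Python. Steps 2 and 4 are modelled directly: terms are grouped into homogeneous facets, and the facets are then listed in a chosen order. Step 3 is reduced to alphabetical ordering within each facet. That is only a stand-in for Ranganathan's postulates, not an implementation of them. The terms, their facet assignments and `FACET_ORDER` are hypothetical.

```python
from collections import defaultdict

# Hypothetical terms from a "buildings" subject field, each already assigned
# by the indexer to the facet (homogeneous group) it belongs to.
TERMS = [
    ("Bangalore", "Location"),
    ("Brick", "Composition"),
    ("Hospital", "Purpose"),
    ("Gothic", "Style"),
    ("Stone", "Composition"),
    ("School", "Purpose"),
    ("Chennai", "Location"),
]

# Assumed order for step 4; the slides do not prescribe one.
FACET_ORDER = ["Purpose", "Composition", "Style", "Location"]


def build_facets(terms):
    """Step 2: sort the terms into homogeneous groups (facets)."""
    facets = defaultdict(list)
    for term, facet in terms:
        facets[facet].append(term)
    # Step 3, greatly simplified: alphabetical order inside each facet.
    return {name: sorted(members) for name, members in facets.items()}


def arrange(facets, order):
    """Step 4: list the facets in the chosen order, skipping empty ones."""
    return [(name, facets[name]) for name in order if name in facets]


for name, members in arrange(build_facets(TERMS), FACET_ORDER):
    print(f"{name}: {', '.join(members)}")
```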

12
Buildings
13
Facet Analysis
  • Fundamental concepts are analyzed and grouped
    together as facets (following the principles and
    postulates given by Ranganathan)
  • Hunter, E. (2002). Classification Made Simple.
    Ashgate.
  • Building Facets
  • Location
  • Composition
  • Purpose
  • Date/Period constructed
  • Performance
  • Style
  • Associated persons
  • etc.
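One payoff of describing buildings by independent facets is that the facets can be combined freely at search time. This Python sketch shows that with plain dictionaries: filtering on any mix of facet values, plus per-value counts as a faceted browser would display them. The records and helper names are hypothetical.

```python
# Hypothetical building records described by some of the facets listed above.
BUILDINGS = [
    {"name": "Building A", "Location": "Bangalore", "Composition": "Stone",
     "Purpose": "Library", "Style": "Gothic"},
    {"name": "Building B", "Location": "Bangalore", "Composition": "Concrete",
     "Purpose": "Office", "Style": "Modernist"},
    {"name": "Building C", "Location": "Chennai", "Composition": "Stone",
     "Purpose": "Museum", "Style": "Indo-Saracenic"},
]


def select(records, **constraints):
    """Records whose facet values match every constraint given."""
    return [r for r in records
            if all(r.get(facet) == value for facet, value in constraints.items())]


def facet_counts(records, facet):
    """Number of records per value of one facet (what a faceted browser shows)."""
    counts = {}
    for r in records:
        if facet in r:
            counts[r[facet]] = counts.get(r[facet], 0) + 1
    return counts


stone = select(BUILDINGS, Composition="Stone")
print([r["name"] for r in stone])        # ['Building A', 'Building C']
print(facet_counts(stone, "Location"))   # {'Bangalore': 1, 'Chennai': 1}
# Two facets combined in one query.
print([r["name"] for r in select(BUILDINGS, Location="Bangalore", Style="Gothic")])
```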

14
What is Faceted Subject Indexing?
  • Subject indexing is a technique that indicates
    the location of resources according to their
    specific subjects. It has a two-fold job:
  • Translating the name of the subject of the
    document from natural language (NL) into the
    preferred artificial language of the system
  • Translating users' queries from natural language
    (NL) into the system's language
  • Faceted subject indexing uses Facet Analytico
    Theory to bring context to the indexing system.
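The two-fold job can be sketched as one mapping from natural-language wording to preferred terms, applied to both a document's subject and a user's query. When both sides land on the same preferred terms, they match. The vocabulary below is a hypothetical stand-in for a real indexing language. The naive substring matching is for illustration only: it would also match "cure" inside "secure".

```python
# Hypothetical controlled vocabulary: natural-language (NL) variants mapped
# to the preferred terms of the indexing language.
PREFERRED = {
    "lungs": "Lung",
    "lung": "Lung",
    "pulmonary": "Lung",
    "cure": "Treatment",
    "therapy": "Treatment",
    "treatment": "Treatment",
    "infection": "Infectious Disease",
    "infectious disease": "Infectious Disease",
}


def to_index_terms(text: str) -> list[str]:
    """Map NL text to preferred terms; longer phrases are matched first."""
    text = text.lower()
    found = set()
    for phrase in sorted(PREFERRED, key=len, reverse=True):
        if phrase in text:  # naive substring match, illustration only
            found.add(PREFERRED[phrase])
            # Blank out the match so a shorter phrase cannot match inside it.
            text = text.replace(phrase, " ")
    return sorted(found)


# The same translation serves the document side and the query side.
doc_terms = to_index_terms("Treatment of infectious disease of the lungs")
query_terms = to_index_terms("cure for pulmonary infection")
print(doc_terms == query_terms)  # True
```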

15
Postulate based Permuted Subject Indexing (POPSI)
  • It is a generalized model for representing the
    thought content of an information resource, as
    well as for modelling a particular subject domain.
  • It consists of
  • Four elementary categories (Fundamental
    Categories)
  • Modifiers

Bhattacharyya, G. (1979). "POPSI: its
fundamentals and procedure based on a general
theory of subject indexing languages", Library
Science with a Slant to Documentation, Vol. 16
No. 1, March, pp. 1-34.
16
Elementary Categories of POPSI
  • Discipline
  • It includes the conventional fields of study
    or any aggregate of such fields
  • Entity
  • It includes any manifestation that is the core
    of the subject, be it concrete or abstract, as
    contrasted with its properties or the actions
    performed on or by it.

17
Elementary Categories of POPSI
  • Action
  • It includes any manifestation denoting the
    concept of doing, including the processes and
    steps of doing. An action may be a self-action or
    an external action.
  • Property
  • It includes the manifestation denoting the
    concept of attribute.

18
Modifiers in POPSI
  • Modifiers are divided into two categories
  • Dependent Modifiers
  • Independent Modifiers / Common Isolates
  • Dependent modifiers are used in conjunction with
    the elementary categories to sharpen a
    particular facet.
  • For example, Romantic in Romantic Love
  • Infectious in Infectious Disease

19
Common Modifiers/Common Isolates
  • These modifiers can modify or sharpen any of the
    elementary categories. Some of them are
  • Space Modifiers
  • Time Modifiers
  • Language Modifiers
  • Form Modifiers, and so on

20
Taking Care of the Complex Subjects
  • Phase Relations
  • General Relation
  • Bias Relation
  • Comparison
  • Similarity
  • Difference
  • Application Relation
  • Influence Relation
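A complex subject joins two simple subjects (phases) through one of the relations listed above. This Python sketch models that as two phases plus a relation type. For simplicity, Similarity and Difference are separate enum members here, although on the slide they may be sub-kinds of Comparison. The bracketed rendering and the example subjects are hypothetical. This is not POPSI's notation for phase relations.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    """Phase relations as listed on the slide (flattened for simplicity)."""
    GENERAL = "general relation"
    BIAS = "bias relation"
    COMPARISON = "comparison"
    SIMILARITY = "similarity"
    DIFFERENCE = "difference"
    APPLICATION = "application relation"
    INFLUENCE = "influence relation"


@dataclass(frozen=True)
class ComplexSubject:
    # Two simple subjects (phases) joined by exactly one phase relation.
    first: str
    relation: Phase
    second: str

    def __str__(self) -> str:
        # Hypothetical display format, not POPSI's notation.
        return f"{self.first} [{self.relation.value}] {self.second}"


s = ComplexSubject("Statistics", Phase.APPLICATION, "Agriculture")
print(s)  # Statistics [application relation] Agriculture
```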

21
EXAMPLE 1
  • In Medical Science, Treatment of Infectious
    Disease of Lungs.

  • Discipline: Medical Science
  • Entity: Lung
  • Property: Infectious Disease
  • Action: Treatment
22
EXAMPLE 2
  • In Medical Science, A Report on the Treatment of
    Infectious Disease of Lungs in India during
    1950-1965.
  • Discipline: Medical Science
  • Entity: Lung
  • Property: Infectious Disease
  • Action: Treatment
  • Space Modifier: India
  • Time Modifier: 1950-1965
  • Form Modifier: Report
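This Python sketch assembles the facets of Examples 1 and 2 into a single subject string. It takes the categories in the order they are listed on these slides: elementary categories first, then modifiers. Two things are assumptions, not POPSI's rules: the fixed order and the ". " separator. POPSI's own sequencing and punctuation rules are given in Bhattacharyya (1979).

```python
# Order in which the categories are listed in Examples 1 and 2 (an assumption,
# not POPSI's full set of sequencing rules).
ORDER = ["Discipline", "Entity", "Property", "Action",
         "Space Modifier", "Time Modifier", "Form Modifier"]


def popsi_string(facets: dict[str, str], sep: str = ". ") -> str:
    """Join the facet values in ORDER, skipping categories that are absent."""
    return sep.join(facets[c] for c in ORDER if c in facets)


example_2 = {
    # Deliberately unordered: the function imposes ORDER.
    "Form Modifier": "Report",
    "Action": "Treatment",
    "Discipline": "Medical Science",
    "Entity": "Lung",
    "Property": "Infectious Disease",
    "Space Modifier": "India",
    "Time Modifier": "1950-1965",
}
print(popsi_string(example_2))
# Medical Science. Lung. Infectious Disease. Treatment. India. 1950-1965. Report
```

Example 1 is the same call without the three modifier entries.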

23
Expressing the POPSI in SKOS
  • SKOS (Simple Knowledge Organization System)
  • claims to provide a simple, machine-understandable
    representation framework for Knowledge
    Organisation Systems (KOS)
  • has the flexibility and extensibility to cope
    with the variation found in KOS idioms
  • is fully capable of supporting the publication
    and use of KOS within a decentralised,
    distributed information environment such as the
    World Wide (Semantic) Web.

http://www.w3.org/2004/02/skos/
24
SKOS (cont.)
  • In scope
  • controlled vocabularies
  • thesauri
  • taxonomies
  • classification schemes
  • subject heading systems
  • Grey area
  • terminologies (sensu ISO TC37 SC4)
  • wordnets
  • lexical databases
  • synonym rings
  • glossaries
  • dictionaries
  • ontologies
  • folksonomies
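To show what expressing a facet hierarchy in SKOS can look like, this Python sketch emits N-Triples using three real SKOS terms: `skos:Concept`, `skos:prefLabel` and `skos:broader`, under the namespace `http://www.w3.org/2004/02/skos/core#`. The concept namespace `http://example.org/classaurus/` and the label-to-IRI scheme are hypothetical. The broader/narrower pairs come from examples in this presentation. Plain string formatting is used so the sketch needs no RDF library. A real system would more likely use one.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/classaurus/"  # hypothetical concept namespace


def iri(label):
    """Mint a concept IRI from a label (hypothetical naming scheme)."""
    return f"<{BASE}{label.replace(' ', '_')}>"


def concept(label, broader=None):
    """N-Triples lines declaring a skos:Concept, optionally with skos:broader."""
    s = iri(label)
    lines = [
        f"{s} <{RDF_TYPE}> <{SKOS}Concept> .",
        f'{s} <{SKOS}prefLabel> "{label}"@en .',
    ]
    if broader is not None:
        lines.append(f"{s} <{SKOS}broader> {iri(broader)} .")
    return lines


# Broader/narrower pairs taken from examples in this presentation.
HIERARCHY = [("Disease", None), ("Infectious Disease", "Disease"),
             ("Treatment", None), ("Radiation Therapy", "Treatment")]

triples = [line for label, parent in HIERARCHY for line in concept(label, parent)]
print("\n".join(triples))
```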

25
POPSI Classes and Properties (1/2)
Classes
  • ElementaryCategory: Discipline, Entity, Property,
    Action
Properties
  • form
  • time (subPropertyOf DAML/Time TemporalEntity)
  • place (subPropertyOf DAML/Place)
  • phaseRelation: general, biasedBy, influenceBy,
    comparisonWith, similarityWith, differenceWith
  • application
  • tool
26
POPSI Classes and Properties (2/2)
  • ElementaryCategory: Discipline, Entity, Property,
    Action
  • Form, Environment, place, Time, modifier, type
  • discipline (hasDiscipline, isDisciplineOf)
  • entity (hasEntity, isEntityOf)
  • property (hasProperty, isPropertyOf)
  • action (hasAction, actionOn)
  • phaseRelation: general, bias (biased, biasing),
    influence (influenced, influencing), comparison
    (comparedWith), difference (differencedBy,
    differencing), application, tool
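The slide pairs each relation with an inverse, for example hasEntity/isEntityOf. This Python sketch materialises the inverse triples so a lookup can start from either end. Reading hasAction/actionOn as an inverse pair is an interpretation of the slide. The example triples reuse the handle from the RDF example later in this deck.

```python
# Inverse pairs as read from the slide (hasAction/actionOn assumed to be a pair).
INVERSES = {
    "hasDiscipline": "isDisciplineOf",
    "hasEntity": "isEntityOf",
    "hasProperty": "isPropertyOf",
    "hasAction": "actionOn",
}
# Make the lookup symmetric so a triple can be inverted in either direction.
INVERSES.update({inv: prop for prop, inv in list(INVERSES.items())})


def with_inverses(triples):
    """Return the triples plus the inverse of each one that has a declared inverse."""
    closed = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            closed.add((o, INVERSES[p], s))
    return closed


facts = {("http://hdl.net/1849/234", "hasAction", "Treatment"),
         ("http://hdl.net/1849/234", "hasEntity", "X-ray")}
for triple in sorted(with_inverses(facts)):
    print(triple)
```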
27
Facetizing Concepts
  • (Discipline) Medicine,
  • (Entity) Human body,
  • (property of Entity) disease,
  • (action on property) treatment,
  • (type of action) radiation therapy,
  • (Entity of action) X-ray,
  • (method of action) treatment using Rotation
    technique,
  • (action of action) determination
  • (application of action) depth dose,
  • (tool of action) Ionized packet chamber
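The facetized chain above is really a tree: each "X of Y" facet refines some earlier facet of kind Y. This Python sketch rebuilds that tree under one assumed attachment rule: a facet attaches to the most recent facet of the kind it qualifies. Under that rule, "depth dose" and "Ionized packet chamber" fall under "determination" rather than "treatment". The slide does not say which reading is intended, and the `CHAIN` encoding is hypothetical.

```python
# (kind, qualifies, term): `qualifies` names the kind of facet this one refines.
CHAIN = [
    ("discipline", None, "Medicine"),
    ("entity", "discipline", "Human body"),
    ("property", "entity", "disease"),
    ("action", "property", "treatment"),
    ("type", "action", "radiation therapy"),
    ("entity", "action", "X-ray"),
    ("method", "action", "Rotation technique"),
    ("action", "action", "determination"),
    ("application", "action", "depth dose"),
    ("tool", "action", "Ionized packet chamber"),
]


def build_tree(chain):
    """Attach each facet under the most recent facet of the kind it qualifies."""
    root = None
    latest = {}  # kind -> most recently seen node of that kind
    for kind, qualifies, term in chain:
        node = {"kind": kind, "term": term, "children": []}
        if qualifies is None:
            root = node
        else:
            latest[qualifies]["children"].append(node)
        latest[kind] = node
    return root


def render(node, depth=0):
    """Indented outline of the facet tree, one facet per line."""
    lines = ["  " * depth + f"({node['kind']}) {node['term']}"]
    for child in node["children"]:
        lines += render(child, depth + 1)
    return lines


print("\n".join(render(build_tree(CHAIN))))
```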

28
POPSI in RDF
<?xml version="1.0"?>
<rdf:RDF xml:lang="en"
    xmlns:popsi="http://drtc.isibang.ac.in/guha/popsi/popsi-skos"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://hdl.net/1849/234">
    <popsi:ElementaryCategory>
      <rdfs:OrderedCollection>
        <popsi:Discipline>Medicine</popsi:Discipline>
        <popsi:Entity>Human Body</popsi:Entity>
        <popsi:Property>Disease</popsi:Property>
        <popsi:hasAction>Treatment</popsi:hasAction>
        <popsi:type>Radiation Therapy</popsi:type>
        <popsi:hasEntity>X-ray</popsi:hasEntity>
        <popsi:application>Rotation Technique</popsi:application>
        <popsi:tool>Ionized packet chamber</popsi:tool>
      </rdfs:OrderedCollection>
    </popsi:ElementaryCategory>
  </rdf:Description>
</rdf:RDF>
29
Graphical Representation
[Figure: RDF graph of the resource http://hdl.net/1849/234, with arcs popsi:Discipline to Medicine, popsi:Entity to Human body, popsi:Property to Disease, popsi:hasAction to treatment, popsi:typeOf to Radiation Therapy, popsi:hasEntity to X-Ray, popsi:application to Rotation Technique and popsi:tool to Ionized Packet Chamber]
30
Faceted Semantic Annotation System
  • It will consist of two parts
  • The Classaurus
  • It will be arranged in two parts
  • A hierarchical display of all the facets, arranged
    in elementary categories and modifier classes
  • An alphabetical listing of the keywords (word
    clouds)
  • The Associative Index
  • It will be an inverted index of the classaurus
    facets.
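The associative index is described as an inverted index of classaurus facets. This Python sketch inverts document annotations into a map from (facet, term) to documents, then answers multi-facet lookups by set intersection. The documents and annotations are hypothetical. The index layout is one plausible reading of the slide, not the system's specified design.

```python
from collections import defaultdict

# Hypothetical annotated documents: each is a list of (facet, term) pairs.
ANNOTATIONS = {
    "doc1": [("Discipline", "Medical Science"), ("Entity", "Lung"),
             ("Property", "Infectious Disease"), ("Action", "Treatment")],
    "doc2": [("Discipline", "Medical Science"), ("Entity", "Heart"),
             ("Action", "Treatment")],
    "doc3": [("Discipline", "Agriculture"), ("Entity", "Plant"),
             ("Property", "Infectious Disease")],
}


def associative_index(annotations):
    """Invert doc -> (facet, term) pairs into (facet, term) -> sorted doc ids."""
    index = defaultdict(set)
    for doc, pairs in annotations.items():
        for facet, term in pairs:
            index[(facet, term)].add(doc)
    return {key: sorted(docs) for key, docs in index.items()}


def lookup(index, **facets):
    """Documents carrying all the given facet terms (set intersection)."""
    hits = None
    for facet, term in facets.items():
        docs = set(index.get((facet, term), []))
        hits = docs if hits is None else hits & docs
    return sorted(hits or [])


idx = associative_index(ANNOTATIONS)
print(lookup(idx, Property="Infectious Disease"))  # ['doc1', 'doc3']
# Adding a discipline facet restores the context of the term.
print(lookup(idx, Discipline="Medical Science", Property="Infectious Disease"))  # ['doc1']
```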

31
Faceted Semantic Annotation System
32
Further Research
  • Better algorithms and models for automatic text
    categorization
  • Inclusion of the Faceted Semantic Subject
    Annotation model in existing annotation systems
  • Formalization of the process of facet analysis
  • Bringing the associative effect into the index

33
Thank You