Title: Faceted Semantic Subject Annotation System
1Faceted Semantic Subject Annotation System
- Anand Kumar Pandey
- Junior Research Fellow
- Documentation Research Training Centre
- Indian Statistical Institute, Bangalore, India
3Commentary
- "Faceted classification is one of the most powerful, yet least understood, methods of organizing information." Peter Merholz, Innovations in Classification. http://www.peterme.com/archives/00000063.html
"I personally find the term facet to be confusing. I prefer the terms attributes and attribute values. These terms are used in both the database world and the artificial intelligence world to describe a very similar functionality, sometimes the exact same functionality." Re: SIGIA-l Faceted approach applied to content. From: Donna M. Fritzsche. Date: Fri Nov 14 2003 - 13:54:23 EST. http://www.info-arch.org/lists/sigia-l/0311/0161.html
"My complaint is that there is a lot of talk about facets, but little of any substance. Most of it won't help you build your own faceted classification scheme. It amounts to saying the grass is greener on the other (faceted) side, but fails to give you a map explaining how to get there and what obstacles you'll face along the way. And the academic literature doesn't help much either. It's too dense and I can't recommend it to the practitioner (not the stuff I've seen)." May 27, 2004, Gordon Luk, http://www.getluky.net/archives/000052.html, making reference to Christina Wodtke's posting on her blog Elegant Hack, http://www.eleganthack.com/MT/mt-tb.cgi/2
"Faceted classification serves up multiple pure classification schemes rather than a single motley taxonomy." Rosenfeld, L. & Morville, P. (2002). Information Architecture for the World Wide Web. 2nd Ed. Cambridge, MA: O'Reilly.
4Overview
- Present state of document annotation
- Alternative approach
- Discussion about Facets
- Faceted Subject Indexing-POPSI
- Elementary Categories of POPSI
- SKOS (Simple Knowledge Organization System)
- Model of the proposed Faceted Semantic Annotation System
- Conclusion
5Present state of document annotation
- In the present scenario, subject metadata are assigned in order to express the subject of the document.
- Limitation: they are not always in context.
- For example, in KIM, Named Entities (NEs) are identified and relationships are established.
- Limitation: the context may change according to how the NEs are used in a document, for example:
- Plant in agriculture
- Steel Plant
6Alternative Approach
- By representing the basic constituent elements of
the subject content. In other words, by providing
the context to keywords.
7Efficient information retrieval language
- Which should be capable of:
- Dealing with the complex structure of knowledge
- Providing for the sequencing of a set of selected terms according to probable relevance to a particular topic
- Contextualizing the concept
- Giving aid to the searcher in choosing the right keywords for searching
- Mixing the searching and browsing facilities to work in co-ordination
Vickery, B.C. (2006). Structure and Function in Retrieval Language. Journal of Documentation, Vol. 62 No. 1, pp. 7-20.
8Why Faceted Subject Indexing language?
- It uses the Faceted Classification structure, which:
- Uses logical structure to organize
- Uses a standard set of categories to analyze the concepts; these categories are not locked but are left free to combine with each other in fullest freedom
- Breaks free from the restriction of traditional classification to hierarchical, genus-species relations. By combining terms in compound subjects it introduces new logical relations between them, thus better reflecting the complexity of knowledge
9What is the Facet?
- "A generic term used to denote any component - be it a basic subject or an isolate - of a Compound subject... Facets inhere in the subjects themselves, whether we sense them or not." S.R. Ranganathan.
- A homogeneous group or category derived according to the principles of facet analysis
10What is the Facet?
- Near synonyms:
- Small components of larger entities/units
- Properties, attributes, characteristics, category, class, group, concept, and dimension
- Facets are flat faces on a diamond which reflect the underlying symmetry of the crystal structure.
11Quick recipe for building faceted Classification
- Define the subject field: What entities are of interest to the intended user of the system?
- Formulate facets: Sort the terms and arrange them in homogeneous groups known as Facets
- Structure each facet: Follow the postulates and principles given by Ranganathan
- Arrange the facets
12 Buildings
13Facet Analysis
- Fundamental concepts are analyzed and grouped together as facets (following the principles and postulates given by Ranganathan)
- Hunter, E. (2002). Classification Made Simple. Ashgate.
- Building Facets
- Location
- Composition
- Purpose
- Date/Period constructed
- Performance
- Style
- Associated persons
- Etc.
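The building facets listed above can be sketched as a small data structure. This is only a minimal illustration, not part of the original presentation: the facet names follow the slide, while the three building records, their values, and the `select` helper are hypothetical.

```python
# Facet names are taken from the slide; the records and their values are
# hypothetical illustration data.
FACETS = {"location", "composition", "purpose", "period", "style"}

BUILDINGS = {
    "Building A": {"location": "Bangalore", "composition": "brick",
                   "purpose": "library", "period": "1960s", "style": "modernist"},
    "Building B": {"location": "Bangalore", "composition": "stone",
                   "purpose": "temple", "period": "1500s", "style": "Dravidian"},
    "Building C": {"location": "Kolkata", "composition": "brick",
                   "purpose": "library", "period": "1830s", "style": "neoclassical"},
}


def select(records, **wanted):
    """Return the names of records that match every requested facet value."""
    unknown = set(wanted) - FACETS
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    return sorted(
        name
        for name, facets in records.items()
        if all(facets.get(facet) == value for facet, value in wanted.items())
    )


# Facets combine freely: any subset of them can be used in one query.
print(select(BUILDINGS, purpose="library"))
print(select(BUILDINGS, purpose="library", location="Bangalore"))
```

Because each facet is an independent dimension, a query may name one facet or several without a fixed hierarchy deciding which comes first.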
14What is the Faceted Subject Indexing?
- Subject indexing is the technique which indicates the location of resources according to their specific subject. It has a two-fold job:
- Translating the name of the subject of the document (natural language, NL) into a preferred system of artificial language
- Translating the user's queries (NL) into the system's language
- Faceted Subject Indexing is the system which uses Facet Analytic Theory in order to bring context to the indexing system.
15Postulate based Permuted Subject Indexing (POPSI)
- It is a generalized model for representing the thought content of an information resource as well as for modelling a particular subject domain.
- It consists of:
- Four elementary categories (Fundamental Categories)
- Modifiers
Bhattacharyya, G. (1979), "POPSI: its fundamentals and procedure based on a general theory of subject indexing languages", Library Science with a Slant to Documentation, Vol. 16 No. 1, March, pp. 1-34.
16Elementary Categories of POPSI
- Discipline
- It includes the conventional fields of study or any aggregate of such fields
- Entity
- It includes any manifestation which is the core of the subject, be it concrete or abstract, as contrasted with its properties or the actions performed on or by it.
17Elementary Categories of POPSI
- Action
- It includes the manifestation denoting the concept of doing. It includes the processes and steps of doing. The action may be self action or external action.
- Property
- It includes the manifestation denoting the concept of attribute.
18Modifiers in POPSI
- Modifiers are divided into two categories:
- Dependent Modifiers
- Independent Modifiers / Common Isolates
- Dependent modifiers are used in conjunction with the elementary categories so that they can sharpen the particular facet.
- For example, Romantic in Romantic Love
- Infectious in Infectious Disease
19Common Modifiers/Common Isolates
- These modifiers have the capability of modifying or sharpening any of the elementary categories. Some of them are:
- Space Modifiers
- Time Modifiers
- Language Modifiers
- Form Modifiers, and so on
20Taking Care of the Complex Subjects
- Phase Relations
- General Relation
- Bias Relation
- Comparison
- Similarity
- Difference
- Application Relation
- Influence relation
21EXAMPLE 1
- In Medical Science, Treatment of Infectious Disease of Lungs.
- Discipline: Medical Science
- Entity: Lung
- Property: Infectious Disease
- Action: Treatment
22EXAMPLE 2
- In Medical Science, A Report on the Treatment of Infectious Disease of Lungs in India during 1950-1965.
- Discipline: Medical Science
- Entity: Lung
- Property: Infectious Disease
- Action: Treatment
- Space Modifier: India
- Time Modifier: 1950-1965
- Form Modifier: Report
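The facet analysis in the two examples can be sketched as a small record type. This is a hypothetical illustration only: the class name `PopsiSubject`, the ` > ` separator, and the simple lead-term listing are assumptions made for the sketch and do not reproduce the actual POPSI notation or permutation procedure.

```python
from dataclasses import dataclass, field


@dataclass
class PopsiSubject:
    """One analysed subject: four elementary categories plus common modifiers."""
    discipline: str
    entity: str
    property_: str = ""  # trailing underscore avoids shadowing the built-in
    action: str = ""
    modifiers: dict = field(default_factory=dict)  # e.g. {"Space": "India"}

    def facets(self):
        """Return (role, term) pairs in the order used on the slide."""
        pairs = [
            ("Discipline", self.discipline),
            ("Entity", self.entity),
            ("Property", self.property_),
            ("Action", self.action),
        ]
        pairs += [(f"{kind} Modifier", term) for kind, term in self.modifiers.items()]
        return [(role, term) for role, term in pairs if term]

    def chain(self):
        """Join the terms into one context string (separator is an assumption)."""
        return " > ".join(term for _, term in self.facets())

    def lead_entries(self):
        """Map every term to the full chain, so each term is a search entry point."""
        return {term: self.chain() for _, term in self.facets()}


example2 = PopsiSubject(
    discipline="Medical Science",
    entity="Lung",
    property_="Infectious Disease",
    action="Treatment",
    # Period as given in the subject statement of Example 2.
    modifiers={"Space": "India", "Time": "1950-1965", "Form": "Report"},
)

for role, term in example2.facets():
    print(f"{role}: {term}")
print(example2.chain())
```

Keeping the role next to each term is what preserves the context: "Lung" is recorded as an Entity within Medical Science rather than as a bare keyword.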
23Expressing the POPSI in SKOS
- SKOS (Simple Knowledge Organization System):
- claims to provide a simple, machine-understandable representation framework for Knowledge Organisation Systems (KOS)
- has the flexibility and extensibility to cope with the variation found in KOS idioms
- is fully capable of supporting the publication and use of KOS within a decentralised, distributed information environment such as the world wide (semantic) web.
http://www.w3.org/2004/02/skos/
24SKOS (cont.)
- In scope
- controlled vocabularies
- thesauri
- taxonomies
- classification schemes
- subject heading systems
- Grey area
- terminologies (sensu ISO TC37 SC4)
- wordnets
- lexical databases
- synonym rings
- glossaries
- dictionaries
- ontologies
- folksonomies
25POPSI Classes and Properties (1/2)
ElementaryCategory: Discipline, Entity, Property, Action
Property Classes
- form
- time (subPropertyOf DAML/Time, TemporalEntity)
- place (subPropertyOf DAML/Place)
- phaseRelation: general, biasedBy, influenceBy, comparisonWith, similarityWith, differenceWith
- application
- tool
26POPSI Classes and Properties (2/2)
- ElementaryCategory
- Discipline
- Entity
- Property
- Action
- Form
- Environment
- place
- Time
- modifier
- type
- discipline (hasDiscipline, isDisciplineOf)
- entity (hasEntity, isEntityOf)
- property (hasProperty, isPropertyOf)
- action (hasAction, actionOn)
- phaseRelation
- general
- bias (biased, biasing)
- influence (influenced, influencing)
- comparison (comparedWith)
- difference (differencedBy, differencing)
- application
- tool
27Facetizing Concepts
- (Discipline) Medicine,
- (Entity) Human body,
- (property of Entity) disease,
- (action on property) treatment,
- (type of action) radiation therapy,
- (Entity of action) X-ray,
- (method of action) treatment using Rotation technique,
- (action of action) determination,
- (application of action) depth dose,
- (tool of action) Ionized packet chamber
28POPSI in RDF
<?xml version="1.0"?>
<rdf:RDF xml:lang="en"
    xmlns:popsi="http://drtc.isibang.ac.in/guha/popsi/popsi-skos#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://hdl.net/1849/234">
    <popsi:ElementaryCategory>
      <rdfs:OrderedCollection>
        <popsi:Discipline>Medicine</popsi:Discipline>
        <popsi:Entity>Human Body</popsi:Entity>
        <popsi:Property>Disease</popsi:Property>
        <popsi:hasAction>Treatment</popsi:hasAction>
        <popsi:type>Radiation Therapy</popsi:type>
        <popsi:hasEntity>X-ray</popsi:hasEntity>
        <popsi:application>Rotation Technique</popsi:application>
        <popsi:tool>Ionized packet chamber</popsi:tool>
      </rdfs:OrderedCollection>
    </popsi:ElementaryCategory>
  </rdf:Description>
</rdf:RDF>
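An annotation like the one above could be generated programmatically. The following is a minimal sketch using only Python's standard `xml.etree.ElementTree`; it is not the author's implementation. The `annotate` helper is hypothetical, the popsi namespace URI is reconstructed from the slide (including an assumed trailing `#`) and may not resolve, and the output is deliberately flattened: it omits the `popsi:ElementaryCategory` and `rdfs:OrderedCollection` wrappers.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: namespace URI reconstructed from the slide; it may not resolve.
POPSI = "http://drtc.isibang.ac.in/guha/popsi/popsi-skos#"

# Make the serializer emit readable prefixes instead of ns0, ns1, ...
ET.register_namespace("rdf", RDF)
ET.register_namespace("popsi", POPSI)

# (property name, term) pairs taken from the slide, in slide order.
FACETS = [
    ("Discipline", "Medicine"),
    ("Entity", "Human Body"),
    ("Property", "Disease"),
    ("hasAction", "Treatment"),
    ("type", "Radiation Therapy"),
    ("hasEntity", "X-ray"),
    ("application", "Rotation Technique"),
    ("tool", "Ionized packet chamber"),
]


def annotate(resource_uri, facets):
    """Build an rdf:RDF element describing one resource with popsi facets."""
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": resource_uri}
    )
    for name, term in facets:
        # One child element per facet, with the term as literal text.
        ET.SubElement(description, f"{{{POPSI}}}{name}").text = term
    return root


root = annotate("http://hdl.net/1849/234", FACETS)
print(ET.tostring(root, encoding="unicode"))
```

Building the tree with a library rather than by string concatenation keeps the XML well-formed even when a term contains characters such as `&` or `<`.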
29Graphical Representation
[Figure: RDF graph for the resource http://hdl.net/1849/234, with one labelled edge per facet: popsi:Discipline -> Medicine; popsi:Entity -> Human body; popsi:Property -> Disease; popsi:hasAction -> treatment; popsi:typeOf -> Radiation Therapy; popsi:hasEntity -> X-Ray; popsi:application -> Rotation Technique; popsi:tool -> Ionized Packet Chamber]
30Faceted Semantic Annotation System
- It will consist of two parts:
- The Classaurus
- It will be arranged in two parts:
- Hierarchical display of all the facets arranged in elementary categories and modifier classes
- Alphabetical listing of the keywords (word clouds)
- The Associative Index
- It will be an inverted index of the classaurus facets.
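One possible reading of the two parts is sketched below. The slides do not specify the data structures, so everything here is an assumption: the sample annotations, the function names, and the interpretation of the associative index as a mapping from each (category, term) facet to the documents that carry it.

```python
from collections import defaultdict

# Hypothetical annotated documents: document id -> list of (category, term).
ANNOTATIONS = {
    "doc1": [("Discipline", "Medical Science"), ("Entity", "Lung"),
             ("Property", "Infectious Disease"), ("Action", "Treatment")],
    "doc2": [("Discipline", "Medical Science"), ("Entity", "Heart"),
             ("Action", "Treatment")],
    "doc3": [("Discipline", "Agriculture"), ("Entity", "Plant"),
             ("Property", "Infectious Disease")],
}


def build_classaurus(annotations):
    """Return (terms grouped by category, alphabetical list of all terms)."""
    by_category = defaultdict(set)
    for facets in annotations.values():
        for category, term in facets:
            by_category[category].add(term)
    hierarchy = {cat: sorted(terms) for cat, terms in by_category.items()}
    alphabetical = sorted({t for terms in by_category.values() for t in terms})
    return hierarchy, alphabetical


def build_associative_index(annotations):
    """Invert the annotations: (category, term) -> sorted document ids."""
    index = defaultdict(set)
    for doc_id, facets in annotations.items():
        for facet in facets:
            index[facet].add(doc_id)
    return {facet: sorted(docs) for facet, docs in index.items()}


def search(index, *facets):
    """Return the documents that carry every requested (category, term) facet."""
    hits = [set(index.get(facet, ())) for facet in facets]
    return sorted(set.intersection(*hits)) if hits else []


hierarchy, alphabetical = build_classaurus(ANNOTATIONS)
index = build_associative_index(ANNOTATIONS)
print(hierarchy["Entity"])
print(search(index, ("Discipline", "Medical Science"), ("Property", "Infectious Disease")))
```

Indexing on the (category, term) pair rather than on the bare term keeps the context attached to each keyword, so "Infectious Disease" in Medical Science and in Agriculture can be told apart at query time.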
31 Faceted Semantic Annotation System
32Further Research
- Better algorithm and model for automatic text categorization
- Inclusion of the Faceted Semantic Subject Annotation model in existing annotation systems
- Formalization of the process of Facet Analysis
- Bringing the associative effect into the index
33Thank You