Title: Controlled vocabularies as Linked Data on the Web
1 Controlled vocabularies as Linked Data on the Web
- Rebecca Guenther
- Network Development and MARC Standards Office,
- Library of Congress
- rgue_at_loc.gov
Linked Data program July 13, 2009
2 Outline of presentation
- Types of controlled vocabularies
- Encoding formats for controlled vocabularies
- What is SKOS?
- id.loc.gov vocabulary services
- Example of concept scheme in SKOS: ISO 639-2
- Next steps
3 Credits
- Ed Summers, Office of Strategic Initiatives,
leading developer and creator of LCSH in SKOS
- Clay Redding, Network Development and MARC
Standards Office, for developing the web service and
work on other controlled vocabularies
4 Why establish controlled vocabularies?
- Control values that occur in metadata
- Reduce ambiguity
- Control synonyms
- Make documentation available for reuse
- Test and validate terms
- Establish formal relationships among values where
appropriate
5 Types of Controlled Vocabularies used in metadata
standards
- Lists of enumerated values
- Code lists (e.g. language, country)
- Taxonomies
- Formal Thesauri
- Locally controlled enumerated lists
6 Enumerated lists
- Simple list of terms used in a pull-down menu or
Web site pick list
- Values enumerated in an XML schema
- Little additional information or structure about
each value
- Examples
- Enumerated value "MD5" for METS CHECKSUMTYPE
- Enumerated value "born digital" in MODS
digitalOrigin
- Code and value from a MARC 21 fixed field, e.g.
code "e" in Leader/06 is cartographic material
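As a rough illustration of how an enumerated list is used in practice, the sketch below checks a metadata value against a fixed set of allowed values. Only "MD5" and "born digital" come from the slide; the other values and the names `ENUMERATED_VALUES` and `is_valid` are assumptions made for the example, not part of any schema tooling.

```python
# Illustrative subset only: "MD5" and "born digital" come from the slide;
# the remaining values are assumed for the sake of the example.
ENUMERATED_VALUES = {
    "CHECKSUMTYPE": {"MD5", "SHA-1", "SHA-256"},
    "digitalOrigin": {"born digital", "reformatted digital"},
}


def is_valid(element, value):
    """Return True if value is one of the enumerated values for element."""
    return value in ENUMERATED_VALUES.get(element, set())


print(is_valid("CHECKSUMTYPE", "MD5"))   # exact match against the list
print(is_valid("CHECKSUMTYPE", "md5"))   # enumerations are case-sensitive here
```

An enumerated list carries nothing beyond the values themselves, which is why the check is a plain set membership test.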
7 Code lists
- Some established as ISO standards and used
worldwide in many communities for many purposes
- The standard generally standardizes the code, not
a particular name for it
- Codes are used as identifiers
- Some examples
- ISO 639 family (language codes)
- MARC relator codes
- MARC country codes
- ISO 3166 country codes
8 Thesauri
- A thesaurus is a controlled vocabulary with
multiple types of relationships
- Example
- Rice
- UF Paddy
- BT Cereals
- BT Plant products
- NT Brown rice
- RT Rice straw
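The Rice entry above can be sketched as a small data structure, one field per relationship type (UF, BT, NT, RT). This is a minimal sketch; the class `Term` and the helper `entry_index` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str                                      # preferred term
    used_for: list = field(default_factory=list)    # UF: non-preferred synonyms
    broader: list = field(default_factory=list)     # BT: broader terms
    narrower: list = field(default_factory=list)    # NT: narrower terms
    related: list = field(default_factory=list)     # RT: related terms


rice = Term(
    "Rice",
    used_for=["Paddy"],
    broader=["Cereals", "Plant products"],
    narrower=["Brown rice"],
    related=["Rice straw"],
)


def entry_index(terms):
    """Map every preferred and non-preferred label to its preferred label."""
    index = {}
    for term in terms:
        index[term.label.lower()] = term.label
        for synonym in term.used_for:
            index[synonym.lower()] = term.label
    return index


index = entry_index([rice])
print(index["paddy"])  # prints "Rice"
```

The lookup shows synonym control at work: a search on the non-preferred term "Paddy" is redirected to the preferred term "Rice".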
9 Standards maintained at LC contain controlled
vocabularies
- LCSH/NAF
- Thesaurus of Graphic Materials
- ISO 639-2 (language codes)
- MARC (including code lists)
- MODS
- METS
- PREMIS
- MIX (XML schema for NISO Z39.87 Technical
metadata for digital still images)
- and some others
10 Representing information about controlled
vocabulary values
- Data elements in metadata formats, e.g. MARC
Authority format
- XML schemas (sometimes as enumeration values)
- RDF/XML and RDFS (Resource Description Framework)
- SKOS
- MADS (Metadata Authority Description Schema)
11 About SKOS
- Simple Knowledge Organization System
- RDF application used to express knowledge
organization systems (KOS) such as classifications,
thesauri, taxonomies, and the concepts within
- Allows distributed, decentralized management of
KOS through a Linked Data-inspired application
- All concepts and schemes require a URI
12 The SKOS data model (Classes)
- ConceptSchemes (e.g., published vocabularies,
thesauri, code lists, etc.)
- Concepts (individual entries or terms within the
larger vocabulary)
- Collections (logical groupings of Concepts)
13 SKOS concepts
- Labeling properties: prefLabel, altLabel,
hiddenLabel, notation
- Annotation properties: note, historyNote,
scopeNote, changeNote, editorialNote, example,
definition
- Associative properties: broader, narrower,
related, broadMatch, narrowMatch, closeMatch,
exactMatch, minorMatch, majorMatch (match
properties go to Concepts in external
ConceptSchemes)
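To make the property groups concrete, here is a minimal sketch that writes a single concept (the Rice entry from the thesaurus slide) as SKOS in Turtle. The base URI `http://example.org/thesaurus/`, the slugs, and the function name `concept_to_turtle` are hypothetical; the namespace used is the one from the published SKOS Recommendation, which differs from the 2008 draft namespace that appears in the ISO 639-2 example later in the deck.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/thesaurus/"  # hypothetical base URI for illustration


def concept_to_turtle(slug, pref_label, alt_labels=(), broader=(),
                      narrower=(), related=()):
    """Serialize one concept as a Turtle statement block."""
    lines = [f"<{BASE}{slug}> a skos:Concept"]
    # Labeling properties
    lines.append(f'    skos:prefLabel "{pref_label}"@en')
    for label in alt_labels:
        lines.append(f'    skos:altLabel "{label}"@en')
    # Relationship properties, each pointing at another concept URI
    for prop, targets in (("broader", broader),
                          ("narrower", narrower),
                          ("related", related)):
        for target in targets:
            lines.append(f"    skos:{prop} <{BASE}{target}>")
    return f"@prefix skos: <{SKOS_NS}> .\n\n" + " ;\n".join(lines) + " .\n"


turtle = concept_to_turtle(
    "rice", "Rice",
    alt_labels=["Paddy"],
    broader=["cereals", "plant-products"],
    narrower=["brown-rice"],
    related=["rice-straw"],
)
print(turtle)
```

The thesaurus UF relationship maps to altLabel, while BT, NT, and RT map to broader, narrower, and related.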
14 Advantages to using SKOS
- SKOS has a defined element set which is
particularly relevant for controlled vocabularies
- Relationships between entries in a concept scheme
can be expressed (broader, narrower, etc.)
- Relationships between entries in different
concept schemes can be expressed (exactMatch,
related)
- Having a dereferenceable URI for concepts and
their concept schemes enhances the ability to
provide web services for consumers of these
standards
15 Reasons for developing a web service for
vocabularies
- Facilitate development and maintenance process
for vocabularies
- Make controlled lists openly available
- Provide comprehensive information about
controlled values
- Experiment with semantic web technologies and
linked data
- Expose vocabularies to wider communities
16 Introducing id.loc.gov
- Library of Congress Authorities and Vocabularies
service: http://id.loc.gov
- Allows both human-oriented and programmatic
access to LC-promulgated authorities and
vocabularies
- First offering is Library of Congress Subject
Headings, but more to come, e.g. Thesaurus of
Graphic Materials, ISO 639-2, MARC code lists,
etc.
17 Introducing id.loc.gov
- Offers bulk data downloads in several RDF
serializations (likely more to come)
- Goals
- Convey a clear policy regarding access, usage,
distribution
- Provide continuous updates to keep the data sets
fresh
- Only serves data values: authority and vocabulary
data, not bibliographic
18 Some features of id.loc.gov
- Provides resolvability by assigning RESTful URIs.
Each vocabulary and data value within it
possesses a resolvable URI
- Known-label searches: use when you know the label
but not the identifier (e.g. LCCN)
http://id.loc.gov/authorities/label/orchids,
http://id.loc.gov/authorities/label/orchidaceae
- Visualizations
- Default serialization is RDFa in XHTML, which can
be transformed by RDFa tools
- Influenced by the Linked Data movement:
implements SKOS, REST, and HTTP content
negotiation
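The known-label lookup and content negotiation can be sketched with the Python standard library. This is a minimal sketch: the URL pattern is the one given on the slide and may have changed since the presentation, `application/rdf+xml` is an assumed media type that the service may or may not offer, and `label_url` and `negotiated_request` are hypothetical helper names. The request is only prepared, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

BASE = "http://id.loc.gov"


def label_url(label):
    """Build a known-label lookup URL following the pattern on the slide."""
    return f"{BASE}/authorities/label/{quote(label)}"


def negotiated_request(url, media_type="application/rdf+xml"):
    """Prepare (but do not send) a request asking for a given serialization.

    The media type is an assumption; the server decides what it can return.
    """
    return Request(url, headers={"Accept": media_type})


req = negotiated_request(label_url("orchids"))
print(req.full_url)
print(req.get_header("Accept"))
# Sending it would be urllib.request.urlopen(req), which needs network access.
```

With content negotiation the same URI serves both audiences: a browser gets the human-readable page, and a client that sends a different Accept header can be given a machine-readable serialization.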
19 Example: ISO 639-2 vocabulary
- One in the family of ISO 639 language coding
standards
- Has a close relationship with other language
coding standards (ISO 639-1 and -3, MARC)
- LC is maintenance agency
- The standard is the CODE, not the language name;
multiple names are given
20 ISO 639-2 language code in SKOS
<rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
  <rdf:type rdf:resource="http://www.w3.org/2008/05/skos#Concept"/>
  <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
  <skos:altLabel xml:lang="en-Latn">Portuguese</skos:altLabel>
  <skos:altLabel xml:lang="fr-Latn">portugais</skos:altLabel>
  <skos:notation rdf:datatype="xs:string">por</skos:notation>
  <skos:definition xml:lang="en-Latn">This Concept has not yet been defined.</skos:definition>
  <skos:inScheme rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-2"/>
  <vs:term_status>stable</vs:term_status>
  <skos:historyNote rdf:datatype="xs:dateTime">2006-07-19T08:41:54.000-05:00</skos:historyNote>
  <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt"/>
  <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/languages/por"/>
  <skos:changeNote rdf:datatype="xs:dateTime">2008-07-09T13:49:05.321-04:00</skos:changeNote>
</rdf:Description>
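A consumer could read a record like the one above with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a trimmed copy of the record wrapped in an `rdf:RDF` root. The SKOS namespace is inferred from the `rdf:type` value in the record and is an assumption, as are the names `RECORD` and `summarize`; a real RDF toolkit would be the more robust choice for arbitrary RDF/XML.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2008/05/skos#"  # assumed from the record's rdf:type
XML = "http://www.w3.org/XML/1998/namespace"

# Trimmed copy of the slide's record, wrapped in a root element so it parses.
RECORD = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
    <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
    <skos:altLabel xml:lang="en-Latn">Portuguese</skos:altLabel>
    <skos:altLabel xml:lang="fr-Latn">portugais</skos:altLabel>
    <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt"/>
    <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/languages/por"/>
  </rdf:Description>
</rdf:RDF>"""


def summarize(rdf_xml):
    """Extract the URI, code, language names, and exact matches of a concept."""
    root = ET.fromstring(rdf_xml)
    desc = root.find(f"{{{RDF}}}Description")
    names = {el.get(f"{{{XML}}}lang"): el.text
             for el in desc.findall(f"{{{SKOS}}}altLabel")}
    matches = [el.get(f"{{{RDF}}}resource")
               for el in desc.findall(f"{{{SKOS}}}exactMatch")]
    return {
        "uri": desc.get(f"{{{RDF}}}about"),
        "code": desc.findtext(f"{{{SKOS}}}prefLabel"),
        "names": names,
        "exact_matches": matches,
    }


summary = summarize(RECORD)
print(summary["code"], summary["names"])
```

The result reflects the modeling choice in the record: the code "por" is the preferred label, the language names are alternative labels keyed by language tag, and the exactMatch links point to the equivalent entries in the other code lists.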
22 Next steps
- Advocacy, user feedback etc. for LCSH
- Implement an update mechanism for processing
changes from LC CDS
- Expand system to allow more vocabularies and
Linked Data relationships
- Name authorities
23 Next steps
- MADS OWL Schema to enable identification of
facets within name and subject authorities,
e.g. Aeronautics--Soviet Union--History
- Develop mechanisms to output all public
documentation for controlled vocabularies;
already working for ISO 639-5 (master data is
SKOS)
- http://www.loc.gov/standards/iso639-5/