Title: Proposals for a new flexible and extensible XML-model for exchange of research information
1. Proposals for a new flexible and extensible
XML-model for exchange of research information
- By
- Jens Vindvad, Jens.Vindvad_at_rbt.no
- National Office for Research Documentation,
Academic and Special Libraries, Norway
- Erlend Øverby, erlend.overby_at_conduct.no
- Conduct AS, Oslo, Norway
2. Content of presentation
- Assumptions, Objectives, Difference in structure
and mapping between structures.
- Validation and MicroSchema
- Working model (vocabulary, internal structure)
- Namespace
- Testing of model
3. 3 assumptions and 1 observation
- Internet is the driving force and preferred
medium for international information exchange.
- Today and in the near future XML is the basic
Internet standard for information exchange.
- Research Information Systems are based on
relational database technology for storage.
- Validation of exchanged data against structure
and allowed values is often sacrificed.
4. Objectives
- We want to exchange Research Information between
different CRIS-systems, and other systems as
well, by use of Internet and XML-technology.
- We want to have the possibility to validate the
exchanged data according to defined structure and
allowed values.
- Agreeing upon a common XML exchange model, which
can be used for information exchange, can help to
achieve this objective.
5. How should the exchange model work?
- To send information, the existing data needs to
be transformed/mapped into the structure of the
exchange model. To receive information, data in
the exchange model has to be transformed/mapped
into the receiver's data model. This eases
information exchange when sender and receiver do
not share the same data model, and also eases
data exchange between different communities.
6. Different structures for CRIS and the exchange
model
- The CERIF standard is based on relational
database technology, which is table oriented.
- The exchange model is based on the XML standard
by W3C, which has a hierarchical tree structure.
7. Exchange between table structure and hierarchical
tree structure illustrated
[Figure: a table structure (relational databases)
alongside a hierarchical tree structure (XML document)]
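The step from a table structure to a hierarchical tree can be sketched in a few lines of Python. This is only an illustration: the two tables, their column names and the element names `Publications`, `Publication`, `Title` and `Author` are hypothetical and are not taken from CERIF or from the proposed exchange model.

```python
import xml.etree.ElementTree as ET

# Hypothetical relational rows: a publication table and an author table
# linked by a foreign key. All names here are illustrative assumptions.
publications = [{"id": 1, "title": "Example article"}]
authors = [
    {"publication_id": 1, "family_name": "Heimdal"},
    {"publication_id": 1, "family_name": "Aarstad"},
]


def rows_to_tree(publications, authors):
    """Nest each author row under its publication to form one XML tree."""
    root = ET.Element("Publications")
    for pub in publications:
        pub_el = ET.SubElement(root, "Publication")
        ET.SubElement(pub_el, "Title").text = pub["title"]
        for author in authors:
            # The foreign key in the table becomes parent/child nesting.
            if author["publication_id"] == pub["id"]:
                ET.SubElement(pub_el, "Author").text = author["family_name"]
    return root


tree = rows_to_tree(publications, authors)
print(ET.tostring(tree, encoding="unicode"))
```

The point of the sketch is that a relation expressed by keys in the tables is expressed by nesting in the XML document, which is why the mapping requires design choices.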
8. Information mapping between the exchange model
and CRIS (I)
- When mapping information between different
structures we do not have a one to one solution.
We have to make some choices when defining the
exchange model, and no final answer or perfect
solution exists.
- When building the exchange model we have used the
following guidelines:
- Data should be placed in elements, not attributes.
9. Information mapping between the exchange model
and CRIS (II)
- Characteristics and properties of data should be
placed in attributes.
- Be as explicit as possible, not implicit. The
implicit path can lead to situations where users
of the model have to make assumptions about how
to handle the information in the model.
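The two placement guidelines (data in elements, characteristics of the data in attributes) can be illustrated with a title and its language. The element names `MainTitle` and `Title` and the attribute `Language` follow the test output shown later in this presentation; the helper `make_title` is a hypothetical name used only for this sketch, and namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET


def make_title(title_text, language):
    """Build a title: the text is data (element), the language is a property (attribute)."""
    main_title = ET.Element("MainTitle", {"Language": language})
    ET.SubElement(main_title, "Title").text = title_text
    return main_title


title = make_title("An example title", "eng")
print(ET.tostring(title, encoding="unicode"))
```

Stating the language explicitly on the element also follows the second guideline: the receiver does not have to guess it.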
10. The validation problem
- Normally two alternatives exist for describing
and defining the information structure or model
of an XML document: the first is a DTD (ISO 8879)
and the second is an XML Schema. Both approaches
currently share a disadvantage: to validate and
check the structure of the information, a
description of the whole structure, with all its
possibilities and constraints, must exist in one
large and inflexible model. This makes it harder
to establish efficient validation of data
exchanged between different systems.
11. MicroSchema
- The idea of a MicroSchema is that it should only
describe a very small piece of information, and
only such information as is relevant to the
specific description. Information that is not
relevant to the specific context is described in
another MicroSchema.
- To be able to express the relevance and the
connection between the MicroSchemas, we need to
develop a standard method of enhancing the schema
specification in order to address the valid
elements in the specific context. Using
namespaces, introducing the term
"Allow-schema-namespaces", will do this.
12. Working model
- A working model limited to documentation produced
by researchers has been built.
- Successful communication requires common wordings
and definitions; to achieve this, a vocabulary
has been defined.
- With vocabulary and guidelines for building an
exchange model in place, an internal structure has
been established.
- Based on vocabulary and internal structure, the
XML exchange model is proposed in terms of
MicroSchemas and Namespaces.
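The MicroSchema idea, where each schema covers one small piece and names the other schema namespaces it admits, might be sketched as follows. This is an interpretation, not the authors' implementation: the registry `ALLOWED`, the `urn:example:` namespace URIs and the function names are all hypothetical, and the presentation does not define how "Allow-schema-namespaces" is actually written down.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: namespace of a MicroSchema -> namespaces it allows
# inside its own elements. The URIs are placeholders for illustration.
ALLOWED = {
    "urn:example:ArticleInJournal": {"urn:example:TitleInfo", "urn:example:Person"},
    "urn:example:TitleInfo": set(),
    "urn:example:Person": set(),
}


def namespace_of(element):
    """Return the namespace URI from an ElementTree '{uri}local' tag."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def namespace_errors(element, errors=None):
    """Check every child against the namespaces its parent's MicroSchema allows."""
    errors = [] if errors is None else errors
    parent_ns = namespace_of(element)
    for child in element:
        child_ns = namespace_of(child)
        if child_ns != parent_ns and child_ns not in ALLOWED.get(parent_ns, set()):
            errors.append(f"{child.tag} not allowed inside {element.tag}")
        namespace_errors(child, errors)
    return errors


good = ET.fromstring(
    '<a:ArticleInJournal xmlns:a="urn:example:ArticleInJournal" '
    'xmlns:t="urn:example:TitleInfo">'
    "<a:TitleInfo><t:MainTitle/></a:TitleInfo>"
    "</a:ArticleInJournal>"
)
bad = ET.fromstring(
    '<t:MainTitle xmlns:t="urn:example:TitleInfo" xmlns:p="urn:example:Person">'
    "<p:Person/></t:MainTitle>"
)
print(namespace_errors(good))
print(namespace_errors(bad))
```

Each check only needs the small rule set of the schema at hand, which is the flexibility the MicroSchema approach is after; a real validator would also check content and occurrence.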
13. Vocabulary - output and results
- The collection of all types of information
produced by the researchers is called output.
Outputs are divided into four subgroups: results,
communication, documentation and art.
- With the exception of art, results are taken to
mean the results of research produced by the
researcher in person. Examples of results are
publications, patents and products.
14. Vocabulary - communication
- By communication we mean the forms of
communication that researchers use in their work.
Researchers often need or wish to discuss their
ideas and views. This form of communication is
not a result of their work, but represents
interesting and important steps in the process of
producing results. Examples of communication are
conference presentations, workshops, broadcasting
and interviews in the press.
15. Vocabulary - documentation
- A researcher has to carry out administrative
tasks and produce documentation that cannot be
classified as results or forms of communication.
This can be pure administration or high-level
professional work. Examples are reports to
funding institutions, computer programs and
manuscripts (thesis).
16. Vocabulary - art
- Art is not necessarily an output of a
researcher's work, but it can be. Art can be seen
as a result in itself, a form of communication or
a type of documentation, or all of these. Art
needs and deserves a classification based on
standards used and accepted in the art community.
Examples of art are works of art, exhibitions and
performances.
17. Vocabulary - publication and five-point test
- Publication is a commonly used word, which in
daily use does not have a precise and distinct
definition. To establish a vocabulary and a
namespace, we need the word publication and have
to give it a precise and distinct definition. To
do this we have established a five-point test,
which involves addressee, copies, location,
readability and time. The test must be taken in
the following order: first against publication,
then against communication and finally against
documentation.
18. Internal structure (I)
- Based on vocabulary and guidelines for building
an exchange model, an internal structure has been
established.
- The basic elements of the model are HEAD,
content and EXTENSIONS. The core of the model is
content; HEAD and EXTENSIONS can be left out.
19. Internal structure - HEAD
- Each schema can contain at most one HEAD; HEAD
can be left out. All administrative data should
be placed in HEAD, such as when the information
object was created, by whom, and who revised or
edited the information. In case of transactions
between systems, all transactional administration
data should be placed in HEAD. HEAD may also
contain EXTENSIONS.
20. Internal structure - content
- Content makes up the core elements on which the
model is built. All the content elements could
have been put into one element, e.g. BODY. This
is not necessary, since we know that all elements
that are not HEAD or EXTENSIONS are part of the
core model.
21. Internal structure - EXTENSIONS
- The elements in the basic model should be
understood and managed by all who want to
exchange information. A basic model with a high
degree of certainty will not satisfy all needs.
To meet these needs, the model is made
extensible. With this construction, everybody can
easily see what is part of the core model and
what belongs to a specific extension. Anyone who
makes use of an extension has to supply a working
namespace for the extension.
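The rule that everything which is not HEAD or EXTENSIONS belongs to the core content can be sketched as a small Python helper. The function names `local_name` and `split_parts` are hypothetical, and the sample record uses unqualified element names for brevity.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an ElementTree '{uri}' namespace prefix from a tag, if present."""
    return tag.rsplit("}", 1)[-1]


def split_parts(element):
    """Separate a record into HEAD, core content and EXTENSIONS."""
    head, extensions, content = None, None, []
    for child in element:
        name = local_name(child.tag)
        if name == "HEAD":
            head = child
        elif name == "EXTENSIONS":
            extensions = child
        else:
            # No BODY wrapper is needed: whatever remains is core content.
            content.append(child)
    return head, content, extensions


record = ET.fromstring(
    "<ArticleInJournal><HEAD/><TitleInfo/><Author/><EXTENSIONS/></ArticleInJournal>"
)
head, content, extensions = split_parts(record)
print([child.tag for child in content])
```

A receiver that does not understand a given extension can still process `head` and `content` and ignore `extensions`.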
22. Example MicroSchema ArticleInJournal
General identifier   Occurrence     Content model
HEAD                 Zero or one    mSchema HEAD
TitleInfo            One            mSchema TitleInfo
Author               One or more    mSchema Person, mSchema OrgUnit
RefInToJournal       One            mSchema RefInToJournal
URI                  Zero or one    Uri
Abstract             Zero or more   Text
EXTENSIONS           Zero or one    mSchema EXTENSIONS
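The occurrence column of the ArticleInJournal table can be checked mechanically. The sketch below covers occurrence only, not the content models, and the names `ARTICLE_IN_JOURNAL` and `occurrence_errors` are hypothetical; it is not a MicroSchema validator.

```python
import xml.etree.ElementTree as ET

# (minimum, maximum) occurrences per general identifier, taken from the
# table above; None means no upper bound.
ARTICLE_IN_JOURNAL = {
    "HEAD": (0, 1),
    "TitleInfo": (1, 1),
    "Author": (1, None),
    "RefInToJournal": (1, 1),
    "URI": (0, 1),
    "Abstract": (0, None),
    "EXTENSIONS": (0, 1),
}


def occurrence_errors(element, rules):
    """Return a list of occurrence violations for the direct children."""
    counts = {}
    for child in element:
        name = child.tag.rsplit("}", 1)[-1]  # ignore any namespace part
        counts[name] = counts.get(name, 0) + 1
    errors = []
    for name, (low, high) in rules.items():
        found = counts.get(name, 0)
        if found < low or (high is not None and found > high):
            errors.append(f"{name}: found {found}")
    for name in counts:
        if name not in rules:
            errors.append(f"{name}: not part of the content model")
    return errors


valid = ET.fromstring(
    "<ArticleInJournal><TitleInfo/><Author/><Author/><RefInToJournal/></ArticleInJournal>"
)
invalid = ET.fromstring(
    "<ArticleInJournal><HEAD/><HEAD/><TitleInfo/><RefInToJournal/></ArticleInJournal>"
)
print(occurrence_errors(valid, ARTICLE_IN_JOURNAL))
print(occurrence_errors(invalid, ARTICLE_IN_JOURNAL))
```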
23. Namespace - examples
Element name       NS-abbr.   NS-Uri
Output             out        root/Outputs.msc
Results            res        root/output/Results.msc
Publications       pub        root/output/results/Publications.msc
Journal            jour       root/output/results/publications/Journal.msc
ArticleInJournal   aij        root/output/results/publications/ArticleInJournal.msc
Person             pers       root/level1/Person.msc
OrgUnit            org        root/level1/OrgUnit.msc
HEAD               HEAD       root/misc/HEAD.msc
TitleInfo          ti         root/output/TitleInfo.msc
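The namespace table can be turned into a prefix map and used to build qualified element names. In this sketch `root` is assumed to be `http://www.rbt.no/xmlns/cerif`, which is what the namespace URIs in the test output of this presentation suggest; the helper `qname` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumption: 'root' in the table stands for this base URI.
ROOT = "http://www.rbt.no/xmlns/cerif"

NAMESPACES = {
    "out": f"{ROOT}/Outputs.msc",
    "res": f"{ROOT}/output/Results.msc",
    "pub": f"{ROOT}/output/results/Publications.msc",
    "jour": f"{ROOT}/output/results/publications/Journal.msc",
    "aij": f"{ROOT}/output/results/publications/ArticleInJournal.msc",
    "pers": f"{ROOT}/level1/Person.msc",
    "org": f"{ROOT}/level1/OrgUnit.msc",
    "HEAD": f"{ROOT}/misc/HEAD.msc",
    "ti": f"{ROOT}/output/TitleInfo.msc",
}

# Make ElementTree serialize with the abbreviations from the table.
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)


def qname(prefix, local):
    """Return the '{uri}local' form ElementTree uses for qualified names."""
    return f"{{{NAMESPACES[prefix]}}}{local}"


article = ET.Element(qname("pub", "ArticleInJournal"))
title_info = ET.SubElement(article, qname("aij", "TitleInfo"))
ET.SubElement(title_info, qname("ti", "MainTitle"), {"Language": "eng"})
xml_text = ET.tostring(article, encoding="unicode")
print(xml_text)
```

ElementTree declares all used namespaces on the outermost element, whereas the test output in this presentation declares each one where it is first needed; both forms are equivalent to a namespace-aware parser.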
24. Test of exchange model
- We have so far tested the model against CRIS data
from BIBSYS FORSKDOK and data from the library
system BIBSYS.
- To perform the tests we have developed two XSLT
programs (XSL stylesheets), which map/transform
the input data into the proposed exchange model.
25. Test example - Input (I)
<publikasjon>
  <f001>
    <f001b>r00015557</f001b>
    <f001d>A12</f001d>
    <f001i>f</f001i>
    <f001j>FO02RBRU</f001j>
    <f001n>2000-04-03</f001n>
    <f001o>2000-04-03</f001o>
  </f001>
  <f008>
    <f008c>eng</f008c>
  </f008>
  <f020>
  </f020>
  <f022>
  </f022>
  <f100>
    <f100a>Heimdal, J-H.</f100a>
    <f100b>02013300</f100b>
    <f100a>Aarstad, H.J.</f100a>
    <f100b>02013300</f100b>
    <f100a>Olofsson, J.</f100a>
    <f100b>02013300</f100b>
  </f100>
  <f245>
    <f245a>Peripheral Blood T-Lymphocyte and Monocyte Function and Survival in Patients with Head and Neck Carcinoma.</f245a>
  </f245>
  <f260>
  </f260>
26. Test example - Input (II)
  <f300>
    <f300a>402 - 407</f300a>
  </f300>
  <f507>
  </f507>
  <f509>
    <f509a>Laryngocope</f509a>
    <f509c>2000</f509c>
    <f509f>110</f509f>
    <f509h>3</f509h>
    <f509x>0023-852X</f509x>
  </f509>
</publikasjon>
27. Test example - Output (I)
<out:Results xmlns:res="http://www.rbt.no/xmlns/cerif/output/Results.msc">
  <res:Publications xmlns:pub="http://www.rbt.no/xmlns/cerif/output/results/Publications.msc">
    <pub:ArticleInJournal xmlns:aij="http://www.rbt.no/xmlns/cerif/output/results/publications/ArticleInJournal.msc">
      <aij:HEAD xmlns:HEAD="http://www.rbt.no/xmlns/cerif/misc/HEAD.msc">
        <HEAD:SourceName>BIBSYS FORSKDOK</HEAD:SourceName>
        <HEAD:IdNumber>r00015557</HEAD:IdNumber>
        <HEAD:ClassificationCode>A12</HEAD:ClassificationCode>
        <HEAD:Description>Artikkel i internasjonalt vit. tidsskrift uten referee</HEAD:Description>
        <HEAD:Created>2000-04-03</HEAD:Created>
        <HEAD:Updated>2000-04-03</HEAD:Updated>
      </aij:HEAD>
28. Test example - Output (II)
      <aij:TitleInfo xmlns:ti="http://www.rbt.no/xmlns/cerif/output/TitleInfo.msc">
        <ti:MainTitle Language="eng">
          <ti:Title>Peripheral Blood T-Lymphocyte and Monocyte Function and Survival in Patients with Head and Neck Carcinoma.</ti:Title>
        </ti:MainTitle>
      </aij:TitleInfo>
29. Test example - Output (III)
      <aij:Author xmlns:pers="http://www.rbt.no/xmlns/cerif/level1/Person.msc">
        <pers:Person>
          <pers:FamilyNames>Heimdal</pers:FamilyNames>
          <pers:FirstNames>J-H.</pers:FirstNames>
        </pers:Person>
        <pers:Person>
          <pers:FamilyNames>Aarstad</pers:FamilyNames>
          <pers:FirstNames>H.J.</pers:FirstNames>
        </pers:Person>
        <pers:Person>
          <pers:FamilyNames>Olofsson</pers:FamilyNames>
          <pers:FirstNames>J.</pers:FirstNames>
        </pers:Person>
      </aij:Author>
30. Test example - Output (IV)
      <aij:RefInToJournal xmlns:RIn2J="http://www.rbt.no/xmlns/cerif/output/results/publications/journal/RefInToJournal.msc">
        <RIn2J:Journal xmlns:jour="http://www.rbt.no/xmlns/cerif/output/results/publications/Journal.msc">
          <jour:TitleInfo xmlns:ti="http://www.rbt.no/xmlns/cerif/output/TitleInfo.msc">
            <ti:MainTitle Language="eng">
              <ti:Title>Laryngocope</ti:Title>
            </ti:MainTitle>
          </jour:TitleInfo>
          <jour:ISSN>0023-852X</jour:ISSN>
        </RIn2J:Journal>
        <RIn2J:PublishingYear>2000</RIn2J:PublishingYear>
        <RIn2J:Volum>110</RIn2J:Volum>
        <RIn2J:Issue>3</RIn2J:Issue>
        <RIn2J:Pages>402 - 407</RIn2J:Pages>
      </aij:RefInToJournal>
    </pub:ArticleInJournal>
  </res:Publications>
</out:Results>
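One step of the mapping from the input record to the exchange model, the author fields, can be sketched in Python. The tests described in this presentation used XSLT stylesheets; the Python below is only a stand-in to show the logic, and the function name `map_authors` and the rule of splitting the name at the first comma are assumptions read off the example data.

```python
import xml.etree.ElementTree as ET

AIJ = "http://www.rbt.no/xmlns/cerif/output/results/publications/ArticleInJournal.msc"
PERS = "http://www.rbt.no/xmlns/cerif/level1/Person.msc"
ET.register_namespace("aij", AIJ)
ET.register_namespace("pers", PERS)

# Excerpt of the input record from the test example.
SOURCE = """<publikasjon>
  <f100>
    <f100a>Heimdal, J-H.</f100a>
    <f100b>02013300</f100b>
    <f100a>Aarstad, H.J.</f100a>
    <f100b>02013300</f100b>
  </f100>
</publikasjon>"""


def map_authors(record):
    """Map every f100a field ('Family, First') to a pers:Person element."""
    author = ET.Element(f"{{{AIJ}}}Author")
    for field in record.iterfind("f100/f100a"):
        # Assumption: the family name is everything before the first comma.
        family, _, first = field.text.partition(",")
        person = ET.SubElement(author, f"{{{PERS}}}Person")
        ET.SubElement(person, f"{{{PERS}}}FamilyNames").text = family.strip()
        ET.SubElement(person, f"{{{PERS}}}FirstNames").text = first.strip()
    return author


author = map_authors(ET.fromstring(SOURCE))
print(ET.tostring(author, encoding="unicode"))
```

The sketch ignores the `f100b` fields, as the output example does not show where they are mapped.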