4 Approaches to Structuring Lists

Slides: 42
Provided by: rog101
1
4 Approaches to Structuring Lists
February 22, 2009
2
Lists are everywhere
  • A list of countries
  • A list of religions
  • A list of weights
  • A list of students
  • A list of days of the week
  • A list of planets

3
The purpose of this document is to answer these
questions
  • What are the different approaches to structure
    lists?
  • What are the pros and cons of each approach?
  • Is there a way to structure lists to maximize
    their utility and minimize their overhead?

4
Lists should be usable for multiple purposes
5
Example
  • We will use a country list to illustrate the four
    approaches.

6
Some ways we might use a country list
  • Use it as values in an XForms pick list
  • Merge it with other data to create a document
    that contains, for each country, sales figures
    (or death rates, births, political leadership,
    religions, etc)
  • Use it to validate an element's content

[Diagram: the country list is used to validate the content of this element]
<country-visited>_______</country-visited>
7
Approach 1: Express lists using the XML Schema
vocabulary
8
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.countries.org"
           xmlns="http://www.countries.org"
           elementFormDefault="qualified">
    <xs:element name="countries" type="countriesType" />
    <xs:simpleType name="countriesType">
        <xs:restriction base="xs:string">
            <xs:enumeration value="Afghanistan"/>
            <xs:enumeration value="Albania"/>
            <xs:enumeration value="Algeria"/>
            ...
        </xs:restriction>
    </xs:simpleType>
</xs:schema>
9
Approach 2: Express lists using the RELAX NG
vocabulary
10
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.countries.org">
    <define name="countriesElement">
        <element name="countries">
            <ref name="countriesType" />
        </element>
    </define>
    <define name="countriesType">
        <choice>
            <value>Afghanistan</value>
            <value>Albania</value>
            <value>Algeria</value>
            ...
        </choice>
    </define>
</grammar>
11
Approach 3: Express lists using domain-specific
vocabularies. The markup comes from terminology
used by Subject Matter Experts (SMEs).
12
<?xml version="1.0" encoding="UTF-8"?>
<countries xmlns="http://www.countries.org">
    <country>Afghanistan</country>
    <country>Albania</country>
    <country>Algeria</country>
    ...
</countries>
13
Approach 4: Express lists using a generic list
vocabulary
14
<?xml version="1.0" encoding="UTF-8"?>
<List>
    <Identifier>http://www.countries.org</Identifier>
    <li>Afghanistan</li>
    <li>Albania</li>
    <li>Algeria</li>
    ...
</List>
15
Analysis of Each Approach
16
Approach 1 and Approach 2
  • Approach 1 and approach 2 make it easy to use a
    list for validation purposes. A schema simply
    imports the list schema, and the list's values
    are then immediately available for validating
    element content.
  • Here is an XML Schema that imports the country
    list XML Schema and uses its simpleType as the
    datatype for the <country-visited> element:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.org"
           xmlns:c="http://www.countries.org"
           elementFormDefault="qualified">
    <xs:import namespace="http://www.countries.org"
               schemaLocation="countries.xsd" />
    <xs:element name="country-visited" type="c:countriesType" />
</xs:schema>
17
Approach 1 and Approach 2
  • Here is a RELAX NG schema that includes the
    country list RELAX NG schema and uses its define
    element as the datatype for the <country-visited>
    element:

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.example.org">
    <include href="countries.rng"/>
    <start>
        <element name="country-visited">
            <ref name="countriesType" />
        </element>
    </start>
</grammar>
18
Approach 1 and Approach 2
  • If the schema doing the importing is an XML
    Schema, it can't use a list that is expressed
    using RELAX NG, and vice versa.

[Diagram: Schema (xsd) and Schema (rng), each shown against country list (xsd) and country list (rng)]
19
Approach 1 and Approach 2
  • Although these two approaches make it efficient
    to use lists for validation, they are not the
    most efficient format for the myriad other ways
    that a list may be used (rendering in a pick
    list, merging with other lists, searching, and
    so forth). This is discussed further in the
    analysis of approach 3 below.

20
Approach 3
  • Recall that approach 3 uses domain-specific
    terminology. This can be helpful to Subject
    Matter Experts (SMEs) as they maintain the lists.
  • Validation can be accomplished using a Schematron
    schema. Here is a Schematron schema which
    validates that the content of the
    <country-visited> element matches one of the
    values in the country list:

<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
    <sch:ns uri="http://www.countries.org" prefix="c" />
    <sch:ns uri="http://www.example.org" prefix="ex" />
    <sch:pattern name="Country List Check">
        <sch:rule context="ex:country-visited">
            <sch:assert test=". = document('countries.xml')//c:country">
                The value of country-visited must be one of the
                countries in the countries' list.
            </sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>
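The Schematron assertion amounts to a membership test: the element's value must equal the text of some country element in the list. A minimal Python sketch of that same check follows. It is an illustration only, not part of the original slides: the function names load_countries and invalid_visits are hypothetical, and the country list is assumed to be supplied as a string instead of being read from countries.xml.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the contents of countries.xml (approach 3 format).
COUNTRIES_XML = """<countries xmlns="http://www.countries.org">
    <country>Afghanistan</country>
    <country>Albania</country>
    <country>Algeria</country>
</countries>"""

COUNTRY = "{http://www.countries.org}country"
VISITED = "{http://www.example.org}country-visited"


def load_countries(xml_text):
    """Collect the text of every country element into a set."""
    root = ET.fromstring(xml_text)
    return {(el.text or "").strip() for el in root.iter(COUNTRY)}


def invalid_visits(instance_text, countries):
    """Return the country-visited values that are not in the list."""
    root = ET.fromstring(instance_text)
    values = [(el.text or "").strip() for el in root.iter(VISITED)]
    return [value for value in values if value not in countries]
```

Calling invalid_visits on an instance document returns an empty list when every country-visited value appears in the country list, which corresponds to the Schematron assertion passing for every matched element.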
21
Approach 3
  • With approach 3 the markup used to construct the
    list has semantics specific to the list:
    {http://www.countries.org}countries
    {http://www.countries.org}country
  • This makes possible the creation of programs that
    are readily understood, as they use terminology
    consistent with the domain. For example, the XSLT
    program on the following slide uses the country
    list to generate an HTML list of all countries.

22
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:c="http://www.countries.org"
                version="2.0">
    <xsl:output method="html"/>
    <xsl:template match="c:countries">
        <html>
            <head>
                <title>Countries of the World</title>
            </head>
            <body>
                <ol>
                    <xsl:apply-templates />
                </ol>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="c:country">
        <li>
            <xsl:value-of select="." />
        </li>
    </xsl:template>
</xsl:stylesheet>
Note the template match values. They match on
{http://www.countries.org}countries and
{http://www.countries.org}country.
23
Contrast with Approach 1 and Approach 2
  • Conversely, with approach 1 and approach 2 the
    markup used to construct the list has semantics
    that are specific to the schema language:
    {http://www.w3.org/2001/XMLSchema}element
    {http://www.w3.org/2001/XMLSchema}simpleType
    {http://www.w3.org/2001/XMLSchema}enumeration
    {http://relaxng.org/ns/structure/1.0}define
    {http://relaxng.org/ns/structure/1.0}choice
    {http://relaxng.org/ns/structure/1.0}value
  • Consequently programs must operate using schema
    terminology rather than domain terminology. For
    example, the XSLT program on the following slide
    generates an HTML list of all countries from the
    countries list specified by the XML Schema
    document.

24
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="2.0">
    <xsl:output method="html"/>
    <xsl:template match="xs:simpleType">
        <html>
            <head>
                <title>Countries of the World</title>
            </head>
            <body>
                <ol>
                    <xsl:apply-templates />
                </ol>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="xs:enumeration">
        <li>
            <xsl:value-of select="@value" />
        </li>
    </xsl:template>
</xsl:stylesheet>
Note the template match values. Rather than the
XSLT program operating on <countries> and
<country> elements, it operates on <schema>,
<simpleType>, <restriction>, and <enumeration>
elements. This makes programming challenging and
error-prone.
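The same contrast shows up in any programming language, not only XSLT. The Python sketch below reads the country values from an approach 3 list and from an approach 1 list. It is a hedged illustration: the function names from_domain_list and from_schema_list and the shortened two-country sample data are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened sample lists (assumed data, two countries only).
DOMAIN_LIST = """<countries xmlns="http://www.countries.org">
    <country>Afghanistan</country>
    <country>Albania</country>
</countries>"""

SCHEMA_LIST = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.countries.org">
    <xs:simpleType name="countriesType">
        <xs:restriction base="xs:string">
            <xs:enumeration value="Afghanistan"/>
            <xs:enumeration value="Albania"/>
        </xs:restriction>
    </xs:simpleType>
</xs:schema>"""

C = "{http://www.countries.org}"
XS = "{http://www.w3.org/2001/XMLSchema}"


def from_domain_list(xml_text):
    """Approach 3: the code speaks in domain terms (country)."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(C + "country")]


def from_schema_list(xml_text, type_name):
    """Approach 1: the code speaks in schema terms (simpleType,
    enumeration, @value) and must pick the list by its name attribute."""
    root = ET.fromstring(xml_text)
    for simple_type in root.findall(XS + "simpleType"):
        if simple_type.get("name") == type_name:
            return [e.get("value") for e in simple_type.iter(XS + "enumeration")]
    return []
```

Both functions return the same values, but only the first one mentions countries in its element names; the second has to be told which simpleType holds the country list.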
25
Approach 3
  • With approach 3 a list can be used as a building
    block (data component) which can be immediately
    dropped into other documents to create compound
    documents. For example, consider a list of
    religions, also structured using approach 3:

<?xml version="1.0" encoding="UTF-8"?>
<religions xmlns="http://www.religions.org">
    <religion>Baha'i</religion>
    <religion>Buddhism</religion>
    <religion>Catholicism</religion>
    ...
</religions>
26
Approach 3
  • It's easy to construct a compound document
    composed of the country and religion lists:

<?xml version="1.0" encoding="UTF-8"?>
<religions-per-country>
    <countries xmlns="http://www.countries.org">
        <country>Afghanistan</country>
        <country>Albania</country>
        <country>Algeria</country>
        ...
    </countries>
    <religions xmlns="http://www.religions.org">
        <religion>Baha'i</religion>
        <religion>Buddhism</religion>
        <religion>Catholicism</religion>
        ...
    </religions>
    <!-- markup that maps religions to countries -->
</religions-per-country>
27
Approach 3
  • Due to the modularity provided by approach 3, it
    is possible to perform list-specific processing
    on this compound document. That is, a
    country-list-aware application would be able to
    extract the country list from this compound
    document and process it. Ditto for a
    religion-list-aware application.

<?xml version="1.0" encoding="UTF-8"?>
<religions-per-country>
    <countries xmlns="http://www.countries.org">
        <country>Afghanistan</country>
        <country>Albania</country>
        <country>Algeria</country>
        ...
    </countries>
    <religions xmlns="http://www.religions.org">
        <religion>Baha'i</religion>
        <religion>Buddhism</religion>
        <religion>Catholicism</religion>
        ...
    </religions>
    <!-- markup that maps religions to countries -->
</religions-per-country>
[Diagram labels: country-list-aware application; religion-list-aware application]
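A list-aware application of this kind can be sketched in a few lines of Python: it selects elements by namespace and ignores everything else in the compound document. This is only an illustration under stated assumptions. The function name values_in_namespace is hypothetical, and the sample document is a shortened copy of the one on the slide.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the compound document (assumed sample data).
COMPOUND = """<religions-per-country>
    <countries xmlns="http://www.countries.org">
        <country>Afghanistan</country>
        <country>Albania</country>
    </countries>
    <religions xmlns="http://www.religions.org">
        <religion>Baha'i</religion>
        <religion>Buddhism</religion>
    </religions>
</religions-per-country>"""


def values_in_namespace(xml_text, namespace, local_name):
    """Return the text of every element with the given namespace and
    local name, wherever it sits in the compound document."""
    root = ET.fromstring(xml_text)
    qualified_name = "{%s}%s" % (namespace, local_name)
    return [el.text for el in root.iter(qualified_name)]
```

A country-list-aware application would call values_in_namespace with http://www.countries.org and country; a religion-list-aware application would call it with http://www.religions.org and religion. Neither needs to know about the other list.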
28
Contrast with Approach 1 and Approach 2
  • With approach 1 and approach 2 the XML
    vocabulary used to construct the list is the same
    regardless of the list. Here is the compound
    document using lists that are defined using the
    XML Schema vocabulary:

<?xml version="1.0" encoding="UTF-8"?>
<religions-per-country>
    <xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema"
                   name="countriesType">
        <xs:restriction base="xs:string">
            <xs:enumeration value="Afghanistan"/>
            <xs:enumeration value="Albania"/>
            <xs:enumeration value="Algeria"/>
            ...
        </xs:restriction>
    </xs:simpleType>
    <xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema"
                   name="religionsType">
        <xs:restriction base="xs:string">
            <xs:enumeration value="Baha'i"/>
            <xs:enumeration value="Buddhism"/>
            <xs:enumeration value="Catholicism"/>
            ...
        </xs:restriction>
    </xs:simpleType>
    <!-- markup that maps religions to countries -->
</religions-per-country>
Applications can't distinguish the country list
from the religion list by namespace, because both
lists are in the XML Schema namespace. Thus, the
benefits namespaces provide in terms of
modularity are negated. It is not easy to create
country-list-aware applications or
religion-list-aware applications.
29
Approach 3
  • Approach 3 has minimal markup overhead.

30
Approach 4
  • In this approach the vocabulary is not customized
    for a specific list as with approach 3; rather,
    it is a vocabulary for any list.
  • An element in an XML instance document can be
    validated against the list using Schematron in
    the same manner described in Approach 3.
  • With the other approaches, the vocabulary is
    identified via a namespace. Approach 4 doesn't
    use namespaces; instead, it uses data to identify
    the list:

This data indicates that this is a list of
countries:
<Identifier>http://www.countries.org</Identifier>
31
Identifying a Vocabulary via a Namespace versus
Identifying a Vocabulary via Data
32
Identifying a Vocabulary via a Namespace
One way of identifying an XML building block
(data component) is by namespace. For example,
this list component is identified by the
namespace http://week.days.org:

<DaysOfTheWeek xmlns="http://week.days.org">
    <Day>Sunday</Day>
    <Day>Monday</Day>
    <Day>Tuesday</Day>
    <Day>Wednesday</Day>
    <Day>Thursday</Day>
    <Day>Friday</Day>
    <Day>Saturday</Day>
</DaysOfTheWeek>

This list is identified by the namespace
http://meetings.org:

<Meetings xmlns="http://meetings.org">
    <Meeting>Dentist</Meeting>
    <Meeting>Doctor</Meeting>
    <Meeting>Boss</Meeting>
</Meetings>

Applications can be built that are namespace-aware.
Different data components can be mashed together
into a single document and still be extracted and
processed individually because each is in a
namespace.
33
Identifying a Vocabulary via Data
There is an alternate way of identifying an XML
building block (data component): by embedding an
identifier within the document, as data. The
weekday list could be expressed like this:

<List>
    <Identifier>http://week.days.org</Identifier>
    <li>Sunday</li>
    <li>Monday</li>
    <li>Tuesday</li>
    <li>Wednesday</li>
    <li>Thursday</li>
    <li>Friday</li>
    <li>Saturday</li>
</List>

And the meetings list could be expressed like this:

<List>
    <Identifier>http://meetings.org</Identifier>
    <li>Dentist</li>
    <li>Doctor</li>
    <li>Boss</li>
</List>
34
Cont.
Things to note on the previous slide:
  1. Namespaces are not being used.
  2. The list is identified by the content of
     <Identifier>.
  3. The same XML vocabulary is used for both lists.
     (In fact, the same XML vocabulary is used for
     all lists.)
The two lists can be brought together into a single
document and still be processed individually.
Applications partition the document based on the
value in <Identifier>.
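Partitioning on the value in <Identifier> can be sketched as follows in Python. This is an illustration under assumptions that are not in the original slides: the two lists are wrapped in a hypothetical <Lists> element to form one document, the weekday list is shortened, and the function name partition_lists is made up for the example.

```python
import xml.etree.ElementTree as ET

# Two generic lists brought together in one document. The <Lists>
# wrapper element is an assumption made for this sketch.
MERGED = """<Lists>
    <List>
        <Identifier>http://week.days.org</Identifier>
        <li>Sunday</li>
        <li>Monday</li>
    </List>
    <List>
        <Identifier>http://meetings.org</Identifier>
        <li>Dentist</li>
        <li>Doctor</li>
        <li>Boss</li>
    </List>
</Lists>"""


def partition_lists(xml_text):
    """Map each list's Identifier value to the values of its li items."""
    root = ET.fromstring(xml_text)
    lists = {}
    for list_element in root.iter("List"):
        identifier = list_element.findtext("Identifier", default="").strip()
        lists[identifier] = [li.text for li in list_element.findall("li")]
    return lists
```

The same function handles any list written in this vocabulary; the only list-specific knowledge an application needs is the identifier value it looks up in the returned dictionary.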
35
Analysis
  • The namespace approach has the benefit of being
    widely adopted. Most XML tools, parsers, and
    technologies are based on namespaces. For
    example, NVDL is entirely based on using
    namespaces to partition a compound document; an
    XSLT processor processes a document based on the
    XSLT namespace.

36
Cont.
  • By using data to identify a list (rather than
    namespaces), the same XML vocabulary can be used
    for all lists. This makes all list-processing
    algorithms and code independent of the content,
    allowing one to leverage a single investment in
    software and access all code lists.
  • However, that raises an interesting question: is
    content-specific processing easier when lists are
    expressed using a domain-specific vocabulary or
    when lists are expressed using a generic
    vocabulary?

37
Analysis of all Approaches
  • Regardless of which approach is used, the meaning
    of the list and its values must be clearly
    documented. It may be challenging to achieve
    consensus on meaning.
  • The same terminology may be used by different
    people to mean different things. For example, one
    person expects to see Puerto Rico in a country
    list, whereas another person does not. This is
    because one person defines "country" to include
    territories and protectorates, whereas another
    person defines "country" only as principal
    sovereignties.
  • Further, some people use different terminology to
    mean the same thing. For example, one person
    calls it "country"; another calls it
    "principality."
  • With all approaches the issue arises of which
    terminology and definitions to adopt.

38
Recommendation
  • Each of the four approaches has pros and cons, so,
    as always, be sure to understand the alternatives
    and decide which is best for your situation.

39
genericode
  • genericode is a standardized generic list
    vocabulary [1]. That is, it is an example of
    approach 4.
  • Here's the idea behind the design of genericode's
    vocabulary:
  • Oftentimes when creating a list there are
    multiple ways to express each value in the list.
    For example, in a list of countries we may
    express the first value as Afghanistan or AF.
    genericode permits each value to be expressed in
    multiple ways. Thus, the list is expressed in
    terms of rows and columns: each row has a column
    for each of the ways to express a list value.

[1] http://docs.oasis-open.org/codelist/cd-genericode-1.0/doc/oasis-code-list-representation-genericode.pdf
40
<gc:CodeList xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
    <Identification>uniquely identify this list here</Identification>
    <SimpleCodeList>
        <Row>
            <Value>
                <SimpleValue>AF</SimpleValue>
            </Value>
            <Value>
                <SimpleValue>AFGHANISTAN</SimpleValue>
            </Value>
        </Row>
        <Row>
            <Value>
                <SimpleValue>AL</SimpleValue>
            </Value>
            <Value>
                <SimpleValue>ALBANIA</SimpleValue>
            </Value>
        </Row>
        ...
    </SimpleCodeList>
</gc:CodeList>
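The row-and-column idea can be made concrete with a short Python sketch that turns each <Row> into a tuple of its values. It mirrors only the simplified markup shown on this slide, not the full genericode specification, and the function name read_rows is a hypothetical name chosen for the example.

```python
import xml.etree.ElementTree as ET

# The slide's simplified markup, with the "..." elision left out.
CODE_LIST = """<gc:CodeList xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
    <Identification>uniquely identify this list here</Identification>
    <SimpleCodeList>
        <Row>
            <Value><SimpleValue>AF</SimpleValue></Value>
            <Value><SimpleValue>AFGHANISTAN</SimpleValue></Value>
        </Row>
        <Row>
            <Value><SimpleValue>AL</SimpleValue></Value>
            <Value><SimpleValue>ALBANIA</SimpleValue></Value>
        </Row>
    </SimpleCodeList>
</gc:CodeList>"""


def read_rows(xml_text):
    """Return one tuple per Row, with one entry per Value column."""
    root = ET.fromstring(xml_text)
    return [
        tuple(value.findtext("SimpleValue") for value in row.findall("Value"))
        for row in root.iter("Row")
    ]
```

Each tuple holds the alternative ways of expressing one list value, so an application can pick whichever column it needs, for example the short code in the first position or the full name in the second.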
41
Acknowledgements
  • Thanks to the following people for contributing
    to this document:
  • Roger Costello
  • Bruce Cox
  • Ken Holman
  • Rick Jelliffe
  • Michael Kay
  • Rob Simmons