Globally Unique Identifiers and Life Science Identifiers - PowerPoint PPT Presentation

Slides: 25
Provided by: dave55



1
Globally Unique Identifiers and Life Science Identifiers
  • Dave Thau
  • thau_at_learningsite.com
  • University of Kansas
  • California Academy of Sciences
  • www.learningsite.com

2
Outline
  1. Describe Globally Unique Identifiers
  2. Show how they're relevant
  3. Describe one GUID system (LSIDs)
  4. Outline some issues around using GUIDs for
    TDWG-related activities
  5. Provide some resources
  6. Open discussion

3
GUID Is Not An Ugly Word
"It's guid to be merry and wise,
It's guid to be honest and true."
Robert Burns, "Here's a Health to Them That's Awa'"
Pteroptochos tarnii, a.k.a. Guidguid
Image from animaldiversity.ummz.umich.edu

4
GUID: Globally Unique Identifier
  • A short name for a complex entity
  • Useful for locating information about the entity
  • Each name identifies only one entity
  • There is some sense of permanence

5
Some things which fit this description
  • GenBank accession numbers: AP006480.1
  • US patent numbers: 5443036 (laser-guided cat exercise)
  • Digital Object Identifiers: 10.121/3212

6
In Our Domain
SDD document representing some data set:
    <ClassName id="1">
      <Label>
        <Representation language="en">
          <Text>Cypselurus heterurus (Rafinesque, 1810)</Text>
        </Representation>
      </Label>
      <Link>
        <LSID>lsid.gbif.net:www.fishbase.org:1029</LSID>
      </Link>
      <Rank>sp</Rank>
    </ClassName>
Napier schema document representing some taxon:
    <TaxonConcept id="urn:lsid:bioguid.org:seek:121212" type="original">
      <Name type="scientific">
        <NameSimple>Canis lupus</NameSimple>
      </Name>
      <Relationships>
        <Relationship type="is child of">
          <ToTaxonConcept ref="urn:lsid:bioguid.org:seek:5743" />
        </Relationship>
      </Relationships>
    </TaxonConcept>

7
Features of a GUID system
  • Global uniqueness, scoped to the Internet
  • Should be easily resolvable by a computer or a human
  • Should identify things down to whatever level of granularity is necessary
  • Should not be limited to proprietary systems
  • Should serve up all sorts of data:
      • Database records
      • Text files
      • Images
  • It would be nice if the identifier had associated metadata

8
Life Science Identifiers
  • Official standard of the Object Management Group
    (OMG)
  • Support for metadata and authentication
  • Supports multiple protocols (e.g. HTTP, SOAP)
  • Can serve up data in any format
  • Decentralized: anyone can issue an LSID
  • LSID code available in Java and Perl.
  • A young standard, but increasingly used.

9
Organizations Using LSIDs
  • National Center for Biotechnology Information (NCBI)
      • PubMed
      • GenBank
  • European Bioinformatics Institute (EBI)
  • US Long Term Ecological Research Network (LTER)
  • BioMOBY, a biological database interoperability program (biomoby.org)
  • Open Bioinformatics Foundation (open-bio.org)
  • myGrid, a BioGRID project (mygrid.org.uk)

10
A Small Pause For More Squid Humor
11
LSID Format
urn:lsid:bioguid.org:seek:117866:v1
  • urn: indicates that this is a URN
  • lsid: indicates that it's an LSID-type URN
  • bioguid.org: the authority that issued the LSID
      • Doesn't have to be a domain name, but for now it probably should be.
      • bioguid.org does not necessarily have the data or metadata.
      • There may not even be a machine called bioguid.org.
  • seek: a namespace ID internal to that authority
      • The namespace is meaningless to systems outside that authority.
  • 117866: the local identifier within that authority
      • Also internal to the authority
  • v1: an optional version number
      • If there is no version, there is no trailing colon either.
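Because the format is a fixed sequence of colon-separated fields, it can be split apart mechanically. The following is a minimal Python sketch assuming exactly the layout on this slide; the names `LSID` and `parse_lsid` are hypothetical (not from the IBM LSID code), and the pattern is simpler than the full grammar in the OMG specification.

```python
import re
from dataclasses import dataclass
from typing import Optional

# urn:lsid:<authority>:<namespace>:<object id>[:<version>]
# "urn" and "lsid" are matched case-insensitively; no field may contain a colon.
_LSID_RE = re.compile(
    r"^urn:lsid:(?P<authority>[^:]+):(?P<namespace>[^:]+):(?P<object_id>[^:]+)"
    r"(?::(?P<version>[^:]+))?$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class LSID:
    authority: str                 # who issued it, e.g. bioguid.org
    namespace: str                 # meaningful only to that authority
    object_id: str                 # local identifier within the authority
    version: Optional[str] = None  # optional

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.version is not None:  # no version means no trailing colon
            parts.append(self.version)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    match = _LSID_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a well-formed LSID: {text!r}")
    return LSID(**match.groupdict())
```

For example, `parse_lsid("urn:lsid:bioguid.org:seek:117866:v1")` yields authority `bioguid.org`, namespace `seek`, object ID `117866`, and version `v1`, while a string ending in a bare trailing colon is rejected.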

12
Data and Metadata
  • An LSID has data
      • Examples:
          • The gene sequence in GenBank
          • The actual LTER data set, maybe in Excel or in a text file
      • The data should never change
  • An LSID also has metadata
      • Example metadata:
          • The format of the data
          • A display title for clients displaying the LSID
          • Dublin Core metadata
          • Anything you want
      • The metadata can change

13
Example LSIDs
  • An LTER fish abundance data set
      • urn:lsid:limnology.wisc.edu:dataset:ntlfi02
  • A PubMed reference
      • urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:pubmed:12441808
  • A GenBank sequence
      • urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:genbank_gi:30350027

14
How LSIDs work
(Diagram) An LSID client (maybe Launchpad, maybe Haystack, maybe BioFerret, maybe myGrid, maybe yours!) resolves an LSID in three steps:
  1. Find the authority for this LSID: find the DNS record and resolve it to get the address of the authority. This returns the LSID authority server.
  2. Query the LSID authority for available services. This returns WSDL for this LSID.
  3. Choose a service and get the goods from the data store or the metadata store, over HTTP, SOAP, FTP, or other protocols.
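The three steps in the diagram can be sketched as a small pipeline. This is a minimal Python sketch in which the network calls are passed in as functions, so the control flow can be read and tested without a live authority. The names `resolve_lsid` and `srv_query_name`, and the dictionary standing in for the WSDL, are hypothetical. The DNS step assumes the SRV naming convention `_lsid._tcp.<authority>` from the OMG specification; verify against the specification before relying on it.

```python
from typing import Callable, Dict, Tuple


def srv_query_name(authority: str) -> str:
    """DNS name to look up for an authority (assumed _lsid._tcp.<authority> convention)."""
    return f"_lsid._tcp.{authority}"


def resolve_lsid(
    lsid: str,
    lookup_srv: Callable[[str], Tuple[str, int]],
    get_available_services: Callable[[str, int, str], Dict[str, str]],
    fetch: Callable[[str, str], bytes],
    want: str = "data",
) -> bytes:
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    authority = parts[2]

    # Step 1: find the authority. DNS maps the authority name to the
    # host and port of the server that answers for it.
    host, port = lookup_srv(srv_query_name(authority))

    # Step 2: ask that server which services exist for this LSID.
    # (The real protocol returns WSDL; here it is a dict of kind -> endpoint.)
    services = get_available_services(host, port, lsid)

    # Step 3: choose a service ("data" or "metadata") and get the goods.
    if want not in services:
        raise LookupError(f"no {want!r} service offered for {lsid}")
    return fetch(services[want], lsid)


# Usage with stand-in functions instead of real DNS and network calls.
if __name__ == "__main__":
    fake_dns = {"_lsid._tcp.bioguid.org": ("lsid.example.org", 8080)}

    def fake_services(host: str, port: int, lsid: str) -> Dict[str, str]:
        base = f"http://{host}:{port}"
        return {"data": base + "/data", "metadata": base + "/metadata"}

    def fake_fetch(endpoint: str, lsid: str) -> bytes:
        return f"{endpoint}?lsid={lsid}".encode()

    print(resolve_lsid("urn:lsid:bioguid.org:seek:117866:v1",
                       fake_dns.__getitem__, fake_services, fake_fetch,
                       want="metadata"))
```

Note that the authority named in the LSID only needs to be findable through DNS; the host that actually serves the data (here the made-up `lsid.example.org`) can be a different machine, matching the earlier point that there may not even be a machine called bioguid.org.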
15
LSID Promises
  • I promise never to change the data behind an LSID.
  • I will make sure my LSIDs are being served, or give them to someone who can do it.
  • I will give my LSIDs metadata: at least a title and a format.
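Two of these promises can be checked mechanically; the second (keep serving) is an operational commitment rather than something code can verify locally. A minimal Python sketch, assuming a hypothetical practice of recording a SHA-256 digest of the data when the LSID is issued (this is not part of the LSID specification); `fingerprint` and `promise_violations` are illustrative names.

```python
import hashlib
from typing import Dict, List

# Minimum metadata the promise asks for.
REQUIRED_METADATA = ("title", "format")


def fingerprint(data: bytes) -> str:
    """SHA-256 digest of the data, recorded once when the LSID is issued."""
    return hashlib.sha256(data).hexdigest()


def promise_violations(issued_digest: str, data: bytes,
                       metadata: Dict[str, str]) -> List[str]:
    problems = []
    # Promise 1: the data behind an LSID never changes.
    if fingerprint(data) != issued_digest:
        problems.append("data differs from what was issued")
    # Promise 3: metadata includes at least a title and a format.
    # (Metadata is otherwise free to change, so nothing else is checked.)
    for key in REQUIRED_METADATA:
        if not metadata.get(key):
            problems.append(f"metadata missing {key!r}")
    return problems
```

An empty list means both checks pass; for example, `promise_violations(fingerprint(b"example data"), b"example data", {"title": "Example", "format": "text/plain"})` returns `[]`.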

16
Other GUID systems
  • URLs
      • Files move
      • The data change
      • Unstructured metadata
  • UUIDs: 128-bit values, effectively guaranteed unique
      • Example: 58f202ac-22cf-11d1-b12d-002035b29092
      • No resolution
      • No metadata
  • Handle System / DOIs (10.12/2312)
      • Non-standard protocol
      • Centralized resolution
      • Unstructured metadata (for Handle System)
      • High costs (for DOI)
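The UUID point can be seen with Python's standard `uuid` module. This short sketch parses the example UUID from this slide and mints a fresh one: a UUID carries only its own internal layout (version and variant bits), and nothing in it names a server from which data or metadata could be fetched.

```python
import uuid

# The UUID shown on the slide. Parsing it recovers only its internal layout.
slide_uuid = uuid.UUID("58f202ac-22cf-11d1-b12d-002035b29092")
print(slide_uuid.version)  # 1: built from a timestamp and a node (host) ID
print(slide_uuid.variant)  # the RFC 4122 layout

# A new random (version 4) UUID is minted locally, with no registry or authority.
fresh = uuid.uuid4()
print(fresh)

# Neither value names a server to ask for data or metadata; compare the
# authority field of an LSID such as urn:lsid:bioguid.org:seek:117866.
```

This locality is also the UUID's strength: anyone can mint one offline without coordination, which is why the uniqueness comes "for free" while resolution does not.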

17
Issues For This Community
  • What gets a GUID?
  • For each of those things, what's the data and what's the metadata?
  • One GUID per item?
  • Centralization: who issues GUIDs?

18
What Gets a GUID?
  • These things probably should get GUIDs:
      • Taxonomic concepts
      • Specimens
      • Publications
      • People
  • These things might get GUIDs:
      • Taxonomic names
      • Journals
      • Data providers
      • Observations

19
Specimen Data? Metadata?
  • If specimens get a GUID, what does it identify?
      • The physical specimen?
      • A collections database record of the specimen?
      • What about multiple labels?
  • Main question: what doesn't change about a specimen?
  • Other main question: how should the data be represented?
      • Darwin Core includes the specimen's current institution and location. That is not a good idea for the data of a GUID, since it may change.

20
One GUID Per Item?
  • No GUID system inherently enforces a 1:1 mapping between GUIDs and data.
  • Everyone should TRY to limit the number of GUIDs per item.
  • Should there be any centralization to help achieve this?
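Although no GUID system enforces a 1:1 mapping, a single issuer can at least avoid minting duplicates itself. A minimal Python sketch of such an issuer: `GuidRegistry` is hypothetical, the `urn:lsid:` prefix follows the LSID format above, and it assumes each item can be identified by some agreed key (for a specimen, perhaps institution code, collection code, and catalogue number). Choosing that key is the hard, community-level question and is not solved here.

```python
import itertools
from typing import Dict, Tuple


class GuidRegistry:
    """Hypothetical issuer that hands out at most one LSID per item key."""

    def __init__(self, authority: str, namespace: str) -> None:
        self._prefix = f"urn:lsid:{authority}:{namespace}:"
        self._next_id = itertools.count(1)
        self._issued: Dict[Tuple[str, ...], str] = {}

    def guid_for(self, key: Tuple[str, ...]) -> str:
        # Reuse the LSID already issued for this key instead of minting another.
        if key not in self._issued:
            self._issued[key] = self._prefix + str(next(self._next_id))
        return self._issued[key]
```

Asking twice for the same key returns the same LSID. The limitation is the one the slide raises: two independent registries can still mint different LSIDs for the same item, which is what motivates some degree of centralization.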

21
Degrees of Centralization
  • An index
      • List your GUID authority in an index so your GUIDs are easy to find.
  • A central authority
      • One authority could be responsible for issuing GUIDs to the community for specific types of information; you'd have to get one from that authority.
          • GBIF?
          • The IC_Ns? (ICZN, ICBN)
          • lsidauthority.org?
      • This would help enforce a 1:1 mapping of GUIDs and data items.
      • It would also relieve data providers of the need to maintain their own authorities.
      • It MAY also reduce the likelihood of GUIDs becoming unresolvable.
      • It may also be infeasible, technically or socially.
  • A respected authority
      • With LSIDs, an authority can be set up to serve its own GUIDs and proxy other authorities.
      • This would help enforce a 1:1 mapping for those who use the authority.
      • It may also be more feasible.
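The "respected authority" option can be pictured as one resolver that answers for its own LSIDs and forwards the rest. A minimal Python sketch: `ProxyingAuthority` and its methods are hypothetical, upstream authorities are modeled as plain functions rather than real network services, and the authority is read from the third colon-separated field as in the LSID format above.

```python
from typing import Callable, Dict

# A resolver takes a full LSID string and returns the data behind it.
Resolver = Callable[[str], bytes]


class ProxyingAuthority:
    """Hypothetical LSID authority that serves its own LSIDs and proxies others."""

    def __init__(self, own_authority: str, store: Dict[str, bytes]) -> None:
        self.own_authority = own_authority
        self.store = store                       # LSID -> data for our own LSIDs
        self.upstream: Dict[str, Resolver] = {}  # authority name -> resolver

    def add_upstream(self, authority: str, resolver: Resolver) -> None:
        self.upstream[authority] = resolver

    def resolve(self, lsid: str) -> bytes:
        parts = lsid.split(":")
        if len(parts) < 5:
            raise ValueError(f"not an LSID: {lsid!r}")
        authority = parts[2]  # urn:lsid:<authority>:<namespace>:<object id>
        if authority == self.own_authority:
            return self.store[lsid]
        if authority in self.upstream:
            return self.upstream[authority](lsid)
        raise LookupError(f"no route to authority {authority!r}")
```

Clients that go through such an authority see one consistent view, which is how it could help with a 1:1 mapping "for those who use the authority" without requiring every provider to give up its own authority.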

22
LSID Resources
  • LSID articles and code from IBM
      • http://www-124.ibm.com/developerworks/oss/lsid/whatislsid
  • Current LSID specification
      • http://www.omg.org/cgi-bin/doc?dtc/04-05-01
  • Launchpad: an LSID resolver for Windows Internet Explorer
      • Available from the first link
  • A website that resolves LSIDs
      • http://lsid.biopathways.org/resolver/
  • URN specification
      • http://www.ietf.org/rfc/rfc2141.txt

23
Acknowledgements
  • My work on GUIDs has been funded by the SEEK project (seek.ecoinformatics.org).
  • SEEK is funded by National Science Foundation
    award 0225676.
  • Thanks to Ben Szekely at IBM for his LSID
    articles, his LSID java code, and for answering
    all my questions.

24
Questions for Discussion
  • Do we need GUIDs?
  • What gets a GUID?
  • One GUID per item?
  • Centralization?