Session V: Life Science Identifiers - Use Cases, Future Directions

Provided by: syst120
Transcript and Presenter's Notes


1
Session V: Life Science Identifiers - Use Cases, Future Directions
2
Recent History
  • LSIDs 3 years old
  • I3C evaluating AGAVE, BSML
  • encoded IDs as tuples/triples
  • If we could not agree on a data standard, could
    we at least agree on how we write the identifiers?

3
Today
  • OMG Spec
  • Google search for "LSID bioinformatics"
  • 686 results (10/27/04, 2:40pm)
  • 700 results (10/27/04, 7:20am)

4
Broad Use Cases
5
How GenePattern is using LSIDs
  • Identify analysis tasks and pipelines via LSIDs
  • Create sharable pipelines referencing tasks via
    LSIDs
  • Provide a repository and retrieval for analysis
    tasks by LSID
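
The three points above can be sketched as a small in-memory model. This is only an illustration, not GenePattern's actual implementation: the names Task, TaskRepository and Pipeline are hypothetical, and the LSID strings assume the usual urn:lsid:authority:namespace:object:revision layout.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Task:
    """An analysis task identified by its LSID (hypothetical model)."""
    lsid: str
    name: str


class TaskRepository:
    """In-memory stand-in for a repository that stores and retrieves tasks by LSID."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Task] = {}

    def register(self, task: Task) -> None:
        # An LSID names exactly one thing, so refuse to overwrite an entry.
        if task.lsid in self._tasks:
            raise ValueError(f"LSID already registered: {task.lsid}")
        self._tasks[task.lsid] = task

    def retrieve(self, lsid: str) -> Task:
        try:
            return self._tasks[lsid]
        except KeyError:
            raise LookupError(f"unknown LSID: {lsid}") from None


@dataclass
class Pipeline:
    """A sharable pipeline: it stores only the LSIDs of its steps."""
    lsid: str
    step_lsids: List[str]

    def resolve(self, repo: TaskRepository) -> List[Task]:
        # Look up each referenced task at run time.
        return [repo.retrieve(step) for step in self.step_lsids]


# Usage with illustrative LSIDs (assumed layout, see lead-in).
PREFIX = "urn:lsid:broad.mit.edu:cancer.software.genepattern.module"
repo = TaskRepository()
repo.register(Task(f"{PREFIX}.analysis:00020:0", "Preprocess"))
repo.register(Task(f"{PREFIX}.analysis:00029:0", "SOM Clustering"))

pipeline = Pipeline(
    lsid=f"{PREFIX}.pipeline:00001:0",
    step_lsids=[f"{PREFIX}.analysis:00020:0", f"{PREFIX}.analysis:00029:0"],
)
step_names = [task.name for task in pipeline.resolve(repo)]
```

Because the pipeline holds identifiers rather than copies of the tasks, it can be shared as a short list of LSIDs and resolved against whichever repository the recipient uses.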

6
Example ALL/AML Analysis
  • Training data: all_aml_train (27 ALL, 11 AML expression samples)
  • Test data: all_aml_test (20 ALL, 14 AML expression samples)

  • Preprocess: Filter uninformative genes
  • Preprocess: Filter uninformative genes
  • SOM Clustering: Cluster samples to separate tumor types
  • Weighted Voting Train-test: Build a classifier and compute
    its accuracy on a test set
  • Class Neighbors: Find genes that most closely match a profile
  • Weighted Voting Cross-Validation: Build a classifier and
    compute its accuracy using cross-validation
Golub and Slonim et al., 1999
7
Example ALL/AML Analysis
urn:lsid:broad.mit.edu:cancer.software.genepattern.module.pipeline:00001:0
  • Training data: all_aml_train (27 ALL, 11 AML expression samples)
  • Test data: all_aml_test (20 ALL, 14 AML expression samples)

  • Preprocess: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0
  • Preprocess: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0
  • SOM Clustering: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00029:0
  • Weighted Voting Train-test: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00027:0
  • Class Neighbors: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00001:0
  • Weighted Voting Cross-Validation: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00028:0
Golub and Slonim et al., 1999
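
The identifiers on this slide follow the LSID layout urn:lsid:authority:namespace:object, with an optional trailing revision. A minimal parsing sketch, assuming that layout; the names LSID and parse_lsid are made up for this example and are not part of any LSID toolkit:

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # None when the LSID has no revision part


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its colon-separated components."""
    parts = text.split(":")
    # Expect 'urn', 'lsid', authority, namespace, object, and optionally revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2], parts[3], parts[4]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(authority, namespace, object_id, revision)


preprocess = parse_lsid(
    "urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0"
)
```

Here preprocess.authority is the issuing organization, preprocess.namespace distinguishes analysis modules from pipelines, and preprocess.revision is the part that changes when a module is updated.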
8
LSIDs enable
  • Reproducible research
    • exactly repeating an in silico experiment
    • modernizing pipelines to the latest versions
  • Tracking module provenance
  • Someday
    • Data will be available via LSID too
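
The difference between repeating an experiment exactly and modernizing a pipeline comes down to whether the revision part of each LSID is kept or replaced. A rough sketch, assuming revisions are plain integers and that a list of available LSIDs can be obtained somehow; the function names and the revision numbers above 0 are invented for illustration:

```python
from typing import Dict, Iterable, List, Tuple


def split_revision(lsid: str) -> Tuple[str, int]:
    """Split 'urn:lsid:auth:ns:obj:rev' into ('urn:lsid:auth:ns:obj', rev)."""
    base, _, revision = lsid.rpartition(":")
    return base, int(revision)  # assumption: revisions are integers


def latest_revisions(available: Iterable[str]) -> Dict[str, int]:
    """Map each unversioned LSID to the highest revision on offer."""
    latest: Dict[str, int] = {}
    for lsid in available:
        base, revision = split_revision(lsid)
        if revision > latest.get(base, -1):
            latest[base] = revision
    return latest


def modernize(steps: List[str], available: Iterable[str]) -> List[str]:
    """Return the pipeline steps with each LSID moved to its newest revision."""
    latest = latest_revisions(available)
    updated = []
    for lsid in steps:
        base, revision = split_revision(lsid)
        updated.append(f"{base}:{max(revision, latest.get(base, revision))}")
    return updated


ANALYSIS = "urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis"
pinned = [f"{ANALYSIS}:00020:0", f"{ANALYSIS}:00029:0"]
available = [
    f"{ANALYSIS}:00020:0",
    f"{ANALYSIS}:00020:2",  # hypothetical newer revision
    f"{ANALYSIS}:00029:0",
]
modernized = modernize(pinned, available)
```

Running the pinned list reproduces the original experiment; running the modernized list uses the newest modules. Keeping both lists is one simple way to record provenance.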

9
Future
urn:lsid:broad.mit.edu:cancer.software.genepattern.module.pipeline:00001:0
  • Training data: urn:lsid:broad.mit.edu:cancer.microarray:abcde:1.0
  • Test data: urn:lsid:broad.mit.edu:cancer.microarray:zyxwv:1.0

  • Preprocess: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0
  • Preprocess: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0
  • SOM Clustering: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00029:0
  • Weighted Voting Train-test: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00027:0
  • Class Neighbors: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00001:0
  • Weighted Voting Cross-Validation: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00028:0
Golub and Slonim et al., 1999
10
Other LSID use at the Broad
  • 1. Sample management
  • Sharing samples (tissues, clones, etc.) between
    program groups
  • LSIDs identify samples
  • Permits scientists to find all experiments done
    with a sample in any Broad program
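
Finding every experiment done with a sample amounts to an index keyed by the sample's LSID. A small sketch of that idea; the class ExperimentIndex, the program names, the experiment IDs and the sample LSID are all hypothetical and do not describe the Broad's actual system:

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class ExperimentIndex:
    """Cross-program index from sample LSID to the experiments that used it."""

    def __init__(self) -> None:
        self._by_sample: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def record(self, program: str, experiment_id: str, sample_lsid: str) -> None:
        # Each program reports its experiments under the shared sample LSID.
        self._by_sample[sample_lsid].add((program, experiment_id))

    def experiments_for(self, sample_lsid: str) -> List[Tuple[str, str]]:
        # Sorted for a stable result; an unknown sample yields an empty list.
        return sorted(self._by_sample.get(sample_lsid, ()))


SAMPLE = "urn:lsid:broad.mit.edu:sample:tissue-0001:1"  # made-up LSID

index = ExperimentIndex()
index.record("cancer", "expr-17", SAMPLE)
index.record("genome-sequencing", "seq-04", SAMPLE)
found = index.experiments_for(SAMPLE)
```

The point of the shared identifier is that the two programs never need to agree on a data format, only on how the sample is named.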

11
Other LSID use at the Broad
  • 2. GeneCruiser web service
  • annotation web service for microarray probes
  • maps probe set identifiers to GO, GenBank,
    SwissProt, etc.
  • Interface returns LSIDs for the identifiers in
    these other sources

12
Use Cases and Future Directions
  • What does it actually mean to identify a
    biological object such as "a gene"?
  • How does LSID address structural elements of
    biological and chemical objects?
  • What are the lessons learned from early
    implementations of LSID?

13
Use Cases and Future Directions
  • What granularity of object do we identify?
  • Should LSID be a URI not a URN?
  • Should virtual persistent identifiers for
    derived/calculated properties be used?
  • What are the barriers to widespread use?
  • Data/metadata split: is this a problem?
  • Phil Lord mentioned this at the end of yesterday's
    MyGrid talk

14
Best LSID quote
  • "LSIDs are in a sense just a sociological con
    trick, since they are nothing more than cheap and
    cheerful URNs" (David Shotten)