Title: Session V: Life Science Identifiers Use Cases, Future Directions
1. Session V: Life Science Identifiers - Use Cases, Future Directions
2. Recent History
- LSIDs are 3 years old
- I3C was evaluating AGAVE, BSML
- IDs were encoded as tuples/triples
- If we could not agree on a data standard, could we at least agree on how we write the identifiers?
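To make "how we write the identifiers" concrete, here is a minimal Python sketch of reading an LSID as a tuple. It assumes the colon-separated layout `urn:lsid:authority:namespace:object[:revision]`. The names `LSID` and `parse_lsid`, and the `example.org` identifier, are illustrative and do not come from the slides.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """The parts of an LSID, held as a plain tuple."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its parts, rejecting anything malformed."""
    parts = text.strip().split(":")
    has_prefix = len(parts) >= 2 and [p.lower() for p in parts[:2]] == ["urn", "lsid"]
    if not has_prefix or len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"LSID has an empty component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


# Hypothetical identifier under a placeholder authority.
sample = parse_lsid("urn:lsid:example.org:samples:12345:1")
print(sample.authority, sample.namespace, sample.object_id, sample.revision)
```

Keeping the revision as an optional last field means the same tuple can describe both versioned and unversioned identifiers.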
3. Today
- OMG Spec
- Google search for "LSID bioinformatics"
- 686 results (10/27/04, 2:40 pm)
- 700 results (10/27/04, 7:20 am)
4. Broad Use Cases
5. How GenePattern is using LSIDs
- Identify analysis tasks and pipelines via LSIDs
- Create sharable pipelines referencing tasks via LSIDs
- Provide a repository and retrieval for analysis tasks by LSID
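The three points above can be sketched in a few lines of Python. This is not GenePattern's implementation. `TaskRepository`, `resolve_pipeline`, the task names, and the `example.org` LSIDs are all hypothetical, and the repository is a plain in-memory dictionary.

```python
from typing import Dict, List


class TaskRepository:
    """In-memory stand-in for a repository of analysis tasks keyed by LSID."""

    def __init__(self) -> None:
        self._tasks: Dict[str, dict] = {}

    def publish(self, lsid: str, task: dict) -> None:
        # An LSID names one fixed thing, so a published entry is never overwritten.
        if lsid in self._tasks:
            raise ValueError(f"{lsid} is already published; issue a new revision")
        self._tasks[lsid] = dict(task)

    def retrieve(self, lsid: str) -> dict:
        try:
            return self._tasks[lsid]
        except KeyError:
            raise KeyError(f"no task published under {lsid}") from None


def resolve_pipeline(repo: TaskRepository, step_lsids: List[str]) -> List[dict]:
    """A pipeline is just an ordered list of task LSIDs; look each one up."""
    return [repo.retrieve(lsid) for lsid in step_lsids]


# Hypothetical identifiers under a placeholder authority.
FILTER = "urn:lsid:example.org:tasks:00001:1"
CLUSTER = "urn:lsid:example.org:tasks:00002:1"

repo = TaskRepository()
repo.publish(FILTER, {"name": "FilterGenes"})
repo.publish(CLUSTER, {"name": "ClusterSamples"})

pipeline = [FILTER, CLUSTER]
names = [task["name"] for task in resolve_pipeline(repo, pipeline)]
print(names)
```

Because the pipeline stores only identifiers, it can be shared as text and resolved against any repository that holds the same LSIDs.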
6. Example ALL/AML Analysis
[Workflow diagram]
- Training Data: all_aml_train, 27 ALL, 11 AML expression samples
- Test Data: all_aml_test, 20 ALL, 14 AML expression samples
- Preprocess (appears twice in the diagram): Filter uninformative genes
- SOM Clustering: Cluster samples to separate tumor types
- Weighted Voting Train-test: Build a classifier and compute its accuracy on a test set
- Class Neighbors: Find genes that most closely match a profile
- Weighted Voting Cross-Validation: Build a classifier and compute its accuracy using cross-validation
- Golub and Slonim et al., 1999
7. Example ALL/AML Analysis
[Workflow diagram, labeled with LSIDs]
- Pipeline: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.pipeline:00001:0
- Training Data: all_aml_train, 27 ALL, 11 AML expression samples
- Test Data: all_aml_test, 20 ALL, 14 AML expression samples
- Preprocess (appears twice in the diagram): urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0
- SOM Clustering: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00029:0
- Weighted Voting Train-test: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00027:0
- Class Neighbors: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00001:0
- Weighted Voting Cross-Validation: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00028:0
- Golub and Slonim et al., 1999
8. LSIDs enable
- Reproducible research
- exactly repeating an in silico experiment
- modernizing pipelines to the latest module versions
- Tracking module provenance
- Someday: data will be available via LSID too
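The contrast between exactly repeating an experiment and modernizing a pipeline can be sketched as follows. The sketch assumes the revision is the last colon-separated field and is a plain integer. The `modernize` function and the `example.org` identifiers are hypothetical and not taken from GenePattern.

```python
from typing import Dict, Iterable, List, Tuple


def split_revision(lsid: str) -> Tuple[str, int]:
    """Separate an LSID into its unversioned part and an integer revision."""
    base, _, revision = lsid.rpartition(":")
    return base, int(revision)


def modernize(steps: List[str], available: Iterable[str]) -> List[str]:
    """Return a copy of the pipeline with each step moved to its newest known revision."""
    latest: Dict[str, int] = {}
    for lsid in available:
        base, revision = split_revision(lsid)
        latest[base] = max(revision, latest.get(base, revision))
    updated = []
    for lsid in steps:
        base, revision = split_revision(lsid)
        # Never move backwards, and leave steps the repository does not know untouched.
        updated.append(f"{base}:{max(revision, latest.get(base, revision))}")
    return updated


# Hypothetical identifiers under a placeholder authority.
pinned = [
    "urn:lsid:example.org:tasks:00001:1",
    "urn:lsid:example.org:tasks:00002:3",
]
published = [
    "urn:lsid:example.org:tasks:00001:1",
    "urn:lsid:example.org:tasks:00001:2",
    "urn:lsid:example.org:tasks:00002:3",
]

modern = modernize(pinned, published)
print(modern)
```

Rerunning `pinned` unchanged repeats the original experiment, while `modern` is a new list that leaves the original intact, so both variants can coexist and be compared.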
9. Future
[Workflow diagram, with data sets also labeled by LSID]
- Pipeline: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.pipeline:00001:0
- Training Data: urn:lsid:broad.mit.edu:cancer.microarray:abcde:1.0
- Test Data: urn:lsid:broad.mit.edu:cancer.microarray:zyxwv:1.0
- Preprocess (appears twice in the diagram): urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020:0
- SOM Clustering: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00029:0
- Weighted Voting Train-test: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00027:0
- Class Neighbors: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00001:0
- Weighted Voting Cross-Validation: urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00028:0
- Golub and Slonim et al., 1999
10. Other LSID use at the Broad
- 1. Sample management
- Sharing samples (tissues, clones, etc.) between program groups
- LSIDs identify samples
- Permits scientists to find all experiments done with a sample in any Broad program
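One way to picture the cross-program lookup is an index from sample LSID to the experiments that used it. This is only a sketch under assumptions. `ExperimentIndex`, the program names, the experiment labels, and the `example.org` LSIDs are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class ExperimentIndex:
    """Maps a sample LSID to every (program, experiment) that used the sample."""

    def __init__(self) -> None:
        self._by_sample: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def record(self, program: str, experiment: str, sample_lsids: Iterable[str]) -> None:
        for lsid in sample_lsids:
            self._by_sample[lsid].append((program, experiment))

    def experiments_for(self, sample_lsid: str) -> List[Tuple[str, str]]:
        # Return a copy so callers cannot alter the index by accident.
        return list(self._by_sample.get(sample_lsid, []))


# Hypothetical sample identifiers under a placeholder authority.
TISSUE = "urn:lsid:example.org:samples:tissue-001"
CLONE = "urn:lsid:example.org:samples:clone-042"

index = ExperimentIndex()
index.record("program-a", "expression-run-7", [TISSUE, CLONE])
index.record("program-b", "sequencing-run-2", [TISSUE])

print(index.experiments_for(TISSUE))
```

The lookup works across programs only because both groups refer to the tissue by the same identifier, which is the point of assigning the LSID to the sample itself.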
11. Other LSID use at the Broad
- 2. GeneCruiser web service
- annotation web service for microarray probes
- maps probe set identifiers to GO, GenBank, SwissProt, etc.
- Interface returns LSIDs to these other sources for their identifiers
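The idea of returning LSIDs for identifiers held in other sources can be sketched like this. It does not reproduce the GeneCruiser interface. The `annotate` function, the authority names, the probe ID, and the accession values are placeholders made up for the example.

```python
from typing import Dict, List, Tuple

# Placeholder authorities: the slides do not say which LSID authority is used per source.
AUTHORITY: Dict[str, str] = {
    "genbank": "genbank.example.org",
    "go": "go.example.org",
}

# Made-up annotation table: probe set ID -> (source, identifier within that source).
ANNOTATIONS: Dict[str, List[Tuple[str, str]]] = {
    "probe_0001": [("genbank", "ACC0001"), ("go", "TERM0001")],
}


def annotate(probe_id: str) -> List[str]:
    """Return one LSID per external identifier known for the probe set."""
    return [
        f"urn:lsid:{AUTHORITY[source]}:{source}:{identifier}"
        for source, identifier in ANNOTATIONS.get(probe_id, [])
    ]


for lsid in annotate("probe_0001"):
    print(lsid)
```

Wrapping each external identifier in an LSID gives the caller one uniform identifier format, whichever source the annotation came from.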
12. Use Cases and Future Directions
- What does it actually mean to identify a biological object such as "a gene"?
- How does LSID address structural elements of biological and chemical objects?
- What are the lessons learned from early implementations of LSID?
13. Use Cases and Future Directions
- What granularity of object do we identify?
- Should LSID be a URI, not a URN?
- Should virtual persistent identifiers for derived/calculated properties be used?
- What are the barriers to widespread use?
- Data/Metadata split: is this a problem?
- Phil Lord mentioned this at the end of yesterday's MyGrid talk
14. Best LSID quote
- "LSIDs are in a sense just a sociological con trick, since they are nothing more than cheap and cheerful URNs" (David Shotten)