Title: Journal Club: Title of Paper by Authors Bibliographic reference
1Journal Club: Title of Paper by Authors
Bibliographic reference
- Your Name
- Date
- Distribute the paper before your talk, for
motivated listeners to read
2BMI Journal Club: Finding function: evaluation methods for functional genomic data
Myers, Barrett, Hibbs, Huttenhower, Troyanskaya
BMC Genomics 2006, 7:187
3Why this paper?
- Brief bullet points about why this paper is a
good BMI journal club paper, and why you selected
it
4Why this paper?
- Needed a good methodological paper
- Proliferation of work here and elsewhere on predicting gene function from high-throughput genomics
- This paper addresses an important problem in evaluation, and uses general informatics principles
- Olga is a recent BMI graduate :)
5Informatics Problem
- Describe the general biomedical informatics question/problem addressed in the paper
- Brief review of what others have done to solve this problem, and how well those approaches have performed. THIS MAY REQUIRE READING OTHER PAPERS!
- Why is there another paper on this topic?
6Informatics Problem
- Whenever a method is created that makes predictions or diagnoses, it must be evaluated against a gold standard of truth.
- When making multiple predictions, there can be biases in the gold standard based on its coverage of the predicted space.
- The resulting reports of performance can vary widely and unpredictably based on which parts of the gold standard are used.
- This is a relatively new problem in the context of large-scale predictive technologies.
7Informatics Problem
- What is the best way to evaluate a system making thousands or millions of predictions?
- How can we level the playing field so that different methods and data sources can be assessed fairly with respect to information content?
8Potentially confounding biomedicine! :)
- What is the application area of biology or medicine in which this work is presented?
- Discussion of the biological or medical problem that drove/required/suggested researchers to recognize potential for informatics innovation
- What is the significance of this biomedical problem?
- REMEMBER TO SEPARATE THE INFORMATICS FROM THE BIOMEDICAL APPLICATION. THAT MAY LEAVE NOTHING
9(Potentially confounding) biomedical background
- With the human genome sequenced, we need to understand the interactions and functions of genes (for understanding, drug design)
- High-throughput experimental data sets are used and integrated for this purpose: two-hybrid, mRNA expression, affinity precipitation
- Diverse algorithms are also created for integrating these data
  - Naïve Bayes (Troyanskaya, others)
  - Probabilistic Relational Models (Koller)
  - Comparative techniques (Segal, Stuart)
10More biology context
- It is critical to assemble networks of interacting and functionally related genes in order to generate hypotheses about cellular biology, identify drug targets, and assess pathway engineering opportunities.
- Yeast is the best-studied organism because of the wealth of data sets.
- Authors suspect that use of existing silver standards may skew conclusions about high- vs. low-information-content methods/data sources.
- Scientists are frustrated if many predictions are high confidence and then fail in the lab.
11Background
- Review of the informatics and biomedicine that people need to know in order to understand the key contributions of the paper
12Background
- Gene Ontology
  - Taxonomy of gene function, 30K terms
  - Terms assigned to genes manually; genes are related if they get the same term
- KEGG
  - Database of biological pathways
  - Mostly metabolic, manually curated
  - Genes in the same pathway are related
- Each of these provides a biased coverage of gene function space!
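The "genes are related if they get the same term" rule can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `related_pairs` and the toy gene and term names are hypothetical. It also shows how coverage bias arises, since a term annotated to many genes contributes far more positive pairs than a sparsely annotated one.

```python
from itertools import combinations


def related_pairs(annotations):
    """Map each gene pair sharing at least one term to the shared terms."""
    # Invert gene -> terms into term -> genes.
    genes_by_term = {}
    for gene, terms in annotations.items():
        for term in terms:
            genes_by_term.setdefault(term, set()).add(gene)

    # Every pair of genes under the same term is a gold-standard positive.
    pairs = {}
    for term, genes in genes_by_term.items():
        for a, b in combinations(sorted(genes), 2):
            pairs.setdefault((a, b), set()).add(term)
    return pairs


# Hypothetical toy annotations (not real GO or KEGG data).
toy = {
    "geneA": {"ribosome biogenesis", "translation"},
    "geneB": {"translation"},
    "geneC": {"ribosome biogenesis"},
    "geneD": {"DNA repair"},
}
pairs = related_pairs(toy)
for pair, terms in sorted(pairs.items()):
    print(pair, sorted(terms))
```

In this toy case geneD contributes no positive pairs at all, because nothing else shares its term; a predictor that is good at DNA repair would get no credit from this standard.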
13Uneven gold-standard
14Different conclusions from different silver
standards
15Background
- GO is organized from most general (top) to most specific (bottom)
- For validation, people often choose a level of GO at which they define GO annotations to be meaningful.
- E.g., all GO codes at level 5 or below are treated as sufficiently precise predictions.
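A depth cutoff of this kind might be sketched as follows. The assumptions here are mine, not the paper's: depth is taken as the shortest path from the root, the hierarchy is given as a parent-to-children mapping, and the toy term names and the helper names `term_depths` and `terms_at_or_below` are hypothetical.

```python
from collections import deque


def term_depths(children, root):
    """Breadth-first search; depth is the shortest distance from the root."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, ()):
            if child not in depths:  # first visit gives the shortest path
                depths[child] = depths[term] + 1
                queue.append(child)
    return depths


def terms_at_or_below(children, root, min_depth):
    """Terms whose depth is at least min_depth (the 'level N or below' cut)."""
    depths = term_depths(children, root)
    return {term for term, depth in depths.items() if depth >= min_depth}


# Hypothetical toy hierarchy; "translation" has two parents, as GO terms can.
toy_children = {
    "root": ["metabolism", "localization"],
    "metabolism": ["protein metabolism"],
    "protein metabolism": ["translation"],
    "localization": ["translation"],
}
depths = term_depths(toy_children, "root")
selected = terms_at_or_below(toy_children, "root", 2)
print(depths)
print(sorted(selected))
```

Note that "translation" sits at depth 2 by one path and depth 3 by the other, so the same term can fall on either side of a cutoff depending on how depth is defined.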
17Wide variability in GO depth annotation frequency
18Aims of Paper
- As in BMI 212, a listing of the specific aims of the paper. No more than 3 usually (often fewer).
- NOTE: the paper should be presented initially in the most positive light, as the authors would have presented it. The time for critique is after the author-perspective presentation.
19Aims of Paper
- Define the problem of biased gold standards in high-throughput evaluations.
- Create a method for comparing prediction methods fairly
- Build a manual gold standard and associated web tool
- Allow evaluations to report not only overall performance, but area-specific performance.
20Methods Employed
- This is the key part of the presentation for the BMI crowd. This should be a presentation of the methods described in the paper at a sufficient technical level so people can discuss and evaluate it. Avoid detailed math/equations unless absolutely critical to the discussion.
21Methods Employed
- 6 post-doctoral biologists
- Examine every GO code and vote "informative" or "not informative" if applied to a gene
- 3 informative votes: "useful" category
- <1 informative vote and >1000 annotations: "not useful" category
- "Not useful" terms are the key denominator for computations of precision/specificity
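The voting rule on this slide can be written down as a small function. This is a sketch of my reading of the slide, not the authors' implementation: I assume "3 informative votes" means at least 3 of the 6 voters and that "<1" means zero votes, so the thresholds are exposed as parameters. The function name `classify_term` and the "unassigned" label for terms in neither category are hypothetical.

```python
def classify_term(informative_votes, n_annotations,
                  useful_min_votes=3,
                  not_useful_max_votes=0,
                  not_useful_min_annotations=1000):
    """Assign a GO term to a gold-standard category from expert votes.

    Thresholds are assumptions read off the slide:
    at least 3 informative votes -> "useful";
    fewer than 1 informative vote and more than 1000 annotations -> "not useful".
    """
    if informative_votes >= useful_min_votes:
        return "useful"
    if (informative_votes <= not_useful_max_votes
            and n_annotations > not_useful_min_annotations):
        return "not useful"
    # Hypothetical label for terms that meet neither rule.
    return "unassigned"


print(classify_term(4, 50))     # enough votes, annotation count irrelevant
print(classify_term(0, 2500))   # no votes and very broadly annotated
print(classify_term(0, 200))    # no votes but too few annotations
```

Under this reading, a term with one or two informative votes lands in neither set regardless of how many genes it annotates.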
22Results
- Recapitulate major results. Usually by
presenting main figures from the paper.
23Results of selecting GO codes manually for Gold
Standard
24Methods
- With gold standard GO codes that they trust, can now analyze methods/data sources and give a specific performance report on different areas (of biology).
- Can also systematically remove GO topics in order to see if there are dominant effects (e.g. remove ribosomes)
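A minimal sketch of area-specific reporting and topic removal, under simplifying assumptions of my own: predictions are a set of gene pairs above some score cutoff, positives are grouped by GO term, and negatives are a single shared set. The names `precision_recall` and `evaluate` and all toy data are hypothetical; the paper's own tool may compute these quantities differently.

```python
def precision_recall(predicted, positives, negatives):
    """Precision and recall of predicted pairs against a gold standard."""
    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / len(positives) if positives else 0.0
    return precision, recall


def evaluate(predicted, positives_by_term, negatives, exclude=()):
    """Overall and per-term performance, optionally leaving out some terms."""
    kept = {t: p for t, p in positives_by_term.items() if t not in exclude}
    all_positives = set().union(*kept.values())
    return {
        "overall": precision_recall(predicted, all_positives, negatives),
        "by_term": {t: precision_recall(predicted, p, negatives)
                    for t, p in kept.items()},
    }


# Hypothetical toy data: gene pairs predicted to be functionally related.
predicted = {("a", "b"), ("a", "c"), ("d", "e"), ("x", "y")}
positives_by_term = {
    "ribosome": {("a", "b"), ("a", "c"), ("b", "c")},
    "DNA repair": {("d", "e"), ("d", "f"), ("e", "f"), ("f", "g")},
}
negatives = {("x", "y"), ("a", "d")}

full = evaluate(predicted, positives_by_term, negatives)
no_ribosome = evaluate(predicted, positives_by_term, negatives,
                       exclude={"ribosome"})
print(full)
print(no_ribosome)
```

In the toy data, overall precision drops from 0.75 to 0.5 once the ribosome pairs are excluded, which is the kind of dominant effect the removal test is meant to expose.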
25Comparison of methods using new gold standard
26New method to compare/assess methods
27GRIFn website available (?)
28Authors' Conclusions
- A presentation of how the authors summarize
their results and significance. Usually not more
than 3 major points. Often one.
29Authors' Conclusions
- Curated GO codes now provide a more trustworthy gold standard
- Allows tools to be built that give
  - Overall performance
  - Subarea-specific breakdown of performance
  - Direct comparison of different methods/data sources
- Sets the bar on evaluation, and starts a discussion about community-wide standards.
30Assessment of Paper: Informatics
- What are the major methodological (engineering) innovations in the paper, in your opinion?
- Are the methods presented soundly, completely, and evaluated appropriately?
- How general are the methods presented for use in other areas, either directly or with some effort by others?
31Assessment of Paper: Informatics
- Beautiful description and justification of the work. Clearly a general problem.
- Well informed by research in the field, and by an evaluation of the problems that arise in evaluation.
- Solution applicable in many domains
  - Close (KEGG, NLP, others)
  - Farther (any large-volume prediction activity)
- Some bias in expert-based gold standards
- Very good availability of specific tool to allow use (cf. Maureen)
32Assessment of paper: Biomedicine
- Has the paper helped make a new contribution to biomedical knowledge?
- What is the domain significance of this paper?
- Was it published in the right journal to find the audience who should care about it the most?
33Assessment of paper: Biomedicine
- Should greatly reduce the noise in papers about high-throughput predictions
- Should create a new bar for performance
- Systems biology and interaction informatics workers need to pay attention.
- Microarray information content may be lower, on average, than previously thought
- Genomics audience is a good one, since they need to be aware of these relatively sophisticated informatics issues.
34Detailed Concerns
- Particularly if you don't like the paper, what are your technical informatics concerns about the method, implementation, or evaluation?
35Detailed Concerns
- A little confused about negative gold standard and how it is meant to be used. (Email in to Olga)
- There are still biases in the gold standard (e.g. GO) by omission that can't be addressed without more work
- What is the #2 bad GO area after ribosome? That example is used a lot in the paper.
36Summary Conclusions
- Do you accept all of the authors' conclusions previously presented?
- Modified conclusions that you would accept
37Summary Conclusions
- Very important paper for evaluation of these methods
- Now mandatory for future papers to address these issues.
- Authors' aims achieved
  - Showed the problem
  - General solution proposed
  - Specific solution built and disseminated
38References
- This paper, and other related papers that a BMI
student studying for quals or otherwise
interested could review.
39References
- Myers CL, Barrett DR, Hibbs MA, Huttenhower C, Troyanskaya OG. Finding function: evaluation methods for functional genomic data. BMC Genomics. 2006 Jul 25;7:187. PMID 16869964
- Lin N, Wu B, Jansen R, Gerstein M, Zhao H. Information assessment on predicting protein-protein interactions. BMC Bioinformatics. 2004 Oct 18;5:154. PMID 15491499
- Lee SG, Hur JU, Kim YS. A graph-theoretic modeling on GO space for biological interpretation of gene clusters. Bioinformatics. 2004 Feb 12;20(3):381-8. Epub 2004 Jan 22. PMID 14960465
- Jansen R, Gerstein M. Analyzing protein function on a genomic scale: the importance of gold-standard positives and negatives for network prediction. Curr Opin Microbiol. 2004 Oct;7(5):535-45. PMID 15451510
- Ben-Hur A, Noble WS. Choosing negative examples for the prediction of protein-protein interactions. BMC Bioinformatics. 2006 Mar 20;7 Suppl 1:S2. PMID 16723005
40Faculty of 1000 entry
41Acknowledgments
- Thanks to those who contributed to preparation of presentation.
- Don't hesitate to contact authors of paper for clarifications. They are usually flattered that you are looking at their paper.
42Acknowledgments
- Maureen Hillenmeyer first brought this paper to my attention.
- Olga provided a few clarifications that I needed after reading the paper.
- BMI-exec encouraged me to do this as an example for how we would like students to select and present BMI JC papers this year.
43Thanks.
- insert your email address
44Thanks.
- russ.altman_at_stanford.edu