Journal Club: Title of Paper by Authors Bibliographic reference



1
Journal Club: Title of Paper by Authors
Bibliographic reference
  • Your Name
  • Date
  • Distribute the paper before your talk, for
    motivated listeners to read

2
BMI Journal Club: Finding function: evaluation
methods for functional genomic data
Myers, Barrett, Hibbs, Huttenhower, Troyanskaya
BMC Genomics 2006, 7:187
  • Russ B. Altman
  • 10/4/06

3
Why this paper?
  • Brief bullet points about why this paper is a
    good BMI journal club paper, and why you selected
    it

4
Why this paper?
  • Needed a good methodological paper
  • Proliferation of work here and elsewhere on
    predicting gene function from high throughput
    genomics
  • This paper addresses an important problem in
    evaluation, and uses general informatics
    principles
  • Olga is a recent BMI graduate :)

5
Informatics Problem
  • Describe what is the general biomedical
    informatics question/problem addressed in the
    paper
  • Brief review of what others have done to solve
    this problem, and how well those approaches have
    performed. THIS MAY REQUIRE READING OTHER PAPERS!
  • Why is there another paper on this topic?

6
Informatics Problem
  • Whenever a method is created that makes
    predictions or diagnoses it must be evaluated
    against a gold standard of truth.
  • When making multiple predictions, there can be
    biases in the gold standard based on its coverage
    of the predicted space
  • The resulting reports of performance can vary
    widely and unpredictably based on which parts of
    the gold standard are used.
  • This is a relatively new problem in the context
    of large scale predictive technologies

7
Informatics Problem
  • What is the best way to evaluate a system making
    thousands or millions of predictions?
  • How can we level the playing field so that
    different methods and data sources can be
    assessed with respect to information content
    fairly?

8
Potentially confounding biomedicine! :)
  • What is the application area of biology or
    medicine in which this work is presented?
  • Discussion of the biological or medical problem
    that drove/required/suggested researchers to
    recognize potential for informatics innovation
  • What is the significance of this biomedical
    problem?
  • REMEMBER TO SEPARATE THE INFORMATICS FROM THE
    BIOMEDICAL APPLICATION. THAT MAY LEAVE NOTHING

9
(Potentially confounding) biomedical background
  • With the human genome sequenced, we need to
    understand the interactions and functions of
    genes (for understanding, drug design)
  • High-throughput experimental data sets are used
    and integrated for this purpose: two-hybrid,
    mRNA expression, affinity precipitation
  • Diverse algorithms are also created for
    integrating these data:
  • Naïve Bayes (Troyanskaya & others)
  • Probabilistic Relational Models (Koller)
  • Comparative techniques (Segal & Stuart)

10
More biology context
  • It is critical to assemble networks of
    interacting and functionally related genes in
    order to generate hypotheses about cellular
    biology, identify drug targets, and assess
    pathway engineering opportunities.
  • Yeast is the best-studied organism because of the
    wealth of data sets
  • Authors suspect that use of existing silver
    standards may skew conclusions about high vs.
    low information content methods/data sources.
  • Scientists are frustrated if many predictions are
    high confidence and then fail in the lab.

11
Background
  • Review of the informatics and biomedicine that
    people need to know in order to understand the
    key contributions of the paper

12
Background
  • Gene Ontology
  • Taxonomy of gene function, 30K terms
  • Terms assigned to genes manually; genes are
    related if they get the same term
  • KEGG
  • Database of biological pathways
  • Mostly metabolic, manually curated
  • Genes in same pathway related
  • Each of these provides a biased coverage of gene
    function space!
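
A minimal sketch of how "genes related if they get the same term" (or share a pathway) turns annotations into a pairwise standard. The gene, term, and pathway names below are made-up toy data, and `related_pairs` is a hypothetical helper, not anything from the paper.

```python
from itertools import combinations


def related_pairs(annotations):
    """Return the set of gene pairs that share at least one term or pathway.

    `annotations` maps a gene name to the set of labels assigned to it.
    """
    # Invert the mapping: label -> genes carrying that label.
    label_to_genes = {}
    for gene, labels in annotations.items():
        for label in labels:
            label_to_genes.setdefault(label, set()).add(gene)

    # Every pair of genes under the same label counts as "related".
    pairs = set()
    for genes in label_to_genes.values():
        for a, b in combinations(sorted(genes), 2):
            pairs.add((a, b))
    return pairs


# Toy annotations (hypothetical): one GO-like source, one KEGG-like source.
go_like = {
    "geneA": {"term1"},
    "geneB": {"term1", "term2"},
    "geneC": {"term2"},
    "geneD": set(),
}
kegg_like = {
    "geneA": {"pathway1"},
    "geneC": {"pathway1"},
}

go_pairs = related_pairs(go_like)
kegg_pairs = related_pairs(kegg_like)
print(sorted(go_pairs))
print(sorted(kegg_pairs))
```

In this toy case the two sources cover different pairs entirely, which is the kind of uneven coverage the slide warns about.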

13
Uneven gold-standard
14
Different conclusions from different silver
standards
15
Background
  • GO is organized from most general (top) to most
    specific (bottom)
  • For validation, people often choose a level of
    GO at which they define GO annotations to be
    meaningful.
  • E.g., all GO codes at level 5 or below are
    treated as sufficiently precise predictions.
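
A rough sketch of a depth cutoff on a GO-like hierarchy. GO is a directed acyclic graph, so a term can have several parents. Here "level" is assumed to mean the shortest path to a root, which is only one possible definition. The hierarchy, the cutoff, and the function names are illustrative assumptions.

```python
def term_depth(term, parents):
    """Depth of a term: length of the shortest path up to a root.

    `parents` maps each term to a list of its parent terms; a root has
    an empty list. Shortest-path depth is an assumption of this sketch.
    """
    term_parents = parents.get(term, [])
    if not term_parents:
        return 0
    return 1 + min(term_depth(p, parents) for p in term_parents)


def terms_at_or_below(parents, min_depth):
    """Terms whose depth is at least `min_depth` (i.e. at that level or below)."""
    return sorted(t for t in parents if term_depth(t, parents) >= min_depth)


# Hypothetical toy hierarchy: "c" hangs off both "b" and the root.
toy_parents = {
    "root": [],
    "a": ["root"],
    "b": ["a"],
    "c": ["b", "root"],
}

specific = terms_at_or_below(toy_parents, min_depth=2)
print(specific)
```

Under this definition "c" sits at depth 1 despite also being a child of "b", so where a term falls relative to a fixed cutoff depends on how level is defined.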

17
Wide variability in GO depth annotation frequency
18
Aims of Paper
  • As in BMI 212, a listing of the specific aims of
    the paper. No more than 3 usually (often less).
  • NOTE: the paper should be presented initially
    in the most positive light, as the authors would
    have presented it. The time for critique is
    after the author perspective presentation.

19
Aims of Paper
  • Define the problem of biased gold standards in
    high-throughput evals.
  • Create a method for comparing prediction methods
    fairly
  • Build a manual gold standard and associated web
    tool
  • Allow evaluations to report not only overall
    performance, but area-specific performance.

20
Methods Employed
  • This is the key part of the presentation for BMI
    crowd. This should be a presentation of the
    methods described in the paper at sufficient
    technical level so people can discuss and
    evaluate it. Avoid detailed math/equations
    unless absolutely critical to the discussion.

21
Methods Employed
  • 6 post-doctoral biologists
  • Examine every GO code and vote on whether it is
    informative or not informative if applied to a
    gene
  • 3 informative votes: useful category
  • <1 informative vote and >1000 annotations: not
    useful category
  • The not-useful terms are the key denominator for
    computations of precision/specificity
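
The voting rule can be sketched as a small function. The slide does not say whether the 3-vote threshold means "exactly", "at least", or "more than"; the sketch assumes "at least 3" and exposes every cutoff as a parameter so it can be changed. `classify_term` and the "unclassified" label are hypothetical names for illustration.

```python
def classify_term(informative_votes, n_annotations,
                  min_votes_useful=3,         # ASSUMPTION: "at least 3" votes
                  max_votes_not_useful=0,     # "<1 informative" read as zero votes
                  min_annotations=1000):      # ">1000 annotations"
    """Sort a GO term into 'useful', 'not useful', or 'unclassified'."""
    if informative_votes >= min_votes_useful:
        return "useful"
    if (informative_votes <= max_votes_not_useful
            and n_annotations > min_annotations):
        return "not useful"
    # Everything in between is left out of both categories in this sketch.
    return "unclassified"


# Hypothetical (votes, annotation count) examples.
examples = [(4, 50), (0, 2000), (1, 2000), (0, 500)]
labels = [classify_term(v, n) for v, n in examples]
print(labels)
```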

22
Results
  • Recapitulate major results. Usually by
    presenting main figures from the paper.

23
Results of selecting GO codes manually for Gold
Standard
24
Methods
  • With gold standard GO codes that they trust,
    can now analyze methods/data sources and give
    specific performance report on different areas
    (of biology).
  • Can also systematically remove GO topics in order
    to see if there are dominant effects (e.g. remove
    ribosomes)
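
A toy sketch of per-area reporting and of removing one GO topic to check for a dominant effect. All gene pairs, term names, and helper functions are invented for illustration, and precision is computed only over pairs that the standard labels as positive or negative. This is an assumption of the sketch, not a description of the authors' tool.

```python
def precision(predictions, positives, negatives):
    """TP / (TP + FP), ignoring predicted pairs the standard does not label."""
    tp = len(predictions & positives)
    fp = len(predictions & negatives)
    total = tp + fp
    return tp / total if total else None


def positives_without(positives_by_term, excluded):
    """Union of positive pairs over every term except `excluded`."""
    kept = set()
    for term, pairs in positives_by_term.items():
        if term != excluded:
            kept |= pairs
    return kept


# Hypothetical gold standard: positive pairs grouped by GO area.
positives_by_term = {
    "ribosome": {("r1", "r2"), ("r1", "r3"), ("r2", "r3")},
    "dna_repair": {("d1", "d2")},
}
negatives = {("r1", "d1"), ("r2", "d2"), ("r3", "d1")}

# Hypothetical predictions from some method: strong on ribosome pairs only.
predictions = {("r1", "r2"), ("r1", "r3"), ("r2", "r3"),
               ("r1", "d1"), ("r2", "d2")}

all_positives = positives_without(positives_by_term, excluded=None)
overall = precision(predictions, all_positives, negatives)
no_ribosome = precision(
    predictions, positives_without(positives_by_term, "ribosome"), negatives)
per_area = {term: precision(predictions, pairs, negatives)
            for term, pairs in positives_by_term.items()}

print(overall, no_ribosome, per_area)
```

In this toy data the overall figure comes entirely from ribosome pairs; dropping that one area removes every true positive, which is the dominant effect the slide describes.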

25
Comparison of methods using new gold standard
26
New method to compare/assess methods
27
GRIFn website available (?)
28
Authors' Conclusions
  • A presentation of how the authors summarize
    their results and significance. Usually not more
    than 3 major points. Often one.

29
Authors' Conclusions
  • Curated GO codes now provide more trustworthy
    gold-standard
  • Allows tools to be built that give:
  • Overall performance
  • Subarea-specific breakdown of performance
  • Direct comparison of different methods/data
    sources
  • Sets the bar on evaluation, and starts a
    discussion about community-wide standards.

30
Assessment of Paper: Informatics
  • What are the major methodological (engineering)
    innovations in the paper, in your opinion?
  • Are the methods presented soundly, completely,
    and evaluated appropriately?
  • How general are the methods presented for use in
    other areas either directly or with some effort
    by others?

31
Assessment of Paper: Informatics
  • Beautiful description and justification of the
    work. Clearly a general problem.
  • Well informed by research in the field, and
    evaluation of problems that arise in eval.
  • Solution applicable in many domains:
  • Close (KEGG, NLP, others)
  • Farther (any large-volume prediction activity)
  • Some bias in expert-based gold standards
  • Very good availability of specific tool to allow
    use (cf. Maureen)

32
Assessment of Paper: Biomedicine
  • Has the paper helped make a new contribution to
    biomedical knowledge?
  • What is the domain significance of this paper?
  • Was it published in the right journal to find
    the audience who should care about it the most?

33
Assessment of Paper: Biomedicine
  • Should greatly reduce the noise in papers about
    high-throughput predictions
  • Should create a new bar for performance
  • Systems biology and interaction informatics
    workers need to pay attention.
  • Microarray information content may be lower than
    thought previously on average
  • Genomics audience is a good one, since they need
    to be aware of these relatively sophisticated
    informatics issues.

34
Detailed Concerns
  • Particularly if you don't like the paper, what
    are your technical informatics concerns about the
    method, implementation or evaluation?

35
Detailed Concerns
  • A little confused about negative gold standard
    and how it is meant to be used. (Email in to
    Olga)
  • There are still biases in the gold standard (e.g.
    GO) by omission that can't be addressed without
    more work
  • What is the #2 bad GO area after ribosome? That
    example is used a lot in the paper.

36
Summary Conclusions
  • Do you accept all of the authors' conclusions
    previously presented?
  • Modified conclusions that you would accept

37
Summary Conclusions
  • Very important paper for evaluation of these
    methods
  • Now mandatory for papers in the future to address
    these issues.
  • Authors' aims achieved:
  • Showed the problem
  • General solution proposed
  • Specific solution built and disseminated

38
References
  • This paper, and other related papers that a BMI
    student studying for quals or otherwise
    interested could review.

39
References
  • Myers CL, Barrett DR, Hibbs MA, Huttenhower C,
    Troyanskaya OG. Finding function: evaluation
    methods for functional genomic data. BMC
    Genomics. 2006 Jul 25;7:187. PMID: 16869964
  • Lin N, Wu B, Jansen R, Gerstein M, Zhao H.
    Information assessment on predicting
    protein-protein interactions. BMC Bioinformatics.
    2004 Oct 18;5:154. PMID: 15491499
  • Lee SG, Hur JU, Kim YS. A graph-theoretic
    modeling on GO space for biological
    interpretation of gene clusters. Bioinformatics.
    2004 Feb 12;20(3):381-8. Epub 2004 Jan 22. PMID:
    14960465
  • Jansen R, Gerstein M. Analyzing protein function
    on a genomic scale: the importance of
    gold-standard positives and negatives for network
    prediction. Curr Opin Microbiol. 2004
    Oct;7(5):535-45. PMID: 15451510
  • Ben-Hur A, Noble WS. Choosing negative examples
    for the prediction of protein-protein
    interactions. BMC Bioinformatics. 2006 Mar 20;7
    Suppl 1:S2. PMID: 16723005

40
Faculty of 1000 entry
41
Acknowledgments
  • Thanks to those who contributed to preparation
    of presentation.
  • Don't hesitate to contact authors of paper for
    clarifications. They are usually flattered that
    you are looking at their paper.

42
Acknowledgments
  • Maureen Hillenmeyer first brought this paper to
    my attention.
  • Olga provided a few clarifications that I needed
    after reading the paper.
  • BMI-exec encouraged me to do this as an example
    for how we would like students to select and
    present BMI JC papers this year.

43
Thanks.
  • insert your email address

44
Thanks.
  • russ.altman_at_stanford.edu