Title: Reference Genome Project
1 Reference Genome Project
ZFIN
2 Purpose
- Provide comprehensive annotation for 12 genomes
 - Arabidopsis thaliana
 - Caenorhabditis elegans 
 - Danio rerio 
 - Dictyostelium discoideum 
 - Drosophila melanogaster 
 - Escherichia coli 
 - Gallus gallus 
 - Homo sapiens 
 - Mus musculus 
 - Rattus norvegicus 
 - Saccharomyces cerevisiae 
 - Schizosaccharomyces pombe 
 
- These organisms were selected because they
 - are established model organisms with published experimental data
 - have a genome database
 - have experienced GO curators
 
3 Questions for Advisors
- What criteria should we use to collect and prioritize genes for the reference genomes?
- What measures would be effective for assessing progress on the reference genome projects?
- What level of accuracy and detail should we aim for in individual genes, and what coverage of all functional elements across the genome?
4 Complete genome annotation
- Breadth: every gene in the 12 genomes should be annotated
- Depth: every gene should be annotated to the highest level of experimental knowledge in that organism
- The group has agreed that depth of annotation is best assessed by the curator annotating the gene.
 - If a gene has fewer than 5-10 papers, it makes sense to read and annotate all of them.
 - If a gene has a lot of literature, the preferred strategy is to look at a recent review to make sure all important primary literature is captured, and then to read the more recent papers.
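The reading strategy above can be sketched as a small triage helper. This is only an illustration, not the group's tooling: the `Paper` record, the `papers_to_read` function, and the default threshold of 10 (the upper end of the "5-10 papers" range) are assumptions made for this example, and the decision on depth still rests with the curator.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Paper:
    """Hypothetical minimal record for a paper linked to a gene."""
    paper_id: str
    year: int
    is_review: bool = False


def papers_to_read(papers: List[Paper], threshold: int = 10) -> List[Paper]:
    """Suggest a starting reading list for one gene.

    `threshold` is an assumed cut-off taken from the upper end of "5-10 papers".
    """
    if len(papers) < threshold:
        # Small literature: read and annotate everything, oldest first.
        return sorted(papers, key=lambda p: p.year)

    reviews = [p for p in papers if p.is_review]
    if not reviews:
        # No review to lean on, so fall back to the full list.
        return sorted(papers, key=lambda p: p.year)

    latest_review = max(reviews, key=lambda p: p.year)
    # Start from the most recent review, then add primary papers published
    # in or after the review's year, which the review may not cover.
    newer = [p for p in papers
             if not p.is_review and p.year >= latest_review.year]
    return [latest_review] + sorted(newer, key=lambda p: p.year)
```

The review itself is returned first so the curator can check its reference list for important primary literature before turning to the newer papers.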
5 Metrics: assessing breadth and depth of annotations
- Breadth
 - Number of genes (protein coding and functional RNAs, based on SO)
 - Number of genes with some functional annotation
 - Number of genes with functional annotation based on experiments using that organism
 - Number of genes with function inferred by sequence similarity
 - Number of genes with function inferred by electronic annotations
 - Number of genes for which there is no available information (root/ND annotations)
- Depth
 - Number of papers linked to a gene
 - Number of papers used to produce functional annotation
 - Number of papers read but for which no new annotations were produced
 - Ratio of deepest annotation to leaf node, to measure granularity and use of the ontology (Suzi)
http://gocwiki.geneontology.org/index.php/Metrics_breath_and_depth_of_annotations
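As a rough illustration of how the breadth counts could be tallied, the sketch below groups annotations by GO evidence code. It makes several assumptions: annotations arrive as `(gene_id, go_id, evidence_code)` tuples, "based on experiments" is approximated by the experimental evidence codes, and a gene counts as having "no available information" when all of its annotations carry the ND code. The function name `breadth_metrics` and the gene identifiers in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# GO evidence codes; mapping them onto the slide's categories is an
# assumption of this sketch.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
SEQUENCE_SIMILARITY = {"ISS", "ISO", "ISA", "ISM"}
ELECTRONIC = {"IEA"}
NO_DATA = {"ND"}

Annotation = Tuple[str, str, str]  # (gene_id, go_id, evidence_code)


def breadth_metrics(all_genes: Set[str],
                    annotations: Iterable[Annotation]) -> Dict[str, int]:
    """Count genes per breadth category; genes outside all_genes are ignored."""
    codes_by_gene = defaultdict(set)
    for gene_id, _go_id, code in annotations:
        codes_by_gene[gene_id].add(code)

    def codes(gene: str) -> Set[str]:
        return codes_by_gene.get(gene, set())

    def genes_with(group: Set[str]) -> int:
        # A gene is counted once if it has any evidence code from the group.
        return sum(1 for g in all_genes if codes(g) & group)

    return {
        "total_genes": len(all_genes),
        # At least one annotation that is not a root/ND placeholder.
        "with_annotation": sum(1 for g in all_genes if codes(g) - NO_DATA),
        "experimental": genes_with(EXPERIMENTAL),
        "sequence_similarity": genes_with(SEQUENCE_SIMILARITY),
        "electronic": genes_with(ELECTRONIC),
        # Annotated, but only with ND.
        "no_information": sum(
            1 for g in all_genes if codes(g) and codes(g) <= NO_DATA
        ),
    }


if __name__ == "__main__":
    genes = {"g1", "g2", "g3", "g4", "g5"}  # made-up gene identifiers
    anns = [
        ("g1", "GO:0003674", "ND"),
        ("g2", "GO:0005515", "IPI"),
        ("g2", "GO:0016301", "IEA"),
        ("g3", "GO:0016301", "ISS"),
        ("g4", "GO:0016301", "IEA"),
    ]
    print(breadth_metrics(genes, anns))
```

The categories overlap by design: a gene with both an experimental and an electronic annotation is counted in both rows, which matches the way the slide lists the counts separately.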
6 Figures for Reference Genome genes (completely fictitious data)
Figures for Whole Genome genes (completely fictitious data)
7 Mike Cherry
8 (No Transcript)
9 Measuring Information Content
Chris Mungall
10 Information Content from an Ontology Perspective
11 Priorities: selection of curation targets
- Genes that, when mutated, cause a disease
 - Not included: "upregulated in cancer x", "interacts with tumor suppressor y", and other weak evidence
- Disease gene lists
 - OMIM
 - RGD disease portal (first group: neurological diseases)
- Other lists
 - List of genes common to human, fly and zebrafish that were being used as a test case for PATO annotations; many were not in OMIM (revisit?)
- Current status: trying to focus on genes with the broadest interest; however, these often lack orthologs in yeast, E. coli, etc., so we need to balance these factors.
- ADVICE: HOW TO BALANCE?
 
12 Orthologs
- Curators for each database are responsible for identifying orthologs of the selected gene (currently prioritized by OMIM disease set)
- Available tools
 - YOGY
 - InParanoid
 - OrthoMCL
 - TreeFam
 - HomoloGene
- Sequence analysis by curators
- REFERENCE GENOME MEETING WED & THURS: major discussion topic
13 Software
- Google spreadsheet
 - shared by all curators
 - each database keeps track of putative orthologs
 - each database records the curation status for each gene
- Software requirements
 - Ensure consistent use of identifiers
 - Allow loading of MOD reports
 - Track that no ortholog was found
 - Provide reports to focus curation effort
 - Record that curation is 'comprehensive' as of a certain date
 - Allow a 1:many relation between human gene and MOD ortholog
 - Record orthology determination method
- Software Group currently developing
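Several of the requirements above (the 1:many relation, tracking that no ortholog was found, the 'comprehensive as of' date, and the orthology determination method) can be pictured as a small data model. The sketch below is hypothetical and is not the software under development: the class names, field names, status strings, and identifiers are all invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Ortholog:
    """One MOD gene judged orthologous to a human target gene."""
    gene_id: str                                # MOD identifier (format assumed)
    method: str                                 # orthology determination method
    comprehensive_as_of: Optional[date] = None  # set once curation is complete


@dataclass
class TargetGene:
    """A human gene with its orthologs, keyed by database (1:many)."""
    human_gene_id: str
    # An empty list records that the database looked and found no ortholog;
    # a missing key means the database has not reviewed the gene yet.
    orthologs: Dict[str, List[Ortholog]] = field(default_factory=dict)

    def add_ortholog(self, database: str, ortholog: Ortholog) -> None:
        self.orthologs.setdefault(database, []).append(ortholog)

    def record_no_ortholog(self, database: str) -> None:
        if self.orthologs.get(database):
            raise ValueError(f"{database} already lists orthologs")
        self.orthologs[database] = []

    def status(self, database: str) -> str:
        if database not in self.orthologs:
            return "not reviewed"
        entries = self.orthologs[database]
        if not entries:
            return "no ortholog found"
        if all(o.comprehensive_as_of is not None for o in entries):
            return "comprehensive"
        return "in progress"


if __name__ == "__main__":
    target = TargetGene("HUMAN:gene1")  # made-up identifier
    target.add_ortholog("MOD_A", Ortholog("MOD_A:g1a", "InParanoid"))
    target.add_ortholog("MOD_A", Ortholog("MOD_A:g1b", "TreeFam", date(2007, 1, 1)))
    target.record_no_ortholog("MOD_B")
    for db in ("MOD_A", "MOD_B", "MOD_C"):
        print(db, target.status(db))
```

Keeping "no ortholog found" distinct from "not reviewed" is the point of the empty-list convention: a report can then separate genes that still need attention from genes a database has already ruled out.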
 
14 http://rails-dev.bioinformatics.northwestern.edu:24000/index.html
15 Annotation Progress
- Curation software will be able to generate annotation progress information
- We would like to display the list of selected genes, the list of identified orthologs, the curation status and a way to access annotations (graphs)
16 Annotation Consistency: comparing annotations
17 (No Transcript)
18 Ontology development
Number of SourceForge requests in the "Reference Genome" group
19 Outreach: publicizing the reference genome effort
- Several suggestions
 - GO newsletter (already has the gene of the quarter): could add diseases
 - NCBI/OMIM could display/advertise genes with annotations
 - Take advantage of user requests that fit well with the initiative
 - Set up a reference genome wiki page showing which genes are coming up for annotation, which could also be used by researchers to suggest target genes
 - Make a page on the GO website that would include the disease genes we are curating and the gene of the quarter articles
 - Special display in AmiGO
 - Provide annotations in a separate file
 - Mark disease genes specifically in MODs

http://gocwiki.geneontology.org/index.php/Outreach_publicizing_the_project_and_developing_a_web_presence
20 Goals for Upcoming Year
- Continued curation 
 - at least 250 additional genes 
 - Review priorities for target selection 
 - Software and database implemented 
 - Increased visibility 
 - Web presence 
 - Paper 
 - Integration with GO database 
 - Meetings 
 - Metrics established and tracked 
 
21 Questions for Advisors
- What criteria should we use to collect and prioritize genes for the reference genomes?
- What measures would be effective for assessing progress on the reference genome projects?
- What level of accuracy and detail should we aim for in individual genes, and what coverage of all functional elements across the genome?