Title: Genetic Literature Curation at FlyBase-Cambridge
A Database of Drosophila Genes & Genomes
- Steven Marygold
- ABC meeting, December 2007
Talk Outline
- Group Structure
- The FlyBase bibliography
- Prioritizing curation
- Curation practice
- Curation support
- Future directions
Group structure
FlyBase
- FB-Indiana
  - website
  - fly stocks
  - image curation
- FB-Harvard
  - database
  - genome annotation
  - expression curation
- FB-Cambridge
  - bibliography
  - gene and phenotype curation
  - ontologies
Principal Investigators: Michael Ashburner, Nick Brown
Group Manager: Steven Marygold
GO Curator: 1 FTE
Reactome Curator: 1 FTE
Literature Curators: 3.25 FTEs
Developer: 1 FTE
FB Ontology Editor: 0.25 FTE
Bibliography
- Semi-automated search of publication databases
  - Medline, BIOSIS, ZooRec
  - Search for the string "Drosophil" in title, abstract or keywords
- Manual searches of journal issues
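Because "Drosophil" works as a stem, a single substring test can catch Drosophila, Drosophilidae and similar words. Below is a minimal Python sketch of such a filter. It assumes a hypothetical `Citation` record with title, abstract and keyword fields. Real Medline, BIOSIS or ZooRec exports have their own formats.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    # Hypothetical, simplified citation record; not a real database export format.
    title: str
    abstract: str = ""
    keywords: list[str] = field(default_factory=list)


def mentions_stem(rec: Citation, stem: str = "drosophil") -> bool:
    """True if the stem occurs, case-insensitively, in the title, abstract or any keyword."""
    texts = [rec.title, rec.abstract, *rec.keywords]
    return any(stem in text.lower() for text in texts)


def candidate_papers(records: list[Citation]) -> list[Citation]:
    """Keep records that mention the stem; each hit still needs a curator's judgement."""
    return [r for r in records if mentions_stem(r)]


records = [
    Citation("Eye development in Drosophila melanogaster"),
    Citation("Wing shape evolution", keywords=["Drosophilidae"]),
    Citation("Zebrafish fin regeneration", abstract="No flies were used."),
]
hits = candidate_papers(records)
print([r.title for r in hits])
```

Matching a stem rather than whole words favours recall over precision. Any false positives are left to manual triage.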
Curation prioritization
- Types of publication curated
  - Primary research papers
  - Supplemental information
  - Errata
  - Personal communications to FlyBase
  - Conference abstracts
  - Reviews
  - Books/Book chapters
  - Miscellaneous others
Curation prioritization
- Prioritization of selected journals
  - Set of 50 journals publishing on Drosophila biology
  - Chronological, issue-by-issue curation
- Prioritization of selected papers
  - Flagged by skim curation
  - Flagged by stock center
  - Genes prioritized by GO project
  - Alerted to by research community
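One way to picture the two prioritization routes is a single queue in which flagged papers move ahead of the chronological journal backlog. The Python sketch below builds that queue with `heapq`. The source names, their ranks and the paper IDs are illustrative assumptions, not FlyBase's actual weighting.

```python
import heapq
from itertools import count

# Hypothetical ranks: lower number = curated sooner. Illustrative only.
SOURCE_RANK = {
    "community_alert": 0,
    "stock_center": 1,
    "go_priority_gene": 1,
    "skim_flag": 2,
    "journal_backlog": 3,  # chronological, issue-by-issue queue
}


class CurationQueue:
    """Priority queue of papers; ties keep their insertion (chronological) order."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._tie = count()  # unique counter, so paper IDs are never compared

    def add(self, paper_id: str, source: str) -> None:
        heapq.heappush(self._heap, (SOURCE_RANK[source], next(self._tie), paper_id))

    def next_paper(self) -> str:
        return heapq.heappop(self._heap)[2]


q = CurationQueue()
q.add("paper_A", "journal_backlog")
q.add("paper_B", "skim_flag")
q.add("paper_C", "community_alert")
q.add("paper_D", "journal_backlog")
order = [q.next_paper() for _ in range(4)]
print(order)
```

The tie-breaking counter keeps journal papers in the order they were queued. This mirrors issue-by-issue curation within a single priority level.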
Curation practice
- Read abstract; skim-read introduction
- Identify/select relevant paper
- Access PDF
- Highlight curatable material within Results, Methods, Figure legends and Tables
- Curate material into individual proformae to form a curation record
- Error-checking
  - spelling
  - consistency
  - validity
- Completed records submitted for loading into the Chado database
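A small, hypothetical record model can illustrate the consistency step. Here a curation record groups the proformae for one publication, and a check reports any allele whose parent gene has no gene proforma in the same record. The class names, fields, allele symbols and the rule itself are assumptions for illustration, not the FlyBase record format.

```python
from dataclasses import dataclass, field


@dataclass
class GeneProforma:
    symbol: str  # hypothetical: only the field needed for this check


@dataclass
class AlleleProforma:
    symbol: str
    gene: str  # symbol of the gene the allele belongs to


@dataclass
class CurationRecord:
    publication: str
    genes: list[GeneProforma] = field(default_factory=list)
    alleles: list[AlleleProforma] = field(default_factory=list)

    def consistency_errors(self) -> list[str]:
        """Report alleles whose gene has no gene proforma in this record."""
        curated = {g.symbol for g in self.genes}
        return [
            f"allele {a.symbol}: gene {a.gene} has no gene proforma in this record"
            for a in self.alleles
            if a.gene not in curated
        ]


record = CurationRecord(publication="example paper")
record.genes.append(GeneProforma("ey"))
record.alleles.append(AlleleProforma("ey[example]", gene="ey"))
record.alleles.append(AlleleProforma("toy[example]", gene="toy"))
print(record.consistency_errors())
```
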
Curation practice
- Curated data classes (proforma types)
  - Publication
  - Gene
  - Allele
  - Aberration
  - Transgenic constructs
  - Transgenic insertions
  - Natural transposons
Curation practice
- Gene-level curated data
  - valid FlyBase gene symbol/name
  - gene symbol/name used in paper
  - action: gene rename or merge
  - action: creation or deletion of gene
  - etymology of gene name
  - Sequence Ontology (SO) terms
  - cytological map position
  - relationship to cDNA/genomic clone
  - Gene Ontology (GO) terms
  - y/n flags to indicate whether the paper has expression or annotation information
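Recording both the valid FlyBase symbol and the symbol used in the paper implies a lookup from one to the other. Below is a minimal sketch of that lookup. It assumes a hypothetical in-memory synonym table; in practice the mapping would come from current database data. The lookup is case-sensitive because Drosophila symbols that differ only in case can name different genes, for example Dl (Delta) and dl (dorsal).

```python
from typing import Optional

# Hypothetical synonym table mapping symbols/names seen in papers to valid
# FlyBase symbols. Real curation would load this from database dumps.
SYNONYMS = {
    "ey": "ey",
    "eyeless": "ey",
    "Dl": "Dl",       # Delta
    "Delta": "Dl",
    "dl": "dl",       # dorsal: differs from Dl only in case
    "dorsal": "dl",
}


def resolve_symbol(symbol_in_paper: str) -> Optional[str]:
    """Return the valid symbol, or None when a curator must decide
    (new gene, rename/merge, or an error in the paper)."""
    # Exact, case-sensitive lookup: lowercasing would conflate Dl and dl.
    return SYNONYMS.get(symbol_in_paper)


for s in ["eyeless", "dl", "Dl", "DL"]:
    print(s, "->", resolve_symbol(s))
```
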
Curation practice
- Allele-level curated data
  - valid FlyBase allele symbol/name
  - allele symbol/name used in paper
  - action: allele rename or merge
  - action: creation or deletion of allele
  - allele class
  - mutagen
  - nucleotide/amino acid changes
  - phenotype: class, anatomy, free text
  - genetic interaction: class, anatomy, free text
  - complementation data
  - associated transgenic construct/insertion
  - associated tag
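Phenotype and genetic-interaction entries both combine controlled terms with free text. The hypothetical Python data model below sketches that pattern. The field names, allele symbols and example values are illustrative, not the allele proforma's actual fields.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class PhenotypeStatement:
    # Hypothetical: a controlled phenotype class, an anatomy term, and free text.
    phenotype_class: Optional[str] = None
    anatomy: Optional[str] = None
    free_text: str = ""


@dataclass
class AlleleRecord:
    valid_symbol: str
    symbol_in_paper: str
    allele_class: Optional[str] = None
    mutagen: Optional[str] = None
    phenotypes: list[PhenotypeStatement] = field(default_factory=list)

    def anatomy_terms(self) -> set[str]:
        """Collect every anatomy term used in this allele's phenotype statements."""
        return {p.anatomy for p in self.phenotypes if p.anatomy}


allele = AlleleRecord(valid_symbol="ey[example]", symbol_in_paper="ey-ex")
allele.phenotypes.append(PhenotypeStatement("visible", "eye"))
allele.phenotypes.append(
    PhenotypeStatement(anatomy="anterior vertical bristle", free_text="(free-text description)")
)
print(sorted(allele.anatomy_terms()))
```

Keeping the controlled terms in their own fields means they can be checked against ontology files separately from the free text.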
Curation practice
! GENE PROFORMA Version 50 05 Oct 2007!
! G1a. Gene symbol to use in database :ey
! G1b. Gene symbol used in reference :ey
! G24a. GO -- Cellular component evidence CV :
! G24b. GO -- Molecular function evidence CV :calcium channel activity ; GO:0005262 | IDA
! G24c. GO -- Biological process evidence CV :eye-antennal disc development ; GO:0035214 | IMP
! ALLELE PROFORMA Version 39 6 July 2007!
! GA1a. Allele symbol to use in database :ey46
! GA1b. Allele symbol used in paper :ey461
! GA56. Phenotypic dominance class bipartite CV :visible | recessive
! GA17. Phenotype CV, body part(s) where manifest :eye, anterior vertical bristle
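A proforma layout of this kind lends itself to simple line-based parsing. The minimal Python sketch below assumes each field sits on one line as `! <code>. <label> :<value>` and that GO values read `term ; GO:ID | evidence code`. Both formats are assumptions here. Real proformae, with multi-line values and repeated fields, would need more care.

```python
import re

# Assumed field line: "! <code>. <label> :<value>". Codes are capital letters,
# digits and an optional lowercase suffix (e.g. G1a, GA56); header lines do not match.
FIELD_RE = re.compile(r"^!\s*(?P<code>[A-Z]+\d+[a-z]?)\.\s*(?P<label>.*?)\s*:(?P<value>.*)$")


def parse_proforma(text: str) -> dict[str, str]:
    """Map field codes to values, skipping header lines and empty fields."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line.strip())
        if m and m.group("value").strip():
            fields[m.group("code")] = m.group("value").strip()
    return fields


def split_go_value(value: str) -> tuple[str, str, str]:
    """Split an assumed 'term ; GO:ID | evidence' value into its three parts."""
    term, _, rest = value.partition(";")
    go_id, _, evidence = rest.partition("|")
    return term.strip(), go_id.strip(), evidence.strip()


sample = """! GENE PROFORMA
! G1a. Gene symbol to use in database :ey
! G1b. Gene symbol used in reference :ey
! G24a. GO -- Cellular component evidence CV :
! G24b. GO -- Molecular function evidence CV :calcium channel activity ; GO:0005262 | IDA
"""
fields = parse_proforma(sample)
print(fields)
print(split_go_value(fields["G24b"]))
```

The lazy label match stops at the first colon. This works because the labels contain no colon, while a GO ID in the value does.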
Curation support
- Curation support files
  - Text files of data from latest DB instance
  - Ontology files
    - GO, SO, FB-anatomy, FB-phenotypes etc.
- PeeVeS (Proforma Validation Software)
- Other custom scripts
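The hypothetical Python validator below illustrates what checking parsed proforma fields against support files can involve. It is not PeeVeS. The checked field codes, the hard-coded sets standing in for support and ontology files, and the assumed `term ; GO:ID | evidence` value format are all assumptions. The evidence-code set is a subset of the standard GO evidence codes.

```python
# Stand-ins for curation support files (hypothetical contents). Real data would
# come from text dumps of the latest database instance and from ontology files.
VALID_GENE_SYMBOLS = {"ey", "toy"}
GO_IDS = {"GO:0005262", "GO:0035214"}
GO_EVIDENCE_CODES = {"IDA", "IMP", "IPI", "IGI", "IEP", "ISS", "TAS", "NAS", "IC", "ND", "IEA", "RCA"}


def validate(fields: dict[str, str]) -> list[str]:
    """Return human-readable problems found in a parsed gene proforma."""
    errors = []
    symbol = fields.get("G1a")
    if symbol is not None and symbol not in VALID_GENE_SYMBOLS:
        # Could be a typo, or a genuinely new gene that needs a curator's decision.
        errors.append(f"G1a: unknown gene symbol {symbol!r}")
    for code in ("G24a", "G24b", "G24c"):
        value = fields.get(code)
        if not value:
            continue
        _, _, rest = value.partition(";")
        go_id, _, evidence = rest.partition("|")
        go_id, evidence = go_id.strip(), evidence.strip()
        if go_id not in GO_IDS:
            errors.append(f"{code}: unknown GO ID {go_id!r}")
        if evidence not in GO_EVIDENCE_CODES:
            errors.append(f"{code}: unknown evidence code {evidence!r}")
    return errors


problems = validate({
    "G1a": "ey",
    "G24b": "calcium channel activity ; GO:0005262 | IDA",
    "G24c": "eye-antennal disc development ; GO:0035214 | XYZ",
})
print(problems)
```
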
Future directions
- More paper-by-paper prioritization
  - Skim curation
  - Manual curation
  - Automated curation?
- User-submitted data
- Use of text-mining aids for deep curation
- Review breadth and depth of curation
- Enhanced curation interface
Acknowledgements
- FB-Cambridge
  - Michael Ashburner (co-PI)
  - Nick Brown (co-PI)
  - Steven Marygold (Manager)
  - Gillian Millburn (Literature curator)
  - David Osumi-Sutherland (Ontology Editor and Literature curator)
  - Ruth Seal (Literature curator)
  - Peter McQuilton (Literature curator)
  - Paul Leyland (Developer)
  - Susan Tweedie (GO curator)
  - Mark Williams (Reactome curator)
  - Rachel Drysdale (former FB-Cambridge co-PI)
- Genetics Dept., University of Cambridge, UK
- The FlyBase Consortium
- NHGRI at the NIH