Genome annotation techniques: new approaches and challenges, Drug Discovery Today, Volume 7, Issue 11, 6 May 2002, Pages 570-576. Alistair G. Rust, Emmanuel Mongin and Ewan Birney

1
Genome annotation techniques: new approaches and challenges
Presented by Haili Ping
  • Genome annotation techniques: new approaches and
    challenges, Drug Discovery Today, Volume 7, Issue
    11, 6 May 2002, Pages 570-576. Alistair G. Rust,
    Emmanuel Mongin and Ewan Birney

2
  • The exponential increase in the amount of human
    genomic sequence, and of genomes from other
    species, needs to be matched by increases in the
    accurate annotation of this huge variety of
    genomes
  • Accurate annotation of the human genome and the
    genomes of other species is an essential element
    in supporting current drug discovery efforts
  • Bioinformatics solutions are increasingly
    required to develop automatic annotation
    techniques that support and complement the manual
    curation process
3
  • Automatic genome annotation pipelines
  • Primary goal is to deliver highly accurate and
    reliable genome annotations, using the widest
    range of evidence from available databases.
  • In essence, pipelines integrate suites of
    bioinformatics software tools with multiple
    databases to manage the analysis and storage of
    genomic sequence automatically
  • Trend
  • single-algorithm methods → consensus-based
    approaches
  • the combined results of gene predictors and
    similarity search methods are used
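A minimal sketch can illustrate the shift towards consensus-based approaches. It assumes that every evidence source (an ab initio predictor, a protein similarity search, a cDNA alignment) reports candidate exons as (start, end) coordinates. An exon is kept only when at least `min_support` independent sources agree on it. The function name, source names and coordinates are hypothetical, and exact coordinate matching is a deliberate simplification. This is not the rule set of any particular pipeline.

```python
from collections import defaultdict

# An exon is a (start, end) pair, 1-based inclusive. Requires Python 3.9+.
Exon = tuple[int, int]


def consensus_exons(evidence: dict[str, list[Exon]], min_support: int = 2) -> list[Exon]:
    """Return exons reported by at least `min_support` distinct evidence sources.

    Exons must match exactly; real systems tolerate small boundary
    differences and weight evidence types differently.
    """
    support: dict[Exon, set[str]] = defaultdict(set)
    for source, exons in evidence.items():
        for exon in exons:
            support[exon].add(source)  # a set, so repeats from one source count once
    return sorted(exon for exon, sources in support.items() if len(sources) >= min_support)


if __name__ == "__main__":
    evidence = {
        "ab_initio": [(100, 250), (400, 520), (900, 1010)],
        "protein_similarity": [(100, 250), (400, 520)],
        "cdna_alignment": [(400, 520), (1500, 1600)],
    }
    print(consensus_exons(evidence))  # prints [(100, 250), (400, 520)]
```

Raising `min_support` to 3 in this example would keep only (400, 520). That trades sensitivity for specificity.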

4
The generic structure of an automatic genome
annotation pipeline and delivery system
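The generic structure in this figure can be approximated in a few lines of code:

- Analysis stages run in order on each sequence.
- Each stage can pass a modified sequence (for example a masked one) to the next.
- Every feature is written to a shared results database.
- A delivery step exports the stored results.

This is a toy model under explicit assumptions. `mask_repeats` and `find_start_codons` are hypothetical stand-ins for real tools such as RepeatMasker and gene predictors. The SQLite table layout and the tab-separated export format are invented for illustration.

```python
import sqlite3
from typing import Callable

# A feature is (feature_type, start, end), 1-based inclusive. Requires Python 3.9+.
Feature = tuple[str, int, int]
# A stage receives (seq_id, sequence) and returns (sequence_for_next_stage, features).
Stage = Callable[[str, str], tuple[str, list[Feature]]]


def mask_repeats(seq_id: str, seq: str) -> tuple[str, list[Feature]]:
    """Hypothetical masker: replaces poly-A runs of length >= 6 with 'N'."""
    out, features, i = list(seq), [], 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == "A":
            j += 1
        if j - i >= 6:
            out[i:j] = "N" * (j - i)
            features.append(("repeat", i + 1, j))
        i = max(j, i + 1)
    return "".join(out), features


def find_start_codons(seq_id: str, seq: str) -> tuple[str, list[Feature]]:
    """Hypothetical predictor stand-in: reports every forward-strand ATG."""
    hits = [("start_codon", i + 1, i + 3) for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]
    return seq, hits


def run_pipeline(sequences: dict[str, str], stages: list[Stage], db: sqlite3.Connection) -> None:
    """Run every stage on every sequence and store the resulting features."""
    db.execute("CREATE TABLE IF NOT EXISTS feature "
               "(seq_id TEXT, analysis TEXT, type TEXT, start_pos INTEGER, end_pos INTEGER)")
    for seq_id, seq in sequences.items():
        for stage in stages:
            seq, features = stage(seq_id, seq)  # e.g. the masked sequence feeds the next stage
            db.executemany("INSERT INTO feature VALUES (?, ?, ?, ?, ?)",
                           [(seq_id, stage.__name__, t, s, e) for t, s, e in features])
    db.commit()


def export_tsv(db: sqlite3.Connection) -> str:
    """Delivery step: dump stored features as tab-separated lines."""
    rows = db.execute("SELECT seq_id, analysis, type, start_pos, end_pos FROM feature "
                      "ORDER BY seq_id, start_pos")
    return "\n".join("\t".join(map(str, row)) for row in rows)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    run_pipeline({"contig1": "CC" + "A" * 7 + "GATGCC"}, [mask_repeats, find_start_codons], db)
    print(export_tsv(db))
```

Masking runs first, so an ATG that lies inside a masked repeat is never reported. Adding or reordering analyses only changes the `stages` list.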
5
Box 1. Useful human genome annotation and browser URLs
Automated annotation pipelines
  • EBI/Sanger Institute Ensembl Project:
    http://www.ensembl.org/Homo_sapiens/
  • NCBI Human Genome Browser:
    http://proxy.library.uiuc.edu:3367/genome/guide/human/
  • The Oak Ridge National Laboratories Genome Channel:
    http://compbio.ornl.gov/channel/
  • Celera Discovery System: http://cds.celera.com/
  • Incyte Genomics Genomics Knowledge Platform:
    http://www.incyte.com/incyte_science/technology/gkp/
  • Paracel GeneMatcher2 System:
    http://www.paracel.com/products/gm2.html
Human genome browsers
  • UCSC Human Genome Browser:
    http://genome.cse.ucsc.edu/cgi-bin/hgGateway/
  • Softberry Genome Explorer:
    http://www.softberry.com/berry.phtml?topic=genomexp
  • Viaken Enterprise Ensembl Solution:
    http://www.viaken.com/ns/solutions/ensembl.html
  • LabBook Inc. Genomic Explorer Suite:
    http://www.labbook.com/products/ExplorerSuite.asp
  • University of Tokyo Gene Resource Locator Browser:
    http://grl.gi.k.u-tokyo.ac.jp/
Other useful sites
  • The Institute for Genomic Research (TIGR):
    http://www.tigr.org/
  • Human Genome Central:
    http://www.ensembl.org/genome/central/ and
    http://proxy.library.uiuc.edu:3528/genome/central/
6
  • From raw sequence to gene predictions
  • Raw sequence pre-processing
  • masking known repeats and low-complexity
    sequences using RepeatMasker
  • identifying homology matches using BLAST
  • scanning for other features, such as
    sequence-tagged site (STS) markers and CpG
    islands
  • Gene prediction
  • predictions based on protein matches
  • predictions based on DNA sequence
  • ab initio gene prediction programs
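Of the pre-processing scans listed above, CpG island detection is simple enough to sketch. The version below slides a fixed window along the sequence and flags windows that pass the classic Gardiner-Garden and Frommer style thresholds. Those thresholds are GC content above 50% and an observed/expected CpG ratio above 0.6 over 200 bp. Overlapping windows are then merged. This is an illustrative assumption, not the scanner used by any particular pipeline. Production tools apply refined criteria and faster algorithms.

```python
def cpg_windows(seq: str, window: int = 200, min_gc: float = 0.5,
                min_obs_exp: float = 0.6) -> list[tuple[int, int]]:
    """Return merged 1-based intervals covered by windows that look like CpG islands.

    A window passes when its GC fraction exceeds `min_gc` and its
    observed/expected CpG ratio, CpG * window / (C * G), exceeds `min_obs_exp`.
    """
    seq = seq.upper()
    hits: list[tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        cpg = w.count("CG")  # "CG" cannot overlap itself, so str.count is exact
        gc_fraction = (c + g) / window
        obs_exp = cpg * window / (c * g) if c and g else 0.0
        if gc_fraction > min_gc and obs_exp > min_obs_exp:
            hits.append((start + 1, start + window))
    # Merge overlapping or adjacent passing windows into islands.
    merged: list[tuple[int, int]] = []
    for s, e in hits:
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

Windows that only partly overlap a CpG-rich core can still pass, so the merged interval extends beyond the core itself. Real tools trim island boundaries.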

7
A simplified schematic of algorithmic gene
prediction
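The simplest form of signal-based gene prediction is an open reading frame (ORF) scan. It looks for an ATG start codon followed, in the same frame, by a stop codon far enough downstream. Real ab initio programs go much further and model splice sites, codon usage and exon-intron structure, typically with hidden Markov models. The sketch below is therefore only a toy illustration of the idea on both strands. The function names and the `min_codons` threshold are hypothetical choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Find ATG...stop ORFs on both strands.

    Returns (strand, start, end) in 1-based inclusive forward-strand
    coordinates; `min_codons` counts the stop codon. Only the first ATG
    of each ORF is used, so nested ORFs are not reported.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append(("+", start + 1, i + 3))
                        else:  # map reverse-strand positions back to the forward strand
                            orfs.append(("-", n - i - 2, n - start))
                    start = None
    return sorted(orfs, key=lambda orf: (orf[1], orf[2]))
```

Reporting reverse-strand ORFs in forward-strand coordinates keeps all predictions on one coordinate system. That is convenient when they are later stored alongside other features.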
8
  • Gene function characterization
  • Mapping to known genes
  • RefSeq and SWISS-PROT
  • HUGO (NCBI, UCSC and Ensembl)
  • Protein domain annotation
  • Pfam, PRINTS, PROSITE, ProDom, BLOCKS and SMART
  • The InterPro project creates a unique
    characterization for a given protein family,
    domain or functional site by integrating protein
    signatures from member databases. Domains in
    protein sequences can then be identified using
    this signature method. InterPro provides the
    least redundant and most extensive annotation
    currently available
  • Gene ontology
  • The Gene Ontology (GO) project aims to define
    common terms to specify molecular function,
    biological process and cellular location
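PROSITE-style patterns make signature-based domain annotation concrete. They are one of the signature types that InterPro integrates; others, such as Pfam's hidden Markov models, cannot be written this way. The sketch below translates a pattern into a Python regular expression so that a protein sequence can be scanned with it. It covers only the common parts of the pattern syntax:

- residues and `x`
- `[...]` and `{...}` sets
- repeat counts
- terminal anchors
- the trailing period

The example patterns and sequences are illustrative rather than taken from a specific database entry.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles residues, 'x' (any residue), [..] (allowed set), {..} (forbidden
    set), repeats such as x(3) or x(2,4), '<' / '>' terminal anchors and a
    trailing '.'. Anchors inside brackets, e.g. [G>], are not supported.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"
        # [..] sets and single residues are already valid regex syntax
        repeat = "" if low is None else "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)
```

For example, `prosite_to_regex("<M-{P}-x-[ST]>.")` yields `^M[^P].[ST]$`. Passing the result to `re.search(regex, protein)` then returns the first matching site, or `None`.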

9
  • Sharing genome annotations
  • Website display and ftp sites
[Figure: Chromosome 20 Overview]

11
  • Pros: website display does not require expert
    bioinformatics skills, so it is accessible to a
    wide range of researchers wishing to gain access
    to genomic annotation
  • Cons: it makes large-scale data mining difficult
  • Solution: enable more experienced users to
    retrieve the data they require and to run
    analyses locally
  • Open annotation
  • Researchers need access to the annotations
    available in the community, and need to share
    their own contributions with the community
  • This requires a common protocol between systems
    that enables genome data to be freely exchanged
  • Examples: the AGAVE (Architecture for Genomic
    Annotation, Visualization and Exchange) and the
    Distributed Annotation System (DAS) projects
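The sketch below parses a features response of the kind served by DAS to illustrate what a common exchange protocol buys. It uses only Python's standard XML parser. The element names (`DASGFF`, `SEGMENT`, `FEATURE`, `TYPE`, `METHOD`, `START`, `END`, `ORIENTATION`) follow our understanding of the DAS 1.x features document and should be checked against the DAS specification. The response itself is invented, including the server URL and the segment and feature identifiers.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class DasFeature:
    segment_id: str
    feature_id: str
    feature_type: str
    method: str
    start: int
    end: int
    orientation: str


def parse_das_features(xml_text: str) -> list[DasFeature]:
    """Flatten every FEATURE of every SEGMENT in a DAS-style features document."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feat in segment.iter("FEATURE"):
            type_el = feat.find("TYPE")
            features.append(DasFeature(
                segment_id=segment.get("id", ""),
                feature_id=feat.get("id", ""),
                feature_type=type_el.get("id", "") if type_el is not None else "",
                method=(feat.findtext("METHOD") or "").strip(),
                start=int(feat.findtext("START")),
                end=int(feat.findtext("END")),
                orientation=(feat.findtext("ORIENTATION") or "0").strip(),
            ))
    return features


# Invented example response; server URL and identifiers are placeholders.
SAMPLE_RESPONSE = """<?xml version="1.0" standalone="yes"?>
<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/demo/features">
    <SEGMENT id="20" start="1" stop="5000" version="1.0">
      <FEATURE id="exon_0001" label="exon_0001">
        <TYPE id="exon" category="transcription">exon</TYPE>
        <METHOD id="genscan">genscan</METHOD>
        <START>1200</START>
        <END>1350</END>
        <SCORE>-</SCORE>
        <ORIENTATION>+</ORIENTATION>
        <PHASE>0</PHASE>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""
```

Every server answers in the same structure, so a client can overlay features from several independent annotation sources on the same reference segment.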

12
  • Challenges facing automatic annotation systems
  • Data warehousing: a solution for large-scale data
    mining
  • First, the desired query might be too complex to
    implement
  • Second, for queries performed on large,
    monolithic databases, the computing power needed
    might in most cases be too expensive
  • Solution
  • adopt the data warehousing approach of the
    business sector, which segregates information
    into denormalized databases, enabling fast
    querying and data retrieval
  • a large variety of data-mining tools can then
    extract datasets of interest efficiently, and
    these datasets can feed subsequent stages of
    statistical analysis or data mining
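The sketch below makes the denormalization idea concrete using Python's built-in sqlite3 module. A small normalized schema (genes, transcripts, protein domains) is flattened once into a single wide table, so that later queries need no joins. All table names, columns, gene symbols and the `IPR_DEMO` domain identifiers are hypothetical.

```python
import sqlite3


def build_warehouse(conn: sqlite3.Connection) -> None:
    """Load a tiny normalized schema, then flatten it into one denormalized table."""
    conn.executescript("""
        CREATE TABLE gene (gene_id TEXT PRIMARY KEY, symbol TEXT, chromosome TEXT);
        CREATE TABLE transcript (transcript_id TEXT PRIMARY KEY, gene_id TEXT);
        CREATE TABLE protein_domain (transcript_id TEXT, interpro_id TEXT);

        INSERT INTO gene VALUES ('G1', 'GENEA', '20'), ('G2', 'GENEB', '20'), ('G3', 'GENEC', '21');
        INSERT INTO transcript VALUES ('T1', 'G1'), ('T2', 'G1'), ('T3', 'G2'), ('T4', 'G3');
        INSERT INTO protein_domain VALUES
            ('T1', 'IPR_DEMO1'), ('T2', 'IPR_DEMO1'), ('T3', 'IPR_DEMO2'), ('T4', 'IPR_DEMO1');

        -- The joins are paid for once, at load time, not on every query.
        CREATE TABLE gene_domain_flat AS
        SELECT g.gene_id, g.symbol, g.chromosome, t.transcript_id, d.interpro_id
        FROM gene AS g
        JOIN transcript AS t ON t.gene_id = g.gene_id
        JOIN protein_domain AS d ON d.transcript_id = t.transcript_id;

        CREATE INDEX flat_domain_idx ON gene_domain_flat (interpro_id, chromosome);
    """)


def genes_with_domain(conn: sqlite3.Connection, interpro_id: str, chromosome: str) -> list[str]:
    """Single-table lookup against the denormalized table: no joins at query time."""
    rows = conn.execute(
        "SELECT DISTINCT symbol FROM gene_domain_flat "
        "WHERE interpro_id = ? AND chromosome = ? ORDER BY symbol",
        (interpro_id, chromosome),
    )
    return [row[0] for row in rows]
```

The price is redundancy: gene columns repeat for every transcript and domain. The flat table must also be rebuilt whenever the source tables change, which is why warehouses are typically refreshed in batches.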

13
  • The requirement to remain flexible

The development of automated annotation pipelines
is an evolving process.
  • The quality of sequences and assemblies continues
    to improve, and redundant sequences are replaced
    with new, superior sequences
  • This demands a flexible system in which new,
    individual sequences can be added and analysed
    without disrupting the whole system
  • New, improved algorithms and methodologies
    continue to be developed
  • This demands a pipeline architecture flexible
    enough to incorporate them into the analysis
    process without redesigning the system
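Both flexibility requirements can be sketched with a registry of named analyses and a results store keyed by (sequence, analysis). Adding a new sequence or a new analysis then triggers only the missing computations. Replacing a sequence with an improved version invalidates only that sequence's results. The `Pipeline` class and its methods are hypothetical, and a production system would keep this bookkeeping in a database rather than in memory.

```python
from typing import Any, Callable


class Pipeline:
    """Toy pipeline: named analyses plus a results store keyed by (seq_id, analysis)."""

    def __init__(self) -> None:
        self.analyses: dict[str, Callable[[str], Any]] = {}
        self.sequences: dict[str, str] = {}
        self.results: dict[tuple[str, str], Any] = {}

    def register(self, name: str, func: Callable[[str], Any]) -> None:
        """Plug in a new analysis; existing results are kept."""
        self.analyses[name] = func

    def add_sequence(self, seq_id: str, seq: str) -> None:
        """Add a sequence, or replace an old one and drop only its stale results."""
        self.sequences[seq_id] = seq
        for key in [k for k in self.results if k[0] == seq_id]:
            del self.results[key]

    def run_pending(self) -> list[tuple[str, str]]:
        """Run only (sequence, analysis) pairs without a stored result; return what ran."""
        ran = []
        for seq_id, seq in self.sequences.items():
            for name, func in self.analyses.items():
                if (seq_id, name) not in self.results:
                    self.results[(seq_id, name)] = func(seq)
                    ran.append((seq_id, name))
        return ran
```

In use, registering `len` as a new `"length"` analysis and calling `run_pending()` computes lengths for the existing sequences only. Results already stored for other analyses are left untouched.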

14
  • Future opportunities
  • Comparative genomics
  • As more genomes are sequenced and become publicly
    available in the next few years, comparative
    genomics will become one of the greatest areas of
    development
  • Cross-species analysis: human-mouse
  • Protein-coding genes are likely to be highly
    conserved between closely related species (e.g.
    mouse and human); comparison could also elucidate
    other conserved regions, such as RNA genes and
    regulatory regions
  • This creates a need for:
  • the development of bioinformatics tools, such as
    Vista, Synplot and FamilyJewels
  • the integration of such tools with the current
    automated approaches
  • the design of genome browsers and websites that
    can intelligently display and annotate
    comparative results
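Tools such as Vista plot the percentage identity between two aligned genomic sequences in a sliding window and highlight regions above a threshold. A commonly quoted setting is 70% identity over 100 bp. The sketch below computes such conserved regions from a pairwise alignment that is assumed to exist already, with gaps written as '-'. The function name and defaults are illustrative assumptions, not the Vista implementation.

```python
def conserved_regions(aln_a: str, aln_b: str, window: int = 100,
                      min_identity: float = 0.70) -> list[tuple[int, int]]:
    """Return merged 1-based alignment-column intervals whose window identity >= min_identity.

    Identity is the fraction of columns in the window where both sequences
    carry the same residue; gap-gap columns ('-' in both) do not count.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    match = [int(a == b and a != "-") for a, b in zip(aln_a.upper(), aln_b.upper())]
    hits: list[tuple[int, int]] = []
    current = sum(match[:window])  # matches in the first window
    for start in range(len(match) - window + 1):
        if start > 0:  # slide by one column: add the new column, drop the old one
            current += match[start + window - 1] - match[start - 1]
        if current / window >= min_identity:
            hits.append((start + 1, start + window))
    merged: list[tuple[int, int]] = []
    for s, e in hits:
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

The intervals are alignment columns. Mapping them back to positions in either genome requires counting the non-gap characters of that sequence.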

15
  • Integrating and delivering new data
  • Horizontal integration
  • genomic systems should be able to cross-match
    species that can be sensibly compared
  • Vertical integration
  • New flows of data coming from proteomics and
    microarray sources will soon have to be
    incorporated
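Horizontal and vertical integration both come down to attaching new data to a shared gene record. The minimal sketch below attaches two kinds of data to human genes keyed by a common identifier:

- hypothetical ortholog assignments (horizontal)
- microarray expression levels and proteomics peptide counts (vertical)

Identifiers, species and values are invented. The hardest step in practice, mapping identifiers between sources, is assumed to have been done already.

```python
from dataclasses import dataclass, field


@dataclass
class IntegratedGene:
    gene_id: str
    orthologs: dict[str, str] = field(default_factory=dict)     # species -> gene id (horizontal)
    expression: dict[str, float] = field(default_factory=dict)  # condition -> level (vertical)
    peptides_observed: int = 0                                  # proteomics evidence (vertical)


def integrate(gene_ids: list[str],
              orthologs: dict[tuple[str, str], str],
              expression: dict[tuple[str, str], float],
              peptides: dict[str, int]) -> dict[str, IntegratedGene]:
    """Attach per-gene data from independent sources; unknown gene ids are ignored."""
    merged = {gene_id: IntegratedGene(gene_id) for gene_id in gene_ids}
    for (gene_id, species), ortholog_id in orthologs.items():
        if gene_id in merged:
            merged[gene_id].orthologs[species] = ortholog_id
    for (gene_id, condition), level in expression.items():
        if gene_id in merged:
            merged[gene_id].expression[condition] = level
    for gene_id, count in peptides.items():
        if gene_id in merged:
            merged[gene_id].peptides_observed = count
    return merged
```

Silently ignoring unknown identifiers keeps the sketch short. A real system would log them, because they usually signal an identifier-mapping problem.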

16
  • Concluding remarks
  • The use of automatic genome annotation systems
    has increased and is still increasing
  • These systems are grounded upon central cores of
    bioinformatics software tools and associated
    relational databases

Sequenced genomes → integration of new genomes
into the current systems → the demand for openness
in the distribution of annotation data → the
delivery of genomic data in forms suitable for
large-scale data mining
17
References
1. Rust AG, Mongin E, Birney E. Genome annotation techniques: new approaches and challenges. Drug Discovery Today. 2002 May 6;7(11):570-576.
2. Li W, Godzik A. Discovering new genes with advanced homology detection. Trends in Biotechnology. 2002 Aug 1;20(8):315-316.
3. Biswas M, O'Rourke JF, Camon E, Fraser G, Kanapin A, Karavidopoulou Y, Kersey P, Kriventseva E, Mittard V, Mulder N, Phan I, Servant F, Apweiler R. Applications of InterPro in protein annotation and genome analysis. Brief Bioinform. 2002 Sep;3(3):285-95. PMID: 12230037. http://www.ebi.ac.uk/interpro/
4. Loraine AE, Helt GA. Visualizing the genome: techniques for presenting human genome data and annotations. BMC Bioinformatics. 2002 Jul 30;3(1):19. http://www.pubmedcentral.gov/articlerender.fcgi?tool=pubmed&pubmedid=12149135
5. Oshiro G, Wodicka LM, Washburn MP, Yates JR 3rd, Lockhart DJ, Winzeler EA. Parallel identification of new genes in Saccharomyces cerevisiae. Genome Res. 2002 Aug;12(8):1210-20. PMID: 12176929. http://www.genome.org/cgi/content/full/12/8/1210