Genome annotation techniques: new approaches and challenges, Drug Discovery Today, Volume 7, Issue 11, 6 May 2002, Pages 570-576. Alistair G. Rust, Emmanuel Mongin and Ewan Birney. Loraine AE, Helt GA.


1
Genome annotation techniques: new approaches and challenges
Presented by Haili Ping

Genome annotation techniques: new approaches and challenges, Drug Discovery Today, Volume 7, Issue 11, 6 May 2002, Pages 570-576. Alistair G. Rust, Emmanuel Mongin and Ewan Birney. Loraine AE, Helt GA.

2
The exponential increase in the amount of human genomic sequence and of genomes from other species needs to be matched by increases in the accurate annotation of this huge variety of genomes.
Accurate annotation of the human genome and other species is an essential element in supporting current drug discovery efforts.
Bioinformatics solutions are increasingly required to develop automatic annotation techniques to support and complement the manual curation process.
3

Automatic genome annotation pipelines
Primary goal: to deliver highly accurate and reliable genome annotations, using the widest range of evidence from available databases.
Essence: pipelines integrate suites of bioinformatics software tools with multiple databases to manage the analysis and storage of genomic sequence automatically.
Trend: single-algorithm methods → consensus-based approaches, in which the combined results of gene predictors and similarity search methods are used.
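The consensus idea can be illustrated with a minimal Python sketch. It assumes each evidence source (a gene predictor or a similarity search) has already been reduced to a list of half-open coordinate intervals, and it keeps only the regions supported by at least `min_support` sources. The function names, the interval representation and the voting rule are assumptions made for illustration; the slides do not say how any particular pipeline combines its evidence.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in sequence coordinates


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals from a single evidence source."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def consensus_regions(evidence: Iterable[List[Interval]],
                      min_support: int = 2) -> List[Interval]:
    """Return regions covered by at least `min_support` evidence sources."""
    # Sweep line: +1 where a source starts covering, -1 where it stops.
    delta: Counter = Counter()
    for source in evidence:
        for start, end in merge(source):  # each source votes once per base
            delta[start] += 1
            delta[end] -= 1

    regions: List[Interval] = []
    depth = 0
    region_start = 0
    for pos in sorted(delta):
        new_depth = depth + delta[pos]
        if depth < min_support <= new_depth:
            region_start = pos                    # support rises to threshold
        elif new_depth < min_support <= depth:
            regions.append((region_start, pos))   # support falls below threshold
        depth = new_depth
    return regions


# Hypothetical evidence: two ab initio predictors and one similarity search.
predictor_a = [(100, 200), (300, 400)]
predictor_b = [(150, 250)]
similarity_hits = [(120, 180), (350, 450)]
supported = consensus_regions([predictor_a, predictor_b, similarity_hits])
```

Raising `min_support` makes the consensus stricter, trading sensitivity for specificity.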

4
The generic structure of an automatic genome
annotation pipeline and delivery system
5
Box 1. Useful human genome annotation and browser URLs

Automated annotation pipelines
EBI/Sanger Institute Ensembl Project: http://www.ensembl.org/Homo_sapiens/
NCBI Human Genome Browser: http://proxy.library.uiuc.edu:3367/genome/guide/human/
The Oak Ridge National Laboratories Genome Channel: http://compbio.ornl.gov/channel/
Celera Discovery System: http://cds.celera.com/
Incyte Genomics Genomics Knowledge Platform: http://www.incyte.com/incyte_science/technology/gkp/
Paracel GeneMatcher2 System: http://www.paracel.com/products/gm2.html

Human genome browsers
UCSC Human Genome Browser: http://genome.cse.ucsc.edu/cgi-bin/hgGateway/
Softberry Genome Explorer: http://www.softberry.com/berry.phtml?topic=genomexp
Viaken Enterprise Ensembl Solution: http://www.viaken.com/ns/solutions/ensembl.html
LabBook Inc. Genomic Explorer Suite: http://www.labbook.com/products/ExplorerSuite.asp
University of Tokyo Gene Resource Locator Browser: http://grl.gi.k.u-tokyo.ac.jp/

Other useful sites
The Institute for Genomic Research (TIGR): http://www.tigr.org/
Human Genome Central: http://www.ensembl.org/genome/central/ and http://proxy.library.uiuc.edu:3528/genome/central/
6

From raw sequence to gene predictions
Raw sequence pre-processing
Masking known repeats and low-complexity sequences using RepeatMasker
Identifying homology matches using BLAST
Scanning for other features, such as sequence tagged site (STS) markers and CpG islands
Gene prediction
Predictions based on protein matches
Predictions based on DNA sequence
Ab initio gene prediction programs
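As one concrete example of a pre-processing scan, the sketch below flags candidate CpG island windows. It assumes the commonly used criteria of GC content above 50% and an observed/expected CpG ratio above 0.6 in windows of 200 bp; the slides do not state which criteria the pipelines apply, so the thresholds and function names here are illustrative assumptions only.

```python
from typing import List, Tuple


def cpg_stats(window: str) -> Tuple[float, float]:
    """Return (GC content, observed/expected CpG ratio) for one window."""
    seq = window.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def cpg_windows(sequence: str, window: int = 200, step: int = 1,
                min_gc: float = 0.5, min_obs_exp: float = 0.6) -> List[int]:
    """Return start offsets of windows that pass both thresholds (assumed values)."""
    hits = []
    for start in range(0, len(sequence) - window + 1, step):
        gc_content, obs_exp = cpg_stats(sequence[start:start + window])
        if gc_content > min_gc and obs_exp > min_obs_exp:
            hits.append(start)
    return hits


# Toy sequence: AT-rich flanks around a CpG-dense core.
toy = "AT" * 50 + "CG" * 100 + "AT" * 50
candidate_starts = cpg_windows(toy, window=200, step=50)
```

A real scan would merge overlapping hit windows into islands and run on repeat-masked sequence.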

7
A simplified schematic of algorithmic gene
prediction
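The schematic itself is not in the transcript. As a stand-in, here is a deliberately naive sketch of the simplest DNA-based signal a gene predictor can use: an open reading frame from a start codon to the first in-frame stop codon. It scans the forward strand only and ignores splicing, so it illustrates the idea rather than how the ab initio programs used by annotation pipelines actually work; `find_orfs` and `min_codons` are hypothetical names.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 30) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans of forward-strand ORFs.

    An ORF runs from the first ATG in a frame to the next in-frame stop
    codon (stop included in the span). `min_codons` counts the coding
    codons, start codon included and stop codon excluded.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Toy sequence: a six-codon ORF (ATG + 5 x GCT) followed by a TAA stop.
toy_gene = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
toy_orfs = find_orfs(toy_gene, min_codons=3)
```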
8

Gene function characterization
Mapping to known genes
RefSeq and SWISS-PROT
HUGO (NCBI, UCSC and Ensembl)
Protein domain annotation
Pfam, PRINTS, PROSITE, ProDom, BLOCKS and SMART.
The InterPro project creates a unique characterization for a given protein family, domain or functional site. Domains of the protein sequences can then be identified using this signature method. The use of InterPro provides the least redundant and most extensive annotation currently available.
Gene ontology
The Gene Ontology (GO) project aims to define common terms to specify molecular function, biological process and cellular location
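The way integrating member-database signatures reduces redundancy can be sketched as follows. Hits from several signature databases that describe the same domain are collapsed under one integrated entry. All signature and entry identifiers below are invented placeholders, not real accessions, and the merging rule (one span per entry per protein) is a simplification assumed for illustration; it is not InterPro's actual procedure.

```python
from typing import Dict, List, Set, Tuple

Hit = Tuple[str, str, int, int]  # (database, signature id, start, end)

# Hypothetical mapping from member-database signatures to integrated entries.
SIGNATURE_TO_ENTRY: Dict[Tuple[str, str], str] = {
    ("Pfam", "sig_kinase_pf"): "ENTRY_kinase",
    ("PROSITE", "sig_kinase_ps"): "ENTRY_kinase",
    ("SMART", "sig_kinase_sm"): "ENTRY_kinase",
    ("PRINTS", "sig_gpcr_pr"): "ENTRY_gpcr",
}


def integrate_hits(hits: List[Hit]):
    """Collapse signature hits into one record per integrated entry.

    Returns (by_entry, unintegrated), where by_entry maps an entry id to
    (min start, max end, set of supporting databases).
    """
    by_entry: Dict[str, Tuple[int, int, Set[str]]] = {}
    unintegrated: List[Hit] = []
    for database, signature, start, end in hits:
        entry = SIGNATURE_TO_ENTRY.get((database, signature))
        if entry is None:
            unintegrated.append((database, signature, start, end))
            continue
        lo, hi, sources = by_entry.get(entry, (start, end, set()))
        by_entry[entry] = (min(lo, start), max(hi, end), sources | {database})
    return by_entry, unintegrated


protein_hits = [
    ("Pfam", "sig_kinase_pf", 10, 250),
    ("PROSITE", "sig_kinase_ps", 40, 60),
    ("BLOCKS", "sig_unknown", 300, 320),
]
entries, leftovers = integrate_hits(protein_hits)
```

Two overlapping hits from different databases become a single annotation with two lines of support, while the unmapped hit is reported separately.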

Sharing genome annotations
Website display and ftp sites

Chromosome 20 Overview

11

Pros: website displays do not require expert bioinformatics skills, so they are accessible to a wide range of researchers wishing to gain access to genomic annotation
Cons: website displays make it difficult to perform large-scale data mining
Solution: enable more experienced users to retrieve the data they require and to run analyses locally
Open annotation
Researchers need access to annotations available in the community and a way to share their own contributions with the community
Systems need a common protocol that enables genome data to be freely exchanged
Examples: the AGAVE (Architecture for Genomic Annotation, Visualization and Exchange) and the Distributed Annotation System (DAS) projects

Challenges facing automatic annotation systems
Data warehousing: a solution for large-scale data mining

First, the desired query statement might be too complex to implement
Second, the computing power needed might be too expensive in most cases for queries performed on large, monolithic databases
Solution
The business sector uses data warehousing, which segregates information into denormalized databases, enabling fast querying and data retrieval.
A large variety of data-mining tools can then extract datasets of interest efficiently and feed them into subsequent stages of statistical analysis or data mining
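The denormalization step can be sketched with Python's built-in `sqlite3` module. Three small normalized tables are flattened into one wide, indexed table so that a typical mining question needs no joins at query time. The schema, table names and rows are invented for illustration and do not reflect any real annotation database.

```python
import sqlite3
from typing import List


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a toy normalized schema plus a denormalized 'mart' table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Normalized source tables (hypothetical schema).
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT);
        CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY,
                                 gene_id INTEGER, length INTEGER);
        CREATE TABLE domain_hit (transcript_id INTEGER, domain TEXT);

        INSERT INTO gene VALUES (1, 'geneA', '20'), (2, 'geneB', '20'), (3, 'geneC', '1');
        INSERT INTO transcript VALUES (10, 1, 1500), (11, 1, 900),
                                      (20, 2, 2100), (30, 3, 700);
        INSERT INTO domain_hit VALUES (10, 'kinase'), (20, 'kinase'),
                                      (30, 'zinc_finger');

        -- Denormalized table: joins are paid once, at build time.
        CREATE TABLE gene_mart AS
            SELECT g.name AS gene_name, g.chromosome,
                   t.transcript_id, t.length, d.domain
            FROM gene g
            JOIN transcript t ON t.gene_id = g.gene_id
            LEFT JOIN domain_hit d ON d.transcript_id = t.transcript_id;
        CREATE INDEX gene_mart_lookup ON gene_mart (chromosome, domain);
    """)
    return conn


def genes_with_domain(conn: sqlite3.Connection, chromosome: str,
                      domain: str) -> List[str]:
    """Answer a mining-style question from the flat table alone."""
    rows = conn.execute(
        "SELECT DISTINCT gene_name FROM gene_mart "
        "WHERE chromosome = ? AND domain = ? ORDER BY gene_name",
        (chromosome, domain),
    )
    return [row[0] for row in rows]


warehouse = build_demo_warehouse()
kinases_on_20 = genes_with_domain(warehouse, "20", "kinase")
```

The trade-off is duplicated data and a rebuild whenever the source tables change, in exchange for simple, fast queries.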

The requirement to remain flexible

The development of automated annotation pipelines is an evolving process.

The quality of sequences and assemblies continues to improve, and redundant sequences are replaced with new, superior sequences
This demands a flexible system in which new, individual sequences can be added and analysed without disrupting the whole system
New, improved algorithms and methodologies keep appearing
This demands a pipeline architecture flexible enough to incorporate them into the analysis process without redesign of the system.

Future opportunities
Comparative genomics
As more genomes are sequenced and become publicly
available in the next few years, comparative
genomics will become one of the greatest areas of
development
Cross-species analysis: human-mouse
Protein-coding genes are likely to be highly conserved between closely related species (e.g. mouse and human), and other regions, such as RNA genes and regulatory regions, could also be elucidated
Needs
Development of bioinformatics tools such as Vista, Synplot and FamilyJewels
Integration of such tools with the current automated approaches
Design of genome browsers and websites that can intelligently display and annotate comparative results
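Conservation between two species is often summarized as percent identity in a sliding window over a pairwise alignment. The sketch below computes such a profile from two pre-aligned, equal-length strings with `-` for gaps. It is a generic illustration with arbitrarily chosen window size and threshold, not the algorithm of Vista, Synplot or FamilyJewels, and it assumes the alignment has been produced elsewhere.

```python
from typing import List, Tuple


def windowed_identity(aligned_a: str, aligned_b: str,
                      window: int = 100, step: int = 50
                      ) -> List[Tuple[int, float]]:
    """Return (window start, fraction of identical non-gap columns)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for start in range(0, len(aligned_a) - window + 1, step):
        columns = zip(aligned_a[start:start + window],
                      aligned_b[start:start + window])
        # A column counts as a match only if both bases agree and are not gaps.
        matches = sum(1 for x, y in columns if x == y and x != "-")
        profile.append((start, matches / window))
    return profile


def conserved_windows(profile: List[Tuple[int, float]],
                      min_identity: float = 0.7) -> List[int]:
    """Return starts of windows at or above the identity threshold."""
    return [start for start, identity in profile if identity >= min_identity]


# Toy alignment: a perfectly conserved block followed by a diverged block.
human = "ACGTACGTAC" + "TTTTTTTTTT"
mouse = "ACGTACGTAC" + "TTTT--GGGG"
toy_profile = windowed_identity(human, mouse, window=10, step=10)
toy_conserved = conserved_windows(toy_profile)
```

Windows that pass the threshold outside known exons are the kind of candidate RNA genes or regulatory regions that the slide refers to.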

Integrating and delivering new data
Horizontal integration
genomic systems should be able to cross-match
species that can be sensibly compared
Vertical integration
New flows of data coming from proteomics and
microarray sources will soon have to be
incorporated

Concluding remarks
The role of automatic genome annotation systems has increased and is still increasing.
These systems are grounded upon central cores of bioinformatics software tools and associated relational databases

sequenced genomes
→ integration of new genomes into the current systems
→ the demand for openness in the distribution of annotation data
→ the delivery of genomic data in forms suitable for large-scale data mining
17
References
1. Alistair G. Rust, Emmanuel Mongin and Ewan Birney. Genome annotation techniques: new approaches and challenges. Drug Discovery Today, Volume 7, Issue 11, 6 May 2002, Pages 570-576.
2. Weizhong Li and Adam Godzik. Discovering new genes with advanced homology detection. Trends in Biotechnology, Volume 20, Issue 8, 1 August 2002, Pages 315-316.
3. Biswas M, O'Rourke JF, Camon E, Fraser G, Kanapin A, Karavidopoulou Y, Kersey P, Kriventseva E, Mittard V, Mulder N, Phan I, Servant F, Apweiler R. Applications of InterPro in protein annotation and genome analysis. Brief Bioinform. 2002 Sep;3(3):285-95. PMID: 12230037. http://www.ebi.ac.uk/interpro/
4. Loraine AE, Helt GA. Visualizing the genome: techniques for presenting human genome data and annotations. BMC Bioinformatics. 2002 Jul 30;3(1):19. http://www.pubmedcentral.gov/articlerender.fcgi?tool=pubmed&pubmedid=12149135
5. Oshiro G, Wodicka LM, Washburn MP, Yates JR 3rd, Lockhart DJ, Winzeler EA. Parallel identification of new genes in Saccharomyces cerevisiae. Genome Res. 2002 Aug;12(8):1210-20. PMID: 12176929. http://www.genome.org/cgi/content/full/12/8/1210
