Title: The Zebrafish Genome Sequencing Project Bioinformatics resources
1The Zebrafish Genome Sequencing Project
Bioinformatics resources
Kerstin Howe, Mario Caccamo, Ian Sealy
2Bioinformatics resources
- outline
- clone mapping, sequencing and manual annotation in Vega
- genome assemblies and automated annotation in Ensembl
- integrated ZF-Models data and tools
3Clone mapping and sequencing
- mapping
- 2 BAC Tuebingen libraries
- 1 BAC and 1 cosmid library from single Tuebingen double-haploid fish
- end sequencing, RH mapping, fingerprinting
- pieced together according to fingerprints, marker mapping, sequence alignment
- currently 2500 ctgs
4Clone mapping and sequencing
- sequencing pipeline
- select clones based on position in fpc contig
- subcloning
- sequencing
- automatic assembly/pre-finishing
- (back to sequencing if necessary)
- finishing
- QC
- automated analysis pipeline
- manual annotation
- submission to EMBL
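The order of the steps above, including the optional loop back to sequencing, can be sketched as a small walk over a stage list. This is only an illustration of the control flow: the stage names are copied from the slide, while `run_pipeline` and its `extra_sequencing_rounds` parameter are hypothetical and not part of any Sanger software.

```python
# Stage names taken from the slide; the "(back to sequencing if necessary)"
# step is modelled as a loop rather than as a stage of its own.
STAGES = [
    "select clone from fpc contig",
    "subcloning",
    "sequencing",
    "automatic assembly/pre-finishing",
    "finishing",
    "QC",
    "automated analysis pipeline",
    "manual annotation",
    "submission to EMBL",
]


def run_pipeline(extra_sequencing_rounds=0):
    """Return the stages one clone passes through, in order.

    extra_sequencing_rounds is a hypothetical knob: how many times
    pre-finishing sends the clone back for more sequencing.
    """
    visited = []
    rounds_left = extra_sequencing_rounds
    i = 0
    while i < len(STAGES):
        visited.append(STAGES[i])
        if STAGES[i] == "automatic assembly/pre-finishing" and rounds_left > 0:
            rounds_left -= 1
            i = STAGES.index("sequencing")  # loop back for more reads
        else:
            i += 1
    return visited


if __name__ == "__main__":
    for step in run_pipeline(extra_sequencing_rounds=1):
        print(step)
```

With one extra round, the clone visits sequencing and pre-finishing twice before moving on to finishing.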
5Manual annotation
unfinished sequence
finished sequence
automated analysis pipeline
manual annotation
6Manual annotation
- annotation policy
- follows guidelines for human annotation (HAVANA team, Sanger Institute)
- no "guesses", annotations solely based on supporting evidence
- annotation of CDSs and UTRs / transcripts
- splice variants
- pseudogenes
- poly A features
- transposons
- repeats
- approved nomenclature (SI:clone.number)
- collaboration with ZFIN
- existing ZFIN records are reported
- ZFIN provides new records for newly found genes
7Manual annotation
8vega.sanger.ac.uk
9Vega
contigview
10Vega
geneview
11www.sanger.ac.uk/Projects/D_rerio
12www.sanger.ac.uk/Projects/D_rerio
13when to use what
- go to vega.sanger.ac.uk if you need
- highly reliable sequence
- highly reliable annotation (with your input)
- your gene stable over time (TILLING)
- go to www.ensembl.org if you need
- the whole genome
- comparative data
- ZF-Models microarray or insertional mutagenesis data
- complicated searches (BioMart)
14Zebrafish Genome Project
whole genome shotgun sequencing
clone mapping and sequencing
WGS reads
integration
(un)finished clones
assembly release (Zv5)
8,000 finished clones (1 Gb)
automatic annotation
manual annotation
15WGS assembly
Phusion assembler - High Performance Assembly Group (Zemin Ning et al.)
reads -> group reads -> contigs -> supercontigs
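The "group reads" step can be illustrated with a toy clustering: reads that share at least one 12-base word end up in the same group, and each group would then be assembled on its own. This is a simplified sketch and not the Phusion implementation; `group_reads` is a hypothetical helper, reverse complements and word-frequency filtering are ignored, and the example reads are cut from the sequence shown on the read grouping slide.

```python
from collections import defaultdict


def group_reads(reads, k=12):
    """Cluster reads that share at least one k-base word (union-find)."""
    parent = list(range(len(reads)))

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # k-base word -> index of the first read containing it
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if word in first_seen:
                # Shared word: merge this read's group with the earlier read's.
                parent[find(idx)] = find(first_seen[word])
            else:
                first_seen[word] = idx

    groups = defaultdict(list)
    for idx in range(len(reads)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


if __name__ == "__main__":
    seq = "ATGGCGTGCAGTCCATGTTCGGATCA"
    reads = [seq[:18], seq[4:], "T" * 18]  # two overlapping reads, one unrelated
    print(group_reads(reads))
```

The first two reads overlap by 14 bases and therefore share a 12-base word, so they fall into one group; the third read stays alone.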
16Read grouping
continuous base hash - k12
ATGGCGTGCAGTCCATGTTCGGATCA
ATGGCGTGCAGT
 TGGCGTGCAGTC
  GGCGTGCAGTCC
   GCGTGCAGTCCA
gap hash k12 (4x3) - dealing with variation
ATGGCGTGCAGTCCATGTTCGGATCA
ATGGCGTGCAGTCCATGT
 TGGCGTGCAGTCCATGTT
  GGCGTGCAGTCCATGTTC
   GCGTGCAGTCCATGTTCG
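The two hashing schemes can be reproduced on the slide's example sequence. The continuous hash takes every 12-base word. For the gap hash, the slide gives only "k12 (4x3)" and 18-base windows; the mask below (four blocks of three sampled bases separated by two skipped bases, 18 bases in total) is an assumption chosen to fit those numbers, and the real layout may differ. The function names are hypothetical.

```python
SEQ = "ATGGCGTGCAGTCCATGTTCGGATCA"  # example sequence from the slide

# ASSUMPTION: 1 = base is used in the key, 0 = base is skipped.
# 4 blocks of 3 used bases with 2-base gaps spans 18 bases.
MASK = "111001110011100111"


def continuous_kmers(seq, k=12):
    """Every contiguous k-base word, in order of start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def gapped_kmers(seq, mask=MASK):
    """Keys built from the masked-in bases of every window of len(mask)."""
    keep = [j for j, flag in enumerate(mask) if flag == "1"]
    span = len(mask)
    return ["".join(seq[i + j] for j in keep)
            for i in range(len(seq) - span + 1)]


if __name__ == "__main__":
    print(continuous_kmers(SEQ)[:4])
    print(gapped_kmers(SEQ)[:4])

    # A substitution at a skipped position (index 3) changes the
    # continuous word but leaves the gapped key untouched.
    variant = SEQ[:3] + "A" + SEQ[4:]
    print(continuous_kmers(variant)[0] == continuous_kmers(SEQ)[0])
    print(gapped_kmers(variant)[0] == gapped_kmers(SEQ)[0])
```

Under this assumed mask, a base change that falls in a gap does not alter the key, which is one way a gapped hash can tolerate variation between reads.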
17Zebrafish Genome Project
whole genome shotgun sequencing
clone mapping and sequencing
WGS reads
integration
(un)finished clones
assembly release (Zv5)
7,000 finished clones (1 Gb)
automatic annotation
manual annotation
18Integration
BACs
BX005049.6
BX005123.6
BX005153
BX005057.8
fpc contig
Zv5 scaffoldn
19Assemblies
                            Zv5                  Zv4                  Zv3                Zv2
release date                27.05.05             12.07.04             27.11.03           03.04.03
total length (bp)           1,630,306,866        1,592,025,686        1,459,115,486      1,452,210,772
scaffolds                   16,214               21,333               58,339             83,470
finished clones             4,519 (699 Mb)       2,828 (443 Mb)       1,502 (263 Mb)     -
scaffolds in chr 1-25       1,749                1,892                1,490              -
scaffolds in fpc contigs    265 (chrU)           694 (chrU)           1,842              5,677
NA scaffolds                14,676               18,747               54,798             77,793
sum(length) chr 1-25 (bp)   1,200,129,620 (73%)  1,097,507,810 (69%)  718,270,423 (49%)  -
sum(length) ctgs (bp)       183,993,739 (11%)    176,222,396 (11%)    365,271,659 (25%)  1,143,459,008
sum(length) NAs (bp)        246,183,507 (16%)    318,295,480 (20%)    335,615,307 (23%)  308,751,764
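The bracketed figures next to the summed lengths read as shares of the total assembly length. A short check, using only the Zv5 and Zv4 numbers from the table, shows how they can be recomputed. The dictionary keys and the `shares` helper are hypothetical names, and the recomputed values may differ from the slide by about a point because the slide's rounding is not stated.

```python
# Lengths in bp, copied from the Zv5 and Zv4 columns of the table.
ASSEMBLIES = {
    "Zv5": {"total": 1_630_306_866, "chr": 1_200_129_620,
            "ctg": 183_993_739, "na": 246_183_507},
    "Zv4": {"total": 1_592_025_686, "chr": 1_097_507_810,
            "ctg": 176_222_396, "na": 318_295_480},
}


def shares(assembly):
    """Percentage of the total length placed on chr 1-25, ctgs and NAs."""
    total = assembly["total"]
    return {key: 100 * assembly[key] / total for key in ("chr", "ctg", "na")}


if __name__ == "__main__":
    for name, data in ASSEMBLIES.items():
        pct = shares(data)
        print(name, {key: round(value, 1) for key, value in pct.items()})
```

For both releases the three categories add up exactly to the total length, so the shares sum to 100.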
20Automatic Annotation
21Ensembl
22Contigview
23Geneview
24Searching Ensembl
25Biomart
27Dos and Don'ts
- go elsewhere (Ensembl) if you
- want to know about the whole genome
- need comparative data
- need ZF-Models microarray or insertional mutagenesis data
- need to do complicated searches
- go to Vega if you
- need highly reliable sequence
- need highly reliable annotation
- need your gene stable over time (TILLING)
28DAS
genome browser
local storage
reference sequence
XML
29SNPs and Indels
30Ensembl releases