Title: The Genome Access Course: Genome Analysis
From a 13th-century French Bible
- Genome Sequencing and Assembly
- Genome Analysis
- Genomes on Display
Milestones in Genome Sequencing
Hierarchical vs. Whole-Genome Shotgun Sequencing
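In whole-genome shotgun sequencing, the genome is broken into random fragments. The fragments are sequenced and the reads are reassembled by their overlaps. The hierarchical (clone-by-clone) strategy instead maps large clones first and then shotgun-sequences each clone separately.

The toy sketch below illustrates only the overlap idea. It assumes error-free reads from one strand and uses a simple greedy merge rule. It is not how Phrap or any production assembler works: real assemblers use base qualities, tolerate sequencing errors and must resolve repeats, and this sketch does none of that. All function names are hypothetical.

```python
import random

# Hypothetical helper names for illustration; not any assembler's API.


def shotgun_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[str]:
    """Sample fixed-length reads from random start positions (forward strand only)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if shorter than min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads contained in another read contribute no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs
```

Take the three reads ATTAGACCTG, CCTGCCGGAA and GCCGGAATAC. `greedy_assemble` first merges the 7-base overlap and then the 4-base overlap, returning `["ATTAGACCTGCCGGAATAC"]`. A repeat longer than the reads would defeat this rule, which is one motivation for the hierarchical approach.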
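Sequencing Software Examples
- Phred: base calling
- Phrap: sequence assembly
- cross_match: sequence comparison (e.g., vector screening)
- Consed: graphical editor for viewing and finishing assemblies
- Autofinish: automated selection of finishing reads
- FPC (FingerPrinted Contigs): assembly of clone fingerprint maps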
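Phred reports each base call with a quality value Q = -10 log10(P), where P is the estimated probability that the call is wrong. Phrap and Consed make use of these values. The small helpers below just convert between the two scales; their names are our own and not part of Phred.

```python
import math

# Hypothetical helper names; they only restate the Phred quality formula.


def phred_quality(error_prob: float) -> int:
    """Quality value Q = -10 * log10(P), rounded to an integer."""
    return round(-10 * math.log10(error_prob))


def error_probability(quality: float) -> float:
    """Inverse of the above: P = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


def count_high_quality(qualities: list[int], threshold: int = 20) -> int:
    """Number of bases at or above a quality cutoff (Q20 means 1 error in 100)."""
    return sum(q >= threshold for q in qualities)
```

For example, an error probability of 0.001 corresponds to Q30, and Q20 corresponds to an error probability of 0.01.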
Genome Analysis
- Whole-genome analysis
  - Gene count
  - Gene classification
  - Repeat content
  - Chromosomal duplications
- Multi-genome analysis
  - Synteny
  - Sequence similarity
  - Gene classification comparisons
Gene Count: How do we find genes in genomic sequences?
Map cDNA sequences to a genome:
- Sim4 (http://pbil.univ-lyon1.fr/sim4.html)
- EST2Genome (http://bioweb.pasteur.fr/seqanal/interfaces/est2genome.html)
- Genomewise
- BLAT
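Programs such as Sim4 and EST2Genome perform spliced alignment: they align a cDNA to genomic sequence while allowing long gaps for introns and scoring splice signals. The sketch below is a deliberately crude stand-in, not the algorithm of any listed tool. It assumes error-free sequences and places exons greedily by exact matching. Only afterwards does it check for canonical GT...AG intron boundaries. The function names are hypothetical.

```python
# Toy exon placement by exact matching; not Sim4, EST2Genome, or BLAT.


def map_cdna(cdna: str, genome: str, min_exon: int = 5):
    """Greedily place consecutive cDNA segments on the genome, left to right.

    Returns (start, end) genomic intervals (0-based, end-exclusive), one per
    inferred exon, or None if some part of the cDNA cannot be placed.
    """
    exons = []
    i = 0  # next unplaced position in the cDNA
    g = 0  # each exon must lie downstream of the previous one
    while i < len(cdna):
        best_len, best_pos = 0, -1
        length = min(min_exon, len(cdna) - i)
        # Extend the cDNA segment while it still occurs downstream of g.
        while i + length <= len(cdna):
            pos = genome.find(cdna[i:i + length], g)
            if pos == -1:
                break
            best_len, best_pos = length, pos
            length += 1
        if best_len == 0:
            return None
        exons.append((best_pos, best_pos + best_len))
        i += best_len
        g = best_pos + best_len
    return exons


def canonical_introns(exons, genome: str) -> bool:
    """True if every gap between consecutive exons starts with GT and ends with AG."""
    return all(
        genome[end:end + 2] == "GT" and genome[next_start - 2:next_start] == "AG"
        for (_, end), (next_start, _) in zip(exons, exons[1:])
    )
```

Because the extension uses exact matches only, an exon boundary can shift by a few bases when the start of the next exon happens to match the start of the intron. Real spliced aligners resolve that ambiguity with splice-site scoring.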
Finding Genes (cont.)
- Gene predictions
  - Fgenesh (http://www.softberry.com)
  - GeneMark.hmm (http://opal.biology.gatech.edu/GeneMark/eukhmm.cgi)
  - Genscan (http://genes.mit.edu/GENSCAN.html)
  - Grail (http://compbio.ornl.gov/Grail-1.3/)
  - Glimmer (http://www.tigr.org/softlab/glimmer/glimmer.html)
- Homology
  - blastx
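Ab initio predictors score candidate genes with statistical models: Glimmer uses interpolated Markov models and Genscan a generalized hidden Markov model. The minimal sketch below performs only the first, model-free step of listing open reading frames (ORFs) on the forward strand. It assumes the standard stop codons TAA, TAG and TGA and an arbitrary length cutoff. It ignores introns and the reverse strand, so at best it resembles a naive prokaryotic ORF scan. `find_orfs` is a hypothetical name.

```python
# Naive forward-strand ORF scan; not the algorithm of any listed predictor.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for ATG-initiated ORFs on the forward strand.

    end is exclusive and includes the stop codon; min_codons counts the
    codons from ATG up to (not including) the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop before the end of the sequence
            if (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3, frame))
            i = j + 3  # resume scanning after the stop codon
    return sorted(orfs)
```

Resuming after the stop codon reports only the longest ATG-initiated ORF for each stop. A real gene finder would go on to score each candidate with its model rather than accept every ORF above the cutoff.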
Gene Prediction Types
- Known: cDNA evidence/homology
- Putative: gene prediction with homology to a known gene
- Unknown: EST matching a gene prediction
- Hypothetical: gene prediction(s) only
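The four categories amount to a decision rule over the available evidence. The minimal sketch below assumes that cDNA support takes precedence over homology, and homology over EST support; the slide does not state an order. It also treats cDNA evidence as the defining feature of "Known". The class and field names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical evidence model; the precedence order below is an assumption.


@dataclass
class Evidence:
    cdna_match: bool  # a cDNA maps to the predicted locus
    homology: bool    # the prediction resembles a known gene
    est_match: bool   # at least one EST overlaps the prediction


def prediction_type(ev: Evidence) -> str:
    """Return the strongest applicable category for one gene prediction."""
    if ev.cdna_match:
        return "Known"
    if ev.homology:
        return "Putative"
    if ev.est_match:
        return "Unknown"
    return "Hypothetical"
```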
Gene Classification
1) Automated
  - Similarity search against an annotated database
    - Swiss-Prot
    - nr
  - Protein domain search
    i. Pfam (http://www.sanger.ac.uk/Software/Pfam/)
    ii. PROSITE
    iii. PRINTS
    iv. ProDom
    v. InterPro (http://www.ebi.ac.uk/interpro/scan.html)
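Some of the domain databases above describe motifs as patterns. PROSITE patterns, for example, use the following syntax:
- `x` for any residue
- `[ABC]` for allowed residues and `{ABC}` for forbidden ones
- `(n)` or `(n,m)` for repeats
- `-` as a separator
- `<` and `>` for the sequence ends

The sketch below translates that syntax into a Python regular expression. It covers only these common elements (not, for example, `>` inside brackets) and is not the official ScanProsite tool. `prosite_to_regex` is a hypothetical name.

```python
import re

# One PROSITE element: a residue, 'x', [allowed] or {forbidden}, optionally
# followed by a repeat count such as (3) or (2,4). Simplified subset only.
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        anchor_start = element.startswith("<")  # N-terminal anchor
        anchor_end = element.endswith(">")      # C-terminal anchor
        m = _ELEMENT.fullmatch(element.strip("<>"))
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            token = "."
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"
        else:
            token = residue  # a single letter or [ABC] is already valid regex
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if anchor_start else "") + token + ("$" if anchor_end else ""))
    return "".join(parts)
```

Take the C2H2 zinc-finger-style pattern `C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.`. It becomes `C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H`, which can then be passed to `re.search` against a protein sequence.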
Gene Classification (cont.)
2) Curated
  - Similar to automated classification, but curators usually verify the results through literature searches
Looking for Repeats
- RepeatMasker finds and masks repeats in DNA sequence
- RepeatMasker is available at http://woody.embl-heidelberg.de/repeatmask/ or http://repeatmasker.genome.washington.edu/
- RepeatMasker is often run on genomic sequences before gene prediction, so that repeats (e.g., transposon-derived sequence) are not mistaken for genes
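Masking replaces repeat bases either with N (hard masking) or with lower-case letters (soft masking), so that downstream tools can ignore them. RepeatMasker finds repeats by comparing the sequence against a library of known repeat families. The sketch below swaps in a much simpler self-comparison heuristic for illustration only: it flags every k-mer that occurs at least a threshold number of times in the sequence itself. The names and default parameters are assumptions.

```python
from collections import Counter

# Self-comparison heuristic for illustration; RepeatMasker uses a repeat library.


def soft_mask_repeats(seq: str, k: int = 8, min_count: int = 3) -> str:
    """Lower-case every base covered by a k-mer seen at least min_count times."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    masked = [False] * len(seq)
    for i, kmer in enumerate(kmers):
        if counts[kmer] >= min_count:
            for j in range(i, i + k):
                masked[j] = True
    return "".join(base.lower() if m else base for base, m in zip(seq, masked))


def hard_mask(soft_masked: str) -> str:
    """Convert soft-masked (lower-case) bases to N."""
    return "".join("N" if base.islower() else base for base in soft_masked)
```

Soft masking keeps the original bases available, for example for later inspection, while still marking them as repeats.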
Comparative Genome Analysis
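MUMmer
- Whole-genome alignments
- Compares closely related sequences
- Based on maximal unique matches (MUMs): exact matches that occur exactly once in each sequence and cannot be extended in either direction
- Example (MUMs in uppercase):
  agctcgatGGGCTTTAGACTCTCGATAggcgcagagGCTCGCTAGAATCGCTAGATCac
  agacctaaGGGCTTTAGACTCTCGATAagtctatccGCTCGCTAGAATCGCTAGATCta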
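MUMmer finds MUMs efficiently with a suffix tree. The sketch below finds the same kind of match by brute force, using a quadratic dynamic-programming table of common-suffix lengths. It is only practical for short examples like the one above. The function names and the `min_len` cutoff are assumptions.

```python
# Brute-force MUM search for illustration; MUMmer itself uses a suffix tree.


def count_occurrences(text: str, pattern: str) -> int:
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a: str, b: str, min_len: int = 10) -> list[tuple[int, int, str]]:
    """Return (pos_in_a, pos_in_b, match) for each maximal unique match."""
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)
    prev = [0] * (m + 1)  # common-suffix lengths for the previous row of a
    mums = []
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            if a[i - 1] != b[j - 1]:
                continue
            cur[j] = prev[j - 1] + 1  # already extended as far left as possible
            length = cur[j]
            right_maximal = i == n or j == m or a[i] != b[j]
            if right_maximal and length >= min_len:
                match = a[i - length:i]
                if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                    mums.append((i - length, j - length, match))
        prev = cur
    return mums
```

On the two example sequences, `find_mums` returns exactly the two uppercase blocks, GGGCTTTAGACTCTCGATA and GCTCGCTAGAATCGCTAGATC, both starting at the same offsets (8 and 36) in each sequence.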
Segmentally duplicated regions in the Arabidopsis genome, detected using MUMmer
Individual chromosomes are depicted as horizontal grey bars (with chromosome 1 at the top); centromeres are marked in black. Coloured bands connect corresponding duplicated segments.
Source: The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796-815 (2000).
PIPMaker
- PIP stands for Percent Identity Plot
- Graphical view of similarity between two or more sequences
- http://bio.cse.psu.edu/pipmaker/
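A PIP plot shows each gap-free segment of a pairwise alignment as a horizontal line. The line sits at the segment's percent identity and spans its position in the first sequence. The minimal sketch below computes those segments from an already gapped alignment; computing the alignment itself is out of scope. The function name and the tuple layout are assumptions.

```python
# Hypothetical helper; it assumes the gapped alignment is already available.


def pip_segments(aln_a: str, aln_b: str) -> list[tuple[int, int, float]]:
    """Split a pairwise alignment into gap-free blocks.

    Returns (start, end, percent_identity) per block, where start/end are
    0-based, end-exclusive coordinates in the ungapped first sequence.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    segments = []
    pos_a = 0  # current position in the first (reference) sequence
    seg_start = 0
    length = matches = 0
    for ca, cb in zip(aln_a, aln_b):
        if ca == "-" or cb == "-":
            if length:  # a gap closes the current gap-free block
                segments.append((seg_start, seg_start + length, 100 * matches / length))
                length = matches = 0
            if ca != "-":
                pos_a += 1
            continue
        if length == 0:
            seg_start = pos_a
        length += 1
        matches += ca == cb  # True counts as 1
        pos_a += 1
    if length:
        segments.append((seg_start, seg_start + length, 100 * matches / length))
    return segments
```

Plotting each tuple as a line from `start` to `end` at height `percent_identity` gives a PIP-style picture.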
(Figure panels: Alignment, PIP Plot, Dot Plot)
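A dot plot places one sequence along each axis and marks every pair of positions where the two sequences share a word of length k. Runs of marks along a diagonal reveal similar regions. The minimal text-only sketch below uses a hypothetical function name; k is a parameter you choose.

```python
# Hypothetical text dot plot; rows follow sequence a, columns follow sequence b.


def dot_plot(a: str, b: str, k: int = 1) -> list[str]:
    """Return one string per k-mer of a, with '*' where it equals a k-mer of b."""
    return [
        "".join("*" if a[i:i + k] == b[j:j + k] else "." for j in range(len(b) - k + 1))
        for i in range(len(a) - k + 1)
    ]


for row in dot_plot("ACGT", "ACGT"):
    print(row)
```

For identical sequences the marks fall on the main diagonal. Larger k removes the scattered chance matches that dominate plots of real DNA at k = 1.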
Fugu PTEN
(Figure: comparison of the Fugu PTEN region with H. sapiens, M. musculus, X. laevis, D. melanogaster, C. briggsae, C. elegans, A. thaliana (2 and 3), L. major and S. pombe; scale 2-8 kb)
VISTA
- Similar to PIPMaker
- http://www-gsd.lbl.gov/vista/
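VISTA displays conservation as percent identity in a sliding window along the alignment and highlights windows above an identity threshold. The sketch below assumes a precomputed gapped alignment and measures identity over alignment columns. The window size and threshold are parameters you supply, not necessarily VISTA's defaults, and the function names are hypothetical.

```python
# Hypothetical sliding-window conservation helpers; parameters are assumptions.


def window_identity(aln_a: str, aln_b: str, window: int) -> list[float]:
    """Percent identity in each window of alignment columns (step of one column)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    same = [1 if x == y and x != "-" else 0 for x, y in zip(aln_a, aln_b)]
    return [100 * sum(same[s:s + window]) / window for s in range(len(same) - window + 1)]


def conserved_regions(identities: list[float], window: int, threshold: float) -> list[tuple[int, int]]:
    """Merge windows at or above the threshold into (start, end) column intervals."""
    regions: list[tuple[int, int]] = []
    for start, pid in enumerate(identities):
        if pid < threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions
```

For example, windows of 5 columns over AAAAAAAAAA aligned to AAAAATTTTT give identities of 100, 80, 60, 40, 20 and 0 percent. With a 70% threshold, the first two windows merge into the conserved interval (0, 6).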
Genomes on Display
- UCSC Browser
- Ensembl browser
- NCBI Browser
- GMOD
UCSC Browser
Ensembl Browser
NCBI Browser
- http://www.ncbi.nlm.nih.gov/cgi-bin/Entrez/hum_srch?chr=hum_chr.inf&query
GMOD
- Generic Model Organism Database project
- An effort to build a common set of software tools (databases and genome browsers) for many species
- www.gmod.org