5 Open Problems in Bioinformatics - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: 5 Open Problems in Bioinformatics


1
5 Open Problems in Bioinformatics
  • Pedigrees from Genomes
  • Comparative Genomics of Alternative Splicing
  • Viral Annotation
  • Evolving Turing Patterns
  • Protein Structure Evolution

2
From genomes to pedigrees
Coalescent recombination process
Sequence/Individual Boundary
Pedigree process
Three Processes
  • Recombination
  • Choosing Parents
  • The Mutational Process

From Yun Song
3
Probability of data given a pedigree
Elston-Stewart (1971): Temporal Peeling Algorithm
Father
Mother
Condition on parental states. Recombination and
mutation are Markovian.
Lander-Green (1987): Genotype Scanning Algorithm
Father
Mother
Condition on paternal/maternal inheritance.
Recombination and mutation are Markovian.
Comment: Obvious parallel to the Wiuf-Hein99
reformulation of Hudson's 1983 algorithm.
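
A minimal sketch of what "conditioning on parental states" means for the smallest possible pedigree, a father-mother-child trio at a single biallelic locus. It is an illustration only, not the Elston-Stewart or Lander-Green algorithm itself: recombination and mutation are ignored, founders are assumed to follow Hardy-Weinberg proportions, and all function names are hypothetical.

```python
from itertools import product

# Genotypes are coded as the number of copies of allele A (0, 1 or 2).
GENOTYPES = (0, 1, 2)


def founder_prior(genotype, p):
    """Hardy-Weinberg prior for a founder; p is the frequency of allele A (assumption)."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)[genotype]


def child_given_parents(child, father, mother):
    """Mendelian transmission probability P(child | father, mother), no mutation."""
    tf = father / 2.0  # probability that the father transmits A
    tm = mother / 2.0  # probability that the mother transmits A
    if child == 2:
        return tf * tm
    if child == 0:
        return (1.0 - tf) * (1.0 - tm)
    return tf * (1.0 - tm) + (1.0 - tf) * tm


def trio_likelihood(child, p, father=None, mother=None):
    """P(observed genotypes), summing over any parent whose genotype is unobserved."""
    fathers = GENOTYPES if father is None else (father,)
    mothers = GENOTYPES if mother is None else (mother,)
    return sum(
        founder_prior(f, p) * founder_prior(m, p) * child_given_parents(child, f, m)
        for f, m in product(fathers, mothers)
    )
```

With both parents unobserved, `trio_likelihood(1, 0.5)` sums over all nine parental genotype pairs. On larger pedigrees the same sum is organised so that each nuclear family is "peeled" in turn rather than enumerated jointly.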
4
Benevolent Mutation and Recombination Process
Genomes with r → infinity and m/r → infinity
(r: recombination rate, m: mutation rate)
  • Counting mutations within a small interval would
    reveal the length of the path connecting the two
    segments.
  • Siblings are readily revealed, since they will
    have segments with 2m density of mutations.
  • The distribution of path lengths is readily
    observable between two sequences.
  • All embedded phylogenies are observable
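
The counting argument above can be sketched as a one-line estimator. This assumes that differences between two segments accumulate at rate m per unit length for every meiosis on the path connecting them, so that a path of length 2 (siblings through a shared parent) gives density 2m as stated on the slide. The function name and argument names are hypothetical.

```python
def estimate_path_length(n_differences, interval_length, mutation_rate):
    """Estimate the number of meioses separating two segments.

    Assumes differences accumulate at `mutation_rate` per unit length
    per meiosis, so expected differences = rate * length * path_length.
    """
    if interval_length <= 0 or mutation_rate <= 0:
        raise ValueError("interval_length and mutation_rate must be positive")
    return n_differences / (mutation_rate * interval_length)
```

For example, 20 differences in an interval of length 10 with m = 1 give an estimated path length of 2, the sibling case.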

5
From Phylogenies to Pedigrees: Mike's
counterexample, linkage and individuals
Gluing Phylogenies together
Sibling Sequences come from different parents.
Different Pedigrees Same Phylogenies
Individual 1
?
A recombinant's parents are sister sequences.
grandparents
Individual 2
6
Comparative Genomics of Alternative Splicing
7
From Transcripts to the AS-Graph
  • How well known is the AS-graph as a function of
    the number of transcripts?
  • Given a family and distribution of transcripts, can
    they be explained by an AS-graph with probabilities
    at donor sites, or do we need probabilities for
    (donor, acceptor) pairs, or possibly even more
    complicated situations? And is sampling
    transcripts good enough to distinguish these
    situations?

8
Mini-project: reliability of AS detection.
  • Choose an idealized AS-Graph
  • Genome
  • Choose donor and acceptor sites in random pairs.
  • For each possible splice pair, assign a probability
    for choosing it.
  • This should define a probability for all
    transcripts.
  • Generate a set of transcripts.
  • Reconstruct the AS-Graph.
  • Key questions
  • How many transcripts must be sampled to detect
    AS?
  • How well will the AS-Graph be recovered?
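
The mini-project steps can be sketched as a small simulation. Everything specific here is an assumption made for illustration: the five-exon graph, its probabilities, the choice to attach probabilities to donor (outgoing) sites, and all function names. Reconstruction is simply the union of observed splice junctions.

```python
import random

# Hypothetical idealized AS-graph: nodes are exons, edges are splice
# choices, with a probability attached at each donor (outgoing) site.
AS_GRAPH = {
    0: {1: 0.7, 2: 0.3},
    1: {2: 0.5, 3: 0.5},
    2: {3: 0.8, 4: 0.2},
    3: {4: 1.0},
}


def sample_transcript(graph, rng, start=0):
    """Walk from the start exon to a terminal exon, choosing one edge per donor."""
    path = [start]
    while path[-1] in graph:
        successors = list(graph[path[-1]])
        weights = [graph[path[-1]][s] for s in successors]
        path.append(rng.choices(successors, weights=weights)[0])
    return tuple(path)


def true_edges(graph):
    """All splice junctions present in the idealized graph."""
    return {(a, b) for a, successors in graph.items() for b in successors}


def reconstruct_edges(transcripts):
    """Reconstructed AS-graph: every junction seen in at least one transcript."""
    return {(a, b) for t in transcripts for a, b in zip(t, t[1:])}


def shows_alternative_splicing(edges):
    """AS is detected when some donor exon has more than one observed junction."""
    donors = [a for a, _ in edges]
    return len(donors) != len(set(donors))


def recovery_fraction(graph, n_transcripts, seed=0):
    """Fraction of true junctions recovered from n sampled transcripts."""
    rng = random.Random(seed)
    transcripts = [sample_transcript(graph, rng) for _ in range(n_transcripts)]
    return len(reconstruct_edges(transcripts)) / len(true_edges(graph))
```

Calling `recovery_fraction(AS_GRAPH, n)` for increasing n gives a rough empirical answer to the two key questions for this one toy graph. Rare junctions, such as 2 to 4 here, dominate the number of transcripts needed.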

9
Optimal DAG (directed acyclic graph) under
restrictions
  • Finding a set of annotations
  • Find set of paths, maximizing sum of scores.
  • The score of minimal path must be above
    threshold.
  • Two paths must differ significantly: for an enclosed
    area, the maximal height must be d higher than
    the boundary defining it. Height(i,j) di,j
    di,j
  • Do known AS genes have more CTO structure than
    non-AS genes?
  • Does the AS correspond to the CTO structure?
  • Is the CTO structure evolutionarily conserved?
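
As a simplified building block for the path-finding step, the sketch below finds a single highest-scoring path in a DAG by dynamic programming. It does not implement the full problem on the slide (a set of paths, the threshold on the minimal path, or the "differ significantly" restriction). It assumes nodes are integer positions and every edge runs from a lower to a higher position; the function name and the scores are hypothetical.

```python
def best_scoring_path(edge_scores, source, sink):
    """Return (path, score) for the highest-scoring source-to-sink path.

    edge_scores maps (i, j) with i < j to a score. Processing edges in
    sorted order visits them in a topological order of their start nodes.
    """
    nodes = {n for edge in edge_scores for n in edge} | {source, sink}
    best = {n: float("-inf") for n in nodes}
    previous = {}
    best[source] = 0.0
    for (i, j), score in sorted(edge_scores.items()):
        if i >= j:
            raise ValueError("edges must go from a lower to a higher position")
        if best[i] + score > best[j]:
            best[j] = best[i] + score
            previous[j] = i
    if best[sink] == float("-inf"):
        return None, float("-inf")  # sink not reachable from source
    path = [sink]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[sink]
```

A set of annotations could then be built by repeatedly extracting paths and discarding those whose score falls below the threshold, but how to enforce the difference criterion between paths is left open here, as it is on the slide.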

10
Phylogenetically related ASGs
  • Is ASG conserved?
  • What is conserved?
  • How is selection along position dependent on
    splicing status?

11
Virus Annotation
Classes of Gene Structures
Illustrating the 3 main classes of gene
structures: Unidirectional, Convergent and
Divergent.
Diarrhoea Causing Arrangements
http://www.tulane.edu/dmsander/WWW/335/Diarrhoea.html
Retroviridae Arrangements
http://www.tulane.edu/dmsander/WWW/335/Retroviruses.html
Papovaviridae Arrangement
http://www.tulane.edu/dmsander/WWW/335/Papovaviruses.html
12
The Problems of Viral Annotation
  • HMM gene structure generator (McCauley)
  • Gene Structure Evolution (de Groot)
  • Alignment (Caldeira, Lunter, Rocco)
  • Recombination (Lyngsø, Song)
  • Multiple constraints: RNA secondary structure,
    gene conservation, binding/transcriptional
    instructional sites.

13
Our 8 State HMM which allows for Unidirectional
overlapping gene structures
  • HMM States
  • Non-coding
  • Coding RF1
  • Coding RF2
  • Coding RF3
  • Coding RF1,2
  • Coding RF1,3
  • Coding RF2,3
  • Coding RF1,2,3
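
The eight states are the non-coding state plus every non-empty subset of the three reading frames. A minimal sketch that enumerates them (the function name is hypothetical, and this covers only the state space, not transition or emission probabilities):

```python
from itertools import combinations

FRAMES = (1, 2, 3)  # the three reading frames on one strand


def hmm_states(frames=FRAMES):
    """List the non-coding state plus one coding state per non-empty frame subset."""
    states = ["Non-coding"]
    for size in range(1, len(frames) + 1):
        for combo in combinations(frames, size):
            states.append("Coding RF" + ",".join(str(f) for f in combo))
    return states
```

The count is 1 + (2^3 - 1) = 8, which is why allowing overlaps in a single direction needs exactly eight states.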

14
Combining Levels of Selection.
Assume multiplicativity: f_{A,B} = f_A f_B
Protein-Protein: Hein & Støvlbæk, 1995
Codon Nucleotide Independence Heuristic: Jensen &
Pedersen, 2001
Contagious Dependence
Protein-RNA
Doublets
Singlet
Contagious Dependence
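
A tiny numerical illustration of the multiplicativity assumption. Here each f is read as a selection factor that scales the rate of a change, which is an interpretation made for this sketch; the function name and the example values are hypothetical.

```python
def combined_selection_factor(factors):
    """Multiply per-constraint selection factors, e.g. f_{A,B} = f_A * f_B."""
    result = 1.0
    for factor in factors:
        result *= factor
    return result
```

Under this reading, a site constrained by two overlapping features with f_A = 0.2 and f_B = 0.5 would have a combined factor of 0.1, lower than under either constraint alone.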
15
Table illustrating the gain in sensitivity
obtained by using a Phylogenetic HMM. We extend
the HMM to include evolutionary information from
13 aligned HIV-2 sequences.
16
GenBank: Centralized resource for publicly
available viral sequence data.
Entrez Genomes currently contains 2120 Reference
Sequences for 1510 viral genomes and 36 Reference
Sequences for viroids.
http://www.ncbi.nlm.nih.gov/Genbank/
http://www.ncbi.nlm.nih.gov/genomes/VIRUSES/viruses.html
Within microbial genomes, one third of annotated
genes contain some degree of overlap, and one
third of these are either Convergent or Divergent.
Properties of overlapping genes are conserved
across microbial genomes. Genome Res. 2004
Nov;14(11):2268-72.
Krakauer, D.C. Stability and evolution of
overlapping genes. Evolution 54: 731-739 (2000).
General preponderance of overlapping gene
structures is roughly a 90:9:1 ratio split across
Unidirectional, Convergent and Divergent
arrangements.
17
Turing Patterns
18
Mathematical models to understand biological
patterns
From Maini's Home Page: http://www.maths.ox.ac.uk/~maini
Turing Model
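
The slide names the Turing model without giving equations. As a hedged illustration, the sketch below checks the standard textbook conditions for diffusion-driven (Turing) instability in a two-species reaction-diffusion system linearised around a steady state. The Jacobian entries a, b, c, d and the diffusion coefficients are generic symbols chosen here, not parameters taken from the presentation.

```python
def is_turing_unstable(a, b, c, d, diff_u, diff_v):
    """Check the classical Turing conditions for the linearised system.

    [[a, b], [c, d]] is the Jacobian of the reaction terms at the steady
    state; diff_u and diff_v are the diffusion coefficients of the two species.
    """
    trace = a + d
    det = a * d - b * c
    # The steady state must be stable when diffusion is absent.
    stable_without_diffusion = trace < 0 and det > 0
    # Diffusion must make some spatial wavenumber grow.
    h = diff_v * a + diff_u * d
    destabilised = h > 0 and h * h > 4 * diff_u * diff_v * det
    return stable_without_diffusion and destabilised
```

With a = 1, b = -1, c = 2, d = -1.5, the check returns True when the second species diffuses ten times faster than the first and False when both diffuse equally, matching the usual requirement of unequal diffusion rates.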
19
Different parameters lead to different patterns
Stripes: p small
Spots: p large
From Leppanen et al., Dimensionality effects in
Turing pattern formation, Int. J. Mod. Phys. B
17, 5541-5553 (2003)
20
3 suggestions
  • 1. Networks and Turing Patterns
  • 2. Stochastic Partial Differential Equations
  • 3. Phylogenetically related Turing Patterns

21
Evolutionary Models of Protein Structure Evolution
Known
Unknown (? ? ? ?)
Known
α-globin and Myoglobin: 300 amino acid changes,
800 nucleotide changes, 1 structural change,
1.4 Gyr
1. Given Structure, what are the possible events
that could happen?
2. What are their probabilities? Old-fashioned
substitution and indel process with bias. Bias:
Folding(Sequence → Structure), Fitness of
Structure.
3. Summation over all paths.
22
2 suggestions
A. Structure? (Homology Modelling, Topology)
Folding(Sequence → Structure): As a first
approximation, similar structures should be
compared, and the problem could be solved by
comparative modelling.
Fast Homology Modelling
Using Protein Topology as Hidden Variable
Fitness of Structure: such functions are
commonplace in guiding prediction programs.
B. MCMC
23
Questions to be asked
Negative Note:
Protein Structure Analysis is much harder than
Sequence Analysis. Much of the first-hand
impression will remain: structures are either
trivially similar or highly dissimilar; the
middle ground is empty. At Gyr scale other
rearrangements occur.
Positive Note (if it works):
  • Test of smooth/catastrophic structure evolution
  • Separation of analogous/homologous similarities
  • Protein Evolution in General
  • How closely linked are homologous and structurally
    equivalent sites?
http://www.biochem.ucl.ac.uk/bsm/cath/
http://scop.mrc-lmb.cam.ac.uk/scop/
24
Summary
  • Pedigrees from Genomes: Do infinite genomes
    determine pedigrees? How many pedigrees are there?
  • Comparative Genomics of Alternative Splicing: How
    well do you know the ASG? How do you measure
    selection on the ASG?
  • Viral Annotation: How well can you annotate
    viruses from observed evolution?
  • Evolving Turing Patterns: Turing Patterns and
    Networks; Stochastic Turing Patterns;
    Phylogenetically Related Turing Patterns
  • Protein Structure Evolution: Full Model of
    Structure Evolution; Model of Protein Topology
    Evolution