Introduction to Bioinformatics 20120 - PowerPoint PPT Presentation


1
Introduction to Bioinformatics 20120
  • Gianluca Pollastri
  • office CS A1.07
  • email gianluca.pollastri_at_ucd.ie

2
Credits
  • Richard Lathrop's and Pierre Baldi's Bioinformatics
    courses at the University of California, Irvine.

3
Course overview
  • Context: DNA, RNA, proteins.
  • Resources: GenBank, PDB, etc.
  • Algorithms for sequence comparison.
  • Phylogenetics.
  • Structural bioinformatics: protein structure
    prediction.

4
Lecture notes
  • http://gruyere.ucd.ie/2007_courses/20120/
  • confidential..

5
Recommended/useful readings
  • No book is actually required
  • Introduction to Bioinformatics (Lesk)
  • Introduction to Computational Molecular Biology
    (Setubal, Meidanis)
  • Bioinformatics: the Machine Learning Approach
    (Baldi, Brunak)

6
  • CS 20120, Introduction to Bioinformatics
  • Assignment 1, 29 January 2007
  • 10% of the overall mark
  • To hand in by midnight of February 12
  • 1. Identify your favourite pet
  • 2. Get the protein sequence for one of its genes
    on
  • a. http://www.ncbi.nlm.nih.gov/entrez/
  • 3. BLAST your sequence against UniProt at
  • a. http://www.ebi.ac.uk/blast2/index.html?UniProt
  • 4. If you get fewer than 6 results from 6
    different organisms, go back to step 2 and choose
    another protein
  • 5. Select 6 sequences returned by BLAST, from 6
    different organisms (ticking the appropriate
    boxes and downloading them in FASTA format will
    give you the right input format for the next
    step)
  • 6. Run ClustalW on them using the page below (be
    patient, it might take time)
  • a. http://www.ebi.ac.uk/clustalw/index.html
  • 7. Draw a phylogenetic tree for your guide tree
    (.dnd) using an online viewer, e.g.
  • a. http://bioweb.pasteur.fr/seqanal/interfaces/drawtree.html
  • 8. Email me (gianluca.pollastri_at_ucd.ie)
  • a. your protein sequence UniProt record

7
Algorithms for sequence comparison
  • Generating all possible alignments and picking
    the best one: impossibly slow.
  • Dynamic programming (here "programming" has
    nothing to do with computers): solving a problem
    by splitting it dynamically into subparts.
  • We build up a solution based on similarity
    between prefixes of the two sequences.
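
To get a feeling for "impossibly slow", here is a minimal Python sketch that counts how many gapped alignments two sequences of lengths n and m admit. It assumes every column is either residue/residue, residue/gap or gap/residue, and that alignments differing only in the order of adjacent gaps count as distinct. The function name count_alignments is made up for this illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(n: int, m: int) -> int:
    """Number of gapped alignments of two sequences of lengths n and m."""
    if n == 0 or m == 0:
        # Only one way left: put the remaining residues against gaps.
        return 1
    return (
        count_alignments(n - 1, m)        # last column: residue over gap
        + count_alignments(n, m - 1)      # last column: gap over residue
        + count_alignments(n - 1, m - 1)  # last column: residue over residue
    )


if __name__ == "__main__":
    # The count grows exponentially with the sequence length.
    for length in (1, 2, 3, 5, 10, 13):
        print(length, count_alignments(length, length))
```

Note that the recursion already has the "last column" structure that dynamic programming exploits: the same three cases reappear when we fill the similarity matrix.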

8
Aligning prefixes
  • Specifically, we solve the alignment problem of
    two sequences by splitting it iteratively (or
    recursively) into the alignment of their prefixes.

9
The algorithm
  • We can fill an (n+1)x(m+1) matrix with this
    stuff

10
Hope it's right

11
Computing the matrix
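    // g: score of a gap (negative, i.e. g = -gap for a gap penalty gap)
    m = |s|
    n = |t|
    for i = 0..m: a[i,0] = i*g      // m+1
    for j = 0..n: a[0,j] = j*g      // n+1
    for i = 1..m
        for j = 1..n
            a[i,j] = max(a[i-1,j] + g,
                         a[i,j-1] + g,
                         a[i-1,j-1] + p(s[i], t[j]))
    // (n+1)(m+1) max, 3 sums, etc.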
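
The fill step above can be sketched in Python as follows. This is only an illustration: the scoring values (match +1, mismatch -1, gap score g = -2) are the ones used in the score example later in these slides, the function name fill_matrix is invented here, and Python strings are 0-based, so s[i] on the slide becomes s[i - 1].

```python
def fill_matrix(s, t, g=-2, match=1, mismatch=-1):
    """Fill the max-similarity matrix between all prefixes of s and t.

    a[i][j] holds the best score for aligning s[:i] with t[:j].
    g is the (negative) score of a gap. Scoring values are assumptions.
    """
    m, n = len(s), len(t)
    a = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        a[i][0] = i * g          # prefix of s against gaps only
    for j in range(n + 1):
        a[0][j] = j * g          # prefix of t against gaps only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            a[i][j] = max(
                a[i - 1][j] + g,      # s[i] against a gap
                a[i][j - 1] + g,      # t[j] against a gap
                a[i - 1][j - 1] + p,  # s[i] against t[j]
            )
    return a


if __name__ == "__main__":
    a = fill_matrix("ACCAGGCTACGA", "ACCTGGGCCACGT")
    print(a[-1][-1])  # 4, the max similarity between the two sequences
```

The cost is one max over three sums per cell, so both time and memory grow with (m+1)(n+1).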

12
Computing the alignment
  • What we computed here is the max similarity
    matrix between all prefixes of s and t.
  • Using this matrix we can compute the optimal
    alignments between s and t (there could be more
    than one).
  • a[m,n] is the max similarity between s and t. We
    find the alignment by tracing back the choices
    that led us there.

13
Computing the alignment (2)
    // We have filled in matrix a[i,j] before
    // (note: on this slide n = |s| and m = |t|)
    n = |s|
    m = |t|
    al_s = ""    // store here aligned s
    al_t = ""    // store here aligned t
    gap = 2      // gap penalty
    i = n+m      // index for the alignment: don't know how
                 // long, but at most n+m
    align()
        while (n>0 || m>0)
            if (n>0 && a[n,m] == a[n-1,m] - gap)
                al_s[i] = s[n]
                al_t[i] = '-'
                n = n-1
            else if (n>0 && m>0 && a[n,m] == a[n-1,m-1] + p(s[n], t[m]))
                al_s[i] = s[n]
                al_t[i] = t[m]
                n = n-1
                m = m-1

14
    align()
        while (n>0 || m>0)
            if (n>0 && a[n,m] == a[n-1,m] - gap)
                al_s[i] = s[n]
                al_t[i] = '-'
                n = n-1
            else if (n>0 && m>0 && a[n,m] == a[n-1,m-1] + p(s[n], t[m]))
                al_s[i] = s[n]
                al_t[i] = t[m]
                n = n-1
                m = m-1
            else if (m>0 && a[n,m] == a[n,m-1] - gap)
                al_s[i] = '-'
                al_t[i] = t[m]
                m = m-1
            i = i-1

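
A runnable Python sketch of the whole procedure (fill, then trace back) might look like this. It is an illustration under assumptions, not the course's reference code: scoring is match +1, mismatch -1, gap penalty 2; the function name needleman_wunsch is chosen here; and instead of writing into position i of a preallocated array, the aligned characters are appended and reversed at the end.

```python
def needleman_wunsch(s, t, gap=2, match=1, mismatch=-1):
    """Global alignment with traceback order: vertical, diagonal, horizontal."""

    def p(x, y):
        return match if x == y else mismatch

    n, m = len(s), len(t)
    # Fill the (n+1) x (m+1) max-similarity matrix.
    a = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        a[i][0] = -i * gap
    for j in range(m + 1):
        a[0][j] = -j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a[i][j] = max(
                a[i - 1][j] - gap,
                a[i][j - 1] - gap,
                a[i - 1][j - 1] + p(s[i - 1], t[j - 1]),
            )

    score = a[n][m]
    al_s, al_t = [], []  # built back to front, reversed below
    while n > 0 or m > 0:
        if n > 0 and a[n][m] == a[n - 1][m] - gap:
            # vertical move: s[n] against a gap
            al_s.append(s[n - 1])
            al_t.append("-")
            n -= 1
        elif n > 0 and m > 0 and a[n][m] == a[n - 1][m - 1] + p(s[n - 1], t[m - 1]):
            # diagonal move: s[n] against t[m]
            al_s.append(s[n - 1])
            al_t.append(t[m - 1])
            n -= 1
            m -= 1
        else:
            # horizontal move: a gap against t[m]
            al_s.append("-")
            al_t.append(t[m - 1])
            m -= 1
    return "".join(reversed(al_s)), "".join(reversed(al_t)), score


if __name__ == "__main__":
    al_s, al_t, score = needleman_wunsch("ACCAGGCTACGA", "ACCTGGGCCACGT")
    print(al_s)
    print(al_t)
    print(score)
```

Swapping the first and last branches of the traceback changes which of the equally scoring alignments is returned.
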
15
(No Transcript)
16
Alignment
  • ACC-AGGCTACGA
  • ACCTGGGCCACGT
  • only one gap, no big deal here..

17
Order matters
  • There might be multiple paths with the same
    score. We used an upmost order here:
  • 1 vertical
  • 2 diagonal
  • 3 horizontal

18
Order matters (2)
  • To follow a downmost order, reverse the if
    statements in the code.
  • 1 horizontal
  • 2 diagonal
  • 3 vertical

19
(No Transcript)
20
Upmost and downmost alignment
  • upmost
  • ACC-AGGCTACGA
  • ACCTGGGCCACGT
  • downmost
  • ACCAGG-CTACGA
  • ACCTGGGCCACGT
  • Both alignments have the same score:
  • 9 matches (1 x 9),
  • 3 mismatches (-1 x 3),
  • 1 gap (-2)
  • total: 9 - 3 - 2 = 4
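
The arithmetic above is easy to check mechanically. The small Python sketch below scores two already-aligned strings column by column, using the same values as the slide (match +1, mismatch -1, gap -2); the helper name alignment_score is introduced only for this example.

```python
def alignment_score(al_s, al_t, match=1, mismatch=-1, gap=-2):
    """Score two aligned strings of equal length, column by column."""
    if len(al_s) != len(al_t):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(al_s, al_t):
        if x == "-" or y == "-":
            score += gap        # a residue against a gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


if __name__ == "__main__":
    print(alignment_score("ACC-AGGCTACGA", "ACCTGGGCCACGT"))  # upmost: 4
    print(alignment_score("ACCAGG-CTACGA", "ACCTGGGCCACGT"))  # downmost: 4
```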

21
NW algorithm issues
  • Always looks for a global alignment.
  • If I try to align the following:
  • first_alignment_try_ok
  • second_alignment_try_ehm
  • This is what I get:
  • -first_alignment_try_-ok
  • second_alignment_try_ehm

22
NW algorithm issues
  • Always looks for a global alignment.
  • If I try to align the following:
  • alignment_alignment_try_ok
  • alignment_try_ehm
  • This is what I might get:
  • alignment_alignment_try_-ok
  • alig----------nment_try_ehm

23
Local alignment
  • We may want a variation of the previous algorithm
    that throws away stuff that clearly does not
    match, while keeping the good bits together.
  • More formally: find the highest scoring alignment
    between substrings of s and t.
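
One common way to get this behaviour is to let every cell of the matrix restart from 0, and to trace back from the best cell until a 0 is reached. The Python sketch below does that, in the spirit of the Smith-Waterman algorithm named on the next slide. It is a hedged illustration: the scoring values (match +1, mismatch -1, gap penalty 2) are carried over from the global examples, and the function name local_alignment is invented here.

```python
def local_alignment(s, t, gap=2, match=1, mismatch=-1):
    """Best-scoring alignment between substrings of s and t (a sketch)."""

    def p(x, y):
        return match if x == y else mismatch

    n, m = len(s), len(t)
    a = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a[i][j] = max(
                0,  # throw away whatever came before
                a[i - 1][j] - gap,
                a[i][j - 1] - gap,
                a[i - 1][j - 1] + p(s[i - 1], t[j - 1]),
            )
            if a[i][j] > best:
                best, best_i, best_j = a[i][j], i, j

    # Trace back from the best cell until the score drops to 0.
    i, j = best_i, best_j
    al_s, al_t = [], []
    while i > 0 and j > 0 and a[i][j] > 0:
        if a[i][j] == a[i - 1][j - 1] + p(s[i - 1], t[j - 1]):
            al_s.append(s[i - 1])
            al_t.append(t[j - 1])
            i -= 1
            j -= 1
        elif a[i][j] == a[i - 1][j] - gap:
            al_s.append(s[i - 1])
            al_t.append("-")
            i -= 1
        else:
            al_s.append("-")
            al_t.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(al_s)), "".join(reversed(al_t))


if __name__ == "__main__":
    print(local_alignment("first_alignment_try_ok", "second_alignment_try_ehm"))
```

On the example from the earlier slide, the mismatching ends are dropped and only the shared part, _alignment_try_, is kept.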

24
Smith-Waterman algorithm
Mike Waterman at a conference