Introductory Biological Sequence Analysis Through Spreadsheets

Provided by: stephenj5
Learn more at: http://www.mscs.mu.edu
Transcript and Presenter's Notes

1
Introductory Biological Sequence Analysis
Through Spreadsheets
  • Stephen J. Merrill
  • Sandra E. Merrill
  • Marquette University
  • Milwaukee, WI

2
Teaching Mathematics to Students of
Biology
  • Need to make the math in the courses correlate
    with the math that is needed in that discipline
  • The most important math needed is statistics
  • The molecular biology revolution presents data
    in a form (sequences of letters) for which
    calculus has little impact

3
The Nature of Biological Sequence Data
  • The primary structures of DNA, RNA, and proteins
    are sequences of letters -- 4 letters in the case
    of DNA (ATGC) and RNA (AUGC), and 20 letters
    representing the amino acids which make up a
    protein
  • Secondary and tertiary structure (bending,
    folding and twisting) determines function --
    hints can be seen in the primary structure

4
Use of Spreadsheets in this setting
  • Commonly found and used in biological labs for
    data acquisition, storage and organization, and
    data analysis
  • Commonly present on student computers and
    computer labs
  • Unlike calculators -- able to handle data sets
    typical of real world applications
  • R.F. Murphy at CMU has developed a set of
    worksheets for sequence analysis

5
Meaningful Questions / Problems
  • 1. Measuring the similarity between two strings
    -- alignment or homology
  • 2. Finding instances of a pattern in a string
  • 3. Describing the composition and properties of a
    string
  • 4. Graphing the evolutionary process and
    construction of phylogenetic trees

6
Measuring the Similarity between Strings
  • Given a gene -- suggest the function of the
    protein coded for by finding a similar sequence
    (possibly in another species)
  • Simple homology involves assigning a 1 for
    agreement and 0 for nonagreement at each site.
    Then sum over all sites
  • Homology is the fraction of the highest possible
    score, in
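
The scoring rule on this slide can be sketched in Python rather than a spreadsheet. This is a minimal illustration, not the authors' worksheet: the function names `simple_homology` and `random_dna` are hypothetical, and the sketch assumes the two sequences are already the same length (no gaps or alignment step).

```python
import random


def simple_homology(seq_a, seq_b):
    """Fraction of sites at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    # Score 1 for agreement and 0 for nonagreement at each site, then sum.
    score = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    # The highest possible score equals the number of sites.
    return score / len(seq_a)


def random_dna(length, rng):
    """Hypothetical helper: a random DNA string with equally likely letters."""
    return "".join(rng.choice("ACGT") for _ in range(length))


# Two short sequences that agree at 6 of 8 sites.
print(simple_homology("ATGCATGC", "ATGAATGG"))

# Baseline: homology between two unrelated random sequences.
rng = random.Random(0)
h_random = simple_homology(random_dna(1000, rng), random_dna(1000, rng))
print(h_random)
```

With four equally likely letters, two unrelated sequences are expected to agree at about one site in four, which gives a baseline against which an observed homology can be judged.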

7
Spreadsheet 1: Simple Homology

8
Spreadsheet 1 (cont.): comparing random
sequences

9
Finding Instances of a Particular
Pattern in a String
  • The process of locating genes involves locating
    regions of the DNA sequences that contain
    patterns which resemble those of known genes
  • Identifying sites on DNA where a restriction
    enzyme can cleave it -- also of interest is the
    size of the fragments that result
  • Identifying regions of RNA which correspond to
    particular features (e.g. loops) which may be
    splice sites
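
As a rough sketch of the restriction-site question above, the following Python finds every occurrence of a recognition pattern and the sizes of the resulting fragments. The helper names and the example sequence are made up for illustration. The example uses the EcoRI recognition site GAATTC, which is cut after the first G, and the sketch assumes a linear molecule and exact matching on one strand only.

```python
def find_sites(sequence, pattern):
    """Return the 0-based start of every occurrence of pattern (overlaps allowed)."""
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        start = sequence.find(pattern, start + 1)
    return positions


def fragment_lengths(sequence, pattern, cut_offset):
    """Lengths of the pieces left after cutting a linear sequence at every site.

    cut_offset is the number of letters of the pattern that stay on the
    left-hand fragment (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = [pos + cut_offset for pos in find_sites(sequence, pattern)]
    boundaries = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


# Made-up 22-letter example containing two EcoRI sites.
dna = "TTGAATTCAAGGCCGAATTCTT"
print(find_sites(dna, "GAATTC"))           # site positions: [2, 14]
print(fragment_lengths(dna, "GAATTC", 1))  # fragment sizes: [3, 12, 7]
```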

10
Describing the Composition and Properties of a
String
  • Counts or frequencies of particular letters, of
    interest because of their properties (e.g.
    regions rich in GC or AT in DNA)
  • Properties of proteins (e.g. charge or
    hydrophobicity) which depend on the nature and
    frequencies of the particular amino acids
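
Both kinds of description can be sketched in a few lines of Python. The slides do not say which hydrophobicity scale the spreadsheet uses. This sketch assumes the Kyte-Doolittle hydropathy values and a sliding-window average, and the function names and the default window of 7 are illustrative choices.

```python
# Kyte-Doolittle hydropathy values (an assumption; the slides name no scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gc_fraction(dna):
    """Fraction of letters in a non-empty DNA string that are G or C."""
    return sum(1 for base in dna if base in "GC") / len(dna)


def hydropathy_profile(protein, window=7):
    """Average hydropathy over each window of consecutive amino acids."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


print(gc_fraction("AATTGGCC"))                  # half of the letters are G or C
print(hydropathy_profile("IIIRRR", window=3))   # falls from hydrophobic to hydrophilic
```

Plotting the profile against window position gives a hydropathy plot: long stretches with high averages suggest hydrophobic regions of the protein.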

11
Spreadsheet 2: Hydropathy Plot

12
Spreadsheet 2 (cont.)

13
Graphing Evolution and Phylogenetic
Trees
  • Evolutionary distance between two DNA sequences
    is used to determine the process of the changes
    in the sequences over time (e.g. the evolution of
    HIV or the flu viruses)
  • Trees are constructed to express the relationship
    between related sequences -- distance in the tree
    is a monotone function of homology
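
One way to make "distance as a monotone function of homology" concrete is sketched below. The slides do not specify the function. This sketch assumes the simplest choice, distance = 1 - homology (the fraction of differing sites), and also shows the Jukes-Cantor correction as one commonly used alternative. The sequence names and helper functions are hypothetical, and no tree-building step is included.

```python
import math
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing sites, i.e. 1 - simple homology (equal lengths assumed)."""
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; defined only for p < 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(sequences):
    """Pairwise p-distances for a dict mapping name -> sequence."""
    return {
        (name_a, name_b): p_distance(sequences[name_a], sequences[name_b])
        for name_a, name_b in combinations(sequences, 2)
    }


# Made-up sequences for illustration only.
sequences = {"A": "ATGCATGC", "B": "ATGCATGA", "C": "TTGAATCA"}
matrix = distance_matrix(sequences)
closest = min(matrix, key=matrix.get)
print(matrix)
print(closest)  # the pair a tree-building method would typically join first
```

Either distance decreases as homology increases, so the ordering of pairs from closest to farthest is the same under both.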

14
Spreadsheet 3: Mutation Evolution

15
Spreadsheet 3 (cont.)
To study the evolution of a
sequence, we randomly pick a site for mutation,
then change its letter
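
The mutation step described here can be sketched as a small simulation. This is an illustration under stated assumptions, not the authors' spreadsheet: sites are chosen uniformly at random, the replacement letter is always different from the current one, and the names `mutate_once`, `evolve` and `homology` are hypothetical.

```python
import random


def mutate_once(seq, rng, alphabet="ACGT"):
    """Pick one site at random and change its letter to a different one."""
    site = rng.randrange(len(seq))
    choices = [letter for letter in alphabet if letter != seq[site]]
    return seq[:site] + rng.choice(choices) + seq[site + 1:]


def evolve(seq, steps, rng):
    """Return the ancestor followed by the sequence after each mutation."""
    history = [seq]
    for _ in range(steps):
        seq = mutate_once(seq, rng)
        history.append(seq)
    return history


def homology(seq_a, seq_b):
    """Fraction of sites at which two equal-length sequences agree."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b) / len(seq_a)


rng = random.Random(1)
ancestor = "ATGC" * 10
history = evolve(ancestor, 20, rng)

# Homology to the ancestor after every fifth mutation.
for step in range(0, 21, 5):
    print(step, homology(ancestor, history[step]))
```

Because a later mutation can hit a site that has already changed, homology to the ancestor tends to fall more slowly than the number of mutations alone would suggest.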
16
Conclusion
  • Use of a spreadsheet makes possible an
    experimental approach to introducing the
    mathematics of sequence analysis
  • The use of spreadsheets makes possible the use of
    real-world data and presents the computational
    tool in a meaningful context
  • The importance of the topics to all educated
    individuals suggests that the topics be included
    in many liberal arts math courses