Computational Biology - PowerPoint PPT Presentation

About This Presentation
Title:

Computational Biology

Description:

9/15/09. Math 6390. 1. Computational Biology. Dr. Isabel Darcy. EC 3.914. 972-882-4435 ... 5% of genome codes for protein. 95% junk DNA??? Thyroglobulin gene: ...

Provided by: utda

Transcript and Presenter's Notes

1
Computational Biology
  • Dr. Isabel Darcy
  • EC 3.914
  • 972-882-4435
  • darcy_at_utdallas.edu
  • www.utdallas.edu/darcy

2
Human Genome
  • 3 billion base pairs
  • 46 chromosomes
  • 5% of genome codes for protein.
  • 95% junk DNA???
  • Thyroglobulin gene (extreme example):
  • introns: more than 100,000 bp
  • exons: only 8,500 bp
  • Gene Expression

Some proteins are 1000x more common than other
proteins.
3
Areas used/needed in computational biology
  • Biology
  • Computer science
  • Statistics
  • Graph Theory
  • Linear Algebra
  • Topology
  • Algebraic Geometry
  • etc.

4
  • Mathematics
  • 1 + 1 = 2 always.
  • Topology means
  • Biology
  • G always pairs with C (i.e., usually)
  • Topology means

5
My definition of computational biology
  • The translating of biological
    concepts/questions into rigorous mathematical
    and/or computational problems, solving these
    problems, and translating the answers back into
    useful biological information.

Example
Determining how the mitochondrial DNA of the
trypanosome is packed. The trypanosome is a parasite
that infects the tsetse fly, which in turn infects
humans and animals with sleeping sickness.
6
  • The trypanosome mitochondrial DNA consists of
    5000 DNA mini-circles and about 25 DNA
    maxi-circles.
  • Question: How are the mini-circles of DNA linked
    together?
  • Assumption: The linking is uniform throughout
    the network.
  • Tool: The network can be randomly broken into
    much smaller networks. The number of DNA
    mini-circles in the resulting small networks can
    be determined (via gel electrophoresis).

7
We will cover
  • Ch 1: Intro to Molecular Biology
  • Ch 2: Some basics (strings, graphs, algorithms)
  • Ch 3: Sequence comparison and database search
  • Ch 6: Phylogenetic Trees
  • Ch 7: Genome Rearrangements
  • Microarrays
  • Protein Folding (ch 8?)

8
Possible Projects
  • Microarrays
  • Protein Folding
  • DNA computing (Ch 9)
  • Human Brain Project
  • Chemical Chirality
  • Gel Electrophoresis (DNA Topology)
  • Or any other approved subject.

9
Web page creation
  • For instructions specific to setting up a UTD
    web page see
    http://www.utdallas.edu/ir/tcs/labs/unixdocs/provider.html
  • For instructions on how to create an HTML document
    see
  • http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html
  • or http://www.hypernews.org/HyperNews/get/www/html/lang.html
  • OR
  • Use another format and convert to HTML. For
    example, Microsoft Word documents and PowerPoint
    slides can be converted to HTML using the Save As
    command. LaTeX files can be converted to HTML
    files using the command latex2html file.tex.
    This will create the directory file, which
    includes file.html, figures, etc.
  • You can also learn from other people's web pages
    by looking at/copying their HTML code. From
    Netscape or Internet Explorer, click on Source
    located under View.

10
Web page creation (cont.)
  • Use Netscape Composer.
  • First create file.html.
  • Under Communicator, click Composer. Netscape
    Composer will pop up.
  • Click the Open button.
  • Enter the URL
    (http://www.utdallas.edu/your_name/file.html)
    or click Choose file and click file.html
    (after changing into the appropriate directory).
  • Use a PC product such as FrontPage.

11
2.1 Strings
  • Alphabet: a finite set
  • e.g., {A, T, C, G}, {amino acids}, {a, ..., z}.
  • Character or symbol: an element of the alphabet.
  • Sequence or string: an ordered succession of
    characters.
  • e.g., ATATCAGTTGCC
  • Length of a string s: $|s|$ = number of characters
    in s.
  • $s_i$: the $i$th character in string s.
  • Empty string $\epsilon$: the string of length zero.
  • A subsequence of s is a sequence that can be
    obtained from s by removing some characters.
  • t is a supersequence of s if s is a subsequence of
    t.
  • A substring of s is a subsequence where the
    characters are consecutive in s.
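
The difference between a subsequence and a substring can be checked with a short Python sketch. The function names is_subsequence and is_substring are illustrative and do not come from the slides; note also that Python indexes strings from 0, while the slides index from 1.

```python
def is_subsequence(t: str, s: str) -> bool:
    """Return True if t can be obtained from s by removing some characters."""
    remaining = iter(s)
    # Each character of t must be found, in order, in what is left of s.
    return all(ch in remaining for ch in t)


def is_substring(t: str, s: str) -> bool:
    """Return True if t is a subsequence of s whose characters are consecutive."""
    return t in s


s = "ATATCAGTTGCC"
print(len(s))                     # the length |s|
print(s[0])                       # s_1 in the slide's 1-based notation
print(is_subsequence("AAGC", s))  # characters appear in order, with gaps
print(is_substring("AAGC", s))    # but they are not consecutive in s
print(is_substring("CAGT", s))    # consecutive characters of s
```

Every substring is a subsequence, but not the other way around, which is what the last three lines illustrate.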

12
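  • An interval, $[i..j]$, is a set of consecutive
    indices $i, i+1, \ldots, j$. For a string s:
  • $s[i..j] = s_i s_{i+1} \cdots s_j$ if $i \le j$,
    $s[i..j] = \epsilon$ if $i = j+1$.
  • $st = s[1..n]\,t[1..m]$ is the concatenation of
    $s = s[1..n]$ and $t = t[1..m]$.
  • prefix(s, j) = $s[1..j]$ is a prefix of $s = s[1..n]$.
  • suffix(s, j) = $s[n-j+1..n]$ is a suffix of
    $s = s[1..n]$.
  • $\kappa$ is the killer agent that destroys characters it
    operates on.
  • Note: Concatenation is not associative once $\kappa$ is allowed:
  • $(AT\kappa)CT = ACT \neq ATT = AT(\kappa CT)$
  • $|\kappa| = -1$
  • prefix(s, j) = $s\kappa^{|s|-j}$
  • suffix(s, j) = $\kappa^{|s|-j}s$
  • $s[i..j] = \kappa^{i-1}s\kappa^{|s|-j}$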
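
The interval, prefix, suffix and killer-agent notation can be sketched in Python as follows. This is only an illustration: the helper names interval, kill_right and kill_left are assumptions, not part of the slides, and the killer agent is modelled simply as deleting characters from one end of a string.

```python
def interval(s: str, i: int, j: int) -> str:
    """Return s[i..j] with 1-based, inclusive indices; empty when i == j + 1."""
    if not (1 <= i <= j + 1 <= len(s) + 1):
        raise ValueError("need 1 <= i <= j + 1 <= |s| + 1")
    return s[i - 1:j]


def prefix(s: str, j: int) -> str:
    """prefix(s, j) = s[1..j]: the first j characters of s."""
    return interval(s, 1, j)


def suffix(s: str, j: int) -> str:
    """suffix(s, j) = s[n-j+1..n]: the last j characters of s."""
    n = len(s)
    return interval(s, n - j + 1, n)


def kill_right(s: str, count: int) -> str:
    """Apply the killer agent `count` times on the right of s."""
    if not 0 <= count <= len(s):
        raise ValueError("cannot destroy more characters than s has")
    return s[:len(s) - count]


def kill_left(s: str, count: int) -> str:
    """Apply the killer agent `count` times on the left of s."""
    if not 0 <= count <= len(s):
        raise ValueError("cannot destroy more characters than s has")
    return s[count:]


s = "ATATCAGTTGCC"
print(prefix(s, 4), suffix(s, 3), interval(s, 5, 7))
# The non-associativity example: (AT k)CT versus AT(k CT).
print(kill_right("AT", 1) + "CT", "AT" + kill_left("CT", 1))
```

With these helpers, kill_left(kill_right(s, len(s) - j), i - 1) gives the same string as interval(s, i, j), which mirrors the last identity in the list.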

13
  • A graph, $G = (V, E)$, is a collection of
    vertices, $V = \{v_a\}$, and edges,
  • $E = \{(v_a, v_b)\}$
  • If G is an undirected graph,
  • $(v_a, v_b) = (v_b, v_a)$ for all $v_a, v_b \in V$
  • Example: $G = (V, E)$ where
  • $V = \{v_1, v_2, v_3, v_4\}$ and
  • $E = \{(v_1, v_2), (v_1, v_3), (v_1, v_4), (v_2, v_3), (v_2, v_4), (v_3, v_4)\}$
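
One possible way to store the example graph in Python is as a set of vertices plus a set of edge pairs, converted to adjacency sets when needed. The adjacency helper below is a hypothetical name chosen for this sketch.

```python
# Vertices and edges of the example graph (every pair of vertices is joined).
V = {"v1", "v2", "v3", "v4"}
E = {
    ("v1", "v2"), ("v1", "v3"), ("v1", "v4"),
    ("v2", "v3"), ("v2", "v4"), ("v3", "v4"),
}


def adjacency(vertices, edges, directed=False):
    """Build a dict mapping each vertex to the set of its neighbours."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        if not directed:
            # In an undirected graph (u, v) and (v, u) are the same edge.
            adj[v].add(u)
    return adj


adj = adjacency(V, E)
print(sorted(adj["v1"]))
```

Reading the same edge list with directed=True keeps only the direction in which each pair was written, so v4 would then have no outgoing neighbours.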

14
  • G directed: $(u,v) \neq (v,u)$.
  • G undirected: $(u,v) = (v,u)$.
  • Simple graph: no loops, $(u,u) \notin E$, and
    no 2 copies of an edge $(u,v) \in E$.
  • $|V|$ = number of vertices, $|E|$ = number of edges.
  • u and v are the endpoints of the edge (u,v).
  • u and v are incident to (u,v).
  • If (u,v) is directed, u is the tail of this edge
    and v is the head.
  • u and v are adjacent if (u,v) is in E.
  • The degree of v is the number of edges incident
    to it.
  • If G is directed:
  • the outdegree of v is the number of edges in E
    of the form (v,x).
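
The degree, outdegree and simple-graph definitions translate almost directly into code. The following is a minimal sketch that assumes edges are given as a list of pairs; the function names are illustrative, and degree assumes the graph has no loops.

```python
def degree(v, edges):
    """Number of undirected edges that have v as an endpoint (no loops assumed)."""
    return sum(1 for a, b in edges if v in (a, b))


def outdegree(v, edges):
    """Number of directed edges of the form (v, x), i.e. with tail v."""
    return sum(1 for tail, _head in edges if tail == v)


def is_simple(edges, directed=False):
    """True if there are no loops (u, u) and no repeated edges."""
    seen = set()
    for u, v in edges:
        if u == v:
            return False
        # Undirected edges are compared without regard to order.
        key = (u, v) if directed else frozenset((u, v))
        if key in seen:
            return False
        seen.add(key)
    return True


E = [("u", "v"), ("v", "w"), ("u", "w"), ("w", "x")]
print(degree("w", E), outdegree("w", E), is_simple(E))
```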

15
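  • $G' = (V', E')$ is a subgraph of $G = (V, E)$ if
    $V' \subseteq V$ and $E' \subseteq E$.
  • If $G'$ is a subgraph of $G$ and $G' \neq G$, then $G'$ is a
    proper subgraph of $G$.
  • If $V' = V$, $G'$ is a spanning subgraph of $G$.
  • If $V' = \{v \mid v \text{ is an endpoint of an edge in } E'\}$,
    then $G'$ is the graph induced by $E'$.
  • If $E' = \{(v,w) \in E \mid v, w \in V'\}$, then $G'$ is the
    graph induced by $V'$.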
  • A path is an ordered list of distinct vertices
    $(v_1, v_2, \ldots, v_k)$ such that $(v_i, v_{i+1})$ is an
    edge in G for each $i < k$.
  • A cycle in an undirected graph is a path where
    $v_k = v_1$ and no edge is repeated.
  • A simple cycle is a cycle where all vertices
    except the first and the last are distinct.
  • A vertex v is reachable from vertex u if there is
    a path between u and v.
  • The weight of a path is the sum of the weights of
    its edges.
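
Induced subgraphs, paths and path weights can be sketched in a few lines of Python. The sketch assumes an undirected graph stored as a set of vertex pairs and a weights dictionary keyed by frozenset({u, v}); both representations and all function names are assumptions made for illustration.

```python
def induced_by_vertices(edges, v_sub):
    """Edges of the subgraph induced by the vertex set v_sub."""
    return {(v, w) for v, w in edges if v in v_sub and w in v_sub}


def induced_by_edges(e_sub):
    """Vertices of the subgraph induced by the edge set e_sub."""
    return {x for edge in e_sub for x in edge}


def is_path(vertices, edges):
    """True if the vertices are distinct and consecutive ones share an edge."""
    undirected = {frozenset(e) for e in edges}
    distinct = len(set(vertices)) == len(vertices)
    linked = all(
        frozenset((a, b)) in undirected
        for a, b in zip(vertices, vertices[1:])
    )
    return distinct and linked


def path_weight(vertices, weights):
    """Sum of the edge weights along a path."""
    return sum(
        weights[frozenset((a, b))] for a, b in zip(vertices, vertices[1:])
    )


E = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")}
weights = {
    frozenset(("a", "b")): 2,
    frozenset(("b", "c")): 3,
    frozenset(("c", "d")): 1,
    frozenset(("a", "c")): 7,
}
print(induced_by_vertices(E, {"a", "b", "c"}))
print(is_path(["a", "b", "c", "d"], E), path_weight(["a", "b", "c", "d"], weights))
```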

16
  • An undirected graph is connected if every vertex
    is reachable from every other vertex.
  • The connected components of G are its maximal
    connected subgraphs: the connected subgraphs of G
    that are not proper subgraphs of any other
    connected subgraph of G.
  • A directed graph is strongly connected if every
    vertex is reachable from every other vertex.
  • A directed graph is weakly connected if the
    underlying undirected graph is connected (every
    vertex is reachable if we ignore edge direction).
  • A directed graph is not connected if it is
    neither strongly nor weakly connected.
  • An acyclic graph is a graph without cycles.
  • A complete graph is a graph such that $v, w \in V$
    with $v \neq w$ implies $(v,w) \in E$.
  • A bipartite graph $G = (V,E)$ is a graph such that
    $V = V_1 \cup V_2$ where $V_1 \cap V_2 = \emptyset$
    and every edge has one endpoint in $V_1$ and the
    other endpoint in $V_2$.
  • A tree is a graph which is acyclic and connected.
  • A forest is a graph whose connected components
    are trees.
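
Connected components, forests and trees can be checked with a breadth-first search. The sketch below assumes a simple undirected graph given as a list of vertices and a list of edge pairs, and it uses the standard fact that such a graph is acyclic exactly when |E| = |V| minus the number of components; the function names are illustrative.

```python
from collections import deque


def connected_components(vertices, edges):
    """Return the connected components of an undirected graph as vertex sets."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u] - component:
                component.add(v)
                queue.append(v)
        seen |= component
        components.append(component)
    return components


def is_forest(vertices, edges):
    """Acyclic check for a simple graph: |E| == |V| - number of components."""
    return len(edges) == len(vertices) - len(connected_components(vertices, edges))


def is_tree(vertices, edges):
    """A tree is a graph that is acyclic and connected."""
    return is_forest(vertices, edges) and len(connected_components(vertices, edges)) == 1


V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("b", "c"), ("d", "e")]
print(connected_components(V, E))
print(is_forest(V, E), is_tree(V, E), is_tree(V, E + [("c", "d")]))
```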

17
Trees
  • A node is a vertex
  • A leaf is a node with degree 1. All other nodes
    are interior nodes.
  • A tree is rooted if one of its nodes is
    distinguished. This distinguished node is called
    the root (denoted by r).
  • If v is a node on the path from r to u, then v is
    an ancestor of u and u is a descendant of v.
  • If u and v are adjacent and v is an ancestor of
    u, then v is the parent of u and u is the child
    of v. Note: leaves are nodes without children,
    interior nodes have children, the root has no
    parent.
  • The depth of a node v is the number of edges on
    the path from v to r.
  • The lowest common ancestor of u and v is the
    deepest node that is an ancestor of both u and v
    (i.e., the closest node to u and v which is an
    ancestor of both u and v).
  • Interval graphs
  • An interval graph $G = (V,E)$ is an undirected
    graph obtained from a collection C of intervals
    on the real line. To each interval in C there
    corresponds a vertex in G. The edge $(u,v)$ is in
    E if and only if the intervals corresponding to
    u and v intersect.
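
An interval graph can be built directly from its definition. The sketch below assumes closed intervals given as a dictionary {name: (left, right)}, which is a choice made here and not something the slides specify; the function name interval_graph is also illustrative.

```python
from itertools import combinations


def interval_graph(intervals):
    """Build the interval graph of closed intervals {name: (left, right)}."""
    vertices = set(intervals)
    edges = set()
    for u, v in combinations(sorted(intervals), 2):
        (a1, b1), (a2, b2) = intervals[u], intervals[v]
        # Two closed intervals intersect iff each starts before the other ends.
        if a1 <= b2 and a2 <= b1:
            edges.add((u, v))
    return vertices, edges


C = {"x": (0, 3), "y": (2, 5), "z": (4, 7), "w": (8, 9)}
V, E = interval_graph(C)
print(sorted(E))
```

Here x overlaps y and y overlaps z, while w is disjoint from the rest and so becomes an isolated vertex.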