Computational Questions - PowerPoint PPT Presentation

Slides: 31
Provided by: JohnPa174
Transcript and Presenter's Notes


1
Computational Questions
  • Bioinformatics

2
Where CS and Biology Meet
  • Bioinformatics: applications of CS to the life
    sciences
  • What are the computational issues?
  • Storage and retrieval of genetic data, data
    mining, tools
  • Analysis of genetic data: similarities,
    differences, structure
  • Processing experimental data

3
Problem Solving in Computer Science
  • Program: sequence of instructions that performs a
    particular task
  • Task (problem): expressed as "given data (input),
    produce results (output)"
  • From problems to programs
  • Formulate the problem
  • Develop and verify an algorithm
  • Write and test the program

4
Algorithm Analysis
  • Algorithm: conceptual/theoretical form of a
    program
  • What is analyzed?
  • Correctness: does it solve the problem?
  • Complexity: how many resources (time and memory)
    does it consume?
  • Tradeoffs: sometimes we need to sacrifice
    correctness for efficiency

5
Example 1: Searching for an Element in a List
  • Problem formulation
  • Input: sorted list L of n elements (e.g., names)
    and a target element x
  • Output: the position of the target element if it
    exists in the list
  • Possible algorithms
  • Linear search
  • Binary search

6
Linear Search
  • Algorithm
  • For each element in L (from the first to the last
    element), compare it with x and return the
    position if equal
  • Time complexity
  • Up to n comparisons performed
  • On average, about n/2 comparisons (when x is in the list)
  • Runs in linear time (proportional to the list
    size n)
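
The linear search described above can be sketched in Python as follows. This is a minimal illustration, not the presenter's code; the function name `linear_search`, the sample list `names`, and the choice to return -1 for a missing element are assumptions made here.

```python
def linear_search(items, target):
    """Return the 0-based position of target in items, or -1 if it is absent."""
    for position, element in enumerate(items):
        # One comparison per element, so at most len(items) comparisons.
        if element == target:
            return position
    return -1


# Hypothetical sample data for illustration only.
names = ["Ana", "Ben", "Carla", "Dino", "Eva"]
print(linear_search(names, "Dino"))  # 3
print(linear_search(names, "Zed"))   # -1
```

Linear search does not need the list to be sorted; it simply stops at the first match.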

7
Binary Search
  • Algorithm
  • Compare the middle element of the list with x and
    return the position if equal; if not, reduce the
    list to either the lower half or the upper half
    of the original list and repeat the process
  • Time complexity
  • Up to about log2 n comparisons performed
  • Runs in logarithmic time
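
A possible Python sketch of binary search is shown below. It assumes the list is sorted in ascending order and, as an illustrative extra, also returns the number of middle elements examined ("probes") so the logarithmic growth can be observed. The names `binary_search` and `probes` are choices made for this sketch.

```python
def binary_search(sorted_items, target):
    """Return (position, probes); position is -1 if target is absent.

    Assumes sorted_items is in ascending order.
    """
    low, high = 0, len(sorted_items) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1  # one middle element examined per iteration
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            low = mid + 1   # keep the upper half
        else:
            high = mid - 1  # keep the lower half
    return -1, probes


numbers = list(range(1000))
worst = max(binary_search(numbers, x)[1] for x in range(-1, 1001))
print(worst)  # 10
```

For a list of 1000 elements, no search examines more than 10 middle elements, which matches the logarithmic bound.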

8
Linear vs Logarithmic Time
          n        n/2   log2 n (rounded up)
         10          5        4
        100         50        7
      1,000        500       10
  1,000,000    500,000       20
  2,000,000  1,000,000       21

9
Comparing Running Times
  • Exercise: tabulate values of the following
    run-time functions for different values of n
  • Functions
  • log n (logarithmic)
  • n (linear)
  • n^2 (quadratic)
  • n^3 (cubic)
  • 2^n (exponential)
  • n! (factorial)
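
One way to carry out this exercise is with a short Python script such as the sketch below. The helper name `growth_row` and the chosen values of n are assumptions for illustration; the logarithm is taken base 2.

```python
import math


def growth_row(n):
    """Return (n, log2 n, n^2, n^3, 2^n, n!) for a positive integer n."""
    return (n, math.log2(n), n ** 2, n ** 3, 2 ** n, math.factorial(n))


print(f"{'n':>3} {'log2 n':>7} {'n^2':>5} {'n^3':>6} {'2^n':>8} {'n!':>16}")
for n in [2, 4, 8, 16]:
    size, log_n, square, cube, power, factorial = growth_row(n)
    print(f"{size:>3} {log_n:>7.1f} {square:>5} {cube:>6} {power:>8} {factorial:>16}")
```

Even for n = 16 the factorial column already has 14 digits, while log2 n is only 4.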

10
Example 2: Substring Search
  • Problem formulation
  • Input: strings s and t of characters
  • Output: if s is a substring of t, its position
    in t
  • Example
  • Input: s = ctct, t = agtctcttctaac
  • Output: 4
  • Algorithm? Time Complexity?
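
One straightforward answer is the naive algorithm: try every starting position in t and compare s character by character. The sketch below assumes 1-based positions (to match the example output of 4) and returns 0 when s does not occur; the function name `find_substring` is a choice made here. With |s| = m and |t| = n, it performs on the order of n × m character comparisons in the worst case.

```python
def find_substring(s, t):
    """Return the 1-based position of the first occurrence of s in t, or 0."""
    for start in range(len(t) - len(s) + 1):
        # Compare s against the window of t beginning at this start index.
        if t[start:start + len(s)] == s:
            return start + 1
    return 0


print(find_substring("ctct", "agtctcttctaac"))  # 4
```

Faster algorithms exist for this problem, but the naive version is enough to see where the running time comes from.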

11
Example 3: Traveling Salesman
  • Problem formulation
  • Input: n cities, distances between cities
  • Output: shortest tour of all cities
  • Algorithm
  • Consider all permutations of the cities, compute
    total distances for each permutation, select the
    minimum among all total distances
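
The exhaustive algorithm above might look like this in Python. It is a sketch under some assumptions: distances are given as a square matrix `dist`, a tour returns to its starting city, and city 0 is fixed as the start (which skips rotations of the same tour but still examines (n-1)! permutations). The 4-city matrix is made-up example data.

```python
from itertools import permutations


def shortest_tour(dist):
    """Return (length, tour) of a shortest round trip visiting every city once."""
    n = len(dist)
    best_length, best_tour = None, None
    for rest in permutations(range(1, n)):
        tour = (0,) + rest  # city 0 is always the starting point
        length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if best_length is None or length < best_length:
            best_length, best_tour = length, tour
    return best_length, best_tour


# Hypothetical symmetric distances between 4 cities.
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(shortest_tour(dist))  # (18, (0, 1, 3, 2))
```

With 4 cities only 6 permutations are checked, but the count grows factorially with n.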

12
Exponential Algorithms and Intractable Problems
  • The Traveling Salesman problem is an example of
    an intractable problem (NP-hard; its decision
    version is NP-complete)
  • Characterized by
  • The existence of a correct exponential algorithm
  • No known polynomial algorithm
  • Exponential algorithm is impractical. Now what?

13
Heuristics
  • For intractable problems, there are polynomial
    algorithms that do not always yield the correct
    answer
  • Example: start with any city, go to the nearest
    unvisited city, repeat the process
  • Not always correct. Counterexample?
  • Selection of the nearest city is called a heuristic
  • Compromise: we can prove some statements about the
    (incorrect) algorithm, and that may be enough in
    practice
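
The nearest-city heuristic, together with one possible counterexample, can be sketched as follows. The instance is invented for illustration: five cities placed on a number line at the hypothetical coordinates in `positions`, with distance equal to the absolute difference. A brute-force check is included only to show the optimum for comparison.

```python
from itertools import permutations


def tour_length(tour, dist):
    """Total length of the round trip that visits cities in the given order."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def nearest_neighbor_tour(dist, start=0):
    """Greedy heuristic: always move to the nearest unvisited city."""
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        current = tour[-1]
        nearest = min(unvisited, key=lambda city: dist[current][city])
        unvisited.remove(nearest)
        tour.append(nearest)
    return tour


# Hypothetical cities on a line; distance is the absolute difference.
positions = [0, 2, -3, 8, -10]
dist = [[abs(a - b) for b in positions] for a in positions]

greedy = nearest_neighbor_tour(dist)
optimal = min(tour_length((0,) + rest, dist)
              for rest in permutations(range(1, len(dist))))
print(greedy, tour_length(greedy, dist))  # [0, 1, 2, 4, 3] 40
print(optimal)                            # 36
```

Here the heuristic zigzags across the line and ends up with a tour of length 40, while the best tour has length 36.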

14
Back to Bioinformatics: Some Objectives
  • Formulate problems relevant to biology
  • Devise/understand algorithms for these problems
  • Computer scientists and biologists need to talk
    more
  • Computer scientists have a tendency to make
    (often unreasonable) assumptions
  • Biologists may place too much faith in results
    returned by automated systems

15
Overview: Selected Problems in Bioinformatics
  • Sequence alignment
  • Phylogeny
  • Dealing with experimental results

16
BLAST Search
17
BLAST Results
18
DNA Sequence Databases
  • Data representation, integrity, accuracy
  • Search and scoring methods
  • Meaning and reliability of results
  • e.g., how does BLAST (Basic Local Alignment
    Search Tool) respond to random data?

19
Sequence Alignment Problem
  • Given two nucleotide sequences, obtain an optimal
    alignment between the sequences
  • Example:
      AT-C-TGAT
      -TGCAT-A-

20
Dynamic Programming
(Figure: dynamic programming table with the letters T G C A T A
across the columns and A T C T G A T down the rows)

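
A dynamic programming table of this kind can be filled in with a global alignment procedure in the style of Needleman-Wunsch. The sketch below is an illustration, not the presenter's method: the scoring values (match +1, mismatch -1, gap -1) are assumptions, since the slides do not state a scoring scheme, and the function name `align` is chosen here.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,  # align two letters
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


result_score, aligned_a, aligned_b = align("ATCTGAT", "TGCATA")
print(result_score)
print(aligned_a)
print(aligned_b)
```

Under this assumed scoring, the alignment AT-C-TGAT / -TGCAT-A- has four matches and five gaps, for a score of -1; the table computes an alignment that scores at least as well. Filling the table takes time proportional to the product of the two sequence lengths.
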
21
Phylogeny
  • Construction of phylogenetic trees based on
    genomic distance
  • Problems to be solved
  • Determining genomic distance
  • Tree construction from the distances

22
Determining Genomic Distance
  • Given two genomes, determine the number of
    mutations necessary to obtain one from the other
  • Common distance model: least number of mutations
  • Mutation at the genome level: rearrangement
    (sorting!) operations on permutations

23
(No Transcript)
24
Sorting Permutations and a Graph Theoretic Model
0 3 5 6 7 2 1 4 8 9
0 1 2 7 6 5 3 4 8 9

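
The two permutations above differ by a single reversal: the block 3 5 6 7 2 1 is flipped to 1 2 7 6 5 3. The sketch below applies that reversal and counts "breakpoints" (adjacent entries that are not consecutive numbers), a simple measure often used when reasoning about sorting by reversals. Reading the figure as a reversal, and the helper names `reverse_segment` and `breakpoints`, are assumptions made here.

```python
def reverse_segment(perm, start, end):
    """Return a copy of perm with the entries at indices start..end reversed."""
    return perm[:start] + perm[start:end + 1][::-1] + perm[end + 1:]


def breakpoints(perm):
    """Count adjacent pairs whose values are not consecutive integers."""
    return sum(1 for left, right in zip(perm, perm[1:])
               if abs(left - right) != 1)


before = [0, 3, 5, 6, 7, 2, 1, 4, 8, 9]
after = reverse_segment(before, 1, 6)
print(after)                                     # [0, 1, 2, 7, 6, 5, 3, 4, 8, 9]
print(breakpoints(before), breakpoints(after))   # 5 3
```

A fully sorted permutation has no breakpoints, so each useful reversal should reduce the count.
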
25
Phylogenetic Tree Reconstruction
  • Given a set of species and genomic distances
    between the species, construct a phylogenetic
    tree that is (most) consistent with the distances
  • Problem shown to be NP-complete
  • This means we should try some heuristics
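
One common heuristic of this kind is greedy average-linkage clustering (in the style of UPGMA): repeatedly merge the two closest groups of species. The sketch below is only an illustration of that idea, not a method named in the slides; the species names are taken from the example tree, and the distance values are made up.

```python
from itertools import combinations


def cluster_tree(names, dist):
    """Build a tree as nested tuples by repeatedly merging the closest clusters."""
    # Each cluster is (subtree, list of leaf indices).
    clusters = [(name, [i]) for i, name in enumerate(names)]

    def cluster_distance(c1, c2):
        # Average of all leaf-to-leaf distances between the two clusters.
        pairs = [(i, j) for i in c1[1] for j in c2[1]]
        return sum(dist[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        merged = ((clusters[a][0], clusters[b][0]),
                  clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0][0]


names = ["Mouse", "Monkey", "Human"]
# Hypothetical genomic distances, for illustration only.
dist = [
    [0, 8, 9],
    [8, 0, 2],
    [9, 2, 0],
]
print(cluster_tree(names, dist))  # ('Mouse', ('Monkey', 'Human'))
```

The heuristic runs in polynomial time but is not guaranteed to return the tree most consistent with the distances.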

26
Phylogenetic Tree
(Figure: phylogenetic tree with leaves Mouse, Monkey, and Human)

27
Experimental Results
  • Image or data directly drawn from a device
  • e.g., microarray, scanner
  • Need to make objective, discrete conclusions
  • e.g., pixel intensity vs. gene expression
  • Need to handle errors and imperfections

28
Microarray Image
29
Image Analysis to Aid Microarray Experiments
  • Automatically locating the grid of spots
  • Use Fourier transforms to compute periods and
    offsets
  • Extracting intensity
  • Refine spot sample to collect significant,
    normalized data
  • Make conclusions on genetic function
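
The idea of using a Fourier transform to find the spot period can be sketched on a one-dimensional intensity profile (for example, pixel intensities summed along one axis of the image). The code below is a simplified illustration with synthetic data: it uses a plain discrete Fourier transform written with the standard library, estimates only the period (not the offset), and the names `dominant_period` and `profile` are assumptions made here.

```python
import cmath
import math


def dominant_period(profile):
    """Estimate the spacing of a repeating pattern from its strongest frequency."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [value - mean for value in profile]  # remove the constant term
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coefficient = sum(value * cmath.exp(-2j * math.pi * k * index / n)
                          for index, value in enumerate(centered))
        if abs(coefficient) > best_power:
            best_k, best_power = k, abs(coefficient)
    if best_k == 0:
        return None  # flat profile: no periodic structure found
    return n / best_k


# Synthetic profile: a bright "spot" 3 pixels wide every 8 pixels.
profile = [1.0 if index % 8 < 3 else 0.0 for index in range(64)]
print(dominant_period(profile))  # 8.0
```

On a real image the profile would be noisy, so the peak would be less sharp and a library FFT routine would normally be used for speed.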

30
Summary
  • Bioinformatics: a perfect opportunity for
    interdisciplinary research within the sciences
  • Academics from different backgrounds need to
    study, discuss, and debate with each other