Title: Computational Questions
1. Computational Questions
2. Where CS and Biology Meet
- Bioinformatics: applications of CS to the life sciences
- What are the computational issues?
- Storage and retrieval of genetic data, data mining, tools
- Analysis of genetic data: similarities, differences, structure
- Processing experimental data
3. Problem Solving in Computer Science
- Program: a sequence of instructions that performs a particular task
- Task (problem) expressed as: given data (input), produce results (output)
- From problems to programs
- Formulate the problem
- Develop and verify an algorithm
- Write and test the program
4. Algorithm Analysis
- Algorithm: the conceptual/theoretical form of a program
- What is analyzed?
- Correctness: does it solve the problem?
- Complexity: how many resources (time and memory) does it consume?
- Tradeoffs: sometimes we need to sacrifice correctness for efficiency
5. Example 1: Searching for an Element in a List
- Problem formulation
- Input: a sorted list L of n elements (e.g., names) and a target element x
- Output: the position of the target element, if it exists in the list
- Possible algorithms
- Linear search
- Binary search
6. Linear Search
- Algorithm
- For each element in L (from the first to the last element), compare it with x and return its position if equal
- Time complexity
- Up to n comparisons performed
- On average, about n/2 comparisons (when x is in the list)
- Runs in linear time (proportional to the list size n)
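A minimal Python sketch of linear search, assuming L is a Python list and positions are counted from 1. The name `linear_search` is illustrative; the function also counts comparisons so the "up to n" bound can be observed directly.

```python
def linear_search(L, x):
    """Scan L from the first to the last element looking for x.

    Returns (position, comparisons); position is 1-based, or None if x is absent.
    """
    comparisons = 0
    for position, element in enumerate(L, start=1):
        comparisons += 1  # one comparison of x with the current element
        if element == x:
            return position, comparisons
    return None, comparisons


names = ["ann", "bob", "cal", "dee"]
print(linear_search(names, "cal"))  # (3, 3)
print(linear_search(names, "zed"))  # (None, 4)
```

A missing target forces all n comparisons, which is the worst case listed above.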
7. Binary Search
- Algorithm
- Compare the middle element of the list with x and return its position if equal; if not, reduce the list to either the lower half or the upper half of the original list (depending on whether x is smaller or larger than the middle element) and repeat the process
- Time complexity
- Up to about log₂ n comparisons performed
- Runs in logarithmic time
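A sketch of the halving step in Python, assuming L is sorted in ascending order and positions count from 1. `binary_search` is an illustrative name; each loop iteration is counted as one (three-way) comparison of x with the middle element.

```python
def binary_search(L, x):
    """Search the sorted list L for x by repeatedly halving the search range.

    Returns (position, comparisons); position is 1-based, or None if x is absent.
    """
    lo, hi = 0, len(L) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if L[mid] == x:
            return mid + 1, comparisons
        elif L[mid] < x:
            lo = mid + 1  # x can only be in the upper half
        else:
            hi = mid - 1  # x can only be in the lower half
    return None, comparisons


L = list(range(1_000_000))
print(binary_search(L, 765432))
print(binary_search(L, -5))
```

On a sorted list of 1,000,000 integers, every search, successful or not, finishes within 20 iterations, in line with log₂ n ≈ 20.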
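8. Linear vs Logarithmic Time
n            n/2          log₂ n (rounded up)
10           5            4
100          50           7
1,000        500          10
1,000,000    500,000      20
2,000,000    1,000,000    21
(n/2: average comparisons for linear search; log₂ n: maximum comparisons for binary search)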
9. Comparing Running Times
- Exercise: tabulate values of the following run-time functions for different values of n
- Functions
- log n (logarithmic)
- n (linear)
- n² (quadratic)
- n³ (cubic)
- 2ⁿ (exponential)
- n! (factorial)
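One way to do the exercise is to let Python compute the values. This sketch assumes base-2 logarithms (matching the log n column of the linear-vs-logarithmic table); the names `FUNCTIONS`, `tabulate`, and `show` are illustrative.

```python
import math

# The run-time functions from the exercise (log taken base 2).
FUNCTIONS = {
    "log n": math.log2,
    "n": lambda n: n,
    "n^2": lambda n: n ** 2,
    "n^3": lambda n: n ** 3,
    "2^n": lambda n: 2 ** n,
    "n!": math.factorial,
}


def tabulate(ns):
    """Return one row [n, f1(n), f2(n), ...] per n, in FUNCTIONS order."""
    return [[n] + [f(n) for f in FUNCTIONS.values()] for n in ns]


def show(v):
    """Short display form: 3 significant digits for floats and large integers."""
    return f"{v:.3g}" if isinstance(v, float) or v >= 10**6 else str(v)


print("n".rjust(4), *(name.rjust(10) for name in FUNCTIONS))
for n, *values in tabulate([10, 20, 30]):
    print(str(n).rjust(4), *(show(v).rjust(10) for v in values))
```

Python integers have arbitrary precision, so 2ⁿ and n! are computed exactly even when they become very large; only the display is rounded.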
10. Example 2: Substring Search
- Problem formulation
- Input: strings s and t of characters
- Output: if s is a substring of t, its position in t
- Example
- Input: s = ctct, t = agtctcttctaac
- Output: 4 (positions are counted from 1)
- Algorithm? Time complexity?
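A straightforward answer to the slide's question is the brute-force algorithm: try every starting position in t and compare characters until a mismatch. The sketch below (illustrative name `substring_search`, 1-based positions) reproduces the example output. With |s| = m and |t| = n it performs at most about (n − m + 1)·m character comparisons, i.e., O(mn); faster linear-time algorithms such as Knuth-Morris-Pratt exist but are not shown.

```python
def substring_search(s, t):
    """Return the 1-based position of the first occurrence of s in t, or None."""
    m, n = len(s), len(t)
    for start in range(n - m + 1):
        k = 0
        # Compare s with t[start:start + m] one character at a time.
        while k < m and t[start + k] == s[k]:
            k += 1
        if k == m:  # all m characters matched
            return start + 1
    return None


print(substring_search("ctct", "agtctcttctaac"))  # 4
```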
11. Example 3: Traveling Salesman
- Problem formulation
- Input: n cities and the distances between them
- Output: the shortest tour that visits all cities
- Algorithm
- Consider all permutations of the cities, compute the total distance for each permutation, and select the minimum among all total distances
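A direct transcription of the all-permutations algorithm, assuming distances are given as an n × n matrix `dist` and that a tour returns to its starting city. Fixing city 0 as the start is a small optimization (every closed tour can be rotated to begin there). The function names are illustrative and the 4-city matrix `D` is made up for the example.

```python
from itertools import permutations


def tour_length(tour, dist):
    """Total distance of a closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def brute_force_tsp(dist):
    """Try every ordering of cities 1..n-1 after city 0; keep the shortest tour."""
    n = len(dist)
    best_tour, best_len = None, float("inf")
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        length = tour_length(tour, dist)
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len


# Hypothetical symmetric distance matrix for 4 cities.
D = [[0, 1, 3, 2],
     [1, 0, 1, 3],
     [3, 1, 0, 10],
     [2, 3, 10, 0]]
print(brute_force_tsp(D))  # ((0, 2, 1, 3), 9)
```

Even with the start fixed, the loop runs (n − 1)! times, so n = 15 already means about 8.7 × 10¹⁰ tours.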
12. Exponential Algorithms and Intractable Problems
- The Traveling Salesman problem is an example of an intractable (NP-complete) problem
- Characterized by
- The existence of a correct exponential-time algorithm (e.g., trying all n! permutations, which grows even faster than 2ⁿ)
- No known polynomial-time algorithm
- An exponential algorithm is impractical for large n. Now what?
13. Heuristics
- There are polynomial-time algorithms for intractable problems that do not always yield the correct answer
- Example: start with any city, go to the nearest unvisited city, and repeat the process
- Not always correct. Counterexample?
- Selecting the nearest city is called a heuristic
- Compromise: we can sometimes prove useful statements about such an (incorrect) algorithm, and that may be enough in practice
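The nearest-neighbor heuristic as a Python sketch, assuming a distance matrix `dist` and a tour that returns to its start. `nearest_neighbor_tour` is an illustrative name, and the matrix `D` is made up; it runs in O(n²) time. The example matrix also answers the slide's question, because the greedy tour is not optimal there.

```python
def nearest_neighbor_tour(dist, start=0):
    """Greedy heuristic: always move to the nearest unvisited city, then
    return to the start. Returns (tour, total length of the closed tour)."""
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        current = tour[-1]
        nearest = min(unvisited, key=lambda city: dist[current][city])
        tour.append(nearest)
        unvisited.remove(nearest)
    length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return tour, length


# Hypothetical symmetric distance matrix for 4 cities.
D = [[0, 1, 3, 2],
     [1, 0, 1, 3],
     [3, 1, 0, 10],
     [2, 3, 10, 0]]
print(nearest_neighbor_tour(D))  # ([0, 1, 2, 3], 14)
```

Here the greedy choices 0 → 1 → 2 force the expensive 2 → 3 leg (distance 10), giving 14, while the tour 0 → 2 → 1 → 3 → 0 has length 3 + 1 + 3 + 2 = 9.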
14. Back to Bioinformatics: Some Objectives
- Formulate problems relevant to biology
- Devise/understand algorithms for these problems
- Computer scientists and biologists need to talk more
- Computer scientists have a tendency to make (often unreasonable) assumptions
- Biologists may place too much faith in results returned by automated systems
15. Overview: Selected Problems in Bioinformatics
- Sequence alignment
- Phylogeny
- Dealing with experimental results
16. BLAST Search
17. BLAST Results
18. DNA Sequence Databases
- Data representation, integrity, accuracy
- Search and scoring methods
- Meaning and reliability of results
- e.g., how does BLAST (Basic Local Alignment
Search Tool) respond to random data?
19. Sequence Alignment Problem
- Given two nucleotide sequences, obtain an optimal alignment between the sequences
- Example:
AT-C-TGAT
-TGCAT-A-
20. Dynamic Programming
[Figure: dynamic programming table for aligning ATCTGAT (rows) against TGCATA (columns)]
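The slides do not state a scoring scheme, so this sketch assumes the simplest one: an alignment scores +1 per column where both characters match, gap columns score 0, and mismatched columns are not allowed. Under that assumption the dynamic programming table is the longest-common-subsequence table, and the example alignment AT-C-TGAT / -TGCAT-A- (4 matching columns) is optimal. `align` is an illustrative name; the traceback may return a different alignment with the same score. Schemes with mismatch and gap penalties (e.g., Needleman-Wunsch) fill the table the same way with different scores.

```python
def align(s, t):
    """Align s and t to maximize the number of matching columns.

    Assumed scoring: +1 per matching column, 0 per gap column, no mismatches.
    Returns (score, aligned_s, aligned_t) with '-' marking gaps.
    """
    m, n = len(s), len(t)
    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                score[i][j] = score[i - 1][j - 1] + 1  # extend with a match
            else:
                score[i][j] = max(score[i - 1][j], score[i][j - 1])  # gap in t or s
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i - 1] == t[j - 1]:
            top.append(s[i - 1]); bottom.append(t[j - 1])
            i -= 1; j -= 1
        elif i > 0 and (j == 0 or score[i - 1][j] >= score[i][j - 1]):
            top.append(s[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1])
            j -= 1
    return score[m][n], "".join(reversed(top)), "".join(reversed(bottom))


best, top, bottom = align("ATCTGAT", "TGCATA")
print(best)  # 4
print(top)
print(bottom)
```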
21. Phylogeny
- Construction of phylogenetic trees based on genomic distance
- Problems to be solved
- Determining genomic distance
- Tree construction from the distances
22. Determining Genomic Distance
- Given two genomes, determine the number of mutations necessary to obtain one from the other
- Common distance model: the least number of mutations
- Mutation at the genome level: rearrangement (sorting!) operations on permutations
24. Sorting Permutations and a Graph Theoretic Model
0 3 5 6 7 2 1 4 8 9
0 1 2 7 6 5 3 4 8 9
(The second permutation is obtained from the first by reversing the segment 3 5 6 7 2 1.)
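A small sketch of two ingredients behind sorting permutations, assuming unsigned permutations framed by 0 and n + 1 as on the slide: reversing a segment, and counting breakpoints (adjacent entries that are not consecutive integers). A reversal changes only two adjacencies, so it removes at most two breakpoints, and half the breakpoint count (rounded up) is a lower bound on the number of reversals needed. The function names are illustrative, and breakpoint counting is only a simple measure, not the slide's graph-theoretic model.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with positions i..j (0-based, inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def breakpoints(perm):
    """Count adjacent pairs whose values do not differ by exactly 1."""
    return sum(1 for a, b in zip(perm, perm[1:]) if abs(a - b) != 1)


def reversal_lower_bound(perm):
    """Each reversal removes at most two breakpoints, so ceil(b / 2) are needed."""
    return (breakpoints(perm) + 1) // 2


p = [0, 3, 5, 6, 7, 2, 1, 4, 8, 9]
q = reverse_segment(p, 1, 6)
print(q)                               # [0, 1, 2, 7, 6, 5, 3, 4, 8, 9]
print(breakpoints(p), breakpoints(q))  # 5 3
```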
25. Phylogenetic Tree Reconstruction
- Given a set of species and the genomic distances between them, construct a phylogenetic tree that is (most) consistent with the distances
- Problem shown to be NP-complete
- This means we should try some heuristics
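As one illustration of a polynomial-time approach, here is a sketch of UPGMA, a classic distance-based clustering method that repeatedly joins the two closest clusters. It is named here as an example and is not necessarily the method these slides have in mind; it works best when distances are close to ultrametric (a constant rate of change). The sketch returns only the tree topology as nested tuples (no branch lengths), and the labels and distances in the usage lines are made up.

```python
def upgma(names, dist):
    """Cluster species by repeatedly merging the two closest clusters.

    names: list of labels; dist: symmetric matrix of pairwise distances.
    Returns the tree topology as nested tuples (branch lengths omitted).
    """
    # Each cluster is (tree, size); a tree is a label or a pair of trees.
    clusters = [(name, 1) for name in names]
    d = [list(row) for row in dist]  # working copy, shrinks as clusters merge
    while len(clusters) > 1:
        n = len(clusters)
        # Closest pair of distinct clusters (i < j).
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda pair: d[pair[0]][pair[1]])
        (tree_i, size_i), (tree_j, size_j) = clusters[i], clusters[j]
        keep = [k for k in range(n) if k not in (i, j)]
        # UPGMA rule: distance to the merged cluster is the size-weighted
        # average of the distances to its two parts.
        new_row = [(d[i][k] * size_i + d[j][k] * size_j) / (size_i + size_j)
                   for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[pos]] for pos, a in enumerate(keep)]
        d.append(new_row + [0.0])
        clusters = [clusters[k] for k in keep] + [((tree_i, tree_j), size_i + size_j)]
    return clusters[0][0]


# Hypothetical distances between four species A, B, C, D.
distances = [[0, 2, 6, 6],
             [2, 0, 6, 6],
             [6, 6, 0, 4],
             [6, 6, 4, 0]]
print(upgma(["A", "B", "C", "D"], distances))  # (('A', 'B'), ('C', 'D'))
```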
26. Phylogenetic Tree
[Figure: phylogenetic tree with leaves Mouse, Monkey, Human]
27. Experimental Results
- Images or data drawn directly from a device
- e.g., microarray, scanner
- Need to draw objective, discrete conclusions
- e.g., pixel intensity vs. gene expression
- Need to handle errors and imperfections
28. Microarray Image
29. Image Analysis to Aid Microarray Experiments
- Automatically locating the grid of spots
- Use Fourier transforms to compute periods and offsets
- Extracting intensity
- Refine the spot sample to collect significant, normalized data
- Draw conclusions about genetic function
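A one-dimensional sketch of the Fourier idea, assuming the microarray image has already been collapsed into an intensity profile (e.g., by summing pixel values down each column) and that spots repeat with a single dominant period. The strongest non-zero frequency k gives the period N/k, and the phase of that coefficient gives the offset of the first spot centre. To stay dependency-free it uses a naive O(N²) discrete Fourier transform; real code would use an FFT library. `grid_period_and_offset` is an illustrative name and the profile is synthetic.

```python
import cmath
import math


def grid_period_and_offset(profile):
    """Estimate spot period and offset (in pixels) from a 1-D intensity profile."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [v - mean for v in profile]  # drop the constant (k = 0) term
    best_k, best_coef = 1, 0j
    for k in range(1, n // 2 + 1):
        # Discrete Fourier coefficient X_k = sum_t x_t * exp(-2*pi*i*k*t/n)
        coef = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                   for t, v in enumerate(centered))
        if abs(coef) > abs(best_coef):
            best_k, best_coef = k, coef
    period = n / best_k
    # For x_t ~ cos(2*pi*k*(t - offset)/n), the phase of X_k is -2*pi*k*offset/n.
    offset = (-cmath.phase(best_coef) * n / (2 * math.pi * best_k)) % period
    return period, offset


# Synthetic profile: 64 pixels, spots every 8 pixels, first spot centred at 3.
profile = [1 + math.cos(2 * math.pi * (x - 3) / 8) for x in range(64)]
print(grid_period_and_offset(profile))  # approximately (8.0, 3.0)
```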
30. Summary
- Bioinformatics: a perfect opportunity for interdisciplinary research within the sciences
- Academics from different backgrounds need to study, discuss, and debate with each other