Title: Introduction to Bioinformatics
1Introduction to Bioinformatics
- Alexandra M Schnoes
- Univ. California San Francisco
- Alexandra.Schnoes_at_ucsf.edu
2What is Bioinformatics?
- Intersection of Biology and Computers
- Broad field
- Often means different things to different people
- Personal Definition
- The utilization of computation for biological
investigation and discovery: the process by which
you unlock the biological world through the use
of computers.
3What does one do in Bioinformatics? (a small
sample)
- Our Lab: Understanding Protein (Enzyme) Function
4What does one do in Bioinformatics? (a small
sample)
- Discover new drug targets: computational docking
Atreya, C. E. et al. J. Biol. Chem. 2003, 278, 14092-14100
Shoichet, B. K. Nature 2004, 432, 862-865
5What does one do in Bioinformatics? (a small
sample)
sbw.kgi.edu/
www.sbi.uni-rostock.de/research.html
6This lab: Nucleotide and Protein Informatics
- Sequence analysis
- Finding similar sequences
- Multiple sequence alignment
- Phylogenetic analysis
7Sequence → Structure → Function
8Process of Evolution
- Sequences change due to
- Mutation
- Insertion
- Deletion
9Use Evolutionary Principles to Analyze Sequences
- If sequence A and sequence B are similar
- then A and B are evolutionarily related
- If sequences A, B and C are all similar, but A and
B are more similar to each other than either is to C
- then A and B are more closely evolutionarily related
to each other than to C
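The comparison on this slide can be sketched with percent identity as the similarity measure. This is only a minimal sketch: it assumes the sequences are already aligned to the same length, the function name `percent_identity` is made up for illustration, and sequence C is a hypothetical toy sequence (A and B reuse the example pair from the alignment slides).

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Fraction of aligned positions holding the same residue (gaps never match)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


# A and B come from the slides; C is a hypothetical sequence for illustration.
seqs = {
    "A": "TASSWSYIVE",
    "B": "TATSFSYLVG",
    "C": "MGKAWAYQLD",
}

# Score every pair, then pick the pair with the highest identity.
identities = {
    (p, q): percent_identity(seqs[p], seqs[q]) for p, q in combinations(seqs, 2)
}
closest = max(identities, key=identities.get)

for pair, value in identities.items():
    print(pair, f"{value:.0%}")
print("most closely related pair:", closest)
```

Here A and B share more identical positions with each other than either shares with C, so under this simple measure A and B come out as the most closely related pair.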
10Extremely Powerful Idea
- Start with unknown sequence
- Find what the unknown is similar to
- Use information about the known to make
predictions about the unknown
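One way to picture this idea is "best hit" annotation transfer. The sketch below is an assumption-laden toy, not a real search tool: the database entries, their annotations, the function names, and the 40% identity cutoff are all hypothetical, and the sequences are assumed to be pre-aligned and of equal length.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical "known" sequences with made-up annotations.
KNOWN = {
    "known_1": ("TATSFSYLVG", "hypothetical function X"),
    "known_2": ("MGKAWAYQLD", "hypothetical function Y"),
}


def predict_from_best_hit(unknown: str, known=KNOWN, min_identity: float = 0.4):
    """Return (name, identity, annotation) of the most similar known sequence,
    or None when nothing reaches the (assumed) identity cutoff."""
    best_name = max(known, key=lambda name: identity(unknown, known[name][0]))
    best_seq, best_note = known[best_name]
    score = identity(unknown, best_seq)
    if score < min_identity:
        return None
    return best_name, score, best_note


print(predict_from_best_hit("TASSWSYIVE"))
print(predict_from_best_hit("PPPPPPPPPP"))
```

The cutoff is there because a weak best hit says little about the unknown; any prediction made this way is a hypothesis to check, not a conclusion.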
11How do you know when sequences are similar?
- Align two sequences together and score their
similarity
TASSWSYIVE
TATSFSYLVG
- Use substitution matrices to score the alignment
12Substitution Matrices Give a Score for Each
Mutation
BLOSUM62 scoring matrix
- Many different matrices available
- BLOSUM matrices standard in the field
http://www.carverlab.org/images/
13Scoring: Add up the positional scores
TASSWSYIVE
TATSFSYLVG
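The positional scoring for this example pair can be sketched as follows. The dictionary holds only the BLOSUM62 entries needed for these ten columns, transcribed from the standard matrix; the names `BLOSUM62_SUBSET` and `positional_scores` are made up for this illustration, and a real analysis would load the full matrix from a library.

```python
# Subset of BLOSUM62: only the residue pairs that occur in this example,
# keyed by the alphabetically sorted pair.
BLOSUM62_SUBSET = {
    ("T", "T"): 5,
    ("A", "A"): 4,
    ("S", "T"): 1,
    ("S", "S"): 4,
    ("F", "W"): 1,
    ("Y", "Y"): 7,
    ("I", "L"): 2,
    ("V", "V"): 4,
    ("E", "G"): -2,
}


def positional_scores(top: str, bottom: str) -> list:
    """Look up the substitution score for every aligned column."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    return [BLOSUM62_SUBSET[tuple(sorted(pair))] for pair in zip(top, bottom)]


scores = positional_scores("TASSWSYIVE", "TATSFSYLVG")
total = sum(scores)

print(scores)  # one score per column
print(total)   # alignment score = sum of the column scores
```

Identical residues score highest (a conserved Y is worth more than a conserved A), conservative changes such as I/L still score positively, and the E/G mismatch is penalized.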
14Additional issues
- Gaps (insertions/deletions)
- Have scoring penalties for opening and continuing
a gap
TASSWSYIVE
TATSFLVG

TASSWSYIVE
TATSF--LVG
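Gap opening and continuation penalties can be folded into the same column-by-column sum. In this sketch the penalty values (10 to open, 1 to continue) are assumptions chosen for illustration, since conventions differ between tools; the substitution scores are the BLOSUM62 entries for the residue pairs in this alignment, and all names are hypothetical.

```python
# BLOSUM62 entries for the residue pairs in this example only.
SCORES = {
    ("T", "T"): 5, ("A", "A"): 4, ("S", "T"): 1, ("S", "S"): 4,
    ("F", "W"): 1, ("I", "L"): 2, ("V", "V"): 4, ("E", "G"): -2,
}

GAP_OPEN = 10    # assumed cost of the first gap position
GAP_EXTEND = 1   # assumed cost of each further position in the same gap


def score_alignment(top: str, bottom: str) -> int:
    """Sum substitution scores and subtract gap penalties column by column.

    Simplification: consecutive gap columns count as one gap even if the
    gap switches from one sequence to the other.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    in_gap = False
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total -= GAP_EXTEND if in_gap else GAP_OPEN
            in_gap = True
        else:
            total += SCORES[tuple(sorted((x, y)))]
            in_gap = False
    return total


gapped_score = score_alignment("TASSWSYIVE", "TATSF--LVG")
print(gapped_score)
```

Charging more to open a gap than to continue it favors one longer gap over several scattered ones, which matches the idea that a single insertion or deletion event can span several residues.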
15How do we find similar sequences?
- Start at the National Center for Biotechnology
Information: http://www.ncbi.nlm.nih.gov/
16How do we find similar sequences?
- Nucleotide Sequence Databases
17How do we find similar sequences?
- Protein Sequence Databases
18How do we find similar sequences?
- Similarity Search: BLAST
- Basic Local Alignment Search Tool
19BLAST is very quick, but
- Only local alignments
- Alignments aren't great
- Only pair-wise alignments
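To illustrate what a local, pair-wise alignment is, here is a small Smith-Waterman sketch. This is not how BLAST works internally (BLAST uses a faster heuristic); it is the exact dynamic-programming method for the same kind of alignment. The scoring values (match +2, mismatch -1, gap -2) and the two toy sequences are assumptions for illustration.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below 0, which is what makes
    # the alignment local rather than end-to-end.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score returns to 0.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


result = smith_waterman("TTTACGTAAA", "GGGACGTCCC")
print(result)
```

Only the shared core of the two toy sequences is reported; the unrelated flanks are ignored, which is the sense in which the alignment is local.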
20Want better alignments
- Multiple alignment
- Multiple sequences
- Better signal to noise
- More sequences → better alignment
- More accurate reflection of evolution
- ClustalW
- Commonly used
- Easy to use
21Visualize the Multiple Alignment
22Use the Alignment to Calculate Evolutionary
Distances
- See how close sequences are to each other
- Best way to tell what is most similar
- Can calculate simple tree from ClustalW
Taubenberger et al., Nature 437, 889-893, 2005
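The step from an alignment to distances to a simple tree can be sketched as below. Two simplifications are assumed: distance is the plain proportion of differing columns (p-distance), and the tree is built with UPGMA clustering, which is simpler than the neighbor-joining method that ClustalW uses, to my knowledge. Sequence C and all function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of ungapped aligned columns where the residues differ."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in cols) / len(cols)


def upgma(aligned: dict) -> str:
    """Cluster sequences by average distance; return a Newick-like string
    without branch lengths."""
    sizes = {name: 1 for name in aligned}  # cluster label -> number of leaves
    dist = {
        frozenset((p, q)): p_distance(aligned[p], aligned[q])
        for p, q in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        merged = f"({a},{b})"
        # Distance from the new cluster to each other cluster is the
        # size-weighted average of the two old distances.
        for other in sizes:
            if other not in (a, b):
                dist[frozenset((merged, other))] = (
                    dist[frozenset((a, other))] * sizes[a]
                    + dist[frozenset((b, other))] * sizes[b]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


# A and B are the example pair from the slides; C is hypothetical.
alignment = {"A": "TASSWSYIVE", "B": "TATSFSYLVG", "C": "MGKAWAYQLD"}
tree = upgma(alignment)
print(tree)
```

The nested parentheses group the sequences that are closest to each other first, so the most similar pair ends up as sister branches in the tree.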
23Caveats!
- In reality
- Sequences (even parts of sequences) can evolve at
different rates
- We don't have a good understanding of how sequence
relates to function
- High sequence identity does not always mean the
same function
- Getting good alignments and good trees can be
very hard
24Bioinformatics Sequence Analysis
- Start with unknown sequence
- Find similar sequences
- Create alignment
- Create phylogenetic tree
- Use information about knowns to make predictions
about unknown
25Mini Virus Intro
- Often considered not alive
- Extremely small (much smaller than a cell)
- Cellular parasites
- Have a genome but can only reproduce inside a host
cell
26Different Viruses
- RNA and DNA viruses
- Both single and double-stranded
28Influenza Virus (flu)
- Small genome: 8 RNA molecules
- Evolves quickly: antigenic drift, antigenic shift
29Influenza Virus (flu)
- RNA genome → (reverse transcriptase) → DNA → (sequencing) → genomic nucleotide sequence
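In sequence terms, reverse transcription produces a DNA strand complementary to the RNA template. A minimal sketch, assuming an unambiguous RNA alphabet (A, C, G, U) and a made-up five-base fragment rather than real influenza sequence; the function name is hypothetical:

```python
# Base-pairing rules for copying RNA into complementary DNA (cDNA).
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the cDNA strand, written 5'->3', for an RNA given 5'->3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna.upper()))


cdna = reverse_transcribe("AUGGC")  # made-up toy fragment
print(cdna)
```

The string is reversed because the two strands run in opposite directions, so reading the new strand 5' to 3' means walking the template backwards.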
30Influenza Pandemics
- 1918 Flu
- Killed an estimated 50-100 million people worldwide
- Considered to be one of the most deadly pandemics
- Killed many of the young and healthy
- Influenza A, Type H1N1
- Thought to have derived from Avian Influenza
- Recently reconstituted from recovered human
samples
- Considerable ethical debate
31Avian Influenza
- Current fear of pandemic
- High mortality rate (including young and healthy)
- Current concern is Influenza A, Type H5N1
- Still only transmitted by contact with birds
- Is now in Asia and Eastern and Western Europe
32This lab: Nucleotide and Protein Informatics
- Sequence analysis
- Finding similar sequences
- Multiple sequence alignment
- Phylogenetic analysis