Title: Analysis of DNA Sequences: Bioinformatics
1 Analysis of DNA Sequences: Bioinformatics
- Julia Saxonov
- Tom Chen
- Montville High School
2 Biological Databases
- The simplest tasks in bioinformatics concern
the creation and maintenance of databases of
biological information.
- Nucleic acid sequences (and the protein
sequences derived from them) comprise the
majority of such databases.
3 Most Pressing Tasks of Bioinformatics
- Finding the genes in the DNA sequences of various
organisms.
- Developing methods to predict the structure
and/or function of newly discovered proteins and
structural RNA sequences.
- Clustering protein sequences into families of
related sequences and developing protein
models.
- Aligning similar proteins and generating
phylogenetic trees to examine evolutionary
relationships.
4 So how do we go about doing this?
5 Sequence Analysis Flow Chart
6 Use a waveform program to briefly analyze the
quality of your DNA sequence
- We will be using the 4Peaks program
- Briefly scan through the sequence using the arrow
keys
- Make sure the sequence is fairly clean, with no
long runs of Ns in the middle
- Long runs of Ns are not good and should be edited
out of the sequence
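Once the sequence has been exported from the waveform viewer as plain text, the check for long runs of Ns can also be sketched in code. This is only an illustration: the function names and the cutoff of 5 consecutive Ns are assumptions, not part of the original workflow.

```python
import re


def find_n_runs(sequence, min_length=5):
    """Return (start, end, length) for every run of N at least min_length long.

    min_length=5 is an assumed cutoff for what counts as a "long" run.
    """
    pattern = re.compile(r"N{%d,}" % min_length, re.IGNORECASE)
    return [(m.start(), m.end(), m.end() - m.start())
            for m in pattern.finditer(sequence)]


def remove_n_runs(sequence, min_length=5):
    """Delete every long run of N, joining the bases on either side."""
    pattern = re.compile(r"N{%d,}" % min_length, re.IGNORECASE)
    return pattern.sub("", sequence)


example = "ACGTTGCANNNNNNNGGCTANACGT"
print(find_n_runs(example))
print(remove_n_runs(example))
```

A single isolated N, like the one near the end of the example, is left alone; only runs at or above the cutoff are reported and removed.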
8 The linker used
- AATTCGCGGCCGCT - cDNA insert - AGCGGCCGCG
-     GCGCCGGCGA - cDNA insert - TCGCCGGCGCTTA
9 Remove vector and bad sequences from your 5-EX
DNA sequence
- Look for the EcoRI site, GAATTC
- If that site is defective, look for a
different restriction site instead, such as
the SmaI site, CCCGGG
- This is only a starting point. To isolate the
cDNA, you must find the linker site, then
remove it and all the bases before it.
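The trimming step can be sketched as follows. The linker string is the upper strand shown on the linker slide; the function names and the example read are made up for illustration, and the sketch assumes the read has already been cleaned of ambiguous bases around the linker.

```python
# Restriction sites named on this slide, in the order they are tried.
LANDMARK_SITES = (("EcoRI", "GAATTC"), ("SmaI", "CCCGGG"))

# Upper strand of the linker from the linker slide.
LINKER = "AATTCGCGGCCGCT"


def find_landmark(read):
    """Return (site name, position) of the first landmark site found, or None."""
    read = read.upper()
    for name, site in LANDMARK_SITES:
        position = read.find(site)
        if position != -1:
            return name, position
    return None


def trim_before_insert(read):
    """Remove the linker and every base before it.

    Returns None when the linker is not found, so the read can be
    inspected by hand instead of being trimmed incorrectly.
    """
    read = read.upper()
    position = read.find(LINKER)
    if position == -1:
        return None
    return read[position + len(LINKER):]


# Hypothetical read: vector bases, EcoRI site plus linker, then cDNA.
read = "CCATGATTACG" + "GAATTCGCGGCCGCT" + "ATGGCTAGCAAA"
print(find_landmark(read))
print(trim_before_insert(read))
```

The restriction site is used only to locate the region of interest; the cut itself is made at the end of the linker, as the slide describes.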
10 Bad Sequences
- Read along the sequence until you are no longer
confident in the base calling. Warning signs
include lots of Ns or other problems in the
sequence itself
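One rough way to locate the point where base calling becomes unreliable is to slide a window along the read and stop at the first window that contains too many Ns. The window size of 10 and the limit of 2 Ns are assumed values chosen for illustration; judging the trace by eye in the waveform program remains the method the slides describe.

```python
def confident_length(read, window=10, max_n=2):
    """Return the start index of the first window holding more than max_n Ns.

    If no window exceeds the limit, the full read length is returned.
    window and max_n are assumed thresholds, not values from the slides.
    """
    read = read.upper()
    last_start = max(len(read) - window + 1, 1)
    for start in range(last_start):
        if read[start:start + window].count("N") > max_n:
            return start
    return len(read)


# Hypothetical read: 20 clean bases followed by a noisy tail.
noisy = "ACGTACGTACGTACGTACGT" + "ANNTNNACNN"
cut = confident_length(noisy)
print(cut, noisy[:cut])
```

Because the cut is placed at the start of the offending window, it is conservative: a few good bases just before the noisy region are discarded as well.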
12 Just as you did for the 5-EX DNA sequence,
remove vector and bad sequences from the 3-EX
sequence
- The sequence from the 3-EX end will be read
from right to left
- This means that although the 3' end is on the
right on the DNA, the 3' end of your sequence
is on the left.
5' GCTAGCTA 3'
Sequence will appear: ATCGATCG
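The right-to-left reading shown above amounts to a string reversal, sketched below. The second function, a reverse complement, is an addition not shown on the slide: it is the operation usually applied when a read taken from the opposite strand has to be compared with the forward sequence. The function names are illustrative.

```python
# Translation table for complementing bases; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse(sequence):
    """Read the sequence from right to left, as on the slide."""
    return sequence[::-1]


def reverse_complement(sequence):
    """Complement each base, then reverse (not shown on the slide)."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


print(reverse("GCTAGCTA"))
print(reverse_complement("GCTAGCTA"))
print(reverse_complement("GAATTC"))
```

The EcoRI site GAATTC is its own reverse complement, so it can be searched for with the same string whichever strand the read comes from.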
13 So now repeat the steps the same way as with the
5' sequence
- Again, look for the EcoRI site
- If this is defective, you can search for either
the PstI site or the BamHI site
- Remember, since the sequence is being read from
right to left, these restriction sites will be at
the beginning of the sequence.
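The search for a fallback site can be sketched as a lookup over the three enzymes named on this slide. The recognition sequences are the standard ones (EcoRI GAATTC, PstI CTGCAG, BamHI GGATCC); the order in which the fallbacks are tried, the function names, and the example read are assumptions.

```python
SITES = {"EcoRI": "GAATTC", "PstI": "CTGCAG", "BamHI": "GGATCC"}


def locate_sites(read, sites=SITES):
    """Return every position of each recognition site in the read."""
    read = read.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        position = read.find(site)
        while position != -1:
            positions.append(position)
            position = read.find(site, position + 1)
        hits[name] = positions
    return hits


def first_usable_site(read, order=("EcoRI", "PstI", "BamHI")):
    """Return (name, position) for the first enzyme in order with a hit."""
    hits = locate_sites(read)
    for name in order:
        if hits[name]:
            return name, hits[name][0]
    return None


# Hypothetical 3-EX read in which the EcoRI site is missing.
read_3ex = "TTGGATCCAAGCTTCTGCAGGCATGC"
print(locate_sites(read_3ex))
print(first_usable_site(read_3ex))
```

Reporting positions rather than just presence makes it easy to confirm that the site really does fall near the beginning of the read.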
14 BLAST for Beginners
- A step-by-step tutorial on searching for DNA
sequences
26 NEB Cutter
28 Size of the open reading frame.
These are the different restriction sites that
are found in this section of DNA.
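To make "size of the open reading frame" concrete, the sketch below scans the three forward reading frames for the longest stretch that starts at ATG and runs to the first in-frame stop codon. It is a simplified illustration, not the method NEB Cutter itself uses: it ignores the reverse strand, and the function name and example sequence are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(sequence):
    """Return (start, end, length in bases) of the longest forward ORF.

    An ORF here runs from an ATG to the first in-frame stop codon,
    stop codon included. Returns None when no complete ORF exists.
    """
    sequence = sequence.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if best is None or length > best[2]:
                    best = (start, i + 3, length)
                start = None
    return best


print(longest_orf("CCATGGCTAAATGAGG"))
```

The length is counted in bases, so dividing it by 3 and subtracting the stop codon gives the number of amino acids encoded.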
29 Blasting Sanger
31 This range, from 59536 to 62018, is important to
remember
32 The range of the match we found earlier lies
within this range of cosmid Y50D7A.
No specific gene is shown, so you must click
the name column to find it yourself.
33 These are the four sections where the query
matched the DNA in the library.
35 Performing Structural Analysis
- How does it fold?
- Is it composed of one or multiple domains?
- If it is an enzyme, how does it bind substrate?
- Where is the active site?
- Are the regions that are conserved between your
protein and other homologs on the interior or
exterior of the protein structure?
- If there are mutants, where would the structure
be affected?
- Why is this important?
- Understanding how structure relates to function
aids the successful design of drugs to
activate or deactivate proteins. Knowing that a
protein is structurally related to homologous
proteins in other organisms, such as humans,
also allows us to test drugs on lower life forms.
36 The End
- Disclaimer
- Presentation was the original work of Tom Chen
and Julia Saxonov.
- NCBI BLAST slides courtesy of
http://www.geospiza.com/outreach/BLAST/