Title: Introduction to Bioinformatics
Genetic Material
- DNA (deoxyribonucleic acid) is the genetic material
- Information stored in DNA
  - is the basis of inheritance
  - distinguishes living things from nonliving things
- Genes
  - the various units that govern a living thing's characteristics at the genetic level
Nucleotides
- Genes contain their information as a specific sequence of nucleotides found in DNA molecules
- DNA molecules contain only four different bases
  - Guanine (G)
  - Adenine (A)
  - Thymine (T)
  - Cytosine (C)
- Each base is attached to a phosphate group and a deoxyribose sugar to form a nucleotide
- The only thing that makes one nucleotide different from another is which nitrogenous base it contains
[Figure: a nucleotide, showing the base, the phosphate group (P), and the sugar]
[Figure: purine and pyrimidine bases; nucleoside]
Nucleotides
- Complicated genes can be many thousands of nucleotides long
- All of an organism's genetic instructions, its genome, can be maintained in millions or even billions of nucleotides
Orientation
- Strings of nucleotides can be attached to each other to make long polynucleotide chains
- 5' (5 prime) end
  - The end of a string of nucleotides with a 5' carbon not attached to another nucleotide
- 3' (3 prime) end
  - The other end of the molecule, with an unattached 3' carbon
[Figure: deoxyribose sugar with its carbons numbered 1' to 5']
Base Pairing
- Structure of DNA
  - Double helix
  - Seminal paper by Watson and Crick in 1953
  - Rosalind Franklin's contribution
- The information content of one strand is essentially redundant with the information on the other
  - Not exactly the same: it is complementary
- Base pair
  - G paired with C (G ≡ C, three hydrogen bonds)
  - A paired with T (A = T, two hydrogen bonds)
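The pairing rule is a simple lookup, which is why one strand fully determines the other. A minimal Python sketch, assuming uppercase A/C/G/T input; `complement` and `is_base_paired` are hypothetical helper names, not part of any library:

```python
# Watson-Crick pairing rule: G pairs with C, A pairs with T.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Base-by-base complement in the same left-to-right order, so a
    5'->3' input yields the partner strand read 3'->5'."""
    return "".join(PAIR[base] for base in strand.upper())


def is_base_paired(top: str, bottom: str) -> bool:
    """True if two aligned strands (top 5'->3', bottom 3'->5') pair at every position."""
    return len(top) == len(bottom) and all(
        PAIR[a] == b for a, b in zip(top.upper(), bottom.upper())
    )


print(complement("GTATCC"))                # CATAGG
print(is_base_paired("GTATCC", "CATAGG"))  # True
```

Any character outside A/C/G/T raises a `KeyError`; real sequence data with ambiguity codes would need a larger table.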
Base Pairing
- Reverse complements
  - The 5' end of one strand corresponds to the 3' end of its complementary strand, and vice versa
  - Example
    - one strand: 5'-GTATCC-3'
    - the other strand: 3'-CATAGG-5' → written 5' to 3': 5'-GGATAC-3'
- Upstream: sequence features that are 5' to a particular reference point
- Downstream: sequence features that are 3' to a particular reference point
[Figure: a strand drawn from 5' to 3', with upstream toward the 5' end and downstream toward the 3' end]
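Writing the partner strand in the conventional 5' to 3' direction means complementing and then reversing. A minimal sketch, assuming uppercase A/C/G/T strands written 5' to 3'; `reverse_complement` and `split_at` are hypothetical names, and `split_at` uses a single base index as the reference point (requires Python 3.9+ for the `tuple[str, str]` annotation):

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand_5to3: str) -> str:
    """Partner strand, also written 5'->3'.

    Complementing alone gives the partner read 3'->5'; reversing it
    restores the conventional 5'->3' orientation."""
    return "".join(PAIR[base] for base in reversed(strand_5to3.upper()))


def split_at(strand_5to3: str, ref: int) -> tuple[str, str]:
    """Split a 5'->3' strand around the base at index `ref`:
    upstream = everything 5' of it, downstream = everything 3' of it."""
    return strand_5to3[:ref], strand_5to3[ref + 1:]


print(reverse_complement("GTATCC"))  # GGATAC
print(split_at("GTATCC", 2))         # ('GT', 'TCC')
```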
DNA Structure
Chromosome
- Threadlike "packages" of genes and other DNA in the nucleus of a cell
Chromosome
- Different kinds of organisms have different numbers of chromosomes
- Humans
  - 23 pairs
  - 46 in all
Central Dogma of Molecular Biology
- DNA: information storage
- Protein: functional unit, such as an enzyme
- Gene: the instructions needed to make a protein
- Central dogma: DNA → RNA → protein
Central Dogma of Molecular Biology
[Figure: central dogma diagram, including replication (DNA polymerase) and reverse transcription (reverse transcriptase)]
- DNA obtained from reverse transcription is called complementary DNA (cDNA)
- The difference between DNA and cDNA will be discussed later
Central Dogma of Molecular Biology
- RNA (ribonucleic acid)
  - Single-stranded polynucleotide
  - Bases
    - A
    - G
    - C
    - U (uracil), instead of T
- Transcription (simplified)
  - A → A, G → G, C → C, T → U
[Figure: the sugar in DNA carries H at the 2' carbon, where the sugar in RNA carries OH]
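The simplified rule above describes the mRNA relative to the DNA coding strand; RNA polymerase actually reads the complementary template strand. Both views can be sketched in Python, assuming uppercase A/C/G/T input and ignoring promoters, introns, and everything else a real transcript involves; `transcribe` and `transcribe_template` are hypothetical names:

```python
# Base pairing between the DNA template strand and the RNA being made.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(coding_strand: str) -> str:
    """Simplified rule (A->A, G->G, C->C, T->U): the mRNA matches the
    coding strand, read 5'->3', with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def transcribe_template(template_3to5: str) -> str:
    """The same mRNA, built by pairing against the template strand written 3'->5'."""
    return "".join(TEMPLATE_TO_RNA[b] for b in template_3to5.upper())


print(transcribe("ATGACC"))           # AUGACC
print(transcribe_template("TACTGG"))  # AUGACC
```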
DNA Replication (DNA → DNA)
DNA Replication Animation
Courtesy of Rob Rutherford, St. Olaf University
Transcription (DNA → RNA)
- Messenger RNA (mRNA)
  - carries the information to be translated
- Ribosomal RNA (rRNA)
  - the working spine of the ribosome
- Transfer RNA (tRNA)
  - the decoder keys that translate nucleic acids into amino acids
Transcription Animation
Courtesy of Rob Rutherford, St. Olaf University
Peptides and Proteins
- mRNA → a sequence of amino acids connected by peptide bonds
  - Amino acid sequence
- Peptide: fewer than about 30 to 50 amino acids
- Protein: a longer peptide
Genetic Code: Codon
- Codon
  - a 3-base RNA sequence
[Figure: genetic code table marking the start codon and the stop codons]
List of Amino Acids
- Symbol, amino acid (abbreviation): codons, where N is any of the four bases
- A  Alanine (Ala): GCN
- C  Cysteine (Cys): UGU, UGC
- D  Aspartic acid (Asp): GAU, GAC
- E  Glutamic acid (Glu): GAA, GAG
- F  Phenylalanine (Phe): UUU, UUC
- G  Glycine (Gly): GGN
- H  Histidine (His): CAU, CAC
- I  Isoleucine (Ile): AUU, AUC, AUA
- K  Lysine (Lys): AAA, AAG
- L  Leucine (Leu): UUA, UUG, CUN
List of Amino Acids
- Symbol, amino acid (abbreviation): codons, where N is any of the four bases
- M  Methionine (Met): AUG
- N  Asparagine (Asn): AAU, AAC
- P  Proline (Pro): CCN
- Q  Glutamine (Gln): CAA, CAG
- R  Arginine (Arg): CGN, AGA, AGG
- S  Serine (Ser): UCN, AGU, AGC
- T  Threonine (Thr): ACN
- V  Valine (Val): GUN
- W  Tryptophan (Trp): UGG
- Y  Tyrosine (Tyr): UAU, UAC
- 20 letters in all; B, J, O, U, X, and Z are not used
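The table above can be encoded compactly: enumerating codons in U, C, A, G order at each position lets the whole standard genetic code fit in one 64-character string, with `*` marking the stop codons UAA, UAG, and UGA. A minimal sketch, assuming the standard code (not the variant codes used by mitochondria and some organisms); `CODON_TABLE` and `translate_mrna` are hypothetical names:

```python
from itertools import product

# Standard genetic code; codons enumerated as UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_mrna(mrna: str) -> str:
    """Translate from the first AUG (start codon) up to the first stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(CODON_TABLE["AUG"], CODON_TABLE["UGG"], CODON_TABLE["UAA"])  # M W *
print(translate_mrna("AUGACCGUGGGCUCUUAA"))  # MTVGS
```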
Codon and Reading Frame
- 4 nucleotide letters → 4^3 = 64 triplet possibilities
- 20 (< 64) known amino acids
  - Wobbling 3rd base: the third position of a codon can often change without changing the amino acid
  - Redundant → partly resistant to mutation
- Reading frame: a linear sequence of codons in a gene
- Open Reading Frame (ORF): the definition varies
  - a reading frame that begins with a start codon and ends at a stop codon
  - a series of codons in a DNA sequence uninterrupted by the presence of a stop codon
  - → a potential protein-coding region of DNA sequence
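The codon arithmetic and the redundancy it creates can be checked directly. This sketch assumes the standard genetic code, written as the same 64-character string in U, C, A, G enumeration order; the variable names are illustrative only:

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "UCAG"

# Why triplets: 4**2 = 16 doublets are too few for 20 amino acids,
# while 4**3 = 64 triplets are more than enough.
triplets = ["".join(t) for t in product(NUCLEOTIDES, repeat=3)]
print(4 ** 2, len(triplets))  # 16 64

# Standard genetic code in the same enumeration order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codons_per_aa = Counter(AMINO_ACIDS)
print(codons_per_aa["L"], codons_per_aa["M"], codons_per_aa["*"])  # 6 1 3

# Wobble: codon boxes (same first two bases) where all four third-position
# variants give the same amino acid, so a third-base change there is silent.
fourfold = sorted(
    AMINO_ACIDS[i]
    for i in range(0, 64, 4)
    if len(set(AMINO_ACIDS[i:i + 4])) == 1 and AMINO_ACIDS[i] != "*"
)
print(fourfold)  # ['A', 'G', 'L', 'P', 'R', 'S', 'T', 'V']
```

This is where the "resistant to mutation" point comes from: many, though not all, third-position substitutions leave the protein unchanged.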
Open Reading Frame
- Given a nucleotide sequence
  - How many reading frames? __
  - __ forward and __ backward
- Example: given the DNA sequence
  - 5'-ATGACCGTGGGCTCTTAA-3'
  - ATG ACC GTG GGC TCT TAA → M T V G S (stop)
  - TGA CCG TGG GCT CTT AA → (stop) P W A L
  - GAC CGT GGG CTC TTA A → D R G L L
  - Figure out the three backward reading frames
- In a random sequence, a stop codon will follow a Met within about 20 amino acids on average (3 of the 64 codons are stops)
- Substantially longer ORFs are often genes or parts of them
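Reading frames are easy to check by machine. Below is a minimal six-frame translator and ORF scanner, assuming uppercase DNA without ambiguity codes, the standard genetic code, and the "start codon to stop codon" ORF definition; the function names and the `+1`...`-3` frame labels are illustrative choices (Python 3.9+). Running it on the example sequence reproduces the three forward frames and also prints the three backward frames, so it can be used to check the exercise:

```python
from itertools import product

# Standard genetic code over DNA letters, codons enumerated in T, C, A, G order.
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), CODE)}
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    return "".join(PAIR[b] for b in reversed(dna))


def translate_frame(dna: str, offset: int) -> str:
    """Translate complete codons starting at `offset` (0, 1 or 2); '*' = stop."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Translations of the three forward and three backward reading frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {f"+{k + 1}": translate_frame(dna, k) for k in range(3)}
    frames.update({f"-{k + 1}": translate_frame(rc, k) for k in range(3)})
    return frames


def find_orfs(dna: str, min_codons: int = 1) -> list[tuple[str, str]]:
    """Each Met that opens an ORF, up to (not including) the next in-frame stop."""
    orfs = []
    for frame, protein in six_frames(dna).items():
        start = None
        for i, aa in enumerate(protein):
            if aa == "M" and start is None:
                start = i
            elif aa == "*" and start is not None:
                if i - start >= min_codons:
                    orfs.append((frame, protein[start:i]))
                start = None
    return orfs


seq = "ATGACCGTGGGCTCTTAA"
for frame, protein in six_frames(seq).items():
    print(frame, protein)
print(find_orfs(seq))
```

Raising `min_codons` (for example to 100) roughly captures the heuristic that substantially longer ORFs are likely genes; the cutoff value is an assumption for illustration, not a rule from this material.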
Translation (RNA → Protein)
Translation Animation
Courtesy of Rob Rutherford, St. Olaf University
Gene Expression
- Gene expression
  - The process of using the information stored in DNA to make an RNA molecule and then a corresponding protein
- Cells control gene expression by
  - reliably distinguishing the parts of an organism's genome that correspond to the beginnings of genes from those that do not
  - determining which genes code for proteins that are needed at any particular time
Promoter
- The probability (P) that a string of nucleotides will occur by chance alone, if all four nucleotides are present at the same frequency, is P = (1/4)^n, where n is the string's length
- Promoter sequences
  - Sequences recognized by RNA polymerases as being associated with a gene
- Example
  - Prokaryotic RNA polymerases scan along DNA looking for a specific set of approximately 13 nucleotides marking the beginning of genes:
    - 1 nucleotide that serves as the transcriptional start site
    - 6 that are 10 nucleotides 5' to the start site, and
    - 6 more that are 35 nucleotides 5' to the start site
  - What is the frequency for this sequence to occur?
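The formula P = (1/4)^n can be evaluated exactly with Python's `fractions` module. This sketch assumes, as the slide does, equal nucleotide frequencies and independent positions; `chance_probability` is a hypothetical helper name:

```python
from fractions import Fraction


def chance_probability(n: int) -> Fraction:
    """P = (1/4)**n: chance that a specific n-nucleotide string appears at a
    given position, with all four nucleotides equally frequent and independent."""
    return Fraction(1, 4) ** n


p = chance_probability(13)  # 1 start site + 6 (about 10 nt 5') + 6 (about 35 nt 5')
print(p)         # 1/67108864
print(float(p))  # 1.4901161193847656e-08
```

In other words, a 13-nucleotide signal is expected by chance roughly once every 67 million positions.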
Gene Regulation
- Regulatory proteins
  - Capable of binding to a cell's DNA near the promoters of genes
  - Control gene expression in some circumstances but not in others
- Positive regulation
  - binding of regulatory proteins makes it easier for an RNA polymerase to initiate transcription
- Negative regulation
  - binding of the regulatory proteins prevents transcription from occurring
Promoter and Regulatory Example
- Low tryptophan concentration
  - → RNA polymerase binds to the promoter
  - → genes are transcribed
- High tryptophan concentration
  - → the repressor protein binds tryptophan, becomes active, and binds to the operator
  - → this blocks RNA polymerase from binding to the promoter
- Tryptophan concentration drops
  - → the repressor releases its tryptophan and is released from the DNA
  - → RNA polymerase again transcribes the genes
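The regulation logic above can be reduced to a toy on/off model of negative regulation. This is a deliberately simplified sketch (a hypothetical `TrpOperon` class, with no concentrations or kinetics) that only encodes the cause-and-effect chain on the slide: tryptophan activates the repressor, the active repressor occupies the operator, and an occupied operator keeps RNA polymerase off the promoter:

```python
from dataclasses import dataclass


@dataclass
class TrpOperon:
    """Toy boolean model of the trp example; illustrative, not a kinetic model."""
    tryptophan_high: bool

    @property
    def repressor_active(self) -> bool:
        # The repressor becomes active only when it has bound tryptophan.
        return self.tryptophan_high

    @property
    def polymerase_can_bind(self) -> bool:
        # An active repressor sits on the operator and blocks RNA polymerase.
        return not self.repressor_active

    @property
    def genes_transcribed(self) -> bool:
        return self.polymerase_can_bind


cell = TrpOperon(tryptophan_high=False)
print(cell.genes_transcribed)  # True: low tryptophan, genes are transcribed
cell.tryptophan_high = True
print(cell.genes_transcribed)  # False: active repressor blocks the promoter
cell.tryptophan_high = False
print(cell.genes_transcribed)  # True: repressor released, transcription resumes
```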
Gene Structure
Exons and Introns
Exons and Introns Example
Protein Structure and Function
- Genes encode the recipes for proteins
Protein Structure and Function
- Proteins are amino acid polymers
Proteins: Molecular Machines
- Proteins in your muscles allow you to move: myosin and actin
Proteins: Molecular Machines
- Digestion, catalysis (enzymes)
- Structure (collagen)
Proteins: Molecular Machines
- Signaling (hormones, kinases)
- Transport (energy, oxygen)
Protein Structures
Information Flow in a Nucleated Cell
Point Mutation Example: Sickle-cell Disease
- Wild-type hemoglobin
  - DNA (template strand): 3'----CTT----5'
  - mRNA: 5'----GAA----3'
  - Normal hemoglobin: ------Glu------
- Mutant hemoglobin
  - DNA (template strand): 3'----CAT----5'
  - mRNA: 5'----GUA----3'
  - Mutant hemoglobin: ------Val------
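The effect of the single-base change can be traced in a few lines: pair the template strand to get the mRNA codon, then look the codon up. A minimal sketch that includes only the two codons needed here (a full table would hold all 64); the names are hypothetical:

```python
# Only the two codons needed for this example.
CODON_TO_AA = {"GAA": "Glu", "GUA": "Val"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def mrna_from_template(template_3to5: str) -> str:
    """mRNA codon (5'->3') complementary to a DNA template strand written 3'->5'."""
    return "".join(RNA_PAIR[b] for b in template_3to5)


for label, template in (("wild type", "CTT"), ("mutant", "CAT")):
    codon = mrna_from_template(template)
    print(label, template, codon, CODON_TO_AA[codon])
# wild type CTT GAA Glu
# mutant CAT GUA Val

# The two template codons differ at exactly one position: a point mutation.
diff = [i for i, (a, b) in enumerate(zip("CTT", "CAT")) if a != b]
print(diff)  # [1]
```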
Image credit: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis.
Thinking about the Human Genome
- About 50% is high-copy-number repeats
- About 10% is transcribed (made into RNA)
- Only 1.5% actually codes for protein
- The other 98.5% is non-coding ("junk") DNA
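Turning these percentages into base-pair counts makes them more concrete. This sketch uses the 3 × 10^9 bp genome size given in this section and treats the slide's percentages as exact, which they are not (they are rounded estimates); exact `Fraction` arithmetic avoids floating-point rounding in the products:

```python
from fractions import Fraction

GENOME_BP = 3_000_000_000  # 3 x 10^9 bp

# Approximate percentages of the genome, as quoted on the slide.
percent_of_genome = {
    "high-copy-number repeats": Fraction(50),
    "transcribed into RNA": Fraction(10),
    "protein-coding": Fraction(3, 2),  # 1.5 %
}

for label, pct in percent_of_genome.items():
    bp = GENOME_BP * pct / 100
    print(f"{label}: about {int(bp):,} bp")

non_coding = GENOME_BP * (100 - percent_of_genome["protein-coding"]) / 100
print(f"non-coding: about {int(non_coding):,} bp")
```

The protein-coding share works out to about 45 million bp, roughly 1 base in every 67.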
Thinking about the Human Genome
- 3 × 10^9 bp (3 billion base pairs)
- If each base were one mm long
  - the genome would stretch about 2000 miles, across the center of Africa
  - an average gene would be about 30 meters long
  - genes would occur about every 270 meters
  - once spliced, the message would be only about 1 meter long
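The scale analogy is straightforward unit conversion. This sketch assumes exactly 1 mm per base and the international mile (1609.344 m); it shows that "2000 miles" is a rounded figure and translates the meter lengths back into bases:

```python
GENOME_BASES = 3_000_000_000  # 3 x 10^9 bp
MM_PER_BASE = 1               # the analogy: one base = one millimetre
METRES_PER_MILE = 1609.344

genome_km = GENOME_BASES * MM_PER_BASE / 1_000_000  # mm -> km
genome_miles = genome_km * 1000 / METRES_PER_MILE
print(f"{genome_km:,.0f} km = {genome_miles:,.0f} miles")  # 3,000 km = 1,864 miles

# At 1 mm per base, 1 m corresponds to 1,000 bases.
for label, metres in (("average gene", 30), ("spacing between genes", 270),
                      ("spliced message", 1)):
    print(f"{label}: {metres} m = about {metres * 1000:,} bases")
```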