Title: How are genes or proteins related at the sequence level
1. MULTIPLE SEQUENCE ALIGNMENTS
- How are genes or proteins related at the sequence level?
- 1) Genes or proteins within an organism that harbor similar function.
- Example: Cytochrome P450 genes/proteins are oxidative enzymes that metabolize substrates. Each subtype of this group metabolizes a different type of chemical/biochemical substrate. A comparison of the sequences of the genes/proteins can reveal:
- a) a relationship between the sequence of each gene/protein and the known chemical/biochemical substrate.
- b) the conserved domains of the gene/protein that are required for the enzymes to function, localize in the cell, bind to co-factors, etc.
- 2) Genes or proteins across different organisms that harbor similar function.
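To make the idea of conserved domains concrete, here is a minimal Python sketch that scans an existing alignment for fully conserved columns. The function name `conserved_columns` and the three toy sequences are invented for illustration (they are not real P450 sequences), and real tools score conservation more gradually than this all-or-nothing test.

```python
def conserved_columns(aligned):
    """Return 0-based column indices where every aligned sequence has the same residue."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in aligned}
        # A column counts as conserved only if it holds one residue and no gap.
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


# Hypothetical toy alignment; "-" marks a gap.
toy_alignment = [
    "MKT-FCGS",
    "MRTAFCGA",
    "MKSAFCGS",
]
print(conserved_columns(toy_alignment))
```

Runs of adjacent conserved columns are candidates for the functional regions described above.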
2. Humans have 18 families of cytochrome P450 genes and 43 subfamilies:
- CYP1: drug metabolism (3 subfamilies, 3 genes, 1 pseudogene)
- CYP2: drug and steroid metabolism (13 subfamilies, 16 genes, 16 pseudogenes)
- CYP3: drug metabolism (1 subfamily, 4 genes, 2 pseudogenes)
- CYP4: arachidonic acid or fatty acid metabolism (5 subfamilies, 11 genes, 10 pseudogenes)
- CYP5: thromboxane A2 synthase (1 subfamily, 1 gene)
- CYP7A: bile acid biosynthesis, 7-alpha hydroxylase of steroid nucleus (1 subfamily member)
- CYP7B: brain-specific form of 7-alpha hydroxylase (1 subfamily member)
- CYP8A: prostacyclin synthase (1 subfamily member)
- CYP8B: bile acid biosynthesis (1 subfamily member)
- CYP11: steroid biosynthesis (2 subfamilies, 3 genes)
- CYP17: steroid biosynthesis, 17-alpha hydroxylase (1 subfamily, 1 gene)
- CYP19: steroid biosynthesis, aromatase forms estrogen (1 subfamily, 1 gene)
- CYP20: unknown function (1 subfamily, 1 gene)
- CYP21: steroid biosynthesis (1 subfamily, 1 gene, 1 pseudogene)
- CYP24: vitamin D degradation (1 subfamily, 1 gene)
- CYP26A: retinoic acid hydroxylase, important in development (1 subfamily member)
- CYP26B: probable retinoic acid hydroxylase (1 subfamily member)
- CYP26C: probable retinoic acid hydroxylase (1 subfamily member)
- CYP27A: bile acid biosynthesis (1 subfamily member)
- CYP27B: vitamin D3 1-alpha hydroxylase, activates vitamin D3 (1 subfamily member)
- CYP27C: unknown function (1 subfamily member)
- CYP39: 7-alpha hydroxylation of 24-hydroxycholesterol (1 subfamily member)
- CYP46: cholesterol 24-hydroxylase (1 subfamily member)
- CYP51: cholesterol biosynthesis, lanosterol 14-alpha demethylase (1 subfamily, 1 gene, 3 pseudogenes)
A pseudogene is a nucleotide sequence that is
part of the DNA of an organism that appears to
have once (earlier in evolution) coded a gene
product (protein) but no longer does so.
3. P450 - CYP2: drug and steroid metabolism (13 subfamilies, 16 genes, 16 pseudogenes)
- CYP2B is inducible by barbiturates in rodents. It was one of the first P450s to be purified from mammals, but its role in humans is not understood.
- CYP2C8 is known to catalyze the 6-alpha hydroxylation of taxol, a drug used in treating breast cancer.
- CYP2C9 is one of two human P450s that have a known crystal structure. The CYP2C9 structure was published in Nature (Williams PA, Cosme J, Ward A, Angove HC, Matak Vinkovic D, Jhoti H. Crystal structure of human cytochrome P450 2C9 with bound warfarin. Nature. 2003 Jul 24; 424: 464-468.)
- CYP2C19 metabolizes omeprazole, a common ulcer medication. Polymorphisms (SNPs) in this gene cause a higher incidence of poor metabolizer phenotypes in Asians (23%) vs. Caucasians (3-5%).
- CYP2D6 is perhaps the best studied P450 with a drug metabolism polymorphism. This enzyme is responsible for more than 70 different drug oxidations. Since there may be no other way to clear these drugs from the system, poor metabolizers may be at severe risk for adverse drug reactions.
- CYP2E1 is induced by ethanol (high levels of expression in alcoholics). There is a polymorphism associated with this gene that is more common in Asians. The mutation correlates with a 2-fold increased risk of nasopharyngeal cancer linked to smoking. This P450 enzyme may be related to smoking-induced cancer.
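4. ClustalW is one example of an application that aligns multiple sequences (there are many others).
- Uses a progressive alignment method: pairwise alignments are used to build a guide tree, which sets the order in which the sequences are aligned.
- Generates a multiple sequence alignment and a phylogenetic tree.
- http://www.ebi.ac.uk/clustalw/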
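Pairwise comparison is the building block of any multiple alignment. As a rough illustration, the sketch below computes a Smith-Waterman local alignment score with dynamic programming. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the linear gap penalty are assumptions chosen for simplicity. This is not ClustalW's implementation, which uses substitution matrices and more elaborate gap handling.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "ACGT"))
```

Only the score is returned here. Recovering the aligned residues would need the full matrix and a traceback step.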
7. These genes are paralogous: genes within a species that arose by duplication and then evolved slightly different functions/activities.
8. Comparative genomics based on the protein sequence of P450 CYP11A1 in agricultural organisms: Pig, Horse, Goat, Sheep, and Cow.
10. These genes are orthologous: functionally the same gene, conserved through evolution and now existing in different organisms.
11. Comparative Genomics
- Organisms are related by:
- Homologous Sequence (gene or protein).
- Presence of Common Sequence.
- Functional Motifs within Sequences.
- REVIEW HOMEWORK ASSIGNMENT!!!
12. Protein Sequence Analysis
18. Protein Structure Categories
- Primary Structure: specific amino acid sequence.
- Secondary Structure: a string of amino acids (within a protein) twists or folds back upon itself to form an alpha helix, beta sheet, or a variety of other possible structures/domains.
- Tertiary Structure: the way a complete protein folds, involving different domain-domain interactions within the protein.
- Quaternary Structure: the interaction of multiple proteins to form larger functional structures.
- Many proteins bind to themselves to form homodimers and homopolymers.
- Many proteins bind to other proteins to form heterodimers and heteropolymers.
[Figure labels: primary structure example GLTHAKWVM; beta sheet and alpha helix; ribbon figure and ball-and-stick figure.]
Each color represents a different protein chain; this is hemoglobin (made up of 4 protein subunits). The red color is the edge of the heme (iron porphyrin) group.
19. Protein Structure Data Representation
- Primary Structure: character string.
- Secondary Structure
- Tertiary Structure
- Quaternary Structure
Identifying sub-structures in a large protein based on sequence.
3-Dimensional Representation: Protein Data Bank (PDB). This is a complicated file format that supports numerous programs and contains information regarding the primary structure (sequence), 3-D structures (x, y, z coordinates), size and linking of specific atoms in structures, etc.
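As a small illustration of how the x, y, z coordinates are stored, the sketch below reads ATOM records from PDB-format text using the format's fixed column positions. The helper names and the two sample records are made up for this example (the coordinates are not from a real structure), and only a few of the fields in each record are extracted.

```python
def parse_atom_line(line):
    """Extract a few fields from one fixed-width PDB ATOM/HETATM record."""
    return {
        "atom": line[12:16].strip(),     # atom name, columns 13-16
        "residue": line[17:20].strip(),  # residue name, columns 18-20
        "chain": line[21],               # chain identifier, column 22
        "res_seq": int(line[22:26]),     # residue sequence number, columns 23-26
        "x": float(line[30:38]),         # coordinates, columns 31-54
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


def read_atoms(pdb_text):
    """Parse every ATOM/HETATM line in a block of PDB-format text."""
    return [
        parse_atom_line(line)
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]


# Hypothetical records written in PDB column layout (made-up coordinates).
SAMPLE = (
    "ATOM      1  N   GLY A   1      11.104   6.134  -6.504  1.00  0.00           N\n"
    "ATOM      2  CA  GLY A   1      11.639   6.071  -5.147  1.00  0.00           C\n"
)

for atom in read_atoms(SAMPLE):
    print(atom["atom"], atom["x"], atom["y"], atom["z"])
```

Slicing by column rather than splitting on whitespace matters because adjacent PDB fields are allowed to touch with no space between them.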
20. Secondary Structure Prediction
1) Hydropathy Plot 2) Alpha Helix 3) Beta Sheet
A hydropathy plot identifies domains within a protein that are soluble (hydrophilic; regions of charged amino acids) or insoluble (hydrophobic; regions of uncharged amino acids).
An alpha helix is a group of amino acids within a protein that arrange themselves in a helical structure.
A beta sheet is a group of amino acids within a protein that arrange themselves in a stable, aligned (parallel or antiparallel) configuration.
23. Secondary Structure Prediction: Hydropathy Plot
Commonly used to identify alpha helices that span a membrane (i.e. anchor the protein to the cell membrane).
1) Choose a moving window that travels along the protein sequence:
a) calculate the overall solubility of the amino acids in the window.
b) move one amino acid along the sequence.
c) repeat the calculation.
d) continue this through the entire protein sequence.
Transmembrane domains are about 20 amino acids long, but any size window can be used.
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLL
LNGSYSENRT
2) Calculate the average using amino acid-specific constants.
24. Secondary Structure Prediction: Hydropathy Plot
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLL
LNGSYSENRT
X = [(-3.5) + (3.8) + (-4.5) + (3.8) + (-4.5) + (-1.3) + (2.5) + (1.8) + (-1.6) + (1.8) + (-0.4) + (2.8) + (1.8) + (3.8) + (3.8) + (-3.9) + (2.5) + (-3.5) + (-3.5) + (1.8)] / 20
WINDOW SIZE: 20
Solubility Constants (Kyte-Doolittle)
A  Alanine         1.8
R  Arginine       -4.5
N  Asparagine     -3.5
D  Aspartic acid  -3.5
C  Cysteine        2.5
Q  Glutamine      -3.5
E  Glutamic acid  -3.5
G  Glycine        -0.4
H  Histidine      -3.2
I  Isoleucine      4.5
L  Leucine         3.8
K  Lysine         -3.9
M  Methionine      1.9
F  Phenylalanine   2.8
P  Proline        -1.6
S  Serine         -0.8
T  Threonine      -0.7
W  Tryptophan     -0.9
Y  Tyrosine       -1.3
V  Valine          4.2
The 20 constants for the first window (ELRLRYCAPAGFALLKCNDA) sum to 3.5, so X = 3.5 / 20 = 0.175.
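The moving-window calculation can be sketched in a few lines of Python. The constants are the Kyte-Doolittle values listed on this slide, but the names `KYTE_DOOLITTLE` and `hydropathy_profile` are chosen for this example and do not come from any particular library.

```python
# Kyte-Doolittle hydropathy constants, keyed by one-letter amino acid code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=20):
    """Average hydropathy of each window as it slides one residue at a time."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


SEQUENCE = (
    "ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLL"
    "LNGSYSENRT"
)
profile = hydropathy_profile(SEQUENCE, window=20)
for position, score in enumerate(profile, start=1):
    print(position, round(score, 3))
```

Plotting `score` against `position` gives the hydropathy plot. Sustained stretches of high positive values are the candidate membrane-spanning regions.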
26. Hydropathy Plot Demonstration
27. Protein Folding: Computationally Modeling Biochemistry
28. OBJECTIVE
- Utilize the sequence information, along with temperature-dependent biomolecular interaction constants, to computationally predict a protein's tertiary structure.
CHALLENGES
- It is NOT known how proteins fold in nature.
- More detailed or mathematically intensive methods cannot be completed in a reasonable time (given current computer capabilities).
- There are essentially no experimental methods to verify or validate that a predicted protein structure is correct, or how correct.
29. Monte Carlo simulation of a folding event. Each frame displays the average position of a 48-mer chain during a 10^4 iteration time window. The color of each bead represents the variance of the position of the bead during this time interval, with yellow/green indicating large fluctuations and blue indicating small fluctuations. The entire folding event takes 8 x 10^5 iterations.
30. Evolution of Protein Folding Methods
- 1) Lattice Methods: 3D lattice of residue or atomic positions.
- 2) Off-Lattice Methods: not reliant on predetermined 3D positions. Can include solvent effects.
- 3) All-Atom Methods/Modeling: EXTREMELY computationally intensive.
Tactics
- Initially calculate secondary structure minima (fold sheets and helices), then calculate minima for the remaining sequence.
- Emulate the protein synthesis process, starting from the amino-terminus.
- Utilize existing NMR and X-ray crystal structures that match the sequence under investigation.
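To give a feel for a lattice method driven by Monte Carlo sampling, here is a toy sketch. It uses a 2D square lattice and a two-letter hydrophobic/polar (H/P) chain, both simplifying assumptions made for this example rather than the model behind the simulation shown earlier. The move set (re-drawing one bond direction) and all function names are likewise illustrative.

```python
import math
import random

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_coords(directions):
    """Turn a sequence of lattice moves into bead coordinates; None if the chain overlaps itself."""
    coords = [(0, 0)]
    for d in directions:
        dx, dy = MOVES[d]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords if len(set(coords)) == len(coords) else None


def energy(sequence, coords):
    """-1 for each pair of H beads that are lattice neighbours but not chain neighbours."""
    position = {c: i for i, c in enumerate(coords)}
    total = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # look right and up only, so each pair counts once
            j = position.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                total -= 1
    return total


def metropolis_accept(delta_e, temperature, rng=random):
    """Always accept downhill moves; accept uphill moves with Boltzmann probability."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def monte_carlo_fold(sequence, steps=2000, temperature=1.0, seed=0):
    """Return (lowest energy seen, its move string) from a simple Metropolis search."""
    rng = random.Random(seed)
    directions = ["R"] * (len(sequence) - 1)  # start fully extended
    current = energy(sequence, fold_coords(directions))
    best = (current, "".join(directions))
    for _ in range(steps):
        trial = directions[:]
        trial[rng.randrange(len(trial))] = rng.choice("UDLR")
        trial_coords = fold_coords(trial)
        if trial_coords is None:  # reject self-overlapping chains
            continue
        e = energy(sequence, trial_coords)
        if metropolis_accept(e - current, temperature, rng):
            directions, current = trial, e
            if current < best[0]:
                best = (current, "".join(directions))
    return best


print(monte_carlo_fold("HPHPPHHPHH"))
```

Even in this toy form the number of possible conformations grows quickly with chain length, which is why the off-lattice and all-atom methods above are so much more expensive.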
31. Protein Self-Assembly: Good AND Bad
Quaternary Structure: the interaction of multiple proteins to form larger functional structures.
- Many proteins bind to themselves to form homodimers and homopolymers.
- Many proteins bind to other proteins to form heterodimers and heteropolymers.
32. Many diseases involve self-aggregating proteins (especially neurodegenerative diseases).
- Mad Cow Disease (Prion Proteins)
- Alzheimer's Disease (beta-Amyloid Peptide)
- Huntington's Disease
Why neuro-diseases?
1) Because the blood flow (nutrients) to the brain is highly regulated, and proteins that aggregate tend to collect and are NEUROTOXIC. Note that these proteins ALSO aggregate in peripheral tissues, but there they are cleared and do not appear to be sufficiently toxic.
2) Brain cells (neurons) do NOT regenerate in a manner equivalent to peripheral tissues (particularly in older people).
3) Loss of neuronal cells leads to altered cognitive capabilities, which is not the case in peripheral tissues (e.g. slight muscle atrophy).
33. Neurodegenerative Protein Diseases: Beta Sheet Structures!!!
Beta-sheet structures are sometimes called amyloid structures, hence the term "amyloidopathy".
NOTE: The molecular forces that assemble beta-sheet structures ALSO cause them to self-assemble!
34. Two key concepts regarding age-related diseases.
1) Increased human health and longevity invents diseases. Before the modern age, nature had rarely seen a 60-year-old human. Imagine the age-related diseases of the future when the average human life span is >120 years.
2) Evolutionary pressures did not select for humans to live much longer than 35-40 years. So inherited mutations that lead to age-related diseases were not selected out of the human population. This fact has NOT changed in modern times.
Alzheimer's Disease: ages 40-90 (sporadic at 60, familial at 40), increases with age. More common in men under the age of 80 yrs; more common in women over the age of 80 yrs (J Neurol Neurosurg Psychiatry 1999;66:177, in BMJ 1999 Feb 27;318(7183):614).
35. Alzheimer's Disease
Amyloid Precursor Protein
Beta Amyloid Protein
42 amino acids long
Self Aggregation
Neuronal cell nuclei (blue circles)
Senile Plaque
36. Beta-Amyloid Aggregated in Water
500 nm
37. Huntington's Disease
Incidence: 2-8 persons per 100,000 worldwide, with focal population clusters.
Cause: Known, an excess of trinucleotide (CAG) repeats (which encode glutamine).
CAG repeats:
- 6-34: Normal Gene
- 36-120: HD Mutation (majority 40-50 CAG repeats, 33-40 yr onset)
Number of repeats is inversely related to age of onset. Juvenile onset is rare and involves CAG repeats >60.
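The repeat ranges above lend themselves to a short example. The sketch below finds the longest uninterrupted run of CAG codons in a DNA string and labels it using the ranges on this slide. The function names and the test sequence are hypothetical, and real repeat sizing is done with laboratory assays rather than a simple text search.

```python
import re


def longest_cag_run(dna):
    """Number of CAG units in the longest uninterrupted CAG run (0 if none)."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeats(repeats):
    """Label a repeat count using the ranges given on the slide."""
    if 6 <= repeats <= 34:
        return "normal gene"
    if 36 <= repeats <= 120:
        return "HD mutation"
    return "outside the ranges listed"


# Hypothetical fragment: 45 CAG repeats flanked by arbitrary bases.
fragment = "ATGGCG" + "CAG" * 45 + "CCGCCA"
count = longest_cag_run(fragment)
print(count, classify_repeats(count))
```

A count of 35 falls between the two ranges given on the slide, so the sketch reports it as outside them rather than guessing.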
38. Huntingtin Gene
Normal: 10-30 CAG codons
Abnormal: > 40 CAG codons
Huntingtin Protein
Abnormal
Normal
39. Figure 1. Specific localization of huntingtin aggregates in HD-repeat mutant mouse brain. Low-magnification micrographs are shown of brain sections from HD-repeat mutant (a) and wild-type (b) mice at 27 months of age. Only the striatum (Str) in the HD-repeat mutant mouse brain was immunoreactive with EM48. Ctx, cortex. High-magnification light micrograph (c) and electron microscopy (d) show EM48-immunoreactive aggregates in the neuronal nucleus (arrows). n, nucleus. Immunofluorescent double labelling shows that striatal neurons containing intranuclear EM48-reactive aggregates are labelled by antibodies to calbindin-D (stars in e), but not by antibodies to nitric oxide synthase (NOS; f) or parvalbumin (PARV; g). Scale bars, 10 μm (a-c, f-g) and 0.5 μm (d).
40. Prion Protein Diseases
1) Inter-species effect due to similarity between prion protein sequences.
2) The role of the normal prion protein in nature is not understood.
3) The disease involves a misfolding of the prion protein to a beta-sheet structure, which then self-aggregates.
41. The illustration below compares a normal prion
protein (PrPC) to a disease-causing form (PrPSc).
The two structures exhibit two different, classic
protein motifs, called "alpha helices," and "beta
sheets." Alpha helices, seen here in the normal
prion (left), consist of linked amino-acid
building blocks that spiral around like a coiled
spring. Beta sheets form when amino acid chains
line up in a flat plane within the protein, as in
the disease-causing protein shown here.
Transmissible Spongiform Encephalopathy
Disease Form (self aggregating)
Normal Form