26S Proteasome and Protein Stability - PowerPoint PPT Presentation

Provided by: MICROCOM1

Transcript and Presenter's Notes

Title: 26S Proteasome and Protein Stability

1
Algorithms and databases for sequence and
structural analysis
Biology through homology or analogy
2
In biomolecular sequences (DNA, RNA, or amino
acid sequences), high sequence similarity usually
implies significant functional or structural
similarity.
However, evolutionarily and functionally
related molecular strings can differ
significantly throughout much of the string and
yet preserve the same three-dimensional
structure(s), or the same two-dimensional
substructure(s) (motifs, domains), or the same
active sites, or the same or related dispersed
residues (DNA or amino acid). Dan Gusfield.
Algorithms on Strings, Trees, and Sequences.
1997. Cambridge University Press. p. 334
3
Objectives

What is the function of this gene?
Do other genes have this functional motif?
Can I predict the higher order structure of this
protein?
Is this gene a member of a known gene family?
Do other organisms have this gene?

4
Intuition

Similar sequences should have (long) regions of
similar/identical residues.
Why?
Evolution: descent from a common ancestral
sequence
Functional/structural convergence

5
General Database Search Issues

Search using amino acid sequence if possible
Why? Protein evolution is slower than DNA
sequence evolution
Statistical theory is based on unrealistic
assumptions, so consider results as predictions.

6
Sequence Alignment

Sequence alignment is simply the optimal
assignment of substitution and indel events to a
pair of sequences.
Global alignment: align entire sequences
Local alignment: find best matching regions of
sequences
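The global/local distinction can be sketched with a small dynamic-programming scorer. This is a minimal illustration, not the method of any particular tool: the function name and the default scores (match +1, mismatch -1, gap -1) are assumptions chosen for the example, and only the optimal score is returned, not the alignment itself.

```python
def alignment_score(x, y, match=1, mismatch=-1, gap=-1, local=False):
    """Optimal alignment score of x and y under a simple linear scoring scheme.

    local=False scores an end-to-end (global) alignment;
    local=True scores the best matching region (local alignment).
    mismatch and gap are passed as negative numbers.
    """
    rows, cols = len(x) + 1, len(y) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are charged.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local alignment: a region may restart anywhere at score 0.
                cell = max(cell, 0)
            score[i][j] = cell
            best = max(best, cell)
    return best if local else score[-1][-1]


local_best = alignment_score("QTRPQNVLNPP", "STRQNVINPWAAQ", local=True)
global_best = alignment_score("QTRPQNVLNPP", "STRQNVINPWAAQ", local=False)
```

With these assumed scores the local score of the two example peptides is 5, which corresponds to 7 matches, 1 mismatch and 1 gap.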

7
Measuring Alignment Quality

Good alignments should have
many exact matches
few mismatches
many of the mismatches should be similar
residues
few gaps

8
Measuring Alignment Quality
Begin with...
Longest Exact Match
QTRPQNVLNPP
STRQNVINPWAAQ
S = 3a
S = alignment score, a = match score
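The longest exact match can be found with a short dynamic-programming scan. The sketch below is illustrative only; the function name is an assumption, and ties are resolved in favour of the first match found.

```python
def longest_exact_match(x, y):
    """Return the longest substring shared by x and y (first found on ties)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the exact match that ended at the previous pair of residues.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return x[best_end - best_len:best_end]


shared = longest_exact_match("QTRPQNVLNPP", "STRQNVINPWAAQ")  # "QNV", three matches
```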
9
Measuring Alignment Quality
allow some mismatches
QTRPQNVLNPP
STRQNVINPWAAQ
S = 5a - 1b
S = alignment score, a = match score, b = mismatch penalty
10
Measuring Alignment Quality
and finally, introduce some gaps
QTRPQNVLNPP
STR-QNVINPWAAQ
S = 7a - 1b - 1c
S = alignment score, a = match score, b = mismatch
penalty, c = gap penalty
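The score S = (matches)a - (mismatches)b - (gaps)c can be checked by counting columns of a gapped alignment. A minimal sketch, assuming the aligned region is supplied as two equal-length strings with "-" for gaps; the function name and the default values a = b = c = 1 are illustrative.

```python
def score_alignment(top, bottom, a=1, b=1, c=1):
    """Count matches, mismatches and gaps, then return them with S = m*a - x*b - g*c."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    matches = mismatches = gaps = 0
    for p, q in zip(top, bottom):
        if p == "-" or q == "-":
            gaps += 1
        elif p == q:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps, a * matches - b * mismatches - c * gaps


# Aligned region of the two example peptides.
counts = score_alignment("TRPQNVLNP", "TR-QNVINP")  # (7, 1, 1, 5)
```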
11
Scoring Issues

Relative costs of matches, mismatches, and gaps
should depend on their probabilities (rare events
receive higher penalties)
In practice, the appropriate costs are rarely
known.
A variety of scoring matrices are available.
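One common way to tie scores to probabilities is a log-odds ratio: the observed frequency of an aligned pair divided by the frequency expected by chance. The slide does not give a formula, so the sketch below is an assumption about the general form, with hypothetical parameter names.

```python
import math


def log_odds_score(q_ij, p_i, p_j):
    """Log-odds score in bits.

    q_ij: observed frequency of residues i and j aligned in related sequences.
    p_i, p_j: background frequencies of each residue.
    Pairs seen less often than chance (rare events) get negative scores.
    """
    return math.log2(q_ij / (p_i * p_j))


favoured = log_odds_score(0.5, 0.5, 0.5)      # pair seen twice as often as chance
disfavoured = log_odds_score(0.125, 0.5, 0.5)  # pair seen half as often as chance
```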

12
BLAST (www.ncbi.nlm.nih.gov/BLAST): Basic Local
Alignment Search Tool
BLAST is based on a systematic search of
conserved words. The query sequence is decomposed
into words of length W (W = 3 for amino acids, 11
for nucleotides), and a list of these words and
similar words is compared with entries in the
database. Sequences scoring below
a threshold are deleted from the list.
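The word decomposition step can be sketched as follows. This is a simplified illustration and not BLAST itself: it finds exact word hits only, omits the "similar words" and the score threshold, and both function names are hypothetical.

```python
def query_words(sequence, w):
    """All overlapping words of length w in the sequence."""
    return [sequence[i:i + w] for i in range(len(sequence) - w + 1)]


def exact_seed_hits(query, subject, w):
    """(query position, subject position) pairs where a word matches exactly."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j)
            for i, word in enumerate(query_words(query, w))
            for j in index.get(word, [])]


words = query_words("QTRPQNVLNPP", 3)  # W = 3 for amino acids
seeds = exact_seed_hits("QTRPQNVLNPP", "STRQNVINPWAAQ", 3)
```

Here the only shared word is QNV, at query position 4 and subject position 3 (0-based).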
13
Scoring Matrices

A scoring matrix specifies a score, s_ij, for
aligning residue i in sequence I with residue j in sequence II.
Choice of matrix depends on the divergence level
of desired/expected hits.
Examples: PAM, BLOSUM
Both can be modified for different divergence
levels (e.g., BLOSUM40, BLOSUM62)
Advice: try several matrices when possible.
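A scoring matrix lookup can be sketched with a small dictionary. The handful of scores below are illustrative values in the style of a BLOSUM matrix, assumed for this example; a real search would load a complete published matrix.

```python
# Illustrative (assumed) scores for a few residue pairs; not a complete matrix.
TOY_SCORES = {
    ("Q", "Q"): 5,
    ("N", "N"): 6,
    ("V", "V"): 4,
    ("P", "P"): 7,
    ("I", "L"): 2,  # similar residues: a mismatch that still scores positively
}


def pair_score(i, j, matrix=TOY_SCORES):
    """Look up s_ij, treating the matrix as symmetric. Raises KeyError if absent."""
    key = (i, j) if (i, j) in matrix else (j, i)
    return matrix[key]


def ungapped_score(x, y, matrix=TOY_SCORES):
    """Sum s_ij over the columns of an ungapped alignment of equal-length x and y."""
    return sum(pair_score(i, j, matrix) for i, j in zip(x, y))


total = ungapped_score("QNVLNP", "QNVINP")  # 5 + 6 + 4 + 2 + 6 + 7
```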

16
PSI-BLAST: Position-Specific Iterated BLAST
1. BLAST with query
2. Keep hits with E less than a threshold E (adjustable constant)
3. Multiple alignment of HSPs from step (2)
4. Build profile
5. BLAST with profile
6. Iterate (1)-(5) until no new hits are found
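The iterate-until-no-new-hits loop can be sketched on toy data. This is not PSI-BLAST: it assumes ungapped sequences of equal length, builds the profile from simple column frequencies, and replaces the E-value cutoff with a hypothetical minimum profile score. All names and the toy database are assumptions.

```python
from collections import Counter

TOY_DATABASE = ["GVGYFGV", "GLGFFGL", "GQGVLGL", "AAAAAAA"]


def build_profile(seqs):
    """Column-wise residue frequencies for equal-length, ungapped sequences."""
    profile = []
    for column in zip(*seqs):
        counts = Counter(column)
        profile.append({res: n / len(seqs) for res, n in counts.items()})
    return profile


def profile_score(profile, seq):
    """Sum of the profile frequencies of the residues found in seq."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, seq))


def iterative_search(query, database, min_score):
    """Search, rebuild the profile from all hits, and repeat until no new hits."""
    hits = [query]
    while True:
        profile = build_profile(hits)
        new_hits = [s for s in database
                    if s not in hits and profile_score(profile, s) >= min_score]
        if not new_hits:
            return hits
        hits.extend(new_hits)


found = iterative_search("GLGFFGV", TOY_DATABASE, min_score=3.2)
```

In this toy run GQGVLGL scores 3.0 against the query alone and is missed in the first pass, but it is picked up once the profile includes the first-round hits. The same mechanism is what lets an unrelated sequence pull in its own relatives if it is ever admitted.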
17
PSI-BLAST: Position-Specific Iterated BLAST
Use with great caution!!! Once an unrelated
sequence is mistakenly incorporated into the
profile, subsequent iterations will incorporate
homologues of the unrelated sequence
(catastrophic transitivity). Human intervention
is essential.
18
The COG database: new developments in
phylogenetic classification of proteins from
complete genomes. Tatusov RL, Natale DA,
Garkavtsev IV, Tatusova TA, Shankavaram UT,
Rao BS, Kiryutin B, Galperin MY, Fedorova ND,
Koonin EV.
All vs. all blastp of genome sequence (primarily
microbial) database. Each COG consists of
individual orthologous genes or orthologous
groups of paralogs from three or more
phylogenetic lineages. In other words, any two
proteins from different lineages that belong to
the same COG are orthologs. Each COG is assumed
to have evolved from an individual ancestral
gene through a series of speciation and
duplication events.
19
Domains and insight into protein function

Proteins are modular, exhibiting discrete folding
units known as domains
Switching and swapping domains is a mechanism for
functional diversity in proteins
Domains can exhibit intrinsic function

20
Examples

SH2: binds phosphorylated tyrosine residues in
protein partners
PDZ: mediates protein-protein interactions
between enzymes
HTH: binds DNA in a site-specific manner
Once a domain acquires selectable functionality,
it can be distributed to other gene products,
providing a mechanism for evolution

21
Hidden Markov Models are sensitive tools for
domain detection
www.pfam.wustl.edu
www.tigr.org/TIGRFAMs/
www.smart.embl-heidelberg.de/
These tools use profiles generated from multiple
sequence alignments.
22
Rossmann fold - Profile HMM and PROSITE

GLGFFGV
GVGYFGV
GLGFFGL
GLGFFGL
GQGVLGL
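A PROSITE-style pattern can be derived from the alignment above by listing the residues seen in each column. This sketch covers only that simple case (no gaps, no wildcards, no repeat counts) and is not a profile HMM; the function names are assumptions.

```python
import re

ALIGNMENT = ["GLGFFGV", "GVGYFGV", "GLGFFGL", "GLGFFGL", "GQGVLGL"]


def prosite_pattern(alignment):
    """One element per column: a single residue, or [..] listing the alternatives."""
    parts = []
    for column in zip(*alignment):
        residues = sorted(set(column))
        parts.append(residues[0] if len(residues) == 1 else "[" + "".join(residues) + "]")
    return "-".join(parts)


def pattern_to_regex(pattern):
    """Convert the dash-separated pattern into a compiled regular expression."""
    return re.compile(pattern.replace("-", ""))


pattern = prosite_pattern(ALIGNMENT)
matcher = pattern_to_regex(pattern)
hit = matcher.search("AAGLGYLGVAA")  # matches the embedded GLGYLGV
```

The pattern accepts any combination of the residues observed per column, so it also matches sequences that were not in the alignment. A profile HMM instead weights each residue by how often it occurs.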

23
Transition to structural classifications

Several useful databases link sequence analysis
and protein structure information
CATH and SCOP are two of these, each containing
950-1400 protein superfamilies
Since structure is more highly conserved than
sequence during evolution, structural alignment
algorithms and classifications enable more
distant evolutionary relatives to be identified.

24
CATH

Contains 200,000 sequence domains, assigned to
1200 CATH homologous superfamilies
Classification scheme: Class, Architecture,
Topology and Homology
Class: secondary structure composition and
packing
Architecture: orientation of secondary
structures in 3D, regardless of connectivity
Topology: both orientation and connectivity of
secondary structures are accounted for
Homologous superfamily: grouped based on whether
an evolutionary relationship exists (clustered at
different levels of sequence ID)
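The four-level hierarchy maps naturally onto a dotted four-number identifier, one number per level. The sketch below assumes identifiers of that form; the type and function names are hypothetical, and the example numbers are used only for illustration.

```python
from typing import NamedTuple


class CathId(NamedTuple):
    class_: int
    architecture: int
    topology: int
    homologous_superfamily: int


def parse_cath_id(identifier):
    """Split a dotted identifier such as '3.40.50.720' into its four levels."""
    parts = identifier.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated numbers")
    return CathId(*map(int, parts))


def shares_level(a, b, level):
    """True if two identifiers agree on the first `level` numbers (1 = Class ... 4 = Homology)."""
    return a[:level] == b[:level]


first = parse_cath_id("3.40.50.720")
second = parse_cath_id("3.40.50.300")
same_topology = shares_level(first, second, 3)      # same Class, Architecture, Topology
same_superfamily = shares_level(first, second, 4)   # different homologous superfamily
```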

25
SCOP database

Classification scheme: Class, Fold, Superfamily,
and Family
Class: type and organization of secondary
structure
Fold: share a common core structure, with the same
secondary structure elements in the same
arrangement with the same topological connections
Superfamily: share very similar structure and
function
Family: protein domains share a clear common
evolutionary origin as evidenced by sequence
identity or similar structure/function

26
HMMs also useful at SCOP

For instance, SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)
HMMs are derived from the PDB databank at
www.rcsb.org
Identify sequence signatures for specific domains

27
Structural Alignments

Various algorithms allow structure vs. structure
comparisons
VAST, DALI
CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)
also has SSAP and GRATH (one computationally
intensive, one not)
Sequence similarity to structural families for
modeling often extracted using PSI-BLAST
(Gene3D)

28
Comparison of sequence and structure alignments
1 Taylor WR, Orengo CA, 1989, Protein structure
alignment. J Mol Biol 208:1-22
4 Mueller L, 2003, Protein structure alignment.
Paper presentations 27.51630h
29
Multiple structural alignments