Title: Protein structure viewing
- Lecture 9
- Protein structure viewing
- PDB database, PDBsum, QuickPDB.
- NCBI MMDB (Molecular Modeling DataBase), Cn3D.
- 2. Protein Classification: SCOP and CATH databases.
- 3. Families, Patterns, Motifs: InterPro.
- Databases: PROSITE.
- Blocks.
- Pfam.
- eMOTIF.
- Other interesting protein features.
- 5. Other Diseases Related to Protein Folding.
How Many Folds Are There?
- Structural Classification of Proteins (SCOP)
- Status (1 Mar 2002): based on 13,220 PDB entries.
- How many more folds are there?
- Estimation: the number of possible folds is about 4,000.
- A database of 930 folds covers 90% of protein families.
http://scop.berkeley.edu/count.html
Clustering of Structures
Structural Classification of Proteins (SCOP): http://scop.mrc-lmb.cam.ac.uk/scop/
Nearly all proteins have structural similarities with other proteins and sometimes share a common evolutionary origin. The SCOP database aims to classify proteins according to their structural and evolutionary relationships.
[Figure: SCOP classification of myoglobin: all α > Globin-like > Globin-like > globins > myoglobin]
Clustering of Structures
- Class: similar secondary structure composition (all α, all β, mixed α and β).
- Fold: major structural similarity (the same major secondary structures in the same arrangement).
- Superfamily: low sequence identity, but probable common ancestry.
- Family: clear evolutionary relationship (usually sequence identity > 30%).
- Protein: the individual protein.
[Figure: the hierarchy applied to myoglobin: all α > Globin-like fold > Globin-like superfamily > globins > myoglobin]
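The family-level rule of thumb above (sequence identity usually above 30%) can be illustrated with a minimal Python sketch. It assumes two sequences that are already aligned to equal length, with '-' marking gaps, and it counts identity only over columns where neither sequence has a gap; this is one common convention among several. SCOP itself assigns families by expert curation, where identity is only one criterion, so `percent_identity`, `suggests_same_family` and `FAMILY_THRESHOLD` are hypothetical names for illustration only.

```python
# Hypothetical helpers illustrating the "> 30% identity" family rule of thumb.
FAMILY_THRESHOLD = 30.0  # percent, from the slide's "usually > 30%"

def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)

def suggests_same_family(aln_a: str, aln_b: str) -> bool:
    """True when identity exceeds the rough family-level threshold."""
    return percent_identity(aln_a, aln_b) > FAMILY_THRESHOLD

if __name__ == "__main__":
    # Made-up aligned fragments, not real globin sequences.
    a = "MVLSEGEWQLV"
    b = "MV-SAGEWKLV"
    print(percent_identity(a, b), suggests_same_family(a, b))
```

For the toy pair, 8 of the 10 gap-free columns are identical, giving 80% identity.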
SCOP - Results
http://www.biochem.ucl.ac.uk/bsm/cath_new/index.html
http://www.biochem.ucl.ac.uk/bsm/cath_new/cath_info.html
CATH: classification of protein domain structures.
- C - Class: determined by secondary structure composition.
- A - Architecture: the overall shape of the domain structure.
- T - Topology (fold): major structural similarities.
- H - Homologous superfamily: protein domains that share a common ancestor.
[Figure: domains for 1fupA2]
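CATH identifiers are written as four dot-separated numbers, one per C, A, T and H level, so the depth at which two domains agree can be read directly from their codes. The minimal Python sketch below splits such a code and reports the deepest shared level. The class labels are CATH's top-level class names; `CathCode`, `parse_cath_code` and `same_level` are hypothetical helpers, and the codes in the usage lines are illustrative inputs only.

```python
from typing import NamedTuple

# CATH top-level classes (the "C" number).
CATH_CLASSES = {
    1: "Mainly alpha",
    2: "Mainly beta",
    3: "Alpha beta",
    4: "Few secondary structures",
}

class CathCode(NamedTuple):
    class_: int        # C - class
    architecture: int  # A - architecture
    topology: int      # T - topology (fold)
    homology: int      # H - homologous superfamily

def parse_cath_code(code: str) -> CathCode:
    """Split a dotted C.A.T.H code into its four numeric levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated levels, got {code!r}")
    return CathCode(*(int(p) for p in parts))

def same_level(a: CathCode, b: CathCode) -> str:
    """Deepest level two domains share, or 'none' if even the class differs."""
    names = ["class", "architecture", "topology", "homologous superfamily"]
    shared = "none"
    for name, x, y in zip(names, a, b):
        if x != y:
            break
        shared = name
    return shared

if __name__ == "__main__":
    d1 = parse_cath_code("3.40.50.300")
    d2 = parse_cath_code("3.40.50.720")
    print(CATH_CLASSES[d1.class_], same_level(d1, d2))
```

Comparing numbers level by level mirrors the hierarchy: two domains can only share a topology if they already share a class and an architecture.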
Higher-Level Structures: Motifs and Domains
A family is a set of sequences that are related functionally and/or structurally. A motif is a simple combination of a few secondary structure elements that appears in several different proteins in nature. A collection of motifs forms a domain. A domain is a more complex combination of secondary structures that is common to a family (a consensus pattern). It typically has a specific function (for example, it contains an active site). A protein may contain more than one domain.
For further reading: http://www.expasy.org/swissmod/course/text/chapter4.htm, http://www.ii.uib.no/inge/talks/sverige00/sld003.htm
Grouping of Secondary Structure Elements: Super-secondary Structures or Motifs
[Figure: examples of super-secondary structures: β-hairpin, βαβ, αα, ?-barrels]
http://www.expasy.org/swissmod/course/text/chapter4.htm
Example: DNA Pattern Search
Patterns most often examined in DNA sequences include:
- Recognition sites of restriction enzymes.
- Codons specifying the amino acid sequence of a protein.
- Intron splice sites.
- Promoters.
- Binding sites for regulatory proteins.
http://www.blc.arizona.edu/courses/bioinformatics/patterns.html
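The first pattern type listed, restriction enzyme recognition sites, is the simplest to search for: each site is a fixed short string. The minimal Python sketch below scans a DNA sequence for a few well-known sites. It uses a zero-width lookahead so that overlapping occurrences are also reported. It scans only the given strand, which is sufficient here because these particular sites are palindromic (each equals its own reverse complement); a non-palindromic site would also need a search of the reverse complement. `RESTRICTION_SITES` and `find_sites` are hypothetical names.

```python
import re

# Recognition sequences (5'->3') of three well-known restriction enzymes.
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}

def find_sites(dna: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    # "(?=...)" matches without consuming characters, so overlaps are kept.
    pattern = re.compile(f"(?={re.escape(site)})")
    return [m.start() for m in pattern.finditer(dna.upper())]

if __name__ == "__main__":
    seq = "ttGAATTCaaGGATCCgaattc"  # made-up test sequence
    for enzyme, site in RESTRICTION_SITES.items():
        print(enzyme, find_sites(seq, site))
```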
Example: Calmodulin-Binding Motif (calcium-binding proteins)
Example: Leucine Zipper Motif (transcription factors)
http://www.blc.arizona.edu/courses/bioinformatics/patt-lab.html
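A leucine zipper places a leucine at every seventh position of a helix. In PROSITE-style notation this is commonly written as L-x(6)-L-x(6)-L-x(6)-L. The minimal Python sketch below searches a protein sequence for that spacing and, through a lookahead, reports overlapping candidates. A short, loose pattern like this also matches many non-zipper sequences by chance, so hits are only candidates. `zipper_candidates` is a hypothetical helper.

```python
import re

# Four leucines spaced seven residues apart: L-x(6)-L-x(6)-L-x(6)-L.
# The lookahead with a capturing group reports overlapping hits as well.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")

def zipper_candidates(protein: str) -> list[tuple[int, str]]:
    """(0-based start, matched segment) for each candidate leucine zipper."""
    return [(m.start(), m.group(1)) for m in LEUCINE_ZIPPER.finditer(protein.upper())]

if __name__ == "__main__":
    # Made-up sequence with leucines at positions 2, 9, 16 and 23.
    print(zipper_candidates("MKLAEKQRSLTENKARLQEVSGELK"))
```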
Example: Zinc-Finger Motif (DNA-binding proteins)
http://www.blc.arizona.edu/courses/bioinformatics/patt-lab.html
Example: Zinc-Finger Motif
http://www.ii.uib.no/inge/talks/ebi-nov-99/sld006.htm
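The classic C2H2 zinc finger is usually described by the PROSITE pattern C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H: two cysteines and two histidines at characteristic spacings that together coordinate a zinc ion. The minimal Python sketch below is that pattern translated by hand into a regular expression. `find_zinc_fingers` is a hypothetical helper, and the test sequence is synthetic. It is built only to satisfy the spacing rules and is not a real protein.

```python
import re

# PROSITE-style C2H2 pattern: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
# x(n) -> .{n}, x(n,m) -> .{n,m}, [..] is copied unchanged.
C2H2_ZINC_FINGER = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")

def find_zinc_fingers(protein: str) -> list[tuple[int, int]]:
    """(start, end) spans of non-overlapping C2H2 zinc-finger matches."""
    return [m.span() for m in C2H2_ZINC_FINGER.finditer(protein.upper())]

if __name__ == "__main__":
    # Synthetic sequence: C, 2 any, C, 3 any, F, 8 any, H, 3 any, H.
    print(find_zinc_fingers("CAKCGKSFSRSDELTRHIRTH"))
```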
Motifs in Protein Analysis
http://www.ii.uib.no/inge/talks/ebi-nov-99/sld009.htm
Protein Sequence Motif Databases
http://www.ii.uib.no/inge/talks/sverige00/sld003.htm
Conserved Protein Regions: Profiles, Motifs and Domains
We will tour various web servers and databases that identify conserved regions within protein families.
Warning! Definitions, formats and outputs vary significantly from one server to the next. The field is still relatively young and very dynamic, so no standards have been established yet!
http://www.expasy.ch/prosite/
PROSITE helps determine the function of an uncharacterized protein and the known protein family to which it belongs. A pattern describes a group of amino acids that constitutes a usually short but characteristic motif within a protein sequence.
For example, the pattern [AC]-x-V-x(4)-{ED}. is interpreted as: Ala or Cys - any - Val - any-any-any-any - any but Glu or Asp.
Note: PROSITE can also be searched by full text.
PROSITE SYNTAX
- The standard one-letter code for amino acids.
- 'x': any amino acid.
- '[ ]': residues allowed at the position.
- '{ }': residues forbidden at the position.
- '( )': repetition of a pattern element is indicated in parentheses: x(n) for exactly n repetitions, or x(n,m) for a range of n to m repetitions.
- '-': separates each pattern element.
- '<': indicates an N-terminal restriction of the pattern.
- '>': indicates a C-terminal restriction of the pattern.
- '.': the period ends the pattern.
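Because the syntax rules above map almost one-to-one onto regular-expression syntax, a PROSITE pattern can be translated mechanically and then used to scan sequences. The minimal Python sketch below handles only the elements listed above (one-letter codes, x, [ ], { }, (n) and (n,m) repeats, '-', '<', '>' and the final period). It does not handle extensions such as '>' inside brackets. The function name `prosite_to_regex` is a hypothetical helper and is not part of any PROSITE tool.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    p = re.sub(r"\s+", "", pattern).rstrip(".")  # drop spaces and the final '.'
    parts = []
    for element in p.split("-"):                  # '-' separates elements
        start = end = ""
        if element.startswith("<"):               # N-terminal restriction
            start, element = "^", element[1:]
        if element.endswith(">"):                 # C-terminal restriction
            end, element = "$", element[:-1]
        # Split "core(n)" or "core(n,m)" into the core and its repeat counts.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = m.groups()
        if core in ("x", "X"):
            rx = "."                              # any amino acid
        elif core.startswith("{") and core.endswith("}"):
            rx = "[^" + core[1:-1] + "]"          # forbidden residues
        else:
            rx = core                             # single residue or [allowed] set
        if low is not None:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(start + rx + end)
    return "".join(parts)

if __name__ == "__main__":
    rx = prosite_to_regex("[AC]-x-V-x(4)-{ED}.")
    print(rx)                                  # [AC].V.{4}[^ED]
    print(re.search(rx, "MAGVKLMNQ").group())  # AGVKLMNQ
```

Translating once and compiling the result with `re.compile` is usually enough for scanning many sequences, because each PROSITE element becomes exactly one regex element.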
http://www.blocks.fhcrc.org/
Blocks are multiply aligned un-gapped segments
corresponding to the most highly conserved
regions of proteins. The Blocks Database is a
collection of blocks representing known protein
families.
Input: the amino acid sequence of a protein.
Outputs: (1) protein families with similar block structure; (2) the blocks inside those families.
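Because blocks are ungapped, a query can be compared to a block simply by sliding the block along the sequence one position at a time and scoring each residue against its column. The minimal Python sketch below builds per-column log-odds scores from a block's rows and reports the best offset in the query. It assumes a uniform 1/20 background and a pseudocount of 1. The real Blocks search uses position-specific scoring matrices with sequence weighting, so this is a simplified stand-in, not the server's method. `block_log_odds` and `best_hit` are hypothetical names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def block_log_odds(block: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Per-column log2 odds of each residue vs. a uniform 1/20 background."""
    width = len(block[0])
    if any(len(row) != width for row in block):
        raise ValueError("a block is ungapped: all rows must have the same width")
    background = 1 / len(AMINO_ACIDS)
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    columns = []
    for i in range(width):
        counts = Counter(row[i] for row in block)
        columns.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return columns

def best_hit(query: str, block: list[str]) -> tuple[int, float]:
    """Slide the block along the query without gaps; return (best offset, score)."""
    matrix = block_log_odds(block)
    width = len(matrix)
    query = query.upper()
    best = (-1, float("-inf"))  # stays (-1, -inf) if the query is shorter than the block
    for offset in range(len(query) - width + 1):
        score = sum(matrix[i].get(query[offset + i], 0.0) for i in range(width))
        if score > best[1]:
            best = (offset, score)
    return best

if __name__ == "__main__":
    toy_block = ["CKVC", "CRIC", "CKLC"]  # made-up block rows
    print(best_hit("AAACKVCAA", toy_block))
```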
http://blocks.fhcrc.org/blocks/blocks search.html
Blocks: segments corresponding to the most highly conserved regions of the protein families documented in PROSITE.
InterPro (IP) families: 4 blocks for saposin, IPB003119A-D.
Help: http://blocks.fhcrc.org/blocks/help/tutorial/tutorial.html
Pfam: a collection of protein families and domains.
http://www.sanger.ac.uk/Pfam/
Query: the amino acid sequence of human prosaposin.
Protein families database
Representative SAPA family proteins
The eBLOCKS server
Search a sequence: http://eblocks.stanford.edu/
- eBLOCKs is a database of protein sequence blocks: ungapped alignments of highly conserved regions among a protein family or superfamily.
- eBLOCKs is generated automatically from PSI-BLAST results, using protein sequences contained in SWISS-PROT.
- The PSI-BLAST result is then analysed by a clustering algorithm to build protein groups with different levels of similarity.
- Each group of sequences is aligned and trimmed into blocks.
- The current eBLOCKs database contains 81,413 eBLOCKs.
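The step "aligned and trimmed into blocks" can be pictured as cutting a gapped multiple alignment into its gap-free stretches. The minimal Python sketch below keeps only columns where no sequence has a gap and turns each long-enough run of such columns into one block. The slide does not give eBLOCKs' actual trimming criteria, so this "no gap in any row" rule and the `min_width` cutoff are assumptions, and `trim_into_blocks` is a hypothetical name.

```python
def trim_into_blocks(alignment: list[str], min_width: int = 3) -> list[list[str]]:
    """Cut a gapped multiple alignment into ungapped blocks.

    A column is kept only if no sequence has a gap ('-') there; each maximal
    run of kept columns that is at least min_width long becomes one block.
    """
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    gap_free = [all(row[i] != "-" for row in alignment) for i in range(width)]

    blocks, start = [], None
    for i, ok in enumerate(gap_free + [False]):  # sentinel closes a final run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_width:
                blocks.append([row[start:i] for row in alignment])
            start = None
    return blocks

if __name__ == "__main__":
    aln = ["MKV-LLACDE", "MRVALLSC-E", "MKIGLLACEE"]  # made-up alignment
    for block in trim_into_blocks(aln):
        print(block)
```

In the toy alignment, the gap columns 4 and 9 split the sequences into two blocks, and the single trailing column is dropped as too short.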
http://eblocks.stanford.edu/eblocks/kwsearch.html
Logos: select display format (GIF, PDF, PostScript).
eBLOCKS - Results (cont.)
Sequence logos: a graphical way to display consensus sequences. Amino acids are colored according to their chemical and physical characteristics:
- Red for acidic amino acids: Glu, Asp.
- Blue for basic amino acids: Lys, Arg, His.
- White (light grey) for polar OH/SH amino acids: Ser, Thr, Cys.
- Green for amide amino acids: Asn, Gln.
- Yellow (sulphur) for methionine: Met.
- Black for hydrophobic amino acids: Ala, Val, Leu, Ile.
- Orange for aromatic amino acids: Tyr, Phe, Trp.
- Purple for proline: Pro.
- Grey for glycine: Gly.
- All other letters are light blue, so they stand out.
The motif of saposin, found using a PSSM (from PSI-BLAST analysis). Larger letters indicate more significant (more conserved) amino acid positions. The consensus is the top amino acid in each column.
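The two quantities described above, letter size and consensus, can be computed from a block's columns. The minimal Python sketch below uses the usual information-content formula, log2(20) minus the column's Shannon entropy in bits. Each letter's height is its frequency times that information content, and the consensus is the most frequent residue. It works from raw column frequencies without a small-sample correction, whereas the eBLOCKS logos come from a PSSM, so the values are only illustrative. `logo_columns` is a hypothetical name.

```python
import math
from collections import Counter

def logo_columns(block: list[str]) -> list[dict]:
    """Per column: consensus residue, information content (bits), letter heights."""
    max_bits = math.log2(20)  # maximum for the 20-letter amino acid alphabet
    columns = []
    for i in range(len(block[0])):
        counts = Counter(row[i] for row in block)
        n = sum(counts.values())
        freqs = {aa: c / n for aa, c in counts.items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = max_bits - entropy  # fully conserved column -> log2(20) bits
        columns.append({
            "consensus": counts.most_common(1)[0][0],  # top residue in the column
            "bits": info,
            "heights": {aa: f * info for aa, f in freqs.items()},  # letter sizes
        })
    return columns

if __name__ == "__main__":
    for col in logo_columns(["CKVC", "CRIC", "CKLC"]):  # made-up block
        print(col["consensus"], round(col["bits"], 2))
```

A fully conserved column (all C) reaches the maximum of about 4.32 bits, which is why conserved positions are drawn with the tallest letters.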
http://motif.stanford.edu/emotif/
Discrete motifs represent specific functions.
[Figure: eMOTIF search and results]