The Genome Access Course Protein Structure presentation


Title: The Genome Access Course Protein Structure

1
The Genome Access Course: Protein Structure
HSP 70 (1DKG, 1DKZ) and prefoldin (1FXK)
2
Protein structure

What is the correct amino acid sequence?
Is the predicted protein complete (is the ATG start codon real)?
To be sure, use ORF Finder at NCBI
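The ORF Finder check amounts to scanning all six reading frames for ATG...stop stretches. The sketch below is a minimal illustration and not NCBI's implementation. It assumes the standard genetic code, accepts only ATG as a start codon, and reports only ORFs that end in a stop codon. The function names and the `min_codons` cutoff are hypothetical.

```python
# Minimal six-frame ORF scan (hypothetical helper, not NCBI ORF Finder).
# Assumes the standard genetic code, ATG-only starts, and ORFs that
# must end in a stop codon (partial ORFs at the sequence ends are ignored).
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for each ATG...stop ORF.

    start/end are 0-based, half-open coordinates on that strand's sequence
    (the minus strand is indexed on its reverse complement), and end
    includes the stop codon. min_codons counts the stop codon too.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the currently open ATG, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ATG downstream
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=1))  # [('+', 2, 2, 14)]
```

Internal ATGs inside an open ORF are treated as methionines rather than new starts, so each reported ORF begins at the most upstream ATG in its frame.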

3
ORF finder to BLAST
http://www.ncbi.nlm.nih.gov/gorf/gorf.html
4
Protein Structural Elements

Secondary (2°) Structural Elements
α-Helix
β-Sheet
Globular regions
Domains
SH2
Leucine Zipper

5
Protein function: different categories

Protein of known function
Protein of inferred function
Protein of unknown function

6
Protein of known function

Work already done
Ancillary databases (e.g. PubMed, OMIM, MGI, other organism-specific databases)
Warning: make sure it really is the SAME protein
First port of call: LocusLink/Entrez Gene

7
Human genes and OMIM
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
8
Mouse genes and MGI
http://www.informatics.jax.org/
9
And the list goes on.
10
Protein of inferred function

Similar to protein of known function
Annotated
BLAST
Paralogue (same species), orthologue (different species), or just similar
Make sure key residues are conserved, e.g. by pairwise or multiple alignment
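Checking that key residues are conserved can be illustrated with a textbook Needleman-Wunsch global alignment followed by a position lookup. The scoring (identity +1, mismatch -1, linear gap -2) is an arbitrary assumption chosen for clarity; BLAST and other real alignment tools use substitution matrices such as BLOSUM62 and affine gap penalties. The helper names are hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Scores are illustrative assumptions. Returns the two aligned
    strings with '-' marking gaps (one optimal alignment).
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def check_key_residues(aln_query: str, aln_subject: str,
                       positions: set[int]) -> dict[int, tuple[str, str, bool]]:
    """For 1-based residue numbers in the ungapped query, report
    (query residue, aligned subject residue or '-', identical?)."""
    report = {}
    residue_number = 0
    for q, s in zip(aln_query, aln_subject):
        if q == "-":
            continue  # gap in the query: no query residue at this column
        residue_number += 1
        if residue_number in positions:
            report[residue_number] = (q, s, q == s)
    return report


if __name__ == "__main__":
    q, s = global_align("ACDEFG", "ACDFG")
    print(q, s)  # ACDEFG ACD-FG
    print(check_key_residues(q, s, {4, 5}))  # {4: ('E', '-', False), 5: ('F', 'F', True)}
```

Here residue 4 of the query has no counterpart in the other sequence, which is exactly the kind of result that should make an inferred function suspect.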

11
Protein of inferred function
Human X Chr
12
Protein of Unknown Function

Not similar at the primary sequence level to a protein of known function
Can you predict function? So many caveats!
Transmembrane protein?
TMPred http://www.ch.embnet.org/software/TMPRED_form.html
Protein domains
Can infer function, e.g. homeobox
Warning: some domains are poorly and/or widely predicted
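Transmembrane predictors such as TMPred start from the observation that membrane-spanning segments are long hydrophobic stretches. A Kyte-Doolittle hydropathy scan shows the basic idea; it is not TMPred's algorithm. The 19-residue window and the 1.6 average-hydropathy cutoff are values commonly quoted for Kyte-Doolittle plots and are assumptions here, as are the function names.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
# Non-standard letters (X, B, Z, ...) will raise KeyError.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Average hydropathy of each full window (list index = window start)."""
    seq = seq.upper()
    return [sum(KD[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_tm_segments(seq: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge runs of windows at or above threshold into 0-based,
    half-open (start, end) residue spans. A rough screen only: it
    says nothing about membrane topology."""
    profile = hydropathy_profile(seq, window)
    spans, run_start = [], None
    for i, h in enumerate(profile):
        if h >= threshold and run_start is None:
            run_start = i
        elif h < threshold and run_start is not None:
            # last passing window started at i - 1, so it ends at i - 1 + window
            spans.append((run_start, i - 1 + window))
            run_start = None
    if run_start is not None:
        spans.append((run_start, len(profile) - 1 + window))
    return spans


if __name__ == "__main__":
    toy = "K" * 10 + "L" * 25 + "K" * 10  # made-up test sequence
    print(candidate_tm_segments(toy))  # [(5, 40)]
```

The reported span is wider than the 25-residue hydrophobic core because partially hydrophobic windows at the edges still clear the cutoff, one reason segment boundaries from such scans are approximate.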

13
Domains

Discrete structural units
Can infer boundaries from sequence analysis
25-500 residues long
Most < 200 residues
Domains of fewer than 50 residues are usually stabilized by disulfide (S-S) bonds or metal ions

14
Lipoxygenase Domain
> 500 residues
15
WW Domain
33 residues
16
Domain Determination

Internal duplications
Detect with a dotplot
Transmembrane segments
Hydrophobic, 15-35 residues
Segments easy to predict
Topology and multiple segments harder to predict
PHD, TMHMM, TMpred
Low complexity segments
Composition typically non-random
Non-compact folds: coiled coils, rods, flexible domain linkers
Complexity function (SEG)
Small-pitch overlapping repeats (XNU)
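Two of the domain-determination ideas above, a dotplot for internal duplications and a complexity score for low-complexity segments, can be sketched as follows. The complexity measure is plain Shannon entropy over a sliding window, a simplified stand-in and not the SEG algorithm itself. Window sizes, thresholds, and function names are arbitrary assumptions.

```python
import math
from collections import Counter


def dotplot(a: str, b: str, window: int = 5,
            min_matches: int = 4) -> set[tuple[int, int]]:
    """Windowed dotplot: (i, j) is a dot when a[i:i+window] and
    b[j:j+window] are identical at >= min_matches positions.
    Against itself, an internal duplication shows up as a diagonal
    parallel to the main diagonal."""
    dots = set()
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            same = sum(x == y for x, y in zip(a[i:i + window], b[j:j + window]))
            if same >= min_matches:
                dots.add((i, j))
    return dots


def shannon_entropy(s: str) -> float:
    """Compositional entropy in bits; 0 for a homopolymer."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def low_complexity_windows(seq: str, window: int = 12,
                           max_entropy: float = 2.2) -> list[int]:
    """Start positions of windows whose entropy is <= max_entropy
    (simplified stand-in for SEG; thresholds are arbitrary)."""
    return [i for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) <= max_entropy]


if __name__ == "__main__":
    repeat = "MKTAYIAKQR"
    seq = repeat + "GSGS" + repeat  # made-up internal duplication
    off_diagonal = sorted((i, j) for i, j in dotplot(seq, seq, 5, 5) if j > i)
    print(off_diagonal[:3])  # [(0, 14), (1, 15), (2, 16)]
    print(low_complexity_windows("ACDEFGHIKLMN" + "Q" * 14))  # [7, 8, 9, 10, 11, 12, 13, 14]
```

The off-diagonal run starting at (0, 14) is the duplicated block offset by 14 residues; the poly-Q tail is flagged once a window is dominated by Q.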

17
Protein sequence databases

Non-curated
TrEMBL: proteins automatically translated from coding sequences (CDS) in GenBank/EMBL/DDBJ
Entrez Protein http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Protein
Curated
Swiss-Prot: proteins identified with confidence, added to the database manually
UniProt (e.g. hosted at EBI: http://www.expasy.uniprot.org/index.shtml)

18
Proteins of Unknown function
Protein domain databases e.g. Interpro
http://www.ebi.ac.uk/interpro/index.html
19
Comparison of Protein Family DBs
Pfam
SMART
CDD
PROSITE
SRS
20

Conserved Domain Database (NCBI)
Linked into other NCBI resources
Includes Pfam and SMART domains (but does not necessarily give the same answer)

21
Proteins in Ensembl
22
Proteins in UCSC
23

HMM family profiles constructed by hand
Structural data in alignments
No hierarchy
No specific compositional bias
Good graphical output

24
Pfam-A and Pfam-B

Pfam-A (75)
Curated, annotated families
Pfam-B (19)
Families derived automatically from ProDom
Other

Protein fingerprint database (fingerprints are groups of conserved motifs that characterize a protein family)
Regular grammar for describing patterns (e.g. [EDQ]-x-G-x-[DN]-A-x-x-[GALI])
Profile search is sensitive, but low coverage (signaling)
Pattern search has high false positive rate
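A PROSITE-style pattern maps almost directly onto a regular expression. In this syntax `x` matches any residue, `[EDQ]` matches any one of the listed residues, `{P}` matches anything except P, `x(2,4)` is a repeat count, and `<`/`>` anchor the N- and C-terminus. The converter below is a minimal sketch covering only those rules (rarer cases such as `>` inside brackets are not handled), and `prosite_to_regex` is a hypothetical name.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern to a Python regular expression.

    Handles x (any residue), [ABC] (any of), {ABC} (none of),
    e(n) / e(n,m) repeats, and < / > terminal anchors only.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = repeat = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:  # e.g. x(2,4) -> .{2,4}
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"  # exclusion set
        else:
            core = element  # single residue or [ABC], already valid regex
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("[EDQ]-x-G-x-[DN]-A-x-x-[GALI]")
    print(regex)  # [EDQ].G.[DN]A..[GALI]
    hit = re.search(regex, "MMEKGSDATTLMM")  # made-up test sequence
    print(hit.start(), hit.group())  # 2 EKGSDATTL
```

Because a short pattern like this matches many unrelated sequences by chance, it also illustrates why pattern searches have a high false-positive rate.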

Highly conserved, ungapped MSAs
Derived from PROSITE

Fingerprints are sets of ungapped weight matrices
Hierarchical classification for important families
Families, domains, and proteins
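Scoring a sequence against an ungapped weight matrix, as in the ungapped MSAs and fingerprints described above, can be sketched as a log-odds position-specific scoring matrix. Assumptions: a uniform background of 1/20 per residue, a pseudocount of 1 per residue per column, and scores in bits; real databases use their own weighting schemes. The function names and the example block are hypothetical.

```python
import math
from typing import Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(block: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds (bits) scoring matrix from an ungapped block of equal-length
    segments. Assumes a uniform background; the pseudocount gives unseen
    residues a finite (negative) score."""
    width = len(block[0])
    if any(len(s) != width for s in block):
        raise ValueError("block segments must all have the same length (ungapped)")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(width):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for segment in block:
            counts[segment[col]] += 1  # non-standard residues raise KeyError
        total = sum(counts.values())
        pssm.append({aa: math.log2(counts[aa] / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def best_window(pssm: list[dict[str, float]], seq: str) -> Optional[tuple[int, float]]:
    """Return (start, score) of the highest-scoring ungapped window,
    or None if seq is shorter than the matrix."""
    width = len(pssm)
    best = None
    for i in range(len(seq) - width + 1):
        score = sum(column[aa] for column, aa in zip(pssm, seq[i:i + width]))
        if best is None or score > best[1]:
            best = (i, score)
    return best


if __name__ == "__main__":
    pssm = build_pssm(["GKST", "GKSS", "GRST"])  # made-up block
    print(best_window(pssm, "AAAGKSTAAA")[0])  # 3
```

Keeping the matrix ungapped is what makes this kind of scoring simple: each window is scored by summing one column lookup per position, with no alignment step.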

Simple Modular Architecture Research Tool
Collected by Ponting and Bork (641 HMMs)
Focuses on
Signaling Domains
Extracellular domains
Nuclear domains
High-quality graphics

29
Alignment of Representative Members
Profile HMM built with HMMER 2.0
Search Protein DB
Description
Full alignment
30

Profiles automatically built from PSI-BLAST alignments of Swiss-Prot
No annotation
As with other automated DBs (Pfam-B, DOMO), useful for seeing if a region appears in different contexts

31
Protein Sequence Analysis

Biochemical/biophysical properties
Secondary Structure
Super-secondary (signal peptides, domains, motifs)
3D prediction (Threading)

32
Amphipathic Helix
Edge Strand
Buried Strand
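An amphipathic helix can be quantified with a hydrophobic moment: each residue's hydrophobicity is treated as a vector rotated 100° further around the helix than the previous residue (about 3.6 residues per turn), and the length of the vector sum is large when hydrophobic residues cluster on one face. This sketch substitutes the Kyte-Doolittle scale for the Eisenberg consensus scale usually paired with this measure; the scale choice and function name are assumptions.

```python
import math

# Kyte-Doolittle hydropathy scale (assumption: swapped in for the
# Eisenberg consensus scale). Non-standard letters raise KeyError.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq: str, degrees_per_residue: float = 100.0) -> float:
    """Mean hydrophobic moment |sum_n h_n * exp(i * n * delta)| / N.

    100 degrees per residue models an ideal alpha-helix. High values
    mean hydrophobic residues cluster on one face (amphipathic helix).
    """
    delta = math.radians(degrees_per_residue)
    x = sum(KD[aa] * math.cos(n * delta) for n, aa in enumerate(seq))
    y = sum(KD[aa] * math.sin(n * delta) for n, aa in enumerate(seq))
    return math.hypot(x, y) / len(seq)


if __name__ == "__main__":
    # Uniformly hydrophobic: 18 residues x 100 deg = 5 full turns, vectors cancel.
    print(f"{hydrophobic_moment('L' * 18):.3f}")  # 0.000
    # Made-up peptide with L on one face of the helical wheel and K on the other.
    print(hydrophobic_moment("LKKLLKKLLKLLKKLLKK") > 1.0)  # True
```

The contrast between the two peptides is the point: both contain plenty of leucine, but only the one with a segregated hydrophobic face has a large moment.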
36
Viewing 3D Structures

Cn3D
Chime
RasMol
Protein Explorer

38
Protein of inferred or unknown function

All predictions must be taken as exactly that: PREDICTIONS!
The true function of a protein is NOT known until it has been proven in the lab
