Bioinformatics and Proteomics

1
Bioinformatics and Proteomics
  • March 28, 2003
  • NIH Proteomics Workshop
  • Bethesda, MD
  • Anastasia Nikolskaya, Ph.D.
  • Research Assistant Professor
  • Protein Information Resource
  • Department of Biochemistry and Molecular
    Biology
  • Georgetown University Medical Center

2
Overview
  • Role of Bioinformatics/Computational Biology in
    Proteomics Research
  • Genomics
  • Functional Annotation of Proteins
  • Classification of Proteins
  • Bioinformatics Databases and Analytical Tools
    (Dr. Yeh and Dr. Hu)
  • Sequence -> function

3
Functional Genomics/Proteomics
  • Proteomics studies biological systems based
    on global knowledge of genomes, transcriptomes,
    proteomes, metabolomes. Functional genomics
    studies biological functions of proteins,
    complexes, pathways based on the analysis of
    genome sequences. Includes functional
    assignments for protein sequences.
  • Genome: All the Genetic Material in the
    Chromosomes
  • Transcriptome: Entire Set of Gene Transcripts
  • Proteome: Entire Set of Proteins
  • Metabolome: Entire Set of Metabolites

4
Proteomics
  • Data: Gene Expression Profiling
    - Genome-Wide Analyses of Gene Expression
  • Data: Structural Genomics
    - Determine 3D Structures of All Protein
      Families
  • Data: Genome Projects (Sequencing)
    - Functional genomics
    - Knowing the complete genome sequences of a
      number of organisms is the basis of
      proteomics research

5
DNA Sequence -> Gene -> Protein
Sequence -> Function
6
Bioinformatics and Genomics/Proteomics
7
Most new proteins come from genome sequencing
projects
  • Mycoplasma genitalium - 484 proteins
  • Escherichia coli - 4,288 proteins
  • S. cerevisiae (yeast) - 5,932 proteins
  • C. elegans (worm) - 19,000 proteins
  • Homo sapiens - 40,000 proteins

... and have unknown functions
8
Advantages of knowing the complete genome
sequence
  • All encoded proteins can be predicted and
    identified
  • The missing functions can be identified and
    analyzed
  • Peculiarities and novelties in each organism can
    be studied
  • Predictions can be made and verified

9
The changing face of protein science
  • 20th century
  • Few well-studied proteins
  • Mostly globular with enzymatic activity
  • Biased protein set
  • 21st century
  • Many hypothetical proteins
  • Various, often with no enzymatic activity
  • Natural protein set

10
Properties of the natural protein set
  • Unexpected diversity of even common enzymes
    (analogous, paralogous, xenologous, etc.
    enzymes )
  • Conservation of the reaction chemistry, but not
    the substrate specificity
  • Functional diversity in closely related proteins
  • Abundance of new structures

11
Objectives of functional analysis for different
groups of proteins
  • Experimentally characterized
  • Best annotated protein database: SwissProt
  • Knowns: characterized by similarity (closely
    related to experimentally characterized)
  • Make sure the assignment is plausible
  • Function can be predicted
  • Extract maximum possible information
  • Avoid errors and overpredictions
  • Fill the gaps in metabolic pathways
  • Unknowns (conserved or unique)
  • Rank by importance

12

                              E. coli   M. jannaschii   S. cerevisiae   H. sapiens
Characterized experimentally     2046              97            3307        10189
Characterized by similarity      1083            1025            1055        10901
Unknown, conserved                285             211            1007         2723
Unknown, no similarity            874             411             966         7965
(from Koonin and Galperin, 2003, with modifications)
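
As a rough illustration of the table above, the counts can be turned into per-genome shares with a few lines of Python. This is only a sketch: the variable and function names are assumptions made for the example, and only the counts themselves come from the slide.

```python
# Counts copied from the slide (Koonin and Galperin, 2003, with modifications).
CATEGORIES = [
    "characterized experimentally",
    "characterized by similarity",
    "unknown, conserved",
    "unknown, no similarity",
]
COUNTS = {
    "E. coli": [2046, 1083, 285, 874],
    "M. jannaschii": [97, 1025, 211, 411],
    "S. cerevisiae": [3307, 1055, 1007, 966],
    "H. sapiens": [10189, 10901, 2723, 7965],
}


def category_shares(counts):
    """Return each category's share of the listed proteins, in percent."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


for organism, counts in COUNTS.items():
    shares = category_shares(counts)
    summary = ", ".join(
        f"{name}: {share:.1f}%" for name, share in zip(CATEGORIES, shares)
    )
    print(f"{organism} ({sum(counts)} proteins) -> {summary}")
```

The E. coli column adds up to the 4,288 proteins quoted earlier in the talk, which is a useful sanity check on the reconstructed table.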
13
Problems in functional assignments for knowns
  • Previous low quality annotations
    - misinterpreted experimental results
      (e.g. suppressors, cofactors)
    - biologically senseless annotations:
      Deinococcus: head morphogenesis protein
      Arabidopsis: separation anxiety protein-like
      Helicobacter: brute force protein
      Methanococcus: centromere-binding protein
      Plasmodium: frameshift
    - propagated mistakes of sequence comparison

14
Problems in functional assignments for knowns
  • Multi-domain organization of proteins

Histidine kinase
His kinase domain
Periplasmic sensor domain
Periplasmic sensor domain
Uncharacterized domain
15
Problems in functional assignments for knowns
  • Low sequence complexity (coiled-coil,
    non-globular regions)
  • Non-orthologous gene displacement
  • Enzyme evolution (divergence in sequence and
    function)
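
One common way to flag low-complexity regions is to measure the Shannon entropy of the residue composition in a sliding window. The sketch below illustrates that idea only; it is not the SEG algorithm or any specific tool's method, and the window size, entropy cutoff and toy sequence are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_regions(sequence, window=12, max_entropy=2.2):
    """Return merged (start, end) spans whose windows have low entropy.

    The window size and the cutoff are illustrative assumptions.
    """
    regions = []
    for start in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) <= max_entropy:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous span: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Toy sequence (hypothetical): a poly-Q stretch between two diverse flanks.
toy = "MKVLAAGIVGLTEHRDWQ" + "Q" * 15 + "PNFSYCMKVDEGHILT"
for start, end in low_complexity_regions(toy):
    print(start, end, toy[start:end])
```

Regions flagged this way are typically masked before database searches, because their biased composition produces hits that do not reflect homology.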

16
Enzyme recruitment: minor mutational changes
convert a glycerol kinase into gluconate kinase
Differences between gluconate and
glycerol/xylulose kinases
Leads to non-orthologous gene displacement
17
Objectives of functional analysis for different
groups of proteins
  • Experimentally characterized
  • Knowns: characterized by similarity (closely
    related to experimentally characterized)
  • Make sure the assignment is plausible
  • Function can be predicted
  • Extract maximum possible information
  • Avoid errors and overpredictions
  • Fill the gaps in metabolic pathways
  • Unknowns (conserved or unique)
  • Rank by importance

18
Functional Prediction: Dealing with
hypothetical proteins
  • Computational analysis
  • Sequence analysis of the new ORFs
  • Mutational analysis
  • Functional analysis
  • Expression profiling
  • Tracking of cellular localization
  • Structural analysis
  • Determination of the 3D structure

19
Structural Genomics
  • Protein Structure Initiative: Determine 3D
    Structures of All Proteins
  • Family Classification
  • Organize Protein Sequences into Families, collect
    families without known structures
  • Target Selection
  • Select Family Representatives as Targets
  • Structure Determination
  • X-Ray Crystallography or NMR Spectroscopy
  • Homology Modeling
  • Build Models for Other Proteins by Homology
  • Functional prediction based on structure

20
Structural Genomics: Structure-Based Functional
Assignments
Methanococcus jannaschii MJ0577 (Hypothetical
Protein)
Contains bound ATP => ATPase or
ATP-Mediated Molecular Switch
Confirmed by biochemical experiments
21
Crystal structure is not a function!
22
Improving functional assignments for unknowns
(Functional Prediction)
  • Detailed manual analysis of sequence
    similarities
  • Cluster analysis of protein families (family
    databases)
  • Use of sophisticated database searches
    (PSI-BLAST, HMM)

23
Using comparative genomics for protein analysis
  • Those amino acids that are conserved in
    divergent proteins (archaeal and bacterial,
    hyperthermophilic and mesophilic) are likely
    to be important for catalytic activity.
  • Comparative analysis allows us to find
    subtle sequence similarities in proteins that
    would not have been noticed otherwise
  • Prediction of the 3D fold and general function
    is much easier than prediction of exact
    biological (or biochemical) function.
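
The first point can be sketched in code: given a multiple alignment of divergent homologs, report the columns where a single residue is retained in every sequence. The alignment below is a made-up toy, and the function name and the `min_fraction` parameter are assumptions for illustration; real analyses would start from an alignment produced by a dedicated alignment program.

```python
def conserved_columns(alignment, min_fraction=1.0):
    """Return (column index, residue) for columns dominated by one residue.

    A column qualifies when its most frequent non-gap residue is present in
    at least `min_fraction` of all sequences (gaps count against it).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for i in range(length):
        column = [row[i] for row in alignment]
        residues = [r for r in column if r != "-"]
        if not residues:
            continue
        top = max(set(residues), key=residues.count)
        if residues.count(top) / len(column) >= min_fraction:
            hits.append((i, top))
    return hits


# Hypothetical toy alignment of four divergent sequences.
toy_alignment = [
    "MKTG-DSLH",
    "LRSGADTVH",
    "IQTG-DAIH",
    "VKAGSDPLH",
]
print(conserved_columns(toy_alignment))
```

In this toy case only the G, D and H columns survive, which is the kind of pattern that would point to candidate catalytic or binding residues.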

24
Using comparative genomics for protein analysis
  • For some reason, the reaction chemistry often
    remains conserved even when sequence diverges
    almost beyond recognition
  • Sequence database searches that use exotic or
    highly divergent query sequences often reveal
    more subtle relationships than those using
    queries from humans or standard model organisms
    (E. coli, yeast, worm, fly).
  • Sequence analysis complements structural
    comparisons and can greatly benefit from them

25
Poorly characterized protein families
  • Enzyme activity can be predicted, the substrate
    remains unknown (ATPases, GTPases,
    oxidoreductases, methyltransferases,
    acetyltransferases)
  • Helix-turn-helix motif proteins (predicted
    transcriptional regulators)
  • Membrane transporters

26
Improving functional assignments for unknowns
  • Phylogenetic distribution
  • Wide - most likely essential
  • Narrow - probably clade-specific
  • Patchy - most intriguing, niche-specific
  • Domain association: "Rosetta Stone" for
    multidomain proteins
  • Gene neighborhood (operon organization)
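
The wide / narrow / patchy distinction can be expressed as a simple rule over a presence/absence profile. The sketch below is an assumption-laden illustration: the clade table is a tiny hand-picked set of genomes, and the 80% cutoff for "wide" is an arbitrary choice, not a value from the talk.

```python
# Small illustrative genome set grouped by clade (an assumption for the demo).
CLADES = {
    "Proteobacteria": {"E. coli", "P. aeruginosa", "C. crescentus"},
    "Firmicutes": {"B. subtilis", "S. aureus"},
    "Archaea": {"M. jannaschii", "M. kandleri"},
}


def classify_distribution(present_in, clades=CLADES, wide_fraction=0.8):
    """Label a protein family's phylogenetic distribution.

    wide   : found in at least `wide_fraction` of all genomes
    narrow : found in a single clade only
    patchy : found in several clades but far from everywhere
    """
    all_genomes = set().union(*clades.values())
    present = set(present_in) & all_genomes
    if not present:
        return "absent"
    if len(present) / len(all_genomes) >= wide_fraction:
        return "wide"
    clades_hit = [name for name, members in clades.items() if present & members]
    if len(clades_hit) == 1:
        return "narrow"
    return "patchy"


print(classify_distribution({"M. jannaschii", "M. kandleri"}))
print(classify_distribution({"E. coli", "S. aureus", "M. kandleri"}))
```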

27
Using genome context for functional prediction
28
Problems in functional assignments/predictions
  • Identification of protein-coding regions
  • Delineation of potential function(s) for
    distant paralogs
  • Identification of domains in the absence of
    close homologs
  • Analysis of proteins with low sequence
    complexity

29
Unknown unknowns
  • Phylogenetic distribution
  • Wide - most likely essential
  • Narrow - probably clade-specific
  • Patchy - most intriguing, niche-specific

30
To deal with the ocean of new sequences, we need
a natural protein classification
Discovery of New Knowledge by Using Information
Embedded within Families of Homologous Sequences
and Their Structures
  • Protein families are real and reflect
    evolutionary relationships
  • Protein classification systems can be used to:
  • Improve sensitivity of protein identification
  • Provide new protein sequence annotation,
    simplifying the search for non-obvious
    relationships
  • Detect and correct genome annotation errors
    systematically
  • Drive other annotations (active site etc.)
  • Provide basis for evolution, genomics and
    proteomics research

31
The ideal system would be
  • Comprehensive, with each sequence classified
    either as a member of a family or as an orphan
    sequence, a family of one
  • Hierarchical, with families united into
    superfamilies on the basis of distant homology
  • Allow for simultaneous use of the whole protein
    and domain information (domains mapped onto
    proteins)
  • Allow for automatic classification/annotation of
    new sequences when these sequences are
    classifiable into the existing families
  • Expertly curated (family name, function, evidence
    attribution (experimental vs predicted),
    background etc). This is the only way to avoid
    annotation errors and prevent error propagation

32
The ideal system has yet to be created, but there
are several very useful systems
33
Levels of Protein Classification
Level | Example | Similarity | Evolution
Class | α/β | Composition of structural elements | No relationships
Fold | TIM-Barrel | Topology of folded backbone | Possible monophyly above and below
Superfamily | Aldolase | Recognizable sequence similarity (motifs); basic biochemistry | Monophyletic origin
Family | Class I Aldolase | High sequence similarity (alignments); biochemical properties | Evolution by ancient duplications
COG | 2-keto-3-deoxy-6-phosphogluconate aldolase | Orthology for a given set of species; biochemical activity; biological function | Origin traceable to a single gene in LCA
LSE | PA3131 and PA3181 | Paralogy within a lineage | Evolution by recent duplication and loss
34
Protein Evolution
  • Tree of Life Evolution of Protein Families
    (Dayhoff, 1978)
  • Can build a tree representing evolution of a
    protein family, based on sequences
  • Orthologous Gene Family: Organismal and Sequence
    Trees Match Well

35
Protein Evolution
  • Homolog
  • Common Ancestors
  • Common 3D Structure
  • Common Active Sites or Binding Domains
  • Ortholog
  • Derived from Speciation
  • Paralog
  • Derived from Duplication
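
The ortholog/paralog distinction can be illustrated with the globin example used on the following slides. The sketch below takes a deliberate shortcut: it assumes each protein has already been assigned to a subfamily (Myo, HbA, HbB), so two homologs from the same subfamily in different species are called orthologs and everything else paralogs. Real assignments come from comparing gene trees with the species tree, not from names; the function name is hypothetical.

```python
from itertools import combinations

# (subfamily, species) labels taken from the globin slides.
GLOBINS = [
    (gene, species)
    for species in ("Cod", "Frog", "Rat")
    for gene in ("Myo", "HbA", "HbB")
]


def relationship(a, b):
    """Classify two homologs under the simplifying assumption above."""
    (gene_a, species_a), (gene_b, species_b) = a, b
    if gene_a == gene_b and species_a != species_b:
        return "ortholog"  # separated by speciation
    return "paralog"       # separated by a duplication


pairs = {
    (a, b): relationship(a, b) for a, b in combinations(GLOBINS, 2)
}
for (a, b), label in pairs.items():
    print(f"{a[0]} ({a[1]}) vs {b[0]} ({b[1]}): {label}")
```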

36
Orthologs and Paralogs
Myo (Hagfish)
Hb (Hagfish)
HbA (Frog)
HbB (Frog)
Myo (Frog)
HbA (Cod)
HbB (Cod)
Myo (Cod)
HbA (Rat)
Myo (Rat)
HbB (Rat)
Amphibia
Mammalia
Teleostomi
Myxinidae
Tetrapoda
Vertebrata
Craniata
37
Orthologs and Paralogs
Myo (Hagfish)
Hb (Hagfish)
HbA (Frog)
HbB (Frog)
Myo (Frog)
HbA (Cod)
HbB (Cod)
Myo (Cod)
HbA (Rat)
Myo (Rat)
HbB (Rat)
COG myoglobins
COG hemoglobins
38
Orthologs and Paralogs
Myo (Hagfish)
Myo (Cod)
Orthologs (COG Myo)
Myo (Frog)
Myo (Rat)
Out-paralogs (globin family)
Hb (Hagfish)
HbA (Cod)
SubCOG
HbA (Frog)
Orthologs (COG Hb)
In-paralogs (LSE in Vertebrata)
HbA (Rat)
HbB (Cod)
SubCOG
HbB (Frog)
HbB (Rat)
39
Orthologs and Paralogs
Myo (Hagfish)
Hb (Hagfish)
HbA (Frog)
HbB (Frog)
Myo (Frog)
HbA (Cod)
HbB (Cod)
Myo (Cod)
HbA (Rat)
Myo (Rat)
HbB (Rat)
COG myoglobins
COG hemoglobins
COG hemoglobins A
40
Orthologs and Paralogs
Myo (Cod)
Orthologs (COG Myo)
Myo (Frog)
Myo (Rat)
HbA (Cod)
Out-paralogs (globin family)
Orthologs (COG HbA)
HbA (Frog)
HbA (Rat)
HbB (Cod)
Orthologs (COG HbB)
HbB (Frog)
HbB (Rat)
41
Levels of Protein Classification
Level | Example | Similarity | Evolution
Class | α/β | Composition of structural elements | No relationships
Fold | TIM-Barrel | Topology of folded backbone | Possible monophyly above and below
Superfamily | Aldolase | Recognizable sequence similarity (motifs); basic biochemistry | Monophyletic origin
Family | Class I Aldolase | High sequence similarity (alignments); biochemical properties | Evolution by ancient duplications
COG | 2-keto-3-deoxy-6-phosphogluconate aldolase | Orthology for a given set of species; biochemical activity; biological function | Origin traceable to a single gene in LCA
LSE | PA3131 and PA3181 | Paralogy within a lineage | Evolution by recent duplication and loss
42
Protein Family-Domain-Motif
  • Domain: Evolutionary/Functional/Structural Unit
  • Domain: a structurally compact, independently
    folding unit that forms a stable
    three-dimensional structure and shows a certain
    level of evolutionary conservation. Usually
    corresponds to an evolutionary unit.
  • A protein can consist of a single domain or
    multiple domains. Proteins have modular
    structure.
  • Motif: Conserved Functional/Structural Site

43
Protein Evolution: Sequence Change vs. Domain
Shuffling
44
Recent Domain Shuffling
SF006786
CM (AroQ type)
PDH
SF001501
CM (AroQ type)
SF001499
PDH
SF005547
ACT
PDH
SF001424
PDT
ACT
SF001500
PDT
ACT
CM (AroQ type)
45
Protein classification: proteins and domains
  • Option 1: classify domains
  • Take individual domain sequences, consider
    them as independently evolving units and build a
    classification system
  • Allows going all the way to the deepest possible
    level, the last point of traceable homology and
    common origin (fold)
  • Domain databases (Pfam, SMART, CDD) allow
    mapping domains onto a query sequence

46
Protein classification: proteins and domains
  • Option 2: classify full-length proteins
  • For multidomain proteins, does not allow
    going deep along the evolutionary tree
  • All proteins in a family will often have a common
    biological function, which is very convenient for
    annotation
  • Domains will be mapped onto protein families

47
Practical Classification of Proteins: Setting
Realistic Goals
We strive to reconstruct the natural
classification of proteins to the fullest
possible extent
48
Classification: current status
  • PIR Superfamilies
  • Proteins in PIR-PSD: 283,289
  • Proteins classified: 187,871
  • 2/3 of the PIR proteins
  • COGs
  • 70% of each microbial genome
  • 50% of each eukaryotic genome in 3-clade COGs
  • 20% (?) of each eukaryotic genome in LSEs

49
PIR Web Site (http://pir.georgetown.edu)
50
PIR Superfamily Concept
  • Whole (Full-Length) Proteins
  • Homeomorphic (Common Domain Architecture)
  • Monophyletic (Common Evolutionary Origin)
  • Hierarchical structure (Family and Superfamily)
  • Non-Overlapping placement within each level
  • PIR Superfamily vs. Other Concepts
  • Evolution: Superfamily hierarchy reflects
    orthology and paralogy
  • Structure: PIR superfamily generally corresponds
    to SCOP family
  • Domain: Domains are mapped onto the Superfamily
  • Motif: Motifs (functional/structural sites) are
    mapped onto the Superfamily
  • Function: a Superfamily may contain divergent
    functions

51
PIR Superfamilies
  • Created by automated clustering by identity
    with coverage-by-length requirements. Creation
    of new Superfamilies is an ongoing process.
  • Automated classification rules are refined by
    expert curation
  • Evolution rates are very different in
    different branches of the protein universe, so
    very different score cutoffs are needed
  • Verify/add members
  • Annotation (at level of orthology): Superfamily
    Name, Description, Bibliography
  • In some cases, more than one orthologous group
    will be included in a single Superfamily; these
    Superfamilies will often be very large and
    diverse
  • Depth of hierarchy will be different for
    single-domain and multidomain proteins
  • This is work in progress and will become
    available through PIR (iProClass) and InterPro

52
CM-Related Superfamilies
  • Chorismate Mutase (CM), AroQ class
  • SF001501: CM (Prokaryotic type), PF01817
  • SF001499: tyrA bifunctional enzyme (Prok),
    PF01817-PF02153
  • SF001500: pheA bifunctional enzyme (Prok),
    PF01817-PF00800
  • SF017318: CM (Eukaryotic type), Regulatory
    Dom-PF01817
  • Chorismate Mutase, AroH class
  • SF005965: CM, PF01817
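
The link between superfamilies and domain architectures can be sketched as two lookups: which superfamilies contain a given domain, and which superfamily matches a full architecture. The table below is restricted to the three prokaryotic AroQ-class entries listed above; the function names are assumptions for this example, and exact matching on the ordered Pfam list is a simplification of real classification.

```python
# Domain architectures (ordered Pfam accessions) from the slide above.
ARCHITECTURES = {
    "SF001501": ("PF01817",),
    "SF001499": ("PF01817", "PF02153"),
    "SF001500": ("PF01817", "PF00800"),
}


def superfamilies_with_domain(domain, architectures=ARCHITECTURES):
    """List the superfamilies whose architecture contains the domain."""
    return sorted(sf for sf, arch in architectures.items() if domain in arch)


def assign_superfamily(domains, architectures=ARCHITECTURES):
    """Return the superfamily with exactly this architecture, or None."""
    for sf, arch in architectures.items():
        if tuple(domains) == arch:
            return sf
    return None


print(superfamilies_with_domain("PF01817"))
print(assign_superfamily(["PF01817", "PF00800"]))
```

The first lookup works at the domain level and returns all three entries, while the second works on whole proteins and separates them, which is the trade-off between the two classification options discussed earlier.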

53
iProClass Superfamily Report (I)
54
iProClass Superfamily Report (II)
55
InterPro
  • InterPro is an integrated resource for protein
    families, domains and sites.
  • - InterPro combines a number of databases that
    use different methodologies. By uniting the
    member databases, InterPro capitalizes on their
    individual strengths, producing a powerful
    integrated diagnostic tool.
  • Member databases: PROSITE, PRINTS, Pfam, SMART,
    ProDom, and TIGRFAMs
  • PIR to be added soon
  • SWISSPROT and TrEMBL matches used as examples

56
InterPro Entry
  • InterPro Entry Type: defines the entry as a
    Family, Domain, Repeat, or Post-translational
    modification site (other sites to be added:
    binding site, active site)
  • Family: protein family. PIR SFs will generally
    belong to this type
  • "Contains" field: lists domains within this
    protein
  • "Found in": for domain entries, lists
    families which contain this domain
57
PIR Superfamilies are being integrated into
InterPro
InterPro Entry Type: Family
SF001500: Bifunctional chorismate mutase /
prephenate dehydratase (P-protein)
58
COGs: Clusters of Orthologous Groups
  • Comparative genomics: a branch of computational
    biology that uses complete genome sequences
  • COG construction: complete genomes, reciprocal
    best hits, no score cutoffs
59
Construction of COGs
Genome 2
Genome 1
60
Construction of COGs
Yeast YLR377c
Bidirectional best hit
Triangle - the simplest COG
Bidirectional best hit
E. coli fbp
Bidirectional best hit
Synechocystis slr0952
61
Construction of COGs
Merge triangles
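
The construction steps on these slides (bidirectional best hits, triangles, merging) can be sketched as follows. This is a simplified illustration, not the actual COG procedure: the similarity scores are invented, the fourth genome and the yeast paralog are hypothetical, and clusters are merged whenever they share two members as a stand-in for sharing a triangle side.

```python
from itertools import combinations


def best_hits(scores):
    """Map (query, other genome) to the top-scoring protein in that genome."""
    best = {}
    for (query, subject), score in scores.items():
        if query[0] == subject[0]:
            continue  # ignore hits within the same genome
        key = (query, subject[0])
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {key: hit for key, (hit, _) in best.items()}


def bidirectional_best_hits(scores):
    """Pairs of proteins that are each other's best hit across genomes."""
    best = best_hits(scores)
    return {
        frozenset((query, hit))
        for (query, _), hit in best.items()
        if best.get((hit, query[0])) == query
    }


def triangles(bbh):
    """Triples of proteins linked pairwise by bidirectional best hits."""
    proteins = sorted({p for pair in bbh for p in pair})
    return [
        frozenset(trio)
        for trio in combinations(proteins, 3)
        if all(frozenset(edge) in bbh for edge in combinations(trio, 2))
    ]


def merge_triangles(tris):
    """Merge triangles that share a side (two members) into clusters."""
    clusters = []
    for tri in tris:
        tri = set(tri)
        for cluster in [c for c in clusters if len(c & tri) >= 2]:
            tri |= cluster
            clusters.remove(cluster)
        clusters.append(tri)
    return clusters


# Proteins are (genome, name). Y, E, S are from the slide; X and P are
# hypothetical, and every score below is made up for the demonstration.
Y, E, S = ("yeast", "YLR377c"), ("ecoli", "fbp"), ("syn", "slr0952")
X, P = ("genomeX", "protX"), ("yeast", "paralog")
SCORES = {}
for a, b, score in [(Y, E, 300), (Y, S, 280), (E, S, 320),
                    (X, E, 250), (X, S, 240), (P, E, 120), (P, S, 110)]:
    SCORES[(a, b)] = score
    SCORES[(b, a)] = score

print(merge_triangles(triangles(bidirectional_best_hits(SCORES))))
```

Note that no score cutoff appears anywhere: membership depends only on the rank of the hits, which is the property highlighted on the COG overview slide.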
62
Construction of COGs: Add all homologs
New protein
?
Yeast YLR377c
E. coli fbp
Synechocystis slr0952
66
In COGs, the dilemma between the depth of
analysis and protein integrity is approached by
keeping proteins intact whenever possible, and
dividing into modules (single- or multidomain)
when necessary
67
Case Study 1: Prediction verified (GGDEF domain)
  • Proteins containing this domain: Caulobacter
    crescentus PleD controls swarmer cell - stalk
    cell transition (Hecht and Newton, 1995). In
    Rhizobium leguminosarum and Acetobacter xylinum,
    required for cellulose biosynthesis (regulation)
  • Predicted to be involved in signal transduction
    because it is found in fusions with other
    signaling domains (receiver, etc)
  • In Acetobacter xylinum, cyclic di-GMP is a
    specific nucleotide regulator of cellulose
    synthase (signalling molecule). Multidomain
    protein with GGDEF domain was shown to have
    diguanylate cyclase activity (Tal et al., 1998)
  • Detailed sequence analysis tentatively predicts
    GGDEF to be a diguanylate cyclase domain (Pei and
    Grishin, 2001)
  • Complementation experiments prove diguanylate
    cyclase activity of GGDEF (Ausmees et al., 2001)

68
Case study 2: Defining a novel domain family
Prokaryotic Response Regulators (RRs)
Variable - DNA-binding - Enzymatic
CheY-like receiver
Output
What if domain is not described yet?
CheY receiver
PSI-BLAST with C-terminal portion alone
69
Two Groups of Unusual RRs: Receiver-X
SF006198, COG3279
  • 1. AlgR-related
  • Pseudomonas aeruginosa (AlgR): alginate
    biosynthesis
  • Klebsiella pneumoniae (MrkE): formation of
    adhesive fimbriae
  • Clostridium perfringens (VirR): virulence factors
  • 2. Regulators of autoinduced
    peptide-controlled regulons
  • Staphylococcus aureus (AgrA): virulence factors
  • Lactobacillus plantarum (PlnC, PlnD): bacteriocin
    production
  • Streptococcus pneumoniae (ComE): competence
  • Properties of the CheY-LytTR transcriptional
    regulators
  • Regulate secreted and extracellular factors
  • Often regulate their own expression
  • Bind to imperfect direct repeat sites in the
    -80 to -40 region (or in UAS)
  • Can be phosphorylated by His kinases, but form
    operons with HisK-type sensor ATPases
  • Contain a conserved LytTR-type DNA-binding domain

70
LytTR - a new DNA-binding domain not similar to
HTH, winged helix, or ribbon-helix-helix
DNA-binding domains
71
Domain organization of LytTR proteins other than
CheY-LytTR
  • Stand-alone LytTR: Streptococcus pneumoniae
    BlpS; Pseudomonas phage D3 Orf50
  • 40aa - LytTR: Lactococcus lactis L121252;
    Listeria monocytogenes Lmo0984;
    Staphylococcus aureus SA2153;
    Streptococcus pneumoniae SP0161
  • ABC - LytTR: Bacillus halodurans BH3894
  • MHYT - LytTR: Oligotropha carboxidovorans CoxC,
    CoxH
  • 3TM - LytTR: Xanthomonas campestris RpfD;
    Caulobacter crescentus CC1610;
    Mesorhizobium loti mll0891
  • 3TM - LytTR: Caulobacter crescentus CC0295
  • 4TM - LytTR: Caulobacter crescentus CC0330,
    CC3036
  • 8TM - LytTR: Caulobacter crescentus CC0551
  • PAS - LytTR: Burkholderia cepacia; Geobacter
    sulfurreducens

72
Consensus binding site for the LytTR domains

73
Predicted LytTR-regulated genes
  • Expected
  • Bacillus subtilis: natAB (Na-ATPase)
  • Oligotropha carboxidovorans: comC, comH (CO
    growth)
  • Staphylococcus aureus: lrgAB (autolysis)
  • Streptococcus pneumoniae: hld (hemolysin delta)
  • Unexpected
  • Bacillus subtilis: alr, dinB, rapI, veg,
    ybaJ, ybbI, yceA, ydbS, ydjL, yebB,
    yfiV, ykuA
  • Staphylococcus aureus: capO, coa, hsdR, SA0096,
    SA0257, SA0285, SA0302, SA0357, SA0358,
    SA0513

74
Impact of genomics
  • Single protein level
  • Discovery of new enzymes and superfamilies
  • Prediction of active sites and 3D structures
  • Pathway level
  • Identification of missing enzymes
  • Prediction of alternative enzyme forms
  • Identification of potential drug targets
  • Cellular metabolism level
  • Multisubunit protein systems
  • Membrane energy transducers
  • Cellular signaling systems

75
Examples for analysis
  • 1. Retrieve one of the following protein
    sequences:
  • PIR: C69086, D64376; GenBank: GI:15679635.
    Using analysis tools available on the web,
    check whether the functional annotation is
    correct, and provide the correct annotation
    without looking at internal PIR or COG
    annotations (run BLAST with CD-search and
    SMART to start with).
  • When you are done, look at the PIR curated SF
    annotation (still at the internal site only):
    http://pir.georgetown.edu/test-cgi/sf/pirclassif.pl?id=SF006549
  • Compare it with the original automatic SF
    annotation at the public site:
    http://pir.georgetown.edu/cgi-bin/ipcSF1?id=SF006549
  • Also look at the COG annotations. What caused
    the wrong annotations? In the BLAST outputs for
    these sequences, do you see other wrongly
    annotated proteins?
  • Next, analyze the C-terminal domain of these
    proteins by PSI-BLAST (and alignment analysis)
    and suggest any speculations as to its function
    (homework).

76
Examples for analysis
  • 2.
  • Retrieve the following sequence: GI:7019521
  • Take a look at the associated publication
    (reference).
  • Analyze the sequence to see if any additional
    information can be obtained (run PSI-BLAST and,
    as homework, construct a multiple alignment).
  • Take a look at the taxonomy report: what does
    it tell you?
  • Find an experimental paper associated with one
    of the sequences found by PSI-BLAST. What
    annotation is appropriate for this sequence and
    for the entire family?

77
Examples for analysis
  • 3.
  • Predict the function of the following proteins:
  • GenBank: GI:27716853
  • E. coli YjeE protein
  • Verify and/or correct the following functional
    annotations. Can you explain why the erroneous
    annotations were made?
  • PIR: H87387
  • GenBank: GI:15606003, GI:15807219
  • PIR: F70338

78
Examples for analysis
  • 4. Homework: an exercise in transitive
    relationships. Start with
    >gi|20093648|ref|NP_613495.1| Uncharacterized
    membrane protein, conserved in Archaea
    [Methanopyrus kandleri AV19]
    (this is a short membrane protein). Run
    PSI-BLAST, making sure you have filtering,
    complexity and CD-search off. There are no good
    hits, but there is a bunch of sub-threshold
    ones. Collect "suspect" relations, use them as
    queries and expand the net. You will be able to
    come up with two proteins:
    >gi|21227474|ref|NP_633396.1| hypothetical
    protein [Methanosarcina mazei Goe1]
    and
    >gi|14324537|dbj|BAB59464.1| hypothetical
    protein [Thermoplasma volcanium]
    When used as a PSI-BLAST query, the first will
    tie the Methanopyrus protein into a group,
    while the second will tie this group to the
    Sec61 subunit of preprotein translocase. Then,
    of course, you can obtain the same result with
    CD-search in a single step.
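
The "expand the net" strategy in this exercise amounts to a breadth-first walk over a graph of weak hits. The sketch below shows only that bookkeeping: the `HITS` table is a hand-written stand-in that mirrors the chain described in the exercise, not real PSI-BLAST output, and the function name is an assumption.

```python
from collections import deque


def transitive_path(hits, start, target):
    """Breadth-first search over a hit graph.

    hits maps each query to the proteins it retrieves (including
    sub-threshold "suspect" hits). Returns the shortest chain of
    queries linking start to target, or None if there is none.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in hits.get(path[-1], ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Illustrative hit graph following the chain described in the exercise.
HITS = {
    "M. kandleri NP_613495.1": ["M. mazei NP_633396.1"],
    "M. mazei NP_633396.1": ["M. kandleri NP_613495.1",
                             "T. volcanium BAB59464.1"],
    "T. volcanium BAB59464.1": ["M. mazei NP_633396.1", "Sec61 subunit"],
}

chain = transitive_path(HITS, "M. kandleri NP_613495.1", "Sec61 subunit")
print(" -> ".join(chain))
```

Each link in such a chain is weak on its own, so every intermediate hit still needs to be checked by alignment before the transitive conclusion is trusted.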