1
Pattern databases in protein analysis
  • Arthur Gruber

Instituto de Ciências Biomédicas, Universidade de São Paulo
AG-ICB-USP
2
Protein databases
  • GenPept: protein sequence database translated
    from GenBank
  • UniProtKB/TrEMBL is a computer-annotated
    protein sequence database complementing the
    UniProtKB/Swiss-Prot Protein Knowledgebase.
  • UniProtKB/Swiss-Prot is a curated protein
    sequence database that provides a high level of
    annotation, a minimal level of redundancy and a
    high level of integration with other databases.

3
How to assign protein functions?
  • Similar proteins may share common functions, but
    proteins that share common domains may have
    evolved to perform distinct functions
  • Proteins that exert similar functions may share
    common domains, but domain sequences are not
    always very similar: more refined methods are
    required than simple similarity searches
  • Proteins may share common domains but have
    different architectures: no single domain is
    necessarily responsible for protein function. Many
    proteins use multiple domains to perform their
    activities

4
Some conclusions
  • Similarity searches may reveal proteins that
    share very similar sequences and functions:
    high similarity over the full length of the query
    sequence
  • An output with no significant hits, or with hits
    only to unannotated proteins, will not reveal the
    possible function of the query protein
  • Similarity searches do not differentiate
    orthologues from paralogues
  • When matching multidomain proteins, it may not be
    appropriate to transfer the functional annotation:
    the context is important!

5
So what do proteins with similar function have in
common?
6
residues, motifs, domains, architecture
7
Pattern databases
  • Databases that contain patterns of residue
    conservation within groups of related sequences
  • There are several methods to determine patterns
  • There are many different pattern databases

9
Common protein pattern databases
  • PROSITE patterns: regular expressions
  • PROSITE profiles: weight matrices (profiles)
  • Pfam: database of protein domain families.
    Contains curated multiple sequence alignments for
    each family and corresponding HMMs
  • PRINTS: database of groups of motifs that, taken
    together, are more powerful for assigning
    protein function than single motifs
  • ProDom: automatically generated database based on
    recursive use of PSI-BLAST similarity searches
  • InterPro: an integrated database that combines
    different protein signature recognition methods
    in a single resource
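
To make the "regular expressions" entry concrete, the sketch below translates a pattern written in PROSITE syntax into a Python regular expression and scans a sequence with it. The function name `prosite_to_regex`, the example pattern (modelled on a C2H2 zinc finger style signature; check PROSITE for the authoritative entry) and the toy sequence are illustrative assumptions, and only the basic syntax elements are handled (`x`, `[...]`, `{...}`, repeat counts, `<` and `>` anchors).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # PROSITE patterns may end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):    # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4)
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":              # x means any residue
            core = "."
        elif core.startswith("{"):   # {ABC} means any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# Hypothetical example pattern and toy sequence, for illustration only
example_pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
toy_sequence = "MK" + "CAAC" + "AAA" + "L" + "A" * 8 + "HAAAH" + "GG"

regex = re.compile(prosite_to_regex(example_pattern))
for hit in regex.finditer(toy_sequence):
    print(hit.start(), hit.end(), hit.group())
```

A pattern of this kind either matches or does not match; it carries no score, which is one reason profiles and HMMs are listed as separate methods.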

12
How to start building a pattern database?
  • With multiple sequence alignments of functionally
    related proteins
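
As a rough illustration of that starting point, the sketch below derives a PROSITE-style consensus pattern from the columns of a small ungapped alignment. The function name `alignment_to_pattern`, the `max_alternatives` threshold and the toy alignment are assumptions made for this example; curated pattern databases use considerably more careful procedures.

```python
def alignment_to_pattern(alignment, max_alternatives=3):
    """Derive a PROSITE-style pattern from an ungapped alignment.

    A fully conserved column becomes a single letter, a column with a few
    alternatives becomes a bracketed set, and anything more variable
    becomes the wildcard x.
    """
    if len({len(sequence) for sequence in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    elements = []
    for column in zip(*alignment):  # iterate over alignment columns
        residues = sorted(set(column))
        if len(residues) == 1:
            elements.append(residues[0])
        elif len(residues) <= max_alternatives:
            elements.append("[" + "".join(residues) + "]")
        else:
            elements.append("x")
    return "-".join(elements)


# Toy alignment of a hypothetical conserved region
toy_alignment = ["CAGHL", "CSGHI", "CTGHV", "CKGHL"]
print(alignment_to_pattern(toy_alignment))
```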

13
Some definitions
  • Protein motif: a single conserved region
  • PROSITE pattern: a consensus expression of a
    conserved region
  • Frequency matrices (PRINTS): matrices that
    contain the frequencies with which residues occur
    in a given motif
  • PSSMs, position-specific score (weight) matrices
    (BLOCKS): add a scoring scheme to the frequency
    matrices
  • Profile HMMs: probabilistic models derived from
    alignment profiles
  • Protein domain: a part of protein sequence
    and structure that can evolve, function, and
    exist independently of the rest of the protein
    chain
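
The step from a frequency matrix to a position-specific scoring matrix can be sketched as follows. This is a minimal illustration, not the procedure used by PRINTS or BLOCKS: the pseudocount of 1, the uniform background frequency of 1/20, the log2 odds scoring and all function names are assumptions chosen to keep the example short.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_matrix(motif_sequences, pseudocount=1.0):
    """Per-position residue frequencies for an ungapped motif alignment."""
    length = len(motif_sequences[0])
    total = len(motif_sequences) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for position in range(length):
        counts = Counter(seq[position] for seq in motif_sequences)
        # The pseudocount (an assumption here) avoids zero frequencies
        matrix.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return matrix


def log_odds_matrix(frequencies, background=1.0 / len(AMINO_ACIDS)):
    """Add a scoring scheme: log2 of observed over background frequency."""
    return [
        {aa: math.log2(p / background) for aa, p in column.items()}
        for column in frequencies
    ]


def score_window(window, scoring_matrix):
    """Sum the position-specific scores of a window of the motif's length."""
    return sum(column[aa] for aa, column in zip(window, scoring_matrix))


# Toy motif alignment, for illustration only
toy_motif = ["CAGHL", "CSGHI", "CTGHV", "CKGHL"]
scores = log_odds_matrix(frequency_matrix(toy_motif))
print(score_window("CAGHL", scores), score_window("WWWWW", scores))
```

Unlike a regular expression, a matrix of this kind gives every candidate window a graded score, so near matches can be ranked rather than simply rejected.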
