protein - PowerPoint PPT Presentation

About This Presentation
Title:

protein


Slides: 42
Provided by: est3

Transcript and Presenter's Notes

Title: protein


1
Predicting Protein Function
protein
RNA
DNA
2
Biochemical function (molecular function)
What does it do? Is it a kinase? A ligase?
Page 245
3
Function based on ligand binding specificity
What (or whom) does it bind?
Page 245
4
Function based on biological process
What is it good for? Amino acid metabolism?
Page 245
5
Function based on cellular location
DNA
RNA
Where is it active? In the nucleolus? In the cytoplasm?
Page 245
6
Function based on cellular location
DNA
RNA
Where is the protein expressed? Brain? Testis?
Where is it under-expressed?
Page 245
7
GO (Gene Ontology): http://www.geneontology.org/
  • The GO project aims to develop three
    structured, controlled vocabularies (ontologies)
    that describe gene products in terms of their
    associated:
      • molecular functions (F)
      • biological processes (P)
      • cellular components (C)

An ontology is a description of the concepts and
relationships that can exist for an agent or a
community of agents.
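The relationships between GO terms form a graph: each term points to broader terms through relations such as is_a and part_of, so a gene product annotated with a specific term is implicitly annotated with all of that term's ancestors. The sketch below illustrates that idea on a tiny, hand-made hierarchy. The terms and edges are a hypothetical toy loosely modeled on GO, not the real GO graph, and the function names are illustrative.

```python
from collections import deque

# Hypothetical toy ontology (NOT the real GO graph).
# Each edge is (child term, relationship, parent term).
EDGES = [
    ("protein kinase activity", "is_a", "kinase activity"),
    ("kinase activity", "is_a", "catalytic activity"),
    ("catalytic activity", "is_a", "molecular_function"),
    ("nucleolus", "part_of", "nucleus"),
    ("nucleus", "is_a", "organelle"),
    ("organelle", "is_a", "cellular_component"),
]


def build_parents(edges):
    """Map each term to the list of (relationship, parent) pairs it points to."""
    parents = {}
    for child, relation, parent in edges:
        parents.setdefault(child, []).append((relation, parent))
    return parents


def ancestors(term, parents):
    """Return every term reachable upward from `term` (breadth-first search)."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def propagate(annotations, parents):
    """Expand a gene product's direct annotations with all of their ancestor terms."""
    expanded = set(annotations)
    for term in annotations:
        expanded |= ancestors(term, parents)
    return expanded


if __name__ == "__main__":
    parents = build_parents(EDGES)
    print(sorted(propagate({"protein kinase activity", "nucleolus"}, parents)))
```

Because a term may have several parents, the graph is a DAG rather than a tree; the breadth-first search with a `seen` set visits each ancestor once either way.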
8
Inferring protein function: the bioinformatics
approach
  • Based on homology
  • Based on functional characteristics
  • protein signature

9
Homologous proteins
  • Rule of thumb: two proteins can be inferred to be
    homologous if they are more than 25% identical
    over an aligned length of more than 100 residues
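The rule of thumb can be checked mechanically once two sequences are aligned. A minimal sketch, assuming the alignment is given as two equal-length strings with `-` for gaps, and assuming identity is computed over gap-free columns only (other conventions divide by the full alignment length or by the shorter sequence, which gives different numbers). The function names are hypothetical.

```python
def aligned_pairs(aligned_a: str, aligned_b: str):
    """Return the alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of gap-free columns in which both residues are identical."""
    pairs = aligned_pairs(aligned_a, aligned_b)
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def passes_rule_of_thumb(aligned_a: str, aligned_b: str,
                         min_identity: float = 25.0, min_length: int = 100) -> bool:
    """True if the gap-free aligned length exceeds min_length AND identity exceeds min_identity."""
    return (len(aligned_pairs(aligned_a, aligned_b)) > min_length
            and percent_identity(aligned_a, aligned_b) > min_identity)


if __name__ == "__main__":
    print(percent_identity("GTWYEIARFD", "GKWYEVARLD"))  # 70.0
    # Only 10 aligned columns, so the length condition fails regardless of identity.
    print(passes_rule_of_thumb("GTWYEIARFD", "GKWYEVARLD"))  # False
```

The length condition matters: short alignments can reach high identity by chance, which is why the rule requires a long aligned region.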

10

Homologous proteins: proteins with a common
evolutionary origin
Orthologs - proteins in different species that
diverged from a common ancestor by speciation.
Example: human hemoglobin vs. mouse hemoglobin
Paralogs - proteins encoded within a given
species that arose from one or more gene
duplication events.
Example: human hemoglobin vs. human myoglobin
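One way to make the distinction operational is to look at a gene tree whose internal nodes are labeled as speciation or duplication events: two genes are orthologs if their last common ancestor is a speciation node, and paralogs if it is a duplication node. The tree below is a hypothetical toy built around the hemoglobin/myoglobin example, not a real globin phylogeny.

```python
# Hypothetical toy gene tree: child -> parent.
PARENT = {
    "Hb_human": "S1", "Hb_mouse": "S1",   # S1: speciation splitting hemoglobin
    "Mb_human": "S2", "Mb_mouse": "S2",   # S2: speciation splitting myoglobin
    "S1": "D0", "S2": "D0",               # D0: the gene duplication
}
EVENT = {"S1": "speciation", "S2": "speciation", "D0": "duplication"}


def path_to_root(node, parent):
    """List the node followed by all of its ancestors up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def last_common_ancestor(a, b, parent):
    """First node on a's path to the root that is also on b's path."""
    ancestors_b = set(path_to_root(b, parent))
    for node in path_to_root(a, parent):
        if node in ancestors_b:
            return node
    raise ValueError("genes are not in the same tree")


def relationship(a, b, parent=PARENT, event=EVENT):
    """Classify a gene pair by the event at their last common ancestor."""
    lca = last_common_ancestor(a, b, parent)
    return "orthologs" if event[lca] == "speciation" else "paralogs"


if __name__ == "__main__":
    print(relationship("Hb_human", "Hb_mouse"))  # orthologs
    print(relationship("Hb_human", "Mb_human"))  # paralogs
```

Under this event-based definition, mouse hemoglobin and human myoglobin are also paralogs, because they trace back to the duplication, not to a speciation.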
11
COGs: Clusters of Orthologous Groups of proteins
  • Each COG consists of individual orthologous
    proteins or orthologous sets of paralogs.
  • Orthologs typically have the same function,
    allowing transfer of functional information from
    one member to an entire COG.

Reference: Classification of conserved genes
according to their homologous relationships
(Koonin et al., NAR)
DATABASE
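The idea of transferring functional information to an entire COG can be sketched as a lookup over cluster membership. The COG identifiers, protein names and function labels below are hypothetical placeholders. As an added, conservative assumption, the sketch only transfers a function when all annotated members of a COG agree.

```python
def transfer_annotations(cogs, known):
    """Propagate a function to unannotated COG members.

    cogs:  {cog_id: [protein, ...]}
    known: {protein: function} for experimentally characterized proteins
    Returns {protein: predicted_function} for unannotated proteins only.
    """
    predicted = {}
    for _cog_id, members in cogs.items():
        functions = {known[m] for m in members if m in known}
        # Skip COGs with no annotated member or with conflicting annotations.
        if len(functions) != 1:
            continue
        function = next(iter(functions))
        for member in members:
            if member not in known:
                predicted[member] = function
    return predicted


if __name__ == "__main__":
    cogs = {"COG_A": ["p1", "p2", "p3"], "COG_B": ["p4", "p5"]}
    known = {"p1": "kinase", "p4": "ligase", "p5": "kinase"}
    print(transfer_annotations(cogs, known))  # {'p2': 'kinase', 'p3': 'kinase'}
```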
12
Inferring protein function based on the protein
signature
13
The Protein Signature
  • Expression pattern
      • Where is it expressed?
  • Motif (or fingerprint)
      • a short, conserved region of a protein
      • typically 10 to 20 contiguous amino acid
        residues
  • Domain
      • a region of a protein that can adopt a
        3-dimensional structure

14
Protein Motifs
Protein motifs can be represented as a consensus
or a profile.
Alignment, positions 1-50:
ecblc  MRLLPLVAAA TAAFLVVACS SPTPPRGVTV VNNFDAKRYL GTWYEIARFD
vc     MRAIFLILCS V...LLNGCL G..MPESVKP VSDFELNNYL GKWYEVARLD
hsrbp     MKWVWAL LLLAAWAAAE RDCRVSSFRV KENFDKARFS GTWYAMAKKD
Consensus: GTWYEI, with alternatives K (position 2),
A (position 5), and V or M (position 6)
Pattern: G-x-W-[YF]-[EA]-[IVLM]
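The difference between a consensus and a profile can be shown on the conserved last block of the alignment (GTWYEIARFD, GKWYEVARLD, GTWYAMAKKD). The consensus keeps only the most frequent residue per column; the profile keeps a score for every residue at every position. This is a minimal sketch: the pseudocount of 1 and the uniform background of 1/20 per amino acid are assumptions chosen for simplicity, not values from the slides.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Conserved block taken from the alignment (ecblc, vc, hsrbp).
MOTIF_INSTANCES = ["GTWYEIARFD", "GKWYEVARLD", "GTWYAMAKKD"]


def consensus(aligned):
    """Most frequent residue per column (ties go to the residue seen first)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def log_odds_profile(aligned, pseudocount=1.0, background=0.05):
    """Position-specific log2-odds scores against an assumed uniform background."""
    n = len(aligned)
    denominator = n + pseudocount * len(AMINO_ACIDS)
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / denominator / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(window, profile):
    """Sum of per-position log-odds for a window as long as the profile."""
    if len(window) != len(profile):
        raise ValueError("window and profile lengths differ")
    return sum(column[aa] for column, aa in zip(profile, window))


if __name__ == "__main__":
    profile = log_odds_profile(MOTIF_INSTANCES)
    print(consensus(MOTIF_INSTANCES))        # GTWYEIARFD
    print(score("GTWYEIARFD", profile))      # motif-like window: positive score
    print(score("MRLLPLVAAA", profile))      # unrelated window: negative score
```

The profile still rewards GKWYEVARLD (K at position 2, V at position 6), whereas a strict comparison against the consensus string would count those positions as mismatches.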
15
Searching for Protein Motifs
- ProSite: a database of protein patterns, described
either as regular-expression-like patterns or as
sequence profiles, against which a query can be searched.
- PHI-BLAST: searches for a specific protein sequence
pattern, with local alignments surrounding the match.
- MEME: searches for common motifs in unaligned
sequences.
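PROSITE-style patterns map closely onto regular expressions: elements are separated by `-`, `x` matches any residue, `[..]` lists allowed residues, `{..}` lists forbidden ones, `(n)` or `(n,m)` gives a repeat count, `<` and `>` anchor the N- and C-terminus, and the pattern ends with a period. The converter below is a sketch covering only those constructs (for example, it does not handle `>` inside brackets). The example pattern G-x-W-[YF]-[EA]-[IVLM] is our reading of the motif slide, not a PROSITE database entry.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix, suffix, count = "", "", ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        repeat = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if repeat:                           # e.g. x(2,4) -> .{2,4}
            element, count = repeat.group(1), "{" + repeat.group(2) + "}"
        if element in ("x", "X"):
            core = "."
        elif element.startswith("{"):        # forbidden residues
            core = "[^" + element[1:-1] + "]"
        else:                                # single residue or [..] class
            core = element
        parts.append(prefix + core + count + suffix)
    return "".join(parts)


def find_matches(pattern: str, sequence: str):
    """Return (start, matched text) pairs; re.finditer reports non-overlapping hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    motif = "G-x-W-[YF]-[EA]-[IVLM]."
    print(prosite_to_regex(motif))                   # G.W[YF][EA][IVLM]
    for seq in ["GTWYEIARFD", "GKWYEVARLD", "GTWYAMAKKD"]:
        print(seq, find_matches(motif, seq))
```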
16
Protein Domains
  • Domains can be considered as building blocks of
    proteins.
  • Some domains can be found in many proteins with
    different functions, while others are only found
    in proteins with a certain function.

17
DNA-binding domain: zinc finger
18
Varieties of protein domains
Extending along the length of a protein
Occupying a subset of a protein sequence
Occurring one or more times
Page 228
19
Example of a protein with two domains: methyl-CpG-
binding protein 2 (MeCP2)
MBD
TRD
The protein includes a Methylated DNA Binding
Domain (MBD) and a Transcriptional Repression
Domain (TRD). MeCP2 is a transcriptional
repressor.
20
Result of an MeCP2 blastp search: a
methyl-binding domain shared by several proteins
21
Are proteins that share only a domain homologous?
22
Pfam
  • Database that contains a large collection of
    multiple sequence alignments of protein domains
  • Based on profile hidden Markov models (HMMs)

23
Profile HMM (Hidden Markov Model)
A profile HMM is a probabilistic model of a multiple
sequence alignment (MSA), consisting of a number of interconnected states
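Figure: profile HMM for alignment columns 16-19
Alignment (10 sequences, columns 16-19):
DRTR, DRTS, S--S, SPTR, DRTR, DPTS, D--S, D--S, D--S, D--R
Match states and emission probabilities:
  M16: D 0.8, S 0.2
  M17: R 0.6, P 0.4
  M18: T 1.0
  M19: R 0.4, S 0.6
Delete states: D16-D19
Insert states: I16-I19 (marked X; not used by this alignment)
Transitions: M16->M17 50%, M16->D17 50%, M17->M18 100%,
M18->M19 100%, D17->D18 100%, D18->M19 100%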
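The emission and transition values in the figure can be recovered by counting over the ten aligned rows. A minimal sketch, assuming (as in the figure) that every column is a match column, that a residue means the sequence passes through the match state and a gap means the delete state, and that no pseudocounts are used. Real profile-HMM tools add priors or pseudocounts and model insert states; this sketch does not.

```python
from collections import Counter

# Alignment columns 16-19 as read from the figure (10 sequences; '-' = gap).
ROWS = ["DRTR", "DRTS", "S--S", "SPTR", "DRTR",
        "DPTS", "D--S", "D--S", "D--S", "D--R"]


def match_emissions(rows):
    """Maximum-likelihood emission probabilities per match column (gaps ignored)."""
    emissions = []
    for column in zip(*rows):
        residues = [c for c in column if c != "-"]
        if not residues:                    # an all-gap column emits nothing
            emissions.append({})
            continue
        counts = Counter(residues)
        emissions.append({aa: n / len(residues) for aa, n in counts.items()})
    return emissions


def state(symbol):
    """A residue means the match state (M); a gap means the delete state (D)."""
    return "M" if symbol != "-" else "D"


def transition_probabilities(rows):
    """For each pair of adjacent columns, estimate P(next state | current state)."""
    result = []
    for i in range(len(rows[0]) - 1):
        pair_counts = Counter((state(r[i]), state(r[i + 1])) for r in rows)
        from_totals = Counter(state(r[i]) for r in rows)
        result.append({pair: n / from_totals[pair[0]] for pair, n in pair_counts.items()})
    return result


if __name__ == "__main__":
    for column, probs in zip(range(16, 20), match_emissions(ROWS)):
        print(f"M{column}", probs)
    for column, probs in zip(range(16, 19), transition_probabilities(ROWS)):
        print(f"{column}->{column + 1}", probs)
```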
24
Pfam
  • Database that contains a large collection of
    multiple sequence alignments of protein
    domains, based on profile hidden Markov models
    (HMMs).
  • The Pfam database is based on two distinct
    classes of alignments:
      • seed alignments, which are deemed to be accurate
        and are used to produce Pfam-A
      • alignments derived by automatic clustering of
        SwissProt, which are less reliable and give rise
        to Pfam-B

25
Physical properties of proteins
26
DNA-binding domains have a relatively high
frequency of basic (positively charged) amino acids
GCN4:   MKDPAALKRARNTEAARRSSRARKLQRM
zif268: MERPYACPVESCDRRFSRSDELTRHIRIHT
myoD:   SKVNEAFETLKRCTSSNPNQRLPKVEILRNAIR
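The "high frequency of basic amino acids" can be quantified directly from the three sequences. A minimal sketch: basic residues are taken as K, R and H, and the net-charge estimate is deliberately crude (+1 per K/R, -1 per D/E, His ignored, no pKa or pH model). The function names are illustrative.

```python
BASIC = set("KRH")
ACIDIC = set("DE")

# Sequences as shown on the slide.
SEQUENCES = {
    "GCN4": "MKDPAALKRARNTEAARRSSRARKLQRM",
    "zif268": "MERPYACPVESCDRRFSRSDELTRHIRIHT",
    "myoD": "SKVNEAFETLKRCTSSNPNQRLPKVEILRNAIR",
}


def basic_fraction(seq: str) -> float:
    """Fraction of residues that are basic (K, R or H)."""
    seq = seq.upper()
    return sum(aa in BASIC for aa in seq) / len(seq)


def rough_net_charge(seq: str) -> int:
    """Crude charge estimate: +1 per K/R, -1 per D/E (His ignored)."""
    seq = seq.upper()
    return sum(aa in "KR" for aa in seq) - sum(aa in ACIDIC for aa in seq)


if __name__ == "__main__":
    for name, seq in SEQUENCES.items():
        print(f"{name}: basic fraction {basic_fraction(seq):.2f}, "
              f"rough net charge {rough_net_charge(seq):+d}")
```

On its own, a basic fraction means little; it becomes informative when compared with the same value for a control set of proteins, which is the idea behind the knowledge-based approach.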
27
Transmembrane proteins have a unique
hydrophobicity pattern
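A common way to expose this pattern is a sliding-window hydropathy plot using the Kyte-Doolittle scale: stretches of roughly 20 hydrophobic residues stand out as candidate membrane-spanning segments. In the sketch below, the window of 19 residues and the threshold of 1.6 are commonly quoted settings for this scale, used here as assumptions rather than values from the slides; the test sequence is synthetic.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19):
    """Mean hydropathy of every window; element i covers residues i .. i+window-1."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if len(values) < window:
        return []
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_windows(seq: str, window: int = 19, threshold: float = 1.6):
    """Start positions (0-based) of windows whose mean hydropathy exceeds the threshold."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]


if __name__ == "__main__":
    # Synthetic example: a hydrophobic stretch flanked by charged residues.
    seq = "K" * 10 + "L" * 19 + "K" * 10
    print(candidate_tm_windows(seq))
```

Neighbouring windows overlap, so a single hydrophobic segment yields a run of consecutive start positions rather than one hit.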
28
Knowledge Based Approach
  • IDEA
  • Find the common properties of a protein
    family (or any group of proteins of interest)
    that are unique to the group and differ
    from those of all other proteins.
  • Generate a model for the group and use it to
    predict new members of the family that have
    similar properties.

29
Knowledge Based Approach
Basic Steps 1. Building a Model
  • Generate a dataset of proteins with a common
    function (e.g., DNA-binding proteins)
  • Generate a control dataset (e.g., non-DNA-binding
    proteins)
  • For all the proteins in the data (DNA-binding and
    non-DNA-binding proteins), calculate the properties
    that characterize the protein family you are
    interested in
  • Represent each protein in the set by a vector of
    calculated features and build a statistical model
    that separates the groups

30
Basic Steps 2. Predicting the function of a new
protein
  • Calculate the properties of the new protein and
    represent them as a vector
  • Predict whether the tested protein belongs to the
    family
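Both steps, building a model and predicting a new protein, can be sketched end to end. This sketch makes several assumptions, all for illustration only. The feature vector is (fraction of basic residues, mean Kyte-Doolittle hydropathy). The "statistical model" is a nearest-centroid classifier, a deliberately simple stand-in that is not necessarily the model used in the course. The positive set is the three DNA-binding fragments (GCN4, zif268, myoD). The control set is the three sequences from the motif alignment (ecblc, vc, hsrbp, gaps removed), which are treated here as non-DNA-binding. Six short fragments are far too few for a real predictor, and the features are not standardized.

```python
import math

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Step 1 data: positives (DNA-binding) and an assumed control set.
POSITIVES = [
    "MKDPAALKRARNTEAARRSSRARKLQRM",                        # GCN4
    "MERPYACPVESCDRRFSRSDELTRHIRIHT",                      # zif268
    "SKVNEAFETLKRCTSSNPNQRLPKVEILRNAIR",                   # myoD
]
CONTROLS = [
    "MRLLPLVAAATAAFLVVACSSPTPPRGVTVVNNFDAKRYLGTWYEIARFD",  # ecblc
    "MRAIFLILCSVLLNGCLGMPESVKPVSDFELNNYLGKWYEVARLD",       # vc
    "MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKD",     # hsrbp
]


def features(seq):
    """Feature vector: (fraction of K/R/H, mean hydropathy). Non-letters are skipped."""
    residues = [aa for aa in seq.upper() if aa.isalpha()]
    basic = sum(aa in "KRH" for aa in residues) / len(residues)
    hydro = sum(KD[aa] for aa in residues) / len(residues)
    return (basic, hydro)


def centroid(vectors):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(values) / len(vectors) for values in zip(*vectors))


def train(positives, negatives):
    """Step 1: the 'model' is one centroid per group."""
    return (centroid([features(s) for s in positives]),
            centroid([features(s) for s in negatives]))


def predict(seq, model):
    """Step 2: True if the protein is closer to the DNA-binding centroid."""
    pos_centroid, neg_centroid = model
    f = features(seq)
    return math.dist(f, pos_centroid) < math.dist(f, neg_centroid)


if __name__ == "__main__":
    model = train(POSITIVES, CONTROLS)
    y14 = ("PQRSVGWILFVTSIHEEAQEDEIQEKFCDYGEIKNIHLNLDRRTG"
           "FSKGYALVEYETHKQALAAKEALNGAEIMGQTIQVDWCFVKG G")
    print("Y14 predicted DNA-binding:", predict(y14, model))
```

A prediction from such a toy model is only a hypothesis; experimental evidence can contradict it.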

31
TEST CASE
Y14: a protein sequence translated from an ORF
(open reading frame) obtained from the complete
Drosophila genome
>Y14
PQRSVGWILFVTSIHEEAQEDEIQEKFCDYGEIKNIHLNLDRRTG
FSKGYALVEYETHKQALAAKEALNGAEIMGQTIQVDWCFVKG G
32
(No Transcript)
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
>Y14
PQRSVGWILFVTSIHEEAQEDEIQEKFCDYGEIKNIHLNLDRRTG
FSKGYALVEYETHKQALAAKEALNGAEIMGQTIQVDWCFVKG G
Y14 DOES NOT BIND RNA
38
Projects 2011-12
39
Instructions for the final project, Introduction
to Bioinformatics 2011-12
Key dates
  • 19.12: lists of suggested projects published. You
    are highly encouraged to choose a project yourself
    or to find a relevant project that can help in
    your research.
  • 29.1: submission of the project overview
    (PowerPoint presentation, max. 5 slides): title,
    main question, and the major tools you plan to
    use to answer the question(s)
  • 30.1/31.1: presentation of the project overview
  • 7.3: poster submission
  • 14.3: poster presentation
40
2. Planning your research
After you have described the main question (or
questions) of your project, carefully plan your
next steps:
A. Make sure you understand the problem, and read
the background necessary to proceed.
B. Formulate your working plan, step by step.
C. Once you have a plan, start by extracting the
necessary data and deciding on the relevant tools
for the first step. When running a tool, summarize
the results and extract the information you need
to answer your question. It is recommended to save
the raw data for your records, but don't present
raw data in your final written project. Your
initial results should guide you towards your next
steps.
D. When you feel you have explored all the tools
you can apply to answer your question, summarize
and draw conclusions. Remember that NO is also an
answer, as long as you are sure it is NO. Also
remember that this is a course project, not just a
homework exercise.

41
  • Summarize the final project in a poster (in pairs)
  • Prepare it in PowerPoint; poster size 90-120 cm
  • Title of the project
  • Names and affiliations of the presenting students
  • The poster should include 5 sections:
      • Background: a description of your question
        (you can add a figure)
      • Goal and Research Plan: describe the main
        objective and the research plan
      • Results (main section): present your results in
        3-4 figures, describe each figure (figure
        legends) and give each result a title
      • Conclusions: summarize the conclusions of your
        project in bullet points
      • References: list the papers, databases and
        tools used in your project

Examples of posters will be presented in class