Title: My Master's Work
1. My Master's Work
2. Outline of the talk
- Analysis of Phylogeny Tree Evaluation Approaches (project done in CS641).
- Proteomics and 2-D Gel Electrophoresis (study done for CS).
- Coexpression analysis of dimerization between bZIP proteins in groups C, S1 and S2 in Arabidopsis thaliana under differential light and CO2 levels (project done for BST676).
3. Analysis of Phylogeny Tree Evaluation Approaches
4. Phylogenetic Analysis
- Align the sequences.
- Determine whether the sequences are related.
- Decide on the most appropriate tree-building algorithm.
- Scrutinize the tree to determine the level of confidence.
5.
- Algorithmic method
  - Defines an algorithm that leads to the determination of a tree.
- Criterion-based method
  - Defines a criterion for comparing different phylogenies, so that phylogenies can be ranked and compared.
7. Maximum Parsimony Method
- The most parsimonious tree explains the observed character distribution with the minimum tree length.
- Tree selection criterion: minimum tree length (the fewest character-state transformations).
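Tree length for a fixed topology can be computed with Fitch's algorithm, a standard way of counting the minimum number of state changes per site. The sketch below is an illustration under assumptions (rooted binary trees written as nested tuples, unordered character states); it is not DNAPARS's source code, and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A leaf is a taxon name; an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate state set, number of changes) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can share a state: no extra change at this node.
        return common, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Parsimony score: the sum of Fitch changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


alignment = {"A": "AAG", "B": "AAT", "C": "GCT", "D": "GCT"}
print(tree_length((("A", "B"), ("C", "D")), alignment))  # 3
print(tree_length((("A", "C"), ("B", "D")), alignment))  # 5
```

With this toy alignment the ((A,B),(C,D)) topology needs fewer changes, so it would be preferred under the minimum-tree-length criterion.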
8. Maximum Likelihood (ML)
- ML evaluates the probability that the chosen evolutionary model would have generated the observed sequences.
- The evolutionary model accounts for the changes in the sequences.
- Phylogenies are then inferred by finding the trees that yield the highest likelihood.
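The idea of "the model generating the observed sequences" can be illustrated in the simplest possible case: two aligned sequences joined by a single branch. This sketch assumes the Jukes-Cantor (JC69) model purely for simplicity (it is not necessarily the model DNAML uses) and finds the branch length with the highest likelihood by a plain grid search; a full tree likelihood would need Felsenstein's pruning algorithm instead.

```python
import math


def jc69_log_likelihood(seq1: str, seq2: str, d: float) -> float:
    """Log-likelihood of two aligned sequences separated by branch length d
    (expected substitutions per site) under the JC69 model."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability the base is unchanged after d
    p_diff = 0.25 - 0.25 * e   # probability of one specific different base
    loglik = 0.0
    for a, b in zip(seq1, seq2):
        # 0.25 is the JC69 equilibrium frequency of the starting base.
        loglik += math.log(0.25) + math.log(p_same if a == b else p_diff)
    return loglik


def ml_branch_length(seq1: str, seq2: str, step: float = 0.001, max_d: float = 3.0) -> float:
    """Grid search for the branch length that maximizes the likelihood."""
    grid = [step * i for i in range(1, int(round(max_d / step)) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(seq1, seq2, d))


print(ml_branch_length("A" * 20, "C" * 4 + "A" * 16))  # close to 0.2326
```

For JC69 the maximum falls at the familiar closed-form distance $-\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$, where $p$ is the proportion of differing sites (here 0.2).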
9. Distance-Based Method
- Distance-based methods estimate, for each pair of taxa, the distance (the total amount of change) separating the two taxa since they last shared a common ancestor.
- It is a cluster-based method: the tree is built by clustering taxa from the matrix of pairwise distances.
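A minimal sketch of the two stages, under stated assumptions: pairwise distances are corrected with the JC69 formula (chosen for simplicity; DNADIST offers several models), and only the first clustering decision of neighbor joining is shown, namely picking the pair with the smallest Q value. The function names and the toy distance matrix are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def jc69_distance(s1: str, s2: str) -> float:
    """JC69-corrected distance from the proportion of differing sites."""
    p = sum(a != b for a, b in zip(s1, s2)) / len(s1)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def nj_pick_pair(taxa: List[str], dist: Dict[FrozenSet[str], float]) -> Tuple[str, str]:
    """Return the pair that neighbor joining joins first (minimum Q value)."""
    n = len(taxa)
    # r[t] is the total distance from taxon t to every other taxon.
    r = {t: sum(dist[frozenset((t, u))] for u in taxa if u != t) for t in taxa}

    def q(pair: Tuple[str, str]) -> float:
        i, j = pair
        return (n - 2) * dist[frozenset(pair)] - r[i] - r[j]

    return min(combinations(taxa, 2), key=q)


pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
dist = {frozenset(k): float(v) for k, v in pairs.items()}
print(nj_pick_pair(["a", "b", "c", "d", "e"], dist))  # ('a', 'b')
```

A full implementation would merge the chosen pair into a new node, recompute distances to it, and repeat until the tree is resolved.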
10. Software Used
- PHYLIP, used to compare the three phylogeny methods. Programs used from the package:
  - Maximum parsimony: DNAPARS
  - Maximum likelihood: DNAML
  - Distance-based: DNADIST and NEIGHBOR
  - Trees drawn using DRAWGRAM
  - Consensus tree constructed using CONSENSUS
11. Using Sample Data
[Figure: trees obtained for the sample data with maximum parsimony (DNAPARS), maximum likelihood (DNAML) and the distance-based method (NEIGHBOR)]
12. Consensus tree for the given example
[Figure: consensus trees for Human, Chimp, Gorilla, Orang and Rhesus from the parsimony method, maximum likelihood, and distance-based/neighbor joining; grouped clades are annotated with support values of 1.0]
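The 1.0 labels on consensus trees are the fraction of input trees that contain a given clade. A simple majority-rule consensus, one of the rules a consensus program can apply, can be sketched by counting clades across trees. Here each tree is assumed to be represented as a set of clades (sets of taxon names), and the three input trees are hypothetical, not the trees from the slide.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def majority_rule_consensus(trees: List[Set[Clade]], threshold: float = 0.5) -> Dict[Clade, float]:
    """Keep clades present in more than `threshold` of the trees, with their support."""
    counts = Counter(clade for tree in trees for clade in tree)
    n = len(trees)
    return {clade: count / n for clade, count in counts.items() if count / n > threshold}


# Hypothetical trees, each listed by its non-trivial clades.
hc = frozenset({"Human", "Chimp"})
hcg = frozenset({"Human", "Chimp", "Gorilla"})
hco = frozenset({"Human", "Chimp", "Orang"})
trees = [{hc, hcg}, {hc, hco}, {hc, hcg}]

for clade, support in majority_rule_consensus(trees).items():
    print(sorted(clade), round(support, 2))
```

Human+Chimp appears in all three trees (support 1.0), Human+Chimp+Gorilla in two of three, and Human+Chimp+Orang falls below the majority threshold and is dropped.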
13. Observations
- Reliability of branch-length estimates: NJ and ML > MP.
- Computational speed (n > 500):
  - NJ/DNADIST: 0.005 seconds
  - DNAPARS: 0.5 seconds
  - DNAML: 230.0 seconds
14. Conclusion
- Our experiments and results indicate that the distance-based method is better than the other two methods in terms of speed, simplicity and performance on large numbers of taxa.
- If a fast computer is available and the dataset is large, the maximum likelihood method is preferable to maximum parsimony.
15. Proteomics and 2-D Gel Electrophoresis
16. Introduction
- The entire set of proteins expressed by the genome in a cell, organ or organism is referred to as the proteome.
- Proteomics: methods that discover and quantify proteins and their biochemical changes.
17. Applications of Proteomics
- Protein Mining
- Network Mapping
- Mapping Protein Modifications
18. Proteomics Analysis
Reference: www.mbi.osu.edu/sciprograms/prfmaterials/vandre.ppt
19. 2-D Gel Electrophoresis
- The horizontal position of a spot reflects the protein's charge (its isoelectric point, pI), the vertical position reflects its molecular weight, and the intensity of the spot reflects the amount of that protein in the system.
- Steps:
  - 1. Prepare the protein sample in solution.
  - 2. Separate the proteins in each dimension:
    - I. Based on pH: isoelectric focusing (IEF) using immobilized pH gradient (IPG) strips.
    - II. Based on molecular weight (size): gel electrophoresis.
  - 3. Stain the proteins to enable visualization.
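Separation by pH depends on the isoelectric point: the pH at which the protein's net charge is zero. A predicted pI can be sketched from the sequence with the Henderson-Hasselbalch equation and a bisection search. The pKa values below are an illustrative assumption (published scales differ, which is one reason predicted pI values vary between tools), and modifications such as phosphorylation are ignored.

```python
# Illustrative pKa values (an assumption; different tools use different scales).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge of an unmodified polypeptide at the given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1
    counts["Cterm"] = 1
    positive = sum(counts[g] / (1.0 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1.0 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection for the pH where the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid   # still positively charged, so the pI lies higher
        else:
            high = mid
    return (low + high) / 2.0


print(round(isoelectric_point("KKKK"), 2))  # basic peptide, pI above 10
print(round(isoelectric_point("DDDD"), 2))  # acidic peptide, pI below 4
```

Because net charge decreases monotonically as pH rises, bisection is guaranteed to converge to the single zero crossing.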
20. Introduction to the Project
- This project focuses on 2-D gel electrophoretic separation of proteins.
- We analyzed a few random spots from the 2-D gels of rat mammary tissue.
- We used statistical methods to find the variance in pI of the same protein across different gels.
- We analyzed the reasons for these differences.
- We inferred the relationship between the experimental values and the predicted values.
21. Images of the gels used in the project
22. One of the gels with protein spots
23.
- The gels we used were from a previously completed experiment. 28 random protein spots were selected from each of the three gels based on their intensity.
- Mass spectrometry
  - Differentially expressed proteins identified by image analysis were excised from the 2-D gels and digested with trypsin. The resulting peptide fragments were analyzed on a MALDI mass spectrometer (MS). The MALDI spectrum displays a peptide fingerprint of the protein: the set of corresponding peptide masses.
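The peptide fingerprint that a database compares against can be sketched as an in-silico tryptic digest. This assumes the classic trypsin rule (cleave after K or R, but not before P), no missed cleavages, no modifications, and standard monoisotopic residue masses; the function names are hypothetical and real search tools offer many more options.

```python
import re
from typing import List

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # MALDI peaks are usually singly protonated [M+H]+ ions


def trypsin_digest(protein: str) -> List[str]:
    """Split after every K or R that is not followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass_mh(peptide: str) -> float:
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


for pep in trypsin_digest("MAKRPGRK"):
    print(pep, round(peptide_mass_mh(pep), 4))
```

Splitting on a zero-width lookaround pattern keeps every residue in exactly one peptide; this relies on `re.split` accepting empty matches, which Python supports from version 3.7.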
24. MALDI-TOF MS
25.
- Proteins were identified by entering the peptide masses (ions from the MALDI spectrum) into a peptide mapping database. Some examples of such protein search engines are:
  - Mascot (very popular, and also used in this project)
  - Sequest
  - Aldente
  - ProteinLynx
  - Phenyx
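The core of a peptide mass fingerprint search is matching observed peak masses against each candidate protein's theoretical peptide masses within a mass tolerance. The sketch below ranks proteins by a raw count of matched peaks, which is a deliberate simplification: engines such as Mascot use probability-based scores instead. The 50 ppm tolerance and the database entries are hypothetical.

```python
from typing import Dict, List, Tuple


def count_matches(observed: List[float], theoretical: List[float], tol_ppm: float = 50.0) -> int:
    """Number of observed masses within tol_ppm of some theoretical peptide mass."""
    matched = 0
    for m in observed:
        if any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical):
            matched += 1
    return matched


def rank_proteins(observed: List[float], database: Dict[str, List[float]],
                  tol_ppm: float = 50.0) -> List[Tuple[str, int]]:
    """Rank database proteins by how many observed peaks they explain."""
    scores = {name: count_matches(observed, masses, tol_ppm) for name, masses in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical theoretical [M+H]+ masses for two proteins.
database = {"protein_A": [1000.50, 1500.75, 2000.00], "protein_B": [1000.50, 1800.90]}
observed = [1000.51, 1500.76, 2000.05]
print(rank_proteins(observed, database))  # [('protein_A', 3), ('protein_B', 1)]
```

A tolerance in parts per million scales with mass, which matches how mass-spectrometer accuracy is usually specified.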
26. Image of a search database
27. Results
- We tabulated the results obtained from the internet database search alongside those obtained from the experiment.
- We observed that neither the pI values nor the molecular weights were the same in all gels for the same protein.
- The pI values from the three gels were quite similar to one another, but they differed from the predicted pI values.
28.
- In a 2-D gel, the position of a protein spot can shift for various reasons, so the observed molecular weight and pI values may also differ.
29. Graphical representation of the pI values of the three gels
30. Graph showing the variance between the predicted pI and the observed pI
31. Observations
- The pI values of the three gels (the experimental values) are not very different from each other.
- We therefore interpret that the difference due to non-biological causes is very small in this experiment.
- For a few protein spots, the internet search returned the same protein name, but our experiment gave different values. This may be because a different group (such as a phosphate or sulfate) was attached to the protein; there can be other reasons as well.
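32.
- The average deviation between the observed pI values from the three gels and the predicted pI value was calculated as:

  $$\bar{d} = \frac{(\mathrm{pI}_1 - \mathrm{pI}_{\mathrm{pred}}) + (\mathrm{pI}_2 - \mathrm{pI}_{\mathrm{pred}}) + (\mathrm{pI}_3 - \mathrm{pI}_{\mathrm{pred}})}{3}$$

  where $\mathrm{pI}_g$ is the observed pI of the spot in gel $g$ and $\mathrm{pI}_{\mathrm{pred}}$ is the predicted pI.
- This gave the results shown in the next slide. We obtained both positive and negative values for the deviations.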
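The average-deviation formula translates directly into code. The pI values in this sketch are hypothetical; a negative result means the observed spots sit at a lower pI than predicted, which is the direction expected when acidic groups are added.

```python
from typing import Dict, List


def average_deviation(observed_pis: List[float], predicted_pi: float) -> float:
    """Mean of (observed pI - predicted pI) over the gels for one protein spot."""
    return sum(pi - predicted_pi for pi in observed_pis) / len(observed_pis)


# Hypothetical spots: observed pI in each of three gels, plus the predicted pI.
spots: Dict[str, Dict[str, object]] = {
    "spot_1": {"observed": [5.2, 5.3, 5.1], "predicted": 5.6},
    "spot_2": {"observed": [6.9, 7.0, 7.1], "predicted": 6.8},
}
for name, s in spots.items():
    print(name, round(average_deviation(s["observed"], s["predicted"]), 3))
```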
33. Average deviations between the three gels and the predicted pI
34.
- We interpret that the proteins were modified more by negatively charged groups, so that their pI values decreased.
- The addition of one phosphate group to a serine, threonine or tyrosine residue typically decreases the isoelectric point by 0.1 pH unit.
35. Regression Results
- A statistical test was performed to determine which of the three gels was closest to the predicted pI values, that is, in which gel the proteins were least modified.
- The test was a calibration test. We built a regression model for each gel. The inverse regression equation used was:

  $$\text{Predicted pI} = \frac{\text{Observed pI from gel} - \text{Intercept}}{\text{Slope}}$$
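A calibration test of this kind can be sketched as two steps: fit the line observed = intercept + slope × predicted for one gel by ordinary least squares, then invert it to turn an observed pI back into a calibrated predicted pI. The data below are hypothetical, and ordinary least squares is an assumption about how the regression was fitted.

```python
from typing import Sequence, Tuple


def fit_calibration(predicted: Sequence[float], observed: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of observed = intercept + slope * predicted for one gel."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def inverse_predict(observed_pi: float, intercept: float, slope: float) -> float:
    """Invert the calibration line: predicted pI = (observed pI - intercept) / slope."""
    return (observed_pi - intercept) / slope


# Hypothetical predicted (database) and observed (gel) pI values.
intercept, slope = fit_calibration([4.0, 5.0, 6.0, 7.0], [3.9, 4.8, 5.7, 6.6])
print(round(intercept, 3), round(slope, 3))                  # 0.3 0.9
print(round(inverse_predict(5.7, intercept, slope), 3))      # 6.0
```

One way to compare gels with this sketch is to see whose fitted line lies closest to intercept 0 and slope 1, since that line would reproduce the predicted pI values most directly.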
36. Predicted pI values from the calibration test and the internet database
37.
- The results showed that all three gels gave almost the same predicted pI values, and these were quite far from the original predicted pI values.
- This similarity among the three gels indicates that the difference between the predicted and experimental pI values of the proteins is not mainly due to non-biological factors, but to chemical modifications of the proteins.
38. Coexpression analysis of dimerization between bZIP proteins in groups C, S1 and S2 in Arabidopsis thaliana under differential light and CO2 levels
39. Introduction: Transcription Factors
- Transcription factors are proteins involved in the regulation of gene expression; they bind to the promoter region upstream of genes.
- They are composed of two essential functional regions:
  - DNA-binding domain: binds to DNA.
  - Activator domain: interacts with other regulatory proteins, thereby affecting the efficiency of DNA binding.
40. bZIP Proteins
- bZIP proteins are a class of transcription factors with a leucine zipper motif: a leucine residue repeated periodically at every seventh position, forming an alpha-helical conformation.
- The segment comprising the basic region and the periodic array of leucine residues is referred to as the basic-region leucine zipper, or bZIP, motif.
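The "leucine at every seventh position" pattern can be illustrated with a simple scan for leucines spaced exactly seven residues apart. This is only a toy heuristic: the minimum of four leucines is an assumed threshold, and real bZIP family assignment relies on profile-based domain models rather than a scan like this.

```python
from typing import List, Tuple


def heptad_leucine_runs(seq: str, min_repeats: int = 4) -> List[Tuple[int, int]]:
    """Find maximal runs of leucines spaced exactly seven residues apart.

    Returns (start_index, number_of_leucines) for each run with at least
    `min_repeats` leucines at positions start, start + 7, start + 14, ...
    """
    runs = []
    for start, residue in enumerate(seq):
        if residue != "L":
            continue
        if start >= 7 and seq[start - 7] == "L":
            continue  # this leucine continues a run that began earlier
        count = 0
        pos = start
        while pos < len(seq) and seq[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            runs.append((start, count))
    return runs


print(heptad_leucine_runs("AAAAAAL" * 5))  # [(6, 5)]
```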
42. Some Facts
- There are 792 bZIP proteins recorded in the nonredundant database.
- The numbers of bZIP proteins in selected organisms are as follows:
  - Yeast: 16
  - Fruit fly: 110
  - Plant (Arabidopsis thaliana): 75
  - Human: 114
43. Arabidopsis
- The Arabidopsis genome sequence contains 75 distinct members of the bZIP family, 50 of which are not well studied.
- Based on common domains, the bZIP family can be subdivided into 10 groups, labeled A to I and S.
46. C and S Protein Interaction
- Ehlert et al. measured interactions between C and S proteins.
  - C and S1 proteins heterodimerized.
  - Two S2 proteins dimerized.
47. Effect of Light and CO2 on C and S Proteins
- Carbohydrate signaling
  - Carbohydrate partitioning increases under elevated CO2 and decreases under low light.
- Seed development
  - The photosensory system detects the quality, quantity, direction and duration of light, and controls the developmental pattern.
- Stress
  - Light-dependent generation of active oxygen species causes a type of stress called photooxidative stress.
48. Experiment Selection Criteria
- a) Chose C and S bZIP proteins
  - Coexpression Engine: http://www.ssg.uab.edu/coexpression
- b) Selected tissue and array type
- c) Chose specific experiment
49. a) Chose C and S bZIP proteins
50. b) Selected tissue and array type
51. c) Chose specific experiment: NASC Experiments
52. Justification
- Biologically feasible comparisons, due to similar:
  - Tissue types
  - Experimental conditions
- Statistically feasible comparisons, due to similar:
  - Measurement protocol
53. The Tool Used
- Co-expression Analysis Tool, version 2.0, developed at the Section on Statistical Genetics, UAB: http://obiwan.ssg.uab.edu:8080/coexpression/servlets/CoexpReleasesResponseManager
- Built mainly to analyze co-expression in the Arabidopsis plant.
- We used the NASC experiments that study Affymetrix GeneChip profiling of the effect of light and CO2 on leaf development in Arabidopsis.
54.
- The tool uses the database built from the Nottingham Arabidopsis Stock Centre (NASC) AffyWatch Service.
- Version 2, used in this project, contains a total of 566 microarray chips, of which 486 ATH1 microarray chips were used.
55. NASC Experiments Used
- Four experiments were conducted to examine developing leaf insertions under varying conditions of light and CO2.
- Sampling was done at 0, 2, 4, 12, 24, 48 and 96 hours, using a batch of 24 plants.
- Four replicates were produced for each of the seven time points per experiment.
56. Working of the Tool
- Linear regression analysis is performed on the probe sets.
- The regression gives three important values:
  - the slope parameter (indicating the direction of co-expression),
  - the p-value (stating the confidence in the correlation), and
  - the R-squared value (the strength of the correlation).
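The three outputs of the regression can be sketched for one query gene and one probe set. This is an illustration, not the SSG tool's implementation: it assumes ordinary least squares on expression values across chips, and it reports the t statistic for the slope rather than the p-value itself (the two-sided p-value is the tail area of Student's t distribution with n - 2 degrees of freedom beyond that statistic). The expression values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def coexpression_stats(query: Sequence[float], probe: Sequence[float]) -> Tuple[float, float, float]:
    """Regress a probe set's expression on the query gene's expression.

    Returns (slope, r_squared, t_statistic) for the model probe = a + b * query.
    """
    n = len(query)
    if n != len(probe) or n < 3:
        raise ValueError("need two equal-length series with at least 3 chips")
    mean_x = sum(query) / n
    mean_y = sum(probe) / n
    sxx = sum((x - mean_x) ** 2 for x in query)
    syy = sum((y - mean_y) ** 2 for y in probe)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(query, probe))
    slope = sxy / sxx                    # sign gives the direction of co-expression
    r_squared = sxy * sxy / (sxx * syy)  # strength: fraction of variance explained
    if r_squared >= 1.0:
        t_stat = math.copysign(math.inf, slope)
    else:
        # t statistic for H0: slope == 0, with n - 2 degrees of freedom.
        t_stat = math.copysign(math.sqrt(r_squared * (n - 2) / (1.0 - r_squared)), slope)
    return slope, r_squared, t_stat


query = [1.0, 2.0, 3.0, 4.0, 5.0]            # hypothetical expression of the query gene
print(coexpression_stats(query, [2.1, 3.9, 6.2, 7.8, 10.1]))
```

A constant probe set (zero variance) would raise a `ZeroDivisionError` here; in practice such probes would be filtered out before regression.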
57. Procedure
- Four genes of group C, five genes of group S1 and three genes of group S2 were studied in the project.
- We submit the AGI IDs, the tissue type (here, leaf) and the experiment numbers (in our case 156, 157, 158 and 159) to the tool.
- Our genes of interest are regressed on all 22,810 ATH1 probe sets, and a p-value, R-squared value and slope parameter are obtained.
58.
- The genes were then sorted by R-squared value and p-value and ranked such that the higher the R-squared value, the higher the rank.
- Using an arbitrary cut-off of 15, the top-ranked genes were identified as highly co-expressed.
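The ranking step can be sketched as a sort on R-squared (descending) with ties broken by p-value (ascending), followed by a cut-off. The record type, the probe names and the `top_n` parameter are hypothetical illustrations of this procedure, not the tool's interface.

```python
from typing import List, NamedTuple


class ProbeResult(NamedTuple):
    probe_id: str      # hypothetical probe-set identifier
    r_squared: float
    p_value: float


def top_coexpressed(results: List[ProbeResult], top_n: int) -> List[ProbeResult]:
    """Rank by R-squared (higher first), break ties by p-value (lower first), keep top_n."""
    ranked = sorted(results, key=lambda r: (-r.r_squared, r.p_value))
    return ranked[:top_n]


results = [
    ProbeResult("probe_a", 0.90, 0.001),
    ProbeResult("probe_b", 0.50, 0.010),
    ProbeResult("probe_c", 0.90, 0.0001),
    ProbeResult("probe_d", 0.70, 0.005),
]
print([r.probe_id for r in top_coexpressed(results, top_n=2)])  # ['probe_c', 'probe_a']
```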
59. Hypothesis
- Genes coding for dimerizing proteins should be co-expressed at the same time.
- If proteins of groups C and S1 heterodimerize, then their genes should be co-expressed at the same time.
60. Table 2: Mapping information between AtbZIP, AGI, ATH probe set and AtbZIP group IDs
61. Table 3: Regression estimates between group C AtbZIP63 (245925_at) and probes in groups S1, C and S2
62. Table 4: Regression estimates between group C AtbZIP25 (251848_at) and probes in groups S1, C and S2
63. Regression estimates between group C AtbZIP9 (246962_s_at) and probes in groups S1, C and S2
71. Results
- bZIP1 (group S1) co-expresses well with bZIP63 (group C) under ambient CO2 and low light, but the same co-expression interaction is weak under elevated CO2 and ambient light.
- Also, very minimal interaction was found between genes of group C (bZIP25, bZIP10, bZIP9 and bZIP63) and bZIP9 (Group C
72. Conclusion
- This bZIP study was a good litmus test for the SSG Coexpression Tool.
- The results presented in this study provide evidence that a good, if not significant, number of AtbZIP proteins interacting as heterodimers are co-regulated under varying conditions of stress.
- This study shows evidence that co-expression patterns in genes can be studied by pooling publicly available microarray data, and that a simple linear regression procedure is feasible for this purpose.
73. Discussion
- Varying trends in the co-expression suggest some theories:
  - Different genes are expressed in different tissues. Is a study on leaf tissue alone good enough to support our hypothesis?
  - Time-course data is valuable and should be accounted for in the analysis. However, this kind of analysis requires more observations recorded at different time points.
  - Linear regression is useful, but would a robust time-series-based approach be more appropriate for our study?