My Masters Work - PowerPoint PPT Presentation


Slides: 74

Provided by: Office2004652

Transcript and Presenter's Notes

Title: My Masters Work

1
My Masters Work

Richa Tiwari

2
Outline of the talk

Analysis of Phylogeny Tree Evaluation Approaches (project done in CS641).
Proteomics and 2-D Gel Electrophoresis (study done for CS)
Coexpression analysis of dimerization between bZIP proteins in groups C, S1 and S2 in Arabidopsis thaliana under differential light and CO2 levels (project done for BST676).

3
Analysis of Phylogeny Tree Evaluation Approaches
4
Phylogenetic Analysis

Align the sequences.
Determine whether a relationship exists between the sequences.
Choose the most appropriate tree-building algorithm.
Scrutinize the tree to determine the level of confidence.

Algorithmic Method
Defines an algorithm that leads to the determination of a tree.
Criteria Based Method
Defines a criterion for comparing different phylogenies, so that phylogenies can be ranked and compared.

6
(No Transcript)
7
Maximum Parsimony Method

The most parsimonious tree explains the observed character distribution with the minimum tree length.
Tree selection criterion: minimum tree length (fewest character state transformations).
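
The tree-length criterion can be sketched with Fitch's small-parsimony algorithm, which counts the minimum number of character state changes a fixed tree needs. This is only an illustration and not the DNAPARS implementation; the alignment, the two candidate topologies and the function names are hypothetical.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site.

    tree   -- a leaf name (str) or a 2-tuple of subtrees (rooted, binary)
    states -- dict mapping each leaf name to its character at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: one transformation is required on this branch pair.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Sum the Fitch change counts over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical toy alignment (not data from the talk).
alignment = {
    "Human": "AAGCT",
    "Chimp": "AAGCT",
    "Gorilla": "AAGTT",
    "Orang": "ACGTT",
    "Rhesus": "GCGTA",
}
tree_a = (((("Human", "Chimp"), "Gorilla"), "Orang"), "Rhesus")
tree_b = (((("Human", "Orang"), "Gorilla"), "Chimp"), "Rhesus")

print(tree_length(tree_a, alignment), tree_length(tree_b, alignment))
```

Under the parsimony criterion, the topology with the smaller tree length would be preferred.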

8
Maximum Likelihood (ML)

ML evaluates the probability that the chosen evolutionary model generated the observed sequences.
Evolutionary model: accounts for the changes in sequences.
Phylogenies are then inferred by finding the trees that yield the highest likelihood.
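
As a minimal sketch of the likelihood idea, the example below scores a pair of aligned sequences under the Jukes-Cantor (JC69) model and picks the distance with the highest likelihood by grid search. The choice of JC69, the grid and the sequences are assumptions made for illustration; DNAML works on whole trees with its own model and optimizer.

```python
import math


def jc69_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned sequences separated by distance d
    (expected substitutions per site) under the Jukes-Cantor model."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability a site is unchanged
    p_diff = 0.25 - 0.25 * decay   # probability of one specific change
    log_l = 0.0
    for a, b in zip(seq_a, seq_b):
        # 0.25 is the equilibrium frequency of the starting base.
        log_l += math.log(0.25 * (p_same if a == b else p_diff))
    return log_l


def best_distance(seq_a, seq_b, grid):
    """Return the grid value that maximizes the log-likelihood."""
    return max(grid, key=lambda d: jc69_log_likelihood(seq_a, seq_b, d))


# Hypothetical sequences: 20 sites, 2 of which differ.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACGTACTT"
grid = [i / 1000 for i in range(1, 1001)]   # 0.001 .. 1.000

d_hat = best_distance(seq_a, seq_b, grid)
print(d_hat)
```

The grid starts above zero because a distance of exactly zero gives probability zero to any observed difference.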

9
Distance Based Method

Distance-based methods estimate the distance between two taxa, that is, the total number of changes since they last shared an ancestor.
It is a cluster-based method.
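
A sketch of the first step of a distance-based method: build a pairwise distance matrix and find the closest pair, which a clustering method would join first. The Jukes-Cantor correction is one common choice and is assumed here; the alignment and function names are hypothetical, and this is not the DNADIST code.

```python
import math
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which the two sequences differ."""
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def jc_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance (substitutions per site)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """Map each sorted pair of taxon names to its JC distance."""
    names = sorted(alignment)
    return {
        (a, b): jc_distance(alignment[a], alignment[b])
        for a, b in combinations(names, 2)
    }


# Hypothetical toy alignment (not data from the talk).
toy_alignment = {
    "Human": "AAGCTTCACCGG",
    "Chimp": "AAGCTTCACCGA",
    "Gorilla": "AAGCTTTATCGA",
    "Orang": "AAGCTTTATTGC",
    "Rhesus": "AGGCATTATTGC",
}

matrix = distance_matrix(toy_alignment)
closest_pair = min(matrix, key=matrix.get)
print(closest_pair, round(matrix[closest_pair], 4))
```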

10
Software used.

PHYLIP
Used to compare the three phylogeny methods.
Programs used from the package:
Maximum Parsimony: DNAPARS
Maximum Likelihood: DNAML
Distance-based: DNADIST and NEIGHBOR
Trees drawn using DRAWGRAM
Consensus tree constructed using CONSENSE
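
The PHYLIP sequence programs read an alignment whose first line gives the number of taxa and sites, followed by one line per taxon with the name padded to 10 characters. A minimal sketch of a writer for that sequential layout is shown below; the helper name and sample sequences are hypothetical.

```python
def to_phylip(alignment):
    """Format a dict of name -> sequence as sequential PHYLIP text."""
    n_taxa = len(alignment)
    n_sites = len(next(iter(alignment.values())))
    lines = [f"{n_taxa} {n_sites}"]
    for name, seq in alignment.items():
        if len(seq) != n_sites:
            raise ValueError(f"{name} has length {len(seq)}, expected {n_sites}")
        # Names occupy exactly 10 columns: truncate long ones, pad short ones.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


phylip_text = to_phylip({"Human": "ACGT", "Chimp": "ACGA"})
print(phylip_text)
```

Written to a file named infile, text in this layout is what the programs look for by default.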

11
Using Sample data
Maximum parsimony: DNAPARS
Maximum likelihood: DNAML
Distance Based: Neighbor
12
Consensus tree for given example
(Figure: three consensus trees of Human, Chimp, Gorilla, Orang and Rhesus, one each for the Parsimony Method, Maximum Likelihood, and Distance Based/Neighbor joining; internal branches are labeled 1.0.)
13
Observation

Reliability of branch length estimates:
NJ and ML > MP
Computational speed (n > 500):
NJ/DNADIST: 0.005 seconds
DNAPARS: 0.5 seconds
DNAML: 230.0 seconds

14
Conclusion

Our experiments indicate that the Distance Based method is better than the other two methods in terms of speed, simplicity and performance on large numbers of taxa.
We can also say that, given a fast computer and a large dataset, the Maximum Likelihood method is better than Maximum Parsimony.

15
Proteomics and 2-D gel Electrophoresis
16
Introduction

The entire set of proteins expressed by the genome in a cell, organ or organism is referred to as the proteome.
Proteomics: methods that discover and quantify proteins and their biochemical changes.

17
Application of Proteomics

Protein Mining
Network Mapping
Mapping Protein Modifications

18
Proteomics Analysis
Reference: www.mbi.osu.edu/sciprograms/prfmaterials/vandre.ppt
19
2-D Gel Electrophoresis

The horizontal position tells us about the charge
of a protein, whereas the intensity of the gel
spot tells us about the amount of that protein in
the system.
Steps-
1. Prepare protein sample in solution
2. Separate proteins (in each dimension)
I. Based on pH
Using isoelectric focusing (IEF)
Using immobilized pH gradient (IPG) strips
II. Based on molecular weight (size)
Using gel electrophoresis
3. Stain proteins to enable visualization.

20
Introduction to the project

This project focuses on 2D gel electrophoretic separation of proteins.
We analyzed a few random spots from the 2D gels of rat mammary tissue.
Used statistical methods to find the variance in pI of the same protein across different gels.
Analyzed the reasons for these differences.
Inferred the relationship between the experimental values and the predicted values.

21
Images of the gels used in the project.
22
One of the gels with Protein Spots
23

The gels we used came from a previously completed experiment. 28 random protein spots were selected, based on their intensity, from each of the three gels.
Mass Spectrometry
Differentially expressed proteins identified by image analysis were excised from 2D gels and trypsin digested. The resulting peptide fragments were analyzed on a MALDI mass spectrometer (MS).
The MALDI spectrum displays a peptide fingerprint of the protein using the corresponding peptide masses.

24
MALDI TOF MS
25

Proteins were identified by entering the masses (ions from the MALDI spectrum) of the peptides into a peptide mapping database. Some examples of such protein search engines are:
Mascot (very popular, and the one used in this project)
Sequest
Aldente
ProteinLynx
Phenyx

26
Image of a search database
27
Results

We tabulated the results obtained from the internet database search alongside those obtained from the experiment.
We observed that the pI values and the molecular weights of the same protein were not the same in all gels.
The pI values from the three gels were quite similar to each other, but they differed from the predicted pI values.

In a 2D gel the position of a protein spot can shift for various reasons, so the measured molecular weight and pI values may also differ.

29
Graphical representation of pI values of three
gels
30
Graph showing the variance among the predicted pI
and observed pI
31
Observations

The pI values from the three gels, that is, the experimental values, are not very different from each other.
So we can interpret that the variation due to non-biological causes is small in this experiment.
For a few protein spots the internet search returned the same protein name, but our experiment gave different results. This may be because different groups (such as phosphate or sulphate) became attached to the protein.
There can be other reasons too.

The average deviation between the three observed pI values and the predicted pI value was calculated as
[(pI (gel 1) - pred. pI) + (pI (gel 2) - pred. pI) + (pI (gel 3) - pred. pI)] / 3
This gave the results shown in the next slide.
We obtained positive as well as negative values
for the deviations.
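
The average deviation is a signed mean, so its sign shows whether the gels ran below or above the predicted pI. A minimal sketch with made-up numbers (the pI values below are hypothetical, not results from the project):

```python
def average_deviation(observed_pis, predicted_pi):
    """Mean signed difference between observed pI values and the prediction."""
    return sum(pi - predicted_pi for pi in observed_pis) / len(observed_pis)


# Hypothetical pI readings of one protein spot on three gels.
observed = [5.8, 5.9, 6.0]
predicted = 6.2

deviation = average_deviation(observed, predicted)
print(round(deviation, 2))   # negative: spots ran more acidic than predicted
```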

33
Average deviations between the three gels and the
predicted pI
34

We can interpret that the proteins were modified mostly by negatively charged groups, so that their pI values decreased.
The addition of one phosphate group to a serine, threonine, or tyrosine residue typically decreases the isoelectric point by 0.1 pH unit.

35
Regression results

A statistical test was performed to determine which of the three gels was closest to the predicted pI values, that is, in which of the three gels the proteins were least modified.
The test was a calibration test. We prepared a regression model for each gel. The inverse regression equation used was
Predicted pI = (Observed pI from Gel - Intercept) / slope
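
A sketch of how such a calibration could be carried out, assuming the usual approach of regressing the observed gel pI on the database pI by least squares and then inverting the fitted line. The numbers and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def calibrate(observed_pi, intercept, slope):
    """Inverse regression: recover the predicted pI from a gel reading."""
    return (observed_pi - intercept) / slope


# Hypothetical values for one gel (not data from the project).
database_pi = [5.0, 6.0, 7.0, 8.0]
gel_pi = [4.8, 5.7, 6.6, 7.5]

intercept, slope = fit_line(database_pi, gel_pi)
print(round(intercept, 3), round(slope, 3))
print(round(calibrate(6.15, intercept, slope), 3))
```

Fitting one such line per gel allows the calibrated values from the three gels to be compared against the database predictions.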

36
Predicted pI values from the Calibration test and
internet database
37

The results showed that all three gels predicted almost the same pI values, and that these were quite far from the original predicted pI values.
These similarities between the three gels show that the difference between the predicted and the experimental pI values is due less to non-biological factors than to chemical modifications of the proteins.

38
Coexpression analysis of dimerization between bZIP proteins in groups C, S1 and S2 in Arabidopsis thaliana under differential light and CO2 levels.
39
Introduction: Transcription factor

Transcription factors are proteins involved in the regulation of gene expression that bind to the promoter region upstream of genes.
They are composed of two essential functional regions:
DNA binding domain: binds to DNA.
Activator domain: interacts with other regulatory proteins, thereby affecting the efficiency of DNA binding.

40
bZIP proteins

bZIP proteins are a class of transcription factors with a leucine zipper motif: a periodic repetition of a leucine residue at every seventh position, forming an alpha-helical conformation.
The segment that comprises the basic region and the periodic array of leucine residues is referred to as the basic-region leucine zipper, or bZIP motif.
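
The "leucine at every seventh position" pattern can be illustrated with a short scan for the longest run of leucines spaced seven residues apart. This is a toy check on a made-up sequence, not a motif-finding tool, and the function name is hypothetical.

```python
def longest_leucine_heptad_run(sequence):
    """Length of the longest run of L residues spaced exactly 7 apart."""
    best = 0
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Step through the sequence in heptad (7-residue) strides.
        while pos < len(sequence) and sequence[pos] == "L":
            count += 1
            pos += 7
        best = max(best, count)
    return best


# Hypothetical peptide: four heptads, each starting with leucine.
toy_zipper = "LAEKQRS" * 4

print(longest_leucine_heptad_run(toy_zipper))
```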

41
(No Transcript)
42
Some facts

There are 792 bZIP proteins recorded in the nonredundant database.
The numbers of bZIP proteins in selected organisms are as follows:
Yeast: 16
Fruit fly: 110
Plant (Arabidopsis thaliana): 75
Human: 114

43
Arabidopsis

The Arabidopsis genome sequence contains 75 distinct members of the bZIP family, of which 50 are not well studied.
Using common domains, the bZIP family can be subdivided into 10 groups (Groups A - S).

44
(No Transcript)
45
(No Transcript)
46
C and S protein interaction

Ehlert et al. measured interactions between C and S proteins.
C and S1 heterodimerized.
Two S2 proteins dimerized.

47
Effect of Light and CO2 on C and S proteins

Carbohydrate signaling
Increase of carbohydrate partitioning in
elevated CO2, and a decrease in low light.
Seed development
Photosensory system detects the quality,
quantity, direction and duration of light.
Controls developmental pattern.
Stress
Light dependent generation of active oxygen
species is a type of stress called photo
oxidative stress.

48
Experiment Selection Criteria

a) Chose C and S bZIP proteins
Coexpression Engine: http://www.ssg.uab.edu/coexpression
b) Selected tissue and array type
c) Chose specific experiment

49
a) Chose C and S bZIP proteins
50
b) Selected tissue and array type
51
c) Chose specific experiment: NASC Experiments
52
Justification

Biologically feasible comparisons due to similar
Tissue types
Experiment conditions
Statistical
Measurement protocol

53
The tool used

Co-expression Analysis Tool, version 2.0, developed at the Section on Statistical Genetics, UAB: http://obiwan.ssg.uab.edu:8080/coexpression/servlets/CoexpReleasesResponseManager
Mainly built to analyze co-expression in the Arabidopsis plant.
Used NASC experiments on Affymetrix GeneChip profiling of the effect of light and CO2 on leaf development in Arabidopsis.

Uses the database built from the Nottingham Arabidopsis Stock Center (NASC) AffyWatch Service.
Version 2, used in this project, contains a total of 566 microarray chips, of which 486 ATH1 microarray chips were used.

55
NASC Experiments used

4 experiments were conducted to examine the effect of developing leaf insertions under varying conditions of light and CO2.
Sampling was done at 0, 2, 4, 12, 24, 48 and 96 hours using a batch of 24 plants.
Four replicates were produced for each of the seven time points per experiment.
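
The sampling scheme implies a fixed number of samples, which the sketch below enumerates. It assumes the four experiments are the ones numbered 156 to 159 in the procedure and that every combination was sampled; the replicate labels are hypothetical.

```python
from itertools import product

EXPERIMENTS = [156, 157, 158, 159]          # assumed experiment numbers
TIME_POINTS_H = [0, 2, 4, 12, 24, 48, 96]   # sampling times in hours
REPLICATES = [1, 2, 3, 4]                   # hypothetical replicate labels


def sample_design():
    """List every (experiment, time point, replicate) combination."""
    return list(product(EXPERIMENTS, TIME_POINTS_H, REPLICATES))


design = sample_design()
per_experiment = len(TIME_POINTS_H) * len(REPLICATES)
print(per_experiment, len(design))
```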

56
Working of the tool

Linear regression analysis is done on the probe sets.
The regression yields three important values: the slope parameter (indicating the direction of co-expression), the p-value (indicating the confidence in the correlation) and the R squared value (the strength of the correlation).
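
A minimal sketch of a simple linear regression between two expression profiles, returning the slope and R squared along with the t statistic of the slope. The p-value would follow from that t statistic with n - 2 degrees of freedom and is left to a statistics library. The expression values are made up, and this is not the tool's own code.

```python
import math


def simple_regression(x, y):
    """Regress y on x; return slope, R squared, t statistic and df."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / syy
    # Standard error of the slope; a perfect fit gives an infinite t.
    std_err = math.sqrt(ss_res / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else math.inf
    return {"slope": slope, "r_squared": r_squared, "t": t_stat, "df": n - 2}


# Hypothetical expression levels of two probe sets across five chips.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = simple_regression(gene_a, gene_b)
print(round(fit["slope"], 3), round(fit["r_squared"], 4))
```

A positive slope indicates that the two probe sets rise and fall together, and a negative slope that they move in opposite directions.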

57
Procedure

4 genes of the C group, 5 genes of the S1 group and 3 genes of the S2 group were studied in the project.
We submit the AGI IDs, the tissue type (here leaf) and the experiment numbers (in our case 156, 157, 158 and 159) to the tool.
Our genes of interest are regressed on all 22,810 ATH1 probe sets, and a p-value, R squared value and slope parameter are obtained.

Those genes were subsequently sorted according to the R squared value and p-value and ranked such that the higher the R squared value, the higher the rank.
An arbitrary cut-off 15 of the top ranked genes
were identified as highly co-expressed.
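
The ranking step can be sketched as a sort on R squared (descending) with the p-value as tie-breaker, followed by a cut-off. The slide does not make clear whether the cut-off of 15 is a count or a percentage, so the `top_n` parameter here is a placeholder; the probe names and numbers are hypothetical.

```python
def rank_coexpressed(results, top_n):
    """Sort by R squared (high first), then p-value (low first); keep top_n."""
    ranked = sorted(results, key=lambda r: (-r["r_squared"], r["p_value"]))
    return ranked[:top_n]


# Hypothetical regression results for four probe sets.
results = [
    {"probe": "probe_a", "r_squared": 0.42, "p_value": 0.010},
    {"probe": "probe_b", "r_squared": 0.91, "p_value": 0.0001},
    {"probe": "probe_c", "r_squared": 0.42, "p_value": 0.002},
    {"probe": "probe_d", "r_squared": 0.10, "p_value": 0.300},
]

top = rank_coexpressed(results, top_n=2)
print([r["probe"] for r in top])
```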

59
Hypothesis

Genes coding for dimerizing proteins should be
coexpressed at the same time.
If genes in group C and S1 lead to
heterodimerization then they should be
coexpressed at the same time.

60
Table 2: Mapping information between AtbZIP, AGI, ATH Probeset and AtbZIP Group IDs
61
Table 3: Regression estimates between Group C AtbZIP63 (245925_at) and probes in Groups S1, C and S2.
62
Table 4: Regression estimates between Group C AtbZIP25 (251848_at) and probes in Groups S1, C and S2.
63
Regression estimates between Group C AtbZIP9 (246962_s_at) and probes in Groups S1, C and S2.
64
(No Transcript)
65
(No Transcript)
66
(No Transcript)
67
(No Transcript)
68
(No Transcript)
69
(No Transcript)
70
(No Transcript)
71
Results

bZIP1 (Group S1) coexpresses well with bZIP63 (Group C) under conditions of ambient CO2 and low light, but the same coexpression is weak under conditions of elevated CO2 and ambient light.
Also, very minimal interaction was found between
genes of Group C (bZIP25, bZIP10, bZIP9, and
bZIP63) and bZIP9 (Group C

72
Conclusion

This bZIP study was a good litmus test for the SSG Coexpression Tool.
The results presented in this study provide evidence that a good, if not significant, number of AtbZIP proteins that interact as heterodimers are co-regulated under varying conditions of stress.
This study shows that coexpression patterns in genes can be studied by pooling publicly available microarray data, and that the use of a simple linear regression procedure is feasible.

73
Discussion

Varying trends in the coexpression raise some questions:
Different genes are expressed in different tissues. Is a study on leaf enough to support our hypothesis?
Time-course data is valuable and should be accounted for in the analysis. However, this kind of analysis requires more observations recorded at different time points.
Linear regression is good, but would a robust time-series based approach be appropriate in our study?