Title: Conservation of transcription factor binding sites in plants
Elisabeth Wischnitzki, Klaas Vandepoele, Yves Van
de Peer
Bioinformatics and Evolutionary Genomics,
Department of Plant Systems Biology, Flanders
Interuniversity Institute for Biotechnology
(VIB), Ghent University, Technologiepark 927,
B-9052 Zwijnaarde, Belgium
E-mail: elwis_at_psb.vib-ugent.be
Introduction
Transcription factors regulate the transcription
of genes by binding to specific sites in their
promoters. These binding sites are generally
small motifs of about 6-12 nucleotides. In a
genome-wide search, their small size and
degenerate nature can lead to a high rate of
false-positive instances. To control false
positives, the individual instances have to be
evaluated further to identify the most
significant candidates. This work focuses on
evolutionary conservation as a validation
criterion. The growing number of sequenced
genomes offers the opportunity to study the
evolutionary conservation of motifs across
species. The approach is based on the assumption
that a functional binding site is more likely to
be conserved than a false-positive motif
instance. A motif found in promoter sequences
conserved over orthologous genes is therefore
more likely to be a functional binding site than
a non-conserved instance.
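To make the false-positive problem concrete, the sketch below estimates how often a fixed motif would match by chance. It is a back-of-the-envelope illustration only: it assumes a random sequence with equal base frequencies, exact matching on one strand, and a hypothetical function name (`expected_chance_hits`); none of this is taken from the method described here.

```python
def expected_chance_hits(motif_length, promoter_length):
    """Expected number of exact single-strand matches of one fixed motif
    in a random sequence with equal base frequencies (assumption)."""
    # Number of start positions at which the motif fits completely.
    positions = promoter_length - motif_length + 1
    if positions <= 0:
        return 0.0
    # Each position matches with probability (1/4) ** motif_length.
    return positions * 0.25 ** motif_length


if __name__ == "__main__":
    for k in (6, 8, 12):
        print(k, expected_chance_hits(k, 1000))
```

Under these assumptions a 6-nucleotide motif is expected by chance roughly once in every four 1 kb promoters, which is why individual instances need further evidence such as conservation.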
Method Data sets
Results
Evolutionary conservation
To estimate evolutionary conservation, the
candidate motifs are evaluated using promoter
sequences from orthologous genes. The number of
motif instances found in this set is compared to
a background distribution that integrates
information from several genomes. This yields a
p-value from which the significance of the result
can be estimated. The background distribution for
each motif and orthologous set is built in an
iterative process. Similar sets of genes are
randomly picked from the included genomes,
maintaining the composition of the input set. The
number of genes showing a motif instance is
counted in each set and used to build the
background distribution.
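The iterative procedure above can be sketched as a resampling test. This is a minimal illustration, not the implementation used for the poster: it assumes that "composition" means the number of genes per species, that motif presence per gene has been precomputed, and that the p-value is the empirical fraction of random sets with at least as many motif-containing genes as the input set. All names (`empirical_p_value`, `genes_by_species`, `has_motif`) are hypothetical.

```python
import random
from collections import Counter


def empirical_p_value(input_set, genes_by_species, has_motif,
                      n_iterations=1000, seed=0):
    """input_set: list of (species, gene_id) pairs of one orthologous set.
    genes_by_species: dict mapping species -> list of all gene ids.
    has_motif: set of (species, gene_id) pairs with a motif instance."""
    rng = random.Random(seed)
    # Assumed meaning of "composition": genes per species in the input set.
    composition = Counter(species for species, _ in input_set)
    observed = sum(pair in has_motif for pair in input_set)

    background = []
    for _ in range(n_iterations):
        count = 0
        for species, n_genes in composition.items():
            # Draw the same number of genes from the same genome.
            sample = rng.sample(genes_by_species[species], n_genes)
            count += sum((species, gene) in has_motif for gene in sample)
        background.append(count)

    # Empirical p-value with the usual +1 correction to avoid p = 0.
    at_least = sum(c >= observed for c in background)
    p_value = (at_least + 1) / (n_iterations + 1)
    return observed, background, p_value
```

The +1 correction is a common convention for resampling tests; whether it was used here is not stated in the text.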
The test sets were analyzed for evolutionary
conservation of the putative TGA2 binding sites
in the 1 kb promoter region. To account for the
evolutionary distance between the species
included in the orthologous groups, the binding
site was additionally analyzed allowing one
substitution at a fixed position within one
orthologous group. The target set, containing
the ChIP-confirmed targets, shows significant
conservation for a high number of genes, whereas
the negative set shows a much lower number of
conserved motif instances. The high sensitivity
and specificity of the approach indicate that our
method is able to detect functional binding sites
and distinguish them from false positives.
Additionally, the test for a bias within the
promoter sequences was negative for all tested
instances.
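One possible reading of "one substitution at a fixed position within one orthologous group" is sketched below: a group counts as conserved if a single motif position exists such that every promoter in the group contains the motif either exactly or with a substitution at that one position. This interpretation, the single-strand matching, and the function names are assumptions made for illustration.

```python
def supported_positions(promoter, motif):
    """Motif positions i for which the promoter contains a window that
    matches the motif everywhere except possibly at position i."""
    k = len(motif)
    supported = set()
    for start in range(len(promoter) - k + 1):
        mismatches = [i for i in range(k) if promoter[start + i] != motif[i]]
        if not mismatches:
            # An exact match is compatible with any fixed position.
            return set(range(k))
        if len(mismatches) == 1:
            supported.add(mismatches[0])
    return supported


def conserved_with_fixed_substitution(promoters, motif):
    """True if one motif position explains all promoters of the group."""
    if not promoters:
        return False
    common = set(range(len(motif)))
    for promoter in promoters:
        common &= supported_positions(promoter, motif)
        if not common:
            return False
    return True
```

For example, with the motif TGACG, a group with the promoters `AATGACGTT` (exact match) and `CCTGATGCC` (substitution at the fourth position) is accepted, whereas a group whose members differ from the motif at different positions is not.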
            Target set identified by ChIP-chip    Negative sets with the particular motif
Motif       Instances  Conserved   Conserved with Instances  Conserved   Conserved with Sensitivity  Specificity
                                   fixed mutation                        fixed mutation (%)          (%)
                       n     %     n     %                   n     %     n     %
TGACG       23         3     13    11    48       100        1     1     12    12       48           88
TGACGT      19         3     16    12    63       100        8     8     37    37       63           63
TGACGTCA    16         3     19    13    81       100        8     8     52    52       81           48
TGACGTCATC  7          1     14    5     71       17         0     0     1     6        71           94
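The sensitivity and specificity values in the table are consistent with the definitions sketched below: sensitivity as the percentage of target instances conserved with a fixed mutation, and specificity as the percentage of negative-set instances that are not. These definitions are inferred from the numbers in the table rather than stated in the text, and the function names are hypothetical.

```python
def sensitivity(targets_conserved, targets_total):
    """Percentage of ChIP-chip targets with a conserved motif instance."""
    return round(100 * targets_conserved / targets_total)


def specificity(negatives_conserved, negatives_total):
    """Percentage of negative-set genes without a conserved instance."""
    return round(100 * (negatives_total - negatives_conserved) / negatives_total)


# Counts from the table: motif -> (target instances, targets conserved
# with fixed mutation, negative instances, negatives conserved with
# fixed mutation).
TABLE = {
    "TGACG": (23, 11, 100, 12),
    "TGACGT": (19, 12, 100, 37),
    "TGACGTCA": (16, 13, 100, 52),
    "TGACGTCATC": (7, 5, 17, 1),
}

if __name__ == "__main__":
    for motif, (t_all, t_cons, n_all, n_cons) in TABLE.items():
        print(motif, sensitivity(t_cons, t_all), specificity(n_cons, n_all))
```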
Promoter bias
Additionally, the motifs are tested for a
possible positional bias within the examined
promoter regions (e.g. GC-rich monocot
sequences). Significant motifs are tested for
such a bias by shuffling the promoter sequences,
which maintains the nucleotide frequencies. An
iterative procedure of promoter shuffling and
motif detection yields an estimate of how the
nucleotide composition affects the motif
detection results.
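The shuffling test can be sketched as follows. This is an illustrative version under simplifying assumptions: each promoter is shuffled nucleotide by nucleotide, motif detection is an exact substring search on one strand, and the result is the fraction of shuffled runs with at least as many motif-containing promoters as the real data. The function names are hypothetical.

```python
import random


def shuffle_sequence(sequence, rng):
    """Shuffle a sequence, preserving its nucleotide frequencies."""
    chars = list(sequence)
    rng.shuffle(chars)
    return "".join(chars)


def shuffled_counts(promoters, motif, n_iterations=1000, seed=0):
    """Number of promoters containing the motif in each shuffled run."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_iterations):
        shuffled = [shuffle_sequence(p, rng) for p in promoters]
        counts.append(sum(motif in p for p in shuffled))
    return counts


def composition_bias_fraction(promoters, motif, n_iterations=1000, seed=0):
    """Fraction of shuffled runs that reach the observed count.
    A high value suggests that nucleotide composition alone could
    explain the observed motif occurrences."""
    observed = sum(motif in p for p in promoters)
    counts = shuffled_counts(promoters, motif, n_iterations, seed)
    return sum(c >= observed for c in counts) / len(counts)
```

A low fraction would correspond to a negative bias test in the sense used above: shuffled promoters with the same composition rarely reproduce the observed number of motif instances.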
The still high number of conserved instances in
the negative sets has several reasons. The motifs
are very similar to other known binding sites,
and allowing some degeneracy within the motif
might have picked up another conserved binding
site. In addition, other transcription factors
also use this motif. A conserved binding site
might therefore be functional, but not for the
particular transcription factor used in the ChIP
experiment.
Data sets
- Motifs: AGRIS, PLACE, additional motifs from
  literature
- Positive set
  - 42 TGA2 transcription factor targets
    identified in ChIP-chip experiment
  - orthologous genes available
- Negative sets
  - Up to 100 genes with a motif instance which
    were not bound in the ChIP-chip experiment
- Known binding sites for TGA2
  - TGACG
  - TGACGT
  - TGACGTCA
  - TGACGTCATC
Conclusion
The results of this work indicate that our
approach, which uses comparative genomics data to
estimate evolutionary conservation, is a powerful
method to identify functional binding sites and
to distinguish them from false-positive motif
instances. The approach is also applicable to a
variety of datasets and scientific tasks, which
makes it a versatile tool to detect conserved and
potentially functional transcription factor
binding sites.
Citations
Thibaud-Nissen et al., Development of
Arabidopsis whole-genome microarrays and their
application to the discovery of binding sites for
the TGA2 transcription factor in salicylic
acid-treated plants. Plant J. 2006
Jul;47(1):152-62.