Genome-Wide Mapping of in Vivo Protein-DNA Interactions

1
Genome-Wide Mapping of in Vivo Protein-DNA
Interactions
  • Johnson et al. (Science, 2007)
  • Presented by Leo J. Lee

2
Outline
  • Background on ChIP-based methods to study
    protein-DNA interactions
  • Salient features of ChIPSeq
  • Overview of the experimental protocol
  • Data analysis pipeline used in the paper
  • Important biological findings/contributions
  • General discussions

3
Protein-DNA interaction
  • DNA is the information carrier of almost all
    living organisms.
  • Protein is the major building block of life.
  • Interactions between DNA and proteins play vital
    roles in the development and normal function of
    living organisms, and can lead to disease when
    something goes wrong.
  • An important mechanism of protein-DNA interaction
    is via direct binding, i.e., a protein binds to a
    particular fragment of the DNA.

4
Chromatin Immunoprecipitation (ChIP)
  • ChIP is a method to investigate protein-DNA
    interaction in vivo.
  • The output of ChIP is enriched fragments of DNA
    that were bound by a particular protein.
  • The identity of the DNA fragments needs to be
    determined by a second method.

5
ChIP-chip (or ChIP-on-chip)
  • ChIP-chip uses microarray technology to determine
    the identity of DNA fragments produced by ChIP.
  • Typically, a control sample (genomic DNA that has
    not gone through ChIP) is used to define the
    relative enrichment of specific sequences in the
    ChIP DNA.
  • It was the dominant high-throughput technique
    before the arrival of ChIPSeq.

6
ChIPSeq Workflow
  • ChIP
  • Size selection (200-700 bp for Exp. 1, 150-300 bp
    for Exp. 2)
  • Solexa sequencing
  • Mapping onto the genome

7
ChIPSeq vs. ChIP-chip
  • The experimental design of ChIPSeq is
    considerably simpler.
  • ChIPSeq typically can achieve higher genomic
    coverage than ChIP-chip (also depends on read
    length vs. probe length).
  • The data from ChIPSeq is arguably cleaner and
    easier to process.
  • Costs are comparable (?).

8
Nice things about NRSF (REST)
  • Considerable knowledge on NRSF has been
    accumulated from previous studies, which provides
    a set of true positives and negatives.
  • Yet there is still room to make new discoveries,
    as illustrated in the paper.
  • The DNA motif bound by NRSF (called NRSE) is long
    and well-specified.
  • There is a high-quality antibody that recognizes
    NRSF efficiently.

10
Sequence Mapping and Filtering
  • Only sequence reads mapped to a unique position
    on the human genome are kept (about 50%).
  • Two mismatches were allowed to accommodate
    polymorphism (and sequencing error).
  • The resulting sequence read distributions are
    processed by a peak locator algorithm to find
    local concentrations of sequence hits and their
    peaks.
  • A minimum five-fold enrichment over the control
    sample is required.
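
The filtering rules on this slide can be sketched roughly as follows. This is only an illustration, not the pipeline used in the paper: the `Alignment` record, its field names, and the pseudocount in `fold_enrichment` are assumptions made for the example, and the read counts are assumed to be already normalized between the ChIP and control samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one mapped read (field names are assumptions)."""
    read_id: str
    chrom: str
    pos: int
    mismatches: int  # mismatches in the best alignment
    n_hits: int      # number of genomic positions the read aligns to


def keep_read(aln: Alignment, max_mismatches: int = 2) -> bool:
    """Keep reads with exactly one genomic position and at most two mismatches."""
    return aln.n_hits == 1 and aln.mismatches <= max_mismatches


def fold_enrichment(chip_count: float, control_count: float,
                    pseudocount: float = 1.0) -> float:
    """ChIP/control ratio for one region.

    The pseudocount is an assumption of this sketch; it avoids division
    by zero when the control has no reads in the region.
    """
    return (chip_count + pseudocount) / (control_count + pseudocount)


def is_enriched(chip_count: float, control_count: float,
                min_fold: float = 5.0) -> bool:
    """Apply the minimum five-fold enrichment requirement."""
    return fold_enrichment(chip_count, control_count) >= min_fold
```

A read that aligns to three places is dropped by `keep_read` regardless of its mismatch count, and a region with 29 ChIP reads against 5 control reads just passes the five-fold cut under this pseudocount convention.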

11
ChIPSeq Peak Locator Algorithm
  • Merge enriched regions within 500bp of one
    another.
  • Apply a triangular 5-point smoothing and identify
    the peak as the coordinate with the greatest
    number of overlapping reads.
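
The two steps above might look like this in code. It is a minimal sketch under stated assumptions: regions are `(start, end)` tuples on one chromosome, "within 500bp" is read as a gap of at most 500 bp, the triangular weights are taken to be 1-2-3-2-1, and the edge handling (renormalizing over the available points) is a choice made here, not something the slide specifies.

```python
def merge_regions(regions, max_gap=500):
    """Merge (start, end) intervals whose gap is at most max_gap bp."""
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def triangular_smooth(coverage):
    """5-point triangular smoothing (weights 1-2-3-2-1).

    Near the ends only the points that exist are used, and the weights
    are renormalized (an assumption of this sketch).
    """
    weights = (1, 2, 3, 2, 1)
    n = len(coverage)
    smoothed = []
    for i in range(n):
        total, norm = 0.0, 0
        for offset, w in zip(range(-2, 3), weights):
            j = i + offset
            if 0 <= j < n:
                total += w * coverage[j]
                norm += w
        smoothed.append(total / norm)
    return smoothed


def call_peak(region_start, coverage):
    """Return the coordinate with the highest smoothed read overlap."""
    smoothed = triangular_smooth(coverage)
    best = max(range(len(smoothed)), key=smoothed.__getitem__)
    return region_start + best
```

Here `coverage` holds the number of overlapping reads at each coordinate of a merged region, starting at `region_start`. Ties are resolved in favor of the leftmost coordinate.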

12
Selecting a read count threshold
  • A ROC curve was obtained by analyzing true
    positives and negatives.
  • A sequence read threshold of 13 was selected to
    reach 98% specificity and 87% sensitivity.
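
As a rough sketch of how such a threshold can be read off a ROC curve, the code below computes sensitivity and specificity for each candidate read-count cutoff. The function names are hypothetical, and treating a site as "called" when its count is greater than or equal to the threshold is an assumption of this example.

```python
def sensitivity_specificity(positive_counts, negative_counts, threshold):
    """Sensitivity and specificity when sites with count >= threshold are called bound.

    positive_counts: read counts at known true-positive sites
    negative_counts: read counts at known true-negative sites
    """
    true_pos = sum(c >= threshold for c in positive_counts)
    true_neg = sum(c < threshold for c in negative_counts)
    return true_pos / len(positive_counts), true_neg / len(negative_counts)


def roc_points(positive_counts, negative_counts):
    """Return (threshold, sensitivity, false-positive rate) for every observed count."""
    thresholds = sorted(set(positive_counts) | set(negative_counts))
    points = []
    for t in thresholds:
        sens, spec = sensitivity_specificity(positive_counts, negative_counts, t)
        points.append((t, sens, 1.0 - spec))
    return points
```

Given the read counts at the known positive and negative sites, one would scan `roc_points` for the lowest threshold whose false-positive rate is acceptable.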

13
Precision of ChIPSeq
  • Evaluated against the center of high-scoring
    canonical NRSE motifs.
  • 94% of these strong motifs fall within 50 bp of
    the called experimental peak.
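
A precision figure of this kind can be computed along the following lines. This is a sketch, assuming motif centers and peak calls are given as integer coordinates on a single chromosome; the function name and the nearest-peak matching rule are choices made for the example.

```python
from bisect import bisect_left


def fraction_within(motif_centers, peak_positions, window=50):
    """Fraction of motif centers whose nearest called peak is within `window` bp.

    Both inputs are coordinates on the same chromosome; motif_centers
    must be non-empty.
    """
    peaks = sorted(peak_positions)
    hits = 0
    for center in motif_centers:
        i = bisect_left(peaks, center)
        # The nearest peak is either just left or just right of the center.
        neighbors = [peaks[j] for j in (i - 1, i) if 0 <= j < len(peaks)]
        if neighbors and min(abs(p - center) for p in neighbors) <= window:
            hits += 1
    return hits / len(motif_centers)
```

For a whole genome, the same function would be applied per chromosome and the hit counts pooled.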

14
Comprehensiveness of ChIPSeq
  • Virtually all strong canonical NRSE motif
    instances are detectably occupied.
  • Most of the sites previously studied by
    transfection analysis are also detected.

15
Motif Visualization

16
Motif Discovery
  • Two new kinds of motifs are discovered:
  • A noncanonical motif with variable spacing
    between the left and right half sites of the
    canonical motif
  • Half-site motifs
  • The enrichment of both kinds of motifs is highly
    statistically significant.
  • The authors are able to tell a nice evolutionary
    story about them.
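
To make the idea of "variable spacing between half sites" concrete, here is a small scanner that looks for a left half site followed by a right half site with a spacer of adjustable length. It is only an illustration: the half-site strings are passed in as parameters, the function name is hypothetical, exact string matching stands in for the scoring a real motif search would use, and only the forward strand is scanned.

```python
import re


def find_spaced_half_sites(sequence, left, right, min_gap, max_gap):
    """Return (start, gap_length) for each left/right half-site pair.

    A pair is reported when `left` is followed by `right` with between
    min_gap and max_gap bases (A, C, G or T) in between. For each start
    position the shortest qualifying gap is reported.
    """
    pattern = re.compile(
        "(?=(" + re.escape(left)
        + "([ACGT]{" + str(min_gap) + "," + str(max_gap) + "}?)"
        + re.escape(right) + "))"
    )
    # The lookahead lets matches that overlap each other be found.
    return [(m.start(), len(m.group(2))) for m in pattern.finditer(sequence.upper())]
```

With the made-up half sites `AAAC` and `GGTT` (placeholders, not the NRSE half sites), the sequence `TTAAACCCGGTTTT` yields one hit at offset 2 with a 2-base spacer.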

17
GO enrichment analysis
  • As expected, NRSF-bound loci are highly enriched
    in gene ontology (GO) terms related to neurons
    and their development.
  • A group of genes encoding transcription factors
    that are critical in driving islet cell
    development in the pancreas is newly discovered.
  • Sequence counts for this group are modest but
    comfortably above the threshold of 13.
  • The authors are able to provide strong arguments
    on the significance of this discovery.
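
For readers unfamiliar with GO enrichment, one common way to score it is a one-sided hypergeometric test, sketched below. The slides do not say which test the authors used, so this is an assumption chosen for illustration, and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_genome, n_term, n_bound, n_overlap):
    """P(overlap >= n_overlap) under random sampling without replacement.

    n_genome:  total number of annotated genes
    n_term:    genes annotated with the GO term
    n_bound:   genes near a bound locus
    n_overlap: bound genes that carry the GO term
    """
    upper = min(n_term, n_bound)
    total = comb(n_genome, n_bound)
    tail = sum(
        comb(n_term, k) * comb(n_genome - n_term, n_bound - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

A small p-value means the bound gene set carries the term more often than expected by chance. In practice the values would also be corrected for the number of GO terms tested.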

18
Discussions
What makes this a Science paper?