Day 5-2 - PowerPoint PPT Presentation

About This Presentation
Title: Day 5-2

Description: Agilent. Custom oligonucleotide arrays. Nimblegen Systems, Inc. ... Agilent eArray probe database >21 million tiled CGH and ChIP-on-chip probes. Do it yourself ...

Slides: 23
Provided by: dbrm


Transcript and Presenter's Notes


1
Day 5-2
What bioinformatics tools can be used for
analysing ChIP data?
2
After this seminar
  • You should be able to:
      • Understand the differences between ChIP-chip and
        ChIP-seq and identify the key decision-making steps
        for choosing a platform
      • Identify the bioinformatics steps needed for handling
        ChIP-chip and ChIP-seq datasets
      • Understand the underlying data from genome tiling
        arrays
      • Understand how to search for binding sites in
        genomic data
      • Understand the need for skills in handling large
        datasets

3
General problem
  • Find accessible regions of DNA that are bound to
    your protein.
  • What method is best?
  • What sort of bioinformatics skills are required?
  • What is real signal and what is noise?
  • What do we do with the regions once we have
    identified them?

Zheng, M. et al. (2007) ChIP-chip data, model,
and analysis. Biometrics, Vol 63, 787-796.
4
Experimental methods give different types of data
  • ChIP-chip
      • microarray data defining genomic regions
      • probe expression (probe position usually
        defined)
  • ChIP-seq
      • high-throughput DNA sequence
      • sequence fragments such as ACGATGTCA (from
        Solexa/SOLiD/454)
      • sequence position undefined (search required)
  • The same issues exist for microarray vs. deep
    sequencing in gene expression experiments:
      • coverage
      • cost
      • practicality

5
Raw (sequence) data
  • Flat files, processed from base calls to FASTA
    format
  • Solexa
      • 25-30 bp reads
      • A barcode is used to pool several samples in one
        sequencing run
          • ACGT: Expt1
          • TGAC: Expt2
          • reads then look like ACGTSequence and
            TGACSequence
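The barcode pooling above can be undone in a few lines. This is a minimal sketch, assuming every read starts with an exact 4-base barcode (ACGT/TGAC and Expt1/Expt2 are the slide's toy examples) and that reads with an unknown barcode are simply set aside; real demultiplexers usually also tolerate a mismatch in the barcode.

```python
from collections import defaultdict

# Toy barcode table from the slide: barcode prefix -> experiment.
BARCODES = {"ACGT": "Expt1", "TGAC": "Expt2"}


def demultiplex(reads, barcodes=BARCODES):
    """Split pooled reads by their leading barcode and strip it off.

    Assumes all barcodes have the same length and must match exactly.
    Returns (reads per experiment, reads with no known barcode).
    """
    width = len(next(iter(barcodes)))
    by_expt = defaultdict(list)
    unassigned = []
    for read in reads:
        expt = barcodes.get(read[:width])
        if expt is None:
            unassigned.append(read)
        else:
            by_expt[expt].append(read[width:])  # keep only the sequence after the barcode
    return dict(by_expt), unassigned


if __name__ == "__main__":
    pooled = ["ACGTGATTACA", "TGACCCGGTTA", "GGGGTTTT"]
    assigned, unknown = demultiplex(pooled)
    print(assigned)  # {'Expt1': ['GATTACA'], 'Expt2': ['CCGGTTA']}
    print(unknown)   # ['GGGGTTTT']
```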

6
Choice of experiment
  • Choice of experiment depends on the focus you
    require
  • Whole genome broad coverage (of known genome)?
  • or focused genomic region?
  • or discovery based (known or unknown genome)?
  • How much coverage do you need?
  • Fewer broad experiments vs. many focused
    experiments?
  • Custom chips can be easily designed for focused
    regions and custom applications.
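For sequencing-based designs, "how much coverage do you need?" can be answered roughly with the standard Lander-Waterman estimate: mean coverage C = N × L / G for N reads of length L over a genome of G bases, with about 1 - e^(-C) of the genome hit at least once if reads fall uniformly at random. The sketch below assumes that uniform model and uses illustrative numbers (10 million 30 bp reads, a ~3.1 Gb genome). In ChIP-seq most reads pile up in enriched regions, so a genome-wide average is only a rough planning figure.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Lander-Waterman mean coverage: C = N * L / G."""
    return n_reads * read_length / genome_size


def fraction_covered(coverage):
    """Expected fraction of bases hit at least once, assuming reads land
    uniformly at random (Poisson model): 1 - exp(-C)."""
    return 1.0 - math.exp(-coverage)


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Illustrative numbers only: 10 million 30 bp reads, ~3.1 Gb genome.
    c = mean_coverage(10_000_000, 30, 3.1e9)
    print(f"mean coverage {c:.3f}x, covered fraction {fraction_covered(c):.3f}")
    # -> mean coverage 0.097x, covered fraction 0.092
```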

7
ChIP Workflow
  • ChIP-chip
      • Select antibody
      • Select chip or design and select probes
      • Map array probes to genomic position (BLAST/BLAT
        or lookup table from chip supplier)
      • Identify peaks from data and minimise false
        positives
      • Analyse peaks to predict binding sites
  • ChIP-seq
      • Select antibody
      • Decide how deep to sequence ( vs. coverage)
      • Sequence fragments
      • Map sequence to genomic position (BLAST/BLAT)
      • Identify peaks from data and minimise false
        positives
      • Analyse peaks to predict binding sites

8
ChIP-chip Ringo Workflow example
9
ChIP output
  • Peaks on the genome
  • Score for each genomic position

BMC Bioinformatics 2007, 8:219
10
Antibody selection
  • Success depends on your antibody
  • Select antibodies that are suitable for ChIP-chip
    experiments
      • only a small number so far!
  • List available from
    http://www.chiponchip.org/antibody.html

11
Microarray companies
  • DNA microarrays suitable for ChIP-chip assays
  • Affymetrix
      • Human Chr21/22 tiling microarrays
        (oligonucleotide arrays)
      • Human ENCODE tiling arrays (oligonucleotide
        arrays)
  • Agilent
      • Custom oligonucleotide arrays
  • Nimblegen Systems, Inc.
      • Human promoter microarrays
      • Human ENCODE microarrays
      • Custom oligonucleotide arrays
  • Aviva Systems Biology
      • Hu5K promoter arrays (PCR product arrays)
      • Hu20K promoter arrays (oligo arrays)

12
Probe Design
  • Tiling
      • high-resolution arrays
      • target genomic regions of interest
      • whole genome or specific targeted regions?
  • Agilent eArray probe database
      • >21 million tiled CGH and ChIP-on-chip probes
  • Do it yourself
      • unassembled genomes, etc.

13
Mapping to genome
  • The genome is not fixed: assemblies are still being
    revised, especially for many organisms
  • You must map the probe/sequence to its genomic
    location using
      • standard alignment software (BLAST/BLAT/vmatch/...)
      • or datafiles from the vendor (recommended
        for most cases)
  • R packages exist for annotating probes to genomic
    location
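Once each probe has a genomic position (for example from the vendor's lookup table), a common next step is to annotate it with the nearest gene. A minimal sketch, assuming a hypothetical in-memory layout (probe ID -> (chromosome, position), and per-chromosome gene start lists) and ignoring strand; probe and gene coordinates must come from the same genome build, which is why the changing assemblies mentioned above matter.

```python
import bisect


def annotate_probes(probe_positions, genes_by_chrom):
    """Attach the nearest gene start to each probe (strand ignored).

    probe_positions: {probe_id: (chrom, position)}, e.g. parsed from a
        vendor lookup table (hypothetical format).
    genes_by_chrom: {chrom: [(start, gene_name), ...]} sorted by start.
    Returns {probe_id: (chrom, position, gene_name, distance)}; probes on
    chromosomes with no genes get (chrom, position, None, None).
    """
    starts_by_chrom = {c: [s for s, _ in g] for c, g in genes_by_chrom.items()}
    annotated = {}
    for probe_id, (chrom, pos) in probe_positions.items():
        genes = genes_by_chrom.get(chrom, [])
        if not genes:
            annotated[probe_id] = (chrom, pos, None, None)
            continue
        i = bisect.bisect_left(starts_by_chrom[chrom], pos)
        # The nearest start is either just before or just after pos.
        neighbours = [genes[j] for j in (i - 1, i) if 0 <= j < len(genes)]
        start, name = min(neighbours, key=lambda g: abs(g[0] - pos))
        annotated[probe_id] = (chrom, pos, name, abs(start - pos))
    return annotated


if __name__ == "__main__":
    genes = {"chr1": [(1000, "geneA"), (5000, "geneB")]}  # made-up genes
    probes = {"p1": ("chr1", 1200), "p2": ("chr1", 4500), "p3": ("chr2", 100)}
    for probe_id, info in annotate_probes(probes, genes).items():
        print(probe_id, info)
```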

14
Mapping to genome
  • For sequence-based methods this step is critical
    (and slow)
  • Needs a Unix server to run (or VMware)
  • Do I need access to a computing cluster?
  • Choice of parameters for short sequences
  • Filter raw sequences -> representative sequence
    set
  • Do I need to pre-filter data (some sequences will
    account for most of the compute time)?
  • Must be aware of speed vs. specificity for large
    datasets
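The mapping and pre-filtering steps above can be illustrated with a toy exact-match mapper. This is a sketch only, not how BLAST, BLAT or vmatch work. It first collapses raw reads into unique sequences with counts, so each distinct sequence is mapped once. It then hashes every k-base seed of the genome and verifies each candidate hit against the full read on both strands. It allows no mismatches, so a single sequencing error leaves a read unmapped. The toy genome and the seed length k are illustrative assumptions.

```python
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def collapse_reads(reads):
    """Filter raw reads to a representative set: unique sequence -> count."""
    return Counter(reads)


def build_index(genome, k):
    """Hash every k-mer (seed) of the genome to its 0-based start positions."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def map_read(read, genome, index, k):
    """Exact seed-and-verify mapping on both strands.

    Looks up the first k bases of the read (and of its reverse complement)
    in the index, then checks the whole read against the genome.
    Returns a list of (forward-strand position, strand) hits.
    """
    hits = []
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for pos in index.get(seq[:k], []):
            if genome[pos:pos + len(seq)] == seq:
                hits.append((pos, strand))
    return hits


if __name__ == "__main__":
    genome = "TTGATTACAGGCATGCC"  # toy genome
    index = build_index(genome, k=4)
    for read, count in collapse_reads(["GATTACA", "GATTACA", "GCCTGTA"]).items():
        print(read, count, map_read(read, genome, index, k=4))
    # GATTACA 2 [(2, '+')]
    # GCCTGTA 1 [(5, '-')]
```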

15
Normalisation
  • A normalization procedure:
      • (a) The MA plot before normalization shows a need
        for rotation to correct dye bias.
      • (b) To determine the correct angle of rotation,
        the s(M) vs s(A) plot of the differences between
        probes is generated. This circumvents the effect
        of binding signal in determining the rotation
        angle for the original MA plot in (a).
      • (c) The MA plot after rotation by the angle
        determined in (b). The green line is the fitted
        line after rotation.
      • (d) The MA plot after normalization.
  • BMC Bioinformatics 2007, 8:219.

An MA plot is a scatterplot with transformed axes.
The x-axis (A) represents the average log intensity
of the 2 channels, while the y-axis (M) represents
their log-ratio.
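The rotation idea can be sketched in Python. Assume R is the ChIP channel and G the reference channel, with M = log2(R/G) and A = (log2 R + log2 G)/2 for each probe. Rotating the (A, M) cloud by the angle of its fitted trend makes that trend horizontal, and subtracting the median then centres M at zero. This is a simplified stand-in, not the method in the figure: the angle comes from an ordinary least-squares slope of M on A. Genuinely bound probes can pull that slope, which is exactly what the s(M) vs s(A) difference plot in (b) is designed to avoid.

```python
import math
import statistics


def ma_values(red, green):
    """Per-probe M (log2 ratio) and A (mean log2 intensity)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a


def rotate_ma(m, a):
    """Rotate the MA cloud so its fitted trend is horizontal, then centre M.

    Simplification: the angle is atan of the least-squares slope of M on A,
    not the difference-based estimate used in the figure.
    Returns (normalised M, rotated A).
    """
    mean_a, mean_m = statistics.fmean(a), statistics.fmean(m)
    cov = sum((x - mean_a) * (y - mean_m) for x, y in zip(a, m))
    var = sum((x - mean_a) ** 2 for x in a)
    theta = math.atan(cov / var)  # angle of the fitted line
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Rotate each (A, M) point by -theta.
    a_rot = [x * cos_t + y * sin_t for x, y in zip(a, m)]
    m_rot = [-x * sin_t + y * cos_t for x, y in zip(a, m)]
    centre = statistics.median(m_rot)
    return [y - centre for y in m_rot], a_rot
```

With a pure dye bias, where every probe satisfies M = 0.5A + 1, all normalised M values come out as zero.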
16
Peak detection
  • What regions of DNA contain signal peaks?
  • How to define a statistically significant peak?

Zheng, M. et al. (2007) ChIP-chip data, model,
and analysis. Biometrics, Vol 63, 787-796.
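One simple way to call peaks from probe scores ordered along the genome is sketched below. It is a heuristic, not the statistical model of Zheng et al. (2007). Scores are smoothed with a running median. A cut-off is then derived by assuming unbound probes are symmetric around the median, so their spread can be estimated from values below the median, where bound probes rarely fall. Runs of consecutive probes above the cut-off become peaks. The window half-width, z and min_probes values are illustrative assumptions.

```python
import math
import statistics


def running_median(scores, half_width):
    """Smooth probe scores (ordered by genomic position) with a running
    median over up to 2 * half_width + 1 neighbouring probes."""
    n = len(scores)
    return [
        statistics.median(scores[max(0, i - half_width):min(n, i + half_width + 1)])
        for i in range(n)
    ]


def null_threshold(scores, z=3.0):
    """Heuristic cut-off: median + z * spread, where the spread is the RMS
    deviation of the values below the median (assumed mostly unbound)."""
    centre = statistics.median(scores)
    lower = [s - centre for s in scores if s < centre]
    if not lower:
        return centre  # nothing below the median to estimate a spread from
    spread = math.sqrt(sum(d * d for d in lower) / len(lower))
    return centre + z * spread


def call_peaks(positions, scores, half_width=2, z=3.0, min_probes=3):
    """Return (first, last) probe positions of every run of at least
    min_probes consecutive smoothed scores above the cut-off."""
    smooth = running_median(scores, half_width)
    cutoff = null_threshold(smooth, z)
    peaks, run = [], []
    for pos, s in zip(positions, smooth):
        if s > cutoff:
            run.append(pos)
            continue
        if len(run) >= min_probes:
            peaks.append((run[0], run[-1]))
        run = []
    if len(run) >= min_probes:
        peaks.append((run[0], run[-1]))
    return peaks
```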
17
Normalisation
  • Before normalization
      • the mock control appears to show the same
        differential enrichment between genic and
        intergenic regions as the histone occupancy,
        suggesting that the differential enrichment may
        be an artifact.
  • After normalization
      • the mock control no longer shows significant
        differential enrichment, while the H3 and H4
        profiles still do.
  • Peng et al. BMC Bioinformatics 2007, 8:219,
    doi:10.1186/1471-2105-8-219

18
Noise
  • Contamination
      • Do sequences match the expected genome?
  • Sequencing errors
      • Can you determine where a sequencing error is?
  • Multiple-mapping sequences
      • Many sequences do not have unique genome matches
  • Dye-specific bias
  • ChIP-chip data for chromatin-associated proteins
    and histone modifications present additional
    challenges
      • they often display broad regions of
        enrichment, in contrast to the isolated,
        sharp peaks that are typical of transcription
        factor binding.
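The contamination and multiple-mapping checks above come down to counting how many genomic hits each read has. A minimal sketch, assuming the mapper's output has already been gathered into a dict of read ID -> list of hits (a hypothetical layout). Keeping only uniquely mapped reads is one common choice, not the only one. A high unmapped fraction only hints at contamination, since sequencing errors also prevent matches.

```python
def classify_reads(hits_by_read):
    """Split reads by how many genomic locations they match.

    hits_by_read: {read_id: [hit, ...]} from any mapper.
    Returns (unique, multi, unmapped): unique maps read_id -> its single
    hit; multi and unmapped are lists of read IDs.
    """
    unique, multi, unmapped = {}, [], []
    for read_id, hits in hits_by_read.items():
        if len(hits) == 1:
            unique[read_id] = hits[0]
        elif hits:
            multi.append(read_id)
        else:
            unmapped.append(read_id)
    return unique, multi, unmapped


def mapping_summary(hits_by_read):
    """Fractions of unique, multiple-mapping and unmapped reads."""
    unique, multi, unmapped = classify_reads(hits_by_read)
    total = len(hits_by_read)
    return {
        "unique": len(unique) / total,
        "multi": len(multi) / total,
        "unmapped": len(unmapped) / total,
    }


if __name__ == "__main__":
    hits = {
        "r1": [("chr1", 100)],               # unique match
        "r2": [("chr1", 5), ("chr3", 9)],    # multiple-mapping
        "r3": [],                            # no match: error or contaminant?
    }
    print(mapping_summary(hits))
```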

19
Peak detection - replicates
  • Use replicates to improve detection
  • Peaks that are consistent between replications
    are more likely to be true

Zheng, M. et al. (2007) ChIP-chip data, model,
and analysis. Biometrics, Vol 63, 787-796.
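A simple way to act on "peaks that are consistent between replicates are more likely to be true" is an overlap rule. The sketch below is that rule, not the statistical treatment of replicates in Zheng et al. (2007). It keeps a peak from the first replicate only if overlapping peaks occur in enough replicates, counting the first. Taking the first replicate as the reference and the min_support value are arbitrary illustrative choices.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base
    (closed coordinates)."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def consistent_peaks(replicates, min_support=2):
    """Keep peaks from the first replicate seen in at least min_support
    replicates (the first replicate counts as one).

    replicates: list of peak lists, one per replicate; each peak is a
    (chrom, start, end) tuple.
    """
    reference, others = replicates[0], replicates[1:]
    kept = []
    for peak in reference:
        support = 1 + sum(any(overlaps(peak, p) for p in rep) for rep in others)
        if support >= min_support:
            kept.append(peak)
    return kept


if __name__ == "__main__":
    rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
    rep2 = [("chr1", 150, 250), ("chr2", 300, 400)]
    rep3 = [("chr1", 590, 700)]
    print(consistent_peaks([rep1, rep2, rep3], min_support=2))
    # [('chr1', 100, 200), ('chr1', 500, 600)]
```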
20
What next?
  • Given that you've identified accessible regions
    in the genome:
      • what information can be gathered from this
        sequence?
  • Use discovery methods to look for common patterns
    in the regions
      • MEME, etc.
  • Use TFBS databases to look for known
    transcription factor binding sites in the
    sequence
      • Transfac
          • High coverage
          • Noisy database
      • Jaspar
          • Low coverage
          • Higher quality?
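Searching for known binding sites with matrices from databases such as Transfac or Jaspar usually means scanning the sequence with a position weight matrix. A minimal sketch follows. The count matrix, the 0.5 pseudocount, the uniform background and the score threshold are all made-up illustrations, not taken from either database. Each column of counts becomes log2-odds scores against the background. Every window on both strands is summed and reported when it reaches the threshold.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Convert a count matrix (one {base: count} dict per motif position)
    into log2-odds scores against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (forward-strand start, strand, score) for every window on
    either strand whose summed log-odds score is >= threshold."""
    width = len(matrix)
    rc = sequence.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    hits = []
    for strand, seq in (("+", sequence), ("-", rc)):
        for i in range(len(seq) - width + 1):
            window = seq[i:i + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(col[b] for col, b in zip(matrix, window))
            if score >= threshold:
                start = i if strand == "+" else len(seq) - width - i
                hits.append((start, strand, score))
    return hits


if __name__ == "__main__":
    # Hypothetical GATA-like counts (10 sites per column), for illustration only.
    counts = [{"G": 9, "A": 1}, {"A": 10}, {"T": 10}, {"A": 9, "G": 1}]
    pwm = log_odds_matrix(counts)
    print(scan("CCGATACC", pwm, threshold=5.0))
```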

21
R packages for ChIP-chip
  • Ringo
      • Well-documented workflow and good tutorial
  • BAC
      • Perfect example of minimal documentation
      • Bayesian Analysis of ChIP-chip data

22
Summary
  • You should be able to:
      • Understand the differences between ChIP-chip and
        ChIP-seq and identify the key decision-making steps
        for choosing a platform
      • Identify the bioinformatics requirements for handling
        ChIP-chip and ChIP-seq datasets
      • Find transcription factor binding sites in
        genomic data
      • Understand the need for skills in handling large
        datasets