Algorithms in Bioinformatics: A Practical Introduction - PowerPoint PPT Presentation

Provided by: Sung149
1
Algorithms in Bioinformatics: A Practical Introduction
  • Project
  • Motif finding using ChIP-seq peak data

2
Transcriptional Control (I)
3
Transcriptional Control (II)
TATAAT is the motif!
4
Motif model
TTGACA TCGACA TTGACA TTGAAA ATGACA TTGACA GTGACA
TTGACT TTGACC TTGACA
Consensus Pattern
TTGACA
Positional Weight Matrix (PWM)
  • A motif can be described in two ways, both
    derived from the discovered binding sites: as a
    consensus pattern or as a PWM.
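
As a minimal sketch of how both descriptions can be derived from the ten binding sites listed on this slide: the code below counts the bases at each position, reads off the consensus, and builds a frequency matrix. The function names and the pseudocount of 1 are illustrative assumptions, not part of the project specification.

```python
from collections import Counter

# The ten binding sites listed on the slide.
SITES = ["TTGACA", "TCGACA", "TTGACA", "TTGAAA", "ATGACA",
         "TTGACA", "GTGACA", "TTGACT", "TTGACC", "TTGACA"]

def count_matrix(sites):
    """Return one Counter of base counts per motif position."""
    return [Counter(column) for column in zip(*sites)]

def consensus(sites):
    """Pick the most frequent base at each position."""
    return "".join(col.most_common(1)[0][0] for col in count_matrix(sites))

def frequency_matrix(sites, pseudocount=1.0):
    """Per-position base frequencies; the pseudocount is an assumed choice."""
    total = len(sites) + 4 * pseudocount
    return [{base: (col[base] + pseudocount) / total for base in "ACGT"}
            for col in count_matrix(sites)]

print(consensus(SITES))  # prints TTGACA
for position, freqs in enumerate(frequency_matrix(SITES), start=1):
    print(position, {base: round(f, 2) for base, f in freqs.items()})
```

The pseudocount keeps every frequency above zero, which matters if the matrix is later turned into log-odds scores.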

5
ChIP experiment
  • Chromatin immunoprecipitation experiment
  • Detects interactions between a protein
    (transcription factor) and DNA.

6
Peak data
  • Peak data represent the locations where a
    particular TF binds.
  • The data give both the locations and the
    intensities.
  • (Note that, due to experimental error, peaks of
    low intensity may be noise.)

ChIP-seq data for human (MCF7) cells, E2 treatment at 45 min
chr1883,686-958,485
7
Our aim
  • Given the DNA sequences of those peaks, find
    motifs which occur in those peak regions.
  • For the example below, we have two motifs, TTGACA
    and GCATC.
  • Note that each instance has at most 1 mutation.

GCACGCGGTATCGTTAGCTTGACAATGAAGAATCCCCCCGCTCGACAGT
GCATACTTTGACACTGACTTCGCTTCTTTAATGTTTAATGAAACATGCG
CCCTCTGGAAATTAGTGCGGCATCTCACAACCCGAGGAATGACCAAATG
GTATTGAAAGTAAGGCAACGGTGATCCCCATGACACCAAAGATGCTAAG
CAACGCTCAGGCAACGTTGACAGGTGACACGTTGACTGCGGCCTCCTGC
GTCTCTTGACCGCTTAATCCTAAAGGCCTCCTATTAGTATCCGCAATGT
GAACAGGAGCGCGAGCCATCAATTGAAGCGAAGTTGACACCTAATAACT
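
The "at most 1 mutation" condition can be illustrated with a simple Hamming-distance scan. This is only a sketch for checking a known motif against a sequence, not a motif-finding method; the function names and the `max_mismatches` parameter are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_instances(seq, motif, max_mismatches=1):
    """Return (position, window, mismatches) for every window of seq
    that differs from motif in at most max_mismatches positions."""
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        d = hamming(window, motif)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits

# Second example sequence from the slide (0-based positions).
seq = "GCATACTTTGACACTGACTTCGCTTCTTTAATGTTTAATGAAACATGCG"
for hit in find_instances(seq, "TTGACA"):
    print(hit)
```

The scan only covers the forward strand; searching the reverse complement as well would be a natural extension.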
8
Input (I)
  • From every peak, we get the DNA sequence within
    approximately +/-200 bp of the peak.
  • >cmyc_1_chr1_4842133_4842148_range_chr1_4841934_4842348_intensity_20
  • CCTCCATACCAGCCCCAATGTTCTGCGTTCCCGAATGAAAGACACACAAC
    ACAGCCTTTATATTTTGATATGCCTAAAACTGCTCAATGGCTGGGCCACT
    TCCTAGCTAGTATCCACGTGGCTATCCCACCTCTCTCTGATATTCCCAAG
    TCATTACTTACTAAAATCTGTAATTACATCTTTGCTGCCCTAGGCCCAAT
    CTGGCAGCCCTCCTGTGGCCCCTCAGGCTACTACATGGCAGCTAAGCTCT
    CTGACCCACATCTTCTCAGGCACCGTGCCTCCTCTTCTCCACCTTATTCA
    AACATGGTGGCTCTCCTTCCTCCTTCTTCCTGTCTGTCCCCAGCCTGGGA
    ATTCTAAAAGTCCCACCTCTGTCTGCCCTGTTCAGCCATTGGCTGTCGGC
    ATCTTTATTTACGAG
  • >cmyc_2_chr1_5073201_5073215_range_chr1_5073002_5073415_intensity_15
  • GGTCATAAACCAAGCTTCTTCAAAGATTTTTGGCTTTTTGGCACCAGTGG
    CCTGCAGGGTGGCGAGCTCTGCCAGTTTGAAGTGACCAAGTTAAGTGGCC
    TGGGAAAGGCCATTTGGTGCGCGGTCCAGCAGTTTTGGGCGCTCTCGGCT
    TCCGCCCTCAGCTGCGGTCACGTGCGGCTGCTCACGTGCCAGACGCTGCT
    GTCACTTCGTAGCTGTTCCGGCTTCCTCTGAGTGAGGCTCGCAACGTCTC
    CCACGGAGTCGCCTTCGTTCTGCTCTGGGTCTCCCGTGGCCACTGAGACC
    TCGGAGCTCGACCGGCGCCTGCCCGCCCGTGCGGCCCTCACTCCCCGAGG
    CTATCCAGGTGAGGCCGCCTGGGGTCCCTCCCCGGCTCCGGAGAGCCGAC
    TGGTTTCCCTGCCG
  • >cmyc_3_chr1_9530642_9530652_range_chr1_9530443_9530852_intensity_36
  • GTAGTCCCAACCAGGTCCTGAGCTGGTTAGCCAACCCTCAGCGCCAGTCG
    GGCCAACATCCGGTGACGAATCCAAGTCCCGCCTCTAAGCCCATCTGCTG
    TCCAATGCCGCCCTCTGCCGGTCTTTACCTCCCCGCCTAGCTGTGAGCCG
    CTTCCAGACAACCCGGAAGTGATCTTTCCTCTTCCGGATTACGGGTCCGG
    ACGTCCGCACGTGGTTGCCGGTTTAGGGTGCTGCTGTAGTGGCGATACGT
    CCCGCCGCTGTCCCGAAGTGAGGGATCCGAGCCGCAGCGAGAGCCATGGA
    GGGCCAGCGCGTGGAGGAGCTGCTGGCCAAGGCAGAGCAGGAGGAGGCGG
    AGAAGCTGCAGCGCATCACGGTGCACAAGGAGCTGGAGCTGGAGTTCGAC
    CTGGGCAACC
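
Each header line appears to encode the peak location, the extracted range, and the peak intensity. The sketch below splits a header into those fields, assuming the layout `name_chrom_peakStart_peakEnd_range_chrom_rangeStart_rangeEnd_intensity_value` seen in the three examples; the regular expression and field names are hypothetical and may need adjusting for the full input file.

```python
import re

# Pattern inferred from the example headers; an assumption, not a spec.
HEADER_RE = re.compile(
    r"^>?(?P<name>.+?)_(?P<chrom>chr[^_]+)_(?P<peak_start>\d+)_(?P<peak_end>\d+)"
    r"_range_(?P<range_chrom>chr[^_]+)_(?P<range_start>\d+)_(?P<range_end>\d+)"
    r"_intensity_(?P<intensity>\d+)$"
)

INT_FIELDS = ("peak_start", "peak_end", "range_start", "range_end", "intensity")

def parse_header(line):
    """Split one header line into a dict; numeric fields become ints."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised header: {line!r}")
    fields = match.groupdict()
    for key in INT_FIELDS:
        fields[key] = int(fields[key])
    return fields

header = ">cmyc_1_chr1_4842133_4842148_range_chr1_4841934_4842348_intensity_20"
info = parse_header(header)
print(info["name"], info["chrom"], info["intensity"])
# Distance from each end of the range to the peak (199 and 200 here).
print(info["peak_start"] - info["range_start"],
      info["range_end"] - info["peak_end"])
```

Keeping the intensity alongside each sequence makes it possible to down-weight or filter low-intensity peaks later.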

9
Input (II)
  • A set of background sequences that are unlikely
    to contain the motif.
  • >SEQ_1
  • AACAAGGGAAAGAGTAGTGAGTGCTTCTTTCTATTCAGAGGGAGGGGAAG
    TTGCTGTTAGCTAAGACAGTCAGGACTGAGAAGGGGGGGGGGGGTTTAAC
    TCTCCTGGAGGGAGCTGAGAGGTAAAGGGAGGGGCGTGAGGTAGAACAAG
    CCGAGAACACAGGGCAGGTTGGTCTGACTCCAGAGCACAGTGCAGGAGCC
    CGGAAGTTGACTCAGTTCAGTTAGCAAGTATTTTCACACAAGGCGTGAAC
    ACTGAAGACAAAAGCAAGAGACACAGCTCTATCTCTAAGAAGATTTTCAG
    AGCCAAGATCGATGGGGCACACCTGTTAATCCCAGCACTTAGGAGGCTGA
    GGCAGGAGGATCCCAAGTTCAAAACCAGCCTGGACTTGTTTTAAGGAAAA
  • >SEQ_2
  • AAAAAAAAAAAAAAGACTTCCAGTTTAATAAATGACCAATTCAGGAATGG
    AGATTAGGGCTGGATGACAAGTTTTTAATTGTCAAGGACTCAATTCTGTT
    TATCAGTTGGTATGGAATTATGTAAGCTTTTAGCGATATGACCGCACGGA
    GCAGTGTAGAGAGTGATCTGAGAGACGCTTGGGGGTCAGGATGGAGATAG
    AACTCCCTCTCTATTAGAAGGTGTTTGGTGGTAGGTAACCCTGGGCTAGC
    ATGGTGGGTCTCTTCTTACTTAGGCTTCCATCTTTGTGGTTCAAATCCAA
    GAAGGACCTGCGTTCCCTCCCTCCTTGTGATCAGCTGATTGCTAGAGCAT
    AACTCATCTTAACTTCTCATGTACTCTCCGGGTACAGGAAGGGAGGGGGC
  • >SEQ_3
  • CCACTGCTGACAGTGGAGCATGAAACGACCGGCTTCCTGACTATGTTGGT
    ACCCTTTCAGGAGCCTAAAACAGTGCTTTCAATACTTGTGTCTATGTCTG
    TTAGCCACAACTTTCTAGTTTCCCAGAGAGATTTTGAAGTGTAGTTTTGT
    ATTTGCTCAAATATATATTCATATGGTGAGGTGCACATTTTTTATATTAT
    ATTTTTATTCATTTATTTTTGGTGCTTGGGAATTATACTCTAGGAATAAA
    GCGCCTGGTAGAAAGTGGCACACATCTTTAATCCCAGCACTCAGGAAGCA
    GAGGCAGACAAATCTCTGCGTTCCAGGACAGCCTGGTCTATAGAGCAAGG
    TCCAAGCCAGCCAGGTTTACACAAAGAAACCTAGTGTGGAAAAGACAAAA
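
One simple way to use the background set is to compare how often each k-mer occurs in the peak sequences versus the background and rank k-mers by that ratio. This is a hedged sketch of one possible baseline, not the method the project requires; the function names, the choice of k = 6, and the pseudocount are assumptions, and the scan ignores mismatches and the reverse strand.

```python
from collections import Counter

def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def rank_kmers(sample, background, k=6, pseudocount=1.0):
    """Rank k-mers seen in the sample by sample/background frequency ratio."""
    fg = kmer_counts(sample, k)
    bg = kmer_counts(background, k)
    # Pseudocounts over all 4**k possible k-mers avoid division by zero.
    fg_total = sum(fg.values()) + pseudocount * 4 ** k
    bg_total = sum(bg.values()) + pseudocount * 4 ** k
    scored = []
    for kmer, count in fg.items():
        fg_freq = (count + pseudocount) / fg_total
        bg_freq = (bg[kmer] + pseudocount) / bg_total
        scored.append((fg_freq / bg_freq, kmer))
    scored.sort(reverse=True)
    return [(kmer, ratio) for ratio, kmer in scored]

# Toy data for illustration only; real input would come from the two files.
sample = ["TTGACATTGACA", "GGTTGACAGG"]
background = ["GGGGGGGGGGGG", "ACACACACACAC"]
for kmer, ratio in rank_kmers(sample, background)[:3]:
    print(kmer, round(ratio, 2))
```

The top-ranked k-mers could then serve as seeds that are refined into consensus patterns or PWMs.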

10
Output
  • You need to output a ranked list of candidate
    motifs.
  • You can model each motif as a PWM or as a
    consensus sequence.
  • If you model the motif as a PWM, one of the
    answers for the previous dataset is a PWM for
    TTGACA (figure not reproduced in the transcript).
  • You may also return other significant motifs.

11
Aim of the project
  • Given a sample file and a background file, you
    need to implement a method which outputs a list
    of motifs.
  • You need to take advantage of the fact that this
    is a ChIP-seq dataset.
  • Hint: Read papers on ChIP-seq and understand its
    properties.