Automated cis-Regulatory Annotation of genomes

1
Automated cis-Regulatory Annotation of genomes
  • Saurabh Sinha
  • Dept. of Computer Science,
  • UIUC.

2
Automated genome annotation
  • Routine steps today
  • Gene prediction
  • Orthology maps
  • Gene functions

3
Genes are not the whole story
Genetic regulatory network controlling the
development of the body plan of the sea urchin
embryo. Davidson et al., Science,
295(5560):1669-1678.

4
Gene Regulation
  • Biological processes, including development, are
    coordinated by spatio-temporal interactions among
    genes
  • Some genes (transcription factors) regulate the
    expression of other genes
  • GENE REGULATORY NETWORK (GRN)
  • GRN is a key substrate of evolution
  • morphological diversity arises out of evolution
    tinkering with GRN

5
Annotating cis-Regulatory elements
  • Goal is to unravel the GRN for some biological
    process, in some species
  • Each edge of the GRN is of the form "transcription
    factor X regulates gene Y"
  • Many other forms are possible; this is the most
    well studied
  • Molecular implementation of this edge: a binding
    site for the transcription factor, located near
    the gene
  • Sub-goal: Find these binding sites in the genome

6
Preview
  • Different bioinformatics methods to annotate
    genomic footprints of cis-regulatory interactions
  • Methods differ in terms of:
  • Input data: e.g., single species or multiple
    species; known motifs or unknown motifs; etc.
  • Precise goal: find actual binding sites; find
    clusters of binding sites; find target genes;
    etc.

7
(1) Annotation of transcription factor binding
sites (TFBS)
  • Output: Predicted binding sites (~10 bp long) of
    a given TF
  • Input: Prior characterization of the binding site
    affinity of a transcription factor (motifs)
  • From high-throughput techniques (e.g., Chromatin
    Immunoprecipitation, Bacterial 1-Hybrid, etc.)
  • Input: multiple, closely related genomes
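
As a rough illustration of how a known motif can be used to annotate binding sites, the sketch below scans a sequence with a position weight matrix and reports windows whose log-odds score passes a threshold. The matrix `MOTIF`, the uniform background, and the threshold are made-up values for illustration; none of them come from the talk or from any of the cited studies, and cross-species conservation is not modeled here.

```python
import math

# Hypothetical 4 bp motif (illustrative only): one dict per motif position,
# giving the probability of each base at that position.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds(window, motif=MOTIF, background=BACKGROUND):
    """Sum of per-position log2(motif probability / background probability)."""
    return sum(math.log2(col[base] / background) for col, base in zip(motif, window))


def scan(sequence, motif=MOTIF, threshold=4.0):
    """Return (start, site, score) for every window scoring at or above threshold."""
    width = len(motif)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = log_odds(window, motif)
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits
```

Calling `scan("TTAGCTCC")` reports the single window `AGCT` at offset 2. A real scan would also cover the reverse strand and use a background estimated from the genome.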

8
Examples
  • Harbison et al. (Nature 2004) on yeast
  • > 3000 binding sites for over 100 TFs
  • Stark et al. (Genome Res. 2007) on Drosophila
  • > 46000 regulatory interactions

Binding site prediction
9
(2) Genome-wide prediction of motifs
  • If the binding specificity of a TF is not known
    from experiments, it can be computationally
    predicted
  • Output: A set of motifs that probably represent
    TF binding affinities
  • Input: multiple, closely related genomes. Some
    methods attempt to find motifs from single
    species alone.
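
One simple way to picture motif discovery from multiple genomes is to look for short words that are conserved unusually often. The sketch below is a deliberately simplified illustration and is not the procedure used in any of the cited papers: it assumes gap-free, equal-length aligned sequences with the reference species first, and the function name `conserved_kmers` and its parameters are hypothetical.

```python
from collections import Counter


def conserved_kmers(aligned, k=6, min_count=2):
    """Rank k-mers of the reference by how often they are perfectly conserved.

    aligned: list of equal-length, gap-free sequences; aligned[0] is the reference.
    Returns (kmer, conservation_rate) pairs, most conserved first.
    """
    reference = aligned[0]
    total = Counter()      # occurrences of each k-mer in the reference
    conserved = Counter()  # occurrences identical in every other species
    for start in range(len(reference) - k + 1):
        kmer = reference[start:start + k]
        total[kmer] += 1
        if all(seq[start:start + k] == kmer for seq in aligned[1:]):
            conserved[kmer] += 1
    rates = {kmer: conserved[kmer] / n for kmer, n in total.items() if n >= min_count}
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))
```

For example, `conserved_kmers(["ACGTGTTTTTACGTG", "ACGTGTAATTACGTG"], k=5)` keeps only `ACGTG`, which occurs twice in the reference and is conserved both times. A realistic pipeline would compare these rates against a background expectation and group similar words into a single motif.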

10
Examples
  • Kellis et al. (Nature 2003) on yeast
  • 72 motifs identified.
  • Xie et al. (Nature 2005) on human
  • 174 motifs identified
  • Out of how many? Perhaps 2500, but not all have
    distinct motifs
  • Down et al. (PLoS CB, 2006) on Drosophila
  • 120 motifs identified (out of 700)

11
(3) Prediction of clusters of TFBS
  • In many cases, binding sites do not work alone
  • Clusters of binding sites, of the same or
    different TFs, together mediate a particular
    expression pattern
  • Such clusters are called cis-regulatory modules
    (CRMs). Typically, CRM prediction is more
    accurate than individual TFBS prediction
  • Output: Annotation of CRMs (~1000 bp long)
    involved in a certain biological process. Expect
    1-3 per gene.
  • Input: Set of motifs involved in a biological
    process
  • Input (optional): multiple, closely related
    genomes
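
The idea of a cluster of binding sites can be sketched as a window search over predicted site positions. The code below is a minimal illustration under assumed parameters (a 1000 bp window and at least three sites); it is not the scoring scheme of any cited CRM predictor, and `find_clusters` is a hypothetical name.

```python
def find_clusters(site_positions, window=1000, min_sites=3):
    """Return merged (start, end) intervals containing at least min_sites
    predicted sites whose positions span less than `window` bp."""
    positions = sorted(site_positions)
    clusters = []
    right = 0
    for left, start in enumerate(positions):
        right = max(right, left)
        # Extend the right edge while the next site is still inside the window.
        while right + 1 < len(positions) and positions[right + 1] - start < window:
            right += 1
        if right - left + 1 >= min_sites:
            end = positions[right]
            if clusters and start <= clusters[-1][1]:
                # Overlaps the previous cluster: merge the two intervals.
                clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
            else:
                clusters.append((start, end))
    return clusters
```

With site positions `[100, 300, 650, 5000, 9000, 9200, 9400, 9900, 20000]`, the isolated sites at 5000 and 20000 are ignored and two clusters are reported, one spanning 100 to 650 and one spanning 9000 to 9900.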

12
Examples
  • Blanchette et al. (Genome Res. 2006)
  • human vs. mouse comparison. > 100000 modules
    predicted.
  • motifs taken from large database (TRANSFAC)
  • Noyes et al. (Unpublished, 2007)
  • multiple Drosophila species comparison
  • motifs taken from high throughput assay (B1H)
  • Based on our earlier work (ISMB 2003, ISMB 2006)
  • Smith et al. (Mol. Sys. Biol. 2007)
  • multiple mammalian species
  • motifs taken from large databases
  • tissue-specific gene expression data

13
(4) Prediction of TF-gene interactions
  • Individual TFBS prediction may suffer from high
    false positive rate
  • May be more practical to predict only "TF X
    targets gene Y"
  • Output: Pairs of (TF, Gene) regulatory
    interactions
  • Input: Motifs of known TFs
  • Input (optional): multiple, closely related
    genomes
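
To show the difference between predicting individual sites and predicting (TF, Gene) pairs, the sketch below pools evidence over a gene's whole upstream region and reports a pair only when enough matches accumulate. It is an assumption-laden simplification: motifs are represented as exact consensus strings rather than weight matrices, and the names `count_sites`, `predict_interactions`, and the `min_sites` cutoff are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence over the alphabet ACGT."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(region, consensus):
    """Count exact consensus matches on either strand (overlaps allowed)."""
    targets = {consensus, reverse_complement(consensus)}
    width = len(consensus)
    return sum(region[i:i + width] in targets for i in range(len(region) - width + 1))


def predict_interactions(motifs, upstream_regions, min_sites=2):
    """Return (tf, gene, n_sites) for every pair with at least min_sites matches.

    motifs: {tf_name: consensus string}
    upstream_regions: {gene_name: regulatory sequence near the gene}
    """
    pairs = []
    for tf, consensus in motifs.items():
        for gene, region in upstream_regions.items():
            n = count_sites(region, consensus)
            if n >= min_sites:
                pairs.append((tf, gene, n))
    return pairs
```

Given the motif `{"tfA": "GATAAG"}`, a region `"CCGATAAGTTCTTATCGG"` carries one match on each strand and is reported, while a region with a single match is not.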

14
Examples
  • Sinha et al. (PNAS 2006)
  • Honeybee genome
  • Motifs from another insect (fruitfly)
  • Sinha et al. (Genome Res. 2007)
  • Human genome
  • Motifs from TRANSFAC
  • Pennacchio et al. (Genome Res. 2007)
  • Human genome
  • Multiple mammalian species analyzed

15
(5) Annotation of miRNA targets
  • Transcription factor binding to DNA is not the
    only mode of gene regulation
  • MicroRNAs binding to 3' UTRs of mRNA is another
    major mode of gene regulation
  • Output: (miRNA, Gene) interactions
  • Input: miRNA sequence
  • Input: multiple, closely related genomes
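
A common simplification of miRNA target prediction is to look for a 3' UTR site complementary to the miRNA "seed" (nucleotides 2-8) and to require that the site is present in related species. The sketch below illustrates only that idea: it is not the algorithm of any cited tool, the conservation test (site present anywhere in each orthologous UTR) is an assumption made for brevity, and the function names are hypothetical.

```python
def seed_match_site(mirna, seed_start=1, seed_end=8):
    """Return the mRNA site complementary to the miRNA seed.

    With the defaults, the seed is nucleotides 2-8 of the miRNA (1-based),
    and the returned site is its reverse complement in the RNA alphabet.
    """
    seed = mirna[seed_start:seed_end]
    complement = str.maketrans("ACGU", "UGCA")
    return seed.translate(complement)[::-1]


def predict_targets(mirna, utrs_by_gene):
    """Return genes whose 3' UTR contains the seed match in every species.

    utrs_by_gene: {gene: [UTR in species 1, UTR in species 2, ...]}
    """
    site = seed_match_site(mirna)
    return [gene for gene, utrs in utrs_by_gene.items()
            if all(site in utr for utr in utrs)]
```

For an illustrative miRNA sequence `UGAGGUAGUAGGUUGUAUAGUU`, the seed is `GAGGUAG` and the site searched for in each UTR is `CUACCUC`; a gene is reported only if every orthologous UTR supplied for it contains that site.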

16
Examples
  • Krek et al. (Nature Genetics 2005)
  • human genome
  • Lewis et al. (Cell 2005)
  • human genome
  • Grun et al. (PLoS CB, 2005)
  • Drosophila genome

17
(6) CRM Prediction without known motifs
  • More on the research side
  • Output: CRMs in a genome
  • Input: Set of functionally related CRMs in the
    same genome
  • Current estimates (unpublished data):
  • 50% specificity in Drosophila
  • Joint work with Gene Robinson, preliminary work
    in ISMB 2007.

18
(7) CRM Prediction across evolutionary gaps
  • Output: CRMs in a new genome
  • Input: Orthologous CRMs in a different genome
  • Assumption: Alignment not available as a guide
  • E.g., Knowledge of CRMs in fruitfly; export
    this knowledge to the honeybee or wasp genome
  • Ongoing joint work with Gene Robinson.
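
When no alignment is available, one generic option is to compare sequences by their short-word composition. The sketch below scores windows of a new genome by the cosine similarity of k-mer counts to a known CRM. This is a stand-in technique chosen for illustration, not the method under development in the work described here, and `kmer_profile`, `cosine`, and `best_window` are hypothetical names.

```python
import math
from collections import Counter


def kmer_profile(seq, k=3):
    """Count every overlapping k-mer in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(p, q):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(p[kmer] * q[kmer] for kmer in p)
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def best_window(known_crm, genome, window, step=1, k=3):
    """Return (similarity, start) of the genome window most similar to known_crm.

    Assumes len(genome) >= window so that at least one window exists.
    """
    reference = kmer_profile(known_crm, k)
    scored = [
        (cosine(reference, kmer_profile(genome[s:s + window], k)), s)
        for s in range(0, len(genome) - window + 1, step)
    ]
    return max(scored)
```

As a toy check, searching for `"GATAAGGATAAG"` in a sequence that embeds an exact copy at offset 12 returns that offset with similarity close to 1. Across real evolutionary distances the copy would be diverged, so scores would be lower and would need calibration against background windows.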

19
Summary
  • Already achievable: Genome-wide prediction of
  • binding sites
  • binding site affinities (motifs)
  • Cis-regulatory modules (CRMs)
  • TF-gene interactions
  • miRNA targets
  • Issue: Accuracy is not very high, but still very
    useful!
  • One direction for the future of automatic
    annotation:
  • Exporting annotation from one species to
    another, without the aid of alignments