Essential Transcriptomics: Understanding the New TRANSFAC Professional Database of Transcription Fac - PowerPoint PPT Presentation


Transcript and Presenter's Notes

1
Essential Transcriptomics: Understanding the New
TRANSFAC Professional Database of Transcription
Factors
  • Yannick Pouliot, PhD
  • Bioresearch Informationist
  • Lane Medical Library Knowledge Management
    Center
  • lanebioresearch_at_stanford.edu
  • 11/12/2008

2
The Bioresearch Informationist At Your Service
  • Yannick Pouliot, PhD, Lane Medical Library
    Knowledge Management Center
  • Bioresearch Informationist: computational
    biologist in residence
  • Role: support laboratory researchers regarding
    biocomputational resources and their use
  • Contact: lanebioresearch_at_stanford.edu

3
Contents
  • What has been licensed
  • Access
  • TRANSFAC database contents
  • Components
  • Contents
  • Data types
  • Example computational biology applications of
    TRANSFAC data and tools

And please: don't get hung up. Ask questions!
4
Part I: What is provided and how to access it
5
BKL: What Is Available
  • BKL components licensed by Stanford
  • Proteome
  • TRANSFAC Pro (commercial version of the
    transcription factor database)
  • Parts of ExPlain
  • MATCH, CATCH tools: TF binding site search tools
    integrated with TRANSFAC
  • What's not licensed
  • Full version of ExPlain
  • Human Gene Mutation Database
  • BRENDA (commercial version)

6
So What Is BKL?
  • BKL (BIOBASE Knowledge Library) is a knowledge
    base of curated biomedical data extracted from
    selected primary literature sources
  • Knowledge base ≠ ordinary database
  • BKL useful for
  • Querying the literature in a robust manner
  • particularly for visualizing complex
    information
  • Analyzing your experiment data in the context of
    what is known (biological significance)
  • Used following data acquisition, clean-up and
    statistical analysis

7
BKL's Data Curation Enables Robust Querying
  • BKL provides much of its value by applying
    rigorous curation and indexing of the data it
    provides
  • Systematic extraction of information from a
    source (more later)
  • Encoding the information → enforcing structure →
    making it truly usable
  • Compensating for weaknesses in source data
  • E.g., applying controlled vocabulary to compensate
    for inconsistent wording in the original text
  • NCBI and NLM provide somewhat similar curation,
    but significantly less of it
  • Closest equivalent system: Ingenuity Pathways
    Analysis

8
BKL Content Sources
  • Data extracted from
  • Primary scientific literature
  • Expert curation applied
  • BKL DB updated weekly
  • Public databases
  • GO, OMIM, Ensembl, etc.

9
BKL Data Types
  • Protein physical properties
  • Sequence, isoelectric point, molecular mass,
    transmembrane domain(s), structure, protein
    domains, alternative splice forms
  • Gene ontology classification
  • GO Molecular Function/Biological Process/Cellular
    Component
  • Interaction data
  • Protein-protein
  • Protein complexes
  • Protein binding
  • Expression pattern
  • Organ/Tissue/Cell type/Tumor type
  • Orthology data
  • Gene families
  • Homology data
  • Related proteins
  • Classified by species
  • BLAST results available, starting with summary
    view
  • Disease association
  • Biomarker or therapeutic target
  • Disease mechanism

10
Species Covered by BKL
  • Homo sapiens
  • Rattus norvegicus
  • Mus musculus
  • Caenorhabditis elegans
  • Yeast
  • Saccharomyces cerevisiae
  • Schizosaccharomyces pombe
  • Large number of pathogenic fungi
  • Overall, >200 species
  • In short: the usual mammals, C. elegans, and fungi

11
Example Applications of BKL TF Data
  • Determining what gene regions bind a TF
  • Identifying
  • The expression pattern of a TF
  • The consequence of TF binding (activation,
    inhibition)
  • Genes known to be regulated by a TF
  • Obtaining the consensus DNA sequence that binds a
    TF
  • TRANSFAC provides a position-specific matrix that
    specifies possible nucleotide substitutions for
    each position in the consensus sequence
  • → Can use a position-specific scoring matrix
    (PSSM) to perform sequence similarity searches
    using MATCH or another program
  • Identifying groups of TFs that interact with each
    other to regulate gene transcription
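
To make the consensus and matrix idea concrete, here is a minimal Python sketch. It assumes you already have a handful of aligned binding sites of equal length; the sequences below are made-up toy data, not TRANSFAC records, and the function names are hypothetical. It tallies nucleotide counts per position (the raw material of a position-specific matrix) and reads off the most frequent base at each position as the consensus.

```python
from collections import Counter

# Hypothetical aligned binding sites (toy data, not taken from TRANSFAC).
SITES = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]


def count_matrix(sites):
    """Return one dict of nucleotide counts per aligned position."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for column in zip(*sites):  # iterate over alignment columns
        counts = Counter(column)
        matrix.append({base: counts.get(base, 0) for base in "ACGT"})
    return matrix


def consensus(matrix):
    """Pick the most frequent nucleotide at each position."""
    return "".join(max("ACGT", key=lambda base: col[base]) for col in matrix)


matrix = count_matrix(SITES)
print(consensus(matrix))  # prints TGACTCA
print(matrix[3])          # counts at the variable 4th position
```

The count matrix keeps the information that the consensus string throws away: at the fourth position both C and G were observed, which is exactly what a matrix-based search can exploit.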

12
More Use Cases Of BKL
  • Gaining a rapid understanding of a protein and
    its properties, e.g.
  • Interactions
  • Understanding what proteins interact with your
    protein
  • Expression
  • Determining where a protein is present or absent
  • Understanding how a protein is regulated
  • Comparative genomics
  • finding homologs, orthologs
  • Understanding large data sets from e.g.
  • Microarray gene expression results
  • Proteomics (mass spec data)
  • Inter-relating identifiers

13
New Developments Since Previous Version of
TRANSFAC Pro BKL
  • Very different interface
  • Proteome, TRANSFAC no longer exist as independent
    applications
  • Instead, they are now integrated together AND
    with other BIOBASE products
  • Much better visualizer
  • Though problematic
  • New additional application: ExPlain
  • used for complex tasks that include MATCH, CATCH
    and others
  • Essentially a wizard

14
Accessing BKL
  • User-unlimited site license purchased by Lane
    Library
  • SUNet ID required
  • If you want your own environment, you need to
    obtain a user login (free)
  • Now fully browser-based
  • Nothing to install (new)
  • except the Flash plugin (for the visualizer) if
    you don't already have it
  • No known issues with different browsers
  • Modern browsers preferable (IE7, FF2)

15
Accessing BKL
lane.stanford.edu → bioresearch
16
Part II: Data Contents
17
Contents of TRANSFAC Professional 11.4
(12/14/2007)
861 TFs added since June 2007 (8.8% increase)
18
TRANSFAC Contents: Data Types
  • Data on >10,000 transcription factors and their
    properties
  • Genes that express transcription factors
  • Structural features of a transcription factor
  • Expression pattern
  • → TRANSFAC lists microarrays that include a TF
    gene
  • Regulatory networks (NEW: now uses viewer)
  • Functional properties
  • Interacting factors
  • Position-specific matrices that can be used for
    similarity searching of DNA sequences that might
    bind a factor (e.g., using MEME and MAST).
  • ChIP-on-chip data

19
Magnitude of Transcription Factor Universe
Provided by TRANSFAC (Feb 2008)
20
Part III: Searching and Understanding BKL's TF
Data
21
Options Relevant for TF Searching
  • Easiest: use the Locus Report for your gene →
    ties everything together; especially useful for
    complex genes with, e.g., 3 promoters (see next
    slides for more)
  • Otherwise, for TFs the following options are
    relevant
  • Site: gene sites that are bound by TFs or
    complexes of TFs
  • Promoter
  • Composite element: minimal functional unit
    within which both protein-DNA and protein-protein
    interactions contribute to a highly specific
    pattern of transcriptional regulation
  • Matrix: nucleotide distribution matrices for the
    binding sites of transcription factors
  • Functional region: contains details about
    regulatory regions of a gene → broader than
    Promoter; would include enhancers distal to the
    promoter

22
Advanced Search Engine: Site Searching
23
The Best Starting Point: The Locus Report
Relevant to TFs
24
Expression Panel: Comprehensive Expression Data
25
Regulatory and Binding Elements within Gene
Regulation panel
26
Understanding Binding Sites and Regulatory Elements
27
Sometimes Messy Data: What Is Going On Here?
28
TF Target Gene Binding and Regulation
Describes the protein binding to DNA and other
gene regulation activities attributed to the
protein
Note: non-human source of ER
29
ChIP-Chip Data (click on a promoter within Gene
Regulation panel)
Problem: what is the source of the ChIP-on-chip data?
30
TF Transcriptional Network (In Protein Binding
and Regulatory Activity)
  • Requires launching the viewer and requesting
    Regulates or Regulated By
  • Slow
  • May not return if too much data is returned
  • User interface not great (what is that gene??)

Lists protein-protein interactions with OTHER TFs
ONLY
Regulated by
Regulates
31
Annotation Section (Overlap?)
Q: Is there overlap between data in Annotations
and other sections of Locus Report (e.g.,
Expression)? A: Unknown
32
Useful Identifiers and Links to Other Databases
33
The Binding Sequence panel: A Tool for
Evaluating TF Binding Sites
34
Part IV: Example Biocomputational Applications
35
MATCH and PATCH For Finding TF Binding Sites
  • MATCH: uses PSSM searching to find TF binding
    sites
  • PATCH: uses pattern searching to find TF binding
    sites
  • Nice tools, but not industry standard
  • → Problem: How good are they? Who knows?
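
As a rough illustration of what "pattern searching" means, the Python sketch below converts a consensus written with standard IUPAC ambiguity codes into a regular expression and reports where it matches. This is a generic technique and makes no claim about how PATCH works internally; the pattern, the test sequence, and the function name are made up for the example, and only the forward strand is searched.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_pattern(pattern, sequence):
    """Return 0-based start positions where the IUPAC pattern matches."""
    regex = "".join(IUPAC[code] for code in pattern.upper())
    # The lookahead lets overlapping matches be reported as well.
    return [m.start() for m in re.finditer(f"(?=({regex}))", sequence.upper())]


# Toy example: S stands for C or G, so both site variants are found.
hits = find_pattern("TGASTCA", "ccTGACTCAggTGAGTCAtt")
print(hits)  # prints [2, 11]
```

A pattern either matches or it does not, whereas a matrix search returns a graded score for every window; that difference is the main practical distinction between the two approaches.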

36
What is a PSSM/PWM?
  • PSSM: frequency or likelihood matrix that
    describes the nucleotide (or amino acid)
    variation at each position in a DNA or protein
    sequence
  • Description from Bioinformatics: Sequence,
    Structure and Databanks: A Practical Approach
    (eBook)
  • Can be derived from TRANSFAC's binding matrix
  • Can be used as input for MATCH
  • How well does MATCH perform? Who knows?
  • Another option: MEME and MAST, classic programs
    for searching DNA or protein sequences using a
    PSSM
  • → Lane FAQ on MEME
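
The sketch below shows one common way a count matrix becomes a scoring matrix and is then used to scan a sequence. It is a generic log-odds scheme, not the scoring used by MATCH or MAST. The counts are toy numbers, and the pseudocount of 0.5 and the uniform background of 0.25 per base are assumptions chosen for illustration.

```python
import math

# Hypothetical count matrix, one dict per motif position (toy numbers).
COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 7, "T": 1},
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 5, "G": 3, "T": 0},
]


def log_odds(counts, pseudocount=0.5, background=0.25):
    """Convert counts to log2(observed frequency / background frequency)."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((col[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(pwm, sequence):
    """Return (start, score) for every window of the motif width."""
    width = len(pwm)
    seq = sequence.upper()
    results = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(pwm[i][base] for i, base in enumerate(window))
        results.append((start, score))
    return results


PWM = log_odds(COUNTS)
best = max(scan(PWM, "AATGACTT"), key=lambda hit: hit[1])
print(best[0])  # prints 2, the start of the window TGAC
```

In practice a score threshold decides which windows count as putative sites, and choosing that threshold is where the "how good is it?" question becomes concrete.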

37
BIOBASE's MATCH Program
  • Now part of ExPlain tool → essentially a wizard
  • Need to log in with (free) personal account
  • MATCH is useful for e.g.
  • Simple: searching for binding sites in individual
    sequences
  • More sophisticated: identifying genes expressed
    in specific tissues or cell cycle stages that
    include a binding site, by first assembling a
    collection of potential target genes and then
    searching that group
  • E.g., muscle-specific, immune-specific, etc.
  • Reminder: Stanford has not licensed the full
    version of ExPlain
  • Some functions not available, although they are
    listed in the documentation
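
The "more sophisticated" use case can be sketched outside of ExPlain as a simple loop over a pre-assembled gene set. The Python example below is an illustration only: the gene names and promoter sequences are invented, the motif is matched as a plain string with an optional mismatch allowance rather than with a matrix, and both strands are checked.

```python
# Hypothetical promoter sequences for a tissue-specific gene set (toy data).
PROMOTERS = {
    "geneA": "GGCTGACTCATTGC",
    "geneB": "ACGTACGTACGTAC",
    "geneC": "TTTGAGTCAGGCCA",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def has_site(sequence, motif, max_mismatches=0):
    """True if the motif occurs on either strand within the mismatch limit."""
    seq = sequence.upper()
    width = len(motif)
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - width + 1):
            if mismatches(strand[start:start + width], motif) <= max_mismatches:
                return True
    return False


def genes_with_site(promoters, motif, max_mismatches=0):
    """Return the sorted names of genes whose promoter contains the motif."""
    return sorted(
        name for name, seq in promoters.items()
        if has_site(seq, motif, max_mismatches)
    )


candidates = genes_with_site(PROMOTERS, "TGACTCA")
print(candidates)  # prints ['geneA', 'geneC']
```

Here geneC is reported only because the reverse strand is searched, a reminder that a forward-only scan would miss sites in the opposite orientation.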

38
Part V: Summary of Limitations
39
Content Limitations
  • Standard caution: TRANSFAC should not be
    considered comprehensive or fully up to date.
  • → Use as a first step in collecting TFs
  • All data in a TRANSFAC record are derived from
    the primary experimental literature
  • However, there are exceptions, e.g.
  • TFs without a known binding site in species X are
    sometimes included on the basis of binding
    observed in species Y (orthology-based
    transitive assignment)
  • → Note: they don't make that clear
  • Search engine can return TFs based on data
    originating from computational analysis
  • Origin of data sometimes very unclear
  • Source paper?
  • Nature of evidence?
  • Quality score?
  • Application-specific jargon can be confusing
  • compelsite, isogroup
  • → Heavy requirement to read the documentation to
    be clear on terms and how they are used in the
    application

40
Search Engine Limitations
  • Use of querying topics can be confusing or
    limiting
  • Various usability issues, presumably associated
    with youth of application, e.g.
  • Data sets are incompletely integrated for search
    purposes
  • Can find "ER-alpha" binding factor but not "ER-1"
    (Locus Report lists them as synonyms)
  • Can find factor source using "rec" but not
    "recombinant"
  • "Homo sapiens" does not overlap with "human"
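
One workaround on the user's side is to expand a query into its known synonyms before searching. The Python sketch below illustrates the idea with a tiny hand-made synonym table built from the examples above; it does not use any BKL interface, the record strings are invented, and matching is naive substring matching.

```python
# Hypothetical synonym groups, based on the examples on this slide.
SYNONYM_GROUPS = [
    {"er-alpha", "er-1"},
    {"rec", "recombinant"},
    {"homo sapiens", "human"},
]

# Made-up record texts standing in for database entries.
RECORDS = [
    "ER-alpha binding factor",
    "Factor source: rec",
    "Homo sapiens promoter",
]


def expand_query(term, groups=SYNONYM_GROUPS):
    """Return the term plus every synonym listed in the same group."""
    key = term.strip().lower()
    for group in groups:
        if key in group:
            return sorted(group)
    return [key]


def search(records, term):
    """Return records containing the term or any of its synonyms."""
    variants = expand_query(term)
    return [r for r in records if any(v in r.lower() for v in variants)]


print(search(RECORDS, "ER-1"))   # prints ['ER-alpha binding factor']
print(search(RECORDS, "human"))  # prints ['Homo sapiens promoter']
```

Running each variant returned by expand_query as a separate query against the real search engine gives the same effect by hand.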

41
In Short
  • TRANSFAC is a solid source of TF data of many
    types
  • BKL provides nice integration of all aspects of a
    TF
  • Goes way beyond what TRANSFAC used to provide
  • However, the text search engine is unreliable
    because it lacks a thesaurus