Bioinformatics and cancer

Provided by: victorjo

1
Bioinformatics and cancer
  • How an experimental science can benefit from
    information technology

2
What is bioinformatics?
  • Application of computer technology to biological
    problems
  • Extraction of biological information from raw
    data
  • Presentation of complex results in an
    understandable form
  • Generation of testable hypotheses for the
    experimentalists

3
How is bioinformatics relevant to cancer?
  • Record keeping, particularly in complex,
    multi-center studies
      • Immunoscope
  • Compiling and integrating information derived
    from multiple sources
      • SEREX
  • Leveraging the information derived from genome
    and transcriptome sequencing projects
      • ORESTES

4
The SEREX database
  • Single repository for all SEREX data
  • Unified format for storing and accessing data
  • Unified set of tools for first-level analysis of
    new sequences
  • Identification of recurrent epitopes
  • Curation and annotation of data

5
Primary SEREX data
  • Identity of clone
      • Library, origin of serum, clone identifiers
  • Sequence of clone
      • Nucleotide, protein if unambiguous
  • Experimental data
      • Reactivity with autologous and heterologous sera
      • Tissue-specific expression

6
Derived SEREX data
  • Identity of gene from which cDNA was derived
  • Full sequence of the cDNA
  • Sequence of the corresponding protein
  • Similarities in the databases
  • Chromosomal localization

7
Nucleotide sequence analysis methods
  • Compare to SEREX database
      • Find if epitope already identified by others
  • Compare to human section of EMBL
      • Find if sequence is derived from known gene
  • Compare to UniGene database
      • Identify gene or EST cluster from which sequence
        is derived
      • Get information about chromosomal localization,
        specificity of expression
  • Compare to protein databases
      • Get information about candidate ORFs, taking
        frameshifts into account
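
The candidate-ORF step can be illustrated with a minimal six-frame translation sketch. This is not the pipeline used for the SEREX database; the function names and the `min_len` cutoff are assumptions made for illustration. The sketch does not correct frameshifts: a frameshifted clone would simply show up as ORF fragments in two different frames, which a frameshift-tolerant protein comparison would then have to join.

```python
# Standard genetic code, built from the canonical TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (other letters are kept)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from position 0; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(seq, min_len=20):
    """Return (strand, frame, peptide) for stop-free stretches >= min_len residues."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for peptide in translate(s[frame:]).split("*"):
                if len(peptide) >= min_len:
                    hits.append((strand, frame, peptide))
    return hits
```

For example, `six_frame_orfs("ATGGCCAAATAG", min_len=3)` includes the forward frame-0 peptide `MAK`.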

8
Protein sequence analysis methods
  • Compare to protein databases
  • Try to identify potential homologs
  • Compare to profile databases
  • Identify sequence motifs indicative of structure
    and/or function
  • Search for HLA-binding peptides
  • Intrinsic methods
  • Look for coiled-coils, transmembrane segments,
    signal peptides, etc.
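
As a rough illustration of a search for HLA-binding peptides, the sketch below scans a protein for nonamers that fit a simplified anchor motif. The motif (L or M at position 2; V, L or I at position 9) is the commonly quoted HLA-A*0201 pattern and is an assumption here, as are the function and variable names; the slides do not say which prediction method was used, and real predictors score every position rather than just the anchors.

```python
import re

# Simplified anchor motif for HLA-A*0201 nonamers (illustrative assumption):
# L or M at position 2 and V, L or I at position 9.
# The lookahead lets overlapping 9-mers be reported.
A2_MOTIF = re.compile(r"(?=(.[LM].{6}[VLI]))")


def candidate_nonamers(protein):
    """Return (1-based start, peptide) for every 9-mer matching the motif."""
    return [
        (m.start() + 1, m.group(1))
        for m in A2_MOTIF.finditer(protein.upper())
    ]
```

Candidates found this way are only hypotheses for experimental testing; matching an anchor motif does not establish binding.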

9
Current status of database
  • 1806 entries (Sep. 2000)
  • Reactivity: 359 testis, 206 stomach, 310 breast,
    157 kidney, 102 colon, 37 lung, 73 melanoma, 180
    cell lines
  • 1480 entries in the public part of the database
  • 593 sequences match at least one other sequence
  • Many genes are novel

10
Re-engineering the SEREX database
  • Convert to true relational database
  • Standardize the annotation
  • Link to external databases
      • SWISS-PROT, GeneCards, LocusLink, RefSeq
  • Integrate into Cancer Immunome Database
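
What a relational layout could look like is sketched below with Python's built-in `sqlite3` module. All table names, column names and rows are hypothetical placeholders chosen to mirror the primary and derived data listed earlier; the actual schema of the re-engineered database is not given in the slides.

```python
import sqlite3

# Hypothetical schema: one row per gene, per clone, and per external link.
SCHEMA = """
CREATE TABLE gene (
    gene_id    INTEGER PRIMARY KEY,
    symbol     TEXT UNIQUE NOT NULL,
    chromosome TEXT
);
CREATE TABLE clone (
    clone_id       TEXT PRIMARY KEY,
    library        TEXT NOT NULL,
    serum_origin   TEXT,
    nucleotide_seq TEXT NOT NULL,
    gene_id        INTEGER REFERENCES gene(gene_id)
);
CREATE TABLE external_link (
    gene_id       INTEGER NOT NULL REFERENCES gene(gene_id),
    database_name TEXT NOT NULL,
    accession     TEXT NOT NULL,
    PRIMARY KEY (gene_id, database_name, accession)
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO gene VALUES (1, 'GENE_A', 'X')")
    conn.executemany(
        "INSERT INTO clone VALUES (?, ?, ?, ?, ?)",
        [
            ("clone-001", "testis", "patient-1", "ACGT", 1),
            ("clone-002", "melanoma", "patient-2", "ACGG", 1),
        ],
    )
    conn.execute(
        "INSERT INTO external_link VALUES (1, 'SWISS-PROT', 'PLACEHOLDER')"
    )
    return conn


def recurrent_genes(conn):
    """Genes represented by clones from more than one library."""
    return conn.execute(
        """
        SELECT g.symbol, COUNT(DISTINCT c.library)
        FROM gene g JOIN clone c ON c.gene_id = g.gene_id
        GROUP BY g.gene_id
        HAVING COUNT(DISTINCT c.library) > 1
        """
    ).fetchall()
```

Keeping genes, clones and external links in separate tables lets a question such as "which genes recur across libraries" be answered with a single join instead of a scan over flat records.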

11
Gene discovery tools
  • General goal: convert raw data to a form where it
    can be searched efficiently and successfully
  • Tools developed at the SIB
      • EST clustering and assembly software
      • Genome contig reconstitution
      • Program to find and correct coding regions in
        low-quality sequences
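
The idea behind EST clustering can be sketched as single-linkage grouping of sequences that share enough k-mers. This is a toy stand-in, not the SIB software: the function names and the `k` and `min_shared` parameters are assumptions, the pairwise comparison is quadratic, and reverse-complement matches and assembly are ignored.

```python
from itertools import combinations


def kmers(seq, k):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests, k=8, min_shared=3):
    """Single-linkage clustering: link two ESTs sharing >= min_shared k-mers.

    ests maps a name to a sequence; returns a sorted list of sorted name lists.
    """
    names = list(ests)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    profiles = {n: kmers(ests[n].upper(), k) for n in names}
    for a, b in combinations(names, 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in clusters.values())
```

Because linkage is transitive, two ESTs that do not overlap each other still end up in the same cluster when a third EST bridges them, which is the property that lets clusters grow into contigs.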

12
ESTScan, TrEST and TrGEN
  • ESTScan is a tool that uses a special hidden
    Markov model of coding regions to find and
    correct coding sequences in ESTs
  • TrEST is a virtual protein database derived from
    EST contigs using ESTScan
  • TrGEN is a virtual protein database derived from
    raw genome sequences using GENSCAN
  • Both are very rich sources of novel genes

13
Hunting for homologs of NY-ESO-1
  • NY-ESO-1 is a prototype CT antigen, to which
    cancer patients mount both humoral and cellular
    responses
  • There is a second human gene, LAGE-1, also
    located on chr X and highly similar to NY-ESO-1
  • Are there other human genes in the same family?
  • Are there homologs in other species?

14
Methodology
  • Identify similar regions in NY-ESO-1 and LAGE-1,
    and make profile
  • Use profile to search TrEST and TrGEN, plus
    traditional protein databases (SWISS-PROT,
    TrEMBL)
  • Use profile to search EST contigs and predicted
    genomic exons in framesearch tolerant mode
  • Use hits from profile search to refine profile
    and reiterate database searches
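
The search-refine-repeat loop can be sketched with a toy ungapped profile. This is a simplified illustration under stated assumptions, not the profile method actually used: it takes equal-length ungapped segments, uses a uniform background and a flat pseudocount, handles neither gaps nor frameshifts, and all names (`build_profile`, `best_window`, `iterative_search`, `threshold`) are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(segments, pseudocount=1.0):
    """Log-odds scoring matrix from equal-length, ungapped segments."""
    background = 1.0 / len(ALPHABET)
    profile = []
    for column in zip(*segments):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def best_window(profile, sequence):
    """Return (score, window) for the best ungapped window, or None if too short."""
    width = len(profile)
    best = None
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        # Residues outside ALPHABET (e.g. X) contribute a neutral score of 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        if best is None or score > best[0]:
            best = (score, window)
    return best


def iterative_search(seed_segments, database, threshold, max_rounds=5):
    """Search, add hits above threshold to the alignment, rebuild, repeat."""
    segments = list(seed_segments)
    found = {}
    for _ in range(max_rounds):
        profile = build_profile(segments)
        new_hits = {}
        for name, seq in database.items():
            if name in found:
                continue
            hit = best_window(profile, seq)
            if hit and hit[0] >= threshold:
                new_hits[name] = hit[1]
        if not new_hits:
            break
        found.update(new_hits)
        segments.extend(new_hits.values())
    return found
```

Each round widens the profile with the residues seen in new hits, so a sequence that scores below the threshold against the seed alone can be picked up in a later round.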

15
Human homologs
  • First search identifies a predicted CDS and a
    UniGene cluster mapping to Xq28
  • The corresponding mRNA and protein are in the
    databases (ITBA2), but translated in the wrong
    frame!
  • Second search identifies a new UniGene cluster,
    whose closest current homolog is a pseudogene on
    chr 9
  • Therefore, there are at least four human family
    members

16
Rodent homologs
  • Still very little genomic sequence, so have to
    rely on ESTs alone
  • In mouse, pick up two clusters that are similar
    to each other and to ITBA2
  • In rat, pick up only one cluster similar to the
    mouse ones
  • One mouse mRNA checked for tissue specificity
    turned out not to be a CT antigen

17
More distant homologs
  • For genes centrally involved in development or
    metabolism, expect homologs in most eukaryotes
  • Found ESO-1 homologs in several fully sequenced
    genomes: Drosophila, C. elegans, S. pombe (but
    not S. cerevisiae)
  • In Drosophila, protein was misannotated as
    ribosomal, because it is located adjacent to L34
    protein from 60S subunit

18
Alignment of homologous regions

19
Lessons learned
  • EST and unannotated genome data are still a rich
    source of information for gene discovery
  • Much of the existing annotation is erroneous,
    even if it was not done automatically
  • Bioinformatics approaches can suggest new
    experimental avenues

20
The ORESTES project
  • Sponsors: Ludwig Institute for Cancer Research,
    FAPESP (São Paulo State funding agency)
  • Goal: to obtain EST sequences from the
    under-represented, often coding, central portions
    of mRNAs
  • Methodology: use low-stringency semi-random
    priming followed by PCR, producing low-complexity
    libraries
  • Results: over 500,000 ESTs produced, half of
    which provide novel information

21
The Human Transcriptome project
  • Sponsors: NCI, LICR, FAPESP
  • Goal: to provide a comprehensive and
    experimentally validated catalog of human
    transcripts
  • Methodology: create a stable index of transcripts
    by using poly(A) tags, then extend it by a
    combination of experimental and bioinformatic
    methods

22
The EUCIP project
  • Sponsors: European Union, LICR
  • Goal: to examine in detail the cancer biology of
    genes identified using the SEREX method
  • Methodology: for each gene, examine
    cancer-related alterations, patterns of
    expression, tissue distribution and immune
    responses; document in an integrated database

23
The DNA chip consortium
  • Partners: LICR, ICRF, Sanger Centre
  • Goal: to produce cDNA chips representing the
    entire human transcriptome
  • Current state: two 5000-feature chips produced
    routinely; non-redundant 10k chips expected by
    year's end

24
Credits
  • Philipp Bucher (ISREC)
  • Christian Iseli (LICR)
  • Brian Stevenson (LICR)
  • Dmitry Kuznetsov (LICR)
  • Andy Simpson and Sandro de Souza (LICR São Paulo)