Bioinformatics in Cancer Biotechnology - PowerPoint PPT Presentation

1
Bioinformatics in Cancer Biotechnology
  • Bob Stephens
  • Advanced Biomedical Computing Center, Advanced
    Technology Program
  • SAIC-Frederick, Inc., National Cancer Institute at
    Frederick
  • April 19, 2007

2
Objectives
  • Overview/introduce bioinformatics concepts,
    applications and databases.
  • Describe interplay between bioinformatics,
    technologies and the web.
  • Profile importance of bioinformatics in cancer
    research.

3
What is bioinformatics?
  • Bioinformatics is the application of
    computational methods to the analysis of any type
    of biological data.
  • Bioinformatics has become a diverse and
    multi-disciplined field that originally derived
    from computer science and biological science.

4
Evolution of bioinformatics
  • Rapid technological advances in sequence
    determination set the pace for data acquisition.
  • Similar advances in computing power, algorithmic
    approaches for sequence analysis, and
    robotics-enabled instruments.
  • Co-evolution with web browser and programming
    language technologies.

5
Bioinformatics evolution (contd.)
  • Additional high throughput technologies becoming
    available almost daily - microarrays, proteomics,
    population and genetic data, medical literature
    etc.
  • Data volume is increasing at the same time as
    data complexity.
  • Data distribution/synchronization becoming an
    increasingly difficult task.

6
Interplay between technology and bioinformatics
  • New high-throughput (HT) technologies, e.g. mRNA microarray
  • Analysis and storage software
  • Computational infrastructure
  • Data integration

7
Example
  • mRNA expression chip (20000 genes x 16 probes per
    gene), a few MB per sample.
  • Data normalization software.
  • Exon array - multiple probes for each exon for
    each of the 20000 genes - one file about 1 GB.
  • New normalization method requires all samples to
    be loaded simultaneously.
  • More complex analysis reveals alternative
    splicing etc.
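
The slide does not name the new normalization method. As one illustration of why a method can need every sample in memory at once, here is a minimal sketch of quantile normalization, which replaces each value with the mean of the values that share its rank across all samples. The function name and the toy intensities are assumptions made for this example, not part of the original talk.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length intensity lists (one list per sample)."""
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must have the same number of probes")
    # The reference distribution is the mean across samples at each rank,
    # so every sample has to be available before any value can be rewritten.
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(col[rank] for col in sorted_samples) / len(samples)
        for rank in range(n_probes)
    ]
    normalized = []
    for s in samples:
        # Rank the probes within this sample (ties keep their original order).
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Probe count for the expression chip described on the slide.
PROBES_PER_CHIP = 20000 * 16

if __name__ == "__main__":
    print(PROBES_PER_CHIP)
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

With roughly 1 GB per exon-array file, holding a full study in memory this way is what drives the computational infrastructure requirement.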

8
Interface of technologies and biology
  • Experimental design very important in HT biology
  • Experiments shaped by data access and
    availability
  • Re-analysis of old data with new methods important

9
(No Transcript)
10
Bioinformatics historical perspective
  • Stage 1 - bioinformatics term is coined to
    represent what had been DNA and protein sequence
    analysis (ca. 1995)
  • Stage 2 - additional disciplines become rolled
    into bioinformatics including literature mining,
    statistical analysis, and virtually anything to
    do with computational analysis of biological
    data. (ca. 2000)

11
Bioinformatics - historical perspective (contd)
  • Realization that bioinformatics is too broad a
    term; other disciplines break away, e.g. the
    OMICs fields (genomics, proteomics and others)
    (ca. 2001).
  • Still later (current), realization that we
    won't be able to make sense of individual
    disciplines without integrating them; the
    term now changes to integrative biology or
    systems biology (ca. 2003).

12
Importance of bioinformatics
  • Bioinformatics has become a major part of both
    the NCI 2015 directive and the NIH Roadmaps.
  • Virtually impossible to perform biological
    research without some form of computer aided
    analysis, especially in areas like genomics and
    proteomics.
  • Important to keep scientific community in touch
    with developing technologies and capabilities for
    highest return on research investment.

13
Bioinformatics infrastructures
  • Command-line implementations.
  • Primitive GUI implementations.
  • Sophisticated GUI interfaces and application
    packaging.
  • Web interface and Java language give platform
    independent access.
  • PC-based, web-based and server-based
    architectures.
  • Multi-tier infrastructures distribute the
    computational burden.

14
What does bioinformatics technology involve?
  • Computer readable form of some type or types of
    biological data (instruments)
  • Automation also requires programmable robotics
    capabilities (process science).
  • Computer infrastructure for storing and analyzing
    the data.
  • As data volume and complexity grows, the
    dependency on computer analysis increases.

15
Sources of bioinformatics technology
  • Computer science leveraged technologies including
    algorithms and data representation models,
    visualization frameworks and programming
    languages.
  • Web industry leveraged technologies including
    communication protocols, web servers and secure
    access.
  • Database industry derived connectivity and
    technologies.
  • Robotics and process engineering technologies for
    faster, cheaper throughput.

16
What can bioinformatics technology do for
biological science?
  • Develop uniform data standards and controlled
    vocabularies to allow for integration of
    disparate sources/types of data.
  • Connect scientists to entire wealth of knowledge
    from basic science results to clinical trial data
    in context-sensitive manner.
  • Fully integrate worldwide volume of knowledge,
    for example patient information
    (disease -> treatment -> outcome) across multiple
    centers to allow for cross-comparisons.
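
To make the controlled-vocabulary point concrete, here is a minimal sketch that pools disease, treatment and outcome records from several centers after mapping each center's local disease term to one shared term. The synonym table, the record layout and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical synonym table: local terms on the left, shared term on the right.
CONTROLLED_VOCAB = {
    "nsclc": "non-small cell lung carcinoma",
    "non small cell lung cancer": "non-small cell lung carcinoma",
    "non-small cell lung carcinoma": "non-small cell lung carcinoma",
}


def normalize_term(term):
    """Map a local term to the shared vocabulary; unknown terms pass through."""
    key = term.strip().lower()
    return CONTROLLED_VOCAB.get(key, key)


def pool_outcomes(records):
    """Group (disease, treatment, outcome) records by normalized terms."""
    pooled = defaultdict(list)
    for disease, treatment, outcome in records:
        key = (normalize_term(disease), treatment.strip().lower())
        pooled[key].append(outcome)
    return dict(pooled)


if __name__ == "__main__":
    records = [
        ("NSCLC", "Drug A", "response"),                      # center 1
        ("Non small cell lung cancer", "drug a", "no response"),  # center 2
    ]
    print(pool_outcomes(records))
```

Without the shared term, the two records would land in separate groups and could not be cross-compared.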

17
NCI Resources
  • caBIG / NCICB initiatives to develop an integrated
    data/tool environment.
  • Long term project requiring unprecedented
    cooperation, sharing.
  • Short term solutions for day-to-day problems.
  • Solution - use multiple approaches, staged
    implementation and layered technologies

18
(No Transcript)
19
ABCC hardware
  • 128-CPU Linux cluster (3.0 GHz processors).
  • 256-CPU Linux SMP box with 1 TB memory.
  • 64-CPU IRIX SMP box with 256 GB memory.
  • 32-CPU IBM AIX SMP computers.
  • 16-CPU IBM HPC AIX SMP computer.
  • 8 x 8-CPU IRIX computers.
  • Other miscellaneous computers, disk storage, tape
    backup and network connectivity.
  • Graphics visualization wall

20
(No Transcript)
21
ABCC Organization
  • Networking and Security
  • System administration
  • Scientific program development
  • Bioinformatics support
  • Staff: 40

22
ABCC Training Programs
  • Classes for NIH/NCI scientists
  • Unix, GCG, Java, High throughput sequence
    analysis, Geospiza (LIMS)
  • Eudora, Advanced Eudora, Webmail
  • Homology, Docking, QSAR, Intro to Modeling,
    Phred, Phrap, Consed
  • One-on-one consulting services and training.
  • Organize and host vendor specific training in
    genomics, pathways, and modeling

23
ABCC Support within ATP
(Diagram; labels as extracted, in order:)
  • Proteomics and Analytical Technologies (LPAT)
  • Computational Support, Database Tools/Pathways, Mass
    Storage and Archive, Pattern Analysis and Clustering
  • Molecular Technologies (LMT)
  • Image Analysis (IAL)
  • Computational Support, Database Tools and LIMS, Mass
    Storage and Archive, Bioinformatics/Web, Pattern/SNP
    Analysis
  • ABCC
  • Algorithm and Software, Image Database, Mass Storage
    and Archive, Viz Technology Development
  • Gene Expression (GEL)
  • Protein Chemistry (PCL)
  • Software Support
  • Gene Assembly and Validation
  • Animal Sciences (LASP)
  • Protein Expression (PEL)
  • Mass Storage, Database
  • POET/Web

24
ABCC applications
  • Sequence analysis - protein and nucleic acid, GCG
    and EMBOSS.
  • Sequence assembly, SNP detection.
  • Gene finders, analysis tools.
  • Molecular modeling, docking.
  • Molecular evolution and phylogeny.
  • Computational chemistry.
  • Linkage analysis.
  • Proteomics.
  • Classification tools (microarray and proteomics).

25
ABCC databases
  • GenBank and derived divisions.
  • RefSeq, WGS, UniGene divisions.
  • dbSNP, Gene, OMIM, HomoloGene.
  • UCSC, EBI and NCBI genome datasets.
  • LIMS systems, data management.
  • UniProt, PDB, PIR, iProClass, Swiss-Prot.
  • CGAP, MGC data files, pathways.
  • Medline, TRANSFAC and repeats data files.

26
ABCC web resources
  • ABCC General information web page:
    http://www.abcc.ncifcrf.gov
  • ABCC account application information:
    http://www.abcc.ncifcrf.gov/apps_apply.shtml
  • ABCC Training web page:
    http://www.abcc.ncifcrf.gov/training/courses.shtml
  • ABCC scientific applications webpage:
    http://www.abcc.ncifcrf.gov/app/htdocs/appdb/index.php
  • ABCC GRID Database web page:
    http://grid.abcc/ncifcrf.gov
  • ABCC Pipelines web page:
    http://www.abcc.ncifcrf.gov/app/login/login.php

27
The role of bioinformatics in cancer research
  • Diagnosis - identify classifiers to better
    sub-divide cancer etiologies into groups. Better
    individual data to put treatment and individual
    together.
  • Treatment - identify better methods to track
    treatment progress and indicate problems earlier.
  • Prevention - understand mechanisms for cancer
    initiation, progression and development and
    identify targets in this process.
  • Connect cancer patient data from geographically
    distributed cancer patients for more complete
    analysis.

28
Protein analysis tools
  • Protein composition, isoelectric point, molecular
    weight analysis tools.
  • Comparable alignment/searching tools for
    proteins.
  • Protein secondary structure prediction tools.
  • Protein structure modeling tools.
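
As a small illustration of the first bullet, here is a minimal sketch that computes amino acid composition and an approximate average molecular weight from a one-letter protein sequence. The function names are hypothetical. The residue masses are the commonly tabulated average values in daltons, and the result ignores modifications, so treat it as an estimate.

```python
from collections import Counter

# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free termini


def composition(sequence):
    """Count each residue in a one-letter protein sequence."""
    return Counter(sequence.upper())


def molecular_weight(sequence):
    """Approximate average molecular weight of an unmodified peptide chain."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence is empty")
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError("unsupported residue codes: %s" % sorted(unknown))
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


if __name__ == "__main__":
    peptide = "MKKA"  # toy sequence, not a real protein
    print(composition(peptide))
    print(round(molecular_weight(peptide), 2))
```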

29
Genomics tools
  • Gene finder and general genome annotation tools.
  • Cross genome comparison tools and databases.
  • Large scale sequence assembly and polymorphism
    identification tools.
  • Genomic visualization tools (UCSC, NCBI,
    Ensembl).
  • Data cleansing tools - vector screening, repeat
    masking.

30
Gene expression tools
  • EST Clustering and differential expression
    analysis tools and databases.
  • SAGE Analysis tools and databases.
  • Microarray data collection, calibration and
    analysis tools and databases.
  • Gene clustering and visualization tools.
  • Integration tools - pathways, regulatory networks
    and medical literature.
  • Databases for housing and querying the data.

31
Proteomics tools
  • Mass spectrometry tools for peptide
    identification.
  • Fragment classification tools for identification
    of diagnostics.
  • Peptide fragment resolution tools -
    identification of protein mixtures from peptide
    sets.
  • Databases for storing and querying the data.
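
The peptide-to-protein step can be sketched as an in-silico tryptic digest followed by a lookup. This is a minimal sketch, assuming the standard trypsin rule (cleave after K or R unless the next residue is P) and no missed cleavages. The protein names and sequences are invented, and real tools match measured masses rather than exact peptide strings.

```python
def tryptic_digest(sequence):
    """Split a protein after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def proteins_explaining(observed_peptides, proteins):
    """Map each observed peptide to the proteins whose digest contains it."""
    digests = {name: set(tryptic_digest(seq)) for name, seq in proteins.items()}
    return {
        peptide: sorted(name for name, digest in digests.items() if peptide in digest)
        for peptide in observed_peptides
    }


if __name__ == "__main__":
    toy_proteins = {"protA": "AKGGR", "protB": "AKTTK"}  # hypothetical sequences
    print(proteins_explaining(["AK", "GGR"], toy_proteins))
```

A peptide shared by several proteins (here "AK") cannot identify a protein on its own, which is why resolving mixtures needs the whole peptide set.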

32
Inherent bioinformatics problems
  • Keeping data sources synchronized and up to date.
  • Keeping applications up to date.
  • Remaining aware of current palette of available
    tools and resources.
  • Separation between computer developers and
    biologist users of software and databases.
  • The silo concept - separate dysfunctional units.
  • Lack of common language or database schema.

33
Data Analysis
  • Pathway analysis
  • Polymorphism
  • Proteomics
  • Image analysis
  • Homology Modeling
  • Live polymorphism analysis (if time permits)

34
Pathway Analysis
  • Identify specific requirements of individual
    tumor.
  • Advance to detection from diagnosis.
  • Multiple points to cause aberrations and multiple
    points to act to correct them.
  • Identify/characterize tissue, cell specific
    targets.

35
Pathway Gene Set Analysis
  • Many experiments result in sets of genes, e.g.
    microarray, proteomics, literature searches etc.
  • Clustering genes based on expression etc.
    provides only first dimension.
  • View prospective pathways impacted by changes in
    expression, protein levels, phosphorylation etc.
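
One common way to ask whether a gene list hits a pathway more often than chance is a hypergeometric over-representation test. The slides do not say which statistic the ABCC tools use, so this is a sketch of the general technique with hypothetical parameter names.

```python
from math import comb


def enrichment_p_value(universe_size, pathway_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn from universe_size genes,
    of which pathway_size belong to the pathway (hypergeometric upper tail)."""
    if not 0 <= overlap <= min(pathway_size, list_size):
        raise ValueError("overlap must fit inside both the pathway and the list")
    total = comb(universe_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, min(pathway_size, list_size) + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 20000 genes measured, 100 in the pathway,
    # 200 genes in the experimental list, 8 of them in the pathway.
    print(enrichment_p_value(20000, 100, 200, 8))
```

A small p-value only says the overlap is unlikely by chance. When many pathways are tested, the values still need multiple-testing correction.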

36
(Figure; sample labels: G5G8Tg1Liver, G5G8Tg2Liver,
G5G8-/-1Liver, G5G8-/-2Liver, G5G8-/-3Liver)
37
(Figure; sample labels: G5G8Tg1Liver, G5G8Tg2Liver,
G5G8-/-1Liver, G5G8-/-2Liver, G5G8-/-3Liver)
38
Integrative Strategy for Microarray Analysis
(Flowchart; node labels as extracted: Microarray Data,
Clustering Analysis, Load into WPS, WSCP, Unassigned
Genes, Integrate with WPS, Lists of Genes, Assign to
uncharacterized pathway(s), Assign to known pathway(s),
Putative Pathway, PSCP)
39
Project Goal: Integrate Biological Data and/or
Information Databases into Biological Networks
  • User input: Microarray Data, Proteomics
  • Protein Interaction Database (BIND, DIP etc.)
  • Comparative Genomics
  • Protein Modification: Phos., Glyco.
  • Gene regulation (Promoter etc.)
  • Gene Ontology
  • SNP & Haplotype Database (SNPinfo etc.)
  • Literature DB (e.g. Pubgene, ResNet)
  • NCBI resources (OMIM etc.)
  • Statistical Evaluation, Network Expansion (high,
    low confidence)
  • (Diagram nodes: P1, P2)
40
One example of analysis scenario
  • Microarray data: pathway analysis or clustering
    on local PC
  • Candidate gene sets, candidate pathway sets
  • Pre-computed DBs or run-time computed
  • Internet-enabled
  • SNP & Haplotype data (SNPinfo; disease association;
    promoter comparison: 1. CGI generator, 2. CoreSearch,
    3. ConsInspector)
  • Protein interaction
  • Literature-based (Pubgene etc., NCBI OMIM etc.)
  • GO
  • Known gene training
  • Weighted scoring (statistical analysis, filtering)
  • Final set of candidate genes (visualization and
    re-creation of the new subnetwork within the
    whole network)
  • Pathway expansion
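
The weighted scoring and filtering step can be sketched as below. The slide names the evidence types (protein interaction, literature, GO, SNP and haplotype data, promoter comparison) but gives no weights or threshold, so every number, gene name and function name here is a hypothetical placeholder.

```python
# Hypothetical weights per evidence type; the talk does not specify any.
EVIDENCE_WEIGHTS = {
    "protein_interaction": 2.0,
    "literature": 1.5,
    "gene_ontology": 1.0,
    "snp_haplotype": 1.0,
    "promoter": 0.5,
}


def score_candidates(evidence, weights=EVIDENCE_WEIGHTS, threshold=2.0):
    """Sum the weights of each gene's supporting evidence and keep genes
    scoring at or above the threshold, best first."""
    scores = {
        gene: sum(weights[source] for source in sources)
        for gene, sources in evidence.items()
    }
    kept = [(gene, score) for gene, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    toy_evidence = {
        "GENE_A": {"protein_interaction", "literature"},
        "GENE_B": {"promoter"},
        "GENE_C": {"gene_ontology", "snp_haplotype"},
    }
    print(score_candidates(toy_evidence))
```

In practice the weights would come from training on known genes, as the "Known gene training" item suggests.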
41
Polymorphism Impacts
  • Variation within species as great as differences
    between closely related species
  • Confounds correlation analysis
  • Impacts gene structure and expression
  • Start with complete sequence for individual,
    obtain polymorphism data for populations/strains
    and breeds etc.
  • Strains/breeds allow for good start

42
Polymorphism Types
  • SNPs
  • Indels
  • STRs
  • Tandem
  • NonTandem (Copy number variation)
  • Retroelement
  • Complex
  • Inversion/translocation
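
As an illustration of one polymorphism type, here is a minimal sketch that locates perfect short tandem repeats (STRs) in a single DNA sequence with a regular expression. The function name and the default unit lengths and copy counts are assumptions. Calling a locus polymorphic still requires comparing repeat counts across individuals or strains, which this sketch does not do.

```python
import re


def find_strs(sequence, min_unit=2, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats.

    The lazy quantifier prefers the shortest repeating unit, and the
    backreference \\1 requires exact copies, so interrupted repeats are missed.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    print(find_strs("GGACACACACTT"))
```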

43
STR Polymorphism View
44
Strain Trace and Contig Coverage View
45
InDel Polymorphism Information View
46
Location Polymorphism Locator Query
47
STR Query results
48
Polymorphism Visualization
49
Proteomics Initiative: ABCC Projects
  • Disk Storage and Archiving (centralized storage)
  • LAN Support
  • Software Development
  • Spectral Filtering
  • Clustering/Biomarker Identification
  • Database Development and Update
  • Peptide identification DB
  • MS Integration with Pathways
  • ABCC Pathway tool
  • Provide Scalable Computational Resources
  • Software Optimization
  • Sequest (working with LPAT, Yates Lab, and
    Thermoelectron)

50
(Figure; labels: Raw Data, Binning, Biological Marker,
Clustering)
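
The binning step named on this slide can be sketched as follows, assuming the raw data are mass spectra given as (m/z, intensity) pairs and that fixed-width bins are used. The slide specifies neither, so the bin width, data layout and function name are illustrative assumptions.

```python
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width m/z bins.

    Returns a dict keyed by bin index, where bin i covers
    [i * bin_width, (i + 1) * bin_width).
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


if __name__ == "__main__":
    toy_peaks = [(100.2, 5.0), (100.7, 3.0), (101.1, 2.0)]  # invented values
    print(bin_spectrum(toy_peaks))
```

Binning puts every spectrum on the same feature axis, which is what lets the later clustering step compare samples.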
51
Need for effective classification schemes for
correlating large amounts of data with Cancer
markers
  • Large amounts of data.
  • Many features (data points) to fit but few
    samples
  • Problems are under-determined (more features than
    samples), so classifiers over-fit easily
  • Solutions may be purely mathematical with no
    biological basis
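
The risk of purely mathematical solutions can be shown with random numbers. In this minimal sketch, with 3 samples per group, a feature of pure noise separates the two groups perfectly with probability 2 x 3! x 3! / 6! = 0.1, so about one in ten meaningless features looks like a perfect marker. The function name, group sizes and feature count are assumptions chosen for the demonstration.

```python
import random


def count_chance_separators(n_features=10000, n_per_group=3, seed=0):
    """Count random features whose values perfectly separate two groups."""
    rng = random.Random(seed)
    separators = 0
    for _ in range(n_features):
        # Both groups are drawn from the same distribution: no real signal.
        group_a = [rng.random() for _ in range(n_per_group)]
        group_b = [rng.random() for _ in range(n_per_group)]
        if max(group_a) < min(group_b) or max(group_b) < min(group_a):
            separators += 1
    return separators


if __name__ == "__main__":
    found = count_chance_separators()
    print("%d of 10000 noise features separate the groups perfectly" % found)
```

This is why a classifier built from many features and few samples needs validation on independent samples before any biological interpretation.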

52
Image Processing
  • Confocal Microscopy, Whole Animal Imaging
  • 3D Segmentation
  • Traditional/Real-time Microscopy
  • Automated Quantitative Feature Analysis

53
Confocal Imaging
  • Confocal Microscopy captures 3D volumes of tissue
    in situ
  • Cancer appearance / development is related to the
    cellular neighborhood
  • Therefore, segmentation and interpretation of
    cellular clusters is required
  • NCI Developed Algorithms
  • Segmentation needs human review

54
Imaging/Confocal Microscopy
55
Homology Modeling
  • Many new chemotherapeutic molecules are specific
    enzyme inhibitors
  • Structural biology plays key role in
    design/enhancement of these compounds.
  • Identify better inhibitors, understand specific
    differences and mechanisms.

56
Homology Modeling of Cysteine Finger in 3 Human
Raf Proteins
57
ABCC Bioinformatics Support Group
  • Anney Che
  • Jack Chen
  • Jin Chen
  • Qingrong Chen
  • David Liu
  • Uma Mudunuri
  • Jigui Shan
  • Wei Shao
  • Gary Smythers
  • Hong Mei Sun
  • Natalia Volfovsky
  • Xinyu Wen
  • Ming Yi
  • Jack Zhu

58
Bob Stephens, bobs_at_ncifcrf.gov, www.abc.ncifcrf.gov
Query tool
GBrowse