High-throughput Biological Data: The data deluge

Author: heringa
Created: 2/20/2003
Slides: 13


1
High-throughput Biological Data: The data deluge
  • Hidden in these data is information that reflects
    the existence, organization, activity and
    functionality of biological machineries at
    different levels in living organisms

Making the most effective use of this information
will prove essential for Integrative
Bioinformatics.
2
Data Issues
  • Data collection: getting the data
  • Data representation: data standards, data
    normalisation, ..
  • Data organisation and storage: database issues, ..
  • Data analysis and data mining: discovering
    knowledge (patterns/signals) from data and
    establishing associations among data patterns
  • Data utilisation and application: from data
    patterns/signals to models for bio-machineries
  • Data visualization: viewing complex data
  • Data transmission: data collection, retrieval,
    ..
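Data normalisation, listed above under data representation, can be illustrated with a minimal sketch: rescaling each row of a small matrix (e.g. one gene's expression values across samples) to mean 0 and standard deviation 1. The slides do not name a normalisation method; z-scores, the function name `zscore_rows` and the toy data are assumptions for illustration only.

```python
import statistics

def zscore_rows(matrix):
    """Rescale each row to mean 0 and population standard deviation 1.

    Assumption: z-score normalisation is used purely as an example method.
    Rows with zero spread are mapped to all zeros to avoid division by zero.
    """
    normalised = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.pstdev(row)
        if sd == 0:
            normalised.append([0.0] * len(row))
        else:
            normalised.append([(x - mean) / sd for x in row])
    return normalised

# Hypothetical toy data: two "genes" measured in four samples.
raw = [[2.0, 4.0, 4.0, 6.0],
       [10.0, 10.0, 10.0, 10.0]]
print(zscore_rows(raw))
```

Per-row scaling makes profiles with different absolute levels comparable; other schemes (e.g. quantile normalisation across samples) address different biases.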

3
Bio-Data Analysis and Data Mining
  • Existing/emerging bio-data analysis and mining
    tools for:
      • DNA sequence assembly
      • Genetic map construction
      • Sequence comparison and database searching
      • Gene finding
      • ..
      • Gene expression data analysis
      • Phylogenetic tree analysis, e.g. to infer
        horizontally transferred genes
      • Mass spectrometry data analysis for protein
        complex characterization
  • Current mode of work: often developing ad hoc
    tools for each individual application
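Sequence comparison, one of the tool categories above, is classically solved by dynamic programming. The sketch below computes a Needleman-Wunsch global alignment score; the scoring scheme (match +1, mismatch -1, linear gap -1) and the function name are assumptions chosen for illustration, not taken from the slides.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: best global alignment score of strings a and b
    under a linear gap penalty (scoring values are illustrative defaults)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,
                              score[i - 1][j] + gap,   # a[i-1] opposite a gap
                              score[i][j - 1] + gap)   # b[j-1] opposite a gap
    return score[n][m]

print(global_alignment_score("GATTACA", "GCATGCU"))
```

The table has (n+1)(m+1) cells, so time and memory are O(nm); database-search tools such as BLAST use heuristics to avoid filling this table for every database sequence.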
4
Bio-Data Analysis and Data Mining
  • As the amount and types of data, and their
    cross-connections, increase rapidly, the number
    of analysis tools needed will go up
    exponentially, e.g.:
      • blast, blastp, blastx and blastn from the
        BLAST family of tools
      • gene finding tools for human, mouse, fly, rice,
        cyanobacteria, ..
      • tools for finding various signals in genomic
        sequences: protein-binding sites, splice
        junction sites, translation start sites, ..
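As a minimal illustration of scanning genomic sequence for simple signals such as translation start sites, the sketch below finds open reading frames (ORFs) on the forward strand: an ATG start codon followed in frame by a standard-code stop codon (TAA, TAG or TGA). Real gene finders use much richer statistical models; the function name, the `min_codons` parameter and the forward-strand-only restriction are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def find_orfs(seq, min_codons=2):
    """Return sorted (start, end) 0-based half-open spans of forward-strand ORFs.

    Assumption: an ORF starts at an ATG and ends with the first in-frame stop
    codon (included in the span); ATGs nested inside an ORF are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon is found.
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no stop codon left in this frame
                if (j + 3 - i) // 3 >= min_codons:  # length incl. start and stop
                    orfs.append((i, j + 3))
                i = j + 3  # resume scanning after the stop codon
                continue
            i += 3
    return sorted(orfs)

print(find_orfs("CCATGAAATTTTAGGG"))
```

The reverse strand could be handled by running the same scan on the reverse complement and mapping coordinates back.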

5
Bio-Data Analysis and Data Mining
Many of these data analysis problems are
fundamentally the same problem(s) and can be
solved using the same set of tools, e.g.
clustering, or optimal segmentation by dynamic
programming.
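Clustering, named above as a tool shared across many problems, is sketched here with Lloyd's k-means on toy two-dimensional profiles. The slides do not name a clustering method; k-means, the function name `kmeans` and the data are assumptions for illustration.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means (one clustering method among many); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct starting points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels

# Hypothetical profiles: two well-separated groups.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(kmeans(pts, 2)[1])
```

k-means needs k in advance and favours compact, roughly spherical clusters; hierarchical clustering is a common alternative for expression data.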
Developing ad hoc tools for each application (by
each individual research group) may soon become
inadequate as bio-data production capabilities
ramp up further.
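Optimal segmentation by dynamic programming, named on this slide as a shared tool, can be sketched as follows: split a numeric series (e.g. a hypothetical GC-content profile along a sequence) into k contiguous segments so that the total within-segment squared deviation from each segment's mean is minimal. This cost function and the name `optimal_segmentation` are assumptions; the slides do not fix a particular objective.

```python
def optimal_segmentation(values, k):
    """Split values into k contiguous non-empty segments minimising the summed
    within-segment squared error. Returns (cost, segment start indices).
    O(k * n^2) time via prefix sums."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(values)")
    # Prefix sums of x and x^2 give any segment's squared error in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(values):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i, j):  # squared error of values[i:j] around its own mean
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    INF = float("inf")
    # cost[m][j] = best cost of splitting values[:j] into m segments
    cost = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # last segment is values[i:j]
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j], back[m][j] = c, i
    # Trace the segment boundaries back from the end of the series.
    starts, j = [], n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    return cost[k][n], starts[::-1]

profile = [0.1, 0.2, 0.1, 0.9, 1.0, 0.8, 0.2, 0.1]
print(optimal_segmentation(profile, 3))
```

Only the segment cost is problem-specific; swapping in another additive cost (e.g. a likelihood under a per-segment model) reuses the same recurrence.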
6
Bio-data Analysis, Data Mining and Integrative Bioinformatics
To have analysis capabilities covering a wide
range of problems, we need to discover the common
fundamental structures of these problems.
HOWEVER, in biology one size does NOT fit all.
The goal is to develop a data analysis
infrastructure in support of Genomics and beyond.
7
Algorithms in bioinformatics
  • string algorithms
  • dynamic programming
  • machine learning (Neural Networks, k-Nearest
    Neighbour, Support Vector Machines, Genetic
    Algorithms, ..)
  • Markov chain models
  • hidden Markov models
  • Markov Chain Monte Carlo (MCMC) algorithms
  • stochastic context-free grammars
  • EM algorithms
  • Gibbs sampling
  • clustering
  • tree algorithms
  • text analysis
  • hybrid/combinatorial techniques and more
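Hidden Markov models appear in the list above, and their standard decoding step, the Viterbi algorithm, is itself dynamic programming. The sketch below decodes a hypothetical two-state model (GC-rich "H" vs AT-rich "L" regions); the model and all its probabilities are made up for illustration and do not come from the slides.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for obs, computed in log space to avoid
    underflow. Assumes all probabilities used are > 0 (math.log(0) raises)."""
    # best[t][s] = log-probability of the best path ending in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical two-state model: GC-rich ("H") vs AT-rich ("L") regions.
states = ("H", "L")
start_p = {"H": 0.5, "L": 0.5}
trans_p = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
emit_p = {
    "H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}
print("".join(viterbi("GGCGCATATAT", states, start_p, trans_p, emit_p)))
```

The same forward recurrence with sums in place of maxima gives the sequence likelihood, which EM (Baum-Welch) training builds on.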
8
Sequence analysis and homology searching
9
Finding genes and regulatory elements
10
Expression data
11
Functional genomics
Monte Carlo
12
Protein translation