Agenda - PowerPoint PPT Presentation

About This Presentation
Title:

Agenda

Description:

Different views on biological data management (VLDB 2004 Panel on Biological Data Management) ... American Academy of Microbiology Report, 2002 ...

Slides: 18
Provided by: mges
Transcript and Presenter's Notes

1
Integrated Microbial Genomes (IMG) System
A Case Study in Biological Data Management
  • Different views on biological data management
    (VLDB 2004 Panel on Biological Data
    Management)
  • Computer Scientists
  • Source of problems for database research
  • Publication in database papers
  • Prototypes
  • Biologists
  • Vehicle for rapid data analysis
  • Publication in biology papers
  • Immediate solutions

Victor M. Markowitz, Frank Korzeniewski, Krishna Palaniappan, Ernest Szeto
Biological Data Management Technology Center, Lawrence Berkeley National Lab
Nikos C. Kyrpides, Natalia N. Ivanova
Microbial Genome Analysis Program, Joint Genome Institute
2
Biological Data Management Problem
  • Effective data analysis
  • involves combining data from multiple
    sources
  • single data type: data generation &
    collection
  • multiple data types: data association
  • in the context of inherently imprecise data

3
Background: Microbial Genomes
  • Applications
  • Healthcare, environmental cleanup, agriculture,
    industrial processes, alternative energy
    production

4
Microbial Genome Data Analysis Context
5
Data Analysis Example: Occurrence Profiles
  • Key Challenges
  • Representing abstract concepts with experimental
    data
  • Specifying individual and composite operations
  • Data coherence, completeness, integration
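
An occurrence profile can be pictured as a presence/absence vector of a gene family across a set of genomes. The following is a minimal sketch under that assumption; the genome names and family labels are hypothetical and do not come from the slides or from IMG.

```python
# Hypothetical data: which gene families were observed in which genome.
# All names are illustrative only, not taken from IMG.
genome_families = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam1", "fam3"},
    "genome_C": {"fam2"},
}

def occurrence_profile(family, genomes):
    """Return a presence/absence vector (1/0) for `family`, ordered by genome name."""
    return [1 if family in genomes[name] else 0 for name in sorted(genomes)]

# One profile per family seen in any genome.
all_families = sorted(set().union(*genome_families.values()))
profiles = {fam: occurrence_profile(fam, genome_families) for fam in all_families}
```

Families with identical profiles (here `fam1` and `fam3`) occur in the same genomes, which is the kind of pattern such an analysis might surface.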

6
Microbial Genomes: Data Generation & Collection
  • Process
  • Raw data
  • Small DNA sequence fragments
  • Assembled sequence fragments (contigs)
  • Complete (one contiguous) sequence
  • Interpreted data
  • Gene prediction (models)
  • Functional prediction (annotations)
  • Expert data validation (cleaning)
  • Expert annotations
  • Key Challenges
  • Diversity of data sources
  • Differences in models, depth/breadth of
    annotations
  • Consistency of the data transformation process

7
Data Transformation Process: Example
Microbial Genome Annotation Pipeline (ORNL)
[Diagram labels: Preliminary Functional Annotation, Annotation Data Files, ORF Calling, Fetch, Post, Sequence Data Files]
8
Microbial Genomes: Data Association
  • Key Challenges
  • Data quality/precision for different types of
    data, sources
  • Transience of identifiers, relationships

9
Biological Data Management Problem Revisited
  • Effective data analysis involves
  • combining data from multiple sources
  • in the context of inherently imprecise data
  • while addressing
  • Data quality
  • Data semantics, precision, integrity, provenance
  • System quality
  • Comprehensibility, performance, reliability,
    scalability
  • Development strategy
  • Choice of technologies
  • Devising (cost, time) effective solutions

10
Needed: System Development Framework
[Diagram label: Deploy System]
11
Requirement Analysis Example: IMG Data Analysis
Find unique genes in a genome of interest G0
with respect to related genomes G1, ..., Gk
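
One way to read this requirement is as a set difference: keep the genes of the genome of interest that have no counterpart in any related genome. The sketch below assumes each gene has already been assigned a family label (for example from sequence similarity); the function name, the `family_of` mapping, and the gene ids are hypothetical, and this is not presented as IMG's actual method.

```python
def unique_genes(target, related, family_of):
    """Genes of `target` whose family has no member in any related genome.

    target    -- iterable of gene ids in the genome of interest
    related   -- list of iterables of gene ids, one per related genome
    family_of -- dict mapping gene id -> family label (assumed precomputed;
                 hypothetical stand-in for a homology relation)
    """
    related_families = {family_of[g] for genome in related for g in genome}
    return [g for g in target if family_of[g] not in related_families]

# Illustrative data only.
family_of = {"t1": "famA", "t2": "famB", "t3": "famC",
             "r1": "famA", "r2": "famD", "s1": "famB"}
result = unique_genes(["t1", "t2", "t3"], [["r1", "r2"], ["s1"]], family_of)
```

Here only `t3` survives, because `famC` appears in neither related genome. How "counterpart" is defined (similarity cutoffs, family assignment) drives the result, which is why the requirement has to be made precise.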
12
Data Model Abstraction
  • Motivation
  • Adds precision
  • Allows reasoning in an established framework
  • Analogies to traditional data domain
  • Biological data modeling
  • Data warehouse concepts
  • Proven technology for large scale biological data
    management applications
  • Data Structure
  • Multidimensional data space
  • Gene, genome, function/pathway
  • Operations
  • Multidimensional space selections, projections,
    aggregations
  • Slice & dice, roll-up, drill-down analogies
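
The operations listed above can be illustrated on a tiny fact table with one row per (gene, genome, function) association. This is a minimal sketch with hypothetical rows and helper names, intended only to show selection, projection, and aggregation over the three dimensions, not the IMG schema.

```python
from collections import Counter

# Hypothetical fact table: one row per (gene, genome, function) association.
facts = [
    ("g1", "genome_A", "transport"),
    ("g2", "genome_A", "metabolism"),
    ("g3", "genome_B", "transport"),
    ("g4", "genome_B", "transport"),
]

def slice_by(rows, genome=None, function=None):
    """Selection: keep rows matching the given dimension values."""
    return [r for r in rows
            if (genome is None or r[1] == genome)
            and (function is None or r[2] == function)]

def project(rows, *dims):
    """Projection: keep only the listed dimension indexes, dropping duplicates."""
    return sorted({tuple(r[d] for d in dims) for r in rows})

def roll_up(rows, dim):
    """Aggregation: count rows (genes) per value of one dimension."""
    return Counter(r[dim] for r in rows)

genes_in_a = slice_by(facts, genome="genome_A")      # slice on the genome dimension
genome_function_pairs = project(facts, 1, 2)         # drop the gene dimension
genes_per_function = roll_up(facts, 2)               # roll up genes to functions
```

Drilling down is the reverse of the roll-up: starting from a count such as `genes_per_function["transport"]`, `slice_by(facts, function="transport")` returns the individual genes behind it.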

13
Data Model Abstraction Example: IMG Data Model
14
Data Model Abstraction Example: IMG Operations
[Diagram labels: Genes, Genomes, Functions/Pathways]
15
Data Analysis Example: Searching for Unique Genes
[Figure annotations: parasite in horses; causes human disease in tropical areas (melioidosis)]
16
Identifying Unique Genes of Interest
Genes involved in adherence and invasion
17
Exploring Unique Gene Details
18
Summary
  • Needed
  • Effective solutions for academic biological data
    management
  • Employing appropriate technologies and methods
  • Developed within (time, cost) constraints
  • IMG Case Study
  • System development process framework essential
    for
  • Continuously evolving content
  • aiming at coherence, completeness
  • Developing meaningful data analysis tools
  • Clarity of methods, parameters, results
  • Metric for success
  • Community adoption and support
  • Increase in analysis productivity and value

19
Summary
  • Biological Data Management in Academic Settings
  • Problems discussed in numerous forums since 1990
  • Tools, techniques: poorly understood and used
  • Potential Causes
  • "biologists have been ineffective in the care
    and feeding of databases that now extends to
    poor maintenance of genomics databases"
    (American Academy of Microbiology Report, 2002)
  • Computer scientists in pursuit of insignificant
    or misunderstood problems (Bio Data Management
    Workshop, 2003)
  • Have little interest in tedious, repetitive, data
    management tasks
  • "diminished responsibility for biological
    databases ... is correlated with lack of
    enthusiasm for funding these efforts" (AAM
    Report, 2002)
  • Poor industry support