An Ontology for Protein-Protein Interaction Data

1
An Ontology for Protein-Protein Interaction Data
  • Karen Jantz
  • CIS Honors Project
  • December 7, 2006

2
Overview
  • Problem Statement
  • Objectives
  • Approach
  • Background
  • Methodology
  • Evaluation
  • Demonstration
  • Conclusion

3
Problem Statement
  • Several sources for protein-protein interaction
    data
  • Different schemata
  • Different purposes
  • Different strengths/weaknesses

4
Objectives
  • Unify the data
  • Enable data mining
  • Evaluate reliability of data across data sources
  • Gain new information about the entire data set
  • Enable others to easily add other data sources to
    the set

5
Approach: Ontology
  • ontology, n.
    • that which exists (philosophy)
    • that which is represented (artificial
      intelligence)
  • A descriptive data model
  • Defines the entities and relationships within a
    domain
  • Based upon data
  • Human-readable
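
To make "entities and relationships within a domain" concrete, here is a minimal sketch of how such a model might look in code. The class names, fields, and accession strings (`ACC-A`, `ACC-B`) are illustrative assumptions and are not taken from the project's actual ontology.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Protein:
    """An entity in the domain (hypothetical fields)."""
    accession: str
    name: str = ""


@dataclass(frozen=True)
class Experiment:
    """Evidence supporting an interaction (hypothetical fields)."""
    method: str
    source_db: str


@dataclass
class Interaction:
    """A relationship linking participating proteins to their evidence."""
    participants: frozenset
    evidence: list = field(default_factory=list)


# Placeholder accessions, not real database identifiers.
a = Protein("ACC-A", "protein A")
b = Protein("ACC-B", "protein B")
interaction = Interaction(frozenset({a, b}))
interaction.evidence.append(Experiment("two hybrid", "DIP"))
```

Using a set for the participants means the model is not restricted to binary interactions and does not depend on the order in which proteins are listed.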

6
Approach: Ontology
  • Data integration
    • Enables simultaneous querying across multiple
      databases
  • Data transformation
    • Enables interchange between database formats
  • Data mining
    • Enables reasoning and learning over the entire
      data set

7
Background: Data Sources
  • DIP (Jing Xia)
    • Database of Interacting Proteins
    • Most reliable data set
  • BIND (Abhijit Erande, Aaron Schoenhofer)
    • Biomolecular Interactions Network Databank
    • Very large data set
    • Contains interactions, molecular complexes, and
      pathways

8
Background: Data Sources
  • MINT
    • Molecular INTeractions database
    • Experimentally verified protein interactions
    • Evaluates confidence level
  • IntAct
    • Not limited to binary interactions
    • Allows user submissions
  • MIPS CYGD
    • Munich Information Center for Protein Sequences
      Comprehensive Yeast Genome Database
    • Limited to yeast
    • Focuses on sequencing

9
Background: Tools
  • Protégé
    • Open-source project
    • Graphical ontology editor
    • Interacts with an OWL reasoner
    • Detailed API for modifying ontologies
      programmatically

10
Background: Tools
  • Prompt
    • A Protégé plugin
    • Enables ontology mapping
    • Enables ontology comparison

11
Background: Related Work
  • PSI-MI
    • Controlled vocabulary for PPI data
    • Not a proposed database structure
    • Decreases the strength of information
    • Helpful in defining relationships and keys

12
Methodology: Overview
  • Web Interface
    • Q: What interactions have been observed with
      protein A?
    • Q: What experiments give evidence for a given
      interaction?
  • Unified Ontology
  • Unified Data Set
  • Transformation
  • Data sources: DIP, BIND, MIPS, MINT, IntAct

13
Methodology: Design
  • Review the individual database schemata and
    determine strengths/weaknesses
  • View data files
    • Native formats
    • PSI-MI formats
  • Create a unified schema of the data sources
  • Create the unified ontology in Protégé
  • Create each individual database as a subset of
    the unified ontology

14
Protégé Screenshot

15
Methodology: Data Import
  • DOMParser
    • Load data from XML
  • Protégé-OWL API
    • Insert entities into the individual databases
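
The import step can be illustrated with a small sketch. The slide names DOMParser and the Protégé-OWL API; the code below swaps in Python's standard `xml.dom.minidom` as the DOM parser and uses plain dictionaries in place of ontology individuals. The XML snippet, its element and attribute names, and the function `load_interactions` are made up for illustration; real PSI-MI files are considerably richer.

```python
from xml.dom.minidom import parseString

# Simplified, made-up XML; this is NOT the real PSI-MI schema.
SAMPLE_XML = """
<interactionList>
  <interaction id="i1" method="two hybrid">
    <participant ref="ACC-A"/>
    <participant ref="ACC-B"/>
  </interaction>
  <interaction id="i2" method="coimmunoprecipitation">
    <participant ref="ACC-A"/>
    <participant ref="ACC-C"/>
  </interaction>
</interactionList>
"""


def load_interactions(xml_text):
    """Parse the XML into a list of plain dicts, one per interaction."""
    document = parseString(xml_text)
    records = []
    for node in document.getElementsByTagName("interaction"):
        participants = [
            p.getAttribute("ref")
            for p in node.getElementsByTagName("participant")
        ]
        records.append({
            "id": node.getAttribute("id"),
            "method": node.getAttribute("method"),
            "participants": participants,
        })
    return records


records = load_interactions(SAMPLE_XML)
```

In the workflow the slide describes, each parsed record would then be inserted as an individual through the ontology API rather than kept as a dictionary.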

16
Methodology: Transformation
  • Use Prompt to create a mapping from each specific
    data source to the unified ontology
  • Use the Prompt mappings to insert individuals from
    each individual ontology into the unified model
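
The idea of a per-source mapping can be sketched outside Prompt as a simple field-renaming table. This does not use Prompt's own mapping format; the field names on both sides (`interactor_a`, `molA`, `protein_a`, and so on) are hypothetical and do not reflect the real DIP or BIND schemata.

```python
# Hypothetical field names; the real source schemata differ.
FIELD_MAPPINGS = {
    "DIP": {
        "interactor_a": "protein_a",
        "interactor_b": "protein_b",
        "detection": "method",
    },
    "BIND": {
        "molA": "protein_a",
        "molB": "protein_b",
        "exp_type": "method",
    },
}


def to_unified(source, record):
    """Rename a source record's fields to the unified names and tag its origin."""
    mapping = FIELD_MAPPINGS[source]
    # Fields with no counterpart in the unified model are dropped.
    unified = {mapping[key]: value for key, value in record.items() if key in mapping}
    unified["source"] = source
    return unified


dip_row = {"interactor_a": "ACC-A", "interactor_b": "ACC-B", "detection": "two hybrid"}
bind_row = {"molA": "ACC-B", "molB": "ACC-A", "exp_type": "two hybrid", "internal_id": 17}

unified_rows = [to_unified("DIP", dip_row), to_unified("BIND", bind_row)]
```

Keeping the source name on every unified record preserves provenance, which matters later when comparing reliability across data sources.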

17
Methodology: Transformation
  • Duplicate data
    • Need to fill in attributes on existing records
  • Write an algorithm plugin for Prompt to determine
    when two individuals are the same
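
A rough sketch of the duplicate-handling idea follows. The slides do not specify how the plugin decides that two individuals are the same, so the criterion here (same unordered pair of proteins) is an assumption, as are the record fields and the function names `interaction_key` and `merge_records`.

```python
def interaction_key(record):
    """Assumed sameness criterion: the same pair of proteins, in either order."""
    return frozenset((record["protein_a"], record["protein_b"]))


def merge_records(records):
    """Collapse duplicates, filling in attributes the existing record lacks."""
    merged = {}
    for record in records:
        existing = merged.setdefault(interaction_key(record), {"sources": []})
        existing["sources"].append(record["source"])
        for name, value in record.items():
            # Only fill gaps; never overwrite a value that is already present.
            if name != "source" and existing.get(name) is None:
                existing[name] = value
    return list(merged.values())


rows = [
    {"protein_a": "ACC-A", "protein_b": "ACC-B", "method": None, "source": "DIP"},
    {"protein_a": "ACC-B", "protein_b": "ACC-A", "method": "two hybrid", "source": "BIND"},
    {"protein_a": "ACC-A", "protein_b": "ACC-C", "method": "two hybrid", "source": "MINT"},
]
merged_rows = merge_records(rows)
```

Here the first two rows collapse into one record whose missing `method` is filled in from the second source, while the list of sources records where the interaction was reported.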

18
Prompt Screenshot - Mapping

19
Methodology: Query Interface
  • Export Protégé data into MySQL
  • Web interface for collecting data
  • Working with domain experts to determine useful
    views and queries
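
The two questions posed in the methodology overview could be answered by queries along the following lines. This is only a sketch: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL, and the table layout, column names, and sample rows are assumptions rather than the project's exported schema.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL export; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE interaction (
    id INTEGER PRIMARY KEY,
    protein_a TEXT NOT NULL,
    protein_b TEXT NOT NULL
);
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    interaction_id INTEGER NOT NULL REFERENCES interaction(id),
    method TEXT NOT NULL,
    source_db TEXT NOT NULL
);
INSERT INTO interaction VALUES
    (1, 'ACC-A', 'ACC-B'),
    (2, 'ACC-C', 'ACC-A'),
    (3, 'ACC-B', 'ACC-C');
INSERT INTO experiment VALUES
    (1, 1, 'two hybrid', 'DIP'),
    (2, 1, 'coimmunoprecipitation', 'BIND'),
    (3, 2, 'two hybrid', 'MINT');
""")


def interactions_with(protein):
    """Q: What interactions have been observed with a given protein?"""
    rows = conn.execute(
        "SELECT id, protein_a, protein_b FROM interaction "
        "WHERE protein_a = ? OR protein_b = ? ORDER BY id",
        (protein, protein),
    )
    return rows.fetchall()


def evidence_for(interaction_id):
    """Q: What experiments give evidence for a given interaction?"""
    rows = conn.execute(
        "SELECT method, source_db FROM experiment "
        "WHERE interaction_id = ? ORDER BY id",
        (interaction_id,),
    )
    return rows.fetchall()
```

Parameterized queries (the `?` placeholders) keep user input from the web interface out of the SQL text itself.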

20
Evaluation
  • Performance
    • Transformation time in Protégé
    • Query time for web interface
  • Size
    • Minimize redundancy in data model
    • Minimize duplicate data

21
Evaluation
  • Correctness
    • Domain experts: Dr. Brown, Dr. Wang
    • Maintain proper data relationships
  • Utility
    • Enrich data

22
Evaluation

23
Demonstration

24
Future Work
  • Complete transformations
  • Import data
  • Evaluate ontology
  • Add other databases to model

25
Conclusions
  • Adequate start
  • Needs improvement, evolution, more data sources
  • As the project matures, the ontology will be
    ready for use in the biological domain
  • Will be able to more easily gain information
    about protein-protein interactions

26
References
  • AAAI.org - AITopics Ontology
    • http://www.aaai.org/AITopics/html/ontol.html
  • Protégé
    • http://protege.stanford.edu/overview/protege-owl.html
  • Prompt
    • http://protege.cim3.net/cgi-bin/wiki.pl?Prompt
  • PSI-MI
    • http://psidev.sourceforge.net/mi/xml/doc/user

27
References
  • BIND
    • http://www.bind.ca
  • DIP
    • http://www.dip.doe-mbi.ucla.edu
  • IntAct
    • http://www.ebi.ac.uk/intact/site/
  • MINT
    • http://mint.bio.uniroma2.it/mint/Welcome.do
  • MIPS
    • http://mips.gsf.de/genre/proj/yeast

28
Q & A