Intelligent Curation Using Ontologies - PowerPoint PPT Presentation

Title:

Intelligent Curation Using Ontologies

Slides: 31
Provided by: Kat8182
Learn more at: http://ww25.co-ode.org

1
Intelligent Curation Using Ontologies
  • K. Wolstencroft

2
Introduction
  • Developing an automated system for extracting and
    classifying proteins from newly sequenced genomes
  • Background
  • Architecture
  • Advantages

3
Motivation
  • Genome sequencing techniques have greatly improved
  • More whole genomes are being sequenced quickly, so
    lots of data is being generated
  • Without analysis and classification, sequences
    are simply a series of letters
  • Therefore, data analysis is now the rate-limiting
    step

4
Why Classify?
  • Classification and curation of a genome is the
    first step in understanding the processes and
    functions happening in an organism
  • Classification enables comparative genomic
    studies - what is already known in other
    organisms
  • The similarities and differences between
    processes and functions in related organisms
    often provide the greatest insight into the
    biology

5
Background: DNA to Proteins
  • Genome sequencing produces DNA sequences
  • DNA is the blueprint of an organism
  • DNA encodes complex molecules, mostly proteins
  • Proteins are the functional molecules of a cell

6
Proteins
  • Complex molecules constructed from sequences of
    amino acids
  • 20 different amino acids with different chemical
    properties

7
Proteins: Primary Structure
  • Amino acid sequences can be represented as a
    series of single letters
  • >1A5Y_ PROTEIN TYROSINE PHOSPHATASE 1B
  • MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPCRVAKLPKNKNRNRYRDV
    SPFDHSRIKLHQEDNDYINASLIKMEEAQRSYILTQGPLPNTCGHFWEMV
    WEQKSRGVVMLNRVMEKGSLKCAQYWPQKEEKEMIFEDTNLKLTLISEDI
    KSYYTVRQLELENLTTQETREILHFHYTTWPDFGVPESPASFLNFLFKVR
    ESGSLSPEHGPVVVHXSAGIGRSGTFCLADTCLLLMDKRKDPSSVDIKKV
    LLDMRKFRMGLIATAEQLRFSYLAVIEGAKFIMGDSSVQDQWKELSHEDL
    EPPPEHIPPPPRPPKRILEPHNGKCREFFPN
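
The single-letter representation above is in FASTA style: a header line starting with ">" followed by wrapped sequence lines. As a minimal sketch (the function name `parse_fasta` is illustrative and not part of the presented system), such text could be read into header and sequence pairs like this:

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are wrapped, so collect and join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# First two sequence lines of the slide's example, for illustration only.
EXAMPLE = """>1A5Y_ PROTEIN TYROSINE PHOSPHATASE 1B
MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPCRVAKLPKNKNRNRYRDV
SPFDHSRIKLHQEDNDYINASLIKMEEAQRSYILTQGPLPNTCGHFWEMV
"""

records = parse_fasta(EXAMPLE)
```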

8
Proteins: Tertiary Structure
  • Sequence determines structure

9
Searching for Features
  • The relationship between amino acid sequence and
    eventual protein structure means that we can
    search for distinct structural (and functional)
    domains within the sequence
  • Domains could be several amino acids long or
    could span most of the protein

10
Example
  • A search of the linear sequence of protein
    tyrosine phosphatase type K identified 9
    functional domains
  • >uniprot|Q15262|PTPK_HUMAN Receptor-type
    protein-tyrosine phosphatase kappa precursor (EC
    3.1.3.48) (R-PTP-kappa).
  • MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHV
    SAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNP
    GTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYI
    AIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV..

11
Human Expert Annotation
  • Bioinformaticians use a series of tools to
    identify functional domains
  • Similarity searching, domain/motif identification
  • Tools include BLAST / INTERPRO
  • Tools simply show presence of domains
  • Use expert knowledge to classify proteins
    according to domain arrangements
  • Presence, order and number of each domain are all important
  • Can an ontology be used to capture this knowledge
    to the standard of a human annotator?

12
Protein Family Classification
  • Proteins are divided into broad functional classes:
    protein families
  • Diagnostic domains/motifs often signify family
    membership
  • Initial study focuses on the protein phosphatase
    family

13
The Protein Phosphatases
  • Large superfamily of proteins involved in the
    removal of phosphate groups from molecules
  • Important proteins in almost all cellular
    processes
  • Involved in diseases such as diabetes and cancer
  • Human phosphatases are well characterised

14
Characterisation allows classification
  • Diagnostic phosphatase domains/motifs are
    sufficient for membership of the protein
    phosphatase superfamily
  • Other motifs determine a protein's place within
    the family
  • This human expert knowledge can be captured and
    incorporated into the model if the domain
    organisations are represented in a formal OWL DL
    ontology

15
Protein Functional Domains
Andersen et al. (2001) Mol. Cell. Biol. 21: 7117-36
16
Determining Class Definitions
  • R2A
  • Contains 2 protein tyrosine phosphatase domains
  • Contains 1 transmembrane domain
  • Contains 4 fibronectin domains
  • Contains 1 immunoglobulin domain
  • Contains 1 MAM domain
  • Contains 1 cadherin-like domain
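
A class definition of this kind is essentially a set of domain names with required counts. The sketch below is a plain-Python stand-in, not the OWL model itself: `R2A_DEFINITION` copies the counts from the slide, while `matches_definition` and the example domain list are hypothetical. Domains that the definition does not mention are ignored, which is an assumption of this sketch.

```python
from collections import Counter

# Domain composition of class R2A as listed on the slide.
R2A_DEFINITION = {
    "protein tyrosine phosphatase": 2,
    "transmembrane": 1,
    "fibronectin": 4,
    "immunoglobulin": 1,
    "MAM": 1,
    "cadherin-like": 1,
}


def matches_definition(domains, definition):
    """True if every domain in the definition occurs exactly the required number of times.

    Domains absent from the definition are not checked (assumption of this sketch).
    """
    counts = Counter(domains)
    return all(counts[name] == n for name, n in definition.items())


# Hypothetical protein: the domain names an annotation tool might report.
protein_domains = (
    ["MAM", "immunoglobulin", "cadherin-like", "transmembrane"]
    + ["fibronectin"] * 4
    + ["protein tyrosine phosphatase"] * 2
)

is_r2a = matches_definition(protein_domains, R2A_DEFINITION)
# Dropping one domain from the list makes the match fail.
is_r2a_truncated = matches_definition(protein_domains[:-1], R2A_DEFINITION)
```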

17
Protégé OWL Modelling
18
Requirements
  • Extract phosphatase sequences from the rest of the
    protein sequences in a whole genome
  • Identify the domains present in each sequence
  • Compare these sequences to the formal ontology
    descriptions
  • Classify each protein instance to a place in the
    hierarchy

19
Technology
  • OWL DL ontology
  • Instance Store
  • myGrid Workflow
  • Classified Protein Phosphatases
  • Raw protein sequences
  • Reasoner (racer)
20
myGrid Workflow
  • Extract sequences from whole genome
  • Perform simple filtering (patmatdb)
  • Run InterProScan to determine domain
    architecture
  • Transform the InterProScan results into abstract
    OWL instance descriptions
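
The filtering step can be pictured as a motif search over the extracted sequences. The sketch below uses a Python regular expression as a stand-in for patmatdb. The C-x(5)-R active-site pattern is chosen only for illustration, since the slides do not say which pattern the workflow used, and the sequences and function name are made up.

```python
import re

# Illustrative motif: cysteine, any five residues, arginine (C-x(5)-R).
# Assumption: the actual pattern used in the workflow is not given in the slides.
PTP_MOTIF = re.compile(r"C.{5}R")


def filter_candidates(records, pattern=PTP_MOTIF):
    """Keep only the (header, sequence) pairs whose sequence contains the motif."""
    return [(header, seq) for header, seq in records if pattern.search(seq)]


# Toy sequences, not real proteins.
toy_records = [
    ("toy1", "MKVHCSAGVGRSGT"),  # contains C-SAGVG-R
    ("toy2", "MKLLAAGGSS"),      # no cysteine at all
]

hits = filter_candidates(toy_records)
```

Only the sequences that pass this cheap filter would then go on to the slower domain scan.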

21
myGrid Workflow
22
InterProScan Results
23
Conversion to abstract OWL format
  • restriction(<http://www.owl-ontologies.com/unnamed.owl#containsDomainIPR000340> cardinality(1))
  • restriction(<http://www.owl-ontologies.com/unnamed.owl#containsDomainIPR001763> cardinality(1))
  • restriction(<http://www.owl-ontologies.com/unnamed.owl#containsDomainIPR000387> cardinality(1))
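
A minimal sketch of this conversion step, assuming the domain scan yields a list of InterPro accessions per protein. The base IRI and the `containsDomain` plus accession naming follow the restrictions on the slide, the `#` separator is assumed, and `to_restrictions` is a hypothetical helper rather than the workflow's actual code.

```python
from collections import Counter

# Assumed namespace; the "#" separator is a guess at the original IRI form.
BASE = "http://www.owl-ontologies.com/unnamed.owl#"


def to_restrictions(interpro_ids, base=BASE):
    """Turn a list of InterPro accessions into abstract-syntax cardinality restrictions.

    Repeated accessions are counted, so two hits for one domain give cardinality(2).
    """
    counts = Counter(interpro_ids)
    return [
        f"restriction(<{base}containsDomain{ipr}> cardinality({n}))"
        for ipr, n in sorted(counts.items())
    ]


restrictions = to_restrictions(["IPR000340", "IPR001763", "IPR000387"])
```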

24
Instance Store
  • Instance Store enables reasoning over individuals
  • Can support much higher numbers of individuals
  • OWL ontology is loaded into the instance store
  • A DL reasoner (racer) is used to compare
    individuals to the OWL ontology definitions

25
Instance Store
26
Example Instances
  • Protein individual: Dual Specificity Phosphatase DUSE
  • restriction(<http://www.owl-ontologies.com/unnamed.owl#containsDomainIPR000340> cardinality(1))
  • restriction(<http://www.owl-ontologies.com/unnamed.owl#containsDomainIPR000387> cardinality(1))
  • Ontology definition of Dual Specificity
    Phosphatase: containsDomain IPR000340
  • Necessary and sufficient for class membership
  • Also inherits containsDomain IPR000387 from
    parent class PTP
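
The comparison of an individual against class definitions can be approximated with set inclusion. This is a simplified stand-in for the DL reasoner, not how Racer works: the two-class hierarchy and its InterPro accessions come from the slide, while the class names, data layout and functions are hypothetical. As a simplification, a class matches here only when both its own and its inherited domains are present.

```python
# Toy hierarchy: each class lists its own required domains and its parent.
CLASSES = {
    "PTP": {"parent": None, "requires": {"IPR000387"}},
    "DualSpecificityPhosphatase": {"parent": "PTP", "requires": {"IPR000340"}},
}


def required_domains(name):
    """Own required domains plus everything inherited from ancestor classes."""
    cls = CLASSES[name]
    domains = set(cls["requires"])
    if cls["parent"] is not None:
        domains |= required_domains(cls["parent"])
    return domains


def classify(domains):
    """Return the most specific classes whose requirements the domain set satisfies."""
    domains = set(domains)
    matched = [c for c in CLASSES if required_domains(c) <= domains]
    # Drop any class that is the parent of another matched class.
    parents = {CLASSES[c]["parent"] for c in matched}
    return sorted(c for c in matched if c not in parents)


duse = classify({"IPR000340", "IPR000387"})
generic_ptp = classify({"IPR000387"})
unrelated = classify({"IPR001763"})
```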

27
So Far...
  • Human phosphatases have been classified using the
    system
  • The ontology classification performed as well
    as expert classification
  • The ontology system refined the classification
  • - DUSC contains a zinc finger domain:
    characterised and conserved, but not in the
    classification
  • - DUSA contains a disintegrin domain:
    previously uncharacterised, evolutionarily
    conserved

28
Aspergillus fumigatus
  • Phosphatase complement very different from human
  • >100 in human, <50 in A. fumigatus
  • Whole subfamilies missing
  • Different fungi-specific phosphorylation
    pathways?
  • No requirement for tissue-specific variations?
  • Novel serine/threonine phosphatase with a homeobox domain
  • Conserved in Aspergillus and closely related
    species, but not in any other

29
Conclusions
  • Using an ontology allows automated classification to
    reach the standard of human expert annotation
  • Reasoning capabilities allow interpretation of
    domain organisation
  • Produces interesting biological questions
  • Allows fast, efficient comparative genomics
    studies
  • System currently describes protein phosphatases -
    but possible to expand to other protein families

30
Acknowledgements
  • Group: myGrid
  • PhD supervisors: Andy Brass, Robert Stevens
  • Phosphatase biologist: Lydia Tabernero
  • Ontogrid and NIBHI