Some thoughts on PATO


1
Some thoughts on PATO
  • Chris Mungall
  • BBOP
  • Hinxton
  • May 2006

2
Outline
  • Motivation revisited
  • The Ontology PATO
  • OBD using PATO for annotation

3
Who should use PATO?
  • Originally
  • model organism mutant phenotypes
  • But also
  • ontology-based evolutionary systematics
  • neuroscience BIRN
  • clinical uses
  • OMIM
  • clinical records
  • to define terms in other ontologies
  • e.g. diploid cell invasive tumor, engineered
    gene, condensed chromosome

4
Unifying goal: integration
  • Integrating data
  • within and across these domains
  • across levels of granularity
  • across different perspectives
  • Requires
  • Rigorous formal definitions in both ontologies
    and annotation schemas

5
Some thoughts on the ontology itself
  • Outline
  • Definitions
  • how do we define PATO terms?
  • what exactly is it we're defining?
  • is_a hierarchy
  • what are the top-level distinctions?
  • what are the finer grained distinctions?
  • shapes and colors

6
It's all about the definitions
  • Everything is doomed to failure without rigorous
    definitions
  • even more so with PATO than other ontologies
  • OBO Foundry Principle
  • Definitions should describe things in reality,
    not how terms are used
  • def should not use the word "describing"
  • Should we come up with a policy for definitions
    in PATO?
  • currently 19 defs (2.5 are circular)
  • proposed breakout session: examine all these

7
  • consistency: the property of holding together and
    retaining shape
  • amplitude: The size of the maximum displacement
    from the 'normal' position, when periodic motion
    is taking place
  • placement: The spatial property of the way in
    which something is placed
  • pointed value: A sharp or tapered end
  • epinastic value: A downward bending of leaves or
    other plant parts
  • oblong value: Having a somewhat elongated form
    with approximately parallel sides
  • elliptic value: Elliptic shape
  • hearted value: Heart shaped
  • fasciated value: Abnormally flattened or
    coalesced
  • opacity: The property of not permitting the
    passage of electromagnetic radiation
  • opaque value: Not clear; not transmitting or
    reflecting light or radiant energy
  • undulate value: Having a sinuate margin and
    rippled surface
  • permeability: The property of something that can
    be pervaded by a liquid (as by osmosis or
    diffusion)
  • porosity: The property of being porous; being
    able to absorb fluids
  • porous value: able to absorb fluids
  • viscosity: a property of fluids describing their
    internal resistance to flow
  • viscous value: a relatively high resistance to
    flow.
  • latency: The time that elapses between a stimulus
    and the response to it
  • power: The rate at which work is done

8
Proposal: genus-differentia definitions
  • An S is a G which D
  • Each def should refine the is_a parent
  • Single is_a parent
  • Example (non-PATO)
  • binucleate cell def: a cell which has two nuclei
  • Example (proposed PATO def)
  • convex shape def: a shape which has no
    indentations
  • opacity def: an optical quality which exists by
    virtue of the bearer's capacity to block the
    passage of electromagnetic radiation
  • very similar to existing def
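
The "An S is a G which D" pattern can be sketched as a small data structure. This is only an illustration: the names `Term`, `definition_text` and `is_circular` are hypothetical, and the circularity check is a crude string test, not a policy proposed in the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """A term defined by its single is_a parent (genus) plus a differentia."""
    name: str
    genus: Optional["Term"] = None  # single is_a parent; None only for a root
    differentia: str = ""           # what distinguishes this S from other Gs

    def definition_text(self) -> str:
        if self.genus is None:
            return f"{self.name} (root, no genus)"
        return f"a {self.genus.name} which {self.differentia}"

    def is_circular(self) -> bool:
        # Crude check: the differentia reuses the name of the term it defines.
        return self.name.lower() in self.differentia.lower()


cell = Term("cell")
binucleate = Term("binucleate cell", genus=cell, differentia="has two nuclei")
shape = Term("shape")
convex = Term("convex shape", genus=shape, differentia="has no indentations")
print(binucleate.definition_text())
```

Because every term carries exactly one genus, each definition text automatically refines its is_a parent.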

9
This policy will reap benefits
  • Advantages
  • Helps avoid circularity
  • Ensures precision
  • Consistency in wording is user-friendly
  • Considerations
  • Sometimes leads to awkward phrasing
  • -ity suffix: "an opacity which ..."
  • Solution
  • allow shortened gerund form
  • having ..., being ...
  • most of the existing defs conform already
  • implicit prefix: "A G which exists by virtue of
    the bearer ..."

10
From the top down
  • First, the fake term "pato" must be removed
  • How do we define attribute?
  • Note: I prefer the term quality or property
  • attribute implies attribution
  • length_in_centimetres is an attribute
  • we can of course continue to say attribute but
    I use quality in these slides
  • most of the new PATO defs are phrased as "a
    property of ...", which I like, but this is
    inconsistent with calling the root attribute
  • Well then, what is a quality/property?

11
What a quality is NOT
  • Qualities are not measurements
  • Instances of qualities exist independently of
    their measurements
  • Qualities can have zero or more measurements
  • These are not the names of qualities
  • percentage
  • process
  • abnormal
  • high

12
Some examples of qualities
  • The particular redness of the left eye of a
    single individual fly
  • An instance of a quality type
  • The color red
  • A quality type
  • Note: the eye does not instantiate red
  • PATO represents quality types
  • PATO definitions can be used to classify quality
    instances by the types they instantiate

13
[Diagram: the particular case of redness (of a particular fly eye) instantiates the type red; an instance of an eye (in a particular fly) instantiates the type eye; the redness inheres in (is a quality of, has_bearer) the eye instance.]
14
Qualities are dependent entities
  • Qualities require bearers
  • Bearers can be physical objects or processes
  • Example
  • A shape requires a physical object to bear it
  • If the physical object ceases to exist (e.g. it
    decomposes), then the shape ceases to exist
  • Some qualities are relational
  • they relate a bearer with other entities
  • e.g. sensitivity (to)
  • Compare with functions
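
The distinction between quality types, quality instances and bearers can be sketched in code. The class names below (`Type`, `EntityInstance`, `QualityInstance`) are hypothetical; the sketch only assumes what the slides state, namely that a quality instance instantiates a quality type and inheres in a bearer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Type:
    name: str


@dataclass(frozen=True)
class EntityInstance:
    label: str
    instantiates: Type


@dataclass(frozen=True)
class QualityInstance:
    instantiates: Type          # a quality type, e.g. red
    inheres_in: EntityInstance  # the bearer; required, since qualities are dependent


eye_type, red_type = Type("eye"), Type("red")
left_eye = EntityInstance("left eye of one individual fly", instantiates=eye_type)
redness = QualityInstance(instantiates=red_type, inheres_in=left_eye)
print(redness.instantiates.name, "inheres in", redness.inheres_in.label)
```

Note that `left_eye` instantiates `eye_type` only; it is the separate `redness` instance that instantiates `red_type`. A `QualityInstance` cannot be constructed without a bearer, which mirrors the dependence described above.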

15
The PATO hierarchy
  • Proposal for a new top level division
  • Proposal for granular divisions

16
Proposal 1: top-level division
  • Spatial quality
  • Definition: A quality which has a physical object
    as bearer
  • Examples: color, shape, temperature, velocity,
    ploidy, furriness, composition, texture
  • Spatiotemporal quality
  • Definition: A quality which has a process as
    bearer
  • Examples: rate, periodicity, regularity, duration

17
Proposal 2: subsequent divisions
  • Based on granularity (i.e. size scale)
  • a good account of granularity is vital for
    inferences from molecular (gene) level to
    organismal (disease) level
  • How do we partition the levels?
  • Some qualities are realised at certain levels of
    granularity
  • Others can be realised across levels
  • shape, porosity
  • Sum-of-parts vs emergent

18
(No Transcript)
19
(No Transcript)
20
Granular hierarchy
  • quality
  • spatial quality
  • spatial physical and physico-chemical quality
  • mass, concentration
  • spatial biological quality
  • spatial molecular quality
  • spatial cellular quality
  • spatial organismal quality
  • spatial quality, multiple scales
  • morphology/form
  • optical quality
  • color, opacity, fluorescence

21
Advantages of dividing by granularity
  • Modular
  • strategic question
  • should we focus on biological qualities and work
    with others on morphology, physics-based
    qualities etc?
  • Good for annotation
  • easy to constrain at high level
  • e.g. organismal qualities cannot be borne by
    molecules
  • Mirrors GO and OBO Foundry divisions
  • Easier to find terms
  • to be proved, but I believe so

22
Considerations
  • Possible objection
  • The upper level of an ontology is what the user
    sees first
  • terms such as cross-granular quality may be
    perceived as undesirable and/or abstruse by some
    users
  • Counter-argument
  • Solvable using ontology views
  • aka subsets, slims

23
Relative and absolute
  • Currently PATO terms often come in 3s
  • e.g. mass, relative mass, absolute mass
  • Why do we need these?

24
PATO: one or two hierarchies?
  • Currently two hierarchies
  • attribute
  • value
  • My position
  • there should be one hierarchy of qualities
  • My compromise
  • it should be possible to transform PATO
    automatically into a single hierarchy

25
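Current PATO
[Diagram: two parallel hierarchies, attribute and value, each linked internally by is_a, with range links from attributes to values. Attribute side: color; hue, sat., var. Value side: colorV; hueV, sat.V, var.V; blackV, blueV, darkV, paleV.]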
26
Proposed change
[Diagram: a single is_a hierarchy under attribute: color; hue, sat., var.; black, blue, dark, pale.]
27
Arguments for a single hierarchy
  • Practical
  • elimination of redundancy
  • no clear line for deciding what should be A and
    what should be V
  • shape, bumpy vs bumpiness
  • Ontological
  • what kind of thing is a value?

Diederich 1997 quote here
28
Arguments against
  • Two hierarchies reflect cognitive and linguistic
    structures
  • e.g. the color of the rose changed from red to
    brown
  • 3 cognitive artifacts
  • we want to present data in a way that is natural
    to users
  • but this can be solved with a single collapsed
    hierarchy
  • Two are useful for cross-products
  • see later - distinguish modifiers from values
  • EAV is a common database pattern
  • so?

29
Compromise: transformations
  • The Two Hierarchies approach is workable if they
    can be automatically collapsed
  • Prerequisite: univocity
  • Each value must be defined to mean exactly one
    thing only
  • i.e. each value must be in the range of a single
    attribute
  • Example
  • having a value "fast" that could be applied to
    both the spatial quality velocity and the
    process quality duration would be forbidden
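
A minimal sketch of the univocity prerequisite and the automatic collapse, assuming the two hierarchies are available as a simple mapping from each attribute to the values in its range. The function names and the toy data are hypothetical, not part of PATO.

```python
from collections import defaultdict

# Hypothetical toy data: attribute -> values in its range.
ranges = {
    "hue": ["black", "blue"],
    "saturation": ["dark", "pale"],
}


def univocity_violations(ranges):
    """Return the values that appear in the range of more than one attribute."""
    owners = defaultdict(set)
    for attribute, values in ranges.items():
        for value in values:
            owners[value].add(attribute)
    return {v: sorted(a) for v, a in owners.items() if len(a) > 1}


def collapse(ranges):
    """Collapse attribute/value pairs into single-hierarchy is_a links."""
    violations = univocity_violations(ranges)
    if violations:
        raise ValueError(f"not univocal: {violations}")
    return {value: attribute
            for attribute, values in ranges.items()
            for value in values}


print(collapse(ranges))
```

With the forbidden example from the slide, `univocity_violations({"velocity": ["fast"], "duration": ["fast"]})` reports `fast` as belonging to two attributes, so `collapse` would refuse to run.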

30
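Collapse on ranges
[Diagram: the attribute and value hierarchies of current PATO (color, hue, sat., var.; colorV, hueV, sat.V, var.V, blackV, blueV, darkV, paleV), with is_a and range links, collapsed along the range links.]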
31
  • Shapes and colors

32
How many types of shape are there?
  • notched, T-shaped, Y-shaped, branched,
    unbranched, antrorse, retrorse, curled, curved,
    wiggly, squiggly, round, flat, square, oblong,
    elliptical, ovoid, cuboid, spherical, egg-shaped,
    rod-shaped, heart-shaped, ...
  • How do we define them?
  • How do we compare them?
  • Is it worth the effort?

33
Shape types need precise definitions to be useful
  • Real shapes are not mathematical entities
  • but mathematical definitions can help
  • Axes of classification
  • Dimensionality
  • 2-4D (process shapes)
  • concave vs convex
  • angular vs non-angular
  • number of
  • sides
  • corners
  • Primitive and composed shapes
  • Work with morphometrics community?
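
As one example of how a mathematical definition can help, the "concave vs convex" axis has a simple computable test for idealised 2D outlines. The sketch below assumes a simple (non-self-intersecting) polygon given as an ordered list of vertices; real biological shapes would first need to be approximated this way, which is itself an assumption.

```python
def is_convex(polygon):
    """True if a simple 2D polygon (ordered list of (x, y) vertices) is convex."""
    n = len(polygon)
    if n < 3:
        return False
    sign = 0
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        x2, y2 = polygon[(i + 2) % n]
        # z-component of the cross product of two consecutive edges
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross != 0:
            if sign == 0:
                sign = 1 if cross > 0 else -1
            elif (cross > 0) != (sign > 0):
                return False  # turn direction flipped: an indentation
    return sign != 0


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
notched = [(0, 0), (2, 0), (2, 2), (1, 1), (0, 2)]
print(is_convex(square), is_convex(notched))
```

The test matches the proposed definition of a convex shape as one with no indentations: any vertex where the outline turns the "wrong" way counts as an indentation.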

34
Shape likeness
  • We can post-coordinate some shape types
  • egg-shaped
  • head-shaped
  • A2-segment-shaped
  • Dangers of circularity
  • Only for genuine likeness (e.g. homeotic
    transformation)
  • not heart-shaped leaf
  • See annotation section of this presentation

35
Color
  • Keep PATO HSV model
  • but is black a color hue?
  • We should allow overlapping partitions of color
    space
  • different domains have sub-terminologies of
    color
  • Is color relational?
  • Humans vs tetrachromatic UV-seeing animals
  • Composition
  • using has_part

36
Color hierarchy
  • Physical quality
  • Optical quality: a physical quality which exists
    in virtue of the bearer interacting with visible
    electromagnetic radiation
  • Chromatic quality: an optical quality which
    exists in virtue of the bearer emitting,
    transmitting or reflecting visible
    electromagnetic radiation
  • Color hue
  • Color saturation
  • Color variation
  • Color
  • Opacity: an optical quality which exists in
    virtue of the bearer absorbing visible
    electromagnetic radiation
  • opaque
  • translucent
  • transparent

37
Part 2: Annotation using PATO
  • Annotation scheme desiderata
  • OBD Dataflow
  • Proposed annotation scheme

38
Annotation scheme desiderata
  • Rigour
  • There is a subset of the scheme which is simple
  • The entire scheme is expressive

39
It should have an unambiguous mapping to real
world entities
  • Even if PATO is completely unambiguous, an
    ill-defined annotation scheme may leave room for
    ambiguity
  • Example
  • Annotation
  • E=eye, Q=red
  • What does this mean?
  • both eyes are red in this one fly instance
  • at least one eye is red in this one fly instance
  • a typical eye is red in this many-eyed spider
  • both eyes are red in this one fly at some point
    in time
  • both eyes are red in this one fly at all times
  • all eyes are red in all flies in this experiment
  • some eyes are red in some flies in this
    experiment

40
There should be a certain usable subset that is
simple
  • Rationale - MODs have limited resources
  • building entry tools for simple subsets is easier
  • building databases and query/search engines is
    easier
  • curating with a less expressive formalism is
    easier, faster and requires less training
  • MODs' primary use case is search, for which
    expressivity is less useful
  • Specifics
  • Tools should have an (optional) simple facade
  • Simple annotations should be expressible in a
    simple syntax that is understood by users with
    relatively little training
  • There should be an exchange format and/or
    database schemas that use traditional technology
    as might be used in a MOD
  • e.g. XML, relational tables

41
The scheme must be highly expressive
  • Rationale
  • May be required by other NCBCs (BIRN)
  • May be required for cbio 200 gene list
  • Will be required in future
  • Specifics
  • Expressive superset will be optional
  • MODs can pick and choose their subset
  • Native exchange and storage format will be
    logic-based
  • Details outwith scope of this presentation

42
Dataflow
  • How will various kinds of phenotypic data get
    into OBD?
  • what kinds of data suppliers will use different
    formalisms?
  • 3 scenarios (more possible)

43
Example dataflow 1
  • Generic MOD curator annotates phenotypes using
    Phenote
  • Annotations stored directly in the MOD's central DB
  • MOD periodically submits to OBD
  • e.g. using Phenote to create pheno-xml
  • OBD converts pheno-xml to native logic-based
    formalism
  • Users can query MOD directly, or OBD
  • OBD will allow more expressive queries and have
    more data integrated

44
Example dataflow 2
  • Non-MOD generates complex annotations and stores
    them locally
  • e.g. BIRN group?
  • Periodic submissions to OBD
  • e.g. as OWL or Obo-format instance data
  • OBD converts to native logic-based formalism
  • Users can query OBD using more complex queries

45
Example dataflow 3
  • cBio MOD curates 200 genes using Phenote
  • Annotations may be stored outside normal MOD
    schema
  • schema may not be expressive enough for
    complicated phenotypes
  • TBD - up to MOD
  • Periodic submissions to OBD
  • Phenote can be used to submit pheno-xml, OWL or
    OBO
  • MOD doesn't have to worry about format
  • OBD converts to native formalism
  • Users can query OBD using relatively complex
    queries
  • Is this (should it be) different from 1?

46
[Diagram: MOD A, MOD B, MOD C and a Non-MOD source, a "pheno-detailed XML file", and OBD as the destination.]
47
Proposed annotation schema
  • The schema will be described informally using a
    simple syntax
  • I use E for entity and Q for quality
  • Pretend it is EAV if you like
  • with implicit superfluous A
  • The schema has (will have) a formal
    interpretation
  • aim: database exchange and removal of ambiguities
  • can be expressed using logical language
  • OBD will use an internal logic-based
    representation

48
Outline of annotation schema
  • EAV or EQ is not enough
  • Fine for (very) simple subset
  • Extensions
  • time
  • relational qualities
  • post-coordination of entity types
  • count qualities
  • measurements

49
Standard case: monadic qualities
  • Examples
  • E=kidney, Q=hypertrophied
  • autodef: a kidney which is hypertrophied
  • We assume that there is more contextual data (not
    shown)
  • e.g. genotype, environment, number of organisms
    in study that showed phenotype
  • Interpretation (with the rest of the database
    record)
  • all fish in this experiment with a particular
    genotype had a hypertrophied kidney at some point
    in time

50
Quantification
  • long thick thoracic bristles
  • 2 statements
  • E=thoracic bristle, Q=long
  • E=thoracic bristle, Q=thick
  • Default interpretation
  • A typical thoracic bristle is long and thick
  • Optional entity quantifiers
  • EQuant=some,all,most,<percentage>,<count>
  • E=thoracic bristle, Q=long, EQuant=80%
  • 80% of the thoracic bristles in this one
    individual fly
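
One possible reading of the entity quantifiers is as a predicate over the individual entities observed. The sketch below is an assumption-laden illustration: the function name is hypothetical, "most" is taken to mean more than half, a percentage is read as "at least", and a bare count is read as exact. None of these thresholds is fixed by the slides.

```python
def quantifier_holds(equant, flags):
    """Check an EQuant against per-entity observations.

    flags: list of booleans, one per observed entity (True = has the quality).
    equant: "some", "all", "most", a percentage such as "80%", or an int count.
    """
    hits, total = sum(flags), len(flags)
    if equant == "some":
        return hits >= 1
    if equant == "all":
        return total > 0 and hits == total
    if equant == "most":
        return hits * 2 > total  # assumption: "most" = more than half
    if isinstance(equant, str) and equant.endswith("%"):
        # assumption: a percentage means "at least this share"
        return hits * 100 >= float(equant[:-1]) * total
    return hits == int(equant)  # assumption: a bare count is exact


# Ten thoracic bristles in one fly, eight of them long.
bristles_long = [True] * 8 + [False] * 2
print(quantifier_holds("80%", bristles_long))
```

Making the quantifier explicit is what removes the ambiguity between "a typical bristle is long" and "every bristle is long".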

51
OBD internal representation
52
Time
  • Example
  • E=brain, Q=small, during=stage
  • An E which has a quality that instantiates Q during
    T
  • E has the quality Q for some extent of time, and
    that extent overlaps T
  • during and other temporal relations will come
    from the OBO Relations ontology
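
The temporal reading above ("E has the quality Q for some extent of time, and that extent overlaps T") reduces to an interval-overlap test. The sketch below assumes closed numeric intervals; the `Interval` class, the `holds_during` function and the example times are hypothetical and are not taken from the OBO Relations ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float
    end: float  # assumption: closed interval with start <= end

    def overlaps(self, other: "Interval") -> bool:
        return self.start <= other.end and other.start <= self.end


def holds_during(quality_extent: Interval, stage: Interval) -> bool:
    """True if the extent of the quality overlaps the stage T."""
    return quality_extent.overlaps(stage)


stage = Interval(24.0, 48.0)        # hypothetical stage, in hours
small_brain = Interval(36.0, 72.0)  # hypothetical extent of the quality
print(holds_during(small_brain, stage))
```

Under this reading the quality does not need to hold for the whole stage, only for some part of it.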

53
Relational qualities
  • E.g. sensitivity
  • E=eye, Q=sensitive, E2=red light

54
Post-coordinating entity types
  • E=blood in head, Q=pooled
  • Problem
  • The E may not be pre-defined (pre-coordinated,
    pre-composed) in the anatomy ontology
  • We can post-compose a type representation (aka
    make a cross-product)
  • E=(blood ∩ has_location(head))
  • The ability to post-coordinate may not be
    available in the simple-subset
  • can be expressed easily in pheno-xml, obo, owl,
    phenote(soon)
  • OBD will handle all required reasoning
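
A post-composed type can be represented as a genus plus a set of (relation, filler) restrictions. The sketch below is a hypothetical illustration: the class name, the `AND` rendering and the purely structural subsumption test are assumptions, and real reasoning would also need is_a and part_of inference over the fillers.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PostComposedType:
    """A genus type restricted by (relation, filler) differentia."""
    genus: str
    differentia: Tuple[Tuple[str, str], ...] = ()

    def __str__(self) -> str:
        parts = [self.genus] + [f"{rel}({filler})" for rel, filler in self.differentia]
        return " AND ".join(parts)


def subsumed_by(specific: PostComposedType, general: PostComposedType) -> bool:
    # Structural check only; ignores ontology reasoning over genus and fillers.
    return (specific.genus == general.genus
            and set(general.differentia) <= set(specific.differentia))


blood_in_head = PostComposedType("blood", (("has_location", "head"),))
blood = PostComposedType("blood")
print(blood_in_head)
```

A query for annotations on `blood` would then also return annotations on `blood_in_head`, since the post-composed type is subsumed by its genus.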

55
Pre-coordinating phenotypes
  • Mammalian phenotype ontology has pre-coordinated
    phenotype terms
  • osteoporosis
  • pink fur
  • OBD will be able to translate
  • post-coordinated queries to annotations on
    pre-defined terms
  • queries on pre-defined terms to post-coordinated
    phenotypes
  • Requirement
  • computable logical definitions are added to MP

56
Count qualities
  • wingless
  • polydactyly
  • spermatocytes devoid of asters

57
Absence can never be instantiated
  • wingless
  • E=wing, Q=absent
  • autodef: an instance of wing which is absent
  • Proposal: restate as
  • E=mesothoracic segment, Q=missing part, E2=wing
  • This has other advantages
  • works better for spermatocyte devoid of asters

58
The quality of being many does not inhere in a
finger
  • Polydactyly
  • E=finger, Q=supernumerary
  • autodef: a finger which is supernumerary
  • Restate as
  • E=hand, Q=supernumerary parts, E2=finger
  • a hand which has more fingers as parts than is
    typical
  • With count extension
  • E=hand, Q=supernumerary parts, E2=finger, Count=6
  • could also say 1
  • a hand with 6 fingers, which is more than
    normal

59
Proposed PATO sub-hierarchy
part count quality
    lacking parts
        lacking all
        lacking some
    having normal part count
    having extra parts
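
The part count sub-hierarchy lends itself to a simple classification rule once an observed count and a typical count are known. The function below is a hypothetical sketch; it assumes that "lacking all" and "lacking some" are the two subtypes of "lacking parts", and that the typical count is supplied by the caller.

```python
def part_count_quality(observed: int, typical: int) -> str:
    """Classify a bearer by how many parts of some type it has."""
    if observed < 0 or typical <= 0:
        raise ValueError("observed must be >= 0 and typical must be > 0")
    if observed == 0:
        return "lacking all"
    if observed < typical:
        return "lacking some"
    if observed == typical:
        return "having normal part count"
    return "having extra parts"


# A hand (bearer) with 6 fingers (parts), where 5 is typical.
print(part_count_quality(6, 5))
```

The quality is assigned to the bearer (the hand or the mesothoracic segment), never to the counted part itself.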
60
Mass count qualities
  • furriness
  • porosity
  • Bearers possess these qualities by virtue of the
    number and qualities of their granular parts
  • hairiness by virtue of number, width, length,
    spacing, orientation of hair-parts

61
What is the essence of hairy?
  • Attempt 1
  • E=skin, Q=hairy
  • but what if we do not have hairy
    pre-coordinated in PATO?
  • Alternate representation
  • E=skin, Q=excess fine-grained parts, E2=hair
  • open question: is this equivalent to, subsumed by,
    or related to representation 1?
  • Another representation
  • E=hair, Q=long
  • this is something different

62
increased brown fat cells
  • increased brown fat cells
  • Attempt 1
  • E=brown fat cell, Q=increased
  • autodef: a brown fat cell which is increased
  • Restate as
  • E=organism, Q=increased (granular) parts,
    E2=brown fat cell
  • works better for increased brown fat cells in
    upper body
  • OBD handles reasoning
  • should annotations to above be returned for
    queries of PATO term fatty?

63
Relativity
  • PATO has terms like
  • large
  • increased
  • Context is implicit
  • strain
  • species
  • genus/order
  • Extension to make explicit

64
In_comparison_to
  • Bigger than average for species/genus/etc
  • E=brain, Q=large, In_comparison_to=<taxon-id>
  • default is same species as specified by genotype
  • Comparative phenotypes
  • E=brain, Q=large, In_comparison_to=<phenotype-id>
  • requires recording phenotype IDs
  • e.g. two experiments, same genotype, different
    environment, phenotype stronger in one

65
Ratio: relative_to
  • Use cases
  • Size of brain relative to size of skull
  • Size of brain relative to size of skull in an
    individual when compared to size of brain relative
    to size of skull in a typical individual of that
    species
  • E=brain, Q=large, relative_to=skull,
    in_comparison_to=<taxon_id>
  • defaults to whole organism
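
The second use case combines two steps: a ratio within one individual (relative_to), then a comparison of that ratio with a typical ratio (in_comparison_to). A minimal sketch follows; the function names, the example sizes and the 10% tolerance are all hypothetical, since the slides do not define a cut-off.

```python
def relative_size(part: float, reference: float) -> float:
    """Size of a part relative to a reference structure (relative_to)."""
    if reference <= 0:
        raise ValueError("reference size must be positive")
    return part / reference


def compare_to_typical(individual_ratio: float, typical_ratio: float,
                       tolerance: float = 0.1) -> str:
    """Qualitative value of a ratio against a typical ratio (in_comparison_to)."""
    # tolerance is a hypothetical cut-off, not something PATO specifies.
    if individual_ratio > typical_ratio * (1 + tolerance):
        return "large"
    if individual_ratio < typical_ratio * (1 - tolerance):
        return "small"
    return "normal"


individual = relative_size(1300.0, 1500.0)  # brain vs skull, one individual
typical = relative_size(1200.0, 1600.0)     # brain vs skull, typical for taxon
print(compare_to_typical(individual, typical))
```

Keeping the two steps separate makes it explicit which context (reference structure or reference population) each part of the annotation refers to.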

66
Modifiers
  • E=bone, Q=notched, Mod=mild
  • Standardised qualitative modifiers
  • Meaning dependent on E and Q
  • Can have multiple, cross-cutting scales
  • qualitative and numeric/score based

67
Modifiers modify meaning of Q
  • Influence of Mod on Q is subjective but the
    direction is objective
  • Example: E=adult_human_body, during=sleep
  • Q=low,high temperature,
    Mod=mild,normal,moderate,extreme

[Diagram: a temperature axis running from low temperature to high temperature, annotated with a word scale and a score scale.]
68
Modifiers and PATO
  • Modifiers are not qualities
  • Modifiers should not be in a true ontology
  • But we can still give these PATO IDs
  • kept separate from core PATO ontology
  • Modifiers can be relational
  • relatum may be implicit
  • e.g. abnormal_with_respect_to

69
  • Modifiers serve similar purposes as Values in
    tripartite EAV model
  • Difference
  • absent, low, high are not treated in the same way
    as genuine quality types like notched, large,
    diploid, pink
  • they are ingredients in the representation
    language, and not types in an ontology

70
  • Heterozygous flies have very short and highly
    branched arista laterals.
  • E=arista lateral, EQuant=all, Q=short,
    Mod=extreme, in_comparison_to=Dmel
  • E=arista lateral, EQuant=all, Q=branched,
    Mod=extreme, in_comparison_to=Dmel

71
Measurements
  • Measurements are not qualities
  • In the schema, representations of measurements
    are attached to the representations of qualities
  • Separate measurement schema
  • don't need to discuss fine-grained details here
  • some data providers will require more detail than
    others here
  • e.g. averages, error bars, ...

72
  • E=tail, Q=length, Measurement=2cm
  • E=tail, Q=length, Measurement=.1cm,
    in_comparison_to=<individual-id>

73
Likeness
  • Shape likeness
  • Homeotic transformations
  • E=A2 segment, Q=morphology, Similar_to=A3 segment
  • Interpretation
  • An A2 segment with the morphological features of
    an A3 segment
  • but not heart-shaped leaves

74
Conditionals
  • Some phenotypes are only realised under certain
    conditions
  • environment
  • including chemical interactions, RNA interference
    etc
  • we should separate conditionals (this phenotype
    only seen in this envirotype with this genotype)
    from data (on this occasion this phenotype seen
    in this envirotype with this genotype)

75
Schema elements
  • Phenotype character
  • E
  • Q
  • EQuant
  • E2
  • Count
  • Mod
  • Relative_to
  • In_comparison_to
  • Similar_to
  • Measurement
  • Temporal
  • Most of these elements are optional
  • data providers pick and choose their level of
    expressivity
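
The schema elements can be sketched as a single record type in which most fields are optional. This is a hypothetical illustration, not the OBD or pheno-xml schema: the class name, the field types, and the choice to make E and Q mandatory are assumptions.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PhenotypeCharacter:
    E: str                                  # entity (required in this sketch)
    Q: str                                  # quality (required in this sketch)
    EQuant: Optional[str] = None
    E2: Optional[str] = None
    Count: Optional[int] = None
    Mod: Optional[str] = None
    Relative_to: Optional[str] = None
    In_comparison_to: Optional[str] = None
    Similar_to: Optional[str] = None
    Measurement: Optional[str] = None
    Temporal: Optional[str] = None

    def to_simple_syntax(self) -> str:
        """Render only the supplied elements as 'Name=value' pairs."""
        pairs = [(f.name, getattr(self, f.name)) for f in fields(self)]
        return ", ".join(f"{name}={value}" for name, value in pairs
                         if value is not None)


polydactyly = PhenotypeCharacter(E="hand", Q="supernumerary parts",
                                 E2="finger", Count=6)
print(polydactyly.to_simple_syntax())
```

A data provider using only the simple subset would fill in E and Q and leave everything else unset; a more expressive provider fills in whichever optional elements it needs.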

76
future extensions
  • boolean combinations
  • conditional statements
  • e.g. environment

77
modifier
