The Aim - PowerPoint PPT Presentation

About This Presentation
Title: The Aim
Slides: 28
Provided by: pate66

Transcript and Presenter's Notes



1
  • The Aim
  • Integrate genomic mapping data across different
    data sources on the Internet
  • To allow exploration of mapping information
  • To discover or predict new information
  • By exploiting Conservation of Synteny
    across species boundaries.

2
  • The Problems
  • Lots of different types of mapping data
  • from different types of experiments
  • in various species
  • different locations/computer systems
  • different representations of data
  • different terminology
  • different storage formats
  • data of variable quality
  • et cetera

3
  • (Part Of) The Solution
  • Define a language/terminology to describe and
    represent mapping data unambiguously
  • Capturing both the meaning of the data and the
    relationships represented in the data.
  • Use this language as a common representation for
    exchanging and querying data and providing
    results.
  • Perhaps use the semantics captured in this
    language to automatically discover new
    information.

4
What is an Ontology?
[Figure: Increasing Formality of Ontology]
5
Defining the ComparaGRID Domain Ontology.
  • Ontology: a (more or less) 'formal' specification
    of a domain of knowledge
  • (Here, genomic mapping data across all
    species)
  • What types of concepts are there (defined terms,
    things we need to talk about)
  • And how these concepts (might or necessarily)
    relate to each other
  • Can be used to control the vocabulary used for
    storing or describing data
  • Can represent formal logics that allow 'reasoning'
    about data (software can check the validity of
    data and deduce new information).

6
Formal Logics: What?
  • If we 'know' that
  • 1. Concept A is related to Concept B (α → β)
  • 2. Concept B is related to Concept C (β → γ)
  • Can we deduce/reason anything about possible
    relationships between Concepts A and C? (α ??? γ)

7
Formal Logics: Why?
  • For example
  • If in species A
  • Gene A is syntenic with Gene B (α → β)
  • Gene B is syntenic with Gene C (β → γ)
  • Gene C is syntenic with Gene D (γ → δ)
  • If we have defined Synteny to be Transitive
  • We can deduce that
  • Gene A is syntenic with Gene D (α → δ)
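The deduction on this slide amounts to computing the transitive closure of the synteny relation. A minimal Python sketch, assuming synteny assertions are held as plain (gene, gene) pairs; `transitive_closure` and the gene identifiers are hypothetical names, and in practice an ontology reasoner working over a property declared transitive would draw the same conclusion:

```python
def transitive_closure(pairs):
    """Return the transitive closure of a binary relation given as (x, y) pairs."""
    closure = set(pairs)
    while True:
        # For every chain x -> y -> z already known, derive the pair x -> z.
        derived = {(x, z) for (x, y1) in closure for (y2, z) in closure if y1 == y2}
        if derived <= closure:
            return closure  # nothing new can be deduced
        closure |= derived


# Asserted synteny facts for species A, as on the slide (hypothetical IDs).
asserted = {("GeneA", "GeneB"), ("GeneB", "GeneC"), ("GeneC", "GeneD")}
inferred = transitive_closure(asserted) - asserted
print(sorted(inferred))
# [('GeneA', 'GeneC'), ('GeneA', 'GeneD'), ('GeneB', 'GeneD')]
```

Besides Gene A/Gene D, the closure also yields Gene A/Gene C and Gene B/Gene D, which the slide leaves implicit.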

8
Formal Logics: Why?
  • More complex example
  • If in species A
  • Gene A1 is syntenic with Gene A2 (α1 → α2)
  • Gene A2 is syntenic with Gene A3 (α2 → α3)
  • And in a somewhat related species B
  • Gene B2 is syntenic with Gene B3 (β2 → β3)
  • Gene B3 is syntenic with Gene B4 (β3 → β4)

And sequence comparisons establish that A2/B2,
A3/B3 and A4/B4 are orthologues.
We might then postulate that an orthologue
of A1 could be found syntenic with B2, B3 and B4,
and that A4 might be syntenic with A1, A2 and A3.
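One way to turn this reasoning into a procedure: if two syntenic blocks share enough orthologue pairs, treat the synteny as conserved and report the gaps on each side. The sketch below is hypothetical; the function name, the data layout and the `min_shared=2` threshold are assumptions for illustration, not part of ComparaGRID:

```python
def predict_from_conserved_synteny(block_a, block_b, orthologues, min_shared=2):
    """Compare a syntenic block in species A with one in species B.

    block_a, block_b: sets of genes that are syntenic within each species.
    orthologues: set of (gene_in_A, gene_in_B) pairs from sequence comparison.
    Returns (block-A genes whose orthologue may lie in block B's region,
             species-A genes that may belong to block A).
    """
    shared = {(a, b) for (a, b) in orthologues if a in block_a and b in block_b}
    if len(shared) < min_shared:
        return set(), set()  # too little evidence that synteny is conserved
    # Block-A genes with no known orthologue in block B: predict one nearby in B.
    missing_in_b = block_a - {a for (a, _) in shared}
    # Species-A genes whose orthologue sits in block B: predict they join block A.
    joins_a = {a for (a, b) in orthologues if b in block_b and a not in block_a}
    return missing_in_b, joins_a


block_a = {"A1", "A2", "A3"}  # A1, A2, A3 syntenic in species A
block_b = {"B2", "B3", "B4"}  # B2, B3, B4 syntenic in species B
orthologues = {("A2", "B2"), ("A3", "B3"), ("A4", "B4")}
print(predict_from_conserved_synteny(block_a, block_b, orthologues))
# ({'A1'}, {'A4'})
```

The two returned sets correspond to the two postulates on the slide: an orthologue of A1 near B2 to B4, and A4 syntenic with A1 to A3.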
9
Exploiting Conserved Synteny to predict
candidate genes
10
The ComparaGRID Ontology
  • The ComparaGRID ontology defines the terminology
    used in the domain of Comparative Genomics, and
    how this terminology can be used.
  • There are two components of the ontology
  • 1. Classes (Concepts, or terms with
    definitions), and
  • 2. Properties (simple relationships between a
    Class and a Value; the value can be another
    Class, or a simple number, etc.)

11
Example Concepts and Properties
Map is a Concept. It has a definition: "The
abstract (typically linear) representation of an
informational macromolecule or chromosome etc.,
allowing the positioning of identifiable markers
along the length of the map..."
hasScaleUnit is a Property. In our ontology we can define which
Concepts can have particular Properties, and
which Concepts may be the values of particular
Properties.
Ontology Statement: Map hasScaleUnit ScaleUnit
Real Data: <RFxWL_UppsalaChromosome1> hasScaleUnit <centiMorgan>
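The ontology statement and the real-data statement above can be modelled as subject-property-object triples, with the property's declared domain and range used to check the data. A hedged Python sketch; the dictionaries `PROPERTIES` and `INSTANCE_OF` and the function `check_statement` are illustrative assumptions:

```python
# Hypothetical property declarations: property -> (domain, range).
PROPERTIES = {"hasScaleUnit": ("Map", "ScaleUnit")}

# Hypothetical class assignments for individuals in the real data.
INSTANCE_OF = {
    "RFxWL_UppsalaChromosome1": "Map",
    "centiMorgan": "ScaleUnit",
}


def check_statement(subject, prop, obj):
    """Return a list of problems with one (subject, property, object) statement."""
    if prop not in PROPERTIES:
        return [f"unknown property {prop!r}"]
    domain, range_ = PROPERTIES[prop]
    problems = []
    if INSTANCE_OF.get(subject) != domain:
        problems.append(f"{subject!r} is not a {domain}")
    if INSTANCE_OF.get(obj) != range_:
        problems.append(f"{obj!r} is not a {range_}")
    return problems


print(check_statement("RFxWL_UppsalaChromosome1", "hasScaleUnit", "centiMorgan"))
# []
print(check_statement("centiMorgan", "hasScaleUnit", "RFxWL_UppsalaChromosome1"))
# ["'centiMorgan' is not a Map", "'RFxWL_UppsalaChromosome1' is not a ScaleUnit"]
```

In OWL, `rdfs:domain` and `rdfs:range` are normally used to infer the types of individuals rather than to reject data; the sketch treats them as validation rules instead.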
12
Building the Ontology
The process of Ontology Definition involves:
  • Collecting all the terms and relationships in the
    knowledge domain
  • Providing definitions for terms (Concepts)
  • Classifying Concepts into related groups in a
    hierarchical tree
  • Defining the relationships found in the data
    (Properties)
  • Specifying the permitted domain and range for
    these properties
  • Specifying which properties are allowed, which
    must always be true, and which are disallowed
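The steps above (a hierarchy of Concepts, permitted domains, and properties that are required or disallowed) can be sketched together in Python. Everything here is hypothetical: the class names are borrowed from the map types described in later slides, while `hasLODScore` and the specific rules are invented purely for illustration:

```python
# Hypothetical concept hierarchy: child -> parent.
SUBCLASS_OF = {
    "Map": "LinkageGroup",
    "PhysicalMap": "Map",
    "ProbabilisticMap": "Map",
    "GeneticLinkageMap": "ProbabilisticMap",
    "RadiationHybridMap": "ProbabilisticMap",
}

# Hypothetical property rules.
DOMAIN = {"hasMarker": "LinkageGroup", "hasScaleUnit": "Map"}  # where a property is allowed
REQUIRED = {"LinkageGroup": {"hasMarker"}}                      # must always be present
DISALLOWED = {"PhysicalMap": {"hasLODScore"}}                   # must never be present


def lineage(cls):
    """Return cls followed by all of its ancestors in the tree."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return chain


def validate(cls, props):
    """Check the property names used on one individual of class cls."""
    classes = lineage(cls)
    errors = []
    for p in sorted(props):
        if p in DOMAIN and DOMAIN[p] not in classes:
            errors.append(f"{p} not allowed on {cls}")
        for c in classes:
            if p in DISALLOWED.get(c, set()):
                errors.append(f"{p} disallowed on {c}")
    # Required properties are inherited from every ancestor class.
    for c in classes:
        for p in sorted(REQUIRED.get(c, set()) - set(props)):
            errors.append(f"{cls} must have {p} (required by {c})")
    return errors


print(validate("GeneticLinkageMap", {"hasMarker", "hasScaleUnit"}))
# []
print(validate("LinkageGroup", {"hasMarker", "hasScaleUnit"}))
# ['hasScaleUnit not allowed on LinkageGroup']
print(validate("PhysicalMap", {"hasLODScore"}))
# ['hasLODScore disallowed on PhysicalMap', 'PhysicalMap must have hasMarker (required by LinkageGroup)']
```

Because rules are looked up along the whole lineage, a GeneticLinkageMap may use hasScaleUnit (it is a Map) and must have hasMarker (it is a LinkageGroup).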
13
[Diagram: terms grouped as CONCEPTS, SIMPLE RELATIONSHIPS and COMPLEX RELATIONSHIPS, including transcribedFrom, Microsatellite, identifier, PartOf, QuantitativeTrait, hasAbbreviation, TechniqueUsed, Chromosome, DNADuplication, Orthology, Interval Position, Mapping, Reciprocal BestMatch, GeneticLinkageMap]
14
Example Modelling Maps
  • WHAT IS A MAP? Information about the presence
    and ordering of Markers on an abstract
    representation of a macromolecule (DNA Molecule,
    Chromosome or even a Polypeptide).
  • Linkage Group: the simplest Map
  • a collection or set of markers that are
    inherited together without implied order,
  • i.e. the relationship between a Marker and a
    Linkage Group is a Containment: the Linkage
    Group contains Markers.
  • A true Map
  • has some sort of ordering of the Markers belonging
    to a Linkage Group,
  • i.e. the relationship between a Marker and a Map
    is a Mapping, which has a Position. This
    Position may be purely ordinal, or may be
    a co-ordinate associated with Scale Units.
    The Map maps Markers with a Position.
  • A Map is a specialized type-of Linkage Group
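The distinction between Containment (Linkage Group) and a Mapping with a Position (Map) can be sketched with Python dataclasses. This is a minimal illustration under assumed names (`LinkageGroup`, `Position`, `MarkerMap`, marker IDs `m1`/`m2`); `MarkerMap` stands in for the ontology's Map concept and avoids shadowing Python's built-in `map`:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LinkageGroup:
    """Markers inherited together, with no implied order (a Containment)."""
    name: str
    markers: set = field(default_factory=set)


@dataclass(frozen=True)
class Position:
    """An ordinal rank, optionally with a co-ordinate in some scale unit."""
    ordinal: int
    coordinate: Optional[float] = None
    scale_unit: Optional[str] = None


@dataclass
class MarkerMap(LinkageGroup):
    """Stands in for the ontology's Map: a LinkageGroup whose markers have Positions."""
    mappings: dict = field(default_factory=dict)  # marker -> Position (the Mapping)

    def map_marker(self, marker: str, position: Position) -> None:
        self.markers.add(marker)  # a mapped marker is still contained in the group
        self.mappings[marker] = position

    def ordered_markers(self) -> list:
        return sorted(self.mappings, key=lambda m: self.mappings[m].ordinal)


group = LinkageGroup("LG1", {"m1", "m2"})  # unordered: containment only
chr1 = MarkerMap("Chr1")
chr1.map_marker("m2", Position(2, 30.1, "centiMorgan"))  # co-ordinate position
chr1.map_marker("m1", Position(1))                       # purely ordinal position
print(chr1.ordered_markers())          # ['m1', 'm2']
print(isinstance(chr1, LinkageGroup))  # True
```

Because `MarkerMap` subclasses `LinkageGroup`, every mapped marker is also a contained marker, mirroring the statement that a Map is a specialized type-of Linkage Group.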

15
Modelling Maps
Workshop One distinguished two types of Maps:
Physical Maps and Probabilistic Maps.
1. Physical Map: A map of the locations of
identifiable landmarks on DNA (e.g.,
restriction-enzyme cutting sites, genes),
regardless of inheritance. At highest resolution,
distance is measured in base pairs; other units
may be used. For a given genome, the
lowest-resolution physical map might be the
banding patterns on the different chromosomes;
the highest-resolution physical map of a DNA
Molecule is its complete nucleotide
sequence.
e.g. Contig Map, Cytogenetic Map,
Breakpoint Map, Deletion Map, Fingerprint Map,
Restriction Site Map, Sequence Map (Amino Acid,
DNA, RNA)
16
Modelling Maps
2. Probabilistic Map: A map of the relative
locations of markers on a chromosome, derived from
an experimental analysis tracking the propensity of
markers to be inherited together following
natural or induced chromosomal disruption, i.e.
based on some probabilistic measure of
closeness.
e.g. Genetic Linkage Map, Meiotic Linkage Map,
Radiation Hybrid Map, HAPPY Map
In addition we might represent:
3. Integrated Map: A map combining mapping data from
multiple map sources and experiments
17
The Importance of Relationships
Defining concepts is easy ;-)
In many respects, defining concepts such as maps, genes,
positions, chromosomes etc. to represent the
species-specific maps in existing datasources is
straightforward. This language defines the nuts
and bolts used to represent and exchange the data
between individual datasources. However, some
concepts are problematic even within one
datasource, e.g. what is meant by a
Marker? Even more complicated are the
Relationships that we want to express between
data in different datasources and between
different organisms. This represents the
primary scientific challenge for ComparaGRID.
18
The Importance of Relationships
For example, a pig database records the mapping
of some marker PIGA on a map at position SSC9
30.1, and associates that marker observation with
a technique (PCR) and some reagents (primers P1
and P2 with Sequence S1 and S2).
[Diagram: map SSC9 around position 30 to 31, showing PIGA hasEvidence PCRDetection; hasReagent Primer1 (with sequence S1), Primer2 (with sequence S2)]
19
The Importance of Relationships
A cattle database records the mapping of a
marker COWX on a map at position BTA4 105.3, and associates
that marker observation with a technique (PCR)
and some reagents (primers P1 and P2 with
Sequence S1 and S2).
[Diagram: map BTA4 around position 105 to 106, showing COWX hasEvidence PCRDetection; hasReagent Primer1 (with sequence S1), Primer2 (with sequence S2)]
20
The Importance of Relationships
  • Pig primers P1 and P2 with Sequence S1 and S2
    (detecting Marker A) are identical to Cow
    primers P1 and P2 with Sequence S1 and S2
    (detecting Marker X).
  • What can we say about the possible relationships
    between Marker A and Marker X?
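Detecting the situation on these slides (markers in different species found with the same technique and identical primer sequences) is a simple grouping step. A hypothetical Python sketch, with the tuple layout and function name assumed; it only proposes candidate pairs and says nothing about whether they are homologous, which is exactly the open question:

```python
from collections import defaultdict

# Marker observations as described on the pig and cattle slides:
# (species, marker, map, position, technique, primer sequences).
observations = [
    ("pig", "PIGA", "SSC9", 30.1, "PCRDetection", ("S1", "S2")),
    ("cattle", "COWX", "BTA4", 105.3, "PCRDetection", ("S1", "S2")),
]


def shared_primer_candidates(obs):
    """Group observations that use the same technique and the same primer pair,
    keeping only groups that span more than one species."""
    groups = defaultdict(list)
    for species, marker, map_name, position, technique, primers in obs:
        key = (technique, frozenset(primers))  # primer order treated as irrelevant
        groups[key].append((species, marker, map_name, position))
    return [g for g in groups.values() if len({s for s, *_ in g}) > 1]


for group in shared_primer_candidates(observations):
    print(group)
# [('pig', 'PIGA', 'SSC9', 30.1), ('cattle', 'COWX', 'BTA4', 105.3)]
```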

21
The Importance of Relationships
  • What can we discover about the relationships
    between these mapping data? (And HOW can we
    discover any relationships between these data?)
  • Can we draw any inference from the use of
    identical primer sequences and a similar
    detection technique?
  • Does this imply a relationship between the
    cattle and pig markers?
  • Does it imply homology?
  • Is it evidence that they are or could be
    considered the same marker?
  • How good or reliable is any such inference?
  • How can we represent different values/qualities
    of such inferences to allow weighting of
    evidence?
  • How can we accumulate different strands of
    evidence to establish a real relationship between
    these markers and these regions of the two
    genomes?

22
ComparaGRID Ontology Classification of
Relationships
Some of the relationships that we want to capture
in our data can be represented by simple binary
properties: Concept A → Property → Concept B
e.g. hasPosition, hasScaleUnit, hasProduct, hasEvidence,
hasPart, mappedOn, containedOn, hasMarker, hasValue,
hasLatinName
23
Simple relationships can be represented as
Properties
[Diagram: example of simple properties, including the value "Homo sapiens"]
24
ComparaGRID Ontology Classification of
Relationships
Other relationships are more complicated, and
might link multiple concepts and have properties
attached to them. These are modelled as complex
concepts so that we can represent more details
about them, e.g. Mapping, Synteny, Orthology, Paralogy,
Containment, Similarity, TaxonomicIdentification,
IsMapOf
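Modelling a relationship as a concept is essentially reification: the relationship becomes an object that can hold its own properties. A minimal Python sketch contrasting the two styles for Orthology; the property name `orthologousTo`, the datasource labels and the evidence kinds are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    kind: str    # e.g. "ReciprocalBestMatch"
    source: str  # datasource that supplied the evidence


@dataclass
class Orthology:
    """A relationship modelled as a concept, so it can carry its own properties."""
    gene_a: str
    gene_b: str
    evidence: list = field(default_factory=list)

    def add_evidence(self, kind: str, source: str) -> None:
        self.evidence.append(Evidence(kind, source))


# As a simple property, the fact is a bare triple with nowhere to attach evidence.
simple = ("A2", "orthologousTo", "B2")

# As a complex concept, the relationship can accumulate evidence from several sources.
o = Orthology("A2", "B2")
o.add_evidence("ReciprocalBestMatch", "datasource_1")
o.add_evidence("Synteny", "datasource_2")
print(len(o.evidence))  # 2
```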
25
More complex relationships are modelled as
Concepts
26
Mapping is a type of Relationship
You can make a map of any DomainConcept made of
a biological informational macromolecule (DNA,
RNA, Protein...).
Any Concept that can experimentally be placed on
a Map/LinkageGroup, e.g. a Gene, Gene Product,
Genetic Variation, QTL, Phenotype, STS, EST, SNP,
nucleotide etc.
27
What's the point of all this ontological
classification etc.?
  • A structured classification makes it easier for
    the human user to understand and navigate the
    terminology.
  • The meaning of terms, and how the terms relate
    to each other, is captured more precisely.
  • We can see how terms used in different datasets
    relate to each other.
  • We can integrate datasets that are described
    using this common vocabulary.
  • We can link data and make inferences between
    species based on formalised rules and conditions.
  • Automatic classification and reasoning about
    data are feasible.