Title: Knowledge Elicitation: Going Low-Tech
1Knowledge Elicitation: Going Low-Tech
Susanna-Assunta Sansone, NET Project Coordinator
EMBL-EBI, The European Bioinformatics Institute, Cambridge, UK
Ontogenesis Network, Manchester, October 30-31, 2006
3Outline
- Rationale behind this work
- The Ontology for Biomedical Investigation (OBI)
- - An example of a collaborative ontology-building effort
- Knowledge elicitation exercises
- An ontology for the nutrition domain
- - Our method and experience
4EBI and the NET Project
- Contribute to standards initiatives, defining minimal descriptors, XML-based exchange formats and CVs/ontologies
- Technology-driven efforts, e.g.
- Microarray Gene Expression Data Society (MGED)
- Proteomics Standards Initiative (PSI), HUPO
- Metabolomics Standards Initiative (MSI), Metabolomics Society
- Genomic Standards Consortium (GSC)
- Collaborative projects (including initiatives above plus many others)
- MIcheck, gathering and integrating minimal description checklists
- Functional Genomics (FuGE) Object Model and XML serialization of the model
- Ontology for Biomedical Investigation (OBI, previously FuGO)
- Develop standards-compliant systems to store and exchange data
- ArrayExpress (MGED), Pride (PSI), metabolomics planned (MSI)
- Collaborate with domain-specific communities
- Nutrition, Environment and Toxicology domains (NET Project)
5OBI Overview
6OBI Communities
- Current technology-driven communities
- Omics technologies: standards initiatives
- HUPO - Proteomics Standards Initiative (PSI)
- Microarray Gene Expression Data (MGED) Society
- Metabolomics Society - Metabolomics Standards Initiative (MSI)
- Other technologies: groups around databases and networks
- Flow cytometry
- Polymorphism
- In situ hybridization and immunohistochemistry
- Current biology-driven communities
- NERC Environmental Bioinformatics Center (NEBC)
- Generation Challenge Programme (GCP)
- Biomedical Informatics Research Network (BIRN)
- Immunology Database and Analysis Portal
- National Center for Toxicogenomics, NIEHS
- Nutrigenomics Organization (NuGO)
7OBI Organization
- Coordination Committee
- Representatives of technological and biological communities
- - Monthly conference calls (papers, presentations, new groups etc.)
- Developers Working Group
- Representatives and members of these communities
- - Weekly conference calls (hands-on work on the ontology, tackling issues etc.)
- Advisory Board
- Advise on high level design and best practices
- Provide links to other key efforts
- - Barry Smith, IFOMIS (NcBIO and OBO Foundry)
- - Frank Hartel, NIH-NCI
- - Mark Musen, Stanford (Protégé Team and NcBIO)
- - Robert Stevens, Manchester Un
- - Steve Oliver, Manchester Un
- - Suzi Lewis, Berkeley Un (GO and NcBIO)
- Documentation and dissemination
- http://fugo.sf.net
8Nutrigenomics Community - NuGO
- Nutrigenomics is the study of the response of a genome to nutrients
- Using transcriptomics, proteomics and metabolomics technologies
- In combination with biometrics and clinical tests
- Epidemiological studies, intervention studies, gut microflora etc.
- The European Nutrigenomics Organization (NuGO) includes 22 partners and organisations from 10 European countries
- Aiming to develop and integrate all facets of resources
- An ontology would be one of these resources
- Providing semantics for those descriptors relevant to the interpretation of nutrigenomics experiments and the analysis of the data
- Upper-level framework provides semantics for those descriptors common to other domains
- -> Collaboration with OBI
- - Lower-level framework provides semantics for those descriptors specific to the nutritional domain
9Our Methodology
10Scope and Scenario
- Identify the scope
- What is the ontology going to be used for?
- What do you want the ontology to be aware of?
- What is the scope of the knowledge you want in the ontology?
- Define competency questions
- These are the questions for which we want the ontology to support reasoning and inference
- Illustrate possible scenarios
- Which investigations have samples treated with a high-fat diet?
- Which investigations employ microarrays in combination with metabolomics technologies?
- List those investigations in which the fasting phase lasts one day
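A minimal sketch of how the three scenario questions above could be phrased as checks against recorded investigations. Everything here is an illustrative assumption, not part of the NuGO or OBI work: the `Investigation` record, its field names and the three sample studies are hypothetical, and a real ontology would answer such questions through reasoning over classes and relations rather than through plain Python filters.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Investigation:
    """Hypothetical record of one investigation (not an OBI structure)."""
    name: str
    diets: Set[str] = field(default_factory=set)
    technologies: Set[str] = field(default_factory=set)
    fasting_days: Optional[int] = None  # None means no fasting phase recorded


# Made-up sample data, used only to exercise the questions.
INVESTIGATIONS = [
    Investigation("study_A", {"high-fat diet"}, {"microarray"}, 1),
    Investigation("study_B", {"low-fat diet"}, {"microarray", "metabolomics"}),
    Investigation("study_C", {"high-fat diet"}, {"microarray", "metabolomics"}, 2),
]


def treated_with(diet):
    """Which investigations have samples treated with the given diet?"""
    return sorted(i.name for i in INVESTIGATIONS if diet in i.diets)


def combining(*technologies):
    """Which investigations employ all of the given technologies together?"""
    wanted = set(technologies)
    return sorted(i.name for i in INVESTIGATIONS if wanted <= i.technologies)


def fasting_for(days):
    """Which investigations have a fasting phase lasting exactly `days` days?"""
    return sorted(i.name for i in INVESTIGATIONS if i.fasting_days == days)


print(treated_with("high-fat diet"))
print(combining("microarray", "metabolomics"))
print(fasting_for(1))
```

Writing the questions down this explicitly, even informally, helps show which concepts (diet, technology, fasting phase, duration) the ontology must be able to represent.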
11Knowledge Elicitation Phase
- The art of questioning
- Phase where the knowledge engineer gathers, in the form of concepts and relationships between concepts, what the domain experts understand to exist in that domain
- Challenges
- Expertise is socially distributed
- - Experts have different but complementary understanding of the domain, playing different roles in the organizations
- -> Project managers have the big picture of the experiment
- -> Experimentalists have hands-on experience with one or many technologies
- Expertise is geographically distributed
- Domain experts are located in 10 European countries
- Domain experts express their knowledge in natural language
12From Natural Language to Concepts
Seven week old C57BL/6N mice were treated with low-fat diet. Liver was dissected out, RNA prepared etc.
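As a rough illustration of the step from narration to concepts, the sentence above can be annotated by hand as subject-relation-object triples. This is only a sketch under stated assumptions: the relation names (`has_strain`, `treated_with`, `prepared_from` and so on) and the choice of concepts are hypothetical, written for this example, and are not taken from OBI or from the actual elicitation sessions.

```python
# Hand-written annotation of the example narrative.
# All concept and relation labels below are hypothetical.
TRIPLES = [
    ("C57BL/6N", "is_a", "mouse strain"),
    ("mouse", "has_strain", "C57BL/6N"),
    ("mouse", "has_age", "seven weeks"),
    ("mouse", "treated_with", "low-fat diet"),
    ("liver", "part_of", "mouse"),
    ("RNA", "prepared_from", "liver"),
]


def concepts(triples):
    """List every concept once, in order of first appearance."""
    seen = []
    for subject, _relation, obj in triples:
        for term in (subject, obj):
            if term not in seen:
                seen.append(term)
    return seen


def relations(triples):
    """List the distinct relation names used in the annotation."""
    return sorted({relation for _subject, relation, _obj in triples})


print(concepts(TRIPLES))
print(relations(TRIPLES))
```

The two lists give the knowledge engineer and the domain experts something concrete to review: the candidate concepts, and the vocabulary of relations that links them.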
13Knowledge Elicitation Methods
- One-to-one or one-to-many interaction
- Narrative approach
- Interviews
- Discussion and focus groups
- Survey forms
- Emails
- Diagrams
- Conceptual Maps (CMs)
14Conceptual Maps (CMs): Informal Artefacts
15CMs: Limitations and Use
- Not computationally enabled
- Greater utility than other forms of knowledge representation such as spreadsheets or word processor tables
- Quite intuitive and easy to use
- Have a (very) simple semantics
- Domain experts are able to
- - Represent concepts
- - Add definitions and examples
- - Declare relations
- Useful to test the representations in decentralised settings
- CMAP-tools version 3.8 as a CM editor
- Freely available
- http://cmap.ihmc.us
16CMs: Knowledge Elicitation Sessions
17CMs: Knowledge Elicitation Sessions
- Initial session(s)
- Move from the narration to a list of concepts
- Define some starting concepts
- Agree on what goes where
- Define an early structure of the relationships that glue the information together
- Following sessions
- Focus on structural aspects of specific concepts
- Guide the discussion by asking
- How does A relate to B?
- Why do we need A here instead of B?
- How does A impact on B?
- Discuss cardinality issues
18CMs: Knowledge Elicitation Sessions
- Iterative process
- Moving from instances to classes
- Create different levels of abstraction
- Identify is_a and define the whole/part-of relationships
E.g. define the content of a meal
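The meal example can be sketched with two small lookup tables, one for is_a and one for part_of. The terms used here (breakfast, porridge, orange juice and so on) are invented for illustration and are not the concepts agreed in the sessions. The sketch also assumes each term has at most one direct parent and that the is_a table contains no cycles.

```python
# Hypothetical is_a links: each term maps to its direct parent class.
IS_A = {
    "breakfast": "meal",
    "porridge": "food item",
    "orange juice": "beverage",
    "beverage": "food item",
}

# Hypothetical part_of links: each part maps to the whole it belongs to.
PART_OF = {
    "porridge": "breakfast",
    "orange juice": "breakfast",
}


def ancestors(term):
    """Follow is_a links upwards, from the direct parent to the most general class."""
    chain = []
    while term in IS_A:  # assumes the table has no cycles
        term = IS_A[term]
        chain.append(term)
    return chain


def parts_of(whole):
    """Return the terms declared as parts of `whole`."""
    return sorted(part for part, container in PART_OF.items() if container == whole)


print(ancestors("orange juice"))
print(parts_of("breakfast"))
```

Keeping the two relations in separate tables mirrors the distinction the sessions aim to draw out: what kind of thing a concept is, as opposed to what it is a component of.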
19CMs: Knowledge Elicitation Sessions
- Caveats
- Accuracy in the definition of terms
- - Create consistent labels for the terms
- - Add the context where the term is used
- - Provide examples
- Coherence of the story produced
- Ensure consistency within the narration, as CMs are enriched through an iterative process
- Extensibility of the representation
- Add more details to the existing CMs
- Generate new CMs
- Group concepts into higher-level abstractions
- -> Validate these with domain experts
- -> Analyse the models from different angles or perspectives
20"The experts need to perceive significant benefit before they can provide information to others" (Dave Randall)
21Explain the Benefits
22Engage with the Experts: Build the Trust
23Resources and Acknowledgements
- Nutrigenomics domain experts
- Ruan Elliot (Institute of Food Research, IFR)
- Anne-Marie Minihane (Food Science, Reading University)
- Their postdocs and technicians
- Other (OBI) experts
- Jennifer Fostel (NIH-NIEHS)
- Norman Morrison (NERC Environmental Bioinformatics Center, NEBC)
- Funds
- NuGO (EU NoE 503630), and Semantic Mining (EU NoE 507505)