Linking Diseases and Genes through Informatics Knowledge Bases and Ontologies - PowerPoint PPT Presentation

About This Presentation
Description:

Linking Diseases and Genes through Informatics Knowledge Bases and Ontologies Joyce A. Mitchell, Ph.D. National Library of Medicine University of Missouri – PowerPoint PPT presentation

Slides: 28
Provided by: csManAcU

Transcript and Presenter's Notes

Title: Linking Diseases and Genes through Informatics Knowledge Bases and Ontologies


1
Linking Diseases and Genes through Informatics
Knowledge Bases and Ontologies
  • Joyce A. Mitchell, Ph.D.
  • National Library of Medicine
  • University of Missouri

2
Research Collaborators
  • Olivier Bodenreider, M.D., Ph.D.
  • Alexa T. McCray, Ph.D.
  • Allen C. Browne

3
Research Goals
  • Investigating methods of connecting disease
    and genomic information.
  • Overall goals are to:
  • Overcome difficulties traversing multiple
    information resources
  • Examine coverage of Unified Medical Language
    System (UMLS), Gene Ontology™ (GO),
    LocusLink-OMIM
  • Develop methods to use ontologies more
    effectively
  • Present data in an understandable manner

4
Background: UMLS
  • NLM developed, maintains
  • Purpose: facilitate retrieval and integration of
    information from multiple biomedical sources
  • Interrelates 60 biomedical terminologies
  • MeSH, SNOMED, Read Codes, ICD, etc.
  • No vocabulary focused on molecular biology
  • 1.5 million English terms; 800,000 concepts
  • 134 Semantic Types; 54 Semantic Relationships

5
Background: Gene Ontology
  • GO Consortium developed, maintains
  • Purpose:
  • promoting cross-species methodologies for
    functional comparisons
  • Allows annotation of molecular information on
    genes, gene products
  • an essential start to creating a shared language
    of biology
  • Focused on:
  • molecular function (5626 terms)
  • biological processes (4677 terms)
  • cellular components (1077 terms)
  • Two semantic relations (is-a and part-of)
  • Genome Research 2001; 11:1425-33.
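
The two relations on this slide can be pictured as labeled edges in a directed graph. The sketch below is only an illustration: the edges are toy examples chosen for readability, not actual GO data, and the helper names (build_parents, ancestors) are hypothetical.

```python
from collections import defaultdict

# Toy edges (child, relation, parent). Illustrative only, NOT real GO data.
EDGES = [
    ("mitotic spindle", "is_a", "spindle"),
    ("spindle", "part_of", "cytoskeleton"),
    ("cytoskeleton", "part_of", "cell"),
]

def build_parents(edges):
    """Index each term by its direct (relation, parent) pairs."""
    parents = defaultdict(list)
    for child, relation, parent in edges:
        parents[child].append((relation, parent))
    return parents

def ancestors(term, parents):
    """Collect every term reachable by following is_a / part_of edges upward."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for _relation, parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

if __name__ == "__main__":
    print(sorted(ancestors("mitotic spindle", build_parents(EDGES))))
```

Because a term may have more than one parent, the structure is a graph rather than a strict tree, which is why the traversal tracks visited terms.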

6
Background - LocusLink
  • Curated, gene-centered resource of National
    Center for Biotechnology Information (NLM)
  • Gene names, gene product names, gene product
    functions, and reference sequences (DNA, RNA,
    protein)
  • Associates phenotype (diseases) to the genotype
    via Online Mendelian Inheritance in Man (OMIM)
  • Online links to major bioinformatics knowledge
    bases and the literature

7
Specific Questions
  • This study looked at coverage in UMLS of
  • 1244 genes associated with human diseases
  • 1702 diseases associated with the genes
  • 11,380 Gene Ontology terms
  • 38,832 genes/gene products in GO database
    (141,071 names)
  • Associations of genes and their functions in UMLS
  • Representation of gene function in GO compared to
    the UMLS

8
Methods
  • LocusLink query
  • human genes whose sequence is known and
    associated with disease (1244 loci)
  • LocusLink data
  • Genes/gene products (official names, synonyms,
    symbols)
  • Phenotypes (diseases) (1702 diseases)
  • GO data
  • all concepts (ontology terms), excluding obsolete
    terms (11,380 terms)
  • Gene products from all species (134,646 unique
    names, 38,832 genes)

9
Methods
  • LocusLink and GO terms mapped to UMLS concepts
  • normalization used
  • mappings constrained by semantic type
  • LocusLink loci studied for relationships in UMLS
  • Gene/GP - phenotype
  • Gene/GP - molecular function
  • Gene/GP - biological process
  • Gene/GP - cellular component
  • For specific genes compared annotations in GO to
    representation in UMLS
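
The mapping step above (normalize, then constrain by semantic type) might look roughly like the following. This is a simplified sketch, not the normalization program actually used in the study: the normalize function only lowercases, strips punctuation, and sorts words, and UMLS_INDEX is a hypothetical two-entry stand-in for the real Metathesaurus.

```python
import re

def normalize(term: str) -> str:
    """Lowercase, replace punctuation with spaces, and sort the words."""
    words = re.sub(r"[^a-z0-9\s]", " ", term.lower()).split()
    return " ".join(sorted(words))

# Hypothetical mini-index: normalized string -> (concept name, semantic type).
UMLS_INDEX = {
    normalize("Cystic Fibrosis"): ("Cystic Fibrosis", "Disease or Syndrome"),
    normalize("Mitotic spindle"): ("Mitotic spindle", "Cell Component"),
}

def map_term(term, allowed_types):
    """Return the matching concept name, or None if there is no match
    or the match has a semantic type outside allowed_types."""
    hit = UMLS_INDEX.get(normalize(term))
    if hit and hit[1] in allowed_types:
        return hit[0]
    return None

if __name__ == "__main__":
    # Word order and punctuation differences are absorbed by normalization.
    print(map_term("Fibrosis, Cystic", {"Disease or Syndrome"}))
    # A lexical match is rejected when the semantic type is not allowed.
    print(map_term("Mitotic spindle", {"Disease or Syndrome"}))
```

The semantic-type filter is what keeps a disease name from being mapped to, say, a cell component that happens to share the same normalized string.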

10
Results - 1
  • For 1244 genes from LocusLink
  • 18% found in the UMLS

11
Results - 2
  • For 1702 phenotypes (diseases) corresponding to
    1244 genes
  • 34% found in the UMLS (575/1702)
  • Most frequent single gene diseases covered
  • Huntington Disease
  • Cystic Fibrosis
  • Marfan Syndrome
  • Phenylketonuria
  • Achondroplasia

12
Results - 3
  • GO terms found in MeSH: 2764 terms
  • GO terms found in SNOMED: 1366 terms
  • GO terms found overall: 27% (3062/11,380)
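
The coverage figures in these results are simple ratios of mapped terms to total terms. A minimal sketch of the arithmetic, using only counts that appear in the slides (the function name coverage is an assumption for illustration):

```python
def coverage(mapped: int, total: int) -> int:
    """Return coverage as a percentage rounded to the nearest whole number."""
    return round(100 * mapped / total)

if __name__ == "__main__":
    # GO terms found in UMLS overall: 3062 of 11,380.
    print(coverage(3062, 11380))  # 27
    # GO biological processes found in UMLS: 256 of 4677.
    print(coverage(256, 4677))    # 5
```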

13
Results - 4
  • For 134,646 unique gene names in GO database

14
Results - 5
  • LocusLink-UMLS relationship categories found
    overall: 72%

15
Results - 5
  • Type of Relationship:
  • Associative: 613
  • Co-occurrence: 3353
  • Hierarchical: 1168

16
Results - 6
  • Representation of gene function in GO compared to
    the UMLS

17
Neurofibromin 2 (merlin) in GO

18
(No Transcript)

19
(No Transcript)

20
Discussion

21
Best and Worst Mappings
  • Best mapping categories
  • Molecular function (GO): 44%
  • Cellular component (GO): 35%
  • Phenotype (LL): 34%
  • Worst mapping categories
  • Gene synonym (GO): 6%
  • Biological process (GO): 5%
  • Gene symbol (GO): 2%

22
Only 34% of diseases?
  • In OMIM-LL, diseases are subdivided by genetic
    cause, but not in UMLS
  • E.g. Limb Girdle Muscular Dystrophy (LGMD)
  • LGMD is represented in UMLS
  • A SNOMED term
  • In MeSH it is an entry term for muscular
    dystrophies
  • MeSH note for MD: "A general term for a group of
    inherited disorders which are characterized by
    progressive degeneration of skeletal muscles" (ed,
    2000)

23
Limb Girdle Muscular Dystrophy genetic types

24
Only 5% of Biological Processes?
  • Only 256 of the biological processes mapped to
    terms in UMLS.
  • In GO, processes are elaborated and organism-
    specific
  • Example: UMLS - Mitotic spindle
  • GO:
  • Mitotic spindle assembly
  • Mitotic spindle assembly (sensu Saccharomyces)
  • Mitotic spindle assembly (sensu Fungi)
  • Mitotic spindle checkpoint
  • Mitotic spindle elongation
  • Mitotic spindle orientation
  • Mitotic spindle positioning
  • Mitotic spindle positioning and orientation
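
The mismatch on this slide can be reproduced with a few lines of string handling. The sketch below is an illustration only, not the study's method: it takes the GO terms listed above, drops the organism-specific "(sensu ...)" qualifier with a regular expression, and checks them against the single UMLS concept name. The helper name strip_sensu is hypothetical.

```python
import re

UMLS_CONCEPT = "Mitotic spindle"

# GO terms as listed on the slide.
GO_TERMS = [
    "Mitotic spindle assembly",
    "Mitotic spindle assembly (sensu Saccharomyces)",
    "Mitotic spindle assembly (sensu Fungi)",
    "Mitotic spindle checkpoint",
    "Mitotic spindle elongation",
    "Mitotic spindle orientation",
    "Mitotic spindle positioning",
    "Mitotic spindle positioning and orientation",
]

def strip_sensu(term: str) -> str:
    """Remove an organism-specific '(sensu ...)' qualifier, if present."""
    return re.sub(r"\s*\(sensu [^)]*\)", "", term).strip()

# Exact matches after stripping the qualifier: none of the processes
# equals the UMLS concept name.
exact = [t for t in GO_TERMS if strip_sensu(t).lower() == UMLS_CONCEPT.lower()]

# Distinct GO processes that merely begin with the UMLS concept name.
related = sorted(
    {strip_sensu(t) for t in GO_TERMS
     if strip_sensu(t).lower().startswith(UMLS_CONCEPT.lower())}
)

if __name__ == "__main__":
    print(len(exact), len(related))
```

Every GO term shares the prefix but none is an exact match, which is consistent with the low mapping rate for biological processes reported here.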

25
Why so few gene names and synonyms mapped?
  • Official gene names have metadata and comments.
  • dystrophin (muscular dystrophy, Duchenne and
    Becker types), includes DXS143, DXS164, DXS206,
    DXS230, DXS239, DXS268, DXS269, DXS270, DXS272
  • No single source has all names and synonyms
  • GO synonym field contains IPI number for well
    known genes, does not match UMLS (useful cross
    reference but not a synonym)
  • Symbols are short acronyms and match poorly
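
To see why official names with embedded metadata match poorly, consider stripping that metadata before lookup. This is a hedged sketch of one possible cleanup, not something the presentation reports doing: the function clean_gene_name and its two regular expressions are assumptions, tuned only to the dystrophin example above.

```python
import re

def clean_gene_name(official_name: str) -> str:
    """Strip a trailing 'includes ...' list and parenthetical comments."""
    # Drop everything from an optional comma plus 'includes' to the end.
    name = re.sub(r",?\s*includes\b.*$", "", official_name)
    # Drop parenthetical comments such as disease annotations.
    name = re.sub(r"\s*\([^)]*\)", "", name)
    return name.strip()

if __name__ == "__main__":
    raw = ("dystrophin (muscular dystrophy, Duchenne and Becker types), "
           "includes DXS143, DXS164, DXS206")
    print(clean_gene_name(raw))  # dystrophin
```

A cleanup like this would not help with symbols, which are short acronyms and collide with unrelated strings regardless of formatting.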

26
Summary 1
  • UMLS needs improvement in molecular biology
    domain but has considerable content
  • 27% of GO concepts map
  • 34% of single gene diseases
  • Existing UMLS terms come primarily from MeSH and
    SNOMED
  • Overall, positive mapping for 13,000 terms

27
Summary continued
  • If the terms are in UMLS, it is possible to find
    a relationship between genes and phenotypes and
    gene function much of the time.
  • UMLS does better with the human genes (20%) than
    with genes from all organisms (11%)
  • UMLS and GO representations complement each
    other.