A text-mining analysis of the human phenome

1
A text-mining analysis of the human phenome
European Journal of Human Genetics (2006) 14, 535-542
  • Marc A van Driel (1), Jorn Bruggeman (2), Gert Vriend (1),
    Han G Brunner (3) and Jack AM Leunissen (2)

(1) Centre for Molecular and Biomolecular Informatics,
    Radboud University Nijmegen, the Netherlands
(2) Department of Bioinformatics,
    Wageningen University and Research Centre
(3) Department of Human Genetics,
    University Medical Centre Nijmegen
Speaker: Yu-Ching Fang
Advisors: Hsueh-Fen Juan and Hsin-Hsi Chen
2
Outline
  • Introduction
  • Methods
  • Results
  • Discussion

3
Introduction
  • Functional annotation of genes is an important
    challenge once the sequence of a genome has been
    completed.
  • Previous studies have correlated various
    attributes of human genes with the chance of
    causing a disease.

4
Introduction (cont.)
  • However, few attempts have been made to
    systematically classify relationships between
    genes and proteins at the phenotype level.

5
Introduction (cont.)
  • The Online Mendelian Inheritance in Man (OMIM)
    database contains human disease phenotype data
    and record-based textual information, one gene or
    one genetic disorder per record.
  • Goal: systematic grouping of genes by their
    associated phenotypes from the OMIM database.

6
Methods: The OMIM database
  • Full text (TX) field: 5132 disease records out of 16357 records in total

7
Methods: The OMIM database (cont.)
  • Clinical synopsis (CS) field

8
Creation of feature vectors
  • MeSH terms and their components are concepts.
  • MeSH concepts serve as phenotype features
    characterizing OMIM records.
  • Ex.: OMIM_1 -> MeSH_1, MeSH_2, ...
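
The mapping from a record to its MeSH concepts can be sketched in Python. This is only an illustration: the four-term vocabulary, the function name `feature_vector`, and the sample text are assumptions made here, and whole-word matching stands in for whatever matching the authors actually used.

```python
import re
from collections import Counter

# Hypothetical mini-vocabulary standing in for the MeSH thesaurus.
MESH_TERMS = ["eye", "retina", "brain", "photoreceptors"]

def feature_vector(record_text, terms=MESH_TERMS):
    """Count whole-word occurrences of each term in an OMIM-style record."""
    text = record_text.lower()
    counts = Counter()
    for term in terms:
        hits = re.findall(r"\b" + re.escape(term) + r"\b", text)
        if hits:
            counts[term] = len(hits)
    return dict(counts)

# Invented example text, not a real OMIM record.
omim_1 = "Degeneration of the retina; the retina and brain are affected."
print(feature_vector(omim_1))
```

Concepts that never occur are simply absent from the dictionary, which keeps the vector sparse.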

9
Refinement of the feature vectors
  • MeSH concepts can be very broad, like "Eye", or
    more specific, like "Retina".
  • The MeSH concept hierarchy describes relationships
    such as Eye-Retina-Photoreceptors.
  • "Retina" is a hyponym of "Eye".

10
Refinement of the feature vectors (cont.)
  • To ensure that the concepts eye and retina are
    recognized as similar, the MeSH hierarchy was
    used to encode this similarity in the feature
    vectors by increasing the value of all hypernyms.

r_c: relevance of concept c
r_{c,counted}: count of concept c in a document
r_{hypos}: relevance of the hyponyms of concept c
n_{hypo,c}: number of hyponyms of concept c
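
The equation itself did not survive in this transcript. The sketch below assumes the form suggested by the symbol definitions: a concept's relevance is its own count plus the summed relevance of its hyponyms divided by the number of hyponyms. The hierarchy fragment and the function name `relevance` are hypothetical.

```python
# Hypothetical fragment of the MeSH hierarchy: concept -> direct hyponyms.
HYPONYMS = {
    "eye": ["retina"],
    "retina": ["photoreceptors"],
    "photoreceptors": [],
}

def relevance(concept, counted, hyponyms=HYPONYMS):
    """Assumed form: r_c = r_{c,counted} + sum(r_hypos) / n_{hypo,c}."""
    children = hyponyms.get(concept, [])
    own = counted.get(concept, 0)
    if not children:
        # A leaf concept keeps its counted value.
        return float(own)
    child_sum = sum(relevance(ch, counted, hyponyms) for ch in children)
    return own + child_sum / len(children)

# A record that mentions "retina" twice and nothing else.
counted = {"retina": 2}
expanded = {c: relevance(c, counted) for c in HYPONYMS}
print(expanded)
```

Under this assumption a record that only mentions "retina" also receives a non-zero value for "eye", so it can be matched against records that only mention the broader term.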
11
Refinement of the feature vectors (cont.)
  • Example of concept expansion using the MeSH
    hierarchical structure.

12
Refinement of the feature vectors (cont.)
  • Not all concepts in the OMIM records are equally
    informative.
  • Ex.: "retina pigment epithelium" occurs rarely,
    and thus provides more specific information than
    very frequent terms such as "Brain".
  • Inverse document frequency measure:

gw_c: inverse document frequency (global weight) of concept c
N = 5080
n_c: number of records that contain concept c
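
The weighting formula is missing from the transcript. A common form of inverse document frequency is log2(N / n_c), and the sketch below assumes that form. The function name `global_weight` and the per-concept record counts are invented for illustration.

```python
import math

def global_weight(n_records, n_containing):
    """Assumed IDF form: gw_c = log2(N / n_c)."""
    return math.log2(n_records / n_containing)

N = 5080  # number of records, as stated on the slide

# Invented counts: a rare concept versus one found in half of all records.
print(global_weight(N, 10))
print(global_weight(N, 2540))
```

A concept present in every record gets weight 0 under this form, so it contributes nothing to a comparison between records.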
13
Refinement of the feature vectors (cont.)
  • Not all OMIM records contain equally extensive
    descriptions (record size differences).
  • These differences will make a comparison between
    records difficult because the diversity and the
    frequency of concepts in the larger records will
    exceed those in the smaller records.

r_c: relevance of concept c
r_{mf}: frequency of the most frequently occurring MeSH concept in that record
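
The correction formula is also missing here. One standard way to remove record-size effects with exactly these two quantities is augmented frequency normalization, 0.5 + 0.5 * r_c / r_mf. The sketch below assumes that form; it is not confirmed by the slide, and `normalize` is a hypothetical name.

```python
def normalize(record):
    """Scale each relevance by the most frequent concept in the record.

    Assumed form (augmented frequency): 0.5 + 0.5 * r_c / r_mf.
    Expects a non-empty dict of concept -> relevance.
    """
    r_mf = max(record.values())
    return {c: 0.5 + 0.5 * r / r_mf for c, r in record.items()}

# Two invented records with the same concept profile but different sizes.
small = {"retina": 2, "eye": 1}
large = {"retina": 20, "eye": 10}
print(normalize(small))
print(normalize(large))
```

Both records normalize to the same values, which is the point of the correction: a tenfold longer description no longer dominates the comparison.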
14
Comparing OMIM records
  • The similarity between OMIM records can be
    quantified by comparing the feature vectors that
    are expanded and corrected.
  • Similarities between feature vectors were
    determined by the cosines of their angles.

s(X,Y): similarity between the feature vectors X and Y
x_i, y_i: concept frequencies
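
The cosine of the angle between two vectors is the dot product divided by the product of their lengths. A minimal sketch for sparse vectors stored as dictionaries follows; the function name and the sample vectors are illustrative.

```python
import math

def cosine_similarity(x, y):
    """s(X, Y) = sum(x_i * y_i) / (|X| * |Y|) for sparse dict vectors."""
    dot = sum(v * y.get(c, 0.0) for c, v in x.items())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0 or norm_y == 0:
        # An empty vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_x * norm_y)

# Invented feature vectors sharing one of their two concepts.
a = {"eye": 1.0, "retina": 1.0}
b = {"eye": 1.0, "brain": 1.0}
print(cosine_similarity(a, b))
```

Because all concept weights are non-negative, the score falls between 0 (no shared concepts) and 1 (identical direction).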
15
Results: Comparing OMIM records
  • 5080 of the 5132 OMIM records matched one or more
    MeSH terms.
  • The 5080 x 5080 pairwise feature-vector
    similarities form the phenomap (all-to-all
    similarities).

Most phenotype-phenotype pairs have a low
similarity score.
16
Comparing OMIM records - The best scores for all
phenotypes in the disease phenotype data set
  • For each OMIM record, the most similar of the
    other 5079 records was identified.
  • Moderately similar phenotype pairs might still
    yield reasonable hypotheses.

Ex.: "Fibromuscular Dysplasia of Arteries" and
"Cardiomyopathy, Familial Hypertrophic" have a
similarity score of 0.31.
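
Finding the best-scoring partner for each record can be sketched as below. The record labels and all scores in the table are invented; only the idea of taking the maximum over the other records comes from the slide.

```python
# Invented similarity scores between four hypothetical records.
SIMILARITY = {
    ("A", "B"): 0.31,
    ("A", "C"): 0.05,
    ("A", "D"): 0.12,
    ("B", "C"): 0.08,
    ("B", "D"): 0.02,
    ("C", "D"): 0.40,
}

def score(p, q, table=SIMILARITY):
    """Look up a symmetric similarity score; missing pairs count as 0."""
    return table.get((p, q), table.get((q, p), 0.0))

def best_matches(records, table=SIMILARITY):
    """For each record, return the most similar other record and its score."""
    result = {}
    for p in records:
        others = [q for q in records if q != p]
        best = max(others, key=lambda q: score(p, q, table))
        result[p] = (best, score(p, best, table))
    return result

print(best_matches(["A", "B", "C", "D"]))
```

Excluding the record itself matters, since every record has similarity 1 with itself and would otherwise always be its own best match.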
17
Comparing OMIM records (cont.)
  • Conclusion: The more phenotypes resemble each
    other, the more likely they are to share an
    interaction.

18
Discussion
  • Developed a text-mining approach to map
    relationships between more than 5000 human
    genetic disease phenotypes from the OMIM
    database.
  • Phenotype clustering reflects the modular nature
    of human disease genetics. Thus, the phenomap may
    be used to predict candidate genes for diseases.