Visualization of Relational Text Information

Learn more at: http://vw.indiana.edu
1
Visualization of Relational Text Information
  • for Biomedical Knowledge Discovery

James W. Cooper
IBM T. J. Watson Research Center
Hawthorne, NY
2
Overview
  • Prior work
  • Java based text mining
  • Computation of unnamed relations
  • Graphical display of relations

3
Relations between terms
  • Noun phrase co-occurrence statistics (Roark,
    Charniak)
  • Choose seed words and look for terms near them,
    then repeat (Brin, Gravano, Agichtein)
  • Biomedical domain:
  • Blaschke used a dictionary of common verbs
  • Pustejovsky found "inhibit" relations
  • Stevens, Palakal, and Mostafa detected
    abstract-wide co-occurrence using a dictionary of
    genes and useful verbs

4
Graphical Displays
  • BioLayout: protein similarity
  • ProtInAct: interactive system using yFiles
  • Zhang: interactive 3D system
  • Jenssen: gene network
  • Leroy: GeneScene

5
BioLayout (Enright and Ouzounis)
Five related protein families and their
corresponding relationships.
Spheres represent proteins and lines represent
protein similarities.
6
ProInAct (Spencer and Bennett)
Proteins clustered by functional interaction
7
Zhang: protein interaction mapping
8
Jenssen: a literature network
Lines connect genes that have co-occurred in 1 or
more papers.
9
Leroy: GeneScene
10
What would we like to do?
  • Find scientifically meaningful connections
    between important terms.
  • Such as Swanson's connection between Raynaud's
    disease and fish oil.
  • Allow exploration of relations by user.
  • Filter the relations by ontology or term types
  • Perform path analysis
  • Let the user vary the graphical display.
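
Path analysis over term relations can be sketched as a breadth-first search on an undirected graph whose nodes are terms. This is only a minimal illustration: the class name, method names, and the three sample edges are assumptions made for the example, not the system's code or data. The sample edges loosely mirror the kind of indirect link Swanson reported, where two terms are connected only through an intermediate term.

```java
import java.util.*;

/** Illustrative only: an undirected graph of terms with breadth-first path search. */
final class RelationPaths {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    /** Records an unnamed (symmetric) relation between two terms. */
    void addRelation(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    /** Returns the shortest chain of terms linking the two terms, or an empty list. */
    List<String> shortestPath(String from, String to) {
        if (!adjacency.containsKey(from) || !adjacency.containsKey(to)) {
            return List.of();
        }
        Map<String, String> parent = new HashMap<>(); // term -> term it was reached from
        Deque<String> queue = new ArrayDeque<>();
        parent.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                // Walk the parent links back to the start to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String t = to; t != null; t = parent.get(t)) {
                    path.addFirst(t);
                }
                return path;
            }
            for (String next : adjacency.get(current)) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        RelationPaths graph = new RelationPaths();
        // Hypothetical sample relations, not data from the patent sets.
        graph.addRelation("fish oil", "blood viscosity");
        graph.addRelation("blood viscosity", "Raynaud's disease");
        graph.addRelation("fish oil", "omega-3 fatty acids");
        System.out.println(graph.shortestPath("fish oil", "Raynaud's disease"));
    }
}
```

A path of length two or more is exactly the case where neither end term co-occurs with the other directly, which is what makes it a candidate for a new connection.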

11
Data we analyzed
  • Two sets of patent data
  • 584 patents on Viagra and phosphodiesterase
    inhibitors.
  • 1514 patents on quinolones (like Cipro)
  • Recognized major technical terms in each patent.
  • Filtered organic chemical nomenclature.

12
The Talent text mining system
  • Text Analysis and Language Engineering Tools
  • Finds multiword noun phrases
  • Performs a shallow parse
  • Can extract noun phrases (NPs) and verb groups
    (VGs), as well as all other sentence parts

13
The JTalent Library
  • Java class library with a JNI interface to the
    Talent DLL
  • Creates database load files of terms, recording
    for each term its paragraph, sentence, offset,
    and term type (NP, VG)

14
TalentShow Demo
15
The KSS Library
  • Java class library of functions for
  • Accessing a database (DB2, Access)
  • Manipulating a search engine
  • Manipulating tables of information created by
    JTalent.

16
Database Tables
  • Documents: title, author, URL, ID
  • TermDocs: term, paragraph, sentence, offset, type
  • Dictionary of terms, types and IDs, such as MeSH

17
Computing term information
  • Compute unique terms from TermDocs
  • Compute frequency
  • Compute salience, based on frequency and on the
    number of docs a term appears in more than once
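
The two statistics named on this slide can be sketched over TermDocs-like rows as follows. The slide does not say how they are combined into a salience score, so this sketch only computes the two inputs. The `TermDoc` record, its `docId` column, the method names, and the sample rows are assumptions made for illustration.

```java
import java.util.*;

/** Illustrative only: per-term statistics computed from TermDocs-like rows. */
final class TermStats {
    /** One row of a TermDocs-like table; docId is an assumed column. */
    record TermDoc(int docId, String term, int paragraph, int sentence, int offset, String type) {}

    /** Total number of occurrences of each unique (lower-cased) term. */
    static Map<String, Integer> frequency(List<TermDoc> rows) {
        Map<String, Integer> freq = new TreeMap<>();
        for (TermDoc row : rows) {
            freq.merge(row.term().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return freq;
    }

    /** Number of documents in which each term occurs more than once. */
    static Map<String, Integer> docsWithRepeats(List<TermDoc> rows) {
        // term -> (docId -> occurrences in that document)
        Map<String, Map<Integer, Integer>> perDoc = new HashMap<>();
        for (TermDoc row : rows) {
            perDoc.computeIfAbsent(row.term().toLowerCase(Locale.ROOT), k -> new HashMap<>())
                  .merge(row.docId(), 1, Integer::sum);
        }
        Map<String, Integer> result = new TreeMap<>();
        perDoc.forEach((term, counts) ->
            result.put(term, (int) counts.values().stream().filter(c -> c > 1).count()));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical rows, not taken from the patent data.
        List<TermDoc> rows = List.of(
            new TermDoc(1, "sildenafil", 1, 1, 10, "NP"),
            new TermDoc(1, "Sildenafil", 2, 1, 0, "NP"),
            new TermDoc(2, "sildenafil", 1, 3, 42, "NP"),
            new TermDoc(2, "inhibits", 1, 3, 53, "VG"));
        System.out.println(frequency(rows));
        System.out.println(docsWithRepeats(rows));
    }
}
```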

18
Compute term relations
  • Named relations based on abbreviation expansions.
  • Unnamed relations based on proximity, with weight
    based on how frequently they occur near each
    other.
  • Mutual information weight
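
The slide does not give the weighting formula. One common choice is pointwise mutual information, log2( p(a,b) / (p(a) p(b)) ), with the probabilities estimated from how many text windows contain each term and each pair. The sketch below assumes that definition and assumes that "near each other" means "in the same window" (for example, the same sentence); the class and method names are hypothetical.

```java
import java.util.*;

/** Illustrative only: pointwise mutual information as a co-occurrence weight. */
final class MutualInformation {
    /** PMI in bits, estimated from counts over a number of text windows. */
    static double pmi(int countA, int countB, int countAB, int windows) {
        if (countAB == 0) {
            return Double.NEGATIVE_INFINITY; // the terms never co-occur
        }
        double pA = (double) countA / windows;
        double pB = (double) countB / windows;
        double pAB = (double) countAB / windows;
        return Math.log(pAB / (pA * pB)) / Math.log(2.0);
    }

    /** Weights every pair of terms that shares at least one window. Keys are "a\tb". */
    static Map<String, Double> weights(List<Set<String>> windows) {
        Map<String, Integer> single = new HashMap<>();
        Map<String, Integer> pairs = new TreeMap<>();
        for (Set<String> window : windows) {
            // Sort the terms so that (a, b) and (b, a) map to the same key.
            List<String> terms = new ArrayList<>(new TreeSet<>(window));
            for (int i = 0; i < terms.size(); i++) {
                single.merge(terms.get(i), 1, Integer::sum);
                for (int j = i + 1; j < terms.size(); j++) {
                    pairs.merge(terms.get(i) + "\t" + terms.get(j), 1, Integer::sum);
                }
            }
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : pairs.entrySet()) {
            String[] ab = e.getKey().split("\t");
            result.put(e.getKey(),
                pmi(single.get(ab[0]), single.get(ab[1]), e.getValue(), windows.size()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical windows: each set holds the terms found in one sentence.
        List<Set<String>> windows = List.of(
            Set.of("sildenafil", "phosphodiesterase"),
            Set.of("sildenafil", "phosphodiesterase"),
            Set.of("tablet"),
            Set.of("coating"));
        System.out.println(weights(windows));
    }
}
```

Under this definition, a pair that co-occurs more often than chance would predict gets a positive weight, and a pair that co-occurs exactly as often as chance predicts gets zero.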

19
Tuning Computed relations
  • Select only terms above a salience threshold.
  • Keep only relations in which one or both terms
    are members of an ontology.
  • Store relations in a database table (term,
    weight, term) for rapid access.
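
The two filters above can be sketched as a single pass over (term, weight, term) rows. The `Relation` record, the salience map, and the `requireBoth` switch are assumptions made for this example. The switch covers the "one or both" choice mentioned on the slide.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Illustrative only: filtering computed relations by salience and ontology membership. */
final class RelationFilter {
    /** A (term, weight, term) row, as in the relations table. */
    record Relation(String term1, double weight, String term2) {}

    static List<Relation> tune(List<Relation> relations,
                               Map<String, Double> salience,
                               double threshold,
                               Set<String> ontology,
                               boolean requireBoth) {
        return relations.stream()
            // Both terms must be above the salience threshold.
            .filter(r -> salience.getOrDefault(r.term1(), 0.0) > threshold
                      && salience.getOrDefault(r.term2(), 0.0) > threshold)
            // One or both terms must belong to the ontology.
            .filter(r -> {
                boolean first = ontology.contains(r.term1());
                boolean second = ontology.contains(r.term2());
                return requireBoth ? (first && second) : (first || second);
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical values, chosen only to exercise the filters.
        List<Relation> relations = List.of(
            new Relation("sildenafil", 2.1, "phosphodiesterase"),
            new Relation("sildenafil", 0.4, "tablet"),
            new Relation("tablet", 0.3, "coating"));
        Map<String, Double> salience = Map.of(
            "sildenafil", 0.9, "phosphodiesterase", 0.8, "tablet", 0.7, "coating", 0.1);
        Set<String> ontology = Set.of("sildenafil", "phosphodiesterase");
        System.out.println(tune(relations, salience, 0.5, ontology, false));
        System.out.println(tune(relations, salience, 0.5, ontology, true));
    }
}
```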

20
Original System
  • Visual client
  • SOAP server
  • Queries database to get relations
  • Round trip for each new query
  • Instead, we export the data for the user to
    visualize as they wish.

21
Exporting relations
  • Save relations and ontology information in an XML
    file.
  • <relation>
  • <term>
  • <iq>78</iq>
  • <source>MeSH</source>
  • <relationDocuments>
  • <doc>34</doc>
  • </relationDocuments>
  • </term>
  • <term> </term>
  • </relation>
  • This XML file is a portable version of the
    computed relations that we can then use with any
    number of viewers.
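
A viewer could read such a file with the standard Java DOM parser, roughly as sketched below. The element names `relation`, `term`, `iq`, `source`, `relationDocuments`, and `doc` come from the slide. The `relations` root element, the `name` child holding the term text, and all sample values other than 78, MeSH, and 34 are assumptions, since the slide does not show them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustrative only: reading exported relations with the JDK's DOM parser. */
final class RelationXmlReader {
    /** One term of a relation; "name" is an assumed element, not shown on the slide. */
    record Term(String name, int iq, String source, List<Integer> docs) {}

    /** Hypothetical sample file; the root element and <name> are assumptions. */
    static final String SAMPLE =
          "<relations>"
        + "<relation>"
        + "<term><name>sildenafil</name><iq>78</iq><source>MeSH</source>"
        + "<relationDocuments><doc> 34</doc></relationDocuments></term>"
        + "<term><name>phosphodiesterase</name><iq>65</iq><source>MeSH</source>"
        + "<relationDocuments><doc>34</doc></relationDocuments></term>"
        + "</relation>"
        + "</relations>";

    /** Returns one list of terms per <relation> element. */
    static List<List<Term>> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<List<Term>> relations = new ArrayList<>();
            NodeList relationNodes = doc.getElementsByTagName("relation");
            for (int i = 0; i < relationNodes.getLength(); i++) {
                Element relation = (Element) relationNodes.item(i);
                NodeList termNodes = relation.getElementsByTagName("term");
                List<Term> terms = new ArrayList<>();
                for (int j = 0; j < termNodes.getLength(); j++) {
                    Element term = (Element) termNodes.item(j);
                    List<Integer> docs = new ArrayList<>();
                    NodeList docNodes = term.getElementsByTagName("doc");
                    for (int k = 0; k < docNodes.getLength(); k++) {
                        docs.add(Integer.parseInt(docNodes.item(k).getTextContent().trim()));
                    }
                    terms.add(new Term(text(term, "name"),
                        Integer.parseInt(text(term, "iq")), text(term, "source"), docs));
                }
                relations.add(terms);
            }
            return relations;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read relations XML", e);
        }
    }

    /** Trimmed text of the first descendant element with the given tag name. */
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        System.out.println(read(SAMPLE));
    }
}
```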

22
A Graphical Relations Viewer
  • Creates a Java Relations object for each relation
    it reads from the XML file.
  • Inserts them into a Trie structure keyed on the
    lower-cased first term.
  • If there is already a Relation at that point, it
    adds the new one to a Vector for that term.
  • Creates an alphabetical list of all terms in a
    2nd Trie.
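
The structure described above can be sketched as a character trie whose terminal nodes hold the relations for a term. This is a simplified illustration, not the viewer's code: it uses one trie with sorted children in place of the second, alphabetical trie, an `ArrayList` in place of a `Vector`, and a hypothetical `Relation` record. The prefix lookup is the operation a term list box would need when the user types part of a term.

```java
import java.util.*;

/** Illustrative only: a trie keyed on the lower-cased first term of each relation. */
final class RelationTrie {
    record Relation(String term1, double weight, String term2) {}

    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>(); // sorted for alphabetical output
        final List<Relation> relations = new ArrayList<>();    // non-empty only where a term ends
    }

    private final Node root = new Node();

    /** Stores the relation under its lower-cased first term. */
    void insert(Relation relation) {
        Node node = root;
        for (char c : relation.term1().toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.relations.add(relation); // several relations may share a first term
    }

    /** All relations whose first term equals the given term, ignoring case. */
    List<Relation> relationsOf(String term) {
        Node node = find(term.toLowerCase(Locale.ROOT));
        return node == null ? List.of() : List.copyOf(node.relations);
    }

    /** All stored first terms that start with the given fragment, in alphabetical order. */
    List<String> termsStartingWith(String fragment) {
        String prefix = fragment.toLowerCase(Locale.ROOT);
        List<String> terms = new ArrayList<>();
        Node node = find(prefix);
        if (node != null) {
            collect(node, new StringBuilder(prefix), terms);
        }
        return terms;
    }

    private Node find(String key) {
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    private void collect(Node node, StringBuilder current, List<String> terms) {
        if (!node.relations.isEmpty()) {
            terms.add(current.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            char c = e.getKey();
            current.append(c);
            collect(e.getValue(), current, terms);
            current.setLength(current.length() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        RelationTrie trie = new RelationTrie();
        // Hypothetical relations and weights.
        trie.insert(new Relation("Sildenafil", 2.1, "phosphodiesterase"));
        trie.insert(new Relation("sildenafil", 1.4, "cGMP"));
        trie.insert(new Relation("silica", 0.6, "tablet"));
        trie.insert(new Relation("ciprofloxacin", 1.9, "quinolone"));
        System.out.println(trie.termsStartingWith("sil"));
        System.out.println(trie.relationsOf("sildenafil"));
    }
}
```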

23
Using the Viewer
  • When you enter part of a term, it shows all terms
    starting with that fragment in the left list box.
  • When you click on a term, it shows all its
    relations in the right list box.

24
Lexical Navigation
  • Displays relations between terms graphically and
    allows you to explore them without formulating a
    specific query.

25
Possible enhancements
  • Show only terms belonging to an ontology.
  • Show only higher IQ terms
  • Show the documents the relations occur in.
  • Show the ontology reference.
  • Show computed paths
  • Show more kinds of named relations, such as
    "inhibits" and "expresses".

26
Evaluations of Information Visualization
  • Few, if any, graphical displays have been
    evaluated thus far for effectiveness.
  • Usability studies are hard to construct and carry
    out.
  • Intuition suggests that exploration may result in
    discoveries.
  • Relations more than one step apart seem best
    displayed graphically.
  • It remains to be shown that such visualizations
    are actually useful.

27
Differences in Intent
  • Displays may represent information your system
    has discovered.
  • Gene protein relations
  • Or they may represent data from which the user
    may discover new information.
  • New 2nd or 3rd order relationships
  • These are rather different applications of
    visualization technology

28
Summary
  • Java-based text mining system
  • Database of terms and positions
  • Computation of relations
  • Export as XML
  • Graphical relations viewer
  • The value of such visual interfaces has not yet
    been established.

29
Acknowledgements
  • Bhavani Iyer: XML export
  • Eric Brown: DictMatcher hash code
  • Daniel Tunkelang: graphical layout
  • Bob Mack: paper suggestions