Title: Visualization of Relational Text Information
1Visualization of Relational Text Information
- for Biomedical Knowledge Discovery
James W. Cooper
IBM T. J. Watson Research Center
Hawthorne, NY
2Overview
- Prior work
- Java based text mining
- Computation of unnamed relations
- Graphical display of relations
3Relations between terms
- Noun phrase co-occurrence statistics (Roark, Charniak)
- Choose seed words and look for terms near them (Brin, Gravano, Agichtein)
- Repeat
- Biomedical domain
- Blaschke used dictionary of common verbs
- Pustejovsky found inhibit relations
- Stevens, Palakal, Mostafa
- Detected abstract-wide co-occurrence using a dictionary of genes and useful verbs.
4Graphical Displays
- BioLayout protein similarity
- ProtInAct interactive system using yFiles
- Zhang interactive 3D system
- Jenssen gene network
- Leroy GeneScene
5BioLayout Enright and Ouzounis
Five related protein families and their
corresponding relationships.
Spheres represent proteins and lines represent
protein similarities.
6ProInAct- Spencer and Bennett
Proteins clustered by functional interaction
7Zhang-Protein interaction mapping
8Jenssen A literature network
Lines connect genes that have co-occurred in 1 or
more papers.
9Leroy GeneScene
10What would we like to do?
- Find scientifically meaningful connections between important terms.
- Such as Swanson's Raynaud's disease / fish oil connection.
- Allow exploration of relations by user.
- Filter the relations by ontology or term types
- Perform path analysis
- Let the user vary the graphical display.
11Data we analyzed
- Two sets of patent data
- 584 patents on Viagra and phosphodiesterase inhibitors.
- 1514 patents on quinolones (like Cipro)
- Recognized major technical terms in each patent.
- Filtered organic chemical nomenclature.
12The Talent text mining system
- Text Analysis and Language Engineering Tools
- Finds multiword noun phrases
- Does shallow parse
- Can extract NPs and VGs
- As well as all other sentence parts
13The JTalent Library
- Java class library with JNI interface
- To Talent DLL
- Creates database load files of terms
- Paragraph
- Sentence
- Offset
- Term type (NP, VG)
14TalentShow Demo
15The KSS Library
- Java class library of functions for
- Accessing a database (DB2, Access)
- Manipulating a search engine
- Manipulating tables of information created by
JTalent.
16Database Tables
- Documents
- Title, author, URL, ID
- TermDocs
- Term
- Paragraph
- Sentence
- Offset
- Type
- Dictionary of terms, types and IDs
- Such as MeSH
17Computing term information
- Compute unique terms from TermDocs
- Compute frequency
- Compute salience
- Based on frequency
- Number of docs they appear in more than once
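As a rough illustration of the two counts listed above, the sketch below computes each unique term's total frequency and the number of documents in which it appears more than once. The `TermOccurrence` and `TermStats` classes are hypothetical stand-ins for TermDocs rows, not part of JTalent or KSS, and the slide does not say how the counts are combined into a salience score, so that step is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for one TermDocs row: a term occurrence in a document.
class TermOccurrence {
    final String term;
    final int docId;

    TermOccurrence(String term, int docId) {
        this.term = term;
        this.docId = docId;
    }
}

class TermStats {
    // Total number of occurrences of each unique (lower-cased) term.
    static Map<String, Integer> frequency(List<TermOccurrence> rows) {
        Map<String, Integer> freq = new TreeMap<>();
        for (TermOccurrence r : rows) {
            freq.merge(r.term.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return freq;
    }

    // Number of documents in which each term appears more than once.
    static Map<String, Integer> docsWithRepeats(List<TermOccurrence> rows) {
        // term -> (docId -> occurrences in that document)
        Map<String, Map<Integer, Integer>> perDoc = new TreeMap<>();
        for (TermOccurrence r : rows) {
            perDoc.computeIfAbsent(r.term.toLowerCase(Locale.ROOT), k -> new HashMap<>())
                  .merge(r.docId, 1, Integer::sum);
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> e : perDoc.entrySet()) {
            int docs = 0;
            for (int count : e.getValue().values()) {
                if (count > 1) {
                    docs++;
                }
            }
            result.put(e.getKey(), docs);
        }
        return result;
    }
}
```

A salience function would take these two maps as input; its exact form is an open choice here.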
18Compute term relations
- Named relations based on abbreviation expansions.
- Unnamed relations based on proximity, with weight based on how frequently they occur near each other.
- Mutual information weight
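The slide names a mutual information weight without giving a formula. One common form is pointwise mutual information, log2(P(a,b) / (P(a) P(b))), estimated from co-occurrence counts. The sketch below assumes that form and assumes the counting unit is a fixed context such as a sentence; the `MutualInformation` class and its parameter names are illustrative only.

```java
class MutualInformation {
    // Pointwise mutual information of two terms, in bits.
    //   pairCount: contexts (e.g. sentences) containing both terms
    //   countA, countB: contexts containing each term
    //   total: total number of contexts
    // Returns negative infinity when the terms never co-occur or a count is missing.
    static double pmi(long pairCount, long countA, long countB, long total) {
        if (pairCount <= 0 || countA <= 0 || countB <= 0 || total <= 0) {
            return Double.NEGATIVE_INFINITY;
        }
        double pAB = (double) pairCount / total;
        double pA = (double) countA / total;
        double pB = (double) countB / total;
        return Math.log(pAB / (pA * pB)) / Math.log(2.0);
    }
}
```

Terms that co-occur exactly as often as chance predicts score near zero; higher scores indicate pairs that appear together more often than their individual frequencies would suggest.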
19Tuning Computed relations
- Select only terms above a salience threshold.
- Keep only relations in which one or both terms are members of an ontology.
- Store relations in a database table for rapid access
- Term, weight, term
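The two filters above can be sketched as a single pass over term-weight-term triples. `WeightedRelation` and `RelationFilter` are hypothetical names, and the salience scores and ontology membership are assumed to be available as an in-memory map and set rather than as database tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory form of one row of the relations table: term, weight, term.
class WeightedRelation {
    final String first;
    final double weight;
    final String second;

    WeightedRelation(String first, double weight, String second) {
        this.first = first;
        this.weight = weight;
        this.second = second;
    }
}

class RelationFilter {
    // Keeps a relation only if both terms are above the salience threshold
    // and at least one of them is a member of the ontology.
    static List<WeightedRelation> filter(List<WeightedRelation> relations,
                                         Map<String, Double> salience,
                                         double minSalience,
                                         Set<String> ontologyTerms) {
        List<WeightedRelation> kept = new ArrayList<>();
        for (WeightedRelation r : relations) {
            boolean salient = salience.getOrDefault(r.first, 0.0) > minSalience
                    && salience.getOrDefault(r.second, 0.0) > minSalience;
            boolean inOntology = ontologyTerms.contains(r.first)
                    || ontologyTerms.contains(r.second);
            if (salient && inOntology) {
                kept.add(r);
            }
        }
        return kept;
    }
}
```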
20Original System
- Visual client
- SOAP server
- Queries database to get relations
- Round trip for each new query
- Instead, we export the data for the user to
visualize as they wish.
21Exporting relations
- Save relations and ontology information in an XML file.
- <relation>
- <term>
- <iq>78</iq>
- <source>MeSH</source>
- <relationDocuments>
- <doc>34</doc>
- </term>
- <term> </term>
- </relation>
- This XML file is a portable version of the
computed relations that we can then use with any
number of viewers.
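Any viewer can read such a file with a standard XML parser. The sketch below uses the JDK's DOM API to pull the `iq`, `source`, and `doc` values out of each `relation` element. The sample document is an assumption: the slide shows only a fragment, so the `<relations>` root element, the closing `</relationDocuments>` tag, and the exact nesting are guesses, and `RelationXmlReader` is a hypothetical class name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RelationXmlReader {
    // Parses an XML string into a DOM document, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse relations XML", e);
        }
    }

    // Trimmed text of every descendant element of parent with the given tag name.
    static List<String> textOf(Element parent, String tag) {
        List<String> out = new ArrayList<>();
        NodeList nodes = parent.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed layout; the <relations> wrapper is not shown on the slide.
        String xml = "<relations><relation>"
                + "<term><iq>78</iq><source>MeSH</source>"
                + "<relationDocuments><doc> 34</doc></relationDocuments></term>"
                + "<term></term>"
                + "</relation></relations>";
        NodeList relations = parse(xml).getElementsByTagName("relation");
        for (int i = 0; i < relations.getLength(); i++) {
            Element rel = (Element) relations.item(i);
            System.out.println("iq=" + textOf(rel, "iq")
                    + " source=" + textOf(rel, "source")
                    + " docs=" + textOf(rel, "doc"));
        }
    }
}
```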
22A Graphical Relations Viewer
- Creates a Java Relations object for each relation it reads from the XML file.
- Inserts them into a Trie structure based on lower-cased first term.
- If there is already a Relation at that point, it adds them to a Vector for that term.
- Creates an alphabetical list of all terms in a 2nd Trie.
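A minimal sketch of the trie described above: relations are keyed on the lower-cased first term, and a term that already has a relation accumulates further ones in a list at the same node. `RelationTrie` is a hypothetical class, the related term is stored as a plain string rather than a full Relation object, and an `ArrayList` stands in for the `Vector` mentioned on the slide.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class RelationTrie {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        // Related terms stored at the node that ends a first term.
        final List<String> relations = new ArrayList<>();
    }

    private final Node root = new Node();

    // Stores a relation under its lower-cased first term; repeated first
    // terms accumulate in the same node's list.
    void add(String firstTerm, String relatedTerm) {
        Node node = root;
        for (char c : firstTerm.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.relations.add(relatedTerm);
    }

    // All related terms stored for exactly this term (case-insensitive).
    List<String> relationsOf(String term) {
        Node node = root;
        for (char c : term.toLowerCase(Locale.ROOT).toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return Collections.emptyList();
            }
        }
        return Collections.unmodifiableList(node.relations);
    }
}
```

Lookup cost depends on the length of the term rather than on the number of stored relations, which suits interactive use.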
23Using the Viewer
- When you enter part of a term, it shows all terms starting with that fragment in the left list box.
- When you click on a term, it shows all its relations in the right list box.
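The fragment lookup that fills the left list box can be sketched with a sorted set in place of the viewer's second trie; this is a swapped-in technique chosen for brevity, not the viewer's actual code. `TermPrefixIndex` is a hypothetical name and the sample terms in the note below are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

class TermPrefixIndex {
    // Lower-cased terms kept in alphabetical order.
    private final TreeSet<String> terms = new TreeSet<>();

    void add(String term) {
        terms.add(term.toLowerCase(Locale.ROOT));
    }

    // All terms that start with the fragment, in alphabetical order.
    List<String> startingWith(String fragment) {
        String prefix = fragment.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        // Terms sharing a prefix are contiguous in sorted order, so the scan
        // starts at the prefix and stops at the first non-matching term.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }
}
```

With the terms "quinolone", "quinoline", "Quinidine", and "viagra" added, `startingWith("quin")` returns the three matching terms in alphabetical order and leaves out "viagra".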
24Lexical Navigation
- Displays relations between terms graphically and
allows you to explore them without formulating a
specific query.
25Possible enhancements
- Show only terms belonging to an ontology.
- Show only higher IQ terms
- Show the documents the relations occur in.
- Show the ontology reference.
- Show computed paths
- Show more kinds of named relations.
- Inhibits, expresses
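One way the "computed paths" enhancement could work is a breadth-first search over the relations, treating each relation as an undirected edge between two terms. The sketch below returns the shortest chain of terms linking two terms. `RelationPaths` is a hypothetical class, relation weights are ignored, and the intermediate term in the usage note is illustrative only.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RelationPaths {
    // term -> directly related terms
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    // Records an undirected relation between two terms.
    void addRelation(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    // Breadth-first search; returns the shortest chain of terms from start
    // to goal, or an empty list if the two terms are not connected.
    List<String> shortestPath(String start, String goal) {
        if (!adjacency.containsKey(start) || !adjacency.containsKey(goal)) {
            return Collections.emptyList();
        }
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(goal)) {
                // Walk the parent links back to the start to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String t = goal; t != null; t = parent.get(t)) {
                    path.addFirst(t);
                }
                return path;
            }
            for (String next : adjacency.get(current)) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

For example, with relations "fish oil" to "blood viscosity" and "blood viscosity" to "raynaud's disease", the path between the two outer terms passes through the shared intermediate term, which is the kind of second-order link a user might want highlighted.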
26Evaluations of Information Visualization
- Few, if any, graphical displays have been evaluated thus far for effectiveness.
- Usability studies are hard to construct and carry out.
- Intuition seems to show
- that exploration may result in discoveries.
- Relations more than one step apart seem best displayed graphically.
- It remains to be shown that such visualizations are actually useful.
27Differences in Intent
- Displays may represent information your system has discovered.
- Gene protein relations
- Or they may represent data from which the user may discover new information.
- New 2nd or 3rd order relationships
- These are rather different applications of visualization technology
28Summary
- Java-based text mining system
- Database of terms and positions
- Computation of relations
- Export as XML
- Graphical relations viewer
- The value of such visual interfaces has not yet
been established.
29Acknowledgements
- Bhavani Iyer: XML export
- Eric Brown: DictMatcher hash code
- Daniel Tunkelang: graphical layout
- Bob Mack: paper suggestions