Title: Telemakus
1. Telemakus: An Information Extraction and Representation System for Rapid Review of the Biomedical Literature
Debra Revere, 12/15/2004, Brown Bag presentation
2. Overview of Talk
- Introduction and background
- Current Telemakus system
- Issues
- Future directions
3. Project Team Participants, Current and Alumni
- Sherrilynne Fuller, Principal Investigator
- Debra Revere, Research Coordinator
- Paul F. Bugni, Lead Software Engineer
- Craig Benson, Programmer
- David J. Owens, Programmer
- Gayle Yamamoto, Information Analyst
- Heather L. Fuller, Information Analyst
- Jerome Woody, Programmer
- Lisa Tisch, Information Analyst
- Lucas Reber, Systems Administrator
- Stephen Soderland, NLP Consultant
- Yana Kadiyska, Programmer
- George M. Martin, Chair, Scientific Advisory Committee
4. Intro: Context of the Problem
- Information explosion
- Specialization can create barriers
- Need for information retrieval tools that provide
answers rather than lists of documents
5. Issue: Answers, not Lists
6. Our Questions
- Is there a format that is both conducive to rapid review of retrieved citations and also presents an accurate representation of the research methods and findings?
- How do retrieved citations relate to one another?
- Can research literature be mined to identify connections not previously noted?
7. Possible Approaches to the Problem
- BITOLA [1]
- ARROWSMITH [2]
- GeneScene [3]
- IRIDESCENT [4]
[1] Hristovski et al. Using literature-based discovery to identify disease candidate genes. Int J Med Inf 2004; in press.
[2] Swanson et al. Information discovery from complementary literatures: categorizing viruses as potential weapons. JASIST 2001;52:797-812.
[3] Leroy, Chen. Genescene: an ontology-enhanced integration of linguistic and co-occurrence based relations in biomedical texts. JASIST 2005; in press.
[4] Wren et al. Knowledge discovery by automated identification and ranking of implicit relationships. Bioinformatics 2004;20:389-98.
8. Telemakus Components
- Database elements extraction
- Research concept and relationship extraction
- Document surrogate
- Concept maps
9. Research Methods, Materials and Data Extraction
10. Research Concept and Relationship Extraction
11. Concept Identification and Relationships
Figure Heading: "The relationship between insulin infusion rate (IIR) and visceral fat (VF). Points presenting the 4 ad libitum (AL), 18 AL, and 18 caloric restricted (CR) rats."
Extracted Research Concepts and Relationships:
- insulin infusion rate / visceral fat
- insulin infusion rate / ad libitum
- insulin infusion rate / caloric restriction
- visceral fat / ad libitum
- visceral fat / caloric restriction
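The pairing in this example can be sketched in a few lines of Python. This is only an illustration, not the Telemakus extraction code: the split of the concepts into measured variables and study groups, and the helper name `pair_concepts`, are assumptions made here so that the sketch reproduces the five pairs listed on the slide.

```python
from itertools import combinations, product


def pair_concepts(variables, groups):
    """Pair each variable with every other variable and with every group.

    Hypothetical helper: the variable/group split is an assumption used to
    reproduce the pairs on the slide, not a documented Telemakus rule.
    """
    pairs = list(combinations(variables, 2))   # variable-variable pairs
    pairs += list(product(variables, groups))  # variable-group pairs
    return pairs


variables = ["insulin infusion rate", "visceral fat"]
groups = ["ad libitum", "caloric restriction"]

for left, right in pair_concepts(variables, groups):
    print(f"{left} / {right}")
```

Note that the two study groups are not paired with each other, which matches the slide: no "ad libitum / caloric restriction" relationship is listed.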
12. Document Surrogate
- Based on schema theory
  - we understand the world in terms of prototypical patterns (scripts, schemas, narratives) in which are embedded a vast array of relationships, concepts, and vocabulary words.
- Document Surrogate
  - represents research environment, methods and outcomes
  - capitalizes on standardization of research report format (abstract, intro, materials and methods, results, discussion)
  - includes standard bibliographic info, research design and methods, research findings derived from data tables and figures
- The Document Surrogate
  - facilitates searching and rapid review of retrieved documents
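As a rough illustration of what such a surrogate might hold, the sketch below groups the three kinds of content named above into one record. The class name, field names, and the example values are all hypothetical; the slides do not describe the actual Telemakus schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DocumentSurrogate:
    """Hypothetical record mirroring the three parts named on the slide."""
    # Standard bibliographic info
    title: str
    authors: List[str]
    journal: str
    year: int
    # Research design and methods
    research_design: str = ""
    methods: List[str] = field(default_factory=list)
    # Findings derived from data tables and figures, stored as concept pairs
    findings: List[Tuple[str, str]] = field(default_factory=list)

    def mentions(self, concept: str) -> bool:
        """Return True if the concept appears in any extracted finding."""
        needle = concept.lower()
        return any(needle in (a.lower(), b.lower()) for a, b in self.findings)


# Placeholder values only; the concept pair is taken from the figure-heading example.
example = DocumentSurrogate(
    title="(placeholder title)",
    authors=["(placeholder author)"],
    journal="(placeholder journal)",
    year=2004,
    findings=[("insulin infusion rate", "visceral fat")],
)
print(example.mentions("Visceral Fat"))
```

Keeping findings as concept pairs is one possible design choice: it lets the same record feed both a search over concepts and a concept map.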
13. Document Surrogate
14. Document Surrogate
15. Concept Maps
- used to show inter-relationships between concepts extracted from a body of domain documents
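One way to picture a concept map is as an undirected graph whose edges record which documents report each relationship. The sketch below is a minimal illustration under that assumption; the function name, the document ids, and the second document's pair are hypothetical, and this is not the Telemakus mapping code.

```python
from collections import defaultdict


def build_concept_map(pairs_by_document):
    """Merge per-document concept pairs into one undirected graph.

    Returns {concept: {neighbor: set of document ids}}.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for doc_id, pairs in pairs_by_document.items():
        for a, b in pairs:
            graph[a][b].add(doc_id)
            graph[b][a].add(doc_id)
    return graph


# Hypothetical input: document ids and the doc2 pair are made up for illustration.
pairs_by_document = {
    "doc1": [("insulin infusion rate", "visceral fat"),
             ("visceral fat", "caloric restriction")],
    "doc2": [("caloric restriction", "visceral fat")],
}

concept_map = build_concept_map(pairs_by_document)
for neighbor, docs in sorted(concept_map["visceral fat"].items()):
    print(neighbor, sorted(docs))
```

Storing document ids on each edge keeps the link back to the source reports, so a map view can lead the reader from a relationship to the documents that support it.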
16. Concept Mapping Interface
17. Document Processing and Database Building
- Fetcher
- Extractor
- CrossCheck
18. System Architecture
19. Putting It All Together
- Document Surrogate
- Visualization
- Concept Representation
- Concept Relationships
20. Slide Demo: Search on Antioxidants
21. Retrieval Set
22. Document Surrogate
23. Map It
24. Antioxidants Concept Map
25. Problem: How to automate concept identification?
26. Experiment: Remove Specific Semantic Types
utterance('00000000.tx.1',"Effect of aging on growth hormone-induced growth hormone receptor and Janus-activated kinase 2 phosphorylation").
(map(-1000,ev(-1000,'C0205414','Effect','Effective',effect,qlco,1,1,1,1,0,yes,no))).
(map(-1000,ev(-1000,'C0001811','Ageing','Aging',ageing,orgf,tmco,1,1,1,1,0,yes,no))).
(map(-836,ev(-904,'C0034839','Growth Hormone Receptor','Receptors, Somatotropin',growth,hormone,receptor,aapp,rcpt,1,2,1,2,0,6,6,3,3,0,yes,no),ev(-632,'C0205263','Induced','Induced',induced,ftcn,3,3,1,1,0,no,no))).
(map(-868,ev(-722,'C0169661','Janus kinase 2','Janus kinase 2',janus,kinase,'2',aapp,enzy,1,1,1,1,0,3,3,2,2,0,4,4,3,3,0,no,no),ev(-604,'C0879526',activate,activate,activate,ftcn,2,2,1,1,1,no,no),ev(-804,'C0031715','Phosphorylation','Phosphorylation',phosphorylation,npop,5,5,1,1,0,yes,no))).
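The idea of the experiment can be sketched as a filter over the candidate concepts in the MetaMap output above. The concept ids, names, and semantic type codes below are copied from that output; the exclusion set and the helper name `remove_semantic_types` are assumptions, since the slides do not say which semantic types were actually removed.

```python
# Candidates from the MetaMap output: (CUI, concept name, semantic types).
candidates = [
    ("C0205414", "Effect", {"qlco"}),
    ("C0001811", "Ageing", {"orgf", "tmco"}),
    ("C0034839", "Growth Hormone Receptor", {"aapp", "rcpt"}),
    ("C0205263", "Induced", {"ftcn"}),
    ("C0169661", "Janus kinase 2", {"aapp", "enzy"}),
    ("C0879526", "activate", {"ftcn"}),
    ("C0031715", "Phosphorylation", {"npop"}),
]

# ASSUMPTION: the slides do not list the removed semantic types;
# this exclusion set is chosen only for illustration.
EXCLUDED_TYPES = {"qlco", "ftcn"}


def remove_semantic_types(candidates, excluded):
    """Drop candidates whose semantic types all fall in the excluded set."""
    return [c for c in candidates if not c[2] <= excluded]


kept = remove_semantic_types(candidates, EXCLUDED_TYPES)
for cui, name, types in kept:
    print(cui, name, sorted(types))
```

This sketch drops a candidate only when every one of its semantic types is excluded. Dropping a candidate when any of its types is excluded would be a stricter alternative; which rule the experiment used is not stated.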
27. Results
- recall: 44.81
- precision (normal MetaMap processing): 15.46
- precision (MetaMap with semantic types removed): 33.41
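For readers less familiar with these measures, the sketch below shows how precision and recall are conventionally computed from a set of automatically extracted concepts and a reference set. It assumes the figures above are percentages scored against manually identified concepts, which the slides do not state; the toy data is hypothetical and is not the evaluation set behind the reported numbers.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) as percentages for two sets of concepts."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = 100.0 * true_positives / len(extracted) if extracted else 0.0
    recall = 100.0 * true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
gold = {"growth hormone receptor", "janus kinase 2", "phosphorylation", "aging"}
extracted = {"growth hormone receptor", "phosphorylation", "effect", "induced"}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Removing spurious candidates shrinks the extracted set without touching the reference set, which is how precision can rise while recall stays the same.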
28. Future Work
- Continuing refinement of MetaMap processing to improve recall and precision; investigate using other NLM tools to automate relationship analysis
- Tackle issues re:
  - performance of system in real world
  - does system actually support researchers learning something not known before or not previously reported in the literature?
- Evaluation!
- New domain: neurodegenerative diseases
29. Summary
- The Telemakus system is unique in combining document surrogates with interactive concept maps of linked relationships across groups of research reports.
- Telemakus formalizes representation of the research methods and results of scientific reports, thus offering a potential strategy to enhance the scientific discovery process.
- Scalability is an issue; automating concept and relationship analysis is essential.
- MetaMap shows promise as a means of addressing the concept analysis problem; other tools need to be explored as potential means for addressing the scalability issue.
30. TRY TELEMAKUS
- Telemakus: http://www.telemakus.net/
- Telemakus is funded, in part, by the Ellison Medical Foundation: http://www.ellisonfoundation.org/
- Telemakus is a component of the Ellison-funded Science of Aging project in partnership with AAAS and Highwire Press at Stanford University: http://sageke.sciencemag.org/
31. THANK YOU!!