Text Mining: Challenges and Opportunities - PowerPoint PPT Presentation

About This Presentation
Title:

Text Mining: Challenges and Opportunities

Description:

University of Iowa Faculty Scholar Award. NSF ITR grant 2003-2006. Students: Aditya Kumar Sehgal, Xin Ying Qiu, ... 2003 Aug: Am J Clinical Oncology: Merck ...

Slides: 42
Provided by: padm1

Transcript and Presenter's Notes



1
Text Mining: Challenges and Opportunities
  • Padmini Srinivasan
  • School of Library and Information Science
  • Department of Management Sciences
  • The University of Iowa, Iowa City, IA
  • http://mingo.info-science.uiowa.edu/padmini
  • padmini-srinivasan_at_uiowa.edu

2
Acknowledgements
  • University of Iowa Faculty Scholar Award
  • NSF ITR grant 2003-2006
  • Students: Aditya Kumar Sehgal, Xin Ying Qiu, Micah Wedemeyer, Li Zhou
  • Colleagues: Bisharah Libbus, Olivier Bodenreider, David Eichmann, Marc Light
3
Outline
  1. What is text mining?
  2. Our profile-based approach
  3. Studies conducted: turmeric
  4. The search for a general model of text mining
4
Text Mining
  • Contributes to knowledge discovery: hypothesis generation and hypothesis exploration
  • Mine a collection of texts for novel and interesting ideas and associations
  • Propositions/hypotheses need follow-up verification
5
Consider a local example
  • Weinstock, gastroenterologist (U. Iowa): drinking a concoction containing thousands of pig whipworm eggs could protect people against inflammatory bowel disease (IBD)
  • IBD is rare in countries where parasitic infections are more common
  • BioCure (German) trials with the drink
Make serendipity more likely!
6
Interesting links
  • Finding a plausible hypothesis about a new treatment for a disease
  • Finding possible new products in which particular components may be used
  • Looking for connections: between people, between genes and diseases, between drugs and cellular functions, between products and consumers, between politicians and campaign contributors
... all from semi-structured texts
7
Single link paths through texts
Person A
  --(leads)-- Political organization B
  --(member of)-- Person C
  --(member of)-- Political organization D
  --(member of)-- Person E
  --(funds)-- Organization F
Person A --?-- Organization F
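The question on this slide (is Person A connected to Organization F?) can be sketched as a breadth-first search over the extracted links. This is a minimal sketch, assuming the links have already been extracted; because the slide does not fix a direction for each link, every link is treated as undirected. `LINKS`, `build_graph` and `find_path` are hypothetical names.

```python
from collections import defaultdict, deque

# Links from the slide, in the order the nodes appear. Direction is not
# given on the slide, so each link is treated as undirected (assumption).
LINKS = [
    ("Person A", "leads", "Political organization B"),
    ("Political organization B", "member of", "Person C"),
    ("Person C", "member of", "Political organization D"),
    ("Political organization D", "member of", "Person E"),
    ("Person E", "funds", "Organization F"),
]


def build_graph(links):
    """Adjacency list: node -> list of (neighbour, relation label)."""
    graph = defaultdict(list)
    for u, label, v in links:
        graph[u].append((v, label))
        graph[v].append((u, label))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the fewest-hop list of (u, label, v), or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbour, label in graph.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = (node, label)
                queue.append(neighbour)
    if goal not in parent:
        return None
    # Walk back from the goal to the start, then reverse the hops.
    hops = []
    node = goal
    while parent[node] is not None:
        prev, label = parent[node]
        hops.append((prev, label, node))
        node = prev
    return hops[::-1]


if __name__ == "__main__":
    for u, label, v in find_path(build_graph(LINKS), "Person A", "Organization F"):
        print(f"{u} --({label})-- {v}")
```

Breadth-first search returns the path with the fewest hops; it ignores how strongly each link is supported in the texts.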
8
Finding these links in texts?
Person A
  --(co-occurs with, 0.8)-- Political Organization B
  --(co-occurs with, 0.5)-- Person C
  --(co-occurs with, 0.9)-- Political Organization D
  --(co-occurs with, 0.4)-- Person E
  --(co-occurs with, 0.4)-- Organization F
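If each co-occurrence link carries a strength between 0 and 1, one simple way to score a whole path is the product of its link weights; this combination rule is an assumption, since the slide does not say how weights combine along a path. The strongest path can then be found with Dijkstra's algorithm on costs of -log(weight). The edge list uses the weights as read from the slide (the pairing of the two 0.4 values is reconstructed from the layout), and `strongest_path` is a hypothetical helper.

```python
import heapq
import math

# Co-occurrence weights read from the slide; the pairing of the two 0.4
# values is a reconstruction of the diagram layout.
EDGES = [
    ("Person A", "Political Organization B", 0.8),
    ("Political Organization B", "Person C", 0.5),
    ("Person C", "Political Organization D", 0.9),
    ("Political Organization D", "Person E", 0.4),
    ("Person E", "Organization F", 0.4),
]


def strongest_path(edges, start, goal):
    """Return (path, score) maximising the product of weights, each in (0, 1]."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    # Dijkstra on cost = -log(w): the smallest summed cost is the largest product.
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            break
        if cost > best[node]:
            continue  # stale heap entry
        for nxt, w in graph.get(node, []):
            new_cost = cost - math.log(w)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if goal not in best:
        return None, 0.0
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], math.exp(-best[goal])


if __name__ == "__main__":
    path, score = strongest_path(EDGES, "Person A", "Organization F")
    print(" -> ".join(path), round(score, 4))
```

The product penalises long chains; another choice, such as scoring a path by its weakest link, would rank paths differently.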
9
In fact, co-occurrence is often exploited
Jenssen et al., PubGene, Nature Genetics 2001
(Screenshot from PubGene web site)
10
In fact, co-occurrence is often exploited
Jenssen et al., PubGene, Nature Genetics 2001
  • Does the pair co-occur more than expected by chance alone?
  • Is the co-occurrence at the sentence level?
  • Is the sentence in a key location?
(Screenshot from PubGene web site)
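The first two questions on this slide can be made concrete: count co-occurrence at the sentence level, then ask how surprising that count is. The sketch below uses a one-sided hypergeometric test, which is one common choice and an assumption here, not necessarily the measure PubGene uses. Sentence splitting and term matching are deliberately naive (a regex split and a case-insensitive substring match), and the function names are hypothetical.

```python
import math
import re


def sentence_counts(texts, term_a, term_b):
    """Count sentences mentioning term_a, term_b, and both (naive substring match)."""
    a, b = term_a.lower(), term_b.lower()
    n = n_a = n_b = n_ab = 0
    for text in texts:
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if not sentence:
                continue
            s = sentence.lower()
            has_a, has_b = a in s, b in s
            n += 1
            n_a += has_a
            n_b += has_b
            n_ab += has_a and has_b
    return n, n_a, n_b, n_ab


def cooccurrence_pvalue(n, n_a, n_b, n_ab):
    """One-sided hypergeometric p-value P(X >= n_ab): the chance of at least
    n_ab shared sentences if the n_b sentences with term_b were a random draw
    from n sentences, n_a of which mention term_a."""
    total = math.comb(n, n_b)
    tail = sum(
        math.comb(n_a, k) * math.comb(n - n_a, n_b - k)
        for k in range(n_ab, min(n_a, n_b) + 1)
    )
    return tail / total
```

A small p-value says the pair shares sentences more often than random placement would predict. The third question (is the sentence in a key location, such as a title or conclusion?) would need position weights per sentence, which this sketch does not model.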
11
Extract the particular relationships
From: Person A --(co-occurs with, 0.8)-- Organization B
To: Person A --(leads, 0.8)-- Organization B
  • A, the president of B
  • As president, A has the choicest parking spot in B
  • The CEO A of B has to ...
  • A set the vision for B in ...
  • During his term as CEO, A made B into a success
  • One of A's responsibilities as president is to see that B ...
  • etc.
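Turning a co-occurrence link into a typed "leads" link can be sketched with surface templates: among the sentences where both names occur, count those that match a phrasing like the examples above. The templates, the names "Alice" and "Acme", and `relation_evidence` are all hypothetical; real systems use far larger pattern sets or learned extractors.

```python
import re

# Hypothetical surface templates for "person leads organization", modelled
# on the example sentences; {p} = person, {o} = organization.
LEADS_TEMPLATES = [
    r"{p}, the (?:president|ceo) of {o}",
    r"the (?:president|ceo) {p} of {o}",
    r"{p} set the vision for {o}",
    r"as (?:president|ceo),? {p} made {o}",
]


def relation_evidence(sentences, person, org, templates=LEADS_TEMPLATES):
    """Return (co-occurring sentences, co-occurring sentences matching a template)."""
    patterns = [
        re.compile(t.format(p=re.escape(person), o=re.escape(org)), re.IGNORECASE)
        for t in templates
    ]
    cooccurring = [
        s for s in sentences
        if person.lower() in s.lower() and org.lower() in s.lower()
    ]
    matching = [s for s in cooccurring if any(p.search(s) for p in patterns)]
    return cooccurring, matching


if __name__ == "__main__":
    sents = [
        "Alice, the president of Acme, spoke today.",
        "Alice visited Acme once.",
    ]
    co, hits = relation_evidence(sents, "Alice", "Acme")
    print(f"{len(hits)} of {len(co)} co-occurring sentences support 'leads'")
```

The share of co-occurring sentences that match a template could serve as the weight on the typed link; a relation such as "funds" would need its own template list.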
12
  • F gave a check for 10,000 pounds to E ...
  • E has been paid several times by the charitable group, F
  • F continues to financially support E's activities
  • etc.
Person E --(funds, 0.4)-- Organization F
13
Extraction is a precursor to text mining
Person A
  --(co-occurs with, 0.8)-- Organization B
  --(co-occurs with, 0.5)-- Person C
  --(co-occurs with, 0.9)-- Organization D
  --(co-occurs with, 0.4)-- Person E
  --(co-occurs with, 0.4)-- Organization F
14
Instead of single link paths
You extract the following set of links in the text collection:
Location 1, Location 2, Location 3, Location 4, Location 5, Location 6
Person A, Person B (present at the same time)
Person A --?-- Person B
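Links of the form "present at the same time" can be sketched as a join on (location, time): two people are linked once for every slot in which both appear. This assumes the extraction step already yields (person, location, time) triples; the time labels and `co_present_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_present_pairs(sightings):
    """sightings: iterable of (person, location, time) triples.
    Returns {(person_x, person_y): number of shared (location, time) slots}."""
    by_slot = defaultdict(set)
    for person, location, time in sightings:
        by_slot[(location, time)].add(person)
    pairs = defaultdict(int)
    for people in by_slot.values():
        # Sorting gives each unordered pair one canonical key.
        for x, y in combinations(sorted(people), 2):
            pairs[(x, y)] += 1
    return dict(pairs)


if __name__ == "__main__":
    sightings = [
        ("Person A", "Location 1", "t1"),
        ("Person B", "Location 1", "t1"),
        ("Person A", "Location 2", "t2"),
        ("Person B", "Location 2", "t2"),
        ("Person C", "Location 2", "t3"),
    ]
    print(co_present_pairs(sightings))
```

Counting shared slots rather than shared locations keeps "same place, different day" from counting as a link.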
15
location1, location2, location3, location4; opinion1, opinion2, opinion3; organization1, organization2; affiliate1, affiliate2, affiliate3
Person A
Person B
  • Know about person A, looking for similar people
  • Is this a clustering problem? Should we start clustering all people?
  • But then which objects are we going to be interested in next?
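One way to answer "who is similar to person A?" without clustering everyone is to rank every other person against A alone. The sketch represents each person as a set of typed links (locations, opinions, organizations, affiliates, as on the slide) and uses Jaccard similarity, which is an assumption; `PEOPLE` is made-up illustrative data.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(query, people, k=3):
    """Rank everyone else against `query` only; no global clustering needed."""
    target = people[query]
    scored = [
        (jaccard(target, links), name)
        for name, links in people.items()
        if name != query
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:k]]


# Illustrative data only: each person is a set of "type:value" links.
PEOPLE = {
    "Person A": {"location:location1", "location:location2",
                 "opinion:opinion1", "organization:organization1"},
    "Person B": {"location:location1", "location:location2",
                 "opinion:opinion1", "affiliate:affiliate1"},
    "Person C": {"location:location3", "opinion:opinion2"},
}

if __name__ == "__main__":
    print(most_similar("Person A", PEOPLE, k=2))
```

This costs one comparison per candidate, and the same function applies unchanged to any kind of object once it is represented as a set of attributes.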
16
Interested in dishes similar to rice pudding?
rice pudding --?--
17
Interested in recipes similar to rice pudding
rice pudding: rice, milk, sugar, almonds, pistachio, cinnamon, vanilla
Linked dishes (diagram weights between 0.5 and 0.9): sutlac, arroz con leche, ris a lamande, payasam, kheer, riz bhaleeb, grod
18
A flexible solution is needed, and in fact...
19
Different objects
prior experience, address, degrees, projects, expected salary, hobbies
Person A
Company B
Start with person A, looking for appropriate companies, or the other way around.
20
molecular function, cellular function, pathologic function, genomic function, physiologic function, tissue function
Drug A
Disease B
Have some intuition about a particular drug and a disease? Start from both directions.
21
Generalizing this a bit, we can say that in our approach we...
22
Build profiles
molecular function, cellular function, pathologic function, genomic function, physiologic function, tissue function
Drug A
Disease B
Build a profile for drug A and a profile for disease B, then compare them to see to what extent they overlap in the dimensions that are interesting.
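Comparing two profiles dimension by dimension can be sketched with a weighted Jaccard score per dimension (the sum of the smaller weights over the sum of the larger weights). The profile layout (dimension to concept to weight), the scoring rule and the placeholder concepts c1 to c5 are assumptions for illustration, not the method behind these slides.

```python
def dimension_overlap(profile_a, profile_b):
    """Weighted Jaccard overlap for each dimension present in both profiles.
    A profile maps dimension -> {concept: weight}."""
    overlap = {}
    for dim in profile_a.keys() & profile_b.keys():
        a, b = profile_a[dim], profile_b[dim]
        concepts = a.keys() | b.keys()
        shared = sum(min(a.get(c, 0.0), b.get(c, 0.0)) for c in concepts)
        combined = sum(max(a.get(c, 0.0), b.get(c, 0.0)) for c in concepts)
        overlap[dim] = shared / combined if combined else 0.0
    return overlap


# Placeholder profiles: dimension names from the slide, concepts hypothetical.
DRUG_A = {
    "molecular function": {"c1": 0.5, "c2": 0.5},
    "tissue function": {"c3": 1.0},
}
DISEASE_B = {
    "molecular function": {"c1": 0.5, "c4": 0.5},
    "pathologic function": {"c5": 1.0},
}

if __name__ == "__main__":
    print(dimension_overlap(DRUG_A, DISEASE_B))
```

Reporting a score per dimension, rather than one global number, lets the analyst look only at the dimensions judged interesting.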
23
General approach: Topics and Profiles
  • Topic: e.g., a gene, a disease, a company or a product; represented by an appropriate subset of documents from the text collection (MEDLINE)
  • Profile: a set of terms/concepts characterizing the topic, with weights representing their relative importance
  • Text mining involves comparing and connecting profiles.
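A profile as defined here can be sketched in a few lines: treat the topic's documents as token lists, weight each term by its frequency in those documents times its inverse document frequency in the whole collection, keep the top terms, and compare profiles by cosine similarity. TF-IDF weighting is an assumption chosen for illustration; the slide only says the weights reflect relative importance. The function names are hypothetical.

```python
import math
from collections import Counter


def idf_table(collection):
    """Inverse document frequency, log(N / df), over the whole collection."""
    df = Counter()
    for doc in collection:
        df.update(set(doc))
    n = len(collection)
    return {term: math.log(n / count) for term, count in df.items()}


def build_profile(topic_docs, idf, top_k=20):
    """Profile = top_k terms by (frequency in topic docs x idf), L2-normalised."""
    tf = Counter()
    for doc in topic_docs:
        tf.update(doc)
    weights = {t: c * idf.get(t, 0.0) for t, c in tf.items()}
    top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    top = [(t, w) for t, w in top if w > 0]
    norm = math.sqrt(sum(w * w for _, w in top)) or 1.0
    return {t: w / norm for t, w in top}


def cosine(p, q):
    """Cosine similarity of two L2-normalised profiles (a dot product)."""
    return sum(w * q.get(t, 0.0) for t, w in p.items())


if __name__ == "__main__":
    collection = [["x", "shared"], ["x", "y"], ["z", "shared"], ["w", "v"]]
    idf = idf_table(collection)
    topic_1 = build_profile(collection[:2], idf)
    topic_2 = build_profile([collection[2]], idf)
    print(round(cosine(topic_1, topic_2), 3))
```

Terms that occur in every document get an idf of zero and drop out, so very common vocabulary never dominates a profile.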
24
Open Discovery
A (drug) -- B1, B2, B3, B4, B5, ..., Bn -- C1, C2, ..., Cm (diseases)
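Open discovery, as drawn here, starts from A, picks its strongest intermediate terms B, and looks for other topics C whose profiles also contain those terms. A minimal sketch, assuming profiles are term-to-weight dictionaries and scoring each C by the sum of weight products over the selected B terms (an assumed rule); topics already known to be related to A can be excluded. Names and data are hypothetical.

```python
def open_discovery(profiles, start, n_intermediate=5, exclude=()):
    """Rank candidate topics C reachable from `start` through its top B terms.
    profiles: {topic: {term: weight}}. Returns [(topic, score)], best first."""
    a = profiles[start]
    # B terms: the start topic's strongest profile terms.
    b_terms = sorted(a, key=lambda t: (-a[t], t))[:n_intermediate]
    scores = {}
    for topic, profile in profiles.items():
        if topic == start or topic in exclude:
            continue
        score = sum(a[b] * profile.get(b, 0.0) for b in b_terms)
        if score > 0:
            scores[topic] = score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative profiles only.
PROFILES = {
    "A": {"b1": 0.9, "b2": 0.5, "b3": 0.1},
    "C1": {"b1": 0.8},
    "C2": {"b2": 0.4, "b9": 1.0},
    "C3": {"b9": 1.0},
}

if __name__ == "__main__":
    print(open_discovery(PROFILES, "A", n_intermediate=2))
```

C3 shares no selected B term with A and is never proposed; in a real run the candidates would also be restricted to one semantic type (for example, diseases).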
25
Closed Discovery
A (drug) -- B1, B2, B3, B4, B5, ..., Bn -- C (disease)
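Closed discovery fixes both ends, so the task reduces to explaining the link: which B terms do the profiles of A and C share? The sketch ranks shared terms by the smaller of their two weights, so a term must matter to both topics to rank highly; that scoring rule, the function name and the data are assumptions.

```python
def closed_discovery(profile_a, profile_c, top_k=10):
    """Shared B terms of two {term: weight} profiles, ranked by min weight."""
    shared = profile_a.keys() & profile_c.keys()
    ranked = sorted(
        ((b, min(profile_a[b], profile_c[b])) for b in shared),
        key=lambda kv: (-kv[1], kv[0]),
    )
    return ranked[:top_k]


if __name__ == "__main__":
    drug = {"b1": 0.9, "b2": 0.3, "b3": 0.5}      # illustrative
    disease = {"b1": 0.4, "b3": 0.6, "b4": 1.0}   # illustrative
    print(closed_discovery(drug, disease))
```

Using the minimum rather than the product keeps a term that is central to one topic but marginal to the other from ranking highly.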
26
Larger groups of topics
Topics A1, A2, ...: common properties?
Genes from a microarray experiment, or a set of diseases, or a group of drugs
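For a larger group (for example, the genes from one microarray experiment), a first pass at "common properties" can count how many of the group's profiles contain each term. The sketch keeps terms present in at least a given fraction of the profiles and reports that fraction and the mean weight; the threshold, the ordering and `common_properties` itself are assumptions.

```python
from collections import Counter


def common_properties(profiles, min_fraction=0.5):
    """Terms shared by at least min_fraction of the {term: weight} profiles,
    as (term, fraction of profiles containing it, mean weight where present)."""
    if not profiles:
        return []
    present = Counter()
    weight_sum = Counter()
    for profile in profiles:
        for term, weight in profile.items():
            present[term] += 1
            weight_sum[term] += weight
    n = len(profiles)
    rows = [
        (term, present[term] / n, weight_sum[term] / present[term])
        for term in present
        if present[term] / n >= min_fraction
    ]
    # Most widely shared first, then by mean weight, then alphabetically.
    rows.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return rows


if __name__ == "__main__":
    group = [{"p": 1.0, "q": 0.5}, {"p": 0.5, "r": 1.0}, {"p": 0.3, "q": 0.7}]
    for term, frac, mean_w in common_properties(group, min_fraction=0.6):
        print(term, round(frac, 2), round(mean_w, 2))
```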
27
Manjal: prototype text mining tool
http://sulu.info-science.uiowa.edu/Manjal.html
w/ Sehgal, Qiu
28
(No Transcript)
29
Open Discovery
Closed Discovery
Larger Group of Topics
30
(No Transcript)
31
(No Transcript)
32
Current work with Topic Profiles
1. Building profiles for genes (humans, ...): gene name ambiguity (BAD, CART, I, ...), w/ Sehgal
2. Using different kinds of profiles for analysis of microarray data, w/ Qiu, Bodenreider, Zheng
3. Exploring differences between the discourses of patients, journalists and researchers (journals and clinical trials) on diseases/health problems: autism, w/ Zhou
33
4. Extracting speculative statements
  • Build profiles for topics
  • Use these to go back into the documents
  • Pull out the speculative statements
w/ Qiu and Light
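Pulling out speculative statements can be approximated with a cue-word filter: keep sentences that mention a profile term and either contain a hedging cue or are phrased as a question. The cue list is an illustrative assumption (cues such as "suggests" and "may" appear in the Vioxx examples that follow), `speculative_sentences` is a hypothetical name, and a trained classifier would be more robust.

```python
import re

# Illustrative hedge cues (assumption). Note "may" also matches the month.
HEDGE_CUES = {
    "suggest", "suggests", "suggested", "may", "might", "could",
    "possibly", "potentially", "hypothesize", "likely",
}


def speculative_sentences(text, topic_terms):
    """Sentences that mention a topic term and look speculative."""
    terms = [t.lower() for t in topic_terms]
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lower = sentence.lower()
        if not any(t in lower for t in terms):
            continue  # not about this topic
        words = set(re.findall(r"[a-z]+", lower))
        if words & HEDGE_CUES or sentence.rstrip().endswith("?"):
            hits.append(sentence)
    return hits


if __name__ == "__main__":
    text = ("Rofecoxib (Vioxx) is used clinically for osteoarthritis and pain. "
            "Do some inhibitors of COX-2 increase the risk of thromboembolic events?")
    print(speculative_sentences(text, ["rofecoxib", "vioxx", "COX-2"]))
```

Here the topic terms stand in for a profile; in the workflow above they would be the top-weighted terms of the topic's profile.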
34
Profile for VIOXX (rofecoxib)
  • Cornea 2004: This case report suggests that oral rofecoxib may trigger Stevens-Johnson syndrome, potentially causing symblepharons, corneal neovascularization and cicatricial ectropions.
  • Drug Safety 2004: Do some inhibitors of COX-2 increase the risk of thromboembolic events?
  • 2003 Aug, Am J Clinical Oncology (Merck): Rofecoxib (Vioxx) is used clinically for osteoarthritis and pain, and in addition the results described here suggest that Vioxx may be useful as a chemopreventive in humans at risk for colorectal neoplasia.
35
Open Discovery: Exploring Turmeric (Curcuma longa)
2004 ISMB Bioinformatics paper, w/ Libbus (NLM)
  • Widely used spice in Asia for hundreds of years
  • Used for treating burns, ulcers, various skin diseases, etc.
  • Can we use open discovery to suggest novel uses for turmeric?
36
Open Discovery
Curcumin -- (genes, enzymes, proteins) -- Diseases?
37
Open Discovery
Curcumin -- (genes, enzymes, proteins) -- Diseases?
Retinal diseases, Crohn's disease, problems related to the spinal cord
38
Retinal Diseases
Curcumin -- TNF-alpha, IL1-beta, COX-2, JNK, ERK, MAPK, NF-kappaB -- retinal diseases
  • TNF-alpha is elevated in early stages of diabetic retinopathy
  • Activation of TNF-alpha may lead to glaucoma
  • Anti-TNF-alpha treatment reduced leukocyte adhesion to eye blood vessels and vascular leakage (problems in retinopathy)
  • TNF-alpha activation is followed by NF-kappaB transcription (suppressed by curcumin)
39
General Model: Where does one start? Finding haystacks with needles?
  • Take a substance
  • Identify key functions/mechanisms
  • Identify other diseases in which the mechanism is significant
Challenge: properties of a good text mining problem?
40
Challenge!
  • Weinstock, gastroenterologist (U. Iowa): drinking a concoction containing thousands of pig whipworm eggs could protect people against inflammatory bowel disease (IBD)
  • IBD is rare in developing countries where parasitic infections are more common
  • BioCure (German) trials with the drink
41
Summary
  • Text mining approach: topic profiles
  • Highlights of our research with profiles
  • Current challenge: finding a model for interesting text mining problems
Thank you!