Which Kinds of Trend Metrics Are More Effective for Emerging Trend Detection

1
Which Kinds of Trend Metrics Are More Effective
for Emerging Trend Detection?

Yuen-Hsien Tseng
National Taiwan Normal University

Yu-I Lin
Taipei Municipal Univ. of Education

Chun-Hsien Kuo, Yi-Yang Lee
Science & Technology Policy Research and Information Center
Taipei, Taiwan, R.O.C. 106
This presentation is based on the work to appear in Scientometrics.
2
Introduction: ETD

Monitoring research trends has always been a concern of policy makers:
it helps resource allocation and technology forecasting.
Increasingly important research topics are of particular interest to those policy makers.
They have also been termed hot topics, upward trends, or emerging trends.
ETD: Emerging Trend Detection

3
But how to detect them effectively?

Domain experts are often consulted:
they are good at identifying interesting research trends.
But their observations do not generalize effectively to fields beyond their expertise;
when a large number of research topics need to be prioritized, inconsistent decisions may result.
An automatic mechanism for monitoring research trends in a large stream of upcoming publications would be of great help.

4
Detecting trends in scientometrics

Noyons and van Raan pointed out that:
Domain experts are often hard to find, due to busy schedules and lack of affinity with scientometric studies.
Policy makers are often overwhelmed by the amount of resulting information.

5
Motivations

In past trend analyses,
different year spans may have been used to create the time sequence;
different indices were chosen for trend observation.
A simple count of publications is a questionable way to obtain good trend sequences (Chi et al., 2006).
The effectiveness of these choices was unknown, both quantitatively and comparatively.
This work provides clues for better interpreting the results when a certain choice was made.

6
Questions?

For effective trend detection, which options should be used?

Different year spans!
Data are from Smeaton et al. (2003), ACM SIGIR Forum.
A simple count used to create a sequence is questionable for ETD.
Different trend orderings result from different criteria!
7
Simple Trend vs. Eigen-Trend
Chi, Tseng, and Tatemura (2006, CIKM) challenged the validity of the simple accumulation of published documents over time.

Simple authority
Simple trend

(First) authority: $U_1$
(First) eigen-trend: $s_{11} V_1$
Error

Break down by sources: $D = U S V^T$
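
The decomposition on this slide can be illustrated with a small sketch. It assumes that D holds one row per source and one column per time period, which is consistent with $U_1$ being an authority vector over sources and $s_{11} V_1$ a trend over time, but that layout is an assumption here and not something the slide states. The counts and the helper name `leading_singular_triple` are hypothetical, and power iteration is only one convenient way to obtain the first singular triple.

```python
import math

def leading_singular_triple(D, iters=100):
    """Return (u, s, v), the first singular triple of D, by power iteration."""
    rows, cols = len(D), len(D[0])
    v = [1.0 / math.sqrt(cols)] * cols  # uniform unit start vector
    for _ in range(iters):
        # u = D v, then w = D^T u; normalising w gives the next estimate of v
        u = [sum(D[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(D[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # D is all zeros; keep the start vector
        v = [x / norm for x in w]
    u = [sum(D[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    s = math.sqrt(sum(x * x for x in u))
    if s > 0.0:
        u = [x / s for x in u]
    return u, s, v

# Hypothetical counts: 2 sources (rows) x 3 time periods (columns).
D = [[3.0, 6.0, 6.0],
     [4.0, 8.0, 8.0]]

u1, s11, v1 = leading_singular_triple(D)
simple_trend = [sum(col) for col in zip(*D)]  # plain accumulation over sources
eigen_trend = [s11 * x for x in v1]           # first eigen-trend, s11 * V1

print("simple trend:", simple_trend)
print("authority U1:", u1)
print("eigen-trend :", eigen_trend)
```

In this toy matrix both sources follow the same shape over time, so the eigen-trend is proportional to the simple trend; the two differ once sources with different authority follow different shapes.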
8
Outline of the following talk

ETD methodology
Trend metrics to be compared
Evaluation method
Data sets for evaluating ETD
Safety agriculture (SA)
Information retrieval (IR)
Evaluation results
Conclusions and implications

9
ETD methodology

Documents (terms) were clustered to yield topics.
For each topic, a time series of the number of publications over time was created.
Topics were then ranked by a trend metric (an IR-based metaphor).
Input: a set of publications (each with PY, TI, AU, C1, SO, ...)
Output: a ranked list of topics in decreasing order of interest
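
The series-building and ranking steps can be sketched as follows, assuming the clustering step has already assigned publication years (the PY field) to each topic. The function names, the toy topics, and the `net_change` stand-in metric are all hypothetical; any trend metric that maps a series to a score could be passed in its place.

```python
from collections import Counter

def build_series(pub_years, first_year, last_year, span=1):
    """Count publications per period of `span` years (the last period may be shorter)."""
    counts = Counter(pub_years)
    series = []
    for start in range(first_year, last_year + 1, span):
        end = min(start + span - 1, last_year)
        series.append(sum(counts[y] for y in range(start, end + 1)))
    return series

def rank_topics(topic_years, metric, first_year, last_year, span=1):
    """Return topic names sorted by decreasing metric score."""
    scored = []
    for topic, years in topic_years.items():
        series = build_series(years, first_year, last_year, span)
        scored.append((metric(series), topic))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [topic for _, topic in scored]

def net_change(series):
    """Stand-in metric for illustration only: last period minus first period."""
    return series[-1] - series[0]

# Hypothetical publication years for two topics.
topics = {
    "topic_a": [1996, 1997, 2003, 2004, 2004, 2005, 2005],
    "topic_b": [1996, 1996, 1996, 1997, 1998, 2001],
}
ranking = rank_topics(topics, net_change, 1996, 2005, span=5)
print(ranking)
```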

10
Trend metrics to be compared (1/2)

api: average percentage of increase
used in a foresight survey in Japan (STFC, 2004)
used by Noyons et al. when n = 2
slp: slope of the linear regression line that best fits the data in the time series
slpz: same as slp, but the sequence is first z-score transformed ($z_i = (d_i - \mathrm{avg})/\mathrm{stderr}$)
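
A minimal sketch of these three metrics follows, under stated assumptions: the regression is fitted against period indices 1..n; api is taken as the mean of the step-wise relative increases $(d_{i+1} - d_i)/d_i$, so every count except the last must be non-zero; and "stderr" is read as the population standard deviation, which is the reading that matches the two-point case discussed later in the talk. A constant sequence is given a slpz of 0 by convention here.

```python
from statistics import mean, pstdev

def slp(ds):
    """Least-squares slope of ds against the period indices 1..n."""
    n = len(ds)
    if n < 2:
        raise ValueError("need at least two points")
    xs = range(1, n + 1)
    x_bar, y_bar = mean(xs), mean(ds)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ds))
    return sxy / sxx

def api(ds):
    """Average percentage of increase; assumes every count but the last is non-zero."""
    return mean((b - a) / a for a, b in zip(ds, ds[1:]))

def slpz(ds):
    """slp after a z-score transform (assumption: population standard deviation)."""
    sd = pstdev(ds)
    if sd == 0:
        return 0.0  # convention for a constant sequence
    avg = mean(ds)
    return slp([(d - avg) / sd for d in ds])

print(slp([1, 2, 3, 4]), api([1, 2, 4]), slpz([3, 7]))
```

Because slpz rescales every sequence to unit variance, it compares the shape of trends and discards their magnitude, whereas slp keeps the magnitude.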

11
Trend metrics to be compared (2/2)

slppi: a combination of api and slp.
$d_1, d_2, \ldots, d_n \Rightarrow pi_1, pi_2, \ldots, pi_{n-1}$, where $pi_i = (d_{i+1} - d_i)/d_i$
may be ideal for sharp increasing trend detection
slpc: eigen-trend broken down by C1 (C1: first author's country)
slpj: eigen-trend broken down by SO (journal)
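
One plausible reading of slppi is sketched below: transform the counts into the percentage-increase sequence defined above, then take the regression slope of that sequence. The slide only says slppi combines api and slp, so this reading is an assumption, as are the function names and the requirement that counts be non-zero.

```python
from statistics import mean

def slope(ys):
    """Least-squares slope of ys against the indices 1..n."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    xs = range(1, n + 1)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

def pct_increases(ds):
    """pi_i = (d_{i+1} - d_i) / d_i; assumes every count but the last is non-zero."""
    return [(b - a) / a for a, b in zip(ds, ds[1:])]

def slppi(ds):
    """Assumed reading: slope of the percentage-increase sequence (needs n >= 3)."""
    return slope(pct_increases(ds))

# Growth rates of 100%, 200%, 300% accelerate by one unit per period.
print(pct_increases([1, 2, 6, 24]), slppi([1, 2, 6, 24]))
```

Under this reading a topic scores high only when its growth rate itself is rising, which fits the slide's remark about sharp increasing trends.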

12
Evaluation method: NAP and Pre_at_R

Assume A-E and V-Z are ten items to be ordered, and A-E are relevant while V-Z are not.
Ordering S1 is the best by:
NAP: Non-interpolated Average Precision rate
Pre_at_R: Precision rate at Recall position
Pre_at_R = r/R, where R is the number of relevant items and r is the number of relevant items in the top R items
With NAP and Pre_at_R, we can evaluate which trend orderings are best

Example: Pre_at_R = 0.60 = 3/5; NAP = 0.68 = (1/1 + 2/3 + 3/5 + 4/7 + 5/9)/5
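
Both measures are short to compute. The sketch below uses an illustrative ordering with the relevant items at ranks 1, 3, 5, 7, and 9, which is the placement implied by the fractions in the worked example; the function names are hypothetical.

```python
def pre_at_r(ranking, relevant):
    """Precision within the top R items, where R is the number of relevant items."""
    R = len(relevant)
    r = sum(1 for item in ranking[:R] if item in relevant)
    return r / R

def nap(ranking, relevant):
    """Non-interpolated average precision: mean of precision at each relevant hit."""
    hits = 0
    total = 0.0
    for pos, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            total += hits / pos
    return total / len(relevant)

relevant = set("ABCDE")
ordering = list("AVBWCXDYEZ")  # relevant items at ranks 1, 3, 5, 7, 9
ideal = list("ABCDEVWXYZ")     # all relevant items ranked first

print(pre_at_r(ordering, relevant), round(nap(ordering, relevant), 2))
print(pre_at_r(ideal, relevant), nap(ideal, relevant))
```

Pre_at_R only looks at the top R positions, while NAP also rewards relevant items for appearing earlier within the whole list, so the two can rank orderings differently.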
13
Data set: SA

Six research domains regarding safety agriculture (SA) were enumerated by a group of experts from the Science & Technology Policy Research and Information Center (STPI):
food security, crop protection, livestock, fishery, agroforestry, and environment.
For each domain, a query was formulated to search ISI's Web of Science database.
72500 records between 1996 and 2005 were downloaded.

14
Topic detection for Safety agriculture

Clustering analysis was based on controlled terms:
179 SC terms, each occurring in more than 10 documents
3632 DE terms of this kind
Co-occurrences of terms from each field (SC or DE) in the same records were counted.
Similarity based on this count was used in a complete-link clustering algorithm.
80 clusters (topics) were found for SC terms
1617 clusters for DE terms

15
Trend Type Labelling by Experts (1/2)

We sampled 50% of the clusters from SC and 10% from DE for experts to judge their trend types.
6 professors, 2 researchers, 1 admin. manager
Trend types:
++ sharp increasing
+ increasing
fluctuation
- decreasing
-- sharp decreasing
? inconclusive

16
Trend Type Labelling by Experts (2/2)

Experts were advised to judge the type of each cluster based on their knowledge.
If this did not help, the time series of the cluster could be consulted.
If this did not help either, the documents in the cluster could be examined.
If all these efforts failed, the cluster was labeled inconclusive.

17
Experts' feedback
Data are from 72500 documents in the safety agriculture area.
Figure: trend types judged by experts for controlled-term clusters in different fields. Legend: sharp increase, increase, - decrease, -- sharp decrease, undecidable.
18
Data set: Information retrieval (IR)

853 papers from the first ACM SIGIR conference to the 25th were clustered with a commercial software package called Clustan Graphics by Smeaton et al. (ACM SIGIR Forum, 2003).
29 non-overlapping clusters were generated
They then inspected each cluster manually and
assigned a topic description to reflect the theme
of the majority of the papers in each cluster
Topics are sorted approximately in order of a
combination of the year of their first
appearance, and the number of papers published

19
Clustering and ordering of SIGIR papers by topics
made by Smeaton
20
Hot topics predicted by Smeaton et al.

The ideal paper title expected by Smeaton et al. to appear in SIGIR 2003 is
"Evaluation of a Language Model Implementation of a Topic-Based, Cross-Lingual Question-Answering and Summarisation System"

21
Fourteen session titles (topics) in the SIGIR
2003 conference
22
Evaluation results: SA
Avg is the average of the values in the SC and
DE rows
23
Prediction effectiveness when year span varies
from 1, 2, to 5
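$(x_1, x_2) \Rightarrow \left(\frac{x_1 - \mathrm{avg}}{\mathrm{stderr}},\ \frac{x_2 - \mathrm{avg}}{\mathrm{stderr}}\right) = \left(\frac{(x_1 - x_2)/2}{|x_1 - x_2|/2},\ \frac{(-x_1 + x_2)/2}{|x_1 - x_2|/2}\right)$.
Thus only 3 values result: (-1, 1), (1, -1), and (0, 0), which in turn yield only 3 possible slopes: 2, -2, and 0.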
24
Prediction based on less data
Percentage of performance drop for slp using only the first n years of data, where n = 10, 8, 6, 4, and 2.
Panels: Pre_at_R, NAP
25
Evaluation results: IR
26
Conclusions

Which metrics (methods) perform best for ETD?
api: average percentage of increase
slp: slope of the linear regression line
eigen-trends
Smeaton's chronological ordering
Our answer is slp, because it performs well under
different year spans (1, 2, 5)
different observation durations (10 vs. 25 years)
different domains (SA vs. IR)
different collection scales (72500 vs. 853 papers)
api only works for n = 2 (so Noyons' work is still valid)

27
Conclusions

Our goal is to explore the best way to predict upward trends in an environment where a large number of topics are to be monitored.
If a good trend index is used, inspecting topics in the order sorted by that index should be efficient.
Our work helps identify which metric is best under a given condition.

28
Implications

The IR-based method for evaluating trend index performance suggests a relatively objective and repeatable procedure to identify better indices and to gather evidence to support (or invalidate) our current results.
