Title: Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory
1. Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory
- Yuru Jiang, Rou Song
- Beijing University of Technology
2. Punctuation Clause
?? ? ??? ?? ?? ? 1 ? ?? ?? ,? ? ?? ?? ,
c1 ?? ? ??? ?? ?? ? 1 ? ? c2 ? ?? , c3 ? ?
? c4 ? ?? ,
PClause Sequence
3. Topic Clause
- c1 ?? ? ??? ?? ?? ? 1 ? ?
- c2 ? ?? ,
- c3 ? ? ?
- c4 ? ?? ,
What we have done
- t1 ?? ? ??? ?? ?? ? 1 ? ?
- t2 ?? ? ?? ,
- t3 ?? ? ? ? ?
- t4 ?? ? ?? ,
4. Identification Scheme
- Identification Process
- Identification Algorithm
- CTCs Scoring Function
5. Identification Process
- Example 2 ??(?????????)
- c1 ?? ? ??? ?? ?? ? 1 ? ?
- c2 ? ?? ,
- c3 ? ? ?
- c4 ? ?? ,
- t1 = c1
- t2 = ?
6.
- if
- t1 ?? ? ??? ?? ?? ? 1 ? ?
- c2 ? ?? ,
- then
- t2?
- ? ?? ,
- ?? ? ?? ,
- ?? ? ? ?? ,
- ?? ? ??? ? ?? ,
- ?? ? ??? ?? ? ? ?? ,
- ?? ? ??? ?? ?? ? ?? ,
- ?? ? ??? ?? ?? ? ? ?? ,
- ?? ? ??? ?? ?? ? 1 ? ?? ,
- ?? ? ??? ?? ?? ? 1 ? ? ?? ,
CTCs of c2
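The list on this slide suggests that each CTC of a PClause is a word-level prefix of the preceding topic clause followed by the PClause itself, with the empty prefix giving the PClause alone. A minimal sketch of that reading follows. The function name `generate_ctcs` and the English toy tokens are illustrative assumptions, as is the choice to stop the prefixes before the last token of the topic clause (on the slides that token appears to be the closing punctuation mark).

```python
from typing import List


def generate_ctcs(topic_clause: List[str], pclause: List[str]) -> List[List[str]]:
    """Return candidate topic clauses (CTCs) for `pclause`.

    Each candidate is a prefix of `topic_clause` (0 up to len - 1 tokens)
    followed by all tokens of `pclause`.
    Assumption: the last token of the topic clause is its closing
    punctuation mark and is never carried over.
    """
    return [topic_clause[:k] + pclause for k in range(len(topic_clause))]


# Toy English stand-ins for the tokenized Chinese clauses (hypothetical data).
t1 = ["the", "carp", "lives", "in", "rivers", ","]
c2 = ["eats", "plants", ","]

ctcs_of_c2 = generate_ctcs(t1, c2)
for ctc in ctcs_of_c2:
    print(" ".join(ctc))
```

Applying the same function to every CTC of c2 together with c3 yields the groups of CTCs of c3 shown on the following slides.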
7. (Diagram labels: t1; CTCs of c2; C3; Topic Clause of C3)
8.
- if
- CTCs of c2
- c3 ? ? ,
- then
- t3?
- ? ?? ,
- ?? ? ?? ,
- ?? ? ? ?? ,
- ?? ? ??? ? ?? ,
- ?? ? ??? ?? ? ? ?? ,
- ?? ? ??? ?? ?? ? ?? ,
- ?? ? ??? ?? ?? ? ? ?? ,
- ?? ? ??? ?? ?? ? 1 ? ?? ,
- ?? ? ??? ?? ?? ? 1 ? ? ?? ,
CTCs of c2
9.
- if
- one CTC of c2 is ?? ? ??? ? ?? ,
- c3 ? ? ,
- then
- one group of CTCs of c3 is
- ? ? ,
- ?? ? ? ,
- ?? ? ? ? ,
- ?? ? ??? ? ? ,
- ?? ? ??? ? ? ? ,
- ?? ? ??? ? ?? ? ? ,
11. (Diagram labels: t1; CTCs of c2; CTCs of c3)
12. CTC Tree
How to choose the best path?
13. Identification Algorithm
- Question 1: How to calculate the value of each node in the CTC tree?
  - CTCs Scoring Function
- Question 2: How to calculate the path value from each leaf node to the root node?
  - Sum of the node values
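The two answers above can be sketched as a small tree search: every node carries a value from the scoring function, the value of a path is the sum of the node values from the root to a leaf, and the path with the largest sum is chosen. The `Node` class, the `best_path` function, and the node values below are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    ctc: str                      # CTC string stored at this node
    value: float                  # score from the CTCs scoring function
    children: List["Node"] = field(default_factory=list)


def best_path(node: Node) -> Tuple[float, List[str]]:
    """Return (path value, CTC strings from `node` down to a leaf)
    for the path whose summed node values are largest."""
    if not node.children:
        return node.value, [node.ctc]
    # Pick the child subtree with the highest summed value.
    child_value, child_path = max(
        (best_path(child) for child in node.children),
        key=lambda pair: pair[0],
    )
    return node.value + child_value, [node.ctc] + child_path


# Hypothetical tree: the root is t1, the next level holds CTCs of c2,
# and the leaves hold CTCs of c3.
root = Node("t1", 1.0, [
    Node("c2-ctc-a", 0.5, [Node("c3-ctc-a1", 0.75), Node("c3-ctc-a2", 0.25)]),
    Node("c2-ctc-b", 0.75, [Node("c3-ctc-b1", 0.25)]),
])

value, path = best_path(root)
print(value, path)
```

Note that the locally best CTC of c2 (value 0.75) is not on the chosen path here, which is why the whole path is scored rather than each level separately.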
14. CTCs Scoring Function
- Given a CTC d of PClause c, the topic clause most similar to d is retrieved from the Topic Clause Corpus; its similarity to d is denoted sim_CT(d).
- For any two strings x and y, let sim(x, y) denote their similarity. Then sim_CT(d) is the maximum of sim(d, t) over all topic clauses t in the Topic Clause Corpus.
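The formula itself did not survive in this transcript. Based only on the wording of the slide, it can plausibly be written as below, where the symbol T for the Topic Clause Corpus is an assumption introduced here.

```latex
\[
  \mathrm{sim\_CT}(d) \;=\; \max_{t \in T} \, \mathrm{sim}(d, t),
  \qquad T = \text{Topic Clause Corpus}
\]
```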
15. CTCs Scoring Function (cont.)
- Let CTset(c) be the set of CTCs of c; the topic clause of c is then the CTC in CTset(c) with the highest sim_CT value.
- Accuracy rate is 0.6499
- Reference: Yuru Jiang, Rou Song. Topic Clause Identification Based on Generalized Topic Theory. Journal of Chinese Information Processing, 26(5), 2012.
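The scoring and selection steps on slides 14 and 15 can be sketched as follows. The slides do not say how sim(x, y) is computed, so `difflib.SequenceMatcher.ratio` from the Python standard library is used purely as a stand-in; the function names and the toy English corpus are likewise assumptions.

```python
from difflib import SequenceMatcher
from typing import List


def sim(x: str, y: str) -> float:
    """Stand-in string similarity in [0, 1] (assumption, not the
    measure used in the original work)."""
    return SequenceMatcher(None, x, y).ratio()


def sim_ct(d: str, corpus: List[str]) -> float:
    """Similarity between CTC `d` and its most similar topic clause
    in the topic clause corpus."""
    return max(sim(d, t) for t in corpus)


def choose_topic_clause(ct_set: List[str], corpus: List[str]) -> str:
    """Pick the CTC in `ct_set` with the highest sim_CT value."""
    return max(ct_set, key=lambda d: sim_ct(d, corpus))


# Hypothetical topic clause corpus and candidate set.
corpus = ["the carp eats plants", "the shark has sharp teeth"]
ct_set = ["eats plants", "the carp eats plants", "the carp lives eats plants"]

best = choose_topic_clause(ct_set, corpus)
print(best)
```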
16. CTCs Scoring Function (cont.)
- Accuracy rate is 0.7625
- 0.7625 > 0.6499 > baseline
17.
- Example 3
- d_tcpre A ?? ? ? H ? H C ,
- d_c ?? ?? ?? ?
- t1 A ?? ? ? H ?? ?? ?? ?
- st1 A C ?? ? H ,
- t2 A ?? ? ? H ? H C ?? ?? ?? ?
- t_tcpre A ?? B C ? C ,
- t_c ? ?? ?? ,
- t A ?? B C ? C ? ?? ?? ,
18. Experiment
- Corpus
- Evaluation Criteria
- Experiment Result
- Analysis
19. Corpus
- 202 texts about fish in the Biology volume of China Encyclopedia
- 15 texts are used for testing in the experiment
- K-1 test is used
20. Evaluation Criteria
- For N PClauses, if the number of PClauses whose
topic clauses are correctly identified is hitN,
then the identification accuracy rate is hitN/N.
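The criterion above amounts to a plain accuracy computation. A minimal sketch, assuming topic clauses are compared as exact strings (the slide does not state how correctness is judged) and using a hypothetical function name and toy data:

```python
from typing import Sequence


def identification_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Return hitN / N, where N is the number of PClauses and hitN is the
    number whose predicted topic clause equals the gold topic clause."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one PClause is required")
    hit_n = sum(p == g for p, g in zip(predicted, gold))
    return hit_n / len(gold)


# Hypothetical predictions for four PClauses, three of them correct.
acc = identification_accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"])
print(acc)
```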
21. Experiment Result
- Fig. 2. PClause count and accuracy rate of topic clause identification for the 15 test texts
22. Analysis: There may be nodes with the same CTC string
23. Analysis: The relation between the accuracy rate and the PClause position
24. Analysis: The relation between the accuracy rate and the PClause depth
25. Future Work
- CTCs Scoring Function
- CTC Tree
- Extend to other texts
26. Thank you! Any suggestions?