Topic Structure Identification of PClause Sequence Based on Generalized Topic Theory



1
Topic Structure Identification of PClause
Sequence Based on Generalized Topic Theory
  • Yuru Jiang , Rou Song
  • Beijing University of Technology

2
Punctuation Clause
  • Example??

?? ? ??? ?? ?? ? 1 ? ?? ?? ,? ? ?? ?? ,
c1 ?? ? ??? ?? ?? ? 1 ? ? c2 ? ?? , c3 ? ?
? c4 ? ?? ,
PClause Sequence
3
Topic Clause
  • c1 ?? ? ??? ?? ?? ? 1 ? ?
  • c2 ? ?? ,
  • c3 ? ? ?
  • c4 ? ?? ,

What we have done
  • t1 ?? ? ??? ?? ?? ? 1 ? ?
  • t2 ?? ? ?? ,
  • t3 ?? ? ? ? ?
  • t4 ?? ? ?? ,

4
Identification Scheme
  • Identification Process
  • Identification Algorithm
  • CTCs Scoring Function

5
Identification Process
  • Example 2 ??(?????????)
  • c1 ?? ? ??? ?? ?? ? 1 ? ?
  • c2 ? ?? ,
  • c3 ? ? ?
  • c4 ? ?? ,
  • t1 = c1
  • t2?

6
  • if
  • t1 ?? ? ??? ?? ?? ? 1 ? ?
  • c2 ? ?? ,
  • then
  • t2?
  1. ? ?? ,
  2. ?? ? ?? ,
  3. ?? ? ? ?? ,
  4. ?? ? ??? ? ?? ,
  5. ?? ? ??? ?? ? ? ?? ,
  6. ?? ? ??? ?? ?? ? ?? ,
  7. ?? ? ??? ?? ?? ? ? ?? ,
  8. ?? ? ??? ?? ?? ? 1 ? ?? ,
  9. ?? ? ??? ?? ?? ? 1 ? ? ?? ,

CTCs of c2
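The nine candidates listed above appear to follow one pattern: each takes the first k words of t1 (k = 0 to 8, with the final punctuation mark dropped) and appends c2. A minimal Python sketch of that pattern follows. The function name `generate_ctcs`, the `PUNCTUATION` set, and the placeholder tokens (`w1`..`w8`, `x1`, `x2`) are assumptions made for illustration; they do not come from the slides.

```python
from typing import List, Tuple

Tokens = Tuple[str, ...]

# Assumed punctuation inventory; the slides do not list one.
PUNCTUATION = {",", ".", ";", "!", "?", "，", "。", "；", "！", "？"}


def generate_ctcs(prev_topic_clause: Tokens, pclause: Tokens) -> List[Tokens]:
    """Return the candidate topic clauses (CTCs) of `pclause`.

    Each candidate is a word prefix of the previous topic clause
    (trailing punctuation removed) followed by the PClause itself.
    """
    words = prev_topic_clause
    if words and words[-1] in PUNCTUATION:
        words = words[:-1]  # drop the closing punctuation mark
    # k = 0 yields the PClause alone; k = len(words) yields the full prefix.
    return [words[:k] + pclause for k in range(len(words) + 1)]


# Placeholder tokens standing in for the eight words of t1 and for c2.
t1 = ("w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8", ".")
c2 = ("x1", "x2", ",")
ctcs_of_c2 = generate_ctcs(t1, c2)
print(len(ctcs_of_c2))  # 9
```

With an eight-word t1 this yields nine candidates, matching the count on the slide.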
7
t1
CTCs of c2
Topic Clause of C3
C3
8
  • if
  • CTCs of c2
  • c3 ? ? ,
  • then
  • t3?
  1. ? ?? ,
  2. ?? ? ?? ,
  3. ?? ? ? ?? ,
  4. ?? ? ??? ? ?? ,
  5. ?? ? ??? ?? ? ? ?? ,
  6. ?? ? ??? ?? ?? ? ?? ,
  7. ?? ? ??? ?? ?? ? ? ?? ,
  8. ?? ? ??? ?? ?? ? 1 ? ?? ,
  9. ?? ? ??? ?? ?? ? 1 ? ? ?? ,

CTCs of c2
9
  • if
  • one CTC of c2 ?? ? ??? ? ?? ,
  • c3 ? ? ,
  • then
  • one group of CTCs of c3 is
  1. ? ? ,
  2. ?? ? ? ,
  3. ?? ? ? ? ,
  4. ?? ? ??? ? ? ,
  5. ?? ? ??? ? ? ? ,
  6. ?? ? ??? ? ?? ? ? ,

10
(No Transcript)
11
t1
CTCs of c2
CTCs of c3
12
CTC Tree
How to choose the best path?
13
Identification Algorithm
  • Question 1: How to calculate the value of each node
    in the CTC tree?
  • CTCs Scoring Function
  • Question 2: How to calculate the path value from
    each leaf node to the root node?
  • Sum of the node values
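The two answers above can be sketched in Python: expand the CTC tree level by level, score every node, and keep the root-to-leaf path with the largest sum of node values. This is an illustrative sketch, not the authors' implementation. The names `generate_ctcs` and `best_path`, the `PUNCTUATION` set, the toy scoring function, and the English example clauses are all assumptions. The root is taken to be c1, following t1 = c1 on the earlier slide.

```python
from typing import Callable, List, Sequence, Tuple

Tokens = Tuple[str, ...]

# Assumed punctuation inventory; the slides do not list one.
PUNCTUATION = {",", ".", ";", "!", "?", "，", "。", "；", "！", "？"}


def generate_ctcs(parent: Tokens, pclause: Tokens) -> List[Tokens]:
    """Candidates = word prefixes of the parent clause + the PClause."""
    words = parent[:-1] if parent and parent[-1] in PUNCTUATION else parent
    return [words[:k] + pclause for k in range(len(words) + 1)]


def best_path(
    pclauses: Sequence[Tokens], score: Callable[[Tokens], float]
) -> Tuple[float, List[Tokens]]:
    """Depth-first search over the CTC tree.

    Returns the highest path value (sum of node scores) and the
    corresponding sequence of topic clauses, one per PClause.
    The tree grows exponentially, so this is only for short sequences.
    """
    root = pclauses[0]  # the first topic clause is the first PClause itself
    best: Tuple[float, List[Tokens]] = (float("-inf"), [])
    stack = [(root, 1, score(root), [root])]
    while stack:
        node, depth, total, path = stack.pop()
        if depth == len(pclauses):  # leaf: compare accumulated path value
            if total > best[0]:
                best = (total, path)
            continue
        for child in generate_ctcs(node, pclauses[depth]):
            stack.append((child, depth + 1, total + score(child), path + [child]))
    return best


# Toy example: a node scores 1.0 if it is a known topic clause, else 0.0.
c1 = ("fish", "live", "in", "water", ".")
c2 = ("breathe", "with", "gills", ",")
c3 = ("swim", "with", "fins", ".")
known = {c1, ("fish",) + c2, ("fish",) + c3}
value, path = best_path([c1, c2, c3], lambda d: 1.0 if d in known else 0.0)
print(value)  # 3.0
```

In this toy run the best path restores the shared topic "fish" in front of both c2 and c3.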

14
CTCs Scoring Function
  • Given a CTC d of PClause c, the topic clause most
    similar to d is found in the corpus, and that
    similarity is denoted sim_CT(d). With sim(x, y)
    the similarity of any two strings x and y, and T
    the topic clause corpus, sim_CT(d) is defined as
    $\mathrm{sim\_CT}(d) = \max_{t \in T} \mathrm{sim}(d, t)$

Topic Clause Corpus
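As a rough illustration of this scoring idea, the sketch below scores a CTC by its best match in a topic clause corpus. The slides do not say which string similarity sim(x, y) is used, so `difflib.SequenceMatcher.ratio()` from the Python standard library is swapped in as a stand-in. The function names and the tiny corpus are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable


def sim(x: str, y: str) -> float:
    """Stand-in string similarity in [0, 1]; not the measure from the slides."""
    return SequenceMatcher(None, x, y).ratio()


def sim_ct(d: str, corpus: Iterable[str]) -> float:
    """Similarity between CTC `d` and its most similar corpus topic clause.

    Returns 0.0 for an empty corpus.
    """
    return max((sim(d, t) for t in corpus), default=0.0)


# Hypothetical topic clause corpus.
corpus = ["fish breathe with gills", "fish swim with fins"]
print(sim_ct("fish swim with fins", corpus))  # 1.0
```

A CTC that already occurs in the corpus receives the maximum score of 1.0; other candidates receive the score of their closest match.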
15
CTCs Scoring Function cont.
  • CTset(c) is the CTC set of c; the topic clause of c
    is then the CTC with the highest score,
    $\arg\max_{d \in \mathrm{CTset}(c)} \mathrm{sim\_CT}(d)$
  • Accuracy rate is 0.6499
  • Reference: Yuru Jiang, Rou Song. Topic Clause
    Identification Based on Generalized Topic Theory.
    Journal of Chinese Information Processing, 26(5),
    2012.

16
CTCs Scoring Function cont.
  • Accuracy rate is 0.7625
  • > 0.6499 > baseline

17
  • Example 3
  • d_tcpre A ?? ? ? H ? H C ,
  • d_c ?? ?? ?? ?
  • t1 A ?? ? ? H ?? ?? ?? ?
  • st1 A C ?? ? H ,
  • t2 A ?? ? ? H ? H C ?? ?? ?? ?

t_tcpre A ?? B C ? C ,
t_c ? ?? ?? ,
t A ?? B C ? C ? ?? ?? ,
18
Experiment
  • Corpus
  • Evaluation Criteria
  • Experiment Result
  • Analysis

19
Corpus
  • 202 texts about fish in the Biology volume of
    China Encyclopedia
  • 15 texts are used for testing in the experiment
  • A K-1 test is used

20
Evaluation Criteria
  • For N PClauses, if the number of PClauses whose
    topic clauses are correctly identified is hitN,
    then the identification accuracy rate is hitN/N.

21
Experiment Result
  • Fig. 2. PClause Count and Accuracy Rate for Topic
    Clause Identification on the 15 test texts

22
Analysis: There may be nodes with the same CTC string
23
Analysis: The relation between the accuracy rate and the PClause position
24
Analysis: The relation between the accuracy rate and the PClause depth
25
Future Work
  • CTCs Scoring Function
  • CTC Tree
  • Extend to other texts

26
Thank you! Any suggestions?