Building Hierarchical Classifiers Using Class Proximity

1
Building Hierarchical Classifiers Using Class Proximity
  • Ke Wang
  • Senqiang Zhou
  • Shiang Chen Liew
  • National University of Singapore

2
Hierarchical classification
  • Given
      • a class hierarchy
      • a collection of pre-classified documents
      • a document is a set of terms
  • Build
      • a classifier that assigns a relevant class to a new document
  • Key
      • extract features of classes

3
Yahoo classes
  • Yahoo
      • recreation
          • sports
              • cycling
              • skating
          • automotive
      • science

4
ACM classes
  • Hardware (level 1)
      • General (level 2)
      • Memory_structure (level 2)
          • Design_style (level 3)
              • Cache_memories (level 4)
          • General (level 3)

5
Existing local approaches
  • build one classifier at each split of the class
    hierarchy
  • determine features locally at each node
  • classify a document by going through a path of
    classifiers starting from the root
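
A minimal sketch of the local, top-down scheme above, assuming a toy hierarchy based on the Yahoo example and a hypothetical keyword-counting classifier at each split (`classify_top_down`, `keyword_classifier` and the keyword sets are illustrative, not the authors' code). Each node's classifier only chooses among that node's children, so a wrong choice near the root cannot be corrected lower down.

```python
from typing import Callable, Dict, List, Set

# Hypothetical class hierarchy: each class maps to its child classes.
HIERARCHY: Dict[str, List[str]] = {
    "Yahoo": ["recreation", "science"],
    "recreation": ["sports", "automotive"],
    "sports": ["cycling", "skating"],
}

# A local classifier scores each child of one node for a document (a set of terms).
LocalClassifier = Callable[[Set[str]], Dict[str, float]]


def classify_top_down(doc: Set[str], root: str,
                      classifiers: Dict[str, LocalClassifier]) -> List[str]:
    """Follow one path from the root, letting each node's classifier pick a child.

    Returns the classes visited; the last one is the assigned class.
    """
    path = [root]
    node = root
    while HIERARCHY.get(node):  # stop at a leaf (no children)
        scores = classifiers[node](doc)
        node = max(HIERARCHY[node], key=lambda child: scores.get(child, 0.0))
        path.append(node)
    return path


def keyword_classifier(keywords: Dict[str, Set[str]]) -> LocalClassifier:
    """Toy local classifier: score = number of a child's keywords in the document."""
    return lambda doc: {child: float(len(doc & kws)) for child, kws in keywords.items()}


classifiers = {
    "Yahoo": keyword_classifier({"recreation": {"bike", "car", "team"},
                                 "science": {"theory", "experiment"}}),
    "recreation": keyword_classifier({"sports": {"bike", "team"},
                                      "automotive": {"car", "engine"}}),
    "sports": keyword_classifier({"cycling": {"bike"}, "skating": {"ice"}}),
}

print(classify_top_down({"mountain", "bike", "trail"}, "Yahoo", classifiers))
# ['Yahoo', 'recreation', 'sports', 'cycling']
```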

6
Diminishing high-level structure
  • local approaches rely on classification at high levels
  • but high-level structures are usually weak, i.e., topics diverge
  • e.g., "car" is a feature of Recreation/Automotive, but not of Recreation

7
Bias of misclassification
  • sibling classes vs. nephew classes
  • misclassification at high levels vs. at low levels
  • specialisation vs. generalisation
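
One simple way to make sibling vs. nephew errors measurable is the number of edges between two classes through their lowest common ancestor. The sketch below assumes that measure purely for illustration (the slides do not say class proximity is defined this way) and uses the Yahoo classes as a hypothetical parent map. Under it, confusing cycling with its sibling skating costs 2 edges, while confusing it with automotive, its parent's sibling, costs 3.

```python
from typing import Dict, List, Optional

# Hypothetical class hierarchy as child -> parent links (the root has parent None).
PARENT: Dict[str, Optional[str]] = {
    "Yahoo": None,
    "recreation": "Yahoo",
    "science": "Yahoo",
    "sports": "recreation",
    "automotive": "recreation",
    "cycling": "sports",
    "skating": "sports",
}


def path_to_root(cls: str) -> List[str]:
    """Return [cls, parent, grandparent, ..., root]."""
    path = [cls]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of edges between two classes via their lowest common ancestor."""
    depth_in_a = {c: i for i, c in enumerate(path_to_root(a))}
    for j, c in enumerate(path_to_root(b)):
        if c in depth_in_a:  # first shared node on b's upward path is the LCA
            return depth_in_a[c] + j
    raise ValueError("classes are not in the same hierarchy")


print(tree_distance("cycling", "skating"))     # 2 (siblings)
print(tree_distance("cycling", "automotive"))  # 3 (nephew)
```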

8
Features should be
  • determined with respect to the target class
  • determined at all concept levels
  • correlated
  • The solution: generalised association rules (SA95, HF95)
      • sql, IO → DB
      • language, performance → CS
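
Generalised association rules (SA95) are commonly mined by extending each transaction with the ancestors of its items. The sketch below applies that idea to documents, assuming a hypothetical term hierarchy and toy training set modelled on the Arts example later in the talk; it computes plain, unbiased confidence, not the biased measures named on the slides. No toy document contains "author" literally, yet the rule author → Literature gains support through "writer" and "editor".

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical term hierarchy (term -> parent term); terms without an entry are roots.
TERM_PARENT: Dict[str, str] = {
    "writer": "author",
    "editor": "author",
    "poem": "story",
    "fiction": "story",
}


def generalise(terms: Set[str]) -> Set[str]:
    """Return the terms plus all their ancestors in the term hierarchy."""
    extended = set(terms)
    for term in terms:
        while term in TERM_PARENT:
            term = TERM_PARENT[term]
            extended.add(term)
    return extended


def support_confidence(antecedent: FrozenSet[str], target: str,
                       docs: List[Tuple[Set[str], str]]) -> Tuple[int, float]:
    """Support count and plain confidence of the rule antecedent -> target."""
    covered = [cls for terms, cls in docs if antecedent <= generalise(terms)]
    support = sum(1 for cls in covered if cls == target)
    confidence = support / len(covered) if covered else 0.0
    return support, confidence


# Hypothetical pre-classified documents: (set of terms, class).
DOCS = [
    ({"writer", "poem"}, "Literature"),
    ({"editor", "fiction"}, "Literature"),
    ({"writer", "hall"}, "Music"),
    ({"fiction", "film"}, "Music"),
]

print(support_confidence(frozenset({"author", "story"}), "Literature", DOCS))  # (2, 1.0)
print(support_confidence(frozenset({"author"}), "Literature", DOCS))  # support 2, confidence 2/3
```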

9
Our approach
  • class proximity
  • global classifier
  • term hierarchy
  • use the best generalised association rule T → C to determine the class
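
A sketch of the global decision step, assuming the rules have already been mined and scored: the document is extended with its term ancestors, and the highest-scoring rule whose antecedent it contains decides the class. The scores reuse the ConfB values listed for the Arts example; the `Rule` type, the tie-break on antecedent length, the `default` fallback and the term hierarchy are assumptions for illustration.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional, Set

# Hypothetical term hierarchy (term -> parent term), as in the Arts example.
TERM_PARENT: Dict[str, str] = {"writer": "author", "editor": "author",
                               "poem": "story", "fiction": "story"}


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    target: str
    score: float  # ranking value, e.g. a (biased) confidence


def generalise(terms: Set[str]) -> Set[str]:
    """Return the terms plus all their ancestors in the term hierarchy."""
    extended = set(terms)
    for term in terms:
        while term in TERM_PARENT:
            term = TERM_PARENT[term]
            extended.add(term)
    return extended


def classify(doc: Set[str], rules: List[Rule],
             default: Optional[str] = None) -> Optional[str]:
    """Class of the best matching rule; ties go to the longer antecedent."""
    extended = generalise(doc)
    matching = [r for r in rules if r.antecedent <= extended]
    if not matching:
        return default
    best = max(matching, key=lambda r: (r.score, len(r.antecedent)))
    return best.target


# Rules and ConfB scores from the Arts example.
RULES = [
    Rule(frozenset({"author", "story"}), "Literature", 1.0),
    Rule(frozenset({"author"}), "Literature", 1.0),
    Rule(frozenset({"story"}), "Literature", 0.67),
    Rule(frozenset({"hall"}), "Music", 0.4),
    Rule(frozenset({"States"}), "A_Literature", 0.33),
]

print(classify({"poem", "hall"}, RULES))     # Literature
print(classify({"hall", "concert"}, RULES))  # Music
```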

10
Rank association rules
  • Biased confidence
  • Biased J-measure
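
The slides do not spell out the biased variants, so the sketch below shows only the standard J-measure of Smyth and Goodman, which the biased version presumably builds on. For a rule X → Y it is J = p(x) [ p(y|x) log2(p(y|x) / p(y)) + (1 - p(y|x)) log2((1 - p(y|x)) / (1 - p(y))) ], rewarding rules that both cover many documents and move the class distribution far from its prior. Any bias toward class proximity would have to be added on top; `j_measure` is a hypothetical helper name.

```python
import math


def _term(p: float, q: float) -> float:
    """p * log2(p / q), using the convention 0 * log(0 / q) = 0."""
    return 0.0 if p == 0 else p * math.log2(p / q)


def j_measure(p_x: float, p_y: float, p_y_given_x: float) -> float:
    """Standard (unbiased) J-measure of the rule X -> Y, in bits.

    p_x: fraction of documents matching the antecedent X
    p_y: prior fraction of documents in class Y (strictly between 0 and 1)
    p_y_given_x: plain confidence of the rule
    """
    return p_x * (_term(p_y_given_x, p_y) + _term(1 - p_y_given_x, 1 - p_y))


print(j_measure(0.25, 0.5, 1.0))  # 0.25
print(j_measure(0.5, 0.5, 0.5))   # 0.0 (confidence equals the prior)
```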

11
An example
  • Class hierarchy: Arts, with children Music and Literature (...), and lower-level classes A_Music and A_Literature
  • Term hierarchy: author (editor, writer); story (poem, fiction)

12
Term hierarchy (T): Yes; Class proximity (B): Yes
  • R0: author, story → Literature (ConfB = 1, Clist = {d6, d7})
  • R1: author → Literature (ConfB = 1)
  • R2: story → Literature (ConfB = 0.67, Wlist = {d5(1)})
  • R4: hall → Music (ConfB = 0.4, Clist = {d1, d2}, Wlist = {d3(1)})
  • R3: States → A_Literature (ConfB = 0.33, Clist = {d4, d5})

13
(No Transcript)
14
Experiment I
  • http://www.acm.org/dl/toc.html/
  • 26,515 papers, 78 classes, 14,754 terms
  • class hierarchy: level-1 and level-2 categories
  • term hierarchy: level-3 and level-4 categories
  • document: title and level-4 categories

15
Best rules found by (B,T)
  • CSO
      • vector, stream, processor, parallel → Processor_Architectures
      • multiple_instruction_stream → Processor_Architectures
      • data_flow, architectur → Processor_Architectures
      • internet, architectur → Computer_Communication_Networks
      • mode, atm → Computer_Communication_Networks
      • network, circuit_switching → Computer_Communication_Networks
      • tecniqu, model, attribut → Performance_of_Systems
  • Software
      • program, function, application → Programming_Techniques
      • object_oriented_programming → Programming_Techniques
      • reusable_software → Software_Engineering
      • software, methodologie → Software_Engineering
      • organization, distributed_system → Operating_Systems

16
[Chart: results for the configurations (), (T), (B), (B,T), (CDAR97,T) and (CDAR97)]

17
(No Transcript)
18
Experiment II
  • http://dir.yahoo.com/recreation/sports
  • 7,550 documents
  • 367 classes, 7 levels
  • 10,747 terms
  • 90% of the terms occur in no more than 10 documents, and many documents contain only such terms
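
A statistic like the one above can be computed from document frequencies. The sketch assumes each document is already a set of (stemmed) terms; `rare_term_stats` is a hypothetical helper, and the threshold of 10 is taken from the slide.

```python
from collections import Counter
from typing import List, Set, Tuple


def rare_term_stats(docs: List[Set[str]], max_df: int = 10) -> Tuple[float, float]:
    """Return (share of terms occurring in at most max_df documents,
    share of documents that are non-empty and contain only such terms)."""
    df = Counter(term for doc in docs for term in doc)  # document frequency per term
    rare = {term for term, n in df.items() if n <= max_df}
    term_share = len(rare) / len(df) if df else 0.0
    only_rare = sum(1 for doc in docs if doc and doc <= rare)
    doc_share = only_rare / len(docs) if docs else 0.0
    return term_share, doc_share


toy = [{"bike", "trail"}, {"bike"}, {"oval"}]
print(rare_term_stats(toy, max_df=1))
```

Because each document is a set, a term is counted at most once per document, so `df` is a document frequency rather than a raw occurrence count.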

19
Best rules found by (B,T)
  • Sports/Cycling
      • page, mountain → Mountain_Biking
      • product, bike → Mountain_Biking
      • mtb, mountain → Mountain_Biking
      • held, bicycl → Races
      • classic, bicycl → Races
      • trip, tour → Travelogues
      • trip, canada → Travelogues
      • bicycl, alaska → Travelogues
  • Sports/Auto_Racing
      • team, result, driver → Formula_one
      • model, featur → Tracks_and_Speedways
      • oval → Tracks_and_Speedways
      • raceway → Tracks_and_Speedways

20
(No Transcript)
21
(No Transcript)