Name Disambiguation in Digital Libraries

1
Name Disambiguation in Digital Libraries
  • Tan Yee Fan
  • 2005 October 19
  • WING Group Meeting

2
Digital libraries
  • DBLP, Citeseer, etc.
  • Information is stored as metadata records to facilitate searching
    • Author names
    • Titles
    • Publication titles
  • Inconsistency in metadata records hinders searching
    • Abbreviation of names and publication titles
    • Typographical errors
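Abbreviated names are one source of the inconsistency above. As an illustration only, here is a minimal Python sketch of a hypothetical compatibility test: two name strings are treated as possible variants when their surnames match and each aligned given-name token is either equal or an initial of its counterpart. The helper names (`split_name`, `tokens_compatible`, `names_compatible`) and the rule that the surname is the last whitespace-separated token are assumptions, not part of the presentation.

```python
import re


def split_name(name: str) -> tuple[list[str], str]:
    """Return (given-name tokens, surname), all lowercased.

    Assumption: the surname is the last whitespace-separated token.
    Dots and hyphens split given names, so "M.-Y." becomes ["m", "y"].
    """
    tokens = name.split()
    surname = tokens[-1].lower()
    given = re.findall(r"[^\W\d_]+", " ".join(tokens[:-1]).lower())
    return given, surname


def tokens_compatible(a: str, b: str) -> bool:
    """An initial matches any token with the same first letter; full tokens must be equal."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def names_compatible(x: str, y: str) -> bool:
    """True if x and y could be variants of one name (initials or dropped middle names)."""
    gx, sx = split_name(x)
    gy, sy = split_name(y)
    if sx != sy:
        return False
    # zip stops at the shorter list, so omitted trailing given names are tolerated.
    return all(tokens_compatible(a, b) for a, b in zip(gx, gy))


if __name__ == "__main__":
    print(names_compatible("M. Kan", "Min-Yen Kan"))                       # True
    print(names_compatible("M.-Y. Kan", "Min-Yen Kan"))                    # True
    print(names_compatible("Danny C. C. Poo", "Danny Chiang Choon Poo"))   # True
    print(names_compatible("Danny Poo", "Hui Yang"))                       # False
```

Compatibility is only a filter: the two "Hui Yang" entries on slide 3 are identical strings, so a string test like this one cannot say whether they belong to one author or two.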

3
Are they the same author?
  • Danny Poo
    • Danny C. C. Poo, Teck-Kang Toh, Christopher S. G. Khoo, Glenn Hong. Development of an Intelligent Web Interface to Online Library Catalog Databases. APSEC 1999: 64-7
    • Danny Chiang Choon Poo, Isaac K. C. Tan. Design of an Automatic Annotation Framework for Corporate Web Content. APSEC 2004: 384-391
  • Hui Yang
    • Maan A. Kousa, Ahmed K. Elhakeem, Hui Yang. Performance of ATM networks under hybrid ARQ/FEC error control scheme. IEEE/ACM Trans. Netw. 7(6): 917-925 (1999)
    • Hui Yang, Tat-Seng Chua. QUALIFIER: Question Answering by Lexical Fabric and External Resources. EACL 2003: 363-370

4
Who am I, I am who?
  • Author name disambiguation
    • Given a large number of citations, how do we determine which author each name refers to?
  • Closely related problem: citation matching
    • Given a large number of citations, how do we determine which citations refer to the same paper?
  • Solutions must be scalable
    • DBLP has more than 660,000 citations
    • Citeseer has more than 730,000 documents
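For a sense of scale, a naive approach that compares every pair of DBLP's 660,000 citations must examine this many pairs:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad n = 660{,}000
\;\Rightarrow\; \frac{660{,}000 \times 659{,}999}{2} = 217{,}799{,}670{,}000 \approx 2.2 \times 10^{11}
```

At an assumed (hypothetical) rate of one million comparisons per second, that is about 2.2 × 10^5 seconds, or roughly two and a half days, and the count grows quadratically as the library grows.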

5
Ideas
  • Idea 1: determine the research field
    • Unfortunately, paper titles contain few words, and some conferences are broad in scope
  • Idea 2: use coauthor information
    • An author is likely to collaborate with a select group of people
    • This group is likely to publish a number of papers together
    • Approach: measure the similarity of coauthor lists
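The slide does not say how coauthor lists should be compared. A minimal sketch, assuming Jaccard similarity (shared coauthors divided by all distinct coauthors) over crudely normalised name strings; the function names are hypothetical and the choice of measure is an assumption:

```python
def normalise(name: str) -> str:
    """Lowercase and collapse whitespace (a deliberately crude normalisation)."""
    return " ".join(name.lower().split())


def coauthor_similarity(coauthors_a: list[str], coauthors_b: list[str]) -> float:
    """Jaccard similarity |A & B| / |A | B| of two coauthor lists.

    Both lists should exclude the name being disambiguated, since it
    appears in every citation under comparison.
    """
    a = {normalise(n) for n in coauthors_a}
    b = {normalise(n) for n in coauthors_b}
    if not a and not b:
        return 0.0  # single-author papers on both sides: no coauthor evidence
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Coauthors of the two "Danny Poo" citations on slide 3.
    poo_1999 = ["Teck-Kang Toh", "Christopher S. G. Khoo", "Glenn Hong"]
    poo_2004 = ["Isaac K. C. Tan"]
    print(coauthor_similarity(poo_1999, poo_2004))  # 0.0

    # Hypothetical lists with partial overlap: 2 shared out of 4 distinct names.
    print(coauthor_similarity(["A. Lee", "B. Ng", "C. Ong"], ["B. Ng", "C. Ong", "D. Lim"]))  # 0.5
```

Both pairs of citations on slide 3 share no coauthors, so this measure gives no evidence either way for them. Exact string matching also misses abbreviated variants such as "C. Khoo" versus "Christopher S. G. Khoo", which a fuller measure would need to handle.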

6
Forward direction: M. Kan = M.-Y. Kan = Min-Yen Kan
  • Problem
    • Pairwise comparison of all the coauthor lists is very expensive (it does not finish even after a few days)
  • Solution
    • Soft clustering of the coauthor lists using a cheap distance measure
    • Then pairwise comparison within each cluster only
  • What is a good soft clustering algorithm?
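The slide leaves the clustering algorithm open. One well-known candidate, named here as a swapped-in example rather than the presenter's choice, is canopy clustering (McCallum, Nigam and Ungar, 2000): a cheap distance with a loose and a tight threshold produces overlapping clusters, and only records sharing a canopy get the expensive comparison. The sketch assumes a cheap key made of coauthor surnames, thresholds of 0.9 and 0.5, and hypothetical function names.

```python
from itertools import combinations


def surname_set(coauthors: list[str]) -> frozenset[str]:
    """Cheap key: lowercased last token of each coauthor name (an assumed simplification)."""
    return frozenset(name.split()[-1].lower() for name in coauthors if name.split())


def cheap_distance(a: frozenset[str], b: frozenset[str]) -> float:
    """1 - Jaccard similarity of two surname sets; 1.0 if both are empty."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 1.0


def canopies(records: list[list[str]], loose: float = 0.9, tight: float = 0.5) -> list[list[int]]:
    """Canopy clustering over coauthor lists; returns overlapping groups of record indices.

    Each centre's canopy holds every record within `loose` of it. Records within
    `tight` are also dropped from the pool of future centres, so a record between
    the two thresholds can appear in more than one canopy (soft clustering).
    """
    if not tight < loose:
        raise ValueError("tight threshold must be smaller than loose threshold")
    keys = [surname_set(r) for r in records]
    pool = list(range(len(records)))
    result = []
    while pool:
        # The published method picks centres at random; input order keeps this sketch deterministic.
        centre = pool[0]
        dist = [cheap_distance(keys[centre], k) for k in keys]
        result.append([i for i, d in enumerate(dist) if i == centre or d < loose])
        pool = [i for i in pool if i != centre and dist[i] >= tight]
    return result


def candidate_pairs(groups: list[list[int]]) -> set[tuple[int, int]]:
    """Pairs that still need the expensive comparison: only those sharing a canopy."""
    pairs: set[tuple[int, int]] = set()
    for group in groups:
        pairs.update(combinations(sorted(group), 2))
    return pairs


if __name__ == "__main__":
    # Hypothetical coauthor lists, one per citation.
    records = [
        ["A. Lee", "B. Ng"],
        ["Alan Lee", "B. Ng", "C. Ong"],
        ["C. Ong", "D. Lim"],
        ["E. Tan", "F. Goh"],
    ]
    groups = canopies(records)
    print(groups)                        # [[0, 1], [1, 2], [3]]
    print(len(candidate_pairs(groups)))  # 2 (all-pairs would need 6)
```

Record 1 lands in two canopies, which is what makes the clustering soft: a record near a boundary is still compared with neighbours on both sides, at the cost of some duplicated work.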

7
Backward direction: this Hang Cui is not that Hang Cui
  • Difficult to determine from the metadata alone, without external resources
  • Many authors have several distinct research areas
    • Each research area has a different set of collaborators
  • Currently investigating what kind of external resource to use
    • Google for URLs?
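The point about distinct research areas can be made concrete. The sketch below (hypothetical names and data) groups the citations of one ambiguous name with union-find, merging any two citations that share a coauthor. If one person works with separate groups of collaborators in different areas, this metadata-only grouping splits that person into several clusters, which illustrates why metadata alone may not settle the question.

```python
def group_by_shared_coauthors(citations: list[set[str]]) -> list[list[int]]:
    """Union-find: citations sharing at least one coauthor end up in the same group."""
    parent = list(range(len(citations)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    first_seen: dict[str, int] = {}  # coauthor name -> first citation index containing it
    for idx, coauthors in enumerate(citations):
        for name in coauthors:
            if name in first_seen:
                parent[find(idx)] = find(first_seen[name])
            else:
                first_seen[name] = idx

    groups: dict[int, list[int]] = {}
    for idx in range(len(citations)):
        groups.setdefault(find(idx), []).append(idx)
    return list(groups.values())


if __name__ == "__main__":
    # Coauthor sets for four citations of one (hypothetical) ambiguous name.
    citations = [{"A. Lee", "B. Ng"}, {"B. Ng"}, {"C. Ong"}, {"C. Ong", "D. Lim"}]
    print(group_by_shared_coauthors(citations))  # [[0, 1], [2, 3]]
```

The two groups could be two different people or one person with two research areas; the coauthor metadata cannot distinguish these cases, so an external signal is needed to decide.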

8
The end
  • But the research has just begun