Extracting Academic Affiliations - PowerPoint PPT Presentation

Slides: 12
Provided by: tri5334
Learn more at: http://www.cs.cmu.edu


1
Extracting Academic Affiliations
  • Alicia Tribble
  • Einat Minkov
  • Andy Schlaikjer
  • Laura Kieras

2
The Problem
  • Determine academic institutions with which a
    professor is or has been affiliated
      • Where degrees were earned
      • Previous affiliations, including post-doc
      • Current affiliation
  • Why would this be useful?
      • Studying social networks in academia
      • Person entity disambiguation

3
Knowledge We Will Learn
  • Example text rules to be learned
      • If string "<person> received his <degree> in
        <department> from <institution>", then
        Affiliated(<person>, <institution>)
      • If string "<degree>, <department>,
        <institution>" on <person>'s home page,
        then Affiliated(<person>, <institution>)
  • Class of beliefs to be learned
      • Affiliated(<person>, <institution>)

4
Sources of redundant information
  • URL of the professor's personal home page (e.g.,
    www.cmu.edu/xxx)
  • Text found on multiple web pages, especially in
    the resume, CV, or biography section of personal
    home pages
  • Incoming and outgoing links of personal home
    pages

5
Additional information
  • Dictionary of institution names
  • Dictionary of degrees
      • E.g., Ph.D., B.S., B. Tech., etc.
  • Map of domain names to institution names
      • E.g., cmu.edu -> Carnegie Mellon University
      • This could be learned, but we will leave that
        for another group!
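
The domain map can be sketched as a small lookup keyed on the host of a home-page URL. This is a minimal sketch, not the presenters' implementation: the function name institution_from_url and the suffix-matching strategy are assumptions, and only the cmu.edu entry comes from the slide (the other two entries are added for illustration).

```python
from urllib.parse import urlparse

# Hand-built map of domain names to institution names.
# Only "cmu.edu" appears on the slide; the other entries are illustrative.
DOMAIN_TO_INSTITUTION = {
    "cmu.edu": "Carnegie Mellon University",
    "stanford.edu": "Stanford University",
    "duke.edu": "Duke University",
}


def institution_from_url(url):
    """Return the institution whose domain the URL's host falls under, or None."""
    # urlparse only fills in the host when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Try progressively shorter suffixes, e.g. www.cs.cmu.edu, cs.cmu.edu, cmu.edu.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in DOMAIN_TO_INSTITUTION:
            return DOMAIN_TO_INSTITUTION[candidate]
    return None


print(institution_from_url("www.cmu.edu/xxx"))
```

Matching on host suffixes lets departmental hosts such as www.cs.cmu.edu resolve to the same institution as the bare domain.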

6
Bootstrapping Logistics
  • Start with a few seed rules and seed facts
  • Use these rules to learn more facts, then use
    those facts to learn more rules, and so on
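
The alternation between facts and rules can be sketched as a loop that stops once a round adds nothing new. Everything below is a toy under stated assumptions: the names bootstrap, extract_facts, and induce_rules are hypothetical, rules are reduced to the literal text between a person and an institution, and the corpus and names are invented for illustration.

```python
def bootstrap(seed_facts, seed_rules, corpus, extract_facts, induce_rules, max_rounds=10):
    """Alternate rule application and rule induction until nothing new is found."""
    facts, rules = set(seed_facts), set(seed_rules)
    for _ in range(max_rounds):
        new_facts = extract_facts(rules, corpus) - facts
        facts |= new_facts
        new_rules = induce_rules(facts, corpus) - rules
        rules |= new_rules
        if not new_facts and not new_rules:
            break  # fixed point: another round would change nothing
    return facts, rules


def extract_facts(rules, corpus):
    """Toy rule application: a rule is the literal text between person and institution."""
    found = set()
    for sentence in corpus:
        for infix in rules:
            left, sep, right = sentence.partition(infix)
            if sep and left and right:
                found.add((left, right))
    return found


def induce_rules(facts, corpus):
    """Toy rule induction: keep the text linking a known person to a known institution."""
    learned = set()
    for person, institution in facts:
        for sentence in corpus:
            if sentence.startswith(person) and sentence.endswith(institution):
                infix = sentence[len(person):len(sentence) - len(institution)]
                if infix.strip():
                    learned.add(infix)
    return learned


# Invented corpus; none of these names come from the slides.
corpus = [
    "Ann Lee studied at Alpha University",
    "Ann Lee was a post-doc at Beta College",
    "Bob Ray was a post-doc at Alpha University",
    "Bob Ray studied at Gamma Institute",
]
facts, rules = bootstrap(
    seed_facts={("Ann Lee", "Beta College")},
    seed_rules={" studied at "},
    corpus=corpus,
    extract_facts=extract_facts,
    induce_rules=induce_rules,
)
print(sorted(facts))
print(sorted(rules))
```

In this toy run the seed fact teaches the rule " was a post-doc at ", which in turn yields a fact about Bob Ray that the seed rule alone could not find.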

7
Our seed facts
  • Affiliated(<Tom M. Mitchell>, <Stanford
    University>)
  • Affiliated(<Tom Mitchell>, <Carnegie Mellon
    University>)
  • Affiliated(<William Cohen>, <Duke University>)

8
Our seed rules
  • If the URL of a personal web page is in the
    academic URL dictionary, then believe
    Affiliated(<person>, <institution>)
  • If looking at a resume or personal web page and
    any of the patterns below are found, then believe
    Affiliated(<person>, <institution>)
      • "<degree>.<department> <institution>."
      • "<degree>.<institution> <department>"
      • "<position>,<department> <institution>"
      • "<person> received <pronoun> <degree> from
        <institution>"

9
Algorithm walk-through
  • Start with known belief Affiliated(William Cohen,
    Duke University)
  • Extract sentences from William Cohen's web page
    that contain "William Cohen" and "Duke"
  • Found pattern "William Cohen received his
    bachelor's degree in Computer Science from Duke
    University in 1984"
  • Learned new pattern "received <pronoun> <degree>
    from <institution>"
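
One way to picture the generalization step is to replace the known person and institution, plus any dictionary matches, with slot names. This is a hedged sketch rather than the presenters' method: the function generalize, the pronoun list, and the department list are assumptions. It also keeps the department slot, so the result resembles the example rule on slide 3 more closely than the shorter pattern listed in this walk-through.

```python
import re

# Small illustrative dictionaries; the pronoun and department lists are assumptions.
PRONOUNS = ["his", "her", "their"]
DEGREES = ["bachelor's degree", "Ph.D.", "B.S.", "B. Tech."]
DEPARTMENTS = ["Computer Science"]


def generalize(sentence, person, institution):
    """Turn a sentence that supports a known fact into a slot template."""
    template = sentence.replace(person, "<person>").replace(institution, "<institution>")
    for slot, entries in (("<degree>", DEGREES), ("<department>", DEPARTMENTS)):
        for entry in entries:
            template = template.replace(entry, slot)
    # Replace pronouns only when they appear as whole words.
    template = re.sub(r"\b(?:%s)\b" % "|".join(PRONOUNS), "<pronoun>", template)
    p = template.find("<person>")
    i = template.find("<institution>")
    if p == -1 or i == -1:
        return None  # the sentence does not mention both sides of the fact
    # Keep only the span running from one slot of the fact to the other.
    start = min(p, i)
    end = max(p + len("<person>"), i + len("<institution>"))
    return template[start:end]


SENTENCE = ("William Cohen received his bachelor's degree in Computer Science "
            "from Duke University in 1984")
learned = generalize(SENTENCE, "William Cohen", "Duke University")
print(learned)
```

Trimming to the span between the two slots drops trailing detail such as the year, which would otherwise make the template too specific to reuse.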

10
Walk-through continued
  • Search for new web pages matching our pattern
    "received his degree from"
  • Found example "Adnan Darwiche is an Associate
    Professor of Computer Science at UCLA, having
    received his PhD and MS degrees in Computer
    Science from Stanford University"
  • Extracted belief Affiliated(Adnan Darwiche,
    Stanford University)
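
Applying the learned pattern to a new sentence can be sketched with a regular expression. Several choices here are assumptions, not taken from the slides: the degree slot is relaxed to "any short run of text", the institution must appear in a small dictionary, and the person is taken to be the capitalized name that opens the sentence.

```python
import re

# Illustrative dictionaries; the pronoun list is an assumption.
INSTITUTIONS = ["Stanford University", "Carnegie Mellon University", "Duke University"]
PRONOUNS = ["his", "her", "their"]

# "received <pronoun> <degree> from <institution>", with the degree slot
# relaxed to at most 80 characters of arbitrary text.
PATTERN = re.compile(
    r"received\s+(?:%s)\s+.{1,80}?\s+from\s+(?P<institution>%s)"
    % ("|".join(PRONOUNS), "|".join(re.escape(name) for name in INSTITUTIONS))
)

# Assumption: the person is the capitalized name at the start of the sentence,
# made of capitalized words and optional initials such as "M.".
LEADING_NAME = re.compile(r"^(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+| [A-Z]\.)+)")


def extract_affiliation(sentence):
    """Return (person, institution) if the sentence matches the pattern, else None."""
    name = LEADING_NAME.match(sentence)
    hit = PATTERN.search(sentence)
    if name and hit:
        return (name.group("person"), hit.group("institution"))
    return None


SENTENCE = ("Adnan Darwiche is an Associate Professor of Computer Science at UCLA, "
            "having received his PhD and MS degrees in Computer Science "
            "from Stanford University")
print(extract_affiliation(SENTENCE))
```

This pattern only picks up the degree-granting institution; the UCLA affiliation in the same sentence would need a separate rule, such as the position pattern among the seed rules.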
