Extracting Academic Affiliations: Status Report

Provided by: Tria407
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes


1
Extracting Academic Affiliations: Status Report
  • Alicia Tribble
  • Einat Minkov
  • Andy Schlaikjer
  • Laura Kieras

2
The Problem
  • Identify people who are affiliated with an
    academic institution
  • Degrees earned
  • Positions held (student, post-doc, faculty)
  • Current position
  • Class of beliefs to be learned
  • affiliated(<person>, <degree>, <institution>)

3
The System / Algorithm
[Diagram. Components: Query Generator, Search Engine Interface. Data: query relation, query pattern, HTML files, patterns, relations (facts). Steps: extract patterns, extract relations, assess patterns, assess relations.]

4
Algorithm Details
  • Pattern query formulation
  • Replace each <arg> in the pattern string with the '*' operator
  • Remove leading and trailing '*'s
  • Wrap the query string in quotes
  • Example
  • "<PERSON> received his <DEGREE> from
    <UNIVERSITY>"
  • -becomes-
  • '"received his * from"'
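
The three steps above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `pattern_to_query` is hypothetical, placeholders are assumed to be written as `<PERSON>`, `<DEGREE>`, `<UNIVERSITY>`, and the wildcard operator is assumed to be `*`.

```python
import re

def pattern_to_query(pattern: str, wildcard: str = "*") -> str:
    """Turn an extraction pattern into a quoted phrase query (hypothetical helper)."""
    # Step 1: replace every <ARG> placeholder with the wildcard operator.
    query = re.sub(r"<[A-Za-z]+>", wildcard, pattern)
    # Step 2: drop wildcards at the start and end of the query.
    tokens = query.split()
    while tokens and tokens[0] == wildcard:
        tokens.pop(0)
    while tokens and tokens[-1] == wildcard:
        tokens.pop()
    # Step 3: wrap the remaining phrase in double quotes.
    return '"' + " ".join(tokens) + '"'

print(pattern_to_query("<PERSON> received his <DEGREE> from <UNIVERSITY>"))
```

Only the interior wildcard survives, so the search engine is asked for the literal phrase with a single open slot in the middle.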

5
Algorithm Details
  • Relation Extraction (Slot filling)
  • Find the relevant sentence/s on a page
  • Alignment slot filling
  • Some cleanup: "he", capitalization
  • Examples
  • Robertson, Ph.D. in ecology and evolutionary
    biology, Indiana University
  • Jeff, B.S., Bucknell University
  • Rex Jung, degree, University of New Mexico
  • Alavosius, BA in psychology, Clark University
  • Jacobs, B.E.E. degree, Cornell University
  • He, Associates Degree in Livestock Production,
    Northeast Community College
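
One way to picture the alignment step is to compile a pattern into a regular expression with one named group per slot and read the slot values off a matching sentence. The sketch below assumes exactly that; `pattern_to_regex` and `fill_slots` are hypothetical names, and the real system's sentence finding and cleanup are not reproduced.

```python
import re

def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile e.g. '<PERSON> received his <DEGREE> from <UNIVERSITY>' (hypothetical helper)."""
    # re.split with one capture group alternates literal text and slot names.
    parts = re.split(r"<([A-Z]+)>", pattern)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))       # literal text between slots
        else:
            pieces.append(f"(?P<{part}>.+?)")    # lazily matched slot
    # Assumption: the last slot ends at punctuation or at the end of the sentence.
    return re.compile("".join(pieces) + r"(?=[.,;]|$)")

def fill_slots(pattern: str, sentence: str):
    """Return (person, degree, university), or None if the sentence does not align."""
    match = pattern_to_regex(pattern).search(sentence)
    if match is None:
        return None
    return tuple(match.group(name).strip() for name in ("PERSON", "DEGREE", "UNIVERSITY"))

print(fill_slots("<PERSON> received his <DEGREE> from <UNIVERSITY>",
                 "Jacobs received his B.E.E. degree from Cornell University."))
```

Because the person slot is matched lazily from the first position the search tries, it will absorb any text that precedes the name in the sentence, which is one reason a cleanup pass is needed afterwards.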

6
Algorithm Details
  • Relation query formulation
  • All argument values become query terms
  • Example
  • (William Cohen, Ph.D., Rutgers)
  • -becomes-
  • 'William Cohen Ph.D. Rutgers'
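
As a minimal sketch, the relation query is just the argument values joined by spaces. The function name `relation_to_query` and the optional `quote` flag are assumptions added for illustration; the quoted form corresponds to the query style shown later in the interim conclusions.

```python
def relation_to_query(relation, quote=False):
    """Join all argument values of a relation into one search query (hypothetical helper)."""
    # With quote=True each argument becomes an exact-phrase term.
    terms = [f'"{arg}"' if quote else arg for arg in relation]
    return " ".join(terms)

print(relation_to_query(("William Cohen", "Ph.D.", "Rutgers")))
print(relation_to_query(("William Cohen", "Ph.D.", "Rutgers"), quote=True))
```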

7
Algorithm Details
  • Pattern Extraction
  • Build a regex from a relation, one per argument
  • (Mr\.|Mr|MR|M\.?r\.?|Dr\.?|Mrs\.?|MRS|Ms|MS) ?(Scott Fahlman|Scott|Fahlman)
  • ([a-zA-Z]*? [dD]egree|[Dd]octoral [Dd]egree|PhD|Ph\.D\.|Doctorate|PHD)
  • (MIT)
  • Apply regex to input and for every match, extract
    intermediate string and generalize
  • <PERSON> received her <DEGREE> from <UNIVERSITY>
  • <PERSON> received his <DEGREE> from <UNIVERSITY>
  • <PERSON> earned a <DEGREE> from <UNIVERSITY>
  • <PERSON>s, MD <UNIVERSITY>
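
The pattern extraction step can be sketched as follows. This is a simplification under stated assumptions: each argument is given as a list of surface variants, the arguments are assumed to appear in the order person, degree, university, and the intermediate strings are capped at a hypothetical `max_gap` characters. The names `arg_regex` and `extract_patterns` are not from the slides.

```python
import re

def arg_regex(variants):
    """Alternation over the known surface forms of one argument."""
    # Longest variants first, so "Scott Fahlman" is preferred over "Fahlman".
    ordered = sorted(variants, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(v) for v in ordered) + ")"

def extract_patterns(text, person, degree, university, max_gap=40):
    """Find the arguments in text and generalize the strings between them."""
    gap = r"(.{1,%d}?)" % max_gap  # short intermediate string, matched lazily
    regex = re.compile(arg_regex(person) + gap + arg_regex(degree) + gap + arg_regex(university))
    patterns = []
    for match in regex.finditer(text):
        # Generalize: replace each matched argument with its placeholder.
        patterns.append("<PERSON>" + match.group(1) + "<DEGREE>" + match.group(2) + "<UNIVERSITY>")
    return patterns

print(extract_patterns(
    "Scott Fahlman received his Ph.D. from MIT.",
    person=["Scott Fahlman", "Fahlman"],
    degree=["Ph.D.", "PhD", "Doctorate"],
    university=["MIT"],
))
```

A fuller version would also try other argument orders, which is how a pattern such as the "MD" example above could arise.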

8
Experimental Settings
  • Initial seeds
  • Relations
  • affiliated('William Cohen', 'Ph.D.', 'Duke
    University')
  • affiliated('Tom Mitchell', 'Ph.D.', 'Stanford')
  • affiliated('Scott Fahlman', 'Ph.D.', 'MIT')
  • Patterns
  • <PERSON> received his <DEGREE> from <UNIVERSITY>
  • <PERSON> earned his <DEGREE> from <UNIVERSITY>
  • <PERSON> earned a <DEGREE> from <UNIVERSITY>
  • Testing and development performed with 2
    bootstrap iterations, using only Google snippets

9
Results!
  • Initial
  • patterns: 3
  • relations: 3
  • Iteration 0
  • patterns: 6 (3)
  • relations: 13 (3)
  • Iteration 1
  • patterns: 14 (9)
  • relations: 0
  • Total
  • patterns: 23
  • relations: 16

10
Interim Conclusions
  • Issue I: over-specificity of query arguments
  • Q: "Oren Etzioni" "Ph.D" "CMU"
  • But what if the actual relevant mention includes
    A: "Oren Etzioni", "doctorate",
    "Carnegie Mellon University"?
  • Possible avenues
  • Larger dictionaries
  • Unquote query arguments? (allow for some
    variation)
  • Allow argument values to include random terms:
    "Oren * Etzioni"
  • This might incorporate more noise, and require
    additional queries to be issued per relation.
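
To make the "larger dictionaries" avenue and its cost concrete, the sketch below expands one relation into every combination of known aliases. The alias table, its contents, and the name `query_variants` are illustrative assumptions, not resources described in the slides.

```python
from itertools import product

# Hypothetical alias dictionary; a real one would be much larger.
ALIASES = {
    "Ph.D": ["Ph.D", "PhD", "doctorate"],
    "CMU": ["CMU", "Carnegie Mellon University"],
}

def query_variants(relation, aliases=ALIASES, quote=True):
    """Generate one query per combination of argument aliases."""
    # Arguments without dictionary entries keep their single original form.
    options = [aliases.get(arg, [arg]) for arg in relation]
    queries = []
    for combo in product(*options):
        terms = [f'"{t}"' if quote else t for t in combo]
        queries.append(" ".join(terms))
    return queries

for q in query_variants(("Oren Etzioni", "Ph.D", "CMU")):
    print(q)
```

With three degree aliases and two institution aliases, a single relation already produces six queries, which illustrates the extra query load noted above.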

11
Interim Conclusions
  • Issue II: name and pronoun resolution
  • Q: "Oren Etzioni" "Ph.D" "CMU"
  • But what if the actual relevant mention includes
    A: "He received his Ph.D from CMU
    in..."
  • Rate of occurrence of "S/he..." in extracted
    relations
  • 1 pattern, 50 queries: 56.8% (96/169)
  • Possible avenues
  • Identify homepages and extract names from titles,
    or other unambiguous sources on page
  • Pronoun resolution: simple techniques? (For
    example, identify the immediately preceding name
    mention. This may require NER.)
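
The "immediately preceding name mention" heuristic could look roughly like the sketch below. It sidesteps NER by assuming a list of candidate names is already available (for instance from a page title); `resolve_pronouns` is a hypothetical name and the input sentences are invented for illustration.

```python
import re

PRONOUNS = {"he", "she"}

def resolve_pronouns(sentences, known_names):
    """Replace a sentence-initial He/She with the most recent name mention."""
    # Longest names first so full names are preferred over partial ones.
    ordered = sorted(known_names, key=len, reverse=True)
    name_rx = re.compile("|".join(re.escape(n) for n in ordered))
    last_name = None
    resolved = []
    for sentence in sentences:
        words = sentence.split(maxsplit=1)
        if words and words[0].lower() in PRONOUNS and last_name:
            rest = " " + words[1] if len(words) > 1 else ""
            sentence = last_name + rest
        # Remember the last name mentioned so far for the next sentence.
        mentions = name_rx.findall(sentence)
        if mentions:
            last_name = mentions[-1]
        resolved.append(sentence)
    return resolved

print(resolve_pronouns(
    ["This is the home page of Oren Etzioni.", "He received his Ph.D from CMU."],
    known_names=["Oren Etzioni"],
))
```

This only handles pronouns at the start of a sentence and will pick the wrong antecedent whenever another person is mentioned in between.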

12
Interim Conclusions
  • Issue III: compound sentences
  • Q: "Oren Etzioni" "Ph.D" "CMU"
  • But what if the actual relevant mention includes
    A: "Oren Etzioni received his MS from
    <UNIVERSITY>, and his Ph.D from CMU"
  • Possible avenues
  • Extensions to the pattern extraction technique
  • May require dependency parsing

13
Software / Resources
  • A generic search framework which allows
    asynchronous processing of search tasks, as well
    as "filter" tasks (processing of resulting URLs)
  • A URL caching implementation of Java 1.5's
    java.net.ResponseCache using Hibernate,
    supporting centralized caching and remote access
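
The framework described above is a Java component whose API is not shown in the slides. Purely to illustrate the idea of search tasks feeding "filter" tasks that run concurrently over the resulting URLs, here is a small Python sketch; `run_search`, the callables it accepts, and the stub functions are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_search(query, search, filters, max_workers=8):
    """Run one search task, then apply every filter task to each result URL."""
    urls = search(query)  # search task: query -> list of URLs

    def run_filters(url):
        results = []
        for filter_task in filters:
            results.extend(filter_task(url))  # filter task: URL -> list of results
        return results

    # Filter tasks for different URLs run concurrently; map keeps URL order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_url = list(pool.map(run_filters, urls))
    return [item for items in per_url for item in items]

# Stub tasks standing in for a real search engine and extractor.
def fake_search(query):
    return ["http://example.edu/a", "http://example.edu/b"]

def fake_extract(url):
    return [("extraction", url)]

print(run_search('"received his * from"', fake_search, [fake_extract]))
```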

14
Generic Search Framework
[Diagram labels: Search, SearchProcessor, Search Tasks, Result, Filter, Filter Tasks, Extraction]
  • Test run: 1 search, 50 URLs, 169 extractions, 15 seconds

15
Search Framework System Flow
16
Extensions
  • Dictionaries - next slide
  • Simple pronoun resolution
  • Extraction validation metrics
  • URL of the professor's personal home page
  • Clustering of people / universities, or
    normalization of names
  • Identify biography section of personal home pages
  • Links incoming and outgoing from personal home
    page

17
Additional information
  • Dictionary of institution names
  • Tiny dictionary of degrees
  • E.g. Ph.D., B.S., B. Tech., etc.
  • Map of domain names to institution names
  • E.g. cmu.edu → Carnegie Mellon University
  • This could be learned but we will leave that for
    another group!
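
A domain-to-institution map of this kind could be consulted as in the sketch below, which matches the longest known suffix of a URL's host name. The table holds only the example entry given above; `institution_for_url` is a hypothetical helper and the example URLs are made up.

```python
from urllib.parse import urlparse

# Only the entry from the example above; a real map would be much larger.
DOMAIN_TO_INSTITUTION = {"cmu.edu": "Carnegie Mellon University"}

def institution_for_url(url, table=DOMAIN_TO_INSTITUTION):
    """Look up the institution for a URL by its domain suffix, or return None."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Try progressively shorter suffixes, e.g. www.cs.cmu.edu, cs.cmu.edu, cmu.edu.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in table:
            return table[suffix]
    return None

print(institution_for_url("http://www.cs.cmu.edu/~someone/"))
```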

18
Example extracted relations