Background to SEER Proposal

Provided by: alexpro8

1
Background to SEER Proposal
  • TTT (Text Tokenisation Tool)
      • output was LT TTT, an XML-based toolset for
        low-level linguistic analysis and annotation
      • used as basis of
  • Participation in MUC-7 named entity task
      • mostly rule-based approach
      • but an incremental one, in which a maximum
        entropy (max ent) classifier was interleaved
        with rule-based annotation
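
The interleaving mentioned on this slide can be pictured with a small sketch. It is not the LT TTT grammar or the MUC-7 system itself: the regular expressions, the function names and the example text are all invented for illustration, and `toy_classifier` is only a stand-in for a trained max ent model. The point is the control flow, where a classifier pass makes its decisions using the output of an earlier rule pass.

```python
import re
from typing import Dict, Optional

Entities = Dict[str, str]  # surface string -> entity type


def surefire_rules(text: str, found: Entities) -> None:
    """Rule pass: high-precision contextual patterns (illustrative only)."""
    # A title followed by capitalised words is taken to be a person.
    for m in re.finditer(r"\b(?:Mr|Mrs|Ms|Dr)\.? ((?:[A-Z][a-z]+ ?)+)", text):
        found.setdefault(m.group(1).strip(), "PERSON")
    # Capitalised words followed by a company suffix are an organisation.
    for m in re.finditer(r"\b(?:[A-Z][a-z]+ )+(?:Ltd|Inc|plc)\b", text):
        found.setdefault(m.group(0), "ORGANIZATION")


def toy_classifier(candidate: str, found: Entities) -> Optional[str]:
    """Stand-in for a trained max ent model (an assumption of this sketch):
    accept a candidate only if it is a whole-word part of an entity that
    an earlier pass has already found."""
    for surface, etype in found.items():
        if candidate != surface and candidate in surface.split():
            return etype
    return None


def classifier_pass(text: str, found: Entities) -> None:
    """Classifier pass: label remaining capitalised words using prior output."""
    for m in re.finditer(r"\b[A-Z][a-z]+\b", text):
        word = m.group(0)
        if word not in found:
            etype = toy_classifier(word, found)
            if etype is not None:
                found[word] = etype


def annotate(text: str) -> Entities:
    """Run the passes in order; each pass sees what earlier passes found."""
    found: Entities = {}
    for annotation_pass in (surefire_rules, classifier_pass):
        annotation_pass(text, found)
    return found


TEXT = "Dr Helen Weir joined Kingfisher plc. Weir said Kingfisher would expand."

if __name__ == "__main__":
    for surface, etype in annotate(TEXT).items():
        print(f"{etype:13} {surface}")
```

In this toy run the rules find "Helen Weir" and "Kingfisher plc", and the classifier pass then labels the later bare mentions "Weir" and "Kingfisher". A real system would add further rule passes after the classifier and would replace `toy_classifier` with a model trained on annotated data.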


2
Other Projects
  • DISP (Data Intensive Semantics and Pragmatics)
      • corpus of Medline abstracts
      • annotation of domain-specific entities and
        terms prior to parsing
  • Mascara
      • resolution of metonymy (e.g. place for
        organisation)
      • named entity recognition a prerequisite


3
Other Projects
  • CROSSMARC
      • Multilingual IE from web pages
      • laptop computer and job offer domains:
        non-standard sets of entities
  • EDIFY contract
      • Recognition of person names and addresses in
        emails


4
Different Domains, Different Entities
  • MUC
  • Web pages
      • Laptops
      • Job Ads
  • Email
  • Biomedical
      • DISP
      • BioCreative
  • Legal


5
Machine Learning vs Rule-Based
  • Rule-based systems tend to perform better but
    have high labour costs.
  • Machine learning approaches are becoming
    competitive.
  • cf. the CoNLL 2002 and 2003 shared tasks (Clark
    and Curran; Klein, Smarr, Nguyen and Manning).
  • Machine learning still requires significant
    amounts of training material, so labour costs are
    still high and adaptation to new domains is
    still slow.


6
Goals of the SEER Project
  • To develop the means to recognise a wider variety
    of entities than before including ones not
    signalled by capitalisation.
  • To experiment to find the most useful machine
    learning techniques.
  • To investigate boot-strapping techniques in order
    to minimise the amount of training data needed.
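
The slides do not say which bootstrapping technique is meant, so the following is only a generic sketch of the idea: start from a few seed entities, learn the contexts they occur in from unlabelled text, then use those contexts to harvest new entities, and repeat. The corpus, the seed and the `bootstrap` function are invented for illustration. The entities are deliberately lower-case, since capitalisation gives no signal here.

```python
from typing import List, Set, Tuple


def bootstrap(sentences: List[List[str]], seeds: Set[str], rounds: int = 2) -> Set[str]:
    """Grow an entity list by alternating between contexts and entities."""
    entities = set(seeds)
    for _ in range(rounds):
        # 1. Collect (previous word, next word) contexts around known entities.
        contexts: Set[Tuple[str, str]] = set()
        for sent in sentences:
            for i in range(1, len(sent) - 1):
                if sent[i] in entities:
                    contexts.add((sent[i - 1], sent[i + 1]))
        # 2. Any token seen in one of those contexts becomes a new entity.
        for sent in sentences:
            for i in range(1, len(sent) - 1):
                if (sent[i - 1], sent[i + 1]) in contexts:
                    entities.add(sent[i])
    return entities


# Toy unlabelled corpus, invented for this sketch.
CORPUS = [
    "the p53 gene is mutated".split(),
    "the brca1 gene is expressed".split(),
    "expression of brca1 protein increased".split(),
    "expression of insulin protein increased".split(),
    "the cell is mutated".split(),
]

if __name__ == "__main__":
    print(sorted(bootstrap(CORPUS, {"p53"})))
```

Starting from the single seed "p53", the first round picks up "brca1" through the shared context "the _ gene", and the second round picks up "insulin" through "of _ protein". A practical system would score contexts and candidates rather than accept every match, because a loop this permissive drifts quickly on real text.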


7
Where We Are Now
  • wider variety of entities
      • Chosen two very different domains: biomedical
        (genes, proteins etc.) and
        architectural/archaeological.
  • experiment with machine learning techniques
      • max ent tagging using the C&C and Stanford
        taggers
  • investigate boot-strapping techniques to minimise
    the amount of training data needed
      • STARTS TODAY!


8
MUC Named Entity Recognition
  • Helen Weir, the finance director of
    Kingfisher, was handed a £334,607 allowance last
    year to cover the costs of a relocation that
    appears to have shortened her commute by around
    15 miles. The payment to the 40-year-old amounts
    to roughly £23,000 a mile to allow her to move
    from Hampshire to Buckinghamshire after an
    internal promotion.
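
To show what MUC-style output looks like for text of this kind, here is a minimal sketch that wraps hand-picked strings in ENAMEX tags, the element MUC uses for names (types PERSON, ORGANIZATION and LOCATION). The sentence is a shortened version of the one on the slide, the annotations are chosen by hand and not produced by any recogniser, and `muc_markup` is a hypothetical helper name. MUC also defines NUMEX for money and percentages and TIMEX for dates and times, which are left out here to keep the example short.

```python
from typing import List, Tuple

# (surface string, MUC element, TYPE attribute), hand-picked for illustration.
Annotation = Tuple[str, str, str]


def muc_markup(text: str, annotations: List[Annotation]) -> str:
    """Wrap the first occurrence of each surface string in a MUC-style tag."""
    spans = []
    for surface, element, etype in annotations:
        start = text.find(surface)
        if start == -1:
            raise ValueError(f"{surface!r} not found in text")
        spans.append((start, start + len(surface), element, etype))
    # Insert tags from right to left so that earlier offsets stay valid.
    for start, end, element, etype in sorted(spans, reverse=True):
        text = (
            text[:start]
            + f'<{element} TYPE="{etype}">'
            + text[start:end]
            + f"</{element}>"
            + text[end:]
        )
    return text


# Shortened version of the slide's sentence (an assumption of this sketch).
SENTENCE = (
    "Helen Weir, the finance director of Kingfisher, "
    "moved from Hampshire to Buckinghamshire."
)
GOLD = [
    ("Helen Weir", "ENAMEX", "PERSON"),
    ("Kingfisher", "ENAMEX", "ORGANIZATION"),
    ("Hampshire", "ENAMEX", "LOCATION"),
    ("Buckinghamshire", "ENAMEX", "LOCATION"),
]

if __name__ == "__main__":
    print(muc_markup(SENTENCE, GOLD))
```

The helper assumes the annotated strings do not overlap; overlapping spans would need an offset-based representation instead of string search.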