GATE: an AKT success story

1
  • GATE: an AKT success story
  • GATE: open source language technology component
    architecture and many tools, with a number of
    AKT roles
  • http://gate.ac.uk/ http://nlp.shef.ac.uk/
  • Hamish Cunningham
  • Kalina Bontcheva
  • Yorick Wilks
  • Southampton, January 2004
  • New GATE-related projects
  • Current state of the system
  • Future plans

2
New Projects
  • SEKT: 9m IP with BT, AIFB, JSI, Empolis, SAI,
    OntoPrise, ISOCO, UB, Kea-Pro
  • PrestoSpace: 9m IP with BBC, RAI, ORF, INA,
    ... preservation of audio-visual media
  • KnowledgeWeb: NoE, successor to OntoWeb
  • ETCSL: GATE for humanities scholars
  • hTechSight: petrochem tech oversight
  • SWAN: large-scale semantic annotation

3
SEKT: large-scale DM robust HLT for NGKM
  • KEY
  • MNLG: Multilingual Natural Language Generation
  • OBIE: Ontology-Based Information Extraction
  • (MI)IE: Mixed-Initiative IE
  • CLIE: Controlled Language IE
  • [Diagram labels: (M)NLG; Semantic Web, Semantic Grid,
    Semantic Web Services; Formal Knowledge (ontologies and
    instance bases); Human Language; OBIE; (MI)IE;
    Controlled Language; CLIE]
4
SEKT: Evaluating Semantic Tagging
  • Need for new metrics when evaluating
    hierarchy/ontology-based NE tagging
  • Need to take into account distance in the
    hierarchy
  • Tagging a company as a charity is less wrong than
    tagging it as a person
  • Several SEKT-related initiatives (w/s at ECAI,
    Pascal network)
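
The point about distance in the hierarchy can be sketched in a few lines. This is only an illustration, not a metric defined by SEKT: the toy class hierarchy, the function names and the 1 / (1 + distance) scoring are all assumptions made for the example.

```python
# Toy class hierarchy (child -> parent); hypothetical, for illustration only.
PARENT = {
    "Company": "Organization",
    "Charity": "Organization",
    "Organization": "Entity",
    "Person": "Entity",
    "Entity": None,
}


def ancestors(cls):
    """Return the path from cls up to the root, starting with cls itself."""
    path = []
    while cls is not None:
        path.append(cls)
        cls = PARENT[cls]
    return path


def tree_distance(a, b):
    """Number of edges between a and b via their lowest common ancestor."""
    path_a = ancestors(a)
    depth_in_b = {c: i for i, c in enumerate(ancestors(b))}
    for i, c in enumerate(path_a):
        if c in depth_in_b:
            return i + depth_in_b[c]
    raise ValueError("classes share no common ancestor")


def partial_credit(gold, predicted):
    """Assumed scoring rule: full credit for an exact match, less as distance grows."""
    return 1.0 / (1.0 + tree_distance(gold, predicted))
```

With this toy hierarchy, tagging a Company as a Charity (distance 2) scores higher than tagging it as a Person (distance 3), which matches the intuition on the slide. An exact match scores 1.0.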

5
PrestoSpace
  • Cultural Heritage / Digital Libraries IP
  • BBC, RAI, ORF, INA, BG, USFD, and 23 others (!)
  • 20th Century Rot: rapid disappearance of
    audio-visual media
  • Preservation and digitisation are high cost
  • Therefore we need rich metadata and semantic
    access
  • Little training data, open domain FSTs for users
  • Follows MUMIS and other projects
  • Evaluation: TRECVID, OBIE

6
GATE Status (version 2½)
  • Stable core since end 2002
  • Increasing numbers of users (next slide)
  • Increasing numbers of languages (most recently
    Chinese, Arabic, Russian, German system from
    DotKom)
  • Increasing numbers of 3rd party components (e.g.
    Medline and UMLS work, OBIE/KIM, QA,
    summarisation, ...)
  • Embedded in KM applications

7
A bit of a nuisance (GATE users)
  • Thousands of users at hundreds of sites (based on
    survey of 4,700 downloaders). A representative
    sample:
  • the American National Corpus project
  • the Perseus Digital Library project, Tufts
    University, US
  • Greenstone digital library, NZ
  • Longman Pearson publishing, UK
  • Merck KGaA, Germany
  • Canon Europe, UK
  • Knight Ridder, US
  • BBN (leading HLT research lab), US
  • SMEs inc. Sirma AI Ltd., Bulgaria
  • Imperial College, London, the University of
    Manchester, UMIST, Vassar College, the University
    of Southern California and a large number of
    other UK, US and EU Universities
  • UK and EU projects inc. MyGrid, CLEF, DotKom,
    AMITIES, Cub Reporter, EMILLE, Poesia...
  • GATE team projects
  • Past:
  • MUMIS: semantic index of sports video
  • MUSE: cross-genre entity finder
  • HSL: Health-and-safety IE
  • Old Bailey: collaboration with HRI on 17th
    century court reports
  • MultiFlora: plant taxonomy text analysis for
    biodiversity research e-science
  • EMILLE: S. Asian languages corpus
  • ACE / TIDES: Arabic, Chinese NE
  • Present:
  • Advanced Knowledge Technologies
  • SEKT: next-generation KM
  • PrestoSpace: audiovisual preservation
  • KnowledgeWeb: semantic web network
  • h-TechSight: technology oversight
  • ETCSL: Sumerian language corpus
  • SWAN: Semantic Web Annotator

8
Some new stuff
  • Johns Hopkins w/s on Semantic Annotation:
    BNC-based corpus, ME expts
  • WEKA 2 release (JSI library integration soon)
  • papers: RANLP, ISWC, Journal of Digital
    Libraries, Journal of Data and Knowledge Eng.
  • JWS editorial board, co-editor JNLE special
  • RANLP IE tutorial, tutorial on HLT/SW at ESWS
  • HLT/SW evaluation workshop at ECAI
  • OBIE in MultiFlora, hTechSight
  • SW NLG in MIAKT (below)

9
MIAKT: NLG for SW
RDF input from image annotation GUI...
...generated text
MIAKT has important productivity and accuracy
implications
10
hTechSight: tech oversight
  • Ontology-Based IE (OBIE) for semantic tagging of
    job adverts, news and reports in chemical
    engineering domain
  • Aim is to track technological change over time
  • Centred around domain-specific ontology
  • Terminological gazetteer lists are linked to
    classes in the ontology
  • Rules classify the mentions in the text wrt. the
    domain ontology
  • Annotations output to DB or RDF
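
A minimal sketch of the pipeline described above: gazetteer terms linked to ontology classes, mentions classified against those classes, and annotations written out as RDF. It is plain Python and does not use GATE's own API. The gazetteer entries, class names and the example.org namespace are invented for illustration.

```python
import re

# Hypothetical gazetteer: surface term -> ontology class (names invented).
GAZETTEER = {
    "distillation": "SeparationProcess",
    "heat exchanger": "Equipment",
    "process engineer": "JobRole",
}

BASE = "http://example.org/onto#"  # placeholder namespace, not a real ontology
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def annotate(text):
    """Return (start, end, mention, ontology_class) for each gazetteer hit."""
    annotations = []
    for term, cls in GAZETTEER.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((m.start(), m.end(), m.group(0), cls))
    return sorted(annotations)  # ordered by start offset


def to_ntriples(doc_id, annotations):
    """Serialise each annotation as one N-Triples line typing the mention."""
    lines = []
    for start, end, _mention, cls in annotations:
        subject = f"<{BASE}{doc_id}/mention/{start}-{end}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}{cls}> .")
    return lines


if __name__ == "__main__":
    advert = "Process engineer wanted: experience with distillation and heat exchanger design."
    for line in to_ntriples("advert1", annotate(advert)):
        print(line)
```

Here a dictionary lookup stands in for the classification rules. A real system would add context-sensitive rules on top of the gazetteer matches, and could write the same annotations to a database instead of RDF.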

11
OBIE in MultiFlora 2 Combining Information
Extraction and Knowledge Representation for
Biodiversity Informatics
Varying plant taxa
Merged RDF
BBSRC project led by Mary McGee Wood, U. Mcr.
12
GATE 4: the Final Conflict
  • (GATE 3 release happening soonish)
  • Continuity guaranteed for AKT phase 2 (2 million
    GATE-related work 2004-2007)
  • Some future elements
  • more and better OBIE, inc. cross-doc co-reference
  • pluggable OWL repository support (now only
    Sesame; soon 3Store, KAON)
  • large- and huge-scale processing
  • standardisation of the component integration
    model (ECLIPSE)
  • service-based integration (SDK SW API)
  • This talk: http://gate.ac.uk/sale/talks/akt-jan04.ppt
  • What else? You tell us...