Title: GATE: an AKT success story
1 GATE: an AKT success story
- GATE: open source language technology component architecture and many tools, with a number of AKT roles
- http://gate.ac.uk/ http://nlp.shef.ac.uk/
- Hamish Cunningham
- Kalina Bontcheva
- Yorick Wilks
- Southampton, January 2004
- New GATE-related projects
- Current state of the system
- Future plans
2 New Projects
- SEKT: 9m IP with BT, AIFB, JSI, Empolis, SAI, OntoPrise, ISOCO, UB, Kea-Pro
- PrestoSpace: 9m IP with BBC, RAI, ORF, INA, ... preservation of audio-visual media
- KnowledgeWeb: NoE successor to OntoWeb
- ETCSL: GATE for humanities scholars
- hTechSight: petrochem tech oversight
- SWAN: large-scale semantic annotation
3 SEKT: large-scale DM, robust HLT for NGKM
KEY
- MNLG: Multilingual Natural Language Generation
- OBIE: Ontology-Based Information Extraction
- (MI)IE: Mixed-Initiative IE
- CLIE: Controlled Language IE
[Diagram labels: Human Language; Formal Knowledge (ontologies and instance bases); Semantic Web, Semantic Grid, Semantic Web Services; connecting components (M)NLG, OBIE, (MI)IE, CLIE (Controlled Language)]
4 SEKT: Evaluating Semantic Tagging
- Need for new metrics when evaluating hierarchy/ontology-based NE tagging
- Need to take into account distance in the hierarchy
- Tagging a company as a charity is less wrong than tagging it as a person
- Several SEKT-related initiatives (workshop at ECAI, Pascal network)
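The idea of taking hierarchy distance into account can be illustrated with a minimal sketch. The class hierarchy, the function names and the scoring formula (1 / (1 + distance)) are all assumptions made for illustration; the slides do not specify which metric SEKT adopted.

```python
# Toy class hierarchy (hypothetical): child -> parent.
PARENT = {
    "Organization": "Entity",
    "Person": "Entity",
    "Company": "Organization",
    "Charity": "Organization",
}


def ancestors(cls):
    """Return the path from cls up to the root, inclusive."""
    path = [cls]
    while cls in PARENT:
        cls = PARENT[cls]
        path.append(cls)
    return path


def tree_distance(a, b):
    """Number of edges between a and b via their lowest common ancestor."""
    path_a = ancestors(a)
    path_b = ancestors(b)
    depth_in_b = {cls: i for i, cls in enumerate(path_b)}
    for i, cls in enumerate(path_a):
        if cls in depth_in_b:
            return i + depth_in_b[cls]
    raise ValueError("classes do not share a root")


def graded_score(gold, predicted):
    """1.0 for an exact match, decaying with hierarchy distance (assumed formula)."""
    return 1.0 / (1.0 + tree_distance(gold, predicted))
```

Under this toy hierarchy, a gold Company tagged as Charity is two edges away, while Company tagged as Person is three edges away, so the first error receives the higher score. A flat exact-match metric would score both errors as zero.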
5 PrestoSpace
- Cultural Heritage / Digital Libraries IP
- BBC, RAI, ORF, INA, BG, USFD, and 23 others (!)
- 20th Century Rot: rapid disappearance of audio-visual media
- Preservation and digitisation are high-cost
- Therefore we need rich metadata and semantic access
- Little training data, open domain FSTs for users
- Follows MUMIS and other projects
- Evaluation: TRECVID, OBIE
6 GATE Status (version 2½)
- Stable core since end 2002
- Increasing numbers of users (next slide)
- Increasing numbers of languages (most recently Chinese, Arabic, Russian, German system from DotKom)
- Increasing numbers of 3rd party components (e.g. Medline and UMLS work, OBIE/KIM, QA, summarisation, ...)
- Embedded in KM applications
7 A bit of a nuisance (GATE users)
- Thousands of users at hundreds of sites (based on survey of 4,700 downloaders). A representative sample:
- the American National Corpus project
- the Perseus Digital Library project, Tufts University, US
- Greenstone digital library, NZ
- Longman Pearson publishing, UK
- Merck KGaA, Germany
- Canon Europe, UK
- Knight Ridder, US
- BBN (leading HLT research lab), US
- SMEs inc. Sirma AI Ltd., Bulgaria
- Imperial College, London, the University of Manchester, UMIST, Vassar College, the University of Southern California and a large number of other UK, US and EU Universities
- UK and EU projects inc. MyGrid, CLEF, DotKom, AMITIES, Cub Reporter, EMILLE, Poesia...
- GATE team projects.
- Past
- MUMIS: semantic index of sports video
- MUSE: cross-genre entity finder
- HSL: Health-and-safety IE
- Old Bailey: collaboration with HRI on 17th century court reports
- Multiflora: plant taxonomy text analysis for biodiversity research e-science
- EMILLE: S. Asian languages corpus
- ACE / TIDES: Arabic, Chinese NE
- Present
- Advanced Knowledge Technologies
- SEKT: next-generation KM
- PrestoSpace: audiovisual preservation
- KnowledgeWeb: semantic web network
- h-TechSight: technology oversight
- ETCSL: Sumerian language corpus
- SWAN: Semantic Web Annotator
8 Some new stuff
- Johns Hopkins w/s on Semantic Annotation: BNC-based corpus, ME expts
- WEKA 2 release (JSI library integration soon)
- papers: RANLP, ISWC, Journal of Digital Libraries, Journal of Data and Knowledge Eng.
- JWS editorial board, co-editor JNLE special
- RANLP IE tutorial, tutorial on HLT/SW at ESWS
- HLT/SW evaluation workshop at ECAI
- OBIE in Multiflora, hTechSight
- SW NLG in MIAKT (below)
9 MIAKT: NLG for SW
RDF input from image annotation GUI...
...generated text
MIAKT has important productivity and accuracy
implications
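The step from RDF input to generated text can be pictured with a very small template-based sketch. This is not MIAKT's generator: the triples, property names and sentence templates below are hypothetical stand-ins, and real RDF would carry full URIs rather than short strings.

```python
# Hypothetical triples, standing in for RDF produced by an annotation GUI.
TRIPLES = [
    ("image42", "hasRegion", "region1"),
    ("region1", "hasShape", "oval"),
    ("region1", "hasMargin", "well-defined"),
]

# One sentence template per property; these are illustrative only.
TEMPLATES = {
    "hasRegion": "{s} contains {o}.",
    "hasShape": "The shape of {s} is {o}.",
    "hasMargin": "The margin of {s} is {o}.",
}


def verbalise(triples):
    """Turn (subject, property, object) triples into plain sentences."""
    sentences = []
    for s, p, o in triples:
        template = TEMPLATES.get(p)
        if template is None:
            continue  # no template for this property: skip it
        sentences.append(template.format(s=s, o=o))
    return " ".join(sentences)
```

Calling `verbalise(TRIPLES)` yields one sentence per triple that has a template. A fuller system would also aggregate sentences and choose referring expressions, which this sketch leaves out.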
10 hTechSight: tech oversight
- Ontology-Based IE (OBIE) for semantic tagging of job adverts, news and reports in chemical engineering domain
- Aim is to track technological change over time
- Centred around domain-specific ontology
- Terminological gazetteer lists are linked to classes in the ontology
- Rules classify the mentions in the text wrt. the domain ontology
- Annotations output to DB or RDF
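The link between gazetteer lists, ontology classes and RDF output might look roughly like the sketch below. It is plain Python, not GATE code: the gazetteer entries, class names and the `http://example.org/onto#` namespace are placeholders, and it covers only gazetteer lookup, not the rule-based classification step.

```python
import re

# Hypothetical gazetteer: surface term -> ontology class name.
GAZETTEER = {
    "heat exchanger": "Equipment",
    "process engineer": "JobTitle",
    "catalysis": "Technology",
}

NS = "http://example.org/onto#"  # placeholder namespace, not a real ontology
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def find_mentions(text):
    """Return (start, end, term, class) for each gazetteer hit, in text order."""
    hits = []
    for term, cls in GAZETTEER.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), term, cls))
    return sorted(hits)


def to_ntriples(doc_id, mentions):
    """Serialise each mention as an rdf:type statement in N-Triples syntax."""
    lines = []
    for start, end, term, cls in mentions:
        subject = f"<{NS}{doc_id}/mention/{start}-{end}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{NS}{cls}> .")
    return lines
```

For a job advert such as "Senior Process Engineer needed; experience with heat exchanger design.", `find_mentions` returns one JobTitle hit followed by one Equipment hit, and `to_ntriples("doc1", ...)` emits one typed statement per mention. Keying the gazetteer by ontology class is what lets the same lookup feed either a database row or an RDF statement.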
11 OBIE in MultiFlora 2: Combining Information Extraction and Knowledge Representation for Biodiversity Informatics
[Diagram labels: Varying plant taxa; Merged RDF]
BBSRC project led by Mary McGee Wood, U. Mcr.
12 GATE 4: the Final Conflict
- (GATE 3 release happening soonish)
- Continuity guaranteed for AKT phase 2 (2 million GATE-related work 2004-2007)
- Some future elements
- more and better OBIE, inc. cross-doc co-reference
- pluggable OWL repository support (now only Sesame; soon 3Store, KAON)
- large- and huge-scale processing
- standardisation of the component integration model (ECLIPSE)
- service-based integration (SDK SW API)
- This talk: http://gate.ac.uk/sale/talks/akt-jan04.ppt
- What else? You tell us...