

Title: Using the GATE Architecture for NE Recognition in the Football Domain


1
Using the GATE Architecture for NE Recognition in
the Football Domain
  • Horacio Saggion, Hamish Cunningham, Diana
    Maynard, Yorick Wilks
  • Department of Computer Science
  • University of Sheffield

2
MUMIS Objectives
  • European project: U. of Twente (CTIT), U. of
    Nijmegen (TSI), DFKI Saarbrücken, MPI, Sheffield
    (DCS), ESTEAM, and VDA
  • Technology development to automatically index
    (with formal annotations) lengthy multimedia
    recordings (off-line process)
  • Technology development to exploit indexed
    multimedia archives (on-line process)
  • Test domain: football games / UEFA Tournament 2000

3
Information Extraction Task
  • 31 events in the football domain: shot on goal,
    goal, yellow card, red card, foul, free-kick,
    pass, etc.
  • Metadata (result, teams, referee, city, stadium,
    ...)
  • Named Entities
  • Person -> player, referee, etc.
  • Place -> location on the pitch, etc.
  • Time -> relative time (2 min)
  • Numbers -> score, distance
  • 39 England's best movement of the match. Wise
    plays a crossfield pass to Gary Neville, who
    feeds Scholes, ...
  • Event: Pass
  • Time: 39
  • Player1: Dennis Wise
  • Player2: Gary Neville
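
The filled template above can be pictured as a small record. A minimal sketch in Python, assuming hypothetical field names (`event`, `time`, `player1`, `player2`) that simply mirror the slots on the slide; it is not the MUMIS annotation format.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class EventTemplate:
    """Hypothetical container for one extracted football event."""
    event: str                      # event type, e.g. "Pass"
    time: Optional[int] = None      # match minute, relative to kick-off
    player1: Optional[str] = None   # first participant
    player2: Optional[str] = None   # second participant, if any


# The ticker line for minute 39 would fill the template like this.
pass_event = EventTemplate(event="Pass", time=39,
                           player1="Dennis Wise", player2="Gary Neville")
print(asdict(pass_event))
```

Slots that the text does not mention stay `None`, which is one simple way to represent a partially filled template.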

4
Text Sources
  • Tickers
  • England: Seaman, G. Neville, P. Neville,
    Campbell, Keown, Beckham, Scholes, Shearer, Owen,
    Ince, Wise. Substitutes: Martyn, Wright,
    Southgate, Barry, Gerrard, Barmby, Heskey,
    Fowler, Phillips.
  • 1 England kick off. After all the expectation,
    we're finally under way. Playing from right to
    left, the first England attack is a long ball to
    Shearer.
  • Comments
  • After 34 years of hurt, self examination, navel
    gazing, inferiority complexes and frustration,
    Kevin Keegan believes the tide of German
    superiority over England has turned. 'We're fed
    up of hearing they've got something on us and we
    play them again soon. I hope we make them pay as
    we've had to pay.'
  • Match reports
  • Alan Shearer scored the all-important goal, not
    one of his most difficult but a strike destined
    to be remembered longer than many others, early
    in the second half. They had to survive a few
    subsequent scares, but England did enough to
    confirm they are not the worst team in their
    group. Indeed, England could swagger into the
    quarter-finals with confidence. They may need to,
    for Italy in Brussels are their most likely
    opponents.

5
Sheffield Information Extraction System

6
Basic Steps
  • Text Formats
  • HTML, XML, SGML, EMAIL
  • HTML: head, title, paragraph, etc.
  • EMAIL: from, date, subject, etc.
  • PLAIN TEXT, RTF
  • Unicode Tokeniser
  • Rule Based
  • (UPPERCASE_LETTER) (LOWERCASE_LETTER)* > Token;
    orth=upperInitial; kind=word
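
The tokeniser rule above maps a character-class pattern to a Token with features. A rough sketch of that idea in Python, assuming Unicode character properties from `str` methods stand in for the rule file; the function names and the feature values other than `upperInitial`, `word` and `number` are illustrative assumptions, not the GATE rule set.

```python
def orthography(word):
    """Classify the capitalisation pattern of a word token."""
    if word.islower():
        return "lowercase"
    if word[0].isupper() and not any(c.isupper() for c in word[1:]):
        return "upperInitial"
    if word.isupper():
        return "allCaps"
    return "mixedCaps"


def tokenise(text):
    """Split text into tokens, each a dict of features."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        j = i + 1
        if ch.isalpha():
            # A word: consume the whole run of letters (Unicode-aware).
            while j < len(text) and text[j].isalpha():
                j += 1
            token = {"string": text[i:j], "kind": "word",
                     "orth": orthography(text[i:j])}
        elif ch.isdigit():
            while j < len(text) and text[j].isdigit():
                j += 1
            token = {"string": text[i:j], "kind": "number"}
        elif ch.isspace():
            while j < len(text) and text[j].isspace():
                j += 1
            token = {"string": text[i:j], "kind": "space"}
        else:
            # Anything else is treated as a single punctuation token.
            token = {"string": ch, "kind": "punctuation"}
        tokens.append(token)
        i = j
    return tokens


for tok in tokenise("Günter Benkö 82mins"):
    print(tok)
```

Because the character tests are Unicode-aware, names such as "Günter" come out as a single `upperInitial` word rather than being split at the umlaut.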

7
Gazetteer Look-up
  • Hand-coded lists (.lst) from different sources
  • referee_names_euro2000.lst
  • Günter Benkö
  • Pierluigi Collina
  • Set of lists defined in .def file and compiled
    into FSM
  • Each element has attributes MajorType and
    MinorType
  • national_teams_euro2000.lst:championships_info:team
  • referee_names_euro2000.lst:championships_info:referee
  • players_goalkeeper.lst:player:goalkeeper
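
The look-up step can be sketched as follows. This is a simplified illustration in Python that uses a dictionary with longest-match search instead of a compiled FSM; the in-memory `LISTS_DEF` and `LIST_FILES` stand in for the real .def and .lst files, and the team and goalkeeper entries are illustrative assumptions taken from names used elsewhere in these slides.

```python
import re

# Hypothetical stand-in for the .def index: file:majorType:minorType
LISTS_DEF = """\
national_teams_euro2000.lst:championships_info:team
referee_names_euro2000.lst:championships_info:referee
players_goalkeeper.lst:player:goalkeeper
"""

# Hypothetical stand-in for the contents of the .lst files.
LIST_FILES = {
    "national_teams_euro2000.lst": ["England", "Czech Republic"],
    "referee_names_euro2000.lst": ["Günter Benkö", "Pierluigi Collina"],
    "players_goalkeeper.lst": ["Seaman", "Van der Sar"],
}


def build_gazetteer(lists_def, list_files):
    """Map every list entry to its majorType and minorType."""
    gaz = {}
    for line in lists_def.splitlines():
        if not line.strip():
            continue
        filename, major, minor = line.strip().split(":")
        for entry in list_files[filename]:
            gaz[entry] = {"majorType": major, "minorType": minor}
    return gaz


def lookup(text, gaz):
    """Return (matched string, features) pairs, preferring the longest match."""
    words = list(re.finditer(r"\w+", text))
    max_len = max(len(entry.split()) for entry in gaz)
    found = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            start, end = words[i].start(), words[i + n - 1].end()
            features = gaz.get(text[start:end])
            if features:
                found.append((text[start:end], features))
                i += n - 1  # skip the words consumed by this match
                break
        i += 1
    return found


gazetteer = build_gazetteer(LISTS_DEF, LIST_FILES)
print(lookup("Pierluigi Collina booked Seaman", gazetteer))
```

Trying the longest candidate span first keeps multi-word entries such as "Czech Republic" from being split into partial matches.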

8
Regular Grammars
  • Java Annotation Pattern Engine (JAPE) Grammar
  • Similar to Common Pattern Specification Language
  • Set of rules
  • LHS: regular expression over annotations
  • RHS: annotations to be added
  • Priority
  • Left and Right context around the pattern
  • JAVA Code
  • Rules are compiled into an FST over annotations
  • A set of grammars can be loaded
  • Rules for sentence splitting

9
Rules
Rule: TimeStamp5
(
  ({Token.kind == number})
  (SPACE)?
  ({Lookup.minorType == minutes})
):annotate
({Token.string == ")"})
-->
:annotate.TimeStamp = {rule = "TimeStamp5"}
  • Adams (Keown 82mins)

England 6 - 1 Yugoslavia
Team1: England, Team2: Yugoslavia, Score1: 6, Score2: 1

Rule: AddValueStateOfGame1
({StateOfGame.rule == "rule1"}):annotate
-->
:annotate { JAVA CODE }
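
What these two rules recognise can be approximated with ordinary regular expressions. The sketch below is in Python and works on raw strings rather than on annotations, so it only illustrates the patterns, not JAPE itself; the unit spellings in `TIME_STAMP` stand in for the gazetteer's minutes list and are an assumption, as are the function names.

```python
import re

# A number, an optional space and a minutes unit, directly before ")".
TIME_STAMP = re.compile(r"(\d+)\s?(?:mins?|minutes?)(?=\))")

# "<team> <score> - <score> <team>", e.g. "England 6 - 1 Yugoslavia".
RESULT = re.compile(
    r"^(?P<team1>\D+?)\s+(?P<score1>\d+)\s*-\s*"
    r"(?P<score2>\d+)\s+(?P<team2>\D+)$"
)


def time_stamps(text):
    """Return the minute values of all time stamps found in text."""
    return [int(m.group(1)) for m in TIME_STAMP.finditer(text)]


def state_of_game(line):
    """Fill Team1/Team2/Score1/Score2 from a result line, or return None."""
    m = RESULT.match(line.strip())
    if m is None:
        return None
    return {
        "Team1": m.group("team1"),
        "Team2": m.group("team2"),
        "Score1": int(m.group("score1")),
        "Score2": int(m.group("score2")),
    }


print(time_stamps("Adams (Keown 82mins)"))
print(state_of_game("England 6 - 1 Yugoslavia"))
```

The second function plays the role of the Java code on the right-hand side of AddValueStateOfGame1: once the pattern has matched, the teams and scores are copied into feature values.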

10
NE Recognition
  • Holland 1 - 0 Czech Republic
  • Full Time. Holland 1 - 0 Czech Republic
  • Holland 1 Czech Rep 0
  • Germany Scholl 28 1 - 1 Romania Moldovan 5
  • England: Seaman, G. Neville, Adams, ...
  • Holland (4-3-2-1): Van der Sar Reiziger, Stam
    (Konterman, 75min), ...
  • France: 1. Bernard Lama 19. Christian Karembeu,
    18. Franck...
  • Gazetteer Lookup and Classification
  • Seaman: Player, Goalkeeper, England
  • Cascade of JAPE Grammars
  • Players (Name and Position), Teams (National and
    Collective), Substitution (On, Off, Time), Lists
    of Players (all playing, all substitutes),
    Formation, Temporal Expressions (General), Teams
    Playing, State of Game, Time Stamps, Results
    (partial, final)

11
Other Finite State Components
  • Lemmatiser
  • List of Exceptions (biases analysed as
    bias+s)
  • biases -> root=bias, affix=s
  • Rules for Regular forms (expresses analysed as
    express+s)
  • ANY DOUBLE ES -> root=ANY DOUBLE, affix=s
  • POS tagger
  • Lexicon
  • beginning VBG NN
  • observed VBD VBN JJ
  • Rules
  • VB NN PREV1OR2TAG DT
  • VB VBP PREVTAG NNS
  • IN JJ SURROUNDTAG DT NN
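
Both components on this slide can be sketched in a few lines of Python. The sketch assumes the tagger rules follow the usual transformation-rule reading (old tag, new tag, context template, context tags) and that the ANY DOUBLE ES pattern means a doubled letter followed by "es"; the function names are hypothetical and the code is an illustration, not the GATE implementation.

```python
# Lemmatiser: exceptions are checked before the regular rules.
EXCEPTIONS = {"biases": ("bias", "s")}


def lemmatise(word):
    """Return (root, affix) for a word."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    # ANY DOUBLE ES: a doubled letter followed by "es", as in "expresses".
    if len(word) >= 4 and word.endswith("es") and word[-3] == word[-4]:
        return word[:-2], "s"
    return word, ""


# POS tagger rules: (old tag, new tag, context template, context tags).
RULES = [
    ("VB", "NN", "PREV1OR2TAG", ("DT",)),
    ("VB", "VBP", "PREVTAG", ("NNS",)),
    ("IN", "JJ", "SURROUNDTAG", ("DT", "NN")),
]


def context_matches(tags, i, template, args):
    """Check whether the context template holds at position i."""
    prev1 = tags[i - 1] if i >= 1 else None
    prev2 = tags[i - 2] if i >= 2 else None
    nxt = tags[i + 1] if i + 1 < len(tags) else None
    if template == "PREVTAG":
        return prev1 == args[0]
    if template == "PREV1OR2TAG":
        return args[0] in (prev1, prev2)
    if template == "SURROUNDTAG":
        return prev1 == args[0] and nxt == args[1]
    raise ValueError("unknown template: " + template)


def apply_rules(tags, rules=RULES):
    """Apply each rule in order to a sequence of initial tags."""
    tags = list(tags)
    for old, new, template, args in rules:
        for i in range(len(tags)):
            if tags[i] == old and context_matches(tags, i, template, args):
                tags[i] = new
    return tags


print(lemmatise("expresses"))
print(apply_rules(["DT", "VB"]))
```

In this reading, the lexicon supplies the initial tag for each word and the rules then correct that tag where the surrounding tags make another reading more likely.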

12
Processing Resource

13
An Application

14
Language Resource

15
Visualization

16
Visualization

17
Prolog Components
  • Named Entities and Semantic Annotations feed
    Prolog back-end components
  • Bottom Up Chart Parsing
  • Context Free Grammar
  • Semantic Rules
  • Discourse Interpretation
  • Entity and Event Co-reference
  • Presuppositions and Consequences