Title: Using the GATE Architecture for NE Recognition in the Football Domain
- Horacio Saggion, Hamish Cunningham, Diana Maynard, Yorick Wilks
- Department of Computer Science
- University of Sheffield
2 MUMIS Objectives
- European project: U. of Twente (CTIT), U. of Nijmegen (TSI), DFKI Saarbrücken, MPI, Sheffield (DCS), ESTEAM, and VDA
- Technology development to automatically index lengthy multimedia recordings with formal annotations (off-line process)
- Technology development to exploit indexed multimedia archives (on-line process)
- Test domain: football games / UEFA Tournament 2000
3 Information Extraction Task
- 31 events in the football domain: shot on goal, goal, yellow card, red card, foul, free-kick, pass, etc.
- Metadata (result, teams, referee, city, stadium, etc.)
- Named entities
  - Person -> player, referee, etc.
  - Place -> location on the pitch, etc.
  - Time -> relative time (2 min)
  - Numbers -> score, distance
- Example: 39 England's best movement of the match. Wise plays a crossfield pass to Gary Neville, who feeds Scholes,
  - Event: Pass
  - Time: 39
  - Player1: Dennis Wise
  - Player2: Gary Neville
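The example above can be read as slot filling. The minimal Java sketch below fills the Pass template from this one ticker phrasing using a regular expression. It then expands surnames through a roster, which is why "Wise" becomes "Dennis Wise". The pattern, the `PassTemplateSketch` class and the `ROSTER` map are hypothetical. The system described here relies on gazetteers and JAPE grammars, not a single expression.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: fill the Pass template from one ticker phrasing. */
class PassTemplateSketch {
    // Minute at line start, then "<Surname> plays a [adjective] pass to <Name [Name]>".
    static final Pattern PASS = Pattern.compile(
            "^(\\d+) .*?(\\p{Lu}\\p{L}+) plays a (?:\\p{L}+ )?pass to (\\p{Lu}\\p{L}+(?: \\p{Lu}\\p{L}+)?)");

    // Hypothetical roster (e.g. built from the line-up) used to expand surnames.
    static final Map<String, String> ROSTER = Map.of("Wise", "Dennis Wise");

    /** Returns the filled slots, or an empty map if no Pass event is recognised. */
    static Map<String, String> fill(String line) {
        Map<String, String> template = new LinkedHashMap<>();
        Matcher m = PASS.matcher(line);
        if (!m.find()) {
            return template;
        }
        template.put("Event", "Pass");
        template.put("Time", m.group(1));
        template.put("Player1", ROSTER.getOrDefault(m.group(2), m.group(2)));
        template.put("Player2", ROSTER.getOrDefault(m.group(3), m.group(3)));
        return template;
    }

    public static void main(String[] args) {
        String line = "39 England's best movement of the match. Wise plays a crossfield pass"
                + " to Gary Neville, who feeds Scholes,";
        System.out.println(fill(line));
        // {Event=Pass, Time=39, Player1=Dennis Wise, Player2=Gary Neville}
    }
}
```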
4 Text Sources
- Tickers
  - England: Seaman, G. Neville, P. Neville, Campbell, Keown, Beckham, Scholes, Shearer, Owen, Ince, Wise. Substitutes: Martyn, Wright, Southgate, Barry, Gerrard, Barmby, Heskey, Fowler, Phillips.
  - 1 England kick off. After all the expectation, we're finally under way. Playing from right to left, the first England attack is a long ball to Shearer.
- Comments
  - After 34 years of hurt, self examination, navel gazing, inferiority complexes and frustration, Kevin Keegan believes the tide of German superiority over England has turned. 'We're fed up of hearing they've got something on us and we play them again soon. I hope we make them pay as we've had to pay.'
- Match reports
  - Alan Shearer scored the all-important goal, not one of his most difficult but a strike destined to be remembered longer than many others, early in the second half. They had to survive a few subsequent scares, but England did enough to confirm they are not the worst team in their group. Indeed, England could swagger into the quarter-finals with confidence. They may need to, for Italy in Brussels are their most likely opponents.
5 Sheffield Information Extraction System
6 Basic Steps
- Text formats
  - HTML, XML, SGML, email
    - HTML: head, title, paragraph, etc.
    - Email: from, date, subject, etc.
  - Plain text, RTF
- Unicode tokeniser
  - Rule-based
  - (UPPERCASE_LETTER) (LOWERCASE_LETTER)* > Token; orth=upperInitial; kind=word
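The tokeniser rule above can be sketched in Java. The sketch assumes that the rule's category names correspond to the Unicode general categories returned by `java.lang.Character.getType`. `TokeniserSketch` is a hypothetical name, and only this single rule is implemented. A complete tokeniser would also have rules for digits, punctuation, spaces and the remaining character classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of one tokeniser rule:
 *  (UPPERCASE_LETTER) (LOWERCASE_LETTER)* > Token; orth=upperInitial; kind=word */
class TokeniserSketch {
    static final class Token {
        final String string, orth, kind;
        Token(String string, String orth, String kind) {
            this.string = string;
            this.orth = orth;
            this.kind = kind;
        }
        @Override public String toString() {
            return string + "{orth=" + orth + ", kind=" + kind + "}";
        }
    }

    // Unicode general categories, as exposed by java.lang.Character.
    static boolean isUpper(int cp) { return Character.getType(cp) == Character.UPPERCASE_LETTER; }
    static boolean isLower(int cp) { return Character.getType(cp) == Character.LOWERCASE_LETTER; }

    /** Emits a Token for each span matching the rule; all other characters are skipped here. */
    static List<Token> tokenise(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int end = i + Character.charCount(cp);
            if (isUpper(cp)) {
                // Greedily consume the (LOWERCASE_LETTER)* part.
                while (end < text.length() && isLower(text.codePointAt(end))) {
                    end += Character.charCount(text.codePointAt(end));
                }
                tokens.add(new Token(text.substring(i, end), "upperInitial", "word"));
            }
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The referee name from the gazetteer slide, with its umlauts written as escapes.
        System.out.println(tokenise("G\u00fcnter Benk\u00f6, referee"));
    }
}
```

The non-ASCII letters count as LOWERCASE_LETTER, so both names come out as single upperInitial tokens. The lowercase word "referee" does not match this rule and is skipped.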
7 Gazetteer Look-up
- Hand-coded lists (.lst) from different sources
  - referee_names_euro2000.lst
    - Günter Benkö
    - Pierluigi Collina
- The set of lists is defined in a .def file and compiled into an FSM
- Each entry carries the majorType and minorType of its list
  - national_teams_euro2000.lst:championships_info:team
  - referee_names_euro2000.lst:championships_info:referee
  - players_goalkeeper.lst:player:goalkeeper
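The sketch below shows gazetteer look-up under a few stated assumptions:
- Each `.def` line has the form `file.lst:majorType:minorType`, as in the examples above.
- List entries are compiled into a token-level trie, which is a simple deterministic FSM.
- Text is scanned left to right for the longest match over whitespace-separated tokens.

`GazetteerSketch` and its methods are hypothetical, and GATE's own implementation will differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: gazetteer entries compiled into a token trie (a simple FSM),
 *  matched longest-first over whitespace-separated tokens. */
class GazetteerSketch {
    static final class State {
        final Map<String, State> next = new HashMap<>();
        boolean isFinal;              // true where a list entry ends
        String majorType, minorType;  // types of the list that entry came from
    }

    private final State start = new State();

    /** defLine follows "file.lst:majorType:minorType"; entries are that list's lines. */
    void addList(String defLine, List<String> entries) {
        String[] parts = defLine.split(":");
        for (String entry : entries) {
            State s = start;
            for (String token : entry.trim().split("\\s+")) {
                s = s.next.computeIfAbsent(token, k -> new State());
            }
            s.isFinal = true;
            s.majorType = parts[1];
            s.minorType = parts.length > 2 ? parts[2] : null;
        }
    }

    /** Returns one "entry:majorType:minorType" string per longest match, left to right. */
    List<String> lookup(String text) {
        String[] tokens = text.trim().split("\\s+");
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            State s = start;
            State best = null;
            int bestEnd = i;
            for (int j = i; j < tokens.length; j++) {
                s = s.next.get(tokens[j]);
                if (s == null) break;                         // no transition: stop extending
                if (s.isFinal) { best = s; bestEnd = j + 1; } // remember the longest match so far
            }
            if (best == null) { i++; continue; }
            String entry = String.join(" ", Arrays.copyOfRange(tokens, i, bestEnd));
            found.add(entry + ":" + best.majorType + ":" + best.minorType);
            i = bestEnd;
        }
        return found;
    }

    public static void main(String[] args) {
        GazetteerSketch g = new GazetteerSketch();
        g.addList("referee_names_euro2000.lst:championships_info:referee",
                List.of("Pierluigi Collina"));
        g.addList("players_goalkeeper.lst:player:goalkeeper", List.of("Seaman"));
        System.out.println(g.lookup("Pierluigi Collina booked Seaman"));
        // [Pierluigi Collina:championships_info:referee, Seaman:player:goalkeeper]
    }
}
```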
8 Regular Grammars
- Java Annotation Pattern Engine (JAPE) grammars
  - Similar to the Common Pattern Specification Language (CPSL)
- Set of rules
  - LHS: regular expression over annotations
  - RHS: annotations to be added
  - Priority
  - Left and right context around the pattern
  - Java code on the RHS
- Rules are compiled into an FST over annotations
- A set of grammars can be loaded
- Rules for sentence splitting
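The following Java sketch makes the LHS/RHS/priority vocabulary concrete. It is much simplified and rests on several assumptions:
- Input is a linear sequence of annotations. JAPE proper matches over an annotation graph.
- LHS patterns have a fixed length, with no `?`, `*` or context.
- Control follows JAPE's appelt style. At each position the longest match wins; ties go to the higher priority, then to the earlier rule.

All class and rule names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, much-simplified sketch of JAPE-style rule application over a
 *  linear sequence of annotations (real JAPE works over an annotation graph). */
class JapeSketch {
    /** An input annotation: a type plus features, e.g. Token{kind=number}. */
    static final class Ann {
        final String type;
        final Map<String, String> features;
        Ann(String type, Map<String, String> features) { this.type = type; this.features = features; }
    }

    /** One LHS element, e.g. {Lookup.minorType == minutes}. */
    static final class Constraint {
        final String type, feature, value;
        Constraint(String type, String feature, String value) {
            this.type = type; this.feature = feature; this.value = value;
        }
        boolean matches(Ann a) {
            return a.type.equals(type) && value.equals(a.features.get(feature));
        }
    }

    /** LHS: a fixed sequence of constraints; RHS: the type of annotation to add. */
    static final class Rule {
        final String name, outputType;
        final int priority;
        final List<Constraint> lhs;
        Rule(String name, String outputType, int priority, List<Constraint> lhs) {
            this.name = name; this.outputType = outputType; this.priority = priority; this.lhs = lhs;
        }
        boolean matchesAt(List<Ann> input, int i) {
            if (lhs.isEmpty() || i + lhs.size() > input.size()) return false;
            for (int k = 0; k < lhs.size(); k++) {
                if (!lhs.get(k).matches(input.get(i + k))) return false;
            }
            return true;
        }
    }

    /** At each position keep the longest matching rule; ties go to the higher priority. */
    static List<String> apply(List<Rule> rules, List<Ann> input) {
        List<String> added = new ArrayList<>();
        int i = 0;
        while (i < input.size()) {
            Rule best = null;
            for (Rule r : rules) {
                if (!r.matchesAt(input, i)) continue;
                if (best == null || r.lhs.size() > best.lhs.size()
                        || (r.lhs.size() == best.lhs.size() && r.priority > best.priority)) {
                    best = r;
                }
            }
            if (best == null) { i++; continue; }
            int end = i + best.lhs.size();
            added.add(best.outputType + " " + i + "-" + end + " rule=" + best.name);
            i = end;  // consume the matched annotations
        }
        return added;
    }

    public static void main(String[] args) {
        List<Ann> input = List.of(
                new Ann("Token", Map.of("kind", "number")),
                new Ann("Lookup", Map.of("minorType", "minutes")));
        Rule timeStamp = new Rule("TimeStamp5", "TimeStamp", 10, List.of(
                new Constraint("Token", "kind", "number"),
                new Constraint("Lookup", "minorType", "minutes")));
        Rule number = new Rule("BareNumber", "Number", 50, List.of(
                new Constraint("Token", "kind", "number")));
        System.out.println(apply(List.of(timeStamp, number), input));
        // [TimeStamp 0-2 rule=TimeStamp5]  (the longer match beats the higher priority)
    }
}
```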
9 Rules
- Rule: TimeStamp5
      (
        ({Token.kind == number})
        (SPACE)?
        ({Lookup.minorType == minutes})
      ):annotate
      ({Token.string == "..."})
      -->
      :annotate.TimeStamp = {rule = "TimeStamp5"}
- England 6 - 1 Yugoslavia -> Team1 = England, Team2 = Yugoslavia, Score1 = 6, Score2 = 1
- Rule: AddValueStateOfGame1
      ({StateOfGame.rule == "rule1"}):annotate
      -->
      :annotate { Java code }
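A single regular expression can approximate the result annotation in the score example. This Java sketch assumes that a team name is a run of capitalised words and that a result is written `Team1 n - m Team2`. `ResultSketch` is a hypothetical class, not part of GATE.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: fill Team1/Team2/Score1/Score2 from a "Team1 n - m Team2" line. */
class ResultSketch {
    // A team name: one or more capitalised words, e.g. "Czech Republic".
    private static final String TEAM = "\\p{Lu}\\p{L}*(?: \\p{Lu}\\p{L}*)*";
    private static final Pattern RESULT = Pattern.compile(
            "(" + TEAM + ")\\s+(\\d+)\\s*-\\s*(\\d+)\\s+(" + TEAM + ")");

    static Map<String, String> extract(String line) {
        Map<String, String> slots = new LinkedHashMap<>();
        Matcher m = RESULT.matcher(line);
        if (m.find()) {
            slots.put("Team1", m.group(1));
            slots.put("Team2", m.group(4));
            slots.put("Score1", m.group(2));
            slots.put("Score2", m.group(3));
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(extract("England 6 - 1 Yugoslavia"));
        // {Team1=England, Team2=Yugoslavia, Score1=6, Score2=1}
        System.out.println(extract("Full Time. Holland 1 - 0 Czech Republic"));
        // {Team1=Holland, Team2=Czech Republic, Score1=1, Score2=0}
    }
}
```

Some variants do not match this single pattern, for example `Holland 1 Czech Rep 0` or lines with scorers between the team names. Those would need further rules.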
10 NE Recognition
- Holland 1 - 0 Czech Republic
- Full Time. Holland 1 - 0 Czech Republic
- Holland 1 Czech Rep 0
- Germany Scholl 28 1 - 1 Romania Moldovan 5
- England: Seaman, G. Neville, Adams,
- Holland (4-3-2-1) Van der Sar Reiziger, Stam (Konterman, 75min),
- France 1. Bernard Lama 19. Christian Karembeu, 18. Franck...
- Gazetteer look-up and classification
  - Seaman -> Player, Goalkeeper, England
- Cascade of JAPE grammars
  - Players (Name and Position)
  - Teams (National and Collective)
  - Substitution (On, Off, Time)
  - Lists of Players (all playing, all substitutes)
  - Formation
  - Temporal Expressions (General)
  - Teams Playing
  - State of Game
  - Time Stamps
  - Results (partial, final)
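Two of the annotation types listed above are Formation and Substitution (On, Off, Time). The Java sketch below reads both from the Holland line-up string using regular expressions. It assumes that `Stam (Konterman, 75min)` means Konterman came on for Stam in the 75th minute. The patterns and the `LineupSketch` class are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: Formation and Substitution (On, Off, Time) from a line-up string. */
class LineupSketch {
    // A formation such as "(4-3-2-1)": single digits joined by hyphens.
    static final Pattern FORMATION = Pattern.compile("\\((\\d(?:-\\d)+)\\)");
    // "Stam (Konterman, 75min)": player off, then (player on, minute).
    static final Pattern SUBSTITUTION =
            Pattern.compile("(\\p{L}+) \\((\\p{L}[\\p{L} ]*), (\\d+) ?min\\)");

    static final class Substitution {
        final String off, on;
        final int minute;
        Substitution(String off, String on, int minute) {
            this.off = off;
            this.on = on;
            this.minute = minute;
        }
        @Override public String toString() {
            return "Substitution(off=" + off + ", on=" + on + ", time=" + minute + ")";
        }
    }

    /** Returns the first formation found, or null if there is none. */
    static String formation(String line) {
        Matcher m = FORMATION.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    static List<Substitution> substitutions(String line) {
        List<Substitution> subs = new ArrayList<>();
        Matcher m = SUBSTITUTION.matcher(line);
        while (m.find()) {
            subs.add(new Substitution(m.group(1), m.group(2), Integer.parseInt(m.group(3))));
        }
        return subs;
    }

    public static void main(String[] args) {
        String line = "Holland (4-3-2-1) Van der Sar Reiziger, Stam (Konterman, 75min),";
        System.out.println(formation(line));      // 4-3-2-1
        System.out.println(substitutions(line));  // [Substitution(off=Stam, on=Konterman, time=75)]
    }
}
```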
11 Other Finite State Components
- Lemmatiser
  - List of exceptions (biases analysed as bias + s)
    - biases > root=bias, affix=s
  - Rules for regular forms (expresses analysed as express + s)
    - ANY DOUBLE es > root=ANY DOUBLE, affix=s
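The lemmatiser works in two stages: it consults the exception list first and only then applies the regular suffix rules. The minimal Java sketch below follows that order. The doubled-letter rule comes from the slide. The fallback `ANY s` rule and the class name `LemmatiserSketch` are assumptions added for illustration.

```java
import java.util.Map;

/** Hypothetical sketch: an exception list consulted first, then regular suffix rules. */
class LemmatiserSketch {
    // Exception list: word -> {root, affix}.
    static final Map<String, String[]> EXCEPTIONS = Map.of("biases", new String[] {"bias", "s"});

    /** Returns {root, affix}; affix is "" when no rule applies. */
    static String[] analyse(String word) {
        String[] exception = EXCEPTIONS.get(word);
        if (exception != null) return exception.clone();
        int n = word.length();
        // ANY DOUBLE es > root=ANY DOUBLE, affix=s   (expresses -> express + s)
        if (n >= 4 && word.endsWith("es") && word.charAt(n - 3) == word.charAt(n - 4)) {
            return new String[] {word.substring(0, n - 2), "s"};
        }
        // Assumed fallback: ANY s > root=ANY, affix=s, unless the word ends in "ss".
        if (n >= 2 && word.endsWith("s") && word.charAt(n - 2) != 's') {
            return new String[] {word.substring(0, n - 1), "s"};
        }
        return new String[] {word, ""};
    }

    public static void main(String[] args) {
        for (String w : new String[] {"biases", "expresses", "players", "pass"}) {
            String[] a = analyse(w);
            System.out.println(w + " -> root=" + a[0] + ", affix=" + a[1]);
        }
    }
}
```

In this sketch, removing the exception entry would make the fallback rule analyse `biases` as `biase` + s. That error is what the exception list guards against.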
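- POS tagger
  - Lexicon (word followed by its possible tags)
    - beginning VBG NN
    - observed VBD VBN JJ
  - Rules (change the first tag to the second in the given context)
    - VB NN PREV1OR2TAG DT: VB becomes NN if one of the two preceding tags is DT
    - VB VBP PREVTAG NNS: VB becomes VBP if the preceding tag is NNS
    - IN JJ SURROUNDTAG DT NN: IN becomes JJ between DT and NN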
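The lexicon-plus-rules design reads naturally as transformation-based (Brill-style) tagging. Each word first receives its first lexicon tag, and the contextual rules are then applied in order. The Java sketch below implements the three rule templates shown. The extra lexicon entries, the `NN` default for unknown words and the class name `BrillSketch` are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of transformation-based (Brill-style) tagging:
 *  tag each word with its first lexicon tag, then apply contextual rules in order. */
class BrillSketch {
    // "beginning" and "observed" come from the slide; the other entries are illustrative additions.
    static final Map<String, String[]> LEXICON = Map.of(
            "beginning", new String[] {"VBG", "NN"},
            "observed", new String[] {"VBD", "VBN", "JJ"},
            "the", new String[] {"DT"},
            "players", new String[] {"NNS"},
            "play", new String[] {"VB", "NN", "VBP"},
            "outside", new String[] {"IN", "JJ"},
            "pass", new String[] {"NN", "VB"});

    interface Rule { void apply(String[] tags); }

    /** FROM TO PREVTAG X: change FROM to TO if the previous tag is X. */
    static Rule prevTag(String from, String to, String x) {
        return tags -> {
            for (int i = 1; i < tags.length; i++)
                if (tags[i].equals(from) && tags[i - 1].equals(x)) tags[i] = to;
        };
    }

    /** FROM TO PREV1OR2TAG X: change FROM to TO if either of the two previous tags is X. */
    static Rule prev1or2Tag(String from, String to, String x) {
        return tags -> {
            for (int i = 1; i < tags.length; i++)
                if (tags[i].equals(from)
                        && (tags[i - 1].equals(x) || (i >= 2 && tags[i - 2].equals(x)))) tags[i] = to;
        };
    }

    /** FROM TO SURROUNDTAG X Y: change FROM to TO if the previous tag is X and the next is Y. */
    static Rule surroundTag(String from, String to, String x, String y) {
        return tags -> {
            for (int i = 1; i + 1 < tags.length; i++)
                if (tags[i].equals(from) && tags[i - 1].equals(x) && tags[i + 1].equals(y)) tags[i] = to;
        };
    }

    // The three rules from the slide, applied in this order.
    static final List<Rule> RULES = List.of(
            prev1or2Tag("VB", "NN", "DT"),
            prevTag("VB", "VBP", "NNS"),
            surroundTag("IN", "JJ", "DT", "NN"));

    static String[] tag(String... words) {
        String[] tags = new String[words.length];
        for (int i = 0; i < words.length; i++) {
            String[] candidates = LEXICON.get(words[i]);
            tags[i] = candidates == null ? "NN" : candidates[0];  // assumed default for unknown words
        }
        for (Rule r : RULES) r.apply(tags);
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tag("the", "play")));             // [DT, NN]
        System.out.println(Arrays.toString(tag("players", "play")));         // [NNS, VBP]
        System.out.println(Arrays.toString(tag("the", "outside", "pass")));  // [DT, JJ, NN]
    }
}
```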
12 Processing Resource
13 An Application
14 Language Resource
15 Visualization
16 Visualization
17 Prolog Components
- Named entities and semantic annotations
  - feed the Prolog back-end components
- Bottom-up chart parsing
  - Context-free grammar
  - Semantic rules
- Discourse interpretation
  - Entity and event co-reference
  - Presuppositions and consequences
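The Prolog parser itself is not shown in these slides. As an illustration of bottom-up chart parsing with a context-free grammar, the sketch below is a CKY recogniser in Java. CKY is one bottom-up chart algorithm and requires binary rules, i.e. Chomsky normal form. The grammar, the sentence and the `CkySketch` class are hypothetical, and the actual Prolog component may use a different chart strategy.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of bottom-up chart recognition (CKY) for a CFG in Chomsky normal form. */
class CkySketch {
    private final Map<String, Set<String>> lexical = new HashMap<>();  // word -> {A | A -> word}
    private final Map<String, Set<String>> binary = new HashMap<>();   // "B C" -> {A | A -> B C}

    void word(String category, String word) {
        lexical.computeIfAbsent(word, k -> new HashSet<>()).add(category);
    }

    void rule(String parent, String left, String right) {
        binary.computeIfAbsent(left + " " + right, k -> new HashSet<>()).add(parent);
    }

    /** Fills chart cells for ever longer spans from the words upwards; true if start spans all. */
    boolean recognises(String start, List<String> words) {
        int n = words.size();
        if (n == 0) return false;
        Map<String, Set<String>> chart = new HashMap<>();  // key "i,j" -> categories over words[i..j)
        for (int i = 0; i < n; i++) {
            chart.put(i + "," + (i + 1), new HashSet<>(lexical.getOrDefault(words.get(i), Set.of())));
        }
        for (int len = 2; len <= n; len++) {
            for (int i = 0; i + len <= n; i++) {
                int j = i + len;
                Set<String> cell = new HashSet<>();
                for (int k = i + 1; k < j; k++) {          // every split point of span i..j
                    for (String b : chart.get(i + "," + k)) {
                        for (String c : chart.get(k + "," + j)) {
                            cell.addAll(binary.getOrDefault(b + " " + c, Set.of()));
                        }
                    }
                }
                chart.put(i + "," + j, cell);
            }
        }
        return chart.get("0," + n).contains(start);
    }

    public static void main(String[] args) {
        CkySketch g = new CkySketch();
        g.rule("S", "NP", "VP");
        g.rule("VP", "V", "NP");
        g.word("NP", "Neville");
        g.word("NP", "Scholes");
        g.word("V", "feeds");
        System.out.println(g.recognises("S", List.of("Neville", "feeds", "Scholes")));  // true
        System.out.println(g.recognises("S", List.of("feeds", "Neville", "Scholes")));  // false
    }
}
```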