Title: Local context
1. A confidence-based framework for disambiguating geographic terms
Erik Rauch, Michael Bukatin, and Kenneth Baker
MetaCarta, Inc.
3. "wine in Europe"
4. Al Hamra ("red" in Arabic)
6. Local and non-local information
More non-local information -> too many states to get probabilities
Madison's downtown
Wisconsin
Milwaukee
7. Candidate places
8. Local context
"resident of Madison"
Madison, WI; Madison, ID; Madison, CT; Madison, KY
9. Context affects confidence
- Increase or decrease c(p,n) based on the strength of context words
- "by Madison" vs. "President Madison"
- Context words can be added manually or automatically
- and/or use an HMM
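
A minimal sketch of how a context word might raise or lower c(p,n). The slides do not give the actual rule, so everything here is an assumption: the cue table `LEFT_CUES`, its multiplier values, and the function name `adjust_confidence` are hypothetical. The sketch looks only at the word immediately to the left of the name.

```python
# Hypothetical cue table: word immediately before the name -> multiplier
# applied to the confidence. The values are illustrative only.
LEFT_CUES = {
    "in": 1.5,
    "of": 1.2,         # e.g. "resident of Madison"
    "by": 0.5,         # e.g. "by Madison"
    "president": 0.2,  # e.g. "President Madison"
}


def adjust_confidence(base, tokens, i, cues=LEFT_CUES):
    """Scale the base confidence of the name at tokens[i] by the cue to its left.

    Unknown context words leave the confidence unchanged; the result is
    clamped to the range [0, 1].
    """
    factor = cues.get(tokens[i - 1].lower(), 1.0) if i > 0 else 1.0
    return max(0.0, min(1.0, base * factor))


geographic = adjust_confidence(0.5, ["resident", "of", "Madison"], 2)
personal = adjust_confidence(0.5, ["President", "Madison"], 1)
```

With these made-up multipliers, "resident of Madison" ends up with a higher confidence than "President Madison", which is the direction the slide describes.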
10. Local context problems
"Madison family attractions"
Milwaukee
Madison, WI; Madison, ID; Madison, CT; Madison, KY
11. Using spatial patterns of geographic references
12. Increase c(p,n) based on number of other references
- Enclosing regions or nearby points
Madison
Wisconsin
Milwaukee
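
The spatial heuristic above can be sketched as follows, assuming a simple additive boost per supporting reference. The data layout, the 200 km radius, the 0.1 step, and the function names are all hypothetical, and the coordinates are approximate.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def spatial_boost(candidates, other_regions, other_points,
                  radius_km=200.0, step=0.1):
    """Raise each candidate's confidence once per supporting reference.

    A reference supports a candidate if it names the candidate's enclosing
    region or is a point within radius_km of it. Results are capped at 1.0.
    """
    boosted = {}
    for label, cand in candidates.items():
        support = sum(1 for r in other_regions if r == cand["region"])
        support += sum(1 for p in other_points
                       if haversine_km(cand["coord"], p) <= radius_km)
        boosted[label] = min(1.0, cand["conf"] + step * support)
    return boosted


# Approximate coordinates, for illustration only.
candidates = {
    "Madison, WI": {"conf": 0.3, "coord": (43.07, -89.40), "region": "Wisconsin"},
    "Madison, CT": {"conf": 0.3, "coord": (41.28, -72.60), "region": "Connecticut"},
}
milwaukee = (43.04, -87.91)
boosted = spatial_boost(candidates, ["Wisconsin"], [milwaukee])
```

In this toy document, the mentions of Wisconsin and Milwaukee both support Madison, WI and neither supports Madison, CT, so only the Wisconsin candidate gains confidence.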
13. Pitfalls
14. Training
- "Philadelphia" is usually geographic; "Bend" usually isn't
- If name n often refers to point p in documents, give (n,p) high confidence to start with
- Use average confidence in a large corpus
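
One way to read the training step above is as a relative-frequency estimate over a tagged corpus. The sketch below assumes that reading; the input format, the function name `initial_confidences`, and the toy counts are hypothetical and not taken from the slides.

```python
from collections import Counter, defaultdict


def initial_confidences(tagged_mentions):
    """Estimate a starting confidence for each (name, place) pair.

    tagged_mentions is an iterable of (name, referent) pairs, where referent
    is a place identifier, or None when the name is used non-geographically.
    The confidence is the fraction of all occurrences of the name that refer
    to that place.
    """
    totals = Counter()
    by_place = defaultdict(Counter)
    for name, referent in tagged_mentions:
        totals[name] += 1
        if referent is not None:
            by_place[name][referent] += 1
    return {
        (name, place): count / totals[name]
        for name, places in by_place.items()
        for place, count in places.items()
    }


# Toy corpus with made-up counts.
mentions = (
    [("Philadelphia", "Philadelphia, PA")] * 3
    + [("Philadelphia", None)]
    + [("Bend", "Bend, OR")]
    + [("Bend", None)] * 3
)
conf = initial_confidences(mentions)
```

On this toy data "Philadelphia" starts at 0.75 and "Bend" at 0.25, matching the intuition that one name is usually geographic and the other usually is not.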
15. Training (cont'd)
- Extract local linguistic contexts that often occur with geographic names in tagged corpora
- Or train an HMM
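
As a rough illustration of extracting local contexts from a tagged corpus, the sketch below counts how often the word preceding a tagged name is followed by a geographic use. The tag names `"GEO"` and `"OTHER"`, the `min_count` threshold, and the function name are assumptions made for the example; the HMM alternative is not shown.

```python
from collections import Counter


def context_geo_rates(tagged_sentences, min_count=2):
    """Return, per preceding word, the fraction of tagged names that are geographic.

    Each sentence is a list of (token, tag) pairs; tag is "GEO" or "OTHER"
    for names and None for ordinary words. Words seen before fewer than
    min_count names are dropped as too rare to trust.
    """
    seen = Counter()
    geo = Counter()
    for sentence in tagged_sentences:
        for i, (_, tag) in enumerate(sentence):
            if tag is None or i == 0:
                continue
            prev = sentence[i - 1][0].lower()
            seen[prev] += 1
            if tag == "GEO":
                geo[prev] += 1
    return {w: geo[w] / n for w, n in seen.items() if n >= min_count}


# Toy tagged corpus, made up for illustration.
corpus = [
    [("resident", None), ("of", None), ("Madison", "GEO")],
    [("mayor", None), ("of", None), ("Milwaukee", "GEO")],
    [("President", None), ("Madison", "OTHER")],
    [("President", None), ("Lincoln", "OTHER")],
]
rates = context_geo_rates(corpus)
```

Here "of" comes out as a strong geographic context and "president" as a strong non-geographic one.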
16. Relevance
Query: "cheese in France"
- Several dimensions to relevance
- Traditional textual relevance of query terms
- Georelevance
17. Georelevance
- Depends on
- Attributes of the geotext, e.g. document frequency, font size, position
- Geoconfidence
- Aim: the combination reflects the user's preferred balance between recall and correctness of the geographic reference
- e.g. combining georelevance, query term relevance, and geoconfidence
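
The slide does not preserve how the scores are combined, so the following is only one possible combination, shown to make the recall/correctness trade-off concrete. The formula, the parameter `alpha`, and the function name `combined_score` are assumptions.

```python
def combined_score(term_relevance, geoconfidence, alpha=1.0):
    """Combine textual relevance with geoconfidence (assumed form).

    alpha = 0 ignores geoconfidence, which favors recall; larger alpha
    penalizes uncertain geographic references more strongly, which favors
    correctness. Both inputs are expected to lie in [0, 1].
    """
    return term_relevance * geoconfidence ** alpha


# Two hypothetical documents: (term relevance, geoconfidence).
doc_a = (0.8, 0.5)  # strong text match, uncertain place
doc_b = (0.6, 0.9)  # weaker text match, confident place

recall_first = [combined_score(*d, alpha=0.0) for d in (doc_a, doc_b)]
correct_first = [combined_score(*d, alpha=1.0) for d in (doc_a, doc_b)]
```

With `alpha=0.0` document A ranks first; with `alpha=1.0` document B overtakes it, so a single parameter expresses the user's preferred balance.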
18. Conclusion
- Ambiguity problem much worse with large gazetteers
- Can use probabilistic methods where feasible (local information), combine with confidence-based heuristics