Treatment of Duplicates in the ADL Gazetteer

Provided by: jordanh6

Transcript and Presenter's Notes

1
Treatment of Duplicates in the ADL Gazetteer
  • Jordan Hastings and Linda Hill
  • Alexandria Digital Library Project, Department of
    Geography, University of California, Santa Barbara

2
Introduction (1)
  • What is a gazetteer?
  • Spatial dictionary of named and typed features
    located in the environment.
  • Traditional: Appendix in Atlas
  • Digital: Computer Database

3
Introduction (2)
  • What are duplicates?
  • Features that somehow conflict in their
    names, types, or locations
  • One feature - many names
  • One name - many features

4
Introduction (3)
  • What is the ADL Gazetteer?
  • http://www.alexandria.ucsb.edu/gazetteer
  • Key access component for digital geodata
  • Pilot implementation of publicly-accessible
    placename (feature) database service
  • Fundamental GI Science research activity

5
Outline of Talk
  • Tour of gazetteer-related issues in
    California-Nevada, esp. Lake Tahoe
  • Discussion of approach to resolving issues
    regarding duplicates
  • Demonstration of software that implements the
    approach

7
California & Nevada
Source data: ESRI ArcView 3.2
10
Names & Features (4)
  • Lake Bigler, thru 1920s
  • Lake Bonpland (also Bondland), thru 1890s
  • Da-ow-a-ga, thru 1850s

15
Discussion (1): Definitions
  • DEF Feature: Humanly recognizable, persistent
    phenomenon in the environment
  • Each feature integrates & interrelates three
    different kinds of attributes (with special
    issues)
  • Location (framework scale, accuracy)
  • Name (linguistics, culture)
  • Type (taxonomy, ontology)
  • DEF Gazetteer: Database of Features, i.e., a
    spatial dictionary, continually evolving

16
Discussion (2): Approach
  • Multiple metrics of feature similarity
  • Geospatial
  • Proximity (familiarity)
  • Containment (hierarchy)
  • Textual
  • Notation (as written)
  • Diction (as spoken)
  • Weighted combinations of these metrics

17
Discussion (3): Specifics
  • Geospatial Metrics (w/Subtleties)
  • Great Circle Distance
  • Bounding Box Topology (Polygons may not be
    better!)
  • Inside
  • Near-to
  • both scaled areally

[Figure labels: Twin Lakes, Half Moon Lake]
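
The two geospatial measures named on this slide could be sketched roughly as follows. This is only an illustration, not the authors' software: the haversine formula stands in for great-circle distance on a spherical Earth, and the 6371 km radius, the `BBox` type, and its `inside` method are assumptions introduced here. The "near-to" test and the areal scaling are left out because the slide does not say how they are computed.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; spherical Earth model


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two lat/lon points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


@dataclass(frozen=True)
class BBox:
    """Hypothetical axis-aligned bounding box, in decimal degrees."""
    west: float
    south: float
    east: float
    north: float

    def inside(self, other):
        """True if this box lies entirely within `other`."""
        return (other.west <= self.west and self.east <= other.east
                and other.south <= self.south and self.north <= other.north)
```

A bounding-box test of this kind is cheap, but two distinct lakes with overlapping boxes would still need a further check, which may be one of the subtleties the slide alludes to.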
18
Discussion (4): Specifics
  • Textual Metrics (w/Subtleties)
  • Hamming Distance (hd)
  • hd(Lake, Pond) = 4
  • Edit Distance (ed)
  • ed(Lake, Lakes) = 1
  • Soundex (sdx)
  • sdx(Pyramid Lake) = P653
  • sdx(Lake Tahoe) = T000

Soundex letter codes:
  1  B, P, F, V
  2  C, S, K, G, J, Q, X, Z
  3  D, T
  4  L
  5  M, N
  6  R
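
The three textual metrics above can be reproduced with a short sketch. It uses the standard definitions of Hamming distance, Levenshtein edit distance, and American Soundex, and is not taken from the authors' software. The function names are chosen here for illustration. Note that the slide's Soundex values match the codes of the specific words "Pyramid" and "Tahoe", so the sketch encodes one word at a time.

```python
# Letter-to-digit table, as listed on the slide.
SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {"1": "BPFV", "2": "CSKGJQXZ", "3": "DT",
                           "4": "L", "5": "MN", "6": "R"}.items()
    for letter in letters
}


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def soundex(word):
    """American Soundex code (one letter plus three digits) for one word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = digit
    return (code + "000")[:4]


print(hamming("Lake", "Pond"))         # 4
print(edit_distance("Lake", "Lakes"))  # 1
print(soundex("Pyramid"))              # P653
print(soundex("Tahoe"))                # T000
```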
19
Discussion (5): Specifics
  • Canonical Names
  • Tahoe
  • Lake Tahoe -> Tahoe, Lake
  • Tahoe, Lake
  • but
  • Lake Bigler -> Bigler, Lake
  • Big Frog Lake -> Big Frog, Lake
    or Frog, Big, Lake ?
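
One way to produce canonical forms like those above is to move a leading or trailing generic term to the end, after a comma. The sketch below is a guess at such a rule, not the authors' method: the `GENERIC_TERMS` list and the `canonical_name` function are hypothetical. It settles the Big Frog Lake question one way ("Big Frog, Lake") and does not attempt the alternative "Frog, Big, Lake".

```python
# Hypothetical list of generic (type-like) terms; a real gazetteer
# would draw these from its feature type thesaurus.
GENERIC_TERMS = {"lake", "lakes", "pond", "river", "creek", "mount"}


def canonical_name(name):
    """Move a leading or trailing generic term to the end, after a comma."""
    words = name.split()
    if len(words) < 2:
        return name
    if words[0].lower() in GENERIC_TERMS:
        return " ".join(words[1:]) + ", " + words[0]
    if words[-1].lower() in GENERIC_TERMS:
        return " ".join(words[:-1]) + ", " + words[-1]
    return name


for raw in ["Tahoe", "Lake Tahoe", "Lake Bigler", "Big Frog Lake"]:
    print(raw, "->", canonical_name(raw))
```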
20
Demonstration (1) - Background
  • GNIS Dataset: http://geoname.usgs.gov
  • Public product of USGS / Mapping Div. for BGN
  • Centroid point features, from many 1:100K- maps
  • Web-accessible, updated ad hoc
  • GDT Dataset: http://www.gdt1.com
  • Private product, sold into logistics mapping
    firms
  • Polygon & line features, from DLGs, other sources
  • CD-publication (75 for U.S.), updated quarterly

21
demo
22
Demonstration (2) - Processing
  • De-Duping GNIS
  • 1) By Name -- sampled thru Cs
  • 2) By Location -- sampled thru Cs
  • Full (prior) results viewed
  • Matching GNIS to GDT
  • 3) By Combination -- run to completion
  • Statistics, Metrics discussed & reviewed
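
A first "by name" pass of the kind listed above might group records whose names normalize to the same key. The sketch below shows that idea only. The `name_key` normalization (lowercase, punctuation dropped, words sorted), the function names, and the sample records are all invented for illustration and are not taken from GNIS or from the demonstrated software.

```python
from collections import defaultdict


def name_key(name):
    """Lowercase, replace punctuation with spaces, and sort the words."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return " ".join(sorted(cleaned.split()))


def group_by_name(records):
    """Group (feature_id, name) pairs by key; keep groups of two or more."""
    groups = defaultdict(list)
    for feature_id, name in records:
        groups[name_key(name)].append(feature_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up sample records: (feature_id, name)
sample = [(1, "Lake Tahoe"), (2, "Tahoe, Lake"), (3, "Pyramid Lake")]
print(group_by_name(sample))
```

Each group is only a set of candidates; a location check would still be needed before treating its members as duplicates.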

23
Summary (1)
  • Features Cover a Large Territory
  • Crisp or Diffuse
  • Compact or Extended
  • Tangible or Abstract
  • Naming Features is Human Necessity
  • Linguistic Reference
  • Identity and Ownership
  • Navigation and Wayfinding

24
Summary (2)
  • Feature Names are Numerous & Various
  • Polynymous, multi-lingual
  • Suffused with linguistic conventions
  • Time-variant
  • Feature Locations also Numerous & Various
  • Projected, multi-scale
  • Obscured by cartographic conventions
  • Time-variant
  • Types, too, can be Numerous & Various

25
Summary (3)
  • Automatic Recognition of Duplicates
  • Essential to gazetteer construction
  • Relies on both geospatial & textual
    metrics; weighting of combinations is subjective
  • Results in multiple characterizations for a
    single feature in many (most) cases -> database
    & visualization implications
  • Gazetteers pushing at the limits of GIS
  • spatially, temporally, and ontologically

26
Observations
  • Features are subjective, not objective
  • Duplicate features are not problems, but clews
    to important subtleties
  • No right answer to feature-izing the
    environment. Features vary
  • Spatially (scale)
  • Temporally
  • Culturally (socially)
  • Cognitively (personally)

27
end
28
Future Work
  • Widening beyond California & Nevada
  • Adjusting metric weights, regionally
  • Testing computational costs/benefits of polygon
    vs. bounding box calculations
  • Exploring database mechanisms to deal with
    complexity of gazetteer knowledge
  • Implementing in Web-mapping GIS

29
Feature Types (1)
  • Dependable Type System
  • Because Features are Objects
  • Because Human Mind Categorizes
  • Types present in Taxonomy
  • Hierarchy is Natural in Environment
  • Because Human Mind Categorizes

30
Feature Types (2) Examples
  • Cultural Environment
  • Nations -> States -> Provinces -> Districts

31
Feature Types (2) - Examples
  • Physical Environment
  • Watersources: Springs -> Seeps
  • Watercourses: Rivers -> Streams -> Creeks
  • Waterbodies: Lakes -> Ponds -> Sloughs
    ?Glaciers

32
Feature Types (2)
  • Type Examples
  • Cultural Environment
  • Nations -> States -> Provinces -> Districts
  • Physical Environment
  • Watersources: Springs, Seeps
  • Watercourses: Rivers, Streams, Creeks
  • Waterbodies: Lakes, Ponds, Sloughs, ?Glaciers
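
The physical-environment examples can be read as a small type tree, in which closeness of two types is the number of steps between them. The sketch below assumes that reading. The `PARENT` table, the two-level layout, and the `type_distance` function are illustrative and are not the ADL feature type thesaurus. Glaciers are left out because the slide marks them with a question mark.

```python
# Child -> parent links, assumed from the slide's grouping.
PARENT = {
    "Watersources": "Physical Environment",
    "Watercourses": "Physical Environment",
    "Waterbodies": "Physical Environment",
    "Springs": "Watersources", "Seeps": "Watersources",
    "Rivers": "Watercourses", "Streams": "Watercourses",
    "Creeks": "Watercourses",
    "Lakes": "Waterbodies", "Ponds": "Waterbodies",
    "Sloughs": "Waterbodies",
}


def ancestors(feature_type):
    """Path from a type up to its root, starting with the type itself."""
    path = [feature_type]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def type_distance(a, b):
    """Edges between two types via their nearest common ancestor.

    Returns None when the types share no ancestor in the table.
    """
    steps_from_b = {t: i for i, t in enumerate(ancestors(b))}
    for steps_from_a, t in enumerate(ancestors(a)):
        if t in steps_from_b:
            return steps_from_a + steps_from_b[t]
    return None


print(type_distance("Lakes", "Ponds"))   # 2
print(type_distance("Lakes", "Creeks"))  # 4
```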

33
Fundaments (1)
  • Definition: Gazetteer: A spatial dictionary of
    named & typed features in the environment
  • Implications
  • Features uniquely identified
  • Searchable by name and type
  • Also searchable geospatially

34
Fundaments (2)
  • Duplicates: An approximate notion
  • Firm types, close in hierarchy
  • Locations close: dependent on scale
  • Names close: dependent on language, or not at
    all
  • All aspects variant in time

35
Fundaments (3)
  • Database Implications / Support
  • Custom Datatypes
  • Hierarchy
  • Geometry
  • Multiple Attribution (unlimited)
  • Names
  • Locations
  • Efficient Geospatial Processing
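
"Multiple attribution" could mean that one feature record carries any number of names and locations, each possibly time-limited. The sketch below shows one such record layout. The class and field names are hypothetical and no particular database is implied. The historical names come from the Lake Tahoe slide earlier in the talk, and the coordinates are approximate and for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class NameEntry:
    text: str
    valid_until: Optional[str] = None  # e.g. "1920s"; None = still current


@dataclass
class Feature:
    feature_id: str
    feature_type: str
    names: List[NameEntry] = field(default_factory=list)
    locations: List[Tuple[float, float]] = field(default_factory=list)  # (lat, lon)

    def current_names(self):
        """Names with no recorded end date."""
        return [n.text for n in self.names if n.valid_until is None]


tahoe = Feature(
    feature_id="example-1",           # made-up identifier
    feature_type="Lakes",
    names=[
        NameEntry("Lake Tahoe"),
        NameEntry("Lake Bigler", valid_until="1920s"),
        NameEntry("Lake Bonpland", valid_until="1890s"),
        NameEntry("Da-ow-a-ga", valid_until="1850s"),
    ],
    locations=[(39.1, -120.0)],       # approximate, illustrative only
)
print(tahoe.current_names())
```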

36
Approach (1)
  • Independent Measures of Duplicates
  • 1. Type Thesaurus Metrics
  • Inter-feature: hierarchy, explicit linkages
  • 2. Geospatial Metrics
  • Intra-feature: size, compactness, ...
  • Inter-feature: distance, overlap, ...
  • 3. Geonomial Metrics
  • Intra-feature: NL translation not considered
    yet
  • Intra-feature: stemming, soundex, substitution

37
Approach (2)
  • Unified Assessment of Duplicates
  • Weighted Combination of Measures
  • 1 Type
  • 2 Location(s)
  • 3 Name(s)
  • Geographic Visualization, over Maps
  • Final Authority of Human Cataloger

38
Gazetteer Duplicates: Processing Cycle
[Diagram labels: random features, prep, grouped features, rework]
42
Gazetteer Duplicates: Processing Cycle
[Diagram labels: random features, prep, grouped features, rework, weigh, review, accepted, suspended, post, feature database, reject, trash]
43
end
45
Tour (TBD)