Using Metadata to Link Uncertainty - PowerPoint PPT Presentation
Transcript and Presenter's Notes
1
Using Metadata to Link Uncertainty and Data
Quality
Richard Wadsworth
"Ships that pass in the night, and speak each other in passing; Only a signal shown and a distant voice in the darkness; So on the ocean of life we pass and speak one another, Only a look and a voice; then darkness again and a silence." Tales of a Wayside Inn, Part III. Henry Wadsworth Longfellow (1807-1882)

2
Data is only a representation of a phenomenon
3
Interested in
  • Information not Data
  • Users not Producers
  • What Metadata ought to be doing not what it is
    doing
  • Exploiting data not discovering data
  • Using land cover as an example

4
Information v. Reality
  • Truth, as in a single, incontrovertible and
    correct fact, simply does not exist for much
    geographical information
  • The real world is infinitely complex
  • All representations involve
    - Abstraction, Aggregation, Simplification etc.
  • Choices about representation depend on
    - Commissioning variation (who paid for it?)
    - Observer variation (what did you see?)
    - Institutional variation (why do you see it that way?)
    - Representational variation (how did you record it?)

5
Geographic Objects
Geographic objects
  • Well defined objects (e.g. Buildings, Roads)
  • Poorly defined objects
    - Vague objects (e.g. Mountains, Sand dunes)
    - Ambiguous objects
      - Discordant objects (e.g. Forests, Bogs)
      - Non-specific objects (e.g. Rural England, improved grassland)
6
An example: land cover
  • Two maps (LCMGB, LCM2000), produced 10 years
    apart, by the same people, using the same basic
    approach (automatic classification of Landsat
    ETM data)
  • In the OS SK tile (100 x 100km around Leicester)
  • In 1990: <1 ha of Bog (12 pixels)
  • In 2000: >7,500 ha of Bog (120,728 pixels)
  • Was this global change? (probably not)
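As a quick sanity check on the pixel counts above, assuming 25 m raster cells (the cell size is an assumption; the transcript does not state it):

```python
# One 25 m x 25 m cell is 0.0625 ha; this assumption makes the slide's
# pixel counts and hectare figures mutually consistent.
CELL_HA = 0.25 * 0.25  # 25 m = 0.25 hm, so area in hectares

area_1990 = 12 * CELL_HA        # 0.75 ha, i.e. "<1 ha" of Bog
area_2000 = 120_728 * CELL_HA   # 7545.5 ha, i.e. ">7,500 ha" of Bog
print(area_1990, area_2000)     # 0.75 7545.5
```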

7
Commissioning context - LCMGB
8
Commissioning context - LCM2000
9
Classes cluster in attribute space
10
Classes and relationships change
11
Specific outcome for Bog
In 1990 Bog was a land cover defined by what
could be seen: "...permanent waterlogging,
depositions of acidic peat", "permanent or
temporary standing water", "...water-logging,
perhaps with surface water". In 2000 Bog was
a priority habitat, and identification needed
ancillary data in areas with peat >0.5 m deep
(no reference to water).
12
Another example - What is a forest?
13
FAO - Forest Resource Assessments
14
Spatial characteristics also changed
15
So let's standardise everything?
  • Standards organisations want their standard to be adopted
  • Producers want to show they can follow a recipe
  • Users want reassurance
  • Mediators want to show the data to be of merchantable quality
16
Standards: an analogy
  • Your car
    - Standards are created (you must have an MOT)
    - Producers ensure their cars conform
    - Mediators (sellers) advertise compliance with the standard
    - User (buyer) is reassured that the car is OK
  • BUT people
    - Buy an AA assessment of a particular car
    - Use a Which? report (or Jeremy Clarkson?) to
      understand whether the type of car is Useful, not
      just Useable
  • For data there is no Which? report

17
Data Quality Standards
  • Once dominated by the national mapping agencies
    and software companies, now dominated by ISO, the
    Open GIS Consortium etc.
  • The big 5 of geo-spatial data quality
    standards
  • Positional Accuracy,
  • Attribute Accuracy,
  • Logical Consistency,
  • Completeness,
  • Lineage.
  • Salgé (1995) tried to introduce the concept of
    semantic accuracy, but it has largely been ignored.

18
Data Quality v. geographic objects
Nature of geographic reality:

Measure of data quality   Well defined   Vague      Discordant   Non-specific
Positional accuracy       Yes            Yes, but   No           No
Attribute accuracy        Yes            Yes, but   No           No
Logical consistency       Yes            Yes, but   No           No
Completeness              Yes            Yes, but   No           No
Lineage                   Yes            Yes        Yes          Yes

(Vague, Discordant and Non-specific are the poorly defined objects; Discordant and Non-specific together constitute Ambiguity.)
19
Uncertainty v. geographic objects
Nature of geographic reality:

Technique to process uncertainty   Well defined   Vague   Discordant   Non-specific
Probability                        Yes            No      No           No
Fuzzy sets                         (Yes)          Yes     Yes          ?
Dempster-Shafer                    (Yes)          Yes     Yes          ?
Endorsement Theory                 (Yes)          (Yes)   Yes          Yes

"Probability" here includes Monte Carlo, bootstrapping, conditional simulations, frequency, confusion matrices etc.
20
What can be done?
  • IF you stretch Metadata to include
    - Scientific and policy background (context)
    - Organisational and institutional origins of the conceptualisation (ontology)
    - How were objects measured? (epistemology)
    - How were classes specified? (semantics)

Then
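A stretched metadata record of this kind can be sketched as a simple data structure; the field names and the example values are illustrative only, not drawn from any metadata standard:

```python
from dataclasses import dataclass

# Hypothetical record with the four extended metadata fields named on the
# slide: context, ontology, epistemology and semantics.
@dataclass
class ExtendedMetadata:
    context: str       # scientific and policy background
    ontology: str      # organisational origins of the conceptualisation
    epistemology: str  # how the objects were measured
    semantics: str     # how the classes were specified

# Example values paraphrased from the LCM2000 Bog discussion earlier.
lcm2000_bog = ExtendedMetadata(
    context="LCM2000 commissioning context",
    ontology="Bog defined as a priority habitat",
    epistemology="automatic classification of Landsat ETM data plus ancillary data",
    semantics="Bog: areas with peat >0.5 m deep (no reference to water)",
)
print(lcm2000_bog.semantics)
```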
21
Semantic-Statistical Comparisons
One expert's opinion of the semantic relationship
between classes in two land cover maps (from
blue to red: expected and uncertain
relationships).
22
Assume landscape consists of segments
For class A: expected score = 18; uncertain
score = 7 (4 class B pixels + 3 class C
pixels); unexpected score = 1 (the single pixel
of class D).
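The expected/uncertain/unexpected tally can be sketched as a short scoring function; the class-relation table below is hypothetical, standing in for the expert's semantic matrix:

```python
from collections import Counter

# Hypothetical relation of each class to the target class A, standing in
# for one row of the expert's semantic-relationship matrix.
RELATION_TO_A = {"A": "expected", "B": "uncertain",
                 "C": "uncertain", "D": "unexpected"}

def score_segment(pixels, relation):
    """Tally a segment's pixels into (expected, uncertain, unexpected) scores."""
    tally = Counter(relation[p] for p in pixels)
    return tally["expected"], tally["uncertain"], tally["unexpected"]

# 18 class A pixels, 4 class B, 3 class C and one class D pixel,
# matching the worked example on the slide.
pixels = ["A"] * 18 + ["B"] * 4 + ["C"] * 3 + ["D"]
print(score_segment(pixels, RELATION_TO_A))  # (18, 7, 1)
```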
23
Segment in second classification
For class A: expected score = 19 (class X);
uncertain score = 5 (class Z); unexpected
score = 2 (class Y).
24
Combine Scores
Scores are treated as if they were probabilities,
then combined using Dempster-Shafer:

Belief = (Bel1.Bel2 + Unc1.Bel2 + Unc2.Bel1) / ß,
where ß = 1 - (Bel1.Dis2 + Bel2.Dis1)

Bel1, Bel2 = the beliefs (expected); Unc1, Unc2 =
uncertainties (uncertain); Dis1, Dis2 = disbeliefs
(unexpected).

For class A:
Bel1 = 18/26 = 0.692, Unc1 = 7/26 = 0.269, Dis1 = 1/26 = 0.038
Bel2 = 19/26 = 0.731, Unc2 = 2/26 = 0.077, Dis2 = 5/26 = 0.192
Therefore ß = 1 - (0.692×0.192 + 0.731×0.038) = 0.839
Belief = (0.692×0.731 + 0.692×0.077 + 0.731×0.269) / 0.839 = 0.901

The belief has increased; therefore we consider
that the segment is consistent for class A.
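A minimal sketch of this combination rule in code, using the class A numbers as they appear on this slide:

```python
# Dempster-Shafer combination of two (belief, uncertainty, disbelief)
# triples, as given on the slide: the denominator beta discounts the
# conflicting (belief vs disbelief) mass.
def combine(bel1, unc1, dis1, bel2, unc2, dis2):
    beta = 1 - (bel1 * dis2 + bel2 * dis1)
    return (bel1 * bel2 + unc1 * bel2 + unc2 * bel1) / beta

# Map 1: expected 18, uncertain 7, unexpected 1 (out of 26 pixels)
# Map 2: expected 19, uncertain 2, unexpected 5 (normalisation as on the slide)
belief = combine(18/26, 7/26, 1/26, 19/26, 2/26, 5/26)
print(round(belief, 3))  # 0.901
```

The combined belief (0.901) exceeds either input belief (0.692, 0.731), which is the slide's criterion for calling the segment consistent for class A.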
25
Conclusions
  • Understanding data meaning is increasingly important
    - Increased number of users
    - Spatial Data Initiatives
    - Decreased role of old-fashioned but complete
      metadata (the survey memoirs)
    - Naive belief in technology as a solution
      (standards, inter-operability etc.)
  • Metadata needs to include
    - user experience
    - producer's understanding of the data
    - origins of the information
    - expanded Logical Consistency