1
Knowledge acquisition and semantic rules
discovery for CORINE land cover mapping using
Latent Dirichlet Allocation

Sixth Conference on Image Information Mining
EUSC Torrejon Air Base
3-5 November 2009

2
Land Use Land Cover Mapping

Subject discussed at EU and international level:
CLC 2000, CLC 2006
FAO LCCS
GMES: Soil Sealing, Forest, Urban Atlas
GlobCover
CLC methodology:
High-resolution visible/IR data (Landsat, SPOT, IRS)
False- or natural-color composites
Reference topographic maps (1:50K)
Classification scheme
Visual interpretation and delineation of polygons
Quality control
Products: image data, vector layer, raster layer

3
Difficulties in Land Use Land Cover Mapping

Availability and price of EO data
Resolution characteristics (spatial, spectral, temporal) do not always (or not yet) fulfil user needs
Landsat data were successful in answering current requirements but are no longer available; they are being replaced by SPOT and IRS data
New sensor data are available but not yet used intensively; their price is very high
Methodology: low productivity, because
Most of the time, visual interpretation and on-screen delineation of land cover polygons are the main tools
Operational automatic or semi-automatic methods capable of extracting the user-defined classes are lacking; such methods exist but are not used by the production units
Solutions: pay for new sensor data and use automatic methods (i.e. stimulate research and pay for software), or get data at a low price and invest further in human operators and cheap software tools

4
Difficulties in Land Use Land Cover Mapping

Operational use of automatic or semi-automatic methods is still not very widespread
Sensors produce different data
From one day to another
From one place to another
From one orbit to another, etc.
Data coming from different sensors differ (SPOT ≠ IRS ≠ RapidEye, ERS ≠ Radarsat)
Altered (processed) data can lead to different results (raw data vs. radiometrically enhanced data)
User-defined classification schemes (nomenclatures) are application specific and not the same as the classes seen by the computer, i.e. the user is not interested in a generic "bare soil" class but wants "non-irrigated arable land"

5
Current work in ROSA

Many R&D projects aim to increase the effectiveness of extracting geo-information from EO data: ROSAR, GEOINF, SIGUR, MUTER, ROKEO, SAFER, GEOLAND
Both object-oriented and pixel-based methods are studied, with the aim of bridging the gap between high-semantic, user-defined needs and state-of-the-art classification algorithms
Results are tested and evaluated against existing LULC datasets produced and validated under operational conditions
Algorithms are tested on original data, and efforts are made to make them work on new sensor data

6
From text to images - Latent Dirichlet Allocation

Latent Dirichlet Allocation: from text to image processing
A probabilistic, Bayesian model focusing on documents, i.e. subsets of pixels (tiles)
Documents are made of words, i.e. pixels, and express a latent set of topics, i.e. classes
The whole set of words defines the vocabulary, i.e. the dynamic range
Each topic is modeled as a probability distribution over the vocabulary
Many documents together form a corpus, i.e. a satellite image
The order of words in a document is ignored (bag of words)
Based on training sets (histograms of documents), the model can discover the latent topics in documents

[Figure: text corpus → vocabulary → document → topics]
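The text-to-image analogy above can be made concrete with a short sketch. It assumes the image has already been reduced to a small vocabulary (each pixel replaced by an integer word index) and cuts it into non-overlapping square tiles, turning each tile (document) into a word-count histogram; pixel order inside a tile is discarded, as in the bag-of-words model. The function name `tile_histograms` and the toy 4x4 word map are illustrative assumptions, not part of the presented method.

```python
from collections import Counter


def tile_histograms(word_map, tile, vocab_size):
    """Cut a 2-D map of word indices into tile x tile documents.

    Returns one histogram (counts of length vocab_size) per document,
    in row-major tile order. Partial tiles at the right and bottom
    edges are dropped.
    """
    rows, cols = len(word_map), len(word_map[0])
    docs = []
    for r0 in range(0, rows - tile + 1, tile):
        for c0 in range(0, cols - tile + 1, tile):
            counts = Counter(
                word_map[r][c]
                for r in range(r0, r0 + tile)
                for c in range(c0, c0 + tile)
            )
            # Pixel positions are discarded: only word counts survive.
            docs.append([counts.get(w, 0) for w in range(vocab_size)])
    return docs


# Hypothetical 4x4 word map with a 3-word vocabulary, cut into 2x2 tiles.
word_map = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [2, 2, 0, 1],
    [2, 2, 0, 0],
]
print(tile_histograms(word_map, tile=2, vocab_size=3))
# → [[3, 0, 1], [0, 4, 0], [0, 0, 4], [3, 1, 0]]
```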
7
Latent Dirichlet Allocation
Given a vocabulary of N visual words,
for K topics a two-variable model can be derived
and then applied to an image document D
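For reference, the standard LDA model (Blei, Ng and Jordan, 2003) can be written in the notation of this slide: N vocabulary words, K topics, and a document D = (w_1, ..., w_M), where M (introduced here) is the number of words, i.e. pixels, in the tile. The "two variables" are presumably the model parameters: the Dirichlet prior α over topic proportions and the K x N topic-word matrix β. This is the textbook formulation, not necessarily the exact form used by the authors.

```latex
\begin{aligned}
\theta &\sim \operatorname{Dir}(\alpha), \quad \alpha \in \mathbb{R}_{>0}^{K} \\
z_m \mid \theta &\sim \operatorname{Mult}(\theta), \quad m = 1, \dots, M \\
w_m \mid z_m = k, \beta &\sim \operatorname{Mult}(\beta_{k}), \quad \beta_{kn} = p(w = n \mid z = k), \; n = 1, \dots, N \\
p(\theta, \mathbf{z}, D \mid \alpha, \beta) &= p(\theta \mid \alpha) \prod_{m=1}^{M} p(z_m \mid \theta)\, p(w_m \mid z_m, \beta) \\
p(D \mid \alpha, \beta) &= \int p(\theta \mid \alpha) \prod_{m=1}^{M} \sum_{k=1}^{K} p(z_m = k \mid \theta)\, p(w_m \mid z_m = k, \beta)\, d\theta
\end{aligned}
```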
8
LDA for CLC Mapping

Because CLC classes have complex semantics, the (document, topics) pair is closer to a CLC class, i.e. a document can be assigned a CLC class based on the latent topics it contains
The huge vocabulary of a satellite image (its dynamic range) can be reduced with unsupervised methods (e.g. k-means, SoilMapper) to decrease processing time; the new vocabulary is then used to identify latent topics in a document
The size of the documents (tiles) is a function of the MMU (Minimum Mapping Unit, 25 ha): 15x15 pixels for Landsat, 50x50 pixels for SPOT5
The number of topics used depends on the CLC level addressed (5, 15 or 44 classes) and directly on the unsupervised process, e.g. 35 words for a Landsat image
Based on its latent topics, a document can be labeled according to the CLC nomenclature: words → topics → classes
Semantic rules can be derived for each class in a nomenclature: Class = f(topics(words))
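The tile sizes quoted above follow from the 25 ha MMU and the pixel size. A quick check, assuming 30 m pixels for Landsat TM/ETM+ and 10 m multispectral pixels for SPOT5 (the pixel sizes are not stated on the slide):

```python
import math

MMU_M2 = 25 * 10_000  # 25 ha Minimum Mapping Unit, in square metres


def tile_side_for_mmu(pixel_size_m, mmu_m2=MMU_M2):
    """Side length, in pixels, of a square tile whose area equals the MMU."""
    return math.sqrt(mmu_m2) / pixel_size_m


def tile_area_ha(side_px, pixel_size_m):
    """Ground area of a side_px x side_px tile, in hectares."""
    return (side_px * pixel_size_m) ** 2 / 10_000


print(round(tile_side_for_mmu(30), 2), tile_area_ha(15, 30))  # Landsat → 16.67 20.25
print(tile_side_for_mmu(10), tile_area_ha(50, 10))            # SPOT5   → 50.0 25.0
```

Under these assumptions a 50x50 SPOT5 tile matches the MMU exactly, while a 15x15 Landsat tile covers about 20 ha, slightly below the roughly 16.7-pixel side that would cover 25 ha.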

[Figure: satellite image → compressed Landsat/SPOT vocabulary → document → topics → CLC class]
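Vocabulary compression can be sketched with plain k-means (Lloyd's algorithm), one of the unsupervised options named on the slide; SoilMapper itself is rule-based and is not reproduced here. Each pixel is a tuple of band values and each cluster index becomes a "visual word". The function name, the deterministic initialisation from the first k distinct pixels and the iteration cap are assumptions of this sketch.

```python
def kmeans_vocabulary(pixels, k, iters=20):
    """Compress pixel vectors into k visual words with Lloyd's k-means.

    pixels: list of equal-length tuples (one value per band).
    Returns (centroids, labels); labels[i] is the word index of pixels[i].
    """
    # Deterministic initialisation (an assumption): first k distinct pixels.
    centroids = []
    for p in pixels:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("fewer than k distinct pixels")
    centroids = [tuple(map(float, c)) for c in centroids]

    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in pixels
        ]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                new.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new.append(centroids[j])  # keep an empty cluster where it was
        if new == centroids:
            break  # converged: assignments will no longer change
        centroids = new
    return centroids, labels


# Hypothetical two-band pixels: three dark and three bright.
pixels = [(10, 12), (11, 13), (200, 190), (205, 195), (12, 11), (198, 202)]
centroids, labels = kmeans_vocabulary(pixels, k=2)
print(labels)  # → [0, 0, 1, 1, 0, 1]
```

The word map produced this way (one label per pixel) is what the tiling and histogram step consumes.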
9
MEEO SoilMapper

See Baraldi, A. et al. 2006. Automatic Spectral Rule-Based Preliminary Mapping of Calibrated Landsat TM and ETM+ Images. IEEE Transactions on Geoscience and Remote Sensing, vol. 44, no. 9 (September 2006), 2563-2585
Image data are calibrated to reflectance values using sensor-specific information
Output: a preliminary spectral map, with each pixel labeled at an intermediate semantic level
For Landsat data, 85, 41, 27 or 16 classes can be derived
Operational services are provided by MEEO through ESA SSE
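The slide does not give the calibration formula. As an assumption, a common form for Landsat converts digital numbers (DN) to at-sensor radiance, L = gain · DN + bias, and then to top-of-atmosphere (TOA) reflectance, ρ = π · L · d² / (ESUN · cos θs), where d is the Earth-Sun distance in AU, ESUN the band's mean exoatmospheric solar irradiance and θs the solar zenith angle. In practice gain, bias, ESUN, d and sun elevation come from sensor metadata; the numbers below are placeholders, not real calibration coefficients.

```python
import math


def dn_to_radiance(dn, gain, bias):
    """At-sensor spectral radiance L = gain * DN + bias."""
    return gain * dn + bias


def toa_reflectance(radiance, esun, earth_sun_dist_au, sun_elevation_deg):
    """TOA reflectance rho = pi * L * d^2 / (ESUN * cos(solar zenith))."""
    zenith = math.radians(90.0 - sun_elevation_deg)  # zenith = 90 deg - elevation
    return math.pi * radiance * earth_sun_dist_au ** 2 / (esun * math.cos(zenith))


# Placeholder coefficients, for illustration only.
L = dn_to_radiance(dn=100, gain=0.8, bias=-2.0)
rho = toa_reflectance(L, esun=1500.0, earth_sun_dist_au=1.0, sun_elevation_deg=90.0)
print(L, round(rho, 4))
```

Dividing by ESUN and the solar geometry is what makes reflectance comparable across dates and sensors, which is the property a rule-based spectral mapper relies on.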

10
CLC Mapping Workflow
[Figure: raw data → reduced vocabulary → topics on pixels → topics on documents]
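The step from word histograms to topics on documents can be sketched with LDA fitted by collapsed Gibbs sampling (Griffiths and Steyvers, 2004), one common inference method; the presentation does not say which inference procedure was used. The function names, the symmetric priors `alpha` and `beta`, the iteration count and the toy corpus are all assumptions of this sketch.

```python
import random


def histogram_to_tokens(hist):
    """Expand a word-count histogram into a list of word indices."""
    return [w for w, count in enumerate(hist) for _ in range(count)]


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=300, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic proportions."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # topic counts per document
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                                # total tokens per topic
    z = []                                             # topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)  # random initial topic
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the current token out of all counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | rest), up to a normalising constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Posterior-mean estimate of each document's topic proportions (theta).
    return [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]


# Hypothetical toy corpus: two cover types that use disjoint words (0-1 vs 2-3).
hists = [[8, 7, 0, 0], [6, 9, 0, 0], [0, 0, 7, 8], [0, 0, 9, 6]]
docs = [histogram_to_tokens(h) for h in hists]
theta = lda_gibbs(docs, n_topics=2, vocab_size=4)
dominant = [max(range(2), key=row.__getitem__) for row in theta]
print(dominant)
```

`dominant` holds each document's most probable topic. On a corpus this small the sampler can settle in a local mode that depends on the seed; the tile grids of a real image provide far more documents.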
11
Tests results

Tests were performed on Landsat data subsets of 600x600 pixels and on SPOT data subsets of 2000x2000 pixels; MEEO SoilMapper was used for the intermediate classification (vocabulary compression)
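Combining these subset sizes with the tile sizes given for LDA-based CLC mapping (15x15 pixels for Landsat, 50x50 for SPOT5) gives the number of documents per subset, assuming non-overlapping tiles (the tiling scheme is not stated):

```python
def n_documents(subset_px, tile_px):
    """Number of non-overlapping square tiles (documents) in a square subset."""
    per_side = subset_px // tile_px  # whole tiles along one side
    return per_side * per_side


print(n_documents(600, 15))   # Landsat subset, 15x15 tiles → 1600
print(n_documents(2000, 50))  # SPOT subset, 50x50 tiles   → 1600
```

Under these assumptions both test subsets form a 40x40 grid, i.e. 1,600 documents each.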

12
Tests results

Example 1: Landsat data, 27 words/topics, CLC Level 1 classes
Overall accuracy: 92%

13
Tests results

Example 2: Landsat data, 41 words/topics, CLC Level 2 classes
Overall accuracy: 70%
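Overall accuracy is conventionally the share of correctly labeled samples, i.e. the trace of the confusion matrix divided by its total. The slides do not show the matrices or say whether accuracy was counted per document or per pixel, so the sketch below uses a made-up 3-class matrix purely for illustration.

```python
def overall_accuracy(confusion):
    """Overall accuracy (%) from a square confusion matrix.

    confusion[i][j] = number of samples of reference class i labeled as class j.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))  # diagonal
    total = sum(map(sum, confusion))
    return 100.0 * correct / total


# Made-up matrix, not the presentation's data.
toy = [[40, 3, 2],
       [5, 30, 5],
       [0, 5, 10]]
print(overall_accuracy(toy))  # → 80.0
```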

14
Semantic rules

The current approach allows one to establish semantic rules that explain high-level semantics in terms of intermediate-level semantics
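One simple way to turn Class = f(topics(words)) into explicit rules, assumed here for illustration only, is to map each document's dominant topic to the CLC class it most often carries in a labeled training set. The majority-vote rule, the function names and the topic proportions below are hypothetical; the class names are CLC Level 2 labels.

```python
from collections import Counter, defaultdict


def dominant_topic(row):
    """Index of the largest topic proportion in one document."""
    return max(range(len(row)), key=row.__getitem__)


def learn_topic_rules(theta, labels):
    """Derive 'dominant topic -> class' rules by majority vote.

    theta: per-document topic proportions; labels: reference class per document.
    """
    votes = defaultdict(Counter)
    for row, label in zip(theta, labels):
        votes[dominant_topic(row)][label] += 1
    return {topic: counts.most_common(1)[0][0] for topic, counts in votes.items()}


def apply_rules(rules, row, default="unclassified"):
    """Label one document from its topic proportions."""
    return rules.get(dominant_topic(row), default)


theta = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.1, 0.85, 0.05], [0.2, 0.1, 0.7]]
labels = ["2.1 Arable land", "2.1 Arable land", "3.1 Forests", "1.1 Urban fabric"]
rules = learn_topic_rules(theta, labels)
for topic, cls in sorted(rules.items()):
    print(f"IF dominant topic = {topic} THEN class = {cls}")
print(apply_rules(rules, [0.05, 0.15, 0.8]))  # → 1.1 Urban fabric
```

Richer rules (thresholds on several topic proportions, for instance) would follow the same pattern of reading classes off topic mixtures.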

15
Concluding remarks

A method is proposed based on the similarities that can be established between text and the satellite images used in LULC mapping, in terms of words, documents and topics
The current approach shows promising results in bridging the gap between low-level semantic data and high-level semantic features: an intermediate-level classification is performed first and LDA is then applied, thus addressing user requirements for meaningful information for policy making
Currently, the availability of fully automatic methods that provide effective, sensor-specific intermediate-level classification is exploited in order to reach a higher semantic level
Although CLC tests were shown, any other high-level semantic nomenclature can be applied
Semantic rules can be established between the intermediate- and higher-level semantic features
A good level of accuracy is obtained by applying LDA to intermediate-level, fully automatically classified data
The approach is in line with current discussions (GMES, INSPIRE) on new data models for LULC data that promote object-oriented schemes