Knowledge acquisition and semantic rules discovery for CORINE land cover mapping using Latent Dirichlet Allocation

1
Knowledge acquisition and semantic rules
discovery for CORINE land cover mapping using
Latent Dirichlet Allocation
  • Sixth conference on Image Information Mining
  • EUSC Torrejon Air Base
  • 3-5 November 2009

2
Land Use Land Cover Mapping
  • Subject discussed at EU and international level:
  • CLC 2000, CLC 2006
  • FAO LCCS
  • GMES Soil Sealing, Forest, Urban Atlas
  • GlobCover
  • CLC methodology:
  • High-resolution visible/IR data (Landsat, SPOT,
    IRS)
  • False or natural color compositions
  • Reference topographic maps at 1:50K
  • Classification scheme
  • Visual interpretation and delineation of polygons
  • Quality control
  • Products: image data, vector layer, raster layer

3
Difficulties in Land Use Land Cover Mapping
  • Availability and price of EO data
  • Resolution characteristics (spatial, spectral,
    temporal) do not always, or do not yet, meet
    user needs
  • Landsat data met current requirements but are no
    longer available; they are replaced by SPOT and
    IRS data
  • New sensor data are available but not yet used
    intensively; the price is very high
  • Methodology: low productivity, due to:
  • Visual interpretation and on-screen delineation
    of land cover polygons are, most of the time,
    the main tools
  • Lack of operational automatic or semi-automatic
    methods capable of extracting the user-defined
    classes; methods exist but are not used by the
    production units
  • Solutions: pay for new sensor data and use
    automatic methods (i.e. stimulate research and
    pay for software), or get data at a low price
    and further invest in human operators and cheap
    software tools

4
Difficulties in Land Use Land Cover Mapping
  • Operational use of automatic or semi-automatic
    methods still not very widespread
  • Sensors produce different data:
  • From one day to another
  • From one place to another
  • From one orbit to another, etc.
  • Data coming from different sensors are different
    (SPOT <> IRS <> RapidEye, ERS <> Radarsat)
  • Altered (processed) data can lead to different
    results (raw data vs. radiometrically enhanced
    data)
  • User-defined classification schemes
    (nomenclatures) are application specific and not
    the same as the classes seen by the computer,
    i.e. the user is not interested in a generic
    "bare soil"; the user wants "non-irrigated
    arable land"

5
Current work in ROSA
  • Many R&D projects aim to increase the
    effectiveness of extracting geo-information from
    EO data: ROSAR, GEOINF, SIGUR, MUTER, ROKEO,
    SAFER, GEOLAND
  • Both object-oriented and pixel-based methods are
    studied, towards bridging the gap between highly
    semantic, user-defined needs and state-of-the-art
    classification algorithms
  • Results are tested and evaluated against existing
    LULC datasets produced and validated under
    operational conditions
  • Algorithms are tested on original data and
    efforts are made to make them work on new sensor
    data

6
From text to images - Latent Dirichlet Allocation
  • Latent Dirichlet Allocation: from text to image
    processing
  • Probabilistic, Bayesian model focusing on
    documents, i.e. subsets of pixels (tiles)
  • Documents are made of words, i.e. pixels, and
    express a latent set of topics, i.e. classes
  • The whole set of words defines the vocabulary,
    i.e. the dynamic range
  • Each topic is modeled as a probability
    distribution over the vocabulary
  • Many documents together form a corpus i.e. a
    satellite image
  • The order of words in a document is ignored (bag
    of words)
  • Based on training sets (histograms of documents),
    the model can discover the latent topics in
    documents

[Figure: text corpus, vocabulary, document, topics]
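The analogy above (pixels as words, tiles as documents, word order ignored) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tile_documents`, the toy 4x4 image and the tile size of 2 are hypothetical and do not come from the presentation.

```python
from collections import Counter


def tile_documents(label_image, tile_size):
    """Split a 2-D grid of word ids into square tiles and count words per tile.

    Returns a dict mapping (tile_row, tile_col) to a Counter of word ids.
    Pixel order inside a tile is discarded (bag of words). Incomplete tiles
    at the right and bottom edges are skipped.
    """
    n_rows = len(label_image)
    n_cols = len(label_image[0])
    documents = {}
    for top in range(0, n_rows - tile_size + 1, tile_size):
        for left in range(0, n_cols - tile_size + 1, tile_size):
            words = Counter()
            for r in range(top, top + tile_size):
                words.update(label_image[r][left:left + tile_size])
            documents[(top // tile_size, left // tile_size)] = words
    return documents


# Hypothetical toy image: pixel values are already word ids (0..2).
image = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [2, 2, 0, 1],
    [2, 2, 1, 0],
]
docs = tile_documents(image, tile_size=2)
```

Each `Counter` is the word histogram of one document; the set of all tiles plays the role of the corpus.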
7
Latent Dirichlet Allocation
Given a vocabulary with N visual words and K topics,
a two-variable model can be derived
and then applied to an image document D
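For reference, the standard LDA formulation from the text-mining literature can be written as below. This is the textbook model, not necessarily the exact notation used on the slide: N, K and D follow the slide, while the symbols α, β, θ, z, w and the word count M of a document are assumptions introduced here, with α and β taken to be the two model variables.

```latex
% Generative process for one document D with M words w_1, ..., w_M
\theta \sim \mathrm{Dirichlet}(\alpha), \qquad \alpha \in \mathbb{R}_{>0}^{K}
\\
z_m \mid \theta \sim \mathrm{Multinomial}(\theta), \qquad
w_m \mid z_m, \beta \sim \mathrm{Multinomial}(\beta_{z_m}), \qquad
\beta \in \mathbb{R}^{K \times N}
\\
% Likelihood of the document given the two model variables
p(D \mid \alpha, \beta)
  = \int p(\theta \mid \alpha)
    \prod_{m=1}^{M} \sum_{k=1}^{K} \theta_k \, \beta_{k, w_m} \, d\theta
```

Here row k of β is the distribution of topic k over the N visual words, and θ is the topic mixture of the document.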
8
LDA for CLC Mapping
  • Due to complex semantics, the (document, topics)
    couple is closer to the CLC classes, i.e. a
    document can be assigned a CLC class based on
    the latent topics inside it
  • The huge vocabulary in a satellite image (dynamic
    range) can be reduced using unsupervised methods
    (e.g. k-means, SoilMapper) to decrease processing
    time; the new vocabulary is used to identify
    latent topics in a document
  • The dimension of documents (tiles) is a function
    of the MMU (Minimum Mapping Unit, 25 ha): 15x15
    pixels for Landsat, 50x50 pixels for SPOT5
  • The number of topics used depends on the CLC
    level addressed (5, 15, 44) and directly on the
    unsupervised process, e.g. 35 words for a
    Landsat image
  • Based on latent topics, a document can get a
    label according to the CLC nomenclature: words ->
    topics -> classes
  • Semantic rules can be derived for each of the
    classes in a nomenclature: Class =
    f(topics(words))

[Figure: satellite image -> compressed Landsat / SPOT vocabulary -> document -> topics -> CLC class]
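The relation between the MMU and the tile size can be checked with simple arithmetic. The sketch below assumes pixel sizes of 30 m for Landsat and 10 m for SPOT5; these values are not stated in the slides, and the function names are hypothetical.

```python
import math


def tile_side_pixels(mmu_hectares, pixel_size_m):
    """Side length, in pixels, of a square tile covering the given MMU."""
    side_m = math.sqrt(mmu_hectares * 10_000)  # 1 ha = 10,000 m^2
    return side_m / pixel_size_m


def tile_area_hectares(side_pixels, pixel_size_m):
    """Ground area, in hectares, of a square tile of side_pixels pixels."""
    return (side_pixels * pixel_size_m) ** 2 / 10_000


# Assumed pixel sizes: 30 m (Landsat), 10 m (SPOT5).
landsat_side = tile_side_pixels(25, 30)      # about 16.7 pixels
spot5_side = tile_side_pixels(25, 10)        # 50 pixels
landsat_tile_ha = tile_area_hectares(15, 30)  # area of a 15x15 Landsat tile
spot5_tile_ha = tile_area_hectares(50, 10)    # area of a 50x50 SPOT5 tile
```

Under these assumptions a 50x50 SPOT5 tile matches the 25 ha MMU exactly, while a 15x15 Landsat tile covers 20.25 ha, slightly below it.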
9
MEEO SoilMapper
  • See Baraldi, A. et al. 2006. Automatic Spectral
    Rule-Based Preliminary Mapping of Calibrated
    Landsat TM and ETM+ Images. IEEE Transactions on
    Geoscience and Remote Sensing, vol. 44, no. 9
    (September 2006), 2563-2585
  • Image data calibrated to reflectance values using
    sensor-specific information
  • Output: preliminary spectral map with pixels
    labeled according to an intermediate level
  • For Landsat data, 85, 41, 27 or 16 classes can be
    derived
  • Operational services provided by MEEO through ESA
    SSE

10
CLC Mapping Workflow
[Figure: raw data -> reduced vocabulary -> topics on pixels -> topics on documents]
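The last two steps of the workflow (topics on pixels, topics on documents) can be illustrated with a small LDA fit. The slides do not say which inference scheme was used; the sketch below swaps in collapsed Gibbs sampling, one common choice, written in plain Python. The function names, the hyperparameters `alpha` and `beta`, the iteration count and the toy documents are all assumptions made for illustration.

```python
import random


def fit_lda(documents, n_topics, vocab_size, alpha=0.1, beta=0.01,
            n_iter=100, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, not optimized).

    documents: list of lists of word ids in range(vocab_size).
    Returns (doc_topic, topic_word) count tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in documents]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Give every word occurrence a random initial topic.
    for d, doc in enumerate(documents):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(documents):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def topic_mixture(counts, alpha=0.1):
    """Smoothed topic proportions of one document (topics on documents)."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


# Hypothetical documents: tiles flattened to lists of word ids.
documents = [
    [0, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0, 0],
    [2, 3, 3, 2, 2, 3],
    [3, 3, 2, 2, 3, 2],
]
doc_topic, topic_word = fit_lda(documents, n_topics=2, vocab_size=4)
mixtures = [topic_mixture(counts) for counts in doc_topic]
```

The per-word topic assignments correspond to "topics on pixels", and each entry of `mixtures` to the topic proportions of one tile.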
11
Tests results
  • Tests were performed on Landsat data subsets of
    600x600 pixels and on SPOT data subsets of
    2000x2000 pixels; MEEO SoilMapper was used for
    intermediate classification (vocabulary
    compression)

12
Tests results
  • Example 1: Landsat data, 27 words / topics, CLC
    Level 1 classes
  • Overall accuracy: 92%

13
Tests results
  • Example 2
  • Landsat data
  • 41 words / topics
  • CLC Level 2 classes
  • Overall accuracy: 70%
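Overall accuracy, as reported in the test results, is normally the share of correctly labeled samples in a confusion matrix. A minimal sketch follows; the matrix values are hypothetical and are not the authors' results.

```python
def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i
    that were assigned to class j.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical 3-class confusion matrix (rows: reference, columns: mapped).
confusion = [
    [50, 3, 2],
    [4, 30, 1],
    [1, 2, 7],
]
accuracy = overall_accuracy(confusion)
```

For this made-up matrix, 87 of the 100 samples lie on the diagonal.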

14
Semantic rules
  • The current approach allows one to establish
    semantic rules that explain high-level semantics
    using intermediate-level semantics
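The slides do not specify the form of the rules. One possible reading, given purely as an assumption, is that each class is described by a typical topic mixture learned from reference tiles, and a new tile receives the class whose mixture is closest. The function names, the L1 distance and the toy mixtures below are hypothetical; only the class names are taken from CLC Level 1.

```python
def learn_rules(topic_mixtures, class_labels):
    """Average the topic mixtures of the reference tiles of each class."""
    sums, counts = {}, {}
    for mixture, label in zip(topic_mixtures, class_labels):
        acc = sums.setdefault(label, [0.0] * len(mixture))
        for k, p in enumerate(mixture):
            acc[k] += p
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def apply_rules(rules, mixture):
    """Return the class whose typical mixture is closest (L1 distance)."""
    def distance(label):
        return sum(abs(a - b) for a, b in zip(rules[label], mixture))
    return min(rules, key=distance)


# Hypothetical reference tiles: topic mixtures with known classes.
reference_mixtures = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.0, 0.9],
]
reference_classes = [
    "Artificial surfaces",
    "Artificial surfaces",
    "Water bodies",
    "Water bodies",
]
rules = learn_rules(reference_mixtures, reference_classes)
predicted = apply_rules(rules, [0.7, 0.2, 0.1])
```

Each entry of `rules` can be read as a statement of the kind "this class is mostly topic 0, with some topic 1", which ties the high-level class back to intermediate-level content.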

15
Concluding remarks
  • A method is proposed that builds on the analogy
    between text and the satellite images used in
    LULC mapping, in terms of words, documents and
    topics
  • The current approach shows promising results in
    bridging the gap between low-level semantic data
    and high-level semantic features: an
    intermediate-level classification is performed
    first and LDA is then applied, addressing user
    requirements for meaningful information for
    policy making
  • Currently, the availability of fully automatic
    methods allowing effective sensor-specific
    intermediate-level classification is exploited in
    order to reach a superior semantic level
  • Although CLC tests were shown, any other
    high-level semantic nomenclature can be applied;
    semantic rules can be established between the
    intermediate and superior semantic level features
  • A good level of accuracy is obtained by applying
    LDA to intermediate, fully automatically
    classified data
  • In line with current discussions (GMES, INSPIRE)
    on new data models for LULC data promoting
    object-oriented schemes

16
Contact
  • Dragos Bratasanu, ROSA dragos.bratasanu_at_rosa.ro
  • Ion Nedelcu, ROSA ion.nedelcu_at_rosa.ro
  • Mihai Datcu, DLR mihai.datcu_at_dlr.de