Tony Rees - PowerPoint PPT Presentation

1 / 21

About This Presentation

Title:

Tony Rees

Description:

Region covered varies for each scene. Often is requirement to retrieve ... Polar X-Y coordinates used to interpolate between quoted boundary points (gives ... – PowerPoint PPT presentation

Number of Views:26

Avg rating:3.0/5.0

Slides: 22

Provided by: ree103

Category:

more less

Transcript and Presenter's Notes

Title: Tony Rees

1
Application of c-squares spatial indexing to an
archive of remotely sensed data
Tony Rees Divisional Data Centre CSIRO Marine
Research, Australia (Tony.Rees_at_csiro.au)
2
The starting point

Archive of c.60,000 satellite scenes,
1986-current, accumulating at another 15 per day
Region covered varies for each scene
Often is requirement to retrieve scenes including
a specific region or place
Low-res GIF images (like examples shown) are
available for last 24 months only
Polygons can be calculated for each scene, but
currently are not stored in any spatially
searchable system.

3
Options for spatial indexing - 1

Bounding rectangles
Mostly simple to generate/ store/ query
Frequently a poor fit to the actual dataset
extent
Potentially become massively over-representative
as data approaches the pole
Do not cope well with oblique or curved
boundaries to the dataset footprint (common in
satellite images)
Poor discrimination for fine-scale queries (many
false positives returned)
Can be problems with footprints which cross the
date line, or include a pole

Scene footprint
bounding rectangle (follows lat, long grid lines)
4
Options for spatial indexing - 2

Bounding polygons
OK to generate/ store
Best fit to the actual dataset extent
Need additional points as data approaches the
pole (regions of high curvature)
Computationally expensive to query (often use
bounding rectangles as well, as pre-filter)
Tend to require a dedicated GIS system to handle
the spatial queries.

bounding polygon inflection points
5
Options for spatial indexing - 3

Raster (tiled) approach
Simple to generate/ store (as list of tiles
intersected)
Reasonable fit to the actual dataset extent
improves with decreasing unit square size
Automatically increases resolution as data
approaches the pole (regions of high curvature)
Simple/ rapid to query, no computation or
dedicated GIS required
No ambiguity with footprints which extend across
date line, or include a pole.

boundary of all tiles for this scene
6
C-squares basics

Based on a tiled (gridded) representation of the
earths surface, at choice of 10 x 10, 5 x 5, 1 x
1, 0.5 x 0.5 degree tiles, etc. (0.5 x 0.5 degree
tiles are used in this example)
Every tile has a unique, hierarchical
alphanumeric ID (incorporates the ID of all
parent, grandparent, etc. in every child tile ID)
Dataset (scene) extents are represented by a
list of all the tiles included in, or intersected
by the footprint
Spatial search comprises looking for one or more
tile IDs in the set associated with any dataset
( simple text search).
(more details see www.marine.csiro.au/csquares/
)

7
Spatial queries supported

Retrieve all scenes which include all or part of
a given tile (at 10 x 10, 5 x 5, 1 x 1, or 0.5 x
0.5 degrees), or set of tiles
optionally, filter also by other criteria e.g.
date, satellite, etc.
Retrieve all tiles associated with a given scene,
with option to
draw representation on a map (or range of maps)
export the list to a file or another database
compare the list with other lists ( polygon
overlays)
NB, choice of initial tile size is important
- too few tiles, only coarse spatial queries
supported
- too many tiles, indexes get very large (and
queries potentially slow)
However, compression of blocks of contiguous
tiles can be quite effective (1 code can replace
4, 25, 100, or 400 codes in certain cases).

8
A real-world exampleNOAA-12 .. 21 Jun 2003 0623
10 x 10 degree squares (28) (base level of
hierarchy, cannot compress)
5 x 5 degree squares (99) 36 after comp.
1 x 1 degree squares (1,982) 515 after comp.
0.5 x 0.5 degree squares (7,691) 704 after comp.
0.5 x 0.5 degree squares - detail
9
Practicalities for satellite data

A single scene may measure (say) 40 x 50 degrees
approx., 2000 1 x 1 degree squares, or 8000 0.5
x 0.5 degree squares
Quadtree-like compression reduces this to (e.g.)
500 codes at 1 x 1 degree resolution, 700 codes
at 0.5 x 0.5 degree resolution
Still require quite a lot of codes (e.g. 42
million) to represent a collection of 60,000
scenes
Each code is 10 characters long, scene IDs are
(say) 6 characters long, thus c.670 million bytes
required for raw index data (before compiling
secondary indexes, etc.)
With secondary indexing, probably need around 2
Gb () to hold the spatial index.

10
Designing the index

Option 1 a table of scenes, with string of codes
comprising each scene footprint, 1 row per scene
comment fast to retrieve all codes for a scene,
slow for a spatial query. List of codes may
present storage problems (e.g. in database
field/s) owing to length.
Option 2 a table of c-square codes, with list of
scenes containing any code, 1 row per code ( an
inverted index)
comment fast for a spatial search, slow to
retrieve all codes for a scene. List of scenes
may present storage problems (e.g. in database
field/s) owing to length. Harder to manage with
respect to scene addition/deletion etc.
(continues)

11
Designing the index (contd)

Option 3 a table of scenes/c-square pairs, 1 row
per pair, indexed on both columns
comment OK to retrieve all codes for a scene,
also for a spatial query. Row length is always
constant, new rows simply created as required.
However, table potentially becomes very long
(e.g. 40 million rows), may become slow to query
Option 4 as option (3), however single table
replaced by multiple smaller tables e.g. split
by blocks of c-squares, blocks of scenes, or
year, or satellite
comment will tend to be tuned to favour
queries which can be completed by accessing
smallest number of tables. Could overcome this by
adding (say) element of Option 1 in parallel
(with added cost of storing the same information
in more than one place).

12
Present philosophy

Currently going with Option 3 a table of
scenes/c-square pairs, 1 row per pair, indexed on
both columns.
However, will consider splitting this along
spatial or date lines if performance degrades
excessively as the index grows ( Option 4) may
then add in Option 1 in parallel also, for rapid
footprint retrieval if needed.

13
Actual table structure 2 search tables only
Scene details (1 row per scene)
Scene/c-square pairs (600-700 rows per scene)
14
Spatial Search Interface (prototype)
www.marine.csiro.au/csquares/satsearch.htm
15
Clickable map interface
16
Example search result
17
SQL for spatial search(example for 0.5 degree
search square)
select distinct A.scene_id, B.satellite,
B.scene_date_time, B.image_location from
satdata.satdata_csq_all A, satdata.scene_info B
where ( (sqrsize 0.5 and
(A.csquare search_csq -- e.g. 34141001
(0.5 degree square) or A.csquare
substr(search_csq,1,8) '' -- 1
level of compression or A.csquare
substr(search_csq,1,6) '' -- 2
levels of compression or A.csquare
substr(search_csq,1,4)'') -- 3 levels
of compression ) -- (plus other supported
search square size options go here)
) and (startdate is null or
B.scene_date_time gt startdate) and
(enddate is null or B.scene_date_time lt
enddate) and (sat 'any' or
B.satellite sat) and A.scene_id
B.scene_id order by B.scene_date_time,
B.satellite
18
This can be triggered by a HTTP request, e.g. as
follows
http//aqua.hba.marine.csiro.au7272/marq/owa/satd
ata_search.get_scenes?satanystartday20startmon
06startyr2003endday22endmon06endyr2003W_
long150N_lat-25E_long155S_lat-30
Generates either HTML result page to browser,
or machine parseable scenes list, e.g. h1awj
h1awl h1awm h1awo h1awr h1awt h1aww h1awy h1ax0
h1ax2 h1ax5 h1ax7 h1ax9 h1axb h1axd h1axe h1axh
h1axj h1axl h1axn h1axo h1axr h1axt h1axw h1axx
h1ay0 h1ay1 h1ay4 h1ay8 h1ay9 h1ayb h1ayc h1ayd
h1aye h1ayg h1ayh h1ayj h1ayl h1ayo h1ayp h1ayr
h1ays h1ayu h1ayw h1az1 h1az3 h1az5 h1az8 h1aza
h1azg h1azj h1azl h1azm h1azn h1azp h1azq h1azt
h1azw h1azz h1b00 h1b03 h1b04 h1b06 h1b09 h1b0a
h1b0c h1b0d h1b0f h1b0i h1b0j h1b0n h1b0q h1b0t
h1b0u h1b0w h1b0y h1b10 h1b12 h1b13 h1b15 h1b18
h1b1b h1b1e h1b1h h1b1j h1b1m h1b1o h1b1r h1b1u
h1b1w h1b20 h1b21 h1b24 h1b26 h1b28 h1b2a h1b2b
h1b2e h1b2g
19
Automated scene processing and upload

An automated routine prepares 2 sets of files
scene information, and scene polygons, in batches
of 100-300 scenes
For the first batch, a script initiates file
upload to the database, then file conversion
(c-square generation) for each scene in the batch
(currently takes c.3 minutes/scene)
When all scenes have been converted, the
c-squares are added to the master searchable
table, and the scenes are flagged as upload
completed
Close to the estimated completion time, a robot
wakes up and checks at 10 minute intervals to
detect when the job is complete (via a http call
to the same procedure which creates the web scene
count on the opening screen).
When the scene count returns 0 scenes currently
being uploaded, the robot calls the script to
start again from stage (2)-(5), with the next
batch of scenes.

20
C-square encoding algorithm

First generation conversion code constructed
January-May 2003, at CMR Hobart
Currently runs in Oracle PL/SQL, reads/writes
from/to 6 Oracle tables (including compression of
the c-square string from each scene)
Includes special treatment for
scenes crossing date line
scenes including a pole
Polar X-Y coordinates used to interpolate between
quoted boundary points (gives better
representation for southernmost boundary)
Special anti-bleed treatment incorporated
(polygons touching, but not entering a given
square do not trigger that square)
Could probably benefit from re-writing the code
in a different language for faster operation,
and/or refining method of operation.

21
Next steps

Keep a watch on index performance (i.e.,
retrieval times) as database size increases
Assess whether this approach is attractive, cf.
systems already in use elsewhere and whether of
interest to others
Look into improving the encoding procedure speed
(currently a bottleneck for processing a large
archive).

Write a Comment

User Comments (0)