Title: Supporting Web-based Visual Exploration of Large-Scale Raster Geospatial Data Using Binned Min-Max Quadtree
1Supporting Web-based Visual Exploration of
Large-Scale Raster Geospatial Data Using Binned
Min-Max Quadtree
- Jianting Zhang (1, 2), Simin You (2)
- (1) City College and (2) Graduate Center of The City University of New York
2Outline
- Motivation and Introduction
- Background and Related Work
- Binned Min-Max Quadtree
- Index Construction
- Query Processing
- System Architecture
- Experiments and Evaluation
- Conclusion and Future Work
3Motivation/Introduction
If you load your own data in Google Earth, in addition to simple display, zoom in/out, and pan,
wouldn't it be nicer if you could query your data and highlight the query results?
4Undergraduate Project: Generate Dynamic KML Files for Interactive Visualization in Google Earth (C. Dasrat/CCNY)
[Figure: precipitation maps for July and January]
Global 30s Precipitation Data from WorldClim (Interpolated 1950-2000)
Coloring scheme: green = 0 mm, red = 100 mm, linear interpolation in between
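The coloring scheme on this slide can be illustrated with a minimal sketch. The function names, the pure-green and pure-red end colors, and the clamping of values outside 0-100 mm are assumptions made for illustration; the slide only states the two end points and linear interpolation. The second helper formats the result in the aabbggrr hex order that KML uses for colors.

```python
def precipitation_color(mm, lo=0.0, hi=100.0):
    """Map a precipitation value (mm) to an (R, G, B) tuple on a green-to-red ramp."""
    # Assumption: values outside [lo, hi] are clamped to the end colors.
    t = min(max((mm - lo) / (hi - lo), 0.0), 1.0)
    green, red = (0, 255, 0), (255, 0, 0)
    # Linear interpolation per channel between the two end colors.
    return tuple(round(g + t * (r - g)) for g, r in zip(green, red))


def kml_color(rgb, alpha=255):
    """Format an RGB tuple as a KML color string (hex order: alpha, blue, green, red)."""
    r, g, b = rgb
    return f"{alpha:02x}{b:02x}{g:02x}{r:02x}"
```

For example, `kml_color(precipitation_color(100))` yields an opaque red string that could be placed in a KML style element.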
5Motivation/Introduction
- Task: Find/show regions where the precipitation amount in January is in the range [p1,p2).
- Intuitive Solution
- Loop through all the raster cells and return all the cell locations.
- Problem: long evaluation time, and visualizing the query results in Web browsers is difficult for practical reasons.
- Our Solution
- Backend: index raster data, perform the query in main memory and return a set of quadrants (SSDBM'10)
- Middleware: dynamically generate tiled images on demand based on the user's current view and cache the tiled images as necessary (Com.Geo'10)
- Ongoing work: massively parallel indexing using GPGPU (20X speedup)
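For reference, the intuitive solution amounts to a full scan like the sketch below. The function name and the tiny sample raster are hypothetical. On a global 30s raster the scan visits every cell, and the result is a list of individual cell locations, which is what makes it slow to evaluate and awkward to ship to a browser.

```python
def scan_range(raster, p1, p2):
    """Return (row, col) of every cell whose value v satisfies p1 <= v < p2."""
    return [(r, c)
            for r, row in enumerate(raster)
            for c, v in enumerate(row)
            if p1 <= v < p2]


# Hypothetical 3 x 4 raster of precipitation values.
raster = [
    [0, 12, 95, 110],
    [3, 90, 150, 301],
    [7, 45, 299, 300],
]
hits = scan_range(raster, 90, 300)
```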
6Background and Related Work
- Spectral, spatial and temporal resolutions of raster geospatial data are getting increasingly finer → larger data volumes
- The next generation GOES-R satellite will provide global coverage at the 0.5-2 km resolution every 5 minutes (16 bands)
- Numerous derived products from satellite images: https://lpdaac.usgs.gov/lpdaac/products/modis_products_table
- Large-scale model simulation results (e.g. WRF)
7Background and Related Work
- Manually examining all the data through visual display is no longer possible
- Human eyes can only effectively distinguish a limited number of colors at a time
- Studies show that screen resolution beyond 4000 by 4000 pixels is not effective
- Querying data and highlighting results (Regions of Interest) for further analysis becomes preferable
8Background and Related Work
- Query-Driven Visual Exploration of Scientific Data
- Wu et al 2003, Stockinger et al 2005, Rubel et al 2008
- Glatter et al 2006, Kendall et al 2009, Fuchs et al 2009
- Indexing and Query Processing in Spatial Databases
- Overview: Gaede and Gunther 1998, Samet 2005
- Vector data: R-Tree, Quad-Tree
- Raster data: very limited (except tiling/pyramid)
9Background and Related Work
- Managing Multi-dimensional Array Data
- Array query definition language: Baumann et al 1997, Marathe and Salem 1999, Baumann 2009
- Physical data layout: Sarawagi and Stonebraker 1994, Otoo and Rotem 2006, Kim and Jaja 2007, Otoo et al 2007
- Information Visualization/Visual Exploration
- Desktop Systems: Prefuse, GeoVista, GeoDa, IDV
- Web-based: Wood et al 2007, Dork et al 2008
- Main-memory based, no database backend support
- Scalability problem → integrating high-performance database engines with information visualization/visual exploration modules
10Binned Min-Max Quadtree (BMMQ-Tree)
- Designed to support ROI (Region of Interest) finding queries
- Given a set of rasters representing environmental variables F_i (0 < i < n) over a spatial domain D
- A ROI finding query Q identifies regions in D whose cells C_j satisfy the compound condition
[Equation: compound range condition over the queried variables]
op can be either conjunctive or disjunctive, 0 < k < n
The bounds in the condition are the lower and upper bounds of query Q for variable i
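A minimal sketch of evaluating such a compound condition on a single cell follows. The function name, the dictionary layout of the bounds, and the half-open interval [low, high) are assumptions; the half-open form is chosen only to match the [p1,p2) range used in the motivating task.

```python
def cell_matches(values, bounds, op="and"):
    """Test one cell against per-variable range conditions.

    values: list where values[i] is the value of variable F_i at the cell.
    bounds: dict mapping a queried variable index i to (low, high).
    op:     "and" for a conjunctive query, "or" for a disjunctive one.
    """
    if op not in ("and", "or"):
        raise ValueError("op must be 'and' or 'or'")
    # Assumption: each range is half-open, low <= value < high.
    tests = [low <= values[i] < high for i, (low, high) in bounds.items()]
    if not tests:
        return False
    return all(tests) if op == "and" else any(tests)


# Hypothetical cell with three variables; the query involves variables 0 and 2.
cell = [120, 5, 30]
query_bounds = {0: (90, 300), 2: (10, 25)}
```

With these sample values the conjunctive query rejects the cell (variable 2 is out of range) while the disjunctive query accepts it.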
11Binned Min-Max Quadtree
- Why Tree-based indexing?
- A ROI query is a global operation on rasters
- Without indices, scanning whole rasters is required
- Disk I/Os are the most expensive operations along the storage hierarchy
- Performance is limited by disk I/Os.
- With tree-based indexing
- Quickly prune irrelevant branches → reduce disk I/Os
- Access disk files only when necessary
- Answer a large portion of queries directly without incurring disk I/Os
- Indices with small memory footprint can be main-memory resident
12Binned Min-Max Quadtree
- Why Binned Min-Max Quadtree?
- Associate min/max values with each quadtree node to help ROI-based queries; the technique is popular in 3D graphics for generating iso-surfaces and tracing rays
- First law of geography: "Everything is related to everything else, but near things are more related than distant things" (Tobler 1970)
- However, neighboring cells' values are often slightly different
- Binning helps quadrant uniformity and reduces quadtree complexity
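The effect of binning on quadtree complexity can be seen in a small sketch. This is not the authors' construction algorithm: the recursive top-down build, the nested-dictionary node layout, integer division as the binning rule, and the sample raster are all assumptions. Each node stores the min and max bin of its quadrant, and a quadrant whose cells all fall in one bin is kept as a single leaf.

```python
def build(raster, bin_size, x=0, y=0, size=None):
    """Build a min-max quadtree over a square raster whose side is a power of two."""
    if size is None:
        size = len(raster)
    if size == 1:
        b = raster[y][x] // bin_size  # assumed binning rule: integer division
        return {"min": b, "max": b, "children": None}
    half = size // 2
    # Child order: NW, NE, SW, SE.
    kids = [build(raster, bin_size, x + dx, y + dy, half)
            for dy in (0, half) for dx in (0, half)]
    lo = min(k["min"] for k in kids)
    hi = max(k["max"] for k in kids)
    if lo == hi:
        kids = None  # uniform quadrant after binning: collapse into one leaf
    return {"min": lo, "max": hi, "children": kids}


def count_nodes(node):
    """Count all nodes in the tree, as a simple measure of its complexity."""
    if node["children"] is None:
        return 1
    return 1 + sum(count_nodes(k) for k in node["children"])


# Hypothetical 4 x 4 raster: neighbors are similar but rarely identical.
raster = [
    [10, 11, 40, 41],
    [12, 13, 42, 47],
    [10, 15, 90, 91],
    [14,  9, 95, 120],
]
unbinned = build(raster, 1)
binned = build(raster, 8)
```

On this sample, the unbinned tree keeps every cell as its own leaf, while a bin size of 8 collapses three of the four quadrants into single nodes.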
13Index Construction
14Query Processing: Arbitrary Spatial Window
15Query Processing: Tile Based (Parallelization possibility)
Tile size N×N, k = log2(N)
[Figure: query for value range [1,3] under tile (0,1,1)]
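How a tile might map onto the quadtree can be sketched as follows. The addressing scheme here (zoom level z with 2^z tiles per side, tile column tx and row ty) is an assumption borrowed from common Web map tiling and is not necessarily the one behind the tile label on the slide. Under that assumption, a tile at zoom z coincides with a quadtree node at level z, and since the tile is N = 2^k pixels wide, a query does not need to descend below level z + k.

```python
def tile_extent(z, tx, ty, k=8, max_level=16):
    """Describe the quadtree region for tile (z, tx, ty).

    Assumes a 2^max_level x 2^max_level cell grid and tiles of 2^k x 2^k pixels.
    """
    if not 0 <= z <= max_level:
        raise ValueError("zoom level out of range")
    if not (0 <= tx < (1 << z) and 0 <= ty < (1 << z)):
        raise ValueError("tile index out of range for this zoom level")
    span = 1 << (max_level - z)  # cells per tile side
    return {
        "x0": tx * span,              # leftmost cell column of the tile
        "y0": ty * span,              # topmost cell row of the tile
        "span": span,
        "node_level": z,              # quadtree node that covers the tile
        "stop_level": min(max_level, z + k),  # deepest level worth visiting
    }
```

Because each tile maps to its own subtree, tiles can be queried independently, which is where the parallelization possibility comes from.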
16Binned Min-Max Quadtree
- BMMQ-Tree integrates features of Binned Bitmap Indexing and Min-Max kd-trees and octrees
- A BMMQ-Tree query result is a set of quadrants that can be expressed as (X,Y,L) tuples, suitable for data communication between clients and servers
- A BMMQ-Tree query can terminate when the spatial extent that a quadtree node represents is less than a screen pixel (Less-than-Single-Pixel stopping policy)
- May result in false positives, which is NOT necessarily bad for visual explorations
- Identifying Regions of Interest is the primary goal
- Details on demand for further examination
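The query and its stopping policy can be sketched as below, under several assumptions: nodes are plain (min, max, children) tuples over raw values rather than bins, the query range is half-open, and X and Y in each (X,Y,L) tuple are taken to be the quadrant's column and row index at level L. The `max_level` parameter stands in for the level at which a node becomes smaller than a screen pixel.

```python
def build(raster, x=0, y=0, size=None):
    """Build a (min, max, children) quadtree over a power-of-two square raster."""
    size = size or len(raster)
    if size == 1:
        v = raster[y][x]
        return (v, v, None)
    half = size // 2
    kids = [build(raster, x + dx, y + dy, half)
            for dy in (0, half) for dx in (0, half)]
    return (min(k[0] for k in kids), max(k[1] for k in kids), kids)


def query(node, lo, hi, max_level, x=0, y=0, level=0, out=None):
    """Collect (X, Y, L) quadrants whose cells may have values in [lo, hi)."""
    if out is None:
        out = []
    nmin, nmax, kids = node
    if nmax < lo or nmin >= hi:
        return out  # prune: no cell under this node can match
    fully_inside = lo <= nmin and nmax < hi
    if fully_inside or kids is None or level == max_level:
        # Stopping at max_level may report a partially matching quadrant,
        # which is the source of false positives.
        out.append((x, y, level))
        return out
    for i, kid in enumerate(kids):
        query(kid, lo, hi, max_level, 2 * x + i % 2, 2 * y + i // 2, level + 1, out)
    return out


# Hypothetical 4 x 4 raster.
raster = [
    [1, 1, 5, 6],
    [1, 2, 7, 8],
    [3, 3, 2, 9],
    [3, 3, 1, 2],
]
tree = build(raster)
exact = query(tree, 1, 4, max_level=2)
coarse = query(tree, 1, 4, max_level=1)
```

At full depth the lower-right quadrant is resolved into three matching cells; stopping one level earlier reports that whole quadrant instead, including the one cell that does not match.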
17Prototype System
- Original design
- Rendering quadrants as vector objects using Flex RIA APIs at the client side
- Powerful and flexible: control rendering at the pixel level in Web browsers
- The performance is poor when the number of quadrants is above the order of a few thousand
- We consider the results as lessons rather than achievements
- Current design (COM.GEO'10)
- Support tile-based queries
- Render resulting quadrants as binary images in the middleware
- Client is responsible for formulating tiles, submitting queries and visualizing query results
- Significantly better performance
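Rendering quadrants into a binary image can be sketched as follows. This is an illustration, not the middleware's actual code: the function name is hypothetical, the image is a plain list of rows, and each (X, Y, L) tuple is assumed to give the quadrant's column and row index at level L within the area being rendered. The point is that the client receives one fixed-size image per tile regardless of how many quadrants the query returns.

```python
def render_mask(quadrants, k):
    """Rasterize (X, Y, L) quadrants into a 2^k x 2^k binary mask (1 = match)."""
    n = 1 << k
    mask = [[0] * n for _ in range(n)]
    for x, y, level in quadrants:
        if level >= k:
            # Quadrant is no larger than one pixel: mark the pixel that contains it.
            shift = level - k
            mask[y >> shift][x >> shift] = 1
            continue
        side = 1 << (k - level)  # pixels per quadrant side
        for row in range(y * side, (y + 1) * side):
            for col in range(x * side, (x + 1) * side):
                mask[row][col] = 1
    return mask


# Hypothetical query result: two level-1 quadrants and three level-2 quadrants.
quadrants = [(0, 0, 1), (0, 1, 1), (2, 2, 2), (2, 3, 2), (3, 3, 2)]
mask = render_mask(quadrants, k=2)
```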
18Prototype System Architecture
19- Online demo: http://134.74.112.202/comgeo/testoverlay.html
20Experiments and Evaluation
- Data: WorldClim January Precipitation Data at 30s resolution (43200×21600)
- Value range: [0,1003]
- Quadtree level = 16
- Query processing server: Dell T5400
- Ad-hoc queries (arbitrary parameters)
- Three bin sizes: 8, 16, 32
- Query value range: [90,300)
- Eight spatial query windows of sizes around 65 degrees (lon) by 55 degrees (lat)
- Tile-based queries (more systematic)
- Bin size = 32
- Tile size: 256×256 (k=8)
- For query value range [0,1003]: 6848 tiles
- For query value range [90,300): 1197 tiles
21Results of Ad-hoc Queries
Less-Than-Single-Pixel stopping policy NOT applied (Max Level = 16, results in milliseconds; B = bin size)
Query  B=8  B=16  B=32
Q1     160  153   183
Q2     116  121   163
Q3     112  162   252
Q4     160  153   182
Q5      51   42    47
Q6      91   97   140
Q7      86  108   169
Q8      81   94   105
22Results of End-to-End Performance using OLD Design
Less-Than-Single-Pixel stopping rule applied; Max Level = 12 for query window sizes of 65×55 degrees; Bin size = 32
23Results of End-to-End Performance using New Design
- Estimating End-to-End time
- Assume available network bandwidth = 300k Bps → TT = 10 ms
- Assume client display area 1024×1024 → 16 tiles (Parallelizable)
- Assume no server/client side caching (cold start)
- Assume rendering times for small images in Web browsers are negligible
- Estimated time: (QT+GT+TT)×16 = (50+10+10)×16 = 1120 ms
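The estimate can be reproduced with a few lines of arithmetic. The reading of QT, GT and TT as per-tile query, image generation and transfer times is an assumption (the slide does not expand them), as is the `parallel` parameter, which models an idealized case where several tiles are processed at once with no overhead.

```python
def end_to_end_ms(query_ms, generate_ms, transfer_ms, tiles, parallel=1):
    """Estimate total time when tiles are handled in rounds of `parallel` tiles."""
    per_tile = query_ms + generate_ms + transfer_ms
    rounds = -(-tiles // parallel)  # ceiling division
    return per_tile * rounds


tiles = (1024 // 256) ** 2  # 1024 x 1024 display area split into 256 x 256 tiles
sequential = end_to_end_ms(50, 10, 10, tiles)
four_way = end_to_end_ms(50, 10, 10, tiles, parallel=4)
```

With the slide's figures the sequential case gives 1120 ms; under the idealized four-way parallel assumption the same 16 tiles would take 280 ms.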
24Conclusions
- The proposed BMMQ-Tree data structure can be used to efficiently process ROI-finding queries on large-scale raster geospatial data. Queries can be processed in fractions of a second for large query windows.
- Tile-based query and dynamic tile image generation (middleware) and rendering (client) are more suitable for visualizing complex query results than client-side rendering.
- New experimental results have shown that we are able to achieve an end-to-end performance on the order of sub-seconds for a 1024×1024-pixel display area using 16 tiles. The performance can be further improved by parallel tile-based processing.
25Additional Information
- GPU-based indexing
- Nvidia Quadro FX3700 GPU card with 112 cores and 512M device memory
- Raster size is limited to 4096×4096 due to device memory constraints → 115 blocks
- 20X speedup (8.7s vs. 0.4s)
- We expect to index the same global data on SGI Octane III 2-node mini-cluster with 4 GPU cards in about 1-5 seconds after fine-tuning our current codebase → real time indexing
26Relationship with the Big Picture: Visual Explorations of Global Biodiversity Patterns