Title: We have data, now what?
- Carol Song
- Senior Research Scientist
- Rosen Center for Advanced Computing
- Purdue University
- carolxsong_at_purdue.edu
WGISS-26, September 23, 2008
Understanding and Utilizing Data
- An integrated system for real-time NEXRAD Level II radar data delivery and 3D visualization, with multi-layer user interfaces to reach a wide audience
- Collaboration among computer scientists and earth/atmospheric scientists
- Team: V. Sundaram, L. Zhao, C.X. Song, B. Benes, P. Kristof, R. Veeramacheneni, M. Huber
- Demand-driven subscription system for real-time satellite data delivery
- Purdue Terrestrial Observatory
- Team: R. Kalyanam, L. Zhao, L. Biehl, C.X. Song
- Providing data through services!
Work supported by the National Science Foundation
Next Generation Radar (NEXRAD) Level II Data
Weather Surveillance Radar (WSR-88D)
- The data has very fine temporal and spatial resolution for three attributes: reflectivity, Doppler radial velocity, and spectrum width
- These attributes are vital to understanding, monitoring, and predicting severe weather conditions
- There are 135 radar stations in the US
- Data is received continuously, in near real time, as a stream
Figures: Doppler radar tower in Connecticut and the pulsed Doppler radar inside
Acknowledgment: figures downloaded from www.CCSU.edu and www.answers.com.
NEXRAD II Data Generation
- 3D structure in radar data
- Continuous rotation over 360° in azimuth
- Simultaneous increase in elevation by 1° to 3° per complete sweep
- Continuous NEXRAD Level II radar data stream
- Data files vary in size from a few MB to tens of MB each, depending on the weather conditions
- Data compressed with a modified bzip2
- The temporal resolution is 4-5 minutes in severe weather vs. 9-10 minutes in calm weather
Figure: structure of Doppler radar data (reflectivity)
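The slides do not spell out what the "modified bzip2" is. One reading, assumed here from the NWS Archive II layout rather than taken from this presentation, is that a volume file starts with a 24-byte uncompressed header, followed by records that are each an independent bzip2 stream preceded by a 4-byte big-endian length word. Those length words between streams are presumably what keep a stock bzip2 decoder from reading the file directly. A minimal sketch under that assumption (the function name `decompress_archive2` is hypothetical):

```python
import bz2
import struct

HEADER_SIZE = 24  # assumed size of the uncompressed volume header


def decompress_archive2(raw: bytes) -> bytes:
    """Undo per-record bzip2 compression of a Level II volume file.

    Assumed layout: a 24-byte header, then records that each begin with a
    4-byte big-endian signed length followed by that many bytes of an
    independent bzip2 stream.
    """
    out = [raw[:HEADER_SIZE]]  # header is copied through unchanged
    pos = HEADER_SIZE
    while pos + 4 <= len(raw):
        (size,) = struct.unpack(">i", raw[pos:pos + 4])
        size = abs(size)  # assumption: a negative sign is a flag; only the magnitude is the length
        pos += 4
        out.append(bz2.decompress(raw[pos:pos + size]))
        pos += size
    return b"".join(out)
```

Decompressing each record separately also makes it possible to stop after the first few records, for example to read only the lowest elevation sweep.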
NEXRAD II Data Distribution
- The National Climatic Data Center (NCDC) houses the data and provides a central clearinghouse of archived Level II data as a resource for the research, teaching, and technology development communities
- Distributed through four top-tier distributors
- Purdue makes it available on the NSF TeraGrid
- Opportunity!
- The near-real-time availability of high-resolution radar data provides an exciting opportunity for meteorologists, if the data can be accessed and visualized in 3D in a timely manner
- Super-resolution data is becoming available as we speak
Technical Challenges
- Large volume and real-time streaming (50 MB/s) present major computational and data management challenges
- Super-resolution data is even larger
- Super-resolution data increases the azimuth resolution from 1 degree to 0.5 degree
- It increases the reflectivity data range resolution from 1 km to 0.25 km, and the Doppler data range from 230 km to 300 km for split cuts (generally scans at 1.5 degrees or lower elevation)
- The amount of data collected and transmitted during a volume scan will increase by a factor of approximately 2.3
- Lack of scale: analyzing data over a long period or a large geographical region requires heavy computation
- Lack of interactive 3D visualizations
- Despite the availability of 3D information in the new-generation radar data, the data is most commonly visualized as 2D images, simple 3D point clouds, or iso-surfaces
- Access method: download using FTP/HTTP, with no programmatic access
- Data format: compressed (modified bzip2), but not supported by popular libraries (e.g., RSL)
NEXRAD data products
- Online data
- Original streamed data from the NWS (compressed), searchable from a map and downloadable, covering the most recent months
- Special event data (severe weather events)
- Data services
- Uncompressed data (through data services)
- Variable values (e.g., reflectivity, radial velocity)
- Pre-generated 3D volumes
- Access methods
- Data portal
- THREDDS, OPeNDAP
- Third-party viewers (e.g., IDV, Java NEXRAD viewer)
- Programming interfaces: APIs (C library)
- New near-real-time, interactive 3D visualization
An End-to-End Integrated System
- Three important components
- Data management
- Download required files from SRB and uncompress them using the modified bzip2
- Data processing
- Read the radar files using RSL
- Process the data from multiple sites
- Convert them into renderable 3D volumes
- Visualization/data rendering
- Import the volumetric data from disk
- Create 3D textures and slices, and apply texture-based volume-rendering techniques
- Utilize transfer functions to render the data on the GPU
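A transfer function maps each scalar sample (here, reflectivity in dBZ) to a color and an opacity; a texture-based volume renderer typically samples it into a lookup table and uploads that table to the GPU. The sketch below shows a piecewise-linear version of the idea. The control points are hypothetical and are not the color map used by the system described here.

```python
from bisect import bisect_right

# Hypothetical control points: (reflectivity in dBZ, (r, g, b, alpha)), all in [0, 1].
CONTROL_POINTS = [
    (-10.0, (0.0, 0.0, 0.0, 0.0)),  # very weak echoes: fully transparent
    (20.0, (0.0, 0.8, 0.0, 0.05)),  # light precipitation: faint green
    (40.0, (1.0, 1.0, 0.0, 0.3)),   # moderate: yellow
    (60.0, (1.0, 0.0, 0.0, 0.8)),   # heavy: nearly opaque red
    (75.0, (1.0, 0.0, 1.0, 1.0)),   # extreme: opaque magenta
]


def transfer(dbz, points=CONTROL_POINTS):
    """Piecewise-linear transfer function: reflectivity -> (r, g, b, alpha)."""
    xs = [x for x, _ in points]
    if dbz <= xs[0]:
        return points[0][1]
    if dbz >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, dbz)  # xs[i - 1] <= dbz < xs[i]
    (x0, c0), (x1, c1) = points[i - 1], points[i]
    t = (dbz - x0) / (x1 - x0)
    return tuple(a + t * (b - a) for a, b in zip(c0, c1))


def build_lut(size=256, lo=-10.0, hi=75.0):
    """Sample the transfer function into a table, e.g. for a 1D texture."""
    return [transfer(lo + (hi - lo) * k / (size - 1)) for k in range(size)]
```

Keeping opacity low for weak echoes lets the strong storm cores show through the surrounding light precipitation when the volume is rendered.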
Sequential Data Processing and Rendering
Figure: flow chart of data processing and rendering
Scaling using TeraGrid
- How to scale? Key observations
- Spatial parallelism: stations can be processed independently
- Temporal parallelism: volumes generated for different intervals are independent
- Data access can be parallel as well
- Two types of computation tasks
- Processing: per station, per interval
- Merging: combines the 3D volumes from all sites and creates the full 3D volume for each interval
- Granularity of parallelization
- Depends on the processing power available
- Either fine-grained (per site, per interval) or coarse-grained (per site)
- Condor DAGMan is used to orchestrate the jobs
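The two task types map naturally onto a DAG: each merge job for an interval depends on every processing job for that interval. The sketch below writes a DAGMan input file for the fine-grained decomposition using DAGMan's `JOB`, `VARS`, and `PARENT ... CHILD` statements. The submit file names (`process.sub`, `merge.sub`), the job-naming scheme, and the macro names are hypothetical; the slides do not show the actual DAG.

```python
def build_dag(stations, intervals, process_sub="process.sub", merge_sub="merge.sub"):
    """Emit DAGMan input text for fine-grained (per site, per interval) jobs.

    One processing job per (station, interval) pair, plus one merge job per
    interval that runs only after all of that interval's processing jobs.
    Submit file names and macro names are hypothetical.
    """
    lines = []
    for t in intervals:
        parents = []
        for s in stations:
            name = f"proc_{s}_{t}"
            lines.append(f"JOB {name} {process_sub}")
            lines.append(f'VARS {name} station="{s}" interval="{t}"')
            parents.append(name)
        merge = f"merge_{t}"
        lines.append(f"JOB {merge} {merge_sub}")
        lines.append(f'VARS {merge} interval="{t}"')
        lines.append(f"PARENT {' '.join(parents)} CHILD {merge}")
    return "\n".join(lines) + "\n"
```

For example, writing `build_dag(["KIND", "KILX"], ["t0", "t1"])` to `radar.dag` and running `condor_submit_dag radar.dag` would let the processing jobs for both intervals run concurrently. A coarse-grained variant would instead create one job per station that loops over all intervals internally.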
Example
- Images rendered at different timestamps using a
dataset from scanning a 24-hour supercell storm
on March 12, 2006, in the Midwest region of the
United States.
Hurricane Ike remnant
- Hurricane Ike: data from 4 stations (3 in IL and 1 in IN) between 10 a.m. and noon on Sept. 14, 2008
A Service Architecture
Services through multiple interfaces
- Expert use mode
- Need to see details (large data, lots of processing), high interactivity, and the ability to manipulate color mapping and other settings
- With accelerated graphics hardware
- Learning/casual use mode
- Simple interface, no learning curve
- Does not require a high degree of detail
- Remote access mode
- Through web browser
- No special hardware
- Need interactivity
- Application developers
- Need API or web service interfaces to integrate
with their applications
Workload distribution for scalability
- Web 2.0 gadget for the masses
- Data preprocessed, rendered, and composed into an animation on the server; the animation (or sequence of images) is sent over the web
- Desktop client for maximum interactivity and performance
- Data preprocessed offline and 3D data volumes cached on the server
- 3D graphics rendering on the user's computer (GPU enabled)
- Web browser access for interactivity, but slower display
- Data preprocessed offline; 3D volumes cached and rendered into 3D graphics
- Images sent over the network
- User accesses the interactive application through a VNC-based Java applet
Reach out to the masses
- A LiveRadar3D Google gadget displaying 3D
visualization of radar data, continuously updated
with streaming data
The fully interactive 3D visualization client
3D Visualization of all stations
Summary
- Remote 3D visualization services delivered through multiple interfaces
- Application interface to data services for third-party integration
- An architecture that scales to different use scenarios
- Parallel data preprocessing using TeraGrid Condor resources, and partial volume caching, which improve the response time and scalability of the system
- Continuing effort
- User feedback
- Scale: support multiple users simultaneously
- Hierarchical 3D volume structure to support multi-scale investigation
Thank you!
- Publications, URLs available.
- Feel free to contact Carol
PRESTIGE: Purdue Real-Time Satellite Information Gateway
- User requirements
- Receive continuous data updates
- Real-time or near-real-time access
- Custom-tailored data configurations
- Current systems
- Impossible to generate the complete range of data products
- Requests have to be routed through the support staff
- Manual process that is time-consuming and error-prone
Range of MODIS Data Products
Note that each data product may contain a few to many variables.
- Level 1A (MOD01)
- Vegetation Index (MOD13)
- Geolocation (MOD03)
- Aerosol (MOD04)
- Water Vapor (MOD05)
- Clouds (MOD06)
- Atmospheric Profiles (MOD07)
- Reflectance (MOD09)
- Snow (MOD10)
- Fire Detection (MOD14)
- Ocean Color (MOD18)
- Sea Surface Temperature (MOD28)
- Sea Ice (MOD29)
- Cloud Mask (MOD35)
- Also multiday composites of the above
System Design
- User-driven publish/subscribe model
- Dynamic data generation
- User specifies, controls, and receives custom-tailored data
- Continuous data updates in near real time
- Multiple ways to access the data
Satellite Data Subscription
Data Subscription
- Web-portal-based user interface
- Choice-list-based option selection
- Options include satellite, coverage area, data product, projection type, and data format
- Ability to select a date range for subscription validity
- User-driven product choice expansion
- Individual user-based subscriptions
- User-initiated data production
- Data products generated only when some user is subscribed to the product
- Data production automatically turned off when no active subscription exists
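The demand-driven rule above amounts to computing, for each day, the set of product configurations that at least one valid subscription asks for, and producing only those. A minimal sketch, assuming a subscription is just the portal's option fields plus a validity date range; the class, field, and function names are hypothetical and are not PRESTIGE's implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Subscription:
    """One user's choices from the portal (hypothetical field names)."""
    user: str
    satellite: str
    area: str
    product: str
    projection: str
    fmt: str
    start: date
    end: date

    def key(self):
        """The product configuration this subscription asks for."""
        return (self.satellite, self.area, self.product, self.projection, self.fmt)

    def active_on(self, day):
        return self.start <= day <= self.end


def products_to_generate(subs, day):
    """Configurations with at least one active subscriber on `day`.

    Anything not returned is not produced, so production for a configuration
    stops by itself once its last subscription expires.
    """
    return {s.key() for s in subs if s.active_on(day)}
```

Keying production on the configuration rather than on the user means two users who choose identical options share one production run.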
Data Notification
- Push-based notifications
- Near-real-time delivery of new-data notifications through email
- Implemented by automatically invoking a web service from the processing cluster when new data is available
- Subscription database used to query active subscriptions
- Data delivery mechanism
- Data copied (via scp) from the processing cluster to webserver-accessible storage space
- Thumbnails generated for images to provide a quick-look feature
- Link to the webserver data location provided in the notification email
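When the processing cluster reports a new file, the notification step boils down to: query the active subscriptions for that product and compose one message per subscriber with a link to the copied file. The sketch below builds (but does not send) those messages with the standard library's `email.message.EmailMessage`. The base URL, the dictionary layout standing in for the subscription database, and the function name are all assumptions.

```python
from datetime import date
from email.message import EmailMessage

# Hypothetical base URL of the webserver-accessible storage area.
DATA_BASE_URL = "https://data.example.org/prestige"


def notify_new_data(subscriptions, product_key, file_name, today):
    """Build one notification email per active subscriber of `product_key`.

    `subscriptions` is an iterable of dicts with keys "email", "product",
    "start", and "end" (an assumed stand-in for the subscription database).
    """
    link = f"{DATA_BASE_URL}/{file_name}"
    messages = []
    for sub in subscriptions:
        if sub["product"] != product_key:
            continue
        if not (sub["start"] <= today <= sub["end"]):
            continue  # expired or not-yet-valid subscriptions get no notice
        msg = EmailMessage()
        msg["To"] = sub["email"]
        msg["Subject"] = f"New data available: {product_key}"
        msg.set_content(f"A new file for {product_key} is ready:\n{link}\n")
        messages.append(msg)
    return messages
```

Actual delivery could then go through `smtplib.SMTP.send_message`; keeping message construction separate from sending makes the filtering logic easy to check on its own.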
Sequential Processing of Radar Data
- We use 3D texture-based volume rendering for high-quality visualization
- To ensure efficient volume rendering, all data is resampled into a 3D rectilinear grid
- Global spherical coordinates
- RSL stores local coordinates (azimuth, elevation, and range) with the origin at the location of each station
- To combine multiple stations, all local coordinates are converted to global spherical coordinates (latitude, longitude, and altitude)
- Interpolation based on time-stamps
- Since different radars operate at different tempos, the files are interpolated based on their time-stamps
- Averaging redundant data samples
- In areas where the coverage of different radars intersects, the radar reflectivity values are averaged
Figure: 3D 256x256x128 grid structure and bounding box
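The three steps above can be sketched as follows. The slides do not say which beam-propagation model is used, so this sketch assumes the common 4/3 effective-Earth-radius model to get beam height and ground distance, then walks that distance along a great circle from the station. Overlapping samples are averaged with per-voxel sums and counts, and scans are combined in time by linear interpolation. The grid shape matches the 256x256x128 grid in the figure; the bounding box, names, and constants are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed)
K_EFFECTIVE = 4.0 / 3.0       # standard-refraction "4/3 Earth" model (assumed)


def gate_to_global(lat0, lon0, alt0, az_deg, el_deg, range_m):
    """Convert a gate's local (azimuth, elevation, range) to (lat, lon, alt)."""
    ka = K_EFFECTIVE * EARTH_RADIUS_M
    el = math.radians(el_deg)
    # Beam height above the station under the effective-radius model.
    h = math.sqrt(range_m**2 + ka**2 + 2.0 * range_m * ka * math.sin(el)) - ka
    # Arc distance along the surface, then angular distance on the real Earth.
    s = ka * math.asin(range_m * math.cos(el) / (ka + h))
    delta = s / EARTH_RADIUS_M
    phi1, lam1, theta = math.radians(lat0), math.radians(lon0), math.radians(az_deg)
    # Great-circle destination point from the station along bearing `theta`.
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    return math.degrees(phi2), math.degrees(lam2), alt0 + h


class VoxelGrid:
    """Rectilinear lat/lon/alt grid; samples landing in one voxel are averaged."""

    def __init__(self, bbox, shape):
        (self.lat_min, self.lat_max), (self.lon_min, self.lon_max), (self.alt_min, self.alt_max) = bbox
        self.nx, self.ny, self.nz = shape
        self.sums, self.counts = {}, {}

    def _index(self, lat, lon, alt):
        fx = (lon - self.lon_min) / (self.lon_max - self.lon_min)
        fy = (lat - self.lat_min) / (self.lat_max - self.lat_min)
        fz = (alt - self.alt_min) / (self.alt_max - self.alt_min)
        if not (0.0 <= fx < 1.0 and 0.0 <= fy < 1.0 and 0.0 <= fz < 1.0):
            return None  # outside the bounding box
        return int(fx * self.nx), int(fy * self.ny), int(fz * self.nz)

    def add(self, lat, lon, alt, value):
        idx = self._index(lat, lon, alt)
        if idx is not None:
            self.sums[idx] = self.sums.get(idx, 0.0) + value
            self.counts[idx] = self.counts.get(idx, 0) + 1

    def value(self, idx):
        """Mean of all samples in voxel `idx`, or None if the voxel is empty."""
        n = self.counts.get(idx)
        return None if n is None else self.sums[idx] / n


def interpolate_in_time(v0, t0, v1, t1, t):
    """Linearly interpolate a value between scans taken at times t0 < t1."""
    w = (t - t0) / (t1 - t0)
    return (1.0 - w) * v0 + w * v1
```

Averaging via running sums and counts means each station can be processed separately and its contributions added later, which fits the per-station processing and per-interval merging described for the TeraGrid version.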
Data Access API
- We developed a library API called RadarSetLib that provides programmatic access to retrieve any desired file available in SRB
- The important parts of our API:
- buildDataList: retrieves all the matching file names for a particular station and time period
- getOldestFileName: retrieves the oldest file name available for a given station
- getRadarFile: retrieves the radar file from SRB, with or without uncompressing it
- readRadar: retrieves the radar data file from SRB, uncompresses it, and then stores the converted data in a Radar structure in memory using RSL
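RadarSetLib itself is a C library working against SRB, and its signatures are not shown here. The Python sketch below only illustrates the selection logic that `buildDataList` and `getOldestFileName` describe, applied to a plain list of file names. It assumes names that begin with a 4-letter station ID followed by `YYYYMMDD_HHMMSS`, a common Level II naming pattern. The SRB listing step is left out, and the snake_case functions are hypothetical analogues, not the library's API.

```python
import re
from datetime import datetime

# Assumed naming pattern: 4-letter station ID, then YYYYMMDD_HHMMSS, then any suffix.
NAME_RE = re.compile(r"^([A-Z]{4})(\d{8}_\d{6})")


def _parse(name):
    """Return (station, scan time) for a recognized name, else None."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    return m.group(1), datetime.strptime(m.group(2), "%Y%m%d_%H%M%S")


def build_data_list(names, station, start, end):
    """Names for `station` with scan times in [start, end], oldest first."""
    hits = []
    for name in names:
        parsed = _parse(name)
        if parsed and parsed[0] == station and start <= parsed[1] <= end:
            hits.append((parsed[1], name))
    return [name for _, name in sorted(hits)]


def get_oldest_file_name(names, station):
    """Oldest available name for `station`, or None if there is none."""
    oldest = None
    for name in names:
        parsed = _parse(name)
        if parsed and parsed[0] == station and (oldest is None or parsed[1] < oldest[0]):
            oldest = (parsed[1], name)
    return None if oldest is None else oldest[1]
```

Returning the list sorted by scan time means a caller can fetch the files in order and feed them straight into the time-stamp interpolation step.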