Title: Yan Huang
16350 Spatio-temporal Data Processing Course
Overview
- Yan Huang
- huangyan_at_unt.edu
2Basic Information
- Instructor: Yan Huang (huangyan at unt.edu)
- Meeting place and time: M 2:30-5:20pm, B157
- Office hours: M 12:30-2:30pm
3Basic Information
- TA: Sasi Koneru (SasiKoneru_at_my.unt.edu)
- Office hours: Monday 10:00 AM to 2:00 PM, F208
4Evaluation
- The evaluation scheme will be
- class participation: 10
- paper analysis and presentation: 25
- project: 40
- term paper: 30
5Classroom policy
- No computers or laptops unless instructed otherwise.
6Paper Analysis I
- Collect 5 or more papers in one sub-area
- Write short summaries for 3 (100-200 words)
- Make a 15-minute presentation on what you learned
on this topic
- The presentation will take an integrated approach
where you introduce the motivation of the three
papers, give a precise problem definition,
compare and contrast the ways the three papers
approach the problem and how they validate their
results, present conclusions, and point to some
future directions if you can identify any
7Paper Analysis II
- Choose and present one paper from the reading
list
- Collect two questions from each group
- Ask two questions yourself
- Lead group discussion
- Detailed instructions are available from
- http://www.cse.unt.edu/huangyan/6350/paperAnalysis.txt
- One paper every week
8Find Related Work
- Need to know the key words
- May need to explore and refine during your search
- Often you can find electronic versions of the
papers, especially for publications related to
computer science
- Authors' websites
- ACM Digital Library
- IEEE Xplore
- Springer Online
- Google Scholar
- Your school typically subscribes to these
publishers
- Search from a computer with an IP address belonging
to your school
9Computer Science Bibliography Collections
- CiteSeer
- http://citeseer.ist.psu.edu/
- DBLP
- http://www.informatik.uni-trier.de/~ley/db/
- Google Scholar
- http://scholar.google.com/
- ACM Digital Library
- http://portal.acm.org/dl.cfm
- IEEE Xplore
- http://ieeexplore.ieee.org/
10One Way to Find Related Papers
11Term Project
- ACM GIS Cup 2014
- Teams of up to 2 people
- March 03: 10-minute presentation on algorithm
design and cost analysis
- Score is based on the normalized grade you get from
your submission.
12Term Paper
- Two choices
- Term paper
- Survey paper
13Term paper
- Research oriented
- Key components
- Problem Statement, Significance of the problem
- Related Work and Our Contributions
- Proposed Approach
- Validation of listed contributions (experimental,
analytical)
- Conclusions and Future Work
14Survey paper
- Key components
- Problem Statement, Significance of the problem
- Our Contributions (usually the
categorization/classification of the research
literature)
- A classification of the papers related to the
problem. Use a concept hierarchy, figures, and
diagrams if necessary.
- Summarize, classify, contrast, and compare the
research literature according to your
classification scheme
- A summary of the trends and future work of this
line of research.
- Conclusion.
15Spatial Databases (SDBMS)
- Traditional (non-spatial) database management
systems provide
- Persistence across failures
- Concurrent access to data
- Scalability to search queries on very large
datasets which do not fit inside main memories of
computers
- Efficient for non-spatial queries, but not for
spatial queries
- Non-spatial queries
- List the names of all bookstores with more than
ten thousand titles.
- List the names of the top ten customers, in terms of
sales, in the year 2001
- Use an index to narrow down the search
- Spatial Queries
- List the names of all bookstores within ten miles
of Minneapolis
- List all customers who live in Tennessee and its
adjoining states
- List all the customers who reside within fifty
miles of the company headquarters
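The difference between the two query types can be sketched in a few lines of Python. This is only an illustration under assumptions: the bookstore records, their coordinates, and the helper names (`large_stores`, `stores_within`) are hypothetical, and a real SDBMS would answer the spatial query through a spatial index rather than a linear scan. The non-spatial filter compares a single attribute, while the spatial filter has to compute a great-circle distance (haversine formula) for every candidate.

```python
import math

# Hypothetical bookstore records: (name, number of titles, latitude, longitude).
BOOKSTORES = [
    ("Downtown Books", 15000, 44.98, -93.27),
    ("Lakeside Reads", 8000, 44.95, -93.30),
    ("Duluth Pages", 12000, 46.79, -92.10),
]

MINNEAPOLIS = (44.9778, -93.2650)  # approximate city-center coordinates
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def large_stores(stores, min_titles=10000):
    """Non-spatial query: a plain comparison on one attribute."""
    return [name for name, titles, _, _ in stores if titles > min_titles]


def stores_within(stores, center, miles):
    """Spatial query: needs a distance computation per record (no index here)."""
    lat0, lon0 = center
    return [name for name, _, lat, lon in stores
            if haversine_miles(lat0, lon0, lat, lon) <= miles]


print(large_stores(BOOKSTORES))
print(stores_within(BOOKSTORES, MINNEAPOLIS, 10))
```

An ordinary B-tree index on the number of titles speeds up the first query, but it does not help the second one, which is the motivation for spatial indices.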
16Value of SDBMS
- Examples of non-spatial data
- Names, phone numbers, email addresses of people
- Examples of Spatial data
- Census Data
- NASA satellite imagery: terabytes of data per
day
- Weather and Climate Data
- Rivers, Farms, ecological impact
- Medical Imaging
- Exercise: Identify spatial and non-spatial data
items in
- A phone book
- A product catalog
17User, Application domains
- Many important application domains have spatial
data and queries. Some examples follow.
- Army Field Commander: Has there been any
significant enemy troop movement since last
night?
- Insurance Risk Manager: Which homes are most
likely to be affected in the next great flood on
the Mississippi?
- Medical Doctor: Based on this patient's MRI,
have we treated somebody with a similar
condition?
- Molecular Biologist: Is the topology of the amino
acid biosynthesis gene in the genome found in any
other sequence feature map in the database?
- Astronomer: Find all blue galaxies within 2 arcmin
of quasars.
- Exercise: List two ways you have used spatial
data. Which software did you use to manipulate
spatial data?
18SDBMS
- An SDBMS is a software module that
- can work with an underlying DBMS
- supports spatial data models, spatial abstract
data types (ADTs) and a query language from which
these ADTs are callable
- supports spatial indexing, efficient algorithms
for processing spatial operations, and domain
specific rules for query optimization
- Example: Oracle Spatial data cartridge, ESRI SDE
- can work with Oracle DBMS
- Has spatial data types (e.g. polygon), operations
(e.g. overlap) callable from SQL3 query language
- Has spatial indices, e.g. R-trees
- IBM Spatial Option
- Informix Spatial Datablade
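To make the idea of a spatial ADT with callable operations concrete, here is a minimal Python sketch. The `Polygon` class and its methods are hypothetical names, not the API of Oracle Spatial or any other product. It implements only the minimum-bounding-rectangle (MBR) test that R-tree style indices use as a cheap filter step; an exact polygon overlap test would still be needed as a refinement step and is omitted here.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Polygon:
    """Hypothetical polygon ADT stored as a tuple of (x, y) vertices."""
    vertices: Tuple[Point, ...]

    def mbr(self):
        """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
        xs = [x for x, _ in self.vertices]
        ys = [y for _, y in self.vertices]
        return (min(xs), min(ys), max(xs), max(ys))

    def mbr_overlaps(self, other):
        """Filter step only: True if the two bounding rectangles intersect."""
        ax1, ay1, ax2, ay2 = self.mbr()
        bx1, by1, bx2, by2 = other.mbr()
        return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2


square_a = Polygon(((0, 0), (2, 0), (2, 2), (0, 2)))
square_b = Polygon(((1, 1), (3, 1), (3, 3), (1, 3)))
triangle_c = Polygon(((5, 5), (6, 5), (6, 6)))

print(square_a.mbr_overlaps(square_b))
print(square_a.mbr_overlaps(triangle_c))
```

Two polygons whose MBRs overlap may still be disjoint, so this test can produce false positives but never false negatives.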
19SDBMS vs. GIS
- GIS is software to visualize and analyze
spatial data using spatial analysis functions
such as
- Search: Thematic search, search by region,
(re-)classification
- Location analysis: Buffer, corridor, overlay
- Terrain analysis: Slope/aspect, catchment,
drainage network
- Flow analysis: Connectivity, shortest path
- Distribution: Change detection, proximity, nearest
neighbor
- Spatial analysis/Statistics: Pattern, centrality,
autocorrelation, indices of similarity, topology:
hole description
- Measurements: Distance, perimeter, shape,
adjacency, direction
- GIS uses SDBMS
- to store, search, query, share large spatial data
sets
20SDBMS vs. GIS
- SDBMS focuses on
- Efficient storage, querying, sharing of large
spatial datasets
- Provides simpler set-based query operations
- Example operations: search by region, overlay,
nearest neighbor, distance, adjacency, perimeter
etc.
- Uses spatial indices and query optimization to
speed up queries over large spatial datasets.
- SDBMS may be used by applications other than GIS
- Astronomy, Genomics, Multimedia information
systems, ...
21Issues in SDBMS
- Spatial data model
- Query language
- Query processing
- File organization and indices
- Query optimization, etc.
22Spatio-temporal Databases
- Add temporal dimension
- Examples
- Trajectories
- Evolving region
- Moving points
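A trajectory can be represented, as one possible sketch, by a time-ordered list of sampled positions, with the position between samples estimated by linear interpolation. The function name `position_at` and the `(t, x, y)` sample layout are assumptions made for this illustration; real moving-object databases use richer models.

```python
from bisect import bisect_right


def position_at(trajectory, t):
    """Estimate the (x, y) position at time t.

    trajectory: list of (t, x, y) samples with strictly increasing t.
    Returns None when t falls outside the sampled time span.
    """
    if not trajectory or t < trajectory[0][0] or t > trajectory[-1][0]:
        return None
    times = [sample[0] for sample in trajectory]
    i = bisect_right(times, t)          # first sample strictly after t
    if i == len(trajectory):            # t equals the last timestamp
        return trajectory[-1][1:]
    t0, x0, y0 = trajectory[i - 1]
    t1, x1, y1 = trajectory[i]
    f = (t - t0) / (t1 - t0)            # fraction of the way from sample i-1 to i
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))


# Hypothetical samples: the object moves east, then north.
track = [(0, 0.0, 0.0), (10, 10.0, 0.0), (20, 10.0, 20.0)]
print(position_at(track, 5))
print(position_at(track, 15))
```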
23Geo-stream databases
- Many data are generated continuously
- Transaction data
- Network monitoring
- Financial application
- Most recent data are commonly queried in a
one-pass fashion
- Monitoring
- Aggregation
- Database systems provide abstractions and
declarative languages that stream processing can
benefit from
24Stream Application
- Environmental monitoring
- Patient monitoring
- Finance
- Network monitoring
- Click-streams
- Transaction monitoring
- Traffic analysis
- Moving object queries
- Sensor network
- RFID
25Sample Applications
- Environmental monitoring
- Notify me when UV is high, temperature is low
- Traffic monitoring
- Traffic jam: aggregated speed much below the speed
limit on a road segment for an extended time
- Accident: vehicle in an unintended space, e.g. on a
highway, for longer than expected
- Click-streams
- Find the school districts of the houses that the
user browses the most.
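The traffic-jam condition above can be sketched as a one-pass sliding-window aggregate over a stream of speed readings for one road segment. Everything specific here is an assumption for illustration: the class name `JamDetector`, the 50% ratio that stands in for "much below", the window length, and the number of consecutive slow windows that stands in for "extended time".

```python
from collections import deque


class JamDetector:
    """Reports a jam when the windowed mean speed stays far below the limit."""

    def __init__(self, speed_limit, ratio=0.5, window=3, min_windows=2):
        self.threshold = speed_limit * ratio    # assumed meaning of "much below"
        self.readings = deque(maxlen=window)    # keeps only the most recent readings
        self.min_windows = min_windows          # assumed meaning of "extended time"
        self.slow_streak = 0

    def update(self, speed):
        """Consume one speed reading; return True while a jam is reported."""
        self.readings.append(speed)
        if len(self.readings) < self.readings.maxlen:
            return False                        # window not yet full
        mean_speed = sum(self.readings) / len(self.readings)
        if mean_speed < self.threshold:
            self.slow_streak += 1
        else:
            self.slow_streak = 0
        return self.slow_streak >= self.min_windows


detector = JamDetector(speed_limit=60)
alerts = [detector.update(s) for s in [55, 20, 15, 10, 12, 58, 60, 59]]
print(alerts)
```

Each reading is processed once and only the last few readings are kept, which matches the one-pass style of stream processing.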
26Geo-streams
- Current stream systems lack native spatial
support
- Spatial stream queries are common in
- traffic monitoring
- environment monitoring
- moving object databases
27Location Privacy
28Route prediction
- Next position
- Next stop
- The entire route
- Application
- Mobile commerce
- Save energy
- Traffic notification
29Location-based social networking
- Social networking with location
- Loopt
- Google Latitude
- Geocache
- Social dynamics
- iPhone applications
30Volunteer Geographic Information System
- OpenStreetMap
- Wikimapia
- Foursquare
- Trapster
31Spatio-temporal Analytics
- The analysis of data with both spatial and
temporal information
- The data are spatially and/or temporally
correlated
"Everything is related to everything else, but
near things are more related than distant things."
(Tobler's first law of geography)
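One common way to quantify "near things are more related" is a spatial autocorrelation statistic such as Moran's I, defined for values $x_i$ and spatial weights $w_{ij}$ as $I = \frac{n}{\sum_{i,j} w_{ij}} \cdot \frac{\sum_{i,j} w_{ij}(x_i-\bar{x})(x_j-\bar{x})}{\sum_i (x_i-\bar{x})^2}$. The sketch below is an illustration, not part of the course material: the helper `line_weights` assumes locations arranged along a line with binary adjacency weights, and the sample values are made up.

```python
def morans_i(values, weights):
    """Moran's I for values x_i and a spatial weight matrix w_ij (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    variance_sum = sum(d * d for d in dev)
    return (n / w_total) * (cross / variance_sum)


def line_weights(n):
    """Binary adjacency weights for n locations along a line."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = line_weights(6)
smooth = [1, 2, 3, 4, 5, 6]        # neighbors have similar values
alternating = [1, 6, 1, 6, 1, 6]   # neighbors have dissimilar values
print(morans_i(smooth, w))
print(morans_i(alternating, w))
```

A positive value indicates that neighboring locations tend to have similar values; a negative value indicates the opposite.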
32Why do we need spatio-temporal analytics
- Analytics help us to describe what happened in
the past, understand what is happening now,
predict what will happen in the future, and make
decisions.
- The proliferation of sensor devices makes
spatio-temporal information a fundamental
component of almost every analytical application
33Types of Spatio-Temporal Analytics Methods
- Visualization and exploratory analysis
- Segmentation (classification and clustering)
- Outlier analysis
- Colocation mining
- Dependency analysis
- Trend discovery
34Data Visualization and Exploratory Analysis
- Map querying task
- Static query (one-time query using map tools
available on the interface)
- Dynamic query [36] (setup of event alert
conditions)
- Spatial constraints are expressed using the map,
while temporal constraints are expressed as
linear time moments [37]
- Map animation [38]
- Focusing, linking and arranging views [39]
- Map iteration [40]
- Existential changes [25]
- Location changes
- Attribute Changes
35Data Visualization and Exploratory Analysis
Example
36Segmentation methods
- Classification [41]
- Spatial classification: decision tree, Bayesian,
ANN
- Temporal classification: decision tree, Bayesian,
ANN
- Temporal extensions to spatial classification /
spatial extensions to temporal classification
- Clustering [42]
- Spatial clustering: partitioning method,
hierarchical method, density-based method, and
grid-based method.
- Temporal clustering
- Interactive spatio-temporal clustering: perform
clustering spatially or temporally and then test
whether the clusters exist in both dimensions (EMM
test [43])
- Simultaneous spatio-temporal clustering:
space-time scan [44]
37More on Spatio-Temporal Clustering
38More on Spatio-Temporal Clustering
- Model-based clustering [45]
- Define a multivariate density distribution and
look for a set of fitting parameters for the
model.
- Distance-based method
- Moving object similarity search
- Density-based method
- DBSCAN extensions, OPTICS [46]
- Flocks and convoys
- Moving clusters [47]
- Applications: movement data, cellular networks,
environment data
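As an illustration of the density-based family, the following is a minimal sketch of DBSCAN in which the neighborhood of a point is restricted in both space and time. It is a simplified teaching example, not a reproduction of any specific published extension; the function names and the parameters `eps_space`, `eps_time`, and `min_pts` are assumptions, and the neighbor search is a linear scan rather than an indexed lookup.

```python
import math


def st_neighbors(points, i, eps_space, eps_time):
    """Indices of points within eps_space and eps_time of points[i] (including i)."""
    x, y, t = points[i]
    return [j for j, (px, py, pt) in enumerate(points)
            if math.hypot(px - x, py - y) <= eps_space and abs(pt - t) <= eps_time]


def st_dbscan(points, eps_space, eps_time, min_pts):
    """Label each (x, y, t) point with a cluster id, or -1 for noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbors = st_neighbors(points, i, eps_space, eps_time)
        if len(neighbors) < min_pts:
            labels[i] = NOISE                 # may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster           # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            j_neighbors = st_neighbors(points, j, eps_space, eps_time)
            if len(j_neighbors) >= min_pts:   # j is a core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels


# Hypothetical observations: two groups at the same place but at different times,
# plus one isolated point.
observations = [
    (0, 0, 0), (1, 0, 1), (0, 1, 2), (1, 1, 1),
    (0, 0, 100), (1, 0, 101), (0, 1, 102),
    (50, 50, 50),
]
print(st_dbscan(observations, eps_space=2, eps_time=5, min_pts=3))
```

A purely spatial DBSCAN would merge the first two groups, because they occupy the same location; the temporal threshold keeps them apart.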
39Spatio-Temporal Clustering Example
40Spatio-Temporal Outlier Analysis
- Definition of outliers
- A spatio-temporal object whose thematic attribute
values are significantly different from those of
other spatially and temporally referenced objects
in its spatial and/or temporal neighborhoods.
- Methods [48]
- Clustering-based approach
- Distance based approach
- Computational geometry based approach
- Spatial scan based approach
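The definition of an outlier can be turned into a simple neighborhood-comparison sketch: compare each object's attribute value with the mean value of its spatio-temporal neighbors, then flag objects whose difference is unusually large relative to all other differences. This is only one plausible reading of the definition and is not taken from [48]; the function name, the thresholds, and the z-score cutoff of 2.0 are assumptions.

```python
import math
from statistics import mean, pstdev


def st_outliers(observations, eps_space, eps_time, z_threshold=2.0):
    """observations: list of (x, y, t, value). Returns indices flagged as outliers."""
    diffs = []
    for i, (x, y, t, value) in enumerate(observations):
        neighbor_values = [
            ov for j, (ox, oy, ot, ov) in enumerate(observations)
            if j != i
            and math.hypot(ox - x, oy - y) <= eps_space
            and abs(ot - t) <= eps_time
        ]
        # Objects without neighbors get a difference of 0 and are never flagged here.
        diffs.append(value - mean(neighbor_values) if neighbor_values else 0.0)
    mu, sigma = mean(diffs), pstdev(diffs)
    if sigma == 0:
        return []
    return [i for i, d in enumerate(diffs) if abs(d - mu) / sigma > z_threshold]


# Hypothetical sensor readings along a line at one time step; index 4 is anomalous.
readings = [(x, 0, 0, 10) for x in range(9)]
readings[4] = (4, 0, 0, 50)
print(st_outliers(readings, eps_space=1, eps_time=1))
```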
41Spatio-Temporal Outlier Detection Example
42Co-Location Mining
- Co-location mining finds subsets of Boolean
features located in spatial proximity
- Methods [50]
- Data mining-based approach
- Spatial statistical approach
- Buffer-based model
- Temporal extension: mixed-drove approach,
weighted window-based model [51]
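For a pair of features, spatial proximity can be measured with a participation index: the participation ratio of a feature is the fraction of its instances that have at least one instance of the other feature within a distance threshold, and the index is the smaller of the two ratios. The sketch below covers only the two-feature case with a brute-force neighbor search; the function name, the `(feature, x, y)` record layout, and the sample data are assumptions for illustration.

```python
import math


def participation_index(instances, feature_a, feature_b, max_dist):
    """instances: list of (feature, x, y). Returns the participation index of {A, B}."""
    a_pts = [(x, y) for f, x, y in instances if f == feature_a]
    b_pts = [(x, y) for f, x, y in instances if f == feature_b]
    if not a_pts or not b_pts:
        return 0.0
    # All neighboring (A instance, B instance) pairs within max_dist.
    pairs = [(i, j)
             for i, (ax, ay) in enumerate(a_pts)
             for j, (bx, by) in enumerate(b_pts)
             if math.hypot(ax - bx, ay - by) <= max_dist]
    a_ratio = len({i for i, _ in pairs}) / len(a_pts)   # share of A near some B
    b_ratio = len({j for _, j in pairs}) / len(b_pts)   # share of B near some A
    return min(a_ratio, b_ratio)


# Hypothetical point features.
sample = [
    ("A", 0, 0), ("A", 10, 10),
    ("B", 0, 1), ("B", 0.5, 0), ("B", 20, 20),
]
print(participation_index(sample, "A", "B", max_dist=1.5))
```

A pattern is usually reported as a co-location when its index exceeds a user-chosen threshold.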
43Co-Location Mining Example
44Other methods
- Association rule mining
- Spatial preprocessing is required to discretize
spatial measurements
- Methods [49]
- Bayesian networks
- Hierarchical approach
- Trend discovery
- Regression
- Sequence mining
45List of Current Spatio-Temporal Analytics Tools
- Commercial
- ESRI ArcGIS series
- Microsoft SQL Spatial StreamInsight
- Other commercial tools
- Open source/free software
- Descartes and CommonGIS
- MapServer
- Other free tools
46ESRI ArcGIS Series
- ArcGIS Desktop and Server provide the most advanced
and complete toolkit
- Has many extensions for different domains
- Can use APIs to develop extensions, web or
desktop applications for customized needs. Many
other commercial tools such as CUBE [9] are built
on top of ArcGIS.
47ESRI ArcGIS Desktop and Server Extensions [1]
- 3D Extension (Desktop and Server)
- Analyze terrain data, model subsurface features,
view and analyze impact zones, determine optimum
facility placement, share 3D views, create a 3D
virtual city.
- Geostatistical Extension (Desktop and Server)
- Visualize, model, and predict spatial
relationships.
- Link data, graphs, and maps dynamically.
- Perform deterministic and geostatistical
interpolation.
- Evaluate models and predictions probabilistically
48ESRI ArcGIS Desktop and Server Extensions
- Network Extension (Desktop and Server)
- Dynamically model realistic network
conditions and solve vehicle routing problems
- Multipoint optimized routing, time-sensitive,
turn-by-turn driving directions, allocation of
service areas, determining the fastest fixed
route to the closest facility
- Schematics Extension (Desktop and Server)
- Rapid checking of network connectivity
- Automatically generate schematics
49ESRI ArcGIS Desktop and Server Extensions
- Spatial Extension (Desktop and Server)
- Comprehensive, raster-based spatial modeling and
analysis.
- Survey Extension (Desktop)
- Capture, edit, and leverage land records using
proven survey methodologies
- Tracking Extension (Desktop)
- Create time series visualizations so you can
analyze information relative to time and location
50ESRI Domain-Specific Solutions
- ESRI Business Analyst Online
- Web-based solution that combines GIS technology
with extensive demographic, consumer spending,
and business data for the entire United States to
deliver on-demand, boardroom-ready reports and
maps
- Perform drive-time analysis
- Analyze trade areas
- Evaluate sites
- Identify most profitable customers and reach
customers
51ESRI Domain-Specific Solutions
- ArcGIS Community Analyst
- Web-based solution that provides GIS
capabilities to analyze data in a geographic
context as granular as congressional districts,
block groups, census tracts, or ZIP Codes.
- ArcLogistics
- Create optimized routes and schedules based on
multiple factors such as customer needs, business
rules, vehicle traits, and street restrictions.
- Esri Situational Awareness
- Provides a geospatial framework for immediate and
long-term situational awareness needs.
- Includes a powerful data fusion and analysis
engine, a set of fully customizable clients for
data visualization and analysis, and locally
hosted, prerendered data.
52Microsoft SQL Library StreamInsight [2]
- Combines SQL Server spatial library with stream
processing engine
- Integrates the SQL library within the StreamInsight
engine
- Focuses on data stream event processing workflow
- GIS support relies on SQL Server (limited), and
therefore needs extensive customization for
applications
53Other Commercial tools by category
- Complete GIS Suite (similar to ArcGIS)
- Cadcorp SIS [8], GeoMedia [17], IDRISI [18],
MapInfo [19]
- Spatio-temporal analysis
- STIS [23]
- Network (traffic) analysis tools
- ACCESSION GIS [3], AltaMap Suite [4], CUBE [9],
DYNAMEQ [15], EMME [14]
- Terrain analysis
- ANUDEM [5]
- CAD applications
- AutoCAD Civil3D [6]
- Emergency and hazard modeling and analysis
- CadnaA [10], Calpuff View [11], Caris [12], CATS [13],
Floodworks [16]
- Specialized analysis
- ClusterSeer and BoundarySeer [7] (cluster and
boundary analysis), Mathematica [20]
- Mathematics toolkit
- Matlab Plus toolbox [21], SPSS [22]
54Descartes and CommonGIS [24]
- An interactive Java-based GIS tool for
visualization and exploratory analysis.
- Functionalities
- Map and graph visualization (choropleth maps,
scatter plot)
- Basic queries (distance, difference)
- Dynamic queries
- Open source and customizable, but lacks advanced GIS
analytics functionalities
55MapServer [31]
- Open source GIS data rendering engine
- Functionalities
- Advanced cartographic output
- Cross platform and APIs for all popular scripting
languages
- Supports many formats
- OGC standard compliant
- Not a full GIS suite
56Other Free/Open Source tools by category
- Complete GIS Suite (similar to ArcGIS)
- GRASS [28]
- Spatio-temporal analysis
- Map Comparison Kit [30], STARS [34]
- Terrain analysis
- LandSerf [29]
- Exploratory data analysis
- GeoDa [26]
- Database extension
- PostGIS [32]
- Specialized analysis
- GAM/K [25] (clustering), GRASP [27] (regression)
- Mathematics toolkit
- R Spatial [33]
57
- Spatio-temporal analytics is becoming a
fundamental component of business analytics
- The future
- Big data (bigger due to the spatio-temporal
dimension)
- Real time (not only historical spatio-temporal
data, but also streaming data that requires
optimization at all levels)
58References (I)
- [1] http://www.esri.com/software/arcgis/index.html
- [2] http://msdn.microsoft.com/en-us/library/ee362541.aspx
- [3] http://www.citilabs.com/accession.html
- [4] http://www.geomicro.com/
- [5] http://fennerschool.anu.edu.au/publications/software/anudem.php
- [6] http://usa.autodesk.com/adsk/servlet/pc/index?siteID=123112&id=8777380
- [7] http://www.terraseer.com/products_boundaryseer.php
- [8] http://www.cadcorp.com/products_geographical_information_systems/index.htm
- [9] http://www.citilabs.com/cube_base.html
- [10] http://www.datakustik.com/en/products/cadnaa/
- [11] http://www.weblakes.com/calpuff/calpuff_overview.html
- [12] http://www.caris.com
- [13] http://www.saic.com/products/security/cats/
- [14] http://www.inro.ca/en/products/emme/
- [15] http://www.inro.ca/en/products/dynameq/
- [16] http://www.wallingfordsoftware.com/uk/products/floodworks/
- [17] http://www.intergraph.com/sgi/default.aspx
- [18] http://www.clarklabs.org/
- [19] http://www.pbinsight.com/welcome/mapinfo/
59References (II)
- [24] http://www.esds.ac.uk/international/support/user_guides/gisoverview.asp
- [25] http://www.ccg.leeds.ac.uk/software/gam/
- [26] http://geodacenter.asu.edu/software
- [27] http://www.unine.ch/CSCF/grasp/
- [28] http://grass.fbk.eu/
- [29] http://www.landserf.org
- [30] http://www.riks.nl/products/Map_Comparison_Kit
- [31] http://mapserver.gis.umn.edu/
- [32] http://postgis.refractions.net/
- [33] http://cran.r-project.org/web/views/Spatial.html
- [34] http://regionalanalysislab.org/index.php/Main/STARS
- [35] P. Compieta, S. Di Martino, M. Bertolotto,
F. Ferrucci, and T. Kechadi. 2007. Exploratory
spatio-temporal data mining and visualization. J.
Vis. Lang. Comput. 18, 3 (June 2007), 255-279.
- [36] C. Ahlberg, C. Williamson, B. Shneiderman,
Dynamic queries for information exploration: an
implementation and evaluation, in: Proceedings
ACM CHI'92, ACM Press, New York, 1992, pp.
619-626.
- [37] M. Harrower, A.M. MacEachren, A.L. Griffin,
Developing a geographic visualization tool to
support earth science learning, Cartography and
Geographic Information Science 27 (4) (2000)
279-293.
- [38] W.L. Hibbard, B.E. Paul, D.A. Santek, C.R.
Dyer, A.L. Battaiola, M.-F. Voidrot-Martinez,
Interactive visualization of earth and space
science computations, Computer 27 (7) (1994)
65-72.
- [39] A. Buja, J.A. McDonald, J. Michalak, W.
Stuetzle, Interactive data visualization using
focusing and linking, in: Proceedings IEEE
Visualization '91, IEEE Computer Society Press,
Washington, 1991, pp. 156-163.
- [40] D. Stojanovic, S. Djordjevic-Kajan, A.
Mitrovic, Z. Stojanovic, Cartographic
visualization and animation of the dynamic
geographic processes and phenomena, in:
Proceedings of 19th International Cartographic
Conference, Ottawa, Canada, Vol. 1, 1999, pp.
739-746.
60References (III)
- [41] Kumar, M., Bhatt, G., Beeson, P., Duffy, C.
Automated Detection and Spatio-Temporal
Classification of Channel Reaches in Semi-arid
Southwestern US Using ASTER. American Geophysical
Union, 2006 Joint Assembly.
- [42] Tim E. Carpenter, Methods to investigate
spatial and temporal clustering in veterinary
epidemiology, Preventive Veterinary Medicine,
Volume 48, Issue 4, 29 March 2001, Pages 303-320.
- [43] Fosgate, G.T., Carpenter, T.E., Case, J.T.,
Chomel, B.B., 2000. Time-spatial clustering of
human cases of brucellosis: California,
1973-1992. In: Proceedings of the Ninth
International Society on Veterinary Epidemiology
and Economics, Breckenridge, CO
- [44] McKenzie, J.S., Pfeiffer, D.U., Morris,
R.S., 2000. Spatial and temporal patterns of
vector-borne tuberculosis infection in beef
breeding cattle in New Zealand. In: Proceedings
of the Ninth International Society on Veterinary
Epidemiology and Economics, Breckenridge, CO
- [45] Chudova D, Gaffney S, Mjolsness E, Smyth P
(2003) Translation-invariant mixture models for
curve clustering. In: KDD '03: Proceedings of the
ninth ACM SIGKDD international conference on
Knowledge discovery and data mining, ACM, New
York, NY, USA, pp 79-88
- [46] Ankerst M, Breunig MM, Kriegel HP, Sander J
(1999) OPTICS: ordering points to identify the
clustering structure. SIGMOD Rec 28(2):49-60
- [47] Kalnis P, Mamoulis N, Bakiras S (2005) On
discovering moving clusters in spatio-temporal
data. Advances in Spatial and Temporal Databases
pp 364-381
- [48] Birant, D., Kut, A. Spatio-temporal
outlier detection in large databases. 28th
International Conference on Information
Technology Interfaces, 2006.
- [49] Jeremy Mennis, Jun Wei Liu. Mining
Association Rules in Spatio-Temporal Data: An
Analysis of Urban Socioeconomic and Land Cover
Change. http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9671.2005.00202.x/abstract
- [50] Y. Huang, S. Shekhar, and H. Xiong,
Discovering colocation patterns from spatial
datasets: A general approach, IEEE Transactions
on Knowledge and Data Engineering, vol. 16, no.
12, pp. 1472-1485, 2004
- [51] Feng Qian, Liang Yin, Qinming He,
Jiangfeng He. Mining spatio-temporal
co-location patterns with weighted sliding
window. IEEE International Conference on
Intelligent Computing and Intelligent Systems,
2009. ICIS 2009.