Yan Huang - PowerPoint PPT Presentation

1
6350 Spatio-temporal Data Processing Course
Overview
  • Yan Huang
  • huangyan_at_unt.edu

2
Basic Information
  • Instructor: Yan Huang (huangyan at unt.edu)
  • Meeting place and time: M 2:30-5:20 pm, B157
  • Office hours: M 12:30-2:30 pm

3
Basic Information
  • TA: Sasi Koneru (SasiKoneru_at_my.unt.edu)
  • Office hours: Monday 10:00 AM to 2:00 PM, F208

4
Evaluation
  • The evaluation scheme will be:
  • Class participation: 10
  • Paper analysis and presentation: 25
  • Project: 40
  • Term paper: 30

5
Classroom policy
  • No computers or laptops unless you are told to use them.

6
Paper Analysis I
  • Collect 5 or more papers in one sub-area
  • Write short summaries of 3 of them (100-200 words each)
  • Give a 15-minute presentation on what you learned
    about this topic
  • The presentation should take an integrated approach:
    introduce the motivation of the three
    papers, give a precise problem definition,
    compare and contrast how the 3 papers
    approach the problem and how they validate their
    results, present conclusions, and point to
    future directions if you can identify any

7
Paper Analysis II
  • Choose and present one paper from the reading
    list
  • Collect two questions from each group
  • Ask two questions yourself
  • Lead group discussion
  • Detailed instructions are available from
  • http://www.cse.unt.edu/huangyan/6350/paperAnalysis.txt
  • One paper every week

8
Find Related Work
  • You need to know the keywords
  • You may need to explore and refine them during your search
  • You can often find electronic versions of the
    papers, especially for publications related to
    computer science
  • Authors' websites
  • ACM Digital Library
  • IEEE Xplore
  • Springer Online
  • Google Scholar
  • Your school typically subscribes to these
    publishers
  • Search from a computer with an IP address belonging
    to your school

9
Computer Science Bibliography Collections
  • CiteSeer
  • http://citeseer.ist.psu.edu/
  • DBLP
  • http://www.informatik.uni-trier.de/~ley/db/
  • Google Scholar
  • http://scholar.google.com/
  • ACM Digital Library
  • http://portal.acm.org/dl.cfm
  • IEEE Xplore
  • http://ieeexplore.ieee.org/

10
One Way to Find Related Papers

11
Term Project
  • ACM GIS Cup 2014
  • Teams of up to 2 people
  • March 3: 10-minute presentation on algorithm
    design and cost analysis
  • The score is based on the normalized grade your
    submission receives.

12
Term Paper
  • Two choices
  • Term paper
  • Survey paper

13
Term paper
  • Research oriented
  • Key components
  • Problem Statement, Significance of the problem
  • Related Work and Our Contributions
  • Proposed Approach
  • Validation of listed contributions (experimental,
    analytical)
  • Conclusions and Future Work

14
Survey paper
  • Key components
  • Problem Statement, Significance of the problem
  • Our Contributions (usually it is the
    categorization/classification of the research
    literature)
  • A classification of the papers related to the
    problem. Use a concept hierarchy, figures, and
    diagrams if necessary.
  • Summarize, classify, contrast, and compare the
    research literature according to your
    classification scheme
  • A summary of the trend and future work of this
    line of research.
  • Conclusion.

15
Spatial Databases (SDBMS)
  • Traditional (non-spatial) database management
    systems provide
  • Persistence across failures
  • Concurrent access to data
  • Scalability to search queries on very large
    datasets that do not fit in the main memory of
    a computer
  • Efficient for non-spatial queries, but not for
    spatial queries
  • Non-spatial queries
  • List the names of all bookstores with more than
    ten thousand titles.
  • List the names of the top ten customers, in terms of
    sales, in the year 2001
  • Use an index to narrow down the search
  • Spatial queries
  • List the names of all bookstores within ten miles
    of Minneapolis
  • List all customers who live in Tennessee and its
    adjoining states
  • List all the customers who reside within fifty
    miles of the company headquarters
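
To make the contrast concrete, here is a minimal Python sketch of one non-spatial and one spatial query from the list above. The bookstore rows, their approximate coordinates, and the function names are all made up for illustration. The spatial predicate scans every row and computes a great-circle (haversine) distance, which is the kind of work a spatial index is meant to avoid.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical table: (name, latitude, longitude, number_of_titles)
BOOKSTORES = [
    ("Downtown Books", 44.9780, -93.2650, 25000),
    ("St. Paul Reader", 44.9537, -93.0900, 8000),
    ("Duluth Pages", 46.7867, -92.1005, 15000),
]

MINNEAPOLIS = (44.9778, -93.2650)  # approximate city-center coordinates


def stores_with_many_titles(stores, min_titles):
    """Non-spatial query: a plain attribute comparison."""
    return [name for name, _, _, titles in stores if titles > min_titles]


def stores_within(stores, center, radius_miles):
    """Spatial query: distance predicate evaluated by a full scan."""
    lat0, lon0 = center
    return [
        name
        for name, lat, lon, _ in stores
        if haversine_miles(lat0, lon0, lat, lon) <= radius_miles
    ]


if __name__ == "__main__":
    print(stores_with_many_titles(BOOKSTORES, 10000))
    print(stores_within(BOOKSTORES, MINNEAPOLIS, 10))
```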

16
Value of SDBMS
  • Examples of non-spatial data
  • Names, phone numbers, email addresses of people
  • Examples of spatial data
  • Census data
  • NASA satellite imagery: terabytes of data per
    day
  • Weather and climate data
  • Rivers, farms, ecological impact
  • Medical imaging
  • Exercise: Identify spatial and non-spatial data
    items in
  • A phone book
  • A product catalog

17
User, Application domains
  • Many important application domains have spatial
    data and queries. Some examples follow.
  • Army field commander: Has there been any
    significant enemy troop movement since last
    night?
  • Insurance risk manager: Which homes are most
    likely to be affected in the next great flood on
    the Mississippi?
  • Medical doctor: Based on this patient's MRI,
    have we treated somebody with a similar condition?
  • Molecular biologist: Is the topology of the amino
    acid biosynthesis gene in the genome found in any
    other sequence feature map in the database?
  • Astronomer: Find all blue galaxies within 2 arcmin
    of quasars.
  • Exercise: List two ways you have used spatial
    data. Which software did you use to manipulate
    spatial data?

18
SDBMS
  • An SDBMS is a software module that
  • can work with an underlying DBMS
  • supports spatial data models, spatial abstract
    data types (ADTs) and a query language from which
    these ADTs are callable
  • supports spatial indexing, efficient algorithms
    for processing spatial operations, and domain
    specific rules for query optimization
  • Examples: Oracle Spatial data cartridge, ESRI SDE
  • can work with the Oracle DBMS
  • has spatial data types (e.g. polygon) and operations
    (e.g. overlap) callable from the SQL3 query language
  • has spatial indices, e.g. R-trees
  • IBM Spatial Option
  • Informix Spatial Datablade
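
As a rough illustration of what a spatial index buys, the sketch below uses a uniform grid, which is a deliberately simpler stand-in for the R-trees named above and not how Oracle Spatial or ESRI SDE work internally. The class name, method names, and cell size are assumptions. A range query inspects only the grid cells that overlap the query rectangle instead of scanning every object.

```python
from collections import defaultdict


class GridIndex:
    """Uniform-grid point index (a simplified stand-in for an R-tree)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (cx, cy) -> [(obj_id, x, y), ...]

    def _cell(self, x, y):
        # Map a coordinate to the integer id of the grid cell containing it.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, obj_id, x, y):
        self.cells[self._cell(x, y)].append((obj_id, x, y))

    def range_query(self, xmin, ymin, xmax, ymax):
        """Return the ids of points inside the rectangle, bounds inclusive."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        # Filter step: visit only the cells overlapping the query rectangle.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for obj_id, x, y in self.cells.get((cx, cy), ()):
                    # Refine step: exact test on the surviving candidates.
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(obj_id)
        return sorted(hits)


if __name__ == "__main__":
    index = GridIndex(cell_size=5)
    for obj_id, x, y in [("a", 1, 1), ("b", 5, 5), ("c", 12, 3), ("d", 4.5, 6)]:
        index.insert(obj_id, x, y)
    print(index.range_query(4, 4, 6, 7))
```

The filter-then-refine split is the general pattern; an R-tree replaces the fixed grid with a hierarchy of bounding rectangles that adapts to the data.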

19
SDBMS vs. GIS
  • GIS is software for visualizing and analyzing
    spatial data using spatial analysis functions
    such as
  • Search: thematic search, search by region,
    (re-)classification
  • Location analysis: buffer, corridor, overlay
  • Terrain analysis: slope/aspect, catchment,
    drainage network
  • Flow analysis: connectivity, shortest path
  • Distribution: change detection, proximity, nearest
    neighbor
  • Spatial analysis/statistics: pattern, centrality,
    autocorrelation, indices of similarity, topology
    (hole description)
  • Measurements: distance, perimeter, shape,
    adjacency, direction
  • GIS uses SDBMS
  • to store, search, query, and share large spatial data
    sets

20
SDBMS vs. GIS
  • SDBMS focuses on
  • Efficient storage, querying, and sharing of large
    spatial datasets
  • Provides simpler set-based query operations
  • Example operations: search by region, overlay,
    nearest neighbor, distance, adjacency, perimeter,
    etc.
  • Uses spatial indices and query optimization to
    speed up queries over large spatial datasets.
  • SDBMS may be used by applications other than GIS
  • Astronomy, genomics, multimedia information
    systems, ...

21
Issues in SDBMS
  • Spatial data model
  • Query language
  • Query processing
  • File organization and indices
  • Query optimization, etc.

22
Spatio-temporal Databases
  • Add temporal dimension
  • Examples
  • Trajectories
  • Evolving region
  • Moving points

23
Geo-stream databases
  • Much data is generated continuously
  • Transaction data
  • Network monitoring
  • Financial applications
  • The most recent data are commonly queried in a
    one-pass fashion
  • Monitoring
  • Aggregation
  • Database systems provide abstractions and
    declarative languages that stream processing can
    benefit from

24
Stream Applications
  • Environmental monitoring
  • Patient monitoring
  • Finance
  • Network monitoring
  • Click-streams
  • Transaction monitoring
  • Traffic analysis
  • Moving object queries
  • Sensor networks
  • RFID

25
Sample Applications
  • Environmental monitoring
  • Notify me when UV is high and temperature is low
  • Traffic monitoring
  • Traffic jam: aggregated speed much below the speed
    limit on a road segment for an extended time
  • Accident: vehicle in an unintended space, e.g. on a
    highway, for longer than expected
  • Click-streams
  • Find the school districts of the houses that the
    user browses the most.

26
Geo-streams
  • Current stream systems lack native spatial
    support
  • Spatial stream queries are common in
  • traffic monitoring
  • environmental monitoring
  • moving object databases

27
Location Privacy

28
Route prediction
  • Next position
  • Next stop
  • The entire route
  • Applications
  • Mobile commerce
  • Save energy
  • Traffic notification

29
Location-based social networking
  • Social networking with location
  • Loopt
  • Google Latitude
  • Geocache
  • Social dynamics
  • iPhone applications

30
Volunteered Geographic Information Systems
  • OpenStreetMap
  • Wikimapia
  • Foursquare
  • Trapster

31
Spatio-temporal Analytics
  • The analysis of data with both spatial and
    temporal information
  • The data are spatially and/or temporally
    correlated

"Everything is related to everything else, but
near things are more related than distant things."
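
The quoted principle, often cited as Tobler's first law of geography, can be quantified with a spatial autocorrelation statistic. The sketch below computes Moran's I with binary distance-band weights (w_ij = 1 when two distinct points lie within `distance` of each other). The function name and the toy data are assumptions made for illustration. Values near +1 suggest that nearby observations are similar, and values near -1 suggest that they alternate.

```python
import math


def morans_i(points, values, distance):
    """Moran's I with binary weights: w_ij = 1 if i != j and dist(i, j) <= distance.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = 0.0
    cross = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if math.hypot(dx, dy) <= distance:
                weight_sum += 1.0
                cross += dev[i] * dev[j]
    denom = sum(d * d for d in dev)
    if weight_sum == 0 or denom == 0:
        raise ValueError("Moran's I is undefined: no neighbors or constant values")
    return (n / weight_sum) * (cross / denom)


if __name__ == "__main__":
    line = [(0, 0), (1, 0), (2, 0), (3, 0)]  # four sites along a line
    print(morans_i(line, [1, 1, 5, 5], distance=1))  # similar values are adjacent
    print(morans_i(line, [1, 5, 1, 5], distance=1))  # values alternate
```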
32
Why do we need spatio-temporal analytics?
  • Analytics help us describe what happened in
    the past, understand what is happening now,
    predict what will happen in the future, and make
    decisions.
  • The proliferation of sensor devices makes
    spatio-temporal information a fundamental
    component of almost every analytical application

33
Types of Spatio-Temporal Analytics Methods
  • Visualization and exploratory analysis
  • Segmentation (classification and clustering)
  • Outlier analysis
  • Colocation mining
  • Dependency analysis
  • Trend discovery

34
Data Visualization and Exploratory Analysis
  • Map querying tasks
  • Static query (one-time query using map tools
    available on the interface)
  • Dynamic query [36] (setup of event alert
    conditions)
  • Spatial constraints are expressed using the map,
    while temporal constraints are expressed as
    linear time moments [37]
  • Map animation [38]
  • Focusing, linking and arranging views [39]
  • Map iteration [40]
  • Existential changes [25]
  • Location changes
  • Attribute changes

35
Data Visualization and Exploratory Analysis Example

36
Segmentation methods
  • Classification [41]
  • Spatial classification: decision tree, Bayesian,
    ANN
  • Temporal classification: decision tree, Bayesian,
    ANN
  • Temporal extensions to spatial classification /
    spatial extensions to temporal classification
  • Clustering [42]
  • Spatial clustering: partitioning method,
    hierarchical method, density-based method, and
    grid-based method
  • Temporal clustering
  • Interactive spatio-temporal clustering: perform
    clustering spatially or temporally and then test
    whether the clusters exist in both dimensions (EMM
    test [43])
  • Simultaneous spatio-temporal clustering:
    space-time scan [44]

37
More on Spatio-Temporal Clustering

38
More on Spatio-Temporal Clustering
  • Model-based clustering [45]
  • Define a multivariate density distribution and
    look for a set of fitting parameters for the
    model.
  • Distance-based method
  • Moving object similarity search
  • Density-based method
  • DBSCAN extensions, OPTICS [46]
  • Flocks and convoys
  • Moving clusters [47]
  • Applications: movement data, cellular networks,
    environment data
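
One way to picture a "DBSCAN extension" is to keep the usual DBSCAN expansion but define the neighborhood with two radii, one spatial and one temporal. The sketch below does exactly that. It is an illustrative simplification rather than any specific published algorithm, and the function and parameter names are assumptions.

```python
import math

NOISE = -1


def st_dbscan(points, eps_space, eps_time, min_pts):
    """Density-based clustering of (x, y, t) points.

    Two points are neighbors when they are within eps_space in the plane
    AND within eps_time in time. Returns one label per point; -1 is noise.
    """

    def neighbors(i):
        xi, yi, ti = points[i]
        return [
            j
            for j, (xj, yj, tj) in enumerate(points)
            if math.hypot(xi - xj, yi - yj) <= eps_space and abs(ti - tj) <= eps_time
        ]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)  # includes i itself
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point, keep expanding
                queue.extend(j_neighbors)
    return labels


if __name__ == "__main__":
    pts = [
        (0, 0, 0), (0.5, 0, 1), (0, 0.5, 2), (0.5, 0.5, 1),  # early group
        (0, 0, 100), (0.5, 0, 101), (0, 0.5, 102),           # same place, much later
        (10, 10, 50),                                        # isolated observation
    ]
    print(st_dbscan(pts, eps_space=1.0, eps_time=5, min_pts=3))
```

The two groups in the demo occupy the same location but are far apart in time, so a purely spatial clustering would merge them while the spatio-temporal neighborhood keeps them apart.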

39
Spatio-Temporal Clustering Example

40
Spatio-Temporal Outlier Analysis
  • Definition of outliers
  • A spatio-temporal object whose thematic attribute
    values are significantly different from those of
    other spatially and temporally referenced objects
    in its spatial and/or temporal neighborhoods.
  • Methods [48]
  • Clustering-based approach
  • Distance-based approach
  • Computational geometry-based approach
  • Spatial scan-based approach
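
The definition above can be turned into a simple neighborhood test: compare each observation with the mean of its spatio-temporal neighbors, then flag observations whose difference is unusually large relative to everyone else's. This is a minimal sketch of that idea and not one of the specific methods from [48]. The function name, the neighborhood radii, and the z-score threshold are assumptions.

```python
import math
import statistics


def st_outliers(observations, radius, time_window, z_threshold=2.0):
    """Return the indices of observations that differ strongly from their neighbors.

    observations: list of (x, y, t, value).
    Neighbors of i: other observations within `radius` in space and
    `time_window` in time. Observations without neighbors are skipped.
    """
    diffs = {}
    for i, (xi, yi, ti, vi) in enumerate(observations):
        neighbor_values = [
            vj
            for j, (xj, yj, tj, vj) in enumerate(observations)
            if j != i
            and math.hypot(xi - xj, yi - yj) <= radius
            and abs(ti - tj) <= time_window
        ]
        if neighbor_values:
            diffs[i] = vi - statistics.fmean(neighbor_values)
    if len(diffs) < 2:
        return []
    all_diffs = list(diffs.values())
    mu = statistics.fmean(all_diffs)
    sigma = statistics.pstdev(all_diffs)
    if sigma == 0:
        return []  # every observation matches its neighborhood equally well
    # Standardize the differences and keep the extreme ones.
    return sorted(i for i, d in diffs.items() if abs(d - mu) / sigma > z_threshold)


if __name__ == "__main__":
    # 3 x 3 grid of sensors at t = 0 reading 10, except the center reading 50,
    # plus one late reading at t = 100 that has no spatio-temporal neighbors.
    obs = [(x, y, 0, 10.0) for x in range(3) for y in range(3)]
    obs[4] = (1, 1, 0, 50.0)
    obs.append((1, 1, 100, 10.0))
    print(st_outliers(obs, radius=1.5, time_window=1))
```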

41
Spatio-Temporal Outlier Detection Example

42
Co-Location Mining
  • Co-location mining finds subsets of Boolean
    spatial features whose instances are often located
    in close spatial proximity
  • Methods [50]
  • Data mining-based approach
  • Spatial statistical approach
  • Buffer-based model
  • Temporal extensions: mixed-drove approach,
    weighted window-based model [51]
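
A common prevalence measure in the co-location literature (see [50]) is the participation index: for each feature, take the fraction of its instances that have a neighbor of the other feature, then take the minimum over the features. The sketch below computes it for a single pair of features only. The function name, the input layout, and the toy data are assumptions, and a full miner would also generate and prune larger candidate sets.

```python
import math


def participation_index(instances, feature_a, feature_b, distance):
    """Participation index of the co-location {feature_a, feature_b}.

    instances: list of (feature, x, y).
    An A instance and a B instance are neighbors when they are within
    `distance` of each other.
    """
    a_points = [(x, y) for f, x, y in instances if f == feature_a]
    b_points = [(x, y) for f, x, y in instances if f == feature_b]
    if not a_points or not b_points:
        return 0.0
    a_participating = set()
    b_participating = set()
    for i, (ax, ay) in enumerate(a_points):
        for j, (bx, by) in enumerate(b_points):
            if math.hypot(ax - bx, ay - by) <= distance:
                a_participating.add(i)
                b_participating.add(j)
    ratio_a = len(a_participating) / len(a_points)  # participation ratio of A
    ratio_b = len(b_participating) / len(b_points)  # participation ratio of B
    return min(ratio_a, ratio_b)


if __name__ == "__main__":
    data = [
        ("A", 0, 0), ("A", 10, 10),
        ("B", 0, 1), ("B", 0.5, 0), ("B", 50, 50),
    ]
    # One of two A instances and two of three B instances participate.
    print(participation_index(data, "A", "B", distance=2))
```

A pair would be reported as a co-location pattern when this index meets a user-chosen prevalence threshold.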

43
Co-Location Mining Example

44
Other methods
  • Association rule mining
  • Spatial preprocessing is required to discretize
    spatial measurements
  • Methods [49]
  • Bayesian networks
  • Hierarchical approach
  • Trend discovery
  • Regression
  • Sequence mining

45
List of Current Spatio-Temporal Analytics Tools
  • Commercial
  • ESRI ArcGIS series
  • Microsoft SQL Spatial StreamInsight
  • Other commercial tools
  • Open source/free software
  • Descartes and CommonGIS
  • MapServer
  • Other free tools

46
ESRI ArcGIS Series
  • ArcGIS Desktop and Server provide the most advanced
    and complete toolkit
  • Has many extensions for different domains
  • APIs can be used to develop extensions and web or
    desktop applications for customized needs. Many
    other commercial tools, such as CUBE [9], are built
    on top of ArcGIS.

47
ESRI ArcGIS Desktop and Server Extensions [1]
  • 3D Extension (Desktop and Server)
  • Analyze terrain data, model subsurface features,
    view and analyze impact zones, determine optimum
    facility placement, share 3D views, create a 3D
    virtual city.
  • Geostatistical Extension (Desktop and Server)
  • Visualize, model, and predict spatial
    relationships.
  • Link data, graphs, and maps dynamically.
  • Perform deterministic and geostatistical
    interpolation.
  • Evaluate models and predictions probabilistically

48
ESRI ArcGIS Desktop and Server Extensions
  • Network Extension (Desktop and Server)
  • Dynamically model realistic network
    conditions and solve vehicle routing problems
  • Multipoint optimized routing, time-sensitive,
    turn-by-turn driving directions, allocation of
    service areas, determining the fastest fixed
    route to the closest facility
  • Schematics Extension (Desktop and Server)
  • Rapid checking of network connectivity
  • Automatically generate schematics

49
ESRI ArcGIS Desktop and Server Extensions
  • Spatial extension (Desktop and Server)
  • Comprehensive, raster-based spatial modeling and
    analysis.
  • Survey Extension (Desktop)
  • Capture, edit, and leverage land records using
    proven survey methodologies
  • Tracking Extension (Desktop)
  • Create time series visualizations so you can
    analyze information relative to time and location

50
ESRI Domain-Specific Solutions
  • ESRI Business Analyst Online 
  • Web-based solution that combines GIS technology
    with extensive demographic, consumer spending,
    and business data for the entire United States to
    deliver on-demand, boardroom-ready reports and
    maps
  • Perform drive-time analysis
  • Analyze trade areas
  • Evaluate sites
  • Identify most profitable customers and reach
    customers

51
ESRI Domain-Specific Solutions
  • ArcGIS Community Analyst
  • Web-based solution that provides GIS
    capabilities to analyze data in a geographic
    context as granular as congressional districts,
    block groups, census tracts, or ZIP Codes.
  • ArcLogistics
  • Create optimized routes and schedules based on
    multiple factors such as customer needs, business
    rules, vehicle traits, and street restrictions.
  • Esri Situational Awareness
  • Provides a geospatial framework for immediate and
    long-term situational awareness needs.
  • Includes a powerful data fusion and analysis
    engine; a set of fully customizable clients for
    data visualization and analysis; and locally
    hosted, prerendered data.

52
Microsoft SQL Library StreamInsight [2]
  • Combines the SQL Server spatial library with a stream
    processing engine
  • Integrates the SQL library within the StreamInsight
    engine
  • Focuses on the data stream event processing workflow
  • GIS support relies on SQL Server (limited) and
    therefore needs extensive customization for
    applications

53
Other Commercial tools by category
  • Complete GIS suite (similar to ArcGIS)
  • Cadcorp SIS [8], GeoMedia [17], IDRISI [18],
    MapInfo [19]
  • Spatio-temporal analysis
  • STIS [23]
  • Network (traffic) analysis tools
  • ACCESSION GIS [3], AltaMap Suite [4], CUBE [9],
    DYNAMEQ [15], EMME [14]
  • Terrain analysis
  • ANUDEM [5]
  • CAD applications
  • AutoCAD Civil3D [6]
  • Emergency and hazard modeling and analysis
  • CadnaA [10], Calpuff View [11], Caris [12], CATS [13],
    Floodworks [16]
  • Specialized analysis
  • ClusterSeer and BoundarySeer [7] (cluster and
    boundary analysis), Mathematica [20]
  • Mathematics toolkit
  • Matlab Plus toolbox [21], SPSS [22]

54
Descartes and CommonGIS [24]
  • An interactive Java-based GIS tool for
    visualization and exploratory analysis.
  • Functionalities
  • Map and graph visualization (choropleth maps,
    scatter plots)
  • Basic queries (distance, difference)
  • Dynamic queries
  • Open source and customizable, but lacks advanced GIS
    analytics functionality

55
MapServer [31]
  • Open source GIS data rendering engine
  • Functionalities
  • Advanced cartographic output
  • Cross-platform, with APIs for all popular scripting
    languages
  • Supports many formats
  • OGC standard compliant
  • Not a full GIS suite

56
Other Free/Open Source tools by category
  • Complete GIS suite (similar to ArcGIS)
  • GRASS [28]
  • Spatio-temporal analysis
  • Map Comparison Kit [30], STARS [34]
  • Terrain analysis
  • Landserf [29]
  • Exploratory data analysis
  • GeoDA [26]
  • Database extension
  • PostGIS [32]
  • Specialized analysis
  • GAM/K [25] (clustering), GRASP [27] (regression)
  • Mathematics toolkit
  • R Spatial [33]

57
  • Spatio-temporal analytics is becoming a
    fundamental component of business analytics
  • The future
  • Big data (bigger due to spatio-temporal
    dimension)
  • Real time (not only historical spatio-temporal
    data, but also streaming data that requires
    optimization at all levels)

58
References (I)
  • [1] http://www.esri.com/software/arcgis/index.html
  • [2] http://msdn.microsoft.com/en-us/library/ee362541.aspx
  • [3] http://www.citilabs.com/accession.html
  • [4] http://www.geomicro.com/
  • [5] http://fennerschool.anu.edu.au/publications/software/anudem.php
  • [6] http://usa.autodesk.com/adsk/servlet/pc/index?siteID=123112&id=8777380
  • [7] http://www.terraseer.com/products_boundaryseer.php
  • [8] http://www.cadcorp.com/products_geographical_information_systems/index.htm
  • [9] http://www.citilabs.com/cube_base.html
  • [10] http://www.datakustik.com/en/products/cadnaa/
  • [11] http://www.weblakes.com/calpuff/calpuff_overview.html
  • [12] http://www.caris.com
  • [13] http://www.saic.com/products/security/cats/
  • [14] http://www.inro.ca/en/products/emme/
  • [15] http://www.inro.ca/en/products/dynameq/
  • [16] http://www.wallingfordsoftware.com/uk/products/floodworks/
  • [17] http://www.intergraph.com/sgi/default.aspx
  • [18] http://www.clarklabs.org/
  • [19] http://www.pbinsight.com/welcome/mapinfo/

59
References (II)
  • [24] http://www.esds.ac.uk/international/support/user_guides/gisoverview.asp
  • [25] http://www.ccg.leeds.ac.uk/software/gam/
  • [26] http://geodacenter.asu.edu/software
  • [27] http://www.unine.ch/CSCF/grasp/
  • [28] http://grass.fbk.eu/
  • [29] http://www.landserf.org
  • [30] http://www.riks.nl/products/Map_Comparison_Kit
  • [31] http://mapserver.gis.umn.edu/
  • [32] http://postgis.refractions.net/
  • [33] http://cran.r-project.org/web/views/Spatial.html
  • [34] http://regionalanalysislab.org/index.php/Main/STARS
  • [35] P. Compieta, S. Di Martino, M. Bertolotto,
    F. Ferrucci, and T. Kechadi. 2007. Exploratory
    spatio-temporal data mining and visualization. J.
    Vis. Lang. Comput. 18, 3 (June 2007), 255-279.
  • [36] C. Ahlberg, C. Williamson, B. Shneiderman,
    Dynamic queries for information exploration: an
    implementation and evaluation, in Proceedings
    ACM CHI '92, ACM Press, New York, 1992, pp.
    619-626.
  • [37] M. Harrower, A.M. MacEachren, A.L. Griffin,
    Developing a geographic visualization tool to
    support earth science learning, Cartography and
    Geographic Information Science 27 (4) (2000)
    279-293.
  • [38] W.L. Hibbard, B.E. Paul, D.A. Santek, C.R.
    Dyer, A.L. Battaiola, M.-F. Voidrot-Martinez,
    Interactive visualization of earth and space
    science computations, Computer 27 (7) (1994)
    65-72.
  • [39] A. Buja, J.A. McDonald, J. Michalak, W.
    Stuetzle, Interactive data visualization using
    focusing and linking, in Proceedings IEEE
    Visualization '91, IEEE Computer Society Press,
    Washington, 1991, pp. 156-163.
  • [40] D. Stojanovic, S. Djordjevic-Kajan, A.
    Mitrovic, Z. Stojanovic, Cartographic
    visualization and animation of the dynamic
    geographic processes and phenomena, in
    Proceedings of 19th International Cartographic
    Conference, Ottawa, Canada, Vol. 1, 1999, pp.
    739-746.

60
References (III)
  • [41] Kumar, M., Bhatt, G., Beeson, P., Duffy, C.
    Automated Detection and Spatio-Temporal
    Classification of Channel Reaches in Semi-arid
    Southwestern US Using ASTER. American Geophysical
    Union, 2006 Joint Assembly.
  • [42] Tim E. Carpenter, Methods to investigate
    spatial and temporal clustering in veterinary
    epidemiology, Preventive Veterinary Medicine,
    Volume 48, Issue 4, 29 March 2001, Pages 303-320.
  • [43] Fosgate, G.T., Carpenter, T.E., Case, J.T.,
    Chomel, B.B., 2000. Time-spatial clustering of
    human cases of brucellosis, California,
    1973-1992. In Proceedings of the Ninth
    International Society on Veterinary Epidemiology
    and Economics, Breckenridge, CO
  • [44] McKenzie, J.S., Pfeiffer, D.U., Morris,
    R.S., 2000. Spatial and temporal patterns of
    vector-borne tuberculosis infection in beef
    breeding cattle in New Zealand. In Proceedings
    of the Ninth International Society on Veterinary
    Epidemiology and Economics, Breckenridge, CO
  • [45] Chudova D, Gaffney S, Mjolsness E, Smyth P
    (2003) Translation-invariant mixture models for
    curve clustering. In KDD '03: Proceedings of the
    ninth ACM SIGKDD international conference on
    Knowledge discovery and data mining, ACM, New
    York, NY, USA, pp 79-88
  • [46] Ankerst M, Breunig MM, Kriegel HP, Sander J
    (1999) OPTICS: ordering points to identify the
    clustering structure. SIGMOD Rec 28(2):49-60
  • [47] Kalnis P, Mamoulis N, Bakiras S (2005) On
    discovering moving clusters in spatio-temporal
    data. Advances in Spatial and Temporal Databases,
    pp 364-381
  • [48] Birant, D., Kut, A. Spatio-temporal
    outlier detection in large databases. 28th
    International Conference on Information
    Technology Interfaces, 2006.
  • [49] Jeremy Mennis, Jun Wei Liu. Mining
    Association Rules in Spatio-Temporal Data: An
    Analysis of Urban Socioeconomic and Land Cover
    Change.
    http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9671.2005.00202.x/abstract
  • [50] Y. Huang, S. Shekhar, and H. Xiong,
    Discovering colocation patterns from spatial
    datasets: A general approach, IEEE Transactions
    on Knowledge and Data Engineering, vol. 16, no.
    12, pp. 1472-1485, 2004
  • [51] Feng Qian, Liang Yin, Qinming He,
    Jiangfeng He. Mining spatio-temporal
    co-location patterns with weighted sliding
    window. IEEE International Conference on
    Intelligent Computing and Intelligent Systems,
    2009. ICIS 2009.