DIaD - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: DIaD


1
DIaD: Data Integration and Dissemination
May 2009, James.Reid@ed.ac.uk
2
Background
Who? EDINA: a JISC-funded National Data Centre
delivering online resources to UK Higher and
Further Education; also the ESRC's Geography Data
Unit for the Census Programme.
What? DIaD: an ESRC-funded project aimed at
exploring innovation in census delivery
mechanisms. The primary objective of this work is
to develop a data dissemination model which
demonstrates a more generic capability: that of
geo-linking.
3
Background - What?
  • The secondary objective of the work was to develop value-added
    services exploiting the results of the automated linkage outputs,
    specifically:
  • A cartogram service
  • A BitTorrent-based network for dissemination of the linked outputs
  • More on these later... first the rationale...

4
Background - Why?
  • The two most heavily used of the data sources are the small area
    statistics provided by the Census Dissemination Unit (CDU) and the
    digital boundary datasets provided by the Geography Data Unit
    (UKBORDERS). Together these sources allow end users (significantly,
    researchers) to undertake a wide range of analytical and
    visualisation tasks, from, for example, simple choropleth mapping
    to cartogram transformations to detailed small area spatial
    analyses.
  • Each resource (the statistics on one hand and the boundary data on
    the other) is extremely valuable in its own right, but in
    combination they provide a data resource of almost unparalleled
    versatility and richness to social science investigators.

5
Background - Why?
Evidence from a recent ESRC survey of geospatial
services and requirements. Source: ESRC interim
survey results, March 2009 (n = 512).
6
How?
  • Via Open Standards (à la Open Geospatial Consortium)
  • Specifically using:
  • the Geographic Linkage Service (GLS) Specification
  • the Web Feature Service (WFS) Specification
  • the Web Processing Service (WPS) Specification (to be
    investigated)
  • Implicitly via use of Open Source Software
  • An open standard is a standard that is publicly available and has
    various rights to use associated with it, and may also have
    various properties of how it was designed (e.g. open process).

7
How?
Geographic Linkage Service (GLS) Specification
Purpose: to provide a simple way to describe and
exchange data that contains geographically related
information, but which does not include the
detailed geometry of the geographic object.
A GLS provides a simple standardized way to
exchange attribute information that applies to a
well-known geospatial dataset, known as a
Framework dataset. Attribute information delivered
from a GLS can be used in a variety of ways,
including use by models to perform calculations,
or visualization as a web map.
8
How?
Geographic Linkage Service (GLS) Specification
GLS includes two related sets of operations.
1. GetData - Attribute data is provided to other
computers on the network by implementing the
GetData (and related) operations. The response to
a GetData operation is an XML file, in a format
known as GDAS (Geographic Data Attribute Set).
2. JoinData - At some other node on the network,
another GLS configured for the JoinData operation
allows a computer to incorporate the contents of
the XML file into a local spatial framework
dataset. This local dataset would normally in turn
be used to support mapping of this information.
In early versions of the GLS, the specification
was split into two separate specifications, one of
which was known as the Geographic Data Access
Service (GDAS). Subsequent revision integrated the
two specifications. Note that GDAS (original) and
GDAS (current) are not the same thing!
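The join step described above can be pictured as a key-based merge: attribute rows carry the identifier of a framework feature, and the join attaches them to the locally held geometry. The following is only a minimal sketch of that idea using plain Python dictionaries. The function name, the zone codes and the data layout are all invented for illustration and are not taken from the GLS specification.

```python
# Illustrative only: plain dictionaries stand in for a GDAS attribute
# table and a local spatial framework dataset. None of these names
# come from the GLS specification.

def join_attributes(framework, attributes):
    """Attach attribute values to framework features by shared key.

    framework:  {key: geometry}, e.g. zone code -> list of vertices
    attributes: {key: {attribute_name: value}}
    Returns (joined, unmatched_keys), where unmatched_keys lists
    attribute keys that have no feature in the framework.
    """
    joined = {}
    for key, geometry in framework.items():
        # Features with no attribute row get None rather than being dropped.
        joined[key] = {"geometry": geometry, "attributes": attributes.get(key)}
    unmatched = sorted(set(attributes) - set(framework))
    return joined, unmatched


framework = {
    "ZONE_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "ZONE_B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
attributes = {
    "ZONE_A": {"population": 310},
    "ZONE_C": {"population": 120},
}
joined, unmatched = join_attributes(framework, attributes)
```

Reporting unmatched keys separately is a design choice of this sketch: it makes mismatches between the attribute data and the framework visible instead of silently discarding them.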
9
GML vs GDAS
  • In comparison to GML, GDAS provides the following specific
    benefits:
  • It is a single logical encoding for attribute data.
  • It is extremely lightweight.
  • It is optimized for the efficient discovery of vector attributes.
  • It includes attributes to support automated mapping, including
    titles, legends, and the classification of attributes.
  • It includes attributes to address the presence of null values in
    the dataset to facilitate their exclusion from calculations and
    legends.
  • It includes attributes to support the joining of tabular data to
    geometry in an N:1 or N:N fashion.
  • It is easy to validate its content and convert it into HTML or
    other formats.
  • It is easy to manipulate its content, and it enables the
    performance of calculations using XSLT.
  • It is easy to generate directly from corporate database
    management technology, using languages such as XQuery.

10
GLS Operations in more detail
GetData - the GetData operation returns an entire
GDAS file, including attribute data and its
associated metadata. The related operations,
- DescribeFrameworks
- DescribeDatasets and
- DescribeData
return selectively larger portions of the metadata
for the geographic attributes that can be served
up by the GLS instance.

11
GLS Operations in more detail
JoinData - the JoinData operation joins attribute
data in GDAS format to its spatial framework and
delivers references to the joined output.
- The DescribeJoinAbilities operation returns a
  list of the spatial frameworks that are
  available to the service.
- The DescribeKey operation lists the spatial
  identifiers.

12
GDAS in more detail
  • The GDAS format is designed to support simple as well as rich and
    complicated attribute databases that may not always be easy to
    interpret.
  • The metadata included in the encoding is designed to ensure that
    the user knows exactly what the content of the dataset is, as
    well as which spatial framework it references, and has easy
    access to any associated documentation.
  • GDAS is produced in response to a GLS GetData request.
  • The general structure of the GDAS XML encoding is as follows:
    <GDAS>
      <Framework>
        ... spatial framework metadata
        <Dataset>
          ... attribute dataset metadata
          <Attribute>
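To give a feel for how a client might read such a response, here is a small sketch using Python's standard xml.etree.ElementTree. Only the element names GDAS, Framework, Dataset and Attribute come from the slide. The nesting, the Title children and the name attribute in the sample are assumptions made up for illustration; a real GDAS document will differ and should be read against the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. Only the tag names GDAS, Framework,
# Dataset and Attribute appear on the slide; the Title elements and the
# "name" attribute are invented so that there is something to extract.
SAMPLE = """
<GDAS>
  <Framework>
    <Title>Example framework</Title>
    <Dataset>
      <Title>Example dataset</Title>
      <Attribute name="population"/>
      <Attribute name="households"/>
    </Dataset>
  </Framework>
</GDAS>
""".strip()


def summarise(gdas_xml):
    """List each dataset with its framework title and attribute names."""
    root = ET.fromstring(gdas_xml)
    summary = []
    for framework in root.findall("Framework"):
        for dataset in framework.findall("Dataset"):
            summary.append({
                "framework": framework.findtext("Title"),
                "dataset": dataset.findtext("Title"),
                "attributes": [a.get("name") for a in dataset.findall("Attribute")],
            })
    return summary


summary = summarise(SAMPLE)
```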

13
Value Added Services (1) Cartograms?
A Cartogram Generation Service
Cartograms represent map feature surfaces in such
a way, as to make them proportional to a given
statistical variable. This representation method
mostly derives from "classical" maps (i.e., maps
representing ground topography) in the sense that
the transformation can only be processed on an
already given geometry. Topographical polygon
layers are thus mostly used as a starting point
for the production of any cartogram.
- ScapeToad
In reality it looks more like this...
  • Uses the excellent ScapeToad code at the backend to generate and
    output cartograms (chorogram.choros.ch/scapetoad)
  • Uses the Gastner/Newman algorithm
14
Value Added Services (1) Cartograms?
We have developed a simple Cartogram Generation
Service which takes a number of parameters, some
of which have default values. We've mimicked
these using ScapeToad's API.
layer=england_oa_2001
attribute=population
attrType=mass|density (A mass (e.g. a population
or a wealth) is measured or estimated over the
whole surface of each polygon; a density can be a
mass-to-mass ratio or a mass-to-surface ratio)
url= (example - must be encoded)
http://diad.edina.ac.uk/service/joinedData?dataset=dataset_name
quality=50
grid=true
rows=100
Example request:
http://a.website.ac.uk/service/cartogram?layer=england_oa_2001&attribute=ks0080001&url=http://anothersite.ac.uk/test_1240418191235.zip
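Because the url parameter is itself a URL containing "?" and "=", it has to be percent-encoded before being placed in the request. A minimal sketch of building such a request with Python's standard library follows. The endpoint and parameter names are copied from the slide's example; whether the service accepts exactly this form, and the helper name cartogram_url, are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint as shown in the slide's example request; treat it as a
# placeholder rather than a confirmed live address.
CARTOGRAM_ENDPOINT = "http://a.website.ac.uk/service/cartogram"


def cartogram_url(layer, attribute, data_url, **optional):
    """Build a cartogram request, percent-encoding every parameter value.

    optional may carry the other parameters named on the slide,
    e.g. attrType, quality, grid, rows.
    """
    params = {"layer": layer, "attribute": attribute, "url": data_url}
    params.update(optional)
    # urlencode escapes ":", "/", "?" and "=" inside the nested URL.
    return CARTOGRAM_ENDPOINT + "?" + urlencode(params)


request_url = cartogram_url(
    "england_oa_2001",
    "population",
    "http://diad.edina.ac.uk/service/joinedData?dataset=dataset_name",
    attrType="mass",
    quality=50,
    grid="true",
    rows=100,
)

# Decoding the query string recovers the original nested URL intact.
decoded = parse_qs(urlsplit(request_url).query)
```

Without the encoding step, the "?" and "=" of the nested joinedData URL would be read as part of the outer query string.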
15
Value Added Services (1) Cartograms?
Worldmapper example: Age of Death
(http://www.worldmapper.org/)
DIaD generated: Deprivation (ONS) Income scores,
Swindon
16
Value Added Services (2) BitTorrent?
  • A peer-to-peer file sharing protocol used for distributing large
    amounts of data.
  • BitTorrent is one of the most common protocols for transferring
    large files, and by some estimates it accounts for about 35% of
    all traffic on the entire Internet.
  • The protocol works initially when a file provider makes his file
    (or group of files) available to the network. This is called a
    seed and allows others, named peers, to connect and download the
    file. Each peer that downloads a part of the data makes it
    available to other peers to download. After the file is
    successfully downloaded by a peer, many continue to make the data
    available, becoming additional seeds.
  • This distributed nature of BitTorrent leads to a viral spreading
    of a file throughout peers. As more seeds get added, the
    likelihood of a successful connection increases exponentially.
    Relative to standard Internet hosting, this provides a significant
    reduction in the original distributor's hardware and bandwidth
    resource costs.
  • Provides redundancy against system problems and reduces
    dependence on the original distributor.
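One way to read the claim about connection likelihood is through a simple probability model. Assuming, purely for illustration, that each seed is reachable independently with the same probability p, the chance that at least one of n seeds is reachable is 1 - (1 - p)^n, so the chance of failure shrinks exponentially as seeds are added. The independence assumption and the function name are mine, not the presentation's.

```python
def p_at_least_one_seed(p_single, n_seeds):
    """Probability that at least one of n_seeds is reachable, assuming
    each seed is independently reachable with probability p_single."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_seeds < 0:
        raise ValueError("n_seeds must not be negative")
    # All seeds fail with probability (1 - p)^n; subtract from 1.
    return 1.0 - (1.0 - p_single) ** n_seeds


# With seeds that are each reachable half the time:
for n in (1, 2, 5, 10):
    print(n, p_at_least_one_seed(0.5, n))
```

Real swarms are messier (seeds are not independent, and availability varies), so this only illustrates why redundancy helps.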

17
Value Added Services (2) BitTorrent?
  • A BitTorrent creation service will be added to the linked outputs
    and a tracker established to allow geolinked results to be
    downloaded via a P2P client, e.g. uTorrent.
  • Rationale: increasingly, researchers/students want access to
    resources from home.
  • Home machines tend to have lower bandwidth than is directly
    available from the SuperJANET backbone.
  • So the BitTorrent approach means users can 'share the load' of
    large files.
  • Note that boundary data and statistics data can have quite large
    file sizes (Gb vs Mb).
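As a rough idea of what a torrent creation step involves, the sketch below builds single-file metainfo in the standard BitTorrent layout: the content is cut into fixed-size pieces, each piece is hashed with SHA-1, and the result is bencoded together with a tracker address. This is not the DIaD service's code. The tracker URL, file name and function names are hypothetical, and the content is held in memory for brevity.

```python
import hashlib


def bencode(value):
    """Encode ints, strings/bytes, lists and dicts using bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def make_metainfo(name, data, announce, piece_length=262144):
    """Return bencoded single-file torrent metainfo for in-memory data."""
    # "pieces" is the concatenation of the 20-byte SHA-1 digest of each piece.
    pieces = b"".join(
        hashlib.sha1(data[start:start + piece_length]).digest()
        for start in range(0, len(data), piece_length)
    )
    info = {
        "name": name,
        "length": len(data),
        "piece length": piece_length,
        "pieces": pieces,
    }
    return bencode({"announce": announce, "info": info})


# Hypothetical linked output and tracker address, for illustration only.
metainfo = make_metainfo(
    "linked_output.zip",
    b"abc" * 1000,
    "http://tracker.example.org/announce",
    piece_length=1024,
)
```

For the multi-gigabyte outputs mentioned above, a real service would stream the file from disk piece by piece rather than load it into memory.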

18
Whither open source?
  • Our demo client uses a front- and back-end stack of OSS:
  • OpenLayers
  • PostGIS
  • GeoServer
  • OGR
  • ScapeToad
  • Our own code for the GLS is (will be) open source

19
General Observations
  • Open standards (and OSS) have a definite role but...
  • They are not an end in themselves
  • They are not always as mature (or static) as you might wish
  • Things evolve - often in short time periods
  • Users (!)
  • Interoperability (the holy grail) is possible but there are
    significant barriers
  • AA issues (UKAMF web services - GeoXACML?)
  • Scalability (Cloud/Grid ??)
  • Evolving delivery paradigms e.g. mobile
  • User expectations vs resourcing constraints

20
DIaD in progress..
21
http://devel.edina.ac.uk:8080/diad/diad.html