Title: Development of a Grid Enabled Occupational Data Environment
1. Development of a Grid Enabled Occupational Data Environment
- GEODE: www.geode.stir.ac.uk
- Paper presented to the Second International Conference on e-Social Science, Manchester, 28-30 June 2006
2. Development of a Grid Enabled Occupational Data Environment
- Introduction: Occupational Information
- Activities in two areas
- Occupational Information Depository
- Access to occupational information
- Conclusions and prospects
3. What's the problem?
- Indexed mainly by Occupational Unit Group (OUG). But:
- Numerous alternative occupational data files (time; country; format)
- Alternative OUG schemes; other index factors (employment status)
- Inconsistent translations to social classifications, by file or by fiat
- Dynamic updates to occupational data resources
- Low uptake of existing occupational information resources
- Strict security constraints on users' micro-social survey data
4. Some illustrative occupational information resources
5. GEODE: Grid Enabled Occupational Data Environment
- Objectives
- Operate as a portal
- Facilitate linking occupational information to users' datasets
- (initial focus on CAMSIS occupational information resources)
- GEODE data resources: occupational information data curated as a data service in Stirling, accessed by users via the portal
- Create an international Virtual Organization for the occupational data community
- Sharing, indexing, curating diverse occupational data
- Other analytical functions on occupational data?
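As a rough illustration of what linking occupational information to a user's dataset involves, the sketch below attaches a score from an OUG-indexed file to survey records. It is not GEODE code: the field names (`oug`, `camsis_score`), the OUG codes and the score values are all invented placeholders, and the real service would sit behind the portal rather than in a local script.

```python
import csv
import io

# Hypothetical OUG-indexed occupational information file (values are made up).
INDEX_CSV = """oug,camsis_score
1110,70.5
2310,65.0
9130,30.2
"""

# Hypothetical user micro-data: one record per respondent.
SURVEY_CSV = """id,oug
a,2310
b,9130
c,5555
"""


def load_index(text):
    """Read an index file into a dict keyed by OUG code."""
    return {row["oug"]: float(row["camsis_score"])
            for row in csv.DictReader(io.StringIO(text))}


def link(survey_text, index):
    """Attach the score for each respondent's OUG; None if the code is not covered."""
    linked = []
    for row in csv.DictReader(io.StringIO(survey_text)):
        linked.append({**row, "camsis_score": index.get(row["oug"])})
    return linked


linked = link(SURVEY_CSV, load_index(INDEX_CSV))
# Respondents whose OUG code does not appear in the index file.
unmatched = [r["id"] for r in linked if r["camsis_score"] is None]
```

Reporting unmatched codes separately matters in practice, because alternative OUG schemes mean a user's codes will not always appear in a given index file.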
6. GEODE: Building blocks
- Globus Toolkit 4 (WSRF implementation)
- To build grid application services
- GridSphere 2.1.2 (portal framework JSR 168)
- OGSA-DAI (data access grid middleware)
- http://www.ogsadai.org.uk/
- DDI (social science metadata in XML)
- http://www.icpsr.umich.edu/DDI/
- Development environment
- Jakarta Tomcat 5.x
- Axis SOAP Engine
- Java
7. 2) Occupational Information Depository
- Grid as a system that (e.g. Foster et al. 2001):
- coordinates resources that are not subject to centralized control
- uses standard, open, general-purpose protocols and interfaces
- delivers non-trivial qualities of service
- Use with occupational information depository
- Create a community where members have abstract access to heterogeneous resources securely, and achieve wider collaboration
8. GEODE: architecture
9. GEODE: Occupational Information Depository
- Data Index Service uses DDI and OGSA-DAI
- User Requirements / Evaluations
- Three elements
- Semantic data curation
- Data storage
- Data indexing / access
10. Occupational information depository
- 2.1) Semantic curation of occupational information
- Establish a GEODE-M metadata subset (.xml)
- Founded on Michigan Data Documentation Initiative
- Minimise curation requirements to suit occupational information resource providers (pilots)
- Web proforma entry
- via Portal using GridSphere
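To make the idea of a minimal metadata subset concrete, here is a small sketch of how entries from a web proforma might be turned into an XML record. The slides do not list the GEODE-M fields, so the root element `geodeRecord` and every field name below are hypothetical, and none of them should be read as actual DDI tags.

```python
import xml.etree.ElementTree as ET

# Minimal set of fields a web proforma might collect (all names hypothetical).
REQUIRED_FIELDS = ("title", "country", "year", "oug_scheme", "file_format")


def build_record(form):
    """Turn proforma entries into a small XML metadata record."""
    missing = [f for f in REQUIRED_FIELDS if not form.get(f)]
    if missing:
        raise ValueError("missing proforma fields: " + ", ".join(missing))
    root = ET.Element("geodeRecord")  # hypothetical root element, not a DDI tag
    for field in REQUIRED_FIELDS:
        ET.SubElement(root, field).text = str(form[field])
    return root


record = build_record({
    "title": "Example occupational scale",  # invented example values
    "country": "GB",
    "year": 1991,
    "oug_scheme": "SOC90",
    "file_format": "csv",
})
xml_text = ET.tostring(record, encoding="unicode")
```

Keeping the required set short reflects the stated aim of minimising the curation burden on resource providers; a fuller record could map these fields onto the corresponding DDI elements.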
11. Occupational information depository
- 2.2) Storing occupational information resources
- Considerations
- All data stored at GEODE vs linkage to external data
- Proprietary software (plain text / SPSS / STATA)
- Rectangular index files vs other formats (e.g. pdf)
- index file format is easy and aids data storage / indexing
- Finite number of occ. info. files / model of plurality of supply
- International community of data providers
- Negligible security restrictions (free online resources)
- Strategy
- (1) GEODE-M proforma, suits all formats, completed online
- (2) Translation to csv index file
- (3) Modify GEODE-M record for index file
- (2), (3) performed automatically or manually
- Storage: OGSA-DAI framework to link index files
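The translation step to a csv index file can be sketched as follows, assuming the simplest case of a whitespace-delimited plain-text source. The column names and data values are invented for illustration; SPSS or STATA sources would need a format-specific reader, which is one reason the step may have to be done manually.

```python
import csv
import io


def to_index_csv(plain_text, columns):
    """Convert whitespace-delimited lines into CSV text with a header row.

    Blank lines and lines starting with '#' are skipped; a row with the wrong
    number of fields raises ValueError so it can be fixed manually.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for lineno, line in enumerate(plain_text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != len(columns):
            raise ValueError(f"line {lineno}: expected {len(columns)} fields")
        writer.writerow(fields)
    return out.getvalue()


# Invented example input: OUG code, employment status, score.
raw = """# oug status score
1110  1  70.5
1110  2  66.0
9130  1  30.2
"""
index_csv = to_index_csv(raw, ["oug", "emp_status", "score"])
```

Failing loudly on malformed rows, rather than guessing, fits a workflow in which automatic translation is backed by manual support.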
12. Occupational information depository
- OGSA-DAI implementations on prototype service
- Testing dynamic deployment of selected data resources (CAMSIS)
- Registration with index service (pilot tests)
- Searchable via portal service
- OGSA-DAI evaluations
- Foundations suited to collation of diverse occ. data resources
- Also facilitates data access functions (see 3)
- Accepts GEODE-curated resources, externally curated resources, and potential connections with other Grid data services
- Issues in support for alternative security levels to allow modification of initially deposited resources
13. Occupational information depository
- 2.3) Virtual Organisation for Occupational Information Depository
- MDS (via GT4) to manage VO access to and distribution of occupational information resources
- International virtual community
- Dynamic data supply
14. 3) Access to Occupational Information
15. GEODE portal access
- 3.1) File linkage mechanisms
- Multiple occupational variables on (A)
- Strict security constraints on (A)
- Inconsistent OUG formats on (A)
- Prototype linkages (e.g. CAMSIS) require full access to (A)
- Cater to limited access to (A)
- Investigate digital certification (X.509) to allow restricted data transfer A_OUGs A_context
- Requirements analysis
- Minimal user certification process
- Avoid application installation by users
- Users' complex survey data (e.g. multiple occupational records)
Micro-social data (A) <-> Occupational information resources (B)
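One way to read the restricted-transfer idea is that only the OUG codes from (A) leave the user's machine, while the rest of each record stays local. The sketch below follows that reading, which is an interpretation of the A_OUGs / A_context split rather than a documented GEODE protocol. All function names and data are hypothetical, and X.509 authentication is not modelled.

```python
def split_for_transfer(records, oug_field="oug"):
    """Return (distinct OUG codes to send, full records kept locally)."""
    codes = sorted({str(r[oug_field]).strip() for r in records})
    return codes, records


def lookup_service(codes, resource):
    """Stand-in for the remote service: sees only OUG codes, never the context."""
    return {c: resource[c] for c in codes if c in resource}


def merge_locally(records, scores, oug_field="oug"):
    """Re-attach returned values to the local records."""
    return [{**r, "score": scores.get(str(r[oug_field]).strip())} for r in records]


# Invented micro-data (A) and occupational resource (B).
A = [{"id": 1, "oug": "2310", "income": 31000},
     {"id": 2, "oug": " 9130", "income": 14000},
     {"id": 3, "oug": "2310", "income": 28000}]
B = {"2310": 65.0, "9130": 30.2}

sent, kept = split_for_transfer(A)
result = merge_locally(kept, lookup_service(sent, B))
```

Sending the distinct, normalised codes also removes respondent identifiers and record order from the transfer, and the `strip()` call is a token gesture towards the inconsistent OUG formats noted above.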
16. GEODE portal access
- 3.2) Analytical queries
- Process analytical tasks on aggregate occupational information resources
- Summary data
- Coverage searches
- Summary statistics
- Consider more complex analyses?
- CAMSIS derivations
- Involve interactive data management tasks
- cf. Nesstar / Data Web
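A coverage search and summary statistics over aggregate resources might look like the following minimal sketch. The resource names, OUG codes and values are invented, and an in-memory dictionary stands in for whatever the index service would actually return.

```python
from statistics import mean

# Invented catalogue of occupational information resources: each maps OUG -> value.
RESOURCES = {
    "scale_gb_1991": {"1110": 70.5, "2310": 65.0, "9130": 30.2},
    "scale_gb_2001": {"1110": 72.0, "2310": 63.5},
}


def coverage(resource, codes):
    """Share of the requested OUG codes that the resource contains."""
    codes = set(codes)
    return len(resource.keys() & codes) / len(codes) if codes else 0.0


def summary(resource):
    """Simple summary statistics over a resource's values."""
    values = list(resource.values())
    return {"n": len(values), "min": min(values), "max": max(values),
            "mean": mean(values)}


wanted = ["1110", "9130"]
cover = {name: coverage(res, wanted) for name, res in RESOURCES.items()}
stats = summary(RESOURCES["scale_gb_2001"])
```

Because these queries touch only the aggregate occupational resources, they raise none of the security constraints that apply to users' micro-data.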
17. 4) Conclusions and prospects
- Occupational Information Depository
- OGSA-DAI implementations
- Index-files annotated through GEODE-M
- Some ongoing manual support requirements
- Portal framework
- Accessible GT4 / GSI structures
- Curation of occupational data
- Contribution to widely used international resources
- Semantic data annotation (DDI)
- Generic data service
- Hinges on numeric OUG index (cf. CASCOT)
- Other application areas, e.g. Education, Geography
18. GEODE, eScience and eSocial Science
- Some tentative comparisons...