Title: Dissertation Defense
1Dissertation Defense
- Turning Information into Action: Assessing and
Reporting GIS Metadata Integrity Using Integrated
Computing Technologies
Timothy Mulrooney
2Outline
- GIS Metadata Background
- The Problem
- Research Questions
- Methodology
- Testing and Results
- Conclusion
- Discussion
3GIS Data Facts
- Most data have a spatial element
- Most expensive component of GIS is data
development
4What is Metadata?
DUI in Winston-Salem
How Did you Create These Data?
What Do These Data Represent?
How can I Get in Touch with the Person Who
Manages the Data?
When Were These Data Published?
5The Problem
- Within a Single Metadata File
- More than 400 Elements
- 7 FGDC (as per CSDGM) Required Elements
- 15 FGDC Suggested Elements
- 21 Other Interesting Elements
- My previous work
- 120 Databases
- 50-110 layers per database
6Research Questions
- How can mathematical methods be applied to GIS
metadata to support the decision-making process?
7Methodology
- Idea arose from mass population and extraction of
metadata values
- Used ArcObjects/VBA
- Previous metadata extraction limited to
- software package
- operating system
- How can this be done on a regular basis?
8Metadata Assessment and Reporting Tool (MART)
9Data Preparation
- Perl (Practical Extraction and Report Language)
- Perl extracts elements from various XML metadata
files and puts them into a single CSV file
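The extraction step above can be sketched as follows. This is a minimal Python illustration, not the Perl script used in MART. The `SEARCH_LIST` mapping, the second column name and the sample XML are assumptions made for the example; only the title path and the `NOT_FOUND` marker appear elsewhere in these slides.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping of CSV column name -> XML element path.
# Only the title path is taken from the slides; the rest is illustrative.
SEARCH_LIST = {
    "r01_Data_Set_Title": "idinfo/citation/citeinfo/title",
    "r02_Publication_Date": "idinfo/citation/citeinfo/pubdate",
}


def extract_row(xml_text, file_name):
    """Flatten one metadata document into a single record."""
    root = ET.fromstring(xml_text)
    row = {"fileName": file_name}
    for column, path in SEARCH_LIST.items():
        text = root.findtext(path)  # None when the element is absent
        row[column] = (text or "").strip() or "NOT_FOUND"
    return row


def write_csv(rows, handle):
    """Write all records to one CSV stream with a header line."""
    writer = csv.DictWriter(handle, fieldnames=["fileName", *SEARCH_LIST])
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample document with a title but no publication date.
SAMPLE = """<metadata><idinfo><citation><citeinfo>
<title>Streets</title></citeinfo></citation></idinfo></metadata>"""

rows = [extract_row(SAMPLE, "streets.xml")]
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

In practice the rows would come from every XML file in a directory and the CSV would be written to disk; an in-memory buffer keeps the sketch self-contained.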
10FGDC Compliancy
- Test FGDC Compliancy
- Used Perl to check if appropriate metadata
elements were populated
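A compliancy check of this kind might look like the sketch below. It is written in Python rather than Perl, and the `REQUIRED` list is a hypothetical subset chosen for illustration; the slides do not list the seven required elements, so the names here are assumptions.

```python
# Hypothetical subset of required columns; the actual FGDC list is not
# given in the slides and would replace this in a real check.
REQUIRED = ["r01_Data_Set_Title", "r02_Publication_Date", "r03_Abstract"]


def missing_required(record, required=REQUIRED):
    """Return the required element names that are empty or marked NOT_FOUND."""
    return [
        name
        for name in required
        if record.get(name, "").strip() in ("", "NOT_FOUND")
    ]


def compliance_report(records):
    """Map each file name to its compliance flag and list of missing elements."""
    report = {}
    for record in records:
        missing = missing_required(record)
        report[record["fileName"]] = {"compliant": not missing, "missing": missing}
    return report


# Made-up records, shaped like rows of the extracted CSV file.
records = [
    {
        "fileName": "streets.xml",
        "r01_Data_Set_Title": "Streets",
        "r02_Publication_Date": "20071002",
        "r03_Abstract": "Street centerlines",
    },
    {
        "fileName": "wetlands.xml",
        "r01_Data_Set_Title": "Wetlands",
        "r02_Publication_Date": "NOT_FOUND",
    },
]

report = compliance_report(records)
print(report)
```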
11Data Analysis
- What descriptive metrics can best be applied to
GIS metadata?
- Temporal Mean
- Temporal Median
- Converted big-endian date format (ISO 8601) to a ratio
number, performed calculations and then converted back to
a date
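The date conversion described above can be sketched in Python as follows (the talk used Perl and R). The sketch assumes dates are stored as `YYYYMMDD` strings, as in the results on the next slides, and uses Python's proleptic day ordinal as the numeric scale; both choices are assumptions for illustration.

```python
from datetime import date, datetime
from statistics import mean, median


def to_ordinal(stamp):
    """Convert a YYYYMMDD string to a day number."""
    return datetime.strptime(stamp, "%Y%m%d").date().toordinal()


def to_stamp(ordinal):
    """Convert a (possibly fractional) day number back to YYYYMMDD."""
    return date.fromordinal(round(ordinal)).strftime("%Y%m%d")


def temporal_mean(stamps):
    """Mean publication date, rounded to the nearest whole day."""
    return to_stamp(mean(to_ordinal(s) for s in stamps))


def temporal_median(stamps):
    """Median publication date, rounded to the nearest whole day."""
    return to_stamp(median(to_ordinal(s) for s in stamps))


# Made-up publication dates for three layers.
stamps = ["20020101", "20020103", "20020111"]
print(temporal_mean(stamps), temporal_median(stamps))
```

Averaging the raw integers (for example 20020101 and 20021231) would not work because the digits are not evenly spaced in time, which is why the conversion to a day number comes first.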
12Data Analysis
- Use the Perl and R programming languages to
dynamically assess quantitative and qualitative
metadata fields
- R is a language and environment for statistical
computing and graphics
13Results from this Analysis
Average Horizontal Accuracy: 104.3 Meters
Temporal Mean: 20020402
Average Horizontal Accuracy: 31.7 Meters
Temporal Mean: 20071002
14Supervised Techniques
- Custom application used to query database
containing metadata information
- PHP language and MySQL database to query
information about all data layers using web
interface
- Helps to guide the decision-making process
15Unsupervised Data Mining Techniques
- Process of sorting through large amounts of data
in order to pick out relevant information
- Rule Induction (Association Rule Mining)
- Discover interesting relationships between
variables
- Beer and Diapers example
- Integrate existing Perl ARM module with custom
application to create transaction table from
which rules are derived.
16Transaction Table Creation
- Decomposition of a 2-dimensional array into
1 dimension. For 890 layers and 43 attributes,
there will be 38,270 transactions
- How to express quantitative values in a nominal or
ordinal environment (low, medium, high, location)
- How to express categorical data within the
transaction table
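The decomposition above can be sketched in Python as follows. The `Attribute=Value` item format, the cut points passed to `bin_value` and the sample layers are assumptions for illustration; the slides do not show how MART encodes or bins its values.

```python
def bin_value(value, low_cut, high_cut):
    """Map a number to an ordinal label using two hypothetical cut points."""
    if value is None:
        return "Unknown"
    if value < low_cut:
        return "Low"
    if value < high_cut:
        return "Medium"
    return "High"


def to_transactions(layers, attributes):
    """Flatten a layers x attributes table into one item per cell."""
    transactions = []
    for layer in layers:
        for name in attributes:
            # Attributes missing from a layer are recorded explicitly.
            transactions.append(f"{name}={layer.get(name, 'NOT_FOUND')}")
    return transactions


# Made-up layers: quantitative accuracy is binned, categorical theme is kept.
layers = [
    {"Data_Theme": "Wetlands", "Horizontal_Accuracy": bin_value(12.0, 10.0, 50.0)},
    {"Data_Theme": "Public_Safety", "Horizontal_Accuracy": bin_value(None, 10.0, 50.0)},
]
attributes = ["Data_Theme", "Horizontal_Accuracy", "Geoid"]
table = to_transactions(layers, attributes)
print(len(table), table)
```

With 890 layers and 43 attributes the same function yields 890 x 43 = 38,270 entries, matching the count on the slide.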
17MARTO-XML
- XML Standard used to describe output from
analysis (Metadata Assessment and Reporting Tool
Output)
18Data Rendering
- Results are published in a web page
- Modules tied together using batch files
- New data and graphs are created on a schedule
- Old data are archived
- New links established
- Saved and referenced from legacy XML files
19Testing
- Human testing of 40 respondents using real GIS
data on MART
- GIS professionals navigate results from analysis
in web environment
- Technology Acceptance Model (TAM) used to assess
the effectiveness of this technology
20Testing Environment
- GIS database of 890 individual data layers
- Ran and published output from all aforementioned
modules
- Surveyed 40 respondents for their opinions on the
application's ease of use, usefulness, potential
utility and their intention to use the software
21Testing FGDC Compliancy
22Temporal and Horizontal Accuracy
23Supervised Techniques
- Database built from 890 data layers was queried using a
web interface created with PHP and dynamically
generated HTML form elements
- Records satisfying the query were published in an
HTML table
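A query of this kind might be sketched as below. The sketch substitutes Python and an in-memory SQLite database for the PHP and MySQL stack used in MART, and the table schema, column names and rows are all invented for the example.

```python
import sqlite3
from html import escape

# Hypothetical schema and toy rows; the real MART database is not shown.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE layers (name TEXT, data_theme TEXT, "
    "horizontal_accuracy REAL, publication_date TEXT)"
)
conn.executemany(
    "INSERT INTO layers VALUES (?, ?, ?, ?)",
    [
        ("Fire_Stations", "Public_Safety", 5.0, "20060115"),
        ("Police_Beats", "Public_Safety", 40.0, "20010730"),
        ("Wetlands_2005", "Wetlands", 12.0, "20050615"),
    ],
)


def query_layers(connection, theme, max_accuracy_m):
    """Return layers of one theme whose horizontal accuracy is within a limit."""
    cursor = connection.execute(
        "SELECT name, horizontal_accuracy, publication_date FROM layers "
        "WHERE data_theme = ? AND horizontal_accuracy <= ? ORDER BY name",
        (theme, max_accuracy_m),  # bound parameters, never string-formatted
    )
    return cursor.fetchall()


def to_html_table(headers, rows):
    """Render query results as a plain HTML table, escaping every cell."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


matches = query_layers(conn, "Public_Safety", 10.0)
html_table = to_html_table(["Layer", "Horizontal accuracy (m)", "Published"], matches)
print(html_table)
```

Binding form values as query parameters, rather than pasting them into the SQL string, is the main design point for any web-facing query form.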
24Unsupervised Techniques
- Look for patterns within a large transaction
table that meet given support, confidence and strength thresholds
- Used a support level of 2 (with a support level
of 4, there would be more than 526,00 rules)
- For support level 2 and confidence of .7 (1
antecedent and 1 consequent), 6,204 rules were
created
- Results are published in a .txt file
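Rules with one antecedent and one consequent can be mined with a short counting pass, sketched here in Python. This is not the Perl ARM module used in MART. It assumes that each layer contributes one set of items, that support is the number of layers containing both items, and that confidence is that count divided by the number of layers containing the antecedent; the slides' "support level" may be defined differently. The transactions are toy data.

```python
from collections import Counter
from itertools import permutations


def mine_rules(transactions, min_support=2, min_confidence=0.7):
    """Return (support, confidence, antecedent, consequent) for 1 -> 1 rules."""
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = set(items)
        item_counts.update(unique)
        # Every ordered pair of distinct items is a candidate rule.
        pair_counts.update(permutations(sorted(unique), 2))

    rules = []
    for (antecedent, consequent), support in pair_counts.items():
        confidence = support / item_counts[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules.append((support, round(confidence, 3), antecedent, consequent))
    return sorted(rules, reverse=True)


# Toy data: one set of items per layer.
transactions = [
    {"Place_Key=Forsyth_County", "Publication_Date=Medium"},
    {"Place_Key=Forsyth_County", "Publication_Date=Medium"},
    {"Place_Key=Forsyth_County", "Publication_Date=Medium"},
    {"Place_Key=Forsyth_County", "Publication_Date=Old"},
]

for support, confidence, antecedent, consequent in mine_rules(transactions):
    print(f"{support} {confidence:.3f} {antecedent} => {consequent}")
```

Because every ordered pair of items is counted, the number of candidate rules grows quickly with the number of attributes, which is consistent with the rule counts reported above.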
25Sample Rules Created
- 67 1.000 Place_Key=North_Carolina => Geoid=NOT_FOUND
- 508 0.992 Place_Key=Forsyth_County => Publication_Date=Medium
- 32 0.800 Location=Northwest => Publication_Date=Old
- 353 1.000 Data_Theme=Public_Safety => Ellipsoid=NOT_FOUND
- 14 0.824 Data_Theme=Wetlands => Publication_Date=Unknown
- These rules combined with supervised techniques
can dictate allocation of resources and decisions
in the future
26Testing
- Technology Acceptance Model
- Uses survey-based questions to distinguish the
relationship between a technology's
- Perceived Ease of Use (PEOU)
- Perceived Usefulness (PU)
- Attitude Towards Using (ATTITUDE)
- Intention to Use (ITU)
27TAM Model Used
- H1: Perceived Ease of Use for MART has a
significant effect on the Perceived Usefulness of
MART.
- H2: Perceived Ease of Use for MART has a
significant effect on the Attitude Towards Using
MART.
- H3: Perceived Usefulness of MART has a
significant effect on the Attitude Towards Using
MART.
- H4: Perceived Usefulness of MART has a
significant effect on Intention to Use MART.
- H5: Attitude Towards Using MART has a
significant effect on Intention to Use MART.
28TAM Question
29Precursory Analysis of People Taking Test
30Results from Respondents
- People responded most positively to the ease of
use
- Least positively towards the intention to use
31Hypothesis Testing
- After computing Cronbach's alpha to measure
reliability and performing PCA to explain variance,
hypotheses were tested using groups of questions
from the survey.
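For reference, Cronbach's alpha can be computed in a few lines. The sketch below uses Python and the common variance form of the statistic, alpha = k/(k-1) * (1 - sum of item variances / variance of respondent totals), where k is the number of items. The score matrix is invented, and this is not the calculation script used in the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """scores: one row per respondent, one column per survey item."""
    k = len(scores[0])  # number of items in the scale
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up Likert responses: 4 respondents answering 3 items.
scores = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

Items that move together across respondents push the total variance up relative to the summed item variances, which is what drives alpha towards 1.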
32Hypothesis Testing
33Hypothesis Testing
34Explanation of Hypothesis Testing
- H3 not accepted: strong correlation between
Perceived Ease of Use and Attitude Towards Using.
Perceived Ease of Use is more of a contributing
factor towards the Attitude Towards Using than
Perceived Usefulness.
- H5 not accepted: likely reflects respondents' impressions of the
open source environment needed to implement
MART; H5 was, however, accepted at about a 70% CI.
35Conclusions
- Increasing schism between data creation and its
assessment
- Metadata reinforces quality control and quality
assurance procedures employed by an organization
- We need a means so everyone can assess various
dimensions of metadata in a timely manner
- MART serves as a way to quantify GIS metadata
- MART provides a forum so users can interact with
GIS metadata with an end goal of supporting
business decisions which ultimately save time and
money
36Conclusions
- Quantitative metadata elements such as FGDC
compliancy, date and horizontal accuracy can be
assessed using programming languages such as R
and Perl
- Users can search GIS metadata using supervised
techniques via a web interface
- Association Rule Mining can be applied to GIS
metadata
- If given a choice, users prefer to query GIS
metadata as opposed to being given results from
unsupervised techniques
- Using TAM, 3 out of 5 research hypotheses were
supported at the 95% CI
- Based on user feedback, the need to implement
MART and an open source environment within
their IT was the biggest hindrance to a user's
intention to use MART
37Discussion
- Integration of MART with other forms of
geo-referenced data
- Remotely sensed data and ortho-imagery
(Laboratory for Advanced Information Technology
and Standards)
- TINs
- Topologies
- Relationship Classes
- Stand-alone tables
- Metadata and proprietary format
- Usability with current GIS software
- VBA / ArcObjects to convert metadata in BLOB
format to XML for ESRI software
- Various Accuracies within MART
- Temporal and Horizontal
- Attribute
- Logical Consistency
- Semantic
38Discussion
- Interestingness problem
- 6,204 rules at support level 2 and confidence of
.7
- Cardinality of data
- 43 attributes → 6,204 rules
- 6 attributes → 73 rules
- Attributes selected were Data Theme, Location,
Horizontal Accuracy, Publication Date,
Responsible Party, Metadata POC
- TAM Methodology: different models and hypotheses
could be proposed
- Presentation of unsupervised techniques in a text
file. Web environment may be more useful
- Understanding of the open source environment
42Decomposition of XML File Using Perl
- $metadata{"r01_Data_Set_Title"} => "idinfo_citation_citeinfo_title"
- Using the following command

    use XML::Simple;       # provides XMLin
    use File::Basename;    # provides basename

    # traverse all files
    foreach $filename (@files)
    {
        # change all backslashes to forward slashes to allow for proper navigation
        $filename =~ s/\\/\//g;
        print "\n\n..... Decomposing ", basename($filename), " ...........\n";
        # Create structure to traverse XML schema. Before going to the
        # next value, however, we need to reset the hash value.
        $tree = XMLin($filename);
        $metadata{"rMissing"} = '';
        $metadata{"fileName"} = $filename;
        $metadata{"sMissing"} = '';
        foreach $key (sort keys %SearchList)
        {
            print "$key $SearchList{$key}\n";
            $Item = FindItem($SearchList{$key});   # FindItem is not shown on this slide
            print "Item $Item\n\n";
            # ... (rest of the loop body is not shown on this slide)
        }
    }
43National Map Accuracy Standards
For Scales 1:20,000 or greater: .033 inches × Scale of Map
For Scales 1:20,000 or lower: .02 inches × Scale of Map
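As a worked illustration, the tolerances above convert to ground distance as sketched below in Python. The sketch assumes "Scale of Map" means the scale denominator (24,000 for a 1:24,000 map) and that a map at exactly 1:20,000 uses the .02 inch tolerance; the function name is hypothetical.

```python
INCHES_TO_METERS = 0.0254


def nmas_horizontal_accuracy_m(scale_denominator):
    """Ground tolerance in meters for a map at 1:scale_denominator."""
    # Larger-scale maps (smaller denominators) get the .033 inch tolerance.
    # Assumption: a map at exactly 1:20,000 falls in the .02 inch class.
    tolerance_in = 0.033 if scale_denominator < 20000 else 0.02
    return tolerance_in * scale_denominator * INCHES_TO_METERS


for denominator in (12000, 24000, 100000):
    print(denominator, round(nmas_horizontal_accuracy_m(denominator), 2))
```

For a 1:24,000 map this gives .02 x 24,000 = 480 inches on the ground, or about 12.2 meters.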
44Sample Output from Supervised Techniques
45Cronbach's Alpha
Cronbach's alpha is computed using the number of
items (questions) in the set, the variance of the data
and the mean of the covariance between all members of
the set. While there is no universal threshold
to determine data consistency, Hair et al.
(1998) suggested a minimum threshold between .6
and .7. As per Table 10, only 1 of these values
(Perceived Ease of Use) is between .6 and .7
while two of the values (Perceived Usefulness and
Attitude Towards Using) are between .7 and .8.
The Cronbach's alpha for the Intention
to Use component is .807, which is considered
excellent (Nunnally 1978). Given these values,
it can be surmised that the questions posed for
the respondent ser
46Principal Components
To help understand the individual factors that
contribute to any potential inconsistency,
principal component analysis was performed on
each of the individual questions to help
determine their potential contribution to the
variability of the observed results. Four
factors were calculated, based on the different
components of the research hypotheses to be
tested. After rotation, the Perceived Ease of
Use factor accounted for 56.33% of the variance. The
Perceived Usefulness factor accounted for
11.93%, Attitude Towards Using accounted for
9.24% and the Intention to Use factor accounted
for 7.04%. Table 11 shows the items and factor
loadings for the individual factors. Finally,
some other basic correlations were run against
respondent characteristics such as age and sex
to help determine their potential contribution to
the results. However, no significant correlation
was found between participants' age, gender or
even self-described GIS experience and the
dependent variables (Perceived Ease of
Use, Perceived Usefulness, Attitude and Intention
to Use) used in the TAM analysis.
47Potential RS Attributes