1

Unlocking data creating knowledge

2
Data Publishing with Nesstar Publisher
IASSIST/IFDO 2005, Edinburgh, Scotland: Workshop 5
Margaret Ward, Jostein Ryssevik, Cliff Dive
3
Data publishing with Nesstar Publisher

The aim of this workshop is to provide an
introduction to Nesstar Publisher.
By the end of this session you will be able to:
prepare and publish micro-data (survey) files
prepare and publish a simple cube

4
Programme 1

Overview of the Nesstar system and Publisher
functionality
Using Nesstar Publisher - Survey (micro) data
Practical session 1
Publishing - using Manage Server
Templates
Practical session 2

5
Programme 2

Overview of the Hierarchical Publisher
Using Nesstar Publisher - Cube (tabular) data
Practical session 3
Introduction to the Resource Publisher
Overview of the new Publisher v3.5
Questions
Practical session 4

6
Nesstar - an overview
[Diagram. Labels: Publisher; Metadata and data input; Extract; Metadata and data editing/transformation; Load and manage; Internet; End-user clients; Retrieve; Data and metadata retrieval and display]
7
Nesstar Publisher

The Publisher is the ETL (Extract, Transform,
Load) tool of the Nesstar product suite. It
enables you to:
extract data and metadata from a variety of
sources, systems and formats,
clean, change, edit and extend the data and
metadata, and
publish data and metadata to a Nesstar Server.
The Publisher can also be used to manage the
content on a Nesstar server.
The Publisher can serve as a general data/metadata
entry, transformation and editing tool,
independently of its role in the Nesstar system.

8
Nesstar Publisher cont.

The Publisher supports
micro-data (e.g. survey-data)
hierarchical data (e.g. household studies)
aggregated data (multidimensional tables or
cubes)
additional information objects (e.g. reports,
factsheets, pictures) to be stored on a Nesstar
server

9
Hierarchical Publisher

Enables files to be linked together and analysed
as one study
For example
A study may contain household and individual
level data files which are linked by key
variables

10
Cube Builder

The Cube Builder adds the following cube-specific
information:
Time and Geographical dimensions
The default view of the cube
The additivity of the data
The cube measure(s)

11
Resource publisher

Used to publish external Nesstar resources,
e.g. PDF files, Word files etc.
Uses Dublin core or e-GMS for metadata
Enables these external resources to be viewable
on a Nesstar Server alongside survey data and
cubes

12
Using Nesstar Publisher
13
Preparing and publishing micro-data

[Diagram. Labels: Metadata, Data, Nesstar Publisher, NSDstat file, Hierarchical Publisher, Server]
14
Metadata

What is metadata? Basically, it is defined as data
about data
The aim of metadata is to make a resource findable
and manageable
DDI: enables the effective, efficient and
accurate use of data resources.

15
Metadata standards supported by Nesstar Publisher

DDI (http://www.icpsr.umich.edu/DDI)
Enables the effective, efficient and accurate
use of data resources
Dublin Core (http://uk.dublincore.org/)
A standard for cross-domain information
resource description
e-Government Metadata Standard (e-GMS)
(http://www.govtalk.gov.uk/)
To ensure maximum consistency of metadata
across public sector organisations
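To make the idea of a cross-domain description concrete, here is a minimal sketch of a Dublin Core record built with Python's standard library. The namespace is the published Dublin Core element set namespace; the `metadata` wrapper element, the helper name `dublin_core_record` and the field values are assumptions made for illustration, and this is not claimed to be the file format Nesstar itself reads or writes.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace


def dublin_core_record(fields):
    """Build a flat XML record from a dict of Dublin Core element -> value."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical resource: a PDF report published alongside survey data.
record = dublin_core_record({
    "title": "Example survey technical report",
    "format": "application/pdf",
    "language": "en",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Only three elements are filled in here; a real template would normally define which fields are mandatory.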

16
Adding metadata - Publisher templates

Use metadata templates
Can use DDI or Dublin Core/e-GMS
Can add controlled vocabulary lists and default
text
Can rename template fields, i.e. use familiar
terms
Advantages
Create to suit individual needs of an
organisation or a data series
Use of standard templates ensures consistent use
of metadata fields
Can add help information about each field to
assist the data publisher

17
Importing / Exporting of data

Formats for import and export (import / export)
DDI document: .xml
SPSS: .sav, .por, .sps / .sav, .por, .sps
SAS: .sp1 / .sas (syntax)
Stata: .dta (Stata 7 and Stata 8) / .dta
Statistica: .sta / .sta
NSDstat: .nsf / .nsf
dBase: .dbf / .dbf
DIF: .dif / .dif
Fixed Format ASCII: .dat
Delimited text: .txt, .csv / .txt
PC-Axis: .px

18
Import from DDI / Export DDI

Enables the re-use of metadata. Available options
are:
Import from Dataset: import the metadata from an
existing NSDstat file
Import from DDI: import an existing XML file
Caution! Invisible metadata may be present
Export DDI: export metadata to an XML file

19
Variable level metadata

Variable and category labels can easily be
edited/added
Able to change the case of variable/category
labels
Variable repository makes re-use of category
labels possible
Local and Global variable repositories - share
information with others
Add a map link to a variable
Add question text and variable notes, either
to each variable separately or
to a block of variables
Identify Weight variables
Identify Time variables
Missing data assignments

20
Data manipulation functions

The Publisher enables you to
View the data as a matrix allowing direct data
entry or editing
Cut and paste data
Add, insert and copy variables of different
types, e.g. numeric, fixed string, dynamic
string, date
Insert/replace data: insert a data matrix from a
dataset or from fixed-format text
Delete variables
Sort cases
Delete cases
Conversion between variable types

21
Variable Groups

Useful for grouping variables that relate to the
same topic or theme together
Hierarchy of groups is supported
Variables can belong to more than one group
Groups can be arranged in any order
Information about that group can be added, e.g. a
group definition
Advantages
Make it easier for end-users to navigate the
dataset
Reduce the load time of a dataset when published

22
Using the Publisher

Demonstration
Practical session 1

23
Manage Server, Publishing and Templates
24
Manage Server

Provides the means to link to Nesstar Servers to
enable publishing
Enables the data publisher to manage the
resources on a Nesstar server. They can:
Create new catalogues, then name and describe them
Reorganise the catalogue hierarchy
Add files to a catalogue
Move files between catalogues
Delete files and catalogues

25
Publishing

Add a Nesstar Server using the Server URL, or by
locating the Server directory on a LAN, and
entering an appropriate username and password
Can publish to a Nesstar Server over a local area
network (LAN) or over the Internet
Able to publish to multiple Servers in a single
operation
Options to publish data and metadata, publish
metadata only, or Republish
Catalogues can be automatically selected if
Keywords or Subject classification terms
within the metadata match existing catalogue
names
Able to publish to a Hidden catalogue that is not
visible to end-users
Able to view the published data directly from the
Publisher using the Open in Web client option
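The automatic catalogue selection described above amounts to comparing metadata terms with catalogue names. The following is a rough sketch of that idea only. The function name `matching_catalogues`, the sample terms and the decision to ignore case and surrounding whitespace are all assumptions; the slides do not say how Nesstar compares the strings.

```python
def matching_catalogues(metadata_terms, catalogue_names):
    """Return catalogue names that equal a keyword or subject term.

    Assumption: the comparison ignores case and surrounding whitespace.
    """
    terms = {term.strip().lower() for term in metadata_terms}
    return [name for name in catalogue_names if name.strip().lower() in terms]


# Hypothetical keywords and catalogue names, for illustration only.
keywords = ["Health", "employment", "Housing"]
catalogues = ["Education", "Health", "Housing", "Hidden"]
selected = matching_catalogues(keywords, catalogues)
print(selected)
```

With these sample values, a study tagged with the three keywords would be offered the Health and Housing catalogues.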

27
Manage Server, Publishing and Templates

Demonstration
Practical session 2

28
The Hierarchical Publisher
29
Preparing and publishing hierarchical data

[Diagram. Labels: Metadata, Data, Nesstar Publisher, NSDstat file, Hierarchical Publisher, Server]
30
Hierarchical Publisher

Used for datasets that are hierarchically related,
for example a household file with an individual
file beneath it
Create NSDstat files using the main Publisher
Add Study metadata to one of the files
Within the Hierarchical Publisher, identify the
key variables (used to link the files together)
Build the hierarchy of files
Validate the hierarchy
Publish
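One thing a validation of linked files could check is that every lower-level record points at an existing higher-level record through the key variable. The sketch below illustrates that idea. It is an assumption about what such a check might involve, not a description of what the Hierarchical Publisher actually does, and the names `orphaned_records` and `household_id` and all the data are hypothetical.

```python
def orphaned_records(parent_rows, child_rows, key):
    """Return child rows whose key value has no matching parent row."""
    parent_keys = {row[key] for row in parent_rows}
    return [row for row in child_rows if row[key] not in parent_keys]


# Hypothetical household and individual files linked by "household_id".
households = [
    {"household_id": 1, "region": "East Anglia"},
    {"household_id": 2, "region": "Yorkshire"},
]
individuals = [
    {"household_id": 1, "person": 1, "age": 34},
    {"household_id": 1, "person": 2, "age": 36},
    {"household_id": 3, "person": 1, "age": 58},  # no household 3 exists
]
orphans = orphaned_records(households, individuals, "household_id")
print(orphans)
```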

31
Introduction to Nesstar Cubes
32
Cube agenda

What is a cube?
What is not a cube?
How to use Nesstar Publisher and the Cube Builder
to prepare a simple cube

33
What is a cube?

A cube (or table) typically consists of
aggregated data
This data is defined by its dimensions and
measures
Dimension variables describe the data, e.g.
gender, and consist of categories (male, female)
Measure variables represent the data, or
values, found in the table cells

34
What is a cube? (2)

Each cell in a table must be described by all
dimensions
A dimension can be hierarchically constructed
Geographical dimensions can be linked to a map

35
Example 1 - A simple cube (Population totals)
36
Example 1 information

Three dimensions
Area (East Anglia, Colchester, Chelmsford,
Clacton)
Gender (Male, Female)
Year (2002, 2003, 2004)
The Measure is the population figures
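The rule that each cell must be described by all dimensions can be checked mechanically. The sketch below uses the three dimensions listed above; the function name `undescribed_rows`, the row layout (one dict per cell) and the population values are assumptions made for illustration.

```python
def undescribed_rows(dimensions, rows):
    """Return rows that lack a valid category for at least one dimension."""
    bad = []
    for row in rows:
        for name, categories in dimensions.items():
            if row.get(name) not in categories:
                bad.append(row)
                break  # one missing dimension is enough to flag the row
    return bad


DIMENSIONS = {
    "Area": ["East Anglia", "Colchester", "Chelmsford", "Clacton"],
    "Gender": ["Male", "Female"],
    "Year": [2002, 2003, 2004],
}
# Placeholder population values, invented for illustration.
rows = [
    {"Area": "Colchester", "Gender": "Male", "Year": 2002, "Population": 100},
    {"Area": "Clacton", "Gender": "Female", "Population": 50},  # Year missing
]
problems = undescribed_rows(DIMENSIONS, rows)
print(problems)
```

The second row is flagged because it has no Year, so the cell it describes is ambiguous.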

37
Hierarchical dimension (2 levels)
AREA
Regions: East Anglia, Yorkshire, South West, Sussex
Towns: Colchester, Clacton, Chelmsford (East Anglia);
Leeds, York, Sheffield (Yorkshire);
Plymouth, Exeter (South West);
Brighton, Hove (Sussex)
38
Example 2a - Not a cube
39
What is not a cube?

How many dimensions does this cube have?
Do all dimensions describe each data point, i.e.
each cell in the table?
What is its measure?

40
Example 2b - A cube
41
Preparing and publishing a cube

[Diagram. Labels: XML File, Metadata, Data, Nesstar Exporter, Nesstar Publisher, NSDstat file, Nesstar Cube Builder, NSDstat Cube File, Server]
42
Creating a cube - Nesstar Publisher

Create a data file for input into the Nesstar
Publisher (.csv/.tab file)
Using the Publisher - import the .csv/.tab
file
Add any metadata required, e.g. title,
description
Create the hierarchy for any hierarchical
dimensions, e.g. Area
Add a link to a map, if required

43
Example 3 - Life expectancy (non-additive) (Age
in years)
44
Input file for the Publisher

Input files can be comma separated (.csv) or tab
delimited (.tab)
Each row in the file must describe a cell in the
table,
e.g. tab delimited
England 2002 75
South East 2002 77
Colchester 2002 76

45
Example 3 - Input file
48
Creating a cube - Cube Builder

Use the Cube Builder to
Select the cube type, e.g. Non-additive,
Stock-additive, Flow-additive
Define the time and geographical dimensions
Define the measure
Create the default view
Publish the cube to a Nesstar Server

49
Non-additive cubes

No aggregation of the measure is possible across
dimensions
Data typically found in this type of cube are
percentage figures, rates, life expectancy

50
Types of additive cube

For additive cubes, aggregation of the data
(measure values) is possible
Stock: the measure represents a number at a point
in time, so no aggregation over time is possible.
For example: yearly population figures, number of
registered businesses
Flow (fully additive): the data can be aggregated
along all dimensions. For example: sales figures,
number of reported crimes
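The three cube types differ only in which dimensions allow aggregation. A minimal sketch of that rule follows; the function name `can_aggregate` and the string labels are assumptions chosen for illustration.

```python
def can_aggregate(cube_type, over_time_dimension):
    """Say whether a measure may be aggregated along a dimension.

    cube_type: "non-additive", "stock" or "flow"
    over_time_dimension: True when aggregating along the time dimension
    """
    if cube_type == "non-additive":
        return False  # e.g. percentages, rates, life expectancy
    if cube_type == "stock":
        return not over_time_dimension  # e.g. yearly population figures
    if cube_type == "flow":
        return True  # e.g. sales figures
    raise ValueError(f"unknown cube type: {cube_type}")


print(can_aggregate("stock", over_time_dimension=True))
print(can_aggregate("stock", over_time_dimension=False))
```

For a stock measure such as population, summing across areas is meaningful, but summing 2002 and 2003 is not.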

51
Additive data

For additive data, a higher-level category is
automatically created containing the aggregated
data from the lower levels
No higher-level data should be included in the
data file as these are calculated automatically
This new category is called ALL unless it is
created within the Publisher, or was part of the
original table
For example, in the following cube: East Anglia =
Colchester + Chelmsford + Clacton
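The calculation of a higher-level category from its lower levels can be sketched as a simple sum over the hierarchy. The function name `add_parent_totals` and the population figures are invented for illustration; only the East Anglia grouping comes from the slides.

```python
def add_parent_totals(values, hierarchy):
    """Return a copy of values with each parent set to the sum of its children."""
    result = dict(values)
    for parent, children in hierarchy.items():
        result[parent] = sum(values[child] for child in children)
    return result


hierarchy = {"East Anglia": ["Colchester", "Chelmsford", "Clacton"]}
# Placeholder population figures, invented for illustration.
towns = {"Colchester": 100, "Chelmsford": 150, "Clacton": 50}
totals = add_parent_totals(towns, hierarchy)
print(totals["East Anglia"])
```

This is why the input file should contain only the town rows: the regional figure is derived, and supplying it as well would duplicate it.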

52
Example 4 - Additive (stock) (Population totals)
53
Example 4 - Input file
54
Multiple measures

Some cubes may contain a number of measures
The following cube contains population totals with
relevant percentages. Both measures are
non-additive
Different measures in the same cube can be
different types, e.g. one may be non-additive and
the other additive.

55
Example 5 - Multiple measures
56
Example 5 - Input file
57
Measure types

There are 5 possible measure types used in
Nesstar:
Average: average of underlying values
Count: number of underlying values
Minimum: minimum of underlying values
Maximum: maximum of underlying values
Sum: total of underlying values
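Each of the five measure types corresponds to a standard summary function. The sketch below maps them to Python equivalents; the dictionary `MEASURE_TYPES`, the helper `aggregate` and the sample values are assumptions for illustration, not Nesstar's implementation.

```python
from statistics import mean

# One plain-Python function per measure type named on the slide.
MEASURE_TYPES = {
    "Average": mean,
    "Count": len,
    "Minimum": min,
    "Maximum": max,
    "Sum": sum,
}


def aggregate(measure_type, values):
    """Apply the named measure type to a list of underlying values."""
    return MEASURE_TYPES[measure_type](values)


underlying = [4, 8, 6]  # placeholder values
summary = {name: aggregate(name, underlying) for name in MEASURE_TYPES}
print(summary)
```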

58
Examples of more complex tables

What if I have several identical tables, that
only differ in the year they refer to?
Combine them using YEAR as an additional
dimension
What happens if I have several almost
identical tables, but information for one
category (e.g. Male) is missing for one year?
Combine the tables, and accept that there will be
an empty column for Male for that year
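Combining per-year tables can be pictured as adding the year to each cell's key. The sketch below does that and leaves an empty cell (`None`) where a category is missing for a year. The function name `combine_tables`, the data layout and the figures are assumptions made for illustration.

```python
def combine_tables(tables_by_year):
    """Merge per-year tables into one, adding Year as a dimension.

    tables_by_year: dict mapping year -> {category: value}
    Returns a dict mapping (category, year) -> value, with None for
    any category that is missing in a given year.
    """
    categories = []
    for table in tables_by_year.values():
        for category in table:
            if category not in categories:
                categories.append(category)
    return {
        (category, year): table.get(category)
        for year, table in tables_by_year.items()
        for category in categories
    }


# Placeholder figures, invented for illustration; Male is missing for 2003.
tables = {
    2002: {"Male": 10, "Female": 12},
    2003: {"Female": 13},
}
combined = combine_tables(tables)
print(combined)
```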

59
Related tables
60
Combining tables
61
Preparing cubes - Summary

Once tables are combined they can be prepared in
the usual way, e.g. create a comma separated
(.csv) or tab delimited (.tab) file
Import into the Publisher
Add metadata
Add any necessary information, e.g. level names,
link to a map
Open the Cube Builder
Define type of cube, e.g. Non-additive
Create default view
Publish to a Nesstar Server

62
Publishing a simple cube

Demonstration
Practical session 3

63
Resource publisher

Used to publish external Nesstar resources,
e.g. PDF files, Word files etc.
Uses Dublin core or e-GMS for metadata
Enables these external resources to be viewable
on a Nesstar Server alongside survey data and
cubes

64
Resource Publisher