Title: Unlocking data creating knowledge
1- Unlocking data creating knowledge
2Data Publishing with Nesstar Publisher
IASSIST/IFDO 2005 Edinburgh, Scotland Workshop 5
Margaret Ward, Jostein Ryssevik, Cliff Dive
3Data publishing with Nesstar Publisher
- The aim of this workshop is to provide an
introduction to Nesstar Publisher
- By the end of this session you will be able to
- prepare and publish micro-data (survey) files
- prepare and publish a simple cube
4Programme 1
- Overview of the Nesstar system and Publisher
functionality
- Using Nesstar Publisher - Survey (micro) data
- Practical session 1
- Publishing - using Manage Server
- Templates
- Practical session 2
5Programme 2
- Overview of the Hierarchical Publisher
- Using Nesstar Publisher - Cube (tabular) data
- Practical session 3
- Introduction to the Resource Publisher
- Overview of the new Publisher v3.5
- Questions
- Practical session 4
6Nesstar - an overview
[Diagram. Labels: Publisher; End-user clients; Internet; Metadata and data input; Metadata and data editing/transformation; Data and metadata retrieval and display; Extract; Load and manage; Retrieve]
7Nesstar Publisher
- The Publisher is the ETL (Extract, Transform,
Load) tool of the Nesstar product suite. It
enables you to
- extract data and metadata from a variety of
sources, systems and formats,
- clean, change, edit and extend the data and
metadata, and
- publish data and metadata to a Nesstar Server
- The Publisher can also be used to manage the
content on a Nesstar Server
- The Publisher can serve as a general data/metadata
entry, transformation and editing tool,
independently of its role in the Nesstar system.
8Nesstar Publisher cont.
- The Publisher supports
- micro-data (e.g. survey-data)
- hierarchical data (e.g. household studies)
- aggregated data (multidimensional tables or
cubes)
- additional information objects (e.g. reports,
factsheets, pictures) to be stored on a Nesstar
Server
9Hierarchical Publisher
- Enables files to be linked together and analysed
as one study
- For example
- A study may contain household and individual
level data files which are linked by key
variables
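As a rough illustration of what linking by a key variable means, the sketch below joins individual-level records to their household record. The field names (`household_id`, `person_id`, `region`, `age`) and the helper `link_by_key` are hypothetical and are not part of Nesstar; the Hierarchical Publisher's own mechanism is not described in these slides.

```python
# Hypothetical records; the field names are illustrative, not Nesstar's.
households = [
    {"household_id": 1, "region": "East Anglia"},
    {"household_id": 2, "region": "Yorkshire"},
]
individuals = [
    {"household_id": 1, "person_id": 1, "age": 34},
    {"household_id": 1, "person_id": 2, "age": 36},
    {"household_id": 2, "person_id": 1, "age": 58},
]


def link_by_key(parents, children, key):
    """Attach each child record to the parent that shares its key value."""
    parent_by_key = {parent[key]: parent for parent in parents}
    linked = []
    for child in children:
        parent = parent_by_key[child[key]]  # KeyError if no matching parent
        # Child fields win on a name clash; the key value is identical anyway.
        linked.append({**parent, **child})
    return linked


linked = link_by_key(households, individuals, "household_id")
print(linked[0])
```

Each linked row carries both the household-level and the individual-level fields, which is the sense in which the two files can be analysed as one study.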
10Cube Builder
- The Cube Builder adds the following cube specific
information
- Time and Geographical dimensions
- The default view of the cube
- The additivity of the data
- The cube measure(s)
11Resource publisher
- Used to publish external Nesstar resources,
e.g. PDF files, Word files etc.
- Uses Dublin Core or e-GMS for metadata
- Enables these external resources to be viewable
on a Nesstar Server alongside survey data and
cubes
12Using Nesstar Publisher
13Preparing and publishing micro-data
[Diagram. Labels: Metadata; Data; Nesstar Publisher; NSDstat file; Hierarchical Publisher; Server]
14Metadata
- What is metadata? Basically defined as data
about data
- The aim of metadata is to make a resource findable
and manageable
- DDI enables the effective, efficient and
accurate use of data resources.
15Metadata standards supported by Nesstar Publisher
- DDI (http://www.icpsr.umich.edu/DDI)
- Enables the effective, efficient and accurate
use of data resources
- Dublin Core (http://uk.dublincore.org/)
- A standard for cross-domain information
resource description
- e-Government Metadata Standard (e-GMS)
(http://www.govtalk.gov.uk/)
- To ensure maximum consistency of metadata
across public sector organisations
16Adding metadata - Publisher templates
- Use metadata templates
- Can use DDI or Dublin Core/e-GMS
- Can add controlled vocabulary lists and default
text
- Can rename template fields, i.e. use familiar
terms
- Advantages
- Create to suit individual needs of an
organisation or a data series
- Use of standard templates ensures consistent use
of metadata fields
- Can add help information about each field to
assist the data publisher
17Importing / Exporting of data
- Formats for import and export (where two entries
are listed for a format, the first is import and the second is export)
- DDI document: .xml
- SPSS: .sav, .por, .sps | .sav, .por, .sps
- SAS: .sp1 | .sas (syntax)
- Stata: .dta (Stata 7, Stata 8) | .dta
- Statistica: .sta | .sta
- NSDstat: .nsf | .nsf
- dBase: .dbf | .dbf
- DIF: .dif | .dif
- Fixed Format ASCII: .dat
- Delimited text: .txt, .csv | .txt
- PC-Axis: .px
18Import from DDI / Export DDI
- Enables the re-use of metadata. Available options
are
- Import from Dataset: import the metadata from an
existing NSDstat file
- Import from DDI: import an existing XML file
- Caution! Invisible metadata may be present
- Export DDI: export metadata to an XML file
19Variable level metadata
- Variable and category labels can easily be
edited/added
- Able to change the case of variable/category
labels
- Variable repository makes re-use of category
labels possible
- Local and Global variable repositories - share
information with others
- Add a map link to a variable
- Adding question text and variable notes
- to each variable separately
- to a block of variables
- Identify Weight variables
- Identify Time variables
- Missing data assignments
20Data manipulation functions
- The Publisher enables you to
- View the data as a matrix allowing direct data
entry or editing
- Cut and paste data
- Add, insert and copy variables of different
types, e.g. numeric, fixed string, dynamic
string, date
- Insert/replace data: insert data matrix from
dataset, or fixed format text
- Delete variables
- Sort cases
- Delete cases
- Conversion between variable types
21Variable Groups
- Useful for grouping variables that relate to the
same topic or theme together
- Hierarchy of groups is supported
- Variables can belong to more than one group
- Groups can be arranged in any order
- Information about that group can be added, e.g. a
group definition
- Advantages
- Make it easier for end-users to navigate the
dataset
- Reduce the load time of a dataset when published
22Using the Publisher
- Demonstration
- Practical session 1
23Manage Server, Publishing and Templates
24Manage Server
- Provides the means to link to Nesstar Servers to
enable publishing
- Enables the data publisher to manage the
resources on a Nesstar Server so that they can
- Create new catalogues, then name and describe them
- Reorganise the catalogue hierarchy
- Add files to a catalogue
- Move files between catalogues
- Delete files and catalogues
25Publishing
- Add a Nesstar Server using the Server-URL, or by
locating the Server directory on a LAN, and
entering an appropriate username and password
- Can publish to a Nesstar Server over a local area
network (LAN) or over the Internet
- Able to publish to multiple Servers in a single
operation
- Options to publish data and metadata, metadata
only or Republish
- Catalogues can be automatically selected if
Keywords or Subject classification terms
within the metadata match existing catalogue
names
- Able to publish to a Hidden catalogue not
visible to end-users
- Able to view the published data directly from the
Publisher (Open in Web client option)
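The automatic catalogue selection can be pictured as a simple name match between metadata terms and catalogue names. The sketch below is only an illustration: the function `select_catalogues` is hypothetical, and the choice to ignore case and surrounding whitespace is an assumption, since the slides do not say how the Publisher compares terms.

```python
def select_catalogues(metadata_terms, catalogue_names):
    """Return the catalogue names that match any keyword or subject term."""
    # Assumption: matching ignores case and surrounding whitespace.
    wanted = {term.strip().lower() for term in metadata_terms}
    return [name for name in catalogue_names if name.strip().lower() in wanted]


keywords = ["Health", "population"]
catalogues = ["Population", "Crime", "Health"]
selected = select_catalogues(keywords, catalogues)
print(selected)
```

Catalogues are returned in the order in which they exist on the (hypothetical) server list, not in the order of the keywords.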
26Adding metadata - Publisher templates
- Use metadata templates
- Can use DDI or Dublin Core/e-GMS
- Can add controlled vocabulary lists and default
text
- Can rename template fields, i.e. use familiar
terms
- Advantages
- Create to suit individual needs of an
organisation or a data series
- Use of standard templates ensures consistent use
of metadata fields
- Can add help information about each field to
assist the data publisher
27Manage Server, Publishing and Templates
- Demonstration
- Practical session 2
28The Hierarchical Publisher
29Preparing and publishing hierarchical data
[Diagram. Labels: Metadata; Data; Nesstar Publisher; NSDstat file; Hierarchical Publisher; Server]
30Hierarchical Publisher
- Used for datasets that are hierarchically related
- For example
- Household file
- Individual file
- Create NSDstat files using the main Publisher
- Add Study metadata to one of the files
- Within the Hierarchical Publisher identify the
key variables (used to link the files together)
- Build the hierarchy of files
- Validate the hierarchy
- Publish
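The slides do not say what the validation step checks. One plausible check, sketched here purely as an assumption, is that every key value in the lower-level file has a matching record in the higher-level file and that higher-level keys are unique. The function `validate_hierarchy` and the sample key values are hypothetical.

```python
def validate_hierarchy(parent_keys, child_keys):
    """Return a list of problems found when linking child rows to parent rows."""
    problems = []
    seen = set()
    for key in parent_keys:
        if key in seen:
            problems.append(f"duplicate parent key: {key}")
        seen.add(key)
    for key in child_keys:
        if key not in seen:
            problems.append(f"child key with no parent: {key}")
    return problems


household_ids = [1, 2, 3]
individual_household_ids = [1, 1, 2, 4]  # 4 has no household record
problems = validate_hierarchy(household_ids, individual_household_ids)
print(problems)
```

An empty list would mean that the two files can be linked without orphaned individual records.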
31Introduction to Nesstar Cubes
32Cube agenda
- What is a cube?
- What is not a cube?
- How to use Nesstar Publisher and the Cube Builder
to prepare a simple cube
33What is a cube?
- A cube (or table) typically consists of
aggregated data
- This data is defined by its dimensions and
measures
- Dimension variables describe the data, e.g.
gender, and consist of categories (male, female)
- Measure variables represent the data, or
values, found in the table cells
34What is a cube? (2)
- Each cell in a table must be described by all
dimensions
- A dimension can be hierarchically constructed
- Geographical dimensions can be linked to a map
35Example 1 - A simple cube (Population totals)
36Example 1 information
- Three dimensions
- Area (East Anglia, Colchester, Chelmsford,
Clacton)
- Gender (Male, Female)
- Year (2002, 2003, 2004)
- The Measure is the population figures
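To make the idea of "each cell described by all dimensions" concrete, the sketch below builds one cell for every combination of the Example 1 categories. This is an illustrative data structure, not Nesstar's internal format; the measure values are left as `None` because the population figures from the slide are not in this transcript, and `is_fully_described` is a hypothetical helper.

```python
from itertools import product

# Dimensions and categories from Example 1.
dimensions = {
    "Area": ["East Anglia", "Colchester", "Chelmsford", "Clacton"],
    "Gender": ["Male", "Female"],
    "Year": [2002, 2003, 2004],
}

# One cell per combination of categories. The measure values are
# placeholders (None) because the slide's figures are not available here.
cells = {combo: None for combo in product(*dimensions.values())}


def is_fully_described(cells, dimensions):
    """True if every cell key has exactly one valid category per dimension."""
    names = list(dimensions)
    return all(
        len(key) == len(names)
        and all(value in dimensions[name] for name, value in zip(names, key))
        for key in cells
    )


print(len(cells), is_fully_described(cells, dimensions))
```

A table whose cells are keyed by only some of the dimensions, for example an Area and Gender pair with no Year, fails this check.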
37Hierarchical dimension (2 levels)
AREA
Regions: East Anglia, Yorkshire, South West, Sussex
Towns:
East Anglia: Colchester, Clacton, Chelmsford
Yorkshire: Leeds, York, Sheffield
South West: Plymouth, Exeter
Sussex: Brighton, Hove
38Example 2a - Not a cube
39What is not a cube?
- How many dimensions does this cube have?
- Do all dimensions describe each data point, i.e.
each cell in the table?
- What is its measure?
40Example 2b - A cube
41Preparing and publishing a cube
[Diagram. Labels: XML File; Metadata; Data; Nesstar Exporter; Nesstar Publisher; NSDstat file; Nesstar Cube Builder; NSDstat Cube File; Server]
42Creating a cube - Nesstar Publisher
- Create a data file for input into the Nesstar
Publisher (.csv/.tab file)
- Using the Publisher - import the .csv/.tab
file
- Add any metadata required, e.g. title,
description
- Create the hierarchy for any hierarchical
dimensions, e.g. Area
- Add a link to a map, if required
43Example 3 - Life expectancy (non-additive) (Age in years)
44Input file for the Publisher
- Input files can be comma separated (.csv) or tab
delimited (.tab)
- Each row in the file must describe a cell in the
table,
- e.g. tab delimited
- England 2002 75
- South East 2002 77
- Colchester 2002 76
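A file in this shape can be produced with a few lines of code. The sketch below writes the three example rows as tab-delimited text using Python's standard `csv` module. Whether the Publisher expects a header row is not stated in the slides, so none is written here; the helper name `to_tab_delimited` is hypothetical.

```python
import csv
import io

# The three example rows from the slide: area, year, life expectancy.
rows = [
    ("England", 2002, 75),
    ("South East", 2002, 77),
    ("Colchester", 2002, 76),
]


def to_tab_delimited(rows):
    """Serialise rows as tab-delimited text, one table cell per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


text = to_tab_delimited(rows)
print(text)
```

To save the result for import, the returned text could be written to a file with a `.tab` extension.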
45Example 3 - Input file
48Creating a cube - Cube Builder
- Use the Cube Builder to
- Select the cube type, e.g. Non-additive,
Stock-additive, Flow-additive
- Define the time and geographical dimensions
- Define the measure
- Create the default view
- Publish the cube to a Nesstar Server
49Non-additive cubes
- No aggregation of the measure is possible across
dimensions
- Data typically found in this type of cube are
percentage figures, rates, life expectancy
50Types of additive cube
- For additive cubes, aggregation of the data
(measure values) is possible
- Stock: the measure represents a number at a point
in time, so no aggregation over time is possible.
For example: yearly population figures, number of
registered businesses
- Flow (fully additive): the data can be aggregated
along all dimensions. For example: sales figures,
number of reported crimes
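The three cube types reduce to a small rule about when summing is allowed. The sketch below encodes that rule as described on these slides; the function `can_aggregate` and its string labels are hypothetical and do not reflect how the Cube Builder stores the setting.

```python
def can_aggregate(cube_type, over_time):
    """Say whether a measure may be summed along a dimension.

    cube_type: "non-additive", "stock" or "flow"
    over_time: True if the dimension being collapsed is the time dimension
    """
    if cube_type == "non-additive":
        return False  # e.g. percentages, rates, life expectancy
    if cube_type == "stock":
        return not over_time  # e.g. population: sum areas, never years
    if cube_type == "flow":
        return True  # e.g. sales figures: sum along any dimension
    raise ValueError(f"unknown cube type: {cube_type}")


print(can_aggregate("stock", over_time=True))
print(can_aggregate("stock", over_time=False))
```

For a stock measure such as yearly population, adding up towns within a year is meaningful, while adding 2002 to 2003 is not.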
51Additive data
- For additive data, a higher-level category is
automatically created containing the aggregated
data from the lower levels
- No higher-level data should be included in the
data file as these are calculated automatically
- This new category is called ALL unless it is
created within the Publisher, or was part of the
original table
- For example, in the following cube: East Anglia =
Colchester + Chelmsford + Clacton
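The automatic higher-level category amounts to a sum over the lower-level categories. The sketch below shows the arithmetic with invented town figures (they are not from the slides); `add_parent_total` is a hypothetical helper, and the refusal to accept a pre-existing parent mirrors the advice not to include higher-level data in the input file.

```python
# Invented figures for illustration only; they are not from the slides.
town_totals = {"Colchester": 100, "Chelmsford": 150, "Clacton": 50}


def add_parent_total(children, parent_name="ALL"):
    """Return a copy of the table with a parent category holding the sum."""
    if parent_name in children:
        raise ValueError(f"{parent_name} is already present in the data")
    table = dict(children)
    table[parent_name] = sum(children.values())
    return table


table = add_parent_total(town_totals, parent_name="East Anglia")
print(table["East Anglia"])  # 300
```

Calling the helper without `parent_name` uses the default label ALL.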
52Example 4 - Additive (stock) (Population totals)
53Example 4 - Input file
54Multiple measures
- Some cubes may contain a number of measures
- The following cube contains Population totals with
relevant percentages. Both measures are
non-additive
- Different measures in the same cube can be
different types, e.g. one may be non-additive and
the other additive.
55Example 5 - Multiple measures
56Example 5 - Input file
57Measure types
- There are 5 possible measure types used in
Nesstar
- Average: average of underlying values
- Count: number of underlying values
- Minimum: minimum of underlying values
- Maximum: maximum of underlying values
- Sum: total of underlying values
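Each of the five measure types corresponds to a familiar summary function. The sketch below maps the names from the slide onto Python built-ins and `statistics.mean`; the mapping and the sample values are illustrative assumptions, not Nesstar code.

```python
from statistics import mean

# Measure type names from the slide, mapped to standard Python functions.
MEASURE_TYPES = {
    "Average": mean,
    "Count": len,
    "Minimum": min,
    "Maximum": max,
    "Sum": sum,
}

underlying = [2, 4, 9]  # invented values
summary = {name: func(underlying) for name, func in MEASURE_TYPES.items()}
print(summary)
```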
58Examples of more complex tables
- What if I have several identical tables, that
only differ in the year they refer to?
- Combine them using YEAR as an additional
dimension
- What happens if I have several almost
identical tables, but information for one
category (e.g. Male) is missing for one year?
- Combine the tables, and accept that there will be
an empty column for Male for that year
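Combining per-year tables can be sketched as stacking them into rows with YEAR as an extra dimension, leaving a gap where a category is missing. The function `combine_by_year` and the figures below are hypothetical, and representing an empty cell as `None` is an assumption about how one might mark it before export.

```python
def combine_by_year(tables_by_year):
    """Merge {year: {category: value}} into rows of (category, year, value)."""
    categories = []
    for table in tables_by_year.values():
        for category in table:
            if category not in categories:
                categories.append(category)
    rows = []
    for year in sorted(tables_by_year):
        for category in categories:
            # A category missing for a year becomes an empty cell (None).
            rows.append((category, year, tables_by_year[year].get(category)))
    return rows


# Invented figures: Male is missing for 2003.
tables = {
    2002: {"Male": 10, "Female": 12},
    2003: {"Female": 13},
}
rows = combine_by_year(tables)
for row in rows:
    print(row)
```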
59Related tables
60Combining tables
61Preparing cubes - Summary
- Once tables are combined they can be prepared in
the usual way, e.g. create a comma separated
(.csv) or tab delimited (.tab) file
- Import into the Publisher
- Add metadata
- Add any necessary information, e.g. level names,
link to a map
- Open the Cube Builder
- Define type of cube, e.g. Non-additive
- Create default view
- Publish to a Nesstar Server
62Publishing a simple cube
- Demonstration
- Practical session 3
63Resource publisher
- Used to publish external Nesstar resources,
e.g. PDF files, Word files etc.
- Uses Dublin Core or e-GMS for metadata
- Enables these external resources to be viewable
on a Nesstar Server alongside survey data and
cubes
64Resource Publisher