Unlocking data creating knowledge - PowerPoint PPT Presentation

Transcript and Presenter's Notes


1
  • Unlocking data creating knowledge

2
Data Publishing with Nesstar Publisher
IASSIST/IFDO 2005 Edinburgh, Scotland Workshop 5
Margaret Ward Jostein Ryssevik Cliff Dive
3
Data publishing with Nesstar Publisher
  • The aim of this workshop is to provide an
    introduction to Nesstar Publisher
  • By the end of this session you will be able to
  • prepare and publish micro-data (survey) files
  • prepare and publish a simple cube

4
Programme 1
  • Overview of the Nesstar system and Publisher
    functionality
  • Using Nesstar Publisher - Survey (micro) data
  • Practical session 1
  • Publishing - using Manage Server
  • Templates
  • Practical session 2

5
Programme 2
  • Overview of the Hierarchical Publisher
  • Using Nesstar Publisher - Cube (tabular) data
  • Practical session 3
  • Introduction to the Resource Publisher
  • Overview of the new Publisher v3.5
  • Questions
  • Practical session 4

6
Nesstar - an overview
[Diagram labels: Metadata and data editing/transformation; Data and metadata retrieval and display; Extract; Metadata and data input; Publisher; End-user clients; Retrieve; Load and manage; Internet]
7
Nesstar Publisher
  • The Publisher is the ETL (Extract, Transform,
    Load) tool of the Nesstar product suite. It
    enables you to
  • extract data and metadata from a variety of
    sources, systems and formats,
  • clean, change, edit and extend the data and
    metadata, and
  • publish data and metadata to a Nesstar Server
  • The Publisher can also be used to manage the
    content on a Nesstar server
  • The Publisher can serve as a general data/metadata
    entry, transformation and editing tool,
    independently of its role in the Nesstar system.

8
Nesstar Publisher cont.
  • The Publisher supports
  • micro-data (e.g. survey-data)
  • hierarchical data (e.g. household studies)
  • aggregated data (multidimensional tables or
    cubes)
  • additional information objects (e.g. reports,
    factsheets, pictures) to be stored on a Nesstar
    server

9
Hierarchical Publisher
  • Enables files to be linked together and analysed
    as one study
  • For example
  • A study may contain household and individual
    level data files which are linked by key
    variables

10
Cube Builder
  • The Cube Builder adds the following cube specific
    information
  • Time and Geographical dimensions
  • The default view of the cube
  • The additivity of the data
  • The cube measure(s)

11
Resource publisher
  • Used to publish external Nesstar resources,
    e.g. PDF files, Word files etc.
  • Uses Dublin Core or e-GMS for metadata
  • Enables these external resources to be viewable
    on a Nesstar Server alongside survey data and
    cubes
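
As a rough illustration of what Dublin Core metadata for an external resource such as a PDF could look like, the sketch below builds a small XML record with Python's standard library. The namespace is the Dublin Core element set; the `record` wrapper element, the `dublin_core_record` helper and all field values are assumptions made for this example, not the format that the Resource Publisher actually writes.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace


def dublin_core_record(fields):
    """Build a flat XML record with one dc:* element per (name, value) pair."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


# Illustrative values only
record = dublin_core_record({
    "title": "Annual report",
    "format": "application/pdf",
    "language": "en",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```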

12
Using Nesstar Publisher
13
Preparing and publishing micro-data

[Diagram labels: Metadata; Data; Nesstar Publisher; Server; Hierarchical Publisher; NSDstat file]
14
Metadata
  • What is metadata? Basically, data about data
  • The aim of metadata is to make a resource
    findable and manageable
  • DDI: enables the effective, efficient and
    accurate use of data resources.

15
Metadata standards supported by Nesstar Publisher
  • DDI (http://www.icpsr.umich.edu/DDI)
  • Enables the effective, efficient and accurate
    use of data resources
  • Dublin Core (http://uk.dublincore.org/)
  • A standard for cross-domain information
    resource description
  • e-Government Metadata Standard (e-GMS)
    (http://www.govtalk.gov.uk/)
  • To ensure maximum consistency of metadata
    across public sector organisations

16
Adding metadata - Publisher templates
  • Use metadata templates
  • Can use DDI or Dublin Core/e-GMS
  • Can add controlled vocabulary lists and default
    text
  • Can rename template fields, i.e. use familiar
    terms
  • Advantages
  • Create to suit individual needs of an
    organisation or a data series
  • Use of standard templates ensures consistent use
    of metadata fields
  • Can add help information about each field to
    assist the data publisher
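
The idea of a template with default text and controlled vocabulary lists can be sketched in a few lines of Python. Everything here is hypothetical: the field names, the vocabulary entries and the `apply_template` helper are invented for illustration and do not reflect the Publisher's own template format.

```python
# Hypothetical template: field name -> default text and optional controlled vocabulary
TEMPLATE = {
    "title": {"default": "", "vocabulary": None},
    "language": {"default": "English", "vocabulary": ["English", "Norwegian"]},
    "kind_of_data": {
        "default": "Survey data",
        "vocabulary": ["Survey data", "Aggregate data"],
    },
}


def apply_template(template, entered):
    """Fill in defaults and report values outside a controlled vocabulary."""
    record, problems = {}, []
    for field, rules in template.items():
        value = entered.get(field, rules["default"])
        vocab = rules["vocabulary"]
        if vocab is not None and value not in vocab:
            problems.append(f"{field}: {value!r} not in controlled vocabulary")
        record[field] = value
    return record, problems


record, problems = apply_template(
    TEMPLATE, {"title": "Household survey", "kind_of_data": "Census"}
)
print(record)
print(problems)
```

Here the missing `language` field picks up its default, while the out-of-vocabulary `kind_of_data` value is reported rather than silently accepted.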

17
Importing / Exporting of data
  • Formats for import and export (each row: format,
    import extensions, export extensions)
  • DDI document .xml
  • SPSS .sav, .por, .sps .sav, .por, .sps
  • SAS .sp1 .sas (syntax)
  • Stata .dta (Stata 7, Stata 8) .dta
  • Statistica .sta .sta
  • NSDstat .nsf .nsf
  • dBase .dbf .dbf
  • DIF .dif .dif
  • Fixed Format ASCII .dat
  • Delimited text .txt, .csv .txt
  • PC-Axis .px

18
Import from DDI / Export DDI
  • Enables the re-use of metadata. Available options
    are:
  • Import from Dataset: import the metadata from an
    existing NSDstat file
  • Import from DDI: import an existing XML file
  • Caution! Invisible metadata may be present
  • Export DDI: export metadata to an XML file

19
Variable level metadata
  • Variable and category labels can easily be
    edited/added
  • Able to change the case of variable/category
    labels
  • Variable repository makes re-use of category
    labels possible
  • Local and Global variable repositories - share
    information with others
  • Add a map link to a variable
  • Adding question text and variable notes
  • to each variable separately
  • to a block of variables
  • Identify Weight variables
  • Identify Time variables
  • Missing data assignments
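
To show why identifying weight variables and missing-data codes matters downstream, here is a minimal sketch of a weighted mean that skips cases coded as missing. The function name, the code 99 and the sample values are assumptions for illustration; this is not how the Nesstar software itself computes statistics.

```python
def weighted_mean(values, weights, missing_codes):
    """Weighted mean of values, skipping cases whose value is a missing-data code."""
    total = 0.0
    weight_sum = 0.0
    for value, weight in zip(values, weights):
        if value in missing_codes:
            continue  # cases flagged as missing do not contribute
        total += value * weight
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no valid cases")
    return total / weight_sum


ages = [30, 40, 99, 50]          # 99 is a made-up missing-data code
weights = [1.0, 2.0, 1.0, 1.0]   # made-up case weights
result = weighted_mean(ages, weights, missing_codes={99})
print(result)
```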

20
Data manipulation functions
  • The Publisher enables you to
  • View the data as a matrix allowing direct data
    entry or editing
  • Cut and paste data
  • Add, insert and copy variables of different
    types, e.g. numeric, Fixed string, Dynamic
    string, Date
  • Insert/replace data: insert a data matrix from a
    dataset or from fixed-format text
  • Delete variables
  • Sort cases
  • Delete cases
  • Conversion between variable types

21
Variable Groups
  • Useful for grouping variables that relate to the
    same topic or theme together
  • Hierarchy of groups is supported
  • Variables can belong to more than one group
  • Groups can be arranged in any order
  • Information about that group can be added, e.g. a
    group definition
  • Advantages
  • Make it easier for end-users to navigate the
    dataset
  • Reduce the load time of a dataset when published

22
Using the Publisher
  • Demonstration
  • Practical session 1

23
Manage Server, Publishing and Templates
24
Manage Server
  • Provides the means to link to Nesstar Servers to
    enable publishing
  • Enables the data publisher to manage the
    resources on a Nesstar server so that they can:
  • Create new catalogues, then name and describe
    them
  • Reorganise the catalogue hierarchy
  • Add files to a catalogue
  • Move files between catalogues
  • Delete files and catalogues

25
Publishing
  • Add a Nesstar Server using the Server-URL, or
    locating the Server directory on a LAN, and
    entering an appropriate username and password
  • Can publish to a Nesstar Server over a local area
    network (LAN) or over the Internet
  • Able to publish to multiple Servers in a single
    operation
  • Options to publish data and metadata, metadata
    only or Republish
  • Catalogues can be automatically selected if
    Keywords or Subject classification terms
    within the metadata match existing catalogue
    names
  • Able to publish to a Hidden catalogue (not
    visible to end-users)
  • Able to view the published data directly from the
    Publisher using the Open in Web client option
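
The automatic catalogue selection described above can be pictured as a simple name match between metadata terms and catalogue names. The sketch below assumes a case-insensitive, whitespace-trimmed comparison; the slides do not say how the Publisher actually compares terms, and `matching_catalogues` is a hypothetical helper.

```python
def matching_catalogues(keywords, catalogue_names):
    """Return catalogue names that match any keyword, ignoring case and outer spaces."""
    wanted = {k.strip().lower() for k in keywords}
    return [name for name in catalogue_names if name.strip().lower() in wanted]


# Illustrative keywords and catalogue names
selected = matching_catalogues(["Health", "crime "], ["Crime", "Education", "Health"])
print(selected)
```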

26
Adding metadata - Publisher templates
  • Use metadata templates
  • Can use DDI or Dublin Core/e-GMS
  • Can add controlled vocabulary lists and default
    text
  • Can rename template fields, i.e. use familiar
    terms
  • Advantages
  • Create to suit individual needs of an
    organisation or a data series
  • Use of standard templates ensures consistent use
    of metadata fields
  • Can add help information about each field to
    assist the data publisher

27
Manage Server, Publishing and Templates
  • Demonstration
  • Practical session 2

28
The Hierarchical Publisher
29
Preparing and publishing hierarchical data

[Diagram labels: Metadata; Data; Nesstar Publisher; Server; Hierarchical Publisher; NSDstat file]
30
Hierarchical Publisher
  • Used for datasets that are hierarchically related
  • For example, a Household file with an Individual
    file beneath it
  • Create NSDstat files using the main Publisher
  • Add Study metadata to one of the files
  • Within the Hierarchical Publisher identify the
    key variables (used to link the files together)
  • Build the hierarchy of files
  • Validate the hierarchy
  • Publish
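
One way to think about the "validate the hierarchy" step is as a check that every record in the lower-level file links to exactly one record in the higher-level file through the key variable. The sketch below makes that assumption explicit; the `hh_id` key, the sample rows and the `validate_hierarchy` function are hypothetical, and the Hierarchical Publisher's real checks may differ.

```python
def validate_hierarchy(parents, children, key):
    """Return problems found when linking child records to parents by a key variable."""
    problems = []
    parent_keys = [row[key] for row in parents]
    if len(parent_keys) != len(set(parent_keys)):
        problems.append(f"duplicate {key} values in parent file")
    known = set(parent_keys)
    for i, row in enumerate(children):
        if row[key] not in known:
            problems.append(f"child row {i}: {key}={row[key]!r} has no parent")
    return problems


# Made-up household and individual records
households = [{"hh_id": 1, "region": "East"}, {"hh_id": 2, "region": "West"}]
individuals = [
    {"hh_id": 1, "age": 34},
    {"hh_id": 2, "age": 8},
    {"hh_id": 3, "age": 51},  # no matching household
]
problems = validate_hierarchy(households, individuals, "hh_id")
print(problems)
```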

31
Introduction to Nesstar Cubes
32
Cube agenda
  • What is a cube?
  • What is not a cube?
  • How to use Nesstar Publisher and the Cube Builder
    to prepare a simple cube

33
What is a cube?
  • A cube (or table) typically consists of
    aggregated data
  • This data is defined by its dimensions and
    measures
  • Dimension variables describe the data, e.g.
    gender, and consist of categories (male, female)
  • Measure variables represent the data, or
    values, found in the table cells

34
What is a cube? (2)
  • Each cell in a table must be described by all
    dimensions
  • A dimension can be hierarchically constructed
  • Geographical dimensions can be linked to a map
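
The rule that each cell must be described by all dimensions can be made concrete by modelling a cube as a mapping from one category per dimension to a measure value. This is only a sketch: the data structure, the `undescribed_cells` helper and the numbers are assumptions for illustration, not the NSDstat cube format.

```python
from itertools import product

# Two dimensions, each with its categories
dimensions = {
    "area": ["Colchester", "Chelmsford"],
    "gender": ["Male", "Female"],
}

# Cells keyed by (area, gender); the measure values are made up
cells = {
    ("Colchester", "Male"): 10,
    ("Colchester", "Female"): 12,
    ("Chelmsford", "Male"): 9,
    ("Chelmsford", "Female"): 11,
}


def undescribed_cells(dimensions, cells):
    """Cell keys that do not give exactly one valid category for every dimension."""
    valid = set(product(*dimensions.values()))
    return [key for key in cells if key not in valid]


print(undescribed_cells(dimensions, cells))
```

A table containing a cell such as `("Colchester",)`, which names an area but no gender, would be reported by this check and would not qualify as a cube in the sense used here.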

35
Example 1 - A simple cube (Population totals)
36
Example 1 information
  • Three dimensions
  • Area (East Anglia, Colchester, Chelmsford,
    Clacton)
  • Gender (Male, Female)
  • Year (2002, 2003, 2004)
  • The Measure is the population figures

37
Hierarchical dimension (2 levels)
AREA
Regions: East Anglia, Yorkshire, South West, Sussex
Towns: Colchester, Clacton, Chelmsford, Leeds, York, Sheffield, Plymouth, Exeter, Brighton, Hove
38
Example 2a - Not a cube
39
What is not a cube?
  • How many dimensions does this cube have?
  • Do all dimensions describe each data point, i.e.
    each cell in the table?
  • What is its measure?

40
Example 2b - A cube
41
Preparing and publishing a cube
[Diagram labels: XML file; Metadata; Data; Nesstar Exporter; Nesstar Publisher; Server; Nesstar Cube Builder; NSDstat Cube File; NSDstat file]
42
Creating a cube - Nesstar Publisher
  • Create a data file for input into the Nesstar
    Publisher (.csv/.tab file)
  • Using the Publisher - import the .csv/.tab
    file
  • Add any metadata required, e.g. title,
    description
  • Create the hierarchy for any hierarchical
    dimensions, e.g. Area
  • Add a link to a map, if required

43
Example 3 - Life expectancy (non-additive) (Age in years)
44
Input file for the Publisher
  • Input files can be comma separated (.csv) or tab
    delimited (.tab)
  • Each row in the file must describe a cell in the
    table,
  • e.g. tab delimited
  • England 2002 75
  • South East 2002 77
  • Colchester 2002 76
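
The three sample rows can be read with Python's `csv` module to confirm that each line describes one table cell. The column names (`area`, `year`, `life_expectancy`) are assumptions, since the slide shows only values, and `read_cells` is a hypothetical helper.

```python
import csv
import io


def read_cells(text):
    """Parse tab-delimited text into one dict per table cell."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    cells = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        area, year, value = row
        cells.append({"area": area, "year": int(year), "life_expectancy": float(value)})
    return cells


# The rows from the slide, with explicit tab separators
sample = "England\t2002\t75\nSouth East\t2002\t77\nColchester\t2002\t76\n"
cells = read_cells(sample)
for cell in cells:
    print(cell)
```

Using a tab rather than a space as the delimiter keeps a two-word category such as "South East" in a single column.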

45
Example 3 - Input file
48
Creating a cube - Cube Builder
  • Use the Cube Builder to
  • Select the cube type, e.g. Non-additive,
    Stock-additive, Flow-additive
  • Define the time and geographical dimensions
  • Define the measure
  • Create the default view
  • Publish the cube to a Nesstar Server

49
Non-additive cubes
  • No aggregation of the measure is possible across
    dimensions
  • Data typically found in this type of cube are
    percentage figures, rates, life expectancy

50
Types of additive cube
  • For additive cubes, aggregation of the data
    (measure values) is possible
  • Stock: the measure represents a number at a point
    in time, so no aggregation over time is possible.
    For example: yearly population figures, number of
    registered businesses
  • Flow (fully additive): the data can be aggregated
    along all dimensions. For example: sales figures,
    number of reported crimes
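
The difference between the cube types comes down to which dimensions a measure may be summed across. A minimal sketch of that rule follows; the type labels, the default time dimension name `year` and the `can_aggregate` function are assumptions for illustration rather than Cube Builder settings.

```python
def can_aggregate(cube_type, dimension, time_dimension="year"):
    """Whether measure values may be summed across the given dimension."""
    if cube_type == "non-additive":
        return False  # e.g. percentages, rates, life expectancy
    if cube_type == "stock":
        return dimension != time_dimension  # no aggregation over time
    if cube_type == "flow":
        return True  # fully additive
    raise ValueError(f"unknown cube type: {cube_type}")


for cube_type in ("non-additive", "stock", "flow"):
    print(cube_type, can_aggregate(cube_type, "area"), can_aggregate(cube_type, "year"))
```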

51
Additive data
  • For additive data, a higher-level category is
    automatically created containing the aggregated
    data from the lower levels
  • No higher-level data should be included in the
    data file as these are calculated automatically
  • This new category is called ALL unless it is
    created within the Publisher, or was part of the
    original table
  • For example, in the following cube: East Anglia =
    Colchester + Chelmsford + Clacton
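
The automatic creation of a higher-level category can be sketched as summing the lower-level values and refusing input that already contains the total. The population figures are made up, and `add_total` is a hypothetical helper rather than the Publisher's own routine.

```python
def add_total(values_by_category, total_label="ALL"):
    """Return a copy with a higher-level category holding the sum of the lower levels."""
    if total_label in values_by_category:
        raise ValueError(
            f"{total_label!r} already present; higher-level data should not be in the input"
        )
    result = dict(values_by_category)
    result[total_label] = sum(values_by_category.values())
    return result


towns = {"Colchester": 100, "Chelmsford": 150, "Clacton": 50}  # made-up figures
with_total = add_total(towns, total_label="East Anglia")
print(with_total)
```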

52
Example 4 - Additive (stock) (Population totals)
53
Example 4 - Input file
54
Multiple measures
  • Some cubes may contain a number of measures
  • The following cube contains Population totals
    with relevant percentages. Both measures are
    non-additive
  • Different measures in the same cube can be
    different types, e.g. one may be non-additive and
    the other additive.

55
Example 5 - Multiple measures
56
Example 5 - Input file
57
Measure types
  • There are 5 possible measure types used in
    Nesstar:
  • Average: average of underlying values
  • Count: number of underlying values
  • Minimum: minimum of underlying values
  • Maximum: maximum of underlying values
  • Sum: total of underlying values
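
The five measure types correspond to familiar aggregation functions. The sketch below maps each type name to a Python built-in or standard-library function and applies them to a made-up list of underlying values; the `MEASURE_TYPES` table and `aggregate` helper are illustrative only.

```python
from statistics import mean

# Measure type name -> function applied to the underlying values
MEASURE_TYPES = {
    "Average": mean,
    "Count": len,
    "Minimum": min,
    "Maximum": max,
    "Sum": sum,
}


def aggregate(measure_type, values):
    """Apply the aggregation named by measure_type to a list of values."""
    return MEASURE_TYPES[measure_type](values)


values = [4, 8, 6]  # made-up underlying values
summary = {name: aggregate(name, values) for name in MEASURE_TYPES}
print(summary)
```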

58
Examples of more complex tables
  • What if I have several identical tables, that
    only differ in the year they refer to?
  • Combine them using YEAR as an additional
    dimension
  • What happens if I have several almost
    identical tables, but information for one
    category (e.g. Male) is missing for one year?
  • Combine the tables, and accept that there will be
    an empty column for Male for that year
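
Combining per-year tables with YEAR as an extra dimension, while leaving a gap where a category is missing, might look like the following. The table contents are invented, `combine_tables` is a hypothetical helper, and `None` is used here simply as a marker for an empty cell.

```python
def combine_tables(tables_by_year):
    """Merge per-year tables (category -> value) into one mapping keyed by (category, year)."""
    categories = []
    for table in tables_by_year.values():
        for category in table:
            if category not in categories:
                categories.append(category)
    combined = {}
    for year, table in tables_by_year.items():
        for category in categories:
            combined[(category, year)] = table.get(category)  # None marks an empty cell
    return combined


# Made-up tables; Male is missing for 2003
tables = {
    2002: {"Male": 10, "Female": 12},
    2003: {"Female": 13},
}
combined = combine_tables(tables)
print(combined)
```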

59
Related tables
60
Combining tables
61
Preparing cubes: Summary
  • Once tables are combined they can be prepared in
    the usual way, e.g. create a comma separated
    (.csv) or tab delimited (.tab) file
  • Import into the Publisher
  • Add metadata
  • Add any necessary information, e.g. level names,
    link to a map
  • Open the Cube Builder
  • Define type of cube, e.g. Non-additive
  • Create default view
  • Publish to a Nesstar Server

62
Publishing a simple cube
  • Demonstration
  • Practical session 3

63
Resource publisher
  • Used to publish external Nesstar resources,
    e.g. PDF files, Word files etc.
  • Uses Dublin Core or e-GMS for metadata
  • Enables these external resources to be viewable
    on a Nesstar Server alongside survey data and
    cubes

64
Resource Publisher
  • Demonstration