Preparing Data for Analysis - PowerPoint PPT Presentation

1
Preparing Data for Analysis
  • SPSS Training
  • Thomas Joshua, MS
  • July, 2008

2
Lecture Overview
  • Why we need data management and data
    preparation for analysis
  • Data preparation and general format in SPSS
  • Introduction to SPSS and overview of SPSS for
    Windows

3
What is Data Management (DM)?
  • Collecting, entering, cleaning, and processing of
    information gathered during a research project.
  • It involves the planning, development,
    implementation, and administration of systems
    for the acquisition, storage, and retrieval of
    data, while protecting the data with appropriate
    security measures.

4
Why do we need DM?
  • Decision-making, strategic planning, and program
    design should be data driven. Appropriate data
    are often available, but the process of analysis
    is daunting for many.
  • Improve communication.
  • Ensure that the data are transferred in the
    proper format for the proposed analyses.
  • Help you understand reality from data.

5
Data Preparation
  • We suggest a phased approach that produces
    analysis-ready data without destroying the
    original dataset.
  • Also look at ways to document your dataset so
    that it will make sense when reviewed at a later
    point, or by other people.
  • In general, data files from SPSS, Microsoft Excel,
    and Access are acceptable as long as they are
    appropriately formatted. We will use SPSS as the
    general example.

6
Main Steps for Data Preparation
  • Create the data file.
  • Original data
  • Interim data
  • Documentation
  • Clean the data
  • Process the data
  • Create an analysis-ready copy of the data
  • Document the data
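The "original data" and "interim data" steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file names, the folder name `interim`, and the helper `make_working_copy` are hypothetical and do not come from the presentation. The idea shown is that all cleaning happens on a copy, so the original file is never modified.

```python
import shutil
import tempfile
from pathlib import Path


def make_working_copy(original: Path, interim_dir: Path) -> Path:
    """Copy the original data file into an interim folder and return the copy's path."""
    interim_dir.mkdir(parents=True, exist_ok=True)
    working = interim_dir / f"{original.stem}_interim{original.suffix}"
    shutil.copy2(original, working)  # the original file itself is left untouched
    return working


# Demonstration inside a temporary folder, so nothing is left on disk afterwards.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "survey_original.csv"  # hypothetical file name
    original.write_text("id,age\n1,34\n2,\n", encoding="utf-8")

    working = make_working_copy(original, Path(tmp) / "interim")

    original_text = original.read_text(encoding="utf-8")
    working_text = working.read_text(encoding="utf-8")
    working_name = working.name
```

Cleaning and processing would then be applied to the interim file only, and the analysis-ready file would be saved under a third name.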

7
Some critical points for formatting your data set
  • Do not include header or trailer information,
    subtotals, or other extraneous information. For
    descriptive purposes you may include one row
    giving variable names.

8
  • Format data so that each variable is in its own
    single column.
  • For example,
  • and better not
  • and absolutely not
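The example tables for this slide are not part of the transcript, so the following is a made-up illustration of the rule rather than the presenter's own example. It assumes a hypothetical raw file in which two measurements (systolic and diastolic blood pressure) share one column, and splits them so that each variable has its own column.

```python
import csv
import io

# Hypothetical raw layout: two variables packed into the single column "bp".
raw = "id,bp\n1,120/80\n2,135/90\n"

rows = []
for record in csv.DictReader(io.StringIO(raw)):
    systolic, diastolic = record["bp"].split("/")
    rows.append(
        {
            "id": int(record["id"]),
            "systolic": int(systolic),    # one variable per column
            "diastolic": int(diastolic),  # one variable per column
        }
    )
```

After the split, `rows[0]` holds `id` 1, `systolic` 120, and `diastolic` 80 as three separate fields.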

9
  • Columns must be explicitly filled with data
  • and not
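One common reading of this rule, assumed here because the slide's example tables are not in the transcript, is that a cell should never be left blank to mean "same as the row above". The Python sketch below uses made-up data with a hypothetical `site` column and fills each blank cell with the last value seen, so that every row carries its own value.

```python
# Hypothetical data entry where a blank "site" was meant as "same as above".
rows = [
    {"site": "North", "subject_id": 1},
    {"site": "", "subject_id": 2},
    {"site": "South", "subject_id": 3},
    {"site": "", "subject_id": 4},
]

last_seen = None
for row in rows:
    if row["site"] == "":
        row["site"] = last_seen  # fill the blank with the value carried down
    else:
        last_seen = row["site"]  # remember the most recent explicit value

sites = [row["site"] for row in rows]
```

This kind of fill is only appropriate when a blank is known to mean "same as above"; a blank that means "not recorded" is missing data and should stay missing.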

10
  • All columns must have the same number of rows.
  • Missing data (empty cells): use a blank space or
    a period (.) to indicate missing data.
  • A missing value must not be treated as 0.
  • Keep an identifying field, such as the subject ID.
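To see why a missing value must not be entered as 0, consider the following Python sketch. The data are made up, and the helper `parse_value` is a hypothetical name; it treats a blank cell or a period as missing, as recommended above, and compares the mean computed without the missing values to the mean obtained if they were wrongly coded as 0.

```python
import csv
import io
from statistics import mean

MISSING_CODES = {"", "."}  # blank cell or a period marks a missing value


def parse_value(text):
    """Return a float, or None when the cell holds a missing-value code."""
    text = text.strip()
    return None if text in MISSING_CODES else float(text)


# Made-up data: subjects 2 and 3 have no recorded score.
raw = "id,score\n1,10\n2,.\n3,\n4,20\n"
scores = [parse_value(r["score"]) for r in csv.DictReader(io.StringIO(raw))]

observed = [s for s in scores if s is not None]
mean_ignoring_missing = mean(observed)

# What happens if missing values are wrongly treated as 0.
mean_if_missing_were_zero = mean([0.0 if s is None else s for s in scores])
```

With these numbers the mean of the observed scores is 15.0, while coding the two missing scores as 0 pulls it down to 7.5.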

11
  • If there are multiple data files, do not rely on
    the file names to carry variable information. For
    example, if separate files are used for the
    results of two treatments, include a column in
    each file containing the name of the treatment.
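A minimal Python sketch of this advice follows. The two in-memory "files", the column names, and the helper `read_with_treatment` are all hypothetical. Each file gets an explicit `treatment` column before the rows are stacked, so the information no longer lives only in the file name.

```python
import csv
import io

# Hypothetical contents of two separate result files, one per treatment.
file_a = "subject_id,response\n1,4.2\n2,3.9\n"
file_b = "subject_id,response\n3,5.1\n4,4.8\n"


def read_with_treatment(text, treatment):
    """Read CSV text and add a column naming the treatment to every row."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        record["treatment"] = treatment
        rows.append(record)
    return rows


# Stack both files; every row now states which treatment it belongs to.
combined = read_with_treatment(file_a, "A") + read_with_treatment(file_b, "B")
```

Values are left as text here, since type conversion is a separate cleaning step.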

12
and not

13
  • Counted proportion data.
  • If data consist of counted proportions, e.g. the
    number of individuals responding out of the total
    number of individuals, do not reduce the data to
    percentages or proportions beforehand.
  • It is recommended that both the numerator and the
    denominator of the proportion be entered as
    separate columns.

14
  • For example,
  • Not
  • It is easy to compute proportions during the
    analysis if they are required, but alternative
    analyses such as logistic regression may be
    precluded if original counts are unavailable.
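The point can be illustrated with a short Python sketch using made-up counts; the group names and column names are hypothetical. Both groups have the same proportion of responders, yet they rest on very different sample sizes, which is exactly the information that is lost if only the proportion is entered.

```python
# Made-up counts: numerator and denominator are kept as separate columns.
rows = [
    {"group": "control", "responders": 3, "total": 10},
    {"group": "treated", "responders": 30, "total": 100},
]

# The proportion is easy to derive later, whenever it is needed.
for row in rows:
    row["proportion"] = row["responders"] / row["total"]

same_proportion = rows[0]["proportion"] == rows[1]["proportion"]
same_sample_size = rows[0]["total"] == rows[1]["total"]
```

Here `same_proportion` is true and `same_sample_size` is false: going from counts to proportions is trivial, but the counts cannot be recovered from the proportions alone.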

15
  • Polytomous data.
  • If data consists of numbers falling into a
    number of mutually exclusive classes, do not
    reduce to proportions or percentages beforehand,
    but enter the integer counts.
  • For example,
  • not
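As a small Python illustration with made-up counts (the class names are hypothetical), the data are stored as integer counts per class, and percentages are derived only when a summary needs them.

```python
# Made-up integer counts for three mutually exclusive classes.
counts = {"mild": 12, "moderate": 5, "severe": 3}

total = sum(counts.values())

# Percentages are a derived summary; the stored data remain the counts.
percentages = {name: 100 * count / total for name, count in counts.items()}
```

With these counts the total is 20 and the derived percentages are 60, 25, and 15.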

16
  • Each column has its own criteria or meaning.
  • Narrowing the definition of a variable means
    creating a new variable.
  • For example,
  • gender, age → male older than 45 years
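The example above can be sketched in Python as follows. The records, the coding of gender as "M"/"F", and the new column name `male_over_45` are assumptions made for illustration. The narrower definition is stored in a new variable, and the original gender and age columns are left unchanged.

```python
# Made-up records with the two original variables.
people = [
    {"id": 1, "gender": "M", "age": 52},
    {"id": 2, "gender": "F", "age": 61},
    {"id": 3, "gender": "M", "age": 38},
]

# New variable: 1 if the person is male and older than 45, otherwise 0.
for person in people:
    person["male_over_45"] = int(person["gender"] == "M" and person["age"] > 45)

flags = [person["male_over_45"] for person in people]
```

Only the first record is flagged; the second is over 45 but not male, and the third is male but not over 45.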

17
Introduction to SPSS software
  • SPSS is a software package used for conducting
    statistical analyses, manipulating data, and
    generating tables and graphs that summarize data.
  • Statistical analyses range from basic descriptive
    statistics, such as averages and frequencies, to
    advanced inferential statistics, such as
    regression models, analysis of variance, and
    factor analysis.
  • SPSS also contains several tools for manipulating
    data, including functions for recoding data and
    computing new variables, as well as for merging
    and aggregating datasets.

18
Overview of SPSS for Windows
  • SPSS for Windows consists of five different
    windows, each of which is associated with a
    particular SPSS file type. This presentation
    covers three of them.
  • Data Editor - the window that is open at start-up
    and is used to enter and store data in a
    spreadsheet format. Includes Data View and
    Variable View.
  • Output Viewer - opens automatically when you run
    an analysis or create a graph, whether from a
    dialog box or from command syntax. It contains
    the results of all statistical analyses and
    graphical displays of data.
  • Syntax Editor - a text editor where you compose
    SPSS commands and submit them to the SPSS
    processor.
  • This document focuses on the methods necessary
    for inputting, defining, and organizing data in
    SPSS.

19
The Data Editor (.sav)
  • The Data Editor window displays the contents of
    the working dataset. It is arranged in a
    spreadsheet format that contains variables in
    columns and cases in rows.
  • The Data View is the sheet that is visible when
    you first open the Data Editor; this sheet
    contains the data.
  • The Variable View is the sheet that contains
    information about the variables in the dataset.
  • Datasets that are currently open are called
    working datasets; all data manipulations,
    statistical functions, and other SPSS procedures
    operate on these datasets.

20
  • From the menu in the Data Editor window, choose
    the following menu options:
  • File → Open → Data...
  • The Open File dialog box should automatically
    open to the SPSS directory of example files.
    Choose Employee data.sav from the list and click
    Open. The Data Editor should now display the
    contents of that file.

21
The Syntax Editor (.sps)
  • This SPSS training focuses on the use of dialog
    boxes to execute procedures; however, there are
    at least two reasons why you should be aware of
    SPSS syntax, even if you plan to primarily use
    the dialog boxes.
  • First, not all procedures are available through
    the dialog boxes. Therefore, you may occasionally
    have to submit commands from the Syntax Editor.
  • Second, the Syntax Editor is a useful way to save
    a log of what you have done, and to re-run what
    you have done at a later date.

22
The dialog boxes
  • The dialog boxes available through the pull-down
    menus have a button labeled Paste, which writes
    the syntax for the procedure you are running in
    the dialog box to the Syntax Editor. Thus, you
    can easily generate SPSS syntax without typing in
    the Syntax Editor.
  • The following dialog box is used to generate
    descriptive statistics. You can open it by
    choosing Analyze → Descriptive Statistics →
    Descriptives,
  • then moving the two variables of interest across
    using the arrow button.

23
  • Clicking the Paste button writes the procedure
    that the above dialog box is prepared to run to
    the Syntax Editor in the form of SPSS syntax.
    In the above example, it would produce the
    following syntax:
  • DESCRIPTIVES VARIABLES=salbegin salary
      /STATISTICS=MEAN STDDEV MIN MAX.
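For readers who want to check what the DESCRIPTIVES command above reports, the same four statistics can be computed by hand. The Python sketch below is not SPSS and does not read Employee data.sav; the values are made up, and the helper `describe` is a hypothetical name. It assumes that the standard deviation wanted is the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev


def describe(values):
    """Return the count, mean, sample standard deviation, minimum and maximum."""
    return {
        "n": len(values),
        "mean": mean(values),
        "stddev": stdev(values),  # sample standard deviation (divisor n - 1)
        "min": min(values),
        "max": max(values),
    }


# Made-up values standing in for the two variables named in the syntax.
data = {
    "salbegin": [10000, 20000, 30000],
    "salary": [20000, 40000, 60000],
}

summary = {name: describe(values) for name, values in data.items()}
```

For the made-up `salbegin` values the mean is 20000, the standard deviation is 10000, and the minimum and maximum are 10000 and 30000.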

24
The Output Viewer (.spo)

25
  • The left frame of the Output Viewer contains an
    outline of the objects contained in the window.
  • Descriptives in the outline refers to objects
    associated with the descriptive statistics.
  • The Title object refers to the bold title
    Descriptives in the output.
  • The Active Dataset object refers to the line in
    the output that designates which dataset was used
    to run the analysis.
  • The Descriptive Statistics object refers to the
    table containing descriptive statistics.
  • The Notes icon has no referent in the above
    example, but it would refer to any notes that
    appeared between the title and the table.

26
Control the Output: the Edit menu and Options menu

27
References
  • M.E. Stokes, C.S. Davis, G.G. Koch. Categorical
    Data Analysis Using the SAS System, 2nd edition.
    SAS Institute, 2000. ISBN 1580257107
  • Basic and Clinical Biostatistics.
    ISBN-13 9780071410175, ISBN 0071410171
  • A. Agresti. An Introduction to Categorical Data
    Analysis, 2nd edition. John Wiley & Sons, 2007.
    ISBN 0471-22618-1
  • SPSS for Windows Step by Step: A Simple Guide and
    Reference, 13.0 Update. ISBN-13 9780205480715

28
Thank you!!