Title: Research Methodology
1Research Methodology
- Lecture No 21
- Data Preparation and Data Entry
2Recap Lecture
- In the last few lectures we discussed:
- Research Design
- The purpose, investigation type, researcher
interference, study setting, unit of analysis,
time horizon
- Measurement of variables
- Sources of Data
- Sampling
- Experimental Design
3Lecture Objectives
- Getting the data ready for analysis
- Data preparation
- Coding, codebook, pre-coding, coding rules
- Data entry
- Editing data
- Data transformation
4Data Preparation and Description
- Data preparation includes editing, coding, and
data entry.
- It is the activity that ensures the accuracy of
the data and their conversion from raw form to
reduced and classified forms that are more
appropriate for analysis.
- Preparing a descriptive statistical summary is
another preliminary step that allows data entry
errors to be identified and corrected.
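As a rough illustration of how a descriptive summary can expose data entry errors, the sketch below summarizes two variables from a few keyed-in records. The records, the variable names, and the `summarize` helper are hypothetical and are not taken from the lecture.

```python
from collections import Counter

# Hypothetical keyed-in records: an age of 230 and a gender code of 3
# are deliberate typing errors.
records = [
    {"id": 1, "gender": 1, "age": 34},
    {"id": 2, "gender": 2, "age": 230},
    {"id": 3, "gender": 3, "age": 41},
    {"id": 4, "gender": 2, "age": 29},
]

def summarize(records, variable):
    """Return the count, range, and frequency table of one variable."""
    values = [r[variable] for r in records]
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "counts": Counter(values),
    }

age_summary = summarize(records, "age")
gender_summary = summarize(records, "gender")
print(age_summary["min"], age_summary["max"])
print(dict(gender_summary["counts"]))
```

An implausible maximum age, or a gender code outside 1 and 2 in the frequency table, points the editor to the records that need checking against the original questionnaires.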
5Getting the Data Ready for Analysis
- After data are obtained through questionnaires, they
need to be coded, keyed in, and edited.
- Outliers, inconsistencies, and blank responses, if
any, have to be handled in some way.
6Coding
- Data coding involves assigning a number to the
participants' responses so that they can be entered
into a database.
- In coding, categories are the partitions of a
data set of a given variable. For instance, if
the variable is gender, the categories are male
and female.
- Categorization is the process of using rules to
partition a body of data.
- Both closed and open questions must be coded.
7Coding Cont.
- Numeric coding simplifies the researcher's task
by converting a nominal variable such as gender to a
1 or 2.
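A minimal sketch of numeric coding for a nominal variable follows. The choice of 1 for male and 2 for female is only an assumed assignment, and `code_gender` is a hypothetical helper.

```python
# Assumed code assignment; any consistent mapping would serve.
GENDER_CODES = {"male": 1, "female": 2}

def code_gender(response):
    """Convert a written response to its numeric code."""
    return GENDER_CODES[response.strip().lower()]

coded = [code_gender(r) for r in ["Male", "female ", "FEMALE"]]
print(coded)
```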
8Code Construction
- There are two basic rules for code construction.
- First, the coding categories should be
exhaustive, meaning that a coding category should
exist for all possible responses.
- For example, household size might be coded 1, 2,
3, 4, and 5 or more.
- The "5 or more" category assures all subjects of
a place in a category.
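The household size example can be sketched as a small function. The name `code_household_size` is hypothetical, and rejecting sizes below 1 is an added assumption.

```python
def code_household_size(size):
    """Code household size as 1, 2, 3, 4, or 5 (meaning '5 or more')."""
    if size < 1:
        # Assumption: a household has at least one member.
        raise ValueError("household size must be at least 1")
    # Every size of 5 or above falls into the final, open-ended category,
    # so no valid response is left without a code.
    return min(size, 5)

codes = [code_household_size(n) for n in [1, 3, 5, 9]]
print(codes)
```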
9Code Construction Cont.
- Second, the coding categories should be mutually
exclusive and independent.
- This means that there should be no overlap among
the categories, so that a subject or
response can be placed in only one category.
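One way to check mutual exclusivity is to count how many categories a given response falls into. The age brackets below are hypothetical and serve only to contrast an overlapping scheme with an exclusive one.

```python
def matching_categories(value, categories):
    """Return the codes of all categories whose range contains value."""
    return [code for code, (low, high) in categories.items()
            if low <= value <= high]

# Hypothetical age brackets: in the first scheme, 25 belongs to two categories.
overlapping = {1: (18, 25), 2: (25, 35), 3: (36, 120)}
exclusive = {1: (18, 25), 2: (26, 35), 3: (36, 120)}

print(matching_categories(25, overlapping))
print(matching_categories(25, exclusive))
```

A response that matches more than one code signals that the category boundaries need to be redrawn.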
10Code Construction Cont.
- Missing data should also be represented with a
code.
- In the good old days of computer cards, a
numeric value such as 9 or 99 was used to
represent missing data.
- Today, most software will understand that either
a period or a blank response represents missing
data.
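The two conventions for missing data can be sketched as follows. The `parse_response` helper and its `missing_codes` parameter are hypothetical; real statistical packages handle this through their own settings.

```python
def parse_response(raw, missing_codes=("", ".")):
    """Return the numeric code, or None when the entry marks missing data."""
    text = raw.strip()
    if text in missing_codes:
        return None
    return int(text)

row = ["2", ".", "", "99"]

# Period or blank treated as missing; 99 is read as an ordinary value.
modern = [parse_response(v) for v in row]

# Older convention: 99 is also declared as a missing-data code.
legacy = [parse_response(v, missing_codes=("", ".", "99")) for v in row]

print(modern)
print(legacy)
```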
11Codebook
- A codebook contains each variable in the study
and specifies the application of coding rules to
the variable.
- It is used by the researcher or research staff to
promote more accurate and more efficient data
entry.
- It is the definitive source for locating the
positions of variables in the data file during
analysis.
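A codebook can be pictured as a lookup table. The sketch below assumes a simple layout with a column position, a label, and the allowed codes for each variable; the variable names and the `decode` helper are hypothetical.

```python
# Hypothetical codebook: one entry per variable in the study.
CODEBOOK = {
    "gender": {
        "column": 1,
        "label": "Gender of respondent",
        "codes": {1: "male", 2: "female"},
    },
    "hh_size": {
        "column": 2,
        "label": "Household size",
        "codes": {1: "1", 2: "2", 3: "3", 4: "4", 5: "5 or more"},
    },
}

def decode(variable, code):
    """Look up what a numeric code means for the given variable."""
    return CODEBOOK[variable]["codes"].get(code, "undefined code")

print(CODEBOOK["hh_size"]["column"], decode("hh_size", 5))
print(decode("gender", 7))
```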
12Sample Codebook
13Pre-coding
- Pre-coding means assigning codebook codes to
variables in a study and recording them on the
questionnaire.
- Alternatively, the questionnaire can be designed so
that the appropriate code appears next to each
response option the respondent can choose.
- With a pre-coded instrument, the codes for
variable categories are accessible directly from
the questionnaire.
14Sample Pre-coded Instrument
15Coding Open-Ended Questions
- One of the primary reasons for using open-ended
questions is that insufficient information or
lack of a hypothesis may prohibit preparing
response categories in advance. Researchers are
forced to categorize responses after the data are
collected.
16Coding Open-Ended Questions Cont.
- In the Figure on the next slide, question 6
illustrates the use of an open-ended question.
After preliminary evaluation, response categories
were created for that item. They can be seen in
the codebook.
17Coding Open-Ended Questions Cont.
18Coding Rules
19Data Entry
- After responses have been coded, they can be
entered into a database.
- Raw data can be entered through any software
program.
- For example, the SPSS Data Editor.
20Data Entry Cont.
21Editing Data
- After the data are entered, the blank responses, if any,
have to be handled in some way, and inconsistent
data have to be checked and followed up.
- Data editing deals with detecting and correcting
illogical, inconsistent, or illegal data and
omissions in the information returned by the
participants of the study.
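The kinds of problems that editing looks for can be sketched as simple checks on a record. The variables (`gender`, `age`, `years_employed`) and the rules below are hypothetical examples of an illegal code, an omission, and an inconsistency.

```python
def edit_checks(record):
    """Return a list of problems found in one respondent's record."""
    problems = []
    # Illegal data: a code that the codebook does not define.
    if record.get("gender") not in (1, 2):
        problems.append("illegal gender code")
    # Omission: a required answer is blank.
    if record.get("age") is None:
        problems.append("age omitted")
    # Inconsistent data: employed for longer than the respondent's age.
    elif (record.get("years_employed") or 0) > record["age"]:
        problems.append("years employed exceeds age")
    return problems

first = edit_checks({"gender": 3, "age": 25, "years_employed": 30})
second = edit_checks({"gender": 1, "age": None})
third = edit_checks({"gender": 2, "age": 40, "years_employed": 12})
print(first, second, third)
```

Records with a non-empty list are the ones to follow up, either against the original questionnaire or with the respondent.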
22Editing Data Cont.
23Field Editing
- Field Editing Review
- Entry Gaps → Callback
- Validates → Re-interviewing
24Field Editing Review
- In large projects, field editing review is a
responsibility of the field supervisor.
- It should be done soon after the data have been
collected.
- During the stress of data collection, data
collectors often use ad hoc abbreviations and
special symbols.
25Field Editing Review Cont.
- If the forms are not completed soon, the field
interviewer may not recall what the respondent
said.
- Therefore, reporting forms should be reviewed
regularly.
26Field Editing Cont.
- Entry Gaps → Callback
- When entry gaps are present, a callback should be
made rather than guessing what the respondent
probably said.
27Field Editing Cont.
- Validates → Re-interviewing
- The field supervisor also validates field results
by re-interviewing some percentage of the
respondents on some questions to verify that they
have participated.
- Ten percent is the typical amount used in data
validation.
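Drawing the respondents to re-interview can be sketched as a simple random sample. The `validation_sample` helper is hypothetical; the 10 percent default follows the figure above, while selecting at random is an assumption about how the supervisor picks respondents.

```python
import random

def validation_sample(respondent_ids, fraction=0.10, seed=None):
    """Pick a random subset of respondents to re-interview."""
    if not respondent_ids:
        return []
    rng = random.Random(seed)  # a seed makes the draw repeatable
    k = max(1, round(len(respondent_ids) * fraction))
    return rng.sample(respondent_ids, k)

ids = list(range(1, 201))            # 200 hypothetical respondents
to_reinterview = validation_sample(ids, seed=21)
print(len(to_reinterview))
```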
28Central Editing
- Scale of Study → Number of Editors
- At this point, the data should get a thorough
editing.
- For a small study, a single editor will produce
maximum consistency.
- For large studies, editing tasks should be
allocated by sections.
29Central Editing Cont.
- Wrong Entry → Replacements
- Sometimes it is obvious that an entry is
incorrect, and the editor may be able to detect
the proper answer by reviewing other information
in the data set.
- This should only be done when the correct answer
is obvious.
- If an answer given is inappropriate, the editor
can replace it with "no answer" or "unknown".
30Central Editing Cont.
- Fakery → Open-ended Questions
- The editor can also detect instances of "armchair
interviewing" (fake interviews) during this phase.
- This is easiest to spot with open-ended
questions.
31Central Editing Cont.
Guidelines for Editors
- Be familiar with instructions given to
interviewers and coders
- Do not destroy the original entry
- Make all editing entries identifiable and in
standardized form
- Initial all answers changed or supplied
- Place initials and date of editing on each
instrument completed
32Handling "Don't Know" Responses
- When the number of "don't know" (DK) responses is
low, it is not a problem. However, if there are
many, it may mean that the question was
poorly designed, too sensitive, or too
challenging for the respondent.
- The best way to deal with undesired DK answers is
to design better questions at the beginning.
- If a DK response is legitimate, it should be kept
as a separate reply category.
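A short sketch of keeping DK as its own category and checking how often it occurs is given below. The code 8 for DK, the 20 percent threshold, and the helper name are all assumptions made for illustration; the lecture does not fix any of them.

```python
DK_CODE = 8            # assumed code reserved for "don't know"
DK_THRESHOLD = 0.20    # assumed share above which a question is reviewed

def dk_share(responses):
    """Proportion of responses to one question that are coded DK."""
    return responses.count(DK_CODE) / len(responses)

# Hypothetical coded answers to two questions.
q1 = [1, 2, 8, 8, 3, 8, 2, 1, 8, 4]
q2 = [1, 2, 2, 3, 3, 1, 2, 1, 8, 4]

flagged = [name for name, answers in [("q1", q1), ("q2", q2)]
           if dk_share(answers) > DK_THRESHOLD]
print(dk_share(q1), dk_share(q2), flagged)
```

Because DK keeps its own code, it is tabulated separately and is not mixed with substantive answers or with missing data.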
33Data Transformation
- Data transformation, a variation of data coding,
is a process of changing the original numerical
representation of a quantitative value to another
value.
- For example, the data are given as consumption per
year, but we need consumption per month.
- Data are typically changed to avoid problems in
the next stage of the data analysis process.
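The consumption example amounts to dividing by twelve. The figures and the helper name below are hypothetical, and the sketch assumes consumption is spread evenly over the year.

```python
def per_month(yearly_consumption):
    """Convert a per-year figure to an average per-month figure."""
    return yearly_consumption / 12

yearly = [1200, 600, 3000]           # hypothetical yearly consumption
monthly = [per_month(y) for y in yearly]
print(monthly)
```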
34Data Transformation Cont.
- For example, economists often use a logarithmic
transformation so that the data are more evenly
distributed.
- Data transformation is also necessary when
several questions have been used to measure a
single concept.
- For example, intention to leave is measured through 10
questions, which need to be transformed into a
single value for each respondent.
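Both transformations can be sketched briefly. The income figures and item scores are hypothetical; base 10 is one common choice of logarithm, and averaging the ten items is an assumption (a sum is another common way to build a single score).

```python
import math
from statistics import mean

# Logarithmic transformation: values spanning several orders of magnitude
# end up evenly spaced.
incomes = [1000, 10000, 100000]
log_incomes = [math.log10(x) for x in incomes]

# Composite score: ten hypothetical item responses (1 to 5 scale) from one
# respondent are combined into a single "intention to leave" value.
items = [4, 5, 3, 4, 4, 5, 2, 4, 3, 4]
intention_to_leave = mean(items)

print(log_incomes, intention_to_leave)
```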
35Recap
- Questionnaire checking involves eliminating
unacceptable questionnaires.
- These questionnaires may be incomplete, have
instructions not followed or pages missing, arrive past
the cutoff date, or come from an unqualified respondent.
- Editing looks to correct illegible, incomplete,
inconsistent, and ambiguous answers.
- Coding typically assigns alpha or numeric codes
to answers that do not already have them so that
statistical techniques can be applied.
36Recap Cont.
- Cleaning reviews data for consistency.
Inconsistencies may arise from faulty logic or from
out-of-range or extreme values.
- Statistical adjustment applies to data that
require weighting and scale transformations.