Title: Dr. Mona Hassan Ahmed Hassan
1. Dr. Mona Hassan Ahmed Hassan
- Professor of Biostatistics
2. Computer Utilization for Diabetes Epidemiology
3. Statistical Software
Objectives
- What should be done before sitting down at the PC?
- How are results generated and interpreted?
4. Data Coding
- Transformation of qualitative information into numbers or symbols
5. Data Preparation
- One option is to transfer the information from the original record to a coding sheet (coding form).
- Coding form layout: each record has a serial number (Ser.) followed by Column/Code pairs for each variable: Age, Sex, marital status (MS), and education (Educ.).
6. Code
- ID: 1 (code: 1)
- 1. Date of interview: 10/1/2008 (code: 10/01/2008)
- 2. What is your date of birth? 25/8/1986 (code: 25/08/1986)
- 3. What sex are you? Male (m), Female (f) (code: f)
- 4. What is your marital status? Single (1), Married (2), Widowed (3), Divorced (4) (code: 1)
- 5. What is your height (cm)? 160 (code: 160)
- 6. What is your weight (kg)? 58 (code: 58)
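The coding step for the questionnaire above can be sketched in a few lines of Python. This is only an illustration: the slides do not prescribe a tool, and the function names `code_date` and `code_record` and the field names are assumptions introduced here.

```python
from datetime import datetime

# Code lists taken from the questionnaire above.
SEX_CODES = {"Male": "m", "Female": "f"}
MARITAL_CODES = {"Single": 1, "Married": 2, "Widowed": 3, "Divorced": 4}


def code_date(text):
    """Normalise a day/month/year date such as '10/1/2008' to '10/01/2008'."""
    return datetime.strptime(text, "%d/%m/%Y").strftime("%d/%m/%Y")


def code_record(answers):
    """Translate raw answers (hypothetical field names) into coded values."""
    return {
        "id": int(answers["id"]),
        "interview_date": code_date(answers["interview_date"]),
        "birth_date": code_date(answers["birth_date"]),
        "sex": SEX_CODES[answers["sex"]],
        "marital_status": MARITAL_CODES[answers["marital_status"]],
        "height_cm": int(answers["height_cm"]),
        "weight_kg": int(answers["weight_kg"]),
    }


# The respondent shown on the slide.
raw = {
    "id": "1",
    "interview_date": "10/1/2008",
    "birth_date": "25/8/1986",
    "sex": "Female",
    "marital_status": "Single",
    "height_cm": "160",
    "weight_kg": "58",
}
print(code_record(raw))
```

An answer that is not in a code list raises a `KeyError`, which is a simple way to catch answers the coding instructions did not anticipate.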
7. Coding by more than one person
- Precise instructions should be developed for coders
- Coders must be trained
- Check for inter-coder reliability
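The slides do not say how inter-coder reliability should be checked. Two common measures are simple percent agreement and Cohen's kappa, which corrects agreement for chance. The sketch below assumes two coders have coded the same set of questionnaires; the function names and the example codes are illustrative and not from the slides.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Share of items to which both coders assigned the same code."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa: agreement between two coders corrected for chance."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: product of the two coders' marginal proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Illustrative marital-status codes from two coders (made-up values).
coder_a = [1, 2, 2, 3, 1, 4, 2, 1, 3, 2]
coder_b = [1, 2, 2, 3, 1, 4, 1, 1, 3, 2]
print(percent_agreement(coder_a, coder_b))
print(round(cohen_kappa(coder_a, coder_b), 3))
```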
8. Sorting of the questionnaires
- Questionnaires are sorted by serial number into batches: 1-100, 101-200
9. Describing the Sample
- Use measures of central tendency and variability.
- The appropriate measure of central tendency and variability depends on the variable's level of measurement and the shape of the distribution.
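As a rough sketch of how the choice of summary follows from the level of measurement and the shape of the distribution, the function below applies a common convention: mode for nominal data, median for ordinal or skewed data, and mean with standard deviation for roughly symmetric interval/ratio data. The function `describe`, its parameters, and the example values are assumptions for illustration, not part of the slides.

```python
import statistics


def describe(values, level, skewed=False):
    """Pick summary measures by level of measurement and distribution shape."""
    if level == "nominal":
        return {"mode": statistics.mode(values)}
    if level == "ordinal" or skewed:
        return {
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
    # Interval/ratio data with a roughly symmetric distribution.
    return {"mean": statistics.mean(values), "sd": statistics.stdev(values)}


# Illustrative values (made up, not from the slides).
print(describe(["f", "m", "f"], "nominal"))
print(describe([1, 2, 2, 3], "ordinal"))
print(describe([150, 155, 160, 165, 170], "ratio"))
```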
10. Scales of Measurement
11. Scales of Measurement
(Figure: three runners, Ali, Ramy, and Samy, at the finish line.)
- Nominal: symbols assigned to runners
- Ordinal: rank order of winners
- Interval: performance rating on a 0 to 10 scale
- Ratio: time to finish, in seconds
- Values shown in the figure: 3, 7, 9; 15.2, 14.1, 13.4
12. Scales of Measurement
13. Shapes of Distribution
- Symmetric (normal) distribution: Mean = Median = Mode
- About 68% of values lie within mean ± 1 SD, about 95% within mean ± 2 SD, and about 99.7% within mean ± 3 SD
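The percentages for the normal distribution can be checked directly from the standard normal cumulative distribution function. The short sketch below uses Python's standard library; the helper name `share_within` is an assumption introduced here.

```python
from statistics import NormalDist


def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"within mean ± {k} SD: {share_within(k):.1%}")
# within mean ± 1 SD: 68.3%
# within mean ± 2 SD: 95.4%
# within mean ± 3 SD: 99.7%
```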
14. Right-skewed distribution
- Order along the axis: Mode < Median < Mean
- If Mean > Median: positive or right skewness (long right tail)
- It arises when the mean is increased by some unusually high values
15. Left-skewed distribution
- Order along the axis: Mean < Median < Mode
- If Mean < Median: negative or left skewness (long left tail)
- Negative skewness occurs when the mean is reduced by some extremely low values
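The mean-versus-median rule for right and left skewness can be applied mechanically. The sketch below is a rule-of-thumb check only, not a formal skewness statistic; the function name `skew_direction` and the example values are made up for illustration.

```python
import statistics


def skew_direction(values):
    """Compare mean and median to suggest the direction of skewness."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right (positive)"
    if mean < median:
        return "left (negative)"
    return "none indicated"


# One unusually high value pulls the mean above the median.
print(skew_direction([2, 3, 3, 4, 5, 40]))
# One extremely low value pulls the mean below the median.
print(skew_direction([1, 36, 37, 37, 38, 39]))
```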
16. Inference
- Developing and testing a hypothesis
- Differences in frequency distributions of nominal-level variables: chi-square
- Associations or correlations between variables: bivariate correlations
- Differences between groups with respect to the distribution of interval/ratio-level data: t-tests
17. The most popular statistical packages
- 1. SAS
- 2. SPSS
- 3. STATA
- 4. Epi Info
- 5. SUDAAN
- 6. S-PLUS
- 7. MedCalc
- 8. Excel
- 9. Statistica
- 10. Minitab
18. Using Epitable (under Epi Info) to Calculate Sample Size
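The slides do not show which Epitable option or formula is used. As one common case, the sample size needed to estimate a single proportion p with absolute precision d is n = z² p (1 − p) / d². The sketch below implements that formula under the assumption of a large population and simple random sampling; the function name and parameters are illustrative, and Epi Info's own output may differ (for example, if a finite population correction is applied).

```python
import math
from statistics import NormalDist


def sample_size_proportion(p, d, confidence=0.95):
    """Sample size to estimate a proportion p within +/- d (large population)."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


# Expected prevalence 50%, precision 5 percentage points, 95% confidence.
print(sample_size_proportion(0.5, 0.05))  # 385
```

Using p = 0.5 gives the largest sample size for a given precision, so it is a conservative choice when the expected proportion is unknown.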
20. SPSS: Statistical Package for the Social Sciences
21. Creating a Data File in SPSS
- ID
- Gender: Male, Female
- Date of Birth
- Educational Level (years)
- Employment Category: 1 = Clerical, 2 = Custodial, 3 = Manager
- Current Salary
- Beginning Salary
- Months since Hire
- Previous Experience (months)
- Minority Classification: 0 = No, 1 = Yes
22. Data Entry
- Excel
- Access
- Word
- Any statistical software
23. Data Entry
24. Data cleaning
- General data check
- Printout
- Quick data check (frequency tables)
- 1. Wild codes check (invalid codes)
- 2. Completeness check: ensure that all cases collected are represented in the data file, without replication
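The checks listed above (frequency tables, wild codes, completeness without replication) can be sketched outside SPSS as well. The Python below is a minimal illustration that reuses the sex and marital-status codes from the questionnaire example; the function names, field names, and records are assumptions made up for this sketch.

```python
from collections import Counter

# Valid code lists per variable (from the questionnaire example).
VALID_CODES = {"sex": {"m", "f"}, "marital_status": {1, 2, 3, 4}}


def frequency_table(records, variable):
    """Count how often each code occurs for one variable."""
    return Counter(r[variable] for r in records)


def wild_codes(records, valid_codes):
    """List (id, variable, value) for every value outside its code list."""
    return [
        (r["id"], var, r[var])
        for r in records
        for var, allowed in valid_codes.items()
        if r[var] not in allowed
    ]


def completeness_check(records, expected_ids):
    """Return IDs missing from the file and IDs entered more than once."""
    counts = Counter(r["id"] for r in records)
    missing = sorted(set(expected_ids) - set(counts))
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    return missing, duplicated


# Made-up records: ID 3 is missing, ID 2 is entered twice, two codes are invalid.
records = [
    {"id": 1, "sex": "f", "marital_status": 1},
    {"id": 2, "sex": "m", "marital_status": 5},
    {"id": 2, "sex": "m", "marital_status": 2},
    {"id": 4, "sex": "x", "marital_status": 3},
]
print(frequency_table(records, "sex"))
print(wild_codes(records, VALID_CODES))
print(completeness_check(records, range(1, 5)))
```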
25. Simple frequency: data check
26. Perform Descriptive Statistics
27. Descriptive
29. Conduct Simple Correlations and Regression
30. Correlation
31. Regression
32. Scatter
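The SPSS output for these slides is not included in the transcript. As a stand-in, the sketch below computes a Pearson correlation coefficient and a simple least-squares regression line by hand; the function names and the x and y values are illustrative assumptions, not data from the slides.

```python
import math


def _sums(x, y):
    """Return the means and the sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def simple_regression(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my, sxy, sxx, _ = _sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))
print(simple_regression(x, y))
```

A scatter plot of y against x is still worth inspecting first, since the correlation coefficient only captures linear association.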
33-35. t-test (two independent groups)
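For readers without the SPSS screens, the test statistic for two independent groups can be sketched as follows. This is the classical pooled-variance (equal variances assumed) version; the function name and the group values are illustrative assumptions. Obtaining a p-value additionally requires the t distribution, which is not computed here.

```python
import math
import statistics


def independent_t(group1, group2):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2


# Made-up measurements for two independent groups.
t, df = independent_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
print(t, df)
```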
36. Paired t-test (dependent groups)
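A paired t-test works on the within-pair differences rather than on the two groups separately. The sketch below computes the t statistic and degrees of freedom only; the function name and the before/after values are made up for illustration.

```python
import math
import statistics


def paired_t(before, after):
    """t statistic and degrees of freedom for paired (dependent) measurements."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Standard error of the mean difference.
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Made-up measurements on the same five subjects at two time points.
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 12, 16]
print(paired_t(before, after))
```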
37. Chi-square test
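The chi-square statistic compares observed cell counts with the counts expected if the row and column variables were independent. The sketch below computes the Pearson chi-square without a continuity correction, assuming every expected count is greater than zero; the function name and the table are illustrative, not from the slides.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Made-up 2x2 table of counts (rows: groups, columns: outcome yes/no).
print(chi_square([[20, 30], [30, 20]]))  # (4.0, 1)
```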
38. Thank You