CSEM03 REPLI


Slides: 19

Provided by: lynnehu


CSEM03 REPLI

Research and the use of statistical tools

Objectives

At the end of this lecture you will be able to:
- plot graphs and know your way around SPSS
- describe the shape of normal and non-normal distributions
- describe the characteristics of a normal and non-normal distribution: mode, median, mean, standard deviation

The SPSS statistical tool

Recommended text
Field, A. (2005) Discovering Statistics Using SPSS, 2nd Edition, Sage Publications, London (ISBN 0-7619-4452-4).
You can get SPSS for your own machine for £5 from the LRC.

Simple statistical models: mean, sum of squares, variance and standard deviation

Mean
We can consider the mean of a sample as one of the simplest statistical models, because it represents a summary of the data.
For example: how many CDs does each of a group of students own?
If we take 5 students, the numbers of CDs they own are 1, 2, 3, 4 and 6.
The mean is the sum of these values ($\sum x_i$) divided by the number of students ($N$):
$\bar{x} = \frac{\sum x_i}{N} = \frac{1 + 2 + 3 + 4 + 6}{5} = \frac{16}{5} = 3.2$
This is a theoretical value: you cannot own 0.2 of a CD.
How well does the mean represent the data?
What are the differences between the observed values and the mean?
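The mean calculation above can be checked with a minimal Python sketch. The list `cds` simply holds the five students' CD counts from the example; the variable names are illustrative, not part of SPSS or the lecture.

```python
# Number of CDs owned by each of the 5 students in the example
cds = [1, 2, 3, 4, 6]

# Mean = sum of the observations divided by the number of observations (N)
total = sum(cds)   # 16
n = len(cds)       # 5
mean = total / n

print(f"Sum = {total}, N = {n}, mean = {mean}")  # prints "Sum = 16, N = 5, mean = 3.2"
```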

Differences between the mean and the observed data
Figure 1: Differences between the observed number of CDs and the mean

Cancelling out the errors in the data (deviation from the mean)

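The deviance (error) of each observation is the difference between that observation and the mean, $x_i - \bar{x}$.
For the first data point, $x_1 - \bar{x} = 1 - 3.2 = -2.2$; for the second, $x_2 - \bar{x} = 2 - 3.2 = -1.2$; and so on.
The deviances are -2.2, -1.2, -0.2, 0.8 and 2.8.
Total error $= \sum (x_i - \bar{x}) = -2.2 - 1.2 - 0.2 + 0.8 + 2.8 = 0$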
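A small Python sketch of the deviance calculation, reusing the CD counts from the example. Because computers store decimals such as 3.2 only approximately, the deviances are rounded for display and the total is compared with zero using a tolerance rather than exact equality.

```python
import math

cds = [1, 2, 3, 4, 6]
mean = sum(cds) / len(cds)  # 3.2

# Deviance of each observation from the mean: x_i - mean
deviances = [x - mean for x in cds]
print([round(d, 1) for d in deviances])  # prints [-2.2, -1.2, -0.2, 0.8, 2.8]

# Positive and negative deviances cancel out, so the total error is zero
# (up to floating-point rounding)
total_error = sum(deviances)
print(math.isclose(total_error, 0.0, abs_tol=1e-9))  # prints True
```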

Sum of Squared errors (SS)

So the total error between our model and the observed data is zero: some errors are negative and some are positive, and they cancel each other out. To remove the direction (sign) of each error, which would be impractical to track in a large dataset, we square each error (a negative number squared becomes positive) and add the squares together.
This is called the Sum of Squared errors (SS) and is a good measure of the accuracy of our model:
$SS = \sum (x_i - \bar{x})^2$
However, SS depends on the amount of data collected: the more data points, the higher the SS. To overcome this we average the error by dividing SS by the number of observations, $N$.

A more useful statistic uses the error in the sample to estimate the error in the population. This is done by dividing SS by the number of observations minus one, $N - 1$.
This measure is known as the variance.
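The point that SS grows with the amount of data, while SS / N does not, can be illustrated with a short Python sketch. The `sum_of_squares` helper is a hypothetical name for illustration, and the "doubled" dataset (every CD count recorded twice) is made up purely to show the effect; it is not data from the lecture.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean: SS = sum((x_i - mean)^2)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

cds = [1, 2, 3, 4, 6]
ss = sum_of_squares(cds)
n = len(cds)

print(f"SS = {ss:.2f}")                      # prints "SS = 14.80"
print(f"SS / N = {ss / n:.2f}")              # prints "SS / N = 2.96"
print(f"SS / (N - 1) = {ss / (n - 1):.2f}")  # prints "SS / (N - 1) = 3.70"

# Hypothetical illustration: recording every value twice doubles SS,
# but the average squared error SS / N stays the same.
doubled = cds * 2
ss_doubled = sum_of_squares(doubled)
print(f"SS = {ss_doubled:.2f}, SS / N = {ss_doubled / len(doubled):.2f}")  # prints "SS = 29.60, SS / N = 2.96"
```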

Variance and Standard Deviation

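Variance: $s^2 = \frac{SS}{N-1} = \frac{\sum (x_i - \bar{x})^2}{N-1}$
For the CD data: $\sum (x_i - \bar{x})^2 = 4.84 + 1.44 + 0.04 + 0.64 + 7.84 = 14.8$
$s^2 = \frac{14.8}{5 - 1} = \frac{14.8}{4} = 3.7$
From this statistic we can derive a very useful measure called the Standard Deviation (SD):
$s = \sqrt{s^2} = \sqrt{3.7} \approx 1.92$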
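A minimal Python sketch of the variance and standard deviation calculation for the CD data. It computes both by hand and then cross-checks them against Python's standard `statistics` module, whose `variance` and `stdev` functions use the same $N - 1$ divisor (the $N$ divisor is provided separately by `pvariance` and `pstdev`).

```python
import math
import statistics

cds = [1, 2, 3, 4, 6]
n = len(cds)
mean = sum(cds) / n

# Sample variance: SS divided by N - 1
ss = sum((x - mean) ** 2 for x in cds)
variance = ss / (n - 1)

# Standard deviation: square root of the variance
sd = math.sqrt(variance)

print(f"variance = {variance:.2f}, SD = {sd:.2f}")  # prints "variance = 3.70, SD = 1.92"

# Cross-check against the standard library's sample (N - 1) formulas
assert math.isclose(variance, statistics.variance(cds))
assert math.isclose(sd, statistics.stdev(cds))
```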

Levels of data

The type of data you collect will depend on the design of your study. Data can be measured on different scales:
Interval data
These data are measured on a scale along which intervals are equal. For example, if you record ratings of a pop video on a scale of 1 to 5, the change between each number should be equal.
Categorical data
These are any variables that are made up of categories of objects or entities. For example, the UK degree classification system comprises 1st, 2:1, 2:2, 3rd, pass or fail. The interval between each class is not equal.
Nominal data
This is where numbers represent names, e.g. the numbers on the back of football shirts denote the type of player (1 = goalkeeper).
Measurement data
The objects being studied are measured on a quantitative scale. The data can be discrete or continuous. With discrete measurement data only certain values are possible, such as the number of CDs a student owns. Examples of continuous measurement data are age, height and cholesterol level.
Ordinal data
A type of categorical data where the order is important, e.g. degree classification or seriousness of illness.

Median

The median is the "middle value" of a list: the smallest number such that at least half the numbers in the list are no greater than it. If the list has an odd number of entries, the median is the middle entry after sorting the list into increasing order.
If the list has an even number of entries, the median is the sum of the two middle numbers (after sorting) divided by two.
The median can be estimated from a histogram by finding the smallest number such that the area under the histogram to the left of that number is 50% of the total area.
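The sort-then-pick rule for lists can be sketched in Python as follows. The `median` function is a hypothetical helper written for illustration (the standard library's `statistics.median` applies the same odd/even rule); it does not implement the histogram estimate.

```python
def median(values):
    """Middle value after sorting; for an even count, the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of entries: the single middle entry
        return ordered[mid]
    # Even number of entries: average of the two middle entries
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 2, 3, 4, 6]))  # prints 3 (odd list: the CD example)
print(median([6, 1, 4, 2]))     # prints 3.0 (even list: (2 + 4) / 2)
```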

Mode

For lists, the mode is the most common (frequent)
value. A list can have more than one mode. In a
histogram, a mode is the most frequently
occurring interval (seen as a bump).
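Because a list can have more than one mode, a sketch that returns every most-frequent value is more faithful to the definition than one that returns a single number. The `modes` helper and the two short example lists are made up for illustration; the counting uses Python's `collections.Counter`.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often (a list can have more than one mode)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([1, 2, 2, 3, 4]))  # prints [2]
print(modes([1, 1, 2, 3, 3]))  # prints [1, 3] (two modes)
```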

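Normal distribution example (scores in a test)
Figure: test scores following a normal distribution

Positive skew
Figure: a positively skewed distribution

Figure: normal curve with mean μ and standard deviation σ, with the horizontal axis marked from -3σ to 3σ. About 34.1% of the area lies between the mean and 1σ on each side, 13.6% between 1σ and 2σ, 2.1% between 2σ and 3σ, and 0.1% beyond 3σ.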
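The percentages on the normal curve can be reproduced with a short Python sketch. It assumes the standard normal distribution (mean 0, SD 1) and uses the identity $\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$ with `math.erf` from the standard library; `normal_cdf` is a hypothetical helper name.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z (mean 0, SD 1)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Area between consecutive multiples of the standard deviation, on one side of the mean
for lower, upper in [(0, 1), (1, 2), (2, 3)]:
    area = normal_cdf(upper) - normal_cdf(lower)
    print(f"{lower} to {upper} SD: {100 * area:.1f}%")

# Area beyond 3 SD on one side of the mean
print(f"beyond 3 SD: {100 * (1 - normal_cdf(3)):.1f}%")
```

By symmetry, the same areas appear on the negative side of the mean.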
Find the mean, median, mode and range of these data.

Example: Three dice are rolled 12 times. The sum of the numbers after each roll is recorded below.
Numbers rolled: 12, 11, 3, 7, 4, 4, 17, 13, 12, 5, 8, 12
Step 1: Rearrange the data elements in increasing order.
3, 4, 4, 5, 7, 8, 11, 12, 12, 12, 13, 17
Step 2: Find the mean.
Add all the numbers and divide by 12: 108/12 = 9
Step 3: Find the median.
The sample size is even, as there are 12 data elements, so the median is the average of the sixth and seventh elements: median = (8 + 11)/2 = 9.5

Step 4: Find the mode.
The number 3 occurs once. The number 4 occurs twice. The numbers 5, 7, 8 and 11 each occur once. The number 12 occurs three times. The numbers 13 and 17 each occur once.
mode = 12
Step 5: Find the range.
The highest value is 17 and the lowest value is 3: range = 17 - 3 = 14
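The five steps of the dice example can be checked with Python's standard `statistics` module. This is a sketch for verifying the arithmetic; the range is computed directly as maximum minus minimum, and the variable is named `data_range` to avoid shadowing Python's built-in `range`.

```python
import statistics

# Sums of three dice over 12 rolls, as listed in the example
rolls = [12, 11, 3, 7, 4, 4, 17, 13, 12, 5, 8, 12]

print(sorted(rolls))                  # Step 1: prints [3, 4, 4, 5, 7, 8, 11, 12, 12, 12, 13, 17]
print(sum(rolls), len(rolls))         # prints 108 12

mean = statistics.mean(rolls)         # Step 2: 108 / 12 = 9
median = statistics.median(rolls)     # Step 3: (8 + 11) / 2 = 9.5
mode = statistics.mode(rolls)         # Step 4: 12 (occurs three times)
data_range = max(rolls) - min(rolls)  # Step 5: 17 - 3 = 14

print(mean, median, mode, data_range)
```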

Examples for practice

The weekly salaries of six employees at McDonald's are 70, 100, 90, 80, 70, 100. For these six salaries, find (a) the mean, (b) the median, (c) the mode.
List the data in order: 70, 70, 80, 90, 100, 100
Mean = (70 + 70 + 80 + 90 + 100 + 100)/6 = 510/6 = 85
Median: the two numbers that fall in the middle (80 and 90) need to be averaged: (80 + 90)/2 = 85
Mode: the numbers that appear most often are 70 and 100 (twice each), so these data have two modes.
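The answers to the practice question can be checked with a short Python sketch using the standard `statistics` module. `statistics.multimode` (available from Python 3.8) is used rather than `statistics.mode`, because `mode` reports only a single value, whereas this data set has two modes.

```python
import statistics

# Weekly salaries of the six employees in the practice question
salaries = [70, 100, 90, 80, 70, 100]

mean = statistics.mean(salaries)        # 510 / 6 = 85
median = statistics.median(salaries)    # (80 + 90) / 2 = 85
modes = statistics.multimode(salaries)  # every most-frequent value, in first-seen order

print(mean, median, modes)
```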