Title: CSEM03 REPLI
1CSEM03REPLI
- Research and the use of statistical tools
2Objectives
- At the end of this lecture you will be able to
- Plot graphs and know your way around SPSS
- Describe the shape of normal and non-normal distributions
- Describe the characteristics of a normal and non-normal distribution
  - Mode, median, mean, standard deviation
3The SPSS statistical tool
- Recommended text
- Andy Field's book: Field, A. (2005) Discovering Statistics using SPSS, 2nd Edition, Sage Publishing, London (ISBN 0-7619-4452-4)
- You can get SPSS for your own machine for £5 from the LRC
4Simple statistical models: mean, sum of squares,
variance and standard deviation
- Mean
- We can consider the mean of a sample as one of the simplest statistical models because it represents a summary of the data.
- For example
- How many CDs does a group of students own?
- If we take 5 students, the numbers of CDs owned are 1, 2, 3, 4 and 6.
- The mean is the sum of these ($\sum x_i$) divided by the number of students.
- Mean = 16/5 = 3.2
- This is a theoretical value: you cannot own 0.2 of a CD.
- How well does the mean represent the data?
- What are the differences between the observed values and the mean?
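The CD example can be checked with a few lines of Python. This is only a minimal sketch: the names `cds` and `mean` are chosen for illustration and are not part of the lecture or of SPSS.

```python
# Numbers of CDs owned by the five students in the example.
cds = [1, 2, 3, 4, 6]


def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


cd_mean = mean(cds)
print(sum(cds), len(cds), cd_mean)  # 16 5 3.2
```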
5Differences between the mean and the observed data
Figure 1: Differences between the observed number of CDs and the mean
6Cancelling out the errors in the data (deviation
from the mean)
- $x_i - \bar{x} = 1 - 3.2 = -2.2$
- where $x_i$ = first data point $x_1$ and $\bar{x}$ is the mean
- so $x_2 - \bar{x} = 2 - 3.2 = -1.2$, etc.
- The deviances are -2.2, -1.2, -0.2, 0.8 and 2.8
- Total error $= \sum (x_i - \bar{x})$ (sum of deviances) $= 0$
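The cancelling-out of the deviances can be reproduced as a short Python sketch. The variable names are illustrative assumptions, and the total is compared against a small tolerance because floating-point arithmetic may leave a tiny rounding residue rather than an exact zero.

```python
# CD counts from the lecture example and their mean (3.2).
cds = [1, 2, 3, 4, 6]
cd_mean = sum(cds) / len(cds)

# Deviance of each observation: observed value minus the mean.
deviances = [x - cd_mean for x in cds]
total_error = sum(deviances)

print([round(d, 1) for d in deviances])  # [-2.2, -1.2, -0.2, 0.8, 2.8]
print(abs(total_error) < 1e-9)           # True: the deviances cancel out
```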
7Sum of Squared errors (SS)
- So the total error between our model and the observed data is zero. Some errors are negative and some are positive, but they cancel each other out.
- To avoid the problem of the direction of the error (e.g. in a large dataset) we square each error (a negative number squared becomes positive) and add the squares together.
- This is called the Sum of Squared errors (SS) and is a good measure of the accuracy of our model.
- SS, however, depends on the amount of data collected: the more data points, the higher the SS.
- To overcome this we average the error by dividing SS by the number of observations, N.
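A small Python sketch of why SS on its own depends on the amount of data. Repeating the CD data set is an assumption made purely for illustration: the spread is unchanged, yet SS doubles, while SS divided by N stays the same.

```python
def sum_of_squares(values):
    """Sum of squared differences between each value and the mean (SS)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


cds = [1, 2, 3, 4, 6]
repeated = cds * 2  # the same five values recorded twice (illustrative only)

ss = sum_of_squares(cds)
ss_repeated = sum_of_squares(repeated)

print(round(ss, 2), round(ss_repeated, 2))  # 14.8 29.6
# Averaging by the number of observations removes the dependence on N.
print(round(ss / len(cds), 2), round(ss_repeated / len(repeated), 2))  # 2.96 2.96
```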
8- A more useful statistic uses the error in the sample to estimate the error in the population. This is done by dividing SS by the number of observations minus 1 (N - 1).
- This measure is known as the variance
9Variance and Standard Deviation
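- Variance $s^2 = \frac{SS}{N-1} = \frac{\sum (x_i - \bar{x})^2}{N-1}$
- $\sum (x_i - \bar{x})^2 = 4.84 + 1.44 + 0.04 + 0.64 + 7.84 = 14.8$
- $s^2 = \frac{\sum (x_i - \bar{x})^2}{N-1} = \frac{14.8}{4} = 3.7$
- From this statistic we can derive a very useful measure called the Standard Deviation (SD)
- $SD = \sqrt{\text{Variance}} = \sqrt{3.7} = 1.92$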
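The variance and standard deviation for the CD data can be worked through in Python as a sketch. The hand calculation follows the slide; the cross-check uses `statistics.variance` and `statistics.stdev` from the Python standard library, which also divide by N - 1. SPSS itself is not involved here.

```python
import math
import statistics

cds = [1, 2, 3, 4, 6]
n = len(cds)
cd_mean = sum(cds) / n

# Sum of squared errors, then divide by N - 1 to get the sample variance.
ss = sum((x - cd_mean) ** 2 for x in cds)
variance = ss / (n - 1)
sd = math.sqrt(variance)

print(round(ss, 2), round(variance, 2), round(sd, 2))  # 14.8 3.7 1.92

# Cross-check against the standard library's sample statistics.
print(math.isclose(variance, statistics.variance(cds)))  # True
print(math.isclose(sd, statistics.stdev(cds)))           # True
```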
10Levels of data
- The type of data you collect will depend on the design of your study. Data can be measured on different scales.
- Interval data
- These data are measured on a scale along which intervals are equal. For example, if you record ratings of a pop video on a range of 1 to 5, the change between each number should be equal.
- Categorical data
- These are any variables that are made up of objects/entities. For example, the UK degree classification system comprises 1, 2:1, 2:2, 3, pass or fail. The interval between each class is not equal.
- Nominal data
- This is where numbers represent names, e.g. the numbers on the back of football shirts denote the type of player (1 = goalkeeper).
- Measurement data
- The objects being studied are measured on a quantitative scale. The data can be discrete or continuous. With discrete measurement data only certain values are possible. Examples of continuous measurement data are age, height and cholesterol level.
- Ordinal data
- A type of categorical data where the order is important, e.g. degree classification, seriousness of illness.
11Median
- The median is the "middle value" of a list: the smallest number such that at least half the numbers in the list are no greater than it.
- If the list has an odd number of entries, the median is the middle entry after sorting the list into increasing order.
- If the list has an even number of entries, the median is the sum of the two middle numbers (after sorting) divided by two.
- The median can be estimated from a histogram by finding the smallest number such that the area under the histogram to the left of that number is 50%.
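The odd and even rules for the median can be sketched in Python as follows. The function name `median` and the second example list are illustrative assumptions; the first list is the CD data used earlier in the lecture.

```python
def median(values):
    """Middle value of a list, using the odd/even rules described above."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of entries: the middle entry after sorting.
        return ordered[mid]
    # Even number of entries: average of the two middle entries.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([6, 1, 4, 2, 3]))  # 3
print(median([4, 1, 3, 2]))     # 2.5
```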
12Mode
- For lists, the mode is the most common (frequent)
value. A list can have more than one mode. In a
histogram, a mode is the most frequently
occurring interval (seen as a bump).
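Because a list can have more than one mode, a sketch that returns every most-frequent value may be clearer than one that returns a single number. The helper name `modes` and the example lists are assumptions made for illustration.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often, in increasing order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 4, 4, 5, 7]))  # [4]
print(modes([1, 1, 2, 3, 3]))  # [1, 3]
```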
13Normal distribution example (scores in a test)
14Positive skew
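15Standard deviation σ and mean μ
- Figure: normal distribution curve centred on the mean μ, with the horizontal axis marked from -3σ to 3σ.
- Areas under the curve on each side of the mean: 34.1% between μ and 1σ, 13.6% between 1σ and 2σ, 2.1% between 2σ and 3σ, and 0.1% beyond 3σ.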
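The percentages shown on the normal curve (34.1, 13.6, 2.1 and 0.1) can be reproduced from the standard normal cumulative distribution function. The sketch below uses `math.erf` from the Python standard library; the helper names `normal_cdf` and `band_percent` are assumptions introduced for illustration.

```python
import math


def normal_cdf(z):
    """Proportion of a normal distribution lying below z standard deviations."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def band_percent(lower, upper):
    """Percentage of the distribution between two z values."""
    return 100.0 * (normal_cdf(upper) - normal_cdf(lower))


bands = {
    "mean to 1 SD": band_percent(0, 1),
    "1 SD to 2 SD": band_percent(1, 2),
    "2 SD to 3 SD": band_percent(2, 3),
    "beyond 3 SD": 100.0 * (1.0 - normal_cdf(3)),
}

for label, percent in bands.items():
    print(f"{label}: {percent:.1f}%")
```

The same percentages apply on the other side of the mean because the curve is symmetric.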
16Find the mean, median, mode, and range of these data.
- Example: Three dice are rolled 12 times. The sum of the numbers after each roll is recorded below.
- Numbers rolled
- 12, 11, 3, 7, 4, 4, 17, 13, 12, 5, 8, 12
- Step 1: Rearrange the data elements in order. 3, 4, 4, 5, 7, 8, 11, 12, 12, 12, 13, 17
- Step 2: Find the mean. Add all the numbers and divide by 12.
- 108/12 = 9
- Step 3: Find the median. The sample size is even, for there are 12 data elements. The median is the average of the sixth and seventh elements.
- median = (8 + 11)/2 = 9.5
17- Step 4: Find the mode. The number 3 occurs once. The number 4 occurs twice. The number 5 occurs once. The number 7 occurs once. The number 8 occurs once. The number 11 occurs once. The number 12 occurs three times. The number 13 occurs once. The number 17 occurs once.
- mode = 12
- Step 5: Find the range. The highest value is 17. The lowest value is 3.
- range = 17 - 3 = 14
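The five steps can be recomputed directly from the twelve recorded rolls. This sketch uses the `statistics` module from the Python standard library for the median and mode; the variable names are illustrative.

```python
import statistics

# The twelve recorded sums from rolling three dice.
rolls = [12, 11, 3, 7, 4, 4, 17, 13, 12, 5, 8, 12]

ordered = sorted(rolls)                   # Step 1: rearrange
total = sum(rolls)
roll_mean = total / len(rolls)            # Step 2: mean
roll_median = statistics.median(rolls)    # Step 3: median
roll_mode = statistics.mode(rolls)        # Step 4: mode
roll_range = max(rolls) - min(rolls)      # Step 5: range

print(ordered)    # [3, 4, 4, 5, 7, 8, 11, 12, 12, 12, 13, 17]
print(total)      # 108
print(roll_mean, roll_median, roll_mode, roll_range)  # 9.0 9.5 12 14
```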
18Examples for practice
- The weekly salaries of six employees at McDonalds are 70, 100, 90, 80, 70, 100. For these six salaries, find (a) the mean (b) the median (c) the mode
- List the data in order: 70, 70, 80, 90, 100, 100
- Mean = (70 + 70 + 80 + 90 + 100 + 100)/6 = 510/6 = 85
- Median: 70, 70, 80, 90, 100, 100. The two numbers that fall in the middle need to be averaged: (80 + 90)/2 = 85
- Mode: the numbers that appear the most are 70 and 100 (twice each), so this list has two modes
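A quick way to check an answer to this practice question is to compute the three statistics from the six salaries as listed in the question. The sketch assumes Python 3.8 or later, where `statistics.multimode` returns every most-frequent value.

```python
import statistics

# Weekly salaries as given in the question.
salaries = [70, 100, 90, 80, 70, 100]

salary_mean = sum(salaries) / len(salaries)
salary_median = statistics.median(salaries)
salary_modes = statistics.multimode(salaries)  # all most-frequent values

print(sorted(salaries))  # [70, 70, 80, 90, 100, 100]
print(salary_mean)       # 85.0
print(salary_median)     # 85.0
print(salary_modes)      # [70, 100]
```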