Provided by: zhong8

1
4. Summary Statistics
  • Reading: RB 50-70, R 82-89
  • In section 3 we represented the data in some
    graphical forms.
  • This sometimes involved some summarisation.
  • In this section we look at some summary measures
    of the data, i.e. measures that say something
    important about the data but do not represent the
    full data.
  • The following are some example summary
    statistics:
  • Average: the average age of patients undergoing
    heart operations is 57
  • Range: each operating theatre is used between 4
    and 11 times each day
  • Maximum: nurses in Antrim hospital earn a maximum
    of 32,000 a year
  • The main summaries used are the average and the
    spread.

2
4.1 Average (Measure of Location)
  • The term "average" refers to any measure of
    location
  • There are 2 commonly used measures of location:
    the mean and the median
  • Mean: sum all the data values and divide by the
    total number of data items in the sample
  • Median: arrange the data in order of size and
    pick the middle value
  • If the number of data items is odd, pick the
    middle data item value as the median
  • If the number of data items is even, pick the
    mean of the two middle values as the median
  • Mode: the sample mode is defined as the value
    which occurs with the highest frequency. Modes
    may not be unique.

3
4.1 Average (Measure of Location)
  • Examples of Mean and Median
  • 10 people were asked how many cups of tea they
    drank yesterday:
  • Person    1  2  3  4  5  6  7  8  9  10
  • No. cups  2  3  1  5  3  3  8  0  2   1
  • Mean = (2+3+1+5+3+3+8+0+2+1)/10 = 2.8
  • Median:
  • Order the data: 0 1 1 2 2 3 3 3 5 8
  • Median = the average of the 5th and 6th values
    = (2+3)/2 = 2.5
  • Note: It only makes sense to calculate the mean
    or median of numerical data, e.g. can you talk
    about the average country of origin of patients?
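
The cups-of-tea calculation can be reproduced with a short sketch. It uses Python's standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
import statistics

# Cups of tea drunk yesterday by the 10 people on the slide
cups = [2, 3, 1, 5, 3, 3, 8, 0, 2, 1]

# Mean: sum of the values divided by the number of values
mean_cups = sum(cups) / len(cups)

# Median: middle value of the ordered data; with an even number of
# values, statistics.median averages the two middle values
median_cups = statistics.median(cups)

# Mode(s): returned as a list because modes may not be unique
modes_cups = statistics.multimode(cups)

print("mean:", mean_cups)
print("median:", median_cups)
print("mode(s):", modes_cups)
```

`statistics.multimode` needs Python 3.8 or later.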

4
4.1 Average (Measure of Location)
  • When to choose the mean and when the median?
  • Symmetric sample → mean
  • Skewed sample → median
  • Why should you not use the mean when the data is
    skewed? Consider calculating the mean salary in a
    private nursing home where the owner pays himself
    110,000 a year and the other 9 employees only
    get 10,000 each. Does it make sense to say that
    on average people in the nursing home earn this
    much? What is the median?
  • Mean = (110,000 + 10,000 × 9)/10 = 20,000
  • Median = 10,000
  • If you are in doubt about whether a sample is
    sufficiently symmetric, the median is the safer
    option.
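
A minimal sketch of the nursing home example, assuming the ten salaries are exactly as described (one owner and nine employees). It shows how a single extreme value pulls the mean away from what a typical person earns.

```python
import statistics

# One owner on 110,000 and nine employees on 10,000 each
salaries = [110_000] + [10_000] * 9

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# Count how many people actually earn at least the mean salary
at_or_above_mean = sum(1 for s in salaries if s >= mean_salary)

print("mean:", mean_salary)
print("median:", median_salary)
print("people earning at least the mean:", at_or_above_mean)
```

Only the owner earns as much as the mean, which is why the median is the more representative average here.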

5
4.2 Spread
  • Spread is a measure of how "spread out" the data
    is about the average.
  • Samples with the same average may have very
    different spreads.
  • Here are two measures of spread:
  • If we use the mean (symmetric sample)
    → Standard Deviation (SD)
  • If we use the median (skewed sample)
    → Semi-Inter-Quartile Range (SIQR)
  • SD is the partner of the mean, and SIQR is the
    partner of the median

6
Standard Deviation (SD)
  • To calculate SD:
  • calculate the mean
  • subtract the mean from each data value: (x - mean)
  • square each difference: (x - mean)^2
  • sum the squared differences: Σ (x - mean)^2
  • divide by (sample size - 1):
    Σ (x - mean)^2 / (n - 1), where n is the sample
    size (this is the variance)
  • take the square root
  • Example: Cups of Tea

7
Semi-Inter-Quartile Range (SIQR)
  • To calculate SIQR:
  • place the values in order of size
  • determine the two quartile values Q1 and Q3
    (Q2 is the median)
  • SIQR = (Q3 - Q1) / 2
  • Example: Cups of Tea
  • Data in order: 0 1 1 2 2 3 3 3 5 8
  • Q2 = 2.5, Q1 = 1, Q3 = 3 → SIQR = (3-1)/2 = 1
  • Cups of Tea example summary:
  • Median = 2.5, SIQR = 1
  • Mean = 2.8, SD = 2.30
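
A sketch of the SIQR calculation, assuming Q1 and Q3 are taken as the medians of the lower and upper halves of the ordered data (the convention that reproduces Q1 = 1 and Q3 = 3 here). Other quartile conventions exist and can give slightly different values. The function name `siqr` is illustrative.

```python
import statistics

def siqr(values):
    """Semi-inter-quartile range, with Q1 and Q3 taken as the
    medians of the lower and upper halves of the ordered data."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above it (middle value skipped if n is odd)
    q1 = statistics.median(lower)
    q3 = statistics.median(upper)
    return (q3 - q1) / 2

cups = [2, 3, 1, 5, 3, 3, 8, 0, 2, 1]
print("SIQR:", siqr(cups))
```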

8
4.2 Spread
  • Comments
  • Note that extreme values affect the SD much more
    than the SIQR. Also, extreme values affect the
    mean more than the median
  • If the extreme value 8 is omitted, the new mean
    and median should be quite close. What are they?
  • Mean = 2.22, median = 2
  • SD = sqrt(2.19) = 1.48, SIQR = 1
  • If the extreme values 8 and 0 are both omitted:
  • Mean = 2.5, median = 2.5, SD = sqrt(1.71) = 1.31,
    SIQR = 0.75
  • Note: It never makes sense to calculate an
    average or spread for nominal data.
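
The effect of dropping extreme values can be checked with a short sketch. It assumes the SD is recalculated about the new mean of the reduced sample, and that quartiles are the medians of the lower and upper halves; `siqr` and `summarise` are illustrative helper names.

```python
import statistics

def siqr(values):
    """Semi-inter-quartile range using medians of the two halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[-half:])
    return (q3 - q1) / 2

def summarise(values):
    """Mean, median, SD and SIQR, with mean and SD rounded to 2 d.p."""
    return {
        "mean": round(statistics.mean(values), 2),
        "median": statistics.median(values),
        "sd": round(statistics.stdev(values), 2),
        "siqr": siqr(values),
    }

cups = [2, 3, 1, 5, 3, 3, 8, 0, 2, 1]
without_8 = [x for x in cups if x != 8]
without_8_and_0 = [x for x in cups if x not in (0, 8)]

for label, data in [("all data", cups),
                    ("8 omitted", without_8),
                    ("8 and 0 omitted", without_8_and_0)]:
    print(label, summarise(data))
```

Comparing the three rows shows the mean and SD moving much more than the median and SIQR.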

9
5. Cross-tabulation and Data Coding
  • 5.1 Cross-tabulation
  • Sometimes your sample may contain different
    groups, e.g. you measure salary grades of men and
    women. You can view this as simply a set of
    grades but you may instead be interested in
    examining the data values for the groups
    separately, e.g. to see if there is a difference
    between the grades of men and women.
  • Of course, to look at the data in this way you
    must have recorded for each grade whether it
    belongs to a man or a woman. Essentially then you
    have measured two variables for each individual in
    the sample:
  • sex (male or female) - a nominal variable
  • grade (1, 2, 3 or 4) - an ordinal variable
  • A cross-tabulation is simply a table listing the
    frequencies of each data value for each group,
    e.g. the number of people at each grade for each
    sex. (summarising data)

10
5.1 Cross-tabulation
  • For example, the cross-tabulation might look like:
  • Grade      1    2    3    4   Total
  • Male      18   12   15   15      60
  • Female    26    3    1    0      30
  • Total     44   15   16   15      90
  • Notice where the row and column totals come from.
  • Note that each variable should usually take only
    a few values.
  • According to this table:
  • how many females are in either grade 2 or 3?
    3 + 1 = 4
  • what percentage of males are in grade 1?
    18/60 = 30%
  • what overall percentage of staff are in grade 4?
    15/90 = 16.7%
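
A small sketch of how the totals and the three answers can be read off the table. The nested dictionary layout is just one convenient way to hold the counts and is an assumption of this example.

```python
# Observed frequencies from the slide: observed[sex][grade]
observed = {
    "male":   {1: 18, 2: 12, 3: 15, 4: 15},
    "female": {1: 26, 2: 3, 3: 1, 4: 0},
}
grades = (1, 2, 3, 4)

# Row totals: sum across the grades for each sex
row_totals = {sex: sum(counts.values()) for sex, counts in observed.items()}

# Column totals: sum across the sexes for each grade
col_totals = {g: sum(observed[sex][g] for sex in observed) for g in grades}

grand_total = sum(row_totals.values())

# Females in either grade 2 or 3
females_grade_2_or_3 = observed["female"][2] + observed["female"][3]

# Percentage of males in grade 1: divide by the number of males
pct_males_grade_1 = 100 * observed["male"][1] / row_totals["male"]

# Percentage of all staff in grade 4: divide by the overall total
pct_staff_grade_4 = 100 * col_totals[4] / grand_total

print(row_totals, col_totals, grand_total)
print(females_grade_2_or_3, pct_males_grade_1, round(pct_staff_grade_4, 1))
```

Dividing 18 by the grade 1 column total (44) would answer a different question: what percentage of grade 1 staff are male.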

11
5.2 Expected values
  • Note how the totals are calculated from the
    frequencies.
  • Suppose we look at the totals by themselves for a
    moment, e.g.
  • Grade      1    2    3    4   Total
  • Male                             60
  • Female                           30
  • Total     44   15   16   15      90
  • Suppose that we were just given this information
    and we were asked how many males at grade 2 we
    would expect there to be.
  • How can you calculate your expectations?

12
5.2 Expected values
  • We might think about this as follows:
  • Since there are twice as many men as women and
    there are a total of 15 at grade 2, we would
    expect that 10 of them are men and 5 are women.
  • For any position in the table you get the
    expected value for a particular row and column
    position from:
  • expected value =
    (row total × column total) / overall total
  • Use the above formula to calculate all the
    expected cell values
  • Check that you get the same number in the "males
    at grade 2" position that we calculated above.
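
The formula can be applied to every cell from the totals alone. This is a minimal sketch; the dictionary names are illustrative.

```python
# Totals from the slide
row_totals = {"male": 60, "female": 30}
col_totals = {1: 44, 2: 15, 3: 16, 4: 15}
grand_total = 90

# Expected value = (row total x column total) / overall total
expected = {
    sex: {grade: row_total * col_total / grand_total
          for grade, col_total in col_totals.items()}
    for sex, row_total in row_totals.items()
}

for sex, cells in expected.items():
    print(sex, {grade: round(value, 2) for grade, value in cells.items()})
```

The "males at grade 2" cell comes out as 60 × 15 / 90 = 10, matching the reasoning about twice as many men as women.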

13
5.2 Expected values
  • We can include our expected values with the
    observed values in the table (expected values
    shown beneath each observed row):
  • Grade              1       2       3       4   Total
  • Male              18      12      15      15      60
  •   (expected)   29.33   10.00   10.67   10.00
  • Female            26       3       1       0      30
  •   (expected)   14.67    5.00    5.33    5.00
  • Total             44      15      16      15      90
  • Observations and expectations can be compared.
  • From this cross-tabulation with expected values
    we can see that more males than expected are at
    the higher grades (2 to 4) and correspondingly
    fewer women than expected are at those grades.
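
One way to make the comparison concrete is to subtract the expected value from the observed count in each cell. This sketch only looks at the sign and size of the differences; it is not a formal test.

```python
observed = {
    "male":   {1: 18, 2: 12, 3: 15, 4: 15},
    "female": {1: 26, 2: 3, 3: 1, 4: 0},
}
row_totals = {sex: sum(counts.values()) for sex, counts in observed.items()}
col_totals = {g: sum(observed[sex][g] for sex in observed) for g in (1, 2, 3, 4)}
grand_total = sum(row_totals.values())

# Observed minus expected: positive means more than expected
difference = {
    sex: {g: observed[sex][g] - row_totals[sex] * col_totals[g] / grand_total
          for g in col_totals}
    for sex in observed
}

for sex, cells in difference.items():
    print(sex, {g: round(d, 2) for g, d in cells.items()})
```

Positive differences for males at grades 2 to 4 are mirrored by negative differences of the same size for females.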

14
5.3 Coding values to form groups
  • An example from the paper by Rukhholm et al.
  • A survey involved asking many patients 20
    questions about how anxious they were about... .
    The answers were coded as follows:
  • 1 = Not at all, to 4 = Very much
  • Combining the answers gave a scale from 20 to 80
    for anxiety. The sex and age of each patient were
    also recorded.
  • The sample was first divided into different age
    groups by coding the ages as follows:
  • MTB > CODE (18:34)1 (35:51)2 (52:68)3 (69:85)4 c1
    c6
  • The mean and SD of each age group were then
    calculated in order to examine if the anxiety
    level was different for different age groups.
  • Next, each of these 4 age groups was divided into
    a male and a female group. This gives 8 groups in
    all:
  • Male 18-34, Female 18-34, Male 35-51, ...
  • This allowed the mean anxiety level of each age
    group for a given sex to be compared with another
    such group.
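
The age coding can be sketched outside Minitab as follows. The band limits are the ones given above; the function name `code_age`, the list `boundary_ages`, and the choice to return `None` for ages outside all bands are assumptions of this example. No patient data is used, only the band boundaries.

```python
from itertools import product

# (lowest age, highest age, code) for each band
AGE_BANDS = [(18, 34, 1), (35, 51, 2), (52, 68, 3), (69, 85, 4)]

def code_age(age):
    """Return the age-group code for an age, or None if it falls
    outside every band (an assumption; the slides do not say)."""
    for low, high, code in AGE_BANDS:
        if low <= age <= high:
            return code
    return None

# Check the coding at the edges of each band
boundary_ages = [18, 34, 35, 51, 52, 68, 69, 85]
codes = [code_age(age) for age in boundary_ages]
print(dict(zip(boundary_ages, codes)))

# Splitting each of the 4 age groups by sex gives 8 groups
groups = list(product(("male", "female"), (1, 2, 3, 4)))
print(len(groups), groups)
```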

15
5.3 Coding values to form groups
  • Another example
  • Suppose we have the following data:
  • Age   Course (1 = BScN, 2 = BScPDN)
  • 20    1
  • 23    1
  • 32    2
  • 28    2
  • 24    1
  • This gives us a BScN group with data 20, 23 and
    24 and a BScPDN group with data 32 and 28.
  • Suppose we divide the ages into two categories:
    less than 26, and 26 or more. (Collapsing for
    summarising)
  • Draw the cross-tabulation table (including
    expected frequencies) that you get from this data.

16
5.3 Coding values to form groups
  • An example (cont.)
  •            BScN   BScPDN   Total
  • Age < 26      3        0       3
  • Age ≥ 26      0        2       2
  • Total         3        2       5
  • Notice how we can take a ratio variable (such as
    age above) and by coding it into groups it can be
    treated as an ordinal variable. After being
    grouped, age gave rise to a new variable with
    values 1 and 2 (corresponding to less than 26 and
    26 or more).
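
The exercise asks for expected frequencies as well as the observed counts. The sketch below builds both from the five records, assuming the split is "less than 26" versus "26 or more"; the helper name `age_group` and the label strings are illustrative.

```python
from collections import Counter

# (age, course code) pairs from the slide: 1 = BScN, 2 = BScPDN
records = [(20, 1), (23, 1), (32, 2), (28, 2), (24, 1)]
course_names = {1: "BScN", 2: "BScPDN"}

def age_group(age):
    """Collapse age into two categories."""
    return "Age < 26" if age < 26 else "Age >= 26"

# Observed frequencies for each (age group, course) cell
observed = Counter((age_group(age), course_names[c]) for age, c in records)

# Row, column and overall totals
row_totals = Counter(age_group(age) for age, _ in records)
col_totals = Counter(course_names[c] for _, c in records)
n = len(records)

# Expected value = (row total x column total) / overall total
expected = {(row, col): row_totals[row] * col_totals[col] / n
            for row in row_totals for col in col_totals}

for row in row_totals:
    for col in col_totals:
        print(row, col, "observed:", observed[(row, col)],
              "expected:", expected[(row, col)])
```

A `Counter` returns 0 for cells that never occur, so the empty cells of the table need no special handling.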