Course Lecture: Analytic Techniques - PowerPoint PPT Presentation

Slides: 41
Provided by: SchoolofP54
Transcript and Presenter's Notes

1

Course Lecture: Analytic Techniques
Descriptive Statistics
2
Terminology
  • Population: the complete collection of elements to be
    studied
  • Sample: a sub-collection of elements drawn from the
    population
  • Parameters: population characteristics,
    e.g. mean, median, variance
  • Estimates: sample statistics that estimate the
    population parameters, e.g. sample mean, sample
    median, sample variance

3
Terminology: Levels of measurement
  • Nominal
      • names, labels or categories only
      • e.g. marital status, gender, ethnic group
  • Ordinal
      • can be arranged in some order
      • (but the difference between data values is
        meaningless)
      • e.g. level of education, qualitative evaluation
        (good, very good, excellent)
  • Interval
      • like ordinal, but the difference between values
        is meaningful
      • (however, there is no natural zero starting point)
      • e.g. dates
  • Ratio
      • like interval, but there is a natural zero
        starting point
      • (meaning that, say, doubling the number is
        meaningful)
      • e.g. distance, speed, income

4
MEASURES OF LOCATION
  • A measure of central tendency is probably the
    most common number used to describe a data set.
    It gives us some idea of what the average,
    middle or most frequently occurring number in the data
    set is.
  • There are many different measures. In this unit
    we will look at only the three most important
    ones

5
MEASURES OF CENTRAL TENDENCY
1. Mean - $\mu$ or $\bar{x}$
2. Mode - most common value
3. Median - middle value, i.e. the $\frac{N+1}{2}$th value
   when the data are arranged in order
They can be calculated even when the data have been grouped.
Note: sample and population means are different.
6
The arithmetic mean
  • Usually referred to as simply the mean
  • in a sample it is called x bar, symbol $\bar{x}$
  • in a population it is called mu (pronounced "mew", as
    in dew), symbol $\mu$

7
The arithmetic mean (Cont.)
  • The mean is the sum of all the values, divided by
    the number of values, i.e. for a sample with n
    observations, the mean would be
    $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

8
The arithmetic mean (Cont.)
  • The sample mean is often used as an estimate of
    the population mean $\mu$
  • Because the mean is calculated by summing every
    observation, it is greatly affected by any
    extreme values, and can as such present a
    distorted representation of the data.
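
To see how strongly a single extreme value can pull the mean, here is a small sketch in Python. The values are hypothetical, chosen only for illustration, and are not taken from the lecture.

```python
from statistics import mean

# Hypothetical values, chosen only for illustration.
typical = [10, 12, 11, 13, 12]
with_extreme = typical + [100]   # same data plus one extreme value

mean_typical = mean(typical)
mean_with_extreme = mean(with_extreme)

print(mean_typical)        # mean of the five typical values
print(mean_with_extreme)   # mean after adding the extreme value
```

In this sketch one extra observation more than doubles the mean, although five of the six values lie between 10 and 13.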

9
The Median
  • When the data are not unimodal and symmetrical
    (e.g. skewed), the median is preferred to the mean
  • The median is the middle value when the data are
    arranged in order, i.e. there is an equal number
    of observations above and below the median

10
The Median (Cont.)
  • If there is an odd number of values, the median
    is the value of the middle observation.
    Otherwise it lies somewhere between the two middle
    values, and is generally calculated as the average
    of these two numbers

11
The Median (Cont.)
  • arrange the data in order (decreasing or
    increasing)
  • locate the middle value using the formula
  • middle value = $\frac{N+1}{2}$th value, where N is the
    number of values
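
The two steps above (sort, then locate the middle position) can be sketched in Python as follows. The function name `median_by_position` is made up for this illustration, and the sample inputs in the usage lines are hypothetical.

```python
def median_by_position(values):
    """Median via the (N + 1) / 2 position rule applied to sorted data."""
    ordered = sorted(values)
    count = len(ordered)
    if count == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = (count + 1) / 2            # 1-based position of the middle value
    lower = ordered[int(position) - 1]    # value at the floor of that position
    if position.is_integer():
        return lower                      # odd count: a single middle value
    upper = ordered[int(position)]        # even count: the next value up
    return (lower + upper) / 2            # average of the two middle values


# Hypothetical inputs, for illustration only.
print(median_by_position([3, 1, 2]))      # 2
print(median_by_position([4, 1, 3, 2]))   # 2.5
```

A fractional position such as 2.5 means the median falls halfway between the 2nd and 3rd ordered values, so the sketch averages them.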

12
The Mode
  • The mode is the value(s) that occurs most often
  • Useful for nominal-scale data, where it is not
    possible to calculate the mean or median
  • A distribution can have more than one mode (e.g.
    two modes: bimodal)

13
Measures of Central tendency Example
  • Consider the following data
  • 12 34 56 34 21 23 1 19 17 12 34 53
  • Calculate the mean, median and mode

14
Measures of Central tendency Example (Cont.)
  • The mean
  • (12 + 34 + 56 + 34 + 21 + 23 + 1 + 19 + 17 + 12 + 34 + 53) / 12
  • = 316 / 12 = 26.33

15
Measures of Central tendency Example (Cont.)
  • The median
  • middle value = $\frac{N+1}{2}$th value
  • = $\frac{12+1}{2}$th = 6.5th value
  • = average of the 6th and 7th values
  • in order:
  • 1 12 12 17 19 21 23 34 34 34 53 56
  • Median = (21 + 23) / 2 = 22

16
Measures of Central tendency Example (Cont.)
  • The mode
  • number      1  12  17  19  21  23  34  53  56
  • frequency   1   2   1   1   1   1   3   1   1
  • Therefore, the mode is 34
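
As a cross-check of the worked example, the three measures can be recomputed with Python's standard `statistics` module. This is only a sketch for verification; the lecture itself does the calculation by hand.

```python
from statistics import mean, median, multimode

# The twelve values from the example.
data = [12, 34, 56, 34, 21, 23, 1, 19, 17, 12, 34, 53]

example_mean = mean(data)
example_median = median(data)
example_modes = multimode(data)   # list of all most-frequent values

print(round(example_mean, 2))   # 26.33
print(example_median)           # 22.0
print(example_modes)            # [34]
```

`multimode` is used rather than `mode` because it returns every most-frequent value, which matters when a data set is bimodal.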

17
Mean, Median and Mode
  • If the distribution is exactly symmetric and unimodal, the
    mean, the median and the mode are exactly the
    same.
  • If the distribution is skewed, the three measures
    differ.

18
Which one to use?
  • Different by definition
  • Mean and median are unique, and only for
    quantitative variables.
  • Mode is not unique.
  • Mode is defined for categorical variables also.
  • The choice depends on the shape of the
    distribution, the type of data and the purpose of
    your study
  • Skewed data: the median
  • Categorical data: the mode
  • Total quantity of interest: the mean

19
Measures of Dispersion
  • A second important property of a distribution is
    its dispersion, i.e. how variable the
    data are
  • The four most commonly used measures are the
    range, variance, standard deviation and
    coefficient of variation
  • We will also look at the Inter Quartile Range

20
The Range
  • The range is simply the difference between the
    highest and lowest values in a data set:
    Range = $x_{max} - x_{min}$
  • The range, however, gives no indication of the
    dispersion of values between these two extreme
    values, i.e. there may be a lot of values clumped
    at either end of the distribution
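
A minimal Python sketch of the range, using two hypothetical data sets (not from the lecture) that share the same extremes but are distributed very differently in between:

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


# Hypothetical data sets with the same extremes but different spread.
evenly_spread = [1, 3, 5, 7, 9]
clumped_at_ends = [1, 1, 1, 9, 9]

print(data_range(evenly_spread))     # 8
print(data_range(clumped_at_ends))   # 8
```

Both calls give the same range, which illustrates why the range alone says nothing about how the values between the extremes are distributed.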

21
The Variance
  • The two most commonly used measures which take
    into account all the data values are the variance
    and the standard deviation
  • A data set that is more variable will have a
    larger variance than a data set that is
    relatively homogeneous
  • The variance is the sum of the squared deviations
    divided by the number of observations

22
The variance (Cont.)
  • Consider these data: 5, 17, 12, 10
  • The mean of the data is
  • (5 + 17 + 12 + 10) / 4 = 44 / 4 = 11
  • A deviation is the distance of each observation
    from the mean

23
The variance (Cont.)
Deviations
  • For these data, the deviations are
  • 5 - 11 = -6
  • 10 - 11 = -1
  • 12 - 11 = 1
  • 17 - 11 = 6

24
The variance (Cont.)
  • We are interested in the squared deviations, so
    the numbers are squared
  • Number   Deviation   Squared deviation
  •  5        -6          36
  • 10        -1           1
  • 12         1           1
  • 17         6          36

25
The variance (Cont.)
  • The squared deviations are then summed and
    divided by the number of observations to give the
    variance
  • Variance = (36 + 1 + 1 + 36) / 4
  • = 74 / 4 = 18.5
  • The variance is hence the average squared
    deviation of the data
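
The same calculation, written out step by step in Python as a sketch. It follows the definition used on these slides (divide by the number of observations); the variable names are chosen for this illustration.

```python
data = [5, 17, 12, 10]

n = len(data)
data_mean = sum(data) / n                            # 11.0
deviations = [x - data_mean for x in data]           # distance of each value from the mean
squared_deviations = [d ** 2 for d in deviations]    # 36, 36, 1, 1 (in data order)
variance = sum(squared_deviations) / n               # average squared deviation

print(data_mean)   # 11.0
print(variance)    # 18.5
```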

26
The variance (Cont.)
  • For a population, the variance is denoted by $\sigma^2$ and
    given by the formula
    $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$

27
The variance (Cont.)
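  • For a sample, the variance is denoted by $s^2$ and
    given by the formula
    $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
  • Note the subtle difference between the population and
    sample formulas: the sample variance divides by n - 1
    rather than by the number of observations.
    Your calculator can calculate both these numbers
    in a matter of seconds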
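
To make the difference between the two formulas concrete, the sketch below applies both to the data 5, 17, 12, 10 from the earlier slides, using Python's standard `statistics` module. `pvariance` divides by the number of observations and `variance` divides by one fewer.

```python
from statistics import pvariance, variance

data = [5, 17, 12, 10]

population_variance = pvariance(data)   # sum of squared deviations / 4
sample_variance = variance(data)        # sum of squared deviations / 3

print(population_variance)           # 18.5
print(round(sample_variance, 2))     # 24.67
```

The sample figure is always the larger of the two, and the gap narrows as the number of observations grows.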

28
The Standard Deviation
  • The standard deviation is simply the positive square
    root of the variance. Hence for a population the
    standard deviation is $\sigma$ and for a sample, s,
    i.e. $\sigma = \sqrt{\sigma^2}$ and $s = \sqrt{s^2}$
  • The standard deviation is in the same units as
    the mean.
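
A short Python sketch of the square-root relationship, again using the data 5, 17, 12, 10 from the variance slides. The results are compared with the library's own `pstdev` and `stdev` only as a sanity check.

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

data = [5, 17, 12, 10]

population_sd = sqrt(pvariance(data))   # square root of the population variance
sample_sd = sqrt(variance(data))        # square root of the sample variance

print(round(population_sd, 2))   # 4.3
print(round(sample_sd, 2))       # 4.97
```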

29
The coefficient of variation
  • The coefficient of variation (CV) is a relative
    measure of variability which has no units and is
    generally expressed in terms of a percentage
  • It is used for comparing data that are not
    measured using the same units, or when comparing
    data with quite different means
  • It is simply the standard deviation divided by
    the mean

30
The coefficient of variation (Cont.)
  • The CV can only be calculated on data collected
    at the ratio level
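
A minimal Python sketch of the CV as a percentage. The slides do not say whether the population or the sample standard deviation is meant; the sample version is assumed here, and the function name is made up for this illustration. The data are the values 5, 17, 12, 10 from the variance slides.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage.

    Assumption: the sample standard deviation is used.
    """
    centre = mean(values)
    if centre == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return stdev(values) / centre * 100


data = [5, 17, 12, 10]
rescaled = [x * 100 for x in data]   # same data in units 100 times smaller

print(round(coefficient_of_variation(data), 2))       # 45.15
print(round(coefficient_of_variation(rescaled), 2))   # 45.15
```

Rescaling the data leaves the CV unchanged, which is what makes it usable for comparing data measured in different units.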

31
The Quartiles
  • We can improve the description by also looking at
    the middle half of the data
  • Recall that the median is the middle value of the
    data set, i.e. the value that 50% of observations
    are greater than and 50% of observations are less
    than
  • The quartiles are calculated in a similar fashion

32
The Quartiles (Cont.)
  • The first quartile lies one quarter of the way
    through the data, i.e. one quarter of the data
    values are less than the first quartile
  • The third quartile lies three quarters of the way
    through the data, i.e. three quarters of the data
    values are less than the third quartile

33
The Quartiles (Cont.)
  • e.g. consider the following (ordered) data
  • 2 3 5 9 12 17 23 29 31 32 35
  • There are 11 values, so the median is the 6th
    value, in this case 17. The first quartile is
    the middle value of the observations below the
    median,
  • 2 3 5 9 12
  • so Q1 = 5

34
The Quartiles (Cont.)
  • The third quartile is the middle value of the
    observations above the median,
  • 23 29 31 32 35
  • so Q3 = 31
  • So, the data with Q1, M and Q3 marked are
  • 2 3 [5] 9 12 [17] 23 29 [31] 32 35
  • And the middle 50% of the data lie between Q1 and Q3,
    in this case between 5 and 31

35
The Quartiles (Cont.)
  • The difference between the 1st and 3rd quartiles
    is called the Inter Quartile Range
  • Inter Quartile Range = Q3 - Q1 = 31 - 5 = 26
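
The quartile method used in this example (take the median, then the median of each half) can be sketched in Python as below. The function name is made up for this illustration, and the sketch assumes at least two observations. Other quartile conventions exist and can give slightly different answers.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median-of-each-half method.

    With an odd number of observations the median itself is left out
    of both halves, as in the worked example.
    """
    ordered = sorted(values)
    count = len(ordered)
    half = count // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if count % 2 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)


data = [2, 3, 5, 9, 12, 17, 23, 29, 31, 32, 35]
q1, middle, q3 = quartiles_by_halves(data)
interquartile_range = q3 - q1

print(q1, middle, q3, interquartile_range)   # 5 17 31 26
```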
36
Approximate statistics for grouped data
  • When the data are given in a frequency
    distribution table, we cannot calculate the exact
    mean and standard deviation
  • We can however calculate the approximate values

37
Statistics for grouped data
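  • For the mean
    $\bar{x} \approx \frac{\sum f_i m_i}{n}$
    where $f_i$ is the frequency of class i, $m_i$ is its
    midpoint and $n = \sum f_i$
  • and for the variance
    $s^2 \approx \frac{\sum f_i m_i^2 - (\sum f_i m_i)^2 / n}{n - 1}$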

38
Statistics for grouped data: example
  • Consider the following frequency table
  • Class   Interval   Frequency
  • 1       2 - 5      3
  • 2       5 - 8      6
  • 3       8 - 11     8
  • 4       11 - 14    7
  • 5       14 - 17    4
  • 6       17 - 20    2

39
Grouped data example (Cont.)
  • Class    Interval   Frequency f_i   Midpoint m_i   f_i m_i   f_i m_i^2
  • 1        2 - 5      3               3.5            10.5      36.75
  • 2        5 - 8      6               6.5            39        253.5
  • 3        8 - 11     8               9.5            76        722
  • 4        11 - 14    7               12.5           87.5      1093.75
  • 5        14 - 17    4               15.5           62        961
  • 6        17 - 20    2               18.5           37        684.5
  • totals              30                             312       3751.5

40
Grouped data example (Cont.)
  • mean = 312 / 30 = 10.4
  • s^2 = (3751.5 - 312^2 / 30) / 29 = 17.47
  • s = square root of 17.47 = 4.18
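
The grouped-data calculation can be reproduced with a short Python sketch. It uses class midpoints and an n - 1 divisor, which is the combination that reproduces the slide's figures of 17.47 and 4.18. The variable names are chosen for this illustration.

```python
from math import sqrt

# (lower bound, upper bound, frequency) for each class in the table.
classes = [(2, 5, 3), (5, 8, 6), (8, 11, 8), (11, 14, 7), (14, 17, 4), (17, 20, 2)]

n = sum(freq for _, _, freq in classes)
sum_fm = sum(freq * (low + high) / 2 for low, high, freq in classes)
sum_fm2 = sum(freq * ((low + high) / 2) ** 2 for low, high, freq in classes)

grouped_mean = sum_fm / n                                  # approximate mean
grouped_variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)   # approximate sample variance
grouped_sd = sqrt(grouped_variance)

print(n, sum_fm, sum_fm2)            # 30 312.0 3751.5
print(round(grouped_mean, 2))        # 10.4
print(round(grouped_variance, 2))    # 17.47
print(round(grouped_sd, 2))          # 4.18
```

These values are approximations because every observation in a class is treated as if it sat at the class midpoint.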