Understanding Variability

Title: PowerPoint Presentation
Author: Ron Kenett
Last modified by: Maskit Rubinstein
Created Date: 9/15/2000 6:51:24 AM
Document presentation format: PowerPoint (PPT) presentation

Slides: 56

Provided by: RonK166

Transcript and Presenter's Notes

1
Understanding Variability
Instructor: Ron S. Kenett
Email: ron_at_kpa.co.il
Course website: www.kpa.co.il/biostat
Course textbook: MODERN INDUSTRIAL STATISTICS, Kenett and Zacks, Duxbury Press, 1998
2
Course Syllabus

Understanding Variability
Variability in Several Dimensions
Basic Models of Probability
Sampling for Estimation of Population Quantities
Parametric Statistical Inference
Computer Intensive Techniques
Multiple Linear Regression
Statistical Process Control
Design of Experiments

3
Discrete Data

A set of data is said to be discrete if the values/observations belonging to it are distinct and separate; that is, they can be counted (1, 2, 3, ...). For example:
the number of kittens in a litter
the number of patients in a doctor's surgery
the number of flaws in one metre of cloth
gender (male, female)
blood group (O, A, B, AB)
4
Continuous Data

A set of data is said to be continuous if the values/observations belonging to it may take on any value within a finite or infinite interval. You can count, order and measure continuous data. For example:
height
weight
temperature
the amount of sugar in an orange
the time required to run a mile
5
Types of Variables

Qualitative Variables
Attributes, categories
Examples: male/female, registered to vote/not, ethnicity, eye color, ...
Quantitative Variables
Discrete - usually take on integer values but can
take on fractions when variable allows - counts,
how many
Continuous - can take on any value at any point
along an interval - measurements, how much

6
Self Assessment Test
For each of the following, indicate whether
the appropriate variable would be qualitative or
quantitative. If the variable is quantitative,
indicate whether it would be discrete or
continuous.
7
Self Assessment Test

a) Whether you own an RCA Colortrak television set
Qualitative variable: two levels (yes/no), no measurement

b) Your status as a full-time or a part-time student
Qualitative variable: two levels (full/part), no measurement

c) Number of people who attended your school's graduation last year
Quantitative, discrete variable: a countable number, only whole numbers

8
Self Assessment Test

d) The price of your most recent haircut
Quantitative, discrete variable: a countable number of cents, so only whole numbers of the smallest currency unit

e) Sam's travel time from his dorm to the Student Union
Quantitative, continuous variable: time is measured, not counted, and can take on any value greater than zero

9
Self Assessment Test

f) The number of students on campus who belong to a social fraternity or sorority
Quantitative, discrete variable: a countable number, only whole numbers

10
Scales of Measurement

Nominal Scale - Labels represent various levels
of a categorical variable.
Ordinal Scale - Labels represent an order that
indicates either preference or ranking.
Interval Scale - Numerical labels indicate order
and distance between elements. There is no
absolute zero and multiples of measures are not
meaningful.
Ratio Scale - Numerical labels indicate order and
distance between elements. There is an absolute
zero and multiples of measures are meaningful.
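To see why multiples are meaningless on an interval scale, the sketch below compares the same two temperatures in Celsius and Fahrenheit (interval scales) and in kelvins (a ratio scale with an absolute zero). The temperatures 10 and 20 degrees Celsius are illustrative values chosen for this sketch, not data from the slides.

```python
def c_to_f(c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit (another interval scale)."""
    return c * 9 / 5 + 32


def c_to_k(c: float) -> float:
    """Convert degrees Celsius to kelvins (a ratio scale with an absolute zero)."""
    return c + 273.15


low, high = 10.0, 20.0  # illustrative temperatures in degrees Celsius

# Ratios depend on where the zero point sits, so they change with the scale
ratio_celsius = high / low                      # 20 / 10
ratio_fahrenheit = c_to_f(high) / c_to_f(low)   # 68 / 50
ratio_kelvin = c_to_k(high) / c_to_k(low)       # 293.15 / 283.15

# Differences keep their meaning when the units change (interval property)
diff_celsius = high - low                       # 10 Celsius degrees
diff_fahrenheit = c_to_f(high) - c_to_f(low)    # 18 Fahrenheit degrees

print(ratio_celsius, ratio_fahrenheit, round(ratio_kelvin, 3))
print(diff_celsius, diff_fahrenheit)
```

The Celsius ratio (2.0) and the Fahrenheit ratio (1.36) disagree, so "twice as hot" has no fixed meaning on these scales; differences, by contrast, convert consistently (10 Celsius degrees is always 18 Fahrenheit degrees).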

11
Self Assessment Test
Bill scored 1200 on the Scholastic Aptitude Test (SAT) and entered college as a physics major. As a freshman, he changed to business because he thought it was more interesting. Because he made the dean's list last semester, his parents gave him $30 to buy a new Casio calculator. Identify at least one piece of information in the story that is on each of the following scales of measurement:
12
Self Assessment Test

a) nominal scale of measurement.

1. Bill is going to college.
2. Bill will buy a Casio
calculator.
3. Bill was a physics major.
4. Bill is a business major.
5. Bill was on the dean's list.

13
Self Assessment Test

b) ordinal scale of measurement: Bill is a freshman.
c) interval scale of measurement: Bill earned a 1200 on the SAT.
d) ratio scale of measurement: Bill's parents gave him $30.

15
Histogram

A histogram is a way of summarising data that are measured on an interval scale (either discrete or continuous). It is often used in exploratory data analysis to illustrate the major features of the distribution of the data in a convenient form. It divides the range of possible values in a data set into classes or groups. For each group, a rectangle is constructed with a base length equal to the range of values in that specific group and an area proportional to the number of observations falling into that group. Because area, not height, carries the count, rectangles for classes of unequal width have non-uniform heights: each height is proportional to the group's frequency divided by its width.
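When classes have unequal widths, each height must be a frequency density (count divided by width) so that the areas stay proportional to the counts. The classes below are hypothetical: they reuse the hospital-cost counts from the example later in this deck but merge the two highest classes into one class 200 wide, purely to create an unequal width.

```python
from dataclasses import dataclass


@dataclass
class HistClass:
    lower: int   # lower class limit (included)
    upper: int   # upper class limit (excluded)
    count: int   # number of observations in the class

    @property
    def width(self) -> int:
        return self.upper - self.lower

    @property
    def height(self) -> float:
        # Frequency density: height * width equals the count,
        # so the rectangle's AREA represents the number of observations.
        return self.count / self.width


classes = [
    HistClass(450, 550, 4),
    HistClass(550, 650, 3),
    HistClass(650, 750, 9),
    HistClass(750, 850, 9),
    HistClass(850, 950, 11),
    HistClass(950, 1050, 7),
    HistClass(1050, 1250, 7),   # hypothetical merge of two classes: 6 + 1
]

for c in classes:
    print(f"{c.lower:>5}-{c.upper:<5} width={c.width:<4} "
          f"count={c.count:<3} height={c.height:.3f}")

# The total area equals the total number of observations
total_area = sum(c.height * c.width for c in classes)
print(round(total_area, 6))
```

Without the adjustment, the merged class would be drawn twice as tall as its count justifies relative to the 100-wide classes.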
16
Key Terms

Data array
An orderly presentation of data in either
ascending or descending numerical order.
Frequency Distribution
A table that represents the data in classes and
that shows the number of observations in each
class.

17
Key Terms

Frequency Distribution
Class - The category
Frequency - Number in each class
Class limits - Boundaries for each class
Class interval - Width of each class
Class mark - Midpoint of each class

18
Sturges' Rule

How to set the approximate number of classes to begin constructing a frequency distribution:
$k = 1 + 3.322 \log_{10} n$
where k = approximate number of classes to use
and
n = the number of observations in the data set.
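A minimal Python sketch of Sturges' rule in its usual form, k = 1 + 3.322 log10 n (3.322 is approximately 1/log10 2, so this is the same as 1 + log2 n). The function name is hypothetical; the result is only a starting point and is rounded to a whole number of classes.

```python
import math


def sturges_classes(n: int) -> float:
    """Approximate number of classes, k = 1 + 3.322 * log10(n)."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    return 1 + 3.322 * math.log10(n)


k = sturges_classes(50)   # n = 50, as in the hospital-cost example
print(f"k = {k:.2f}, so use about {round(k)} classes")
```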

19
Frequency Distributions
1. Number of classes: Choose an approximate number of classes for your data. Sturges' rule can help.
2. Estimate the class interval: Divide the range of your data by the approximate number of classes (from Step 1) to find the approximate class interval, where the range is defined as the largest data value minus the smallest data value.
3. Determine the class interval: Round the estimate (from Step 2) to a convenient value.
20
Frequency Distributions
4. Lower class limit: Determine the lower class limit for the first class by selecting a convenient number that is smaller than the lowest data value.
5. Class limits: Determine the other class limits by repeatedly adding the class interval (from Step 3) to the prior class limit, starting with the lower class limit (from Step 4).
6. Define the classes: Use the sequence of class limits to define the classes.
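Steps 2 to 6 can be sketched as one small function. The function name is hypothetical, and the default rounding in Step 3 (round up to a multiple of 10) is only an assumption standing in for "a convenient value"; the usage line passes the choices made in the hospital-cost example (8 classes, width 100, starting at 450).

```python
import math


def build_classes(smallest, largest, k, interval=None, start=None):
    """Return (lower, upper) class limits; each class is 'lower up to upper'."""
    # Step 2: estimate the class interval as range / k
    estimate = (largest - smallest) / k
    # Step 3: round to a convenient value (rounding UP to a multiple of 10
    # is an assumption of this sketch; pass `interval` to override it)
    if interval is None:
        interval = math.ceil(estimate / 10) * 10
    if interval <= 0:
        raise ValueError("class interval must be positive")
    # Step 4: a convenient first lower limit at or below the smallest value
    if start is None:
        start = math.floor(smallest / interval) * interval
    if start > smallest:
        raise ValueError("first lower limit must not exceed the smallest value")
    # Step 5: keep adding the interval until the largest value is covered
    limits = [start]
    while limits[-1] <= largest:
        limits.append(limits[-1] + interval)
    # Step 6: consecutive limits define the classes
    return list(zip(limits[:-1], limits[1:]))


classes = build_classes(482, 1221, k=8, interval=100, start=450)
for lower, upper in classes:
    print(f"{lower:,} up to {upper:,}")
```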
21
Relative Frequency Distributions
1. Retain the same classes defined in the frequency distribution.
2. Sum the total number of observations across all classes of the frequency distribution.
3. Divide the frequency for each class by the total number of observations, giving the proportion of data values in each class (multiply by 100 for a percentage).
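A short sketch of Steps 2 and 3, using the class frequencies from the hospital-cost example in this deck (4, 3, 9, 9, 11, 7, 6, 1). The function name is hypothetical.

```python
def relative_frequencies(counts):
    """Divide each class frequency by the total number of observations."""
    total = sum(counts)                  # Step 2: total across all classes
    return [c / total for c in counts]   # Step 3: proportion in each class


counts = [4, 3, 9, 9, 11, 7, 6, 1]       # lowest class first
rel = relative_frequencies(counts)
print([round(r, 2) for r in rel])        # multiply by 100 for percentages
```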
22
Cumulative Relative Frequency Distributions
1. List the number of observations in the lowest class.
2. Add the frequency of the lowest class to the frequency of the second class. Record that cumulative sum for the second class.
3. Continue to add the prior cumulative sum to the frequency of the next class, so that the cumulative sum for the final class is the total number of observations in the data set.
23
Cumulative Relative Frequency Distributions

4. Divide the accumulated frequencies for each class by the total number of observations, giving the proportion (multiplied by 100, the percent) of all observations that occurred up to and including that class.
An alternative: accumulate the relative frequencies for each class instead of the raw frequencies. Then you don't have to divide by the total to get percentages.
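The same steps can be sketched with `itertools.accumulate` from the Python standard library, again using the hospital-cost class frequencies. The sketch computes cumulative frequencies (Steps 1-3), divides by the total (Step 4), and checks that the alternative of accumulating relative frequencies agrees up to floating-point rounding.

```python
from itertools import accumulate

counts = [4, 3, 9, 9, 11, 7, 6, 1]   # class frequencies, lowest class first
total = sum(counts)

cum_freq = list(accumulate(counts))          # Steps 1-3: running totals
cum_rel = [c / total for c in cum_freq]      # Step 4: divide by the total

# The alternative: accumulate the relative frequencies directly
cum_rel_alt = list(accumulate(c / total for c in counts))

print(cum_freq)                              # the last entry equals the total
print([round(p, 2) for p in cum_rel])
```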

24
Example

The average daily cost to community hospitals for patient stays during 1993 for each of the 50 U.S. states is given in the next table.
a) Arrange these into a data array.
b) Construct a stem-and-leaf display.
c) Approximately how many classes would be appropriate for these data?
d) Construct a frequency distribution. State the interval width and class mark.
e) Construct a histogram, a relative frequency distribution, and a cumulative relative frequency distribution.

25
Example Data List
AL   775   HI   823   MA 1,036   NM 1,046   SD   506
AK 1,136   ID   659   MI   902   NY   784   TN   859
AZ 1,091   IL   917   MN   652   NC   763   TX 1,010
AR   678   IN   898   MS   555   ND   507   UT 1,081
CA 1,221   IA   612   MO   863   OH   940   VT   676
CO   961   KS   666   MT   482   OK   797   VA   830
CT 1,058   KY   703   NE   626   OR 1,052   WA 1,143
DE 1,024   LA   875   NV   900   PA   861   WV   701
FL   960   ME   738   NH   976   RI   885   WI   744
GA   775   MD   889   NJ   829   SC   838   WY   537
26
Example Data Array
(Read down each column, from largest to smallest.)
CA 1,221   TX 1,010   RI   885   NY   784   KS   666
WA 1,143   NH   976   LA   875   AL   775   ID   659
AK 1,136   CO   961   MO   863   GA   775   MN   652
AZ 1,091   FL   960   PA   861   NC   763   NE   626
UT 1,081   OH   940   TN   859   WI   744   IA   612
CT 1,058   IL   917   SC   838   ME   738   MS   555
OR 1,052   MI   902   VA   830   KY   703   WY   537
NM 1,046   NV   900   NJ   829   WV   701   ND   507
MA 1,036   IN   898   HI   823   AR   678   SD   506
DE 1,024   MD   889   OK   797   VT   676   MT   482
27
Example Stem and Leaf Display
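Stem-and-Leaf Display, N = 50
Stems are hundreds and leaves are the last two digits (12 | 21 represents 1,221). (11) marks the stem that contains the median.
Count  Stem | Leaves
  1     12  | 21
  2     11  | 43, 36
  8     10  | 91, 81, 58, 52, 46, 36, 24, 10
  7      9  | 76, 61, 60, 40, 17, 02, 00
(11)     8  | 98, 89, 85, 75, 63, 61, 59, 38, 30, 29, 23
  9      7  | 97, 84, 75, 75, 63, 44, 38, 03, 01
  7      6  | 78, 76, 66, 59, 52, 26, 12
  4      5  | 55, 37, 07, 06
  1      4  | 82
Range: 482 to 1,221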
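A minimal sketch of how such a display can be built: each cost is split with `divmod` into a stem (hundreds) and a two-digit leaf. The costs are the 50 values from the data list above, in the same order; the function name is hypothetical, and the sketch prints per-stem counts without marking the median stem.

```python
from collections import defaultdict

costs = [775, 823, 1036, 1046, 506, 1136, 659, 902, 784, 859,
         1091, 917, 652, 763, 1010, 678, 898, 555, 507, 1081,
         1221, 612, 863, 940, 676, 961, 666, 482, 797, 830,
         1058, 703, 626, 1052, 1143, 1024, 875, 900, 861, 701,
         960, 738, 976, 885, 744, 775, 889, 829, 838, 537]


def stem_and_leaf(values, stem_unit=100):
    """Group values by stem (value // stem_unit); the leaf is the remainder."""
    display = defaultdict(list)
    for v in sorted(values, reverse=True):   # largest first, as on the slide
        stem, leaf = divmod(v, stem_unit)
        display[stem].append(leaf)
    return display


display = stem_and_leaf(costs)
for stem in range(max(display), min(display) - 1, -1):
    leaves = display.get(stem, [])
    print(f"{len(leaves):>3}  {stem:>2} | " + ", ".join(f"{leaf:02d}" for leaf in leaves))
```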
28
Example Frequency Distribution

To approximate the number of classes to use in creating the frequency distribution, apply Sturges' Rule with n = 50:
$k = 1 + 3.322 \log_{10} 50 \approx 1 + 3.322(1.699) \approx 6.64$
Sturges' rule suggests we use approximately 7 classes.

29
Example Frequency Distribution

Step 1. Number of classes
Sturges' Rule: approximately 7 classes.
The range is 1,221 - 482 = 739.
739/7 ≈ 106 and 739/8 ≈ 92
Steps 2 and 3. The class interval
So, if we use 8 classes, we can round the interval up to a convenient 100; 8 classes of width 100 (800 in total) cover the range of 739.

31
Example Frequency Distribution

Step 4. The Lower Class Limit
If we start at 450, we can cover the range in 8 classes, each 100 in width.
The first class: 450 up to 550
Steps 5 and 6. Setting class limits
450 up to 550
550 up to 650
650 up to 750
750 up to 850
850 up to 950
950 up to 1,050
1,050 up to 1,150
1,150 up to 1,250

32
Example Frequency Distribution
Average daily cost   Number   Class mark
450 under 550           4        500
550 under 650           3        600
650 under 750           9        700
750 under 850           9        800
850 under 950          11        900
950 under 1,050         7      1,000
1,050 under 1,150       6      1,100
1,150 under 1,250       1      1,200
Interval width: 100
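A sketch of how the counts in the table can be tallied from the raw data with the standard-library `bisect` module. The 50 costs are those in the data list above; each class includes its lower limit and excludes its upper limit ("450 under 550"), which is the behaviour `bisect_right` gives.

```python
import bisect

costs = [775, 823, 1036, 1046, 506, 1136, 659, 902, 784, 859,
         1091, 917, 652, 763, 1010, 678, 898, 555, 507, 1081,
         1221, 612, 863, 940, 676, 961, 666, 482, 797, 830,
         1058, 703, 626, 1052, 1143, 1024, 875, 900, 861, 701,
         960, 738, 976, 885, 744, 775, 889, 829, 838, 537]

limits = list(range(450, 1251, 100))      # 450, 550, ..., 1,250
counts = [0] * (len(limits) - 1)

for x in costs:
    # bisect_right returns the index just past the last limit <= x,
    # so subtracting 1 selects the class "lower under upper" holding x
    i = bisect.bisect_right(limits, x) - 1
    counts[i] += 1

for lower, upper, n in zip(limits, limits[1:], counts):
    mark = (lower + upper) / 2            # class mark = midpoint of the class
    print(f"{lower:>5,} under {upper:>5,}  {n:>3}  {mark:>7,.0f}")
```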
33
Example Histogram
34
Example Relative Frequency Distribution
Average daily cost   Number   Rel. Freq.
450 under 550           4     4/50 = .08
550 under 650           3     3/50 = .06
650 under 750           9     9/50 = .18
750 under 850           9     9/50 = .18
850 under 950          11    11/50 = .22
950 under 1,050         7     7/50 = .14
1,050 under 1,150       6     6/50 = .12
1,150 under 1,250       1     1/50 = .02
35
Example Polygon
36
Example Cumulative Frequency Distribution
Average daily cost   Number   Cum. Freq.
450 under 550           4          4
550 under 650           3          7
650 under 750           9         16
750 under 850           9         25
850 under 950          11         36
950 under 1,050         7         43
1,050 under 1,150       6         49
1,150 under 1,250       1         50
37
Example Cumulative Relative Frequency
Distribution
Average daily cost   Cum. Freq.   Cum. Rel. Freq.
450 under 550            4        4/50 = .08
550 under 650            7        7/50 = .14
650 under 750           16       16/50 = .32
750 under 850           25       25/50 = .50
850 under 950           36       36/50 = .72
950 under 1,050         43       43/50 = .86
1,050 under 1,150       49       49/50 = .98
1,150 under 1,250       50       50/50 = 1.00
38
Example Percentage Ogive
39
Statistical Description of Data
40
Key Terms

Measures of Central Tendency,
The Center

Mean
$\mu$ (population), $\bar{x}$ (sample)
Weighted Mean
Median
Mode

41
Key Terms

Measures of Dispersion,
The Spread

Range
Mean absolute deviation
Variance
Standard deviation
Interquartile range
Interquartile deviation
Coefficient of variation

42
Key Terms

Measures of Relative Position

Quantiles
Quartiles
Deciles
Percentiles
Residuals
Standardized values

43
The Mean

Mean
Arithmetic average = (sum of all values)/(number of values)
Population: $\mu = \frac{\sum x_i}{N}$
Sample: $\bar{x} = \frac{\sum x_i}{n}$
Problem: Calculate the average number of truck shipments from the United States to five Canadian cities for the following data, given in thousands of bags:
Montreal 64.0, Ottawa 15.0, Toronto 285.0, Vancouver 228.0, Winnipeg 45.0
(Answer: 127.4 thousand bags)
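The arithmetic can be checked with a few lines of Python. Treating the five cities as the complete set of interest, N = 5, so the population and sample formulas give the same number here.

```python
# Truck shipments from the United States, in thousands of bags (from the problem)
shipments = {
    "Montreal": 64.0,
    "Ottawa": 15.0,
    "Toronto": 285.0,
    "Vancouver": 228.0,
    "Winnipeg": 45.0,
}

values = list(shipments.values())
mean = sum(values) / len(values)   # (sum of all values) / (number of values)
print(f"Mean shipment: {mean:.1f} thousand bags")
```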

44
The Weighted Mean

When the values carry different weights (for example, grouped data, where each class mark is weighted by its class frequency), compute the mean using $\mu = \frac{\sum w_i x_i}{\sum w_i}$
Problem: Calculate the average profit from truck shipments, United States to Canada, for the following data, given in thousands of bags and profit per thousand bags:
City        Thousands of bags   Profit per thousand bags
Montreal          64.0                15.00
Ottawa            15.0                13.50
Toronto          285.0                15.50
Vancouver        228.0                12.00
Winnipeg          45.0                14.00
(Answer: 14.04 per thousand bags)
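A minimal sketch of the weighted mean, with the shipment volumes as the weights and the profits per thousand bags as the values. The function name is hypothetical; the data are those in the problem.

```python
bags = [64.0, 15.0, 285.0, 228.0, 45.0]         # weights w_i (thousands of bags)
profit = [15.00, 13.50, 15.50, 12.00, 14.00]    # values x_i (profit per thousand bags)


def weighted_mean(x, w):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


print(round(weighted_mean(profit, bags), 2))
```

For comparison, the unweighted mean of the five profit figures is 14.00; weighting by shipment volume gives 14.04.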

45
The Median

To find the median
1. Put the data in an array (ordered from smallest to largest).
2A. If the data set has an ODD number of values, the median is the middle value.
2B. If the data set has an EVEN number of values, the median is the AVERAGE of the middle two values.
(Note that the median of an even set of data values is not necessarily a member of the set of values.)
The median is particularly useful if there are outliers in the data set: outliers pull the arithmetic mean toward them but have little effect on the median.
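A sketch of Steps 1, 2A and 2B, cross-checked against the standard library's `statistics.median`. The four-value list is a hypothetical variant of the shipment data with Toronto removed, used only to show the even case.

```python
import statistics


def median(values):
    """Median via the slide's steps: sort, then take the middle value(s)."""
    data = sorted(values)                    # Step 1: put the data in an array
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                           # Step 2A: odd count, middle value
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2   # Step 2B: even count, average of middle two


shipments = [64.0, 15.0, 285.0, 228.0, 45.0]
print(median(shipments))                     # odd count
print(median([15.0, 45.0, 64.0, 228.0]))     # even count: not a member of the set
print(statistics.median(shipments))          # standard-library cross-check
```

For the shipment data, the median (64.0) is about half the mean (127.4): the two large values, Toronto and Vancouver, pull the mean upward but barely affect the median.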

46
The Mode

The mode is the most frequent value.
While there is just one value for the mean and
one value for the median, there may be more than
one value for the mode of a data set.
The mode tends to be less frequently used than
the mean or the median.
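Python's standard library returns every mode with `statistics.multimode` (Python 3.8 and later), in the order the values first appear. The small data set below is illustrative, not from the slides; it has two modes.

```python
from statistics import multimode

# Illustrative data: 3 and 1 each occur twice, every other value once
values = [3, 1, 3, 2, 1, 4]
modes = multimode(values)   # all values tied for the highest frequency
print(modes)
```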

47
Comparing Measures of Central Tendency

As a rule of thumb:
If mean = median = mode, the shape of the distribution is symmetric.
If mode < median < mean (equivalently, mean > median > mode), the distribution trails to the right; it is positively skewed.
If mean < median < mode (equivalently, mode > median > mean), the distribution trails to the left; it is negatively skewed.

48
The Range

The range is the distance between the smallest
and the largest data value in the set.
Range = largest value - smallest value
Sometimes range is reported as an interval,
anchored between the smallest and largest data
value, rather than the actual width of that
interval.

49
Residuals

Residuals are the differences between each data value in the set and the group mean:
for a population, $x_i - \mu$
for a sample, $x_i - \bar{x}$

50
The MAD

The mean absolute deviation (MAD) is found by summing the absolute values of all residuals and dividing by the number of values in the set:
for a population, $\text{MAD} = \frac{\sum |x_i - \mu|}{N}$
for a sample, $\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n}$
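A sketch of residuals and the mean absolute deviation for the truck-shipment data from the mean example (mean 127.4 thousand bags). The function names are hypothetical. MAD here is the mean absolute deviation defined above, not the median absolute deviation that some software also abbreviates as MAD.

```python
def residuals(values, center):
    """Differences between each value and a center (here, the mean)."""
    return [x - center for x in values]


def mean_absolute_deviation(values):
    """Sum of absolute residuals about the mean, divided by the number of values."""
    m = sum(values) / len(values)
    return sum(abs(r) for r in residuals(values, m)) / len(values)


shipments = [64.0, 15.0, 285.0, 228.0, 45.0]    # thousands of bags
res = residuals(shipments, sum(shipments) / len(shipments))
print([round(r, 1) for r in res])               # residuals about the mean (they sum to 0)
print(round(mean_absolute_deviation(shipments), 2))
```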

51
The Variance

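Variance is one of the most frequently used measures of spread:
for a population, $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2 - N\mu^2}{N}$
for a sample, $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$
The right side of each equation is often used as a computational shortcut.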
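A sketch comparing the definitional and shortcut forms of the variance on the truck-shipment data, with the standard library's `statistics.variance` and `statistics.pvariance` as a cross-check. The function names are hypothetical.

```python
import statistics


def variance(values, sample=True):
    """Definitional form: sum of squared residuals / (n - 1), or / N."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)
    return ss / (n - 1) if sample else ss / n


def variance_shortcut(values, sample=True):
    """Shortcut form: (sum of x^2 - n * mean^2) / (n - 1), or / N."""
    n = len(values)
    mean = sum(values) / n
    ss = sum(x * x for x in values) - n * mean ** 2
    return ss / (n - 1) if sample else ss / n


shipments = [64.0, 15.0, 285.0, 228.0, 45.0]   # thousands of bags
print(variance(shipments), variance_shortcut(shipments))          # sample (n - 1)
print(variance(shipments, sample=False), statistics.pvariance(shipments))
```

Both forms agree here. In floating-point code the shortcut can lose precision when the values are large compared with their spread (it subtracts two nearly equal large quantities), so the definitional form, or the `statistics` module, is usually the safer choice in software.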

52
The Standard Deviation

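Since variance is given in squared units, we often find uses for the standard deviation, which is the square root of the variance:
for a population, $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
for a sample, $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$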
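Taking square roots of the two variances, sketched on the same shipment data; `statistics.stdev` and `statistics.pstdev` compute the sample and population standard deviations directly. The result is in thousands of bags, the same unit as the data.

```python
import math
import statistics

shipments = [64.0, 15.0, 285.0, 228.0, 45.0]        # thousands of bags

s = math.sqrt(statistics.variance(shipments))       # sample: square root of s^2
sigma = math.sqrt(statistics.pvariance(shipments))  # population: square root of sigma^2

# The standard library also provides these directly
print(round(s, 2), round(statistics.stdev(shipments), 2))
print(round(sigma, 2), round(statistics.pstdev(shipments), 2))
```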

53
Quartiles

One of the most frequently used quantiles is the quartile.
Quartiles divide the values of a data set into four subsets of equal size, each comprising 25% of the observations.
To find the first, second, and third quartiles:
1. Arrange the N data values into an array (ascending order).
2. First quartile, Q1 = data value at position (N + 1)/4
3. Second quartile, Q2 = data value at position 2(N + 1)/4
4. Third quartile, Q3 = data value at position 3(N + 1)/4
If a position is not a whole number, interpolate between the two neighboring data values (for N = 5, position (5 + 1)/4 = 1.5 lies halfway between the 1st and 2nd values).
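A sketch of the (N + 1) position rule, with linear interpolation for fractional positions, applied to the five shipment values. The function name is hypothetical, and clamping positions outside 1..N to the end values is an assumption of this sketch. For this data it agrees with the standard library's `statistics.quantiles(data, n=4)`, whose default "exclusive" method is based on the same (N + 1) positions.

```python
import math
import statistics


def quartile(values, k):
    """Return Q_k (k = 1, 2, 3) from the value at position k(N + 1)/4 of the sorted data."""
    data = sorted(values)            # Step 1: ascending array
    n = len(data)
    pos = k * (n + 1) / 4            # 1-based position
    if pos <= 1:                     # clamp below the first value (assumption)
        return data[0]
    if pos >= n:                     # clamp beyond the last value (assumption)
        return data[-1]
    lower = math.floor(pos)
    frac = pos - lower
    # Linear interpolation between the values at positions `lower` and `lower + 1`
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


shipments = [64.0, 15.0, 285.0, 228.0, 45.0]
qs = [quartile(shipments, k) for k in (1, 2, 3)]
print(qs)
print(statistics.quantiles(shipments, n=4))
```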

54
Quartiles
55
Standardized Values

A standardized value tells how far above or below the mean an individual value lies, in units of standard deviation.
How far above or below: (data value - mean), which is the residual.
In units of standard deviation: divide the residual by s (or by σ for a population).
Standardized individual value: $z = \frac{x_i - \bar{x}}{s}$ (for a population, $z = \frac{x_i - \mu}{\sigma}$)
A negative z means the data value falls below the mean.
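A sketch of standardization using the sample mean and standard deviation (pass `sample=False` to use σ instead). The function name is hypothetical; the data are the truck shipments used earlier. Standardized values always have mean 0 and standard deviation 1, which the check relies on.

```python
import statistics


def z_scores(values, sample=True):
    """Standardized values: (value - mean) / standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return [(x - mean) / sd for x in values]


shipments = {"Montreal": 64.0, "Ottawa": 15.0, "Toronto": 285.0,
             "Vancouver": 228.0, "Winnipeg": 45.0}
zs = z_scores(list(shipments.values()))
for city, z in zip(shipments, zs):
    side = "below" if z < 0 else "above"
    print(f"{city:<10} z = {z:+.2f} ({side} the mean)")
```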
