Title: AAEC 4302 ADVANCED STATISTICAL METHODS IN AGRICULTURAL RESEARCH
- Part I: A Quick Review of Familiar Topics
- Chapter 4: Frequency Distributions
Introduction
- Chapter 4 examines grouping and classification techniques for organizing and summarizing data on a given variable.
- There are a few, but important, differences between the techniques applied to discrete and continuous variables.
Discrete Data Variables
- Suppose that n observations are available on a discrete variable X that takes only m different values.
- The n observations can be divided into m different groups, within each of which all observations have exactly the same value.
- These m groups can be indexed by k, so that $k = 1, \ldots, m$.
- Example: the family size variable in Table 2.2 (p. 18).
- X ranges from 2 to 10, so $m = 9$ ($k = 1, \ldots, 9$), and the 100 observations can be arranged into nine groups, one for each family size value observed in the sample.
- Table 4.1 (p. 63) is an example of a frequency table for the variable family size.
- $X_k$ (column 2) denotes the value of X (family size) shared by all observations grouped into category k (for example, $X_3 = 4$ indicates that the third category groups all observations with a family size of 4).
- $n_k$ (column 3) denotes the absolute frequency, which is the number of observations that fall in that category (for example, $n_3 = 34$ denotes the number of families in the sample with 4 family members).
- Notice that $\sum_{k=1}^{m} n_k = n$ (the sample size).
- $f_k = n_k/n$ (column 4) is the relative frequency, the proportion of all observations that fall in that category (for example, $f_3 = 0.34$ denotes the proportion of families in the sample with 4 family members).
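A frequency table like Table 4.1 can be built from raw observations by counting each distinct value. The sketch below is a minimal illustration, assuming a hypothetical helper `frequency_table` and a small hypothetical list of family sizes (not the Table 2.2 data); it returns one row $(k, X_k, n_k, f_k)$ per distinct value in increasing order.

```python
from collections import Counter

def frequency_table(observations):
    """Return rows (k, X_k, n_k, f_k) for a discrete variable, sorted by value."""
    n = len(observations)
    counts = Counter(observations)              # n_k for each distinct value X_k
    rows = []
    for k, value in enumerate(sorted(counts), start=1):
        n_k = counts[value]
        rows.append((k, value, n_k, n_k / n))   # f_k = n_k / n
    return rows

# Hypothetical family sizes (NOT the Table 2.2 data)
family_size = [2, 3, 3, 4, 4, 4, 4, 5, 5, 6]
table = frequency_table(family_size)
for k, x_k, n_k, f_k in table:
    print(k, x_k, n_k, f_k)
```

The absolute frequencies add up to n and the relative frequencies add up to 1, as the slides state.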
- Notice that $\sum_{k=1}^{m} f_k = 1$.
- Can you prove it?
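One short proof uses only the definition $f_k = n_k/n$ and the fact that the group counts add up to the sample size:

```latex
\sum_{k=1}^{m} f_k = \sum_{k=1}^{m} \frac{n_k}{n} = \frac{1}{n}\sum_{k=1}^{m} n_k = \frac{n}{n} = 1
```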
- Figure 4.1 (p. 64) is a conventional graph of the relative frequency distribution of the discrete variable family size (X): it plots the value of $f_k$ for each value of $X_k$.
- Family size ($X_k$) is on the horizontal axis, and the corresponding relative frequency ($f_k$) is on the vertical axis.
- This is a summary graphical representation of the sample data in which the only information lost is the sample size n.
- How do you read the graph?
- Since the relative frequency distribution implicitly orders the values of X, the median ($X_{med}$) is easily found by examining the cumulative frequencies.
- For example, in Table 4.1, 10% of the observations are in group 1, where $X_1 = 2$; 16% are in group 2, where $X_2 = 3$; and 34% are in group 3, where $X_3 = 4$. The cumulative frequencies are therefore 10%, 26%, and 60%, so the middle observation (50th) is in group 3, and its value (family size) is 4.
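The cumulative-frequency search can be sketched in a few lines. The hypothetical helper `discrete_median` below walks the groups in order until the running count reaches the middle position(s); with an even n it averages the two middle observations (here the 50th and 51st, both in group 3). Only the first three groups of Table 4.1 are passed, since the later groups cannot change the answer once the cumulative count has passed the middle.

```python
def discrete_median(values, counts, n):
    """Median of a discrete variable from its frequency table.

    values: X_k in increasing order; counts: n_k; n: total sample size.
    Only the groups up to the middle observation(s) are required.
    """
    # 1-based positions of the middle observation(s)
    if n % 2 == 1:
        positions = [(n + 1) // 2]
    else:
        positions = [n // 2, n // 2 + 1]
    middle_values = []
    for pos in positions:
        cumulative = 0
        for x_k, n_k in zip(values, counts):
            cumulative += n_k
            if cumulative >= pos:          # this group contains position `pos`
                middle_values.append(x_k)
                break
        else:
            raise ValueError("table stops before the middle observation")
    return sum(middle_values) / len(middle_values)

# First three groups of Table 4.1 (n = 100): 10, 16 and 34 observations
median = discrete_median([2, 3, 4], [10, 16, 34], n=100)
print(median)  # 4.0
```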
- Clearly, the mode (the most frequently occurring family size) is also 4 in this case.
- The mean can also be calculated from a frequency
table instead of the raw data
- The standard deviation ($S_X$) can also be calculated from a frequency table.
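Because all $n_k$ observations in group k share the value $X_k$, the raw-data sums collapse into weighted sums over the m groups. Assuming the usual sample standard deviation with divisor n - 1 (the text's own divisor should be checked against Table 4.1), the formulas are:

```latex
\bar{X} = \frac{1}{n}\sum_{k=1}^{m} n_k X_k = \sum_{k=1}^{m} f_k X_k
\qquad
S_X = \sqrt{\frac{1}{n-1}\sum_{k=1}^{m} n_k \left(X_k - \bar{X}\right)^2}
```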
- Columns 5 and 6 show the steps for calculating $\bar{X}$ and $S_X$ from a frequency table.
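The sketch below computes the mean, standard deviation, and mode directly from a frequency table and cross-checks the first two against Python's `statistics` module on the raw observations that the table summarizes. The helper `grouped_mean_sd` and the table values are hypothetical (not Table 4.1), the n - 1 divisor is an assumption, and the exact column layout of Table 4.1 is not reproduced here.

```python
import math
import statistics

def grouped_mean_sd(values, counts):
    """Mean and sample standard deviation from a discrete frequency table.

    Assumes the n - 1 divisor for the standard deviation.
    """
    n = sum(counts)
    rel = [n_k / n for n_k in counts]                                 # f_k
    mean = sum(f_k * x_k for f_k, x_k in zip(rel, values))            # sum of f_k * X_k
    ss = sum(n_k * (x_k - mean) ** 2 for n_k, x_k in zip(counts, values))
    return mean, math.sqrt(ss / (n - 1))

# Hypothetical frequency table (NOT Table 4.1)
values = [2, 3, 4, 5, 6]
counts = [1, 2, 4, 2, 1]
mean, sd = grouped_mean_sd(values, counts)

# Rebuild the raw observations and compare with the raw-data formulas
raw = [x for x, c in zip(values, counts) for _ in range(c)]
print(mean, statistics.mean(raw))
print(sd, statistics.stdev(raw))

# The mode is the X_k with the largest n_k
mode = values[counts.index(max(counts))]
print(mode)  # 4
```

For a discrete variable the grouped and raw results agree exactly (up to floating-point rounding), because grouping loses no values.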
Continuous Data Variables
- The approach above does not make sense for continuous variables, since there would likely be one group for each observation.
- Instead, m class intervals are created so that each observation can be placed into exactly one of them.
- The three principles to follow when creating these intervals are:
  - The number of classes (m) should be between 5 and 15.
  - The range (width) of each interval should be the same.
  - The midpoint of each interval should be a convenient number.
- Example: let X be family income from Table 2.2.
- X ranges from 0.75 to 32.08 thousand dollars.
- Let's set up m = 9 intervals starting from 0, each with a width of 4.0 (4,000 dollars).
- Observations that lie exactly on the boundary between two classes should be divided between the lower and higher classes.
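One simple way to turn the three principles into a rule is sketched below: divide the distance from the starting point to the largest value by m and round the width up to a whole multiple of `step`, so that the m classes cover the data and the class marks stay round. The helper `class_intervals` and its rounding rule are hypothetical, not the textbook's procedure, but for the income example they reproduce the width of 4.0 with class marks 2, 6, 10, and so on.

```python
import math

def class_intervals(start, x_max, m, step=1.0):
    """Equal-width class intervals beginning at `start`.

    The width is (x_max - start) / m rounded UP to a multiple of `step`,
    so the m classes cover the data and class marks are round numbers.
    """
    width = math.ceil((x_max - start) / m / step) * step
    bounds = [(start + k * width, start + (k + 1) * width) for k in range(m)]
    marks = [(lo + hi) / 2 for lo, hi in bounds]      # class marks (midpoints)
    return width, bounds, marks

# Family income example: values run from 0.75 to 32.08 thousand dollars
width, bounds, marks = class_intervals(start=0.0, x_max=32.08, m=9)
print(width)            # 4.0
print(bounds[-1])       # (32.0, 36.0)
print(marks[:3])        # [2.0, 6.0, 10.0]
```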
- Table 4.2 is the frequency table for family income constructed in this way.
- Column 1 gives the class interval index k and the boundary values of X that define the class.
- The second column contains the class mark ($X_k$), defined as the midpoint of the class interval.
- The third column contains the absolute frequency ($n_k$), which, as before, is the number of observations in the sample whose value falls in the kth class (i.e., within the boundaries of the kth class interval).
- The fourth column contains the relative frequency ($f_k$), which, as before, is the proportion of observations in the sample whose value falls in the kth class.
- Also as before, $\sum_{k=1}^{m} n_k = n$, $f_k = n_k/n$, and $\sum_{k=1}^{m} f_k = 1$.
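The steps behind a table like Table 4.2 (assign each value to a class, then record class marks, absolute and relative frequencies) can be sketched as follows. The helper `continuous_frequency_table` and the income values are hypothetical (not Table 2.2). The boundary rule is read here as sending boundary observations alternately to the lower and the higher class; that reading is an assumption.

```python
def continuous_frequency_table(data, start, width, m):
    """Rows (k, lower, upper, X_k, n_k, f_k) for m equal-width class intervals.

    Observations exactly on an interior boundary are sent alternately to the
    lower and the higher class (one possible reading of the boundary rule).
    """
    counts = [0] * m
    send_low = True                         # next boundary observation goes to the lower class
    for x in data:
        q = (x - start) / width             # position measured in class widths
        k = int(q)                          # 0-based class index
        if q == k and 0 < k < m:            # exactly on an interior boundary
            if send_low:
                k -= 1
            send_low = not send_low
        elif q == m:                        # top edge belongs to the last class
            k = m - 1
        if q < 0 or not 0 <= k < m:
            raise ValueError(f"{x} lies outside the class intervals")
        counts[k] += 1
    n = len(data)
    rows = []
    for k in range(m):
        lower = start + k * width
        upper = lower + width
        mark = (lower + upper) / 2          # class mark: midpoint of the interval
        rows.append((k + 1, lower, upper, mark, counts[k], counts[k] / n))
    return rows

# Hypothetical incomes in thousand dollars; 4.0 and 8.0 lie on class boundaries
incomes = [1.2, 3.5, 4.0, 5.1, 6.8, 8.0, 9.4, 10.7, 13.3, 17.9]
rows = continuous_frequency_table(incomes, start=0.0, width=4.0, m=9)
for row in rows:
    print(row)
```

Here 4.0 is counted in the first class and 8.0 in the third, so the two boundary observations are split between lower and higher classes.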
- Figure 4.2 plots the class marks ($X_k$) on the horizontal axis against the corresponding relative frequencies ($f_k$) on the vertical axis; it represents the relative frequency distribution of family income (X).
- This graph is known as a histogram or bar chart: each box represents one class, and the height of the box gives the relative frequency ($f_k$) of that class.
- Figure 4.3 presents the most common shapes taken by histograms or bar charts:
  - Unimodal: there is only one peak.
  - Bimodal: there are two peaks.
  - Unimodal skewed to the right: the distribution has a longer tail to the right (the side of the longer tail gives the direction of skewness).
- The mean and standard deviation of a continuous variable calculated from a frequency table (using the formulas given for discrete variables) are only approximations, because every observation is treated as if it were located at its class mark.
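To see why grouped results are only approximations, the sketch below uses a small set of hypothetical incomes (not Table 2.2), none of which lies on a class boundary, and compares the mean and standard deviation computed from class marks with those computed from the raw values. The n - 1 divisor for the standard deviation is an assumption.

```python
import statistics

# Hypothetical incomes in thousand dollars, none on a class boundary
incomes = [1.2, 3.5, 5.1, 6.8, 9.4, 10.7, 13.3, 17.9]
start, width, m = 0.0, 4.0, 5

# Group into equal-width classes and take the midpoints as class marks
counts = [0] * m
for x in incomes:
    counts[int((x - start) // width)] += 1
marks = [start + (k + 0.5) * width for k in range(m)]

n = len(incomes)
grouped_mean = sum(n_k * x_k for n_k, x_k in zip(counts, marks)) / n
grouped_sd = (sum(n_k * (x_k - grouped_mean) ** 2
                  for n_k, x_k in zip(counts, marks)) / (n - 1)) ** 0.5

print(grouped_mean, statistics.mean(incomes))   # 8.5 vs about 8.4875
print(grouped_sd, statistics.stdev(incomes))
```

The grouped values are close to, but not equal to, the raw-data values; the gap depends on how the observations are spread within each class.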
- There is a correspondence between relative frequencies and areas under the histogram.
- With a common class width w, the ratio of the area of the kth bar to the total area of the histogram is (using $\sum_{j=1}^{m} f_j = 1$):

$$\frac{w f_k}{\sum_{j=1}^{m} w f_j} = \frac{w f_k}{w \sum_{j=1}^{m} f_j} = \frac{w f_k}{w} = f_k$$
Proportions
- Question: what proportion of the observations have X values between $X_a$ and $X_b$? That is, $\text{Prop}(X_a \le X \le X_b) = ?$
- Example: the proportion of observations that lie in the one-standard-deviation interval $10.120 \pm 5.755$ (X = income), so that $X_a = 4.365$ and $X_b = 15.875$.
- Example: the proportion of observations having incomes less than or equal to 15 thousand dollars, $\text{Prop}(0 \le X \le 15) = ?$
- Uniform distribution assumption: the X values of the observations in any class interval are spread evenly throughout it.
- Determine the proportion in question by summing the relative frequencies of the class intervals, and of the parts of class intervals, that make up the interval from $X_a$ to $X_b$:

$$\text{Prop}(4.365 \le X \le 15.875) = \frac{8 - 4.365}{4.0} f_2 + f_3 + \frac{15.875 - 12}{4.0} f_4$$
$$= \frac{3.635}{4.0}(0.35) + 0.35 + \frac{3.875}{4.0}(0.16) = 0.318 + 0.35 + 0.155 = 0.823$$
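Under the uniform distribution assumption, each class contributes its $f_k$ times the fraction of its width that overlaps $[X_a, X_b]$. The hypothetical helper `proportion_between` below is a sketch of that rule; it reproduces the 0.823 above using only the three classes that overlap the interval (width 4.0, with $f_2 = 0.35$, $f_3 = 0.35$, $f_4 = 0.16$ as used in the calculation). Passing the full table would give the same result, since non-overlapping classes contribute nothing.

```python
def proportion_between(classes, a, b):
    """Prop(a <= X <= b) from grouped data, assuming values are uniform within each class.

    classes: list of (lower, upper, f_k). Each class contributes f_k times
    the fraction of its width that overlaps [a, b].
    """
    total = 0.0
    for lower, upper, f_k in classes:
        overlap = min(b, upper) - max(a, lower)
        if overlap > 0:
            total += (overlap / (upper - lower)) * f_k
    return total

# Classes 2 to 4 of the family-income table, with the f_k used above
classes = [(4.0, 8.0, 0.35), (8.0, 12.0, 0.35), (12.0, 16.0, 0.16)]
p = proportion_between(classes, 4.365, 15.875)
print(round(p, 3))  # 0.823
```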
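- Graphical method for determining proportions: in Figure 4.4a, the proportion is given by the ratio of the shaded area to the total area of the histogram.
- Another application of proportion calculations is determining the median, the value $X_{med}$ such that $\text{Prop}(X \le X_{med}) = 0.5$.
- For family income, the remaining 0.10 needed to reach 0.5 comes from the class from 8 to 12 ($f_3 = 0.35$, width 4.0), so

$$X_{med} = 8 + \frac{0.10}{0.35}(4.0) = 9.14$$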
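The same uniform-within-class rule gives the median: walk through the classes until the cumulative relative frequency would pass 0.5, then interpolate inside that class. In the hypothetical helper `grouped_median` below, $f_1 = 0.05$ is not stated on the slide; it is inferred from the slide's arithmetic (0.5 - 0.10 = 0.40 of the observations lie below 8, of which $f_2 = 0.35$).

```python
def grouped_median(classes):
    """X at which the cumulative relative frequency reaches 0.5.

    classes: (lower, upper, f_k) in increasing order; values are assumed
    to be spread uniformly within each class.
    """
    cumulative = 0.0
    for lower, upper, f_k in classes:
        if cumulative + f_k >= 0.5:
            # interpolate inside the median class
            return lower + (0.5 - cumulative) / f_k * (upper - lower)
        cumulative += f_k
    raise ValueError("relative frequencies never reach 0.5")

# f_1 = 0.05 is inferred (see above); f_2 and f_3 are the values used on the slides
classes = [(0.0, 4.0, 0.05), (4.0, 8.0, 0.35), (8.0, 12.0, 0.35)]
print(round(grouped_median(classes), 2))  # 9.14
```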