AAEC 4302 ADVANCED STATISTICAL METHODS IN AGRICULTURAL RESEARCH

1
AAEC 4302 ADVANCED STATISTICAL METHODS IN
AGRICULTURAL RESEARCH
  • Part I: A Quick Review of Familiar Topics
  • Chapter 4: Frequency Distributions

2
Introduction
  • Chapter 4 examines grouping and classification
    techniques for organizing and summarizing data on
    a given variable
  • There are a few important differences in the
    techniques applied to discrete vs. continuous
    variables

3
Discrete Data Variables
  • Suppose that n observations are available on a
    discrete variable X that only takes m different
    values
  • The n observations can be divided into m
    different groups within which all of the
    observations have exactly the same value

4
Discrete Data Variables
  • These m different groups can be indexed by k, so
    that k = 1, ..., m
  • Example: the family size variable in Table 2.2 (p.
    18)
  • X ranges from 2 to 10, so m = 9 (k = 1, ..., 9); the
    100 observations can be arranged into nine
    groups, one for each family-size value observed
    in the sample
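
The grouping step can be sketched in a few lines of Python. The sample below is hypothetical (it is not the Table 2.2 data), and the variable names are illustrative only; the point is that n observations collapse into m groups, one per distinct value.

```python
from collections import Counter

# Hypothetical sample of family sizes (NOT the Table 2.2 data).
sample = [2, 3, 3, 4, 4, 4, 4, 5, 5, 6]

counts = Counter(sample)      # maps each distinct value to how often it occurs
values = sorted(counts)       # the m distinct values, X_1 < ... < X_m
m = len(values)

# One row per group k = 1, ..., m
for k, x in enumerate(values, start=1):
    print(f"k={k}  X_k={x}  n_k={counts[x]}")
```

Every observation lands in exactly one group, so the group counts add back up to the sample size.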

5
Discrete Data Variables
  • Table 4.1 (p. 63) is an example of a Frequency
    Table for the variable family size

6
Discrete Data Variables
  • Xk (column 2) denotes the value of the variable
    X (family size) taken by all of the observations
    grouped into category k (for example, X3 = 4
    indicates that the third category groups all
    observations with a family size of 4)

7
Discrete Data Variables
  • nk (column 3) denotes the absolute frequency,
    which is the number of observations that fall in
    that category (for example, n3 = 34 denotes the
    number of families in the sample with 4 family members)

8
Discrete Data Variables
  • Notice that Σ nk = n (the sample size)
  • fk = nk/n (column 4) is the relative frequency, or
    the proportion of the total observations that
    fall in that category (for example, f3 = 0.34
    denotes the proportion of families in the sample
    with 4 family members)
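
A minimal sketch of how the relative frequencies follow from the absolute ones. The frequency table here is hypothetical, not Table 4.1, and the names abs_freq and rel_freq are illustrative.

```python
# Hypothetical frequency table: value X_k -> absolute frequency n_k.
abs_freq = {2: 1, 3: 2, 4: 4, 5: 2, 6: 1}

n = sum(abs_freq.values())    # the absolute frequencies sum to the sample size

# Relative frequency f_k = n_k / n for each group
rel_freq = {x: nk / n for x, nk in abs_freq.items()}

for x in sorted(rel_freq):
    print(f"X_k={x}  n_k={abs_freq[x]}  f_k={rel_freq[x]:.2f}")
```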

9
Discrete Data Variables
  • Notice that Σ fk = 1
  • Can you prove it?
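
One way to see it, using only the definitions fk = nk/n and Σ nk = n (written here in LaTeX, with f_k, n_k standing for fk, nk):

```latex
\sum_{k=1}^{m} f_k
  = \sum_{k=1}^{m} \frac{n_k}{n}
  = \frac{1}{n} \sum_{k=1}^{m} n_k
  = \frac{n}{n}
  = 1
```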

10
Discrete Data Variables
  • Figure 4.1 (p. 64) is a conventional graph of the
    relative frequency distribution of the discrete
    variable family size (X); it is a graph of the
    value of fk for each value of Xk

11
Discrete Data Variables
  • Family size (Xk) is on the horizontal axis, and
    the corresponding relative frequency (fk) is on the
    vertical axis

12
Discrete Data Variables
  • This is a summary graphical representation of
    the sample data where the only information lost
    is the sample size n
  • How do you read the graph?

13
Discrete Data Variables
  • Since the relative frequency distribution
    implicitly orders the values of X, the median
    (Xmed) is easily found by examining the
    cumulative frequencies

14
Discrete Data Variables
  • For example, in Table 4.1, 10 of the observations
    are in group 1, where X1 = 2; 16 are in group 2,
    where X2 = 3; and 34 are in group 3, where X3 = 4
  • The cumulative frequency reaches 60 at group 3, so
    the middle (50th) observation is in group 3, and
    its value (family size) is equal to 4

15
Discrete Data Variables
  • Clearly the mode or most frequently occurring
    family size is also 4 in this case
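
A sketch of reading the median and mode off a frequency table by accumulating frequencies. The table is hypothetical and the function name is illustrative; for an even n this sketch takes the lower of the two middle observations, which matches the use of the 50th of 100 observations above.

```python
def median_and_mode(table):
    """table: list of (value, absolute frequency) pairs, sorted by value."""
    n = sum(nk for _, nk in table)
    middle = (n + 1) // 2      # position of the middle observation (lower one if n is even)
    cumulative = 0
    median = None
    for x, nk in table:
        cumulative += nk       # cumulative absolute frequency up to this group
        if cumulative >= middle:
            median = x         # first group whose cumulative count reaches the middle
            break
    # Mode: the value with the largest absolute frequency
    mode = max(table, key=lambda pair: pair[1])[0]
    return median, mode


# Hypothetical frequency table (NOT Table 4.1)
table = [(2, 1), (3, 2), (4, 4), (5, 2), (6, 1)]
print(median_and_mode(table))
```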

16
Discrete Data Variables
  • The mean can also be calculated from a frequency
    table instead of the raw data
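
The formula itself did not survive in this transcript. The standard grouped-data expression, which is assumed here to be what the slide showed, weights each value by its frequency:

```latex
\bar{X} = \frac{1}{n} \sum_{k=1}^{m} n_k X_k = \sum_{k=1}^{m} f_k X_k
```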

17
Discrete Data Variables
  • The standard deviation (SX) can also be
    calculated from a frequency table
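
The formula is missing from this transcript. A common grouped-data form is given below as an assumption; it uses the n - 1 divisor, and a text that defines the standard deviation with divisor n would replace n - 1 by n.

```latex
S_X = \sqrt{ \frac{1}{n-1} \sum_{k=1}^{m} n_k \left( X_k - \bar{X} \right)^2 }
```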

18
Discrete Data Variables
  • Columns 5 and 6 show the steps for calculating
    the mean (X̄) and SX from a frequency table
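
A sketch of the two working columns such a calculation typically needs. The exact contents of columns 5 and 6 in Table 4.1 are not shown in this transcript, so the two lists below are an assumption, as are the n - 1 divisor and the hypothetical data.

```python
import math

# Hypothetical frequency table (NOT Table 4.1): (X_k, n_k) pairs
table = [(2, 1), (3, 2), (4, 4), (5, 2), (6, 1)]

n = sum(nk for _, nk in table)

# Working column (assumed): n_k * X_k, whose sum divided by n gives the mean
weighted_values = [nk * x for x, nk in table]
mean = sum(weighted_values) / n

# Working column (assumed): n_k * (X_k - mean)^2
weighted_sq_devs = [nk * (x - mean) ** 2 for x, nk in table]

# Assumption: n - 1 divisor; use n instead if the text defines S_X that way
sd = math.sqrt(sum(weighted_sq_devs) / (n - 1))

print(f"mean = {mean:.3f}, S_X = {sd:.3f}")
```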

19
Continuous Data Variables
  • The approach discussed above does not make
    sense with continuous variables, since there will
    likely be one group for each observation
  • Instead, m class intervals are created, so that
    each observation can be placed into only one of
    them
  • The 3 principles to follow when creating these
    intervals are:
  • The number of classes (m) should be between 5
    and 15
  • The width (range) of each interval should be the
    same
  • The midpoint of each interval should be a
    convenient number

20
Continuous Data Variables
  • Example: let X be family income from Table 2.2
  • X ranges from 0.75 to 32.08 thousand dollars
  • Let's set up m = 9 intervals starting from 0 with
    an interval width of 4.0 (4,000 dollars)
  • Observations that lie right on the boundary
    between two classes should be divided between
    the lower and higher classes
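
A sketch of the class-interval construction: m = 9 classes of width 4.0 starting at 0. The income values are hypothetical except for the minimum and maximum quoted above, and "divided between" is read here as half an observation to each adjacent class, which is an assumption.

```python
def class_frequencies(values, start=0.0, width=4.0, m=9):
    """Absolute frequencies for m equal-width classes.

    Assumption: a value lying exactly on an interior boundary counts as
    0.5 in the class below and 0.5 in the class above.
    Values are assumed to lie within [start, start + m * width].
    """
    freq = [0.0] * m
    for v in values:
        position = (v - start) / width
        k = int(position)                 # zero-based class index
        if position == k and 0 < k < m:   # exactly on an interior boundary
            freq[k - 1] += 0.5
            freq[k] += 0.5
        else:
            freq[min(k, m - 1)] += 1.0    # top edge goes to the last class
    return freq


# Hypothetical incomes in thousands of dollars; only 0.75 and 32.08 come from the text
incomes = [0.75, 3.2, 5.1, 8.0, 9.4, 11.9, 12.0, 15.5, 32.08]

freq = class_frequencies(incomes)
marks = [0.0 + 4.0 * (k + 0.5) for k in range(9)]   # class midpoints: 2, 6, ..., 34

for k, (mark, nk) in enumerate(zip(marks, freq), start=1):
    print(f"class {k}: mark={mark:5.1f}  n_k={nk}")
```

Starting at 0 with width 4.0 gives midpoints 2, 6, 10, and so on, which is the sense in which the midpoints are convenient numbers.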

21
Continuous Data Variables
  • Table 4.2 is the frequency table for family
    income constructed in this way
  • Column 1 includes the class interval index k and
    the boundary values of X that define the class

22
Continuous Data Variables
  • In the second column is the class mark (Xk) which
    is defined as the mid-point of the class
    interval
  • In the third column is the absolute frequency
    (nk) which, as before, is the number of
    observations in the sample whose value falls in
    the kth class (i.e. within the boundaries of the
    kth class interval)

23
Continuous Data Variables
  • In the fourth column is the relative frequency
    (fk) which, as before, is the proportion of
    observations in the sample whose value falls in
    the kth class (i.e. within the boundaries of the
    kth class interval)
  • Also as before, Σ nk = n, fk = nk/n, and Σ fk = 1

24
Continuous Data Variables
  • Figure 4.2 is a graph of the class marks (Xk) on
    the horizontal axis against the corresponding
    relative frequency (fk) on the vertical axis; it
    represents the relative frequency distribution
    of family income (X)
  • This graph is known as a histogram or bar chart,
    where each box represents one class and the
    height of the box gives the relative frequency
    (fk) of the corresponding class

25
Continuous Data Variables
  • Figure 4.3 presents the most common shapes taken
    by histograms or bar charts
  • Unimodal: there is only one peak
  • Bimodal: there are two peaks
  • Unimodal, skewed to the right: it has a longer
    tail in that direction (the longer tail indicates
    the direction of skewness)

26
Continuous Data Variables
  • The mean and standard deviation of a continuous
    variable calculated from a frequency table (using
    the formulas given in the case of discrete
    variables) are only approximations.
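
A small illustration of why the grouped figures are approximations: each observation is treated as if it sat at its class mark. The values are hypothetical and the classes are assumed to have width 4.0 starting at 0.

```python
# Hypothetical continuous observations (none lies on a class boundary)
values = [1.0, 3.5, 5.0, 7.5, 9.0, 11.5]
width = 4.0   # assumed class width, classes start at 0

# Mean from the raw data
raw_mean = sum(values) / len(values)

# Grouped calculation: replace each value by the midpoint of its class
marks = [(int(v // width) + 0.5) * width for v in values]
grouped_mean = sum(marks) / len(marks)

print(f"raw mean = {raw_mean}, grouped mean = {grouped_mean}")
```

The two means differ because the within-class positions of the observations are discarded once the data are grouped.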

27
Continuous Data Variables
  • There is a correspondence between relative
    frequencies and areas under the histogram
  • With w the common class width, the ratio of the
    area of the kth bar to the total area of the
    histogram equals its relative frequency:
  • w fk / Σ w fk = w fk / (w Σ fk) = w fk / w = fk

28
Proportions
  • Question: what proportion of the observations
    have X values between Xa and Xb?
  • Prop(Xa ≤ X ≤ Xb) = ?
  • Proportion of observations that lie in the
    one-standard-deviation interval 10.120 ± 5.755 (X =
    income, Xa = 4.365 and Xb = 15.875)
  • Proportion of observations having incomes less
    than or equal to 15 thousand dollars:
  • Prop(0 ≤ X ≤ 15) = ?
  • Uniform distribution assumption: the X values of the
    observations in any class interval are spread
    smoothly throughout it.

29
Proportions
  • Determine the proportion in question by calculating
    the sum of the relative frequencies of the class
    intervals and parts of class intervals that make
    up the interval from Xa to Xb
  • Prop(4.365 ≤ X ≤ 15.875) = [(8 - 4.365)/4.0] f2
    + f3 + [(15.875 - 12)/4.0] f4
  • = (3.635/4.0)(0.35) + (0.35) +
    (3.875/4.0)(0.16)
  • = 0.318 + 0.35 + 0.155 = 0.823
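
The same calculation as a sketch in Python. Under the uniform distribution assumption, each class contributes its relative frequency times the fraction of its width that falls inside the interval. Only the three classes whose relative frequencies appear in the calculation above are included; the function and variable names are illustrative.

```python
def proportion(a, b, edges, rel_freq):
    """Approximate Prop(a <= X <= b) under the uniform distribution assumption.

    edges:    class boundaries, one more entry than rel_freq
    rel_freq: relative frequency of each class
    """
    total = 0.0
    for lower, upper, f in zip(edges[:-1], edges[1:], rel_freq):
        overlap = min(b, upper) - max(a, lower)   # part of the class inside [a, b]
        if overlap > 0:
            total += f * overlap / (upper - lower)
    return total


# Classes 4-8, 8-12 and 12-16 with the relative frequencies used in the slide
edges = [4.0, 8.0, 12.0, 16.0]
rel_freq = [0.35, 0.35, 0.16]

p = proportion(4.365, 15.875, edges, rel_freq)
print(round(p, 3))   # about 0.823
```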

30
Proportions
  • Graphical method for determining proportions
  • Figure 4.4a, where proportion is given by the
    ratio of the shaded area to the total area in the
    histogram
  • Another application of proportion calculations is
    determining the median
  • Prop(X ≤ Xmed) = 0.5
  • Xmed = 8 + (0.10/0.35)(4.0) = 9.14
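
A sketch of the interpolation behind this median. It walks up the classes until the cumulative relative frequency reaches 0.5, then moves into that class in proportion to the remaining share. The value 0.05 for the first class is inferred from the 0.10 in the calculation above (0.5 - 0.05 - 0.35 = 0.10) and is not quoted directly from Table 4.2.

```python
def grouped_median(edges, rel_freq):
    """Median of grouped data under the uniform distribution assumption."""
    cumulative = 0.0
    for lower, upper, f in zip(edges[:-1], edges[1:], rel_freq):
        if cumulative + f >= 0.5:
            # Share of this class needed to reach a cumulative 0.5
            return lower + (0.5 - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("relative frequencies sum to less than 0.5")


# First three income classes; f1 = 0.05 is inferred, f2 and f3 are from the slides
edges = [0.0, 4.0, 8.0, 12.0]
rel_freq = [0.05, 0.35, 0.35]

print(round(grouped_median(edges, rel_freq), 2))   # about 9.14
```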