Mean - PowerPoint PPT Presentation

Title:

Mean

Provided by: DPI46

Transcript and Presenter's Notes

1
Mean
  • The most common measure of central tendency is
    the mean which is also referred to as the
    average. The mean is the total of the scores
    divided by the number of scores.
  • Let's say our data set is 5, 3, 54, 93, 83, 22, 17, 19.
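
As a minimal sketch of the definition above, the mean of the example data set can be computed in Python. The variable names are illustrative and not part of the original slides.

```python
# Example data set from the slide.
scores = [5, 3, 54, 93, 83, 22, 17, 19]

# Mean = total of the scores divided by the number of scores.
mean = sum(scores) / len(scores)
print(mean)  # 37.0
```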

2
Median
  • The median is the point that corresponds to the
    score that lies in the middle of the distribution
    when the data are arranged in increasing or
    decreasing numerical order; in other words, it
    is the point that divides the distribution in
    half.
  • With an odd number of data values, for example
    21, the median is the middle (11th) value of the
    ordered data.
  • With an even number of data values, for example
    20, the median is the average of the two middle
    values (the 10th and 11th) of the ordered data.
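
The odd and even cases can be sketched in Python as follows. The function name `median` and the two test lists (the integers 1 to 21 and 1 to 20) are assumptions chosen to mirror the 21-value and 20-value cases; they are not data from the slides.

```python
def median(values):
    """Middle value of the sorted data; for an even count,
    the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = list(range(1, 22))   # 21 values: 1, 2, ..., 21
even_data = list(range(1, 21))  # 20 values: 1, 2, ..., 20
print(median(odd_data))   # 11, the 11th value
print(median(even_data))  # 10.5, the average of the 10th and 11th values
```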

3
Population and Sample
A population is any entire collection of people,
animals, plants or things from which we may
collect data. It is the entire group we are
interested in, which we wish to describe or draw
conclusions about. A sample is a subset from a
larger group (the population). By studying the
sample we hope to draw valid conclusions about
the larger group. The population for a study of
infant health might be all children born in Fiji
in the 1980s. The sample might be all babies
born on 7th May in any of those years.
4
Variance
  • The population variance gives an idea of how
    widely spread the values of the random variable
    are likely to be: the larger the variance, the
    more scattered the observations.
  • Variance is symbolised by V(X) or Var(X) or
    $\sigma^2$.
  • The variance of the random variable X is defined
    to be $\operatorname{Var}(X) = E\big[(X - E(X))^2\big]$,
  • where E(X) is the expected value of the random
    variable X.
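
Expanding the square in the usual definition of variance, Var(X) = E[(X - E(X))^2], gives an equivalent form that is often easier to compute. This is the standard identity, included as a worked step; it is not taken from the slides.

```latex
\begin{aligned}
\operatorname{Var}(X) &= E\big[(X - E(X))^2\big] \\
&= E\big[X^2 - 2X\,E(X) + (E(X))^2\big] \\
&= E(X^2) - 2\,E(X)\,E(X) + (E(X))^2 \\
&= E(X^2) - (E(X))^2
\end{aligned}
```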

5
Sample Variance
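  • Sample variance is a measure of the spread of or
    dispersion within a set of sample data. The
    sample variance is
    $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
  • where $\bar{x}$ is the sample mean.
  • Let's say our data set is 5, 3, 54, 93, 83. The mean
    of this data set is $\bar{x} = 238 / 5 = 47.6$.
  • The sample variance is $s^2 = 7159.2 / 4 = 1789.8$.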
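
A short Python sketch of this calculation for the five-value data set, assuming the usual n - 1 divisor for the sample variance. The variable names are illustrative; `statistics.variance` from the standard library uses the same divisor and serves as a cross-check.

```python
from statistics import mean, variance

data = [5, 3, 54, 93, 83]
n = len(data)

x_bar = mean(data)  # sample mean

# Sum of squared deviations from the mean, divided by n - 1 (assumed divisor).
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)

print(round(x_bar, 1), round(s2, 1))
print(round(variance(data), 1))  # library cross-check, same n - 1 definition
```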

6
Terciles
  • Terciles divide data into three categories that
    have the same chance of occurring.
  • Example: From 1961 to 1990, the September to
    December rainfall at Entebbe was below-normal
    (tercile 1, 200-445 mm) in 10 years, near-normal
    (tercile 2, 445-512 mm) in 10 years, and
    above-normal (tercile 3, 512-1000 mm) in 10 years.
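
One simple way to form terciles is to sort the values and cut the ordered list into three equal-sized groups. The sketch below does this in Python; the rainfall totals are hypothetical values invented for illustration and are not the Entebbe record.

```python
# Hypothetical seasonal rainfall totals in mm (illustrative values only).
rainfall = [310, 480, 620, 450, 505, 700, 390, 530, 420]

ordered = sorted(rainfall)
third = len(ordered) // 3

below_normal = ordered[:third]           # tercile 1
near_normal = ordered[third:2 * third]   # tercile 2
above_normal = ordered[2 * third:]       # tercile 3

print(below_normal, near_normal, above_normal)
```

Each group holds one third of the years, so each category has the same chance of occurring in the historical record.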

7
Histogram
  • A histogram is a way of summarising data that are
    measured on an interval scale. It divides up the
    range of possible values in a data set into
    classes or groups. For each group, a rectangle is
    constructed with a base length equal to the range
    of values in that specific group, and an area
    proportional to the number of observations
    falling into that group.

8
Frequency Distribution
A frequency distribution is a tabular arrangement
of data in which the data are grouped into
different intervals. Data presented in this
manner are known as grouped data. The relative
frequency of an interval is the ratio of the number
of observations in the interval to the total
number of observations. The percentage frequency
of an interval is its relative frequency
multiplied by 100. The cumulative frequency of an
interval is the total frequency of all values up
to and including that interval.
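
The four kinds of frequency can be sketched in Python as below. The observations and class intervals are hypothetical. Cumulative frequency is taken here as the running total up to and including each interval, which is an assumption about the intended convention.

```python
# Hypothetical observations and class boundaries (illustrative only).
values = [1.2, 2.5, 3.1, 2.8, 4.4, 3.7, 1.9, 2.2, 3.3, 4.8]
edges = [1.0, 2.0, 3.0, 4.0, 5.0]  # intervals [1,2), [2,3), [3,4), [4,5)

# Frequency: number of observations falling in each interval.
frequency = [
    sum(lo <= v < hi for v in values)
    for lo, hi in zip(edges, edges[1:])
]
total = sum(frequency)

relative = [f / total for f in frequency]          # share of all observations
percentage = [100 * f / total for f in frequency]  # relative frequency x 100

# Cumulative frequency: running total up to and including each interval.
cumulative = []
running = 0
for f in frequency:
    running += f
    cumulative.append(running)

print(frequency, percentage, cumulative)
```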
9
Frequency Table
  • Example: Yield of rice (tonnes per hectare)
  • Frequency Distribution

10
Cumulative Frequency
11
Regression & Correlation
Correlation is the statistical measure that
quantifies the linear relationship between two
variables. If you look at a scatter plot of two
variables, their correlation measures how closely
the points cluster around the best-fitting
straight line that can be drawn through them.
Regression is an extension
of correlation analysis that will predict the
value of one variable (the dependent variable)
based on the values of one or more predictor or
independent variables.
The regression line is $y = a + bx$, where y is
the predicted value of the dependent variable, a
is the intercept, b is the slope of the line, and
x is the value of the independent variable.
12
Correlation
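  • Correlation is the statistical measure that
    quantifies the linear relationship between two
    variables. The sample correlation coefficient
    r between X and Y is
    $r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[n \sum x^2 - (\sum x)^2\right]\left[n \sum y^2 - (\sum y)^2\right]}}$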

13
Correlation & Cause
  • Correlation means that two variables have some
    type of association with each other, such that as
    one variable increases, the other also increases,
    or decreases. But it does not mean that one of
    the variables is the cause of the other.
  • Correlations can demonstrate only that a
    relationship exists or does not exist between
    variables, but correlations cannot indicate
    whether or not the relationship is causal.
  • Example
  • It has been argued that there is a high
    correlation between the increase in juvenile
    delinquency and the increase in the divorce rate
    in recent years. This may be so. This does not,
    however, indicate that the increase in the
    divorce rate has caused the increase in juvenile
    delinquency.

14
Correlation Calculations
  • Set out a table and calculate $\sum x$, $\sum y$,
    $\sum x^2$, $\sum y^2$, $\sum xy$ and the means of
    x and y.

15
Calculations of r, b and a
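
A minimal Python sketch of how r, b and a can be obtained from the column sums and means. The paired observations are hypothetical, and the computational formulas used are the standard ones for the correlation coefficient and the least-squares line y = a + bx; the slide's own worked table did not survive in this transcript.

```python
from math import sqrt

# Hypothetical paired observations (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Column totals from the calculation table.
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
mean_x = sum_x / n
mean_y = sum_y / n

# Sample correlation coefficient r.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Least-squares slope b and intercept a.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = mean_y - b * mean_x

print(round(r, 4), round(b, 4), round(a, 4))
```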
16
Coeff. of Determination $r^2$
  • The coefficient of determination $r^2$
  • gives the proportion of the fluctuation of one
    variable that is predictable from the other
    variable.
  • is the ratio of the explained variation to the
    total variation.
  • lies in the range $0 \le r^2 \le 1$, and denotes the
    strength of the linear association between x and
    y.
  • If $r = 0.922$, then $r^2 = 0.850$, so 85% of the total
    variation in y can be explained by the linear
    relationship between x and y.

17
Hypothesis Testing
  • A statistical hypothesis is the speculation
    translated into a statement concerning the
    distribution of a defined population.
  • The statistical hypothesis under test is often
    referred to as the null hypothesis.
  • $H_0$: Boys and girls are of equal height
  • The alternative hypothesis is a statement of what
    a statistical hypothesis test is set up to
    establish.
  • $H_1$: Boys are taller than girls

18
Type of Error
  • In testing a null hypothesis, the level of
    significance is the probability of rejecting a
    true hypothesis. The four possible situations are:
  • The hypothesis is true and it is accepted
    (correct decision).
  • The hypothesis is true and it is rejected
    (type I error).
  • The hypothesis is false and it is accepted
    (type II error).
  • The hypothesis is false and it is rejected
    (correct decision).

19
Type of Error
  • A type I error is often considered to be more
    serious, and therefore more important to avoid,
    than a type II error. The probability of a type I
    error can be precisely computed as
  • P(type I error) = significance level = $\alpha$
  • A type II error is frequently due to sample sizes
    being too small. The probability of a type II
    error is generally unknown, but is symbolised by
    $\beta$ and written
  • P(type II error) = $\beta$

20
Probability
  • A probability provides a quantitative description
    of the likely occurrence of a particular event.
    Probability is conventionally expressed on a
    scale from 0 to 1: a rare event has a probability
    close to 0, a very common event has a probability
    close to 1.
  • The probability of drawing a spade from a pack of
    52 well-shuffled playing cards is 13/52 = 1/4 =
    0.25.
  • When tossing a coin, we assume that the results
    'heads' or 'tails' each have equal probabilities
    of 0.5.

21
Subjective Probability
  • A subjective probability describes an
    individual's personal judgement about how likely
    a particular event is to occur. It is not based
    on any precise computation but is often a
    reasonable assessment by a knowledgeable person.
  • Like all probabilities, a subjective probability
    is conventionally expressed on a scale from 0 to
    1: a rare event has a subjective probability
    close to 0, a very common event has a subjective
    probability close to 1.
  • A person's subjective probability of an event
    describes his/her degree of belief in the event.
  • Example: A political commentator suggests that
    the Green Party may win the next election as they
    have put the environment at the top of their agenda.

22
Independent Events
  • Two events are independent if the occurrence of
    one of the events gives us no information about
    whether or not the other event will occur; that
    is, the events have no influence on each other.
  • In probability theory we say that two events, A
    and B, are independent if the probability that
    they both occur is equal to the product of the
    probabilities of the two individual events, i.e.
    $P(A \cap B) = P(A) \cdot P(B)$.
  • For three events A, B and C, pairwise
    independence means that A and B are independent,
    A and C are independent, and B and C are
    independent.

23
Example of Independent Events
  • Suppose that a man and a woman each have a pack
    of 52 playing cards. Each draws a card from
    his/her pack. Find the probability that they each
    draw the ace of clubs.
  • We define the events
  • A = the man draws the ace of clubs, with
    P(A) = 1/52
  • B = the woman draws the ace of clubs, with
    P(B) = 1/52
  • Clearly events A and B are independent, so
    $P(A \cap B) = P(A) \cdot P(B) = 1/52 \times 1/52 \approx 0.00037$
  • That is, there is a very small chance that the
    man and the woman will both draw the ace of
    clubs.
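
The product rule for this example can be checked with exact fractions in Python. The variable names are illustrative.

```python
from fractions import Fraction

p_man = Fraction(1, 52)    # P(man draws the ace of clubs)
p_woman = Fraction(1, 52)  # P(woman draws the ace of clubs)

# Independent events: probability that both occur is the product.
p_both = p_man * p_woman

print(p_both)                   # 1/2704
print(round(float(p_both), 5))  # 0.00037
```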

24
Mutually Exclusive Events
  • Two events are mutually exclusive (or disjoint)
    if it is impossible for them to occur together.
  • Formally, two events A and B are mutually
    exclusive if and only if $A \cap B = \emptyset$.
  • Examples
  • Experiment: Rolling a die once
  • Sample space S = {1, 2, 3, 4, 5, 6}
  • Events A = 'observe an odd number' = {1, 3, 5}
  • B = 'observe an even number' = {2, 4, 6}
  • $A \cap B = \emptyset$, the empty set, so A and
    B are mutually exclusive.
  • A subject in a study cannot be both male and
    female, nor can they be aged both 20 and 30. A
    subject could however be both male and 20, or
    both female and 30.
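
The die example can be expressed with Python sets: two events are mutually exclusive when their intersection is empty. The variable names are illustrative.

```python
sample_space = {1, 2, 3, 4, 5, 6}

odd = {n for n in sample_space if n % 2 == 1}   # event A
even = {n for n in sample_space if n % 2 == 0}  # event B

# Mutually exclusive: no outcome belongs to both events.
intersection = odd & even
mutually_exclusive = odd.isdisjoint(even)

print(intersection, mutually_exclusive)
```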

25
Time Series
  • A time series is a sequence of observations that
    are ordered in time (or space). If observations
    are made on some phenomenon throughout time, it
    is most sensible to display the data in the order
    in which they arose, particularly since
    successive observations will probably be
    dependent.
  • Time series are best displayed in a scatter plot.
    The series value X is plotted on the vertical
    axis and time t on the horizontal axis. There are
    two kinds of time series data:
  • Continuous, where we have an observation at every
    instant of time, e.g. electrocardiograms. We
    denote this using observation X at time t, X(t).
  • Discrete, where we have an observation at (usually
    regularly) spaced intervals. We denote this as
    $X_t$.

26
Time Series Plot
27
Terms in Time Series
Trend Component: Trend is a long-term movement in
a time series. A trend pattern exists when there
is a long-term secular increase or decrease in
the data. It is the underlying direction (an
upward or downward tendency) and rate of change
in a time series. The existence of a trend
(linear or non-linear) in the data means that
successive values will be positively correlated
with each other.
Cyclical Component: A cyclical
pattern exists when the data are influenced by
longer-term fluctuations such as those associated
with the business cycle.
28
Terms in Time Series
Seasonal Component: Seasonality is defined as a
pattern that repeats itself over fixed intervals
of time. For example, the costs of various types
of fruits and vegetables, unemployment figures
and average daily rainfall all show marked
seasonal variation.
Irregular Component: The
irregular component is that left over when the
other components of the series (trend, seasonal
and cyclical) have been accounted for.
29
Terms in Time Series
  • Smoothing: Smoothing techniques are used to
    reduce irregularities (random fluctuations) in
    time series data. They provide a clearer view of
    the true underlying behaviour of the series.
  • Exponential Smoothing: Exponential smoothing is
    a smoothing technique
    used to reduce irregularities (random
    fluctuations) in time series data, thus providing
    a clearer view of the true underlying behaviour
    of the series.
  • Moving average is a form of average that has been
    adjusted to allow for seasonal or cyclical
    components of a time series. Moving average
    smoothing is a smoothing technique used to make
    the long term trends of a time series clearer.
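
Two of these smoothing techniques can be sketched in Python. The sketch assumes a simple unweighted moving average and simple exponential smoothing with a fixed weight `alpha`. The function names, the weight and the example series are all illustrative assumptions, since the slides do not specify a particular variant.

```python
def moving_average(series, window):
    """Unweighted average of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing:
    s[0] = x[0], s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

series = [10, 12, 9, 14, 11, 15, 13]  # hypothetical observations
print(moving_average(series, 3))
print(exponential_smoothing(series, 0.5))
```

A wider window or a smaller `alpha` smooths more heavily, at the cost of reacting more slowly to genuine changes in the series.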

30
Terms in Time Series
  • Running medians smoothing is a smoothing
    technique analogous to that used for moving
    averages. The purpose of the technique is the
    same, to make a trend clearer by reducing the
    effects of other fluctuations.
  • Differencing is a popular and effective method of
    removing trend from a time series. This provides
    a clearer view of the true underlying behaviour
    of the series.
  • Autocorrelation is the correlation (relationship)
    between members of a time series of observations,
    such as weekly share prices or interest rates,
    and the same values at a fixed time interval
    later.
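
Differencing and autocorrelation can be sketched in Python as below. The sketch assumes first-order differencing and the common sample autocorrelation estimate that divides the lagged cross-products by the total sum of squared deviations. Function names and data are illustrative.

```python
def difference(series):
    """First differences: each value minus the one before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

def autocorrelation(series, lag):
    """Sample autocorrelation between the series and itself `lag` steps later."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean)
        for t in range(n - lag)
    )
    return numerator / denominator

trend = [2, 4, 6, 8, 10]      # hypothetical series with a linear trend
print(difference(trend))      # [2, 2, 2, 2]: the trend is removed
print(autocorrelation([1, 2, 3, 4, 5], 1))
```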

31
Probability Distribution
  • The probability distribution of a discrete random
    variable is a list of probabilities associated
    with each of its possible values. It is also
    sometimes called the probability function or the
    probability mass function.
  • More formally, the probability distribution of a
    discrete random variable X is a function which
    gives the probability $p(x_i)$ that the random
    variable equals $x_i$, for each value $x_i$:
  • $p(x_i) = P(X = x_i)$
  • It satisfies the following conditions:
  • $0 \le p(x_i) \le 1$
  • $\sum_i p(x_i) = 1$
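
As an illustration, the probability distribution of a fair six-sided die can be written as a Python dictionary and checked against the usual two conditions: every probability lies between 0 and 1, and the probabilities sum to 1. The die example and the names used are assumptions for illustration.

```python
from fractions import Fraction

# p(x_i) = P(X = x_i) for a fair six-sided die (illustrative example).
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Condition 1: each probability lies between 0 and 1.
all_in_range = all(0 <= p <= 1 for p in pmf.values())

# Condition 2: the probabilities sum to 1.
total = sum(pmf.values())

print(all_in_range, total)
```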