Title: Mean
1Mean
- The most common measure of central tendency is the mean, which is also referred to as the average. The mean is the total of the scores divided by the number of scores.
- Let's say our data set is 5, 3, 54, 93, 83, 22, 17, 19.
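As a quick check of the definition, the mean of the data set above can be computed in a few lines of Python. This is only an illustrative sketch; the variable names are assumptions and are not part of the original slides.

```python
import statistics

# Data set from the slide
scores = [5, 3, 54, 93, 83, 22, 17, 19]

# Mean = total of the scores divided by the number of scores
mean_by_hand = sum(scores) / len(scores)

# The standard library computes the same quantity
mean_stdlib = statistics.mean(scores)

print(sum(scores), len(scores))  # 296 8
print(mean_by_hand)              # 37.0
```

The total of the eight scores is 296, so the mean is 296 / 8 = 37.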
2Median
- The median is the point that corresponds to the score that lies in the middle of the distribution when the data are arranged in increasing or decreasing numerical order; in other words, it is the point that divides the distribution in half.
- With an odd number of data values, for example 21, the median is the middle value, i.e. the 11th value in the ordered list.
- With an even number of data values, for example 20, the median is the average of the two middle values, i.e. the 10th and 11th values in the ordered list.
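The odd and even cases can be illustrated with a short Python sketch. The two small data sets are chosen here only for illustration; `statistics.median` from the standard library handles both cases.

```python
import statistics

# Odd number of values: the median is the middle value of the ordered data
odd_data = [5, 3, 54, 93, 83]
print(sorted(odd_data))              # [3, 5, 54, 83, 93]
median_odd = statistics.median(odd_data)
print(median_odd)                    # 54

# Even number of values: the median is the average of the two middle values
even_data = [5, 3, 54, 93, 83, 22, 17, 19]
print(sorted(even_data))             # [3, 5, 17, 19, 22, 54, 83, 93]
median_even = statistics.median(even_data)
print(median_even)                   # 20.5
```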
3Population and Sample
A population is any entire collection of people,
animals, plants or things from which we may
collect data. It is the entire group we are
interested in, which we wish to describe or draw
conclusions about. A sample is a subset from a
larger group (the population). By studying the
sample we hope to draw valid conclusions about
the larger group. For example, the population for a study of
infant health might be all children born in Fiji
in the 1980s. The sample might be all babies
born on 7th May in any of those years.
4Variance
- The population variance gives an idea of how widely spread the values of the random variable are likely to be; the larger the variance, the more scattered the observations.
- Variance is symbolised by V(X) or Var(X) or $\sigma^2$.
- The variance of the random variable X is defined to be $V(X) = E\left[(X - E(X))^2\right] = E(X^2) - [E(X)]^2$, where E(X) is the expected value of the random variable X.
5Sample Variance
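- Sample variance is a measure of the spread of or dispersion within a set of sample data. The sample variance is $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $\bar{x}$ is the sample mean and n is the number of observations.
- Let's say our data set is 5, 3, 54, 93, 83. The mean of this data set is $\bar{x} = 238/5 = 47.6$.
- The sample variance is $s^2 = 7159.2/4 = 1789.8$.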
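The sample variance of this data set can be checked with a short Python sketch. It assumes the usual definition with divisor n - 1 for a sample and n for a population; the variable names are illustrative.

```python
import statistics

data = [5, 3, 54, 93, 83]
n = len(data)
mean = sum(data) / n                             # sample mean

# Sum of squared deviations from the mean
sum_sq_dev = sum((x - mean) ** 2 for x in data)

sample_var = sum_sq_dev / (n - 1)                # divisor n - 1 for a sample
population_var = sum_sq_dev / n                  # divisor n if the data were a whole population

print(round(mean, 1))            # 47.6
print(round(sample_var, 1))      # 1789.8
print(round(population_var, 2))  # 1431.84

# Standard-library equivalents
stdlib_sample_var = statistics.variance(data)
stdlib_population_var = statistics.pvariance(data)
```

The sample variance is larger than the population-style figure because it divides the same sum of squared deviations by n - 1 rather than n.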
6Terciles
- Terciles divide data into three categories that have the same chance of occurring.
- Example: From 1961 to 1990, the September to December rainfall at Entebbe was below-normal (tercile 1), 200-445 mm, in 10 years; near-normal (tercile 2), 445-512 mm, in 10 years; and above-normal (tercile 3), 512-1000 mm, in 10 years.
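A minimal sketch of how tercile boundaries might be computed is given below. The rainfall values are made up for illustration and are NOT the Entebbe record; `statistics.quantiles` with `n=3` returns the two cut points that separate the three categories.

```python
import statistics

# Hypothetical seasonal rainfall totals (mm) for 30 years; not real observations
rainfall = [210 + 17 * ((7 * i) % 30) for i in range(30)]

# Two cut points split the ordered data into three equally likely categories
lower_cut, upper_cut = statistics.quantiles(rainfall, n=3)

below_normal = [r for r in rainfall if r < lower_cut]                # tercile 1
near_normal = [r for r in rainfall if lower_cut <= r <= upper_cut]   # tercile 2
above_normal = [r for r in rainfall if r > upper_cut]                # tercile 3

counts = [len(below_normal), len(near_normal), len(above_normal)]
print(counts)  # [10, 10, 10]
```

With 30 distinct values, each tercile holds 10 years, which mirrors the 10/10/10 split in the example.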
7Histogram
- A histogram is a way of summarising data that are
measured on an interval scale. It divides up the
range of possible values in a data set into
classes or groups. For each group, a rectangle is
constructed with a base length equal to the range
of values in that specific group, and an area
proportional to the number of observations
falling into that group.
8Frequency Distribution
A frequency distribution is a tabular arrangement of data whereby the data are grouped into different intervals. Data presented in this manner are known as grouped data. The relative frequency of an interval is the ratio of the number of observations in the interval to the total number of observations. The percentage frequency distribution is the relative frequency of each interval multiplied by 100. The cumulative frequency distribution is obtained by computing the cumulative frequency of each interval, defined as the total frequency of that interval and all preceding intervals.
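The four kinds of frequency described above can be sketched as follows. The yields and class intervals are hypothetical values chosen for illustration, and each interval is assumed to include its lower limit and exclude its upper limit.

```python
from itertools import accumulate

# Hypothetical yields (tonnes per hectare); illustrative values only
yields = [2.1, 2.8, 3.4, 3.9, 4.2, 4.4, 4.7, 5.1, 5.6, 6.3]

# Class limits: intervals are [2, 3), [3, 4), [4, 5), [5, 6), [6, 7)
edges = [2, 3, 4, 5, 6, 7]
intervals = list(zip(edges, edges[1:]))

# Frequency: number of observations falling in each interval
frequency = [sum(1 for y in yields if lo <= y < hi) for lo, hi in intervals]
total = sum(frequency)

relative = [f / total for f in frequency]          # proportion of all observations
percentage = [100 * f / total for f in frequency]  # relative frequency x 100
cumulative = list(accumulate(frequency))           # running total of frequencies

for (lo, hi), f, rel, pct, cum in zip(intervals, frequency, relative, percentage, cumulative):
    print(f"{lo}-{hi}: freq={f} rel={rel:.1f} pct={pct:.0f} cum={cum}")
```

The last cumulative frequency always equals the total number of observations, which is a useful check on a hand-built table.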
9Frequency Table
- Example: Yield of rice (tonnes per hectare)
- Frequency Distribution
10Cumulative Frequency
11Regression Correlation
Correlation is the statistical measure that quantifies the linear relationship between two variables. If you look at a scatter plot of two variables, their correlation describes how closely the points follow the best-fitting straight line that can be drawn through them, and whether that line slopes upward or downward. Regression is an extension of correlation analysis that predicts the value of one variable (the dependent variable) based on the values of one or more predictor or independent variables.
$y = a + bx$, where y is the predicted value of the dependent variable, a is the intercept, b is the slope of the line, and x is the value of the independent variable used to make the prediction.
12Correlation
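- Correlation is the statistical measure that quantifies the linear relationship between two variables. The sample correlation coefficient (r) between X and Y is
- $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$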
13Correlation Cause
- Correlation means that two variables have some type of association with each other, such that as one variable increases, the other also increases or decreases. But it does not mean that one of the variables is the cause of the other.
- Correlations can demonstrate only that a relationship exists or does not exist between variables; they cannot indicate whether or not the relationship is causal.
- Example
- It has been argued that there is a high correlation between the increase in juvenile delinquency and the increase in the divorce rate in recent years. This may be so. This does not, however, indicate that the increase in the divorce rate has caused the increase in juvenile delinquency.
14Correlation Calculations
- Set out a table and calculate $\sum x$, $\sum y$, $\sum x^2$, $\sum y^2$, $\sum xy$ and the means of x and y.
15Calculations of r, b and a
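The worked figures for this slide are not recoverable from the text, so the following is only a minimal sketch of the calculation. It uses the standard least-squares formulas and a small made-up data set (the x and y values are assumptions, not the slide's data). The column totals are computed first, then r, the slope b and the intercept a.

```python
import math

# Hypothetical paired observations; not taken from the original slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Column totals from the calculation table
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
mean_x = sum_x / n
mean_y = sum_y / n

# Building blocks shared by r and b
sxy = n * sum_xy - sum_x * sum_y
sxx = n * sum_x2 - sum_x ** 2
syy = n * sum_y2 - sum_y ** 2

r = sxy / math.sqrt(sxx * syy)   # sample correlation coefficient
b = sxy / sxx                    # slope of the regression line y = a + bx
a = mean_y - b * mean_x          # intercept

print(round(r, 4))       # 0.7746
print(round(b, 4))       # 0.6
print(round(a, 4))       # 2.2
print(round(r ** 2, 4))  # 0.6
```

For this made-up data the fitted line is y = 2.2 + 0.6x, and r squared is 0.6, so about 60% of the variation in y is accounted for by the linear relationship with x.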
16Coeff. of Determination $r^2$
- The coefficient of determination, $r^2$,
- gives the proportion of the fluctuation of one variable that is predictable from the other variable.
- is the ratio of the explained variation to the total variation.
- ranges over $0 \le r^2 \le 1$, and denotes the strength of the linear association between x and y.
- If $r = 0.922$, then $r^2 = 0.850$, which means that 85% of the total variation in y can be explained by the linear relationship between x and y.
17Hypothesis Testing
- A statistical hypothesis is a speculation translated into a statement concerning the distribution of a defined population.
- The statistical hypothesis under test is often referred to as the null hypothesis.
- H0: Boys and girls are of equal height
- The alternative hypothesis is a statement of what a statistical hypothesis test is set up to establish.
- H1: Boys are taller than girls
18Type of Error
- In testing a null hypothesis, the level of significance is the probability of rejecting a true hypothesis. Four possible situations are:
- The hypothesis is true and it is accepted.
- The hypothesis is true and it is rejected (type I error).
- The hypothesis is false and it is accepted (type II error).
- The hypothesis is false and it is rejected.
19Type of Error
- A type I error is often considered to be more serious, and therefore more important to avoid, than a type II error. The probability of a type I error can be precisely computed as
- P(type I error) = significance level = $\alpha$
- A type II error is frequently due to sample sizes being too small. The probability of a type II error is generally unknown, but is symbolised by $\beta$ and written
- P(type II error) = $\beta$
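The link between the significance level and the type I error rate can be illustrated with a small simulation. This is a sketch under stated assumptions: a two-sided z-test with known standard deviation, and values for the significance level, sample size and number of trials that are chosen here purely for illustration.

```python
import random
from statistics import NormalDist

random.seed(0)    # fixed seed so the run is repeatable

ALPHA = 0.05      # significance level (assumed)
N = 30            # observations per simulated study (assumed)
TRIALS = 4000     # number of simulated studies (assumed)

# Two-sided critical value of the standard normal distribution
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)

rejections = 0
for _ in range(TRIALS):
    # The null hypothesis is TRUE here: the population mean really is 0 (sd = 1)
    sample = [random.gauss(0, 1) for _ in range(N)]
    sample_mean = sum(sample) / N
    z = sample_mean / (1 / N ** 0.5)   # z statistic with known sd = 1
    if abs(z) > z_crit:
        rejections += 1                # rejecting a true hypothesis: type I error

type_i_rate = rejections / TRIALS
print(type_i_rate)   # should be close to ALPHA
```

Because the null hypothesis is true in every simulated study, each rejection is a type I error, and the observed rejection rate should settle near the chosen significance level.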
20Probability
- A probability provides a quantitative description of the likely occurrence of a particular event. Probability is conventionally expressed on a scale from 0 to 1: a rare event has a probability close to 0, a very common event has a probability close to 1.
- The probability of drawing a spade from a pack of 52 well-shuffled playing cards is 13/52 = 1/4 = 0.25.
- When tossing a coin, we assume that the results 'heads' or 'tails' each have equal probabilities of 0.5.
21Subjective Probability
- A subjective probability describes an individual's personal judgement about how likely a particular event is to occur. It is not based on any precise computation but is often a reasonable assessment by a knowledgeable person.
- Like all probabilities, a subjective probability is conventionally expressed on a scale from 0 to 1: a rare event has a subjective probability close to 0, a very common event has a subjective probability close to 1.
- A person's subjective probability of an event describes his/her degree of belief in the event.
- Example: A political commentator suggests that the Green Party may win the next election as they have put environment on the top of their agenda.
22Independent Events
- Two events are independent if the occurrence of one of the events gives us no information about whether or not the other event will occur; that is, the events have no influence on each other.
- In probability theory we say that two events, A and B, are independent if the probability that they both occur is equal to the product of the probabilities of the two individual events, i.e. $P(A \cap B) = P(A) \cdot P(B)$.
- For three events A, B and C: A and B are independent, A and C are independent, and B and C are independent (pairwise independence).
23Example of Independent Events
- Suppose that a man and a woman each have a pack of 52 playing cards. Each draws a card from his/her pack. Find the probability that they each draw the ace of clubs.
- We define the events
- A = the man draws the ace of clubs, with P(A) = 1/52
- B = the woman draws the ace of clubs, with P(B) = 1/52
- Clearly events A and B are independent, so
- $P(A \cap B) = P(A) \cdot P(B) = 1/52 \times 1/52 \approx 0.00037$
- That is, there is a very small chance that the man and the woman will both draw the ace of clubs.
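The card example can be verified by listing every possible pair of draws. The sketch below uses exact fractions; the deck representation and the names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]   # 52 cards
target = ("A", "clubs")

# Individual probabilities: one favourable card out of 52
p_a = Fraction(deck.count(target), len(deck))   # man draws the ace of clubs
p_b = Fraction(deck.count(target), len(deck))   # woman draws the ace of clubs

# Joint probability by counting all equally likely (man, woman) pairs
outcomes = list(product(deck, deck))
both = sum(1 for man, woman in outcomes if man == target and woman == target)
p_both = Fraction(both, len(outcomes))

print(p_both)                  # 1/2704
print(p_both == p_a * p_b)     # True
print(round(float(p_both), 5)) # 0.00037
```

Counting outcomes directly gives the same answer as multiplying the two individual probabilities, which is what independence requires.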
24Mutually Exclusive Events
- Two events are mutually exclusive (or disjoint) if it is impossible for them to occur together.
- Formally, two events A and B are mutually exclusive if and only if $A \cap B = \emptyset$.
- Examples
- Experiment: Rolling a die once
- Sample space S = {1, 2, 3, 4, 5, 6}
- Events: A = 'observe an odd number' = {1, 3, 5}
- B = 'observe an even number' = {2, 4, 6}
- $A \cap B = \emptyset$, the empty set, so A and B are mutually exclusive.
- A subject in a study cannot be both male and female, nor can they be aged 20 and 30. A subject could however be both male and 20, or both female and 30.
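The die example maps naturally onto Python sets. The sketch below is illustrative only; the helper `prob` is a hypothetical name, and it assumes all six faces are equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}                      # rolling a die once
odd = {n for n in sample_space if n % 2 == 1}          # A = {1, 3, 5}
even = {n for n in sample_space if n % 2 == 0}         # B = {2, 4, 6}


def prob(event):
    """Probability of an event when every outcome is equally likely."""
    return Fraction(len(event), len(sample_space))


intersection = odd & even
mutually_exclusive = odd.isdisjoint(even)

print(intersection)         # set()
print(mutually_exclusive)   # True
print(prob(intersection))   # 0

# A pair of events that is NOT mutually exclusive: 'odd' and 'at most 3'
at_most_three = {1, 2, 3}
print(odd & at_most_three == {1, 3})  # True, the events can occur together
```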
25Time Series
- A time series is a sequence of observations that are ordered in time (or space). If observations are made on some phenomenon throughout time, it is most sensible to display the data in the order in which they arose, particularly since successive observations will probably be dependent.
- Time series are best displayed in a scatter plot. The series value X is plotted on the vertical axis and time t on the horizontal axis. There are two kinds of time series data:
- Continuous, where we have an observation at every instant of time, e.g. electrocardiograms. We denote this using observation X at time t, X(t).
- Discrete, where we have an observation at (usually regularly) spaced intervals. We denote this as $X_t$.
26Time Series Plot
27Terms in Time Series
Trend Component: Trend is a long term movement in a time series. A trend pattern exists when there is a long-term secular increase or decrease in the data. It is the underlying direction (an upward or downward tendency) and rate of change in a time series. The existence of a trend (linear or non-linear) in the data means that successive values will be positively correlated with each other.
Cyclical Component: A cyclical pattern exists when the data are influenced by longer-term fluctuations such as those associated with the business cycle.
28Terms in Time Series
Seasonal Component: Seasonality is defined as a pattern that repeats itself over fixed intervals of time. For example, the costs of various types of fruits and vegetables, unemployment figures and average daily rainfall all show marked seasonal variation.
Irregular Component: The irregular component is that left over when the other components of the series (trend, seasonal and cyclical) have been accounted for.
29Terms in Time Series
- Smoothing: Smoothing techniques are used to reduce irregularities (random fluctuations) in time series data. They provide a clearer view of the true underlying behaviour of the series.
- Exponential Smoothing: Exponential smoothing is a smoothing technique used to reduce irregularities (random fluctuations) in time series data, thus providing a clearer view of the true underlying behaviour of the series.
- Moving Average: A moving average is a form of average that has been adjusted to allow for seasonal or cyclical components of a time series. Moving average smoothing is a smoothing technique used to make the long term trends of a time series clearer.
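The two smoothing ideas can be sketched in a few lines. These are the simplest textbook forms (an unweighted moving average and simple exponential smoothing), not necessarily the exact variants intended in the slides; the series, the window length and the smoothing constant `alpha` are assumptions chosen for illustration.

```python
def moving_average(series, window):
    """Unweighted moving average over each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = [series[0]]   # start from the first observation (one common choice)
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical series: an upward trend with irregular fluctuations
series = [10, 14, 11, 15, 13, 18, 15, 20]

ma3 = moving_average(series, window=3)
es = exponential_smoothing(series, alpha=0.5)

print([round(v, 2) for v in ma3])
print([round(v, 2) for v in es])
```

A longer window, or a smaller `alpha`, smooths more heavily but responds more slowly to genuine changes in the series.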
30Terms in Time Series
- Running medians smoothing is a smoothing technique analogous to that used for moving averages. The purpose of the technique is the same: to make a trend clearer by reducing the effects of other fluctuations.
- Differencing is a popular and effective method of removing trend from a time series. This provides a clearer view of the true underlying behaviour of the series.
- Autocorrelation is the correlation (relationship) between members of a time series of observations, such as weekly share prices or interest rates, and the same values at a fixed time interval later.
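The three techniques above can be sketched as small Python functions. These are illustrative forms under stated assumptions: first-order differencing, a fixed odd window for the running median, and one common estimator of the lag-k autocorrelation (deviations from the overall mean, divided by the total sum of squares). The function names and test series are hypothetical.

```python
import statistics


def running_median(series, window):
    """Median of each run of `window` consecutive values."""
    return [
        statistics.median(series[i:i + window])
        for i in range(len(series) - window + 1)
    ]


def difference(series):
    """First-order differencing: each value minus the one before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def autocorrelation(series, lag):
    """Sample autocorrelation between the series and itself `lag` steps later."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


# A single outlier barely affects the running median
print(running_median([1, 100, 2, 3, 4], window=3))   # [2, 3, 3]

# Differencing a linear trend leaves a constant series
print(difference([3, 5, 7, 9]))                      # [2, 2, 2]

# An alternating series is strongly negatively correlated at lag 1
alternating = [1, -1, 1, -1, 1, -1]
print(round(autocorrelation(alternating, lag=1), 4)) # -0.8333
```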
31Probability Distribution
- The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values. It is also sometimes called the probability function or the probability mass function.
- More formally, the probability distribution of a discrete random variable X is a function which gives the probability $p(x_i)$ that the random variable equals $x_i$, for each value $x_i$:
- $p(x_i) = P(X = x_i)$
- It satisfies the following conditions: $0 \le p(x_i) \le 1$ for every $x_i$, and $\sum_i p(x_i) = 1$.
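As an illustration, a probability distribution for a discrete random variable can be stored as a mapping from values to probabilities and checked against the two standard requirements: every probability lies between 0 and 1, and the probabilities sum to 1. The fair die used here is an assumed example, not one taken from the slides.

```python
from fractions import Fraction

# Probability mass function of a fair six-sided die: p(x_i) = P(X = x_i) = 1/6
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Condition 1: each probability lies between 0 and 1
each_in_range = all(0 <= p <= 1 for p in pmf.values())

# Condition 2: the probabilities of all possible values sum to 1
total = sum(pmf.values())

print(each_in_range)   # True
print(total)           # 1

# Probability of an event is the sum of p(x_i) over the values in the event
p_even = sum(p for face, p in pmf.items() if face % 2 == 0)
print(p_even)          # 1/2
```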