Title: The Scientific Method
1The Scientific Method
- Lecture 3
- Data Recording and Transformation
2Data Recording, Transformation and Descriptive
Statistics
- In this lesson, we will
  - Discuss considerations in recording and transforming data
  - Explore the use and purpose of descriptive or summary statistics
  - Explain the concepts of frequency distribution, bar chart, histogram, mean, median, mode, range, variance, standard deviation
- By the end of this exercise, you should be able to
  - Explain what is meant by data transformation and summarisation, using examples
  - Define the terms frequency distribution, bar chart, histogram, mean, median, mode, range, variance, standard deviation and demonstrate their application to data sets
3Recording Data
Field or experimental data must be recorded in a planned way.
- Variables under investigation may be
  - direct measurements, e.g. weight, length, amount, pH
  - category frequencies, e.g. numbers of a species, length range, colour
  - derived measurements, e.g. numbers / area, amount / time, amount / area / time
- Datalogging tables must
  - record all values needed to derive the value of a variable
  - permit derivation calculations to be recorded alongside the raw data
  - have clear headings and units for each value
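As an illustration of a datalogging table that keeps raw values, units and the derivation together, here is a minimal Python sketch. The record type `QuadratRecord`, its field names and the two sample rows are hypothetical, chosen only to show a derived measurement of the "numbers / area" kind.

```python
from dataclasses import dataclass


@dataclass
class QuadratRecord:
    """One row of a hypothetical datalogging table."""
    quadrat_id: str
    count: int       # raw value: number of individuals counted
    side_m: float    # raw value: quadrat side length in metres

    @property
    def area_m2(self) -> float:
        # Derivation step kept alongside the raw data.
        return self.side_m * self.side_m

    @property
    def density_per_m2(self) -> float:
        # Derived measurement: numbers / area.
        return self.count / self.area_m2


# Hypothetical rows, for illustration only.
records = [
    QuadratRecord("Q1", count=6, side_m=0.5),
    QuadratRecord("Q2", count=3, side_m=0.5),
]

# Headings carry the units for every column.
print("quadrat  count  side (m)  area (m^2)  density (per m^2)")
for r in records:
    print(f"{r.quadrat_id:7}  {r.count:5}  {r.side_m:8.2f}  "
          f"{r.area_m2:10.2f}  {r.density_per_m2:17.1f}")
```

Because the density is computed from the stored count and side length, every value needed to re-derive it remains in the record.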
4Recording Data
Areas to be sampled can be measured out on a grid system (e.g. 0.5 metre intervals for 0.5 x 0.5 m quadrats) and sample quadrat positions chosen from random number tables.
Map references are given as Eastings (the distance along the horizontal axis) first, then Northings (the distance up the vertical axis) second.
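The grid sampling described above can be sketched in Python. This is only an illustration: a seeded pseudo-random generator stands in for the random number tables, and the function name `choose_quadrats`, the 10 m x 5 m plot and the sample size are all assumptions.

```python
import random


def choose_quadrats(width_m, height_m, spacing_m, n_samples, seed=None):
    """Pick n_samples distinct grid positions as (easting, northing) in metres."""
    rng = random.Random(seed)
    n_cols = round(width_m / spacing_m)    # grid positions along the easting axis
    n_rows = round(height_m / spacing_m)   # grid positions along the northing axis
    cells = [(col, row) for col in range(n_cols) for row in range(n_rows)]
    chosen = rng.sample(cells, n_samples)  # sampling without replacement
    return [(col * spacing_m, row * spacing_m) for col, row in chosen]


# Hypothetical plot: 10 m by 5 m, marked out at 0.5 m intervals.
positions = choose_quadrats(width_m=10, height_m=5, spacing_m=0.5,
                            n_samples=5, seed=1)
for easting, northing in positions:
    # Easting first, then northing.
    print(f"easting {easting:.1f} m, northing {northing:.1f} m")
```

Fixing the seed makes the chosen positions reproducible, which helps when the sampling plan has to be recorded.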
5Recording Data
Fixed independent variables (e.g. temperature, humidity, container size) which have the potential to affect the value of the dependent variable must also be recorded.
Preliminary experiments may be needed to set values for these, e.g. a temperature at which a bacterial culture will grow well.
Laboratory instruments must be calibrated before recording variable values, e.g.
- pH meters are checked / reset against buffer solutions
- spectrophotometers must be zeroed against a blank solution containing reagents but no product, then read against a range of known concentrations of the product. These readings are used to plot a calibration curve for the instrument
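A calibration curve of this kind can be sketched as a straight-line fit. The example below assumes the instrument responds linearly over the range of the standards, which should be checked for a real assay; the concentrations and absorbance readings are hypothetical values, not measurements.

```python
# Hypothetical standards: (known concentration, absorbance read after
# zeroing against the blank). Values are for illustration only.
standards = [(0.0, 0.00), (2.0, 0.21), (4.0, 0.39), (6.0, 0.61), (8.0, 0.80)]


def fit_line(points):
    """Least-squares straight line y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(standards)


def concentration_from_absorbance(absorbance):
    """Read an unknown sample off the fitted calibration line."""
    return (absorbance - intercept) / slope


print(f"absorbance = {slope:.4f} * concentration + {intercept:.4f}")
print(f"absorbance 0.50 corresponds to concentration "
      f"{concentration_from_absorbance(0.50):.2f}")
```

Unknown samples should only be read within the range covered by the standards, since the fitted line says nothing about the response outside it.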
Consistent rounding of decimal numbers (up or down!) and correct choice of significant figures to reflect the accuracy of measurements are very important. Rounding up is conventional in scientific work.
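One way to apply a single rounding rule consistently is sketched below with Python's `decimal` module. It assumes that "rounding up" means a trailing 5 is rounded upwards (round half up); the helper name `round_half_up` is hypothetical. Python's built-in `round()` is not used because it rounds halves to the nearest even digit and works on binary floating-point values.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: str, places: int) -> Decimal:
    """Round a decimal string to a fixed number of places, halves going up."""
    quantum = Decimal(1).scaleb(-places)   # places=1 gives Decimal('0.1')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


# Values are passed as strings so they are stored exactly as written.
for raw in ["2.25", "2.35", "7.045"]:
    print(raw, "->", round_half_up(raw, 1))
```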
6Data Transformation
Summarises and highlights trends in the data, e.g.
- Totals - sum all the data values for a variable, useful for comparison and other purposes
- Percentages - describe the proportion of data falling into particular categories
- Rates - show how a variable changes with time and allow comparison of data recorded over different time periods
- Reciprocals (1 / variable) - reverse the magnitude of a variable and can help data interpretation
- Relative values - expression of data in relation to a standard value, providing context or helping application, e.g. egg output per 1000 hens per month, energy requirement per kg body weight. Problem?
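The transformations listed above can be illustrated with a short Python sketch. The two farms and all their figures are hypothetical, and the relative value is expressed per 1000 hens per day rather than per month simply to keep the arithmetic short.

```python
# Hypothetical records: eggs collected on two farms over different periods.
farms = [
    {"name": "A", "eggs": 54000, "hens": 2000, "days": 30},
    {"name": "B", "eggs": 21000, "hens": 1200, "days": 14},
]

total_eggs = sum(f["eggs"] for f in farms)   # total

results = {}
for f in farms:
    eggs_per_day = f["eggs"] / f["days"]     # rate: allows for different periods
    results[f["name"]] = {
        "share_pct": 100 * f["eggs"] / total_eggs,    # percentage of the total
        "eggs_per_day": eggs_per_day,
        "days_per_egg": 1 / eggs_per_day,             # reciprocal of the rate
        # relative value: output per 1000 hens per day
        "eggs_per_1000_hens_per_day": 1000 * eggs_per_day / f["hens"],
    }

print(f"total eggs: {total_eggs}")
for name, r in results.items():
    print(f"farm {name}: {r['share_pct']:.0f}% of total, "
          f"{r['eggs_per_day']:.0f} eggs/day, "
          f"{r['eggs_per_1000_hens_per_day']:.0f} eggs per 1000 hens per day")
```

In this made-up example farm A has the larger total and the higher daily rate, but farm B has the higher output per 1000 hens, which is the kind of context a relative value provides.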
7- Now attempt the two data transformation exercises
in your workbook!
8Descriptive Statistics
Three important mathematical descriptions of the distribution of data
- Empirical frequency distributions
- Measures of location
- Measures of dispersion
Frequency Distributions - show the frequency of occurrence of observations in a data set.
Qualitative, non-numerical and discrete data (for at least one variable) are usually depicted in a bar chart.
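A frequency distribution for qualitative data can be sketched as a simple tally. The flower colour observations below are hypothetical, and the text bars only stand in for a properly drawn bar chart.

```python
from collections import Counter

# Hypothetical qualitative observations: flower colour of 12 plants.
observations = ["red", "white", "red", "pink", "red", "white",
                "pink", "red", "white", "red", "pink", "red"]

frequencies = Counter(observations)   # category -> number of occurrences

# most_common() lists categories from most to least frequent.
for category, count in frequencies.most_common():
    print(f"{category:6} {'#' * count} ({count})")
```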
10Descriptive Statistics
Continuous data are usually depicted in a histogram.
Class intervals must be even and clearly defined such that an observation can fall INTO ONE CLASS ONLY, e.g. 0 - 0.99, 1.00 - 1.99, 2.00 - 2.99, 3.00 - 3.99
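The rule that each observation falls into exactly one class can be sketched as follows. The measurements are hypothetical and assumed to be recorded to two decimal places; the helper `class_index` treats each class as running from its lower limit up to, but not including, the next lower limit.

```python
import math


def class_index(value, lower, width):
    """Index of the class interval [lower + i*width, lower + (i+1)*width)."""
    return math.floor((value - lower) / width)


# Hypothetical continuous measurements, recorded to two decimal places.
values = [0.4, 1.0, 1.99, 2.5, 2.0, 3.7, 0.99, 1.5]
width = 1.0
counts = [0] * 4   # four equal classes starting at 0

for v in values:
    counts[class_index(v, lower=0.0, width=width)] += 1

for i, count in enumerate(counts):
    low = i * width
    print(f"{low:.2f} - {low + width - 0.01:.2f} | {'#' * count} ({count})")
```

Because every class has the same width, the height of each bar is directly comparable with the others.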
11Descriptive Statistics
It is sometimes helpful, when comparing two or more frequency distributions where the total numbers of observations differ, to calculate relative frequency or cumulative relative frequency distributions.
A plot of cumulative frequency is called an ogive.
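Relative and cumulative relative frequencies can be computed from class counts as in the sketch below. The two samples are hypothetical and deliberately differ in size, to show that the relative values put them on the same 0 to 1 scale.

```python
from itertools import accumulate


def relative_frequencies(counts):
    """Each class count as a fraction of the total number of observations."""
    total = sum(counts)
    return [c / total for c in counts]


def cumulative_relative_frequencies(counts):
    """Running total of the counts, as a fraction of the overall total."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


# Hypothetical class counts for two samples of different size.
sample_a = [2, 3, 2, 1]       # 8 observations
sample_b = [10, 20, 15, 5]    # 50 observations

for name, counts in [("A", sample_a), ("B", sample_b)]:
    print(name, relative_frequencies(counts),
          cumulative_relative_frequencies(counts))
```

Plotting the cumulative values against the upper class limits gives the points of the ogive.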
12Measures of Location (Averages)
Average refers to several measures of the central tendency of a data set.
The mean (arithmetic mean) is the sum of the observations divided by the number of observations, n.
The mean is a good measure of central tendency when the data are distributed symmetrically but will be distorted by a few excessively small or large values of x (outliers).
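The effect of an outlier on the mean can be seen in a small Python sketch. Both data sets are hypothetical; the second differs from the first only in its largest value.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)


symmetric = [4, 5, 6, 7, 8]        # hypothetical, evenly spread about 6
with_outlier = [4, 5, 6, 7, 48]    # same data with one very large value

print(mean(symmetric))
print(mean(with_outlier))
```

A single extreme observation is enough to pull the mean well away from where most of the data lie.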
13Measures of Location (Averages)
- median - the central value in a set of n observations arranged in rank order, with as many observations above it as below it. If n is an even number, the median is half-way between the two central values
- mode - the most commonly occurring observation in a data set. The modal class is the group or class into which most observations fall in a histogram
In a perfectly symmetrically distributed data set with a single peak, mean, median and mode have the same value
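The definitions of median and mode can be written out directly, as in this minimal sketch. The function names and the example data are hypothetical; `modes` returns a list because a data set can have more than one most common value.

```python
from collections import Counter


def median(values):
    """Central value of the ranked observations."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    # Even n: half-way between the two central values.
    return (ranked[mid - 1] + ranked[mid]) / 2


def modes(values):
    """Most commonly occurring observation(s), in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical symmetric data set with a single peak at 3.
data = [1, 2, 3, 3, 3, 4, 5]
print(sum(data) / len(data), median(data), modes(data))
```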
NOW ATTEMPT THE EXERCISE IN YOUR WORKBOOK
14Measures of Dispersion
- Four main expressions of the spread of data
  - Range - the difference between the largest and the smallest observations
  - Interquartile range - the range of values enclosing the central 50% of the observations when they are arranged in order of magnitude (ranked)
  - Variance - determined by calculating the average of the squared deviation of each observation from the arithmetic mean
- The variance is a very useful measure of data dispersion. Because some of the deviations will be negative, the deviations are squared to make them all positive and the variance ( $s^2$ ) is calculated as
  $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
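The range and interquartile range from the list above can be sketched with Python's standard `statistics` module (Python 3.8 or later). The observations are hypothetical. Note that conventions for locating quartiles differ between textbooks and software, so the interquartile range found by hand may differ slightly from the value computed here.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 12]   # hypothetical observations

# Range: largest minus smallest observation.
data_range = max(data) - min(data)

# quantiles(..., n=4) returns the three cut points that divide the ranked
# data into quarters; this uses the module's default ("exclusive") method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1   # spread of the central 50% of the observations

print(f"range = {data_range}")
print(f"interquartile range = {iqr}")
```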
15Measures of Dispersion
$s^2$ is used to denote the sample variance and distinguish it from the population variance, given the symbol $\sigma^2$ and calculated by dividing by $n$ rather than $(n - 1)$
$\sigma^2$ = (mean of the squares) minus (the square of the mean), i.e. $\sigma^2 = \frac{\sum x^2}{n} - \mu^2$
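The difference between the sample and population variance, and the "mean of squares minus square of the mean" shortcut, can be checked numerically. This is a sketch with hypothetical data and hypothetical function names.

```python
def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def population_variance_shortcut(values):
    """Mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations

print(sample_variance(data))
print(population_variance(data))
print(population_variance_shortcut(data))
```

The shortcut reproduces the divide-by-n value; the sample variance is always a little larger because it divides by n - 1.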
16Measures of Dispersion
- Standard deviation ( $s$ or SD) is the square root of the variance and the most popular measure of dispersion
  $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
  and represents a (root mean square) average of the deviations of the observations from the arithmetic mean
The population standard deviation ( $\sigma$ ) is calculated by using $n$ rather than $(n - 1)$, in the same way that $\sigma^2$ represents the population variance. The population mean is given the symbol µ (mu)
Exercise!
Calculate the range, variance and standard deviation of the following data set. The plasma vitamin E concentrations (µmol/l) in 12 heifers showing clinical signs of a muscle condition were as follows: 4.2, 3.3, 7.0, 6.9, 5.1, 3.4, 2.5, 8.6, 3.5, 2.9, 4.9, 5.4
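Once the exercise has been attempted by hand, the answer can be checked with Python's standard `statistics` module. This sketch assumes the 12 heifers are treated as a sample, so the variance divides by n - 1; the variable names are hypothetical.

```python
import statistics

# Plasma vitamin E concentrations (µmol/l) from the exercise.
vitamin_e = [4.2, 3.3, 7.0, 6.9, 5.1, 3.4, 2.5, 8.6, 3.5, 2.9, 4.9, 5.4]

data_range = max(vitamin_e) - min(vitamin_e)
variance = statistics.variance(vitamin_e)   # sample variance, divides by n - 1
sd = statistics.stdev(vitamin_e)            # square root of the sample variance

print(f"n = {len(vitamin_e)}")
print(f"range = {data_range:.1f} µmol/l")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {sd:.2f} µmol/l")
```

If the data were instead regarded as a complete population, `statistics.pvariance` and `statistics.pstdev` divide by n.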