Title: The Scientific Method
The Scientific Method
- Lecture 3
 - Data Recording and Transformation
 
Data Recording, Transformation and Descriptive Statistics
- In this lesson, we will
 - Discuss considerations in recording and transforming data
 - Explore the use and purpose of descriptive or summary statistics
 - Explain the concepts of frequency distribution, bar chart, histogram, mean, median, mode, range, variance and standard deviation
- By the end of this exercise, you should be able to
 - Explain what is meant by data transformation and summarisation, using examples
 - Define the terms frequency distribution, bar chart, histogram, mean, median, mode, range, variance and standard deviation, and demonstrate their application to data sets
Recording Data
Field or experimental data must be recorded in a planned way. Variables under investigation may be
 - direct measurements, e.g. weight, length, amount, pH
 - category frequencies, e.g. numbers of a species, length range, colour
 - derived measurements, e.g. numbers / area, amount / time, amount / area / time
- Datalogging tables must
 - record all values needed to derive the value of a variable
 - permit derivation calculations to be recorded alongside the raw data
 - have clear headings and units for each value
 
Recording Data
Areas to be sampled can be measured out on a grid system (e.g. 0.5 metre intervals for 0.5 x 0.5 m quadrats) and sample quadrat positions chosen from random number tables.
Map references are given as Eastings (the distance east, read along the horizontal axis) first, then Northings (the distance north, read up the vertical axis) second.
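As a rough sketch of the same idea, a pseudo-random number generator can stand in for printed random number tables. The function name, plot dimensions and seed below are assumptions made for illustration, and each position is taken to mark one corner of a quadrat.

```python
import random


def choose_quadrats(width_m, height_m, interval_m=0.5, k=5, seed=None):
    """Return k distinct (easting, northing) grid positions chosen at random."""
    rng = random.Random(seed)  # a fixed seed makes the choice repeatable
    n_east = round(width_m / interval_m)    # grid steps along the easting axis
    n_north = round(height_m / interval_m)  # grid steps along the northing axis
    cells = [(e * interval_m, n * interval_m)
             for e in range(n_east) for n in range(n_north)]
    return rng.sample(cells, k)  # sampling without replacement


# Hypothetical 10 m x 5 m plot divided at 0.5 m intervals.
positions = choose_quadrats(width_m=10, height_m=5, k=5, seed=1)
for easting, northing in positions:
    print(f"easting {easting:.1f} m, northing {northing:.1f} m")
```

Each position is reported easting first, then northing, and no grid cell can be picked twice.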
Recording Data
Fixed independent variables, e.g. temperature, humidity and container size, which have the potential to affect the value of the dependent variable must also be recorded.
Preliminary experiments may be needed to set values for these, e.g. a temperature at which a bacterial culture will grow well.
Laboratory instruments must be calibrated before recording variable values, e.g.
 - pH meters are checked / reset against buffer solutions
 - spectrophotometers must be zeroed against a blank solution containing reagents but no product, then read against a range of known concentrations of the product; these readings are used to plot a calibration curve for the instrument
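A calibration curve can be sketched as a straight line fitted to the readings of the known standards. This assumes the instrument responds linearly over the range of the standards, which is not always true; the concentrations and absorbances below are idealised, made-up values, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical standards: the blank (0.0) plus four known concentrations.
standard_concentrations = [0.0, 1.0, 2.0, 3.0, 4.0]
standard_absorbances = [0.0, 0.1, 0.2, 0.3, 0.4]

slope, intercept = fit_line(standard_concentrations, standard_absorbances)


def concentration_from_absorbance(absorbance):
    """Read an unknown sample off the fitted calibration line."""
    return (absorbance - intercept) / slope


print(concentration_from_absorbance(0.25))
```

An unknown sample should only be read off the line within the range covered by the standards.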
Consistent rounding of decimal numbers (up or down!) and correct choice of significant figures to reflect the accuracy of measurements are very important. Rounding up is conventional in scientific work.
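Software does not always follow the same rounding rule as a hand calculation. Python's built-in `round`, for instance, sends an exact tie to the nearest even digit. The sketch below assumes that "rounding up" means a trailing 5 always rounds upwards and uses the standard `decimal` module to apply that rule consistently; the helper name `round_half_up` is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places):
    """Round a number given as text so that a trailing 5 always rounds up."""
    quantum = Decimal(1).scaleb(-places)  # places=1 gives Decimal('0.1')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


print(round_half_up("2.45", 1))  # tie resolved upwards
print(round_half_up("2.5", 0))   # tie resolved upwards
print(round(2.5))                # built-in round: tie goes to the even digit

# Three significant figures using the general number format.
three_sig_figs = f"{0.012345:.3g}"
print(three_sig_figs)
```

Passing the value as text avoids binary floating-point surprises, where a number such as 2.45 may not be stored exactly.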
Data Transformation
Summarises and highlights trends in the data, e.g.
 - Totals - sum all the data values for a variable, useful for comparison and other purposes
 - Percentages - describe the proportion of data falling into particular categories
 - Rates - show how a variable changes with time and allow comparison of data recorded over different time periods
 - Reciprocals (1 / variable) - reverse the magnitude of a variable and can help data interpretation
 - Relative values - expression of data in relation to a standard value, providing context or helping application, e.g. egg output per 1000 hens per month, energy requirement per kg body weight
Problem?
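The transformations listed above can be sketched in a few lines of Python. Every figure below is invented purely to show the arithmetic, and the function names are illustrative.

```python
# Hypothetical category frequencies.
counts = {"red": 12, "blue": 30, "green": 18}

# Total: sum of all data values for the variable.
total = sum(counts.values())

# Percentages: proportion of the data falling into each category.
percentages = {colour: 100 * n / total for colour, n in counts.items()}


def rate(amount, time):
    """Amount per unit time, so different recording periods can be compared."""
    return amount / time


def reciprocal(value):
    """1 / variable: large values become small and vice versa."""
    return 1 / value


def eggs_per_1000_hens_per_month(eggs, hens, months):
    """Relative value: egg output expressed against a standard flock size."""
    return eggs / hens * 1000 / months


print(total, percentages)
print(rate(30, 6), rate(45, 9))  # same rate despite different time periods
print(reciprocal(4))
print(eggs_per_1000_hens_per_month(eggs=52000, hens=2600, months=1))
```

The two rates come out equal even though the amounts were recorded over 6 and 9 time units, which is the comparison the raw amounts alone would hide.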
- Now attempt the two data transformation exercises in your workbook!
Descriptive Statistics
Three important mathematical descriptions of the distribution of data
 - Empirical frequency distributions
 - Measures of location
 - Measures of dispersion
Frequency Distributions
Show the frequency of occurrence of observations in a data set.
Qualitative, non-numerical and discrete data (for at least one variable) are usually depicted in a bar chart.
Descriptive Statistics
Continuous data are usually depicted in a histogram.
Class intervals must be even and clearly defined such that an observation can fall INTO ONE CLASS ONLY, e.g. 0 - 0.99, 1.00 - 1.99, 2.00 - 2.99, 3.00 - 3.99
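One way to guarantee that each observation falls into one class only is to compute its class from the value itself. The sketch below assumes classes of width 1.0 starting at 0, with each class including its lower limit and excluding its upper limit; the observations are hypothetical.

```python
import math
from collections import Counter

CLASS_WIDTH = 1.0  # assumed even class width


def class_lower_limit(x, width=CLASS_WIDTH):
    """Lower limit of the single class that contains observation x."""
    return math.floor(x / width) * width


observations = [0.4, 0.99, 1.0, 1.5, 2.3, 2.99, 3.2]  # hypothetical values
frequencies = Counter(class_lower_limit(x) for x in observations)

# Crude text histogram: one '#' per observation in each class.
for lower in sorted(frequencies):
    upper = lower + CLASS_WIDTH
    print(f"{lower:.2f} to <{upper:.2f}: {'#' * frequencies[lower]}")
```

A value of exactly 1.0 goes into the second class and nowhere else, so the class frequencies always add up to the number of observations.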
Descriptive Statistics
When comparing two or more frequency distributions in which the total numbers of observations differ, it is sometimes helpful to calculate relative frequency or cumulative relative frequency distributions.
A plot of a cumulative frequency distribution is called an ogive.
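Relative and cumulative relative frequencies can be sketched as follows. The two sets of class frequencies are hypothetical and deliberately differ in their total number of observations.

```python
from itertools import accumulate


def relative_frequencies(freqs):
    """Each class frequency as a proportion of all observations."""
    total = sum(freqs)
    return [f / total for f in freqs]


def cumulative_relative_frequencies(freqs):
    """Running total of the frequencies, as a proportion of all observations."""
    total = sum(freqs)
    return [running / total for running in accumulate(freqs)]


sample_a = [2, 5, 8, 5]       # hypothetical class frequencies, 20 observations
sample_b = [10, 20, 50, 20]   # hypothetical class frequencies, 100 observations

print(relative_frequencies(sample_a))
print(relative_frequencies(sample_b))
print(cumulative_relative_frequencies(sample_a))
```

Plotting the cumulative values against the upper class limits gives the cumulative curve, which always ends at 1.0 (or 100%) whatever the sample size.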
Measures of Location (Averages)
Average refers to several measures of the central tendency of a data set.
- mean - the sum of all the observations divided by the number of observations (n)
The mean is a good measure of central tendency when the data are distributed symmetrically but will be distorted by a few excessively small or large values of x (outliers).
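The effect of an outlier on the mean is easy to see with a small, made-up data set:

```python
from statistics import mean

values = [4, 5, 5, 6, 5]       # hypothetical, roughly symmetrical data
with_outlier = values + [50]   # the same data plus one very large value

print(mean(values))        # mean of the original data
print(mean(with_outlier))  # mean after adding the outlier
```

A single extreme observation pulls the mean well away from where most of the values lie.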
Measures of Location (Averages)
- median - the central value in a set of n observations arranged in rank order, with as many observations above it as below it. If n is an even number, the median is half-way between the central two values
- mode - the most commonly occurring observation in a data set. The modal class is the group or class into which most observations fall in a histogram
In a perfectly symmetrical, single-peaked data set, mean, median and mode have the same value
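These definitions can be checked with Python's standard `statistics` module. The data sets below are hypothetical and chosen so that the answers can be verified by eye.

```python
from statistics import mean, median, mode

odd_n = [3, 1, 7, 5, 9]    # n = 5: the median is the middle ranked value
even_n = [3, 1, 7, 5]      # n = 4: the median is half-way between 3 and 5

print(median(odd_n))
print(median(even_n))
print(mode([2, 3, 3, 4, 3, 5]))  # the most commonly occurring observation

# A symmetrical, single-peaked data set: all three averages agree.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(mean(symmetric), median(symmetric), mode(symmetric))
```

`median` ranks the observations itself, so the data do not need to be sorted first.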
NOW ATTEMPT THE EXERCISE IN YOUR WORKBOOK 
Measures of Dispersion
- Four main expressions of the spread of data
 - Range - the difference between the largest and the smallest observations
 - Interquartile range - the range of values enclosing the central 50% of the observations when they are arranged in order of magnitude (ranked)
 - Variance - determined by calculating the average of the squared deviations of each observation from the arithmetic mean
 - The variance is a very useful measure of data dispersion. Because some of the deviations will be negative, the deviations are squared to make them all positive and the variance ($s^2$) is calculated as
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
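The range and interquartile range listed above can be sketched with the standard library. Note that textbooks and software use several slightly different rules for locating quartiles, so a hand calculation may not match exactly; the sketch relies on the default method of `statistics.quantiles`, and the data are hypothetical.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical observations

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# Quartiles cut the ranked data into four equal parts;
# the interquartile range spans the central 50% of the observations.
q1, q2, q3 = quantiles(data, n=4)
interquartile_range = q3 - q1

print(data_range, q1, q3, interquartile_range)
```

Unlike the range, the interquartile range ignores the extreme values at either end of the ranked data.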
Measures of Dispersion
$s^2$ is used to denote the sample variance and distinguish it from the population variance, which is given the symbol $\sigma^2$ and calculated by dividing by n rather than (n - 1).
$\sigma^2$ = (mean of the squares) minus (the square of the mean), i.e. $\sigma^2 = \frac{\sum x^2}{n} - \mu^2$
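The difference between the two divisors, and the "mean of the squares minus the square of the mean" shortcut, can be checked numerically. The data set below is hypothetical and the function names are illustrative.

```python
def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def population_variance(xs):
    """Sum of squared deviations from the mean, divided by n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def population_variance_shortcut(xs):
    """Mean of the squares minus the square of the mean."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

print(sample_variance(data))
print(population_variance(data))
print(population_variance_shortcut(data))
```

The shortcut agrees with the population formula, while the sample variance is slightly larger because it divides by (n - 1). With real measurements the shortcut can lose precision when the mean is large relative to the spread, so the deviation form is usually the safer choice in software.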
Measures of Dispersion
- Standard deviation (s or SD) is the square root of the variance and the most popular measure of dispersion
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
and represents the typical size of the deviations of the observations from the arithmetic mean.
The population standard deviation ($\sigma$) is calculated by using n rather than (n - 1), in the same way that $\sigma^2$ represents the population variance. The population mean is given the symbol µ (mu).
Exercise!
Calculate the range, variance and standard deviation of the following data set.
The plasma vitamin E concentrations (µmol/l) in 12 heifers showing clinical signs of a muscle condition were as follows: 4.2, 3.3, 7.0, 6.9, 5.1, 3.4, 2.5, 8.6, 3.5, 2.9, 4.9, 5.4
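Once the exercise has been attempted by hand, the answer can be checked in Python. The sketch below assumes the 12 heifers are treated as a sample, so the variance and standard deviation use the (n - 1) divisor.

```python
from statistics import stdev, variance

# Plasma vitamin E concentrations as listed in the exercise.
vitamin_e = [4.2, 3.3, 7.0, 6.9, 5.1, 3.4, 2.5, 8.6, 3.5, 2.9, 4.9, 5.4]

data_range = max(vitamin_e) - min(vitamin_e)
sample_var = variance(vitamin_e)  # divides by (n - 1)
sample_sd = stdev(vitamin_e)      # square root of the sample variance

print(f"range = {data_range:.1f}")
print(f"variance = {sample_var:.2f}")
print(f"standard deviation = {sample_sd:.2f}")
```

If the 12 animals were instead regarded as the whole population, `pvariance` and `pstdev` from the same module would apply the n divisor.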