Scientific Methods 1 - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Scientific Methods 1
Scientific evaluation, experimental design &
statistical methods
COMP80131 Lecture 2: Statistical Methods-Basics
  • Barry & Goran

www.cs.man.ac.uk/barry/mydocs/myCOMP80131
2
Scientific Methods 1
  • Scientific evaluation: derivation of useful &
    reliable statements about some new or existing
    scientific idea, based on an accumulation of
    evidence which is often in the form of tables of
    numerical values.
  • Experimental design: how to generate the
    quantifiable outputs, the systematic observation &
    measurement of these outputs, and the recording
    of the resulting data. The experiments are
    normally designed to test some theoretical
    prediction of what the researcher expects to
    happen: a research hypothesis.
  • Statistical methods: the means of deriving the
    required useful and reliable statements from
    numerical evidence.

3
Scientific Enquiry
  • It may be argued that:
  • Scientific researchers propose hypotheses as
    explanations of phenomena & design experimental
    studies to test these hypotheses.
  • It may also be argued otherwise.
  • Wider domains of inquiry may combine many
    independently derived hypotheses.
  • Or not have hypotheses at all, other than
    contrived ones such as:
  • "This idea can (not) be implemented"

4
Philosophy of Science
  • Concerns the underpinning logic of the
    scientific method, what
    separates science from non-science, &
    the ethics implicit in science.
  • Assumes reality is objective and consistent,
    humans have the capacity to perceive
    reality accurately, & rational
    explanations exist for elements of the real
    world.
  • Logical Positivism & other theories claim to have
    defined the logic of science, but have all
    been challenged.
  • Ludwig Wittgenstein (1889-1951) was a research
    student in aeronautics at Manchester (his PhD
    was from Cambridge)

5
Objectivity, repeatability & full disclosure
  • Scientific inquiry is intended to be as objective
    as possible, to reduce biased interpretations of
    results.
  • Procedures must be reproducible (i.e. repeatable)
  • Researchers should
  • document, archive and share all data and
    methodology so they are available for careful
    scrutiny by other scientists, giving them the
    opportunity to verify results by attempting to
    reproduce them.
  • This practice is called full disclosure.
  • Allows the methodology & the statistical
    reliability of the data to be verified.

6
References on Statistics
  • DJ Hand, "Statistics: A Very Short Introduction",
    Oxford UP, 2008
  • Schaum's Outlines, "Prob & Stats", 2009
  • WG Hopkins, "A New View of Statistics" (Google it)
  • "Why is my evil lecturer forcing me to learn
    statistics?" (Google it & forget it!!)

7
Tables of Results
   Engli Maths  Phys  Chem  Hist  Fren Music   Art  Avge
      81    67    60   104    89    97    72    30  75.0
      91    32    42    34    24    65    81    61  53.8
      13   123    45    22    92    61   114    11  60.1
      91    65    80    23    95    47   101    33  66.9
      63    58    44     6    38    58    36    21  40.5
      10    28    69    24    84    91    20   102  53.5
      28    20    60    18    46    38    -3    79  35.8
      55     0    44    85    35    23    11   112  45.6
      96    38    49    17    11    42    45    48  43.3
      96    21    48    83    80    27     8   101  58.0
      16    68    55    35    69    44    40    55  47.8
      97    41    64    13    91    63   -13    33  48.6
      96   100    34    19    34    53    81   -10  50.9
      49    92    70    17    13    39    63   -19  40.5
      80    55    58     3    58    87    68    28  54.6
      14    42    45    95    63    30    64    46  49.9
      42    82    49    19    88    40    42    16  47.3
      92    18    53    80     0    52   -17   108  48.3

A fictitious set of exam results. A sample of 20
students out of a population of 1000. Complete
file is ExamData.xls or ExamData.dat at
www.cs.man.ac.uk/barry
8
A bit of MATLAB
  • [Marks, Headings] = xlsread('ExamData.xls');
  • [nRows, nCols] = size(Marks);
  • Headings(1, 1:nCols)   % no semicolon, so the headings are displayed
  • Marks

Reads in marks from Excel spreadsheet into an
array Marks. Headings read in separately. Miss
out ';' to display. '%' is comment.
9
A bit more MATLAB
  • % Row with mean of each column
  • Means = mean(Marks);
  • % Row with standard deviations of cols
  • St_devs = std(Marks);
  • % Row with variances of cols
  • Variances = var(Marks);

Statistics printed out:

            Engli  Maths   Phys   Chem   Hist   Fren  Music    Art   Avge
Means        52.2   49.2   49.7   49.6   55.7   51.0   48.4   50.7   50.8
Std_devs     28.2   27.2   10.5   31.5   33.3   28.6   33.4   34.1    8.7
Variances     795    741    110    990   1109    819   1115   1165   75.5
10
Definitions: mean
  • 46
  • 8
  • 50
  • 6
  • 99
  • -42
  • 30
  • 23
  • 16
  • 38
  • 60
  • -3
  • 45
  • 0
  • 30

Here is a col of marks, say for French. The mean
is the average. It is about 27. This is a
statistic which summarizes the column of
data. Alternatives exist, e.g. median & mode. It
allows comparisons to be made. If the average is
31 next year, we can hypothesise that the
students are better or better taught, or the exam
was easier (or maybe the exam room was
warmer). (Is the increase of 4 statistically
significant?)
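As a quick check of the figure quoted above, the column can be typed into MATLAB directly. This is only a sketch: the variable name french is a label chosen here, following the slide's "say for French".

```matlab
% Column of marks copied from the slide (15 values).
french = [46 8 50 6 99 -42 30 23 16 38 60 -3 45 0 30]';

m   = mean(french);     % arithmetic mean: sum of marks / number of marks
med = median(french);   % middle value once the marks are sorted
mo  = mode(french);     % most frequently occurring mark

fprintf('mean = %.2f, median = %g, mode = %g\n', m, med, mo);
```

The mean comes out at 406/15, i.e. about 27, while the median and mode are both 30: three different single-number summaries of the same column.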
11
Definitions: variance
  • 46
  • 8
  • 50
  • 6
  • 99
  • -42
  • 30
  • 23
  • 16
  • 38
  • 60
  • -3
  • 45
  • 0
  • 30

On the right is another column:
28 26 29 25 30 24 27 26 28 27 28 26 25 29 27
Its mean is also 27, but it is much less spread
out: its variance is less. All students are getting
close to the same mark. Maybe the exam is not well
designed to test ability. If there are N marks,
subtract the mean from each of them, square the
differences, add up the squared values, then divide
by N-1.
Another statistic: 1068 (left), 2.86 (right).
A measure of spread.
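The recipe given in words can be written compactly. A sketch in standard notation, where the symbols x_i, \bar{x} and s^2 are labels introduced here rather than taken from the slide:

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i ,
\qquad
s^2 = \frac{1}{N-1}\sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2
```

For the right-hand column, N = 15 and \bar{x} = 27. The squared deviations add up to 40, so s^2 = 40/14, which is about 2.86 and agrees with the value quoted.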
12
Definitions: std_deviation
  • 46
  • 8
  • 50
  • 6
  • 99
  • -42
  • 30
  • 23
  • 16
  • 38
  • 60
  • -3
  • 45
  • 0
  • 30

Right column: 28 26 29 25 30 24 27 26 28 27 28 26 25 29 27
This is the square root of the variance. Also a
measure of spread. Yet another statistic:
32.7 (left), 1.69 (right). Many alternatives exist.
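Both spread statistics can be checked in a few lines of MATLAB. This is a minimal sketch that uses the left column as it appears on the mean and variance slides; the variable names left, right, N and sManual are chosen here for illustration.

```matlab
% The two columns of marks from the slides (15 values each).
left  = [46 8 50 6 99 -42 30 23 16 38 60 -3 45 0 30]';
right = [28 26 29 25 30 24 27 26 28 27 28 26 25 29 27]';

% var and std divide by N-1 by default, matching the definition given.
fprintf('left:  var = %.1f, std = %.2f\n', var(left),  std(left));
fprintf('right: var = %.2f, std = %.2f\n', var(right), std(right));

% The same standard deviation for the right column, computed by hand.
N = numel(right);
sManual = sqrt(sum((right - mean(right)).^2) / (N - 1));
fprintf('right, by hand: std = %.2f\n', sManual);
```

Running it should reproduce the quoted values: roughly 1068 and 32.7 for the left column, 2.86 and 1.69 for the right.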
13
Population-mean & sample-mean
  • Simplest statistic is probably the mean or
    average.
  • Given a table of 20 marks, average is easily
    found & understood.
  • Questions arise if we consider this batch of
    students to be a sample of a much larger
    population of say 1000 students taking exams.
  • How representative is this batch's average,
    called a sample-mean, likely to be of the mean
    for the whole population, i.e. the population
    mean?
  • A question that arises all the time in
    statistical methods.
  • A 2nd example: if there is a population of 50
    million people in the UK and we take a sample of
    1000 people, measure their heights & compute the
    average, how close will this sample mean be to
    the true mean for the whole population?
  • How reliable will sample-mean be as estimate of
    population-mean?
  • Same question can be asked about other
    statistics, e.g. variance.

14
Back to MATLAB
  • Divide the 1000 marks into batches & compute the
    sample mean for each batch.

True Means 52.2 49.2 49.7 49.6 55.7 51.0 48.4 50.7 50.8
------------------------------------------------------------
Means 50.0 58.7 51.0 46.7 43.7 62.3 61.1 36.9 51.3 52.7
Means 48.5 51.8 57.8 47.2 45.6 47.7 53.7 50.6 48.0 44.5
Means 49.5 48.6 30.9 53.9 43.7 53.6 46.6 50.4 56.9 48.4
Means 44.5 68.2 48.1 55.9 48.0 52.5 54.0 42.2 50.3 56.8
Means 52.2 39.9 38.1 69.9 50.4 61.9 57.2 50.6 49.5 59.8
Means 59.0 61.5 39.5 54.9 42.6 44.0 50.6 41.0 62.1 48.9
Means 44.6 56.1 48.7 49.9 44.3 48.4 39.1 52.4 56.6 43.5
Means 62.8 49.6 55.7 42.9 48.8 42.1 60.7 66.5 41.8 55.2
Means 51.7 52.3 53.2 48.2 48.1 69.1 49.8 57.0 50.1 53.4
Means 49.9 47.4 54.1 50.4 67.2 51.6 42.9 56.1 52.5 44.9
Means 55.8 46.1 48.5 55.8 54.7 54.5 39.3 49.9 43.8 53.1
Means 50.4 44.1 55.5 46.6 47.8 41.7 47.9 57.5 53.7 51.5
Means 52.8 67.2 47.8 46.7 53.3 53.8 46.9 51.3 48.5 58.6
Means 47.0 48.6 56.4 50.3 50.9 56.4 50.0 52.1 42.5 50.5
Means 54.2 50.0 52.3 51.0 52.3 50.9 50.8 63.5 48.6 58.6
Means 56.3 51.1 54.0 53.9 64.0 48.8 50.8 44.3 62.2 61.8
Means 40.9 53.3 52.8 56.9 51.2 61.1 57.6 56.8 50.1 37.6
Means 53.0 55.9 38.8 47.2 49.0 62.2 49.1 39.4 54.6 49.5
Means 47.8 51.4 48.2 45.9 48.2 53.6 54.0 43.6 49.1 48.3
Means 38.9 51.9 52.0 60.7 44.1 44.2 70.8 51.3 49.9 46.8
Means 52.6 54.9 54.9 50.8 43.8 53.5 50.9 58.3 40.1 48.9
Means 52.5 68.1 53.3 46.1 60.1 53.4 52.0 48.3 51.5 55.5
Means 60.0 45.7 45.5 45.7 50.5 51.8 44.8 50.1 54.2 65.9

Sample means for 50 batches of 20. Look at col 1
(Engl).
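The batching step can be sketched in MATLAB as follows. The lecture's ExamData file is not reproduced here, so the code uses a hypothetical stand-in column x of 1000 uniformly distributed marks; batchSize, nBatches and sampleMeans are names chosen for illustration, and the printed numbers will not match the slide.

```matlab
% Hypothetical stand-in for one column of 1000 marks (NOT the lecture data).
rng(0);                          % fix the seed so the run is repeatable
x = 100 * rand(1000, 1);         % uniform marks between 0 and 100

batchSize = 20;
nBatches  = numel(x) / batchSize;               % 50 batches of 20

% reshape puts each consecutive batch of 20 marks into its own column,
% and mean() then averages down each column.
batches     = reshape(x, batchSize, nBatches);
sampleMeans = mean(batches);                    % 1-by-50 row of sample means

fprintf('population mean      = %.2f\n', mean(x));
fprintf('mean of sample means = %.2f\n', mean(sampleMeans));
fprintf('var of sample means  = %.2f\n', var(sampleMeans));
```

Changing batchSize to 50 or 100 (any divisor of 1000) gives the other groupings considered in the slides that follow. With equal-sized batches the mean of the sample means always equals the population mean, whatever the data.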
15
50 batches of 20 (column 1)
Look at spread over all batches for column 1.
Remember pop-mean = 52.2
Mean (of sample-means) = 52.2, Variance = 32
16
20 batches of 50 (column 1)
Variance has reduced. Mean of sample-means
= 52.2, Variance = 18.2
17
10 batches of 100
Mean of sample-means = 52.2, Variance = 7.28
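The shrinking variance across these three slides follows a standard rule: for n independent marks drawn from a population with variance \sigma^2, the variance of the sample mean is \sigma^2 / n. A rough check, taking \sigma^2 \approx 795 for column 1 from the printed statistics (the symbols \sigma^2, n and \bar{x}_n are notation introduced here):

```latex
\operatorname{Var}(\bar{x}_n) = \frac{\sigma^2}{n}
\quad\Longrightarrow\quad
\frac{795}{20} \approx 39.8, \qquad
\frac{795}{50} = 15.9, \qquad
\frac{795}{100} = 7.95
```

These are in the same region as the observed 32, 18.2 and 7.28. Exact agreement is not expected: each observed figure is itself an estimate from a limited number of batches, and the batches partition a finite population rather than being independent draws.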
18
Distributions
  • Histogram divides domain (x-axis) into say 10 or
    20 regions & plots the number of marks that fall
    in each region.
  • In MATLAB:
  • figure(1); hist(Marks(:,1),20);
  • figure(2); hist(Marks(:,2),20);
  • figure(3); hist(Marks(:,3),20);  % etc.

19
Histogram for col 1 (English)
Evenly distributed across the domain. Looks like
a uniform distribution
20
Histogram for col 2 (Maths)
Looks a bit Gaussian or normal. Mean ≈ 50
21
Histogram for col 3 (Phys)
Also looks Gaussian. Mean ≈ 50, with smaller
variance
22
Histogram for col 4 (Chem)
Bi-modal distribution
23
Column 5 (Hist)
A bit strange
24
Col 6 (French)
Uniform again?
25
Column 7 (Music)
Gaussian again?
26
Col 8 (Art)
Gaussian again?
27
Col 9 (Average)
Gaussian?
28
Some questions for you
  • Analyse the fictitious exam results & comment on
    features.
  • Compute means, stds & vars for each subject &
    plot histograms for the distributions.
  • Make observations about performance in each
    subject & overall.
  • Do marks support the hypothesis that people good
    at Music are also good at Maths?
  • Do they support the hypothesis that people good
    at English are also good at French?
  • Do they support the hypothesis that people good
    at Art are also good at Maths?
  • If you have access to only 50 rows of this data,
    investigate the same hypotheses
  • What conclusions could you draw, and with what
    degree of certainty?

29
Correlation
  • Measure of how two columns are related.
  • Let cols be x and y
  • Correlation coefficient
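Assuming the coefficient meant here is the usual Pearson sample correlation (which is what MATLAB's corr computes by default), it can be written as follows for N paired marks x_i, y_i with column means \bar{x} and \bar{y}:

```latex
r_{xy} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}\;
               \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}
```

The value always lies between -1 and 1. It is 1 when y is an increasing linear function of x, -1 when y is a decreasing linear function of x, and near 0 when there is no linear relationship.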

30
Scatter plot: col 1 against col 1
Corr coeff = 1. Positive correlation
31
Scatter plot: col 1 against -col 1
Corr coeff = -1. Negative correlation
32
Scatter plot: col 1 (Eng) against col 2 (Maths)
Corr coeff 0.04 (close to zero). Very weak or
no correlation
33
Scatter plot: col 2 (Maths) against col 7 (Mus)
Corr coeff 0.8 (strong +ve corr)
34
Scatter plot: col 2 (Maths) against col 8 (Art)
Corr coeff -0.8. Strong -ve correlation
35
Correlation
  • In MATLAB: corr(Marks)

          Engli    Maths     Phys     Chem     Hist     Fren    Music      Art     Avge
Engli      1.00   -0.037   -0.029   -0.068    -0.04    0.012   -0.015    0.013     0.34
Maths    -0.037     1.00  -0.0014    0.051   -0.033    0.003     0.79    -0.82    0.365
Phys     -0.029  -0.0014     1.00   -0.042     0.03    0.009    0.017    0.011     0.15
Chem     -0.068    0.051   -0.042     1.00   -0.013   -0.055    0.048   -0.031     0.42
Hist      -0.04   -0.033     0.03   -0.013     1.00   -0.053    0.002   -0.006     0.43
Fren      0.012    0.003    0.009   -0.055   -0.053     1.00   -0.004   -0.009    0.363
Music    -0.015     0.79    0.017   0.0476   0.0021   -0.004     1.00    -0.66     0.48
Art       0.013    -0.82    0.011   -0.031  -0.0061   -0.009    -0.66     1.00    -0.16
Avge       0.34     0.37     0.15     0.42     0.43    0.363     0.48    -0.16     1.00
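For a single pair of columns, the same coefficient can be obtained with corrcoef, which is part of base MATLAB. The sketch below uses only the Maths, Music and Art marks from the 18 sample rows listed in the results table, so its values will differ from the full-population matrix above. The variable names are chosen here for illustration.

```matlab
% Maths, Music and Art marks for the 18 sample rows in the results table.
maths = [67 32 123 65 58 28 20 0 38 21 68 41 100 92 55 42 82 18]';
music = [72 81 114 101 36 20 -3 11 45 8 40 -13 81 63 68 64 42 -17]';
art   = [30 61 11 33 21 102 79 112 48 101 55 33 -10 -19 28 46 16 108]';

% corrcoef returns a 2-by-2 matrix; the off-diagonal entry is the
% correlation coefficient between the two inputs.
R = corrcoef(maths, music);
rMathsMusic = R(1, 2);

R = corrcoef(maths, art);
rMathsArt = R(1, 2);

% Sanity check: a column against its own negative gives -1.
R = corrcoef(maths, -maths);
rSelfNeg = R(1, 2);

fprintf('Maths vs Music : %.2f\n', rMathsMusic);
fprintf('Maths vs Art   : %.2f\n', rMathsArt);
fprintf('Maths vs -Maths: %.2f\n', rSelfNeg);
```

Even on this small sample the signs agree with the full matrix: positive for Maths against Music, negative for Maths against Art. How much weight such a small sample can bear is exactly the question raised earlier about working from a subset of the rows.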