Introduction to Statistics - PowerPoint PPT Presentation

1
Introduction to Statistics
  • Biomedical Sciences Degrees Honours Students
  • Derek Scott
  • d.scott_at_abdn.ac.uk

2
Why use statistics?
  • Statistics are used to analyse populations and
    predict changes in terms of probability.
  • Normally, a representative sample is taken, large
    enough to make likely conclusions about the
    population as a whole.
  • Descriptive statistics summarise the data and
    describe the population. These values allow you
    to see how large and how variable the data are.
  • Inferential statistics propose a null hypothesis
    and endeavour to disprove it. By looking at
    these, you can check whether a result could be
    due to random error.

3
  • When analysing data, you want to make the
    strongest possible conclusion from limited
    amounts of data. To do this, you need to overcome
    two problems:
  • Important differences can be obscured by
    biological variability and experimental error.
    This makes it difficult to distinguish real
    differences from random variability.
  • The human brain excels at finding patterns, even
    from random data. Our natural inclination
    (especially with our own data) is to conclude
    that any differences are real, and to minimise
    the contribution of random variability.
    Statistical rigor prevents you from making this
    mistake.

4
Errors
  • Bias or systematic error: data go in a
    predictable direction, perhaps due to experimental
    design or human error. You can remove these errors
    if you identify them.
  • Random error: unpredictable errors. You can't get
    rid of these.
  • Usually you will quote a measure of error with
    your data (e.g. standard deviation, standard
    error of the mean)
  • EXAMPLE: The mean height of a student in BM4005
    is 1.71 ± 0.20 (43) metres.

In this example: 1.71 is the mean value, 0.20 is the SD or SEM, 43 is n (the number of samples), and metres are the units!!!
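A minimal Python sketch of how a value could be quoted in this form. The heights are invented for illustration, and `summarise` is a hypothetical helper name rather than part of any package.

```python
import math
import statistics

def summarise(values, unit, use_sem=False):
    """Return 'mean ± error (n) unit' for a list of measurements."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample SD (n - 1 denominator)
    error = sd / math.sqrt(n) if use_sem else sd
    label = "SEM" if use_sem else "SD"
    return f"{mean:.2f} ± {error:.2f} ({n}) {unit} [mean ± {label} (n)]"

# Hypothetical heights in metres, invented for illustration only.
heights = [1.62, 1.75, 1.80, 1.68, 1.71, 1.59, 1.83]
print(summarise(heights, "metres"))
print(summarise(heights, "metres", use_sem=True))
```

Always state which error measure you used, since SD and SEM give very different numbers for the same data.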
5
Independent Sampling - 1
  • Measure BP in rats, 5 rats per group.
  • Measure BP 3 times in each animal.
  • You do not have 15 independent measurements,
    since triplicate measurements in each animal
    will be closer to one another than to those in
    other animals.
  • You should average values from each rat.
  • Now have 5 independent mean values.
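The averaging step could be sketched like this in Python. The BP readings and rat names are hypothetical, made up only to show the arithmetic.

```python
import statistics

# Hypothetical BP readings (mmHg): 3 repeat measurements per rat.
readings = {
    "rat1": [118, 121, 119],
    "rat2": [130, 128, 132],
    "rat3": [110, 112, 111],
    "rat4": [125, 124, 126],
    "rat5": [140, 138, 142],
}

# One mean per animal: these are the independent values.
rat_means = [statistics.mean(values) for values in readings.values()]

n = len(rat_means)                       # 5 animals, not 15 measurements
group_mean = statistics.mean(rat_means)
group_sd = statistics.stdev(rat_means)
print(f"n = {n}, mean = {group_mean:.1f} mmHg, SD = {group_sd:.1f} mmHg")
```

Using n = 15 here would make the error look smaller than it really is.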

6
Independent Sampling - 2
  • Perform a biochemical test 3 times, each time in
    triplicate.
  • Do not have 9 independent values, as an error in
    preparing the reagents for 1 experiment could
    affect all 3 triplicates.
  • Average the triplicates, and you have 3
    independent mean values.

7
Independent Sampling - 3
  • Doing a human exercise study.
  • Recruit 10 people from the inner-city, and 10
    people from the countryside.
  • Have not independently sampled 20 subjects from
    one population.
  • Data from inner-city subjects may be closer to
    each other than to the data from rural subjects.
    You have sampled from 2 populations, and need to
    account for this in your analysis.

8
Gaussian (Normal) Distribution
  • Data usually follow a bell-shaped distribution
    called Gaussian distribution. t-tests and ANOVA
    tests assume that the population follows an
    approximately Gaussian distribution.
  • For example, if we measure the height of everyone
    in 4th year and plot this, most people would fall
    in the middle of the curve, with a few at the
    bottom end, and a few at the top end of the
    curve.
  • For Gaussian distribution, we use parametric tests.

9
Gaussian Distribution
Bell-shaped curve
10
Outliers
  • When analysing data, some values can be very
    different from the rest.
  • It is tempting to delete them from the analysis.
  • Was the value typed in correctly?
  • Was there an experimental problem with that
    value?
  • Is it due to biological diversity?
  • What if the answers to these questions are no?

11
Outliers
  • If outlier is due to chance, keep it in the data
    set.
  • If it is due to a mistake (e.g. bad pipetting,
    voltage spike, apparatus problem) then you must
    remove it from the analysis.
  • If you want to be absolutely sure whether the
    outlier is due to chance or not, there are
    specific statistical tests you can do, but
    usually these basic checks are enough to decide.

12
Mean
  • Sample mean will probably not be exactly the
    population mean. The sample mean is a more
    accurate estimate if you have a bigger sample
    size with low variability.
  • You may calculate Confidence Intervals (CIs),
    telling you the range likely to contain the true
    population mean (not the range in which 95% of
    individuals fall).
  • EXAMPLE: Mean height of a student in BM4005 is
    1.71 metres. The 95% confidence limits for this
    value are 1.5 and 1.8 metres. These are the lower
    and upper limits between which you can be 95%
    confident that the true mean height lies.

13
Confidence Intervals
  • Nothing magical about 95%. You could do it for
    any value you liked: 99%, 90% etc.
  • If you set a value of 99%, then the intervals
    would be wider, because you need to be more
    confident that the true mean falls within that
    range.
  • 95% confidence limits mean you have a reasonable
    level of confidence that the true population mean
    lies within that range.
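A rough Python sketch of how the interval widens as the confidence level rises. It assumes the normal approximation (a z value), which is reasonable for larger samples. Small samples need the wider t-based interval that statistics packages report. The summary values and the name `mean_ci` are hypothetical.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, level=0.95):
    """Approximate confidence interval for a population mean.

    Uses the normal approximation, so it is only a sketch for
    reasonably large n.
    """
    sem = sd / math.sqrt(n)                      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for 95%
    return mean - z * sem, mean + z * sem

# Hypothetical summary values: mean BP 120 mmHg, SD 12 mmHg, n = 36.
for level in (0.90, 0.95, 0.99):
    low, high = mean_ci(120, 12, 36, level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f} mmHg")
```

The 99% interval comes out widest and the 90% interval narrowest, for the same data.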

14
Standard Deviation (SD)
  • Quantifies variability
  • If data follow Gaussian distribution, then 68% of
    values lie within one SD of mean (on either side)
    and 95% of values lie within 2 SDs of the mean.
  • So, as a rough rule of thumb, if 2 points on a
    graph are more than 2 SDs away from each other,
    they are likely to be significantly different,
    but confirm this with a formal test.
  • Expressed in same units as data
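The 68% and 95% figures can be checked against the Gaussian curve itself. This is a small Python sketch using the standard library's `NormalDist`. The helper name `fraction_within` is made up for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a Gaussian population lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")

# Number of SDs that encloses exactly 95% of values (slightly under 2).
print(f"95% lies within {std_normal.inv_cdf(0.975):.2f} SDs")
```

So "2 SDs" is a convenient rounding of 1.96.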

15
Standard Error of the Mean (SEM)
  • Measure of how far sample mean is likely to be
    from the true population mean.
  • SEM = SD/√n
  • Smaller than SD, so often used to give smaller
    error bars!
  • SD quantifies scatter: how much values vary from
    each other. Doesn't really change much even if
    you have a bigger sample size.
  • SEM quantifies how accurately you know the true
    mean of the population. SEM gets smaller as
    sample gets larger.
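A short simulation sketch in Python of the difference between SD and SEM. The population (mean 100, SD 10) and the sample sizes are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

results = []
for n in (10, 100, 1000):
    # Simulated Gaussian population: mean 100, SD 10 (illustrative values).
    sample = [random.gauss(100, 10) for _ in range(n)]
    sd = statistics.stdev(sample)      # scatter of individual values
    sem = sd / math.sqrt(n)            # SEM = SD / sqrt(n)
    results.append((n, sd, sem))
    print(f"n = {n:4d}  SD = {sd:5.2f}  SEM = {sem:5.2f}")
```

The SD stays in the region of 10 however many values are sampled, while the SEM shrinks roughly in proportion to 1/√n.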

16
P Values
17
Student's t-test
  • Used to compare the means of two groups of data.
  • Paired t-test: control expt. and treatment done
    on same person, animal or cell etc.
  • Unpaired t-test: control done on 1 group of
    subjects, with the treatment being done on
    another separate group.
  • Can be 1- or 2-tailed.

18
Iron and zinc evoke electrogenic responses that
are pH-dependent
[Figure panels: IRON (100mM), ZINC (100mM); conditions: Krebs pH 6.0, Krebs pH 7.4]
19
Iron- and zinc-evoked transport is
temperature-dependent
[Figure panels: IRON, ZINC; conditions: 4 °C, 37 °C]
20
Paired or Unpaired?
  • Choose paired if the 2 columns of data are
    matched, e.g.
  • You measure weight before and after an
    intervention in the same subjects.
  • You recruit subjects as pairs, matched for
    variables such as age, ethnic group, disease
    severity. One of the pair gets one treatment, the
    other gets an alternative treatment.
  • You perform the control experiment in one cell or
    piece of tissue, and then apply a drug. You
    measure the effect of the drug in the same cell
    or tissue.
  • Pairing shouldn't be based on the variable you
    are comparing. For example, if measuring BP, you
    can match subjects based on their age or postcode,
    but not on their BPs.
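To see why the choice matters, here is a hedged Python sketch that computes the t statistic both ways for the same made-up before/after weights. The unpaired version assumes equal variances (the classic Student's form). The data and function names are hypothetical, and P values are left to a statistics package.

```python
import math
import statistics

def paired_t(a, b):
    """t statistic and degrees of freedom for a paired t-test."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / sem, n - 1

def unpaired_t(a, b):
    """t statistic and df for an unpaired, equal-variance t-test."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2

# Hypothetical weights (kg) in the same 5 subjects, before and after.
before = [82, 75, 90, 68, 77]
after = [80, 72, 87, 66, 74]

t_paired, df_paired = paired_t(before, after)
t_unpaired, df_unpaired = unpaired_t(before, after)
print(f"paired:   t = {t_paired:.2f}, df = {df_paired}")
print(f"unpaired: t = {t_unpaired:.2f}, df = {df_unpaired}")
```

From standard t tables, the 2-tailed critical values at P = 0.05 are 2.776 for 4 degrees of freedom and 2.306 for 8. With these invented data the paired t is far above its critical value, while the unpaired t is well below. Pairing removes the large subject-to-subject variation, so only use it when the data really are matched.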

21
Student's t-test
  • You will probably always use a 2-tailed t-test.
  • 2-tailed test just asks whether there is a
    difference between the 2 means.
  • 1-tailed test asks whether the difference is in
    one predicted direction, i.e. either
  • Mean 1 is bigger than Mean 2, or
  • Mean 2 is bigger than Mean 1.
  • For 1-tailed you must know which mean will be
    bigger before you start. This is not usually
    possible.
  • Stick to a 2-tailed t-test to be safe!!!

22
Analysis of Variance (ANOVA)
  • Used to compare means of 3 or more groups.
  • Again, can have matched (paired) or unmatched
    (unpaired) values.
  • You will probably only use 1-way ANOVA
  • EXAMPLE: Your null hypothesis is that the average
    BP for 4 men is equal. ANOVA can compare each
    subject's BP and say if they are different or
    not.

23
Features of ANOVA
  • ANOVA produces an F value, the ratio of the
    variation between the group means to the
    variation within the groups. A higher F value
    means the groups differ more relative to the
    scatter inside each group.
  • Dunnett's post test allows you to compare against
    1 group e.g. A v B, A v C, A v D. Handy if A is
    the control group.
  • Tukey's post test allows you to compare all
    columns against one another just to check for any
    differences between any groups. Good way of
    finding significant differences that you may not
    have expected.
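A minimal Python sketch of how the F value for an unmatched 1-way ANOVA is put together. The three groups are invented numbers and `one_way_anova_f` is a hypothetical name. Post tests and P values are left to a statistics package.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for unmatched groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2
                    for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical measurements for groups A, B and C.
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
f_value, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

For these made-up groups the F value is well above 5.14, the tabulated critical value for 2 and 6 degrees of freedom at P = 0.05. At least one group mean therefore differs from the others, and a post test would then show which.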

24
The effect of non-selective protein kinase
inhibition with staurosporine
[Figure panels: IRON, ZINC; legend: 8-Br cGMP Staurosporine, Staurosporine (0.5 mM), 8-Br cGMP (100 mM), Control]
25
Non-Gaussian Distribution
  • Use non-parametric tests for these unusual
    situations. They rank data from low to high and
    analyse the distribution of ranks.
  • Less powerful than parametric tests, but used
    when some values are too low or too high to
    measure and have been assigned arbitrary values.
    Also used if the outcome is a rank or score with
    only a few categories.
  • P values are usually higher.
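The ranking step can be sketched in Python as below. The data are hypothetical, and 100.0 stands in for a reading that was too high to measure and was given an arbitrary value. The sketch stops at the rank sums, which rank-based tests then go on to analyse.

```python
def rank(values):
    """Return ranks (1 = lowest); tied values share the average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1          # first position of this value
        count = ordered.count(v)              # how many times it is tied
        rank_of[v] = first + (count - 1) / 2  # average of the tied positions
    return [rank_of[v] for v in values]

# Hypothetical data; 100.0 is an arbitrary "off scale" value.
control = [1.2, 3.4, 2.2, 0.9]
treated = [3.4, 5.1, 4.8, 100.0]

ranks = rank(control + treated)               # rank both groups together
control_rank_sum = sum(ranks[:len(control)])
treated_rank_sum = sum(ranks[len(control):])
print(f"rank sums: control = {control_rank_sum}, treated = {treated_rank_sum}")
```

Replacing 100.0 with 1000.0 leaves every rank unchanged, which is why the arbitrary value does no harm here.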

26
Skewness
27
Correlation
+ve correlation
-ve correlation
Correlation doesn't tell you about the cause of
the effect, it just tells you that there is a
link between value X and value Y. The nearer the
R value is to +1 or -1, the stronger the correlation.
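For illustration, the R value (Pearson's correlation coefficient) could be calculated by hand like this in Python. The x and y values are invented, and `pearson_r` is a hypothetical name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y_up tends to rise with x, y_down tends to fall.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]
y_down = [5, 4, 5, 4, 2]

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(f"+ve correlation: R = {r_up:.2f}")
print(f"-ve correlation: R = {r_down:.2f}")
```

The sign of R gives the direction of the link and its size gives the strength. Neither says anything about cause.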
28
Regression
Regression calculates a line of best fit. Often
used to calculate a standard curve, which you
could use to estimate value x if you know value
y. Unknowns must fall within your standard
curve's range.
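A small Python sketch of a straight-line standard curve fitted by least squares, then used to read an unknown back off the line. The concentrations, absorbance readings and function names are all hypothetical. Real standard curves are not always linear.

```python
def fit_line(xs, ys):
    """Least-squares line of best fit: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_x(y, slope, intercept, standard_ys):
    """Estimate x from a reading y, refusing to extrapolate."""
    if not min(standard_ys) <= y <= max(standard_ys):
        raise ValueError("reading is outside the standard curve's range")
    return (y - intercept) / slope

# Hypothetical standards: concentration (x) against absorbance (y).
concentration = [0, 2, 4, 6, 8]
absorbance = [0.02, 0.21, 0.40, 0.61, 0.80]

slope, intercept = fit_line(concentration, absorbance)
unknown = estimate_x(0.50, slope, intercept, absorbance)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"estimated concentration = {unknown:.2f}")
```

A reading of 1.50 would raise an error here rather than give an estimate, because it lies beyond the highest standard.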
29
Correlation and regression
  • A word of caution about doing regression and
    finding correlations.
  • Just because you can draw a line of best fit
    through some points and make quite a good
    straight line, it does not necessarily mean there
    is a relationship.
  • Correlation does not necessarily imply causation!
  • For example, the consumption of tropical fruit in
    the UK since WW2 has increased, and so has the
    birth rate in the UK. If I plotted this on a graph,
    and did a regression, I would probably get a nice
    straight line as both increase together. I would
    probably also show there is a good correlation.
  • This does not mean that I can say that eating
    tropical fruit improves your fertility!!!
  • Use some common sense when interpreting your data!

30
Summary
  • This is just a basic introduction.
  • For extra information, try the Help files on
    GraphPad Prism (on the University PCs).
  • If you end up doing an Honours project with
    certain types of data (e.g. collecting
    psychological data, epidemiological studies
    etc.), your supervisor should inform you about
    any special tests/calculations they use for that
    type of data.
  • Finally, if you are still unsure, make it clear
    to your supervisor that you do not understand why
    or what you are doing.