DATA ANALYSIS FOR RESEARCH PROJECTS - PowerPoint PPT Presentation

About This Presentation

Title:

DATA ANALYSIS FOR RESEARCH PROJECTS

Slides: 47

Provided by: PhilR164

Transcript and Presenter's Notes

Title: DATA ANALYSIS FOR RESEARCH PROJECTS

1
DATA ANALYSIS FOR RESEARCH PROJECTS
2
TYPES OF DATA

Quantitative data
measurements use a scale with equal intervals
examples include mass (g), length (cm), volume (mL), temperature (°C or K)
Qualitative data
non-standard scales with unequal intervals or discrete categories
examples include gender, choice, color scales

3
Quantitative Scales of Measure
Scale | Properties | Example
Interval (equal) | Numerical value indicates rank and meaningfully reflects relative distance between points on a scale | Temperature (°C or °F)
Ratio (equal) | Has all the properties of an interval scale, and in addition has a true zero point (proportional scale) | Length, Weight, Temperature (K)
4
Qualitative Scales of Measure
Scale | Properties | Example
Nominal (to name) | Data represents qualitative or equivalent categories (not numerical, cannot be rank ordered) | Eye color, hair color, Gender, Race
Ordinal (to order) | Numerically ranked, but has no implication about how far apart ranks are | Grades, Rating Scales
5
Sample Data

An experiment was conducted to measure the
tensile strength of each of twelve pieces of two
types of steel. The data from this experiment
are given in the table to the right.
Is there a significant difference in tensile
strength between the two types of steel?

Is there a better way to compare the data from
these groups?
What have you used before to compare data from
two different groups?

It is difficult to decide (consistently) whether
differences between experimental groups are
significant.
We need a rigorous procedure that includes a
clear operational definition of dissimilarity.

8
Statistics: Statistical Analysis

Statistical hypothesis-testing methods let us
say, at a stated level of confidence, that
differences between groups are unlikely to be
due to random chance or sampling error alone.

9
Sample data for consideration

For the following sets of data, discuss:
What was the IV and DV tested?
How should the data be processed to determine if the IV affects the DV?
How will you decide if the IV has a significant effect on the DV?

10
Sample Data Set 1: Effect of Temperature on the
Pressure of a Sample of Gas above Water
Temperature of Water (°C) Pressure (mmHg)
50 90
55 120
60 145
65 180
70 219
75 264
80 310
11
Graphing data

Correlation coefficient gives a measure of how
strong the relationship is between the graphed
variables.
Multiple trials can and should all be analyzed at
the same time.
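
As a rough illustration of the correlation coefficient, the sketch below computes Pearson's r and the least-squares slope for Sample Data Set 1 using only the Python standard library. The function names (pearson_r, least_squares_slope) are made up for this example; a graphing calculator or spreadsheet regression should give comparable values.

```python
import math

# Sample Data Set 1 as listed on the slide
temperature_c = [50, 55, 60, 65, 70, 75, 80]
pressure_mmhg = [90, 120, 145, 180, 219, 264, 310]


def _sums_of_squares(xs, ys):
    """Return (Sxx, Syy, Sxy): sums of squared and cross deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def pearson_r(xs, ys):
    """Pearson correlation coefficient (hypothetical helper name)."""
    sxx, syy, sxy = _sums_of_squares(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_slope(xs, ys):
    """Slope of the best-fit line of ys against xs (hypothetical helper name)."""
    sxx, _, sxy = _sums_of_squares(xs, ys)
    return sxy / sxx


r = pearson_r(temperature_c, pressure_mmhg)
slope = least_squares_slope(temperature_c, pressure_mmhg)
print(f"r = {r:.3f}, slope = {slope:.2f} mmHg per degree C")
```

A value of r close to +1 or -1 indicates a strong linear relationship; a value near 0 indicates a weak one.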

12
Sample Data Set 2: Effect of Stress on the Height
of Bean Plants after 30 Days
Stressed Plants (cm) Unstressed Plants (cm)
55.0 48.0
65.0 65.0
50.0 59.0
57.0 57.0
59.0 51.0
73.0 63.0
57.0 65.0
54.0 58.0
62.0 44.0
68.0 50.0
13
Comparing levels of IV

If graphing the data is not appropriate, the
different groups (levels) of the IV can be
compared using summary statistics.
These are called Descriptive Statistics since they:
describe the data sets
summarize groups of measurements

14
Descriptive Statistics

Measure of Central Tendency
attempt to provide one value that is most
typical of the entire set of data
What are some examples of measures of central
tendency?
Variation
describes the spread within the data set
two sets of data with the same mean may have
quite different spread within the data
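
A minimal sketch of these descriptive statistics, applied to the bean plant heights from Sample Data Set 2 as listed. It assumes the sample (n - 1) form of variance and standard deviation, which is what Python's statistics module computes; the describe helper is a name invented for this example.

```python
import statistics

# Heights (cm) as listed in Sample Data Set 2
stressed = [55.0, 65.0, 50.0, 57.0, 59.0, 73.0, 57.0, 54.0, 62.0, 68.0]
unstressed = [48.0, 65.0, 59.0, 57.0, 51.0, 63.0, 65.0, 58.0, 44.0, 50.0]


def describe(values):
    """Summarize one group: central tendency and variation (hypothetical helper)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance, divides by n - 1
        "stdev": statistics.stdev(values),        # sample standard deviation
    }


stressed_summary = describe(stressed)
unstressed_summary = describe(unstressed)

for name, summary in (("stressed", stressed_summary), ("unstressed", unstressed_summary)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
```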

15
Appropriate Measures of Central Tendency and
Variation for Types of Data
Measurement | QUANTITATIVE DATA | QUALITATIVE DATA (Nominal) | QUALITATIVE DATA (Ordinal)
Central Tendency | Mean, Median or Mode | Mode | Median
Variation | Standard Deviation or Range | Frequency Distribution | Frequency Distribution
16
What is standard deviation???

The standard deviation is a statistic that tells
you how tightly the values in a set of data are
clustered around the mean. It describes the
variation in a set of data.
When the data points are close to the mean
(little variation), the bell-shaped curve is
steep, and the standard deviation is small.
When there is greater variation in the data, the
bell curve is relatively flat, which tells you
that you have a relatively large standard
deviation.

17
Displaying variation: Box-and-Whisker Plot

First Quartile (Q1): smaller than 75% of ranked values
Median (Q2): smaller than 50% and larger than 50% of ranked values
Third Quartile (Q3): smaller than 25% of ranked values
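
The five values a box-and-whisker plot is drawn from can be sketched as follows, using the stressed plant heights from Sample Data Set 2. This assumes the default quartile method of Python's statistics.quantiles; calculators and spreadsheets may use slightly different conventions and give slightly different Q1 and Q3.

```python
import statistics

# Stressed plant heights (cm) from Sample Data Set 2
stressed = [55.0, 65.0, 50.0, 57.0, 59.0, 73.0, 57.0, 54.0, 62.0, 68.0]

# n=4 splits the ranked data into four parts and returns the three cut points
q1, q2, q3 = statistics.quantiles(stressed, n=4)

# Minimum, Q1, median, Q3, maximum: the values a box-and-whisker plot displays
five_number_summary = (min(stressed), q1, q2, q3, max(stressed))
print(five_number_summary)
```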

18
Illustrating Distributions for quantitative data:
Histograms

Symmetrical: mean equals median
Left-skewed: mean < median
Right-skewed: mean > median

19
Statistical Hypothesis Testing

A trend is apparent in the graph of the data. Is
this trend significant?
The means of the groups are different. Is the
difference significant?
Statistical hypothesis testing is needed to
determine the significance of the results of your
data analysis.
The results of these tests provide Inferential
Statistics. We make inferential decisions based
on the data we collect from a sample population.

20
Sample Data: Effect of Stress on the Height of
Bean Plants after 30 Days
Stressed Plants (cm) Unstressed Plants (cm)
55.0 48.0
65.0 65.0
50.0 59.0
57.0 57.0
59.0 51.0
73.0 63.0
57.0 65.0
54.0 58.0
62.0 44.0
68.0 50.0
21
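Example for comparing means: t Test for
Quantitative Data

Equal Sample Size
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}}$

$\bar{x}_1$ = mean of Group 1
$\bar{x}_2$ = mean of Group 2
$s_1^2$ = variance of Group 1
$s_2^2$ = variance of Group 2
$n$ = number of items or measurements in each group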
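
A short sketch of this calculation for the bean plant data, assuming the usual equal-sample-size form of the statistic: the difference between the two group means divided by the square root of (variance of Group 1 + variance of Group 2) / n. The function name t_equal_sample_size is invented for this example.

```python
import math
import statistics

# Heights (cm) as listed in the sample data
stressed = [55.0, 65.0, 50.0, 57.0, 59.0, 73.0, 57.0, 54.0, 62.0, 68.0]
unstressed = [48.0, 65.0, 59.0, 57.0, 51.0, 63.0, 65.0, 58.0, 44.0, 50.0]


def t_equal_sample_size(group1, group2):
    """t statistic for two independent groups of equal size (hypothetical helper)."""
    if len(group1) != len(group2):
        raise ValueError("this form of the t statistic assumes equal sample sizes")
    n = len(group1)
    mean1 = statistics.mean(group1)
    mean2 = statistics.mean(group2)
    var1 = statistics.variance(group1)  # sample variance (n - 1)
    var2 = statistics.variance(group2)
    return (mean1 - mean2) / math.sqrt((var1 + var2) / n)


t_value = t_equal_sample_size(stressed, unstressed)
print(f"t = {t_value:.2f}")
```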

22
Statistical calculations

Use the TI-84 or TI-83 calculator OR
Use Microsoft Excel Data Analysis
Calculate the t-test for the stressed plants data
on the next slide, using the graphing calculator

23
Level of Significance

Establish a level of significance
In this class, use 0.05.
This means the probability of error in
rejecting the null hypothesis is 5/100
OR
we can be 95% confident that the null
hypothesis may be rejected

24
Results from the calculator

t: value for the t-test
x̄1: mean from List 1
x̄2: mean from List 2
Sx1: standard deviation for List 1
Sx2: standard deviation for List 2
df: degrees of freedom
n1: number of values in List 1
n2: number of values in List 2

25
t-Test Results from Excel
26
Statistical Hypotheses (different from your
research hypothesis)

Null Hypothesis
suggests any observed difference between two
sample means occurred by chance and is NOT
significant
states that there is no relationship between
variables, i.e. two means are equal OR they are
not statistically different
Claim / Alternative Hypothesis
derived from literature, research hypothesis
suggests outcome of experiment if I.V. affects
D.V.

27
Null Hypothesis

What would be the null hypothesis for this set of
data?

The mean height of stressed plants is not
significantly different from the mean height of
unstressed plants.
28
Confidence Levels

Probability that findings are repeatable
Infers that results of sample are the same as
results of the whole population
If we reject the null hypothesis at the 95%
confidence level:
95% certainty that difference between groups is
NOT due to chance
95% certainty that results will be the same with
further testing

29
Confidence levels

Probability of error: the error that occurs if the
null hypothesis is rejected when it is true and
should not be rejected
Identified by Greek lowercase alpha, α
Researchers usually select α ≤ 0.05
If confidence level is 95%, then probability of
error (α) is 5%, or 0.05

30
Statistical Tests: Test Values and Critical Values

Test value: the result of a statistical test on
your data.
Critical value: a reference value for
each statistical test.
Your calculated statistical test value must
exceed this value for you to reject the null
hypothesis.
You can find the critical value for each
statistical test in publications and university
websites. (links available on my website)
If you use Microsoft Excel for your statistics,
the critical value will be given with the results.

31
Significance of t value
Determine the degrees of freedom: df = (number in experimental group - 1) + (number in control group - 1)
df = (10 - 1) + (10 - 1) = 18
Determine significance of calculated t by looking at a table of critical t values:
Calculated t < critical t → not significant
Calculated t > critical t → is significant
At df = 18, critical t = 2.101. Calculated t of 1.24 < 2.101 and is not significant at the 0.05 level.
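
The decision rule on this slide can be sketched in a few lines. The critical value 2.101 (df = 18, 0.05 level) is taken from the slide; the function names are invented for this example, and taking the absolute value of t is an assumption that the comparison is two-tailed.

```python
def degrees_of_freedom(n_experimental, n_control):
    """df = (number in experimental group - 1) + (number in control group - 1)."""
    return (n_experimental - 1) + (n_control - 1)


def is_significant(calculated_t, critical_t):
    """True when the calculated t exceeds the critical t (two-tailed assumption)."""
    return abs(calculated_t) > critical_t


df = degrees_of_freedom(10, 10)
critical_t = 2.101    # table value for df = 18 at the 0.05 level, from the slide
calculated_t = 1.24   # calculated t reported on the slide

significant = is_significant(calculated_t, critical_t)
print(f"df = {df}, significant at 0.05: {significant}")
```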
32
Rejecting Null Hypothesis

If test value is not significant →
null hypothesis is NOT REJECTED
If test value is significant →
null hypothesis is REJECTED

33
Do Statistical Findings Support the Research
Hypothesis?

Null hypothesis was rejected
Research hypothesis was supported
(unless research hypothesis IS a null
hypothesis)
Null hypothesis was not rejected
Research hypothesis was not supported

34
Summary: Steps of Hypothesis Testing

State the null hypothesis and alternative
hypothesis (claim)
Choose the confidence level (95%) and sample size
Collect the data and calculate the appropriate
statistics
Make the proper statistical inference

35
Populations of Study: Be careful what you claim!

Sample
specific portion of the population that is
selected for the study (100 bean seedlings used
in the study)
Sampled Population
population from which the sample was drawn (all
the bean seedlings in the nursery from which the
experimenter obtained their bean seedlings)
Target Population
ALL units (persons, things, experimental
outcomes) of the specific group whose
characteristics are being studied (all the bean
seedlings of the same species)

36
Communicating Statistics: Effect of Stress on the
Mean Height of Bean Plants after 30 Days
Statistic | Stressed Group | Unstressed Group
Mean | 60.0 cm | 56.0 cm
Variance | 49.1 cm² | 60.7 cm²
Standard Deviation | 7.0 cm | 7.8 cm
1SD | 53.0 to 67.0 cm | 48.2 to 63.8 cm
2SD | 46.0 to 74.0 cm | 40.4 to 71.6 cm
Number | 10 | 10
Results of t test: t = 1.3, df = 18; t of 1.3 < 2.101, p > 0.10
38
Types of Tests

For Quantitative Data
Linear Regression
One-Way Analysis of Variance (ANOVA)
t Test
For Qualitative Data
Chi-Squared Test
Z Test

39
Linear Regression

Determines a linear relationship between two
variables based on a correlation coefficient
H0: The number of yellow M&Ms is not related to
the total number of M&Ms in the package.

40
ANOVA Test

Compares the means of more than two groups
H0: There is no significant difference between
the numbers of M&Ms in plain packages, almond
packages and peanut packages
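
A minimal sketch of the one-way ANOVA F statistic, assuming the standard definition (mean square between groups divided by mean square within groups). The package counts below are invented purely for illustration and are not data from the presentation; the function name is also hypothetical. The resulting F would still need to be compared with a critical F value for the two degrees of freedom.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups (hypothetical helper)."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of individual values around their own group mean
    ss_within = sum(
        sum((x - statistics.mean(group)) ** 2 for x in group) for group in groups
    )

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical counts of candies per package (illustration only)
plain = [54, 56, 55, 57, 53]
almond = [18, 17, 19, 20, 16]
peanut = [21, 23, 22, 20, 24]

f_value, df_between, df_within = one_way_anova_f([plain, almond, peanut])
print(f"F = {f_value:.1f}, df = ({df_between}, {df_within})")
```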

41
t-Test

Compares the means of two independent groups
H0: There is no significant difference between
the numbers of M&Ms in plain and peanut packages
Two-tail test determines if populations are not
equal / the same (more difficult to support)
One-tail test determines if one mean is greater
than the other (easier to support)

42
Chi-Squared Test

Determines if a proportion within a sample is
larger than expected; can be used for more than
two groups
H0: There are equal numbers of each color of M&M
in a package.
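
For a null hypothesis of equal numbers in every category, the chi-squared statistic can be sketched as below, assuming the standard goodness-of-fit form: the sum of (observed - expected)² / expected, with df = number of categories - 1. The color counts are invented for illustration only, and the function name is hypothetical.

```python
def chi_squared_equal_counts(observed):
    """Chi-squared statistic and df, testing against equal expected counts."""
    total = sum(observed)
    expected = total / len(observed)  # same expected count for every category
    chi2 = sum((count - expected) ** 2 / expected for count in observed)
    df = len(observed) - 1
    return chi2, df


# Hypothetical color counts from one package (illustration only)
color_counts = {"red": 9, "orange": 12, "yellow": 7, "green": 10, "blue": 14, "brown": 8}

chi2_value, chi2_df = chi_squared_equal_counts(list(color_counts.values()))
print(f"chi-squared = {chi2_value:.2f}, df = {chi2_df}")
```

As with the t test, the calculated value is compared with the critical value for the given df and level of significance.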

43
Z-Test

Compares proportions between two groups
H0: There are equal proportions of red M&Ms
in plain and peanut packages
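
A sketch of a two-proportion Z statistic, assuming the common pooled-proportion form. The counts are invented for illustration only, and the function name is hypothetical.

```python
import math


def two_proportion_z(count1, n1, count2, n2):
    """Z statistic comparing two sample proportions, using a pooled proportion."""
    p1 = count1 / n1
    p2 = count2 / n2
    pooled = (count1 + count2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Hypothetical counts: red candies out of all candies counted (illustration only)
z_value = two_proportion_z(count1=30, n1=100, count2=20, n2=100)

# 1.96 is the two-tailed critical z value at the 0.05 level
print(f"z = {z_value:.2f}, significant at 0.05: {abs(z_value) > 1.96}")
```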

44
Selecting a Statistical Test

Things to consider
Number of groups of data
Type of data: quantitative or qualitative
Type of variable: numerical or categorical
The relationship in the null hypothesis being
tested

45
Statistical Tests Review

Comparison of two variables for correlation →
correlation coefficient test
Comparing means of more than two groups/levels →
ANOVA test
Comparing two means → t-test
Comparison of proportions within a population →
χ² (chi-squared) test
Comparison of proportions between populations →
Z test

46
Key Questions for your Research

What kind of data will you need to collect to
test your hypothesis? (Qualitative or
Quantitative)
What kind of scale will you use?
How do you plan on analyzing this data?
Comparison of groups? What will you compare?
Look for a trend? What will you graph?
How many different levels will you need data for?
How many trials?
What relevant qualitative data will you look for
that may also help you interpret results?