Title: DATA ANALYSIS FOR RESEARCH PROJECTS
2. TYPES OF DATA
- Quantitative data
  - measurements use a scale with equal intervals
  - examples include mass (g), length (cm), volume (mL), temperature (°C or K)
- Qualitative data
  - non-standard scales with unequal intervals or discrete categories
  - examples include gender, choice, color scales
3. Quantitative Scales of Measure
Scale | Properties | Example
Interval (equal) | Numerical value indicates rank and meaningfully reflects relative distance between points on a scale | Temperature (°C or °F)
Ratio (equal) | Has all the properties of an interval scale, and in addition has a true zero point (proportional scale) | Length, Weight, Temperature (K)
4. Qualitative Scales of Measure
Scale | Properties | Example
Nominal (to name) | Data represent qualitative or equivalent categories (not numerical, cannot be rank ordered) | Eye color, hair color, Gender, Race
Ordinal (to order) | Numerically ranked, but has no implication about how far apart ranks are | Grades, Rating Scales
5. Sample Data
- An experiment was conducted to measure the tensile strength of each of twelve pieces of two types of steel. The data from this experiment are given in the table to the right.
- Is there a significant difference in tensile strength between the two types of steel?
6.
- Is there a better way to compare the data from these groups?
- What have you used before to compare data from two different groups?
7.
- It is difficult to decide (consistently) whether differences between experimental groups are significant.
- We need a rigorous procedure that includes a clear operational definition of dissimilarity.
8. Statistics: Statistical Analysis
- Statistical hypothesis-testing methods give us the ability to say, at a stated level of confidence, that differences between groups are real and not just due to random chance or sampling error. They do not correct for mistakes in data collection.
9. Sample data for consideration
- For the following sets of data, discuss:
  - What were the independent variable (IV) and dependent variable (DV) tested?
  - How should the data be processed to determine if the IV affects the DV?
  - How will you decide if the IV has a significant effect on the DV?
10. Sample Data Set 1: Effect of Temperature on the Pressure of a Sample of Gas above Water
Temperature of Water (°C) | Pressure (mmHg)
50 | 90
55 | 120
60 | 145
65 | 180
70 | 219
75 | 264
80 | 310
11. Graphing data
- The correlation coefficient gives a measure of how strong the relationship is between the graphed variables.
- Multiple trials can and should all be analyzed at the same time.
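As a rough illustration of the correlation coefficient, the sketch below computes the Pearson r and a least-squares line for Sample Data Set 1 (temperature vs. pressure). The function names `pearson_r` and `least_squares_line` are illustrative, not from the slides, and the slides do not say which correlation coefficient is intended; Pearson's r is assumed here.

```python
from math import sqrt

# Sample Data Set 1 from the slides
temperature_c = [50, 55, 60, 65, 70, 75, 80]        # IV: water temperature (°C)
pressure_mmhg = [90, 120, 145, 180, 219, 264, 310]  # DV: pressure (mmHg)


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the best-fit line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


r = pearson_r(temperature_c, pressure_mmhg)
slope, intercept = least_squares_line(temperature_c, pressure_mmhg)
print(f"r = {r:.3f}")
print(f"pressure = {slope:.2f} * temperature + {intercept:.2f}")
```

A value of r close to +1 or -1 indicates a strong linear relationship; a value near 0 indicates a weak one.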
12. Sample Data Set 2: Effect of Stress on the Height of Bean Plants after 30 Days
Stressed Plants (cm) | Unstressed Plants (cm)
55.0 | 48.0
65.0 | 65.0
50.0 | 59.0
57.0 | 57.0
59.0 | 51.0
73.0 | 63.0
57.0 | 65.0
54.0 | 58.0
62.0 | 44.0
68.0 | 50.0
13. Comparing levels of IV
- If graphing the data is not appropriate, the different groups (levels) of the IV can be compared using summary statistics.
- These types of statistics are called Descriptive Statistics since they
  - describe the data sets
  - summarize groups of measurements
14. Descriptive Statistics
- Measure of Central Tendency
  - attempts to provide one value that is most typical of the entire set of data
  - What are some examples of measures of central tendency?
- Variation
  - describes the spread within the data set
  - two sets of data with the same mean may have quite different spread within the data
15. Appropriate Measures of Central Tendency and Variation for Types of Data
Measure | QUANTITATIVE DATA | QUALITATIVE DATA (Nominal) | QUALITATIVE DATA (Ordinal)
Central Tendency | Mean, Median or Mode | Mode | Median
Variation | Standard Deviation or Range | Frequency Distribution | Frequency Distribution
16. What is standard deviation?
- The standard deviation is a statistic that tells you how tightly the values in a set of data are clustered around the mean. It expresses the variation in a set of data.
- When the data points are fairly precise (close to the mean, little variation), the bell-shaped curve is steep, and the standard deviation is small.
- When there is greater variation in the data, the bell curve is relatively flat, which tells you that you have a relatively large standard deviation.
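To make central tendency and variation concrete, here is a minimal sketch that summarizes the bean plant heights from Sample Data Set 2 with Python's standard `statistics` module. The helper name `describe` is illustrative, and the sample (n − 1) forms of variance and standard deviation are assumed, which is what the `Sx` value on a TI calculator reports.

```python
import statistics

# Sample Data Set 2 from the slides: plant heights (cm) after 30 days
stressed = [55.0, 65.0, 50.0, 57.0, 59.0, 73.0, 57.0, 54.0, 62.0, 68.0]
unstressed = [48.0, 65.0, 59.0, 57.0, 51.0, 63.0, 65.0, 58.0, 44.0, 50.0]


def describe(heights):
    """Return common descriptive statistics for one group of measurements."""
    return {
        "mean": statistics.mean(heights),
        "median": statistics.median(heights),
        "range": max(heights) - min(heights),
        "variance": statistics.variance(heights),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(heights),      # sample standard deviation
    }


for label, group in (("Stressed", stressed), ("Unstressed", unstressed)):
    summary = describe(group)
    print(label, {name: round(value, 1) for name, value in summary.items()})
```

Two groups can have similar means and still differ in spread, which is why both kinds of measure are reported together.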
17. Displaying variation: Box-and-Whisker Plot
- First Quartile (Q1): smaller than 75% of ranked values
- Median (Q2): smaller than 50% and larger than 50% of ranked values
- Third Quartile (Q3): smaller than 25% of ranked values
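The quartiles that define a box-and-whisker plot can be sketched with `statistics.quantiles` (Python 3.8+), shown here on the stressed-plant heights. This is only an illustration: calculators, Excel, and Python use slightly different quartile conventions, so Q1 and Q3 may differ a little between tools. The name `five_number_summary` is illustrative.

```python
import statistics

# Stressed-plant heights (cm) from the bean plant data set
stressed = [55.0, 65.0, 50.0, 57.0, 59.0, 73.0, 57.0, 54.0, 62.0, 68.0]

# n=4 splits the ranked data into four parts and returns the three cut points
q1, q2, q3 = statistics.quantiles(stressed, n=4)

# The five values a box-and-whisker plot is drawn from
five_number_summary = (min(stressed), q1, q2, q3, max(stressed))
print("min, Q1, median, Q3, max =", five_number_summary)
```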
18. Illustrating Distributions for Quantitative Data: Histograms
- Symmetrical: mean equals median
- Left-skewed: mean < median
- Right-skewed: mean > median
19. Statistical Hypothesis Testing
- A trend is apparent in the graph of the data. Is this trend significant?
- The means of the groups are different. Is the difference significant?
- Statistical hypothesis testing is needed to determine the significance of the results of your data analysis.
- The results of these tests provide Inferential Statistics. We make inferential decisions about a population based on the data we collect from a sample.
20. Sample Data: Effect of Stress on the Height of Bean Plants after 30 Days
Stressed Plants (cm) | Unstressed Plants (cm)
55.0 | 48.0
65.0 | 65.0
50.0 | 59.0
57.0 | 57.0
59.0 | 51.0
73.0 | 63.0
57.0 | 65.0
54.0 | 58.0
62.0 | 44.0
68.0 | 50.0
21. Example for comparing means: t Test for Quantitative Data
- x̄1 = mean of Group 1
- x̄2 = mean of Group 2
- s1² = variance of Group 1
- s2² = variance of Group 2
- n = number of items or measurements
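A minimal sketch of the t calculation, assuming the commonly used form t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2) built from the quantities listed above, and applied to the bean plant data. The function name `t_statistic` is illustrative. When both groups are the same size, as here, the pooled-variance form of the t test gives the same value.

```python
import statistics
from math import sqrt

# Bean plant heights (cm) after 30 days, from the sample data
stressed = [55.0, 65.0, 50.0, 57.0, 59.0, 73.0, 57.0, 54.0, 62.0, 68.0]
unstressed = [48.0, 65.0, 59.0, 57.0, 51.0, 63.0, 65.0, 58.0, 44.0, 50.0]


def t_statistic(group1, group2):
    """t = (mean1 - mean2) / sqrt(var1/n1 + var2/n2), using sample variances."""
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    n1, n2 = len(group1), len(group2)
    return (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2)


t = t_statistic(stressed, unstressed)
print(f"t = {t:.2f}")  # about 1.24 for these data
```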
22. Statistical calculations
- Use the TI-84 or TI-83 calculator, OR
- Use Microsoft Excel Data Analysis
- Calculate the t-test for the stressed plants data on the next slide, using the graphing calculator
23. Level of Significance
- Establish a level of significance
  - In this class, use 0.05.
  - This means the probability of error in rejecting the null hypothesis is 5/100,
  - OR we can be 95% confident that the null hypothesis may be rejected.
24. Results from the calculator
- t: value for the t-test
- x̄1: mean from List 1
- x̄2: mean from List 2
- Sx1: standard deviation for List 1
- Sx2: standard deviation for List 2
- df: degrees of freedom
- n1: number of values in List 1
- n2: number of values in List 2
25. t-Test Results from Excel
26. Statistical Hypotheses (different from your research hypothesis)
- Null Hypothesis
  - suggests any observed difference between two sample means occurred by chance and is NOT significant
  - states that there is no relationship between variables, i.e. two means are equal OR they are not statistically different
- Claim / Alternative Hypothesis
  - derived from literature, research hypothesis
  - suggests outcome of experiment if I.V. affects D.V.
27. Null Hypothesis
- What would be the null hypothesis for this set of data?
The mean height of stressed plants is not significantly different from the mean height of unstressed plants.
28. Confidence Levels
- Probability that findings are repeatable
- Infers that results of sample are the same as results of the whole population
- If we reject the null hypothesis at 95% confidence level
  - 95% certainty that difference between groups is NOT due to chance
  - 95% certainty that results will be the same with further testing
29. Confidence levels
- Probability of error: the error that occurs if the null hypothesis is rejected when it is true and should not be rejected
  - Identified by Greek lowercase alpha, α
  - Researchers usually select α ≤ 0.05
  - If confidence level is 95%, then probability of error (α) is 5%, or 0.05
30. Statistical Tests: Test Values and Critical Values
- Test value: the result of a statistical test on your data.
- Critical value: a reference value for each statistical test.
  - Your calculated statistical test value must exceed this value for you to reject the null hypothesis.
  - You can find the critical value for each statistical test in publications and university websites. (links available on my website)
  - If you use Microsoft Excel for your statistics, the critical value will be given with the results.
31. Significance of t value
Determine the degrees of freedom:
df = (number in experimental group − 1) + (number in control group − 1)
df = (10 − 1) + (10 − 1) = 18
Determine significance of calculated t by looking at table for critical t values:
Calculated t < critical t → not significant
Calculated t > critical t → is significant
At df = 18, critical t = 2.101. Calculated t of 1.24 < 2.101 and is not significant at 0.05 level.
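The decision rule above can be sketched in a few lines. The function names are illustrative, the critical value 2.101 is the one quoted on the slide for df = 18 at the 0.05 level, and taking the absolute value of t assumes a two-tailed comparison.

```python
def degrees_of_freedom(n_experimental, n_control):
    """df = (number in experimental group - 1) + (number in control group - 1)."""
    return (n_experimental - 1) + (n_control - 1)


def is_significant(calculated_t, critical_t):
    """True when the calculated t exceeds the critical t (two-tailed assumption)."""
    return abs(calculated_t) > critical_t


df = degrees_of_freedom(10, 10)
CRITICAL_T_DF18 = 2.101   # table value quoted on the slide for df = 18, 0.05 level
calculated_t = 1.24       # calculated t quoted on the slide

significant = is_significant(calculated_t, CRITICAL_T_DF18)
print(f"df = {df}, significant at 0.05: {significant}")
```

For a different df, the critical value has to be looked up again in a t table; it is not computed here.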
32. Rejecting Null Hypothesis
- If test value is not significant → null hypothesis is NOT REJECTED
- If test value is significant → null hypothesis is REJECTED
33. Do Statistical Findings Support the Research Hypothesis?
- Null hypothesis was rejected
  - Research hypothesis was supported (unless research hypothesis IS a null hypothesis)
- Null hypothesis was not rejected
  - Research hypothesis was not supported
34. Summary: Steps of Hypothesis Testing
- State the null hypothesis and alternative hypothesis (claim)
- Choose the confidence level (95%) and sample size
- Collect the data and calculate the appropriate statistics
- Make the proper statistical inference
35. Populations of Study: Be careful what you claim!
- Sample
  - specific portion of the population that is selected for the study (100 bean seedlings used in the study)
- Sampled Population
  - population from which the sample was drawn (all the bean seedlings in the nursery from which the experimenter obtained their bean seedlings)
- Target Population
  - ALL units (persons, things, experimental outcomes) of the specific group whose characteristics are being studied (all the bean seedlings of the same species)
36. Communicating Statistics: Effect of Stress on the Mean Height of Bean Plants after 30 Days
Statistic | Stressed Group | Unstressed Group
Mean | 60.0 cm | 56.0 cm
Variance | 49.1 cm² | 60.7 cm²
Standard Deviation | 7.0 cm | 7.8 cm
1SD | 53.0–67.0 cm | 48.2–63.8 cm
2SD | 46.0–74.0 cm | 40.4–71.6 cm
Number | 10 | 10
Results of t test: t = 1.3, df = 18; t of 1.3 < 2.101; p > 0.10
38. Types of Tests
- For Quantitative Data
  - Linear Regression
  - One-Way Analysis of Variance (ANOVA)
  - t Test
- For Qualitative Data
  - Chi-Squared Test
  - Z Test
39. Linear Regression
- Determines a linear relationship between two variables based on a correlation coefficient
- H0: The number of yellow M&Ms is not related to the total number of M&Ms in the package.
40. ANOVA Test
- Compares the means of more than two groups
- H0: There is no significant difference between the numbers of M&Ms in plain packages, almond packages and peanut packages.
41. t-Test
- Compares the means of two independent groups
- H0: There is no significant difference between the numbers of M&Ms in plain and peanut packages.
- Two-tail test: determines if populations are not equal / the same (more difficult to support)
- One-tail test: determines if one mean is greater than the other (easier to support)
42. Chi-Squared Test
- Determines if a proportion within a sample is larger than expected; can be used for more than two groups
- H0: There are equal numbers of each color of M&M in a package.
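As an illustration of how this null hypothesis could be tested, the sketch below computes the chi-squared statistic, the sum of (observed − expected)² / expected, for one package. The color counts are HYPOTHETICAL, made up purely for illustration, and the function name `chi_squared_statistic` is not from the slides.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# HYPOTHETICAL color counts for a single package (illustration only, not real data)
observed_counts = {"red": 8, "orange": 12, "yellow": 10,
                   "green": 9, "blue": 14, "brown": 7}

total = sum(observed_counts.values())
# Under H0 every color is equally common, so each expected count is total / colors
expected_each = total / len(observed_counts)
expected_counts = [expected_each] * len(observed_counts)

chi2 = chi_squared_statistic(observed_counts.values(), expected_counts)
df = len(observed_counts) - 1  # degrees of freedom = number of categories - 1

print(f"chi-squared = {chi2:.2f}, df = {df}")
```

The calculated value would then be compared with the critical chi-squared value for that df at the 0.05 level, in the same way as the t value.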
43. Z-Test
- Compares proportions between two groups
- H0: There are equal proportions of red M&Ms in plain and peanut packages.
44. Selecting a Statistical Test
- Things to consider
  - Number of groups of data
  - Type of data: Quantitative or Qualitative
  - Type of variable: numerical or categorical
  - The relationship in the null hypothesis being tested
45. Statistical Tests Review
- Comparison of two variables for correlation → correlation coefficient test
- Comparing means of more than two groups/levels → ANOVA test
- Comparing two means → t-test
- Comparison of proportions within a population → χ² (chi-squared) test
- Comparison of proportions between populations → Z test
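The review list above can be written as a simple lookup, which may help when planning an analysis. This is only a sketch: the dictionary keys and the function name `select_test` are illustrative wording, not a standard API.

```python
# Mapping from the kind of comparison to the test named in the review list
TEST_FOR_COMPARISON = {
    "correlation between two variables": "correlation coefficient test",
    "means of more than two groups": "ANOVA test",
    "two means": "t-test",
    "proportions within a population": "chi-squared test",
    "proportions between populations": "Z test",
}


def select_test(comparison):
    """Return the test suggested for a comparison, or raise if it is not listed."""
    try:
        return TEST_FOR_COMPARISON[comparison]
    except KeyError:
        options = ", ".join(TEST_FOR_COMPARISON)
        raise ValueError(f"Unknown comparison; choose one of: {options}") from None


print(select_test("two means"))
```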
46. Key Questions for your Research
- What kind of data will you need to collect to test your hypothesis? (Qualitative or Quantitative)
- What kind of scale will you use?
- How do you plan on analyzing this data?
  - Comparison of groups? What will you compare?
  - Look for a trend? What will you graph?
- How many different levels will you need data for?
- How many trials?
- What relevant qualitative data will you look for that may also help you interpret results?