Title: SCIENCE 1101: SCIENCE, SOCIETY and the ENVIRONMENT I
1 SCIENCE 1101: SCIENCE, SOCIETY and the ENVIRONMENT I
2 Statistics
3
- Statistics is a type of mathematics that allows us to analyze trends in data.
- Statistics is often referred to as the science of data.
4
- Statistics (and science also) involves:
- Collecting data
- Classifying data
- Summarizing data
- Organizing data
- Analyzing data
- Interpreting data
5 Types of data
- Quantitative: observations made on a numerical scale.
- Qualitative: non-numerical data that can only be classified into one of a group of categories.
6 Example
Group  Gender  Age  Height  Hair Color
1      M       26   5'10"   Black
2      F       30   5'5"    Black
3      F       23   4'9"    Blonde
4      M       26   5'8"    Brown
5      F       24   5'1"    Red
6      M       28   6'4"    Brown
7      M       21   5'11"   Black
8      F       27   5'10"   Brown
9      M       29   5'5"    Red
10     F       24   5'10"   Blonde
11     F       26   5'2"    Brown
7 Goals of statistics
- Describe data sets (populations or samples).
- Use sample data to make inferences about a population or group of samples.
8 Two general branches of statistics
- Descriptive statistics: The branch of statistics devoted to the organization, summarization, and description of data sets.
- Inferential statistics: The branch of statistics concerned with using data to make an inference about a population or group of data.
9 Descriptive Statistics
10
- Measure of Central Tendency: A number that describes the center of a data set's distribution.
- This value is useful because it allows the researcher to understand where the data are most concentrated. This helps researchers decide what the normal condition is.
11 Measure of Central Tendency
- To organize the data, place the values of a data set into descending order.
- 26, 30, 23, 26, 24, 28, 21, 27, 29, 24, 26
- Becomes
- 30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21
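The ordering step above can be sketched in a few lines of Python. This is only an illustration: the variable names are not from the slides, and the values are the ages from the example table.

```python
# Ages from the example table, in the order they were recorded.
ages = [26, 30, 23, 26, 24, 28, 21, 27, 29, 24, 26]

# sorted() returns a new list; reverse=True gives descending order.
ages_descending = sorted(ages, reverse=True)

print(ages_descending)
```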
12 Measure of central tendency
13 Measure of Central Tendency
- Mode
- The number that occurs most often in a data set.
- Does not need to be near the center of the data set.
- A data set can have more than one mode or no mode.
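As a rough illustration of these three cases, the sketch below uses `multimode` from Python's standard `statistics` module. The three small data sets are hypothetical and chosen only to show one mode, two modes, and no repeated value.

```python
from statistics import multimode

# Hypothetical data sets, not taken from the slides.
one_mode = [3, 7, 7, 9]
two_modes = [1, 1, 2, 2, 5]
all_unique = [4, 5, 6]

# multimode() returns every value tied for the highest count.
print(multimode(one_mode))    # a single mode
print(multimode(two_modes))   # two modes
# When every value occurs exactly once, all values tie. Such a data
# set is usually described as having no mode.
print(multimode(all_unique))
```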
14 Measure of Central Tendency
- 30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21
- What is the mode of the above data set?
15 Measure of Central Tendency
- Median
- If the number of observations is odd, the value that occurs in the middle of the ordered data set is the median.
- If the number of observations is even, the mean of the two middle observations is the median.
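The odd and even cases can be checked with a short sketch using `median` from Python's standard `statistics` module. Both lists are hypothetical examples, not data from the slides.

```python
from statistics import median

# Hypothetical data sets, not taken from the slides.
odd_count = [21, 23, 24, 26, 30]        # 5 observations
even_count = [21, 23, 24, 26, 30, 31]   # 6 observations

# Odd count: the middle value of the ordered data.
median_odd = median(odd_count)
# Even count: the mean of the two middle values (24 and 26).
median_even = median(even_count)

print(median_odd, median_even)
```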
16 Measure of Central Tendency
- 30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21
- What is the median value of the above data set?
17 Measure of Central Tendency
- Mean
- Often referred to as the average.
- Add up all of the observations and then divide by the number of observations.
- Example: (1 + 2 + 3 + 4 + 5) / 5 = 3
18 Measure of Central Tendency
- 30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21
- What is the mean of the above data set?
19 Answers
- Mode: 26
- Median: 26
- Mean: 25.82, or about 26
- In this data set the mean, median, and mode are all more or less equal. This does not always happen, though.
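These answers can be reproduced with Python's standard `statistics` module. The sketch below is one way to check them, with illustrative variable names. The ages sum to 284, so the mean is 284 / 11, which is about 25.82.

```python
from statistics import mean, median, multimode

# Ages from the example table, in descending order.
ages = [30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21]

age_modes = multimode(ages)   # most frequent value(s)
age_median = median(ages)     # middle (6th) of the 11 ordered values
age_mean = mean(ages)         # 284 / 11

print(age_modes, age_median, round(age_mean, 2))
```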
20 Which is the best?
- This depends on the type of descriptive
information you want.
21 Measure of Dispersion
- Dispersion is a measure of the spread of a data
set.
22 Measure of Dispersion
- Range
- Gives the researcher an idea of how spread out and diverse their data are.
- Describes the highest and lowest values in a data set.
- All values within a data set fall within the range.
23 Measure of Dispersion
- 30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21
- What is the range of the above data set?
- 21 to 30
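A minimal sketch of the range calculation for the same ages follows. Reporting the width of the range (highest minus lowest) is an addition here; the slide itself reports only the two end points.

```python
# Ages from the example table.
ages = [30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21]

lowest = min(ages)
highest = max(ages)
# The width of the range is sometimes reported as a single number.
width = highest - lowest

print(f"range: {lowest} to {highest} (width {width})")
```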
24 Measure of Dispersion
- Standard deviation and variance are also measures of dispersion.
- Use statistical programs to help you calculate these values.
25 Measures of Dispersion: Variance and Standard Deviation
- Standard Deviation and Variance
- The standard deviation is interpreted as the typical distance of the sample points from their center (the mean). The variance is the square of the standard deviation.
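As one example of letting a program do this calculation, the sketch below uses Python's standard `statistics` module on the ages from the example table. The slides do not say whether the sample or population formula is intended, so both are shown; that distinction is an addition here.

```python
from statistics import pstdev, pvariance, stdev, variance

# Ages from the example table.
ages = [30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21]

# Sample versions divide the squared deviations by n - 1.
sample_variance = variance(ages)
sample_sd = stdev(ages)

# Population versions divide by n instead.
population_variance = pvariance(ages)
population_sd = pstdev(ages)

print(f"sample:     variance {sample_variance:.2f}, SD {sample_sd:.2f}")
print(f"population: variance {population_variance:.2f}, SD {population_sd:.2f}")
```

In both cases the standard deviation is the square root of the variance, which puts it back in the same units as the data (years of age here).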
26 Bringing it all together
- Imagine that we are studying the diameter of pine trees in a 200,000-acre forest with approximately 4,000,000 trees.
- Is it possible to measure ALL of the trees in the entire forest?
- Answer: NO!!
27 Bringing it all together
- How can we then determine the general diameter of a tree in such a large forest with so many trees?
- We measure fewer trees (our representative sample), let's say 100 trees.
- Why did we pick fewer trees to measure?
28 Bringing it all together
- After we've measured our trees, we decide which measure of central tendency we want to use.
- In this experiment we'll use the mean.
- How can knowing the mean tree diameter of 100 trees tell me anything about a forest with 4,000,000 trees?
29 Bringing it all together
- Knowing the mean diameter of 100 trees can tell us a lot.
- However, we can never know for sure what the average tree diameter is in the 200,000-acre forest without measuring ALL of the trees. This would be the True Average.
30 Bringing it all together
- Therefore, since we cannot know the True Average, standard deviation helps us to know where the true average may lie based on our sample.
- Let's say the average diameter of a pine tree in the 100-tree sample is 2.5 feet.
31 Bringing it all together
- Based on the data in the 100-tree sample, we get a standard deviation of 1.3 feet.
- Based on this we now know that the True Mean tree diameter may lie between 1.2 feet and 3.8 feet (the sample mean of 2.5 feet plus or minus one standard deviation).
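The arithmetic behind this range can be sketched as below, using only the numbers given on the slides. The second half goes beyond the slides: it computes the standard error of the mean (standard deviation divided by the square root of the sample size). This is the quantity usually used to say how far the true mean is likely to be from the sample mean, whereas the standard deviation describes the spread of individual trees. Treat it as a supplementary sketch rather than the course's method.

```python
import math

# Values stated on the slides.
sample_mean = 2.5   # feet
sample_sd = 1.3     # feet
n = 100             # trees measured

# Mean plus or minus one standard deviation, as on the slide.
sd_low = sample_mean - sample_sd
sd_high = sample_mean + sample_sd

# Not on the slides: standard error of the mean and an approximate
# 95% interval for the true mean (mean +/- 2 standard errors).
standard_error = sample_sd / math.sqrt(n)
ci_low = sample_mean - 2 * standard_error
ci_high = sample_mean + 2 * standard_error

print(f"mean +/- 1 SD:   {sd_low:.2f} to {sd_high:.2f} feet")
print(f"mean +/- 2 SE:   {ci_low:.2f} to {ci_high:.2f} feet")
```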
35 Inferential Statistics
- Makes comparisons between data sets and then infers whether the two sets are significantly different from one another.
- Chance will always play a role.
- Attempts to determine whether the two means truly differ or whether the difference is just due to random chance.
36 Inferential Statistics
- A coach wants to know if the coin flip is fixed.
- Ideally, if I flip a coin there is an equal chance (probability) that either side will appear on top.
- To determine if this is true, I flip the coin and count the number of times heads comes up.
37 Binomial Distribution
[Animated graph]
I flip the coin 10 times and these are the probabilities that we get based on our flips. Is ten flips enough? How about more?
38 Binomial Distribution
Imagine that I flip it 50 times. Notice that the distribution is smoother than the previous distribution.
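The probabilities behind these two graphs can be sketched with the binomial formula. The helper name `binomial_pmf` is hypothetical, and a fair coin (probability of heads 0.5) is assumed, as in the coach example.

```python
from math import comb

def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# One probability for each possible number of heads.
ten_flips = [binomial_pmf(k, 10) for k in range(11)]
fifty_flips = [binomial_pmf(k, 50) for k in range(51)]

for k, prob in enumerate(ten_flips):
    print(f"{k:2d} heads in 10 flips: {prob:.4f}")
```

With 50 flips there are 51 possible outcomes instead of 11, so the bars sit closer together and the shape looks smoother.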
39 Probability
- Normal curves are useful because they allow us to make statistical conclusions about being a certain distance from the center or mean.
- About 68% of all values are within one standard deviation of the mean, 95% within two, and 99.7% within three.
- The difficulty is knowing when to conclude that an occurrence is not due to random chance.
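These percentages can be checked against a normal curve with `NormalDist` from Python's standard `statistics` module. The helper name `share_within` is illustrative only.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The exact shares are slightly above 68% and 95%, and the share within three standard deviations is closer to 99.7% than to 99%.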
40 Probability
- So where do we draw the line between a difference that is due to random chance and one that is not?
- Statisticians decided that two standard deviations, or 95%, would be the cutoff.
- This means that a difference this large would arise from random chance alone only about 5% of the time.
41 Probability
- Whenever a statistical test returns a probability value (p-value) of 0.05 (5%) or less, we reject the null hypothesis.
- The null hypothesis states that the data fit the distribution.
42 Probability
[Graph labels: 33 Heads; 0.05 Cutoff]
Back to the coin toss: if the result of 50 flips is 33 heads and 17 tails, is it part of the distribution? NO.
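One way to check this conclusion is to add up the binomial probabilities of results at least as extreme as 33 heads. The sketch below assumes a fair coin as the null hypothesis and a two-sided test (33 or more heads, or 17 or fewer). The slides do not specify the exact test, so those choices and the helper name `binomial_pmf` are assumptions.

```python
from math import comb

def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n_flips, heads = 50, 33

# Chance of 33 or more heads from a fair coin.
upper_tail = sum(binomial_pmf(k, n_flips) for k in range(heads, n_flips + 1))

# A fair coin is symmetric, so 17 or fewer heads is equally likely;
# doubling the upper tail gives a two-sided p-value.
p_value = 2 * upper_tail

reject_null = p_value <= 0.05
print(f"p-value: {p_value:.4f}, reject fair-coin hypothesis: {reject_null}")
```

The p-value comes out at roughly 0.03, below the 0.05 cutoff, which agrees with the slide's answer.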
43 t-test
- The t-test is one of many types of inferential statistics that will allow you to compare two different groups of data and determine if they are statistically different.
- This test asks the question: do the data sets have the same distribution?
44 t-test
- Imagine we want to compare the growth rates of two populations of fish raised on different food types.
- We use the t-test to compare the means and dispersion of the data sets to determine if the growth rates have the same distribution.
45 t-test
The t-test looks at the ratio of the difference in the means of the two groups to the variability of the data of the two groups.
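A minimal sketch of this ratio follows. The slides do not say which form of the t-test is used, so the unequal-variance (Welch) form is assumed here. The function name `welch_t_statistic` and the growth-rate numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_statistic(group_a, group_b):
    """Difference in means divided by its estimated standard error."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical growth rates for fish raised on two food types.
food_a = [1.0, 2.0, 3.0, 4.0]
food_b = [2.0, 3.0, 4.0, 5.0]

t_stat = welch_t_statistic(food_a, food_b)
print(f"t = {t_stat:.3f}")
```

A t-statistic far from zero means the difference in means is large relative to the variability. Turning it into a probability still requires a t-distribution table or a statistics package.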
46 t-test
- This t-statistic ratio allows us to determine the probability that the differences between the two sets of data are due to random chance.
- Our t-statistic is compared to a probability table and our probability value is determined.
- If we get a probability of less than 0.05, we conclude that the differences between the two sets of data are not due to random chance and that the two sets of data are statistically different.
47 t-test
- We are always testing the null hypothesis.
- Therefore the hypothesis we are actually testing under this experimental design is:
- H0: The growth rate of the two fish populations will not differ significantly.