SCIENCE 1101: SCIENCE, SOCIETY and the ENVIRONMENT I


Slides: 48

Provided by: davidh2


Transcript and Presenter's Notes

1
SCIENCE 1101: SCIENCE, SOCIETY and the
ENVIRONMENT I

Lecture 4

2
Statistics

What is Statistics?

Statistics is a branch of mathematics that allows
us to analyze trends in data.
Statistics is often referred to as the science of
data.

Statistics (and science also) involves:
Collecting data
Classifying data
Summarizing data
Organizing data
Analyzing data
Interpreting data

5
Types of data

Quantitative: observations made on a numerical
scale.
Qualitative: non-numerical data that can only be
classified into one of a group of categories.

6
Example

Group  Gender  Age  Height  Hair Color
1      M       26   5'10"   Black
2      F       30   5'5"    Black
3      F       23   4'9"    Blonde
4      M       26   5'8"    Brown
5      F       24   5'1"    Red
6      M       28   6'4"    Brown
7      M       21   5'11"   Black
8      F       27   5'10"   Brown
9      M       29   5'5"    Red
10     F       24   5'10"   Blonde
11     F       26   5'2"    Brown

7
Goals of statistics

Describe data sets (populations or samples).
Use sample data to make inferences about a
population.

8
Two general branches of statistics

Descriptive statistics: The branch of statistics
devoted to the organization, summarization, and
description of data sets.
Inferential statistics: The branch of statistics
concerned with using data to make an inference
about a population or group of data.

9
Descriptive Statistics
10

Measure of Central Tendency: A number that
describes the center of a data set's distribution.
This value is useful because it shows the
researcher where the data are most
concentrated, which helps in deciding
what the normal (typical) condition is.

11
Measure of Central Tendency

To organize the data, place the values of a data
set into descending order.
26, 30, 23, 26, 24, 28, 21, 27, 29, 24, 26
Becomes
30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21
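
As a small illustration, the ordering step can be reproduced in Python with the built-in sorted function. The variable names are chosen here for illustration and do not come from the lecture.

```python
# Ages from the example table, in the order they were recorded.
ages = [26, 30, 23, 26, 24, 28, 21, 27, 29, 24, 26]

# sorted() returns a new list; reverse=True gives descending order.
ages_descending = sorted(ages, reverse=True)
print(ages_descending)
```

Ascending order would work just as well for finding the median and mode; descending order simply matches the slide.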

12
Measure of central tendency

Mean
Median
Mode

13
Measure of Central Tendency

Mode
The number that occurs most often in a data set.
Does not need to be near the center of the data
set.
A data set can have more than one mode or no
mode.

14
Measure of Central Tendency

30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21
What is the mode of the above data set?

15
Measure of Central Tendency

Median
If the number of observations is odd, the value
that occurs in the middle of the ordered data set is the
median.
If the number of observations is even, the mean
of the two middle observations is the
median.

16
Measure of Central Tendency

30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21
What is the median value of the above data set?

17
Measure of Central Tendency

Mean
Often referred to as the average
Add up all of the observations and then divide by
the number of observations.
(1 + 2 + 3 + 4 + 5) / 5 = 3

18
Measure of Central Tendency

30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21
What is the mean of the above data set?

19
Answers

Mode = 26
Median = 26
Mean = 25.82 ≈ 26
In this data set the Mean, Median and Mode are
all more or less equal. This does not always
happen though.
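
The three answers can be checked with Python's standard statistics module. This is only a sketch for verifying the arithmetic; the lecture does not prescribe any particular tool.

```python
import statistics

# The ordered ages from the example data set.
ages = [30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21]

mode = statistics.mode(ages)      # most frequent value
median = statistics.median(ages)  # middle value of the ordered data
mean = statistics.mean(ages)      # sum of the values divided by their count

print("mode:", mode)
print("median:", median)
print("mean:", round(mean, 2))
```

The sum of the eleven ages is 284, so the mean is 284 / 11, which rounds to 25.82.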

20
Which is the best?

This depends on the type of descriptive
information you want.

21
Measure of Dispersion

Dispersion is a measure of the spread of a data
set.

22
Measure of Dispersion

Range
Gives the researcher an idea of how spread out and
diverse their data are.
Reported as the lowest and highest values in
a data set.
All values within a data set fall within the
range.

23
Measure of Dispersion

30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21
What is the range of the above data set?
21 to 30

24
Measure of Dispersion

Standard Deviation and variance are also measures
of dispersion.
Use statistical programs to help you calculate
these values.

25
Measures of Dispersion: Variance and Standard
Deviation

Standard Deviation and Variance
Standard deviation can be interpreted as roughly the
typical distance of the sample points from their mean.
Variance is the square of the standard deviation.
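
A minimal sketch of these dispersion measures for the age data, using Python's statistics module as one example of a "statistical program". It assumes the ages are treated as a sample, so the variance divides by n - 1; the lecture does not say which convention it intends.

```python
import statistics

ages = [30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21]

# Range: the lowest and highest observations.
value_range = (min(ages), max(ages))

# Sample variance: sum of squared distances from the mean, divided by n - 1.
sample_variance = statistics.variance(ages)

# Sample standard deviation: the square root of the sample variance.
sample_sd = statistics.stdev(ages)

print("range:", value_range)
print("variance:", round(sample_variance, 2))
print("standard deviation:", round(sample_sd, 2))
```

If the eleven people were the entire population of interest, statistics.pvariance and statistics.pstdev (which divide by n) would be the matching calls.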

26
Bringing it all together

Imagine that we are studying the diameter of pine
trees in a 200,000-acre forest with
approximately 4,000,000 trees.
Is it possible to measure ALL of the trees in the
entire forest?
Answer: NO!

27
Bringing it all together

How, then, can we determine the typical tree
diameter in such a large forest with
so many trees?
We measure fewer trees (our representative
sample), let's say 100 trees.
Why did we pick fewer trees to measure?

28
Bringing it all together

After we've measured our trees, we
decide which measure of central tendency we
want to use.
In this experiment we'll use the mean.
How can knowing the mean tree diameter of 100
trees tell me anything about a forest with
4,000,000 trees?

29
Bringing it all together

Knowing the mean diameter of 100 trees can tell
us a lot.
However, we can never know for sure what the
average tree diameter is in the 200,000-acre
forest without measuring ALL of the trees. This
would be the True Average.
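
The idea can be tried out with a small simulation. Everything in this sketch is hypothetical: the forest is simulated rather than measured, it is scaled down to 40,000 trees so it runs quickly, and the diameters are drawn from a normal distribution with an assumed center of 2.5 feet and spread of 1.3 feet.

```python
import random
import statistics

random.seed(1101)  # fixed seed so the sketch is repeatable

# Hypothetical forest: simulated diameters in feet (NOT real data).
# Values are floored at 0.1 ft so that no tree has a negative diameter.
forest = [max(0.1, random.gauss(2.5, 1.3)) for _ in range(40_000)]

# Our representative sample: 100 trees chosen at random, without replacement.
sample = random.sample(forest, 100)

true_mean = statistics.mean(forest)    # only knowable because the forest is simulated
sample_mean = statistics.mean(sample)  # what a field crew could actually measure

print(f"true mean:   {true_mean:.2f} ft")
print(f"sample mean: {sample_mean:.2f} ft")
```

In a real forest only the sample mean is available; the simulation just makes it possible to see how close a random sample of 100 tends to come.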

30
Bringing it all together

Therefore, since we cannot know the True Average,
the standard deviation of our sample helps us estimate
where the True Average may lie.
Let's say the mean diameter of the 100 sampled pine
trees is 2.5 feet.

31
Bringing it all together

Based on the data in the 100 tree sample, we get
a standard deviation of 1.3 feet.
One standard deviation on either side of the sample mean
spans 1.2 to 3.8 feet (2.5 ± 1.3), so the True Mean tree
diameter is expected to lie within that range.

33

Take a break

Inferential Statistics

35
Inferential Statistics

Makes comparisons between data sets and then
infers whether the two sets are significantly
different from one another.
Chance always plays a role.
Attempts to determine whether the two means truly
differ or whether the difference is just due to random
chance.

36
Inferential Statistics

A coach wants to know if the coin flip is fixed.
Ideally, if I flip a coin there is an equal
chance (probability) that either side
will appear on top.
To determine if this is true, I flip the coin and
count the number of times heads comes up.

37
Binomial Distribution
[Animated graph]
I flip the coin 10 times and these are the
probabilities that we get based on our flips. Is
ten flips enough? How about more?
38
Binomial Distribution
Imagine that I flip it 50 times. Notice that the
distribution is smoother than the previous
distribution.
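
The probabilities behind these two graphs can be computed directly. The sketch below assumes a fair coin (probability of heads 0.5); the function name binomial_pmf is chosen here for illustration.

```python
from math import comb

def binomial_pmf(n, k, p=0.5):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

for n in (10, 50):
    # One probability for each possible number of heads, 0 through n.
    probs = [binomial_pmf(n, k) for k in range(n + 1)]
    most_likely = max(range(n + 1), key=lambda k: probs[k])
    print(f"{n} flips: most likely number of heads = {most_likely}, "
          f"probability = {probs[most_likely]:.3f}")
```

With 50 flips there are 51 possible outcomes instead of 11, which is why the plotted distribution looks smoother.
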
39
Probability

Normal curves are useful because they allow us to
make statistical conclusions about being a
certain distance from the center or mean.
About 68% of all values are within one standard
deviation, about 95% within two, and about 99.7% within three.
The difficulty is knowing when to conclude an
occurrence is not due to random chance.
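
These percentages can be checked against the standard normal curve with Python's statistics.NormalDist. The helper name share_within is an illustrative choice, not something from the lecture.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The same fractions hold for any normal curve, whatever its mean and standard deviation, because the distance is measured in standard deviations.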

40
Probability

So where do we decide that a difference is
not due to random chance?
Statisticians decided that two standard
deviations, or 95%, would be the cutoff.
This means that, if only random chance were at work,
a difference this large would appear about 5% of the time.

41
Probability

Whenever a statistical test returns a probability
value (p-value) of 0.05 (5%) or less, we reject
the null hypothesis.
The null hypothesis states that the data fit the
expected distribution.

42
Probability
[Figure labels: 33 Heads; 0.05 Cutoff]
Back to the coin toss: if the result of 50 flips is 33
heads and 17 tails, is it part of the
distribution? NO.
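
The coin toss verdict can be checked with an exact binomial calculation. This sketch assumes a fair coin under the null hypothesis and doubles the one-sided probability to cover an equally extreme excess of tails; the slide itself only shows the result graphically, so this is one reasonable way to do the arithmetic, not necessarily the lecturer's method.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more heads in n flips when P(heads) = p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_flips, heads = 50, 33

# Chance of 33 or more heads from a fair coin.
p_one_sided = prob_at_least(heads, n_flips)

# A fair coin is symmetric, so 33 or more tails is equally likely (assumption:
# we care about unfairness in either direction).
p_two_sided = 2 * p_one_sided

reject_null = p_two_sided <= 0.05
print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
print("reject the null hypothesis:", reject_null)
```

Both probabilities fall below the 0.05 cutoff, which agrees with the slide's answer.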
43
t-test

The t-test is one of many types of inferential
statistics that will allow you to compare two
different groups of data and determine if they
are statistically different.
This test asks the question: Do the data sets
have the same distribution?

44
t-test

Imagine we want to compare the growth rate of two
populations of fish raised on different food
types.
We use the t-test to compare the means and
dispersion of the two data sets to determine if the
growth rates have the same distribution.

45
t-test
The t-test looks at the ratio of the differences
in the means of the two groups to the variability
of the data of the two groups.
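
A minimal sketch of this ratio, assuming the unequal-variance (Welch) form of the t-statistic; the lecture does not say which version of the t-test it uses. The growth rates below are invented purely for illustration and are not data from the course.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical growth rates (grams per week) for fish on two food types.
food_a = [2.1, 2.5, 2.3, 2.8, 2.6, 2.4]
food_b = [1.8, 2.0, 1.7, 2.2, 1.9, 2.1]

def t_statistic(x, y):
    """Difference in means divided by the combined variability of the two groups."""
    difference = mean(x) - mean(y)
    variability = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return difference / variability

t = t_statistic(food_a, food_b)
print(f"t = {t:.2f}")
```

A large t (in absolute value) means the difference in means is big relative to the scatter within the groups. Turning t into a probability still requires a t-table or a statistics package.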
46
t-test

This t-statistic ratio allows us to determine the
probability that differences as large as those between
the two sets of data would arise from
random chance.
Our t-statistic is compared to a probability
table and our probability value is determined.
If we get a probability of less than 0.05, we conclude
that the differences between the two sets of data
are unlikely to be due to random chance alone, and
the two sets of data are statistically different.

47
t-test

We are always testing the null hypothesis.
Therefore the hypothesis we are actually testing
under this experimental design is
H0: The growth rates of the two fish populations
do not differ significantly.
