SCIENCE 1101: SCIENCE, SOCIETY and the ENVIRONMENT I

Slides: 48
Provided by: davidh2

1
SCIENCE 1101: SCIENCE, SOCIETY and the
ENVIRONMENT I
  • Lecture 4

2
Statistics
  • What are Statistics?

3
  • Statistics is a type of mathematics that allows
    us to analyze trends in data.
  • Statistics is often referred to as the science of
    data.

4
  • Statistics (and science also) involves:
  • Collecting data
  • Classifying data
  • Summarizing data
  • Organizing data
  • Analyzing data
  • Interpreting data

5
Types of data
  • Quantitative: observations made on a numerical
    scale.
  • Qualitative: non-numerical data that can only be
    classified into one of a group of categories.

6
Example
  • Group  Gender  Age  Height  Hair Color
  • 1      M       26   5'10"   Black
  • 2      F       30   5'5"    Black
  • 3      F       23   4'9"    Blonde
  • 4      M       26   5'8"    Brown
  • 5      F       24   5'1"    Red
  • 6      M       28   6'4"    Brown
  • 7      M       21   5'11"   Black
  • 8      F       27   5'10"   Brown
  • 9      M       29   5'5"    Red
  • 10     F       24   5'10"   Blonde
  • 11     F       26   5'2"    Brown
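The split between quantitative and qualitative data in this example can be sketched in a few lines of Python. This is only an illustration: the tuple layout and variable names are assumptions, and the heights are read as feet and inches (so 510 is taken to mean 5 ft 10 in). Quantitative columns support arithmetic; qualitative columns can only be counted by category.

```python
from collections import Counter

# (gender, age in years, height as (feet, inches), hair color)
# Layout and names are illustrative; heights assume a feet/inches reading.
SURVEY = [
    ("M", 26, (5, 10), "Black"),
    ("F", 30, (5, 5), "Black"),
    ("F", 23, (4, 9), "Blonde"),
    ("M", 26, (5, 8), "Brown"),
    ("F", 24, (5, 1), "Red"),
    ("M", 28, (6, 4), "Brown"),
    ("M", 21, (5, 11), "Black"),
    ("F", 27, (5, 10), "Brown"),
    ("M", 29, (5, 5), "Red"),
    ("F", 24, (5, 10), "Blonde"),
    ("F", 26, (5, 2), "Brown"),
]

# Quantitative data: numbers on a scale, so arithmetic makes sense.
ages = [age for _, age, _, _ in SURVEY]
heights_in = [feet * 12 + inches for _, _, (feet, inches), _ in SURVEY]

# Qualitative data: categories, so all we can do is count members.
gender_counts = Counter(gender for gender, _, _, _ in SURVEY)
hair_counts = Counter(hair for _, _, _, hair in SURVEY)

print("tallest (inches):", max(heights_in))
print("gender counts:", dict(gender_counts))
print("hair color counts:", dict(hair_counts))
```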

7
Goals of statistics
  • To describe data sets (populations or samples).
  • To use sample data to make inferences about a
    population or group of samples.

8
Two general branches of statistics
  • Descriptive statistics: The branch of statistics
    devoted to the organization, summarization, and
    description of data sets.
  • Inferential statistics: The branch of statistics
    concerned with using data to make an inference
    about a population or group of data.

9
Descriptive Statistics

10
  • Measure of Central Tendency: A number that
    describes the center of a data set's
    distribution.
  • This value is useful because it shows the
    researcher where the data are most
    concentrated. This allows researchers to
    decide what the normal condition is.

11
Measure of Central Tendency
  • To organize the data, place the values of a data
    set into descending order.
  • 26, 30, 23, 26, 24, 28, 21, 27, 29, 24, 26
  • Becomes
  • 30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21

12
Measure of central tendency
  • Mean
  • Median
  • Mode

13
Measure of Central Tendency
  • Mode
  • The number that occurs most often in a data set.
  • Does not need to be near the center of the data
    set.
  • A data set can have more than one mode or no
    mode.

14
Measure of Central Tendency
  • 30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21
  • What is the mode of the above data set?

15
Measure of Central Tendency
  • Median
  • If the number of observations is odd, the value
    that occurs in the middle of the ordered data set
    is the median.
  • If the number of observations is even, the mean
    between the two middle observations is the
    median.

16
Measure of Central Tendency
  • 30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21
  • What is the median value of the above data set?

17
Measure of Central Tendency
  • Mean
  • Often referred to as the average
  • Add up all of the observations and then divide by
    the number of observations.
  • (1 + 2 + 3 + 4 + 5) / 5 = 3

18
Measure of Central Tendency
  • 30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21
  • What is the mean of the above data set?

19
Answers
  • Mode 26
  • Median 26
  • Mean 284 / 11 = 25.82, or about 26
  • In this data set the Mean, Median and Mode are
    all more or less equal. This does not always
    happen though.
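These answers can be checked with Python's standard `statistics` module. This is a minimal sketch rather than part of the lecture; the variable names are illustrative, and `multimode` is used so that a data set with several modes would report all of them.

```python
import statistics

# Ages from the example data set, in the order they were collected.
ages = [26, 30, 23, 26, 24, 28, 21, 27, 29, 24, 26]

# Organize the data in descending order, as on the slide.
ordered = sorted(ages, reverse=True)

modes = statistics.multimode(ages)   # every value tied for most frequent
median = statistics.median(ages)     # middle value of the ordered data
mean = statistics.mean(ages)         # sum of values / number of values

print(ordered)
print("mode:", modes)                # [26]
print("median:", median)             # 26
print("mean:", round(mean, 2))       # 25.82
```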

20
Which is the best?
  • This depends on the type of descriptive
    information you want.

21
Measure of Dispersion
  • Dispersion is a measure of the spread of a data
    set.

22
Measure of Dispersion
  • Range
  • Gives the researcher an idea of how spread out
    and diverse their data are.
  • Describes the highest and lowest values in
    a data set.
  • All values within a data set fall within the
    range.

23
Measure of Dispersion
  • 30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21
  • What is the range of the above data set?
  • 21 to 30
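As a small sketch, the range can be read off with `min` and `max`. The single-number width (highest minus lowest) is an extra that the slide does not show.

```python
ages = [30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21]

lowest, highest = min(ages), max(ages)
width = highest - lowest  # the range expressed as one number

print(f"range: {lowest} to {highest} (width {width})")  # range: 21 to 30 (width 9)
```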

24
Measure of Dispersion
  • Standard Deviation and variance are also measures
    of dispersion.
  • Use statistical programs to help you calculate
    these values.

25
Measures of Dispersion: Variance and Standard
Deviation
  • Standard Deviation and Variance
  • The standard deviation can be interpreted as
    roughly the typical distance of the sample
    points from their center (the mean); the
    variance is the average squared distance.
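To make these two measures concrete, here is a sketch that computes them for the age data used earlier, once by hand and once with Python's `statistics` module. It assumes the sample versions of the formulas (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` use; the slides do not say which version is intended.

```python
import math
import statistics

ages = [30, 29, 28, 27, 26, 26, 26, 24, 24, 23, 21]

mean = statistics.mean(ages)

# By hand: average squared distance from the mean (sample version, n - 1).
squared_distances = [(x - mean) ** 2 for x in ages]
variance_by_hand = sum(squared_distances) / (len(ages) - 1)
sd_by_hand = math.sqrt(variance_by_hand)

# The same quantities from the standard library.
variance = statistics.variance(ages)
sd = statistics.stdev(ages)

print("variance:", round(variance, 2))
print("standard deviation:", round(sd, 2))
```

The standard deviation is in the same units as the data (years here), which is why it is usually easier to interpret than the variance.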

26
Bringing it all together
  • Imagine that we are studying the diameter of pine
    trees in a 200,000-acre forest with
    approximately 4,000,000 trees.
  • Is it possible to measure ALL of the trees in the
    entire forest?
  • Answer: NO!!

27
Bringing it all together
  • How can we then determine the typical diameter
    of a tree in such a large forest with so many
    trees?
  • We measure fewer trees (our representative
    sample), let's say 100 trees.
  • Why did we pick fewer trees to measure?

28
Bringing it all together
  • After we've measured our trees, we decide which
    measure of central tendency we want to use.
  • In this experiment we'll use the mean.
  • How can knowing the mean tree diameter of 100
    trees tell me anything about a forest with
    4,000,000 trees?

29
Bringing it all together
  • Knowing the mean diameter of 100 trees can tell
    us a lot.
  • However, we can never know for sure what the
    average tree diameter is in the 200,000-acre
    forest without measuring ALL of the trees. This
    would be the True Average.

30
Bringing it all together
  • Therefore, since we cannot know the True Average,
    the standard deviation helps us estimate where
    the true average may lie, based on our sample.
  • Let's say the average diameter of a pine tree in
    the 100-tree sample is 2.5 feet.

31
Bringing it all together
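  • Based on the data in the 100-tree sample, we get
    a standard deviation of 1.3 feet.
  • Based on this, the True Mean tree diameter may
    lie between 1.2 feet and 3.8 feet (2.5 ± 1.3).
    This is a deliberately wide, rough range: the
    uncertainty in the mean itself shrinks with
    sample size (standard error = 1.3 / √100 =
    0.13 feet).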
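The arithmetic behind this interval can be sketched as follows. The first interval is the one from the slide (mean plus or minus one standard deviation). The standard error and the two-standard-error interval are additions that the lecture does not cover; they assume the 100 trees are a random sample and use the usual formula, standard deviation divided by the square root of the sample size.

```python
import math

sample_mean = 2.5   # feet, from the 100-tree sample
sample_sd = 1.3     # feet
n = 100             # trees measured

# Interval from the slide: one standard deviation either side of the mean.
one_sd_interval = (sample_mean - sample_sd, sample_mean + sample_sd)

# Assumption (not in the lecture): the uncertainty of the mean itself is
# described by the standard error, which shrinks as the sample grows.
standard_error = sample_sd / math.sqrt(n)
two_se_interval = (sample_mean - 2 * standard_error,
                   sample_mean + 2 * standard_error)

print("mean ± 1 SD:", [round(v, 2) for v in one_sd_interval])
print("standard error:", round(standard_error, 2))
print("mean ± 2 SE:", [round(v, 2) for v in two_se_interval])
```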

33
  • Take a break

34
  • Inferential Statistics

35
Inferential Statistics
  • Makes comparisons between data sets and then
    infers whether the two sets are significantly
    different from one another.
  • Chance always plays a role.
  • Attempts to determine whether the two means
    truly differ or whether the difference is just
    due to random chance.

36
Inferential Statistics
  • A coach wants to know if the coin flip is fixed.
  • Ideally, if I flip a coin, each side has an equal
    chance (probability) of appearing on top.
  • To determine if this is true, I flip the coin and
    count the number of times heads comes up.

37
Binomial Distribution
(Animated graph)
I flip the coin 10 times, and these are the
probabilities that we get based on our flips. Is
ten flips enough? How about more?
38
Binomial Distribution
Imagine that I flip it 50 times. Notice that the
distribution is smoother than the previous
distribution.
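The two graphs can be reproduced numerically. The sketch below computes the probability of every possible number of heads for 10 and for 50 flips, assuming a fair coin (probability 0.5 per flip); the function names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n independent flips."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def distribution(n, p=0.5):
    """Probabilities for 0, 1, ..., n heads."""
    return [binomial_pmf(k, n, p) for k in range(n + 1)]


ten = distribution(10)
fifty = distribution(50)

print("P(5 heads in 10 flips)  =", ten[5])      # 0.24609375
print("P(25 heads in 50 flips) =", round(fifty[25], 4))

# Crude text plot of the 10-flip distribution: one '#' per percent.
for heads, prob in enumerate(ten):
    print(f"{heads:2d} {'#' * round(prob * 100)}")
```

With 50 flips the probability is shared among 51 possible outcomes instead of 11, so each bar is lower and neighbouring bars differ by less, which is why the plotted curve looks smoother.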
39
Probability
  • Normal curves are useful because they allow us to
    make statistical conclusions about being a
    certain distance from the center or mean.
  • About 68% of all values are within one standard
    deviation, 95% within 2, and 99.7% within 3.
  • The difficulty is knowing when to conclude an
    occurrence is not due to random chance
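These percentages can be verified with `statistics.NormalDist` from the Python standard library. A minimal sketch, using the standard normal curve (mean 0, standard deviation 1); the helper name `fraction_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def fraction_within(k):
    """Fraction of values within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
# roughly 68.3%, 95.4%, and 99.7%
```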

40
Probability
  • So where do we draw the line between a real
    difference and random chance?
  • Statisticians decided that two standard
    deviations, or 95%, would be the cutoff.
  • This means that a difference this large would
    arise from random chance alone only about 5% of
    the time.

41
Probability
  • Whenever a statistical test returns a probability
    value (p-value) of 0.05 (5%) or less, we reject
    the null hypothesis.
  • The null hypothesis states that the data fit the
    expected distribution.

42
Probability
Back to the coin toss: if the result of 50 flips is
33 heads and 17 tails, is it part of the
distribution? NO.
(Graph labels: 33 Heads; 0.05 Cutoff)
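The conclusion for 33 heads can be checked with an exact calculation. The sketch below is one way to do it, not necessarily the method behind the slide's graph: it assumes a fair coin as the null hypothesis and a two-sided test, counting every outcome at least as far from the expected 25 heads as the observed result. The function name is illustrative.

```python
from math import comb


def two_sided_p_value(heads, flips):
    """Chance, with a fair coin, of a result at least this far from flips/2."""
    expected = flips / 2
    distance = abs(heads - expected)
    # Count the equally likely head/tail sequences that are this extreme.
    extreme = sum(comb(flips, k) for k in range(flips + 1)
                  if abs(k - expected) >= distance)
    return extreme / 2**flips


p_value = two_sided_p_value(33, 50)
print("p-value:", round(p_value, 4))
print("reject the null hypothesis (fair coin)?", p_value <= 0.05)  # True
```

A result of 25 heads would give a p-value of 1.0, because every possible outcome is at least that close to the expected value.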
43
t-test
  • The t-test is one of many types of inferential
    statistics that will allow you to compare two
    different groups of data and determine if they
    are statistically different.
  • This test asks the question: do the data sets
    have the same distribution?

44
t-test
  • Imagine we want to compare the growth rate of two
    populations of fish raised on different food
    types.
  • We use the t-test to compare the means and
    dispersion of the two data sets to determine if
    the growth rates have the same distribution.

45
t-test
The t-test looks at the ratio of the difference
in the means of the two groups to the variability
of the data in the two groups.
46
t-test
  • The t-statistic ratio lets us find the
    probability that the differences between the two
    sets of data are due to random chance.
  • Our t-statistic is compared to a probability
    table to determine our probability value.
  • If we get a probability of less than 0.05, we
    conclude that the differences between the two
    sets of data are unlikely to be due to random
    chance alone, and the two sets of data are
    statistically different.
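The ratio and the table lookup can be sketched for the fish example. Everything specific here is an assumption made for illustration: the growth-rate numbers are invented, the test is the pooled-variance (Student's) two-sample t-test, and 2.228 is the standard table value for a two-tailed test at 0.05 with 10 degrees of freedom (6 + 6 - 2).

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Difference in means divided by its pooled standard error."""
    na, nb = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    # Pool the two sample variances, weighting by degrees of freedom.
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical growth rates (cm per month) for fish on two food types.
food_a = [2.1, 2.4, 2.2, 2.6, 2.3, 2.5]
food_b = [1.8, 1.9, 2.0, 1.7, 2.1, 1.9]

t_statistic = pooled_t_statistic(food_a, food_b)

# Table value: two-tailed, p = 0.05, 10 degrees of freedom.
CRITICAL_T_DF10 = 2.228
significant = abs(t_statistic) > CRITICAL_T_DF10

print("t =", round(t_statistic, 2))
print("statistically different at 0.05?", significant)
```

Comparing the t-statistic with the table value is equivalent to asking whether the p-value is below 0.05; statistical software usually reports the p-value directly.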

47
t-test
  • We are always testing the null hypothesis.
  • Therefore, the hypothesis we are actually testing
    under this experimental design is:
  • H0: The growth rate of the two fish populations
    will not differ significantly.