Statistics Class 2 - PowerPoint PPT Presentation

Title: Statistics Class 2

Description: ... been identified by researchers: Broca's, conduction, and anomic. ... Anomic 10 10/22 = .455 1.00. Totals 22 1.00 1.00. September 22, 2000. Anabel Quan Haase ...

Slides: 73
Provided by: famili4

Transcript and Presenter's Notes


1
Statistics Class 2
  • Descriptive Statistics
  • Central Tendency, Variability, and Standard
    Deviation
  • Probability and Sampling
  • Frequency Distributions

2
Where have we been?
  • Looked at definition of statistics
  • Looked at key terms
  • The role of statistical thinking in science and
    society at large
  • Distinction between inferential and descriptive
    statistics
  • Distinction between quantitative and qualitative data
  • Reliability

3
Where are we going?
  • Talk about descriptive statistics
  • Graphs
  • Numerical methods
  • Aim of descriptive statistics

4
Describing Qualitative Data
  • Definition of qualitative data
  • Types of qualitative data
  • Nominal (e.g., yes/no coded as 0, 1): unordered categories
  • Ordinal (e.g., never, rarely, once a month, once a
    week, daily): ordered or ranked data

5
Key Terms
  • A class is one of several categories into which
    observations can be classified
  • Class frequency is the number of observations in
    a particular class
  • Class relative frequency is the class frequency
    in relation to all observations, i.e., divided by
    the total number of observations

6
Aphasia Example
  • Consider a study of aphasia published in the
    Journal of Communication Disorders (Mar. 1995).
    Aphasia is the "impairment or loss of the faculty
    of using or understanding spoken or written
    language." Three types of aphasia have been
    identified by researchers: Broca's, conduction,
    and anomic. They wanted to determine whether one
    type of aphasia occurs more often than any other,
    and, if so, how often. Consequently, they measured
    aphasia types for a sample of 22 adult
    aphasiacs. Table 2.1 gives the type of aphasia
    diagnosed for each aphasiac in the sample.

7
Summary Table for Data on 22 Aphasiacs
8
Calculating Relative Frequencies
Type of Aphasia   Frequency   Relative Frequency   Cumulative Relative Frequency
Broca's           5           5/22 = .227          .227
Conduction        7           7/22 = .318          .545
Anomic            10          10/22 = .455         1.00
Totals            22          1.00                 1.00
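
A minimal sketch of how the relative and cumulative relative frequencies can be computed from the class frequencies (5, 7, and 10 out of 22). The variable names are illustrative assumptions, not part of the original slides.

```python
from collections import Counter

# Class frequencies from the aphasia example: 5 Broca's, 7 conduction, 10 anomic.
diagnoses = ["Broca's"] * 5 + ["conduction"] * 7 + ["anomic"] * 10

counts = Counter(diagnoses)
n = len(diagnoses)  # 22 observations in total

cumulative = 0.0
rows = []
for aphasia_type in ["Broca's", "conduction", "anomic"]:
    frequency = counts[aphasia_type]
    relative = frequency / n   # class relative frequency
    cumulative += relative     # running total of relative frequencies
    rows.append((aphasia_type, frequency, round(relative, 3), round(cumulative, 3)))

for row in rows:
    print(row)
```

The cumulative column always ends at 1.00, which is a quick check that no observation was left out.
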
9
Example Problem Dimensions
  • Information systems design can progress toward
    meeting the needs of the population of
    decision-makers, managers, policy-makers, and
    interdisciplinary workers by attention to
    specifications obtained from the user's
    situation. The user situation is represented by
    problems and their dimensions. Problem dimensions
    are discussed to propose a new orientation for
    the design of information systems. (MacMullin &
    Taylor, The Information Society, 3, 91-111).

10
Bar Graph
  • Visual depiction of the relative frequency of a
    variable. Shows how many times each class appears
    in the sample.
  • Can also depict cumulative frequency or
    absolute frequency.

11
(No Transcript)
12
(No Transcript)
13
Figure 1. Types of Aphasia in 22 Adult Aphasiacs.
Source: Journal of Communication Disorders.
14
(No Transcript)
15
Graphical Methods for Describing Quantitative Data
  • Dot Plots
  • Stem-and-Leaf Display
  • Histograms

16
  • break

17
Dot Plots
  • In a dot plot the numerical value of each case is
    located on the horizontal axis.
  • When data values repeat, dots are placed one on
    top of the other.
  • Example

18
Stem-and-Leaf
  • A very easy and simple way of depicting data. The
    stem is the leading part of each value (for example,
    the whole-number part), whereas the leaf is the final
    digit (for example, the digit after the decimal point).
  • Example
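
A small sketch of a stem-and-leaf display, assuming values with one decimal place so that the whole-number part is the stem and the decimal digit is the leaf. The data values are hypothetical and not taken from the slides.

```python
from collections import defaultdict

# Hypothetical measurements with one decimal place (not from the slides).
values = [1.2, 1.5, 2.1, 2.1, 2.8, 3.4]

stems = defaultdict(list)
for value in sorted(values):
    # Work in tenths so that 2.8 becomes stem 2 and leaf 8.
    stem, leaf = divmod(round(value * 10), 10)
    stems[stem].append(leaf)

lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
```

Each row lists one stem followed by all of its leaves, so repeated values (2.1 appears twice here) show up as repeated leaves.
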

19
Relative Frequency Histogram
  • A relative frequency histogram has a vertical and
    a horizontal axis.
  • The vertical axis indicates the proportion or
    relative frequency of the data.
  • The horizontal axis represents the possible
    numerical values appearing in the data.
  • Example.

20
Measurement Classes
  • The horizontal axis is divided into intervals
    called measurement classes.
  • The intervals are of equal size.
  • To each interval a frequency and relative
    frequency can be assigned on the vertical axis.
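
One way to sketch measurement classes in code: divide the range of the data into equal-size intervals, count the observations in each, and divide by the total. The data and the choice of four classes are assumptions made for illustration.

```python
# Hypothetical measurements (not from the slides).
data = [1.0, 1.5, 1.7, 2.2, 3.1, 3.3, 3.6, 3.9, 4.8, 5.0]

num_classes = 4
low, high = min(data), max(data)
width = (high - low) / num_classes  # all intervals have the same size

frequencies = [0] * num_classes
for x in data:
    # The largest value is placed in the last class instead of opening a new one.
    index = min(int((x - low) / width), num_classes - 1)
    frequencies[index] += 1

relative = [f / len(data) for f in frequencies]

for i, (f, r) in enumerate(zip(frequencies, relative)):
    start = low + i * width
    print(f"{start:.1f} to {start + width:.1f}: frequency {f}, relative frequency {r:.2f}")
```

The relative frequencies are the bar heights of a relative frequency histogram and should sum to 1.
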

21
Types of Histograms
  • Absolute frequency
  • Relative frequency
  • Cumulative frequency
  • Percentage
  • Cumulative percentage
  • Absolute frequency including cumulative percentage

22
Numerical Measures of Central Tendency
  • Central tendency is the numerical indicator of
    where the data tends to cluster.
  • Variability indicates what the spread of the data
    is.
  • Example: height (general).

23
Mean
  • The mean is the most commonly used measure of
    central tendency.
  • It is the sum of all the data points divided by
    the number of points in the data set.
  • Example: height.

24
Calculating the Mean
  • Mean = sum of scores / number of observations
  • x̄ = Σxᵢ / n (the average)
  • Example: 2, 4, 6, 8
  • x̄ = 20/4 = 5
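
The same calculation as a short sketch, using the example scores 2, 4, 6, 8 from the slide. The variable names are illustrative.

```python
scores = [2, 4, 6, 8]

# Mean = sum of scores / number of observations
mean = sum(scores) / len(scores)
print(mean)
```
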

25
Mean is an estimator of the population center
  • The goodness of the inference depends on
  • the size of the sample
  • the spread of the data

26
Center and Spread
  • Insert picture p. 41

27
Median
  • When the data is arranged from the largest to the
    smallest number,
  • the median is the number in the middle if the
    number of observations is odd, and
  • the median is the mean of the two middle numbers
    if the number of observations is even.
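
A short sketch using Python's standard `statistics.median`, which follows the usual convention: the middle value for an odd number of observations, and the mean of the two middle values for an even number. The sample values are assumptions chosen for illustration.

```python
import statistics

odd_sample = [7, 1, 5]       # sorted: 1, 5, 7 -> the middle value is 5
even_sample = [2, 4, 6, 8]   # sorted: 2, 4, 6, 8 -> (4 + 6) / 2 = 5

median_odd = statistics.median(odd_sample)
median_even = statistics.median(even_sample)

print(median_odd, median_even)
```
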

28
Location of Median
  • Insert picture p. 43
  • The median cuts the data in two: 50% below and 50%
    above.

29
Mode
  • The mode is the most frequent data point in the data set.
  • Thus, it has the largest relative frequency.

30
Symbols
  • x̄ = sample mean
  • µ = population mean
  • M = median
  • n = sample size

31
Comparing central tendency measures: mean and
median (insert diagram p. 45)
32
Numerical Measures of Variability
  • Range is the largest point in the data set minus
    the smallest.
  • Sample variance is the sum of squared distances from
    the mean, divided by n - 1.
  • Sample standard deviation is the positive square
    root of the sample variance.

33
Why is the range not adequate?
34
(No Transcript)
35
Symbols
  • s² = sample variance
  • s = sample standard deviation
  • sigma squared (σ²) = population variance
  • sigma (σ) = population standard deviation

36
Interpreting the Standard Deviation
  • The meaning of the mean is very much dependent on
    the size of the standard deviation.
  • It provides information about the spread or the
    homogeneity of the sample.

37
Calculating Variance and SD
  • Example (variance): 2, 4, 6, 8
  • Mean = 20/4 = 5
  • Distance from mean: 2 - 5 = -3
  • 4 - 5 = -1, 6 - 5 = 1, 8 - 5 = 3
  • Add them all up: 0
  • What is the problem?
  • Mean is in the middle!

38
Solution
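  • Square the differences!
  • Distance from mean: 2 - 5 = -3
  • 4 - 5 = -1, 6 - 5 = 1, 8 - 5 = 3
  • (-3)² + (-1)² + (1)² + (3)² = 9 + 1 + 1 + 9 = 20
  • The sum of squared differences is 20.
  • Variance: s² = 20/(n - 1) = 20/3 ≈ 6.67
  • SD: s = √6.67 ≈ 2.58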
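
A step-by-step sketch of the calculation for the scores 2, 4, 6, 8, assuming the sample definitions (divide the sum of squared distances by n - 1, then take the positive square root). The variable names are illustrative.

```python
import math

scores = [2, 4, 6, 8]
n = len(scores)
mean = sum(scores) / n                              # 20 / 4

deviations = [x - mean for x in scores]             # distances from the mean
sum_of_deviations = sum(deviations)                 # always cancels out to zero
sum_of_squares = sum(d ** 2 for d in deviations)    # squaring removes the signs

sample_variance = sum_of_squares / (n - 1)          # divide by n - 1
sample_sd = math.sqrt(sample_variance)              # positive square root

print(sum_of_deviations, sum_of_squares)
print(round(sample_variance, 2), round(sample_sd, 2))
```

The raw distances sum to zero because the mean sits in the middle of the data, which is why the squared distances are used instead.
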

39
Two hypothetical data sets
  • Sample 1: 1, 2, 3, 4, 5
  • Sample 2: 2, 3, 3, 3, 4

40
Solution
  • x̄₁ = 3
  • x̄₂ = 3
  • Sample 1 distances: -2, -1, 0, 1, 2; squared: 4 + 1 + 0 + 1 + 4 = 10
  • Sample 2 distances: -1, 0, 0, 0, 1; squared: 1 + 0 + 0 + 0 + 1 = 2

41
SD
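  • Divide the sum of squares by n - 1 to get the variance,
    then take the square root of the variance
  • Sample 1: s² = 10/4 = 2.5, SD ≈ 1.58
  • Sample 2: s² = 2/4 = 0.5, SD ≈ 0.71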
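
As a cross-check, the two hypothetical samples can be compared with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample definitions (division by n - 1). This is a sketch for comparison, not part of the original slides.

```python
import statistics

sample_1 = [1, 2, 3, 4, 5]
sample_2 = [2, 3, 3, 3, 4]

results = {}
for name, sample in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    results[name] = (
        statistics.mean(sample),                 # same center for both samples
        statistics.variance(sample),             # sample variance (n - 1)
        round(statistics.stdev(sample), 2),      # sample standard deviation
    )

for name, (mean, variance, sd) in results.items():
    print(name, mean, variance, sd)
```

Both samples share the same mean, yet the second has a much smaller standard deviation, which reflects how tightly its values cluster.
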

42
Numerical Measures of Relative Standing
  • These are a series of descriptive measures of the
    relationship of a measurement to the rest of the
    data.
  • For example, the pth percentile is the value such
    that p% of the measurements fall below it and
    (100 - p)% fall above it.
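
A minimal sketch of relative standing: the helper below (a hypothetical function, not a library call) reports what percentage of the measurements fall below a given value. The scores are invented for illustration.

```python
def percent_below(data, value):
    """Return the percentage of measurements strictly below `value`."""
    return 100 * sum(1 for x in data if x < value) / len(data)

# Hypothetical test scores (not from the slides).
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

p = percent_below(scores, 90)
print(p)
```

Here 7 of the 10 scores lie below 90, so a score of 90 sits at about the 70th percentile of this sample.
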

43
  • Insert figure 2.23

44
(No Transcript)
45
Z-scores as numerical measures of relative
standing
  • A z-score indicates where a measurement stands
    in relation to the other measurements.
  • The sample z-score is calculated by z = (x - x̄) / s.
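
A short sketch of the sample z-score, z = (x - x̄) / s, applied to the scores 2, 4, 6, 8 used earlier in the lecture. It assumes the sample standard deviation (n - 1), as computed by `statistics.stdev`.

```python
import statistics

sample = [2, 4, 6, 8]
mean = statistics.mean(sample)   # sample mean
sd = statistics.stdev(sample)    # sample standard deviation (n - 1)

# Distance of each measurement from the mean, in units of standard deviations.
z_scores = [(x - mean) / sd for x in sample]

print([round(z, 2) for z in z_scores])
```

A negative z-score means the measurement is below the mean; a positive one means it is above.
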

46
Interpretation of z-scores for bell-shaped
variables
  • 1. Approximately 68% of the measurements will
    have z-scores between -1 and 1.
  • 2. Approximately 95% of the measurements will
    have z-scores between -2 and 2.
  • 3. Approximately 99.7% of the measurements will
    have z-scores between -3 and 3.
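
For an exactly normal (bell-shaped) variable, the proportion of measurements with z-scores between -k and k can be computed from the error function. The sketch below assumes a perfect normal distribution, so real data will only approximate these values; `share_within` is an illustrative helper name.

```python
import math

def share_within(k):
    """Proportion of a normal distribution with z-scores between -k and k."""
    return math.erf(k / math.sqrt(2))

shares = {k: round(share_within(k), 3) for k in (1, 2, 3)}
print(shares)
```

The three values come out at roughly 68%, 95%, and 99.7%.
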

47
  • Picture 2.25

48
Probability
49
Probability
  • Is the basis for inferential statistics
  • Take a die and roll it 10 times
  • What are the possible outcomes?
  • There are 6 possible sample points: 1, 2, 3, 4,
    5, and 6
  • Each result is called an observation
  • In this case: 10 observations

50
Coin Example
  • Take a coin and toss it: head or tail?
  • If you do this with 2 coins, what are the
    possible outcomes or sample points?
  • The process of observing is called an experiment

51
All Possible Sample Points for 2 Coins
  • 1. Observe HH
  • 2. Observe TT
  • 3. Observe TH
  • 4. Observe HT
  • All possible sample points are referred to as
    sample space.
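
The sample space for two coins can also be listed mechanically. This sketch uses `itertools.product` to pair every face of the first coin with every face of the second; the string encoding of the faces is an assumption for illustration.

```python
from itertools import product

# Every combination of faces for two coins, written as two-letter strings.
sample_space = ["".join(faces) for faces in product("HT", repeat=2)]

print(sample_space)
```

Changing `repeat=2` to `repeat=3` would list the 8 sample points for three coins.
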

52
Key Terms
  • An experiment is an act of observation that leads to
    a single outcome that cannot be predicted with
    certainty.
  • A sample point is the most basic outcome of an
    experiment.
  • The sample space of an experiment is the collection
    of all its sample points.
  • An event is a specific collection of sample points.

53
Experiments and Their Sample Spaces
  • Experiment: Observe the up face on a coin.
  • Sample Space
  • 1. Observe a head.
  • 2. Observe a tail.
  • S = {H, T}

54
  • Experiment: Observe the up face on a die.
  • Sample Space
  • 1. Observe a 1
  • 2. Observe a 2
  • 3. Observe a 3
  • 4. Observe a 4
  • 5. Observe a 5
  • 6. Observe a 6
  • S = {1, 2, 3, 4, 5, 6}

55
  • Experiment: Observe the up faces on two coins
  • Sample Space
  • 1. Observe HH
  • 2. Observe HT
  • 3. Observe TH
  • 4. Observe TT
  • S = {HH, HT, TH, TT}

56
Venn Diagrams
  • Coin: S = {H, T}
  • Die: S = {1, 2, 3, 4, 5, 6}
  • Two coins: S = {HH, HT, TH, TT}
57
Probability Rules for Sample Points
  • 1. All sample point probabilities must lie
    between 0 and 1
  • 2. The probabilities of all the sample points
    within a sample space must sum to 1.

58
Probability of an Event
  • The probability of an event A is calculated by
    summing the probabilities of the sample points in
    the sample space for A.

59
Steps for Calculating Probabilities of Events
  • 1. Define the experiment, that is, describe the
    process used to make an observation and the type
    of observation that will be recorded.
  • 2. List the sample points.
  • 3. Assign probabilities to the sample points.
  • 4. Determine the collection of sample points
    contained in the event of interest.
  • 5. Sum the sample point probabilities to get the
    event probability.
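
The five steps can be sketched for the two-coin experiment. The event "at least one head" and the assumption of fair coins (equal probability for every sample point) are illustrative choices, not taken from the slides.

```python
from fractions import Fraction
from itertools import product

# Steps 1 and 2: the experiment is tossing two coins; list the sample points.
sample_points = ["".join(faces) for faces in product("HT", repeat=2)]

# Step 3: assign probabilities (assumption: fair coins, so all points are equally likely).
probabilities = {point: Fraction(1, len(sample_points)) for point in sample_points}

# Step 4: collect the sample points in the event A = "at least one head".
event_a = [point for point in sample_points if "H" in point]

# Step 5: sum the sample point probabilities to get P(A).
p_a = sum(probabilities[point] for point in event_a)

print(event_a, p_a)
```

Using `Fraction` keeps the result exact and makes it easy to confirm that the sample point probabilities sum to 1.
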

60
Lecture on Sampling
  • Statistics Course
  • Faculty of Information Studies

61
Inferential statistics
  • Hypothesis testing: set a null hypothesis and an
    alternative hypothesis
  • Parameter estimation: determine the magnitude of
    a characteristic in a population

62
Two central topics are
  • 1. Random sampling
  • 2. Probability

63
Relationship between population and sample
  • The sample is a part of the population
  • The sample is taken from the population
  • Why is it that we do not use the population?
  • Why would it be more advantageous to use the
    population?
  • What is then the relationship between sample and
    population?

64
Representative sample
  • This is a key issue in guaranteeing that we can
    make inferences from our sample to the population
    of interest
  • Why is this an important concept?
  • Can you imagine what happens if a sample is
    non-representative?
  • What other concepts are relevant in
    experimentation?

65
Validity-Generalizability
  • These are two central concepts in methodology
    (mainly in testing)
  • Definition of validity: to measure what we intend
    to measure
  • Definition of generalizability: to be able to
    generalize the results to the population of
    interest to us
  • Why are these relevant concepts?

66
Random Sample
  • Definition: A random sample is defined as a
    sample selected from the population by a process
    that assures the following:
  • 1) each possible sample of a given size has an
    equal chance of being selected and
  • 2) all the members of the population have an
    equal chance of being selected into the sample
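
A minimal sketch of drawing a simple random sample by computer. The population of 100 numbered members and the fixed seed (used only to make the example repeatable) are assumptions for illustration.

```python
import random

# Hypothetical population: members numbered 1 to 100.
population = list(range(1, 101))

rng = random.Random(42)                  # seeded so the example is repeatable
sample = rng.sample(population, k=10)    # every member is equally likely to be chosen

print(sorted(sample))
```

`random.sample` never picks the same member twice, so this is sampling without replacement.
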

67
Sampling with replacement
  • Take members from population to include them in
    the sample and then put them back in the
    population, so that they can be drawn again
  • Why is this important?
  • If members are not put back, it is sampling without
    replacement (often done in practice). Why is it more
    often used?
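
The difference can be sketched with Python's `random` module: `choices` puts each member back before the next draw, while `sample` does not. The five-member population and the seed are assumptions for illustration.

```python
import random

# Hypothetical population of five members.
population = ["A", "B", "C", "D", "E"]
rng = random.Random(0)  # seeded so the example is repeatable

# With replacement: a member can be drawn more than once,
# so the sample can even be larger than the population.
with_replacement = rng.choices(population, k=8)

# Without replacement: each member can appear at most once.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```

Drawing 8 times from 5 members forces at least one repeat, which is only possible when sampling with replacement.
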

68
Why is random sampling important?
  • To be able to generalize from a sample to a
    population
  • The sample has to be representative of the
    population
  • Potential problem: biased sample!

69
Biased sample
  • A sample that is not representative of the
    population we intend to study
  • Example: a student population selected on the
    basis of email
  • Why is it not representative? Social aspect of
    technology

70
Techniques for random sampling
  • 1) Computer programs, e.g., Excel
  • 2) Table of random numbers (good method)
  • 3) Intuition
  • 4) Stratification

71
Tricks for evaluating papers
  • Look at the article in a critical manner
  • Look at the methodology section
  • Focus on sample-population relationship
  • Focus on how sample was selected
  • Focus on generalizability
  • Focus on validity
  • Focus on tests used

72
  • Focus on type of statistical analysis
  • Focus on level of significance
  • Focus on rationale
  • Think about the relationship between results and
    interpretation
  • What is your overall feeling?