Title: Statistics Class 2
1Statistics Class 2
- Descriptive Statistics
- Central Tendency, Variability, and Standard Deviation
- Probability and Sampling
- Frequency Distributions
2Where we have been?
- Looked at definition of statistics
- Looked at key terms
- The role of statistical thinking in science and society at large
- Distinction between inferential and descriptive statistics
- Distinction between quantitative and qualitative data
- Reliability
3Where we are going?
- Talk about descriptive statistics
- Graphs
- Numerical methods
- Aim of descriptive statistics
4Describing Qualitative Data
- Definition of qualitative data
- Types of qualitative data
- nominal (yes/no, coded 0, 1)
- ordinal (never, rarely, once a month, once a week, daily): ordered or ranked data
5Key Terms
- Class is one of several categories in which observations can be classified
- Class frequency is the number of observations in a particular class
- Class relative frequency is the class frequency in relation to all observations, i.e. divided by the total number of observations
6Aphasia Example
- Consider a study of aphasia published in the
Journal of Communication Disorders (Mar. 1995).
Aphasia is the "impairment or loss of the faculty
of using or understanding spoken or written
language." Three types of aphasia have been
identified by researchers: Broca's, conduction,
and anomic. The researchers wanted to determine
whether one type of aphasia occurs more often than
any other, and, if so, how often. Consequently,
they measured aphasia type for a sample of 22
adult aphasiacs. Table 2.1 gives the type of
aphasia diagnosed for each aphasiac in the sample.
7Summary Table for Data on 22 Aphasiacs
8Calculating Relative Frequencies
Type of Aphasia   Frequency   Relative Frequency   Cumulative Relative Frequency
Broca's           5           5/22 = .227          .227
Conduction        7           7/22 = .318          .545
Anomic            10          10/22 = .455         1.00
Totals            22          1.00                 1.00
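The relative and cumulative relative frequencies can be checked with a short script. This is a minimal sketch: the class counts come from the summary table, while the function name `frequency_table` is only illustrative.

```python
from fractions import Fraction

# Class frequencies from the summary table (22 aphasiacs in total).
class_frequency = {"Broca's": 5, "Conduction": 7, "Anomic": 10}


def frequency_table(counts):
    """Return rows of (class, frequency, relative freq., cumulative relative freq.)."""
    total = sum(counts.values())
    rows = []
    running = Fraction(0)
    for name, freq in counts.items():
        relative = Fraction(freq, total)  # class frequency / total observations
        running += relative               # running sum gives the cumulative column
        rows.append((name, freq, float(relative), float(running)))
    return rows


table = frequency_table(class_frequency)
for name, freq, rel, cum in table:
    print(f"{name:<12}{freq:>4}{rel:>8.3f}{cum:>8.3f}")
```

The cumulative column is a running sum of the relative frequencies, so its last entry is always 1.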
9Example Problem Dimensions
- Information systems design can progress toward
meeting the needs of the population of
decision-makers, managers, policy-makers, and
interdisciplinary workers by attention to
specifications obtained from the users'
situation. The user situation is represented by
problems and their dimensions. Problem dimensions
are discussed to propose a new orientation for
the design of information systems. (MacMullin &
Taylor, The Information Society, 3, 91-111).
10Bar Graph
- Visual depiction of the relative frequency of a variable. Shows how many times each class appears in the sample.
- Can also depict cumulative frequency or absolute frequency.
13Figure 1. Types of Aphasia in 22 Adult Aphasiacs.
Source: Journal of Communication Disorders.
15Graphical Methods for Describing Quantitative Data
- Dot Plots
- Stem-and-Leaf Display
- Histograms
17Dot Plots
- In a dot plot the numerical value of each case is located on the horizontal axis.
- When data values repeat, dots are placed one on top of the other.
- Example
18Stem-and-Leaf
- A very easy and simple way of depicting data. The stem is the leading part of each value (for example, the whole number), whereas the leaf is its final digit (for example, the first decimal place).
- Example
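A stem-and-leaf display can be sketched in a few lines. The example below assumes non-negative values with one decimal place, takes the whole number as the stem and the tenths digit as the leaf, and uses made-up measurements that are not from the lecture.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group one-decimal, non-negative values by stem (whole number) and leaf (tenths)."""
    display = defaultdict(list)
    for value in sorted(values):
        tenths = round(value * 10)        # e.g. 2.3 -> 23
        stem, leaf = divmod(tenths, 10)   # 23 -> stem 2, leaf 3
        display[stem].append(leaf)
    return dict(display)


# Hypothetical measurements, for illustration only.
data = [1.2, 1.5, 2.3, 2.3, 2.8, 3.1]
plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each row lists one stem followed by all the leaves that share it, so repeated values show up as repeated leaves.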
19Relative Frequency Histogram
- A relative frequency histogram has a vertical and a horizontal axis.
- The vertical axis indicates the proportion or relative frequency of the data.
- The horizontal axis represents the possible numerical values appearing in the data.
- Example.
20Measurement Classes
- The horizontal axis is divided into intervals called measurement classes.
- The intervals are of equal size.
- To each interval a frequency and relative frequency can be assigned on the vertical axis.
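The idea of equal-width measurement classes can be sketched as follows. The function name, the class boundaries, and the scores are all hypothetical, and the sketch assumes every value falls inside the covered range.

```python
def relative_frequency_histogram(values, low, width, n_classes):
    """Count values in equal-width classes [low + k*width, low + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - low) // width)   # index of the class that contains v
        if 0 <= k < n_classes:
            counts[k] += 1
    n = len(values)
    # Each row: (class lower bound, class upper bound, frequency, relative frequency)
    return [
        (low + k * width, low + (k + 1) * width, c, c / n)
        for k, c in enumerate(counts)
    ]


# Hypothetical scores, for illustration only.
scores = [1, 2, 2, 3, 5, 6, 6, 7, 8, 9]
classes = relative_frequency_histogram(scores, low=0, width=5, n_classes=2)
for lower, upper, freq, rel in classes:
    print(f"[{lower}, {upper}): frequency={freq}, relative frequency={rel:.2f}")
```

The frequencies are the bar heights of an absolute frequency histogram, and the relative frequencies are the bar heights of a relative frequency histogram.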
21Types of Histograms
- Absolute frequency
- Relative frequency
- Cumulative frequency
- Percentage
- Cumulative percentage
- Absolute frequency including cumulative percentage
22Numerical Measures of Central Tendency
- Central tendency is the numerical indicator of where the data tends to cluster.
- Variability indicates what the spread of the data is.
- Example: height (general).
23Mean
- The mean is the most commonly used measure of central tendency.
- It is the sum of all the data points divided by the number of points in the data set.
- Example: height.
24Calculating the Mean
- Mean = sum of scores / number of observations
- $\bar{x} = \sum x_i / n$ (the average)
- Example: 2, 4, 6, 8
- $\bar{x} = 20/4 = 5$
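The same calculation as a minimal Python sketch (the function name `mean` is just illustrative):

```python
def mean(scores):
    """Sum of the scores divided by the number of observations."""
    return sum(scores) / len(scores)


x_bar = mean([2, 4, 6, 8])
print(x_bar)  # 20 / 4 = 5.0
```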
25Mean is an estimator of the population center
- The goodness of the inference depends on
- the size of the sample
- the spread of the data
26Center and Spread
27Median
- When the data is arranged in order from the smallest to the largest number,
- the median is the number in the middle if the number of observations is odd, and
- the median is the mean of the two middle numbers if the number of observations is even.
28Location of Median
- Insert picture p. 43
- The median cuts the data in two: 50% below and 50% above.
29Mode
- Is the most frequent data point in the data set.
- Thus, has the largest relative frequency.
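The median and the mode can both be computed with Python's standard `statistics` module. The sketch below uses the usual convention of averaging the two middle values when the number of observations is even. The three-value sample is made up for illustration.

```python
import statistics

odd_sample = [7, 1, 5]            # hypothetical, odd number of observations
even_sample = [2, 4, 6, 8]        # even number of observations
with_repeats = [2, 3, 3, 3, 4]    # 3 occurs most often

median_odd = statistics.median(odd_sample)     # middle value after sorting: 5
median_even = statistics.median(even_sample)   # mean of 4 and 6: 5.0
modes = statistics.multimode(with_repeats)     # list of most frequent values: [3]

print(median_odd, median_even, modes)
```

`multimode` returns a list because a data set can have more than one most frequent value.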
30Symbols
- $\bar{x}$ sample mean
- $\mu$ population mean
- M median
- n sample size
31Comparing central tendency measures: mean and median
- Insert diagram p. 45
32Numerical Measures of Variability
- Range is the largest point in the data set minus the smallest.
- Sample variance is the sum of squared distances from the mean divided by n - 1.
- Sample standard deviation is the positive square root of the sample variance.
33Why range not adequate?
35Symbols
- $s^2$ sample variance
- $s$ sample standard deviation
- sigma squared ($\sigma^2$) population variance
- sigma ($\sigma$) population standard deviation
36Interpreting the Standard Deviation
- The meaning of the mean is very much dependent on the size of the standard deviation.
- It provides information about the spread or the homogeneity of the sample.
37Calculating Variance and SD
- Example for variance: 2, 4, 6, 8
- Mean = 20/4 = 5
- Distance from mean: 2 - 5 = -3, 4 - 5 = -1, 6 - 5 = 1, 8 - 5 = 3
- Add them all up: (-3) + (-1) + 1 + 3 = 0
- What is the problem?
- Mean is in the middle!
38Solution
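- Square the differences!
- Distance from mean: 2 - 5 = -3, 4 - 5 = -1, 6 - 5 = 1, 8 - 5 = 3
- $(-3)^2 + (-1)^2 + (1)^2 + (3)^2 = 9 + 1 + 1 + 9 = 20$
- The sum of squared distances is 20.
- Variance: $s^2 = 20/(n-1) = 20/3 = 6.67$
- SD: $s = \sqrt{6.67} = 2.58$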
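The worked example can be retraced step by step in Python. This is a sketch that divides the sum of squared distances by n - 1, following the definition of the sample variance given earlier in the lecture.

```python
import math

data = [2, 4, 6, 8]
n = len(data)
x_bar = sum(data) / n                            # mean: 20 / 4 = 5.0

deviations = [x - x_bar for x in data]           # -3.0, -1.0, 1.0, 3.0
sum_of_deviations = sum(deviations)              # cancels out to 0.0

sum_of_squares = sum(d ** 2 for d in deviations) # 9 + 1 + 1 + 9 = 20.0
variance = sum_of_squares / (n - 1)              # sample variance, divides by n - 1
sd = math.sqrt(variance)                         # sample standard deviation

print(sum_of_deviations, sum_of_squares, round(variance, 2), round(sd, 2))
```

The raw distances always sum to zero because the mean sits in the middle of the data, which is why the distances are squared before they are added.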
39Two hypothetical data sets
- Sample 1: 1, 2, 3, 4, 5
- Sample 2: 2, 3, 3, 3, 4
40Solution
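- $\bar{x}_1 = 3$
- $\bar{x}_2 = 3$
- Sample 1 distances from the mean: -2, -1, 0, 1, 2; squared: 4, 1, 0, 1, 4; sum = 10
- Sample 2 distances from the mean: -1, 0, 0, 0, 1; squared: 1, 0, 0, 0, 1; sum = 2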
41SD
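- Divide each sum of squared distances by n - 1 = 4 to get the variance, then take the square root of the variance
- Sample 1: variance = 10/4 = 2.5, SD = 1.58
- Sample 2: variance = 2/4 = 0.5, SD = 0.71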
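The two hypothetical samples can be compared with the standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor of the sample variance. A minimal sketch:

```python
import statistics

samples = {
    "Sample 1": [1, 2, 3, 4, 5],
    "Sample 2": [2, 3, 3, 3, 4],
}

# For each sample: (mean, sample variance, sample standard deviation)
summary = {
    name: (statistics.mean(values), statistics.variance(values), statistics.stdev(values))
    for name, values in samples.items()
}

for name, (mean, variance, sd) in summary.items():
    print(f"{name}: mean={mean}, variance={variance}, SD={sd:.2f}")
```

Both samples have the same mean, but the smaller standard deviation of Sample 2 shows that its values cluster more tightly around that mean.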
42Numerical Measures of Relative Standing
- These are a series of descriptive measures of the relationship of a measurement to the rest of the data.
- For example, the pth percentile indicates where a measurement stands relative to the rest of the points: p% of the measurements fall below the pth percentile and (100 - p)% fall above.
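One simple way to see the percentile idea is to count what share of the measurements falls below a given value. The helper `percent_below` and the scores are hypothetical; statistical packages use several slightly different percentile conventions, and this sketch only illustrates the basic counting.

```python
def percent_below(values, x):
    """Percentage of the measurements that are strictly below x."""
    return 100 * sum(v < x for v in values) / len(values)


# Hypothetical test scores 1, 2, ..., 100, for illustration only.
scores = list(range(1, 101))
share = percent_below(scores, 91)
print(share)  # 90 of the 100 scores are below 91, so 91 sits at about the 90th percentile
```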
45Z-scores as numerical measures of relative
standing
- A z-score indicates where a measurement stands in relation to the other measurements, in units of standard deviations from the mean.
- The sample z-score is calculated by $z = (x - \bar{x}) / s$.
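The sample z-score formula can be sketched directly in Python. The function name `z_score` is illustrative, and the data are the values 2, 4, 6, 8 used in the variance example.

```python
import statistics


def z_score(x, sample):
    """Distance of x from the sample mean, in sample standard deviations."""
    return (x - statistics.mean(sample)) / statistics.stdev(sample)


sample = [2, 4, 6, 8]
z_high = z_score(8, sample)   # 8 is above the mean of 5, so z is positive
z_mean = z_score(5, sample)   # a value equal to the mean has z = 0

print(round(z_high, 2), z_mean)
```

A positive z-score means the measurement lies above the mean, a negative one below it.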
46Interpretation of z-scores for bell-shaped
variables
- 1. Approximately 68% of the measurements will have z-scores between -1 and 1.
- 2. Approximately 95% of the measurements will have z-scores between -2 and 2.
- 3. Approximately 99.7% of the measurements will have z-scores between -3 and 3.
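These proportions can be checked by simulation. The sketch below draws made-up bell-shaped data with `random.gauss` (the seed and sample size are arbitrary choices) and counts how many z-scores fall within 1, 2, and 3.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable
sample = [random.gauss(0, 1) for _ in range(10_000)]  # simulated bell-shaped data

x_bar = statistics.mean(sample)
s = statistics.stdev(sample)


def share_within(k):
    """Proportion of measurements whose z-score lies between -k and k."""
    return sum(abs((x - x_bar) / s) <= k for x in sample) / len(sample)


for k in (1, 2, 3):
    print(f"z between -{k} and {k}: {share_within(k):.3f}")
```

The printed shares should land close to 0.68, 0.95, and 0.997, with small differences from run to run if the seed is changed.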
48Probability
49Probability
- Is the basis for inferential statistics
- Take a die and roll it 10 times
- What are the possible outcomes?
- There are 6 possible sample points: 1, 2, 3, 4, 5, and 6
- Each result is called an observation
- In this case, 10 observations
50Coin Example
- Take a coin and toss it: head or tail?
- If you do this with 2 coins, what are the possible outcomes or sample points?
- The process of observing is called an experiment
51All Possible Sample Points for 2 Coins
- 1. Observe HH
- 2. Observe TT
- 3. Observe TH
- 4. Observe HT
- Taken together, all possible sample points are referred to as the sample space.
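The sample space for two coins can be enumerated with `itertools.product`, which lists every ordered combination of faces. A minimal sketch:

```python
from itertools import product

# Every ordered pair of faces for two coins: first coin, then second coin.
sample_space = ["".join(faces) for faces in product("HT", repeat=2)]
print(sample_space)  # ['HH', 'HT', 'TH', 'TT']
```

Changing `repeat` to 3 would list the 8 sample points for three coins.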
52Key Terms
- Experiment is an act of observation that leads to a single outcome that cannot be predicted with certainty.
- Sample point is the most basic outcome of an experiment.
- Sample space of an experiment is the collection of all its sample points.
- Event is a specific collection of sample points.
53Experiments and Their Sample Spaces
- Experiment: Observe the up face on a coin.
- Sample Space:
- 1. Observe a head.
- 2. Observe a tail.
- S = {H, T}
54- Experiment: Observe the up face on a die.
- Sample Space:
- 1. Observe a 1
- 2. Observe a 2
- 3. Observe a 3
- 4. Observe a 4
- 5. Observe a 5
- 6. Observe a 6
- S = {1, 2, 3, 4, 5, 6}
55- Experiment: Observe the up faces on two coins.
- Sample Space:
- 1. Observe HH
- 2. Observe HT
- 3. Observe TH
- 4. Observe TT
- S = {HH, HT, TH, TT}
56Venn Diagrams
- Coin: S = {H, T}
- Die: S = {1, 2, 3, 4, 5, 6}
- Two coins: S = {HH, HT, TH, TT}
57Probability Rules for Sample Points
- 1. All sample point probabilities must lie between 0 and 1.
- 2. The probabilities of all the sample points within a sample space must sum to 1.
58Probability of an Event
- The probability of an event A is calculated by
summing the probabilities of the sample points in
the sample space for A.
59Steps for Calculating Probabilities of Events
- 1. Define the experiment, that is, describe the process used to make an observation and the type of observation that will be recorded.
- 2. List the sample points.
- 3. Assign probabilities to the sample points.
- 4. Determine the collection of sample points contained in the event of interest.
- 5. Sum the sample point probabilities to get the event probability.
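The five steps can be followed in a short script. The sketch below assumes two fair coins, so every sample point gets the same probability, and picks "exactly one head" as the event of interest. Both choices are illustrative and not taken from the lecture.

```python
from fractions import Fraction
from itertools import product

# Step 1: experiment = observe the up faces on two coins (assumed fair).
# Step 2: list the sample points.
sample_points = ["".join(faces) for faces in product("HT", repeat=2)]

# Step 3: assign probabilities (equal, because the coins are assumed fair).
probability = {point: Fraction(1, len(sample_points)) for point in sample_points}
total_probability = sum(probability.values())  # must be 1 for a valid assignment

# Step 4: collect the sample points in the event A = "exactly one head".
event_a = [point for point in sample_points if point.count("H") == 1]

# Step 5: sum the sample point probabilities to get P(A).
p_a = sum(probability[point] for point in event_a)

print(event_a, p_a)  # ['HT', 'TH'] 1/2
```

Using `Fraction` keeps the probabilities exact, which makes it easy to confirm that they sum to 1.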
60Lecture on Sampling
- Statistics Course
- Faculty of Information Studies
61Inferential statistics
- Hypothesis testing: set a null hypothesis and an alternative hypothesis
- Parameter estimation: determine the magnitude of a characteristic in a population
62Two central topics are
- 1. Random sampling
- 2. Probability
63Relationship population and sample
- The sample is a part of the population
- The sample is taken from the population
- Why is it that we do not use the population?
- Why would it be more advantageous to use the population?
- What is then the relationship between sample and population?
64Representative sample
- This is a key issue in guaranteeing that we can make inferences from our sample to the population of interest
- Why is it an important concept?
- Can you imagine what happens if a sample is non-representative?
- What other concepts are relevant in experimentation?
65Validity-Generalizability
- These are two central concepts in methodology (mainly in testing)
- Definition of validity: to measure what we intend to measure
- Definition of generalizability: to be able to generalize the results to the population of interest to us
- Why are these relevant concepts?
66Random Sample
- Definition: A random sample is defined as a sample selected from the population by a process that assures the following:
- 1) each possible sample of a given size has an equal chance of being selected and
- 2) all the members of the population have an equal chance of being selected into the sample
67Sampling with replacement
- Take members from the population to include them in the sample and then put them back in the population, so that they can be drawn again
- Why important?
- If not, then it is sampling without replacement (often done in practice). Why is it more often used?
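The difference between the two schemes can be sketched with Python's `random` module: `choices` draws with replacement and `sample` draws without replacement. The population of ten numbered members and the seed are made up for illustration.

```python
import random

rng = random.Random(42)            # arbitrary seed so the run is repeatable
population = list(range(1, 11))    # hypothetical population of 10 numbered members

# With replacement: a member goes back after each draw and may be drawn again.
with_replacement = rng.choices(population, k=5)

# Without replacement: once drawn, a member cannot be drawn a second time.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```

The sample drawn without replacement never contains duplicates, while the one drawn with replacement may.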
68Why important (random sampling)?
- To be able to generalize from a sample to a population
- The sample has to be representative of the population
- Potential problem: biased sample!
69Biased sample
- A sample that is not representative of the population we intend to study
- Example: student population selected on the basis of email
- Why not representative? Social aspect of technology
70Techniques for random sampling
- 1) Computer programs, e.g. Excel
- 2) Table of random numbers (good method)
- 3) Intuition
- 4) Stratification
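As an illustration of the computer-program and stratification techniques, the sketch below draws the same fraction at random from each stratum of a population. The function name `stratified_sample`, the strata, the student identifiers, and the 10% fraction are all hypothetical.

```python
import random


def stratified_sample(strata, fraction, rng):
    """Draw the same fraction, without replacement, from every stratum."""
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)     # number of members to draw from this stratum
        sample[name] = rng.sample(members, k)  # simple random sample within the stratum
    return sample


# Hypothetical student population split into two strata.
strata = {
    "undergraduate": [f"U{i}" for i in range(60)],
    "graduate": [f"G{i}" for i in range(40)],
}

rng = random.Random(1)  # arbitrary seed so the run is repeatable
chosen = stratified_sample(strata, fraction=0.1, rng=rng)
print({name: len(members) for name, members in chosen.items()})
```

Sampling within each stratum keeps the proportions of the strata in the sample equal to those in the population, which supports representativeness.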
71Tricks for evaluating papers
- Look at article in critical manner
- Look at the methodology section
- Focus on sample-population relationship
- Focus on how sample was selected
- Focus on generalizability
- Focus on validity
- Focus on tests used
72- Focus on type of statistical analysis
- Focus on level of significance
- Focus on rationale
- Think about the relationship between results and interpretation
- What is your overall feeling?