Title: Math Review
1Math Review
2First.
3Blow out light bulbs. Psychokinesis?
- A TV host turns to the main camera and, with a serious, coaxing air, looks the viewer straight in the eye and says: "Go ahead! Turn on 5 or 6 lights around you and see what happens."
- Then he turns to the medium and says: "Do you really think you can do it?"
- After hesitating a few moments, the medium replies: "I hope I have enough concentration this evening, but the conditions are not ideal. To produce long-distance phenomena like this I usually spend a few days in complete and utter solitude, after rigorous fasting." (If she fails, the public will blame the circumstances, not her abilities.)
- The medium does not fail! Light bulbs do blow out in the homes of viewers of this program, and over 1000 viewers call the TV station to testify!
- The medium has successfully focused her spiritual power on the material world and blown out light bulbs far away!
- Amazing, right?
4The Man
5Let's examine this a little more closely
- Suppose 1 million people were watching the show
- So 5 or 6 million light bulbs were on for an hour or more
- Assume, considering economics, 2 million light bulbs were on for 1 hour
- On average a light bulb lasts 1,000 hours
- Among the light bulbs installed at random by viewers, there is no reason to think that they tend to be very old or very new
- Among the 2 million, there are 2000 with 1 hour of life used, 2000 with 2 hours of life used, 2000 with 3 hours, ..., 2000 with 999 hours of life used, 2000 with 1000 hours of life used
- Thus, during a 1-hour show, those last 2000 bulbs will reach the end of their life span and burn out
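The estimate above is a single division, sketched here in Python. The variable names are illustrative; the numbers are the ones assumed on the slide, and the even spread of bulb ages is the slide's own assumption.

```python
# Back-of-the-envelope estimate of bulbs that burn out by chance during the show.
bulbs_on = 2_000_000        # bulbs assumed lit for the hour (slide assumption)
average_life_hours = 1_000  # average bulb life in hours (slide assumption)
show_hours = 1

# If bulb ages are spread evenly over the life span, the same number of
# bulbs reaches the end of its life in every hour.
bulbs_per_age_hour = bulbs_on // average_life_hours
expected_burnouts = bulbs_per_age_hour * show_hours

print(expected_burnouts)  # 2000
```

With about 2000 chance burnouts, over 1000 phone calls need no psychokinesis.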
6What, You're a Scorpio too?
- That's amazing!! WOW
- What's the probability that at least 2 people at any party have the same birthday? (month and day)
7The Birthday Problem
- What is the probability that at least two people
in this class share the same birthday?
8Assumptions
- Only 365 days each year.
- Birthdays are evenly distributed throughout the year, so that each day of the year has an equal chance of being someone's birthday.
9Take group of 5 people.
Let A = event that no one in the group shares the same birthday. Then A^C = event that at least 2 people share the same birthday.
P(A) = 364/365 x 363/365 x 362/365 x 361/365 = 0.973
P(A^C) = 1 - 0.973 = 0.027
That is, about a 3% chance that in a group of 5 people at least two people share the same birthday.
10Take group of 23 people.
Let A = event that no one in the group shares the same birthday. Then A^C = event that at least 2 people share the same birthday.
P(A) = 364/365 x 363/365 x ... x 343/365 = 0.493
P(A^C) = 1 - 0.493 = 0.507
That is, about a 50% chance that in a group of 23 people at least two people share the same birthday.
11Take group of 50 people.
Let A = event that no one in the group shares the same birthday. Then A^C = event that at least 2 people share the same birthday.
P(A) = 364/365 x 363/365 x ... x 316/365 = 0.03
P(A^C) = 1 - 0.03 = 0.97
That is, virtually certain that in a group of 50 people at least two people share the same birthday.
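The three group sizes above can be checked with one short function. This is a minimal sketch under the slide's assumptions (365 equally likely days); the function name prob_shared_birthday is illustrative, not from the slides.

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for group_size in (5, 23, 50):
    print(group_size, round(prob_shared_birthday(group_size), 3))
```

For a group of n people the product has n - 1 factors below 1, ending at (365 - n + 1)/365.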
12Premonition?
- You're peacefully lying in your bed.
- It is 6:04 in the morning, and you're hardly awake when you're struck by the thought of your cousin, whom you haven't seen for years and whom you haven't thought of for a long time either.
- Now at 6:08 the damn phone rings and you pick it up, only to hear the sad news: your cousin has died!
- Here is the long-awaited proof that premonition is for real!!
13Premonition--Debunked
- Put the question like this:
- What is the probability that, having thought about a person, we will somehow learn in the next 5 minutes, purely by coincidence and without any paranormal influence, that that person has died?
- We need to know 2 things
14Consider
- 1. The number of people whose death comes to our attention during, say, 1 year.
- 2. The number of times one thinks of these people during the same period.
- 1. Assumption: you know 10 people whose death you learn of over a 1-year period.
- 2. Assumption: you think of each of those people a single time over the 1-year period.
15Consider
- Take 1 particular person among the 10.
- 1 year has 105,120 five-minute intervals
- The chance that we'll be informed of his death during the one 5-minute interval that follows our thought of him is 1 in 105,120 (small!)
- What about the other 9 people?
- For each of them the probability of having the thought and then learning of their death is 1 in 105,120
- Addition rule: P(having the thought, then learning of the death, for any of the 10) = 10 x 1/105,120 = 1/10,512 (still small)
16Hate to tell you, but
- There is nothing unique about you in this respect
- There are about 250,000,000 people in the US
- So the thought-notification connection must occur each year to about 1/10,512 x 250,000,000 = 23,782 people
- So, by chance alone there are about 65 cases like this each day in the US!
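The chain of arithmetic in the premonition example can be reproduced step by step. This is a sketch using the slides' own assumptions (10 deaths learned of per year, one thought per person, 250,000,000 people); the variable names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60
intervals_per_year = MINUTES_PER_YEAR // 5   # five-minute intervals in a year

deaths_learned_per_year = 10                 # slide assumption
us_population = 250_000_000                  # slide assumption

# One person: the news must arrive in the single interval after the thought.
p_one_person = 1 / intervals_per_year
# Addition rule over the 10 people (treated as mutually exclusive events).
p_any_of_ten = deaths_learned_per_year * p_one_person

cases_per_year = p_any_of_ten * us_population
cases_per_day = cases_per_year / 365

print(intervals_per_year)      # 105120
print(round(cases_per_year))   # 23782
print(round(cases_per_day))    # 65
```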
17Statistics vs. Probability
Statistics
- Science of data
- collecting, processing, presentation, analyzing, interpretation of data
- numbers with context
Probability
- Science of chance, uncertainties
- what is possible, what is probable
- mathematical formulas
18Statistics
- Data Collection
- Summarizing Data
- Interpreting Data
- Drawing Conclusions from Data
19Data Categories
Data
Quantitative (numerical)
Qualitative (categorical)
20Qualitative Data
- Ideas
- Opinions
- Categorical Evaluation
- Examples
- Color Preference
- Favored Political Candidate
- Quality Evaluation - Defective or non-defective
21Quantitative Data
- Annual Income
- Football Attendance
- Interest Rates
- Dow Jones Industrial Average
- Number of Defective Parts in a Shipment
- Number of Late Deliveries Last Month
- Percentage of Satisfied Customers
Discrete
Continuous
22Data Collection
- Designing experiments
- Does adjusting the oxygen-fuel ratio in an automotive fuel injection system improve emission quality?
- Observational studies
- Polls - Bush's (dis)approval rating
23Time for some definitions
24Population
- The set of data (numerical or otherwise)
corresponding to the entire collection of units
about which information is sought
25Population Examples
- Air Quality - Values from all sampling devices in the country
- Unemployment - Status of ALL employable people (employed, unemployed) in the U.S.
- SAT Scores - Math SAT scores of EVERY person that took the SAT during 2002
- Responses of ALL currently enrolled underage college students as to whether they have consumed alcohol in the last 24 hours
26Population Examples cont.
- Again: Population Defined
- The Collection of All Items of Interest (Universe)
- All People Living in Georgia
- All HP LaserJet Printers Sold in 2001
- All Accounts Receivable Balances
- All Homeowners in Atlanta
- over 35 years old
- employed
- married
- 2 or more children
27Sample
- A subset of the population data that are actually collected in the course of a study.
28Sample Examples
- Air Quality - Values from samples at Midwestern urban sites during July
- Unemployment - Status of the 1000 employable people interviewed.
- SAT Scores - Math SAT scores of 20 people that took the SAT during 2002
- Responses of 538 currently enrolled underage college students as to whether they have consumed alcohol in the last 24 hours
29Population vs. Sample
Population
Sample
30Samples
- Again: Sample Defined
- A Subset of a population.
- A Representative Sample
- Has the characteristics of the population
- Census - A Sample that Contains all Items in the Population
31WHO CARES?
- In most studies, it is difficult to obtain
information from the entire population. We rely
on samples to make estimates or inferences
related to the population.
32Types of Statistical Analysis
- Descriptive Statistics
- Graphical Tools
- Numerical Measures
- Inferential Statistics
- Populations
- Samples
- Probability
- Linking Descriptive and Inferential Statistics
33Statistical Inference
Drawing Conclusions (Inferences) about a
Population Based on an examination of a Sample
taken from the population
34Statistical Inference Examples
- Nielsen TV Ratings
- Gallup and Harris Polls
- Market Research
- Financial Auditing
- Opinion Surveys
35Review of Descriptive Stats.
- Descriptive Statistics are used to present quantitative descriptions in a manageable form.
- This method works by reducing lots of data into a simpler summary.
- Examples
- Batting Average in baseball
- Creighton's Grade Point System
36Univariate Analysis
- This is the examination across cases of one variable at a time.
- Frequency distributions are used to group data.
- One may set up margins that allow us to group cases into categories.
- Examples include
- age categories
- price categories
- temperature categories
- concentration categories
37Distributions
- Two ways to describe a univariate distribution
- a table
- a graph (histogram, bar chart)
38Distributions (cont)
- Distributions may also be displayed using percentages.
- For example, one could use percentages to describe the percentage of
- people under the poverty level
- people over a certain age
- people over a certain score on a standardized test
- days with an AQI over 100
39Distributions (cont.)
A Frequency Distribution Table
Category    Percent
Under 35    9
36-45       21
46-55       45
56-65       19
66+         6
40Distributions (cont.)
A Histogram
41Central Tendency
- An estimate of the center of a distribution
- Three different types of estimates
- Mean
- Median
- Mode
42Mean
- The most commonly used method of describing central tendency.
- One totals all the results and then divides by the number of units, n, in the sample.
- Example: The ATS-542 Homework mean was determined by the sum of all the scores divided by the number of students turning in the HW.
43Working Example (mean)
- Let's take the set of scores 15, 20, 21, 20, 36, 15, 25, 15
- The Mean would be 167/8 = 20.875
44Median
- The median is the score found at the exact middle of the set.
- One must list all scores in numerical order, and then locate the score in the center of the sample.
- Example: if there are 500 scores in the list, the median is the average of scores 250 and 251. With 501 scores, score 251 would be the median.
- The median is useful when outliers are present, because extreme scores do not shift it.
45Working Example (median)
- Let's take the set of scores 15, 20, 21, 20, 36, 15, 25, 15
- First line up the scores.
- 15, 15, 15, 20, 20, 21, 25, 36
- The middle score falls at 20. There are 8 scores, and scores 4 and 5 (both 20) represent the halfway point.
46Mode
- The mode is the most repeated score in the set of results.
- Let's take the set of scores 15, 20, 21, 20, 36, 15, 25, 15
- Again we first line up the scores
- 15, 15, 15, 20, 20, 21, 25, 36
- 15 is the most repeated score and is therefore labeled the mode.
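The three working examples (mean, median, mode) can be reproduced with Python's standard statistics module. This is a minimal sketch on the same eight scores; the variable names are illustrative.

```python
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]

mean = statistics.mean(scores)      # sum of scores (167) divided by n (8)
median = statistics.median(scores)  # average of the 4th and 5th sorted scores
mode = statistics.mode(scores)      # most frequently repeated score

print(sorted(scores))      # [15, 15, 15, 20, 20, 21, 25, 36]
print(mean, median, mode)  # 20.875 20.0 15
```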
47Central Tendency
- If the distribution is normal (i.e., bell-shaped), the mean, median and mode are all equal
- In our analyses, we'll use the mean
48Dispersion
- Two estimates
- Range
- Standard Deviation
- Standard Deviation is the more detailed of the two: the range depends only on the two extreme scores, so a single outlier can greatly extend it
49Range
- The range is used to identify the highest and lowest scores.
- Let's take the set of scores 15, 20, 21, 20, 36, 15, 25, 15
- The range would be 15-36. This identifies the fact that 21 points separate the highest from the lowest score.
50Standard Deviation
- The Standard Deviation is a value that shows the relation that individual scores have to the mean of the sample.
- If scores are said to be standardized to a normal curve, then there are several statistical manipulations that can be performed to analyze the data set.
51Standard Dev. (cont)
- Assumptions may be made about the percentage of scores as they deviate from the mean.
- If scores are normally distributed, then one can assume that approximately 68% of the scores in the sample fall within one standard deviation of the mean. Approximately 95% of the scores would then fall within two standard deviations of the mean.
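For a normal distribution, the share of scores within one or two standard deviations of the mean can be computed from the cumulative distribution function. This is a sketch using NormalDist from Python's standard statistics module (available from Python 3.8); the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Area under the bell curve between -k and +k standard deviations.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(round(within_one_sd, 4))  # 0.6827
print(round(within_two_sd, 4))  # 0.9545
```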
52Standard Dev. (cont)
- The standard deviation is found in three steps: sum the squared deviations of all the scores from the mean, divide that sum by the number of scores, and take the square root of the result.
- Squaring accounts for both positive and negative deviations from the mean.
53Working Example (stand. dev.)
- Let's take the set of scores 15, 20, 21, 20, 36, 15, 25, 15
- The mean of this sample was found to be 20.875. Round up to 21.
- Again we first line up the scores
- 15, 15, 15, 20, 20, 21, 25, 36.
- 21-15=6, 21-15=6, 21-15=6, 21-20=1, 21-20=1, 21-21=0, 21-25=-4, 21-36=-15
54Working Ex. (Stan. dev. cont)
- Square these values.
- 36,36,36,1,1,0,16,225
- Total these values: 351.
- Divide 351 by 8: 43.875
- Take the square root of 43.875: 6.62
- 6.62 is your Standard Deviation.
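The worked example can be replayed in a few lines. This is a sketch that follows the slide's procedure (mean rounded to 21, division by the number of scores) and compares it with statistics.pstdev, which applies the same divide-by-n definition to the unrounded mean. The variable names are illustrative.

```python
import math
import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]
n = len(scores)

# Slide procedure: deviations from the rounded mean of 21, squared and summed.
rounded_mean = 21
squared_deviations = [(rounded_mean - x) ** 2 for x in scores]
sd_slide = math.sqrt(sum(squared_deviations) / n)

# Same definition (divide by n) with the exact mean of 20.875.
sd_exact = statistics.pstdev(scores)

# The range, for comparison: highest score minus lowest score.
score_range = max(scores) - min(scores)

print(sum(squared_deviations))  # 351
print(round(sd_slide, 2))       # 6.62
print(round(sd_exact, 2))       # 6.62
print(score_range)              # 21
```

Note that statistics.stdev divides by n - 1 instead of n, so it returns a slightly larger value for the same scores.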
55Describing Data Graphically
56Tools for Describing Data
- Graphical Tools
- Pie Charts
- Bar Charts
- Histograms
- Stem and Leaf Diagrams
- Trend Charts
- Many Variations of the above......
57Analyzing Quantitative Data: On-Time Delivery Example
- Variable x = Number of days Delivery is Late
- (Each data point represents one shipment.)
- Raw Data
- 0 2 3 4 1 0 0 1
- 3 0 3 1 1 0 0 0
- 2 2 0 0 0 1 2 0
- 4 1 0 1 0 0 0 1
- 1 0 0 0 0 1 3 1
- N = 40 shipments
58Organizing the Data: Step 1
Form a Data Array: Sort the data in numerical order
Raw Data
0 2 3 4 1 0 0 1 3 0 3 1 1 0 0 0 2 2 0 0 0 1 2 0 4 1 0 1 0 0 0 1 1 0 0 0 0 1 3 1
Data Array (Low to High)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 3 3 3 3 4 4
59Organizing the Data: Step 2 - Construct a Frequency Distribution
- Ungrouped Frequency Distribution
- When the variable has only a few different values
- Number of data values may be high or low
- Grouped Data Frequency Distribution
- When the variable has more than a few different values
- Number of data values is high
60Frequency Distribution
A table that divides the data into classes and
shows the number of observed values that fall
into each class.
61Frequency Distribution: On-Time Delivery Example
Use ungrouped Frequency Distribution since the variable takes on only a few different values.
Data Array (Low to High)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 3 3 3 3 4 4
Frequency Distribution
x    Frequency
0    19
1    11
2    4
3    4
4    2
N = 40 values
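Both organizing steps (sorting into a data array, then counting each value) can be sketched with the standard library. The list below is the raw on-time delivery data from the slides; the variable names are illustrative.

```python
from collections import Counter

# Raw data: number of days each of the 40 shipments was late.
days_late = [
    0, 2, 3, 4, 1, 0, 0, 1,
    3, 0, 3, 1, 1, 0, 0, 0,
    2, 2, 0, 0, 0, 1, 2, 0,
    4, 1, 0, 1, 0, 0, 0, 1,
    1, 0, 0, 0, 0, 1, 3, 1,
]

# Step 1: data array (sorted low to high).
data_array = sorted(days_late)

# Step 2: ungrouped frequency distribution, one count per distinct value.
frequency = dict(sorted(Counter(days_late).items()))

print(len(data_array))  # 40
print(frequency)        # {0: 19, 1: 11, 2: 4, 3: 4, 4: 2}
```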
62Forming a Histogram: On-Time Delivery Example
(Histogram: Frequency versus x, Days Late, for x = 0 to 4)
63Relative Frequency Distribution: On-Time Delivery Example
x        Frequency    Relative Frequency
0        19           19/40 = .475
1        11           11/40 = .275
2        4            4/40 = .100
3        4            4/40 = .100
4        2            2/40 = .050
Total    40           1.000
Relative Frequency Distributions are useful for comparing two or more data sets which have different volumes of data.
64Relative Frequency Histogram: On-Time Delivery Example
(Histogram: Relative Frequency versus x, Days Late, for x = 0 to 4)
65Cumulative Frequency Distribution: On-Time Delivery Example
X        F     CF
0        19    19
1        11    30
2        4     34
3        4     38
4        2     40
Total    40
(Cumulative Frequency Histogram)
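Relative and cumulative frequencies both follow directly from the frequency column. This is a minimal sketch starting from the on-time delivery counts; the variable names are illustrative.

```python
from itertools import accumulate

# Frequency distribution of days late (x -> number of shipments).
frequency = {0: 19, 1: 11, 2: 4, 3: 4, 4: 2}
n = sum(frequency.values())

# Relative frequency: each count divided by the total number of values.
relative = {x: f / n for x, f in frequency.items()}

# Cumulative frequency: running total of the counts, in order of x.
cumulative = dict(zip(frequency, accumulate(frequency.values())))

for x in frequency:
    print(x, frequency[x], f"{relative[x]:.3f}", cumulative[x])
```

The last cumulative frequency always equals the number of data values, which is a quick check on the table.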
66Types of Frequency Distributions
- Ungrouped Frequency Distribution
- When the variable has only a few different values
- Number of data values may be high or low
- Grouped Data Frequency Distribution
- When the variable has more than a few different values
- Number of data values is high
67Concentrations PPM
Raw Data
68Grouped Data Frequency Distribution
- Class - data category
- Frequency - number of items in each class
- Class limits - boundaries of each class
- Class interval - width of each class
- difference between the lower limit of a class and the lower limit of the preceding class
- Class Mark - midpoint of the class
69Guidelines for Grouped Frequency Distributions
- Mutually Exclusive Classes - no overlap
- All inclusive - a place for each data point
- Equal width classes (if possible)
- 5-12 classes (rule of thumb)
- Try to have round numbered class limits
- Avoid open-ended classes
70Develop a Grouped Data Frequency Distribution - Form a Data Array
(Data array: concentrations sorted from Low to High)
71Forming the Class Limits
Class Interval = (High Value - Low Value) / number of classes
Try 6 classes
Class Interval = (74.95 - 0.97) / 6 = 12.33
round to nicer interval -- 12.50
72Class Limits
Classes: All Inclusive, Mutually Exclusive, Equal Width, No Open-Ended Classes
0.00 and under 12.50
12.50 and under 25.00
25.00 and under 37.50
37.50 and under 50.00
50.00 and under 62.50
62.50 and under 75.00
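The class limits can be generated from the low value, high value and number of classes. This is a sketch: the rounding to 12.50 and the starting limit of 0.00 are the hand choices made on the slides, and class_index is a hypothetical helper showing how "and under" keeps the classes mutually exclusive.

```python
low_value, high_value = 0.97, 74.95
num_classes = 6

# Raw class interval: spread of the data divided by the number of classes.
raw_interval = (high_value - low_value) / num_classes

# Rounded by hand to a nicer width, starting from a round lower limit.
class_interval = 12.50
first_lower_limit = 0.00

class_limits = [
    (first_lower_limit + i * class_interval,
     first_lower_limit + (i + 1) * class_interval)
    for i in range(num_classes)
]


def class_index(value: float) -> int:
    """Index of the class holding value (lower limit included, upper excluded)."""
    return int((value - first_lower_limit) // class_interval)


print(round(raw_interval, 2))  # 12.33
for lower, upper in class_limits:
    print(f"{lower:.2f} and under {upper:.2f}")
```

A value of exactly 12.50 lands in the second class, so no data point can belong to two classes.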
73Frequency Distribution: Concentrations
Classes                  Frequency
0.00 and under 12.50     38
12.50 and under 25.00    14
25.00 and under 37.50    4
37.50 and under 50.00    2
50.00 and under 62.50    1
62.50 and under 75.00    5
Total                    64
74Class Mark (Midpoint)
Midpoint = lower limit + .50 (Class Interval)
For first class: Midpoint = 0.00 + .50(12.50) = 6.25
75Frequency Distribution With Midpoints
Classes                  Frequency    Midpoint
0.00 and under 12.50     38           6.25
12.50 and under 25.00    14           18.75
25.00 and under 37.50    4            31.25
37.50 and under 50.00    2            43.75
50.00 and under 62.50    1            56.25
62.50 and under 75.00    5            68.75
Total                    64
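The midpoint column applies the class mark rule (lower limit plus half the class interval) to every class. This is a short sketch with the lower limits and interval from the concentration example; the variable names are illustrative.

```python
class_interval = 12.50
lower_limits = [0.00, 12.50, 25.00, 37.50, 50.00, 62.50]

# Class mark: lower limit plus half of the class interval.
midpoints = [lower + 0.5 * class_interval for lower in lower_limits]

print(midpoints)  # [6.25, 18.75, 31.25, 43.75, 56.25, 68.75]
```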
76Frequency Polygon
(Frequency polygon: Frequency versus Concentrations, PPM)
77Cumulative Frequency Distribution: Concentrations Example
Classes                  Frequency    Midpoint    Cumulative Freq.
0.00 and under 12.50     38           6.25        38
12.50 and under 25.00    14           18.75       52
25.00 and under 37.50    4            31.25       56
37.50 and under 50.00    2            43.75       58
50.00 and under 62.50    1            56.25       59
62.50 and under 75.00    5            68.75       64
Total                    64
78Histogram
Concentrations
79Histograms
- A Graphical Summary of Variation in a Set of Data
- Key Concepts
- Generated data will show variation because of many factors
- process equipment, materials, people, environment, etc.
- The variation will display a pattern (distribution)
- Patterns are hard to see in data tables
- Histograms make it easier to see patterns
80Cable TV Amplification Example
- Amplifiers made to boost cable TV signals (Gain)
- Complaints about weak signals in outlying areas
- Amplifiers are the prime suspect
- Specifications
- nominal (average) gain is 10 units
- Amplifiers are to provide between 7.75 and 12.25 units of gain.
- Tests conducted on 120 amplifiers
81Amplifier Data Array, n = 120, already sorted
(Data array: gains listed from Low to High)
82First Pass Conclusion
Specifications: Gain 7.75 ------ 12.25
Since all 120 amplifiers tested fall between 7.8 and 11.7, the problem can't be the amplifiers. They all meet specifications!
83The Frequency Distribution: Amplifier Test Data
Class            Frequency    Relative Frequency
7.75 - 8.25      24           .20
8.26 - 8.75      28           .23
8.76 - 9.25      26           .22
9.26 - 9.75      19           .16
9.76 - 10.25     12           .10
10.26 - 10.75    7            .06
10.76 - 11.25    2            .02
11.26 - 11.75    2            .02
Total            120
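The relative frequency column can be recomputed from the class counts, and the same counts show how lopsided the gains are. This is a sketch using only the table values; the raw amplifier readings are not in the transcript, so gains inside the 9.76 - 10.25 class cannot be split around the nominal value of 10.

```python
# Class frequencies from the amplifier test table (120 amplifiers).
amplifier_classes = [
    ("7.75 - 8.25", 24),
    ("8.26 - 8.75", 28),
    ("8.76 - 9.25", 26),
    ("9.26 - 9.75", 19),
    ("9.76 - 10.25", 12),
    ("10.26 - 10.75", 7),
    ("10.76 - 11.25", 2),
    ("11.26 - 11.75", 2),
]

total = sum(freq for _, freq in amplifier_classes)
relative = {label: round(freq / total, 2) for label, freq in amplifier_classes}

# The first four classes end at 9.75 or lower, so they lie entirely
# below the nominal gain of 10 units.
surely_below_nominal = sum(freq for _, freq in amplifier_classes[:4])
share_surely_below = surely_below_nominal / total

print(total)                         # 120
print(relative["8.26 - 8.75"])       # 0.23
print(surely_below_nominal)          # 97
print(round(share_surely_below, 2))  # 0.81
```

At least 81 percent of the amplifiers are below nominal from the table alone; any readings between 9.76 and 10 would add to that share.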
84Frequency Histogram
(Histogram: Frequency versus Gain, classes from 7.75 to 11.75, with the nominal specification of 10.0 gain marked)
85Amplifier Example: New Conclusions
- Distribution of gains is not evenly spread around the nominal target
- All amplifiers do operate within specifications
- Most amplifiers (85 percent) provide gains below the nominal target of 10 units
- There is a wide variation in performance of individual amplifiers in the test
- By random assignment it would be possible to get a series of below-target amplifiers, thus generating a weak signal
- The company needs to focus on why the amplifiers are not spread more evenly around the target of 10
86Other Graphical Tools
- Bar Charts
- Pie Charts
- Trend Charts
- Quality Control Charts
- Stem and Leaf Diagrams
- Dot Plots
- Others
87Bar Charts
A graphical tool used to represent qualitative
data. Typically used when the available data
are in a summary form already.
88Bar Chart Example: Forecasted Total Returns
(Bar chart; axis: Percent Return)
92Pie Charts
Total Federal Funds (Outlays): 1,438 Billion
94Line Chart (Trend Chart)
96Line Charts(Figure 2-25)
Profit and sales going in opposite directions
97Scatter Diagrams: Dependent and Independent Variables
- A dependent variable is one whose values are thought to be a function of the values of another variable. (y-axis)
- An independent variable is one whose values are thought to impact the values of the dependent variable. (x-axis)
98Scatter Plot Example
99Scatter Plot Example
100Other Data Displays