Title: Module 1 Introduction and Review Communicating Data
1Module 1 Introduction and Review: Communicating
Data
2IN CLASS
Exams after Modules 2 and 4, plus a Final. The Final
will count for 25% of grade.
Team will log homework and evaluate individual
participation
Individual then team test
All team members must be able to complete work on
Excel
Group work on problems
Exam
Group work on project and cases
Readiness Assurance Test
Readings with guide/videos/Homework
Homework
Research and homework
Review
INDIVIDUAL WORK
3Structure of Course
- Goal is not just to calculate, but to know how to
research, present, interpret, analyze, and
evaluate.
- Team based learning
- There is a lot of research to show that team
based learning is a better way for you to learn
just about anything
- You will work in a team for the entire quarter
and you will be responsible for your own learning
and the learning of your teammates
- Taking responsibility for your learning is the
best way to accelerate how you learn
4Team based learning
- Teams will be selected by the instructor.
- We will have 5 modules in this class
- Your grade will be based on your individual
Readiness Assurance Test (RAT), your team RAT, team
problem solving and casework, team project, and 3
individual exams. The class will determine how to
allocate 50% of the grade.
- The Team Project requires that you apply what you
learn to real problems or issues. This requires
that you apply research, analysis and evaluation
skills that will be required in your professions.
5RATs and Exams
- RATs are open textbook only.
- Individual exams are open textbook plus two pages
of notes which may NOT include the practice
exams. (The reading guide provides a nice summary
for notes.)
- You may use a calculator but not a laptop.
6Success Factors Exam Preparation
- Attend all classes and participate fully in the
team process
- Use the reading guide to read textbook sections,
view video lectures and do homework before and
during Module work
- Help your team members learn. You learn the most
when you teach others the concepts.
- Do all supplemental problems at the end of the
chapter in review for the test
- Do practice exams without consulting the key
7Topics Covered
- Module 1 Communicating data
- Module 2 Describing populations
- Module 3 Estimation
- Module 4 Hypothesis testing (Is it different?)
- Module 5 Regression and Chi Square
8What does America look like?
9Chapter 1 - What Are Statistics?
- There are lies--
- There are damn lies--
- And there are statistics
10Chapter 1 Objectives
- Define statistics
- Descriptive
- Inferential
- Determine where statistics can be obtained
- Define variable, sample, random sample and
population
- Know data (type and scale)
Love your data
11What Are Statistics?
- What is the gender breakdown at BC?
- What is the average age of a student at BC?
- What is the ethnic breakdown of the student
population of BC?
12What Are Statistics?
- What is the gender breakdown of King County?
- What is the typical age of a King County
resident?
- What is the ethnic makeup of King County?
- What is the typical income of King County?
- Go to www.census.gov
13What Are Statistics?
- What is the gender breakdown of Americans?
- What is typical age of an American?
- What is the ethnic breakdown?
- What is the typical income of an American
household?
14Chapter 1 - What are statistics?
- What do statistics do?
- Descriptive Statistics - summarizes data for a
population (I have all the data)
- Inferential Statistics - uses samples to make
inferences about a population (I don't have all
the data but want to know about the population)
15Chapter 1 - What are statistics?
- Where do you get the data from?
- Published sources (e.g. US Census). Check on this, as
some sources are more credible than others
- Observation
- Sampling: convenience and systematic sampling are
not as good as random sampling
- Gold standard is an experimental study with
control and random assignment
16Chapter 1 - What are statistics?
Variable
A characteristic you're measuring
Sample
When you select a few from the population, you
have a sample.
Random Sample
Every value in the population has an equal chance
of being selected.
Population
Everyone you want to know about. When you measure
everyone in the population you are conducting a
census.
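The random sample definition above can be sketched in a few lines of Python. This is only an illustration: the population of 500 student ID numbers and the helper name `draw_random_sample` are made up for the example.

```python
import random

def draw_random_sample(population, sample_size, seed=None):
    """Return a simple random sample, drawn without replacement.

    Every member of the population has an equal chance of being selected.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical population: ID numbers for 500 students.
population = list(range(1, 501))
sample = draw_random_sample(population, 20, seed=1)
print(sorted(sample))
```

Measuring all 500 IDs would be a census. Measuring only the 20 drawn here is a sample.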
17Chapter 1 - What are statistics?
Types of Data
Qualitative
Categories, no numbers assessed, e.g. gender,
country of birth
Quantitative
Numerical
- Discrete: Can be counted, gaps in values, e.g.
number of students, cars
- Continuous: Part of a continuous curve, e.g. time,
weight, volume
18Chapter 1 Types of Data (Not covered in text
but required)
19Classify as qualitative/quantitative,
discrete/continuous, nominal/ordinal/interval/ratio
- Problem 1.12 p. 20
- High school GPA
- High school class rank
- SAT or ACT score
- Gender
- Parents' income
- Age
- Problem 1.21 p. 21
- Length of maximum span (feet)
- Number of vehicle lanes
- Toll bridge (yes or no)
- Average daily traffic
- Condition of deck (good, fair, poor)
- Bypass or detour length (miles)
- Route type (interstate, US, state, county or city)
20Graphic display of information
21Chapter 2 Objectives
- Define and use minimum, maximum, and intervals
- Organize and tabulate data
- Differentiate and properly apply common graphs
such as pie charts, bar charts, line charts and
combinations
- Visually display data for effective communication
22Chapter 2 - Graphs and Tables
Data: Here is a sample of federal taxes paid by
140 people. Can you make sense of it? Look for
the minimum and maximum ($4 and $96,497). Create
intervals that make sense (use multiples of 5 or 10).
23Chapter 2 - Graphs and Tables
In this case, intervals of $10,000 were used. So
we create a table where there are enough
intervals to cover the minimum ($4) and the
maximum ($96,497). The intervals cannot
overlap. Sort the values into the intervals. As
an example, in this first line of data, all values
go into the first interval. Once you've sorted
all the data, each interval will have a frequency
(number of values that fall in that interval).
24Chapter 2 - Graphs and Tables
Calculating the percent in each interval gives a
better picture and gives you the probability
distribution of your data. Percent is calculated
by taking the interval frequency and dividing by
the grand total. For example, in the first
interval 127/140 = 91%. Cumulative percent is
calculated by adding the interval's percent to the
percents of all previous intervals.
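The interval, frequency, percent and cumulative percent steps can be sketched as a short Python function. The ten tax values are invented for illustration. Only the minimum (4) and maximum (96,497) come from the slide, and `frequency_table` is a hypothetical helper name.

```python
def frequency_table(values, width, n_intervals):
    """Build rows of (low, high, frequency, percent, cumulative percent).

    Interval k covers low <= value < high, so intervals never overlap.
    Assumes every value is at least 0 and below width * n_intervals.
    """
    counts = [0] * n_intervals
    for v in values:
        counts[int(v // width)] += 1      # sort each value into its interval
    total = len(values)
    rows = []
    cumulative = 0.0
    for k, count in enumerate(counts):
        percent = 100 * count / total     # interval frequency / grand total
        cumulative += percent             # this interval plus all previous ones
        rows.append((k * width, (k + 1) * width, count, percent, cumulative))
    return rows

# Hypothetical sample of taxes paid (not the 140 values from the slide).
taxes = [4, 850, 1200, 3300, 7400, 9100, 9800, 12500, 31000, 96497]
rows = frequency_table(taxes, 10_000, 10)
for low, high, count, percent, cumulative in rows:
    print(f"{low:>6} to under {high:>6}: {count:>2}  {percent:5.1f}%  {cumulative:5.1f}%")
```

Changing `width` from 10,000 to 5,000 (with 20 intervals) gives a more detailed picture of the same data.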
25Chapter 2 - Graphs and Tables
A frequency bar chart gives you the picture very
quickly.
26Chapter 2 - Graphs and Tables
Changing the intervals can give a more detailed
picture.
27Chapter 2 - Graphs and Tables
Different kinds of charts communicate better for
different kinds of data.
Pie charts are good for qualitative data where
there are few categories.
28Chapter 2 - Graphs and Tables
Frequency bar charts or histograms are good for
showing the relative difference between intervals.
Comments on this chart?
29Chapter 2 Graphs and Tables
30Chapter 2 - Graphs and Tables
Frequency bar charts or histograms can be in
percent as well.
31Chapter 2 - Graphs and Tables
Cumulative frequency charts add up all the
preceding intervals. They are good for
showing cut-off points.
32Chapter 2 - Graphs and Tables
Line graphs are good for showing trends. This
line graph shows the growth of the U.S. economy
over time. It is a time series graph.
33Chapter 2 - Graphs and Tables
X
Y
A scatter plot graphs two variables so you can
see the relationship between the two.
34Chapter 2 - Graphs and Tables
- A stem-and-leaf plot is very similar to a bar chart
but it retains the details of the individual values.
- Steps in creating a stem-and-leaf plot
- Rank all the values in order.
- Select a reasonable stem or interval. Numbers
may be truncated or rounded.
- Enter every value as a leaf on the stem. Use only
one digit for each value to keep the graphic
display.
- Label the graph correctly.
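The steps above can be sketched in Python. This is a minimal version that assumes non-negative values, takes the last kept digit as the leaf, and truncates rather than rounds. The scores and the helper name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Map each stem to its ordered list of one-digit leaves.

    leaf_unit=1 keeps the ones digit as the leaf. leaf_unit=10 truncates
    the ones digit and uses the tens digit as the leaf.
    """
    stems = defaultdict(list)
    for v in sorted(values):                 # rank all the values in order
        truncated = int(v // leaf_unit)      # drop digits below the leaf unit
        stems[truncated // 10].append(truncated % 10)  # one digit per value
    return dict(stems)

# Hypothetical test scores.
scores = [67, 72, 75, 75, 81, 88, 89, 93]
plot = stem_and_leaf(scores)
print("Stem | Leaf (leaf unit = 1)")
for stem in sorted(plot):
    print(f"{stem:>4} | {''.join(str(leaf) for leaf in plot[stem])}")
```

Stems with no values are skipped in this sketch. A hand-drawn plot would normally still list them, so that gaps in the data stay visible.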
35Chapter 2 Graphs and Tables
- Designate stem. You may have to truncate numbers.
- Sort data in stems.
- Only one number is recorded in the stem for the
purposes of this class.
- Don't use decimals in the leaf.
- Label clearly.
Rank order data
Data
Retains detail. Graphical representation.
36Chapter 2 - Graphs and Tables
- Checklist for graphs
- Do I know and understand the data?
- Have I organized the data correctly?
- Have I selected the best type to communicate the
data?
- Does the graph show the data to sufficient
detail?
- Have I labeled the graph so it is quickly and
easily understood? (There is no limit on the
amount of text.)
37Chapter 2 - Measures of Central Tendency
- Mean - Add all the values and divide by the number
of values.
- Median - Rank all values from lowest to highest.
If the number of values is odd, count to the
midpoint, or position (n+1)/2. If the number of
values is even, count to the middle two values, add
the two values together and divide by two.
- Mode - Locate the value with the highest
frequency.
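The counting rules for the three measures can be written out in Python. This is a sketch with hypothetical function names, and it follows the (n+1)/2 position rule rather than calling a library routine.

```python
from collections import Counter

def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

def median_by_position(values):
    """Rank the values, then count to position (n+1)/2."""
    ranked = sorted(values)
    n = len(ranked)
    if n % 2 == 1:
        position = (n + 1) // 2          # 1-based midpoint when n is odd
        return ranked[position - 1]
    lower = ranked[n // 2 - 1]           # the two middle values when n is even
    upper = ranked[n // 2]
    return (lower + upper) / 2

def mode(values):
    """Value with the highest frequency (the first one found if tied)."""
    return Counter(values).most_common(1)[0][0]

print(mean([1, 2, 3, 6]))                 # 3.0
print(median_by_position([5, 1, 3]))      # 3
print(median_by_position([4, 1, 3, 2]))   # 2.5
print(mode([2, 3, 3, 5]))                 # 3
```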
38Chapter 2 - Measures of Central Tendency
When a population has extreme values, the
median can be a better measure of central tendency
than the mean or average. Take this example of a
class of 20 BA 240 students. We have asked
students for their net worth (basically what is
in the bank). Most students have a net worth of
about $2,000, but one student, Bill G., has a net
worth of $50 billion. With Bill in the class, the
mean is not a good measure of typical.
RANK ORDER ALL THE DATA. Median is between the
10th and 11th position. (n+1)/2 = 10.5
Mode is the highest frequency value.
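A quick check of the net worth example in Python. The slide only says most students have about 2,000, so giving all 19 other students exactly 2,000 is a simplifying assumption.

```python
import statistics

# Assumed data: 19 students at 2,000 each, plus Bill G. at 50 billion.
net_worth = [2_000] * 19 + [50_000_000_000]

mean_worth = statistics.mean(net_worth)
median_worth = statistics.median(net_worth)   # average of the 10th and 11th values
mode_worth = statistics.mode(net_worth)

print(f"mean   = {mean_worth:,.0f}")
print(f"median = {median_worth:,.0f}")
print(f"mode   = {mode_worth:,.0f}")
```

Under this assumption the mean comes out at about 2.5 billion, which describes nobody in the class, while the median and mode both stay at 2,000.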
39Chapter 2 - Measures of Variation
- Range - The difference between the maximum and
minimum value.
- Standard Deviation - A measure of dispersion
about the mean.
- For most populations, the most common values will
be clustered around the central measures.
- The further away from the center, the less the
frequency.
40Chapter 2 - Measures of Variation
41Chapter 2 - Measures of Variation
The concept of deviation (the distance of a value
from the mean) is very important in statistics. The
sum of squares, which is the sum of all squared
deviations, is also very important in statistics.
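How deviations and the sum of squares lead to the standard deviation can be sketched as follows. The data set is made up, and both the population (divide by n) and sample (divide by n-1) versions are shown because the slide's own formula did not survive in the transcript.

```python
import math

def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def population_std_dev(values):
    """Use when the data are the whole population (divide by n)."""
    return math.sqrt(sum_of_squares(values) / len(values))

def sample_std_dev(values):
    """Use when the data are a sample (divide by n - 1)."""
    return math.sqrt(sum_of_squares(values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical values, mean = 5
print(sum_of_squares(data))          # 32.0
print(population_std_dev(data))      # 2.0
print(sample_std_dev(data))
```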
42Chapter 2 - Measures of Variation
43Chapter 2 - Measures of Variation
Standard deviations tell you how frequently values
fall at a given distance from the mean. In a normal
distribution (bell-shaped curve), just about the
whole distribution is within plus or minus 3
standard deviations of the mean.
44Chapter 2 - Measures of Variation
For all distributions (normal or not), at least
about 89% (roughly 90%) of values are within plus or
minus 3 standard deviations of the mean.
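The two statements about 3 standard deviations can be compared numerically. The sketch below assumes the "all distributions" figure comes from Chebyshev's rule, which guarantees at least 1 - 1/k^2 of values within k standard deviations. The function names are hypothetical.

```python
from statistics import NormalDist

def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations (k > 1).

    Holds for any distribution, normal or not.
    """
    return 1 - 1 / k**2

def normal_fraction_within(k):
    """Fraction of a normal distribution within k standard deviations."""
    standard_normal = NormalDist()   # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (2, 3):
    print(f"k={k}: any distribution >= {chebyshev_lower_bound(k):.1%}, "
          f"normal = {normal_fraction_within(k):.1%}")
```

For k = 3 the guaranteed minimum is 8/9, or about 89%, while a normal distribution holds about 99.7% of its values in the same range.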
45Chapter 2 - Measures of Relative Standing
Percentile works like cumulative percent. It lets
you know where you stand from the bottom. The
higher the percentile, the higher your score
compared to others. Calculation of percentile is
similar to median (which is the 50th percentile).
Use the formula below (p = percentile) and round up
for the rank of the value, then count until you get
to that rank.
The formula gives the position only, NOT the actual
value. COUNT to get the value.
6 feet 2 inches: 92nd percentile
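The slide's position formula did not survive in this transcript. The sketch below therefore assumes one common convention, rank = (p/100) x n rounded up, which may differ from the textbook's formula. The helper name and data are hypothetical.

```python
import math

def percentile_value(values, p):
    """Return the value at the p-th percentile by counting to its rank.

    ASSUMPTION: rank = ceil(p * n / 100). The course formula may differ.
    """
    ranked = sorted(values)                     # rank order the data first
    rank = math.ceil(p * len(ranked) / 100)     # position only, rounded up
    rank = max(rank, 1)                         # guard against p = 0
    return ranked[rank - 1]                     # count to that position

heights = [62, 64, 65, 66, 67, 68, 69, 70, 72, 74]   # hypothetical, inches
print(percentile_value(heights, 50))
print(percentile_value(heights, 92))
```

The rank is only a position in the sorted list. The percentile value is whatever sits at that position.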
46Chapter 2 - Measures of Relative Standing
1961 50
1975 35
1979 31
1981 29
A Z-score tells you where you sit on the curve
relative to the mean. Look at the absolute value
of the z-score. The smaller it is, the closer it
is to the mean.
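A z-score is the deviation from the mean measured in standard deviations, z = (value - mean) / standard deviation. Here is a minimal sketch, with a made-up mean of 70 inches and standard deviation of 3 inches for height.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value sits above (+) or below (-) the mean."""
    return (value - mean) / std_dev

# Hypothetical height distribution: mean 70 inches, standard deviation 3 inches.
print(z_score(76, 70, 3))   # 2.0, two standard deviations above the mean
print(z_score(64, 70, 3))   # -2.0, two standard deviations below the mean
print(z_score(70, 70, 3))   # 0.0, exactly at the mean
```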
47Distribution of Height
48Chapter 2 Measures of Relative Standing
Females
Males
Take a sheet of paper and calculate your z-score.
49Chapter 2 - Outliers
Yao Ming 75
EXTREME OUTLIERS: Z scores greater than 3 or
less than -3.
NORMAL OUTLIERS: Z scores greater than 2 or
less than -2.
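The two cut-offs translate directly into a small classifier. The function name and the label strings are illustrative only.

```python
def classify_outlier(z):
    """Label a z-score using the cut-offs of 2 and 3 standard deviations."""
    if abs(z) > 3:
        return "extreme outlier"
    if abs(z) > 2:
        return "normal outlier"
    return "not an outlier"

for z in (0.5, -2.4, 3.7):
    print(z, classify_outlier(z))
```

Checking the extreme cut-off first matters, because any z-score beyond 3 is also beyond 2.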
50Chapter 2 - Measures of Relative Standing
51Review Summation with Linear Regression Worksheet
Be careful of the order of operations.
52Chapter 2 Bivariate Relationships
Correlation is the relationship between two
variables.
53Simple Linear Regression
Y-intercept, or the value of y when x = 0.
Slope, or the change in y for every unit change in
x.
Error, because the actual data doesn't sit on the
line.
Y variable plotted on vertical axis.
X variable plotted on horizontal axis.
55Preview - Linear Regression
The first step is to do a correlation
analysis between x and y to determine the
strength of the relationship.
56Preview Linear Regression
Fit a line to this data
57Preview - Linear Regression
If the correlation is significant then create a
regression analysis.
A regression creates a model of the relationship
between x and y. It fits a line to the scatter
plot by minimizing the sum of the squared distances
between the observed y values and the line.
58Preview - Linear Regression
The slope is calculated as
Tells you the change in the dependent variable
for every unit change in the independent
variable. Alternate calculation
Intercept Formula
59INTERCEPT
SLOPE
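The slide's slope and intercept formulas are missing from this transcript. The sketch below uses the standard least-squares formulas, slope = SSxy / SSxx and intercept = (mean of y) - slope x (mean of x). These may be arranged differently from the course's short-cut versions, and the data and function name are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sum of squared x deviations.
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = ss_xy / ss_xx                  # change in y per unit change in x
    intercept = mean_y - slope * mean_x    # value of y when x = 0
    return intercept, slope

xs = [1, 2, 3, 4]          # hypothetical independent variable
ys = [3, 5, 7, 9]          # hypothetical dependent variable
intercept, slope = fit_line(xs, ys)
print(intercept, slope)    # 1.0 2.0
```

Note the order of operations. The means are computed first, then the deviations, then the sums.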
60Preview Linear Regression
Correlation tells the strength of relationship
between x and y. Correlation or R can be any
value from -1 to 1. Relationship may not be
linear.
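Correlation can be computed from the same sums of squares used for the slope. The sketch below uses the standard Pearson formula r = SSxy / sqrt(SSxx x SSyy), with made-up data. The last data set shows why a strong relationship that is not linear can still give r = 0.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

print(correlation([1, 2, 3, 4], [3, 5, 7, 9]))         # 1.0, perfect positive
print(correlation([1, 2, 3, 4], [9, 7, 5, 3]))         # -1.0, perfect negative
print(correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])) # 0.0, y = x squared
```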
61Correlation or R
62Preview Linear Regression
The coefficient of determination or R-square
measures the variation explained by the best-fit
line as a percent of the total variation.
Alternate calculation
Average y
63R Square
SSE
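R-square can be sketched as 1 - SSE / (total sum of squares), where SSE is the sum of squared errors around the line and the total is measured around the average y. This is the standard definition and may not match the slide's layout. The data, and the line with intercept -2 and slope 3.5, are hypothetical.

```python
def r_squared(xs, ys, intercept, slope):
    """Share of the total variation in y explained by the line."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)       # variation around average y
    sse = sum((y - (intercept + slope * x)) ** 2        # variation left around the line
              for x, y in zip(xs, ys))
    return 1 - sse / ss_total

# Hypothetical data with its least-squares line y = -2 + 3.5x.
xs = [1, 2, 3]
ys = [2, 4, 9]
print(round(r_squared(xs, ys, intercept=-2.0, slope=3.5), 4))
```

For a line that passes through every point, SSE is 0 and R-square is 1.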
64Chapter 2 Summary
- Measures of central tendency give a quick picture
of typical
- Mean
- Median
- Mode
- Measures of variation give an idea of the shape
and breadth of the distribution
- Range
- Variance
- Standard Deviation
- Standard deviation can help define a distribution
- Values cluster within one standard deviation of
the mean
- Know how to calculate standard deviation on
calculator and Excel
- Measures of relative standing tell how a value
compares to the rest of the distribution
- Percentile: position from the bottom
- Z-score: position relative to mean
- Outliers: extreme values
65Linear Regression
- Linear regression is fitting a line to the data
which minimizes the sum of the squared deviations
between the line and the data
- Short cut manual calculations for linear
regression
66Short cut formulae for intercept and slope