Introduction to Applied Statistics

Slides: 54

Provided by: notesU

1
Introduction to Applied Statistics

CHAPTER 1
BCT2053

2
CONTENT

1.1 Overview
1.2 Statistical Problem-Solving Methodology
1.3 Review of Descriptive Statistics
1.3.1 Measures of Central Tendency
1.3.2 Measures of Variation

3
OBJECTIVE

By the end of this chapter, you should be able to:
Define the meaning of statistics, population,
sample, parameter, statistic, descriptive
statistics and inferential statistics.
Understand and explain why a knowledge of
statistics is needed.
Outline the 6 basic steps in the statistical
problem-solving methodology.
Identify various methods of obtaining samples.
Discuss the role of computers and data analysis
software in statistical work.
Summarize data using measures of central
tendency, such as the mean, median, mode, and
midrange.
Describe data using measures of variation, such
as the range, variance, and standard deviation.

4
1.1 OVERVIEW
5
What is Statistics?
Most people become familiar with probability and
statistics through radio, television, newspapers,
and magazines. For example, the following
statements were found in newspapers:

Tens of thousands of parents in Malaysia have chosen
StemLife as their trusted stem cell bank.
The average annual salary for a professional
football player for the year 2001 was $1,100,500.
The average cost of a wedding is nearly
RM10,000.
In the USA, the median salary for men with a
bachelor's degree is $49,982, while the median
salary for women with a bachelor's degree is
$35,408.
Globally, an estimated 500,000 children under
the age of 15 live with Type 1 diabetes.
Women who eat fish once a week are 29% less
likely to develop heart disease.

6
Statistics

is the science of conducting studies to collect,
organize, summarize, analyze, present, interpret
and draw conclusions from data.

Data: any values (observations or measurements) that
have been collected.
7
The basic idea behind all statistical methods of
data analysis is to make inferences about a
population by studying a small sample chosen from
it.
Population: the complete collection of
measurements, outcomes, objects or individuals under
study.
Parameter: a number that describes a population
characteristic.
Tangible population: always finite. After an item is
sampled, the population size decreases by 1. The
total number of members is fixed and could be listed.
Conceptual population: consists of all the
values that might possibly have been observed;
it has an unlimited number of members.
Sample: a subset of a population, containing the
objects or outcomes that are actually observed.
Statistic: a number that describes a sample
characteristic.
8
Descriptive and Inferential Statistics

Inferential statistics
consists of generalizing from samples to
populations, performing estimations and hypothesis
tests, determining relationships among
variables, and making predictions.
Used to describe, infer, estimate or approximate
the characteristics of the target population.
Used when we want to draw a conclusion from the
data obtained from the sample.

Descriptive statistics
consists of the collection, organization,
classification, summarization, and presentation
of data obtained from the sample.
Used to describe the characteristics of the
sample.
Used to determine whether the sample represents
the target population by comparing the sample
statistic and the population parameter.

9
Example 1

Tens of thousands of parents in Malaysia have chosen
StemLife as their trusted stem cell bank.
(Descriptive)
The death rate from lung cancer was 10 times higher for
smokers than for nonsmokers. (Inferential)
The average cost of a wedding is nearly
RM10,000. (Descriptive)
In the USA, the median salary for men with a
bachelor's degree is $49,982, while the median
salary for women with a bachelor's degree is
$35,408. (Descriptive)
Globally, an estimated 500,000 children under
the age of 15 live with Type 1 diabetes.
(Inferential)
A researcher claims that a new drug will reduce
the number of heart attacks in men over 70 years
of age. (Inferential)

10
An overview of descriptive statistics and
statistical inference (flowchart not reproduced)
11
Need for Statistics

You need a knowledge of statistics to help you:
Describe and understand numerical relationships
between variables
There are a lot of data in this world, so we need
to identify the right variables.
Make better decisions
Statistical methods allow people to make better
decisions in the face of uncertainty.

12
Describing relationships between variables

A management consultant wants to compare a
client's investment return for this year with
related figures from last year. He summarizes
masses of revenue and cost data from both periods
and, based on his findings, presents his
recommendations to his client.
A college admission director needs to find an
effective way of selecting student applicants. He
designs a statistical study to see if there is a
significant relationship between SPM results and
the GPA achieved by freshmen at his school. If
there is a strong relationship, a high SPM result
will become an important criterion for acceptance.

13
Aiding in Decision Making

Suppose that the manager of Big-Wig Executive
Hair Stylist, Alvin Tang, has advertised that
90% of the firm's customers are satisfied with
the company's services. If Pamela, a consumer
activist, feels that this is an exaggerated
statement that might require legal action, she
can use statistical inference techniques to
decide whether or not to sue Alvin.
Students and professional people can also use the
knowledge gained from studying statistics to
become better consumers and citizens. For
example, they can make intelligent decisions
about what products to purchase based on consumer
studies, about government spending based on
utilization studies, and so on.

14
1.2 STATISTICAL PROBLEM SOLVING METHODOLOGY
15
STATISTICAL PROBLEM SOLVING METHODOLOGY

6 Basic Steps
Identifying the problem or opportunity
Deciding on the method of data collection
Collecting the data
Classifying and summarizing the data
Presenting and analyzing the data
Making the decision

16
STEP 1: Identifying the problem or opportunity

Must clearly understand and correctly define the
objective/goal of the study.
If not, time and effort are wasted.
Is the goal to study some population?
Is it to impose some treatment on the group and
then test the response?
Can the study goal be achieved through simple
counts or measurements of the group?
Must an experiment be performed on the group?
If samples are needed, how large should they be and
how should they be taken? The larger the better (more than 30).

17
Characteristics of sample size

The larger the sample, the smaller the magnitude
of sampling errors.
Survey studies need large samples because
returning the survey is voluntary.
A large sample is easy to divide into subgroups.
In mail surveys the response rate may
be as low as 20-30%, so a larger
sample is required.
Subject availability and cost factors are
legitimate considerations in determining
appropriate sample size.

18
STEP 2: Deciding on the Method of Data Collection

Data must be gathered that are accurate, as
complete as possible, and relevant to the problem.
Data can be obtained in 3 ways:
Data that are made available by others (internal,
external, primary or secondary data)
Data resulting from an experiment (experimental
study)
Data collected in an observational study
(observation, survey, questionnaire, interview)

19
STEP 3: Collecting the data

Nonprobability data
A sample in which the judgment of the experimenter,
the method by which the data are collected, or
other factors could affect the results of the
sample.
3 basic methods: judgment samples, voluntary
samples and convenience samples.
Probability data
A sample in which the chance of selection of each
item in the population is known before the sample
is picked.
4 basic methods: random, systematic, stratified,
and cluster.

20
Nonprobability data samples

Judgment samples
Based on the opinion of one or more experts
Ex: A political campaign manager intuitively
picks certain voting districts as reliable places
to measure the public opinion of his candidate
Voluntary samples
Questions are posed to the public by broadcasting
them over radio or TV (answers by phone or SMS)
Convenience samples
Take an easy sample (the most conveniently
available)
Ex: A surveyor stands in one location and asks
passers-by their questions

21
Probability data samples

Random samples
Selected using chance or random methods
Example
A lecturer wants to study the physical fitness
levels of students at her university. There are
5,000 students enrolled at the university, and
she wants to draw a sample of size 100 to take a
physical fitness test. She obtains a list of all
5,000 students, numbers them from 1 to 5,000,
randomly selects 100 of those numbers, and invites
the corresponding students to participate in the study.
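
The random-sample example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simple_random_sample` and the `seed` argument are hypothetical, and the students are represented by their list numbers 1 to 5,000.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct student numbers from 1..population_size."""
    rng = random.Random(seed)  # seed is optional, only for repeatable demos
    student_numbers = range(1, population_size + 1)
    # sample() picks without replacement, so no student is invited twice
    return sorted(rng.sample(student_numbers, sample_size))

invited = simple_random_sample(5000, 100, seed=2053)
print(len(invited))  # 100
```

Every student number has the same chance of being drawn, which is what makes this a probability sample.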

22
Probability data samples

Systematic samples
Number each subject of the population and
select every kth subject.
Example
A lecturer wants to study the physical fitness
levels of students at her university. There are
5,000 students enrolled at the university, and
she wants to draw a sample of size 100 to take a
physical fitness test. She obtains a list of all
5,000 students, numbers them from 1 to 5,000 and
randomly picks one of the first 50 students
(5000/100 = 50) on the list. If the number picked
is 30, then the 30th student on the list is
invited first. She then invites
every 50th student on the list after this
random start (the 80th student, the 130th
student, etc.) to produce a sample of 100 students
to participate in the study.
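
A minimal sketch of the systematic procedure, assuming the students are identified by their list numbers. The function name `systematic_sample` and its `start` and `seed` arguments are hypothetical; `start=30` reproduces the pick used in the example.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Pick a random start among the first k subjects, then every kth subject."""
    k = population_size // sample_size  # sampling interval, here 5000 // 100 = 50
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..k
    return [start + i * k for i in range(sample_size)]

invited = systematic_sample(5000, 100, start=30)
print(invited[:3], invited[-1])  # [30, 80, 130] 4980
```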

23
Probability data samples

Stratified samples
Divide the population into groups according to
some characteristic that is important to the
study, then sample from each group.
Example
A lecturer wants to study the physical fitness
levels of students at her university. There are
5,000 students enrolled at the university, and
she wants to draw a sample of size 100 to take a
physical fitness test. Assume that, because of
different lifestyles, the level of physical
fitness is different between male and female
students. To account for this variation in
lifestyle, the population of students can easily
be stratified into male and female students. Then
she can use either random or systematic
methods to select the participants. For example,
she can use a random sample to choose 50 male
students and a systematic sample to choose
another 50 female students, or vice versa.
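
The stratified example can be sketched as below. The roster is invented purely for illustration (it assumes student numbers 1 to 2,600 are male and 2,601 to 5,000 are female), the name `stratified_sample` is hypothetical, and for brevity the sketch uses simple random sampling inside both strata rather than mixing random and systematic selection.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    return {name: sorted(rng.sample(members, sizes[name]))
            for name, members in strata.items()}

# Hypothetical roster: numbers 1-2600 are male, 2601-5000 are female
strata = {"male": list(range(1, 2601)), "female": list(range(2601, 5001))}
sample = stratified_sample(strata, {"male": 50, "female": 50}, seed=2053)
print({name: len(ids) for name, ids in sample.items()})  # {'male': 50, 'female': 50}
```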

24
Probability data samples

Cluster samples
Divide the population into sections/clusters,
then randomly select some of those clusters and
choose all members from the selected
clusters.
Using cluster sampling can reduce cost and
time.
Example
A lecturer wants to study the physical fitness
levels of students at her university. There are
5,000 students enrolled at the university, and
she wants to draw a sample to take a physical
fitness test. Assume that, because of different
lifestyles, the level of physical fitness is
different between freshmen, sophomores, juniors
and seniors. To account for this
variation in lifestyle, the population of students
can easily be clustered into freshmen,
sophomores, juniors and seniors. She can then
select one cluster, such as the freshmen,
and take all the freshmen as
participants.
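
A sketch of cluster sampling under stated assumptions: the cluster sizes below are invented for illustration, and `cluster_sample` is a hypothetical helper. Following the definition above, the cluster is chosen at random and every member of the chosen cluster becomes a participant.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # cluster names
    members = [s for name in chosen for s in clusters[name]]
    return chosen, members

# Hypothetical roster of 5,000 student numbers split by year of study
clusters = {
    "freshmen": list(range(1, 1501)),
    "sophomores": list(range(1501, 2801)),
    "juniors": list(range(2801, 3951)),
    "seniors": list(range(3951, 5001)),
}
chosen, participants = cluster_sample(clusters, 1, seed=2053)
print(chosen, len(participants))
```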

25
Identify the type of sample obtained
Example 1: A physical education professor wants to study
the physical fitness levels of students at her
university. There are 20,000 students enrolled at
the university, and she wants to draw a sample of
size 100 to take a physical fitness test. She
obtains a list of all 20,000 students, numbers
them from 1 to 20,000 and then invites the 100
students corresponding to those numbers to
participate in the study.
Example 2: A quality engineer wants to inspect
rolls of wallpaper in order to obtain information
on the rate at which flaws in the printing are
occurring. She decides to draw a sample of 50
rolls of wallpaper from a day's production. Each
hour for 5 hours, she takes the 10 most recently
produced rolls and counts the number of flaws on
each. Is this a simple random sample?
26
Example 3: Suppose we have a list of 1000
registered voters in a community and we want to
pick a probability sample of 50. We can use a
random number table to pick one of the first 20
voters (1000/50 = 20) on our list. If the table
gave us the number 16, the 16th voter on the
list would be the first to be selected. We would
then pick every 20th name after this random start
(the 36th voter, the 56th voter, etc.) to produce
a sample.
Example 4: Consumer surveys of large
cities often employ cluster sampling. The usual
procedure is to divide a map of the city into
small blocks, each block containing a cluster of
households. A number of clusters are selected for
the sample, and all the households in a cluster
are surveyed. Using cluster sampling can reduce
cost and time. Less energy and money are expended
if an interviewer stays within a specific area
rather than traveling across stretches of the
city.
27
Example 5: Suppose our population is a university
student body. We want to estimate the average
annual expenditures of a college student for
non-school items. Assume we know that, because of
different lifestyles, juniors and seniors spend
more than freshmen and sophomores, but there are
fewer students in the upper classes than in the
lower classes because of some dropout factor. To
account for this variation in lifestyle and group
size, the population of students can easily be
stratified into freshmen, sophomores, juniors and
seniors. A sample can be taken from each stratum and each result
weighted to provide an overall estimate of
average non-school expenditures.
Example 6: A
researcher wanted to survey students in 100
homerooms in secondary schools in a large school
district. They could first randomly select 10
schools from all the secondary schools in the
district. Then, from a list of homerooms in the 10
schools, they could randomly select 100.
28
STEP 4: Classifying and Summarizing the data

Organize or group the facts/raw sample data for
study and investigation.
Classifying: identifying items with like
characteristics and arranging them into groups or
classes.
Ex: Production data (product make, location,
production process, etc.)
Data can be classified as qualitative
(categorical/attribute) data and quantitative
(numerical) data.
Summarization
Graphical and descriptive statistics (tables,
charts, measures of central tendency, measures of
variation, measures of position)

29
Data Classification

Data are the values that variables can assume.
A variable is a characteristic or attribute that
can assume different values.
Variables whose values are determined by chance
are called random variables.

Variables can be classified:
By how they are categorized, counted or measured
(level of measurement of data)
As quantitative or qualitative
30
Types of Data
Qualitative (categorical/attribute) data
1. Data that refer only to name classification
(done using numbers)
2. Can be placed into distinct categories according
to some characteristic or attribute.
Nominal data (cannot be ranked): gender, race,
citizenship, etc. Use code numbers (1, 2, ...)
Ordinal data (can be ranked): feeling (dislike to
like), colour (dark to bright), etc.
Quantitative (numerical) data
1. Data that represent counts or measurements
(can be counted or measured)
2. Are numerical in nature and can be ordered or
ranked.
Discrete variables: assume values that can be
counted and are finite. Ex: number of something
Continuous variables
1. Can assume all values between any two specific
values; obtained by measuring
2. Have boundaries and must be rounded because of
the limits of the measuring device. Ex: weight,
age, salary, height, temperature, etc.
31

Example
The Lemon Marketing Corporation has asked you
for information about the car you drive. For each
question, identify the type of data
requested as either attribute data or numeric
data. When numeric data is requested, identify
the variable as discrete or continuous.
What is the weight of your car?
In what city was your car made?
How many people can be seated in your car?
What is the distance traveled from your home to
your school?
What is the color of your car?
How many cars are in your household?
What is the length of your car?
What is the normal operating temperature (in
degrees Fahrenheit) of your car's engine?
What gas mileage (miles per gallon) do you get in
city driving?
Who made your car?
How many cylinders are there in your car's
engine?
How many miles have you put on your car's current
set of tyres?

32
Levels of Measurement of Data
Examples (table not reproduced)
33
STEP 5: Presenting and Analyzing the data

Summarize and analyze the information given by the
graphical and descriptive statistics.
Identify the relationships in the information.
Make any relevant statistical inferences
(hypothesis testing, confidence intervals, ANOVA,
control charts, etc.)

34
STEP 6: Making the decision

The researchers can make a list of all the
options and decisions which can achieve the
objective and goal of the research, weigh the
options and choose the best option, which
represents the best solution to the problem.
The correctness of this choice depends on the
analytical skill and the quality of the
information.

35
Statistical Problem Solving Methodology (flowchart not reproduced)
36
Role of the Computer in Statistics

Two software tools commonly used for data
analysis:
Spreadsheets
Microsoft Excel, Lotus 1-2-3
Statistical Packages
MINITAB, SAS, SPSS and S-PLUS

37
1.3 REVIEW OF DESCRIPTIVE STATISTICS
38
Summary Statistics (Data Description)

Statistical methods can be used to summarize
data.
Measures of average are also called measures of
central tendency and include the mean, median,
mode, and midrange.
Measures that determine the spread of data values
are called measures of variation or measures of
dispersion and include the range, variance, and
standard deviation.
Measures of position tell where a specific data
value falls within the data set or its relative
position in comparison with other data values.
The most common measures of position are
percentiles, deciles, and quartiles.
The measures of central tendency, variation, and
position are part of what is called traditional
statistics. These measures are typically used
to confirm conjectures about the data.

1.3.1 Measures of Central Tendency

Mean: the sum of the values divided by the total
number of values.
Population mean: $\mu = \frac{\sum X}{N}$
Sample mean: $\bar{X} = \frac{\sum X}{n}$
Example: 9 2 1 4 3 3 7 5 8 6
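
As a worked illustration, the mean of the ten example values can be computed directly from the definition (sum of the values divided by the number of values). The variable names are only for illustration.

```python
data = [9, 2, 1, 4, 3, 3, 7, 5, 8, 6]

total = sum(data)  # 48
n = len(data)      # 10
mean = total / n   # sum of the values divided by the number of values
print(mean)  # 4.8
```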
40
Properties of Mean

The mean is computed by using all the values of
the data.
The mean varies less than the median or mode when
samples are taken from the same population and
all three measures are computed for these
samples.
The mean is used in computing other statistics,
such as variance.
The mean for the data set is unique, and not
necessarily one of the data values.
The mean cannot be computed for an open-ended
frequency distribution.
The mean is affected by extremely high or low
values and may not be the appropriate average to
use in these situations.

1.3.1 Measures of Central Tendency

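Median: the middle number of n ordered data values
(smallest to largest).
If n is odd, the median is the value in position $(n+1)/2$.
If n is even, the median is the average of the values
in positions $n/2$ and $n/2 + 1$.
Example (n = 10): 9 2 1 4 3 3 7 5 8 6
Example (n = 9): 9 2 1 3 3 7 5 8 6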
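
The two median examples can be checked with a short sketch. The function name `median` is chosen here for illustration; it sorts the data and then applies the odd or even rule.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # position (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2  # positions n/2 and n/2 + 1

print(median([9, 2, 1, 4, 3, 3, 7, 5, 8, 6]))  # 4.5
print(median([9, 2, 1, 3, 3, 7, 5, 8, 6]))     # 5
```

For the ten values the sorted data are 1 2 3 3 4 5 6 7 8 9, so the median is the average of 4 and 5. For the nine values the sorted data are 1 2 3 3 5 6 7 8 9, so the median is the fifth value.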
42
Properties of Median

The median is used when one must find the center
or middle value of a data set.
The median is used when one must determine
whether the data values fall into the upper half
or lower half of the distribution.
The median is used to find the average of an
open-ended distribution.
The median is affected less than the mean by
extremely high or extremely low values.

1.3.1 Measures of Central Tendency

Mode: the most commonly occurring value in a data
series.

The mode is used when the most typical case is
desired.
The mode is the easiest average to compute.
The mode can be used when the data are nominal,
such as religious preference, gender, or
political affiliation.
The mode is not always unique. A data set can
have more than one mode, or the mode may not
exist for a data set.

Example: 9 2 1 4 3 3 7 5 8 6
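
A sketch of finding the mode, assuming the convention stated above that a data set may have one mode, several modes, or none. The helper name `modes` is hypothetical; it returns a list so that all three cases can be represented.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest count; empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([9, 2, 1, 4, 3, 3, 7, 5, 8, 6]))  # [3]
```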
44

1.3.1 Measures of Central Tendency

Midrange: a rough estimate of the middle, found by
averaging the lowest and highest values. It is
also a very rough estimate of the average and can
be affected by one extremely high or low value.
MR = (lowest value + highest value) / 2
Example: 9 2 1 4 3 3 7 5 8 6
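
For the example data, the midrange can be computed as the average of the lowest and highest values. This is a minimal sketch, and the variable names are only illustrative.

```python
data = [9, 2, 1, 4, 3, 3, 7, 5, 8, 6]

lowest, highest = min(data), max(data)  # 1 and 9
midrange = (lowest + highest) / 2
print(midrange)  # 5.0
```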
45
Types of Distribution
Symmetric
Positively skewed or right-skewed
Negatively skewed or left-skewed
46

1.3.2 Measures of Variation / Dispersion

Used when a measure of central tendency alone is
not informative or not needed (e.g., the means are
the same for two data sets)
Measures the variability that exists in a
data set
To form a judgment about how well the average
value depicts the data
To learn the extent of the scatter so that steps
may be taken to control the existing variation

1.3.2 Measures of Variation / Dispersion

Range: the difference between the highest
value and the lowest value in a data set. The
symbol R is used for the range.
R = highest value - lowest value
Example: 9 2 1 4 3 3 7 5 8 6
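
Applied to the example data, the range formula works out as follows (a minimal sketch, with the name `R` taken from the symbol used above).

```python
data = [9, 2, 1, 4, 3, 3, 7, 5, 8, 6]

R = max(data) - min(data)  # highest value minus lowest value
print(R)  # 8
```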
48

1.3.2 Measures of Variation / Dispersion

Variance: the average of the squares of the
distance each value is from the mean.
Standard deviation: the square root of the
variance.
Population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
Example: 9 2 1 4 3 3 7 5 8 6
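
The sketch below works through the variance and standard deviation of the example data, once treating the ten values as a whole population (divide by N) and once as a sample (divide by n - 1). It assumes the usual textbook definitions, and the variable names are illustrative.

```python
import math

data = [9, 2, 1, 4, 3, 3, 7, 5, 8, 6]
n = len(data)
mean = sum(data) / n                                 # 4.8
sum_sq_dev = sum((x - mean) ** 2 for x in data)      # about 63.6

pop_var = sum_sq_dev / n          # treat the data as a population
samp_var = sum_sq_dev / (n - 1)   # treat the data as a sample
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print(round(pop_var, 2), round(samp_var, 2))  # 6.36 7.07
print(round(pop_sd, 2), round(samp_sd, 2))    # 2.52 2.66
```

The sample versions are slightly larger because the sum of squared deviations is divided by 9 instead of 10.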
49
Properties of Variance and Standard Deviation

Variances and standard deviations can be used to
determine the spread of the data. If the variance
or standard deviation is large, the data are more
dispersed. The information is useful in comparing
two or more data sets to determine which is more
variable.
The measures of variance and standard deviation
are used to determine the consistency of a
variable.
The variance and standard deviation are used to
determine the number of data values that fall
within a specified interval in a distribution.
The variance and standard deviation are used
quite often in inferential statistics.
The standard deviation is used to estimate the amount
of spread in the population from which the sample
was drawn.

50
Chebyshev's Theorem
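
The slide body is not reproduced in this transcript. As it is usually stated, Chebyshev's theorem says that the proportion of values falling within k standard deviations of the mean is at least 1 - 1/k^2, for any k greater than 1. The sketch below checks that bound on the chapter's example data; the helper `chebyshev_check` is hypothetical, and it assumes the population standard deviation is used.

```python
import statistics

def chebyshev_check(values, k):
    """Return (observed proportion within k SDs of the mean, Chebyshev lower bound)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    within = sum(1 for x in values if abs(x - mean) <= k * sd)
    return within / len(values), 1 - 1 / k ** 2

data = [9, 2, 1, 4, 3, 3, 7, 5, 8, 6]
actual, bound = chebyshev_check(data, 2)
print(actual, bound)  # 1.0 0.75
```

Here all ten values lie within 2 standard deviations of the mean, which is consistent with the guaranteed minimum of 75%.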
51
TIPS: Calculate the mean and variance
using a scientific calculator

Casio fx-570MS
Insert data
MODE SD data M
Shift 1
Shift 2
Clear data
Shift CLR 1

Casio fx-570W
Insert data
MODE SD data M
Shift 1
Shift 2
Shift 3
Shift 4
Clear data
Shift AC/ON

52
Conclusion

The applications of statistics are many and
varied. People encounter them in everyday life,
such as in reading newspapers or magazines,
listening to the radio, or watching television.
By combining all of the descriptive statistics
techniques discussed in this chapter together,
the student is now able to collect, organize,
summarize and present data.