Introduction to Applied Statistics

Provided by: notesU
1
Introduction to Applied Statistics
  • CHAPTER 1
  • BCT2053

2
CONTENT
  • 1.1 Overview
  • 1.2 Statistical Problem-Solving Methodology
  • 1.3 Review of Descriptive Statistics
  • 1.3.1 Measures of Central Tendency
  • 1.3.2 Measures of Variation

3
OBJECTIVE
  • By the end of this chapter, you should be able to:
  • Define the meaning of statistics, population,
    sample, parameter, statistic, descriptive
    statistics and inferential statistics.
  • Understand and explain why a knowledge of
    statistics is needed.
  • Outline the 6 basic steps in the statistical
    problem-solving methodology.
  • Identify various methods of obtaining samples.
  • Discuss the role of computers and data analysis
    software in statistical work.
  • Summarize data using measures of central
    tendency, such as the mean, median, mode, and
    midrange.
  • Describe data using measures of variation, such
    as the range, variance, and standard deviation.

4
1.1 OVERVIEW
5
What is Statistics?
Most people become familiar with probability and
statistics through radio, television, newspapers,
and magazines. For example, the following
statements were found in newspapers:
  • Tens of thousands of parents in Malaysia have
    chosen StemLife as their trusted stem cell bank.
  • The average annual salary for a professional
    football player for the year 2001 was $1,100,500.
  • The average cost of a wedding is nearly
    RM10,000.
  • In the USA, the median salary for men with a
    bachelor's degree is $49,982, while the median
    salary for women with a bachelor's degree is
    $35,408.
  • Globally, an estimated 500,000 children under
    the age of 15 live with Type 1 diabetes.
  • Women who eat fish once a week are 29% less
    likely to develop heart disease.

6
Statistics
  • is the science of conducting studies to collect,
    organize, summarize, analyze, present, interpret
    and draw conclusions from data.

Data: any values (observations or measurements) that
have been collected.
7
The basic idea behind all statistical methods of
data analysis is to make inferences about a
population by studying a small sample chosen from
it.
Population: The complete collection of
measurements, outcomes, objects or individuals
under study.
  • Tangible population: always finite. After an
    item is sampled, the population size decreases
    by 1. The total number of members is fixed and
    could be listed.
  • Conceptual population: consists of all the
    values that might possibly have been observed.
    It has an unlimited number of members.
Parameter: A number that describes a population
characteristic.
Sample: A subset of a population, containing the
objects or outcomes that are actually observed.
Statistic: A number that describes a sample
characteristic.
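The difference between a parameter and a statistic can be sketched in Python. The population values below are hypothetical (made up purely for illustration) and the seed is arbitrary; this is only a minimal sketch of the idea.

```python
import random
from statistics import mean

# Hypothetical tangible population: test scores of 20 students (made-up values).
population = [62, 75, 81, 58, 90, 67, 73, 85, 79, 66,
              71, 88, 54, 93, 77, 69, 82, 60, 74, 86]

# Parameter: a number that describes the whole population.
population_mean = mean(population)

# Sample: the subset that is actually observed (seed fixed for repeatability).
rng = random.Random(1)
sample = rng.sample(population, 5)

# Statistic: a number that describes the sample only.
sample_mean = mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The parameter is fixed, while the statistic changes from sample to sample, which is why inference from a sample always carries some uncertainty.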
8
Descriptive & Inferential Statistics
  • Inferential statistics
  • consists of generalizing from samples to
    populations, performing estimations and
    hypothesis tests, determining relationships among
    variables, and making predictions.
  • Used to describe, infer, estimate or approximate
    the characteristics of the target population
  • Used when we want to draw a conclusion from the
    data obtained from the sample
  • Descriptive statistics
  • consists of the collection, organization,
    classification, summarization, and presentation
    of data obtained from the sample.
  • Used to describe the characteristics of the
    sample
  • Used to determine whether the sample represents
    the target population by comparing the sample
    statistic with the population parameter

9
Example 1
  • Tens of thousands of parents in Malaysia have
    chosen StemLife as their trusted stem cell bank.
    (Descriptive)
  • The death rate from lung cancer was 10 times
    higher for smokers than for nonsmokers.
    (Inferential)
  • The average cost of a wedding is nearly
    RM10,000. (Descriptive)
  • In the USA, the median salary for men with a
    bachelor's degree is $49,982, while the median
    salary for women with a bachelor's degree is
    $35,408. (Descriptive)
  • Globally, an estimated 500,000 children under
    the age of 15 live with Type 1 diabetes.
    (Inferential)
  • A researcher claims that a new drug will reduce
    the number of heart attacks in men over 70 years
    of age. (Inferential)

10
[Flowchart: an overview of descriptive statistics
and statistical inference]
11
Need for Statistics
  • You need a knowledge of statistics to help you:
  • Describe and understand numerical relationships
    between variables
  • There are a lot of data in this world, so we need
    to identify the right variables.
  • Make better decisions
  • Statistical methods allow people to make better
    decisions in the face of uncertainty.

12
Describing relationship between variables
  • A management consultant wants to compare a
    client's investment return for this year with
    related figures from last year. He summarizes
    masses of revenue and cost data from both periods
    and, based on his findings, presents his
    recommendations to his client.
  • A college admission director needs to find an
    effective way of selecting student applicants. He
    designs a statistical study to see if there is a
    significant relationship between SPM results and
    the GPA achieved by freshmen at his school. If
    there is a strong relationship, a high SPM result
    will become an important criterion for acceptance.

13
Aiding in Decision Making
  • Suppose that the manager of Big-Wig Executive
    Hair Stylist, Alvin Tang, has advertised that
    90% of the firm's customers are satisfied with
    the company's services. If Pamela, a consumer
    activist, feels that this is an exaggerated
    statement that might require legal action, she
    can use statistical inference techniques to
    decide whether or not to sue Alvin.
  • Students and professional people can also use the
    knowledge gained from studying statistics to
    become better consumers and citizens. For
    example, they can make intelligent decisions
    about what products to purchase based on consumer
    studies, about government spending based on
    utilization studies, and so on.

14
1.2 STATISTICAL PROBLEM SOLVING METHODOLOGY
15
STATISTICAL PROBLEM SOLVING METHODOLOGY
  • 6 Basic Steps
  • Identifying the problem or opportunity
  • Deciding on the method of data collection
  • Collecting the data
  • Classifying and summarizing the data
  • Presenting and analyzing the data
  • Making the decision

16
STEP 1: Identifying the problem or opportunity
  • Must clearly understand and correctly define the
    objective/goal of the study
  • If not, time and effort are wasted
  • Is the goal to study some population?
  • Is it to impose some treatment on the group
    and then test the response?
  • Can the study goal be achieved through simple
    counts or measurements of the group?
  • Must an experiment be performed on the group?
  • If samples are needed, how large should they be
    and how should they be taken? The larger the
    better (more than 30).

17
Characteristics of sample size
  • The larger the sample, the smaller the magnitude
    of sampling errors.
  • Survey studies need large samples because
    returning the survey is voluntary.
  • A large sample is easy to divide into subgroups.
  • In mail surveys the response rate may be as low
    as 20-30%, so a larger sample is required.
  • Subject availability and cost factors are
    legitimate considerations in determining
    appropriate sample size.
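One common way to quantify the first point above is the standard error of the sample mean, sigma / sqrt(n), which applies to simple random samples from a large population. That formula is not stated on the slide, and the value of sigma below is a made-up assumption. The sketch only illustrates how the typical sampling error shrinks as n grows.

```python
import math

sigma = 12.0  # hypothetical population standard deviation (assumption)

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean for a simple random sample of size n."""
    return sigma / math.sqrt(n)

for n in (10, 30, 100, 1000):
    print(n, round(standard_error(sigma, n), 3))
```

Note the diminishing returns: quadrupling the sample size only halves the standard error, which is why cost and subject availability remain legitimate considerations.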

18
STEP 2: Deciding on the Method of Data Collection
  • The data gathered must be accurate, as complete
    as possible, and relevant to the problem
  • Data can be obtained in 3 ways:
  • Data that are made available by others (internal,
    external, primary or secondary data)
  • Data resulting from an experiment (experimental
    study)
  • Data collected in an observational study
    (observation, survey, questionnaire, interview)

19
STEP 3: Collecting the data
  • Nonprobability sample
  • Is one in which the judgment of the experimenter,
    the method in which the data are collected or
    other factors could affect the results of the
    sample
  • 3 basic methods: judgment samples, voluntary
    samples and convenience samples
  • Probability sample
  • Is one in which the chance of selection of each
    item in the population is known before the sample
    is picked
  • 4 basic methods: random, systematic, stratified,
    and cluster.

20
Nonprobability data samples
  • Judgment samples
  • Based on the opinion of one or more experts
  • Ex: A political campaign manager intuitively
    picks certain voting districts as reliable places
    to measure the public opinion of his candidate
  • Voluntary samples
  • Questions are posed to the public over radio or
    TV (answered by phone or SMS)
  • Convenience samples
  • Take an easy sample (the most conveniently
    available)
  • Ex: A surveyor stands in one location and asks
    passersby the survey questions

21
Probability data samples
  • Random samples
  • Selected using chance or random methods
  • Example
  • A lecturer wants to study the physical fitness
    levels of students at her university. There are
    5,000 students enrolled at the university, and
    she wants to draw a sample of size 100 to take a
    physical fitness test. She obtains a list of all
    5,000 students, numbers it from 1 to 5,000,
    randomly selects 100 of those numbers, and
    invites the corresponding students to
    participate in the study.
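The random draw in this example could be sketched with Python's standard `random` module. The seed is arbitrary and is there only so the draw can be repeated; the student numbers stand in for the real list.

```python
import random

N = 5000  # students on the numbered list
n = 100   # desired sample size

rng = random.Random(2053)  # arbitrary seed so the draw can be repeated

# Draw 100 distinct student numbers from 1..5000, each equally likely.
invited = sorted(rng.sample(range(1, N + 1), n))

print(len(invited), invited[:5])
```

`sample` draws without replacement, so no student can be invited twice.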

22
Probability data samples
  • Systematic samples
  • Number each subject of the population, then
    select every kth subject.
  • Example
  • A lecturer wants to study the physical fitness
    levels of students at her university. There are
    5,000 students enrolled at the university, and
    she wants to draw a sample of size 100 to take a
    physical fitness test. She obtains a list of all
    5,000 students, numbers it from 1 to 5,000 and
    randomly picks one of the first 50 students
    (5000/100 = 50) on the list. If the number picked
    is 30, then the 30th student on the list is
    invited first. She then invites every 50th
    student on the list after this random start
    (the 80th student, the 130th student, etc.) to
    produce a sample of 100 students to participate
    in the study.
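A minimal sketch of this procedure in Python follows. The function name is illustrative, and the fixed start of 30 simply mirrors the slide's example.

```python
import random

N = 5000
n = 100
k = N // n  # sampling interval: 5000 / 100 = 50

def systematic_sample(start: int, k: int, N: int) -> list[int]:
    """Return list positions start, start + k, start + 2k, ... up to N."""
    return list(range(start, N + 1, k))

# The slide's case: the random start happens to be 30.
sample = systematic_sample(30, k, N)
print(sample[:3], len(sample))  # [30, 80, 130] 100

# In practice the start is drawn at random from the first k positions.
rng = random.Random(0)  # arbitrary seed
random_start = rng.randint(1, k)
print(len(systematic_sample(random_start, k, N)))  # 100
```

Because N is an exact multiple of k here, every possible start from 1 to 50 yields exactly 100 students.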

23
Probability data samples
  • Stratified samples
  • Divide the population into groups according to
    some characteristic that is important to the
    study, then sample from each group
  • Example
  • A lecturer wants to study the physical fitness
    levels of students at her university. There are
    5,000 students enrolled at the university, and
    she wants to draw a sample of size 100 to take a
    physical fitness test. Assume that, because of
    different lifestyles, the level of physical
    fitness differs between male and female
    students. To account for this variation in
    lifestyle, the population of students can easily
    be stratified into male and female students. She
    can then use either random or systematic
    methods to select the participants. For example,
    she can use random sampling to choose 50 male
    students and systematic sampling to choose
    50 female students, or the other way round.
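The stratified design described here might look like the following sketch. The split of 2,600 male and 2,400 female students is a made-up assumption, since the slide gives no counts, and the seed is arbitrary.

```python
import random

rng = random.Random(7)  # arbitrary seed

# Hypothetical roster: student numbers 1..5000 with a made-up sex split.
males = list(range(1, 2601))       # assumed 2,600 male students
females = list(range(2601, 5001))  # assumed 2,400 female students

# Stratum 1: simple random sample of 50 male students.
male_sample = rng.sample(males, 50)

# Stratum 2: systematic sample of 50 female students.
k = len(females) // 50             # interval: 2400 / 50 = 48
start = rng.randint(0, k - 1)      # random start within the first interval
female_sample = females[start::k]

sample = male_sample + female_sample
print(len(male_sample), len(female_sample), len(sample))  # 50 50 100
```

Sampling within each stratum guarantees that both groups are represented, which a single simple random sample of 100 does not.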

24
Probability data samples
  • Cluster samples
  • Divide the population into sections/clusters,
    then randomly select some of those clusters and
    choose all members of the selected clusters
  • Using cluster sampling can reduce cost and
    time.
  • Example
  • A lecturer wants to study the physical fitness
    levels of students at her university. There are
    5,000 students enrolled at the university, and
    she wants to draw a sample to take a physical
    fitness test. Assume that, because of different
    lifestyles, the level of physical fitness
    differs between freshmen, sophomores, juniors
    and seniors. To account for this variation in
    lifestyle, the population of students can easily
    be clustered into freshmen, sophomores, juniors
    and seniors. She can then choose any one
    cluster, such as freshmen, and take all the
    freshmen as participants.
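A sketch of the cluster selection in Python is shown below. The cluster sizes are hypothetical (the slide gives only the total of 5,000) and the seed is arbitrary.

```python
import random

rng = random.Random(11)  # arbitrary seed

# Hypothetical cluster sizes (made-up; they sum to 5,000 students).
cluster_sizes = {"freshmen": 1500, "sophomores": 1300,
                 "juniors": 1200, "seniors": 1000}

# Assign consecutive student numbers to each cluster.
clusters = {}
next_id = 1
for name, size in cluster_sizes.items():
    clusters[name] = list(range(next_id, next_id + size))
    next_id += size

# Randomly select one cluster, then take every member of it.
chosen = rng.choice(list(clusters))
sample = clusters[chosen]
print(chosen, len(sample))
```

Unlike stratified sampling, only the chosen cluster contributes to the sample, so the sample size is whatever the cluster size happens to be.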

25
Identify the type of sample obtained
Example 1: A physical education professor wants to
study the physical fitness levels of students at
her university. There are 20,000 students enrolled
at the university, and she wants to draw a sample
of size 100 to take a physical fitness test. She
obtains a list of all 20,000 students, numbers
it from 1 to 20,000, draws 100 of those numbers at
random, and invites the 100 corresponding students
to participate in the study.
Example 2: A quality engineer wants to inspect
rolls of wallpaper in order to obtain information
on the rate at which flaws in the printing are
occurring. She decides to draw a sample of 50
rolls of wallpaper from a day's production. Each
hour for 5 hours, she takes the 10 most recently
produced rolls and counts the number of flaws on
each. Is this a simple random sample?
26
Example 3: Suppose we have a list of 1000
registered voters in a community and we want to
pick a probability sample of 50. We can use a
random number table to pick one of the first 20
voters (1000/50 = 20) on our list. If the table
gives us the number 16, the 16th voter on the
list is the first to be selected. We then pick
every 20th name after this random start
(the 36th voter, the 56th voter, etc.) to produce
a sample.
Example 4: Consumer surveys of large
cities often employ cluster sampling. The usual
procedure is to divide a map of the city into
small blocks, each block containing a cluster of
households. A number of clusters are selected for
the sample, and all the households in a cluster
are surveyed. Using cluster sampling can reduce
cost and time. Less energy and money are expended
if an interviewer stays within a specific area
rather than traveling across stretches of the
city.
27
Example 5: Suppose our population is a university
student body. We want to estimate the average
annual expenditure of a college student on
non-school items. Assume we know that, because of
different lifestyles, juniors and seniors spend
more than freshmen and sophomores, but there are
fewer students in the upper classes than in the
lower classes because of some dropout factor. To
account for this variation in lifestyle and group
size, the population of students can easily be
stratified into freshmen, sophomores, juniors and
seniors. A sample can be taken from each stratum
and each result weighted to provide an overall
estimate of average non-school expenditure.
Example 6: A researcher wants to survey students
in 100 homerooms in secondary schools in a large
school district. The researcher could first
randomly select 10 schools from all the secondary
schools in the district. Then, from a list of
homerooms in the 10 schools, the researcher could
randomly select 100.
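The two-stage selection in Example 6 could be sketched as follows. The district layout (40 schools with 25 homerooms each) is entirely hypothetical, as are the naming scheme and the seed.

```python
import random

rng = random.Random(3)  # arbitrary seed

# Hypothetical district: 40 secondary schools with 25 homerooms each.
schools = {
    f"school_{i:02d}": [f"school_{i:02d}/room_{j:02d}" for j in range(1, 26)]
    for i in range(1, 41)
}

# Stage 1: randomly select 10 schools.
chosen_schools = rng.sample(sorted(schools), 10)

# Stage 2: pool the homerooms of those schools and randomly select 100.
homeroom_list = [room for s in chosen_schools for room in schools[s]]
chosen_homerooms = rng.sample(homeroom_list, 100)

print(len(chosen_schools), len(homeroom_list), len(chosen_homerooms))  # 10 250 100
```

Only homerooms from the 10 selected schools can appear in the final sample, which keeps the fieldwork confined to those schools.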
28
STEP 4: Classifying and Summarizing the data
  • Organize or group the facts/raw sample data for
    study and investigation
  • Classifying: identifying items with like
    characteristics and arranging them into groups
    or classes.
  • Ex: Production data (product make, location,
    production process, etc.)
  • Data can be classified as Qualitative
    (categorical/attribute) data and Quantitative
    (numerical) data.
  • Summarization
  • Graphical & descriptive statistics (tables,
    charts, measures of central tendency, measures
    of variation, measures of position)

29
Data Classification
  • Data are the values that variables can assume
  • A variable is a characteristic or attribute that
    can assume different values.
  • Variables whose values are determined by chance
    are called random variables

Variables can be classified:
  • By how they are categorized, counted or measured
    (level of measurement of data)
  • As quantitative or qualitative
30
Types of Data
Qualitative (categorical/attribute) data
  • Data that refer only to name classification
    (may be coded using numbers)
  • Can be placed into distinct categories according
    to some characteristic or attribute.
  • Nominal data (cannot be ranked): gender, race,
    citizenship, etc. Use code numbers (1, 2, ...).
  • Ordinal data (can be ranked): feeling (dislike
    to like), color (dark to bright), etc.
Quantitative (numerical) data
  • Data that represent counts or measurements
    (can be counted or measured)
  • Are numerical in nature and can be ordered or
    ranked.
  • Discrete variables: assume values that can be
    counted. Ex: number of something.
  • Continuous variables: can assume all values
    between any two specific values and are obtained
    by measuring. They have boundaries and must be
    rounded because of the limits of the measuring
    device. Ex: weight, age, salary, height,
    temperature, etc.
31
  • Example
  • The Lemon Marketing Corporation has asked you
    for information about the car you drive. For each
    question, identify the type of data requested as
    either attribute data or numeric data. When
    numeric data is requested, identify the variable
    as discrete or continuous.
  • What is the weight of your car?
  • In what city was your car made?
  • How many people can be seated in your car?
  • What's the distance traveled from your home to
    your school?
  • What's the color of your car?
  • How many cars are in your household?
  • What's the length of your car?
  • What's the normal operating temperature (in
    degrees Fahrenheit) of your car's engine?
  • What gas mileage (miles per gallon) do you get in
    city driving?
  • Who made your car?
  • How many cylinders are there in your car's
    engine?
  • How many miles have you put on your car's current
    set of tyres?

32
Level of Measurements of Data
Examples
33
STEP 5: Presenting and Analyzing the data
  • Summarize and analyze the information given by
    the graphical & descriptive statistics
  • Identify the relationships in the information
  • Make any relevant statistical inferences
    (hypothesis testing, confidence interval, ANOVA,
    control charts, etc.)

34
STEP 6: Making the decision
  • The researcher can list all the options and
    decisions that can achieve the objective and
    goal of the research, weigh the options, and
    choose the one that represents the best
    solution to the problem.
  • The correctness of this choice depends on the
    researcher's analytical skill and the quality
    of the information.

35
[Flowchart: Statistical Problem Solving Methodology]
36
Role of the Computer in Statistics
  • Two software tools commonly used for data
    analysis:
  • Spreadsheets
  • Microsoft Excel & Lotus 1-2-3
  • Statistical Packages
  • MINITAB, SAS, SPSS and S-PLUS

37
1.3 REVIEW OF DESCRIPTIVE STATISTICS
38
Summary Statistics (Data Description)
  • Statistical methods can be used to summarize
    data.
  • Measures of average are also called measures of
    central tendency and include the mean, median,
    mode, and midrange.
  • Measures that determine the spread of data values
    are called measures of variation or measures of
    dispersion and include the range, variance, and
    standard deviation.
  • Measures of position tell where a specific data
    value falls within the data set or its relative
    position in comparison with other data values.
    The most common measures of position are
    percentiles, deciles, and quartiles.
  • The measures of central tendency, variation, and
    position are part of what is called traditional
    statistics. This type of analysis is typically
    used to confirm conjectures about the data.

39
  • 1.3.1 Measures of Central Tendency

Mean: the sum of the values divided by the total
number of values.
Population Mean: $\mu = \frac{\sum X}{N}$
Sample Mean: $\bar{X} = \frac{\sum X}{n}$
Example: 9 2 1 4 3 3 7 5 8 6
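As a quick check of the definition, the mean of the ten example values can be computed in a few lines of Python (a minimal sketch that uses only built-in functions):

```python
data = [9, 2, 1, 4, 3, 3, 7, 5, 8, 6]

total = sum(data)   # 48
n = len(data)       # 10
mean = total / n
print(mean)         # 4.8
```

Note that 4.8 is not one of the data values, which illustrates that the mean need not appear in the data set.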
40
Properties of Mean
  • The mean is computed by using all the values of
    the data.
  • The mean varies less than the median or mode when
    samples are taken from the same population and
    all three measures are computed for these
    samples.
  • The mean is used in computing other statistics,
    such as variance.
  • The mean for the data set is unique, and not
    necessarily one of the data values.
  • The mean cannot be computed for an open-ended
    frequency distribution.
  • The mean is affected by extremely high or low
    values and may not be the appropriate average to
    use in these situations

41
  • 1.3.1 Measures of Central Tendency

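Median: the middle number of n ordered data
(smallest to largest), where $X_{(k)}$ denotes the
kth value in the ordered data
If n is odd: $\text{Median} = X_{\left(\frac{n+1}{2}\right)}$
If n is even: $\text{Median} = \frac{X_{(n/2)} + X_{(n/2+1)}}{2}$
Example (n even): 9 2 1 4 3 3 7 5 8 6
Example (n odd): 9 2 1 3 3 7 5 8 6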
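The odd and even cases can be sketched as one small Python function and applied to the two example data sets (the function is an illustration, not part of the original slides):

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([9, 2, 1, 4, 3, 3, 7, 5, 8, 6]))  # 4.5
print(median([9, 2, 1, 3, 3, 7, 5, 8, 6]))     # 5
```

Sorting first is essential: the median is defined on the ordered data, not on the order in which the values were recorded.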
42
Properties of Median
  • The median is used when one must find the center
    or middle value of a data set.
  • The median is used when one must determine
    whether the data values fall into the upper half
    or lower half of the distribution.
  • The median is used to find the average of an
    open-ended distribution.
  • The median is affected less than the mean by
    extremely high or extremely low values.

43
  • 1.3.1 Measures of Central Tendency

Mode: the most commonly occurring value in a data
series
  • The mode is used when the most typical case is
    desired.
  • The mode is the easiest average to compute.
  • The mode can be used when the data are nominal,
    such as religious preference, gender, or
    political affiliation.
  • The mode is not always unique. A data set can
    have more than one mode, or the mode may not
    exist for a data set.

Example 9 2 1 4 3 3 7 5 8 6
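Because a data set can have one mode, several modes, or none, a sketch that returns a list covers all three cases. The function name and the convention of returning an empty list when no value repeats are choices made for this illustration.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often; empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([9, 2, 1, 4, 3, 3, 7, 5, 8, 6]))  # [3]
print(modes([1, 1, 2, 2, 3]))                 # [1, 2]
print(modes([1, 2, 3]))                       # []
```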
44
  • 1.3.1 Measures of Central Tendency

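Midrange: a rough estimate of the middle, found by
adding the lowest and highest values and dividing
by 2. It is also a very rough estimate of the
average and can be affected by one extremely high
or low value.
$MR = \frac{\text{lowest value} + \text{highest value}}{2}$
Example: 9 2 1 4 3 3 7 5 8 6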
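A short Python sketch computes the midrange of the example data and shows its sensitivity to a single extreme value. The outlier of 99 is invented for illustration.

```python
data = [9, 2, 1, 4, 3, 3, 7, 5, 8, 6]
midrange = (min(data) + max(data)) / 2
print(midrange)  # 5.0

# One extreme value shifts the midrange sharply.
with_outlier = data + [99]
midrange_outlier = (min(with_outlier) + max(with_outlier)) / 2
print(midrange_outlier)  # 50.0
```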
45
Types of Distribution
Symmetric
Positively skewed or right-skewed
Negatively skewed or left-skewed
46
  • 1.3.2 Measures of Variation / Dispersion
  • Used when a measure of central tendency alone
    is not informative or not needed (ex: the means
    are the same for two sets of data)
  • Measure the variability that exists in a
    data set
  • Used to form a judgment about how well the
    average value illustrates/depicts the data
  • Used to learn the extent of the scatter so that
    steps may be taken to control the existing
    variation

47
  • 1.3.2 Measures of Variation / Dispersion

Range: the difference between the highest
value and the lowest value in a data set. The
symbol R is used for the range.
R = highest value - lowest value
Example: 9 2 1 4 3 3 7 5 8 6
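For the example data the range works out as follows (a minimal Python sketch):

```python
data = [9, 2, 1, 4, 3, 3, 7, 5, 8, 6]
R = max(data) - min(data)  # 9 - 1
print(R)  # 8
```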
48
  • 1.3.2 Measures of Variation / Dispersion

Variance: the average of the squares of the
distance each value is from the mean.
Standard Deviation: the square root of the
variance.
Population Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Sample Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
Example: 9 2 1 4 3 3 7 5 8 6
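The sketch below works through both versions of the variance for the example data, treating the ten values first as a whole population (divide by N) and then as a sample (divide by n - 1). The variable names are illustrative.

```python
import math

data = [9, 2, 1, 4, 3, 3, 7, 5, 8, 6]
n = len(data)
mean = sum(data) / n  # 4.8

# Sum of squared distances from the mean (about 63.6).
ss = sum((x - mean) ** 2 for x in data)

population_variance = ss / n     # divide by N
sample_variance = ss / (n - 1)   # divide by n - 1

population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(round(population_variance, 2), round(sample_variance, 2))  # 6.36 7.07
print(round(population_sd, 2), round(sample_sd, 2))              # 2.52 2.66
```

The sample versions are slightly larger because dividing by n - 1 compensates for measuring distances from the sample mean rather than the unknown population mean.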
49
Properties of Variance & Standard Deviation
  • Variances and standard deviations can be used to
    determine the spread of the data. If the variance
    or standard deviation is large, the data are more
    dispersed. The information is useful in comparing
    two or more data sets to determine which is more
    variable.
  • The measures of variance and standard deviation
    are used to determine the consistency of a
    variable.
  • The variance and standard deviation are used to
    determine the number of data values that fall
    within a specified interval in a distribution.
  • The variance and standard deviation are used
    quite often in inferential statistics.
  • The standard deviation is used to estimate the
    amount of spread in the population from which the
    sample was drawn.

50
Chebychev Theorem
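Chebyshev's theorem states that, for any data set and any k > 1, at least 1 - 1/k^2 of the values lie within k standard deviations of the mean. The sketch below checks that bound on the chapter's example data. The helper name is illustrative, and the standard deviation uses the divisor N.

```python
from statistics import mean, pstdev

def proportion_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    m = mean(values)
    s = pstdev(values)  # standard deviation with divisor N
    inside = [x for x in values if abs(x - m) <= k * s]
    return len(inside) / len(values)

data = [9, 2, 1, 4, 3, 3, 7, 5, 8, 6]
for k in (1.5, 2, 3):
    bound = 1 - 1 / k ** 2
    print(k, proportion_within(data, k), ">=", round(bound, 3))
```

The theorem gives only a lower bound: for k = 2 it guarantees at least 75% of the values, while all ten example values actually fall inside the interval.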
51
TIPS: Calculate mean and variance using a
scientific calculator
  • Casio fx-570MS
  • Insert data
  • MODE SD data M
  • Shift 1
  • Shift 2
  • Clear data
  • Shift CLR 1
  • Casio fx-570W
  • Insert data
  • MODE SD data M
  • Shift 1
  • Shift 2
  • Shift 3
  • Shift 4
  • Clear data
  • Shift AC/ON

52
Conclusion
  • The applications of statistics are many and
    varied. People encounter them in everyday life,
    such as in reading newspapers or magazines,
    listening to the radio, or watching television.
  • By combining all of the descriptive statistics
    techniques discussed in this chapter together,
    the student is now able to collect, organize,
    summarize and present data.

53
Thank You
  • See You in CHAPTER 2
  • SAMPLING DISTRIBUTION AND CONFIDENCE INTERVAL