Statistical Techniques for Analyzing Quantitative Data - PowerPoint PPT Presentation

Slides: 42
Provided by: maryamr
1
Statistical Techniques for Analyzing Quantitative
Data
  • Maryam Ramezani
  • Values in Computer Technology
  • CSC 426

2
Outline
3
Role of Statistics in Research
  • With statistics, we can summarize large bodies
    of data, make predictions about future trends,
    and determine when different experimental
    treatments have led to significantly different
    outcomes.
  • Statistics are among the most powerful tools in
    the researcher's toolbox.

4
How Do Statistics Come into Research?
  • In quantitative research we use numbers to
    represent physical or nonphysical phenomena
  • We use statistics to summarize and interpret
    numbers

5
Exploring and Organizing a Data Set
  • Look at your data and find the ways of organizing
    them
  • Example: test scores for 11 children
  • What do you see?

Ruth 96, Robert 60, Chuck 68, Margaret 88, Tom 56,
Mary 92, Ralph 64, Bill 72, Alice 80, Adam 76,
Kathy 84
6
Exploring and Organizing a Data Set
Alphabetical Order
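A minimal sketch of the reorganization described on slides 5 and 6, using the eleven scores from slide 5. The alphabetical ordering follows the slide title; sorting by score is an illustrative second arrangement, not something the slide itself shows.

```python
# Test scores for the 11 children listed on slide 5.
scores = {
    "Ruth": 96, "Robert": 60, "Chuck": 68, "Margaret": 88, "Tom": 56,
    "Mary": 92, "Ralph": 64, "Bill": 72, "Alice": 80, "Adam": 76, "Kathy": 84,
}

# Alphabetical order, as named on slide 6.
alphabetical = sorted(scores.items())

# Illustrative alternative: highest score first.
by_score = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in by_score:
    print(f"{name:<10}{score}")
```

Sorted by score, the values turn out to run from 56 to 96 in steps of 4, a pattern that is hard to see in the original listing.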
7
Using Computer Spreadsheets to Organize and
Analyze Data
  • Sorting
  • Graphing
  • Formulas
  • What-ifs
  • Save, store, recall, and update information

8
Functions of Statistics
  • Descriptive statistics
  • describe what the data look like
  • Inferential statistics
  • draw inferences about a large population from
    small samples

9
Considering the Nature of the Data
  • Continuous or discrete
  • Nominal, ordinal, interval or ratio scale
  • Normal or non-normal distribution

10
Continuous versus Discrete Variables
  • Continuous data take on any value within a
    finite or infinite interval. You can count, order
    and measure continuous data.
  • Example: height, weight, temperature, the amount
    of sugar in an orange, the time required to run a
    mile.
  • Discrete data: the values / observations are
    distinct and separate, i.e. they can be counted
    (1, 2, 3, ...).
  • Example: the number of kittens in a litter; the
    number of patients in a doctor's surgery; the
    number of flaws in one metre of cloth; gender
    (male, female); blood group (O, A, B, AB).

11
Nominal Data
  • The numbers are simply labels. You can count but
    not order or measure nominal data.
  • Example: males could be coded as 0, females as 1;
    marital status of an individual could be coded as
    Y if married, N if single.
  • Classification data, e.g. m/f
  • No ordering, e.g. it makes no sense to state that
    M > F
  • Arbitrary labels, e.g. m/f, 0/1, etc.

12
Ordinal Data
  • Ordered, but differences between values are not
    necessarily equal or meaningful
  • e.g. Likert scales: rank on a scale of 1..5 your
    degree of satisfaction
  • The difference in enjoyment expressed by giving a
    rating of 2 rather than 1 might be much less than
    the difference in enjoyment expressed by giving a
    rating of 4 rather than 3.
  • You can count and order, but not measure, ordinal
    data.

13
Interval Data
  • Ordered, constant scale, but no natural zero
  • Differences make sense, but ratios do not
  • e.g. temperature: 30° - 20° = 20° - 10°, but 20°
    is not twice as hot as 10°!
  • e.g. dates: the time interval between the starts
    of years 1981 and 1982 is the same as that
    between 1983 and 1984, namely 365 days. The zero
    point, year 1 AD, is arbitrary; time did not
    begin then.
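The temperature point can be checked with a little arithmetic. This sketch assumes the slide's temperatures are in degrees Celsius (the slide gives no unit) and converts them to kelvin, a ratio scale with a true zero, to show what the ratio of the two temperatures really is.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to kelvin (ratio scale with a true zero)."""
    return celsius + 273.15

# Differences are meaningful on an interval scale.
same_difference = (30 - 20) == (20 - 10)

# Ratios are not: 20/10 = 2 on the Celsius scale, but the physical ratio is far smaller.
naive_ratio = 20 / 10
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(f"Celsius ratio: {naive_ratio:.2f}, kelvin ratio: {kelvin_ratio:.3f}")
```

On the kelvin scale, 20 °C is only about 3.5% hotter than 10 °C, not twice as hot.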

14
Ratio Data
  • Like interval data but has true zero
  • Ordered, Constant scale, natural zero
  • e.g., height, weight, age, length

15
Normal and Non-Normal Distributions
16
Normal Distribution
17
Non-Normal Distributions
Skewed to the Left (Negatively Skewed)
Skewed to the Right (Positively Skewed)
18
Leptokurtic and Platykurtic Distributions
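The figures for slides 17 and 18 did not survive in this transcript. As a rough numerical stand-in, the sketch below computes moment-based skewness and excess kurtosis; the choice of these particular formulas and all four small data sets are illustrative assumptions, not taken from the slides.

```python
from statistics import fmean

def central_moment(values, order):
    """Average of (x - mean) ** order over the data."""
    mean = fmean(values)
    return fmean([(x - mean) ** order for x in values])

def skewness(values):
    """Negative: tail to the left. Positive: tail to the right."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Positive: leptokurtic (peaked, heavy tails). Negative: platykurtic (flat)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

right_skewed = [1, 1, 1, 2, 2, 3, 10]          # hypothetical data, long right tail
left_skewed = [-x for x in right_skewed]       # mirror image, long left tail
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # hypothetical, peaked with outliers
flat = list(range(1, 11))                      # hypothetical, evenly spread

print(skewness(right_skewed), skewness(left_skewed))
print(excess_kurtosis(heavy_tailed), excess_kurtosis(flat))
```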
19
Descriptive Statistics
  • Descriptive Statistics describes data
  • Points of Central Tendency
  • Amount of Variability
  • Relation of different variables to each other

20
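Points of Central Tendency: Mean
  • Measuring center: if the n observations are
    $x_1, x_2, \ldots, x_n$, the arithmetic mean is
    $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Geometric Mean
$\bar{x}_g = \left( x_1 x_2 \cdots x_n \right)^{1/n}$
e.g. biological growth, population growth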
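To see why the geometric mean suits growth data, here is a small sketch using Python's standard `statistics` module. The three yearly growth factors are hypothetical numbers chosen for illustration.

```python
from statistics import fmean, geometric_mean

# Hypothetical yearly growth factors: +10%, +20%, -10%.
growth_factors = [1.10, 1.20, 0.90]

arithmetic = fmean(growth_factors)
geometric = geometric_mean(growth_factors)

# Overall growth after three years is the product of the factors.
overall = 1.0
for factor in growth_factors:
    overall *= factor

print(f"arithmetic mean: {arithmetic:.4f}")
print(f"geometric mean:  {geometric:.4f}")
print(f"overall growth:  {overall:.4f}, geometric mean cubed: {geometric ** 3:.4f}")
```

Applying the geometric mean three times reproduces the overall growth, whereas the arithmetic mean overstates it.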
21
Measure of Central Tendency
22
Measures of Variability
How great is the spread?
  • Range = highest score - lowest score
  • The quartiles: the pth percentile of a
    distribution is the value such that p percent of
    the observations fall at or below it.
  • The 50th percentile = median, M
  • The 25th percentile = first quartile, Q1
  • The 75th percentile = third quartile, Q3
  • Interquartile range = Quartile 3 - Quartile 1
  • Example
  • 13 13 16 19 21 21 23 23 24 26 26 27 27
    27 28 28 30 30
  • M?, Q1?, Q3?
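One way to work the example is sketched below. It assumes the "median of each half" convention for quartiles; other conventions exist, and statistical software may report slightly different Q1 and Q3 values for the same data.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, the middle observation belongs to neither half.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

data = [13, 13, 16, 19, 21, 21, 23, 23, 24, 26, 26, 27, 27, 27, 28, 28, 30, 30]
q1, m, q3 = quartiles(data)
data_range = max(data) - min(data)
iqr = q3 - q1

print(f"M = {m}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, range = {data_range}")
```

With 18 observations, the median is the average of the 9th and 10th values (24 and 26), giving M = 25; the lower and upper halves have medians Q1 = 21 and Q3 = 27, so the interquartile range is 6 and the range is 17.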

23
Measures of Variability
Standard Deviation
Standardized score
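The formulas on this slide were lost in the transcript. The sketch below assumes the usual sample standard deviation (n - 1 in the denominator) and the usual standardized score z = (x - mean) / s, applied to the eleven test scores from slide 5.

```python
from statistics import fmean, stdev

# Test scores from slide 5.
scores = [96, 60, 68, 88, 56, 92, 64, 72, 80, 76, 84]

mean = fmean(scores)
sd = stdev(scores)  # sample standard deviation, divides by n - 1 (an assumption)

# Standardized score: how many standard deviations each score lies from the mean.
z_scores = [(x - mean) / sd for x in scores]

print(f"mean = {mean:.1f}, standard deviation = {sd:.3f}")
print(f"z-score of 96: {z_scores[0]:.3f}")
```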
24
Measure of Relationship Correlation
  • Correlation indicates the strength and direction
    of a linear relationship between two variables.
  • See page 266 for other examples of correlation
    statistics
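The slide does not name a specific statistic. A common choice for a linear relationship is the Pearson correlation coefficient, sketched here under that assumption with small hypothetical data.

```python
from math import sqrt
from statistics import fmean

def pearson_r(xs, ys):
    """Pearson correlation: +1 perfect positive, -1 perfect negative, 0 no linear relation."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r = pearson_r(xs, ys)
print(f"r = {r:.2f}")
```

The sign of r gives the direction of the relationship and its magnitude gives the strength.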

25
Notes about Correlation
  • A substantial correlation between two
    characteristics requires reasonably valid and
    reliable measurement of both
  • Correlation does not indicate causation

26
Examples of using Statistics in Computer Science
  • Conceptual Representation of User Transactions or
    Sessions

[Figure labels: Pageview/objects; Session/user data]
27
Inferential Statistics
  • We use sample statistics as estimates of
    population parameters.
  • The quality of all statistical analysis depends
    on the quality of the sample data

Random sampling: every unit in the population
has an equal chance of being chosen. A random sample
should represent the population well, so sample
statistics from a random sample should provide
reasonable estimates of population parameters.
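A minimal sketch of simple random sampling with Python's standard `random` module. The population here is simulated purely for illustration, and the seed is fixed only so that the run is repeatable.

```python
import random
from statistics import fmean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 measurements (simulated, not real data).
population = [rng.uniform(0, 100) for _ in range(10_000)]

# Simple random sample without replacement: every unit is equally likely to be chosen.
sample = rng.sample(population, 200)

print(f"population mean: {fmean(population):.2f}")
print(f"sample mean:     {fmean(sample):.2f}")
```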
28
Some definitions
  • Parameter describes a population
  • Statistic describes a sample

A parameter is a characteristic or quality of a
population that is constant in concept; however,
its value varies from case to case. Example: radius
is a parameter of a circle.
29
Inferential Statistics
  • Estimate a population parameter from a random
    sample
  • Test statistically based hypotheses

30
Inferential Statistics: Estimating a Population
Parameter from a Sample
  • All sample statistics have some error in
    estimating population parameters
  • Example: estimate the mean height of 10-year-old
    boys in Chicago; sample = 200 boys
  • How close is the sample mean to the population
    mean?
  • We don't know, but we do know:
  • The means from an infinite number of samples form
    a normal distribution.
  • The population mean equals the average (mean) of
    all sample means.
  • The standard deviation of the sampling
    distribution (the standard error) is directly
    related to the standard deviation of the
    characteristic in question for the overall
    population.
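The three facts above can be illustrated by simulation. This is a sketch under stated assumptions: the population of heights is simulated (mean 140 cm and standard deviation 7 cm are made-up values), and 1,000 samples stand in for the "infinite number" on the slide.

```python
import random
from math import sqrt
from statistics import fmean, pstdev, stdev

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of heights in cm; simulated, not real data.
population = [rng.gauss(140.0, 7.0) for _ in range(20_000)]
population_mean = fmean(population)
population_sd = pstdev(population)

sample_size = 200  # matches the slide's sample of 200 boys
sample_means = [
    fmean(rng.sample(population, sample_size)) for _ in range(1_000)
]

mean_of_means = fmean(sample_means)
observed_se = stdev(sample_means)                 # spread of the sample means
predicted_se = population_sd / sqrt(sample_size)  # sigma / sqrt(n)

print(f"population mean {population_mean:.2f}, mean of sample means {mean_of_means:.2f}")
print(f"observed SE {observed_se:.3f}, predicted SE {predicted_se:.3f}")
```

The mean of the sample means lands close to the population mean, and their spread is close to the population standard deviation divided by the square root of the sample size.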

31
Standard Error
  • Standard error tells us how much a sample mean
    varies from one sample to another when all
    samples are the same size and drawn randomly from
    the same population.
  • Standard error: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
  • n is the size of each sample and $\sigma$ is the
    population standard deviation, which we don't have!
  • We use the standard deviation of the sample, s,
    instead
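A minimal sketch of the estimated standard error, substituting the sample standard deviation for the unknown population value. For illustration only, it treats the eleven test scores from slide 5 as if they were a random sample.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

# Test scores from slide 5, treated as a random sample for illustration.
scores = [96, 60, 68, 88, 56, 92, 64, 72, 80, 76, 84]

se = standard_error(scores)
print(f"estimated standard error: {se:.2f}")
```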

32
Accuracy of the Estimator
As in many problems, there is a trade-off between
accuracy and dollars.
What will we get for our money if we invest
dollars in obtaining a larger sample size?
n = 100? n = 200?
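The trade-off can be made concrete with the standard-error formula s / sqrt(n). In this sketch the standard deviation of 1.0 is a hypothetical value, and n = 400 is added to the slide's two sample sizes to show the trend.

```python
from math import sqrt

def se_for_size(sd: float, n: int) -> float:
    """Standard error of the mean for a given standard deviation and sample size."""
    return sd / sqrt(n)

sd = 1.0  # hypothetical standard deviation

for n in (100, 200, 400):
    print(f"n = {n:>3}: standard error = {se_for_size(sd, n):.4f}")

# Doubling the sample size shrinks the standard error by a factor of 1/sqrt(2).
shrink_factor = se_for_size(sd, 200) / se_for_size(sd, 100)
```

Doubling the sample from 100 to 200 cuts the standard error by only about 29%; halving it requires four times as many observations.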
33
Point versus Interval Estimate
  • A point estimate is a single value--a
    point--taken from a sample and used to estimate
    the corresponding parameter of a population
  • $\bar{x}$, $s$, $s^2$ and $r$ estimate $\mu$,
    $\sigma$, $\sigma^2$ and $\rho$, respectively
  • An interval estimate is a range of values--an
    interval within whose limits a population
    parameter probably lies.
  • We say that we are 95% confident that the unknown
    population mean lies in the interval

95% confidence interval for $\mu$:
$\left( \bar{x} - 2\sigma/\sqrt{n},\ \bar{x} + 2\sigma/\sqrt{n} \right)$
  • In only 5% of all samples
  • does the above interval fail to contain the
    population mean $\mu$;
  • that is, 5% of all samples give inaccurate results.
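A sketch of the interval on this slide, using the slide's multiplier of 2 and the sample standard deviation in place of the unknown population value. The data are the eleven scores from slide 5, treated as a random sample for illustration; for a sample this small, a t-based multiplier would give a somewhat wider interval.

```python
from math import sqrt
from statistics import fmean, stdev

def confidence_interval_95(sample):
    """Approximate 95% interval: mean +/- 2 standard errors (the slide's rule)."""
    mean = fmean(sample)
    se = stdev(sample) / sqrt(len(sample))
    margin = 2 * se
    return mean - margin, mean + margin

# Test scores from slide 5, treated as a random sample for illustration.
scores = [96, 60, 68, 88, 56, 92, 64, 72, 80, 76, 84]

low, high = confidence_interval_95(scores)
print(f"point estimate: {fmean(scores):.1f}")
print(f"approximate 95% interval: ({low:.1f}, {high:.1f})")
```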

34
Testing Hypothesis
  • Confidence intervals are used when the goal of
    our analysis is to estimate an unknown parameter
    in the population.
  • A second goal of a statistical analysis is to
    verify some claim about the population on the
    basis of the data.
  • Research hypothesis / statistical hypothesis
  • A test of significance is a procedure to assess
    the truth about a hypothesis using the observed
    data. The results of the test are expressed in
    terms of a probability that measures how well the
    data support the hypothesis.

35
Example: To determine whether the mean nicotine
content of a brand of cigarettes is greater than
the advertised value of 1.4 milligrams, a health
advocacy group takes a sample of 500 cigarettes
and measures the amount of nicotine in the
sample.
Sample values: the sample average of nicotine is
1.51 mg; the standard deviation is 1.016.
The estimated amount of nicotine is 1.51 mg,
based on the sample values. The standard error
of the sample average is
S.E. = s.d./sqrt(n-1) = 0.045.
Is there an actual difference between the
sample value (1.51 mg) and the advertised value
(1.4 mg)? Or is it just due to sampling
error? To answer this question we need a test of
significance.
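The slide's standard error can be reproduced from the stated sample values. The sketch uses the slide's sqrt(n - 1) denominator and, for comparison, the more common sqrt(n); expressing the gap between sample and advertised values in standard-error units is an added illustration.

```python
from math import sqrt

n = 500              # cigarettes sampled
sample_mean = 1.51   # mg
sample_sd = 1.016
advertised = 1.4     # mg

se_slide = sample_sd / sqrt(n - 1)  # the slide's formula
se_usual = sample_sd / sqrt(n)      # the more common denominator

# How many standard errors separate the sample mean from the advertised value?
gap_in_se = (sample_mean - advertised) / se_slide

print(f"S.E. with n - 1: {se_slide:.4f}")
print(f"S.E. with n:     {se_usual:.4f}")
print(f"gap: {gap_in_se:.2f} standard errors")
```

With a sample this large, both denominators round to the same 0.045.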
36
Stating Hypotheses
The null hypothesis H0 expresses the idea that
the observed difference is due to chance. It is a
statement of no effect or no difference,
and is expressed in terms of the population
parameter.
Let $\mu$ denote the true average amount of
nicotine. H0: $\mu$ = 1.4 mg
The alternative hypothesis Ha represents the idea
that the difference is real. It is expressed as
the statement we hope or suspect is true instead
of the null hypothesis.
The alternative hypothesis states that the
cigarettes contain a higher amount of nicotine,
that is, Ha: $\mu$ > 1.4 mg
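The slides stop at stating the hypotheses. One common way to carry the nicotine example through is a one-sided z test, sketched here as an assumption rather than as the presenter's method; it uses `statistics.NormalDist` from the Python standard library and the sample values from slide 35.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 1.4            # value under H0 (advertised nicotine content, mg)
sample_mean = 1.51   # mg, from slide 35
sample_sd = 1.016
n = 500

se = sample_sd / sqrt(n - 1)   # standard error as computed on slide 35
z = (sample_mean - mu0) / se

# One-sided p-value for Ha: mu > 1.4, assuming the sample mean is
# approximately normal (reasonable for n = 500).
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

A p-value this small (under 0.01) means a difference as large as the one observed would rarely arise from sampling error alone if H0 were true.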
37
General comments on stating hypotheses
  • It is not easy to state the null and the
    alternative hypothesis!
  • The hypotheses are statements on the population
    values.
  • The alternative hypothesis Ha is often called
    the researcher's hypothesis, because it is the
    hypothesis we are interested in.
  • A significance test is a test against the null
    hypothesis
  • Often we set Ha first and then H0 is defined as
    the opposite statement!

38
Errors in Hypothesis testing
  • Type I error: the null hypothesis is rejected
    when it is in fact true; that is, H0 is wrongly
    rejected.
  • Type II error: the null hypothesis H0 is not
    rejected when it is in fact false
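Both error types can be estimated by simulation. This is a sketch under stated assumptions: a one-sided z test with a known population standard deviation, a 5% significance level, and hypothetical values for the standard deviation, sample size, and true mean under the alternative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

MU0 = 1.4     # mean under H0 (borrowed from the nicotine example)
SIGMA = 1.0   # hypothetical known population standard deviation
N = 50        # hypothetical sample size
ALPHA = 0.05  # significance level
CRITICAL_Z = NormalDist().inv_cdf(1 - ALPHA)

def rejection_rate(true_mean, trials, rng):
    """Fraction of simulated samples in which H0: mu = MU0 is rejected."""
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, SIGMA) for _ in range(N)]
        z = (fmean(sample) - MU0) / (SIGMA / sqrt(N))
        if z > CRITICAL_Z:
            rejections += 1
    return rejections / trials

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# H0 is true: every rejection is a Type I error.
type_1_rate = rejection_rate(MU0, 2_000, rng)

# H0 is false (true mean is 0.3 higher): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(MU0 + 0.3, 2_000, rng)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

The Type I rate should come out near the chosen significance level, while the Type II rate depends on how far the true mean lies from the null value and on the sample size.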

39
Meta-Analysis
  • "Meta-analysis refers to the analysis of
    analyses...the statistical analysis of a large
    collection of analysis results from individual
    studies for the purpose of integrating the
    findings." (Glass, 1976, p. 3)
  • Conduct a fairly extensive search for relevant
    studies
  • Identify appropriate studies to include in
    meta-analysis
  • Convert each study's results to a common
    statistical index
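The slide does not say which common index is meant. One widely used choice is a standardized mean difference such as Cohen's d, sketched here as an assumption; the study summaries in the example are hypothetical.

```python
from math import sqrt

def cohens_d(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Standardized mean difference between two groups, using the pooled SD."""
    pooled_variance = (
        (n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2
    ) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / sqrt(pooled_variance)

# Hypothetical study: treatment mean 10, control mean 8, both SDs 2, 30 per group.
d = cohens_d(10, 8, 2, 2, 30, 30)
print(f"Cohen's d = {d:.2f}")
```

Because d is unit-free, results from studies that used different measurement scales can be placed on the same footing before they are combined.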

40
Using Statistical Software Packages
  • SPSS
  • SAS
  • MATLAB Statistics Toolbox
  • SYSTAT, Minitab, StatView, Statistica

41
Interpreting the Data
  • Relating the findings to the original research
    problem and to the specific research questions
    and hypotheses
  • Relating the findings to preexisting literature,
    concepts, theories and research results.
  • Determining whether the findings have practical
    significance as well as statistical significance
  • Identifying limitations of the study