Basic Statistics with Microsoft Excel - PowerPoint PPT Presentation

1 / 23
About This Presentation
Title:

Basic Statistics with Microsoft Excel

Description:

To show you how to use some of the statistical worksheet ... Kurtosis. Skewness. Confidence Level for Mean. Range. Minimum. Maximum. Sum. Count. kth Largest ... – PowerPoint PPT presentation

Number of Views:618
Avg rating:3.0/5.0
Slides: 24
Provided by: traini6
Category:

less

Transcript and Presenter's Notes

Title: Basic Statistics with Microsoft Excel


1
Basic Statistics with Microsoft Excel
  • Helen Dixon

2
Aim and Objectives
  • Aim of todays course
  • To illustrate how Excel can be used to carry out
    some basic statistical analyses and tests
  • Objectives
  • To show you how to use some of the statistical
    worksheet functions available within Excel
  • To show you how to use some of the tools
    available in the Analysis ToolPak
  • To make you aware of the limitations of Excel

3
Why use Excel?
  • Software more accessible
  • Previous familiarity with software
  • Easy to format output
  • Better charting facilities than some statistical
    applications
  • Access to other key Excel facilities
  • Easy to use results with other applications

4
Problems with Excel
  • Errors due to rounding, missing data or extreme
    values
  • Not suitable for very large data sets
  • Output labelled or arranged inappropriately
  • Need to repeat processes for different variables
    or options
  • No record of analyses
  • Some algorithms are numerically unstable - little
    or no information about algorithms employed
  • Analysis ToolPak results are not dynamic and may
    vary with results generated by functions

5
Statistical Functions
  • Frequency Distributions
  • Mean, Median and Mode
  • Percentiles and Quartiles
  • Deviation and Squared Deviation about the Mean
  • Variance and Standard Deviation
  • Covariance and the Correlation Coefficient

6
Frequency
  • Use COUNTIF to count how many times an item
    appears in a list
  • COUNTIF(range, criteria)
  • Use FREQUENCY to calculate how often values occur
    within a range
  • FREQUENCY(data_array, bins_array)
  • Can also use Histogram tool in Analysis Toolpak

7
Mean, Median, Mode
  • Use AVERAGE or AVERAGEA to calculate the
    arithmetic mean
  • AVERAGE(number1, number2, etc.)
  • Use MEDIAN to return the middle number
  • MEDIAN(number1, number2, etc)
  • Use MODE to return the most common value
  • MODE(number1, number2, etc)

8
Percentiles and Quartiles
  • Use PERCENTILE to return the kth percentile of a
    data set
  • PERCENTILE(array, percentile)
  • Percentile argument is a value between 0 and 1
  • Use QUARTILE to return the given quartile of a
    data set
  • QUARTILE(array, quart)
  • Quart is 1, 2, 3 or 4
  • IQR Q3-Q1
  • May return different values to statistical package

9
Variance and Standard Deviation
  • Use VAR, VARA, VARP or VARPA to calculate the
    variance for a range
  • E.g. VAR(value1, value 2, etc.)
  • Squared deviations about the mean/N or /n-1
  • Use STDEV, STDEVA, STDEVP or STDEVPA to calculate
    the standard deviation for a range
  • E.g. STDEV(value1, value2, etc.)
  • Positive square root of variance

10
Covariance and the Correlation Coefficient
  • Use COVAR to calculate the covariance
  • COVAR(array1, array2)
  • Average of products of deviations for each data
    point pair
  • Depends on units of measurement
  • Use CORREL to return the correlation coefficient
  • CORREL(array1, array2)
  • Returns value between -1 and 1
  • Also available in Analysis ToolPak

11
Probability
  • Numerical measure of the likelihood that an event
    will occur
  • Some probabilities that can be calculated using
    Excel
  • Binomial Probabilities
  • Poisson Probabilities
  • Hypergeometric Probabilities
  • Normal Probabilities
  • Exponential Probabilities

12
Binomial Probabilities
  • Use BINOMDIST to compute binomial distribution
    probabilities and cumulative binomial
    probabilities
  • BINOMDIST(number_s, trials, probability_s,
    cumulative)
  • Calculates the probability that a sequence of
    independent trials with two possible outcomes
    will have a given number of successes
  • Cumulative is either TRUE or FALSE

13
Poisson Probabilities
  • Use POISSON to compute Poisson Probabilities
  • POISSON(x, mean, cumulative)
  • Shows the probability of x occurrences of an
    event over a specified interval of time or space

14
Hypergeometric Probabilities
  • Use HYPGEOMDIST to compute hypergeometric
    probabilities
  • HYPGEOMDIST(sample_s, number_sample,
    population_s, number_pop)
  • Computes the probability of x successes
    (sample_s) in n trials (number_sample) when the
    trials are dependent
  • Similar to Binomial except trials are not
    independent probability of success changes from
    trial to trial
  • Does not compute cumulative probabilities

15
Normal Probabilities
  • Use NORMSDIST or NORMDIST to compute the
    cumulative probability
  • NORMSDIST(z)
  • NORMDIST(x, mean, standard_dev, cumulative)
  • Use NORMSINV or NORMINV to compute the z or x
    value given a cumulative probability
  • NORMSINV(probability)
  • NORMINV(probability, mean, standard_dev)

16
Exponential Probabilities
  • Use EXPONDIST to compute exponential
    probabilities
  • EXPONDIST(x, lambda, cumulative)
  • x is the random variable
  • Lambda is 1/mean
  • Useful in computing probabilities for the time it
    takes to complete a task

17
Analysis ToolPak
  • Descriptive Statistics
  • Correlation
  • Linear Regression
  • t-Tests
  • z-Tests
  • ANOVA
  • Covariance

18
Descriptive Statistics
  • Mean, Median, Mode
  • Standard Error
  • Standard Deviation
  • Sample Variance
  • Kurtosis
  • Skewness
  • Confidence Level for Mean
  • Range
  • Minimum
  • Maximum
  • Sum
  • Count
  • kth Largest
  • kth Smallest

19
Correlation and Regression
  • Correlation is a measure of the strength of
    linear association between two variables
  • Values between -1 and 1
  • Values close to -1 indicate strong negative
    relationship
  • Values close to 1 indicate strong positive
    relationship
  • Values close to 0 indicate weak relationship
  • Linear Regression is the process of finding a
    line of best fit through a series of data points
  • Can also use the SLOPE, INTERCEPT, CORREL and RSQ
    functions

20
t-Tests and z-Tests
  • Used to test hypotheses by comparing means
  • If sample means are equal suggests both samples
    came from same population
  • t-Test n lt30
  • Equal or unequal variances or paired test
  • Check result using TTEST function
  • z-Test ngt30
  • Used for means with known variances

21
ANOVA Analysis of Variances
  • Compares variances in two or more data sets
  • If difference is found it can be assumed that the
    means of the data sets are different
  • Single Factor use instead of t-Test for more
    than 2 samples
  • Two Factor with Replication useful when data
    can be classified along 2 different dimensions
  • Two Factor without Replication as above but
    only one observation for each pair

22
PivotTables
  • Use for crosstabulations
  • Data must be in tabular format columns with
    headings, no blank columns
  • Easy to pivot data
  • Easy to create PivotCharts
  • Can summarise and analyse data without affecting
    data source

23
Final Tips
  • Excel only suitable for basic analysis using
    small data sets
  • Later versions of Excel more reliable than Excel
    97
  • Check Analysis TookPak results with worksheet
    functions
  • Check overall results by hand or with dedicated
    statistical package
Write a Comment
User Comments (0)
About PowerShow.com