Statistical Analysis Of Population - PowerPoint PPT Presentation

About This Presentation
Title:

Statistical Analysis Of Population

Slides: 13
Provided by: SUSH91
Learn more at: https://crystal.uta.edu

Transcript and Presenter's Notes


1
Statistical Analysis Of Population
  • Prepared by,
  • Sushruth Puttaswamy

2
Contents
  • Population
  • Sampling a Population
  • Relation between the Mean of a Sample and the
    Mean of the Population
  • Estimation of Error
  • Sample Query

3
Population
  • Basically, a collection of numbers.
  • $P = \{y_1, y_2, y_3, \ldots, y_n\}$
  • The objective is to perform statistical analysis
    on the population.
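A minimal sketch of the quantities this slide sets up, assuming the population fits in a plain Python list (the values below are hypothetical, and the helper name `population_stats` is illustrative only). It computes the population mean $\bar{Y}$ and the population variance Var, dividing by $n$ rather than $n - 1$.

```python
import statistics

def population_stats(population):
    """Return (mean, variance) of a whole population.

    The variance divides by n (population variance), not n - 1.
    """
    mean = statistics.fmean(population)
    var = statistics.pvariance(population, mu=mean)
    return mean, var

# Hypothetical population P = {y1, ..., yn}
P = [2, 4, 4, 4, 5, 5, 7, 9]
Y_bar, Var = population_stats(P)
print(Y_bar, Var)  # 5.0 4.0
```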

4
Sampling a Population
  • Let x be a number drawn at random from the
    population.
  • Each element of P has an equal probability of
    being selected.
  • A sample is a collection of numbers drawn at
    random from P.
  • Sampling here is assumed to be done with
    replacement, so an element can appear in the
    sample more than once.
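Sampling with replacement can be sketched with Python's `random.choices`, which draws each element uniformly and independently, so repeats are possible and k may even exceed n. The helper `sample_with_replacement` and the population 1 to 10 are hypothetical; the seed only makes the run reproducible.

```python
import random

def sample_with_replacement(population, k, rng=random):
    """Draw k elements uniformly at random, with replacement.

    Every element has the same chance on every draw, so an element
    may appear more than once and k may exceed len(population).
    """
    return rng.choices(population, k=k)

P = list(range(1, 11))     # hypothetical population {1, ..., 10}
rng = random.Random(42)    # seeded only to make the run reproducible
sample = sample_with_replacement(P, 5, rng)
print(sample)
```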

5
Relation between $\mu$ and $\bar{Y}$
  • Let the average (mean) of P be $\bar{Y}$.
  • Suppose we sample k numbers out of P (with
    replacement).
  • Let $\mu$ be the mean of the sample.
  • The objective is to find a relation between $\mu$
    and $\bar{Y}$.
  • One option is to estimate $|\mu - \bar{Y}|$.
  • Instead of the absolute value, the squared
    difference $(\mu - \bar{Y})^2$ is easier to
    work with.

6
Standard Error Formula
  • The Standard Error Formula gives an estimate of
    the error in the sampling process.
  • It is given by $E[(\mu - \bar{Y})^2] = \mathrm{Var}/k$.
  • Var is the variance of the population P.
  • The RHS is the expected squared error; its square
    root, $\sqrt{\mathrm{Var}/k}$, is the standard
    error of the sampling process.
  • The formula does not depend on n, the number of
    elements in the population, because sampling is
    done with replacement.

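The formula can be checked with a small Monte Carlo sketch. Because the k draws are independent and each has variance Var, the sample mean has variance Var/k, and since $\mu$ is unbiased this equals $E[(\mu - \bar{Y})^2]$. The setup is hypothetical: 0/1 populations with 20% ones (so Var = 0.16), k = 25, and 20,000 repeated samples; the helper name is illustrative. Both population sizes should give values close to 0.16/25 = 0.0064, illustrating that n does not enter.

```python
import random
import statistics

def mean_squared_error_of_sample_mean(population, k, trials, seed=0):
    """Monte Carlo estimate of E[(mu - Y_bar)^2] for samples of size k
    drawn with replacement from population."""
    rng = random.Random(seed)
    y_bar = statistics.fmean(population)
    total = 0.0
    for _ in range(trials):
        mu = statistics.fmean(rng.choices(population, k=k))
        total += (mu - y_bar) ** 2
    return total / trials

k = 25
for n in (100, 10_000):
    # Hypothetical 0/1 population with 20% ones, so Var = 0.2 * 0.8 = 0.16
    P = [1] * (n // 5) + [0] * (n - n // 5)
    var = statistics.pvariance(P)
    mse = mean_squared_error_of_sample_mean(P, k, trials=20_000)
    print(n, round(mse, 5), var / k)
```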
7
Sampling Methods
  • Sampling must get all columns of a row from the
    database.
  • The aim is to reduce the error of the estimate.
  • The estimate should be unbiased, that is,
    $E[\mu] = \bar{Y}$.
  • Random sampling does not give a good estimate when
    the query has low selectivity, i.e., when only a
    small fraction of the rows satisfy the predicate.
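To see why low selectivity hurts, consider a predicate matched by a fraction p of the rows. Its 0/1 column has variance $p(1 - p)$, so by the Standard Error Formula the estimated fraction has standard error $\sqrt{p(1-p)/k}$; relative to p itself this is $\sqrt{(1-p)/(pk)}$. The sketch below (helper name hypothetical) evaluates that ratio for k = 100.

```python
import math

def relative_standard_error(p, k):
    """Standard error of the estimated fraction divided by the true
    fraction p, for a sample of size k drawn with replacement."""
    return math.sqrt(p * (1 - p) / k) / p

k = 100
for p in (0.5, 0.2, 0.01, 0.001):
    print(f"selectivity {p:<6} relative error {relative_standard_error(p, k):.2f}")
```

With k = 100 the relative error is 0.1 at p = 0.5 but about 3.2 at p = 0.001, where the sample is expected to contain only 0.1 matching rows.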

8
Sample Query
  • Let us apply random sampling to a database query.
  • Let Emp be a database table with 100,000 records
    that has Gender as one of its columns.
  • How many female employees are there?
  • The SQL query for this is SELECT COUNT(*) FROM
    Emp WHERE gender = 'F'.
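As a hedged sketch, the exact query can be run against an in-memory SQLite table through Python's standard `sqlite3` module. The table contents are hypothetical (100,000 rows with every fifth row marked 'F'); the real Emp data is not given in the slides.

```python
import sqlite3

# Hypothetical Emp table: 100,000 rows, every fifth row is female.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (id INTEGER PRIMARY KEY, gender TEXT)")
conn.executemany(
    "INSERT INTO Emp (gender) VALUES (?)",
    (("F" if i % 5 == 0 else "M",) for i in range(100_000)),
)

# The exact query has to look at every row of Emp.
(exact,) = conn.execute(
    "SELECT COUNT(*) FROM Emp WHERE gender = 'F'"
).fetchone()
print(exact)  # 20000
conn.close()
```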

9
Query Using Random Sampling
  • Let us select a sample of size k = 100 (Emp_sam)
    and assume that drawing the sample requires no
    extra overhead.
  • The query on the sample is then SELECT
    COUNT(*) * n / k FROM Emp_sam WHERE gender = 'F',
    where n = 100,000.
  • To evaluate it, assume a hypothetical column in the
    table that holds 0 for male and 1 for female.
  • Summing the 1s in the sample, dividing by k to get
    the average, and multiplying by n gives the
    estimated number of females.
  • Suppose this estimate is 20,000, which implies
    about 80,000 males.
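The estimator on this slide can be sketched in Python using the hypothetical 0/1 column. `estimate_count` and the female/male split of the made-up population are illustrative assumptions; computing `sum * n / k` is the same as averaging the flags and scaling by n, which is also what `COUNT(*) * n / k` computes on the sample.

```python
import random

def estimate_count(sample_flags, n):
    """Estimate how many of the n population rows satisfy a predicate.

    sample_flags holds a 0/1 flag (1 = satisfies) for each row of a
    uniform sample drawn with replacement. sum * n / k is the sample
    average scaled by n, i.e. the same as COUNT(*) * n / k.
    """
    k = len(sample_flags)
    return sum(sample_flags) * n / k

n, k = 100_000, 100
# Hypothetical 0/1 column: 1 for female, 0 for male (every fifth row female).
is_female = [1 if i % 5 == 0 else 0 for i in range(n)]
sample = random.Random(7).choices(is_female, k=k)
print(estimate_count(sample, n))

# 20 females among the 100 sampled rows gives the slide's estimate.
print(estimate_count([1] * 20 + [0] * 80, n))  # 20000.0
```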

10
Estimation of Error
  • To find the error we first need the variance.
  • The population variance is unknown, so it is
    estimated from the previous result: about 20,000
    females, i.e., the 0/1 column holds 20,000 1s and
    80,000 0s.
  • The sample mean is $\mu = 20000/100000 = 0.2$
    (20 of the 100 sampled rows are female).
  • $\mathrm{Var} = \frac{(0 - 0.2)^2 \cdot 80000 + (1 - 0.2)^2 \cdot 20000}{100000}$
  • This gives a variance of
    $(3200 + 12800)/100000 = 0.16$.
  • From the Standard Error formula,
    $E[(\mu - \bar{Y})^2] = \mathrm{Var}/k = 0.16/100 = 0.0016$.

11
Estimation of Error
  • 0.0016 is the expected squared error when
    estimating the fraction of females in the
    population.
  • Taking the square root gives the standard error
    of that fraction, 0.04.
  • Multiplying this value by n = 100,000 gives
    4,000, the standard error of the estimated count.
  • This means the number of females is estimated as
    20,000 ± 4,000.
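The error arithmetic for this example can be reproduced with a short sketch, assuming the sample contained 20 females among k = 100 rows (which is what an estimate of 20,000 with n = 100,000 implies). The helper name is hypothetical. For a 0/1 column the variance expression reduces to $\mu(1 - \mu) = 0.2 \times 0.8 = 0.16$, giving a standard error of 0.04 on the fraction, or about 4,000 rows.

```python
import math

def count_estimate_with_error(ones_in_sample, k, n):
    """Return (estimated count, variance, standard error in rows) for a
    0/1 column, given how many 1s appeared in a sample of size k."""
    mu = ones_in_sample / k              # sample mean of the 0/1 column
    ones = mu * n                        # estimated number of 1s (females)
    zeros = n - ones                     # estimated number of 0s (males)
    # Weighted squared deviations; for a 0/1 column this equals mu * (1 - mu).
    var = ((0 - mu) ** 2 * zeros + (1 - mu) ** 2 * ones) / n
    squared_error = var / k              # E[(mu - Y_bar)^2] = Var / k
    std_error = math.sqrt(squared_error) * n   # scale from a fraction to rows
    return ones, var, std_error

est, var, se = count_estimate_with_error(20, k=100, n=100_000)
print(est, round(var, 4), round(se))  # 20000.0 0.16 4000
```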

12
Conclusion
  • The error can be reduced by increasing the sample
    size k: the standard error is proportional to
    $1/\sqrt{k}$.
  • According to the formula, reducing the variance
    also reduces the error.
  • Without going through all the records, we can
    estimate the result of the query together with
    the error associated with it.
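The effect of sample size can be illustrated with a sketch that assumes the example's female fraction p = 0.2 and n = 100,000 (the helper name is hypothetical). Since the standard error scales as $1/\sqrt{k}$, each fourfold increase in k halves it.

```python
import math

def standard_error_rows(p, k, n):
    """Standard error, in rows, of a count estimated from a sample of
    size k (with replacement) when a fraction p of the n rows match."""
    return n * math.sqrt(p * (1 - p) / k)

n, p = 100_000, 0.2
for k in (100, 400, 1_600, 6_400):
    print(k, round(standard_error_rows(p, k, n)))
```

This prints errors of 4000, 2000, 1000 and 500 rows for the four sample sizes.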