Descriptive Statistics - PowerPoint PPT Presentation

About This Presentation
Title:

Descriptive Statistics

Description:

... calculate measures of kurtosis and skewness. Spatial Statistics: ... Central tendency, dispersion, kurtosis, skewness. Distribution. Spatial Statistics: Topic 3 ... – PowerPoint PPT presentation

Slides: 49
Provided by: pmdrhamid

Transcript and Presenter's Notes

Title: Descriptive Statistics


1
Descriptive Statistics
Spatial Statistics (SGG 2413)
  • Assoc. Prof. Dr. Abdul Hamid b. Hj. Mar Iman
  • Director
  • Centre for Real Estate Studies
  • Faculty of Engineering and Geoinformation Science
  • Universiti Teknologi Malaysia
  • Skudai, Johor

2
Learning Objectives
  • Overall: To give students a basic understanding
    of descriptive statistics
  • Specific: Students will be able to
  • understand the basic concept of descriptive
    statistics
  • understand the concept of distribution
  • calculate measures of central tendency and
    dispersion
  • calculate measures of kurtosis and
    skewness

3
Contents
  • What is descriptive statistics
  • Central tendency, dispersion, kurtosis, skewness
  • Distribution

4
Descriptive Statistics
  • Use sample information to explain/make
    abstraction of population phenomena.
  • Common phenomena
  • Association (e.g. $\sigma_{1,2.3} = 0.75$)
  • Tendency (left-skew, right-skew)
  • Trend, pattern, location, dispersion, range
  • Causal relationship (e.g. if X then Y)
  • Emphasis on meaningful characterisation of data
    (e.g. central tendency, variability), graphics,
    and description
  • Use non-parametric analysis (e.g. $\chi^2$) and simple
    tests (e.g. t-test, 2-way ANOVA)

5
E.g. of Abstraction of phenomena
6
Inferential Statistics
  • Using sample statistics to infer some phenomena
    of population parameters
  • Common phenomena: cause-and-effect
  • One-way relationship: $Y = f(X)$
  • Feedback relationship: $Y_1 = f(Y_2, X, e_1)$, $Y_2 = f(Y_1, Z, e_2)$
  • Recursive relationship: $Y_1 = f(X, e_1)$, $Y_2 = f(Y_1, Z, e_2)$
  • Use parametric analysis (e.g. estimating $\alpha$ and $\beta$) through
    regression analysis
  • Emphasis on hypothesis testing
7
Parametric statistics
  • Statistical analysis that attempts to explain the
    population parameter using a sample
  • Examples of statistical parameters: mean, variance,
    std. dev., R², t-value, F-ratio, $\rho_{xy}$, etc.
  • It assumes that the distributions of the
    variables being assessed belong to known
    parameterised families of probability
    distributions

8
Examples of parametric relationship
[Figure: two fitted relationships, labelled Dep9t 215.8 and Dep7t 192.6]
9
Non-parametric statistics
  • First used by Wolfowitz (1942)
  • Statistical analysis that attempts to explain the
    population parameter using a sample without
    making assumption about the frequency
    distribution of the assessed variable
  • In other words, the variable being assessed is
    distribution-free
  • Examples of non-parametric statistics: histogram,
    stochastic kernel, non-parametric regression

10
Descriptive & Inferential Statistics (DS & IS)
  • DS gathers information about a population
    characteristic (e.g. income) and describes it with
    a parameter of interest (e.g. mean)
  • IS uses the parameter to test a hypothesis
    pertaining to that characteristic, e.g.
  • H0: mean income = RM 4,000
  • H1: mean income < RM 4,000
  • The result of hypothesis testing is used to make
    inferences about the characteristic of interest
    (e.g. whether Malaysians are upper-middle income)

11
Sample Statistics: Central Tendency
12
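Central Tendency - Mean
  • For individual observations, $\bar{x} = \frac{\sum x}{n}$. E.g.
  • X = {3, 5, 7, 7, 8, 8, 8, 9, 9, 10, 10, 12}
  • $\sum x = 96$, n = 12
  • Thus, $\bar{x} = 96/12 = 8$
  • The above observations can be organised into a
    frequency table and the mean calculated on the basis
    of frequencies, $\bar{x} = \frac{\sum fx}{\sum f}$

    x:   3   5   7   8   9  10  12
    f:   1   1   2   3   2   2   1
    fx:  3   5  14  24  18  20  12

  • $\sum fx = 96$, $\sum f = 12$
  • Thus, $\bar{x} = 96/12 = 8$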
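Both routes to the mean can be checked with a minimal Python sketch. It computes the mean directly from the twelve observations and again from their frequency table, which is rebuilt here with `collections.Counter` (the variable names are illustrative only).

```python
from collections import Counter

# The twelve observations from the slide
x = [3, 5, 7, 7, 8, 8, 8, 9, 9, 10, 10, 12]

# Mean of individual observations: sum(x) / n
mean_individual = sum(x) / len(x)

# Frequency table: value -> frequency f
freq = Counter(x)

# Mean from the frequency table: sum(f * x) / sum(f)
sum_fx = sum(value * f for value, f in freq.items())
sum_f = sum(freq.values())
mean_grouped = sum_fx / sum_f

print(sorted(freq.items()))
print(mean_individual, mean_grouped)
```

Both calculations give 8, because the frequency table only regroups the same twelve values.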

13
Central Tendency - Mean and Mid-point
  • Let's say we have data like this:

Price (RM 000/unit) of Shop Houses in Skudai
Can you calculate the mean?
14
Central Tendency - Mean and Mid-point (contd.)
  • Let's calculate the mid-point, $M = \frac{1}{2}(\text{Min} + \text{Max})$:
  • Town A: (228 + 450)/2 = 339
  • Town B: (320 + 430)/2 = 375
  • Are these figures means?
15
Central Tendency - Mean and Mid-point (contd.)
  • Let's say we have price data as follows:
  • Town A: 228, 295, 310, 420, 450
  • Town B: 320, 295, 310, 400, 430
  • Calculate the means:
  • Town A: $\bar{x}$ = 1,703/5 = 340.6
  • Town B: $\bar{x}$ = 1,755/5 = 351.0
  • Are the results the same as before?
  • ⇒ Be careful about mean and mid-point!
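The difference between the two measures can be sketched in Python: the mid-point uses only the two extreme values, whereas the mean uses every observation, so they coincide only for symmetric data. The helper names `mean` and `mid_point` are illustrative, not from the slides.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

def mid_point(values):
    """Mid-point (mid-range): half of (minimum + maximum)."""
    return (min(values) + max(values)) / 2

# Shop-house prices (RM '000/unit) from the slide
town_a = [228, 295, 310, 420, 450]
town_b = [320, 295, 310, 400, 430]

print(mean(town_a), mid_point(town_a))  # Town A: mean vs mid-point
print(mean(town_b))                     # Town B: mean
```

For Town A the mean is 340.6 while the mid-point is 339, so the two measures should not be used interchangeably.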

16
Central Tendency - Mean of Grouped Data
  • House rentals or prices in the PMR are frequently
    tabulated as a range of values, e.g.
  • What is the mean rental across the areas?
  • $\sum f = 23$, $\sum fm = 3317.5$ (m = class mid-point)
  • Thus, $\bar{x} = 3317.5/23 = 144.24$

17
Central Tendency - Median
  • Let's say house rentals in a particular town are
    tabulated
  • Calculating the median rental needs a graphical
    aid (an ogive):
  • 1. Median position = (n + 1)/2 = (25 + 1)/2 = 13th Taman
  • 2. (i.e. between 10 and 15 points on the vertical
    axis of the ogive)
  • 3. This corresponds to RM 140-145/month on the
    horizontal axis
  • 4. There are (17 - 8) = 9 Taman in the range of RM
    140-145/month
  • 5. The 13th Taman is the 5th out of these 9 Taman
  • 6. The rental interval width is 5
  • 7. Therefore, the median rental = 140 + (5/9 × 5) = RM 142.8
18
Central Tendency - Median (contd.)
19
Central Tendency - Quartiles (contd.)
Following the same process as in calculating the
median:
Upper quartile = ¾(n + 1) = 19.5th Taman; UQ = 145 + (3/7 × 5) = RM 147.1/month
Lower quartile = (n + 1)/4 = 26/4 = 6.5th Taman; LQ = 135 + (3.5/5 × 5) = RM 138.5/month
Inter-quartile range = UQ - LQ = 147.1 - 138.5 = RM 8.6/month
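The interpolation used for the median and quartiles can be sketched as one Python function: lower class boundary + ((position - cumulative frequency below the class) / class frequency) × class width. The function name `interpolate_grouped` is hypothetical. The inputs come from the median slide (8 Taman below RM 140, 9 in RM 140-145) and from the lower-quartile step (the slide's 3.5/5 implies 3 Taman below RM 135 and 5 in RM 135-140).

```python
def interpolate_grouped(lower, position, cum_freq_below, class_freq, width):
    """Linear interpolation inside one class of grouped data:
    lower + ((position - cum_freq_below) / class_freq) * width."""
    return lower + (position - cum_freq_below) / class_freq * width

n = 25  # number of Taman

# Median: position (n + 1)/2 = 13; 8 Taman below RM 140, 9 Taman in RM 140-145
median = interpolate_grouped(140, (n + 1) / 2, 8, 9, 5)

# Lower quartile: position (n + 1)/4 = 6.5; 3 Taman below RM 135, 5 in RM 135-140
lower_q = interpolate_grouped(135, (n + 1) / 4, 3, 5, 5)

print(round(median, 1), round(lower_q, 1))
```

This reproduces the slides' RM 142.8 and RM 138.5. The same function gives the upper quartile once the cumulative frequency below RM 145 and the frequency of the RM 145-150 class are read off the ogive.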
20
Variability
  • Indicates dispersion, spread, variation,
    deviation
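  • For single population or sample data:
    $\sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \mu)^2}{n}$, $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$
  • where $\sigma^2$ and $s^2$ = population and sample
    variance respectively, $x_i$ = individual
    observations, $\mu$ = population mean, $\bar{x}$ = sample
    mean, and n = total number of individual
    observations.
  • The square roots are the population standard
    deviation, $\sigma$, and the sample standard deviation, s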

21
Variability (contd.)
  • Why is a measure of dispersion important?
  • Consider yields of two plant species
  • Plant A (ton): 1.8, 1.9, 2.0, 2.1, 3.6
  • Plant B (ton): 1.0, 1.5, 2.0, 3.0, 3.9
  • Mean A = mean B = 2.28
  • But, different variability!
  • Var(A) = 0.557, Var(B) = 1.367
  • Would you choose to grow plant A or B?
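The figures above can be reproduced with Python's standard `statistics` module. The slide's variances match the sample variance (n - 1 divisor, `statistics.variance`). The population variance (n divisor, `statistics.pvariance`) is printed alongside for comparison.

```python
import statistics

# Yields (tons) from the slide
plant_a = [1.8, 1.9, 2.0, 2.1, 3.6]
plant_b = [1.0, 1.5, 2.0, 3.0, 3.9]

for name, yields in (("A", plant_a), ("B", plant_b)):
    mean = statistics.mean(yields)
    var_sample = statistics.variance(yields)       # divides by n - 1 (s^2)
    var_population = statistics.pvariance(yields)  # divides by n (sigma^2)
    print(name, round(mean, 2), round(var_sample, 3), round(var_population, 3))
```

Both species average 2.28 tons, but species B's yields are more than twice as variable.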

22
Variability (contd.)
  • Coefficient of variation, CV = std. deviation as
    % of the mean
  • A better measure than std. dev. in cases
    where samples have different means, e.g.
  • Plant X (ton/ha): 1.2, 1.4, 2.6, 2.7, 3.9
  • Plant Y (ton/ha): 1.4, 1.5, 2.1, 3.2, 3.9

23
Variability (cont.)
Calculate CV for both species.
CVx = (1.097/2.36) × 100 = 46.48
CVy = (1.094/2.42) × 100 = 45.21
⇒ Species X is a little more variable than
species Y
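A minimal Python sketch of the CV, assuming the sample standard deviation (`statistics.stdev`, n - 1 divisor) is intended. For these data the standard deviations are about 1.097 (X) and 1.094 (Y). Dividing the variances (about 1.203 and 1.197) by the means instead would give 50.97 and 49.46.

```python
import statistics

def coefficient_of_variation(values):
    """CV = sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

plant_x = [1.2, 1.4, 2.6, 2.7, 3.9]  # ton/ha
plant_y = [1.4, 1.5, 2.1, 3.2, 3.9]  # ton/ha

cv_x = coefficient_of_variation(plant_x)
cv_y = coefficient_of_variation(plant_y)
print(round(cv_x, 2), round(cv_y, 2))
```

Either way, the ranking is the same: species X is slightly more variable relative to its mean.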
24
Variability (cont.)
  • Std. dev. of a frequency distribution
  • E.g. age distribution of second-home buyers
    (SHB)
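The SHB age table itself is not reproduced in this transcript. This sketch therefore borrows the frequency table of the twelve observations from the mean example to show the usual formula, $s = \sqrt{\sum f(x - \bar{x})^2 / (\sum f - 1)}$. Ages and their frequencies would slot in the same way.

```python
import math

# Frequency table of the twelve observations from the mean example (value, frequency)
values = [3, 5, 7, 8, 9, 10, 12]
freqs = [1, 1, 2, 3, 2, 2, 1]

n = sum(freqs)                                                # sum of f
mean = sum(f * x for x, f in zip(values, freqs)) / n          # sum(f*x) / sum(f)
ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))  # sum of f*(x - mean)^2

sd_sample = math.sqrt(ss / (n - 1))  # s, n - 1 divisor
sd_population = math.sqrt(ss / n)    # sigma, n divisor
print(mean, round(sd_sample, 3), round(sd_population, 3))
```

Each squared deviation is weighted by its frequency, so the table gives the same result as listing all twelve values individually.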

25
Probability distribution
  • Logical probability: if there are 20
    lecturers, the probability that A becomes a
    professor is p = 1/20 = 0.05
  • Experiential probability: out of 100
    births, half of them were girls (p = 0.5); as the
    number increased to 1,000, two-thirds were girls
    (p = 0.67), but from a record of 10,000 new-born
    babies, three-quarters were girls (p = 0.75)
  • Subjective probability: the
    probability of a drug addict recovering from
    addiction is 50:50
  • General rule:
    $\Pr(\text{event } X) = \frac{\text{No. of times event } X \text{ occurs}}{\text{Total number of occurrences}}$
  • The probability of a certain event X occurring has a
    specific form of distribution
26
Probability Distribution
Classical example of
tossing
What is the distribution of the sum of tosses?
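The slide's figure is not reproduced here. This Python sketch assumes the classical example is the sum of two fair six-sided dice. It applies the general rule (favourable outcomes / total outcomes) to every possible sum.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: two fair six-sided dice, every ordered pair (d1, d2) equally likely
outcomes = list(product(range(1, 7), repeat=2))
counts = Counter(d1 + d2 for d1, d2 in outcomes)

# Pr(sum = s) = (number of pairs giving s) / (total number of pairs)
distribution = {s: Fraction(c, len(outcomes)) for s, c in sorted(counts.items())}

for s, p in distribution.items():
    print(s, p, "#" * counts[s])  # crude text bar chart
```

The bars rise from a sum of 2 to a peak at 7 (probability 1/6) and fall symmetrically to 12, and the probabilities add up to 1.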
27
Probability Distribution (contd.)
Discrete variable
Values of x are discrete (discontinuous). Sum of
lengths of vertical bars: $\sum_{\text{all } x} p(X = x) = 1$
28
Probability Distribution (cont.)
Continuous variable
Mean = 39.5, Std. dev. = 2.45
Pr (area under curve) = 1
Age distribution of second-home buyers in a
probability histogram
29
Probability Distribution (cont.)
  • Pr (Age ≤ 36) = 0.02
  • Pr (Age ≤ 37) = Pr (Age ≤ 36) + Pr (Age = 37)
    = 0.02 + 0.07 = 0.09
  • Pr (Age ≤ 38) = Pr (Age ≤ 37) + Pr (Age = 38)
    = 0.09 + 0.04 = 0.13
  • Pr (Age ≤ 39) = Pr (Age ≤ 38) + Pr (Age = 39)
    = 0.13 + 0.18 = 0.31
  • Pr (Age ≤ 40) = Pr (Age ≤ 39) + Pr (Age = 40)
    = 0.31 + 0.36 = 0.67
  • Pr (Age ≤ 41) = Pr (Age ≤ 40) + Pr (Age = 41)
    = 0.67 + 0.14 = 0.81
  • Pr (Age ≤ 42) = Pr (Age ≤ 41) + Pr (Age = 42)
    = 0.81 + 0.10 = 0.91
  • Pr (Age ≤ 43) = Pr (Age ≤ 42) + Pr (Age = 43)
    = 0.91 + 0.09 = 1.00

⇒ Cumulative probability corresponds to the
left tail of a distribution
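The step-by-step sums above are a running (cumulative) total of the individual age probabilities. In Python, `itertools.accumulate` produces exactly that sequence.

```python
from itertools import accumulate

ages = [36, 37, 38, 39, 40, 41, 42, 43]
probs = [0.02, 0.07, 0.04, 0.18, 0.36, 0.14, 0.10, 0.09]  # Pr(Age = x)

# Running sum gives Pr(Age <= x), the left-tail (cumulative) probability
cumulative = [round(c, 2) for c in accumulate(probs)]

for age, cp in zip(ages, cumulative):
    print(f"Pr(Age <= {age}) = {cp:.2f}")
```

Rounding to two decimals removes floating-point noise from the repeated additions. The last value reaches 1.00 because every age has been included.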
30
Probability Distribution (cont.)
[Figure: probability histograms for a larger sample and a very large sample]
  • As larger and larger samples are drawn, the
    probability distribution becomes smoother
  • There are tens of different types of probability
    distribution: Z, t, F, gamma, etc.
  • The most important is the normal distribution
31
Normal Distribution - ND
  • Salient features of ND:
  • Bell-shaped, symmetrical
  • Total area under curve = 1
  • Area under curve between any two points = prob. of
    values in that range (shaded area)
  • Prob. of any exact value = 0
  • Has a function of
    $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

where $\mu$ = mean of variable x, $\sigma$ = std. dev. of x, $\pi$ =
ratio of circumference of a circle to its
diameter ≈ 3.14, e = base of natural log ≈
2.71828.
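The density function above translates directly into Python with the `math` module. This sketch evaluates the curve's peak for N(0, 1). It also runs a crude midpoint-rule check that the total area under the curve is 1; the integration range and step are arbitrary choices made for this illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Height of the curve at the mean of N(0, 1)
print(round(normal_pdf(0, 0, 1), 4))

# Crude check that the total area under the curve is close to 1 (midpoint rule on [-8, 8])
step = 0.001
area = sum(normal_pdf(-8 + (i + 0.5) * step, 0, 1) * step for i in range(16000))
print(round(area, 4))
```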
32
Normal Distribution - ND
[Figure: normal curves for Population 1 and Population 2]
A population with a smaller variance has a narrower base
$\mu$ determines the location while $\sigma$ determines the
shape of the ND
33
Normal Distribution (cont.)
Has a mean $\mu$ and a variance $\sigma^2$, i.e. $X \sim N(\mu, \sigma^2)$.
Has the following distribution of
observations:
Home-buyers example
Mean age = 39.3, Std. dev. = 2.42
34
Standard Normal Distribution (SND)
  • Since different populations have different $\mu$ and
    $\sigma$ (thus, locations and shapes of distribution),
    they have to be standardised.
  • Most common standardisation: the standard normal
    distribution (SND), also called the Z-distribution
  • Pr(X ≤ x) is given by the area under the curve
  • This area has no standard algebraic (closed-form) integral
  • ∴ $Z \sim N(0, 1)$
  • To transform f(x) into f(z):
    $Z = \frac{x - \mu}{\sigma} \sim N(0, 1)$

35
Z-Distribution
  • Probability is distributed in such a way that
  • Approx. 68% lies within -1 < z < 1
  • Approx. 95% lies within -1.96 < z < 1.96
  • Approx. 99% lies within -2.58 < z < 2.58
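The 68/95/99 figures can be verified without a Z-table. The standard normal CDF is $\Phi(z) = \frac{1}{2}[1 + \mathrm{erf}(z/\sqrt{2})]$, and Python's `math.erf` implements the error function. The helper names below are illustrative only.

```python
import math

def standardise(x, mu, sigma):
    """Z = (x - mu) / sigma; Z ~ N(0, 1) when X ~ N(mu, sigma^2)."""
    return (x - mu) / sigma

def phi(z):
    """Standard normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_between(z_low, z_high):
    """P(z_low < Z < z_high) for Z ~ N(0, 1)."""
    return phi(z_high) - phi(z_low)

for z in (1, 1.96, 2.58):
    print(f"P({-z} < Z < {z}) = {prob_between(-z, z):.4f}")

# One standard deviation above the home-buyers' mean age maps to Z = 1 (up to rounding)
print(standardise(39.3 + 2.42, 39.3, 2.42))
```

The three probabilities come out at about 0.6827, 0.9500 and 0.9901.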

36
Z-distribution (cont.)
  • When X = µ, Z = 0
  • When X = µ + σ, Z = 1
  • When X = µ + 2σ, Z = 2
  • When X = µ + 3σ, Z = 3, and so on.
  • It can be proven that P(X1 < X < Xk) = P(Z1 < Z < Zk)
  • The SND (Z) table shows the probability to the right of any
    particular value of Z.

37
Normal Distribution: Questions
  • A study found that the mean age, A, of second-home
    buyers in Johor Bahru is 39.3 years, with a standard
    deviation of 2.4 years. Assuming normality, what is the
    probability that a buyer's age is (a) over 40 years,
    (b) between 39 and 42 years?
  • Answer (a) P(A > 40)
  • = P[Z > (40 - 39.3)/2.4]
  • = P(Z > 0.2917 ≈ 0.30)
  • = 0.3821
  • (b) P(39 < A < 42)
  • = P(A > 39) - P(A > 42)
  • = P[Z > (39 - 39.3)/2.4] - P[Z > (42 - 39.3)/2.4]
  • = P(Z > -0.125) - P(Z > 1.125)
  • ≈ 0.54776 - 0.12924
  • = 0.4185

Use Z-table!
Always remember to convert to the SND: subtract the
mean and divide by the std. dev.
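The question can be checked numerically with Python's `math.erfc`, which gives the right-tail probability $P(Z > z) = \frac{1}{2}\mathrm{erfc}(z/\sqrt{2})$ directly. The sketch uses the mean 39.3 and standard deviation 2.4 from the worked answer. It keeps z unrounded, so the results differ slightly from the Z-table look-ups: about 0.385 for (a) and 0.419 for (b). Note that z = (39 - 39.3)/2.4 = -0.125 is negative, so P(A > 39) is above one half (1 - 0.45224 = 0.54776 from the table).

```python
import math

def upper_tail(z):
    """P(Z > z) for Z ~ N(0, 1): 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma = 39.3, 2.4  # mean age and standard deviation used in the worked answer

# (a) P(A > 40)
z_a = (40 - mu) / sigma
p_a = upper_tail(z_a)

# (b) P(39 < A < 42) = P(A > 39) - P(A > 42)
p_b = upper_tail((39 - mu) / sigma) - upper_tail((42 - mu) / sigma)

print(round(z_a, 4), round(p_a, 4), round(p_b, 4))
```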
38
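Student's t-Distribution
  • Similar to Z-distribution (bell-shaped,
    symmetrical)
  • Has a function of
    $f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$
  • where Γ = gamma function, ν = n - 1
    d.o.f., π ≈ 3.142
  • Flatter with thicker tails
  • Distributed with t ~ (0, σ) and -∞ < t < ∞
  • As n → ∞, t ~ (0, σ) → N(0, 1)
  • Probability calculation requires
    information on d.o.f.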
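The t density can be coded straight from its formula. This sketch works with `math.lgamma` (the log of the gamma function) so that large degrees of freedom do not overflow `math.gamma`. It compares the peak height and the tail height at t = 3 with the standard normal density.

```python
import math

def t_pdf(t, nu):
    """Student's t density with nu degrees of freedom:
    Gamma((nu+1)/2) / (sqrt(nu*pi) * Gamma(nu/2)) * (1 + t^2/nu)^(-(nu+1)/2).
    Log-gamma is used so that large nu does not overflow."""
    log_coef = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_coef) * (1 + t * t / nu) ** (-(nu + 1) / 2)

def normal_pdf(z):
    """Standard normal density, for comparison."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Peak (t = 0) and tail (t = 3) heights: lower peak and thicker tails for small nu
for nu in (1, 5, 30, 1000):
    print(nu, round(t_pdf(0, nu), 4), round(t_pdf(3, nu), 4))
print("N(0,1)", round(normal_pdf(0), 4), round(normal_pdf(3), 4))
```

With few degrees of freedom the peak is lower and the tails are higher than for N(0, 1). By ν = 1000 the two curves are almost indistinguishable.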

39
How Are t-dist. and Z-dist. Related?
  • Using the central limit theorem, $\bar{x} \sim N(\mu, \sigma^2/n)$ will
    become
  • $z \sim N(0, 1)$ as $n \to \infty$
  • ⇒ For a large sample, the t-dist. of a variable or a
    parameter is given by
  • The interval of critical values for variable x
    is

40
Skewness, m3 & Kurtosis, m4
  • Skewness, m3, measures the degree of symmetry of a
    distribution
  • Kurtosis, m4, measures its degree of peakedness
  • Both are useful when comparing sample
    distributions with different shapes
  • Useful in data analysis

41
Skewness
42
Kurtosis
Mesokurtic distribution: kurtosis = 3
Leptokurtic distribution: kurtosis > 3
Platykurtic distribution: kurtosis < 3
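A minimal Python sketch of the moment-based coefficients, assuming the usual definitions: skewness = $m_3 / m_2^{3/2}$ and kurtosis = $m_4 / m_2^2$, where $m_k$ is the k-th central moment with an n divisor. A normal distribution has kurtosis 3 (mesokurtic). Values above 3 indicate a leptokurtic (peaked, heavy-tailed) shape and values below 3 a platykurtic (flat) one. The function names are illustrative.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean)^k."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Moment coefficient of skewness: m3 / m2^(3/2); 0 for a symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment coefficient of kurtosis: m4 / m2^2; 3 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

plant_a = [1.8, 1.9, 2.0, 2.1, 3.6]  # yields (tons) from the variability slide
print(round(skewness(plant_a), 3), round(kurtosis(plant_a), 3))
print(skewness([1, 2, 3, 4, 5]), kurtosis([1, 2, 3, 4, 5]))
```

Plant A's single high yield (3.6) gives a clear right (positive) skew of about 1.42. The evenly spaced values 1 to 5 are symmetric (skewness 0) and flat, with kurtosis 1.7, i.e. platykurtic.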
43
Occurrence of ganoderma
44
Aluminium residues in the soil
E.g. Al2 + H2O → Al2O + H2
45
Measures of spatial separation
  • E.g. WCM = $\left[(545.10 - 542.86)^2 + (105.90 - 105.48)^2\right]^{0.5}$
  • = $(5.0176 + 0.1764)^{0.5}$
  • = 2.28 (i.e. 2,280 m)
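The calculation above is the Euclidean (straight-line) distance between two points, which Python's `math.hypot` computes from the coordinate differences. The sketch assumes the two points are (545.10, 105.90) and (542.86, 105.48). Because the slide reads 2.28 as 2,280 m, the coordinates are presumably in kilometres.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points given as (x, y) coordinates."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Coordinates from the WCM example on the slide
d = euclidean_distance((545.10, 105.90), (542.86, 105.48))
print(round(d, 2))
```

`math.hypot` avoids writing the square root of the sum of squares by hand and is numerically safer for very large or very small differences.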

46
Spatial distribution
Occurrence of ganoderma
47
Spatial distribution: point data
Ethnic distribution of residence
48
Ethnic distribution of residence