Title: What Is a Good Data Characterization
1. What Is a Good Data Characterization?
2. Outline
- Characterization via functional equations
- Functional equations
- Homogeneity and its extensions
- Location-invariance and related conditions
- Equivariant functions
- Permutation-invariance
- Outlier detection procedures
- Quasi-linear means
- Results for positive-breakdown estimators
- Characterization via inequalities
- Inequalities as aids to interpretation
- Relations between data characterizations
- Bounds on means and standard deviations
- Inequalities as uncertainty descriptions
- Coda: What is a good data characterization?
3. What Is a Good Data Characterization?
- A good data characterization K of a dataset D should depend predictably on certain simple, systematic modifications of D.
- One example is the equivariance condition T(ax + b) = aT(x) + b.
4. Characterization via functional equations
- In favorable cases, functional equations can be solved to yield much insight into the behaviour of different classes of data characterizations.
- Functional equations are not as well known to non-mathematicians as many other branches of mathematics. In this chapter the author describes functional equations in detail.
5. Functional equations
- The term "functional equation" is quite hard to define precisely.
- A simple but very instructive set of examples is given by the four functional equations considered by Cauchy.
- Cauchy's basic equation: f(x + y) = f(x) + f(y) (5.11), which implies f(rx) = rf(x) for rational r (5.17).
- Cauchy's exponential equation: for real x and y, f(x + y) = f(x)f(y) (5.24).
- Cauchy's logarithmic equation: for real x and y, x ≠ 0, y ≠ 0, f(xy) = f(x) + f(y) (5.29); it is usually enough to consider x, y > 0.
- Cauchy's power equation: for real x and y, x ≠ 0, y ≠ 0, f(xy) = f(x)f(y); it is usually enough to consider x, y > 0.
6. Homogeneity and its extensions
- In its simplest form, a function f: R^N → R is homogeneous if it satisfies f(ax_1, ax_2, ..., ax_n) = a f(x_1, x_2, ..., x_n) for all x ∈ R^N, a ∈ R.
- A generalized homogeneous function f: R^N → R must satisfy the condition f(ax_1, ax_2, ..., ax_n) = g(a) f(x_1, ..., x_n) for some function g(·) and for all x ∈ R^N, a ∈ R, a ≠ 0.
- A trivial solution would be f(x) = 0 for all x ∈ R^N.
7. Homogeneity and its extensions, cont'd.
- If f(x) is not identically zero, then we obtain the functional equation g(a)g(b) = g(ab), which is simply Cauchy's power equation.
- Homogeneity of order c can be defined by f(ax_1, ax_2, ..., ax_n) = a^c f(x_1, x_2, ..., x_n).
- If f: R^N → R is homogeneous, then f(0) = 0.
8. Location-invariance and related conditions
- A function f: R^N → R is location-invariant if it satisfies the condition f(x_1 + c, x_2 + c, ..., x_n + c) = f(x_1, ..., x_n) + c for all x ∈ R^N, c ∈ R.
- Any function of the form f(x_1, ..., x_n) = x_i + F(x_1 − x_i, ..., x_n − x_i) satisfies the location-invariance condition, where F is any function of the n − 1 arguments x_j − x_i with j ≠ i.
9. Equivariant functions
- An equivariant function is one that is both location-invariant and homogeneous.
- A function f: R^N → R is homogeneous if it satisfies f(ax_1, ax_2, ..., ax_n) = a f(x_1, x_2, ..., x_n) for all x ∈ R^N, a ∈ R.
- A function f: R^N → R is location-invariant if it satisfies f(x_1 + c, x_2 + c, ..., x_n + c) = f(x_1, x_2, ..., x_n) + c for all x ∈ R^N, c ∈ R.
- Now, recall that if f: R^N → R is homogeneous, then f(0) = 0.
- Then f(x, x, ..., x) = f(0, 0, ..., 0) + x = x for all real x.
- Aczel (1966, p. 236) shows that the following construction leads to an equivariant function: f(x_1, x_2, ..., x_n) = µ + s G(z_1, z_2, ..., z_n), where µ = avg(x), s = std(x), z_i = (x_i − µ)/s, and G: R^N → R is arbitrary.
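The Aczel-style construction above can be sketched in Python. This is a minimal illustration; the function names and the particular choice G = max are ours, not from the text.

```python
import statistics

def equivariant(x, G):
    """Aczel-style construction: f(x) = mu + s * G(z), where z holds
    the z-scores of x. Any choice of G yields a function that is both
    location-invariant and homogeneous (for positive scale factors a)."""
    mu = statistics.mean(x)
    s = statistics.pstdev(x)              # population standard deviation
    z = [(xi - mu) / s for xi in x]
    return mu + s * G(z)

x = [1.0, 2.0, 4.0, 7.0]
f1 = equivariant(x, max)                  # hypothetical choice of G
f2 = equivariant([3 * xi + 5 for xi in x], max)
# Equivariance: f(a*x + c) = a*f(x) + c, here with a = 3, c = 5.
```

With G = max, the construction recovers the sample maximum, which is indeed equivariant.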
10. Permutation-invariance
- A useful condition often imposed on data characterizations is permutation-invariance: f(Px) = f(x) for any permutation P of the components of x.
- This property is a natural one for data whose individual observations are assumed exchangeable and treated equally.
- For example, if the function is a weighted average, the weights must all be equal to achieve permutation-invariance, which reduces it to the arithmetic average.
11. Outlier detection procedures
- Outlier detection rules introduced previously need to be invariant under affine changes of measurement units.
- These conditions are actually functional equations that characterize the location estimator and scale estimator on which the outlier procedure is based.
12. Outlier detection procedures, cont'd.
- The location estimator T can be represented as T(x_1, x_2, ..., x_n) = µ + s G(z_1, z_2, ..., z_n), where µ = avg(x), s = std(x), z_i = (x_i − µ)/s, and G: R^N → R is arbitrary.
- Solving for G gives G(z_1, z_2, ..., z_n) = T(z_1, z_2, ..., z_n).
- The scale estimator satisfies S(x) = S(sz + µ) = s S(z).
- The outlier detection rule in terms of z-scores is then: |z_k − G(z)| > t S(z) ⇒ x_k is an outlier.
- From Chapter 3: if G(z) = T(z) = 0 and S(z) = 1, this reduces to the standard z-score representation: |z_k| > t ⇒ x_k is an outlier.
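The standard z-score rule can be sketched as follows (the data and the function name are made up for illustration):

```python
import statistics

def zscore_outliers(x, t):
    """Flag x_k as an outlier when |z_k| > t, i.e. the rule above
    with G(z) = 0 and S(z) = 1."""
    mu = statistics.mean(x)
    s = statistics.pstdev(x)
    return [xk for xk in x if abs((xk - mu) / s) > t]

flagged = zscore_outliers([1, 2, 3, 2, 1, 2, 3, 100], t=2.5)
```

Note that the single large outlier inflates s itself, so its z-score is only about 2.6 here; this masking effect is one motivation for median-based rules.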
13. Outlier detection procedures, cont'd.
- Other outlier rules follow from other choices of G and S.
- The Hampel identifier takes G(z) = med(z_1, z_2, ..., z_n) = (x̃ − µ)/s, where x̃ = med(x).
- This quantity corresponds to Hotelling's skewness measure, which is known to satisfy the bound |(x̃ − µ)/s| ≤ 1.
- The median-based outlier detection criterion is then: |z_k − (x̃ − µ)/s| > t S(z) ⇒ x_k is an outlier.
14. Quasi-linear means
- The family of quasi-linear means is a useful family of data characterizations.
- It has played an important role in both the theory of functional equations and the classical theory of inequalities.
- Quasi-linear means are obtained from the formula A(x_1, x_2, ..., x_N) = A(x) = F^(−1)((1/N) Σ F(x_i)).
- The arithmetic mean is obtained when F(x) = x: A(x) = (1/N) Σ x_i.
- The geometric mean is obtained when F(x) = ln x: G(x) = (Π x_i)^(1/N).
- The harmonic mean is obtained when F(x) = 1/x: H(x) = ((1/N) Σ (1/x_i))^(−1).
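The quasi-linear mean formula translates directly into code (a sketch; the names are ours):

```python
import math

def quasi_linear_mean(x, F, F_inv):
    """A(x) = F^{-1}( (1/N) * sum_i F(x_i) )."""
    return F_inv(sum(F(xi) for xi in x) / len(x))

x = [1.0, 2.0, 4.0]
arith = quasi_linear_mean(x, lambda v: v, lambda v: v)         # F(x) = x
geom = quasi_linear_mean(x, math.log, math.exp)                # F(x) = ln x
harm = quasi_linear_mean(x, lambda v: 1 / v, lambda v: 1 / v)  # F(x) = 1/x
```

For this data the three special cases give 7/3, 2, and 12/7, respectively, already ordered as the AGM inequality discussed later predicts.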
15. Results for positive-breakdown estimators
- An estimator T exhibits a finite-sample breakdown point of m/n if m outliers in a dataset of size n can be chosen in a way that causes T to exhibit arbitrarily extreme values.
- The arithmetic mean exhibits a finite-sample breakdown point of 1/n, which goes to zero in the limit of infinitely large sample sizes. Estimators of this kind are called zero-breakdown estimators.
- Estimators with finite-sample breakdown points m/n > 0 for all n are called positive-breakdown estimators.
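The zero-breakdown behaviour of the mean, versus the positive-breakdown behaviour of the median, can be illustrated with a toy example (the data are made up):

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
bad = x[:-1] + [1e12]   # replace m = 1 observation with an extreme value

mean_bad = statistics.mean(bad)      # driven arbitrarily far: breakdown point 1/n
median_bad = statistics.median(bad)  # barely moves: a positive-breakdown estimator
```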
16. Results for positive-breakdown estimators, cont'd.
- Positive-breakdown estimators often exhibit one or more transformation-invariance properties, typically called equivariance properties.
- An estimator T is scale-equivariant if T({x_i, a y_i}) = a T({x_i, y_i}).
- An estimator T is regression-equivariant if T({x_i, y_i + x_i^T v}) = T({x_i, y_i}) + v.
- An estimator T is affine-equivariant if T({A x_i, y_i}) = A^(−T) T({x_i, y_i}).
17. Characterization via inequalities
- Inequalities arise in data analysis for a number of reasons.
- Inequalities can provide useful guidance in interpreting certain data characterizations.
- Inequalities can provide insight into the relationships between different data characterizations.
- Inequalities can be used to estimate bounds on data characterizations that cannot be computed exactly (for example, when the complete data is not available but some statistics are known).
- Inequalities can be used as an uncertainty description in the set-theoretic or unknown-but-bounded error model.
18. Inequalities as aids to interpretation
- The mean (m) and standard deviation (s) are extremely useful standard data characterizations in many applications.
- A direct consequence of Chebyshev's inequality: P(|x − m| > a) ≤ s²/a².
- The Cauchy-Schwarz inequality: |Σ x_k y_k| ≤ (Σ x_k²)^(1/2) (Σ y_k²)^(1/2).
- The product-moment correlation coefficient r_{x,y} satisfies |r_{x,y}| ≤ 1, a consequence of the Cauchy-Schwarz inequality.
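The correlation bound can be checked numerically (a sketch; the function name and data are ours):

```python
import math

def corr(x, y):
    """Product-moment correlation; |corr(x, y)| <= 1 follows from
    applying the Cauchy-Schwarz inequality to the centered sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r_exact = corr([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # perfectly linear: r = 1
r_other = corr([1.0, 2.0, 3.0], [3.0, 1.0, 2.0])  # still within [-1, 1]
```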
19. Relations between data characterizations
- It is important to recognize that inequalities sometimes relate different data characterizations of the same data sequence.
- One classical inequality with many applications is the AGM (arithmetic-geometric mean) inequality: G(x_1, ..., x_N) = (Π x_i)^(1/N) ≤ (1/N) Σ x_i = A(x_1, ..., x_N), when x_i ≥ 0 for all i.
- This inequality extends to weighted arithmetic and geometric means: G_w(x_1, x_2, ..., x_N) ≤ A_w(x_1, x_2, ..., x_N).
20. The AGM inequality
- The AGM inequality corresponds to an important special case of a more general inequality between the generalized means M_r:
- M_r(x_1, x_2, ..., x_N) = (Σ a_i x_i^r)^(1/r), with a_i ≥ 0, Σ a_i = 1, r ≠ 0, x_i > 0.
- The family of generalized means satisfies the inequality r < s ⇒ M_r(x_1, x_2, ..., x_N) ≤ M_s(x_1, x_2, ..., x_N), and the inequality is strict unless x_i = x for all i.
- In the special case r = −1 we obtain the harmonic mean.
- It follows that the harmonic, geometric, and arithmetic means are related by (Σ a_i/x_i)^(−1) ≤ Π x_i^(a_i) ≤ Σ a_i x_i, with a_i ≥ 0, Σ a_i = 1, x_i > 0 for all i; as before, both inequalities are strict unless x_i = x for all i.
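The monotonicity of M_r in r can be verified numerically; this sketch assumes equal weights a_i = 1/N, and the data are made up:

```python
def M(x, r):
    """Generalized mean M_r(x) = (sum a_i * x_i**r)**(1/r) with
    equal weights a_i = 1/N (requires r != 0 and x_i > 0)."""
    a = 1.0 / len(x)
    return sum(a * xi ** r for xi in x) ** (1.0 / r)

x = [1.0, 2.0, 4.0]
# r < s implies M_r <= M_s:
# harmonic (r = -1) <= arithmetic (r = 1) <= quadratic (r = 2)
chain = [M(x, -1), M(x, 1), M(x, 2)]
```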
21. Bounds on means and standard deviations
- Sometimes the complete dataset is not available. Here we present computational bounds for the transformed data z_k = t(x_k), for the case where the transformation t is known but neither of the complete data sequences {x_k} and {z_k} is available.
- It is assumed that the following statistics of the original data sequence {x_k} are available:
- the mean µ = (1/N) Σ x_i,
- the standard deviation s = ((1/N) Σ (x_i − µ)²)^(1/2),
- the sample minimum m = min{x_k},
- the sample maximum M = max{x_k}.
22. Bounds on means and standard deviations, cont'd.
- The transformation function t is studied in more detail by Rowe (1988).
- The key tool is Jensen's inequality: for a convex function F: [a, b] → R and a weight function p(x) ≥ 0 satisfying the normalization condition ∫ p(x) dx = 1, we have F(∫ f(x) p(x) dx) ≤ ∫ F(f(x)) p(x) dx.
- Taking f(x) = x, we obtain F(E(x)) ≤ E(F(x)), which is the basis for the results that follow.
23. Bounds on means and standard deviations, cont'd.
- The simplest of Rowe's results: given only the mean µ, the minimum m, and the maximum M of the original sequence, for convex t, t(µ) ≤ (1/N) Σ t(x_i) ≤ t(m) + ((µ − m)/(M − m))(t(M) − t(m)).
- From this we can, for example, obtain bounds for the harmonic mean: mM/(M + m − µ) ≤ ((1/N) Σ 1/x_i)^(−1) ≤ µ.
- Rowe's results yield bounds for other functions as well; these are discussed in more detail in Section 5.3.3.
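The harmonic-mean bounds quoted above can be checked on a small dataset. This is a sketch of the convexity argument applied to t(x) = 1/x, assuming 0 < m ≤ µ ≤ M; the data are made up:

```python
def harmonic_bounds(mu, m, M):
    """Bounds on the harmonic mean from the mean mu, minimum m, and
    maximum M alone, via the convexity of t(x) = 1/x."""
    return m * M / (M + m - mu), mu

x = [1.0, 2.0, 4.0]
mu, m, M_ = sum(x) / len(x), min(x), max(x)
lo, hi = harmonic_bounds(mu, m, M_)
H = len(x) / sum(1.0 / xi for xi in x)   # actual harmonic mean
# H lies between lo and hi even though only mu, m, M were used.
```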
24. Inequalities as uncertainty descriptions
- The set-theoretic or unknown-but-bounded data model is an alternative to the more popular random data model.
- This set-theoretic model is closely related to the ideas of interval arithmetic.
- Real data values x are replaced with closed, bounded intervals X = [X⁻, X⁺], for some X⁻ < X⁺.
- Bounds on data characterizations computed from an interval-valued data sequence do not depend on distributional assumptions, but instead on strictly weaker bounding assumptions.
- Define the midpoint m(X) = (X⁻ + X⁺)/2 and the width w(X) = X⁺ − X⁻ of the interval.
- The interval may then be represented as X = m(X) + (w(X)/2)[−1, 1].
- This is analogous to normalizing a random variable x with mean µ and standard deviation s to obtain the zero-mean, unit-variance random variable z: z = (x − µ)/s ⇔ x = µ + sz.
25. Interval arithmetic
- To use the interval data model, we need to define interval arithmetic (Moore, 1979):
- Addition: [X⁻, X⁺] + [Y⁻, Y⁺] = [X⁻ + Y⁻, X⁺ + Y⁺].
- Subtraction: [X⁻, X⁺] − [Y⁻, Y⁺] = [X⁻ − Y⁺, X⁺ − Y⁻].
- Multiplication: [X⁻, X⁺] × [Y⁻, Y⁺] = [min(X⁻Y⁻, X⁻Y⁺, X⁺Y⁻, X⁺Y⁺), max(X⁻Y⁻, X⁻Y⁺, X⁺Y⁻, X⁺Y⁺)].
- Division: [X⁻, X⁺] / [Y⁻, Y⁺] = [X⁻, X⁺] × [1/Y⁺, 1/Y⁻], provided 0 ∉ [Y⁻, Y⁺].
- Interval arithmetic operations are not fully equivalent to their real-number counterparts, so be careful when using them.
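The four operations above can be sketched with tuples (the function names are ours):

```python
def iadd(X, Y):
    return (X[0] + Y[0], X[1] + Y[1])

def isub(X, Y):
    return (X[0] - Y[1], X[1] - Y[0])

def imul(X, Y):
    # All four endpoint products; the result spans their range.
    p = [X[0] * Y[0], X[0] * Y[1], X[1] * Y[0], X[1] * Y[1]]
    return (min(p), max(p))

def idiv(X, Y):
    if Y[0] <= 0 <= Y[1]:
        raise ZeroDivisionError("0 lies in the denominator interval")
    return imul(X, (1.0 / Y[1], 1.0 / Y[0]))

X = (1.0, 2.0)
diff = isub(X, X)   # (-1, 1), not (0, 0)
```

The last line illustrates the warning above: interval subtraction does not recognize that both operands are the same quantity, so X − X ≠ 0.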
26. Example
- Consider the general 2×2 contingency table summarizing two mutually exclusive characterizations.
- The linear equations relating the cell counts to the row and column totals can be collected into a matrix equation.
Contingency table
27. Example, cont'd.
- We can set bounds for each cell count: 0 ≤ n_ij ≤ min(n_i., n_.j).
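The cell bounds can be computed from the margins alone (a sketch; the margins used are made up, and only the bounds stated above are applied):

```python
def cell_bounds(row_totals, col_totals):
    """For each cell n_ij of a contingency table with known margins,
    return the bounds 0 <= n_ij <= min(n_i., n_.j)."""
    return [[(0, min(r, c)) for c in col_totals] for r in row_totals]

bounds = cell_bounds([10, 5], [8, 7])   # a hypothetical 2x2 table's margins
```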
28. Coda: What is a good data characterization?
- Although these results provide some insight into various goodness criteria, the preceding discussion has not really answered the question posed in the title: what is a good data characterization?
- Goodness may be judged, for example, by the following criteria:
- predictable qualitative behaviour,
- ease of interpretation,
- appropriateness to the application,
- historical acceptance,
- availability,
- computational complexity.
29. Predictable qualitative behaviour
- Data characteristics can be predicted if the characterization obeys functional equations with known rules.
- Such methods are easy to use.
30. Ease of interpretation
- The use of normalized criteria such as Hotelling's skewness measure leads to easier interpretation than non-normalized measures.
- Ease of interpretation usually has a strong application-specific component, related in part to the criterion of historical acceptance.
31. Appropriateness to the application
- This criterion is crucial, but it may not be met in practice if the working assumptions on which the selected characterization is based are badly violated.
- Historically, one of the key motivations for developing robust statistical methods was the need for alternatives to analyses that require Gaussian-distributed data, for example the t-test; see the figure.
32. Example
- The upper two plots show two Gaussian sequences of the same length and the same variance but with different means. This difference is detected by the t-test.
- The lower plots show two data sequences with means 1 and 0, but now the data follow the heavy-tailed Student's t-distribution with two degrees of freedom. Now the hypothesis that the means are equal cannot be rejected.
33. Historical acceptance
- Many types of analysis are so frequently approached using one or more methods with strong historical acceptance that it may require substantial justification to present results obtained with any other method.
- The Gaussian assumption is required in many cases, and the historically most popular method might not be the best one.
- It is best to include standard methods among those considered, both to confirm suspicions of weakness in the historically accepted methods and to demonstrate these weaknesses to others.
34. Availability
- Availability here means the availability of computational software implementing a particular analysis method.
- Free packages like R have improved availability.
- However, one must always be careful about the assumptions made when using a new analysis method.
- Computational methods usually return numerical values even if none of the assumptions made about the data hold.
35. Computational complexity
- Computational complexity remains a significant limitation on the utility of certain data characterizations, despite advances in computing resources.
- In many cases the objective of an algorithm can be divided into several different optimization problems.
36. Conclusions
- Only rarely is one method best with respect to all of these criteria.
- Sometimes no single method is acceptably good with respect to all of them.
- As a consequence, it is important to take a broader view, comparing and evaluating a set of methods, each of which is highly desirable with respect to some of these criteria.
- Such comparisons can be useful:
- they can quantify the uncertainty in the results,
- they can suggest directions for refinement of the analysis that may lead to a more generally satisfactory result.