What Is a Good Data Characterization (PowerPoint presentation transcript)
1
What Is a Good Data Characterization?
  • Reija Autio
  • 7.12.2005

2
Outline
  • Characterization via functional equations
  • Functional equations
  • Homogeneity and its extensions
  • Location-invariance and related conditions
  • Equivariant functions
  • Permutation-invariance
  • Outlier detection procedures
  • Quasi-linear means
  • Results for positive-breakdown estimators
  • Characterization via inequalities
  • Inequalities as aids to interpretation
  • Relations between data characterizations
  • Bounds on means and standard deviations
  • Inequalities as uncertainty descriptions
  • Coda: What is a good data characterization?

3
What is a Good Data Characterization?
  • A good data characterization K of a dataset D
    should depend predictably on certain simple,
    systematic modifications of D.
  • One example is the requirement that
    T(ax + b) = aT(x) + b holds.

4
Characterization via functional equations
  • In favorable cases these functional equations can
    be solved to yield much insight into the behaviour
    of different classes of data characterizations.
  • Functional equations are not as well known to
    non-mathematicians as many other branches of
    mathematics, so the author describes functional
    equations in some detail.

5
Functional equations
  • The term functional equation is quite hard to
    define precisely.
  • A set of simple but very instructive examples is
    given by the four functional equations considered
    by Cauchy.
  • Cauchy's basic equation:
  • f(x + y) = f(x) + f(y) (5.11)
  • which implies f(rx) = r f(x) for rational r (5.17)
  • Cauchy's exponential equation. For real x and y:
  • f(x + y) = f(x)f(y) (5.24)
  • Cauchy's logarithmic equation. For real x and y,
    x ≠ 0, y ≠ 0:
  • f(xy) = f(x) + f(y) (5.29)
  • (usually enough to consider x, y > 0)
  • Cauchy's power equation. For real x and y,
    x ≠ 0, y ≠ 0:
  • f(xy) = f(x)f(y)
  • (usually enough to consider x, y > 0)

6
Homogeneity and its extensions
  • In its simplest form, a function f: R^N → R is
    homogeneous if it satisfies
    f(ax1, ax2, ..., axn) = a f(x1, x2, ..., xn),
    for all x ∈ R^N, a ∈ R.
  • A generalized homogeneous function f: R^N → R must
    satisfy the condition
    f(ax1, ax2, ..., axn) = g(a) f(x1, ..., xn),
    for some function g(·) and for all x ∈ R^N,
    a ∈ R, a ≠ 0.
  • A trivial solution is f(x) = 0 for all x ∈ R^N.

7
Homogeneity and its extensions, contd.
  • If f(x) ≠ 0, then we obtain the functional
    equation g(a)g(b) = g(ab), which is simply
    Cauchy's power equation.
  • Homogeneity of order c can be defined as
  • f(ax1, ax2, ..., axn) = a^c f(x1, x2, ..., xn)
  • If f: R^N → R is homogeneous, then f(0) = 0.

8
Location-invariance and related conditions
  • A function f: R^N → R is location-invariant if it
    satisfies the condition
    f(x1 + c, x2 + c, ..., xn + c) = f(x1, ..., xn) + c,
    for all x ∈ R^N, c ∈ R.
  • A function of the form
    f(x1, ..., xn) = xi + F(x1 - xi, ..., xn - xi)
    satisfies the location-invariance condition when F
    is any function of the n-1 arguments xj - xi,
    where i ≠ j.

9
Equivariant function
  • An equivariant function is both location-invariant
    and homogeneous.
  • A function f: R^N → R is homogeneous if it satisfies
  • f(ax1, ax2, ..., axn) = a f(x1, x2, ..., xn),
    for all x ∈ R^N, a ∈ R.
  • A function f: R^N → R is location-invariant if it
    satisfies the condition
    f(x1 + c, x2 + c, ..., xn + c) = f(x1, x2, ..., xn) + c,
    for all x ∈ R^N, c ∈ R.
  • Now, recall that if f: R^N → R is homogeneous,
    f(0) = 0.
  • Then f(x, x, ..., x) = f(0, 0, ..., 0) + x = x,
    for all real x.
  • Aczél (1966, p. 236) shows that the following
    construction leads to an equivariant function:
  • f(x1, x2, ..., xn) = µ + s G(z1, z2, ..., zn),
  • where µ = avg(x), s = std(x), zi = (xi - µ)/s, and
    G: R^N → R is arbitrary.
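This construction is easy to check numerically. The sketch below uses a hypothetical choice of G (the median of the z-scores) and verifies that f(ax + b) = a f(x) + b for a positive scale a and an arbitrary shift b:

```python
import numpy as np

def equivariant(x, G):
    """Aczel-style construction: f(x) = mu + s * G(z), z = (x - mu)/s."""
    mu = x.mean()
    s = x.std()              # population standard deviation
    z = (x - mu) / s
    return mu + s * G(z)

# An arbitrary (illustrative) choice of G.
G = lambda z: np.median(z)

rng = np.random.default_rng(0)
x = rng.normal(size=10)
a, b = 3.0, 5.0              # positive scale, arbitrary shift

# Location-invariance and homogeneity combine into equivariance:
lhs = equivariant(a * x + b, G)
rhs = a * equivariant(x, G) + b
assert abs(lhs - rhs) < 1e-9
```

Under x → ax + b (a > 0), µ → aµ + b and s → as while the z-scores are unchanged, which is exactly why the construction works.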

10
Permutation-invariance
  • A useful condition often imposed on data
    characterizations is permutation-invariance:
  • f(Px) = f(x) for any permutation P of the
    components of x.
  • This property is a natural one for data where the
    individual observations are assumed exchangeable
    and treated equally.
  • For example, if the function is a weighted average,
    the weights must all be equal in order to have
    permutation-invariance, which in turn implies the
    arithmetic average function.

11
Outlier detection procedures
  • Outlier detection rules introduced previously
    need to be invariant under affine changes of
    measurement units.
  • These conditions are actually functional
    equations that characterize the location
    estimator and scale estimator on which the
    outlier procedure is based.

12
Outlier detection procedures, contd.
  • The location estimator can be represented as
  • µ̂(x1, x2, ..., xn) = µ + s G(z1, z2, ..., zn),
    where µ = avg(x), s = std(x), zi = (xi - µ)/s, and
    G: R^N → R is arbitrary.
  • Solving for G gives
    G(z1, z2, ..., zn) = µ̂(z1, z2, ..., zn).
  • The scale estimator satisfies
    S(x) = S(s z + µ) = s S(z).
  • Now the outlier detection rule in terms of z-scores:
  • |zk - G(z)| > t S(z) ⇒ xk is an outlier.
  • From chapter 3: if G(z) = µ̂(z) = 0 and S(z) = 1,
    this reduces to the standard z-score representation
  • |zk| > t ⇒ xk is an outlier.

13
Outlier detection procedures, contd.
  • Other outlier rules
  • Hampel identifier µ(z)med(z1,z2
    ,...,zn)(x- µ)/ s , where xmed(x)
  • This corresponds to Hotellings skewness measure
    which is known to satisfy the bounds (x- µ)/
    s 1.
  • Now the median based outlier detection criterion
    is
  • zk- (x- µ)/ s gttS(z) ? xk is an outlier.
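A minimal sketch of the Hampel identifier in its common median/MAD form (the MAD plays the role of the scale estimate S; the threshold t and the Gaussian-consistency factor 1.4826 are conventional choices, not taken from the slides):

```python
import numpy as np

def hampel_outliers(x, t=3.0):
    """Flag xk as an outlier when |xk - med(x)| > t * MAD, where the
    MAD is scaled by 1.4826 so it estimates the standard deviation
    consistently for Gaussian data."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))
    return np.abs(x - med) > t * mad

data = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]   # one gross outlier
flags = hampel_outliers(data)
# only the gross outlier 50.0 is flagged
```

Because both the median and the MAD have 50% breakdown points, the rule keeps working even when a sizable fraction of the data is contaminated, unlike the mean/standard-deviation z-score rule.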

14
Quasi-linear means
  • The family of quasi-linear means is a useful
    family of data characterizations.
  • This family has played an important role in both
    the theory of functional equations and the
    classical theory of inequalities.
  • Quasi-linear means are obtained from the formula
  • A(x1, x2, ..., xN) = A(x) = F^{-1}((1/N) Σ F(xi)).
  • The arithmetic mean is obtained when F(x) = x:
    A(x) = (1/N) Σ xi
  • The geometric mean is obtained when F(x) = ln x:
    G(x) = (Π xi)^{1/N}
  • The harmonic mean is obtained when F(x) = 1/x:
    H(x) = ((1/N) Σ (1/xi))^{-1}
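The formula translates directly into code; a short sketch (function names are illustrative):

```python
import math

def quasi_linear_mean(x, F, F_inv):
    """A(x) = F^{-1}( (1/N) * sum of F(x_i) )."""
    return F_inv(sum(F(xi) for xi in x) / len(x))

x = [1.0, 2.0, 4.0]

arithmetic = quasi_linear_mean(x, lambda t: t, lambda t: t)
geometric  = quasi_linear_mean(x, math.log, math.exp)
harmonic   = quasi_linear_mean(x, lambda t: 1 / t, lambda t: 1 / t)

# The three special cases, ordered H <= G <= A
assert harmonic <= geometric <= arithmetic
```

Each choice of a strictly monotone F yields a different mean, but all of them inherit the same quasi-linear structure.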

15
Results for positive-breakdown estimators
  • An estimator T exhibits a finite-sample breakdown
    point of m/n if m outliers in a dataset of size n
    can be chosen in a way that causes T to exhibit
    arbitrarily extreme values.
  • The arithmetic mean exhibits a finite-sample
    breakdown point of 1/n, which goes to zero in the
    limit of infinitely large sample sizes. Estimators
    of this kind are called zero-breakdown estimators.
  • Estimators with finite-sample breakdown points
    m/n > 0 for all n are called positive-breakdown
    estimators.

16
Results for positive-breakdown estimators
  • Positive-breakdown estimators often exhibit one or
    more transformation-invariance properties,
    typically called equivariance properties.
  • An estimator T is scale-equivariant if
  • T({(xi^T, a yi)}) = a T({(xi^T, yi)})
  • An estimator T is regression-equivariant if
  • T({(xi^T, yi + xi^T v)}) = T({(xi^T, yi)}) + v
  • An estimator T is affine-equivariant if
  • T({((A xi)^T, yi)}) = A^{-T} T({(xi^T, yi)})

17
Characterization via inequalities
  • Inequalities arise in data analysis for a number
    of reasons:
  • Inequalities can provide useful guidance in
    interpreting certain data characterizations.
  • Inequalities can provide insight into the
    relationships between different data
    characterizations.
  • Inequalities can be used to estimate bounds on
    data characterizations that cannot be computed
    exactly (for example, when the complete data are
    not available but some statistics are known).
  • Inequalities can be used as an uncertainty
    description in the set-theoretic or
    unknown-but-bounded error model.

18
Inequalities as aids to interpretation
  • The mean (m) and standard deviation (s) are
    extremely useful standard data characterizations
    in many applications.
  • A direct consequence of Chebyshev's inequality:
  • P(|x - m| > a) ≤ s²/a²
  • Cauchy-Schwarz inequality:
  • |Σ xk yk| ≤ (Σ xk²)^{1/2} (Σ yk²)^{1/2}
  • The product-moment correlation coefficient ρx,y
    satisfies, by the Cauchy-Schwarz inequality,
  • |ρx,y| ≤ 1
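The bound |ρx,y| ≤ 1 can be checked directly by computing both sides of the Cauchy-Schwarz inequality on centered data; a quick numerical sketch with arbitrary simulated sequences:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)

# Center the sequences, then apply Cauchy-Schwarz
xc, yc = x - x.mean(), y - y.mean()
lhs = abs(np.sum(xc * yc))
rhs = np.sqrt(np.sum(xc ** 2)) * np.sqrt(np.sum(yc ** 2))
assert lhs <= rhs              # Cauchy-Schwarz inequality

r = np.sum(xc * yc) / rhs      # product-moment correlation coefficient
assert abs(r) <= 1.0           # the bound |r| <= 1 follows immediately
```

The correlation coefficient is exactly the Cauchy-Schwarz ratio for the centered sequences, which is why the bound holds with equality only when one sequence is an affine function of the other.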

19
Relations between data characterizations
  • It is important to recognize that inequalities
    sometimes relate different data characterizations
    of the same data sequence.
  • One classical inequality with many applications is
    the AGM (arithmetic-geometric mean) inequality:
  • G(x1, ..., xN) = (Π xi)^{1/N} ≤ (1/N) Σ xi =
    A(x1, ..., xN), when xi ≥ 0 for all i.
  • This inequality extends to weighted arithmetic and
    geometric means:
  • LG(x1, x2, ..., xN) ≤ LA(x1, x2, ..., xN)

20
AGM inequality
  • The AGM inequality corresponds to an important
    special case of a more general inequality between
    the generalized means Mr:
  • Mr(x1, x2, ..., xN) = (Σ ai xi^r)^{1/r}, ai ≥ 0,
    Σ ai = 1, r ≠ 0, xi > 0.
  • The family of generalized means satisfies the
    inequality
  • r < s ⇒ Mr(x1, x2, ..., xN) ≤ Ms(x1, x2, ..., xN),
    and the inequality is strict unless xi = x for
    all i.
  • In the special case r = -1 we obtain the harmonic
    mean.
  • It follows that the harmonic, geometric, and
    arithmetic means are related by
  • (Σ ai/xi)^{-1} ≤ Π xi^{ai} ≤ Σ ai xi,
  • ai ≥ 0, Σ ai = 1, xi > 0 for all i,
  • and as before, both inequalities are strict unless
    xi = x for all i.
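The monotonicity of Mr in r is easy to illustrate numerically; weights and data below are arbitrary examples:

```python
def M(x, a, r):
    """Weighted generalized mean M_r(x) = (sum a_i * x_i**r) ** (1/r),
    for r != 0, with a_i >= 0 and sum a_i = 1."""
    return sum(ai * xi ** r for ai, xi in zip(a, x)) ** (1.0 / r)

x = [1.0, 2.0, 4.0]
a = [0.2, 0.3, 0.5]          # weights: a_i >= 0, sum a_i = 1

values = [M(x, a, r) for r in (-2, -1, 1, 2, 3)]
# r < s implies M_r <= M_s: the sequence must be non-decreasing
assert all(v1 <= v2 for v1, v2 in zip(values, values[1:]))
```

Here r = -1 gives the weighted harmonic mean and r = 1 the weighted arithmetic mean, so the harmonic-geometric-arithmetic chain above is recovered as a special case.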

21
Bounds on means and standard deviations
  • Sometimes the complete dataset is not available.
    Let us present computational bounds for the
    transformed data zk = t(xk) for the case where
    the transformation t is known but neither of the
    complete data sequences {xk} and {zk} is
    available.
  • It is assumed that the following statistics for
    the original data sequence {xk} are available:
  • the mean µ = (1/N) Σ xi
  • the standard deviation s = ((1/N) Σ (xi - µ)²)^{1/2}
  • the sample minimum m = min{xk}
  • the sample maximum M = max{xk}

22
Bounds on means and standard deviations
  • The transform function t is studied in more detail
    by Rowe (1988).
  • For a convex function F: [a, b] → R and a weight
    function p(x) ≥ 0 satisfying the normalization
    condition ∫ p(x) dx = 1, Jensen's inequality gives
  • F(∫ f(x) p(x) dx) ≤ ∫ F(f(x)) p(x) dx.
  • If we take f(x) = x, we obtain F(E(x)) ≤ E(F(x)),
    which is the basis for the next results.

23
Bounds on means and standard deviations
  • The simplest of Rowe's results: given only the
    mean µ, the minimum m, and the maximum M of the
    original sequence, for convex t,
  • t(µ) ≤ (1/N) Σ t(xi) ≤
    t(m) + [(µ - m)/(M - m)][t(M) - t(m)]
  • Now we can, for example, obtain bounds for the
    harmonic mean (taking t(x) = 1/x):
  • m ≤ mM/(M + m - µ) ≤ ((1/N) Σ 1/xi)^{-1} ≤ µ ≤ M
  • With Rowe's rules we can obtain bounds for other
    functions as well. These are discussed in more
    detail in chapter 5.3.3.
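The harmonic-mean bounds can be checked against a small dataset where the true harmonic mean is computable; a sketch (the function name is illustrative):

```python
def harmonic_mean_bounds(mu, m, M):
    """Bounds on the harmonic mean H from the mean mu, minimum m and
    maximum M alone (x_i > 0 assumed):
    m <= m*M/(M + m - mu) <= H <= mu <= M."""
    return m * M / (M + m - mu), mu

x = [1.0, 2.0, 4.0, 8.0]
mu, m, M = sum(x) / len(x), min(x), max(x)       # mu = 3.75
H = len(x) / sum(1 / xi for xi in x)             # true harmonic mean
lo, hi = harmonic_mean_bounds(mu, m, M)
assert m <= lo <= H <= hi <= M
```

The lower bound comes from the secant (upper) bound on (1/N) Σ 1/xi, inverted; the upper bound H ≤ µ is just the AGM-type inequality.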

24
Inequalities as uncertainty descriptions
  • The set-theoretic or unknown-but-bounded data
    model can be an alternative to the more popular
    random data model.
  • This set-theoretic model is closely related to the
    ideas of interval arithmetic.
  • Real data values x are replaced with closed,
    bounded intervals X = [X⁻, X⁺], for some X⁻ < X⁺.
  • Bounds on data characterizations computed from an
    interval-valued data sequence do not depend on
    distributional assumptions, but instead on
    strictly weaker bounding assumptions.
  • To see why, consider the following:
  • The midpoint of the interval: m(X) = (X⁻ + X⁺)/2
  • The width: w(X) = X⁺ - X⁻
  • Now the interval may be represented as
    X = m(X) + (w(X)/2)·[-1, 1].
  • This is analogous to normalizing a random variable
    with mean µ and standard deviation s to obtain a
    zero-mean, unit-variance random variable z:
  • z = (x - µ)/s ⇔ x = µ + s z

25
Interval arithmetic
  • To use the interval data model, we need to define
    interval arithmetic (Moore, 1979):
  • Addition: [X⁻, X⁺] + [Y⁻, Y⁺] = [X⁻ + Y⁻, X⁺ + Y⁺]
  • Subtraction: [X⁻, X⁺] - [Y⁻, Y⁺] = [X⁻ - Y⁺, X⁺ - Y⁻]
  • Multiplication: [X⁻, X⁺] · [Y⁻, Y⁺] =
    [min(X⁻Y⁻, X⁻Y⁺, X⁺Y⁻, X⁺Y⁺),
     max(X⁻Y⁻, X⁻Y⁺, X⁺Y⁻, X⁺Y⁺)]
  • Division: [X⁻, X⁺] / [Y⁻, Y⁺] =
    [X⁻, X⁺] · [1/Y⁺, 1/Y⁻], provided 0 is not in
    [Y⁻, Y⁺].
  • Interval arithmetic operations are not fully
    equivalent to their real-number counterparts, so
    be careful when using them.
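These four rules translate into a small class; a sketch of Moore's definitions (the class name and API are illustrative):

```python
class Interval:
    """Closed interval [lo, hi] with Moore's interval arithmetic."""
    def __init__(self, lo, hi):
        assert lo <= hi
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        p = [self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi]
        return Interval(min(p), max(p))

    def __truediv__(self, other):
        if other.lo <= 0 <= other.hi:
            raise ZeroDivisionError("0 in divisor interval")
        return self * Interval(1 / other.hi, 1 / other.lo)

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

X, Y = Interval(1, 2), Interval(3, 4)
print(X + Y, X * Y)        # → [4, 6] [3, 8]
# Caution: X - X yields [-1, 1], not [0, 0]; interval operations are
# not fully equivalent to their real-number counterparts.
```

The X - X example is the simplest illustration of the dependency problem mentioned in the last bullet above: interval arithmetic treats the two operands as independent.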

26
Example
  • Consider the general 2x2 table summarizing two
    mutually exclusive characterizations.
  • Now the linear equations can be collected into a
    matrix.

(Figure: 2x2 contingency table)
27
Example, contd
  • We can set bounds for each cell count:
    0 ≤ nij ≤ min(ni., n.j)
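These cell bounds follow from the margins alone; a minimal sketch with hypothetical margin totals:

```python
def cell_upper_bounds(row_totals, col_totals):
    """Upper bounds for each cell of a contingency table given only
    the margins: 0 <= n_ij <= min(n_i., n_.j)."""
    return [[min(ri, cj) for cj in col_totals] for ri in row_totals]

# Hypothetical 2x2 margins for illustration (row and column totals)
bounds = cell_upper_bounds([30, 70], [40, 60])
print(bounds)  # → [[30, 30], [40, 60]]
```

Each cell count can never exceed either of the two margins it contributes to, which is all the bound says.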

28
Coda: What is a good data characterization?
  • Although all of these functions provide some
    insight into various goodness criteria, the
    preceding discussion has not really answered the
    question posed in the title: what is a good data
    characterization?
  • Goodness may be judged, for example, by the
    following criteria:
  • predictable qualitative behaviour,
  • ease of interpretation,
  • appropriateness to the application,
  • historical acceptance,
  • availability,
  • computational complexity.

29
Predictable qualitative behaviour
  • Data characteristics can be predicted if the
    characterization obeys functional equations with
    known solutions.
  • Such methods are easy to use.

30
Ease of interpretation
  • The use of normalized criteria such as Hotelling's
    skewness measure leads to easier interpretation
    than non-normalized measures.
  • Ease of interpretation usually has a strong
    application-specific component, related in part to
    the criterion of historical acceptance.

31
Appropriateness to the application
  • Crucial, but may not be met in practice if the
    working assumptions on which the selected
    characterization is based are badly violated.
  • Historically, one of the key motivations for
    developing robust statistical methods was the need
    for alternatives to analyses that require
    Gaussian-distributed data,
  • for example, the t-test. See Figure.

32
Example
  • The upper two plots show two Gaussian sequences of
    the same length and the same variance but with
    different means. This difference is detected by a
    t-test.
  • The lower plots show two data sequences with means
    1 and 0, but now the data follow a heavy-tailed
    Student's t distribution with two degrees of
    freedom. Now the hypothesis that the sample means
    are equal cannot be rejected.

33
Historical acceptance
  • Many types of analysis are so frequently
    approached using one or more methods with strong
    historical acceptance that it may require
    substantial justification to present results
    obtained with any other method.
  • The Gaussian assumption underlies many of these
    historically accepted methods, and the most
    frequently used method may not be the best one.
  • It is best to include standard methods among those
    considered, both to confirm suspicions of weakness
    in the historically accepted methods and to
    demonstrate these weaknesses to others.

34
Availability
  • Availability refers to the computational software
    needed to implement a particular analysis method.
  • Free packages like R have improved availability.
  • However, one always has to be careful about what
    assumptions are made when using a new analysis
    method.
  • Computational methods usually return numerical
    values even if none of the assumptions made about
    the data hold.

35
Computational complexity
  • Computational complexity is still a significant
    limitation on the utility of certain data
    characterizations, despite advances in computing
    resources.
  • In many cases the objective of an algorithm can be
    divided into several different optimization
    problems.

36
Conclusions
  • Only rarely is one method best with respect to all
    of these criteria.
  • Sometimes no single method is acceptably good with
    respect to all of them.
  • As a consequence, it is important to take a
    broader view, comparing and evaluating a set of
    methods, each of which is highly desirable with
    respect to some of these criteria.
  • Such comparisons can be useful:
  • They can quantify the uncertainty in the results.
  • They can suggest directions for refining the
    analysis that may lead to a more generally
    satisfactory result.