Transcript and Presenter's Notes

Title: RISK MEASURES AND ESTIMATION ERROR


1
RISK MEASURES AND ESTIMATION ERROR
  • Imre Kondor
  • Collegium Budapest and Eötvös University,
    Budapest
  • International workshop on Financial Risk, Market
    Complexity, and Regulation
  • Collegium Budapest, 8-10 October, 2009

  • This work has been supported by the National
    Office for Research and Technology under grant
    No. KCKHA005

2
Coworkers
  • Szilárd Pafka (ELTE PhD student → CIB Bank → Paycom.net, California)
  • István Varga-Haszonits (ELTE PhD student → Morgan Stanley)
  • Susanne Still (University of Hawaii)

3
Contents
  • Instability of risk measures
  • Wider context: model building for complex systems

4
  • I. ESTIMATION ERROR AND INSTABILITY IN PORTFOLIO SELECTION

5
Consider the following trivial investment problem
(N = 2, T = 1)
  • N = 2 assets with returns x1 and x2, iid normal, say,
  • and a sample of size T = 1, that is, a single observation.
  • Let us choose the Maximal Loss (ML), the best combination of the worst losses, as our risk measure.
  • This is a coherent measure, in the sense of Artzner et al., and a limiting case of Expected Shortfall (ES).



6
  • In the particular case of N = 2, T = 1 the risk to be minimized is the single observed loss, -(w1 x1 + w2 x2),
  • subject to w1 + w2 = 1, that is, w2 = 1 - w1.
  • Our optimization problem is then to minimize -(w1 x1 + (1 - w1) x2) over w1.
  • Obviously, the solution is
  • w1 → +∞, w2 → -∞, for x1 > x2, and
  • w1 → -∞, w2 → +∞, for x1 < x2.

7
The two cases
  • ML as a risk measure is unbounded with probability 1 if N = 2 and T = 1.
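
A minimal numerical illustration of this unboundedness, with made-up return values (not from the talk); it just evaluates the T = 1 Maximal Loss along increasingly leveraged portfolios:

# Single observation of two asset returns (hypothetical numbers, x1 > x2).
x1, x2 = 0.03, 0.01

def maximal_loss(w1):
    # For T = 1 the Maximal Loss is just the one observed portfolio loss,
    # with the budget constraint w2 = 1 - w1.
    w2 = 1.0 - w1
    return -(w1 * x1 + w2 * x2)

# The objective keeps decreasing as w1 grows: there is no finite optimum.
for w1 in [1, 10, 100, 1000]:
    print(w1, maximal_loss(w1))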

8
If there are some constraints, e.g. short selling
is banned:

(Figure: with the budget constraint w1 + w2 = 1 and short selling banned, the optimal weights are (1, 0) or (0, 1), each with probability 1/2.)
9
So, for N = 2, T = 1
  • Without constraints:
  • the risk measure ML is not bounded with probability 1; there is no solution, and we are tempted to go infinitely long in the dominant item and infinitely short in the dominated one.
  • With constraints:
  • the risk measure is bounded but monotonic, so with probability 1 we go as long as allowed by the constraint in the dominating item, and as short as necessary for the budget constraint to be satisfied in the dominated one.

10
The same for N = 2 and T = 2
  • Now the risk to minimize is the worst loss over the two observations, max{ -(w1 x11 + w2 x21), -(w1 x12 + w2 x22) }, where w1 + w2 = 1 and xit denotes the return of asset i at time t.
  • There is no solution if x11 > x21 and x12 > x22, or x11 < x21 and x12 < x22, that is, when one of the items dominates the other in the sample. This happens with probability 1/2 (assuming iid variables, say).
  • There is a finite solution if x11 > x21 and x12 < x22, or x11 < x21 and x12 > x22, that is, when neither of the items dominates the other. The probability of this event is 1/2 again.
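
A quick Monte Carlo check of this 1/2 (a sketch with arbitrary sample count and seed, assuming standard normal iid returns):

import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000
dominated = 0
for _ in range(n_trials):
    x = rng.standard_normal((2, 2))   # rows: the two assets, columns: the T = 2 observations
    d = x[0] - x[1]
    # one asset dominates the other iff the return difference has the same sign at both times
    if np.all(d > 0) or np.all(d < 0):
        dominated += 1
print(dominated / n_trials)           # comes out close to 0.5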

11
Geometrically
12
  • For N = 2, T = 2 there is no solution with probability 1/2, and there is a finite solution with probability 1/2.
  • When one of the items dominates, there is no finite solution, unless we impose some constraints. Then we go as long as allowed by the constraints in the dominating item, and as short as necessary in the dominated one.
  • When neither of them dominates, we have a finite solution that may or may not fall inside the allowed region.

13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
(No Transcript)
17
  • The existence of a finite solution depends on the sample.
  • Although the constraints may prevent the solution from running away to infinity, they do not stabilize it: if a set of weights vanishes for a given sample, a different set will vanish for the next sample; the solution jumps around on the boundaries of the allowed region.
  • The smaller the ratio N/T, the larger the probability of a finite solution, and the smaller the generalization error.
  • In real life N/T is almost never small; the limit N, T → ∞, with N/T fixed, is closer to reality.

18
Probability of finding a solution for the minimax
problem (general N and T, elliptic underlying
distribution)
In the limit N, T → ∞, with N/T fixed, the transition becomes sharp at N/T = 1/2. The estimation error diverges as we go to N/T = 1/2 from below.
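
One way to see this transition numerically is to write the minimax (ML) problem as a linear program and let the solver report whether the objective is unbounded. The sketch below does this with scipy; N, the number of samples, and the seed are arbitrary choices, and at finite N the transition is a smooth crossover around N/T = 1/2 rather than a sharp step:

import numpy as np
from scipy.optimize import linprog

def ml_has_solution(X):
    # Is min_w max_t -(w·x_t), subject to sum(w) = 1, bounded?  X has shape (N, T).
    N, T = X.shape
    c = np.zeros(N + 1); c[-1] = 1.0                    # minimize the auxiliary variable u
    A_ub = np.hstack([-X.T, -np.ones((T, 1))])          # -(w·x_t) - u <= 0 for every t
    b_ub = np.zeros(T)
    A_eq = np.append(np.ones(N), 0.0).reshape(1, -1)    # budget constraint sum(w) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] * (N + 1), method="highs")
    return res.status == 0                              # 0 = optimal, 3 = unbounded

rng = np.random.default_rng(1)
N = 20
for ratio in [0.3, 0.4, 0.5, 0.6, 0.7]:
    T = int(N / ratio)
    hits = sum(ml_has_solution(rng.standard_normal((N, T))) for _ in range(200))
    print(f"N/T = {ratio:.1f}   P(solution) ~ {hits / 200:.2f}")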

19
Generalization I: Expected Shortfall
  • ES is the conditional expectation value of losses above a high threshold. It has an obvious meaning, it is easy to determine from historical time series, and it can be optimized via linear programming (a sketch of such a linear program is given below). ML is the limiting case of ES, when the threshold goes to 1.
  • ES shows the same instability as ML, but the locus of this instability depends not only on N/T, but also on the threshold β above which the conditional average is calculated. So there will be a critical line.
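
The linear program referred to above is, in the standard Rockafellar–Uryasev formulation (this is the textbook construction, not code from the talk; the threshold is called beta here):

import numpy as np
from scipy.optimize import linprog

def optimize_es(X, beta=0.95):
    # Historical Expected Shortfall minimization for returns X of shape (N, T).
    # Variables: weights w (N), a VaR-like level a, and slacks u_t >= 0.
    # Returns the optimal weights, or None if the LP is unbounded (the "no solution" phase).
    N, T = X.shape
    c = np.concatenate([np.zeros(N), [1.0], np.full(T, 1.0 / ((1.0 - beta) * T))])
    A_ub = np.hstack([-X.T, -np.ones((T, 1)), -np.eye(T)])   # u_t >= -w·x_t - a
    b_ub = np.zeros(T)
    A_eq = np.concatenate([np.ones(N), [0.0], np.zeros(T)]).reshape(1, -1)
    bounds = [(None, None)] * (N + 1) + [(0, None)] * T      # w and a free, u_t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    return res.x[:N] if res.status == 0 else None

Running this on random samples with N/T pushed towards the critical line produces more and more unbounded instances, which is the instability discussed on the following slides.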

20
  • This critical line or phase boundary for ES has been obtained numerically in I. K., Sz. Pafka, and G. Nagy: Noise sensitivity of portfolio selection under various risk measures, Journal of Banking and Finance 31, 1545-1573 (2007), and calculated analytically in S. Ciliberti, I. K., and M. Mézard: On the Feasibility of Portfolio Optimization under Expected Shortfall, Quantitative Finance 7, 389-396 (2007).

The estimation error diverges as one
approaches the phase boundary from below
21
Generalization stage II: Coherent measures
  • The intuitive explanation for the instability of ES and ML is that for a given finite sample there may exist a dominant item (or a dominant combination of items) that produces a larger return at each time point than any of the others, even if no such dominance relationship exists between them on very large samples. This leads the investor to believe that if she goes extremely long in the dominant item and extremely short in the rest, she can produce an arbitrarily large return on the portfolio, at a risk that goes to minus infinity (i.e. no risk).

22
Coherent measures on a given sample
  • Such apparent arbitrage can show up for any coherent risk measure. (I. K. and I. Varga-Haszonits: Feasibility of portfolio optimization under coherent risk measures, submitted to Quantitative Finance)
  • Assume that the finite-sample estimator of our risk measure satisfies the coherence axioms (Ph. Artzner, F. Delbaen, J.-M. Eber, and D. Heath: Coherent measures of risk, Mathematical Finance 9, 203-228 (1999)).




23
The formal statements corresponding to the above
intuition
  • Proposition 1. If there exist two portfolios u and v such that u dominates v at every time point of the sample, then the portfolio optimisation task has no solution under any coherent measure.
  • Proposition 2. Optimisation under ML has no solution if and only if there exists a pair of portfolios such that one of them strictly dominates the other.
  • Neither of these theorems assumes anything about the underlying distribution.
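
A way one might test the condition of Proposition 2 on a concrete sample is the LP feasibility check sketched below (my own illustration, not from the paper; the box on d only normalizes the direction, and the tolerance is arbitrary):

import numpy as np
from scipy.optimize import linprog

def strict_dominance_exists(X):
    # Is there a direction d with sum(d) = 0 whose sample return X.T @ d is strictly
    # positive at every time point?  If so, v and u = v + d form a strictly dominating pair.
    N, T = X.shape
    c = np.zeros(N + 1); c[-1] = -1.0                    # maximize the worst margin s
    A_ub = np.hstack([-X.T, np.ones((T, 1))])            # s - (X.T @ d)_t <= 0
    b_ub = np.zeros(T)
    A_eq = np.append(np.ones(N), 0.0).reshape(1, -1)     # sum(d) = 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[0.0],
                  bounds=[(-1.0, 1.0)] * N + [(None, None)], method="highs")
    return res.status == 0 and -res.fun > 1e-9           # optimal margin strictly positive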


26
Further generalization
  • As a matter of fact, this type of instability
    appears even beyond the set of coherent risk
    measures, and may appear in downside risk
    measures in general.
  • By far the most widely used risk measure today is Value at Risk (VaR). It is a downside measure. It is not convex; therefore the stability problem of its historical estimator is ill-posed.
  • Parametric VaR, however, is convex, and this
    allows us to study the stability problem. Along
    with VaR, we also look into the closely related
    parametric estimate for ES.
  • Parametric estimates are expected to be more
    stable than historical ones. We will then be able
    to compare the phase diagrams for the historical
    and parametric ES.

30
Parametric estimation of VaR, ES, and
semi-variance
  • For simplicity, we assume that the historical
    data are fitted to a Gaussian underlying process.
  • For a Gaussian process all three risk measures
    can be written as
  • ,
  • where
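
The formula itself appeared as an image; for a Gaussian portfolio return with mean μ_w and standard deviation σ_w the standard expressions (my notation; α denotes the confidence level) are

  \mathrm{VaR}_\alpha(w) = \Phi^{-1}(\alpha)\,\sigma_w - \mu_w ,
  \qquad
  \mathrm{ES}_\alpha(w) = \frac{\varphi\left(\Phi^{-1}(\alpha)\right)}{1-\alpha}\,\sigma_w - \mu_w ,

with Φ the standard normal distribution function (expressible through the error function, as the next slide notes) and φ its density. The slide's point is that all three measures take this common form: a measure-specific constant times σ_w, minus μ_w.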

31
  • Here is the error function.
  • The condition for the existence of an optimum for
    VaR and ES is
  • ,
  • where

32
  • Note that there is no unconditional optimum even
    if we know the underlying process exactly.
  • It can be shown that the meaning of the condition
    is similar to the previous one (think e.g. of a
    portfolio with one exceptionally high return item
    that has a variance comparable to the others).
  • If we do not know the true process, but assume it
    is, say, a Gaussian, we may estimate its mean
    returns and covariances from the observed finite
    time series as
  • and
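
The estimators shown on the slide were images; the standard sample estimators that fit this description are (up to the 1/T versus 1/(T-1) normalization convention)

  \hat{\mu}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it} ,
  \qquad
  \hat{\sigma}_{ij} = \frac{1}{T}\sum_{t=1}^{T} \left(x_{it}-\hat{\mu}_i\right)\left(x_{jt}-\hat{\mu}_j\right) .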

33
  • Assume, for simplicity, that all the mean
    returns are zero. After a long and tedious
    application of the replica method imported from
    the theory of random systems, the solvability
    condition works out to be
  • <
  • for all three risk measures. Note that this is
    stronger than the solvability condition for the
    exactly known process.

34
For the parametric VaR and ES the result is shown
in the figure
35
  • In the region above the respective phase boundaries the optimization problem does not have a solution.
  • In the region below the phase boundary there is a solution, but for it to be a good approximation to the true risk we must go deep into the feasible region. If we go to the phase boundary from below, the estimation error diverges.
  • The phase boundary for ES runs above that of VaR, so for a given confidence level β the critical ratio for ES is larger than for VaR (we need less data in order to have a solution). For practically important values of β (95-99%) the difference is not significant.


38
Parametric vs. historical estimates
  • The parametric ES curve runs above the historical one: we need less data to have a solution when the risk is estimated parametrically than when we use raw historical data. It seems as if we had some additional information in the parametric approach.
  • Where does this information come from?
  • It is injected into the calculation "by hand", when fitting the data to an independently chosen probability distribution.


40
Adding linear constraints
  • In practice, portfolio optimization is always
    subject to some constraints on the allowed range
    of the weights, such as a ban on short selling
    and/or limits on various assets, industrial
    sectors, regions, etc. These constraints restrict
    the region over which the optimum is sought to a
    finite volume where no infinite fluctuations can
    appear. One might then think that under such
    constraints the instability discussed above
    disappears completely.

41
  • This is not so. If we work in the vicinity of the phase boundary, sample-to-sample fluctuations in the weights will still be large, but the constraints will prevent the solution from running away to infinity. Instead, it will stick to the walls of the allowed region.
  • For example, for a ban on short selling (wi ≥ 0) these walls will be the coordinate planes, and as N/T increases, more and more of the weights will become zero. This phenomenon is well known in portfolio optimization. (B. Scherer and R. D. Martin: Introduction to Modern Portfolio Optimization with NUOPT and S-PLUS, Springer, New York (2005))
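
This spontaneous shrinkage of the portfolio can be reproduced with the same minimax LP as before, simply by banning short positions; the sketch below (arbitrary N, sample lengths and seed) counts how many weights end up at zero as N/T grows:

import numpy as np
from scipy.optimize import linprog

def ml_weights_long_only(X):
    # Minimize the Maximal Loss under the budget constraint and a short-selling ban w_i >= 0.
    N, T = X.shape
    c = np.zeros(N + 1); c[-1] = 1.0
    A_ub = np.hstack([-X.T, -np.ones((T, 1))])           # -(w·x_t) - u <= 0
    A_eq = np.append(np.ones(N), 0.0).reshape(1, -1)     # sum(w) = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(T), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * N + [(None, None)], method="highs")
    return res.x[:N]

rng = np.random.default_rng(2)
N = 50
for T in [200, 100, 50, 25]:                             # increasing N/T
    w = ml_weights_long_only(rng.standard_normal((N, T)))
    print(f"N/T = {N / T:.2f}   weights at zero: {int(np.sum(w < 1e-8))} of {N}")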


43
  • This spontaneous reduction of diversification is entirely due to estimation error and does not reflect any real structure of the objective function.
  • In addition, for the next sample a completely different set of weights will become zero; the solution keeps jumping about on the walls of the allowed region.
  • Clearly, in this situation the solution reflects the structure of the limit system (i.e. the portfolio manager's beliefs), rather than the structure of the market. Therefore, whenever we are working in or close to the unstable region (which is almost always), the constraints only mask rather than cure the instability.


46
Closing remarks on portfolio selection
  • Given the nature of the portfolio optimization task, one will typically work in that region of parameter space where sample fluctuations are large. Since the critical point where these fluctuations diverge depends on the risk measure, the confidence level, and the method of estimation, one must be aware of how close one's working point is to the critical boundary, otherwise one will be grossly misled by the unstable algorithm.
  • The divergent estimation error due to information deficit is related to the instability discovered by Marsili in complete market models.

47
  • Downside risk measures have been introduced because they ignore positive fluctuations, which investors are not supposed to be afraid of.
  • Perhaps they should be: the downside risk measures display the instability described here, which is basically due to a false arbitrage alert and may induce an investor to take very large positions on the basis of fragile information stemming from finite samples.
  • In a way, the global disaster engulfing us is a macroscopic example of such a folly.

48
(No Transcript)
49
One more step: Portfolio optimization is equivalent to linear regression
50
  • Linear regression is a standard framework in which to attempt to construct a first statistical model.
  • It is ubiquitous (microarrays, medical sciences, epidemiology, sociology, macroeconomics, etc.).
  • It has a time-honored history and works fine, especially if the independent variables are few, there are enough data, and they are drawn from a tight distribution (such as a Gaussian).
  • Complications arise if we have a large number of explanatory variables (their number grows at a rate of 5 per decade) and a limited number of data (as almost always).
  • Then we face a serious estimation error problem.


54
Assume we know the underlying process and
minimize the residual error for an infinitely
large sample
55
In practice we can only minimize the residual
error for a sample of length T
56
The relative error
  • This is a measure of the estimation error.
  • It is a random variable; it depends on the sample.
  • Its distribution strongly depends on the ratio N/T, where N is the number of dimensions and T the sample size.
  • The average of q0 diverges at a critical value of N/T!

57
Critical behaviour for N, T large, with N/T fixed
  • The average of q0 diverges at the critical point N/T = 1, just as in portfolio theory.

The regression coefficients fluctuate wildly unless N/T ≪ 1. Geometric interpretation: one cannot fit a plane to one point.
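
A simulation sketch of this divergence (my own definition of the relative error, as the excess out-of-sample prediction error in units of the noise variance, which may differ from the slide's normalization of q0; N, the sample lengths and the seed are arbitrary):

import numpy as np

rng = np.random.default_rng(3)
N = 50                                   # number of explanatory variables
sigma = 1.0                              # noise level
beta_true = rng.standard_normal(N)

def relative_error(T, n_samples=200):
    # Average excess prediction error of OLS over the noise floor, for samples of length T.
    q = []
    for _ in range(n_samples):
        X = rng.standard_normal((T, N))
        y = X @ beta_true + sigma * rng.standard_normal(T)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        # with standardized iid regressors the excess error is |beta_hat - beta_true|^2
        q.append(np.sum((beta_hat - beta_true) ** 2) / sigma**2)
    return np.mean(q)

for T in [500, 200, 100, 75, 60, 55]:
    print(f"N/T = {N / T:.2f}   relative error ~ {relative_error(T):.2f}")   # blows up as N/T -> 1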
58
MODELING COMPLEX SYSTEMS
59
  • Normally, one is supposed to work in the N ≪ T limit, i.e. with low-dimensional problems and plenty of data.
  • Complex systems are very high dimensional and irreducible (incompressible); they require a large number of explanatory variables for their faithful representation.
  • Therefore, we have to face the unconventional situation in the regression problem that N ~ T, or even N > T, and then the error in the regression coefficients will be large.


62
  • If the number of explanatory variables is very large and they are all of the same order of magnitude, then there is no structure in the system; it is just noise (like a completely random string). So we have to assume that some of the variables have a larger weight than others, but we do not have a natural cutoff beyond which it would be safe to forget about the higher-order variables. This leads us to the assumption that, for complex systems, the regression coefficients must have a scale-free, power-law-like distribution.

63
  • How can we understand that, in the social sciences, medical sciences, etc., we are getting away with insufficient statistics, even with N > T?
  • We are projecting external information into our statistical assessments. (I can draw a well-determined straight line across even a single point, if I know that it must be parallel to another line.)
  • Humans do not optimize, but use quick and dirty heuristics. This has an evolutionary meaning: if something looks vaguely like a leopard, one jumps, rather than trying to find the optimal fit of the observed fragments of the picture to a leopard.
  • We are very good at completing the picture.


66
  • Prior knowledge, the larger picture, values,
    deliberate or unconscious bias, etc. are
    essential features of model building.
  • When we have a chance to check this prior
    knowledge millions of times in carefully designed
    laboratory experiments, this is a well-justified
    procedure.
  • In several applications (macroeconomics, medical
    sciences, epidemiology, etc.) there is no way to
    perform these laboratory checks, and errors may
    build up as one uncertain piece of knowledge
    serves as a prior for another uncertain
    statistical model. This is how we construct
    myths, ideologies and social theories.


69
  • It is conceivable that theory building (in the sense of constructing a low-dimensional model) for social phenomena will prove to be impossible, and the best we will be able to do is to build a life-size computer model of the system, a kind of gigantic SimCity, or Borges' map.
  • By playing and experimenting with these models we may develop an intuition about their complex behaviour that we couldn't gain by observing the single sample of a society or economy.

70
  • THANK YOU!