Transcript and Presenter's Notes

Title: RISK MEASURES AND ESTIMATION ERROR


1
RISK MEASURES AND ESTIMATION ERROR
  • Imre Kondor
  • Collegium Budapest and Eötvös University,
    Budapest
  • International workshop on Financial Risk, Market
    Complexity, and Regulation
  • Collegium Budapest, 8-10 October, 2009

  • This work has been supported by the National
    Office for Research and Technology under grant
    No. KCKHA005

2
Coworkers
  • Szilárd Pafka (ELTE PhD student → CIB Bank → Paycom.net, California)
  • István Varga-Haszonits (ELTE PhD student → Morgan Stanley)
  • Susanne Still (University of Hawaii)

3
Contents
  • Instability of risk measures
  • Wider context: model building for complex systems

4
  • I. ESTIMATION ERROR AND INSTABILITY IN PORTFOLIO SELECTION

5
Consider the following trivial investment problem
(N = 2, T = 1)
  • N = 2 assets with returns x1 and x2, iid normal, say,
  • and a sample of size T = 1, that is, a single observation.
  • Let us choose the Maximal Loss (ML), the best combination of the worst losses, as our risk measure.
  • This is a coherent measure, in the sense of Artzner et al., and a limiting case of Expected Shortfall (ES).



6
  • In the particular case of N = 2, T = 1 the risk to be minimized is the single observed loss, -(w1 x1 + w2 x2),
  • subject to w1 + w2 = 1, that is, w2 = 1 - w1.
  • Our optimization problem is then to minimize -(w1 x1 + (1 - w1) x2) over w1.
  • Obviously, the solution is
  • w1 → +∞, w2 → -∞, for x1 > x2, and
  • w1 → -∞, w2 → +∞, for x1 < x2.

7
The two cases
  • ML as a risk measure is unbounded with probability 1 if N = 2 and T = 1.
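
A minimal numerical illustration of this unboundedness, with made-up return values (not from the talk); it just evaluates the T = 1 Maximal Loss along increasingly leveraged portfolios:

# Single observation of two asset returns (hypothetical numbers, x1 > x2).
x1, x2 = 0.03, 0.01

def maximal_loss(w1):
    # For T = 1 the Maximal Loss is just the one observed portfolio loss,
    # with the budget constraint w2 = 1 - w1.
    w2 = 1.0 - w1
    return -(w1 * x1 + w2 * x2)

# The objective keeps decreasing as w1 grows: there is no finite optimum.
for w1 in [1, 10, 100, 1000]:
    print(w1, maximal_loss(w1))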

8
If there are some constraints, e.g. short selling
is banned:

(Figure: with the budget constraint w1 + w2 = 1 and short selling banned, the optimal weights are (1, 0) or (0, 1), each with probability 1/2.)
9
So, for N = 2, T = 1
  • Without constraints:
  • the risk measure ML is not bounded with probability 1; there is no solution, and we are tempted to go infinitely long in the dominant item and infinitely short in the dominated one.
  • With constraints:
  • the risk measure is bounded but monotonic, so with probability 1 we go as long as allowed by the constraint in the dominating item, and as short as necessary for the budget constraint to be satisfied in the dominated one.

10
The same for N = 2 and T = 2
  • Now the risk to minimize is the worst loss over the two observations, max{ -(w1 x11 + w2 x21), -(w1 x12 + w2 x22) }, where w1 + w2 = 1 and xit denotes the return of asset i at time t.
  • There is no solution if x11 > x21 and x12 > x22, or x11 < x21 and x12 < x22, that is, when one of the items dominates the other in the sample. This happens with probability 1/2 (assuming iid variables, say).
  • There is a finite solution if x11 > x21 and x12 < x22, or x11 < x21 and x12 > x22, that is, when neither of the items dominates the other. The probability of this event is 1/2 again.
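
A quick Monte Carlo check of this 1/2 (a sketch with arbitrary sample count and seed, assuming standard normal iid returns):

import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000
dominated = 0
for _ in range(n_trials):
    x = rng.standard_normal((2, 2))   # rows: the two assets, columns: the T = 2 observations
    d = x[0] - x[1]
    # one asset dominates the other iff the return difference has the same sign at both times
    if np.all(d > 0) or np.all(d < 0):
        dominated += 1
print(dominated / n_trials)           # comes out close to 0.5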

11
Geometrically
12
  • For N = 2, T = 2 there is no solution with probability 1/2, and there is a finite solution with probability 1/2.
  • When one of the items dominates, there is no finite solution, unless we impose some constraints. Then we go as long as allowed by the constraints in the dominating item, and as short as necessary in the dominated one.
  • When neither of them dominates, we have a finite solution that may or may not fall inside the allowed region.

13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
(No Transcript)
17
  • The existence of a finite solution depends on the sample.
  • Although the constraints may prevent the solution from running away to infinity, they do not stabilize it: if a set of weights vanishes for a given sample, a different set will vanish for the next sample; the solution jumps around on the boundaries of the allowed region.
  • The smaller the ratio N/T, the larger the probability of a finite solution, and the smaller the generalization error.
  • In real life N/T is almost never small; the limit N, T → ∞, with N/T fixed, is closer to reality.

18
Probability of finding a solution for the minimax
problem (general N and T, elliptic underlying
distribution)
In the limit N, T → ∞, with N/T fixed, the transition becomes sharp at N/T = 1/2. The estimation error diverges as we go to N/T = 1/2 from below.
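
One way to see this transition numerically is to write the minimax (ML) problem as a linear program and let the solver report whether the objective is unbounded. The sketch below does this with scipy; N, the number of samples, and the seed are arbitrary choices, and at finite N the transition is a smooth crossover around N/T = 1/2 rather than a sharp step:

import numpy as np
from scipy.optimize import linprog

def ml_has_solution(X):
    # Is min_w max_t -(w·x_t), subject to sum(w) = 1, bounded?  X has shape (N, T).
    N, T = X.shape
    c = np.zeros(N + 1); c[-1] = 1.0                    # minimize the auxiliary variable u
    A_ub = np.hstack([-X.T, -np.ones((T, 1))])          # -(w·x_t) - u <= 0 for every t
    b_ub = np.zeros(T)
    A_eq = np.append(np.ones(N), 0.0).reshape(1, -1)    # budget constraint sum(w) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] * (N + 1), method="highs")
    return res.status == 0                              # 0 = optimal, 3 = unbounded

rng = np.random.default_rng(1)
N = 20
for ratio in [0.3, 0.4, 0.5, 0.6, 0.7]:
    T = int(N / ratio)
    hits = sum(ml_has_solution(rng.standard_normal((N, T))) for _ in range(200))
    print(f"N/T = {ratio:.1f}   P(solution) ~ {hits / 200:.2f}")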

19
Generalization I: Expected Shortfall
  • ES is the conditional expectation value of losses above a high threshold. It has an obvious meaning, it is easy to determine from historical time series, and it can be optimized via linear programming (a sketch of such a linear program is given below). ML is the limiting case of ES, when the threshold goes to 1.
  • ES shows the same instability as ML, but the locus of this instability depends not only on N/T, but also on the threshold β above which the conditional average is calculated. So there will be a critical line.
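
The linear program referred to above is, in the standard Rockafellar–Uryasev formulation (this is the textbook construction, not code from the talk; the threshold is called beta here):

import numpy as np
from scipy.optimize import linprog

def optimize_es(X, beta=0.95):
    # Historical Expected Shortfall minimization for returns X of shape (N, T).
    # Variables: weights w (N), a VaR-like level a, and slacks u_t >= 0.
    # Returns the optimal weights, or None if the LP is unbounded (the "no solution" phase).
    N, T = X.shape
    c = np.concatenate([np.zeros(N), [1.0], np.full(T, 1.0 / ((1.0 - beta) * T))])
    A_ub = np.hstack([-X.T, -np.ones((T, 1)), -np.eye(T)])   # u_t >= -w·x_t - a
    b_ub = np.zeros(T)
    A_eq = np.concatenate([np.ones(N), [0.0], np.zeros(T)]).reshape(1, -1)
    bounds = [(None, None)] * (N + 1) + [(0, None)] * T      # w and a free, u_t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    return res.x[:N] if res.status == 0 else None

Running this on random samples with N/T pushed towards the critical line produces more and more unbounded instances, which is the instability discussed on the following slides.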

20
  • This critical line or phase boundary for ES has been obtained numerically in I. K., Sz. Pafka, and G. Nagy: Noise sensitivity of portfolio selection under various risk measures, Journal of Banking and Finance 31, 1545-1573 (2007), and calculated analytically in S. Ciliberti, I. K., and M. Mézard: On the Feasibility of Portfolio Optimization under Expected Shortfall, Quantitative Finance 7, 389-396 (2007).

The estimation error diverges as one
approaches the phase boundary from below
21
Generalization stage II: Coherent measures
  • The intuitive explanation for the instability of ES and ML is that for a given finite sample there may exist a dominant item (or a dominant combination of items) that produces a larger return at each time point than any of the others, even if no such dominance relationship exists between them on very large samples. This leads the investor to believe that if she goes extremely long in the dominant item and extremely short in the rest, she can produce an arbitrarily large return on the portfolio, at a risk that goes to minus infinity (i.e. no risk).

22
Coherent measures on a given sample
  • Such apparent arbitrage can show up for any coherent risk measure. (I. K. and I. Varga-Haszonits: Feasibility of portfolio optimization under coherent risk measures, submitted to Quantitative Finance)
  • Assume that the finite-sample estimator of our risk measure satisfies the coherence axioms (Ph. Artzner, F. Delbaen, J.-M. Eber, and D. Heath: Coherent measures of risk, Mathematical Finance 9, 203-228 (1999)).




23
The formal statements corresponding to the above
intuition
  • Proposition 1. If there exist two portfolios u and v such that u dominates v at every time point of the sample, then the portfolio optimisation task has no solution under any coherent measure.
  • Proposition 2. Optimisation under ML has no solution if and only if there exists a pair of portfolios such that one of them strictly dominates the other.
  • Neither of these theorems assumes anything about the underlying distribution.
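
A way one might test the condition of Proposition 2 on a concrete sample is the LP feasibility check sketched below (my own illustration, not from the paper; the box on d only normalizes the direction, and the tolerance is arbitrary):

import numpy as np
from scipy.optimize import linprog

def strict_dominance_exists(X):
    # Is there a direction d with sum(d) = 0 whose sample return X.T @ d is strictly
    # positive at every time point?  If so, v and u = v + d form a strictly dominating pair.
    N, T = X.shape
    c = np.zeros(N + 1); c[-1] = -1.0                    # maximize the worst margin s
    A_ub = np.hstack([-X.T, np.ones((T, 1))])            # s - (X.T @ d)_t <= 0
    b_ub = np.zeros(T)
    A_eq = np.append(np.ones(N), 0.0).reshape(1, -1)     # sum(d) = 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[0.0],
                  bounds=[(-1.0, 1.0)] * N + [(None, None)], method="highs")
    return res.status == 0 and -res.fun > 1e-9           # optimal margin strictly positive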


26
Further generalization
  • As a matter of fact, this type of instability
    appears even beyond the set of coherent risk
    measures, and may appear in downside risk
    measures in general.
  • By far the most widely used risk measure today is Value at Risk (VaR). It is a downside measure. It is not convex; therefore the stability problem of its historical estimator is ill-posed.
  • Parametric VaR, however, is convex, and this
    allows us to study the stability problem. Along
    with VaR, we also look into the closely related
    parametric estimate for ES.
  • Parametric estimates are expected to be more
    stable than historical ones. We will then be able
    to compare the phase diagrams for the historical
    and parametric ES.

30
Parametric estimation of VaR, ES, and
semi-variance
  • For simplicity, we assume that the historical
    data are fitted to a Gaussian underlying process.
  • For a Gaussian process all three risk measures
    can be written as
  • ,
  • where
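
The formula itself appeared as an image; for a Gaussian portfolio return with mean μ_w and standard deviation σ_w the standard expressions (my notation; α denotes the confidence level) are

  \mathrm{VaR}_\alpha(w) = \Phi^{-1}(\alpha)\,\sigma_w - \mu_w ,
  \qquad
  \mathrm{ES}_\alpha(w) = \frac{\varphi\left(\Phi^{-1}(\alpha)\right)}{1-\alpha}\,\sigma_w - \mu_w ,

with Φ the standard normal distribution function (expressible through the error function, as the next slide notes) and φ its density. The slide's point is that all three measures take this common form: a measure-specific constant times σ_w, minus μ_w.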

31
  • Here is the error function.
  • The condition for the existence of an optimum for
    VaR and ES is
  • ,
  • where

32
  • Note that there is no unconditional optimum even
    if we know the underlying process exactly.
  • It can be shown that the meaning of the condition
    is similar to the previous one (think e.g. of a
    portfolio with one exceptionally high return item
    that has a variance comparable to the others).
  • If we do not know the true process, but assume it
    is, say, a Gaussian, we may estimate its mean
    returns and covariances from the observed finite
    time series as
  • and
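
The estimators shown on the slide were images; the standard sample estimators that fit this description are (up to the 1/T versus 1/(T-1) normalization convention)

  \hat{\mu}_i = \frac{1}{T}\sum_{t=1}^{T} x_{it} ,
  \qquad
  \hat{\sigma}_{ij} = \frac{1}{T}\sum_{t=1}^{T} \left(x_{it}-\hat{\mu}_i\right)\left(x_{jt}-\hat{\mu}_j\right) .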

33
  • Assume, for simplicity, that all the mean
    returns are zero. After a long and tedious
    application of the replica method imported from
    the theory of random systems, the solvability
    condition works out to be
  • <
  • for all three risk measures. Note that this is
    stronger than the solvability condition for the
    exactly known process.

34
For the parametric VaR and ES the result is shown
in the figure
35
  • In the region above the respective phase boundaries the optimization problem does not have a solution.
  • In the region below the phase boundary there is a solution, but for it to be a good approximation to the true risk we must go deep into the feasible region. If we go to the phase boundary from below, the estimation error diverges.
  • The phase boundary for ES runs above that of VaR, so for a given confidence level β the critical ratio for ES is larger than for VaR (we need less data in order to have a solution). For practically important values of β (95-99%) the difference is not significant.


38
Parametric vs. historical estimates
  • The parametric ES curve runs above the historical one: we need less data to have a solution when the risk is estimated parametrically than when we use raw historical data. It seems as if we had some additional information in the parametric approach.
  • Where does this information come from?
  • It is injected into the calculation "by hand", when fitting the data to an independently chosen probability distribution.


40
Adding linear constraints
  • In practice, portfolio optimization is always
    subject to some constraints on the allowed range
    of the weights, such as a ban on short selling
    and/or limits on various assets, industrial
    sectors, regions, etc. These constraints restrict
    the region over which the optimum is sought to a
    finite volume where no infinite fluctuations can
    appear. One might then think that under such
    constraints the instability discussed above
    disappears completely.

41
  • This is not so. If we work in the vicinity of the phase boundary, sample-to-sample fluctuations in the weights will still be large, but the constraints will prevent the solution from running away to infinity. Instead, it will stick to the walls of the allowed region.
  • For example, for a ban on short selling (wi ≥ 0) these walls will be the coordinate planes, and as N/T increases, more and more of the weights will become zero. This phenomenon is well known in portfolio optimization. (B. Scherer and R. D. Martin: Introduction to Modern Portfolio Optimization with NUOPT and S-PLUS, Springer, New York (2005))
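
This spontaneous shrinkage of the portfolio can be reproduced with the same minimax LP as before, simply by banning short positions; the sketch below (arbitrary N, sample lengths and seed) counts how many weights end up at zero as N/T grows:

import numpy as np
from scipy.optimize import linprog

def ml_weights_long_only(X):
    # Minimize the Maximal Loss under the budget constraint and a short-selling ban w_i >= 0.
    N, T = X.shape
    c = np.zeros(N + 1); c[-1] = 1.0
    A_ub = np.hstack([-X.T, -np.ones((T, 1))])           # -(w·x_t) - u <= 0
    A_eq = np.append(np.ones(N), 0.0).reshape(1, -1)     # sum(w) = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(T), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * N + [(None, None)], method="highs")
    return res.x[:N]

rng = np.random.default_rng(2)
N = 50
for T in [200, 100, 50, 25]:                             # increasing N/T
    w = ml_weights_long_only(rng.standard_normal((N, T)))
    print(f"N/T = {N / T:.2f}   weights at zero: {int(np.sum(w < 1e-8))} of {N}")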


43
  • This spontaneous reduction of diversification is entirely due to estimation error and does not reflect any real structure of the objective function.
  • In addition, for the next sample a completely different set of weights will become zero; the solution keeps jumping about on the walls of the allowed region.
  • Clearly, in this situation the solution reflects the structure of the limit system (i.e. the portfolio manager's beliefs), rather than the structure of the market. Therefore, whenever we are working in or close to the unstable region (which is almost always), the constraints only mask rather than cure the instability.


46
Closing remarks on portfolio selection
  • Given the nature of the portfolio optimization task, one will typically work in that region of parameter space where sample fluctuations are large. Since the critical point where these fluctuations diverge depends on the risk measure, the confidence level, and the method of estimation, one must be aware of how close one's working point is to the critical boundary, otherwise one will be grossly misled by the unstable algorithm.
  • The divergent estimation error due to information deficit is related to the instability discovered by Marsili in complete market models.

47
  • Downside risk measures have been introduced because they ignore positive fluctuations, which investors are not supposed to be afraid of.
  • Perhaps they should be: the downside risk measures display the instability described here, which is basically due to a false arbitrage alert and may induce an investor to take very large positions on the basis of fragile information stemming from finite samples.
  • In a way, the global disaster engulfing us is a macroscopic example of such a folly.

48
(No Transcript)
49
One more step: Portfolio optimization is equivalent to linear regression
50
  • Linear regression is a standard framework in which to attempt to construct a first statistical model.
  • It is ubiquitous (microarrays, medical sciences, epidemiology, sociology, macroeconomics, etc.).
  • It has a time-honored history and works fine, especially if the independent variables are few, there are enough data, and they are drawn from a tight distribution (such as a Gaussian).
  • Complications arise if we have a large number of explanatory variables (their number grows at a rate of 5 per decade) and a limited number of data (as almost always).
  • Then we face a serious estimation error problem.


54
Assume we know the underlying process and
minimize the residual error for an infinitely
large sample
55
In practice we can only minimize the residual
error for a sample of length T
56
The relative error
  • This is a measure of the estimation error.
  • It is a random variable; it depends on the sample.
  • Its distribution strongly depends on the ratio N/T, where N is the number of dimensions and T the sample size.
  • The average of q0 diverges at a critical value of N/T!

57
Critical behaviour for N, T large, with N/T fixed
  • The average of q0 diverges at the critical point N/T = 1, just as in portfolio theory.

The regression coefficients fluctuate wildly unless N/T ≪ 1. Geometric interpretation: one cannot fit a plane to one point.
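
A simulation sketch of this divergence (my own definition of the relative error, as the excess out-of-sample prediction error in units of the noise variance, which may differ from the slide's normalization of q0; N, the sample lengths and the seed are arbitrary):

import numpy as np

rng = np.random.default_rng(3)
N = 50                                   # number of explanatory variables
sigma = 1.0                              # noise level
beta_true = rng.standard_normal(N)

def relative_error(T, n_samples=200):
    # Average excess prediction error of OLS over the noise floor, for samples of length T.
    q = []
    for _ in range(n_samples):
        X = rng.standard_normal((T, N))
        y = X @ beta_true + sigma * rng.standard_normal(T)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        # with standardized iid regressors the excess error is |beta_hat - beta_true|^2
        q.append(np.sum((beta_hat - beta_true) ** 2) / sigma**2)
    return np.mean(q)

for T in [500, 200, 100, 75, 60, 55]:
    print(f"N/T = {N / T:.2f}   relative error ~ {relative_error(T):.2f}")   # blows up as N/T -> 1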
58
MODELING COMPLEX SYSTEMS
59
  • Normally, one is supposed to work in the N ≪ T limit, i.e. with low-dimensional problems and plenty of data.
  • Complex systems are very high dimensional and irreducible (incompressible); they require a large number of explanatory variables for their faithful representation.
  • Therefore, we have to face the unconventional situation in the regression problem that N ~ T, or even N > T, and then the error in the regression coefficients will be large.


62
  • If the number of explanatory variables is very large and they are all of the same order of magnitude, then there is no structure in the system; it is just noise (like a completely random string). So we have to assume that some of the variables have a larger weight than others, but we do not have a natural cutoff beyond which it would be safe to forget about the higher-order variables. This leads us to the assumption that, for complex systems, the regression coefficients must have a scale-free, power-law-like distribution.

63
  • How can we understand that, in the social sciences, medical sciences, etc., we are getting away with insufficient statistics, even with N > T?
  • We are projecting external information into our statistical assessments. (I can draw a well-determined straight line across even a single point, if I know that it must be parallel to another line.)
  • Humans do not optimize, but use quick and dirty heuristics. This has an evolutionary meaning: if something looks vaguely like a leopard, one jumps, rather than trying to find the optimal fit of the observed fragments of the picture to a leopard.
  • We are very good at completing the picture.


66
  • Prior knowledge, the larger picture, values,
    deliberate or unconscious bias, etc. are
    essential features of model building.
  • When we have a chance to check this prior
    knowledge millions of times in carefully designed
    laboratory experiments, this is a well-justified
    procedure.
  • In several applications (macroeconomics, medical
    sciences, epidemiology, etc.) there is no way to
    perform these laboratory checks, and errors may
    build up as one uncertain piece of knowledge
    serves as a prior for another uncertain
    statistical model. This is how we construct
    myths, ideologies and social theories.


69
  • It is conceivable that theory building (in the sense of constructing a low-dimensional model) for social phenomena will prove to be impossible, and the best we will be able to do is to build a life-size computer model of the system, a kind of gigantic SimCity, or Borges' map.
  • By playing and experimenting with these models we may develop an intuition about their complex behaviour that we couldn't gain by observing the single sample of a society or economy.

70
  • THANK YOU!