Title: Statistical physics and finance
1. Statistical physics and finance
- I. Kondor
- Collegium Budapest and Eötvös University
- Seminar talk at Morgan Stanley Fixed Income
- Budapest, March 1, 2007
2. Coworkers
- Sz. Pafka (ELTE → CIB Bank → Paycom, Santa Monica)
- G. Nagy (Debrecen University → CIB Bank)
- R. Karádi (Budapest University of Technology → Procter & Gamble)
- N. Gulyás (ELTE → Budapest Bank → Lombard Leasing → ELTE → Collegium Budapest)
- I. Varga-Haszonits (ELTE → Morgan Stanley)
- G. Papp (ELTE)
- A. Ciliberti (Roma → Science & Finance, Paris)
- M. Mézard (Orsay)
3. Contents
- Links between economics and physics
- What can physics offer to finance that mathematics might not?
- Three examples: random matrices, phase transitions, and replicas
4. Early links
- The physics-complex of classical economics
- Maxwell
- Bachelier
5. Physicists in finance
- From the early nineties on, financial institutions have been hiring more and more physicists.
- Some 30-35% of the invited speakers at risk management conferences are ex-physicists.
- Today finance is one of the standard fields of employment for physics graduates and PhDs (EU document on the harmonization of the Bologna-type higher education curricula: Tuning Educational Structures in Europe, http://tuning.unideusto.org/tuningeu/).
6. Econophysics: is there such a thing?
- The term was introduced by H. E. Stanley; it is not universally beloved, but widespread.
- Do these two disciplines have anything to do with each other?
- A trivial answer: we are dealing with stochastic processes in finance, and statistical physics is their main field of application.
- But the theory of stochastic processes in its pure form belongs to probability theory.
7. So the question is: why do banks hire not only probabilists, applied mathematicians, computer scientists, statisticians, etc., but also physicists?
- What is the special knowledge or skill, if any, that physicists can bring into finance? What can physics offer to finance? (Stanley at the Nikkei conference)
- A common, albeit vague, answer: modeling skills, creative use of mathematics, and knowledge of a wide spectrum of approximation and numerical methods may contribute to the market value of physicists.
8. A bit deeper
- Physics has got the farthest in the understanding of strongly interacting systems and collective phenomena.
- Textbook economics is, at best, on the conceptual level of mean-field theory even today (the representative agent).
- The building up of structures and new qualities from simple interactions, emergence, collective coordinates, averaging over microscopic degrees of freedom, etc.: these conceptual tools are hardly known in finance or economics at large (cf. Basel II).
9. Therefore I think that
- some knowledge of quantum mechanics, the many-body problem, field theory, renormalisation, phase transitions, nonlinear and complex systems, etc., although neither necessary nor sufficient, may be useful (as a conceptual introduction or just as a source of metaphors) in the understanding of social phenomena, including the market.
10. In this talk I will illustrate the use of conceptual tools imported from physics on the following three examples:
- Random matrices
- Phase transitions and critical phenomena
- Replica method
11. The concrete field of application will be the problem of portfolio selection
- The basic question: how to distribute our wealth over the set of possible investment instruments so as to earn the highest return at the lowest risk?
- Here I will focus my attention on the minimal risk portfolio, irrespective of the return.
12. The original formulation of the problem
- The returns $r_i$, $i = 1, 2, \ldots, N$, are random variables drawn from a known (say, multivariate normal) distribution, with covariance matrix $\sigma_{ij} = \sigma_i \sigma_j C_{ij}$ ($C_{ij}$ is the correlation matrix, $\sigma_i$ the standard deviation of $r_i$).
- Find the weights $w_i$, $\sum_i w_i = 1$, for which the variance
$$\sigma_P^2 = \sum_{ij} w_i \sigma_{ij} w_j$$
of the portfolio return $r_P = \sum_i w_i r_i$ is minimal.
13. Unconstrained short selling
- We have not stipulated that the weights be positive; they can be of either sign, with an arbitrarily large absolute value. This is obviously unrealistic, for, among other things, liquidity reasons. Nevertheless, it is useful to consider the problem first in this idealised form (just as the finance textbooks do), because then the optimal weights can be calculated analytically:
$$\mathbf{w}^{*} = \frac{\sigma^{-1}\mathbf{1}}{\mathbf{1}^{\top}\sigma^{-1}\mathbf{1}}.$$
- If we ban short selling, the task becomes one in quadratic programming.
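A minimal numerical sketch of this closed-form solution, assuming numpy; the 3×3 covariance matrix is made up purely for illustration:

```python
import numpy as np

# Illustrative covariance matrix for N = 3 instruments (values made up).
sigma = np.array([[0.040, 0.006, 0.010],
                  [0.006, 0.090, 0.012],
                  [0.010, 0.012, 0.160]])

ones = np.ones(len(sigma))

# w* = sigma^{-1} 1 / (1^T sigma^{-1} 1); weights may be negative,
# since short selling is unrestricted here.
w = np.linalg.solve(sigma, ones)
w /= w.sum()

print("optimal weights:", w)
print("minimal portfolio variance:", w @ sigma @ w)
```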
14. Infinite volume limit
- Allowing unlimited short selling makes the domain of the optimization task infinite. This is not an innocent idealisation, because, as we will see, the solution vector can show huge fluctuations, and a restriction on the domain would bound these fluctuations.
- Similarly to the theory of phase transitions, however, it is expedient to understand the essence of the phenomenon in the limit of infinite volume first, and take the finite-volume effects into account only later.
15. Variants of the problem
- When we use the standard deviation as a risk measure, we are assuming that the underlying process is normal, or has some similarly concentrated distribution. Typically, financial processes are not like this.
- Alternative risk measures: mean absolute deviation (MAD), average loss above a high threshold (expected shortfall, ES), maximal loss (ML), or, indeed, any homogeneous convex functional defined over the distribution of the losses.
16. Empirical covariance matrices
- The covariance matrix has to be determined from measurements on the market. From the returns $x_{it}$ observed at time $t$ we have the estimator
$$\sigma_{ij}^{(e)} = \frac{1}{T}\sum_{t=1}^{T} x_{it} x_{jt}.$$
- The number of covariance matrix elements of a portfolio composed of N instruments is O(N²). In the time series of length T of N instruments we have NT data. In order to have a precise estimate we should have N ≪ T. Large portfolios can contain hundreds of instruments, while it is hardly meaningful to use data older than, say, 4 years, that is, T ≈ 1000. Therefore the inequality N/T ≪ 1 almost never holds in reality. Thus our estimates will contain a lot of noise, and the estimation error will depend on the scaling variable N/T (see the sketch below).
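A small simulation of this point, under the simplest assumption (i.i.d. standard normal returns, so the true covariance matrix is the unit matrix and all deviation from it is pure noise); numpy assumed:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100

# True covariance: the unit matrix.  The estimation error of the
# empirical matrix grows with the ratio N/T.
for T in (10000, 1000, 200, 120):
    X = rng.standard_normal((N, T))      # N instruments, T observations
    sigma_emp = X @ X.T / T              # (1/T) * sum_t x_it * x_jt
    rel_err = np.linalg.norm(sigma_emp - np.eye(N)) / np.linalg.norm(np.eye(N))
    print(f"N/T = {N/T:5.2f}   relative error = {rel_err:.3f}")
```

The relative error grows roughly like $\sqrt{N/T}$, which is why N/T is the natural scaling variable.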
17. Information deficit
- Thus the Markowitz problem suffers from the "curse of dimensions", or from information deficit.
- The estimates will contain error and the resulting portfolios will be suboptimal.
- How serious is this effect?
- How sensitive are the various risk measures to this kind of error?
- How can we reduce the error?
18. Fighting the curse of dimensions
- Economists have been struggling with this problem for ages. Since the root of the problem is lack of sufficient information, the remedy is to inject external info into the estimate. This means imposing some structure on σ. This introduces bias, but the beneficial effect of noise reduction may compensate for this.
- Examples:
- single-index models (β's)
- multi-index models
- grouping by sectors
- principal component analysis
- Bayesian shrinkage estimators, etc.
- random matrix theory
- All these help to various degrees. Most studies are based on empirical data.
19. Random matrices
20. Origins of random matrix theory (RMT)
- Wigner, Dyson, 1950s
- Originally meant to describe (to a zeroth approximation) the spectral properties of heavy atomic nuclei
- on the grounds that something that is sufficiently complex is almost random
- fits into the picture of a complex system as one with a large number of degrees of freedom, without symmetries, hence irreducible, quasi random
- markets, by the way, are considered stochastic for similar reasons
21. RMT
- Later found applications in a wide range of problems, from quantum gravity through quantum chaos, mesoscopics, random systems, etc.
- Has developed into a rich field with a huge set of results for the spectral properties of various classes of random matrices
- They can be thought of as a set of central limit theorems for matrices
22. Wigner semi-circle law
- $M_{ij}$: symmetric $N \times N$ matrix with i.i.d. elements (the distribution has zero mean and finite second moment $\sigma^2$)
- $\lambda_k$: the eigenvalues of $M/\sqrt{N}$
- The density of eigenvalues (normed by $N$) goes to the Wigner semi-circle for $N \to \infty$ with probability 1:
$$\rho(\lambda) = \frac{1}{2\pi\sigma^2}\sqrt{4\sigma^2 - \lambda^2}, \qquad |\lambda| \le 2\sigma,$$
$$\rho(\lambda) = 0 \qquad \text{otherwise.}$$
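A quick numerical check of the law by direct sampling, assuming numpy and matplotlib (a sketch, not part of the original talk):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
N, s = 2000, 1.0

# Symmetric matrix with i.i.d. entries of zero mean and variance s^2.
A = rng.standard_normal((N, N)) * s
M = (A + A.T) / np.sqrt(2.0)

# The eigenvalues of M / sqrt(N) should fill the semi-circle on [-2s, 2s].
lam = np.linalg.eigvalsh(M / np.sqrt(N))

x = np.linspace(-2 * s, 2 * s, 400)
rho = np.sqrt(4 * s**2 - x**2) / (2 * np.pi * s**2)

plt.hist(lam, bins=60, density=True, alpha=0.5, label="sampled eigenvalues")
plt.plot(x, rho, label="semi-circle law")
plt.legend()
plt.show()
```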
23. Remarks on the semi-circle law
- Can be proved by the method of moments (as done originally by Wigner) or by the resolvent method (Marchenko and Pastur and countless others)
- Holds also for slightly dependent or non-homogeneous entries
- The convergence is fast (believed to be of order 1/N, but proved only at a lower rate), especially as concerns the support
24. Wishart matrices
- Generate very long time series for N i.i.d. random variables with an arbitrary distribution of finite variance, and cut out samples of length T from these, as if making empirical observations.
- The true covariance matrix of these variables is the unit matrix, but if we try to reconstruct it from the simulated samples we will not recover the unit matrix for any finite T. Instead, we will have an empirical covariance matrix.
25. Correlation matrix of i.i.d. normal random variables
- The true correlation matrix is the unit matrix: $C_{ij} = \delta_{ij}$ (ones on the diagonal, zeros everywhere else).
- Its spectrum consists of a single, N-fold degenerate eigenvalue $\lambda = 1$.
- The noise lifts the degeneracy and makes a band out of the single eigenvalue.
26. The corresponding empirical covariance matrix is the Wishart matrix
- If $N, T \to \infty$ such that their ratio $r = N/T$ is fixed, $r < 1$, then the spectrum of this empirical covariance matrix converges to the Wishart or Marchenko-Pastur spectrum (eigenvalue distribution)
$$\rho(\lambda) = \frac{1}{2\pi r}\,\frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{\lambda}, \qquad \lambda_- \le \lambda \le \lambda_+,$$
where $\lambda_\pm = (1 \pm \sqrt{r})^2$.
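The same kind of check for the Marchenko-Pastur law, again assuming numpy and matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
N, T = 500, 2000
r = N / T

# Empirical covariance matrix of N i.i.d. unit-variance series of length T.
X = rng.standard_normal((N, T))
lam = np.linalg.eigvalsh(X @ X.T / T)

# Marchenko-Pastur density with band edges (1 +- sqrt(r))^2.
lo, hi = (1 - np.sqrt(r))**2, (1 + np.sqrt(r))**2
x = np.linspace(lo, hi, 400)
rho = np.sqrt((hi - x) * (x - lo)) / (2 * np.pi * r * x)

plt.hist(lam, bins=60, density=True, alpha=0.5, label="sampled eigenvalues")
plt.plot(x, rho, label="Marchenko-Pastur law")
plt.legend()
plt.show()
```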
27. Remarks
- The theorem also holds when the (average) sample covariance matrix is of finite rank
- The assumption that the entries are identically distributed is not necessary
- If T < N, the distribution is the same, with an extra point of mass 1 − T/N at the origin
- If T = N, the Marchenko-Pastur law is the squared Wigner semi-circle
- The proof extends to slightly dependent and inhomogeneous entries
- The convergence is fast, believed to be of order 1/N, but proved only at a lower rate
28. (Figure) N = 1000, T/N = 2
29.
- If the matrix elements are not centered but have, say, a common mean, one large eigenvalue breaks away, while the rest stay in the random band
- Eigenvector components: just as in the Wigner case, the eigenvectors in the bulk are random, while the one outside is delocalized (has nonzero entries everywhere)
- There is a lot of fluctuation, level crossing, and random rotation of eigenvectors taking place in the random band
- The eigenvector belonging to the large eigenvalue (when there is one) is much more stable. The larger the eigenvalue, the more so.
30. An intriguing observation
- L. Laloux, P. Cizeau, J.-P. Bouchaud, M. Potters, PRL 83, 1467 (1999) and Risk 12, No. 3, 69 (1999)
- and
- V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral, H. E. Stanley, PRL 83, 1471 (1999)
- noted that there is such a huge amount of noise in empirical covariance matrices that it may be enough to make them useless.
- A paradox: covariance matrices are in widespread use and banks still survive?!
31. Laloux et al. 1999
(Figure) The spectrum of the covariance matrix obtained from the time series of the S&P 500 with N = 406, T = 1308, i.e. N/T = 0.31, compared with that of a completely random matrix (solid curve). Only about 6% of the eigenvalues lie beyond the random band.
32. Remarks on the paradox
- The number of junk eigenvalues may not necessarily be a proper measure of the effect of noise: the small eigenvalues and their eigenvectors fluctuate a lot, indeed, but perhaps they have a relatively minor effect on the optimal portfolio, whereas the large eigenvalues and their eigenvectors are fairly stable.
- The investigated portfolio was too large compared with the length of the time series (although it is hard to find a better ratio in practice).
- Working with real, empirical data, it is hard to distinguish the effect of insufficient information from other parasitic effects, like nonstationarity (which is why we prefer to work with simulated data for the purposes of theoretical studies).
33. A filtering procedure suggested by RMT
- The appearance of random matrices in the context of portfolio selection triggered a lot of activity, mainly among physicists. Laloux et al. and Plerou et al. subsequently proposed a filtering method based on random matrix theory (RMT). This has been further developed and refined by many workers.
- The proposed filtering consists basically in discarding as pure noise that part of the spectrum that falls below the upper edge of the random spectrum. Information is carried only by the eigenvalues, and their eigenvectors, above this edge. Optimization should be carried out by projecting onto the subspace of large eigenvalues and replacing the small ones by a constant chosen so as to preserve the trace. This drastically reduces the effective dimensionality of the problem (see the sketch below).
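A sketch of this filtering recipe under the stated assumptions (a correlation matrix as input and the Marchenko-Pastur upper edge $(1+\sqrt{N/T})^2$ as the noise threshold); the function name and interface are mine, not from the talk:

```python
import numpy as np

def rmt_filter(corr, T):
    """Replace the eigenvalues below the Marchenko-Pastur band edge by a
    constant chosen to preserve the trace; keep the rest untouched."""
    N = corr.shape[0]
    lam, V = np.linalg.eigh(corr)
    edge = (1.0 + np.sqrt(N / T)) ** 2   # upper edge of the pure-noise band
    noise = lam < edge
    lam = lam.copy()
    if noise.any():
        lam[noise] = lam[noise].mean()   # the mean preserves the trace
    return V @ np.diag(lam) @ V.T
```

The filtered matrix can then be used in place of the raw estimate in the Markowitz optimization.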
34.
- Interpretation of the large eigenvalues: the largest one is the market, the other big eigenvalues correspond to the main industrial sectors.
- The method can be regarded as a systematic version of principal component analysis, with an objective criterion for the number of principal components.
- In order to better understand this novel filtering method, I introduce a simple market model.
35. Simple model: market plus sectors
(Figure) Schematic spectrum of the model covariance matrix: a single large eigenvalue (the market), a multiply degenerate sector eigenvalue, and the multiply degenerate eigenvalue 1 for the rest.
36.
- The empirical covariance matrix corresponding to this model consists of the Marchenko-Pastur spectrum, a large (Frobenius-Perron) eigenvalue (the whole market), and a number of medium-sized eigenvalues.
- If we lift the equivalence of the sectors, then with the appropriate tuning of the parameters we can mimic the spectrum observed on real markets (the Noh model).
37.
- We have made extensive studies of RMT-based filtering and found that it performs consistently well compared with other, more conventional methods.
- An additional advantage is that the method can be tuned according to the assumed structure of the market.
- There are attempts to extract information even from below the random band edge.
38. Divergent sampling error: an algorithmic phase transition
39. A measure of the effect of noise
- Assume we know the true covariance matrix $\sigma_{ij}$ and the noisy one $\sigma_{ij}^{(e)}$. Then a natural, though not unique, measure of the impact of noise is
$$q_0^2 = \frac{\sum_{ij} w_i^{*}\,\sigma_{ij}\,w_j^{*}}{\sum_{ij} w_i\,\sigma_{ij}\,w_j},$$
where $w^{*}$ and $w$ are the optimal weights corresponding to $\sigma_{ij}^{(e)}$ and $\sigma_{ij}$, respectively.
40. The model-simulation approach
- For the purposes of our numerical calculations we chose various model covariance matrices and generated long simulated time series with them. Then we cut out segments of length T from these time series, as if observing them on the market, and tried to reconstruct the covariance matrices from them. We optimized a portfolio both with the true and with the observed covariance matrix and determined the measure $q_0$ (a condensed version of this protocol is sketched below).
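A condensed version of this simulation protocol for the simplest case, where the true covariance matrix is the unit matrix (so the true optimal weights are $w_i = 1/N$, the true minimal variance is $1/N$, and hence $q_0^2 = N\,\mathbf{w}^{*\top}\mathbf{w}^{*}$); numpy assumed. The last column anticipates the scaling law quoted on slide 44:

```python
import numpy as np

rng = np.random.default_rng(3)
N, n_samples = 100, 50

def min_var_weights(cov):
    w = np.linalg.solve(cov, np.ones(len(cov)))
    return w / w.sum()

for T in (2000, 500, 200, 125):
    q0 = []
    for _ in range(n_samples):
        X = rng.standard_normal((N, T))       # one "observed" segment
        w = min_var_weights(X @ X.T / T)      # optimum of the noisy matrix
        q0.append(np.sqrt(N * (w @ w)))       # q0^2 = N w.w when sigma = 1
    print(f"N/T = {N/T:5.3f}   <q0> = {np.mean(q0):.3f}   "
          f"theory 1/sqrt(1 - N/T) = {1 / np.sqrt(1 - N/T):.3f}")
```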
41. Fluctuations over the samples
- The relative error $q_0$ refers to a given sample, so it is a random variable that fluctuates from sample to sample.
- Likewise, there are strong fluctuations in the weights of the optimal portfolio.
42. The distribution of $q_0$ over the samples (figure)
43. The average of $q_0$ as a function of N/T (figure)
44. The divergent error signals an algorithmic phase transition (I. K., Sz. Pafka, G. Nagy)
- The rank of the covariance matrix is min(N, T).
- In the limit N/T → 1 the lower band edge of the eigenvalue spectrum goes to zero; around the lower edge there are many small eigenvalues, i.e. many soft modes.
- N/T = 1 is the critical point of the problem.
- Upon approaching the critical point we find scaling laws; e.g. the expectation value of the portfolio error is
$$\langle q_0 \rangle = \frac{1}{\sqrt{1 - N/T}},$$
while the standard deviation of $q_0$ diverges as well (no transcript).
- For T < N zero modes appear and the optimization becomes meaningless.
45. Fluctuations of the weights: the distribution of the weights of a portfolio consisting of N = 100 i.i.d. normal variables, in a given sample, for T = 500 (figure)
46. Sample-to-sample fluctuation of the weight of a given instrument, non-overlapping windows, N = 100, T = 500 (figure)
47. Fluctuation of the weight of a given instrument, step-by-step moving average, N = 100, T = 500 (figure)
48. After RMT filtering the error drops to an acceptable level, and we can even penetrate the region T < N (figure)
49. Finite volume
- A ban on short selling, or any other constraint that renders the domain of optimization finite, or filtering, will suppress the infinite fluctuations. However, the weights will keep fluctuating wildly as we approach N/T = 1, and an increasing part of them will stick to the walls of the allowed region. These zero weights will belong to different instruments in different samples. If we are not sufficiently far away from the critical point, the solution of the Markowitz problem cannot serve as the basis of rational decision making.
50. Universality
- We have studied a number of different market models, different risk measures, and different underlying processes (including fat-tailed ones and, with István Varga-Haszonits, also autoregressive, GARCH-like processes). The value of the critical point and the coefficients can change, but we have not yet found convincing evidence for any change in the critical exponents: we have not yet discovered the boundaries of the universality class.
51. How come these phenomena have not been noticed earlier?
- Somehow the scaling has escaped attention.
- Typically, econometricians study the limit N fixed, T → ∞, rather than N/T fixed, N, T → ∞.
- The instability of the weights is an everyday experience, but the idea that their fluctuations can actually diverge has not arisen. If one insists on using empirical data, one cannot study the fluctuations over the samples, because there are not enough of them.
- Random matrices, critical phenomena, zero modes, etc. are mostly unknown in finance.
- The different aspects of the problem have not been integrated into a coherent picture, which can only be achieved on the basis of the phase transition concept.
52. Replicas
53. Optimization and statistical mechanics
- Any convex optimization problem can be transformed into a problem in statistical mechanics by promoting the objective function into a Hamiltonian and introducing a fictitious temperature. At the end we can recover the original problem in the limit of zero temperature.
- Averaging over the time series segments (samples) is similar to what is called quenched averaging in the statistical physics of random systems: one has to average the logarithm of the partition function (i.e. the cumulant generating function).
- Averaging can then be performed by the replica trick: a heuristic but very powerful method that is on its way to being firmly established by mathematicians (Guerra, Talagrand).
54. The first application of replicas in a finance context: the ES phase boundary (A. Ciliberti, I. K., M. Mézard)
- ES is the average loss above a high threshold β (a conditional expectation value). It is very popular among academics and slowly spreading in practice. In addition, as shown by Uryasev and Rockafellar, the optimization of ES can be reduced to linear programming, for which very fast algorithms exist.
- Portfolios optimized under ES are much more noisy than those optimized under either the variance or absolute deviation. The critical point of ES is always below N/T = 1/2, and it depends on the threshold, so it defines a phase boundary on the N/T–β plane.
- The measure ES can become unbounded from below with a certain probability for any finite N and T, and then the optimization is not feasible!
- The transition for finite N, T is smooth; for N, T → ∞ it becomes a sharp phase boundary that separates the region where the optimization is feasible from that where it is not.
55. Formulation of the problem
- The time series of returns: $x_{it}$, $i = 1, \ldots, N$, $t = 1, \ldots, T$
- The objective function (Rockafellar-Uryasev): $F_\beta(\mathbf{w}, v) = v + \frac{1}{(1-\beta)T}\sum_{t=1}^{T}\big[-v - \sum_i w_i x_{it}\big]^{+}$
- The variables: the weights $w_i$, the threshold variable $v$, and the auxiliary variables $u_t = \big[-v - \sum_i w_i x_{it}\big]^{+}$
- The linear programming problem: minimize $v + \frac{1}{(1-\beta)T}\sum_t u_t$ over $w_i$, $v$, $u_t$, subject to $u_t \ge 0$ and $u_t + v + \sum_i w_i x_{it} \ge 0$
- Normalization: $\sum_i w_i = 1$
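A sketch of this linear program with scipy's linprog, on made-up toy returns; the stacking of the variable vector as (w, v, u) is an implementation choice, not part of the talk:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
N, T, beta = 5, 100, 0.90
x = rng.standard_normal((N, T))          # toy return series x_it

# Variables z = (w_1..w_N, v, u_1..u_T); minimize v + sum_t u_t / ((1-beta) T).
c = np.concatenate([np.zeros(N), [1.0], np.full(T, 1.0 / ((1.0 - beta) * T))])

# Constraints u_t >= -v - sum_i w_i x_it, written as -sum_i w_i x_it - v - u_t <= 0.
A_ub = np.hstack([-x.T, -np.ones((T, 1)), -np.eye(T)])
b_ub = np.zeros(T)

# Normalization: sum_i w_i = 1.
A_eq = np.concatenate([np.ones(N), [0.0], np.zeros(T)]).reshape(1, -1)
b_eq = [1.0]

# w and v are free; u_t >= 0.
bounds = [(None, None)] * (N + 1) + [(0, None)] * T

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("feasible:", res.success, "  optimal ES:", res.fun)
print("weights:", res.x[:N])
```

For N/T beyond the phase boundary the solver reports the problem as unbounded, which is exactly the feasibility breakdown described on the previous slide.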
56. Associated statistical mechanics problem
- Partition function: $Z_\gamma = \int \mathrm{d}v \prod_i \mathrm{d}w_i\, \delta\!\big(\textstyle\sum_i w_i - 1\big)\, e^{-\gamma F_\beta(\mathbf{w}, v)}$
- Free energy: $f(\gamma) = -\frac{1}{\gamma N}\,\langle \ln Z_\gamma \rangle$
- The optimal value of the objective function is recovered in the zero-temperature limit: $\min F_\beta = -\lim_{\gamma \to \infty}\frac{1}{\gamma}\ln Z_\gamma$
57. The partition function
58. Replicas
- Trivial identity: $\langle \ln Z \rangle = \lim_{n \to 0}\frac{\langle Z^n \rangle - 1}{n}$
- We consider n identical replicas of the system
- The probability distribution of the n-fold replicated system
- At an appropriate moment we have to analytically continue to real n's
59. Averaging over the random samples
60. Replica-symmetric Ansatz
- By symmetry considerations, the overlap matrix is taken to be of the form $Q^{ab} = q_1$ for $a = b$ and $Q^{ab} = q_0$ for $a \ne b$
- The order parameters are then fixed by the saddle point condition (no transcript)
61. Condition for the existence of a solution to the linear programming problem
- The meaning of the parameter (no transcript)
- Equation of the phase boundary (no transcript)
62. (No Transcript)
63. The limit β → 1
- In this limit the problem goes over into the minimax problem of maximal loss: minimize $\max_t \big(-\sum_i w_i x_{it}\big)$ over the weights, subject to $\sum_i w_i = 1$.
- In this limit the phase boundary can be determined by a direct geometric argument (I. K., Sz. Pafka, G. Nagy).
64. The probability of the solvability of the minimax problem
- For T > N the probability of a solution (for any elliptical underlying process) is
$$p = 2^{-T+1}\sum_{k=0}^{T-N-1}\binom{T-1}{k}.$$
- (The problem is isomorphic to some operations research and random geometry tasks: Todd, M. J. (1991), Probabilistic models for linear programming, Math. Oper. Res. 16, 671-693.)
- For large N and T, p goes over into the error function.
- For N, T → ∞ the transition becomes sharp at N/T = 1/2.
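A numerical check of this formula, evaluated stably as a binomial CDF to avoid overflow at large T (scipy assumed; the index convention follows my transcription of the formula above):

```python
from scipy.stats import binom

def p_solvable(N, T):
    """p = 2^{-(T-1)} * sum_{k=0}^{T-N-1} C(T-1, k), computed as a
    Binomial(T-1, 1/2) CDF to avoid overflow for large T."""
    return binom.cdf(T - N - 1, T - 1, 0.5)

# The transition sharpens around N/T = 1/2 as N and T grow together.
for T in (20, 200, 2000):
    ps = [p_solvable(int(f * T), T) for f in (0.4, 0.5, 0.6)]
    print(f"T = {T:5d}:  p(N/T=0.4) = {ps[0]:.3f}  "
          f"p(N/T=0.5) = {ps[1]:.3f}  p(N/T=0.6) = {ps[2]:.3f}")
```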
65. (No Transcript)
66. (No Transcript)
67. Conclusions
- P. W. Anderson: "The fact is that the techniques which were developed for this apparently very specialized problem of a rather restricted class of special phase transitions and their behavior in a restricted region are turning out to be something which is likely to spread over not just the whole of physics but the whole of science."
68. In a similar spirit...
- I think the phenomenon treated here, that is, the sampling error catastrophe due to lack of sufficient information, appears in a much wider set of problems than just the problem of investment decisions (e.g. multivariate regression, all sorts of linearly programmable technology- and economy-related optimization problems, microarrays, etc.).
- Whenever a phenomenon is influenced by a large number of factors but we have a limited amount of information about this dependence, we have to expect that the estimation error will diverge and fluctuations over the samples will be huge.