Title: Econ 600: Mathematical Economics
1. Econ 600: Mathematical Economics
- July/August 2006
- Stephen Hutton
2. Why optimization?
- Almost all economics is about solving constrained optimization problems. Most economic models start by writing down an objective function.
- Utility maximization, profit maximization, cost minimization, etc.
- Static optimization is most common in microeconomics; dynamic optimization is most common in macroeconomics.
3. My approach to the course
- Focus on intuitive explanation of the most important concepts, rather than formal proofs.
- Motivate with relevant examples.
- Practice problems and use of the tools in problem sets.
- Assumes some basic math background (people with a strong background might not find the course useful).
- For more details, see course notes, textbooks, future courses.
- Goal of the course: an introduction to these concepts.
4. Order of material
- The course will skip around the notes a bit during the static part: specifically, I'll cover the first half of lecture 1, then give some definitions from lecture 3, then go back to lecture 1 and do the rest in order.
- Sorry!
5. Why not basic optimization?
- The simplest method of unconstrained optimization (set derivative = 0) often fails: it might not identify the optima, or optima might not exist.
- Solution unbounded
- Function not always differentiable
- Function not always continuous
- Multiple local optima
6. Norms and Metrics
- It is useful to have some idea of distance or closeness in a vector space.
- The most common measure is Euclidean distance; this is sufficient for our purposes (dealing with n-dimensional real vectors).
- General requirement for a norm: anything that satisfies conditions 1), 2), 3) (see notes).
7. Continuity
- General intuitive sense of continuity (no gaps or jumps): whenever x′ is close to x, f(x′) is close to f(x).
- Formal definitions: A sequence of elements xn is said to converge to a point x in Rn if for every ε > 0 there is a number N such that for all n > N, ‖xn − x‖ < ε.
- A function f: Rn → Rm is continuous at a point x if for ALL sequences xn converging to x, the derived sequence of points in the target space f(xn) converges to the point f(x).
- A function is continuous if it is continuous at all points in its domain.
- What does this mean in 2d? Sequence of points converging from below, sequence of points converging from above. Holds true in higher dimensions.
8. Continuity 2
- Why continuity? Needed to guarantee existence of a solution.
- So we typically assume continuity of functions to guarantee (with other assumptions) that a solution to the problem exists.
- Sometimes continuity is too strong. To guarantee a maximum, upper semi-continuity is enough. To guarantee a minimum, lower semi-continuity.
- Upper semi-continuity: for all xn → x, lim(n→∞) f(xn) ≤ f(x).
- Lower semi-continuity: for all xn → x, lim(n→∞) f(xn) ≥ f(x).
- Note that if these hold with equality, we have continuity.
- Note: figure 6 in the notes is wrong.
9. Open sets (notes from lecture 3)
- For many set definitions and proofs we use the concept of an open ball of arbitrarily small size.
- An open ball is the set of points (or vectors) within a given distance of a particular point (or vector). Formally: let ε be a small positive real number. Bε(x) = {y : ‖x − y‖ < ε}.
- A set of points S in Rn is open if for every point in S, there exists an open ball around it that is entirely contained within S. Eg (1,2) vs (1,2].
- Any union of open sets is open.
- Any finite intersection of open sets is open.
10. Interior, closed set (notes in lecture 3)
- The interior of a set S is the largest open set contained in S. Formally, Int(S) = ∪i Si where the Si are the open subsets of S.
- If S is open, Int(S) = S.
- A set is closed if all convergent sequences within the set converge to points within the set. Formally, fix a set S and let {xm} be any sequence of elements of S. If lim(m→∞) xm = r with r in S, for all convergent sequences in S, then S is closed.
- S is closed if and only if its complement S^C is open.
11. Boundary, bounded, compact (notes in lecture 3)
- The boundary of a set S, denoted B(S), is the set of points x such that for all ε > 0, Bε(x) ∩ S is not empty and Bε(x) ∩ S^C is not empty. Ie any open ball around x contains points both in S and not in S.
- If S is closed, B(S) ⊆ S.
- A set S is bounded if the distance between all objects in the set is finite.
- A set is compact if it is closed and bounded.
- These definitions correspond to their commonsense interpretations.
12. Weierstrass's Theorem (notes in lecture 3)
- Gives us a sufficient condition to ensure that a solution to a constrained optimization problem exists. If the constraint set C is compact and the function f is continuous, then there always exists at least one solution to: max f(x) s.t. x in C.
- Formally: Let f: Rn → R be continuous. If C is a compact subset of Rn, then there exist x* in C, y* in C s.t. f(y*) ≤ f(x) ≤ f(x*) for all x in C.
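A minimal numerical sketch of the theorem: the function and interval below are my own example (not from the notes), but they satisfy the hypotheses, so a fine grid over the compact set should locate both extremes.

```python
# Weierstrass illustration: f(x) = x*sin(5x) is continuous and C = [0, 2]
# is compact, so a maximizer x* and a minimizer y* must exist.
import math

def f(x):
    return x * math.sin(5 * x)

grid = [i / 10000 * 2 for i in range(10001)]   # fine grid over the compact set [0, 2]
x_max = max(grid, key=f)                       # approximate x* (argmax)
x_min = min(grid, key=f)                       # approximate y* (argmin)

# Every point of C is (weakly) between the two extremes in value:
assert all(f(x_min) <= f(x) <= f(x_max) for x in grid)
```

If C were open (say (0, 2)) or f discontinuous, the max could fail to be attained, which is exactly what the compactness and continuity assumptions rule out.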
13. Vector geometry
- We want to extend the "slope = 0" intuition about optima to multiple dimensions. We need some vector tools to do this.
- Inner product: x·y = x1y1 + x2y2 + … + xnyn.
- The Euclidean norm and inner product are related: ‖x‖² = x·x.
- Two vectors are orthogonal (perpendicular) if x·y = 0.
- The inner product of two vectors v, w is v′w in matrix notation.
- v·w > 0: v, w form an acute angle.
- v·w < 0: v, w form an obtuse angle.
- v·w = 0: v, w are orthogonal.
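The inner-product facts above can be sketched in a few lines of plain Python (the vectors are my own illustrative examples):

```python
# Inner product, Euclidean norm, and the angle classification above.
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    return dot(x, x) ** 0.5            # ||x||^2 = x.x

v, w = [1.0, 0.0], [0.0, 2.0]
assert dot(v, w) == 0                  # orthogonal (perpendicular) vectors
assert dot([1, 1], [1, 0]) > 0         # acute angle
assert dot([1, 1], [-1, 0]) < 0        # obtuse angle
assert norm([3.0, 4.0]) == 5.0         # ||(3,4)|| = 5
```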
14. Linear functions
- A function f: V → W is linear if for any two real numbers a, b and any two elements v, v′ in V we have f(av + bv′) = af(v) + bf(v′).
- Note that our usual "linear" functions on R1 (f(x) = mx + b) are not generally linear; these are affine. (Only linear if b = 0.)
- Every linear function defined on Rn can be represented by an n-dimensional vector (f1, f2, …, fn) with the feature that f(x) = Σ fi xi.
- Ie the value of the function at x is the inner product of the defining vector with x.
- Note: in every situation we can imagine dealing with, functionals are also functions.
15. Hyperplanes
- A hyperplane is the set of points given by {x : f(x) = c} where f is a linear functional and c is some real number.
- Eg 1: In R2 a typical hyperplane is a straight line.
- Eg 2: In R3 a typical hyperplane is a plane.
- Think about a hyperplane as one of the level sets of the linear functional f. As we vary c, we change level sets.
- The defining vector of f(x) is orthogonal to the hyperplane.
16. Separating Hyperplanes
- A half-space is the set of points on one side of a hyperplane. Formally, HS(f) = {x : f(x) ≥ c} or HS(f) = {x : f(x) ≤ c}.
- Consider any two disjoint sets: when can we construct a hyperplane that separates the sets?
- Examples in notes.
- If C lies in a half-space defined by H and H contains a point on the boundary of C, then H is a supporting hyperplane of C.
17. Convex sets
- A set is convex if any convex combination of points in the set is also in the set.
- There is no such thing as a concave set. This is a related but different idea to convex/concave functions.
- Formally, a set C in Rn is convex if for all x, y in C and all λ in [0,1] we have λx + (1−λ)y in C.
- Any convex set can be represented as the intersection of half-spaces defined by supporting hyperplanes.
- Any half-space is a convex set.
18. Separating Hyperplanes 2
- Separating hyperplane theorem: Suppose X, Y are non-empty convex sets in Rn such that the interior of Y ∩ X is empty and the interior of Y is not empty. Then there exists a vector a in Rn which is the defining vector of a separating hyperplane between X and Y. Proof in texts.
- Applications: general equilibrium theory, the second fundamental theorem of welfare economics. Conditions under which a Pareto optimal allocation can be supported as a price equilibrium. We need convex preferences to be able to guarantee that there is a price ratio (a hyperplane) that can sustain an equilibrium.
19. Graphs
- The graph is what you normally see when you plot a function.
- Formally, the graph of a function f from V to W is the set of ordered pairs {(v, f(v)) : v in V}.
20. Derivatives
- We already know from basic calculus that a necessary condition for x* to be an unconstrained maximum of a function f is that its derivative be zero (if the derivative exists) at x*.
- A derivative tells us something about the slope of the graph of the function.
- We can also think about the derivative as telling us the slope of the supporting hyperplane to the graph of f at the point (x, f(x)). (See notes.)
21. Multidimensional derivatives and gradients
- We can extend what we know about derivatives from single-dimensional space to multi-dimensional space directly.
- The gradient of f at x is just the n-dimensional (column) vector which lists all the partial derivatives, if they exist.
- The derivative of f is the transpose of the gradient; this 1×n matrix is also known as the Jacobian.
- The gradient can be interpreted as defining a supporting hyperplane of the graph of f.
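A standard finite-difference approximation (my sketch, not a method from the notes) makes the gradient concrete: each partial derivative is estimated by nudging one coordinate at a time.

```python
# Numerical gradient of f: Rn -> R by central differences.
def gradient(f, x, h=1e-6):
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h       # bump coordinate i up
        xm = list(x); xm[i] -= h       # bump coordinate i down
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

# Example: f(x1, x2) = x1^2 + 3*x1*x2 has gradient (2*x1 + 3*x2, 3*x1).
f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]
g = gradient(f, [1.0, 2.0])
assert abs(g[0] - 8.0) < 1e-4 and abs(g[1] - 3.0) < 1e-4
```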
22. Second order derivatives
- We can think about the second derivative of multidimensional functions directly, as in the single-dimension case.
- The first derivative of the function f was an n×1 vector; the second derivative is an n×n matrix known as the Hessian.
- If f is twice continuously differentiable (ie all elements of the Hessian exist and are continuous) then the Hessian matrix is symmetric (cross-partial derivatives are the same irrespective of the order of differentiation).
23. Homogeneous functions
- Certain functions on Rn are particularly well-behaved and have useful properties that we can exploit without having to prove them every time.
- A function f: Rn → R is homogeneous of degree k if f(tx1, tx2, …, txn) = t^k f(x) for all t > 0. In practice we will deal with homogeneous functions of degree 0 and degree 1. Eg a demand function is homogeneous of degree 0 in prices (in general equilibrium) or in prices and wealth: doubling all prices and income has no impact on demand.
- Homogeneous functions allow us to determine the entire behavior of the function from only knowing its behavior in a small ball around the origin. Why? Because any point x can be written as a scalar multiple of some point x′ in that ball, so x = tx′.
- If k = 1 we say that f is linearly homogeneous.
- Euler's theorem: if f is h.o.d. k, then Σi fi(x) xi = k f(x), ie ∇f(x)·x = k f(x).
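Both homogeneity and Euler's theorem can be checked numerically for a Cobb-Douglas example of my choosing (degree k = 1):

```python
# f(x1, x2) = x1^0.3 * x2^0.7 is homogeneous of degree 1.
def f(x1, x2):
    return x1 ** 0.3 * x2 ** 0.7

def partial(g, x, i, h=1e-6):
    # central-difference partial derivative of g at x w.r.t. coordinate i
    x1, x2 = x
    if i == 0:
        return (g(x1 + h, x2) - g(x1 - h, x2)) / (2 * h)
    return (g(x1, x2 + h) - g(x1, x2 - h)) / (2 * h)

x = (2.0, 3.0)
# Homogeneity: f(t*x) = t^k * f(x) with t = 2, k = 1
assert abs(f(4.0, 6.0) - 2 * f(*x)) < 1e-9
# Euler's theorem: x1*f1 + x2*f2 = k * f(x)
euler_sum = x[0] * partial(f, x, 0) + x[1] * partial(f, x, 1)
assert abs(euler_sum - 1.0 * f(*x)) < 1e-6
```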
24. Homogeneous functions 2
- A ray through x is the line running through x and the origin, running forever in both directions. Formally, a ray is the set {y in Rn : y = tx, t in R}.
- The gradient of a homogeneous function is essentially the same along any ray (linked by a scalar multiple). Ie the gradient at x is linearly dependent with the gradient at tx. Thus level sets along any ray have the same slope. Application: homogeneous utility functions rule out income effects in demand. (At constant prices, consumers demand goods in the same proportions as income changes.)
25. Homothetic functions
- A function f: Rn → R is homothetic if f(x) = h(v(x)) where h: R → R is strictly increasing and v: Rn → R is h.o.d. k.
- Application: we often assume that preferences are homothetic. This gives that indifference sets are related by proportional expansion along rays.
- This means that we can deduce the consumer's entire preference relation from a single indifference set.
26. More properties of gradients (secondary importance)
- Consider a continuously differentiable function f: Rn → R. The gradient of f, Df(x), is a vector in Rn which points in the direction of greatest increase of f moving from the point x.
- Define a (very small) vector v s.t. Df(x)·v = 0 (ie v is orthogonal to the gradient). Then the vector v moves us away from x in a direction that adds zero to the value of f. Thus, points along v are (to first order) at the same level of f. So we have a method of finding the level sets of f by solving Df(x)·v = 0. Also, v is tangent to the level set of f through x.
- The direction of greatest increase of a function at a point x is at right angles to the level set at x.
27. Upper contour sets
- The level sets of a function are the sets of points which yield the same value of the function. Formally, for f: Rn → R a level set is {x : f(x) = c}. Eg indifference curves are level sets of utility functions.
- The upper contour set is the set of points on or above the level set, ie the set {x : f(x) ≥ c}.
28. Concave functions
- For any two points x and y, we can trace out the line of points joining them through tx + (1−t)y, varying t between 0 and 1. This is a convex combination of x and y.
- A function is concave if for all x, y and all t in [0,1], f(tx + (1−t)y) ≥ tf(x) + (1−t)f(y). Ie the line joining any two points on the graph is (weakly) below the graph of the function between those two points.
- A function is strictly concave if the inequality is strict for all x ≠ y and t in (0,1).
29. Convex functions
- A function is convex if for all x, y and all t in [0,1], f(tx + (1−t)y) ≤ tf(x) + (1−t)f(y). Ie the line joining any two points on the graph is (weakly) above the graph of the function between the points.
- A function is strictly convex if the inequality is strict for all x ≠ y and t in (0,1).
- A function f is convex if −f is concave.
- The upper contour set of a concave function is a convex set. The lower contour set of a convex function is a convex set.
30. Concavity, convexity and second derivatives
- If f: R → R is C2, then f is concave iff f″(x) ≤ 0 for all x. (And strictly concave if the inequality is strict.)
- If f: R → R is C2, then f is convex iff f″(x) ≥ 0 for all x. (And strictly convex if the inequality is strict.)
31. Concave functions and gradients
- Any concave function lies below its tangent hyperplane (or below its supergradients if f is not C1).
- Any convex function lies above its tangent hyperplane (or above its subgradients if f is not C1).
- Graphically: the function lies below/above the line tangent to the graph at any point.
32. Negative and positive (semi-) definite
- Consider any square symmetric matrix A.
- A is negative semi-definite if x′Ax ≤ 0 for all x. If in addition x′Ax = 0 implies that x = 0, then A is negative definite.
- A is positive semi-definite if x′Ax ≥ 0 for all x. If in addition x′Ax = 0 implies that x = 0, then A is positive definite.
33. Principal minors and nsd/psd
- Let A be an n×n matrix. The kth order leading principal minor of A is the determinant of the k×k matrix obtained by deleting the last n−k rows and columns.
- An n×n square symmetric matrix is positive definite if and only if its n leading principal minors are strictly positive.
- An n×n square symmetric matrix is negative definite if and only if its n leading principal minors alternate in sign, starting with a11 < 0 (ie the kth leading principal minor has the sign of (−1)^k).
- There are analogous conditions for getting nsd/psd from principal minors.
34. Reminder: determinant of a 3×3 matrix
- You won't have to take the determinant of a matrix bigger than 3×3 without a computer, but for a 3×3 matrix A: det(A) = a11(a22a33 − a23a32) − a12(a21a33 − a23a31) + a13(a21a32 − a22a31).
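The cofactor expansion along the first row, written out directly (the test matrices are my own examples):

```python
# det(A) = a11(a22*a33 - a23*a32) - a12(a21*a33 - a23*a31) + a13(a21*a32 - a22*a31)
def det3(A):
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

assert det3([[1, 0, 0], [0, 1, 0], [0, 0, 1]]) == 1     # identity matrix
assert det3([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) == 0     # linearly dependent rows
```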
35. Concavity/convexity and nd/pd
- An easy way to identify whether a function is convex or concave is from the Hessian matrix.
- Suppose f: Rn → R is C2. Then:
- f is strictly concave if the Hessian matrix is negative definite for all x.
- f is concave iff the Hessian matrix is negative semi-definite for all x.
- f is strictly convex if the Hessian matrix is positive definite for all x.
- f is convex iff the Hessian matrix is positive semi-definite for all x.
36. Quasi-concavity
- A function is quasi-concave if f(tx + (1−t)y) ≥ min{f(x), f(y)} for all x, y in Rn and 0 ≤ t ≤ 1.
- Alternatively: a function is quasi-concave iff its upper contour sets are convex sets.
- A function is strictly quasi-concave if in addition f(tx + (1−t)y) = min{f(x), f(y)} for some 0 < t < 1 implies that x = y.
- All concave functions are quasi-concave (but not vice versa).
- Why quasi-concavity? Strictly quasi-concave functions have a unique maximum (when a maximum exists).
37. Quasi-convexity
- A function is quasi-convex if f(tx + (1−t)y) ≤ max{f(x), f(y)} for all x, y in Rn and 0 ≤ t ≤ 1.
- Alternatively: a function is quasi-convex iff its lower contour sets are convex sets.
- A function is strictly quasi-convex if in addition f(tx + (1−t)y) = max{f(x), f(y)} for some 0 < t < 1 implies that x = y.
- All convex functions are quasi-convex (but not vice versa).
- Why quasi-convexity? Strictly quasi-convex functions have a unique minimum (when a minimum exists).
38. Bordered Hessian
- The bordered Hessian matrix H is just the Hessian matrix bordered by the gradient (the Jacobian) and its transpose.
- If the leading principal minors of H from k = 3 onwards alternate in sign, with the first (k = 3) leading principal minor > 0, then f is quasi-concave. If they are all negative, then f is quasi-convex.
39. Concavity and monotonic transformations
- (Not in the lecture notes, but useful for solving some of the problem set problems.)
- The sum of two concave functions is concave (proof in PS2).
- Any monotonic increasing transformation of a concave function is quasi-concave (though not necessarily concave). Formally, if h(x) = g(f(x)), where f(x) is concave and g(x) is monotonically increasing, then h(x) is quasi-concave.
- Useful trick: the ln(x) function is a monotonic transformation.
40. Unconstrained optimization
- If x* is a solution to the problem max_x f(x), x in Rn, what can we say about the characteristics of x*?
- A point x* is a global maximum of f if for all x in Rn, f(x*) ≥ f(x).
- A point x* is a local maximum of f if there exists an open ball of positive radius around x*, Bε(x*), s.t. for all x in the ball, f(x*) ≥ f(x).
- If x* is a global maximum then it is a local maximum (but not necessarily vice versa).
- If f is C1 and x* is a local maximum of f, then the gradient of f at x* is 0. This is necessary but not sufficient. It is the direct extension of the single-dimension case.
41. Unconstrained optimization 2
- If x* is a local maximum of f, then there is an open ball around x*, Bε(x*), on which f is concave.
- If x* is a local minimum of f, then there is an open ball around x*, Bε(x*), on which f is convex.
- Suppose f is C2. If x* is a local maximum, then the Hessian of f at x* is negative semi-definite.
- Suppose f is C2. If x* is a local minimum, then the Hessian of f at x* is positive semi-definite.
- To identify a global max, we either solve for all local maxima and then compare them, or look for additional features of f that guarantee that any local max is global.
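For a concave objective (my example below), the "gradient = 0" condition is sufficient, and even a naive gradient-ascent sketch converges to the global maximum:

```python
# Gradient ascent on the concave function f(x1, x2) = -(x1-1)^2 - (x2+2)^2,
# whose unique (global) maximum is at (1, -2).
def grad(x):
    return [-2 * (x[0] - 1), -2 * (x[1] + 2)]

x = [0.0, 0.0]
for _ in range(500):
    g = grad(x)
    x = [x[0] + 0.1 * g[0], x[1] + 0.1 * g[1]]   # step uphill along the gradient

assert abs(x[0] - 1.0) < 1e-6 and abs(x[1] + 2.0) < 1e-6
```

With a non-concave f the same iteration could stop at any local maximum, which is exactly why the comparison-of-candidates step above is needed in general.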
42. Unconstrained optimization 3
- If f: Rn → R is concave and C1, then Df(x*) = 0 implies that x* is a global maximum of f. (And x* being a global maximum implies that the gradient is zero.) This is both a necessary and sufficient condition.
- In general, we only really look at maximization, since all minimization problems can be turned into maximization problems by looking at −f.
- x* solves max f(x) if and only if x* solves min −f(x).
43. Non-differentiable functions (secondary importance)
- In economics, we rarely have to deal with non-differentiable functions; normally we assume these away.
- The superdifferential of a concave function f at a point x is the set of all supporting hyperplanes of the graph of f at the point (x, f(x)).
- A supergradient of a function f at a point x is an element of the superdifferential of f at x.
- If x* is an unconstrained local maximum of a function f: Rn → R, then the vector of n zeros must be an element of the superdifferential of f at x*.
- And equivalently: subdifferential, subgradient, local minimum for convex functions.
44. Constrained optimization
- General form of constrained optimization: max f(x) s.t. x in C.
- Normally we write the constraint set by writing out restrictions (eg x ≤ 1) rather than using set notation.
- Sometimes (for equality constraints) it is more convenient to solve problems by substituting the constraint(s) into the objective function, and so solving an unconstrained optimization problem.
- Most common restrictions: equality or inequality constraints.
- Eg a manager trying to induce a worker to provide optimal effort (a moral hazard contract).
45. Constrained optimization 2
- There is no reason why we can have only one restriction. We can have any number of constraints, which may be of any form. Most typically we use equality and inequality constraints; these are easier to solve analytically than constraints that x belong to some general set.
- These restrictions define the constraint set.
- Most general notation, using only inequality constraints: max f(x) s.t. G(x) ≥ 0, where G(x) is an m×1 vector of inequality constraints (m is the number of constraints).
- Eg for the restrictions 3x1 + x2 ≤ 10, x1 ≥ 2, we have G(x) = (10 − 3x1 − x2, x1 − 2)′.
46. Constrained optimization 3
- We will need limitations on the constraint set to guarantee existence of a solution (Weierstrass theorem).
- What can happen if the constraint set is not convex, or not closed? (Examples.)
- Denoting constraint sets: {x in Rn : f(x) ≥ c} characterizes all values of x in Rn where f(x) ≥ c.
47. General typology of constrained maximization
- Unconstrained maximization: C is just the whole vector space that x lies in (usually Rn). We know how to solve these.
- Lagrange maximization problems: here the constraint set is defined solely by equality constraints.
- Linear programming problems: not covered in this course.
- Kuhn-Tucker problems: these involve inequality constraints. Sometimes we also allow equality constraints, but we focus on inequality constraints. (Any problem with equality constraints could be transformed by substitution to deal only with inequality constraints.)
48. Lagrange problems
- Covered briefly here, mostly to compare and contrast with Kuhn-Tucker.
- The canonical Lagrange problem is of the form: max f(x) s.t. G(x) = 0.
- Often we have a problem with inequality constraints, but we can use economic logic to show that at our solution the constraints will bind, and so we can solve the problem as if we had equality constraints.
- Eg consumer utility maximization: if the utility function is increasing in all goods, then the consumer will spend all income. So the budget constraint p·x ≤ w becomes p·x = w.
49. Lagrange problems 2
- Lagrange theorem: in the canonical Lagrange problem (CL) above, suppose that f and G are C1 and suppose that the n×m matrix DG(x*) has rank m. Then if x* solves CL, there exists a vector λ* in Rm such that Df(x*) − DG(x*)λ* = 0.
- This is just a general form of writing what we know from solving Lagrange problems: we get n FOCs that all equal zero at the solution.
- The rank-m requirement is called the constraint qualification; we will come back to this with Kuhn-Tucker. Without it, the existence of Lagrange multipliers is not guaranteed, so the FOCs need not be necessary conditions.
50. Basic example
- max f(x1, x2) s.t. g1(x1, x2) = c1, g2(x1, x2) = c2.
- L = f(x1, x2) − λ1(g1(x1, x2) − c1) − λ2(g2(x1, x2) − c2).
- FOCs: (x1) f1 − λ1 ∂g1/∂x1 − λ2 ∂g2/∂x1 = 0; (x2) f2 − λ1 ∂g1/∂x2 − λ2 ∂g2/∂x2 = 0.
- Plus the constraints: (λ1) g1(x1, x2) − c1 = 0; (λ2) g2(x1, x2) − c2 = 0.
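A concrete instance of this FOC system, using my own example (not from the notes): max x1·x2 s.t. a single constraint x1 + x2 = 10. The FOCs give x2 = λ, x1 = λ, so x1 = x2 = 5 and λ = 5, which we can verify directly:

```python
# Candidate solution of max x1*x2 s.t. x1 + x2 = 10,
# with L = x1*x2 - lam*(x1 + x2 - 10).
x1, x2, lam = 5.0, 5.0, 5.0

assert x2 - lam == 0          # FOC for x1: f1 - lam * dg/dx1 = 0
assert x1 - lam == 0          # FOC for x2: f2 - lam * dg/dx2 = 0
assert x1 + x2 - 10 == 0      # the constraint (FOC for lam)
# Sanity check against a nearby feasible point:
assert x1 * x2 >= 4.9 * 5.1
```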
51. Lagrange problems 3
- We can also view the FOCs from the theorem as Df(x*) = λ1 Dg1(x*) + … + λm Dgm(x*).
- Ie we can express the gradient of the objective function as a linear combination of the gradients of the constraint functions, where the weights are determined by λ. (See diagram in notes.)
- Note that no claims are made about the sign of λ (but the sign will be more important in KT).
52. Kuhn-Tucker 1
- The most common form of constrained optimization in economics takes the form: max f(x) s.t. G(x) ≥ 0, x ≥ 0.
- (Note that we can include the non-negativity constraints inside the G(x) vector, or not.)
- Examples: utility maximization, cost minimization.
53. Kuhn-Tucker 2
- The key problem with inequality constraints: the solution might be on the boundary of a constraint, or might be interior. (See diagram in notes.)
- The main advance of KT: it sets up necessary conditions for an optimum in situations where constraints bind, and in situations where they don't. We then compare between these cases.
- Basic idea: if a constraint binds at a solution, then the value of the function must decrease as we move away from the constraint. So at a binding constraint x ≤ c, we can't be at a maximum unless f′(x) ≥ 0 at that point; at a binding constraint x ≥ c, we can't be at a maximum unless f′(x) ≤ 0 at that point. Otherwise, we could increase the value of the function without violating any of the constraints.
54. Kuhn-Tucker 3
- We say a weak inequality constraint is binding if the constraint holds with equality.
- Unlike Lagrange problems, in KT problems constraints might bind at a solution, or they might not (if we have an interior solution). If a particular constraint does not bind, then its multiplier is zero; if the constraint does bind, then the multiplier is non-zero (and is > 0 or < 0 depending on our notational formulation of the problem).
- We can think of the multiplier on a constraint as the shadow value of relaxing that constraint.
- The main new thing to deal with: complementary slackness conditions. These are a way of saying that either a) a particular constraint is binding (and so the respective multiplier for that constraint is non-zero), which implies a condition on the slope of the function at the constraint (it must be increasing towards the constraint), or b) a constraint does not bind (so we must be at an interior solution, with a FOC that equals zero).
55. Example 1
- Max f(x) s.t. 10 − x ≥ 0, x ≥ 0. L = f(x) + λ(10 − x).
- FOCs: (x) f′(x) − λ ≤ 0; (λ) 10 − x ≥ 0.
- CS: CS1: (f′(x) − λ)x = 0; CS2: (10 − x)λ = 0.
56. Example 1, contd
- Case 1, strict interior: 0 < x < 10. From CS2 we have λ = 0. From CS1 we have f′(x) = 0 (ie an unconstrained optimum).
- Case 2, left boundary: x = 0. From CS2 we have λ = 0. From the FOC for x we need f′(x) ≤ 0.
- Case 3, right boundary: x = 10. From CS1 we have f′(x) = λ, and we know λ ≥ 0 by construction, so we must have f′(x) ≥ 0.
- Thus, we can use the KT method to reject any candidate cases that don't have the right slope.
57. Solving KT problems
- Two methods, basically identical but slightly different in how they handle non-negativity constraints.
- Method 1 (treat non-negativity constraints as different from other constraints):
- Write the Lagrangean with a multiplier for each constraint other than the non-negativity constraints on choice variables. If we write the constraints in the Lagrangean as g(x) ≥ 0, we should add (not subtract) the multiplier terms in the Lagrangean and assume the multipliers λ ≥ 0; this will make the FOCs for x non-positive and the FOCs for the multipliers λ non-negative.
- Take FOCs for each choice variable and each multiplier.
- Take CS conditions from the FOC for each choice variable that has a non-negativity constraint, and for each multiplier.
- Take cases for different possibilities of constraints binding; reject infeasible cases, compare feasible cases.
58. Solving KT problems 2
- Second method: treat non-negativity constraints the same as any other constraint. Functionally the same, but it doesn't take shortcuts.
- Write the Lagrangean with a multiplier for each constraint. This will give us more multipliers than the previous method.
- Take FOCs for each choice variable and each multiplier.
- Take CS conditions for each multiplier. This gives us the same number of CS conditions as the previous method.
- Take cases for different possibilities of constraints binding; reject infeasible cases, compare feasible cases.
59. Example 2, method 1
- Max x² s.t. x ≥ 0, x ≤ 2. L = x² + λ(2 − x).
- FOCs: (x) 2x − λ ≤ 0; (λ) 2 − x ≥ 0. CS: (2x − λ)x = 0; (2 − x)λ = 0.
- Case 1, interior solution: x > 0, λ = 0. A contradiction from FOC1 rules this case out.
- Case 2, left boundary: x = 0, λ = 0. Consistent, but turns out to be a minimum.
- Case 3, right boundary: λ > 0, x > 0. CS2 implies x = 2.
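The case-3 candidate can be checked numerically (a sketch of this specific example, with the multiplier value λ = 2x = 4 implied by CS1):

```python
# Max x^2 s.t. 0 <= x <= 2: verify the case-3 candidate x = 2, lam = 4.
f = lambda x: x ** 2
x_star, lam = 2.0, 4.0

assert 2 * x_star - lam == 0                    # FOC for x holds with equality (x > 0)
assert (2 - x_star) * lam == 0                  # complementary slackness
grid = [i / 1000 * 2 for i in range(1001)]      # feasible set [0, 2]
assert all(f(x) <= f(x_star) for x in grid)     # x = 2 is the constrained maximum
```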
60. Example 2, method 2
- Max x² s.t. x ≥ 0, x ≤ 2. L = x² + λ1(2 − x) + λ2 x.
- FOCs: (x) 2x − λ1 + λ2 = 0; (λ1) 2 − x ≥ 0; (λ2) x ≥ 0. CS: (2 − x)λ1 = 0; xλ2 = 0.
- Case 1, interior solution: λ1 = 0, λ2 = 0. From FOC1, consistent only if x = 0 (ie actually case 2).
- Case 2, left boundary: λ1 = 0, λ2 > 0. From CS2, x = 0. Consistent, but turns out to be a minimum.
- Case 3, right boundary: λ1 > 0, λ2 = 0. CS1 implies x = 2.
- Case 4: λ1 > 0, λ2 > 0. From CS1 and CS2, clearly contradictory (0 = x = 2).
61. Sign issues
- There are multiple ways of setting up the KT Lagrangean, using different signs.
- One way is as above: in the Lagrangean, add the λg(x) terms, write the g(x) constraints as ≥ 0, and assume the multipliers λi ≥ 0; this implies that the FOC terms are ≤ 0 for choice variables and ≥ 0 for multipliers. The lecture notes (mostly) use this method.
- Another way is to subtract the λg(x) terms in L, write the g(x) constraints as ≤ 0, and assume λi ≥ 0, which implies the FOC terms are ≤ 0 for choice variables and ≥ 0 for multipliers. SB uses this method.
- Whatever method you choose, be consistent.
62. Example 1, SB signing
- Max f(x) s.t. 10 − x ≥ 0, x ≥ 0. L = f(x) − λ(x − 10).
- FOCs: (x) f′(x) − λ ≤ 0; (λ) −(x − 10) ≥ 0.
- CS: CS1: (f′(x) − λ)x = 0; CS2: −(x − 10)λ = 0.
63. Kuhn-Tucker 4
- Formal treatment: start with the Lagrangian. When f: Rn → R and G: Rn → Rm, the Lagrangian of the KT problem is a new function L: R^(n+m) → R, L(x, λ) = f(x) + λ·G(x).
- It is important to note the domain restriction on L: the multipliers are restricted to be non-negative. (We could rewrite the problem restricting the multipliers to be negative by changing the + in the Lagrangian to −.) (We could also rewrite the problem without the implicit non-negativity constraints on x: in general KT problems, outside economic settings, we need not require x to be non-negative.)
64. Kuhn-Tucker 5
- As in the Lagrange case, we can rewrite the Lagrangian as L(x, λ) = f(x) + Σj λj gj(x), decomposing G into its components.
- For any fixed point x, define the index sets K = {i : gi(x) = 0} (binding constraints) and M = {i : xi > 0} (non-zero choice variables).
- Define H(x) by differentiating only the K components of G with respect to the components j in M. This is an M×K matrix.
65. Kuhn-Tucker Theorem
- Suppose that x* solves the canonical KT problem as a local maximum, and suppose that H(x*) has maximal rank (the constraint qualification). Then there exists λ* ≥ 0 s.t.:
- ∂L/∂xi ≤ 0 (ie FOCs for choice variables) and xi* ∂L/∂xi = 0 for i = 1, …, n (ie CS conditions for non-negativity constraints);
- ∂L/∂λj ≥ 0 (ie FOCs for multipliers) and λj* ∂L/∂λj = 0 (ie CS conditions for multipliers).
66. KT theorem notes
- The constraint qualification (H(x*) has maximal rank) is complex and is typically ignored. But technically we need it to guarantee the theorem, and that the solution method yields actual necessary conditions.
- These are necessary conditions for a solution. Just because they are satisfied does not mean we have solved the problem: we could have multiple candidate solutions, or multiple solutions, or no solution at all (if no x* exists).
67. KT and existence/uniqueness
- Suppose G(x) is concave and f(x) is strictly quasi-concave (or G(x) strictly concave and f(x) quasi-concave); then if x* solves KT, x* is unique. Furthermore, if {x : G(x) ≥ 0, x ≥ 0} is compact and non-empty and f(x) is continuous, then there exists an x* which solves KT.
- Proof: Existence from the Weierstrass theorem. For uniqueness: suppose there are distinct x′, x″ that both solve KT. Then f(x′) = f(x″), and G(x′) ≥ 0, G(x″) ≥ 0. Since G is concave, for t in (0,1) we have G(tx′ + (1−t)x″) ≥ tG(x′) + (1−t)G(x″) ≥ 0. So tx′ + (1−t)x″ is feasible for KT. But f strictly quasi-concave implies f(tx′ + (1−t)x″) > min{f(x′), f(x″)} = f(x′). So we have a feasible point (tx′ + (1−t)x″) which does better than x′ and x″, which contradicts x′, x″ both being optimal solutions.
68. The constraint qualification
- Consider the problem: max x1 s.t. (1 − x1)³ − x2 ≥ 0, x1 ≥ 0, x2 ≥ 0. (See picture in notes; (1,0) is the solution.) At the solution, x2 ≥ 0 is a binding constraint. Note that the gradient of the first constraint is Dg(x) = (−3(1 − x1)², −1), which equals (0, −1) at the solution. This gives an H matrix which has rank 1.
- The gradient of f at (1,0) is (1,0), which cannot be expressed as a linear combination of (0,1) and (0,−1). So no multipliers exist that satisfy the KT necessary conditions.
69. Non-convex choice sets
- Sometimes we have non-convex choice sets; typically these lead to multiple local optima.
- In these cases, we can either solve the problem separately in each case and then compare, or we can solve the problem simultaneously.
70. Example: labour supply with overtime
- Utility function U(c, l) = c^α l^β.
- Non-negativity constraint on consumption. Time constraints l ≥ 0 and 24 − l ≥ 0 on leisure (note l is leisure, not labour).
- Overtime means the wage rate is w per hour for the first 8 hours of work and 1.5w per hour for extra hours. This means: c = w(24 − l) for l ≥ 16; c = 8w + 1.5w(16 − l) for l ≤ 16.
71. Overtime 2
- The problem is that we have different functions for the boundary of the constraint set depending on the level of l. The actual problem we are solving has either the first constraint OR the second constraint; if we tried solving the problem by maximising U s.t. both constraints for all l, then we would solve the wrong problem. (See figures in notes.)
- To solve the problem, note that the complement of the constraint set is convex: c ≥ w(24 − l) for l ≥ 16; c ≥ 8w + 1.5w(16 − l) for l ≤ 16.
- So consider the constraint set given by (c − w(24 − l))(c − 8w − 1.5w(16 − l)) ≤ 0. (See figure in notes.)
72. Overtime 3
- Then, without harm, we could rewrite the problem as: max_{c,l} c^α l^β s.t. c ≥ 0, l ≥ 0, 24 − l ≥ 0, −(c − w(24 − l))(c − 8w − 1.5w(16 − l)) ≥ 0.
- Note that this is not identical to the original problem (it omits the bottom-left area), but we can clearly argue that the difference is harmless, since the omitted area is dominated by points in the allowed area.
- Note that if x* solves max f(x), it solves max g(f(x)) where g(.) is a monotonic increasing transformation.
- So let's max log(c^α l^β) = α log(c) + β log(l) instead, s.t. the same constraints.
- This gives the Lagrangean L = α log(c) + β log(l) + µ(24 − l) − λ(c − w(24 − l))(c − (8w + 1.5w(16 − l))).
73. Overtime 4
- We can use economic and mathematical logic to simplify the problem. First, note that since the derivative of the log function is infinite at c = 0 or l = 0, these clearly can't be solutions, so µ = 0 at any optimum and we can ignore the CS conditions on c and l.
- So rewrite the Lagrangean dropping the µ term: L = α log(c) + β log(l) − λ(c − w(24 − l))(c − (8w + 1.5w(16 − l))).
- Now let's look at the FOCs.
74Overtime 5
- FOCs (with respect to c, l, and λ):
α/c - λ[(c - w(24-l)) + (c - 8w - 1.5w(16-l))] = 0
β/l - λ[1.5w(c - w(24-l)) + w(c - 8w - 1.5w(16-l))] = 0
-(c - w(24-l))(c - 8w - 1.5w(16-l)) ≥ 0
- CS condition: λ ≥ 0, λ(c - w(24-l))(c - 8w - 1.5w(16-l)) = 0,
noting that the equalities occur
in the first two FOCs because we argued that the non-negativity
constraints for c and l don't bind.
75Overtime 6
- If l and c were such that the constraint in the λ FOC were
strictly slack (product strictly negative), we must have λ = 0 by CS, but
this makes the first two FOCs impossible to
satisfy. So (c - 8w - 1.5w(16-l)) = 0 and/or
(c - w(24-l)) = 0. In other words, we can't have an
interior solution to the problem (which is good,
since interior points are clearly dominated). - Case 1: (c - w(24-l)) = 0 (no overtime worked). From the
first two FOCs, we get αwl = βc, which with
c = 24w - wl gives us c = 24wα/(α+β). - Case 2: (c - 8w - 1.5w(16-l)) = 0 (overtime). From the
first two FOCs, we get 3αwl = 2βc, which we can
combine with c = 8w + 1.5w(16-l) to get an
expression for c in terms of parameters. - The actual solution depends on the particular parameters
of the utility function (graphically it could be
either).
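The two cases can be compared numerically. The sketch below uses assumed parameter values α = 1, β = 2.5, w = 10 (chosen so that each interior candidate lies in its own region) and checks the closed-form candidates against a brute-force grid search:

```python
alpha, beta, w = 1.0, 2.5, 10.0          # illustrative parameters

def consumption(l):
    return w*(24 - l) if l >= 16 else 8*w + 1.5*w*(16 - l)

def utility(l):
    return consumption(l)**alpha * l**beta

# Case 1 (no overtime): alpha*w*l = beta*c with c = w(24 - l)
l1 = 24*beta / (alpha + beta)            # ~17.14, valid since l1 >= 16
# Case 2 (overtime): 3*alpha*w*l = 2*beta*c with c = 32w - 1.5wl
l2 = 64*beta / (3*(alpha + beta))        # ~15.24, valid since l2 <= 16

# Compare the two local optima, and verify against a fine grid over l
best = max(utility(0.01*i) for i in range(1, 2400))
print(l1, l2, utility(l1) > utility(l2))
```

With these parameters the no-overtime optimum wins, but a higher taste for consumption (larger α) tips the comparison the other way, as the slide notes.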
76The cost minimization problem
- Cost minimization problem: what is the cheapest
way to produce at least y output from inputs x at
input price vector w. - C(y,w) = -max_x -w·x s.t. f(x) ≥ y, y ≥ 0, x ≥ 0.
- If f(x) is a concave function, then the set
{x | f(x) ≥ y} is a convex set (since this is an
upper contour set). - To show that C(y,w) is convex in y: consider any
two levels of output y′, y″ and define
y_t = ty′ + (1-t)y″ (i.e. a convex combination).
77Convexity of the cost function
- Let x′ be a solution to the cost minimization
problem for y′, x_t for y_t, and x″ for y″. - Concavity of f(x) implies f(tx′ + (1-t)x″) ≥ tf(x′) + (1-
t)f(x″). - Feasibility implies f(x′) ≥ y′, f(x″) ≥ y″.
- Together these imply f(tx′ + (1-t)x″) ≥ tf(x′) + (1-t)f(x″
) ≥ ty′ + (1-t)y″ = y_t - So the convex combination tx′ + (1-t)x″ is feasible
for y_t.
78Convexity of the cost fn 2
- By definition, C(y′,w) = w·x′, C(y″,w) = w·x″, C(y_t,w) = w·x_t - But C(y_t,w) = w·x_t ≤ w·(tx′ + (1-t)x″) = tw·x′ + (1-t)w·x″
= tC(y′,w) + (1-t)C(y″,w), where the inequality
comes since x_t solves the problem for y_t. - So C(.) is convex in y.
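As a concrete check (an assumed example, not from the notes): for the single-input technology f(x) = √x, producing y requires x = y² units of input, so C(y,w) = w·y², and the convexity inequality is easy to verify at a midpoint:

```python
w = 2.0                      # illustrative input price

def C(y):                    # f(x) = sqrt(x)  =>  x = y^2  =>  cost = w*y^2
    return w * y**2

t, ya, yb = 0.3, 1.0, 4.0
yt = t*ya + (1 - t)*yb       # convex combination of two output levels
print(C(yt), t*C(ya) + (1 - t)*C(yb))   # 19.22 <= 23.0: C is convex in y
```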
79Implicit functions (SB easier than lecture notes)
- So far we have been working only with functions
in which the endogenous variables are explicit
functions of the exogenous or independent
variables, i.e. y = F(x1,x2,…,xn) - This is not always the case; frequently we have
economic situations with exogenous variables
mixed in with endogenous variables: G(x1,x2,…,xn,y)
= 0 - If for each x vector this equation determines a
corresponding value of y, then this equation
defines an implicit function of the exogenous
variables x. - Sometimes we can solve the equation to write y as
an explicit function of x, but sometimes this is
not possible, or it is easier to work with the
implicit function.
80Implicit functions 2
- 4x + 2y = 5 expresses y as an implicit function
of x. Here we can easily solve for the explicit
function. - y² - 5xy + 4x² = 0 expresses y implicitly in terms of
x. Here we can also solve for the explicit
relationship using the quadratic formula, but it
is a correspondence, not a function: y = 4x OR
y = x. - y⁵ - 5xy + 4x² = 0 cannot be solved into an explicit
function, but still implicitly defines y in terms
of x. E.g. x = 0 implies y = 0; x = 1 implies y = 1.
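These examples can be verified directly; the sketch below checks that both explicit branches of the second equation, and the quoted points on the third, satisfy the implicit relations:

```python
def G(x, y):                 # y^2 - 5xy + 4x^2 = (y - x)(y - 4x)
    return y**2 - 5*x*y + 4*x**2

for x in [-2.0, 0.5, 3.0]:
    assert G(x, x) == 0      # branch y = x
    assert G(x, 4*x) == 0    # branch y = 4x

def H(x, y):                 # y^5 - 5xy + 4x^2: no explicit solution
    return y**5 - 5*x*y + 4*x**2

print(H(0, 0), H(1, 1))      # both 0: the points lie on the implicit curve
```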
81Implicit functions 3
- Consider a profit-maximizing firm that uses a
single input x at a cost of w per unit to make a
single output y using technology y = f(x), and
sells the output for p per unit. Profit function:
π(x) = pf(x) - wx. FOC: pf′(x) - w = 0 - Think of p and w as exogenous variables. For
each choice of p and w, the firm will choose a
value of x that satisfies the FOC. To study
profit-maximising behaviour in general, we need
to work with this FOC defining x as an implicit
function of p and w. - In particular, we will want to know how the
choice of x changes in response to changes in p
and w.
82Implicit functions 4
- An implicit function (or correspondence) of y in
terms of x does not always exist, even if we can
write an equation of the form G(x,y) = c. E.g.
x² + y² = 1. When |x| > 1 there is no y that satisfies
this equation. So there is no implicit function
mapping x's greater than 1 into y's. - We would like to have general conditions
telling us when an implicit function exists.
83Implicit functions 5
- Consider the problem max_{x≥0} f(x; q) s.t.
G(x; q) ≥ 0, where q is some k-dimensional vector of
exogenous real numbers. - Call a solution to this problem x(q), and the
value the solution attains V(q) = f(x(q); q). - Note that x(q) may not be unique, but V(q) is
still well-defined (i.e. there may be multiple x's
that maximise the function, but they all give the
same value; otherwise some wouldn't solve the
maximisation problem). - Interesting question: how do V and x change with
q? - We have implicitly defined functions mapping q's
to V's.
84Implicit functions 6
- The problem above really describes a family of
optimization problems: each different value of
the q vector yields a different member of the
family (i.e. a different optimization problem). - The FOCs from KT suggest that it will be useful
to be able to solve, in general, systems of
equations of the form T(z, q) = 0 (why? Because the FOCs
constitute such a system.) - E.g. finding the equation for a level set is to
find z(q) such that T(z(q), q) - c = 0. Here, z(q) is
an implicit function - As noted previously, not all systems provide
implicit functions. Some give correspondences,
or give situations where no mapping x(q) exists. - The implicit function theorem tells us when it is
possible to find an implicit function from a
system of equations.
85Implicit function theorem (for system of
equations)
- Let T: R^{k+p} → R^k be C¹. Suppose that T(z*, q*) = 0.
If the k×k matrix formed by stacking the k
gradient vectors (w.r.t. z) of T1, T2, …, Tk is
invertible (or equivalently has full rank, or is
non-singular), then there exist k C¹ functions
z1(q), …, zk(q), each mapping R^p → R, such that z1(q*) = z1*,
z2(q*) = z2*, …, zk(q*) = zk* and T(z(q), q) = 0 for
all q in B_ε(q*) for some ε > 0.
86IFT example
- Consider the utility maximisation problem max_{x
in R^n} U(x) s.t. p·x = I, with U strictly quasi-concave,
DU(x) > 0, and ∂U/∂xi → ∞ as xi → 0. - We know a solution to this problem satisfies xi >
0 (because of the ∂U/∂xi → ∞ condition) and I - p·x = 0 (because
DU(x) > 0), and the FOCs: ∂U/∂x1 - λp1 = 0, …, ∂U/∂xn - λpn = 0,
I - p·x = 0
87IFT example contd
- This system of equations maps from the space
R^{2n+2} (because x and p are n×1, and λ and I are
scalars) to the space R^{n+1} (the number of
equations). - To apply the IFT, set z = (x, λ), q = (p, I). Create
a function T: R^{2n+2} → R^{n+1} given by the stacked FOCs above.
88IFT example contd
- If this function T is C¹ and if the (n+1)×(n+1)
matrix of derivatives of T (w.r.t. x and λ) is
invertible, then by the IFT we know that there
exist n+1 C¹ functions x1(p,I), x2(p,I), …,
xn(p,I), λ(p,I) s.t. T(x1(p,I), x2(p,I), …,
xn(p,I), λ(p,I)) = 0 for all (p,I) in a
neighborhood of a given price-income vector
(p*,I*). - I.e., the IFT gives us the existence of
continuously differentiable consumer demand
functions.
89Theorem of the maximum
- Consider the family of Lagrangian problems V(q) =
max_x f(x; q) s.t. G(x; q) = 0. This can be
generalized to KT by restricting attention only
to constraints that are binding at a given
solution. - Define the function T: R^{n+m+p} → R^{n+m} by stacking the
FOCs of the Lagrangean (see notes).
90Theorem of the Maximum 2
- The FOCs for this problem at an optimum are
represented by T(x*, λ*; q) = 0. We want to know
about defining the solutions to the problem, x*
and λ*, as functions of q. - The IFT already tells us when we can do this: if the
(n+m)×(n+m) matrix constructed by taking the
derivative of T w.r.t. x and λ is invertible, then
we can find C¹ functions x(q) and λ(q) s.t.
T(x(q), λ(q); q) = 0 for q in a neighborhood
of q*. - I.e. we need the matrix below to have full rank
91Theorem of the Maximum 3
- Suppose the Lagrange problem above satisfies the
conditions of the implicit function theorem at
(x(q*), q*). If f is C¹ at (x(q*), q*), then V(q) is
C¹ at q*. - Thus, small changes in q around q* will cause
small changes in V(q) around V(q*).
92Envelope Theorem
- Applying the IFT to our FOCs means we know (under
conditions) that the x(q) that solves our FOCs exists
and is C¹, and that V(.) is C¹. - The envelope theorem tells us how V(q) changes in
response to changes in q. - The basic answer from the ET is that all we need
to do is look at the direct partial derivative of
the objective function (or of the Lagrangian for
constrained problems) with respect to q. - We do not need to reoptimise and pick out
different x(q) and λ(q), because the fact that we
were at an optimum means these partial derivatives are
already zero.
93Envelope theorem 2
- Consider the problem max_x f(x; q) s.t.
G(x; q) = 0, G: R^n → R^m, where q is a p-dimensional
vector of exogenous variables. Assume that, at a
solution, the FOCs hold with equality and that we
can ignore the CS conditions. (Or assume that we
only include constraints that bind at the
solution in G().) - Suppose that the problem is well behaved, so that
at a particular value q*, the solutions
x(q), λ(q) are C¹ and V(q) = f(x(q); q) is
C¹. (Note that we could get these from the IFT
and the Theorem of the Maximum.)
94Envelope theorem 3
- Suppose the problem above satisfies the
conditions of the IFT at x(q*). If f is C¹ at
(x(q*), q*), then ∂V(q*)/∂qi = ∂L(x(q*), λ(q*); q*)/∂qi,
i.e. the derivative of the
value function V(q) is equal to the (direct) partial derivative of
the Lagrangean.
95Envelope theorem 4
- So, to determine how the value function changes,
we merely need to look at how the objective
function and constraint functions change with q
directly. - We do not need to include the impact of changes
in the optimization variables x and λ, because we
have already optimized L(x, λ; q) with respect to
these. - So, for an unconstrained optimization problem,
the effect on V(.) is just the partial derivative of the
objective function. - For a constrained optimization problem, we also
need to add in the effect on the constraint.
Changing q could affect the constraint (relaxing
or tightening it), which we know has shadow value
λ. - Proof is in lecture notes.
96Envelope theorem example
- Consider a problem of the form max_x f(x) s.t.
q - g(x) ≥ 0 - Thus, as q gets bigger, the constraint is easier
to satisfy. What would we gain from a small
increase in q, and thus a slight relaxation of
the constraint? - The Lagrangian is L(x, λ; q) = f(x) + λ(q - g(x))
- The partial derivative of the Lagrangian w.r.t. q is λ.
Thus, dV(q)/dq = λ. - A small increase in q increases the value by
λ. Thus, the Lagrange multiplier is the shadow
price: it describes the price of relaxing the
constraint. - If the constraint does not bind, λ = 0 and dV(q)/dq
= 0.
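A numerical illustration (with an assumed objective f(x) = -(x-3)² and constraint g(x) = x, so the constraint binds for q < 3) confirms that the multiplier equals the derivative of the value function:

```python
# max -(x-3)^2  s.t.  q - x >= 0; for q < 3 the constraint binds, x*(q) = q
def V(q):
    return -(q - 3)**2

q = 1.0
lam = -2*(q - 3)                         # from the FOC f'(x*) - lam = 0
h = 1e-6
dVdq = (V(q + h) - V(q - h)) / (2*h)     # central finite difference
print(lam, dVdq)                         # both approx 4.0
```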
97Envelope theorem example 2
- We can use the envelope theorem to show that in
the consumer max problem, λ is the marginal
utility of income. - Consider the cost min problem: C(y,w) = -max_x
-w·x s.t. f(x) - y ≥ 0. - The Lagrangian is L(x, λ; y, w) = -w·x +
λ(f(x) - y). Denote the optimal solution
x(y,w). - From the ET, we get ∂C(y,w)/∂wi = xi(y,w)
98ET example 2, contd
- This is known as Shephard's lemma: the partial
derivative of the cost function with respect to
wi is just xi, the demand for factor i. - Also note that ∂xi/∂wj = ∂²C/∂wj∂wi = ∂²C/∂wi∂wj = ∂xj/∂wi,
i.e. the change in demand for
factor i with respect to a small change in the price
of factor j is equal to the change in demand for
factor j in response to a small change in the
price of factor i.
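For a concrete (assumed) technology f(x1,x2) = √(x1·x2), the cost function is C(y,w) = 2y√(w1·w2) and the conditional factor demand for input 1 is x1 = y√(w2/w1); a finite-difference check of Shephard's lemma:

```python
import math

def C(y, w1, w2):                 # cost function for f(x) = sqrt(x1*x2)
    return 2*y*math.sqrt(w1*w2)

def x1(y, w1, w2):                # conditional demand for factor 1
    return y*math.sqrt(w2/w1)

y, w1, w2, h = 2.0, 4.0, 9.0, 1e-6
dC_dw1 = (C(y, w1 + h, w2) - C(y, w1 - h, w2)) / (2*h)
print(dC_dw1, x1(y, w1, w2))      # both approx 3.0 (Shephard's lemma)
```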
99Correspondences
- A correspondence is a transformation that maps a
vector space into collections of subsets of
another vector space. - E.g. a correspondence F: R^n ⇒ R takes any n-
dimensional vector and gives as its output a
subset of R. If this subset has only one
element for every input vector, then the
correspondence is also a function. - Examples of correspondences: the solution to the cost
minimization problem, or the utility maximization
problem.
100Correspondences 2
- A correspondence F is bounded if for all x and
for all y in F(x), the size of y is bounded.
That is, ‖y‖ ≤ M for some finite M. For bounded
correspondences we have the following
definitions. - A correspondence F is convex-valued if for all x,
F(x) is a convex set. (All functions are
convex-valued correspondences.) - A correspondence F is upper hemi-continuous at a
point x if for all sequences {xn} that converge
to x, and all sequences {yn} with yn in
F(xn) that converge to some y, we have y in F(x). - For bounded correspondences, if a correspondence
is uhc for all x, then its graph is a closed set.
101Correspondences 3
- A correspondence F is lower hemi-continuous at a
point x if, for all sequences {xn} that converge
to x and for all y in F(x), there exists a
sequence {yn} s.t. yn is in F(xn) and the
sequence converges to y. - See figure in notes.
102Fixed point theorems
- A fixed point of a function f: R^n → R^n is a point x*
such that x* = f(x*). A fixed point of a
correspondence F: R^n ⇒ R^n is a point x* such that x*
is an element of F(x*). - Solving a set of equations can be described as
finding a fixed point. (Suppose you are finding
x to solve f(x) = 0. Then you are looking for a
fixed point of the function g(x) = x +
f(x), since for a fixed point x* of g, x* =
g(x*) = x* + f(x*), so f(x*) = 0.) - Fixed points are crucial in proofs of existence
of equilibria in GE and in games.
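The root-to-fixed-point translation is easy to check; with the assumed example f(x) = x² - 2, the root √2 of f is exactly a fixed point of g(x) = x + f(x):

```python
def f(x):
    return x**2 - 2              # root at sqrt(2)

def g(x):
    return x + f(x)              # fixed point of g  <=>  root of f

r = 2**0.5
print(g(r) - r)                  # approx 0: r is a fixed point of g
```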
103Fixed point theorems
- If f: R → R, then a fixed point of f is any point
where the graph of f crosses the 45 degree line
(i.e. the line f(x) = x). - A function can have many fixed points, a unique
fixed point, or none at all. - When can we be sure that a function possesses a
fixed point? We use fixed point theorems.
104Brouwer fixed point theorem
- Suppose f: R^n → R^n and for some convex, compact set
C (a subset of R^n) f maps C into itself
(i.e. if x is in C, then f(x) is in C). If f is
continuous, then f possesses a fixed point. The conditions are: - Continuity
- Convexity of C
- Compactness of C
- f maps C into itself.
105Kakutani fixed point theorem
- Suppose F: R^n ⇒ R^n is a convex-valued
correspondence, and for some convex, compact set C
in R^n, F maps C into itself (i.e. if x is in C,
then F(x) is a subset of C). If F is upper
hemicontinuous, then F possesses a fixed point. - These FPTs give existence. To get uniqueness we
need something else.
106Contraction mappings
- Suppose f: R^n → R^n is such that ‖f(x) - f(y)‖ ≤ β
‖x - y‖ for some β < 1 and for all x, y. Then f
is a contraction mapping. - Let C[a,b] be the set of all continuous functions
f: [a,b] → R with the sup-norm metric ‖f‖ = max_{x
in [a,b]} |f(x)|. Suppose T: C → C (that is, T takes a
continuous function, does something to it, and
returns a new, possibly different continuous
function). If, for all f, g in C, ‖Tf - Tg‖ ≤ β
‖f - g‖ for some β < 1, then T is a contraction
mapping.
107Contraction mapping theorem
- If f or T (as defined above) is a contraction
mapping, it possesses a unique fixed point x*.
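A standard illustration (not from the notes): f(x) = cos(x) is a contraction on [0,1], since |f′(x)| = |sin x| ≤ sin(1) < 1 there, so iterating f from any starting point in [0,1] converges to the unique fixed point:

```python
import math

x = 0.5
for _ in range(100):             # iterate x_{n+1} = cos(x_n)
    x = math.cos(x)
print(x)                         # approx 0.739085, the unique fixed point
```

This successive-approximation scheme is exactly how the contraction mapping theorem is used constructively, e.g. in value function iteration.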
108Dynamic optimisation
- Up to now, we have looked at static optimisation
problems, where agents select variables to
maximise a single objective function. - Many economic models, particularly in
macroeconomics (e.g. saving and investment
behaviour), use dynamic models, where agents make
choices each period that affect their potential
choices in future periods, and often maximise a
total objective function equal to the
(discounted) sum of payoffs in each period. - Much of the material in the notes is focused on
differential and difference equations (lectures
1-4), but we will attempt to spend more time on
lectures 5-6, which are the focus of most dynamic
models.
109Ordinary differential equations
- Differential equations are used to model
situations which treat time as a continuous
variable (as opposed to discrete periods,
where we use difference equations). - An ordinary differential equation is an
expression which describes a relationship between
a function of one variable and its derivatives. - Formally: F(x^(m)(t), …, x′(t), x(t), t; θ) = 0, where θ is a vector of
parameters and F is a function R^{m+1+p} → R
110Ordinary differential equations 2
- The solution is a function x(t) that, together
with its derivatives, satisfies this equation. - This is an ordinary differential equation because
x is a function of one argument, t, only. If it
were a function of more than one variable, we
would have a partial differential equation, which
we will not study here. - A differential equation is linear if F is linear
in x(t) and its derivatives. - A differential equation is autonomous if t does
not appear as an independent argument of F, but
enters through x only. - The order of a differential equation is the order
of the highest derivative of x that appears in it
(i.e. order m above).
111First order differential equation
- Any differential equation can be reduced to a
first-order differential equation system by
introducing additional variables. - Consider x‴(t) = ax″(t) + bx′(t) + x(t). Define
y(t) = x′(t), z(t) = x″(t). - Then y′(t) = x″(t) = z(t), z′(t) = x‴(t) = az(t) + by(t) + x(t).
- So we have a first-order system in (x, y, z).
112Particular and general solutions
- A particular solution to a differential equation
is a differentiable function x(t) that satisfies
the equation for some subinterval I0 of the
domain of definition of t, I. - The set of all solutions is called the general
solution, xg(t). - To see that the solution to a differential
equation is generally not unique, consider x′(t)
= 2x(t). One solution is x(t) = e^{2t}. But for any
constant c, x(t) = ce^{2t} is also a solution. - The non-uniqueness problem can be overcome by
augmenting the differential equation with a
boundary condition x(t0) = x0.
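The boundary condition pins down a unique solution, which can be checked numerically; a simple Euler scheme (an illustrative sketch, step size chosen arbitrarily) for x′(t) = 2x(t) with x(0) = x0 reproduces x0·e^{2t}:

```python
import math

x0, T, n = 1.5, 1.0, 100000      # initial condition, horizon, Euler steps
dt = T / n
x = x0
for _ in range(n):               # Euler step: x_{k+1} = x_k + dt * 2 * x_k
    x += dt * 2 * x
print(x, x0 * math.exp(2*T))     # Euler approximation vs exact solution
```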
113Boundary value problems
- A boundary value problem is defined by a
differential equation x′(t) = f[t, x(t)] and a
boundary condition x(t0) = x0, where (x0, t0) is an
element of X × I - Under some conditions, every boundary value
problem has a unique solution. - Fundamental Existence-Uniqueness theorem: Let f
be C¹ in some neighborhood of (x0, t0). Then in
some subinterval I0 of I containing t0 there is a
unique solution to the boundary value problem.
114Boundary value problems 2
- If f is not C¹ in some neighborhood of (x0, t0),
the solution may not be unique. Consider x′(t) =
3x(t)^{2/3}, x, t in R, x(0) = 0. Both x(t) = t³ and
x(t) = 0 are solutions. Note f(x) = 3x^{2/3} is
not differentiable at x = 0. - The solution may not exist globally. Consider x′
(t) = x(t)², x, t in R, x(0) = 1. x(t) = 1/(1-t)
is a solution, but is only defined for t in (-∞, 1).
115Steady states and stability
- When using continuous time dynamic models, we are
often interested in the long-run properties of
the differential equation. - In particular, we are interested in the
properties of its steady state (our equilibrium
concept for dynamic systems, where the system
remains unchanged over time), and
whether or not the solution eventually converges
to the steady state (i.e. is the equilibrium
stable; will we return there after shocks). - We can analyze the steady state without having to
find an explicit solution for the differential
equation.
116Steady states and stability 2
- Consider the autonomous differential
equation x′(t) = f[x(t)] - A steady state is a point x̄ such that f(x̄) = 0
- Phase diagrams illustrate this.
- Steady states may not exist, may not be unique,
and may not be isolated. - Stability: consider an equation that is initially
at rest at an equilibrium point x̄, and suppose
that some shock causes a deviation from x̄. We
want to know if the equation will return to the
steady state (or at least remain close to it), or
if it will get farther and farther away over time.
117Steady states and stability 3
- Let x̄ be an isolated (i.e. locally unique) steady
state of the autonomous differential
equation x′(t) = f[x(t)] - We say that x̄ is stable if for any ε > 0, there
exists δ in (0, ε] such that ‖x(t0) - x̄‖ < δ implies
‖x(t) - x̄‖ < ε for all t > t0; i.e. any solution
x(t) that at some point enters a ball of radius δ
around x̄ remains within a ball of (possibly
larger) radius ε forever after.
118Steady states and stability 4
- A steady state is asymptotically stable if it is
stable AND δ can be chosen in such a way that any
solution that satisfies ‖x(t0) - x̄‖ < δ for some t0 will also
satisfy x(t) → x̄ as t → ∞ - That is, any solution that gets sufficiently
close to x̄ not only remains nearby but converges
to x̄ as t → ∞.
119Phase diagrams: arrows of motion
- The sign of x′(t) tells us the direction
in which x(t) is moving (see diagram). - x′(t) > 0 implies that x(t) is increasing (arrows
of motion point right). - x′(t) < 0 implies that x(t) is decreasing (arrows
of motion point left). - Thus x̄1 and x̄3 in the diagram are locally
asymptotically stable; x̄2 is unstable. - x̄1 in the second diagram (see notes) is globally
asymptotically stable.
120Phase diagrams: arrows of motion 2
- We can conclude that if for all x in some
neighborhood of a steady state x̄: - x(t) < x̄ implies x′(t) > 0 AND x(t) > x̄ implies
that x′(t) < 0, then x̄ is asymptotically stable. - x(t) < x̄ implies x′(t) < 0 and x(t) > x̄ implies
x′(t) > 0, then x̄ is unstable. - Therefore, we can determine the stability
property of a steady state by checking the sign
of the derivative of f at x̄. - x̄ is (locally) asymptotically stable if f′(x̄) < 0
- x̄ is unstable if f′(x̄) > 0
- If f′(x̄) = 0, then we don't know.
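A quick simulation (with the assumed example f(x) = x(1 - x), steady states at 0 and 1) illustrates the derivative test: f′(0) = 1 > 0 so 0 is unstable, while f′(1) = -1 < 0 so 1 is asymptotically stable:

```python
def f(x):
    return x*(1 - x)             # steady states at x = 0 and x = 1

def simulate(x0, T=20.0, dt=0.01):
    """Euler-integrate x'(t) = f(x(t)) from x0 for T units of time."""
    x = x0
    for _ in range(int(T/dt)):
        x += dt*f(x)
    return x

print(simulate(0.01))            # drifts away from 0, toward 1
print(simulate(1.2))             # returns toward the stable state 1
```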
121Grobman-Hartman theorem
- Let x̄ be a steady state of our standard
autonomous differential equation x′(t) = f[x(t)] - We say that x̄ is a hyperbolic equilibrium if
the Jacobian Df(x̄) has no eigenvalues with zero real
part (in the scalar case, f′(x̄) ≠ 0). - The previous analysis suggests we can study the
stability properties of a nonlinear differential
equation by linearizing it, as long as the
equilibrium is hyperbolic. - Theorem: If x̄ is a hyperbolic equilibrium of the
autonomous differential equation above, then
there is a neighborhood U of x̄ such that the
equation is topologically equivalent to the
linear equation x′(t) = Df(x̄)(x(t) - x̄) in U. (Note that this is a
first-order Taylor series approximation of f
around x̄.) (See notes)