Title: Human Growth: From data to functions
1 Human Growth: From data to functions
2 Challenges to measuring growth
- We need repeated and regular access to subjects for up to 20 years.
- Height changes over the day, and must be measured at a fixed time.
- Height is measured in supine position in infancy, followed by standing height. The change involves an adjustment of about 1 cm.
- Measurement error is about 0.5 cm in later years, but is rather larger in infancy.
3 Challenges to functional modeling
- We want smooth curves that fit the data as well as is reasonable.
- We will want to look at velocity and acceleration, so we want to differentiate twice and still have a smooth result.
- In principle the curves should be monotone, i.e., have a positive derivative.
4 The monotonicity problem
- The tibia of a newborn measured daily shows us that over the short term growth takes place in spurts.
- This baby's tibia grows as fast as 2 mm/day!
- How can we fit a smooth monotone function?
5 Weighted sums of basis functions
- We need a flexible method for constructing curves to fit the data.
- We begin with a set of basic functional building blocks φk(t), called basis functions.
- Our fitting function x(t) is a weighted sum of these basis functions.
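The formula that accompanied this slide did not survive in the transcript. In the usual notation, and assuming K basis functions with weights (coefficients) c_k, neither of which is named in the transcript, the weighted sum can be written as:

```latex
x(t) = \sum_{k=1}^{K} c_k \, \phi_k(t)
```

Fitting the data then amounts to choosing the coefficients c_k.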
6 What are the main choices for basis functions?
- Fourier series consist of:
- a constant term,
- a sine/cosine pair of fixed frequency, and
- a series of sine/cosine pairs with integer multiples of the base frequency.
- Fourier series are best for periodic data.
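The recipe above can be sketched in a few lines of Python (used here for illustration instead of the slides' Matlab toolbox). The function names, the `period` argument, and the lack of any normalization are assumptions of this sketch; toolbox implementations may scale the functions differently.

```python
import math

def fourier_basis(t, n_basis, period):
    """Evaluate the first n_basis (unnormalized) Fourier basis functions at t."""
    omega = 2.0 * math.pi / period      # base frequency
    values = [1.0]                      # constant term
    harmonic = 1
    while len(values) < n_basis:
        # sine/cosine pair at an integer multiple of the base frequency
        values.append(math.sin(harmonic * omega * t))
        if len(values) < n_basis:
            values.append(math.cos(harmonic * omega * t))
        harmonic += 1
    return values

def weighted_sum(coefs, basis_values):
    """x(t) as a weighted sum of basis function values."""
    return sum(c * b for c, b in zip(coefs, basis_values))

# Five basis functions: constant, sin/cos at the base frequency,
# sin/cos at twice the base frequency.
values_at_zero = fourier_basis(0.0, 5, period=1.0)
```

Every function in this basis repeats after one period, which is why the system suits periodic data and is a poor match for growth curves.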
7 Five Fourier basis functions
8 B-splines
- These basis functions are piecewise polynomials defined by a set of discrete values called knots.
- The order of the polynomials (degree + 1) controls their smoothness.
- Each basis function is nonzero only over a number of contiguous inter-knot intervals at most equal to the order.
- Polynomials are a special type of B-spline, and are thus included within the system.
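These properties can be checked with a small Python sketch of the Cox-de Boor recursion. This is a textbook construction and is not taken from the slides' Matlab toolbox; the function name and the convention of repeating each boundary knot `order` times are assumptions.

```python
def bspline_basis(t, order, breaks):
    """Values at t of all B-spline basis functions of the given order.

    breaks: increasing break points; the first and last are the boundaries,
    the rest are interior knots.
    """
    # Assumed convention: each boundary knot appears `order` times in total.
    knots = [breaks[0]] * (order - 1) + list(breaks) + [breaks[-1]] * (order - 1)

    # Order 1: indicator functions of the knot intervals.
    values = [0.0] * (len(knots) - 1)
    for i in range(len(knots) - 1):
        if knots[i] <= t < knots[i + 1]:
            values[i] = 1.0
    if t == breaks[-1]:
        # Close the last nonempty interval on the right.
        values[len(breaks) + order - 3] = 1.0

    # Raise the order one step at a time (Cox-de Boor recursion).
    for k in range(2, order + 1):
        new = [0.0] * (len(knots) - k)
        for i in range(len(knots) - k):
            left_den = knots[i + k - 1] - knots[i]
            right_den = knots[i + k] - knots[i + 1]
            left = (t - knots[i]) / left_den * values[i] if left_den > 0 else 0.0
            right = (knots[i + k] - t) / right_den * values[i + 1] if right_den > 0 else 0.0
            new[i] = left + right
        values = new
    return values  # length = order + number of interior knots

# Order 4 with four interior knots gives eight basis functions.
cubic_values = bspline_basis(2.5, 4, [0, 1, 2, 3, 4, 5])
# Order 2 with three interior knots gives five "hat" functions.
hat_values = bspline_basis(1.0, 2, [0, 1, 2, 3, 4])
```

At any point inside the range only `order` of the functions are nonzero, and the values sum to one; that local support is what keeps computation with B-splines cheap.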
9 When should I use B-splines?
- B-splines are the basis of choice for most non-periodic data.
- They give complete control over flexibility, allowing more flexibility where needed and less where not needed.
- Computing with B-splines is extremely efficient.
10 Five order 2 B-spline basis functions: a basis for polygonal lines
11 Eight order 4 B-spline basis functions: a basis for twice-differentiable functions
12 B-splines for growth data
- We use order 6 B-splines because we want to differentiate the result at least twice. Order 4 splines look smooth, but their second derivatives are rough.
- We place a knot at each of the 31 ages.
- The total number of basis functions = order + number of interior knots. The first and last ages are the boundaries, leaving 29 interior knots, so 6 + 29 = 35 in this case.
13 Isn't using 35 basis functions to fit 31 observations a problem?
- Yes. We will fit each observation exactly.
- This will ignore the fact that the measurement error is typically about 0.5 cm.
- But we'll fix this up later, when we look at roughness penalties.
14 Okay, let's see what happens
- These two Matlab commands define the basis and fit the data:
- hgtbasis = create_bspline_basis([1,18], 35, 6, age)
- hgtfd = data2fd(hgtfmat, age, hgtbasis)
15 Why we need to smooth
- Noise in the data has a huge impact on derivative estimates.
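A back-of-the-envelope Python sketch shows the size of the effect. The ages, the steady 5 cm/year growth rate, and the alternating error pattern are hypothetical; only the 0.5 cm error size comes from the slides.

```python
def velocities(ages, heights):
    """First differences: change in height per unit age."""
    return [(heights[i + 1] - heights[i]) / (ages[i + 1] - ages[i])
            for i in range(len(ages) - 1)]

ages = [10.0, 10.25, 10.5, 10.75, 11.0]                   # hypothetical quarterly visits
true_heights = [140.0 + 5.0 * (a - 10.0) for a in ages]   # steady 5 cm/year
errors = [0.5, -0.5, 0.5, -0.5, 0.5]                      # measurement error of 0.5 cm
noisy_heights = [h + e for h, e in zip(true_heights, errors)]

true_velocity = velocities(ages, true_heights)    # 5 cm/year at every step
noisy_velocity = velocities(ages, noisy_heights)  # swings between 1 and 9 cm/year
```

Dividing a 1 cm swing in the errors by a 0.25-year interval turns a half-centimetre measurement error into a 4 cm/year velocity error, and differencing a second time to get acceleration magnifies it again.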
16 Please let me smooth the data!
- This command sets up 12 B-spline basis functions defined by equally spaced knots. This gives us about the right amount of fitting power given the error level.
- hgtbasis = create_bspline_basis([1,18], 12, 6)
17
- These velocities are much better.
- They go negative on the right, though.
18 Let's see some accelerations
- These acceleration curves are too unstable at the ends.
- We need something better.
19 A measure of roughness
- What do we mean by smooth?
- A function that is smooth has limited curvature.
- Curvature depends on the second derivative. A straight line is completely smooth.
20 Total curvature
- We can measure the roughness of a function x(t) by integrating its squared second derivative.
- The second derivative notation is D2x(t).
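The slide's formula is missing from the transcript. A sketch of the measure described above, where the label PEN_2 and the integration limits t_1 and t_n (taken to be the first and last ages observed) are assumptions:

```latex
\mathrm{PEN}_2(x) = \int_{t_1}^{t_n} \left[ D^2 x(t) \right]^2 \, dt
```

A straight line has D^2 x(t) = 0 everywhere, so its total curvature is zero.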
21 Total curvature of acceleration
- Since we want acceleration to be smooth, we measure roughness at the level of acceleration.
22 The penalized least squares criterion
- We strike a compromise between fitting the data and keeping the fit smooth.
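The criterion itself did not survive in the transcript. A sketch of its usual form, assuming observations y_j at ages t_j for j = 1, ..., n, a smoothing parameter λ ≥ 0, and roughness measured at the level of acceleration, that is, through the fourth derivative (the label PENSSE and the index notation are assumptions):

```latex
\mathrm{PENSSE}_{\lambda}(x) = \sum_{j=1}^{n} \left[ y_j - x(t_j) \right]^2
  + \lambda \int \left[ D^4 x(t) \right]^2 \, dt
```

The first term rewards closeness to the data; the second charges a price, scaled by λ, for roughness.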
23 How does this control roughness?
- Smoothing parameter λ controls roughness.
- When λ = 0, only fitting the data matters.
- But as λ increases, we place more and more emphasis on penalizing roughness.
- As λ → ∞, only roughness matters, and functions having zero roughness are used.
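The same trade-off can be watched in a discrete analogue, sketched here in plain Python. This is not the spline smoother behind the slides' Matlab commands: squared second differences stand in for the integrated squared derivative, the data are hypothetical, and all function names are made up for illustration.

```python
def second_difference_matrix(n):
    """Rows (1, -2, 1): a discrete stand-in for the second derivative."""
    rows = []
    for i in range(n - 2):
        row = [0.0] * n
        row[i], row[i + 1], row[i + 2] = 1.0, -2.0, 1.0
        rows.append(row)
    return rows

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x

def penalized_fit(y, lam):
    """Minimize sum (y_i - x_i)^2 + lam * sum (second differences of x)^2."""
    n = len(y)
    d = second_difference_matrix(n)
    # Normal equations: (I + lam * D'D) x = y
    a = [[(1.0 if i == j else 0.0) + lam * sum(row[i] * row[j] for row in d)
          for j in range(n)] for i in range(n)]
    return solve(a, y)

def roughness(x):
    """Sum of squared second differences."""
    return sum((x[i] - 2.0 * x[i + 1] + x[i + 2]) ** 2 for i in range(len(x) - 2))

y = [0.0, 1.2, 1.8, 3.3, 3.9, 5.1, 6.0]   # hypothetical noisy measurements
fits = {lam: penalized_fit(y, lam) for lam in (0.0, 1.0, 100.0, 1e6)}
```

With λ = 0 the fit reproduces the data; as λ grows the roughness of the fit shrinks toward zero, and in this discrete version the zero-roughness limit is a straight line.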
24
- We can either smooth at the data fitting step, or smooth a rough function.
- These Matlab commands smooth the fit to the data obtained using knots at the ages of measurement. The size of the fourth derivative is penalized, which controls the roughness of the acceleration.
- lambda = 0.01
- hgtfd = smooth_fd(hgtfd, lambda, 4)
25 Accelerations using a roughness penalty
- These accelerations are much less variable at the extremes.
26 The corresponding velocities
27 How did you choose λ?
- We smooth just enough to obtain tolerable roughness in the estimated curves (accelerations in this case), but not so much as to lose interesting variation.
- There are data-driven methods for choosing λ, but they offer only a reasonable place to begin exploring.
- But smoothing inevitably involves judgment.
28 What about monotonicity?
- The growth curves should be monotonic.
- The velocities should be non-negative.
- It's hard to prevent linear combinations of anything from breaking the rules.
- We need an indirect approach to constructing a monotonic model.
29 A differential equation for monotonicity
- Any strictly monotonic function x(t) must satisfy a simple linear differential equation: D2x(t) = w(t) Dx(t).
The reason is simple: because of strict monotonicity, the first derivative Dx(t) will never be 0, and the function w(t) is therefore simply D2x(t)/Dx(t).
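The step from this differential equation to its solution can be sketched as follows. The lower limit of integration t_0 and the constant C are notation assumed here, not taken from the slides:

```latex
\begin{aligned}
D^2 x(t) &= w(t)\, D x(t) \\
D \log \lvert D x(t) \rvert &= \frac{D^2 x(t)}{D x(t)} = w(t) \\
D x(t) &= C \exp\!\left[ \int_{t_0}^{t} w(v)\, dv \right], \qquad C = D x(t_0) \neq 0 \\
x(t) &= x(t_0) + C \int_{t_0}^{t} \exp\!\left[ \int_{t_0}^{u} w(v)\, dv \right] du
\end{aligned}
```

Because the exponential is always positive, Dx(t) keeps the sign of C whatever the function w(t) is, which is what makes this representation useful for monotone fitting.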
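30 The solution of the differential equation
- Consequently, any strictly monotonic function x(t) must be expressible in the form x(t) = β0 + β1 ∫ exp[∫ w(v) dv] du, where the inner integral runs from a fixed lower limit t0 to u and the outer integral from t0 to t.
This suggests that we transform the monotone smoothing problem into one of estimating the function w(t) and the constants β0 and β1.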
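A numerical Python sketch illustrates why this transformation helps: an unconstrained w(t), however wiggly, always yields a monotone x(t). The trapezoidal integration, the function name, and the example choices of w are assumptions made for illustration, not the estimation method used for the growth data.

```python
import math

def monotone_curve(w, beta0, beta1, t_grid):
    """x(t) = beta0 + beta1 * integral of exp(W(u)) du, with W(u) the integral of w.

    Both integrals start at t_grid[0] and use the trapezoidal rule.
    """
    inner = 0.0            # running inner integral W(u)
    outer = 0.0            # running outer integral of exp(W(u))
    x = [beta0]
    for i in range(1, len(t_grid)):
        h = t_grid[i] - t_grid[i - 1]
        previous_exp = math.exp(inner)
        inner += 0.5 * h * (w(t_grid[i - 1]) + w(t_grid[i]))
        outer += 0.5 * h * (previous_exp + math.exp(inner))
        x.append(beta0 + beta1 * outer)
    return x

grid = [0.01 * i for i in range(101)]          # t from 0 to 1

# w = 0 gives a straight line: x(t) = beta0 + beta1 * t.
line = monotone_curve(lambda t: 0.0, 2.0, 3.0, grid)

# A constant negative w gives growth that levels off.
levelling = monotone_curve(lambda t: -1.0, 0.0, 1.0, grid)

# Even an oscillating w cannot make x decrease when beta1 > 0.
wiggly = monotone_curve(lambda t: 5.0 * math.sin(20.0 * t), 0.0, 1.0, grid)
```

The constraint has moved out of the function being estimated: w(t) can be any linear combination of basis functions, and monotonicity of x(t) is guaranteed by construction.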
31 What we have learned
- B-spline bases are a good choice for fitting non-periodic functions; Fourier series are right for periodic situations.
- We can control smoothness by either using a restricted number of basis functions, or by imposing a roughness penalty.
- Roughness penalty methods generally work better.
- Differential equations can play a useful role when fitting constrained functions to data.
32 More information
- Ramsay & Silverman (1997, 2004), Chs. 3, 4, 13.
- Ramsay & Silverman (2002), Ch. 6.
- The long-term growth data are from the Berkeley growth study.
- The infant growth data were collected by Michael Hermanussen.
33 Where do we go from here?
- We need to look more systematically at how to smooth data.
- This involves deciding what basis function system to use.
- Splines are so important that we have to look at them in more detail.
- Here's a serious problem:
35 What's wrong with the mean?
- The cross-sectional mean is the heavy blue line.
- It has less amplitude variation than any single curve.
- The pubertal growth spurt for the mean lasts longer than that of any single curve.
- The problem is that we are averaging over curves in quite different stages of growth.
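The effect is easy to reproduce with synthetic curves. In this Python sketch three identical bell-shaped velocity bumps stand in for pubertal growth spurts that differ only in timing; the bump shape, its timings, and the function names are hypothetical and are not drawn from the Berkeley data.

```python
import math

def bump(t, center, spread=1.0):
    """Hypothetical velocity bump standing in for a pubertal growth spurt."""
    return math.exp(-0.5 * ((t - center) / spread) ** 2)

def duration_above_half_max(values, step):
    """Length of time a curve spends at or above half of its own maximum."""
    half = 0.5 * max(values)
    return step * sum(1 for v in values if v >= half)

step = 0.01
ages = [8.0 + step * i for i in range(1001)]      # ages 8 to 18
centers = [11.0, 13.0, 15.0]                      # assumed spurt timings
curves = [[bump(t, c) for t in ages] for c in centers]

# Cross-sectional mean: average the curves age by age.
mean_curve = [sum(vals) / len(vals) for vals in zip(*curves)]

mean_peak = max(mean_curve)
single_peak = max(curves[0])
mean_duration = duration_above_half_max(mean_curve, step)
single_duration = duration_above_half_max(curves[0], step)
```

Although every individual curve has the same shape, the mean has a peak less than half as high and a spurt more than twice as long, purely because the spurts are out of step.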
37 Phase and Amplitude Variation
- Functional data like growth curves often show variation in the timing of events, like the pubertal growth spurt.
- This is called phase variation.
- We have to find out how to separate phase from amplitude variation before we can do even simple things like compute mean curves.