Title: Are the Trends Changing? Joinpoint Regression Analysis
1 Are the Trends Changing? Joinpoint Regression Analysis
- Hyune-Ju Kim
- Syracuse University
- Joint work with
- M. Fay, E. Feuer, B. Yu, M. Barrett and D. Midthune
- National Cancer Institute
2 Outline
- Motivation
- Joinpoint Regression (segmented line regression)
- Model
- Fitting a joinpoint regression model
- Determining the number of joinpoints
- Software Joinpoint
- http://srab.cancer.gov/joinpoint/
4 Polynomial model vs. linear model
5 Change-point model 1: Jump model
6 Change-point model 2: Joinpoint model
7 Joinpoint Regression Model
- Piecewise linear regression
- Segmented line regression
- Broken line regression
- Spline regression
- Joinpoint
- Change-points
- Changeover points
- Joins
- Knots
8 Motivating Example
9 Year-by-Year Fit
RSS = 19.41 for 1991-2000
RSS = 22.86 for 1969-1990
10 Search for Min RSS
11 Questions
- How can we impose the constraint that the two phases are continuous at the change-point?
- Where is/are the change-point(s)?
- How do we know if a two-phase model is preferred to a three-phase model?
13 II-1. Mathematical Model
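14 k-joinpoint model
Model: $y = \beta_0 + \beta_1 x + \delta_1 (x - \tau_1)^+ + \cdots + \delta_k (x - \tau_k)^+ + \text{error}$, where $(x - \tau)^+ = \max(x - \tau, 0)$
Segment slopes: $\beta_2 = \beta_1 + \delta_1$, $\beta_3 = \beta_1 + \delta_1 + \delta_2$
[Figure: segmented line with joinpoints $\tau_1$ and $\tau_2$ and slopes $\beta_1$, $\beta_2$, $\beta_3$ on successive segments]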
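To make the parameterization concrete, here is a minimal Python sketch (the function names are hypothetical) that evaluates the k-joinpoint mean function and the implied segment slopes $\beta_1$, $\beta_1 + \delta_1$, $\beta_1 + \delta_1 + \delta_2$, and so on. The parameter values are illustrative only; the final check shows that the hinge terms $(x - \tau_j)^+$ keep the fitted line continuous at each joinpoint.

```python
def joinpoint_mean(x, beta0, beta1, deltas, taus):
    """E(y | x) = beta0 + beta1*x + sum_j delta_j * (x - tau_j)^+."""
    return beta0 + beta1 * x + sum(d * max(x - t, 0.0) for d, t in zip(deltas, taus))


def segment_slopes(beta1, deltas):
    """Slope of segment j+1 is beta1 + delta_1 + ... + delta_j."""
    slopes = [beta1]
    for d in deltas:
        slopes.append(slopes[-1] + d)
    return slopes


# Two joinpoints at x = 10 and x = 15 (illustrative values only).
beta0, beta1, deltas, taus = 1.0, 0.5, [-0.8, 0.4], [10.0, 15.0]
print(segment_slopes(beta1, deltas))  # approximately [0.5, -0.3, 0.1]

# Continuity: the mean function has no jump at the first joinpoint.
left = joinpoint_mean(10.0 - 1e-9, beta0, beta1, deltas, taus)
right = joinpoint_mean(10.0 + 1e-9, beta0, beta1, deltas, taus)
gap = right - left
print(abs(gap) < 1e-6)  # True
```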
15 Questions of Interest
- Fitting a joinpoint regression model
- Point estimates of τ, β and δ
- Grid search (Lerman, 1980)
- Hudson's method (Hudson, 1966)
- Confidence intervals for the joinpoints and regression coefficients
- Large sample theory (Feder, 1975; Hinkley, 1971)
- Determining the number of joinpoints, k
- Permutation procedure, BIC
16 II-2. Model Fitting
17 Lerman's Grid Search (Lerman (1980), Applied Statistics)
- If the joinpoints are fixed at $(\tau_1, \ldots, \tau_k)$, fit a least squares (LS) regression model using the covariates $x, (x - \tau_1)^+, \ldots, (x - \tau_k)^+$ and compute the residual sum of squares (SSE).
- Search all possible combinations of $(\tau_1, \ldots, \tau_k)$ to find the one that minimizes the SSE.
- The LS estimates of the regression coefficients β and δ are the estimates corresponding to the estimated joinpoints.
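Lerman's procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Joinpoint software's implementation: the candidate joinpoints are taken to be the interior observed x values, least squares is solved through the normal equations, the data are noiseless and synthetic, and all function names are hypothetical.

```python
from itertools import combinations


def ols(cols, y):
    """Least squares of y on the covariate columns via the normal equations.
    Returns (coefficients, fitted values, SSE)."""
    p, n = len(cols), len(y)
    a = [[sum(ci[r] * cj[r] for r in range(n)) for cj in cols] for ci in cols]
    c = [sum(ci[r] * y[r] for r in range(n)) for ci in cols]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv], c[col], c[piv] = a[piv], a[col], c[piv], c[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            a[r] = [arj - f * acj for arj, acj in zip(a[r], a[col])]
            c[r] -= f * c[col]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (c[i] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    fitted = [sum(b[i] * cols[i][r] for i in range(p)) for r in range(n)]
    return b, fitted, sum((yr - fr) ** 2 for yr, fr in zip(y, fitted))


def hinge(x, tau):
    """Covariate (x - tau)^+ = max(x - tau, 0) for each observation."""
    return [max(xi - tau, 0.0) for xi in x]


def grid_search(x, y, k, candidates):
    """Fit the k-joinpoint model at every combination of candidate joinpoints
    and keep the combination with the smallest SSE."""
    best = None
    for taus in combinations(candidates, k):
        cols = [[1.0] * len(x), list(x)] + [hinge(x, t) for t in taus]
        coefs, _, sse = ols(cols, y)
        if best is None or sse < best[2]:
            best = (taus, coefs, sse)
    return best  # (joinpoints, [beta0, beta1, delta_1..delta_k], SSE)


# Synthetic, noiseless data with one joinpoint at x = 10.
x = [float(i) for i in range(20)]
y = [1 + 0.5 * xi - 0.8 * max(xi - 10, 0) for xi in x]
taus, coefs, sse = grid_search(x, y, 1, x[1:-1])
print(taus)  # (10.0,)
```

Because every combination of k candidates is refitted, the cost grows combinatorially with k, which is one reason to keep the grid coarse or to use a smarter search for larger k.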
18 Fit of 1-joinpoint model
Model: $E(y \mid x) = \beta_0 + \beta_1 x + \delta_1 (x - \tau_1)^+$
19 Fit of 1-joinpoint model
Model: $E(y \mid x) = \beta_0 + \beta_1 x + \delta_1 (x - \tau_1)^+$
Estimated joinpoint: $\hat{\tau}_1 = 1991$
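20 Confidence intervals
95% confidence interval for the joinpoints: the set of candidate joinpoints $\tau$ satisfying
$\mathrm{SSE}(\tau) \le \mathrm{MinSSE} \times \left(1 + \frac{k}{p} F(0.95; k, p)\right)$,
where $F(0.95; k, p)$ is the 95th percentile of an F-distribution with $k$ and $p$ degrees of freedom, and $p = n - 2(k+1)$.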
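The rule can be applied directly to an SSE profile over the grid. The sketch below assumes the SSE values have already been computed at each candidate joinpoint (for example by a grid search) and that the caller supplies the F percentile, since the Python standard library has no F quantile function; for k = 1 and p = 28, F(0.95; 1, 28) is about 4.20. Reporting the smallest and largest accepted grid points as the interval is a simplification on our part, and the function name and SSE values are hypothetical.

```python
def joinpoint_ci(sse_by_tau, k, n, f_crit):
    """Return (lower, upper) grid points whose SSE satisfies
    SSE(tau) <= MinSSE * (1 + k / p * f_crit), with p = n - 2(k + 1)."""
    p = n - 2 * (k + 1)
    min_sse = min(sse_by_tau.values())
    threshold = min_sse * (1.0 + k * f_crit / p)
    accepted = sorted(tau for tau, sse in sse_by_tau.items() if sse <= threshold)
    # Simplification: report the range spanned by the accepted grid points.
    return accepted[0], accepted[-1]


# Made-up SSE profile for a one-joinpoint fit with n = 32 observations.
profile = {1: 5.0, 2: 4.2, 3: 4.0, 4: 4.5, 5: 4.7}
print(joinpoint_ci(profile, k=1, n=32, f_crit=4.20))  # (2, 4)
```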
21 Parameter estimates (grid search)
Model 1: 1 Joinpoint(s)
Number of Observations: 32
Number of Parameters: 4
Degrees of Freedom: 28
Sum of Squared Errors: 0.00084419
Mean Squared Error: 0.00003015

Estimated Joinpoint(s)
Join Pt   Estimate        (95% Confidence Interval)
1         91.0000000000   (90.0000000000, 93.0000000000)

Estimated Regression Coefficients (Beta)
Parameter       Parameter Estimate   Standard Error   Z            Prob > |t|
Intercept1       4.989298            0.014415         346.125422   0.000000
Intercept2       6.204406            0.066682          93.044993   0.000000
Slope1           0.004244            0.000181          23.479840   0.000000
Slope2          -0.009109            0.000694         -13.118777   0.000000
Slope2-Slope1   -0.013353            0.000717         -18.610506   0.000000

Annual Percent Change (APC)
Segment   Range      APC         (95% Confidence Interval)
1         69 - 91     0.425282   ( 0.388046,  0.462532)
2         91 - 100   -0.906767   (-1.047844, -0.765489)
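The APC column can be reproduced from the slope estimates if the model was fitted to the natural log of the rates. That assumption is ours, but it is consistent with the table: 100(e^slope - 1) gives about 0.4253 for Slope1 and -0.9068 for Slope2, matching the reported APCs up to rounding of the slopes. The row Slope2-Slope1 is the slope change δ1. The function name below is hypothetical.

```python
import math


def apc(slope):
    """Annual percent change of a segment whose slope is on the log-rate scale."""
    return 100.0 * (math.exp(slope) - 1.0)


slope1, slope2 = 0.004244, -0.009109  # Slope1 and Slope2 from the grid-search table
print(round(apc(slope1), 4))  # 0.4253
print(round(apc(slope2), 4))  # -0.9068
print(round(slope2 - slope1, 6))  # -0.013353, the Slope2-Slope1 row
```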
22 Hudson's method (Hudson (1966), JASA)
- Partition the data points into k+1 segments.
- Fit the unconstrained LS line for each segment.
- Calculate the intersection of the regression lines for neighboring segments.
- If the intersections are not in the right locations, then adjust them to either end.
- The estimated joinpoints are the intersections for the partition with the minimum SSE.
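A minimal Python sketch of the one-joinpoint case of Hudson's method follows. "Adjust them to either end" is interpreted here as: when the intersection of the two unconstrained lines falls outside the gap between the last x of the first segment and the first x of the second, the continuous model is refitted with the joinpoint at each end of that gap and the better fit is kept. The helper names and the minimum of two points per segment are assumptions. Unlike the grid search, this estimate can fall between observed x values, which is consistent with the non-integer joinpoint (91.425) in the Hudson output.

```python
def ols(cols, y):
    """Least squares via the normal equations; returns (coefs, fitted, SSE)."""
    p, n = len(cols), len(y)
    a = [[sum(ci[r] * cj[r] for r in range(n)) for cj in cols] for ci in cols]
    c = [sum(ci[r] * y[r] for r in range(n)) for ci in cols]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv], c[col], c[piv] = a[piv], a[col], c[piv], c[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            a[r] = [arj - f * acj for arj, acj in zip(a[r], a[col])]
            c[r] -= f * c[col]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (c[i] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    fitted = [sum(b[i] * cols[i][r] for i in range(p)) for r in range(n)]
    return b, fitted, sum((yr - fr) ** 2 for yr, fr in zip(y, fitted))


def line_fit(xs, ys):
    """Unconstrained simple linear regression; returns (intercept, slope, SSE)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    icpt = my - slope * mx
    return icpt, slope, sum((b - icpt - slope * a) ** 2 for a, b in zip(xs, ys))


def hinge_sse(x, y, tau):
    """SSE of the continuous one-joinpoint fit with the joinpoint fixed at tau."""
    cols = [[1.0] * len(x), list(x), [max(xi - tau, 0.0) for xi in x]]
    return ols(cols, y)[2]


def hudson_one_joinpoint(x, y, min_pts=2):
    """x must be sorted increasingly; each segment keeps at least min_pts points."""
    n = len(x)
    best_tau, best_sse = None, float("inf")
    for j in range(min_pts - 1, n - min_pts):  # j = last index of segment 1
        a1, b1, s1 = line_fit(x[:j + 1], y[:j + 1])
        a2, b2, s2 = line_fit(x[j + 1:], y[j + 1:])
        tau = (a2 - a1) / (b1 - b2) if b1 != b2 else None
        if tau is not None and x[j] <= tau <= x[j + 1]:
            candidates = [(tau, s1 + s2)]  # lines already meet in the gap
        else:  # intersection misplaced: try both ends of the gap
            candidates = [(t, hinge_sse(x, y, t)) for t in (x[j], x[j + 1])]
        for t, s in candidates:
            if s < best_sse:
                best_tau, best_sse = t, s
    return best_tau, best_sse


# Synthetic, noiseless data whose joinpoint (10.5) lies between observations.
x = [float(i) for i in range(20)]
y = [1 + 0.5 * xi - 0.8 * max(xi - 10.5, 0) for xi in x]
tau, sse = hudson_one_joinpoint(x, y)
print(round(tau, 6))  # 10.5
```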
23 Parameter estimates (Hudson's algorithm)
- Model 1: 1 Joinpoint(s)
- Number of Observations: 32
- Number of Parameters: 4
- Degrees of Freedom: 28
- Sum of Squared Errors: 0.00079300
- Mean Squared Error: 0.00002832
- Estimated Joinpoint(s)
Join Pt   Estimate        (95% Confidence Interval)
1         91.4254830000   (90.0000000000, 93.0000000000)
- Estimated Regression Coefficients (Beta)
Parameter    Parameter Estimate   Standard Error   Z            Prob > |t|
Intercept1   4.996409             0.013429         372.060337   0.000000
Intercept2   6.274229             0.065980          95.093460   0.000000
Slope1       0.004150             0.000167          24.808813   0.000000
24 II-3. Determining the number of joinpoints
- Hypothesis testing
- H0: There are k0 joinpoints.
- H1: There are k1 joinpoints.
- Information-based model selection
- BIC (Bayesian Information Criterion)
25 Permutation Test
- Consider the null and alternative hypotheses
- H0: There are k0 joinpoints.
- H1: There are k1 joinpoints.
- Step 1: Choose the test statistic.
- Step 2: Compute the value of the test statistic by fitting the null and alternative models and computing their residual sums of squares.
- Step 3: Assess whether the observed reduction in error is large enough to choose the model with more joinpoints.
26 Test Statistic
27 Example Data: 1 JP vs. 2 JP
- Null fit (1 JP): joinpoint at 1991, mean squared error (MSE) = 1.2761
- Alternative fit (2 JP): joinpoints at 1973 and 1991, MSE = 0.9296
- T = 6.2185
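The formula for T did not survive extraction. As an assumption, take the F-type statistic T = [(SSE0 - SSE1) / (2(k1 - k0))] / [SSE1 / (n - 2k1 - 2)], where each added joinpoint contributes two parameters (a location and a slope change). With n = 32 observations (also an assumption, matching the earlier example) and each SSE recovered as MSE times its residual degrees of freedom, this form reproduces the reported T = 6.2185 up to rounding of the MSEs. The function name is hypothetical.

```python
def t_statistic(sse0, sse1, k0, k1, n):
    """Drop in SSE per extra parameter, relative to the alternative model's MSE.
    Each extra joinpoint adds 2 parameters (a slope change and its location)."""
    extra_params = 2 * (k1 - k0)
    df1 = n - 2 * (k1 + 1)  # residual df of the k1-joinpoint model
    return ((sse0 - sse1) / extra_params) / (sse1 / df1)


n = 32
sse0 = 1.2761 * (n - 2 * (1 + 1))  # null (1 JP): MSE times residual df
sse1 = 0.9296 * (n - 2 * (2 + 1))  # alternative (2 JP): MSE times residual df
print(round(t_statistic(sse0, sse1, 1, 2, n), 3))  # 6.218
```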
28
- Question: Is T = 6.2185 significant enough to reject the null hypothesis of 1 joinpoint in favor of the alternative hypothesis of 2 joinpoints?
- Answer: We need to find the null distribution of T(y) and the p-value = P(T(y) ≥ T_obs), where T_obs is the observed value of the test statistic.
- Problem: The null distribution of T(y) is not known, even asymptotically, when the locations of the joinpoints are not known.
- Our solution: Permutation test (Kim et al. (2000), Statistics in Medicine)
29 Idea of Permutation
[Figure: labelled points (1), (2), (3) shown in their original order and in a permuted order]
30 Fits for the permuted data sets
Joinpoint estimate = 1991, RSS = 34.9579, T = 0.9212
Joinpoint estimate = 1991, RSS = 35.1528, T = 4.4592
31 Empirical Distribution of the Test Statistic
- Observed T = 6.219
- Number of permutations = 30
- (# of permuted T-values ≥ observed T) = 2
- 2/30 = 0.0667 ≈ p-value
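A compact Python sketch of the whole permutation test follows. The resampling scheme is an assumption here: the residuals of the null (k0-joinpoint) fit are permuted and added back to the null fitted values, both models are refitted by grid search over interior observed x values, and the p-value is estimated as on the slide, by the fraction of permuted T-values at least as large as the observed T. The F-type form of T is likewise assumed, weighted least squares is not included, and all function names are hypothetical.

```python
import random
from itertools import combinations


def ols(cols, y):
    """Least squares via the normal equations; returns (coefs, fitted, SSE)."""
    p, n = len(cols), len(y)
    a = [[sum(ci[r] * cj[r] for r in range(n)) for cj in cols] for ci in cols]
    c = [sum(ci[r] * y[r] for r in range(n)) for ci in cols]
    for col in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv], c[col], c[piv] = a[piv], a[col], c[piv], c[col]
        for r in range(col + 1, p):
            f = a[r][col] / a[col][col]
            a[r] = [arj - f * acj for arj, acj in zip(a[r], a[col])]
            c[r] -= f * c[col]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (c[i] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    fitted = [sum(b[i] * cols[i][r] for i in range(p)) for r in range(n)]
    return b, fitted, sum((yr - fr) ** 2 for yr, fr in zip(y, fitted))


def grid_fit(x, y, k, candidates):
    """Best k-joinpoint fit over all combinations; returns (fitted, SSE)."""
    best = None
    for taus in combinations(candidates, k):
        cols = [[1.0] * len(x), list(x)] + [[max(xi - t, 0.0) for xi in x] for t in taus]
        _, fitted, sse = ols(cols, y)
        if best is None or sse < best[1]:
            best = (fitted, sse)
    return best


def t_statistic(sse0, sse1, k0, k1, n):
    return ((sse0 - sse1) / (2 * (k1 - k0))) / (sse1 / (n - 2 * (k1 + 1)))


def permutation_test(x, y, k0, k1, n_perm=99, seed=0):
    cand = x[1:-1]  # interior observed x values as candidate joinpoints
    fit0, sse0 = grid_fit(x, y, k0, cand)
    sse1 = grid_fit(x, y, k1, cand)[1]
    t_obs = t_statistic(sse0, sse1, k0, k1, len(x))
    resid = [yi - fi for yi, fi in zip(y, fit0)]
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        shuffled = resid[:]
        rng.shuffle(shuffled)
        y_star = [f + e for f, e in zip(fit0, shuffled)]  # a data set under H0
        s0 = grid_fit(x, y_star, k0, cand)[1]
        s1 = grid_fit(x, y_star, k1, cand)[1]
        if t_statistic(s0, s1, k0, k1, len(x)) >= t_obs:
            exceed += 1
    return t_obs, exceed / n_perm  # p-value = (# permuted T >= observed T) / N


noise = random.Random(1)
x = [float(i) for i in range(20)]
y = [1 + 0.5 * xi - 1.0 * max(xi - 10, 0) + noise.gauss(0, 0.1) for xi in x]
t_obs, p_value = permutation_test(x, y, k0=0, k1=1)
print(t_obs, p_value)
```

Testing 0 vs. 1 joinpoint keeps the example fast; testing 1 vs. 2 joinpoints as on the slides only changes the k0 and k1 arguments, at the cost of many more grid fits per permutation.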
32 Estimation of the p-value
33 Determining the number of joinpoints
- Method 1: Sequential application of the permutation tests
- Pre-specify kmin and kmax.
- Start by testing H0: k = k0 vs. H1: k = k1, where k0 = kmin and k1 = kmax.
- Use the permutation test to make a decision.
- If H0 is rejected, then test H0: k = k0 = kmin + 1 vs. H1: k = k1.
- If H0 is not rejected, then test H0: k = k0 vs. H1: k = k1 = kmax - 1.
- Stop when k is determined.
- Note: The α-level for each test is α/(kmax - k0).
- Method 2: Bayesian information criterion
- $\hat{k} = \arg\min_k B(k)$, where $B(k) = \log\left(\mathrm{SSE}(k)/n\right) + (2k+2)\,\frac{\log n}{n}$,
- and the minimum is taken over k such that 0 ≤ k ≤ kmax.
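Both selection rules are short enough to sketch in Python. `select_k_sequential` follows the update rules of Method 1, with the permutation test abstracted as a caller-supplied function `reject(k0, k1)` (hypothetical) that is expected to apply the per-test level itself; `select_k_bic` evaluates B(k) from precomputed SSE values. The SSE values and the stub test below are invented purely for illustration.

```python
import math


def select_k_sequential(k_min, k_max, reject):
    """Method 1: reject(k0, k1) returns True if H0: k = k0 is rejected
    in favour of H1: k = k1 (e.g. by a permutation test)."""
    k0, k1 = k_min, k_max
    while k0 < k1:
        if reject(k0, k1):
            k0 += 1  # H0 rejected: raise the null number of joinpoints
        else:
            k1 -= 1  # H0 kept: lower the alternative number of joinpoints
    return k0


def select_k_bic(sse_by_k, n):
    """Method 2: k minimizing log(SSE(k)/n) + (2k + 2) * log(n) / n."""
    def bic(k):
        return math.log(sse_by_k[k] / n) + (2 * k + 2) * math.log(n) / n
    return min(sse_by_k, key=bic)


# Stub test that rejects whenever the null has fewer than 2 joinpoints.
print(select_k_sequential(0, 3, lambda k0, k1: k0 < 2))  # 2
# Invented SSE values for k = 0..3 with n = 32.
print(select_k_bic({0: 60.0, 1: 36.0, 2: 24.0, 3: 23.5}, n=32))  # 2
```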
34 Example Data (Method 1)
- Number of permutations = 4499
- kmin = 0 and kmax = 3
- Final selected model: two-joinpoint model
35 Example Data (Method 1)
36 Example Data (Method 2)
- kmax = 3
- The model with the smallest BIC is selected.
- Final selected model: three-joinpoint model
37 Fit for the 3 JP model
38 References
- Kim HJ, Fay MP, Feuer EJ, and Midthune D (2000), "Permutation Tests for Joinpoint Regression with Applications to Cancer Rates," Statistics in Medicine, 19, 335-351.
  - Weighted LS to handle heteroscedastic and correlated errors
  - Power study via simulations
- Kim HJ, Fay MP, Yu B, Barrett MJ, and Feuer EJ (2004), "Comparability of segmented line regression models," Biometrics, 60, 1005-1014.
- Yu B, Barrett MJ, Kim HJ, and Feuer EJ (2006), "Estimating joinpoints in continuous time scale for multiple change-point models," to appear in Journal of Computational Statistics and Data Analysis.
41 Data Input
43 Data used
44 Input tab: Default Settings
45 Joinpoints tab: Default Settings
46 Execute Session
47 Results - Graph
48 Results - Data
49 Results - Model Estimates
50 Results - Permutation Tests
51 Current Developments
- An early stopping rule can reduce the number of permutations while controlling the resampling risk.
- A comparability test enables us to determine whether two or more groups share a common joinpoint model.
- Clustering 18 contiguous age groups so that the age groups within the same cluster share a common joinpoint model.