Title: Value of Unlabeled Data in Regression
Department of Computer Science and Engineering,
College of Engineering
Semi-supervised Learning with Data Calibration
for Long-Term Time Series Forecasting
Haibin Cheng and Pang-Ning Tan
- MOTIVATION
- Long-term time series forecasting is needed for a
broad range of applications, including climate
impact assessments and urban planning.
- CHALLENGES IN LONG-TERM FORECASTING
- Most time series prediction work has focused
on single-step or short-term prediction because
of the inherent difficulty in controlling the
propagation of errors from one prediction step to
the next.
- An extensive amount of historical data is needed
for reliable prediction, which is expensive to
obtain.
- Concept drift may be present in the modeling
domain.
- CONTRIBUTIONS
- Developed a semi-supervised time series
regression approach for long-term forecasting by
incorporating future data from model simulations
(e.g., global climate models for impact
assessments) with historical observations.
- Developed a covariance-preserving data
calibration approach to align historical
observations with model simulation data.
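The poster does not spell out the calibration transform, so the following is only a minimal sketch of one standard way to make two data sets agree in mean and covariance: whiten one set, then recolor it with the covariance of the other. The function names (`calibrate_to_reference`, `_sym_matrix_power`) are hypothetical, and which data set plays the role of source or reference is an assumption left to the caller. This is not necessarily the authors' method.

```python
import numpy as np


def _sym_matrix_power(cov, power, eps=1e-10):
    """Raise a symmetric positive semi-definite matrix to a real power."""
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, eps, None)  # guard against tiny or negative eigenvalues
    return (vecs * vals**power) @ vecs.T


def calibrate_to_reference(x_source, x_reference):
    """Transform x_source so its sample mean and covariance match x_reference.

    Assumption: a whitening-and-recoloring transform stands in for the
    poster's covariance-preserving calibration, whose details are not given.
    """
    mu_s = x_source.mean(axis=0)
    mu_r = x_reference.mean(axis=0)
    cov_s = np.atleast_2d(np.cov(x_source, rowvar=False))
    cov_r = np.atleast_2d(np.cov(x_reference, rowvar=False))
    whiten = _sym_matrix_power(cov_s, -0.5)   # removes the source covariance
    recolor = _sym_matrix_power(cov_r, 0.5)   # imposes the reference covariance
    return (x_source - mu_s) @ whiten @ recolor + mu_r
```

Under these assumptions, calling `calibrate_to_reference(x_sim, x_hist)` would map simulated predictors onto the scale and correlation structure of the historical observations; swapping the arguments aligns in the other direction.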
Semi-HMMR algorithm
Input: Historical data L = (Xl, Yl) and future
unlabeled data Xu
Output: Future response Yu
Method:
1. Train an initial HMMR model ?0 (?, A, ?, C)
using the training data L.
2. Perform local estimation of Yu.
3. Perform global estimation of Yu using the
current parameters in ?.
4. Calculate the final estimation of Yu.
5. Calculate the confidence of the predicted
values in Yu.
6. Combine the predicted values and confidences
estimated in steps 4 and 5 with the training data
L to re-train the HMMR model ?' (?', A', ?', C).
7. Repeat steps 3-6 until convergence
(?' - ? << ?).
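The control flow of the steps above can be sketched as follows. This is not the authors' HMMR implementation: a weighted linear regression stands in for the HMMR model, the local estimate is assumed to be a k-nearest-neighbor average over the labeled data, the final estimate is assumed to be the plain average of the local and global estimates, and the confidence is assumed to be high when the two estimates agree. All function names and the parameters `k`, `tol`, and `max_iter` are hypothetical.

```python
import numpy as np


def fit_weighted_ols(x, y, w):
    """Weighted least squares with an intercept (stand-in for HMMR training)."""
    xb = np.column_stack([np.ones(len(x)), x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(xb * sw[:, None], y * sw, rcond=None)
    return coef


def predict(coef, x):
    """Global estimate: prediction from the current model parameters."""
    return np.column_stack([np.ones(len(x)), x]) @ coef


def local_estimate(x_l, y_l, x_u, k=5):
    """Local estimate (assumed): mean response of the k nearest labeled points."""
    dist = np.linalg.norm(x_u[:, None, :] - x_l[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_l[nearest].mean(axis=1)


def semi_supervised_fit(x_l, y_l, x_u, k=5, tol=1e-6, max_iter=50):
    # Step 1: train an initial model on the labeled data only.
    coef = fit_weighted_ols(x_l, y_l, np.ones(len(x_l)))
    # Step 2: local estimation of the unlabeled responses.
    y_local = local_estimate(x_l, y_l, x_u, k)
    x_all = np.vstack([x_l, x_u])
    for _ in range(max_iter):
        # Step 3: global estimation from the current parameters.
        y_global = predict(coef, x_u)
        # Step 4: final estimate (assumed: average of local and global).
        y_u = 0.5 * (y_local + y_global)
        # Step 5: confidence (assumed: near 1 when the estimates agree).
        conf = 1.0 / (1.0 + (y_local - y_global) ** 2)
        # Step 6: re-train on labeled data plus confidence-weighted estimates.
        y_all = np.concatenate([y_l, y_u])
        w_all = np.concatenate([np.ones(len(x_l)), conf])
        new_coef = fit_weighted_ols(x_all, y_all, w_all)
        # Step 7: stop once the parameters barely change.
        converged = np.linalg.norm(new_coef - coef) < tol
        coef = new_coef
        if converged:
            break
    return coef, 0.5 * (y_local + predict(coef, x_u))
```

The `max_iter` cap is a safeguard added for the sketch; the poster only states the convergence test on the parameter change.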
- Value of Unlabeled Data in Regression
- Assumptions
- Model assumptions match the underlying
data well.
- Labeled and unlabeled data must be generated from
the same distribution.
- Experimental Evaluation
1. Performance comparison in terms of average
root mean square error (RMSE)
3. Application to statistical downscaling for
future climate scenario projections: 60 randomly
selected locations in North America
4. Effect of covariance-preserving data
calibration on semi-supervised HMMR
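For reference, the average RMSE used in comparison 1 can be computed as in the short sketch below. The poster does not say what the average is taken over; the helper `average_rmse` assumes one RMSE per series (for example, per location) averaged with equal weight. Both function names are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def average_rmse(pairs):
    """Equal-weight mean of per-series RMSE values.

    `pairs` is an iterable of (y_true, y_pred) tuples, one per series
    (assumption: the poster's average is taken across series).
    """
    return float(np.mean([rmse(t, p) for t, p in pairs]))
```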
- Conclusions
- Unlabeled data (e.g., from model simulations) can
be used in a semi-supervised learning framework
to improve long-term time series forecasting.
- Covariance-preserving data calibration helps
improve semi-supervised learning by reducing the
inconsistencies between historical observations
and model simulation data.
2. Value of unlabeled data (figure; y-axis: error
rate, x-axis: labeled/unlabeled data).
Semi-supervised HMMR effectively uses the
unlabeled data to improve its predictions,
especially when labeled data is scarce.
5. Effect of covariance-preserving data
calibration on loss of neighborhood
information