Value of Unlabeled Data in Regression

Department of Computer Science and Engineering
College of Engineering
Semi-supervised Learning with Data Calibration
for Long-Term Time Series Forecasting
Haibin Cheng and Pang-Ning Tan

MOTIVATION
Long-term time series forecasting is needed for a
broad range of applications, including climate
impact assessments and urban planning.
CHALLENGES IN LONG-TERM FORECASTING
Most time series prediction work has focused on
single-step or short-term prediction because it is
inherently difficult to control the propagation of
errors from one prediction step to the next.
An extensive amount of historical data is needed
for reliable prediction, and such data is
expensive to obtain.
Concept drift is present in the modeling
domain.
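
The error-propagation challenge can be illustrated with a toy sketch. It assumes a noise-free AR(1) process and a fitted model whose coefficient is slightly off; both coefficient values and the function name are hypothetical and do not come from the poster. Each prediction is fed back as the next input, so the small one-step error compounds.

```python
def recursive_forecast(x0, coef, steps):
    """Iterate x[t+1] = coef * x[t], feeding each prediction back as input."""
    path = []
    x = x0
    for _ in range(steps):
        x = coef * x
        path.append(x)
    return path


TRUE_COEF = 0.95       # hypothetical true dynamics
ESTIMATED_COEF = 0.97  # hypothetical fitted model with a small coefficient error

truth = recursive_forecast(1.0, TRUE_COEF, 20)
forecast = recursive_forecast(1.0, ESTIMATED_COEF, 20)

# Absolute forecast error at each horizon, from 1 step to 20 steps ahead.
abs_error = [abs(f - t) for f, t in zip(forecast, truth)]
```

In this toy setting the error grows at every step over the 20-step horizon, which is the behavior that makes long-term forecasting harder than single-step prediction.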
CONTRIBUTIONS
Developed a semi-supervised time series
regression approach for long-term forecasting by
incorporating future data from model simulations
(e.g., global climate models for impact
assessments) with historical observations.
Developed a covariance-preserving data
calibration approach to align historical
observations with model simulation data.
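
The poster does not spell out the calibration procedure, so the following is only a generic sketch of aligning two data sets by their first and second moments. It is not the authors' covariance-preserving method. The function name, the synthetic arrays, and the choice to map simulated data onto historical data are all assumptions made for illustration.

```python
import numpy as np


def match_mean_and_covariance(source, target):
    """Linearly map the rows of `source` so that their sample mean and
    covariance equal those of `target`.

    Assumes 2-D arrays with at least two feature columns and
    positive-definite sample covariances.
    """
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    src_cov = np.cov(source, rowvar=False)
    tgt_cov = np.cov(target, rowvar=False)
    # Cholesky factors satisfy cov = L @ L.T.
    l_src = np.linalg.cholesky(src_cov)
    l_tgt = np.linalg.cholesky(tgt_cov)
    # Whiten with inv(l_src), then re-color with l_tgt.
    transform = l_tgt @ np.linalg.inv(l_src)
    return (source - src_mean) @ transform.T + tgt_mean


# Synthetic stand-ins for historical observations and model simulations.
rng = np.random.default_rng(0)
historical = rng.normal(size=(200, 2)) @ np.array([[1.0, 0.5], [0.0, 1.0]])
simulated = 3.0 + rng.normal(size=(300, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])

calibrated = match_mean_and_covariance(simulated, historical)
```

After the transform, the calibrated samples share the sample mean and covariance of the historical data, which removes one obvious source of inconsistency between the two sets.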

Semi HMMR algorithm
Input: Historical data L (Xl, Yl) and future unlabeled Xu
Output: Future response Yu
Method:
1. Train an initial HMMR model ?0 (?, A, ?, C) using the training data L.
2. Perform local estimation of Yu.
3. Perform global estimation of Yu using the current parameters in ?.
4. Calculate the final estimation of Yu.
5. Calculate the confidence of the predicted values in Yu.
6. Combine predicted value and confidence estimated in steps 4 and 5
with training data L to re-train HMMR model ?'(?', A', ?', C).
7. Repeat steps 3-6 until convergence (?' - ? << ?).
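
The control flow of the steps above can be sketched as follows. This is not HMMR: weighted least squares stands in for the model, and the poster does not say how the local estimate, the final estimate, or the confidence are computed. The k-nearest-neighbor local estimate, the simple average in step 4, the exponential confidence in step 5, and all function names are assumptions chosen only to make the loop runnable.

```python
import numpy as np


def fit_weighted_linear(X, y, weights):
    """Weighted least squares with an intercept (stand-in for HMMR training)."""
    design = np.column_stack([np.ones(len(X)), X])
    root_w = np.sqrt(weights)
    theta, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return theta


def predict(theta, X):
    return theta[0] + X @ theta[1:]


def local_estimate(X_l, y_l, X_u, k=3):
    """Mean response of the k nearest labeled points (assumed local estimator)."""
    dists = np.linalg.norm(X_u[:, None, :] - X_l[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_l[nearest].mean(axis=1)


def semi_supervised_fit(X_l, y_l, X_u, tol=1e-6, max_iter=100):
    theta = fit_weighted_linear(X_l, y_l, np.ones(len(X_l)))   # step 1
    y_local = local_estimate(X_l, y_l, X_u)                    # step 2
    for _ in range(max_iter):
        y_global = predict(theta, X_u)                         # step 3
        y_u = 0.5 * (y_local + y_global)                       # step 4 (assumed average)
        confidence = np.exp(-np.abs(y_local - y_global))       # step 5 (assumed form)
        X_all = np.vstack([X_l, X_u])                          # step 6
        y_all = np.concatenate([y_l, y_u])
        w_all = np.concatenate([np.ones(len(X_l)), confidence])
        new_theta = fit_weighted_linear(X_all, y_all, w_all)
        converged = np.linalg.norm(new_theta - theta) < tol    # step 7
        theta = new_theta
        if converged:
            break
    return theta, 0.5 * (y_local + predict(theta, X_u))


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X_l = rng.normal(size=(30, 2))
y_l = 1.0 + X_l @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=30)
X_u = rng.normal(size=(100, 2))

theta, y_u = semi_supervised_fit(X_l, y_l, X_u)
```

Mixing a local estimate into the pseudo-labels matters in this stand-in: if the pseudo-labels were the linear model's own predictions, retraining would return the same parameters and the unlabeled data would add nothing. The `rmse` helper is included because it is the error measure named in the evaluation.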

Value of Unlabeled Data in Regression
Assumptions
Model assumptions match well with underlying
data.
Labeled and unlabeled data must be generated from
the same distribution.

Experimental Evaluation
1. Performance comparison in terms of average
root mean square error (RMSE)
2. Value of unlabeled data (y-axis: error rate;
x-axis: labeled/unlabeled data).
Semi-supervised HMMR effectively utilizes the
unlabeled data to improve its prediction,
especially when labeled data is scarce.
3. Application to statistical downscaling for
future climate scenario projections: 60 randomly
selected locations in North America
4. Effect of covariance-preserving data
calibration on semi-supervised HMMR
5. Effect of covariance-preserving data
calibration on loss of neighborhood
information

Conclusions
Unlabeled data (e.g., from model simulations) can
be used in a semi-supervised learning framework
to improve long-term time series forecasting.
Covariance-preserving data calibration helps
improve semi-supervised learning by reducing the
inconsistencies between historical observations
and model simulation data.
