Pei-Chann Chang, Professor, Information Management Department, Yuan Ze University

1
Pei-Chann Chang, Professor, Information Management
Department, Yuan Ze University
  • Knowledge Discovery in Financial Time Series Data

2
Time Series Databases
  • A time series is a sequence of real numbers,
    representing the measurements of a real variable
    at equal time intervals
  • Stock price movements
  • Volume of sales over time
  • Daily temperature (Electricity Consumption)
    readings
  • Weather data or electrocardiogram (ECG) data
  • A time series database is a large collection of
    time series
  • e.g., all NYSE stocks

3
Classical Time Series Analysis
  • Identifying Patterns
  • Trend analysis
  • A company's linear growth in sales over the years
  • Seasonality
  • Winter sales are approximately twice summer sales
  • Forecasting
  • What are the expected sales for the next quarter?

4
Traffic Speed Time Series
  • Non-stationary
  • Strong deterministic part
  • Weak stochastic part

5
Wavelet Transform Denoising (1993,1995)
  • Decompose: transform the original signal into the
    wavelet domain.
  • Threshold: suppress small coefficients in the new
    wavelet representation.
  • Reconstruct: invert the modified representation
    back to the signal domain.
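The three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of the 1993/1995 work: it uses the Haar wavelet (the wavelet used later in this talk), requires a signal length divisible by 2**levels, and applies soft thresholding with the "universal" threshold sigma * sqrt(2 ln n), where sigma is estimated from the finest detail coefficients. All function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_decompose(x, levels):
    """Multi-level Haar DWT: returns [approx_L, detail_L, ..., detail_1]."""
    approx = np.asarray(x, dtype=float)
    if len(approx) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / SQRT2)  # highpass branch, downsampled by two
        approx = (even + odd) / SQRT2          # lowpass branch, downsampled by two
    return [approx] + details[::-1]


def haar_reconstruct(coeffs):
    """Invert haar_decompose, coarsest level first."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + detail) / SQRT2
        out[1::2] = (approx - detail) / SQRT2
        approx = out
    return approx


def wavelet_denoise(x, levels=3, threshold=None):
    """Decompose, soft-threshold the detail coefficients, reconstruct."""
    coeffs = haar_decompose(x, levels)
    if threshold is None:
        # Universal threshold (assumed rule); noise level from finest details via MAD / 0.6745.
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745
        threshold = sigma * np.sqrt(2.0 * np.log(len(x)))
    # Soft thresholding shrinks small detail coefficients to zero; the approximation is kept.
    shrunk = [coeffs[0]] + [np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)
                            for d in coeffs[1:]]
    return haar_reconstruct(shrunk)


rng = np.random.default_rng(0)
clean = np.r_[np.zeros(128), np.ones(128)]  # a step signal of length 256
noisy = clean + rng.normal(0.0, 0.3, size=256)
denoised = wavelet_denoise(noisy, levels=3)
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```

With `threshold=0.0` nothing is suppressed and the original signal is recovered exactly (up to rounding), which is a quick sanity check that the decomposition and reconstruction steps are inverses.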

6
Multiscale Approximation and Denoising
7
(No Transcript)
8
  • MRA is used to obtain the DWT of a discrete signal
    by iteratively applying lowpass and highpass
    filters and then downsampling their outputs by
    two.
  • At each level, this procedure computes

9
Figure 1. Computing DWT by MRA
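The MRA procedure can be sketched as a filter bank: at each level the current approximation is filtered with a lowpass and a highpass filter, each output is downsampled by two, and only the lowpass branch is decomposed further. The sketch assumes the orthonormal Haar filter pair and even-length inputs at every level; `analysis_step` and `mra_dwt` are hypothetical names.

```python
import numpy as np

# Orthonormal Haar analysis filters (assumed; other wavelets use longer filters).
LOWPASS = np.array([1.0, 1.0]) / np.sqrt(2.0)
HIGHPASS = np.array([1.0, -1.0]) / np.sqrt(2.0)


def analysis_step(signal):
    """One MRA level: filter with lowpass/highpass, then downsample by two."""
    # Convolving with the reversed filter gives out[t] = sum_k signal[t + k] * filt[k].
    low = np.convolve(signal, LOWPASS[::-1], mode="valid")[::2]
    high = np.convolve(signal, HIGHPASS[::-1], mode="valid")[::2]
    return low, high


def mra_dwt(signal, levels):
    """Return the final approximation and the detail coefficients of each level."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = analysis_step(approx)  # only the lowpass branch is split again
        details.append(detail)
    return approx, details


x = [4, 6, 10, 12, 8, 6, 5, 5]
approx, details = mra_dwt(x, levels=2)
print(np.round(approx, 6))          # [16. 12.]
print([len(d) for d in details])    # [4, 2]
```

Because the Haar filters are orthonormal, the total energy of the coefficients equals the energy of the input signal, so no information is lost by the decomposition.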
10
Wavelet Representations
  • Orthogonal representations
  • Fast algorithms for decomposition and
    reconstruction
  • Restrictions on design of the representation
    functions
  • Wavelet representations are not shift invariant

11
(No Transcript)
12
Time Series Problems (from a databases
perspective)
  • The Similarity Problem
  • $X = x_1, x_2, \ldots, x_n$
  • $Y = y_1, y_2, \ldots, y_n$
  • Define and compute Sim(X, Y)
  • E.g. do stocks X and Y have similar movements?

13
  • Similarity measure should allow for imprecise
    matches
  • Similarity algorithm should be very efficient
  • It should be possible to use the similarity
    algorithm efficiently in other computations, such
    as
  • Indexing
  • Subsequence similarity
  • clustering
  • rule discovery
  • etc.

14
Figure 1. Instances of the Double Bottom pattern: (a) MOT, (b) BLL, (c) DG, (d) MIR.
15
Similarity measure
  • Given a good data representation, how should we choose
    an indexing structure with good performance?
  • R-tree, R*-tree, R+-tree and simple inverted files
    are common choices.

16
  • Indexing problem
  • Find all lakes whose water level fluctuations are
    similar to X
  • Subsequence Similarity Problem
  • Find out other days in which stock X had similar
    movements as today
  • Clustering problem
  • Group regions that have similar sales patterns
  • Rule Discovery problem
  • Find rules such as if stock X goes up and Y
    remains the same, then Z will shortly go down

17
Examples
  • Find companies with similar stock prices over a
    time interval
  • Find products with similar sales cycles
  • Cluster users with similar credit card
    utilization
  • Cluster products
  • Use patterns to classify a given time series
  • Find patterns that are frequently repeated
  • Find similar subsequences in DNA sequences
  • Find scenes in video streams

18
Basic approach to the Indexing problem
  • Extract a few key features for each time series
  • Map each time sequence X to a point f(X) in the
    (relatively low-dimensional) feature space, such
    that the (dis)similarity between X and Y is
    approximately equal to the Euclidean distance
    between the two points f(X) and f(Y)
  • Use any well-known spatial access method (SAM)
    for indexing the feature space
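A minimal sketch of the feature-extraction idea, assuming one common choice of features: the first k coefficients of the orthonormal discrete Fourier transform. Because that transform preserves Euclidean distance (Parseval's theorem), dropping coefficients can only shrink the distance, so the distance between f(X) and f(Y) never overestimates the true distance and a search in the feature space produces no false dismissals. The feature choice and the function names are assumptions for illustration, not necessarily the features these slides have in mind.

```python
import numpy as np


def dft_features(x, k):
    """Map a sequence to its first k orthonormal DFT coefficients (the point f(X))."""
    return np.fft.fft(np.asarray(x, dtype=float), norm="ortho")[:k]


def feature_distance(fx, fy):
    """Euclidean distance between feature points (each complex coefficient = 2 real dims)."""
    return float(np.sqrt(np.sum(np.abs(fx - fy) ** 2)))


def euclidean(x, y):
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))


rng = np.random.default_rng(1)
x, y = rng.normal(size=64), rng.normal(size=64)
d_feat = feature_distance(dft_features(x, 4), dft_features(y, 4))
d_true = euclidean(x, y)
print(d_feat <= d_true)  # True: the feature distance lower-bounds the true distance
```

Keeping all n coefficients reproduces the original distance exactly; smaller k gives a lower-dimensional point that a spatial access method can index, at the cost of a looser bound.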

19
  • Scalability is an important issue
  • If similarity measures, time series models, etc.
    become more sophisticated, then the other
    problems (indexing, clustering, etc.) become
    prohibitive to solve
  • Research challenge
  • Design solutions that attempt to strike a balance
    between accuracy and efficiency

20
Euclidean Similarity Measure
  • View each sequence as a point in n-dimensional
    Euclidean space (n = length of sequence)
  • Define the (dis)similarity between sequences X and Y
    as the Lp distance
  • $L_p(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$
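The Lp distance translates directly into code; p = 2 gives the Euclidean case and p = 1 the Manhattan case. The function name is illustrative, and the last usage line reuses the baseline example from a later slide (one stock around 100, another around 30).

```python
def lp_distance(x, y, p=2.0):
    """L_p(X, Y) = (sum_i |x_i - y_i|^p)^(1/p) for equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


print(lp_distance([0, 0], [3, 4]))       # 5.0 (Euclidean)
print(lp_distance([0, 0], [3, 4], p=1))  # 7.0 (Manhattan)
# Same shape, different baselines: the distance is dominated by the offset.
print(lp_distance([100, 101, 100], [30, 31, 30]))
```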

21
  • Advantages
  • Easy to compute
  • Allows scalable solutions to the other problems,
    such as
  • indexing
  • clustering
  • etc...

22
  • Disadvantages
  • Does not allow for different baselines
  • Stock X fluctuates at 100, stock Y at 30
  • Does not allow for different scales
  • Stock X fluctuates between 95 and 105, stock Y
    between 20 and 40

23
  • Normalization of Sequences
  • Goldin and Kanellakis, 1995
  • Normalize the mean and variance of each sequence
  • Let µ(X) and σ(X) be the mean and standard deviation of
    sequence X
  • Replace sequence X by the normalized sequence X', where
  • $X'_i = (X_i - \mu(X)) / \sigma(X)$
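A minimal sketch of the normalization, using the population standard deviation (an assumption; the sample standard deviation differs only by a constant factor for sequences of equal length). With the ranges from the previous slide, a stock moving between 95 and 105 and one moving between 20 and 40 become identical once normalized, provided they have the same shape.

```python
import statistics


def z_normalize(x):
    """Return X' with X'_i = (X_i - mu(X)) / sigma(X)."""
    mu = statistics.mean(x)
    sigma = statistics.pstdev(x)  # population standard deviation
    if sigma == 0:
        # A constant sequence has no shape; map it to all zeros (assumed convention).
        return [0.0] * len(x)
    return [(v - mu) / sigma for v in x]


stock_x = [95, 100, 105, 100]  # fluctuates between 95 and 105
stock_y = [20, 30, 40, 30]     # fluctuates between 20 and 40
print([round(v, 6) for v in z_normalize(stock_x)])
print([round(v, 6) for v in z_normalize(stock_y)])
```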

24
  • Similarity definition still too rigid
  • Does not allow for noise or short-term
    fluctuations
  • Does not allow for phase shifts in time
  • Does not allow for acceleration-deceleration
    along the time dimension
  • etc.

25
Example
26
A general similarity framework involving a
transformation rules language (Jagadish,
Mendelzon, Milo)
Each rule has an associated cost

27
Examples of Transformation Rules
  • Collapse adjacent segments into one segment
  • New slope = weighted average of previous slopes
  • New length = sum of previous lengths
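The collapse rule can be sketched with segments stored as (length, slope) pairs. The slides do not say how the slopes are weighted; weighting by segment length is an assumption, chosen because it preserves the total rise of the two segments. The cost that the framework attaches to each rule is not modelled here, and `Segment` and `collapse` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    length: float
    slope: float


def collapse(a, b):
    """Merge two adjacent segments into one.

    new length = sum of lengths; new slope = length-weighted average of slopes
    (the weighting is an assumption).
    """
    total = a.length + b.length
    slope = (a.length * a.slope + b.length * b.slope) / total
    return Segment(total, slope)


merged = collapse(Segment(2.0, 1.0), Segment(2.0, 3.0))
print(merged)  # Segment(length=4.0, slope=2.0)
```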
28
  • Combinations of Moving Averages, Scales, and
    Shifts
  • Rafiei and Mendelzon, 1998
  • Moving averages are a well-known technique for
    smoothing time sequences
  • Example of a 3-day moving average
  • $x'_i = (x_{i-1} + x_i + x_{i+1}) / 3$
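The 3-day formula generalizes to any odd, centred window. The sketch below drops the endpoints that lack a full window; that edge handling is an assumed convention, and the function name is illustrative.

```python
def centered_moving_average(x, window=3):
    """x'_i = mean of the `window` values centred on x_i (window must be odd).

    Endpoints without a full window are dropped (an assumed convention).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    return [sum(x[i - half:i + half + 1]) / window for i in range(half, len(x) - half)]


print(centered_moving_average([3, 6, 9, 6, 3]))  # [6.0, 7.0, 6.0]
```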

29
  • Disadvantages of Transformation Rules
  • Subsequent computations (such as the indexing
    problem) become more complicated
  • Feature extraction becomes difficult, especially
    if the rules to apply become dependent on the
    particular X and Y in question
  • Euclidean distances in the feature space may not
    be good approximations of the sequence distances
    in the original space

30
Dynamic Time Warping (Berndt, Clifford, 1994)
  • Extensively used in speech recognition
  • Allows acceleration-deceleration of signals along
    the time dimension
  • Basic idea
  • Consider $X = x_1, x_2, \ldots, x_n$ and
    $Y = y_1, y_2, \ldots, y_n$
  • We are allowed to extend each sequence by
    repeating elements
  • The Euclidean distance is then calculated between
    the extended sequences X' and Y'
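The "extend by repeating elements" idea can be checked by hand. The repeat counts below are chosen manually for illustration; finding the best counts is what the dynamic-programming formulation on the following slides does. The helper names are hypothetical.

```python
import math


def extend(seq, repeats):
    """Repeat seq[i] repeats[i] times (every count must be >= 1)."""
    out = []
    for value, count in zip(seq, repeats):
        out.extend([value] * count)
    return out


def euclidean(x, y):
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


x = [0, 1, 2, 3, 3]
y = [0, 0, 1, 2, 3]                  # same shape as x, shifted one step later
print(euclidean(x, y))               # sqrt(3), about 1.732
x_ext = extend(x, [2, 1, 1, 1, 1])   # [0, 0, 1, 2, 3, 3]
y_ext = extend(y, [1, 1, 1, 1, 2])   # [0, 0, 1, 2, 3, 3]
print(euclidean(x_ext, y_ext))       # 0.0
```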

31
(No Transcript)
32
(No Transcript)
33
Dynamic Time Warping (Berndt, Clifford, 1994)
34
(No Transcript)
35
Restrictions on Warping Paths
  • Monotonicity
  • Path should not go down or to the left
  • Continuity
  • No elements may be skipped in a sequence
  • Warping Window
  • $|i - j| < w$
  • Others.

36
Formulation
  • Let D(i, j) refer to the dynamic time warping
    distance between the subsequences
  • $x_1, x_2, \ldots, x_i$
  • $y_1, y_2, \ldots, y_j$
  • $D(i, j) = |x_i - y_j| + \min\{D(i-1, j),\ D(i-1, j-1),\ D(i, j-1)\}$

37
Solution by Dynamic Programming
  • Basic implementation is $O(n^2)$, where n is the
    length of the sequences
  • We have to solve the problem for each (i, j)
    pair
  • If a warping window w is specified, then $O(nw)$
  • Only solve for the (i, j) pairs where $|i - j| < w$
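The recurrence for D(i, j) and the warping-window restriction can be sketched as a small dynamic program. It follows the formulation above with |x_i - y_j| as the local cost; the three allowed predecessor cells enforce monotonicity and continuity, and the optional window computes only cells with |i - j| < w, so only O(nw) cells are filled. For clarity the sketch still allocates the full table (O(n^2) memory); a production version would store only the band. The function name is illustrative.

```python
import math


def dtw_distance(x, y, w=None):
    """DTW with D(i, j) = |x_i - y_j| + min(D(i-1, j), D(i-1, j-1), D(i, j-1)).

    If w is given, only cells with |i - j| < w are computed (warping window).
    With a window, sequences whose lengths differ by w or more have no valid
    path, and the result is inf.
    """
    n, m = len(x), len(y)
    # Row/column 0 hold infinite boundaries; D[0][0] = 0 starts the path at (1, 1).
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        if w is None:
            j_lo, j_hi = 1, m
        else:
            j_lo, j_hi = max(1, i - w + 1), min(m, i + w - 1)
        for j in range(j_lo, j_hi + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    return D[n][m]


x = [1, 2, 3, 4]
y = [1, 1, 2, 3]
print(dtw_distance(x, y))       # 1.0
print(dtw_distance(x, y, w=2))  # 1.0
print(dtw_distance(x, y, w=1))  # 3.0 (diagonal only: no warping allowed)
```

With w = 1 the band collapses to the diagonal and DTW reduces to the plain L1 distance between the sequences, which is why the last result is larger.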

38
Piecewise Linear Representation of Time Series
Time series approximated by K linear segments
39
PLR of Financial Time Series
40
(No Transcript)
41
  • Such approximation schemes
  • achieve data compression
  • allow scaling along the time axis
  • How to select K?
  • Too small: many features are lost
  • Too large: redundant information is retained
  • Given K, how to select the best-fitting segments?
  • Minimize some error function
  • These problems were pioneered by Pavlidis and
    Horowitz (1974) and further studied by Keogh (1997)
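One way to answer "given K, how to select the best-fitting segments?" is the greedy bottom-up merging heuristic: start from many tiny segments and repeatedly merge the adjacent pair whose combined least-squares line has the smallest squared error, until K segments remain. This is a sketch under assumptions, not an optimal algorithm and not necessarily the procedure behind the slide's figures; segments are disjoint half-open index ranges, and the names are hypothetical.

```python
def fit_error(xs, lo, hi):
    """Sum of squared residuals of the least-squares line through xs[lo:hi] (x-axis = index)."""
    n = hi - lo
    if n <= 2:
        return 0.0  # a line fits one or two points exactly
    ts = range(lo, hi)
    t_mean = sum(ts) / n
    x_mean = sum(xs[lo:hi]) / n
    s_tt = sum((t - t_mean) ** 2 for t in ts)
    s_tx = sum((t - t_mean) * (xs[t] - x_mean) for t in ts)
    slope = s_tx / s_tt
    return sum((xs[t] - (x_mean + slope * (t - t_mean))) ** 2 for t in ts)


def bottom_up_plr(xs, k):
    """Greedy bottom-up PLR: return k segments as (start, end) half-open ranges."""
    # Start with segments of two points each (the last may hold a single point).
    bounds = list(range(0, len(xs), 2)) + [len(xs)]
    segments = [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]
    while len(segments) > k:
        # Cost of merging each adjacent pair = error of one line over both.
        costs = [fit_error(xs, segments[i][0], segments[i + 1][1])
                 for i in range(len(segments) - 1)]
        best = min(range(len(costs)), key=costs.__getitem__)
        segments[best:best + 2] = [(segments[best][0], segments[best + 1][1])]
    return segments


series = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1]  # rises, then falls
print(bottom_up_plr(series, 2))          # [(0, 6), (6, 10)]
```

The choice of K shows up directly: K = 1 would force a single line through the peak (features lost), while keeping every initial two-point segment would retain redundant detail.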

42
Defining Similarity
43
Integrating a Wavelet and TSK Fuzzy Rules for
Stock Price Forecasting
  • Pei-Chann Chang, Professor
  • Information Management Department
  • Yuan Ze University

44
Outline
  • Introduction
  • Literature Survey
  • Research Approaches
  • Simulation Results
  • Summary and Future Research

45
1. Introduction
  • Mining stock market tendencies is a challenging
    task.
  • Many factors influence the performance of a stock
    market, including political events, general
    economic conditions, and traders' expectations.

46
1. Introduction
  • Attempts to predict financial markets range from
    traditional time series approaches to artificial
    intelligence techniques such as fuzzy systems and
    artificial neural network (ANN) methodologies.

47
1. Introduction
  • The main drawback of ANNs, and of other black-box
    techniques, is the tremendous difficulty of
    interpreting their results.
  • They do not provide insight into the nature of the
    interactions between the technical indicators and
    the stock market fluctuations.

48
1. Introduction
  • New tools and techniques are needed to deal with
    noise, dimensionality, and nonlinearity in stock
    price prediction.
  • The proposed framework combines several soft
    computing techniques, namely a wavelet transform,
    a TSK fuzzy system, data clustering, simulated
    annealing, and KNN, for stock forecasting.

49
2. Literature Survey
  • White (1990) used a feed-forward NN (FFNN) to
    study IBM daily common stock returns.
  • Yao and Poh [59] use technical indicators (K and
    D) along with price information to predict
    future price values.

50
2. Literature Survey
  • Austin and Looney [8] develop a neural network
    that predicts the proper time to move money into
    and out of the stock market.
  • Mingo LÓPEZ et al. [36] use time-delay
    connections in enhanced neural networks to
    forecast closing prices of the IBEX-35 (Spanish
    stock index).

51
2. Literature Survey
  • Nenortaite and Simutis [39] present a trading
    approach based on one-step-ahead profit estimates
    created by combining neural networks with
    particle swarm optimization algorithms.
  • Jaruszewicz and Mandziuk [28] train ANNs using
    both technical analysis variables and intermarket
    data to predict one-day changes in the NIKKEI
    index.

52
2. Literature Survey
  • The wavelet transform decomposes a process into
    different scales, which makes it useful in
    differentiating seasonalities, revealing
    structural breaks and volatility clusters, and
    identifying local and global dynamic properties
    of a process at these timescales

53
2. Literature Survey
  • This research, motivated by the effective
    preprocessing capability of wavelets and the
    predictive power of fuzzy rule systems, presents a
    hybrid system integrating the wavelet transform and
    a TSK fuzzy rule system for stock price prediction.

54
3. Methodology
  • A time series is a sequence of real numbers,
    representing the measurements of a real variable
    at equal time intervals
  • Stock price movements
  • Volume of sales over time
  • Daily temperature readings
  • ECG data
  • A time series database is a large collection of
    time series
  • e.g., all NYSE stocks

55
3. Methodology
  • The Similarity Problem
  • $X = x_1, x_2, \ldots, x_n$
  • $Y = y_1, y_2, \ldots, y_n$
  • Define and compute Sim(X, Y)
  • E.g. do stocks X and Y have similar movements?

56
3. Methodology
  • TSK Fuzzy System Based Prediction
  • Input Selection Using Stepwise Regression
    Analysis (SRA)
  • TSK fuzzy rule systems
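Stepwise regression is usually driven by partial F-tests; the sketch below is a simplified forward-selection variant (an assumption, not the authors' exact SRA procedure) that greedily adds the input that most reduces the residual sum of squares of an ordinary least-squares fit, and stops when the relative improvement falls below `min_gain`. Column indices stand in for candidate technical indicators; all names and the synthetic data are hypothetical.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_stepwise(X, y, max_features, min_gain=0.05):
    """Greedy forward selection; returns the chosen column indices in order."""
    selected, remaining = [], list(range(X.shape[1]))
    current = rss(X, y, selected)
    while remaining and len(selected) < max_features:
        best_rss, best_col = min((rss(X, y, selected + [c]), c) for c in remaining)
        if current - best_rss < min_gain * current:
            break  # the best remaining input barely helps: stop
        selected.append(best_col)
        remaining.remove(best_col)
        current = best_rss
    return selected


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # 5 synthetic candidate inputs
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.01 * rng.normal(size=200)
print(forward_stepwise(X, y, max_features=5))  # expected to start with [0, 2]
```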

57
3. Methodology
  • Data Preprocessing Using Wavelet Theory

Fig. 1. A wavelet transform process
58
3. Methodology
  • Gaussian fuzzy membership functions are adopted

59
3. Methodology
  • TSK fuzzy rule systems

A set of K IF-THEN rules of the following
form: $R_i$: IF $x_1$ is $A_{i1}$, $x_2$ is $A_{i2}$, $\ldots$, $x_n$ is $A_{in}$,
THEN $y_i = \beta_{i0} + \beta_{i1} x_1 + \cdots + \beta_{in} x_n$
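A minimal sketch of how such a rule base produces a prediction, assuming Gaussian membership functions mu(x) = exp(-(x - c)^2 / (2 sigma^2)), the product of the memberships as each rule's firing strength, and the usual first-order TSK output: the firing-strength-weighted average of the rule consequents y_i. The parameter names (centers, sigmas, betas) are illustrative; in the talk these parameters come from clustering and are tuned by simulated annealing.

```python
import math


def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of x in a fuzzy set (assumed parameterization)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def tsk_predict(x, rules):
    """First-order TSK inference.

    Each rule is (centers, sigmas, betas): one Gaussian set per input for the IF part,
    and betas = [beta_0, beta_1, ..., beta_n] for y_i = beta_0 + sum_j beta_j * x_j.
    Assumes at least one rule fires with nonzero strength.
    """
    weighted_sum, total_strength = 0.0, 0.0
    for centers, sigmas, betas in rules:
        strength = 1.0
        for xj, c, s in zip(x, centers, sigmas):
            strength *= gaussian_mf(xj, c, s)  # product t-norm for the AND of the premises
        y_i = betas[0] + sum(b * xj for b, xj in zip(betas[1:], x))
        weighted_sum += strength * y_i
        total_strength += strength
    return weighted_sum / total_strength


rules = [
    ([0.0], [1.0], [0.0, 1.0]),   # R1: IF x1 is "near 0" THEN y = x1
    ([2.0], [1.0], [10.0, 0.0]),  # R2: IF x1 is "near 2" THEN y = 10
]
print(tsk_predict([1.0], rules))  # about 5.5: both rules fire equally
```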
60
3. Methodology
  • Data Clustering
  • The K-means clustering algorithm is employed for
    data clustering
  • Optimization of the Parameters in Fuzzy Rules
    Using Simulated Annealing
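The clustering step can be sketched with plain k-means (Lloyd's algorithm). Under the conclusion's scheme of one fuzzy rule per cluster, each resulting center would seed one rule's membership-function centers; how the talk derives widths and consequents is not shown, and the simulated-annealing tuning step is omitted. Random initialization from the data with a fixed seed is an assumption, and the names are illustrative.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k distinct data points as seeds
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(data, k=2)
print(labels)
```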

61
3. Methodology
  • Using K-Nearest-Neighbor as a Sliding Window
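The slide gives only the title, so the following is an assumed reading of "KNN as a sliding window": slide a window over the history, find the k past windows closest (in Euclidean distance) to the most recent one, and average the values that followed them. It is a generic nearest-neighbour forecaster, not the authors' fine-tuning step; the function name and parameters are hypothetical.

```python
def knn_forecast(series, window, k):
    """Forecast the next value from the k past windows most similar to the latest one."""
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    query = series[-window:]
    candidates = []
    # Every past window that is followed by a known value is a candidate neighbour.
    for start in range(len(series) - window):
        past = series[start:start + window]
        dist = sum((a - b) ** 2 for a, b in zip(past, query))
        candidates.append((dist, series[start + window]))
    candidates.sort(key=lambda item: item[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


prices = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
print(knn_forecast(prices, window=2, k=2))  # 3.0: every past "1, 2" was followed by 3
```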

62
3. Methodology
  • Performance Measures
  • Mean Absolute Percentage Error (MAPE)
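MAPE is commonly defined as MAPE = (100 / n) * sum_t |(A_t - F_t) / A_t|, where A_t is the actual value and F_t the forecast; the sketch below assumes this standard definition and nonzero actual values. The function name is illustrative.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 / len(actual) * sum(abs((a - f) / a) for a, f in zip(actual, forecast))


print(mape([100.0, 200.0], [110.0, 190.0]))  # about 7.5 (errors of 10% and 5%)
```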

63
4. Simulation Results
  • The data set used for testing is the TSE index,
    which has been split into two sets: training
    data and test data.
  • The TSE index data run from July 18, 2003 to
    December 31, 2005, 614 records in total.

64
4. Simulation Results
  • Before training the TSK fuzzy model, a wavelet
    transform was applied to preprocess the data.
    Based on the MAPE, 3-level wavelet preprocessing
    was selected.

65
4. Simulation Results
66
4. Simulation Results
  • Six factors were selected as the inputs of the TSK
    model to predict the stock price.
  • They are the 6-day moving average (MA), 6-day
    bias (BIAS), 6-day relative strength index
    (RSI), 9-day stochastic line (KD), the moving
    average divergence (MABIAS), and the 13-day
    psychological line (PSY).

67
4. Simulation Results
  • In the experiment, the stock price data were
    clustered into 2 to 8 clusters. The performance
    (MAPE) of the algorithm for different numbers of
    clusters is shown in Fig. 6. The figure shows
    that the best performance is achieved with 3
    clusters.

68
4. Simulation Results

Fig. 2. MAPE of the proposed model for different
numbers of data clusters.
69
4. Simulation Results
  • The proposed model is compared with four other
    methods:
  • the traditional back-propagation neural network
    (BPN),
  • a standard TSK model,
  • a multiple regression model (MRM),
  • and a forecasting method integrating a genetic
    algorithm with Wang and Mendel's method (GAWM)

70
4. Simulation Results

Table 1. MAPE of the different methods
71
5. Conclusions
  • A TSK fuzzy model is proposed for stock price
    prediction.
  • The data are preprocessed using the Haar wavelet.
  • Then, the SRA technique is employed to select the
    most relevant factors for prediction.
  • To avoid rule explosion, the k-means clustering
    algorithm is employed to group the data into a
    number of clusters, and one fuzzy rule is
    generated for each cluster.
  • KNN is applied to fine-tune the forecasting results.

72
5. Future Research
  • Exploring the many Soft Computing and
    Computational Intelligence techniques for time
    series similarity measures and data forecasting
    problems
  • Turning-point prediction instead of stock price
    forecasting
  • Simple similarity models that allow efficient
    indexing and effective retrieval of training
    data