Mining the Stock Market: Which Measure is the Best? Martin Gavrilov, Dragomir Anguelov, Piotr Indyk, Rajeev Motwani

Provided by: Tobi91
Learn more at: https://web.ece.ucsb.edu

Transcript and Presenter's Notes


1
Mining the Stock Market: Which Measure is the Best?
Martin Gavrilov, Dragomir Anguelov, Piotr Indyk, Rajeev Motwani
  • Presented by
  • Arun Qamra

2
Main Idea
  • A lot of interest in mining time series data
  • But little work on identifying measures suitable
    for specific classes of data sets
  • This work attempts to:
  • Study similarity measures suitable for stocks
  • Evaluate results

3
More specifically...
  • 500 stocks, data for one year (S&P 500 index, 1998)
  • Opening price for 252 days
  • Time Series
  • Clustering to find similar stocks
  • Variety of similarity measures

4
Evaluation Technique
  • How do you evaluate clustering results ?
  • Each stock pre-assigned to a cluster/category
  • 102 clusters (based on industry)
  • Abstracted into 62 super-clusters
  • Used as ground truth
  • Attempt to recreate this clustering

5
Feature Selection
  • Data Representation
  • Normalization
  • Dimensionality Reduction

6
Data Representation
  • Raw
  • Point in 252-dimensional space represents
    sequence i.e. stock
  • First Derivative
  • i-th coordinate is equal to the difference between
    the i-th and (i+1)-th values of the sequence

7
Normalization
  • Standard Normalization
  • Mean subtracted from all coordinates, then
    vector divided by its L2 norm
  • Piecewise Normalization
  • Split sequence into windows
  • Each window normalized separately
  • Local similarities taken into account
  • No Normalization
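
A rough sketch of the two normalization schemes, using only the Python standard library. The function names are illustrative, and the handling of constant windows and of a shorter final window are assumptions made here, not details given in the slides.

```python
import math

def standard_normalize(seq):
    """Subtract the mean, then divide by the L2 norm of the centered vector."""
    mean = sum(seq) / len(seq)
    centered = [v - mean for v in seq]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        return centered  # constant sequence: left as all zeros (assumption)
    return [v / norm for v in centered]

def piecewise_normalize(seq, window):
    """Normalize each consecutive window separately and concatenate."""
    out = []
    for start in range(0, len(seq), window):
        # The last window may be shorter than `window` (assumption).
        out.extend(standard_normalize(seq[start:start + window]))
    return out

seq = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # made-up values
std = standard_normalize(seq)
pw = piecewise_normalize(seq, 3)
```

In this example both windows of `pw` come out identical, because each window rises by the same amount once its own mean and scale are removed. That is the sense in which piecewise normalization emphasizes local shape.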

8
Dimensionality Reduction
  • Principal Component Analysis
  • Maps vectors into lower-dimensional space
  • Aggregation
  • Assumes local fluctuations are insignificant
  • Groups of B consecutive data points replaced by
    their average
  • Hence dimensionality reduced by a factor of B
  • Fourier Transform
  • Time series represented by a few of its lowest
    frequencies
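
Two of the reductions above, aggregation and Fourier truncation, can be sketched in plain Python. The function names, the 1/n scaling of the coefficients, and the naive DFT loop are choices made for this illustration; PCA is left out because it needs a linear-algebra routine.

```python
import cmath

def aggregate(seq, b):
    """Replace each group of b consecutive points by its average."""
    groups = [seq[i:i + b] for i in range(0, len(seq), b)]
    return [sum(g) / len(g) for g in groups]

def lowest_frequencies(seq, k):
    """Return the k lowest-frequency DFT coefficients, scaled by 1/n."""
    n = len(seq)
    return [
        sum(seq[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)) / n
        for f in range(k)
    ]

agg = aggregate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2)   # 6 points -> 3 points
coeffs = lowest_frequencies([1.0, 1.0, 1.0, 1.0], 2)  # constant signal
```

With the 1/n scaling used here, the first coefficient is the mean of the sequence. Aggregating a 252-point series with B = 4 leaves 63 coordinates.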

9
Similarity Measure
  • Use Euclidean distance
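
A minimal sketch of the distance computation between feature vectors. The helper names and the sample vectors are hypothetical.

```python
import math

def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def distance_matrix(vectors):
    """Symmetric matrix of pairwise Euclidean distances."""
    return [[euclidean(u, v) for v in vectors] for u in vectors]

# Made-up feature vectors standing in for three preprocessed stocks.
stocks = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dist = distance_matrix(stocks)
```

For vectors that are mean-centered and scaled to unit L2 norm, the squared Euclidean distance equals 2 - 2 * dot(x, y), so ranking by distance is the same as ranking by correlation.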

10
Clustering Method
  • Hierarchical Agglomerative Clustering
  • Hierarchical classification of objects
  • Done by series of binary mergers
  • Each step merges the two clusters with the smallest
    maximum distance between their elements
    (complete linkage)
  • Any level in hierarchy can be chosen based on
    required number of clusters
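
The merge rule can be illustrated with a naive agglomerative loop. This is a sketch only: it stops at the requested number of clusters instead of building the full hierarchy, recomputes cluster distances from scratch at every step, and breaks ties by taking the first pair found. The function name and the toy data are assumptions.

```python
def complete_linkage(dist, num_clusters):
    """Cluster indices 0..n-1 given a symmetric distance matrix.

    Repeatedly merges the two clusters whose largest pairwise
    element distance is smallest, until num_clusters remain.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster-to-cluster distance: the maximum over all pairs.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # binary merge
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy one-dimensional data: two tight pairs and one outlier.
points = [0.0, 1.0, 10.0, 11.0, 50.0]
dist = [[abs(p - q) for q in points] for p in points]
result = complete_linkage(dist, 3)
```

Here `result` groups indices 0 and 1, indices 2 and 3, and leaves index 4 on its own. Asking for a different `num_clusters` corresponds to cutting the hierarchy at a different level.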

11
Comparing Clusterings
  • Similarity measure for comparing clusterings
  • Note: not symmetric

12
Precision-Recall curves
  • Precision-recall curves also used for evaluation
  • To make observations independent of clustering
    algorithm
  • For each stock S,
  • Rank all other stocks based on distance from S
  • Plot percentage of relevant stocks among i
    closest stocks against i
  • Average over all stocks
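
The procedure above can be sketched as follows, assuming a stock counts as "relevant" when it carries the same ground-truth label as the query stock. That reading, the function names, and the toy data are assumptions. The sketch returns fractions instead of percentages.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def precision_curve(vectors, labels):
    """Average fraction of same-label items among the i nearest, for each i."""
    n = len(vectors)
    totals = [0.0] * (n - 1)
    for s in range(n):
        # Rank every other item by its distance from item s.
        others = sorted((j for j in range(n) if j != s),
                        key=lambda j: euclidean(vectors[s], vectors[j]))
        hits = 0
        for i, j in enumerate(others, start=1):
            hits += labels[j] == labels[s]
            totals[i - 1] += hits / i  # precision among the i closest
    return [t / n for t in totals]  # average over all query items

# Toy data: two well-separated groups of two items each.
vectors = [[0.0], [1.0], [10.0], [11.0]]
labels = ["a", "a", "b", "b"]
curve = precision_curve(vectors, labels)
```

In this toy case every item's nearest neighbor shares its label, so the curve starts at 1 and falls as i grows past the size of each group.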

13
Results: Dimensionality Reduction
  • Preprocessing causes dimensionality dispersal
  • Raw data, reduced to 10 or 5 dimensions
  • Raw data, Normalized, reduced to 50
  • FD, Normalized, reduced to 100

14
Results: Clustering
  • Normalization improves results
  • Derivative improves results
  • Best results for combination of
  • Piecewise Normalization (window 15)
  • First Derivative

15
Conclusion
  • This paper
  • Identifies mining techniques specifically useful
    for Stock Market data
  • Evaluates against real data
  • Further research needed to understand the behavior
    of this class of data
  • Effect of taking the first derivative not well
    understood