Techniques For High Dimension Data Analysis


Provided by: laylavictoria

Transcript and Presenter's Notes


1
SynergisticIT
  • The best programmers in the Bay Area. Period!

2
Techniques For High Dimension Data Analysis
3
High-dimensional data is a dataset in which the
number of features is larger than the number of
observations. A dataset can have any number of
features, but it is considered high-dimensional
only if the number of observations (the sample
size) is smaller than the number of features.
High-dimensional data poses many challenges for
analysis because it makes computation difficult.
It is also hard to obtain a well-determined
result: with too few observations to train on,
many different models fit the data equally well,
so it is nearly impossible to identify the one
that truly describes the relationship between the
response and the predictor variables. Let's look
at a few examples of high-dimensional data.
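
To see why too few observations make the model hard to pin down, here is a minimal sketch in Python with NumPy. The sizes, seed and variable names are assumptions made for illustration and are not taken from the presentation. With 10 observations and 50 features, a linear model can reproduce even pure-noise responses exactly, and a second, quite different coefficient vector fits just as well.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_feat = 10, 50                 # fewer observations than features
X = rng.normal(size=(n_obs, n_feat))   # illustrative random predictors
y = rng.normal(size=n_obs)             # pure noise, unrelated to X

# Minimum-norm least-squares solution of X @ coef = y.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
train_error = np.max(np.abs(X @ coef - y))

# The last right-singular vector is orthogonal to every row of X,
# so adding it to coef changes the model but not the fitted values.
_, _, vt = np.linalg.svd(X)
coef_alt = coef + 5.0 * vt[-1]
alt_error = np.max(np.abs(X @ coef_alt - y))

print("largest training error, first model :", train_error)
print("largest training error, second model:", alt_error)
print("largest coefficient difference      :", np.max(np.abs(coef - coef_alt)))
```

Both models match the training responses up to rounding error, although the responses carry no signal at all. This is the sense in which the data alone cannot identify a single model.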
4
Common Examples Of High-Dimensional Data
5
  • Missing Values Ratio: Data columns with too many
    missing values are unlikely to carry much useful
    information. So, you can remove the columns whose
    ratio of missing values exceeds a given
    threshold.
  • Low Variance Filter: You can remove the data
    columns whose variance is lower than a given
    threshold. Normalization is needed before using
    this technique because variance depends on the
    range of the column.
  • High Correlation Filter: For each pair of columns
    with a correlation coefficient higher than a
    given threshold, keep only one of the two.
    Unlike variance, the Pearson correlation
    coefficient does not change when a column is
    rescaled, so this filter does not depend on
    normalization.
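
The three filters above can be sketched with NumPy as follows. This is an illustrative sketch, not the presentation's own implementation: the function names, the thresholds, the min-max scaling used as the normalization step, and the toy data are all assumptions. The correlation filter here also assumes the columns it receives have no missing values and are not constant.

```python
import numpy as np

def missing_ratio_filter(X, max_missing=0.5):
    """Indices of columns whose fraction of NaN values is at most max_missing."""
    ratio = np.isnan(X).mean(axis=0)
    return np.flatnonzero(ratio <= max_missing)

def low_variance_filter(X, min_variance=0.01):
    """Indices of columns whose variance after min-max scaling exceeds min_variance."""
    lo = np.nanmin(X, axis=0)
    span = np.nanmax(X, axis=0) - lo
    span[span == 0] = 1.0              # constant columns: avoid dividing by zero
    scaled = (X - lo) / span
    return np.flatnonzero(np.nanvar(scaled, axis=0) > min_variance)

def high_correlation_filter(X, max_corr=0.9):
    """Greedily keep a column only if |corr| with every kept column is at most max_corr."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= max_corr for k in kept):
            kept.append(j)
    return np.array(kept)

# Toy data, made up for illustration: 100 rows, 5 columns.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack([
    a,                                            # column 0: informative
    2.0 * a + 0.01 * b,                           # column 1: near copy of column 0
    b,                                            # column 2: informative
    np.full(100, 3.0),                            # column 3: constant
    np.where(np.arange(100) < 80, np.nan, 1.0),   # column 4: 80% missing
])

# Apply the filters in sequence, tracking the original column indices.
cols = np.arange(X.shape[1])
cols = cols[missing_ratio_filter(X[:, cols])]
cols = cols[low_variance_filter(X[:, cols])]
cols = cols[high_correlation_filter(X[:, cols])]
print("columns kept:", cols)
```

Applying the missing-value filter first is a design choice in this sketch: it keeps mostly-empty columns out of the variance and correlation computations.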

6
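  • Principal Component Analysis (PCA): It is a
    statistical procedure that transforms the
    original n coordinates of a dataset into a new
    set of n coordinates called principal
    components. The first step of PCA is the
    standardization of the data. The components are
    ordered by the variance they capture, so keeping
    only the first few reduces the dimension while
    retaining as much of the variance as possible.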
  • Random Forests / Ensemble Trees: Besides being
    effective classifiers, tree ensembles are very
    useful for feature selection. One way of
    reducing dimension is to generate a large,
    carefully constructed set of trees against a
    target attribute and then use each attribute's
    usage statistics, relative to the usage of the
    other attributes, to find the most informative
    subset of features.
  • You can also generate a large set of shallow
    trees, with each tree trained on a small fraction
    of the total number of attributes. An attribute
    that is often selected as the best split is an
    informative feature to retain.
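
The PCA step described above (standardize first, then rotate to principal components) might look like the sketch below. It uses NumPy's singular value decomposition, which is one common way to compute PCA. The function name `pca`, the toy data and the choice of one retained component are assumptions for illustration, and the sketch assumes no column is constant.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its first n_components principal components.

    Returns the projected data and the share of total variance
    captured by each retained component.
    """
    # Step 1: standardize every column to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: the rows of vt are the principal directions, ordered by
    # decreasing singular value (and therefore decreasing variance).
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    scores = Z @ vt[:n_components].T
    return scores, explained[:n_components]

# Toy data, made up for illustration: three columns driven by one hidden factor.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t + 0.05 * rng.normal(size=200),
    -2.0 * t + 0.05 * rng.normal(size=200),
    0.5 * t + 0.05 * rng.normal(size=200),
])

scores, ratio = pca(X, n_components=1)
print("reduced shape:", scores.shape)
print("variance share of first component:", ratio[0])
```

Because the three columns are nearly copies of one hidden factor, a single component carries almost all of the variance, and the three columns can be replaced by one.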

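The shallow-tree idea above can be illustrated with the simplest possible tree, a one-split "stump". The sketch below is a simplified stand-in, not a full random forest: it assumes binary 0/1 labels, Gini impurity as the split criterion, bootstrap resampling of rows, and hypothetical function names. Each stump sees only a random subset of the attributes, and the attribute it picks as its best split receives one vote.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of binary 0/1 labels."""
    if labels.size == 0:
        return 0.0
    p = labels.mean()
    return 2.0 * p * (1.0 - p)

def best_stump_feature(X, y, candidates):
    """Candidate feature whose best single threshold gives the lowest impurity."""
    best_feature, best_score = None, np.inf
    for j in candidates:
        for threshold in np.unique(X[:, j])[:-1]:
            mask = X[:, j] <= threshold
            left, right = y[mask], y[~mask]
            score = (left.size * gini(left) + right.size * gini(right)) / y.size
            if score < best_score:
                best_feature, best_score = j, score
    return best_feature

def stump_votes(X, y, n_trees=200, subset_size=3, seed=0):
    """Count how often each feature is chosen as the best split."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_trees):
        candidates = rng.choice(X.shape[1], size=subset_size, replace=False)
        rows = rng.integers(0, X.shape[0], size=X.shape[0])   # bootstrap sample
        winner = best_stump_feature(X[rows], y[rows], candidates)
        if winner is not None:
            votes[winner] += 1
    return votes

# Toy data, made up for illustration: only feature 2 determines the label.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))
y = (X[:, 2] > 0).astype(int)

votes = stump_votes(X, y)
print("votes per feature:", votes)
```

Features with many votes are the candidates to retain. A production setup would more likely use an established tree-ensemble library and its importance scores, but the voting logic is the same in spirit.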
7
Besides these, you can also use the backward
feature elimination and forward feature
construction techniques. However, both are
time-consuming and computationally expensive, so
they are usually applied only to datasets with a
relatively low number of input columns.
Conclusion
You can learn these dimensionality reduction
techniques during data science training in
California. Each of the given methods is useful
for reducing high-dimensional data.
Dimensionality reduction not only speeds up
algorithm execution but can also improve model
performance.
Source: https://datasciencetrainingusa.wordpress.com/2021/08/04/techniques-for-high-dimension-data-analysis/
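
As an illustration of the backward feature elimination mentioned on this slide, here is a minimal sketch. The model (ordinary least squares via NumPy), the validation-error criterion, the function names and the toy data are all assumptions chosen for illustration. At each round the model is refitted once per remaining column, which is why the method becomes expensive when there are many input columns.

```python
import numpy as np

def validation_mse(X_train, y_train, X_val, y_val, cols):
    """Fit least squares (with intercept) on cols; return the validation MSE."""
    A = np.column_stack([np.ones(len(X_train)), X_train[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    B = np.column_stack([np.ones(len(X_val)), X_val[:, cols]])
    return float(np.mean((B @ coef - y_val) ** 2))

def backward_elimination(X_train, y_train, X_val, y_val, n_keep):
    """Repeatedly drop the column whose removal hurts validation error least."""
    cols = list(range(X_train.shape[1]))
    while len(cols) > n_keep:
        errors = [
            validation_mse(X_train, y_train, X_val, y_val,
                           [c for c in cols if c != drop])
            for drop in cols
        ]
        cols.pop(int(np.argmin(errors)))
    return cols

# Toy data, made up for illustration: only columns 1 and 4 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)

selected = backward_elimination(X[:150], y[:150], X[150:], y[150:], n_keep=2)
print("columns kept:", selected)
```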
8
A Picture Is Worth a Thousand Words