Nonvisual Analytical Methods

About This Presentation

Title:

Nonvisual Analytical Methods

Description:

Keeping the Big Picture in Mind. Determine what the overall goal of the data ... Figure 6.4 decision tree describing car purchase profiles of males and female ... – PowerPoint PPT presentation

Slides: 36

Provided by: terry111

Transcript and Presenter's Notes

Title: Nonvisual Analytical Methods

1
Nonvisual Analytical Methods

Presented by
Terry Bobo
Christina Laurentia
Siew Lim

2
Introduction

Although visualization is a very powerful
technique, users sometimes use other analytical
methods:
Statistical testing
Decision trees
Association rules
Neural networks
Genetic algorithms
Unlike visualization, these approaches are
employed with the user largely out of the
analysis loop.

3
Statistical Methods

The use of descriptive and inferential statistics
is by far the most standard approach to data
analysis
Preferred choice in science, medicine, and business
Requires the use of quantitative data

4
Statistical Methods (Cont.)

Descriptive statistics include measures such as
Mean (avg.)
Median (middle value)
Mode (most common value)
Standard deviation (measure of spread; the square root of the variance)
Range (highest value minus lowest value)
Data distribution
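
The measures listed above can be computed with Python's standard `statistics` module. This is only a minimal sketch: the `sales` values are made up for illustration and do not come from the presentation.

```python
import statistics

# Hypothetical sample (not from the slides): units sold per day
sales = [12, 15, 15, 18, 20, 22, 24]

summary = {
    "mean": statistics.mean(sales),      # arithmetic average
    "median": statistics.median(sales),  # middle value of the sorted data
    "mode": statistics.mode(sales),      # most common value
    "stdev": statistics.stdev(sales),    # sample standard deviation
    "range": max(sales) - min(sales),    # highest minus lowest value
}

for name, value in summary.items():
    print(f"{name}: {value}")
```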

5
Assessing Group Differences

Statistics can be used for hypothesis testing
Values are predicted before analysis begins
Ex. Fictitious data describing sales patterns during
the months of November and December.

6
Assessing Group Differences (Cont.)

The experimental hypothesis, H1, is tested
against the standard null hypothesis, H0.
H0 assumes there are no differences between the
groups.
H1 assumes the groups are different.

7
Assessing Group Differences (Cont.)

H1: December figures are in fact higher than
the November figures.
H0: There is no real difference; any difference
is due to chance.

8
Size of Statistical Effects

Important factors to consider before an outcome
is considered reliable
Probability level
Ex. 0.05 or 0.10 (significance level)
Number of observations included in sample
Number of groups being compared
The more degrees of freedom, the smaller the
test ratio needed in order to reject H0.

9
Predictive Regression Analysis

Linear regression analysis
Used to make predictions about the numeric values
of a variable within the problem space
represented in the data set.
Yields a best-fit line through the data points
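
A best-fit line can be obtained by ordinary least squares. The sketch below assumes a single predictor; the function name `fit_line` and the sample points are illustrative and not taken from the presentation.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (x) against sales (y)
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.2, 6.8, 9.1, 11.0]

slope, intercept = fit_line(xs, ys)
predicted = slope * 6 + intercept  # prediction for an unseen x value
print(slope, intercept, predicted)
```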

10
Predictive Regression Analysis (Cont.)
11
When to Use Statistical Analysis

Conditions to be met before statistically
reliable effects can be observed:
The data must be in numeric form
The data must be divided into groups for analysis
The user must have some sort of hypothesis about
what to expect to find in an analysis

12
Keeping the Big Picture in Mind

Determine what the overall goal of the data
mining activity is.
Whether a statistical test is appropriate depends
on the question being asked.
Statistics is not always the best approach.

13
Keeping the Big Picture in Mind (Cont.)

Ex. A few distributors were discovered to be
engaged in fraud, but their claims constituted
only a small portion of total claims.

14
Segregating the data
Figure 6.4: Decision tree describing car purchase
profiles of males and females
15
Using decision trees to build rules

The frame variable (2-door or 4-door) represents
the root node.
Engine type (V4 or higher) is a child node of the
root node, and so on.
When a record enters the tree, it moves down
until it reaches the point beyond which it can no
longer move (a leaf node).
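
The traversal described above can be sketched with nested dictionaries. The tree below is hypothetical: it reuses the frame and engine variables from the slide, but the branch values and leaf labels ("profile A" and so on) are placeholders, since Figure 6.4 is not reproduced in this transcript.

```python
# Hypothetical tree; inner nodes are dicts, leaves are plain strings.
TREE = {
    "variable": "frame",  # root node
    "branches": {
        "2-door": {
            "variable": "engine",  # child node of the root
            "branches": {"V4": "profile A", "higher": "profile B"},
        },
        "4-door": "profile C",
    },
}


def classify(record, tree=TREE):
    """Move a record down the tree until it reaches a leaf."""
    node = tree
    while isinstance(node, dict):
        value = record[node["variable"]]
        node = node["branches"][value]
    return node


print(classify({"frame": "2-door", "engine": "V4"}))
print(classify({"frame": "4-door", "engine": "higher"}))
```

Each root-to-leaf path reads as a rule, for example "if frame is 2-door and engine is V4, then profile A".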

16
Assessing rules

For each rule you can measure how often the
submitted data records are properly classified.
You can compute the error rate of the entire tree
as a weighted sum of the error rates for all of
the individual leaves.
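
As a rough illustration of the weighted sum, each leaf's error rate can be weighted by the share of records that reach it. The counts below are invented for the example.

```python
# Hypothetical leaf statistics (not from the slides)
leaves = [
    {"records": 50, "misclassified": 5},
    {"records": 30, "misclassified": 6},
    {"records": 20, "misclassified": 1},
]


def tree_error_rate(leaves):
    """Weighted sum of per-leaf error rates, weighted by share of records."""
    total = sum(leaf["records"] for leaf in leaves)
    rate = 0.0
    for leaf in leaves:
        weight = leaf["records"] / total
        leaf_error = leaf["misclassified"] / leaf["records"]
        rate += weight * leaf_error
    return rate


error_rate = tree_error_rate(leaves)
print(error_rate)
```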

17
When to use decision trees

Decision trees are useful for problems in which
the goal is to make broad categorical
classifications or predictions.
They are not useful in applications requiring
specific predictions about the values of
quantitative variables.

18
Association rules

Association rules are derived from a type of
analysis that extracts information from
coincidence.
Sometimes called market basket analysis.
This methodology allows you to discover
correlations, or co-occurrences of transactional
events.

19
The cross-correlation matrix

Association rules are derived from analyses based
on cross-correlation matrices in which the
likelihood of each event occurring in conjunction
with every other event is computed.
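
A small sketch of such a matrix, assuming each transaction is a set of purchased items. The baskets and the helper names `cooccurrence` and `likelihood` are made up for illustration. Each cell counts the baskets containing both items, and dividing by the diagonal gives the likelihood of one item given another.

```python
# Hypothetical grocery transactions (not the data behind Figure 6.5)
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def cooccurrence(baskets):
    """Return (items, counts) where counts[a][b] = baskets containing both a and b."""
    items = sorted(set().union(*baskets))
    counts = {a: {b: 0 for b in items} for a in items}
    for basket in baskets:
        for a in basket:
            for b in basket:
                counts[a][b] += 1  # the diagonal counts[a][a] is the support of a
    return items, counts


def likelihood(counts, a, b):
    """Share of baskets containing a that also contain b."""
    return counts[a][b] / counts[a][a]


items, counts = cooccurrence(baskets)
for a in items:
    print(a, [counts[a][b] for b in items])
print(likelihood(counts, "butter", "bread"))
```

A high value such as the butter-to-bread likelihood suggests a candidate rule ("customers who buy butter also buy bread"), which would still need checking against how common bread is overall.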

20
The cross-correlation matrix (Cont.)
Figure 6.5: Example of a cross-correlation matrix
used to infer association rules about the
purchase of grocery store items
21
When to use association rules

Association rule analysis will be most useful
when you are doing exploratory analyses, looking
for interesting relationships that might exist
within a data set.

22
Neural networks

A type of computational methodology commonly used
for pattern identification and classification
Composed of nodes that are interconnected by
excitatory and inhibitory connections
Patterns of activity are used to represent
information in the network in a distributed
fashion.
Two learning modes: supervised and unsupervised

23
Supervised Learning

1. Learning occurs in a supervised mode in which
the system is trained on a known set of targets so
that those targets are readily identified when
presented as inputs to the system.
2. On each trial during supervised learning, an
input is presented to the system.
3. The input activates certain nodes, and the
system provides an output response based on the
pattern of activation.

24
Supervised Learning (Cont.)

4. If the output does not match the desired
response during learning, the system is provided
with feedback designed to modify the incorrect
response.
5. Once the system has learned the correct
responses to the set of training inputs, the
learning mode is finished.
6. When the learning mode is finished, the
system can be used to automate pattern detection
and alert the user when incoming patterns match a
previously learned output response.
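
The supervised loop above (present an input, compare the output with the desired response, feed back a correction, stop once all training inputs are answered correctly) can be sketched with a single-node perceptron. The perceptron is chosen here only because it is the simplest network that fits the description; the slides do not name a specific architecture, and the logical-AND training set is an illustrative assumption.

```python
def predict(weights, bias, inputs):
    """Output response: 1 if the weighted activation reaches the threshold."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation >= 0 else 0


def train(samples, learning_rate=1, max_epochs=100):
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                # Feedback: nudge the weights toward the desired response
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:
            break  # every training input is answered correctly
    return weights, bias


# Hypothetical training set: inputs paired with known targets (logical AND)
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(samples)
outputs = [predict(weights, bias, inputs) for inputs, _ in samples]
print(outputs)
```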

25
Unsupervised Learning

Unsupervised networks do not require that the set
of permissible output responses and their mapping
to inputs be defined a priori.
In unsupervised learning, the network forms its
own set of outputs during training based on
features extracted by the network.
The most popular unsupervised learning method is
Kohonen's feature map.
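
A very small one-dimensional feature map in the spirit of Kohonen's method is sketched below. It is a toy under several assumptions: scalar inputs, four nodes in a row, and hand-picked starting weights, learning rate, and neighborhood width. No targets are supplied; each input pulls its best-matching node, and to a lesser degree that node's neighbors, toward itself.

```python
import math


def best_matching_unit(weights, x):
    """Index of the node whose weight is closest to the input."""
    return min(range(len(weights)), key=lambda j: abs(weights[j] - x))


def train_som(data, initial_weights, epochs=50, lr0=0.5, sigma0=1.0):
    weights = list(initial_weights)
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                 # learning rate shrinks over time
        sigma = max(sigma0 * decay, 0.1)  # neighborhood narrows over time
        for x in data:
            bmu = best_matching_unit(weights, x)
            for j in range(len(weights)):
                # Nodes near the winner on the map are updated more strongly
                influence = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))
                weights[j] += lr * influence * (x - weights[j])
    return weights


# Hypothetical inputs forming two loose groups
data = [0.0, 0.5, 1.0, 9.0, 9.5, 10.0]
weights = train_som(data, [2.0, 4.0, 6.0, 8.0])

low_node = best_matching_unit(weights, 0.2)
high_node = best_matching_unit(weights, 9.8)
print(weights, low_node, high_node)
```

After training, inputs from the two groups activate different nodes, which is the sense in which the network forms its own output categories.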

26
Unsupervised Learning (Cont.)
Figure 6.6: Topographical map produced by an
unsupervised learning network
27
When to use unsupervised neural networks

The neural network approach to data mining is
most useful when you are searching for novel ways
of segmenting the data set.
This method can be used to discover subgroups of
data that are defined in terms of some common
feature(s) that separate them from other portions
of the complete population.

28
Genetic Algorithms

The utility of genetic algorithms lies more
within the realm of optimization.
Genetic algorithms start with a population of
items and seek to alter and eventually optimize
their composition for the solution of a
particular problem.
The genetic material or information represented
by each individual can be passed on to subsequent
generations in a variety of ways with
optimization occurring in the process.
There are three basic mechanisms by which
information is chosen, altered, and passed on in
order to achieve optimization: selection,
crossover, and mutation.

29
Selection

The process of selection is analogous to the
process of natural selection that occurs in
evolution
Selection is based on the principle of survival
of the fittest in which the individuals that are
best suited for the environment are the ones that
survive to pass their genetic material on to the
next generation.

30
Crossover

Crossover occurs when two individuals chosen
randomly from the population are joined or
mated such that the resulting offspring contain
partial replications of the information contained
in each of the parents.
The offspring then become full-fledged members of
the population, competing for survival along with
the rest.

31
Mutation

Mutations can occur naturally when there is an
error in the transmission of genetic information
from parent to child.
Mutations can have either good or bad effects.
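
Selection, crossover, and mutation can be combined into a short loop, sketched below under stated assumptions. The fitness function (count of 1-bits in a bit string), the population size, and the mutation rate are all hypothetical choices for illustration. Selection keeps the fitter half of the population, which is one of several possible schemes.

```python
import random


def fitness(individual):
    # Hypothetical objective: the more 1-bits, the fitter the individual
    return sum(individual)


def select(population, keep):
    """Selection: the fittest individuals survive to the next generation."""
    return sorted(population, key=fitness, reverse=True)[:keep]


def crossover(parent_a, parent_b, rng):
    """Crossover: the offspring copies part of each parent."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(individual, rate, rng):
    """Mutation: occasional random errors when passing information on."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def evolve(pop_size=20, length=16, generations=30, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in population)
    for _ in range(generations):
        survivors = select(population, pop_size // 2)
        offspring = []
        while len(survivors) + len(offspring) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)  # chosen at random
            offspring.append(mutate(crossover(parent_a, parent_b, rng), rate, rng))
        # Offspring join the population and compete in the next round
        population = survivors + offspring
    return initial_best, max(population, key=fitness)


initial_best, best = evolve()
print(initial_best, fitness(best), best)
```

Because survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next in this sketch.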

32
When to use Genetic Algorithms

Genetic algorithms are most useful in cases in
which the goal is to find an optimal solution
given a definable problem space.
Genetic algorithms are most useful in situations
in which you are combining data from several
disparate information sources and types.
There needs to be a fair amount of uniformity in
terms of data to be analyzed since all data must
be coded into vectors of the same dimensionality.

33
Christina's Questions

What are two types of Neural Networks?
What are basic mechanisms in Genetic Algorithms?
When is the best time to use Genetic Algorithms?
Answer

34
Terry's Question

Question: Distinguish between H1 and H0.

35
Siew's Question

When is a good time to use decision trees?
