Title: Nonvisual Analytical Methods
1Nonvisual Analytical Methods
- Presented by
- Terry Bobo
- Christina Laurentia
- Siew Lim
2Introduction
- Although visualization is a very powerful
technique, users sometimes use other analytical
methods:
- Statistical testing
- Decision trees
- Association rules
- Neural networks
- Genetic algorithms
- Unlike visualization, these other approaches are
employed with the user largely out of the
analysis loop.
3Statistical Methods
- The use of descriptive and inferential statistics
is by far the most standard approach to data
analysis
- Preferred choice in science, medicine, and business
- Requires the use of quantitative data
4Statistical Methods (Cont.)
- Descriptive statistics include measures such as
- Mean (avg.)
- Median (middle value)
- Mode (most common value)
- Standard deviation (measure of spread around the mean)
- Range (difference between high and low values)
- Data distribution
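The descriptive measures listed above can be computed with Python's standard `statistics` module. The following is a minimal sketch; the `sales` figures are made-up values used only for illustration.

```python
import statistics

# Hypothetical daily sales figures (illustrative only, not from the slides).
sales = [12, 15, 15, 18, 20]

summary = {
    "mean": statistics.mean(sales),      # average
    "median": statistics.median(sales),  # middle value
    "mode": statistics.mode(sales),      # most common value
    "stdev": statistics.stdev(sales),    # sample standard deviation
    "range": max(sales) - min(sales),    # high minus low value
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that `statistics.stdev` uses the sample (n - 1) formula; `statistics.pstdev` would be the choice if the data were the whole population.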
5Assessing Group Differences
- Statistics can be used for hypothesis testing
- Values are predicted before analysis begins
- Ex. Fictitious data describing sales patterns during
the months of November and December.
6Assessing Group Differences (Cont.)
- The experimental hypothesis, H1, is tested
against the standard null hypothesis, H0
- H0 assumes there are no differences between the
groups
- H1 assumes the groups are different.
7Assessing Group Differences (Cont.)
- H1: December figures are in fact higher than
the November figures.
- H0: There is no real difference; any difference
is due to chance.
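One way to test H1 against H0 for two small groups is an exact permutation test, sketched below. The slides do not say which test was used, so this technique is an assumption chosen because it needs only the standard library; the `november` and `december` figures are invented for illustration.

```python
from itertools import combinations

def exact_permutation_p_value(group_a, group_b):
    """One-sided p-value for H1: mean(group_b) > mean(group_a).

    Under H0 the group labels are interchangeable, so every way of
    relabelling the pooled values is equally likely.
    """
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_diff(b_sum):
        return b_sum / n_b - (total - b_sum) / n_a

    observed = mean_diff(sum(group_b))
    relabellings = 0
    at_least_as_extreme = 0
    for indices in combinations(range(len(pooled)), n_b):
        relabellings += 1
        b_sum = sum(pooled[i] for i in indices)
        if mean_diff(b_sum) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / relabellings

# Hypothetical weekly sales (illustrative only).
november = [10, 12, 11, 13]
december = [15, 17, 16, 18]

p_value = exact_permutation_p_value(november, december)
print(f"p = {p_value:.4f}")
```

If the p-value falls below the chosen significance level (for example 0.05), H0 is rejected in favor of H1.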
8Size of Statistical Effects
- Important factors to consider before an outcome
is considered reliable:
- Probability level
- Ex. 0.05 or 0.10 (significance level)
- Number of observations included in sample
- Number of groups being compared
- The more degrees of freedom, the smaller the
test ratio (for example, a t or F ratio) needed
in order to reject H0.
9Predictive Regression Analysis
- Linear regression analysis
- Used to make predictions about the numeric values
of a variable within the problem space
represented in the data set.
- Yields a best-fit line to the data points
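The best-fit line can be obtained by ordinary least squares. The sketch below computes the slope and intercept by hand for a single predictor; the `months` and `sales` values are hypothetical and were chosen so the fit is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data (illustrative only): month number vs. sales.
months = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(months, sales)
predicted_month_6 = slope * 6 + intercept
print(slope, intercept, predicted_month_6)
```

The fitted line is then used to predict the numeric value for an unseen input, here month 6.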
10Predictive Regression Analysis (Cont.)
11When to Use Statistical Analysis
- Conditions to be met before statistically
reliable effects can be observed:
- The data must be in numeric form
- The data must be divided into groups for analysis
- The user must have some sort of hypothesis about
what to expect to find in an analysis
12Keeping the Big Picture in Mind
- Determine what the overall goal of the data
mining activity is.
- Whether a statistical test is appropriate depends
on the question being asked.
- Statistics is not always the best approach.
13Keeping the Big Picture in Mind (Cont.)
- Ex. A few distributors were discovered to be
engaged in fraud, but their claims constituted
only a small portion of total claims.
14Segregating the data
Figure 6.4: Decision tree describing car purchase
profiles of males and females
15Using decision trees to build rules
- The frame variable (2-door or 4-door) represents
the root node.
- Engine type (V4 or higher) is a child node of the
root node, and so on.
- When a record enters the tree, it moves down
until it reaches the point beyond which it can no
longer move (a leaf node).
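The way a record moves down the tree can be sketched with nested dictionaries. Only the attribute names (frame, engine) come from the slide; the branch values and the leaf labels such as "profile A" are hypothetical placeholders, since the original figure is not reproduced here.

```python
# Hypothetical tree: internal nodes are dicts, leaves are plain strings.
TREE = {
    "attribute": "frame",
    "branches": {
        "2-door": {
            "attribute": "engine",
            "branches": {"V4": "profile A", "higher": "profile B"},
        },
        "4-door": {
            "attribute": "engine",
            "branches": {"V4": "profile C", "higher": "profile D"},
        },
    },
}

def classify(node, record):
    """Follow the record down the tree until a leaf is reached.

    Returns the leaf label and the path taken, which reads as a rule:
    IF frame = ... AND engine = ... THEN label.
    """
    path = []
    while isinstance(node, dict):
        value = record[node["attribute"]]
        path.append((node["attribute"], value))
        node = node["branches"][value]
    return node, path

label, rule = classify(TREE, {"frame": "2-door", "engine": "V4"})
print(label, rule)
```

Each distinct path from the root to a leaf corresponds to one rule.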
16Assessing rules
- For each rule you can measure how often the
submitted data records are properly classified.
- You can compute the error rate of the entire tree
as a weighted sum of the error rates for all of
the individual leaves.
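The weighted sum can be sketched as follows, assuming each leaf is weighted by the share of records that reach it (the slide does not state the weights, so this is an assumption). The leaf counts are invented for illustration.

```python
def tree_error_rate(leaves):
    """Weighted sum of per-leaf error rates.

    `leaves` is a list of (records_reaching_leaf, misclassified) pairs.
    Each leaf is weighted by its share of all records.
    """
    total = sum(records for records, _ in leaves)
    return sum(
        (records / total) * (errors / records)
        for records, errors in leaves
    )

# Hypothetical leaves: (records, misclassified records).
leaves = [(50, 5), (30, 6), (20, 1)]
overall = tree_error_rate(leaves)
print(f"{overall:.2f}")
```

With these weights the result equals total misclassified records divided by total records.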
17When to use decision trees
- Decision trees are useful for problems in which
the goal is to make broad categorical
classifications or predictions.
- They are not useful in applications requiring
specific predictions about the values of
quantitative variables.
18Association rules
- Association rules are derived from a type of
analysis that extracts information from
coincidence.
- Sometimes called market basket analysis.
- This methodology allows you to discover
correlations, or co-occurrences, of transactional
events.
19The cross-correlation matrix
- Association rules are derived from analyses based
on cross-correlation matrices in which the
likelihood of each event occurring in conjunction
with every other event is computed.
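A minimal sketch of the idea: count how often each pair of items occurs together, then express the count as the likelihood of one item given the other. The baskets are hypothetical, and the `confidence` helper is an illustrative name, not something from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical grocery transactions (illustrative only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]

def cooccurrence(transactions):
    """Count single items and unordered item pairs across transactions."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        item_counts.update(basket)
        for a, b in combinations(sorted(basket), 2):
            pair_counts[(a, b)] += 1
    return item_counts, pair_counts

def confidence(item_counts, pair_counts, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

items, pairs = cooccurrence(baskets)
butter_to_bread = confidence(items, pairs, "butter", "bread")
bread_to_milk = confidence(items, pairs, "bread", "milk")
print(butter_to_bread, bread_to_milk)
```

A high value suggests a candidate rule such as "butter implies bread", which still needs to be checked against how common the items are overall.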
20The cross-correlation matrix (Cont.)
Figure 6.5: Example of a cross-correlation matrix
used to infer association rules about the
purchase of grocery store items
21When to use association rules
- Association rule analysis will be most useful
when you are doing exploratory analyses, looking
for interesting relationships that might exist
within a data set.
22Neural networks
- A type of computational methodology commonly used
for pattern identification and classification
- Composed of nodes that are interconnected by
excitatory and inhibitory connections
- Patterns of activity are used to represent
information in the network in a distributed
fashion.
- Two modes: supervised and unsupervised learning
23Supervised Learning
- 1. Learning occurs in a supervised mode in which
the system is trained on a known set of targets so
that those targets are readily identified when
presented as inputs to the system.
- 2. On each trial during supervised learning, an
input is presented to the system.
- 3. The input activates certain nodes, and the
system provides an output response based on the
pattern of activation.
24Supervised Learning (Cont.)
- 4. If the output does not match the desired
response during learning, the system is provided
with feedback designed to modify the incorrect
response.
- 5. Once the system has learned the correct
responses to the set of training inputs, the
learning mode is finished.
- 6. When the learning mode is finished, the
system can be used to automate pattern detection
and alert the user when incoming patterns match a
previously learned output response.
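The supervised learning steps above can be sketched with a single-node perceptron, which is an assumption made for brevity; the slides do not name a specific network or learning rule. The training set (the logical AND of two inputs) is a hypothetical stand-in for a set of known targets.

```python
def predict(weights, bias, inputs):
    """Output response: 1 if the weighted activation is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, learning_rate=0.1, max_epochs=100):
    """Present each input, compare output to target, and apply feedback."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                # Feedback step: nudge weights toward the desired response.
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:
            break  # every training input is answered correctly
    return weights, bias

# Hypothetical training set: logical AND of two binary inputs.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_SAMPLES)
print([predict(weights, bias, x) for x, _ in AND_SAMPLES])
```

Once training stops, `predict` can be applied to incoming patterns without further feedback.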
25Unsupervised Learning
- Unsupervised networks do not require that the set
of permissible output responses and their mapping
to inputs be defined a priori.
- In unsupervised learning, the network forms its
own set of outputs during training based on
features extracted by the network.
- The most popular unsupervised learning method is
Kohonen's feature map.
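The following is a much-simplified, one-dimensional sketch in the spirit of a Kohonen feature map, not a full implementation. The three-node layout, the step-shaped neighborhood, the decay schedule, and the data values are all assumptions made for illustration.

```python
def best_node(weights, x):
    """Index of the node whose weight is closest to the input."""
    return min(range(len(weights)), key=lambda i: abs(x - weights[i]))

def train_som(data, initial_weights, epochs=20, learning_rate=0.5):
    """Train a 1-D map: the winner and its neighbors move toward each input."""
    weights = list(initial_weights)
    for epoch in range(epochs):
        # Neighborhood shrinks to the winner alone halfway through training.
        radius = 1 if epoch < epochs // 2 else 0
        rate = learning_rate * (1 - epoch / epochs)
        for x in data:
            winner = best_node(weights, x)
            for i in range(len(weights)):
                distance = abs(i - winner)
                if distance <= radius:
                    influence = 1.0 if distance == 0 else 0.5
                    weights[i] += rate * influence * (x - weights[i])
    return weights

# Hypothetical one-dimensional data with two obvious clusters.
DATA = [0.0, 0.1, 0.2, 9.8, 9.9, 10.0]

som_weights = train_som(DATA, [4.0, 5.0, 6.0])
print(som_weights)
print(best_node(som_weights, 0.1), best_node(som_weights, 9.9))
```

No target outputs are supplied; the nodes at the two ends of the map settle on the two clusters on their own.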
26Unsupervised Learning (Cont.)
Figure 6.6: Topographical map produced by an
unsupervised learning network
27When to use unsupervised neural networks
- The neural network approach to data mining is
most useful when you are searching for novel ways
of segmenting the data set.
- This method can be used to discover subgroups of
data that are defined in terms of some common
feature(s) that separate them from other portions
of the complete population.
28Genetic Algorithms
- The utility of genetic algorithms lies more
within the realm of optimization.
- Genetic algorithms start with a population of
items and seek to alter and eventually optimize
their composition for the solution of a
particular problem.
- The genetic material or information represented
by each individual can be passed on to subsequent
generations in a variety of ways, with
optimization occurring in the process.
- There are three basic mechanisms by which
information is chosen, altered, and passed on in
order to achieve optimization: selection,
crossover, and mutation.
29Selection
- The process of selection is analogous to the
process of natural selection that occurs in
evolution.
- Selection is based on the principle of survival
of the fittest, in which the individuals that are
best suited for the environment are the ones that
survive to pass their genetic material on to the
next generation.
30Crossover
- Crossover occurs when two individuals chosen
randomly from the population are joined, or
"mated," such that the resulting offspring contain
partial replications of the information contained
in each of the parents.
- The offspring then become full-fledged members of
the population, competing for survival along with
the rest.
31Mutation
- Mutations can occur naturally when there is an
error in the transmission of genetic information
from parent to child.
- Mutations can have either good or bad effects.
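Selection, crossover, and mutation can be combined into a small loop, sketched below. The fitness function (count of 1 bits), the population size, the mutation rate, and the choice to draw parents from the surviving half are all assumptions made for illustration.

```python
import random

def fitness(bits):
    """Hypothetical fitness: the number of 1 bits in the individual."""
    return sum(bits)

def evolve(pop_size=20, length=16, generations=40,
           mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in population)

    for _ in range(generations):
        # Selection: the fitter half survives unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]

        children = []
        while len(survivors) + len(children) < pop_size:
            # Crossover: the child copies part of each parent.
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)
            child = mother[:cut] + father[cut:]
            # Mutation: occasional copying errors flip a bit.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)

        population = survivors + children

    return initial_best, max(population, key=fitness)

initial_best, best = evolve()
print(initial_best, fitness(best), best)
```

Because survivors are carried over unchanged, the best fitness in the population never decreases from one generation to the next.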
32When to use Genetic Algorithms
- Genetic algorithms are most useful in cases in
which the goal is to find an optimal solution
given a definable problem space.
- Genetic algorithms are less useful in situations
in which you are combining data from several
disparate information sources and types.
- There needs to be a fair amount of uniformity in
terms of the data to be analyzed, since all data
must be coded into vectors of the same
dimensionality.
33Christina's Questions
- What are two types of neural networks?
- What are the basic mechanisms in genetic algorithms?
- When is the best time to use genetic algorithms?
34Terry's Question
- Distinguish between H1 and H0.
35Siew's Question
- When is a good time to use decision trees?