Nonvisual Analytical Methods

Slides: 36
Provided by: terry111

1
Nonvisual Analytical Methods
  • Presented by
  • Terry Bobo
  • Christina Laurentia
  • Siew Lim

2
Introduction
  • Although visualization is a very powerful
    technique, users sometimes turn to other
    analytical methods:
  • Statistical testing
  • Decision trees
  • Association rules
  • Neural networks
  • Genetic algorithms
  • Unlike visualization, these approaches are
    employed with the user largely out of the
    analysis loop.

3
Statistical Methods
  • The use of descriptive and inferential statistics
    is by far the most standard approach to data
    analysis
  • Preferred choice in science, medicine, and business
  • Requires the use of quantitative data

4
Statistical Methods (Cont.)
  • Descriptive statistics include measures such as:
  • Mean (average)
  • Median (middle value)
  • Mode (most common value)
  • Standard deviation (measure of spread)
  • Range (difference between high and low values)
  • Data distribution
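
As a rough illustration of the measures listed above, here is a minimal Python sketch using only the standard library. The sample values are made up for this example and are not taken from the presentation.

```python
import statistics

# Hypothetical sample values, made up for illustration only.
values = [4, 8, 6, 5, 3, 8, 9]

mean = statistics.mean(values)            # arithmetic average
median = statistics.median(values)        # middle value of the sorted data
mode = statistics.mode(values)            # most common value
std_dev = statistics.stdev(values)        # sample standard deviation (spread)
value_range = max(values) - min(values)   # highest minus lowest value

print(mean, median, mode, round(std_dev, 3), value_range)
```

A histogram or frequency table of `values` would cover the remaining item, the data distribution.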

5
Assessing Group Differences
  • Statistics can be used for hypothesis testing
  • Values are predicted before analysis begins
  • Ex. Fictitious data describing sales patterns
    during the months of November and December.

6
Assessing Group Differences (Cont.)
  • The experimental hypothesis, H1, is tested
    against the null hypothesis, H0.
  • H0 assumes there are no differences between the
    groups; H1 assumes the groups are different.

7
Assessing Group Differences (Cont.)
  • H1: December figures are in fact higher than
    the November figures.
  • H0: There is no real difference; any difference
    is due to chance.
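
One possible way to put a number on the November versus December comparison is a two-sample t statistic. The sketch below uses Welch's version, which is an assumption on our part since the slides do not name a specific test. The sales figures and the helper name `welch_t` are invented for illustration.

```python
import math
import statistics

# Made-up daily sales figures, for illustration only (not real data).
november = [20, 22, 19, 24, 25]
december = [28, 30, 27, 31, 29]

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic for mean(sample_b) - mean(sample_a)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_b - mean_a) / standard_error

t_stat = welch_t(november, december)
print(f"t = {t_stat:.2f}")
```

H0 would be rejected in favor of H1 only if `t_stat` exceeds the critical value for the chosen significance level and degrees of freedom, which has to be looked up separately (for example in a t table).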

8
Size of Statistical Effects
  • Important factors to consider before an outcome
    is considered reliable:
  • Probability level (significance level),
    e.g., 0.05 or 0.10
  • Number of observations included in the sample
  • Number of groups being compared
  • The more degrees of freedom, the smaller the
    test ratio (such as t or F) needed in order to
    reject H0.

9
Predictive Regression Analysis
  • Linear regression analysis
  • Used to make predictions about the numeric values
    of a variable within the problem space
    represented in the data set.
  • Yields a best-fit line through the data points
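
A best-fit line can be computed with ordinary least squares. The following is a minimal sketch, not necessarily the procedure the presenters had in mind. The function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data points, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
predicted = slope * 6 + intercept  # prediction for an unseen x value
print(slope, intercept, predicted)
```

The fitted `slope` and `intercept` are what allow a numeric prediction for an x value that does not appear in the data set.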

10
Predictive Regression Analysis (Cont.)

11
When to Use Statistical Analysis
  • Conditions to be met before statistically
    reliable effects can be observed:
  • The data must be in numeric form
  • The data must be divided into groups for analysis
  • The user must have some sort of hypothesis about
    what to expect to find in an analysis

12
Keeping the Big Picture in Mind
  • Determine what the overall goal of the data
    mining activity is.
  • Whether a statistical test is appropriate depends
    on the question being asked.
  • Statistics is not always the best approach.

13
Keeping the Big Picture in Mind (Cont.)
  • Ex. A few distributors were discovered to be
    engaged in fraud, but their claims constituted
    only a small portion of total claims.

14
Segregating the data
Figure 6.4: Decision tree describing car purchase
profiles of males and females

15
Using decision trees to build rules
  • The frame variable (2-door or 4-door) represents
    the root node.
  • Engine type (V4 or higher) is a child node of the
    root node, and so on.
  • When a record enters the tree, it moves down
    until it reaches the point beyond which it can no
    longer move.
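
The traversal described above can be sketched with a nested dictionary. The split variables follow the slide (frame, then engine type), but the engine values and leaf labels are invented placeholders, since the contents of Figure 6.4 are not reproduced in this transcript.

```python
# Hypothetical tree: the split variables follow the slide (frame, then engine),
# but the leaf labels are invented placeholders, not the contents of Figure 6.4.
tree = {
    "variable": "frame",
    "branches": {
        "2-door": {
            "variable": "engine",
            "branches": {"V4": "profile A", "V6": "profile B"},
        },
        "4-door": "profile C",
    },
}

def classify(record, node):
    """Move a record down the tree until it reaches a leaf (a plain string)."""
    while isinstance(node, dict):
        value = record[node["variable"]]
        node = node["branches"][value]
    return node

print(classify({"frame": "2-door", "engine": "V6"}, tree))
```

Each path from the root to a leaf reads as one rule, for example "if frame is 2-door and engine is V6, then profile B".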

16
Assessing rules
  • For each rule, you can measure how often the
    submitted data records are properly classified.
  • You can compute the error rate of the entire tree
    as a weighted sum of the error rates of all the
    individual leaves.
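
As a small worked example of the weighted sum, assume each leaf is weighted by the share of records that reach it. The leaf names and counts below are hypothetical.

```python
# Hypothetical leaf statistics: (records reaching the leaf, records misclassified).
leaves = {
    "profile A": (50, 5),
    "profile B": (30, 6),
    "profile C": (20, 1),
}

def tree_error_rate(leaf_stats):
    """Weight each leaf's error rate by the share of records that reach it."""
    total = sum(count for count, _ in leaf_stats.values())
    return sum((count / total) * (errors / count)
               for count, errors in leaf_stats.values())

print(tree_error_rate(leaves))
```

With this weighting, the result equals the total number of misclassified records divided by the total number of records.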

17
When to use decision trees
  • Decision trees are useful for problems in which
    the goal is to make broad categorical
    classifications or predictions.
  • They are less useful in applications requiring
    specific predictions about the values of
    quantitative variables.

18
Association rules
  • Association rules are derived from a type of
    analysis that extracts information from
    coincidence.
  • Sometimes called market basket analysis.
  • This methodology allows you to discover
    correlations, or co-occurrences of transactional
    events.

19
The cross-correlation matrix
  • Association rules are derived from analyses based
    on cross-correlation matrices in which the
    likelihood of each event occurring in conjunction
    with every other event is computed.
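
A simple stand-in for such a matrix is a table of pairwise co-occurrence counts, from which a conditional likelihood (often called confidence) can be estimated. The sketch below makes that assumption. The baskets and the helper name `confidence` are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Count each unordered pair of items that appears in the same basket.
    pair_counts.update(frozenset(pair) for pair in combinations(sorted(basket), 2))

def confidence(antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    together = pair_counts[frozenset((antecedent, consequent))]
    return together / item_counts[antecedent]

print(confidence("butter", "bread"))
```

A high value for a pair suggests a candidate rule such as "baskets with butter tend to contain bread", which still needs to be checked against how common each item is on its own.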

20
The cross-correlation matrix (Cont.)
Figure 6.5: Example of a cross-correlation matrix
used to infer association rules about the
purchase of grocery store items

21
When to use association rules
  • Association rule analysis will be most useful
    when you are doing exploratory analyses, looking
    for interesting relationships that might exist
    within a data set.

22
Neural networks
  • A type of computational methodology commonly used
    for pattern identification and classification
  • Composed of nodes that are interconnected by
    excitatory and inhibitory connections
  • Patterns of activity are used to represent
    information in the network in a distributed
    fashion.
  • Supervised and unsupervised learning

23
Supervised Learning
  • 1. Learning occurs in a supervised mode in which
    the system is trained on a known set of targets
    so that those targets are readily identified when
    presented as inputs to the system.
  • 2. On each trial during supervised learning, an
    input is presented to the system.
  • 3. The input activates certain nodes, and the
    system provides an output response based on the
    pattern of activation.

24
Supervised Learning (Cont.)
  • 4. If the output does not match the desired
    response during learning, the system is provided
    with feedback designed to modify the incorrect
    response.
  • 5. Once the system has learned the correct
    responses to the set of training inputs, the
    learning mode is finished.
  • 6. When the learning mode is finished, the
    system can be used to automate pattern detection
    and alert the user when incoming patterns match a
    previously learned output response.
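
The supervised learning steps can be sketched with a single-layer perceptron, which is only a simple stand-in here because the slides do not specify a network type or learning rule. The task (logical AND), the learning rate, and the epoch limit are assumptions made for this example.

```python
# Training set for logical AND: inputs paired with known target outputs.
training_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

def respond(inputs):
    """Output 1 if the weighted activation exceeds zero, otherwise 0."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0

for epoch in range(100):
    mistakes = 0
    for inputs, target in training_set:
        error = target - respond(inputs)
        if error != 0:
            mistakes += 1
            # Feedback step: nudge weights toward the desired response.
            for i, x in enumerate(inputs):
                weights[i] += learning_rate * error * x
            bias += learning_rate * error
    if mistakes == 0:
        break  # every training input now yields the correct response

print([respond(inputs) for inputs, _ in training_set])
```

Once the loop ends without mistakes, `respond` can be applied to incoming patterns without further weight changes, which corresponds to the end of the learning mode.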

25
Unsupervised Learning
  • Unsupervised networks do not require that the set
    of permissible output responses and their mapping
    to inputs be defined a priori.
  • In unsupervised learning, the network forms its
    own set of outputs during training based on
    features extracted by the network.
  • The most popular unsupervised learning method is
    Kohonen's feature map.
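
To give a feel for how a network can form its own outputs, here is a heavily simplified one-dimensional sketch in the spirit of Kohonen's feature map. It is not a full implementation. The input values, map size, learning-rate schedule, and neighbourhood radius are all arbitrary assumptions.

```python
# Hypothetical 1-D inputs forming two loose groups (values made up).
inputs = [0.10, 0.15, 0.20, 0.80, 0.85, 0.90]

# Four map nodes on a line, each holding one weight; evenly spaced start.
weights = [0.2, 0.4, 0.6, 0.8]

def best_matching_unit(x):
    """Index of the node whose weight is closest to the input."""
    return min(range(len(weights)), key=lambda i: abs(weights[i] - x))

epochs = 50
for epoch in range(epochs):
    rate = 0.5 * (1 - epoch / epochs)          # learning rate decays over time
    radius = 1 if epoch < epochs // 2 else 0   # neighbourhood shrinks to the winner
    for x in inputs:
        winner = best_matching_unit(x)
        for i in range(len(weights)):
            if abs(i - winner) <= radius:
                # Pull the winner (and its neighbours) toward the input.
                weights[i] += rate * (x - weights[i])

groups = [best_matching_unit(x) for x in inputs]
print(groups)
```

No target labels are supplied at any point. The node that ends up responding to each input serves as the output category that the network formed by itself.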

26
Unsupervised Learning (Cont.)
Figure 6.6: Topographical map produced by an
unsupervised learning network

27
When to use unsupervised neural networks
  • The neural network approach to data mining is
    most useful when you are searching for novel ways
    of segmenting the data set.
  • This method can be used to discover subgroups of
    data that are defined in terms of some common
    feature(s) that separate them from other portions
    of the complete population.

28
Genetic Algorithms
  • The utility of genetic algorithms lies more
    within the realm of optimization.
  • Genetic algorithms start with a population of
    items and seek to alter and eventually optimize
    their composition for the solution of a
    particular problem.
  • The genetic material or information represented
    by each individual can be passed on to subsequent
    generations in a variety of ways with
    optimization occurring in the process.
  • There are three basic mechanisms by which
    information is chosen, altered, and passed on in
    order to achieve optimization: selection,
    crossover, and mutation.

29
Selection
  • The process of selection is analogous to the
    process of natural selection that occurs in
    evolution
  • Selection is based on the principle of survival
    of the fittest, in which the individuals that are
    best suited to the environment are the ones that
    survive to pass their genetic material on to the
    next generation.

30
Crossover
  • Crossover occurs when two individuals chosen
    randomly from the population are joined or
    mated such that the resulting offspring contain
    partial replications of the information contained
    in each of the parents.
  • The offspring then become full-fledged members of
    the population, competing for survival along with
    the rest.

31
Mutation
  • Mutations can occur naturally when there is an
    error in the transmission of genetic information
    from parent to child.
  • Mutations can have either good or bad effects.
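
The three mechanisms (selection, crossover, and mutation) can be combined in a toy genetic algorithm. The sketch below is illustrative only: the objective (maximize the number of 1 bits), the tournament-style selection, the elitism step, and all parameter values are assumptions that do not come from the slides.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable
GENES = 12      # every individual is a bit vector of the same length

def fitness(individual):
    """Toy objective (an assumption for this sketch): count of 1 bits."""
    return sum(individual)

def select(population):
    """Selection: the fitter of two randomly drawn individuals survives."""
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(parent_a, parent_b):
    """Crossover: the child copies part of each parent."""
    point = random.randint(1, GENES - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(individual, rate=0.05):
    """Mutation: each bit may flip with a small probability."""
    return [1 - bit if random.random() < rate else bit for bit in individual]

population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(20)]
initial_best = max(map(fitness, population))

for generation in range(30):
    elite = max(population, key=fitness)  # keep the best individual unchanged
    children = [mutate(crossover(select(population), select(population)))
                for _ in range(len(population) - 1)]
    population = [elite] + children

final_best = max(map(fitness, population))
print(initial_best, final_best)
```

Because the best individual is carried over unchanged each generation, a harmful mutation cannot lower the best fitness found so far, while a helpful one can raise it.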

32
When to use Genetic Algorithms
  • Genetic algorithms are most useful in cases in
    which the goal is to find an optimal solution
    given a definable problem space.
  • Genetic algorithms are most useful in situations
    in which you are combining data from several
    disparate information sources and types.
  • There needs to be a fair amount of uniformity in
    terms of data to be analyzed since all data must
    be coded into vectors of the same dimensionality.

33
Christina's Questions
  • What are the two types of neural network learning?
  • What are the basic mechanisms in genetic
    algorithms?
  • When is the best time to use genetic algorithms?

34
Terry's Question
  • Distinguish between H1 and H0.

35
Siew's Question
  • When is a good time to use decision trees?