1
Parameterizing Random Test Data According to
Equivalence Classes
  • Chris Murphy, Gail Kaiser, Marta Arias
  • Columbia University

2
What is random testing?
  • This is background, not part of the talk itself
  • Random testing is the notion of using random
    input to test the application
  • As opposed to using pre-determined and manually
    selected equivalence classes or partitions

3
Introduction
  • We are investigating the quality assurance of
    Machine Learning (ML) applications
  • Currently we are concerned with a real-world
    application for potential future use in
    predicting electrical device failures
  • Using ranking instead of classification
  • Our concern is not whether an algorithm predicts
    well but whether an implementation operates
    correctly

4
Data Set Options
  • Real-world data sets
    • Not always accessible/available
    • May not contain the separation or combination of
      traits that we want to test
  • Hand-generation of data
    • Only useful for small tests
  • Random testing
    • Limited by the lack of a reliable test oracle
    • ML applications of interest fall into the
      category of non-testable programs

5
Motivation
  • Without a reliable test oracle, we can only:
    • Look for obvious faults
    • Consider intermediate results
    • Detect discrepancies in the specification
  • We need to restrict some properties of random
    test data generation

6
Our Solution
  • Parameterized Random Test Data Generation
  • Automatically generate random data sets, but
    parameterized to control the range and
    characteristics of those random values
  • Parameterization allows us to create a hybrid
    between equivalence class partitioning and random
    testing

7
Overview
  • Machine Learning Background
  • Data Generation Framework
  • Findings and Results
  • Evaluation and Observations
  • Conclusions and Future Work

8
Machine Learning Fundamentals
  • Data sets consist of a number of examples, each
    of which has attributes and a label
  • In the first phase (training), a model is
    generated that attempts to generalize how
    attributes relate to the label
  • In the second phase (validation), the model is
    applied to a previously-unseen data set with
    unknown labels to produce a classification (or,
    in our case, a ranking)

9
Problems Faced in Testing
  • The testing input should be based on the problem
    domain
  • Need to consider a way to mimic all of the traits
    of the real-world data sets
  • Also need to keep in mind that we do not have a
    reliable test oracle

10
Analyzing the Problem Domain
  • Consider properties of data sets in general
    • Data set size: number of attributes and examples
    • Range of values: attributes and labels
    • Precision of floating-point numbers
    • Whether values can repeat
  • Consider properties of real-world data sets in
    the domain of interest
    • How alphanumeric attributes are to be interpreted
    • Whether data values might be missing

11
Equivalence Classes
  • Data sizes of different orders of magnitude
  • Repeating vs. non-repeating attribute values
  • Missing vs. no missing attribute values
  • Categorical vs. non-categorical data
  • 0/1 labels vs. non-negative integer labels
  • Predictable vs. non-predictable data sets
  • Used data set generator to parameterize test case
    selection criteria
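
A sketch of how the equivalence classes above can be combined into
test-case parameters (Python; all names and values are illustrative,
not the actual tool's interface):

  from itertools import product

  # One option per equivalence class; each tuple parameterizes one
  # run of the data set generator.
  sizes = [100, 1000, 10000]        # orders of magnitude
  repeats = [True, False]           # repeating attribute values
  missing = [True, False]           # missing attribute values
  categorical = [True, False]
  labels = ["0/1", "non-negative"]  # label type
  predictable = [True, False]

  test_cases = list(product(sizes, repeats, missing,
                            categorical, labels, predictable))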

12
How Data Are Generated
  • M attributes and N examples
  • No-repeat mode
    • Generate a list of the integers from 1 to M×N
      and then randomly permute them
  • Repeat mode
    • Each value in the data set is simply a random
      integer between 1 and M×N
    • Tool ensures at least one set of repeating values
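
A minimal Python sketch of the two generation modes described above
(illustrative only, not the authors' actual tool):

  import random

  def generate_values(m, n, repeat_mode):
      """Generate n examples of m attribute values each
      (labels are added separately)."""
      if not repeat_mode:
          # No-repeat mode: the integers 1..M*N, randomly permuted.
          values = list(range(1, m * n + 1))
          random.shuffle(values)
      else:
          # Repeat mode: independent random integers in 1..M*N.
          values = [random.randint(1, m * n) for _ in range(m * n)]
          # One simple way to guarantee at least one repeated value.
          values[0] = values[-1]
      # Split the flat list into n rows of m attributes.
      return [values[i * m:(i + 1) * m] for i in range(n)]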

13
Generating Labels
  • Specify percentage of positive examples to
    include in the data set
    • positive examples have a label of 1
    • negative examples have a label of 0
  • Data generation framework guarantees that the
    number of positive examples comes out exactly
    right, even though the positive labels are
    randomly placed throughout the data set
  • Labels are never unknown/missing
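
One way to guarantee an exact positive count with random placement
(a sketch under the same assumptions as the generator above):

  import random

  def generate_labels(n, pct_positive):
      """Return n 0/1 labels with exactly round(n * pct_positive / 100)
      ones, randomly placed. Labels are never missing."""
      num_positive = round(n * pct_positive / 100)
      labels = [1] * num_positive + [0] * (n - num_positive)
      random.shuffle(labels)
      return labels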

14
Categorical Data
  • For some alphanumeric attributes, data
    pre-processing is used to expand K distinct
    values into K attributes
    • Same as in the real-world ranking application
  • Input parameter to the data generation tool is of
    the format (a1, a2, ..., aK-1, aK, m)
    • a1 through aK represent the percentage
      distribution of those values for the categorical
      attribute
    • m is the percentage of unknown values
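
A sketch of how such a parameter could drive generation and of the
K-way expansion (the tuple format follows the slide; everything
else, including how missing values are encoded after expansion, is
an assumption):

  import random

  def generate_categorical(n, dist, missing_pct, values):
      """Draw n values according to the percentage distribution
      (a1, ..., aK), then replace ~missing_pct percent with '?'."""
      column = random.choices(values, weights=list(dist), k=n)
      return [v if random.uniform(0, 100) >= missing_pct else "?"
              for v in column]

  def expand(column, values):
      """Pre-processing: expand K distinct values into K 0/1
      attributes. '?' expands to all zeros here -- an assumption."""
      return [[1 if v == k else 0 for k in values] for v in column]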

15
Data Set Generator - Parameters
  • # of examples
  • # of attributes
  • % positive examples (label 1)
  • % missing
  • any categorical data
  • repeat/no-repeat modes

16
Sample Data Sets
  • 10 examples, 10 attributes, 40% positive
    examples, 20% missing, repeats allowed

(One example per line; the last value in each row is the 0/1 label
and ? marks a missing value.)

27,81,88,59, ?,16,88, ?,41, ?,0
15,70,91,41, ?, 3, ?, ?, ?,64,0
82, ?,51,47, ?, 4, 1,99, ?,51,0
22,72,11, ?,96,24,44,92, ?,11,1
57,77, ?,86,89,77,61,76,96,98,1
76,11, 4,51,43, ?,79,21,28, ?,0
 6,33, ?, ?,52,63,94,75, 8,26,0
77,36,91, ?,47, 3,85,71,35,45,1
 ?,17,15, 2,90,70, ?, 7,41,42,0
 8,58,42,41,74,87,68,68, 1,15,1

35, 3,20,41,91, ?,32,11,43, ?,1
19,50,11,57,36,94, ?,96, 7,23,1
24,36,36,79,78,33,34, ?,32, ?,0
 ?,15, ?,19,65,80,17,78,43, ?,0
40,31,89,50,83,55,25, ?, ?,45,1
52, ?, ?, ?, ?,39,79,82,94, ?,0
86,45, ?, ?,74,68,13,66,42,56,0
 ?,53,91,23,11, ?,47,61,79, 8,0
77,11,34,44,92, ?,63,62,51,51,1
21, 1,70,14,16,40,63,94,69,83,0
17
The Testing Framework
  • Data set generator
  • Model comparison
  • Ranking comparison includes metrics like
    normalized equivalence and AUCs (see the sketch
    after this list)
  • Tracing options for generating and comparing
    outputs of debugging statements
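
A sketch of the ranking-comparison step. Normalized equivalence is
the paper's own metric and is not reproduced here; AUC and Kendall's
tau (via scikit-learn and SciPy) are shown as stand-ins:

  from scipy.stats import kendalltau
  from sklearn.metrics import roc_auc_score

  def compare_rankings(labels, scores_a, scores_b):
      """Compare two implementations' rankings of the same
      validation set: per-implementation AUC, plus rank
      correlation between the two score lists."""
      auc_a = roc_auc_score(labels, scores_a)
      auc_b = roc_auc_score(labels, scores_b)
      tau, _ = kendalltau(scores_a, scores_b)
      return auc_a, auc_b, tau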

18
MartiRank and SVM
  • MartiRank was specifically designed for the
    real-world device failure application
    • Seeks to find the sequence of attributes to
      segment and sort the data to produce the best
      result
  • SVM is typically a classification algorithm
    • Seeks to find a hyperplane that separates
      examples from different classes
    • SVM-Light has a ranking mode based on the
      distance from the hyperplane
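
SVM-Light itself is a standalone C tool, but ranking by signed
distance from the hyperplane can be illustrated with scikit-learn
(a stand-in, not the framework under test):

  import numpy as np
  from sklearn.svm import SVC

  def rank_by_hyperplane_distance(X_train, y_train, X_test):
      """Train a linear SVM, then rank test examples by signed
      distance from the separating hyperplane, best first."""
      clf = SVC(kernel="linear").fit(X_train, y_train)
      distances = clf.decision_function(X_test)
      return np.argsort(-distances)  # indices of X_test, best first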

19
Findings
  • Testing approach and framework were developed for
    MartiRank then applied to SVM
  • Only the findings most related to parameterized
    random testing are presented here
  • More details and case studies about the testing
    of MartiRank can be found in our tech report

20
Issue 1: Repeating Values
  • One version of MartiRank did not use stable
    sorting

Both example rows below hold the same value (3) in the attribute
being sorted:

  ... 91,41,19, 3,57,11,20,64,0
      36,73,47, 3,85,71,35,45,1 ...

Stable sort: the two rows keep their original relative order.
Unstable sort: the two rows may be swapped, so the same input can
produce a different model.
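
A small Python illustration of the failure mode. Python's sorted()
is stable; the unstable case is simulated by randomizing ties, since
a genuinely unstable sort merely may reorder them:

  import random

  rows = [
      [91, 41, 19, 3, 57, 11, 20, 64, 0],
      [36, 73, 47, 3, 85, 71, 35, 45, 1],
  ]
  KEY = 3  # both rows hold the value 3 in this attribute

  # Stable: equal keys keep their original relative order.
  stable = sorted(rows, key=lambda r: r[KEY])

  # Simulated unstable: ties may come out in either order, so
  # repeated runs can feed the learner differently ordered data.
  unstable = sorted(rows, key=lambda r: (r[KEY], random.random()))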
21
Issue 2: Sparse Data Sets
  • Not specifically addressed in the specification

Original data set (the sort key is the fifth attribute; ? marks a
missing value):

41,91, ?,32,11,43, ?,1
57,36,94, ?,96, 7,23,1
79,78,33,34, ?,31, ?,0
19,65,80,17,78,46, ?,0
50,83,55,25, ?, ?,45,1
 ?, ?,39,79,82,94, ?,0

Sort around missing values (rows with a missing key keep their
original positions):

41,91, ?,32,11,43, ?,1
19,65,80,17,78,46, ?,0
79,78,33,34, ?,31, ?,0
 ?, ?,39,79,82,94, ?,0
50,83,55,25, ?, ?,45,1
57,36,94, ?,96, 7,23,1

Put missing values at end:

41,91, ?,32,11,43, ?,1
19,65,80,17,78,46, ?,0
 ?, ?,39,79,82,94, ?,0
57,36,94, ?,96, 7,23,1
79,78,33,34, ?,31, ?,0
50,83,55,25, ?, ?,45,1

Randomly insert missing values:

41,91, ?,32,11,43, ?,1
50,83,55,25, ?, ?,45,1
19,65,80,17,78,46, ?,0
79,78,33,34, ?,31, ?,0
 ?, ?,39,79,82,94, ?,0
57,36,94, ?,96, 7,23,1
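
A sketch of the three strategies (illustrative; not the code of the
implementations under test):

  import random

  def sort_examples(rows, key_idx, strategy):
      """Sort rows ascending on rows[key_idx]; '?' means missing."""
      present = sorted((r for r in rows if r[key_idx] != "?"),
                       key=lambda r: r[key_idx])
      missing = [r for r in rows if r[key_idx] == "?"]
      if strategy == "around":
          # Rows with a missing key keep their original positions.
          result, it = list(rows), iter(present)
          for i, r in enumerate(rows):
              if r[key_idx] != "?":
                  result[i] = next(it)
          return result
      if strategy == "end":
          return present + missing
      if strategy == "random":
          result = list(present)
          for r in missing:
              result.insert(random.randint(0, len(result)), r)
          return result
      raise ValueError(strategy)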
22
Issue 3: Categorical Data
  • Discovered that refactoring had introduced a bug
    into an important calculation
  • A global variable was being used incorrectly
  • This bug did not appear in any of the tests that
    used only repeating values or only missing values
  • However, categorical data necessarily has
    repeating values and may have missing ones

23
Issue 4: Permuted Input Data
  • Randomly permuting the input data led to
    different models (and thus different rankings)
    generated by SVM-Light
  • Caused by chunking the data for use by an
    approximating variant of the optimization
    algorithm
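
A sketch of the permutation check that exposed this issue;
train_and_rank is a hypothetical stand-in for invoking the
implementation under test (e.g. SVM-Light) and returning its
ranking:

  import random

  def permutation_check(examples, train_and_rank):
      """Train on the original data and on a random permutation of
      it; for a deterministic learner the rankings should agree."""
      permuted = list(examples)
      random.shuffle(permuted)
      return train_and_rank(examples) == train_and_rank(permuted)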

24
Observations
  • Parameterized random testing allowed us to
    isolate the traits of the data sets
  • These traits may appear in real-world data, but
    not necessarily in the desired combinations
  • An algorithm's failure to address specific data
    set traits can lead to discrepancies

25
Related Work: Machine Learning
  • There has been much research into applying
    Machine Learning techniques to software testing,
    but not the other way around
  • Reusable real-world data sets and Machine
    Learning frameworks are available for checking
    how well a Machine Learning algorithm predicts,
    but not for testing its correctness

26
Related Work: Random Testing
  • Parameterization generally refers to specifying
    data type or range of values
  • Our work differs from that of Thévenod-Fosse et
    al. '91 on structural statistical testing,
    which focuses on path selection and coverage
    testing, not system testing
  • Also differs from uniform statistical testing
    because although we do select random data over a
    uniform distribution, we parameterize it
    according to equivalence classes

27
Limitations and Future Work
  • Test suite adequacy (coverage) was not addressed
    or measured
  • Could also consider non-deterministic Machine
    Learning algorithms
  • Can also include mutation testing for
    effectiveness of data sets
  • Should investigate creating large data sets that
    correlate to real-world data

28
Conclusion
  • Our contribution is an approach that combines
    parameterization and randomness to control the
    properties of very large data sets
  • Critical for limiting the scope of individual
    tests and for pinpointing specific issues related
    to the traits of the input data

29
Parameterizing Random Test Data According to
Equivalence Classes
  • Chris Murphy, Gail Kaiser, Marta Arias
  • Columbia University