1
  • This chapter uses MS Excel and Weka

2
Statistical Techniques
  • Chapter 10

3
10.1 Linear Regression Analysis
4
10.1 Linear Regression Analysis
  • A supervised technique that generalizes a set of
    numeric data by creating a mathematical equation
    relating one or more input variables to a single
    output variable.
  • With linear regression we attempt to model the
    variation in a dependent variable as a linear
    combination of one or more independent variables.
  • Linear regression is appropriate when the
    relationship between the dependent and independent
    variables is nearly linear.
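A minimal sketch of least-squares fitting in Python (the chapter itself works in Excel); the data values are invented for illustration:

```python
import numpy as np

# Hypothetical data: one input variable x, one output variable y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit y = a*x + b under the least squares criterion.
a, b = np.polyfit(x, y, deg=1)
print(f"y = {a:.3f}x + {b:.3f}")

# The fitted line minimizes the sum of squared residuals.
residuals = y - (a * x + b)
print("sum of squared errors:", np.sum(residuals ** 2))
```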

5
Simple Linear Regression (slope-intercept form)
6
Simple Linear Regression (least squares criterion)
7
Multiple Linear Regression with Excel
8
Try to estimate the value of a building
9
A Regression Equation for the District Office
Building Data
10
(No Transcript)
11
10.1 Linear Regression Analysis
  • How accurate are the results?
  • Use a scatterplot diagram together with the line
    given by the regression formula.
  • Which independent variables are linearly related
    to the dependent variable? Use the statistics:
  • Coefficient of determination = 1 means there is no
    difference between the actual values (in the
    table) and the computed values for the dependent
    variable. (It represents the correlation between
    actual and computed values.)
  • Standard error for the estimate of the dependent
    variable.

12
The F Statistic for the Regression Analysis
  • Used to establish whether the coefficient of
    determination is significant.
  • Look up the F critical value (4.59) from the
    one-tailed F tables in statistics books, using
    v1 = the number of independent variables (4) and
    v2 = the number of instances minus the number of
    variables (11 - 5 = 6).
  • The regression equation is able to correctly
    determine the assessed values of office buildings
    that are part of the training data.

13
(No Transcript)
14
Regression Trees
15
(No Transcript)
16
Regression Tree
  • Essentially a decision tree whose leaf nodes hold
    numeric values.
  • The value at an individual leaf node is the
    numeric average of the output attribute for all
    instances passing through the tree to that leaf
    node position.
  • Regression trees are more accurate than linear
    regression when the data is nonlinear,
  • but they are more difficult to interpret.
  • Sometimes regression trees are combined with
    linear regression to form model trees.

17
Model Trees
  • Regression tree + linear regression.
  • Each leaf node represents a linear regression
    equation instead of an average value.
  • Model trees simplify regression trees by reducing
    the number of nodes in the tree.
  • A more complex tree means a less linear
    relationship between the dependent and independent
    variables.

18
(No Transcript)
19
10.2 Logistic Regression
20
Logistic Regression
  • Using linear regression to model problems whose
    observed outcome is restricted to two values (e.g.
    yes/no) is seriously flawed: the value restriction
    placed on the output variable is not observed in
    the regression equation, since linear regression
    produces a straight line unbounded on both ends.
  • Therefore the linear equation must be transformed
    to restrict the output to [0, 1]. The regression
    equation can then be thought of as producing the
    probability of occurrence or nonoccurrence of a
    measured event.
  • Logistic regression applies a logarithmic
    transform.

21
Transforming the Linear Regression Model
  • Logistic regression is a nonlinear regression
    technique that associates a conditional
    probability with each data instance.
  • 1 denotes observation of one class (yes).
  • 0 denotes observation of the other class (no).
  • Thus there is a conditional probability of seeing
    the class associated with y = 1 (yes), written
    P(y = 1 | x), given the values in the feature
    vector x.

22
The Logistic Regression Model
Determine the coefficients of x (a and c in ax + c)
using an iterative method that tries to minimize the
sum of the logarithms of the predicted probabilities.
Convergence occurs when the logarithmic summation is
close to 0 or when it doesn't change from iteration
to iteration.
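A minimal sketch of this iterative fitting in Python, using gradient ascent on the log likelihood; the data, learning rate, and iteration count are invented:

```python
import numpy as np

# Hypothetical one-input data with 0/1 observed outcomes.
x = np.array([20., 25., 30., 35., 40., 45., 50., 55.])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

xs = (x - x.mean()) / x.std()          # standardize for stable updates

def p(a, c, v):
    """Logistic model: P(y = 1 | v) = 1 / (1 + e^-(av + c))."""
    return 1.0 / (1.0 + np.exp(-(a * v + c)))

a, c, rate = 0.0, 0.0, 0.1
for _ in range(2000):                  # iterate toward convergence
    err = y - p(a, c, xs)              # gradient of the log likelihood
    a += rate * np.dot(err, xs) / len(xs)
    c += rate * err.mean()

print(f"P(y=1|x') = 1 / (1 + e^-({a:.2f}x' + {c:.2f}))  (x' standardized)")
```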
23
(No Transcript)
24
Logistic Regression: An Example
The credit card example: the CreditCardPromotionNet
file. LifeInsPromo is the output attribute;
CreditCardIns and Sex are the most influential
attributes.
25
(No Transcript)
26
Logistic Regression
  • Classify a new instance using logistic regression:
  • income = 35K
  • credit card insurance = 1
  • sex = 0
  • age = 39
  • P(y = 1 | x) = 0.999
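Classification amounts to plugging the instance into the fitted equation. In the sketch below the coefficients are hypothetical stand-ins (the real values come from fitting the CreditCardPromotionNet data), chosen only so the result lands near the reported 0.999:

```python
import math

# Hypothetical coefficients; the actual fitted weights would come
# from the CreditCardPromotionNet training data.
w = {"income": 0.0001, "credit_card_ins": 3.0, "sex": -1.5, "age": 0.05}
bias = -2.0

instance = {"income": 35000, "credit_card_ins": 1, "sex": 0, "age": 39}
z = bias + sum(w[k] * instance[k] for k in w)
print(f"P(y = 1 | x) = {1.0 / (1.0 + math.exp(-z)):.3f}")
```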

27
10.3 Bayes Classifier
  • A supervised classification technique with a
    categorical output attribute.
  • All input variables are assumed to be independent
    and of equal importance.
  • P(H|E): the likelihood of hypothesis H (a value of
    the dependent variable representing a predicted
    class) given evidence E.
  • P(E|H): the conditional probability of seeing
    evidence E when H is true (computed from the
    training data).
  • P(H): the a priori probability, denoting the
    probability of H before the presentation of
    evidence E (computed from the training data).

28
Bayes Classifier: An Example
Credit card promotion data set; Sex is the output
attribute.
29
(No Transcript)
30
The Instance to be Classified
  • Magazine Promotion = Yes
  • Watch Promotion = Yes
  • Life Insurance Promotion = No
  • Credit Card Insurance = No
  • Sex = ?
  • Two hypotheses: sex = female and sex = male

31
(No Transcript)
32
Computing the Probability for Sex = Male
33
Conditional Probabilities for Sex = Male
  • P(magazine promotion = yes | sex = male) = 4/6
  • P(watch promotion = yes | sex = male) = 2/6
  • P(life insurance promotion = no | sex = male) = 4/6
  • P(credit card insurance = no | sex = male) = 4/6
  • P(E | sex = male) = (4/6)(2/6)(4/6)(4/6) = 8/81
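These factors can be multiplied out directly. The sketch below also assumes a prior P(sex = male) = 6/10, which is consistent with the 0.0593 value on the next slide:

```python
from fractions import Fraction as F

# Conditional probabilities read from the training data (6 males).
factors = [F(4, 6),   # P(magazine promotion = yes | sex = male)
           F(2, 6),   # P(watch promotion = yes | sex = male)
           F(4, 6),   # P(life insurance promotion = no | sex = male)
           F(4, 6)]   # P(credit card insurance = no | sex = male)

p_e_given_male = F(1)
for f in factors:
    p_e_given_male *= f
print(p_e_given_male)                    # 8/81

# Numerator of Bayes' rule, assuming prior P(sex = male) = 6/10.
print(float(p_e_given_male * F(6, 10)))  # ~0.0593
```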

34
The Probability for Sex = Male Given Evidence E
  • P(sex = male | E) ≈ 0.0593 / P(E)

35
The Probability for Sex = Female Given Evidence E
  • P(sex = female | E) ≈ 0.0281 / P(E)
  • P(sex = male | E) > P(sex = female | E)
  • The instance is most likely a male credit card
    customer.

36
Zero-Valued Attribute Counts
A problem with Bayes arises when one of the counts
is 0. To solve this, a small constant k is added to
each ratio: n/d becomes (n + k)/(d + 2k), where k is
0.5 for an attribute with 2 possible values.
Example:
P(E | sex = female) = (3/4)(2/4)(1/4)(3/4) = 9/128
P(E | sex = female) = (3.5/5)(2.5/5)(1.5/5)(3.5/5) ≈ 0.0735
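A sketch of the fix as read above, with k = 0.5 for a two-valued attribute:

```python
def smoothed(n, d, k=0.5, values=2):
    """Replace the ratio n/d with (n + k)/(d + k*values)."""
    return (n + k) / (d + k * values)

counts = [(3, 4), (2, 4), (1, 4), (3, 4)]
raw, smooth = 1.0, 1.0
for n, d in counts:
    raw *= n / d
    smooth *= smoothed(n, d)

print(raw)     # 9/128, ~0.0703
print(smooth)  # ~0.0735; a zero count would zero out raw but not smooth
```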
37
(No Transcript)
38
Missing Data
  • With the Bayes classifier, missing data items are
    ignored.

39
Missing Data
  • Example

40
Numeric Data
41
Numeric Data
The probability density function (attribute values
are assumed to be normally distributed):

f(x) = e^(-(x - m)² / (2s²)) / (s √(2π))

  • where
  • e = the exponential function
  • m = the class mean for the given numeric attribute
  • s = the class standard deviation for the attribute
  • x = the attribute value

42
Numeric Data
  • Magazine Promotion = Yes
  • Watch Promotion = Yes
  • Life Insurance Promotion = No
  • Credit Card Insurance = No
  • Age = 45
  • Sex = ?
  • P(E | sex = male) now includes the factor
    P(age = 45 | sex = male)
  • s = 7.69, m = 37, x = 45
  • P(age = 45 | sex = male) ≈ 0.03
  • P(sex = male | E) = 0.0018 / P(E)
  • P(sex = female | E) = 0.0016 / P(E)
  • The instance most likely belongs to the male
    class.
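The 0.03 figure can be checked against the density function from the previous slide (m = 37, s = 7.69, x = 45); a quick check in Python:

```python
import math

def normal_pdf(x, m, s):
    """Normal density: f(x) = e^(-(x-m)^2 / (2s^2)) / (sqrt(2*pi)*s)."""
    return math.exp(-(x - m) ** 2 / (2 * s ** 2)) / (math.sqrt(2 * math.pi) * s)

print(round(normal_pdf(45, m=37, s=7.69), 3))   # ~0.03
```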

43
10.4 Clustering Algorithms
44
Agglomerative Clustering
  1. Place each instance into a separate partition.
  2. Until all instances are part of a single cluster:
     a. Determine the two most similar clusters.
     b. Merge the clusters chosen into a single
        cluster.
  3. Choose a clustering formed by one of the step 2
     iterations as the final result.
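A minimal sketch of this loop in Python over invented one-dimensional data, using closest-pair (single-link) distance as the similarity measure:

```python
data = [1.0, 1.5, 5.0, 5.5, 9.0]
clusters = [[v] for v in data]        # step 1: one instance per cluster

def distance(c1, c2):
    """Single-link distance: the closest pair of members."""
    return min(abs(a - b) for a in c1 for b in c2)

while len(clusters) > 1:              # step 2: merge the two most similar
    i, j = min(((i, j) for i in range(len(clusters))
                for j in range(i + 1, len(clusters))),
               key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
    clusters[i].extend(clusters.pop(j))
    print(clusters)                   # each pass is a candidate clustering (step 3)
```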

45
Agglomerative Clustering: An Example
46
(No Transcript)
47
(No Transcript)
48
Agglomerative Clustering
  • The final step of the algorithm is to choose the
    final clustering from among all candidates
    (requires heuristics).
  • Use the similarity measure used for creating the
    clusters: compare the average within-cluster
    similarity with the overall similarity of all
    instances in the dataset (domain similarity).
  • This technique is best used to eliminate
    clusterings rather than to choose a final result.

49
Agglomerative Clustering
  • The final step of the algorithm is to choose the
    final clustering from among all candidates
    (requires heuristics).
  • Use the within-cluster similarity measure together
    with the within-cluster similarities of
    pairwise-combined clusters in the cluster set,
    and look for the highest similarity.
  • This technique is best used to eliminate
    clusterings rather than to choose a final result.

50
Agglomerative Clustering
  • The final step of the algorithm is to choose the
    final clustering from among all candidates
    (requires heuristics).
  • Use the previous two techniques to eliminate some
    of the clusterings.
  • Feed each remaining clustering to a rule
    generator.
  • The clustering with the best defining rules is
    chosen.
  • A fourth technique: the Bayesian Information
    Criterion.

51
Conceptual Clustering
  1. Create a cluster with the first instance as its
     only member.
  2. For each remaining instance, take one of two
     actions at each tree level:
     a. Place the new instance into an existing
        cluster.
     b. Create a new concept cluster having the new
        instance as its only member.

52
Data for Conceptual Clustering
53
(No Transcript)
54
Expectation Maximization
  1. Guess initial values for the five parameters.
  2. Until a termination criterion is achieved:
     a. Use the probability density function for
        normal distributions to compute the cluster
        probability for each instance.
     b. Use the probability scores assigned to each
        instance in step 2(a) to re-estimate the
        parameters.
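A sketch of this procedure for a mixture of two one-dimensional Gaussians, where the five parameters are the two means, the two standard deviations, and the mixing probability; the data and initial guesses are invented:

```python
import math

data = [1.2, 0.8, 1.1, 4.9, 5.2, 5.1, 0.9, 5.0]
m1, m2, s1, s2, w = 0.0, 6.0, 1.0, 1.0, 0.5      # step 1: initial guesses

def pdf(x, m, s):
    return math.exp(-(x - m) ** 2 / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)

for _ in range(50):                               # step 2: iterate
    # (a) cluster probability for each instance
    r = [w * pdf(x, m1, s1) / (w * pdf(x, m1, s1) + (1 - w) * pdf(x, m2, s2))
         for x in data]
    # (b) re-estimate the parameters from the probability scores
    n1 = sum(r)
    m1 = sum(ri * x for ri, x in zip(r, data)) / n1
    m2 = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - n1)
    s1 = math.sqrt(sum(ri * (x - m1) ** 2 for ri, x in zip(r, data)) / n1)
    s2 = math.sqrt(sum((1 - ri) * (x - m2) ** 2 for ri, x in zip(r, data))
                   / (len(data) - n1))
    w = n1 / len(data)

print(round(m1, 2), round(m2, 2), round(w, 2))    # ~1.0, ~5.05, 0.5
```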

55
The EM Algorithm: An Example
56
(No Transcript)
57
10.5 Heuristics or Statistics?
58
Query and Visualization Techniques
  • Query tools
  • OLAP tools
  • Visualization tools

59
Machine Learning and Statistical Techniques
  1. Statistical techniques typically assume an
    underlying distribution for the data whereas
    machine learning techniques do not.
  2. Machine learning techniques tend to have a human
    flavor.
  3. Machine learning techniques are better able to
    deal with missing and noisy data.
  4. Most machine learning techniques are able to
    explain their behavior.
  5. Statistical techniques tend to perform poorly
    with large-sized data.

60
Specialized Techniques
  • Chapter 11

61
11.1 Time-Series Analysis
  • Time-series problems: prediction applications
    with one or more time-dependent attributes.

62
An Example with Linear Regression
  • The Stock Index Dataset

63
(No Transcript)
64
Linear Regression Equations for the Stock Index
Dataset
65
(No Transcript)
66
A Neural Network Example
67
(No Transcript)
68
Categorical Attribute Prediction
69
(No Transcript)
70
General Considerations
  • Test and modify created models as new data
    becomes available.
  • Try one or more data transformations if less
    than optimal results are obtained.
  • Exercise caution when predicting future outcome
    with training data having several predicted
    fields.
  • Try a nonlinear model if a linear model offers
    poor results.
  • Use unsupervised clustering to determine if
    input attribute values allow the output
    attribute to cluster into meaningful categories.

71
11.2 Mining the Web
72
Web-Based Mining General Issues
  • Clickstreams
  • Extended Common Log File Format
  • Session Files
  • User Sessions
  • Pageviews
  • Cookies

73
(No Transcript)
74
Data Mining for Web Site Evaluation
  • Sequence miners are special data mining programs
    able to discover frequently accessed Web pages
    that occur in the same order.

75
Data Mining for Personalization
76
(No Transcript)
77
(No Transcript)
78
Data Mining for Web Site Adaptation
  • The index synthesis problem: Given a Web site and
    a visitor access log, create new index pages
    containing collections of links to related but
    currently unlinked pages.

79
11.3 Mining Textual Data
  • Train: Create an attribute dictionary.
  • Filter: Remove common words.
  • Classify: Classify new documents.
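A toy sketch of these three steps with a bag-of-words dictionary; the stop-word list, training documents, and classes are all invented:

```python
STOP_WORDS = {"the", "a", "is", "of", "and", "to"}      # filter: common words

def to_bag(text):
    """Reduce a document to its set of non-common words."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}

# Train: build an attribute dictionary per class from labeled documents.
dictionary = {"sports":  to_bag("the team won the final game"),
              "finance": to_bag("the stock price of the fund rose")}

# Classify: assign a new document to the class sharing the most words.
new_doc = to_bag("the game ended and the team lost")
print(max(dictionary, key=lambda c: len(dictionary[c] & new_doc)))  # sports
```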

80
11.4 Improving Performance
  • Bagging
  • Boosting
  • Instance Typicality

81
(No Transcript)
82
(No Transcript)
83
Part IV
  • Intelligent Systems

84
Rule-Based Systems
  • Chapter 12

85
12.1 Exploring Artificial Intelligence
86
(No Transcript)
87
Nearest Neighbor Heuristic
  • When conducting a state-space search, always move
    to the next closest state.

88
(No Transcript)
89
(No Transcript)
90
The Water Jug Problem
91
(No Transcript)
92
(No Transcript)
93
Depth-First Search: A-B-E-F-C-G-I-J-H-D
94
Breadth-First Search: A-B-C-D-E-F-G-H-I-J
95
(No Transcript)
96
(No Transcript)
97
(No Transcript)
98
Backward Chaining
  • Creating a Goal Tree

99
(No Transcript)
100
Expert Systems
101
(No Transcript)
102
Developing an Expert System
103
(No Transcript)
104
Structuring A Rule-Based System
  • Form 1040 Tax Dependency

105
(No Transcript)
106
(No Transcript)
107
Choosing a Data Mining Technique
108
(No Transcript)
109
Managing Uncertainty in Rule-Based Systems
  • Chapter 13

110
13.1 Uncertainty Sources and Solutions
111
Sources of Uncertainty
  • Rule 1: Large Package Rule
  • IF package size is large
  • THEN send package UPS

112
Sources of Uncertainty
  • Rule Antecedent
  • Rule Confidence
  • Combining Uncertain Information

113
General Methods for Dealing with Uncertainty
  • Probability-Based Methods
  • Heuristic Methods

114
Probability-Based Methods
  • Objective Probability
  • Experimental Probability
  • Subjective Probability

115
Heuristic Methods
  • Certainty Factors
  • Fuzzy Logic

116
13.2 Fuzzy Rule-Based Systems
117
Fuzzy Sets
  • A set associated with a linguistic value that
    gives the degree of membership for a numerical
    value.

118
(No Transcript)
119
Fuzzy Reasoning An Example
  1. Fuzzification
  2. Rule Inference
  3. Rule Composition
  4. Defuzzification
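A sketch of the first step: fuzzification maps a crisp numeric value to degrees of membership in linguistic fuzzy sets. The triangular set boundaries below are invented:

```python
def triangular(x, left, peak, right):
    """Degree of membership in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

temperature = 22.0
memberships = {
    "cold": triangular(temperature, -10, 0, 15),
    "warm": triangular(temperature, 10, 20, 30),
    "hot":  triangular(temperature, 25, 35, 45),
}
print(memberships)   # "warm" gets the highest degree of membership
```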

120
(No Transcript)
121
(No Transcript)
122
(No Transcript)
123
13.3 A Probability-Based Approach to Uncertainty
124
Bayes Theorem
125
(No Transcript)
126
Multiple Evidence with Bayes Theorem
127
Multiple Evidence with Bayes Theorem
128
Likelihood Ratios: Necessity and Sufficiency
129
General Considerations
  • P(H|E) and P(¬H|E) must sum to 1.
  • Conditional independence between multiple
    pieces of evidence must be assumed.
  • Prior probabilities are often unobtainable.
  • Large amounts of data must be gathered to obtain
    reasonable estimates for the conditional
    probabilities.

130
Intelligent Agents
  • Chapter 14

131
14.1 Characteristics of Intelligent Agents
  • Situatedness
  • Autonomy
  • Adaptivity
  • Sociability

132
14.2 Types of Agents
  • Anticipatory agents
  • Filtering agents
  • Semiautonomous agents
  • Find-and-retrieve agents
  • User agents
  • Monitor and Surveillance agents
  • Data Mining agents
  • Proactive agents
  • Cooperative agents

133
14.3 Integrating Data Mining, Expert Systems and
Intelligent Agents
134
(No Transcript)
135
The iDA Software
  • Appendix A

136
Datasets for Data Mining
  • Appendix B

137
Decision Tree Attribute Selection
  • Appendix C

138
Computing Gain Ratio
139
Computing Gain(A)
140
Computing Info(I)
141
Computing Info(I,A)
142
Computing Split Info(A)
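The formulas on these slides appear only as images; a sketch assuming the standard C4.5 definitions, Gain(A) = Info(I) - Info(I, A) and GainRatio(A) = Gain(A) / Split Info(A):

```python
import math

def info(counts):
    """Entropy of a set from its class (or subset) counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def gain_ratio(partitions):
    """partitions: per-subset class counts after splitting on attribute A."""
    total = sum(sum(p) for p in partitions)
    info_i  = info([sum(cls) for cls in zip(*partitions)])        # Info(I)
    info_ia = sum(sum(p) / total * info(p) for p in partitions)   # Info(I, A)
    split   = info([sum(p) for p in partitions])                  # Split Info(A)
    return (info_i - info_ia) / split

# A splits 14 instances into three subsets with (yes, no) class counts:
print(round(gain_ratio([(2, 3), (4, 0), (3, 2)]), 3))             # ~0.156
```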
143
(No Transcript)
144
(No Transcript)
145
Statistics for Performance Evaluation
  • Appendix D

146
D.1 Single-Valued Summary Statistics
147
Computing the Mean

µ = (x1 + x2 + ... + xn) / n

where µ is the mean value, n is the number of data
items, and xi is the ith data item.
148
Computing the Variance

s² = Σ(xi - µ)² / n

where s² is the variance, µ is the population mean,
n is the number of data items, and xi is the ith
data item.
149
D.2 The Normal Distribution
150
The Normal Curve

f(x) = e^(-(x - m)² / (2s²)) / (s √(2π))

where f(x) is the height of the curve corresponding
to value x, e is the base of natural logarithms
(approximately 2.718282), m is the arithmetic mean
for the data, and s is the standard deviation.
151
D.3 Comparing Supervised Learner Models
  • Comparing Models with Independent Test Data
  • Pairwise Comparison with a Single Test Set

152
Comparing Models with Independent Test Data
Two independent test sets: set A containing n1
elements and set B containing n2 elements. Error
rate E1 and variance v1 for model M1 on test set A;
error rate E2 and variance v2 for model M2 on test
set B.
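The comparison formula itself is shown only as an image; as a sketch, one common form compares the error rates with Z = |E1 - E2| / sqrt(v1/n1 + v2/n2), taking the variance of an error rate E as E(1 - E). The numbers below are invented:

```python
import math

# Hypothetical results on two independent test sets.
e1, n1 = 0.10, 100        # model M1: 10% error on set A
e2, n2 = 0.20, 150        # model M2: 20% error on set B
v1, v2 = e1 * (1 - e1), e2 * (1 - e2)   # variance of each error rate

z = abs(e1 - e2) / math.sqrt(v1 / n1 + v2 / n2)
print(round(z, 2), "significant at 95%" if z >= 1.96 else "not significant")
```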
153
Pairwise Comparison with a Single Test Set
154
Computing Joint Variance for a Single Test Set
155
Pairwise Comparison with a Single Test Set
156
D.4 Confidence Intervals for Numeric Output
157
D.5 Comparing Models with Numeric Output
  • Independent Test Sets
  • Pairwise Comparison with a Single Test Set
  • Overall Comparison with a Single Test Set

158
Comparing Models with Independent Test Sets
where
  • mae1 is the mean absolute error for model M1
  • mae2 is the mean absolute error for model M2
  • v1 and v2 are the variance scores associated with
    M1 and M2
  • n1 and n2 are the number of instances within each
    respective test set
159
Pairwise Comparison with a Single Test Set
where
  • mae1 is the mean absolute error for model M1
  • mae2 is the mean absolute error for model M2
  • V12 is the joint variance computed with the
    formula defined in Equation D.5
  • n is the number of test set instances
160
Overall Comparison with a Single Test Set
where
  • maej is the mean absolute error for model j
  • ei is the absolute value of the computed value
    minus the actual value for instance i
  • n is the number of test set instances
161
Overall Comparison with a Single Test Set
where
  • the variance term is either the average or the
    larger of the variance scores for each model
  • n is the total number of test set instances
162
Excel Pivot Tables (Office 97)
  • Appendix E

163
(No Transcript)
164
(No Transcript)
165
(No Transcript)
166
(No Transcript)
167
(No Transcript)
168
(No Transcript)
169
(No Transcript)
170
(No Transcript)
171
(No Transcript)