Title: Learning Classifier Systems
1. Learning Classifier Systems
- Navigating the fitness landscape?
- Why use evolutionary computation?
- What's the concept of LCS?
- Early pioneers
- Competitive vs Grouped Classifiers
- Beware the Swampy bits!
- Niching
- Selection for mating and effecting
- Balance exploration with exploitation.
- Balance the pressures
- Zeroth level classifier system
- The X-factor
- Alphabet soup
- New Slants - piecewise linear approximators
- Why don't LCS rule the world?
- Simplification schemes
- Cognitive Classifiers
- Neuroscience Inspirations
- Application Domains
2. Representation
- Genetic information can be any symbol.
- Requires a dictionary and rules to manipulate the symbols.
- Some symbols make the search space easier to explore/exploit.
- Some symbols are easier to store, manipulate and test.
- Different problems may be suited to different symbol sets.
3. States
- The environmental state is passed to the LCS via the message.
- Often the environmental state is preprocessed to create the message:
- Values are normalised
- Out-of-bound data removed
- Known irrelevant conditions removed
- Missing values addressed
4. Multiple Representations
- Binary, ternary, Gray or enumerated
- Integer, real, floating-point or mantissa
- Rank, order, series, histogram or array
- Bounded, ellipsoidal
- Horn clauses and second-order logic
- S-type expressions, gene expressions
- Hybrid: fuzzy sets, neural networks
- Piecewise linear approximation
- Problem specific
5. Encoding
- Binary (including Gray) is a very crude method for solving real-valued problems.
- Simply divide the range of interest by the number of intervals encoded by the binary representation and determine the real number that each binary interval represents, e.g. 0000 represents 0.0, 1111 represents 6.23 and 0001 represents 6.23/15.
- The more bits, the more accuracy in your solution, but the limitations of this encoding are obvious (see the decoding sketch below).
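A minimal sketch of the decoding just described, assuming a 4-bit string mapped linearly onto the range [0.0, 6.23]; the function name is illustrative:

```python
def decode_binary(bits: str, lo: float = 0.0, hi: float = 6.23) -> float:
    """Map a fixed-length binary string linearly onto the real interval [lo, hi]."""
    max_int = (1 << len(bits)) - 1          # 15 for a 4-bit string
    return lo + int(bits, 2) * (hi - lo) / max_int

# 0000 -> 0.0, 1111 -> 6.23, 0001 -> 6.23/15 (about 0.415)
print(decode_binary("0000"), decode_binary("1111"), decode_binary("0001"))
```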
6. Schema
7. Non-optimum Niching
- rule x < 4 → 0 : 0##
- rule x < 3 → 0 : 0## with exception 011 → 1
- rule x < 5 → 0 : ???
8. Binary enumeration for Niching
- Can use enumeration (see the sketch below):
- 0 → 0000000
- 1 → 0000001
- 2 → 0000011
- 3 → 0000111
- 4 → 0001111
- 5 → 0011111
- 6 → 0111111
- 7 → 1111111
- rule x < 4 → 0 : ###0###
- rule x < 3 → 0 : ####0##
- rule x < 5 → 0 : ##0####
- Useful in discretised environments
- Trades an increased search space and less compactness for better niching
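A minimal sketch of this enumeration ("thermometer" coding) and of ternary matching over it, assuming 7-bit strings for the values 0-7; the helper names are illustrative:

```python
def thermometer(v: int, bits: int = 7) -> str:
    """Encode v with v ones in the low positions, e.g. 4 -> 0001111."""
    return format((1 << v) - 1, f"0{bits}b")

def matches(condition: str, message: str) -> bool:
    """Standard ternary match: '#' is don't-care, '0'/'1' must agree with the message bit."""
    return all(c == "#" or c == m for c, m in zip(condition, message))

# With this encoding "x < 4" becomes the single condition ###0### (the 4th bit from the right is 0)
rule_x_lt_4 = "###0###"
for x in range(8):
    print(x, thermometer(x), matches(rule_x_lt_4, thermometer(x)))
```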
9. Integer and Real Encodings
- Match the encoding to the environmental message using upper and lower bounds (see the matching sketch below).
- Could use centre and spread instead, but this assumes a Gaussian distribution and recombination is more difficult to implement.
- For each allele a: lb ≤ x ≤ ub gives a match.
- Could use < instead of ≤, but the LCS determines the correct bound automatically:
- 0 ≤ x ≤ 5 is equivalent to 0 ≤ x < 5.01, or
- 0 ≤ x ≤ 4.99 is equivalent to 0 ≤ x < 5
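A minimal sketch of bounded matching for real-valued conditions, assuming one (lower, upper) pair per allele; the names are illustrative:

```python
def interval_match(condition, message):
    """A condition is a list of (lb, ub) pairs, one per allele; match if every value lies within its bounds."""
    return all(lb <= x <= ub for (lb, ub), x in zip(condition, message))

condition = [(0.0, 5.0), (2.5, 10.0)]        # 0 <= x0 <= 5 and 2.5 <= x1 <= 10
print(interval_match(condition, [4.2, 7.0]))  # True
print(interval_match(condition, [5.5, 7.0]))  # False: 5.5 > 5.0
```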
10. Mutating at the Limits
- The crossover point can be either between alleles or in the middle of an allele.
- Mutation increases/decreases either or both of the two bounds.
- Repair is occasionally needed to ensure that the lower bound < the upper bound.
- Note that most bounds have a limit, e.g. WBC: 0 ≤ a ≤ 10.
- Suppose we probabilistically decide to mutate the lower bound of the general allele 0 ≤ x ≤ 10:
- If we decrease it by 10% of the range to -1, we then repair it back to 0.
- If we increase it by 10% of the range to 1, we do not repair it, as it is valid!
- Thus some alphabets have a specificity bias (see the sketch below).
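A minimal sketch of bound mutation with repair, assuming an allele limited to the range [0, 10] and a mutation step of 10% of the range, as in the example above; names are illustrative:

```python
import random

RANGE_LO, RANGE_HI = 0.0, 10.0            # problem limits, e.g. WBC in [0, 10]
STEP = 0.1 * (RANGE_HI - RANGE_LO)        # mutate by 10% of the range

def mutate_lower_bound(lb: float, ub: float):
    """Shift the lower bound up or down by one step, then repair it so the interval stays legal."""
    lb += random.choice([-STEP, STEP])
    lb = max(lb, RANGE_LO)                # repair: cannot leave the problem range
    lb = min(lb, ub)                      # repair: keep lb <= ub
    return lb, ub

# From the fully general allele 0 <= x <= 10, a decrease is repaired back to 0 while an
# increase to 1 is kept as valid, so mutation tends to make intervals more specific.
print(mutate_lower_bound(0.0, 10.0))
```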
11. Hyper Partitioning
- We have a sparse search space with only two classes to identify: 0 and 1.
- It is real numbered, so we decide to use bounds, e.g. 0 ≤ x ≤ 10.
- We form hypercubes, with the number of dimensions equal to the number of conditions.
- This approximates the actual niches, but may cause problems.
(Figure: niches N(x) for the class 1 and class 0 regions within the search space S)
12. Oblique domains
- We have a search space with only two classes to identify: 0 and 1.
- It is real numbered, so we decide to use bounds, e.g. 0 ≤ x ≤ 10.
- Hypercubes / hyperrectangles are not often suited to oblique domains.
- Imagine sine wave domains...
(Figure: oblique boundary between the class 1 and class 0 regions of the search space S)
13. Hyper-ellipsoidal
- The general ellipsoid, also called a triaxial ellipsoid, is a quadratic surface given in Cartesian coordinates by x²/a² + y²/b² + z²/c² = 1, where the semi-axes are of lengths a, b and c (Wolfram MathWorld).
- N-dimensional ellipsoids can be used to more effectively represent oblique domains (see the sketch below).
- Implementation and analysis become harder.
- Butz, M.V. (2005) Kernel-based, ellipsoidal conditions in the real-valued XCS classifier system. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2005), pp. 1835-1842.
- Butz, M.V., Lanzi, P.-L., Wilson, S.W. (2006) Hyper-ellipsoidal conditions in XCS: rotation, linear approximation, and solution structure. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2006), pp. 1457-1464.
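A minimal sketch of an axis-aligned hyper-ellipsoidal condition, assuming a centre vector and semi-axis lengths per dimension (rotation, as discussed in the GECCO-2006 paper, is omitted); names are illustrative:

```python
def ellipsoid_match(centre, semi_axes, message):
    """Match if the message lies inside the ellipsoid: sum(((x_i - m_i) / r_i)^2) <= 1."""
    return sum(((x - m) / r) ** 2 for m, r, x in zip(centre, semi_axes, message)) <= 1.0

centre, semi_axes = [5.0, 5.0], [2.0, 1.0]
print(ellipsoid_match(centre, semi_axes, [6.0, 5.5]))  # inside  -> True
print(ellipsoid_match(centre, semi_axes, [8.0, 5.0]))  # outside -> False
```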
14. Horn clause logic
- A Horn clause is a clause with at most one positive literal.
- A rule: 1 positive literal, at least 1 negative literal. A rule has the form "¬P1 ∨ ¬P2 ∨ ... ∨ ¬Pk ∨ Q". This is logically equivalent to "P1 ∧ P2 ∧ ... ∧ Pk → Q", thus an if-then implication with any number of conditions but one conclusion. Example: "¬man(X) ∨ mortal(X)" (all men are mortal).
- A fact or unit: 1 positive literal, 0 negative literals. Examples: "man(socrates)", "ancestor(X,X)" (everyone is an ancestor of themselves, in the trivial sense).
- A negated goal: 0 positive literals, at least 1 negative literal. In virtually all implementations of Horn clause logic, the negated goal is the negation of the statement to be proved; the knowledge base consists entirely of facts and rules. The statement to be proved, called the goal, is therefore a single unit or a conjunction of units; an existentially quantified variable in the goal turns into a free variable in the negated goal. E.g. if the goal to be proved is "∃X male(X) ∧ ancestor(elizabeth,X)" (show that there exists a male descendant of Elizabeth), the negated goal will be "¬male(X) ∨ ¬ancestor(elizabeth,X)".
- The null clause: 0 positive and 0 negative literals. Appears only as the end of a resolution proof.
15. Horn clause logic
- A Horn clause is a clause with at most one positive literal:
- ¬a ∨ ¬b ∨ ¬c ∨ ... ∨ ¬t ∨ u (only u positive), or
- (a ∧ b ∧ c ∧ ... ∧ t) → u (equivalent implication)
- A definite clause is a Horn clause that has exactly one positive literal.
- A Horn clause without a positive literal is called a goal.
- Horn clauses express a subset of statements of first-order logic.
- Prolog is built on top of Horn clauses.
- Prolog programs are composed of definite clauses, and any question in Prolog is a goal.
- Strict set of operations and useful scaling properties.
- FOIL produces Horn clauses from data expressed as relations.
- Quinlan, J.R. (1990), "Learning Logical Definitions from Relations", Machine Learning 5, 239-266.
- http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/learning/systems/0.html
16. Fuzzy
- Use of fuzzy sets to encode membership of the message to a classifier.
- The LCS encodes the membership function of each allele (see the sketch below).
- Casillas, J., Carse, B., Bull, L. (2007) Fuzzy-XCS: a Michigan Genetic Fuzzy System. IEEE Transactions on Fuzzy Systems 15(4), 536-550.
(Figure: membership functions for COLD, WARM and HOT over Temperature, with membership from 0 to 1)
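A minimal sketch of fuzzy membership of the kind pictured, assuming triangular membership functions over temperature; the breakpoints are illustrative:

```python
def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership: 0 outside [left, right], 1 at the peak, linear in between."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Illustrative fuzzy sets over temperature (degrees C)
cold = lambda t: triangular(t, -10.0, 0.0, 15.0)
warm = lambda t: triangular(t, 10.0, 20.0, 30.0)
hot = lambda t: triangular(t, 25.0, 35.0, 50.0)

t = 18.0
print(cold(t), warm(t), hot(t))  # a message can partially match several fuzzy conditions
```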
17. Neural
- Use of neural networks to encode membership of the message to a classifier (see the sketch below).
- The LCS encodes one NN for each allele.
- The number of rules and the composition of rules are learnt.
- Three optimised NNs: avoid, seek and orientate.
- Jacob Hurst, Matt Studley (UWE)
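A minimal sketch of a neural condition, assuming a single sigmoid unit whose output decides whether the rule matches the message; weights, bias and threshold are illustrative:

```python
import math

def nn_match(weights, bias, message, threshold=0.5):
    """Single sigmoid unit over the message; the rule matches when activation exceeds the threshold."""
    net = sum(w * x for w, x in zip(weights, message)) + bias
    activation = 1.0 / (1.0 + math.exp(-net))
    return activation > threshold

# One evolvable weight vector and bias per rule condition
print(nn_match([0.8, -0.4], bias=-1.0, message=[2.0, 1.0]))  # activation about 0.55 -> True
```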
18. S-Expressions
Lisp-like expressions. For people who speak in brackets:
(/ (- (sqrt (- (* b b) (* (* 2 2) (* a c)))) b) (* 2 a))
- Genetic Programming inspired (see work by P.-L. Lanzi)
- Logical functions: AND, OR, NOT
- Terminals: 0, 1, #, NULL
- Exponential/logs: e, ln, log
- Hyperbolic: Sine, Cos, Tan
- Mathematical: SQRT, POW
- Tailored: value at, address of, ...
- Need to tailor match and reproduction (see the evaluation sketch below)
- NB: is the ternary alphabet a subset?
- http://www.genetic-programming.com/
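A minimal sketch of evaluating such an S-expression as part of a classifier, assuming expressions are nested tuples over a small, protected function set; the example expression is the quadratic-root formula shown above:

```python
import math

# Assumed function set; a real system would tailor this to the problem
FUNCS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 0.0,   # protected division, common in GP
    "sqrt": lambda a: math.sqrt(abs(a)),           # protected square root
}

def evaluate(expr, env):
    """Recursively evaluate a nested-tuple S-expression against the variable bindings in env."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    return FUNCS[op](*(evaluate(a, env) for a in args))

# (/ (- (sqrt (- (* b b) (* (* 2 2) (* a c)))) b) (* 2 a))
root = ("/", ("-", ("sqrt", ("-", ("*", "b", "b"), ("*", ("*", 2, 2), ("*", "a", "c")))), "b"), ("*", 2, "a"))
print(evaluate(root, {"a": 1.0, "b": -3.0, "c": 2.0}))  # x^2 - 3x + 2 = 0 -> root 2.0
```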
19. S-Expressions
Lisp-like expressions. For people who speak in brackets:
(/ (- (sqrt (- (* b b) (* (* 2 2) (* a c)))) b) (* 2 a))
- Bloat?
- Is an LCS with S-expressions not just GP?
- How to tailor functions without introducing bias?
- How to identify the building blocks of subexpressions?
- When are two subexpressions equivalent?
- Is the trade-off between a reduced problem search space and an increased alphabet search space worth it?
- http://www.genetic-programming.com/