SAT and CSP competitions

1
SAT and CSP competitions & benchmark libraries: some lessons learnt?

Toby Walsh
NICTA & UNSW
Sydney, Australia

2
What's the best way to benchmark systems?
3
Outline

Benchmark libraries
Founding CSPLib.org
Competitions
SAT competition judge
TPTP competition judge

4
Why?

Why did I set up CSPLib.org?
I needed problems against which to benchmark my latest inference techniques
Zebra and random problems don't cut it!
I thought it would help unify and advance the CP
community

5
Random problems

+ve
Easy to generate
Hard (if chosen from phase transition)
Impossible to cheat
If you can solve 1000-variable random 3-SAT problems at l/n = 4.2, I'll be impressed
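
A minimal sketch of how such instances are usually generated, taking l as the number of clauses and n as the number of variables, so that l/n is the clause-to-variable ratio. The function names random_3sat and to_dimacs are made up for illustration and do not come from any library mentioned in the talk.

```python
import random


def random_3sat(num_vars, ratio=4.2, seed=None):
    """Generate a random 3-SAT instance with about ratio * num_vars clauses.

    Each clause is a tuple of three DIMACS-style literals: a positive
    integer v means variable v, a negative integer means its negation.
    """
    rng = random.Random(seed)
    num_clauses = round(ratio * num_vars)
    clauses = []
    for _ in range(num_clauses):
        # Pick three distinct variables, then negate each with probability 1/2.
        variables = rng.sample(range(1, num_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses


def to_dimacs(num_vars, clauses):
    """Render the clauses as DIMACS CNF text."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(map(str, clause)) + " 0" for clause in clauses]
    return "\n".join(lines)


instance = random_3sat(1000, ratio=4.2, seed=0)
dimacs_text = to_dimacs(1000, instance)
```

Fixing the seed makes an instance reproducible, which matters if the same random problems are to be reused across solvers.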

6
Random problems

-ve
Lack structures found in real world
Unrepresentative
E.g. random 3-SAT problems either have many solutions or none
Different methods work well on them
Random SAT: forward-looking algorithms
Industrial SAT: backward-looking algorithms

7
Why?

Thesis: every mature field has a benchmark library
Deduction started in 1960s
TPTP set up in 1993
SAT started in 1960s
SAT DIMACS challenge in 1992
SATLib set up in 1999
CP started in 1970s
CSPLib set up in 1998

8
Why?

Thesis: every mature field has a benchmark library
Spatial and temporal reasoning started in the early 80s (or before?)
It's been approximately 30 years, so it's about time you guys set one up!

9
Benchmark libraries

CSPLib.org
Over 35k unique visitors
Still not everything I'd want it to be
But the state of the art for experimentation is now much better than it was
I haven't seen a zebra for a very long time

10
An ideal library

Desiderata taken from
"CSPLib: a benchmark library for constraints", Proc. CP-99

11
An ideal library

Location
On the web and easy to find
TPTP.org
CSPLib.org
SATLib.org
QBFLib.org
http://elib.zib.de/pub/mp-testdata/tsp/tsplib/tsplib.html
http://mat.gsia.cmu.edu/COLOR/instances.html

12
An ideal library

Easy to use
Tools to make benchmarking as painless as
possible
tptp2X, ...
Diverse
To help prevent over-fitting

13
An ideal library

Large
Growing continuously
Again helps to prevent over-fitting
Extensible
To new problems or domains

14
An ideal library

Complete
One stop for your problems
Topical
For instance, it should report current best
solutions found

15
An ideal library

Independent
Not tied to a particular solver or proprietary
input language
Mix of difficulties
Hard and easy problems
Solved and open problems
With perhaps even a difficulty index?

16
An ideal library

Accurate
It should be trusted
Used
A valued resource for the community

17
Problem format

Lo-tech or hi-tech?

18
Lo-tech formats

DIMACS format used in SATLib
c a simple example
p cnf 3 2
1 -1 0
1 2 3 0
This represents (x or not x) and (x or y or z)
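
To show how little code a lo-tech format needs, here is a minimal sketch of a DIMACS CNF reader applied to the example above. The names parse_dimacs and satisfies are illustrative, and the reader handles only comment lines, the "p cnf" header and zero-terminated clauses, not the format's extensions.

```python
def parse_dimacs(text):
    """Parse DIMACS CNF text into (num_vars, clauses).

    Each clause is a list of non-zero integers; a clause ends at a 0.
    """
    num_vars = None
    clauses, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("c"):
            continue  # blank line or comment
        if line.startswith("p"):
            _, fmt, n, _m = line.split()
            if fmt != "cnf":
                raise ValueError(f"unsupported format: {fmt}")
            num_vars = int(n)
            continue
        for token in line.split():
            literal = int(token)
            if literal == 0:
                clauses.append(current)
                current = []
            else:
                current.append(literal)
    return num_vars, clauses


def satisfies(clauses, assignment):
    """Check whether a {variable: bool} assignment satisfies every clause."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


example = """c a simple example
p cnf 3 2
1 -1 0
1 2 3 0
"""
num_vars, clauses = parse_dimacs(example)
ok = satisfies(clauses, {1: False, 2: False, 3: True})
```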

19
Lo-tech formats

DIMACS format used in SATLib
+ve
All programming languages can read integers!
Small amount of extensibility built in (e.g. QBF)
-ve
Larger extensions are problematic (e.g. beyond
CNF to arbitrary Boolean circuits)

20
Hi-tech formats

CP competition
<instance>
<presentation
name="4-queens"
description="This problem involves placing 4 queens on a chessboard"
nbSolutions="at least 1"
format="XCSP1.1 (XML CSP Representation 1.1)"
/>
<domains nbDomains="1">
<domain name="dom0" nbValues="4" values="1..4" />
</domains>
<variables nbVariables="4">
<variable name="X0" domain="dom0"/>
</variables>
<relations nbRelations="3">
<relation
name="rel0" domain="dom0 dom0"
nbConflicts="10"
conflicts="(1,1)(1,2)(2,1)(2,2)(2,3)(3,2)(3,3)(3,4)(4,3)(4,4)" />

21
Hi-tech formats

XML
+ve
Easy to extend
Parsing tools can be provided
-ve
Complex and verbose
Computers can parse terse structures easily
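
As a rough illustration of the "parsing tools can be provided" point, the sketch below reads a cut-down instance with Python's standard xml.etree.ElementTree module. The SAMPLE text is an assumption: it reuses the element and attribute names from the 4-queens fragment, but it has only two variables and one relation, and its closing tags were added here so that it parses. This is not the official competition parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, shortened instance modelled on the 4-queens fragment.
SAMPLE = """
<instance>
  <domains nbDomains="1">
    <domain name="dom0" nbValues="4" values="1..4" />
  </domains>
  <variables nbVariables="2">
    <variable name="X0" domain="dom0" />
    <variable name="X1" domain="dom0" />
  </variables>
  <relations nbRelations="1">
    <relation name="rel0" domain="dom0 dom0" nbConflicts="10"
      conflicts="(1,1)(1,2)(2,1)(2,2)(2,3)(3,2)(3,3)(3,4)(4,3)(4,4)" />
  </relations>
</instance>
"""


def parse_domain(values):
    """Turn an interval such as "1..4" into [1, 2, 3, 4]."""
    low, high = values.split("..")
    return list(range(int(low), int(high) + 1))


def parse_conflicts(text):
    """Turn "(1,1)(1,2)..." into a set of forbidden integer pairs."""
    return {(int(a), int(b)) for a, b in re.findall(r"\((\d+),(\d+)\)", text)}


root = ET.fromstring(SAMPLE.strip())
domains = {d.get("name"): parse_domain(d.get("values")) for d in root.iter("domain")}
variables = {v.get("name"): v.get("domain") for v in root.iter("variable")}
relation = root.find("./relations/relation")
conflicts = parse_conflicts(relation.get("conflicts"))

# Pairs of values that the relation still allows for (X0, X1).
allowed = [
    (a, b)
    for a in domains["dom0"]
    for b in domains["dom0"]
    if (a, b) not in conflicts
]
```

Even with a library doing the XML work, the conflict tuples still need their own small parser, which is part of the "complex and verbose" cost.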

22
No-tech formats

CSPLib
Problems are specified in natural language
No agreement at that time for an input language
One focus was on how you model a problem
Today there is more consensus on modelling
languages like Zinc

23
No-tech formats

CSPLib
Problems are specified in natural language
But you can still provide in one place
Input data
Results
Code
Parsers

24
Getting problems

Submit them yourself
Initially, you must do this so library has some
critical mass first time people look at it
But it becomes tiresome and unrepresentative to
do so continually
Ask at every talk
Tried for several years but it (almost) never
worked

25
Getting problems

Need some incentive
Offer money?
Price of entry for the competition?
If you have a competition, users will submit
problems that their solver is good at?

26
Competitions
27
Libraries & Competitions

You can have a library without a competition
But you can't have a competition without a library

28
Libraries & Competitions

Libraries then competition
TPTP then CASC
Easy and safe!
Libraries and competition
Planning
RoboCup

29
Increasing complexity

Constraints
1st year, binary extensional
2nd year, limited number of globals
3rd year, unlimited
Planning
Increasing complexity
Time, metrics, uncertainty,

30
Benefits

Gets ideas implemented
Rewards engineering
Progress needs both science and engineering!
Puts it all together

31
Benefits

Gives greater importance to important low-level
issues
In SAT
Watched literals
VSIDS

32
Benefits

Witness the progress in SAT
1985, 10s vars
1995, 100s vars
2005, 1000s vars
Not just Moore's law at play!

33
Pitfalls

Competitions require lots of work
Organizers get limited (academic) reward
One solution is to also organize competition special issues

34
Pitfalls

Competitions encourage incremental improvements
Don't have them too often!
You may discover a local minimum
E.g. MDPs for speech recognition
Give out best new solver prize?

35
The Chaff story

Industrial problems, SAT + UNSAT instances
2008, 1st MiniSAT (son of zChaff)
2007, 1st RSAT (son of MiniSAT)
2006, 1st MiniSAT
2005, 1st SatELite GTI (MiniSAT + preprocessor)
2004, 1st zChaff (Forklift from 2003 was better)
2003, 1st Forklift
2002, 1st zChaff

36
Other issues

Man-power
Organizers
One is not enough?
Judges
All rules need interpretation
Compute-power
Find a friendly cluster

37
Other issues

Multiple tracks
SAT/UNSAT
Random/industrial/crafted
Certified/Uncertified

38
Other issues

Holding problems back if possible
Release some problems so competitors can ensure
solver compliance
But hold most back so competition is blind!

39
Other issues

Multiple phases
Too many solvers for all to compete with long
timeouts
First phase to test correctness
Second phase to throw out the slow solvers (who
cost you many timeouts)
Third phase to differentiate between better
solvers
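
The three phases can be sketched as successive filters over the field of solvers. Everything here is hypothetical: the function names, the thresholds, the instance names and the result data are invented for illustration, and a real competition would define each phase in its rules.

```python
def phase1_correct(answers, known):
    """Keep solvers that never contradict a known SAT/UNSAT status.

    answers: {solver: {instance: "SAT" or "UNSAT"}}, unanswered instances omitted.
    known:   {instance: "SAT" or "UNSAT"}.
    """
    return [
        solver
        for solver, given in answers.items()
        if all(given.get(inst) in (None, status) for inst, status in known.items())
    ]


def phase2_fast(solvers, times, short_timeout, max_timeouts):
    """Drop solvers that exceed a short timeout on too many instances.

    times: {solver: {instance: seconds, or None if never solved}}.
    """
    kept = []
    for solver in solvers:
        timeouts = sum(
            1 for t in times[solver].values() if t is None or t > short_timeout
        )
        if timeouts <= max_timeouts:
            kept.append(solver)
    return kept


def phase3_rank(solvers, times, long_timeout):
    """Rank the survivors by instances solved within the long timeout."""
    def solved(solver):
        return sum(
            1 for t in times[solver].values() if t is not None and t <= long_timeout
        )
    return sorted(solvers, key=lambda s: (-solved(s), s))


known = {"p1": "SAT", "p2": "UNSAT"}
answers = {
    "A": {"p1": "SAT", "p2": "UNSAT"},
    "B": {"p1": "UNSAT"},  # wrong answer: eliminated in phase 1
    "C": {"p1": "SAT"},
    "D": {"p1": "SAT", "p2": "UNSAT"},
}
times = {
    "A": {"q1": 5, "q2": 300, "q3": None},
    "C": {"q1": 900, "q2": None, "q3": None},  # too slow: eliminated in phase 2
    "D": {"q1": 2, "q2": 50, "q3": 800},
}

correct = phase1_correct(answers, known)
fast = phase2_fast(correct, times, short_timeout=60, max_timeouts=2)
ranking = phase3_rank(fast, times, long_timeout=1000)
```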

40
Other issues

Reward function
<completed, average time, ...>
solution purse + speed purse
Points for each problem divided between those solvers that solve it
Getting buy-in from competitors
It will (and should) evolve over time!
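
A minimal sketch of purse-style scoring, under stated assumptions. The slide only says that the points for each problem are divided between the solvers that solve it. The equal split of the solution purse follows that directly, but the speed-purse split in proportion to 1/time is an assumption made here, as are the purse sizes and the name purse_scores.

```python
from collections import defaultdict


def purse_scores(times, solution_purse=1000.0, speed_purse=1000.0):
    """Score solvers with a solution purse and a speed purse per instance.

    times: {instance: {solver: seconds}} listing only the solvers that
    solved the instance; all times must be positive.
    """
    scores = defaultdict(float)
    for solved in times.values():
        if not solved:
            continue  # nobody solved it, so its purses are not awarded
        # Solution purse: split equally among the solvers that solved it.
        share = solution_purse / len(solved)
        # Speed purse (assumed rule): split in proportion to 1 / time.
        total_speed = sum(1.0 / t for t in solved.values())
        for solver, t in solved.items():
            scores[solver] += share + speed_purse * (1.0 / t) / total_speed
    return dict(scores)


results = {
    "easy": {"A": 1.0, "B": 1.0},  # both solve it equally fast
    "hard": {"A": 10.0},           # only A solves it
    "open": {},                    # unsolved
}
scores = purse_scores(results)
```

Because each instance's purse is shared, an instance solved by a single solver is worth far more to it than one solved by everyone, which rewards solvers that crack problems nobody else can.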

41
Other issues

Prizes
Give out many!
Good for people's CVs
Good motivator for future years

42
Other issues

Open or closed source?
Open to share progress
Closed to get the best
Last year's winner
Condition of entry
To see progress is being made!

43
Other issues

Smallest unsolved problem
Give a prize!
Timing
Run during the conference
Creates a buzz so people enter next year
Get a slot in program to discuss results
Get a slot in banquet to give out prizes

44
Conclusions

Benchmark libraries
When an area is several decades old, why wouldn't you have one?
Competitions
Designed well, held not too frequently, with buy-in from the community, why wouldn't you?

45
Questions