Title: Organizing Open Online Computational Problem Solving Competitions
1Organizing Open Online Computational Problem
Solving Competitions
2- In 2011, researchers from the Harvard Catalyst
Project were investigating the potential of
crowdsourcing genome-sequencing algorithms.
3- So, they collected a few million sequencing
problems and developed an electronic judge that
evaluates sequencing algorithms by how well they
solve these problems.
4- And they set up a two-week open online
competition on TopCoder with a total prize pool
of $6,000.
5- The results were astounding!
6-- Nature Biotechnology, 31(2), pp. 108–111, 2013.
- ... A two-week online contest ... produced over
600 submissions ... . Thirty submissions exceeded
the benchmark performance of the US National
Institutes of Health's MegaBLAST. The best
achieved both greater accuracy and speed (1,000
times greater).
7- We want to lower the barrier to entry for
establishing such competitions by having
meaningful competitions where participants
assist the admin in evaluating their peers.
8Thesis Statement
- Semantic games of interpreted logic sentences
provide a useful foundation to organize
computational problem solving communities.
14Open online competitions have been quite
successful in organizing computational problem
solving communities.
15... A two-week online contest ... produced over
600 submissions ... . Thirty submissions exceeded
the benchmark performance of the US National
Institutes of Health's MegaBLAST. The best
achieved both greater accuracy and speed (1,000
times greater).
-- Nature Biotechnology, 31(2), pp. 108–111, 2013.
16Let's take a closer look at
- state-of-the-art approaches to organizing an open
online competition for solving computational
problems.
- MAX-SAT as a sample problem.
17MAXimum SATisfiability (MAX-SAT) problem
- Input: a Boolean formula in Conjunctive Normal
Form (CNF).
- Output: an assignment satisfying the maximum
number of clauses.
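Since everything that follows is phrased in terms of counting satisfied clauses, a small helper makes the objective concrete. A minimal Java sketch, assuming our own literal encoding (the names fsat, assignment, and cnf are illustrative, not from any competition infrastructure):

class MaxSatEval {
  // Counts the clauses of a CNF formula satisfied by an assignment.
  // Each clause is an array of literals; literal i > 0 stands for
  // variable i, literal i < 0 for its negation (variables 1-indexed).
  public static int fsat(boolean[] assignment, int[][] cnf) {
    int satisfied = 0;
    for (int[] clause : cnf) {
      for (int lit : clause) {
        boolean value = assignment[Math.abs(lit) - 1];
        if (lit > 0 ? value : !value) { satisfied++; break; }
      }
    }
    return satisfied;
  }
}

For example, for (x1 ∨ x2) ∧ (¬x1 ∨ x2) ∧ (¬x2), the assignment x1 = false, x2 = true satisfies 2 of the 3 clauses, and no assignment satisfies all 3.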
18The Omniscient Admin Approach
- A trusted admin prepares a thorough benchmark
of MAX-SAT problem instances together with their
correct solutions.
- This benchmark is used to evaluate individual
MAX-SAT algorithms submitted by participants.
19The Teaching Admin Approach
- Admin prepares a thorough benchmark of MAX-SAT
problems and their model solutions.
- Benchmark used to evaluate individual MAX-SAT
algorithms submitted by participants.
20Cons
- Overhead to collect and solve problems.
- What if the admin incorrectly solves some problems?
21The Open Benchmark Approach
- Admin maintains an open benchmark of problems and
their solutions.
- Participants may object to any of the solutions
before the competition starts.
22Cons
- Over-fitting: participants may tailor their
algorithms to the benchmark.
23The Learning Admin Approach
- An admin prepares a set of MAX-SAT problems and
keeps track of the best solution produced by any
of the algorithms submitted by participants.
- Pioneered by the Foldit team.
24Cons
- Works for optimization problems. Not clear how to
apply it to other computational problems, TQBF for
example.
25Wouldn't it be great if
- we had sports-like OOCs where the admin referees
the competition with minimal overhead?
26However,
27Research Question
- How to organize a meaningful open online
computational problem solving competition where
participants assist in the evaluation of their
opponents?
28Research Question
- How to organize a meaningful, sports-like, open
online computational problem solving competition
where the admin only referees the competition
with minimal overhead?
29Simpler Version
- Meaningful, two-party competitions.
- Admin provides neither benchmark problems nor
their solutions.
30Attempt I
- Each participant prepares a benchmark of problems
and solves their opponent's benchmark problems.
- Admin checks solutions.
- Checking the correctness of a MAX-SAT problem
solution can be an overhead to the admin.
31Attempt II
- Each participant prepares a benchmark of problems
and their solutions.
- Each participant solves their opponent's problems.
- Admin compares both solutions for each problem
to determine the winner.
- Admin has to correctly compare solutions.
- Admin cannot assume any of the solutions to be
correct.
32Attempt II
- Each participant prepares a benchmark of problems
and their model solutions.
- Each participant solves their opponent's problems.
- Admin only compares solutions to model solutions.
33But,
- Participants are incentivized to provide the
wrong model solution.
- Admin should compare solutions without trusting
any of them.
34Thesis
- Semantic games of interpreted logic sentences
provide a useful foundation to organize
computational problem solving communities.
35Semantic Games
- A Semantic Game (SG) is a constructive debate of
the correctness of an interpreted logic sentence
(a.k.a. claim) between two distinguished parties:
the verifier, which asserts that the claim holds,
and the falsifier, which asserts that the claim
does not hold.
36A Two-Party, SG-Based MAX-SAT Competition (I)
∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
- Participants develop functions to:
  - Provide a side preference.
  - Provide values for quantified variables based on
the values of variables in scope.
37A Two-Party, SG-Based MAX-SAT Competition (II)
∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
- Admin chooses sides for players based on their
side preference.
- Let Pv be the verifier and Pf be the falsifier.
38A Two-Party, SG-Based MAX-SAT Competition (III)
∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
- Admin gets the value provided by Pf for f.
- Admin checks f ∈ CNFs. If false, Pf loses.
- Admin gets the value provided by Pv for v.
- Admin checks v ∈ assignments(f). If false, Pv
loses.
39A Two-Party, SG-Based MAX-SAT Competition (IV)
∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
- Admin gets the value provided by Pf for fa.
- Admin checks fa ∈ assignments(f). If false, Pf
loses.
- Admin evaluates fsat(fa, f) ≤ fsat(v, f). If true,
Pv wins; otherwise Pf wins.
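The four steps above (spread over slides 37 to 39) amount to a short refereeing routine. A minimal sketch, assuming hypothetical Player callbacks and reusing the fsat helper sketched earlier; all identifiers are ours, not the actual competition infrastructure:

interface Player {
  int[][] provideFormula();               // the falsifier's value for f
  boolean[] provideAssignment(int[][] f); // a value for v (or fa)
}

class MaxSatSGAdmin {
  // Referees one SG of the MAX-SAT claim; returns true iff Pv wins.
  static boolean referee(Player pf, Player pv) {
    int[][] f = pf.provideFormula();
    if (!isCNF(f)) return true;            // f not in CNFs: Pf loses
    boolean[] v = pv.provideAssignment(f);
    if (!isAssignment(v, f)) return false; // v not an assignment: Pv loses
    boolean[] fa = pf.provideAssignment(f);
    if (!isAssignment(fa, f)) return true; // fa not an assignment: Pf loses
    return MaxSatEval.fsat(fa, f) <= MaxSatEval.fsat(v, f);
  }

  static boolean isCNF(int[][] f) {
    if (f == null || f.length == 0) return false;
    for (int[] clause : f) {
      if (clause == null || clause.length == 0) return false;
      for (int lit : clause) if (lit == 0) return false;
    }
    return true;
  }

  static boolean isAssignment(boolean[] a, int[][] f) {
    int vars = 0; // highest variable index occurring in f
    for (int[] clause : f)
      for (int lit : clause) vars = Math.max(vars, Math.abs(lit));
    return a != null && a.length >= vars;
  }
}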
40Rationale (I)
∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
- Controllable admin overhead.
∀f ∈ CNFs ∃v ∈ assignments(f). satisfies-max(v, f)
41Rationale (II)
- Correct: there is a winning strategy for
verifiers of true claims and for falsifiers of
false claims, regardless of the opponent's actions.
42Rationale (III)
- Objective.
- Systematic.
- Learning chances.
43SG-Based Two-Party Competitions
- We let participants debate the correctness of an
interpreted predicate logic sentence specifying
the computational problem of interest, assuming
that participants choose to take opposite sides.
44Out-of-The-Box, SG-Based, Two-Party MAX-SAT
Competition
- ∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
- 1. Falsifier provides a CNF formula f.
- 2. Verifier provides an assignment v.
- 3. Falsifier provides an assignment fa.
- 4. Admin evaluates fsat(fa, f) ≤ fsat(v, f). If
true, verifier wins. Otherwise, falsifier wins.
45- Pros and cons of out-of-the-box, SG-based,
two-party competitions: do they solve our simpler
version?
- Meaningful, two-party competitions.
- Admin provides neither benchmark problems nor
their solutions.
46Pro (I) Systematic
- The rules of an SG are systematically derived
from the syntax of its underlying claim.
- SGs are also defined for other logics.
47Rules of SG(⟨φ, A⟩, v, f)
φ          Move                    Next game
∀x φ(x)    f provides x0           SG(⟨φ[x0/x], A⟩, v, f)
φ ∧ ψ      f chooses θ ∈ {φ, ψ}    SG(⟨θ, A⟩, v, f)
∃x φ(x)    v provides x0           SG(⟨φ[x0/x], A⟩, v, f)
φ ∨ ψ      v chooses θ ∈ {φ, ψ}    SG(⟨θ, A⟩, v, f)
¬φ         N/A                     SG(⟨φ, A⟩, f, v)
P(t0)      N/A                     v wins if P(t0) holds, o/w f wins
The Game of Language: Studies in
Game-Theoretical Semantics and Its Applications
-- Kulas and Hintikka, 1983
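To make the table concrete, here is a hedged sketch of these rules for the propositional connectives, with player strategies as callbacks. Quantifier rows work the same way, except that the mover supplies a witness value that is substituted into the formula. The AST and Strategy types are our own illustration, not the CPSL implementation:

abstract class Formula {}
final class Atom extends Formula { final boolean holds; Atom(boolean h) { holds = h; } }
final class Not extends Formula { final Formula phi; Not(Formula p) { phi = p; } }
final class And extends Formula { final Formula l, r; And(Formula a, Formula b) { l = a; r = b; } }
final class Or extends Formula { final Formula l, r; Or(Formula a, Formula b) { l = a; r = b; } }

interface Strategy {
  Formula choose(Formula l, Formula r); // pick the subformula to continue with
}

class SemanticGame {
  // Plays SG(phi, v, f): the falsifier moves on conjunctions, the
  // verifier moves on disjunctions, negation swaps the parties, and
  // an atom decides the game. Returns true iff the verifier wins.
  static boolean verifierWins(Formula phi, Strategy v, Strategy f) {
    if (phi instanceof Atom a) return a.holds;
    if (phi instanceof Not n) return !verifierWins(n.phi, f, v); // roles swap
    if (phi instanceof And c) return verifierWins(f.choose(c.l, c.r), v, f);
    Or d = (Or) phi;
    return verifierWins(v.choose(d.l, d.r), v, f);
  }
}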
48Pro (II) Objective
- Competition result is based on skills that are
precisely defined in the competition definition.
49Pro (III) Correct
- Competition result is based on demonstrated
possession (or lack) of skill.
- Problems incorrectly solved by the admin or an
opponent cannot worsen a participant's rank.
- There is a winning strategy for verifiers of true
claims and falsifiers of false claims, regardless
of the opponent's actions.
50Pro (III) Correct
- There is a winning strategy for verifiers of true
claims and falsifiers of false claims, regardless
of the opponent's actions.
51Pro (IV) Controllable Admin Overhead
- Admin overhead is to implement the structure
interpreting the logic statement specifying a
computational problem.
- It is always possible to strip functionality out
of the interpreting structure at the cost of
adding complexity to the logic statement.
52Pro (V) Learning
- Losers can learn from SG traces.
53Pro (VI) Automatable
- Participants can codify their strategies for
playing SGs.
- Efficient and thorough evaluation.
- Codified strategies are useful by-products.
- Controlled information flow.
54Challenges (I)
- Participants must take opposing sides!
- Neutrality is lost with forcing.
55Con (II) Not Thorough
- Unlike sports games, a single game is not
thorough enough.
56Con (III) Issues Scaling to N-Party Competitions
- In sports, tournaments are used to scale
two-party games to n-party competitions.
57Challenges (II)
- Scaling to N-party competitions using a
tournament, yet:
- Avoid collusion potential, especially in the
context of open online competitions, where Sybil
identities are common and games are too fast to
spectate!
- Ensure that participants get the same chance.
58Challenges (II)
- Scaling to N-party competitions using a
tournament, yet ...
59Issue (II) Neutrality
- Do participants get the same chance?
- We have to force sides on participants.
- We may have vastly different numbers of verifiers
and falsifiers.
60Issue (II) Correctness and Neutrality
- We have to force sides on participants.
- Yet, we cannot penalize forced losers, for
competition correctness.
- We have to ensure that all participants get the
same chance even though we may have vastly
different numbers of verifiers and falsifiers.
61Contributions
- Computational Problem Solving Labs (CPSLs).
- Simplified Semantic Games (SSGs).
- Provably Collusion-Resistant SSG-Tournament
Design.
62Computational Problem Solving Labs (CPSLs)
63CPSLs
- A structured interaction space centered around a
claim.
- Community members contribute by submitting their
strategies for playing an SSG of the lab's claim.
- Submitted strategies compete in a provably
collusion-resistant tournament of simplified
semantic games.
64- Control, Thorough and Efficient Evaluation.
65Codified Strategies
- Efficient and thorough evaluation.
- Useful by-products.
- Controlled information flow.
66CPSLs (II)
- A structured interaction space centered around a
claim.
- Community members contribute by submitting their
strategies for playing an SSG of the lab's claim.
- Once a new strategy is submitted in a CPSL, it
competes against the strategies submitted by
other members in a provably collusion-resistant
tournament of simplified semantic games.
67Highest Safe Rung Problem
- The Highest Safe Rung (HSR) problem is to find
the largest number (n) of stress levels that a
stress testing plan can examine using (q) tests
and (k) copies of the product under test.
- k = 1: n = q, linear search.
- k > q: n = 2^q, binary search.
- 1 < k < q: n = ?, ?
[Figure: k product copies and a ladder with rungs
from 1 (safe) up to n.]
68Computational Problem Solving Lab - Highest Safe
Rung
[Screenshot: the HSR lab admin page, showing (1)
the problem description, (2) the claim ("HSR()
forall Integer q forall Integer k ..."), (3) game
traces with publish/hide controls, and (4)
standings with a save control.]
69Computational Problem Solving Lab - Highest Safe
Rung
[Screenshot: the HSR lab member page for user Sc1,
showing (1) the problem description, (2) a link to
download the claim specification, (3) a link to
download a strategy skeleton, (4) a link to
download traces of past games, (5) an upload
control for new strategies, and (6) the standings
table with columns: rank, member, latest
contribution, # of faults, chosen side.]
70Claim Specification
71Simplified Semantic Games
72SG Rules
73SSGs
- Simpler: auxiliary games replace moves for
conjunctions and disjunctions.
- Thoroughness potential: participants can provide
several values for quantified variables.
74SSG Rules
75HSR Claim Specification
class HSRClaim {
  public static final String[] FORMULAS = new String[] {
    "HSR() forall Integer q forall Integer k exists Integer n " +
    "HSRnqk(n, q, k) and ! exists Integer m greater(m, n) and HSRnqk(m, q, k)",
    "HSRnqk(Integer n, Integer q, Integer k) exists SearchPlan sp correct(sp, n, q, k)"
  };

  public static boolean greater(Integer n, Integer m) { return n > m; }

  public static interface SearchPlan {}

  public static class ConclusionNode implements SearchPlan {
    Integer hsr;
  }

  public static class TestNode implements SearchPlan {
    Integer testRung;
    SearchPlan yes; // What to do when the jar breaks.
    SearchPlan no;  // What to do when the jar does not break.
  }

  public static boolean correct(SearchPlan sp, Integer n, Integer q, Integer k) {
    // sp satisfies the binary search tree property, has n leaves, has
    // depth at most q, and all root-to-leaf paths have at most k
    // yes-branches.
    ...
  }
}
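The elided body of correct is where the admin's interpretation work lives. A hedged sketch of the bookkeeping, using a helper of our own that threads the test and jar budgets through the plan (the binary-search-tree ordering check on testRung values is omitted for brevity):

class HSRCorrectness {
  // Returns the number of leaves of sp if every root-to-leaf path uses
  // at most q tests and at most k yes-branches (jar breaks); -1 otherwise.
  static int leaves(HSRClaim.SearchPlan sp, int q, int k) {
    if (sp instanceof HSRClaim.ConclusionNode) return 1;
    if (q == 0 || k == 0) return -1; // a test is required, but none is possible
    HSRClaim.TestNode t = (HSRClaim.TestNode) sp;
    int yes = leaves(t.yes, q - 1, k - 1); // jar breaks: one jar consumed
    int no = leaves(t.no, q - 1, k);       // jar survives
    return (yes < 0 || no < 0) ? -1 : yes + no;
  }

  public static boolean correct(HSRClaim.SearchPlan sp, Integer n, Integer q, Integer k) {
    return leaves(sp, q, k) == n; // exactly n distinguishable outcomes
  }
}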
76Strategy Specification
- One function per quantified variable.
77HSR Strategy Skeleton
class HSRStrategy {
  public static Iterable<Integer> HSR_q() { ... }
  public static Iterable<Integer> HSR_k(Integer q) { ... }
  public static Iterable<Integer> HSR_n(Integer q, Integer k) { ... }
  public static Iterable<Integer> HSR_m(Integer q, Integer k, Integer n) { ... }
  public static Iterable<SearchPlan> HSRnqk_sp(Integer n, Integer q, Integer k) { ... }
}
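For illustration, a hypothetical fragment of one such strategy: a verifier's HSRnqk_sp returning the linear-search plan, the best a player can do with a single product copy (k = 1). Everything beyond the skeleton's signature is our own sketch; rungs are numbered from 0 here:

import java.util.List;

class LinearSearchFragment {
  // A plan with n leaves that tests rungs bottom-up; the first break
  // pins down the highest safe rung. Uses one jar and n - 1 tests.
  static HSRClaim.SearchPlan linearPlan(int lowestSafe, int n) {
    if (n == 1) {
      HSRClaim.ConclusionNode c = new HSRClaim.ConclusionNode();
      c.hsr = lowestSafe;
      return c;
    }
    HSRClaim.TestNode t = new HSRClaim.TestNode();
    t.testRung = lowestSafe + 1;
    HSRClaim.ConclusionNode broke = new HSRClaim.ConclusionNode();
    broke.hsr = lowestSafe; // it broke one rung above: answer found
    t.yes = broke;
    t.no = linearPlan(lowestSafe + 1, n - 1); // it survived: move up
    return t;
  }

  public static Iterable<HSRClaim.SearchPlan> HSRnqk_sp(Integer n, Integer q, Integer k) {
    return List.of(linearPlan(0, n));
  }
}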
79Semantic Game Tournaments
80Tournament Design
- Scheduler
- Neutral.
- Ranking function
- Correct and anonymous.
- Can mask scheduler deficiencies.
81Ranking Functions
- Input: a beating function representing the output
of several games.
- Output: a total preorder of participants.
82Beating Functions (of SG Tournaments)
- bP(pw, pl, swc, slc, sw) = the sum of all gains
of pw against pl, with pw choosing side swc, pl
choosing side slc, and pw taking side sw.
- More complex than a simple beating relation.
83Ranking Functions (Correctness)
- Non-Negative Regard for Wins.
- Non-Positive Regard for Losses.
84Non-Negative Regard For Wins (NNRW)
- Additional wins cannot worsen Px's rank w.r.t.
other participants.
[Diagram: Px's game record, split into wins and
faults.]
85Non-Positive Regard For Losses (NPRL)
- Additional faults cannot improve Px's rank
w.r.t. other participants.
[Diagram: Px's game record, split into wins and
faults.]
86Ranking Functions (Anonymity)
- Output ranking is independent of participant
identities.
- Ranking function ignores participants'
identities.
- Participants also ignore their opponents'
identities.
87Limited Collusion Effect
- Slightly weaker notion than anonymity.
- What you want in practice.
- A participant Py can choose to lose on purpose
against another participant Px, but that won't
make Px get ahead of any other participant Pz.
88Limited Collusion Effect (LCE)
- Games outside Px's control cannot worsen Px's
rank w.r.t. other participants.
[Diagram: Px's game record, split into wins and
faults.]
89Discovery
- A useful design principle for ranking functions.
- Under NNRW and NPRL: LCE ⇔ LFB.
- LFB is quite unusual to have.
- LFB lends itself to implementation.
90Locally Fault Based (LFB)
- Relative rank of Px and Py depends only on
faults made by either Px or Py.
[Diagram: the game records of Px and Py, each
split into wins and faults.]
92Locally Fault Based (LFB)
93Collusion Resistant Ranking Functions
94Beating Functions
- Represent the outcome of a set of SSGs.
- bP(pw, pl, swc, slc, sw) = the sum of all gains
of pw against pl, with pw choosing side swc, pl
choosing side slc, and pw taking side sw.
95Beating Functions (Operations)
- bPw(px): games px wins.
- bPl(px): games px loses.
- bPfl(px): games px loses while not forced.
- bPc(px) = bPw(px) + bPfl(px): games px controls.
- Beating functions can be added; bP0 is the
identity element.
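A minimal sketch of a beating function as a multiset of game records supporting these operations, assuming sides are encoded as "V" and "F" and that the loser took the side opposite to the one the winner took (all names are ours):

import java.util.ArrayList;
import java.util.List;

class BeatingFunction {
  // One entry per game: winner, loser, both chosen sides, and the side
  // the winner actually took (it may have been forced off its choice).
  record Game(String pw, String pl, String swc, String slc, String sw) {}

  final List<Game> games = new ArrayList<>();

  long wins(String px) { return games.stream().filter(g -> g.pw().equals(px)).count(); }   // bPw
  long losses(String px) { return games.stream().filter(g -> g.pl().equals(px)).count(); } // bPl

  // bPfl: losses while not forced, i.e. the loser was on its chosen side.
  long freeLosses(String px) {
    return games.stream()
        .filter(g -> g.pl().equals(px) && g.slc().equals(opposite(g.sw())))
        .count();
  }

  long controlled(String px) { return wins(px) + freeLosses(px); } // bPc

  // Addition is multiset union; an empty BeatingFunction plays the
  // role of the identity element bP0.
  BeatingFunction plus(BeatingFunction other) {
    BeatingFunction sum = new BeatingFunction();
    sum.games.addAll(this.games);
    sum.games.addAll(other.games);
    return sum;
  }

  static String opposite(String side) { return side.equals("V") ? "F" : "V"; }
}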
96Ranking Functions
- Take a beating function to a ranking.
- Ranking: a total preorder.
97Limited Collusion Effect
- There is no way py's rank can be improved w.r.t.
px's rank behind px's back.
98Non-Negative Regard for Wins
- An extra win cannot worsen px's rank.
99Non-Positive Regard for Losses
- An extra loss cannot improve px's rank.
100Locally Fault Based
- Relative rank of px w.r.t. py depends only on
faults made by either px or py.
101Main Result
- A ranking function satisfying NNRW and NPRL has
the Limited Collusion Effect iff it is Locally
Fault Based.
102Visual Proof
103Fault Counting Ranking Function
- Players are ranked according to the number of
faults they make: the fewer the faults, the
higher the rank.
- Satisfies the NNRW, NPRL, LFB and LCE properties.
104Semantic Game Tournament Design
- For every pair of players:
- If they chose different sides, play a single SG.
- If they chose the same side, play two SGs where
they switch sides.
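A sketch of this pairing logic, assuming a chosen-side map and a callback that plays one SG with a given verifier and falsifier (all names are ours):

import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

class SGTournament {
  // playVF.accept(v, f) plays one SG with v as verifier and f as falsifier.
  static void schedule(List<String> players, Map<String, String> chosenSide,
                       BiConsumer<String, String> playVF) {
    for (int i = 0; i < players.size(); i++) {
      for (int j = i + 1; j < players.size(); j++) {
        String a = players.get(i), b = players.get(j);
        if (!chosenSide.get(a).equals(chosenSide.get(b))) {
          // Different sides: one SG, each player on its chosen side.
          if (chosenSide.get(a).equals("V")) playVF.accept(a, b);
          else playVF.accept(b, a);
        } else {
          // Same side: two SGs with sides switched; each is forced once.
          playVF.accept(a, b);
          playVF.accept(b, a);
        }
      }
    }
  }
}

Each player ends up on its chosen side exactly once against every opponent, which is the count used in the neutrality argument below.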
105Tournament Properties
- Our tournament is neutral.
106Neutrality
- Each player plays nv + nf - 1 SGs on its chosen
side; those are the only games in which it may
make faults.
- With nv verifiers and nf falsifiers, a player
takes its chosen side exactly once against each of
its nv + nf - 1 opponents: in the single game
against each opposite-side player, and in one of
the two side-switched games against each same-side
player.
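As a quick check of this count against the scheduler sketch above, a toy run with two verifiers and one falsifier (names are ours):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NeutralityCheck {
  public static void main(String[] args) {
    Map<String, String> side = Map.of("a", "V", "b", "V", "c", "F");
    Map<String, Integer> chosenSideGames = new HashMap<>();
    SGTournament.schedule(List.of("a", "b", "c"), side, (v, f) -> {
      // Credit the game to each player that is on its chosen side.
      if (side.get(v).equals("V")) chosenSideGames.merge(v, 1, Integer::sum);
      if (side.get(f).equals("F")) chosenSideGames.merge(f, 1, Integer::sum);
    });
    System.out.println(chosenSideGames); // each of a, b, c: nv + nf - 1 = 2
  }
}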
107Related Work
- Rating and Ranking Functions
- Tournament Scheduling
- Match-Level Neutrality
108Rating and Ranking Functions (I)
- Dominated by heuristic approaches:
- Elo ratings.
- Who's #1?
- There are axiomatizations of rating functions in
the field of Paired Comparison Analysis.
- LCE not on radar.
- Independence of Irrelevant Matches (IIM) is
frowned upon.
109Rating and Ranking Functions (II)
- Rubinstein (1980)
- Points system (winner gets a point), characterized
by:
- Anonymity: ranks are independent of the names of
participants.
- Positive responsiveness to the winning relation:
changing the result of a participant p from a
loss to a win guarantees that p's rank improves.
- IIM: the relative ranking of two participants is
independent of matches in which neither is
involved.
- Beating functions are restricted to complete,
asymmetric relations.
110Tournament Scheduling
- Neutrality is off radar.
- Maximizing winning chances for certain players.
- Delayed confrontation.
111Match-Level Neutrality
- Dominated by heuristic approaches
- Compensation points.
- Pie rule.
112Conclusion
- Semantic games of interpreted logic sentences
provide a useful foundation to organize
computational problem solving communities.
114Future Work
- Problem decomposition labs.
- Social Computing.
- Evaluating Thoroughness.
115Questions?
116Thank You!
118N-Party SG-Based Competitions
- A tournament of two-party SG-based competitions
119N-Party SG-Based Competitions Challenges (I)
- Collusion potential especially in the context of
open online competitions.
120N-Party SG-Based Competitions Challenges (II)
- Neutrality.
- Two-party SG-Based competitions are not neutral
when one party is forced.
122Rationale (4) Anonymous
123Rationale (Objective)
- While constructively debating the correctness of
an interpreted predicate logic sentence
specifying a computational problem, participants
provide and solve instances of that computational
problem.
124- ∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
125- ∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
126Semantic Games
127A meaningful competition is
- Correct
- Anonymous
- Neutral
- Objective
- Thorough
128Correctness
- Rank is based on demonstrated possession (or
lack) of skill.
- Suppose that we let participants create
benchmarks of MAX-SAT problems and their
solutions to evaluate their opponents.
- Participants would be incentivized to provide
wrong solutions.
129Anonymous
- Rank is independent of identities.
- There is a potential for collusion among
participants. This potential arises from the
direct communication between participants, and is
aggravated by the open online nature of
competitions.
130Neutral
- The competition does not give an advantage to any
of the participants.
- For example, a seeded tournament where the seed
(or the initial ranking) can affect the final
ranking is not considered neutral.
131Objective
- Ranks are exclusively based on skills that are
precisely defined in the competition definition,
such as solving MAX-SAT problems.
132Thorough
- Ranks are based on solving several MAX-SAT
problems.
133Thesis
- Semantic games of interpreted logic sentences
provide a useful foundation to organize
computational problem solving communities.
134Semantic Games
- Thoroughness means that the competition result is
based on a wide enough range of skills that
participants demonstrate during the competition.