Title: Organizing Open Online Computational Problem Solving Competitions
1Organizing Open Online Computational Problem
Solving Competitions
2- In 2011, researchers from the Harvard Catalyst
Project were investigating the potential of
crowdsourcing genome-sequencing algorithms.
3- So, they collected a few million sequencing
problems and developed an electronic judge that
evaluates sequencing algorithms by how well they
solve these problems.
4- And they set up a two-week open online
competition on TopCoder with a total prize pool
of $6,000.
5- The results were astounding!
6-- Nature Biotechnology, 31(2), pp. 108–111, 2013.
- ... A two-week online contest ... produced over
600 submissions ... . Thirty submissions exceeded
the benchmark performance of the US National
Institutes of Health's MegaBLAST. The best
achieved both greater accuracy and speed (1,000
times greater).
7- We want to lower the barrier to entry for
establishing such competitions by having
meaningful competitions where participants
assist the admin in evaluating their peers.
8Thesis Statement
- Semantic games of interpreted logic sentences
provide a useful foundation to organize
computational problem solving communities.
14Open online competitions have been quite
successful in organizing computational problem
solving communities.
15... A two-week online contest ... produced over
600 submissions ... . Thirty submissions exceeded
the benchmark performance of the US National
Institutes of Health's MegaBLAST. The best
achieved both greater accuracy and speed (1,000
times greater).
-- Nature Biotechnology, 31(2), pp. 108–111, 2013.
16Let's take a closer look at
- state-of-the-art approaches to organizing an open
online competition for solving computational
problems.
- MAX-SAT as a sample problem.
17MAXimum SATisfiability (MAX-SAT) problem
- Input: a Boolean formula in Conjunctive Normal
Form (CNF).
- Output: an assignment satisfying the maximum
number of clauses.
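Since everything that follows is phrased in terms of counting satisfied clauses, a small helper makes the objective concrete. A minimal Java sketch, assuming our own literal encoding (the names fsat, assignment, and cnf are illustrative, not from any competition infrastructure):

class MaxSatEval {
  // Counts the clauses of a CNF formula satisfied by an assignment.
  // Each clause is an array of literals; literal i > 0 stands for
  // variable i, literal i < 0 for its negation (variables 1-indexed).
  public static int fsat(boolean[] assignment, int[][] cnf) {
    int satisfied = 0;
    for (int[] clause : cnf) {
      for (int lit : clause) {
        boolean value = assignment[Math.abs(lit) - 1];
        if (lit > 0 ? value : !value) { satisfied++; break; }
      }
    }
    return satisfied;
  }
}

For example, for (x1 ∨ x2) ∧ (¬x1 ∨ x2) ∧ (¬x2), the assignment x1 = false, x2 = true satisfies 2 of the 3 clauses, and no assignment satisfies all 3.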
18The Omniscient Admin Approach
- A trusted admin prepares a thorough benchmark
of MAX-SAT problem instances together with their
correct solutions.
- This benchmark is used to evaluate individual
MAX-SAT algorithms submitted by participants.
19The Teaching Admin Approach
- Admin prepares a thorough benchmark of MAX-SAT
problems and their model solutions.
- Benchmark used to evaluate individual MAX-SAT
algorithms submitted by participants.
20Cons
- Overhead to collect and solve problems.
- What if the admin incorrectly solves some problems?
21The Open Benchmark Approach
- Admin maintains an open benchmark of problems and
their solutions.
- Participants may object to any of the solutions
before the competition starts.
22Cons
- Over-fitting: participants may tailor their
algorithms to the benchmark.
23The Learning Admin Approach
- An admin prepares a set of MAX-SAT problems and
keeps track of the best solution produced by any
of the algorithms submitted by participants.
- Pioneered by the Foldit team.
24Cons
- Works for optimization problems. Not clear how to
apply it to other computational problems, TQBF for
example.
25Wouldn't it be great if
- we had sports-like OOCs where the admin referees
the competition with minimal overhead?
26However,
27Research Question
- How to organize a meaningful open online
computational problem solving competition where
participants assist in the evaluation of their
opponents?
28Research Question
- How to organize a meaningful, sports-like, open
online computational problem solving competition
where the admin only referees the competition
with minimal overhead?
29Simpler Version
- Meaningful, two-party competitions.
- Admin provides neither benchmark problems nor
their solutions.
30Attempt I
- Each participant prepares a benchmark of problems
and solves their opponent's benchmark problems.
- Admin checks solutions.
- Checking the correctness of a MAX-SAT problem
solution can be an overhead to the admin.
31Attempt II
- Each participant prepares a benchmark of problems
and their solutions.
- Each participant solves their opponent's problems.
- Admin compares both solutions for each problem
to determine the winner.
- Admin has to correctly compare solutions.
- Admin cannot assume any of the solutions to be
correct.
32Attempt II
- Each participant prepares a benchmark of problems
and their model solutions.
- Each participant solves their opponent's problems.
- Admin only compares solutions to model solutions.
33But,
- Participants are incentivized to provide the
wrong model solution.
- Admin should compare solutions without trusting
any of them.
34Thesis
- Semantic games of interpreted logic sentences
provide a useful foundation to organize
computational problem solving communities.
35Semantic Games
- A Semantic Game (SG) is a constructive debate of
the correctness of an interpreted logic sentence
(a.k.a. claim) between two distinguished parties:
the verifier, which asserts that the claim holds,
and the falsifier, which asserts that the claim
does not hold.
36A Two-Party, SG-Based MAX-SAT Competition (I)
∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
- Participants develop functions to:
  - Provide a side preference.
  - Provide values for quantified variables based on
the values of variables in scope.
37A Two-Party, SG-Based MAX-SAT Competition (II)
∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
- Admin chooses sides for players based on their
side preference.
- Let Pv be the verifier and Pf be the falsifier.
38A Two-Party, SG-Based MAX-SAT Competition (III)
∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
- Admin gets the value provided by Pf for f.
- Admin checks f ∈ CNFs. If false, Pf loses.
- Admin gets the value provided by Pv for v.
- Admin checks v ∈ assignments(f). If false, Pv
loses.
39A Two-Party, SG-Based MAX-SAT Competition (IV)
∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
- Admin gets the value provided by Pf for fa.
- Admin checks fa ∈ assignments(f). If false, Pf
loses.
- Admin evaluates fsat(fa, f) ≤ fsat(v, f). If true,
Pv wins; otherwise Pf wins.
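The four steps above (spread over slides 37 to 39) amount to a short refereeing routine. A minimal sketch, assuming hypothetical Player callbacks and reusing the fsat helper sketched earlier; all identifiers are ours, not the actual competition infrastructure:

interface Player {
  int[][] provideFormula();               // the falsifier's value for f
  boolean[] provideAssignment(int[][] f); // a value for v (or fa)
}

class MaxSatSGAdmin {
  // Referees one SG of the MAX-SAT claim; returns true iff Pv wins.
  static boolean referee(Player pf, Player pv) {
    int[][] f = pf.provideFormula();
    if (!isCNF(f)) return true;            // f not in CNFs: Pf loses
    boolean[] v = pv.provideAssignment(f);
    if (!isAssignment(v, f)) return false; // v not an assignment: Pv loses
    boolean[] fa = pf.provideAssignment(f);
    if (!isAssignment(fa, f)) return true; // fa not an assignment: Pf loses
    return MaxSatEval.fsat(fa, f) <= MaxSatEval.fsat(v, f);
  }

  static boolean isCNF(int[][] f) {
    if (f == null || f.length == 0) return false;
    for (int[] clause : f) {
      if (clause == null || clause.length == 0) return false;
      for (int lit : clause) if (lit == 0) return false;
    }
    return true;
  }

  static boolean isAssignment(boolean[] a, int[][] f) {
    int vars = 0; // highest variable index occurring in f
    for (int[] clause : f)
      for (int lit : clause) vars = Math.max(vars, Math.abs(lit));
    return a != null && a.length >= vars;
  }
}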
40Rationale (I)
∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
- Controllable admin overhead.
∀f ∈ CNFs ∃v ∈ assignments(f). satisfies-max(v, f)
41Rationale (II)
- Correct: there is a winning strategy for
verifiers of true claims and for falsifiers of
false claims, regardless of the opponent's actions.
42Rationale (III)
- Objective.
- Systematic.
- Learning chances.
43SG-Based Two-Party Competitions
- We let participants debate the correctness of an
interpreted predicate logic sentence specifying
the computational problem of interest, assuming
that participants choose to take opposite sides.
44Out-of-The-Box, SG-Based, Two-Party MAX-SAT
Competition
- ∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
- 1. Falsifier provides a CNF formula f.
- 2. Verifier provides an assignment v.
- 3. Falsifier provides an assignment fa.
- 4. Admin evaluates fsat(fa, f) ≤ fsat(v, f). If
true, verifier wins. Otherwise, falsifier wins.
45- Pros and cons of out-of-the-box, SG-based,
two-party competitions: do they solve our simpler
version?
- Meaningful, two-party competitions.
- Admin provides neither benchmark problems nor
their solutions.
46Pro (I) Systematic
- The rules of an SG are systematically derived
from the syntax of its underlying claim.
- SGs are also defined for other logics.
47Rules of SG(⟨φ, A⟩, v, f)
φ          Move                    Next game
∀x φ(x)    f provides x0           SG(⟨φ[x0/x], A⟩, v, f)
φ ∧ ψ      f chooses θ ∈ {φ, ψ}    SG(⟨θ, A⟩, v, f)
∃x φ(x)    v provides x0           SG(⟨φ[x0/x], A⟩, v, f)
φ ∨ ψ      v chooses θ ∈ {φ, ψ}    SG(⟨θ, A⟩, v, f)
¬φ         N/A                     SG(⟨φ, A⟩, f, v)
P(t0)      N/A                     v wins if P(t0) holds, o/w f wins
The Game of Language: Studies in
Game-Theoretical Semantics and Its Applications
-- Kulas and Hintikka, 1983
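To make the table concrete, here is a hedged sketch of these rules for the propositional connectives, with player strategies as callbacks. Quantifier rows work the same way, except that the mover supplies a witness value that is substituted into the formula. The AST and Strategy types are our own illustration, not the CPSL implementation:

abstract class Formula {}
final class Atom extends Formula { final boolean holds; Atom(boolean h) { holds = h; } }
final class Not extends Formula { final Formula phi; Not(Formula p) { phi = p; } }
final class And extends Formula { final Formula l, r; And(Formula a, Formula b) { l = a; r = b; } }
final class Or extends Formula { final Formula l, r; Or(Formula a, Formula b) { l = a; r = b; } }

interface Strategy {
  Formula choose(Formula l, Formula r); // pick the subformula to continue with
}

class SemanticGame {
  // Plays SG(phi, v, f): the falsifier moves on conjunctions, the
  // verifier moves on disjunctions, negation swaps the parties, and
  // an atom decides the game. Returns true iff the verifier wins.
  static boolean verifierWins(Formula phi, Strategy v, Strategy f) {
    if (phi instanceof Atom a) return a.holds;
    if (phi instanceof Not n) return !verifierWins(n.phi, f, v); // roles swap
    if (phi instanceof And c) return verifierWins(f.choose(c.l, c.r), v, f);
    Or d = (Or) phi;
    return verifierWins(v.choose(d.l, d.r), v, f);
  }
}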
48Pro (II) Objective
- Competition result is based on skills that are
precisely defined in the competition definition.
49Pro (III) Correct
- Competition result is based on demonstrated
possession (or lack) of skill.
- Problems incorrectly solved by the admin or an
opponent cannot worsen a participant's rank.
- There is a winning strategy for verifiers of true
claims and falsifiers of false claims, regardless
of the opponent's actions.
50Pro (III) Correct
- There is a winning strategy for verifiers of true
claims and falsifiers of false claims, regardless
of the opponent's actions.
51Pro (IV) Controllable Admin Overhead
- Admin overhead is to implement the structure
interpreting the logic statement specifying a
computational problem.
- It is always possible to strip functionality out
of the interpreting structure at the cost of
adding complexity to the logic statement.
52Pro (V) Learning
- Losers can learn from SG traces.
53Pro (VI) Automatable
- Participants can codify their strategies for
playing SGs.
- Efficient and thorough evaluation.
- Codified strategies are useful by-products.
- Controlled information flow.
54Challenges (I)
- Participants must take opposing sides!
- Neutrality is lost with forcing.
55Con (II) Not Thorough
- Unlike sports games, a single game is not
thorough enough.
56Con (III) Issues Scaling to N-Party Competitions
- In sports, tournaments are used to scale
two-party games to n-party competitions.
57Challenges (II)
- Scaling to N-party competitions using a
tournament, yet:
- Avoid collusion potential, especially in the
context of open online competitions, where Sybil
identities are common and games are too fast to
spectate!
- Ensure that participants get the same chance.
58Challenges (II)
- Scaling to N-party competitions using a
tournament, yet ...
59Issue (II) Neutrality
- Do participants get the same chance?
- We have to force sides on participants.
- We may have vastly different numbers of verifiers
and falsifiers.
60Issue (II) Correctness and Neutrality
- We have to force sides on participants.
- Yet, we cannot penalize forced losers, for
competition correctness.
- We have to ensure that all participants get the
same chance even though we may have vastly
different numbers of verifiers and falsifiers.
61Contributions
- Computational Problem Solving Labs (CPSLs).
- Simplified Semantic Games (SSGs).
- Provably Collusion-Resistant SSG-Tournament
Design.
62Computational Problem Solving Labs (CPSLs)
63CPSLs
- A structured interaction space centered around a
claim.
- Community members contribute by submitting their
strategies for playing an SSG of the lab's claim.
- Submitted strategies compete in a provably
collusion-resistant tournament of simplified
semantic games.
64- Control, Thorough and Efficient Evaluation.
65Codified Strategies
- Efficient and thorough evaluation.
- Useful by-products.
- Controlled information flow.
66CPSLs (II)
- A structured interaction space centered around a
claim.
- Community members contribute by submitting their
strategies for playing an SSG of the lab's claim.
- Once a new strategy is submitted in a CPSL, it
competes against the strategies submitted by
other members in a provably collusion-resistant
tournament of simplified semantic games.
67Highest Safe Rung Problem
- The Highest Safe Rung (HSR) problem is to find
the largest number (n) of stress levels that a
stress testing plan can examine using (q) tests
and (k) copies of the product under test.
- k = 1: n = q, linear search.
- k > q: n = 2^q, binary search.
- 1 < k < q: n = ?, ?
[Figure: k product copies and a ladder with rungs
from 1 (safe) up to n.]
68Computational Problem Solving Lab - Highest Safe
Rung
[Screenshot: the HSR lab admin page, showing (1)
the problem description, (2) the claim ("HSR()
forall Integer q forall Integer k ..."), (3) game
traces with publish/hide controls, and (4)
standings with a save control.]
69Computational Problem Solving Lab - Highest Safe
Rung
[Screenshot: the HSR lab member page for user Sc1,
showing (1) the problem description, (2) a link to
download the claim specification, (3) a link to
download a strategy skeleton, (4) a link to
download traces of past games, (5) an upload
control for new strategies, and (6) the standings
table with columns: rank, member, latest
contribution, # of faults, chosen side.]
70Claim Specification
71Simplified Semantic Games
72SG Rules
73SSGs
- Simpler: auxiliary games replace moves for
conjunctions and disjunctions.
- Thoroughness potential: participants can provide
several values for quantified variables.
74SSG Rules
75HSR Claim Specification
class HSRClaim {
  public static final String[] FORMULAS = new String[] {
    "HSR() forall Integer q forall Integer k exists Integer n " +
    "HSRnqk(n, q, k) and ! exists Integer m greater(m, n) and HSRnqk(m, q, k)",
    "HSRnqk(Integer n, Integer q, Integer k) exists SearchPlan sp correct(sp, n, q, k)"
  };

  public static boolean greater(Integer n, Integer m) { return n > m; }

  public static interface SearchPlan {}

  public static class ConclusionNode implements SearchPlan {
    Integer hsr;
  }

  public static class TestNode implements SearchPlan {
    Integer testRung;
    SearchPlan yes; // What to do when the jar breaks.
    SearchPlan no;  // What to do when the jar does not break.
  }

  public static boolean correct(SearchPlan sp, Integer n, Integer q, Integer k) {
    // sp satisfies the binary search tree property, has n leaves, has
    // depth at most q, and all root-to-leaf paths have at most k
    // yes-branches.
    ...
  }
}
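The elided body of correct is where the admin's interpretation work lives. A hedged sketch of the bookkeeping, using a helper of our own that threads the test and jar budgets through the plan (the binary-search-tree ordering check on testRung values is omitted for brevity):

class HSRCorrectness {
  // Returns the number of leaves of sp if every root-to-leaf path uses
  // at most q tests and at most k yes-branches (jar breaks); -1 otherwise.
  static int leaves(HSRClaim.SearchPlan sp, int q, int k) {
    if (sp instanceof HSRClaim.ConclusionNode) return 1;
    if (q == 0 || k == 0) return -1; // a test is required, but none is possible
    HSRClaim.TestNode t = (HSRClaim.TestNode) sp;
    int yes = leaves(t.yes, q - 1, k - 1); // jar breaks: one jar consumed
    int no = leaves(t.no, q - 1, k);       // jar survives
    return (yes < 0 || no < 0) ? -1 : yes + no;
  }

  public static boolean correct(HSRClaim.SearchPlan sp, Integer n, Integer q, Integer k) {
    return leaves(sp, q, k) == n; // exactly n distinguishable outcomes
  }
}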
76Strategy Specification
- One function per quantified variable.
77HSR Strategy Skeleton
class HSRStrategy {
  public static Iterable<Integer> HSR_q() { ... }
  public static Iterable<Integer> HSR_k(Integer q) { ... }
  public static Iterable<Integer> HSR_n(Integer q, Integer k) { ... }
  public static Iterable<Integer> HSR_m(Integer q, Integer k, Integer n) { ... }
  public static Iterable<SearchPlan> HSRnqk_sp(Integer n, Integer q, Integer k) { ... }
}
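For illustration, a hypothetical fragment of one such strategy: a verifier's HSRnqk_sp returning the linear-search plan, the best a player can do with a single product copy (k = 1). Everything beyond the skeleton's signature is our own sketch; rungs are numbered from 0 here:

import java.util.List;

class LinearSearchFragment {
  // A plan with n leaves that tests rungs bottom-up; the first break
  // pins down the highest safe rung. Uses one jar and n - 1 tests.
  static HSRClaim.SearchPlan linearPlan(int lowestSafe, int n) {
    if (n == 1) {
      HSRClaim.ConclusionNode c = new HSRClaim.ConclusionNode();
      c.hsr = lowestSafe;
      return c;
    }
    HSRClaim.TestNode t = new HSRClaim.TestNode();
    t.testRung = lowestSafe + 1;
    HSRClaim.ConclusionNode broke = new HSRClaim.ConclusionNode();
    broke.hsr = lowestSafe; // it broke one rung above: answer found
    t.yes = broke;
    t.no = linearPlan(lowestSafe + 1, n - 1); // it survived: move up
    return t;
  }

  public static Iterable<HSRClaim.SearchPlan> HSRnqk_sp(Integer n, Integer q, Integer k) {
    return List.of(linearPlan(0, n));
  }
}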
79Semantic Game Tournaments
80Tournament Design
- Scheduler
- Neutral.
- Ranking function
- Correct and anonymous.
- Can mask scheduler deficiencies.
81Ranking Functions
- Input: a beating function representing the output
of several games.
- Output: a total preorder of participants.
82Beating Functions (of SG Tournaments)
- bP(pw, pl, swc, slc, sw) = the sum of all gains
of pw against pl, with pw choosing side swc, pl
choosing side slc, and pw taking side sw.
- More complex than a simple beating relation.
83Ranking Functions (Correctness)
- Non-Negative Regard for Wins.
- Non-Positive Regard for Losses.
84Non-Negative Regard For Wins (NNRW)
- Additional wins cannot worsen Px's rank w.r.t.
other participants.
[Diagram: Px's game record, split into wins and
faults.]
85Non-Positive Regard For Losses (NPRL)
- Additional faults cannot improve Px's rank
w.r.t. other participants.
[Diagram: Px's game record, split into wins and
faults.]
86Ranking Functions (Anonymity)
- Output ranking is independent of participant
identities.
- Ranking function ignores participants'
identities.
- Participants also ignore their opponents'
identities.
87Limited Collusion Effect
- Slightly weaker notion than anonymity.
- What you want in practice.
- A participant Py can choose to lose on purpose
against another participant Px, but that won't
make Px get ahead of any other participant Pz.
88Limited Collusion Effect (LCE)
- Games outside Px's control cannot worsen Px's
rank w.r.t. other participants.
[Diagram: Px's game record, split into wins and
faults.]
89Discovery
- A useful design principle for ranking functions.
- Under NNRW and NPRL: LCE ⇔ LFB.
- LFB is quite unusual to have.
- LFB lends itself to implementation.
90Locally Fault Based (LFB)
- Relative rank of Px and Py depends only on
faults made by either Px or Py.
[Diagram: the game records of Px and Py, each
split into wins and faults.]
92Locally Fault Based (LFB)
93Collusion Resistant Ranking Functions
94Beating Functions
- Represent the outcome of a set of SSGs.
- bP(pw, pl, swc, slc, sw) = the sum of all gains
of pw against pl, with pw choosing side swc, pl
choosing side slc, and pw taking side sw.
95Beating Functions (Operations)
- bPw(px): games px wins.
- bPl(px): games px loses.
- bPfl(px): games px loses while not forced.
- bPc(px) = bPw(px) + bPfl(px): games px controls.
- Beating functions can be added; bP0 is the
identity element.
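A minimal sketch of a beating function as a multiset of game records supporting these operations, assuming sides are encoded as "V" and "F" and that the loser took the side opposite to the one the winner took (all names are ours):

import java.util.ArrayList;
import java.util.List;

class BeatingFunction {
  // One entry per game: winner, loser, both chosen sides, and the side
  // the winner actually took (it may have been forced off its choice).
  record Game(String pw, String pl, String swc, String slc, String sw) {}

  final List<Game> games = new ArrayList<>();

  long wins(String px) { return games.stream().filter(g -> g.pw().equals(px)).count(); }   // bPw
  long losses(String px) { return games.stream().filter(g -> g.pl().equals(px)).count(); } // bPl

  // bPfl: losses while not forced, i.e. the loser was on its chosen side.
  long freeLosses(String px) {
    return games.stream()
        .filter(g -> g.pl().equals(px) && g.slc().equals(opposite(g.sw())))
        .count();
  }

  long controlled(String px) { return wins(px) + freeLosses(px); } // bPc

  // Addition is multiset union; an empty BeatingFunction plays the
  // role of the identity element bP0.
  BeatingFunction plus(BeatingFunction other) {
    BeatingFunction sum = new BeatingFunction();
    sum.games.addAll(this.games);
    sum.games.addAll(other.games);
    return sum;
  }

  static String opposite(String side) { return side.equals("V") ? "F" : "V"; }
}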
96Ranking Functions
- Take a beating function to a ranking.
- Ranking: a total preorder.
97Limited Collusion Effect
- There is no way py's rank can be improved w.r.t.
px's rank behind px's back.
98Non-Negative Regard for Wins
- An extra win cannot worsen px's rank.
99Non-Positive Regard for Losses
- An extra loss cannot improve px's rank.
100Locally Fault Based
- Relative rank of px w.r.t. py depends only on
faults made by either px or py.
101Main Result
- A ranking function satisfying NNRW and NPRL has
the Limited Collusion Effect iff it is Locally
Fault Based.
102Visual Proof
103Fault Counting Ranking Function
- Players are ranked according to the number of
faults they make: the fewer the faults, the
higher the rank.
- Satisfies the NNRW, NPRL, LFB and LCE properties.
104Semantic Game Tournament Design
- For every pair of players:
- If they chose different sides, play a single SG.
- If they chose the same side, play two SGs where
they switch sides.
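A sketch of this pairing logic, assuming a chosen-side map and a callback that plays one SG with a given verifier and falsifier (all names are ours):

import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

class SGTournament {
  // playVF.accept(v, f) plays one SG with v as verifier and f as falsifier.
  static void schedule(List<String> players, Map<String, String> chosenSide,
                       BiConsumer<String, String> playVF) {
    for (int i = 0; i < players.size(); i++) {
      for (int j = i + 1; j < players.size(); j++) {
        String a = players.get(i), b = players.get(j);
        if (!chosenSide.get(a).equals(chosenSide.get(b))) {
          // Different sides: one SG, each player on its chosen side.
          if (chosenSide.get(a).equals("V")) playVF.accept(a, b);
          else playVF.accept(b, a);
        } else {
          // Same side: two SGs with sides switched; each is forced once.
          playVF.accept(a, b);
          playVF.accept(b, a);
        }
      }
    }
  }
}

Each player ends up on its chosen side exactly once against every opponent, which is the count used in the neutrality argument below.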
105Tournament Properties
- Our tournament is neutral.
106Neutrality
- Each player plays nv + nf - 1 SGs on its chosen
side; those are the only games in which it may
make faults.
- With nv verifiers and nf falsifiers, a player
takes its chosen side exactly once against each of
its nv + nf - 1 opponents: in the single game
against each opposite-side player, and in one of
the two side-switched games against each same-side
player.
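As a quick check of this count against the scheduler sketch above, a toy run with two verifiers and one falsifier (names are ours):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NeutralityCheck {
  public static void main(String[] args) {
    Map<String, String> side = Map.of("a", "V", "b", "V", "c", "F");
    Map<String, Integer> chosenSideGames = new HashMap<>();
    SGTournament.schedule(List.of("a", "b", "c"), side, (v, f) -> {
      // Credit the game to each player that is on its chosen side.
      if (side.get(v).equals("V")) chosenSideGames.merge(v, 1, Integer::sum);
      if (side.get(f).equals("F")) chosenSideGames.merge(f, 1, Integer::sum);
    });
    System.out.println(chosenSideGames); // each of a, b, c: nv + nf - 1 = 2
  }
}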
107Related Work
- Rating and Ranking Functions
- Tournament Scheduling
- Match-Level Neutrality
108Rating and Ranking Functions (I)
- Dominated by heuristic approaches:
- Elo ratings.
- Who's #1?
- There are axiomatizations of rating functions in
the field of Paired Comparison Analysis.
- LCE not on radar.
- Independence of Irrelevant Matches (IIM) is
frowned upon.
109Rating and Ranking Functions (II)
- Rubinstein (1980)
- Points system (winner gets a point), characterized
by:
- Anonymity: ranks are independent of the names of
participants.
- Positive responsiveness to the winning relation:
changing the result of a participant p from a
loss to a win guarantees that p's rank improves.
- IIM: the relative ranking of two participants is
independent of matches in which neither is
involved.
- Beating functions are restricted to complete,
asymmetric relations.
110Tournament Scheduling
- Neutrality is off radar.
- Maximizing winning chances for certain players.
- Delayed confrontation.
111Match-Level Neutrality
- Dominated by heuristic approaches
- Compensation points.
- Pie rule.
112Conclusion
- Semantic games of interpreted logic sentences
provide a useful foundation to organize
computational problem solving communities.
114Future Work
- Problem decomposition labs.
- Social Computing.
- Evaluating Thoroughness.
115Questions?
116Thank You!
118N-Party SG-Based Competitions
- A tournament of two-party SG-based competitions
119N-Party SG-Based Competitions Challenges (I)
- Collusion potential especially in the context of
open online competitions.
120N-Party SG-Based Competitions Challenges (II)
- Neutrality.
- Two-party SG-Based competitions are not neutral
when one party is forced.
122Rationale (4) Anonymous
123Rationale (Objective)
- While constructively debating the correctness of
an interpreted predicate logic sentence
specifying a computational problem, participants
provide and solve instances of that computational
problem.
124- ∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
125- ∀f ∈ CNFs ∃v ∈ assignments(f) ∀fa ∈ assignments(f). fsat(fa, f) ≤ fsat(v, f)
126Semantic Games
127A meaningful competition is
- Correct
- Anonymous
- Neutral
- Objective
- Thorough
128Correctness
- Rank is based on demonstrated possession (or
lack) of skill.
- Suppose that we let participants create
benchmarks of MAX-SAT problems and their
solutions to evaluate their opponents.
- Participants would be incentivized to provide
wrong solutions.
129Anonymous
- Rank is independent of identities.
- There is a potential for collusion among
participants. This potential arises from the
direct communication between participants, and is
aggravated by the open online nature of
competitions.
130Neutral
- The competition does not give an advantage to any
of the participants.
- For example, a seeded tournament where the seed
(or the initial ranking) can affect the final
ranking is not considered neutral.
131Objective
- Ranks are exclusively based on skills that are
precisely defined in the competition definition,
such as solving MAX-SAT problems.
132Thorough
- Ranks are based on solving several MAX-SAT
problems.
133Thesis
- Semantic games of interpreted logic sentences
provide a useful foundation to organize
computational problem solving communities.
134Semantic Games
- Thoroughness means that the competition result is
based on a wide enough range of skills that
participants demonstrate during the competition.