Title: Shailendra Mishra
Case Study: Collusion Detection in Multi-Player Games
- Consider the following problem:
  - (A) commits identity theft.
  - (A) acquires (n) credit cards as a result of the identity theft.
  - (A) goes to an online gaming site and uses the credit cards to play online poker with his (s) friends.
  - (A) loses all his money to his (s) friends.
  - The online gaming company now has to pay his friends.
- Analysis of the problem:
  - Assume, for a moment, that (A) didn't commit identity theft.
  - (A) is playing with his friends, either fairly or otherwise.
  - The results of this game generate a stream of outcomes: wins and losses by (A) against any of his friends $i$, where $1 \le i \le s$.
  - The problem is to detect whether the pattern of wins and losses is genuine or not.
  - More formally, we are asking: when is a certain number of occurrences of a particular subsequence unlikely to be fortuitous?
Modeling the Collusion Detection Problem
- Let $T$ be an ordered sequence of events.
- Let $W$ be the window of observation, of size $w$, to which the analysis is confined.
- Formally, consider an alphabet $\Sigma$ of cardinality $|\Sigma|$.
- Consider an event sequence $T = t_1 t_2 \cdots t_n$ of length $n$ over $\Sigma$.
- We then define an episode over $\Sigma$ as one of the following:
  - a single pattern $S = s_1 s_2 s_3 \cdots s_m$ of length $m$;
  - a set of patterns $\{S_1, S_2, \dots, S_d\}$;
  - the set of all distinct permutations of $S$, where ordering within the window of observation doesn't matter.
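The three kinds of episode differ only in what counts as "present" in a window. The Python sketch below makes that concrete; it assumes events are single-character symbols (for instance, a hypothetical encoding of game outcomes) and the function names are illustrative, not taken from the source. For the permutation episode it uses the fact that a window contains some ordering of $S$ as a subsequence exactly when it holds every symbol of $S$ at least as often as $S$ does.

```python
from collections import Counter
from typing import Iterable, Sequence


def has_subsequence(window: Sequence[str], pattern: Sequence[str]) -> bool:
    """Single pattern S: True if S occurs in the window as a (not necessarily
    contiguous) subsequence."""
    it = iter(window)
    # `symbol in it` consumes the iterator up to the first match: a greedy scan
    return all(symbol in it for symbol in pattern)


def has_any_pattern(window: Sequence[str], patterns: Iterable[Sequence[str]]) -> bool:
    """Set of patterns {S1, ..., Sd}: any one of them occurring is enough."""
    return any(has_subsequence(window, s) for s in patterns)


def has_any_permutation(window: Sequence[str], pattern: Sequence[str]) -> bool:
    """All distinct permutations of S: order inside the window is ignored, so it
    suffices that the window's symbol counts cover the symbol counts of S."""
    need = Counter(pattern)
    have = Counter(window)
    return all(have[sym] >= cnt for sym, cnt in need.items())
```

For example, the window `"bca"` does not contain `"ab"` in order, but it does contain a permutation of it.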
Formal Statement of the Problem
- Assume the event sequence is generated by a memoryless (Bernoulli) or Markov source.
- Let's restate our problem formally: we are interested in finding $\Omega^{\exists}(n,w,m)$, the number of windows containing at least one occurrence of $S$ when sliding the window over $n$ events of $T$.
- To address this:
  - compute the expected value $E[\Omega^{\exists}(n,w,m)]$;
  - compute $\mathrm{Var}[\Omega^{\exists}(n,w,m)]$;
  - show that $\Omega^{\exists}(n,w,m)$, suitably normalized, converges to a normal distribution.
- This allows us to set a threshold $\tau(n,w,m)$ such that, for a given confidence level $\beta$, $P(\Omega^{\exists}(n,w,m) > \tau(n,w,m)) < \beta$.
- It follows that if more than $\tau(n,w,m)$ such windows are observed, it is highly unlikely that such a number was generated by randomness.
Formulation of an Equivalent Pattern Matching Problem
- Given an alphabet $\Sigma = \{a_1, a_2, \dots, a_{|\Sigma|}\}$ and a pattern $S = s_1 s_2 \cdots s_m$ of length $m$.
- Search for occurrences of $S$ as a subsequence within a window $W$ of size $w$ in another sequence, the event sequence $T = t_1 t_2 \cdots t_n$ of length $n$.
- A valid occurrence of $S$ in $T$ corresponds to a set of integers $i_1, i_2, \dots, i_m$ such that the following hold:
  - $1 \le i_1 < i_2 < \cdots < i_m \le n$
  - $t_{i_1} = s_1,\ t_{i_2} = s_2,\ \dots,\ t_{i_m} = s_m$
  - $i_m - i_1 < w$
- We now estimate $\Omega^{\exists}(n,w,m,S,\Sigma)$, the number of windows that contain at least one occurrence of $S$ when sliding the window over $n$ consecutive events of the event sequence $T$ over alphabet $\Sigma$.
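The definition can be checked mechanically on small inputs. In this Python sketch (all names hypothetical), `valid_occurrences` enumerates index sets straight from the three conditions, using 0-based indices and brute force, so it is only meant for tiny examples with $m \ge 1$. `count_windows` computes $\Omega^{\exists}$ with a greedy subsequence scan. One assumption is made loudly: windows are taken to be the $n - w + 1$ contiguous blocks `T[j:j+w]` lying fully inside $T$, since the slide does not fix how windows at the sequence boundaries are counted. Any occurrence found inside such a block automatically satisfies $i_m - i_1 < w$.

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def valid_occurrences(T: Sequence[str], S: Sequence[str], w: int) -> Iterator[Tuple[int, ...]]:
    """Yield every index tuple (i_1, ..., i_m) meeting the three conditions,
    with 0-based indices (0 <= i_1 < ... < i_m <= n - 1). Brute force, O(n^m)."""
    for idx in combinations(range(len(T)), len(S)):  # strictly increasing tuples
        if all(T[i] == s for i, s in zip(idx, S)) and idx[-1] - idx[0] < w:
            yield idx


def has_subsequence(window: Sequence[str], pattern: Sequence[str]) -> bool:
    it = iter(window)
    return all(symbol in it for symbol in pattern)  # greedy left-to-right scan


def count_windows(T: Sequence[str], S: Sequence[str], w: int) -> int:
    """Omega-exists: number of windows T[j:j+w], j = 0..n-w (assumed convention),
    that contain S as a subsequence."""
    return sum(1 for j in range(len(T) - w + 1) if has_subsequence(T[j:j + w], S))
```

With `T = "abcab"`, `S = "ab"` and `w = 3`, the valid occurrences are `(0, 1)` and `(3, 4)`, and two of the three windows (`"abc"` and `"cab"`) contain `S`.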
Theorems: Results of Gwadera, Atallah and Szpankowski (Purdue)
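- Consider a memoryless source with $p_i$ being the probability of generating symbol $a_i \in \Sigma$.
- Also, assume $P(S) = \prod_{i=1}^{m} p_i$, where in this product $p_i$ denotes the probability of the $i$-th pattern symbol $s_i$.
- Result 1: the probability that a window of size $w$ contains at least one occurrence of episode $S$. For all $m$ and $w > m$ we have
  $$P^{\exists}(w,m) = P(S) \sum_{i=0}^{w-m} \sum_{n_1 + \cdots + n_m = i} \prod_{k=1}^{m} q_k^{n_k},$$
  where $q_k = 1 - p_k$.
- Result 2: let $m$ be fixed and let $i \neq j \Rightarrow p_i \neq p_j$. Then for any $\varepsilon > 0$,
  $$P^{\exists}(w,m) = 1 - P(S) \sum_{i=1}^{m} \frac{(1-p_i)^{w}}{p_i} \prod_{j \neq i} \frac{1}{p_j - p_i} + O(\varepsilon^{w}),$$
  where $w \to \infty$.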
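Both results can be evaluated numerically. In this Python sketch (function names hypothetical), `p[k-1]` stands for the probability of the $k$-th pattern symbol $s_k$, the reading under which $P(S) = \prod p_i$. `p_exists_result1` evaluates the double sum of Result 1 by expanding $\prod_k 1/(1 - q_k z)$ as a power series truncated at degree $w - m$: its coefficient of $z^i$ is exactly the inner sum over $n_1 + \cdots + n_m = i$. `p_exists_result2` evaluates the Result 2 expression without the error term and requires pairwise-distinct $p_i$.

```python
from math import prod
from typing import Sequence


def p_exists_result1(p: Sequence[float], w: int) -> float:
    """Result 1: P(S) * sum_{i=0}^{w-m} sum_{n_1+...+n_m=i} prod_k q_k^{n_k}."""
    m = len(p)
    if w < m:
        return 0.0  # a window shorter than S cannot contain it
    top = w - m
    # h[i] ends up as the coefficient of z^i in prod_k 1/(1 - q_k z)
    h = [1.0] + [0.0] * top
    for pk in p:
        qk = 1.0 - pk
        for i in range(1, top + 1):
            h[i] += qk * h[i - 1]  # multiply the truncated series by 1/(1 - qk z)
    return prod(p) * sum(h)


def p_exists_result2(p: Sequence[float], w: int) -> float:
    """Result 2 without the O(eps^w) term; assumes the p_i are pairwise distinct."""
    total = 0.0
    for i, pi in enumerate(p):
        term = (1.0 - pi) ** w / pi
        for j, pj in enumerate(p):
            if j != i:
                term /= pj - pi
        total += term
    return 1.0 - prod(p) * total
```

For `p = (0.5, 0.25)` and `w = 3`, both functions give `0.28125`; for other pairwise-distinct probabilities the two should agree up to floating-point rounding.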
Computation of Bounds
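- Assume a memoryless source. Then, for $x = O(1)$ and for fixed $m$ and $w$, we have
  $$\lim_{n \to \infty} P\left\{ \frac{\Omega^{\exists}(n,w,m) - E[\Omega^{\exists}(n,w,m)]}{\sqrt{\mathrm{Var}[\Omega^{\exists}(n,w,m)]}} < x \right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt.$$
- Now let's establish the threshold $\tau(n,w,m)$.
  - First, for a given $\beta$, find $\alpha_0$ such that
    $$\beta = \frac{1}{\sqrt{2\pi}} \int_{\alpha_0}^{\infty} e^{-t^2/2}\,dt = P\{N(0,1) > \alpha_0\},$$
    where $N(0,1)$ is the standard normal distribution.
  - We set the threshold
    $$\tau(n,w,m) = E[\Omega^{\exists}(n,w,m)] + \alpha_0 \sqrt{\mathrm{Var}[\Omega^{\exists}(n,w,m)]}.$$
- As long as we are in the region where the central limit theorem applies,
  $$P\{\Omega^{\exists}(n,w,m) > \tau(n,w,m)\} < \beta.$$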
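Once $E$ and $\mathrm{Var}$ are known, the threshold rule is a one-liner: $\alpha_0 = \Phi^{-1}(1 - \beta)$, so $\beta = 0.05$ gives $\alpha_0 \approx 1.645$. The Python sketch below applies it, but note a swapped-in technique: the slides derive $E$ and $\mathrm{Var}$ analytically, whereas `simulated_threshold` here only estimates them by Monte Carlo simulation of a memoryless source. Function names, the symbol probabilities, and the window convention (the $n - w + 1$ full windows) are assumptions for illustration. Under that convention, linearity of expectation gives $E[\Omega^{\exists}] = (n - w + 1)\,P^{\exists}(w,m)$, which is a useful sanity check on the simulated mean.

```python
import random
from statistics import NormalDist, fmean, pvariance
from typing import Dict, Sequence, Tuple


def has_subsequence(window: Sequence[str], pattern: Sequence[str]) -> bool:
    it = iter(window)
    return all(symbol in it for symbol in pattern)  # greedy subsequence test


def count_windows(events: Sequence[str], pattern: Sequence[str], w: int) -> int:
    # Omega-exists over the n - w + 1 full windows events[j:j+w] (assumed convention)
    return sum(1 for j in range(len(events) - w + 1)
               if has_subsequence(events[j:j + w], pattern))


def threshold(mean: float, var: float, beta: float) -> float:
    """tau = E + alpha0 * sqrt(Var), where P(N(0,1) > alpha0) = beta."""
    alpha0 = NormalDist().inv_cdf(1.0 - beta)
    return mean + alpha0 * var ** 0.5


def simulated_threshold(probs: Dict[str, float], pattern: str, n: int, w: int,
                        beta: float, trials: int = 2000,
                        seed: int = 0) -> Tuple[float, float, float]:
    """Monte Carlo stand-in for the analytical E and Var (memoryless source)."""
    rng = random.Random(seed)
    symbols = list(probs)
    weights = [probs[s] for s in symbols]
    counts = [count_windows(rng.choices(symbols, weights=weights, k=n), pattern, w)
              for _ in range(trials)]
    mean = fmean(counts)
    var = pvariance(counts, mu=mean)
    return mean, var, threshold(mean, var, beta)
```

An observed count above the returned threshold would then be flagged as unlikely to be fortuitous at level $\beta$, as long as the normal approximation is adequate.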
SQL Standards Update
- Pattern Matching Proposal
  - Participants: Coral8 (some parts), IBM, Oracle, Streambase.
  - BEA Systems also reviewed the draft.
  - Status: version 12 of the review draft is ready and has been circulated.
  - Objective: submit a working draft to ANSI SQL.
- Discussing a streams language proposal with IBM
  - Participants: IBM, Oracle.
  - Status: exchanged documents regarding language specifications.
  - Objective: submit a working draft to ANSI SQL.
- Discussing a convergence language proposal with Streambase
  - Participants: IBM, Streambase.
  - Status: discussing the convergence proposal for the last 6 months.
  - Objective: submit a paper to ACM Transactions on Database Systems (TODS).
Pattern Query With ONE ROW PER MATCH
```sql
SELECT a_symbol,
       a_tstamp,      /* start time */
       a_price,       /* start price */
       max_c_tstamp,  /* inflection time */
       last_c_price,  /* low price */
       max_f_tstamp,  /* end time */
       last_f_price,  /* end price */
       matchno
FROM Ticker MATCH_RECOGNIZE (
    PARTITION BY Symbol
    MEASURES A.Symbol       AS a_symbol,
             A.Tstamp       AS a_tstamp,
             A.Price        AS a_price,
             MAX (C.Tstamp) AS max_c_tstamp,
             LAST (C.Price) AS last_c_price,
             MAX (F.Tstamp) AS max_f_tstamp,
             LAST (F.Price) AS last_f_price,
             MATCH_NUMBER   AS matchno
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    MAXIMAL MATCH
    PATTERN (A B C D E F)
    DEFINE B AS (B.Price < PREV (B.Price)),
           C AS (C.Price < PREV (C.Price)),
           D AS (D.Price > PREV (D.Price)),
           E AS (E.Price > PREV (E.Price)),
           F AS (F.Price > PREV (F.Price)
                 AND F.Price > A.Price))
```
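To see what this query computes, here is a Python emulation of it over an in-memory list of ticks. It is a sketch under stated assumptions, not the engine's algorithm: rows within each partition are processed in `Tstamp` order (the query itself states no ordering), each pattern variable matches exactly one row as written in `PATTERN (A B C D E F)`, and the end price is taken as F's price. Because C and F each match a single row here, `MAX (C.Tstamp)` and `LAST (C.Price)` reduce to C's own values, and `MAXIMAL MATCH` has no effect on a fixed-length pattern. `Tick`, `v_shape_matches` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Tick(NamedTuple):  # hypothetical stand-in for a row of the Ticker table
    symbol: str
    tstamp: int
    price: float


def v_shape_matches(ticks: List[Tick]) -> List[Tuple]:
    """Emulate ONE ROW PER MATCH for PATTERN (A B C D E F),
    AFTER MATCH SKIP PAST LAST ROW."""
    partitions = defaultdict(list)           # PARTITION BY Symbol
    for t in ticks:
        partitions[t.symbol].append(t)
    results = []
    for symbol, rows in partitions.items():
        rows.sort(key=lambda t: t.tstamp)    # assumed ordering by timestamp
        match_no, i = 0, 0
        while i + 6 <= len(rows):
            a, b, c, d, e, f = rows[i:i + 6]
            # DEFINE conditions: PREV(X.Price) is the price of the row just before X
            if (b.price < a.price and c.price < b.price and d.price > c.price
                    and e.price > d.price and f.price > e.price and f.price > a.price):
                match_no += 1                # MATCH_NUMBER, per partition
                results.append((symbol, a.tstamp, a.price, c.tstamp, c.price,
                                f.tstamp, f.price, match_no))
                i += 6                       # SKIP PAST LAST ROW
            else:
                i += 1                       # no match starting here; try the next row
    return results
```

For symbol `X` with prices 10, 9, 8, 9, 10, 11, 5 at timestamps 1 to 7, this returns a single row `("X", 1, 10, 3, 8, 6, 11, 1)`.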
Pattern Query With ALL ROWS PER MATCH
```sql
SELECT a_symbol,
       a_tstamp,      /* start time */
       a_price,       /* start price */
       max_c_tstamp,  /* inflection time */
       last_c_price,  /* low price */
       max_f_tstamp,  /* end time */
       last_f_price,  /* end price */
       matchno
FROM Ticker MATCH_RECOGNIZE (
    PARTITION BY Symbol
    MEASURES A.Symbol               AS a_symbol,
             A.Tstamp               AS a_tstamp,
             A.Price                AS a_price,
             MAX (C.Tstamp) OVER () AS max_c_tstamp,
             LAST (C.Price) OVER () AS last_c_price,
             MAX (F.Tstamp) OVER () AS max_f_tstamp,
             LAST (F.Price) OVER () AS last_f_price,
             MATCH_NUMBER           AS matchno,
             CLASSIFIER             AS classy
    ALL ROWS PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    MAXIMAL MATCH
    PATTERN (A B C D E F)
    DEFINE B AS (B.Price < PREV (B.Price)),
           C AS (C.Price < PREV (C.Price)),
           D AS (D.Price > PREV (D.Price)),
           E AS (E.Price > PREV (E.Price)),
           F AS (F.Price > PREV (F.Price)
                 AND F.Price > A.Price))
```
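The difference from ONE ROW PER MATCH is the shape of the output: every row that takes part in a match is returned, labeled by `CLASSIFIER` with the pattern variable it matched. The Python sketch below illustrates only that shape, for a single partition whose rows are assumed to be already sorted by `Tstamp`. It deliberately omits the `OVER ()` measures, whose exact semantics in the draft are not spelled out on this slide. `Tick`, `all_rows_per_match` and the sample data are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Tick(NamedTuple):  # hypothetical row of the Ticker table
    symbol: str
    tstamp: int
    price: float


VARIABLES = ("A", "B", "C", "D", "E", "F")  # PATTERN (A B C D E F)


def is_v_shape(a: Tick, b: Tick, c: Tick, d: Tick, e: Tick, f: Tick) -> bool:
    # DEFINE conditions of the query, one row per variable
    return (b.price < a.price and c.price < b.price and d.price > c.price
            and e.price > d.price and f.price > e.price and f.price > a.price)


def all_rows_per_match(rows: List[Tick]) -> List[Tuple]:
    """Emulate ALL ROWS PER MATCH for one partition: each matched row is returned
    with its CLASSIFIER value and MATCH_NUMBER."""
    out, match_no, i = [], 0, 0
    k = len(VARIABLES)
    while i + k <= len(rows):
        window = rows[i:i + k]
        if is_v_shape(*window):
            match_no += 1
            for var, row in zip(VARIABLES, window):
                out.append((row.symbol, row.tstamp, row.price, var, match_no))
            i += k  # AFTER MATCH SKIP PAST LAST ROW
        else:
            i += 1
    return out
```

On the same V-shaped prices 10, 9, 8, 9, 10, 11, 5, this yields six output rows labeled `A` through `F`, all with match number 1.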
MATCH_RECOGNIZE Syntax
The full syntax of the MATCH_RECOGNIZE clause is as follows:
- PARTITION BY: optional
- MEASURES: optional, but we expect this will always be used
- ONE ROW | ALL ROWS PER MATCH: defaults to ONE ROW
- AFTER MATCH SKIP { TO NEXT ROW | PAST LAST ROW | TO <variable> | TO LAST <variable> | TO FIRST <variable> }: defaults to AFTER MATCH SKIP PAST LAST ROW
- MAXIMAL | INCREMENTAL MATCH: defaults to MAXIMAL MATCH
- PERMUTE: optional
- PERMUTE EXPAND: optional
- PATTERN: mandatory
- SUBSET: optional
- DEFINE: mandatory
- CLASSIFIER: optional (ALL ROWS PER MATCH only)
- MATCH_NUMBER: optional