1
Voting Problems and Computer Science Applications
Fred Roberts, Rutgers University
2
  • What do Mathematics and Computer Science have to
    do with Voting?

3
  • Have you used Google lately?

4
(No Transcript)
5
Have you used Google lately?
Did you know that Google has something to do with
voting?
6
  • Have you tried buying a book online lately?

7
(No Transcript)
8
(No Transcript)
9
  • Have you tried buying a book online lately?
  • Did you get a message saying "If you are
    interested in this book, you might want to look
    at the following books as well"?

Did you know that this, too, has something to do with voting?
10
  • Have you ever heard of v-sis?

11
  • Have you ever heard of v-sis?
  • It's a cancer-causing gene.
  • Computer scientists helped discover how it works.
  • How did they do it?
  • The answer also has something to do with voting.

Cancer cell
12
  • Some connections between Computer Science and
    Voting are clearly visible.
  • Some people are working on plans to allow us to
    vote from home over the Internet.

13
Electronic Voting
  • Security Risks in Electronic Voting
  • Could someone mount a denial-of-service
    attack?
  • That is, could someone flood your computer and
    those of other likely voters with so much spam
    that you couldn't succeed in voting?

14
Electronic Voting
  • Security Risks in Electronic Voting
  • How can we prevent random loss of connectivity
    that would prevent you from voting?
  • How can your vote be kept private?
  • How can you be sure your vote is counted?
  • What will prevent you from selling your vote to
    someone else?

15
Electronic Voting
  • Security Risks in Electronic Voting
  • These are all issues in modern computer science
    research.
  • However, they are not what I want to talk about.
  • I want to talk about how ideas about voting
    systems can solve problems of computer science.

16
How do Elections Work?
  • Typically, everyone votes for their first choice
    candidate.
  • The votes are counted.
  • The person with the most votes wins.
  • Or, sometimes, if no one has more than half the
    votes, there is a runoff.

17
But do we necessarily get the best candidate
that way?

18
Sometimes Having More Information about Voters'
Preferences is Very Helpful
  • Sometimes it is helpful to have voters rank-order
    all the candidates, from their top choice to
    their bottom choice.

19
Rankings
  • Candidates: Dennis Kucinich, Bill Richardson,
    John Edwards, Hillary Clinton, Barack Obama.
  • Ties are allowed.
20
Rankings
  • What if we have four voters and they give us the
    following rankings? Who should win?
        Voter 1      Voter 2      Voter 3      Voter 4
        Clinton      Clinton      Obama        Obama
        Richardson   Kucinich     Edwards      Richardson
        Edwards      Edwards      Richardson   Kucinich
        Kucinich     Richardson   Kucinich     Edwards
        Obama        Obama        Clinton      Clinton

21
Rankings
  • What if we have four voters and they give us the
    following rankings?
  • There is one added candidate.
  • Who should win?
        Voter 1      Voter 2      Voter 3      Voter 4
        Clinton      Clinton      Obama        Obama
        Gore         Gore         Gore         Gore
        Richardson   Kucinich     Edwards      Richardson
        Edwards      Edwards      Richardson   Kucinich
        Kucinich     Richardson   Kucinich     Edwards
        Obama        Obama        Clinton      Clinton

22
Rankings
        Voter 1      Voter 2      Voter 3      Voter 4
        Clinton      Clinton      Obama        Obama
        Gore         Gore         Gore         Gore
        Richardson   Kucinich     Edwards      Richardson
        Edwards      Edwards      Richardson   Kucinich
        Kucinich     Richardson   Kucinich     Edwards
        Obama        Obama        Clinton      Clinton

  • Maybe someone who is everyone's second choice is
    the best choice for winner.
  • Point: We can learn something from ranking
    candidates.

23
Consensus Rankings
  • How should we reach a decision in an election if
    every voter ranks the candidates?
  • What decision do we want?
  • A winner
  • A ranking of all the candidates that is in some
    sense a consensus ranking
  • This would be useful in some applications
  • Job candidates are ranked by each interviewer
  • Consensus ranking of candidates
  • Make offers in order of ranking
  • How do we find a consensus ranking?

24
Consensus Rankings
These two rankings are very close:

    Clinton        Obama
    Obama          Clinton
    Edwards        Edwards
    Kucinich       Kucinich
    Richardson     Richardson
25
Consensus Rankings
These two rankings are very far
apart Clinton Obama Richardson Kucinich Edwar
ds Edwards Kucinich Richardson Obama Clinton

26
Consensus Rankings
  • This suggests we may be able to make precise how
    far apart two rankings are.
  • How do we measure the distance between two
    rankings?

27
Consensus Rankings
  • Kemeny-Snell distance between rankings: twice the
    number of pairs of candidates i and j for
    which i is ranked above j in one ranking and
    below j in the other, plus the number of pairs
    that are strictly ranked in one ranking and tied
    in the other.
  • Example: ranking a is x above y above z;
    ranking b ties y and z and places x below them.

        a:  x        b:  y-z
            y            x
            z

  • On {x,y}: the rankings disagree strictly, contributing 2.
  • On {x,z}: the rankings disagree strictly, contributing 2.
  • On {y,z}: tied in b but not in a, contributing 1.
  • d(a,b) = 2 + 2 + 1 = 5 (see the sketch below).

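To make the definition concrete, here is a small illustrative Python
sketch (not from the slides; the representation and the name ks_distance
are ours). A ranking with ties is stored as a dictionary mapping each
candidate to a tier number, 0 meaning the top tier and equal numbers
meaning a tie.

    # Kemeny-Snell distance: for each pair of candidates, add 2 if the two
    # rankings order the pair in opposite ways, and 1 if exactly one of the
    # two rankings ties the pair.
    from itertools import combinations

    def ks_distance(r1, r2):
        # r1, r2: dict candidate -> tier (0 = best); equal tiers mean a tie
        d = 0
        for i, j in combinations(r1, 2):
            s1 = (r1[i] > r1[j]) - (r1[i] < r1[j])   # -1, 0, or +1
            s2 = (r2[i] > r2[j]) - (r2[i] < r2[j])
            if s1 != s2:
                d += 1 if 0 in (s1, s2) else 2
        return d

    # The slide's example: a ranks x above y above z; b ties y and z above x.
    a = {"x": 0, "y": 1, "z": 2}
    b = {"x": 1, "y": 0, "z": 0}
    print(ks_distance(a, b))   # 5
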
28
Consensus Rankings
  • One well-known consensus method:
    Kemeny-Snell medians. Given a set of rankings,
    find a ranking minimizing the sum of distances
    to the given rankings.
  • Kemeny-Snell medians are finding surprising
    new applications in CS.

John Kemeny, pioneer in time sharing in CS
29
Consensus Rankings
  • Kemeny-Snell median: Given rankings a1, a2, ...,
    ap, find a ranking x so that
    d(a1,x) + d(a2,x) + ... + d(ap,x)
    is as small as possible.
  • x can be a ranking other than a1, a2, ..., ap.
  • Sometimes just called the Kemeny median.

30
Consensus Rankings
        a1        a2        a3
        Fish      Fish      Chicken
        Chicken   Chicken   Fish
        Beef      Beef      Beef

  • Median: a1.
  • If x = a1, then
    d(a1,x) + d(a2,x) + d(a3,x) = 0 + 0 + 2 = 2
    is minimized.
  • If x = a3, the sum is 4.
  • For any other x, the sum is at least 1 + 1 + 1 = 3
    (see the brute-force sketch below).

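A brute-force sketch of the median computation for this example
(illustrative Python; names ours). Because every ranking here is strict,
the Kemeny-Snell distance reduces to twice the number of pairs ordered
oppositely, and with only three items we can simply try every strict
ranking. The NP-completeness result cited a few slides later says this
approach does not scale.

    # Brute-force Kemeny-Snell median for the Fish/Chicken/Beef example.
    from itertools import combinations, permutations

    def ks_distance_strict(r1, r2):
        # r1, r2: tuples listing the items from best to worst (no ties)
        pos1 = {x: k for k, x in enumerate(r1)}
        pos2 = {x: k for k, x in enumerate(r2)}
        return 2 * sum(
            1
            for i, j in combinations(r1, 2)
            if (pos1[i] - pos1[j]) * (pos2[i] - pos2[j]) < 0
        )

    profile = [
        ("Fish", "Chicken", "Beef"),   # a1
        ("Fish", "Chicken", "Beef"),   # a2
        ("Chicken", "Fish", "Beef"),   # a3
    ]

    # Try every strict ranking of the three items and keep the one with the
    # smallest total distance.  (As the slide notes, allowing ties does no
    # better in this example.)
    best = min(
        permutations(("Fish", "Chicken", "Beef")),
        key=lambda x: sum(ks_distance_strict(a, x) for a in profile),
    )
    print(best)   # ('Fish', 'Chicken', 'Beef'); total distance 0 + 0 + 2 = 2
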
31
Consensus Rankings
        a1        a2        a3
        Fish      Chicken   Beef
        Chicken   Beef      Fish
        Beef      Fish      Chicken

  • Three medians: a1, a2, a3.
  • This is the voters' paradox situation.

32
Consensus Rankings
        a1        a2        a3
        Fish      Chicken   Beef
        Chicken   Beef      Fish
        Beef      Fish      Chicken

  • Note that sometimes we wish to minimize
    d(a1,x)² + d(a2,x)² + ... + d(ap,x)².
  • A ranking x that minimizes this is called a
    Kemeny-Snell mean.
  • In this example, there is one mean: the ranking
    declaring all three alternatives tied.

33
Consensus Rankings
        a1        a2        a3
        Fish      Chicken   Beef
        Chicken   Beef      Fish
        Beef      Fish      Chicken

  • If x is the ranking declaring Fish, Chicken,
    and Beef tied, then
    d(a1,x)² + d(a2,x)² + d(a3,x)² = 3² + 3² + 3² = 27.
  • It is not hard to show this is the minimum
    (see the sketch below).

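An illustrative check of this claim (Python sketch; names ours): enumerate
every ranking of the three items, ties allowed, and minimize the sum of
squared Kemeny-Snell distances.

    # Kemeny-Snell mean for the voters' paradox example: minimize the sum of
    # SQUARED distances over all rankings of {Fish, Chicken, Beef}, ties allowed.
    from itertools import combinations, product

    ITEMS = ("Fish", "Chicken", "Beef")

    def ks_distance(r1, r2):
        # r1, r2: dict item -> tier (0 = best); equal tiers mean a tie
        d = 0
        for i, j in combinations(ITEMS, 2):
            s1 = (r1[i] > r1[j]) - (r1[i] < r1[j])
            s2 = (r2[i] > r2[j]) - (r2[i] < r2[j])
            if s1 != s2:
                d += 1 if 0 in (s1, s2) else 2
        return d

    def weak_orders(items):
        # Every assignment of items to tiers, deduplicated up to renaming tiers;
        # for three items this yields the 13 rankings-with-ties.
        seen = set()
        for tiers in product(range(len(items)), repeat=len(items)):
            key = tuple(sorted(tiers).index(t) for t in tiers)
            if key not in seen:
                seen.add(key)
                yield dict(zip(items, key))

    profile = [
        {"Fish": 0, "Chicken": 1, "Beef": 2},   # a1
        {"Chicken": 0, "Beef": 1, "Fish": 2},   # a2
        {"Beef": 0, "Fish": 1, "Chicken": 2},   # a3
    ]

    mean = min(weak_orders(ITEMS),
               key=lambda x: sum(ks_distance(a, x) ** 2 for a in profile))
    print(mean)   # {'Fish': 0, 'Chicken': 0, 'Beef': 0}: all tied, sum of squares 27
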
34
Consensus Rankings
  • Theorem (Bartholdi, Tovey, and Trick, 1989;
    Wakabayashi, 1986): Computing the Kemeny-Snell
    median of a set of rankings is an NP-complete
    problem.

35
Consensus Rankings
  • Okay, so what does this have to do with practical
    computer science questions?

36
Consensus Rankings
  • I mean really practical computer science
    questions.

37
(No Transcript)
38
Google Example
  • Google is a search engine.
  • It searches through web pages and rank-orders
    them.
  • That is, it gives us a ranking of web pages, from
    most relevant to our query to least relevant.

39
Meta-search
  • There are other search engines besides Google.
  • Wouldn't it be helpful to use several of them and
    combine the results?
  • This is meta-search.
  • It is a voting problem:
  • Combine page rankings from several search engines
    to produce one consensus ranking.
  • Dwork, Kumar, Naor, Sivakumar (2000):
    Kemeny-Snell medians are good for spam resistance
    in meta-search (a page "spams" the system if it
    causes meta-search to rank it too highly).
  • Approximation methods make this computationally
    tractable (a toy aggregation sketch follows below).

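Since exact Kemeny-Snell aggregation is computationally hard, practical
meta-search relies on simpler or approximate rank-aggregation rules. The
sketch below shows one very simple positional (Borda-style) aggregation on
made-up engine rankings; it only illustrates the idea of combining rankings
and is not the Dwork-Kumar-Naor-Sivakumar procedure itself.

    # Toy positional (Borda-style) aggregation of several search-engine rankings.
    from collections import defaultdict

    # Hypothetical rankings of pages A-D returned by three engines (best first).
    engine_rankings = [
        ["A", "B", "C", "D"],
        ["B", "A", "D", "C"],
        ["A", "C", "B", "D"],
    ]

    scores = defaultdict(int)
    for ranking in engine_rankings:
        n = len(ranking)
        for position, page in enumerate(ranking):
            scores[page] += n - position   # higher score = ranked nearer the top

    consensus = sorted(scores, key=scores.get, reverse=True)
    print(consensus)   # ['A', 'B', 'C', 'D'] for these made-up rankings
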
40
(No Transcript)
41
Collaborative Filtering
  • Recommending books or movies
  • Combine book or movie ratings by various people
  • This, too, is voting:
  • Produce a consensus ordered list of books or
    movies to recommend.
  • Freund, Iyer, Schapire, Singer (2003): a boosting
    algorithm for combining rankings.
  • Related topic: Recommender Systems

42
Meta-search and Collaborative Filtering
  • A major difference from the election situation:
  • In elections, the number of voters is large and
    the number of candidates is small.
  • In CS applications, the number of voters (search
    engines) is small and the number of candidates
    (pages) is large.
  • This makes for major new complications and
    research challenges.

43
  • Have you ever heard of v-sis?
  • It's a cancer-causing gene.
  • Computer scientists helped discover how it works.
  • How did they do it?
  • The answer also has something to do with voting.

44
Large Databases and Inference
  • Decision makers consult massive data sets.
  • The study of large databases and the gathering of
    information from them is a major topic in modern
    computer science.
  • We will give an example from the field of
    Bioinformatics.
  • This lies at the interface between Computer
    Science and Molecular Biology.

45
Large Databases and Inference
  • Real biological data often come in the form of
    sequences.
  • GenBank has over 7 million sequences comprising
    8.6 billion bases.
  • The search for similarity or patterns has
    extended from pairs of sequences to finding
    patterns that appear in common in a large number
    of sequences or throughout the database:
    consensus sequences.
  • The emerging field of Bioconsensus applies
    consensus methods to biological databases.

46
Large Databases and Inference
Why look for such patterns? Similarities between
sequences or parts of sequences lead to the
discovery of shared phenomena. For example, it
was discovered that the sequence for platelet-derived
growth factor, which causes growth in the body,
is 87% identical to the sequence for v-sis, that
cancer-causing gene. This led to the discovery
that v-sis works by stimulating growth.
47
Large Databases and Inference
DNA Sequences: A DNA sequence is a sequence of
bases: A = Adenine, G = Guanine, C = Cytosine,
T = Thymine.
Example: ACTCCCTATAATGCGCCA
48
Large Databases and Inference
Example: Bacterial promoter sequences studied by
Waterman (1989):

    RRNABP1   ACTCCCTATAATGCGCCA
    TNAA      GAGTGTAATAATGTAGCC
    UVRBP2    TTATCCAGTATAATTTGT
    SFC       AAGCGGTGTTATAATGCC

Notice that if we are looking for patterns of
length 4, each sequence has the pattern TAAT.
50
Large Databases and Inference
Example: However, suppose that we add another
sequence:

    M1 RNA    AACCCTCTATACTGCGCG

The pattern TAAT does not appear here. However, it
almost appears, since the pattern TACT appears,
and this has only one mismatch from the pattern
TAAT.
51
Large Databases and Inference
Example: However, suppose that we add another
sequence:

    M1 RNA    AACCCTCTATACTGCGCG

The pattern TAAT does not appear here. However, it
almost appears, since the pattern TACT appears,
and this has only one mismatch from the pattern
TAAT. So, in some sense, the pattern TAAT is
a good consensus pattern.
52
Large Databases and Inference
Example: We make this precise using the best-mismatch
distance. Consider two sequences a and b with b
longer than a. Then d(a,b) is the smallest
number of mismatches over all possible alignments
of a as a consecutive subsequence of b.
53
Large Databases and Inference
Example: a = 0011, b = 111010.
Possible alignments:

    111010      111010      111010
    0011         0011         0011

The best-mismatch distance is 2, which is
achieved in the third alignment (see the sketch below).
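
This definition translates almost directly into code; the following is an
illustrative Python sketch (the function name is ours).

    # Best-mismatch distance: slide a across every consecutive window of b,
    # count mismatches, and return the smallest count.
    def best_mismatch(a, b):
        # a, b: strings over the same alphabet, len(a) <= len(b)
        return min(
            sum(x != y for x, y in zip(a, b[i:i + len(a)]))
            for i in range(len(b) - len(a) + 1)
        )

    print(best_mismatch("0011", "111010"))                # 2 (third alignment)
    print(best_mismatch("TAAT", "AACCCTCTATACTGCGCG"))    # 1 (TACT, one mismatch)
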
54
  • Large Databases and Inference
  • Smith-Waterman Method from Bioinformatics
  • Let S be a finite alphabet of size at least 2 and
    Π be a finite collection of sequences of length
    L with entries from S.
  • Let F(Π) be the set of sequences of length k ≥ 2
    that are our consensus patterns. (Assume L ≥ k.)
  • Let Π = {a1, a2, ..., an}.
  • One way to define F(Π) is as follows.
  • Let d(a,b) be the best-mismatch distance.
  • Then let F(Π) consist of all those sequences x
    for which the sum of the distances to elements of
    Π is as small as possible.
  • That is, find x so that
    d(a1,x) + d(a2,x) + ... + d(an,x)
    is as small as possible.

55
  • Large Databases and Inference
  • We call such an F a Smith-Waterman consensus.
  • (This is a special case of a more general
    Smith-Waterman consensus method.)
  • Notice that this consensus is the same as the
    consensus we used in voting.
  • Example:
  • An alphabet used frequently is the
    purine/pyrimidine alphabet {R,Y}, where R = A
    (adenine) or G (guanine) and Y = C (cytosine) or
    T (thymine).
  • For simplicity, it is easier to use the digits
    0,1 rather than the letters R,Y.
  • Thus, let S = {0,1} and let k = 2. Then the
    possible pattern sequences are 00, 01, 10, 11.

56
  • Large Databases and Inference
  • Suppose a1 = 111010 and a2 = 111111. How do we
    find F({a1, a2})?
  • We have
    d(a1,00) = 1,  d(a2,00) = 2
    d(a1,01) = 0,  d(a2,01) = 1
    d(a1,10) = 0,  d(a2,10) = 1
    d(a1,11) = 0,  d(a2,11) = 0
  • It follows that 11 is the consensus pattern
    according to the Smith-Waterman consensus.

57
  • Example:
  • Let S = {0,1}, k = 3, and consider F({a1, a2, a3})
    where a1 = 000000, a2 = 100000, a3 = 111110.
  • Possible pattern sequences are 000, 001, 010, 011,
    100, 101, 110, 111.
  • d(a1,000) = 0,  d(a2,000) = 0,  d(a3,000) = 2,
  • d(a1,001) = 1,  d(a2,001) = 1,  d(a3,001) = 2,
  • d(a1,100) = 1,  d(a2,100) = 0,  d(a3,100) = 1,
    etc.
  • The sum of distances from 000 is smaller than the
    sum of distances from 001 and the same as the sum
    of distances from 100. So, 001 is not a
    consensus.
  • It is easy to check that 000 and 100 minimize the
    sum of distances.
  • Thus, these are the two Smith-Waterman
    consensus sequences (see the sketch below).

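A brute-force sketch of the consensus computation used in these examples
(illustrative Python; names ours): try every length-k word over the alphabet
and keep the words that minimize the total best-mismatch distance.

    # Brute-force Smith-Waterman-style consensus patterns.
    from itertools import product

    def best_mismatch(a, b):
        # smallest number of mismatches of a against any window of b
        return min(
            sum(x != y for x, y in zip(a, b[i:i + len(a)]))
            for i in range(len(b) - len(a) + 1)
        )

    def consensus_patterns(sequences, alphabet, k):
        candidates = ["".join(p) for p in product(alphabet, repeat=k)]
        totals = {x: sum(best_mismatch(x, s) for s in sequences) for x in candidates}
        best = min(totals.values())
        return [x for x in candidates if totals[x] == best]

    print(consensus_patterns(["111010", "111111"], "01", 2))            # ['11']
    print(consensus_patterns(["000000", "100000", "111110"], "01", 3))  # ['000', '100']
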
58
Large Databases and Inference
  • Other Topics in Bioconsensus
  • Alternative phylogenies (evolutionary trees) are
    produced using different methods and we need to
    choose a consensus tree.
  • Alternative taxonomies (classifications) are
    produced using different models and we need to
    choose a consensus taxonomy.
  • Alternative molecular sequences are produced
    using different criteria or different algorithms
    and we need to choose a consensus sequence.
  • Alternative sequence alignments are produced and
    we need to choose a consensus alignment.

59
Large Databases and Inference
  • Other Topics in Bioconsensus
  • Several recent books on bioconsensus:
  • Day and McMorris (2003)
  • Janowitz et al. (2003)
  • Bibliography compiled by Bill Day: in molecular
    biology alone, hundreds of papers use consensus
    methods.
  • Large-database problems in CS are being
    approached using methods of bioconsensus having
    their origin in the theory of voting and
    elections.

60
Software and Hardware Measurement
  • A statement involving scales of measurement is
    considered meaningful if its truth or falsity is
    unchanged under acceptable transformations of all
    scales involved.
  • Example: It is meaningful to say that I weigh
    more than my daughter.
  • That is because if it is true in kilograms, then
    it is also true in pounds, in grams, etc.
  • It is even meaningful to say that I weigh twice
    as much as my daughter.
  • It is not meaningful to say that the temperature
    today is twice as much as it was yesterday.
  • It could be true in Fahrenheit but false in
    Centigrade.

61
Software and Hardware Measurement
  • Measurement theory has studied what statements
    you can make after averaging scores.
  • Think of averaging as a consensus method.
  • One general principle: To say that the average
    score of one set of tests is greater than the
    average score of another set of tests is not
    meaningful (it is meaningless) under certain
    conditions.
  • This is often the case if the averaging procedure
    is to take the arithmetic mean: if s(xi) is the
    score of xi, i = 1, 2, ..., n, then the arithmetic
    mean is
    (1/n)Σi s(xi) = [s(x1) + s(x2) + ... + s(xn)]/n.
  • There is a long literature on which averaging
    methods lead to meaningful conclusions.

62
Software and Hardware Measurement
  • A widely used method in hardware measurement:
  • Score a computer system on different benchmarks.
  • Normalize each score relative to the performance
    of one base system.
  • Average the normalized scores.
  • Pick the system with the highest average.
  • Fleming and Wallace (1986): The outcome can depend
    on the choice of base system.
  • Meaningless in the sense of measurement theory.
  • Leads to a theory of merging normalized scores.

63
Software and Hardware Measurement
  • Hardware Measurement

[Table: raw scores of processors R, M, and Z on benchmarks E, F, G, H, I.
Data from Heath, Comput. Archit. News (1984).]
64
Software and Hardware Measurement
  • Normalize Relative to Processor R

[Table: scores of processors R, M, and Z on benchmarks E, F, G, H, I,
normalized relative to processor R.]
65
Software and Hardware Measurement
  • Take Arithmetic Mean of Normalized Scores

[Table: scores normalized to processor R, with arithmetic means
R = 1.00, M = 1.01, Z = 1.07.]
66
Software and Hardware Measurement
  • Take Arithmetic Mean of Normalized Scores

[Table: scores normalized to processor R, with arithmetic means
R = 1.00, M = 1.01, Z = 1.07.]

Conclude that processor Z is best.
67
Software and Hardware Measurement
  • Now Normalize Relative to Processor M

[Table: scores of processors R, M, and Z on benchmarks E, F, G, H, I,
normalized relative to processor M.]
68
Software and Hardware Measurement
  • Take Arithmetic Mean of Normalized Scores

[Table: scores normalized to processor M, with arithmetic means
R = 1.32, M = 1.00, Z = 1.08.]
69
Software and Hardware Measurement
  • Take Arithmetic Mean of Normalized Scores

[Table: scores normalized to processor M, with arithmetic means
R = 1.32, M = 1.00, Z = 1.08.]

Conclude that processor R is best.
70
Software and Hardware Measurement
  • So, the conclusion that a given machine is best,
    reached by taking the arithmetic mean of
    normalized scores, is meaningless in this case.
  • The above example is from Fleming and Wallace
    (1986), with data from Heath (1984).
  • Sometimes, the geometric mean is helpful.
  • The geometric mean is
    [Πi s(xi)]^(1/n),
    the nth root of the product of the scores
    (a small numerical sketch follows below).
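
Before looking at the actual data, here is a small numerical illustration
(Python sketch with made-up benchmark scores, not the Heath data): the
machine with the highest arithmetic mean of normalized scores changes with
the choice of base machine, while the geometric-mean winner does not.

    # Toy demonstration with two hypothetical machines and two benchmarks
    # (scores are made up; higher = better).
    from math import prod

    scores = {
        "A": [1.0, 64.0],
        "B": [8.0, 10.0],
    }

    def normalized(base):
        return {m: [s / b for s, b in zip(vals, scores[base])]
                for m, vals in scores.items()}

    def arith(xs):
        return sum(xs) / len(xs)

    def geom(xs):
        return prod(xs) ** (1 / len(xs))

    for base in scores:
        norm = normalized(base)
        print("base", base,
              "| arithmetic-mean winner:", max(norm, key=lambda m: arith(norm[m])),
              "| geometric-mean winner:", max(norm, key=lambda m: geom(norm[m])))
    # base A | arithmetic-mean winner: B | geometric-mean winner: B
    # base B | arithmetic-mean winner: A | geometric-mean winner: B
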
71
Software and Hardware Measurement
  • Normalize Relative to Processor R

[Table: scores normalized to processor R, with geometric means
R = 1.00, M = 0.86, Z = 0.84.]

Conclude that processor R is best.
72
Software and Hardware Measurement
  • Now Normalize Relative to Processor M

[Table: scores normalized to processor M, with geometric means
R = 1.17, M = 1.00, Z = 0.99.]

Still conclude that processor R is best.
73
Software and Hardware Measurement
  • In this situation, it is easy to show that the
    conclusion that a given machine has the highest
    geometric mean of normalized scores is a
    meaningful conclusion.
  • It is even meaningful to say that a given machine
    has a geometric mean normalized score 20% higher
    than another machine.
  • Fleming and Wallace give general conditions under
    which comparing geometric means of normalized
    scores is meaningful.
  • Research area: Which averaging procedures make
    sense in which situations? There is a large
    literature.
  • Note: There are situations where comparing
    arithmetic means is meaningful but comparing
    geometric means is not.

74
Software and Hardware Measurement
  • The message from measurement theory to computer
    science (and DM):
  • Do not perform arithmetic operations on data
    without paying attention to whether the
    conclusions you get are meaningful.

75
Concluding Comment
  • In recent years, the interplay between computer
    science/mathematics and biology has transformed
    major parts of biology into an information
    science.
  • It has led to major scientific breakthroughs in
    biology, such as the sequencing of the human
    genome.
  • It has led to significant new developments in CS,
    such as database search.
  • The interplay between CS and methods of the social
    sciences, such as the theory of voting and
    elections, is not nearly as far along.

76
Concluding Comment
  • However, the interplay between computer
    science/mathematics and the social sciences has
    already developed a unique momentum of its own.
  • One can expect many more exciting outcomes as
    partnerships between computer scientists/
    mathematicians and social scientists expand and
    mature.

77
(No Transcript)