Title: Welcome to Intro to Bioinformatics
Slide 1: Welcome to Intro to Bioinformatics
Slide 2: Bioinformatics in Space
Slide 3: Bioinformatics in Space
Tribbles
Trogs
Warning! Highly dangerous!
Cute and harmless.
Slide 4: Welcome to the Intergalactic Detention Center
Please answer the following questions (1...10):
1. Like broccoli
2. Floss every brushing
3. Enjoy ballet
4. Always pair socks
5. Liked Moby Dick
6. Eat the Maraschino cherry
Slide 5: Responses to questionnaire
                       T1    T2    T3    T4    T5    T6    T7   . . .
1. Broccoli            9.2   1.6   4.0   5.2   2.2   9.1   1.0  . . .
2. Floss               2.2   1.9   1.0   4.6   7.6   9.8   1.0  . . .
3. Ballet              8.3   3.1   2.4   6.1   9.3   9.2   1.0  . . .
4. Pair socks          9.6   5.5   1.3   8.4   9.8   9.0   1.0  . . .
5. Moby Dick           4.2   2.1   1.0   4.1   5.2   4.4   1.0  . . .
6. Maraschino          6.4   8.9   7.1   3.3   1.9   2.0   1.0  . . .
. . .
6817. MacArthurs Park  1.2   1.5   5.1   3.4   1.1   1.7   9.9  . . .
You need a plan
Slide 6: A Plan
- Release all Tribbles / Trogs
- Note outcome for each individual
- Integrate identities into results
- Figure out which questions/answers are informative
Slide 7: Responses to questionnaire
                       T1    T2    T3    T4    T5    T6    T7   . . .
1. Broccoli            9.2   1.6   4.0   5.2   2.2   9.1   1.0  . . .
2. Floss               2.2   1.9   1.0   4.6   7.6   9.8   1.0  . . .
3. Ballet              8.3   3.1   2.4   6.1   9.3   9.2   1.0  . . .
4. Pair socks          9.6   5.5   1.3   8.4   9.8   9.0   1.0  . . .
5. Moby Dick           4.2   2.1   1.0   4.1   5.2   4.4   1.0  . . .
6. Maraschino          6.4   8.9   7.1   3.3   1.9   2.0   1.0  . . .
. . .
6817. MacArthurs Park  1.2   1.5   5.1   3.4   1.1   1.7   9.9  . . .
Tribbles
Trogs
(what now?)
Slide 8: Responses to questionnaire
                       T1    T2    T3    T4    T5    T6    T7       Mean
1. Broccoli            9.2   1.6   4.0   5.2   2.2   9.1   1.0    6.4   2.2
2. Floss               2.2   1.9   1.0   4.6   7.6   9.8   1.0    6.0   1.3
3. Ballet              8.3   3.1   2.4   6.1   9.3   9.2   1.0    8.2   2.2
4. Pair socks          9.6   5.5   1.3   8.4   9.8   9.0   1.0    9.2   2.6
5. Moby Dick           4.2   2.1   1.0   4.1   5.2   4.4   1.0    4.4   1.4
6. Maraschino          6.4   8.9   7.1   3.3   1.9   2.0   1.0    4.4   3.7
. . .
6817. MacArthurs Park  1.2   1.5   5.1   3.4   1.1   1.7   9.9    1.8   5.5
Tribbles
Trogs
Slide 9: Which questions are informative? Which can be used to predict class?
The responses to which questions are correlated with class?
Slide 10: Which questions are informative? Which can be used to predict class?
Strategy
- Calculate correlation for each question
- Look for questions with the largest correlations with class
Implementation
Slide 11: Which questions are informative? Which can be used to predict class?
Implementation
$\text{Correlation} = \dfrac{\Delta\mu}{\sigma_{\text{Tribble}} + \sigma_{\text{Trog}}}$
$\sigma^2 = \dfrac{\sum (s - \mu)^2}{N - 1}, \qquad \sigma = \sqrt{\sigma^2}$
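Slide 12: Which questions are informative? Which can be used to predict class?
Implementation
$\text{Correlation} = \dfrac{\Delta\mu}{\sigma_{\text{Tribble}} + \sigma_{\text{Trog}}} = \dfrac{\left(\sum s_{\text{Tribble}}\right)/N_{\text{Tribble}} - \left(\sum s_{\text{Trog}}\right)/N_{\text{Trog}}}{\sqrt{\sum (s_{\text{Tribble}} - \mu_{\text{Tribble}})^2 / (N_{\text{Tribble}} - 1)} + \sqrt{\sum (s_{\text{Trog}} - \mu_{\text{Trog}})^2 / (N_{\text{Trog}} - 1)}}$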
Slide 13: Which questions are informative? Which can be used to predict class?
Implementation
$\text{Correlation} = \dfrac{\Delta\mu}{\sigma_{\text{Tribble}} + \sigma_{\text{Trog}}}$

    Read_Responses_To_Question();
    # Numerator: difference between the class means
    $numerator   = Mean(@tribble_scores) - Mean(@trog_scores);
    # Denominator: sum of the class standard deviations
    $denominator = StDev(@tribble_scores) + StDev(@trog_scores);
    $correlation = $numerator / $denominator;
    push @question_info, $question_number, $correlation;
Slide 14: Which questions are informative? Which can be used to predict class?
Implementation

    while () {    # loop condition not shown; one pass per question
        Read_Responses_To_Question();
        $numerator   = Mean(@tribble_scores) - Mean(@trog_scores);
        $denominator = StDev(@tribble_scores) + StDev(@trog_scores);
        $correlation = $numerator / $denominator;
        push @question_info, $question_number, $correlation;
    }
Slide 15: Which questions are informative? Which can be used to predict class?
Implementation

    sub Mean {
        my @scores = @_;               # Grab Tribble or Trog scores
        my $s_sum = 0;                 # Start the sum at 0
        my $N = 0;                     # Need to count N
        foreach my $score (@scores) {
            $s_sum = $s_sum + $score;
            $N = $N + 1;
        }
        return $s_sum / $N;            # mean = (sum of s) / N
    }
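The slides call StDev alongside Mean but do not show it. A minimal sketch of how the two could combine into the correlation follows, assuming the denominator is the sum of the two class standard deviations, as the formula on the earlier slides reads. The Correlation wrapper, the array-reference arguments, and the sample scores are illustrative assumptions, not part of the original deck.

```perl
use strict;
use warnings;

# Mean of a list of scores: (sum of s) / N
sub Mean {
    my @scores = @_;
    my $s_sum = 0;
    $s_sum += $_ for @scores;
    return $s_sum / @scores;
}

# Sample standard deviation: sqrt( sum of (s - mu)^2 / (N - 1) )
sub StDev {
    my @scores = @_;
    my $mu = Mean(@scores);
    my $sum_sq = 0;
    $sum_sq += ($_ - $mu) ** 2 for @scores;
    return sqrt($sum_sq / (@scores - 1));
}

# Correlation of one question with class: difference of the class means
# divided by the sum of the class standard deviations (assumed reading).
sub Correlation {
    my ($tribble_ref, $trog_ref) = @_;
    my $numerator   = Mean(@$tribble_ref) - Mean(@$trog_ref);
    my $denominator = StDev(@$tribble_ref) + StDev(@$trog_ref);
    return $numerator / $denominator;
}

# Made-up scores for a single question (illustrative only)
my @tribble_scores = (9, 8, 10);
my @trog_scores    = (2, 1, 3);
printf "%.2f\n", Correlation(\@tribble_scores, \@trog_scores);   # prints 3.50
```

Because the denominator is a sum of standard deviations rather than a product, this score is not bounded by 1, which is consistent with values such as 1.76 appearing in the results.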
Slide 16: Which questions are informative? Which can be used to predict class?
Results
Question   Correlation
3497       1.76
281        1.72
1114       1.71
Are these questions good predictors of class?
Suppose there are NO good predictors of class
Slide 17: (Interlude)
NEWS! A precinct in Harrisonburg has voted for the winning senatorial candidate every time for the past ten elections!
Probability if by chance: $(1/2)\,(1/2)\,(1/2) \cdots = (1/2)^{10} = 1/1024 \approx 1/1000$
Suppose there are 1000 precincts in Virginia
(BLAST from the past) $E = (\text{probability}) \times (\text{number of combinations})$
Beware the fallacy of the unlikely result!
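The interlude's arithmetic can be carried one step further as a worked example. This assumes, as the slide does, that each precinct matches the winner with probability 1/2 in each election, and additionally that precincts and elections are independent; the symbols p and n are introduced here for illustration.

```latex
E = p \times n
  = \left(\tfrac{1}{2}\right)^{10} \times 1000
  = \frac{1000}{1024}
  \approx 0.98
```

Under those assumptions, about one precinct out of 1000 would be expected to show a perfect ten-election record by chance alone, so finding one such precinct is not surprising.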
Slide 18: Which questions are informative? Which can be used to predict class?
Results
Question   Correlation
3497       1.76
281        1.72
1114       1.71
Are these questions good predictors of class?
Suppose there are NO good predictors of class: what would be the expected correlation?
Slide 19: Which questions are informative? How to test class predictors?
Choice 1: Rerun time with the different (?) reality that Tribbles are no different from Trogs
Choice 2: Use random data
Slide 20: Random responses to questionnaire
                       T1    T2      T3       T4    T5        T6    T7   . . .
1. Broccoli            9.2   -1600   33 1/3   99    3.14159   -0    1.0  . . .
2. Floss
3. Ballet
4. Pair socks
5. Moby Dick
6. Maraschino
. . .
6817. MacArthurs Park
Random doesn't mean crazy
Slide 21: Random responses to questionnaire
                       T1    T2    T3    T4    T5    T6    T7   . . .
1. Broccoli            9.2   1.6   4.0   5.2   2.2   9.1   1.0  . . .
2. Floss               2.2   1.9   1.0   4.6   7.6   9.8   1.0  . . .
3. Ballet              8.3   3.1   2.4   6.1   9.3   9.2   1.0  . . .
4. Pair socks          9.6   5.5   1.3   8.4   9.8   9.0   1.0  . . .
5. Moby Dick           4.2   2.1   1.0   4.1   5.2   4.4   1.0  . . .
6. Maraschino          6.4   8.9   7.1   3.3   1.9   2.0   1.0  . . .
. . .
6817. MacArthurs Park  1.2   1.5   5.1   3.4   1.1   1.7   9.9  . . .
Maybe, but
Slide 22: Random responses to questionnaire
                       T1    T2    T3    T4    T5    T6    T7   . . .
1. Broccoli            9.2   1.6   4.0   5.2   2.2   9.1   1.0  . . .
2. Floss               2.2   1.9   1.0   4.6   7.6   9.8   1.0  . . .
3. Ballet              8.3   3.1   2.4   6.1   9.3   9.2   1.0  . . .
4. Pair socks          9.6   5.5   1.3   8.4   9.8   9.0   1.0  . . .
5. Moby Dick           4.2   2.1   1.0   4.1   5.2   4.4   1.0  . . .
6. Maraschino          6.4   8.9   7.1   3.3   1.9   2.0   1.0  . . .
. . .
6817. MacArthurs Park  1.2   1.5   5.1   3.4   1.1   1.7   9.9  . . .
Keep the data, shuffle the players
Slide 23: Which questions are informative? How to test class predictors?
Choice 1: Rerun time with the different (?) reality that Tribbles are no different from Trogs
Choice 2: Use random data
Choice 3: Shuffle data
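Choice 3 can be sketched in a few lines: the scores stay where they are and only the class labels are permuted, then the correlation is recomputed. The data layout (one score per individual plus a parallel list of labels), the helper names, and the sample values below are assumptions made for illustration; the correlation is taken as the difference of class means over the sum of class standard deviations.

```perl
use strict;
use warnings;
use List::Util qw(shuffle sum);

# Hypothetical data for one question: a score per individual and a
# parallel list of class labels (values are made up).
my @scores = (9, 8, 10, 2, 1, 3);
my @labels = qw(Tribble Tribble Tribble Trog Trog Trog);

sub mean { return sum(@_) / @_ }

sub stdev {
    my $mu = mean(@_);
    return sqrt( sum(map { ($_ - $mu) ** 2 } @_) / (@_ - 1) );
}

# Correlation of a question with class for a given labelling
sub correlation {
    my ($scores, $labels) = @_;
    my (@tribble, @trog);
    for my $i (0 .. $#$scores) {
        if ($labels->[$i] eq 'Tribble') { push @tribble, $scores->[$i] }
        else                            { push @trog,    $scores->[$i] }
    }
    return (mean(@tribble) - mean(@trog)) / (stdev(@tribble) + stdev(@trog));
}

my $actual = correlation(\@scores, \@labels);

# Keep the data, shuffle the players: same scores, permuted labels
my @shuffled_labels = shuffle(@labels);
my $shuffled = correlation(\@scores, \@shuffled_labels);

printf "actual %.2f, shuffled %.2f\n", $actual, $shuffled;
```

Shuffling preserves the number of individuals in each class and the full set of scores, so any correlation that survives reflects chance alone. Repeating the shuffle many times would give a picture of how large a correlation can get when class carries no information.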
Slide 24: Which questions are informative? How to test class predictors?
[Plot: # of questions with better correlations (y-axis, 0 to 10000) vs. correlation (x-axis, -0.5 to 2.0). Curve: 5% of shuffled responses.]
Slide 25: Which questions are informative? How to test class predictors?
[Plot: # of questions with better correlations (y-axis, 0 to 10000) vs. correlation (x-axis, -0.5 to 2.0). Curve: 1% of shuffled responses.]
Slide 26: Which questions are informative? How to test class predictors?
[Plot: # of questions with better correlations (y-axis, 0 to 10000) vs. correlation (x-axis, -0.5 to 2.0). Curves: actual responses; 1% of shuffled responses.]
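The quantity on the vertical axis of these plots, the number of questions whose correlation is at least a given value, can be computed with a short helper. The sketch below is illustrative only: count_at_least is a hypothetical name, the three largest "actual" values are the ones reported in the results, and every other number is made up.

```perl
use strict;
use warnings;

# Number of questions whose correlation is at least $threshold
sub count_at_least {
    my ($threshold, @correlations) = @_;
    return scalar grep { $_ >= $threshold } @correlations;
}

# One correlation per question. Only 1.76, 1.72 and 1.71 come from the
# results; the remaining values are invented for illustration.
my @actual   = (1.76, 1.72, 1.71, 0.40, 0.10, -0.20);
my @shuffled = (0.90, 0.35, 0.30, 0.05, -0.10, -0.45);

for my $threshold (1.5, 1.0, 0.5, 0) {
    printf "corr >= %.1f: actual %d, shuffled %d\n",
        $threshold,
        count_at_least($threshold, @actual),
        count_at_least($threshold, @shuffled);
}
```

If the actual responses yield many more questions above a threshold than shuffled responses almost ever do, those questions are plausibly informative rather than lucky.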