Homework 1 and 2 review session - PowerPoint PPT Presentation

About This Presentation

Title:

Homework 1 and 2 review session

Description:

Title: PowerPoint Presentation Author: Kirill Last modified by: Kyrylo Bessonov Created Date: 8/16/2006 12:00:00 AM Document presentation format: On-screen Show (4:3) – PowerPoint PPT presentation

Number of Views:47

Avg rating:3.0/5.0

Slides: 18

Provided by: Kiri70

Category:

more less

Transcript and Presenter's Notes

Title: Homework 1 and 2 review session

1
Homework 1 and 2 review session

Presented by
Kirill Bessonov
November 2012

2
HW1 classical Q A (GenomeGraphs) (1)

First two questions were on Bioconductor
libraries. There are BioC 608 packages
To get citations on particular library use
citation("library_name")
You were asked to get genomic data on specific
gene
library(GenomeGraphs)
download the whole database of Ensemble IDs
ensembl_Human_Genes useMart("ensembl",dataset"h
sapiens_gene_ensembl")
get info on gene form the database on the
Ensemble ID
gene lt- makeGene(id "ENSG00000115145",
type"ensembl_gene_id", biomart
ensembl_Human_Genes )
get info on transcript
transcript lt- makeTranscript(id
"ENSG00000115145", type"ensembl_gene_id",
biomart ensembl_Human_Genes)
gdPlot ( list("gene"gene, "transcripts"transcrip
t))
retrieve info from the database displaying first
25 entries
getBM(c("ensembl_gene_id", "hgnc_symbol",
"description"), filterc("with_exon_transcript",
"with_protein_id", "with_transcript_variation"),va
lueslist(TRUE, TRUE, TRUE), ensembl_Human_Genes
)125,

3
HW1 classical Q A (GenomeGraphs) (2)

What is the gene name (i.e. hgnc_symbol) and
function represented by the Ensembl ID -
ENSG00000115145?
geneInfogetBM(c("ensembl_gene_id",
"hgnc_symbol", "description"), filterc("with_exon
_transcript", "with_protein_id",
"with_transcript_variation"),valueslist(TRUE,
TRUE, TRUE), ensembl_Human_Genes )
gt geneInfogeneInfoensembl_gene_id
"ENSG00000115145",
ensembl_gene_id hgnc_symbol
description
4829 ENSG00000115145 STAM2 signal
transducing adaptor molecule (SH3 domain and ITAM
motif) 2
How many exons does the ensemble id
ENSG00000115145 has? 51 exons
attr(gene, "ens")
ensembl_gene_id ensembl_transcript_id
ensembl_exon_id exon_chrom_start exon_chrom_end
rank strand biotype
1 ENSG00000115145 ENST00000263904
ENSE00001351655 153032117 153032506
1 -1 protein_coding
ENSG00000115145 ENST00000263904
ENSE00002888710 153006659 153006743
2 -1 protein_coding
48 ENSG00000115145 ENST00000494589
ENSE00002785037 153004538 153004636
3 -1 protein_coding
49 ENSG00000115145 ENST00000494589
ENSE00002808134 153003676 153003822
4 -1 protein_coding
50 ENSG00000115145 ENST00000494589
ENSE00002929781 153001402 153001471
5 -1 protein_coding
51 ENSG00000115145 ENST00000494589
ENSE00001828491 153000503 153000527
6 -1 protein_coding

4
HW1 classical Q A (GenomeGraphs) (3)

Execute the following command. How many
chromosomes do you see?
25 chromosomes. 22 autosomal pairs, 1 sex pair
and one mitochondrial chromosome
Why the number of chromosomes in this Ensembl
dataset is greater than 23 chromosome pairs? What
does MT, X and Y refer to?
Because of the MT chromosome, since X and Y can
be grouped to a single pair
gt getBM("chromosome_name","","",
ensembl_Human_Genes)c(122,433435),1
1 "1" "10" "11" "12" "13" "14" "15" "16" "17"
"18" "19" "2" "20" "21" "22" "3" "4" "5" "6"
"7" "8" "9" "MT" "X" "Y"

5
HW2 Pairwise alignments (classical QA)
6
HW2 Pairwise alignments (classical QA) Q1

Please align globally using NeedlemanWunsch
algorithm the following DNA sequences. Use
The following scoring rules a) gap -5 b) match
between two bases 5 c) mismatch between two
bases 3

7
HW2 Pairwise alignments (classical QA) Q3

Do local protein alignment using BLOSUM 62 matrix
on the HEAGAWGHEE and PAWHAE sequence. The
scoring rules are a) gap -8 matches and
mismatches are given in BLOSUM 62 matrix.

8
HW2 Pairwise alignments (classical QA) Q5

Produce a dot plot of Human and Mouse p53
proteins from previous question and paste the
plot below.
Complete the lines of R code to get the dot
plot.
Are both proteins similar?
Yes, very similar since we see clear diagonal
corresponding to gt90 of sequences length
Where is/are the region(s) of greatest variation
occur?
Between 50-100

9
HW2 Pairwise alignments (classical QA) Q7

What global alignment score do you get for the
two p53 proteins, when you use the BLOSUM62 alignm
ent matrix, a gap opening penalty of -10 and a
gap extension penalty of -0.5? Answer score of
1556
query("p53_HUMAN", "ACP04637")
p53_HUMAN_seq getSequence(p53_HUMAN)
query("p53_MOUSE", "ACP02340")
p53_MOUSE_seq getSequence(p53_MOUSE)
globalAlign lt- pairwiseAlignment(p53_HUMAN_seq,
p53_MOUSE_seq, substitutionMatrix "BLOSUM62",
gapOpening -10, gapExtension -0.5)
Errors the R-code was not stated and the ID of
proteins were not given such as Uniprot ID P04637

10
HW2 Computer Style Implementation of NW
algorithm in R
11
HW2 Computer style (NW algorithm) 1

Given the pseudo-code implement NW algorithm in R
Algorithm has two parts
Calculation of the alignment F-matrix
Finding the optimal path(s) through the matrix

for to length(A) F(i,0) ? di for j0 to
length(B) F(0,j) ? dj for i1 to length(A)
for j1 to length(B) Match ?
F(i-1,j-1) S(Ai, Bj) Delete ? F(i-1, j)
d Insert ? F(i, j-1) d F(i,j) ?
max(Match, Insert, Delete)
d gap penalty score i and j positions in A
B sequences
12
HW2 Computer style (NW algorithm) 2

Fmatrix function(A,B)
fmatrix matrix(0, nrow (nchar(A)1) , ncol
nchar(B)1)
d -8 this is gap penalty
for(i in 0 nchar(A))
fmatrixi1,1 d i populates initial
row with gap penalty
for(j in 0 nchar(B))
fmatrix1,j1 d i
for(i in 1 nchar(A))
for(j in 1 nchar(B))
score rules(A,B) get me sccore for the
pair of aa or nt
match fmatrixi,j score
delete fmatrixi,j1 d
insert fmatrixi1,j d
fmatrixi1,j1 max(match,delete,insert
)
colnames(fmatrix) strsplit( paste(" " , B,
sep""), "")1

13
HW2 Computer style (NW algorithm) 3

rules function(A,B)
s.matrix lt- matrix(rep(0,16), nrow 4, ncol4,
byrowTRUE, dimnames list(c("A","C","G","T"),c
("A","C","T","G")))
s.matrix"A", c(2,-1,-1,-1)
s.matrix"C", c(-1,2,-1,-1)
s.matrix"T", c(-1,-1,2,-1)
s.matrix"G", c(-1,-1,-1,2)

gt s.matrix A C T G A 2 -1 -1 -1 C -1 2 -1
-1 G -1 -1 2 -1 T -1 -1 -1 2
14
HW2 Computer style (NW algorithm) 4

Check the F-matrix
fmatrixFmatrix("ATCG", "TG")
T G
-32 -32 -32
A -8 -16 -24
T -16 -6 -14
C -24 -14 -4
G -32 -22 -12
Start finding the optimal path(s) through the
matrix
AlignmentA ""
AlignmentB ""
i nchar(A) 1
j nchar(B) 1
while(i gt 1 j gt 1)
CurrentScore fmatrixi,j get score
at current position of F-matrix

15
HW1 Computer style (NW algorithm) 5

Selecting the bottom right cell and starting to
trace-back the path of optimal alignment
AlignmentA ""
AlignmentB ""
while(i gt 1 j gt 1)
CurrentScore fmatrixi,j
ScoreDiag fmatrixi - 1, j - 1
ScoreUp fmatrixi, j - 1
ScoreLeft fmatrixi - 1, j
considering the score came from diagonal
if (CurrentScore ScoreDiag
s.matrixsubstr(A,i,i), substr(B,j,j)) )
AlignmentA paste(substr(A,i-1,i-1),Alig
nmentA, sep "")
AlignmentB paste(substr(B,j-1,j-1),Alig
nmentB, sep "")
i i - 1
j j - 1

Which cell of the F-matrix I am now?
On diagonal path previous next cell
16
HW2 Computer style (NW algorithm) 6

considering if the score comes from left
(introducing a gap)
else if(CurrentScore ScoreLeft d)
AlignmentA paste(substr(A,i-1,i-1),AlignmentA
, sep "")
AlignmentB paste( "-", AlignmentB, sep
"")
i i - 1
considering if the score comes from upper cell
(introducing a gap)
else if(CurrentScore ScoreUp d)
AlignmentA paste( "-", AlignmentA, sep "")
AlignmentB paste(substr(B,j-1,j-1),
AlignmentB, sep "")
j j 1
print(AlignmentA)
print(AlignmentB)
finalScore cat("Final score ",fmatrix(nchar(A)
1),(nchar(B)1))

17
HW2 Computer style (NW algorithm) 7

The scoring matrices could have been accessed
though character indices not requiring conversion
and making code faster
How one would output more than one BEST possible
alignments?
Please use more comments in your R-code
Would be nice to see trace-backs visually
Also the scoring rules were not stated clearly

Write a Comment

User Comments (0)