Title: Algorithms for Splicing Junction


1
Algorithms for Splicing Junction Donor Recognition in Genomic DNA Sequences
Maisheng Yin, Department of Computer Science, New Jersey Institute of Technology, yinm_at_homer.njit.edu
Jason T. L. Wang, Department of Computer Science, New Jersey Institute of Technology, jason_at_homer.njit.edu
2
  • Gene
  • Genetic information-containing elements
  • Distributed to each cell when the cell divides
  • Made of deoxyribonucleic acid (DNA)
  • Gene
  • Transcription: DNA to RNA
  • RNA splicing: remove introns to produce mRNA
  • mRNA translation: mRNA to protein

3
Gene Transcription and RNA Splicing
4
Consensus sequences for the 5' and 3' splice sites used in RNA splicing
5
  • Molecular biologists' approach: rate limiting
  • Bioinformatics approach
  • Using computer programs to predict different gene components, including the start codon, donor sites, acceptor sites, and stop codon

6
  • Available Programs
  • GeneID, GeneParser, GenLang and GRAIL2, etc
  • Performance of these programs
  • Burset, M., and Guigo, R. Evaluation of Gene Structure Prediction Programs. Genomics 34, 353-367, 1996

7
DNA sequences for this research
ftp.ics.uci.edu/pub/machine-learning-databases/molecular-biology
8
(No Transcript)
9
Donor site information
Position     -3      -2  -1  0  1  2       3  4  5
Nucleotide   C or A  A   G   G  T  T or A  A  G  T

CAGGTTAGT   3 in 550
CAGGTAAGT  12 in 550
AAGGTTAGT   6 in 550
AAGGTAGAT   4 in 550
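
The consensus above can be checked against a candidate site position by position. The following is a minimal sketch, assuming the table reads as one set of allowed nucleotides for each of the nine positions from -3 to 5; the name `consensus_mismatches` is hypothetical and not part of the presentation.

```python
# Consensus as read from the position table: one set of allowed nucleotides
# per position, from -3 to 5 (this reading of the table is an assumption).
CONSENSUS = [
    {"C", "A"},  # -3
    {"A"},       # -2
    {"G"},       # -1
    {"G"},       #  0
    {"T"},       #  1
    {"T", "A"},  #  2
    {"A"},       #  3
    {"G"},       #  4
    {"T"},       #  5
]


def consensus_mismatches(site):
    """Count positions where a 9-nucleotide site deviates from the consensus."""
    if len(site) != len(CONSENSUS):
        raise ValueError("expected a 9-nucleotide site")
    return sum(base not in allowed for base, allowed in zip(site.upper(), CONSENSUS))


# The four example sites listed on the slide.
for site in ["CAGGTTAGT", "CAGGTAAGT", "AAGGTTAGT", "AAGGTAGAT"]:
    print(site, consensus_mismatches(site))
```

Under this reading, the last example site deviates from the consensus at two positions, which illustrates why exact consensus matching alone misses some true donors.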
10
The Donor Pattern Motif Model
A donor site contains 10 motifs Mi(p, n):
i (i = 1, 2, ..., 10) denotes the motif number (see the following).
p (1 ≤ p ≤ 4) denotes the motif start position in the donor site; p = 1 means the -3 position in Fig. 1.
n (6 ≤ n ≤ 9) denotes the motif length.
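
One way to see where the count of 10 comes from is to enumerate the (p, n) pairs that fit inside a 9-nucleotide donor site. This is a sketch under an assumption: the slide does not state the fitting rule, so the condition p + n - 1 ≤ 9 is inferred from the stated ranges and the motif count. The function names are hypothetical.

```python
# Length of the donor-site window in the position table (positions -3 .. 5).
SITE_LENGTH = 9


def motif_windows(site_length=SITE_LENGTH, min_len=6, max_len=9, max_start=4):
    """Return every (p, n) pair, with 1-based start p, that fits in the site."""
    return [
        (p, n)
        for p in range(1, max_start + 1)
        for n in range(min_len, max_len + 1)
        if p + n - 1 <= site_length  # assumed rule: the motif must not overrun the site
    ]


def extract_motifs(site):
    """Map each (p, n) window to the substring it covers in a 9-nt donor site."""
    if len(site) != SITE_LENGTH:
        raise ValueError("expected a 9-nucleotide donor site")
    return {(p, n): site[p - 1 : p - 1 + n] for p, n in motif_windows()}


windows = motif_windows()
motifs = extract_motifs("CAGGTAAGT")
print(len(windows))
print(motifs[(4, 6)])
```

With these ranges, p = 1 admits four lengths, p = 2 three, p = 3 two, and p = 4 one, which gives the 10 motifs.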
11
Learning Process: Building up the motif library
for each donor site in the Learning Set do
begin
    extract each motif Mi(p, n), 1 ≤ i ≤ 10, 1 ≤ p ≤ 4, 6 ≤ n ≤ 9
    insert Mi(p, n) into the appropriate motif set SMp,n in the motif library
end
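
The learning loop can be sketched as follows. This is an illustration, not the authors' code: it assumes the motif library is a mapping from (p, n) to a set of motif strings, and that a motif must fit inside the 9-nucleotide site. The two input sites are taken from the donor site examples; the names `extract_motifs` and `build_motif_library` are hypothetical.

```python
from collections import defaultdict


def extract_motifs(site):
    """Return {(p, n): substring} for the windows that fit in a 9-nt site."""
    return {
        (p, n): site[p - 1 : p - 1 + n]
        for p in range(1, 5)       # 1 <= p <= 4
        for n in range(6, 10)      # 6 <= n <= 9
        if p + n - 1 <= len(site)  # assumed: motif stays inside the site
    }


def build_motif_library(donor_sites):
    """Collect the motifs of every learning-set site into one set per (p, n)."""
    library = defaultdict(set)
    for site in donor_sites:
        for key, motif in extract_motifs(site).items():
            library[key].add(motif)
    return library


library = build_motif_library(["CAGGTAAGT", "AAGGTAAGT"])
print(sorted(library[(1, 9)]))
print(sorted(library[(4, 6)]))
```

Using sets means a motif seen in many donor sites is stored once, so a later lookup only answers whether the motif occurred in the learning set at all.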
12
The motif model and the motif library represent the donor data well and can be used for further study.
Percentage of motifs from the true donor group and the false donor group that can be found in the motif library.
Positive motifs: from the 200 true donor group.
Negative motifs: from the 900 false donor group.
13
Training Process: Group Discriminant Analysis
Motif score
The positive donor training sequences are a group of sequences with one donor site in each sequence at the known position P.
for each sequence Seq in the Positive and Negative Training Groups do
begin
    extract each motif Mi(p,n)
    search the motif set SMp,n for Mi(p,n)
    if Mi(p,n) found
        let S be the score for Mi(p,n)
        S = S + 2^n
    if Seq is from the Positive Training Group
        write the score S to the positive score board
    else
        write the score S to the negative score board
end
let Pmin = the minimum score in the positive score board
let Nmax = the maximum score in the negative score board
if Pmin < Nmax
    swap(Pmin, Nmax)

Donor score
Criteria
Lpos = max{Lp, Un} = max{640, 768} = 768
Uneg = min{Lp, Un} = min{640, 768} = 640
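
The scoring and threshold steps can be sketched as below. Two assumptions are made here: the score update is read as S = S + 2^n (the values 640 and 768 are consistent with sums of such powers for n between 6 and 9), and the score starts at 0 for each sequence. The toy library and all function names are hypothetical.

```python
def extract_motifs(site):
    """Return {(p, n): substring} for the 10 windows that fit in a 9-nt site."""
    return {
        (p, n): site[p - 1 : p - 1 + n]
        for p in range(1, 5)
        for n in range(6, 10)
        if p + n - 1 <= len(site)  # assumed: motif stays inside the site
    }


def score_site(site, library):
    """Add 2**n for every motif of the site that is present in the library."""
    score = 0  # assumed starting value
    for (p, n), motif in extract_motifs(site).items():
        if motif in library.get((p, n), set()):
            score += 2 ** n
    return score


def thresholds(positive_scores, negative_scores):
    """Return (Pmin, Nmax), swapped when the two score ranges overlap."""
    p_min = min(positive_scores)
    n_max = max(negative_scores)
    if p_min < n_max:
        p_min, n_max = n_max, p_min
    return p_min, n_max


# Toy library built from a single site, for illustration only.
library = {key: {motif} for key, motif in extract_motifs("CAGGTAAGT").items()}
print(score_site("CAGGTAAGT", library))
print(score_site("AAGGTAAGT", library))
# Score boards whose extremes match the slide's 640 and 768; other values are made up.
print(thresholds([640, 1664], [64, 768]))
```

Because longer motifs contribute larger powers of two, a site that matches the library over its full length scores far higher than one that matches only short windows.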
14
Input: candidate donor sites group
Output: label each candidate donor site sequence as True Donor, False Donor, or Unknown
for each candidate donor site CD do
begin
    let S be the score for CD
    S = 0
    for motif number i = 1 to i = 10 do
    begin
        extract motif Mi(p,n)
        search the motif set SMp,n for Mi(p,n)
        if Mi(p,n) found
            S = S + 2^n
    end
    if S > Pmin
        label CD as True Donor
    else if S < Nmax
        label CD as False Donor
    else
        label CD as Unknown
end
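
The labeling rule at the end of the loop reduces to two strict comparisons. The sketch below assumes the score has already been computed and uses the thresholds 768 and 640 from the training slide; the function name `classify_score` is hypothetical.

```python
def classify_score(score, p_min, n_max):
    """Label a candidate from its score using the two trained thresholds."""
    if score > p_min:
        return "True Donor"
    if score < n_max:
        return "False Donor"
    return "Unknown"  # scores between the thresholds, inclusive, stay undecided


# Thresholds after the swap on the training slide: Pmin = 768, Nmax = 640.
P_MIN, N_MAX = 768, 640
for score in (1664, 768, 704, 640, 64):
    print(score, classify_score(score, P_MIN, N_MAX))
```

Both comparisons are strict, so a score equal to either threshold falls into the Unknown band.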
15
Detection of Donor Sites in DNA Sequences
  • Principle
  • Input DNA sequences
  • Scan along the sequence for GT
  • If GT found, extract donor candidate
  • Donor classification
  • Repeat until reaching the end of the DNA
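
The scan step can be sketched as follows. It assumes the candidate window matches the donor site table, with the G of GT at position 0, three nucleotides upstream and five downstream; candidates too close to either end of the sequence are skipped. The function name and the test sequence are hypothetical.

```python
def find_donor_candidates(dna, upstream=3, downstream=5):
    """Return (index_of_GT, 9-nt window) for every GT with a full window around it."""
    dna = dna.upper()
    candidates = []
    start = dna.find("GT")
    while start != -1:
        lo, hi = start - upstream, start + downstream + 1
        if lo >= 0 and hi <= len(dna):  # skip windows that run off the sequence
            candidates.append((start, dna[lo:hi]))
        start = dna.find("GT", start + 1)  # continue scanning after this hit
    return candidates


candidates = find_donor_candidates("TTCAGGTAAGTCC")
print(candidates)
```

Each extracted window would then be passed to the donor classification step.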

16
CONCLUSION
We developed algorithms for splicing junction donor classification and detection.
Using our motif model, we can capture the degenerate pattern features of splicing
junction sites to a great degree. Based on this model, our donor detection algorithm
correctly recognizes 93% of the donor sites in the test group, and more than 91% of
the donor sites it detects are correct. These rates are higher than those of the best
existing donor classification algorithm. This research is an important step toward
our development of a full gene structure detection algorithm.