Master of Science - PowerPoint PPT Presentation

About This Presentation
Title:

Master of Science

Description:

Margaret H. Dunham and Donya Quick Southern Methodist University ... Codon Group of 3 nucleotides. Amino acids have many codings. 3/14/08, UMKC. 10. Protein ... – PowerPoint PPT presentation

Slides: 44
Provided by: dream1
Learn more at: http://lyle.smu.edu
Transcript and Presenter's Notes


1
TCGR: A Novel DNA/RNA Visualization Technique

Margaret H. Dunham and Donya Quick
Southern Methodist University, Dallas, Texas 75275
mhd_at_engr.smu.edu
Some slides presented at IEEE BIBE 2006
2
Outline
  • Introduction
  • TCGR
  • EMM
  • miRNA Prediction using TCGR/EMM
  • Conclusion / Future Work

3
Outline
  • Introduction
  • Background
  • CGR/FCGR
  • Motivation
  • Research Objective
  • TCGR
  • EMM
  • miRNA Prediction using TCGR/EMM
  • Conclusion / Future Work

4
DNA
  • Deoxyribonucleic Acid
  • Basic building blocks of organisms
  • Located in nucleus of cells
  • Composed of 4 nucleotides
  • Adenine (A)
  • Cytosine (C)
  • Guanine (G)
  • Thymine (T)
  • Two strands bound together
  • Contains genetic information

Image source: http://www.visionlearning.com/library/module_viewer.php?mid=63
5
Nucleotide Bases
http://www.people.virginia.edu/rjh9u/gif/bases.gif
6
Transcription
  • During transcription, DNA is transcribed into mRNA
  • The RNA is processed and noncoding regions are removed
  • Coding regions are then translated into protein
  • The enzyme RNA polymerase starts transcription
    by binding to the DNA

7
Transcription
http://ghs.gresham.k12.or.us/science/ps/sci/ibbio/chem/nucleic/chpt15/transcription.gif
8
RNA
  • Ribonucleic Acid
  • Contains A, C, G, and U (uracil) instead of T
  • Single stranded
  • May fold back on itself
  • Needed to create proteins
  • Moves around the cell and can act as a messenger
  • mRNA moves out of nucleus to other parts of cell

9
Translation
  • Synthesis of proteins from mRNA
  • Nucleotide sequence of mRNA is converted into the
    amino acid sequence of a protein
  • Four nucleotides
  • Twenty amino acids
  • Codon: a group of 3 nucleotides
  • Most amino acids are coded by more than one codon

10
Central Dogma: DNA -> RNA -> Protein
DNA:     CCTGAGCCAACTATTGATGAA
RNA:     CCUGAGCCAACUAUUGAUGAA
Protein: PEPTIDE
Source: www.bioalgorithms.info, Chapter 6, Gene Prediction
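The slide's example can be reproduced with a minimal Python sketch. The codon table below is deliberately partial and holds only the codons this one sequence needs (an assumption made for brevity; the full genetic code has 64 codons). The function names `transcribe` and `translate` are illustrative, not taken from the presentation.

```python
# Partial codon table: only the codons needed for the slide's example.
# This is an assumption for brevity; the full genetic code has 64 codons.
CODON_TABLE = {
    "CCU": "P", "CCA": "P",
    "GAG": "E", "GAA": "E",
    "ACU": "T",
    "AUU": "I",
    "GAU": "D",
}


def transcribe(dna):
    """Rewrite the DNA coding strand as mRNA by replacing T with U."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA one codon (three nucleotides) at a time."""
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


dna = "CCTGAGCCAACTATTGATGAA"
mrna = transcribe(dna)
peptide = translate(mrna)
print(mrna)     # CCUGAGCCAACUAUUGAUGAA
print(peptide)  # PEPTIDE
```

Note that P and E each appear with two different codons (CCU/CCA and GAG/GAA), which illustrates that one amino acid can have several codings.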
11
  • http://www.time.com/time/magazine/article/0,9171,1541283,00.html

12
Human Genome
  • Scientists originally thought there would be
    about 100,000 genes
  • There appear to be only about 20,000
  • Why?
  • The human genome is almost identical to that of
    chimpanzees. What makes the difference?
  • Answers appear to lie in the noncoding regions of
    the DNA (formerly thought to be junk)

13
More Questions
  • If each cell in an organism contains the same
    DNA, how does each cell behave differently?
  • Why do cells behave differently during childhood
    development?
  • What causes some cells to act differently, such
    as during disease?
  • DNA contains many genes, but only a few are being
    transcribed. Why?
  • One answer: miRNA

14
miRNA
  • Short (20-25 nt) sequence of noncoding RNA
  • Single strand
  • Previously assumed to be garbage
  • Affects or prevents translation of mRNA
  • Binds to target sites in mRNA. The problem is that
    this binding is not perfect (particularly in
    animals)
  • An mRNA may have multiple (nonoverlapping) binding
    sites for one miRNA

15
miRNA Functions
  • Causes some cancers
  • Embryo Development
  • Cell Differentiation
  • Cell Death
  • Prevents the production of a protein that causes
    lung cancer
  • Controls brain development in zebrafish
  • Associated with HIV

17
Chaos Game Representation (CGR)
  • Scatter plot showing occurrence of patterns of
    nucleotides.

University of the Basque Country
http://insilico.ehu.es/genomics/my_words/
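The slide does not spell out how the scatter plot is built. The sketch below follows the usual chaos game construction: start at the center of the unit square and, for each nucleotide, move halfway toward that nucleotide's corner. The corner assignment in `CORNERS` is an assumed convention (other layouts are in use), and `cgr_points` is an illustrative name.

```python
# Assumed corner layout for the unit square; other conventions exist.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(seq):
    """Return one (x, y) point per nucleotide of the sequence."""
    x, y = 0.5, 0.5  # start at the center of the square
    points = []
    for base in seq.upper().replace("U", "T"):  # treat RNA like DNA
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


points = cgr_points("ACGT")
for base, point in zip("ACGT", points):
    print(base, point)
```

Because each step halves the distance to a corner, every subsequence of length k lands in its own cell of a 2^k by 2^k grid, which is why the scatter plot reveals which patterns occur.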
18
Frequency CGR (FCGR)
  • Shows the frequencies of oligonucleotides using a
    color scheme normalized to the distribution of
    frequency of occurrence of associated patterns.

19
Chaos Game Representation (CGR)
FCGR
  • 2D technique for visualizing the distribution of
    subpatterns
  • Our technique is based on the following steps:
  • Generate totals for each subpattern
  • Scale totals to a [0,1] range (note: scaling can
    be a problem)
  • Convert the range to a white/blue/red color scale:
  • 0-0.5: white to blue
  • 0.5-1: blue to red
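The three steps on this slide can be sketched in Python as follows. The slide does not say how totals are scaled or how colors are interpolated, so dividing by the largest total and interpolating linearly in RGB are both assumptions. The input sequence is a made-up string, not data from the presentation, and the function names are illustrative.

```python
from itertools import product


def kmer_counts(seq, k, alphabet="ACGU"):
    """Step 1: total occurrences of every subpattern of length k."""
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing unknown symbols
            counts[kmer] += 1
    return counts


def scale(counts):
    """Step 2: scale totals to [0, 1] (assumed: divide by the largest total)."""
    top = max(counts.values()) or 1  # avoid dividing by zero
    return {kmer: n / top for kmer, n in counts.items()}


def to_rgb(v):
    """Step 3: 0-0.5 maps white to blue, 0.5-1 maps blue to red."""
    if v <= 0.5:
        t = v / 0.5
        return (round(255 * (1 - t)), round(255 * (1 - t)), 255)
    t = (v - 0.5) / 0.5
    return (round(255 * t), 0, round(255 * (1 - t)))


seq = "UUCGUGUUC"  # hypothetical sequence for illustration only
counts = kmer_counts(seq, 3)
scaled = scale(counts)
print(counts["UUC"], scaled["UUC"], to_rgb(scaled["UUC"]))
print(counts["GUG"], scaled["GUG"], to_rgb(scaled["GUG"]))
```

With max-based scaling, a single very frequent pattern pushes every other pattern toward white, which is one way the scaling problem mentioned on the slide can show up.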

20
FCGR
Figures courtesy of Eamonn Keogh, UCR
21
FCGR Example
Homo sapiens, all mature miRNA; patterns of length 3
(Figure: FCGR plot with the patterns UUC and GUG labeled)
23
Motivation
2000 bp flanking upstream region of mir-258.2 in C. elegans
(a) All 2000 bp (b) First 240 bp (c) Last 240 bp
24
Research Objectives
  • Identify, develop, and implement algorithms which
    can be used for identifying potential miRNA
    functions.
  • Create an online tool which can be used by other
    researchers to apply our algorithms to new data.

25
Outline
  • Introduction
  • CGR/FCGR
  • miRNA
  • Motivation
  • Research Objective
  • TCGR
  • EMM
  • miRNA Prediction using TCGR/EMM
  • Conclusion / Future Work

26
Temporal CGR (TCGR)
  • Temporal version of Frequency CGR
  • In our context, temporal means the starting
    location of a window
  • 2D array
  • Each row represents the counts for a particular
    window in the sequence
  • First row: first window
  • Last row: last window
  • We start successive windows at the next character
    location
  • Each column represents the counts for the
    associated pattern in that window
  • Initially we have assumed the order of patterns is
    alphabetic
  • Size of the TCGR depends on sequence length and
    subpattern length
  • As sequence lengths vary, we only examine
    complete windows
  • We only count patterns completely contained in
    each window.
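The construction described above can be sketched in Python as below. The function name `tcgr` and its parameters are illustrative rather than the authors' implementation. The input is the first ten characters of the example sequence used later in the deck, with a window of 9 and patterns of length 1.

```python
from itertools import product


def tcgr(seq, window, k, alphabet="ACGT"):
    """Return (patterns, rows): one row of pattern counts per window."""
    seq = seq.upper()
    # Columns are the patterns of length k in alphabetic order.
    patterns = ["".join(p) for p in product(alphabet, repeat=k)]
    column = {pattern: j for j, pattern in enumerate(patterns)}
    rows = []
    # Successive windows start at the next character; complete windows only.
    for start in range(len(seq) - window + 1):
        row = [0] * len(patterns)
        # Count only patterns completely contained in the window.
        for i in range(start, start + window - k + 1):
            j = column.get(seq[i:i + k])
            if j is not None:
                row[j] += 1
        rows.append(row)
    return patterns, rows


patterns, rows = tcgr("acgtgcacgt", window=9, k=1)
print(patterns)  # ['A', 'C', 'G', 'T']
for start, row in enumerate(rows):
    print(f"Pos {start}-{start + 8}", row)
```

A sequence of length n yields n - window + 1 rows and 4^k columns, which is why the size of the TCGR depends on both the sequence length and the subpattern length.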

27
TCGR Example
Counts:
             A    C    G    T
Pos 0-8      2    3    3    1
Pos 1-9      1    3    3    2
...
Pos 34-42    2    4    2    1

Scaled to [0,1]:
             A    C    G    T
Pos 0-8      0.4  0.6  0.6  0.2
Pos 1-9      0.2  0.6  0.6  0.4
...
Pos 34-42    0.4  0.8  0.4  0.2
28
TCGR Example (contd)
  • TCGRs for Sub-patterns of length 1, 2, and 3

29
TCGR Example (contd)
Columns: A C G T
Window      Positions    Subsequence
Window 0    Pos 0-8      acgtgcacg
Window 1    Pos 1-9      cgtgcacgt
Window 17   Pos 17-25    tccggaacc
Window 18   Pos 18-26    ccggaacca
Window 34   Pos 34-42    ccacgtcga
30
TCGR: Virus miRNA (Window 9; Patterns 1, 2, 3)
Viruses shown: Epstein-Barr virus, Human cytomegalovirus,
Kaposi sarcoma herpesvirus, Mouse gammaherpesvirus
Panels: Pattern 1, Pattern 2, Pattern 3
31
TCGR: Mature miRNA (Window 5, Pattern 3)
33
EMM Overview
  • Time-varying, discrete, first-order Markov model
  • Nodes are clusters of real-world states.
  • Learning continues during the prediction phase.
  • What is learned:
  • Transition probabilities between nodes
  • Node labels (centroid of cluster)
  • Nodes are added and removed as data arrives

34
EMM Definition
  • Extensible Markov Model (EMM): at any time t, an EMM
    consists of a Markov chain (MC) with a designated
    current node, N_n, and algorithms to modify it, where
    the algorithms include
  • EMMCluster, which defines a technique for
    matching between input data at time t + 1 and
    existing states in the MC at time t.
  • EMMIncrement, which updates the MC at time
    t + 1 given the MC at time t and the clustering
    measure result at time t + 1.
  • EMMDecrement, which removes nodes from
    the EMM when needed.

35
EMM Cluster
  • Find the closest node to the incoming event.
  • If none is close enough, create a new node.
  • The label of a cluster is the centroid of its
    members.
  • O(n)
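A minimal Python sketch of this clustering step follows. The slide does not name a distance measure or a closeness threshold, so Euclidean distance and the `threshold` value of 2.0 are assumptions, as are the name `emm_cluster` and the dictionary layout of a node. The input vectors are the ones shown on the EMM Increment slide.

```python
import math


def emm_cluster(nodes, event, threshold):
    """Assign event to the closest node, or create a new node.

    nodes is a list of {"centroid": [...], "size": n} dictionaries.
    Returns the index of the node that received the event.
    """
    best, best_dist = None, math.inf
    for i, node in enumerate(nodes):  # linear scan over existing nodes
        d = math.dist(node["centroid"], event)  # assumed: Euclidean
        if d < best_dist:
            best, best_dist = i, d
    if best is None or best_dist > threshold:
        # No node is close enough: the event starts a new cluster.
        nodes.append({"centroid": list(event), "size": 1})
        return len(nodes) - 1
    node = nodes[best]
    node["size"] += 1
    # Keep the label equal to the centroid (running mean) of the members.
    node["centroid"] = [
        c + (e - c) / node["size"] for c, e in zip(node["centroid"], event)
    ]
    return best


events = [
    (18, 10, 3, 3, 1, 0, 0),
    (17, 10, 2, 3, 1, 0, 0),
    (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 0, 0, 0),
    (18, 10, 3, 3, 1, 1, 0),
]
nodes = []
path = [emm_cluster(nodes, e, threshold=2.0) for e in events]
print(path)
print(len(nodes))
```

The list of node indices in `path` is the clustering path that the Markov chain's transitions are learned from.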

36
EMM Increment
<18,10,3,3,1,0,0> <17,10,2,3,1,0,0> <16,9,2,3,1,0,0>
<14,8,2,3,1,0,0> <14,8,2,3,0,0,0> <18,10,3,3,1,1,0>
38
Research Approach
  1. Represent potential miRNA sequence with TCGR
    sequence of count vectors
  2. Create EMM using count vectors for known miRNA
    (miRNA stem loops, miRNA targets)
  3. Predict unknown sequence to be miRNA (miRNA stem
    loop, miRNA target) based on normalized product
    of transition probabilities along clustering path
    in EMM
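Step 3 can be illustrated with the Python sketch below. It works on clustering paths, meaning sequences of node indices such as those produced by a clustering step. The slides do not define the normalization, so taking the geometric mean of the transition probabilities is an assumption. The training paths are made-up values, and `train_transitions` and `path_score` are illustrative names.

```python
from collections import defaultdict


def train_transitions(paths):
    """Count node-to-node transitions over the training paths."""
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return counts


def path_score(counts, path):
    """Normalized product of transition probabilities along a path.

    Assumed normalization: geometric mean, so that long and short
    sequences are comparable. An unseen transition gives a score of 0.
    """
    steps = list(zip(path, path[1:]))
    if not steps:
        return 0.0
    prod = 1.0
    for a, b in steps:
        total = sum(counts[a].values()) if a in counts else 0
        seen = counts[a].get(b, 0) if total else 0
        if seen == 0:
            return 0.0
        prod *= seen / total
    return prod ** (1 / len(steps))


# Hypothetical clustering paths for known miRNA sequences.
training_paths = [[0, 0, 1], [0, 1, 1]]
counts = train_transitions(training_paths)
print(path_score(counts, [0, 1, 1]))  # both transitions were seen
print(path_score(counts, [1, 0]))     # unseen transition
```

A sequence would then be predicted to be miRNA when its score under the positive EMM reaches some cutoff.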

39
Related Work [1]
  • Predicted the occurrence of pre-miRNA segments from a
    set of hairpin sequences
  • No assumptions about biological function or
    conservation across species.
  • Used SVMs to differentiate the structure of
    hairpin segments that contained pre-miRNAs from
    those that did not.
  • Sensitivity of 93.3%
  • Specificity of 88.1%
  • [1] C. Xue, F. Li, T. He, G. Liu, Y. Li, and X.
    Zhang, "Classification of Real and Pseudo
    MicroRNA Precursors using Local
    Structure-Sequence Features and Support Vector
    Machine," BMC Bioinformatics, vol. 6, no. 310.

40
Preliminary Test Data [1]
  • Positive Training: This dataset consists of 163
    human pre-miRNAs with lengths of 62-119.
  • Negative Training: This dataset was obtained
    from protein coding regions of human RefSeq
    genes. As these are from coding regions, it is
    likely that there are no true pre-miRNAs in this
    data. This dataset contains 168 sequences with
    lengths between 63 and 110 characters.
  • Positive Test: This dataset contains 30
    pre-miRNAs.
  • Negative Test: This dataset contains 1000
    randomly chosen sequences from coding regions.
  • [1] C. Xue, F. Li, T. He, G. Liu, Y. Li, and X.
    Zhang, "Classification of Real and Pseudo
    MicroRNA Precursors using Local
    Structure-Sequence Features and Support Vector
    Machine," BMC Bioinformatics, vol. 6, no. 310.

41
TCGRs for Xue Training Data
POSITIVE
NEGATIVE
42
TCGRs for Xue Test Data
POSITIVE
NEGATIVE
43
Predictive Probabilities with Xue's Data
EMM       Test Data   Mean     Std Dev   Max      Min
Negative  Test-Neg    0        0         0        0
Negative  Test-Pos    0        0         0        0
Negative  Train-Neg   0.37963  0.050085  0.91256  0.2945
Negative  Train-Pos   0        0         0        0
Positive  Test-Neg    0        0         0        0
Positive  Test-Pos    0.25894  0.18701   0.42075  0
Positive  Train-Neg   0        0         0        0
Positive  Train-Pos   0.38926  0.048439  0.91155  0.32209
44
Preliminary Test Results
  • Positive EMM
  • Cutoff probability: 0.3
  • False positive rate: 0%
  • True positive rate: 66%
  • Test results could be improved by meta
    classifiers combining multiple positive and
    negative classifiers together.
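The rates above follow from applying a cutoff to each test sequence's predictive probability. The Python sketch below shows that calculation. The scores are hypothetical values chosen for illustration, not the presentation's data, and treating a score equal to the cutoff as a positive prediction is an assumption.

```python
def rates(pos_scores, neg_scores, cutoff=0.3):
    """Return (true positive rate, false positive rate) for a cutoff."""
    # Assumed rule: predict miRNA when the score reaches the cutoff.
    tp = sum(score >= cutoff for score in pos_scores)
    fp = sum(score >= cutoff for score in neg_scores)
    return tp / len(pos_scores), fp / len(neg_scores)


# Hypothetical predictive probabilities from a positive EMM.
pos_scores = [0.42, 0.35, 0.0]       # known pre-miRNAs
neg_scores = [0.0, 0.0, 0.0, 0.0]    # coding-region sequences
tpr, fpr = rates(pos_scores, neg_scores)
print(f"TPR {tpr:.0%}, FPR {fpr:.0%}")
```

Lowering the cutoff trades a higher true positive rate for the risk of more false positives.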

46
Future Research
  1. Obtain all known mature miRNA sequences for a
    species; initially, the 119 C. elegans miRNAs.
  2. Create TCGR count vectors for each sequence and
    each sub-pattern length (1, 2, 3, 4, 5).
  3. Train EMMs using this data for each sub-pattern
    length. Thus five EMMs will be created.
  4. Obtain negative data (much as Xue et al. did in
    their research) from coding regions for C. elegans.
  5. Train EMMs using this negative data for each
    sub-pattern length. Thus five more EMMs will be
    created.
  6. Construct a meta-classifier based on the combined
    results of prediction from each of these ten
    EMMs.
  7. Apply the EMM classifier to the existing 75 x 10^6
    base pairs of non-exonic sequence in the C.
    elegans genome to search for miRNAs. Note: all
    119 validated C. elegans miRNAs are contained in
    the non-exonic part of the genome, and thus the
    first pass of the algorithm will be tested for
    its ability to detect all 119 validated miRNAs.
  8. Validate the prediction of novel miRNAs using
    molecular biology.