Marcella A. McClure, Ph.D. - PowerPoint PPT Presentation

1
(No Transcript)
2
Lectures in Computational Virology
Bioinformatic Studies on the Evolution, Structure, and Function of RNA-based Life Forms
Marcella A. McClure, Ph.D.
Department of Microbiology and the Center for Computational Biology
Montana State University, Bozeman, MT
mars_at_parvati.msu.montana.edu
3
Summary Lecture II
  • Introduction to Retroid Agents
  • The Genome Parsing Suite
  • Retroid Agents in the Human Genome
  • Discovery-based Hypothesis Generation

4
Retroid Agents
Retroviruses, retrotransposons, pararetroviruses, retroposons, retroplasmids, retrointrons, and retrons
[Diagram: flow of genetic information among DNA, RNA, and protein]
  • DNA to DNA: replication by DNA-dependent DNA polymerase (all cellular systems, most DNA viruses)
  • DNA to RNA: transcription
  • RNA to RNA: replication by RNA-dependent RNA polymerase (RNA viruses, e.g., Ebola, rabies, influenza, polio)
  • RNA to DNA: reverse transcriptase-mediated replication or transposition (Retroid agents)
  • RNA to protein: translation (protein synthesis)
  • Functional RNAs: snRNAs, ribozymes, tRNA, rRNA
McClure, 2000
5
Distribution of Retroid Agents among Eukaryotes
and Eubacteria
6
(No Transcript)
7
Phylogenetic Tree based on 65 RT sequences, with Gene Maps
[Figure: RT phylogeny shown beside a gene map for a representative of each group, drawn on a nucleotide scale of 1000 to 4000]
  • retroviruses: HIV-1
  • orphan class: DIRS-1
  • gypsy-like retrotransposons: 17.6
  • caulimoviruses: CaMV
  • hepadnaviruses: HBV
  • copia-like retrotransposons: Copia
  • retroposons: LIN-H, CIN4, R2Bm, I-FAC, INGI
  • Group II introns: INT-SC1
  • plasmids: MAUP
  • retrons: MX65
  • TERT
Gene map labels: MA, C, NC, RT (reverse transcriptase), RH (ribonuclease H), H-C/IN (integrase), PR (aspartic acid protease)
McClure, 2000
8
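RNA-dependent DNA Polymerase
[Figure: motif maps of the Retroid enzymes; only the labels survive in the transcript]
  • Reverse Transcriptase: motifs 1 to 6, spanning the fingers, palm, thumb, and connection regions
  • Ribonuclease H: motifs 1 to 4
  • Conserved residues labeled on the RT and RH maps, in extracted order: P, D, K, D E D, NX, D (with a "3" that is probably a spacing subscript)
  • Aspartic Acid Protease: motifs 1 to 3, with conserved residues DTG, G, ILG
  • Integrase: motifs 1 to 4, with a zinc-binding region (Hx H and CX C, with spacing subscripts 4 and 2), a core (D D E), and a DNA-binding region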
9
Roles of Retroid Agents
1) Disease
   a) Retroviruses
      1) Exogenous, infectious: HIV, HTLV
      2) Endogenous associations: breast cancer, testicular tumors, insulin-dependent diabetes, multiple sclerosis, rheumatoid arthritis, schizophrenia, and systemic lupus erythematosus
   b) LINEs: insertional mutagenesis
      1) Hemophilia A
      2) Muscular dystrophies: Duchenne and Fukuyama-type congenital
      3) X-linked disorders: Alport Syndrome-Diffuse Leiomyomatosis and Chronic Granulomatous Disease
2) Regulation of cellular genes and reproduction
3) Telomere maintenance
4) Repair of broken dsDNA
5) Exchange of genetic information among and between organisms
10
Possible function of HERV-W
11
What is the host genomic environment of active Retroid Agents?
[Diagram labels: Predicted functional RT, Predicted Retroid genome, Real Contig, Real Chromosome]
What roles do Retroid Agents play in disease, development, reproduction, and evolution throughout the three domains of life?
12
Status of the Human Genome Project
  • 3,200,000 Kbp of the euchromatic portion of the
    human chromosomes are being sequenced
  • Heterochromatic portion is not being done
  • As of July, 2003
  • 99.9% of euchromatic portion to 99% accuracy

13
(No Transcript)
14
Genome Parsing Suite (jGPS)
Stage 1
1) WU-tBLASTn RT queries through database
2) Raw hits sorted by contig and direction
3) Remove WU-tBLASTn redundancy
4) Compound small hits likely to be from one gene
5) Remove false positives due to query cross-coverage
6) Quality assessment of unique RT hits: motifs, perfect, 1F/S, etc.
Excise 14kb DNA centered on potential RT
Stage 2
1) WU-tBLASTn each gene component through 14kb DNA databases
2) Raw hits for each component sorted by contig and direction
3) Remove WU-tBLASTn redundancy
4) Compound small hits likely to be from one gene
5) Build each Retroid genome using the RT-outward approach
6) Quality assessment of potential Retroid genomes: presence of components, stop codons, frameshifts, and percent identity to each query component
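A minimal sketch of three of the Stage 1 operations (sorting hits by contig and direction, compounding small hits, and excising a 14kb window centered on a potential RT) might look like the following. It is not the jGPS code. The `Hit` record, the `max_gap` threshold of 300 bases, and the 0-based half-open coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one hit; coordinates are 0-based, end-exclusive."""
    contig: str
    strand: str  # "+" or "-"
    start: int
    end: int


def sort_hits(hits):
    """Sort raw hits by contig, then direction, then start position."""
    return sorted(hits, key=lambda h: (h.contig, h.strand, h.start))


def compound_hits(hits, max_gap=300):
    """Merge neighbouring hits on the same contig and strand.

    max_gap is an assumed threshold; the slides do not state one.
    """
    merged = []
    for h in sort_hits(hits):
        last = merged[-1] if merged else None
        if (last is not None and last.contig == h.contig
                and last.strand == h.strand and h.start - last.end <= max_gap):
            last.end = max(last.end, h.end)  # extend the compound hit
        else:
            merged.append(Hit(h.contig, h.strand, h.start, h.end))
    return merged


def excise_window(hit, contig_length, size=14_000):
    """Return (start, end) of a window of `size` bases centered on the hit,
    shifted inward when the hit lies near either end of the contig."""
    center = (hit.start + hit.end) // 2
    start = max(0, center - size // 2)
    end = min(contig_length, start + size)
    start = max(0, end - size)
    return start, end
```

For example, `excise_window(Hit("c1", "+", 49850, 50150), 100_000)` gives `(43000, 57000)`, a 14,000-base window with the hit at its center.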
15
Verification of Low Frequency Hits
Excise each LFH from the genome
HTLV1 Query 80      IDLKDAFFQIPLPKQ-FQPYFAFTVP 104
                    IDLKD FF IPL K  F   FAFT P
RTX   Sbjct 7910057 IDLKDCFFTIPLAKEDFEK-FAFTIP 7910131
Use as tBLASTn query on GenBank
Expected result:
  1) Same initial query that retrieved hit from HGD, or
  2) More similar to another RT NOT in the initial query set
Observed result:
RTX   Query 1    IDLKDCFFTIPLAKEDFEKFAFTIP 25
                 IDLKDCFFTIPLAKEDFEKFAFTIP
HERVK Sbjct 5173 IDLKDCFFTIPLAKEDFEKFAFTIP 5247
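The difference between the two alignments can be made concrete by counting identical columns. The sketch below is only an illustration: the sequences are copied from the slide (the HTLV1 row is assumed to rejoin across the transcript's line break), and the helper name `identity` is hypothetical.

```python
def identity(aligned_a, aligned_b):
    """Return (identical columns, total columns) for two gapped, aligned strings.

    A column counts as identical only when both residues match and neither
    is a gap character ("-").
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    same = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return same, len(aligned_a)


# Initial hit: HTLV1 query against the excised low frequency hit (RTX).
htlv1 = "IDLKDAFFQIPLPKQ-FQPYFAFTVP"
rtx_gapped = "IDLKDCFFTIPLAKEDFEK-FAFTIP"

# Follow-up search: RTX as query against GenBank, best match in HERVK.
rtx = "IDLKDCFFTIPLAKEDFEKFAFTIP"
hervk = "IDLKDCFFTIPLAKEDFEKFAFTIP"

print(identity(htlv1, rtx_gapped))  # (17, 26)
print(identity(rtx, hervk))         # (25, 25)
```

On these strings the excised hit matches HERVK at every position, while it matches the original HTLV1 query in 17 of 26 aligned columns. That corresponds to outcome 2 on the slide: the hit is more similar to an RT that was not in the initial query set.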
16
False Positives due to BLAST error
(tBLASTn on the human genome)
[Diagram: Probe X (reading frame 1) and Probe Y (reading frame 2) aligned to the same position on the human genome, shown in two arrangements]
WARNING: It appears that in MOST cases tBLASTn returns ALL hits in the SAME position without regard to reading frame, but in SOME cases it does not?
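One way to screen for this kind of false positive is to flag pairs of hits from different probes that overlap on the genome but fall in different reading frames. The sketch below is an assumption-laden illustration, not the method used in the lecture. It assumes 1-based inclusive nucleotide coordinates, considers forward-strand frames only, and uses a hypothetical `FrameHit` record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameHit:
    """Hypothetical forward-strand hit; coordinates are 1-based and inclusive."""
    probe: str
    start: int
    end: int


def reading_frame(hit):
    """Forward reading frame (1, 2, or 3) implied by the hit's start position."""
    return (hit.start - 1) % 3 + 1


def overlaps(a, b):
    """True when the two hits share at least one nucleotide position."""
    return a.start <= b.end and b.start <= a.end


def cross_frame_pairs(hits):
    """Pairs of hits from different probes that overlap in different frames."""
    pairs = []
    for i, a in enumerate(hits):
        for b in hits[i + 1:]:
            if (a.probe != b.probe and overlaps(a, b)
                    and reading_frame(a) != reading_frame(b)):
                pairs.append((a, b))
    return pairs


hits = [
    FrameHit("Probe X", 100, 174),  # frame 1
    FrameHit("Probe Y", 101, 175),  # frame 2, same region as Probe X
    FrameHit("Probe Y", 500, 574),  # elsewhere on the genome
]
suspect = cross_frame_pairs(hits)
```

Here `suspect` holds the single Probe X / Probe Y pair at positions 100 to 175. These are candidates for manual review rather than automatic removal.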
17
RT Motif Chart for all 30 Probes
Probe     I         II        III      IV         V         VI
LINE      ILIPKPGRD LMNIDAKIL TGTRQGCP SLFADDMIVY RIKYLGIQL PCSWVGRIN
LHERV     WPVQKTDGS YAAIDLANA TVLPQGYI VHYIDDIMLI SVKFLGSSG HISYLGVLF
EHERV     LPVPKPGTK FTCLDLKDA TQLPQRFK LQYVDDLLLG QVCYLGFTI VREFLGAVG
FHERV     ILPIKKPDG FSVLDFKDF TILHQGFR LQHEDDLLLC KVSYLGLII LLSFLGLVG
WHERV     LGVQKPNRQ FTVLDLQDA TILPQGFR SVGVDDLLLA SQQYLGLKL LRGFLGVIG
FRDHERV   ILTVKKTNG FSVLDFKNF TVLPQGFR LQYMDDLLIC AIQYLGIIM FAFLGITR
SHERV     WPVRKPDGT HFVVDLANA TMLPQGYV FHYIDDIMIL SAKLLGVIW FVGFLGYQ
RHERV     NLSGKKQYP FTVLDLKDA TVLPQGFK LQYVDDLLIS TIEYLGFLL LKGFLGMAG
T47DHERV  ILPVKKSDG FTVIDLKVD TVLPQGFT LQYMDDLLIS EVKYLGHLI LRKFLGLVT
KHERV     FVIQKKSGK LIIIDLKDC KVLPQGML IHCIDDILCA PFHYLGMQI FQKLLGDIN
IHERV     ILPVKKSDG FTVIDLKDA TVLPQGFM LQYVDDILIS KVKYLGRLI LRKFLGLVG
HHERV     LPVQKPDKS YSVLDLKDG TVLPQGFR IQYIDELLLC SVTYLGIIL LLSFLGMVG
FMuLV     LPVKKPGTN YTVLDLKDA TRLPQGFK LQYVDDLLLA QVKYLGYLL LREFLGTAG
HTLV1     FPVKKANGT LQTIDLKDA RVLPQGFK LQYMDDILLA TIKFLGQII LQALLGEIQ
SRV2      FVIKKKSGK KIVIDLKDC KVLPQGMA IHYMDDILIA PYTYLGFQI FQKLLGDIN
Snakehead WPVGKPDGS YSSLDISNG TRLPQGFH LQYVDDILLM QVQYLGVNV LRSALGLFN
Spuma     YPVPKPDGR KTTLDLANG TRLPQGFL QVYVDDIYLS TVEFLGFNI LQSILGLLN
FIV       FAIKKKSGK VTVLDIGDA CSLPQGWI YQYMDDIYIG PYTWMGYEL LQKLAGKIN
HIV1      FAIKKKDST VTVLDVGDA NVLPQGWK YQYMDDLYVG PFLWMGYEL IQKLVGKLN
Dirs      FTVPKPGTN MVKLDIKKA KTMPFGLS IAYLDDLLIV SITFLGLQI PRKLAGLKG
Gypsy     VLVPKKDGT FTTLDLHSG TVMPFGLV NVYLDDILIF ETEFLGYSI AQRFLGMIN
Caulimo   KRRGKKRMV FSSFDCKSG NVVPFGLK CVYVDDILVF KINFLGLEI LQRFLGILT
Badna     EVAQKPRIV FSKFDLKAG NVCPFGIA LLYIDDILIA EVEYLGVEI LQAYLGLLN
HBV       FLVDKNPHN WLSLDVSAA RKIPMGVG FSYMDDVVLG SLNFMGYVI IVGLLGFAA
Copia     WTITKRPEN KYQIDYEET MRLPQGIS LLYVDDVVIA IKHFIGIRI CRSLIGCLM
Intron    VGGEKGPYS TGRIDDQEN GLTPKTEF VRYADDLLLG TVEFPGMVI KFRNLGNSI
Retron    TVEKKGPEK ILNIDLEDF NLLPQGAP TRYADDLTLS QRKVTGLVI HHIFCGKSS
PMAUP     VYIPKANGK FPSVDLAYL NGVPQGAS IMYADDGILC SVKFLGLEF YIQVLGYLP
Archaea   IEIPKKSGG LLEFDIKGL KGTPQGGV ERYADDSVIH KFDFLGYTF WVNYYGLFY
HTERT     RFIPKPDGL FVKVDVTGA QGIPQGSI LRLVDDFLLV EDEALGGTA RRKLFGVLR
18
The score of a given motif is calculated by
[equation not in transcript]
M, M1, and M2 are based on the number of amino acids in a motif found in common between a known RT query sequence and the potential RT:
  • M is a count of amino acid identities
  • M1 is a count of conservative substitutions within the groups (ILMV, AG, ST, DE, NQ, FY, RK)
  • M2 accounts for older substitutions within the groups (LIMV, AGST, DENQ, FYW, RKH)
The overall OSM score is calculated by
[equation not in transcript]
T motifs is the number of motifs comprising the OSM
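The three counts can be sketched as below. The scoring equations themselves are not in the transcript, so this computes only M, M1, and M2 and makes no attempt to combine them. Treating the tiers as mutually exclusive (a position is counted once, in the first tier it satisfies) is an assumption, as is the function name `motif_counts`.

```python
# Substitution groups as listed on the slide.
CONSERVATIVE = ("ILMV", "AG", "ST", "DE", "NQ", "FY", "RK")
OLDER = ("LIMV", "AGST", "DENQ", "FYW", "RKH")


def _same_group(a, b, groups):
    """True when both residues belong to one of the given groups."""
    return any(a in g and b in g for g in groups)


def motif_counts(query_motif, candidate_motif):
    """Return (M, M1, M2) for two ungapped motifs of equal length.

    M  = identical positions
    M1 = non-identical positions within a conservative group
    M2 = remaining positions within one of the broader "older" groups
    """
    if len(query_motif) != len(candidate_motif):
        raise ValueError("motifs must have equal length")
    m = m1 = m2 = 0
    for q, c in zip(query_motif, candidate_motif):
        if q == c:
            m += 1
        elif _same_group(q, c, CONSERVATIVE):
            m1 += 1
        elif _same_group(q, c, OLDER):
            m2 += 1
    return m, m1, m2


# Motif IV of RHERV against motif IV of FMuLV, both taken from the motif chart.
print(motif_counts("LQYVDDLLIS", "LQYVDDLLLA"))  # (8, 1, 1)
```

In this example eight positions are identical, I/L is a conservative substitution (ILMV), and S/A falls only in the broader AGST group.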
19
(No Transcript)
20
Figure 1. Gene products of two common Retroid agents found in the human genome. A. The Human Endogenous Retrovirus (HERV) is bounded by two long terminal repeats (5' and 3' LTRs), with three major genes: GAG, which encodes proteins essential for ribonuclear protein complex formation and capsid assembly; POL, which encodes the enzymatic core of the virus, including the protease (Pr), Reverse Transcriptase (RT), the tether (T) that connects the RT and Ribonuclease H (RH) domains, and the Integrase (IN); and ENV, which encodes the membrane proteins necessary for exogenous particle formation. B. The LINE agent contains many of the same components as a retrovirus, but lacks LTRs, GAG, and ENV, and has a reduced enzymatic core that includes an apurinic-apyrimidinic endonuclease (APE) instead of the IN. A leucine zipper protein (LZ) is found in ORF I, and the enzymatic core in ORF II. UN is a conserved region of unknown function. LINEs are bounded by untranslated regions (UTRs) that encode a promoter (P) found in the 5' UTR and a polyadenylation signal and poly A tail (A(n)) found near the 3' UTR. Both agents are flanked by Target Site Duplications (TSDs), which are 7-21 base host genomic repeats that are hallmarks of integration of a reverse-transcribed DNA.
21
Distribution of Significant WU-tBLASTn Hits Per
Query Sequence.
GPS Stage 2 Full Agent Analysis
GPS Stage 1 RT Analysis
22
(No Transcript)
23
Distribution of Significant WU-tBLASTn Hits Per
Chromosome.
GPS Stage 1 RT Analysis
GPS Stage 2 Full Agent Analysis
LINEs/HERVs
LINEs
24
(No Transcript)
25
Distribution of Significant WU-tBLASTn Hits on a
Query Per Motif Basis.
Number Of Motifs
26
Looking at the environment of each Retroid Agent
Truncated LINE inserted into Intron 6
Truncated L1MB1 inserted into Intron 6
Truncated L1PA5 inserted into Intron 8
Truncated LINE inserted into Intron 18

Chromosome 21 contig NT_029490
TPTE Gene
Figure 3. Looking at the environment of each Retroid Genome. In this example, four truncated LINEs are found within three different introns (6, 8, and 18) of a putative Tyrosine Phosphatase gene (TPTE). Insertions of Retroid genomes into introns may have little effect on a gene, or may allow for gene shuffling. In this case none of the coding region for the gene was disrupted, which suggests that Retroid sequence information may be used to make introns, that selection favors insertions that do not disrupt coding capacity, or that introns provide the preferential target site for transposition. The black lines represent the exons of the TPTE gene.
27
(No Transcript)
28
Distribution of Retroid Agents on Human
Chromosomes
(November, 2002 Freeze)
Query: 30 distinct reverse transcriptase sequences representing 18 subgroups were used to query NCBI's Human Genome Database.
Results:
1) Retroid Agents are not randomly distributed on human chromosomes.
2) Chromosomes X and Y have the highest percent Retroid Agent sequence.
3) Of those remaining, Chromosome 4 has the highest and Chromosome 20 the lowest percent Retroid Agent sequence.
Only two chromosomes, 19 and 21, are without at least one intact and potentially active LINE. Using exact sequence lengths for each hit of each category indicated in the table of data, the November freeze of the human genome contains at least 2.84% unique RT sequences and 1.98% full-length LINEs.
29
New hypotheses from discovery-based research
1) Low frequency RT-like sequences (not from LINEs or ERVs) are discernible in the Human Genome.
2) Human low frequency RT-like sequences are remnants of ancient invasions.
3) Human low frequency RT-like sequences are remnants of failed invasions.
4) The pattern of low frequency RT-like sequences is unique in each organismal genome.
5) Both unique and trans-organismal patterns of low frequency RT-like sequences are found in Eukaryotes.
What mechanisms could be maintaining these signals?
  • Gene conversion, an event without a mechanism.
  • Transcriptional inactivation due to methylation of CpG regions.
  • Translational recoding.
  • Complementation.

30
Hugh Richardson, Ph.D., M.S., Post Doc.
Brad Crowther, B.S., Bioinformatician I/Lab Manager
Vijay Raghavan, B.S., Systems Admin/Programmer
Crystal Hepp, B.S., Bioinformatician I
Angela Olson, Undergraduate
Ashwini Talasila, Undergraduate

Dr. Marcella McClure, P.I. (Marcie)