Provided by: chrisby
1
Bioinformatics I, Sequence Analysis
Lecture 1: Introduction, Genome Edit Distance,
UNIX, vi, Pseudocode.
2
Topics covered in this course
  • Alignment, Multiple alignment. Homology/similarity
  • Motif finding, footprint finding, gene finding
  • Sorting, clustering, tree building, molecular
    evolution
  • Database structure, database searching,
    statistical significance.
  • Sequence models, Markov models.
  • Protein/RNA secondary structure prediction.
  • Genomics, proteomics.
  • Ontologies, algorithms.

3
Part 1: a story of mice and men
4
Of Mice and Men
  • Mouse genes and human genes are 80 to 95%
    identical, but their locations on the chromosome
    are largely scrambled.
  • Mouse and human have a common ancestor about 80
    MYA (million years ago). Since then, we have
    evolved independently.
  • Evolution occurs by point mutations and
    rearrangements.

Order of genes in a Human Chromosome
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Order of genes in a Mouse Chromosome
ABVWCUTSRQPKLMNOJIHGFEDXYZ
5
Synteny
Conservation of gene order. A cluster of genes
that occur in the same order in different genomes
is a "syntenic group".
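As a rough illustration of the definition, the sketch below splits the mouse gene order from the previous slide into maximal runs of genes that are also neighbours in the human order. The function name `syntenic_blocks` and the "+", "-", "." labels are assumptions made for this example, and real synteny detection works on annotated genomes rather than single letters.

```python
def syntenic_blocks(reference, other):
    """Split `other` into maximal runs of genes that are adjacent in `reference`.

    Each run is returned as (genes, orientation), where orientation is
    "+" (same order), "-" (reversed order), or "." (a single gene).
    """
    pos = {gene: i for i, gene in enumerate(reference)}
    blocks = []
    start = 0
    step = 0  # becomes +1 or -1 once the direction of the current run is known
    for i in range(1, len(other) + 1):
        if i < len(other):
            d = pos[other[i]] - pos[other[i - 1]]
            if d in (1, -1) and step in (0, d):
                step = d
                continue  # gene i extends the current run
        # The run ends just before gene i (or at the end of the chromosome).
        blocks.append((other[start:i], {1: "+", -1: "-", 0: "."}[step]))
        start, step = i, 0
    return blocks


human = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
mouse = "ABVWCUTSRQPKLMNOJIHGFEDXYZ"
for genes, orientation in syntenic_blocks(human, mouse):
    print(orientation, genes)
```

Runs marked "-" are groups whose gene order is conserved but inverted, which is what a reversal leaves behind.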
6
Syntenic group in bacteria
7
(No Transcript)
8
Of Mice and Men
  • Gene rearrangement occurs by "reversals".

(Figure: two 5'-to-3' chromosome diagrams with genes A, B, C and D, before and after a reversal.)
How many reversals does it take to switch Mouse
to Man?
9
A plausible theory
...for how reversals can occur is illegitimate
synapsis during prophase I of meiosis.
(Figure: a looped 5'-to-3' chromosome with genes A, B, C and D.)
10
Chromosomal evolution is a series of flip-flops
ancestral mammal
123456789
432156789 436512789 435612789 435698721
123498765 123789465 123789645 873219645
human
mouse
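The two chains on this slide can be checked mechanically. The sketch below tests whether each arrangement differs from the previous one by exactly one reversed segment. The names `is_single_reversal`, `lineage_a` and `lineage_b` are assumptions for this example; the transcript does not say which chain belongs to which species, so they are left unlabelled.

```python
def is_single_reversal(before, after):
    """Return True if `after` equals `before` with one contiguous segment reversed."""
    if len(before) != len(after) or before == after:
        return False
    # Locate the first and last positions where the two orders differ.
    i = 0
    while before[i] == after[i]:
        i += 1
    j = len(before) - 1
    while before[j] == after[j]:
        j -= 1
    # The differing stretch must read backwards in the new order.
    return before[i:j + 1][::-1] == after[i:j + 1]


ancestor = "123456789"
lineage_a = ["432156789", "436512789", "435612789", "435698721"]
lineage_b = ["123498765", "123789465", "123789645", "873219645"]

for lineage in (lineage_a, lineage_b):
    chain = [ancestor] + lineage
    steps_ok = all(is_single_reversal(a, b) for a, b in zip(chain, chain[1:]))
    print(chain[-1], "reached in", len(lineage), "steps; all single reversals:", steps_ok)
```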
11
The Sloppy Cook
The sloppy cook at a pancake diner makes pancakes
of all different sizes and stacks them
haphazardly.
The waiter likes the pancakes to be stacked with
the largest on the bottom and the smallest on
top. On the way to the table, using only one hand
with a spatula, he flips the pancakes until they
are arranged by size. How does he do it in the
minimum number of flips?
12
In-class exercise
Given the arrangement below, flip the pancakes
until they are in order. How many flips? (You
can order the numbers instead of the pancakes.)
123456
642315
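For a stack this small, the minimum number of flips can be found by brute force. The sketch below runs a breadth-first search over all arrangements reachable by spatula flips. It assumes that 642315 is the starting stack, 123456 is the goal, the top of the stack is the left end, and a flip of size k turns over the top k pancakes; the names `min_flips` and `apply_flips` are made up for this example.

```python
from collections import deque


def apply_flips(stack, flips):
    """Replay a list of flip sizes on a stack written top-first."""
    for k in flips:
        stack = stack[:k][::-1] + stack[k:]
    return stack


def min_flips(start, goal):
    """Breadth-first search over prefix reversals; returns a shortest list of flip sizes."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        stack = queue.popleft()
        if stack == goal:
            flips = []
            while parent[stack] is not None:
                stack, k = parent[stack]
                flips.append(k)
            return flips[::-1]
        for k in range(2, len(stack) + 1):
            nxt = stack[:k][::-1] + stack[k:]
            if nxt not in parent:
                parent[nxt] = (stack, k)
                queue.append(nxt)
    return None  # goal is not a rearrangement of start


flips = min_flips("642315", "123456")
print(len(flips), "flips:", flips)
```

The search visits at most 6! = 720 arrangements here, but the number of states grows factorially, so this approach only suits short stacks.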
13
In-class exercise
  • Write detailed instructions on how to stack six
    pancakes by flipping. The instructions should
    not depend on the starting order.
  • Give your instructions to your partner.
  • Follow your partner's written instructions to
    stack the following six "pancakes" in order:
    125436. (Order them smallest to largest. The plate
    is on the right side.)
  • Run the "instruction set". Describe what happens.
  • Fix bugs. Repeat as time permits.
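One possible instruction set, written as code rather than prose, is sketched below: repeatedly bring the largest unsorted pancake to the top, then flip it down onto the sorted pile next to the plate. This is only one valid answer, it does not aim for the minimum number of flips, and the name `pancake_sort` is an assumption for this example. The plate is taken to be at the right end, as in the exercise.

```python
def pancake_sort(stack):
    """Sort smallest-to-largest using only prefix flips; the plate is at the right end.

    Returns (sorted_stack, flips), where each flip is the number of
    pancakes lifted from the top (left end) and turned over.
    """
    stack = list(stack)
    flips = []
    for size in range(len(stack), 1, -1):
        biggest = stack.index(max(stack[:size]))
        if biggest == size - 1:
            continue  # the largest unsorted pancake is already in place
        if biggest != 0:
            # Bring the largest unsorted pancake to the top.
            stack[:biggest + 1] = reversed(stack[:biggest + 1])
            flips.append(biggest + 1)
        # Flip it down to the bottom of the unsorted part.
        stack[:size] = reversed(stack[:size])
        flips.append(size)
    return stack, flips


ordered, flips = pancake_sort("125436")
print("".join(ordered), flips)
```

Each pancake needs at most two flips to reach its place, so the procedure never depends on the starting order.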

14
Part 2: the basics
15
Biological Macromolecules are Conveniently
Represented as Linear Strings.
DNA: nucleotides. 4-character alphabet.
Protein: amino acids. 20-character alphabet.
Lipids, carbohydrates, other stuff: not linear
heteropolymers. Not easily represented as a
sequence.
Your task: MEMORIZE THE ALPHABET of AMINO ACIDS
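A minimal sketch of what "linear string over an alphabet" means in practice: a sequence can be checked against the 4-letter DNA alphabet or the 20-letter alphabet of standard amino acids. The function name `guess_molecule` is an assumption for this example, and the check is only a heuristic, since a short peptide made up solely of A, C, G and T would be reported as DNA.

```python
DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def guess_molecule(sequence):
    """Guess the molecule type from the set of letters used in `sequence`."""
    letters = set(sequence.upper())
    if letters <= DNA_ALPHABET:
        return "DNA"
    if letters <= PROTEIN_ALPHABET:
        return "protein"
    return "unknown"


print(guess_molecule("gtcgggaaga"))
print(guess_molecule("MVGSLNCIVAVSQNMGIGKNGDLPWPPLRNEF"))
```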
16
A DNA Sequence
   1 gtcgggaaga tggcgctacg tctgctgcgg agggcggcgc gcggagctgc ggcggcggcg
  61 ctgctgaggc tgaaagcgtc tctagcagct gatatcccca gacttggata tagttcctca
 121 tcccatcaca agtacatccc ccggagggca gtgctttatg tacctggaaa tgatgaaaag
 181 aaaataaaga agattccatc cctgaatgta gattgtgcag tgctcgactg tgaggatgga
 241 gtggctgcaa acaaaaagaa tgaagctcga ctgagaattg taaaaactct tgaagacatt
 301 gatctgggcc ctactgaaaa atgtgtgaga gtcaactcag tttccagtgg tctggcggaa
 361 gaagacctag agaccctttt gcaatcccgg gtccttcctt ccagcctgat gctaccaaag
 421 gtggaaagtc ctgaagaaat ccagtggttt gcagacaaat tttcattcca cttaaaaggc
 481 cgaaaacttg aacaaccaat gaatttaatc ccttttgtgg aaactgcaat gggtttgctc
 541 aattttaagg cagtgtgtga agaaaccctg aaggtcgggc ctcaagtagg tctctttcta
 601 gatgcagtcg tttttggagg agaagacttt cgagccagca taggtgcaac aagtagtaaa
 661 gaaaccctgg atattctcta cgcccggcaa aagattgttg tcatagcgaa agcctttggt
 721 ctccaagccg tagatctggt gtacattgac tttcgagatg gagctgggct gcttagacag
 781 tcacgagaag gagccgccat gggcttcact ggtaagcagg tgattcaccc taaccaaatt
 841 gccgtggtcc aggagcagtt ttctccttcc cctgaaaaaa ttaagtgggc tgaagaactg
 901 attgctgcct ttaaagaaca tcaacaatta ggaaaggggg cctttacttt ccaagggagt
 961 atgatcgaca tgccattact gaagcaggcc cagaacactg ttacgcttgc cacctccatc
1021 aaggaaaaat gatctgttaa atgaagctgt catcggggaa tgctgagctg caatgaccat
1081 tactgtagag ttacaacaag agggtaaagt tcatacatgg cgacctgtgt caaatccgtc
1141 cattgatctg ccctccagca cacatttact gagcttctgt tacgtgcctg tggttcttgg
1201 aaagagcttt ttccttctct acaaggagga atctgatgca actgacatcc tcaatagcta
1261 cagagaactt gcaaaggagt agagagaatg tttgaggtcc agccttggtg tagagaagcg
1321 gcagaaacag aaatcccaaa aggtgtcatg cttggctcca gctctgtgct ctcaggactc
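The listing interleaves position numbers (1, 61, 121, ...) with blocks of ten bases, so it has to be cleaned before it can be treated as a single string. The sketch below drops the numeric tokens and joins the rest; the function name `parse_numbered_sequence` is an assumption for this example, and only the first 120 bases of the slide are included to keep it short.

```python
def parse_numbered_sequence(text):
    """Join a position-numbered sequence listing into one continuous string."""
    return "".join(tok for tok in text.split() if not tok.isdigit())


listing = """
  1 gtcgggaaga tggcgctacg tctgctgcgg agggcggcgc gcggagctgc ggcggcggcg
 61 ctgctgaggc tgaaagcgtc tctagcagct gatatcccca gacttggata tagttcctca
"""
seq = parse_numbered_sequence(listing)
gc_fraction = sum(seq.count(base) for base in "gc") / len(seq)
print(len(seq), "bases; GC fraction", round(gc_fraction, 2))
```

Because only all-digit tokens are discarded, the same function works whether the numbers start each row or are scattered through the text.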
17
RNA
(Figure: structures of the four RNA bases A, G, C and U.)
The 2' OH is missing in DNA.
Note: U has no methyl here.
18
A Protein Sequence
MVGSLNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQNL
VIMGKKTWFSIPEKNRPLKGRINLVLSRELKEPPQGAHFLSRSLDDALKL
TEQPELANKVDMVWIVGGSSVYKEAMNHPGHLKLFVTRIMQDFESDTFFP
EIDLEKYKLLPEYPGVLSDVQEEKGIKYKFEVYEKND
19
Study this page: http://www.johnkyrk.com/aminoacid.html
20
Chemical classification of the amino acids using
the Venn diagram.
Taylor, W. R. (1986). Classification of amino
acid conservation. J. Theor. Biol. 119, 205-218.
21
Genetic Code
wobble base
Coding regions of DNA have special constraints on
mutation.
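One way to see the constraint is to count how often a single-base change leaves the encoded amino acid unchanged at each codon position. The sketch below builds the standard genetic code (DNA letters, "*" for stop) and computes that fraction per position. The names `translate` and `synonymous_fraction` are assumptions for this example, and stop codons are counted as if "*" were an amino acid.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed with the third base changing fastest (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding-strand DNA string, reading whole codons from position 0."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def synonymous_fraction(position):
    """Fraction of single-base changes at a codon position (0, 1 or 2) that keep the amino acid."""
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        for base in BASES:
            if base != codon[position]:
                mutant = codon[:position] + base + codon[position + 1:]
                total += 1
                same += CODON_TABLE[mutant] == aa
    return same / total


print(translate("ATGGTGGGT"))
for position in range(3):
    print("codon position", position + 1, round(synonymous_fraction(position), 3))
```

The third position, the wobble base, tolerates far more silent changes than the first two, which is why mutations in coding regions are not equally likely to survive at every site.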
22
Learn UNIX basics
If you don't know UNIX, sit next to someone who
does.
Use the handouts or unixetc.pdf from the course
web page
23
In-class exercises: learning UNIX
  • List all files starting with lowercase L (ls).
  • Make a course directory. Call it bioinfo.
    (mkdir)
  • Change directories to your new directory. (cd)
  • Copy the file "lotsofjunk" from my directory to
    your directory. (see whiteboard)
  • Count the number of lines in "lotsofjunk" that have
    the string "product". (grep) answer___________

24
In-class exercises: learning UNIX
  • Find out how to sort by field (man sort). Sort
    the file lotsofjunk using field 2, numerically.
  • Same thing. Pipe the output to more (|).
  • Same thing. Instead of piping, redirect the
    output to a file called sortedjunk (>).
  • Edit the file lotsofjunk using vi. (vi, see
    next page)
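To cross-check answers from the grep and sort exercises, the same two operations can be sketched in Python. The sample lines below are hypothetical, since the contents of lotsofjunk are not shown, and the function names `count_matching` and `sort_by_field` are assumptions for this example. Fields are taken to be separated by whitespace, and a missing or non-numeric field is treated as zero.

```python
def count_matching(lines, text):
    """Count the lines that contain `text` anywhere."""
    return sum(text in line for line in lines)


def sort_by_field(lines, field=2):
    """Sort lines numerically on a whitespace-separated field (1-based)."""
    def key(line):
        parts = line.split()
        try:
            return float(parts[field - 1])
        except (IndexError, ValueError):
            return 0.0  # missing or non-numeric field
    return sorted(lines, key=key)


# Hypothetical stand-in for a few lines of lotsofjunk.
sample = ["geneC 30 product", "geneA 4 product", "geneB 12 hypothetical"]
print(count_matching(sample, "product"))
for line in sort_by_field(sample):
    print(line)
```

A plain text sort would place 12 before 4; sorting on the numeric value is what the exercise asks for.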
25
In-class exercises: vi lotsofjunk
  • Try each of the move commands and write what it
    does on a separate page.
  • Try each of the delete commands and write what
    it does on a separate page. Hit Undo ("u") after
    each delete, so the file is unchanged.
  • Try each of the modify commands and describe
    what it does on a separate page. Don't forget to
    hit escape!
  • Search for the string "protein_id" (/). Copy
    those lines (yy) and put them at the top of the
    file (1Gp).
  • Delete the remaining lines (:.,$d), and save the
    file as "junk" (:w junk).