TGAC Electronic Sequences - PowerPoint PPT Presentation

About This Presentation
Title:

TGAC Electronic Sequences

Slides: 26
Provided by: jamesg61

Transcript and Presenter's Notes



1
TGAC Electronic Sequences
Insulin, sequenced in 1955
2
This is a sequence (a protein, in single-letter amino acid code)
  • MALWTRLRPLLALLALWPPPPARAFVNQHLCGSHLVEALYLVCGERGFFY
    TPKARREVEGPQVGALELAGGPGAGGLEGPPQKRGIVEQCCASVCSLYQL
    ENYCN

This is also a sequence (DNA encoding the protein above, plus flanking bases)
ACCATGATTACGCCAAGCTTGCATGCCTGCAGGTCGGCTGCATTCGAGGC
TGCCAGCAAGCAGGTCCTCGCAGCCCCGCCATGGCCCTGTGGACACGCCT
GCGGCCCCTGCTGGCCCTGCTGGCGCTCTGGCCCCCCCCCCCGGCCCGCG
CCTTCGTCAACCAGCATCTGTGTGGCTCCCACCTGGTGGAGGCGCTGTAC
CTGGTGTGCGGAGAGCGCGGCTTCTTCTACACGCCCAAGGCCCGCCGGGA
GGTGGAGGGCCCGCAGGTGGGGGCGCTGGAGCTGGCCGGAGGCCCGGGCG
CGGGCGGCCTGGAGGGGCCCCCGCAGAAGCGTGGCATCGTGGAGCAGTGC
TGTGCCAGCGTCTGCTCGCTCTACCAGCTGGAGAACTACTGTAACTAGGC
CTGCCCCGACAAATAAACCCTTACGAGCAAG
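The two sequences above are related: reading the DNA three bases at a time from the ATG of ATGGCCCTGTGG... gives MALW..., the start of the protein. Below is a minimal Python sketch of that translation step. It assumes the standard genetic code (NCBI translation table 1) and a caller who passes a sequence that already begins at the start codon, since the DNA above carries extra bases before it. The names `translate` and `CODON_TABLE` are illustrative, not taken from any particular package.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated in TCAG order,
# first base varying slowest, so they line up with this 64-letter string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding sequence that starts at its start codon.

    Whitespace and line breaks are ignored, so pasted text works as-is.
    Translation stops at the first in-frame stop codon ('*');
    unrecognized codons (e.g. containing N) become 'X'.
    """
    dna = "".join(cds.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First 51 bases of the coding region in the DNA above, starting at its ATG.
print(translate("ATGGCCCTGTGGACACGCCT GCGGCCCCTGCTGGCCCTGCTGGCGCTCTGG"))
# prints MALWTRLRPLLALLALW
```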
3
How to work with sequences
  • Cut and paste sequences
  • Save files as text from sequence repositories
  • Unix vs. Windows text format (line endings)
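Windows text files end each line with a carriage return plus line feed (CRLF, `\r\n`), while Unix uses a bare line feed (LF, `\n`), so a stray `\r` can end up attached to sequence lines when files move between systems. A minimal Python sketch of converting between the two; the helper names `to_unix`, `to_windows` and `convert_file` are hypothetical.

```python
def to_unix(text: str) -> str:
    """Convert Windows line endings (CRLF) to Unix line endings (LF)."""
    return text.replace("\r\n", "\n")


def to_windows(text: str) -> str:
    """Convert Unix line endings (LF) to Windows line endings (CRLF).

    Normalizing to LF first avoids turning an existing CRLF into CR CR LF.
    """
    return to_unix(text).replace("\n", "\r\n")


def convert_file(src: str, dst: str, windows: bool = False) -> None:
    # newline="" disables Python's own newline translation, so the file
    # keeps exactly the line endings produced by the helpers above.
    with open(src, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(to_windows(text) if windows else to_unix(text))
```

For example, `convert_file("insulin_win.txt", "insulin_unix.txt")` would rewrite a Windows-saved file with Unix line endings (the file names are made up for illustration).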

4
How to Use Notepad
  • Start > Accessories > click Notepad
  • Start > Run > type notepad > click OK

5
How to Use Cut and Paste on PCs
  • Select the sequence
  • Press Ctrl+C to copy
  • Press Ctrl+V to paste
  • These options are also available through the Edit
    menu of most programs

6
Whence come sequences?
  • Individual researchers
  • Genome sequencing projects
  • Patent applications

7
Whither do the sequences go?
8
(No Transcript)
9
How can sequences be obtained?
  • Batch ENTREZ

10
What is available via Entrez?
  • PubMed
  • Protein
  • Nucleotide
  • Structure
  • Genome
  • PopSet
  • OMIM
  • Taxonomy
  • Books
  • ProbeSet
  • 3D Domains
  • UniSTS
  • SNP
  • CDD

11
ENTREZ XRefs Then
12
ENTREZ XRefs Now
13
Sequence Formats
  • Raw
  • Fasta
  • ASN.1
  • GenBank/GenPept
  • DDBJ
  • Ensembl
  • Graphics
  • XML

- convertible via ReadSeq
14
Fasta Format
  • Begins with > (greater than)
  • No whitespace between > and the identifier
  • Description line, ended by a line feed
  • Sequence in single-letter code

>gi|23200275|pdb|1KP5|B Chain B, Cyclic Green Fluorescent Protein
TGSRHHHHHHSRKGEELFTGVVPILVELDG
DVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFXVQCFS
RYPDHMKRHDFFKSAMPEGYVQERTISFKDDGNYKTRAEVKFEGDTLVNR
IELKGIDFKEDGNILGHKLEYNYNSHNVYITADKQKNGIKANFKIRHNIE
DGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLL
EFVTAAGLVPRGTGLYK
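The FASTA layout is simple enough to read and write directly. A minimal Python sketch, assuming plain-text input that may hold several records; `parse_fasta` and `format_fasta` are illustrative names, and the 60-letter line width is an arbitrary choice, since FASTA does not fix one.

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (description, sequence) pairs from FASTA-formatted text.

    Blank lines are skipped and whitespace inside sequence lines is removed.
    Sequence lines appearing before the first '>' line are ignored.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():  # handles both LF and CRLF files
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def format_fasta(description: str, sequence: str, width: int = 60) -> str:
    """Write one record: '>' + description, then the sequence wrapped at width."""
    lines = [">" + description]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


example = ">gi|23200275|pdb|1KP5|B Chain B, Cyclic Green Fluorescent Protein\nTGSRHHHHHHSRKGEELFTGVVPILVELDG\n"
description, sequence = parse_fasta(example)[0]
print(description.split("|"))
# prints ['gi', '23200275', 'pdb', '1KP5', 'B Chain B, Cyclic Green Fluorescent Protein']
```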
15
GenBank Format
16
Graphics Format
17
RefSeqs
  • Reference sequence standards
  • Available for chromosomes, mRNAs, proteins
  • Non-redundant
  • Curated
  • Status and history are available
  • Avoids redundancy in GenBank

18
Nomenclature
  • NW_ whole genome shotgun assembly
  • NT_ BAC based contig
  • NM_ Reference transcript
  • XM_ Predicted transcript
  • NP_ Reference protein
  • XP_ Predicted protein
  • NC_ Reference chromosome (including mitochondrial
    and chloroplast genomes)
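Because the record type is encoded in the prefix, it can be read off mechanically. A minimal Python sketch covering only the prefixes listed above (RefSeq defines others, such as NR_ and NG_, that this slide does not list). The function names are hypothetical and the accession numbers in the example are made up for illustration.

```python
from typing import Optional, Tuple

# Prefix table taken from the slide above; other RefSeq prefixes are not covered.
REFSEQ_PREFIXES = {
    "NW_": "Whole genome shotgun assembly",
    "NT_": "BAC based contig",
    "NM_": "Reference transcript",
    "XM_": "Predicted transcript",
    "NP_": "Reference protein",
    "XP_": "Predicted protein",
    "NC_": "Reference chromosome (including mitochondrial and chloroplast genomes)",
}


def refseq_type(accession: str) -> str:
    """Describe an accession such as 'NM_123456.1' by its three-character prefix."""
    prefix = accession.strip().upper()[:3]
    return REFSEQ_PREFIXES.get(prefix, "not a RefSeq prefix listed here")


def split_version(accession: str) -> Tuple[str, Optional[int]]:
    """Split 'NM_123456.1' into ('NM_123456', 1); the version is None if absent."""
    base, _, version = accession.strip().partition(".")
    return base, int(version) if version else None


print(refseq_type("XM_123456.2"))    # prints Predicted transcript
print(split_version("XM_123456.2"))  # prints ('XM_123456', 2)
```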

19
Other Sequence Formats
  • GCG
  • DNA Strider
  • Intelligenetics
  • NBRF

- convertible via ReadSeq
20
Multiple Sequence Formats
  • MSF
  • Phylip
  • PAUP
  • Fitch
  • Pretty

- convertible via ReadSeq
21
Converting Sequence Formats
  • ReadSeq
  • SEQIO
  • GCG programs, e.g. FROMEMBL, TOFASTA, etc.

22
Batch ENTREZ
  • A method for obtaining large numbers of
    sequences by supplying a file containing a list
    of GI or accession numbers.
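A comparable batch download can also be scripted against NCBI's E-utilities, which is a different route from the Batch Entrez web form. Its efetch endpoint accepts a comma-separated list of identifiers. A minimal Python sketch that reads one GI or accession number per line and builds the request URL. The helper names and the file name are hypothetical, and the code only builds the URL without sending it.

```python
from typing import Iterable, List
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_ids(text: str) -> List[str]:
    """One GI or accession number per line; blank lines are skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def efetch_url(ids: Iterable[str], db: str = "nuccore") -> str:
    """Build an efetch URL that returns the records as FASTA text.

    db is an E-utilities database name, e.g. "nuccore" (Nucleotide)
    or "protein".
    """
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)  # the commas are encoded as %2C


# Hypothetical usage:
# with open("accessions.txt") as f:
#     url = efetch_url(read_ids(f.read()))
# urllib.request.urlopen(url) would then download the sequences.
```

For long ID lists, NCBI's E-utilities documentation recommends sending the parameters with HTTP POST instead of a very long URL. NCBI's usage guidelines also limit how fast requests may be sent.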

23
Charles Babbage
24
(No Transcript)
25
(No Transcript)