Information organization - PowerPoint PPT Presentation

Provided by: jmom

1
Information organization
  • April 19, 2011
  • Learning objectives: Understand how information is
    stored in GenBank. Learn how to read a GenBank
    flat file. Learn how to search GenBank for
    information. Understand the difference between the
    header, features, and sequence. Distinguish
    between a primary database and a secondary
    database.
  • Homework 3 due today.

2
What is GenBank?
  • Gene sequence database
  • Annotated records that represent single
    contiguous stretches of DNA or RNA; a record may
    have more than one coding region.
  • Generated from direct submissions to the DNA
    sequence databases from the authors.
  • Part of the International Nucleotide Sequence
    Database Collaboration.

3
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
4
History of GenBank
  • Began with the Atlas of Protein Sequence and
    Structure (Dayhoff et al., 1965)
  • In 1986 it shared data with EMBL and in 1987 it
    shared data with DDBJ.
  • Primary database
  • Examples of secondary databases derived from
    GenBank: UniProt, EST database.
  • The GenBank Flat File (GBFF) is a human-readable
    form of a GenBank record.

8
General Comments on GBFF
  • Three sections
  • 1) Header: information about the whole record
  • 2) Features: descriptions of annotations, each
    represented by a key.
  • 3) Nucleotide sequence: each record ends with //
    on its last line.
  • DNA-centered
  • Translated sequence is a feature
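
The three-section layout above can be sketched in a few lines of Python. This is only an illustration, assuming the usual flat-file convention that the feature table starts at a line beginning with FEATURES and the sequence section starts at BASE COUNT or ORIGIN. The function name split_genbank_record and the abbreviated SAMPLE record are hypothetical, made up for this example.

```python
def split_genbank_record(text):
    """Split one flat-file record into (header, features, sequence) line lists."""
    header, features, sequence = [], [], []
    current = header                      # everything before FEATURES is header
    for line in text.splitlines():
        if line.startswith("//"):         # end-of-record marker
            break
        if line.startswith("FEATURES"):
            current = features
        elif line.startswith(("BASE COUNT", "ORIGIN")):
            current = sequence
        current.append(line)
    return header, features, sequence


# Abbreviated record adapted from the U49845 example in this presentation.
SAMPLE = """\
LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999
ACCESSION   U49845
FEATURES             Location/Qualifiers
     source          1..5028
     CDS             <1..206
BASE COUNT     1510 a   1074 c    835 g   1609 t
ORIGIN
        1 gatcctccat atacaacggt
//
"""

header, features, sequence = split_genbank_record(SAMPLE)
print(len(header), len(features), len(sequence))
```

For real work a maintained parser is the safer choice; the sketch is only meant to make the header / features / sequence boundaries concrete.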

9
Feature Keys
  • Purpose
  • 1) Indicates biological nature of sequence
  • 2) Supplies information about changes to
    sequences
  • Feature Key     Description
  • conflict        Separate determinations of the same seq. differ
  • rep_origin      Origin of replication
  • protein_bind    Protein binding site on DNA
  • CDS             (Protein) coding sequence

10
Feature Keys-Terminology
  • Feature Key     Location/Qualifiers
  • CDS             23..400
  •                 /product="alcohol dehydro."
  •                 /gene="adhI"
  • The feature CDS is a coding sequence beginning at
    base 23 and ending at base 400 that has a product
    called alcohol dehydrogenase and corresponds to
    the gene called adhI.
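
As a rough illustration of how a key, a location, and qualifiers fit together, the following Python sketch parses the CDS example above. It assumes the standard flat-file qualifier syntax /name="value" and a simple start..end location; parse_simple_feature is a hypothetical helper name, not part of any library.

```python
import re


def parse_simple_feature(key_line, qualifier_lines):
    """Parse a 'KEY start..end' line plus its '/name="value"' qualifier lines."""
    key, location = key_line.split()
    start, end = (int(n) for n in location.split(".."))
    qualifiers = {}
    for raw in qualifier_lines:
        match = re.match(r'/(\w+)="?([^"]*)"?$', raw.strip())
        if match:
            qualifiers[match.group(1)] = match.group(2)
    return {"key": key, "start": start, "end": end, "qualifiers": qualifiers}


feature = parse_simple_feature(
    "CDS 23..400",
    ['/product="alcohol dehydro."', '/gene="adhI"'],
)

# Locations are 1-based and inclusive, so the length is end - start + 1.
feature_length = feature["end"] - feature["start"] + 1
print(feature["key"], feature_length, feature["qualifiers"])
```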

11
Feature Keys-Terminology (Cont.)
  • Feature Key     Location/Qualifiers
  • CDS             join(544..589,688..1032)
  •                 /product="T-cell recep. B-ch."
  •                 /partial
  • The feature CDS is a partial coding sequence
    formed by joining the indicated elements to form
    one contiguous sequence encoding a product called
    T-cell receptor beta-chain.
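
The joining described above can be sketched in Python as follows. The sketch assumes a join location that contains only plain start..end segments (no nested operators), and the helper names parse_join, joined_length, and extract_joined are hypothetical. The toy sequence is invented purely to show the slicing.

```python
import re


def parse_join(location):
    """Return 1-based inclusive (start, end) pairs from 'join(a..b,c..d)'."""
    inner = re.fullmatch(r"join\s*\((.*)\)", location.strip()).group(1)
    return [tuple(int(n) for n in part.split("..")) for part in inner.split(",")]


def joined_length(segments):
    """Total number of bases covered by the joined segments."""
    return sum(end - start + 1 for start, end in segments)


def extract_joined(sequence, segments):
    """Concatenate the segments in order to form one contiguous sequence."""
    return "".join(sequence[start - 1:end] for start, end in segments)


segments = parse_join("join(544..589,688..1032)")
total = joined_length(segments)

# Toy example: bases 1-3 joined to bases 7-9.
toy = extract_joined("AAACCCGGGTTT", [(1, 3), (7, 9)])
print(segments, total, toy)
```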

12
Record from GenBank
LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999
DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p
            (AXL2) and Rev7p (REV7) genes, complete cds.
ACCESSION   U49845
VERSION     U49845.1  GI:1293613
KEYWORDS    .
SOURCE      baker's yeast.
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Ascomycota; Hemiascomycetes; Saccharomycetales;
            Saccharomycetaceae; Saccharomyces.

  • SCU49845: Locus name
  • PLN: GenBank division (plant, fungal and algal)
  • 21-JUN-1999: Modification date
  • cds: Coding sequence
  • ACCESSION: Accession number (never changes)
  • VERSION: Nucleotide sequence identifier in
    accession.version form (changes when there is a
    change in sequence)
  • GI: GeneInfo identifier (changes whenever there is
    a change)
  • KEYWORDS: Word or phrase describing the sequence
    (not based on controlled vocabulary). Not used in
    newer records.
  • SOURCE: Common name for organism
  • ORGANISM: Formal scientific name for the source
    organism and its lineage, based on the NCBI
    Taxonomy Database
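
To make the accession.version idea concrete, here is a small Python sketch that takes a VERSION line apart. It assumes the line has the standard form "VERSION accession.version GI:number", as in older records such as this one; parse_version_line is a hypothetical helper name.

```python
def parse_version_line(line):
    """Return (accession, version, gi) from a GenBank VERSION line.

    gi is None when the line carries no GI: field.
    """
    fields = line.split()
    accession, _, version = fields[1].rpartition(".")
    gi = None
    for field in fields[2:]:
        if field.startswith("GI:"):
            gi = int(field[3:])
    return accession, int(version), gi


accession, version, gi = parse_version_line("VERSION     U49845.1  GI:1293613")
print(accession, version, gi)
```

The accession part stays fixed for the life of the record, while the number after the dot is what increases when the sequence itself changes.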
13
Record from GenBank (cont.1)
REFERENCE   1  (bases 1 to 5028)
  AUTHORS   Torpey,L.E., Gibbs,P.E., Nelson,J. and Lawrence,C.W.
  TITLE     Cloning and sequence of REV7, a gene whose function is required
            for DNA damage-induced mutagenesis in Saccharomyces cerevisiae
  JOURNAL   Yeast 10 (11), 1503-1509 (1994)
  MEDLINE   95176709
REFERENCE   2  (bases 1 to 5028)
  AUTHORS   Roemer,T., Madden,K., Chang,J. and Snyder,M.
  TITLE     Selection of axial growth sites in yeast requires Axl2p, a
            novel plasma membrane glycoprotein
  JOURNAL   Genes Dev. 10 (7), 777-793 (1996)
  MEDLINE   96194260
REFERENCE   3  (bases 1 to 5028)
  AUTHORS   Roemer,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (22-FEB-1996) Terry Roemer, Biology, Yale University,
            New Haven, CT, USA

  • Oldest reference first
  • MEDLINE: Medline UID
  • Submitter of sequence (always the last reference)
14
Record from GenBank (cont.2)
There are three parts to a feature: a key (a
keyword that indicates the functional group), a
location (instructions for finding the feature),
and qualifiers (auxiliary information about the
feature).

FEATURES             Location/Qualifiers
     source          1..5028
                     /organism="Saccharomyces cerevisiae"
                     /db_xref="taxon:4932"
                     /chromosome="IX"
                     /map="9"
     CDS             <1..206
                     /codon_start=3
                     /product="TCP1-beta"
                     /protein_id="AAA98665.1"
                     /db_xref="GI:1293614"
                     /translation="SSIYNGISTSGLDLNNGTIADMRQLGIVESYKLKRAVVSSASEA
                     AEVLLRVDNIIRARPRTANRQHM"

  • Keys: source, CDS
  • Location: 1..5028, <1..206
  • Qualifiers: /organism, /db_xref, /codon_start, ...
  • Values: the text after each "="
  • <1..206: Partial sequence on the 5' end. The 3'
    end is complete.
  • /codon_start: Start of open reading frame
  • Descriptive free text must be in quotations
  • /db_xref: Database cross-refs
  • /protein_id: Protein sequence ID
  • /translation: Note only a partial sequence
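
The interplay between a partial location and /codon_start can be checked with a little arithmetic. The Python sketch below assumes the standard notation in which "<" marks a feature that is partial at its 5' end and ">" one that is partial at its 3' end; parse_location and complete_codons are hypothetical helper names, and the translation string is the one shown on this slide.

```python
import re


def parse_location(location):
    """Return (start, end, partial_5prime, partial_3prime) for '<a..>b' forms."""
    match = re.fullmatch(r"(<?)(\d+)\.\.(>?)(\d+)", location)
    partial_5, start, partial_3, end = match.groups()
    return int(start), int(end), bool(partial_5), bool(partial_3)


def complete_codons(start, end, codon_start):
    """Count whole codons when translation begins at base codon_start of the feature."""
    translated_bases = (end - start + 1) - (codon_start - 1)
    return translated_bases // 3


start, end, partial_5, partial_3 = parse_location("<1..206")
codons = complete_codons(start, end, codon_start=3)

translation = (
    "SSIYNGISTSGLDLNNGTIADMRQLGIVESYKLKRAVVSSASEA"
    "AEVLLRVDNIIRARPRTANRQHM"
)
print(codons, len(translation))
```

With /codon_start=3 the first two bases are skipped, leaving 204 bases, that is 68 codons. The translation shown has 67 residues, which is what one would expect if the final codon is a stop codon.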
15
Record from GenBank (cont.3)
     gene            687..3158
                     /gene="AXL2"
     CDS             687..3158
                     /gene="AXL2"
                     /note="plasma membrane glycoprotein"
                     /codon_start=1
                     /function="required for axial budding pattern of S.
                     cerevisiae"
                     /product="Axl2p"
                     /protein_id="AAA98666.1"
                     /db_xref="GI:1293615"
                     /translation="MTQLQISLLLTATISLLHLVVATPYEAYPIGKQYPPVARVN. . .
     gene            complement(3300..4037)
                     /gene="REV7"
     CDS             complement(3300..4037)
                     /gene="REV7"
                     /codon_start=1
                     /product="Rev7p"
                     /protein_id="AAA98667.1"
                     /db_xref="GI:1293616"
                     /translation="MNRWVEKWLRVYLKCYINLILFYRNVYPPQSFDYTTYQSFNLPQ . . .

  • 687..3158 and complement(3300..4037): further
    locations
  • Both translations are cut off here (. . .)
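
A complement(...) location means the feature lies on the strand opposite the one printed in the record, so its sequence is the reverse complement of the indicated bases. The Python sketch below illustrates this for plain and complement locations only (no join); extract_feature and reverse_complement are hypothetical helper names, and the toy sequence is invented for the example.

```python
import re

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_feature(sequence, location):
    """Extract 'a..b' or 'complement(a..b)' from a 1-based sequence string."""
    match = re.fullmatch(r"(complement\()?(\d+)\.\.(\d+)\)?", location)
    on_complement = match.group(1) is not None
    start, end = int(match.group(2)), int(match.group(3))
    sub = sequence[start - 1:end]
    return reverse_complement(sub) if on_complement else sub


toy = "aaacccgggttt"
forward = extract_feature(toy, "4..9")
reverse = extract_feature(toy, "complement(1..6)")
print(forward, reverse)
```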
16
Record from GenBank (cont.4)

BASE COUNT     1510 a   1074 c    835 g   1609 t
ORIGIN
        1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg
       61 ccgacatgag acagttaggt atcgtcgaga gttacaagct aaaacgagca gtagtcagct
  . . .
//
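
The sequence section can be read back into a plain string with a short Python sketch like the one below. It assumes the usual ORIGIN layout, in which each line starts with a position number followed by blocks of ten bases; read_origin is a hypothetical helper name, and the input is only the two-line excerpt shown on this slide, not the full 5028 bp record.

```python
from collections import Counter


def read_origin(lines):
    """Join the base blocks of an ORIGIN section into one sequence string."""
    blocks = []
    for line in lines:
        if line.startswith("//"):          # end-of-record marker
            break
        if line.startswith("ORIGIN"):
            continue
        parts = line.split()
        blocks.extend(parts[1:])           # parts[0] is the position number
    return "".join(blocks)


EXCERPT = [
    "ORIGIN",
    "        1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg",
    "       61 ccgacatgag acagttaggt atcgtcgaga gttacaagct aaaacgagca gtagtcagct",
    "//",
]

excerpt_seq = read_origin(EXCERPT)
excerpt_counts = Counter(excerpt_seq)      # per-base tally, like BASE COUNT
print(len(excerpt_seq), dict(excerpt_counts))
```

Run over a complete record, the same tally should agree with the BASE COUNT line.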
17
[Diagram labels: protein, RNA, DNA, cDNA]
DNA databases derived from GenBank containing
data for a single gene
  • Non-redundant (nr)
  • dbGSS
  • dbSTS

RNA (cDNA) databases derived from
GenBank containing data for a single gene
  • dbEST
  • UniGene

18
Types of primary databases carrying biological
information
  • GenBank/EMBL/DDBJ
  • dbEST: expressed sequence tags; single-pass cDNA
    sequences (high error freq.)
  • It is non-redundant
  • PDB: three-dimensional structure coordinates of
    biological molecules
  • PROSITE: database of protein domain/function
    relationships.

19
Workshop