Title: Biology
1 Biology
Computer Science
2 Introduction/Brief History
Protein Database
1. In 1974, Margaret Dayhoff at the National
Biomedical Research Foundation (NBRF) devised the
concept of the protein family and superfamily,
defined by sequence similarity, as a means of
organizing and classifying proteins. The
collection center became the Protein Information
Resource (PIR). In 1988, PIR became
PIR-International through a collaboration among
NBRF, the Munich Information Center for Protein
Sequences (MIPS), and the Japan International
Protein Information Database (JIPID).
2. In 1986, the SWISS-PROT database was founded by
Amos Bairoch of the Department of Medical
Biochemistry at the University of Geneva.
TrEMBL is a computer-annotated supplement to
SWISS-PROT that contains translations of
nucleotide sequences from EMBL
(http://us.expasy.org/sprot/). The data can be
accessed through the Sequence Retrieval System
(SRS). It is maintained at the Swiss Institute of
Bioinformatics.
A set of matrices (tables) was devised to
reflect percent accepted mutations (PAM); each
entry gives the probability that one amino acid
is replaced by another.
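As a rough illustration of the idea behind such tables, a matrix of substitution probabilities for one evolutionary step can be multiplied by itself to estimate probabilities over longer distances. The sketch below uses a hypothetical three-letter alphabet and invented probabilities (`PAM1_TOY`); it is not the published PAM data, and the helper names are assumptions made for this example.

```python
# Toy substitution matrix over a hypothetical 3-letter alphabet.
# Row = original residue, column = replacement; each row sums to 1.
# The numbers are invented for illustration, not taken from real PAM tables.
ALPHABET = ["A", "B", "C"]
PAM1_TOY = [
    [0.98, 0.01, 0.01],
    [0.02, 0.97, 0.01],
    [0.01, 0.03, 0.96],
]


def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_pow(m, steps):
    """Apply the one-step matrix `steps` times (steps >= 0)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mul(result, m)
    return result


# Probabilities after 50 toy steps: the diagonal shrinks as changes accumulate.
pam_toy_50 = mat_pow(PAM1_TOY, 50)
for letter, row in zip(ALPHABET, pam_toy_50):
    print(letter, [round(p, 3) for p in row])
```

Each row of the result still sums to 1, because every row describes what a given residue has become after the chosen number of steps.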
3 Introduction/Brief History
DNA Database
1. DNA sequence databases were first assembled at
Los Alamos National Laboratory (LANL), New Mexico,
by Walter Goad and colleagues (the GenBank
database) and at the European Molecular Biology
Laboratory (EMBL) in Heidelberg, Germany. In 1979,
Goad established GenBank. LANL collected GenBank
data until 1992, when GenBank came under the
National Center for Biotechnology Information
(NCBI). It can be accessed through Entrez.
2. In 1980, the EMBL database was founded. It is
maintained by the European Bioinformatics
Institute (EBI) in Hinxton, Cambridge, UK. It can
be accessed through SRS.
3. In 1984, the DNA Data Bank of Japan (DDBJ) was
founded in Mishima, Japan.
4. Other databases:
- UniGene: www.ncbi.nlm.nih.gov/UniGene/
- Saccharomyces Genome Database (SGD):
  www.stanford.edu/Saccharomyces/
- EBI Genomes: www.ebi.ac.uk/genomes/
- Genome Biology: www.ncbi.nlm.nih.gov/Genomes/
4 Introduction/Brief History
Protein Motifs Database
Motifs are short sequences of amino acids that
reflect a functional aspect of a protein. Motif
databases also contain protein domains such as
the ATP-binding cassette (ABC domain) or the
kinase domain.
1. Protein Family database (Pfam). Founded in
1996 and maintained by a consortium of scientists
including Erik Sonnhammer (CGB, KI, Sweden), Sean
Eddy (WashU, St Louis, USA), and Richard Durbin,
Alan Bateman and Ewan Birney (Sanger Centre, UK).
2. PROSITE. Created by Amos Bairoch; it is part
of SWISS-PROT.
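To make the notion of a motif concrete, the sketch below translates a PROSITE-style pattern into a Python regular expression and scans a peptide with it. The pattern `[AG]-x(4)-G-K-[ST]` is the commonly cited P-loop (ATP/GTP-binding) motif; the function names and the peptide string are hypothetical and chosen only for this example, and only the basic pattern elements (`x`, `x(n)`, `x(n,m)`, `[..]`, `{..}`) are handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repetition suffix such as (4) or (2,3).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any amino acid except these
        # [..] alternatives and single letters are already valid regex.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        regex += core
    return regex


def find_motif(pattern, sequence):
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = prosite_to_regex(pattern)
    return [(m.start() + 1, m.group()) for m in re.finditer(regex, sequence)]


P_LOOP = "[AG]-x(4)-G-K-[ST]"
peptide = "MKLGPSGAGKSTLL"  # hypothetical peptide, not from any database

print(prosite_to_regex(P_LOOP))
print(find_motif(P_LOOP, peptide))
```

Because `re.finditer` reports non-overlapping matches only, a production scanner would need extra care for overlapping occurrences.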
5 Introduction/Brief History
Macromolecular 3D Structures Database
Protein Data Bank (PDB). The primary database
for 3D structures of biological molecules.
Started in the 1970s at Brookhaven National
Laboratory on Long Island, New York, USA. In
1999, management moved to the Research
Collaboratory for Structural Bioinformatics
(RCSB).
The SCOP (Structural Classification of Proteins)
database was started by Alexey Murzin in 1994
(Lab of Molecular Biology, MRC, Cambridge, UK).
The CATH database (Class, Architecture, Topology,
Homologous superfamily) was started by
Christine Orengo in Janet Thornton's lab
(University College London) in 1996.
6 Introduction/Brief History
Metabolic Pathways Database
9 Web Access: www.ncbi.nlm.nih.gov
10 Organization of GenBank: Traditional Divisions
- Records are divided into 17 Divisions.
  - 11 Traditional
  - 6 Bulk
PRI (28)  Primate
PLN (13)  Plant and Fungal
BCT (11)  Bacterial and Archaeal
INV (7)   Invertebrate
ROD (15)  Rodent
VRL (4)   Viral
VRT (7)   Other Vertebrate
MAM (1)   Mammalian
PHG (1)   Phage
SYN (1)   Synthetic (cloning vectors)
UNA (1)   Unannotated
- Traditional Divisions
  - Direct Submissions (Sequin and BankIt)
  - Accurate
  - Well characterized
Entrez query: gbdiv_xxx[Properties]
From www.ncbi.nlm.nih.gov
11 Organization of GenBank: Bulk Divisions
- Records are divided into 17 Divisions.
  - 11 Traditional
  - 6 Bulk
EST (355)  Expressed Sequence Tag
GSS (132)  Genome Survey Sequence
HTG (62)   High Throughput Genomic
STS (5)    Sequence Tagged Site
HTC (6)    High Throughput cDNA
PAT (17)   Patent
- Bulk Divisions
  - Batch Submission (Email and FTP)
  - Inaccurate
  - Poorly characterized
Entrez query: gbdiv_xxx[Properties]
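The division codes can be turned into Entrez search terms mechanically. The sketch below assumes that the `xxx` placeholder on the slides stands for a lowercase division code followed by the `[Properties]` field tag, and it builds (but does not send) a request URL for the NCBI E-utilities `esearch` endpoint. The endpoint, the `nucleotide` database name, and the helper names are assumptions added for this example, not part of the slides.

```python
from urllib.parse import urlencode

TRADITIONAL = {
    "PRI": "Primate", "PLN": "Plant and Fungal", "BCT": "Bacterial and Archaeal",
    "INV": "Invertebrate", "ROD": "Rodent", "VRL": "Viral",
    "VRT": "Other Vertebrate", "MAM": "Mammalian", "PHG": "Phage",
    "SYN": "Synthetic (cloning vectors)", "UNA": "Unannotated",
}
BULK = {
    "EST": "Expressed Sequence Tag", "GSS": "Genome Survey Sequence",
    "HTG": "High Throughput Genomic", "STS": "Sequence Tagged Site",
    "HTC": "High Throughput cDNA", "PAT": "Patent",
}

# Assumed E-utilities search endpoint; nothing is fetched in this sketch.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def division_query(code, extra_terms=""):
    """Build an Entrez term restricting a search to one GenBank division."""
    code = code.upper()
    if code not in TRADITIONAL and code not in BULK:
        raise ValueError("unknown GenBank division: %s" % code)
    term = "gbdiv_%s[Properties]" % code.lower()
    if extra_terms:
        term = "%s AND %s" % (extra_terms, term)
    return term


def esearch_url(term, db="nucleotide"):
    """Return the URL that would run `term` against the given database."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


term = division_query("PLN", "farnesene synthase")
print(term)
print(esearch_url(term))
```

Keeping the two dictionaries separate makes it easy to tell whether a hit comes from a curated traditional division or from a bulk division.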
12 Other NCBI Databases
- dbSNP: single nucleotide polymorphisms
- GEO: Gene Expression Omnibus
  - microarray and other expression data
- Gene: gene records
  - Unifies LocusLink and Microbial Genomes
- Structure: imported structures (PDB)
  - Cn3D viewer, NCBI curation
- CDD: conserved domain database
  - Protein families (COGs)
  - Single domains (PFAM, SMART, CD)
13 Web Access: www.ncbi.nlm.nih.gov
14 BLAST: Sequence Similarity Searches
19 File Formats of the Sequence Databases
Each sequence is represented by a text record
called a flat file.
- GenBank/GenPept (useful for scientists)
- FASTA (the simplest format)
- ASN.1 and XML (useful for programmers)
20 A Traditional GenBank Record
LOCUS       AY182241                1931 bp    mRNA    linear   PLN 04-MAY-2004
DEFINITION  Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA,
            complete cds.
ACCESSION   AY182241
VERSION     AY182241.2  GI:32265057
KEYWORDS    .
SOURCE      Malus x domestica (cultivated apple)
  ORGANISM  Malus x domestica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
            rosids; eurosids I; Rosales; Rosaceae; Maloideae; Malus.
REFERENCE   1  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Cloning and functional expression of an (E,E)-alpha-farnesene
            synthase cDNA from peel tissue of apple fruit
  JOURNAL   Planta 219, 84-94 (2004)
REFERENCE   2  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (18-NOV-2002) PSI-Produce Quality and Safety Lab,
            USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
            20705, USA
REFERENCE   3  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-JUN-2003) PSI-Produce Quality and Safety Lab,
            USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
            20705, USA
  REMARK    Sequence update by submitter
COMMENT     On Jun 26, 2003 this sequence version replaced gi:27804758.
FEATURES             Location/Qualifiers
     source          1..1931
                     /organism="Malus x domestica"
                     /mol_type="mRNA"
                     /cultivar="'Law Rome'"
                     /db_xref="taxon:3750"
                     /tissue_type="peel"
     gene            1..1931
                     /gene="AFS1"
     CDS             54..1784
                     /gene="AFS1"
                     /note="terpene synthase"
                     /codon_start=1
                     /product="(E,E)-alpha-farnesene synthase"
                     /protein_id="AAO22848.2"
                     /db_xref="GI:32265058"
                     /translation="MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWK
                     NDFLDQSLISKYDGDEYRKLSEKLIEEVKIYISAETMDLVAKLELIDSVRKLGLANLF
                     EKEIKEALDSIAAIESDNLGTRDDLYGTALHFKILRQHGYKVSQDIFGRFMDEKGTLE
                     ...
                     DFLHKNEDLLYNISLIVRLNNDLGTSAAEQERGDSPSSIVCYMREVNASEETARKNIK
                     GMIDNAWKKVNGKCFTTNQVPFLSSFMNNATNMARVAHSLYKDGDGFGDQEKGPRTHI
                     LSLLFQPLVN"
ORIGIN
        1 ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat
       61 tcagagttca cttgcaagct gataatgagc agaaaatttt tcaaaaccag atgaaacccg
      121 aacctgaagc ctcttacttg attaatcaaa gacggtctgc aaattacaag ccaaatattt
      181 ggaagaacga tttcctagat caatctctta tcagcaaata cgatggagat gagtatcgga
      241 agctgtctga gaagttaata gaagaagtta agatttatat atctgctgaa acaatggatt
      ...
//
The Flatfile Format
21 The Header
LOCUS       AY182241                1931 bp    mRNA    linear   PLN 04-MAY-2004
DEFINITION  Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA,
            complete cds.
ACCESSION   AY182241
VERSION     AY182241.2  GI:32265057
KEYWORDS    .
SOURCE      Malus x domestica (cultivated apple)
  ORGANISM  Malus x domestica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
            rosids; eurosids I; Rosales; Rosaceae; Maloideae; Malus.
REFERENCE   1  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Cloning and functional expression of an (E,E)-alpha-farnesene
            synthase cDNA from peel tissue of apple fruit
  JOURNAL   Planta 219, 84-94 (2004)
REFERENCE   2  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (18-NOV-2002) PSI-Produce Quality and Safety Lab,
            USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
            20705, USA
REFERENCE   3  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-JUN-2003) PSI-Produce Quality and Safety Lab,
            USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
            20705, USA
  REMARK    Sequence update by submitter
COMMENT     On Jun 26, 2003 this sequence version replaced gi:27804758.
22 Header: Locus Line
LOCUS       AY182241                1931 bp    mRNA    linear   PLN 04-MAY-2004
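The LOCUS line packs several fields into one row: locus name, sequence length, molecule type, topology, division, and modification date. A minimal sketch of splitting it apart is shown below; it assumes all seven fields are present and separated by whitespace, as in this record, and the function and key names are hypothetical.

```python
from datetime import date

MONTHS = "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split()


def parse_locus(line):
    """Split a GenBank LOCUS line into named fields.

    Assumes the layout seen in this record:
    LOCUS <name> <length> bp <molecule> <topology> <division> <DD-MON-YYYY>
    """
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS":
        raise ValueError("unexpected LOCUS line: %r" % line)
    day, month, year = tokens[7].split("-")
    return {
        "name": tokens[1],
        "length": int(tokens[2]),
        "unit": tokens[3],
        "molecule": tokens[4],
        "topology": tokens[5],
        "division": tokens[6],
        "date": date(int(year), MONTHS.index(month.upper()) + 1, int(day)),
    }


locus = parse_locus("LOCUS AY182241 1931 bp mRNA linear PLN 04-MAY-2004")
print(locus["name"], locus["length"], locus["division"], locus["date"])
```

Records that omit a field (for example the topology) would need a more tolerant parser than this one.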
23 Header: Database Identifiers
- Accession
  - Stable
  - Reportable
  - Universal
ACCESSION   AY182241
VERSION     AY182241.2  GI:32265057
- Version: tracks changes in sequence
- GI number: NCBI internal use
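Since the accession stays stable while the version suffix tracks sequence changes, two identifiers can be compared by splitting at the dot. The following sketch assumes the `accession.version` form shown in the VERSION line; the function names are hypothetical.

```python
def split_version(identifier):
    """Split 'AY182241.2' into ('AY182241', 2)."""
    accession, dot, version = identifier.partition(".")
    if not dot or not accession or not version.isdigit():
        raise ValueError("expected accession.version, got %r" % identifier)
    return accession, int(version)


def sequence_changed(old_id, new_id):
    """True if both identifiers name the same record but different versions."""
    old_acc, old_ver = split_version(old_id)
    new_acc, new_ver = split_version(new_id)
    if old_acc != new_acc:
        raise ValueError("identifiers refer to different records")
    return old_ver != new_ver


print(split_version("AY182241.2"))
print(sequence_changed("AY182241.1", "AY182241.2"))
```

The same split applies to protein identifiers such as the `protein_id` value in the feature table.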
24 Header: Organism
SOURCE      Malus x domestica (cultivated apple)
  ORGANISM  Malus x domestica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
            rosids; eurosids I; Rosales; Rosaceae; Maloideae; Malus.
NCBI-controlled taxonomy
25 The Feature Table
FEATURES             Location/Qualifiers
     source          1..1931
                     /organism="Malus x domestica"
                     /mol_type="mRNA"
                     /cultivar="'Law Rome'"
                     /db_xref="taxon:3750"
                     /tissue_type="peel"
     gene            1..1931
                     /gene="AFS1"
     CDS             54..1784
                     /gene="AFS1"
                     /note="terpene synthase"
                     /codon_start=1
                     /product="(E,E)-alpha-farnesene synthase"
                     /protein_id="AAO22848.2"
                     /db_xref="GI:32265058"
                     /translation="MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWK
                     NDFLDQSLISKYDGDEYRKLSEKLIEEVKIYISAETMDLVAKLELIDSVRKLGLANLF
                     EKEIKEALDSIAAIESDNLGTRDDLYGTALHFKILRQHGYKVSQDIFGRFMDEKGTLE
                     NHHFAHLKGMLELFEASNLGFEGEDILDEAKASLTLALRDSGHICYPDSNLSRDVVHS
                     LELPSHRRVQWFDVKWQINAYEKDICRVNATLLELAKLNFNVVQAQLQKNLREASRWW
                     ANLGIADNLKFARDRLVECFACAVGVAFEPEHSSFRICLTKVINLVLIIDDVYDIYGS
                     EEELKHFTNAVDRWDSRETEQLPECMKMCFQVLYNTTCEIAREIEEENGWNQVLPQLT
                     KVWADFCKALLVEAEWYNKSHIPTLEEYLRNGCISSSVSVLLVHSFFSITHEGTKEMA
                     DFLHKNEDLLYNISLIVRLNNDLGTSAAEQERGDSPSSIVCYMREVNASEETARKNIK
                     GMIDNAWKKVNGKCFTTNQVPFLSSFMNNATNMARVAHSLYKDGDGFGDQEKGPRTHI
                     LSLLFQPLVN"
start (atg)
stop (tag)
Coding sequence
26 The Sequence: 99.99% Accurate
ORIGIN
        1 ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat
       61 tcagagttca cttgcaagct gataatgagc agaaaatttt tcaaaaccag atgaaacccg
      121 aacctgaagc ctcttacttg attaatcaaa gacggtctgc aaattacaag ccaaatattt
      181 ggaagaacga tttcctagat caatctctta tcagcaaata cgatggagat gagtatcgga
      ...
     1741 ggacccacat cctgtcttta ctattccaac ctcttgtaaa ctagtactca tatagtttga
     1801 aataaatagc agcaaaagtt tgcggttcag ttcgtcatgg ataaattaat ctttacagtt
     1861 tgtaacgttg ttgccaaaga ttatgaataa aaagttgtag tttgtcgttt aaaaaaaaaa
     1921 aaaaaaaaaa a
//
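The CDS coordinates from the feature table can be checked against the sequence lines. The sketch below parses the simple `54..1784` span, confirms that its length is a whole number of codons, and looks up the codons at the annotated start and stop positions using the two ORIGIN lines that contain them. It handles only plain `start..end` spans (not `join(...)` or `complement(...)`), and the helper names are hypothetical.

```python
def parse_span(location):
    """Parse a simple 'start..end' feature location (1-based, inclusive)."""
    start, end = location.split("..")
    return int(start), int(end)


def codon_at(origin_line, line_start, position):
    """Return the three bases beginning at `position` within one ORIGIN line.

    `origin_line` holds the bases of the line (spaces allowed) and
    `line_start` is the coordinate printed at the start of that line.
    """
    bases = origin_line.replace(" ", "")
    offset = position - line_start
    return bases[offset:offset + 3]


# ORIGIN lines 1 and 1741 of the record, copied from the sequence above.
LINE_1 = "ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat"
LINE_1741 = "ggacccacat cctgtcttta ctattccaac ctcttgtaaa ctagtactca tatagtttga"

start, end = parse_span("54..1784")
length = end - start + 1          # bases covered by the CDS
codons = length // 3              # includes the stop codon

print(length, length % 3, codons - 1)
print(codon_at(LINE_1, 1, start))        # codon at the annotated start
print(codon_at(LINE_1741, 1741, end - 2))  # codon ending at the annotated end
```

The number of coding codons (total codons minus the stop) should equal the length of the `/translation` qualifier.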
27 GenPept: FASTA format
>gi|32265058|gb|AAO22848.2| (E,E)-alpha-farnesene synthase [Malus x domestica]
MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWKNDFLDQSLISKYDGDEYRKLSEKLIE
EVKIYISAETMDLVAKLELIDSVRKLGLANLFEKEIKEALDSIAAIESDNLGTRDDLYGTALHFKILRQH
GYKVSQDIFGRFMDEKGTLENHHFAHLKGMLELFEASNLGFEGEDILDEAKASLTLALRDSGHICYPDSN
LSRDVVHSLELPSHRRVQWFDVKWQINAYEKDICRVNATLLELAKLNFNVVQAQLQKNLREASRWWANLG
IADNLKFARDRLVECFACAVGVAFEPEHSSFRICLTKVINLVLIIDDVYDIYGSEEELKHFTNAVDRWDS
RETEQLPECMKMCFQVLYNTTCEIAREIEEENGWNQVLPQLTKVWADFCKALLVEAEWYNKSHIPTLEEY
LRNGCISSSVSVLLVHSFFSITHEGTKEMADFLHKNEDLLYNISLIVRLNNDLGTSAAEQERGDSPSSIV
CYMREVNASEETARKNIKGMIDNAWKKVNGKCFTTNQVPFLSSFMNNATNMARVAHSLYKDGDGFGDQEK
GPRTHILSLLFQPLVN
>gi|32265070|gb|AAP75563.1| putative doublecortin domain-containing protein
MAKTGAEDHREALSQSSLSLLTEAMEVLQQSSPEGTLDGNTVNPIYKYILNDLPREFMSSQAKAVIKTTD
DYLQSQFGPNRLVHSAAVSEGSGLQDCSTHQTASDHSHDEISDLDSYKSNSKNNSCSISASKRNRPVSAP
VGQLRVAEFSSLKFQSARNWQKLSQRHKLQPRVIKVTAYKNGSRTVFARVTAPTITLLLEECTEKLNLNM
AARRVFLADGKEALEPEDIPHEADVYVSTGEPFLNPFKKIKDHLLLIKKVTWTMNGLMLPTDIKRRKTKP
VLSIRMKKLTERTSVRILFFKNGMGQDGHEITVGKETMKKVLDTCTIRMNLNLPARYFYDLYGRKIEDIS
KGKH
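FASTA is simple enough to read with a few lines of code: each record starts with a definition line beginning with `>`, followed by one or more lines of sequence. The sketch below is a minimal reader run on a short made-up sample (`SAMPLE`), not on the records above; the function name is hypothetical.

```python
def parse_fasta(text):
    """Return a list of (definition_line, sequence) pairs from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up sample with two short records, for illustration only.
SAMPLE = """>toy1 hypothetical first record
MEFRVHLQ
ADNEQK
>toy2 hypothetical second record
MAKTGAED
"""

for definition, sequence in parse_fasta(SAMPLE):
    print(definition, len(sequence))
```

Sequence lines are simply concatenated, so the line width used in the file does not matter to the reader.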
28 Abstract Syntax Notation: ASN.1
Seq-entry ::= set {
  class nuc-prot ,
  descr {
    title "Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA, complete cds." ,
    source {
      org {
        taxname "Malus x domestica" ,
        common "cultivated apple" ,
        db {
          {
            db "taxon" ,
            tag id 3750 } } ,
        orgname {
          name binomial {
            genus "Malus" ,
            species "x domestica" } ,
          mod {
            {
              subtype cultivar ,
              subname "'Law Rome'" } ,
            {
              subtype old-name ,
              subname "Malus domestica" ,
              attrib "(10)cultivar='Law Rome'" } } ,
          lineage "Eukaryota; Viridiplantae; Streptophyta; Embryophyta;
 Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
 rosids; eurosids I; Rosales; Rosaceae; Maloideae; Malus" ,
          gcode 1 ,
          ...
30 Choose a reference organism