Title: Bioinformatics for Genomic and Proteomic data analysis
1Bioinformatics for Genomic and Proteomic data
analysis
-- Gene Prediction
-- Alignment techniques (BLAST, PSI-BLAST)
-- Major databases and retrieval techniques.
-- Predicting Function, domains etc.
-- Finding homology between sequences,
identifying repeats etc. (DOTPLOT).
-- Predicting physico-chemical properties of
proteins (ProtParam).
-- Predicting signal peptides and transmembrane
proteins (SignalP).
-- Phylogenetic analysis
-- Analysis of Protein structure and conformation
(Rasmol, SwissPDBViewer, VMD etc).
-- Protein structure prediction: homology
modeling (SwissModel, Modeller).
- Some practical applications
-- Sequence analysis
-- Structure analysis
2Major Bioinformatics databases, Search engines
and data formats.
By Sachin Pundhir, Bioinformatics
sub-centre, DAVV, Indore
3Database
- Collection of records and files
- Organized for a particular purpose
- Tables
- Tuples (records)
- Attributes
- Values
4BIO520 Student Database
- 1998
  Name   ID    Grade
  Amy    123   A
  Joe    456   B
  Sue    789   C
  ...
5Database Operations
1998
  Name   ID    Grade
  Amy    123   A
  Joe    456   B
  Sue    789   C
- Tables
- Create, delete
- Tuples (Records)
- Read, write, delete
- Search, sort, modify, print
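The table and tuple operations listed above can be tried on the BIO520 example. What follows is a minimal sketch using Python's built-in sqlite3 module. The table name students_1998, the column names, and the function name demo_student_db are illustrative assumptions and do not come from the slides.

```python
import sqlite3


def demo_student_db():
    """Run create/write/search/modify/delete/sort on an in-memory table."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Table: create (attribute names are assumptions for this sketch)
    cur.execute(
        "CREATE TABLE students_1998 (name TEXT, id INTEGER PRIMARY KEY, grade TEXT)"
    )

    # Tuples (records): write
    cur.executemany(
        "INSERT INTO students_1998 VALUES (?, ?, ?)",
        [("Amy", 123, "A"), ("Joe", 456, "B"), ("Sue", 789, "C")],
    )

    # Search: read one value from one record
    cur.execute("SELECT grade FROM students_1998 WHERE name = ?", ("Joe",))
    joe_grade = cur.fetchone()[0]

    # Modify one record, delete another
    cur.execute("UPDATE students_1998 SET grade = ? WHERE id = ?", ("A", 456))
    cur.execute("DELETE FROM students_1998 WHERE id = ?", (789,))

    # Sort: remaining records in reverse alphabetical order of name
    cur.execute("SELECT name, id, grade FROM students_1998 ORDER BY name DESC")
    rows = cur.fetchall()

    # Table: delete
    cur.execute("DROP TABLE students_1998")
    conn.close()
    return joe_grade, rows


if __name__ == "__main__":
    print(demo_student_db())
```

An in-memory database is used so the sketch leaves no file behind; a real course database would use a file path instead of ":memory:".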
6International Nucleotide Sequence Database
Collaboration (INSDC)
- Consists of
- DDBJ (Japan)
- GenBank (USA)
- EMBL Nucleotide Sequence Database (Europe).
- The three databases exchange new and updated data
on a daily basis to achieve optimal
synchronisation.
7Bioinformatics databases
- Protein sequence database
- GenPept: protein sequence database.
- UniProtKB/Swiss-Prot: curated protein sequence
database, minimal level of redundancy and high
level of integration with other databases.
- UniProtKB/TrEMBL: computer-annotated supplement
of Swiss-Prot that contains all the
translations of EMBL nucleotide sequence
entries not yet integrated in Swiss-Prot.
- RefSeq: well-curated, non-redundant database.
- Nucleotide sequence database
- GenBank: nucleotide sequence database. Highly
redundant.
- DDBJ: DNA Data Bank of Japan.
- EMBL: nucleotide sequence database.
- RefSeq: integrated, non-redundant set of
sequences, including genomic DNA, transcript
(RNA), and protein products, for major research
organisms.
- Structure Database
- PDB: Protein Data Bank
- MMDB: Molecular Modeling Database
Primary databases
8GenBank Record
- Header
- information that applies to the whole record
- Features
- annotations on the record
- Sequence
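The three parts of a record can be separated mechanically, because in the GenBank flat-file format the feature table begins at the FEATURES line, the sequence begins after the ORIGIN line, and the record ends at "//". The sketch below assumes that layout. The toy record and the function name split_genbank_record are made up for illustration and are not a real database entry.

```python
# Toy record: every value below is invented for illustration only.
TOY_RECORD = """\
LOCUS       TOY0001                   12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Made-up toy record for illustration.
ACCESSION   TOY0001
VERSION     TOY0001.1
FEATURES             Location/Qualifiers
     source          1..12
     CDS             1..12
ORIGIN
        1 atgcatgcat gc
//
"""


def split_genbank_record(text):
    """Split one flat-file record into header lines, feature lines, and sequence."""
    header, features, sequence_lines = [], [], []
    current = header  # everything before FEATURES belongs to the header
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if line.startswith("FEATURES"):
            current = features
        elif line.startswith("ORIGIN"):
            current = sequence_lines
            continue  # the ORIGIN line itself carries no sequence
        current.append(line)
    # Sequence lines mix position numbers, spaces, and bases; keep letters only.
    sequence = "".join(ch for line in sequence_lines for ch in line if ch.isalpha())
    return header, features, sequence


header, features, sequence = split_genbank_record(TOY_RECORD)
print(len(header), len(features), sequence)
```

For real work an established parser is the safer choice; this sketch only makes the header/features/sequence division visible.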
9GenBank Record
Header
Molecule Type
Locus Name
Sequence Length
Modification Date
Accession Number
Version Number
GenBank Division
10GenBank Record
FEATURE
Link to Seq
11GenBank Record
Sequence
12Using Entrez
- An integrated database
- search and retrieval system
13WWW Access
Entrez BLAST
14Entrez Database Integration
[Diagram: Entrez connects PubMed abstracts, 3-D structures, taxonomy, genomes, protein sequences, and nucleotide sequences; the links are labeled Word weight, VAST, Phylogeny, and BLAST.]
15The (ever expanding) Entrez System
16Database Searching with Entrez
- Using limits and field restriction to find human
MutL homolog
- Linking and neighboring with MutL
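Field restriction in Entrez is written by attaching a field name in square brackets to a search term and combining terms with Boolean operators. The sketch below builds such a query string for the human MutL search. The helper name build_entrez_query is hypothetical; the field names Title, Organism, and All Fields are taken from the Preview/Index field list shown later in this deck.

```python
def build_entrez_query(restrictions):
    """Join (term, field) pairs into one field-restricted Entrez query string."""
    return " AND ".join(f"{term}[{field}]" for term, field in restrictions)


# Restrict "MutL" to the title and "human" to the organism field.
query = build_entrez_query([("MutL", "Title"), ("human", "Organism")])
print(query)
```

The resulting string can be pasted into the Entrez search box; AND narrows the result set, so each added restriction returns the same number of records or fewer.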
17Global Entrez Search
18Document Summaries: MutL[All Fields]
19Entrez Nucleotides: Limits and Preview/Index
Tabs
20Entrez Nucleotides Limits
Exclude bulk sequences
21Entrez Nucleotides Limits
Title Definition
Exclude Bulk Sequences
22Document Summaries Limits
23Adding Terms Preview/Index
Accession, All Fields, Author Name, EC/RN Number,
Feature key, Filter, Gene Name, Issue, Journal Name,
Keyword, Modification Date, Organism, Page Number,
Primary Accession, Properties, Protein Name,
Publication Date, SeqID String, Sequence Length,
Substance Name, Text Word, Title, Uid, Volume
24Human MutL Search Results
25Human MutL RefSeq
26NM_000249 Links
27Literature Links
28NM_000249 PubMed
Books
29Books Link
30OMIM Human Disease Genes
31Sequence Links
32NM_000249 Related Sequences
similarity
33Taxonomy Link
- The Tax Browser
- NCBI's Taxonomy
34Taxonomy Link
35The Tax Browser
36Batch Downloads
37Batch Downloads FASTA and GI list
38Batch Entrez / Entrez-utilities
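A batch download through the Entrez utilities comes down to one URL that names the database, the list of identifiers, and the return format. The sketch below only builds that URL and does not contact NCBI. The function name build_fasta_batch_url is hypothetical, and the parameter values (db=nuccore, rettype=fasta, retmode=text) are assumptions to check against the current E-utilities documentation.

```python
from urllib.parse import urlencode

# Base address of the E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_fasta_batch_url(ids, db="nuccore"):
    """Return an efetch URL requesting FASTA text for a list of identifiers."""
    params = {
        "db": db,               # assumed name of the nucleotide database
        "id": ",".join(ids),    # identifiers are sent as one comma-separated list
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


url = build_fasta_batch_url(["NM_000249"])
print(url)
```

Opening the printed URL in a browser or passing it to urllib.request.urlopen would return the FASTA records; for large batches, NCBI's usage guidelines on request frequency apply.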
39NCBI Protein Databases
- GenPept: GenBank, EMBL, DDBJ CDS translations
- RefSeq: mRNA based (NP_) and genome based (XP_)
- Swiss-Prot: curated, high-quality protein reviews
- PIR: Protein Information Resource, Georgetown
University
- PRF: Protein Research Foundation
- PDB: Protein Data Bank sequences from structures
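RefSeq accessions carry their record type in a three-character prefix, which is how NP_ and XP_ proteins, and the NM_000249 transcript used in this deck, can be told apart at a glance. The sketch below maps a few common prefixes to a description. The table is deliberately incomplete, and the names REFSEQ_PREFIXES and describe_refseq_accession are hypothetical.

```python
# Partial prefix table; RefSeq defines more prefixes than are listed here.
REFSEQ_PREFIXES = {
    "NM_": "mRNA",
    "NP_": "protein (mRNA based)",
    "XM_": "mRNA (genome based, model)",
    "XP_": "protein (genome based, model)",
}


def describe_refseq_accession(accession):
    """Describe a RefSeq accession from its three-character prefix."""
    return REFSEQ_PREFIXES.get(accession[:3], "unknown prefix")


print(describe_refseq_accession("NM_000249"))
```

Accessions from the other protein sources on this slide do not use these prefixes, so the function reports them as "unknown prefix" rather than guessing.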
40Protein Link
BLAST Link
Conserved Domains
41Related Proteins Redundancy
Redundant Sequences
42Related Proteins Links
43BLink non-redundant relatives
Arabidopsis homolog
Conserved Domain
44MLH1 Domain Structure CDD
45MLH1 ATPase Domain
46ATPase structural alignment
ATP Binding site helix
47Genome Resources
48NM_000249 Genome Links
49Higher Genome Resources
50MLH1 UniGene Cluster
51ESTs in UniGene
52The New HomoloGene
- No longer UniGene based
- Protein similarities first
- Guided by taxonomic tree
- Includes orthologs and paralogs
53The New HomoloGene
54Entrez Gene: integrated gene-based access
- LocusLink
- Complete Genomes
- eukaryotic
- microbial
- organelle
55Genes MLH1 Central Resource