Bioinformatics for Genomic and Proteomic data analysis - PowerPoint PPT Presentation

Provided by: peter940


1
Bioinformatics for Genomic and Proteomic data analysis
  • Sequence Analysis

-- Gene Prediction
-- Alignment techniques (BLAST, PSI-BLAST)
-- Major databases and retrieval techniques
-- Predicting function, domains, etc.
-- Finding homology between sequences and identifying repeats (DOTPLOT)
-- Predicting physico-chemical properties of proteins (ProtParam)
-- Predicting signal peptides (SignalP) and transmembrane proteins
-- Phylogenetic analysis
  • Structure analysis

-- Analysis of protein structure and conformation (RasMol, Swiss-PdbViewer, VMD, etc.)
-- Protein structure prediction: homology modeling (SWISS-MODEL, MODELLER)
  • Some practical applications

-- Sequence analysis
-- Structure analysis
2
Major Bioinformatics databases, search engines and data formats
By Sachin Pundhir, Bioinformatics Sub-centre, DAVV, Indore
3
Database
  • Collection of records and files
  • Organized for a particular purpose
  • Tables
  • Tuples (records)
  • Attributes
  • Values

4
BIO520 Student Database (1998)

  Name   ID    Grade
  Amy    123   A
  Joe    456   B
  Sue    789   C
  ...
5
Database Operations
  • Tables
-- Create, delete
  • Tuples (records)
-- Read, write, delete
-- Search, sort, modify, print
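The table and operations above correspond closely to SQL statements. A minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the table name `students` and its column names are our own choices for illustration, and the rows are taken from the BIO520 example.

```python
import sqlite3


def demo_student_db():
    """Run basic table and record operations on a toy student table."""
    # In-memory database: nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Create a table whose attributes (columns) are name, id and grade.
    cur.execute("CREATE TABLE students (name TEXT, id INTEGER PRIMARY KEY, grade TEXT)")
    # Write tuples (records).
    cur.executemany(
        "INSERT INTO students (name, id, grade) VALUES (?, ?, ?)",
        [("Amy", 123, "A"), ("Joe", 456, "B"), ("Sue", 789, "C")],
    )
    # Modify one record.
    cur.execute("UPDATE students SET grade = ? WHERE name = ?", ("B", "Sue"))
    # Delete one record.
    cur.execute("DELETE FROM students WHERE id = ?", (456,))
    # Search and sort: records with grade A or B, highest ID first.
    rows = cur.execute(
        "SELECT name, id, grade FROM students WHERE grade IN ('A', 'B') ORDER BY id DESC"
    ).fetchall()
    # Delete the whole table.
    cur.execute("DROP TABLE students")
    conn.close()
    return rows


if __name__ == "__main__":
    # Print each matching record.
    for row in demo_student_db():
        print(row)
```

Each slide operation maps to one statement: CREATE/DROP TABLE for tables, INSERT/UPDATE/DELETE for records, and SELECT with WHERE and ORDER BY for searching and sorting.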

6
International Nucleotide Sequence Database Collaboration (INSDC)
  • Consists of:
-- DDBJ (Japan)
-- GenBank (USA)
-- EMBL Nucleotide Sequence Database (Europe)
  • The three databases exchange new and updated data
    on a daily basis to achieve optimal
    synchronisation.

7
Bioinformatics databases
  • Protein sequence databases
-- GenPept: protein sequence database.
-- UniProtKB/Swiss-Prot: curated protein sequence database with a minimal level of redundancy and a high level of integration with other databases.
-- UniProtKB/TrEMBL: computer-annotated supplement to Swiss-Prot containing all translations of EMBL nucleotide sequence entries not yet integrated into Swiss-Prot.
-- RefSeq: well-curated, non-redundant database.
  • Nucleotide sequence databases
-- GenBank: nucleotide sequence database; highly redundant.
-- DDBJ: DNA Data Bank of Japan.
-- EMBL: nucleotide sequence database.
-- RefSeq: integrated, non-redundant set of sequences, including genomic DNA, transcripts (RNA) and protein products, for major research organisms.
  • Structure databases
-- PDB: Protein Data Bank.
-- MMDB: Molecular Modeling Database.

Primary databases
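The difference between a redundant archive such as GenBank and a non-redundant set such as RefSeq can be illustrated with a toy Python sketch that merges entries whose sequences are identical while keeping every accession. This only illustrates the idea; real non-redundant databases rely on curation and more elaborate merging rules, and the function name and accessions below are hypothetical.

```python
def collapse_identical(records):
    """Merge (accession, sequence) pairs that share an identical sequence.

    Returns a dict mapping each distinct sequence (upper-cased) to the list
    of accessions that carry it, in input order.
    """
    merged = {}
    for accession, sequence in records:
        merged.setdefault(sequence.upper(), []).append(accession)
    return merged


if __name__ == "__main__":
    # Hypothetical accessions; the first two carry the same sequence.
    demo = [("SEQ1", "ATGGCC"), ("SEQ2", "atggcc"), ("SEQ3", "ATGTTT")]
    for sequence, accessions in collapse_identical(demo).items():
        print(sequence, accessions)
```

Three submitted entries collapse to two distinct sequences, which is the sense in which an archive is "redundant" and a merged set is not.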
8
GenBank Record
  • Header
-- Information that applies to the whole record
  • Features
-- Annotations on the record
  • Sequence
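The three parts of a GenBank flat-file record are delimited by the FEATURES and ORIGIN lines, and the record ends with `//`. A simplified Python sketch that splits one record into header, features and sequence and reads the LOCUS line; it assumes a single well-formed record whose LOCUS line has every field present, and the toy record and function name are hypothetical.

```python
# Hypothetical toy record, laid out like a GenBank flat file.
TOY_RECORD = """\
LOCUS       TOY0001                   20 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical toy record for illustration.
ACCESSION   TOY0001
VERSION     TOY0001.1
FEATURES             Location/Qualifiers
     source          1..20
ORIGIN
        1 atgaaacgca ttagcaccac
//
"""


def split_genbank(text):
    """Split one GenBank record into (locus info, header, features, sequence)."""
    header, features, seq_lines = [], [], []
    section = "header"
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            section = "features"
        elif line.startswith("ORIGIN"):
            section = "sequence"
            continue
        elif line.startswith("//"):
            break  # end of record
        if section == "header":
            header.append(line)
        elif section == "features":
            features.append(line)
        else:
            # Sequence lines: a position number followed by blocks of 10 bases.
            seq_lines.append("".join(line.split()[1:]))
    # Whitespace-split LOCUS line; assumes all fields are present.
    locus = header[0].split()
    info = {
        "locus_name": locus[1],
        "length": int(locus[2]),
        "molecule_type": locus[4],
        "division": locus[6],
        "modification_date": locus[7],
    }
    return info, header, features, "".join(seq_lines).upper()


if __name__ == "__main__":
    info, header, features, seq = split_genbank(TOY_RECORD)
    print(info)
    print(seq)
```

Real LOCUS lines can omit fields, so a robust reader would use fixed column positions or an established parser such as Biopython's `SeqIO` with the `"genbank"` format.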

9
GenBank Record: Header
(Annotated screenshot; callouts: Locus Name, Sequence Length, Molecule Type, GenBank Division, Modification Date, Accession Number, Version Number)
10
GenBank Record: Features
(Annotated screenshot; callout: Link to Seq)
11
GenBank Record: Sequence
12
Using Entrez
  • An integrated database search and retrieval system

13
WWW Access: Entrez, BLAST
14
Entrez Database Integration
(Diagram linking PubMed abstracts, nucleotide sequences, protein sequences, 3-D structures, genomes and taxonomy; link labels: word weight, BLAST, VAST, phylogeny)
15
The (ever expanding) Entrez System
16
Database Searching with Entrez
  • Using limits and field restrictions to find the human MutL homolog
  • Linking and neighboring with MutL

17
Global Entrez Search
18
Document Summaries: MutL [All Fields]
19
Entrez Nucleotide: Limits and Preview/Index Tabs
20
Entrez Nucleotide Limits: Exclude Bulk Sequences
21
Entrez Nucleotide Limits: Title (Definition) Field, Exclude Bulk Sequences
22
Document Summaries Limits
23
Adding Terms with Preview/Index
Searchable fields: Accession, All Fields, Author Name, EC/RN Number, Feature Key, Filter, Gene Name, Issue, Journal Name, Keyword, Modification Date, Organism, Page Number, Primary Accession, Properties, Protein Name, Publication Date, SeqID String, Sequence Length, Substance Name, Text Word, Title, Uid, Volume
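In an Entrez query, a field restriction is written by appending the field name in square brackets to a term, and terms are combined with upper-case Boolean operators (AND, OR, NOT). A small Python sketch that assembles such a query string; the helper names `field_term` and `build_query` are hypothetical, and the field names come from the list above.

```python
def field_term(term, field):
    """Restrict one search term to an Entrez field, e.g. MLH1[Gene Name].

    Multi-word terms are wrapped in double quotes so they are searched as a phrase.
    """
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{field}]"


def build_query(*terms, operator="AND"):
    """Join field-restricted terms with an upper-case Boolean operator."""
    return f" {operator} ".join(terms)


if __name__ == "__main__":
    # Human MutL homolog: restrict "MutL" to titles and the organism to human.
    query = build_query(
        field_term("MutL", "Title"),
        field_term("Homo sapiens", "Organism"),
    )
    print(query)
```

The printed string can be pasted into the Entrez search box; the Preview/Index tab builds equivalent strings interactively.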
24
Human MutL Search Results
25
Human MutL RefSeq
26
NM_000249 Links
27
Literature Links
  • PubMed
  • OMIM

28
NM_000249 PubMed
Books
29
Books Link
30
OMIM Human Disease Genes
31
Sequence Links
  • Nucleotide
  • Protein

32
NM_000249 Related Sequences (similarity)
33
Taxonomy Link
  • The Tax Browser
  • NCBI's Taxonomy

34
Taxonomy Link
35
The Tax Browser
36
Batch Downloads
37
Batch Downloads: FASTA and GI List
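A batch download in FASTA format is a series of records, each a `>` definition line followed by one or more sequence lines. A minimal Python parser sketch (the function name and sample records are hypothetical):

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    The header is the text after '>' on the definition line; sequence lines
    are concatenated with all whitespace removed. Blank lines are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' header line")
        else:
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Hypothetical two-record batch.
    sample = ">seq1 toy nucleotide\nATGC\nGGTA\n\n>seq2 toy protein\nMKV\n"
    for name, seq in parse_fasta(sample):
        print(name, len(seq))
```

A GI list, by contrast, is simply one identifier per line and needs no parsing beyond splitting the lines.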
38
Batch Entrez / Entrez Programming Utilities (E-utilities)
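Batch retrieval can also be scripted against the E-utilities. A minimal Python sketch that only builds request URLs for ESearch (find the UIDs matching a query) and EFetch (download records, here as FASTA text); it does not contact NCBI, and the helper names `esearch_url` and `efetch_url` are hypothetical.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, retmax=20):
    """URL for ESearch: UIDs of records in `db` that match `term`."""
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(
        {"db": db, "term": term, "retmax": retmax}
    )


def efetch_url(db, ids, rettype="fasta", retmode="text"):
    """URL for EFetch: download the listed records in a single batch."""
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(
        {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    )


if __name__ == "__main__":
    print(esearch_url("nuccore", "MLH1[Gene Name] AND human[Organism]"))
    print(efetch_url("nuccore", ["NM_000249"]))
```

For real use, NCBI asks scripts to identify themselves with the `tool` and `email` parameters and to send no more than three requests per second without an API key.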
39
NCBI Protein Databases
  • GenPept: translations of coding sequences (CDS) from GenBank, EMBL and DDBJ
  • RefSeq: mRNA-based (NP_) and genome-based (XP_) proteins
  • Swiss-Prot: curated, high-quality reviewed protein entries
  • PIR: Protein Information Resource, Georgetown University
  • PRF: Protein Research Foundation (Japan)
  • PDB: Protein Data Bank, sequences from 3-D structures
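RefSeq accessions encode the molecule type in a two-letter prefix followed by an underscore, and an optional `.N` suffix gives the version. A Python sketch that decodes a partial, non-exhaustive table of prefixes; in RefSeq terminology the XM_ and XP_ records are model predictions from genome annotation, which is what "genome based" refers to above. The function name and table wording are our own.

```python
import re

# Partial table of RefSeq accession prefixes (not exhaustive).
REFSEQ_PREFIXES = {
    "NC": "genomic: complete molecule",
    "NG": "genomic: region",
    "NT": "genomic: contig or scaffold",
    "NW": "genomic: contig or scaffold (WGS)",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "XM": "predicted mRNA (model)",
    "XR": "predicted non-coding RNA (model)",
    "NP": "protein",
    "XP": "predicted protein (model)",
}

# Two capital letters, underscore, digits, optional ".version".
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_refseq(accession):
    """Return (prefix, molecule description, version or None)."""
    match = ACCESSION_RE.match(accession)
    if not match:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    prefix, _number, version = match.groups()
    kind = REFSEQ_PREFIXES.get(prefix, "unknown prefix")
    return prefix, kind, int(version) if version else None


if __name__ == "__main__":
    # NM_000249 is the human MLH1 RefSeq record used in this tutorial.
    print(classify_refseq("NM_000249"))
```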

40
Protein Link (callouts: BLAST Link, Conserved Domains)
41
Related Proteins: Redundancy (callout: Redundant Sequences)
42
Related Proteins Links
43
BLink: Non-redundant Relatives (callouts: Arabidopsis homolog, Conserved Domain)
44
MLH1 Domain Structure CDD
45
MLH1 ATPase Domain
46
ATPase Structural Alignment (callout: ATP-binding site helix)
47
Genome Resources
48
NM_000249 Genome Links
49
Higher Genome Resources
50
MLH1 UniGene Cluster
51
ESTs in UniGene
52
The New HomoloGene
  • No longer UniGene-based
  • Protein similarities first
  • Guided by the taxonomic tree
  • Includes orthologs and paralogs
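The ortholog/paralog distinction can be shown with a deliberately simplified Python sketch. Assume a single gene duplication, older than every speciation event, produced copies A and B of a gene: the same copy in two species is then a pair of orthologs, and different copies are paralogs even within one species. HomoloGene itself builds groups from protein similarity guided by the taxonomic tree, which this sketch does not attempt; the function name and species are hypothetical.

```python
from itertools import combinations


def classify_pairs(genes):
    """Label every pair of genes as 'orthologs' or 'paralogs'.

    Each gene is a (species, copy) tuple. Simplifying assumption: one
    duplication, older than all speciation events, produced the copies.
    """
    if len(set(genes)) != len(genes):
        raise ValueError("each (species, copy) gene must appear once")
    labels = {}
    for g1, g2 in combinations(genes, 2):
        (_sp1, copy1), (_sp2, copy2) = g1, g2
        # Same copy implies different species here (genes are unique).
        labels[(g1, g2)] = "orthologs" if copy1 == copy2 else "paralogs"
    return labels


if __name__ == "__main__":
    genes = [("frog", "A"), ("chick", "A"), ("mouse", "A"),
             ("mouse", "B"), ("chick", "B"), ("frog", "B")]
    for pair, label in classify_pairs(genes).items():
        print(pair, label)
```

With three species and two copies, each copy forms three ortholog pairs and the remaining nine pairs are paralogs.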

53
The New HomoloGene
54
Entrez Gene: integrated gene-based access
  • LocusLink
  • Complete genomes
-- Eukaryotic
-- Microbial
-- Organelle

55
Entrez Gene: MLH1 Central Resource
56
  • QUESTIONS!!!