Title: Bioinformatics and Protein Database Concepts
1Bioinformatics and Protein Database Concepts
With the emergence of high-throughput techniques
for generation of protein sequences,
computational tools are required for storing,
sharing, analyzing and updating this data.
Databases and their associated features provide
tools for accomplishing meaningful storage of
biological data.
2Master Layout Part 1
1
This animation consists of 2 parts:
Part 1: From wet lab to Bioinformatics
Part 2: Database concepts and Protein databases
Extract protein, purify and cleave it into
smaller peptides.
2
Protein extract
3
Mass Spectrometry
Edman degradation
4
Protein sequences determined and stored in
databases for future usage
MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIA
FAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCT
VATLRETYGEMADCCAKQEPERNECFLQHKDDNP
Protein Sequence
5
Re-draw all images. Reference: Biochemistry by
Stryer et al., 5th edition
3Definitions of the components: Part 1 From wet
lab to bioinformatics
1
- Protein: A protein is a biomolecule made of one or more chains of amino acid residues. Adjacent amino acids are joined by a peptide bond, which forms with the elimination of a water molecule. Proteins perform structural, functional and regulatory roles in the cell.
- Peptide: Small protein fragments formed by a stretch of around 50 amino acids are called peptides.
- Amino acid sequence: The linear order of the amino acids in a protein is known as its amino acid sequence. It is also known as the primary structure of the protein.
- Edman degradation: A chemical method for sequencing amino acid residues in a protein or a peptide. The N-terminal residue is labelled using phenyl isothiocyanate and then cleaved from the remaining peptide chain without disrupting any of the other peptide bonds. This labelled amino acid is then detected, and the procedure is repeated to identify each N-terminal amino acid sequentially.
- Mass spectrometry: A technique for the production and detection of charged molecular species in vacuum, after their separation by magnetic and electric fields based on their mass-to-charge (m/z) ratio.
2
3
4
5
4Step 1 Protein Extraction
1
Break open the cells
Protein source (usually a cultured tissue or
microbial extract)
2
Re-suspend the extract in lysis buffer
3
Centrifugation
Supernatant containing proteins is isolated
CENTRIFUGE
Crude Extract
4
Action
Audio Narration
Description of the action
As shown in animation
The cells present in the tissue culture are lysed,
thereby releasing the crude extract. This
extract is centrifuged to separate the protein
mixture from the cell debris. The supernatant
obtained is made up of a mixture of proteins
having a variety of properties. The protein of
interest must then be isolated from this mixture.
Redraw all the figures. The animator has to re-draw
the figure titled CENTRIFUGE with all the
labeling, as it has been taken from a
web resource. On the protein source, show a zoom-in
effect focusing on the purple molecule. Show the
arrow that leads to breaking the molecule. Add
a spin effect on the crude extract to depict
centrifugation. Remove the supernatant (orange
liquid in the last figure).
5
- http://3.bp.blogspot.com/_xW3FQUQ2DYI/Rp4DF1r_0HI/AAAAAAAAAhY/B5MzdxVSV6I/s400/centrifugation.png
- Biochemistry by Stryer et al., 5th edition
5Step 1 Protein Extraction
1
Proteins are purified using various techniques
such as
2
Solution containing purified protein extract.
Proteins are cleaved into smaller peptides using
proteases.
Chromatography
3
Electrophoresis
Action
Audio Narration
4
Description of the action
The protein of interest is separated from the
protein mixture present in the supernatant. This
is carried out by suitable techniques such as
chromatography or electrophoresis, which separate
proteins on the basis of properties such as
their charge and mass.
This slide is in continuation with the previous
slide. Show the arrow from first figure to the
two techniques. Then show converging arrows to
the last figure
As shown in animation
5
Biochemistry by Stryer et al., 5th edition
Biochemistry by A.L.Lehninger et al., 3rd edition
6Step 2 Edman Degradation
1
Peptide to be sequenced Ala-Gly-Asp-Phe-Arg-Gly
First round
2
3
4
Action
Audio Narration
Description of the action
Edman degradation employs the phenyl isothiocyanate
reagent, which reacts with the amino-terminal
residue of the peptide, giving rise to the phenyl
thiocarbamoyl derivative of that amino acid
residue. Under mildly acidic conditions, a cyclic
derivative of the amino acid is released in the
form of a PTH-amino acid, which can then be
identified by chromatographic techniques. The
procedure is then repeated to identify each
N-terminal amino acid sequentially.
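The repeated rounds described in this narration can be sketched in a few lines of Python. This is only an illustration of the bookkeeping (one N-terminal residue released and identified per round), not a model of the chemistry; the function name `edman_degradation` is an assumption made for this sketch, and the peptide is the one shown on this slide.

```python
def edman_degradation(peptide):
    """Yield (round, released residue, remaining peptide) for each cycle."""
    remaining = list(peptide)
    cycle = 0
    while remaining:
        cycle += 1
        # The N-terminal residue is released as its PTH derivative and identified.
        released = remaining.pop(0)
        yield cycle, released, "-".join(remaining)

# Peptide to be sequenced, as given on the slide.
peptide = ["Ala", "Gly", "Asp", "Phe", "Arg", "Gly"]
rounds = list(edman_degradation(peptide))
for cycle, residue, rest in rounds:
    print(f"Round {cycle}: PTH-{residue} identified; remaining: {rest or '(none)'}")

# Reading the released residues in order reconstructs the sequence.
sequence = [residue for _, residue, _ in rounds]
```

Each round shortens the peptide by one residue from the amino terminus, so six rounds are needed for this six-residue peptide.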
Breakdown of Molecule
Re-draw all images. Both sides depict the same
process. Left side is the schematic and right
side is the same process at molecular level. Show
the steps of both processes in a parallel fashion
5
Biochemistry by Stryer et al., 5th edition
7Step 3 Mass Spectrometry
1
Vacuum Envelope
Detection
Ionization
Mass Analyzer (filtering)
2
Sort Ions by Mass (m/z)
Forms ions (charged molecules)
Detects ions
3
Data Processing
Data System
Relative Abundance
4
Action
Audio Narration
Description of the action
From the 1st figure, show an arrow leading to figure 2,
the Ion Source. From there, an arrow leads to the Mass
Analyzer, followed by the Ion Detector. Enclose all
figures in a box titled Vacuum Envelope. From
there on, an arrow leads to the Data System and
then to Data Processing
The mass spectrometer is an instrument that
produces charged molecular species in vacuum,
separates them by means of electric and magnetic
fields and measures the mass-to-charge
ratios and relative abundances of the ions thus
produced. A tandem mass spectrometer makes use of
a combination of two mass analyzers, separated by
a collision cell, in order to provide improved
resolution of the fragment ions. The first mass
analyzer usually operates in a scanning mode in
order to select only a particular peptide ion
which is further fragmented and resolved in the
second analyzer. This can be used for protein
sequencing studies.
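As a rough numerical illustration of the mass-to-charge ratio mentioned above, the sketch below computes the m/z of a peptide ion and the m/z values of its N-terminal (b-type) fragment ions. It assumes positive-mode ionization by protonation and uses standard monoisotopic residue masses; the function names are chosen for this example only, and the peptide AGDFRG is the one-letter form of the peptide from the Edman degradation slide.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.00728   # mass of each charge-carrying proton (assumed ionization)

def peptide_mass(seq):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def mz(seq, charge):
    """m/z of the peptide carrying `charge` extra protons."""
    return (peptide_mass(seq) + charge * PROTON) / charge

def b_ion_series(seq):
    """m/z of singly charged b-ions (N-terminal fragments of length 1..n-1)."""
    total, series = 0.0, []
    for aa in seq[:-1]:
        total += RESIDUE_MASS[aa]
        series.append(total + PROTON)
    return series

peptide = "AGDFRG"
print(round(peptide_mass(peptide), 4), round(mz(peptide, 2), 4))
print([round(value, 4) for value in b_ion_series(peptide)])
```

The gap between consecutive fragment ions equals the mass of one residue, which is why a fragment ladder from the second analyzer can be read back as a sequence.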
Experimental Process as shown in animation
5
8Master Layout Part 2
1
This animation consists of 2 parts:
Part 1: From wet lab to Bioinformatics
Part 2: Database concepts and Protein databases
Based on the type of the data and its prospected
usage, design a database schema.
2
3
Provide software and analysis tools to access
this data
4
5
Re-draw all images. Reference Biochemistry by
Stryer et al., 5th edition
9Definitions of the components: Part 2 Database
concepts and Protein databases
1
- Type of data: Biological databases can store various types of data, such as pure sequences, sequences with structure, meta-data about the source of the sequence, experimental details, etc.
- Prospected usage: The databases are primarily used to store all the information in a single web-based resource. They also provide analysis tools for various sequence analysis functions such as pair-wise sequence alignment, multiple sequence alignment, homology modelling, etc.
- Database schema: The design of the database at various levels is called a database schema. It includes the attributes of all individual tables and the relationships between them. The schema is defined at three levels, namely Physical, Logical and View.
- Primary database: In biological database studies, primary databases store only the protein sequence information.
2
3
4
5
10Definitions of the components: Part 2 Database
concepts and Protein databases
1
- Secondary database: In biological database studies, secondary databases refer to repositories of the domains and patterns that occur within a sequence. This information can be stored in the form of signature patterns, fingerprints, etc.
- Structure database: In biological database studies, structural databases store the three-dimensional geometry of the protein. They store the atomic coordinates of individual atoms in the protein molecule and other geometrical parameters along with sequence information.
- Analysis tools: Analysis tools are the software tools that are available on most of the web-based database sites. These tools help in conducting further studies and analysis on protein sequences, such as alignment, phylogenetic predictions, etc.
- Meta-data: Meta-data is the information about the data that is documented in a database. It covers various features such as the source of the data, methods for retrieval, etc.
2
3
4
5
11Step 1 A generic protein DB Types of data
1
- Source organism
- Scientific name and common name
- Taxonomy
- Organelle
- Amino-acid sequence
- Location
- Length of the sequence
- Molecular type and classification
- Accession and version, Gene ID
- Keywords and Feature table
- Patterns and Domains
2
Sequence
Source
Gene
Reference
3
- Source gene
- Corresponding mRNA
- Corresponding Coding Sequence (CDS)
- Author
- Title
- Journal
- Cross references
- Comments
Action
Audio Narration
Description of the action
4
Categories of data
Animate the sub-parts according to the order
given in this animation (i.e. Sequence followed
by its descriptive blue box). Similarly for
Source, Reference and Gene. Re-draw all
images
All data related to a protein can be divided into
four broad categories, namely Sequence details,
Source, Gene details and References. Sequence
details contain the features of a protein's amino
acid sequence such as the length, location,
patterns and identifiers of the protein sequence.
The source contains information about the
biological source used for retrieving the
protein. Gene contains details of the gene from
which the protein is expressed.
Reference contains the details of the research
publication in which the study was reported.
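One way to picture the four categories is as a small set of record types. The sketch below is a minimal illustration using Python dataclasses; the class names, field names and the example values (including the accession `EXAMPLE1` and the gene name `EXAMPLE_GENE`) are hypothetical and do not reproduce the layout of any particular database.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SequenceDetails:
    accession: str
    amino_acid_sequence: str
    keywords: List[str] = field(default_factory=list)

    @property
    def length(self) -> int:
        # Length is derived from the sequence rather than stored separately.
        return len(self.amino_acid_sequence)

@dataclass
class Source:
    scientific_name: str
    common_name: str
    taxonomy_id: str

@dataclass
class Gene:
    name: str

@dataclass
class Reference:
    author: str
    title: str
    journal: str

@dataclass
class ProteinRecord:
    sequence: SequenceDetails
    source: Source
    gene: Gene
    references: List[Reference] = field(default_factory=list)

# Placeholder record; only the sequence fragment and taxonomy ID come from the slides.
record = ProteinRecord(
    sequence=SequenceDetails(
        accession="EXAMPLE1",
        amino_acid_sequence="MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIA",
    ),
    source=Source("Homo sapiens", "human", "9606"),
    gene=Gene("EXAMPLE_GENE"),
)
print(record.sequence.length, record.source.scientific_name)
```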
5
http://www.ncbi.nlm.nih.gov/
http://expasy.org/
http://www.pdb.org/pdb/home/home.do
http://www.ddbj.nig.ac.jp/
12Step 2 A Generic Protein DB Schema
1
LOGICAL
VIEW
PHYSICAL
2
Describes which type of data will be stored in
which particular table and the relationships
between these tables.
Describes the user interface of the database and
the view that will be shown to the user.
Describes the physical location of storage of the
data within a database.
3
4
Action
Audio Narration
Description of the action
Database design is done at various levels, namely
Physical, Logical and View. At the physical
level, we define where and how the data is stored
within the database, in accordance with the prospected usage.
At the logical level, we define the tables, the
attributes of the tables and the relationships between
tables. The logical level is the most complex and
important schema for databases and requires a
thorough understanding of the data and its
contexts and relationships. At the View level, we
define the views and appearance of the database.
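The three levels can be illustrated with Python's built-in `sqlite3` module. In this sketch the table names, column names and the `protein_summary` view are hypothetical; the choice of an in-memory database stands in for the physical level, the two related tables for the logical level, and the SQL view for the view level.

```python
import sqlite3

# Physical level: where the data lives (here, in memory; a file path would also work).
conn = sqlite3.connect(":memory:")

# Logical level: tables, their attributes and the relationship between them.
conn.executescript("""
CREATE TABLE source (
    source_id       INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL,
    common_name     TEXT
);
CREATE TABLE protein (
    accession TEXT PRIMARY KEY,
    name      TEXT NOT NULL,
    sequence  TEXT NOT NULL,
    source_id INTEGER NOT NULL REFERENCES source(source_id)
);
-- View level: what the user actually sees.
CREATE VIEW protein_summary AS
SELECT p.accession, p.name, LENGTH(p.sequence) AS length, s.scientific_name
FROM protein AS p JOIN source AS s ON s.source_id = p.source_id;
""")

# Placeholder rows for illustration only.
conn.execute("INSERT INTO source VALUES (1, 'Homo sapiens', 'human')")
conn.execute("INSERT INTO protein VALUES ('EXAMPLE1', 'Serum albumin', 'MKWVTFISLL', 1)")
rows = conn.execute("SELECT * FROM protein_summary").fetchall()
print(rows)
```

Keeping the source organism in its own table avoids repeating the same organism details for every protein, which is the kind of relationship decision made at the logical level.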
Defines the various Database schemata
Show the three boxes as the first step while
the narrator speaks the first line of audio
narration ("Database ... and View"). In the next
step of animation, show the text of each box
5
13Step 3 Protein Database characteristics
1
PRIMARY/SEQUENCE DATABASE
ANALYSIS TOOLS
- BLAST
- FASTA
- Multiple Sequence Alignment
- Structure Prediction
- Functional annotation
- Search engine
- Pattern and Domain alignment /search
2
DERIVED/SECONDARY DATABASE
TYPE
TOOLS
3
- PDB
- Proteopedia
- Biological Structural Database from EBI
STRUCTURAL DATABASE
Action
Audio Narration
Description of the action
4
Defines the various Database schemata
Show the central round figure followed by the 3
types of DB on the left and their examples. In
the end, show the Analysis tools and its examples
A typical biological database can be
characterized by its Type and its Tools. The
Type defines the category of data that it
includes, such as sequences, domains or structures.
This implies that the particular database's most
prominent feature is either sequences,
domains or structures, and it will primarily be
used for their analysis. The analysis tools
define the platforms that the site will provide
for gaining an insight into the protein data.
5
http://www.ncbi.nlm.nih.gov/
http://expasy.org/
http://www.pdb.org/pdb/home/home.do
http://www.ddbj.nig.ac.jp/
http://www.ebi.ac.uk/Databases/structure.html
http://www.uniprot.org/
http://expasy.org/prosite/
http://prodom.prabi.fr/prodom/current/html/home.php
http://pfam.sanger.ac.uk/
http://www.proteopedia.org/wiki/index.php/Main_Page
14Step 4 Database input formats
1
PROTEIN DATABASE
Enter your Query term
SEARCH DATABASES
2
Serum albumin
P01009
MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIA
FAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCT
VATLRETYGEMADCCAKQEPERNECFLQHKDDNP
Acute phase OR blood coagulation OR Protease
inhibitor
9606NCBI
UNIQUE ID MOLECULE NAME AMINO-ACID SEQUENCE
KEYWORD LITERATURE GENE TAXONOMY
Full-length cDNA libraries and normalization
SERPINA1
3
Enter the amino-acid sequence of the protein to
be analyzed.
Enter the name of the molecule to be searched.
Ex- protein, peptide, gene related to the
protein, etc.
Enter the key-word to identify this protein.
Enter the literature related information like the
name of the journal, citation or title of the
research paper.
Enter the name of the gene that codes for the
protein or other gene related information
Enter the unique identification number for the
protein. These IDs vary according to database
such as accession number, GeneID, ODB ID, etc.
Enter taxonomic identifiers.
SEARCH
4
Action
Audio Narration
Description of the action
Follow the steps as shown in the animation. DO
NOT animate the yellow box. As the animated
cursor goes to Unique ID narrator will read the
text that comes in the yellow box displayed along
with Unique ID entry. This will be followed by
an example entry in the white box. Similarly,
Molecular Name will be followed by its
corresponding narration in yellow box, and so on.
- For extracting the protein information from a
database, users can give a variety of input
terms. These can be:
- Unique ID <Read the text in the yellow box in each case>
- Molecular Name
- Amino-acid sequence
- Keyword
- Literature
- Gene
- Taxonomy
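To make the idea of different input terms concrete, the following sketch guesses which kind of term a user has typed. It is a toy heuristic written for this storyboard, not how any real database front-end works: the function name, the simplified six-character ID shape and the length threshold are all assumptions, and a gene name cannot be told apart from a molecule name by text alone, which is why real search forms offer a separate field for each.

```python
import re

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
# Simplified, hypothetical shape for a unique ID: letter, digit, 3 alphanumerics, digit.
ID_SHAPE = re.compile(r"^[A-Z][0-9][A-Z0-9]{3}[0-9]$")

def classify_query(term):
    """Guess the input category of a search term."""
    term = term.strip()
    if term.isdigit():
        return "taxonomy"            # purely numeric taxonomic identifier
    if ID_SHAPE.match(term):
        return "unique_id"
    compact = term.replace(" ", "").upper()
    if len(compact) >= 20 and set(compact) <= AMINO_ACIDS:
        return "sequence"            # long run of one-letter amino acid codes
    if " OR " in term or " AND " in term:
        return "keyword"             # boolean keyword expression
    return "molecule_name"           # fallback; also catches gene names

examples = [
    "P01009",
    "9606",
    "Serum albumin",
    "Acute phase OR blood coagulation OR Protease inhibitor",
    "MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIA",
]
for term in examples:
    print(classify_query(term), "<-", term[:30])
```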
Shows the general functions in a database
5
15Step 5 Database output formats
1
CITATIONS
PATTERN ANALYSIS
2
MOLECULAR DESCRIPTION
SOURCE ORGANISM DETAILS
ANNOTATIONS
SECONDARY STRUCTURAL DETAILS
3
GENE NAMES AND DESCRIPTION
IDs OF ENTRIES IN RELATED DATABASE
EXPERIMENT DETAILS
4
Action
Audio Narration
Description of the action
- Once the user submits the query, the output can be in multiple formats. The generalized information that users can obtain from protein databases includes:
- General description of the protein molecule
- Annotations of the protein
- Name and description of the gene that encodes it
- ID of the same protein in other relevant databases
- Details of the experiment conducted for characterizing the protein
- Details of the protein's secondary structure
- Details of the organism which was used as a source for obtaining the protein
- Citations of research conducted for obtaining this protein
- Patterns occurring within a sequence and their analysis
Co-ordinate the animation with the audio
narration. For Example, in animation mode, the
first step is to display Molecular Description.
This display must have the first point of audio
narration spoken along with it. Show the outputs
tab as and when it is narrated
Output from database
5
16Step 6 Database Analysis Tools
1
OUTPUT
INPUT
2
ANALYSIS TOOLS
MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYR
PHDRRELEDLQVEQAELGLEAGGLQPSALEMILQKRGIVDQCCNNICTFN
QLQNYCNVP
Identify physico-chemical properties such as
chemical formula, half-life, iso-electric point,
molecular weight, etc.
Aligned sequences and structures
Identify protein from sequence
Variable and conserved residues
Synonyms and Scientific terminology of proteins
Predicted Secondary and Tertiary Structures
3
4
Action
Audio Narration
Description of the action
- This slide shows the different kinds of analysis that can be conducted on a given protein sequence. The query can be the protein name, sequence or any other identifier of the protein. In this example, we provide the protein sequence as Input. Once the query protein sequence is entered into the Analysis tool, it can give various kinds of results such as:
- Identify protein from sequence
- Identify physico-chemical properties such as chemical formula, half-life, iso-electric point, molecular weight, etc.
- Aligned sequences and structures
- Variable and conserved residues
- Predicted Secondary and Tertiary Structures
- Synonyms and Scientific terminology of proteins
Input Output Slide
Display the panel on the left. In the first step the
input appears, followed by the arrow embossed with
the letters Analysis Tools. The output panel
appears thereafter, with each output appearing
one after the other. At the display of each output,
the narrator reads aloud the text written
5
17Step 1 Case study To study the characteristics
of human serum albumin
1
2
PHYSICO-CHEMICAL PROPERTIES
DOMAIN ANALYSIS
STRUCTURAL ANALYSIS
OBTAIN FASTA SEQUENCE
3
View Full Animation
4
Action
Audio Narration
Description of the action
We explain the usage of Protein databases using
the example of Human Serum Albumin protein. If
you want to view a specific step in the case
study, click on the relevant panel. Else click on
View Full Animation
Slides with options to choose a step or view the full
case study
Display the 4 panels in the animation. These 4
steps are in sequence, but the user must be given
an option to directly go to the specific step if
they want to. In the bottom, give a link to view
full case study
5
http://www.pdb.org/pdb/home/home.do
18Step 1.a Obtain FASTA Sequence SWISS PROT
1
2
3
Serum Albumin
4
Action
Audio Narration
Description of the action
Open a web browser and go to http://expasy.org/sprot/.
In the top right corner of the page, there
will be a search box. Click on the drop-down link ahead
of the Search box (indicated by the arrow). We
get a list of options for the databases to search
from. Select UniProtKB. Type the name of the
protein of your choice (e.g. Serum Albumin) in
the text box in front of the word "for".
Retrieving data
All the screen shots taken from the web-site
need to be remade by the animator to simulate
the web-based environment. None of the images
should be a part of the web database. Follow the
steps as shown in the animated flowchart
5
http://expasy.org/sprot/
19Step 1.b Obtain FASTA Sequence SWISS PROT
1
2
3
Action
Audio Narration
4
Description of the action
The results page for the search shows 179 hits
for our query. It is shown on the top of the
page. The first 25 of them are shown on the first
page, which can be viewed by scrolling down the
page. Click on the entry of your choice. Here we
click on the human Albumin hit (ALBU_HUMAN)
Re-make all the screen shots. Follow the steps as
shown in the animated flowchart
Retrieving data
5
http://expasy.org/sprot/
20Step 1.c Obtain FASTA Sequence SWISS PROT
1
2
3
Place for headings. Scroll down to find the word
Sequences in this position
Action
Audio Narration
4
Description of the action
The first image is displayed parallel to the
narration "The top ... like this". When the arrow
appears, read the second line of narration "Search
for ... the page". The second panel of images in this
slide goes parallel to the narration "Click on
tab ... new tab". In the last panel
The top of the result page looks like this.
Search for the heading Sequences, by scrolling
down the page. Click on the tab FASTA next to the
sequence of your interest. The FASTA sequence
opens on a new tab. Save this FASTA sequence in
your computer.
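Once the FASTA text has been saved, it can be read programmatically. The sketch below is a minimal FASTA reader, assuming the usual convention that a line starting with ">" is the descriptive header and the following lines hold the sequence; the function name `parse_fasta` and the header text in the example are made up for illustration, while the sequence lines are taken from the slides.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

example = """>example_entry hypothetical descriptive line
MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDLGEENFKALVLIA
FAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAENCDKSLHTLFGDKLCT
"""
records = parse_fasta(example)
header, sequence = records[0]
print(header, len(sequence))
```

Joining the wrapped lines gives the bare amino acid sequence, which is the form later tools expect when the descriptive line has to be removed.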
Retrieving data
5
http://expasy.org/sprot/
21Step 1.d Analysis Tools
1
ProtParam
HeliQuest
2
Radar
SAPS
3
Three to One
ColorSeq
4
Action
Audio Narration
Description of the action
Show the chart with the color coded division for
types of tools as shown in figure. Highlight the
Primary Structural Analysis and follow it up by
the display of all the tabs on the right.
Highlight the first tool ProtParam
Types of Tools
Once the FASTA sequence is retrieved, we can
subject it to a variety of protein analysis tools,
which are broadly classified into sequence
similarity search tools, primary structural
analysis tools, phylogenetic analysis tools,
molecular modeling and visualisation tools, and
structure prediction tools. Here we explore the
web-based service called ProtParam, which belongs
to the primary structural analysis tools. For
exploring other such services, users can visit
http://expasy.org/sprot/
5
http://expasy.org/sprot/
22Step 2.a Physico-chemical Properties SWISS
PROT
1
Enter the accession number
OR paste the sequence here
2
Delete the first line (descriptive line) from
your FASTA sequence, such that only the amino
acid sequence is there
Click on Compute Parameters
3
Action
Audio Narration
4
Description of the action
Re-make all the screen shots. Follow the steps as
shown in the animated flowchart
The front-end for the tool will ask you to input
the accession ID of the protein under study OR
the sequence of that protein. Delete the first
line (descriptive line) from your FASTA sequence,
such that only the amino acid sequence is there.
Click on Compute Parameters. On the results
page, scroll down to find the various
physico-chemical parameters of this protein
Tool Input
5
http://expasy.org/tools/protparam.html
23Step 2.c Physico-chemical Properties SWISS
PROT
1
CSV stands for Comma Separated Values. Files
with .csv extension, can be easily accessed in
Plain text as well as spreadsheet formats
2
3
Action
Audio Narration
4
Description of the action
This part of the results gives the percentage of
each amino acid in the sequence. The highlighted
region indicates the CSV file link. CSV stands
for Comma Separated Values; such files can be
opened as plain text as well as in spreadsheet
applications such as Microsoft Excel. The file
can be downloaded in its comma-separated format
by clicking on the link.
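The percentage table and its comma-separated form can be reproduced with Python's standard library. This is a small sketch of the idea only: the column names `amino_acid` and `percent` are chosen for this example, and the layout of the file offered by the actual tool may differ.

```python
import csv
import io
from collections import Counter

def composition(seq):
    """Percentage of each amino acid present in the sequence."""
    counts = Counter(seq)
    total = len(seq)
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}

def composition_csv(seq):
    """Render the composition as comma-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["amino_acid", "percent"])   # hypothetical column names
    for aa, percent in composition(seq).items():
        writer.writerow([aa, f"{percent:.1f}"])
    return buffer.getvalue()

print(composition_csv("AAGK"))
```

Because every row is plain text with commas between the values, the same output can be read in a text editor or loaded as a two-column sheet in a spreadsheet program.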
Re-make all the screen shots. Follow the steps as
shown in the animated flowchart. When the user
clicks on the green highlighted tab, the
definition must be read aloud along with the
written display of the definition in a separate
box as shown in the slide animation
Tool Output
5
http://expasy.org/tools/protparam.html
24Step 2.d Physico-chemical Properties SWISS
PROT
1
- Formula represents the chemical formula for the
query molecule
- Represents the Number of atoms present in the
molecule
- This shows the charge states of the amino acid
residues within the protein molecule
- Half Life describes the time required for the
protein to degrade to half of its original mass
2
Defines the solubility of the proteins.
Hydrophobic molecules exhibit a Positive GRAVY
value while hydrophilic molecules show a negative
GRAVY value
3
Action
Audio Narration
4
Description of the action
- Other information that can be obtained from these
databases includes the chemical formula for the
protein, the total number of atoms present in the
protein, the total number of negatively and
positively charged residues, the estimated half-life
of the protein, i.e. the time in which the
protein will degrade to half its original mass,
and the average hydropathicity (GRAVY), which gives an
insight into the solubility of the protein.
Hydrophobic molecules exhibit a positive GRAVY
value while hydrophilic molecules show a negative
GRAVY value
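The GRAVY value is simply the average hydropathy of all residues in the sequence. The sketch below uses the Kyte-Doolittle hydropathy scale and counts Asp and Glu as negatively charged and Arg and Lys as positively charged; the function names are chosen for this example, and the short test sequences are made up to show the sign of the result.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)

def charged_residue_counts(seq):
    """Return (negatively charged, positively charged) residue counts."""
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive

print(gravy("AIL") > 0)    # made-up hydrophobic stretch
print(gravy("DEK") < 0)    # made-up hydrophilic stretch
print(charged_residue_counts("DEKR"))
```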
Tool Output
Re-make all the screen shots. Follow the steps as
shown in the animated flowchart. When the user
clicks on the green highlighted tab, the
definition must be read aloud along with the
written display of the definition in a separate
box as shown in the slide animation
5
http://expasy.org/tools/protparam.html
25Step 3.a Domain Analysis PROSITE
1
2
3
Action
Audio Narration
4
Description of the action
Re-Draw all screen shots. Display the sequence
and then minimize it to fit into the input window
of the web based tool. Show the clicking effect
on the button named Scan
Go to http://expasy.org/prosite/. Input the FASTA
sequence obtained in previous steps into the
input box of the server. Click on Scan.
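Conceptually, scanning a sequence for a signature pattern is a text search. The sketch below translates a pattern written in PROSITE pattern syntax into a Python regular expression and reports every match position. It covers only the basic syntax elements (x, [..], {..}, repeat counts, < and > anchors) and does not attempt the scored profile matching that the server also performs; the function names are assumptions of this example, the N-glycosylation motif N-{P}-[ST]-{P} serves as the test pattern, and the test sequence is made up.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                      # any amino acid
            core = "."
        elif core.startswith("{"):           # any amino acid except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def scan(sequence, pattern):
    """Return (1-based position, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(scan("AANGSAANPSA", "N-{P}-[ST]-{P}"))
```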
Tool Input
5
http://expasy.org/prosite/
26Step 3.b Domain Analysis PROSITE
1
HITS BY PROFILE
HIT 1
2
HIT 2
3
HIGHEST SCORE
HIT 3
Action
Audio Narration
4
Description of the action
Re-Draw all screen shots. Show the 3 results and
then emphasize the score of the 2nd hit, as
it is the highest. Display a clicking effect on the 2nd
hit
The results page shows the various profiles that
have the highest probability of occurrence on the
basis of which they are assigned scores. You
should select the hit with the highest score
Tool Output
5
http://expasy.org/prosite/
27Step 3.c Domain Analysis PROSITE
1
Location of Albumin Domain in the sequence
amino acid position 210-402
2
POSITION OF THE PATTERN MATCHED FOR IDENTIFYING
DOMAIN
CONSERVED CYSTEINE INVOLVED IN DISULPHIDE BOND
PROSITE figure of the albumin domain
3
Structure of an albumin domain
Action
Audio Narration
4
Description of the action
Re-Draw all screen shots. Type the name of the
query in the search box. Click on Go. Follow it
up by an arrow and the output image
The result displays the position of the Albumin
domain highlighted in the sequence from position
210-402. It also displays a graphical view in the
form of a downloadable PNG image where the
profile hits are represented as colored shapes
with their PROSITE name. It then displays the
structure of the Albumin domain, highlighting the
disulphide-bonding cysteine residues as C
and its signature pattern as
Tool Output
5
http://expasy.org/prosite/
28Step 4.a Structural Analysis RCSB PDB
1
Summary
Biology and Chemistry
Geometry
2
Classification: Transport Protein
Structure Weight: 133377.93
Molecule: Serum albumin
Polymer: 1
Type: polypeptide(L)
Length: 585
Chains: A, B
Molecular Description
Related PDB entries
3
Ligand chemical components
Derived data
Action
Audio Narration
4
Description of the action
Once the user enters Serum Albumin in the PDB
search box, in the output page of the selected
PDB entry, we find the following tabs. The
horizontal tabs summarize the entire result page.
The vertical tabs occur as the initial
description in the first page. Each of these tabs
can be explored in detail. The structural
analysis of the protein can display a wide range
of properties such as the description of the
protein molecule including classification of the
protein, the chains it contains, number of amino
acids, etc.
Re-Draw the tabs. The first panel of tabs is the
horizontal one. Out of them, the Summary tab is
active in this slide; that is, the tab in white
is active. Under Summary there are 4 more tabs
which are vertical. Out of them, the blue tab is
active. Slides 4.a to 4.d show the vertical tabs
active one by one. Slides 4.e to 4.g show the
remaining two horizontal tabs active, followed
one by one while reading the audio narration of each
slide, with the display it carries
Tool Output Display slide
5
http://www.pdb.org/pdb/home/home.do
29Step 4.b Structural Analysis RCSB PDB
1
Summary
Biology and Chemistry
Geometry
2
Molecular Description
1AO6 Crystal structure of human serum albumin
1BM0 Crystal structure of human serum albumin
1E7E Human serum albumin complexed with decanoic acid
2BXC Human serum albumin complexed with phenylbutazone
2BXF Human serum albumin complexed with diazepam
2BXN Human serum albumin complexed with myristate and iodipamide
Related PDB entries
Ligand chemical components
3
Derived data
Action
Audio Narration
4
Description of the action
The display also shows entries that are closely
related to the user's query, such as the
same protein characterized from a
different organism.
Tool Output Display slide
5
http://www.pdb.org/pdb/home/home.do
30Step 4.c Structural Analysis RCSB PDB
1
Summary
Biology and Chemistry
Geometry
2
Molecular Description
Identifier: LQZ
Name: 2-(diethylamino)-N-(2,6-dimethylphenyl)ethanamide
Formula: C14 H22 N2 O
Interaction View: Ligand Explorer
Related PDB entries
Ligand chemical components
3
Derived data
Action
Audio Narration
4
Description of the action
Protein molecules are often structurally
characterized in complex with a ligand, with the
structure determined by experimental
techniques. The description of these ligands is
given in the result summary of the query protein
Tool Output Display slide
5
http://www.pdb.org/pdb/home/home.do
31Step 4.d Structural Analysis RCSB PDB
1
Summary
Biology and Chemistry
Geometry
Molecular Function     Cellular Component
DNA binding            extracellular region
fatty acid binding     extracellular space
copper ion binding     platelet alpha granule lumen
protein binding        protein complex
drug binding
lipid binding
metal ion binding
chaperone binding
2
Molecular Description
Related PDB entries
Ligand chemical components
3
Derived data
Action
Audio Narration
4
Description of the action
Result summary displays derived data for the
Serum Albumin such as the molecular and
biological functions that the protein is involved
in.
Tool Output Display slide
5
http://www.pdb.org/pdb/home/home.do
32Step 4.e Structural Analysis RCSB PDB
1
Summary
Biology and Chemistry
Geometry
SNP ID       Amino Acid Change   PDB position
rs11538232   N -> S              18
rs59066571   E -> V              48
rs11538221   E -> G              57
rs11538216   S -> L              65
rs11538226   T -> A              79
rs11538217   F -> L              134
rs58624704   R -> Q              186
rs11538220   K -> E              190
rs17400586   K -> N              190
rs3204504    A -> V              191
rs3210154    A -> T              191
rs11538228   A -> G              194
rs3210163    Q -> L              196
rs28930975   E -> K              297
rs72552710   K -> N              313
rs11538223   N -> D              318
rs72552711   E -> K              321
rs3210210    L -> R              327
rs28930976   D -> V              340
rs11538215   C -> R              369
rs11538214   E -> V              382
rs1140449    D -> Y              451
rs1063469    K -> E              466
rs60826059   T -> I              478
rs11538227   S -> P              517
rs57636959   K -> N              536
rs61579038   T -> P              540
rs72552712   K -> E              545
rs11538208   M -> K              548
2
3
Action
Audio Narration
4
Description of the action
The biological aspects of Serum Albumin are also
displayed as results. The unique feature of this
tab is that it gives a complete list of Single
Nucleotide Polymorphisms (SNPs) affecting the protein
sequence. This shows the change in amino acids as
well as the locations of the SNPs and the SNP
IDs.
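A list like this is easy to work with once each row is held as a small record. The sketch below takes a handful of rows from the slide's table, groups them by position, and reports the positions that carry more than one reported change. The function names and the compact label format (for example K190E) are choices made for this example.

```python
from collections import defaultdict

# A few rows copied from the slide: (SNP ID, original residue, new residue, PDB position).
ROWS = [
    ("rs11538220", "K", "E", 190),
    ("rs17400586", "K", "N", 190),
    ("rs3204504", "A", "V", 191),
    ("rs3210154", "A", "T", 191),
    ("rs11538228", "A", "G", 194),
]

def variants_by_position(rows):
    """Group the amino acid changes by their position in the structure."""
    grouped = defaultdict(list)
    for snp_id, original, changed, position in rows:
        grouped[position].append(f"{original}{position}{changed} ({snp_id})")
    return dict(grouped)

def multi_variant_positions(rows):
    """Positions at which more than one amino acid change is listed."""
    grouped = variants_by_position(rows)
    return sorted(pos for pos, items in grouped.items() if len(items) > 1)

for position, changes in variants_by_position(ROWS).items():
    print(position, ", ".join(changes))
print(multi_variant_positions(ROWS))
```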
Re-Draw the tabs. The first panel of tabs is
horizontal one. Out of them Biology and
Chemistry tab is active in this slide
Tool Output Display slide
5
http://www.pdb.org/pdb/home/home.do
33Step 4.f Structural Analysis RCSB PDB
1
- The length of the covalent bonds between two
adjacent atoms in a protein molecule
- The angle formed by 3 consecutive atoms in native
conformation of a protein
Summary
Biology and Chemistry
Geometry
2
- The angle formed by 2 consecutive planes of 4
linearly bonded atoms
3
Action
Audio Narration
4
Description of the action
- The 3-D visualization of Serum Albumin is given
as a part of the results which can be viewed from
a tool called Jmol. Along with the image analysis
from Jmol, users can also study and download the
structural characteristics of the protein such as
its Bond Length along with the place and
frequency of its occurrence. Structural results
also summarize the Bond Angle and the Dihedral
Angles including the chain where they occur and
the frequency of its occurrence.
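The three geometric quantities named here can all be computed directly from atomic coordinates. The sketch below is a plain-Python illustration using vector arithmetic; the function names are chosen for this example, and the coordinates in the usage lines are made-up points rather than atoms from any structure file.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def bond_length(p1, p2):
    """Distance between two bonded atoms."""
    return norm(sub(p2, p1))

def bond_angle(p1, p2, p3):
    """Angle in degrees at p2 formed by three consecutive atoms."""
    v1, v2 = sub(p1, p2), sub(p3, p2)
    return math.degrees(math.acos(dot(v1, v2) / (norm(v1) * norm(v2))))

def dihedral_angle(p1, p2, p3, p4):
    """Signed angle in degrees between the planes (p1,p2,p3) and (p2,p3,p4)."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)          # normals of the two planes
    b2_unit = tuple(x / norm(b2) for x in b2)
    return math.degrees(math.atan2(dot(b2_unit, cross(n1, n2)), dot(n1, n2)))

# Made-up coordinates for illustration.
print(bond_length((0, 0, 0), (0, 0, 1.5)))
print(bond_angle((1, 0, 0), (0, 0, 0), (0, 1, 0)))
print(dihedral_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```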
Re-Draw the tabs. The first panel of tabs is
horizontal one. Out of them Geometry tab is
active in this slide
Tool Output Display slide
5
http://www.pdb.org/pdb/home/home.do
34Interactivity option 1: Step No 1 To find the
sequence corresponding to the beta chain of
insulin and compare their lengths in different
organisms
1
Check the names of the source organism (step 4)
Sort the file according to sequence lengths (step 6)
Store the sequence ID, source organism and length
of the sequence in a separate text file (step 5)
2
Input the term insulin in the search box (step 1)
Choose and open a primary sequence database of
your choice (step 2)
Click on the entry corresponding to beta chains (step 3)
3
4
Results
Boundary/limits
Interactivity Type Options
Remove the step number from the bottom of the
tab. Show all the steps in the mixed order. The
user must click on the tabs order wise. If the
user clicks at a tab which is not in the right
order, then flash a message saying try again
All the tabs must be arranged in right order. The
numbers mentioned indicate the correct order.
Arrange the steps in the order to be performed.
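The last two steps of this exercise (storing the ID, organism and length, then sorting by length) can be sketched as follows. The entries shown are placeholders invented for the example, not real database records, and the function names and tab-separated file layout are assumptions.

```python
def sorted_length_report(records):
    """Return text lines of (ID, organism, length), shortest sequence first."""
    ordered = sorted(records, key=lambda record: record[2])
    return [f"{seq_id}\t{organism}\t{length}" for seq_id, organism, length in ordered]

def save_report(lines, path):
    """Write the report lines to a plain text file."""
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")

# Hypothetical placeholder entries; real values come from the chosen database.
records = [
    ("ID_0003", "organism C", 32),
    ("ID_0001", "organism A", 30),
    ("ID_0002", "organism B", 31),
]
lines = sorted_length_report(records)
print("\n".join(lines))
# To keep a copy on disk: save_report(lines, "beta_chain_lengths.txt")
```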
5
35Questionnaire
1
- 1. Which of the following is a Protein Sequence Database?
- Answers: a) Swiss-Prot b) PDB c) CSD d) GEO
- 2. Which server should be used for identifying Protein Domains?
- Answers: a) NCBI b) DDBJ c) PROSITE d) All
- 3. Which reagent is used for Edman Degradation?
- Answers: a) Dabsyl Chloride b) Ninhydrin c) Phenyl isothiocyanate d) Cyanogen Bromide
- 4. Which amongst the following can be used for retrieving proteins from a database?
- Answers: a) Protein Name b) Corresponding Gene Name c) Unique Identifier d) All
2
3
4
5
36Questionnaire
1
- 5. Which one is NOT a step in sequence identification using Mass spectrometry?
- Answers: a) Labelling terminal residue b) Electrospray ionization c) Peptide fragmentation d) Calculating m/z ratio
- 6. Which one is NOT a derived protein Database?
- Answers: a) Prosite b) Pfam c) Swiss-Prot d) ProDom
- 7. The most complex and important database schema is
- Answers: a) Physical b) Logical c) View d) All
2
3
4
5
37Links for further reading
- Reference websites
- http://www.proteopedia.org/wiki/index.php/Main_Page
- http://www.pdb.org/pdb/home/home.do
- http://www.ncbi.nlm.nih.gov
- http://expasy.org/sprot/
- http://prodom.prabi.fr/prodom/current/html/home.php
- http://expasy.org/prosite/
- http://pfam.sanger.ac.uk
38Links for further reading
- Books
- Biochemistry by Stryer et al., 5th edition
- Biochemistry by A.L.Lehninger et al., 3rd edition
- Database System Concepts by Korth et al., 5th
edition