Title: Systems Biology Visualization
1 Systems Biology Visualization
There has been a rapid accumulation of data from
protein interaction, gene expression and
metabolic pathway analysis. To derive meaningful
information out of this data, we need to develop
integrative visualization techniques, which
provide an insight into its biological relevance.
2 Definition of the Problem
1
We will consider the case study of the disease
condition known as Glioma, which is a group of
brain tumors. In the first part of the
animation, we examine the regulation of genes in
Glioma through gene expression data analysis.
This identifies the genes that are modulated
(up- or down-regulated) during Glioma. In the
second part of the study, we find the metabolic
pathways that are involved in Glioma by studying
protein interaction data. In the third part, we
explore pathway databases and their features to
study the pathways that were retrieved from the
gene expression and protein interaction studies.
Audio Narration
Action
Description of the action
Static Image
Display image and read narration
5
3 Master Layout (Part 1)
1
This animation consists of 3 parts:
Part 1: Gene Expression Data Analysis
Part 2: Protein Interaction Data Analysis
Part 3: Metabolic Profile Databases
Choose the problem to study and extract relevant
data
2
Send the gene expression profile data as input to
the tool
3
4
Compute the features related to gene regulation
5
Genes up- or down-regulation
http://www.genome.jp/kegg/
4 Definitions of the components: Part 1 Gene
expression data analysis
1
- Interaction Data: Interaction data refers to
information regarding the nature and type of
bonding between various biological components. It
can be Protein Interaction Data, Gene Expression
Data or Metabolic Pathway Data.
- Visualization tools: Software tools that are
capable of reading interaction data and
representing it in a graphical format, thereby
providing a simplified biological insight, e.g.
Cytoscape for Protein Interaction data and
GeneSpring for Gene Expression Data.
- Microarray: Microarrays are printed on a solid
surface, typically glass, and are used to study
and analyze a large number of samples
simultaneously in a high-throughput manner.
5 Gene Expression Profile Data: Option
1
DATA GENERATION
INPUT
VISUALIZATION
2
3
Proceed to Full Animation
4
Audio Narration
Action
Description of the action
Option for user to view Input or Output
The Data Generation box should be linked to Step
1. The Input box should be linked to the Step 2
input slides. The same goes for output: Output
slides should be linked to Step 3. The
Visualization slide should be linked to Step 4.
This slide gives the user the option to go
through only specific content from the animation.
To view the protocol for submitting files, click
on Input. To view the protocol for retrieving and
analyzing output files, click on Output. To
proceed to the full animation, click on the arrow.
5
6 Step 1.a - Gene Expression Profile Data: Data
Extraction from Experiments
?
1
2
Biological Samples e.g. gliomas
3
Microarray Chips
Scanned Slides
4
Audio Narration
Action
Description of the action
Schematic for extracting the data for defined
problem
Follow the animation. Re-draw the figures.
Users can extract gene microarray data from
microarray experiments. The normalized microarray
data gives an insight into the regulation of the
genes. This regulation is examined by analyzing
the microarray data with Gene Expression Profile
Data Analysis software. For a detailed
explanation of the microarray technique, see the
OSCAR animation for Microarray Technologies.
5
Biochemistry by A.L.Lehninger et al., 3rd edition
7 Step 1.b - Gene Expression Profile Data: Data
Extraction from Databases
?
1
B Input - Extracting microarray data For analysis
Microarray Data Repository
Query Term
High-Grade glioma
2
3
Microarray Data file
| PMID | ACCESSION NUMBER | PROTEIN NAME | GLIOMA TYPE | VALIDATION | FOLD CHANGE | p-VALUE |
4
Audio Narration
Action
Description of the action
Schematic for extracting the data for defined
problem
Follow the animation and show storage of files in
Local System
Users can obtain microarray data directly from
experiments or from public repositories such as
GEO DataSets at NCBI. Premier microarray
research institutes maintain their own dedicated
databases for the microarray data generated in
their labs. Because of their large size, these
data are distributed as compressed files, which
need to be stored on a local personal computer.
Here, as an example, we will study the regulation
of genes in a brain tumor known as Glioma. Gene
expression data analysis will give us a picture
of the genes that are modulated (up- or
down-regulated) during Glioma.
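The storage step described in the narration can be sketched in Python. This is only a minimal illustration, assuming the downloaded files are gzip-compressed; the function name `unpack_gz_files`, the folder names and the file name in the comment are hypothetical and not part of the animation.

```python
import gzip
import shutil
from pathlib import Path


def unpack_gz_files(download_dir, target_dir):
    """Decompress every *.gz file in download_dir into target_dir."""
    out_dir = Path(target_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    unpacked = []
    for archive in sorted(Path(download_dir).glob("*.gz")):
        # A file such as "GSM456.CEL.gz" (hypothetical name) is written
        # out as "GSM456.CEL" in the local folder.
        destination = out_dir / archive.stem
        with gzip.open(archive, "rb") as src, open(destination, "wb") as dst:
            shutil.copyfileobj(src, dst)
        unpacked.append(destination)
    return unpacked
```

The returned list of paths can then be pointed to in the upload step of the analysis software.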
5
8 Step 2 - Gene Expression Profile Data - Input
1
?
The technology used in Microarray Experiments
refers to the reference organism used for making
the microarray chip
ADD PROJECT
Glioma
ADD EXPERIMENT
Select Experimental Type
Affymetrix Expression
2
- Agilent Single Color
- Agilent Two Color
- Affymetrix Copy Number
- Affymetrix Expression
- Illumina Association Analysis
- Illumina Copy Number
- Illumina Single Color
- Real-Time PCR
SELECT PLATFORM
Select Technology (if applicable)
Human
- Barley
- Bovine
- E. coli
- B. subtilis
- Drosophila
- Human
- Mouse
- Maize
3
UPLOAD DATA
Folder A/GSE123/GSM456.CEL
4
Action
Audio Narration
Description of the action
Schematic for entering data and setting parameters
Follow the animation and re-draw images to
replicate the working of a software environment
The software follows the input procedure in a
sequential manner. The initial steps are to add a
new project and experiment. While adding an
experiment, the user needs to define the type of
experiment. Due to a lack of standardization,
microarray data is saved in various file formats
such as CEL, GPR, GAL and CDT. Various tools
support one or more of these formats.
5
9 Step 3.a - Gene Expression Profile Data - Output
1
?
High cutoff to give significant results.
| Probe Set ID | Fold change (GSM34580.CEL vs GSM34586.CEL) | Regulation (GSM34580.CEL vs GSM34586.CEL) | Gene Symbol |
| 34517_at | 16.870739 | up | HMGCS1 |
| 37513_at | 14.440558 | up | SCD |
| 33369_at | 9.3396635 | up | SC4MOL |
| 34375_at | 10.11749 | down | CCL2 |
| 35372_r_at | 12.105057 | down | IL8 |
| 35766_at | 8.585363 | down | KRT18 |
| 38427_at | 12.070478 | down | COL15A1 |
| 1369_s_at | 11.556258 | down | IL8 |
| 875_g_at | 9.369903 | down | CCL2 |
| 695_at | 9.015739 | down | TNC |
| 266_s_at | 8.460315 | up | CD24 |
2
Filter data - Fold Change
Heat Map
3
Summary Statistics
Functional Analysis - GO
4
Action
Audio Narration
Description of the action
Schematic for interpreting the results of Gene
Expression Data Analysis
A high cutoff is applied to give significant
results. During comparison, probe sets that
satisfy the fold change cutoff of more than 8 in
at least one condition pair are displayed in
the result. Regulation is reported by comparing
the ratio of conditions 1 and 2. Thus, the
highlighted gene HMGCS1 is up-regulated in sample
GSM34580 as compared to GSM34586.
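The fold change filter described in this narration can be sketched as follows. This is an illustrative sketch, not the algorithm of the analysis software: the class and function names are hypothetical, the first two rows are taken from the result table of this step, and the third row is invented to show a probe set that falls below the cutoff.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    probe_set_id: str
    fold_change: float
    regulation: str  # "up" or "down"
    gene_symbol: str


def filter_by_fold_change(results, cutoff=8.0):
    """Keep only the probe sets whose fold change exceeds the cutoff."""
    return [r for r in results if r.fold_change > cutoff]


results = [
    ProbeResult("34517_at", 16.870739, "up", "HMGCS1"),
    ProbeResult("34375_at", 10.11749, "down", "CCL2"),
    # Invented row for illustration only: it does not pass the cutoff of 8.
    ProbeResult("hypothetical_at", 2.5, "up", "GENE_X"),
]

significant = filter_by_fold_change(results)
up_regulated = [r.gene_symbol for r in significant if r.regulation == "up"]
```

Raising or lowering `cutoff` changes how many probe sets reach the result table, which is why a high value is used to keep only the strongest changes.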
Show the simulation of the software. In each
slide, the highlighted tab is ACTIVE. In the
animation, a tab should be highlighted when it is
clicked, followed by the content of the slide.
The mouse should then move to the second tab and
click on it, leaving the first tab inactive and
the second tab active. Active and inactive tabs
can be differentiated by separate colors.
5
10 Step 3.b - Gene Expression Profile Data - Output
1
?
upregulated
downregulated
2
Filter data - Fold Change
Heat Map
3
Summary Statistics
Legend for color coding of regulation
Functional Analysis - GO
4
Audio Narration
Action
Description of the action
Schematic for interpreting the results of Gene
Expression Data Analysis
Animator needs to re-draw all screenshots, as
they have been taken from the reference
software. Animator must not copy the image, or a
part thereof, in the final animation. Show the
simulation of the software. In each slide, the
highlighted tab is ACTIVE. In the animation, a
tab should be highlighted when it is clicked,
followed by the content of the slide. The mouse
should then move to the second tab and click on
it, leaving the first tab inactive and the second
tab active. Active and inactive tabs can be
differentiated by separate colors.
The Heat Map is the graphical visualization of
the regulation of genes, which is determined by
the cut-off value of fold change provided by the
user. Up-regulation of a gene is marked in red,
while down-regulation is marked in blue, as
explained in the figure legend.
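The color coding described in this narration can be sketched as a small mapping. The red/blue assignment follows the figure legend; the grey fallback and the signed log2 scale are assumptions added for illustration (a common way to order heat map cells), not something the slides specify.

```python
import math


def regulation_color(regulation):
    """Return the legend color for a regulation label."""
    colors = {"up": "red", "down": "blue"}
    # Grey for any other label is an assumption, not part of the legend.
    return colors.get(regulation, "grey")


def signed_log2_fold_change(fold_change, regulation):
    """Express a fold change on a signed log2 scale (assumed convention)."""
    magnitude = math.log2(fold_change)
    return magnitude if regulation == "up" else -magnitude
```

For example, a 16-fold up-regulated gene maps to `+4.0` and `"red"`, while a 16-fold down-regulated gene maps to `-4.0` and `"blue"`.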
5
11 Step 3.c - Gene Expression Profile Data - Output
1
?
| Property | GSM34580.CEL | GSM34586.CEL |
| No. of Observations | 11 | 11 |
| No. of Missing Values | 0 | 0 |
| Minimum | -1.798769 | -2.0382257 |
| Maximum | 2.0382257 | 1.798769 |
| Mean | -0.42409563 | 0.42409545 |
| Median | -1.5862231 | 1.5862226 |
| Std. Deviation | 1.7535444 | 1.7535444 |
2
Filter data - Fold Change
Heat Map
3
Summary Statistics
Functional Analysis - GO
4
Action
Audio Narration
Description of the action
Schematic for interpreting the results of Gene
Expression Data Analysis
Show the simulation of the software. In each
slide, the highlighted tab is ACTIVE. In the
animation, a tab should be highlighted when it is
clicked, followed by the content of the slide.
The mouse should then move to the second tab and
click on it, leaving the first tab inactive and
the second tab active. Active and inactive tabs
can be differentiated by separate colors.
The summary statistics result gives a
statistical summary of the genes screened after
a cut-off has been specified to the gene
expression analysis server. This includes the
number of genes observed to be regulated and the
statistical significance of the fold change
corresponding to it.
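The properties listed in the summary statistics table (observations, missing values, minimum, maximum, mean, median and standard deviation) can be computed with the Python standard library. This is a sketch under assumptions: the function name is hypothetical, missing values are represented as `None`, and the sample standard deviation is used because the slides do not say which variant the software reports. The input values are invented.

```python
import statistics


def summarize(values):
    """Compute the properties shown in the summary statistics table."""
    present = [v for v in values if v is not None]
    return {
        "observations": len(values),
        "missing": len(values) - len(present),
        "minimum": min(present),
        "maximum": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        # Sample standard deviation; an assumption about the tool.
        "std_deviation": statistics.stdev(present),
    }


# Invented values for illustration; None marks a missing measurement.
demo = summarize([-2.0, -1.0, None, 1.0, 2.0])
```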
5
12 Step 3.d - Gene Expression Profile Data - Results
1
?
- Molecular Functions
- catalytic activity
- hydroxymethylglutaryl-CoA synthase activity
- cytokine activity
- protein binding
- chemokine activity
- G-protein-coupled receptor binding
- signal transducer activity
- Cellular Components affected
- endoplasmic reticulum
- extracellular region
- soluble fraction
- cytoplasm
- membrane fraction
2
Filter data - Fold Change
Heat Map
- Biological Functions
- lipid metabolic process
- fatty acid metabolic process
- positive regulation of endothelial cell proliferation
- angiogenesis
- apoptosis
- cell adhesion
- response to hypoxia
3
Summary Statistics
Functional Analysis - GO
4
Audio Narration
Action
Description of the action
Schematic for interpreting the results of Gene
Expression Data Analysis
Show the simulation of the software. In each
slide, the highlighted tab is ACTIVE. In the
animation, a tab should be highlighted when it is
clicked, followed by the content of the slide.
The mouse should then move to the second tab and
click on it, leaving the first tab inactive and
the second tab active. Active and inactive tabs
can be differentiated by separate colors.
The Functional Analysis tool gives the functions
that the regulated genes are involved in at the
molecular and biological levels, and the
cellular components they modulate.
5
13 Step 4 - Gene Expression Profile Data -
Visualization
1
?
http://www.ingenuity.com/
14 Step 3.d - Gene Expression Profile Data -
Visualization
1
Audio Narration
Action
Description of the action
The pathway information relevant to Glioma
studies can be extracted from the input data.
Here we show the merged gene regulatory
pathway. We zoom into the pathway titled "Cell
Cycle, Cellular Assembly and Organization, DNA
Replication, Recombination, and Repair" and see
the interactions of the TP53 pathway.
Static Slide
Animator needs to re-draw all screenshots, as
they have been taken from the reference
software. Animator must not copy the image, or a
part thereof, in the final animation. Show the
image with audio narration. Show the zooming
effect as shown in the animation.
15 Master Layout (Part 2)
1
This animation consists of 3 parts:
Part 1: Gene Expression Data Analysis
Part 2: Protein Interaction Data Analysis
Part 3: Metabolic Profile Databases
Retrieve protein interaction data from
experiments or public repositories
2
Input the data in the software tool in the right
format
3
4
View, download and interpret the results
5
16 Definitions of the components: Part 2 Protein
Interaction Data Analysis
1
- Knowledgebase: The Protein Interaction Network
tools accept the user data and map it to their
repository. These storage units of the tools are
called their knowledgebase.
- Accession Number: The accession number of a
protein is the unique identifier that acts as a
common link to relate the data provided as input
by the users with the knowledgebase of the tool.
- Protein microarray: These are miniaturized
arrays, commonly printed on glass, polyacrylamide
gel pads or microwells, onto which small
quantities of thousands of proteins can be
simultaneously immobilized for high-throughput
assaying.
17 Gene Expression Profile Data: Option
1
DATA GENERATION
INPUT
VISUALIZATION
2
3
Proceed to Full Animation
4
Audio Narration
Action
Description of the action
Option for user to view Input or Output
The Data Generation box should be linked to Step
1. The Input box should be linked to the Step 2
input slides. The same goes for output: Output
slides should be linked to Step 3. The
Visualization slide should be linked to Step 4.
This slide gives the user the option to go
through only specific content from the animation.
To view the protocol for submitting files, click
on Input. To view the protocol for retrieving and
analyzing output files, click on Output. To
proceed to the full animation, click on the arrow.
5
18 Step 1.a - Protein Molecular Interaction Network:
Data Extraction
1
2
Protein Samples
3
Protein Microarray Chips
Scanned Slides
4
Audio Narration
Action
Description of the action
Schematic for extracting the data for defined
problem
Follow the animation. Re-draw the figures.
Users can extract protein microarray data from
Microarray Experiments. The normalized microarray
data gives an insight into the regulation of the
genes. This regulation is checked by studying the
microarray data through Gene Expression Profile
Data Analysis software. For a detailed insight
into the Microarray Technique, study the OSCAR
animation for Microarray Technologies.
5
19 Step 1.b - Protein Molecular Interaction Network:
Data Extraction
1
?
Extract Data from Literature sources and store it
in a spreadsheet
Literature Resource
Query Term
High-Grade glioma
Rawdata.xls
2
| PMID | ACCESSION NUMBER | PROTEIN NAME | GLIOMA TYPE | VALIDATION | FOLD CHANGE | p-VALUE |
Extract data from Microarray Data repositories
3
4
Audio Narration
Action
Description of the action
Protein molecular interaction software is used
to build and analyze networks of proteins, given
their accession numbers. The networks are built
by mapping input data to the software's
knowledgebase. Here, we explain this with a list
of proteins modulated in the disease condition
called glioma, which are extracted from:
- Literature resources
- Microarray databases
As an output, we get a spreadsheet containing
microarray data.
The first panel is about extracting information
from web resources. Show the required PDFs being
downloaded and read through to extract data.
Follow this with a screenshot of microarray
databases. At the end, show the Rawdata.xls file
being formed.
Schematic for extracting the data for defined
problem
5
20 Step 1.c - Protein Molecular Interaction Network:
Data Extraction
1
2
Extract data from Microarray Data repositories
3
Rawdata.xls
| PMID | ACCESSION NUMBER | PROTEIN NAME | GLIOMA TYPE | VALIDATION | FOLD CHANGE | p-VALUE |
4
Audio Narration
Action
Description of the action
Schematic for extracting the data for defined
problem
The first panel is about extracting information
from web resources. Show the required PDFs being
downloaded and read through to store specific
data in spreadsheets.
Protein molecular interaction software is used
to build and analyze networks of proteins, given
their accession numbers. The networks are built
by mapping input data to the software's
knowledgebase. Here, we explain this with a list
of proteins modulated in the disease condition
called glioma, which are extracted from
literature resources or databases.
5
21 Step 2.a - Protein Molecular Interaction Network:
Input
1
?
CREATE PROJECT
UPLOAD
MAP DATA
Project Glioma
Enter Project Name
2
Core Analysis
Enter Experiment Type
- Biomarker Analysis
- Core Analysis
- Toxicology Analysis
- Metabolic Analysis
3
4
Audio Narration
Action
Description of the action
Schematic for Input
Show the simulation of the software. In each
slide, the highlighted tab is ACTIVE. In the
animation, a tab should be highlighted when it is
clicked, followed by the content of the slide.
The mouse should then move to the second tab and
click on it, leaving the first tab inactive and
the second tab active. Active and inactive tabs
can be differentiated by separate colors.
The name of the project and experiments must be
entered by the user in the software for the
purpose of saving the current status of the work.
In the experiment type, the user must select the
type of analysis that needs to be conducted on
the dataset. For this Glioma case study, we
undertake core analysis of the data to identify
its network.
5
22 Step 2.b - Protein Molecular Interaction Network:
Input
1
?
CREATE PROJECT
UPLOAD DATA
MAP DATA
Folder1/Rawdata.xls
Upload Excel File
2
| PMID | Protein Name | Accession Number | Glioma Type |
| 17653765 | Fructose bisphosphate aldolase | 78070601 | anaplastic oligodendroglioma |
| 17653765 | Phosphoglycerate mutase 1 | 56081766 | anaplastic oligodendroglioma |
| 17653765 | Carbonic anhydrase ii | 443135 | anaplastic oligodendroglioma |
| | Enolase 1 | 4503571 | Glioblastoma multiforme |
| | Enolase | 693933 | Glioblastoma multiforme |
| | a-Enolase like 1 | 3282243 | Glioblastoma multiforme |
| | Enolase 1 | 4503571 | Glioblastoma Multiforme |
| | Aldolase C, fructose biphosphate | P09972 | glioblastoma,Grade II,III,IV |
| | Enolase 1 | P06733 | glioblastoma,Grade II,III,IV |
| | Enolase 2 | P09104 | glioblastoma,Grade II,III,IV |
| | Glyceraldehyde-3-phosphate dehydrogenase, liver | P04406 | glioblastoma,Grade II,III,IV |
| | Lactate dehydrogenase B | P07195 | glioblastoma,Grade II,III,IV |
| | Phosphoglycerate kinase 1 | P00558 | glioblastoma,Grade II,III,IV |
| | Phosphoglycerate mutase 1, brain | Q6P6D7 | glioblastoma,Grade II,III,IV |
| | Pyruvate kinase, isozymes M1/M2 | P14618-2 | glioblastoma,Grade II,III,IV |
| | Pyruvate kinase, isozymes M1/M2, splice isoform M1 | P14618 | glioblastoma,Grade II,III,IV |
| | Triosephosphate isomerase | P60174 | glioblastoma,Grade II,III,IV |
| | Pyruvate kinase | NI | Malignant Glioma |
| | Glyceraldehyde 3-phosphate dehydrogenase | P04406 | Malignant Glioma |
| | Triosephosphate isomerase | P60174 | Malignant Glioma |
| | Enolase 1 | P06733 | Malignant Glioma |
| | Aldolase A | NI | Malignant Glioma |
| 19109410 | GAPDH | P16858 | Glioma gradeIII,IV |
| 19109410 | Pyruvate kinase isozyme M1/M2 | P52480 | Glioma gradeIII,IV |
| 19109410 | Alpha-Enolase | P17182 | Glioma gradeIII,IV |
| 19109410 | Phosphoglycerate kinase 1 | P09411 | Glioma gradeIII,IV |
| 19109410 | GAPDH | P16858 | Glioma gradeIII,IV |
3
4
MENTION THE TYPE OF IDENTIFIER, SUCH AS UNIPROT,
GENBANK ID, REFSEQ ID, ENTREZ GENE, ETC.
5
23 Step 2.c - Protein Molecular Interaction Network:
Input
1
Audio Narration
Action
Description of the action
Upload the raw data file that was created after
scrutinizing the papers. The format of the raw
data file to be uploaded varies among different
software. Although most software recognizes
spreadsheet formats, some tools have their own
specific input file format, such as the .sif
file for Cytoscape. Once the raw data file is
uploaded, the tool displays all columns. The
user needs to select the columns that are to be
given to the tool. Out of all the columns, it is
compulsory to enter the ACCESSION NUMBER (OR ANY
OTHER PROTEIN IDENTIFIER). This column is
highlighted in red. These identifiers can be of
multiple types, which need to be defined so that
the tool can match the user's data to its
dictionary of identifier terms, called the
knowledgebase. All other information is
optional, and users can provide it depending
on the nature of the analysis.
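Because the identifier type has to be declared, it can help to check the accession column before uploading. The sketch below is an illustration only: the function name is hypothetical, the regular expression follows the accession format published in the UniProt help pages with an optional isoform suffix such as "-2", and labelling digit-only IDs as "numeric" is an assumption rather than a rule of any tool.

```python
import re

# UniProt accession format, with an optional isoform suffix such as "-2".
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})(?:-\d+)?$"
)


def identifier_type(accession):
    """Guess the identifier type of one accession column entry."""
    value = accession.strip()
    if UNIPROT_ACCESSION.match(value):
        return "uniprot"
    if value.isdigit():
        # Digit-only IDs, such as 78070601 in the raw data file;
        # the label "numeric" is an assumption for this sketch.
        return "numeric"
    return "unknown"


column = ["P09972", "P14618-2", "78070601", "NI"]
types = [identifier_type(entry) for entry in column]
```

Entries reported as "unknown", such as "NI" in the raw data file, are the ones a user would need to resolve or leave out before mapping.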
Schematic for Input
Show the simulation of the software. In each
slide, the highlighted tab is ACTIVE. In the
animation, a tab should be highlighted when it is
clicked, followed by the content of the slide.
The mouse should then move to the next tab and
click on it, leaving the first tab inactive and
the second tab active. Active and inactive tabs
can be differentiated by separate colors.
24 Step 2.d - Protein Molecular Interaction Network:
Input
1
?
CREATE PROJECT
UPLOAD DATA
MAP DATA
25 Step 2.d - Protein Molecular Interaction Network:
Input
1
Audio Narration
Action
Description of the action
The input raw data is mapped to the knowledgebase
of the software to provide a uniform set of IDs
for building a network. The IDs from the input
file that are not matched with the knowledgebase
are highlighted in red.
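The mapping step described in this narration can be pictured as a dictionary lookup. This is a toy sketch, not the behaviour of any specific tool: the dictionary holds only a few ID-to-gene pairs copied from the mapped output table of this part, and the function name is hypothetical.

```python
# Toy knowledgebase: a handful of ID-to-gene pairs from the mapped output.
KNOWLEDGEBASE = {
    "78070601": "ALDOC",
    "P09972": "ALDOC",
    "P06733": "ENO1",
    "P04406": "GAPDH",
    "P60174": "TPI1",
}


def map_to_knowledgebase(accessions, knowledgebase):
    """Split input IDs into mapped (ID -> gene) and unmapped lists."""
    mapped = {}
    unmapped = []
    for accession in accessions:
        if accession in knowledgebase:
            mapped[accession] = knowledgebase[accession]
        else:
            # These are the entries that would be highlighted in red.
            unmapped.append(accession)
    return mapped, unmapped


mapped, unmapped = map_to_knowledgebase(
    ["P09972", "P06733", "NI", "443135"], KNOWLEDGEBASE
)
```

Here "NI" and "443135" are unmapped only because they are absent from this toy dictionary.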
Schematic for Input
This file is the same as the input file. Only the
entries that are not mapped need to be
highlighted in the animation.
26 Step 2.e - Protein Molecular Interaction Network:
Input
1
?
CREATE PROJECT
UPLOAD
MAP DATA
Data gets mapped to Knowledgebase of software to
produce output files
| ID | Gene | Description | Location | Family |
| 78070601 | ALDOC | aldolase C, fructose-bisphosphate | Cytoplasm | enzyme |
| 56081766 | PGAM1 | phosphoglycerate mutase 1 (brain) | Cytoplasm | phosphatase |
| 4503571 | ENO1 | enolase 1, (alpha) | Cytoplasm | transcription regulator |
| 693933 | ENO1 | enolase 1, (alpha) | Cytoplasm | transcription regulator |
| P09972 | ALDOC | aldolase C, fructose-bisphosphate | Cytoplasm | enzyme |
| P06733 | ENO1 | enolase 1, (alpha) | Cytoplasm | transcription regulator |
| P09104 | ENO2 | enolase 2 (gamma, neuronal) | Cytoplasm | enzyme |
| P04406 | GAPDH (includes EG2597) | glyceraldehyde-3-phosphate dehydrogenase | Cytoplasm | enzyme |
| P07195 | LDHB | lactate dehydrogenase B | Cytoplasm | enzyme |
| P00558 | PGK1 | phosphoglycerate kinase 1 | Cytoplasm | kinase |
| Q6P6D7 | PGAM1 | phosphoglycerate mutase 1 (brain) | Cytoplasm | phosphatase |
| P14618-2 | PKM2 | pyruvate kinase, muscle | Cytoplasm | kinase |
| P14618 | PKM2 | pyruvate kinase, muscle | Cytoplasm | kinase |
| P60174 | TPI1 | triosephosphate isomerase 1 | Cytoplasm | enzyme |
| P04406 | GAPDH (includes EG2597) | glyceraldehyde-3-phosphate dehydrogenase | Cytoplasm | enzyme |
| P60174 | TPI1 | triosephosphate isomerase 1 | Cytoplasm | enzyme |
| P06733 | ENO1 | enolase 1, (alpha) | Cytoplasm | transcription regulator |
| P16858 | GAPDH (includes EG14433) | glyceraldehyde-3-phosphate dehydrogenase | Plasma Membrane | enzyme |
| P52480 | PKM2 | pyruvate kinase, muscle | Cytoplasm | kinase |
| P17182 | ENO1 | enolase 1, (alpha) | Cytoplasm | transcription regulator |
27 Step 2.e - Protein Molecular Interaction Network:
Input
1
Audio Narration
Action
Description of the action
The tool also extracts other relevant information
from its knowledgebase corresponding to that ID.
The uniform IDs and the new columns are displayed
in the form of a new spreadsheet which has the
refined data. The columns highlighted in blue
are the ones that are newly added. The red column
is provided for uniformity by taking one specific
naming scheme for identifiers.
Schematic for Input
Show the simulation of the software. In each
slide, the highlighted tab is ACTIVE. In the
animation, a tab should be highlighted when it is
clicked, followed by the content of the slide.
The mouse should then move to the next tab and
click on it, leaving the first tab inactive and
the second tab active. Active and inactive tabs
can be differentiated by separate colors.
28 Step 3 - Protein Interaction Data Analysis -
Output
1
?
BUILD PATHWAY
OUTPUT NETWORK
OUTPUT PATHWAY
TOP DISEASE NETWORK
TOP PHYSIOLOGICAL NETWORK
TOP NETWORK FUNCTIONS
2
- Genetic Disorder, Neurological Disease, Nucleic
Acid Metabolism
- Cell-To-Cell Signaling and Interaction, Nervous
System Development and Function, Cellular
Assembly and Organization
- Cancer, Reproductive System Disease,
Gastrointestinal Disease
- Cancer
- Gastrointestinal Disease
- Neurological Disease
- Nervous System Development and Function
- Hematological System Development and Function
- Immune Cell Trafficking
3
TOP CANONICAL PATHWAY
- Glycolysis/Gluconeogenesis
- Mitochondrial Dysfunction
- 14-3-3-mediated Signaling
4
Audio Narration
Action
Description of the action
The tools provide a summary of results which
shows the top networks produced in each category.
The ranking is based on the number of mappings
from the user's input dataset to the software's
knowledgebase. The prediction of Neurological
Disease, Cancer and Nervous System as top
networks reinforces our data analysis. The data
analysis from this tool also shows that
Glycolysis/Gluconeogenesis is the pathway that is
modulated by our list of proteins.
Schematic for Output summary
Follow the animation
5
29 Step 4.a - Protein Molecular Interaction Network -
Output
1
?
BUILD PATHWAY
OUTPUT NETWORK
OUTPUT PATHWAY
2
Select the number of networks to be constructed
1
Select the maximum number of Molecules in the
network
70
3
Select endogenous chemicals
No
4
Audio Narration
Action
Description of the action
Users can adjust the parameters which define the
number and size of the networks to be formed.
Users can also control the presence of molecules
other than genes, proteins or RNA. The molecules
that have shown relationships with other genes or
proteins of the knowledgebase are mapped into the
network. IDs that are repeated will point
to the same node in the network.
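The last point of this narration, repeated IDs pointing to the same node, can be sketched with a dictionary of sets. The (ID, gene) pairs are taken from the mapped output table; grouping nodes by gene symbol and the function name are assumptions made for this illustration.

```python
def build_nodes(rows):
    """Collapse (accession, gene) rows into one node per gene symbol."""
    nodes = {}
    for accession, gene in rows:
        # A set absorbs repeated IDs, so each ID is stored only once.
        nodes.setdefault(gene, set()).add(accession)
    return nodes


rows = [
    ("P04406", "GAPDH"),
    ("P60174", "TPI1"),
    ("P04406", "GAPDH"),  # repeated ID, points to the same node
    ("P06733", "ENO1"),
    ("4503571", "ENO1"),  # different ID mapped to the same gene
]
nodes = build_nodes(rows)
```

Five input rows therefore produce only three nodes in this example.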
Schematic for Output summary
Follow the animation
5
30 Step 4.b - Protein Molecular Interaction Network
- Output
1
?
BUILD PATHWAY
OUTPUT NETWORK
OUTPUT PATHWAY
2
3
Seed Molecules
Molecular interaction
Another Small Interaction Network
Network interaction
4
Audio Narration
Action
Description of the action
From the input given by users, the tool analyzes
the set of molecules that are present in its
database of metabolic networks. The molecules
that are found to occur most frequently are used
as seeds, which connect to other such molecules.
Networks are also extended based on interactions
between two small networks to produce a larger
network. Such analysis depends on the
parameters set by the user in the initial steps.
Based on this information, the tool predicts
the pathway to which the molecules are most
likely to belong. Further analysis of these
pathways can be carried out using metabolic
profile databases.
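The seed idea in this narration can be illustrated with a simple breadth-first growth from the most frequently occurring molecule, stopping at the maximum network size chosen by the user. This is a generic sketch and not the algorithm of the software shown in the animation; the function name is hypothetical, and although the gene symbols appear in the mapped output, the interaction pairs are invented.

```python
from collections import Counter, deque


def grow_network(interactions, max_molecules):
    """Grow one network outwards from the most frequent molecule."""
    frequency = Counter(m for pair in interactions for m in pair)
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    # Seed: most frequent molecule (alphabetical order breaks ties).
    seed = max(sorted(frequency), key=lambda m: frequency[m])
    network = [seed]
    queue = deque([seed])
    while queue and len(network) < max_molecules:
        current = queue.popleft()
        # Visit the most frequent partners first.
        for partner in sorted(neighbours[current],
                              key=lambda m: (-frequency[m], m)):
            if partner not in network and len(network) < max_molecules:
                network.append(partner)
                queue.append(partner)
    return network


# Invented interaction pairs, for illustration only.
interactions = [
    ("GAPDH", "ENO1"), ("GAPDH", "PGK1"), ("GAPDH", "TPI1"),
    ("ENO1", "PGK1"), ("LDHB", "TPI1"),
]
small_network = grow_network(interactions, max_molecules=4)
full_network = grow_network(interactions, max_molecules=70)
```

With a limit of 4 the peripheral molecule is left out, while a limit of 70 (the value entered in the previous step) lets the network absorb all five molecules.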
Schematic for Output summary
Follow the animation. Highlight the yellow boxes
in animation as well.
5
31 Step 4.c - Protein Molecular Interaction Network
- Visualization
1
?
32 Step 3.d - Gene Expression Profile Data -
Visualization
1
Audio Narration
Action
Description of the action
Zoom effect
Animator needs to re-draw all screenshots, as
they have been taken from the reference
software. Animator must not copy the image, or a
part thereof, in the animation. Show the image
with each part zooming and then appearing as a
zoomed image.
The pathway information relevant to Glioma
studies can be extracted from the input data.
In this pathway, we can observe the role of
Isocitrate Dehydrogenase (IDH) in the regulation
of metabolism during Glioma. A recently published
study has also shown the involvement of IDH in
Glioma-related pathways. Most such software is
linked to Protein Pathway Interaction Software,
which is described in detail in the next part of
the animation.
33 Master Layout (Part 3)
This animation consists of 3 parts:
Part 1: Gene Expression Data Analysis
Part 2: Protein Interaction Data Analysis
Part 3: Metabolic Profile Databases
1
Select the level of organization of the
biological system to study
2
Select from one of the publicly available
databases
3
Select the relevant options in the database to
view the pathway network and interaction data of
the system under consideration
4
5
34 Definitions of the components: Part 3 Metabolic
profile databases
1
- 1. Biological System: In the biological context,
a system refers to an entity that exists with
the help of mutual interactions between its
components.
- 2. Level of organization: The level of
organization describes the complexity of the
biological system being studied. Components of
one system could be made up of constituent parts,
which in turn form another system at a different
level of organization. For example, a cell is a
system in itself. However, for larger
physiological systems, a cell would only be a
component.
- 3. Visualization: To explore various
protein-protein interactions, it is critical to
interpret lists of protein interaction data,
which are retrieved as elaborate spreadsheets
that make the analysis cumbersome. Mapping such
data in a diagrammatic form makes it easier for
scientists to develop a biological insight into
the interaction data.
- 4. Functional annotation: By examining the maps
of protein-protein interaction data, researchers
can discover new biological relationships between
proteins or predict their functions based on
specific interactions.
- 5. Graphical Notation: The first step in the
analysis of protein interaction data is the
identification of protein complexes and groups of
complexes. In a simple graphical notation, a
Node represents a protein while an Edge
represents the interaction between two
proteins.
35 Definitions of the components: Part 3 Metabolic
profile databases
1
- 6. Pathway: A pathway in biology refers to a
series of inter-related metabolic reactions,
which depicts the order of conversion of one
entity to another.
- 7. Meta node: It is a single node onto which all
members of a protein cluster are collapsed. These
meta nodes help in deciphering the biological
applications of the networks that are collapsed
into one.
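Using the node and edge notation defined above, collapsing a protein cluster into a meta node can be sketched as an operation on an edge list. This is a minimal illustration; the function name, the cluster membership, the meta node name and the interaction pairs are all invented for the example.

```python
def collapse_cluster(edges, cluster, meta_name):
    """Replace every cluster member by one meta node in the edge list."""
    collapsed = set()
    for a, b in edges:
        a = meta_name if a in cluster else a
        b = meta_name if b in cluster else b
        if a != b:
            # Edges inside the cluster disappear; the rest are kept once.
            collapsed.add(tuple(sorted((a, b))))
    return collapsed


# Invented edges between proteins (nodes) for illustration only.
edges = [
    ("ENO1", "ENO2"), ("ENO1", "GAPDH"),
    ("ENO2", "PGK1"), ("GAPDH", "PGK1"),
]
meta_edges = collapse_cluster(edges, {"ENO1", "ENO2"}, "ENOLASE_CLUSTER")
```

The four original edges reduce to three, with the internal ENO1-ENO2 edge absorbed into the meta node.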
36 Step 1 - Pathway Databases - Input
1
Choose the system
ORGANISM
ENZYMES
DISEASE
PATHWAY
2
METABOLISM GENETIC INFORMATION PROCESSING ENVIRONM
ENTAL INFORMATION PROCESSING CELLULAR
PROCESSES ORGANISMAL SYSTEMS
CANCER IMMUNE SYSTEM DISEASE NEURO DEGENERATIVE
DISEASE CARDIO-VASCULAR DISEASE METABOLIC
DISEASES INFECTIOUS DISEASES
ENZYME NAME EC NUMBER SYNONYMS
PROKARYOTES PROTISTS FUNGI PLANTS ANIMALS
CANCER IMMUNE SYSTEM DISEASE NEURO DEGENERATIVE
DISEASE CARDIO-VASCULAR DISEASE METABOLIC
DISEASES INFECTIOUS DISEASES
3
4
Action: Animation of the input search strategies for pathway databases
Audio Narration: Pathway databases are repositories that give a visual insight into the biological interactions of genes and proteins. These databases can generally be searched by:
- Pathway: The network information in the web-based database can be searched by selecting the pathway category of interest, such as cellular processes or genetic information processing.
- Diseases: The networks are grouped by the diseases that are caused by their modulation.
- Enzymes: The enzymes in the pathway database are grouped, and pathways can be searched by giving enzyme information as a query.
- Organism: Each organism is given a unique identifier. Users can select an organism and then study the pathway as it occurs in that organism.
Description of the action: Follow the steps in the animation. Re-draw images. The audio narration must be read as the cursor in the animation moves to the 4 headings of the web page.
http://www.genome.jp/kegg/
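The four search routes can also be expressed as query URLs. The sketch below only builds the URL strings and fetches nothing. It assumes the public KEGG REST service at `https://rest.kegg.jp` with a `find/<database>/<keyword>` operation, and the mapping from the four headings to database names is an assumption made for this example; both should be checked against the KEGG API documentation before use.

```python
from urllib.parse import quote

# Assumed base URL of the KEGG REST service (verify against the KEGG API docs).
KEGG_REST = "https://rest.kegg.jp"

# Assumed mapping from the four search headings to KEGG database names.
SEARCH_DATABASES = {
    "pathway": "pathway",
    "disease": "disease",
    "enzymes": "enzyme",
    "organism": "genome",
}


def build_find_url(heading, keyword):
    """Build a keyword-search URL for one of the four search headings."""
    database = SEARCH_DATABASES[heading.lower()]
    # quote() percent-encodes spaces and other characters unsafe in a URL path.
    return f"{KEGG_REST}/find/{database}/{quote(keyword)}"


url = build_find_url("Pathway", "glioma")
```

Keeping the heading-to-database mapping in one dictionary means an incorrect assumption can be fixed in a single place.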
37Step 2.a - Pathway Databases Visualization of Pathways for Glioma
Figure labels: Nodes, Edges
Action: Zoomed Images
Audio Narration: We use pathway databases to study one of the pathways from our Glioma studies in Protein Interaction Networks, namely Cell Cycle, Cellular Assembly and Organization, DNA Replication, Recombination, and Repair. Here we highlight the nodes and edges within the pathway: the nodes are genes and the edges are the interactions between them. Such visualization tools can also produce images for the interactions of a specific gene; in this case we depict the interactions of TP53, derived from the Glioma studies.
Description of the action: The animator needs to re-draw all screenshots, as they have been taken from the referenced software. The animator must not copy an image, or any part thereof, into the final animation. Display the image. Highlight the nodes and edges as shown in the animation. The red box marks the area of the network that is being zoomed into, and is followed by the zoomed image of that part of the network. Each zoomed image is followed by the narration in the order given.
http://www.ingenuity.com/, http://www.cytoscape.org/
38Step 2.c - Pathway Databases Interpretation
Action: Options given once you click on a particular entity of Pathway
Audio Narration: Pathways can also be obtained for protein interaction networks. In such networks, the metabolites are the nodes and the reactions between them are the edges. Each node, such as a substrate, reactant or enzyme, is hyperlinked to another page that gives detailed information about that entity. Each element of the pathway, including the pathway itself, is assigned an identifier so that it can be referred to from anywhere in the database. The page also gives the information related to the molecule or reaction, such as its orthology, the pathways it belongs to and the corresponding gene IDs.
Description of the action: In each slide, the highlighted tab is ACTIVE. In the animation, a tab should be highlighted when it is clicked, followed by the content of the slide. The mouse should then move to the second tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be differentiated by separate colors.
http://www.genome.jp/kegg/
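Because every element carries its own identifier, the type of an entry can often be read from the identifier's shape. The sketch below assumes a small subset of prefix conventions (R for reactions, RC for reaction classes, C for compounds, K for orthology, and a short lowercase code followed by five digits for pathway maps). The table is illustrative and incomplete, `classify_identifier` is a hypothetical helper, and the sample identifiers are used only as strings to test the patterns.

```python
import re

# Illustrative subset of identifier shapes; assumed, not an exhaustive rule set.
ID_PATTERNS = [
    (re.compile(r"^RC\d{5}$"), "reaction class"),
    (re.compile(r"^R\d{5}$"), "reaction"),
    (re.compile(r"^C\d{5}$"), "compound"),
    (re.compile(r"^K\d{5}$"), "orthology"),
    (re.compile(r"^[a-z]{2,4}\d{5}$"), "pathway map"),
]


def classify_identifier(identifier):
    """Return the entry type suggested by the identifier's prefix and length."""
    for pattern, kind in ID_PATTERNS:
        if pattern.match(identifier):
            return kind
    return "unknown"


samples = ("R00960", "RC00078", "C00022", "hsa05214", "TP53")
kinds = [classify_identifier(sample) for sample in samples]
```

A gene symbol such as TP53 matches none of the patterns, so it is reported as unknown and would need a separate lookup.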
39Step 2.d - Pathway Databases Interpretation
Action: Options given once you click on a particular entity of Pathway
Audio Narration: The page also gives all the enzyme-related information for the reaction, such as the enzyme nomenclature, the Enzyme Commission (EC) number, the class of enzyme, and the substrates and products.
Description of the action: In each slide, the highlighted tab is ACTIVE. In the animation, a tab should be highlighted when it is clicked, followed by the content of the slide. The mouse should then move to the second tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be differentiated by separate colors.
http://www.genome.jp/kegg/
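The class of an enzyme follows directly from the first field of its Enzyme Commission number. A minimal sketch, assuming the standard four-field form such as "EC 2.7.1.1"; the helper name `enzyme_class` is hypothetical and the sample numbers are for illustration only.

```python
# The seven top-level Enzyme Commission classes, keyed by the first EC field.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def enzyme_class(ec_number):
    """Return the enzyme class named by the first field of an EC number."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    fields = text.split(".")
    if len(fields) != 4 or not fields[0].isdigit():
        raise ValueError(f"not a four-field EC number: {ec_number!r}")
    top_level = int(fields[0])
    if top_level not in EC_CLASSES:
        raise ValueError(f"unknown EC class: {top_level}")
    return EC_CLASSES[top_level]


sample_class = enzyme_class("EC 2.7.1.1")
```

Only the first field is interpreted here; the remaining three fields narrow the class down to subclass, sub-subclass and the individual enzyme.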
40Step 2.e - Pathway Databases Interpretation
Action: Options given once you click on a particular entity of Pathway
Audio Narration: The metabolic reaction that the enzyme is involved in is also provided in equation form, along with the structures of the reaction substrates.
Description of the action: Re-draw the equation. In each slide, the highlighted tab is ACTIVE. In the animation, a tab should be highlighted when it is clicked, followed by the content of the slide. The mouse should then move to the second tab and click on it, leaving the first tab inactive and the second tab active. Active and inactive tabs can be differentiated by separate colors.
http://www.genome.jp/dbget-bin/www_bget?R00960RP00303RC00078
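A reaction given in equation form can be split into its substrates and products with a few lines of code. The sketch assumes the equation is written as plain text with "<=>" between the two sides and " + " between compounds. The example equation is a generic illustration and is not the reaction shown on this slide; stoichiometric coefficients are not handled.

```python
def parse_equation(equation):
    """Split 'A + B <=> C + D' into a list of substrates and a list of products."""
    left, right = equation.split("<=>")
    # Split on ' + ' with spaces so that charges such as 'H+' stay intact.
    substrates = [name.strip() for name in left.split(" + ")]
    products = [name.strip() for name in right.split(" + ")]
    return substrates, products


# Illustrative equation only (assumed text form, not taken from the slide).
substrates, products = parse_equation(
    "ATP + D-Glucose <=> ADP + D-Glucose 6-phosphate"
)
```

Each substrate and product name could then be linked to its own entry, in the same way the pathway page links each node.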
41Interactivity option 1: Step No 1 - Assignment
Type of Input Data
- .gal Files
- .gpr Files
- .cel Files
- .cdt Files
- .sif Files
- List of Protein Identifiers
- Name of Enzyme
- Name of Disease
- Name of Pathway
Type of Analysis Tools
Results
Boundary/limits
Interactivity Type Options
Drag each yellow button into one of the 3 Analysis Tools. The correct results are given in the next slide.
If the user drags a button into the right box, the animation should flash a Tick Sign. If the box is incorrect, flash a Cross Sign and ask the user to Try Again.
Drag and Drop.
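The tick-or-cross behaviour of the drag-and-drop exercise can be sketched as a lookup against an answer key. The key below is an assumption: the grouping shown on the results slide is not recoverable from this text, so the buttons are assigned to the three parts of the animation based on common usage of each file type. The names `ASSUMED_ANSWER_KEY` and `check_drop` are hypothetical.

```python
# Assumed answer key (not taken from the results slide).
ASSUMED_ANSWER_KEY = {
    ".gal Files": "Gene Expression Data Analysis",
    ".gpr Files": "Gene Expression Data Analysis",
    ".cel Files": "Gene Expression Data Analysis",
    ".cdt Files": "Gene Expression Data Analysis",
    ".sif Files": "Protein Interaction Data Analysis",
    "List of Protein Identifiers": "Protein Interaction Data Analysis",
    "Name of Enzyme": "Metabolic Profile Databases",
    "Name of Disease": "Metabolic Profile Databases",
    "Name of Pathway": "Metabolic Profile Databases",
}


def check_drop(button, target_tool):
    """Return the feedback to flash when a button is dropped on a tool."""
    if ASSUMED_ANSWER_KEY.get(button) == target_tool:
        return "Tick"
    return "Cross - Try Again"


feedback = check_drop(".sif Files", "Protein Interaction Data Analysis")
```

Replacing the dictionary with the grouping from the results slide is the only change needed to use the real answer key.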
42Interactivity option 1: Step No 2 - RESULTS
- .cdt Files
- Name of Enzyme
- .gpr Files
- Name of Pathway
- .sif Files
- .gal Files
- Name of Disease
- List of Protein Identifiers
- .cel Files
Results
Boundary/limits
Interactivity Type Options
Drag each yellow button into one of the 3 Analysis Tools. The correct results are given in the next slide.
If the user drags a button into the right box, the animation should flash a Tick Sign. If the box is incorrect, flash a Cross Sign and ask the user to Try Again.
Drag and Drop.
43Questionnaire - 1
1. Which amongst these is not a feature of a Protein network?
a. Edges b. Nodes c. Metanodes d. Antinodes
2. What are the results of Gene Expression Analysis?
a. Heat Map b. Fold Change c. P-value d. All of the Above
3. Protein Pathways can be studied using?
a. Stand-alone tools b. Web-based tools c. Both d. None
44Questionnaire - 2
4. Which is a mandatory entry to study Protein Interaction Pathways?
a. Fold Change b. p-Value c. Unique Identifier like Accession Number d. All of the Above
5. In case of Gene Expression Data Analysis, Heat Map represents?
a. Significance of the Gene b. Fold Change c. p-value d. Gene Ontology
6. Which amongst these is a valid Microarray File Extension?
a. GAL b. GPR c. CEL d. All of the Above
45Links for further reading
- Books
  - Systems Biology: An Approach. P Kohl, EJ Crampin, TA Quinn and D Noble
  - An Introduction to Systems Biology: Design Principles of Biological Circuits, by Uri Alon. June 2006, Chapman & Hall/CRC, Taylor and Francis Group
  - Introduction to Systems Biology. Choi, Sangdun (California Institute of Technology). July 2007, Humana Press
- Research Papers
  - Visualizing biological pathways: requirements analysis, systems evaluation, and research agenda. Saraiya, P., North, C. & Duca, K. (2005).
  - Tools for visually exploring biological networks. Suderman, M. & Hallett, M. (2007).
  - A survey of visualization tools for biological network analysis. Pavlopoulos, G.A., Wegener, A.L. & Schneider, R. (2008).
  - Visualization of omics data for systems biology. Nils Gehlenborg, Seán I O'Donoghue, Nitin S Baliga, Alexander Goesmann, Matthew A Hibbs, Hiroaki Kitano, Oliver Kohlbacher, Heiko Neuweger, Reinhard Schneider, Dan Tenenbaum & Anne-Claude Gavin. Nature Methods (2010)
46Links for further reading
- Webliography
  - http://www.genome.jp/kegg/
  - http://www.chem.agilent.com/Library/usermanuals/Public/GeneSpring-manual.pdf
  - http://www.moleculardevices.com/pages/software/gn_genepix_pro.html
  - http://www.cytoscape.org/
  - http://www.ingenuity.com/
  - http://www.genego.com/metacore.php
  - http://www.ece.cmu.edu/brunos/Lecture3.pdf
  - http://pathways.embl.de/
  - http://www.biocyc.org/
  - http://www.arena3d.org/
  - http://spotfire.tibco.com/
  - http://www.bioconductor.org/
  - http://www.chem.agilent.com/en-US/Products/software/lifesciencesinformatics/genespringgx/pages/gp34727.aspx
  - http://www.cytoscape.org/download.php
47Links for further reading
- The following sources are used for the animations
  - http://www.genome.jp/kegg/
  - Biochemistry by A.L. Lehninger et al., 3rd edition
  - http://www.ingenuity.com/
  - http://www.cytoscape.org/
  - http://www.genome.jp/dbget-bin/www_bget?R00960RP00303RC00078
  - http://www.genego.com/metacore.php
  - http://www.ece.cmu.edu/brunos/Lecture3.pdf
  - http://pathways.embl.de/
  - http://www.chem.agilent.com/Library/usermanuals/Public/GeneSpring-manual.pdf
  - http://www.moleculardevices.com/pages/software/gn_genepix_pro.html