Title: Considering protein interaction sites in co-expression networks and a tool called Intersite
1. Considering protein interaction sites in co-expression networks and a tool called Intersite
- To analyze a co-expression network of Affy probes, the structure of each probe's putative protein product was considered via its number of interaction sites. A web-based tool called Intersite was developed as a Group Decision Support System for annotating interaction sites on proteins.
2. Background
- Understanding Protein Function on a Genome Scale Using Networks: talk by Dr. Mark Gerstein, Yale University, at the First Annual Midwest Computational Biology and Bioinformatics Symposium, Northwestern University, 9/2007
- Problems with using pathway node degree (hub-ness) as an indicator of essentiality
- Need to consider interactability
- via the number of interaction sites on a protein product
- Showed a stronger correlation between essentiality (slower evolutionary rate) and network betweenness than between essentiality and node degree
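To make the degree-versus-betweenness contrast concrete, here is a minimal sketch on a hypothetical toy graph (not the networks from the talk or from this project). Degree counts neighbours; betweenness counts how many shortest paths pass through a node, computed here with Brandes' algorithm for an unweighted, undirected graph. The graph is two 4-node cliques joined through a bridge node `b`. The node names and helper functions are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Node degree (hub-ness): number of neighbours."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Brandes betweenness for an unweighted, undirected graph (unnormalised)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma, dist = {v: 0 for v in adj}, {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each end.
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical toy graph: two cliques bridged by node "b".
clique_a = ["a1", "a2", "a3", "a4"]
clique_c = ["c1", "c2", "c3", "c4"]
edges = list(combinations(clique_a, 2)) + list(combinations(clique_c, 2))
edges += [("a1", "b"), ("b", "c1")]
adj = build_adj(edges)
deg, bc = degree(adj), betweenness(adj)
print(deg["a1"], deg["b"])  # 4 2
print(bc["a1"], bc["b"])    # 15.0 16.0
```

The low-degree bridge `b` has the highest betweenness, which is the kind of node that degree alone would overlook.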
3. Background
- Yeates, Todd O. and Beeby, Morgan.
- Proteins in a Small World.
- Science, 22 Dec 2006, Vol. 314, No. 5807, pp. 1882-1883.
4. Background
- Kim PM, Lu LJ, Xia Y, Gerstein MB.
- Relating three-dimensional structures to protein networks provides evolutionary insights.
- Science. 2006 Dec 22;314(5807):1938-41.
Fig. 1. The creation of the structural
interaction network (SIN) data set. All
interactions from the filtered protein
interaction data set are mapped to Pfam domains
(30). The Pfam domains are mapped to known
structures of protein interactions by means of
iPfam (31). Only those interactions in which both
interaction partners (or a homologous domain of
either) can be found in a 3D structure of a
protein complex are kept. All interactions are
then classified into mutually exclusive and
simultaneously possible by 3D structural
exclusion. When a protein has more than one
simultaneously possible interaction, the number
of interaction interfaces is counted.
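The last step of the caption (counting interfaces once interactions are classified as mutually exclusive or simultaneously possible) can be sketched as below. This is a simplification, not the structural test of Kim et al.: it assumes each interaction is already labeled with the interface (e.g. a Pfam domain) that the protein uses, and treats two interactions on the same label as mutually exclusive. The function name and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def interface_summary(interactions):
    """interactions: iterable of (protein, partner, interface) tuples.

    Simplifying assumption: interactions of a protein that share an interface
    label are mutually exclusive; those on different labels can co-occur.
    Returns protein -> (degree, number of interfaces, mutually exclusive pairs).
    """
    by_interface = defaultdict(lambda: defaultdict(set))
    for protein, partner, interface in interactions:
        by_interface[protein][interface].add(partner)

    summary = {}
    for protein, groups in by_interface.items():
        partners = set().union(*groups.values())
        exclusive = sorted(
            pair
            for members in groups.values()
            for pair in combinations(sorted(members), 2)
        )
        summary[protein] = (len(partners), len(groups), exclusive)
    return summary


# Hypothetical toy data: P1 binds P2 and P3 via "dA", and P4 via "dB".
toy = [("P1", "P2", "dA"), ("P1", "P3", "dA"), ("P1", "P4", "dB")]
print(interface_summary(toy)["P1"])  # (3, 2, [('P2', 'P3')])
```

Here P1 has degree 3 but only 2 interfaces, which is the distinction between hub-ness and interact-ability drawn earlier.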
5. The Idea
- Facilitate the same analysis for other networks
- i.e., co-expression networks
- Incorporate human decision making
- Incorporate genomic/primary sequence data
6. A co-expression network
- Cramer GR, Ergül A, Grimplet J, Tillett RL, Tattersall EA, Bohlman MC, Vincent D, Sonderegger J, Evans J, Osborne C, Quilici D, Schlauch KA, Schooley DA, Cushman JC.
- "Water and salinity stress in grapevines: early and late changes in transcript and metabolite profiles."
- Funct Integr Genomics. 2007 Apr;7(2):111-34. Epub 2006 Nov 29.
- Data available as experiment VV2 on plexdb.org
- 4 time points
- × 3 treatments
- × 3 replicates
- + 3 controls at time 0
- = 39 Affy hybridizations with RMA normalization
- × 16,602 Affy grape probes
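As a quick check of the design arithmetic on this slide: 4 time points × 3 treatments × 3 replicates gives 36 hybridizations, and the 3 time-0 controls bring the total to 39. The sketch below also computes the size of the expression matrix and the number of probe pairs that a full zero-order correlation network has to consider. The constant names are illustrative only.

```python
# Counts taken from the slide; the variable names are illustrative.
TIME_POINTS = 4
TREATMENTS = 3
REPLICATES = 3
CONTROLS_AT_T0 = 3
N_PROBES = 16_602

n_hybridizations = TIME_POINTS * TREATMENTS * REPLICATES + CONTROLS_AT_T0
n_values = n_hybridizations * N_PROBES           # expression matrix entries
n_probe_pairs = N_PROBES * (N_PROBES - 1) // 2   # candidate network edges

print(n_hybridizations)  # 39
print(n_values)          # 647478
print(n_probe_pairs)     # 137804901
```

The pair count is why an edge cutoff and a thinning step are needed before the network is usable.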
7. A co-expression network
- A program was written that, among other things:
- Built a correlation network, using an arbitrarily set correlation cutoff for edge assignment
- Thinned the network by dropping edges associated with first- and second-order partial correlations not significantly different from zero
- according to de la Fuente, et al. Bioinformatics. Vol. 20, No. 18. Pp. 3565-3574, 2004.
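For reference, the standard recursive formulas for first- and second-order partial correlations are shown below, together with Fisher's z-transform, which is one common way to test whether a partial correlation differs from zero. The slides do not state which significance test or level the program used, so the test and the symbols α and k here are assumptions. Here n is the number of samples (39 hybridizations in VV2) and k is the order (1 or 2). An edge x-y is dropped when, for some other gene z (or pair z, w), the partial correlation is not significant.

```latex
\begin{aligned}
r_{xy\cdot z} &= \frac{r_{xy} - r_{xz}\,r_{yz}}
                      {\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}} \\[4pt]
r_{xy\cdot zw} &= \frac{r_{xy\cdot z} - r_{xw\cdot z}\,r_{yw\cdot z}}
                       {\sqrt{\left(1 - r_{xw\cdot z}^{2}\right)\left(1 - r_{yw\cdot z}^{2}\right)}} \\[4pt]
Z &= \tfrac{1}{2}\ln\frac{1 + r}{1 - r}, \qquad
\text{drop the edge if } \sqrt{n - k - 3}\,\lvert Z \rvert < z_{1-\alpha/2}
\end{aligned}
```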
8. A co-expression network
9. A co-expression network
- Zero-order Pearson correlation networks were constructed with the following characteristics:
10. A co-expression network
- Related networks were created by removing edges associated with a first-order partial correlation not significantly different from zero.
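The thinning step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the program used for VV2. The |r| cutoff of 0.8 and α = 0.05 are hypothetical defaults, and significance is judged with a Fisher z-test. An edge is removed if conditioning on any single other gene makes its partial correlation non-significant. Second-order thinning adds one more level of conditioning in the same way. The toy profiles are synthetic: A drives both B and C, so the B-C correlation should disappear once A is controlled for.

```python
import math
from itertools import combinations
from statistics import NormalDist


def pearson(x, y):
    """Zero-order Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_first_order(r, x, y, z):
    """r_xy.z computed from the zero-order correlations stored in r."""
    num = r[x][y] - r[x][z] * r[y][z]
    den = math.sqrt((1 - r[x][z] ** 2) * (1 - r[y][z] ** 2))
    return num / den


def not_significant(rho, n_samples, order, alpha):
    """Fisher z-test: True if rho is not significantly different from zero."""
    rho = max(min(rho, 0.999999), -0.999999)  # keep atanh finite
    stat = math.sqrt(n_samples - order - 3) * abs(math.atanh(rho))
    return stat < NormalDist().inv_cdf(1 - alpha / 2)


def thinned_network(profiles, cutoff=0.8, alpha=0.05):
    """Return (zero-order edges, edges surviving first-order thinning)."""
    genes = sorted(profiles)
    n = len(profiles[genes[0]])
    r = {g: {} for g in genes}
    for g, h in combinations(genes, 2):
        r[g][h] = r[h][g] = pearson(profiles[g], profiles[h])
    # Zero-order network: an edge wherever |r| reaches the (arbitrary) cutoff.
    edges = {(g, h) for g, h in combinations(genes, 2) if abs(r[g][h]) >= cutoff}
    kept = set()
    for g, h in edges:
        others = [z for z in genes if z not in (g, h)]
        # Drop the edge if conditioning on ANY single other gene explains it away.
        if not any(not_significant(partial_first_order(r, g, h, z), n, 1, alpha)
                   for z in others):
            kept.add((g, h))
    return edges, kept


def walsh(k, n=16):
    """Hypothetical toy signal: mutually orthogonal, zero-mean +/-1 patterns."""
    return [(-1) ** bin(k & j).count("1") for j in range(n)]


a, u, v = walsh(1), walsh(2), walsh(3)
profiles = {
    "A": [5 + x for x in a],
    "B": [5 + x + 0.3 * y for x, y in zip(a, u)],
    "C": [5 + x + 0.3 * y for x, y in zip(a, v)],
}
edges, kept = thinned_network(profiles)
print(sorted(edges))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(sorted(kept))   # [('A', 'B'), ('A', 'C')]
```

Testing every conditioning gene is quadratic per edge, so on 16,602 probes it is only practical after the zero-order cutoff has removed most pairs.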
11. Probe ID -> a protein product
- Map file from Wengang Zhou
- Difficult problem!
- Grape Probe ID -> AT Gene Name -> UniProt AC (the mappings are labeled one-to-many and many-to-many)
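Chaining one-to-many and many-to-many mappings is what makes this step difficult: a single probe can fan out to several UniProt ACs, and a single AC can be reached from several probes. A minimal sketch of composing and inverting such mappings follows. The map file's real format is not shown on the slides, so dict-of-sets inputs and all IDs below are hypothetical placeholders.

```python
from collections import defaultdict


def compose(first, second):
    """Chain two ID mappings (each: ID -> set of IDs) into one."""
    out = defaultdict(set)
    for key, middles in first.items():
        for middle in middles:
            out[key] |= second.get(middle, set())
    return dict(out)


def invert(mapping):
    """Reverse a mapping, e.g. UniProt AC -> set of probe IDs."""
    inv = defaultdict(set)
    for key, values in mapping.items():
        for value in values:
            inv[value].add(key)
    return dict(inv)


# Hypothetical placeholder IDs, not real grape probes, AT genes or accessions.
probe_to_at = {"probe_1": {"AT_a", "AT_b"}, "probe_2": {"AT_b"}}
at_to_uniprot = {"AT_a": {"UP_1"}, "AT_b": {"UP_1", "UP_2"}}

probe_to_uniprot = compose(probe_to_at, at_to_uniprot)
print(sorted(probe_to_uniprot["probe_1"]))       # ['UP_1', 'UP_2']
print(sorted(invert(probe_to_uniprot)["UP_1"]))  # ['probe_1', 'probe_2']
```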
12. Interactions?
- Kim et al. used
- Pfam
- Protein information based on a given network
- Finn, et al. Pfam: clans, web tools and services. Nucleic Acids Research. 34 (Database issue), 2006.
- iPfam
- A database within a database, containing pairwise interaction information between proteins and even between their respective residues.
- Finn, Marshall, Bateman. iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions. Bioinformatics. 21(3), 2005.
- But
13. I'm on my own
14. Incorporate human decision making
- Group Decision Support Systems
- Electronically integrated decision support conducted by and for many people at once.
- Intersite
- A Group Decision Support System for annotating and classifying proteins by the number and types of their interaction sites
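A group decision support system needs some rule for turning many annotators' votes into one call per protein. The slides do not say which rule Intersite uses. As a hypothetical sketch, a plurality vote on the number of interaction sites, reported together with the fraction of annotators who agree, could look like this. The tie-breaking rule and the toy votes are assumptions.

```python
from collections import Counter


def consensus(votes):
    """votes: protein -> list of site counts, one per annotator.

    Returns protein -> (most common count, fraction of annotators agreeing).
    Ties go to the smaller count, an arbitrary choice for this sketch.
    """
    result = {}
    for protein, counts in votes.items():
        if not counts:
            continue  # nobody has voted on this protein yet
        tally = Counter(counts)
        top = max(tally.values())
        winner = min(c for c, n in tally.items() if n == top)
        result[protein] = (winner, top / len(counts))
    return result


# Hypothetical votes: three annotators on UP_1, two on UP_2, none on UP_3.
votes = {"UP_1": [2, 2, 3], "UP_2": [1, 3], "UP_3": []}
print(consensus(votes))
```

Reporting the agreement fraction alongside the winning count lets low-consensus proteins be flagged for further discussion rather than silently accepted.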
15. Intersite Database Schema
16. Intersite Data Flow Diagram
- Login -> Enter UniProt ACs
17. Fetch and store GenBank data
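NCBI protein (GenPept) flat files annotate putative interaction sites as `Site` features in the FEATURES table, for example a `/note` such as "dimer interface". As a minimal stdlib sketch of pulling those features out of a stored GenBank file, the parser below relies on the table's fixed columns: keys start in column 6, and locations and qualifiers start in column 22. Downloading the record is omitted, and Biopython's `SeqIO` would be the usual choice in practice. The toy record, the keyword list and the helper names are hypothetical, and Intersite's own parsing is not described on the slides.

```python
def parse_features(text):
    """Split a GenBank/GenPept FEATURES table into (key, location, qualifiers)."""
    features, in_table = [], False
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        if not line.startswith(" "):  # e.g. ORIGIN ends the table
            break
        key, rest = line[5:21].strip(), line[21:].strip()
        if key:                        # a new feature begins
            features.append([key, rest, []])
        elif rest.startswith("/"):     # a new qualifier
            features[-1][2].append(rest)
        elif features[-1][2]:          # wrapped qualifier value
            features[-1][2][-1] += " " + rest
        else:                          # wrapped location
            features[-1][1] += rest
    return [tuple(f) for f in features]


# Hypothetical keyword list; not taken from Intersite.
INTERACTION_TERMS = ("interface", "binding", "interaction")


def interaction_sites(features):
    """Keep 'Site' features whose qualifiers mention an interaction-like term."""
    return [f for f in features
            if f[0] == "Site"
            and any(t in q.lower() for q in f[2] for t in INTERACTION_TERMS)]


def row(key, value):   # feature line: key in column 6, value in column 22
    return f"{'':5}{key:<16}{value}"


def qual(value):       # qualifier or continuation line, starting in column 22
    return f"{'':21}{value}"


TOY_RECORD = "\n".join([
    "LOCUS       TOY_0001   120 aa",
    "FEATURES             Location/Qualifiers",
    row("source", "1..120"),
    qual('/organism="hypothetical"'),
    row("Site", "order(10,12..14)"),
    qual('/site_type="other"'),
    qual('/note="dimer'),
    qual('interface"'),
    row("Site", "45"),
    qual('/site_type="phosphorylation"'),
    "ORIGIN",
])

sites = interaction_sites(parse_features(TOY_RECORD))
print(len(sites), sites[0][1])  # 1 order(10,12..14)
```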
18. Incorporate human decision making
19. Logged in
20. View Proteins
21. Map External IDs
22. Application to smallest VV2 network
23. Incorporate genomic data: Site Similarity Network
- Site similarity is also used to assist decision making
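The slides do not say how site similarity is scored, so the sketch below uses a swapped-in measure: Python's `difflib.SequenceMatcher` ratio on interaction-site peptide strings. Two proteins are linked when their most similar pair of sites reaches a hypothetical threshold of 0.8, and the resulting clusters are read off as connected components. A real analysis would more likely use alignment scores. The peptides and protein IDs are hypothetical placeholders.

```python
from difflib import SequenceMatcher
from itertools import combinations


def site_similarity_network(sites, threshold=0.8):
    """sites: protein -> list of site peptide strings.

    Link two proteins if their best-matching pair of sites has a
    SequenceMatcher ratio >= threshold (a stand-in similarity measure).
    """
    edges = {}
    for p, q in combinations(sorted(sites), 2):
        best = max((SequenceMatcher(None, a, b).ratio()
                    for a in sites[p] for b in sites[q]), default=0.0)
        if best >= threshold:
            edges[(p, q)] = best
    return edges


def clusters(nodes, edges):
    """Connected components of the similarity network."""
    adj = {n: set() for n in nodes}
    for p, q in edges:
        adj[p].add(q)
        adj[q].add(p)
    seen, groups = set(), []
    for n in sorted(nodes):
        if n in seen:
            continue
        stack, group = [n], set()
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                group.add(v)
                stack.extend(adj[v] - seen)
        groups.append(group)
    return groups


# Hypothetical site peptides.
sites = {"UP_1": ["PKKKRKV"], "UP_2": ["PKKKRKI"], "UP_3": ["MLLAVLA"]}
edges = site_similarity_network(sites)
print({k: round(s, 3) for k, s in edges.items()})  # {('UP_1', 'UP_2'): 0.857}
print(clusters(sites, edges))
```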
24. Annotate/Classify
25. The GenBank file doesn't know it all
26. User-defined sites
27. Try it
- Go to http://lab.bcb.iastate.edu/sandbox/jlv/intersite/
- Log in
- Use the file linked with VV2 UniProt ACs
- Paste in the top 20 ACs
- Put on your biochemistry hat and vote!
- Use the tools to help make a decision
- EBI fetch
- PIRSF scan
- UniProt record
- GenBank file
28-29. Results: Number of Interactions (figures)
30-36. Results: Site similarity network (figures)
37. Discussion
- This project was an interesting exercise in using multiple bioinformatics tools (and creating some new ones) to ask and answer questions relating sequence and structure. Future work may include a separate analysis of the groups of VV probes, split by the number of interaction sites on their associated proteins. Also, as in [1], the betweenness, closeness, hub-ness, and hierarchy level of the proteins in the pathways can be calculated.
- A useful outcome of the project is the ability to build protein networks based on interaction site similarity and to visualize the resulting clusters. For the data used here, the nuclear, membrane, and secreted proteins each share interaction site sequences that are unique to their group. This is probably because nuclear proteins must be delivered to the nucleus using signal sequences and special chaperone interaction sites; the same is true for secreted and membrane proteins. Further analysis could consider the remaining locations listed for proteins with multiple subcellular location values.
- Availability
- Intersite is available online at http://lab.bcb.iastate.edu/sandbox/jlv/intersite. More details on this project and all the scripts used can be found at http://www.public.iastate.edu/jlv/intersite.shtml
38. References
- [1] Gerstein, Mark. Presentation: Understanding Protein Function on a Genome Scale Using Networks. First Annual Midwest Computational Biology and Bioinformatics Symposium, Northwestern University, 9/2007.
- [2] Kim PM, Lu LJ, Xia Y, Gerstein MB. Relating three-dimensional structures to protein networks provides evolutionary insights. Science. 2006 Dec 22;314(5807):1938-41.
- [3] Finn RD, Marshall M, Bateman A. iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions. Bioinformatics. 2005;21(3).
- [4] Finn RD, et al. Pfam: clans, web tools and services. Nucleic Acids Research. 2006;34(Database issue).
- [5] Cramer GR, Ergül A, Grimplet J, Tillett RL, Tattersall EA, Bohlman MC, Vincent D, Sonderegger J, Evans J, Osborne C, Quilici D, Schlauch KA, Schooley DA, Cushman JC. Water and salinity stress in grapevines: early and late changes in transcript and metabolite profiles. Funct Integr Genomics. 2007 Apr;7(2):111-34. Epub 2006 Nov 29.
- [6] de la Fuente A, et al. Bioinformatics. 2004;20(18):3565-3574.
- [8] Wu CH, et al. The iProClass integrated database for protein functional analysis. Computational Biology and Chemistry. 2004;28:87-96.
- [9] Liu H, et al. BioThesaurus: a web-based thesaurus of protein and gene names. Bioinformatics. 2006;22(1):103-105.