Title: GO based data analysis
1. GO based data analysis
Iowa State Workshop 11 June 2009
2.
- All tools and materials from this workshop are available online at the AgBase database Educational Resources link.
- For continuing support and assistance, please contact agbase_at_cse.msstate.edu
This workshop is supported by USDA CSREES grant
number MISV-329140.
3. AgBase protein annotation process
Protein identifiers or FASTA format
GORetriever
Annotated Proteins
Proteins with no annotations
4. Hypothesis generating
- Gene Ontology enrichment analysis
  - GO terms that are statistically (Fisher's exact test) over- or underrepresented in a set of genes
- Annotation clustering
  - Group similar annotations based on the hypothesis that they should have similar gene members
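A minimal sketch of the enrichment test named above, assuming a one-sided Fisher's exact test computed as a hypergeometric tail. The function names and the example counts are hypothetical and are not taken from any of the tools in this workshop.

```python
from math import comb

def overrepresentation_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in the study set.

    N: genes in the background set
    K: background genes annotated to the GO term
    n: genes in the study set
    k: study-set genes annotated to the GO term
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total

def underrepresentation_p(k, n, K, N):
    """P(X <= k): chance of seeing at most k annotated genes in the study set."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1))
    return tail / total

# Hypothetical counts: 1000 background genes, 40 annotated to the term,
# 60 genes in the study set, 8 of them annotated to the term.
p_over = overrepresentation_p(8, 60, 40, 1000)
print(f"over-representation p-value: {p_over:.3g}")
```

In practice the p-values from many GO terms would also need a multiple-testing correction, which this sketch leaves out.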
5. Some resources
- DAVID: http://david.abcc.ncifcrf.gov/
- GOStat: http://gostat.wehi.edu.au/
- EasyGO: http://bioinformatics.cau.edu.cn/easygo/
- AmiGO: http://amigo.geneontology.org/cgi-bin/amigo/term_enrichment (does not use IEA)
- Onto-Express, OE2GO: http://vortex.cs.wayne.edu/projects.htm
- GOEAST: http://omicslab.genetics.ac.cn/GOEAST
- http://www.geneontology.org/GO.tools.shtml
- Comparison of enrichment analysis tools: Nucleic Acids Research, 2009, Vol. 37, No. 1, 1-13 (Tool_Comparison_09.pdf)
DAVID and EasyGO analysis included: DAVIDEasyGo.ppt
6. Database for Annotation, Visualization and Integrated Discovery (DAVID)
9. http://vortex.cs.wayne.edu/ontoexpress
Onto-Express analysis instructions are available in onto-express.ppt
10. Species represented in Onto-Express
11. For uploading your own annotations, use OE2GO
12. Comparison
- Onto-Express, EasyGO, GOstat and DAVID
- Test set: 60 randomly selected chicken genes
- Used AgBase GO annotations as baseline annotations
Vandenberg et al. (BMC Bioinformatics, in review)
14. Networks & Pathways
15. Multiple data analysis platforms
Proteomics
LIST
Transcriptomics
ESTs
16. Our original aim: understand biological phenomena.
- Bits and pieces of information
- Do not have the full picture
- How do we get back to BIOLOGY in this digital information landscape?
17. What do we know about biological systems?
- Biological systems are dynamic, not static
- How molecules interact is key to understanding complex systems
18. Types of interactions
- protein (enzyme) - metabolite (ligand)
  - metabolic pathways
- protein - protein
  - cell signaling pathways, protein complexes
- protein - gene
  - genetic networks
19. STRING Database
Sod1, Mus musculus
http://string.embl.de/
21.
Database            URL/FTP
DIP                 http://dip.doe-mbi.ucla.edu
BIND                http://bind.ca
MPact/MIPS          http://mips.gsf.de/services/ppi
STRING              http://string.embl.de
MINT                http://mint.bio.uniroma2.it/mint
IntAct              http://www.ebi.ac.uk/intact
BioGRID             http://www.thebiogrid.org
HPRD                http://www.hprd.org
ProtCom             http://www.ces.clemson.edu/compbio/ProtCom
3did, Interprets    http://gatealoy.pcb.ub.es/3did/
Pibase, Modbase     http://alto.compbio.ucsf.edu/pibase
CBM                 ftp://ftp.ncbi.nlm.nih.gov/pub/cbm
SCOPPI              http://www.scoppi.org/
iPfam               http://www.sanger.ac.uk/Software/Pfam/iPfam
InterDom            http://interdom.lit.org.sg
DIMA                http://mips.gsf.de/genre/proj/dima/index.html
Prolinks            http://prolinks.doe-mbi.ucla.edu/cgibin/functionator/pronav/
Predictome          http://predictome.bu.edu/
PLoS Computational Biology, March 2007, Volume 3, e42
22. Pathways & Networks
- A network is a collection of interactions
- Pathways are a subset of networks
  - Network of interacting proteins that carry out biological functions such as metabolism and signal transduction
- All pathways are networks of interactions
- NOT ALL NETWORKS ARE PATHWAYS
23. Biological Networks
- Networks are often represented as graphs
- Nodes represent proteins or genes that code for proteins
- Edges represent the functional links between nodes (e.g. regulation)
- Small changes in a graph's topology/architecture can result in the emergence of novel properties
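The graph representation described above can be sketched as an undirected adjacency map, with node degree as one simple topological property. The protein names P1 to P5 and the edge list are made up for illustration.

```python
from collections import defaultdict

def build_network(edges):
    """Build an undirected graph: each node maps to the set of its neighbors."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def degrees(adjacency):
    """Number of interaction partners (edges) for each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}

# Hypothetical interactions: nodes are proteins, edges are functional links.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]
network = build_network(edges)
node_degree = degrees(network)

# The most connected node is a candidate "hub" of this small network.
hub = max(node_degree, key=node_degree.get)
print(hub, node_degree[hub])
```

Removing or adding a single edge in `edges` changes the degree distribution, which is one concrete way to see how small topology changes alter network properties.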
24. Yeast Protein-Protein Interaction Map
H. Jeong et al., Nature 411, 2001
25. Some resources
- KEGG: http://www.genome.jp/kegg/pathway.html/
- BioCyc: http://www.biocyc.org/
- Reactome: http://www.reactome.org/
- GenMAPP: http://www.genmapp.org/
- BioCarta: http://www.biocarta.com/
- Pathguide, the pathway resource list: http://www.pathguide.org/
27. Pathguide Statistics
Gallus gallus is missing
28. Reactome
29. What is feasible with my specific dataset?
30. Systems Biology Workflow
Nanduri & McCarthy, CAB Reviews, 2008
31. Systems Biology Workflow
For a given species of interest, what type of data is available?
32. Retrieval of interaction datasets
- Evaluate PPI resources such as Predictome and Prolinks for the existence of the species of interest
- If unavailable, find orthologous proteins in related species that have interactions
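The ortholog-based fallback above can be sketched as follows, assuming we already have a list of known interactions in a related species and an ortholog mapping into the species of interest. All identifiers and the function name are hypothetical, and a transferred interaction is only a prediction that still needs evaluation.

```python
def transfer_interactions(known_interactions, ortholog_map):
    """Predict interactions in the species of interest from a related species.

    known_interactions: pairs of protein IDs that interact in the related species
    ortholog_map: related-species protein ID -> list of orthologs in the species of interest
    """
    predicted = set()
    for a, b in known_interactions:
        for target_a in ortholog_map.get(a, ()):
            for target_b in ortholog_map.get(b, ()):
                # Store each pair in sorted order so (x, y) and (y, x) count once.
                predicted.add(tuple(sorted((target_a, target_b))))
    return predicted

# Hypothetical data: relC has no ortholog, so its interaction is not transferred.
related_species_ppi = [("relA", "relB"), ("relB", "relC")]
orthologs = {"relA": ["tgtA"], "relB": ["tgtB1", "tgtB2"]}

predicted_ppi = transfer_interactions(related_species_ppi, orthologs)
print(sorted(predicted_ppi))
```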
33. I have interactions, what next?
- Evaluate the quality of interactions, i.e. the type of method used for identification. What exactly are these methods?
34. I have interactions, what next?
- Evaluate the quality of interactions, i.e. the type of method used for identification. What exactly are these methods?
STRING Database
35. PPI Identification
Computational
- Phylogenetic profile
- Gene cluster
- Sequence coevolution
- Rosetta stone method
- Text mining
Experimental
- Yeast two hybrid (Y2H)
- TAP assays
- Gene coexpression
- Protein arrays
PLoS Computational Biology, March 2007, Volume 3, e42
36. PPI database comparisons
Proteins: Structure, Function, and Bioinformatics 63:490-500, 2006
37. I have interactions, what next?
- Evaluate the quality of interactions, i.e. the type of method used for identification. What exactly are these methods?
- Visualize these interactions as a network and analyze. What are the available tools?
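One simple way to act on the first bullet is to tally interactions by identification method and keep only those with experimental support before building the network. This is a sketch under assumptions: the record layout (protein A, protein B, method), the method labels, and the grouping into "experimental" are illustrative and do not follow any particular database's format.

```python
from collections import Counter

# Illustrative set of method labels treated as experimental (an assumption).
EXPERIMENTAL_METHODS = {"yeast two-hybrid", "TAP assay", "protein array"}

def summarize_methods(interactions):
    """Count how many interactions each identification method supports."""
    return Counter(method for _, _, method in interactions)

def keep_experimental(interactions):
    """Keep only interactions identified by an experimental method."""
    return [record for record in interactions if record[2] in EXPERIMENTAL_METHODS]

# Hypothetical interaction records: (protein A, protein B, method).
interactions = [
    ("P1", "P2", "yeast two-hybrid"),
    ("P1", "P3", "text mining"),
    ("P2", "P3", "TAP assay"),
    ("P3", "P4", "phylogenetic profile"),
]

method_counts = summarize_methods(interactions)
experimental_only = keep_experimental(interactions)
print(method_counts)
print(experimental_only)
```

The filtered list can then be passed to whichever network visualization or analysis tool is chosen.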