Web Services for N-Glycosylation Process

1
Web Services for N-Glycosylation Process
Satya S. Sahoo, Amit P. Sheth, William S. York,
John A. Miller
Presentation at International Symposium on Web
Services For Computational Biology and
Bioinformatics, VBI, Blacksburg, VA, May 26-27,
2005
Integrated Technology Resource for Biomedical
Glycomics NCRR/NIH
2
Glycomics
  • Study of the structure, function and quantity of
    complex carbohydrates synthesized by an organism
  • Glycosylation: the addition of carbohydrates to the
    basic protein structure

Folded protein structure (schematic)
3
Glycosylation: why is it important?
  • The genome (composed of DNA) and the proteome
    (proteins) are not the only factors in the life
    functions of an organism
  • Carbohydrates attached to different protein
    structures (by glycosylation) are important for:
    - Identification of foreign entities by immune
      system cells
    - Serving as markers to accurately diagnose diseases
    - Regulating signaling activities
  • Glycosylation is categorized by the way
    carbohydrates are attached to proteins, for
    example N-glycosylation

4
N-Glycosylation Process (NGP)
By N-Glycosylation Process (NGP), we mean the identification and
quantification of glycopeptides.
Workflow (numbers in parentheses give the count of fractions at each stage):
  Cell Culture
  → extract → Glycoprotein Fraction
  → proteolysis → Glycopeptides Fraction (1)
  → Separation technique I → Glycopeptides Fraction (n)
  → PNGase → Peptide Fraction (n)
  → Separation technique II → Peptide Fraction (n × m)
  → Mass spectrometry → ms data and ms/ms data
      ms data → Data reduction → ms peaklist → binning
        → N-dimensional array → Signal integration
      ms/ms data → Data reduction → ms/ms peaklist → Peptide identification
        → Peptide list → Data correlation
  → Glycopeptide identification and quantification
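One way to make the workflow above machine-checkable is to record each step together with the steps whose output it consumes, and let a topological sort produce a valid execution order. This is a sketch under stated assumptions: it needs Python 3.9+ for `graphlib`, the step names follow the slide, the dependency edges are one reading of the flowchart, and `peptide_fractions` is a hypothetical helper for the 1, n and nm fraction labels.

```python
from graphlib import TopologicalSorter

# Each NGP step mapped to the steps whose output it consumes.
# Step names follow the slide; the edges are an assumed reading of the flowchart.
NGP_STEPS = {
    "extract": {"cell culture"},
    "proteolysis": {"extract"},
    "separation technique I": {"proteolysis"},
    "PNGase": {"separation technique I"},
    "separation technique II": {"PNGase"},
    "mass spectrometry": {"separation technique II"},
    "data reduction (ms)": {"mass spectrometry"},
    "data reduction (ms/ms)": {"mass spectrometry"},
    "binning": {"data reduction (ms)"},
    "signal integration": {"binning"},
    "peptide identification": {"data reduction (ms/ms)"},
    "data correlation": {"peptide identification"},
    "glycopeptide identification and quantification": {
        "signal integration",
        "data correlation",
    },
}


def execution_order(steps):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(steps).static_order())


def peptide_fractions(n, m):
    """Hypothetical helper: separation I gives n fractions and separation II
    splits each of them m ways, so n * m fractions reach the mass spectrometer."""
    return n * m


if __name__ == "__main__":
    for step in execution_order(NGP_STEPS):
        print(step)
    print(peptide_fractions(4, 3))  # 12
```

The two mass-spectrometry branches are independent until the final step, so a scheduler built this way could run them in parallel.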
5
NGP: part of the Bioinformatics core, Integrated
Technology Resource for Biomedical Glycomics
  • This Resource was established by the National
    Center for Research Resources (NCRR)
  • The aim is to develop tools and technology to
    analyze glycoprotein and glycolipid expression in
    embryonic stem cells
  • Our research provides bioinformatics support for
    four research groups:
    - Embryonic Stem Cell Culture Program
    - Glycomic Analysis of Glycoproteins
    - Glycomic Analyses of Glycosphingolipids and
      Sphingolipids
    - Transcript analysis by kinetic RT-PCR

6
The need for NGP in Glycomics
  • Unlike proteomics or genomics, high-throughput
    experimental protocols are still being
    established in Glycomics
  • NGP involves a multitude of heterogeneous tasks,
    including human-mediated tasks
  • NGP attempts to encapsulate particular
    computational steps as platform-independent,
    scalable and Web-accessible tools: Web Services
  • This enables glycobiologists to integrate
    automated data generation tasks with data
    processing tools (Web Services) across the
    end-to-end experimental lifecycle

7
N-Glycosylation identification - Problems
  • Extremely difficult to identify glycosylated
    peptide sequences using standard analytical
    methods
  • N-glycosylation occurs at particular sites on the
    protein structure: consensus sequences

An example glycopeptide (schematic). Labels: Peptide; Consensus Sequence
N-X-S/T; Asparagine (N); Aspartate (D); J; Glycan; PNGase F.
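The consensus-sequence rule and the effect of PNGase F can be sketched in a few lines. This is an illustration, not NGP code: it uses the standard N-X-S/T sequon definition, in which X may be any residue except proline (a detail the slide does not show), and the monoisotopic mass gain of about 0.984 Da when PNGase F converts the glycosylated asparagine (N) into aspartate (D). The names `sequon_sites` and `mass_after_pngase_f` are hypothetical.

```python
import re

# Asparagine in an N-X-S/T sequon, X != P. The lookahead keeps only the N in
# the match, so overlapping sequons such as "NNST" are all reported.
SEQUON_N = re.compile(r"N(?=[^P][ST])")

# Monoisotopic mass difference Asp - Asn (deamidation), in daltons.
N_TO_D_SHIFT = 0.984016


def sequon_sites(peptide):
    """Return 0-based positions of asparagines that start a sequon."""
    return [match.start() for match in SEQUON_N.finditer(peptide)]


def mass_after_pngase_f(peptide_mass, glycosylated_sites):
    """Mass of the unmodified peptide after each glycosylated N becomes D."""
    return peptide_mass + glycosylated_sites * N_TO_D_SHIFT


print(sequon_sites("LNVTKNPSR"))  # [1]; the N followed by P is skipped
print(sequon_sites("NNST"))       # [0, 1]
```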
8
NGP - implementation
  • NGP currently implements a Web Process
    composed of two Web Services
  • The DB Modifier Web Service modifies the search
    database by replacing N (in consensus sequences)
    with J
  • The Collator Web Service identifies probable
    N-glycosylated peptides using three parameters:
    - Calculated molecular mass
    - Presence of J in a peptide sequence
    - MASCOT score assigned to a hit
  • NGP also involves a proprietary mass spectrometry
    search engine service (MASCOT) as an
    intermediate task
  • Hence, the NGP Web Process identifies probable
    glycosylated peptides, enabling rapid processing
    of data from high-throughput experiments

http://www.matrixscience.com/
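The two NGP services can be sketched as plain functions. This is a hypothetical illustration rather than the NGP implementation: `modify_db` assumes the N-X-S/T sequon definition with X not equal to proline, and `is_probable_glycopeptide` combines the three Collator parameters with made-up thresholds (`mass_tol`, `min_score`), since the slide does not say how they are weighed. How the search engine is configured to accept the J residue is outside this sketch.

```python
import re

# Asparagine in an N-X-S/T consensus sequence (X != P is an assumption).
SEQUON_N = re.compile(r"N(?=[^P][ST])")


def modify_db(protein_sequence):
    """DB Modifier sketch: replace each sequon N with J."""
    return SEQUON_N.sub("J", protein_sequence)


def is_probable_glycopeptide(peptide, calc_mass, observed_mass, ions_score,
                             mass_tol=0.05, min_score=20.0):
    """Collator sketch: J present, calculated mass within mass_tol of the
    observed mass, and a MASCOT ions score of at least min_score.
    Both thresholds are hypothetical."""
    return ("J" in peptide
            and abs(calc_mass - observed_mass) <= mass_tol
            and ions_score >= min_score)


print(modify_db("MKNGSLLNPTNAT"))  # MKJGSLLNPTJAT
print(is_probable_glycopeptide("LJVTK", 587.30, 587.32, 25.0))  # True
```

Applying `modify_db` to whole protein sequences (not line-wrapped FASTA text) avoids missing sequons that straddle a line break.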
9
NGP Architecture (current)
  ms/ms raw data → PEAK LIST FILE → MASCOT Mass Spectrometer Search Engine
  Primary Sequence Database → ModifyDB Web Service → MASCOT Mass Spectrometer
    Search Engine
  MASCOT Mass Spectrometer Search Engine → MASCOT output file (contains both
    glycosylated and non-glycosylated peptide sequences) → Collator Web Service
    → Deglycosylated peptide list
10
NGP Results
q1_p1=-1
q2_p1=0,626.349945,-0.023321,2,APGVAGR,18,000000000,1.49,00020000000000000,0,0;"gi|51465537":0:190:196:1
q2_p2=1,626.361191,-0.034567,2,APARGR,18,00000000,1.33,00020000000000000,0,0;"gi|10140845":0:2:7:2
q2_p3=0,626.349945,-0.023321,2,APAVGGR,18,000000000,1.33,00020000000000000,0,0;"gi|51470766":0:212:218:1,"gi|51470768":0:212:218:1
q3_p3=0,634.368973,0.006151,4,DIIFK,12,0000000,25.26,00010020000000000,0,0;"gi|47078238":0:364:368:2,"gi|47078240":0:328:332:2
q3_p4=0,634.351227,0.023897,4,MPLFK,12,0000000,25.24,00010020000000000,0,0;"gi|41197108":0:95:99:1,"gi|4557311":0:1:5:2
q3_p5=0,634.343811,0.031313,3,NNLFK,12,0000000,15.34,00010020000000000,0,0;"gi|31377725":0:539:543:1
q3_p6=0,634.368973,0.006151,3,LDIFK,12,0000000,15.34,00010020000000000,0,0;"gi|39725634":0:891:895:1
q3_p7=0,634.343811,0.031313,3,NNIFK,12,0000000,15.34,00010020000000000,0,0;"gi|7661646":0:212:216:1
q3_p8=0,634.368973,0.006151,3,LDLFK,12,0000000,15.34,00010020000000000,0,0;"gi|51474898":0:237:241:1
q3_p9=0,634.368958,0.006166,3,EVIFK,12,0000000,13.61,00010020000000000,0,0;"gi|28376662":0:67:71:1
q3_p10=0,634.368958,0.006166,3,VELFK,12,0000000,13.61,00010020000000000,0,0;"gi|51467300":0:493:497:1,"gi|51467535":0:99:103:1
q4_p1=-1
q5_p1=0,662.375122,0.004702,5,DLLFR,14,0000000,18.41,00020020000000000,0,0;"gi|21536369":0:84:88:1,"gi|21536367":0:17:21:1,"gi|4557871":0:647:651:1
q5_p2=0,662.375122,0.004702,3,DLFLR,14,0000000,12.81,00010020000000000,0,0;"gi|33695153":0:407:411:1,"gi|4504043":0:330:334:1,"gi|11968045":0:6:10:1
q5_p3=0,662.375122,0.004702,3,DIFIR,14,0000000,12.81,00010020000000000,0,0;"gi|4505725":0:924:928:1,"gi|29788751":0:1170:1174:1
q5_p4=0,662.349960,0.029864,3,NNFIR,14,0000000,11.84,00010020000000000,0,0;"gi|24416002":0:667:671:1
q5_p5=0,662.375122,0.004702,4,IDLFR,14,0000000,9.98,00020020000000000,0,0;"gi|12957488":0:602:606:1,"gi|41148707":0:536:540:1,"gi|51464463":0:646:650:1
q5_p6=0,662.375122,0.004702,4,LDLFR,14,0000000,9.98,00020020000000000,0,0;"gi|42657517":0:335:339:1
q5_p7=0,662.375107,0.004717,4,VELFR,14,0000000,9.98,00020020000000000,0,0;"gi|6912230":0:436:440:1
q5_p8=0,662.375122,0.004702,4,LDIFR,14,0000000,9.98,00020020000000000,0,0;"gi|8922081":0:2699:2703:1
q5_p9=0,662.349960,0.029864,4,NLNFR,64,0000000,5.89,00010020000000000,0,0;"gi|19923416":0:816:820:1
q5_p10=1,662.361191,0.018633,2,NRFAR,14,0000000,3.37,00010020000000000,0,0;"gi|4758704":0:97:101:1
q6_p1=0,674.359863,-0.006639,4,VSDNIK,35,00000000,11.27,00010020000000000,0,0;"gi|32130516":0:935:940:1
q6_p2=0,674.323456,0.029768,5,EGDLGGK,21,000000000,7.97,00020020000000000,0,0;"gi|13569928":0:1058:1064:1
q6_p3=0,674.359848,-0.006624,5,EATVAGK,21,000000000,7.88,00020020000000000,0,0;"gi|51475822":0:527:533:1
q6_p4=1,674.389740,-0.036516,3,QRMLK,14,0000000,7.46,00020010000000000,0,0;"gi|24307905":0:467:471:2,"gi|24307905":0:638:642:2
q6_p5=0,674.359863,-0.006639,5,LSSSPGK,56,000000000,7.38,00000020000000000,0,0;"gi|8922075":0:806:812:1
q6_p6=0,674.338730,0.014494,4,WDLGGK,42,00000000,6.40,00010020000000000,0,0;"gi|13375817":0:123:128:1
q6_p7=0,674.359879,-0.006655,4,QATDLK,56,00000000,6.21,00020010000000000,0,0;"gi|21361684":0:451:456:1
q6_p8=1,674.371094,-0.017870,3,QTNKGK,14,00000000,6.03,00020010000000000,0,0;"gi|41117716":0:85:90:1
q6_p9=1,674.389740,-0.036516,6,QMRIK,28,0000000,5.77,00020020000000000,0,0;"gi|28329439":0:269:273:1,"gi|28558993":0:278:282:1
q6_p10=1,674.389740,-0.036516,6,QMRLK,28,0000000,5.77,00020020000000000,0,0;"gi|40255096":0:300:304:1
q7_p1=0,695.348969,0.007855,4,YDASLK,14,00000000,8.86,00020020000000000,0,0;"gi|4758454":0:2761:2766:1
  • A typical MASCOT output file is about 3 MB!
  • High-throughput experimental protocols generate
    thousands of such files, so manual identification
    is not feasible
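With thousands of files per experiment, the hit records have to be read programmatically. The sketch below parses one peptide-hit record of the kind shown in the excerpt, assuming the MASCOT result-file layout `qN_pM=missed_cleavages,calc_mass,delta,ions_matched,sequence,...,ions_score,...;"accession":frame:start:end:multiplicity,...`, with `qN_pM=-1` meaning no match. The field positions, the `PeptideHit` class and the `parse_hit` function are assumptions for illustration, and the sample record is one of the excerpt's hits written in that layout.

```python
from dataclasses import dataclass


@dataclass
class PeptideHit:
    query: int
    rank: int
    missed_cleavages: int
    calc_mass: float
    delta: float
    sequence: str
    ions_score: float
    accessions: list


def parse_hit(line):
    """Parse one 'qN_pM=...' record; return None when the hit is empty (=-1)."""
    key, value = line.strip().split("=", 1)
    query, rank = key.split("_")
    if value == "-1":
        return None
    # Peptide fields come before ';', protein matches after it.
    fields_part, _, proteins_part = value.partition(";")
    fields = fields_part.split(",")
    accessions = [entry.split(":")[0].strip('"')
                  for entry in proteins_part.split(",") if entry]
    return PeptideHit(
        query=int(query[1:]),   # "q2" -> 2
        rank=int(rank[1:]),     # "p3" -> 3
        missed_cleavages=int(fields[0]),
        calc_mass=float(fields[1]),
        delta=float(fields[2]),
        sequence=fields[4],
        ions_score=float(fields[7]),
        accessions=accessions,
    )


record = ('q2_p3=0,626.349945,-0.023321,2,APAVGGR,18,000000000,1.33,'
          '00020000000000000,0,0;"gi|51470766":0:212:218:1,'
          '"gi|51470768":0:212:218:1')
hit = parse_hit(record)
print(hit.sequence, hit.ions_score, hit.accessions)
```

A Collator-style filter can then work on `PeptideHit` objects instead of raw text.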

11
NGP Web Services: Adding Semantics
  • Two ontologies were developed as part of the
    NCRR-Glycomics project
  • GlycO: a domain Ontology embodying knowledge of
    the structure and metabolism of glycans
    - Contains 770 classes describing structural
      features of glycans
    - URL: http://lsdis.cs.uga.edu/projects/glycomics/glyco
  • ProPreO: a comprehensive process Ontology
    modeling experimental proteomics
    - Contains 296 classes
    - Models three phases of experimental proteomics:
      separation techniques, analytical techniques and
      data analysis
    - URL: http://lsdis.cs.uga.edu/projects/glycomics/propreo

http://pedro.man.ac.uk/uml.html (PEDRO UML schema)
12
ProPreO - Experimental Proteomics Process Ontology
  • ProPreO models the phases of a proteomics
    experiment using five fundamental concepts:
    - Data (example: a peaklist file from ms/ms raw
      data)
    - Data_processing_applications (example: MASCOT
      search engine)
    - Hardware: instrument types used in proteomics
      (example: ABI_Voyager_DE_Pro_MALDI_TOF)
    - Parameter_list: the different types of
      parameter lists associated with experimental
      phases
    - Task (example: component separation, used in
      chromatography)

13
Service description using WSDL-S
  • Formalize description and classification of Web
    Services using ProPreO concepts

WSDL ModifyDB:
    <?xml version="1.0" encoding="UTF-8"?>
    <wsdl:definitions targetNamespace="urn:ngp"
        .. xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <wsdl:types>
        <schema targetNamespace="urn:ngp"
            xmlns="http://www.w3.org/2001/XMLSchema">
          .. </complexType>
        </schema>
      </wsdl:types>
      <wsdl:message name="replaceCharacterRequest">
        <wsdl:part name="in0" type="soapenc:string"/>
        <wsdl:part name="in1" type="soapenc:string"/>
        <wsdl:part name="in2" type="soapenc:string"/>
      </wsdl:message>
      <wsdl:message name="replaceCharacterResponse">
        <wsdl:part name="replaceCharacterReturn" type="soapenc:string"/>
      </wsdl:message>

WSDL-S ModifyDB:
    <?xml version="1.0" encoding="UTF-8"?>
    <wsdl:definitions targetNamespace="urn:ngp"
        xmlns:wssem="http://www.ibm.com/xmlns/WebServices/WSSemantics"
        xmlns:ProPreO="http://lsdis.cs.uga.edu/ontologies/ProPreO.owl">
      <wsdl:types>
        <schema targetNamespace="urn:ngp"
            xmlns="http://www.w3.org/2001/XMLSchema">
          </complexType>
        </schema>
      </wsdl:types>
      <wsdl:message name="replaceCharacterRequest"
          wssem:modelReference="ProPreO#peptide_sequence">
        <wsdl:part name="in0" type="soapenc:string"/>
        <wsdl:part name="in1" type="soapenc:string"/>
        <wsdl:part name="in2" type="soapenc:string"/>
      </wsdl:message>

Description of a Web Service using Web Service Description Language.
Concepts defined in the ProPreO process Ontology: data, sequence, peptide_sequence.
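Because the semantic annotation is just a namespaced attribute, a client or registry can read it with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed, hypothetical WSDL-S document; the WSDL 1.1 namespace URI `http://schemas.xmlsoap.org/wsdl/` for the `wsdl` prefix and the exact `ProPreO#peptide_sequence` reference value are assumptions, since the slide elides the namespace declarations. `semantic_annotations` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: WSDL 1.1 (assumed) and the WSDL-S namespace shown on the slide.
NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "wssem": "http://www.ibm.com/xmlns/WebServices/WSSemantics",
}
MODEL_REF = "{%s}modelReference" % NS["wssem"]

# Hypothetical, trimmed-down WSDL-S document modeled on the ModifyDB example.
WSDL_S = """<?xml version="1.0" encoding="UTF-8"?>
<wsdl:definitions targetNamespace="urn:ngp"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:wssem="http://www.ibm.com/xmlns/WebServices/WSSemantics"
    xmlns:ProPreO="http://lsdis.cs.uga.edu/ontologies/ProPreO.owl">
  <wsdl:message name="replaceCharacterRequest"
      wssem:modelReference="ProPreO#peptide_sequence">
    <wsdl:part name="in0" type="soapenc:string"/>
  </wsdl:message>
  <wsdl:message name="replaceCharacterResponse">
    <wsdl:part name="replaceCharacterReturn" type="soapenc:string"/>
  </wsdl:message>
</wsdl:definitions>"""


def semantic_annotations(wsdl_text):
    """Map each wsdl:message name to its wssem:modelReference (None if absent)."""
    root = ET.fromstring(wsdl_text.encode("utf-8"))
    return {
        message.get("name"): message.get(MODEL_REF)
        for message in root.findall("wsdl:message", NS)
    }


print(semantic_annotations(WSDL_S))
```

Plain WSDL documents parse the same way and simply report `None` for every message.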
14
Biological UDDI (BUDDI): WS Registry for
Proteomics and Glycomics
  • There are currently no registries that use
    semantic classification of Web Services in
    glycoproteomics
  • BUDDI's classification is based on proteomics and
    glycomics classification; BUDDI is part of an
    integrated glycoproteomics Web Portal called
    Stargate
  • NGP is to be published in BUDDI
  • This can enable other systems such as myGrid to
    use NGP Web Services to build a glycomics workbench
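Semantic classification lets a registry answer queries by concept rather than by keyword: a request for services over `sequence` data should also return services annotated with a more specific concept such as `peptide_sequence`. The sketch below is a hypothetical in-memory registry, not BUDDI's design; the concept hierarchy fragment (`data` > `sequence` > `peptide_sequence`, `data` > `peaklist`) and the `PeaklistConverter` service are invented for illustration.

```python
# Assumed fragment of a ProPreO-style concept hierarchy: child -> parent.
PARENT = {
    "sequence": "data",
    "peptide_sequence": "sequence",
    "peaklist": "data",
}


def lineage(concept):
    """Yield the concept and each of its ancestors up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)


class ServiceRegistry:
    """Hypothetical registry keyed by the ontology concept a service handles."""

    def __init__(self):
        self._services = {}

    def publish(self, name, concept):
        self._services[name] = concept

    def find(self, concept):
        """Services annotated with `concept` or any of its subconcepts."""
        return sorted(name for name, annotated in self._services.items()
                      if concept in lineage(annotated))


registry = ServiceRegistry()
registry.publish("ModifyDB", "peptide_sequence")
registry.publish("PeaklistConverter", "peaklist")  # hypothetical service
print(registry.find("sequence"))  # ['ModifyDB']
print(registry.find("data"))      # ['ModifyDB', 'PeaklistConverter']
```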

15
Conclusions
  • As part of the NCRR Integrated Technology Resource
    for Biomedical Glycomics, we implemented a
    Semantic Web Process for high-throughput
    glycomics in an open, Web-centric environment
  • Large domain-specific ontologies with process
    (ProPreO) and domain (GlycO) knowledge concepts
    were used to describe and classify Web Services
    at the semantic level
  • We used the proposed Semantic Web Service
    specification (WSDL-S) to add semantics to Web
    Service descriptions
  • Biological UDDI (BUDDI), part of Stargate, is
    being developed as a single-window resource to
    discover and publish Web Services in the
    glycoproteomics domain

16
Resources
  • NCRR (Integrated Technology Resource for
    Biomedical Glycomics):
    http://cell.ccrc.uga.edu/world/glycomics/glycomics.php
  • Bioinformatics core of Glycomics project:
    http://lsdis.cs.uga.edu/projects/glycomics/
  • ProPreO process Ontology:
    http://lsdis.cs.uga.edu/projects/glycomics/propreo/
  • GlycO domain Ontology:
    http://lsdis.cs.uga.edu/projects/glycomics/glyco/
  • Stargate GlycoProteomics Web Portal:
    http://128.192.9.86/stargate
  • WSDL-S joint UGA-IBM technical note:
    http://lsdis.cs.uga.edu/library/download/WSDL-S-V1.pdf

17
Acknowledgement
Special thanks: James Atwood (CCRC, UGA), Meenakshi
Nagarajan (LSDIS Lab, UGA), Blake Hunter (LSDIS Lab,
UGA)
18
Extra Slides: Stargate subsystems, a bit of
detail
  • BUDDI: BioUDDI is envisioned as the "yellow
    pages" for all WS in the life sciences
    - The classification of WS uses a biological
      taxonomy
    - Open resource for the worldwide community of
      life sciences research
  • Format Converter: enables conversion of two
    available representation formats into an
    XML-based representation
    - IUPAC to LINUCS to GLYDE (an XML-based
      representation)
  • Web Service Generator: enables existing Java
    applications to be exposed as Web Services
    - Generates the required files from a Java
      application to allow deployment as a Web Service
    - Enables the newly generated Web Service to be
      published on BioUDDI

19
Extra Slides: Stargate subsystems, a bit of
detail
  • Group Forum: members of the research group use
    it to foster a sense of community
    - Schedule meetings, discuss issues, collaborate on
      papers
    - Post papers for peer review, and publications on
      relevant topics
  • Stargate Search: an integrated unit of Stargate
    - Enables search for research publications within
      the research group
    - Enables search on the Internet
  • Login: allows access to selected parts of
    Stargate to be restricted

20
Extra Slides: The take-home message
Diagram components: Internet, Forum, Search, Web Service Generator, BUDDI