Title: Exploring Williams-Beuren Syndrome using myGrid
1. Exploring Williams-Beuren Syndrome using myGrid
R.D. Stevens (a), H.J. Tipney (b), C.J. Wroe (a), T.M. Oinn (c), M. Senger (c), P.W. Lord (a), C.A. Goble (a), A. Brass (a), M. Tassabehji (b)
(a) Department of Computer Science, University of Manchester
(b) University of Manchester, Academic Unit of Medical Genetics, St Mary's Hospital
(c) European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton
2. Williams-Beuren Syndrome (WBS)
- Congenital disorder caused by sporadic gene deletion
- Approximately 1 in 20,000 live births
- Affects multiple systems: muscular, nervous, circulatory
- Characteristic facial features
- Unique cognitive profile
- Mental retardation (IQ 40-100, mean 60; normal mean 100)
- Outgoing personality, friendly nature, charming
- Haploinsufficiency of the region results in the phenotype
3. Williams-Beuren Syndrome Microdeletion
Eichler E, Clark R, She X. An assessment of the sequence gaps: unfinished business in a finished human genome. Nature Reviews Genetics (2004) 5:345-354.
Hillier L et al. The DNA sequence of human chromosome 7. Nature (2003) 424:157-164.
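[Figure: map of the Williams-Beuren Syndrome microdeletion region. Flanking repeat blocks: C-cen, A-cen, B-cen, C-mid, B-mid, A-mid, B-tel, A-tel, C-tel. Genes shown: WBSCR1/EIF4H, WBSCR5/LAB, GTF2IRD1, WBSCR21, WBSCR18, WBSCR22, WBSCR14, POM121, GTF2IRD2, BCL7B, BAZ1B, NOLR1, GTF2I, FKBP6, CYLN2, CLDN4, CLDN3, STX1A, LIMK1, NCF1, RFC2, TBL2, FZD9, ELN.]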
4. Filling a genomic gap in silico
- Identify new, overlapping sequence of interest
- Characterise the new sequence at nucleotide and amino acid level
Cutting and pasting between numerous web-based services, e.g. BLAST, InterProScan, etc.
5. Filling a genomic gap in silico
- Frequently repeated: info rapidly added to public databases
- Time-consuming and mundane
- Don't always get results
- Huge amount of interrelated data is produced, handled in notebooks and files saved to the local hard drive
- Much knowledge remains undocumented
- Bioinformatician does the analysis
- Advantages
  - Specialist human intervention at every step; quick and easy access to distributed services
- Disadvantages
  - Labour-intensive, time-consuming, highly repetitive and error-prone process
  - Tacit procedure, so it is difficult to share both protocol and results
6. Why Workflows and Services?
- Workflow: a general technique for describing and enacting a process
- A workflow describes what you want to do, not how you want to do it
- Web Service: how you want to do it
- Web Service: automated, programmatic internet access to applications
- Automation
- Capturing processes in an explicit manner
- Tedium! Computers don't get bored/distracted/hungry/impatient!
- Saves repeated time and effort
- Modification, maintenance, substitution and personalisation
- Easy to share, explain, relocate, reuse and build
- Available to a wider audience: you don't need to be a coder, just need to know how to do bioinformatics
- Releases scientists/bioinformaticians to do other work
- Record
  - Provenance: what the data is like, where it came from, its quality
  - Management of data (LSID: Life Science Identifiers)
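The LSIDs listed above give each piece of data a stable, structured name. As a minimal sketch, assuming the standard `urn:lsid:authority:namespace:object[:revision]` syntax, an identifier can be split into its parts as below. `LSID` and `parse_lsid` are hypothetical helpers, not part of myGrid, and the example identifier is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LSID:
    """Parts of a Life Science Identifier: urn:lsid:authority:namespace:object[:revision]."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its components; raise ValueError if malformed."""
    parts = text.strip().split(":")
    # The "urn:lsid:" prefix is compared case-insensitively.
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(p == "" for p in parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


if __name__ == "__main__":
    lsid = parse_lsid("urn:lsid:example.org:sequences:AB123456:1")
    print(lsid.authority, lsid.namespace, lsid.object_id, lsid.revision)
    # prints: example.org sequences AB123456 1
```

Keeping the parts separate lets provenance records point at a specific revision of an object while still grouping all revisions by authority and namespace.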
7. myGrid
- E-Science pilot research project funded by EPSRC
- Manchester, Newcastle, Sheffield, Southampton, Nottingham, EBI and RFCGR, also industrial partners
- Targeted to develop open source software to support personalised in silico experiments in biology on a grid
- www.mygrid.org.uk
Which means:
- Distributed computing: machines, tools, databanks, people
- Personalisation
- Provenance and data management
- Enactment and notification
- A virtual lab workbench, a toolkit which serves life science communities
8. Workflow Components
- Freefluo: workflow engine to run workflows
- Scufl: Simple Conceptual Unified Flow Language
- Taverna: writing and running workflows, examining results
- Soaplab: makes applications available as Web Services
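The components above split the work between a workflow description, an engine that enacts it, and the services that do the computing. The sketch below shows that split in miniature: the workflow is plain data saying what feeds what, the services are functions saying how, and a small engine runs each step once its inputs exist. It is not Scufl's syntax or Freefluo's API; the processor and service names are hypothetical, and the local functions stand in for remote Web Services.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict


# Hypothetical stand-ins for remote services (e.g. Soaplab-wrapped tools).
def fetch_entry(accession: str) -> str:
    return f">{accession}\nATGGCCATTGTAATGGGCCGC"


def to_sequence(entry: str) -> str:
    return "".join(entry.splitlines()[1:])


def gc_content(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


SERVICES: Dict[str, Callable[..., Any]] = {
    "fetch_entry": fetch_entry,
    "to_sequence": to_sequence,
    "gc_content": gc_content,
}

# The workflow states WHAT to do: each processor names a service and the
# processors (or workflow inputs) whose outputs become its arguments.
WORKFLOW: Dict[str, Dict[str, Any]] = {
    "entry": {"service": "fetch_entry", "inputs": ["accession"]},
    "seq": {"service": "to_sequence", "inputs": ["entry"]},
    "gc": {"service": "gc_content", "inputs": ["seq"]},
}


def run(workflow: Dict[str, Dict[str, Any]],
        services: Dict[str, Callable[..., Any]],
        workflow_inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Enact the workflow: run each processor after the processors it depends on."""
    # Dependencies are only the inputs produced by other processors.
    graph = {name: [i for i in spec["inputs"] if i in workflow]
             for name, spec in workflow.items()}
    values = dict(workflow_inputs)
    for name in TopologicalSorter(graph).static_order():
        spec = workflow[name]
        args = [values[i] for i in spec["inputs"]]
        values[name] = services[spec["service"]](*args)
    return values


if __name__ == "__main__":
    results = run(WORKFLOW, SERVICES, {"accession": "X00001"})
    print(results["seq"], results["gc"])
```

Because the workflow only names services, swapping `SERVICES["to_sequence"]` for a different implementation changes how a step is done without touching what the workflow says.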
9. Williams Workflow Plan
[Figure: plan of the Williams workflows. Colour key: pink, outputs/inputs of a service; purple, tailor-made services; green, EMBOSS Soaplab services; yellow, Manchester Soaplab services.]
Inputs and intermediate data: GenBank accession no.; URL including GB identifier; GenBank entry; nucleotide sequence (FASTA); coding sequence; amino acid translation
Steps shown in the plan:
- BLASTwrapper / ncbiBlastWrapper: blastn vs nr, est databases; tblastn vs nr, est, est_mouse, est_human databases; blastp vs nr
- Sort for appropriate sequences only
- RepeatMasker: repetitive elements
- GenScan: coding sequence
- seqret
- prettyseq: translation/sequence file, good for records and publications
- sixpack: 6 ORFs
- transeq: amino acid translation
- restrict: restriction enzyme map
- cpgreport: CpG island locations
- Promoter prediction, TF binding prediction, regulation element prediction: identify regulatory elements in genomic sequence
- epestfind: identifies PEST sequences
- pscan: identifies FingerPRINTS
- pepstats: MW, length, charge, pI, etc.
- pepcoil: predicts coiled-coil regions
- pepwindow? octanol?: hydrophobic regions
- SignalP, TargetP, PSORTII: predict cellular location
- InterPro: identifies functional and structural domains/motifs
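Several steps in the plan (sixpack's six ORFs, transeq's amino acid translation) rest on translating DNA in all six reading frames. The following is a minimal pure-Python sketch of six-frame translation with the standard genetic code; it is not how EMBOSS implements these tools, and the reverse-strand frame numbering used here (frame -1 starts at the last base) is one common convention that may differ from sixpack's.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; codons with non-ACGT bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Return translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    print(six_frames("ATGGCCATTGTAATGGGCCGC")["+1"])  # prints: MAIVMGR
```

An ORF finder in the style of sixpack would then scan each frame for stretches between stop codons (`*`); that step is omitted here.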
10. The Williams Workflows
[Figure: the three Williams workflows, panels A, B and C]
- A: Identification of overlapping sequence
- B: Characterisation of nucleotide sequence
- C: Characterisation of protein sequence
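Workflow A has to decide which database hits really overlap the gap, the "sort for appropriate sequences only" step in the plan. As a hedged sketch, assuming BLAST tabular output (the 12 columns of BLAST+ `-outfmt 6`, also used by legacy `blastall -m 8`), that step could keep only long, near-identical, highly significant hits. The cutoffs, function names and accessions below are hypothetical, not the criteria used in the Williams workflows.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Column order of BLAST tabular output (BLAST+ -outfmt 6 / legacy -m 8).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_hits(text: str) -> List[Hit]:
    """Parse tab-separated BLAST hits, skipping blank and comment lines."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        hits.append(Hit(rec["sseqid"], float(rec["pident"]), int(rec["length"]),
                        float(rec["evalue"]), float(rec["bitscore"])))
    return hits


def select_overlaps(hits: Iterable[Hit], max_evalue: float = 1e-50,
                    min_identity: float = 99.0, min_length: int = 500) -> List[str]:
    """Keep the best passing hit per subject; return subjects by descending bit score."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.evalue <= max_evalue and h.pident >= min_identity and h.length >= min_length:
            if h.subject not in best or h.bitscore > best[h.subject].bitscore:
                best[h.subject] = h
    return [h.subject for h in sorted(best.values(), key=lambda h: -h.bitscore)]
```

Near-identical, long alignments are used here because a genuinely overlapping clone should match the gap boundary almost exactly; weaker hits are more likely paralogues, which matter in a region flanked by repeat blocks.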
11. The Workflow Experience
Have workflows delivered on their promise?
YES!
- Correct and biologically meaningful results
- Automation
- Saved time, increased productivity
- Process split into three; you still require humans!
- Sharing
- Other people have used the workflows and want to develop them
- Change of work practices
- Post hoc analysis: don't analyse data piece by piece, receive all data all at once
- Data stored and collected in a more standardised manner
- Results amplification
- Results management and visualisation
12. The Workflow Experience
- Activation energy versus reusability trade-off
- Lack of available services; levels of redundancy can be limited
- But once available, services can be reused for the greater good of the community
- Licensing of bioinformatics applications
- Means they can't be used outside of the licensing body
- No licence: access third-party websites
- Instability of external services
- Research level
- Reliant on other people's servers
- Taverna can retry or substitute before graceful failure
- Shims
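The retry-or-substitute behaviour noted above can be sketched generically: try a service a few times, then substitute an alternative (for example a mirror at another site), and only then fail with a clear error. This is not Taverna's API or configuration; the function name, retry count and back-off delays are assumptions for illustration.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllServicesFailed(Exception):
    """Raised when every candidate service has exhausted its retries."""


def invoke_with_fallback(candidates: Sequence[Callable[[], T]],
                         retries: int = 2,
                         delay_s: float = 1.0,
                         sleep: Callable[[float], None] = time.sleep) -> T:
    """Call each candidate in order, retrying it before substituting the next one."""
    errors: List[Exception] = []
    for service in candidates:
        for attempt in range(retries + 1):
            try:
                return service()
            except Exception as exc:  # remote services can fail in many ways
                errors.append(exc)
                if attempt < retries:
                    sleep(delay_s * (2 ** attempt))  # exponential back-off
    if not errors:
        raise AllServicesFailed("no candidate services supplied")
    # Graceful failure: report how many attempts failed and the last cause.
    raise AllServicesFailed(f"{len(errors)} failed attempts; last error: {errors[-1]!r}")
```

Passing `sleep` as a parameter keeps the policy testable without real waits; in a workflow, each candidate would wrap a call to one copy of a remote service.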
13. Shims
shim (shĭm) n. A thin, often tapered piece of material used to fill gaps, make something level, or adjust something to fit properly. v. shimmed, shimming, shims: To fill in, level, or adjust by using shims or a shim.
- Explicitly capturing the process
- Unrecorded steps which aren't realised until attempting to build something
- Enable services to fit together
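To make the idea concrete, a shim adapts one service's output to the next service's input. Suppose, hypothetically, that a fetch step returns a GenBank flat-file record while the next tool expects FASTA (in practice a tool such as EMBOSS seqret performs this conversion). A minimal sketch of such a shim follows; it handles a single record and ignores almost all of the GenBank format, and `genbank_to_fasta` is an illustrative name, not one of the shims in the Williams workflows.

```python
import re


def genbank_to_fasta(record: str, line_width: int = 60) -> str:
    """Shim: convert one GenBank flat-file record into a FASTA record.

    The header comes from the first accession on the ACCESSION line; the
    sequence comes from the lines between ORIGIN and the terminating '//'.
    """
    accession = "unknown"
    seq_parts = []
    in_origin = False
    for line in record.splitlines():
        if line.startswith("ACCESSION"):
            fields = line.split()
            if len(fields) > 1:
                accession = fields[1]  # primary accession
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            # Sequence lines look like "        1 atggccattg ...": keep letters only.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
    seq = "".join(seq_parts).upper()
    wrapped = [seq[i:i + line_width] for i in range(0, len(seq), line_width)]
    return "\n".join([f">{accession}"] + wrapped)
```

Small as it is, this is exactly the kind of step that stays unrecorded in a cut-and-paste protocol and only becomes visible once the services have to fit together automatically.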
14. Shims
15. The Biological Results
Four workflow cycles, totalling 10 hours: the gap was correctly closed and all known features were identified.
[Figure labels: WBSCR14, ELN, CTA-315H11, CTB-51J22]
16. Conclusions
- It works: a new tool has been developed which is being used by biologists
- More regularly undertaken, less mundane, less error-prone
- Once notification is installed, we won't even need to initiate it
- More systematic collection and analysis of results
- Increased productivity
- Workflows are only as good as the individual services: there are lots of them, we don't own them, many are unique and at a single site, research-level software, reliant on other people's services, licences
- Activation energy
17. Future Directions
- Scheduling and notification
- Portals
- Results visualisation
- Re-use for other genomic disorders, e.g. Graves' disease
18. Acknowledgments
- Dr May Tassabehji
- Prof Andy Brass
- Medical Genetics team at St Mary's Hospital, Manchester
- Wellcome Trust
19. myGrid People
www.mygrid.org.uk
Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users: Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK; Hannah Tipney, May Tassabehji, Andy Brass, St Mary's Hospital, Manchester, UK
Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK)
Collaborators: Keith Decker