Exploring Williams-Beuren Syndrome using myGrid

1
Exploring Williams-Beuren Syndrome using myGrid
R.D. Stevens,a H.J. Tipney,b C.J. Wroe,a T.M. Oinn,c M. Senger,c P.W. Lord,a C.A. Goble,a A. Brass,a M. Tassabehji b
a Department of Computer Science, University of Manchester
b University of Manchester, Academic Unit of Medical Genetics, St Mary's Hospital
c European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton
2
Williams-Beuren Syndrome (WBS)
  • Congenital disorder caused by sporadic gene
    deletion
  • 1/20,000 live births
  • Affects multiple systems: muscular, nervous,
    circulatory
  • Characteristic facial features
  • Unique cognitive profile
  • Mental retardation (IQ 40-100, mean 60; normal
    mean 100)
  • Outgoing personality, friendly nature, charming
  • Haploinsufficiency of the region results in the
    phenotype

3
Williams-Beuren Syndrome Microdeletion
Eichler E, Clark R, She X. An Assessment of the Sequence Gaps: Unfinished Business in a Finished Human Genome. Nature Reviews Genetics (2004) 5:345-354
Hillier L et al. The DNA Sequence of Human Chromosome 7. Nature (2003) 424:157-164
[Figure: map of the WBS microdeletion region. Block labels: C-cen, A-cen, B-cen, C-mid, B-mid, A-mid, B-tel, A-tel, C-tel. Gene labels: WBSCR1/EIF4H, WBSCR5/LAB, GTF2IRD1, WBSCR21, WBSCR18, WBSCR22, WBSCR14, POM121, GTF2IRD2, BCL7B, BAZ1B, NOLR1, GTF2I, FKBP6, CYLN2, CLDN4, CLDN3, STX1A, LIMK1, NCF1, RFC2, TBL2, FZD9, ELN]
4
Filling a genomic gap in silico
  • Identify new, overlapping sequence of interest
  • Characterise the new sequence at nucleotide and
    amino acid level

Cutting and pasting between numerous web-based
services, e.g. BLAST, InterProScan, etc.
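The first step above, finding a new sequence that overlaps the known one, can be sketched with a toy example. This is only an illustration under stated assumptions: the function names `suffix_prefix_overlap` and `extend` are hypothetical, and exact string matching stands in for the similarity search that a service such as BLAST would perform on real data.

```python
def suffix_prefix_overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def extend(contig: str, candidate: str, min_len: int = 3) -> str:
    """Append the non-overlapping tail of `candidate` to `contig`, if they overlap."""
    size = suffix_prefix_overlap(contig, candidate, min_len)
    return contig + candidate[size:] if size else contig


# Toy sequences (not real data): the last 4 bases of the contig match
# the first 4 bases of the candidate.
overlap = suffix_prefix_overlap("ACGTTGCA", "TGCAGGT")
extended = extend("ACGTTGCA", "TGCAGGT")
```

Repeating `extend` with each newly found overlapping sequence walks the contig further into the gap, which is the manual loop the slide describes.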
5
Filling a genomic gap in silico
  • Frequently repeated: information is rapidly
    added to public databases
  • Time consuming and mundane
  • Don't always get results
  • Huge amount of interrelated data is produced,
    handled in notebooks and files saved to local
    hard drive
  • Much knowledge remains undocumented
  • Bioinformatician does the analysis
  • Advantages: specialist human intervention at
    every step; quick and easy access to
    distributed services
  • Disadvantages: labour intensive, time
    consuming, highly repetitive and error-prone
    process; tacit procedure, so difficult to share
    both protocol and results

6
Why Workflows and Services?
  • Workflow: general technique for describing and
    enacting a process
  • Workflow: describes what you want to do, not how
    you want to do it
  • Web Service: how you want to do it
  • Web Service: automated programmatic internet
    access to applications
  • Automation
  • Capturing processes in an explicit manner
  • Tedium! Computers don't get
    bored/distracted/hungry/impatient!
  • Saves repeated time and effort
  • Modification, maintenance, substitution and
    personalisation
  • Easy to share, explain, relocate, reuse and build
  • Available to wider audience: don't need to be a
    coder, just need to know how to do bioinformatics
  • Releases scientists/bioinformaticians to do other
    work
  • Record
  • Provenance: what the data is like, where it came
    from, its quality
  • Management of data (LSID - Life Science
    Identifiers)
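The split between a workflow (what to do) and services (how to do it), together with a provenance record, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: `mask_repeats` and `gc_content` are toy local functions standing in for remote web services, and `enact` is a hypothetical enactor, not the myGrid implementation.

```python
from datetime import datetime, timezone


def mask_repeats(seq: str) -> str:
    """Toy 'service': lower-case bases are treated as repeats and replaced by N."""
    return "".join("N" if base.islower() else base for base in seq)


def gc_content(seq: str) -> float:
    """Toy 'service': fraction of G/C among the unmasked A/C/G/T bases."""
    bases = [b for b in seq if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


# The workflow states WHAT happens and in which order; each service knows HOW.
WORKFLOW = [("mask_repeats", mask_repeats), ("gc_content", gc_content)]


def enact(workflow, data):
    """Feed each step the previous step's output and log provenance as we go."""
    provenance = []
    for name, service in workflow:
        result = service(data)
        provenance.append({
            "step": name,
            "input": data,
            "output": result,
            "time": datetime.now(timezone.utc).isoformat(),
        })
        data = result
    return data, provenance


final, log = enact(WORKFLOW, "ACGTacgtGGCC")
```

Because the workflow is plain data, a step can be swapped or reordered without touching the services, and the log records where every intermediate result came from.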

7
myGrid
  • E-Science pilot research project funded by EPSRC
    (www.mygrid.org.uk)
  • Manchester, Newcastle, Sheffield, Southampton,
    Nottingham, EBI and RFCGR, plus industrial
    partners
  • Aims to develop open source software to
    support personalised in silico experiments in
    biology on a grid

www.mygrid.org.uk
Which means:
  • Distributed computing: machines, tools,
    databanks, people
  • Personalisation
  • Provenance and data management
  • Enactment and notification
A virtual lab workbench, a toolkit which serves
life science communities.
8
Workflow Components
  • Freefluo: workflow engine to run workflows
  • Scufl: Simple Conceptual Unified Flow Language
  • Taverna: writing and running workflows,
    examining results
  • SOAPLAB: makes applications available
9
Williams Workflow Plan
[Figure: workflow plan diagram; the labels below are recovered from the flattened figure]
Colour key: Pink = outputs/inputs of a service; Purple = tailor-made services; Green = EMBOSS soaplab services; Yellow = Manchester soaplab services
Services: RepeatMasker, BlastWrapper, ncbiBlastWrapper, GenScan, Seqret, prettyseq, sixpack, transeq, restrict, cpgreport, epestfind, pscan, pepstats, pepcoil, Pepwindow? Octanol?, InterPro, SignalP, TargetP, PSORTII
Inputs/outputs: GenBank Accession No; URL inc GB identifier; GenBank Entry; Nucleotide seq (Fasta); Coding sequence; Amino Acid translation; 6 ORFs; ORFs; Translation/sequence file (good for records and publications)
Analyses: Promoter Prediction; TF binding Prediction; Regulation Element Prediction (identify regulatory elements in genomic sequence); Sort for appropriate sequences only; Repetitive elements; Restriction enzyme map; CpG Island locations and ...; Blastn vs nr, est databases; tblastn vs nr, est, est_mouse, est_human databases; Blastp vs nr; identifies PEST seq; identifies FingerPRINTS; MW, length, charge, pI, etc.; predicts coiled-coil regions; hydrophobic regions; predicts cellular location; identifies functional and structural domains/motifs
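One step in the plan, producing six reading-frame translations of a nucleotide sequence (the role played by tools such as sixpack and transeq), can be sketched in plain Python. This is an illustrative sketch, not the EMBOSS implementation: the function names are hypothetical, only the standard genetic code is used, and any codon containing a non-ACGT character is rendered as X.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the start of `seq`; '*' marks a stop."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq: str) -> dict:
    """Return translations for frames +1..+3 (forward) and -1..-3 (reverse)."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")  # toy sequence, not real data
```

Open reading frames can then be read off each frame as the stretches between `*` characters, which is the input the protein-level analyses in the plan would need.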
10
The Williams Workflows
A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence
11
The Workflow Experience
Have workflows delivered on their promise?
YES!
  • Correct and Biologically meaningful results
  • Automation
  • Saved time, increased productivity
  • Process split into three, you still require
    humans!
  • Sharing
  • Other people have used and want to develop the
    workflows
  • Change of work practices
  • Post hoc analysis. Don't analyse data piece by
    piece; receive all data at once
  • Data stored and collected in a more standardised
    manner
  • Results amplification
  • Results management and visualisation

12
The Workflow Experience
  • Activation energy versus Reusability trade-off
  • Lack of available services, levels of
    redundancy can be limited
  • But once available can be reused for the greater
    good of the community
  • Licensing of bioinformatics applications
  • Means they can't be used outside of the
    licensing body
  • No license to access third-party websites
  • Instability of external services
  • Research level
  • Reliant on other people's servers
  • Taverna can retry or substitute before graceful
    failure
  • Shims
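The "retry or substitute before graceful failure" behaviour mentioned above can be sketched generically. This is not Taverna's actual mechanism or API: `call_with_fallback`, `flaky_primary` and `mirror` are hypothetical names, and the two toy services stand in for an unstable remote server and an equivalent alternative.

```python
import time


def call_with_fallback(services, payload, retries=2, delay=0.0):
    """Try each service in order, retrying each one before moving on.

    Returns (result, errors). If every service fails, result is None and
    errors lists what went wrong, so the caller can fail gracefully.
    """
    errors = []
    for service in services:
        for attempt in range(1, retries + 1):
            try:
                return service(payload), errors
            except Exception as exc:
                errors.append(f"{service.__name__} attempt {attempt}: {exc}")
                time.sleep(delay)
    return None, errors


def flaky_primary(seq):
    """Toy stand-in for an external service that is currently down."""
    raise ConnectionError("server unavailable")


def mirror(seq):
    """Toy stand-in for a substitute service offering the same operation."""
    return seq[::-1]


result, errors = call_with_fallback([flaky_primary, mirror], "ACGT")
```

Here the primary is tried twice, both failures are recorded, and the substitute then supplies the result.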

13
Shims
shim (shĭm) n. A thin, often tapered piece of
material used to fill gaps, make something level,
or adjust something to fit properly. v. shimmed,
shimming, shims: To fill in, level, or adjust by
using shims or a shim.
  • Explicitly capturing the process
  • Unrecorded steps which aren't realised until
    attempting to build something
  • Enable services to fit together
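A shim in this sense is usually a very small piece of glue code. As a hedged illustration, suppose one service returns FASTA-formatted text while the next expects a bare sequence string; the hypothetical `fasta_to_raw` function below bridges the two. Both the scenario and the function name are assumptions, not shims taken from the Williams workflows.

```python
def fasta_to_raw(fasta_text: str) -> str:
    """Strip FASTA header lines and whitespace, returning one upper-case sequence."""
    lines = [line.strip() for line in fasta_text.splitlines()]
    return "".join(
        line for line in lines if line and not line.startswith(">")
    ).upper()


# Toy record (not real data): header line plus two sequence lines.
raw = fasta_to_raw(">seq1 demo\nacgt\nTTGA\n")
```

When working by hand this step is done without thinking, by selecting only the sequence lines when copying; in a workflow it has to be written down explicitly.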

14
Shims
15
The Biological Results
Four workflow cycles totalling 10 hours. The gap
was correctly closed and all known features
identified.
[Figure labels: WBSCR14, ELN, CTA-315H11, CTB-51J22]
16
Conclusions
  • It works: a new tool has been developed which is
    being utilised by biologists
  • More regularly undertaken, less mundane, less
    error prone
  • Once notification is installed, won't even need
    to initiate it
  • More systematic collection and analysis of
    results
  • Increased productivity
  • Workflows only as good as the individual
    services: lots of them, we don't own them, many
    are unique and at a single site, research-level
    software, reliant on other people's services,
    licenses
  • Activation energy

17
Future Directions
  • Scheduling and Notification
  • Portals
  • Results visualisation
  • Re-use: other genomic disorders, Graves' disease

18
Acknowledgments
  • Dr May Tassabehji
  • Prof Andy Brass
  • Medical Genetics team at St Mary's Hospital,
    Manchester
  • Wellcome Trust

19
myGrid People
www.mygrid.org.uk
Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizaukaus, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pockock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users: Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK; Hannah Tipney, May Tassabehji, Andy Brass, St Mary's Hospital, Manchester, UK.
Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire.
Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK).
Collaborators: Keith Decker