SysMO-DB: Just Enough Exchange for Systems Biology Data and Models

1
SysMO-DB: Just Enough Exchange for Systems Biology Data and Models
  • Carole Goble, Katy Wolstencroft, Stuart Owen,
    Sergejs Aleksejevs - University of Manchester
  • Wolfgang Müller, Olga Krebs, Isabel Rojas - EML
    Research gGmbH (not for profit)
  • Jacky Snoep - University of Stellenbosch 

MS eScience Workshop, Pittsburgh, PA
2
SysMO: SYStems biology of Micro Organisms
11 projects, 91 partners, 9 countries, started 2007
(Figure labels: 4, 1, 9, 2, 22, 29, 2)
3
SysMO-DB
  • Started July 2008: 3 years, 3 staff, 3
    investigators, 3 teams over 3 sites
  • Sensitively retrofit a data access, model
    handling and data integration platform.
  • Support and manage the diversity of data, models
    and competencies.
  • Web-based solution for:
  • exchange of data, models and processes (intra-
    and inter-consortia);
  • search for data, models and processes across the
    initiative;
  • dissemination of results.

4
SysMO-DB Team
  • University of Manchester, UK: Carole Goble, Katy
    Wolstencroft, Stuart Owen, Sergejs Aleksejevs
  • EML Research gGmbH, Germany: Wolfgang Müller,
    Isabel Rojas, Olga Krebs
  • University of Stellenbosch, South Africa, and
    University of Manchester, UK: Jacky Snoep
5
Connect projects, connect to outside
  • Public: outside data and tools
  • Inter-project: SysMO-DB
  • Project: project-specific solutions, internally
    used tools and data
  • Personal: my disk (data, models, workflows)
6
  • Own solutions: own data solutions and
    collaboration environments (wikis, e-Groupware,
    PHProjekt, BaseCamp, Plone, Alfresco, bespoke and
    commercial tools, files and spreadsheets).
  • Suspicion: suspicion and caution over sharing.
    Interesting interplay between modellers,
    experimentalists and bioinformaticians.
  • Data issues: many do not have data, do not follow
    the standards that exist, or do not know who is
    doing what. Much of the data cannot be compared:
    different organisms, different strains.
  • Resource issues: no extra resources for the
    consortia (91 institutes, 11 consortia, some
    overlapping).
7
Principles
  • Go for a series of small victories
  • Realistic
  • Don't reinvent
  • Migrate to standards
  • Sustainable and extensible
  • Provide instant gratification
  • Address doubt and anxiety
  • Build it

8
Three types of people
9
Natural collaboration within SysMO
  • Short, simplified, black and white
  • Collaboration during project design
  • Varying methods of collaboration during project
  • Binomes (One modeller, one experimentalist)
  • Groups collaborating with groups
    (occasional/formalized exchange of information)
  • Varying success
  • Need for a watering hole/meeting point
  • Application where experimentalists,
    bioinformaticians and modellers meet

(Photo: "Hot Watering Hole Action" by betty x1138,
NYC, USA, taken 2005-07-14 09:04:32;
http://flickr.com/photos/98334721@N00/25901056)
Trying to make experimentalists, modellers,
bioinformaticians peacefully share resources
10
Some numbers, some consequences
  • 1 software engineer, 1 bioinformatician, 1
    bio-database specialist
  • 11 projects, 91 partners
  • 20 programmer days/year/project
  • 2.5 programmer days/year/partner
  • "Just in case" approach impossible
  • Focus on real needs
  • Just in time, just enough
  • The right 20%
  • Help people help themselves
  • Communication!

80-20 rule: 80% of the features won't be used
anyway
Useful features
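
The per-project and per-partner figures on this slide follow from simple division. A rough check, assuming about 220 working days per year for the single software engineer (the 220 is an assumption; the slides give only the results):

```python
# Rough check of the capacity figures on this slide.
# ASSUMPTION: one software engineer works about 220 days per year.
# The slides state only the per-project and per-partner results.
WORKING_DAYS_PER_YEAR = 220
PROJECTS = 11
PARTNERS = 91

days_per_project = WORKING_DAYS_PER_YEAR / PROJECTS
days_per_partner = WORKING_DAYS_PER_YEAR / PARTNERS

print(f"{days_per_project:.1f} programmer days/year/project")
print(f"{days_per_partner:.1f} programmer days/year/partner")
```

Under this assumption the per-project figure comes out at 20 and the per-partner figure at roughly 2.4, close to the 2.5 quoted on the slide.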
11
Social Approach
  • Questionnaires
  • PALs (Project Area Liaison)
  • 21 Postdocs and PhD students
  • Bio/bioinf/modeller
  • Our design and technical collaboration team
  • Very intense face to face and virtual
    collaboration
  • UK and Continental PALs chapters
  • Audits and sharing
  • Methods, data, models, standards, software,
    schemas, spreadsheets, SOPs, ...

12
Communication via PALs
  • Actors: DB team, PALs, projects
  • Show what is there; suggest what is possible; ask
    for requirements
  • Double check; transmit; disseminate
  • Give requirements; tell priorities; rate
    outcomes; suggest improvements
  • Collect answers
13
Outcome of first PALs meeting
  • Need to find the guy who does xyz: Yellow pages
  • Need to store Standard Operating Procedures
  • Almost all our data is Excel

14
What's there
  • SysMO-SEEK screenshots

15
Yellow pages
ISA tabs
Yellow pages tabs
Bookmarks
Tag clouds
16
Standard Operating Procedures
17
JWS connection for modellers
18
View Study
19
New Assay (ISA)
20
Rights and sharing
21
Rights and sharing: create group
22
So much for the webapp
Rights and sharing
Connection to modellers' tools
Yellow pages
SOPs
23
Almost there: improved Excel support
Matthew Horridge
24
Towards Just-Enough Exchange
  • Incremental steps from beta to beta

25
Towards Just-Enough Exchange
  • Largely a story about how to handle Excel sheets
    for the users' benefit

26
SysMO: Just Enough Exchange
(Diagram: projects and the systems they use.
Projects shown: SysMO-LAB, BaCell-SysMO, COSMIC,
MOSES. Systems shown: spreadsheets, wikis, Alfresco,
BASE, SABIO-RK. Also shown: public resources.)
27
Need for tradeoff
  • Huge number of systems
  • Huge number of standards (MIBBI, OBO)
  • Some of them big standards
  • Too much to cope with for a few people, but:
  • Comparison needs standardisation
  • Search needs standardisation
  • Need to move incrementally to just-enough
    standard implementation

28
Path = goal: the journey is part of the reward
  • Let people use what they use anyway
  • If changes necessary, be as unintrusive as
    possible
  • Be aware of legacy data
  • Nudge people towards best practices
  • Give instantly useful added value to as many
    users as possible: simple search, simple
    exchange, simple tool use

29
A roadmap
  • Provide convincing Web 2.0 functionality for use
    and as appetizer
  • Yellow pages
  • SOPs
  • Upload service
  • Hand-triggered upload of link/file
  • Hand-added metadata
  • Harvesting/change detection service
  • Automatic download
  • Hand-added metadata
  • Support for Excel templates
  • Promote internal standards by use of tooling
  • Mappers, parsers
  • Classifiers
  • Use other data types where appropriate
  • SBML, Matlab, Mathematica
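
One way the harvesting/change detection step in the roadmap could work is to compare content digests between harvesting runs. The sketch below is illustrative only: the function names, file names and the choice of SHA-256 are assumptions, not taken from SysMO-DB.

```python
import hashlib
from typing import Dict, List


def content_digest(content: bytes) -> str:
    """Return a SHA-256 hex digest identifying one version of a file."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(known: Dict[str, str],
                   harvested: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Compare digests stored from the last run with freshly harvested files.

    `known` maps file name -> digest from the previous run (hypothetical store).
    `harvested` maps file name -> content downloaded in this run.
    """
    report: Dict[str, List[str]] = {
        "new": [], "changed": [], "unchanged": [], "missing": []
    }
    for name in sorted(harvested):
        digest = content_digest(harvested[name])
        if name not in known:
            report["new"].append(name)
        elif known[name] != digest:
            report["changed"].append(name)
        else:
            report["unchanged"].append(name)
    # Files seen before but absent from this run.
    report["missing"] = sorted(set(known) - set(harvested))
    return report


# Illustrative file names and contents.
known = {
    "growth.xls": content_digest(b"v1"),
    "sop.doc": content_digest(b"v1"),
    "old.xls": content_digest(b"v1"),
}
harvested = {"growth.xls": b"v2", "sop.doc": b"v1", "model.xml": b"v1"}
report = detect_changes(known, harvested)
print(report)
```

Only files reported as new or changed would then need hand-added metadata, which keeps the manual effort small.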

30
Stability hierarchy (increasing stability)
  • Single group: template for a group of experiments
  • Single SysMO project: project-level template
  • Whole SysMO: more stable JERM data model,
    template best practice
  • Diagram annotations: use mappers where needed;
    parsers/annotators; enter into that
31
JERM Extraction Architecture
(Architecture diagram. Components, in the order they
appear on the slide: data and metadata; extractors,
mappers and parsers; classifier/dispatchers; template
recognizers; harvesters; data handlers; data; project
repositories.)
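
The components named in the architecture diagram (template recognizer, classifier/dispatcher, parser, mapper, extractor) can be read as stages of a pipeline. The sketch below is one possible reading, not the SysMO-DB implementation: the stage order, the template definitions, the column names and the use of CSV text in place of Excel sheets are all assumptions made to keep the example self-contained.

```python
import csv
import io
from typing import Dict, List, Optional

# HYPOTHETICAL templates: expected column headers plus a mapping from
# local column names to common (JERM-like) field names.
TEMPLATES = {
    "growth_curve": {
        "headers": {"time_h", "od600"},
        "mapping": {"time_h": "time", "od600": "optical_density"},
    },
    "enzyme_assay": {
        "headers": {"substrate_mM", "rate"},
        "mapping": {"substrate_mM": "substrate_concentration",
                    "rate": "reaction_rate"},
    },
}


def parse(text: str) -> List[Dict[str, str]]:
    """Parser: turn sheet text into rows keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def recognize_template(rows: List[Dict[str, str]]) -> Optional[str]:
    """Template recognizer: pick the first template whose headers are present."""
    if not rows:
        return None
    headers = set(rows[0].keys())
    for name, template in TEMPLATES.items():
        if template["headers"] <= headers:
            return name
    return None


def map_rows(rows: List[Dict[str, str]], template_name: str) -> List[Dict[str, str]]:
    """Mapper: rename local columns to the common field names."""
    mapping = TEMPLATES[template_name]["mapping"]
    return [{mapping.get(key, key): value for key, value in row.items()}
            for row in rows]


def extract(text: str, source: str) -> Dict[str, object]:
    """Extractor: produce a small metadata record for one harvested sheet."""
    rows = parse(text)
    template = recognize_template(rows)
    if template is None:
        # Unrecognized sheets are still registered, just without mapped fields.
        return {"source": source, "template": None,
                "fields": [], "row_count": len(rows)}
    mapped = map_rows(rows, template)
    return {"source": source, "template": template,
            "fields": sorted(mapped[0].keys()), "row_count": len(mapped)}


print(extract("time_h,od600\n0,0.05\n1,0.11\n", "growth.csv"))
print(extract("a,b\n1,2\n", "unknown.csv"))
```

Sheets that match no template still yield a record, so data can enter the system before a template or mapper exists for it.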
32
Oops
  • Some projects were not prolonged: need all
    project data in the system fast,
  • so ...

33
JERM Extraction Architecture
(Architecture diagram as on slide 31, with additional
"Data" boxes.)
34
Lessons we're learning
  • Some interesting bits along the way

35
Subsetting: don't overwhelm
  • Standards need to be comprehensive
  • Goal: minimum information (MIBBI)
  • Tends to be superset of what is needed for a
    project
  • Example for non-applicable attributes
  • Tissue of a single cell
  • Gender
  • Useful to use adapted subset-templates

Experimental design selection list
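
An adapted subset template can be derived by dropping the attributes that do not apply to a project. In the sketch below only "tissue" and "gender" come from the slide; the other attribute names, the record values and the function names are illustrative assumptions.

```python
from typing import Dict, List, Set

# HYPOTHETICAL comprehensive checklist; only "tissue" and "gender"
# are named on the slide as non-applicable examples.
FULL_CHECKLIST: List[str] = [
    "organism", "strain", "tissue", "gender", "growth_medium", "temperature",
]
NOT_APPLICABLE: Set[str] = {"tissue", "gender"}


def subset_template(checklist: List[str], not_applicable: Set[str]) -> List[str]:
    """Keep checklist order, drop attributes that make no sense for the project."""
    return [attr for attr in checklist if attr not in not_applicable]


def missing_attributes(record: Dict[str, str], template: List[str]) -> List[str]:
    """List template attributes that the record leaves empty."""
    return [attr for attr in template if not record.get(attr, "").strip()]


template = subset_template(FULL_CHECKLIST, NOT_APPLICABLE)
# Illustrative record values.
record = {"organism": "Bacillus subtilis", "strain": "168", "temperature": "37"}
print(template)
print(missing_attributes(record, template))
```

Users are then asked only for the attributes in the subset, while the full checklist remains available for projects that need it.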
36
From biofolksonomy to ontology
  • Observation
  • Fast growing set of standards
  • Standards are a moving target
  • Incremental approach
  • Keyword annotation
  • Controlled selection lists
  • Home-brewed taxonomies
  • Use/contribution to standard ontologies
  • Provide migration tools

Tag suggestions
Home-brewed taxonomy
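
The incremental path from free keywords to controlled terms could start with a tag suggester like the one below. This is a minimal sketch: the term list, the synonym table and the 0.8 similarity cutoff are assumptions, and the fuzzy matching uses Python's standard `difflib` rather than anything SysMO-SEEK is known to use.

```python
import difflib
from typing import Tuple

# HYPOTHETICAL controlled selection list and synonym table.
CONTROLLED_TERMS = ["glucose", "chemostat", "batch culture", "Escherichia coli"]
SYNONYMS = {"e. coli": "Escherichia coli", "e.coli": "Escherichia coli"}


def suggest_term(tag: str) -> Tuple[str, str]:
    """Return (term, kind), where kind says how the term was found.

    kind is "controlled" (exact match), "synonym", "suggestion"
    (close spelling) or "free" (kept as a plain keyword).
    """
    key = tag.strip().lower()
    by_lowercase = {term.lower(): term for term in CONTROLLED_TERMS}
    if key in by_lowercase:
        return by_lowercase[key], "controlled"
    if key in SYNONYMS:
        return SYNONYMS[key], "synonym"
    close = difflib.get_close_matches(key, list(by_lowercase), n=1, cutoff=0.8)
    if close:
        return by_lowercase[close[0]], "suggestion"
    # Unknown tags stay as free keywords and can be reviewed later.
    return tag.strip(), "free"


for tag in ["Glucose", "E. coli", "glucos", "my new tag"]:
    print(tag, "->", suggest_term(tag))
```

Free tags that recur often are natural candidates for the selection list or a home-brewed taxonomy, which matches the incremental approach on the slide.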
37
A word on software
  • Template tooling
  • Excel
  • Java
  • SysMO-SEEK (open source under Apache license)
  • Ruby on Rails
  • Convention over configuration
  • Libraries, plugins
  • Rails specific (e.g. acts_as_authenticated)
  • Solr/Lucene: introduce Java/Ruby
  • Database: MySQL, also tested with SQLite (to
    exclude DB dependencies)

38
Summary
  • SysMO-DB as a virtual meeting point for different
    flavours of systems biologists
  • SysMO-DB's mantra: just enough, just in time
  • Flexible JERM extraction architecture
  • Just enough metadata (incremental)
  • A lot done, still a lot to do

39
Challenges ahead
  • Social
  • PALs work great and are motivated
  • Now need more, more, more data, data, data
  • Technical
  • Publishing into public repositories
  • Search and exploration: the test for data quality
  • Hierarchical Faceted Search
  • Distributed search via Taverna workflows
  • More workflows via SysMO-SEEK
  • Improve modelling support

40
Bonus track: what if ...
  • ... the average data quality is below par?
  • Nagging functionality
  • Remind people of potentially faulty metadata
  • Give suggestions what to improve and how
  • Give possibility to create automatic mappings
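
The nagging idea can be sketched as a check that turns suspicious metadata into reminders with concrete suggestions. The field names, selection lists and message wording below are hypothetical, not taken from SysMO-SEEK.

```python
from typing import Dict, List, Set


def nag(record: Dict[str, str],
        required: List[str],
        selection_lists: Dict[str, Set[str]]) -> List[str]:
    """Return reminders for empty required fields and unlisted values."""
    reminders: List[str] = []
    for field in required:
        if not record.get(field, "").strip():
            reminders.append(f"'{field}' is empty: please add a value.")
    for field in sorted(selection_lists):
        value = record.get(field, "").strip()
        allowed = selection_lists[field]
        if value and value not in allowed:
            options = ", ".join(sorted(allowed))
            reminders.append(
                f"'{field}' has the unlisted value '{value}': "
                f"consider one of {options}."
            )
    return reminders


# Illustrative record with one empty field and one misspelled value.
record = {"organism": "", "experiment_type": "chemostatt"}
required = ["organism", "experiment_type"]
selection_lists = {"experiment_type": {"batch", "chemostat"}}
for reminder in nag(record, required, selection_lists):
    print(reminder)
```

Each reminder says what to improve and how, rather than rejecting the upload, so people keep contributing data.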

41
Thanks
  • EML People
  • Isabel
  • Olga
  • UMAN People
  • Carole
  • Katy
  • Finn
  • Stuart
  • Sergejs
  • Jacky at Stellenbosch
  • BBSRC
  • BMBF
  • KTF
  • and Microsoft for sponsoring this workshop

42
www.sysmo-db.org
  • End: questions

43
END