1
The Seven Deadly Sins of Bioinformatics
  • Professor Carole Goble
  • carole.goble_at_manchester.ac.uk
  • The University of Manchester, UK
  • The myGrid project
  • OMII-UK

2
Roadmap
  • Sins of BioScience
  • With examples
  • Why are we like this?
  • The Selfish Scientist? E-Science is me-Science.
  • Challenges
  • Technical
  • Social

3
Intractable Problems in Bioinformatics.
Have we sinned?
Are these part of the intractable problem?
4
The traditional sins.
  • Lust
  • Gluttony
  • Greed
  • Sloth
  • Wrath
  • Envy
  • Pride

http://en.wikipedia.org/wiki/Seven_deadly_sins
Stevens and Lord
5
Methodology
  • Email a handful of bioinformaticians.
  • Stand well back.
  • Collect.
  • Edit.
  • Therapy on the cheap.
  • We all felt better.

6
I am grateful to
  • Phil Lord (University of Newcastle)
  • Anil Wipat (University of Newcastle)
  • Matthew Pocock (University of Newcastle)
  • Robert Stevens (University of Manchester)
  • Paul Fisher (University of Manchester)
  • Duncan Hull (Manchester Centre for Systems
    Biology)
  • Norman Paton (University of Manchester)
  • Marco Roos (University of Amsterdam)
  • Rodrigo Lopez (EBI)
  • Tom Oinn (EBI)
  • Andy Law (Roslin Institute)
  • Graham Cameron (EBI)

7
They came up with more than seven. But I beat
them into submission.
  • Many are highly inter-related.
  • Hopefully they are all too familiar.

8
Sins
  • Parochialism and Insularity
  • Exceptionalism
  • Autonomy or death!
  • Vanity, Pride and Narcissism
  • Monolith Megalomania
  • Scientific method Sloth
  • Instant Gratification

9
Sin 1
  • Parochialism
  • being provincial, being narrow in scope, or
    considering only small sections of an issue.
    http://en.wikipedia.org/wiki/Parochialism
  • Insularity
  • a person, group of people, or a community that
    is only concerned with their limited way of life
    and not at all interested in new ideas or other
    cultures. http://en.wikipedia.org/wiki/Insularity

10
Reinvention
  • Reinventing the Wheel. Rediscovering the same
    problems. Rediscovery of techniques and methods.
  • Creating
  • Yet another identity scheme. Yet another
    representation mechanism for data.
  • Yet another ontology. Yet another data warehouse.
  • Yet another integration framework. Yet another
    query or ontology or workflow language.
  • Result? Misery. Or more work for the boys.

11
Comparative Genomics? Tisk! It's Comparative
Bioinformatics.
Bioinformatics is about mapping one schema to
another, one format to another, one id scheme to
another.
What a waste of time. What a handy distraction
from doing some Real Science.
12
Names and Identity Crisis
  • WSL-1 protein
  • Apoptosis-mediating receptor DR3
  • Apoptosis-mediating receptor TRAMP
  • Death domain receptor 3
  • WSL protein
  • Apoptosis-inducing receptor AIR
  • Apo-3
  • Lymphocyte-associated receptor of death
  • LARD
  • GENE Name: TNFRSF25

Q93038 Tumor necrosis factor receptor
superfamily member 25 precursor
Annotation history
Q92983 O00275 O00276 O00277 O00278 O00279 O00280
O14865 O14866 P78507 P78515 Q93036 Q93037 Q99722
Q99830 Q99831 Q9BY86 Q9UME0 Q9UME1 Q9UME5
http://www.expasy.org/uniprot/Q93038
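The identity crisis on this slide is, in practice, a lookup problem: many names and retired accession numbers, one protein. A minimal sketch in Python, assuming a hand-built alias table that uses only the names and accession numbers listed on the slide. The names `SYNONYMS`, `ALIASES` and `resolve` are hypothetical, and a real resolver would query the database itself rather than a hard-coded table.

```python
from typing import Optional

# Primary accession and aliases, copied from the slide for illustration only.
PRIMARY = "Q93038"

SYNONYMS = [
    "WSL-1 protein",
    "Apoptosis-mediating receptor DR3",
    "Apoptosis-mediating receptor TRAMP",
    "Death domain receptor 3",
    "WSL protein",
    "Apoptosis-inducing receptor AIR",
    "Apo-3",
    "Lymphocyte-associated receptor of death",
    "LARD",
    "TNFRSF25",
]

SECONDARY_ACCESSIONS = (
    "Q92983 O00275 O00276 O00277 O00278 O00279 O00280 O14865 O14866 "
    "P78507 P78515 Q93036 Q93037 Q99722 Q99830 Q99831 Q9BY86 Q9UME0 "
    "Q9UME1 Q9UME5"
).split()


def normalise(label: str) -> str:
    """Case-fold and collapse whitespace so trivial variants still match."""
    return " ".join(label.split()).casefold()


# Every known label, including the primary accession, maps to the primary.
ALIASES = {
    normalise(label): PRIMARY
    for label in [PRIMARY, *SYNONYMS, *SECONDARY_ACCESSIONS]
}


def resolve(label: str) -> Optional[str]:
    """Return the primary accession for a known alias, or None if unknown."""
    return ALIASES.get(normalise(label))
```

The table has to be maintained by someone, which is the slide's point: every new name or merged accession adds another row that every consumer must learn about.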
13
Andy Law's Third Law
  • The number of unique identifiers assigned to an
    individual is never less than the number of
    Institutions involved in the study... and is
    frequently many, many more.

http://bioinformatics.roslin.ac.uk/lawslaws.html
14
The Selfish Scientist
  • A biologist would rather share their
    toothbrush than their (gene) names
  • Mike Ashburner
  • Professor of Genetics
  • University of Cambridge
  • UK
  • Amongst the many

15
Some causes of the Identity Crisis
  • Conflation of the ID for a thing, something to
    call the thing, a description of the thing, with
    the thing itself (reference/referent)
  • Internal vs external IDs
  • Opaque vs human-interpretable IDs
  • Situation-dependent 'parts' of a resource get
    different IDs
  • e.g. the gene in a disease process vs the disease
    in a metabolic process
  • Annotation attribution and log differentiation
  • Two organisations attach annotations to two IDs,
    state they are referring to the same thing, they
    now have provenance about which of them asserted
    which facts

Pocock
16
Id Reinvention
  • Global Identity naming mechanism for data objects
    in the Life Sciences
  • LSIDs and URIs and PURLs. WS-Naming and all its
    friends
  • Half the debaters haven't actually read the LSID
    or URL or PURL specs. Or provided use cases.
  • Web Pages are not Data Assets.
  • you could do this with HTTP based identifiers
    given <insert hack>.
  • The debate rages! 124 messages in the last week.
  • W3C Semantic Web Health Care and Life Sciences
    Interest Group public-semweb-lifesci_at_w3.org

urn:lsid:uniprot.org:{db}:{id}
http://purl.uniprot.org/{db}/{id}
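The two identifier styles on this slide carry the same three pieces of information. A minimal sketch in Python, assuming the slide's examples read as `urn:lsid:uniprot.org:{db}:{id}` and `http://purl.uniprot.org/{db}/{id}`. The general LSID layout (`urn:lsid:authority:namespace:object`, with an optional revision) is standard; the rule of prefixing the authority with `purl.` is an assumption that matches only the slide's UniProt example, and the function names are hypothetical.

```python
from typing import NamedTuple


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str


def parse_lsid(urn: str) -> Lsid:
    """Split an LSID into its parts; an optional revision field is ignored."""
    parts = urn.split(":")
    is_lsid = len(parts) in (5, 6) and [p.lower() for p in parts[:2]] == ["urn", "lsid"]
    if not is_lsid:
        raise ValueError(f"not an LSID: {urn!r}")
    return Lsid(parts[2], parts[3], parts[4])


def lsid_to_purl(urn: str) -> str:
    """Rewrite an LSID as an HTTP identifier.

    Assumption: the HTTP host is 'purl.' + authority, as in the slide's
    UniProt example. Other authorities may not follow this pattern.
    """
    lsid = parse_lsid(urn)
    return f"http://purl.{lsid.authority}/{lsid.namespace}/{lsid.object_id}"
```

That the mapping fits in a dozen lines is perhaps why the debate is about conventions and commitments rather than about syntax.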
17
Andy Law's First (Format) Law
  • The first step in developing a new genetic
    analysis algorithm is to decide how to make the
    input data file format different from all
    pre-existing analysis data file formats.
  • Different codes to signify the sex of animals.
  • crimap uses '0' female and '1' male.
  • Keightly algorithm: '1' female and '0' male.
  • Knott and Haley QTL analysis algorithm: '1' female
    and '2' male.
  • When they use '3' and '4', then we'll know
    they're doing it deliberately.

http://bioinformatics.roslin.ac.uk/lawslaws.html
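The sex-code example shows what "mapping one format to another" costs in practice. A minimal sketch in Python, assuming only the three codings quoted on the slide. The format keys (`crimap`, `keightly`, `knott_haley`) and the function names are hypothetical labels, not identifiers taken from those programs.

```python
from enum import Enum


class Sex(Enum):
    FEMALE = "female"
    MALE = "male"


# Codes exactly as quoted on the slide; the dictionary keys are made-up labels.
SEX_CODES = {
    "crimap": {"0": Sex.FEMALE, "1": Sex.MALE},
    "keightly": {"1": Sex.FEMALE, "0": Sex.MALE},
    "knott_haley": {"1": Sex.FEMALE, "2": Sex.MALE},
}


def decode_sex(fmt: str, code: str) -> Sex:
    """Translate a format-specific code into an unambiguous value."""
    if fmt not in SEX_CODES:
        raise ValueError(f"unknown format {fmt!r}")
    try:
        return SEX_CODES[fmt][code.strip()]
    except KeyError:
        raise ValueError(f"unknown sex code {code!r} for format {fmt!r}") from None


def recode_sex(code: str, source: str, target: str) -> str:
    """Convert a code from one format's convention to another's."""
    sex = decode_sex(source, code)
    if target not in SEX_CODES:
        raise ValueError(f"unknown format {target!r}")
    inverse = {value: key for key, value in SEX_CODES[target].items()}
    return inverse[sex]
```

Note that `'1'` means male in one format and female in the other two, so a file copied between tools without recoding is silently wrong rather than visibly broken.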
18
  • EMBOSS lists more than 20 different sequence
    formats.
  • Nearly every collection of sequences that dares
    call itself a database has stored its data in its
    own format.
  • http://emboss.sourceforge.net/docs/themes/SequenceFormats.html
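One consequence of so many formats is that tools must guess what they have been handed. A minimal sketch in Python, assuming only three well-known conventions: FASTA records begin with `>`, GenBank flat files begin with `LOCUS`, and EMBL and Swiss-Prot flat files begin with an `ID` line. The function name and the returned labels are hypothetical, and this covers only a small fraction of the formats EMBOSS lists.

```python
def sniff_sequence_format(text: str) -> str:
    """Guess a sequence file format from its first non-blank line."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith(">"):
            return "fasta"
        if line.startswith("LOCUS"):
            return "genbank"
        if line.startswith("ID   "):
            # EMBL and Swiss-Prot share this opening line code.
            return "embl_or_swissprot"
        return "unknown"
    return "unknown"
```

Even this toy version cannot tell EMBL from Swiss-Prot without reading further, which hints at why format detection code tends to grow without limit.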

19
Reinvention of Ontology tools
  • OBO and OWL ?
  • OBOEdit and Protégé-OWL ?

The Montagues and The Capulets..
Let me get my bullet-proof vest
20
The Oh No OBO
Philosophers
Spiritual guides
Aesthetics
Life Scientists Capulets
Knowledge Representation Montagues
Theoreticians
Pragmatists
A means to an end Content providers
The end Mechanism providers
The Montagues and The Capulets SOFG 2004, KCap
2005, Comparative and Functional Genomics 2004
21
Yet another database
  • Organism databases
  • Counter example
  • Generic Model Organism Database Toolkit.

FlyBase, WormBase, SGD, BeeBase and many other
large and small community databases
22
BioBabel
  • bioperl
  • biojava
  • biopython
  • bioruby
  • biophp
  • biosql
  • biouml
  • biofoo
  • biobar

23
Integration
  • Workflows Management Systems
  • Counter example
  • Taverna ?
  • http://www.mygrid.org.uk

24
  • Reinvent wheels in creating 'Transcriptional
    Units' ('genes' derived from ESTs and mRNA),
    within species and between species.
  • This holds for much genome-assembly-related work.
  • Genome data compilers for E. coli, Drosophila,
    Plant species, etcetera reuse each other's code?
  • Usually something new is added, but large parts
    could have been reused.

25
Any more ?
  • Another Web 2.0 Web Site? Another Web interface
    to a database? Another portal?
  • Whole database systems. ACeDB is not a lone-case.
  • Genome data compilers for E. coli, Drosophila,
    Plant species, etcetera reuse each other's code?
  • Text miners require synonyms and reinvent the
    wheel to get them in many cases.
  • Add your favourite here.

26
Reuse Rocks. Collaboration through workflow and
web services
  • VL-e Project
  • instant collaboration with Martijn Schuemie
    (Rotterdam) through a web service that discloses
    their protein synonym data.
  • Exchanging services and (sub)workflows with food
    scientists.
  • Web services make that easier.

27
Recycling, Reuse, Repurposing
  • A Trypanosomiasis in Cattle workflow (by Paul)
    reused without change for Trichuris muris
    Infection (by Jo).
  • Identified the biological pathways believed to be
    involved in the ability of mice to expel the
    parasite.
  • Workflows are memes. Scientific commodities. To
    be exchanged and traded and vetted and mashed.
    Users add value.

28
Warning! Reuse is Hard
  • Writing reusable workflows is hard.
  • Local services
  • Permissions. Licences
  • What does it DO?
  • Writing reusable services is hard.
  • What does it DO?
  • Predicting the unknown required by the unknown.
  • Finding workflows, services and tools is hard
  • Where do you go?? What does it DO??
  • Creating web services is still a bottleneck. For
    quick solutions it is still seen as too much
    extra trouble.

29
Bullying and the Borg
  • If a group is working in a field, you get bullied
    for trying out something different.
  • Can YOU think of an example??
  • You may actually be doing something different,
    but you use some common words.
  • Why do this? It's already been solved by Foo -
    the massively unwieldy, slow-moving, monolithic,
    meeting paralysed international effort for Things
    Mentioning Foo.

30
Reinvention or Invention? Pre-dating
  • BioMOBY pre-dates (Semantic) Web service
    revolution
  • OBO and OBO-Edit pre-dates OWL and Protégé-OWL
  • 20 years of Knowledge Representation.
  • Taverna pre-dates a reliable Open Source BPEL
    engine
  • 20 years of functional programming.
  • There ARE features that Bioinformatics needs that
    other solutions don't cater for.

31
A few months in the laboratory (or the computer)
can save a few hours in the library (or on
Google).
  • Westheimer's Law (with additions).

32
No tool is an island
  • Assume
  • only we will use it, whatever it may be.
  • that it will be freestanding and unlinked to
    anything else.
  • that it will always work and will keep on
    working.
  • That everyone will understand it.
  • Well I know what I mean. And so does my mate. So
    I don't need to specify it. Or document it
    properly. Or keep the metadata up to date.
  • Never mind the interface, just look at my
    implementation!
  • Metadata matters. Models matter.
  • Interfaces matter. Services matter.

33
I know what it means...
  • A hacker who studied ontology
  • Was famed for his sense of frivolity
  • When his program inferred
  • That Clyde ISA Bird
  • He blamed not his code but zoology

Clyde ISA Elephant
AI limericks by Henry Kautz
http://www.cs.washington.edu/homes/kautz/misc/limericks.html
34
Not just bioinformatics
  • Computer Science is Guilty!

35
W3C Semantic Web for Life Sciences mailing list,
2005
"Why don't biologists modularise OWL ontologies
properly?"
"Er, well, like how should we do it properly and
where are the tools to help us?"
"We don't know and we haven't got any. But here
are some vague guidelines."
36
"I don't blame them [MGED/PSI community] because
to truly comprehend RDF/OWL is not an easy task,
it takes not just the understand of technology
itself but more so the vision on how things
should and can work in SW."
"One thing we have to remember is that biologists
are building ontologies to do a job of work. They
are not produced as some end of CS or SW research."
"Principles are all well and good, but we should
know from decades of software engineering that
saying "do it properly" isn't a solution. We need
tooling and methodologies that do not in
themselves hinder a domain specialist. In many
cases it is easier to re-develop than re-use or
even cut-and-paste from an existing ontology than
it is to muck around doing it properly."
"There is actually a gap between the view of
ontology for CS people and for biological people.
The ontology in biologist's eyes are more of a
treaty than logical representation, that in CS
view is on the reverse of that view. It needs
dialog to bring the view to a middle ground and
mechanisms to stretch to both directions."
37
Standards are boring (but important)
  • Blue collar Science (John Quackenbush)
  • Nobody is going to win a Nobel prize for creating
    a standard schema, ontology or whatever. (Duncan
    Hull)
  • Standardise where you need standards, don't
    where you don't. Standardise messages not
    structures (Graham Cameron)
  • Drive on the left or the right?

38
Self promotion
  • Not making shareable reusable software, because
    we can publish every single monolithic software
    solution.
  • And get promoted.
  • Applies equally to databases and ontologies.
  • Production vs Novelty

Not all software and databases are equal.
39
Research Production Confusion
  • Novelty vs Standards
  • Neither the funding nor the social structures of
    bioinformatics allow us to treat these two
    differently in any principled manner
  • How do you get funding for production software
    other than claiming to be researching stuff?
  • How do you get a publication out of a bit of
    research software without claiming a potential
    user-base?

40
Trust
  • I don't trust your code
  • I don't trust your data
  • I don't trust you will still be around in 1 year

41
Sin 2
  • Exceptionalism
  • Biologist exceptionalism
  • Biological exceptionalism
  • Biology exceptionalism
  • A cause of Reinvention Syndrome
  • Bioinformatics is special
  • Domain-specific outcomes require domain-specific
    approaches and technologies

42
Biologist exceptionalism
I'm different. We are all individuals.
  • I know there is already a gene name for that
    gene, but, I don't like it and it doesn't fit in
    with my schema.
  • It would be better if I wrote the script I need
    so I know what it does, how it does it and how to
    modify it later because I haven't specified what
    it was supposed to do in the first place.

43
Biological exceptionalism
  • Biology is all exception.
  • Don't complicate everyone's life for the sake of
    a few esoteric cases. (Cameron's 5th Commandment
    of Curation)
  • Exceptionalism paralysis.
  • Gather requirements expansively, prune ruthlessly
  • The EMBL/GenBank/DDBJ Feature Table

44
We are so much more complex
  • There are proteins, and there are records about
    proteins. Records come in different formats. If I
    make a statement using this url, is it about the
    record? or the protein? Alan Ruttenberg
  • Usually we have one entry per gene. We have
    several entries for a single gene when
    description of variations are too complicated to
    describe in FT lines (of course, this criteria
    depends on the annotator). For viruses, it is
    much more messy, due to ribosomal frame-shifts.
    Formalise that! Eric Jain UniProtDB
  • Er... decomposition and untangling?

45
Other Sciences.
  • CERN: UML meta-modelling mechanisms in order to
    migrate models over time without losing data.
  • Ensembl: "Our data models are complicated - I
    don't think specifying them will help. We need to
    understand them instead."
  • And?
  • Confusing meta-mechanisms with models

46
Biology Exceptionalism
  • Biology is harder than anything else in the whole
    wide world because there is lots of it and it's
    complicated.
  • Drawing graphs of data sets over time.
  • Physics wipes you off the map.
  • The real problem is complexity not scale.
  • The number of data sets, their diversity and how
    they overlap.
  • How they change.
  • Their Reliability.

47
Sin 3
  • Autonomy or death!
  • Combined with churn and indifference to users.
  • Compounded by the Early Adopter tendency of the
    community and a monopoly mentality.
  • "Hell is other people's systems", as Jean-Paul
    Sartre would have said if he had been a
    bioinformatician.

48
Autonomy is death!
  • Change my interface / format whenever I feel like
    it, despite the fact I wanted lots of users and I
    have lots of users who depend on this. And I
    won't bother to debug either or provide backwards
    compatibility.
  • BioMART changed 4 times in the past year.
  • NCBI changes as it fancies.
  • Ensembl relational schema.
  • Early BioJava.
  • This is just unprofessional.
  • Stable Metadata matters. Stable Models matter.
    Stable Interfaces matter. Stable Services matter.

49
Lincoln Stein said a while ago
  • An interface is a contract between data provider
    and data consumer
  • Document the interface; warn if it is unstable
  • Do not make changes lightly
  • Even little fiddly changes can break things
  • Provide plenty of advance warning
  • When possible, maintain legacy interfaces until
    clients can port their scripts
  • Support as many interfaces as you can
  • HTML (least desired)
  • Text only (better)
  • HTTP-XML (even better)
  • SOAP-XML (sweet!)
  • Easy Interfaces Power User Interfaces

and he could say it again today.
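Stein's advice to maintain legacy interfaces and give advance warning can be sketched in code. A minimal illustration in Python, assuming a made-up service function `fetch_record` that replaces an older `get_entry`; both names, the record formats and the helper `deprecated_alias` are hypothetical. It relies only on the standard `warnings` module.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name: str, note: str):
    """Keep an old entry point working while warning callers to migrate."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead ({note})",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this wrapper
        )
        return new_func(*args, **kwargs)

    wrapper.__name__ = old_name
    return wrapper


def fetch_record(accession: str, fmt: str = "text") -> str:
    """Current (hypothetical) interface: return a record in the given format."""
    if fmt == "text":
        return f"ID {accession}"
    if fmt == "xml":
        return f'<record id="{accession}"/>'
    raise ValueError(f"unsupported format: {fmt!r}")


# Legacy name kept alive so that existing client scripts do not break.
get_entry = deprecated_alias(
    fetch_record, "get_entry", "kept until clients have ported their scripts"
)
```

Old scripts calling `get_entry("Q93038")` keep getting the same answer, and their authors get a warning well before anything is removed.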
50
Law's Second Law
  • Error messages should never be provided.
    Corollary: if error messages are provided,
    they should be utterly cryptic so as to convey as
    little information as possible to the end user.

51
Workflow commodities
  • Workflow published with its paper and its data
    set.
  • So what happens when I want to run this workflow
    again?
  • Is the service dead?
  • Is the dataset still there?
  • Was it designed to be reproduced or reused in the
    first place?

52
The myGrid Semantic Sweatshop
  • Services and Workflows in the wild.
  • Curated by experts using an ontology.
  • Supplied by service providers (like EMBOSS) in
    text.
  • Or annotations (like BioMOBY, but they aren't
    good annotations!)
  • Tagged by the Masses.
  • Multi-perspective
  • Scientist for finding.
  • Machinery for validation.
  • Hard work. Look how tired they are.

53
The myGrid Semantic Sweatshop: notice how tired
they look
Franck Tanoh
Katy Wolstencroft
54
Churn, Churn, Churn
  • Stability is more important than Standards or
    Smartness. Discuss
  • Constant churn and change for change's sake.
  • Impact on everyone else who uses the previous
    mechanism.
  • A few voices, very loud, vested interest, for
    their application, win.
  • You know what? Why don't we stick with something
    for a while and rally behind it? Or at least
    figure out the cost of change.
  • Maybe this is a sin inherited from Computer
    Science.

55
Churn, Churn, Churn
  • We expect the content to change, but why does
    everything else?
  • Constant churn and change for change's sake.
  • Maybe this is a sin inherited from Computer
    Science.
  • The W3C Identity War. Web Services vs REST
  • Impact on everyone else who uses the previous
    mechanism.
  • A few voices, very loud, vested interest, for
    their application, win.
  • You know what? Why don't we stick with something
    for a while and rally behind it? Or at least
    figure out the cost of change.
  • Stability is more important than Standards or
    Smartness. Discuss

56
Sin 4
  • Vanity
  • Pride
  • Narcissism
  • conceit, egotism or simple selfishness.
  • Applied to a social group, denotes elitism or an
    indifference to the plight of others

57
I know it all.
  • Claiming to know everything about biology and
    everything about computers.
  • This is really irritating to both biologists and
    computer scientists.
  • Even they don't claim to know everything about
    biology or computer science.
  • Computer scientists do know a lot of stuff. And
    they publish too.
  • Biologists are the experts on everything because
    we produce the data

58
Think like me!
  • Building interfaces that only you can use.
  • Not actually using your tools in the field.
  • I understand workflows
  • Workflows are for biologists.
  • My granny can do workflows...
  • Designing good experiments is hard.
  • Workflows are computational experimental
    protocols. Ergo.
  • Writing workflows should be expected to be hard.
  • Writing good workflows is really hard.
  • Writing good reusable workflows is really really
    hard.

Misunderstanding and disrespecting users
59
A good User Experience outweighs smart features.
  • Can I use it?
  • Is the user interface familiar?
  • Does it fit with my needs?

60
Gain-Pain pay-off
  • Just enough, just in time

[Chart of Pain against Gain, with regions labelled
Very BAD, Just right, and Good, but Unlikely]
61
Sin 5
  • Monolith Megalomania
  • delusions of grandeur.
  • obsession with grandiosity and extravagance.
  • Data mining - my data is mine, and your data is
    mine

62
More, more, more!
  • Integration: the more the merrier? No.
  • Every link is a potential dead link.
  • Every dependency can find its way on to your
    critical path.
  • Monolithic solutions always fail.
  • Put it all in a warehouse.
  • ATLAS, MRS, e-Fungi, GIMS, Medicel Integrator,
    MIPS, BioMART blah blah blah
  • Toolkits: Information Integrator, GMOD, BioMART,
    BioWarehouse, blah blah
  • 50% of warehouses fail.
  • Uber-tools and Uber-databases
  • Biomart, Ensembl, etc etc.

Cameron
63
The trouble with warehouses
  • 30% of data migration projects fail (Source:
    Standish Group)
  • 50% of data warehousing / Business Intelligence
    projects fail (Source: NCR)
  • "Warehouses work? Piffle. They never manage to
    maintain synchrony with the source data. Mostly
    they fall down of their own weight!" Graham
    Cameron, EMBL-EBI
  • "Our ability to capture and store data far
    outpaces our ability to process and exploit it.
    This growing challenge has produced a phenomenon
    we call the data tombs, or data stores that are
    effectively write-only; data is deposited to
    merely rest in peace, since in all likelihood it
    will never be accessed again. Data tombs also
    represent missed opportunities." Usama Fayyad,
    Yahoo! Research! Laboratories!
  • "We believe that attempts to solve the issues of
    scientific data management by building large,
    centralised, archival repositories are both
    dangerous and unworkable." Microsoft 2020 Science
    report.

64
More More More
  • Emacs of Biology
  • End-user apps/libraries in bioinformatics
    workbenches with loads of crap bundled in, none
    of it kept up to date, none of it properly
    integrated.
  • Keep it simple and modular
  • Don't reinvent Eclipse.

65
Mash-Up
[Diagram labels: Data Marshalling, objects, Protocol,
Mash Up Application, User interface]
  • Content syndication and feeds
  • Emphasis shifts to the user creating specific
    integration by mapping.
  • Just in time, just enough design
  • On demand integration or rather, aggregation.

66
Distributed Annotation System Mash-Up
http://www.biodas.org
67
Sin 6
  • Scientific Method Sloth
  • It's easier to think of a new name than use
    someone else's.
  • I want my own view over data and views are
    difficult, so I'll create my own database.
  • Leads to Reinvention, Exceptionalism
  • Often the result of Instant Gratification

68
Ennui
  • Garbage in, garbage out
  • Running analysis over the wrong datasets
  • E.g. Identifying chicken proteins in mouse cells.
  • Configuration traditionalism
  • Not changing the parameters of BLAST. Ever.
  • Top list ennui
  • If there is a list only looking at the first one.
  • Look no further than the first Blast hit / first
    Google hit.
  • Arbitrary cut-offs on rank-ordered result list
  • Absolute truth above, absolute falsehood below
  • E.g. differentially expressed genes in microarray
    analyses.

69
It's black and white
  • Arbitrary cut-offs on rank-ordered result list
  • Everything above is absolute truth and everything
    below complete falsehood. 
  • sequence similarity when looking for orthologs.
  • protein identifications using Mascot scores.
  • differentially expressed genes in microarray
    analyses.
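One alternative to treating a cut-off as the line between truth and falsehood is to flag results that fall close to it. A minimal sketch in Python, assuming higher scores are better and that a margin around the cut-off is meaningful for the score in question. The function name, the labels and the choice of margin are all hypothetical, and picking a defensible margin is itself a statistical question this sketch does not answer.

```python
from typing import Iterable, List, Tuple


def classify_hits(
    scored: Iterable[Tuple[str, float]],
    cutoff: float,
    margin: float,
) -> List[Tuple[str, float, str]]:
    """Label each (name, score) pair as 'accept', 'borderline' or 'reject'.

    Scores within `margin` of `cutoff` are flagged for inspection instead
    of being silently accepted or discarded.
    """
    if margin < 0:
        raise ValueError("margin must be non-negative")
    results = []
    # Rank from best to worst so the output reads like a result list.
    for name, score in sorted(scored, key=lambda item: item[1], reverse=True):
        if abs(score - cutoff) <= margin:
            label = "borderline"
        elif score > cutoff:
            label = "accept"
        else:
            label = "reject"
        results.append((name, score, label))
    return results
```

With a cut-off of 50 and a margin of 5, scores of 52 and 48 both come back as `borderline`, where a hard threshold would have called one true and the other false.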

70
Quality Delusions
  • The bioinformatics does not have to be sound,
    because we only trust wet-lab results anyway.
  • Worrying about errors in experimental data but
    believing that derived data is always true.
  • Believing TrEMBL is always right.
  • Believing computational gene predictions are
    always correct.

72
Black Box Science
  • Producing irreproducible bioinformatics analyses
  • Not collecting the provenance of the analysis.
  • Not testing during software development.
  • Try re-running experiments described in the
    journal Bioinformatics from more than 5 years ago
  • UniGene
  • What is happening during UniGene clustering?
  • Human descriptions (via NCBI), are not exact.
  • The Human Transcriptome Map project and other
    microarray analysts ended up reclustering UniGene
    (Marco Roos).
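Collecting the provenance of an analysis need not be elaborate. A minimal sketch in Python, assuming provenance here means "enough to re-run and compare": the tool and its version, the parameters used, a checksum of the input and a timestamp. The function name and the record's field names are hypothetical, and this is not the provenance model of any particular workflow system.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def provenance_record(
    tool: str, version: str, parameters: dict, input_bytes: bytes
) -> dict:
    """Build a small, JSON-serialisable description of one analysis run."""
    return {
        "tool": tool,
        "tool_version": version,
        # Sorted so that two runs with the same settings compare equal.
        "parameters": dict(sorted(parameters.items())),
        # A checksum identifies the exact input without storing it here.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "python_version": platform.python_version(),
        "run_at_utc": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    record = provenance_record(
        "example-aligner",  # placeholder tool name
        "0.0.0",  # placeholder version string
        {"threshold": 0.001, "matrix": "BLOSUM62"},
        b">seq1\nMKV\n",
    )
    print(json.dumps(record, indent=2))
```

Storing such a record next to every result file makes it at least possible to ask, years later, which inputs and settings produced a given output.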

73
No experiment is reproducible.
  • Wyszowski's Law

An experiment is reproducible until another
laboratory tries to repeat it.
Alexander Kohn
74
Sin 7
  • Instant Gratification
  • Greed? Gluttony?
  • Always the immediate return.
  • Never investing for the future.
  • The quick and dirty fix.
  • Refusing to model or abstract.
  • Refusing to plan for recording and exchanging.
  • Just getting the next quick fix.
  • The pressure to deliver now and pay later

www.CartoonStock.com
75
Hackery
  • Deliver now, pay later
  • Producing crap, non-reusable software because
    only the biological results matter for
    publication X.
  • Collect! Analyse! Er... now what?
  • Spaghetti-ism
  • Over-indulgence in Perl
  • Over-indulgence in ASCII Art flat files.
  • Modelling a system by hacking up XSD fragments on
    a whiteboard.
  • Writing perl scripts that resemble my high-school
    BASIC of the 80s.

76
I am sure one could reuse large parts of
re-annotation for building transcriptome maps, if
they only used workflows and ontologies.
  • Marco Roos
  • A Biologist and Bioinformatician
  • VL-e Project, Amsterdam

77
Bioinformaticians have reached the standards of
the 1980s, while computer scientists are working
on the standards of the 2020s, leaving roughly 40
years to bridge.
  • Marco Roos
  • A Biologist and Bioinformatician
  • VL-e Project, Amsterdam

78
Blind faith in XML
  • It's in XML, thus all data integration problems
    are solved.
  • Er... no.
  • All those vocabularies e.g. SBML, GenBank XML etc
  • The good thing about XML is that it is human
    readable.
  • Arrrrgh!
  • Insisting that XML is not text.
  • Insisting that XML is text

XML
79
Blind Faith in Foo.
  • There's a new thing to use.
  • we don't understand it yet.
  • so it sucks up all the stuff we already know we
    don't understand.
  • Lack of appreciation about exactly what the new
    technology addresses in itself before trying to
    make it work for us.

80
Pioneering development methods
  • Development by anecdote
  • I heard in the pub that the way to go was Foo.
  • Though I have no idea what Foo is or why it is
    the way to go.
  • Design by hacking
  • It would be better if I wrote the script I need
    so I know what it does, how it does it and how to
    modify it later because I haven't specified what
    it was supposed to do in the first place.
  • Hmmm... We call that Extreme Programming or
    Emergent Semantics or Web 2.0 in CS?

81
Open Source Blinkers
  • Why does Open source have special merit?
  • Commercial solutions with added special sauce can
    rock too.
  • Shall I duck?

82
Sin Summary
Reinvention
Parochialism and Insularity
Scientific method Sloth
Exceptionalism
Autonomy or death!
Churn
Instant Gratification
Vanity, Pride and Narcissism
Monolith Megalomania
Maybe only one original sin in bioinformatics.
83
Can we become less sinful? Why do these sins
exist?
  • Are bioinformaticians particularly naughty?
  • No naughtier than Computer Scientists.
  • And it's all very hard.
  • Though they are naughty

84
Why?
  • Selfish Scientist Self-interested Scientist
  • Reputation, need to get results right now, win.
  • Fear of dependency, fear of being left behind.
  • Understand the incentives and barriers to
    adoption.
  • Bioinformatics as it is practiced
  • Social and funding structure perpetuates this.
  • Production vs Research.
  • Real, inherent issues. It is hard.
  • Hybrid exhaustion and pressure.
  • Biology Computing Bioinformatics

85
Luddism? Surely not!
  • Refusing to have biology go beyond a cottage
    industry.
  • Being scared to do it properly.
  • Railing against big science
  • The cult of amateurism.

Stevens
86
Research Production Confusion
  • Novelty vs Standards
  • Neither the funding nor the social structures of
    bioinformatics allow us to treat these two
    differently in any principled manner
  • How do you get funding for production software
    other than claiming to be researching stuff?
  • How do you get a publication out of a bit of
    research software without claiming a potential
    user-base?

87
Practical Steps?
  • Create means to share know-how
  • Understanding outside my expertise. e.g. sources
    of error.
  • A comprehensive catalogue of web services
  • A Facebook for workflow builders.
  • Learn from others. Even Computer Science. And
    other Sciences.
  • Try and create a culture of raising quality.
    Somehow.

88
FaceBook Bazaar for Workflow e-Scientists
Trials start August 2007!
myexperiment.org
89
Delivery Bulge
90
Practical Steps for IT Platforms?
  • Stop building monolithic solutions
  • Strong force in business enterprises
  • Component-ise Bioinformatics
  • Loosely coupled systems
  • Stable APIs, standardised metadata.
  • Design to combine.
  • Sort out the bloody naming/id problem
  • If you can't agree, agree on the bridge.
  • Raise the level of abstraction
  • Less Perl, more workflows ?
  • Enable users to extract the data they need
    without hassling you.

91
Practical Steps?
  • Presume and design for incremental change
  • Minimise disruption.
  • Presume others use our stuff
  • And respect that
  • Describe to build Trust
  • Presume others add value to our stuff
  • Be easily part of loosely coupled systems.
    Lightweight programming models.
  • Presume, and enable, content and function
    mashing.

92
Web 2.0 Design Patterns
  • The Long Tail
  • Data is the Next Intel Inside
  • Users Add Value
  • Network Effects by Default
  • Some Rights Reserved
  • The Perpetual Beta
  • Cooperate, Don't Control
  • Software Above the Level of a Single Device
  • http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html

26/2/2007 myExperiment Slide 92
93
Practical Steps?
  • Presume scientific practice naughtiness
  • Try to deal with it, or expose it?
  • Transparency and accurate collection and
    reporting.
  • Provenance.
  • A prerequisite to publication.
  • The end of Black Box Science.
  • Peer pressure.
  • E.g. Workflows, but will a scientist give away
    their secrets or expose their mistakes?

94
The Final Word
  • Sin writes histories, goodness is silent.
  • Thomas Fuller