DAS2: Next Generation Distributed Annotation System

1
DAS/2: Next Generation Distributed Annotation System
  • Gregg Helt (1), Steve Chervitz (1), Tony Cox (2), Andrew
    Dalke (3), Allen Day (4), Ed Erwin (1), Ed Griffiths (2), and
    Lincoln Stein (4)

(1) Affymetrix, Inc.
(2) Sanger Institute
(3) Dalke Scientific
(4) Cold Spring Harbor Laboratory
(5) University of Alabama
2
Distributed Annotation System (DAS) Overview
  • A specification designed for sharing genome
    annotations
  • Defines client requests and server responses
  • Simplified Web Services approach: HTTP GET, URLs,
    XML
  • Intended to be simple to implement
  • No central annotation authority
  • Intended to support client-side integration of
    annotations from different servers
  • First draft specification: Spring 2000
  • Last major change to DAS1 was in Spring 2002
  • Grant from NIH awarded June 2004 for development
    of next-generation DAS/2

3
DAS: Multiple Servers, Multiple Clients
[Figure labels: AC003027, M10154, AC005122]
4
Widespread Adoption of DAS/1
  • Server implementations: Dazzle, ProServer, LDAS
  • Server sites: Ensembl, UCSC, TIGR, KEGG, WormBase, Affymetrix,
    etc.
  • Clients: GBrowse, Ensembl, Dasty, IGB
  • Libraries: BioPerl, BioJava, JDAS
  • DAS extensions:
      • GeneDAS (non-positional annotations)
      • DAS web services registry
      • SPICE (protein structures)
      • DALEC (asynchronous analysis)

5
Ensembl is an ensemble of DAS servers
6
GBrowse on Ensembl
7
Distributed GBrowse
[Figure: diagram with labels MODs, GBrowse 1, GBrowse 2, My GBrowse, Ensembl, UCSC, and DAS links]
8
DAS Limitations
  • No ontology (controlled vocabulary) of feature
    types.
  • Is a gene from DAS server 1 the same as a
    gene from DAS server 2?
  • Not particularly extensible.
  • Ambiguous semantics for retrieving features that
    overlap a range on the genome.

9
Development of DAS/2 Specification
  • Enhancements have largely been motivated by
    initial discussions on the DAS mailing list.
  • Series of RFCs collected
  • Though informal, still a long process!
  • Most recent DAS/2 draft specification is
    available at http://biodas.org/documents/das2/das2_protocol.html
    (tied to CVS repository), so anyone can review and comment
  • Feedback from the DAS developer and user
    communities will continue to guide future
    iterations of the DAS/2 specification

10
Preserving DAS1 Strengths in DAS/2
  • Specification is independent of implementation
  • Many server implementations
  • Many client implementations
  • Simple, simple, simple
  • HTTP for transport
  • URLs for queries
  • XML for responses
  • REST-like style
  • Ontologies are integral
  • Focus on location-based annotations of
    biological sequences

11
Basic DAS/2 Queries
  • Sources query: what genomes and versions of those
    genomes are available?
  • http://server/das/genome
  • Regions query: what annotated sequences are
    available for a given version of a genome?
  • http://server/das/genome/genome/version/region
  • Types query: what annotation types are available
    for a given genome version?
  • http://server/das/genome/genome/version/type
  • Range query: return all annotations of a given
    type that overlap a genomic region
  • http://server/das/genome/genome/version/feature?overlaps=seq/min:max;type=type
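
The four query types above could be assembled by a client roughly as follows. This is only a sketch under stated assumptions: the helper names (`das2_urls`, `range_query`) are hypothetical, the server name, genome, and version in the usage note are made up, and the punctuation of the range query (`=`, `:`, `;`) is assumed, since the slide transcript does not preserve it reliably. The current draft specification is the authority on the real syntax.

```python
from urllib.parse import quote


def das2_urls(server, genome, version):
    """Build the query URLs described on the slide (URL layout assumed)."""
    base = f"http://{server}/das/genome"
    versioned = f"{base}/{quote(genome)}/{quote(version)}"
    return {
        "sources": base,                     # which genomes and versions exist
        "regions": f"{versioned}/region",    # annotated sequences for a version
        "types": f"{versioned}/type",        # annotation types for a version
        "features": f"{versioned}/feature",  # base URL for range queries
    }


def range_query(server, genome, version, seq, start, end, feature_type):
    """Build a range query for one feature type overlapping seq/start:end.

    The 'overlaps=...;type=...' punctuation is an assumption.
    """
    features = das2_urls(server, genome, version)["features"]
    return (
        f"{features}?overlaps={quote(seq)}/{start}:{end}"
        f";type={quote(feature_type)}"
    )
```

For example, `range_query("example.org", "yeast", "S228C", "chrI", 1000, 2000, "gene")` builds a feature query for genes overlapping positions 1000 to 2000 of a hypothetical sequence `chrI`.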

12
DAS/2 Enhancements: Ontologies
  • All features are required to be described by an
    ontology
  • What is the feature?
  • Gene, mRNA, transposable_element
  • What are attributes of the feature?
  • Polycistronic_mRNA, programmed_frameshift
  • Sequence ontology (SO) is the default
    (song.sourceforge.net)
  • Can be changed & extended
  • 500 terms in all
  • Standard OBO format
  • Feature hierarchy allows features to be contained
    within others, e.g. gene -> mRNA -> CDS

13
DAS/2 Enhancements: Performance
  • One of the biggest complaints about DAS1:
    very verbose annotation XML
  • DAS/2 Solution 1: Refactoring annotation XML
  • Much smaller minimum footprint
  • DAS/2 Solution 2: Alternative return formats
  • All servers can return defined das2xml annotation
    format
  • Servers can also specify additional return
    formats per annotation type
  • Clients can choose from alternative formats if
    they desire
  • Not restricted to XML, or even text
  • Examples: GFF3, BED, PSL, GAME
  • Extreme performance improvements possible

14
DAS/2 Enhancements: Resolving Ambiguities
Example: Ambiguous Range Queries
Overlap or containment? Parent based or separate?
[Figure: a query range x:y and the responses of Server 1 through Server 4]
15
DAS/2 Solution 1: remove spec ambiguity
  • Specify that if parent meets region filter, also
    return all children
  • Specify whether overlap, containment, etc.
  • Add different region filters for different
    possibilities
  • Overlaps
  • Contains
  • Within
  • Identical
  • Allow boolean combinations of these and other
    filters in the query URL
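
The region filters and the parent/child rule above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Feature` class and function names are hypothetical, coordinates are treated as half-open intervals, "within" is read as "the feature lies inside the query range", and "contains" as "the feature spans the whole query range". The specification itself is the authority on the actual semantics, and boolean combinations of filters are not covered.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    name: str
    start: int
    end: int  # half-open interval [start, end), an assumption
    children: List["Feature"] = field(default_factory=list)


def overlaps(f, lo, hi):
    return f.start < hi and f.end > lo


def within(f, lo, hi):
    # Feature lies entirely inside the query range (assumed reading).
    return lo <= f.start and f.end <= hi


def contains(f, lo, hi):
    # Feature spans the entire query range (assumed reading).
    return f.start <= lo and hi <= f.end


def identical(f, lo, hi):
    return f.start == lo and f.end == hi


FILTERS = {
    "overlaps": overlaps,
    "within": within,
    "contains": contains,
    "identical": identical,
}


def with_descendants(f):
    """Return the feature followed by all of its descendants."""
    out = [f]
    for child in f.children:
        out.extend(with_descendants(child))
    return out


def query(top_level_features, filter_name, lo, hi):
    """If a parent meets the region filter, return it with all its children."""
    test = FILTERS[filter_name]
    hits = []
    for f in top_level_features:
        if test(f, lo, hi):
            hits.extend(with_descendants(f))
    return hits
```

Note that in this sketch a child is returned whenever its top-level parent matches, even if the child alone would not pass the filter. That is the behaviour the first bullet describes.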

16
DAS/2 filter spec allows client query optimization
[Figure: queries QueryL, QueryR, and QueryC along a sequence, with positions x, y, L, and R marked]
  • Keep track of the overlap bounds of all previous
    queries
  • Instead of filter overlaps=S/x:y, use filter
    overlaps=S/x:y;within=S/L:R
  • If annotation A is not contained within L:R, then
    either:
      i) its bounds cross L, in which case it must
         overlap QueryL
      ii) its bounds cross R, in which case it must
          overlap QueryR
      iii) both
  • Therefore, if the client has used this approach for
    all previous queries (and restricts other filtering
    to a single type filter), then QueryC returns no
    annotations that were already returned by a
    previous query
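
The argument above can be checked with a toy simulation. This is a sketch, not a client implementation: the annotation tuples, coordinates, and the `server_query` function are all hypothetical, intervals are assumed to be half-open, and the "server" is just a list filtered in memory. QueryL covers 0:100 (so L = 100), QueryR covers 200:300 (so R = 200), and QueryC asks for 50:250.

```python
# Toy annotations as (name, start, end); all values are made up.
ANNOTATIONS = [
    ("A", 10, 40),    # only in QueryL
    ("B", 90, 120),   # crosses L
    ("C", 120, 180),  # entirely inside the gap L:R
    ("D", 190, 220),  # crosses R
    ("E", 250, 290),  # only in QueryR
    ("F", 80, 230),   # crosses both L and R
]


def overlaps(annotation, lo, hi):
    _, start, end = annotation
    return start < hi and end > lo


def within(annotation, lo, hi):
    _, start, end = annotation
    return lo <= start and end <= hi


def server_query(annotations, overlap_range, within_range=None):
    """Simulate a feature query with an overlaps filter and an optional
    within filter combined with it."""
    hits = [a for a in annotations if overlaps(a, *overlap_range)]
    if within_range is not None:
        hits = [a for a in hits if within(a, *within_range)]
    return hits


# Earlier queries made by the client.
query_l = server_query(ANNOTATIONS, (0, 100))
query_r = server_query(ANNOTATIONS, (200, 300))

# New query x:y = 50:250, restricted to the uncovered gap L:R = 100:200.
query_c = server_query(ANNOTATIONS, (50, 250), within_range=(100, 200))

# What an unrestricted overlaps query would have sent back.
plain_c = server_query(ANNOTATIONS, (50, 250))
```

In this toy data, the restricted QueryC transfers one annotation instead of four, and every annotation the unrestricted query would have returned is already held by the client from QueryL or QueryR.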
17
Solution 2: DAS/2 Validation Suite
  • Verify whether a DAS/2 server is compliant with
    the specification.
  • Critical for improving interoperability between
    clients and servers developed by different
    groups.
  • Standalone tool and web application, written in
    Python
  • Enter a URL for a DAS/2 server
  • Get an HTML report about DAS/2 compliance
  • Reference dataset
  • Sequences and annotations that can be loaded into
    a DAS/2 server for additional validation of
    server implementation/configuration
  • Source code available at http://sourceforge.net/projects/dasypus/

18
More DAS/2 Spec Enhancements
  • Writeback spec to allow DAS/2 clients to create
    and edit annotations on DAS/2 servers
  • Still undergoing development
  • IDs are URIs
  • Could be LSIDs or URLs
  • Allows for integration with many other web
    technologies
  • xml:base
  • Feature hierarchies
  • And more

19
DAS/2 UML Modeling
20
DAS/2 Reference Server
  • Implemented as an Apache/mod_perl 2.0 content
    handler
  • Annotations are converted to BioPerl objects and
    subsequently text-transformed using Template
    Toolkit.
  • Data sources are accessible using an adaptor
    pattern
  • Current adaptor is for Chado (GMOD schema)
  • Soon any data source accessible to the Generic
    Genome Browser (GBrowse) will be accessible
    from the DAS/2 server.
  • Flat-file formats: GenBank, GFF
  • Databases: Ensembl, GMOD/Chado, Bio::DB::GFF
  • DAS1 web service
  • Source code released under Artistic License
  • Available via anonymous CVS as part of GMOD
  • See http://www.gmod.org for access details.

21
DAS/2 Reference Client
  • Implemented in Java in the Integrated Genome
    Browser
  • IGB (ig-bee) - A visualization app developed at
    Affymetrix
  • Supports data loading via a variety of formats
    and mechanisms
  • Full implementation of DAS/2 read client, partial
    implementation of DAS/2 writeback.
  • Handles large amounts of genome-scale data
  • Loads hundreds of thousands of sequence
    annotations at once
  • Loads dense quantitative graphs with millions of
    data points
  • Maintains real-time responsiveness to user
    interactions
  • Includes features to support exploratory data
    analysis
  • Plugin architecture for customized extensions
  • Source code released under Common Public License
  • http://genoviz.sourceforge.net

22
Upcoming DAS/2 Developments
  • Writeback protocol
  • Ready for implementation
  • Registry and discovery protocol
  • Various alternatives have been discussed
  • A playpen server available at EBI

23
DAS/2 & caBIG
  • Project 1: Add DAS/2 support to caCORE
  • Will enable caCORE to read genome annotations
    from DAS/2 servers and re-export them as caCORE
    objects.
  • Uses a flexible plug-in architecture that will be
    generally useful.
  • Project 2: Export HapMap database as DAS/2
  • Will make HapMap human variation data available
    to the caBIG grid via caCORE.
  • Project 3: Export Vertebrate Promoter Database as
    DAS/2
  • Will make curated information on vertebrate
    transcription factors and their binding sites
    available to the caBIG grid via caCORE.

24
Acknowledgements
  • DAS & DAS2 mailing list participants!
  • Lincoln Stein (CSHL)
  • Ed Erwin, Steve Chervitz, Eric Blossom, Hari
    Tammara (Affymetrix)
  • Tony Cox, Ed Griffiths (Sanger Institute)
  • Allen Day, Brian O'Connor (UCLA)
  • Andrew Dalke (Dalke Consulting)
  • Suzanna Lewis (LBL)
  • Ann Loraine (U. of Alabama)