Title: Genome Browsing and AJAX: Advancing GMOD's GBrowse to the Next Level
1. Genome Browsing and AJAX: Advancing GMOD's GBrowse to the Next Level
- by Andrew Uzilov
- for Holmes Lab group meeting
- October 13, 2006
2. Genome browsers are not just a good idea... they are The Way
- Necessary for visualizing and understanding large amounts of genomic information, such as:
  - genome organization (including synteny)
  - alternative splicing (multiple splice forms)
  - comparing predictions against known data
- Some insights may be more obvious visually than through flat files, database queries, or custom programs written for data analysis
3. What else are they good for?
- Retrieving information
  - point-and-click on features of interest is a better interface for exploring
  - BLAST and other database searches get you a visual of the genomic context, not just text
- Preparing pretty pictures for publications
  - an annotation upload feature is a must for this
- A better interface for community annotation (a genome wiki)
  - a WYSIWYG genome feature editor?
4. What are the problems with current genome browsers? (1)
- Most Web-based genome browsers are static HTML pages: the entire page is refreshed (HTML generated anew by the server) any time the user navigates, changes layout, etc.
  - the delay incurred while the page reloads is annoying
  - the vertical scroll position is lost, which is also annoying
- Sometimes JavaScript or Flash is used to provide some dynamic content (you can change certain things without triggering a reload), but navigation usually still causes reloads
5. What are the problems with current genome browsers? (2)
- Most (all?) Web-based genome browsers rely on the "server renders graphics from scratch upon client request" model
  - images for genome views are rendered on demand, after the user navigates, changes layout, etc., making the user wait
  - rendered images aren't reused or reusable: they are not saved or cached, but rendered anew each time
- There are difficulties in preparing pre-rendered content
6. Pre-rendering difficulties
- It would be ideal to have all images rendered ahead of time and simply serve them up, with no live rendering overhead or delay
- The obstacle is that the pixel width of a genome view is quite, quite, quite large: the view can't be rendered as a single image without running out of memory
- It can't be rendered in small parts either, as BioPerl/GBrowse will not produce parts that concatenate into a nicely contiguous genome view
  - and probably neither will other rendering frameworks
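A back-of-the-envelope estimate shows why a single image is out of the question. The sketch below is purely illustrative: the 100 Mbp sequence, the 10 bp-per-pixel zoom, the 800-pixel height, the 4 bytes per pixel, and the 1000-pixel tile width are all assumed numbers, not GBrowse settings.

```javascript
// Back-of-the-envelope memory estimate for one genome view.
// Every input below is an illustrative assumption, not a GBrowse setting.
function viewMemory({ sequenceLengthBp, bpPerPixel, heightPx, bytesPerPixel, tileWidthPx }) {
  const widthPx = Math.ceil(sequenceLengthBp / bpPerPixel);
  return {
    widthPx,
    // Uncompressed bitmap for the whole view at once.
    wholeImageBytes: widthPx * heightPx * bytesPerPixel,
    // The same view split into fixed-width tiles.
    tileCount: Math.ceil(widthPx / tileWidthPx),
    tileBytes: tileWidthPx * heightPx * bytesPerPixel,
  };
}

// Hypothetical 100 Mbp sequence at 10 bp per pixel, 800 px tall,
// 4 bytes per pixel (assumed), cut into 1000 px tiles.
const estimate = viewMemory({
  sequenceLengthBp: 100e6,
  bpPerPixel: 10,
  heightPx: 800,
  bytesPerPixel: 4,
  tileWidthPx: 1000,
});
console.log(estimate.widthPx);                       // 10000000 px wide
console.log(estimate.wholeImageBytes / 2 ** 30);     // roughly 29.8 GiB as one image
console.log(estimate.tileCount, estimate.tileBytes); // 10000 tiles of 3.2 MB each
```

Under these assumptions each tile stays small no matter how long the sequence is; only the number of tiles grows.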
7. The Insight
- Make BioPerl think it is rendering a massively wide single image, but instead intercept all rendering calls to the graphics library (i.e. the graphics primitives) and store them in a database
- Now we can query the database for only a manageable subset of primitives (i.e. only those required for a single tile, the basic unit out of which the total genome view is constructed) and render only those, producing a reasonably-sized tile image
  - primitives' coordinates are offset if they start in tiles prior to (left of) the current one
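The tile-selection and offset step can be sketched as follows. This is a minimal JavaScript illustration, not TiledImage.pm: it assumes each stored primitive is a plain object whose horizontal extent is already normalized (x1 <= x2) in whole-view pixel coordinates, that tiles have a fixed width, and that the drawing library clips whatever falls outside the tile, which is what makes a shifted primitive with a negative x1 safe to draw.

```javascript
// Select the primitives that touch one tile and shift them into the tile's
// local coordinate system. A primitive is assumed to look like
// { op, x1, y1, x2, y2 } in whole-view pixels, with x1 <= x2.
function primitivesForTile(primitives, tileIndex, tileWidthPx) {
  const left = tileIndex * tileWidthPx;
  const right = left + tileWidthPx; // exclusive right edge of the tile
  return primitives
    .filter((p) => p.x2 >= left && p.x1 < right)                // horizontal overlap
    .map((p) => ({ ...p, x1: p.x1 - left, x2: p.x2 - left })); // offset into tile
}

const prims = [
  { op: "filledRectangle", x1: 950, y1: 10, x2: 1050, y2: 20 }, // spans tiles 0 and 1
  { op: "line", x1: 1500, y1: 0, x2: 1500, y2: 100 },           // tile 1 only
  { op: "line", x1: 2500, y1: 0, x2: 2500, y2: 100 },           // tile 2 only
];
const tile1 = primitivesForTile(prims, 1, 1000);
// tile1 holds the rectangle (now x1 = -50, x2 = 50) and the first line (x = 500).
// The rectangle's negative x1 is the case of a primitive that starts in an earlier tile.
```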
8. Enter THE WEB 2.0 GBROWSE
- Basic philosophy
  - the client is an application
    - maintains internal state (no longer a static page)
    - knows how to render itself (the old way: the server generates the whole page's HTML for you)
    - knows how to change itself dynamically (the old way: the server generates new HTML for you)
  - the server is... well, literally, a server
    - pre-processes as much as possible to reduce session-time delays/overhead
    - off-loads as much work as possible onto the client
  - all this reduces server load, speeding up the session
- A less trite name is under review
9. So how does it work?
- Based on GMOD's GBrowse framework
- The server-side GBrowse Perl code for rendering genome views (i.e. the gbrowse_img CGI script) was hacked apart and put back together as a standalone pre-rendering program that uses the BioPerl and GD libraries in the same way as GBrowse
  - except that TiledImage.pm intercepts calls in order to cache primitives and render tiles
- The client was written in JavaScript from scratch
10. Server side - the original way
The GBrowse framework (from Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S (2002). The generic genome browser: a building block for a model organism system database. Genome Research 12(10), 1599-1610.)
11. Server side - the new way
(or at least one currently proposed new way, subject to change)
12. Server-side features currently implemented (1)
- MySQL database
- Tile rendering (Perl): TiledImage.pm
  - intercepts Bio::Graphics::Panel calls to GD::Image (using AUTOLOAD) and stores them in the database, keyed on the bounding box to which they apply
  - now, to find out which GD primitives need to be rendered for some tile, we just search the database for all primitives overlapping the tile's bounding box
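The interception trick has a close JavaScript analogue: a Proxy object can catch calls to methods it does not define, much as Perl's AUTOLOAD does. The sketch below illustrates the idea and is not a port of TiledImage.pm. It assumes, as holds for GD::Image's line, rectangle, and filledRectangle, that the first four arguments of every call are x1, y1, x2, y2; text and other primitives would need their own bounding-box rules. The overlaps() test is the same four-comparison condition a bounding-box-keyed database query would need.

```javascript
// Record drawing calls instead of executing them. A Proxy's `get` trap
// fires for every property lookup, so any method name (line,
// filledRectangle, ...) resolves to a function that stores the call and
// its bounding box. Assumes the arguments start with x1, y1, x2, y2.
function makeRecordingImage(store) {
  return new Proxy({}, {
    get(_target, method) {
      return (...args) => {
        const [ax, ay, bx, by] = args;
        store.push({
          method: String(method),
          args,
          bbox: {
            x1: Math.min(ax, bx), y1: Math.min(ay, by),
            x2: Math.max(ax, bx), y2: Math.max(ay, by),
          },
        });
      };
    },
  });
}

// Inclusive bounding-box overlap test between a primitive and a tile.
function overlaps(b, tile) {
  return b.x1 <= tile.x2 && b.x2 >= tile.x1 && b.y1 <= tile.y2 && b.y2 >= tile.y1;
}

const recorded = [];
const img = makeRecordingImage(recorded);
img.line(10, 5, 3, 40, "black");               // a line drawn right-to-left
img.filledRectangle(1200, 0, 1300, 10, "red");
const tile0 = { x1: 0, y1: 0, x2: 999, y2: 99 };
const needed = recorded.filter((p) => overlaps(p.bbox, tile0)).map((p) => p.method);
// needed is ["line"]: the rectangle lies entirely to the right of tile 0.
```

Because every property lookup on the proxy yields a function, it should not be handed to code that probes arbitrary properties (console.log, for instance).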
13. Server-side features currently implemented (2)
- Tile rendering (Perl): generate-tiles.pl
  - uses TiledImage.pm to
    - fill the MySQL database with graphics primitives
    - render tiles from a given database of primitives
    - generate XML containing client configuration info
    - do any combination of the above, including on subsets of tiles (this allows rendering to be broken down into jobs, suitable for running on multiple CPUs)
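Breaking rendering into jobs amounts to partitioning the tile indices. The sketch below splits a tile range into near-equal contiguous chunks. How each chunk would be passed to generate-tiles.pl is deliberately left out, since its command-line options are not described here.

```javascript
// Split tiles 0..totalTiles-1 into at most jobCount contiguous, near-equal
// chunks, one per job/CPU. The first (totalTiles % jobCount) jobs get one
// extra tile; no empty jobs are produced.
function splitIntoJobs(totalTiles, jobCount) {
  if (!Number.isInteger(jobCount) || jobCount < 1) {
    throw new RangeError("jobCount must be a positive integer");
  }
  const base = Math.floor(totalTiles / jobCount);
  let extra = totalTiles % jobCount;
  const jobs = [];
  let start = 0;
  while (start < totalTiles && jobs.length < jobCount) {
    const size = base + (extra > 0 ? 1 : 0);
    if (extra > 0) extra--;
    jobs.push({ firstTile: start, lastTile: start + size - 1 });
    start += size;
  }
  return jobs;
}

const jobs = splitIntoJobs(10, 3);
// jobs cover tiles 0-3, 4-6 and 7-9
```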
14. Server-side features to be implemented, short-term (1)
- Tile serving module
  - pre-fill the database with all primitives, but only render them selectively as users request tiles
  - store already rendered tiles to prevent re-rendering
  - maybe idle server CPU cycles can be used to render arbitrary tiles, continually filling the tile space
- generate-tiles.pl should process externally rendered tiles, e.g.
  - dotplot tracks
  - histograms
  - supporting material for features, such as pictures of fluorescent gene expression profiles, physiological changes due to gene knockout experiments, etc.
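The "render on request, then keep" policy is essentially memoization keyed by tile identity. The sketch below is an in-memory illustration only. Here `renderTile` stands in for the real Perl renderer, the `track/zoom/index` key format is an assumption, and a real module would persist tiles to disk or a database rather than a Map.

```javascript
// Tile cache: a tile is rendered the first time it is requested and then
// served from the cache. renderTile is a stand-in for the real renderer.
class TileServer {
  constructor(renderTile) {
    this.renderTile = renderTile; // (track, zoom, index) => tile data
    this.cache = new Map();
    this.renderCount = 0;
  }
  key(track, zoom, index) {
    return `${track}/${zoom}/${index}`;
  }
  getTile(track, zoom, index) {
    const k = this.key(track, zoom, index);
    if (!this.cache.has(k)) {
      this.cache.set(k, this.renderTile(track, zoom, index));
      this.renderCount++;
    }
    return this.cache.get(k);
  }
  // Idle-time pre-rendering: fill in any not-yet-rendered tiles in a range.
  prerender(track, zoom, first, last) {
    for (let i = first; i <= last; i++) this.getTile(track, zoom, i);
  }
}

const server = new TileServer((track, zoom, index) => `${track}-z${zoom}-${index}.png`);
server.getTile("genes", 3, 42);
server.getTile("genes", 3, 42);       // served from the cache, not re-rendered
server.prerender("genes", 3, 40, 44); // idle-time fill; tile 42 is skipped
console.log(server.renderCount);      // 5 (tiles 40-44, each rendered once)
```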
15. Server-side features to be implemented, short-term (2)
- Database optimizations
  - the query for primitives is the slow step: rendering takes much longer than loading the database
  - key on tile number (1 key), not bounding box (4 keys)
  - (go to whiteboard)
  - gridlines account for >50% of the primitives, but are the same for every tile
    - maybe load gridlines for just one tile, and return them for every query?
- GUI to wrap generate-tiles.pl
  - should be built into the Web interface for annotation upload
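Both database ideas can be illustrated with an in-memory index standing in for a MySQL table (an assumption-laden illustration, not a schema proposal). Each primitive is filed once under every tile number it touches, so a lookup needs one integer key instead of four bounding-box comparisons. Primitives flagged with a hypothetical `isGridline` property are stored once and returned with every tile, which assumes gridlines are kept in tile-local coordinates so that they really are identical for every tile.

```javascript
// Index primitives by tile number, storing gridlines only once.
// Primitives are assumed to carry whole-view x1 <= x2; isGridline is a
// hypothetical flag used only for this illustration.
function buildTileIndex(primitives, tileWidthPx) {
  const byTile = new Map(); // tile number -> primitives touching that tile
  const gridlines = [];     // identical in every tile, so stored once
  for (const p of primitives) {
    if (p.isGridline) {
      gridlines.push(p);
      continue;
    }
    const first = Math.floor(p.x1 / tileWidthPx);
    const last = Math.floor(p.x2 / tileWidthPx);
    for (let t = first; t <= last; t++) {
      if (!byTile.has(t)) byTile.set(t, []);
      byTile.get(t).push(p);
    }
  }
  return {
    // One integer key per lookup; gridlines are appended to every result.
    query(tile) {
      return [...(byTile.get(tile) ?? []), ...gridlines];
    },
  };
}

const index = buildTileIndex([
  { id: "exon", x1: 950, x2: 1050 },              // touches tiles 0 and 1
  { id: "label", x1: 10, x2: 20 },                // tile 0 only
  { id: "grid", x1: 0, x2: 0, isGridline: true }, // shared by all tiles
], 1000);
console.log(index.query(0).map((p) => p.id)); // [ 'exon', 'label', 'grid' ]
```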
16. Server-side features to be implemented, long-term (1)
- How should feature info be served?
  - short-term solution
    - have generate-tiles.pl produce an XML file with feature data (bounding boxes, etc.) for each tile, since it has easy access to that info
    - the client loads and parses an XML file for each tile
  - more robust solution
    - needs a database of features (but what kind?)
    - necessary to support efficient searching for features
    - necessary for community annotation, because people will be changing the feature info constantly
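For the short-term solution, a per-tile XML file might resemble the output of this sketch. It is written in JavaScript for consistency with the other examples, although generate-tiles.pl itself is Perl. The element and attribute names (`tile`, `feature`, `name`, `type`, `x1` to `y2`) are invented for illustration, since no format is specified.

```javascript
// Escape the characters that are special inside XML attribute values.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;") // must come first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Emit one small XML document listing the features drawn on a tile, with
// tile-local pixel bounding boxes. Element/attribute names are hypothetical.
function tileFeaturesXml(tileIndex, features) {
  const lines = [`<tile index="${tileIndex}">`];
  for (const f of features) {
    lines.push(
      `  <feature name="${escapeXml(f.name)}" type="${escapeXml(f.type)}"` +
        ` x1="${f.x1}" y1="${f.y1}" x2="${f.x2}" y2="${f.y2}"/>`
    );
  }
  lines.push("</tile>");
  return lines.join("\n");
}

const xml = tileFeaturesXml(3, [
  { name: "geneA", type: "gene", x1: 12, y1: 40, x2: 230, y2: 52 }, // hypothetical feature
]);
console.log(xml);
// <tile index="3">
//   <feature name="geneA" type="gene" x1="12" y1="40" x2="230" y2="52"/>
// </tile>
```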
17. Server-side features to be implemented, long-term (2)
- Community annotation
  - concurrency is an issue (updating changes, notifying the client of updates since the start of the session, locking features for editing, etc.)
  - feature upload seems (to me) to be a special case of community annotation and should use its framework
  - quality control (registration, security)
  - are there existing database schemas or other frameworks that can serve this purpose?
18. Client-side features currently implemented
- Dragging works, but with bugs when large views are involved (the fix is non-trivial, in progress)
- Jumping, centering, zooming, and dynamic resizing also work
- Tracks can be toggled hidden/visible
- Hovering labels (either all on, or popping up on mouseover), with adjustable transparency
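Jumping, centering, and zooming all reduce to converting between genome coordinates and pixel offsets. Here is a minimal sketch. It assumes zoom levels are expressed as base pairs per pixel and that the view's scroll position is the pixel offset of its left edge; both are assumptions about representation, not a description of the actual client code.

```javascript
// Pixel offset of the viewport's left edge that puts positionBp in the
// middle of the view, clamped so the view never runs off either end.
function leftOffsetToCenter(positionBp, bpPerPixel, viewportWidthPx, totalWidthPx) {
  const centerPx = positionBp / bpPerPixel;
  const left = Math.round(centerPx - viewportWidthPx / 2);
  return Math.max(0, Math.min(left, totalWidthPx - viewportWidthPx));
}

// Zoom to a new scale while keeping the same genome position centered.
function zoomAroundCenter(leftPx, viewportWidthPx, oldBpPerPixel, newBpPerPixel, totalBp) {
  const centerBp = (leftPx + viewportWidthPx / 2) * oldBpPerPixel;
  const totalWidthPx = Math.ceil(totalBp / newBpPerPixel);
  return leftOffsetToCenter(centerBp, newBpPerPixel, viewportWidthPx, totalWidthPx);
}

console.log(leftOffsetToCenter(50000, 10, 800, 100000)); // 4600
console.log(zoomAroundCenter(4600, 800, 10, 100, 1e6));  // 100
```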
19. Brief aside: what is AJAX?
- Asynchronous JavaScript and XML
- A combination of technologies to make clients behave more like applications
  - JavaScript client code that uses XMLHttpRequest to asynchronously query the server for things
  - implies XHTML (well-formed HTML) and DHTML (DOM manipulation), and the use of CSS
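The core AJAX pattern is short. This minimal sketch requests a resource asynchronously with XMLHttpRequest and hands the result to a callback without reloading the page. The XHR constructor is a parameter (defaulting to the browser's) only so the function can be exercised outside a browser with a stand-in object. The tile URL in the commented browser call is hypothetical.

```javascript
// Asynchronously GET a URL and pass the parsed XML (or the raw text) to
// onSuccess, or the HTTP status to onError, without a page reload.
function fetchXml(url, onSuccess, onError, Xhr = globalThis.XMLHttpRequest) {
  const xhr = new Xhr();
  xhr.open("GET", url, true); // true = asynchronous
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return; // 4 = DONE
    if (xhr.status === 200) {
      // responseXML is the parsed document when the server sends XML.
      onSuccess(xhr.responseXML || xhr.responseText);
    } else {
      onError(xhr.status);
    }
  };
  xhr.send(null);
}

// In a browser (hypothetical URL):
// fetchXml("tiles/genes/zoom3/tile42.xml", (doc) => { /* update the DOM */ }, console.error);
```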
20. Why I am avoiding existing AJAX frameworks
- They are useful for flashy graphics effects, but don't help with the engine of the client (except maybe Prototype and the Google Web Toolkit)
  - but GWT is closed source and an early version; even its online demo has bugs
- None support dragging, track management, tile caching, etc., so that needs to be done ourselves (and has, so far, consumed most of the effort)
- But I'm willing to consider them for
  - adding graphics effects after the engine is more developed
  - asynchronous communication with the server
21. DOM: from XHTML to a tree
<table>
  <tbody>
    <tr>
      <td>Shady Grove</td>
      <td>Aeolian</td>
    </tr>
    <tr>
      <td>Over the River, Charlie</td>
      <td>Dorian</td>
    </tr>
  </tbody>
</table>
This is from a W3C page, so you know it's right: http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/introduction.html
22. Client: the nitty-gritty (1)
- Code is broken down into multiple JavaScript "classes"
  - by which I mean just separate .js files, most of which are object instances that provide
    - class functions and methods
    - namespaces
    - modularity, organization
- Static classes (standalone file, no instance)
  - Other.js: misc. helpers
  - Load.js: when it is loaded, loads the XML and instantiates all objects in the correct order
23. Client: the nitty-gritty (2)
- The Component system
  - an attempt to bring order to chaos
  - each discrete UI element (e.g. main view, navigation panel, panel with track control buttons, etc.) is a Component
    - the code for each Component is in its own file
  - Components are
    - instantiated by Load.js
    - connected through ComponentInterface.js
    - should not modify other Components' properties directly (although JavaScript allows this), but rather use ComponentInterface.js, for sanity!
24. Client: the nitty-gritty (3)
- Each Component must define
  - a constructor
  - renderComponent()
    - returns the DOM node for this Component
    - will (eventually) be called by Load.js, which will then take the DOM node and append it to the document
- Once fully implemented, there will be no need for content in the <body> of the XHTML: JavaScript will render everything dynamically
  - which allows for the possibility of a server-side config file specifying the client-side layout, further removing users from the necessity of doing any programming
25. Client: the nitty-gritty (4)
- Each Component must also define
  - getState()
    - for setting bookmarks/history
  - setState()
    - for restoring bookmarked/history points
- Some bookmarking object will eventually use the above methods to store/load bookmarked states by polling all Components
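Together, the required methods give every Component the same outline. The sketch below is hypothetical: `NavigationComponentSketch`, its `position` property, and the `Bookmarker` class illustrate the conventions on these slides. They are not the contents of ComponentTemplate.js or of any existing file. renderComponent() takes the document as a parameter, defaulting to the browser's, only so the sketch can be exercised without a browser.

```javascript
// A Component following the required outline: constructor,
// renderComponent(), getState(), setState(). Names are illustrative.
class NavigationComponentSketch {
  constructor() {
    this.position = 0; // current genome position in bp (hypothetical property)
  }
  renderComponent(doc = globalThis.document) {
    const div = doc.createElement("div");
    div.textContent = `Position: ${this.position}`;
    return div; // Load.js would append this node to the document
  }
  getState() {
    return { position: this.position };
  }
  setState(state) {
    this.position = state.position;
  }
}

// A bookmarking object that polls every registered Component.
class Bookmarker {
  constructor(components) {
    this.components = components; // name -> Component
  }
  save() {
    const snapshot = {};
    for (const [name, c] of Object.entries(this.components)) snapshot[name] = c.getState();
    return snapshot;
  }
  restore(snapshot) {
    for (const [name, c] of Object.entries(this.components)) {
      if (name in snapshot) c.setState(snapshot[name]);
    }
  }
}

const nav = new NavigationComponentSketch();
const bookmarker = new Bookmarker({ navigation: nav });
nav.position = 1234;
const saved = bookmarker.save(); // { navigation: { position: 1234 } }
nav.position = 0;
bookmarker.restore(saved);
console.log(nav.position); // 1234
```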
26. Client: the nitty-gritty (5)
- A programmer who writes a new Component has to
  - add accessors/modifiers for its object properties to ComponentInterface.js
  - add calls to its constructor and renderComponent() to Load.js
- However, eventually, accessor/modifier construction will be done automatically by ComponentInterface.js (in theory, it's possible)
  - this means that a Component programmer never has to look outside their own Component code, using the API for the other Components to access/modify them
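The claim that accessor construction can be automated is plausible in JavaScript, because an object's properties can be enumerated at run time. Here is a minimal sketch of the idea. `makeInterface` and the `get<Name>`/`set<Name>` naming scheme are hypothetical, not what ComponentInterface.js does.

```javascript
// Generate get<Name>/set<Name> functions for each own data property of
// each registered Component, so other Components go through this facade
// instead of touching properties directly.
function makeInterface(components) {
  const api = {};
  for (const [compName, comp] of Object.entries(components)) {
    const facade = {};
    for (const prop of Object.keys(comp)) {
      if (typeof comp[prop] === "function") continue; // skip methods stored on the instance
      const cap = prop.charAt(0).toUpperCase() + prop.slice(1);
      facade[`get${cap}`] = () => comp[prop];
      facade[`set${cap}`] = (value) => { comp[prop] = value; };
    }
    api[compName] = facade;
  }
  return api;
}

const viewerState = { zoomLevel: 3, trackCount: 5 };
const api = makeInterface({ viewer: viewerState });
api.viewer.setZoomLevel(4);
console.log(api.viewer.getZoomLevel(), viewerState.zoomLevel); // 4 4
```

Only properties present at registration time get accessors in this sketch. A fuller version would also need to handle properties added later.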
27. Gods below! Was it really necessary to take 5 slides for this?
- Yes, because object-oriented programming in JavaScript requires discipline, and it's important to work these things out early on
  - with multiple people working on this code, it needs to be compartmentalized somehow
  - otherwise, debugging may cause blood pressure to rise to dangerous levels (although Venkman's debugger will alleviate that)
- See ComponentTemplate.js in SVN for a template, with guidelines on how to write a Component of your own
28. Client: the nitty-gritty (6)
- Current Components
  - ViewerComponent.js
  - NavigationComponent.js
  - TrackControlComponent.js
  - DebugComponent.js
- Other classes
  - View.js
    - stores limited information about the current view
    - intended to be the class that manages feature info fetching, caching, etc.
  - TracksAndZooms.js
    - just a data structure to hold config info from the XML file and current state info about what zoom level we're at and which tracks are hidden/visible
  - these should really be prototypes for other objects
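A data structure of the TracksAndZooms kind can be quite small. In this sketch the static configuration (track names, zoom levels) would come from the XML config file, and the current state (zoom index, hidden tracks) sits alongside it. The class and field names are assumptions, and the real TracksAndZooms.js may be organized differently.

```javascript
// Static config plus mutable view state; field names are hypothetical.
class TracksAndZoomsSketch {
  constructor(trackNames, zoomLevels) {
    this.trackNames = trackNames; // e.g. from the XML config file
    this.zoomLevels = zoomLevels; // bp per pixel, finest first (assumed)
    this.zoomIndex = 0;
    this.hidden = new Set();
  }
  currentZoom() {
    return this.zoomLevels[this.zoomIndex];
  }
  zoomOut() {
    this.zoomIndex = Math.min(this.zoomIndex + 1, this.zoomLevels.length - 1);
  }
  zoomIn() {
    this.zoomIndex = Math.max(this.zoomIndex - 1, 0);
  }
  toggleTrack(name) {
    if (this.hidden.has(name)) this.hidden.delete(name);
    else this.hidden.add(name);
  }
  visibleTracks() {
    return this.trackNames.filter((t) => !this.hidden.has(t));
  }
}

const tz = new TracksAndZoomsSketch(["genes", "ESTs", "repeats"], [1, 10, 100]);
tz.zoomOut();
tz.toggleTrack("ESTs");
console.log(tz.currentZoom(), tz.visibleTracks()); // 10 [ 'genes', 'repeats' ]
```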
29. Client: the nitty-gritty (7)
- Dragging and genome view events
  - brace yourself, this is going to be ugly
  - (go to whiteboard)
- Ideally, no one should have to deal with this after it's been programmed, as it will be wrapped up in ViewerComponent.js, and navigation can be accomplished by using accessors to move the view around
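The messy event handling aside, one core calculation behind dragging is working out which tiles intersect the viewport after each move, plus a margin of pre-fetched neighbours. This sketch shows only that arithmetic. The function names, the one-tile margin, and the convention that a positive mouse movement (dx) moves the view toward the start of the genome are all assumptions.

```javascript
// Tiles that intersect the viewport, plus marginTiles on each side,
// clipped to the valid range 0..tileCount-1.
function tilesToShow(leftPx, viewportWidthPx, tileWidthPx, tileCount, marginTiles = 1) {
  const first = Math.floor(leftPx / tileWidthPx) - marginTiles;
  const last = Math.floor((leftPx + viewportWidthPx - 1) / tileWidthPx) + marginTiles;
  const tiles = [];
  for (let i = Math.max(0, first); i <= Math.min(tileCount - 1, last); i++) tiles.push(i);
  return tiles;
}

// Dragging the content right by dx pixels reveals what lies to its left.
function dragTo(leftPx, dx, viewportWidthPx, totalWidthPx) {
  return Math.max(0, Math.min(leftPx - dx, totalWidthPx - viewportWidthPx));
}

let left = 2800;
left = dragTo(left, 300, 800, 10000);                // mouse moved 300 px to the right
console.log(left, tilesToShow(left, 800, 1000, 10)); // 2500 [ 1, 2, 3, 4 ]
```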
30. Client-side features to be implemented (1)
- The client has no idea what the information on the tiles actually means (no knowledge of where and what the features are)
  - it must be made aware of what it is displaying; the short-term solution is to load this from an XML file for each tile (remember the server-side to-do?)
  - the client JavaScript class for doing this can later be replaced with something more sophisticated, e.g. an XSL transformation (XSLT) and XHR for fetching feature info from a database; there are many possibilities
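Once per-tile feature data is loaded, making the client "aware" of what it displays can start with simple hit-testing: finding the feature under the mouse pointer. This sketch assumes each tile's features carry tile-local pixel bounding boxes. It also assumes the mouse x coordinate is given in whole-view pixels (the pointer position plus the view's scroll offset). The field names are hypothetical.

```javascript
// Return the feature on a tile whose bounding box contains the pointer,
// or null. mouseX is in whole-view pixels; feature boxes are tile-local.
function featureAt(tileFeatures, tileIndex, tileWidthPx, mouseX, mouseY) {
  const localX = mouseX - tileIndex * tileWidthPx; // convert to tile-local x
  return (
    tileFeatures.find(
      (f) => localX >= f.x1 && localX <= f.x2 && mouseY >= f.y1 && mouseY <= f.y2
    ) ?? null
  );
}

const onTile3 = [{ name: "geneA", x1: 12, y1: 40, x2: 230, y2: 52 }]; // hypothetical
const hit = featureAt(onTile3, 3, 1000, 3100, 45);
console.log(hit && hit.name); // geneA (local x = 3100 - 3000 = 100)
```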
31. Client-side features to be implemented (2)
- How can the user actually see the information about features?
  - a pop-up menu on mouseover?
    - would have an option to pop up details in a separate window, manage annotation, etc.
  - displayed in a sidebar à la Google Local?
- There is no one True Answer, so maybe we can build all of the above and provide options to toggle between them
32. Client-side features to be implemented (3)
- Feature search
  - by feature, keyword, regular expression, etc.
  - search results display
    - pop open a table (load a Component) displaying the results; clicking on a result in the table will center the view on it
    - multiple views can open up stacked on one another
      - can be used to display synteny: link them all to a single horizontal dragging ruler
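The three search modes might look like this in a client-side sketch. It assumes features are plain objects with a `name`, an optional `description`, and a `start` coordinate that a results table could pass to the view's centering code. The field names and the `searchFeatures` function are hypothetical.

```javascript
// Search features by exact name, case-insensitive keyword (substring of
// name or description), or case-insensitive regular expression.
function searchFeatures(features, query, mode = "keyword") {
  let test;
  if (mode === "name") {
    test = (f) => f.name === query;
  } else if (mode === "regex") {
    const re = new RegExp(query, "i"); // no "g" flag, so test() is stateless
    test = (f) => re.test(f.name) || re.test(f.description ?? "");
  } else {
    const q = query.toLowerCase();
    test = (f) =>
      f.name.toLowerCase().includes(q) || (f.description ?? "").toLowerCase().includes(q);
  }
  return features.filter(test);
}

const features = [
  { name: "geneA", start: 100, description: "putative kinase" },
  { name: "geneB", start: 5000, description: "phosphatase" },
  { name: "tRNA-1", start: 9000 },
];
console.log(searchFeatures(features, "kinase").map((f) => f.name));              // [ 'geneA' ]
console.log(searchFeatures(features, "^gene[AB]$", "regex").map((f) => f.name)); // [ 'geneA', 'geneB' ]
```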
33. Client-side features to be implemented (4)
- Posting things to the server (what protocol? XML? JSON?)
  - community annotation
  - feature upload
  - automated bug reporting system
- Needs to check for changes in the server-side database, rendered tiles, etc., since community annotation may change the contents you are looking at
34. Client-side features to be implemented (5)
- Bookmarking
  - the entire state of the browser encoded in the URL
    - can use Web browser bookmarking to save it
  - internal tracking of history
    - internal back/forward buttons, log of what you did
  - every Component must have getState() and setState() defined to implement this
  - JSON would be perfect for this, no?
- Output current view to image (PNG, SVG, etc.)
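JSON does fit the bookmarking idea well. This sketch serializes a combined state object into the URL fragment, where the browser's own bookmarks can save it; changing only the fragment does not make the browser reload the page. It also keeps an internal back/forward history. The `#state=` prefix, the example URL, and the `ViewHistory` class are arbitrary choices for illustration.

```javascript
// Encode/decode a state object as JSON in the URL fragment.
function stateToUrl(baseUrl, state) {
  return `${baseUrl}#state=${encodeURIComponent(JSON.stringify(state))}`;
}

function stateFromUrl(url) {
  const marker = "#state=";
  const i = url.indexOf(marker);
  return i === -1 ? null : JSON.parse(decodeURIComponent(url.slice(i + marker.length)));
}

// Internal history with back/forward; pushing after going back discards
// the "forward" entries, like a Web browser does.
class ViewHistory {
  constructor() {
    this.entries = [];
    this.pos = -1;
  }
  push(state) {
    this.entries = this.entries.slice(0, this.pos + 1);
    this.entries.push(state);
    this.pos = this.entries.length - 1;
  }
  back() {
    if (this.pos > 0) this.pos--;
    return this.entries[this.pos];
  }
  forward() {
    if (this.pos < this.entries.length - 1) this.pos++;
    return this.entries[this.pos];
  }
}

const link = stateToUrl("http://example.org/browser.html", { navigation: { position: 1234 } });
console.log(stateFromUrl(link)); // { navigation: { position: 1234 } }
```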
35. Client-side features to be implemented (6)
- The genome browser as a plug-in
  - runs in a little box on someone else's website to show an example
36. This was written to the sounds of
- Tortoise - Standards
- Jazz History Vol. 5 Now As Then-Revival
- Tosca - Suzuki
- Aphex Twin - I Care Because You Do
- Squarepusher - Ultravisitor