Title: Abstract
Some of the growth curve data from multi-plate
(Omnilog) experiments have been uploaded to
Biofiles. One of these data sets from an
experiment investigating the effect of different
levels of zinc concentration on several strains
of D. vulgaris has been loaded into the EDSS
database and can be accessed on-line. The
averages of the replicates for each treatment and
all organisms, or all organisms and each
treatment, can be viewed as text files, HTML
tables, or plots using the interface shown in
Fig. 4A. Fig. 4B shows a sample plot for six
treatments applied to D. vulgaris Hildenborough.
In addition to the plot, approximate generation
times have been calculated and are shown in the
table below the plot. These are approximate
because the calculations are based on intensity
measurements, not on cell density. The markers on
the curves show the locations of the times used
to calculate the generation times. Note that
the plot and the table of generation times are
generated from data stored in EDSS when the user
requests them.
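The poster does not give the formula behind these approximate generation times. A common choice, assumed here rather than taken from the source, treats growth between two marked times t1 and t2 as exponential and substitutes the measured intensity I for the cell density N:

```latex
N(t_2) = N(t_1)\,2^{(t_2 - t_1)/g}
\quad\Longrightarrow\quad
g = \frac{(t_2 - t_1)\,\ln 2}{\ln\bigl(N(t_2)/N(t_1)\bigr)}
\approx \frac{(t_2 - t_1)\,\ln 2}{\ln\bigl(I(t_2)/I(t_1)\bigr)}
```

For example, an intensity that doubles between hours 4 and 6 gives g ≈ 2 h. The estimate is only as good as the assumption that intensity is proportional to cell density over the chosen interval.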
Abstract
The Hazen Lab at Lawrence Berkeley
National Laboratory produces biomass for use by
other VIMSS laboratories. In addition, the Hazen
Lab is conducting growth curve experiments to
examine the effect of varying treatments or
stressor levels on different strains of
bacteria. As information and data continue to be
generated, there is a pressing need for recording
information about samples and experiment
designs, for making the information and data
available to other project participants, and for
tools that facilitate quality assurance/quality
control assessments. To address these needs, we
have developed two prototype Web interfaces
collectively referred to as the Hazen Lab Data
Center. The Web interfaces provide access to
data stored in a database management system and
tools for data processing and data display. In
addition, we are developing easy-to-use, but
comprehensive spreadsheet templates to record
both general and detailed experiment design
information.
We are in the process of designing an Excel
template for Omnilog growth curve experiments.
These experiments involve up to fifty 96-well
plates. In general, these experiments will
involve a single organism grown on a single
medium and subjected to a number of experimental
factors or several levels of treatment. It is
possible, however, that experiments may be run
that use different strains of an organism, or the
same organism grown on different media. Because
of the large number of wells involved, it is neither
possible nor desirable to have a template entry for
every well on each plate. Instead, we have
opted to use a per-plate layout (Fig. 7A) for
experimental factors that do not vary over a
single plate (e.g., organism, medium,
environment), and a per-column layout (Fig. 7B)
for factors that vary from column to column on a
single plate.
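Combining the two layouts yields a description of every well without anyone typing 96 entries per plate. The sketch below is a hypothetical Python illustration, not the authors' parser: the dictionary keys (`organism`, `medium`, `zinc_level`), the function name `expand_plate`, and the standard 8 x 12 plate geometry (rows A-H, columns 1-12) are all assumptions.

```python
from string import ascii_uppercase

ROWS = ascii_uppercase[:8]   # 96-well plate rows "A" to "H" (assumed geometry)
COLUMNS = range(1, 13)       # 96-well plate columns 1 to 12


def expand_plate(plate_factors, column_factors):
    """Combine per-plate and per-column factors into one record per well.

    plate_factors:  factors that do not vary over the plate (per-plate layout)
    column_factors: column number -> factors for that column (per-column layout)
    Returns a dict mapping a well name such as "A1" to its merged factors.
    """
    wells = {}
    for col in COLUMNS:
        per_column = column_factors.get(col, {})
        for row in ROWS:
            # A per-column value overrides a per-plate value with the same key.
            wells[f"{row}{col}"] = {**plate_factors, **per_column}
    return wells


# Hypothetical example: one organism and medium for the whole plate,
# six treatment levels, each applied to two adjacent columns.
plate = {"organism": "D. vulgaris Hildenborough", "medium": "medium A"}
columns = {col: {"zinc_level": (col - 1) // 2} for col in COLUMNS}

wells = expand_plate(plate, columns)
print(len(wells))                   # 96
print(wells["H12"]["zinc_level"])   # 5
```

Keeping the per-plate and per-column records separate, and merging them only when needed, mirrors the template's goal of recording each factor once.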
More detailed information about organisms and
inoculums is captured on a separate sheet (Fig.
7C). The sheet is divided into two sections.
The first section is for basic information, e.g.,
name, locally used identifiers, source, etc., and
the second section is for comments.
The draft template also includes separate sheets
to record:
- experiment design information (e.g., Fig. 5B),
- a list of filenames that correspond to each plate (the Omnilog generates one file per plate), and
- information about wells that showed no growth.
The template also includes a sheet to
cover the case in which more than one experiment
factor may vary over a plate. In the case of
phenotype microarray plates, on which every well
on a plate is unique, e.g., different carbon
sources, different phosphate sources, etc., the
vendor who supplies the plate also provides an
Excel file with descriptions of every well on the
plate. Both the Biolog and the Omnilog Excel
template files are read using the Perl module
Spreadsheet::ParseExcel, available from the
Comprehensive Perl Archive Network (CPAN) at
www.cpan.org.
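Spreadsheet::ParseExcel is a Perl module, and the authors' parsing code is not shown. Purely to illustrate the same step, the hypothetical Python sketch below reads the plate-to-filename sheet after it has been exported to CSV, using only the standard `csv` module. The column headings `plate` and `filename`, the file names, and the function `read_plate_files` are invented for the example.

```python
import csv
import io

# Hypothetical CSV export of the sheet that lists one Omnilog data file per plate.
# The headings and file names are invented for illustration.
SHEET = """plate,filename
1,plate01.txt
2,plate02.txt
,
3,plate03.txt
"""


def read_plate_files(text):
    """Return {plate number: data file name} from a CSV sheet with a header row."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(text)):
        plate = (row.get("plate") or "").strip()
        filename = (row.get("filename") or "").strip()
        if not plate:                      # skip blank rows left in the template
            continue
        if not filename:
            raise ValueError(f"plate {plate} has no data file")
        number = int(plate)
        if number in mapping:
            raise ValueError(f"plate {number} is listed twice")
        mapping[number] = filename
    return mapping


plate_files = read_plate_files(SHEET)
print(plate_files)  # {1: 'plate01.txt', 2: 'plate02.txt', 3: 'plate03.txt'}
```

Rejecting duplicate or missing entries at parse time catches template mistakes before any data are matched to the wrong plate.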
Introduction
Access to the on-line Hazen Lab
Data Center begins with the Web page shown in
Fig. 1. From this page a user can select one of
three lists of information: biomass production
experiments, growth curve experiments, or
organisms. The information about the experiments
and the data from the growth curve experiments are
stored in the Experimental Data Storage System
(EDSS), a relational database (Olken, et al.,
2004) for capturing experiment data and metadata.
When a user selects one of the lists, the EDSS
database is queried and a detailed list such as
Fig. 2 is retrieved and formatted. The list of
biomass production experiments shown in Fig. 2
can be sorted by clicking on the column headings.
The list of recipients (col. 6) can be expanded
and sorted to show, for example, which samples
were shipped to which labs. More detailed
information about an experiment can be viewed by
clicking on one of the experiment links in the
first column. Fig. 3, in the section below, shows
the detailed experiment design information that
has been stored in EDSS. Note that the list of
experiments shown in Fig. 2 does not include all
biomass production experiments conducted by the
Hazen Lab.
Currently, biomass production experiments are
described in detail by the Hazen Lab staff using
Excel spreadsheets and text descriptions inserted
into the cells in the spreadsheet. While using
Excel spreadsheets is an effective way of keeping
large amounts of disparate information organized
and together in one file, Excel spreadsheets do
not provide a convenient way to display and print
the information, nor are the contents of Excel
files easily searched. As part of EDSS, we
created tables in the database to hold the
different kinds of information collectively known
as the experiment design. Different aspects of
the experiment design, such as 'inoculum
culture', 'treatment and control', etc. are
stored as lab procedures. Storing the experiment
design in the database means that specific
information can be searched for. For example,
one can search for all experiments for which a
batch culture method was used, or all experiments
for which samples were shipped, or all
experiments for which oxygen was the
stressor. Moreover, the Web interface to the
Hazen Lab Data Center provides quick and convenient
access to the experimental design information.
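A query of the kind described above might look like the following. This is a hypothetical sketch using Python's built-in `sqlite3` module and a deliberately simplified two-table schema; the table and column names (`experiment`, `lab_procedure`, `attribute`, `value`), the function `experiments_where`, and the sample rows are assumptions and do not reproduce the EDSS schema (Olken, et al., 2004).

```python
import sqlite3

# Hypothetical, much-simplified schema; the real EDSS tables differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE lab_procedure (
    experiment_id TEXT REFERENCES experiment(id),
    kind TEXT,       -- e.g. 'inoculum culture', 'treatment and control'
    attribute TEXT,  -- e.g. 'culture method', 'stressor'
    value TEXT
);
""")
conn.executemany("INSERT INTO experiment VALUES (?, ?)", [
    ("EXP-1", "example 1"), ("EXP-2", "example 2"), ("EXP-3", "example 3"),
])
conn.executemany("INSERT INTO lab_procedure VALUES (?, ?, ?, ?)", [
    ("EXP-1", "inoculum culture", "culture method", "batch"),
    ("EXP-1", "treatment and control", "stressor", "oxygen"),
    ("EXP-2", "inoculum culture", "culture method", "batch"),
    ("EXP-2", "treatment and control", "stressor", "heat"),
    ("EXP-3", "inoculum culture", "culture method", "continuous"),
    ("EXP-3", "treatment and control", "stressor", "oxygen"),
])


def experiments_where(conn, attribute, value):
    """Return the IDs of experiments having a lab procedure with attribute = value."""
    rows = conn.execute(
        """SELECT DISTINCT e.id
             FROM experiment AS e
             JOIN lab_procedure AS p ON p.experiment_id = e.id
            WHERE p.attribute = ? AND p.value = ?
            ORDER BY e.id""",
        (attribute, value),
    )
    return [row[0] for row in rows]


print(experiments_where(conn, "stressor", "oxygen"))       # ['EXP-1', 'EXP-3']
print(experiments_where(conn, "culture method", "batch"))  # ['EXP-1', 'EXP-2']
```

The point is that once each procedure is a row rather than text inside a spreadsheet cell, "all experiments with oxygen as the stressor" becomes a single parameterized query.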
Fig. 3 shows how the experiment design for a
biomass production experiment is displayed using
the Web interface. Note that the display of
information is generated when the user requests
it from information pulled from the database.
What is displayed is not a previously generated,
i.e., static, Web page; the page is generated 'on
the fly'.
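Generating a page on demand amounts to turning the rows returned by a query into HTML each time the page is requested. A minimal, hypothetical Python sketch follows; `render_design` and the (section, text) row format are assumptions, not part of the Hazen Lab Data Center code. Escaping each value with `html.escape` keeps free-text notes copied from spreadsheets from breaking the page.

```python
import html


def render_design(experiment_id, procedures):
    """Build the HTML for one experiment design at request time.

    procedures: (section, text) pairs, e.g. rows just returned by a database query.
    Nothing is cached, so every call reflects whatever is currently stored.
    """
    parts = [f"<h1>Experiment {html.escape(experiment_id)}</h1>", "<table>"]
    for section, text in procedures:
        parts.append(
            f"<tr><th>{html.escape(section)}</th><td>{html.escape(text)}</td></tr>"
        )
    parts.append("</table>")
    return "\n".join(parts)


# Hypothetical rows for one experiment.
page = render_design("EXP-1", [
    ("inoculum culture", "batch culture"),
    ("treatment and control", "O2 < 1% in control"),
])
print(page)
```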
An Excel template was developed to capture the
experimental design and treatment and replicate
information for single-plate (Biolog) growth
curve experiments. The advantages of using the
Excel template are:
- the electronic format makes it easier to share, organize, and save information about the experiment;
- recording information in the template takes little more time than writing it in a lab notebook;
- Excel files can be read by a parsing program;
- each page of the template can be formatted so that it fits on an 8.5 x 11 inch sheet of paper;
- Excel is ubiquitous.
The Excel sheet to the right (Fig.
5A) records information about which treatment is
in which well on a Biolog plate. The sheet below
and to the right (Fig. 5B) records information
about the design of the experiment. A Web-based
interface was developed so that the user can
upload the template and data files (Fig. 6A). A
computer program parses both files, and based on
the replicate design in the template file,
calculates the average and standard deviation of
each group of replicates for each treatment (Fig.
6B). The results may then be downloaded to the
user's computer, displayed in an HTML table, or
plotted (Fig. 6C).
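The averaging step can be sketched as follows, assuming the template has already been parsed into a mapping from each treatment to its replicate wells and the data file into one list of readings per well. The Summary notes that the standard deviation is also used to flag replicate groups that include outliers; the function names, the data layout, and the fixed standard-deviation threshold used for that flag here are hypothetical choices for illustration.

```python
from statistics import mean, stdev


def summarize_replicates(replicate_design, curves):
    """Average the replicate wells of each treatment at every time point.

    replicate_design: treatment -> list of well names (from the template file)
    curves:           well name -> readings, one per time point (from the data file)
    Returns treatment -> list of (mean, standard deviation), one pair per time point.
    """
    summary = {}
    for treatment, wells in replicate_design.items():
        series = [curves[well] for well in wells]
        if len({len(s) for s in series}) != 1:
            raise ValueError(f"replicates of {treatment!r} have different lengths")
        summary[treatment] = [
            (mean(values), stdev(values) if len(values) > 1 else 0.0)
            for values in zip(*series)  # one tuple of replicate readings per time point
        ]
    return summary


def flag_groups(summary, max_sd):
    """Return treatments whose standard deviation exceeds max_sd at any time point."""
    return sorted(
        treatment
        for treatment, points in summary.items()
        if any(sd > max_sd for _, sd in points)
    )


# Hypothetical readings at three time points; well B3 behaves unlike its replicates.
design = {"control": ["A1", "A2", "A3"], "treated": ["B1", "B2", "B3"]}
curves = {
    "A1": [10, 20, 40], "A2": [10, 22, 38], "A3": [10, 18, 42],
    "B1": [10, 12, 15], "B2": [10, 12, 15], "B3": [10, 30, 60],
}
summary = summarize_replicates(design, curves)
print(flag_groups(summary, max_sd=5.0))  # ['treated']
```

A fixed absolute threshold is the simplest possible rule; a relative measure such as the coefficient of variation may suit curves whose intensities span a wide range.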
Future Work
Data loading tools have already been
developed to upload the biomass production
experiment design information into EDSS.
Database tables and data loaders for the culture
and media information and the QA/QC data still
need to be developed, as well as a Web interface
for access and query tools. The template for the
Omnilog growth curve experiments needs to be
tested and refined, and a parser written to read
the template file and store the experiment design
information, replicate design, and organism and
inoculum information in the EDSS database. Once
that has been completed, the Hazen Lab Data
Center Web interface needs to be extended to
display more detailed information about the
growth curve experiments in the list of
experiments. At this time, the Web interface for
the Biolog growth curve data allows the user to
upload the template and data files, and it processes
and displays the data. An information/data load
function needs to be added to the Biolog Web
interface so that the experiment design
information in the template file and the growth
curve data can be loaded into the EDSS database
if the user chooses.
Recommendations
In general
terms, the goals of data management are to record
and preserve all important information and data
about an experiment and to organize and make
accessible the information and data. Recording
and preserving information and data are important
because without detailed information, an
experiment cannot be reproduced and questions
about the effect of experimental factors cannot
be answered. Organizing and making information
and data accessible facilitate data sharing,
contextual data analysis, and development of a
knowledge base. Unfortunately, there are few
widely accepted and widely used data standards
for the types of data being produced by VIMSS
experiments. As a result, there is little-to-no
standardization with respect to how even the most
basic information about experiments is being
recorded and reported. The templates that we
are developing help to address some of these
problems by providing easy-to-use, human- and
computer-readable templates for recording
information about experiments. Advantages of
using such a template are that the template can
be easily shared with colleagues, it provides a
standard format (but one that can be easily
refined), and its contents can be uploaded into a
database. Not only do we recommend that other
VIMSS project participants consider using a
similar approach, but we are willing to make the
templates that we have developed available for
others to use as a starting point in their own
labs, to work with others to help define the
related database schema, and to provide sample
Perl scripts to parse the templates.
Reference
Olken, F. and Keck, K. VIMSS Biological
Experimental Design Schema, Version 003i. June
30, 2004. Available from Frank Olken at
olken_at_lbl.gov.
Summary
The goals for developing the template files, Web interface, and software tools for the Hazen Lab data are to:
- capture the description of the experiment design in sufficient detail and in a consistent format;
- capture the description of the replicates in an easy-to-read, easy-to-parse format;
- average replicate data (and calculate the associated standard deviations) without tedious manipulation of the data in spreadsheets;
- provide Web-based display tools so that the information can be viewed on-line and data may be viewed in different formats;
- automate estimating generation times and minimum inhibitory concentration (MIC) values for growth curves.
Work is underway
to develop on-line access, processing, display,
and analysis tools consistent with the goals
above. On-line access is available for data sets
that have been loaded into the EDSS database.
These include experiment design information for a
subset of the biomass production experiments and
one growth curve data set. Access consists of
lists of experiments that have been loaded into
EDSS. Processing the growth curve data stored in
EDSS consists of averaging data for each set of
replicates. This has required a description of
what constitutes a set of replicates for a growth
curve experiment, and storing that information
together with the growth curve from each well on
each plate. In addition, the standard deviation
at each time point for each replicate group is
calculated to flag replicate groups that include
outliers. Display of information and data in
EDSS is data-specific. Information about a
biomass production experiment is extracted from
EDSS on demand and presented as formatted Web
pages (Fig. 3). Display of growth curve data
consists of text, HTML tables, or plots (Figs. 4
and 6). The only analysis program available
on-line at this time is a program that calculates
the approximate generation time from a growth
curve (Fig. 4). The calculation is approximate
because it uses the intensity measured by the
Omnilog or the optical density measured by the
Biolog as a surrogate for cell density. The
program calculates a range of generation times,
and the accompanying plot is marked with the
times used to calculate the generation times in
order to help the user choose which calculation
is the best. Another reason for developing the
program is that the growth curve data for the
phenotype microarray plates will consist of
hundreds of growth curves. Determining the
generation time by visual inspection of hundreds
of growth curves would be time consuming and
tedious. An automatic calculation will provide a
smaller starting set of growth curves that can be
examined in more detail or compared to one
another. The Web interfaces and the underlying
software are the beginning of an effort to
standardize an approach for uploading, storing,
processing, and displaying data. Such Web
interfaces cannot take the place of in-depth
analysis and data discovery tools, but they help to
close the gap between the generation and analysis
of data.
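The generation-time program mentioned in the Summary is not described in detail. The hypothetical Python sketch below shows one way to produce a range of estimates: it applies the exponential-growth approximation g = (t2 - t1) ln 2 / ln(I2/I1), with intensity standing in for cell density, to every pair of readings a fixed number of time steps apart, and returns the start and end times so they can be marked on a plot. The window size, the function names, and the sample readings are assumptions.

```python
import math


def generation_time(t1, i1, t2, i2):
    """Approximate doubling time between two readings, using intensity as a density surrogate.

    Returns None when the interval shows no growth, because the formula is undefined there.
    """
    if i1 <= 0 or i2 <= i1:
        return None
    return (t2 - t1) * math.log(2) / math.log(i2 / i1)


def generation_time_range(times, intensities, window=2):
    """Estimate generation times for every pair of readings `window` steps apart.

    Returns (t_start, t_end, g) triples so the times used can be marked on a plot.
    """
    estimates = []
    for start in range(len(times) - window):
        end = start + window
        g = generation_time(times[start], intensities[start], times[end], intensities[end])
        if g is not None:
            estimates.append((times[start], times[end], g))
    return estimates


# Hypothetical readings (hours, intensity units) from one well.
times = [0, 2, 4, 6, 8]
intensities = [10, 10, 20, 40, 50]

estimates = generation_time_range(times, intensities)
for t_start, t_end, g in estimates:
    print(f"{t_start}-{t_end} h: g = {g:.2f} h")

fastest = min(estimates, key=lambda e: e[2])
print(fastest[:2])  # (2, 6)
```

Picking the shortest estimate, as in the last lines, is only one possible heuristic (it favors the steepest part of the curve); the program described above leaves that choice to the user, and the same filtering could shrink hundreds of phenotype microarray curves to a smaller set for closer inspection.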