Monica Bradford - PowerPoint PPT Presentation


Transcript and Presenter's Notes

1
Science's Perspective on Assuring the Integrity
of Research Data
  • Monica Bradford
  • Executive Editor, Science
  • 16 April 2007

2
The journal and data integrity
  • Some background on policies at Science
  • Comments on the key issues as outlined in the
    Statement of Task
  • Observations on the Human Element
  • Suggested responses

3
General Observations Related to Data Policies
  • Science's policies have evolved over time, for
    example:
  • Depositing coordinates for structures without a
    hold of one year
  • No longer allow "data not shown," "in press," or
    "in prep" citations.

4
General Observations Related to Data Policies
  • Most often in response to consensus that develops
    in the scientific community
  • Much easier for a journal to enforce policies
    that have community buy-in. For example,
    microarray data: data should be presented in a
    MIAME-compliant standard format. Approved
    databases are Gene Expression Omnibus and
    ArrayExpress.
  • Community monitors compliance
  • Standards dependent on the context of how the
    data would be used
  • Large economic/public health/public policy impact
  • Basic research: not publishing lab notebooks

5
Data Characteristics
  • The amount of raw data per experiment has
    increased significantly
  • In many cases raw data does not make sense
    without processing or some sort of computation
  • Satellite data: correct for orbit, etc.
  • Microarray data
  • Ice melting data: devil in the details, hidden
    calibrations
  • Papers don't include raw data from instruments or
    subjects
  • Submitted data has been subjected to synthesis
    and analysis
  • No real assurances about the calibrations
  • Distributed responsibility among co-authors
    requiring trust about data handling

6
Complexity
  • The more complex the data sets and the more
    complex the deriving technologies, the less
    likely the reader is able to determine if there
    are intrinsic problems that will call into
    question the results.
  • Often only see the endpoint of experiments that
    require large computing times, such as molecular
    simulations.
  • Modeling presents unique challenges because the
    code is evolving over time.

7
Policies for Large Data Sets
  • Require deposition in public databases for large
    data sets (protein and DNA sequences, microarray
    data, atomic coordinates and structure factors)
  • Accession number must be included in publication
  • Deposited information must be released at time of
    publication

8
Policies for Large Data Sets
  • When an approved public database does not exist,
    the author is required to make data available
    from the institution's web site
  • Database must be kept intact and unchanged for 5
    years
  • Copy of database must be provided to Science and
    authors agree that Science may release the data
    if the authors do not provide access as
    stipulated
  • This solution is not ideal. An interactive
    website built by a research group, which may
    become more common as systems analyses grow, is
    not something we can archive. For example: 635 MB
    of raw data from a signal transduction pathway
    analysis; 5 DVDs of material from a website
    (transcriptional maps of 10 human chromosomes)

9
Policies for Large Data Sets
  • Our policies have evolved over time
  • MOU with authors of human genome and rice genome
    data provided for an escrowed copy at Science and
    the written agreement that we would make the data
    available if the commercial entities did not
    provide access
  • Now we would not have such an arrangement if an
    approved public database exists.
  • Concerns over the long-term financial stability
    of some of the public databases
  • Quality control, curation, and upgrading of the
    databases to new technologies are expensive but
    essential costs
  • Funders seem to like to support the creation, but
    not the maintenance of these databases
  • BIND lost funding, and PhysioNet is seeking support

10
Supplemental Data
  • Supplementary online material (SOM) has grown at
    a rapid pace. In 2000, about 15% of the papers
    had SOM, and by 2006 the percentage had grown to
    90%
  • Ability to thoroughly review SOM is questionable
    as the PDF can be very large (range from 1 to
    over 100 pages).
  • Availability of data in SOM has helped to catch
    errors in published papers
  • Long-term access to this material on journal
    sites needs to be assured
  • Lack of full-text search for our SOM limits its
    usefulness and retrieval.

11
Recent Changes in Response to Concerns
  • Figures for all revised manuscripts are being
    reviewed in Photoshop.
  • Initial review by editorial assistant
  • Concerns sent to Deputy Editor and Art Director
    for further examination
  • About one problem paper per month over the last
    year.
  • Authors are required to inform Science before
    acceptance of any restrictions on sharing of
    materials (MTAs, for example). Unreasonable
    restrictions may preclude publication.
  • Statements regarding data sharing are stronger.
    All data necessary for a reader of Science to
    understand and assess the conclusions of the
    manuscript must be available to any reader.

12
How do the issues manifest themselves to the
journal?
  • Decisions regarding data integrity have been made
    long before a paper is submitted.
  • Co-authors can seem to be blindsided by data
    handling practices of other authors.
  • No longer can assume that terminology that is
    shorthand for a series of steps means the same
    thing to all authors.
  • Fine line between producing publication-quality
    figures and data manipulation. Ethics training
    may not be in sync with modern lab practice
  • Specialization, massive data sets, and
    interdisciplinary, international teams have made
    it nearly impossible for co-authors to take joint
    responsibility for all results

13
How do the issues manifest themselves to the
journal?
  • Conflicts between MTAs and journal policies not
    thought through
  • Mergers and acquisitions can change the rules of
    the game and make material sharing more
    difficult.
  • No agreement on what constitutes reasonable
    requests.
  • Not clear that all fields agree that data must be
    made available for all uses, not just replication

14
Human Element
  • Responsibility for data generation is dispersed.
    Communication between co-authors is problematic:
    for 1-2 papers a week, we discover that not all
    authors have agreed to the submission.
  • Authors may come with different concepts about
    how science is performed
  • Discipline-based practices
  • Cultural practices
  • Assumptions made regarding common practices
  • Training and experience in different lab
    environments
  • Disconnect between PIs and scientists at the
    bench: we see repeated instances where close
    oversight is not occurring.
  • Money is tight; competition and pressure on
    scientists are growing

15
Final Observations
  • Our biggest problem is not the use of technology
    to defraud, but the fact that the way science is
    done has been changing.
  • Training mechanisms to protect the integrity of
    data must evolve in pace with the significant
    changes in the practice of science.
  • Journals react to problems pointed out by the
    community as articulated by authors, reviewers,
    editors and members.
  • Top journals can accelerate the acceptance of new
    community-driven standards, but we are not the
    cure-all

16
Final Observations
  • Support of public databases is essential.
    Archives for data of various kinds need to be
    expanded. Various groups have started databases
    for metagenomics, protein interactions, and human
    variation; these need continued support so we are
    sure they will be maintained and free.
  • Nature contacted 89 databases operating in 2000.
    Seven have folded and more than half are
    struggling financially. (Nature 435, 1010-1011,
    23 June 2005)