Title: Text mining and Open Access publishing
1. Text mining and Open Access publishing
Matthew Cockerill, Technical Director, BioMed Central
2. Summary
- What is Open Access publishing?
- Open Access publishing and text mining
- About BMC Bioinformatics
- The BioCreative supplement
3. Summary
- What is Open Access publishing?
- Open Access publishing and text mining
- About BMC Bioinformatics
- The BioCreative supplement
4. The current model of publishing scientific research
- Scientists carry out research
- They write up their results
- They submit them to a journal
- Other scientists act as peer reviewers and editorial advisers
- Finally, the publisher sells access to that research back to the scientific community
5. What's wrong with this status quo?
- Restricted access to scientific research is contrary to the interests of
  - the scientists who do the research
  - the funders who pay for it
  - society as a whole
- It is an historical artefact of the economics of print publishing
- It is a serious obstacle to mining full-text information
6. BioMed Central: the Open Access publisher
- Commercial organization
- Published first article in mid-2000
- Strict policy of immediate Open Access to all research articles
7. Growth of BioMed Central
8. Momentum for Open Access
- PubMed Central
- Public Library of Science
- Open Access declarations: Budapest/Bethesda/Berlin
- Software open-source movement
- Mass cancellation of titles from traditional publishers
9. BioMed Central's business model for Open Access publishing
- Keep costs down via
  - Online submission and peer review
  - Automated tools to streamline article processing, conversion and layout
- Processing charge (currently 525) for accepted articles
- No processing charge for authors at member institutions
10. Institutional membership
More than 400 institutions are members of BioMed Central. To name just a few:
- Caltech
- Cancer Research UK
- Columbia University
- Cornell University
- University of California
- Dana-Farber Cancer Institute
- Harvard University
- INSERM
- Imperial College
- Institut Pasteur
- John Innes Centre
- Johns Hopkins University
- Kyoto University
- Max Planck Institutes
- Memorial Sloan-Kettering Cancer Center
- MRC Laboratory of Molecular Biology
- National Institutes of Health
- National Institute for Medical Research
- NHS England
- Princeton University
- Rockefeller University
- TIGR
- TSRI
- Tufts University
- Wellcome Trust Sanger Institute
- University of Wisconsin
- World Health Organization
- Yale University
11. Summary
- What is Open Access publishing?
- Open Access publishing and text mining
- About BMC Bioinformatics
- The BioCreative supplement
12. Mining the full text
- Analysing the results of high-throughput experiments means biologists increasingly need text-mining tools
- PubMed is currently the primary resource for text mining (it's what's available), but
  - Abstracts omit critical information
  - Techniques developed for abstracts may not make effective use of the extra information in full text
- Fully Open Access corpora, in standard XML formats, will help
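To make the abstract-versus-full-text point concrete, here is a minimal sketch of one of the simplest text-mining baselines: counting protein names that co-occur in the same sentence. It assumes a small hypothetical lexicon (`PROTEINS`), a naive sentence splitter and invented toy sentences; it illustrates the idea only and is not the method used by any system mentioned in this talk.

```python
import itertools
import re
from collections import Counter

# Hypothetical lexicon; a real system would draw on a curated database.
PROTEINS = {"BRCA2", "MDM2", "RAD51", "TP53"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def cooccurring_pairs(text, lexicon=PROTEINS):
    """Count unordered pairs of lexicon names that share a sentence."""
    counts = Counter()
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
        for pair in itertools.combinations(sorted(tokens & lexicon), 2):
            counts[pair] += 1
    return counts


# Toy input: the "full text" adds a methods-style sentence the abstract lacks.
abstract = "BRCA2 regulates RAD51. TP53 is also discussed."
full_text = abstract + " In Methods, MDM2 was co-immunoprecipitated with TP53."
print(cooccurring_pairs(abstract))
print(cooccurring_pairs(full_text))
```

On this toy input the abstract alone yields only the BRCA2/RAD51 pair, while the MDM2/TP53 pair is visible only once the methods-style sentence from the full text is included.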
13. Data mining: BioMed Central
http://www.biomedcentral.com/info/about/datamining
- Entire corpus of full-text XML downloadable by FTP as a single zip file
- Various groups are working with the data
  - E.g. Pre-BIND (automatic extraction of possible protein-protein interaction information from full text)
- No restrictions on redistribution
  - This means other groups can use the same corpus to repeat and build on results
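Because the corpus is distributed as a single zip file of XML documents, it can be processed without first unpacking it to disk. The sketch below uses only the Python standard library. The element name `article-title` is an assumption borrowed from NLM-style article XML and may need changing to match the DTD the corpus actually uses; `make_sample_corpus` is a hypothetical helper for trying the code without the real download.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def iter_article_titles(zip_source, title_tag="article-title"):
    """Yield (member name, title text) for each .xml file in a corpus zip.

    zip_source may be a filesystem path or a binary file-like object.
    title_tag is an assumption; set it to the element your DTD uses.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.endswith(".xml"):
                continue  # skip DTDs, images, readme files, etc.
            with archive.open(name) as handle:
                root = ET.parse(handle).getroot()
            title = root.find(f".//{title_tag}")
            # itertext() keeps text nested in inline markup such as <italic>.
            text = "".join(title.itertext()).strip() if title is not None else None
            yield name, text


def make_sample_corpus(documents):
    """Hypothetical helper: build an in-memory zip from {name: xml_string}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, xml_text in documents.items():
            archive.writestr(name, xml_text)
    buffer.seek(0)
    return buffer


corpus = make_sample_corpus({
    "article1.xml": "<article><article-title>A <italic>toy</italic> title</article-title></article>",
    "README.txt": "not XML",
})
for name, title in iter_article_titles(corpus):
    print(name, title)
```

Real full-text files usually carry a DOCTYPE declaration. The standard-library parser does not fetch external DTDs, so character entities defined only in the DTD may need to be resolved separately before parsing.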
14. Data mining: BioMed Central (screenshot)
15. Data mining: PubMed Central
http://www.pubmedcentral.com/about/oai.html
- Standard NLM Archiving and Interchange XML DTD
  - Common format across multiple publishers
- Only a subset of PubMed Central's participating publishers allow download of full-text XML
  - BioMed Central
  - Public Library of Science
  - Hopefully, more will follow
- XML made available via the OAI interface
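The OAI interface follows the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH): a client issues `ListRecords` requests and follows each `resumptionToken` until the list is exhausted. The sketch below shows that loop with the Python standard library. The base URL is deliberately left as a parameter (take the repository's actual endpoint from the documentation page listed on this slide), and the default `oai_dc` metadata format is the one every OAI-PMH repository must support; full-text XML is normally exposed under a repository-specific prefix, which can be discovered with the `ListMetadataFormats` verb.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, resumption_token=None):
    """Build a ListRecords request URL."""
    if resumption_token:
        # The protocol requires resumptionToken to be the only other argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_data):
    """Return (record identifiers, resumption token or None) from one response."""
    root = ET.fromstring(xml_data)
    ids = [el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty resumptionToken element marks the final page of results.
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


def harvest_identifiers(base_url, metadata_prefix="oai_dc"):
    """Yield every record identifier, following resumption tokens page by page."""
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, token)
        with urllib.request.urlopen(url) as response:
            ids, token = parse_list_records(response.read())
        yield from ids
        if token is None:
            break
```

An OAI-PMH error response (for example `noRecordsMatch`) contains neither records nor a token, so this loop simply stops; a production harvester would also inspect the `error` element and handle rate limiting.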
16. Data mining: PubMed Central
17. Adding structure to full-text data
- Some examples of useful structure
  - Structure of the article itself (figure legends, materials and methods, references, etc.)
  - MathML, CML, etc.
  - Disambiguated references to genes/proteins
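Once this kind of structure is present in the XML, a text-mining tool can target it directly instead of guessing from plain text. The sketch below pulls figure legends, methods paragraphs, a reference count and gene mentions out of one article. The element and attribute names (`fig`/`caption`, `sec-type="methods"`, `ref`, and `named-content` with `content-type="gene"` plus an `xlink:href` identifier) are assumptions modelled on NLM-style article XML, and the gene identifier in the example is hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def text_of(element):
    """All text inside an element, with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def extract_structure(xml_text):
    root = ET.fromstring(xml_text)
    return {
        # Only captions that belong to figures, not table captions.
        "figure_legends": [text_of(c) for fig in root.iter("fig") for c in fig.findall("caption")],
        # Paragraph text of each methods section (section titles skipped).
        "methods": [
            " ".join(text_of(p) for p in sec.iter("p"))
            for sec in root.iter("sec")
            if sec.get("sec-type") == "methods"
        ],
        "reference_count": sum(1 for _ in root.iter("ref")),
        # Disambiguated gene mentions: (surface text, database identifier).
        "genes": [
            (text_of(g), g.get(XLINK_HREF))
            for g in root.iter("named-content")
            if g.get("content-type") == "gene"
        ],
    }


article = (
    '<article xmlns:xlink="http://www.w3.org/1999/xlink"><body>'
    '<sec sec-type="methods"><title>Methods</title>'
    '<p>We knocked down <named-content content-type="gene" xlink:href="hypothetical:0001">'
    'RAD51</named-content>.</p></sec>'
    '<fig><caption><p>Figure 1 legend.</p></caption></fig></body>'
    '<back><ref-list><ref/><ref/></ref-list></back></article>'
)
print(extract_structure(article))
```

The gene entries show why disambiguated markup matters: the identifier travels with the mention, so downstream tools do not have to guess which gene a symbol refers to.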
18. Authoring tools are key
- Manuscript structure: EndNote, TeX/BibTeX already pretty good
- MathML
  - Publicon, TeX, etc.
- CML
  - ChemSketch, etc.
- Gene/protein reference markup?
  - Semi-automatic markup during authoring
  - Author reviews and confirms markup
  - System prompts author to clarify ambiguity (cf. grammar checkers, code intelligence)
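Semi-automatic markup of this kind could start from a dictionary lookup that proposes identifiers for known symbols and interrupts the author only when a symbol maps to more than one entry. In the sketch below the lexicon and its identifiers are hypothetical, `ask_author` stands in for whatever prompt an authoring tool would show, and unambiguous matches are accepted automatically, which simplifies the "author reviews and confirms" step.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon: symbol -> candidate database identifiers.
LEXICON = {
    "RAD51": ["gene:0001"],
    "CAT": ["gene:0002", "gene:0003"],  # one symbol, two candidate genes
}


@dataclass
class Suggestion:
    symbol: str
    start: int        # character offset in the manuscript text
    candidates: list

    @property
    def needs_author(self):
        return len(self.candidates) > 1


def suggest_markup(text, lexicon=LEXICON):
    """Propose markup for every token found in the lexicon (case-sensitive)."""
    return [
        Suggestion(m.group(), m.start(), lexicon[m.group()])
        for m in re.finditer(r"[A-Za-z0-9]+", text)
        if m.group() in lexicon
    ]


def confirm(suggestions, ask_author):
    """Accept unambiguous suggestions; ask the author to resolve the rest.

    ask_author receives a Suggestion and returns a chosen identifier,
    or None to leave that mention unmarked.
    """
    confirmed = []
    for s in suggestions:
        chosen = ask_author(s) if s.needs_author else s.candidates[0]
        if chosen is not None:
            confirmed.append((s.symbol, s.start, chosen))
    return confirmed


draft = "RAD51 binds CAT in vitro."
# Stand-in for an interactive prompt: always pick the second candidate.
print(confirm(suggest_markup(draft), lambda s: s.candidates[1]))
```

Passing the prompt in as a callback keeps the matching logic independent of any particular editor, much as a grammar checker separates detection from the dialog that asks the user to choose.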
19. Summary
- What is Open Access publishing?
- Open Access publishing and text mining
- BMC Bioinformatics
- The BioCreative supplement
20. BMC series of online journals
- BMC Biochemistry
- BMC Bioinformatics
- BMC Biotechnology
- BMC Cell Biology
- BMC Chemical Biology
- BMC Developmental Biology
- BMC Ecology
- BMC Evolutionary Biology
- BMC Genetics
- BMC Genomics
- BMC Immunology
- BMC Microbiology
- BMC Molecular Biology
- BMC Neuroscience
- BMC Pharmacology
- BMC Physiology
- BMC Plant Biology
- BMC Structural Biology
- BMC Anesthesiology
- BMC Blood Disorders
- BMC Cancer
- BMC Cardiovascular Disorders
- BMC Clinical Pathology
- BMC Clinical Pharmacology
- BMC Complementary and Alternative Medicine
- BMC Dermatology
- BMC Ear, Nose and Throat Disorders
- BMC Emergency Medicine
- BMC Endocrine Disorders
- BMC Family Practice
- BMC Gastroenterology
- BMC Geriatrics
- BMC Health Services Research
- BMC Infectious Diseases
- BMC International Health and Human Rights
- BMC Medical Education
- BMC Medical Ethics
- BMC Medical Imaging
- BMC Medical Informatics and Decision Making
- BMC Medical Research Methodology
- BMC Musculoskeletal Disorders
- BMC Nephrology
- BMC Neurology
- BMC Nuclear Medicine
- BMC Nursing
- BMC Ophthalmology
- BMC Oral Health
- BMC Palliative Care
- BMC Pediatrics
- BMC Pregnancy and Childbirth
- BMC Psychiatry
- BMC Public Health
- BMC Pulmonary Medicine
- BMC Surgery
- BMC Urology
- BMC Women's Health
21. BMC Bioinformatics
22. RSS feeds
23. Open Access leads to high visibility
- Indexing/Linking
  - PubMed
  - MEDLINE
  - ISI
  - BIOSIS
  - CAS
  - CrossRef
  - Scirus
  - Open Archives Initiative
  - Citebase
  - Google
- Archiving
  - PubMed Central
  - INIST
  - LOCKSS
  - Max Planck
  - OhioLINK
24. BMC Bioinformatics: citation impact
25. Summary
- What is Open Access publishing?
- Open Access publishing and text mining
- About BMC Bioinformatics
- The BioCreative supplement
26. Process for publishing in the BMC Bioinformatics supplement
- Follow the BMC Bioinformatics Research Article instructions for authors
- Send articles to the BioCreative organizers, who will coordinate peer review; do not submit articles online
- Supplement passed on to BioMed Central for XML markup and publication
- Processing charge of 400 per article
27. Instructions for authors
28. Access to supplement
- All articles in the supplement are covered by BioMed Central's Open Access licence agreement
  - Free access
  - Free redistribution/reuse
- Supplement indexed in PubMed and permanently archived in PubMed Central
29. That's it