A bibliometric analysis of chemoinformatics - PowerPoint PPT Presentation

1
A bibliometric analysis of chemoinformatics
  • Presented at the 25th Anniversary Meeting of the
    Molecular Graphics and Modelling Society, School
    of Oriental and African Studies, London 13th
    March 2007
  • Peter Willett, University of Sheffield, UK

2
Overview of talk
  • Bibliometrics
  • Chemoinformatics
  • Growth of the subject
  • Subject coverage
  • Author productivity
  • The Journal of Molecular Graphics (and Modelling)

3
Bibliometrics: what is it?
  • Bibliometrics is
  • "The application of mathematical and statistical
    methods to books and other media" (A. Pritchard
    (1969), "Statistical bibliography or
    bibliometrics?", J. Docum., Vol. 25, pp. 348-349)
  • "The study, or measurement, of texts and
    information" (Wikipedia)
  • See also
  • Webometrics
  • "the study of the quantitative aspects of the
    construction and use of information resources,
    structures and technologies on the Web drawing on
    bibliometric and informetric approaches" (L.
    Björneborn and P. Ingwersen (2004), "Toward a
    basic framework for webometrics", J. Amer. Soc.
    Inf. Sci. Technol., Vol. 55, pp. 1216-1227)
  • Cybermetrics, informetrics, scientometrics

4
Bibliometrics: subjects of study
  • Bibliometric distributions
  • Highly skewed frequency distributions (Bradford,
    Lotka, Zipf) and their implications
  • Citation analysis
  • Analysis of individuals, institutions and
    journals
  • Use as performance indicators for the evaluation
    of research
  • Philosophy of science
  • Subject coverage
  • Academic collaborations
  • Now extension to linkages between Web sites
  • "Sitations", cf. citations

5
From chemical documentation to chemoinformatics
  • Chemical documentation is long established
  • Chemisches Journal started in 1778
  • Chemical Abstracts started in 1907
  • First computer-based information systems and
    services in Sixties
  • Chemical Titles in 1961
  • Morgan and Sussenguth algorithms in 1965
  • Recent emergence of chemoinformatics
  • M. Hann and R. Green (1999), Chemoinformatics - a
    new name for an old problem?, Curr. Opin. Chem.
    Biol., Vol. 3, pp. 379-383.

6
Chemoinformatics: definitions
  • "The use of information technology and management
    has become a critical part of the drug discovery
    process. Chemoinformatics is the mixing of those
    information resources to transform data into
    information and information into knowledge for
    the intended purpose of making better decisions
    faster in the area of drug lead identification
    and optimization" (F.K. Brown (1998),
    "Chemoinformatics: What is it and how does it
    impact drug discovery?", Ann. Reports Med. Chem.,
    Vol. 33, pp. 375-384)
  • Take 1998 as the starting point for the
    bibliometric analyses
  • Many alternatives, e.g.
  • "Chem(o)informatics is a generic term that
    encompasses the design, creation, organization,
    management, retrieval, analysis, dissemination,
    visualization and use of chemical information"
    (G. Paris (August 1999 ACS meeting), quoted by
    W.A. Warr at http://www.warr.com/warrzone.htm)
  • "Chemoinformatics is the application of
    informatics methods to solve chemical problems"
    (J. Gasteiger and T. Engel (2003),
    Chemoinformatics: A Textbook, Wiley-VCH)

7
Bibliometric studies in chemoinformatics
  • Onodera (2001)
  • Analysis of the subject coverage of Journal of
    Chemical Information and Computer Sciences
  • Redman et al. (2001)
  • Applications of the Cambridge Structural Database
  • Bishop et al. (2003)
  • Citations to Sheffield chemoinformatics research
  • Warr (2005)
  • Most cited papers in Journal of Chemical
    Information and Computer Sciences
  • Behrens and Luksch (2006)
  • Contents of the Inorganic Crystal Structure
    Database

8
Data sources for bibliometric research
  • Web of Knowledge (WOK)
  • Long established as the data source for
    bibliometric analyses
  • Recent addition of analysis tools (Analyse
    Results and Citation Reports)
  • Probably still the most comprehensive
  • New sources
  • Google
  • Google Scholar restricted to the scholarly
    literature
  • Scopus
  • New service from Elsevier, offering similar
    facilities to WOK

9
What shall we call it?
10
Google postings from
http://www.molinspiration.com/chemoinformatics.html
11
  • WOK search of the title, keyword and abstract
    fields for
  • chemoinformatics OR cheminformatics OR chemical
    informatics
  • This search retrieved 197 records for the period
    1998-2006 in 87 different sources
  • Of these, Journal of Chemical Information and
    Modeling (and its predecessor) is clearly the
    core journal

12
Most frequently occurring sources
13
Inter-journal relationships
  • L. Leydesdorff (2007), "Visualization of the
    citation impact environment of scientific
    journals", J. Amer. Soc. Inf. Sci. Technol., Vol.
    58, pp. 25-38.
  • Analysis of 2003-04 WOK data to identify journals
    that provide > 1% of the citations to/from a
    given journal
  • For Journal of Chemical Information and Computer
    Sciences
  • 14 other "to" journals but only 5 other "from"
    journals
  • Multi-disciplinary nature of the field means that
    a wide range of sources are used
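
The 1% threshold described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the cited paper: the function name, the journal labels and the counts are all hypothetical, and it assumes the threshold is applied to each journal's share of the total citation count.

```python
def significant_journals(citation_counts, threshold=0.01):
    """Return journals whose share of the citations exceeds the threshold.

    citation_counts maps a journal name to the number of citations it
    gives to (or receives from) the journal under study.
    Assumption: shares are taken over the sum of all counts supplied.
    """
    total = sum(citation_counts.values())
    if total == 0:
        return []
    return sorted(j for j, c in citation_counts.items() if c / total > threshold)


# Hypothetical counts, for illustration only (not data from the talk)
example_counts = {"Journal P": 500, "Journal Q": 300, "Journal R": 195, "Journal S": 5}
print(significant_journals(example_counts))
```

Running the same function on the "to" counts and on the "from" counts separately would give the two lists of journals that the slide compares.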

14
Author productivity I
  • Analysis of the authors of all articles published
    1998-2006 in
  • Bioinformatics, Combinatorial Chemistry and
    High-Throughput Screening and Journal of
    Biomolecular Screening
  • Journal of Chemical Information and Modeling,
    Journal of Computer-Aided Molecular Design,
    Molecular Diversity and QSAR & Combinatorial
    Science
  • Journal of Molecular Graphics and Modelling,
    Journal of Molecular Modeling and SAR and QSAR in
    Environmental Research
  • Identification of the 20 most productive authors
    for each of these journals in 1998-2006

15
Author productivity II
  • Productive authors in the first group of journals
    did not publish frequently in the other two
    groups of journals, but fair degree of overlap
    between the journals in the other two groups
    (Molecular Diversity the least)
  • There is one author in the top-20 for four
    journals, two authors in the top-20 for three
    journals and 12 authors in the top-20 for two
    journals
  • Eight of the top-20 authors in Journal of
    Chemical Information and Computer Sciences are
    also top-20 authors in other journals
  • Main degrees of overlap between
  • Journal of Chemical Information and Modeling and
    Journal of Computer-Aided Molecular Design
  • QSAR & Combinatorial Science and SAR and QSAR in
    Environmental Research
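
The overlap counts discussed above can be sketched as follows. This is an illustrative sketch only: the function names, journal labels and author names are hypothetical, and ties in the ranking are resolved by the order in which authors are first encountered, which may differ from the approach taken in the talk.

```python
from collections import Counter
from itertools import combinations


def top_authors(author_lists, n=20):
    """Return the set of the n most productive authors for one journal.

    author_lists holds one list of author names per article.
    """
    counts = Counter(name for authors in author_lists for name in authors)
    return {name for name, _ in counts.most_common(n)}


def top_author_overlap(journals, n=20):
    """Count the authors shared by the top-n lists of each pair of journals."""
    tops = {journal: top_authors(articles, n) for journal, articles in journals.items()}
    return {
        (a, b): len(tops[a] & tops[b])
        for a, b in combinations(sorted(tops), 2)
    }


# Hypothetical toy data, for illustration only (top-2 rather than top-20)
toy = {
    "Journal X": [["Author A", "Author B"], ["Author A", "Author C"], ["Author A"], ["Author B"]],
    "Journal Y": [["Author B", "Author D"], ["Author D"], ["Author B"], ["Author E"]],
}
print(top_author_overlap(toy, n=2))
```

With real data, each journal's value would be the author lists of its 1998-2006 articles and `n` would be left at 20.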

16
Overlap in top-20 authors
17
The core literature
  • A basic principle of bibliometrics is that
    citation corresponds to use, i.e., frequently
    cited papers are the most scientifically valuable
  • NB the many exceptions
  • Classic citations
  • Critical citations
  • Self-citation and close collaborators
  • Journal Impact Factor games
  • but generally a valid assumption
  • Analysis of citations to 4,411 articles published
    in seven chemoinformatics journals in 1998-2006,
    which attracted a total of 35,228 citations

18
Most-cited papers I
19
Most-cited papers II
  • Certain types of article strongly represented in
    the top-30 positions
  • Software descriptions (9)
  • Reviews (4)
  • Drug-likeness (4)
  • Binding energies (4)
  • The first of these might be thought of as the
    field's "classic citations" (cf. the two
    most-cited articles in Journal of Chemical
    Information and Computer Sciences)

20
Institutional productivity
  • The following institutions all provide at least
    1% of the papers in all of the seven journals
  • National Institute of Chemistry, Ljubljana,
    University of Erlangen-Nurnberg, University of
    Sheffield, University of Minnesota, Environmental
    Protection Agency, Russian Academy of Sciences,
    Liverpool John Moores University, Pennsylvania
    State University, Chinese Academy of Sciences and
    the University of Cambridge
  • Of top-50 institutions, only Tripos (no. 27) and
    Pfizer (no. 36) are for-profit organisations

21
National productivity: the ten countries
providing the most articles in the seven journals
22
The Journal of Molecular Graphics and Modelling
  • The journal, then the Journal of Molecular
    Graphics, was started in 1983 and changed to its
    current name with Volume 15 in 1997
  • The journal is
  • "devoted to the publication of papers on the uses
    of computers in theoretical investigations of
    molecular structure, function, interaction, and
    design. The scope of the journal includes all
    aspects of molecular modelling and computational
    chemistry, including, for instance, the study of
    molecular shape and properties, molecular
    simulations, protein and polymer engineering,
    drug design, materials design, structure-activity
    and structure-property relationships, database
    mining, and compound library design"
  • See http://www.elsevier.com/wps/find/journaldescription.cws_home/525012/description#description

23
Bibliometric distributions I
  • Many bibliometric distributions are characterised
    by inverse, highly skewed frequency distributions
  • Zipf's Law for word occurrences
  • Lotka's Law for author productivity
  • Bradford's Law for subject spread in journals
  • Many other examples
  • Design of storage systems
  • Language acquisition
  • Income distribution (Pareto distribution)

24
Bibliometric distributions II
  • All of the bibliometric distributions can be
    represented by an equation of the form
    $f(k) = C / k^{\alpha}$
  • where $f(k)$ is the frequency of occurrence of
    some bibliometric item that is associated with
    each member of a population ($k = 1, 2, \ldots$) that is
    producing examples of these items, and where $C$
    and $\alpha$ are constants

25
Lotka's Law
  • The original formulation (A. Lotka (1926), "The
    frequency distribution of scientific
    productivity", Journal of the Washington Academy
    of Sciences, Vol. 16, pp. 317-323) suggested an
    exponent $\alpha = 2$, but a wide range of values
    is observed in practice, e.g., 1.78-3.78 (M.L. Pao
    (1986), "An empirical examination of Lotka's Law",
    J. Amer. Soc. Inf. Sci., Vol. 37, pp. 26-33)
  • WOK lists 859 articles appearing in Vols. 2-24 of
    the journal
  • Reasonable Lotka plot with $C = 0.834$ and
    $\alpha = 3.02$
  • Well-known authors with > 6 papers: Arteca,
    Bajorath, Brasseur, Chatterjee, Ferrin, Flower,
    Gaber, Goodsell, Griffith, Maigret, Martin,
    Mornon, Nakamura, Olson, Richards, Tapia, Toma,
    Umeyama, Welsh, White, Willett
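
One common way to estimate the two Lotka constants is a least-squares fit on log-transformed values, sketched below. The transcript does not say how the fit was actually done, so the method here is an assumption; the function name is hypothetical, f(k) is assumed to be the fraction of authors with k papers, and the input is synthetic data generated from the constants quoted on the slide (C = 0.834, exponent 3.02), not the journal's real author counts.

```python
import math


def fit_lotka(freqs):
    """Fit f(k) = C / k**alpha by least squares on log-log values.

    freqs maps k (papers per author) to f(k), the observed frequency.
    Returns the pair (C, alpha). Entries with f(k) <= 0 are skipped
    because their logarithm is undefined.
    """
    points = [(math.log(k), math.log(f)) for k, f in freqs.items() if f > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log f = log C - alpha * log k, so alpha is the negated slope
    return math.exp(intercept), -slope


# Synthetic values generated from the slide's constants, for illustration only
synthetic = {k: 0.834 / k ** 3.02 for k in range(1, 11)}
C, alpha = fit_lotka(synthetic)
print(f"C = {C:.3f}, alpha = {alpha:.2f}")
```

Because the synthetic points lie exactly on a power law, the fit simply recovers the constants it was given; real author-productivity data would scatter around the fitted line, especially in the sparse tail of highly productive authors.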

26
Lotka data for 859 articles published in Volumes
2-24 of the journal
27
Types of paper in Volumes 4 (1986), 14 (1996) and
24 (2006)
28
Most-cited papers
29
Inter-journal relatedness
  • The Journal Citation Reports database provides a
    further way of analysing the degree of
    co-citation between journals
  • Let A and B be journals publishing $P_A$ and $P_B$
    articles; let $C_{AB}$ be the number of times that A
    cites B; and let $CT_A$ be the total number of
    citations in A. Then the relatedness of A to B
    is defined as
    $R_{AB} = C_{AB} / (P_B \times CT_A)$
  • A similar calculation can be made of the
    relatedness of B to A
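
The calculation can be sketched as below. The defining equation did not survive in this transcript, so the sketch assumes the usual Journal Citation Reports form, relatedness of A to B = C_AB / (P_B × CT_A), reported after multiplying by 10^6. The function name and all of the counts are hypothetical.

```python
def relatedness(c_ab, p_b, ct_a, scale=1e6):
    """Relatedness of journal A to journal B, multiplied by `scale`.

    c_ab : number of times A cites B
    p_b  : number of articles published by B
    ct_a : total number of citations made in A
    Assumed form: c_ab / (p_b * ct_a).
    """
    if p_b <= 0 or ct_a <= 0:
        raise ValueError("p_b and ct_a must be positive")
    return scale * c_ab / (p_b * ct_a)


# Hypothetical counts, for illustration only
a_to_b = relatedness(c_ab=50, p_b=200, ct_a=5000)
b_to_a = relatedness(c_ab=20, p_b=100, ct_a=8000)
print(a_to_b, b_to_a)
```

The measure is not symmetric: the relatedness of B to A uses the citations from B to A, the number of articles in A and the total citations in B, so the two directions generally give different values.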

30
Relatedness values ($\times 10^{6}$)
31
Countries providing at least 3% of the articles
in Volumes 2-24 of the journal
32
Conclusions
  • Most academics are interested in their personal
    citation counts and in the impact factors for
    their favourite journals
  • Bibliometrics has more general applications
  • Subject coverage
  • Key players and articles
  • Relationships between journals
  • Recent developments facilitate the carrying-out
    of such analyses