Title: A bibliometric analysis of chemoinformatics
1A bibliometric analysis of chemoinformatics
- Presented at the 25th Anniversary Meeting of the
Molecular Graphics and Modelling Society, School
of Oriental and African Studies, London, 13th
March 2007
- Peter Willett, University of Sheffield, UK
2Overview of talk
- Bibliometrics
- Chemoinformatics
- Growth of the subject
- Subject coverage
- Author productivity
- The Journal of Molecular Graphics (and Modelling)
3Bibliometrics: what is it?
- Bibliometrics is
- The application of mathematical and statistical
methods to books and other media (A. Pritchard
(1969), "Statistical bibliography or
bibliometrics?", J. Docum., Vol. 25, pp. 348-349)
- The study, or measurement, of texts and
information (Wikipedia)
- See also
- Webometrics
- "the study of the quantitative aspects of the
construction and use of information resources,
structures and technologies on the Web drawing on
bibliometric and informetric approaches" (L.
Björneborn and P. Ingwersen (2004), "Toward a
basic framework for webometrics", J. Amer. Soc.
Inf. Sci. Technol., Vol. 55, pp. 1216-1227)
- Cybermetrics, informetrics, scientometrics
4Bibliometrics: subjects of study
- Bibliometric distributions
- Highly skewed frequency distributions (Bradford,
Lotka, Zipf) and their implications
- Citation analysis
- Analysis of individuals, institutions and
journals
- Use as performance indicators for the evaluation
of research
- Philosophy of science
- Subject coverage
- Academic collaborations
- Now extended to linkages between Web sites
- "Sitations", cf. citations
5From chemical documentation to chemoinformatics
- Chemical documentation is long established
- Chemisches Journal started in 1778
- Chemical Abstracts started in 1907
- First computer-based information systems and
services in the 1960s
- Chemical Titles in 1961
- Morgan and Sussenguth algorithms in 1965
- Recent emergence of chemoinformatics
- M. Hann and R. Green (1999), Chemoinformatics - a
new name for an old problem?, Curr. Opin. Chem.
Biol., Vol. 3, pp. 379-383.
6Chemoinformatics: definitions
- "The use of information technology and management
has become a critical part of the drug discovery
process. Chemoinformatics is the mixing of those
information resources to transform data into
information and information into knowledge for
the intended purpose of making better decisions
faster in the area of drug lead identification
and optimization" (F.K. Brown (1998),
"Chemoinformatics: what is it and how does it
impact drug discovery?", Ann. Reports Med. Chem.,
Vol. 33, pp. 375-384)
- Take 1998 as the starting point for the
bibliometric analyses
- Many alternatives, e.g.
- "Chem(o)informatics is a generic term that
encompasses the design, creation, organization,
management, retrieval, analysis, dissemination,
visualization and use of chemical information"
(G. Paris (August 1999 ACS meeting), quoted by
W.A. Warr at http://www.warr.com/warrzone.htm)
- "Chemoinformatics is the application of
informatics methods to solve chemical problems"
(J. Gasteiger and T. Engel (2003),
Chemoinformatics: A Textbook, Wiley-VCH)
7Bibliometric studies in chemoinformatics
- Onodera (2001)
- Analysis of the subject coverage of Journal of
Chemical Information and Computer Sciences
- Redman et al. (2001)
- Applications of the Cambridge Structural Database
- Bishop et al. (2003)
- Citations to Sheffield chemoinformatics research
- Warr (2005)
- Most cited papers in Journal of Chemical
Information and Computer Sciences
- Behrens and Luksch (2006)
- Contents of the Inorganic Crystal Structure
Database
8Data sources for bibliometric research
- Web of Knowledge (WOK)
- Long established as the data source for
bibliometric analyses
- Recent addition of analysis tools (Analyse
Results and Citation Reports)
- Probably still the most comprehensive
- New sources
- Google
- Google Scholar restricted to the scholarly
literature
- Scopus
- New service from Elsevier, offering similar
facilities to WOK
9What shall we call it?
10Google postings from http://www.molinspiration.com/chemoinformatics.html
11- WOK search of the title, keyword and abstract
fields for
- chemoinformatics OR cheminformatics OR chemical
informatics
- This search retrieved 197 records for the period
1998-2006 in 87 different sources
- Of these, Journal of Chemical Information and
Modeling (and its predecessor) is clearly the
core journal
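The search above can be sketched in Python as a case-insensitive match of the three spelling variants against the title, keyword and abstract fields, followed by a count of hits and distinct sources. This is only an illustration: the record layout (dictionaries with `title`, `keywords`, `abstract` and `source` keys) and the sample records are hypothetical, and it does not reproduce how WOK itself evaluates queries.

```python
import re

# The three spelling variants named on the slide, matched case-insensitively.
VARIANTS = re.compile(
    r"chemoinformatics|cheminformatics|chemical informatics", re.IGNORECASE
)
FIELDS = ("title", "keywords", "abstract")


def matches(record):
    """Return True if any variant occurs in the title, keywords or abstract."""
    return any(VARIANTS.search(record.get(field, "")) for field in FIELDS)


def summarise(records):
    """Return (number of matching records, number of distinct sources)."""
    hits = [r for r in records if matches(r)]
    return len(hits), len({r["source"] for r in hits})


# Hypothetical records, invented purely for illustration.
records = [
    {"title": "Chemoinformatics: an overview", "keywords": "",
     "abstract": "", "source": "Journal A"},
    {"title": "A docking study", "keywords": "cheminformatics",
     "abstract": "", "source": "Journal B"},
    {"title": "Protein folding", "keywords": "",
     "abstract": "Molecular dynamics.", "source": "Journal A"},
    {"title": "New tools", "keywords": "",
     "abstract": "Advances in chemical informatics.", "source": "Journal A"},
]

print(summarise(records))
```

Matching each field separately avoids a false hit where one field ends in "chemical" and the next begins with "informatics".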
12Most frequently occurring sources
13Inter-journal relationships
- L. Leydesdorff (2007), "Visualization of the
citation impact environment of scientific
journals", J. Amer. Soc. Inf. Sci. Technol., Vol.
58, pp. 25-38.
- Analysis of 2003-04 WOK data to identify journals
that provide > 1% of the citations to/from a
given journal
- For Journal of Chemical Information and Computer
Sciences
- 14 other "to" journals but only 5 other "from"
journals
- Multi-disciplinary nature of the field means that
a wide range of sources is used
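The thresholding step can be illustrated with a short Python sketch, taking the cut-off to be more than 1% of a journal's citations. The function name, the input layout (a mapping from each other journal to a citation count, with the focal journal itself excluded) and the counts are all hypothetical; this is not Leydesdorff's own procedure or data.

```python
def citation_neighbours(counts, threshold=0.01):
    """Return the journals supplying more than `threshold` of all citations.

    `counts` maps each other journal to the number of citations exchanged
    with the focal journal in one direction ("to" or "from").
    """
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(j for j, n in counts.items() if n / total > threshold)


# Hypothetical citation counts, invented for illustration.
cited_by_focal = {
    "Journal W": 465,
    "Journal X": 500,
    "Journal Y": 30,
    "Journal Z": 5,   # 0.5% of the total, so it falls below the cut-off
}

print(citation_neighbours(cited_by_focal))
```

Running the same function on the "to" and "from" counts separately gives the two neighbour lists whose sizes the slide compares.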
14Author productivity I
- Analysis of the authors of all articles published
1998-2006 in
- Bioinformatics, Combinatorial Chemistry and
High-Throughput Screening and Journal of
Biomolecular Screening
- Journal of Chemical Information and Modeling,
Journal of Computer-Aided Molecular Design,
Molecular Diversity and QSAR & Combinatorial
Science
- Journal of Molecular Graphics and Modelling,
Journal of Molecular Modeling and SAR and QSAR in
Environmental Research
- Identification of the 20 most productive authors
for each of these journals in 1998-2006
15Author productivity II
- Productive authors in the first group of journals
did not publish frequently in the other two
groups of journals, but there is a fair degree of
overlap between the journals in the other two
groups (Molecular Diversity the least)
- There is one author in the top-20 for four
journals, two authors in the top-20 for three
journals and 12 authors in the top-20 for two
journals
- Eight of the top-20 authors in Journal of
Chemical Information and Computer Sciences are
also top-20 authors in other journals
- Main degrees of overlap between
- Journal of Chemical Information and Modeling and
Journal of Computer-Aided Molecular Design
- QSAR & Combinatorial Science and SAR and QSAR in
Environmental Research
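A minimal Python sketch of this kind of overlap analysis: count articles per author in each journal, keep the top-n authors, and intersect the resulting sets for every pair of journals. The function names, the input layout (one list of author names per article) and the toy data are assumptions made for illustration; the slides do not say how the counts were actually produced, nor how ties at the cut-off were handled.

```python
from collections import Counter
from itertools import combinations


def top_authors(articles, n=20):
    """Return the set of the n authors with the most articles.

    `articles` is a list of author-name lists, one per article.
    """
    counts = Counter(name for authors in articles for name in authors)
    return {name for name, _ in counts.most_common(n)}


def top_author_overlap(journals, n=20):
    """Return, for each pair of journals, how many top-n authors they share."""
    tops = {title: top_authors(arts, n) for title, arts in journals.items()}
    return {
        (a, b): len(tops[a] & tops[b])
        for a, b in combinations(sorted(tops), 2)
    }


# Toy data with invented author labels; n=2 keeps the example small.
journals = {
    "J1": [["A", "B"], ["A", "C"], ["B"], ["A"]],
    "J2": [["B", "D"], ["B"], ["D"], ["E"]],
    "J3": [["F"], ["F"], ["G"], ["G"], ["A"]],
}

print(top_author_overlap(journals, n=2))
```

In practice author names would need disambiguating (initials, spelling variants) before counting, which this sketch does not attempt.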
16Overlap in top-20 authors
17The core literature
- A basic principle of bibliometrics is that
citation corresponds to use, i.e., frequently
cited papers are the most scientifically valuable
- NB the many exceptions
- Classic citations
- Critical citations
- Self-citation and close collaborators
- Journal Impact Factor games
- but generally a valid assumption
- The 4,411 articles published in seven
chemoinformatics journals in 1998-2006 attracted
a total of 35,228 citations
18Most-cited papers I
19Most-cited papers II
- Certain types of article strongly represented in
the top-30 positions
- Software descriptions (9)
- Reviews (4)
- Drug-likeness (4)
- Binding energies (4)
- The first of these might be thought of as the
field's "classic citations" (cf. Journal of
Chemical Information and Computer Sciences' two
most-cited articles)
20Institutional productivity
- The following institutions all provide at least
1% of the papers in all of the seven journals
- National Institute of Chemistry, Ljubljana,
University of Erlangen-Nürnberg, University of
Sheffield, University of Minnesota, Environmental
Protection Agency, Russian Academy of Sciences,
Liverpool John Moores University, Pennsylvania
State University, Chinese Academy of Sciences and
the University of Cambridge
- Of top-50 institutions, only Tripos (no. 27) and
Pfizer (no. 36) are for-profit organisations
21National productivity: the ten countries
providing the most articles in the seven journals
22The Journal of Molecular Graphics and Modelling
- The journal, then the Journal of Molecular
Graphics, was started in 1983 and changed to its
current name with Volume 15 in 1997
- The journal is
- "devoted to the publication of papers on the uses
of computers in theoretical investigations of
molecular structure, function, interaction, and
design. The scope of the journal includes all
aspects of molecular modelling and computational
chemistry, including, for instance, the study of
molecular shape and properties, molecular
simulations, protein and polymer engineering,
drug design, materials design, structure-activity
and structure-property relationships, database
mining, and compound library design"
- See http://www.elsevier.com/wps/find/journaldescription.cws_home/525012/description#description
23Bibliometric distributions I
- Many bibliometric distributions are characterised
by inverse, highly skewed frequency distributions
- Zipf's Law for word occurrences
- Lotka's Law for author productivity
- Bradford's Law for subject spread in journals
- Many other examples
- Design of storage systems
- Language acquisition
- Income distribution (Pareto distribution)
24Bibliometric distributions II
- All of the bibliometric distributions can be
represented by an equation of the form
$$f(k) = \frac{C}{k^{\alpha}}$$
- where f(k) is the frequency of occurrence of
some bibliometric item that is associated with
each member of a population (k = 1, 2, ...) that is
producing examples of these items, and where C
and α are constants
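As a small illustration of this inverse power-law form, the Python sketch below evaluates f(k) = C / k^α for the first few values of k, writing the exponent as α (the symbol name is an assumption, since the slide's own symbol did not survive extraction). The function name and the choice C = 1 are hypothetical; α = 2 is the value from Lotka's original formulation.

```python
def power_law(k, c, alpha):
    """Return f(k) = c / k**alpha for a positive integer k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return c * k ** -alpha


# With c = 1 and alpha = 2, each value is the frequency at k relative to
# the frequency at k = 1, showing how quickly the distribution falls away.
relative = [power_law(k, 1.0, 2.0) for k in range(1, 6)]
print(relative)
```

With these settings, the frequency at k = 2 is a quarter of that at k = 1, and at k = 5 it is one twenty-fifth, which is the heavy skew these laws describe.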
25Lotka's Law
- The original formulation (A. Lotka (1926), "The
frequency distribution of scientific
productivity", Journal of the Washington Academy
of Sciences, Vol. 16, pp. 317-323) suggested α = 2,
but a wide range of values is observed in practice,
e.g., 1.78-3.78 (M.L. Pao (1986), "An empirical
examination of Lotka's Law", J. Amer. Soc. Inf.
Sci., Vol. 37, pp. 26-33)
- WOK lists 859 articles appearing in Vols. 2-24 of
the journal
- Reasonable Lotka plot with C = 0.834 and α = 3.02
- Well-known authors with > 6 papers: Arteca,
Bajorath, Brasseur, Chatterjee, Ferrin, Flower,
Gaber, Goodsell, Griffith, Maigret, Martin,
Mornon, Nakamura, Olson, Richards, Tapia, Toma,
Umeyama, Welsh, White, Willett
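One common way to obtain constants like these is a straight-line fit on log-log axes, since log f(k) = log C - α log k. The Python sketch below does this by ordinary least squares. It is an assumption that the slide's values were estimated this way, and that f(k) is the fraction of authors with k papers. The input here is synthetic, generated from the constants read off the slide (C = 0.834 and an exponent of 3.02), not the journal's real author counts.

```python
import math


def fit_lotka(freqs):
    """Fit log f(k) = log C - alpha * log k by ordinary least squares.

    `freqs` maps k (papers per author) to the observed frequency f(k).
    Returns the pair (C, alpha).
    """
    xs = [math.log(k) for k in freqs]
    ys = [math.log(f) for f in freqs.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic frequencies that follow the power law exactly (illustration only).
synthetic = {k: 0.834 * k ** -3.02 for k in range(1, 8)}

c_hat, alpha_hat = fit_lotka(synthetic)
print(round(c_hat, 3), round(alpha_hat, 2))
```

Real productivity data are noisy in the tail, where only a handful of authors have many papers, so a least-squares fit on log-log axes is sensitive to how the tail is treated.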
26Lotka data for 859 articles published in Volumes
2-24 of the journal
27Types of paper in Volumes 4 (1986), 14 (1996) and
24 (2006)
28Most-cited papers
29Inter-journal relatedness
- The Journal Citation Reports database provides a
further way of analysing the degree of
co-citation between journals
- Let A and B be journals publishing P_A and P_B
articles, let C_AB be the number of times that A
cites B, and let C_TA be the total number of
citations in A. Then the relatedness of A to B
is defined as
$$R_{A \to B} = \frac{C_{AB}}{C_{TA} \, P_{B}}$$
- A similar calculation can be made of the
relatedness of B to A
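A minimal Python sketch of the relatedness calculation follows. The formula itself is an assumption, because the slide's equation did not survive extraction: the variable definitions suggest citations from A to B divided by the product of A's total citations and B's article count, scaled here by 10^6 so the values are of readable size. The function name and all counts are hypothetical.

```python
def relatedness(c_ab, c_ta, p_b, scale=1e6):
    """Relatedness of journal A to journal B (assumed form).

    c_ab: number of times A cites B
    c_ta: total number of citations made in A
    p_b:  number of articles published by B
    """
    if c_ta <= 0 or p_b <= 0:
        raise ValueError("c_ta and p_b must be positive")
    return scale * c_ab / (c_ta * p_b)


# Hypothetical counts, invented for illustration.
a_to_b = relatedness(c_ab=120, c_ta=40_000, p_b=300)
b_to_a = relatedness(c_ab=45, c_ta=15_000, p_b=150)  # roles of A and B swapped

print(a_to_b, b_to_a)
```

The measure is not symmetric: the two directions use different citation totals and article counts, which is why the relatedness of B to A needs its own calculation.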
30Relatedness values (× 10^6)
31Countries providing at least 3% of the articles
in Volumes 2-24 of the journal
32Conclusions
- Most academics are interested in their personal
citation counts and in the impact factors for
their favourite journals
- Bibliometrics has more general applications
- Subject coverage
- Key players and articles
- Relationships between journals
- Recent developments facilitate the carrying-out
of such analyses