RANKING OF EEC PRIORITY LIST 1 FOR STRUCTURAL SIMILARITY
AND MODELLING OF ALGAL TOXICITY
F. Consolaro (1), P. Gramatica (1), H. Walter (2) and R. Altenburger (2)
(1) QSAR Research Unit - DBSF - University of Insubria - VARESE - ITALY
(2) UFZ Centre for Environmental Research - LEIPZIG - GERMANY
e-mail: fedec_at_mailserver.unimi.it
Web: http://fisio.dipbsf.uninsubria.it/dbsf/qsar/QSAR.html
INTRODUCTION
Environmental exposure situations are often characterized by a
multitude of heterogeneous chemicals with different mechanisms of
action and types of effect. The EEC priority List 1 (Council
Directive 76/464/EEC) consists of heterogeneous environmental
chemicals with mostly unknown or unspecific modes of action, so it
was used to select components for mixture experiments in the EEC
PREDICT (Prediction and Assessment of the Aquatic Toxicity of
Mixtures of Chemicals) project. A list of 202 compounds was studied
for structural similarity, in order to identify the most
representative and dissimilar chemicals and to find an objective
method to group them on the basis of their structural features. The
selected chemicals were then tested for their algal toxicity, and
the experimental results were modelled using the same molecular
descriptors. The comparison with analogous models obtained on
congeneric environmental chemicals is discussed.
STRUCTURAL DESCRIPTION OF COMPOUNDS
Molecular descriptors represent the way chemical information
contained in the molecular structure is transformed and coded. Among
the theoretical descriptors, the best known, obtained simply from
the knowledge of the formula, are molecular weight and count
descriptors (1D-descriptors, i.e. counts of bonds, of atoms of
different kinds, and of functional groups and fragments, or flags
for their presence). Graph-invariant descriptors (2D-descriptors,
including both topological and information indices) are obtained
from the knowledge of the molecular topology. WHIM molecular
descriptors [1] contain information about the whole 3D-molecular
structure in terms of size, symmetry and atom distribution. All
these indices are calculated from the (x,y,z)-coordinates of a
three-dimensional structure of a molecule, usually from a spatial
conformation of minimum energy: 37 non-directional (or global) and
66 directional WHIM descriptors are obtained. In total, a set of
about two hundred molecular descriptors was obtained [2].

[1] Todeschini R. and Gramatica P., Quant. Struct.-Act. Relat. 1997, 16, 113-119.
[2] Todeschini R. and Consonni V., DRAGON - Software for the calculation of the molecular descriptors, Talete srl, Milan (Italy) 2000. Download: http://www.disat.unimib.it/chm.
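
As an illustration of the simplest (1D) descriptors mentioned above, the following minimal sketch derives molecular weight and a few atom counts from a molecular formula. It is not the DRAGON implementation: the function names, the descriptor labels nAT and nCl, and the small atomic-weight table are assumptions made for this example (nO is the oxygen count used later in the models), and the parser only handles simple formulas without brackets.

```python
import re
from collections import Counter

# Rounded standard atomic weights for a few elements (illustrative subset).
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                 "P": 30.974, "S": 32.06, "Cl": 35.45}

def atom_counts(formula):
    """Count atoms in a simple formula such as 'C10H8' (no brackets)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts

def count_descriptors(formula):
    """Return molecular weight and some count descriptors (hypothetical labels)."""
    counts = atom_counts(formula)
    return {
        "MW": sum(ATOMIC_WEIGHT[el] * n for el, n in counts.items()),
        "nAT": sum(counts.values()),   # total number of atoms
        "nO": counts.get("O", 0),      # number of O atoms
        "nCl": counts.get("Cl", 0),    # number of Cl atoms
    }

naphthalene = count_descriptors("C10H8")
atrazine = count_descriptors("C8H14ClN5")
print(naphthalene)
print(atrazine)
```

Topological (2D) and WHIM (3D) descriptors need the molecular graph or the atomic coordinates, respectively, and are outside the scope of this sketch.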
CHEMOMETRIC METHODS
Several chemometric analyses were applied to the compounds
(represented by molecular descriptors) to group the more similar
ones, in accordance with a multivariate structural approach, with
the final aim of highlighting the structurally most dissimilar
compounds. The analyses performed are:

Hierarchical Cluster Analysis: hierarchical clustering was performed
with the aim of finding clusters of the studied compounds in
high-dimensional space, using molecular descriptors as variables.
Different distance metrics (Euclidean, Manhattan, Pearson) and
different linkages (complete, average, single, etc.) were used and
compared to find the best way to cluster these compounds.

Principal Component Analysis (PCA): this analysis was used to
calculate just a few components from a large number of variables.
These components highlight the distribution of the compounds
according to structure and show the similarity between compounds
assigned to the same cluster.

Kohonen Maps: this is an additional way of mapping similar compounds
by using the so-called self-organized topological feature maps,
which preserve the topology of a multidimensional representation
within a toroidal two-dimensional representation. The position of
the compounds in this map shows the level of structural similarity
among the EEC List 1 compounds.
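
The clustering step described above can be sketched in plain Python as follows. This is a minimal illustration and not the software used in the study: the descriptor vectors in X are invented, the function names are hypothetical, and "Pearson distance" is assumed here to mean one minus the Pearson correlation coefficient. In practice descriptors would normally be scaled before computing distances.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson_distance(a, b):
    """1 - Pearson correlation (undefined for constant vectors)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / (sa * sb)

# Each linkage reduces the list of pairwise distances between two clusters.
LINKAGES = {"complete": max, "single": min,
            "average": lambda ds: sum(ds) / len(ds)}

def cluster(points, n_clusters, metric=euclidean, linkage="complete"):
    """Agglomerative clustering: merge the two closest clusters
    until only n_clusters remain. Returns lists of point indices."""
    link = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = link([metric(points[p], points[q])
                          for p in clusters[i] for q in clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Invented descriptor vectors for five compounds (illustration only).
X = [[0.0, 0.1, 0.2], [0.1, 0.0, 0.3],
     [5.0, 5.2, 4.9], [5.1, 4.8, 5.0],
     [9.0, 0.0, 9.5]]
groups = cluster(X, 3)
print(groups)
```

Changing the `metric` and `linkage` arguments reproduces the kind of comparison described in the text; a compound that stays alone in its cluster across settings is a candidate for the "most dissimilar" set.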
These different chemometric approaches have shown that the
structurally most dissimilar compounds are:

N.   Substance               Chemical class
1    atrazine                Triazine
2    biphenyl                Aromatic
3    chloral hydrate         Chlorinated aliphatics
4    2,4,5-trichlorophenol   Benzene derivative
5    fluoranthene            PAH
6    lindane                 HCH
7    naphthalene             PAH
8    parathion               Organophosphate
9    phoxim                  Organophosphate
10   tributyltin chloride    Organotin
11   triphenyltin chloride   Organotin
REGRESSION MODELS
QSAR models were developed by Ordinary Least Squares (OLS)
regression. The best subset of variables for modelling the algal
toxicity of the studied compounds was selected by a Genetic
Algorithm (GA-VSS) approach. The models were validated by the
leave-one-out (LOO) and leave-more-out (LMO) procedures and by
scrambling of the responses.
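
The validation statistics reported for the models can be illustrated with a minimal sketch. It assumes the commonly used definitions R2 = 1 - RSS/TSS, Q2LOO = 1 - PRESS/TSS, SDEC = sqrt(RSS/n) and SDEP = sqrt(PRESS/n), and it uses a single-descriptor OLS model for brevity, whereas the models in this work use several descriptors selected by GA-VSS. The data and function names are invented for illustration, and leave-more-out is not covered.

```python
import math
import random

def fit_ols(x, y):
    """Least-squares slope and intercept for one descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def validate(x, y):
    """Return R2 and Q2LOO (in %), SDEC and SDEP for a one-descriptor model."""
    n = len(y)
    my = sum(y) / n
    tss = sum((b - my) ** 2 for b in y)
    slope, intercept = fit_ols(x, y)
    rss = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    press = 0.0
    for i in range(n):
        # Leave-one-out: refit without compound i, then predict it.
        s, c = fit_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (s * x[i] + c)) ** 2
    return {"R2": 100 * (1 - rss / tss),
            "Q2LOO": 100 * (1 - press / tss),
            "SDEC": math.sqrt(rss / n),
            "SDEP": math.sqrt(press / n)}

def scrambled_r2(x, y, n_trials=100, seed=0):
    """Mean R2 (%) after randomly permuting the responses."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        y_perm = y[:]
        rng.shuffle(y_perm)
        total += validate(x, y_perm)["R2"]
    return total / n_trials

# Invented descriptor values and toxicities (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
stats = validate(x, y)
print(stats)
print(scrambled_r2(x, y))
```

A model is considered predictive when Q2LOO stays close to R2 and SDEP close to SDEC; a scrambled-response R2 far below the real one indicates that the fit is not due to chance correlation.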
Model statistics. Panel titles: CONGENERIC COMPOUNDS (NITROBENZENES);
HETEROGENEOUS CONGENERIC COMPOUNDS; HETEROGENEOUS COMPOUNDS.

R2     Q2LOO   Q2LMO   SDEP    SDEC
77     69.7    69.7    0.709   0.619
78     62.1    61.7    0.751   0.573
93.9   91.8    87.5    0.342   0.296
nO is the number of O atoms, IDDM is the mean information content on
the distance degree magnitude, and E1e is a directional 3D-WHIM
descriptor of atomic distribution weighted by electronegativity. The
selected descriptors thus include a topological descriptor (IDDM),
which probably represents the heterogeneous compounds, and a 3D-WHIM
descriptor (E1e), which probably represents the homogeneous
compounds. The performance of this model is satisfactory,
considering that the data set is composed of structurally different
compounds and that for many of them the mechanism of action is
unknown.
nOH is the number of OH groups, Sp is the sum of polarizabilities
and Ds is a global 3D-WHIM descriptor accounting for the
electrotopological distribution. The information carried by these
descriptors is related to the electronic distribution of the atoms
in the molecule and is more specific with respect to the mode of
action than that of the descriptors selected in the heterogeneous
set models. The quality of this model is very satisfactory both in
fitting and in prediction.
nO is the number of O atoms and IDE is the mean information content
on the distance equality. A QSAR model was obtained with acceptable
fitting properties but without adequate predictive capability. This
is probably due to the presence of structurally dissimilar chemicals
with unknown mechanisms of action.
CONCLUSIONS
The chemometric analyses applied here proved very useful in ranking
the studied chemicals according to their structural similarity or
dissimilarity. In modelling structurally heterogeneous compounds
with unknown modes of action, the QSAR models obtained were not very
satisfactory. Specific parameters such as directional WHIMs, which
are capable of describing particular molecular features relevant to
a specific mode of action, always play an important role in QSAR
models for congeneric chemicals. As heterogeneity increases, so does
the role of structural and topological descriptors, which account
for general molecular features not related to a specific mode of
action.