Communications
analysis adopts two approaches to assess the strengths and
weaknesses of the proposed shape similarity method in
identifying compounds of interest.
We begin with an analysis of three data sets: (i) the initial de
novo design set (323 computer-generated molecules), (ii) the
30 top-ranked compounds in terms of global fractal
dimensionality distance (GFD distance), and (iii) the 30
top-ranked compounds according to their topological
pharmacophore similarity (CATS distance) to (−)-englerin A.[15,19] Set (iii)
was included to compare the GFD ranking approach with the
CATS approach described previously.[13] As a first approach, we
extracted the molecular scaffolds (‘Murcko scaffolds’)[20] of these
compounds and analyzed their scaffold diversity in terms of the
pairwise Jaccard-Tanimoto coefficient (Tc; values in the
interval [0,1]) based on Morgan structural fingerprints (radius=2;
equivalent to ECFP4[16]). The 323 initial de novo designs
contained 152 unique scaffolds (47%) with high scaffold
diversity (Tc=0.18; lower values indicate greater diversity). The
30 top-ranked molecules according to GFD distance contained
24 unique (80%) and diverse (Tc=0.17) scaffolds, whereas the
30 top-ranked compounds by CATS distance comprised 19
unique scaffolds (63%) with slightly lower diversity (Tc=0.24).
Only two scaffolds were present in both top-ranking sets
(Supplementary Information).
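The scaffold statistics reported above reduce to a few set operations. The sketch below is a minimal, hypothetical Python illustration, assuming each Murcko scaffold is identified by a SMILES string and its Morgan (radius 2) fingerprint is given as a set of on-bit indices; in practice both would come from a cheminformatics toolkit such as RDKit, which is not used here. It also assumes that "scaffold diversity" means the mean pairwise Tc over the unique scaffolds. The function names and toy data are illustrative only, not the authors' code.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence, Tuple

# Assumption: a fingerprint is the set of indices of its "on" bits
# (toolkits usually return fixed-length bit vectors instead).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Jaccard-Tanimoto coefficient |A & B| / |A | B|, a value in [0, 1]."""
    union = len(a | b)
    # Two empty fingerprints: treat them as identical by convention.
    return len(a & b) / union if union else 1.0


def mean_pairwise_tc(fps: Sequence[Fingerprint]) -> float:
    """Mean Tanimoto over all unordered pairs; lower values mean a more diverse set."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


def scaffold_summary(scaffolds: Sequence[str],
                     scaffold_fp: Dict[str, Fingerprint]) -> Tuple[int, float, float]:
    """Return (number of compounds, proportion of unique scaffolds,
    mean pairwise Tc among the unique scaffolds)."""
    if not scaffolds:
        raise ValueError("empty compound list")
    unique = sorted(set(scaffolds))
    proportion_unique = len(unique) / len(scaffolds)
    diversity = mean_pairwise_tc([scaffold_fp[s] for s in unique])
    return len(scaffolds), proportion_unique, diversity


# Hypothetical toy data: scaffold SMILES of four compounds and made-up bit
# sets standing in for their Morgan (radius 2) fingerprints.
fps = {
    "c1ccccc1": frozenset({1, 2, 3, 4}),
    "C1CCCCC1": frozenset({5, 6, 7, 8}),
    "c1ccncc1": frozenset({1, 2, 3, 9}),
}
n, prop, tc = scaffold_summary(["c1ccccc1", "c1ccccc1", "C1CCCCC1", "c1ccncc1"], fps)
print(n, prop, round(tc, 2))  # 4 0.75 0.2
```

The same summary applied to the predicted-active subset of a top-30 list would yield triples of the form (predicted actives, proportion of unique scaffolds, pairwise Tc) used in the target-prediction comparison; for the full design set, 152 unique scaffolds out of 323 designs gives the quoted 47%.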
Second, we employed experimentally validated target-prediction
software developed in-house (self-organizing map-based
prediction of drug equivalence relationships, SPiDER)[21,22] to
provide an estimate of the likelihood of a given compound
being active against the target family ‘Transient Receptor
Potential Ion Channel’. The top 30 compounds retrieved by
screening with the GFD, USR, SHAEP, and ECFP4 methods were
analyzed to determine their predicted activity for the target
family (number of compounds predicted active with p<0.05), as
well as the proportion and diversity of the unique molecular
scaffolds among the predicted actives. Of the 30 top-ranked
designs, GFD retrieved nine compounds predicted as active,
each with a unique scaffold, and these scaffolds were highly
diverse (predicted actives=9, proportion of unique scaffolds=1.0,
diversity of unique scaffolds (pairwise Tc)=0.22). The SHAEP
approach retrieved fewer predicted actives, again all with
unique scaffolds (6, 1.0, 0.21). USR retrieved the same number
of predicted actives as SHAEP, but with fewer unique, albeit
highly diverse, scaffolds (6, 0.66, 0.12). ECFP4 retrieved ten
predicted actives, but with fewer and less diverse unique
scaffolds (10, 0.8, 0.33). Given that topological approaches were
used both for de novo library generation and for target
prediction, it is noteworthy that the GFD approach, which
captures substructural information only implicitly, achieved
predicted-active retrieval similar to that of ECFP4 under an
evaluation based on topological methods. We also performed
activity prediction and diversity analysis for the library in its
entirety (predicted actives=25%, scaffold diversity Tc=0.18). In
summary, SHAEP and USR had a slightly smaller proportion of
predicted actives in their top-ranked lists (20% for each) than
the entire set, with variation in number and diversity of
retrieved scaffolds. ECFP4 and GFD retrieved ten and nine
predicted actives (33% and 30% respectively), with GFD having
Figure 1. a) Molecular structure of the natural product (−)-englerin A, b) a
discrete representation of its Connolly surface (grey dots) for a generated
conformer, and c) an illustration of the behavior of the point-inclusion
sphere and the calculation of the fractal dimension, D. For each point in the
surface representation, the relationship between the proportion of points,
Ĉ(δ), lying within an inclusion sphere of radius δ and that radius is stored.
These relationships are then combined and summarized by their gradient,
providing an unbiased estimate of the molecule’s fractal dimensionality.
GFD can be calculated for any small molecule or macromolecule, and enables
shape-based screening with a simple distance-from-template measure.
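The procedure in the caption can be sketched as follows. This is not the authors' implementation (their definition is Eq. (1) in the Supplementary Information); it is a generic correlation-dimension-style estimate that assumes Ĉ(δ) is averaged over all surface points and that the "gradient" is the least-squares slope of log Ĉ(δ) against log δ. It further assumes each molecule is summarized by a single scalar GFD value, so the Euclidean distance to the template reduces to an absolute difference. All function names, radii, and test point sets are hypothetical.

```python
import math
from bisect import bisect_right
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def fractal_dimension(points: Sequence[Point], deltas: Sequence[float]) -> float:
    """Estimate D as the least-squares slope of log C(delta) versus log delta.

    C(delta) is the fraction of the other surface points inside an inclusion
    sphere of radius delta, averaged over every point (an assumption; not
    necessarily Eq. (1) of the Supplementary Information).
    """
    n = len(points)
    if n < 2 or len(deltas) < 2:
        raise ValueError("need at least two points and two radii")
    # Each unordered pair is counted once; sorting allows counting by bisection.
    dists = sorted(math.dist(p, q) for p, q in combinations(points, 2))
    xs: List[float] = []
    ys: List[float] = []
    for delta in deltas:
        pairs_within = bisect_right(dists, delta)
        # Mean over points of (neighbours within delta) / (n - 1); every pair
        # contributes a neighbour to both of its points, hence the factor 2.
        c = 2 * pairs_within / (n * (n - 1))
        if c == 0:
            raise ValueError(f"no neighbours within delta={delta}")
        xs.append(math.log(delta))
        ys.append(math.log(c))
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def rank_by_gfd(template_d: float, library: Dict[str, float]) -> List[str]:
    """Order candidates by |D_candidate - D_template| (1-D Euclidean distance)."""
    return sorted(library, key=lambda name: abs(library[name] - template_d))


def fibonacci_sphere(n: int, radius: float = 1.0) -> List[Point]:
    """Roughly uniform points on a sphere surface, used here only as test data."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((radius * r * math.cos(theta), radius * r * math.sin(theta), radius * z))
    return points


radii = [0.3, 0.4, 0.5, 0.6, 0.8, 1.0]
sphere_d = fractal_dimension(fibonacci_sphere(800), radii)
circle = [(math.cos(2 * math.pi * k / 400), math.sin(2 * math.pi * k / 400), 0.0)
          for k in range(400)]
circle_d = fractal_dimension(circle, radii)
print(round(sphere_d, 2), round(circle_d, 2))  # roughly 2 (sphere surface) and 1 (circle)
print(rank_by_gfd(2.0, {"design_a": 2.3, "design_b": 1.95, "design_c": 2.6}))
# ['design_b', 'design_a', 'design_c']
```

The sphere and circle serve only as sanity checks with known dimensions; pooling the pair counts before fitting, rather than fitting one gradient per surface point and then combining, is a design choice of this sketch.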
growth inhibition of cancer cell lines.[11,12] Using (−)-englerin
A as a template, we previously generated small-molecule
mimetics by ligand-based, chemical reaction-driven de novo
design.[13] By topological pharmacophore-based scoring and
manual refinement of the computational designs, we identified
natural product mimetics inhibiting the TRP melastatin 8
(TRPM8) calcium-permeable cation channel, which is also inhibited by
(−)-englerin A.[12,13] Here, we extend that preliminary
study by introducing the fractal dimensionality pseudo-metric.
Given that the previously employed computational design
method (design of genuine structures, DOGS)[14] and the
pharmacophore similarity metric (chemically advanced template
search, CATS)[15] each rely on two-dimensional molecular
representations, we investigated the use of fractal dimensionality
as an orthogonal similarity ranking approach that takes the
spatial disposition of molecules into account. The library of 903
in silico structures employed in our previous study contained only
323 unique de novo designed small molecules, owing to
redundancy in the original set of suggested molecules.
We ranked these computer-generated designs according to
their Euclidean distance from (−)-englerin A in terms of their
global fractal dimensionality (GFD) (Supplementary Information,
Eq. (1)). To assess the potential of GFD as a shape-based
descriptor for this target case, we conducted a comparative,
retrospective analysis of the chemical space retrieved by this
method against that retrieved by the gold-standard structural
fingerprint (extended-connectivity fingerprints, ECFP4)[16] and
two open-source shape-based methods: an alignment-free
approach (ultrafast shape recognition, USR)[17] and an
alignment-based approach (molecular overlay based on shape
and electrostatic potential, SHAEP).[18] Given that we
lack a ground truth in this case, i.e., experimental activity data
for each molecule in our compound library, our retrospective
ChemMedChem 2020, 15, 1–6
© 2020 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim