ACS Infectious Diseases
Article
inspected and compared to its relevant paper. Using this
quality control process, any erroneous data could be noted or
discarded from our study. Following the comprehensive
manual inspection process, we found 72 data sets pertaining
to drug-sensitive S. aureus, 46 data sets pertaining to drug-
resistant S. aureus, and 21 data sets that were classified as
unknown. The drug-resistant sets contained assay results
primarily for MRSA but also included assay results against
fluoroquinolone-resistant or linezolid-resistant S. aureus, as
well. Unfortunately, one of the CSV files from the drug-
resistant set, AID 548647, had erroneous data and was,
therefore, discarded. The data within the CSV files and SDFs
for each of the drug-resistant sets were collated into a
spreadsheet in Discovery Studio 4.0 (BIOVIA, Inc., San Diego,
CA). For consistency, the μM sets were converted to μg/mL.
We then deleted duplicate compounds using a custom script in
Pipeline Pilot 9.1 (BIOVIA, Inc., San Diego, CA) and also
removed compounds with a molecular weight >850 g/mol.
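The curation steps described above (unit conversion, duplicate removal, and the molecular weight cutoff) can be sketched in a few lines. This is only an illustration, not the authors' Pipeline Pilot script: the `Record` type, its field names, and the rule of keeping the first occurrence of a duplicate are assumptions, since the source does not state how duplicates were resolved.

```python
from dataclasses import dataclass

MW_CUTOFF = 850.0  # g/mol, the cutoff quoted in the text


@dataclass(frozen=True)
class Record:
    compound_id: str   # hypothetical unique key for a structure
    mol_weight: float  # g/mol
    mic: float         # reported MIC value
    unit: str          # "uM" or "ug/mL"


def mic_in_ug_per_ml(rec):
    """Return the MIC in ug/mL, converting from uM when needed."""
    if rec.unit == "ug/mL":
        return rec.mic
    if rec.unit == "uM":
        # umol/L * g/mol = ug/L; divide by 1000 to obtain ug/mL
        return rec.mic * rec.mol_weight / 1000.0
    raise ValueError(f"unsupported unit: {rec.unit}")


def curate(records):
    """Drop heavy compounds and duplicates; return (id, MIC in ug/mL) pairs."""
    seen = set()
    kept = []
    for rec in records:
        if rec.mol_weight > MW_CUTOFF:
            continue  # exceeds the molecular weight cutoff
        if rec.compound_id in seen:
            continue  # assumption: keep only the first occurrence
        seen.add(rec.compound_id)
        kept.append((rec.compound_id, mic_in_ug_per_ml(rec)))
    return kept
```

For example, a compound of 500 g/mol with an MIC of 10 μM corresponds to 10 × 500 / 1000 = 5 μg/mL.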
These descriptors were then combined in different ways, with
different weights applied to the sets of different descriptor
combinations, until the most accurate model was produced,
according to 5-fold internal cross-validation.
identify and discard any reactive compounds or promiscuous
inhibitors (e.g., Michael acceptors, rhodanines, etc.). Also
removed were compounds that were visually deemed too
similar to the training set to be novel (e.g., tetracyclines or
almost all fluoroquinolones), or compounds with chemotypes
that were not of general interest (i.e., peptides or fatty
acids).²³,⁶⁰ Forty-nine of the top 50 compounds from MRSA_1a did
not pass inspection. Therefore, we also inspected the next 50
top-scoring compounds from this model. Unsurprisingly, the
highest-ranking compound from MRSA_1a was a fluoroquinolone;
instead of dismissing it, we included it among this
model’s predictions to be experimentally validated. We
selected an additional 9 top-ranking compounds from MRSA_1a, as
well as the top 5 compounds from MRSA_1b that
passed inspection. All candidate compounds were purchased
and assessed by LC/MS for ≥95% purity and the expected
parent ion in the mass spectrum.
Training and Test Sets. In total, our full training set from
PubChem contained 1633 compounds, of which 1043
compounds (63.8%) were designated as active (MIC ≤ 10
μg/mL). This full training set (MRSA_1a) encompassed a
small yet fairly diverse portion of chemical space, which
included known antibacterial scaffolds such as tetracyclines,
fluoroquinolones, and β-lactams. However, it also included
rhodanines (which are known to be promiscuous Pan Assay
Drug Susceptibility Assays. MIC assays were performed in 96-
well microtiter plates. A DMSO stock solution of each
compound was added to the first column and serially diluted
across the columns of the plate. The last column of the plate
contained no drug and served as a no-drug control. Overnight
cultures of the bacterium being tested (MRSA ATCC 43300,
MSSA ATCC 25923, or the aforementioned VRSA/VISA
Interference compounds, called “PAINS”²³,⁶⁰). We hypothesized
that removing (or pruning) these problematic compounds, as well
as known antibacterial chemotypes, from the training set would
help teach the Bayesian model to further focus on new chemical
scaffolds, which should increase the likelihood
that they display novel mechanisms of action.²⁶ It is important
to note, however, that our M. tuberculosis Bayesian models have
routinely found hits which differ significantly from the model
testing set (pairwise Tanimoto similarity <0.7). Utilizing this
strategy, we manually pruned the tetracyclines, fluoroquino-
lones, rhodanines, and β-lactams from the actives subset of
MRSA_1a to produce a new training set (MRSA_1b) that
contained 1247 compounds, of which 657 compounds were
classified as active (52.7%).
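The pairwise Tanimoto similarity threshold mentioned above (<0.7) can be illustrated with a minimal sketch. It assumes each compound is represented as a set of "on" fingerprint bits; the source does not say which fingerprints were used for this comparison, and the function names here are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def max_similarity(candidate, reference_set):
    """Highest Tanimoto similarity of a candidate to any reference compound."""
    return max((tanimoto(candidate, ref) for ref in reference_set), default=0.0)


def is_dissimilar(candidate, reference_set, threshold=0.7):
    """True when the candidate is below the threshold against every reference."""
    return max_similarity(candidate, reference_set) < threshold
```

A hit would count as differing significantly from a reference set when `is_dissimilar` returns `True` for it.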
strains) were diluted 1000-fold (2 × 10³ cells), and 100 μL was
used as an inoculum in each well. MICs were determined by
visual inspection for a pellet after 18 h incubation at 37 °C.
Compounds were tested for cytotoxicity against Vero cells
using the CellTiter 96 AQueous One Solution kit (Promega).
Vero cells were seeded in 96-well plates at a density of 2 × 10⁴
cells per well, and the plates were incubated for 4 h at 37 °C to
allow attachment of the Vero cells. Compounds were then
added to the wells, starting from a final concentration of
50 μg/mL and making 12 serial 1:2 dilutions. Cells were incubated
for 72 h at 37 °C. Then, 20 μL of freshly prepared MTS:PMS reagent
was added to each well. The plates were incubated for 2 h, and
the absorbance was then read at 490 nm.
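The concentrations tested in the cytotoxicity assay follow from the starting concentration and the 1:2 dilution scheme. The sketch below assumes that the series comprises 12 concentrations in total, with 50 μg/mL as the first; the text could also be read as 12 dilutions after the first well, which would add one more point.

```python
def twofold_series(start, n_points):
    """Concentrations of a serial 1:2 dilution, highest first."""
    return [start / (2 ** i) for i in range(n_points)]


# Assumed reading: 12 wells, beginning at a final concentration of 50 ug/mL
cytotoxicity_concentrations = twofold_series(50.0, 12)
```

Under this reading, the lowest concentration is 50 / 2¹¹ ≈ 0.024 μg/mL.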
Another training set that we used as a reference was based
on the Broad Institute’s assay results against methicillin-
sensitive S. aureus (Broad_MSSA). This training set has 10 934
compounds, of which 193 compounds (1.77%) were defined as
“active” according to the Z-factor threshold selected by the
researchers at the Broad Institute (Tali Mazor, assay names:
known antibacterials, such as levofloxacin, gatifloxacin, β-
lactams, tetracyclines, and fluoroquinolones.
Mouse Pharmacokinetic Study. Animals and ethics
assurance: Animal studies were carried out in accordance
with the Guide for the Care and Use of Laboratory Animals of
the National Institutes of Health, with approval from the
Institutional Animal Care and Use Committee (IACUC) of
the New Jersey Medical School, Rutgers University, Newark.
All animals were maintained under specific pathogen-free
conditions and fed water and chow ad libitum, and all efforts
were made to minimize suffering or discomfort. Two female
CD-1 mice received a single dose of experimental compound
administered orally at 25 mg/kg in 20% DMA/80% PEG300,
These data sets were used as independent training sets to
create two different, new drug-resistant S. aureus machine
learning models and the reference MSSA model in Pipeline
Pilot 9.1. These Bayesian models utilized nine different
descriptors: AlogP, molecular weight, number of rings, number
of aromatic rings, number of rotatable bonds, number of
hydrogen bond donors, number of hydrogen bond acceptors,
molecular fractional polar surface area, and molecular function
class fingerprints of maximum diameter 6 (FCFP_6, character-
izing the 2D substructures up to and including 6 rings/zones/
and blood samples were collected in K₂EDTA-coated tubes
predose and at 0.5, 1, 3, and 5 h postdose. Blood was kept on ice and
centrifuged to recover plasma, which was stored at −80 °C
until analyzed by HPLC coupled to tandem mass spectrometry
(LC-MS/MS).
dimensions of topology).⁶¹,⁶²
Together, these descriptors define the physicochemical
properties of each compound as a whole and the 2D substructures
of different regions within it.
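To make the idea of a descriptor-based Bayesian model concrete, the sketch below implements a Laplacian-corrected naive Bayes scorer over sets of features (for instance, fingerprint bits and binned descriptor values). This is an assumption about the general form of such models, not a reproduction of the Pipeline Pilot implementation or of the descriptor weighting used in this study; all function and variable names are hypothetical.

```python
import math
from collections import Counter


def train(feature_sets, labels):
    """Return a per-feature weight: log((A + 1) / (N * p + 1)).

    N is the number of compounds containing the feature, A is the number of
    those that are active, and p is the overall fraction of actives.
    """
    if not labels:
        raise ValueError("training set is empty")
    base_rate = sum(1 for y in labels if y) / len(labels)
    seen = Counter()
    seen_active = Counter()
    for feats, is_active in zip(feature_sets, labels):
        for f in set(feats):
            seen[f] += 1
            if is_active:
                seen_active[f] += 1
    return {
        f: math.log((seen_active[f] + 1.0) / (seen[f] * base_rate + 1.0))
        for f in seen
    }


def score(weights, feats):
    """Sum the weights of a compound's features; unseen features add 0."""
    return sum(weights.get(f, 0.0) for f in set(feats))
```

Higher scores indicate that a compound shares more features with the actives in the training set than would be expected from the base rate alone.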
LC-MS/MS analytical methods: LC-MS/MS quantitative
analysis for all molecules was performed on a Sciex Applied