ChemComm
COMMUNICATION
Discovery of SARS-CoV-2 main protease inhibitors using a synthesis-directed de novo design model†
Cite this: Chem. Commun., 2021, 57, 5909
Aaron Morris,‡a William McCorkindale,‡b The COVID Moonshot Consortium,c Nir Drayman,d John D. Chodera,e Savaş Tay,d Nir London,f and Alpha A. Lee*a
Received 5th January 2021, Accepted 4th May 2021
DOI: 10.1039/d1cc00050k
The SARS-CoV-2 main viral protease (Mpro) is an attractive target for antivirals given its distinctiveness from host proteases, essentiality in the viral life cycle and conservation across Coronaviridae. We launched the COVID Moonshot initiative to rapidly develop patent-free antivirals with open science and open data. Here we report the use of machine learning for de novo design, coupled with synthesis route prediction, in our campaign. We discover novel chemical scaffolds active in biochemical and live virus assays, synthesized with model-generated routes.
Coronaviruses are a family of pathogens that is frequently associated with serious and highly infectious human diseases, from the common cold to the SARS-CoV outbreak (2003, 774 deaths, 11% fatality rate), the MERS-CoV outbreak (2012, 858 deaths, 34% fatality rate) and, most recently, the COVID-19 pandemic (ongoing, 1.7 million deaths up to December 2020). The main protease (Mpro) is one of the best-characterized drug targets for direct-acting antivirals.1,2 Mpro is essential for viral replication and its binding site is distinct from those of known human proteases, so inhibitors are unlikely to be toxic.3,4 Moreover, the high degree of conservation across different coronaviruses makes Mpro a promising target for pan-coronavirus antivirals.5 To date, most reported Mpro inhibitors are peptidomimetics, covalent, or both.2 Peptidomimetics are challenging to develop into oral therapeutics, and covalent inhibitors incur additional idiosyncratic toxicity risks. We launched the COVID Moonshot consortium in March 2020, aiming to find oral antivirals against COVID-19 in an open-science, patent-free manner.6
Here we report the prospective use of a simple model to rapidly expand hits. Starting from 42 compounds with IC50 within the assay dynamic range (<100 μM) and 515 inactives, our model designed 5 new compounds predicted to have higher activity, together with predicted synthetic routes. All designs were chemically synthesized and experimentally tested, and 3 have measurable activity against Mpro. The top compound has Mpro inhibition comparable to the best compound in the training set, but with a different scaffold, and is active against the OC43 coronavirus in a live virus assay.
Algorithmic de novo design aims to automatically generate compounds that are chemically diverse, synthetically accessible and biologically active.7 Classic approaches apply heuristics to fragment and modify known active compounds, so the region of chemical space explored and the synthetic accessibility of the designs are constrained by those rules.8,9,10 Recent machine learning approaches explore chemical space in more abstract molecular representation spaces,11,12 but this often comes at the expense of synthetic accessibility.13 Our approach builds on rule-based fragmentation and molecule generation, but employs a method that combines regression and classification to cope with noisy data, and uses machine learning to predict synthesis routes. Our model comprises two parts: compound prioritisation and chemical space exploration.
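This page does not spell out how regression and classification are combined. One common way to use both measured IC50 values and inactive compounds, which are only known to have IC50 > 100 μM (a right-censored value), is to train on pairwise comparisons. The sketch below is a hypothetical illustration of that idea, not the authors' implementation: for each pair whose ordering is determined, it records a "more potent" classification label, and it adds a pIC50 difference as a regression target only when both IC50s were measured. The function names, the `ASSAY_LIMIT_UM` constant and the toy compounds are invented for illustration.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical constant: upper limit of the assay dynamic range quoted in the
# text (IC50 < 100 uM). Inactive compounds are stored as None, meaning
# "IC50 > ASSAY_LIMIT_UM" (a right-censored measurement).
ASSAY_LIMIT_UM = 100.0

# (compound_a, compound_b, label, delta_pic50)
Pair = Tuple[str, str, int, Optional[float]]


def pic50(ic50_um: float) -> float:
    """pIC50 = -log10(IC50 in mol/L) = 6 - log10(IC50 in umol/L)."""
    return 6.0 - math.log10(ic50_um)


def pair_label(ic50_a: Optional[float], ic50_b: Optional[float]) -> Optional[int]:
    """Return 1 if A is more potent than B, 0 if less potent, None if unknown."""
    if ic50_a is None and ic50_b is None:
        return None  # two censored inactives cannot be ranked against each other
    if ic50_b is None:
        return 1  # a measured active (< limit) beats an inactive (> limit)
    if ic50_a is None:
        return 0
    if ic50_a == ic50_b:
        return None  # ties carry no ranking information
    return 1 if ic50_a < ic50_b else 0


def build_pairs(data: Dict[str, Optional[float]]) -> List[Pair]:
    """Return (a, b, label, delta_pic50) for every pair whose order is known.

    delta_pic50 is a regression target available only when both IC50s were
    measured; pairs involving an inactive keep only the classification label.
    """
    for name, value in data.items():
        if value is not None and value >= ASSAY_LIMIT_UM:
            raise ValueError(f"{name}: measured IC50 must lie inside the assay range")
    pairs: List[Pair] = []
    for a, b in combinations(sorted(data), 2):
        label = pair_label(data[a], data[b])
        if label is None:
            continue
        ia, ib = data[a], data[b]
        delta = pic50(ia) - pic50(ib) if ia is not None and ib is not None else None
        pairs.append((a, b, label, delta))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy data: two actives (IC50 in uM) and two inactives.
    toy = {"cpd1": 5.0, "cpd2": 50.0, "cpd3": None, "cpd4": None}
    for pair in build_pairs(toy):
        print(pair)
```

A model trained on such pairs can rank a new design against the incumbent, which a plain active/inactive threshold cannot do: at a 100 μM cutoff, cpd1 and cpd2 in the toy data would both be labelled active even though their IC50s differ tenfold.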
a PostEra Inc, 2 Embarcadero Centre, San Francisco, CA 94111, USA.
E-mail: alpha.lee@postera.ai
b Department of Physics, University of Cambridge, CB3 0HE, UK
c The COVID Moonshot Consortium. Web: www.postera.ai/covid
d The Pritzker School for Molecular Engineering, The University of Chicago,
Chicago, IL, USA
e Computational and Systems Biology Program, Sloan Kettering Institute,
Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
f Department of Organic Chemistry, The Weizmann Institute of Science, Rehovot,
76100, Israel
Our compound prioritisation model aims to predict whether
a designed compound is likely to be an improvement in activity
over the incumbent. However, as is typical at the hit-expansion
stage, bioactivity modelling is hindered by scarce data, in which
the majority of compounds are inactive, and by noisy data, as
measurement variability increases for lower-affinity
compounds. Thresholding the data and framing the problem
as classification of active/inactive would not allow us to
rank compounds based on predicted improvement over the
incumbent, yet the amount of measured bioactivity data
† Electronic supplementary information (ESI) available: Experimental and assay
details, and the full list of contributors in the COVID Moonshot Consortium. Our
training set, de novo design method and generated molecules are available on
‡ These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2021
Chem. Commun., 2021, 57, 5909–5912 | 5909