A common challenge in drug design is finding chemical modifications to a ligand that increase its affinity for the target protein. An underutilized advance is the increase in structural biology throughput, which has progressed from an artisanal endeavor to a monthly throughput of hundreds of different ligands against a protein at modern synchrotrons. However, the missing piece is a framework that turns high-throughput crystallography data into predictive models for ligand design. Here, we designed a simple machine learning approach that predicts protein-ligand affinity from experimental structures of diverse ligands against a single protein, paired with biochemical measurements. Our key insight is to represent protein-ligand complexes with physics-based energy descriptors and to use a learning-to-rank approach that infers the relevant differences between binding modes. We ran a high-throughput crystallography campaign against the SARS-CoV-2 main protease (MPro), obtaining parallel measurements of over 200 protein-ligand complexes and their binding activities. This allowed us to design one-step library syntheses that improved the potency of two distinct micromolar hits by over 10-fold, arriving at a noncovalent and nonpeptidomimetic inhibitor with 120 nM antiviral efficacy. Crucially, our approach successfully extends ligands to unexplored regions of the binding pocket, executing large and fruitful moves in chemical space with simple chemistry.
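The learning-to-rank idea mentioned above can be illustrated with a minimal sketch. This is not the authors' model: it assumes a plain linear scoring function trained with a pairwise logistic loss, and the three descriptor columns, the toy data, and all function names (`train_pairwise_ranker`, `pairwise_accuracy`) are hypothetical placeholders chosen for illustration. The point is only that the model learns from differences between pairs of complexes rather than from absolute affinity values.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def score(weights, descriptors):
    """Linear score; a higher score means the ligand is predicted to be more active."""
    return sum(w * d for w, d in zip(weights, descriptors))


def ordered_pairs(activities):
    """All index pairs (i, j) where ligand i is measured as more active than ligand j."""
    n = len(activities)
    return [(i, j) for i in range(n) for j in range(n) if activities[i] > activities[j]]


def train_pairwise_ranker(features, activities, epochs=200, lr=0.1):
    """Fit weights by stochastic gradient descent on the pairwise logistic loss."""
    weights = [0.0] * len(features[0])
    pairs = ordered_pairs(activities)
    for _ in range(epochs):
        for i, j in pairs:
            # The model only sees the descriptor difference between the two complexes.
            diff = [a - b for a, b in zip(features[i], features[j])]
            margin = score(weights, diff)
            # Gradient of log(1 + exp(-margin)) w.r.t. weights is -sigmoid(-margin) * diff.
            step = lr * sigmoid(-margin)
            weights = [w + step * d for w, d in zip(weights, diff)]
    return weights


def pairwise_accuracy(weights, features, activities):
    """Fraction of ligand pairs whose measured order is reproduced by the scores."""
    pairs = ordered_pairs(activities)
    correct = sum(score(weights, features[i]) > score(weights, features[j]) for i, j in pairs)
    return correct / len(pairs)


# Synthetic toy data (NOT real measurements): 20 complexes, 3 hypothetical
# energy-like descriptors, with activity defined so that lower energy is better.
rng = random.Random(0)
toy_features = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
toy_activities = [-(1.0 * f[0] + 0.5 * f[1]) for f in toy_features]

weights = train_pairwise_ranker(toy_features, toy_activities)
accuracy = pairwise_accuracy(weights, toy_features, toy_activities)
print(f"learned weights: {weights}")
print(f"pairwise ranking accuracy on the toy set: {accuracy:.2f}")
```

Training on pairs means that only the relative order of activities matters, which is one plausible reason a ranking objective suits noisy biochemical readouts; in practice the descriptors would come from a physics-based scoring of each crystal structure rather than from random numbers.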

Original publication




Journal article


Proc Natl Acad Sci U S A

Publication Date





crystallography, drug design, machine learning, Humans, Ligands, COVID-19, SARS-CoV-2, Antiviral Agents, Biology