Small Molecule Guidelines

Crafted by an industry steering committee. Read the guidelines on
method evaluation, method comparison, and dataset curation.

The Nuances of Benchmarking

Applying ML to small-molecule predictive modeling tasks in drug discovery poses unique challenges, such as complex, limited datasets and the need for interdisciplinary expertise. Recognizing this, we formed an industry steering committee to develop comprehensive guidelines and resources for the community.

By pooling insights from decades of experience, our steering committee aims to set new standards for method evaluation (e.g., dataset splitting, evaluation metrics), method comparison (e.g., statistical tests), and data curation.

We started with a call to action. In our letter, we outline common pitfalls and challenges in benchmarking that contribute to a growing gap between ML innovations and their practical impact on drug discovery programs. We believe that an open-science, cross-industry, and interdisciplinary effort is a crucial first step toward addressing these challenges, and we invite other experts to join us.

Read The Letter

Correspondence in Nature Machine Intelligence

Meet the Steering Committee

Pat Walters

Jeremy Ash

Alan Cheng

Cas Wognum

Djork-Arné Clevert

Raquel Rodríguez-Pérez

Daniel Price

Ola Engkvist

Cheng Fang

Matteo Aldeghi

Publication Timeline

This is what we’re starting with. Have some ideas? Let us know!

November 2024

Guideline Paper

Method Comparison

To contextualize the results of a new ML method, its performance is typically compared to the state of the art and to baselines. This paper proposes guidelines for making that comparison robustly in small-molecule predictive modeling, so that your conclusions can be expected to generalize to similar datasets and real-world use cases.
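As a rough illustration of what a statistical method comparison can look like, the sketch below runs an exact paired sign-flip permutation test on per-fold scores of two methods. It is not taken from the guideline paper: the function name `paired_sign_flip_test` and the choice of test are assumptions made for this example, and the paper may recommend different procedures.

```python
import itertools
from statistics import mean


def paired_sign_flip_test(scores_a, scores_b):
    """Exact two-sided paired permutation test on per-fold score differences.

    Hypothetical helper for illustration only. Both score lists must come
    from the same folds, in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both methods need one score per fold")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    # Under the null hypothesis the sign of each paired difference is
    # arbitrary, so enumerate every sign assignment (2**n of them).
    extreme = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(mean(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float noise
            extreme += 1
    return mean(diffs), extreme / total


# Made-up scores for two methods on the same five folds.
mean_diff, p_value = paired_sign_flip_test([2, 3, 4, 5, 6], [1, 2, 3, 4, 5])
```

Scores from cross-validation folds share training data, so the independence the test assumes holds only approximately; treat the p-value as a guide rather than a guarantee.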

Read the Paper

April 2025

Guideline Paper

Splitting Methods

To prevent models from merely memorizing training data (a problem known as overfitting), it's crucial to ensure that the similarity between training and test sets reflects real-world applications. This paper provides guidelines for small-molecule predictive modeling on how to measure generalization in a way that aligns with practical use cases.
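One common way to control train/test similarity is to split by group, so that related molecules never straddle the two sets. The sketch below assumes each molecule already carries a group label (for example a scaffold or cluster ID computed elsewhere); the function `group_split` and its parameters are hypothetical and are not the procedure from the guideline paper.

```python
import random
from collections import defaultdict


def group_split(group_ids, test_fraction=0.2, seed=0):
    """Split indices into train/test so no group appears in both sets.

    Hypothetical helper for illustration. `group_ids[i]` is the group label
    of molecule i; how the labels are derived is left to the caller.
    """
    groups = defaultdict(list)
    for index, group in enumerate(group_ids):
        groups[group].append(index)

    # Shuffle whole groups, not molecules, with a fixed seed for repeatability.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    target = test_fraction * len(group_ids)
    train, test = [], []
    for key in keys:
        # Fill the test set group by group until it reaches the target size.
        if len(test) < target:
            test.extend(groups[key])
        else:
            train.extend(groups[key])
    return sorted(train), sorted(test)


# Made-up labels: ten molecules in five groups of two.
labels = ["a", "a", "b", "b", "c", "c", "d", "d", "e", "e"]
train_idx, test_idx = group_split(labels, test_fraction=0.2, seed=0)
```

Because whole groups are assigned at once, the test set can overshoot the requested fraction when groups are large; whether that trade-off is acceptable depends on the use case being mimicked.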

July 2025

Guideline Paper

Data Curation

Large industrial datasets are rarely published due to competitive and intellectual property concerns. Therefore, drug discovery benchmarks often rely on public databases like ChEMBL. Curating datasets from these sources requires deep expertise in data generation processes and data modalities. To address this challenge, we propose guidelines for curating small-molecule datasets.

We Want Your Feedback

Have thoughts on the papers? Think we’re missing something? We’d love to collaborate with you on how we can improve the state of benchmarking for small molecules. Reach out to the steering committee using the button below.

