Runs N' Poses Dataset
A dataset with 2,600 high-resolution protein-ligand systems released after 30 September 2021, the training cutoff used by AlphaFold 3, Protenix, Chai-1, and Boltz-1.
| No. of | Total | 
|---|---|
| Systems | 2,600 | 
| Ligands | 3,047 | 
| Ligands (incl. ions and artifacts) | 4,282 | 
| Multi-ligand systems | 401 | 
| Multi-protein systems | 790 | 
See the GitHub repository for more details.
Benchmark
This dataset was primarily created to accompany the plinder-org/runs-n-poses benchmark.
Format
| Column | Data Type | Description | 
|---|---|---|
| ligand_ccd_code | str | The ID of the ligand in the Chemical Component Dictionary | 
| ligand_pose | Mol | The 3D structure of the ligand | 
| ligand_smiles | str | The 2D graph structure of the ligand | 
| plinder_id | str | The ID of the system in the PLINDER dataset | 
| plinder_metadata | dict | Additional metadata of the system from the PLINDER dataset | 
| receptor | AtomArray | The 3D protein structure of the receptor, without the ligand | 
| sequences | dict | The canonical sequence (SEQRES) of all the receptor's chains | 
| similarity_to_train | float | The similarity to the closest system released before the training cutoff, measured with sucos_shape_pocket_qcov | 
| system_sequences_and_smiles | dict | The description of the complete system, including the canonical sequence of the receptor's chains and the 2D graph structure of all bound ligands | 
| system_structures_and_poses | AtomArray | The complete 3D structure of the bound system, including both receptor and ligand | 
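
As an illustration of how the `similarity_to_train` column might be used, the sketch below splits records into systems that are far from the training data and systems that are close to it. It is not part of the dataset's tooling: the helper name `split_by_similarity`, the example rows, the threshold, and the numeric scale of the similarity values are all assumptions made for illustration. Rows are represented as plain dictionaries keyed by the column names above, so no particular loading library is assumed.

```python
from typing import Any


def split_by_similarity(
    records: list[dict[str, Any]], threshold: float
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition records into (novel, similar) using `similarity_to_train`.

    A record counts as novel when its similarity is strictly below the threshold.
    """
    novel, similar = [], []
    for record in records:
        if record["similarity_to_train"] < threshold:
            novel.append(record)
        else:
            similar.append(record)
    return novel, similar


# Hypothetical rows: the IDs, CCD codes, and similarity values are made up.
records = [
    {"plinder_id": "system_a", "ligand_ccd_code": "AAA", "similarity_to_train": 12.5},
    {"plinder_id": "system_b", "ligand_ccd_code": "BBB", "similarity_to_train": 87.0},
    {"plinder_id": "system_c", "ligand_ccd_code": "CCC", "similarity_to_train": 50.0},
]

# The threshold is an arbitrary example value, not a recommended cutoff.
novel, similar = split_by_similarity(records, threshold=50.0)
print("novel:", [r["plinder_id"] for r in novel])
print("similar:", [r["plinder_id"] for r in similar])
```

Reporting results separately for the two groups is one way to check whether a method's accuracy depends on how closely a test system resembles its training data.
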
The source data for this ML-ready dataset can be found on Zenodo:

All systems in this dataset are derived from PLINDER's ingestion pipeline.
Citation
If you use this dataset in your research, please cite Runs N' Poses:
Škrinjar, P., Eberhardt, J., Durairaj, J., & Schwede, T.
"Have protein-ligand co-folding methods moved beyond memorisation?"
bioRxiv (2025): 2025-02