
This dataset has not yet been certified by approved reviewers. It may contain issues related to data completeness and quality.

Dataset

molecularml/moleculeace-chembl3979-ec50

Bioactivity data from ChEMBL assay CHEMBL3979 for the protein Nuclear hormone receptor subfamily 1 group C member 2, intended for molecular machine learning with activity cliffs.

Created on: July 24, 2024
Dataset size: 32 KB
Number of datapoints: 1,125
Public

Tags

Activity Cliff
QSAR

Modalities

MOLECULE

Details

README

Background

This dataset comprises collected and curated bioactivity data for the target Nuclear hormone receptor subfamily 1 group C member 2 from ChEMBL assay CHEMBL3979. It is used to evaluate the performance of various machine learning algorithms on activity cliffs. We employed classical machine learning methods combined with common molecular descriptors, as well as neural networks based on unstructured molecular data such as molecular graphs or SMILES strings.

Activity cliffs are pairs of molecules with small differences in structure but large differences in potency. They play an important role in drug discovery, but the bioactivity of activity cliff compounds is notoriously difficult to predict.
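To make the definition concrete, here is a minimal sketch of how a pair of compounds could be flagged as an activity cliff. It is illustrative only and is not the procedure used to build this dataset. The function names, the use of Tanimoto similarity on sets of fingerprint bits, and both thresholds (similarity of at least 0.9, potency difference of at least 10-fold) are assumptions chosen for the example.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def find_cliff_pairs(fingerprints, potencies_nm,
                     sim_threshold=0.9, fold_threshold=10.0):
    """Return index pairs that are structurally similar but differ in potency.

    Both thresholds are hypothetical defaults, not values taken from
    the dataset description.
    """
    if any(p <= 0 for p in potencies_nm):
        raise ValueError("potencies must be positive concentrations")

    pairs = []
    for i, j in combinations(range(len(fingerprints)), 2):
        similarity = tanimoto(fingerprints[i], fingerprints[j])
        # Fold difference is the ratio of the weaker to the stronger potency.
        high = max(potencies_nm[i], potencies_nm[j])
        low = min(potencies_nm[i], potencies_nm[j])
        if similarity >= sim_threshold and high / low >= fold_threshold:
            pairs.append((i, j))
    return pairs


# Toy data: molecules 0 and 1 share 19 of 21 bits, molecule 2 is unrelated.
toy_fingerprints = [set(range(20)), set(range(19)) | {20}, set(range(100, 120))]
toy_potencies_nm = [5.0, 800.0, 6.0]
cliff_pairs = find_cliff_pairs(toy_fingerprints, toy_potencies_nm)
```

In practice the bit sets would come from a cheminformatics toolkit; plain Python sets are used here only to keep the sketch self-contained.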

Description of readouts

  • exp_mean [nM]: Agonism [Half-Maximal Effective Concentration, EC50]
  • y: Negative log transform of the bioactivity value (exp_mean).
  • split: Train-test split based on activity cliffs.
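The relationship between the readouts can be sketched as follows. The description does not state the logarithm base or whether the value is converted from nM to molar first, so this example assumes base 10 and exposes the unit conversion as an optional flag. The function names, the `to_molar` flag, the example rows, and the split labels "train" and "test" are all hypothetical.

```python
import math


def negative_log10(exp_mean_nm: float, to_molar: bool = False) -> float:
    """Negative base-10 log of an EC50 given in nM.

    Assumption: base 10. With to_molar=True the value is first converted
    to molar (1 nM = 1e-9 M), which yields the conventional pEC50.
    """
    if exp_mean_nm <= 0:
        raise ValueError("EC50 must be a positive concentration")
    y = -math.log10(exp_mean_nm)
    # -log10(x * 1e-9) equals 9 - log10(x), so the conversion is a shift by 9.
    return y + 9.0 if to_molar else y


def partition_by_split(rows):
    """Group rows by the value of their 'split' column."""
    groups = {}
    for row in rows:
        groups.setdefault(row["split"], []).append(row)
    return groups


# Hypothetical rows using the column names from the readout list.
rows = [
    {"exp_mean [nM]": 100.0, "split": "train"},
    {"exp_mean [nM]": 1.0, "split": "test"},
    {"exp_mean [nM]": 1000.0, "split": "train"},
]
for row in rows:
    row["y"] = negative_log10(row["exp_mean [nM]"])

by_split = partition_by_split(rows)
```

Check a few rows of the actual data against both conventions before relying on either one.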

Data resource: