This dataset has not yet been certified by approved reviewers. It may contain issues related to data completeness and quality.

Dataset

polaris/molprop-leadlike-250k-v2

Leadlike molecular properties computed for a 250K-molecule subset of ZINC22. These properties are used to examine how useful pretrained models are. In particular, any model intended for molecule generation should not fail on these tasks.

Created on: July 10, 2024
Dataset size: 19 MB
Number of datapoints: 249,999
Public

Tags

Representation
Molecular Properties

Modalities

MOLECULE

Details

README

molprop

Background

Molecular representations are crucial for understanding molecular structure, predicting properties, QSAR studies, toxicology, chemical modeling, and other drug discovery tasks. Benchmarks for molecular representations are therefore critical tools that drive progress in computational chemistry and drug design. In recent years, many large models have been trained to learn molecular representations. The aim of this dataset is to evaluate whether large pretrained models are capable of predicting various “easy-to-compute” molecular properties.

Description of readout

The computed properties are molecular weight, fraction of sp3 carbon atoms (fsp3), number of rotatable bonds, topological polar surface area, computed logP, formal charge, number of charged atoms, refractivity and number of aromatic rings. These properties are widely used in molecule design and molecule prioritization.
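As an illustration of how readouts like these can be computed, the sketch below uses RDKit. This is an assumption: the page does not say which toolkit or descriptor definitions were used to build the dataset, so values may differ from the published columns. The helper name `compute_properties` and the dictionary keys are hypothetical and are not the dataset's column names.

```python
"""Sketch: compute the nine readouts with RDKit (assumed toolkit, not confirmed by the source)."""
try:
    from rdkit import Chem
    from rdkit.Chem import Crippen, Descriptors, rdMolDescriptors
except ImportError:  # RDKit is an optional dependency of this sketch
    Chem = None


def compute_properties(smiles: str) -> dict:
    """Return the nine properties for one SMILES string.

    Hypothetical helper; the key names are illustrative only.
    """
    if Chem is None:
        raise RuntimeError("RDKit is required: pip install rdkit")
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return {
        "mw": Descriptors.MolWt(mol),  # average molecular weight
        "fsp3": rdMolDescriptors.CalcFractionCSP3(mol),
        "n_rotatable_bonds": rdMolDescriptors.CalcNumRotatableBonds(mol),
        "tpsa": rdMolDescriptors.CalcTPSA(mol),
        "clogp": Crippen.MolLogP(mol),  # Wildman-Crippen logP estimate
        "formal_charge": Chem.GetFormalCharge(mol),  # net charge of the molecule
        "n_charged_atoms": sum(
            1 for atom in mol.GetAtoms() if atom.GetFormalCharge() != 0
        ),
        "refractivity": Crippen.MolMR(mol),  # Wildman-Crippen molar refractivity
        "n_aromatic_rings": rdMolDescriptors.CalcNumAromaticRings(mol),
    }


if __name__ == "__main__" and Chem is not None:
    # Glycine zwitterion: one positive and one negative atom.
    for name, value in compute_properties("[NH3+]CC(=O)[O-]").items():
        print(f"{name}: {value}")
```

The zwitterion example shows why formal charge and number of charged atoms are separate readouts: the net formal charge is 0 while two atoms carry a charge.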

Number of molecules after curation: 250000

Data resource

ZINC is a widely utilized public access database and tool set, playing a crucial role in various applications including virtual screening, ligand discovery, pharmacophore screens, benchmarking, and force field development. The MolProp250Kleadlike dataset consists of 250,000 leadlike compounds randomly selected from ZINC22.

Reference: https://pubs.acs.org/doi/10.1021/acs.jcim.2c01253

Raw data: https://cartblanche22.docking.org

Data curation

To maintain consistency with other benchmarks on the Polaris Hub, a thorough data curation process was carried out to ensure the accuracy of the molecular representations.

The full curation and creation process is documented here.

Related benchmarks

Note: It is recommended to evaluate your methods against all the benchmarks related to this dataset.

User Attributes

These are custom, user-defined attributes that are not required by the Polaris data model.

Attribute: Value
year: 2022