
Benchmark

polaris/molprop-250k-r-1

A multitask benchmark for predicting nine molecular properties of approximately 250,000 compounds sourced from ZINC15, with a focus on molecular representation.

Created on: December 08, 2023
Train size: 202,887
Test size: 46,568
Public
Multi Task
Regression

Participants

Tags

representation
properties

Related dataset

Leaderboard

Test set

Rank: 1
Name: molprop-250k-r-1_ecfp4_FCModel
Contributors: (not listed)
mean_absolute_error: 17.111
mean_squared_error: 508.870
r2: 0.864
spearmanr: 0.922
pearsonr: 0.930
explained_var: 0.864
References: No references provided

README

molprop

Background

Molecular representations are crucial for understanding molecular structure, predicting properties, QSAR studies, toxicology, chemical modeling, and other aspects of drug discovery. Benchmarks for molecular representations are therefore critical tools that drive progress in computational chemistry and drug design. In recent years, many large models have been trained to learn molecular representations. The aim of this benchmark is to evaluate whether large pretrained models are capable of predicting various “easy-to-compute” molecular properties.

Benchmarking

The objective is to measure how well a model predicts these 'easy' properties. Ideally, any pre-trained model should, at the very least, perform well on these tasks before it is applied to downstream tasks.

Description of readout

The computed properties are molecular weight, fraction of sp3 carbon atoms (fsp3), number of rotatable bonds, topological polar surface area, computed logP, formal charge, number of charged atoms, refractivity and number of aromatic rings. These properties are widely used in molecule design and molecule prioritization.

Number of data points: train: 202,887, test: 46,568

Data resource

Reference: https://pubs.acs.org/doi/10.1021/acs.jcim.5b00559

Raw data: https://raw.githubusercontent.com/aspuru-guzik-group/chemical_vae/master/models/zinc_properties/250k_randm_zinc_drugs_clean_3.csv

Train/test split

The objective is to measure how well a model predicts these 'easy' properties. To select predictive models that are able to generalize to new chemical space, a scaffold split is used to generate the train/test sets.
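As an illustration of the idea behind a scaffold split, the sketch below groups molecules by a scaffold key and assigns whole groups to either train or test, so that no scaffold appears on both sides. It assumes the scaffold strings have already been computed (for example, Bemis-Murcko scaffolds from a cheminformatics toolkit). The function name `scaffold_split`, the greedy largest-group-first assignment and the `test_fraction` default are all illustrative assumptions; the exact procedure used to build this benchmark is not described here.

```python
from collections import defaultdict


def scaffold_split(scaffolds, test_fraction=0.2):
    """Return (train_idx, test_idx) with no scaffold shared between the two."""
    # Group molecule indices by their scaffold key
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest groups first; ties broken by scaffold string for determinism
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    n_total = len(scaffolds)
    n_train_target = n_total - int(round(test_fraction * n_total))

    train_idx, test_idx = [], []
    for _, members in ordered:
        # Fill train greedily; any group that would overflow it goes to test
        if len(train_idx) + len(members) <= n_train_target:
            train_idx.extend(members)
        else:
            test_idx.extend(members)
    return train_idx, test_idx


# Hypothetical scaffold keys for ten molecules (illustration only)
scaffolds = (
    ["c1ccccc1"] * 4 + ["c1ccncc1"] * 3 + ["C1CCCCC1"] * 2 + ["c1ccoc1"]
)
train_idx, test_idx = scaffold_split(scaffolds, test_fraction=0.2)
print(len(train_idx), len(test_idx))
```

Because entire scaffold groups are held out, the test molecules are structurally distinct from the training molecules, which makes the evaluation harder than a random split.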

[Figure: distribution of the train/test sets in chemical space]

Related links

The full curation and creation process is documented here.

Related benchmarks

molprop-250k-r-1 | Polaris