
This dataset has been certified!

Dataset

polaris/drewry2017-pkis2-subset-v2

A subset of the PKIS2 dataset that includes only the EGFR, RET, KIT, LOK, and SLK kinases. PKIS2 profiles 640 small molecules against 468 kinases.

Created on: July 10, 2024
Dataset size: 54 KB
Number of datapoints: 640
Public
Certified

Details

README

Background:

Kinases play a crucial role in cellular signalling, making them important targets for drug development. Dysregulation of kinases is frequently implicated in diseases such as cancer, inflammation, and neurodegenerative disorders. Targeting kinases with specific drugs has therefore become a key strategy in modern drug discovery. Kinase-related tasks include inhibition prediction, selectivity prediction, and kinase-ligand binding affinity prediction. In the early release of Polaris, benchmarks were established for the kinases EGFR, KIT, and RET (along with their respective mutants) and for LOK and SLK.

Figure: An example of kinase screening (image from here).

Description of readout

  • Readouts: EGFR, KIT, RET, LOK, SLK
  • Bioassay readout: Percentage of inhibition (%).
  • Optimization objective: Higher potency (higher % inhibition).
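Percentage inhibition is typically derived by normalizing a compound's signal against an uninhibited (vehicle) control and a fully inhibited control. The sketch below assumes that generic two-control normalization; the exact normalization used for PKIS2 is not described here, and the function names and example signals are hypothetical. It also ranks compounds for a single kinase according to the stated objective, where higher % inhibition is better.

```python
from typing import Dict, List, Tuple


def percent_inhibition(signal: float, neg_ctrl: float, pos_ctrl: float) -> float:
    """Generic two-control normalization (assumed, not taken from PKIS2).

    neg_ctrl: signal with no inhibitor, mapped to 0 % inhibition
    pos_ctrl: signal with full inhibition, mapped to 100 % inhibition
    Values can fall slightly outside 0-100 because of assay noise;
    no clipping is applied here.
    """
    if neg_ctrl == pos_ctrl:
        raise ValueError("control signals must differ")
    return 100.0 * (neg_ctrl - signal) / (neg_ctrl - pos_ctrl)


def rank_by_inhibition(values: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort compounds so that the most potent (highest % inhibition) come first."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical raw signals for three compounds against one kinase (e.g. EGFR)
    raw = {"cpd_A": 20.0, "cpd_B": 75.0, "cpd_C": 5.0}
    neg, pos = 100.0, 0.0
    egfr = {cid: percent_inhibition(s, neg, pos) for cid, s in raw.items()}
    for cid, pct in rank_by_inhibition(egfr):
        print(f"{cid}: {pct:.1f} %")
```

With these made-up controls, cpd_C (95 %) ranks first, followed by cpd_A (80 %) and cpd_B (25 %).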

Data source:

PKIS2: A second chemogenomics set of kinase inhibitors, assembled from compounds contributed by GSK, Takeda, and Pfizer. This set contained 645 inhibitors and included many chemotypes that were not represented in the original PKIS.

Reference: https://www.ncbi.nlm.nih.gov/pubmed/28767711

Data curation

To maintain consistency with other benchmarks in the Polaris Hub, a thorough data curation process is carried out to ensure the accuracy of the molecular representations.

The full curation and creation process is documented here.

Disclaimer

Here are some additional details that may be of use when deciding whether or not to use this dataset.

Some advantages include:

  • The assays were carried out by one group under a consistent set of conditions.
  • The dataset contains only a small number of molecules with unspecified stereocenters.
  • There are no duplicate structures in the dataset.
  • The data is based on a well-defined biochemical endpoint.

Some limitations to consider:

  • The assay endpoint is % inhibition, which is less desirable than a dose-response measurement but similar to what is commonly encountered in HTS data.
  • The dataset is relatively small, containing only 640 compounds. Combined with the highly clustered nature of the data, this makes it difficult to detect statistically significant differences between methods. The problem is especially acute when splits are based on clusters or scaffolds.
  • The compounds are highly clustered, with the largest cluster containing 50 compounds.
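Because the compounds are highly clustered, a random split can place close analogues in both the training and test sets. A common remedy is a group-aware split in which every compound of a cluster lands on the same side. The sketch below assumes cluster labels have already been computed (for example from scaffolds or fingerprint clustering; that step is not shown) and greedily moves whole, randomly ordered clusters into the test set until a target fraction is reached. The function `cluster_split` and its inputs are hypothetical, and this is not necessarily the split used by the Polaris benchmarks.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def cluster_split(
    compound_ids: Sequence[str],
    cluster_labels: Sequence[int],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Split compounds so that no cluster appears in both train and test."""
    if len(compound_ids) != len(cluster_labels):
        raise ValueError("one cluster label is required per compound")

    # Group compound IDs by their (precomputed, hypothetical) cluster label.
    clusters: Dict[int, List[str]] = defaultdict(list)
    for cid, lab in zip(compound_ids, cluster_labels):
        clusters[lab].append(cid)

    # Shuffle the cluster order reproducibly, then move whole clusters into
    # the test set until it reaches at least the requested size.
    order = sorted(clusters)
    random.Random(seed).shuffle(order)
    target = test_fraction * len(compound_ids)
    test_clusters: Set[int] = set()
    n_test = 0
    for lab in order:
        if n_test >= target:
            break
        test_clusters.add(lab)
        n_test += len(clusters[lab])

    train = [c for c, lab in zip(compound_ids, cluster_labels) if lab not in test_clusters]
    test = [c for c, lab in zip(compound_ids, cluster_labels) if lab in test_clusters]
    return train, test


if __name__ == "__main__":
    ids = [f"cpd_{i}" for i in range(10)]
    labels = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3]
    train, test = cluster_split(ids, labels, test_fraction=0.3, seed=1)
    print(len(train), len(test))
```

Since whole clusters are moved at once, the realized test fraction can overshoot the target. With 640 compounds, a 20 % target is 128 compounds, so a single 50-compound cluster can shift the test set size considerably, which is one reason cluster-based comparisons on this dataset are noisy.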

User Attributes

These are custom, user-defined attributes that are not required by the Polaris data model.

Attribute | Value
year | 2017