Background:
Kinases play a crucial role in cellular signalling, making them important targets for drug development. Dysregulation of kinases is frequently implicated in diseases such as cancer, inflammation, and neurodegenerative disorders. Targeting kinases with specific drugs has therefore emerged as a key strategy in modern drug discovery. Kinase-related tasks include inhibition prediction, selectivity prediction, and kinase-ligand binding affinity prediction. In the early release version of Polaris, benchmarks were established for kinases such as EGFR, KIT, and RET, along with their respective mutations, as well as for LOK and SLK.
An example of kinase screening (image from here):
Description of readout
- Readouts: EGFR, KIT, RET, LOK, SLK
- Bioassay readout: Percentage of inhibition (%).
- Optimization objective: Higher potency (higher %inhibition).
Data resource:
PKIS2: A second chemogenomics set of kinase inhibitors from GSK, Takeda, and Pfizer was assembled as PKIS2. This set contained 645 inhibitors and included many additional chemotypes that were not represented in the original set.
Reference: https://www.ncbi.nlm.nih.gov/pubmed/28767711
Data curation
To maintain consistency with other benchmarks in the Polaris Hub, a thorough data curation process is carried out to ensure the accuracy of molecular representations.
The full curation and creation process is documented here.
Disclaimer
Here are some additional details that may be of use when deciding whether or not to use this dataset.
Some advantages include:
- The assays were carried out by one group under a consistent set of conditions.
- The dataset contains only a small number of molecules with unspecified stereocenters.
- There are no duplicate structures in the dataset.
- The data is based on a well-defined biochemical endpoint.
Some limitations to consider:
- The assay endpoint is % inhibition, which is less desirable than a dose-response measurement but similar to what is commonly encountered with HTS data.
- The dataset is relatively small, containing only 640 compounds. This, combined with the fact that the data is highly clustered, will make it difficult to see statistically significant differences between methods. The problem will be especially acute when the splits are based on clusters or scaffolds.
- The compounds are highly clustered with the largest cluster containing 50 compounds.
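To illustrate what a cluster-based split means for a dataset this clustered, here is a minimal sketch in plain Python. It is not the splitting procedure used for the Polaris benchmarks. The function name cluster_split, the test_fraction and seed parameters, and the toy cluster labels are all hypothetical, and the labels are assumed to come from some upstream clustering or scaffold assignment that is not shown.

```python
import random
from collections import defaultdict


def cluster_split(cluster_ids, test_fraction=0.2, seed=0):
    """Assign whole clusters to train or test so no cluster spans both sets.

    cluster_ids: one (hypothetical) cluster label per compound, in dataset order.
    Returns (train_indices, test_indices).
    """
    # Group compound indices by their cluster label.
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    # Shuffle the clusters reproducibly, then fill the test set cluster by cluster.
    clusters = sorted(members)
    random.Random(seed).shuffle(clusters)

    target = test_fraction * len(cluster_ids)
    train, test = [], []
    for cid in clusters:
        # A cluster goes to test only while the test set is below its target size.
        bucket = test if len(test) < target else train
        bucket.extend(members[cid])
    return sorted(train), sorted(test)


# Toy labels (assumption): one large cluster (label 0) plus several small ones.
toy_clusters = [0] * 5 + [1, 1, 2, 2, 3]
train_idx, test_idx = cluster_split(toy_clusters, test_fraction=0.3)
```

Because clusters are kept whole, a single large cluster can push the test set well past the requested fraction or leave it dominated by one chemotype. With only 640 compounds, this is one reason differences between methods may be hard to resolve under such splits.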