This dataset has been certified!

Dataset

recursion/RxRx3-core

To accompany OpenPhenom-S/16, Recursion is releasing the RxRx3-core dataset, a challenge dataset in phenomics optimized for the research community. RxRx3-core includes labeled images of 735 genetic knockouts and 1,674 small-molecule perturbations drawn from the RxRx3 dataset, image embeddings computed with OpenPhenom-S/16, and associations between the included small molecules and genes. The dataset contains 6-channel Cell Painting images and associated embeddings from 222,601 wells, yet it is smaller than 18 GB, which keeps it accessible to the research community.

Created on: December 11, 2024
Number of datapoints: 222,601
Public
V2

Status

Certified

This artifact has been reviewed in line with our Dataset 101 guidelines and was found to meet all criteria.

Tags

phenomics
perturbation
gene
compound

Modalities

MOLECULE
IMAGE

Related benchmarks

No related benchmarks yet.

You're looking at a v2.0 dataset!

Our goal at Polaris is to build a universal format for ML-ready datasets in drug discovery. With our V2 implementation, we're drastically improving scalability, but there's still work to be done!

Details

README

RxRx3-core

To accompany OpenPhenom-S/16, Recursion is releasing the RxRx3-core dataset, a challenge dataset in phenomics optimized for the research community. RxRx3-core includes labeled images of 735 genetic knockouts and 1,674 small-molecule perturbations drawn from the RxRx3 dataset and image embeddings computed with OpenPhenom-S/16. The dataset contains 6-channel Cell Painting images and associated embeddings from 222,601 wells, yet it is smaller than 18 GB, which keeps it accessible to the research community.

Mapping the mechanisms by which drugs exert their actions is an important challenge in advancing the use of high-dimensional biological data like phenomics. We are excited to release the first dataset of this scale probing concentration-response along with a benchmark and model to enable the research community to rapidly advance this space.

Building Maps of Biology and Chemistry

At Recursion, we build maps of biology and chemistry to explore uncharted areas of disease biology, unravel its complexity, and industrialize drug discovery. Just as a map helps to navigate the physical world, our maps are designed to help us understand as much as we can about the connectedness of human biology so we can navigate the path to new medicines more efficiently.

Our maps are built using image-based high-dimensional data generated in-house. We conduct up to 2.2 million experiments every week in our highly automated labs, where deep learning models embed billions of images of human cells into high-dimensional representations. These cells have been manipulated by CRISPR/Cas9-mediated gene knockouts, compounds, or other reagents. The resulting representations can be compared and contrasted to predict trillions of relationships across biology and chemistry, even without physically testing all of the possible combinations. Recursion's Maps and associated applications help navigate complex biology and chemistry by revealing relationships across genes and chemical compounds.
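To make the idea of comparing representations concrete, here is a minimal sketch in plain Python. It assumes that well-level embeddings are averaged into one profile per perturbation and that profiles are compared with cosine similarity. Both are common choices for embedding data, but they are assumptions here and not a description of Recursion's pipeline. The labels (gene_A, compound_X, compound_Y) and the 3-dimensional vectors are made up for illustration; real OpenPhenom-S/16 embeddings have many more dimensions.

```python
import math
from collections import defaultdict


def mean_embedding(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical well-level embeddings: (perturbation label, embedding vector).
wells = [
    ("gene_A", [1.0, 0.0, 0.2]),
    ("gene_A", [0.8, 0.2, 0.0]),
    ("compound_X", [0.7, 0.3, 0.1]),
    ("compound_Y", [-0.1, 1.0, 0.0]),
]

# Group wells by perturbation, then average to one profile per perturbation.
by_perturbation = defaultdict(list)
for label, vector in wells:
    by_perturbation[label].append(vector)
profiles = {label: mean_embedding(vecs) for label, vecs in by_perturbation.items()}

# A higher similarity suggests a closer phenotypic relationship in this toy map.
sim_x = cosine_similarity(profiles["gene_A"], profiles["compound_X"])
sim_y = cosine_similarity(profiles["gene_A"], profiles["compound_Y"])
print(f"gene_A vs compound_X: {sim_x:.3f}")
print(f"gene_A vs compound_Y: {sim_y:.3f}")
```

In this toy data, compound_X scores closer to gene_A than compound_Y does, which is the kind of ranking a compound-gene relationship benchmark would evaluate against known associations.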

Map Benchmarking

Test your maps with the rxrx-compound-gene-activity-benchmark!

User Attributes

These are custom, user-defined attributes that are not required by the Polaris data model.

Attribute: Value
Wells: 222,601
Cell line: HUVEC
Gene perturbation: CRISPR/Cas9-mediated gene knockouts
Treatment perturbation: Compound at 8 concentrations each