Nature Reviews Drug Discovery article:

Phenotypic drug discovery screens offer physiologically relevant disease-linked readouts and account for polypharmacology. However, they are more complex than target-based screens and difficult to conduct at scale. Reporting in Science, DeMeo et al. apply transcriptome datasets as a proxy for cell phenotypes and design a deep learning framework to prioritize compounds for phenotypic screens.

First, the investigators trained a deep learning model, ‘DrugReflector’, to predict small-molecule modulators of cellular phenotypes. Training data comprised gene expression profiles induced by over 9,500 compounds in 52 cell lines, from the Broad Institute Connectivity Map. DrugReflector outperformed four other models that match gene signatures to compounds and could be applied to cell types not represented in the training data, indicating that it is robust and generalizable.

For screening, the authors used primary human CD34+ haematopoietic stem and progenitor cells (HSPCs), which are clinically relevant for haematological disorders but challenging to source and expand ex vivo.

Single-cell RNA-seq and surface protein marker datasets were used to define the megakaryocyte and erythroid progenitors that arise from cultured HSPCs. The transcriptional signatures were then used as input for DrugReflector.
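To make the idea of matching a transcriptional signature to compounds concrete, the sketch below ranks compounds by the cosine similarity between a target signature and each compound's expression profile. This is a generic signature-matching baseline, not DrugReflector itself, which is a trained deep learning model. The function names, compound names and four-gene toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length expression vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no direction
    return dot / (norm_a * norm_b)


def rank_compounds(target_signature, compound_profiles):
    """Return compound names ordered from most to least similar to the target."""
    scored = [
        (cosine(target_signature, profile), name)
        for name, profile in compound_profiles.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Hypothetical toy data: one target signature over four genes and
# three compound-induced expression profiles (assumed values).
target = [1.0, 0.5, -1.0, 0.0]
profiles = {
    "compound_A": [0.9, 0.4, -0.8, 0.1],    # close to the target
    "compound_B": [-1.0, -0.5, 1.0, 0.0],   # exact opposite of the target
    "compound_C": [0.0, 1.0, 0.0, 1.0],     # weakly related
}

ranking = rank_compounds(target, profiles)
print(ranking)
```

In this toy case the compound whose profile most resembles the target is ranked first and the one that reverses it is ranked last; a real screen would use thousands of genes and compounds.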

The model nominated 107 compounds as potential inducers of megakaryocytes. These compounds were tested experimentally on HSPCs, with cell state evaluated by flow cytometry based on the surface protein markers. The DrugReflector nominations resulted in a hit-rate of 19.6%, compared with only 1.1% for randomly selected compounds. The model was also applied to nominate compounds that induce erythroid lineages, achieving a 16% hit-rate. Across both assays, this amounted to a 13–17-fold improvement in hit-rate.
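The hit-rate figures can be checked with simple arithmetic. The sketch below uses only the megakaryocyte numbers quoted above (107 nominated compounds, 19.6% versus 1.1%). The helper names are hypothetical and the hit count is inferred from the percentage rather than reported. The random-selection baseline for the erythroid assay is not given here, so that ratio is not computed.

```python
def hit_rate(hits, tested):
    """Fraction of tested compounds that scored as hits."""
    return hits / tested


def fold_enrichment(model_rate, baseline_rate):
    """Ratio of the model's hit-rate to the baseline hit-rate."""
    return model_rate / baseline_rate


# Figures quoted in the text for the megakaryocyte screen.
nominated = 107
mk_model_rate = 0.196    # DrugReflector nominations
mk_random_rate = 0.011   # randomly selected compounds

# Inferred, not reported: the whole number of hits consistent with 19.6% of 107.
implied_hits = round(mk_model_rate * nominated)

mk_fold = fold_enrichment(mk_model_rate, mk_random_rate)

print(implied_hits)                                      # 21
print(round(hit_rate(implied_hits, nominated) * 100, 1)) # 19.6
print(round(mk_fold, 1))                                 # 17.8
```

Dividing the two quoted megakaryocyte rates gives roughly 17.8, close to the upper end of the 13–17-fold range; the small gap may reflect rounding of the reported percentages or the exact baseline used.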

Several megakaryocyte-inducing compounds bound the same target, revealing a novel mechanism to induce this phenotype. Another megakaryocyte inducer targeted multiple receptor tyrosine kinases; more selective compounds or CRISPR knockouts could not reproduce the effect, indicating that phenotypic modulation required polypharmacology.

DrugReflector could also prioritize compounds that induce cancer cells to revert to a normal phenotype, nominating inhibitors of disease-relevant signalling pathways.

To refine the gene expression signature fed into DrugReflector, an active learning step was added. Closed-loop active reinforcement learning using paired transcriptional and phenotypic signatures of 12 hit and 8 non-hit compounds from the megakaryocyte screen led to a doubling of the hit-rate in new screens.
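One simple way to picture signature refinement from screening feedback is a Rocchio-style update, which moves the query signature towards the average profile of the hits and away from the average profile of the non-hits. This is a stand-in chosen for illustration, not the authors' active learning procedure; the function names, weights and three-gene toy vectors are all assumptions.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def refine_signature(signature, hit_profiles, non_hit_profiles,
                     alpha=1.0, beta=0.5, gamma=0.25):
    """Rocchio-style update (illustrative assumption, not the published method).

    alpha weights the original signature, beta pulls it towards the mean
    hit profile, and gamma pushes it away from the mean non-hit profile.
    """
    hit_mean = mean_vector(hit_profiles)
    non_hit_mean = mean_vector(non_hit_profiles)
    return [
        alpha * s + beta * h - gamma * n
        for s, h, n in zip(signature, hit_mean, non_hit_mean)
    ]


# Hypothetical toy data over three genes; the study used 12 hit and
# 8 non-hit compounds, but two of each are enough to show the update.
signature = [1.0, 0.0, -1.0]
hits = [[1.0, 1.0, -1.0], [1.0, 0.0, -1.0]]
non_hits = [[0.0, -1.0, 0.0], [0.0, -1.0, 2.0]]

refined = refine_signature(signature, hits, non_hits)
print(refined)  # [1.5, 0.5, -1.75]
```

The refined signature would then be fed back into the model to nominate the next round of compounds, closing the loop between prediction and experiment.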

The authors suggest that in future, predictive power could be improved by using a reference perturbation dataset that has greater biological relevance to the phenotypic screen.

Omics with active learning optimizes phenotypic screens