Cross-domain information fusion for enhanced cell population delineation in single-cell spatial-omics data

Computational Methods
Alex K. Shalek
Bokai Zhu

Zhu, B., Gao, S., Chen, S., Yeung, J., Bai, Y., Huang, A. Y., Yeo, Y. Y., Liao, G., Mao, S., Jiang, S., Rodig, S. J., Shalek, A. K., Nolan, G. P., Ma, Z.

bioRxiv

May 2024

Abstract

Cell population delineation and identification is an essential step in single-cell and spatial-omics studies. Spatial-omics technologies can simultaneously measure information from three complementary domains relevant to this task: expression levels of a panel of molecular biomarkers at single-cell resolution, the relative positions of cells, and images of tissue sections. However, existing computational methods for performing this task on single-cell spatial-omics datasets often discard information from one or more of these domains. Their additional reliance on “atlas” training or reference datasets restricts cell type discovery to a limited set of well-defined cell population labels, which poses major challenges for using these methods in practice. Successfully integrating all three domains creates an opportunity to uncover cell populations that are functionally stratified by their spatial contexts at the cellular and tissue levels, which is the key motivation for employing spatial-omics technologies in the first place.

In this work, we introduce Cell Spatio- and Neighborhood-informed Annotation and Patterning (CellSNAP), a self-supervised computational method that learns a representation vector for each cell in tissue samples measured by spatial-omics technologies at single-cell or finer resolution. Each learned representation vector fuses information about the corresponding cell from all three domains described above. By applying CellSNAP to datasets spanning both spatial proteomic and spatial transcriptomic modalities, as well as different tissue types and disease settings, we show that CellSNAP markedly enhances de novo discovery of biologically relevant cell populations at fine granularity, beyond what current approaches achieve, by fully integrating cells’ molecular profiles with cellular neighborhood and tissue image information.
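The abstract does not specify how the three domains are combined, and CellSNAP itself learns its representation in a self-supervised way. Purely as a hypothetical illustration of what "fusing" per-cell information across domains can mean, and not as CellSNAP's actual architecture, the sketch below builds a naive per-cell feature vector by concatenating three blocks: the cell's own marker expression, the cell-type composition of its k nearest spatial neighbors (computed from assumed preliminary cluster labels), and a placeholder image-feature vector. The function names, the choice of k, and the toy data are all assumptions made for this example.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_indices(coords: Sequence[Point], k: int) -> List[List[int]]:
    """Brute-force k nearest spatial neighbors of each cell, excluding the cell itself."""
    n = len(coords)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of cells")
    neighbors = []
    for i in range(n):
        # Sort all other cells by Euclidean distance to cell i.
        dists = sorted(
            (math.dist(coords[i], coords[j]), j) for j in range(n) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def neighborhood_composition(
    labels: Sequence[int], neighbors: Sequence[Sequence[int]], n_types: int
) -> List[List[float]]:
    """Fraction of each (hypothetical, preliminary) cell type among a cell's neighbors."""
    compositions = []
    for nbrs in neighbors:
        counts = [0.0] * n_types
        for j in nbrs:
            counts[labels[j]] += 1.0
        compositions.append([c / len(nbrs) for c in counts])
    return compositions


def fuse_features(
    expression: Sequence[List[float]],
    composition: Sequence[List[float]],
    image_features: Sequence[List[float]],
) -> List[List[float]]:
    """Naively concatenate the three per-cell feature blocks into one vector per cell."""
    return [e + c + m for e, c, m in zip(expression, composition, image_features)]


# Toy data (invented): 5 cells, 2 markers, 2 preliminary types, 1 image feature.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0), (11.0, 10.0)]
labels = [0, 0, 1, 1, 1]
expression = [[5.0, 0.1], [4.8, 0.2], [0.3, 3.9], [0.2, 4.1], [0.1, 4.0]]
image_features = [[0.9], [0.8], [0.1], [0.2], [0.1]]

neighbors = knn_indices(coords, k=2)
composition = neighborhood_composition(labels, neighbors, n_types=2)
fused = fuse_features(expression, composition, image_features)
print(fused[0])
```

In this toy example, cell 2 expresses the same markers as cells 3 and 4 but sits among type-0 cells, so its neighborhood-composition block differs from theirs. That kind of context-dependent distinction is what expression-only clustering cannot see. A learned method would replace the simple concatenation with a trained encoder and the image placeholder with features extracted from tissue image patches.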