Inference of cell–cell communication from single-cell RNA sequencing data is a powerful technique to uncover intercellular communication pathways, yet existing methods perform this analysis at the level of the cell type or cluster, discarding single-cell-level information. Here we present Scriabin, a flexible and scalable framework for comparative analysis of cell–cell communication at single-cell resolution that is performed without cell aggregation or downsampling. We use multiple published atlas-scale datasets, genetic perturbation screens and direct experimental validation to show that Scriabin accurately recovers expected cell–cell communication edges and identifies communication networks that can be obscured by agglomerative methods. Additionally, we use spatial transcriptomic data to show that Scriabin can uncover spatial features of interaction from dissociated data alone. Finally, we demonstrate applications to longitudinal datasets to follow communication pathways operating between timepoints. Our approach represents a broadly applicable strategy to reveal the full structure of niche–phenotype relationships in health and disease.
High-throughput phenotypic screens leveraging biochemical perturbations, high-content readouts, and complex multicellular models could advance therapeutic discovery yet remain constrained by limitations of scale. To address this, we establish a method for compressing screens by pooling perturbations followed by computational deconvolution. Conducting controlled benchmarks with a highly bioactive small molecule library and a high-content imaging readout, we demonstrate increased efficiency for compressed experimental designs compared to conventional approaches. To prove generalizability, we apply compressed screening to examine transcriptional responses of patient-derived pancreatic cancer organoids to a library of tumor-microenvironment (TME)-nominated recombinant protein ligands. Using single-cell RNA-seq as a readout, we uncover reproducible phenotypic shifts induced by ligands that correlate with clinical features in larger datasets and are distinct from reference signatures available in public databases. In sum, our approach enables phenotypic screens that interrogate complex multicellular models with rich phenotypic readouts to advance translatable drug discovery as well as basic biology.
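The pooling-then-deconvolution idea can be illustrated with a toy example. The sketch below is not the authors' pipeline: it assumes a simple row-column pooling layout, assumes that the effects of pooled perturbations add linearly with no measurement noise, and swaps in a basic marginal contrast (mean readout of pools containing a perturbation minus the mean of all other pools) for the deconvolution step. All function names and numbers are hypothetical.

```python
def grid_pools(n_rows, n_cols):
    """Row-column pooling: each perturbation sits in one row pool and one column pool."""
    rows = [[r * n_cols + c for c in range(n_cols)] for r in range(n_rows)]
    cols = [[r * n_cols + c for r in range(n_rows)] for c in range(n_cols)]
    return rows + cols


def simulate_readouts(pools, effects):
    """Assumption: effects of pooled perturbations add linearly, without noise."""
    return [sum(effects.get(j, 0.0) for j in pool) for pool in pools]


def deconvolve(pools, readouts, n_perturbations):
    """Score each perturbation: mean readout of its pools minus mean of the other pools."""
    scores = []
    for j in range(n_perturbations):
        inside = [y for pool, y in zip(pools, readouts) if j in pool]
        outside = [y for pool, y in zip(pools, readouts) if j not in pool]
        scores.append(sum(inside) / len(inside) - sum(outside) / len(outside))
    return scores


# 16 hypothetical perturbations measured in 8 pools instead of 16 separate wells.
pools = grid_pools(4, 4)
true_effects = {1: 4.0, 11: -3.0}  # made-up active perturbations
readouts = simulate_readouts(pools, true_effects)
scores = deconvolve(pools, readouts, 16)

# Rank by absolute score and keep the two strongest candidates.
top_hits = sorted(range(16), key=lambda j: abs(scores[j]), reverse=True)[:2]
print(len(pools), sorted(top_hits))
```

In this toy setting the two active perturbations rank first, although the contrast scores are useful for ranking rather than as unbiased effect sizes. Real screens with noise, non-additive effects, or denser pooling would likely call for a regression-based deconvolution.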
COVID-19, caused by SARS-CoV-2, can result in acute respiratory distress syndrome and multiple-organ failure, but little is known about its pathophysiology. Here, we generated single-cell atlases of 23 lung, 16 kidney, 16 liver and 19 heart COVID-19 autopsy donor tissue samples, and spatial atlases of 14 lung donors. Integrated computational analysis uncovered substantial remodeling in the lung epithelial, immune and stromal compartments, with evidence of multiple paths of failed tissue regeneration, including defective alveolar type 2 differentiation and expansion of fibroblasts and putative TP63+ intrapulmonary basal-like progenitor cells. Viral RNAs were enriched in mononuclear phagocytic and endothelial lung cells which induced specific host programs. Spatial analysis in lung distinguished inflammatory host responses in lung regions with and without viral RNA. Analysis of the other tissue atlases showed transcriptional alterations in multiple cell types in COVID-19 donor heart tissue, and mapped cell types and genes implicated with disease severity based on COVID-19 GWAS. Our foundational dataset elucidates the biological impact of severe SARS-CoV-2 infection across the body, a key step towards new treatments.
Angiotensin-converting enzyme 2 (ACE2) and accessory proteases (TMPRSS2 and CTSL) are needed for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) cellular entry, and their expression may shed light on viral tropism and impact across the body. We assessed the cell-type-specific expression of ACE2, TMPRSS2 and CTSL across 107 single-cell RNA-sequencing studies from different tissues. ACE2, TMPRSS2 and CTSL are coexpressed in specific subsets of respiratory epithelial cells in the nasal passages, airways and alveoli, and in cells from other organs associated with coronavirus disease 2019 (COVID-19) transmission or pathology. We performed a meta-analysis of 31 lung single-cell RNA-sequencing studies with 1,320,896 cells from 377 nasal, airway and lung parenchyma samples from 228 individuals. This revealed cell-type-specific associations of age, sex and smoking with expression levels of ACE2, TMPRSS2 and CTSL. Expression of entry factors increased with age and in males, including in airway secretory cells and alveolar type 2 cells. Expression programs shared by ACE2+TMPRSS2+ cells in nasal, lung and gut tissues included genes that may mediate viral entry, key immune functions and epithelial–macrophage cross-talk, such as genes involved in the interleukin-6, interleukin-1, tumor necrosis factor and complement pathways. Cell-type-specific expression patterns may contribute to the pathogenesis of COVID-19, and our work highlights putative molecular pathways for therapeutic intervention.
The SARS-CoV-2 pandemic has caused over 1 million deaths globally, mostly due to acute lung injury and acute respiratory distress syndrome, or direct complications resulting in multiple-organ failures. Little is known about the host tissue immune and cellular responses associated with COVID-19 infection, symptoms, and lethality. To address this, we collected tissues from 11 organs during the clinical autopsy of 17 individuals who succumbed to COVID-19, resulting in a tissue bank of approximately 420 specimens. We generated comprehensive cellular maps capturing COVID-19 biology related to patients’ demise through single-cell and single-nucleus RNA-Seq of lung, kidney, liver and heart tissues, and further contextualized our findings through spatial RNA profiling of distinct lung regions. We developed a computational framework that incorporates removal of ambient RNA and automated cell type annotation to facilitate comparison with other healthy and diseased tissue atlases. In the lung, we uncovered significantly altered transcriptional programs within the epithelial, immune, and stromal compartments and cell intrinsic changes in multiple cell types relative to lung tissue from healthy controls. We observed evidence of: alveolar type 2 (AT2) differentiation replacing depleted alveolar type 1 (AT1) lung epithelial cells, as previously seen in fibrosis; a concomitant increase in myofibroblasts reflective of defective tissue repair; and, putative TP63+ intrapulmonary basal-like progenitor (IPBLP) cells, similar to cells identified in H1N1 influenza, that may serve as an emergency cellular reserve for severely damaged alveoli. Together, these findings suggest the activation and failure of multiple avenues for regeneration of the epithelium in these terminal lungs. SARS-CoV-2 RNA reads were enriched in lung mononuclear phagocytic cells and endothelial cells, and these cells expressed distinct host response transcriptional programs. 
We corroborated the compositional and transcriptional changes in lung tissue through spatial analysis of RNA profiles in situ and distinguished unique tissue host responses between regions with and without viral RNA, and in COVID-19 donor tissues relative to healthy lung. Finally, we analyzed genetic regions implicated in COVID-19 GWAS with transcriptomic data to implicate specific cell types and genes associated with disease severity. Overall, our COVID-19 cell atlas is a foundational dataset to better understand the biological impact of SARS-CoV-2 infection across the human body and empowers the identification of new therapeutic interventions and prevention strategies.
Haplotype reconstruction of distant genetic variants remains an unsolved problem due to the short-read length of common sequencing data. Here, we introduce HapTree-X, a probabilistic framework that utilizes latent long-range information to reconstruct unspecified haplotypes in diploid and polyploid organisms. It introduces the observation that differential allele-specific expression can link genetic variants from the same physical chromosome, thus enabling even the use of reads that cover only individual variants. We demonstrate HapTree-X’s feasibility on in-house sequenced Genome in a Bottle RNA-seq and various whole exome, genome, and 10X Genomics datasets. HapTree-X produces more complete phases (up to 25%), even in clinically important genes, and phases more variants than other methods while maintaining similar or higher accuracy and being up to 10× faster than other tools. The advantage of HapTree-X’s ability to use multiple lines of evidence, as well as to phase polyploid genomes in a single integrative framework, substantially grows as the amount of diverse data increases.
There is pressing urgency to understand the pathogenesis of the severe acute respiratory syndrome coronavirus clade 2 (SARS-CoV-2) which causes the disease COVID-19. SARS-CoV-2 spike (S)-protein binds ACE2, and in concert with host proteases, principally TMPRSS2, promotes cellular entry. The cell subsets targeted by SARS-CoV-2 in host tissues, and the factors that regulate ACE2 expression, remain unknown. Here, we leverage human, non-human primate, and mouse single-cell RNA-sequencing (scRNA-seq) datasets across health and disease to uncover putative targets of SARS-CoV-2 amongst tissue-resident cell subsets. We identify ACE2 and TMPRSS2 co-expressing cells within lung type II pneumocytes, ileal absorptive enterocytes, and nasal goblet secretory cells. Strikingly, we discover that ACE2 is a human interferon-stimulated gene (ISG) in vitro using airway epithelial cells, and extend our findings to in vivo viral infections. Our data suggest that SARS-CoV-2 could exploit species-specific interferon-driven upregulation of ACE2, a tissue-protective mediator during lung injury, to enhance infection.
The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, creates an urgent need for identifying molecular mechanisms that mediate viral entry, propagation, and tissue pathology. Cell membrane bound angiotensin-converting enzyme 2 (ACE2) and associated proteases, transmembrane protease serine 2 (TMPRSS2) and Cathepsin L (CTSL), were previously identified as mediators of SARS-CoV-2 cellular entry. Here, we assess the cell type-specific RNA expression of ACE2, TMPRSS2, and CTSL through an integrated analysis of 107 single-cell and single-nucleus RNA-Seq studies, including 22 lung and airways datasets (16 unpublished), and 85 datasets from other diverse organs. Joint expression of ACE2 and the accessory proteases identifies specific subsets of respiratory epithelial cells as putative targets of viral infection in the nasal passages, airways, and alveoli. Cells that co-express ACE2 and proteases are also identified in other organs, some of which have been associated with COVID-19 transmission or pathology, including gut enterocytes, corneal epithelial cells, cardiomyocytes, heart pericytes, olfactory sustentacular cells, and renal epithelial cells. Performing the first meta-analyses of scRNA-seq studies, we analyzed 1,176,683 cells from 282 nasal, airway, and lung parenchyma samples from 164 donors spanning fetal, childhood, adult, and elderly age groups, and associate increased levels of ACE2, TMPRSS2, and CTSL in specific cell types with increasing age, male gender, and smoking, all of which are epidemiologically linked to COVID-19 susceptibility and outcomes. Notably, there was a particularly low expression of ACE2 in the few young pediatric samples in the analysis. Further analysis reveals a gene expression program shared by ACE2+TMPRSS2+ cells in nasal, lung and gut tissues, including genes that may mediate viral entry, subtend key immune functions, and mediate epithelial-macrophage cross-talk.
Amongst these are IL6, its receptor and co-receptor, IL1R, TNF response pathways, and complement genes. Cell type specificity in the lung and airways and smoking effects were conserved in mice. Our analyses suggest that differences in the cell type-specific expression of mediators of SARS-CoV-2 viral entry may be responsible for aspects of COVID-19 epidemiology and clinical course, and point to putative molecular pathways involved in disease susceptibility and pathogenesis.
Cellular immunity is critical for controlling intracellular pathogens, but individual cellular dynamics and cell–cell cooperativity in evolving human immune responses remain poorly understood. Single-cell RNA-sequencing (scRNA-seq) represents a powerful tool for dissecting complex multicellular behaviors in health and disease and nominating testable therapeutic targets. Its application to longitudinal samples could afford an opportunity to uncover cellular factors associated with the evolution of disease progression without potentially confounding inter-individual variability. Here, we present an experimental and computational methodology that uses scRNA-seq to characterize dynamic cellular programs and their molecular drivers, and apply it to HIV infection. By performing scRNA-seq on peripheral blood mononuclear cells from four untreated individuals before and longitudinally during acute infection, we were powered within each to discover gene response modules that vary by time and cell subset. Beyond previously unappreciated individual- and cell-type-specific interferon-stimulated gene upregulation, we describe temporally aligned gene expression responses obscured in bulk analyses, including those involved in proinflammatory T cell differentiation, prolonged monocyte major histocompatibility complex II upregulation and persistent natural killer (NK) cell cytolytic killing. We further identify response features arising in the first weeks of infection, for example proliferating natural killer cells, which may associate with future viral control. Overall, our approach provides a unified framework for characterizing multiple dynamic cellular responses and their coordination.
We develop single-cell transcriptomic approaches to comprehensively profile human tissues and model systems. Previously, we focused on establishing, validating, scaling, and simplifying single-cell RNA-seq, often through the development of microdevices, to enable genome-wide identification of the cell types/states contained within complex biological samples. More recently, we helped both enhance the detection of phenotype-defining transcripts using these methods and simplify their on-site processing for clinical applications. In parallel, we have also worked to democratize these techniques, providing open access to resources and protocols, training thousands locally and abroad, and establishing infrastructure and on-site collaborations spanning 6 continents and 26+ countries.
As many factors define cellular phenotype and influence disease beyond mRNA, we develop complementary methods for co-assaying other cellular attributes to enrich our understanding of the drivers of cellular behaviors. Examples include the abundance of additional ‘-omes’, the sequence and amount of important transcripts, cellular history, biophysical properties, spatial position, and functional output. Recently, we have worked to: 1. detect pathogens in cells and potentially actionable associated host factors; 2. query for specific mutations to identify cancer cells; and, 3. extract T cell receptor sequences to examine clonality. We have also formulated computational methods to derive deeper insights from these data (e.g., to examine viral dynamics in infected cells, reproducible features hidden by inter-individual variability, multicellular immune dynamics, intercellular communication, or alterations in cellular ecosystems associated with pathology).
We explore how the extracellular milieu influences cellular decision-making. Here, we have employed controlled culture conditions with cells and organoids, chemical and genetic perturbations, and constant microfluidic perfusion. We also have leveraged natural microenvironmental variation within and across tissues via microdissection and by using photoactivatable probes that retain spatial information through dissociation. In each instance, we aim to understand the degree to which extracellular environments modulate, and can be used to rationally control, the responses of individual cells or the overall distribution thereof, with an eye toward engineering tissue responses.
We examine the impact of intercellular interactions on cellular function. We have used coculture, imaging and perturbation strategies, as well as matched computational methods, to reinforce findings from dissociated samples, validate inferred cell-cell communication in vivo (e.g., between sensory neurons and lymph node resident cells), and manipulate multicellular systems (e.g., organoids). We are currently working on building arrayed, synthetically-designed cellular ensembles to examine how ‘tissue’ structure impacts functional response. Our overall goal is to understand cellular co-dependencies that influence niche- and tissue-level response dynamics.
We broadly study how intra- and extracellular circuits collectively drive healthy and diseased tissue states. By leveraging the massive genomic datasets we and others have generated from complex tissues (like melanoma tumors, inflamed gut, and nasal polyps), we have begun to identify common and unique cell types/states and circuits associated with pathology that may be important for regulating biological function and stability. Our current findings suggest multiple overlaps among distinct diseases, pointing to the possibility of a finite set of evolved response strategies and thus common interventions based on adjusting specific cell states, cell frequencies, and/or cell-cell communication pathways.
We lack effective treatments and preventions for many of the most challenging infectious diseases, many of which disproportionately impact those in low- and middle-income countries or traditionally marginalized communities.
To help address this, we have established and enabled multi-group, multi-country partnerships to deploy and adapt cutting-edge genomic tools. By examining how cells dynamically alter their states, individually and collectively, during disease and/or its resolution in acute and chronic infections (e.g., tuberculosis, HIV/SHIV, hepatitis, malaria, leprosy, flu, SARS-CoV-2, and Ebola), we have uncovered cellular and molecular features of pathogen control or pathology to potentiate or counteract, respectively. Illustratively, in tuberculosis, we identified a functional role for cytotoxic CD8 and hybrid type1-type17 T cells in control of infection in the lung and links between mast, plasma, and endothelial cell abundance (type-2 immune responses) and bacterial burden. We have also built methods for examining pathogens within individual host cells to define their dynamic interdependence and identify potentially restrictive host factors.
We are currently working to identify the drivers of common host responses to distinct perturbations and their targetability, as well as the impact of different interventions (e.g., vaccines).
Immune responses play a critical role in preventing tumorigenesis. Sometimes, however, they are ineffectual and can even drive/support malignancy.
We have examined how cancer cells alter and are influenced by their tumor microenvironments (TMEs), and the impact this has on therapeutic responses. Illustratively, in Pancreatic Ductal Adenocarcinoma (PDAC), by profiling liver metastases and matched organoid models, we showed: 1. associations between TME and malignant cell state composition; 2. that autocrine and paracrine signaling can drive malignant cell state transitions, even in an isogenic background, altering the efficacy of frontline chemotherapies; and, 3. that microenvironmental manipulations can be used to control malignant state, and thereby drug responses, rationally, and to improve model fidelity for screening potential therapies. This and related work highlight the potential utility of modulating indirect target cells (T cells in the PDAC TME or basal cells in allergic inflammation) to enhance cures and preventions.
We are now systematically expanding this work to define how additional environmental and cell-intrinsic factors influence malignant cell state plasticity in PDAC and other cancers toward enhancing treatments.
We are exposed to a constant flux of external biochemical and physical stimuli as we age. Despite variability in our overall experiences and exact constitutions, our individual tissues typically manage to maintain functionality, though each can differ in its resilience to distinct stressors.
We have characterized how differences in cellular composition and communication impact tissue fitness and have identified responses and subsequent adaptations that drive chronic dysfunction. For example, although aberrant immune activity can precipitate allergic inflammatory diseases, therapies targeting immune cells and signaling are only successful in some, suggesting chronicity may involve alternative mechanisms. Previously, we helped demonstrate that dysregulated type-2 immune signaling, driven by environmental allergens, can impact tissue health in the upper airway through generating dysfunctional basal epithelial stem cells. These stem cells can then contribute to persistence by serving as repositories for allergic inflammatory memories, altering the integrity and functional output of the nasal epithelium. Our work, with that of others, suggests generalizable principles for cellular memory, and informs where and how tissues should be targeted to support health or restore function. We have since further investigated how tissue-resident cellular subsets participate in, and are shaped by, environmental exposures at barrier tissues and the functional consequences of these experiences.
We are now working to develop a more holistic appreciation for how different intra- and extracellular factors (e.g., genetics and integrated exposure history, respectively) influence barrier tissue function.
Single-cell transcriptomics reveals gene expression heterogeneity but suffers from stochastic dropout and characteristic bimodal expression distributions in which expression is either strongly non-zero or non-detectable. We propose a two-part, generalized linear model for such bimodal data that parameterizes both of these features. We argue that the cellular detection rate, the fraction of genes expressed in a cell, should be adjusted for as a source of nuisance variation. Our model provides gene set enrichment analysis tailored to single-cell data. It provides insights into how networks of co-expressed genes evolve across an experimental treatment. MAST is available at https://github.com/RGLab/MAST.
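One common way to write a two-part (hurdle) model of this kind is sketched below. The notation is an assumption introduced here for illustration and does not come from the abstract: Z_{ig} indicates whether gene g is detected in cell i, Y_{ig} is its expression level when detected, X_i is the design-matrix row for cell i (which would include the cellular detection rate as a covariate), G is the number of genes, and \beta_g^{D}, \beta_g^{C} are the coefficients of the discrete and continuous parts. The Gaussian form of the continuous part is likewise an assumption of this sketch.

```latex
\begin{align}
\operatorname{logit}\,\Pr(Z_{ig} = 1) &= X_i \beta_g^{D}, \\
Y_{ig} \mid Z_{ig} = 1 &\sim \mathcal{N}\!\left(X_i \beta_g^{C},\ \sigma_g^{2}\right), \\
\mathrm{CDR}_i &= \frac{1}{G} \sum_{g=1}^{G} Z_{ig}.
\end{align}
```

The first line models whether a gene is detected at all, the second models its level given detection, and the third defines the cellular detection rate as the fraction of genes detected in a cell, so that it can be adjusted for as a nuisance covariate in both parts.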