Crohn’s disease is an inflammatory bowel disease (IBD) which most often presents with patchy lesions in the terminal ileum and colon and requires complex clinical care. Recent advances in the targeting of cytokines and leukocyte migration have greatly advanced treatment options, but most patients still relapse and inevitably progress. Although single-cell approaches are transforming our ability to understand the barrier tissue biology of inflammatory disease, comprehensive single-cell RNA-sequencing (scRNA-seq) atlases of IBD to date have largely sampled pre-treated patients with established disease. This has limited our understanding of which cell types, subsets, and states at diagnosis are predictive of disease severity and response to treatment. Here, through a combined clinical, flow cytometric, and scRNA-seq study, we profile diagnostic human biopsies from the terminal ileum of treatment-naive pediatric patients with Crohn’s disease (pediCD; n=14) and from non-inflamed pediatric controls with functional gastrointestinal disorders (FGID; n=13). To fully resolve and annotate epithelial, stromal, and immune cell states among the 201,883 single-cell transcriptomes, we develop and deploy a principled and unbiased tiered clustering approach, ARBOL, yielding 138 FGID and 305 pediCD end cell clusters. Notably, through both flow cytometry and scRNA-seq, we observe that at the level of broad cell types, treatment-naive pediCD is not readily distinguishable from FGID in cellular composition. However, by integrating high-resolution scRNA-seq analysis, we identify significant differences in cell states that arise during pediCD relative to FGID. 
Furthermore, by closely linking our scRNA-seq analysis with clinical metadata, we resolve a vector of lymphoid, myeloid, and epithelial cell states in treatment-naive samples which can distinguish patients with less severe disease (those not on anti-TNF therapies; NOA) from those with more severe disease at presentation who require anti-TNF therapies. Moreover, this vector was also able to distinguish those patients that achieve a full response (FR) to anti-TNF blockade from those more treatment-resistant patients who only achieve a partial response (PR). Our study jointly leverages a treatment-naive cohort, high-resolution principled scRNA-seq data analysis, and clinical outcomes to understand which baseline cell states may predict inflammatory disease trajectory.
SARS-CoV-2 infection can cause severe respiratory COVID-19. However, many individuals present with isolated upper respiratory symptoms, suggesting potential to constrain viral pathology to the nasopharynx. Which cells SARS-CoV-2 primarily targets and how infection influences the respiratory epithelium remain incompletely understood. We performed scRNA-seq on nasopharyngeal swabs from 58 healthy and COVID-19 participants. During COVID-19, we observe expansion of secretory cells, loss of ciliated cells, and epithelial cell repopulation via deuterosomal expansion. In mild/moderate COVID-19, epithelial cells express anti-viral/interferon-responsive genes, while cells in severe COVID-19 have muted anti-viral responses despite equivalent viral loads. SARS-CoV-2 RNA+ host-target cells are highly heterogeneous, including developing ciliated, interferon-responsive ciliated, AZGP1high goblet, and KRT13+ “hillock”-like cells, and we identify genes associated with susceptibility, resistance, or infection response. Our study defines protective and detrimental responses to SARS-CoV-2, the direct viral targets of infection, and suggests that failed nasal epithelial anti-viral immunity may underlie and precede severe COVID-19.
COVID-19, caused by SARS-CoV-2, can result in acute respiratory distress syndrome and multiple-organ failure, but little is known about its pathophysiology. Here, we generated single-cell atlases of 23 lung, 16 kidney, 16 liver and 19 heart COVID-19 autopsy donor tissue samples, and spatial atlases of 14 lung donors. Integrated computational analysis uncovered substantial remodeling in the lung epithelial, immune and stromal compartments, with evidence of multiple paths of failed tissue regeneration, including defective alveolar type 2 differentiation and expansion of fibroblasts and putative TP63+ intrapulmonary basal-like progenitor cells. Viral RNAs were enriched in mononuclear phagocytic and endothelial lung cells which induced specific host programs. Spatial analysis in lung distinguished inflammatory host responses in lung regions with and without viral RNA. Analysis of the other tissue atlases showed transcriptional alterations in multiple cell types in COVID-19 donor heart tissue, and mapped cell types and genes implicated with disease severity based on COVID-19 GWAS. Our foundational dataset elucidates the biological impact of severe SARS-CoV-2 infection across the body, a key step towards new treatments.
Angiotensin-converting enzyme 2 (ACE2) and accessory proteases (TMPRSS2 and CTSL) are needed for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) cellular entry, and their expression may shed light on viral tropism and impact across the body. We assessed the cell-type-specific expression of ACE2, TMPRSS2 and CTSL across 107 single-cell RNA-sequencing studies from different tissues. ACE2, TMPRSS2 and CTSL are coexpressed in specific subsets of respiratory epithelial cells in the nasal passages, airways and alveoli, and in cells from other organs associated with coronavirus disease 2019 (COVID-19) transmission or pathology. We performed a meta-analysis of 31 lung single-cell RNA-sequencing studies with 1,320,896 cells from 377 nasal, airway and lung parenchyma samples from 228 individuals. This revealed cell-type-specific associations of age, sex and smoking with expression levels of ACE2, TMPRSS2 and CTSL. Expression of entry factors increased with age and in males, including in airway secretory cells and alveolar type 2 cells. Expression programs shared by ACE2+TMPRSS2+ cells in nasal, lung and gut tissues included genes that may mediate viral entry, key immune functions and epithelial–macrophage cross-talk, such as genes involved in the interleukin-6, interleukin-1, tumor necrosis factor and complement pathways. Cell-type-specific expression patterns may contribute to the pathogenesis of COVID-19, and our work highlights putative molecular pathways for therapeutic intervention.
The SARS-CoV-2 pandemic has caused over 1 million deaths globally, mostly due to acute lung injury and acute respiratory distress syndrome, or direct complications resulting in multiple-organ failures. Little is known about the host tissue immune and cellular responses associated with COVID-19 infection, symptoms, and lethality. To address this, we collected tissues from 11 organs during the clinical autopsy of 17 individuals who succumbed to COVID-19, resulting in a tissue bank of approximately 420 specimens. We generated comprehensive cellular maps capturing COVID-19 biology related to patients’ demise through single-cell and single-nucleus RNA-Seq of lung, kidney, liver and heart tissues, and further contextualized our findings through spatial RNA profiling of distinct lung regions. We developed a computational framework that incorporates removal of ambient RNA and automated cell type annotation to facilitate comparison with other healthy and diseased tissue atlases. In the lung, we uncovered significantly altered transcriptional programs within the epithelial, immune, and stromal compartments and cell intrinsic changes in multiple cell types relative to lung tissue from healthy controls. We observed evidence of: alveolar type 2 (AT2) differentiation replacing depleted alveolar type 1 (AT1) lung epithelial cells, as previously seen in fibrosis; a concomitant increase in myofibroblasts reflective of defective tissue repair; and, putative TP63+ intrapulmonary basal-like progenitor (IPBLP) cells, similar to cells identified in H1N1 influenza, that may serve as an emergency cellular reserve for severely damaged alveoli. Together, these findings suggest the activation and failure of multiple avenues for regeneration of the epithelium in these terminal lungs. SARS-CoV-2 RNA reads were enriched in lung mononuclear phagocytic cells and endothelial cells, and these cells expressed distinct host response transcriptional programs. 
We corroborated the compositional and transcriptional changes in lung tissue through spatial analysis of RNA profiles in situ and distinguished unique tissue host responses between regions with and without viral RNA, and in COVID-19 donor tissues relative to healthy lung. Finally, we analyzed genetic regions implicated in COVID-19 GWAS with transcriptomic data to implicate specific cell types and genes associated with disease severity. Overall, our COVID-19 cell atlas is a foundational dataset to better understand the biological impact of SARS-CoV-2 infection across the human body and empowers the identification of new therapeutic interventions and prevention strategies.
Haplotype reconstruction of distant genetic variants remains an unsolved problem due to the short-read length of common sequencing data. Here, we introduce HapTree-X, a probabilistic framework that utilizes latent long-range information to reconstruct unspecified haplotypes in diploid and polyploid organisms. It introduces the observation that differential allele-specific expression can link genetic variants from the same physical chromosome, thus even enabling using reads that cover only individual variants. We demonstrate HapTree-X’s feasibility on in-house sequenced Genome in a Bottle RNA-seq and various whole exome, genome, and 10X Genomics datasets. HapTree-X produces more complete phases (up to 25%), even in clinically important genes, and phases more variants than other methods while maintaining similar or higher accuracy and being up to 10× faster than other tools. The advantage of HapTree-X’s ability to use multiple lines of evidence, as well as to phase polyploid genomes in a single integrative framework, substantially grows as the amount of diverse data increases.
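The observation above, that differential allele-specific expression can link variants on the same chromosome, can be illustrated with a toy sketch. This is not HapTree-X's probabilistic model: it assumes a diploid sample, a single gene in which one haplotype is consistently more highly expressed, and a hypothetical fixed read-count ratio threshold (`min_ratio`). The function names and the example counts are invented for illustration only.

```python
from typing import List, Optional, Tuple


def majority_allele(ref_count: int, alt_count: int,
                    min_ratio: float = 2.0) -> Optional[int]:
    """Return 0 (ref) or 1 (alt) if one allele clearly dominates, else None.

    min_ratio is a hypothetical threshold, not a HapTree-X parameter.
    """
    hi, lo = max(ref_count, alt_count), min(ref_count, alt_count)
    # Require some coverage and a clear imbalance between the two alleles.
    if hi == 0 or hi < min_ratio * max(lo, 1):
        return None
    return 0 if ref_count > alt_count else 1


def phase_by_expression(counts: List[Tuple[int, int]]) -> Tuple[str, str]:
    """Phase heterozygous SNPs of one gene from per-SNP allele read counts.

    counts holds (ref_count, alt_count) per SNP. The dominant alleles are
    assigned to the more highly expressed haplotype and the other alleles
    to the second haplotype; '-' marks SNPs left unphased.
    """
    hap_high, hap_low = [], []
    for ref, alt in counts:
        allele = majority_allele(ref, alt)
        if allele is None:
            hap_high.append("-")
            hap_low.append("-")
        else:
            hap_high.append(str(allele))
            hap_low.append(str(1 - allele))
    return "".join(hap_high), "".join(hap_low)


if __name__ == "__main__":
    # Three SNPs, each covered by separate reads (made-up counts).
    print(phase_by_expression([(90, 10), (12, 88), (50, 48)]))
```

In this sketch the first two SNPs are linked even though no read spans both, while the third, with balanced expression, stays unphased. A real method would weigh this evidence probabilistically and combine it with read-based links.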
Bulk transcriptomic studies have defined classical and basal-like gene expression subtypes in pancreatic ductal adenocarcinoma (PDAC) that correlate with survival and response to chemotherapy; however, the underlying mechanisms that govern these subtypes and their heterogeneity remain elusive. Here, we performed single-cell RNA-sequencing of 23 metastatic PDAC needle biopsies and matched organoid models to understand how tumor cell-intrinsic features and extrinsic factors in the tumor microenvironment (TME) shape PDAC cancer cell phenotypes. We identify a novel cancer cell state that co-expresses basal-like and classical signatures, demonstrates upregulation of developmental and KRAS-driven gene expression programs, and represents a transitional intermediate between the basal-like and classical poles. Further, we observe structure to the metastatic TME supporting a model whereby reciprocal intercellular signaling shapes the local microenvironment and influences cancer cell transcriptional subtypes. In organoid culture, we find that transcriptional phenotypes are plastic and strongly skew toward the classical expression state, irrespective of genotype. Moreover, we show that patient-relevant transcriptional heterogeneity can be rescued by supplementing organoid media with factors found in the TME in a subtype-specific manner. Collectively, our study demonstrates that distinct microenvironmental signals are critical regulators of clinically relevant PDAC transcriptional states and their plasticity, identifies the necessity for considering the TME in cancer modeling efforts, and provides a generalizable approach for delineating the cell-intrinsic versus -extrinsic factors that govern tumor cell phenotypes.
B cell receptors (BCRs) display a combination of variable (V)-gene-encoded complementarity determining regions (CDRs) and adaptive/hypervariable CDR3 loops to engage antigens. It has long been proposed that the former tune for recognition of pathogens or groups of pathogens. To experimentally evaluate this within the human antibody repertoire, we perform immune challenges in transgenic mice that bear diverse human CDR3 and light chains but are constrained to different human VH genes. We find that, of six commonly deployed VH sequences, only those CDRs encoded by IGHV1-2*02 enable polyclonal antibody responses against bacterial lipopolysaccharide (LPS) when introduced to the bloodstream. The LPS is from diverse strains of gram-negative bacteria, and the VH-gene-dependent responses are directed against the non-variable and universal saccharolipid substructure of this antigen. This reveals a broad-spectrum anti-LPS response in which germline-encoded CDRs naturally hardwire the human antibody repertoire for recognition of a conserved microbial target.
Despite the epidemics of chronic obstructive pulmonary disease (COPD), the cellular and molecular mechanisms of this disease are far from being understood. Here, we characterize and classify the cellular composition within the alveolar space and peripheral blood of COPD patients and control donors using a clinically applicable single-cell RNA-seq technology corroborated by advanced computational approaches for: machine learning-based cell-type classification, identification of differentially expressed genes, prediction of metabolic changes, and modeling of cellular trajectories within a patient cohort. These high-resolution approaches revealed: massive transcriptional plasticity of macrophages in the alveolar space with increased levels of invading and proliferating cells, loss of MHC expression, reduced cellular motility, altered lipid metabolism, and a metabolic shift reminiscent of mitochondrial dysfunction in COPD patients. Collectively, single-cell omics of multi-tissue samples was used to build the first cellular and molecular framework for COPD pathophysiology as a prerequisite to develop molecular biomarkers and causal therapies against this deadly disease.
Single-cell RNA sequencing (scRNA-seq) has provided a high-dimensional catalog of millions of cells across species and diseases. These data have spurred the development of hundreds of computational tools to derive novel biological insights. Here, we outline the components of scRNA-seq analytical pipelines and the computational methods that underlie these steps. We describe available methods, highlight well-executed benchmarking studies, and identify opportunities for additional benchmarking studies and computational methods. As the biochemical approaches for single-cell omics advance, we propose coupled development of robust analytical pipelines suited for the challenges that new data present and principled selection of analytical methods that are suited for the biological questions to be addressed.
The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, creates an urgent need for identifying molecular mechanisms that mediate viral entry, propagation, and tissue pathology. Cell membrane bound angiotensin-converting enzyme 2 (ACE2) and associated proteases, transmembrane protease serine 2 (TMPRSS2) and Cathepsin L (CTSL), were previously identified as mediators of SARS-CoV-2 cellular entry. Here, we assess the cell type-specific RNA expression of ACE2, TMPRSS2, and CTSL through an integrated analysis of 107 single-cell and single-nucleus RNA-Seq studies, including 22 lung and airways datasets (16 unpublished), and 85 datasets from other diverse organs. Joint expression of ACE2 and the accessory proteases identifies specific subsets of respiratory epithelial cells as putative targets of viral infection in the nasal passages, airways, and alveoli. Cells that co-express ACE2 and proteases are also identified in other organs, some of which have been associated with COVID-19 transmission or pathology, including gut enterocytes, corneal epithelial cells, cardiomyocytes, heart pericytes, olfactory sustentacular cells, and renal epithelial cells. Performing the first meta-analyses of scRNA-seq studies, we analyzed 1,176,683 cells from 282 nasal, airway, and lung parenchyma samples from 164 donors spanning fetal, childhood, adult, and elderly age groups, and associated increased levels of ACE2, TMPRSS2, and CTSL in specific cell types with increasing age, male gender, and smoking, all of which are epidemiologically linked to COVID-19 susceptibility and outcomes. Notably, there was a particularly low expression of ACE2 in the few young pediatric samples in the analysis. Further analysis reveals a gene expression program shared by ACE2+TMPRSS2+ cells in nasal, lung and gut tissues, including genes that may mediate viral entry, subtend key immune functions, and mediate epithelial-macrophage cross-talk.
Amongst these are IL6, its receptor and co-receptor, IL1R, TNF response pathways, and complement genes. Cell type specificity in the lung and airways and smoking effects were conserved in mice. Our analyses suggest that differences in the cell type-specific expression of mediators of SARS-CoV-2 viral entry may be responsible for aspects of COVID-19 epidemiology and clinical course, and point to putative molecular pathways involved in disease susceptibility and pathogenesis.
Crucial transitions in cancer—including tumor initiation, local expansion, metastasis, and therapeutic resistance—involve complex interactions between cells within the dynamic tumor ecosystem. Transformative single-cell genomics technologies and spatial multiplex in situ methods now provide an opportunity to interrogate this complexity at unprecedented resolution. The Human Tumor Atlas Network (HTAN), part of the National Cancer Institute (NCI) Cancer Moonshot Initiative, will establish a clinical, experimental, computational, and organizational framework to generate informative and accessible three-dimensional atlases of cancer transitions for a diverse set of tumor types. This effort complements both ongoing efforts to map healthy organs and previous large-scale cancer genomics approaches focused on bulk sequencing at a single point in time. Generating single-cell, multiparametric, longitudinal atlases and integrating them with clinical outcomes should help identify novel predictive biomarkers and features as well as therapeutically relevant cell types, cell states, and cellular interactions across transitions. The resulting tumor atlases should have a profound impact on our understanding of cancer biology and have the potential to improve cancer detection, prevention, and therapeutic discovery for better precision-medicine treatments of cancer patients and those at risk for cancer.
The scale and capabilities of single-cell RNA-sequencing methods have expanded rapidly in recent years, enabling major discoveries and large-scale cell mapping efforts. However, these methods have not been systematically and comprehensively benchmarked. Here, we directly compare seven methods for single-cell and/or single-nucleus profiling (selecting representative methods based on their usage and our expertise and resources to prepare libraries), including two low-throughput and five high-throughput methods. We tested the methods on three types of samples: cell lines, peripheral blood mononuclear cells and brain tissue, generating 36 libraries in six separate experiments in a single center. To directly compare the methods and avoid processing differences introduced by the existing pipelines, we developed scumi, a flexible computational pipeline that can be used with any single-cell RNA-sequencing method. We evaluated the methods for both basic performance, such as the structure and alignment of reads, sensitivity and extent of multiplets, and for their ability to recover known biological information in the samples.
Cellular immunity is critical for controlling intracellular pathogens, but individual cellular dynamics and cell–cell cooperativity in evolving human immune responses remain poorly understood. Single-cell RNA-sequencing (scRNA-seq) represents a powerful tool for dissecting complex multicellular behaviors in health and disease and nominating testable therapeutic targets. Its application to longitudinal samples could afford an opportunity to uncover cellular factors associated with the evolution of disease progression without potentially confounding inter-individual variability. Here, we present an experimental and computational methodology that uses scRNA-seq to characterize dynamic cellular programs and their molecular drivers, and apply it to HIV infection. By performing scRNA-seq on peripheral blood mononuclear cells from four untreated individuals before and longitudinally during acute infection, we were powered within each individual to discover gene response modules that vary by time and cell subset. Beyond previously unappreciated individual- and cell-type-specific interferon-stimulated gene upregulation, we describe temporally aligned gene expression responses obscured in bulk analyses, including those involved in proinflammatory T cell differentiation, prolonged monocyte major histocompatibility complex II upregulation and persistent natural killer (NK) cell cytolytic killing. We further identify response features arising in the first weeks of infection, for example proliferating NK cells, which may associate with future viral control. Overall, our approach provides a unified framework for characterizing multiple dynamic cellular responses and their coordination.
There is pressing urgency to better understand the pathogenesis of the severe acute respiratory syndrome (SARS) coronavirus (CoV) clade SARS-CoV-2, which causes the disease known as COVID-19. SARS-CoV-2, like SARS-CoV, utilizes ACE2 to bind host cells. While initial SARS-CoV-2 cell entry and infection depend on ACE2 in concert with the protease TMPRSS2 for spike (S) protein activation, the specific cell subsets targeted by SARS-CoV-2 in host tissues, and the factors that regulate ACE2 expression, remain unknown. Here, we leverage human and non-human primate (NHP) single-cell RNA-sequencing (scRNA-seq) datasets to uncover the tissue-resident cell subsets that may serve as the cellular targets of SARS-CoV-2. We identify ACE2 and TMPRSS2 co-expressing cells within type II pneumocytes in NHP lung, absorptive enterocytes in human and NHP terminal ileum, and human nasal goblet secretory cells. Strikingly, we discover, and extensively corroborate using publicly available data sets, that ACE2 is an interferon-stimulated gene (ISG) in human epithelial cells. We further validate this finding in primary upper airway human respiratory epithelial cells. Thus, SARS-CoV-2 may exploit IFN-driven upregulation of ACE2, a key tissue-protective mediator during lung injury, to enhance infection.
Immune responses within barrier tissues are regulated, in part, by nociceptors, specialized peripheral sensory neurons that detect noxious stimuli. Previous work has shown that nociceptor ablation not only alters local responses to immune challenge at peripheral sites, but also within draining lymph nodes (LNs). The mechanisms and significance of nociceptor-dependent modulation of LN function are unknown. Indeed, although sympathetic innervation of LNs is well documented, it has been unclear whether the LN parenchyma itself is innervated by sensory neurons. Here, using a combination of high-resolution imaging, retrograde viral tracing, single-cell transcriptomics (scRNA-seq), and optogenetics, we identified and functionally tested a sensory neuro-immune circuit that is preferentially located in the outermost cortex of skin-draining LNs. Transcriptomic profiling revealed that there are at least four discrete subsets of sensory neurons that innervate LNs with a predominance of peptidergic nociceptors, and an innervation pattern that is distinct from that in the surrounding skin. To uncover potential LN-resident communication partners for LN-innervating sensory neurons, we employed scRNA-seq to generate a draft atlas of all murine LN cells and, based on receptor-ligand expression patterns, nominated candidate target populations among stromal and immune cells. Using selective optogenetic stimulation of LN-innervating sensory axons, we directly experimentally tested our inferred connections. Acute neuronal activation triggered rapid transcriptional changes preferentially within our top-ranked putative interacting partners, principally endothelium and other nodal stroma cells, as well as several innate leukocyte populations. Thus, LNs are monitored by a unique population of sensory neurons that possesses immunomodulatory potential.
To mark the 15th anniversary of Nature Methods, we asked scientists from across diverse fields of basic biology research for their views on the most exciting and essential methodological challenges that their communities are poised to tackle in the near future.
Genomic medicine has paved the way for identifying biomarkers and therapeutically actionable targets for complex diseases, but is complicated by the involvement of thousands of variably expressed genes across multiple cell types. Single-cell RNA-sequencing (scRNA-seq) allows the characterization of such complex changes in whole organs. The study is based on applying network tools to organize and analyze scRNA-seq data from a mouse model of arthritis and human rheumatoid arthritis, in order to find diagnostic biomarkers and therapeutic targets. Diagnostic validation studies were performed using expression profiling data and potential protein biomarkers from prospective clinical studies of 13 diseases. A candidate drug was examined by a treatment study of a mouse model of arthritis, using phenotypic, immunohistochemical, and cellular analyses as read-outs. We performed the first systematic analysis of pathways, potential biomarkers, and drug targets in scRNA-seq data from a complex disease, starting with inflamed joints and lymph nodes from a mouse model of arthritis. We found the involvement of hundreds of pathways, biomarkers, and drug targets that differed greatly between cell types. Analyses of scRNA-seq and GWAS data from human rheumatoid arthritis (RA) supported a similar dispersion of pathogenic mechanisms in different cell types. Thus, systems-level approaches to prioritize biomarkers and drugs are needed. Here, we present a prioritization strategy that is based on constructing network models of disease-associated cell types and interactions, which we term multicellular disease models (MCDMs), using scRNA-seq data from our mouse model of arthritis as well as human RA. We find that the network centrality of MCDM cell types correlates with the enrichment of genes harboring genetic variants associated with RA and thus could potentially be used to prioritize cell types and genes for diagnostics and therapeutics.
We validated this hypothesis in a large-scale study of patients with 13 different autoimmune, allergic, infectious, malignant, endocrine, metabolic, and cardiovascular diseases, as well as a therapeutic study of the mouse arthritis model. Overall, our results support that our strategy has the potential to help prioritize diagnostic and therapeutic targets in human disease.
Circulating tumor cells (CTCs) play a fundamental role in cancer progression. However, in mice, limited blood volume and the rarity of CTCs in the bloodstream preclude longitudinal, in-depth studies of these cells using existing liquid biopsy techniques. Here, we present an optofluidic system that continuously collects fluorescently labeled CTCs from a genetically engineered mouse model (GEMM) for several hours per day over multiple days or weeks. The system is based on a microfluidic cell sorting chip connected serially to an unanesthetized mouse via an implanted arteriovenous shunt. Pneumatically controlled microfluidic valves capture CTCs as they flow through the device, and CTC-depleted blood is returned to the mouse via the shunt. To demonstrate the utility of our system, we profile CTCs isolated longitudinally from animals over 4 days of treatment with the BET inhibitor JQ1 using single-cell RNA sequencing (scRNA-Seq) and show that our approach eliminates potential biases driven by intermouse heterogeneity that can occur when CTCs are collected across different mice. The CTC isolation and sorting technology presented here provides a research tool to help reveal details of how CTCs evolve over time, allowing studies to credential changes in CTCs as biomarkers of drug response and facilitating future studies to understand the role of CTCs in metastasis.
Genome-wide association studies (GWAS) have revealed risk alleles for ulcerative colitis (UC), but their cell type and pathway specificities are often unknown. Here, we generate an atlas of 115,517 cells from the colon mucosa of seven UC patients and ten healthy individuals, revealing 51 epithelial, stromal, and immune cell subsets, including a subset of BEST4+ enterocytes, which may sense and respond to pH, and IL13RA2+IL-11+ inflammatory fibroblasts, which we associate with resistance to anti-TNF therapy. Inflammatory fibroblasts, inflammatory monocytes, microfold-like cells, and CD8+IL-17+ T cells expand during disease, and form intercellular interaction hubs that mediate cross-talk between diverse cellular lineages. We identify hundreds of putative autocrine and paracrine cell-cell interactions that may explain the migration, expansion, or inhibition of cell types with disease. Surprisingly, UC risk genes are often cell type specific and co-regulated in relatively few gene modules, suggesting convergence onto limited sets of cell types and pathways. Using this observation, we nominate and infer putative functions for UC risk genes across all GWAS loci. Our atlas thus provides a framework for interrogating complex human diseases and mapping risk variants onto their cell types and pathways of activity.
Human immunity relies on the coordinated responses of many cellular subsets and functional states. Inter-individual variations in cellular composition and communication could thus potentially alter host protection. Here, we explore this hypothesis by applying single-cell RNA-sequencing to examine viral responses among the dendritic cells (DCs) of three elite controllers (ECs) of HIV-1 infection.
To overcome the potentially confounding effects of donor-to-donor variability, we present a generally applicable computational framework for identifying reproducible patterns in gene expression across donors who share a unifying classification. Applying it, we discover a highly functional antiviral DC state in ECs whose fractional abundance after in vitro exposure to HIV-1 correlates with higher CD4+ T cell counts and lower HIV-1 viral loads, and that effectively primes polyfunctional T cell responses in vitro. By integrating information from existing genomic databases into our reproducibility-based analysis, we identify and validate select immunomodulators that increase the fractional abundance of this state in primary peripheral blood mononuclear cells from healthy individuals in vitro.
Overall, our results demonstrate how single-cell approaches can reveal previously unappreciated, yet important, immune behaviors and empower rational frameworks for modulating systems-level immune responses that may prove therapeutically and prophylactically useful.
The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body. The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles (such as gene expression profiles) and to connect this information with classical cellular descriptions (such as location and morphology). An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas, including a commitment to open data, code, and community.
We develop single-cell genomic approaches to comprehensively profile complex biological ensembles. To date, the majority of our work has focused on establishing, validating, and scaling single-cell transcriptomics, often through the development of microdevices to enable genome-wide identification of the cell types/states that comprise functional or dysfunctional biological samples.
Most recently, we have developed Seq-Well, a portable, low-cost platform for high-throughput single-cell RNA-Seq (scRNA-Seq). By providing open access to resources and protocols, we hope to democratize access to cutting-edge approaches in single-cell genomics.
To complement and inform the analysis of scRNA-Seq datasets, we create methods to simultaneously profile additional cellular characteristics of interest (e.g., genome, epigenome, or proteome), independently of, or in combination with, scRNA-Seq. For a given technique or system, we ask what additional information would help us better interpret our scRNA-Seq results and develop methods to collect these data. These novel methods often involve mapping ancillary information into a DNA-based readout that can be co-analyzed with cellular mRNA, or developing/applying microdevices. Recently, we have developed a method for integrated mRNA and protein detection that leverages proximity extension assays (Genshaft et al., 2016). To extract the information content from these novel datasets more effectively, we also formulate new computational methods and analyses.
We explore how the extracellular milieu impacts intracellular decision-making by experimentally controlling the cellular microenvironment or leveraging naturally occurring sources of variation within a tissue. Here, we employ solutions that include controlled culture conditions with cells (Shalek et al., 2014) or organoids, chemical or genetic perturbations (Kumar et al., 2014), and constant microfluidic perfusion. We are also developing in silico approaches that are powered by in-situ cellular tagging techniques. In each instance, we aim to understand the degree to which extracellular environments modulate, and can be used to rationally control, the responses of individual cells or the overall distribution thereof, with an eye toward engineering ensemble responses.
We use microdevices, coupled with functional signal readouts, to create and study defined cell-cell interactions. By explicitly enumerating cell type, number, and additional functional properties (e.g., cytokine secretion), we model ensemble behaviors, looking for synergies and antagonisms. These genetic signatures, along with those collected via our other platforms, provide a unique and essential reference for deconvolving behaviors in complex ensembles. We are also using genetic tracing strategies to examine differences between interacting and random cell pairs in vivo, and are developing computational methods (Tirosh et al., 2016) to identify putative interactions from scRNA-Seq data.
As the amount of data we have relating to cells, properties, surroundings, and interactions increases exponentially, we are motivated to develop pan-system measurements and analyses to paint comprehensive pictures of immune response in health and disease. Relying on massive transcriptomic datasets generated from complex tissues, like melanoma tumors, inflamed human gut, M. tuberculosis (MTB)-induced granulomas, and healthy or SHIV-infected monkey tissues, we have begun to construct social networks of integrated responses to physiological perturbations. The technologies outlined above uniquely enable us to generate foundational datasets (e.g., transcriptomes from interacting cell pairs) for deconvolving and interpreting the potential drivers of observed ensemble behaviors, as well as for identifying which properties we cannot explain, and thus need to study. To date, our lab has generated over 2 million single-cell transcriptomes across multiple tissues, individuals, and species; we are utilizing this data, paired with metadata and additional characteristics, to look for common cellular network motifs, such as division of labor, quorum sensing, persistence, or bet-hedging.
The immune system plays an important role in regulating homeostatic balance across tissues and individuals in the face of changing and challenging environments. Given the pivotal and outsized impact cell subsets (e.g., rare precocious DCs) can have on ensemble dynamics (e.g., global activation of an antiviral response and deactivation of inflammation), we aim to understand the functional consequences of variation in cellular composition across tissues, as well as how different immune cells adapt to changing environmental conditions.
Motivating questions in the lab include:
- How can we perform observational and experimental studies to understand the fundamental units of tissue structure and function?
- Can we derive basic principles governing homeostatic and pathogenic immune responses within tissues?
- What dictates the evolution of clonal antigen-specific T & B cell responses?
To this end, we are profiling multiple tissues from multiple organisms across common sources of variation. By examining consistent and unique themes that emerge across these systems, we aim to extract basic principles that govern homeostatic and pathogenic immune responses within tissues. Ultimately, we intend to leverage this information to rationally engineer immune responses (e.g., in vaccines and immunotherapies).
Our immune system collaborates with environment- and diet-dependent commensals to establish and maintain homeostasis, and to defend against pathogenic threats (e.g., viruses, bacteria, fungi). We are interested in understanding the nature and impact of these interactions on host tissues, as well as potential avenues to modulate them for therapeutic or prophylactic ends.
Illustrative questions and areas of study include:
- How do microbial composition and byproducts influence cellular differentiation and phenotypic diversity within the gut?
- How do pathogens (e.g., HIV and TB) impact target cell phenotypes and overall tissue function in the context of acute and systemic infection?
- To what degree can therapeutic intervention (e.g., cART for HIV-1) re-establish a homeostatic setpoint (i.e., composition and function)?
We have several projects and collaborations (local and international) actively exploring these and related questions in vaccine design. These efforts have both inspired, and take advantage of, some of our unique tools for profiling thousands of single cells from limited clinical samples anywhere in the world and for developing clinically relevant hypotheses.
A diverse array of mechanisms—including genetic mutations, environmental triggers, and diet—can alter cell function and reduce tissue stability, ultimately leading to malignancy, autoimmunity, or immunodeficiency. By identifying which cells these factors affect and in what ways, we aim to develop targeted therapeutic interventions in areas such as cancer, allergy, and inflammatory bowel disease.
Motivating questions that drive our research include:
- How do the coordinated interactions between epithelial and immune populations inform barrier tissue function in the context of homeostasis, inflammation and malignancy?
- How can we leverage information across systems to derive a set of unifying principles of cellular ecology in health and disease?
Current projects aim to contrast the cellular microenvironments of healthy, inflamed, and malignant (Tirosh et al., 2016; Patel et al., 2014) tissues to examine inflammation-induced changes and the drivers of malignant transformation, as well as to identify which cells remember prior insult. We are similarly profiling aberrant immune behaviors in immune-privileged tissues, such as the nervous system. As in our host-microbial studies, our goal is to identify common features shared across different immune-related diseases that we can probe further in natural (tissues, models) and engineered (patterned cells and cellular structures, organoids) ensembles.
Single-cell transcriptomics reveals gene expression heterogeneity but suffers from stochastic dropout and characteristic bimodal expression distributions in which expression is either strongly non-zero or non-detectable. We propose a two-part, generalized linear model for such bimodal data that parameterizes both of these features. We argue that the cellular detection rate, the fraction of genes expressed in a cell, should be adjusted for as a source of nuisance variation. Our model provides gene set enrichment analysis tailored to single-cell data. It provides insights into how networks of co-expressed genes evolve across an experimental treatment. MAST is available at https://github.com/RGLab/MAST.
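As a rough illustration of the two ideas in this abstract, the following minimal Python sketch computes a cellular detection rate per cell and an intercept-only two-part fit for a single gene. It is not the MAST implementation or its API (MAST is an R package). The function names, the toy data, and the choice of a Bernoulli detection part plus a Gaussian part for detected values with no covariates are all assumptions made for illustration.

```python
import math


def cellular_detection_rate(cells):
    """Fraction of genes detected (value > 0) in each cell.

    `cells` is a list of per-cell expression vectors (one value per gene).
    """
    return [sum(1 for v in cell if v > 0) / len(cell) for cell in cells]


def fit_two_part(values):
    """Intercept-only two-part fit for one gene across cells (hypothetical helper).

    Discrete part: Bernoulli probability that the gene is detected.
    Continuous part: Gaussian mean and variance of the detected values.
    Returns maximum-likelihood estimates; no covariates are modeled here.
    """
    detected = [v for v in values if v > 0]
    rate = len(detected) / len(values)
    if not detected:
        # Nothing detected: the continuous part is undefined.
        return {"rate": rate, "mean": math.nan, "var": math.nan}
    mu = sum(detected) / len(detected)
    var = sum((v - mu) ** 2 for v in detected) / len(detected)
    return {"rate": rate, "mean": mu, "var": var}


# Toy data (hypothetical): 3 cells x 4 genes, log-scale expression.
cells = [
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 4.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
cdr = cellular_detection_rate(cells)      # one value per cell
gene_1 = [cell[1] for cell in cells]      # expression of gene 1 across cells
fit = fit_two_part(gene_1)
```

In a regression setting, the detection rate computed here would enter as a covariate in both parts of the model so that it is adjusted for as nuisance variation; this sketch leaves that step out to stay short.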