COVID-19, caused by SARS-CoV-2, can result in acute respiratory distress syndrome and multiple-organ failure, but little is known about its pathophysiology. Here, we generated single-cell atlases of 23 lung, 16 kidney, 16 liver and 19 heart COVID-19 autopsy donor tissue samples, and spatial atlases of 14 lung donors. Integrated computational analysis uncovered substantial remodeling in the lung epithelial, immune and stromal compartments, with evidence of multiple paths of failed tissue regeneration, including defective alveolar type 2 differentiation and expansion of fibroblasts and putative TP63+ intrapulmonary basal-like progenitor cells. Viral RNAs were enriched in mononuclear phagocytic and endothelial lung cells which induced specific host programs. Spatial analysis in lung distinguished inflammatory host responses in lung regions with and without viral RNA. Analysis of the other tissue atlases showed transcriptional alterations in multiple cell types in COVID-19 donor heart tissue, and mapped cell types and genes implicated with disease severity based on COVID-19 GWAS. Our foundational dataset elucidates the biological impact of severe SARS-CoV-2 infection across the body, a key step towards new treatments.

Angiotensin-converting enzyme 2 (ACE2) and accessory proteases (TMPRSS2 and CTSL) are needed for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) cellular entry, and their expression may shed light on viral tropism and impact across the body. We assessed the cell-type-specific expression of ACE2, TMPRSS2 and CTSL across 107 single-cell RNA-sequencing studies from different tissues. ACE2, TMPRSS2 and CTSL are coexpressed in specific subsets of respiratory epithelial cells in the nasal passages, airways and alveoli, and in cells from other organs associated with coronavirus disease 2019 (COVID-19) transmission or pathology. We performed a meta-analysis of 31 lung single-cell RNA-sequencing studies with 1,320,896 cells from 377 nasal, airway and lung parenchyma samples from 228 individuals. This revealed cell-type-specific associations of age, sex and smoking with expression levels of ACE2, TMPRSS2 and CTSL. Expression of entry factors increased with age and in males, including in airway secretory cells and alveolar type 2 cells. Expression programs shared by ACE2+TMPRSS2+ cells in nasal, lung and gut tissues included genes that may mediate viral entry, key immune functions and epithelial–macrophage cross-talk, such as genes involved in the interleukin-6, interleukin-1, tumor necrosis factor and complement pathways. Cell-type-specific expression patterns may contribute to the pathogenesis of COVID-19, and our work highlights putative molecular pathways for therapeutic intervention.
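
The co-expression idea described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the data layout, the function name `coexpression_fraction`, and the "count greater than zero" definition of expression are all assumptions made for this example, and the toy cells are invented.

```python
from collections import defaultdict

def coexpression_fraction(cells, genes=("ACE2", "TMPRSS2")):
    """Return {cell_type: fraction of cells with a nonzero count for every gene}.

    `cells` is a hypothetical layout: a list of dicts, each with a
    "cell_type" label and a "counts" dict mapping gene name to UMI count.
    """
    totals = defaultdict(int)
    double_positive = defaultdict(int)
    for cell in cells:
        cell_type = cell["cell_type"]
        totals[cell_type] += 1
        # A cell counts as co-expressing only if every listed gene is detected.
        if all(cell["counts"].get(gene, 0) > 0 for gene in genes):
            double_positive[cell_type] += 1
    return {ct: double_positive[ct] / totals[ct] for ct in totals}

# Invented toy data, for illustration only.
toy_cells = [
    {"cell_type": "AT2", "counts": {"ACE2": 2, "TMPRSS2": 5}},
    {"cell_type": "AT2", "counts": {"ACE2": 0, "TMPRSS2": 3}},
    {"cell_type": "secretory", "counts": {"ACE2": 1, "TMPRSS2": 1, "CTSL": 4}},
    {"cell_type": "fibroblast", "counts": {"CTSL": 7}},
]
fractions = coexpression_fraction(toy_cells)
print(fractions)
```

A real analysis would also need to account for sequencing depth and dropout, since a zero count in single-cell data does not prove that a gene is unexpressed.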

The SARS-CoV-2 pandemic has caused over 1 million deaths globally, mostly due to acute lung injury and acute respiratory distress syndrome, or direct complications resulting in multiple-organ failures. Little is known about the host tissue immune and cellular responses associated with COVID-19 infection, symptoms, and lethality. To address this, we collected tissues from 11 organs during the clinical autopsy of 17 individuals who succumbed to COVID-19, resulting in a tissue bank of approximately 420 specimens. We generated comprehensive cellular maps capturing COVID-19 biology related to patients’ demise through single-cell and single-nucleus RNA-Seq of lung, kidney, liver and heart tissues, and further contextualized our findings through spatial RNA profiling of distinct lung regions. We developed a computational framework that incorporates removal of ambient RNA and automated cell type annotation to facilitate comparison with other healthy and diseased tissue atlases. In the lung, we uncovered significantly altered transcriptional programs within the epithelial, immune, and stromal compartments and cell intrinsic changes in multiple cell types relative to lung tissue from healthy controls. We observed evidence of: alveolar type 2 (AT2) differentiation replacing depleted alveolar type 1 (AT1) lung epithelial cells, as previously seen in fibrosis; a concomitant increase in myofibroblasts reflective of defective tissue repair; and, putative TP63+ intrapulmonary basal-like progenitor (IPBLP) cells, similar to cells identified in H1N1 influenza, that may serve as an emergency cellular reserve for severely damaged alveoli. Together, these findings suggest the activation and failure of multiple avenues for regeneration of the epithelium in these terminal lungs. SARS-CoV-2 RNA reads were enriched in lung mononuclear phagocytic cells and endothelial cells, and these cells expressed distinct host response transcriptional programs. We corroborated the compositional and transcriptional changes in lung tissue through spatial analysis of RNA profiles in situ and distinguished unique tissue host responses between regions with and without viral RNA, and in COVID-19 donor tissues relative to healthy lung. Finally, we analyzed genetic regions implicated in COVID-19 GWAS with transcriptomic data to implicate specific cell types and genes associated with disease severity. Overall, our COVID-19 cell atlas is a foundational dataset to better understand the biological impact of SARS-CoV-2 infection across the human body and empowers the identification of new therapeutic interventions and prevention strategies.

Haplotype reconstruction of distant genetic variants remains an unsolved problem due to the short-read length of common sequencing data. Here, we introduce HapTree-X, a probabilistic framework that utilizes latent long-range information to reconstruct unspecified haplotypes in diploid and polyploid organisms. It introduces the observation that differential allele-specific expression can link genetic variants from the same physical chromosome, which makes even reads that cover only a single variant usable for phasing. We demonstrate HapTree-X’s feasibility on in-house sequenced Genome in a Bottle RNA-seq and various whole exome, genome, and 10X Genomics datasets. HapTree-X produces more complete phases (up to 25%), even in clinically important genes, and phases more variants than other methods while maintaining similar or higher accuracy and being up to 10× faster than other tools. The advantage of HapTree-X’s ability to use multiple lines of evidence, as well as to phase polyploid genomes in a single integrative framework, substantially grows as the amount of diverse data increases.
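
To make the expression-based linking idea concrete, here is a toy sketch. It is not HapTree-X's probabilistic model: it swaps in a simple majority-allele heuristic, and the function name `phase_by_expression`, the thresholds `min_ratio` and `min_depth`, and the read counts are all hypothetical choices made for this example. The assumption is that, within one gene showing allelic imbalance, the more highly expressed allele at each heterozygous site likely sits on the same chromosome copy.

```python
def phase_by_expression(snp_counts, min_ratio=0.7, min_depth=20):
    """Group alleles into a high-expression and a low-expression haplotype.

    snp_counts: list of (snp_id, ref_reads, alt_reads) for heterozygous
    sites within a single gene. Sites with too few reads or too little
    allelic imbalance are left unphased. Thresholds are illustrative only.
    """
    hap_high, hap_low = {}, {}
    for snp_id, ref, alt in snp_counts:
        depth = ref + alt
        if depth < min_depth:
            continue  # too few reads to trust the imbalance
        if max(ref, alt) / depth < min_ratio:
            continue  # near-balanced expression carries no phase signal
        if ref > alt:
            hap_high[snp_id], hap_low[snp_id] = "ref", "alt"
        else:
            hap_high[snp_id], hap_low[snp_id] = "alt", "ref"
    return hap_high, hap_low

# Invented read counts: snp3 is balanced and snp4 is too shallow.
toy = [("snp1", 80, 20), ("snp2", 15, 85), ("snp3", 52, 48), ("snp4", 9, 1)]
high, low = phase_by_expression(toy)
print(high, low)
```

Note that no read here needs to span two variants; the link between snp1 and snp2 comes only from their shared direction of imbalance, which is why a real method has to model the uncertainty of that inference rather than apply hard cutoffs.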

There is pressing urgency to understand the pathogenesis of the severe acute respiratory syndrome coronavirus clade 2 (SARS-CoV-2) which causes the disease COVID-19. SARS-CoV-2 spike (S)-protein binds ACE2, and in concert with host proteases, principally TMPRSS2, promotes cellular entry. The cell subsets targeted by SARS-CoV-2 in host tissues, and the factors that regulate ACE2 expression, remain unknown. Here, we leverage human, non-human primate, and mouse single-cell RNA-sequencing (scRNA-seq) datasets across health and disease to uncover putative targets of SARS-CoV-2 amongst tissue-resident cell subsets. We identify ACE2 and TMPRSS2 co-expressing cells within lung type II pneumocytes, ileal absorptive enterocytes, and nasal goblet secretory cells. Strikingly, we discover that ACE2 is a human interferon-stimulated gene (ISG) in vitro using airway epithelial cells, and extend our findings to in vivo viral infections. Our data suggest that SARS-CoV-2 could exploit species-specific interferon-driven upregulation of ACE2, a tissue-protective mediator during lung injury, to enhance infection.

The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, creates an urgent need for identifying molecular mechanisms that mediate viral entry, propagation, and tissue pathology. Cell membrane bound angiotensin-converting enzyme 2 (ACE2) and associated proteases, transmembrane protease serine 2 (TMPRSS2) and Cathepsin L (CTSL), were previously identified as mediators of SARS-CoV-2 cellular entry. Here, we assess the cell type-specific RNA expression of ACE2, TMPRSS2, and CTSL through an integrated analysis of 107 single-cell and single-nucleus RNA-Seq studies, including 22 lung and airways datasets (16 unpublished), and 85 datasets from other diverse organs. Joint expression of ACE2 and the accessory proteases identifies specific subsets of respiratory epithelial cells as putative targets of viral infection in the nasal passages, airways, and alveoli. Cells that co-express ACE2 and proteases are also identified in other organs, some of which have been associated with COVID-19 transmission or pathology, including gut enterocytes, corneal epithelial cells, cardiomyocytes, heart pericytes, olfactory sustentacular cells, and renal epithelial cells. Performing the first meta-analyses of scRNA-seq studies, we analyzed 1,176,683 cells from 282 nasal, airway, and lung parenchyma samples from 164 donors spanning fetal, childhood, adult, and elderly age groups, and associated increased levels of ACE2, TMPRSS2, and CTSL in specific cell types with increasing age, male gender, and smoking, all of which are epidemiologically linked to COVID-19 susceptibility and outcomes. Notably, there was a particularly low expression of ACE2 in the few young pediatric samples in the analysis. Further analysis reveals a gene expression program shared by ACE2+TMPRSS2+ cells in nasal, lung and gut tissues, including genes that may mediate viral entry, subtend key immune functions, and mediate epithelial-macrophage cross-talk. Amongst these are IL6, its receptor and co-receptor, IL1R, TNF response pathways, and complement genes. Cell type specificity in the lung and airways and smoking effects were conserved in mice. Our analyses suggest that differences in the cell type-specific expression of mediators of SARS-CoV-2 viral entry may be responsible for aspects of COVID-19 epidemiology and clinical course, and point to putative molecular pathways involved in disease susceptibility and pathogenesis.

Cellular immunity is critical for controlling intracellular pathogens, but individual cellular dynamics and cell–cell cooperativity in evolving human immune responses remain poorly understood. Single-cell RNA-sequencing (scRNA-seq) represents a powerful tool for dissecting complex multicellular behaviors in health and disease and nominating testable therapeutic targets. Its application to longitudinal samples could afford an opportunity to uncover cellular factors associated with the evolution of disease progression without potentially confounding inter-individual variability. Here, we present an experimental and computational methodology that uses scRNA-seq to characterize dynamic cellular programs and their molecular drivers, and apply it to HIV infection. By performing scRNA-seq on peripheral blood mononuclear cells from four untreated individuals before and longitudinally during acute infection, we were powered within each to discover gene response modules that vary by time and cell subset. Beyond previously unappreciated individual- and cell-type-specific interferon-stimulated gene upregulation, we describe temporally aligned gene expression responses obscured in bulk analyses, including those involved in proinflammatory T cell differentiation, prolonged monocyte major histocompatibility complex II upregulation and persistent natural killer (NK) cell cytolytic killing. We further identify response features arising in the first weeks of infection, for example proliferating natural killer cells, which potentially may associate with future viral control. Overall, our approach provides a unified framework for characterizing multiple dynamic cellular responses and their coordination.

We develop single-cell genomic approaches to comprehensively profile complex biological ensembles. To date, the majority of our work has focused on establishing, validating, and scaling single-cell transcriptomics, often through the development of microdevices to enable genome-wide identification of the cell types/states that comprise functional or dysfunctional biological samples.

Most recently, we have developed Seq-Well, a portable, low-cost platform for high-throughput single-cell RNA-Seq (scRNA-Seq). By providing open access to resources and protocols, we hope to democratize access to cutting-edge approaches in single-cell genomics.

To complement and inform the analysis of scRNA-Seq datasets, we create methods to simultaneously profile additional cellular characteristics of interest (e.g., genome, epigenome, or proteome), independently of, or in combination with, scRNA-Seq. For a given technique or system, we ask what additional information would help us better interpret our scRNA-Seq results and develop methods to collect these data. These novel methods often involve mapping ancillary information into a DNA-based readout that can be coanalyzed with cellular mRNA, or developing and applying microdevices. Recently, we have developed a method for integrated mRNA and protein detection that leverages proximity extension assays (Genshaft et al., 2016). To extract the information content from these novel datasets more effectively, we also formulate new computational methods and analyses.

We explore how the extracellular milieu impacts intracellular decision-making by experimentally controlling the cellular microenvironment or leveraging naturally occurring sources of variation within a tissue. Here, we employ solutions that include controlled culture conditions with cells (Shalek et al., 2014) or organoids, chemical or genetic perturbations (Kumar et al., 2014), and constant microfluidic perfusion. We are also developing in silico approaches that are powered by in-situ cellular tagging techniques. In each instance, we aim to understand the degree to which extracellular environments modulate, and can be used to rationally control, the responses of individual cells or the overall distribution thereof, with an eye toward engineering ensemble responses.

We use microdevices, coupled with functional signal readouts, to create and study defined cell-cell interactions. By explicitly enumerating cell type, number, and additional functional properties (e.g., cytokine secretion), we model ensemble behaviors, looking for synergies and antagonisms. These genetic signatures, along with those collected via our other platforms, provide a unique and essential reference for deconvolving behaviors in complex ensembles. We are also using genetic tracing strategies to examine differences between interacting and random cell pairs in vivo, and are developing computational methods (Tirosh et al., 2016) to identify putative interactions from scRNA-Seq data.

As the amount of data we have relating to cells, properties, surroundings, and interactions increases exponentially, we are motivated to develop pan-system measurements and analyses to paint comprehensive pictures of immune response in health and disease. Relying on massive transcriptomic datasets generated from complex tissues, like melanoma tumors, inflamed human gut, M. tuberculosis (MTB)-induced granulomas, and healthy or SHIV-infected monkey tissues, we have begun to construct social networks of integrated responses to physiological perturbations. The technologies outlined above uniquely enable us to generate foundational datasets (e.g., transcriptomes from interacting cell pairs) for deconvolving and interpreting the potential drivers of observed ensemble behaviors, as well as for identifying which properties we cannot explain, and thus need to study. To date, our lab has generated over 2 million single-cell transcriptomes across multiple tissues, individuals, and species; we are utilizing this data, paired with metadata and additional characteristics, to look for common cellular network motifs, such as division of labor, quorum sensing, persistence, or bet-hedging.


The immune system plays an important role in regulating homeostatic balance across tissues and individuals in the face of changing and challenging environments. Given the pivotal and outsized impact cell subsets (e.g., rare precocious DCs) can have on ensemble dynamics (e.g., global activation of an antiviral response and deactivation of inflammation), we aim to understand the functional consequences of variation in cellular composition across tissues, as well as how different immune cells adapt to changing environmental conditions.

Motivating questions in the lab include:

  1. How can we perform observational and experimental studies to understand the fundamental units of tissue structure and function?
  2. Can we derive basic principles governing homeostatic and pathogenic immune responses within tissues?
  3. What dictates the evolution of clonal antigen-specific T & B cell responses?

To this end, we are profiling multiple tissues from multiple organisms across common sources of variation. By examining consistent and unique themes that emerge across these systems, we aim to extract basic principles that govern homeostatic and pathogenic immune responses within tissues. Ultimately, we intend to leverage this information to rationally engineer immune responses (e.g., in vaccines and immunotherapies).

Our immune system collaborates with environment- and diet-dependent commensals to establish and maintain homeostasis, and to defend against pathogenic threats (e.g., viruses, bacteria, fungi). We are interested in understanding the nature and impact of these interactions on host tissues, as well as potential avenues to modulate them for therapeutic or prophylactic ends.

Illustrative questions and areas of study include:

  1. How do microbial composition and byproducts influence cellular differentiation and phenotypic diversity within the gut?
  2. How do pathogens (e.g. HIV and TB) impact target cell phenotypes and overall tissue function in the context of acute and systemic infection?
  3. To what degree can therapeutic intervention (e.g. cART for HIV-1) re-establish the homeostatic setpoint (i.e. composition and function)?

We have several projects and collaborations (local and international) actively exploring these and related questions in vaccine design. This work has both inspired and taken advantage of some of our unique tools for profiling thousands of single cells from limited clinical samples anywhere in the world, and for developing clinically relevant hypotheses.

A diverse array of mechanisms—including genetic mutations, environmental triggers, and diet—can alter cell function and reduce tissue stability, ultimately leading to malignancy, autoimmunity, or immunodeficiency. By identifying which cells these factors affect and in what ways, we aim to develop targeted therapeutic interventions in areas such as cancer, allergy, and inflammatory bowel disease.

Motivating questions that drive our research include:

  1. How do the coordinated interactions between epithelial and immune populations inform barrier tissue function in the context of homeostasis, inflammation and malignancy?
  2. How can we leverage information across systems to derive a set of unifying principles of cellular ecology in health and disease?

Current projects aim to contrast the cellular microenvironments of healthy, inflamed, and malignant (Tirosh et al., 2016; Patel et al., 2014) tissues to examine inflammation-induced changes and the drivers of malignant transformation, as well as to identify which cells remember prior insult. We are similarly profiling aberrant immune behaviors in immune-privileged tissues, such as the nervous system. As in our host-microbial studies, our goal is to identify common features shared across different immune-related diseases that we can probe further in natural (tissues, models) and engineered (patterned cells and cellular structures, organoids) ensembles.

Single-cell transcriptomics reveals gene expression heterogeneity but suffers from stochastic dropout and characteristic bimodal expression distributions in which expression is either strongly non-zero or non-detectable. We propose a two-part, generalized linear model for such bimodal data that parameterizes both of these features. We argue that the cellular detection rate, the fraction of genes expressed in a cell, should be adjusted for as a source of nuisance variation. Our model provides gene set enrichment analysis tailored to single-cell data. It provides insights into how networks of co-expressed genes evolve across an experimental treatment. MAST is available at