Single cell type - Methods summary

The single cell type atlas aims to create a comprehensive map of protein-coding gene expression across the cell types found in the major adult tissues. We achieved this by systematically integrating and re-analyzing single cell and single nucleus RNA sequencing datasets from 34 healthy tissues under a common pipeline. This section summarizes the data processing and analysis methods for 36 scRNA-seq or snRNA-seq datasets, 1175 individual cell type clusters, and 154 final cell types.

Method details
Key publications

Karlsson M et al. (2021) "A single cell type transcriptomics map of human tissues" Sci Adv.
Shi M et al. (2025) "A resource for whole-body gene expression map of human tissues based on integration of single cell and bulk transcriptomics" Genome Biol.
How has the data been generated?

Collection of scRNA-seq and snRNA-seq data

The 34 scRNA-seq/snRNA-seq datasets included were systematically selected through an extensive literature search of single cell transcriptomic databases and studies featuring healthy adult human tissue. The datasets were retrieved from the Single Cell Expression Atlas, the Human Cell Atlas, the Gene Expression Omnibus (GEO), EMBL-EBI BioStudies, and Tabula Sapiens. A complete list of datasets and their references is presented here.

Gene expression quantification

For most datasets, we obtained the raw sequencing files (FASTQ files). To obtain gene counts, we mapped the reads in the FASTQ files to the human reference genome with gene annotations based on Ensembl 109. To correct for residual RNA in solution inside droplets, each sample went through an ambient RNA correction procedure (SoupX). We then applied a multistep quality control procedure to remove poor-quality cells and technical artifacts. This involved removing probable doublets (two cells inside one droplet), droplets with high mitochondrial content, and high RNA content outliers. Detection thresholds were carefully adjusted to retain low RNA content cells and to ensure representation of rare cell types.

Clustering and cell type annotation

Following cell quality control, each dataset was processed individually through a cell type annotation procedure involving cell clustering. At a basic level, cell clustering reduces the complex, high-dimensional gene expression data to its principal components and then runs a clustering algorithm (Leiden) to group cells under a generic label. The gene expression profile of each cluster is then examined using known cell-specific marker genes to identify the underlying cell type.
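As a rough illustration of how cluster expression profiles can be compared against known marker genes, the sketch below ranks candidate labels for each cluster. It is not the atlas's annotation procedure, which the text describes as an investigation of marker genes rather than an automatic scoring. The function name `rank_labels_by_markers`, the z-score based score, and the placeholder gene and label names are all assumptions made for this example.

```python
import numpy as np

def rank_labels_by_markers(cluster_means, genes, marker_sets):
    """Score each cluster against each candidate label (illustrative only).

    cluster_means : (n_clusters, n_genes) mean expression per cluster
    genes         : list of gene names matching the columns
    marker_sets   : dict mapping a candidate label to a list of marker genes
    Returns (labels, scores), where scores[i, j] is the mean z-scored
    expression of label j's markers in cluster i.
    """
    x = np.asarray(cluster_means, dtype=float)
    # z-score each gene across clusters so that highly expressed genes
    # do not dominate the score
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    z = (x - x.mean(axis=0)) / sd

    index = {g: i for i, g in enumerate(genes)}
    labels = list(marker_sets)
    scores = np.zeros((x.shape[0], len(labels)))
    for j, label in enumerate(labels):
        # markers missing from the expression matrix are skipped
        cols = [index[g] for g in marker_sets[label] if g in index]
        if cols:
            scores[:, j] = z[:, cols].mean(axis=1)
    return labels, scores
```

The highest-scoring label per cluster (`scores.argmax(axis=1)`) would only be a suggestion to review by hand, since marker lists overlap between related cell types.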
We assigned each cluster two levels of classification: (1) a detailed cell type annotation that prioritises resolution within a tissue (e.g., Arterial endothelial cells), and (2) a main cell type annotation that harmonizes cluster labels across all included datasets (e.g., Vascular endothelial cells).

Normalisation and data integration

After cluster labeling, we used a pseudobulk approach to integrate expression profiles across all tissues, creating a single, unified reference for each cell type. The process followed these steps:
Cell type hierarchical organisation

To facilitate data presentation and usability, we introduced a hierarchical structure to the cell type annotations: cell types (n = 154), cell type groups (n = 53), and cell classes (n = 15). The cell type and cell type group levels contain the expression data used for downstream gene classification. In contrast, the cell classes serve an organizational purpose. The cell type group expression profile is derived from the cell type data by max pooling: for every gene within a cell type group, the maximum nCPM value observed across the constituent cell types is retained.

Immunohistochemistry on tissue microarrays

To confirm scRNA-seq profiles and cell type specificity at the protein level, antibody-based protein expression profiles of normal human tissue types were generated using immunohistochemistry (IHC) on tissue microarrays (TMAs), as described in more detail in the Tissue section.
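The max pooling step described under the cell type hierarchical organisation can be sketched as follows. This is a minimal example under stated assumptions: the function name `max_pool_groups`, the genes-by-cell-types matrix layout, and the placeholder cell type and group names are hypothetical and not taken from the atlas.

```python
import numpy as np

def max_pool_groups(ncpm, cell_types, group_of):
    """Aggregate cell type expression into cell type groups by max pooling.

    ncpm       : (n_genes, n_cell_types) array of nCPM values
    cell_types : list of cell type names matching the columns of ncpm
    group_of   : dict mapping each cell type name to its group name
    Returns (groups, pooled), where pooled[g, j] is the maximum nCPM of
    gene g across the cell types that belong to groups[j].
    """
    ncpm = np.asarray(ncpm, dtype=float)
    groups = sorted({group_of[ct] for ct in cell_types})
    pooled = np.empty((ncpm.shape[0], len(groups)))
    for j, group in enumerate(groups):
        # columns of the cell types that make up this group
        cols = [i for i, ct in enumerate(cell_types) if group_of[ct] == group]
        pooled[:, j] = ncpm[:, cols].max(axis=1)
    return groups, pooled


# Toy example with placeholder names (not real atlas annotations)
cell_types = ["type_a", "type_b", "type_c"]
group_of = {"type_a": "group_1", "type_b": "group_1", "type_c": "group_2"}
ncpm = [[1.0, 5.0, 2.0],
        [0.0, 0.0, 7.0]]
groups, pooled = max_pool_groups(ncpm, cell_types, group_of)
```

Taking the maximum rather than the mean means a gene that is high in one constituent cell type stays high at the group level, even if the other cell types in the group do not express it.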
How has the classification of all protein-coding genes been done?

Gene classification and specificity scoring

We analyzed the processed nCPM data to classify every protein-coding gene based on its expression pattern. This was done independently on the cell type data and the cell type group data.
Gene clustering

The processed single cell type data was used to cluster genes according to their expression across clusters. Genes detected in at least one cell type (nCPM > 1) were taken into account. The procedure involved gene-wise scaling of the nCPM expression values and extracting the principal components (PCA). Based on these principal components, gene-to-gene Spearman distances were calculated. A neighborhood graph was then computed, and the Louvain clustering algorithm was run on it 100 times. The consensus clusters calculated from the 100 iterations were taken as the final cluster assignment. This procedure resulted in 110 expression clusters. The gene clusters were subsequently manually annotated, based on overrepresentation analysis across diverse biological databases and our own specificity annotations.

The results

The clustering of 19294 genes showing expression above the cut-off in the single cell types resulted in 110 expression clusters, which have been manually annotated to describe common features in terms of function and specificity. The result of the cluster analysis is presented as a UMAP based on gene expression, where each cluster is summarized as a colored area containing most of the cluster genes. The interactive results are available here.
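The first steps of the gene clustering procedure (gene-wise scaling, PCA, and gene-to-gene Spearman distances) can be sketched with NumPy alone. This is an illustration under assumptions, not the atlas pipeline: the function name `gene_spearman_distances`, the number of components, and the definition of distance as one minus the Spearman correlation of PC scores are choices made for this example. The neighborhood graph, the repeated Louvain runs, and the consensus step are not covered.

```python
import numpy as np

def gene_spearman_distances(ncpm, n_components=5):
    """Gene-to-gene distance matrix from a (n_genes, n_cell_types) array.

    Steps: z-score each gene across cell types, project genes onto the
    leading principal components, then use 1 - Spearman correlation of
    the PC scores as the distance (an assumed definition).
    """
    x = np.asarray(ncpm, dtype=float)

    # Gene-wise scaling: zero mean and unit variance per gene (row)
    sd = x.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes would otherwise divide by zero
    z = (x - x.mean(axis=1, keepdims=True)) / sd

    # PCA via SVD, treating genes as observations
    zc = z - z.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(zc, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]

    # Spearman correlation = Pearson correlation of ranks.
    # Ranks come from a double argsort, which does not average ties;
    # continuous PC scores make ties unlikely.
    ranks = scores.argsort(axis=1).argsort(axis=1).astype(float)
    rho = np.corrcoef(ranks)
    return 1.0 - rho


# Toy input: 20 hypothetical genes across 8 hypothetical cell types
rng = np.random.default_rng(0)
toy_ncpm = rng.gamma(shape=2.0, scale=10.0, size=(20, 8))
dist = gene_spearman_distances(toy_ncpm, n_components=5)
```

The resulting matrix is symmetric with a zero diagonal and values between 0 and 2, so it could be passed to a nearest-neighbor graph builder as a precomputed distance.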
What is presented?

The data is presented as interactive UMAP plots and summarizing bar plots that display the expression of each gene in each cluster or single cell type, including information on cell type specificity from a body-wide perspective. The data is linked to protein expression profiles in the Tissue section, presenting the single cell type specificity as high-resolution histological images.
What is the difference between cell type, cell group, and cell class?

Below is an example illustrating the hierarchy of cell class, grouped cell type, cell type and cell type detail.
All cluster annotations and cell type information, including the respective cell type group, are listed in the cluster data list. More details and an interactive way to explore the cell type hierarchy are found here.
Data overview

The data presented here encompasses the transcriptomes of 1,217,972 cells after cell filtering and quality control. The individual cells were grouped into 1175 cell clusters and manually annotated with both detailed and main cell type names. Clusters that met quality and confidence criteria were then integrated to define 154 final cell types, which were in turn grouped into 53 broader cell type groups. Several lists are available for download, providing a complete overview and expression data at the different levels of detail.