Single Cell Type - Methods summary
The Single Cell Type section contains single cell RNA sequencing (scRNA-seq) data from 25 major healthy human tissues and organs, covering 444 individual cell type clusters.
Key publication: Karlsson M et al. (2021) "A single cell type transcriptomics map of human tissues". Sci Adv 7(31):eabh2169
What can you learn from the Single Cell Type section?
How has the data been generated?
Collection of scRNA-seq data
The scRNA-seq datasets were retrieved from published studies of healthy human tissues. We performed a meta-analysis of the scRNA-seq literature and searched single cell databases, including the Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/home), the Human Cell Atlas (https://www.humancellatlas.org), the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) and the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/). To avoid technical bias and to ensure that the single cell datasets best represent the corresponding tissues, we applied the following data selection criteria: (1) single cell RNA sequencing was performed on single cell suspensions from tissues without pre-enrichment of cell types; (2) datasets included more than 4,000 cells and 20 million read counts; and (3) pseudo-bulk gene expression profiles were highly correlated with bulk RNA-seq profiles. In total, datasets from 25 tissue types and human blood were included.
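The three selection criteria can be expressed as a simple dataset filter. The sketch below is illustrative only: the function names `pseudo_bulk` and `passes_selection` are hypothetical, and the correlation measure (Pearson correlation on log1p-transformed profiles) and the cutoff `min_corr=0.8` are assumptions, since the section only says "highly correlated" without naming a metric or threshold.

```python
import numpy as np


def pseudo_bulk(counts):
    """Sum a cells x genes count matrix over all cells and scale to one million."""
    total = np.asarray(counts, dtype=float).sum(axis=0)
    return total / total.sum() * 1e6


def passes_selection(n_cells, n_reads, pre_enriched, pseudo_bulk_profile, bulk_profile,
                     min_cells=4000, min_reads=20_000_000, min_corr=0.8):
    """Hypothetical check of the three dataset selection criteria.

    min_corr and the log-Pearson correlation are ASSUMPTIONS, not published values.
    """
    # (1) tissue suspension must not be pre-enriched for particular cell types
    if pre_enriched:
        return False
    # (2) dataset size: more than 4,000 cells and at least 20 million reads
    if n_cells <= min_cells or n_reads < min_reads:
        return False
    # (3) pseudo-bulk profile must correlate with the matching bulk RNA-seq profile
    r = np.corrcoef(np.log1p(pseudo_bulk_profile), np.log1p(bulk_profile))[0, 1]
    return bool(r >= min_corr)
```

Computing the correlation on log-scaled values keeps a handful of very highly expressed genes from dominating the comparison; a rank-based (Spearman) correlation would be an equally reasonable choice.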
Immunohistochemistry on tissue microarrays
To confirm scRNA-seq profiles and cell type specificity at the protein level, antibody-based protein expression profiles of normal human tissues were generated using immunohistochemistry (IHC) on tissue microarrays (TMAs), as described in more detail in the Tissue section.
How has the data been analyzed?
Quantified raw sequencing data were downloaded from the corresponding repository using the accession number provided by each study. Unfiltered data were used as input for downstream analysis with an in-house pipeline built on Scanpy (Single-Cell Analysis in Python), in which data were considered valid if: i) a cell has at least 200 detected genes; and ii) a gene is expressed in at least 10% of the cells. By pooling the data from each cell type cluster and calculating the average normalized protein-coding transcripts per million, expression profiles can be visualized for each gene in each cluster at both a genome-wide and a single cell type level. Each of the 444 cell type clusters was manually annotated by assigning its main cell type based on an extensive survey of well-known tissue and cell type specific markers, including both markers from the original publications and additional markers used in pathology diagnostics. The average expression for each cell type forms the basis for classifying all protein-coding genes with regard to specificity, as described below.
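The two validity thresholds and the per-cluster pooling can be sketched with NumPy as follows. This is a simplified illustration, not the in-house pipeline: it assumes a dense cells x genes count matrix already restricted to protein-coding genes, the function names `qc_filter` and `pooled_tpm` are hypothetical, filtering cells before genes is an assumed order, and pooled counts are simply scaled to one million per cluster without the additional normalization applied by the Human Protein Atlas. In Scanpy itself, comparable filtering is available through `sc.pp.filter_cells(adata, min_genes=200)` and `sc.pp.filter_genes(adata, min_cells=...)`.

```python
import numpy as np


def qc_filter(counts, min_genes=200, min_cell_frac=0.10):
    """Keep cells with >= min_genes detected genes, then genes detected in
    >= min_cell_frac of the remaining cells. Returns (filtered, cell_mask, gene_mask)."""
    counts = np.asarray(counts)
    detected = counts > 0
    keep_cells = detected.sum(axis=1) >= min_genes
    counts, detected = counts[keep_cells], detected[keep_cells]
    # fraction of retained cells in which each gene has a non-zero count
    keep_genes = detected.mean(axis=0) >= min_cell_frac
    return counts[:, keep_genes], keep_cells, keep_genes


def pooled_tpm(counts, labels):
    """Pool (sum) raw counts per cluster label and scale each cluster to 1e6."""
    labels = np.asarray(labels)
    clusters = sorted(set(labels.tolist()))
    pooled = np.vstack([counts[labels == c].sum(axis=0) for c in clusters]).astype(float)
    tpm = pooled / pooled.sum(axis=1, keepdims=True) * 1e6
    return clusters, tpm
```

For example, `qc_filter(counts)` followed by `pooled_tpm(filtered, cluster_labels)` yields one expression profile per cluster, which is the kind of per-cell-type average that the specificity classification builds on.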
What is presented in the section?
The data is presented as interactive UMAP plots and summary bar plots displaying the expression of each gene in each cluster or single cell type, together with information on cell type specificity from a body-wide perspective. The data is linked to the protein expression profiles in the Tissue section, where single cell type specificity can be examined in high-resolution histological images.
How has the classification of all protein-coding genes been done?
A genome-wide classification of the protein-coding genes with regard to single cell type specificity has been performed using between-sample normalized data. The results can serve as a reference for researchers interested in expression profiles in any of the main cell types. The genes were classified according to specificity into (i) cell type enriched genes, with at least fourfold higher expression in one cell type than in any other analyzed cell type; (ii) group enriched genes, with enriched expression in a small group of cell types (2 to 10); and (iii) cell type enhanced genes, with only moderately elevated expression. In addition, a new classification based on expression clusters has recently been introduced, in which all genes are clustered by expression similarity across all cell types. The results are presented as a UMAP cluster plot (see figure), and an interactive version is also available.
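The specificity categories can be sketched as a per-gene decision procedure. Only the fourfold rule for category (i) and the group size of 2 to 10 are stated above; the remaining rules are assumptions modeled on the Human Protein Atlas's general category definitions: a group is enriched if its mean expression is at least fourfold the highest value outside the group, a cell type is enhanced if its expression is at least fourfold the mean across all cell types, and a minimum expression of 1 is required. The function name `classify_gene` and the label "not detected" are hypothetical.

```python
import numpy as np


def classify_gene(expr, names, fold=4.0, max_group=10, min_expr=1.0):
    """Hypothetical specificity classifier for one gene.

    expr: average expression of the gene in each cell type.
    Group and enhanced rules and min_expr are ASSUMPTIONS (see lead-in).
    """
    expr = np.asarray(expr, dtype=float)
    order = np.argsort(expr)[::-1]          # cell type indices, highest first
    top = expr[order]
    if top[0] < min_expr:
        return "not detected", []
    # (i) enriched: highest >= fold x second highest
    if top[0] >= fold * top[1]:
        return "cell type enriched", [names[order[0]]]
    # (ii) group enriched: mean of top k >= fold x the (k+1)-th value, k = 2..max_group
    for k in range(2, min(max_group, len(expr) - 1) + 1):
        if top[:k].mean() >= fold * top[k]:
            return "group enriched", [names[i] for i in order[:k]]
    # (iii) enhanced: expression >= fold x mean over all cell types
    cutoff = fold * expr.mean()
    enhanced = [names[i] for i in order if expr[i] >= cutoff and expr[i] >= min_expr]
    if enhanced:
        return "cell type enhanced", enhanced
    return "low cell type specificity", []
```

Checking categories from strictest to loosest ensures that each gene receives only its most specific label, for example a gene that qualifies as cell type enriched is not also reported as enhanced.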