Cluster and dataset comparison

Inclusion Criteria for Data in the Single Cell Type Resource

The single cell RNA-seq datasets were retrieved from published studies of healthy human tissues. We performed a meta-analysis of the scRNA-seq literature and searched single cell databases, including the Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/home), the Human Cell Atlas (https://www.humancellatlas.org), the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/), Tabula Sapiens (https://tabula-sapiens-portal.ds.czbiohub.org/), the Allen Brain Atlas (https://portal.brain-map.org/) and the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/). To avoid technical bias and to ensure that the single cell datasets best represent the corresponding tissues, we applied the following criteria for data selection: (1) single cell RNA sequencing was performed on a single cell suspension from tissue without pre-enrichment of cell types; (2) the dataset included more than 3,000 cells and 20 million read counts; (3) pseudo-bulk gene expression profiles were highly correlated with bulk RNA-seq profiles. In total, datasets from 30 tissue types and human blood were included. The samples, their references, and cluster details are listed here.
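The three selection criteria can be illustrated with a small sketch. This is not the HPA pipeline: the `Dataset` class, the function names, the toy numbers and the correlation cut-off `min_corr=0.8` are all assumptions made for illustration, since the text only says "highly correlated". Reading "20 million read counts" as "at least 20 million", and using Pearson correlation on log-transformed values, are likewise assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical container for one candidate scRNA-seq dataset."""
    name: str
    pre_enriched: bool   # True if cell types were enriched before sequencing
    n_cells: int         # number of cells reported for the dataset
    total_reads: int     # total read counts reported for the dataset
    counts: list         # toy cells x genes count matrix (list of lists)
    bulk: list           # bulk RNA-seq expression per gene, same gene order


def pseudo_bulk(counts):
    """Sum the counts of every gene over all cells."""
    return [sum(gene_column) for gene_column in zip(*counts)]


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def passes_criteria(ds, min_cells=3000, min_reads=20_000_000, min_corr=0.8):
    """Apply criteria (1)-(3); min_corr is an assumed threshold."""
    if ds.pre_enriched:                      # criterion 1
        return False
    if ds.n_cells <= min_cells or ds.total_reads < min_reads:  # criterion 2
        return False
    pb = [math.log1p(v) for v in pseudo_bulk(ds.counts)]
    bulk = [math.log1p(v) for v in ds.bulk]
    return pearson(pb, bulk) >= min_corr     # criterion 3


# Toy data only: two cells and three genes stand in for a full matrix.
good = Dataset("toy_ok", False, 5000, 25_000_000,
               [[10, 0, 3], [12, 1, 2]], [200.0, 5.0, 40.0])
small = Dataset("toy_small", False, 1200, 25_000_000,
                [[10, 0, 3], [12, 1, 2]], [200.0, 5.0, 40.0])
enriched = Dataset("toy_enriched", True, 5000, 25_000_000,
                   [[10, 0, 3], [12, 1, 2]], [200.0, 5.0, 40.0])

for ds in (good, small, enriched):
    print(ds.name, passes_criteria(ds))
```

In this sketch a dataset is rejected as soon as one criterion fails, so the comparatively expensive pseudo-bulk correlation is only computed for datasets that already meet the first two criteria.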

Tabula Sapiens

The Tabula Sapiens project (Tabula Sapiens Consortium et al. (2022)) includes nearly 500,000 cells from 24 different tissues and organs. The data are publicly available (https://tabula-sapiens.sf.czbiohub.org/) and included in the CZ CellxGene tool, where separate cell types can be explored across tissues, here exemplified by epithelial cells across the different tissue samples.

Currently, 12 tissue types in the aggregated HPA single cell type data, which is used for cell type classification, are imported from Tabula Sapiens (blood, bone marrow, eye, lung, lymph node, prostate, salivary gland, thymus, tongue, urinary bladder and vascular). An additional 23 datasets are currently included in the HPA Single Cell Type resource for comparison and validation of cell cluster expression profiles. For the tissue datasets represented by Tabula Sapiens, the original clustering is added to the gene detail pages so that the cluster expression overviews can be compared.

Tabula Sapiens clustering with HPA clustering of the same data

For the 12 tissue types (blood, bone marrow, eye, lung, lymph node, prostate, salivary gland, thymus, tongue, urinary bladder and vascular) represented by Tabula Sapiens data in the HPA aggregated cell type expression profile, the addition of Tabula Sapiens' own clustering details enables comparison and verification of the robustness of the HPA pipeline.

Tabula Sapiens data processed with the HPA pipeline and clustering methods.

Tabula Sapiens data with Tabula Sapiens clustering details.
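Because both clusterings describe the same cells, the comparison can be made concrete for a single marker gene. The sketch below is one possible check, not the method used by HPA or Tabula Sapiens: it finds the cluster with the highest mean expression under each clustering and measures how far the two clusters cover the same cells (Jaccard index). All cluster labels, expression values and function names are invented for illustration.

```python
from collections import defaultdict


def mean_by_cluster(expr, labels):
    """Mean expression of one gene per cluster label."""
    sums = defaultdict(float)
    sizes = defaultdict(int)
    for value, label in zip(expr, labels):
        sums[label] += value
        sizes[label] += 1
    return {label: sums[label] / sizes[label] for label in sums}


def top_cluster_cells(expr, labels):
    """Return the highest-expressing cluster and the indices of its cells."""
    means = mean_by_cluster(expr, labels)
    top = max(means, key=means.get)
    cells = {i for i, label in enumerate(labels) if label == top}
    return top, cells


def jaccard(a, b):
    """Overlap of two cell sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy example: eight cells, one marker gene, two clusterings of the same cells.
expr = [9.0, 8.0, 7.0, 0.0, 1.0, 0.0, 0.0, 6.0]
labels_hpa = ["c0", "c0", "c0", "c1", "c1", "c2", "c2", "c0"]   # hypothetical
labels_ts = ["rod", "rod", "rod", "bipolar", "bipolar",
             "muller", "muller", "bipolar"]                      # hypothetical

top_hpa, cells_hpa = top_cluster_cells(expr, labels_hpa)
top_ts, cells_ts = top_cluster_cells(expr, labels_ts)
print(top_hpa, top_ts, jaccard(cells_hpa, cells_ts))
```

A high overlap suggests that both clusterings place the marker-expressing cells in corresponding clusters, whereas a low overlap would flag a gene or tissue worth inspecting manually.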

Eye

RHO is a protein enriched in the rod and cone photoreceptor cells of the retina.


RHO - eye

RHO - eye

Lung

DNAI2 is a protein enriched in ciliated cells, highly expressed in the cilia cluster of the lung sample.


DNAI2 - lung

DNAI2 - lung

Prostate

KLK3 is a protein enriched in prostatic glandular cells.


KLK3 - prostate

KLK3 - prostate

Salivary gland

LPO is a protein enriched in serous glandular cells of the salivary gland.


LPO - salivary gland

LPO - salivary gland

Thymus

THEMIS is a protein enriched in T-cells.


THEMIS - thymus

THEMIS - thymus

Tongue

KRT5 is a protein enriched in basal keratinocytes.


KRT5 - tongue

KRT5 - tongue

Vascular

SELE is a protein enriched in endothelial cells, and specifically detected in the endothelial cell clusters.


SELE - vasculature

SELE - vascular

Blood

GP9 is a platelet-enriched protein, detected in the erythroid cluster of both datasets.


GP9 - blood

HBD - blood


GP9 - PBMC

HBD - PBMC

Bone marrow

HBD is an erythroid-enriched protein, detected in the erythroid cluster of both datasets.


HBD - bone marrow

HBD - bone marrow

Lymph node

MS4A1 is a B-cell specific protein.


MS4A1 - lymph node

MS4A1 - lymph node

Spleen

IGHA2 is a protein enriched in plasma cells.


IGHA2 - spleen

IGHA2 - spleen

Tabula Sapiens comparison with non-Tabula Sapiens datasets

For the tissues represented by non-Tabula Sapiens data, the addition of Tabula Sapiens clustering data provides an independent dataset for comparison and validation of results. Here, we show examples of the expression overview in each of the tissues that are represented by a non-Tabula Sapiens dataset and compare the cell type expression profile with the Tabula Sapiens results. The comparison for these tissues is available for each protein-coding gene on the gene detail page.
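When the two datasets contain different cells, the comparison has to be made at the level of annotated cell types rather than individual cells. A minimal sketch, assuming each dataset has already been summarized as a mapping from cell type name to an expression value for one gene: restrict both profiles to the shared cell types and check whether the same cell type ranks highest. The function names and all numbers below are hypothetical and are not taken from HPA or Tabula Sapiens.

```python
def top_cell_type(profile):
    """Cell type with the highest expression value in a profile."""
    return max(profile, key=profile.get)


def compare_profiles(profile_a, profile_b):
    """Compare one gene's cell type profile between two datasets."""
    shared = sorted(set(profile_a) & set(profile_b))
    if not shared:
        raise ValueError("the two profiles share no cell type names")
    a_shared = {name: profile_a[name] for name in shared}
    b_shared = {name: profile_b[name] for name in shared}
    top_a = top_cell_type(a_shared)
    top_b = top_cell_type(b_shared)
    return {"shared": shared, "top_a": top_a, "top_b": top_b,
            "agree": top_a == top_b}


# Invented values for an adipocyte-enriched gene in two datasets.
dataset_a = {"adipocytes": 120.0, "endothelial cells": 3.0,
             "fibroblasts": 1.5, "macrophages": 0.5}
dataset_b = {"adipocytes": 95.0, "endothelial cells": 4.0,
             "fibroblasts": 2.0, "t-cells": 0.1}

result = compare_profiles(dataset_a, dataset_b)
print(result)
```

The sketch assumes that cell type names have been harmonized between the datasets beforehand; in practice the annotation vocabularies usually differ, and that mapping step has to be done first.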

Adipose tissue

In the HPA cell type aggregated data, cell data representing the adipose tissue is based on data from Lazarescu O et al. (2025).
LIPE is a protein enriched in adipocytes, confirmed by immunohistochemical staining in the Tissue Atlas.


LIPE - adipose tissue

The HPA clustering of Lazarescu O et al. (2025) adipose tissue single cell data.

LIPE - adipose tissue

Tabula Sapiens expression and clustering data.

Heart muscle

In the HPA cell type aggregated data, cell data representing heart muscle is based on data from Koenig AL et al. (2022).
MB is a protein enriched in the cardiomyocytes of the heart.


MB - heart muscle

The HPA clustering of Koenig AL et al. (2022) heart muscle single cell data.

MB - heart muscle

Tabula Sapiens expression and clustering data.

Kidney

In the HPA cell type aggregated data, cell data representing the kidney is based on data from Lake BB et al. (2023).
SLC12A1 is a protein enriched in distal tubular cells and collecting ducts, confirmed by immunohistochemical staining in the Tissue Atlas.


SLC12A1 - kidney

The HPA clustering of Lake BB et al. (2023) kidney single cell data.

SLC12A1 - kidney

Tabula Sapiens expression and clustering data.

Liver

In the HPA cell type aggregated data, cell data representing the liver is based on data from MacParland SA et al. (2018).
HAO1 is a protein enriched in hepatocytes of the liver, showing consistent specificity to these cell clusters independent of the dataset.


HAO1 - liver

The HPA clustering of MacParland SA et al. (2018) liver single cell data.

HAO1 - liver

Tabula Sapiens expression and clustering data.

Pancreas

In the HPA cell type aggregated data, cell data representing the pancreas is based on data from Craig-Schapiro R et al. (2025).
CPA1 is a protein enriched in pancreatic exocrine glandular cells.


CPA1 - pancreas

The HPA clustering of Craig-Schapiro R et al. (2025) pancreas single cell data.

CPA1 - pancreas

Tabula Sapiens expression and clustering data.

Skeletal muscle

In the HPA cell type aggregated data, cell data representing skeletal muscle is based on data from Pass CG et al. (2023).
MYH2 is a protein enriched in skeletal myocytes.


MYH2 - skeletal muscle

The HPA clustering of Pass CG et al. (2023) skeletal muscle single cell data.

MYH2 - skeletal muscle

Tabula Sapiens expression and clustering data.

Skin

In the HPA cell type aggregated data, cell data representing the skin is based on data from Solé-Boldo L et al. (2020).
KRT10 is a protein enriched in suprabasal keratinocytes.


KRT10 - skin

The HPA clustering of Solé-Boldo L et al. (2020) skin single cell data.

KRT10 - skin

Tabula Sapiens expression and clustering data.

Small intestine

In the HPA cell type aggregated data, cell data representing the small intestine is based on data from Wang Y et al. (2020).
ALPI is a protein with elevated expression in proximal enterocytes.


ALPI - small intestine

The HPA clustering of Wang Y et al. (2020) small intestine single cell data.

ALPI - small intestine

Tabula Sapiens expression and clustering data.