Deep Visual Proteomics - Method summary

This dataset represents a mass spectrometry (MS)-based, cell type-resolved proteomic atlas of the human body. It was generated with Deep Visual Proteomics (DVP), a spatial proteomics technology developed in the Mann lab. In total, 27 cell types across 14 tissues from a single healthy female donor were profiled, quantifying about two-thirds of all human protein-coding genes with up to 8,500 proteins per cell type.

Methods details
Access the methods details for more technical background and a cell type overview.

Key publications
This dataset: Weiss et al., bioRxiv

What is Deep Visual Proteomics?

Deep Visual Proteomics (DVP) is a spatial proteomics technology that allows investigation of selected cells within their native tissue context. It combines high-resolution tissue imaging, AI-guided cell segmentation, laser microdissection and ultra-high-sensitivity mass spectrometry. By bridging MS-based proteomics and imaging, DVP enables proteome quantification while preserving spatial and morphological context. Although DVP can resolve individual cells, this atlas pools cells of the same type per sample to maximize proteome depth and reproducibility.

How was the study set up?

Figure 1. Schematic overview of the Deep Visual Proteomics (DVP) workflow. Tissue samples were collected from 14 organs across the human body and processed into sections for imaging-based analysis. Tissue sections were stained by immunofluorescence, and cell types were identified and segmented based on marker-based features, yielding 27 distinct cell types. Cells of one kind were isolated by laser microdissection and collected into individual wells, followed by sample preparation and analysis by ultra-high-sensitivity mass spectrometry. Deep cell-type-resolved proteomes were obtained and are now available as part of the Human Protein Atlas Single Cell Resource. Created in BioRender. Weiss, C. (2026) https://BioRender.com/7wku96h

Which samples were analyzed?

The atlas is based on 14 formalin-fixed paraffin-embedded tissues from one healthy young female donor, sampled at approximately 15 h post-mortem. This single-donor design eliminates inter-individual variability and allowed inclusion of female reproductive tissues, such as ovary and fallopian tube.

How are cell types defined and isolated?

Within each tissue, a pre-defined set of cell types is visualized by the immunofluorescence signal of cell type-specific marker proteins. For this, markers and the downstream image analysis were optimized for each of the 27 cell types. On whole-slide images, stained cells are outlined by AI-based segmentation pipelines tailored to each cell type's marker pattern and morphology. These cell contours are slightly dilated to ensure the complete cell is captured before individual excision by laser microdissection. Five replicates are collected per cell type whenever possible, each pooling 100,000 µm² of individually cut cell area, so that cell types of very different size and morphology are sampled on comparable terms.

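To illustrate what pooling a fixed area per replicate implies, the minimal sketch below estimates how many contours would have to be cut to reach 100,000 µm². Only the target area comes from the text; the function name `contours_needed` and the mean contour areas are hypothetical values chosen for illustration, not measurements from this atlas.

```python
import math

# Pooled contour area per replicate, as stated in the text.
TARGET_AREA_UM2 = 100_000.0


def contours_needed(mean_contour_area_um2: float,
                    target_area_um2: float = TARGET_AREA_UM2) -> int:
    """Number of contours needed to reach the target pooled area.

    Assumes every contour has the same (mean) area, which is a simplification.
    """
    if mean_contour_area_um2 <= 0:
        raise ValueError("mean contour area must be positive")
    return math.ceil(target_area_um2 / mean_contour_area_um2)


# Hypothetical mean contour areas (in µm²) for a small and a large cell type.
example_areas_um2 = {"small cell type": 50.0, "large cell type": 2000.0}

for name, area in example_areas_um2.items():
    print(f"{name}: about {contours_needed(area)} contours per replicate")
```

Under these assumed areas, a replicate of a small cell type would pool far more contours than a replicate of a large one, while the total sampled area stays the same.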
How is protein abundance measured?

The samples were analyzed by data-independent acquisition (DIA) mass spectrometry on an Orbitrap Astral Zoom instrument coupled to an Evosep One chromatography system. For this, excised cells were lysed, proteins were digested and peptides were cleaned up before being loaded onto the LC-MS/MS setup. The samples were measured with the 80 samples per day (SPD) gradient, resulting in a total measurement time of less than 2 days. Raw files were processed with DIA-NN in library-free mode against the reviewed human reference proteome and normalized across all samples with directLFQ.

What is the difference between a cell contour and a cell?

Here, cells are identified based on immunofluorescence marker signal on 3 µm-thin tissue sections. Because each section captures only a 3 µm-thick slice of a three-dimensional cell, what is outlined by segmentation and excised by laser microdissection is restricted in its third dimension.

How pure are the isolated cell type populations?

DVP combines marker-based cell identification with individual laser microdissection of each segmented contour, enabling highly precise isolation of the targeted cell population directly from intact tissue. That said, some cell types present inherent challenges for clean isolation. Because each tissue section captures a thin slice of a three-dimensional tissue architecture, material lying directly above, below or adjacent to a targeted cell can fall within the same contour. This is particularly relevant in densely packed tissues, where small cells of a different type can sit immediately next to the selected one, and in complex tissue architectures such as the brain, where neuronal and glial projections extend across the section plane. The resulting proteomes are therefore best understood as cell type-enriched rather than pure single cell type populations.

Still, each cell type reliably recovers its expected marker and characteristic proteins, confirming that the dominant signal reflects the selected population. Notably, this consideration applies to any cell type isolation approach. Dissociation-based single-cell techniques achieve high purity but depend on successful tissue breakdown, which is not feasible for all cell types. For instance, tightly connected epithelial cells and exceptionally large cells such as cardiomyocytes and oocytes are often inaccessible. By analyzing cells within intact tissue, DVP enables the capture of such populations and additionally preserves spatial metadata for the isolated population.

Which proteome depth was achieved?

Across the 27 cell types, the atlas captures nearly 14,000 protein groups, which equals about two-thirds (67%) of all human protein-coding genes. A total dynamic range of 8.5 orders of magnitude was covered. At the cell type level, between 3,645 and 8,500 proteins were quantified per cell type. Coverage extends to 86% of annotated enzymes, 90% of essential proteins, 76% of FDA-approved drug targets and 89% of potential drug targets. Moreover, even classes traditionally challenging for MS are well represented, including 61% of predicted membrane proteins and 50% of transcription factors. Gene set enrichment analysis enabled systematic functional characterization of all 27 cell types, confirming that each cell type's proteome reflects its specialized biological role.

Why can I see 27 bars but the specificity category is based on 24 DVP samples?

Prior to evaluating the concordance of individual RNA-protein pairs, six DVP cell types were consolidated into three broader groups to reflect high protein overlap and simplify cross-modality comparison: alveolar cells type 1 and 2 into alveolar cells, ciliated and secretory cells of the fallopian tube into fallopian tube epithelial cells, and CD4 and CD8 T cells into T cells. As a result, agreement scores as well as specificity and distribution categories are based on a set of 24 cell types. All 27 samples are shown separately on the gene summary pages, but specificity categories and agreement scores are only applied to the 24 grouped cell types. This is also explained on the method details page.

How was the protein data integrated into a gene-centric atlas?

To enable comparison between modalities, UniProt protein identifiers were mapped to Ensembl gene identifiers (Ensembl v109, UniProt release 2022_05) as used by the HPA. Protein groups that could not be uniquely mapped to a single gene were classified as multi-mappable and excluded from downstream analysis. RNA detection was defined as expression ≥ 1 nCPM, and MS-based protein detection as a non-missing intensity value. After filtering, the matched dataset comprised 13,191 RNA-protein pairs across the 27 cell types.

How was protein detection defined and classified?

A protein value was set to missing for a cell type if more than half of its replicates lacked a measurement for that protein. In general, protein detection was defined as a non-missing intensity value. In total, 13,496 genes are listed as detected at the protein level, of which 2,292 are detected in all cell types. The detected proteins were classified into the distribution and specificity categories applied throughout the Human Protein Atlas, making this dataset comparable to the other resources presented. Genes/proteins with an elevated detection are divided into three subcategories:
By applying the HPA specificity categories, we can define which proteins show elevated levels in the different cell types.

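The detection rules described above (a protein is missing for a cell type if more than half of its replicates lack a measurement, and RNA is detected at ≥ 1 nCPM) can be sketched as follows. This is a minimal illustration, not the pipeline used for the atlas: the function names and the use of `None` to mark a missing replicate are assumptions made for the example.

```python
from typing import Optional, Sequence

# RNA detection threshold in nCPM, as stated in the text.
RNA_DETECTION_THRESHOLD_NCPM = 1.0


def protein_detected(replicates: Sequence[Optional[float]]) -> bool:
    """Return True unless more than half of the replicates lack a measurement.

    A missing replicate is represented by None (an assumption of this sketch).
    """
    if not replicates:
        return False
    n_missing = sum(1 for value in replicates if value is None)
    return n_missing <= len(replicates) / 2


def rna_detected(ncpm: float) -> bool:
    """Return True if RNA expression reaches the detection threshold."""
    return ncpm >= RNA_DETECTION_THRESHOLD_NCPM


# Five replicates with two missing values: still counted as detected.
print(protein_detected([1.2e5, 9.8e4, None, None, 1.1e5]))
# Five replicates with three missing values: set to missing for this cell type.
print(protein_detected([1.2e5, None, None, None, 1.1e5]))
```

With an even number of replicates, exactly half missing does not count as "more than half", so the protein remains detected in this reading of the rule.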
What does it mean when my protein of interest is not detected?

A protein not detected here is not necessarily absent from the human body. The DVP atlas profiles 27 cell types across 14 tissues from a single donor, so proteins predominantly expressed in cell types or tissues outside this selection will be absent from the dataset. In addition, mass spectrometry has an inherent detection limit, and proteins of very low abundance can fall below it even in cell types where they are biologically present. Finally, the gene mapping may be the reason: the UniProt ID may belong to a protein group that could not be uniquely mapped, or there may be a mismatch in the translation to a gene ID or between the UniProt and Ensembl versions used.

How are RNA and protein integrated?

How were RNA and protein data matched?

For each of the 27 DVP cell types, the best-matching cluster(s) from the single-cell and single-nuclei RNA sequencing datasets of the Single cell type dataset were selected. In most cases, one or more dedicated clusters within the same tissue could be directly assigned. In a few cases, multiple cell type clusters were combined to capture anatomically composite structures, such as glomeruli, which integrate podocytes and glomerular endothelial cells. This resulted in over 13,000 RNA-protein pairs, allowing side-by-side quantitative comparison across cell types. Read more and see details of overlap vs non-overlap on the RNA-Protein comparison page.

Figure 3. Schematic of the data integration strategy. For each cell type analyzed by DVP, matching single-cell or single-nuclei RNA sequencing clusters were selected from the Human Protein Atlas. Lymph node macrophages are shown as an example. Created in BioRender. Weiss, C. (2026) https://BioRender.com/7wku96h

How can users compare RNA and protein?

On each gene summary page, RNA expression (nCPM) and protein abundance (intensity) are shown side by side across the 27 cell types. To support quantitative comparison beyond visual inspection, two agreement scores are provided per RNA-protein pair: i) a detection agreement score (ADet), capturing in how many cell types both modalities agree on whether a gene is detected, and ii) an expression agreement score (AExp), capturing whether both modalities assign above- or below-median expression to the same cell types. Both scores range from 0 to 1, with values closer to 1 indicating higher concordance between RNA and protein across cell types. In addition, the HPA cell type specificity categories were applied analogously to both modalities, and each RNA-protein pair is assigned to one of four overlap categories: elevated with overlapping cell types, elevated with non-overlapping cell types, elevated/low-specificity mix, or low specificity in both modalities.

What insights emerged from the comparison?

Spearman correlations between matched RNA and protein profiles were, as expected, modest per cell type, with lower values consistently observed for cell types profiled by single-nuclei compared to single-cell RNA sequencing. Nevertheless, each transcriptome most frequently showed its highest correlation with the corresponding proteome, confirming the biological coherence of the matched pairings.

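The two agreement scores introduced under "How can users compare RNA and protein?" can be made concrete with a small sketch. The exact formulas behind ADet and AExp are not given in this summary, so the code below is only one plausible reading: each score is taken as the fraction of cell types in which the two modalities agree. The function names, the treatment of missing protein values as zero, and the example numbers are all assumptions for illustration.

```python
from statistics import median
from typing import Optional, Sequence


def detection_agreement(rna_ncpm: Sequence[float],
                        protein_intensity: Sequence[Optional[float]],
                        rna_threshold: float = 1.0) -> float:
    """Fraction of cell types where RNA and protein agree on detection.

    RNA is detected at >= rna_threshold nCPM; protein is detected if its
    intensity is not missing (None).
    """
    if not rna_ncpm or len(rna_ncpm) != len(protein_intensity):
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum((r >= rna_threshold) == (p is not None)
                for r, p in zip(rna_ncpm, protein_intensity))
    return agree / len(rna_ncpm)


def expression_agreement(rna_ncpm: Sequence[float],
                         protein_intensity: Sequence[Optional[float]]) -> float:
    """Fraction of cell types placed on the same side of the median by both modalities.

    Assumption of this sketch: missing protein values are treated as 0.
    """
    if not rna_ncpm or len(rna_ncpm) != len(protein_intensity):
        raise ValueError("inputs must be non-empty and of equal length")
    protein = [0.0 if p is None else p for p in protein_intensity]
    rna_median = median(rna_ncpm)
    protein_median = median(protein)
    agree = sum((r > rna_median) == (p > protein_median)
                for r, p in zip(rna_ncpm, protein))
    return agree / len(protein)


# Hypothetical values for one gene across four cell types.
rna = [0.2, 5.0, 40.0, 0.0]
protein = [None, 1.0e5, 3.0e6, 2.0e4]

print(detection_agreement(rna, protein))   # 0.75
print(expression_agreement(rna, protein))  # 1.0
```

In this made-up example the modalities disagree on detection in one of four cell types, yet rank the same cell types above and below the median, which shows how the two scores can diverge for the same gene.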
Beyond these per-cell-type correlations, pathway-level patterns were strikingly consistent across all 27 cell types: metabolic pathways were systematically more highly represented at the protein level, while signalling pathways were more prominent at the transcript level. This indicates that RNA-protein divergence is shaped by pathway membership rather than cell type identity. Identity-defining gene programs showed the strongest RNA-protein agreement, whereas a substantial fraction of elevated proteins were assigned to different cell types depending on the modality, underscoring the value of a matched RNA-protein resource.

Data overview

Several lists are available for download, providing a complete overview and expression data at different levels of detail.