DVP methods

DVP method details

What is Deep Visual Proteomics?

Deep Visual Proteomics (DVP) is a spatial proteomics technology that allows investigation of selected cells within their native tissue context. It combines high-resolution tissue imaging, AI-guided cell segmentation, laser microdissection and ultra-high-sensitivity mass spectrometry. By bridging the world of MS-based proteomics and imaging, DVP enables proteome quantification while preserving spatial and morphological context. Although DVP can resolve individual cells, this atlas pools cells of the same type per sample to maximize proteome depth and reproducibility.

Samples

All tissues analysed in this dataset were obtained from a single young female donor with no known acute or chronic disease. A single-donor design was chosen to eliminate inter-individual variability in genetic background. Post-mortem tissue collection was performed approximately 15 hours after death, following a predefined anatomical sampling scheme. Collected tissues used here were bone marrow, cerebrum, coronary artery, heart, skeletal muscle, kidney, liver, lung, lymph node, pancreas, ovary, fallopian tube, salivary gland and skin.

Tissue preparation for DVP

All tissues were processed as formalin-fixed paraffin-embedded (FFPE) samples. Specimens were fixed, dehydrated, cleared and infiltrated with paraffin using an automated tissue processor, with embedding performed under controlled orientation to preserve tissue architecture. FFPE blocks were sectioned at 3 µm thickness. Routine haematoxylin and eosin (H&E) staining was performed on adjacent sections to confirm tissue integrity, morphology and preservation quality.

Image-driven cell selection

Images were imported into BIAS (Biological Image Analysis Software), and cell segmentation was performed on a per-cell-type basis using one of three approaches, chosen according to morphology and marker expression: i) for cell types with well-defined cellular structures, custom-trained Cellpose models (v2.0) were used for automated segmentation; ii) for cell types with irregular morphology, intensity thresholding in BIAS was applied; iii) in a few clear-cut cases, manual segmentation was performed.
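As a rough illustration of approach ii), the sketch below labels connected regions of pixels above an intensity threshold. It is not the BIAS implementation: the function name `segment_by_threshold`, the 4-connectivity rule, the threshold value and the toy image are all assumptions made for this example.

```python
def segment_by_threshold(image, threshold):
    """Label 4-connected regions of pixels whose intensity exceeds `threshold`.

    `image` is a list of rows of intensities. Returns a same-shaped grid of
    integer labels (0 = background) and the number of regions found.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    n_regions = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background pixels and pixels that already belong to a region.
            if image[r][c] <= threshold or labels[r][c]:
                continue
            n_regions += 1
            labels[r][c] = n_regions
            stack = [(r, c)]
            # Flood fill: grow the region through neighbouring bright pixels.
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] > threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = n_regions
                        stack.append((ny, nx))
    return labels, n_regions


# Hypothetical marker-channel intensities (toy 3 x 5 image).
image = [
    [0, 9, 9, 0, 0],
    [0, 9, 0, 0, 7],
    [0, 0, 0, 8, 7],
]
labels, n_regions = segment_by_threshold(image, threshold=5)
```

Each labelled region would correspond to one candidate cell shape. In practice a threshold of this kind is tuned per marker and tissue.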

Images and markers

The full-size images are too large to explore here at full resolution; all imaging data have been deposited to BioImages under accession number S-BIAD2869. Tissue section overviews are available in the atlas as thumbnails on gene summary pages, where a cropped representative area is selected for zoomable high-resolution exploration. Below we show smaller examples showcasing the markers used and the overlaid segmentation masks for each tissue and cell type. Cell nuclei, labelled with the SYTOX Green nuclear counterstain, are shown in cyan, and cell type-specific markers are shown in magenta. The respective markers are stated for each cell type shape. Note that the size of each image varies; follow the links for zoomable exploration.

Brain

Three different cell types were sampled from the cerebral cortex.

Neurons

RBFOX3, also called NeuN, was used to label neurons in the cortical section. Because of the complex 3D shape of neurons, sampling focused on the cell body. For this reason, many synapse-related proteins are found at higher levels in the astrocyte and microglia samples, while the neuronal cell body sample contains proteins that support the metabolically active neuron.

Astrocytes and neuropil

GFAP was used as an astrocyte marker. Shapes are based on the labeling, and because astrocytes lie in close proximity to synapses and vasculature, the astrocyte cell shapes include synapse-related proteins. Therefore, we call this sample astrocytes and neuropil.

Microglia and neuropil

TMEM119 was used as a microglia marker. Shapes are based on the labeling, and due to microglia's close proximity to neuronal cell bodies as well as synapses, the microglia cell shapes include neuronal proteins. Therefore, we call it microglia and neuropil.

Pancreas

Pancreatic islets and epithelial cells (mainly exocrine) were sampled from pancreas.

Pancreatic islet cells

INS was used as an islet marker and for cell shape segmentation. In the zoomable image, glucagon (GCG) is also included. Note that insulin (INS) is not detected in the protein dataset, likely because post-mortem proteolysis during the ~15 h interval before tissue fixation degraded it below the detection threshold of the mass spectrometry workflow. Islet shapes were generally sampled away from the areas used for exocrine sampling. The pancreatic islet sample nevertheless includes some pancreatic epithelial cells, which can be observed when exploring markers for exocrine pancreas, such as CPA2.

Pancreatic epithelial cells

EPCAM was used as a general epithelial marker and as guidance for the sampling of exocrine pancreas/epithelial cells. Although the zoomable image shows that the cell masks also include areas inside pancreatic islets, we see very little overlap for islet markers such as GCG.

Kidney

The kidney is represented by its four main structures. Glomeruli, proximal tubules and distal tubules were all sampled from the cortical area of the kidney, while the collecting ducts were sampled from the medullary area of the kidney section. Glomeruli were sampled from one section, while the other shapes were from a consecutive section.

Glomeruli

PODXL was used for labeling the podocytes of glomeruli. Whole glomeruli shapes were collected, and a small number of proximal tubules were potentially also included, which would explain some of the overlap in proximal-specific proteins, such as PDZK1, in the glomeruli sample.

Proximal tubules

AQP1 was used as a proximal tubule marker.

Distal tubules

EPCAM was used as a marker for distal tubules.

Collecting ducts

L1CAM was used as a marker for collecting ducts.

Coronary artery (vasculature)

Smooth muscle cells

ACTA2 was used as a marker for smooth muscle cells. However, ACTA2 is not available in the dataset: because it shares extensive sequence similarity with the other α-actin isoforms, the tryptic peptides detected by mass spectrometry cannot be uniquely assigned to ACTA2 and are reported within the ACTA1 protein group during protein inference.

Heart

Cardiomyocytes

TNNT2 and lectin (WGA) labeling of membranes were used as markers to segment the heart muscle fibers.

Capillaries

CD34 was used as a marker for capillaries in the heart muscle.

Skeletal muscle

Skeletal myofiber

Lectin (WGA) labeling was used for membrane detection to identify the skeletal muscle fibers.

Lung

Type 1 and type 2 alveolar cells were sampled separately, using different markers. They are displayed separately on gene summary pages but are combined (by max-value) to represent alveolar cells in the specificity category calculations.
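The max-value combination can be sketched as follows. This is a minimal illustration, assuming that the combined value for a protein is the highest non-missing value across the sub-samples and that a protein missing from all sub-samples stays missing. The function name `combine_by_max` and the intensity values are hypothetical.

```python
def combine_by_max(*samples):
    """Return, per protein, the highest non-missing value across the samples.

    Each sample is a dict mapping protein -> intensity (None = not detected).
    """
    combined = {}
    for protein in set().union(*samples):
        values = [s[protein] for s in samples if s.get(protein) is not None]
        # Assumption: a protein missing in every sub-sample remains missing.
        combined[protein] = max(values) if values else None
    return combined


# Hypothetical intensities for the two separately sampled alveolar cell types.
alveolar_type_1 = {"AGER": 8.2e5, "SFTPC": None, "ACTB": 3.1e6}
alveolar_type_2 = {"AGER": 1.0e4, "SFTPC": 9.5e5, "ACTB": 2.7e6}

alveolar_cells = combine_by_max(alveolar_type_1, alveolar_type_2)
```

With this rule, a marker that is strong in only one sub-sample (here AGER or SFTPC) still carries its full value into the combined cell type.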

Alveolar cell type 1

AGER was used as a marker for type 1 alveolar cells.

Alveolar cell type 2

SFTPC was used as a marker for type 2 alveolar cells.

Salivary gland

Large ducts

ATP6V1B1 was used as a marker for large ducts in the salivary gland.

Liver

Hepatocytes

A pan-cadherin marker was used for labeling hepatocytes in the liver.

Ovary

Oocytes

ZP3 was used as a marker for oocytes. Oocytes were gathered from multiple sections to achieve enough sample for analysis. Note that the oocyte sample also includes the thin granulosa cell layer of the primary follicle and thus shows overlap with the granulosa sample for some proteins, but not all; one such example is FOXL2.

Granulosa cells

ELK1 was used as a marker for granulosa cells. The granulosa sample is from one late-stage follicle and does not fully represent the complete spectrum of granulosa cell proteins.

Fallopian tube

Cells from the fallopian tube were collected as two separate samples, one enriched for ciliated cells and one enriched for secretory cells. They are displayed separately on gene summary pages but are combined (by max-value) to represent fallopian epithelial cells in the specificity category calculations.

Ciliated cells

FOXJ1 was used as a marker for ciliated cells. The sample is enriched for ciliated cells but also includes secretory cells, as confirmed by the presence of secretory-specific proteins such as OVGP1.

Secretory cells

EPCAM was used as a marker for secretory cells. The sample is enriched for secretory cells but also includes ciliated cells, as confirmed by the presence of cilia-specific proteins such as DNAH9.

Bone marrow

A bone marrow smear was used for sampling MCEMP1-positive immune cells, enriching for neutrophils.

Neutrophils

MCEMP1 was used as a marker, and cells with strong signal were selected.

Lymph node

Two separate sections were used for sampling different immune cells, one for B- and T-cells and one for macrophages.

B-cells

MS4A1 was used as a B-cell marker and positive cells were selected.

T-cells

Two types of T-cells were sampled from the lymph node: CD4-positive and CD8-positive cells. They are displayed separately on gene summary pages but are combined (by max-value) to represent T-cells in the specificity category calculations.

CD4 was used as a marker for CD4+ T cells.

CD8 was used as a marker for CD8+ T cells.

Macrophages

CD163 was used as a marker for macrophages.

Skin

Keratinocytes

CD44 and a pan-cytokeratin marker were used for labeling the skin. SOX10 was used as a marker for melanocytes so that they could be avoided in the segmentation process.

Method details

Cell type labelling by immunofluorescence

Cell type-specific marker antibodies were selected to identify the 27 cell types covered by this dataset, with between one and three markers co-stained per slide together with a SYTOX Green nuclear counterstain. In detail, sections were deparaffinised, subjected to heat-induced antigen retrieval, blocked in 5% BSA/PBS, and incubated with primary and fluorophore-conjugated secondary antibodies before nuclear counterstaining. Stained sections were imaged as whole slides by widefield microscopy on a Zeiss Axioscan 7, with channels acquired according to the panel used for each tissue. Tiles were stitched into single CZI images.

Laser microdissection

Cell contours were imported into a Leica LMD7 system operated with LMD v8.5 software and aligned based on three manually selected reference points. Laser parameters (power, speed, head current and related settings) were optimised for each tissue and cell type to ensure clean excision while minimising tissue damage. Excised cells were collected into 384-well plates. For each cell type replicate, a total of 100,000 µm² of individually isolated cells was pooled. This area-based approach accommodates the wide morphological diversity of human cell types and enables dataset-wide normalisation downstream. Five replicates were collected per cell type where material and cell abundance allowed, with fewer replicates for rare populations.
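To illustrate why pooling by area rather than by cell count suits cell types of very different sizes, the sketch below adds contours to a replicate until the pooled area reaches 100,000 µm². Only the target area comes from the text above. The function name `pool_contours`, the selection order and the contour areas are assumptions for this example and do not describe the actual selection procedure.

```python
TARGET_AREA_UM2 = 100_000  # pooled area per replicate, as stated in the text


def pool_contours(contour_areas_um2, target=TARGET_AREA_UM2):
    """Add contours in the given order until the pooled area reaches `target`.

    Returns (indices of the selected contours, pooled area in µm²).
    """
    selected, pooled = [], 0.0
    for index, area in enumerate(contour_areas_um2):
        if pooled >= target:
            break
        selected.append(index)
        pooled += area
    return selected, pooled


# Hypothetical contour areas: a large cell type and a small cell type.
large_cells = [2_500.0] * 60
small_cells = [80.0] * 2_000

n_large = len(pool_contours(large_cells)[0])
n_small = len(pool_contours(small_cells)[0])
```

In this toy case the same pooled area corresponds to 40 large cells but 1,250 small cells, so the amount of input material stays comparable even though the cell counts differ widely.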

Mass spectrometry

Collection plates were washed with acetonitrile and vacuum-dried, followed by cell lysis and protein digestion with a LysC/trypsin mixture. Peptides were purified on Evotip Pure tips (Evosep) following the manufacturer’s protocol. Peptides were separated on an Evosep One system coupled to an Orbitrap Astral Zoom mass spectrometer (Thermo Fisher Scientific) via an EASY-Spray source. An Aurora Rapid C18 column (5 cm × 75 µm, 1.7 µm particle size, IonOpticks) was operated at 60 °C using the Whisper Zoom 80 samples-per-day gradient. MS1 scans were recorded at 240,000 resolution over 380–980 m/z. A FAIMS Pro interface was operated at −40 V compensation voltage, and fragment spectra were acquired using 100 variable-width DIA isolation windows covering the precursor range, with 10 ms maximum injection time and 25% HCD collision energy.

Data analysis

Raw data processing

All raw files were processed with DIA-NN (v1.9.2) in library-free mode using a deep-learning-predicted spectral library generated from the reviewed human reference proteome (UniProt UP000005640, downloaded March 2024). All replicates from a given cell type were searched together in a single run with match-between-runs and peptidoform analysis enabled. Protein intensities were subsequently normalised across all samples using directLFQ (v0.2.19) with default settings. No imputation of missing values was applied.

Aggregation to cell type-level proteomes

For each cell type, up to five biological replicates were combined into a single cell type-level proteome by taking the median protein intensity across replicates. A protein value was set to missing for a cell type if more than half of its replicates lacked a measurement for that protein. In general, protein detection was defined as a non-missing intensity value. The core proteome was defined as the set of proteins detected across all 27 cell types, whereas cell type-specific proteins were those detected in a single cell type only.
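The aggregation rules above can be sketched in a few lines. This is a minimal illustration, assuming that the median is taken over the non-missing replicate values only. The function names, the placeholder protein `PROT_X` and all intensity values are hypothetical, and only two cell types are used instead of 27.

```python
from statistics import median


def aggregate_replicates(replicates):
    """Collapse the replicates of one cell type into one value per protein.

    `replicates` is a list of dicts mapping protein -> intensity (None = missing).
    """
    n = len(replicates)
    proteome = {}
    for protein in set().union(*replicates):
        values = [r[protein] for r in replicates if r.get(protein) is not None]
        n_missing = n - len(values)
        # Missing if more than half of the replicates lack a measurement;
        # otherwise the median of the measured values (an assumption).
        proteome[protein] = None if n_missing > n / 2 else median(values)
    return proteome


def detected_in(proteomes):
    """Map each protein to the set of cell types with a non-missing value."""
    detected = {}
    for cell_type, proteome in proteomes.items():
        for protein, value in proteome.items():
            if value is not None:
                detected.setdefault(protein, set()).add(cell_type)
    return detected


# Hypothetical replicate intensities for two cell types.
proteomes = {
    "hepatocytes": aggregate_replicates([
        {"ALB": 9.0, "ACTB": 5.0, "PROT_X": 1.0},
        {"ALB": 11.0, "ACTB": 7.0, "PROT_X": None},
        {"ALB": 10.0, "ACTB": None, "PROT_X": None},
    ]),
    "neurons": aggregate_replicates([
        {"RBFOX3": 4.0, "ACTB": 6.0},
        {"RBFOX3": 6.0, "ACTB": 8.0},
    ]),
}

detected = detected_in(proteomes)
# Core proteome: detected in every cell type.
core = {p for p, cts in detected.items() if len(cts) == len(proteomes)}
# Cell type-specific proteins: detected in exactly one cell type.
specific = {p: next(iter(cts)) for p, cts in detected.items() if len(cts) == 1}
```

Here `PROT_X` is measured in only one of three hepatocyte replicates, so it is set to missing and counts as neither core nor cell type-specific.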

Cell type specificity classification of proteins

Proteins were classified into cell type distribution and specificity categories following the Human Protein Atlas framework. This harmonises the DVP dataset with the categories used elsewhere in the HPA and supports navigation via the same category filters used for other Resources.

Table: criteria for the distribution and specificity categories.

Integration with RNA sequencing data

Selection of matching RNA clusters

For each of the 27 DVP cell types, matching single-cell (scRNAseq) or single-nuclei (snRNAseq) RNA sequencing clusters were selected from the Single cell type data set. Cluster selection was performed independently per tissue, based on the best possible overlap with the DVP sampling, and followed one of three scenarios depending on cell type complexity: i) one RNA cluster per DVP cell type (e.g. oocytes and granulosa cells); ii) multiple RNA clusters of the same cell type per DVP cell type (e.g. hepatocytes); or iii) multiple RNA clusters of different cell types, for anatomically composite structures where a DVP sample integrates several cell types (e.g. glomeruli as a structure of podocytes and glomerular capillary endothelial cells). For scenarios ii) and iii), a weighted mean expression value was calculated across the contributing clusters to generate a pseudo-bulk profile. Cluster assignment was confirmed by correlating gene expression across all available clusters of a tissue with protein abundance from the corresponding DVP sample.
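The pseudo-bulk calculation for scenarios ii) and iii) can be sketched as a weighted mean per gene. The text does not state which weights were used, so this example assumes the number of cells in each cluster as the weight. The cluster names, cell counts, expression values and the function name `pseudo_bulk` are hypothetical.

```python
def pseudo_bulk(cluster_expression, cluster_weights):
    """Weighted mean expression per gene across the selected clusters.

    `cluster_expression` maps cluster -> {gene: expression};
    `cluster_weights` maps cluster -> weight (assumed here: cell count).
    Genes absent from a cluster are treated as zero expression.
    """
    total = sum(cluster_weights[c] for c in cluster_expression)
    genes = set().union(*cluster_expression.values())
    return {
        gene: sum(cluster_weights[c] * expr.get(gene, 0.0)
                  for c, expr in cluster_expression.items()) / total
        for gene in genes
    }


# Two hypothetical hepatocyte clusters matched to one DVP sample.
clusters = {
    "c-0": {"ALB": 100.0, "APOA1": 40.0},
    "c-5": {"ALB": 60.0, "APOA1": 80.0},
}
cell_counts = {"c-0": 300, "c-5": 100}

profile = pseudo_bulk(clusters, cell_counts)
```

Under these assumptions the larger cluster dominates the profile: ALB comes out at 90.0 and APOA1 at 50.0.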

Matching DVP samples with cell type clusters

The selection of cell type clusters was based on cell type shapes and cell type names, where markers were explored to find the clusters with the highest expression and matching cell identity. To confirm the selection, all clusters of the respective tissue types were compared with the DVP samples from the corresponding tissue type. Examples of this are displayed below as correlation heatmaps. Selected clusters are marked in a color matching that of the DVP sample names in the plots. Liver (hepatocytes), ovary (oocytes and granulosa cells) and lymph node (macrophages, B-cells and T-cells) are shown.
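The comparison behind such a heatmap can be sketched as one correlation per cluster and DVP sample. The text does not specify the correlation measure or the data transformation, so this example assumes Pearson correlation on log2-transformed values over the proteins detected in the DVP sample. The function names, cluster names and all values are hypothetical.

```python
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlate(protein, rna):
    """Correlate log2 protein intensity with log2(RNA + 1) over shared genes."""
    shared = sorted(g for g, v in protein.items() if v is not None and g in rna)
    return pearson([log2(protein[g]) for g in shared],
                   [log2(rna[g] + 1) for g in shared])


# Hypothetical DVP sample and RNA clusters from the same tissue.
dvp_hepatocytes = {"ALB": 1024.0, "APOA1": 256.0, "CD163": 4.0, "MS4A1": 2.0}
rna_clusters = {
    "hepatocytes": {"ALB": 511.0, "APOA1": 127.0, "CD163": 1.0, "MS4A1": 0.0},
    "Kupffer cells": {"ALB": 7.0, "APOA1": 3.0, "CD163": 255.0, "MS4A1": 0.0},
}

scores = {name: correlate(dvp_hepatocytes, rna)
          for name, rna in rna_clusters.items()}
best_match = max(scores, key=scores.get)
```

One row of a heatmap corresponds to the `scores` of one DVP sample across all clusters of the tissue. The selected cluster is expected to be among the highest-scoring entries.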