MS-based proteomics

MS data overview

The bulk MS dataset comprises 20 tissues profiled by mass spectrometry-based proteomics from a single healthy female donor obtained from the MunIch cardiovaScular StudIes biObaNk (MISSION). Tissue collection and handling were approved by the Ethics Committee of the Technical University of Munich (approval ID 325/18S) and conducted in accordance with the Declaration of Helsinki. Donor information was handled in a fully anonymized manner. The 20 tissues profiled by bulk MS are cerebral cortex, lung, salivary gland, stomach, duodenum, jejunum, ileum, colon, liver, pancreas, kidney, ovary, fallopian tube, coronary artery, heart muscle, skeletal muscle, skin, spleen, lymph node, and bone marrow. Each tissue was analyzed in three biological replicates. The same tissue specimens were used as for the cell type-resolved Deep Visual Proteomics (DVP) analysis presented in the Single Cell Resource, enabling direct integration of bulk and cell type-resolved measurements.

Sample preparation and mass spectrometry

All tissues were processed as formalin-fixed paraffin-embedded (FFPE) samples and sectioned at 3 µm thickness. Sample preparation was performed using Solid-Phase Extraction Capture (SPEC): bulk tissue pieces were scraped from FFPE sections, deparaffinised, lysed, and proteins digested into peptides with LysC/trypsin on a strong anion exchange (SAX) tip. SPEC immobilises peptides in nanoliter volumes, enabling fast, robust and ultra-sensitive sample preparation with minimal losses. Peptides were separated on an Evosep One system coupled to an Orbitrap Astral Zoom mass spectrometer (Thermo Fisher Scientific) and analysed in data-independent acquisition (DIA) mode. Separation was performed on an Aurora Rapid C18 column (5 cm × 75 μm, 1.7 μm, IonOpticks) at 60 °C using the Whisper Zoom 80 SPD gradient. MS1 scans were acquired at 240,000 resolution over 380–980 m/z. Fragment ion spectra were acquired across 200 equidistant 3 Th DIA isolation windows covering the precursor range, with 6 ms maximum injection time and 25% HCD collision energy.
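
The window scheme described above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the windows are contiguous and non-overlapping and that the precursor range equals the MS1 scan range of 380–980 m/z; the function name and its parameters are illustrative and not part of the acquisition software.

```python
def dia_windows(start_mz=380.0, end_mz=980.0, width_th=3.0):
    """Return (lower, upper) m/z bounds of equidistant isolation windows.

    Assumption: windows tile the range edge to edge without overlap.
    """
    n_windows = round((end_mz - start_mz) / width_th)
    return [
        (start_mz + i * width_th, start_mz + (i + 1) * width_th)
        for i in range(n_windows)
    ]


windows = dia_windows()
print(len(windows), windows[0], windows[-1])
```

Under these assumptions, a 600 Th range divided into 3 Th steps yields the 200 windows stated in the text.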

Data processing and normalization

Raw files were processed with DIA-NN (v1.9.2) in library-free mode using a deep-learning-predicted spectral library generated from the reviewed human reference proteome (UniProt UP000005640, downloaded March 2024). The replicates of each individual tissue were searched together in one search with match-between-runs (MBR) and peptidoform analysis enabled. The resulting 27 report files were combined and protein intensities were normalized using directLFQ (v0.2.19) with standard settings. No imputation of missing values was applied. For each tissue, the three biological replicates were combined into a single tissue-level proteome by taking the median protein intensity across replicates. A protein value was set to missing for a tissue if more than half of its replicates lacked a measurement for that protein. Protein detection was defined as a non-missing intensity value.
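
The replicate aggregation rule can be illustrated as follows. This is a minimal sketch and not the pipeline's actual code: it assumes missing measurements are represented as None and that the median is taken over the observed replicates only, which the text does not state explicitly. The function name is hypothetical.

```python
from statistics import median


def tissue_level_intensity(replicates):
    """Collapse replicate intensities (None = missing) into one tissue value."""
    observed = [v for v in replicates if v is not None]
    n_missing = len(replicates) - len(observed)
    # Set the value to missing if more than half of the replicates
    # lack a measurement for this protein.
    if not observed or n_missing > len(replicates) / 2:
        return None
    # Assumption: the median ignores missing replicates.
    return median(observed)


print(tissue_level_intensity([8.0, 9.0, 13.0]))    # 9.0
print(tissue_level_intensity([10.0, 12.0, None]))  # 11.0
print(tissue_level_intensity([10.0, None, None]))  # None
```

With three replicates, a protein therefore needs at least two measurements in a tissue to count as detected there.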

Classification of MS-based protein data

The bulk MS data was used to classify protein-coding genes by their tissue-specific protein expression along two dimensions: specificity category and distribution category. The same framework is applied to the transcriptomics data elsewhere in the HPA, enabling direct comparison between modalities. The categorization reflects only the 20 tissues included in this dataset, which are grouped into 16 main tissue types, similar to the grouping used for the bulk RNAseq classification.

Explanation of the specificity category

The MS-based proteomics data was used to classify all genes according to their tissue-specific expression using two different schemas: specificity category and distribution category.

Category	Description
Enriched	Intensity level in a particular tissue type is at least four times that of any other tissue type
Group enriched	Intensity level in a group (of 2-5 tissue types) is at least four times that of any other tissue type
Enhanced	Intensity level in one or several tissue types is at least four times the mean of all tissue types
Low specificity	Detected in at least one tissue type but not elevated in any tissue type
Not detected	Missing values in more than half of the replicates in all tissue types
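
The categories in the table can be read as a decision cascade. The sketch below is one possible interpretation, not the HPA implementation. It assumes that missing values count as zero in fold comparisons, that a group consists of the top-ranked tissue types, and that the mean level of the group is compared with the highest level outside the group. The function name and return format are hypothetical.

```python
def specificity_category(levels, fold=4.0, max_group=5):
    """Classify one gene from a dict of tissue type -> intensity (None = missing).

    Returns (category, list of elevated tissue types).
    """
    # Assumption: a missing value is treated as zero intensity.
    values = {t: (v if v is not None else 0.0) for t, v in levels.items()}
    if all(v == 0.0 for v in values.values()):
        return "Not detected", []

    ranked = sorted(values, key=values.get, reverse=True)

    # Enriched: the top tissue type is at least `fold` times every other one.
    if len(ranked) == 1 or values[ranked[0]] >= fold * values[ranked[1]]:
        return "Enriched", ranked[:1]

    # Group enriched: the mean of the top 2-5 tissue types is at least
    # `fold` times the highest level outside the group (assumed reading).
    for k in range(2, min(max_group, len(ranked) - 1) + 1):
        group = ranked[:k]
        group_mean = sum(values[t] for t in group) / k
        if group_mean >= fold * values[ranked[k]]:
            return "Group enriched", group

    # Enhanced: one or several tissue types at least `fold` times the overall mean.
    mean_all = sum(values.values()) / len(values)
    enhanced = [t for t in ranked if values[t] >= fold * mean_all]
    if enhanced:
        return "Enhanced", enhanced

    return "Low specificity", []


print(specificity_category({"liver": 100.0, "kidney": 10.0, "lung": 5.0}))
```

The order of the checks matters: a gene that satisfies the enriched criterion is never tested against the weaker group enriched or enhanced criteria.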

An additional category "elevated", containing all genes in the first three categories (tissue/cell line/cell type enriched, group enriched and tissue/cell line/cell type enhanced), has been used for some parts of the analysis. The TS/CS-score (Tissue Specificity/Cell Specificity score) is calculated for genes in the "elevated" category as the fold change between the tissue/cell line with the highest intensity level and the tissue/cell line with the second-highest intensity level.
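
As a worked illustration of the score, the following minimal sketch computes the fold change between the highest and second-highest value for one gene. It assumes missing values are ignored and that the score is left undefined (None) when fewer than two tissues have a measurement; neither choice is specified in the text, and the function name is hypothetical.

```python
def specificity_score(levels):
    """Fold change from the highest to the second-highest level (None = missing)."""
    observed = sorted((v for v in levels.values() if v is not None), reverse=True)
    # Assumption: the score is undefined without two non-zero measurements.
    if len(observed) < 2 or observed[1] == 0:
        return None
    return observed[0] / observed[1]


print(specificity_score({"liver": 120.0, "kidney": 20.0, "lung": None}))  # 6.0
```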

Explanation of the distribution category

Category	Description
Detected in single	Detected in a single tissue/region/cell type
Detected in some	Detected in more than one but less than one third of tissues/regions/cell types
Detected in many	Detected in at least a third but not all tissues/regions/cell types
Detected in all	Detected in all tissues/regions/cell types
Not detected	Missing values in more than half of the replicates in all tissues/regions/cell types
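
The distribution categories depend only on how many tissue types a protein is detected in. The sketch below is a minimal illustration, assuming detection has already been reduced to one boolean per tissue type; the function name is hypothetical.

```python
def distribution_category(detected):
    """Classify one gene from a dict of tissue type -> bool (detected or not)."""
    n_total = len(detected)
    n_detected = sum(detected.values())
    if n_detected == 0:
        return "Not detected"
    if n_detected == n_total:
        return "Detected in all"
    if n_detected == 1:
        return "Detected in single"
    # "Less than one third", written with integers to avoid rounding issues.
    if 3 * n_detected < n_total:
        return "Detected in some"
    return "Detected in many"


tissue_types = [f"tissue_{i}" for i in range(16)]  # placeholder names
example = {t: i < 5 for i, t in enumerate(tissue_types)}
print(distribution_category(example))  # Detected in some
```

With 16 main tissue types, detection in 2 to 5 tissue types falls into "Detected in some" and detection in 6 to 15 into "Detected in many".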

Comparison to IHC-based protein data

The bulk MS data was compared to the antibody-based IHC protein expression profiles in the HPA. The IHC dataset was filtered to the 18 tissues overlapping with the bulk MS sampling (blood vessel lacks IHC annotation, and IHC-annotated small intestine integrates jejunum and ileum). The comparison was based on detection, i.e. whether a protein detected by IHC was also detected by MS. For IHC, all positive staining levels (low, medium, high) were treated as detection. The comparison was performed both globally across all tissues and on a per-tissue basis. To assess whether MS detection reflects antibody quality, the analysis was stratified by HPA antibody reliability tier (Enhanced, Supported, Approved, Uncertain). MS recovery follows the reliability hierarchy: the highest recovery is seen for proteins covered by Enhanced antibodies, with progressively lower recovery for Supported, Approved and Uncertain. This makes bulk MS and IHC complementary readouts of the same proteins and provides an independent reference that can support the re-evaluation of antibodies in the HPA.
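
The stratified recovery calculation can be sketched as below. This is an illustration under stated assumptions, not the analysis code: recovery is computed per gene-tissue pair, any staining level other than "not detected" counts as IHC detection, and the record layout, gene names and values are made-up toy data rather than results.

```python
from collections import defaultdict


def ms_recovery_by_tier(ihc_records, ms_detected):
    """Fraction of IHC-detected (gene, tissue) pairs also detected by MS, per tier.

    ihc_records: iterable of (gene, tissue, staining, reliability_tier)
    ms_detected: set of (gene, tissue) pairs with a non-missing MS intensity
    """
    counts = defaultdict(lambda: [0, 0])  # tier -> [recovered, total]
    for gene, tissue, staining, tier in ihc_records:
        if staining == "not detected":
            continue  # only positive IHC staining enters the comparison
        counts[tier][1] += 1
        if (gene, tissue) in ms_detected:
            counts[tier][0] += 1
    return {tier: recovered / total for tier, (recovered, total) in counts.items()}


# Toy input with hypothetical gene names.
ihc = [
    ("GENE_A", "liver", "high", "Enhanced"),
    ("GENE_B", "liver", "low", "Enhanced"),
    ("GENE_C", "kidney", "medium", "Uncertain"),
    ("GENE_D", "kidney", "not detected", "Uncertain"),
]
ms = {("GENE_A", "liver")}
print(ms_recovery_by_tier(ihc, ms))
```

Restricting `ihc_records` to a single tissue before calling the function gives the per-tissue variant of the same comparison.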

Comparison to bulk RNA expression data

The bulk MS data was further compared with bulk RNA expression in the HPA. The comparison was based on the HPA consensus RNA expression data, which integrates HPA in-house RNAseq with GTEx (Lonsdale et al., 2013) RNAseq profiles, restricted to tissues overlapping with the bulk MS sampling. For consistency with the HPA Tissue Resource, intestinal tissues (duodenum, jejunum, ileum and colon) were grouped into a single value using the maximum expression across the group, and spleen and lymph node were grouped into lymphoid tissue. UniProt identifiers were mapped to Ensembl gene identifiers (Ensembl v109, UniProt release 2022_05) as used by the HPA. Protein groups that could not be uniquely mapped to a single gene were classified as multi-mappable and excluded from downstream analysis. RNA detection was defined as expression ≥ 1 nTPM, and MS-based protein detection as a non-missing intensity value. Specificity and distribution categories were applied to the bulk MS and bulk RNA datasets using the same criteria, enabling direct comparison at the category level. Both classifications are available in the resource and can be combined with other filters to explore category overlap between modalities. In addition, detection overlap was assessed across all four modalities of the resource (bulk MS, bulk RNAseq, DVP and sc/snRNAseq) to identify proteins consistently observed, unique to a single modality, or systematically missed.
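
The tissue grouping and detection rules can be illustrated with a short sketch. It assumes that the maximum is used for both grouped tissue sets (the text states this only for the intestinal tissues), that the same grouping is applied to both modalities, and that missing MS intensities are represented as None. The group label "intestine", the function names and the example values are hypothetical.

```python
# Assumed mapping from sampled tissues to grouped tissue names.
GROUPS = {
    "duodenum": "intestine", "jejunum": "intestine",
    "ileum": "intestine", "colon": "intestine",
    "spleen": "lymphoid tissue", "lymph node": "lymphoid tissue",
}


def group_max(levels):
    """Collapse grouped tissues to the maximum non-missing value per group."""
    grouped = {}
    for tissue, value in levels.items():
        name = GROUPS.get(tissue, tissue)
        current = grouped.get(name)
        if value is not None and (current is None or value > current):
            grouped[name] = value
        else:
            grouped.setdefault(name, current)
    return grouped


def detection_labels(rna_ntpm, ms_intensity, min_ntpm=1.0):
    """Label each shared tissue as 'both', 'RNA only', 'MS only' or 'neither'."""
    names = {(True, True): "both", (True, False): "RNA only",
             (False, True): "MS only", (False, False): "neither"}
    rna, ms = group_max(rna_ntpm), group_max(ms_intensity)
    labels = {}
    for tissue in sorted(rna.keys() & ms.keys()):
        rna_detected = rna[tissue] is not None and rna[tissue] >= min_ntpm
        ms_detected = ms[tissue] is not None  # non-missing intensity
        labels[tissue] = names[(rna_detected, ms_detected)]
    return labels


rna = {"duodenum": 0.2, "colon": 3.0, "liver": 0.5, "spleen": 0.1, "lymph node": 0.4}
ms = {"duodenum": None, "colon": None, "liver": 1e6, "spleen": 2e5, "lymph node": None}
print(detection_labels(rna, ms))
```

The same labelling extends to more than two modalities by replacing the pair of booleans with one boolean per modality.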