MS-based proteomics
MS data overview
The bulk MS dataset comprises 20 tissues profiled by mass spectrometry-based proteomics from a single healthy female donor obtained from the MunIch cardiovaScular StudIes biObaNk (MISSION). Tissue collection and handling were approved by the Ethics Committee of the Technical University of Munich (approval ID 325/18S) and conducted in accordance with the Declaration of Helsinki. Donor information was handled in a fully anonymized manner. The 20 tissues profiled by bulk MS are cerebral cortex, lung, salivary gland, stomach, duodenum, jejunum, ileum, colon, liver, pancreas, kidney, ovary, fallopian tube, coronary artery, heart muscle, skeletal muscle, skin, spleen, lymph node, and bone marrow. Each tissue was analyzed in three biological replicates. The same tissue specimens were used as for the cell type-resolved Deep Visual Proteomics (DVP) analysis presented in the Single Cell Resource, enabling direct integration of bulk and cell type-resolved measurements.
Sample preparation and mass spectrometry
All tissues were processed as formalin-fixed paraffin-embedded (FFPE) samples and sectioned at 3 µm thickness. Sample preparation was performed using Solid-Phase Extraction Capture (SPEC): bulk tissue pieces were scraped from FFPE sections, deparaffinised, lysed, and proteins digested into peptides with LysC/trypsin on a strong anion exchange (SAX) tip. SPEC immobilises peptides in nanoliter volumes, enabling fast, robust and ultra-sensitive sample preparation with minimal losses. Peptides were separated on an Evosep One system coupled to an Orbitrap Astral Zoom mass spectrometer (Thermo Fisher Scientific) and analysed in data-independent acquisition (DIA) mode. Separation was performed on an Aurora Rapid C18 column (5 cm × 75 μm, 1.7 μm, IonOpticks) at 60 °C using the Whisper Zoom 80 SPD gradient. MS1 scans were acquired at 240,000 resolution over 380–980 m/z. 
Fragment ion spectra were acquired across 200 equidistant 3 Th DIA isolation windows covering the precursor range, with 6 ms maximum injection time and 25% HCD collision energy.
Data processing and normalization
Raw files were processed with DIA-NN (v1.9.2) in library-free mode using a deep-learning-predicted spectral library generated from the reviewed human reference proteome (UniProt UP000005640, downloaded March 2024). The replicates of each tissue were searched together in one search with match-between-runs (MBR) and peptidoform analysis enabled. The resulting 27 report files were combined and protein intensities were normalized using directLFQ (version 0.2.19) with standard settings. No imputation of missing values was applied. For each tissue, the three biological replicates were combined into a single tissue-level proteome by taking the median protein intensity across replicates. A protein value was set to missing for a tissue if more than half of its replicates lacked a measurement for that protein. Protein detection was defined as a non-missing intensity value.
Classification of MS-based protein data
The bulk MS data was used to classify protein-coding genes according to their tissue-specific protein expression along two dimensions: specificity category and distribution category. The same framework is applied to the transcriptomics data elsewhere in the HPA, enabling direct comparison between modalities. The categorization reflects only the 20 tissues included in this dataset, which are grouped into 16 main tissue types, similar to the grouping used for the bulk RNAseq classification.
Explanation of the specificity category
The MS-based proteomics data was used to classify all genes according to their tissue-specific expression into two different schemas: specificity category and distribution category.
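The replicate aggregation rule from the data processing step above (median across the three replicates, with the tissue value set to missing when more than half of the replicates lack a measurement) can be sketched as follows. This is a minimal illustration, not the pipeline used for the resource: the function names are hypothetical, `None` stands in for a missing intensity, and taking the median over the observed values only is an assumption, since the text does not say how a single missing replicate enters the median.

```python
from statistics import median
from typing import Dict, Optional, Sequence


def aggregate_replicates(values: Sequence[Optional[float]]) -> Optional[float]:
    """Collapse the replicate intensities of one protein in one tissue.

    None marks a missing measurement. The result is None (missing) when
    more than half of the replicates are missing; otherwise it is the
    median of the observed values (assumption: missing values are ignored).
    """
    observed = [v for v in values if v is not None]
    n_missing = len(values) - len(observed)
    if not values or n_missing > len(values) / 2:
        return None
    return median(observed)


def tissue_proteome(
    replicates: Dict[str, Sequence[Optional[float]]]
) -> Dict[str, Optional[float]]:
    """Apply the aggregation to every protein of a tissue (protein -> replicates)."""
    return {protein: aggregate_replicates(vals) for protein, vals in replicates.items()}


def is_detected(intensity: Optional[float]) -> bool:
    """Detection is defined as a non-missing intensity value."""
    return intensity is not None


# Toy example with hypothetical protein identifiers and intensities.
example = {
    "PROT_A": [10.0, 12.0, 11.0],   # all replicates observed
    "PROT_B": [10.0, None, 12.0],   # one of three missing: still reported
    "PROT_C": [10.0, None, None],   # two of three missing: set to missing
}
aggregated = tissue_proteome(example)
```

With three replicates, the rule means a protein needs a measurement in at least two replicates to be reported for a tissue.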
Explanation of the distribution category
Comparison to IHC-based protein data
The bulk MS data was compared to the antibody-based IHC protein expression profiles in the HPA. The IHC dataset was filtered to the 18 tissues overlapping with the bulk MS sampling (blood vessel lacks IHC annotation, and IHC-annotated small intestine integrates jejunum and ileum). The comparison was based on detection, i.e. whether a protein detected by IHC was also detected by MS. For IHC, all positive staining levels (low, medium, high) were treated as detection. The comparison was performed both globally across all tissues and on a per-tissue basis. To assess whether MS detection reflects antibody quality, the analysis was stratified by HPA antibody reliability tier (Enhanced, Supported, Approved, Uncertain). MS recovery follows the reliability hierarchy: the highest recovery is seen for proteins covered by Enhanced antibodies, with progressively lower recovery for Supported, Approved and Uncertain. This makes bulk MS and IHC complementary readouts of the same proteins and provides an independent reference that can support the re-evaluation of antibodies in the HPA.
Comparison to bulk RNA expression data
The bulk MS data was further compared with bulk RNA expression in the HPA. The comparison was based on the HPA consensus RNA expression data, which integrates HPA in-house RNAseq with GTEx (Lonsdale et al., 2013) RNAseq profiles, restricted to tissues overlapping with the bulk MS sampling. For consistency with the HPA Tissue Resource, intestinal tissues (duodenum, jejunum, ileum and colon) were grouped into a single value using the maximum expression across the group, and spleen and lymph node were grouped into lymphoid tissue. UniProt identifiers were mapped to Ensembl gene identifiers (Ensembl v109, UniProt release 2022_05) as used by the HPA. Protein groups that could not be uniquely mapped to a single gene were classified as multi-mappable and excluded from downstream analysis. 
RNA detection was defined as expression ≥ 1 nTPM, and MS-based protein detection as a non-missing intensity value. Specificity and distribution categories were applied to the bulk MS and bulk RNA datasets using the same criteria, enabling direct comparison at the category level. Both classifications are available in the resource and can be combined with other filters to explore category overlap between modalities. In addition, detection overlap was assessed across all four modalities of the resource (bulk MS, bulk RNAseq, DVP and sc/snRNAseq) to identify proteins consistently observed, unique to a single modality, or systematically missed.
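The tissue grouping and detection rules described for the RNA comparison can be illustrated with a short sketch. It is not the code used for the resource: the function names, the group labels in `TISSUE_GROUPS`, and the gene identifiers are hypothetical, `None` stands in for a missing value, and using the maximum for the lymphoid group is an assumption (the text states the maximum only for the intestinal group).

```python
from typing import Dict, Optional, Set

RNA_DETECTION_NTPM = 1.0  # detection threshold stated in the text (>= 1 nTPM)

# Hypothetical lookup: tissues that are merged into one grouped value.
TISSUE_GROUPS = {
    "duodenum": "intestine",
    "jejunum": "intestine",
    "ileum": "intestine",
    "colon": "intestine",
    "spleen": "lymphoid tissue",
    "lymph node": "lymphoid tissue",
}


def group_by_max(per_tissue: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Merge grouped tissues into one value using the maximum across the group.

    A group stays None (missing) only if every member tissue is missing.
    """
    grouped: Dict[str, Optional[float]] = {}
    for tissue, value in per_tissue.items():
        group = TISSUE_GROUPS.get(tissue, tissue)
        current = grouped.get(group)
        if value is None:
            grouped.setdefault(group, None)
        elif current is None or value > current:
            grouped[group] = value
    return grouped


def rna_detected(ntpm: Optional[float]) -> bool:
    """RNA detection: expression of at least 1 nTPM."""
    return ntpm is not None and ntpm >= RNA_DETECTION_NTPM


def ms_detected(intensity: Optional[float]) -> bool:
    """MS detection: any non-missing intensity value."""
    return intensity is not None


def detection_overlap(
    rna: Dict[str, Optional[float]], ms: Dict[str, Optional[float]]
) -> Dict[str, Set[str]]:
    """Split genes of one tissue into both / RNA-only / MS-only / neither."""
    result: Dict[str, Set[str]] = {
        "both": set(), "rna_only": set(), "ms_only": set(), "neither": set()
    }
    for gene in set(rna) | set(ms):
        r = rna_detected(rna.get(gene))
        m = ms_detected(ms.get(gene))
        key = "both" if r and m else "rna_only" if r else "ms_only" if m else "neither"
        result[key].add(gene)
    return result


# Toy values for one tissue; gene names and numbers are made up.
rna = {"GENE_A": 5.0, "GENE_B": 0.4, "GENE_C": 1.0, "GENE_D": 0.0}
ms = {"GENE_A": 1.0e6, "GENE_B": 2.0e5, "GENE_C": None, "GENE_D": None}
overlap = detection_overlap(rna, ms)
```

The same set logic extends to more than two modalities by computing one detection flag per modality and grouping genes by the resulting combination of flags.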