Tissue resource - Methods summary
The Tissue resource contains immunohistochemistry (IHC)-based protein expression profiles covering 45 normal tissues, mRNA expression data from 55 tissues derived mainly from deep sequencing of mRNA, and MS-based proteomics data representing 20 different tissues. In addition, fluorescent multiplex IHC (mIHC/IF) has been performed for 1106 proteins in ciliated cells, kidney, testis and salivary gland.
Key publication: Uhlén M et al. (2015) “Tissue-based map of the human proteome.” Science 347(6220):1260419.
What can you learn from the Tissue resource?
Learn about:
Data overview
How has the data been generated?
Immunohistochemistry on tissue microarrays
The protein expression data covering 45 normal human tissue types was derived from antibody-based protein profiling using conventional brightfield IHC. Tissue microarrays of 1 mm samples were stained with DAB (3,3'-diaminobenzidine)-labeled antibodies and counterstained with hematoxylin. Each tissue type is represented by samples from three individuals, with the exception of endometrium, skin, soft tissue and stomach, each represented by samples from six individuals. For selected proteins, additional tissues were stained, including mouse brain, human lactating breast, eye, thymus and extended samples of endometrium, adrenal gland, skin and brain. All IHC-stained sections from tissue microarrays were scanned to allow for subsequent analysis and presentation at the HPA web portal.
All tissue samples were collected and handled in accordance with Swedish laws and regulations and obtained from the Department of Clinical Pathology, Uppsala University Hospital, Uppsala, Sweden as part of the sample collection governed by the Uppsala Biobank. All tissue samples were anonymized in accordance with an approval and advisory report from the Uppsala Ethical Review Board.
Read more about the generation of IHC images
Read more about antibody generation
RNA expression data
The transcriptomics expression data was derived from three different sources. The HPA dataset was generated in-house based on mRNA samples from 40 normal tissues extracted from frozen tissue sections, which were obtained and handled as described above for immunohistochemistry. The GTEx dataset was imported from the Genotype-Tissue Expression consortium and consists of samples from 37 normal tissues. Both of these datasets were produced by deep sequencing of mRNA (RNA-seq). In addition, the transcriptomics dataset from the FANTOM5 Consortium, based on Cap Analysis of Gene Expression (CAGE) in 60 normal tissues, was also imported.
Read more about generation of tissue transcriptomics data
Read more about GTEx and FANTOM data
MS-based proteomics
The MS-based protein detection data covers 20 tissues from one healthy female donor. These include the 14 tissues also profiled at cell type resolution by Deep Visual Proteomics (DVP, presented in the Single Cell Resource) together with six additional organs such as spleen and different layers of the gastrointestinal tract. For each tissue, samples were processed in bulk, capturing the full mixture of cell types present in the intact organ. Relative protein abundances were then measured on a high-sensitivity Orbitrap Astral Zoom mass spectrometer. Raw data have been deposited in the ProteomeXchange Consortium via the PRIDE partner repository (dataset identifier PXD074522).
Read more about data processing of proteomics data
How has the data been analyzed?
Primary data
Images of the IHC-stained tissue samples were manually annotated with regard to staining intensity (negative, weak, moderate or strong), fraction of stained cells (<25%, 25-75% or >75%) and subcellular localization (nuclear and/or cytoplasmic/membranous) in the cell types defined as relevant for each tissue type. For a number of proteins (7593), an in-depth characterization of the spatial distribution of protein expression in selected tissues of the standard tissue microarray (testis, cerebellum, bronchus, nasopharynx, fallopian tube, placenta, kidney, intestine and skin) was performed, where in total 57 new cell types or cell structures were annotated.
Read more about annotation of IHC staining
Knowledge-based annotation of protein expression
To create a comprehensive, knowledge-based overview of protein expression in normal tissues for each gene, the primary annotations of IHC images from one or several available antibodies were stringently evaluated together with RNA-seq data from internal and external sources and available protein/gene characterization data (with special emphasis on RNA-seq data).
Based on this evaluation, a reliability score for each annotated protein expression profile was set as either Enhanced, Supported, Approved, or Uncertain. An “Enhanced” reliability score was assigned to genes for which at least one antibody has been validated using orthogonal or independent enhanced validation methods.
Read more about the knowledge-based annotation
Read more about antibody validation and validation of IHC
RNA expression data
The RNA expression values included in each RNA dataset (HPA, GTEx and FANTOM5) were mapped to corresponding genes in the Ensembl version used in the Human Protein Atlas. The HPA and GTEx datasets were processed in a normalization pipeline in order to be combined into a consensus dataset, allowing for a clear comparison of gene expression across 51 human tissues. The FANTOM5 dataset is normalized using a separate pipeline and is presented separately.
Read more about normalisation of transcriptomics data
MS-based proteomics
To enable comparison between modalities, UniProt protein identifiers were mapped to Ensembl gene identifiers (Ensembl v109, UniProt release 2022_05) as used by the HPA. Protein groups that could not be uniquely mapped to a single gene were classified as multi-mappable and excluded from downstream analysis. For each tissue type, three biological replicates were analysed and later combined into one representative value (median). A protein value was set to missing for a tissue type if more than half of its replicates lacked a measurement for that protein. In general, protein detection was defined as a non-missing intensity value.
Read more about data processing of proteomics data
How quantitative is the MS-based proteomics?
Mass spectrometry identifies proteins via their constituent peptides and reports relative protein abundances, expressed as intensities, for every protein identified in a given tissue. Several thousand proteins are quantified per tissue, covering a wide dynamic range from highly expressed structural and metabolic proteins down to low-abundance regulators such as transcription factors. Values are normalized across all 20 tissues using a label-free quantification approach (directLFQ), so that protein abundances can be compared directly between organs.
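The replicate handling described under MS-based proteomics above can be illustrated with a minimal Python sketch. It is not the HPA pipeline: the function names are hypothetical, missing measurements are represented as None, and taking the median over the measured replicates only is an assumption, since the text does not state how missing values enter the median.

```python
from statistics import median
from typing import Optional, Sequence


def is_uniquely_mapped(gene_ids: Sequence[str]) -> bool:
    """Keep a protein group only if it maps to exactly one gene.

    Groups mapping to zero or several genes are treated as
    multi-mappable (or unmapped) and would be excluded.
    """
    return len(set(gene_ids)) == 1


def tissue_value(replicates: Sequence[Optional[float]]) -> Optional[float]:
    """Combine replicate intensities into one value per tissue.

    Returns None (missing) when more than half of the replicates lack
    a measurement. Otherwise returns the median of the measured
    replicates (assumption: missing replicates are ignored).
    """
    measured = [v for v in replicates if v is not None]
    n_missing = len(replicates) - len(measured)
    if not replicates or n_missing > len(replicates) / 2:
        return None
    return median(measured)


def is_detected(replicates: Sequence[Optional[float]]) -> bool:
    """Detection is defined as a non-missing combined intensity value."""
    return tissue_value(replicates) is not None
```

With three replicates, a protein measured in two of them keeps a value, while a protein measured in only one is set to missing for that tissue.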
How does the bulk MS data overlap with the DVP data in the Single Cell Resource?
The bulk MS data and the cell type-resolved DVP data share 14 of the 20 tissues, providing two complementary views of the same organs. The bulk measurements capture protein abundances averaged across all cell types present in a tissue, while the DVP data resolve protein abundances within individual cell types. Due to their different nature, the two also differ in sample preparation and MS acquisition: bulk samples are processed using SPEC (Solid-Phase Extraction Capture) from whole tissue pieces, whereas DVP isolates defined cell populations from tissue sections by laser microdissection, each coupled to MS acquisition methods optimized for the respective input amounts.
What is presented in the resource?
An MS-based proteomics profile across 20 different tissues is provided for 13573 genes with detected protein levels.
How was fluorescent multiplex immunohistochemistry data (mIHC/IF) generated and analyzed?
For 1106 proteins, mIHC/IF has been utilized to further increase the information regarding the spatial expression of proteins in tissues, and also to relate the expression to cellular states or distinct cellular phenotypes. By using an iterative staining-stripping method, the mIHC/IF staining technology allows for determining co-localization between proteins expressed in subsets of cells indistinguishable from one another in conventional brightfield IHC. In detail, a robust 5-plex antibody panel was generated by sub-clustering scRNA-seq data. By combining in-depth characterized proteins previously analyzed with conventional IHC, an antibody panel was generated for studying the spatial cell type, subcellular or state-specific localization. These panels have been generated for the cilia-specific subcellular localization, kidney cell types, pancreatic islet cell types, spermatogenesis cell states (in three panels), Sertoli subcellular structures and salivary gland cell types. For more details about the different panels visit the multiplex section of the Tissue resource.
The localization of the unknown protein of interest was manually annotated in tissue microarrays typically consisting of doublet 1 mm cores from three patients for each tissue stained with the panel. The protein for characterization is changed while the panel markers remain constant. The annotation recorded i) the fraction of unknown-marker stained cells that overlap with panel markers (<25%, 25-75% or >75%), and if applicable ii) the subcellular localization (nuclear and/or cytoplasmic/plasma membrane/membrane).
Read more about mIHC/IF annotation and the panel markers
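The three overlap categories used in the mIHC/IF annotation can be expressed as a small Python helper. The annotation itself was done manually, so this is only an illustrative sketch: the function name and the cell counts as inputs are hypothetical, and assigning the boundary values 25% and 75% to the middle category is an assumption.

```python
def overlap_category(overlapping_cells: int, stained_cells: int) -> str:
    """Bin the fraction of stained cells that overlap with a panel marker.

    overlapping_cells: cells positive for both the protein of interest
                       and the panel marker (hypothetical input).
    stained_cells:     all cells positive for the protein of interest.
    """
    if stained_cells <= 0:
        raise ValueError("at least one stained cell is required")
    if not 0 <= overlapping_cells <= stained_cells:
        raise ValueError("overlapping_cells must be between 0 and stained_cells")

    fraction = overlapping_cells / stained_cells
    if fraction < 0.25:
        return "<25%"
    if fraction <= 0.75:  # assumption: boundaries fall in the middle bin
        return "25-75%"
    return ">75%"
```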
Has the MS-based protein data been compared to the IHC-based protein data?
Yes. We asked whether proteins shown by IHC to be present were also detected by MS, stratified by the HPA antibody reliability annotation (Enhanced, Supported, Approved, Uncertain). This comparison was performed both globally and on a per-tissue basis. MS recovery follows the reliability hierarchy: the highest recovery is seen for proteins covered by Enhanced antibodies, with progressively lower recovery for Supported, Approved and Uncertain. This supports the use of the MS-based proteomics data as an orthogonal method when validating antibody staining.
How has the classification of all protein-coding genes been done?
A genome-wide classification of the protein-coding genes with regard to tissue distribution as well as specificity has been performed using between-sample normalized data. The results can serve as a reference for researchers interested in expression profiles in any of the major tissues and organs. The genes were classified according to specificity into (i) tissue-enriched genes with at least fourfold higher expression levels in one tissue type as compared with any other analysed tissue; (ii) group-enriched genes with enriched expression in a small number of tissues (2 to 5); and (iii) tissue-enhanced genes with only moderately elevated expression. In addition, all genes were classified according to distribution, in which each gene is scored according to its presence (expression levels higher than a cut-off) in the normal tissues.
Read more about classification of transcriptomics data
The same classification strategy was applied to the MS-based proteomics data, defining the tissue-elevated proteins within the 20 tissue types included.
Read more about data processing of proteomics data
Additionally, a classification based on expression clusters is included, where each gene is scored based on similarity of expression across all normal tissues.
The results are presented as a UMAP cluster plot (see the figure below). By clicking on any cluster, the visitor can access an interactive version of the UMAP and details about all clusters.
Read more about gene expression clustering
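The specificity categories described under the classification of protein-coding genes can be sketched in Python as follows. Only the fourfold rule for tissue-enriched genes and the group size of 2 to 5 tissues come from the text above. The detection cut-off of 1.0, the rule used for group-enriched genes (group mean at least fourfold above the highest remaining tissue) and the rule used for tissue-enhanced genes (top tissue at least fourfold above the mean of all other tissues) are assumptions made for illustration, as are all names.

```python
from typing import Dict

FOLD = 4.0      # fourfold threshold stated in the text
CUTOFF = 1.0    # hypothetical detection cut-off (assumption)
MAX_GROUP = 5   # group-enriched covers 2 to 5 tissues


def classify_specificity(expression: Dict[str, float]) -> str:
    """Assign a specificity category from per-tissue expression values."""
    if len(expression) < 2:
        raise ValueError("at least two tissues are required")

    ranked = sorted(expression.values(), reverse=True)
    if ranked[0] < CUTOFF:
        return "not detected"

    # (i) one tissue at least fourfold above every other tissue
    if ranked[0] >= FOLD * ranked[1]:
        return "tissue enriched"

    # (ii) a group of 2-5 tissues; assumed rule: the group mean is at
    # least fourfold above the highest tissue outside the group
    for size in range(2, min(MAX_GROUP, len(ranked) - 1) + 1):
        group, rest = ranked[:size], ranked[size:]
        if min(group) >= CUTOFF and sum(group) / size >= FOLD * rest[0]:
            return "group enriched"

    # (iii) assumed rule for moderately elevated expression: the top
    # tissue is at least fourfold above the mean of all other tissues
    others = ranked[1:]
    if ranked[0] >= FOLD * (sum(others) / len(others)):
        return "tissue enhanced"

    return "low tissue specificity"
```

For example, a gene with expression 100 in liver and at most 10 elsewhere falls into the tissue-enriched category under these rules, while a gene with similar values in all tissues is reported as having low tissue specificity.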