Tissue - Methods summary

The Tissue section of the Atlas contains immunohistochemistry (IHC)-based protein expression profiles covering 44 normal tissues and mRNA expression data from 54 tissues derived mainly from deep sequencing of mRNA. In addition, fluorescent multiplex IHC (mIHC/IF) has been performed for 742 proteins in testis and kidney.

Key publication: Uhlén M et al. (2015) “Tissue-based map of the human proteome.” Science 347(6220):1260419.

What can you learn from the Tissue section?

Learn about:

  • protein localization in tissues at the single-cell level
  • in-depth protein localization in testis and kidney based on multiplex profiling
  • a catalogue of genes enriched in a particular tissue (specificity)
  • which genes have a similar expression profile across tissues (expression cluster)

How has the data been generated?

Immunohistochemistry on tissue microarrays

The protein expression data covering 44 normal human tissue types was derived from antibody-based protein profiling using conventional brightfield IHC. Tissue microarrays of 1 mm samples were stained with DAB (3,3'-diaminobenzidine)-labeled antibodies and counterstained with hematoxylin. Each tissue type is represented by samples from three individuals, with the exception of endometrium, skin, soft tissue and stomach, each represented by samples from six individuals. For selected proteins, additional tissues were stained, including mouse brain, human lactating breast, eye, thymus and extended samples of endometrium, adrenal gland, skin and brain. All IHC-stained sections from tissue microarrays were scanned to allow for subsequent analysis and presentation at the HPA web portal. All tissue samples were collected and handled in accordance with Swedish laws and regulations and obtained from the Department of Clinical Pathology, Uppsala University Hospital, Uppsala, Sweden as part of the sample collection governed by the Uppsala Biobank. All tissue samples were anonymized in accordance with an approval and advisory report from the Uppsala Ethical Review Board.

RNA expression data

The transcriptomics expression data was derived from three different sources. The HPA dataset was generated in-house based on mRNA samples from 40 normal tissues extracted from frozen tissue sections, which were obtained and handled as described above for immunohistochemistry. The GTEx dataset was imported from the Genotype-Tissue Expression consortium and consists of samples from 37 normal tissues. Both of these datasets were produced by deep sequencing of mRNA (RNA-seq). In addition, the transcriptomics dataset from the FANTOM5 Consortium, based on Cap Analysis of Gene Expression (CAGE) in 60 normal tissues, was also imported.

How has the data been analyzed?

Primary data

Images of the IHC-stained tissue samples were manually annotated with regard to staining intensity (negative, weak, moderate or strong), fraction of stained cells (<25%, 25-75% or >75%) and subcellular localization (nuclear and/or cytoplasmic/membranous) in the annotated cell types, which were defined according to their relevance for each tissue type. For 7567 proteins, an in-depth characterization of the spatial distribution of protein expression was performed in selected tissues of the standard tissue microarray (testis, cerebellum, bronchus, nasopharynx, fallopian tube, placenta, kidney, intestine and skin), in which a total of 57 new cell types or cell structures were annotated.
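The annotation parameters above can be pictured as a small record with a fixed vocabulary for each field. The following is a minimal sketch only: the class name, field names and validation logic are hypothetical and are not taken from the HPA pipeline; only the category labels come from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Category labels as listed in the text above.
INTENSITIES = ("negative", "weak", "moderate", "strong")
FRACTIONS = ("<25%", "25-75%", ">75%")
LOCATIONS = frozenset({"nuclear", "cytoplasmic/membranous"})


@dataclass(frozen=True)
class IHCAnnotation:
    """Hypothetical record for one manually annotated cell type in one tissue."""
    tissue: str
    cell_type: str
    intensity: str
    fraction: str
    locations: FrozenSet[str]

    def __post_init__(self) -> None:
        # Reject values outside the controlled vocabularies.
        if self.intensity not in INTENSITIES:
            raise ValueError(f"unknown intensity: {self.intensity!r}")
        if self.fraction not in FRACTIONS:
            raise ValueError(f"unknown fraction: {self.fraction!r}")
        if not self.locations <= LOCATIONS:
            raise ValueError(f"unknown location(s): {set(self.locations - LOCATIONS)}")


# Example record (illustrative values, not real annotation data).
example = IHCAnnotation(
    tissue="kidney",
    cell_type="cells in tubules",
    intensity="strong",
    fraction=">75%",
    locations=frozenset({"nuclear", "cytoplasmic/membranous"}),
)
```

Using a frozen record with validated fields keeps each annotation immutable and restricted to the categories named in the text.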

Knowledge-based annotation of protein expression

To create a comprehensive, knowledge-based overview of protein expression in normal tissues for each gene, the primary annotations of IHC images from one or several available antibodies were stringently evaluated together with RNA-seq data from internal and external sources and available protein/gene characterization data, with special emphasis on the RNA-seq data. Based on this evaluation, the reliability score for each annotated protein expression profile was set to Enhanced, Supported, Approved or Uncertain. The “Enhanced” reliability score was assigned to genes for which at least one antibody has been validated using orthogonal or independent enhanced validation methods.

RNA expression data

The RNA expression values included in each RNA dataset (HPA, GTEx and FANTOM5) were mapped to the corresponding genes in the Ensembl version used in the Human Protein Atlas. The HPA and GTEx datasets were processed in a normalization pipeline so that they could be combined into a consensus dataset, allowing for a clear comparison of gene expression across 50 human tissues. The FANTOM5 dataset was normalized using a separate pipeline and is presented separately.

What is presented in the section?

Knowledge-based protein expression profiles for 15,320 genes (78% of human protein-coding genes) provide a best estimate of protein expression in all major tissues and organs. All images of IHC-stained normal tissues are available together with primary annotation data and sample information. RNA expression data is presented for all protein-coding genes in a total of 54 tissue types in the consensus dataset, which is based on integration of the HPA and GTEx data, as well as in each of these datasets separately, including results and information for each individual sample. In addition, FANTOM5 transcriptomics data in 60 tissues is presented separately.

How was fluorescent multiplex immunohistochemistry (mIHC/IF) performed and analyzed?

For 742 proteins in testis (n=592) and kidney (n=162), mIHC/IF has been utilized to further increase the information regarding the spatial expression of proteins in tissues, and also to relate the expression to cellular states or distinct cellular phenotypes. By using an iterative staining-stripping method, the mIHC/IF staining technology allows for determining co-localization between proteins expressed in subsets of cells indistinguishable from one another in conventional brightfield IHC.

In detail, robust 5-plex antibody panels were designed by sub-clustering scRNA-seq data from testis and kidney and combining the results with proteins previously characterized in depth by conventional IHC in these tissues. The resulting panels were used to study spatial localization specific to cell types and cell states.

For testis, four antibody panels were generated (spermatogonia, spermatocyte, spermatid and Sertoli cell panels) to characterize when and where proteins are expressed during spermatogenesis. For the spermatogonia panel, protein expression throughout the development of spermatogonia was analyzed, i.e. from the spermatogonial stem cell state (states 0-1) to the active spermatogonial states (states 2-4), which represent cells that are preparing to enter the first meiotic phase of spermatogenesis. For the spermatocyte panel, protein expression during spermatocyte differentiation and development was highlighted in order to determine whether proteins are related to earlier or later phases of spermatocyte differentiation. For the spermatid panel, protein expression during the later phases of spermatogenesis and the transformation into flagellated sperm cells (spermiogenesis) was mapped. Lastly, the Sertoli cell panel was used to characterize protein expression in Sertoli cells, to identify expression patterns exclusive to Sertoli cells, and to highlight proteins that are also expressed in the surrounding spermatogenic cells. This was done by labeling the Sertoli cell nuclei, cytoplasm and cell membrane, as well as germ cells (spermatogonia, spermatocytes and spermatids).

For kidney, the antibody panel was generated to profile the different renal tubules (collecting ducts, distal and proximal tubules) as well as cells within the glomerular compartment (podocytes and endothelial cells).

The localization of the protein of interest (the unknown marker) was manually annotated in tissue microarrays consisting of duplicate 1 mm cores from three patients. The protein under characterization is changed between stainings while the panel markers remain constant. The annotation covered i) the fraction of cells stained for the unknown marker that overlap with panel markers (<25%, 25-75% or >75%), and ii) subcellular localization (nuclear and/or cytoplasmic/plasma membrane/membrane).
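The annotation described above is manual, but the overlap categories can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that stained cells are available as sets of cell identifiers; the function name and the handling of the 25% and 75% boundaries (both placed in the middle category) are assumptions, not part of the documented procedure.

```python
from typing import Set


def overlap_category(target_cells: Set[int], marker_cells: Set[int]) -> str:
    """Bin the share of target-stained cells that also carry a panel marker.

    target_cells: IDs of cells stained for the protein of interest (hypothetical input).
    marker_cells: IDs of cells stained for a panel marker (hypothetical input).
    """
    if not target_cells:
        return "no stained cells"
    # Percentage of target-positive cells that are also marker-positive.
    pct = 100.0 * len(target_cells & marker_cells) / len(target_cells)
    if pct < 25:
        return "<25%"
    if pct <= 75:  # ASSUMPTION: exact 25% and 75% fall in the middle bin
        return "25-75%"
    return ">75%"


# Illustrative values: 10 target-positive cells, 5 of them marker-positive.
target = set(range(10))
category = overlap_category(target, {0, 1, 2, 3, 4})
```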

How has the classification of all protein-coding genes been done?

A genome-wide classification of the protein-coding genes with regard to tissue distribution as well as specificity has been performed using between-sample normalized data. The results can serve as a reference for researchers interested in expression profiles in any of the major tissues and organs. The genes were classified according to specificity into (i) tissue-enriched genes with at least fourfold higher expression levels in one tissue type as compared with any other analyzed tissue; (ii) group-enriched genes with enriched expression in a small number of tissues (2 to 5); and (iii) tissue-enhanced genes with only moderately elevated expression. In addition, all genes were classified according to distribution, in which each gene is scored according to its presence (expression levels higher than a cut-off) in the normal tissues. Finally, a new classification based on expression clusters has recently been introduced, in which each gene is scored based on similarity of expression across all normal tissues. The results are presented as a UMAP cluster plot (see figure below). By clicking on any cluster, the visitor can access an interactive version of the UMAP and details about all clusters.

The figure below shows the numbers of tissue-enriched and group-enriched genes in red and orange, respectively.
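The specificity categories described above can be sketched as a simple rule set. Only the fourfold threshold for tissue-enriched genes and the group size of 2 to 5 tissues come from the text; the detection cut-off value, the fourfold rule for group-enriched genes, and the criterion used here for "moderately elevated" (fourfold over the mean of all other tissues) are assumptions made for illustration and may differ from the actual HPA definitions.

```python
from typing import Dict

FOLD = 4.0              # fourfold threshold stated in the text
MAX_GROUP = 5           # upper bound of the group size stated in the text (2 to 5)
DETECTION_CUTOFF = 1.0  # ASSUMPTION: hypothetical expression cut-off


def classify_specificity(expr: Dict[str, float]) -> str:
    """Classify one gene from its normalized expression per tissue (needs >= 2 tissues)."""
    ranked = sorted(expr.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need expression values for at least two tissues")
    if ranked[0] < DETECTION_CUTOFF:
        return "not detected"
    # Tissue enriched: top tissue at least fourfold above every other tissue.
    if ranked[0] >= FOLD * ranked[1]:
        return "tissue enriched"
    # Group enriched (ASSUMPTION): mean of the top k tissues (k = 2..5) is at
    # least fourfold above the highest tissue outside the group.
    for k in range(2, MAX_GROUP + 1):
        if k >= len(ranked):
            break
        if sum(ranked[:k]) / k >= FOLD * ranked[k]:
            return "group enriched"
    # Tissue enhanced (ASSUMPTION): top tissue at least fourfold above the
    # mean of all other tissues.
    if ranked[0] >= FOLD * (sum(ranked[1:]) / (len(ranked) - 1)):
        return "tissue enhanced"
    return "low tissue specificity"


# Illustrative values only, not real expression data.
liver_like = {"liver": 100.0, "kidney": 10.0, "lung": 5.0, "brain": 2.0}
result = classify_specificity(liver_like)
```

The rules are checked from most to least specific, so a gene that satisfies the tissue-enriched criterion is never reported as group-enriched or tissue-enhanced.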