ASSAYS AND ANNOTATION

Immunohistochemistry - tissues

Immunohistochemistry - cells

Immunofluorescence - cells

Western blot

Protein array

RNA

Evidence




Immunohistochemistry - tissues

Description

The protein atlas contains histological images of sections from human tissues. The images represent a view similar to what is seen in a microscope when examining sections of tissue on glass slides. Each antibody in the database has been used for immunohistochemical staining of both normal and cancer tissue. The specific binding of an antibody to its corresponding antigen results in a brown-black staining. The tissue section is counterstained with hematoxylin to enable visualization of microscopic features. Hematoxylin staining is nonspecific and results in a blue coloring of both cells and extracellular material.

Tissue microarrays make it possible to immunohistochemically stain a large number and variety of normal and cancer tissues. The generated tissue microarrays include samples from 46 different normal tissue types from 138 individuals and 20 different types of cancer from 216 patients, each sample represented by a 1 mm core, in total 570 images. For each antibody, the protein expression pattern in normal tissue is represented by triplicate samples, and protein expression is annotated in 76 different normal cell types present in these tissues. For cancer tissues, each tumor is represented by duplicate samples and protein expression is annotated in tumor cells. Normally, a small fraction of the 570 images is missing for each antibody due to technical issues. Specimens containing normal and cancer tissue have been collected and sampled from anonymized paraffin-embedded material of surgical specimens, in accordance with approval from the local ethics committee.
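The total of 570 images follows from the replicate scheme described above. The short Python sketch below reproduces that arithmetic; it assumes one image per core, and the constant names are illustrative only.

```python
# Counts quoted in the text above; one image per 1 mm core is an assumption.
NORMAL_TISSUE_TYPES = 46
NORMAL_REPLICATES = 3    # triplicate samples per normal tissue type
CANCER_PATIENTS = 216
CANCER_REPLICATES = 2    # duplicate samples per tumor

normal_images = NORMAL_TISSUE_TYPES * NORMAL_REPLICATES
cancer_images = CANCER_PATIENTS * CANCER_REPLICATES
total_images = normal_images + cancer_images

print(normal_images, cancer_images, total_images)  # 138 432 570
```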

Since specimens are derived from surgical material, normal is here defined as non-neoplastic and morphologically normal. It is not always possible to obtain fully normal tissues, and thus several of the tissues denoted as normal will include alterations due to inflammation, degeneration and tissue remodeling. In rare tissues, hyperplasia or benign proliferations are included as exceptions. It should also be noted that within normal morphology there exist inter-individual differences and variations due to primary diseases, age, sex etc. Such differences may also affect protein expression and thereby immunohistochemical staining patterns.

Samples from cancer are also derived from surgical material. The inclusion of tumors has been based on availability and representativity. Due to subgroups and heterogeneity of tumors within each cancer type, included cases represent a typical mix of specimens from surgical pathology. However, an effort has been made to include high and low grade malignancies where applicable. In certain tumor groups, subtypes have been included, e.g. breast cancer includes both ductal and lobular cancer, lung cancer includes both squamous cell carcinoma and adenocarcinoma, and liver cancer includes both hepatocellular and cholangiocellular carcinoma. Tumor heterogeneity and inter-individual differences are also reflected in diverse expression of proteins, resulting in variable immunohistochemical staining patterns.

Annotation

In order to provide an overview of protein expression patterns, all images of immunohistochemically stained tissue were manually annotated by a board-certified pathologist or by specially educated personnel, followed by verification by a pathologist. The pathologists are experienced in interpretation of tissue morphology under the microscope and have used specially designed software to view and annotate the histological images. Annotation of each different normal and cancer tissue was performed using a simplified scheme for classification of immunohistochemical outcome. Each tissue was examined for representativity and immunoreactivity. The different tissue-specific cell types included in each normal tissue type were annotated. For each cancer, tumor cells were annotated. Basic annotation parameters included an evaluation of i) staining intensity (negative, weak, moderate or strong), ii) fraction of stained cells (rare, <25%, 25-75% or >75%) and iii) subcellular localization (nuclear and/or cytoplasmic/membranous). The manual annotation also provides a summarizing text comment for each antibody.
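The three basic annotation parameters can be pictured as a small record per cell type. The Python sketch below is only an illustration of that scheme, not the format used by the annotation software; the class names, field names and the example tissue and cell type are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Intensity(Enum):
    NEGATIVE = "negative"
    WEAK = "weak"
    MODERATE = "moderate"
    STRONG = "strong"


class Fraction(Enum):
    RARE = "rare"
    LT_25 = "<25%"
    FROM_25_TO_75 = "25-75%"
    GT_75 = ">75%"


class Localization(Enum):
    NUCLEAR = "nuclear"
    CYTOPLASMIC_MEMBRANOUS = "cytoplasmic/membranous"


@dataclass(frozen=True)
class CellTypeAnnotation:
    """One manual annotation of one cell type in one tissue (hypothetical layout)."""
    tissue: str
    cell_type: str
    intensity: Intensity
    fraction: Fraction
    # "nuclear and/or cytoplasmic/membranous": one or both values may apply.
    localization: FrozenSet[Localization]


# Hypothetical example values, chosen only to show the structure.
example = CellTypeAnnotation(
    tissue="kidney",
    cell_type="cells in tubules",
    intensity=Intensity.MODERATE,
    fraction=Fraction.GT_75,
    localization=frozenset({Localization.CYTOPLASMIC_MEMBRANOUS}),
)
print(example.intensity.value, example.fraction.value)
```

Using enumerations rather than free text keeps each record restricted to the categories listed in the annotation scheme.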

The terminology and ontology used are compliant with standards used in pathology and medical science. SNOMED classification has been used for assignment of topography and morphology. SNOMED classification also underlies the original diagnosis given for the specimens from which normal as well as cancer samples were collected.

A histological dictionary used in the annotation is available as a PDF-document, containing images which are immunohistochemically stained with antibodies included in the protein atlas. The dictionary displays subtypes of cells distinguishable from each other and also shows specific expression patterns in different intracellular structures. Annotation dictionary: screen usage (15MB), printing (95MB).

Knowledge-based annotation

Annotated protein expression aims to create a comprehensive map of protein expression patterns in normal human tissues and cells. Combining data from two or more antibodies directed towards the same protein target (non-overlapping epitopes) with an evaluation of the performance of the respective antibodies and a review of available protein/gene characterization data allows for a knowledge-based interpretation of the distribution pattern and relative abundance of proteins in various tissues. An annotation of protein expression is possible for all genes for which there are two or more antibodies directed towards the corresponding protein target.

The immunohistochemical staining pattern in normal tissues provides the foundation for the subsequent annotated protein expression. The annotation of an immunohistochemical staining pattern is subjective and based on the experienced evaluation of positive immunohistochemical signals in defined subpopulations of cells within a tissue context. The microscopic images and previous annotations of the included 82 normal cell types are reviewed simultaneously and compared. The annotation data is merged and results in a single expression profile for each protein. In addition to accounting for performance of antibodies and available protein/gene characterization data, the review also considers sub-optimal experimental procedures. These include immunostaining errors, such as sub-optimal titration of the primary antibody and suspected cross-reactivity, as well as the fact that multiple immunostainings have been performed on non-consecutive tissue microarray sections, allowing for differences in immunohistochemical staining patterns caused by inter-individual and inter-specimen variations. The final annotated protein expression is considered a best estimate and as such reflects the most probable histological distribution and relative expression level for the evaluated proteins. It is displayed as a high, medium, low or not detected level of expression.




Immunohistochemistry - cells

Description

As a complement to the representation of normal and cancer tissue, the protein atlas displays images of a selection of widely used and well characterized human cell lines as well as cell samples from healthy individuals and leukemia/lymphoma patients.

A cell microarray (CMA) has been used to enable immunohistochemical staining of a panel of cell lines and cell samples. Duplicates from 46 cell lines, 10 leukemia blood cell samples and 2 samples of PBMC yield a total of 116 cell images per antibody. Included cell lines were obtained from DSMZ, ATCC or academic research groups (kindly provided by cell line founders). Information regarding sex and age of the donor, tissue origin and source is listed here. All cells are fixed in 4% paraformaldehyde and dispersed in agarose prior to paraffin embedding and immunohistochemical staining.

The CMA enables representation of leukemia and lymphoma cell lines, covering major hematopoietic neoplasms and even different stages of differentiation. Cell lines from solid tumors are also included in the CMA. A subset originates from solid tumors not represented in the tissue microarrays (TMAs), e.g. sarcoma, choriocarcinoma and small cell lung carcinoma, and the remaining cell lines are derived from tumor types also represented in the TMAs.

The immunohistochemical protocols used result in a brown-black staining, localized where an antibody has bound to its corresponding antigen. The section is furthermore histochemically counterstained with hematoxylin to enable visualization of microscopic features. Hematoxylin staining is nonspecific and results in a blue coloring of both cells and extracellular material.

Annotation

In order to provide an overview of protein expression patterns, all images of immunohistochemically stained cell lines are annotated using automated image analysis software. The software, TMAx (Beecher Instruments, Sun Prairie, WI, USA), built on an object-oriented image analysis engine from Definiens, utilizes rule-based operations and multiple iterative segmentation processes together with fuzzy logic to identify cells and immunohistochemical stain deposits.

Output parameters from the software, always displayed in conjunction with the annotated images, are:

  • number of objects defined as cells in the image
  • staining intensity (negative, weak, moderate or strong)
  • fraction (%) of positive cells

In addition, two overlay images with additional numerical information are presented to facilitate interpretation. The information displayed includes:

  • Cell: object-based view representing the fraction (%) of immunostained cells. The color code for each cell represents a range of immunoreactivity: blue (negative/very weak), yellow (weak/moderate), orange (moderate/strong) and red (strong). This classification is based on areas of different intensities within each object (cell) and differs slightly from the subjective classification provided by manual annotation of cells in normal and cancer tissue.
  • Area: area-based view representing immunostained areas (%) within cells. The color code represents a range of immunoreactivity: yellow (weak/moderate), green (moderate/strong) and red (strong). Negative/very weak areas are transparent. The intensity score is generated from the total of this area-based analysis.
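As a rough illustration of how the cell count and the fraction of positive cells relate to the object-based color classes, the Python sketch below counts per-cell classes. It is not the TMAx algorithm; in particular, treating blue (negative/very weak) cells as not immunostained is an assumption, and the function and variable names are hypothetical.

```python
from collections import Counter

# Assumption: blue (negative/very weak) cells are counted as not immunostained.
POSITIVE_CLASSES = {"yellow", "orange", "red"}


def positive_fraction(cell_classes):
    """Return the percentage of cells whose color class counts as immunostained."""
    if not cell_classes:
        raise ValueError("no cells detected in the image")
    counts = Counter(cell_classes)
    positive = sum(counts[c] for c in POSITIVE_CLASSES)
    return 100.0 * positive / len(cell_classes)


# Hypothetical per-cell classes for one image.
cells = ["blue", "blue", "yellow", "red", "orange", "blue", "yellow", "blue"]
print(len(cells), positive_fraction(cells))  # 8 50.0
```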




Immunofluorescence - cells

Description

As a complement to the immunohistochemically stained cells and tissues, the protein atlas displays high resolution, multicolor images of immunofluorescently stained cells. This provides spatial information on protein expression patterns on a fine cellular and subcellular level.

Originally, three cell lines (U-2 OS, A-431 and U-251 MG), originating from different human tissues, were chosen for the immunofluorescent analysis. Since 2012, the cell line panel has been expanded to include additional cell lines: A-549, BJ, CACO-2, HaCaT, HEK 293, HeLa, Hep-G2, MCF-7, PC-3, RH-30, RT-4, SH-SY5Y, SiHa, SK-MEL-30 and TIME. To increase the probability that a large number of proteins are expressed, the cell lines were selected from different lineages, e.g. tumor cell lines from mesenchymal, epithelial and glial tumors. The selection was furthermore based on morphological characteristics, widespread use and the multitude of publications using these cell lines. Information regarding sex and age of the donor, cellular origin and source is listed here. For each antibody, two suitable cell lines from the panel are now selected for the immunofluorescent analysis based on RNA sequencing data. The third cell line chosen for each antibody is always U-2 OS, in order to localize the whole human proteome on a subcellular level in one cell line.

In addition to the human cell lines, the mouse cell line NIH 3T3 is also stained. This is only done for the antibodies corresponding to genes where the mouse and human genes are orthologues.

Besides the HPA antibodies, the cells are also stained with reference markers in order to facilitate the annotation of the subcellular distribution of the protein targeted by the HPA antibody. The following probes/organelles are used as references: (i) DAPI for the nucleus, (ii) anti-tubulin antibody as internal control and marker of microtubules, and (iii) calreticulin for the endoplasmic reticulum (ER).

The resulting confocal images are single-slice images representing one optical section of the cells. The microscope settings are optimized for each sample. The different organelle probes are displayed as different channels in the multicolor images: the HPA antibody staining is shown in green, the nuclear stain in blue, microtubules in red and ER in yellow.

Annotation

In order to provide an interpretation of the staining patterns, all images of immunofluorescently stained cell lines are manually annotated. For each cell line and antibody the intensity and subcellular location of the staining is described. The staining intensity is classified as negative, weak, moderate or strong based on the laser power and detector gain settings used for image acquisition in combination with the visual appearance of the image. The subcellular location is further combined with parameters describing the staining characteristics (i.e. smooth, granular, speckled, fibrous, dotty or clusters).

Knowledge-based annotation

Knowledge-based annotation of subcellular location aims to provide an interpretation of the subcellular location of a protein in three human cell lines. Combining immunofluorescence data from two or more antibodies directed towards the same protein with a review of available protein/gene characterization data allows for a knowledge-based interpretation of the subcellular location.




Western blot

Description

Western blot analysis of antibody specificity has been done using a routine sample setup composed of IgG/HSA-depleted human plasma and protein lysates from a limited number of human tissues and cell lines. Antibodies with a non-supportive routine WB have been revalidated using an over-expression lysate (VERIFY Tagged Antigen(TM), OriGene Technologies, Rockville, MD) as a positive control. Antibody binding was visualized by chemiluminescence detection in a CCD-camera system using a peroxidase (HRP) labeled secondary antibody.

Antibodies included in the Human Protein Atlas have been analyzed without further efforts to optimize the procedure and therefore it cannot be excluded that certain observed binding properties are due to technical rather than biological reasons and that further optimization could result in a different outcome.




Protein array

Description

All purified antibodies are analyzed on antigen microarrays. The specificity profile for each antibody is determined based on the interaction with 384 different antigens, including its own target. The antigens present on the arrays are consecutively exchanged in order to correspond to the next set of 384 purified antibodies. Each microarray is divided into 14 replicated subarrays, enabling the analysis of 14 antibodies simultaneously. The antibodies are detected through a fluorescently labeled secondary antibody, and a dual color system is used in order to verify the presence of the spotted proteins. A specificity profile plot is generated for each antibody, where the signal from the binding to its own antigen is compared to the nonspecific binding to all the other antigens. The vast majority of antibodies are given a pass, but a fraction fail due to either low signal or low specificity.




RNA

Description

In total, 44 cell lines and 27 tissues have been analyzed by RNA-seq to estimate the transcript abundance of each protein-coding gene.

For cell lines, early-split samples were used as duplicates and total RNA was extracted using the RNeasy mini kit. Information regarding cellular origin and source of each cell line is listed here.

For normal tissue, specimens were collected with consent from patients and all samples were anonymized in accordance with approval from the local ethics committee (ref #2011/473) and Swedish rules and legislation. All tissues were collected from the Uppsala Biobank and RNA samples were extracted from frozen tissue sections.

For a total of 86 cell line samples and 95 tissue samples, mRNA sequencing was performed on Illumina HiSeq2000 and 2500 machines (Illumina, San Diego, CA, USA) using the standard Illumina RNA-Seq protocol with a read length of 2x100 bases. Transcript abundance estimation was performed using Tophat v2.0.3 and Cufflinks v2.0.2. For each gene, FPKM values ('number of Fragments Per Kilobase gene model and Million reads') were calculated, and the average FPKM value for replicate samples was used as the abundance score. The threshold level to detect presence of a transcript for a particular gene was set to > 1 FPKM.
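The FPKM definition, the averaging over replicates and the detection threshold can be sketched in a few lines of Python. This shows only the textbook FPKM formula; Cufflinks estimates abundances with a statistical model, so its values will not match this simple calculation exactly. The function names and the example counts are hypothetical.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of gene model per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def abundance_score(replicate_fpkms):
    """Average FPKM over the replicate samples of one cell line or tissue."""
    return sum(replicate_fpkms) / len(replicate_fpkms)


def is_detected(score, threshold=1.0):
    """A transcript counts as detected when its score is above 1 FPKM."""
    return score > threshold


# Hypothetical counts for one gene (2,000 bp gene model) in two replicates.
rep1 = fpkm(fragments=500, gene_length_bp=2000, total_mapped_fragments=25_000_000)
rep2 = fpkm(fragments=300, gene_length_bp=2000, total_mapped_fragments=20_000_000)
score = abundance_score([rep1, rep2])
print(rep1, rep2, score, is_detected(score))  # 10.0 7.5 8.75 True
```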

FPKM thresholds were further set for categorization of transcript expression levels into low, medium or high RNA abundance.

Abundance      FPKM tissue   FPKM cell line
Not detected   0-1           0-1
Low            1-10          1-20
Medium         10-50         20-50
High           >50           >50
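The thresholds in the table can be written as a small lookup function. In the Python sketch below, the handling of values that fall exactly on a boundary (1, 10, 20 or 50 FPKM) is an assumption, since the table ranges share their end points; the only boundary fixed by the text is that detection requires more than 1 FPKM. The function name and the sample type labels are hypothetical.

```python
# Upper limit of the "Low" category differs between tissues and cell lines.
LOW_UPPER = {"tissue": 10.0, "cell line": 20.0}


def abundance_category(fpkm, sample_type):
    """Map an FPKM abundance score to Not detected / Low / Medium / High."""
    if fpkm < 0:
        raise ValueError("FPKM cannot be negative")
    low_upper = LOW_UPPER[sample_type]
    if fpkm <= 1.0:          # detection requires > 1 FPKM
        return "Not detected"
    if fpkm <= low_upper:    # assumption: upper boundary belongs to "Low"
        return "Low"
    if fpkm <= 50.0:         # assumption: 50 FPKM itself is still "Medium"
        return "Medium"
    return "High"


print(abundance_category(15.0, "tissue"))     # Medium
print(abundance_category(15.0, "cell line"))  # Low
```

The same value of 15 FPKM lands in different categories for a tissue and for a cell line, which is the practical consequence of the two threshold columns.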

The RNA-seq data was used to classify all genes according to their tissue-specific expression into one of eight different categories, defined based on the total set of all FPKM values in 27 tissues:

  • highly tissue enriched (expression in one tissue at least 50-fold higher than all other tissues)
  • moderately tissue enriched (expression in one tissue at least five-fold higher than all other tissues)
  • group enriched (5-fold higher average FPKM level in a group of two to seven tissues compared to all other tissues)
  • mixed low (detected in 1-26 tissues and at least one tissue < 10 FPKM)
  • mixed high (detected in 1-26 tissues and all detected tissues > 10 FPKM)
  • expressed in all low (all tissues > 1 FPKM, at least one tissue < 10 FPKM)
  • expressed in all high (all tissues > 10 FPKM)
  • not detected (< 1 FPKM in all 27 tissues)
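The eight categories can be read as a decision procedure over the FPKM values of one gene. The Python sketch below is one possible reading, not the pipeline actually used: the order in which the rules are tested, the use of the highest remaining tissue as the comparison value for "all other tissues", and the restriction of candidate groups to the top-ranked tissues are all assumptions. The function name and tissue labels are hypothetical.

```python
def classify_tissue_specificity(fpkm_by_tissue, detection_threshold=1.0):
    """Assign one of the eight categories to a gene from its per-tissue FPKM values."""
    values = sorted(fpkm_by_tissue.values(), reverse=True)
    if len(values) < 2:
        raise ValueError("need FPKM values for at least two tissues")

    detected = [v for v in values if v > detection_threshold]
    if not detected:
        return "not detected"

    top, second = values[0], values[1]
    if top >= 50 * second:
        return "highly tissue enriched"
    if top >= 5 * second:
        return "moderately tissue enriched"

    # Assumption: a group is the k highest tissues (k = 2..7), compared with
    # the highest value among the remaining tissues.
    for k in range(2, min(7, len(values) - 1) + 1):
        group, rest = values[:k], values[k:]
        if sum(group) / k >= 5 * rest[0]:
            return "group enriched"

    all_detected = len(detected) == len(values)
    all_high = all(v > 10 for v in detected)
    if all_detected:
        return "expressed in all high" if all_high else "expressed in all low"
    return "mixed high" if all_high else "mixed low"


# Hypothetical profile: 26 tissues at 2 FPKM and one tissue at 150 FPKM.
profile = {f"tissue_{i}": 2.0 for i in range(1, 28)}
profile["tissue_1"] = 150.0
print(classify_tissue_specificity(profile))  # highly tissue enriched
```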




Evidence

Description

For each gene, a protein evidence summary score was calculated based on three parameters: UniProt protein existence (UniProt evidence), transcript profiling categories (RNA evidence) and a Protein Atlas antibody based score (HPA evidence).

The UniProt protein existence data was assigned to the following classes:

  • evidence at protein level (class 1)
  • evidence at transcript level (class 2)
  • inferred from homology (class 3)
  • predicted (class 4)
  • uncertain (class 5)

The UniProt protein ids were mapped to genes from Ensembl version 73.37.

The RNA evidence was based on the gene abundance level described in the RNA section above.

The HPA evidence was calculated based on the manual curation of Western blot, tissue profiling and subcellular location as described in Supplementary Table 1 in Fagerberg et al.

The protein evidence summary score for each gene was assigned as follows:

  • “High” if the gene had both UniProt evidence class 1 and “High” HPA evidence
  • “Medium” if the gene had UniProt evidence class 1 or “High” HPA evidence
  • “Low” if the HPA evidence was “Medium” and the UniProt evidence class was 2, 3, 4 or 5
  • “Only RNA” if the UniProt evidence class was 2 or the RNA evidence was “High”
  • “None” if the RNA evidence was “Medium” or lower and the UniProt evidence class was 3, 4 or 5
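The summary score rules overlap (for example, a gene with UniProt class 1 and “High” HPA evidence satisfies both the “High” and the “Medium” condition), so the Python sketch below assumes they are tested in the order given, with the first match winning. That ordering is an assumption, as are the function name and the string encoding of the evidence levels.

```python
def protein_evidence_summary(uniprot_class, hpa_evidence, rna_evidence):
    """Combine UniProt, HPA and RNA evidence into one summary score.

    uniprot_class: integer 1-5 (UniProt protein existence class).
    hpa_evidence, rna_evidence: level strings such as "High" or "Medium";
    any other value is treated as a lower level.
    """
    if uniprot_class not in {1, 2, 3, 4, 5}:
        raise ValueError("UniProt evidence class must be 1-5")

    # Assumption: rules are evaluated top to bottom, first match wins.
    if uniprot_class == 1 and hpa_evidence == "High":
        return "High"
    if uniprot_class == 1 or hpa_evidence == "High":
        return "Medium"
    if hpa_evidence == "Medium":   # UniProt class is 2-5 at this point
        return "Low"
    if uniprot_class == 2 or rna_evidence == "High":
        return "Only RNA"
    return "None"                  # UniProt class 3-5, RNA "Medium" or lower


print(protein_evidence_summary(1, "High", "Low"))     # High
print(protein_evidence_summary(3, "Medium", "High"))  # Low
print(protein_evidence_summary(4, "Low", "High"))     # Only RNA
```

Under this ordering the final fall-through case coincides with the stated “None” condition, since every gene reaching it has UniProt class 3, 4 or 5 and RNA evidence below “High”.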
