DATA QUALITY ASSURANCE AND SCORING

Quality assurance

Validation score

    Immunohistochemistry (IH)

    Immunofluorescence (IF)

    Western blot (WB)

    Protein array (PA)

Reliability score

    Immunohistochemistry (IH)

    Immunofluorescence (IF)

RNA approval - cells




Quality assurance

The usefulness of antibodies in different assays depends on both the sensitivity and the specificity of epitope binding. The quality of antibodies in the database is monitored through a number of quality assurance steps. Below is a list of measures taken to ensure that the quality of produced and utilized PrEST antibodies is acceptable. All PrEST antibodies must pass steps 1-3 in order to be used for immunohistochemistry. Steps 4-5 provide a basis for evaluating and scoring antibody validity. All antibodies that provide a reasonable pattern of immunoreactivity are added to the Human Protein Atlas portal. Feedback from the research community is appreciated and needed for continuous curation of data.

Quality assurance steps for PrEST antibodies generated within the Human Protein Atlas project:

  1. Plasmid inserts are sequenced to assure that the correct PrEST sequence is cloned.
  2. Size of resulting recombinant protein (including the specific PrEST) is analyzed using mass spectrometry to assure that the correct antigen has been produced and purified.
  3. To control for cross-reactivity, affinity purified antibodies are tested for sensitivity and specificity on protein arrays consisting of glass slides with spotted PrEST fragments.
  4. Antibody specificity is analyzed using Western blot in a standardized setup. Total protein lysates from a limited number of tissues (liver and tonsil), cell lines (RT-4 and U-251 MG), and human plasma are used to evaluate the antibody target binding in a Western blot setting. Antibodies with a non-supportive routine WB have been revalidated using an over-expression lysate (VERIFY Tagged Antigen(TM), OriGene Technologies, Rockville, MD) as a positive control.
  5. Immunohistochemical staining of normal and cancer tissue is examined by trained pathologists to assure plausible immunohistochemical staining properties.

For commercially available antibodies (CABs), immunohistochemistry has been performed in a manner similar to that used for HPA antibodies. These antibodies have also been tested on Western blots. For each commercially available antibody, a link to the antibody provider is given on the "Antibody/Antigen" page.





Validation score

The validation score indicates how well the quality assurance data supports the specificity of the antibody towards the expected human target protein.

For antibodies supplied through commercial or other academic sources, we provide our Western blot and immunohistochemistry validation scores. For further validation, we refer to the quality controls provided by the respective company.

All validation scores are classified into two main categories, Supportive and Uncertain. A third category, Non-supportive, appears in the assay-specific criteria below.


Immunohistochemistry (IH)

A validation score for immunohistochemistry is assigned to every antibody and reflects the results of immunostaining. If a single antibody is available for the target protein, the validation score is mainly based on conformance of the expression pattern to available gene/protein characterization data in the scientific literature and data from bioinformatic predictions. Extensive or sufficient gene/protein data requires evidence of existence at the protein level and a substantial quantity of published experimental data in the literature and public databases. If two or more antibodies are available, the validation score also depends heavily on the similarity between the staining patterns generated in normal tissue. The validation score should not be confused with the reliability score, which is based on knowledge-based annotation of protein expression. The statements below are used to classify antibodies into the validation categories. The decision tree provides an overview of the validation score corresponding to each statement.

Supportive

  • Two (or more) antibodies yielding similar staining patterns which are consistent with available gene/protein characterization data.
  • Two (or more) antibodies yielding similar staining patterns which are partly consistent with gene/protein characterization data or consistent with limited gene/protein characterization data.
  • One antibody yielding a staining pattern which is consistent with gene/protein characterization data.
  • Two (or more) antibodies yielding partly similar staining patterns which are consistent with gene/protein characterization data.
  • Two (or more) antibodies yielding dissimilar staining patterns which are consistent with available gene/protein characterization data.

Uncertain

  • Two (or more) antibodies yielding similar or partly similar staining patterns and there is no available gene/protein characterization data.
  • Two (or more) antibodies yielding similar or partly similar staining patterns which are partly consistent with or contradicted by limited gene/protein characterization data.
  • Two (or more) antibodies yielding partly similar staining patterns which are partly consistent with gene/protein characterization data or consistent with limited gene/protein characterization data.
  • One antibody yielding a staining pattern which is partly consistent with gene/protein characterization data or consistent with limited gene/protein characterization data.
  • Two (or more) antibodies yielding dissimilar staining patterns which are partly consistent with gene/protein characterization data or consistent with limited gene/protein characterization data.
  • One antibody yielding a staining pattern and there is no available gene/protein characterization data or the staining pattern is partly consistent with or contradicted by limited gene/protein characterization data.
  • Two (or more) antibodies yielding dissimilar or partly similar staining patterns and there is no available gene/protein characterization data or only limited gene/protein characterization data.

Non-supportive

  • Two (or more) antibodies yielding dissimilar staining patterns which are contradicted by gene/protein characterization data.
  • One antibody yielding a staining pattern contradicted by gene/protein characterization data.
  • No staining.

Non-supportive antibodies are not published
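
The single-target criteria above can be read as a small decision function. The following is only a sketch of one possible reading, not the project's actual implementation. The function name and the four `data_support` labels are hypothetical groupings introduced for illustration: "consistent", "partly" (partly consistent, or consistent with limited data), "weak" (no data, or partly consistent with or contradicted by limited data) and "contradicted".

```python
def ih_validation_score(n_antibodies, data_support, similarity=None, staining=True):
    """Illustrative reading of the single-target IH validation criteria.

    data_support: "consistent", "partly", "weak" or "contradicted"
                  (hypothetical labels, see the lead-in text).
    similarity:   "similar", "partly similar" or "dissimilar";
                  only meaningful when n_antibodies >= 2.
    """
    if not staining:
        return "Non-supportive"  # "No staining."
    if data_support == "consistent":
        # Supportive for one antibody and for every similarity level.
        return "Supportive"
    if data_support == "partly":
        # Only paired antibodies with similar patterns stay Supportive.
        if n_antibodies >= 2 and similarity == "similar":
            return "Supportive"
        return "Uncertain"
    if data_support == "weak":
        return "Uncertain"
    if data_support == "contradicted":
        if n_antibodies == 1 or similarity == "dissimilar":
            return "Non-supportive"
        # Similar or partly similar patterns contradicted by the data
        # are not covered by the listed statements.
        return None
    raise ValueError(f"unknown data_support: {data_support!r}")


print(ih_validation_score(2, "partly", "similar"))
print(ih_validation_score(1, "weak"))
```

Returning `None` for the one combination the statements do not list keeps the gap visible instead of guessing a category for it.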

The validation of multi-targeting antibodies (antibodies targeting proteins encoded by two or more genes) is mainly based on conformance of the expression pattern to available gene/protein characterization data. Similarity between paired antibodies is not taken into account due to the complexity of multiple gene targets.

Supportive

  • The multi-targeting antibody yielding a staining pattern consistent with gene/protein characterization data for all of the genes.

Uncertain

  • The multi-targeting antibody yielding a staining pattern partly consistent with gene/protein characterization data or consistent with limited gene/protein characterization data for all of the genes.
  • The multi-targeting antibody yielding a staining pattern consistent with gene/protein characterization data for at least one of the genes whereas any of the other genes has no available gene/protein characterization data or is contradicted by gene/protein characterization data.
  • The multi-targeting antibody yielding a staining pattern partly consistent with limited gene/protein characterization data or there is no available gene/protein characterization data for all or at least one of the genes.
  • The multi-targeting antibody yielding a staining pattern partly consistent with characterization data for at least one of the genes whereas any of the other genes has no available gene/protein characterization data or is contradicted by gene/protein characterization data.

Non-supportive

  • Multi-targeting antibody yielding a staining pattern contradicted by gene/protein characterization data for all of the genes.
  • No staining.

Non-supportive antibodies are not published







Immunofluorescence (IF)

A validation score for the observed staining is assigned for each cell line and is classified as Supportive, Uncertain or Non-supportive based on concordance with available experimental gene/protein characterization data in the UniProtKB/Swiss-Prot database. The validation scores for the three cell lines are merged into one of the main categories (Supportive, Uncertain or Non-supportive) to represent the antibody staining in all analyzed cell lines. The decision tree provides an overview of the validation category corresponding to each statement.

Validation scores for Immunofluorescence:

Supportive

  • One/multiple location(s) supported by experimental gene/protein characterization data and ≥1 other antibody.
  • One/multiple location(s) with no available experimental gene/protein characterization data or partly supported and partly conflicted, but supported by ≥1 other antibody.
  • One/multiple location(s) supported by experimental gene/protein characterization data.
  • Multiple locations partly supported (at least one) by experimental gene/protein characterization data.
  • One/multiple location(s) in cytoplasm (e.g. Golgi, mitochondria) supported by experimental evidence for cytoplasmic localization.

Uncertain

  • Location not consistent with experimental gene/protein characterization data, but supported by ≥1 other antibody.
  • One/multiple location(s) with no available experimental gene/protein characterization data.
  • Not decisive - One/multiple location(s) where experimental gene/protein characterization data is partly supporting and partly conflicting.
  • No staining.
  • One/multiple location(s) supported by experimental gene/protein characterization data but showing dissimilar staining to ≥1 other antibody.
  • One/multiple location(s) with no available experimental gene/protein characterization data or partly supported and partly conflicted, but showing dissimilar staining to ≥1 other antibody.

Non-supportive

  • Location not consistent with experimental gene/protein characterization data.
  • Location not consistent with experimental gene/protein characterization data and showing dissimilar staining to ≥1 other antibody.

The validation of multi-targeting antibodies (antibodies targeting proteins encoded by two or more genes) is based on the conformance of the expression pattern to available gene/protein characterization data. Similarity between paired antibodies is not taken into account due to the complexity of multiple gene targets.

Validation scores for Immunofluorescence - multi-targeting antibodies:

Supportive

  • The multi-targeting antibody (targeting proteins encoded by two or more genes) yielding a staining pattern consistent with available gene/protein characterization data for all of the genes.
  • The multi-targeting antibody (targeting proteins encoded by two or more genes) yielding a staining pattern partly consistent with available gene/protein characterization data for all of the genes.

Uncertain

  • The multi-targeting antibody yielding a staining pattern with no available gene/protein characterization data.
  • The multi-targeting antibody yielding a staining pattern consistent with available gene/protein characterization data for at least one of the genes but not all.
  • The multi-targeting antibody not yielding a staining pattern.

Non-supportive

  • The multi-targeting antibody yielding a staining pattern not consistent with available gene/protein characterization data.






Western blot (WB)

Supportive

  • Bands corresponding to the predicted size in kDa (+/-20%).
  • Band of predicted size in kDa (+/-20%) with additional bands present.

Uncertain

  • Single band larger than predicted size in kDa (+20%) but partly supported by predicted transmembrane region, signal peptide or by other available data.
  • No bands detected.
  • Single band differing more than +/-20% from predicted size in kDa and not supported by predicted transmembrane region, signal peptide or by other available data.

Non-supportive

  • Weak band of predicted size in kDa (+/-20%) but with additional bands of higher intensity also present.
  • Only bands not corresponding to the predicted size.
  • Target too small/large to be analyzed with the present setup.

For antibodies showing non-supportive Western blot data the corresponding image is not shown.
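
As an illustration of how the +/-20% size window drives these categories, the sketch below classifies a list of detected bands. It is a simplified reading under stated assumptions, not the scoring procedure actually used: the `Band` type, the function name and the `analyzable` flag are hypothetical, "only bands not corresponding to the predicted size" is interpreted as two or more off-size bands, and any single off-size band is treated as Uncertain whether or not other data support it.

```python
from dataclasses import dataclass


@dataclass
class Band:
    size_kda: float   # apparent molecular weight of the band
    intensity: float  # relative band intensity, arbitrary units


def classify_western_blot(predicted_kda, bands, tolerance=0.20, analyzable=True):
    """Illustrative reading of the Western blot validation criteria."""
    if not analyzable:
        # Target too small/large to be analyzed with the present setup.
        return "Non-supportive"
    if not bands:
        return "Uncertain"  # no bands detected

    low = predicted_kda * (1 - tolerance)
    high = predicted_kda * (1 + tolerance)
    on_target = [b for b in bands if low <= b.size_kda <= high]
    off_target = [b for b in bands if not (low <= b.size_kda <= high)]

    if on_target:
        strongest = max(b.intensity for b in on_target)
        # A band of predicted size that is weaker than an additional
        # band is non-supportive; otherwise extra bands are tolerated.
        if any(b.intensity > strongest for b in off_target):
            return "Non-supportive"
        return "Supportive"

    # No band within the window.
    if len(off_target) == 1:
        return "Uncertain"       # single band of unexpected size
    return "Non-supportive"      # only bands of unexpected size


print(classify_western_blot(50, [Band(52, 1.0), Band(90, 0.4)]))
```

For a predicted size of 50 kDa the window spans 40 to 60 kDa, so the 52 kDa band in the usage line counts as a band of predicted size and the weaker 90 kDa band as an additional band.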





Protein array (PA)

Supportive

  • Pass with single peak corresponding to interaction only with its own antigen.

Uncertain

  • Pass with quality comment low specificity (binding to 1-2 PrESTs >15% and <40%).

Non-supportive

  • No or weak signal.
  • Low specificity (one antigen with >40% signal or more than two antigens with signal >15%).

Antibodies that are validated as non-supportive are not published.
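
The 15% and 40% thresholds can be sketched in code as follows. This is a hedged illustration only. It assumes that the percentages describe the signal on other PrESTs relative to the signal on the antibody's own antigen, which the text does not state. The function name and the `min_own_signal` cutoff for a "weak" signal are hypothetical, and a value of exactly 40%, which the criteria leave undefined, is treated conservatively as Non-supportive.

```python
def classify_protein_array(own_signal, off_target_pct, min_own_signal=1.0):
    """Illustrative reading of the protein array validation criteria.

    own_signal:     signal on the antibody's own antigen (arbitrary units).
    off_target_pct: signals on other PrESTs, as percentages
                    (assumed to be relative to own_signal).
    min_own_signal: hypothetical cutoff below which the signal counts
                    as "no or weak"; not given in the text.
    """
    if own_signal < min_own_signal:
        return "Non-supportive"  # no or weak signal

    # Off-target interactions above the 15% level.
    cross = [p for p in off_target_pct if p > 15]

    if any(p >= 40 for p in cross) or len(cross) > 2:
        return "Non-supportive"  # low specificity
    if cross:
        return "Uncertain"       # pass, but with 1-2 PrESTs in the 15-40% band
    return "Supportive"          # single peak on the own antigen


print(classify_protein_array(100.0, [2, 5, 1]))
print(classify_protein_array(100.0, [20, 3]))
```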





Reliability score

A reliability score is set for proteins where two or more antibodies are available and where a knowledge-based annotation of protein expression for IH or IF has been performed. The reliability of the annotated protein expression data is also scored as supportive or uncertain depending on similarity in immunostaining patterns and consistency with available protein/gene characterization data.

Immunohistochemistry (IH)

A similar immunostaining pattern implies that two or more antibodies directed towards the same protein target show the same cellular and subcellular distribution pattern in a vast majority of analyzed normal tissues. A partly similar immunostaining pattern implies that two or more antibodies directed towards the same protein target show the same cellular and subcellular distribution pattern in a majority of analyzed normal tissues, but that the distribution of positivity differs between antibodies in a subset of analyzed tissues. Available protein/gene data refers both to experimental data regarding gene expression (transcript and protein level) found in published literature and public databases, and to bioinformatic predictions. Extensive or sufficient protein/gene data requires evidence of existence at the protein level and a substantial quantity of published experimental data in the literature and public databases. Limited protein/gene data does not require evidence of existence at the protein level and refers to genes for which only bioinformatic predictions and scarce published experimental data are available.

The reliability scores are based on the following criteria:

Supportive

  • Similar or partly similar immunostaining pattern, and RNA expression data consistent or mainly consistent with protein expression data.
  • Similar or partly similar immunostaining pattern, and extensive available protein/gene characterization data with support for the cellular distribution of immunoreactivity, RNA expression data unavailable or cannot be evaluated.

Uncertain

  • Similar or partly similar immunostaining pattern, and limited protein/gene characterization data without support for the cellular distribution of immunoreactivity, RNA expression data not consistent with protein expression data.
  • Dissimilar immunostaining pattern, and there is no available protein/gene characterization data supporting cellular distribution of immunoreactivity.





Immunofluorescence (IF)

The reliability scores are based on the following criteria:

Supportive

  • Two independent antibodies yielding similar or partly similar staining patterns.
  • Two independent antibodies yielding dissimilar staining patterns, both supported by experimental gene/protein characterization data.
  • One antibody yielding a staining pattern supported by experimental gene/protein characterization data.
  • One antibody yielding a staining pattern with no available experimental gene/protein characterization data, but supported by another assay within the Human Protein Atlas.

Uncertain

  • Two independent antibodies yielding partly similar staining patterns that are not consistent with experimental gene/protein characterization data.
  • Two independent antibodies yielding dissimilar staining patterns with no available, or contradicted by, experimental gene/protein characterization data.
  • One antibody yielding a staining pattern with no available, or contradicted by, experimental gene/protein characterization data.





RNA approval - cells

Antibodies used for the analysis of protein expression in cell lines were validated by comparison of immunohistochemical staining results with available transcript data in 44 cell lines. For two cell lines (LP-1 and Hth83) transcript data is missing.

Several approval criteria were applied in order to adequately assess the quality of each antibody; they are listed in the table below. One of the basic strategies is the Spearman correlation between continuous values of IHC quantification and FPKM values across the set of cell lines. In addition, categorized expression levels (low, medium and high), set by arbitrary threshold values, are compared in order to avoid the difficulty of comparing continuous numbers generated by two methods that differ vastly in accuracy and sensitivity. In brief, approval is performed automatically from the generated expression values and is designed as a funnel in which antibodies are tested against the selection criteria in descending order of stringency. Antibodies approved according to the more stringent criteria are denoted "supportive antibodies" (marked with a star in the Human Protein Atlas), while the remaining antibodies are denoted "uncertain".

Approval category | Criteria | Supportive
------------------|----------|-----------
Expression lymphoid cell lines | ≥20% of lymphoid cell lines medium/high (the same for RNA and protein) AND 100% of remaining cell lines no/low. | Yes
Expression myeloid cell lines | ≥25% of lymphoid cell lines medium/high (the same for RNA and protein) AND 100% of remaining cell lines no/low. | Yes
Expression hematopoietic cell lines | ≥40% of lymphoid cell lines medium/high, one solid tumor cell line allowed to be moderate, remaining solid tumor cell lines no/low. | Yes
Expression solid tumor cell lines | ≥40% of lymphoid cell lines medium/high, one hematopoietic cell line allowed to be moderate, remaining hematopoietic cell lines no/low. | Yes
Expression epithelial cell lines | ≥20% of 15 epithelial cell lines medium/high (the same for RNA and protein) AND 100% of remaining cell lines no/low. | Yes
Expression single cell line | Only 1 cell line medium/high (the same for RNA and protein) AND ≤50% of remaining cell lines low (the rest no expression). | Yes
Expression subset of cell lines | 2-10 cell lines medium/high (the same for RNA and protein) AND ≤50% of remaining cell lines low (the rest no expression). | Yes
Correlation ≥0.65 | Spearman correlation ≥0.65 across 44 cell lines. | Yes
All high/all medium/all low | Either high, medium or low expression across all 44 cell lines. One cell line allowed to deviate in each category.* | Yes
Congruent expression levels | Congruent detection (RNA and protein) of no/low and medium/high expression in all 44 cell lines. | Yes
Congruent expression, highest/lowest | Protein and transcript data reveal detection of expression above threshold in the same cell lines. Additional criteria on cell lines with highest/lowest expression.** | Yes
Correlation ≥0.55, highest/lowest | Spearman correlation ≥0.55 across 44 cell lines, in addition either the cell line with the highest level or the lowest level of expression must be congruent (RNA and protein). | No
Congruent expression | Protein and transcript data reveal detection of expression above threshold in the same cell lines. | No
All no expression | No expression detected across all 44 cell lines. | No
All no expression/low expression | No or low expression detected in all 44 cell lines. | No
* For all no, one cell line is allowed to show low expression.
   For all low, one cell line is allowed to show no or medium expression.
   For all medium, one cell line is allowed to show low or high expression.
   For all high, one cell line is allowed to show medium expression.
** Transcript and protein data must congruently identify the:
   cell line with the highest expression AND the three cell lines with the lowest expression OR
   cell line with the lowest expression AND the three cell lines with the highest expression OR
   the seven cell lines with the highest expression OR
   the seven cell lines with the lowest expression.
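
To make the two correlation-based rows of the table concrete, the sketch below computes a Spearman correlation between protein (IHC) and transcript (FPKM) values and applies the 0.65 and 0.55 thresholds in descending order of stringency. It is an illustration under assumptions rather than the actual approval pipeline: all function names are hypothetical, the inputs are assumed to be dictionaries mapping cell line names to values, and only these two rows of the funnel are covered.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None  # undefined when one variable is constant
    return cov / (vx * vy) ** 0.5


def correlation_approval(protein, rna):
    """Apply the two correlation rows; returns (category, label) or None."""
    lines = sorted(set(protein) & set(rna))  # cell lines with both values
    rho = spearman([protein[c] for c in lines], [rna[c] for c in lines])
    if rho is None:
        return None
    if rho >= 0.65:
        return ("Correlation >=0.65", "supportive")
    top_same = max(lines, key=protein.get) == max(lines, key=rna.get)
    bottom_same = min(lines, key=protein.get) == min(lines, key=rna.get)
    if rho >= 0.55 and (top_same or bottom_same):
        return ("Correlation >=0.55, highest/lowest", "uncertain")
    return None  # would fall through to the remaining criteria


protein = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
rna = {"A": 3.0, "B": 2.0, "C": 1.0, "D": 4.0, "E": 5.0}
print(correlation_approval(protein, rna))
```

In the usage example the rank correlation is 0.6 and both data sets agree on the cell line with the highest expression, so the antibody falls into the less stringent correlation row and is labelled "uncertain".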
