Subcellular resource - Methods summary

Summary: The subcellular resource provides information about the subcellular localization of proteins in human cell lines, including specialized sections for ciliated cell lines and induced pluripotent stem cells (iPSCs). Some proteins have also been analyzed in human sperm cells. The resource also includes observations of cell-to-cell variability in protein expression, as well as a detailed analysis of cell cycle correlations in protein and RNA expression for a subset of the variable genes.

Key publications:

  • Thul PJ et al. (2017) "A subcellular map of the human proteome" Science 356(6340):eaal3321
  • Mahdessian D et al. (2021) "Spatiotemporal dissection of the cell cycle with single-cell proteogenomics" Nature 590(7847):649-654
  • Hansen JN et al. (2025) "Intrinsic heterogeneity of primary cilia revealed through spatial proteomics" Cell S0092-8674(25)01029-3

Learn about

  • The subcellular distribution of proteins in human cell lines.
  • The proteomes of different organelles and subcellular structures.
  • Single-cell variability in the expression levels and/or localizations of proteins.

Data overview

Data type | Count | Data | Coverage (nr genes)
Protein location | 49 | Protein location data across 13603 genes | 13603
SubCell embeddings | | SubCell embeddings for ICC-IF images |

How has the data been generated?

For standard human cell lines, the subcellular distribution of each protein is assayed in up to three human cell lines selected from a panel of 42 cell lines. Most proteins have been stained in U2OS and in two additional cell lines selected based on mRNA expression of the corresponding gene. Some proteins have also been stained in one or more ciliated cell lines, in human induced pluripotent stem cells (iPSCs) and/or in human sperm cells originating from a single healthy donor. In addition to the human cells, many proteins have been stained in the mouse cell line NIH 3T3, provided that the human and mouse genes are orthologous.
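The selection of cell lines described above can be pictured as a simple ranking step. The Python sketch below is a hypothetical illustration, not the HPA's actual procedure: it assumes the two additional cell lines are simply the ones with the highest mRNA expression of the gene, and the function name `select_cell_lines` and the expression values are invented for the example.

```python
# Hypothetical sketch: choose the cell lines in which a protein is stained.
# Assumption: the extra lines are those (other than U2OS) with the highest
# mRNA expression of the gene; the real HPA selection criteria may differ.


def select_cell_lines(rna_expression: dict[str, float], n_extra: int = 2,
                      default_line: str = "U2OS") -> list[str]:
    """Return the default cell line plus the n_extra highest-expressing others.

    rna_expression maps cell line name -> mRNA expression level of the gene.
    """
    others = [line for line in rna_expression if line != default_line]
    # Highest expression first; ties are broken alphabetically for determinism.
    ranked = sorted(others, key=lambda line: (-rna_expression[line], line))
    return [default_line] + ranked[:n_extra]


# Toy expression values, invented for illustration only.
expression = {"U2OS": 12.0, "A-431": 3.5, "HEK293": 40.1, "MCF7": 18.7, "HeLa": 0.4}
print(select_cell_lines(expression))  # ['U2OS', 'HEK293', 'MCF7']
```

Ranking is only one plausible reading of "selected based on mRNA expression"; in practice other factors could also influence which cell lines are chosen.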

The cells are grown in 96-well glass-bottom plates (Figure 1). The location of proteins is determined by indirect immunofluorescence (ICC-IF) staining followed by high-resolution confocal microscopy imaging. The cells are fixed in 4% formaldehyde (PFA), or sometimes in methanol, and permeabilized with Triton X-100. The target protein is detected with an in-house polyclonal antibody generated within the HPA project or, in some cases, with an antibody from a commercial source. For each gene, the fixation method (PFA or methanol) and the antibody dilutions are stated in the Antibodies and Validation section.

To facilitate annotation of the subcellular localization of the protein targeted by the HPA antibody, the cells are also stained with reference markers: (i) DAPI for the nucleus, (ii) an anti-tubulin antibody for microtubules, and (iii) an anti-calreticulin or anti-KDEL antibody for the endoplasmic reticulum (ER). For ciliated cell lines, an antibody targeting ARL13B has been used to mark primary cilia and an antibody targeting pericentrin (PCNT) has been used to mark basal bodies. In human sperm, an antibody targeting acetylated tubulin has been used as a marker for flagella and an antibody targeting citrate synthase (CS) as a marker for mitochondria. The primary antibodies are detected with species-specific secondary antibodies labelled with different fluorophores (Alexa Fluor 488, Alexa Fluor 555 and Alexa Fluor 647).

The cells are imaged using a laser scanning confocal microscope with a 63x objective. The different fluorophores are displayed as separate channels in multicolor images, with the protein of interest shown in green, the nucleus in blue, microtubules (or ARL13B/acetylated tubulin) in red and the ER (or PCNT/CS) in yellow. For standard cell lines, the resulting confocal images are single-slice images representing one optical section of the cells. For ciliated cell lines, sperm cells and iPSCs, the cells have instead been imaged in multiple consecutive optical sections, and the images are displayed as z-stacks.


Figure 1. Schematic overview of the immunofluorescence workflow used in the subcellular resource.

Read more about antibody generation

How has the data been analyzed?

All images are analyzed manually. This involves describing various aspects of the staining characteristics and classifying the localization of the target protein into one or more of 49 different organelles and subcellular structures, listed below with their corresponding Gene Ontology (GO) terms. Trained experts recognize these structures from the staining pattern of the target antibody together with the reference markers.

Subcellular location GO term
Acrosome GO:0001669
Actin filaments GO:0015629
Aggresome GO:0016235
Annulus GO:0097227
Basal body GO:0036064
Calyx GO:0120238
Cell Junctions GO:0030054
Centriolar satellite GO:0034451
Centrosome GO:0005813
Cleavage furrow GO:0032154
Connecting piece GO:0120212
Cytokinetic bridge GO:0045171
Cytoplasmic bodies GO:0036464
Cytosol GO:0005829
End piece GO:0097229
Endoplasmic reticulum GO:0005783
Endosomes GO:0005768
Equatorial segment
Flagellar centriole GO:0005814
Focal adhesion sites GO:0005925
Golgi apparatus GO:0005794
Intermediate filaments GO:0045111
Kinetochore GO:0000776
Lipid droplets GO:0005811
Lysosomes GO:0005764
Microtubule ends GO:1990752
Microtubules GO:0015630
Mid piece GO:0097225
Midbody GO:0030496
Midbody ring GO:0090543
Mitochondria GO:0005739
Mitotic chromosome GO:0005694
Mitotic spindle GO:0072686
Nuclear bodies GO:0016604
Nuclear membrane GO:0031965
Nuclear speckles GO:0016607
Nucleoli GO:0005730
Nucleoli fibrillar center GO:0001650
Nucleoli rim GO:0005730
Nucleoplasm GO:0005654
Perinuclear theca GO:0033011
Peroxisomes GO:0005777
Plasma membrane GO:0005886
Primary cilium GO:0005929
Primary cilium tip GO:0097542
Primary cilium transition zone GO:0035869
Principal piece GO:0097228
Rods & Rings
Vesicles GO:0043231

The analysis also involves comparing the staining patterns of one antibody across different cell lines and of different antibodies targeting the same protein, as well as comparison with external experimental evidence for protein localization found in the UniProtKB/Swiss-Prot database. Main and additional localizations are given a reliability score (Enhanced/Supported/Approved/Uncertain) that reflects the agreement with external data and the potential existence of internal data that can be used for enhanced antibody validation. The individual localization reliability scores are finally combined into the overall antibody and gene reliability scores.

  • Enhanced - One or more antibodies have passed enhanced validation and there is no contradicting data, such as literature describing experimental evidence for a different location.
  • Supported - There is no enhanced validation of the antibody used, but the annotated localization is reported in external literature.
  • Approved - The localization of the protein is partially in agreement with external data, or has not been described previously.
  • Uncertain - The antibody staining pattern contradicts experimental data, or expression of the gene is not detected at the RNA level.
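The four categories listed above can be read as a decision rule for a single localization. The following Python sketch is an illustrative reading of that list, not HPA code. The order in which the checks are applied (contradicting data and missing RNA expression first) and all parameter names are assumptions, and the sketch does not model how individual scores are combined into antibody and gene scores.

```python
from enum import Enum


class Reliability(Enum):
    ENHANCED = "Enhanced"
    SUPPORTED = "Supported"
    APPROVED = "Approved"
    UNCERTAIN = "Uncertain"


def localization_reliability(enhanced_validated: bool,
                             contradicts_external: bool,
                             rna_detected: bool,
                             external_agreement: str) -> Reliability:
    """Hypothetical scoring of one annotated localization.

    external_agreement is "full", "partial" or "none" (not previously
    described); these labels are illustrative, not HPA field names.
    """
    if external_agreement not in {"full", "partial", "none"}:
        raise ValueError(f"unknown agreement level: {external_agreement!r}")
    # Assumed precedence: contradicting data or no RNA expression -> Uncertain.
    if contradicts_external or not rna_detected:
        return Reliability.UNCERTAIN
    # Enhanced validation counts only when no data contradicts the location.
    if enhanced_validated:
        return Reliability.ENHANCED
    # Localization reported in external literature -> Supported.
    if external_agreement == "full":
        return Reliability.SUPPORTED
    # Partial agreement or a previously undescribed localization -> Approved.
    return Reliability.APPROVED


print(localization_reliability(False, False, True, "partial").value)  # Approved
```

Encoding the categories as an ordered series of checks makes the precedence explicit; a different ordering (for example, letting enhanced validation override missing RNA expression) would be an equally possible reading of the list.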

Read more about validation of antibodies and ICC/IF

What data is presented?

For each gene, information about the subcellular localization of the encoded protein is presented under the tab for the subcellular resource. This section includes an overview of RNA expression in a panel of human cell lines and representative confocal microscopy images of the protein stained in up to three of these (Figure 2). Additional assays may include protein localization in ciliated cells, human sperm cells, iPSCs, mouse cells, RNA and protein expression relative to cell cycle progression, and assay-specific antibody validation by co-localization with a GFP-tagged version of the protein and/or by siRNA-mediated knockdown of gene expression. In addition to the gene-centric presentation of data, the subcellular resource provides descriptive chapters for the proteome of each organelle and subcellular structure, as well as for the multi-localizing proteome and for the cell-to-cell variable proteome.


Figure 2. Examples of data presented in the subcellular resource.

Data validation

For the subcellular resource, the observed subcellular localization of the protein is compared to independent experimental data from external sources in the UniProtKB/Swiss-Prot database. Main and additional localizations are given individual reliability scores (Supported, Approved and Uncertain), where Supported reflects agreement with external data, Approved reflects partial agreement or lack of external data, and Uncertain reflects disagreement with external data. The individual localization reliability scores finally converge into the overall antibody and gene reliability score. The highest reliability score, Enhanced, is given to antibodies and genes that have been validated by the use of an independent antibody targeting a different part of the protein, by co-localization with a GFP-tagged version of the target protein, and/or by reduced signal upon siRNA-mediated knockdown of RNA expression.

Read more about the assays and annotations used in the subcellular resource here.