The organelle proteome

Spatial compartmentalization of biological functions is a fundamental strategy that enables multiple biological processes to occur in parallel without undesired interference. An organelle is a subunit of the eukaryotic cell with a specialized function. The name "organelle" stems from the analogy between the roles of organelles in the cell and the roles of organs in the human body. A distinction is often made between membrane-bound and non-membrane-bound organelles. Membrane-bound organelles, such as the nucleus and the Golgi apparatus, have a clearly defined physical boundary that separates the internal space from the outside. In contrast, non-membrane-bound organelles and subcellular compartments, like the cytoskeleton and nucleoli, constitute spatially distinct assemblies of proteins, and sometimes RNA, within the cell without a physical boundary. This partitioning of cellular components creates specific environments where the concentration of different molecules can be tailored to fit the purpose of the organelle or subcellular structure, and provides important opportunities for regulation and coordination of cellular processes.

A major function of proteins is to catalyze, conduct and control cellular processes in time and space. As different organelles and subcellular structures offer distinct environments, with distinct physiological conditions and interaction partners, the subcellular localization of a protein is an important factor for protein function. Consequently, mis-localization of proteins is associated with cellular dysfunction and various human diseases (Kau TR et al. (2004); Laurila K et al. (2009); Park S et al. (2011)). Knowledge of the spatial distribution of proteins at the subcellular level is essential for understanding protein function and molecular interactions, as well as for identifying the components of different cellular processes. Thus, studying how cells generate and maintain their spatial organization is central for understanding the mechanisms of living cells.

Within the subcellular resource, 13603 human proteins have been mapped at single-cell level to 49 different organelles and subcellular structures (Figure 1), which has enabled the definition of 14 major organelle proteomes. The final group shown below consists of proteins localizing to substructures of the highly specialized sperm cells.

The analysis also reveals that more than half of the proteins localize to multiple compartments and identifies many proteins with single-cell variation in protein abundance and/or spatial distribution.

Subcellular localization of proteins

There are several approaches for systematic analysis of protein localization at the subcellular level. The first major approach is organelle fractionation combined with quantitative mass spectrometry, which identifies the subcellular location of proteins by comparing their distribution profiles across the fractions with those of known organelle markers (Park S et al. (2011); Christoforou A et al. (2016); Itzhak DN et al. (2016)). The second major approach uses protein-protein interactions to deduce the local spatial proteome. In this case, affinity purification or enzyme-mediated proximity labelling is used to infer unknown subcellular locations from interaction networks overlaid with known subcellular localizations (Itzhak DN et al. (2016); Roux KJ et al. (2012); Lee SY et al. (2016)). The third major approach is imaging-based methods, which enable the exploration of the subcellular distribution of proteins in situ in single cells. These approaches are complementary, but imaging-based subcellular profiling has the advantage of effectively identifying single-cell variability and multi-organelle localization. Imaging-based approaches can be performed using tagged recombinant proteins (Huh WK et al. (2003); Simpson JC et al. (2000); Stadler C et al. (2013)) or affinity reagents, such as antibodies.
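The marker-based comparison used in the fractionation approach can be illustrated with a minimal sketch. It assumes that each protein is described by its relative abundance across a handful of fractions and that a protein is assigned to the organelle whose marker profile correlates best with it. The profiles, the organelle names and the function names below are invented for illustration; published pipelines typically use more elaborate classifiers.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def assign_by_marker_profile(profile, marker_profiles):
    """Return (organelle, correlation) for the best-correlating marker profile."""
    scores = {org: pearson(profile, ref) for org, ref in marker_profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical relative abundances across five fractions (invented numbers).
MARKER_PROFILES = {
    "mitochondria": [0.05, 0.10, 0.60, 0.20, 0.05],
    "nucleus": [0.70, 0.20, 0.05, 0.03, 0.02],
    "cytosol": [0.02, 0.03, 0.05, 0.20, 0.70],
}

# Hypothetical profile of a protein with unknown location.
unknown_protein = [0.04, 0.12, 0.55, 0.22, 0.07]

organelle, r = assign_by_marker_profile(unknown_protein, MARKER_PROFILES)
print(organelle, round(r, 2))
```

A single best-match assignment like this cannot represent proteins that reside in several compartments, which is one reason the imaging-based approach is described as complementary.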

The subcellular resource employs an immunofluorescence (IF)-based approach combined with confocal microscopy to enable high-resolution investigation of the spatial distribution of proteins in cells (Thul PJ et al. (2017); Stadler C et al. (2013); Barbe L et al. (2008); Stadler C et al. (2010); Fagerberg L et al. (2011)). With a diffraction-limited resolution of about 200 nm, a confocal image gives detailed insight into organization at the subcellular level. The spatial distribution of each protein is investigated using indirect IF in up to three cell lines, usually comprising U2 OS and two additional cell lines selected from a panel of 42 human cell lines based on mRNA expression of the corresponding gene. Some proteins have also been mapped in ciliated cell lines, iPSCs and/or in human sperm cells. The protein of interest is visualized in green, while reference markers for microtubules (red), endoplasmic reticulum (yellow) and nucleus (blue) are used to outline the cell and the nucleus. From small dots like nuclear bodies, to larger structures such as the nucleoplasm, the distinct patterns in the images together with the reference markers make it possible to precisely determine the spatial distribution of a protein within the cell. The localization of each protein is assigned to one or more of 49 organelles and subcellular structures (Figure 1).
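The figure of about 200 nm can be related to the Abbe diffraction limit, d = wavelength / (2 x NA). The short sketch below evaluates it for assumed, illustrative values (green emission near 520 nm and an oil-immersion objective with a numerical aperture of 1.4); neither value is stated in the text, and the function name is hypothetical.

```python
def abbe_lateral_limit_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Abbe lateral diffraction limit d = wavelength / (2 * NA), in nanometres."""
    if wavelength_nm <= 0 or numerical_aperture <= 0:
        raise ValueError("wavelength and numerical aperture must be positive")
    return wavelength_nm / (2.0 * numerical_aperture)


# Assumed illustrative values, not taken from the text above:
# ~520 nm emission (green channel) and a NA 1.4 oil-immersion objective.
limit_nm = abbe_lateral_limit_nm(520, 1.4)
print(f"{limit_nm:.0f} nm")
```

Under these assumptions the limit falls just below 200 nm, in line with the resolution quoted above.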


Figure 1 panels: Nucleoplasm, Nuclear speckles, Nuclear bodies, Nucleoli, Nucleoli fibrillar center, Nucleoli rim, Mitotic chromosome, Kinetochore, Nuclear membrane, Cytosol, Cytoplasmic bodies, Rods & Rings, Aggresome, Mitochondria, Centrosome, Centriolar satellites, Microtubules, Microtubule ends, Mitotic spindle, Cytokinetic bridge, Midbody, Midbody ring, Cleavage furrow, Intermediate filaments, Actin filaments, Focal adhesion sites, Endoplasmic reticulum, Golgi apparatus, Vesicles, Endosomes, Lysosomes, Lipid droplets, Peroxisomes, Plasma membrane, Cell junctions, Primary cilium, Primary cilium tip, Primary cilium transition zone, Basal body, Acrosome, Equatorial segment, Perinuclear theca, Calyx, Connecting piece, Flagellar centriole, Midpiece, Principal piece, End piece, Annulus.

Figure 1. An example of confocal immunofluorescence images of different proteins (green) localized to each of the subcellular organelles and substructures currently annotated in the subcellular resource in a representative set of cell lines. Microtubules are marked with an anti-tubulin antibody (red) and the nucleus is counterstained with DAPI (blue). For more example images and details describing all the 49 patterns annotated in the subcellular resource, see the cell structure dictionary.

Protein distribution in human cells

Figure 2 shows the distribution of all annotations across the 49 organelles and subcellular structures for 13603 genes with protein localization data in the subcellular resource. Note that one protein can have multiple localizations and thus be represented more than once in this plot. The plot is sorted by meta-compartments: cytoplasm, nucleus, endomembrane system, primary cilia and structures found in human sperm. The most common localization assigned to a protein is nucleoplasm, followed by cytosol and then vesicles. 61% (n=8327) of the proteins were detected in more than one location (multilocalizing proteins), and 28% (n=3752) displayed single-cell variation in expression level or spatial distribution.
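The percentages quoted here follow directly from the protein counts. As a quick arithmetic check, the sketch below recomputes them from the counts given in the text and from the protein counts for nucleoplasm, cytosol and vesicles in Table 1; the helper name is hypothetical.

```python
TOTAL_PROTEINS = 13603  # proteins with localization data, as stated in the text


def percent_of_total(count: int, total: int = TOTAL_PROTEINS, digits: int = 0) -> float:
    """Share of `count` in `total`, as a percentage rounded to `digits` decimals."""
    return round(100 * count / total, digits)


multilocalizing_pct = percent_of_total(8327)   # proteins seen in more than one location
variable_pct = percent_of_total(3752)          # proteins with single-cell variation

# Protein counts per location taken from Table 1.
nucleoplasm_pct = percent_of_total(6283, digits=1)
cytosol_pct = percent_of_total(5267, digits=1)
vesicles_pct = percent_of_total(2488, digits=1)

print(multilocalizing_pct, variable_pct, nucleoplasm_pct, cytosol_pct, vesicles_pct)
```

Because a protein can be assigned to several locations, the per-location percentages in Table 1 add up to more than 100.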

Figure 2. Bar plot showing the distribution of protein classifications across organelles and subcellular structures in the subcellular resource. Note that one protein can localize to more than one compartment. The bars are colored according to meta-compartment.

Validation of antibodies and location data

The quality and use of antibodies in research have been frequently debated (Baker M. (2015)). As antibody off-target binding can cause false positive results, all results in the subcellular resource are manually scored for reliability. Every annotated location receives a reliability score on a four-grade scale: Enhanced, Supported, Approved, or Uncertain, as described in detail here. Enhanced reliability scores are obtained through antibody validation according to one of the validation "pillars" proposed by an international working group (Uhlen M et al. (2016)): (i) genetic methods using siRNA silencing (Stadler C et al. (2012)) or CRISPR/Cas9 knock-out, (ii) expression of a fluorescent protein-tagged protein at endogenous levels (Skogs M et al. (2017)) or (iii) independent antibodies targeting different epitopes (Stadler C et al. (2010)). Supported reliability scores are given for locations that are in agreement with external experimental data (UniProtKB/Swiss-Prot database). The reliability score Approved indicates that there is no external experimental information available to confirm the observed location or that external data is only partially supportive. An Uncertain location contradicts external protein localization data or known protein function, but is shown if it cannot be ruled out that the data is correct; further experiments are needed to establish the reliability of the antibody staining. The individual location reliability scores are summarized into an overall gene reliability score. The distribution of reliability scores in the subcellular resource is shown in Figure 3. Approximately 43% (n=5812) of the protein localizations provided are Enhanced or Supported. Table 1 details the organelle distribution of all localized proteins and the distribution of reliability scores on the basis of individual organelles and subcellular structures.
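The four-grade scale can be summarized as a small decision rule. The sketch below is only an illustration of the criteria as worded above: scoring in the resource is done manually, and the function name, the evidence labels and the choice to let pillar-based validation take precedence over external evidence are assumptions made for this example.

```python
from enum import Enum


class Reliability(Enum):
    ENHANCED = "Enhanced"
    SUPPORTED = "Supported"
    APPROVED = "Approved"
    UNCERTAIN = "Uncertain"


# Hypothetical labels for how external data relates to the observed location.
EXTERNAL_EVIDENCE = {"agrees", "partial", "none", "contradicts"}


def score_location(pillar_validated: bool, external_evidence: str) -> Reliability:
    """Map validation status and external evidence to a reliability grade."""
    if external_evidence not in EXTERNAL_EVIDENCE:
        raise ValueError(f"unknown evidence label: {external_evidence!r}")
    if pillar_validated:
        # Assumption: validation by one of the pillars takes precedence.
        return Reliability.ENHANCED
    if external_evidence == "agrees":
        return Reliability.SUPPORTED
    if external_evidence in ("partial", "none"):
        return Reliability.APPROVED
    # Remaining case: the location contradicts external data or known function.
    return Reliability.UNCERTAIN


print(score_location(False, "agrees").value)
```

How the individual location scores are combined into the overall gene reliability score is not specified in the text, so that step is left out of the sketch.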

Figure 3. Pie chart showing the reliability of the localized proteins, where each slice represents the number of proteins with one of the four reliability scores: Enhanced, Supported, Approved, and Uncertain.

Table 1. Table showing the number of proteins localized to every organelle, structure, and substructure in the subcellular resource, along with the distribution of reliability scores.

Location | Proteins | % of proteins | Enhanced | Supported | Approved | Uncertain
Intermediate filaments | 147 | 1.1 | 11 | 19 | 93 | 24
Actin filaments | 253 | 1.9 | 11 | 43 | 157 | 42
Focal adhesion sites | 148 | 1.1 | 10 | 27 | 88 | 23
Microtubules | 356 | 2.6 | 10 | 50 | 231 | 65
Microtubule ends | 7 | 0.1 | 0 | 2 | 4 | 1
Cytokinetic bridge | 245 | 1.8 | 2 | 27 | 162 | 54
Midbody | 56 | 0.4 | 1 | 11 | 31 | 13
Midbody ring | 25 | 0.2 | 11194
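Cleavage furrow | 2 | 0 | 0 | 0 | 1 | 1
Mitotic spindle | 161 | 1.2 | 1 | 32 | 78 | 50
Centriolar satellite | 228 | 1.7 | 7 | 32 | 126 | 63
Centrosome | 513 | 3.8 | 7 | 105 | 306 | 95
Mitochondria | 1132 | 8.3 | 115 | 408 | 515 | 94
Aggresome | 19 | 0.1 | 0 | 0 | 17 | 2
Cytosol | 5267 | 38.7 | 281 | 1569 | 2663 | 754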
Cytoplasmic bodies | 74 | 0.5 | 125417
Rods & Rings | 19 | 0.1 | 0 | 1 | 16 | 2
Endoplasmic reticulum | 576 | 4.2 | 50 | 199 | 291 | 36
Golgi apparatus | 1279 | 9.4 | 63 | 216 | 837 | 163
Vesicles | 2488 | 18.3 | 86 | 365 | 1625 | 412
Peroxisomes | 24 | 0.2 | 6 | 11 | 6 | 1
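Endosomes | 16 | 0.1 | 6 | 9 | 1 | 0
Lysosomes | 20 | 0.1 | 2 | 13 | 3 | 2
Lipid droplets | 40 | 0.3 | 6 | 11 | 20 | 3
Plasma membrane | 2263 | 16.6 | 94 | 733 | 1105 | 331
Cell Junctions | 331 | 2.4 | 20 | 89 | 172 | 50
Nucleoplasm | 6283 | 46.2 | 647 | 1984 | 2921 | 731
Nuclear membrane | 295 | 2.2 | 13 | 73 | 169 | 40
Nucleoli | 1116 | 8.2 | 90 | 294 | 558 | 174
Nucleoli fibrillar center | 325 | 2.4 | 14 | 90 | 187 | 34
Nucleoli rim | 153 | 1.1 | 27 | 47 | 65 | 14
Nuclear speckles | 507 | 3.7 | 47 | 178 | 229 | 53
Nuclear bodies | 626 | 4.6 | 34 | 236 | 291 | 65
Kinetochore | 8 | 0.1 | 0 | 5 | 2 | 1
Mitotic chromosome | 79 | 0.6 | 12 | 16 | 43 | 8
Primary cilium | 467 | 3.4 | 0 | 71 | 214 | 182
Primary cilium tip | 130 | 1 | 0 | 19 | 50 | 61
Primary cilium transition zone | 96 | 0.7 | 0 | 12 | 38 | 46
Basal body | 451 | 3.3 | 0 | 76 | 222 | 153
Acrosome | 112 | 0.8 | 1 | 8 | 102 | 1
Equatorial segment | 82 | 0.6 | 0 | 2 | 80 | 0
Perinuclear theca | 70 | 0.5 | 0 | 4 | 62 | 4
Calyx | 70 | 0.5 | 0 | 4 | 64 | 2
Connecting piece | 92 | 0.7 | 0 | 4 | 83 | 5
Flagellar centriole | 116 | 0.9 | 0 | 18 | 96 | 2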
Mid piece | 352 | 2.6 | 2 | 25 | 320 | 5
Principal piece | 371 | 2.7 | 3 | 37 | 323 | 8
End piece | 195 | 1.4 | 0 | 27 | 163 | 5
Annulus | 50 | 0.4 | 0 | 5 | 44 | 1
Number of proteins | 13603 | 100 | 1306 | 5259 | 8001 | 2430

Relevant links and publications

Kau TR et al., Nuclear transport and cancer: from mechanism to intervention. Nat Rev Cancer. (2004)
PubMed: 14732865 DOI: 10.1038/nrc1274

Laurila K et al., Prediction of disease-related mutations affecting protein localization. BMC Genomics. (2009)
PubMed: 19309509 DOI: 10.1186/1471-2164-10-122

Park S et al., Protein localization as a principal feature of the etiology and comorbidity of genetic diseases. Mol Syst Biol. (2011)
PubMed: 21613983 DOI: 10.1038/msb.2011.29

Christoforou A et al., A draft map of the mouse pluripotent stem cell spatial proteome. Nat Commun. (2016)
PubMed: 26754106 DOI: 10.1038/ncomms9992

Itzhak DN et al., Global, quantitative and dynamic mapping of protein subcellular localization. Elife. (2016)
PubMed: 27278775 DOI: 10.7554/eLife.16950

Roux KJ et al., A promiscuous biotin ligase fusion protein identifies proximal and interacting proteins in mammalian cells. J Cell Biol. (2012)
PubMed: 22412018 DOI: 10.1083/jcb.201112098

Lee SY et al., APEX Fingerprinting Reveals the Subcellular Localization of Proteins of Interest. Cell Rep. (2016)
PubMed: 27184847 DOI: 10.1016/j.celrep.2016.04.064

Huh WK et al., Global analysis of protein localization in budding yeast. Nature. (2003)
PubMed: 14562095 DOI: 10.1038/nature02026

Simpson JC et al., Systematic subcellular localization of novel proteins identified by large-scale cDNA sequencing. EMBO Rep. (2000)
PubMed: 11256614 DOI: 10.1093/embo-reports/kvd058

Stadler C et al., Immunofluorescence and fluorescent-protein tagging show high correlation for protein localization in mammalian cells. Nat Methods. (2013)
PubMed: 23435261 DOI: 10.1038/nmeth.2377

Thul PJ et al., A subcellular map of the human proteome. Science. (2017)
PubMed: 28495876 DOI: 10.1126/science.aal3321

Barbe L et al., Toward a confocal subcellular atlas of the human proteome. Mol Cell Proteomics. (2008)
PubMed: 18029348 DOI: 10.1074/mcp.M700325-MCP200

Stadler C et al., A single fixation protocol for proteome-wide immunofluorescence localization studies. J Proteomics. (2010)
PubMed: 19896565 DOI: 10.1016/j.jprot.2009.10.012

Fagerberg L et al., Mapping the subcellular protein distribution in three human cell lines. J Proteome Res. (2011)
PubMed: 21675716 DOI: 10.1021/pr200379a

Baker M., Reproducibility crisis: Blame it on the antibodies. Nature. (2015)
PubMed: 25993940 DOI: 10.1038/521274a

Uhlen M et al., A proposal for validation of antibodies. Nat Methods. (2016)
PubMed: 27595404 DOI: 10.1038/nmeth.3995

Stadler C et al., Systematic validation of antibody binding and protein subcellular localization using siRNA and confocal microscopy. J Proteomics. (2012)
PubMed: 22361696 DOI: 10.1016/j.jprot.2012.01.030

Skogs M et al., Antibody Validation in Bioimaging Applications Based on Endogenous Expression of Tagged Proteins. J Proteome Res. (2017)
PubMed: 27723985 DOI: 10.1021/acs.jproteome.6b00821