The housekeeping proteome

A large number of proteins are essential for all cells throughout the human body. These proteins are sometimes called housekeeping proteins, suggesting that their expression is crucial for the maintenance of basic cellular function.

Defining the housekeeping proteome

Several definitions of housekeeping proteins exist. They differ in stringency and draw on four main biological criteria: stable expression across samples, essentiality, involvement in cellular maintenance and evolutionary conservation. However, the key assumption of any definition of housekeeping genes is that they are expressed in every cell type in the organism. A transcriptomics analysis of samples from 40 tissues, 154 single cell types and 1132 cancer cell lines grouped into 28 cancer types was used to identify the number of protein-coding genes detected in all analyzed tissues, cell types or cancer cell line groups, respectively. The results are shown in Table 1, which also gives the numbers for genes with less variable expression across samples, obtained by excluding genes classified as enriched or elevated in the RNA expression categorisation. The overlaps between genes classified as Detected in all in the different datasets are shown in the Venn diagram in Figure 1.

Table 1. The genes with expression in all tissues, single cell types or cell lines including or excluding enriched and elevated genes.

Category                            Tissues  Single cells  Cell lines  Overlap
Detected in all                     8813     2425          9571        2337
Detected in all excluding enriched  8509     2332          9509        2212
Detected in all excluding elevated  6640     645           9158        590

Figure 1. Venn diagram showing the overlaps between genes classified as Detected in all in the three different data sets. Corresponding gene lists can be obtained by clicking the numbers in the plot.
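The classification and overlap described above can be illustrated with a minimal Python sketch. The gene names, expression values and the detection cutoff below are hypothetical placeholders chosen for illustration; they are not taken from the actual data sets, and the real analysis may apply different thresholds and normalization.

```python
# Illustrative sketch only: gene names, values and the cutoff are assumptions.
DETECTION_CUTOFF = 1.0  # hypothetical detection threshold


def detected_in_all(expression, cutoff=DETECTION_CUTOFF):
    """Return the genes whose expression reaches the cutoff in every sample group."""
    return {
        gene
        for gene, values in expression.items()
        if all(value >= cutoff for value in values.values())
    }


# Hypothetical expression per gene and sample group for the three data sets.
tissues = {
    "GENE_A": {"liver": 35.0, "brain": 12.1},
    "GENE_B": {"liver": 80.2, "brain": 0.0},
    "GENE_C": {"liver": 5.4, "brain": 7.7},
}
single_cells = {
    "GENE_A": {"hepatocytes": 20.3, "neurons": 9.8},
    "GENE_B": {"hepatocytes": 60.0, "neurons": 0.2},
    "GENE_C": {"hepatocytes": 0.4, "neurons": 3.1},
}
cell_lines = {
    "GENE_A": {"liver cancer": 41.0, "brain cancer": 15.5},
    "GENE_B": {"liver cancer": 2.0, "brain cancer": 1.5},
    "GENE_C": {"liver cancer": 6.6, "brain cancer": 8.0},
}

in_tissues = detected_in_all(tissues)
in_single_cells = detected_in_all(single_cells)
in_cell_lines = detected_in_all(cell_lines)

# The overlap corresponds to the central region of a three-set Venn diagram.
overlap = in_tissues & in_single_cells & in_cell_lines
print(sorted(overlap))
```

In this toy example only GENE_A passes the cutoff in every sample group of all three data sets, so it alone ends up in the overlap.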

The Tau score is another measure of tissue specificity. It does not depend on expression cut-offs and yields a specificity value between 0 and 1, where 0 means broadly expressed and 1 means specific expression. The bar charts in Figure 2 overlay the Tau scores for genes belonging to the three different Detected in all categories in Table 1, for each of the three datasets.
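As a rough illustration of how such a score behaves, the sketch below uses a widely used formulation of the Tau index, in which each expression value is divided by the maximum value of the gene and the resulting deviations from 1 are averaged over n - 1 samples. Whether the figures here were produced with exactly this formulation, and with which normalization or log transformation, is not stated in the text, so both the formula choice and the example values are assumptions.

```python
def tau_score(values):
    """Tau index for one gene, assuming non-negative expression values.

    tau = sum(1 - x_i / x_max) / (n - 1)
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two samples are required")
    x_max = max(values)
    if x_max <= 0:
        raise ValueError("at least one value must be positive")
    return sum(1 - value / x_max for value in values) / (n - 1)


# Hypothetical expression profiles across four tissues.
broad = [10.0, 10.0, 10.0, 10.0]   # same level everywhere
specific = [0.0, 0.0, 0.0, 8.0]    # expressed in a single tissue

print(tau_score(broad))     # 0.0, broadly expressed
print(tau_score(specific))  # 1.0, specific expression
```

Genes that qualify as housekeeping under the Detected in all criterion would be expected to have Tau scores towards the lower end of this range.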

Figure 2. The Tau scores for genes corresponding to the three categories of Detected in all in Table 1, overlaid in bar plots for the three different data sets. The content of different overlays is shown using mouse-over and corresponding gene lists can be obtained by clicking in the bar plots.

Functions of housekeeping proteins

Housekeeping proteins exist in all classes of proteins but are clearly overrepresented among those involved in basic cellular functions such as gene expression and regulation, metabolism and cell structure. Below are treemap plots showing overrepresented functions for housekeeping genes in the three different data sets, followed by examples of classes of housekeeping proteins.

[Treemap plots for the three data sets: Tissues, Single cells and Cell lines. Terms shown in the plots include gene expression, translation, RNA processing, mRNA splicing via spliceosome, intracellular protein transport, vesicle coating, aerobic respiration and mitochondrial respiratory chain complex I assembly.]

Figure 3. Treemap plots of GO biological processes based on gene set enrichment analysis for the three different data sets consisting of Detected in all genes.

Transcription and translation

An easily understood class of housekeeping proteins comprises those involved in the machinery of gene expression, e.g. RNA polymerases and ribosomal proteins, which are essential for transcribing DNA into RNA and translating RNA into proteins. It is intuitive that without these genes the cell and organism cannot function at all.

RNA Polymerases

The RNA polymerases are enzymes responsible for synthesizing RNA copies from a DNA template by the process of transcription. In eukaryotic cells, transcription takes place in the cell nucleus, as illustrated in the images below, which show distinct staining of RNA polymerase II subunit E (POLR2E) in the nucleus of every cell. Some of these RNA transcripts are further processed into messenger RNAs (mRNA), the direct templates for protein synthesis, which are exported to the cytoplasm where translation takes place. Out of the 34 polymerase proteins (KEGG PATHWAY: hsa03020), 31 are found to be expressed in all tissues.

Figure 4. Immunohistochemical staining showing the nuclear localization of the polymerase protein POLR2E.

Ribosomal proteins

The ribosomal proteins form the ribosome complex together with ribosomal RNA (rRNA). The role of the ribosome complex is to translate the genetic code of the mRNA molecules into proteins. During translation, the mRNA is read three bases at a time. Each three-base codon codes for an amino acid, and the amino acids are joined into a growing peptide chain, which is post-processed after completion to become a functional protein. Translation occurs in the cytosol, separated from transcription in the nucleus. Out of all 180 ribosomal proteins, 176 are found to be expressed in all studied tissues.
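The codon-by-codon reading described above can be sketched in a few lines of Python. This is a simplified illustration using the standard genetic code. It ignores start-codon scanning, reading-frame selection and post-processing, and the input sequence is a made-up example.

```python
from itertools import product

# Standard genetic code, listed with bases in the order U, C, A, G for each
# codon position. "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string codon by codon until a stop codon or the end."""
    peptide = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the peptide chain
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical short mRNA: AUG (Met), GCC (Ala), AAA (Lys), UAA (stop).
print(translate("AUGGCCAAAUAA"))  # MAK
```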

Figure 5. Immunohistochemical staining of ribosomal protein RPL17 in liver, showing the cytosolic localization of the protein.

Metabolism

Apart from being able to translate DNA into functional proteins, a cell also needs to extract energy from organic matter and to utilize the energy to construct necessary components. These diverse and essential processes are together referred to as metabolism.

Citric acid cycle

The citric acid cycle is a central part of the metabolic pathway that converts organic matter from carbohydrates, proteins and fats into chemical energy through a series of chemical reactions. The enzymes that catalyze these reactions are apt examples of housekeeping proteins, since all cells require energy to survive and function. Out of the 30 genes involved in the citric acid cycle (KEGG PATHWAY: hsa00020), 27 are expressed in all tissues. The genes that are exceptions have paralogs that are expressed in all tissues, as exemplified by the pyruvate dehydrogenase complex subunits PDHA2 (expressed exclusively in testis) and PDHA1 (ubiquitously expressed).

Figure 6. The citric acid cycle takes place in the matrix of the mitochondria, illustrated here by the immunohistochemical staining of SDHB.

Mitochondrial proteins

The main location for energy production in the cell is the mitochondrion, where the citric acid cycle, among other pathways, takes place. The mitochondrion is a semi-autonomous organelle that contains its own genome and has a separate machinery for protein synthesis. The majority of its genes have, however, been transferred to the nuclear genome. With its central part in energy production, the mitochondrion is crucial for cell survival, and most proteins involved in its function and structure are therefore considered to be housekeeping proteins.

Structural proteins

Many proteins involved in the basic structure of the cell are expressed ubiquitously in all cell types, since all cells need certain structures and scaffolds to function. Structural proteins can have numerous functions, but one crucial and obvious housekeeping function is to provide rigidity to the cell and maintain its shape.

Cytoskeleton

The cytoskeleton is a scaffold present in the cytoplasm of all cells, consisting of different types of filaments. The cytoskeleton is also highly involved in the movement of cellular components. Since various cells use the cytoskeleton in specialized ways, far from all genes associated with the cytoskeleton are expressed everywhere. For instance, the myosin heavy chains involved in muscle contraction are expressed predominantly in muscle tissues. However, many of the components are necessary for basic cell functionality and are expressed everywhere.

Location of housekeeping proteins

The location of the housekeeping proteins in the three different data sets was analysed using membrane and signal peptide prediction methods and antibody-based immunofluorescence. The predicted location was classified as membrane, secreted or intracellular based on the results of majority decision methods for membrane region predictions (MDM) and signal peptide predictions (MDSEC). Antibody-based assays were used to determine the subcellular location experimentally.
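The idea of a majority decision over several predictors can be sketched as follows. The number of predictors, the example votes and the rule for combining the two decisions (membrane taking precedence over secreted) are assumptions made for illustration. The text does not specify how MDM and MDSEC are combined, and a real classification may allow a protein to belong to more than one class.

```python
def majority(votes):
    """Return True when more than half of the boolean predictor votes are positive."""
    return sum(votes) > len(votes) / 2


def predicted_location(membrane_votes, signal_peptide_votes):
    """Classify a protein from two sets of predictor votes.

    Assumed rule for this sketch: a membrane majority wins first, then a
    signal peptide majority means secreted, otherwise intracellular.
    """
    if majority(membrane_votes):
        return "membrane"
    if majority(signal_peptide_votes):
        return "secreted"
    return "intracellular"


# Hypothetical votes from five membrane predictors and three signal peptide predictors.
print(predicted_location([True, True, True, False, False], [False, False, False]))
print(predicted_location([False, False, False, False, True], [True, True, False]))
print(predicted_location([False, False, False, False, False], [False, True, False]))
```

With these example votes the three calls yield membrane, secreted and intracellular, respectively.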

Predicted location

According to the predictions the majority of the housekeeping proteins are, not surprisingly, intracellular proteins. The pie charts in Figure 7 show the results of the analyses for the three different data sets and by clicking on the numbers the gene sets corresponding to the predicted locations can be investigated.

Figure 7. Predicted location of the genes belonging to the Detected in all category in the three data sets Tissues, Single cells and Cell lines.

Subcellular location

Immunofluorescence (ICC-IF) and confocal microscopy were used to determine the subcellular location of housekeeping proteins in Tissues, Single cells and Cell lines. The pie charts in Figure 8 show that the majority of the analysed proteins reside in the cytoplasm or nucleus.

Figure 8. Subcellular location of the genes belonging to the Detected in all category in the three data sets Tissues, Single cells and Cell lines. Genes without experimental data are classified as N/A.

Relevant links and publications

Uhlén M et al., Tissue-based map of the human proteome. Science (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419