The intracellular and membrane proteins included in the secretome
The annotated secretome includes 881 proteins that are predicted to be intracellular or membrane-associated even though they have a signal peptide and lack a transmembrane domain. Based on this prediction, these proteins are transported into the lumen of the endoplasmic reticulum (ER) and thus enter the secretory pathway, but they subsequently remain, at least partly, inside the cell. There are several potential reasons for this. First, proteins can be kept in compartments along the secretory pathway (ER, Golgi apparatus, secretory vesicles) by specific retention signals that prevent their secretion, for example the C-terminal KDEL sequence found on many luminal ER proteins. Second, proteins can be directed to intracellular compartments such as lysosomes. Third, proteins can become membrane-bound, either through post-translational modifications such as a GPI anchor or by interacting with other membrane-bound proteins. Finally, the methods for predicting signal peptides and transmembrane regions are not 100% accurate, so the predicted presence or absence of these features may be misleading.
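The first of these reasons can be illustrated with a small filter. The sketch below is hypothetical and is not the pipeline used for the annotation: it assumes that the signal peptide and transmembrane predictions are already available as inputs, and it only checks for the C-terminal KDEL motif named above (other retention signals are not covered).

```python
from dataclasses import dataclass


@dataclass
class Protein:
    name: str
    sequence: str             # one-letter amino acid sequence
    has_signal_peptide: bool  # assumed output of a signal peptide predictor
    n_tm_regions: int         # assumed output of a transmembrane predictor


def enters_secretory_pathway(p: Protein) -> bool:
    """Signal peptide predicted and no transmembrane region predicted."""
    return p.has_signal_peptide and p.n_tm_regions == 0


def has_kdel_retention_signal(p: Protein) -> bool:
    """True if the sequence ends in KDEL (a trailing '*' stop symbol is ignored)."""
    return p.sequence.rstrip("*").upper().endswith("KDEL")


def likely_er_resident(p: Protein) -> bool:
    """Enters the ER but carries a C-terminal KDEL retention motif."""
    return enters_secretory_pathway(p) and has_kdel_retention_signal(p)


# Hypothetical example sequences, for illustration only.
er_protein = Protein("hypothetical_er", "MKLLVAGSAEEKDEL", True, 0)
print(likely_er_resident(er_protein))
```

A protein flagged this way would be predicted to enter the secretory pathway yet be retained in the ER lumen, which is one way a member of the predicted secretome ends up classified as intracellular.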
Functions of the intracellular and membrane proteins
All genes encoding proteins that are predicted to be intracellular or membrane-bound were classified according to their function as described by UniProt. The results of the analysis are presented in Figure 1.
Figure 1. Number of genes encoding proteins that are intracellular or membrane-bound, classified according to the function described by UniProt.
The largest group of proteins (n=345) consists of enzymes. Typical members are ER- and Golgi-resident enzymes, which are involved in protein synthesis and folding or in post-translational modifications, such as the glycosylation of secreted and cell surface proteins. Lysosomal proteins, such as members of the proteolytic cathepsin family and other protein-degrading enzymes, form another large group.
Receptors (n=84) are often integrated into the plasma membrane and are essential for cells to communicate with their environment by binding to signaling molecules or to the extracellular matrix. These proteins need a signal peptide because they are transported through the secretory pathway to the plasma membrane. The secretion prediction takes into account that isoforms without a transmembrane domain exist for some of these receptors, but a review of these proteins often shows that there is no reliable data supporting the existence of a secreted isoform.
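The isoform effect described above can be sketched as a gene-level rule. This is an assumption about how such a prediction could combine isoforms, not a description of the actual method: here a gene counts as a secretion candidate if any of its isoforms has a signal peptide and no transmembrane region. The isoform identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Isoform:
    isoform_id: str
    has_signal_peptide: bool  # assumed predictor output
    n_tm_regions: int         # assumed predictor output


def is_soluble_secretory(iso: Isoform) -> bool:
    """Signal peptide present and no transmembrane region."""
    return iso.has_signal_peptide and iso.n_tm_regions == 0


def soluble_isoforms(isoforms: list[Isoform]) -> list[str]:
    """IDs of the isoforms that would make the gene a secretion candidate."""
    return [iso.isoform_id for iso in isoforms if is_soluble_secretory(iso)]


def gene_predicted_secreted(isoforms: list[Isoform]) -> bool:
    """Assumed rule: a single qualifying isoform is enough."""
    return bool(soluble_isoforms(isoforms))


# A hypothetical receptor: full-length membrane isoform plus a short isoform
# that lacks the transmembrane segment.
receptor = [Isoform("isoform-1", True, 1), Isoform("isoform-2", True, 0)]
print(gene_predicted_secreted(receptor), soluble_isoforms(receptor))
```

Under this rule, a single annotated isoform without a transmembrane region is enough to pull a membrane receptor into the predicted secretome, which is why such predictions benefit from manual review of the evidence for that isoform.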
Notably, a sizeable number of proteins (n=150) have no annotated function in UniProt. However, this corresponds to only about one sixth (17%) of the predicted intracellular or membrane-bound proteins, which is comparatively small. Although structural analyses and the predicted subcellular location can point to a certain role for these proteins, their exact biological function remains unclear.
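The shares behind the counts quoted in this section can be checked with a few lines of arithmetic. The sketch uses only the numbers given in the text and assumes that all of them refer to the 881 predicted intracellular or membrane-bound proteins.

```python
# Category counts quoted in the text; denominator assumed to be the
# 881 predicted intracellular or membrane-bound proteins.
TOTAL = 881
COUNTS = {"enzymes": 345, "receptors": 84, "no annotated function": 150}


def shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Percentage of the total for each category, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


print(shares(COUNTS, TOTAL))
```

Enzymes come out at roughly 39%, so they are the largest single group without being a majority, and proteins without an annotated function make up about 17%.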
Tissue specificity classification
Based on transcriptomics analysis, all genes encoding predicted intracellular and membrane-bound proteins were classified according to their tissue specificity and their distribution across major organs and tissue types (Figures 4 and 5). The distribution across the specificity categories follows the same pattern as for all protein-coding genes: about two fifths have low tissue specificity (n=371) and about one third are tissue enhanced (n=306). Of particular interest are the genes detected in only a single tissue (n=16), with the testis being the most common such tissue (n=6).
Figure 4. Number of genes encoding proteins that are intracellular or membrane-bound, categorized according to tissue specificity. Categories include: tissue enriched, defined as an mRNA level in one tissue at least four-fold higher than in all other tissues; group enriched, defined as an at least four-fold higher average mRNA level in a group of two to five tissues compared to all other tissues; tissue enhanced, defined as an at least four-fold higher mRNA level in one or more tissues compared to the mean mRNA level of all tissues; expressed in all, defined as ≥ 1 nTPM (normalized transcripts per million) in all tissues; and not detected, defined as < 1 nTPM in all tissues.
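The rules in the Figure 4 caption can be written as a small classifier. This is a minimal sketch that follows the caption's wording only: the order in which the rules are checked is an assumption, any extra criteria the actual classification may apply are not included, and genes that match none of the caption's definitions are returned as "unclassified".

```python
from statistics import mean

FOLD = 4.0      # four-fold threshold from the Figure 4 caption
DETECTED = 1.0  # nTPM cutoff from the Figure 4 caption


def classify_specificity(ntpm: dict[str, float]) -> str:
    """Assign a tissue-specificity category from a tissue -> nTPM mapping.

    The precedence of the checks is an assumption; the caption does not
    state which rule wins when several apply.
    """
    values = sorted(ntpm.values(), reverse=True)
    if all(v < DETECTED for v in values):
        return "not detected"
    # Tissue enriched: top tissue at least 4x every other tissue (the runner-up).
    if len(values) > 1 and values[0] >= FOLD * values[1]:
        return "tissue enriched"
    # Group enriched: mean of a 2-5 tissue group at least 4x every other tissue.
    # Testing the top-k tissues suffices: that group has the highest possible
    # mean and the lowest possible maximum among the remaining tissues.
    for k in range(2, 6):
        if k < len(values) and mean(values[:k]) >= FOLD * values[k]:
            return "group enriched"
    # Tissue enhanced: some tissue at least 4x the mean over all tissues.
    if values[0] >= FOLD * mean(values):
        return "tissue enhanced"
    if all(v >= DETECTED for v in values):
        return "expressed in all"
    return "unclassified"  # not covered by the caption's definitions


print(classify_specificity({"testis": 100.0, "liver": 2.0, "brain": 1.0}))
```

Comparing products instead of ratios avoids division by zero for tissues with no measurable expression.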
Figure 5. Number of genes encoding proteins that are intracellular or membrane-bound, categorized according to tissue distribution, where n is the share of tissues in which the gene is detected. Categories include: detected in all, defined as n = 100%; detected in many, defined as 31% ≤ n < 100%; detected in some, defined as detection in more than one tissue but n < 31%; detected in single, defined as detection in exactly one tissue; and not detected, defined as detection in no tissue.
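The distribution categories of Figure 5 depend only on how many tissues a gene is detected in. The sketch below assumes that "detected" means ≥ 1 nTPM, matching the cutoff used in the Figure 4 caption, and uses the 31% boundary given in the Figure 5 caption; the function name is hypothetical.

```python
DETECTED = 1.0     # assumed detection cutoff in nTPM (as in Figure 4)
MANY_SHARE = 0.31  # boundary between "some" and "many" from Figure 5


def classify_distribution(ntpm: dict[str, float]) -> str:
    """Assign a tissue-distribution category from a tissue -> nTPM mapping."""
    if not ntpm:
        raise ValueError("expression profile is empty")
    n_detected = sum(1 for v in ntpm.values() if v >= DETECTED)
    share = n_detected / len(ntpm)
    if n_detected == 0:
        return "not detected"
    if n_detected == len(ntpm):
        return "detected in all"
    if n_detected == 1:
        return "detected in single"
    if share >= MANY_SHARE:
        return "detected in many"
    return "detected in some"


print(classify_distribution({"testis": 12.0, "liver": 0.0, "brain": 0.3}))
```

Because the categories are defined on the share of tissues, the same gene can move between "some" and "many" if the tissue panel changes size.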
Origin of the intracellular and membrane proteins
The tissue-enriched proteins can be further assigned to the tissue with the highest expression. Differences compared with the same analysis for the whole proteome could reveal tissues in which proteins with a false prediction are overrepresented, which could in turn point to tissue-specific characteristics. The number of tissue-elevated intracellular and membrane-bound proteins in the predicted secretome is highest in the brain, as for the whole proteome. The second-highest number, however, is found in the intestine, and most of these are cell surface proteins; such proteins might be more likely than others to be predicted as secreted.
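One way to make the comparison described above concrete is to assign each gene to its highest-expressing tissue and compare the per-tissue shares with those of the whole proteome. This is a hypothetical sketch, not the analysis used here: the ratio of shares is an assumed measure, ties go to the first tissue encountered, and the example counts are made up.

```python
from collections import Counter


def top_tissue(ntpm: dict[str, float]) -> str:
    """Tissue with the highest expression (first one wins on ties)."""
    return max(ntpm, key=ntpm.get)


def count_by_top_tissue(profiles: dict[str, dict[str, float]]) -> Counter:
    """Number of genes assigned to each tissue, given gene -> (tissue -> nTPM)."""
    return Counter(top_tissue(profile) for profile in profiles.values())


def relative_enrichment(subset: Counter, proteome: Counter) -> dict[str, float]:
    """Share of subset genes per tissue divided by the same share in the proteome.

    Values above 1 mean the tissue is overrepresented in the subset.
    """
    n_sub = sum(subset.values())
    n_all = sum(proteome.values())
    return {
        tissue: (subset[tissue] / n_sub) / (proteome[tissue] / n_all)
        for tissue in subset
        if proteome[tissue] > 0
    }


# Made-up counts for illustration only.
subset = Counter({"brain": 2, "intestine": 1})
proteome = Counter({"brain": 6, "intestine": 2, "liver": 2})
print(relative_enrichment(subset, proteome))
```

In this toy example the intestine gets a higher ratio than the brain even though the brain has more genes in absolute terms, which is the kind of difference that could flag a tissue where mispredicted proteins accumulate.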