The human secretome and membrane proteome


Approximately 39% of the 19628 human protein-coding genes are predicted to have a signal peptide and/or at least one transmembrane region, suggesting either active transport of the corresponding protein out of the cell (secretion) or location in one of the numerous membrane systems of the cell. Interestingly, several genes code for multiple protein isoforms (splice variants) with alternative locations, including 675 genes with both secreted and membrane-bound isoforms. In total, 2918 genes (15%) are predicted to have at least one secreted protein product, while 5455 (28%) are predicted to have at least one membrane-bound protein product. The remaining 11930 genes (61%) are predicted to be intracellular, i.e. to have no secreted or membrane-bound protein product, and their proteins are most likely located in the cytoplasm and/or nucleus. Figure 1 shows the number of protein-coding genes in each category for all 19628 genes.

Figure 1. The number of human protein-coding genes predicted to be (1) intracellular, (2) membrane-spanning, (3) secreted and (4) both membrane-spanning and secreted. The last group consists of genes with multiple splice variants, of which at least one is secreted and at least one is membrane-spanning.
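
The counts quoted above are related by simple inclusion-exclusion, since the 675 genes with both secreted and membrane-bound isoforms are counted in both the secreted and the membrane totals. The following is a minimal sketch of that arithmetic using only the numbers given in the text. The function names and the split into "membrane only" and "secreted only" groups are illustrative assumptions, not part of the original analysis.

```python
# Gene counts quoted in the text above.
TOTAL_GENES = 19628
SECRETED = 2918   # genes with at least one secreted isoform
MEMBRANE = 5455   # genes with at least one membrane-bound isoform
BOTH = 675        # genes with both secreted and membrane-bound isoforms


def category_counts(total, secreted, membrane, both):
    """Split genes into four disjoint groups by inclusion-exclusion."""
    either = secreted + membrane - both  # genes counted once each
    return {
        "intracellular": total - either,
        "membrane_only": membrane - both,
        "secreted_only": secreted - both,
        "membrane_and_secreted": both,
    }


def percent(count, total):
    """Percentage of the total, rounded to the nearest integer."""
    return round(100 * count / total)


counts = category_counts(TOTAL_GENES, SECRETED, MEMBRANE, BOTH)
for name, n in counts.items():
    print(f"{name}: {n} genes ({percent(n, TOTAL_GENES)}%)")

# Genes with at least one secreted or membrane-bound isoform.
either = TOTAL_GENES - counts["intracellular"]
print(f"secreted or membrane: {either} genes ({percent(either, TOTAL_GENES)}%)")
```

Under these assumptions the four groups add up to the full set of 19628 genes, and the intracellular group reproduces the 11930 genes stated in the text.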



The importance of secreted and membrane-bound proteins


Proteins that are secreted from the cell or located in the cellular membranes play a crucial role in many physiological and pathological processes. Medically important secreted proteins include cytokines, coagulation factors, growth factors and other signaling molecules. The functions of membrane proteins are diverse and include ion channel activity, transport of other molecules across the membrane, enzymatic processes, anchoring of other proteins and receptor signaling. A large fraction of the clinically approved treatment regimens today use drugs directed towards (or consisting of) secreted proteins or cell surface-associated membrane proteins. Out of the 646 protein targets with known pharmacological action for approved drugs currently on the market, 157 are secreted and 379 are membrane-bound. See the Druggable Proteome page for more details.

What is a secreted protein?


A secretory protein can be defined as a protein which is actively transported out of the cell. In humans, cells such as endocrine cells and B-lymphocytes are specialized in the secretion of proteins, but all cells in the body secrete proteins to a varying degree. In addition to being a rich source of new therapeutics and drug targets, secreted proteins are the target of a large fraction of the blood diagnostic tests used in the clinic, emphasizing the importance of this class of proteins for medicine and biology. The most abundant secreted proteins include pancreatic enzymes (PRSS1, CELA3A, AMY2A) and other digestive enzymes expressed in salivary gland (PRR4, STATH, ZG16B) or stomach (PGA3, PGA4). One of the most important secretory organs is the liver, which produces a large number of plasma proteins such as albumin, fibrinogen and transferrin. Another group of highly abundant secreted proteins belongs to the defensin family and is secreted by hematopoietic cells in the bone marrow (DEFA1, DEFA1B, DEFA3).

Figure 2. Immunohistochemistry-based images of the secreted proteins CELA3A (chymotrypsin-like elastase family, member 3A) in pancreas, CPA1 (carboxypeptidase A1) in pancreas and AMY1B (amylase alpha 1B) in salivary gland.



What is a membrane protein?


Membrane proteins constitute one of the largest and most important classes of proteins. A membrane protein is associated with or attached to the membrane of a cell or of an organelle inside the cell, and can be classified as either peripheral or integral. Peripheral membrane proteins are bound either to peripheral regions of the membrane or to integral membrane proteins, but they do not fully span the membrane. Integral membrane proteins contain hydrophobic alpha-helical or beta-barrel structures that span the entire lipid bilayer and are linked by extra-membranous loop regions. The alpha-helical integral membrane proteins form the major category of membrane proteins, are found in all types of biological membranes, and are the main focus here. Their key roles as transporters and receptors explain why they represent approximately 59% of all currently approved drug targets, and hence their immense importance for the pharmaceutical industry. Many important receptors and cell surface molecules are found in the list of human cell differentiation molecules (CD markers). G-protein coupled receptors (GPCRs), which contain seven transmembrane (TM) segments and are encoded by approximately 800 of the human protein-coding genes, comprise the largest group of membrane protein drug targets.

Figure 3. Different classes of membrane proteins.



Figure 4. Immunohistochemistry-based images of the CD marker C5AR1 in gall bladder, the G-protein coupled receptor CYSLTR2 in placenta and DSC2 in esophagus.



Prediction of transmembrane protein topology and signal peptides


Developing a better understanding of membrane protein structure and function is of immense importance for both biological and pharmacological purposes. Since membrane proteins are difficult to crystallize and are severely underrepresented in structural databases, computational prediction of membrane protein structure has been crucial for continued studies of these key molecules. Most membrane protein prediction methods have focused on the topology of alpha-helical membrane proteins, i.e. the prediction of the position of the transmembrane (TM) segments in the protein sequence and their orientation relative to the membrane (Figure 5).

Figure 5. A schematic view of the topology of an alpha-helical membrane protein with four transmembrane segments and extracellular N- and C-termini.

The TM segments are identified based on features such as length, amino acid properties and hydrophobicity, and many prediction methods are based on machine-learning techniques. Here, a selection of seven prediction algorithms was used to create a majority decision-based method (MDM), which combines the results from the chosen tools to estimate the human membrane proteome. Each protein with at least one TM segment for which at least four of the seven methods give overlapping predictions is considered a membrane protein. Table 1 shows the number of protein-coding genes predicted by each individual method, as well as by the MDM.

Table 1. Prediction of the human membrane proteome by seven different prediction methods for membrane protein topology as well as the majority decision-based method MDM and a method specialized in prediction of GPCRs.

Protein class                               Number of genes   Number of proteins   Source
Predicted membrane proteins                 5455              15118                MDM
MEMSAT3 predicted membrane proteins         7217              20080                MEMSAT3
MEMSAT-SVM predicted membrane proteins      6286              17565                MEMSAT-SVM
Phobius predicted membrane proteins         5753              15803                Phobius
SCAMPI predicted membrane proteins          6332              16821                SCAMPI
SPOCTOPUS predicted membrane proteins       7638              21437                SPOCTOPUS
THUMBUP predicted membrane proteins         7111              19960                THUMBUP
TMHMM predicted membrane proteins           5503              15275                TMHMM
GPCRHMM predicted membrane proteins         827               1300                 GPCRHMM
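
The majority decision described above can be illustrated with a short sketch. The text does not specify how overlap between predicted TM segments is measured, so this example assumes a per-residue vote: a protein is called a membrane protein if any residue is covered by a TM segment from at least four methods. The function name, the input format and the example coordinates are hypothetical.

```python
from collections import Counter


def is_membrane_protein(predictions, min_votes=4):
    """Majority decision over TM topology predictions for one protein.

    predictions maps a method name to a list of (start, end) residue
    ranges (inclusive) predicted as TM segments by that method.
    Assumption: "overlapping" means that at least min_votes methods
    cover the same residue.
    """
    votes = Counter()
    for segments in predictions.values():
        covered = set()
        for start, end in segments:
            covered.update(range(start, end + 1))
        # Each method votes at most once per residue.
        votes.update(covered)
    return any(n >= min_votes for n in votes.values())


# Hypothetical predictions for a single protein sequence.
example = {
    "MEMSAT3": [(30, 50)],
    "MEMSAT-SVM": [(32, 52)],
    "Phobius": [(31, 51)],
    "SCAMPI": [],
    "SPOCTOPUS": [(28, 48)],
    "THUMBUP": [(120, 140)],
    "TMHMM": [],
}
print(is_membrane_protein(example))
```

In this example four methods agree on a segment around residues 32 to 48, so the protein would be counted as a membrane protein even though one method places its only segment elsewhere and two methods predict none.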

The N-terminal signal sequences that are found in most secreted proteins and some types of membrane proteins are often called signal peptides (SP). A signal peptide is primarily identified by a short hydrophobic alpha-helix combined with a number of features that enable computational prediction based on the amino acid sequence of the protein. There are also a number of methods which incorporate an SP prediction model into their TM topology prediction algorithm, to allow for more reliable results when distinguishing between the two features. Here, the human secretome has been predicted by performing a whole-proteome scan using three methods for signal peptide prediction: SignalP 4.0, Phobius and SPOCTOPUS, all of which have been shown to give reliable prediction results in a comparative analysis. Analogous to the MDM, a majority decision-based method for secreted proteins (MDSEC) has been constructed using the results from these three prediction methods. All proteins with an SP predicted by at least two of the three methods are considered secreted. Since signal peptides are found both in secreted proteins and in certain types of membrane proteins, all proteins with a predicted SP in combination with a predicted TM region according to the MDM are considered membrane-spanning and therefore not secreted. The resulting numbers of genes encoding a predicted secreted protein are shown in Table 2.

Table 2. Prediction of the human secretome by three different prediction methods for signal peptides as well as the MDSEC.

Protein class                               Number of genes   Number of proteins   Source
Predicted secreted proteins                 2918              6621                 HPA
SignalP predicted secreted proteins         2504              5719                 SignalP
Phobius predicted secreted proteins         3304              7460                 Phobius
SPOCTOPUS predicted secreted proteins       3705              8045                 SPOCTOPUS
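
The MDSEC rule and its interaction with the MDM can be sketched as a small decision function. This is a minimal illustration of the rules stated above, assuming that each signal peptide predictor yields a yes/no call per protein and that the MDM result is already available as a boolean. The function and parameter names are hypothetical.

```python
def classify_isoform(sp_calls, has_tm_by_mdm, min_sp_votes=2):
    """Assign one protein isoform to a category.

    sp_calls maps a signal peptide predictor name to True/False.
    has_tm_by_mdm is True if the MDM predicts at least one TM region.
    A predicted TM region takes precedence over a predicted SP.
    """
    if has_tm_by_mdm:
        return "membrane"
    if sum(sp_calls.values()) >= min_sp_votes:
        return "secreted"
    return "intracellular"


# Hypothetical predictor output for three isoforms.
print(classify_isoform({"SignalP": True, "Phobius": True, "SPOCTOPUS": False}, False))
print(classify_isoform({"SignalP": True, "Phobius": True, "SPOCTOPUS": True}, True))
print(classify_isoform({"SignalP": False, "Phobius": True, "SPOCTOPUS": False}, False))
```

Checking the TM result first mirrors the statement that a protein with both a predicted SP and a predicted TM region is treated as membrane-spanning rather than secreted.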


Classification of the human proteome


The combined results from the analyses of the membrane proteome and the secretome are used to map the distribution of potential membrane proteins and secreted proteins in the human proteome. The protein isoforms of all human genes are annotated using three categories: (i) secreted, (ii) membrane and (iii) intracellular (i.e., proteins with no predicted SP/TM features). Note that proteins classified as membrane may be located in intracellular membranes such as those of the endoplasmic reticulum or Golgi. Each human protein-coding gene is subsequently classified either as a gene with all isoforms belonging to one of these categories or as a gene encoding protein isoforms belonging to two or all three categories. The results (Figure 6) show that 39% of the predicted human genes have at least one protein isoform which is membrane-spanning or secreted (see top of page).

Figure 6. Venn diagram showing the overlap between the number of genes that are intracellular, membrane-spanning, secreted, or with isoforms belonging to more than one of the three categories.
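
The gene-level classification can be sketched as follows, assuming each gene is represented by the list of categories assigned to its isoforms. The gene identifiers and the label format are hypothetical placeholders, and the small data set is made up purely to show the bookkeeping.

```python
from collections import Counter


def gene_class(isoform_categories):
    """Collapse the categories of a gene's isoforms into one label."""
    return "+".join(sorted(set(isoform_categories)))


def fraction_membrane_or_secreted(genes):
    """Fraction of genes with at least one membrane or secreted isoform."""
    hits = sum(
        1 for categories in genes.values()
        if {"membrane", "secreted"} & set(categories)
    )
    return hits / len(genes)


# Hypothetical genes with per-isoform categories.
genes = {
    "GENE_A": ["intracellular", "intracellular"],
    "GENE_B": ["membrane"],
    "GENE_C": ["secreted", "membrane"],
    "GENE_D": ["intracellular", "secreted"],
    "GENE_E": ["intracellular"],
}

class_counts = Counter(gene_class(c) for c in genes.values())
for label, n in sorted(class_counts.items()):
    print(f"{label}: {n}")
print(f"membrane or secreted: {fraction_membrane_or_secreted(genes):.0%}")
```

Each distinct label corresponds to one region of a three-set Venn diagram, and the final fraction is the toy-data counterpart of the 39% figure reported for the full proteome.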



Examples of protein classes including secreted and membrane proteins


There are a number of important protein classes related to the membrane proteome and the secretome. Some examples of such classes are presented in Table 3.

Table 3. A selection of classes related to the membrane proteome and secretome.

Protein class                               Number of genes   Number of proteins   Source
CD markers                                  374               940                  UniProt
Transporters                                1163              2661                 TCDB
GPCRs excl olfactory receptors              395               742                  UniProt
Voltage-gated ion channels                  132               337                  IUPHAR-DB
Plasma proteins                             3751              8813                 Plasma Proteome Database


The plasma proteome


Plasma is the clear, liquid fraction of the blood which is left when the white blood cells, red blood cells and platelets are removed. It is composed of water (90%), proteins (7-8%) and smaller substances such as salts, gases and nutrients. The most important functions of plasma include transporting compounds needed in different parts of the body, balancing the fluid exchange of all tissues by regulating the osmotic pressure, and playing a large role in immune system function. Most cells in the body communicate with plasma directly or indirectly through other fluids. Analysis of the proteins present in plasma can therefore provide important information about a patient's health.

The plasma proteome has an extraordinary dynamic range, spanning more than 10 orders of magnitude in concentration between the most abundant protein, albumin (ALB), which acts as a transporter and helps maintain colloid osmotic pressure, and the rarest proteins detectable today, which include interleukins and tissue leakage proteins. The ten most abundant proteins make up 90% of the plasma proteome. Along with albumin, they include fibrinogen, which is involved in blood clotting, and immunoglobulins, which are mainly involved in immune processes.

Although many proteins of the plasma proteome are secreted proteins that have gone through the secretory pathway, another group is composed of tissue leakage proteins, which are found within cells but can be released into plasma as a result of cell death or damage. There is also an interesting class of proteins which undergo non-classical secretion without entering the ER/Golgi pathway. These include cytokines such as interleukin 1β (IL1B) and mitogens such as fibroblast growth factor 2 (FGF2).

A list of plasma proteins obtained from the Plasma Proteome Database is also available.

The secretory pathway


In the secretory pathway (Figure 7), proteins with a signal sequence that guides them to the endoplasmic reticulum (ER) are transported from the ER through the Golgi apparatus via vesicles to arrive at the surface of the cell. The signal sequence targeting proteins for secretion, called a signal peptide, is a short, hydrophobic N-terminal sequence which is inserted into the ER membrane and subsequently cleaved off from the protein. Membrane proteins may also contain an SP, but most often the N-terminal transmembrane (TM) region functions as the signal sequence. The ER signal sequences are recognized by chaperone proteins which guide the synthesizing ribosomes to the rough ER, where translocation of the protein sequence occurs in a protein complex named the translocon. Membrane proteins are transferred to the lipid bilayer of the ER membrane via the translocon, whereas secretory proteins are transported into the ER lumen. Once the protein is inside the ER lumen, other chaperone proteins make sure that it is folded and assembled correctly, and the oxidative environment allows for formation of disulfide bonds, addition of carbohydrates and proteolytic cleavages. The proteins that pass the ER quality control are transported via vesicles to the Golgi apparatus, where they are further modified in important processes such as glycosylation and phosphorylation. The Golgi is also responsible for sorting proteins for transport to their final destination, which most often is the plasma membrane, the lysosomes or secretion from the cell.

Figure 7. Overview of the secretory pathway.



Relevant links and publications


Uhlén M et al, 2015. Tissue-based map of the human proteome. Science
PubMed: 25613900 DOI: 10.1126/science.1260419