An introduction to the human secretome
A secretory protein can be defined as a protein that is actively transported out of the cell. In humans, cells such as endocrine cells and B-lymphocytes are specialized in the secretion of proteins, but all cells in the body secrete proteins to varying degrees. Proteins that are secreted from the cell play a crucial role in many physiological, developmental and pathological processes and are important for both intercellular and intracellular communication. The functions of secreted proteins are diverse, ranging from exocrine secretion, such as enzymes in the digestive tract, to endocrine secretion, such as insulin and other hormones released into the bloodstream. In addition to being a rich source of new therapeutics and drug targets, secreted proteins are the target of a large fraction of the blood diagnostic tests used in the clinic, which emphasizes the importance of this class of proteins for medicine and biology.
Predicting secreted proteins
Secreted proteins were predicted based on N-terminal signal sequence (signal peptide, SP) predictions and transmembrane (TM) region predictions, and in part according to subcellular location in UniProt. Signal peptides are found in most secreted proteins and in some types of membrane proteins, so the presence of transmembrane regions was used to filter out the membrane proteins from the secreted proteins. A whole-proteome scan of all Ensembl transcripts was performed using three methods for signal peptide prediction (SignalP4.0, Phobius and SPOCTOPUS), all of which have been shown to give reliable prediction results in a comparative analysis, together with a majority decision-based method (MDM) based on seven methods for membrane protein topology prediction (Fagerberg L et al, 2010). All proteins with an SP predicted by at least two of the three methods and no TM region predicted by the MDM are considered secreted. The number of Ensembl genes encoding at least one predicted secreted protein is 3207.
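The decision rule described above can be sketched in a few lines. This is only an illustration, assuming the per-method predictions have already been computed elsewhere; the function name is_predicted_secreted and its boolean inputs are hypothetical, and the code does not call SignalP, Phobius, SPOCTOPUS or the MDM itself.

```python
def is_predicted_secreted(signalp: bool, phobius: bool, spoctopus: bool,
                          mdm_has_tm: bool) -> bool:
    """Apply the two-out-of-three signal peptide vote plus the TM filter.

    signalp, phobius, spoctopus: True if that method predicts a signal peptide.
    mdm_has_tm: True if the majority decision-based method predicts at least
    one transmembrane region (hypothetical, precomputed input).
    """
    sp_votes = sum([signalp, phobius, spoctopus])  # number of methods predicting an SP
    return sp_votes >= 2 and not mdm_has_tm


# A protein with an SP called by two methods and no TM region counts as secreted.
print(is_predicted_secreted(True, True, False, mdm_has_tm=False))  # True
# The same SP calls with a predicted TM region are filtered out as membrane proteins.
print(is_predicted_secreted(True, True, False, mdm_has_tm=True))   # False
```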
Defining the secretome
The human secretome is here defined as all Ensembl genes with at least one predicted secreted transcript according to HPA predictions, together with genes not predicted as secreted by HPA whose UniProt entries have at least one form or isoform with the keyword "Secreted". In total, the human secretome set consists of 2793 genes (14% of all human protein-coding genes).
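The definition above amounts to a gene-level union of two sources. The following is a minimal sketch under stated assumptions: the function build_secretome, the input shapes and the gene identifiers are hypothetical and chosen only for illustration.

```python
def build_secretome(transcript_calls, uniprot_secreted_genes):
    """Combine HPA predictions and UniProt annotation at the gene level.

    transcript_calls: iterable of (gene_id, is_secreted) pairs, one per transcript.
    uniprot_secreted_genes: gene IDs whose UniProt entry has at least one
    form or isoform with the keyword "Secreted".
    """
    # A gene qualifies through HPA if at least one of its transcripts is predicted secreted.
    hpa_genes = {gene for gene, secreted in transcript_calls if secreted}
    # Genes missed by the prediction but annotated in UniProt are added by the union.
    return hpa_genes | set(uniprot_secreted_genes)


# Made-up identifiers: GENE_A has one secreted transcript, GENE_C is UniProt-only.
calls = [("GENE_A", False), ("GENE_A", True), ("GENE_B", False)]
print(sorted(build_secretome(calls, {"GENE_C"})))  # ['GENE_A', 'GENE_C']
```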
Figure 1. Categorisation of the 2793 secretome genes.
Categorising the secretome
The secretome genes were subsequently annotated individually based on literature, subcellular localisation data, function, and protein and RNA expression data from different sources including HPA, UniProt, GTEx and FANTOM, in order to determine their involvement in local or systemic secretion and to investigate their spatial distribution in the human body. The results are shown in Figure 1. In summary, 784 proteins were identified as blood proteins, a fraction of which have another main location, and 525 were annotated as secreted to local compartments, such as male or female tissues, the brain, or other local tissues such as the eye or the skin. A further 92 proteins were identified as secreted to the digestive system, and 237 proteins are involved in the formation and function of the extracellular matrix, including laminins, collagens, elastin and fibronectin. Another 890 proteins were annotated as intracellular or membrane-bound, including ER/Golgi-residing proteins, mitochondrial proteins, lysosomal proteins and membrane-associated proteins. For 117 genes, the protein is believed to be secreted but its final location is unknown; this may be an interesting group to investigate further with respect to function and location. The last category comprises the 148 predicted secreted genes encoding some of the constant, variable, joining and diversity regions of immunoglobulin genes. After excluding the genes annotated as intracellular or membrane-bound, we suggest that the human secretome consists of 1903 genes having at least one secreted protein variant.
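As a quick consistency check, the category counts quoted above can be summed to confirm that they add up to the 2793 secretome genes and that removing the intracellular and membrane-bound group leaves 1903. The dictionary keys are shorthand labels chosen for this sketch, not official category names.

```python
# Category counts as stated in the text; the labels are informal shorthand.
category_counts = {
    "blood": 784,
    "local compartments": 525,
    "digestive system": 92,
    "extracellular matrix": 237,
    "intracellular or membrane-bound": 890,
    "secreted, location unknown": 117,
    "immunoglobulin genes": 148,
}

total = sum(category_counts.values())
secreted_only = total - category_counts["intracellular or membrane-bound"]

print(total)          # 2793
print(secreted_only)  # 1903
```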
Figure 2. The tissue specificity of genes belonging to the different secretome categories. Categories include: tissue enriched, defined as mRNA level in one tissue at least four-fold higher than in all other tissues; group enriched, defined as four-fold higher average mRNA level in a group of two to five tissues compared to all other tissues; tissue enhanced, defined as four-fold higher average mRNA level in one or more tissues compared to the mean mRNA level of all tissues; expressed in all, defined as ≥ 1 nTPM in all tissues; and not detected, defined as < 1 nTPM in all tissues.
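The category definitions in the caption can be turned into a simple classifier. The sketch below is one possible reading, not the HPA implementation: it assumes that "all other tissues" means every remaining tissue individually, that categories are tested in the order listed, and that a gene matching none of them is returned as "unclassified" (a hypothetical fallback label). The function name classify_specificity and its parameters are likewise assumptions.

```python
from statistics import mean


def classify_specificity(ntpm, fold=4.0, detect=1.0):
    """Assign a tissue specificity category from a dict of tissue -> nTPM."""
    if not ntpm:
        raise ValueError("at least one tissue is required")
    values = sorted(ntpm.values(), reverse=True)  # highest expression first

    # Not detected: below the detection cutoff in every tissue.
    if values[0] < detect:
        return "not detected"
    # Tissue enriched: top tissue at least `fold` times every other tissue.
    if len(values) > 1 and values[0] >= fold * values[1]:
        return "tissue enriched"
    # Group enriched: mean of the top 2 to 5 tissues at least `fold` times
    # the highest remaining tissue.
    for k in range(2, 6):
        if k >= len(values):
            break
        if mean(values[:k]) >= fold * values[k]:
            return "group enriched"
    # Tissue enhanced: top tissue at least `fold` times the mean of all tissues.
    if values[0] >= fold * mean(values):
        return "tissue enhanced"
    # Expressed in all: at or above the detection cutoff everywhere.
    if values[-1] >= detect:
        return "expressed in all"
    return "unclassified"  # hypothetical fallback, not a category from the caption


print(classify_specificity({"liver": 100, "kidney": 5, "brain": 2}))  # tissue enriched
```

The ordering matters because the definitions overlap: a gene detected in every tissue can still be tissue enriched, so the broad "expressed in all" test is applied last.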