The human isoform proteome
The structural space of the human proteome is large and diverse because each gene can give rise to many protein variants (isoforms) through post-translational modifications, alternative splicing, proteolytic processing, genetic variation and somatic recombination. For example, there are tens of millions of different IgG molecules in the human body at any given time as a result of an elaborate process of somatic recombination and targeted mutation. In addition, a large portion of the protein-coding genes (approximately 80%) have splice variants that yield protein products of different sizes. Similarly, hundreds of thousands of post-translational modifications have been reported by various proteomics efforts, and many proteins depend on precise proteolysis for activation. Furthermore, approximately 320000 variations between individuals in the population have been reported in protein-coding regions by the 1000 Genomes Project. In summary, the diversity of the protein products of the 20162 human protein-coding genes is increased immensely by the presence of numerous protein isoforms.
Alternative splicing is a widely used mechanism for the formation of isoforms. In this process, which occurs during gene expression, individual exons of a gene may be included in or excluded from the processed mRNA. Proteins translated from alternatively spliced mRNAs therefore differ in their amino acid sequence, and often in their functional properties as well.
The four major subtypes of alternative splicing:
Figure 1. The major types of alternative splicing.
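The inclusion or exclusion of exons described above can be illustrated with a minimal Python sketch. It covers exon skipping only, and everything in it is an assumption made for illustration: the toy exon sequences, the choice of which exons are optional, the partial codon table, and the helper names `splice_isoforms` and `translate` do not come from any real gene or library. The toy exon lengths are multiples of three so that skipping an exon keeps the reading frame.

```python
from itertools import product

# Partial codon table covering only the codons of the toy gene below (assumption).
CODON_TABLE = {
    "ATG": "M", "GCC": "A", "GAA": "E", "ACC": "T",
    "TTT": "F", "GGA": "G", "CTG": "L", "TAA": "*",
}

def splice_isoforms(exons, optional):
    """Return every mRNA obtained by including or skipping each optional exon."""
    optional_idx = sorted(optional)
    mrnas = []
    for choice in product([True, False], repeat=len(optional_idx)):
        # Exons whose flag is False are skipped in this isoform.
        skipped = {i for i, keep in zip(optional_idx, choice) if not keep}
        mrnas.append("".join(seq for i, seq in enumerate(exons) if i not in skipped))
    return mrnas

def translate(mrna):
    """Translate an mRNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical four-exon gene; exons 1 and 2 (0-based) may be skipped.
exons = ["ATGGCC", "GAAACC", "TTTGGA", "CTGTAA"]
isoforms = splice_isoforms(exons, optional={1, 2})
proteins = [translate(mrna) for mrna in isoforms]
print(proteins)  # ['MAETFGL', 'MAETL', 'MAFGL', 'MAL']
```

In this toy case one gene yields four protein products of different lengths, which mirrors how splice variants produce proteins of different sizes.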
Many genes encode multiple protein isoforms (splice variants) with alternative subcellular locations, including 226 genes with both secreted and membrane-bound isoforms. These genes are of particular interest. In Figure 2, the fractions of the various categories are shown for all 20162 genes.
Figure 2. Venn diagram showing the overlap between the number of genes that are intracellular, membrane-spanning, secreted, or with isoforms belonging to more than one of the three categories.
Post-translational modifications (PTMs) are chemical modifications that play a key role in the function of a protein, since they regulate its activity, localization and interaction with other cellular molecules such as proteins, nucleic acids, lipids and cofactors. Through these effects they can also regulate cellular activity. PTMs occur at distinct amino acid side chains or peptide linkages and are most often mediated by enzymatic activity. Post-translational modification can occur at any step in the "life cycle" of a protein.
Some common and important types of PTMs:
Other common post-translational modifications include S-nitrosylation, methylation, N-acetylation, lipidation, disulfide bond formation, sulfation, acylation and deamination.
After translation, some proteins undergo proteolytic processing. This process is highly specific: proteases cleave one or more bonds in the target protein, which alters the activity of the protein.
A large number of proteins are synthesized as inactive precursors, so-called proproteins. To activate these proteins, the propeptide must be removed by proteolytic processing. Proteolysis of precursor proteins thereby regulates many cellular processes. Well-studied proteins that undergo this process are insulin (INS) and factor VIII (F8).
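Removal of a propeptide can be pictured as cutting a sequence at a defined position. The following Python sketch is only an illustration under stated assumptions: the precursor sequence is a made-up toy string rather than the sequence of insulin or factor VIII, the cleavage position is chosen arbitrarily, and `cleave` is a hypothetical helper, not part of any library.

```python
def cleave(precursor, cut_after):
    """Split a sequence after each 1-based residue position listed in cut_after."""
    fragments = []
    start = 0
    for position in sorted(cut_after):
        fragments.append(precursor[start:position])
        start = position
    fragments.append(precursor[start:])
    return fragments

# Toy precursor (assumption): the first 8 residues play the role of the propeptide.
precursor = "MKTLLAGRRFVNQHLC"
propeptide, mature = cleave(precursor, cut_after=[8])
print(propeptide)  # MKTLLAGR
print(mature)      # RFVNQHLC
```

Passing several positions to `cut_after` models precursors that are cut at more than one site and yield several fragments.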
Although all humans are almost identical at the genetic level (99.9%), there are still many differences between individuals in the population as a result of allele-specific genetic variation. Most of these variations lie in non-coding regions of the genome, but some change amino acids in the protein-coding part of a gene. Based on the 1000 Genomes Project, approximately 17800 genes have been described with genetic variations that yield protein isoforms.
Somatic recombination is a mechanism of genetic recombination that is unique to the immunoglobulin and T-cell receptor genes. This process produces highly diverse immunoglobulins and T-cell receptors.
Relevant links and publications