The human isoform proteome
The structural space of the human proteome is large and diverse because each protein can exist in several variants (isoforms) that arise through post-translational modification, alternative splicing, proteolytic processing, genetic variation and somatic recombination. For example, tens of millions of different IgG molecules are present in a human body at any given time as a result of an elaborate process of somatic recombination and targeted mutation. In addition, a large portion of the protein-coding genes (approximately 80%) have splice variants that yield protein products of different sizes. Similarly, hundreds of thousands of post-translational modifications have been reported by various proteomics efforts, and many proteins depend on precise proteolysis for activation. Furthermore, approximately 320000 variations between individuals in the population have been reported in protein-coding regions by the 1000 Genomes Project. In summary, the diversity of the proteins encoded by the 19613 human protein-coding genes is increased immensely by the presence of numerous protein isoforms.
Alternative splicing is a widely used mechanism for the formation of isoforms. In this process, which occurs during gene expression, individual exons of a gene may be included in or excluded from the processed mRNA. Proteins translated from alternatively spliced mRNAs therefore differ in their amino acid sequence and often in their functional properties as well.
The four major subtypes of alternative splicing:
- Exon skipping (Cassette exons) is the most prevalent form of alternative splicing. In this mode, the exon is spliced out of the primary transcript together with its flanking introns.
- Alternative donor site is the mode in which two or more splice sites are recognized at the 3' end of an exon, so that the 3' boundary of that exon varies. This mode is also called alternative 5' splice site.
- Alternative acceptor site is the mode in which two or more splice sites are recognized at the 5' end of an exon, so that the 5' boundary of that exon varies. This mode is also called alternative 3' splice site.
- Intron retention is the mode in which an intron can remain in the mature mRNA molecule.
Figure 1. The major types of alternative splicing.
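The four modes listed above can be sketched on a toy transcript. The following Python example is a minimal illustration only: the exon and intron sequences, the function names and the trim lengths are all invented for this sketch and do not correspond to any real gene or splicing tool.

```python
# Toy pre-mRNA: exons in upper case, introns in lower case.
# All sequences are invented for illustration (hypothetical data).
EXONS = ["AUGGCU", "GAUCCA", "UUGUAA"]
INTRONS = ["guaaguag", "guccucag"]  # INTRONS[i] lies between EXONS[i] and EXONS[i + 1]


def constitutive(exons):
    """Join all exons in order: the reference mRNA."""
    return "".join(exons)


def exon_skipping(exons, skip):
    """Leave out the cassette exon at index `skip`."""
    return "".join(exon for i, exon in enumerate(exons) if i != skip)


def alternative_donor(exons, exon_index, trim):
    """Use a donor site `trim` bases upstream, shortening the 3' end of the exon."""
    out = list(exons)
    out[exon_index] = out[exon_index][: len(out[exon_index]) - trim]
    return "".join(out)


def alternative_acceptor(exons, exon_index, trim):
    """Use an acceptor site `trim` bases downstream, shortening the 5' end of the exon."""
    out = list(exons)
    out[exon_index] = out[exon_index][trim:]
    return "".join(out)


def intron_retention(exons, introns, retain):
    """Keep the intron at index `retain` in the mature mRNA."""
    parts = []
    for i, exon in enumerate(exons):
        parts.append(exon)
        if i == retain:
            parts.append(introns[i])
    return "".join(parts)


if __name__ == "__main__":
    print("constitutive        ", constitutive(EXONS))
    print("exon skipping       ", exon_skipping(EXONS, 1))
    print("alternative donor   ", alternative_donor(EXONS, 0, 3))
    print("alternative acceptor", alternative_acceptor(EXONS, 1, 3))
    print("intron retention    ", intron_retention(EXONS, INTRONS, 0))
```

Each function returns a different mRNA from the same set of exons, which is the sense in which one gene can give rise to several protein isoforms.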
Many genes encode multiple protein isoforms (splice variants) with alternative subcellular locations, including 684 genes with both secreted and membrane-bound isoforms. These genes are of particular interest. Figure 2 shows the fractions of the various categories for all 19613 genes.
Figure 2. Venn diagram showing the number of genes whose protein products are intracellular, membrane-spanning or secreted, and the overlap formed by genes with isoforms belonging to more than one of the three categories.
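The grouping behind such a diagram can be sketched with plain sets. The Python example below is a minimal illustration, assuming each gene comes with a list of per-isoform location labels; the gene names, the label strings and both function names are hypothetical and are not taken from the underlying dataset.

```python
# Hypothetical per-isoform location labels for a few made-up genes.
ISOFORM_LOCATIONS = {
    "GENE_A": ["intracellular", "intracellular"],
    "GENE_B": ["secreted", "membrane"],
    "GENE_C": ["membrane"],
    "GENE_D": ["secreted", "intracellular", "membrane"],
}


def classify_gene(locations):
    """Return the single location shared by all isoforms, or 'multiple'."""
    categories = set(locations)
    if not categories:
        raise ValueError("a gene needs at least one annotated isoform")
    if len(categories) > 1:
        return "multiple"
    (only,) = categories
    return only


def has_secreted_and_membrane(locations):
    """True if the gene has both a secreted and a membrane-bound isoform."""
    return {"secreted", "membrane"} <= set(locations)


if __name__ == "__main__":
    for gene, locations in ISOFORM_LOCATIONS.items():
        print(gene, classify_gene(locations), has_secreted_and_membrane(locations))
```

In this toy data, GENE_B and GENE_D would fall into the group of genes with both secreted and membrane-bound isoforms.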
Post-translational modifications (PTMs) are chemical modifications that play a key role in the function of a protein, since they regulate its activity, its localization and its interaction with other cellular molecules such as proteins, nucleic acids, lipids and cofactors. PTMs can also regulate cellular activity. They occur at distinct amino acid side chains or peptide linkages and are most often mediated by enzymatic activity. Post-translational modification can occur at any step in the "life cycle" of a protein.
Some common and important types of PTMs:
- Glycosylation: addition of sugar chains, either at the amide nitrogen on the side chain of asparagine (N-glycosylation) or at the hydroxyl oxygen on the side chain of serine or threonine (O-glycosylation). The list of glycoproteins is long and they serve a number of different functions, for example in the immune response (immunoglobulin family), as structural molecules (collagen family), hormones (HCG, TSH, EPO), transport molecules (transferrin), enzymes (alkaline phosphatase) and receptors.
- Phosphorylation: addition of a phosphate group, usually to tyrosine, serine, threonine, histidine or aspartate. This modification is reversible and can, for example, activate or inactivate enzymes and receptors. Classical examples where phosphorylation plays a very important role are the regulation of the p53 tumor suppressor protein and of proteins in various signaling pathways, such as the RAS and STAT pathways.
- Ubiquitination: addition of ubiquitin can mark a protein for degradation, alter its cellular location, or affect its activity or interactions.
Other common post-translational modifications include S-nitrosylation, methylation, N-acetylation, lipidation, disulfide bond formation, sulfation, acylation and deamination.
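Because PTMs occur at specific residues, candidate sites can sometimes be located directly from the amino acid sequence. The Python sketch below scans for the N-glycosylation sequon N-X-S/T, where X is any residue except proline. This sequon rule is general biochemistry background rather than something stated in the text above, a match only marks a potential site, and the function name and test peptides are made up for illustration.

```python
import re

# Asparagine followed by any residue except proline, then serine or threonine.
# The lookahead keeps the match one residue long, so overlapping sequons are all found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein_sequence):
    """Return 0-based positions of asparagines in an N-X-S/T sequon (X is not P)."""
    return [match.start() for match in SEQUON.finditer(protein_sequence.upper())]


if __name__ == "__main__":
    # Hypothetical peptide: NGS and NKT match, NPT does not (proline at X).
    print(find_sequons("MNGSANPTNKT"))
```

A sequence match of this kind says nothing about whether the site is actually glycosylated in the cell; that depends on the enzymatic machinery and on the accessibility of the residue.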
After translation, some proteins undergo proteolytic processing. This process is highly specific: proteases cleave one or more bonds in the target protein, and the activity of the protein is altered as a result.
A large number of proteins are synthesized as inactive precursors, so-called proproteins. To activate these proteins, the propeptide must be removed by proteolytic processing. Proteolysis of precursor proteins thereby regulates many cellular processes. Well-studied proteins that undergo this process are insulin (INS) and factor VIII (F8).
Although all humans are almost identical genetically (99.9%), there are large variations between individuals in the population as a result of allele-specific genetic variations in the protein-coding regions. Many genetic variations lie in non-coding regions of the genome, but some affect the amino acids encoded by the protein-coding parts of a particular gene. Based on the 1000 Genomes Project, approximately 17800 genes have been described with genetic variations that yield protein isoforms.
Somatic recombination is a mechanism of genetic recombination that is unique to the immunoglobulin and T-cell receptor genes. This process produces immunoglobulins and T-cell receptors of high diversity.
Relevant links and publications
Uhlén M et al, 2015. Tissue-based map of the human proteome. Science
PubMed: 25613900 DOI: 10.1126/science.1260419