The gastrointestinal tract-specific proteome

The main function of the gastrointestinal tract (GIT) is the uptake of nutrients and water, providing the foundation for the continued construction and maintenance of the human body. The intake of food regulates the secretion of various hormones from the GIT, including factors involved in appetite control and mood management. Another major challenge for the GIT is protecting against pathogens while at the same time maintaining homeostasis among the vast population of beneficial microorganisms in the gut. The GIT stretches from the oral cavity through the esophagus, stomach, small intestine and large intestine to the rectum.

The transcriptome analysis shows that 75% of all human proteins (n=19628) are expressed in at least one region of the GIT. Only 1% of all genes were classified as tissue- or group-enriched, which includes genes primarily expressed in one or several of the GIT tissues and in up to six additional tissues. The functions of the proteins that are specifically elevated in the GIT are well in line with the features and functions of its different anatomical and functional regions and include nutrient breakdown, transport and metabolism, the entero-endocrine system, host protection and the maintenance of tissue morphology.

The composite GIT analyses include stomach, duodenum, small intestine and colon. The specific proteomes of the salivary gland and the esophagus are presented separately.

  • 35 GIT-enriched genes
  • Highest number of single tissue enriched genes in the stomach
  • 623 genes defined as elevated in the GIT

Figure 1. The distribution of all genes across the five categories based on transcript abundance in GIT as well as in all other tissues.

623 genes show some level of elevated expression in the GIT compared to other tissues. The four categories of genes with elevated expression in the GIT compared to other organs are shown in Table 1. The list of tissue-enriched (n=35) and group-enriched genes (n=226) is well in line with the various functions of the GIT.

Table 1. The genes with elevated expression in GIT.

Category         Tissue           Number of genes  Description
Tissue enriched  stomach          23               At least five-fold higher mRNA levels in stomach
Tissue enriched  duodenum         5                At least five-fold higher mRNA levels in duodenum
Tissue enriched  small intestine  6                At least five-fold higher mRNA levels in small intestine
Tissue enriched  colon            1                At least five-fold higher mRNA levels in colon
Group enriched   -                226              Higher mRNA levels in a group of tissues including at least one GIT tissue
Tissue enhanced  -                362              At least five-fold higher mRNA levels in a particular tissue as compared to average levels in all tissues
Total            -                623              Total number of elevated genes in GIT
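The thresholds in Table 1 can be written down as a simple rule. The following is a minimal sketch in Python, not the pipeline used for the atlas: the function name `classify_gene`, the input layout (one dictionary of mRNA levels per gene) and the choice to compare the top tissue against the highest level among all other tissues are assumptions made for illustration.

```python
from statistics import mean
from typing import Optional

FOLD = 5.0  # five-fold threshold taken from Table 1


def classify_gene(levels: dict[str, float]) -> tuple[str, Optional[str]]:
    """Return (category, tissue) for one gene given mRNA levels per tissue.

    Assumption: "tissue enriched" compares the top tissue with the highest
    level among all other tissues; "tissue enhanced" compares it with the
    average level over all tissues.
    """
    top = max(levels, key=levels.get)
    top_level = levels[top]
    if top_level <= 0:
        return ("not detected", None)
    others = [v for t, v in levels.items() if t != top]
    if others and top_level >= FOLD * max(others):
        return ("tissue enriched", top)
    if top_level >= FOLD * mean(levels.values()):
        return ("tissue enhanced", top)
    return ("not elevated", None)
```

Group enrichment is not covered by this sketch, since it requires testing combinations of tissues rather than a single top tissue.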

161 of the 226 group enriched genes have higher mRNA levels in a group of tissues including more than one GIT tissue.

Table 2. The 12 genes with the highest level of tissue type-specific expression in the GIT.

Gene   Tissue (Group)             Tissue-specific score  Stomach  Duodenum  Small intestine  Colon
PGA4   Stomach                    4166                   27159.5  0         0                0
GIF    Stomach                    1634                   1264.8   0         0                0
GKN1   Stomach                    1194                   31692.5  0         0                0
LCT    Duodenum, Small intestine  640                    0        229.4     146.2            0
PGA5   Stomach                    590                    2980.0   0         0                0
DEFA6  Duodenum, Small intestine  521                    0        5909.3    10744.7          0
PGA3   Stomach                    480                    35557.1  0         0                0
DEFA5  Duodenum, Small intestine  462                    0        10985.4   19506.1          0
CCL25  Duodenum, Small intestine  320                    0        544.3     667.7            0
ATP4A  Stomach                    250                    537.7    0         0                0
RBP2   Duodenum, Small intestine  207                    0        2824.3    2085.7           0
GAST   Stomach                    197                    11343.0  0         0                0

The GIT transcriptome

An analysis of the expression levels of each gene makes it possible to calculate the relative mRNA pool for each of the categories. The analysis showed that 76% of the mRNA molecules in the GIT corresponded to housekeeping genes and only 15% of the mRNA pool represented genes categorized as tissue-enriched, group-enriched or enhanced in at least one tissue of the GIT. Thus, most of the transcriptional activity in the GIT relates to proteins with presumed housekeeping functions, as they are found in all tissues and cells analyzed.
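The relative mRNA pool of a category is the summed expression of its genes divided by the summed expression of all genes. A minimal sketch of that calculation is given below; the function name `mrna_pool_fractions`, the gene identifiers and the expression values are invented for illustration and are not atlas data.

```python
from collections import defaultdict


def mrna_pool_fractions(expression, category):
    """Fraction of the total mRNA pool contributed by each gene category.

    expression: gene -> mRNA abundance in the tissue (any consistent unit)
    category:   gene -> category label
    """
    totals = defaultdict(float)
    for gene, level in expression.items():
        totals[category[gene]] += level
    pool = sum(totals.values())
    if pool == 0:
        return {}
    return {label: value / pool for label, value in totals.items()}


# Toy input: three hypothetical genes with made-up abundances.
toy_expression = {"gene_a": 60.0, "gene_b": 30.0, "gene_c": 10.0}
toy_category = {"gene_a": "housekeeping", "gene_b": "housekeeping", "gene_c": "elevated"}
fractions = mrna_pool_fractions(toy_expression, toy_category)
```

A few highly abundant transcripts can dominate the pool, which is why the share of the mRNA pool can differ strongly from the share of genes in a category.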

Protein expression of genes elevated in the GIT

In-depth analysis of genes with a GIT-enriched expression pattern using antibody-based protein profiling allowed us to create a map of where these proteins are expressed within the GIT with respect to cell type and subcellular localization. In the following paragraphs, a selection of these proteins is discussed within the context of specific functions of the GIT.

Nutrient breakdown, transport and metabolism

The most apparent function of the GIT is the digestion of food and uptake of nutrients. The stomach secretes a number of unique enzymes involved in nutrient breakdown, including the pepsinogens PGA3, PGA4 and PGC and the lipase LIPF. Both enzyme types are produced by gastric chief cells and become active under acidic conditions. The acidic environment is provided by the gastric hydrogen-potassium ATPase, composed of the alpha subunit ATP4A and the beta subunit ATP4B, which is expressed in the parietal cells of the stomach. Gastric intrinsic factor (GIF), another protein produced in human parietal cells, binds the essential vitamin B12 to allow for receptor-mediated uptake in the ileum.

In more distal regions of the GIT, specific expression of an increasing number of transport-related proteins was found. These included the solute carrier family 5, member 1 (SLC5A1), a protein mediating the sodium-dependent uptake of glucose from the small intestine, and the fatty acid binding proteins FABP2 and FABP6, involved in the intracellular transport of fatty acids within epithelial cells along the length of the small intestine (FABP2) or specifically in the ileal region (FABP6). The cellular retinol binding protein 2 (RBP2), involved in uptake and metabolism of vitamin A, is another member of the fatty acid binding protein family that is specifically expressed in the small intestine. The pancreatic and duodenal homeobox 1 (PDX1) is a transcription factor that activates insulin transcription and plays a role in beta-cell function and survival. Here, we found the highest expression of PDX1 mRNA and protein in the human duodenum.

PGC - stomach
LIPF - stomach
ATP4B - stomach

GIF - stomach
SLC5A1 - small intestine
FABP6 - small intestine

RBP2 - duodenum
PDX1 - duodenum

The entero-endocrine system

The uptake of food is also linked to the secretion of a range of hormones from specialized neuroendocrine cells within all regions of the GIT. Gastrin (GAST) is primarily expressed in G-cells of the stomach, where it stimulates the secretion of hydrochloric acid from parietal cells and contributes to the maintenance of tissue morphology. Cells expressing peptide YY (PYY) are primarily found in colon and small intestine. PYY regulates appetite and has a stabilizing effect on mood-related behavior. Another hormone governing food intake is the gastric inhibitory polypeptide (GIP). Additional functions of GIP include the regulation of glucose-dependent insulin secretion, beta-cell survival, lipogenesis and enhanced bone formation. The insulin-like peptide 5 (INSL5) has only recently been described as a marker for endocrine cells and tumors of the colon. Its function has not yet been fully elucidated and reports on its tissue-specific expression have differed. Here, we show marked enrichment of INSL5 gene expression in the large intestine and protein localization to a specific cell population within colon and rectum, supporting its role as a protein specifically expressed in enteroendocrine cells within this area.

GAST - stomach
PYY - small intestine
GIP - duodenum

INSL5 - rectum

Host protection

A prominent function of the GIT is the protection from pathogens and concurrent maintenance of homeostasis among a diverse community of commensal microorganisms. The mucous layer lining the GIT forms a primary barrier between epithelium and microbiota. Of the 21 mucin (MUC) genes encoded by the human genome, several were found to be specifically expressed in defined areas of the human GIT, including MUC5AC expressed in the stomach and MUC6 expressed in both the stomach and duodenum. Paneth cells play a pivotal role in the enteric innate immune defense by secreting a range of antimicrobial proteins, including defensin alpha 6 (DEFA6). Direct antimicrobial activity of DEFA6 is low. However, it has been demonstrated that upon contact with bacterial surface proteins, secreted DEFA6 assembles into net-like structures, entangling potentially harmful intruders.

Intelectin 2 (ITLN2) is an antimicrobial molecule involved in the protection from parasite infection, found to be specifically expressed in Paneth and goblet cells of the human small intestine, with lower levels detected in colon and lung. Another member of the enteric innate immune system is NLR family, pyrin domain containing 6 (NLRP6). NLRP6 has been shown to participate in inflammasomes, mediating activation of the NF-κB pathway and interleukin-associated inflammation. Mice deficient in NLRP6 showed a characteristically different composition of gut bacteria compared to wild-type individuals, resulting in greater susceptibility to colitis and colorectal cancer. In human tissue, the NLRP6 protein is expressed in glandular cells of the small intestine.

MUC5AC - stomach
DEFA6 - duodenum
ITLN2 - duodenum

NLRP6 - duodenum

Maintenance of tissue morphology

Caudal-related homeobox (CDX) transcription factors regulate many genes essential for the maintenance of intestinal morphology and are expressed in the small intestine and colon. CDX2 expression has been associated with the regulation of intestinal cell proliferation and differentiation, cell adhesion and migration, but also with intestinal inflammation. Similar reactivity has been described for the glycoprotein A33 (GPA33), a transmembrane protein with tissue specificity analogous to that of CDX2. Expression of both CDX2 and GPA33 is maintained in a high fraction of colorectal cancers, and CDX2 antibodies are often used as diagnostic markers for cancer of colorectal origin.

The transmembrane protein MS4A12 is a transcriptional target of CDX2 in the colon. MS4A12 is highly enriched in the brush border membrane, where it is involved in store-operated Ca2+ entry and as such participates in epidermal growth factor receptor signaling. Another protein found localized to the brush border membrane of the small intestine was the cadherin-related family member 2 (CDHR2). The function of CDHR2 remains unclear. Earlier studies suggest an involvement in the inhibition of proliferation in response to cell contact through maintenance of the sub-membranous localization of beta-catenin.

CDX2 - colon
GPA33 - colon
MS4A12 - colon

CDHR2 - duodenum

Genes shared between the GIT and other tissues

There are 226 group-enriched genes expressed in the GIT. Group-enriched genes are defined as genes showing a 5-fold higher average level of mRNA expression in a group of 2-7 tissues, including at least one tissue of the GIT, compared to all other tissues.
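The group-enriched definition above can be illustrated with a short sketch. It assumes that single-tissue enrichment has already been ruled out for the gene, that a group qualifies when its average level is at least five times the highest level among the remaining tissues, and that the GIT tissues are the four used in the composite analysis. The function name `git_group_enriched` and these comparison details are illustrative assumptions, not the atlas implementation.

```python
from statistics import mean

# Assumption: the four tissues of the composite GIT analysis.
GIT_TISSUES = {"stomach", "duodenum", "small intestine", "colon"}


def git_group_enriched(levels, fold=5.0, min_size=2, max_size=7):
    """Return the enriched tissue group for one gene, or None.

    levels: tissue -> mRNA level. Tissues are ranked by level and the
    smallest top-ranked group of 2-7 tissues that passes the fold test
    is reported, provided it contains at least one GIT tissue.
    """
    ranked = sorted(levels, key=levels.get, reverse=True)
    # Keep at least one tissue outside the group to compare against.
    for size in range(min_size, min(max_size, len(ranked) - 1) + 1):
        group, rest = ranked[:size], ranked[size:]
        group_mean = mean(levels[t] for t in group)
        if group_mean > 0 and group_mean >= fold * max(levels[t] for t in rest):
            return group if GIT_TISSUES & set(group) else None
    return None
```

For a gene with high levels in duodenum and small intestine and low levels elsewhere, the function returns those two tissues as the enriched group.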

In order to illustrate the relation of GIT tissues to other tissue types, a network plot was generated, displaying the number of commonly expressed genes between different tissue types.

Figure 2. An interactive network plot of the GIT enriched and group enriched genes connected to their respective enriched tissues (grey circles). Red nodes represent the number of GIT enriched genes and orange nodes represent the number of genes that are group enriched. The sizes of the red and orange nodes are related to the number of genes displayed within the node. Each node is clickable and results in a list of all enriched genes connected to the highlighted edges. The network is limited to group enriched genes in combinations of up to 5 tissues, but the resulting lists show the complete set of group enriched genes in the particular tissue.

Within the GIT, the most common locations of shared gene expression involved duodenum and small intestine (43 genes), followed by genes expressed in duodenum, small intestine, colon and rectum (22 genes). Including organs outside of the GIT, duodenum and small intestine shared the highest number of expressed genes with the liver (14 genes), followed by duodenum, small intestine and kidney (5 genes).

Two examples of genes with shared expression between a tissue in the GIT and a tissue outside of the GIT are a member of the cytochrome P450 enzyme family (CYP3A4) which is necessary for several reactions including drug metabolism, and the homeobox protein PAX6 which regulates gene transcription and has been suggested to have a function during brain development.

CYP3A4 - duodenum
CYP3A4 - liver

PAX6 - stomach
PAX6 - cerebellum

General histology of the GIT from the stomach to the rectum

The general structure of all parts of the GIT, listed from the outermost to the innermost layer, is:

  1. Tunica serosa/adventitia: Loose connective tissue with elastic and collagen fibers, nerves and vessels, covered by a single layer of flat mesothelial cells. Where there is no mesothelial cover, the outermost layer is called adventitia.
  2. Tela subserosa: Thin layer of loose connective tissue separating the serosa and muscle layer.
  3. Tunica muscularis: For the most part composed of an inner circular and an outer longitudinal smooth muscle layer. Between the muscle layers the myenteric plexus of Auerbach can be identified.
  4. Tela submucosa: A thick layer of loose connective tissue with numerous blood and lymphatic vessels. The ganglion cells of the submucosal plexus of Meissner may be seen here.
  5. Tunica mucosa: The innermost layer, which comes in contact with the gastrointestinal content. It has secretory and absorptive functions. The mucosa consists of the innermost epithelium, which forms surface cells and glands, embedded in the lamina propria, consisting mainly of loose connective tissue with small blood vessels and immune cells. A thin layer of smooth muscle, the lamina muscularis mucosae, demarcates the division between the mucosa and submucosa.


Here, the protein-coding genes expressed in the GIT are described and characterized, together with examples of immunohistochemically stained tissue sections that visualize the expression patterns of proteins corresponding to genes with elevated expression in the GIT.

Transcript profiling and RNA-data analyses based on normal human tissues have been described previously (Fagerberg et al., 2014). Analyses of mRNA expression covering over 99% of all human protein-coding genes were performed using deep RNA sequencing of 172 individual samples corresponding to 37 different normal human tissue types. RNA sequencing results from 23 fresh frozen tissue samples representing normal GIT were compared to 149 other tissue samples corresponding to 33 tissue types in order to determine genes with elevated expression in GIT. A tissue-specific score, defined as the ratio between mRNA levels in GIT and the mRNA levels in all other tissues, was used to divide the genes into different categories of expression. These categories include: genes with elevated expression in GIT, genes expressed in all tissues, genes with a mixed expression pattern, genes not expressed in GIT, and genes not expressed in any tissue. Genes with elevated expression in GIT were further sub-categorized as i) genes with enriched expression in GIT, ii) genes with group enriched expression including GIT and iii) genes with enhanced expression in GIT.
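As an illustration of how the five top-level categories relate to each other, the sketch below assigns one category to a gene. The detection cutoff of 1.0, the order in which the conditions are checked and the function name `expression_category` are assumptions made for this example. The text above does not specify them, and whether a gene is elevated in GIT is passed in as a precomputed flag.

```python
def expression_category(
    levels: dict[str, float],
    git_tissues: set[str],
    elevated_in_git: bool,
    cutoff: float = 1.0,  # hypothetical detection limit, not from the source
) -> str:
    """Assign one of the five expression categories to a gene.

    levels: tissue -> mRNA level; a tissue counts as expressing the gene
    when its level reaches `cutoff`.
    """
    detected = {t for t, v in levels.items() if v >= cutoff}
    if not detected:
        return "not expressed in any tissue"
    if not detected & git_tissues:
        return "not expressed in GIT"
    if elevated_in_git:
        return "elevated in GIT"
    if detected == set(levels):
        return "expressed in all tissues"
    return "mixed expression pattern"
```

The sub-categories (enriched, group enriched and enhanced) would then be resolved only for genes that fall into the "elevated in GIT" category.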

Human tissue samples used for protein and mRNA expression analyses were collected and handled in accordance with Swedish laws and regulations and obtained from the Department of Pathology, Uppsala University Hospital, Uppsala, Sweden as part of the sample collection governed by the Uppsala Biobank. All human tissue samples used in the present study were anonymized in accordance with approval and advisory report from the Uppsala Ethical Review Board.

Uhlén M et al, 2015. Tissue-based map of the human proteome. Science.
PubMed: 25613900 DOI: 10.1126/science.1260419

Yu NY et al, 2015. Complementing tissue characterization by integrating transcriptome profiling from the Human Protein Atlas and from the FANTOM5 consortium. Nucleic Acids Res.
PubMed: 26117540 DOI: 10.1093/nar/gkv608

Fagerberg L et al, 2014. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol Cell Proteomics.
PubMed: 24309898 DOI: 10.1074/mcp.M113.035600

Gremel G et al, 2014. The human gastrointestinal tract-specific transcriptome and proteome as defined by RNA sequencing and antibody-based profiling. J Gastroenterol.
PubMed: 24789573 DOI: 10.1007/s00535-014-0958-7
