The lung-specific proteome

The lung is a respiratory organ essential for breathing and responsible for the gaseous exchange between air and blood. The branching airways end in alveoli, where the gaseous exchange occurs, and the cell types present in lung tissue are dominated by pneumocytes, bronchial epithelium, alveolar macrophages, endothelial cells and interstitial cells. The transcriptome analysis shows that 73% of all human proteins (n=19628) are expressed in the lung and that 183 of the corresponding genes show elevated expression in lung compared to other tissue types. An analysis of the genes with elevated expression in the lung reveals that the corresponding proteins are expressed in the various cell types present in lung.

  • 19 lung enriched genes
  • Most group enriched genes share expression with testis
  • 183 genes defined as elevated in the lung
  • Most elevated genes encode secreted proteins

Figure 1. The distribution of all genes across the five categories based on transcript abundance in lung as well as in all other tissues.

183 genes show some level of elevated expression in the lung compared to other tissues. The three categories of genes with elevated expression in lung compared to other organs are shown in Table 1.

Table 1. The genes with elevated expression in lung

Category | Number of genes | Description
Tissue enriched | 19 | At least five-fold higher mRNA levels in a particular tissue as compared to all other tissues
Group enriched | 44 | At least five-fold higher mRNA levels in a group of 2-7 tissues
Tissue enhanced | 120 | At least five-fold higher mRNA levels in a particular tissue as compared to average levels in all tissues
Total | 183 | Total number of elevated genes in lung
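
The definitions in Table 1 can be read as a small decision procedure. The Python sketch below is one possible interpretation, not the pipeline that produced the numbers above. The function name `classify_elevated`, the input layout (a dict of per-tissue mRNA levels for a single gene) and the handling of group enrichment (mean level of the group compared with the highest tissue outside the group) are assumptions made for illustration.

```python
from statistics import mean

FOLD = 5.0  # five-fold threshold taken from Table 1


def classify_elevated(levels, tissue, fold=FOLD, max_group=7):
    """Classify one gene for one tissue following the Table 1 definitions.

    levels: dict mapping tissue name -> mRNA level for this gene (hypothetical layout).
    Returns 'tissue enriched', 'group enriched', 'tissue enhanced' or None.
    """
    target = levels[tissue]
    others = [v for t, v in levels.items() if t != tissue]

    # Tissue enriched: at least five-fold above every other tissue.
    if target > 0 and all(target >= fold * v for v in others):
        return "tissue enriched"

    # Group enriched (assumed reading): the top 2-7 tissues include the
    # target, and their mean is at least five-fold above the highest
    # tissue outside the group.
    ranked = sorted(levels.items(), key=lambda kv: kv[1], reverse=True)
    for size in range(2, max_group + 1):
        if size >= len(ranked):
            break  # a group needs at least one tissue left outside it
        group, rest = ranked[:size], ranked[size:]
        if tissue not in {name for name, _ in group}:
            continue
        group_mean = mean(v for _, v in group)
        if group_mean > 0 and group_mean >= fold * rest[0][1]:
            return "group enriched"

    # Tissue enhanced: at least five-fold above the average of all tissues.
    if target > 0 and target >= fold * mean(levels.values()):
        return "tissue enhanced"
    return None
```

The checks run from the strictest category to the most permissive one, so each gene receives a single label, in line with the way the three counts in Table 1 sum to the total.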

Table 2. The 12 genes with the highest level of enriched expression in lung. "Predicted localization" shows the classification of each gene into three main classes: Secreted, Membrane, and Intracellular, where the latter consists of genes without any predicted membrane or secreted features. "mRNA (tissue)" shows the transcript level as TPM values. "TS-score" (Tissue Specificity score) is the fold change between the level in lung and the level in the second highest tissue.

Gene | Description | Predicted localization | mRNA (tissue) | TS-score
SFTPA1 | surfactant protein A1 | Secreted | 10520.7 | 1177
SFTPB | surfactant protein B | Intracellular, Secreted | 5334.1 | 911
SFTPC | surfactant protein C | Intracellular | 25336.5 | 832
SFTPA2 | surfactant protein A2 | Secreted | 15999.4 | 446
SCGB3A2 | secretoglobin, family 3A, member 2 | Intracellular, Secreted | 1862.5 | 187
AGER | advanced glycosylation end product-specific receptor | Membrane, Secreted | 891.2 | 181
SFTPD | surfactant protein D | Secreted | 783.7 | 56
NAPSA | napsin A aspartic peptidase | Secreted | 1413.9 | 14
MS4A15 | membrane-spanning 4-domains, subfamily A, member 15 | Membrane | 71.5 | 11
SFTA2 | surfactant associated 2 | Intracellular, Secreted | 59.9 | 11
LRRN4 | leucine rich repeat neuronal 4 | Membrane | 13.6 | 11
SLC34A2 | solute carrier family 34 (type II sodium/phosphate cotransporter), member 2 | Membrane | 1127.5 | 10

Some of the proteins predicted to be membrane-spanning are intracellular, e.g. in the Golgi or mitochondrial membranes, and some of the proteins predicted to be secreted can potentially be retained in a compartment belonging to the secretory pathway, such as the ER, or remain attached to the outer face of the cell membrane by a GPI anchor.
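
The TS-score in Table 2 is described as the fold change to the second highest tissue. A minimal Python sketch of that calculation follows. The function name `ts_score`, the dict input and the example values are hypothetical, and any rounding or pseudo-count handling used for the published scores is not stated in the text and is not modelled here.

```python
def ts_score(levels, tissue):
    """Fold change of `tissue` over the highest level found in any other tissue.

    levels: dict mapping tissue name -> transcript level (e.g. TPM) for one gene.
    """
    runner_up = max(v for t, v in levels.items() if t != tissue)
    if runner_up == 0:
        # No expression anywhere else: the ratio is unbounded.
        return float("inf")
    return levels[tissue] / runner_up


# Hypothetical values, not taken from Table 2.
example = {"lung": 200.0, "testis": 20.0, "liver": 4.0}
print(ts_score(example, "lung"))  # 10.0
```

For a tissue enriched gene the tissue of interest has the highest level, so the highest value among the remaining tissues is the second highest overall.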

The lung transcriptome

An analysis of the expression levels of each gene makes it possible to calculate the relative mRNA pool for each of the categories. The analysis shows that 77% of the mRNA molecules in the lung correspond to housekeeping genes and only 11% of the mRNA pool corresponds to genes categorized as lung enriched, group enriched or lung enhanced. Thus, most of the transcriptional activity in the lung relates to proteins with presumed housekeeping functions, as they are found in all tissues and cells analyzed.
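
The relative mRNA pool of a category is the summed transcript abundance of its genes divided by the summed abundance of all genes. The Python sketch below illustrates that arithmetic. The function name `mrna_pool_fractions`, the two input dicts and the category labels are assumptions for illustration, not the format used by the original analysis.

```python
from collections import defaultdict


def mrna_pool_fractions(expression, category):
    """Return the share of total transcript abundance held by each category.

    expression: dict gene -> transcript level in lung (e.g. TPM).
    category:   dict gene -> category label for that gene.
    """
    totals = defaultdict(float)
    for gene, level in expression.items():
        totals[category[gene]] += level

    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no expression recorded for any gene")
    return {label: value / grand_total for label, value in totals.items()}
```

Because the fractions are weighted by abundance, a small number of very highly expressed genes can hold a large share of the pool, which is why the share of mRNA molecules per category can differ from the share of genes per category.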

Gene Ontology-based analysis of all 183 genes elevated in lung indicates a clear overrepresentation of proteins associated with respiratory gaseous exchange, cilium movement and surfactant homeostasis. A majority of the 183 genes encode secreted proteins.

Protein expression of genes elevated in lung

In-depth analysis of the genes elevated in lung using antibody-based protein profiling allowed us to create a map of where the corresponding proteins are expressed within the lung, including pneumocytes, ciliated and mucus-secreting cells in the respiratory mucosa, and endothelial cells.

Proteins specifically expressed in pneumocytes of the lung

The pneumocytes make up the alveolar structure and are essential for normal respiration. Pneumocytes produce surfactant, a liquid that lowers surface tension and is crucial for the gaseous exchange between air and blood. The surfactant is also important for protecting the lung from infections. Examples of genes expressed in type II pneumocytes, which are responsible for production and maintenance of surfactant, include SFTPA1, SFTPC and NAPSA.

Proteins specifically expressed in macrophages of the lung

Airborne microorganisms entering the lungs are digested and destroyed by macrophages, which play an important role in the host defense. Examples of proteins expressed in macrophages include MRC1, which mediates endocytosis of pathogenic viruses, bacteria and fungi; MARCO, a scavenger receptor that is part of the innate antimicrobial immune system and may bind both Gram-negative and Gram-positive bacteria; and MCEMP1, a protein of unknown function suggested to be expressed in mast cells.

Proteins specifically expressed in ciliated cells of the lung

Other cells important for clearing the airways of inhaled contaminants are the ciliated cells, which are present along the bronchi. One example of a protein expressed in ciliated cells is DNAH5, a dynein protein with ATPase activity. It functions as a force-generating protein that drives the power stroke of cilia.

Proteins specifically expressed in mucus-secreting cells of the lung

Mucus-secreting cells are present in both bronchial epithelium and peribronchial glands. The secreted mucus is important for maintaining a suitable environment for ciliary function and protection against airborne infectious agents and solid particles. One example of a protein expressed in mucus-secreting cells is SCGB1A1, implicated in anti-inflammation and epithelial regeneration after oxidant-induced injury. Defects in SCGB1A1 are associated with asthma.

Proteins specifically expressed in endothelial cells of the lung

Up to 30% of the cells in lung are endothelial cells, outlining the alveoli and participating in the gaseous exchange. One example of a protein expressed in lung endothelial cells is PRX, a protein suggested to be required for maintenance of the peripheral nerve myelin sheath that also plays a role in axon–glial interactions. Its distinct expression in endothelial cells in lung has not previously been described.

Genes shared between lung and other tissues

There are 44 group enriched genes expressed in the lung. Group enriched genes are defined as genes showing a 5-fold higher average level of mRNA expression in a group of 2-7 tissues, including lung, compared to all other tissues.

In order to illustrate the relation of lung tissue to other tissue types, a network plot was generated, displaying the number of commonly expressed genes between different tissue types.

Figure 2. An interactive network plot of the lung enriched and group enriched genes connected to their respective enriched tissues (grey circles). Red nodes represent the number of lung enriched genes and orange nodes represent the number of genes that are group enriched. The sizes of the red and orange nodes are related to the number of genes displayed within the node. Each node is clickable and results in a list of all enriched genes connected to the highlighted edges. The network is limited to group enriched genes in combinations of up to 3 tissues, but the resulting lists show the complete set of group enriched genes in the particular tissue.

Lung shares group enriched genes with fallopian tube (5 genes) and testis (8 genes). A Gene Ontology (GO)-based analysis of these shared genes shows enrichment for genes related to cilia function and movement. One example of a protein expressed in both lung and fallopian tube is mesothelin (MSLN), a cell-surface protein that may function as a cell adhesion protein.

MSLN - lung
MSLN - fallopian tube

Lung function

The lungs are among the largest organs in the human body and are responsible for supplying capillaries with oxygen, which is then transported to other organs throughout the body. During breathing, air is transported from the nose or mouth via the trachea to the bronchi, bronchioli and finally the alveoli of the lung, where the gaseous exchange occurs. The oxygen is exchanged for carbon dioxide, which is transported back in the opposite direction and exhaled.

The physiological function of the lung is regulated by a complex molecular concert of specialized cell types, such as pneumocytes, macrophages, ciliated cells, mucus-secreting cells and endothelial cells.

Lung histology

The pulmonary alveolus, which is responsible for the gaseous exchange, is composed of a continuous layer of epithelial cells overlying a thin interstitium. Two morphologically distinct cell types, type I and type II pneumocytes, line the alveoli. Alveolar macrophages are also present on the epithelial surface. The interstitium contains capillaries involved in gaseous exchange, as well as connective tissue and a variety of cells involved in alveolar shape and defense. The trachea, bronchi and bronchioli are air-filled branching tubes that include basal cells, neuroendocrine cells, ciliated cells, serous cells, Clara cells and goblet cells.

The histology of human lung including detailed images and information about the different cell types can be viewed in the Protein Atlas Histology Dictionary.


Here, the protein-coding genes expressed in the lung are described and characterized, together with examples of immunohistochemically stained tissue sections that visualize protein expression patterns of proteins that correspond to genes with elevated expression in the lung.

Transcript profiling and RNA-data analyses based on normal human tissues have been described previously (Fagerberg et al., 2014). Analyses of mRNA expression covering over 99% of all human protein-coding genes were performed using deep RNA sequencing of 172 individual samples corresponding to 37 different normal human tissue types. RNA sequencing results from 9 fresh frozen tissue samples representing normal lung were compared to 163 other tissue samples corresponding to 36 tissue types, in order to determine genes with elevated expression in lung. A tissue-specific score, defined as the ratio between mRNA levels in lung and the mRNA levels in all other tissues, was used to divide the genes into different categories of expression. These categories include: genes with elevated expression in lung, genes expressed in all tissues, genes with a mixed expression pattern, genes not expressed in lung, and genes not expressed in any tissue. Genes with elevated expression in lung were further sub-categorized as i) genes with enriched expression in lung, ii) genes with group enriched expression including lung and iii) genes with enhanced expression in lung.
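
The five expression categories listed above can be sketched as a simple rule set. The Python example below is illustrative only. The text does not state a detection cutoff, so `DETECTION_CUTOFF` is an assumed placeholder value. The function name `expression_category` and the `elevated` flag (the result of a separate fold-change test) are also hypothetical.

```python
DETECTION_CUTOFF = 1.0  # assumed placeholder; the text gives no cutoff value


def expression_category(levels, tissue, elevated, cutoff=DETECTION_CUTOFF):
    """Assign one gene to one of the five categories described in the text.

    levels:   dict mapping tissue name -> mRNA level for this gene.
    tissue:   tissue of interest, e.g. "lung".
    elevated: True if a separate fold-change test marked the gene as
              enriched, group enriched or enhanced in `tissue`.
    """
    detected = {t for t, v in levels.items() if v >= cutoff}

    if not detected:
        return "not expressed in any tissue"
    if tissue not in detected:
        return f"not expressed in {tissue}"
    if elevated:
        return f"elevated in {tissue}"
    if len(detected) == len(levels):
        return "expressed in all tissues"
    return "mixed"
```

Detection is checked before elevation in this sketch, so a gene below the cutoff in lung is never reported as elevated there, whatever the fold-change test returned.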

Human tissue samples used for protein and mRNA expression analyses were collected and handled in accordance with Swedish laws and regulations and obtained from the Department of Pathology, Uppsala University Hospital, Uppsala, Sweden, as part of the sample collection governed by the Uppsala Biobank. All human tissue samples used in the present study were anonymized in accordance with the approval and advisory report from the Uppsala Ethical Review Board.

Uhlén M et al, 2015. Tissue-based map of the human proteome. Science.
PubMed: 25613900 DOI: 10.1126/science.1260419

Yu NY et al, 2015. Complementing tissue characterization by integrating transcriptome profiling from the Human Protein Atlas and from the FANTOM5 consortium. Nucleic Acids Res.
PubMed: 26117540 DOI: 10.1093/nar/gkv608

Fagerberg L et al, 2014. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol Cell Proteomics.
PubMed: 24309898 DOI: 10.1074/mcp.M113.035600

Lindskog C et al, 2014. The lung-specific proteome defined by integration of transcriptomics and antibody-based profiling. FASEB J.
PubMed: 25169055 DOI: 10.1096/fj.14-254862
