The cell line transcriptome
The word transcriptome refers to the full set of RNA molecules that are transcribed from the genome in a population of cells, or in a specific cell, at a given time point. In contrast to the genome, which is characterized by its stability across different cell types within an organism, the transcriptome varies greatly between cell types, developmental stages, and in response to internal or external cues. The plastic nature of the transcriptome, and its potential to serve as a proxy for cellular identity and diversity, makes it appealing to study, and advances in high-throughput technologies have made it possible to analyze RNA expression in great detail.
In the Cell Atlas, the expression of 19670 protein-coding genes is analyzed by RNA sequencing of mRNA extracted from unsynchronized cells in log-phase growth. The expression levels of gene-specific transcripts are given as normalized expression (NX) values, and transcripts with NX values ≥1 are considered detected. Genes are then classified according to the specificity and distribution of mRNA expression across a panel of 69 different human cell lines (Figure 1, Thul PJ et al. (2017)).
The Cell Atlas presents RNA expression for 98% (n=19242) of all protein-coding human genes. The data can be used for various transcriptomic analyses and as a resource for selecting cell lines that express particular genes of interest.
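The NX ≥1 detection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the Atlas pipeline: the gene names and NX values are hypothetical placeholders, and the helper names `detected_in` and `detected_genes_per_line` are assumptions introduced here.

```python
# Detection threshold taken from the text: NX >= 1 counts as detected.
DETECTION_THRESHOLD = 1.0

# Hypothetical NX values (placeholder genes, invented numbers) for three
# cell lines from the panel.
nx_table = {
    "GENE_A": {"Hep-G2": 35.2, "PC-3": 0.4, "A-431": 1.0},
    "GENE_B": {"Hep-G2": 0.0, "PC-3": 0.2, "A-431": 0.9},
    "GENE_C": {"Hep-G2": 8.1, "PC-3": 12.5, "A-431": 6.3},
}


def detected_in(expression, threshold=DETECTION_THRESHOLD):
    """Return the sorted names of cell lines where the gene is detected."""
    return sorted(line for line, nx in expression.items() if nx >= threshold)


def detected_genes_per_line(table, threshold=DETECTION_THRESHOLD):
    """Count how many genes are detected in each cell line."""
    counts = {}
    for expression in table.values():
        for line in expression:
            counts.setdefault(line, 0)
        for line in detected_in(expression, threshold):
            counts[line] += 1
    return counts


for gene, expression in nx_table.items():
    print(gene, detected_in(expression))
print(detected_genes_per_line(nx_table))
```

Note that the comparison is inclusive, so a value of exactly 1.0 counts as detected, matching the "≥1" wording.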
A diversity of cell lines
The 69 different cell lines used in the Cell Atlas have been selected to represent various cell populations in different tissue types and organs of the human body. The selection also aims to mimic the origin and phenotype of the solid cancer types represented in the Pathology Atlas (Uhlen et al., 2017), but with an additional emphasis on cancer cell types in the hematopoietic and immune systems. In addition to cancer-derived cell lines, there are a number of cell lines that have been generated through in vitro protocols for immortalization of normal cells, some primary cell lines and one type of induced pluripotent stem cells. Details regarding the different cell lines can be found here.
Cell lines are adapted to cultivation in vitro, and many of the cell lines used in the Cell Atlas are human cancer cell lines. While this in some respects limits their resemblance to normal human cells in the context of tissues and organs, unbiased hierarchical clustering of global RNA expression (Figure 1) shows that the cell lines cluster in good agreement with similarities in origin and phenotype of the cancer cells from which they are derived. Groups of related cell lines, such as the immortalized and transformed fibroblastic cell lines (BJ derivatives), the glioma cell lines (U-138 MG and U-251 MG), the melanoma cell lines (WM-115 and SK-MEL-30), the breast cancer cell lines (SK-BR-3, MCF7 and T47d) and the endothelial cell lines (TIME and HUVEC), cluster closely together. At the highest level of separation, cell lines that grow in suspension and represent hematopoietic and lymphoid cell systems cluster together and separate into two major clusters depending on their myeloid or lymphoid origin/phenotype.
Figure 1. Hierarchical clustering based on RNA sequencing data for the 69 cell lines. The color of the cell line name represents its origin: Grey - Lymphoid, Light red - Muscle, Dark red - Myeloid, Bright green - Mesenchymal, Green - Pancreas, Dark green - Lung, Yellow bold - Brain, Yellow thin - Eye, Light pink - Proximal digestive tract, Pink - Female reproductive system, Dark pink - Endothelial, Beige - Skin, Orange - Kidney and urinary bladder, Blue - Gastrointestinal tract, Light blue - Male reproductive system, Light purple - Liver and gallbladder.
Specificity of RNA expression
Approximately one third of all protein-coding genes (n=6186) are expressed in all cell lines, which is indicative of roles in fundamental cellular functions, or 'house-keeping' functions, for the corresponding proteins (Figure 2). In contrast, 2% (n=428) of all protein-coding genes are not detected in any of the analyzed cell lines, suggesting that the corresponding proteins are only expressed in unrepresented cell types, during specific developmental stages or under specific conditions, such as cellular stress. 1640 of the protein-coding genes display high RNA expression in a single cell line, while 1517 display high RNA expression in a smaller group of cell lines, relative to any of the other cell lines. 8849 of the protein-coding genes show elevated RNA expression in a group of cell lines compared to the average expression in all other cell lines. Table 1 shows the distribution of genes within these expression categories for each of the analyzed cell lines.
Figure 2. Pie chart showing the number of genes in the different RNA-based categories of gene expression in the panel of cell lines.
Table 1. Table showing the number of detected genes per cell line based on RNA sequencing (NX ≥1), and the number of genes in the enriched and enhanced categories.
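The gene counts quoted above can be cross-checked with simple arithmetic, and the distribution side of the classification (detected in all, some, or none of the cell lines) can be sketched as a small function. This is an illustrative sketch only: the category labels and the function name `distribution_category` are assumptions made here, not the Atlas's own terminology or code, and the example NX values are hypothetical.

```python
# Gene counts quoted in the text.
TOTAL_GENES = 19670      # protein-coding genes analyzed
NOT_DETECTED = 428       # not detected in any cell line
DETECTED_IN_ALL = 6186   # expressed in all cell lines

# Genes detected in at least one cell line.
detected_in_any = TOTAL_GENES - NOT_DETECTED


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def distribution_category(expression, threshold=1.0):
    """Label a gene by how many cell lines reach the NX detection threshold.

    The labels are illustrative, not official Atlas category names.
    """
    n_detected = sum(1 for nx in expression.values() if nx >= threshold)
    if n_detected == 0:
        return "not detected"
    if n_detected == len(expression):
        return "detected in all"
    return "detected in some"


print(detected_in_any)                               # 19242
print(round(percent(detected_in_any, TOTAL_GENES)))  # 98
print(round(percent(DETECTED_IN_ALL, TOTAL_GENES)))  # 31, roughly one third

# Hypothetical NX values for a single gene.
print(distribution_category({"Hep-G2": 12.0, "PC-3": 3.1, "A-431": 1.0}))
```

The arithmetic confirms that the 98% (n=19242) figure is the complement of the 428 undetected genes.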
The cell line transcriptomes have been compared to the bulk transcriptomes of 37 different normal tissues and organs analyzed in the Tissue Atlas (Uhlén M et al. (2015)). There are 65 protein-coding genes that are only expressed in the panel of cell lines and not detected in any of the analyzed normal tissue types, while there are 277 protein-coding genes that are only expressed in normal human tissues and not detected in any of the analyzed cell lines. Several of the genes in the latter category encode proteins that have functions associated with differentiated cells in specialized tissues or subcompartments of tissues, which are not represented in the cell line panel. One example is ADAM30, which is expressed in spermatids of human testis.
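A comparison of this kind amounts to set differences between the genes detected in the cell line panel and those detected in the tissue panel. The sketch below uses hypothetical placeholder gene names; in practice each set would hold the genes with NX ≥1 in at least one sample of the respective panel.

```python
# Hypothetical sets of detected genes (placeholder names, not real data).
detected_in_cell_lines = {"GENE_A", "GENE_B", "GENE_C"}
detected_in_tissues = {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}

# Genes seen only in the cell line panel, only in tissues, or in both.
only_cell_lines = detected_in_cell_lines - detected_in_tissues
only_tissues = detected_in_tissues - detected_in_cell_lines
shared = detected_in_cell_lines & detected_in_tissues

print(sorted(only_cell_lines))  # ['GENE_A']
print(sorted(only_tissues))     # ['GENE_D', 'GENE_E']
print(sorted(shared))           # ['GENE_B', 'GENE_C']
```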
Cell line enriched genes
Overall, there is a large degree of agreement between the RNA expression categories in cell lines and tissues. A majority of the cell line enriched genes, defined as having at least four times higher RNA expression in a single cell line compared to any other cell line, also belong to the tissue elevated gene expression categories (tissue enriched, group enriched and tissue enhanced). For example, the secreted proteins AHSG and ALB, which are only expressed in normal liver tissue, are also highly enriched in the liver-derived cell line Hep-G2, where immunofluorescent analysis shows localization to the secretory pathway. The transcription factor HOXB13, which shows expression in the prostate, colon and rectum, is also enriched in the prostate-derived cell line PC-3, where it is localized to the nucleoplasm. The adhesion glycoprotein CDH15, which is enriched in skeletal muscle tissue, is also enriched in the sarcoma cell line RH-30, with some expression in the other sarcoma cell line LHCN-M2. The enzyme TYR, which is exclusively expressed in skin, is highly enriched in the melanoma-derived skin cell line SK-MEL-30, while the epidermal growth factor receptor EGFR, which is enriched in female tissues and skin, is enriched in the other skin-derived cell line A-431. The expression patterns in normal tissues and the functions of these proteins relate to the specific traits and functions of the corresponding normal tissue type and organ.
Figure 3. Examples of proteins with enriched expression in a cell line and the corresponding tissue of origin. The proteins are AHSG, ALB, HOXB13, CDH15, TYR, and EGFR. The immunohistochemical (IHC) staining shows the protein expression pattern in tissue in brown. The immunofluorescent (IF) staining shows the protein subcellular expression pattern in cell lines in green. The nucleus and microtubules are shown in blue and red respectively in the IF images.
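The cell line enriched criterion stated above (at least four times higher RNA expression in a single cell line than in any other cell line) can be expressed as a short check. This is a hedged sketch, not the Atlas implementation: the function name `enriched_cell_line` is hypothetical, the NX values are invented, and the additional requirement that the top value reaches the NX ≥1 detection threshold is an assumption made here.

```python
def enriched_cell_line(expression, fold=4.0, threshold=1.0):
    """Return the cell line in which a gene is enriched, or None.

    A gene is called enriched when its highest NX value is at least
    `fold` times the second-highest value. Requiring the top value to
    reach the detection threshold is an assumption of this sketch.
    """
    if len(expression) < 2:
        raise ValueError("need expression values for at least two cell lines")
    # Rank cell lines from highest to lowest NX value.
    ranked = sorted(expression.items(), key=lambda item: item[1], reverse=True)
    top_line, top_value = ranked[0]
    second_value = ranked[1][1]
    if top_value >= threshold and top_value >= fold * second_value:
        return top_line
    return None


# Hypothetical NX values for two genes across three cell lines.
print(enriched_cell_line({"Hep-G2": 120.0, "PC-3": 2.0, "A-431": 0.5}))  # Hep-G2
print(enriched_cell_line({"Hep-G2": 10.0, "PC-3": 5.0, "A-431": 0.5}))   # None
```

Comparing against the second-highest value is equivalent to comparing against every other cell line, since any line that passes the runner-up also passes all lower-ranked lines.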