The cell line transcriptome

The word transcriptome refers to the full set of RNA molecules transcribed from the genome in a population of cells, or in a single cell, at a given time point. Whereas the genome is essentially stable across the different cell types of an organism, the transcriptome varies greatly between cell types and developmental stages, and in response to internal or external cues. This plasticity, together with the potential of the transcriptome to serve as a proxy for cellular identity and diversity, makes it an appealing object of study, and advances in high-throughput technologies have made it possible to analyze RNA expression in great detail.

In the Cell Atlas, the expression of 19670 protein-coding genes is analyzed by RNA sequencing of mRNA extracted from unsynchronized cells in logarithmic growth phase. The expression level of gene-specific transcripts is given as a normalized expression (NX) value, and transcripts with NX ≥1 are considered detected. Genes are then classified according to the specificity and distribution of their mRNA expression across a panel of 69 human cell lines (Figure 1, Thul PJ et al. (2017)).
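The detection rule above is a simple threshold on NX values. The following Python sketch assumes the NX values have already been computed (the normalization itself is not described here) and uses an invented toy matrix, not Cell Atlas data. It counts detected genes per cell line, the kind of per-cell-line count reported in Table 1. The function names are hypothetical.

```python
from typing import Dict, Set

# NX >= 1 counts as detected, as stated in the text.
DETECTION_THRESHOLD = 1.0


def detected_genes(nx_by_gene: Dict[str, float],
                   threshold: float = DETECTION_THRESHOLD) -> Set[str]:
    """Return the genes whose NX value meets the detection threshold."""
    return {gene for gene, nx in nx_by_gene.items() if nx >= threshold}


def detected_counts(nx_matrix: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Map each cell line to its number of detected genes.

    nx_matrix maps cell line -> {gene -> NX value}.
    """
    return {line: len(detected_genes(values)) for line, values in nx_matrix.items()}


# Toy NX values, invented for illustration only (not Cell Atlas data).
toy = {
    "Hep G2": {"ALB": 5400.0, "AHSG": 2100.0, "TYR": 0.0, "GAPDH": 900.0},
    "SK-MEL-30": {"ALB": 0.2, "AHSG": 0.0, "TYR": 310.0, "GAPDH": 1200.0},
}
print(detected_counts(toy))  # {'Hep G2': 3, 'SK-MEL-30': 2}
```

Because the threshold is inclusive, a gene with NX exactly 1.0 is counted as detected.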

The Cell Atlas presents RNA expression data for 98% (n=19242) of all human protein-coding genes. The data can be used for a range of transcriptomic analyses and as a resource for selecting cell lines that express particular genes of interest.

A diversity of cell lines

The 69 cell lines used in the Cell Atlas have been selected to represent various cell populations in different tissues and organs of the human body. The selection also aims to mimic the origin and phenotype of the solid cancer types represented in the Pathology Atlas (Uhlen et al., 2017), but with additional emphasis on cancer cell types of the hematopoietic and immune systems. In addition to cancer-derived cell lines, the panel includes a number of cell lines generated by in vitro immortalization of normal cells, some primary cell lines, and one type of induced pluripotent stem cell. Details regarding the different cell lines can be found here.

Cell lines are adapted to cultivation in vitro, and many of the cell lines used in the Cell Atlas are human cancer cell lines. Although this in some respects limits their resemblance to normal human cells in the context of tissues and organs, unbiased hierarchical clustering of global RNA expression (Figure 1) shows that the cell lines cluster in good agreement with the origin and phenotype of the cells from which they are derived. Groups of related cell lines cluster closely together, such as the immortalized and transformed fibroblastic cell lines (BJ derivatives), the glioma cell lines (U-138 MG and U-251 MG), the melanoma cell lines (WM-115 and SK-MEL-30), the breast cancer cell lines (SK-BR-3, MCF7 and T-47d) and the endothelial cell lines (TIME and HUVEC TERT2). At the highest level of separation, the cell lines that grow in suspension and represent the hematopoietic and lymphoid systems cluster together, and these further separate into two major clusters according to their myeloid or lymphoid origin and phenotype.


Figure 1. Hierarchical clustering based on RNA sequencing data for the 69 cell lines. The color of the cell line name represents its origin: Grey - Lymphoid, Light red - Muscle, Dark red - Myeloid, Bright green - Mesenchymal, Green - Pancreas, Dark green - Lung, Yellow bold - Brain, Yellow thin - Eye, Light pink - Proximal digestive tract, Pink - Female reproductive system, Dark pink - Endothelial, Beige - Skin, Orange - Kidney and urinary bladder, Blue - Gastrointestinal tract, Light blue - Male reproductive system, Light purple - Liver and gallbladder.
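The text does not state the preprocessing, distance metric or linkage method behind Figure 1. As an illustration only, the Python sketch below assumes a log2(NX + 1) transformation, 1 - Pearson correlation as the distance between cell lines, and average linkage; these choices are assumptions, not the published method. The expression profiles are invented toy values. For real data, an established implementation such as scipy.cluster.hierarchy.linkage (for example with method="average" and metric="correlation") would be the usual choice.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Cluster = FrozenSet[str]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant profile: correlation undefined")
    return cov / (sx * sy)


def average_linkage(profiles: Dict[str, List[float]]) -> List[Tuple[Cluster, Cluster, float]]:
    """Agglomerative clustering with average linkage on 1 - Pearson r.

    profiles maps cell line -> NX values over the same ordered gene list.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    # Assumed preprocessing: log2(NX + 1) to damp very highly expressed genes.
    logged = {name: [math.log2(v + 1) for v in vals] for name, vals in profiles.items()}
    dist = {
        frozenset((a, b)): 1 - pearson(logged[a], logged[b])
        for a, b in combinations(logged, 2)
    }

    def linkage(c1: Cluster, c2: Cluster) -> float:
        # Average linkage: mean distance over all cross-cluster pairs.
        pair_dists = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pair_dists) / len(pair_dists)

    clusters: List[Cluster] = [frozenset([name]) for name in logged]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters, then repeat until one remains.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Toy NX profiles over five genes, invented for illustration.
toy_profiles = {
    "MCF7": [100, 80, 1, 2, 50],
    "T-47d": [90, 70, 2, 1, 40],
    "U-251 MG": [1, 2, 120, 90, 60],
    "U-138 MG": [2, 1, 100, 80, 55],
}
for left, right, d in average_linkage(toy_profiles):
    print(sorted(left), "+", sorted(right), round(d, 3))
```

With these toy profiles, the two breast cancer lines and the two glioma lines are each merged first, and the two resulting groups are joined last at a much larger distance, mirroring how related cell lines group together in Figure 1.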

Specificity of RNA expression

Approximately one third of all protein-coding genes (n=6186) are expressed in all cell lines, suggesting that the corresponding proteins have roles in fundamental cellular functions, so-called 'housekeeping' functions (Figure 2). In contrast, 2% (n=428) of all protein-coding genes were not detected in any of the analyzed cell lines, suggesting that the corresponding proteins are expressed only in cell types not represented in the panel, during specific developmental stages, or under specific conditions such as cellular stress. A total of 1640 protein-coding genes show high RNA expression in a single cell line (cell line enriched), and 1517 show high RNA expression in a small group of cell lines (group enriched), relative to all other cell lines. Another 8849 protein-coding genes show elevated RNA expression in a group of cell lines compared with the average expression in all other cell lines (cell line enhanced). Table 1 shows the distribution of genes within these expression categories for each of the analyzed cell lines.

Figure 2. Pie chart showing the number of genes in the different RNA-based categories of gene expression in the panel of cell lines.
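The categories above can be sketched as a rule-based classifier. Only two rules come directly from the text: the NX ≥1 detection cut-off and the four-fold margin that defines cell line enriched genes. The group enriched and enhanced rules, the maximum group size (the hypothetical constant MAX_GROUP), the order in which rules are tested, and the catch-all label "mixed" for genes that match no other rule are all assumptions made for this illustration.

```python
from typing import Dict

FOLD = 4.0       # four-fold threshold, from the definition of cell line enriched genes
DETECTED = 1.0   # NX >= 1 counts as detected
MAX_GROUP = 5    # hypothetical maximum group size; not stated in the text


def classify(nx: Dict[str, float]) -> str:
    """Assign one specificity category to a gene from its NX value per cell line.

    Assumes at least two cell lines. Categories are tested in a fixed, assumed
    order, so each gene receives exactly one label.
    """
    values = sorted(nx.values(), reverse=True)
    if values[0] < DETECTED:
        return "not detected"
    # Enriched: the top cell line is at least 4x higher than every other line.
    if values[0] >= FOLD * values[1]:
        return "cell line enriched"
    # Group enriched (assumed rule): the k top lines (2 <= k <= MAX_GROUP) are all
    # detected and at least 4x higher than the highest remaining line.
    for k in range(2, min(MAX_GROUP, len(values) - 1) + 1):
        if values[k - 1] >= DETECTED and values[k - 1] >= FOLD * values[k]:
            return "group enriched"
    # Enhanced (assumed rule): some detected line is at least 4x the mean of the others.
    for line, value in nx.items():
        others = [v for other, v in nx.items() if other != line]
        if value >= DETECTED and value >= FOLD * sum(others) / len(others):
            return "cell line enhanced"
    # Detected everywhere but without a clear peak.
    if values[-1] >= DETECTED:
        return "expressed in all"
    return "mixed"


print(classify({"Hep G2": 500.0, "A549": 20.0, "PC-3": 3.0}))  # cell line enriched
print(classify({"A": 10.0, "B": 12.0, "C": 11.0}))             # expressed in all
```

Testing the most specific category first means a gene detected in every cell line but strongly peaked in one is labeled by its peak rather than as "expressed in all".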

Table 1. Number of detected genes (NX ≥1) per cell line based on RNA sequencing, and number of genes in the cell line enriched, group enriched and cell line enhanced categories.

Cell line | Detected genes | Enriched genes | Group enriched genes | Enhanced genes
A-431 | 11378 | 8 | 26 | 270
A549 | 11761 | 7 | 35 | 324
AF22 | 11829 | 26 | 82 | 535
AN3-CA | 11349 | 20 | 31 | 354
ASC diff | 11377 | 31 | 65 | 571
ASC TERT1 | 11413 | 2 | 37 | 481
BEWO | 11780 | 54 | 114 | 620
BJ | 11655 | 3 | 17 | 276
BJ hTERT+ | 11579 | 14 | 36 | 403
BJ hTERT+ SV40 Large T+ | 11316 | 0 | 7 | 120
BJ hTERT+ SV40 Large T+ RasG12V | 11355 | 1 | 6 | 142
CACO-2 | 11525 | 20 | 81 | 430
CAPAN-2 | 12003 | 12 | 59 | 530
Daudi | 10312 | 13 | 74 | 395
EFO-21 | 12273 | 22 | 63 | 448
fHDF/TERT166 | 11440 | 8 | 22 | 390
GAMG | 11829 | 14 | 31 | 312
HaCaT | 11766 | 19 | 79 | 475
HAP1 | 11297 | 6 | 35 | 258
HBEC3-KT | 11148 | 6 | 27 | 251
HBF TERT88 | 10878 | 0 | 2 | 99
HDLM-2 | 11134 | 85 | 74 | 590
HEK 293 | 11911 | 12 | 35 | 407
HEL | 11166 | 54 | 113 | 483
HeLa | 11877 | 18 | 41 | 407
Hep G2 | 11370 | 97 | 125 | 472
HHSteC | 11309 | 6 | 29 | 322
HL-60 | 10202 | 3 | 29 | 233
HMC-1 | 11534 | 71 | 101 | 675
HSkMC | 11799 | 15 | 68 | 491
hTCEpi | 11291 | 19 | 51 | 377
hTEC/SVTERT24-B | 11321 | 2 | 8 | 164
hTERT-HME1 | 10823 | 3 | 22 | 237
hTERT-RPE1 | 11674 | 7 | 19 | 406
HUVEC TERT2 | 11101 | 16 | 64 | 346
JURKAT | 11374 | 7 | 58 | 305
K-562 | 10735 | 22 | 75 | 331
Karpas-707 | 11095 | 37 | 89 | 668
LHCN-M2 | 11209 | 11 | 25 | 257
MCF7 | 11380 | 11 | 36 | 456
MOLT-4 | 10412 | 20 | 60 | 280
NB-4 | 11275 | 28 | 83 | 517
NTERA-2 | 12345 | 45 | 120 | 597
OE19 | 11290 | 55 | 108 | 570
PC-3 | 11747 | 6 | 40 | 338
REH | 10918 | 20 | 51 | 349
RH-30 | 11197 | 37 | 42 | 370
RPMI-8226 | 11107 | 35 | 93 | 507
RPTEC TERT1 | 11753 | 34 | 67 | 451
RT4 | 11634 | 39 | 83 | 533
SCLC-21H | 12411 | 110 | 182 | 819
SH-SY5Y | 12198 | 54 | 129 | 660
SiHa | 11420 | 4 | 26 | 237
SK-BR-3 | 11252 | 36 | 64 | 559
SK-MEL-30 | 11420 | 31 | 44 | 376
SuSa | 12401 | 20 | 99 | 487
T-47d | 11779 | 20 | 59 | 504
THP-1 | 11539 | 38 | 80 | 455
TIME | 11372 | 5 | 52 | 452
U-138 MG | 11448 | 7 | 13 | 257
U-2 OS | 12631 | 39 | 73 | 439
U-2197 | 11396 | 19 | 34 | 375
U-251 MG | 11110 | 2 | 10 | 140
U-266/70 | 11678 | 51 | 108 | 737
U-266/84 | 11075 | 30 | 84 | 485
U-698 | 10250 | 21 | 65 | 392
U-87 MG | 11817 | 14 | 33 | 416
U-937 | 10954 | 21 | 74 | 411
WM-115 | 11707 | 17 | 42 | 362

The cell line transcriptomes have been compared with the bulk transcriptomes of 37 normal tissues and organs analyzed in the Tissue Atlas (Uhlén M et al. (2015)). There are 65 protein-coding genes that are expressed only in the panel of cell lines and not detected in any of the analyzed normal tissues, while 277 protein-coding genes are expressed only in normal human tissues and not detected in any of the analyzed cell lines. Several of the genes in the latter category encode proteins with functions associated with differentiated cells in specialized tissues or tissue subcompartments that are not represented in the cell line panel. One example is ADAM30, which is expressed in spermatids of the human testis.

  • 65 genes found only in cell lines and not tissues
  • 277 genes found only in tissues and not cell lines
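The two counts above are set differences between the genes detected anywhere in the cell line panel and those detected anywhere in the tissue panel. The Python sketch below assumes, for illustration, that the same NX ≥1 cut-off is applied to both panels and that both cover the same gene set. The NX values are invented, and GENE_X is a hypothetical placeholder gene.

```python
from typing import Dict, Set

DETECTED = 1.0  # assumed: the same NX >= 1 cut-off is applied to tissues and cell lines


def detected_anywhere(nx_matrix: Dict[str, Dict[str, float]]) -> Set[str]:
    """Genes detected (NX >= 1) in at least one sample of the panel."""
    return {
        gene
        for values in nx_matrix.values()
        for gene, nx in values.items()
        if nx >= DETECTED
    }


# Toy NX values, invented for illustration; GENE_X is a hypothetical placeholder.
cell_lines = {
    "Hep G2": {"ALB": 5400.0, "ADAM30": 0.0, "GENE_X": 3.0},
    "PC-3": {"ALB": 0.0, "ADAM30": 0.1, "GENE_X": 0.0},
}
tissues = {
    "liver": {"ALB": 9000.0, "ADAM30": 0.0, "GENE_X": 0.0},
    "testis": {"ALB": 0.5, "ADAM30": 45.0, "GENE_X": 0.2},
}

only_cell_lines = detected_anywhere(cell_lines) - detected_anywhere(tissues)
only_tissues = detected_anywhere(tissues) - detected_anywhere(cell_lines)
print(only_cell_lines, only_tissues)  # {'GENE_X'} {'ADAM30'}
```

ALB is detected in both panels and therefore appears in neither difference, while the testis-restricted gene ends up in the tissue-only set.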

Cell line enriched genes

Overall, there is a large degree of agreement between the RNA expression categories in cell lines and tissues. A majority of the cell line enriched genes, defined as having at least four times higher RNA expression in a single cell line than in any other cell line, also belong to one of the tissue elevated categories (tissue enriched, group enriched or tissue enhanced). For example, the secreted proteins AHSG and ALB, which are only expressed in normal liver tissue, are also highly enriched in the liver-derived cell line Hep G2, where immunofluorescence analysis shows localization to the secretory pathway. The transcription factor HOXB13, which is expressed in the prostate, colon and rectum, is also enriched in the prostate-derived cell line PC-3, where it localizes to the nucleoplasm. The adhesion glycoprotein CDH15, which is enriched in skeletal muscle tissue, is also enriched in the rhabdomyosarcoma cell line RH-30, with some expression in the immortalized myoblast cell line LHCN-M2. The enzyme TYR, which is exclusively expressed in skin, is highly enriched in the melanoma-derived cell line SK-MEL-30, while the epidermal growth factor receptor EGFR, which is enriched in female tissues and skin, is enriched in A-431, another skin-derived cell line. For these proteins, both the expression pattern in normal tissues and the protein function relate to the specific traits and functions of the corresponding normal tissue type and organ.


Figure 3 panels: AHSG, ALB, HOXB13, CDH15, TYR and EGFR in tissue (IHC); AHSG - Hep G2, ALB - Hep G2, HOXB13 - PC-3, CDH15 - RH-30, TYR - SK-MEL-30 and EGFR - A-431 in cell lines (IF).

Figure 3. Examples of proteins with enriched expression in a cell line and in the corresponding tissue of origin: AHSG, ALB, HOXB13, CDH15, TYR and EGFR. The immunohistochemical (IHC) staining shows the protein expression pattern in tissue in brown. The immunofluorescence (IF) staining shows the subcellular localization of the protein in cell lines in green; in the IF images, the nucleus is shown in blue and microtubules in red.
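The statement that a majority of cell line enriched genes also belong to a tissue elevated category amounts to an overlap fraction between two category assignments. The Python sketch below computes that fraction from invented toy labels; GENE_Y is a hypothetical gene, and "low tissue specificity" is used here only as a placeholder for any tissue category outside the three elevated ones.

```python
from typing import Dict

# Tissue categories counted as "tissue elevated", as listed in the text.
TISSUE_ELEVATED = {"tissue enriched", "group enriched", "tissue enhanced"}


def fraction_tissue_elevated(cell_line_category: Dict[str, str],
                             tissue_category: Dict[str, str]) -> float:
    """Fraction of cell line enriched genes that fall in a tissue elevated category."""
    enriched = [g for g, c in cell_line_category.items() if c == "cell line enriched"]
    if not enriched:
        return 0.0
    hits = sum(1 for g in enriched if tissue_category.get(g) in TISSUE_ELEVATED)
    return hits / len(enriched)


# Toy category assignments for illustration; GENE_Y is hypothetical and
# "low tissue specificity" is a placeholder for non-elevated tissue categories.
cell_line_cat = {
    "ALB": "cell line enriched",
    "AHSG": "cell line enriched",
    "TYR": "cell line enriched",
    "GENE_Y": "cell line enriched",
    "GAPDH": "expressed in all",
}
tissue_cat = {
    "ALB": "tissue enriched",
    "AHSG": "tissue enriched",
    "TYR": "tissue enriched",
    "GENE_Y": "low tissue specificity",
    "GAPDH": "low tissue specificity",
}
print(fraction_tissue_elevated(cell_line_cat, tissue_cat))  # 0.75
```

Genes missing from the tissue assignment are simply not counted as hits, because `dict.get` returns None for them.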

Relevant links and publications

Clegg JS., Properties and metabolism of the aqueous cytoplasm and its boundaries. Am J Physiol. (1984)
PubMed: 6364846 

Luby-Phelps K., The physical chemistry of cytoplasm and its influence on cell function: an update. Mol Biol Cell. (2013)
PubMed: 23989722 DOI: 10.1091/mbc.E12-08-0617

Luby-Phelps K., Cytoarchitecture and physical properties of cytoplasm: volume, viscosity, diffusion, intracellular surface area. Int Rev Cytol. (2000)
PubMed: 10553280 

Ellis RJ., Macromolecular crowding: obvious but underappreciated. Trends Biochem Sci. (2001)
PubMed: 11590012 

Bright GR et al., Fluorescence ratio imaging microscopy: temporal and spatial measurements of cytoplasmic pH. J Cell Biol. (1987)
PubMed: 3558476 

Kopito RR., Aggresomes, inclusion bodies and protein aggregation. Trends Cell Biol. (2000)
PubMed: 11121744 

Aizer A et al., Intracellular trafficking and dynamics of P bodies. Prion. (2008)
PubMed: 19242093 

Carcamo WC et al., Molecular cell biology and immunobiology of mammalian rod/ring structures. Int Rev Cell Mol Biol. (2014)
PubMed: 24411169 DOI: 10.1016/B978-0-12-800097-7.00002-6

Lang F., Mechanisms and significance of cell volume regulation. J Am Coll Nutr. (2007)
PubMed: 17921474 

Thul PJ et al., A subcellular map of the human proteome. Science. (2017)
PubMed: 28495876 DOI: 10.1126/science.aal3321

Uhlén M et al., Tissue-based map of the human proteome. Science. (2015)
PubMed: 25613900 DOI: 10.1126/science.1260419

Cellosaurus